Nov 29, 2009

Economics and Security Notes...

The Arms article on access management (Arms, 1998) provides a good overview of what must have been a fairly preliminary set of ideas on how to provide secure access to digital materials. I wonder if the article is still relevant; that is, are the policies for access management, more or less, still in place today? Have those policies changed a little or a lot?

The Lesk article gives a good overview of economic issues facing libraries. One of the main and most relevant points Lesk makes is that as libraries provide additional services, the cost of those services must be made up somehow. How to make up that cost difference is an important consideration for libraries. A number of pay-for-service options are listed in the article, but it doesn't seem to me that any of them are actually being used in any significant way for DLs, with the exception of the no-fee model. Is it even realistic for DLs to think about charging for information? Have any libraries (academic or otherwise) moved to a charge-for-service model for digital or analog information access?

Social Issues - Muddiest Points

Given our discussion of internet access worldwide, and the speculation that much of the increase in access in Asia and Africa is due to access via cellular telephones, do we have any good examples of DLs that are easily accessible or readable by smart phone? What types of DL applications can we envision that would use a smart phone rather than a PC as the primary user interface platform?

Oct 30, 2009

Digital Preservation Notes

I found the motivating factors aspect of the Rieger (2008) article on large-scale digitization initiatives to be interesting, especially the component that discusses the motivating factors of libraries. One of these motivating factors was the ability of participating libraries to engage in research and development on how to handle large amounts of digital data. I would be interested to see some of the research that has come out of these libraries and to see how their data handling practices compare with those of scientific data management organizations. Are there projects at libraries that we can look at that are investigating these issues?

As with many other terms in this class, OAIS is something that I have heard referred to a number of times since starting the MLIS program, but I was never clear on the details. The Lavoie article gives a great overview of the OAIS and clarifies many of the questions that I had regarding the form and function of the OAIS Reference Model. Would it be possible to point us towards a data management plan, for an actual digital archive or library, that lays out in detail how all or most aspects of the OAIS model are covered? I would be interested to see how the aspects of the OAIS model are approached by a real archive or library.

References:

Lavoie, BF. 2004. The Open Archival Information System Reference Model: Introductory Guide. DPC Technology Watch Series Report 04-01. OCLC Inc. and Digital Preservation Coalition.

Rieger, OY. 2008. Preservation in the age of large-scale digitization. Council on Library and Information Resources. Washington, D.C.

Muddiest Points

I am curious about how difficult it is to set up search for an OAI-PMH or a Z39.50 system. Are these really complicated systems to set up? Are there tools (e.g. through DSpace or Greenstone) that allow a digital library creator to set up federated searches?

Oct 25, 2009

Weekly Post - Info. Retrieval week 2 - federated search, Z39.50, PMH, etc.

Well, this week's articles were quite interesting and covered quite a bit of material.

Both the OAI-PMH and Z39.50 articles gave great overviews of the history of these standards and, in broad terms, what they are for. Having heard quite a bit about both OAI-PMH and Z39.50 without knowing much about the details, it was nice to have a little clarity on the subjects. I felt, however, that it was difficult to get a really good sense of how the two protocols actually function, even in general terms, especially when compared with one another. Would it be possible to give a general overview of this in class?

The discussion of federated searching in the two articles we read was, I felt, fairly polarized and a touch superficial. I understand that there are a number of misconceptions about federated search, and it is important to clarify these misconceptions. The Miller article, however, was entirely too optimistic about the potential for federated search to change the nature of the library catalog. Miller tries to draw a parallel between the library catalog versus federated search paradigm and the Amazon.com versus Google.com paradigm. I'm not sure this holds much water, given that people do not use a library to gain access to the broad array of information that the Google comparison would suggest. I think, though, that the idea of a large-scale federated search program, like the one discussed in the Lossau article, could create a very powerful force in academic information access.

Oct 20, 2009

Information Retrieval


I was quite interested in the readings this week as they touched on subjects with which I am not entirely familiar. The broad discussion of how web search works was quite informative, followed by the Henzinger et al. (2002) article discussing in more depth some of the issues surrounding web search. Over the seven years that have passed since the Henzinger article was written, I wonder if and how some of the issues presented have been resolved and what other problems have cropped up in web search. Specifically, have social media and the increase in the amount of linking changed the landscape of link-based search results? For instance, do crawlers find social media websites and index the pages therein? How have search companies responded to links originating from social media websites?

I am interested to know how much of this type of material will be covered in the Advanced Topics in Information Retrieval course next semester.


References:

Henzinger, MR, Motwani, R, Silverstein, C. 2002. Challenges in web search engines. SIGIR Forum. 36: 11-22.

Oct 17, 2009

Muddiest Point - XML and Next Week Readings

XML: This class seemed very straightforward, even though we were discussing a fairly complicated topic. I am interested in hearing a little more about style sheets used for formatting and layout, but I realize these are probably outside of the scope of this class.


Next Week Readings: I can't access any of the readings for next week and I wonder if there is another source out there somewhere. I'll try another search to see what I can dredge up though...

Oct 8, 2009

Reading Response - XML

On the surface XML seems like a fairly straightforward language. However, once DTDs, XML Schema, and non-text entities come into the picture things seem to get quite a bit more complicated. Are there any good online resources with practice (homework) for programming in XML? As we all know, reading about programming can be somewhat informative but ultimately programming is the way to learn.

On a related note, I think it would be useful to see some of the things that XML allows us to do - especially the things noted in the Bryan article:
- Create a compound document
- Place imagery in a file
- Add editorial content to a file

Is there any way we can look at some examples of this in class or as a "hands-on point"?

Also, as mentioned in the Bryan article, an XML document is considered 'valid' if it has processing instructions (e.g. encoding, XML version, etc.), a DTD (or a link to one), and a properly tagged document. Why would one want to have a non-valid XML document? Is there a reason not to include some of those elements, or is it just bad practice?
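
To make the question a bit more concrete, here is a minimal sketch in Python of the weaker check that a document with no DTD can still pass: being well-formed (properly nested tags) as opposed to valid (also conforming to a DTD). The two sample documents and the function name is_well_formed are made up for illustration. Note that the standard library parser used here is not a validating parser, so it cannot tell us whether a document is valid against a DTD, only whether it parses at all.

```python
import xml.etree.ElementTree as ET

# Two made-up documents: neither has a DTD, so neither can be 'valid',
# but the first is still well-formed and the second is not.
WELL_FORMED = "<note><to>class</to><body>No DTD here</body></note>"
NOT_WELL_FORMED = "<note><to>class</body></to></note>"  # tags overlap


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False otherwise.

    This checks tag nesting and syntax only. It does NOT validate
    the document against a DTD or schema.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(NOT_WELL_FORMED))
```

So a document without a DTD is still usable by a parser as long as it is well-formed; whether leaving the DTD out is good practice is a separate question.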

Muddiest Point - Metadata

I am interested in learning how a metadata harvester works. What does it look for? Can it look for elements of more than one metadata schema if the element names are the same?
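
A rough sketch of the "same element name, different schema" part of this question, in Python: XML namespaces let a program tell two elements called title apart, because each one is qualified by the namespace of its schema. The record below is entirely made up, the namespace http://example.org/other-schema/ is a hypothetical stand-in for a second schema, and the function titles is just an illustrative helper, not how any particular harvester is written.

```python
import xml.etree.ElementTree as ET

# A made-up harvested record containing two 'title' elements that
# belong to different schemas (namespaces).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:other="http://example.org/other-schema/">
          <dc:title>A made-up record</dc:title>
          <other:title>Same name, different schema</other:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Prefix-to-namespace map used for lookups. The 'other' namespace is
# hypothetical; the 'dc' one is the Dublin Core element set namespace.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "other": "http://example.org/other-schema/",
}


def titles(xml_text: str, prefix: str) -> list:
    """Return the text of every 'title' element in the given namespace."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(f".//{prefix}:title", NS)]


if __name__ == "__main__":
    print(titles(SAMPLE, "dc"))
    print(titles(SAMPLE, "other"))
```

The point of the sketch is only that the element name alone is not what gets matched; the namespace is part of the lookup.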

Oct 2, 2009

images on flickr

well, i've uploaded some images to flickr. here they are...

Sep 21, 2009

Muddiest Point - 2.5

Hopefully we can discuss, in class or in a special session with Dr. He or the TA (whose name I can't remember at this moment), troubleshooting the installation of DSpace or Fedora or Greenstone. I made an attempt at installing all three of these this weekend, to no avail. Granted, I could have dug into the help files and online message boards a little more, but I thought that if there was some local knowledge vis a vis installations, we might be able to start using the software a little sooner.

Sep 17, 2009

Unit 3 Reading Notes

Well, now we're getting down in the weeds with some of this material and I kind of like it. The discussion in the Arms chapter answered a lot of questions that have come up in my mind over the last, say, 10 years: the difference between HTML and XML, what 'mark up' is, and how electronic texts are generated and rendered. While we have certainly only seen the tip of the iceberg when it comes to SGML, XML, etc., I wonder how difficult these languages are to learn and to use effectively. I also wonder about OCR and how much it may have improved since Arms wrote this text (it's been 10 years after all). Are there OCR programs that are able to read handwritten text, non-standard characters (e.g. the long s seen often before the end of the 18th century), etc.?

I was also fascinated by the discussions in both the Paskin and Lynch documents. While reading Paskin it occurred to me that the proprietary nature of the DOI system might be problematic, or at least viewed as problematic by a community that tends (as far as I can tell) to prefer open systems for software development and programming. The other identifiers described in Lynch each seem to have their own specific niche, and I am curious about interoperability among these identifiers. The URN appears to be more universal in approach, but it is not ready for use at this point (or at least web browsers are not able to make use of it).

As with the DOI system, the questions raised by Lynch vis a vis uncertainties surrounding potentially proprietary third party databases are unsettling. One would hope that these systems could be implemented in an open (and free) way.

Sep 15, 2009

Muddiest Point - Unit 2

I am curious about how, for instance, the OAI-PMH system works. I have seen a few examples and I know we are going to talk about this later in the class, but it just seems like such a useful protocol. Are there readily available software systems that allow the use of the OAI-PMH (i.e. can you use DSpace or Greenstone)? What are the operational boundaries to setting up a distributed digital library with this type of functionality? I imagine that in some communities getting everyone to format data or metadata in the required way (Dublin Core?) could be a major sticking point. If the metadata schema required to use the OAI-PMH does not work well with an established institutional metadata schema, are there workarounds? For instance, Federal projects are required to store geospatial metadata in FGDC form; can that be augmented in order to set up an OAI-PMH based digital library?
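
From what I can tell of the protocol, an OAI-PMH request is just an HTTP request with a verb parameter, and record requests carry a metadataPrefix naming the metadata format wanted. Unqualified Dublin Core (the oai_dc prefix) is the format every repository has to support, but a repository may offer others as well, and the ListMetadataFormats verb is how a harvester finds out which. Below is a minimal Python sketch of building such request URLs. The base URL http://example.org/oai, the helper name oai_request, and the fgdc prefix are all assumptions for illustration; whether a given repository actually exposes an FGDC format under that prefix would have to be checked against its ListMetadataFormats response.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, for illustration only.
BASE_URL = "http://example.org/oai"


def oai_request(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Ask the repository which metadata formats it can disseminate.
    print(oai_request(BASE_URL, "ListMetadataFormats"))
    # Harvest records in the required Dublin Core format.
    print(oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
    # Harvest in another format; 'fgdc' is an assumed prefix, not a standard one.
    print(oai_request(BASE_URL, "ListRecords", metadataPrefix="fgdc"))
```

If this reading is right, the workaround for a community with its own schema would be to keep the native records and expose a Dublin Core version alongside them, rather than replacing one with the other.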

Anyway, maybe I am getting ahead of things, but this is very interesting to me and what I hope to do in the future.

Sep 10, 2009

Reading Response - Week 2

The issues of interoperability raised by Payette et al. are of special interest to me - especially as they relate to digital libraries in the scientific domain. I have seen a need for work in this area while working with scientists on interdisciplinary research projects. Often investigators from different scientific backgrounds have different standards for data formatting which can result in a great deal of efficiency loss when data sharing occurs - due primarily to time spent reformatting data. The architecture proposed by Payette et al. (and by Arms et al. for that matter) not only could allow investigators to search for and share data more easily, but potentially could contain technology, in the form of disseminators, for data conversion (e.g. from one data format to another) in the data sharing process.

The extensibility and flexibility of the systems described in Payette et al. and Arms et al. would also be useful in the types of research work I have been involved with in the past. Historically, data sharing systems I have used have been static, which causes many a problem when switching from one project to another or when integrating new data into a project.

I also appreciate the basic principles of the architecture described by Arms et al.: the need for flexibility in the user interface, the need for straightforward collections management, and the need to keep the social, economic, and legal (and I would add technological to this list) frameworks in mind when developing the library. Quite important, in my experience, is the need for flexibility in the user interface. It is crucial that the interface take into account as many of the potential needs of the user as possible. On the one hand, an overly focused interface may restrict the usability of the library. On the other hand, there is a limit to this flexibility; one should not generalize the system so far that it no longer serves any particular group of users well. The question is how to balance flexibility with applicability vis a vis the expected user base.

A few questions come to mind when reading these articles. Payette et al. and Suleman and Fox are both about a decade old, and both articles use either prototype or early systems as examples of the architectures they describe. How have digital library architectures changed in the intervening years? Are the basic concepts the same as they were when these papers were written? Do we have any examples of large interoperable digital libraries that we can look at?

References:

Payette, S, Blanchi, C, Lagoze, C, Overly, EA. Interoperability for digital objects and repositories: The Cornell/CNRI Experiments. D-Lib Magazine, May 1999, Volume 5, Issue 5.

Suleman, H, and Fox, EA. A Framework for Building Open Digital Libraries, D-Lib Magazine, December 2001. Volume 7 Number 12.

Arms, WY, Blanchi, C, and Overly, EA. An Architecture for Information in Digital Libraries. D-Lib Magazine, February 1997.

Sep 4, 2009

Muddiest Point - Week 1

I am confused about the course of assignments over the next two weeks.

We do not have class next week (Sept. 7). Are we expected to post week 2 reading comments this week even though week 2 doesn't actually happen until the following week? Are week 1 reading comments required?

If you could clarify I'm sure it would be appreciated by more folks than me.

Thanks!

Sep 3, 2009

Blog is go!

here starts the blog i have created for various MLIS program activities. enjoy!