I found the discussion of motivating factors in the Rieger (2008) article on large-scale digitization initiatives interesting, especially the section on what motivates libraries to participate. One of these motivating factors was the opportunity for participating libraries to engage in research and development on handling large amounts of digital data. I would be interested to see some of the research that has come out of these libraries and how their data-handling practices compare with those of scientific data management organizations. Are there projects at libraries that we can look at that are investigating these issues?
As with many other terms in this class, OAIS is something that I have heard referred to a number of times since starting the MLIS program, but was never clear on the details. The Lavoie article gives a great overview of the OAIS and clarifies many of the questions I had regarding the form and function of the OAIS Reference Model. Would it be possible to point us toward a data management plan for an actual digital archive or library that covers all or most aspects of the OAIS model? I would be interested to see how the aspects of the OAIS model are approached by a real archive or library.
References:
Lavoie, BF. 2004. The Open Archival Information System Reference Model: Introductory Guide. DPC Technology Watch Series Report 04-01. OCLC Inc. and Digital Preservation Coalition.
Rieger, OY. 2008. Preservation in the age of large-scale digitization. Washington, D.C.: Council on Library and Information Resources.
Oct 30, 2009
Muddiest Points
I am curious about how difficult it is to set up search for an OAI-PMH or a Z39.50 system. Are these really complicated systems to set up? Are there tools (e.g., through DSpace or Greenstone) that allow a digital library creator to set up federated searches?
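At least on the OAI-PMH side, the mechanics are simpler than they sound: a harvest request is just an HTTP GET with a "verb" parameter, and the response is XML. Here is a minimal sketch; the repository base URL and the sample response are invented for illustration, but the verb/parameter shape and the OAI-PMH namespace are the real ones.

```python
# Minimal sketch of an OAI-PMH harvesting request and response parse.
# BASE_URL is a hypothetical endpoint; real repositories look similar.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/oai"  # hypothetical repository endpoint

def build_request(verb, **kwargs):
    """OAI-PMH requests are plain HTTP GETs with a 'verb' parameter."""
    params = {"verb": verb, **kwargs}
    return BASE_URL + "?" + urlencode(params)

url = build_request("ListRecords", metadataPrefix="oai_dc")
print(url)  # http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc

# A harvester then parses the XML response. Abridged sample response:
sample = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
root = ET.fromstring(sample)
ids = [h.text for h in root.findall(".//oai:identifier", ns)]
print(ids)  # ['oai:example.org:1', 'oai:example.org:2']
```

Z39.50 is a heavier, stateful, binary protocol, which is part of why toolkits and gateways exist for it; OAI-PMH was deliberately designed to be this lightweight.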
Oct 25, 2009
Weekly Post - Info. Retrieval week 2 - federated search, Z39.50, OAI-PMH, etc.
Well, this week's articles were quite interesting and covered quite a bit of material.
Both the OAI-PMH and Z39.50 articles gave great overviews of the history of these standards and of how they generally function. Having heard quite a bit about both OAI-PMH and Z39.50, but not knowing much about the details, it was nice to have a little clarity on the subjects. I felt, however, that it was difficult to get a really good sense of how, even generally, the two protocols function, especially when compared with one another. Would it be possible to give a general overview of this in class?
The discussion of federated searching in the two articles we read was, I felt, fairly polarized and a touch superficial. I understand that there are a number of misconceptions about federated search, and it is important to clear up those misconceptions. The Miller article, however, was entirely too optimistic about the potential for federated search to change the nature of the library catalog. Miller tries to draw a parallel between the library-catalog-versus-federated-search debate and the amazon.com-versus-google.com comparison. I'm not sure this holds much water, given that people do not use a library to gain access to the broad array of information that the Google comparison would suggest. I think, though, that a large-scale federated search program, like the one discussed in the Lossau article, could become a very powerful force in academic information access.
Oct 20, 2009
Information Retrieval
I was quite interested in the readings this week, as they touched on subjects with which I am not entirely familiar. The broad discussion of how web search works was quite informative, and it was followed by the Henzinger et al. (2002) article, which discusses in more depth some of the issues surrounding web search. In the seven years since the Henzinger article was written, I wonder whether and how some of the issues presented have been resolved, and what other problems have cropped up in web search. Specifically, have social media and the increase in the amount of linking changed the landscape of link-based search results? For instance, do crawlers find social media websites and index the pages therein? How have search companies responded to links originating from social media websites?
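The core idea behind link-based ranking, which the Henzinger article takes as its starting point, can be shown in a few lines. This is a toy PageRank-style power iteration on an invented three-page graph, not any particular engine's implementation: a page's score is the score flowing in from the pages that link to it, split evenly across each linker's outbound links, plus a small damping term.

```python
# Toy illustration of link-based ranking (PageRank-style power iteration).
# The three-page graph is invented; real engines work at vastly larger scale.
def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}      # start with uniform scores
    for _ in range(iters):
        new = {}
        for p in pages:
            # Rank flowing into p from every page q that links to it,
            # divided by q's number of outbound links.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - damping) / n + damping * incoming
        rank = new
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)  # C: it collects link weight from both A and B
```

One can see from this sketch why link spam and artificial linking (including, potentially, the flood of social-media links) matter: anything that manufactures inbound links manufactures rank.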
I am interested to know how much of this type of material will be covered in the Advanced Topics in Information Retrieval course next semester.
References:
Henzinger, MR, Motwani, R, Silverstein, C. 2002. Challenges in web search engines. SIGIR Forum. 36: 11-22.
Oct 17, 2009
Muddiest Point - XML and Next Week Readings
XML: This class seemed very straightforward, even though we were discussing a fairly complicated topic. I am interested in hearing a little more about the style sheets used for formatting and layout, but I realize these are probably outside the scope of this class.
Next Week Readings: I can't access any of the readings for next week and I wonder if there is another source out there somewhere. I'll try another search to see what I can dredge up though...
Oct 8, 2009
Reading Response - XML
On the surface, XML seems like a fairly straightforward language. However, once DTDs, XML Schema, and non-text entities come into the picture, things get quite a bit more complicated. Are there any good online resources with practice exercises (homework) for working in XML? As we all know, reading about programming can be somewhat informative, but ultimately writing code is the way to learn.
On a related note, I think it would be useful to see some of the things that XML allows us to do - especially the things noted in the Bryan article:
- Create a compound document
- Place imagery in a file
- Add editorial content to a file
Is there any way we can look at some examples of this in class or as a "hands-on point"?
Also, as mentioned in the Bryan article, an XML document is considered 'valid' if it has processing instructions (e.g., encoding, XML version), a DTD (or a link to one), and properly tagged content. Why would one want a non-valid XML document? Is there a reason not to include some of those elements, or is it just bad practice? (As I understand it, a document without a DTD can still be 'well-formed' - the tags nest and match - it just isn't 'valid' in the technical sense, since there is nothing to validate it against.)
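The well-formed/valid distinction is easy to see in code. This is a minimal sketch using Python's standard-library XML parser, which checks well-formedness only; actual DTD validation requires an external library such as lxml. The two sample documents are invented.

```python
# Sketch of 'well-formed' vs. merely broken XML. Python's xml.etree
# checks well-formedness only; DTD validation needs e.g. lxml.
import xml.etree.ElementTree as ET

well_formed = "<record><title>Example</title></record>"  # no DTD: well-formed, not 'valid'
malformed = "<record><title>Example</record>"            # mismatched tags: not even well-formed

def is_well_formed(doc):
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(well_formed))  # True
print(is_well_formed(malformed))    # False
```

This suggests one answer to the question above: many applications only ever need well-formedness, so authors skip the DTD because nothing downstream validates against it.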
Muddiest Point - Metadata
I am interested in learning how a metadata harvester works. What does it look for? Can it look for elements of more than one metadata schema if the element names are the same?
Oct 2, 2009