Mar 21, 2010

Muddiest Point - Unit 8

Apparently I read the right chapter from the wrong book last week, WOOPS! Anyway, no muddiest points for me this week.

Mar 14, 2010

Readings - XML Retrieval

The IIR chapter on XML retrieval covered a number of basic elements of XML and XML parsing tools as an introduction, then launched into an extensive discussion of XML retrieval methods.

One element of confusion I had was with the Structured Document Retrieval Principle. The principle itself is not at all confusing, but the example given in the text seems to be the opposite of the ideals of the principle. If the idea is to return the most specific element vis-à-vis the query, why would a query for Macbeth return the Title element "Macbeth" rather than the Scene element "Macbeth's Castle", which is a more specific element (i.e., further down the element tree)?
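To make the confusion concrete, here is a toy sketch of the kind of element tree involved. The markup loosely follows the Shakespeare example in IIR, but the element names and text are illustrative, not the book's actual collection; the code just lists every element path whose text matches the query term, which gives the two candidate answers the principle has to choose between.

```python
import xml.etree.ElementTree as ET

# A toy play marked up as element-centric XML (names are illustrative).
doc = """
<play>
  <title>Macbeth</title>
  <act number="1">
    <scene number="7">
      <title>Macbeth's castle</title>
      <verse>Will I with wine and wassail ...</verse>
    </scene>
  </act>
</play>
"""

root = ET.fromstring(doc)

def paths(el, prefix=""):
    # Walk the tree, yielding each element with its full path.
    p = prefix + "/" + el.tag
    yield p, el
    for child in el:
        yield from paths(child, p)

query = "Macbeth"
hits = [p for p, el in paths(root) if el.text and query in el.text]
print(hits)  # ['/play/title', '/play/act/scene/title']
```

Both title elements match the query, and the scene-level one sits deeper in the tree, which is exactly why the book's choice of the play-level title seems surprising.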

The discussion of the vector space model of XML retrieval is a little confusing, but I suspect this will be clarified in the lecture this week.

It would also be nice to have a little more discussion of data-centric XML retrieval. The chapter basically blows this off as something that is best not handled by XML retrieval, but maybe we could talk about that a little bit. I am curious, even, what a data-centric XML file would look like, given that most data is tabular and linked across fields.
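For what it's worth, here is a guess at what such a file might look like: a hypothetical, made-up schema with regular record-like structure and cross-field links via id attributes, much like a relational export. The small snippet below resolves the link the way a join would.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric XML: regular records, linked by id attributes.
data = """
<library>
  <authors>
    <author id="a1"><name>Shakespeare</name></author>
  </authors>
  <books>
    <book author="a1"><title>Macbeth</title><year>1606</year></book>
    <book author="a1"><title>Hamlet</title><year>1601</year></book>
  </books>
</library>
"""

root = ET.fromstring(data)
# Resolve the cross-field link, as a relational join would.
authors = {a.get("id"): a.findtext("name") for a in root.iter("author")}
rows = [(authors[b.get("author")], b.findtext("title"))
        for b in root.iter("book")]
print(rows)  # [('Shakespeare', 'Macbeth'), ('Shakespeare', 'Hamlet')]
```

The regular, typed, cross-linked structure is presumably why the chapter suggests handling this kind of data with database techniques rather than XML retrieval.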

Muddiest Point - Unit 7

The only issue I have at this point is with the homework. Given that all of the assignments are, to a degree, dependent upon successful completion of the previous assignment, I think it would be helpful to get feedback (grades, comments) relatively soon.

Thanks!

Feb 21, 2010

Readings - Relevance Feedback and Query Expansion

I found the readings this week to be informative, as I had never considered the details of how user feedback is or could be incorporated into IR. I specifically found the discussion of pseudo and implicit relevance feedback to be interesting. I wonder about the tradeoffs between query efficiency and retrieval success in pseudo relevance feedback, given that one must, presumably, run two queries to get one result. Is this efficiency cost not really an issue?
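The two-pass cost I'm wondering about is easy to see in a sketch. This is a minimal, toy version of pseudo relevance feedback with a Rocchio-style update: the corpus, term-frequency scoring, and the beta weight are all made up for illustration, but the structure (run the query, assume the top result is relevant, expand, run again) is the standard idea.

```python
from collections import Counter

# Toy corpus as term-frequency vectors (illustrative only).
docs = {
    "d1": Counter({"xml": 3, "retrieval": 2}),
    "d2": Counter({"xml": 1, "query": 2, "expansion": 1}),
    "d3": Counter({"cooking": 4}),
}

def score(q, d):
    # Simple dot product between query and document vectors.
    return sum(q[t] * d[t] for t in q)

def search(q, k=2):
    return sorted(docs, key=lambda d: score(q, docs[d]), reverse=True)[:k]

def pseudo_rf(q, k=1, beta=0.5):
    # Pass 1: run the original query; assume the top-k hits are relevant.
    top = search(q, k)
    # Rocchio-style update: move the query toward the assumed-relevant docs.
    expanded = Counter(q)
    for d in top:
        for t, tf in docs[d].items():
            expanded[t] += beta * tf / k
    # Pass 2: rerun with the expanded query -- hence two queries per result.
    return search(expanded)

print(pseudo_rf(Counter({"xml": 1})))  # ['d1', 'd2']
```

The expanded query picks up "retrieval" from d1, which pulls in d2 on the second pass; the price is that every user query costs two index lookups.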

I also found the discussion of thesaurus-based query expansion to be interesting. I have seen some of this in my work with bibliographic databases, but might look into it a little more now that I understand how it works.
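As I understand it, the simplest form of this amounts to a lookup in a controlled vocabulary. The thesaurus entries below are invented for illustration, not from any real bibliographic database:

```python
# Hypothetical hand-built thesaurus, as a controlled vocabulary
# in a bibliographic database might provide.
thesaurus = {
    "car": ["automobile", "vehicle"],
    "heart attack": ["myocardial infarction"],
}

def expand(query_terms):
    # Append every thesaurus entry for each query term to the query.
    expanded = list(query_terms)
    for t in query_terms:
        expanded.extend(thesaurus.get(t, []))
    return expanded

print(expand(["car", "repair"]))  # ['car', 'repair', 'automobile', 'vehicle']
```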

Feb 19, 2010

Muddiest Point - Unit 6

I am still interested in hearing why, when comparing results of IR systems using MAP and other averaging statistics, the IR community does not also look at variability about the mean. Simple statistics can tell us a great deal about whether things are truly 'different' in systems like this or how significant those differences are. Is this truly not an issue that is discussed in this field?
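The kind of comparison I have in mind is cheap to compute. With made-up per-query average precision numbers for two hypothetical systems, both come out with the same MAP, but the standard deviation tells a very different story:

```python
import statistics

# Hypothetical per-query average precision (AP) for two systems.
ap_sys_a = [0.40, 0.42, 0.38, 0.41, 0.39]
ap_sys_b = [0.10, 0.75, 0.05, 0.90, 0.20]

for name, aps in [("A", ap_sys_a), ("B", ap_sys_b)]:
    map_score = statistics.mean(aps)   # MAP = mean of per-query AP
    sd = statistics.stdev(aps)         # variability about that mean
    print(f"system {name}: MAP={map_score:.2f}, sd={sd:.2f}")
# system A: MAP=0.40, sd=0.02
# system B: MAP=0.40, sd=0.40
```

System A is consistently mediocre while system B is wildly erratic, yet MAP alone cannot distinguish them, which is exactly the point.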

Feb 14, 2010

Readings - Evaluation

The readings on evaluation this week were interesting, if a little too focused, I think, on objective evaluation. In IIR the discussion focused on standard test collections, precision and recall statistics, and a number of other statistics such as the precision-recall graph and mean average precision. This was followed by a discussion of relevance and the problems associated with human-based evaluation.

A few things in this reading came to mind that I wouldn't mind seeing discussed in class. First, given that most of the test collections discussed in the reading are from news wires or other news sources, is there any concern, and has there been any study, about potential bias in evaluation statistics based on the content of these documents? Second, the reading seemed to downplay pretty heavily the utility of human-based evaluation. Isn't there a place for this subjective evaluation, given that this is how these systems are evaluated in the end (by human users)?
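The statistics the chapter covers can be sketched in a few lines. The ranked result list and relevance judgments below are invented for illustration; the code computes precision at each rank and then average precision, which MAP simply averages over queries.

```python
# Hypothetical ranked result list: 1 = judged relevant, 0 = not.
rels = [1, 0, 1, 1, 0]
total_relevant = 3  # relevant docs in the collection for this query

# Precision@k at each rank (recall@k would be hits / total_relevant).
hits = 0
precisions = []
for k, r in enumerate(rels, start=1):
    hits += r
    precisions.append(hits / k)

# Average precision: mean of precision@k taken at the relevant ranks.
ap = sum(p for p, r in zip(precisions, rels) if r) / total_relevant
print(round(ap, 3))  # 0.806
```

Plotting the precision values against the corresponding recall values gives the precision-recall graph the chapter describes.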


Muddiest Point - Unit 5

No muddiest points this week. The lectures have been doing a great job of clarifying the readings.