brontobot: January 2010

Jan 31, 2010

Muddiest Points - Unit 4

I rather enjoyed class this week and any questions I had about the details of the reading were cleared up in discussion. No muddiest points this time!

Jan 24, 2010

The readings for this week, while a little complex, were fairly interesting. It is nice to see metadata fields and other structured fields brought into the discussion, as they seemed a blatant omission up to this point. Weighting the zones of a document also seemed like an obvious improvement on the simple models we have discussed until now. I found the description of 'machine-learned relevance' to be overly complicated. Is this just a simple calibration process (i.e., build the weights based on expert opinion for a known document set, then calibrate the parameters for the calculation of weights to match the expert set)?
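
To make the calibration question concrete, here is a minimal Python sketch of weighted zone scoring with two zones (title and body), where a single weight g is chosen to minimize squared error against human relevance judgments. The function names, the grid search, and the tiny training set are all assumptions made up for illustration; this is one simple way to fit the weight, not necessarily the procedure the book describes.

```python
def zone_score(match, weights):
    """Weighted zone score: add up the weights of the zones the query matches."""
    return sum(weights[zone] for zone, hit in match.items() if hit)


def fit_title_weight(examples, steps=100):
    """Grid-search g in [0, 1]; title gets weight g, body gets 1 - g.

    Each example is (zone_matches, judged_relevance), with relevance 0 or 1.
    Returns the g with the lowest total squared error on the examples.
    """
    best_g, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        g = i / steps
        weights = {"title": g, "body": 1.0 - g}
        err = sum((relevant - zone_score(match, weights)) ** 2
                  for match, relevant in examples)
        if err < best_err:
            best_g, best_err = g, err
    return best_g


# Hypothetical expert judgments: which zones matched, and was the doc relevant?
examples = [
    ({"title": True, "body": True}, 1),
    ({"title": True, "body": False}, 1),
    ({"title": False, "body": True}, 0),
    ({"title": False, "body": True}, 1),
    ({"title": False, "body": False}, 0),
]

g = fit_title_weight(examples)
print(f"learned title weight g = {g:.2f}, body weight = {1 - g:.2f}")
```

On this toy data the fitted weight favors the title zone, because title matches agree with the judgments more often than body-only matches do.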

The discussion of the vector space model was slightly perplexing, though I think I understand the basics. I hope we can clarify in slightly simpler terms in class.
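
In simpler terms, the basic idea can be sketched in a few lines of Python: treat each document and the query as a vector of term counts, then rank documents by the cosine of the angle between their vector and the query vector. This sketch assumes raw term frequencies and whitespace tokenization (no tf-idf weighting, stemming, or stop words), and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent text as a sparse vector of raw term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini-collection.
docs = {
    "d1": "information retrieval systems index documents",
    "d2": "cooking pasta at home",
    "d3": "retrieval of information",
}

query = tf_vector("information retrieval")
scores = {name: cosine(query, tf_vector(text)) for name, text in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Dividing by the vector lengths is what keeps a long document from winning simply because it contains more words.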

Muddiest Points - Unit 3

The lecture and readings for this last week were all fairly clear to me. While I understand Heaps' Law, I am still not clear on its practical use. Is it used merely for estimating the vocabulary size of the collection in order to help determine memory and storage requirements for the eventual postings list and dictionary?
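
As a rough illustration of the estimating use, Heaps' Law predicts vocabulary size M from the number of tokens T as M = k * T^b. The sketch below assumes k = 44 and b = 0.49, which are the values I recall IIR fitting to the Reuters-RCV1 collection; other collections would need their own fitted parameters, and the function name is made up for this example.

```python
def heaps_vocab_size(num_tokens, k=44, b=0.49):
    """Estimate the number of distinct terms via Heaps' Law: M = k * T**b.

    k and b are collection-specific; the defaults here are assumed values.
    """
    return k * num_tokens ** b


for tokens in (1_000_000, 10_000_000, 100_000_000):
    estimate = heaps_vocab_size(tokens)
    print(f"{tokens:>11,} tokens -> about {round(estimate):,} distinct terms")
```

The takeaway for sizing a dictionary is that vocabulary keeps growing as the collection grows, but much more slowly than the token count: with b near 0.5, ten times the text yields only about three times the vocabulary.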

Jan 17, 2010

Readings - Unit 2

The chapter on indexing methods was interesting. I like reading about the different methods and about the hardware constraints on indexing. I am not clear, though, on what the practical implications of these hardware constraints are. Clearly we are talking about differences of fractions of a second among indexing methods, and these add up to be significant issues, but what types of real-world differences are we talking about?

As for the compression chapter, not much to say here.

I am enjoying this class greatly!

Muddiest Points - Unit 2

I don't have any muddiest points this week. Maybe next week though!

Jan 10, 2010

IR: Reading Notes Unit 2

Reading through the assignment for Introduction to Information Retrieval, I was struck by how many of the basic indexing, postings list, and tolerant retrieval concepts I had already covered in my previous programming work. Given that the details we have been exposed to in this first reading assignment are likely quite elementary, I am curious to see how complicated these IR systems can become.

One point that was a touch confusing to me in the reading had to do with the permuterm index. I am not entirely clear on how rotating the term in question as mentioned on pages 53 and 54 of IIR helps resolve the wildcard query problem. Any chance we could have this explained a little more clearly in class this week?
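
A small Python sketch may help show why rotation matters. Every term gets an end marker `$` appended, and all rotations of the result are stored in a sorted index. A query with one wildcard, such as `m*n`, is rotated so the `*` lands at the end (`n$m*`), which turns the wildcard search into a plain prefix lookup. The function names and sample vocabulary are assumptions for illustration, and the sketch only handles queries with exactly one `*`.

```python
import bisect


def rotations(term):
    """All rotations of term + '$', e.g. 'man' -> man$, an$m, n$ma, $man."""
    s = term + "$"
    return [s[i:] + s[:i] for i in range(len(s))]


def build_permuterm(vocab):
    """Sorted list of (rotation, original term) pairs."""
    return sorted((rot, term) for term in vocab for rot in rotations(term))


def wildcard_lookup(index, query):
    """Find terms matching a query that contains exactly one '*'."""
    head, tail = query.split("*")  # raises ValueError unless exactly one '*'
    prefix = tail + "$" + head     # rotate so the wildcard sits at the end
    start = bisect.bisect_left(index, (prefix,))
    matches = set()
    for rot, term in index[start:]:
        if not rot.startswith(prefix):
            break  # sorted order: all rotations with this prefix are adjacent
        matches.add(term)
    return sorted(matches)


index = build_permuterm(["man", "moron", "men", "moon", "hello"])
print(wildcard_lookup(index, "m*n"))
print(wildcard_lookup(index, "mo*"))
```

A real system would keep the rotations in a B-tree or similar structure rather than a Python list, but the lookup logic is the same: rotate the query, then scan the entries that share its prefix.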

IR: Muddiest Points Unit 1

No muddiest points here. Thanks!

Here Begins Information Storage and Retrieval Posts:

Coming Soon...