The readings on evaluation this week were interesting, if a little too focused, I think, on objective evaluation. The IIR chapter covered standard test collections, precision and recall, and related statistics such as the precision-recall graph and mean average precision, followed by a discussion of relevance and the problems associated with human-based evaluation. A few things in this reading raised questions I'd like to see discussed in class. First, given that most of the test collections discussed are drawn from newswire and other news sources, is there any concern, and has there been any study, about potential bias in evaluation statistics stemming from the content of these documents? Second, the reading seemed to downplay the utility of human-based evaluation rather heavily. Isn't there a place for this kind of subjective evaluation, given that these systems are ultimately judged by human users?
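To keep the metrics straight for myself, here's a minimal sketch (in Python, with made-up relevance judgments) of how precision, recall, and mean average precision are typically computed over ranked result lists. The function names and toy data are my own illustration, not something from the reading.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant doc appears."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)  # precision at this rank
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: mean of average precision across queries.
    `runs` is a list of (ranked_results, relevant_docs) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy example with two queries
runs = [
    (["d3", "d1", "d7", "d2"], ["d1", "d2"]),  # query 1
    (["d5", "d4"], ["d4"]),                    # query 2
]
print(mean_average_precision(runs))  # 0.5
```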