Well, now we're getting down in the weeds with some of this material, and I kind of like it. The discussion in the Arms chapter answered a lot of questions that have come up in my mind over the last, say, 10 years: the difference between HTML and XML, what 'markup' is, how electronic texts are generated and rendered. While we have certainly only seen the tip of the iceberg when it comes to SGML, XML, and the like, I wonder how difficult these languages are to learn and to use effectively. I also wonder about OCR and how much it may have improved since Arms wrote this text (it's been 10 years, after all). Are there OCR programs that are able to read handwritten text, non-standard characters (e.g., the long s often seen before the end of the 18th century), and so on?
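To pin down the markup distinction for myself, here is a quick Python sketch. The HTML string uses tags that describe presentation, while the XML element names are ones I invented for illustration (nothing here comes from Arms directly); they describe the structure of the content and leave rendering to whatever software processes it:

```python
# Contrast presentational HTML markup with descriptive XML markup.
# The XML element names (poem, title, author) are invented for
# illustration; any well-formed vocabulary would do.
import xml.etree.ElementTree as ET

html_version = "<p><i>Ode to a Nightingale</i> by <b>John Keats</b></p>"

xml_version = """
<poem>
  <title>Ode to a Nightingale</title>
  <author>John Keats</author>
</poem>
"""

# Because the XML tags name the content rather than its appearance,
# software can pull out pieces by role:
root = ET.fromstring(xml_version)
print(root.find("title").text)   # -> Ode to a Nightingale
print(root.find("author").text)  # -> John Keats
```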
I was also fascinated by the discussions in both the Paskin and Lynch documents. While reading Paskin, it occurred to me that the proprietary nature of the DOI system might be problematic, or at least viewed as problematic by a community that tends (as far as I can tell) to prefer open systems for software development and programming. The other identifiers described in Lynch each seem to have their own specific niche, and I am curious about interoperability among them. The URN appears to be more universal in approach, but it is not ready for use at this point (or at least web browsers are not able to make use of URNs).
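To make the identifier zoo a bit more concrete, here is a small sketch of the syntaxes involved. The DOI and URN values below are widely used documentation examples rather than anything cited in Paskin or Lynch:

```python
# A DOI is a prefix (naming the registrant) plus a suffix; a URN has
# the form urn:<namespace>:<name>. Both values below are standard
# documentation examples, not identifiers from the readings.
doi = "10.1000/182"          # the DOI of the DOI Handbook itself
urn = "urn:isbn:0451450523"  # a commonly cited ISBN-based URN

# A browser cannot dereference a bare DOI or URN on its own; in
# practice a DOI is reached through an HTTP proxy such as dx.doi.org.
resolvable_url = "http://dx.doi.org/" + doi
print(resolvable_url)        # -> http://dx.doi.org/10.1000/182
```

That workaround is part of what bothers me: resolving the identifier depends on a particular third party's service being available.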
As with the DOI system, the questions Lynch raises vis-à-vis uncertainties surrounding potentially proprietary third-party databases are unsettling. One would hope that these systems could be implemented in an open (and free) way.
Sep 17, 2009