Needless to say, I was curious to know what OCLC has to say about the quality of records in its database. The report discusses, though in no great depth, four issues that I see as key to the current high rate of “sludge” in the database. The first issue the report highlights is the “unprecedented growth of WorldCat” (p. 3) due to record loads from libraries outside North America. This growth was too much for the automated duplicate resolution software to handle, so OCLC apparently just switched it off. Why would they do this? I wish they had told us more about why they gave up and let these problems mushroom. Even the revised version of this software is described as insufficient to keep up with the growth of records in the database.
The second issue discussed in the report is OCLC’s policy of allowing what it calls “parallel records” (p. 7): records describing the same item but cataloged in different languages. I found the graphic on this page completely unhelpful, as one cannot discern from the titles (this is often the case in Connexion as well) which records are in English. There are many of these, since national bibliographic institutions (the Library of Congress and the British Library, for example) create many records with fields in other languages. At best, deleting these fields before exporting a record into the local catalog takes a few minutes; at worst, it requires an almost complete re-working of the record, which can take much longer depending on the complexity of the item being described. Either way, these parallel entries in WorldCat reduce the cataloger’s efficiency.
One issue I discussed recently with Sam Duncan at the Amon Carter is the proliferation of records in the WorldCat database for reproductions of original items. The library at the Carter acquires a great many dissertations, which have moved from print copies to digital reproductions, usually in PDF format. Keeping the “genealogy” of these records (from original to reproduction) intact is very challenging with current (and even forthcoming) metadata standards. These records for reproductions contribute to the “sludge” in the database as well, and that sludge makes it harder for the end-user of WorldCat.org to find a specific item in a library near them - one of the supposed strengths of the database.
What has OCLC done in response? I must applaud them for not just tweaking their duplicate detection algorithms, but for thinking a bit outside the box and into the future, with both catalogers and end-users of the data we create in mind. GLIMIR is its name, and I want to share with you my feelings about it.
GLIMIR, the Global Library Manifestation Identifier, takes a page right out of FRBR (Functional Requirements for Bibliographic Records), and it’s nice to see such a heavyweight in the library world embrace FRBR so strongly. Using those principles to group records should make for a better end-user experience for people using WorldCat.org for item discovery, and will ostensibly improve my experience in the database as a cataloger. That said, I reserve the right to change my mind about how much it improves things until it is actually implemented. On the more technical side, a single identifier covering multiple records for the same manifestation will help databases and ILSs manipulate the data in the best way possible: a standard number makes for far more accurate and predictable machine manipulation than relying on fields and subfields that contain non-standard text.
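To make the point about identifiers versus free text concrete, here is a minimal sketch (not OCLC’s actual implementation) of clustering records that share a manifestation-level identifier. The field names and identifier values are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: grouping bibliographic records on a shared
# manifestation identifier instead of matching on free-text titles.
# Field names ("glimir_id", "oclc", "title") and values are invented.
from collections import defaultdict

records = [
    {"oclc": "101", "glimir_id": "G-1", "title": "Moby Dick"},
    {"oclc": "102", "glimir_id": "G-1", "title": "Moby Dick [electronic resource]"},
    {"oclc": "103", "glimir_id": "G-2", "title": "Moby-Dick; or, The Whale"},
]

def group_by_identifier(recs):
    """Cluster records on an exact identifier match: predictable and cheap."""
    groups = defaultdict(list)
    for r in recs:
        groups[r["glimir_id"]].append(r["oclc"])
    return dict(groups)

print(group_by_identifier(records))
# {'G-1': ['101', '102'], 'G-2': ['103']}
```

Note how the first two records, which a naive title comparison would treat as different items, fall into the same group simply because they carry the same identifier; no string normalization or fuzzy matching is needed.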
There are some disadvantages as well. To mirror my first point in the previous paragraph, this program forces FRBR down people’s throats. Catalogers and libraries are not all the same - just because a big player like OCLC or the Library of Congress embraces a model doesn’t mean that all libraries will toe the line. We all serve different needs, and FRBR as a model just might not work for some libraries out there. Hopefully GLIMIR will be implemented in such a way that WorldCat does not become difficult to use for libraries that don’t want to use FRBR. My other concern is almost trivial: why did we need another acronym, and why the extra vowels? We catalogers and technical services librarians already have far too many acronyms.
I am curious to see what this new model will look like in the database. It has great potential, but it will not be the ultimate solution to the declining quality of bibliographic data in WorldCat. Few single programs solve such a complex problem, and this model does not address the fundamental issue: catalogers contributing sub-standard records to the database. Of course, training and education for catalogers is another post for another time. I’d love to hear your thoughts about this post, or the OCLC report, in the comments!