Tuesday, October 19, 2010

Baseball Archivists and Metadata

You know, it's always interesting where metadata pops up. I was reading the New York Times on my iPad on the 13th of this month (October) when I came across the article "For Baseball Archivists, a Tag Ends Every Play," by John Branch, which combines two things I am very passionate about: baseball and metadata. This fascinating article discusses the task of describing all of the games in a "baseball year" (approximately 10,000 hours) with metadata, or tags.

Portrait of Matty McIntyre, baseball player

So, you might ask, what is the point of all this work? Well, allow me to let the article answer:

“Your archive is only as good as what you know is in it,” said Elizabeth Scott, M.L.B. Productions’ vice president for programming and business affairs.


Thank you, Ms. Scott. Any information resource is only as good as what you know is in it, and accurate, helpful description is exactly the fundamental task that catalogers (or metadata generators/taggers) perform in the library. Let me walk down this cataloging path a bit further. As the article explains, the "loggers" of these games have a set of buttons for describing events in the game, and each button carries a word or phrase. This way, every action or event is described consistently. That is a controlled vocabulary: even though it was created in-house, it still has preferred names and headings. What does this mean for the patron or searcher? As long as you are familiar with the preferred terms and names, you can search for, say, "throw equipment" in this system and get back every cataloged (or logged) clip of a player throwing equipment. You can be more specific than that, of course, but you get the general idea.

How does this apply to you in a library? The vast majority of libraries in the United States catalog using the Library of Congress Authorities. So, if you are reasonably familiar with the terms in the Library of Congress Subject Headings (LCSH), you can search almost all library catalogs very effectively. And even if you don't memorize the preferred terms, your librarian will be familiar with them! What do they look like in a catalog? Follow this link and check out the Main author, Subject, and Institution fields. These are headings from the Library of Congress Authorities: the subject terms come from LCSH, and the author and institution names come from the Name Authority File.
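For readers who think in code, here is a minimal Python sketch of how a controlled vocabulary keeps description consistent. It assumes a tiny, made-up vocabulary: loggers can only apply preferred terms (like pressing a button), while a searcher's non-preferred wording is redirected to the preferred term, much as a "see" reference works in an authority file. The term lists, the function names, and the sample clips are all hypothetical; this is not how MLB's actual system is built.

```python
# Hypothetical controlled vocabulary for logging plays (illustration only).

# Preferred terms: the only labels a logger's "buttons" can apply.
PREFERRED_TERMS = {"diving catch", "throw equipment", "home run", "strikeout"}

# Non-preferred variants redirect to a preferred term, like a "see" reference.
VARIANTS = {
    "diving grab": "diving catch",
    "layout catch": "diving catch",
    "tosses bat": "throw equipment",
    "homer": "home run",
}


def to_preferred(term):
    """Map a searcher's wording to the preferred term, or raise if unknown."""
    t = term.strip().lower()
    if t in PREFERRED_TERMS:
        return t
    if t in VARIANTS:
        return VARIANTS[t]
    raise ValueError(f"{term!r} is not in the controlled vocabulary")


def log_clip(archive, clip_id, player, year, event):
    """Record a clip; like the loggers' buttons, only preferred terms are accepted."""
    if event not in PREFERRED_TERMS:
        raise ValueError(f"{event!r} is not a preferred term")
    archive.append({"id": clip_id, "player": player, "year": year, "event": event})


def search(archive, term, player=None, year=None):
    """Return IDs of clips tagged with the term, optionally filtered by player and year."""
    wanted = to_preferred(term)
    return [
        clip["id"]
        for clip in archive
        if clip["event"] == wanted
        and (player is None or clip["player"] == player)
        and (year is None or clip["year"] == year)
    ]


# Made-up sample clips, not real archive data.
archive = []
log_clip(archive, "clip-001", "Crawford", 2010, "diving catch")
log_clip(archive, "clip-002", "Crawford", 2010, "diving catch")
log_clip(archive, "clip-003", "Crawford", 2009, "diving catch")
log_clip(archive, "clip-004", "Crawford", 2010, "throw equipment")

print(search(archive, "diving grab", player="Crawford", year=2010))
# prints ['clip-001', 'clip-002']
```

Because every clip was logged under the same preferred term, a search for the variant "diving grab" still finds both 2010 clips. Without the shared vocabulary, each logger's wording would have to be guessed separately, which is the "probably incomplete" result the article describes.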

So what difference did consistent metadata and description make for the MLB Network folks and their video archives? Once again, I quote from the article:

There is no complete World Series from before 1965. It was not until 1998 that baseball kept the full broadcasts of all its games. Now its archive, up to about 160,000 hours, grows by about 10,000 hours each year.

Access to it has always been clumsy. Two years ago, if someone wanted the season’s worth of diving catches by Tampa Bay’s Carl Crawford, a guess at search terms would be entered into a computer. The results — probably incomplete because tags and search terms were not standard — would be a list of tapes. Tapes would be retrieved from the vast metal shelves of baseball’s library, plopped into a machine one at a time and forwarded to the desired highlight.

On Sunday night, Farrell typed “Crawford,” “diving” and “2010” into a computer. As fast as a Google search, 15 plays appeared in a list, each with a brief description. Clicking on the first, there was Crawford, making a diving grab.


So, in this case (if you'll pardon the pun), the proof is in the play. Or rather, in the ease of finding it, just as the proof of professional catalogers' hard work is in the ease with which patrons find the information resources they seek.

1 comment:

  1. This is very interesting. It makes me want to watch some baseball. Oh! Who won last night, by the way? We should be watching the Yankees and Rangers more. :)

    Ten thousand hours in a baseball year. Wow.
