Wednesday, November 3, 2010

The Semantic Web

This idea of the semantic or linked web is a concept I have been wrestling with over several months. It's a concept I see bandied about regularly in American Libraries, but I think it's something few people outside of the programming and IT world take the time to familiarize themselves with. As a librarian, I think it's my responsibility to at least be conversant with this new web initiative, and (as I think it's a great concept) it is something I would like to advocate for.

I've watched several videos and read a few articles online about this concept, but by far the best video I found is by the progenitor of this concept, Tim Berners-Lee. Here's his TEDTalk all about it:



Berners-Lee says that the purpose of the linked web is to take the raw data and do something with it. Pursuant to this goal, he highlights three broad requirements for this linked data:

First, it should use HTTP, which is not just for documents anymore.
Second, the data shared should be both standardized and useful.
Third, the data needs to have relationships.
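
For readers who like to see things concretely, here is a minimal Python sketch of what those three requirements could look like in practice. It is only an illustration under my own assumptions, not Berners-Lee's code: the `example.org` addresses are hypothetical placeholders, the helper functions `follow` and `creator_names` are names I made up, and the Dublin Core and FOAF addresses are simply one commonly used pair of vocabularies. Each fact is stored as a (subject, predicate, object) statement, the things are named with HTTP addresses, and the relationships are what let a program hop from a book to its author.

```python
# A minimal sketch of linked data as (subject, predicate, object) statements.
# The example.org URIs are hypothetical placeholders, not real records.

# Shared vocabulary terms (Dublin Core and FOAF) keep the data standardized.
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Requirement one: things are named with HTTP addresses, not just documents.
BOOK = "http://example.org/book/weaving-the-web"
AUTHOR = "http://example.org/person/tim-berners-lee"

# Requirement three: the data has relationships. The book points at a
# person, and the person has a name of their own.
triples = [
    (BOOK, DC_TITLE, "Weaving the Web"),
    (BOOK, DC_CREATOR, AUTHOR),
    (AUTHOR, FOAF_NAME, "Tim Berners-Lee"),
]


def follow(subject, predicate, data):
    """Return everything a subject links to through one relationship."""
    return [o for s, p, o in data if s == subject and p == predicate]


def creator_names(book, data):
    """Hop from a book to its creators, then to each creator's name."""
    names = []
    for person in follow(book, DC_CREATOR, data):
        names.extend(follow(person, FOAF_NAME, data))
    return names


if __name__ == "__main__":
    print(creator_names(BOOK, triples))
```

The point of the second hop is that the author is a thing with its own address, so any other dataset that uses the same address is automatically connected to this one.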

This seems like a simple task, no? Well, there is a mind-boggling amount of data on the internet. Of course, I think many of us know this in an abstract way, but here's a beautiful look at what that means:



Of course, not only is there an incredible amount of information collectively "on" the internet, it also comes in different formats - HTML, XML, CSS, text blocks, images, and the variant standards of metadata associated with these disparate formats. In order for this data to be usable, it needs to conform to some type of widely accepted format - so that the program and computer you are working with will be able to read, interpret, and display this raw, shared data.

Ok, so you might be asking yourself - where do librarians fit into this call for linked data? Librarians have a responsibility to contribute to the semantic/linked web, and have the tools to make it better. Where then can we help?

Area one: Metadata and standards. Come on, admit it, you knew I would say this first. However, our user-focused bent and expertise with both standards and metadata make us as a profession uniquely positioned to lend a very big helping hand to linked data in a fundamental (and perhaps less than glamorous) area. Without good, standardized metadata to describe all this open data, this movement is really for naught. Can you imagine looking at massive sets of unformatted and undescribed data? Now, are librarians setting their jobs aside to do this for others? Not for the most part - the role we play is one in which we help guide the discussion about standards and such.

Area two: as Berners-Lee states, this movement wants the raw, unadulterated data available online so it can be used. What data can librarians share with the web? Well, as librarians, we are collectively setting up, administering, and using digital libraries and repositories which are loaded with data. We need to ensure that this data is freely available to the "web" in a standardized format so that it can be used. This would help get the information in these items out and usable to the wider world.

In addition, we should free up our bibliographic data for the web to use. Indeed, this is something Jason Thomale noted in his recent article:

Extensive efforts to rectify the situation have been underway for over a decade, such as creating new models for bibliographic data, updating cataloging rules, and attempting to convert library data to new formats. However, much bibliographic data is still locked away in MARC.

Almost all of our bibliographic data is locked up in two ways: OCLC's database and the MARC format. OCLC flatly refuses to make the bibliographic data it holds available to the wider world. On one level, I can understand this - their exclusive, massive holdings of this data are partially what makes people want to join. However, this data should be shared - what happens to this raw data could be incredible, and could even be incredibly helpful to libraries. MARC is great for cataloging in a library environment - I love using it for this. It is really bad as an objective metadata standard, though. In order for our bibliographic data to be shared, it must be "freed" from MARC.
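
To make "freed from MARC" a little more concrete, here is a rough Python sketch of the kind of translation I have in mind. Please treat it as a toy under stated assumptions: the record is a hypothetical, heavily simplified dictionary (real MARC has indicators, repeatable fields, and many more subfields), the `example.org` address is a placeholder, and the three-field mapping to Dublin Core terms is only an illustrative choice, not a complete or official crosswalk. The function name `marc_to_statements` is my own invention.

```python
# A toy sketch: turning a simplified MARC-like record into web-friendly
# statements. The record layout and example.org URI are hypothetical.

DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"
DC_DATE = "http://purl.org/dc/terms/date"

# Illustrative mapping from (MARC tag, subfield code) to a Dublin Core term.
FIELD_MAP = {
    ("245", "a"): DC_TITLE,    # title statement
    ("100", "a"): DC_CREATOR,  # main entry, personal name
    ("260", "c"): DC_DATE,     # date of publication
}

# Hypothetical record, reduced to tag -> {subfield code: value}.
marc_record = {
    "100": {"a": "Berners-Lee, Tim"},
    "245": {"a": "Weaving the Web"},
    "260": {"c": "1999"},
}


def marc_to_statements(record_uri, record):
    """Return (subject, predicate, object) statements for the mapped fields.

    Fields missing from the record are skipped rather than guessed at.
    """
    statements = []
    for (tag, code), predicate in FIELD_MAP.items():
        value = record.get(tag, {}).get(code)
        if value is not None:
            statements.append((record_uri, predicate, value))
    return statements


if __name__ == "__main__":
    uri = "http://example.org/record/1"  # hypothetical identifier
    for statement in marc_to_statements(uri, marc_record):
        print(statement)
```

Even in this toy form, the output no longer requires anyone to know what "245 $a" means, which is really the whole argument for moving away from MARC as an exchange format.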

So what's the point of all this hard work and sharing? Well, the hope is that people do really cool stuff with the data, and in the end, that's the point of being a librarian - the facilitation of learning and knowledge creation. Dave Lankes is a key progenitor of this concept, which he discusses here:



This idea of taking data and using it in a new context with meaning is nothing new. Indeed, Dr. John Snow did it over 150 years ago, during the 1854 cholera epidemic in London's Soho. He took the data and made a map:



So what are some cool things being done now with this raw data? Here is a great example from Portland:



A year after his initial talk, Berners-Lee came back to TED and shared what one year of raw data sharing had achieved:



I guess my last question is this: Where do we sign up?

1 comment:

  1. What a well thought out post. You know I love the John Snow graphic and, believe it or not, I have a post coming up about public transport this week. We should have consulted one another. :)
