Musematic
Cultural Collections and the Semantic Web

Posted on Wednesday, May 12, 2010

Within change lies great opportunity, but what happens when individual change is incremental and the rest of the world’s is exponential? Our cultural institutions are still in the slow lane, and they are being threatened because of it.

I was reminded of that threat very keenly today, after I saw this article from Read Write Web: Google Adds Semantic Search Results with Google Squared.

According to the company’s blog from one year ago today, when Google Squared first launched, “unlike a normal search engine, Google Squared doesn’t find webpages about your topic — instead, it automatically fetches and organizes facts from across the Internet.”

By clicking “show sources” on the Squared-provided result, a list of sources appears showing you how Google is arriving at this answer.

Much of this information, however, relies either on Google’s ability to naturally parse information or for web publishers to begin “adopting microformats or RDFa standards to mark up their HTML and bring this structured data to the surface”, as the company wrote at launch last year.

This is really quite fantastic for information-seekers. We’ve been heading this way for a while, and I’m happy to see this rolled out on this scale. Most users won’t even notice it, but what does it mean for those of us whose business it is to provide content?

First, it means that we’ve really got to get our collective acts together. When I was writing my master’s thesis four years ago, I posited that soon we would have our collections online and we would be able to move on from public access and onto public interpretation. Unfortunately, my timeline was wrong and many institutions are still at square one.

Looking collectively at the field, there are hundreds (or thousands) of collections, large and small, that still do not have collection information management systems, digital asset management systems, content management systems, search engine optimization, metadata standards, embedded metadata, or any combination of the above. Why is this? Well, for the most part, museums, libraries, and archives are notoriously bad at adopting complex technologies unless significant pressure is applied either internally or externally. And when they do recognize the need, the people responsible for advocating for adoption find themselves stuck trying to explain something intangible to a board of directors that is more interested in on-site programming and foot traffic. It is incredibly difficult at this time to show hard statistics about SEO increasing foot traffic or even online learning.

For example, here’s the best-known work at my institution, the Magnes:

Lavater and Lessing Visit Moses Mendelssohn (1856) by Moritz Daniel Oppenheim - Magnes Collection

I just performed a search for this piece, using a couple of different search terms. The first result in both image and web search (Google) was the page in Flickr. The second hit was Wikipedia, followed by Wikimedia Commons. For the last two, the image had been scanned from a German text to which Magnes had licensed the image. There was no link to Magnes, nor anything that suggested that the piece was in our collection. There were zero hits to our website or to our collections online (in my own defense, we’re overhauling our website for precisely this reason, embedding metadata into the images, and I have no control over database SEO right now!).

This isn’t limited to small, underfunded history museums. If you perform a web search for “starry night van gogh”, MoMA is the third hit. Not too bad, actually. But if you perform an image search, Van Gogh’s “Starry Night” is displayed prominently, yet you won’t find MoMA as a source until the bottom of the second page.

Ok. So clearly we all need to do some work with optimizing both our sites and our images. However, those things are really hard for most museums. If I could wave a magic wand, there would be a product that does all of the following (vendors, are you listening? Take notes. DO THIS. We will give you our money!):

  • Manages museum, library and archive information in one, federated database, including exhibitions, conservation, provenance, rights, location tracking, etc.
  • Exports and imports into a variety of crosswalked metadata standards
  • Utilizes controlled vocabularies and standards in order to facilitate pan-institutional linkages
  • Serves as a robust digital asset management system, embedding the collection data into EXIF/IPTC/XML fields of the master asset and making derivatives at will
  • Displays collection assets online in a clean, flexible, attractive manner, utilizing sharing and embedding features, optimizing keywords and tagging, and having an available API for online visitors to use
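To make the "crosswalked metadata standards" item concrete, here is a minimal sketch of what a crosswalk does: it translates a record from one system's field names into a shared standard, in this case Dublin Core element names. The internal field names and the mapping choices below are hypothetical and chosen only for illustration; a real collections system would have its own schema and a far longer mapping.

```python
# Hypothetical internal field names mapped to Dublin Core element names.
# The mapping choices are illustrative assumptions, not a published crosswalk.
CROSSWALK = {
    "object_title": "title",
    "maker": "creator",
    "date_made": "date",
    "accession_number": "identifier",
    "credit_line": "rights",
}


def to_dublin_core(record):
    """Translate an internal record into Dublin Core element names.

    Returns the translated record plus a list of fields that have no
    mapping, so that nothing is dropped silently.
    """
    translated = {}
    unmapped = []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)
        else:
            translated[target] = value
    return translated, unmapped


# Example record; "gallery_location" is deliberately left out of the mapping.
record = {
    "object_title": "Lavater and Lessing Visit Moses Mendelssohn",
    "maker": "Moritz Daniel Oppenheim",
    "date_made": "1856",
    "gallery_location": "on view",
}
dc_record, leftovers = to_dublin_core(record)
```

Reporting the unmapped fields is a deliberate choice in this sketch: the fields that fail to cross over are usually where the pan-institutional linkages break down.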

The frustrating thing for me is that I’ve seen a lot of systems that do most of this, but none that do all of it. Institutions that can do this use a variety of systems, bandaged together with bits of programming. Granted, I’m not a programmer, but I know such a system is possible. I’ve seen bits of it work together, but never all at once. Lacking such a system means that institutions can’t fully prepare their assets for the semantic web.

Archives and libraries are likely a bit better off, as they’ve applied easily-computer-readable XML standards to their already happily formatted data. But I have to wonder if the model for siloing data will be a benefit or a curse in the long run. Will these silos be flexible enough to engage with online users expecting to find information with only one or two search terms, in only one location? This leads me to my next point…

Second, we need to take a hard look at how we’re actively sharing our data and with whom. Search engines are not going to trace all of our stuff back to us if we release the assets online without some method of bringing the user back to home base. As with the Oppenheim painting above, users finding assets online won’t know where they’re from, and thus, probably, won’t have the benefit of any additional research about the original works. Our authority is threatened because of this. Our ace in the hole is that we have the authentic object, but what happens if no one knows where the authentic object is?

I’m not at all suggesting that we limit our release of assets, but I am recommending that we slow down a moment and take stock. “Just get them online” isn’t good enough. It’s never really been good enough, only a start. Online assets mean very little if they lose their context. A digital file of a painting may be pretty, but without the information we can provide, its only function is ornamental.
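One way to keep that context attached is the kind of markup Google asked publishers for: RDFa attributes on the object's own web page. The sketch below generates a small HTML fragment that states the title, creator, date, and holding institution in machine-readable form and links back to the object's page. It assumes RDFa 1.1 syntax and Dublin Core terms; the URL is a placeholder, and the choice of `rightsHolder` to name the institution is an illustrative assumption rather than an established museum practice.

```python
from html import escape

DCTERMS = "http://purl.org/dc/terms/"


def rdfa_snippet(object_url, fields):
    """Build an HTML fragment describing one object with RDFa attributes.

    `fields` maps Dublin Core term names (e.g. "title") to text values.
    """
    url = escape(object_url)
    lines = [f'<div prefix="dcterms: {DCTERMS}" about="{url}">']
    for term, value in fields.items():
        lines.append(f'  <span property="dcterms:{term}">{escape(value)}</span>')
    # A plain link so human visitors can find their way back as well.
    lines.append(f'  <a href="{url}">View this object at the holding institution</a>')
    lines.append("</div>")
    return "\n".join(lines)


# The URL below is a placeholder, not a real collection page.
snippet = rdfa_snippet(
    "https://example.org/collection/lavater-and-lessing",
    {
        "title": "Lavater and Lessing Visit Moses Mendelssohn",
        "creator": "Moritz Daniel Oppenheim",
        "created": "1856",
        "rightsHolder": "Magnes Collection",
    },
)
print(snippet)
```

Markup like this only helps on pages we control. A copy of the image posted elsewhere still needs embedded metadata or a credit line to point back to the source.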

The way we use the web is changing. The way the search engines are using the web is changing. We need to respond more quickly to changing search algorithms and use patterns, and we need to try to figure out some easier solutions for linking our data to the rest of the web. A cohesive, easy-to-use product would be a good start. We’re also siloed within our own institutions and consortiums, making connections with our friends while forgetting that we’re also part of the larger world. I fear for those of us who don’t get a handle on these problems soon; if we fail to do so, our role as respected educational resources becomes diminished, if only from lack of exposure.


Filed under: Collections, Metaverse, Random Musings

7 Responses to “Cultural Collections and the Semantic Web”

  1. Bruce Falk
    May 13th, 2010 08:28

    I agree wholeheartedly, and think this leads straight into the needle-in-the-digital-haystack problem that I blogged about a few years back. Consistent with my latest blog note (http://blog.museotech.net/2010/04/upnext-proposing-new-paradigm-for.html), perhaps it’s time to put together a steering committee of museum participants to create (and then maintain) an open-source-built database of shared, stored resources, with each museum participant having independent rights to set access to their respective digitized collection and curate metadata?

    I’m not so fond of action by committee, but there are limits to what museums can do independently. It would appear that some serious, focused cooperation can get us much further down the digital pike than our independent resources would allow.


  2. Guenter Waibel
    May 13th, 2010 12:48

    Perian, great post! If you believe the premise that we’re all massively short-changing the potential of cultural heritage content by silo’ing it into the websites of 17,500 museums, 122,356 libraries and countless (I literally can’t find a count) archives, then it seems to me you’re looking at 3 options:

    1. Make the Search Engines our aggregator – that’s what SEO is all about. It’ll likely help you improve access to your stuff, but it won’t really raise all ships. You’re competing, not collaborating.
    2. Use existing online hubs as our aggregator – that’s in essence what the Flickr Commons did. There are great outlets for certain types of materials, like Flickr for photographic materials, but not one outlet where all of it can come together.
    3. Create our own cultural heritage hub that has proper SEO and discloses into existing online hubs as appropriate. I’m watching both http://www.collectionstrust.org.uk/culturegrid (literally watch this video http://bit.ly/dpYimM) and Europeana (http://www.europeana.eu/portal/) to see how our colleagues in the UK and Europe fare with that sort of thing. You could also think of the emerging Smithsonian Commons (http://bit.ly/a1OIeF) as a model which could be scaled up nationally.

    I’ve blogged about the challenges we face in the US in pulling this sort of thing off here http://bit.ly/adVLB2 and here http://bit.ly/arJKLH.

    P.S.: In case you did follow the link to the great Collections Trust video, Nick Poole will speak at the Collaboration Forum at the Smithsonian about new ways to think about aggregating content. The agenda isn’t up yet, but the announcement is at http://www.oclc.org/research/events/2010-09-20.htm.


  3. TS
    May 14th, 2010 05:14

    Regarding your wish list, what about these open-source projects?
    http://www.collectiveaccess.org
    http://www.collectionspace.org
    I think they both hit at least 4 of your 5 bullet points. Neither offers the holy grail of cross-institutional federation, of course, but if they — and other systems, and institutions — can agree on SEO and microformat standards, we might be getting somewhere. Actually, the Great Google will just dictate standards to us, so let’s make the vendors and open-source projects pay attention and prioritize accordingly.


  4. Perian Sully
    May 14th, 2010 05:46

    Indeed – that’s the trick with those projects (both of which I’ve been following with some interest. I’m sad that CollectionSpace won’t develop for archive and library information – at least not now. I’m not sure about CollectiveAccess). I haven’t seen an emphasis on online outreach of collections, via SEO, metadata, ontologies, etc. They’re great projects, and ideally, their developers will take all of that into account as they build them; but I haven’t seen anything that suggests they’re thinking beyond immediate desktop use and siloed publishing (i.e., on the institution’s website). I completely admit that I’m not close with the developers, so I don’t know if that’s something they’ve addressed.

    Possibly, though, with open source solutions, it might be possible for us to build some plugins to add that necessary functionality.


  5. Perian Sully
    May 14th, 2010 05:55

    Thanks Guenter!

    With the challenges and models presented, what do you think the likelihood is of getting a national forum together to harness all of the efforts currently in play? There’re a lot of fantastic projects out there, each working on a different piece of the puzzle, but I haven’t come across any unifying theory yet. The options you cite aren’t mutually exclusive, and it would be important to leverage all of them simultaneously (assuming the way people use the web doesn’t drastically change in the immediate future).

    P.S. With any luck, I’ll be in a position to attend the forum. I’d love to participate. If not, though, looking forward to reading proceedings.


  6. Bruce Falk
    June 11th, 2010 12:44

    I was just asked by a colleague to identify “well-crafted museum collection / art education wikis,” and since there seemed to be an emerging list of links here, thought I’d add my own. I think we can all agree that we get the most bang for our buck by assuring the accuracy and credibility of content that is already publicly available before we go about reinventing or perfecting the wheel.

    That said, “well-crafted” is very much in the eye of the beholder, and since wikis are by definition crowd-sourced, I would expect the more popular ones to be more of a hodgepodge than one would find from a heavily-moderated site with low visitation. I’d also agree with TS and Guenter that we should not limit consideration to wiki-models alone, but consider all forms of online collaboration, be they well-populated communities which are open to content-partner contribution, such as Thinkfinity (http://community.thinkfinity.org/index.jspa) , and the brand new multi-disciplinary/pan-institutional .

    What about Liam Wyatt’s thoughts about our use of Wikipedia (http://conference.archimuse.com/forum/metrics_musums_wikipedia)? And finally, can we separate discussions of collaborating on development/deployment of a communal interface for accessing semantic data (say, by using Google Earth or Google Maps to focus our map-based content uploads in lieu of re-developing sites like Placeography (http://www.placeography.org/index.php/Main_Page)) from our discussions about establishing standard protocols for recording and sharing data? Is it naive to consider the desire to collect and promulgate information (facts and expertise) to be shared by all, while allowing for continuing our respective unique interpretation and presentation?


  7. Mia
    July 10th, 2010 08:35

    Hi Perian,

    I wanted to let you know about discussions going on in London about ‘linking museums’ (a catch-all phrase I’m using for semantic web, linked data, RDFa, microformats and whatever other machine-readable formats are out there) at http://museum-api.pbworks.com/July-2010-meetup and http://museum-api.pbworks.com/Linking+Museums+write-up

    Helping people find their way back to the ‘authentic object’ from search engine results pages would be a really good test for any structured pages people manage to get online.

    cheers, Mia

