Of Spiders, Yahoo, and Regular Folks

By Evan Meehan

Data, whether found in a museum or online, is only useful if it can be found.  In museums this means that uncatalogued artifacts sit in a bin on a shelf collecting dust until someone has time to brush the dust off, figure out what an artifact is, write down the information, and then put the bin on a different shelf.  At some point in the future the items in that bin may become part of an exhibit.  Online it is much the same.  As a rephrasing of a Zen koan goes: if a website is published but cannot be found on Google, does it contain any words?



Should We Let Them Inside the Tower?

The discussion of folksonomies is interesting, and one that I know many museums are having right now.  Most museums these days attempt to make at least a selection of their collections database available for researchers to search, usually via their website.  This sounds like a great idea, and many people express at least a passing interest in it.  Anybody who has ever actually searched one of these databases, though, can tell you that you have to be lucky to find exactly what you are looking for.  This is not due to technological limitations; it is due to the museum field’s complicated methods of cataloging objects.  We restrict access to collections databases like they are the “nuclear football.”  Most staff members will never have access, and when they do manage to get it, it is usually read-only.  Even when we bring in interns to work with the collections, they have to undergo a background check and at least be in the process of obtaining a graduate degree (absolutely no undergrads!).  We treat the database almost as a sacred temple.  And much like a sacred text, the classification system needs a trained professional to translate it.  “Communication T&E” is not the first category that would come to someone’s mind when searching for something like a telephone.  Even when trained staff are cataloging items, we often have to check our copy of Nomenclature (and sometimes get very creative with categories).  This, of course, is what complicates the process of making our database searchable on our website.

So, many museums are now looking toward a system of folksonomies to bolster the standard categories and classifications.  Much like Nina Simon’s idea of allowing patrons to tag objects within the galleries, allowing patrons to tag items within the collections database would provide a feasible way to make the collection easily searchable using more common categories that people outside of the museum field can actually relate to.  It would also take stress off the web team, who would otherwise have to guess all of the phrases that a web user might try.  This very well could succeed in creating an online collections database that is actually usable by the general public.  Maintaining such a database, though, would take a lot of effort and could eat up significant amounts of staff time, a scarce resource for most museums.  The tag submissions would frequently be plagued by spam-bots and trolls trying to inject unwanted material into the metadata.  It is a risk/reward scenario that each museum will have to weigh.  For us, this is a question that we haven’t been able to answer yet.

Organizing Our World

When considering how to approach the management of digital collections, several important questions arise. How can we, as digital curators, begin to impose order on the endless (and exponentially growing) amount of data that exists in cyberspace? Who determines how to organize it, and will the system even reveal the data once it has been queried?

It seems that every organization that has data to catalog uses a different set of system standards. Elings and Waibel throw so many acronyms at me that I can’t even begin to sort them all out. And with each standard having different parameters, one would have to be superhuman to master all of them. This raises the question of how usable – and searchable – all these different database systems are if they exist in isolation and cannot be cross-referenced. If a library and a museum hold the same rare first edition in their collections, will both systems produce a hit on the item if users do not know how to search the database? Or what if they are only familiar with one and not both? Systems must be cross-referenceable, and items are best cataloged using the standards meant to describe the particular type of media, not the standards meant for the institution that happens to house that media.

Sometimes, to work for humans, the system needs a bit of the human touch. Enter folksonomies: the layman’s catalog that discards the systematic tree branches of organization and labeling and instead shifts to clouds and trails of tags. While only slightly less efficient at imposing order and producing search results, it democratizes the cataloging system and allows everyone to tag items, not by some strict and meticulous set of standards, but based on what the thing is most akin to. To simplify even further, like is tagged with like, and we choose which tags are most useful by collectively using them the most. Based on the statistics of search results, it seems unfair to call one method of cataloging far superior to the other, and best to say they benefit each other when used together. Can a system that operates without defined parameters and fundamentally lacks a systematic approach be tailored to produce results for the computers combing the database, or is there simply too much out there to be cataloged by hand? With the excess of items waiting for their metadata, it makes more sense to value quantity over quality. Curation can be outsourced to the internet, and a less rigorous system of cataloging allows media to be added to databases that might otherwise have sat indefinitely in backlog limbo. The details can be fleshed out later.
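The idea that the crowd chooses useful tags "by collectively using them the most" can be sketched in a few lines: pool every user-applied tag across items and let the most frequently used ones form the tag cloud. This is a hypothetical illustration (the items and tags are invented), not any particular site's implementation.

```python
from collections import Counter

# Hypothetical user-applied tags on three catalog items.
item_tags = {
    "item-001": ["telephone", "communication", "antique"],
    "item-002": ["telephone", "rotary", "antique"],
    "item-003": ["radio", "communication", "antique"],
}

# Pool every tag and count its collective usage.
tag_counts = Counter(tag for tags in item_tags.values() for tag in tags)

# The "tag cloud" is just the tags ordered by how often the crowd used them.
cloud = tag_counts.most_common()
print(cloud)
# [('antique', 3), ('telephone', 2), ('communication', 2), ('rotary', 1), ('radio', 1)]
```

No curator picks the vocabulary here; "antique" rises to the top simply because three different taggers reached for it.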

It was not so long ago that little internet elves managed the card catalogs of the internet for search engines such as Yahoo and the like. For a brief moment, this method was superior – but the internet has outgrown our ability to wrangle it into submission. Now, it seems that the best approach is a combination of methods. A shotgun blast may not be as precise as a sniper’s aim, but it will cover more surface area. The results might be a little spread out and disjointed, but with some extra effort, you’re more likely to find a number of sources valuable to your research, and your job is to home in and filter out the hits that fall outside your scope.

Pam Enlow

The Evolution of Searchability

The piece “Survival of the Fittest Tag: Folksonomies, Findability, and the Evolution of Information Organization” discussed how searchability evolved alongside the internet. Initially, one entity dictated search terms within a database. According to author Alexis Wichowski, this all changed with the birth of folksonomies, or user-generated classifications of data. Wichowski cited the site delicious.com as the originator of an editable, Web 2.0-style tagging system in 2003. Wichowski presented multiple arguments for and against the use of folksonomies. Some scholars argued that this system of tagging allowed for a broader range of connective networks. It certainly fit with contemporary users’ desire to customize the internet. However, within this system lies the propensity for redundancy. Multiple users may tag their own content with the same words. Conversely, some users may search for the same content using regional colloquialisms of the same word. Wichowski proffered the example of an item being tagged “mom,” and therefore not being searchable through the terms “mother,” “mum,” or “ma.” However, at the time of the article there was not an enormous amount of scholarship on the benefits and pitfalls of user-generated search criteria. Wichowski was therefore unable to predict the fate of folksonomic categorization, but concluded that search methods would last as long as their usability endured. This argument portrayed user-generated changes to the internet as a form of evolution.
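Wichowski's "mom" example can be made concrete. A naive tag search is exact-string matching, so a query for "mother" misses items tagged "mom" or "mum"; a small hand-maintained synonym map widens the net. The items and the synonym table below are invented for illustration, and this mitigation is my sketch, not something the article prescribes.

```python
# Hypothetical items tagged with regional variants of the same concept.
items = {
    "photo-17": {"mom", "family"},
    "photo-18": {"mum", "holiday"},
    "photo-19": {"mother", "portrait"},
}

# Exact-string matching finds only one of the three.
exact_hits = [item for item, tags in items.items() if "mother" in tags]

# A small synonym map expands the query before matching.
SYNONYMS = {"mother": {"mother", "mom", "mum", "ma"}}

def search(query):
    # Match an item if any of its tags overlaps the query's variant set.
    variants = SYNONYMS.get(query, {query})
    return sorted(item for item, tags in items.items() if tags & variants)

print(exact_hits)        # ['photo-19']
print(search("mother"))  # ['photo-17', 'photo-18', 'photo-19']
```

The catch, of course, is that someone still has to curate the synonym map – which is exactly the human labor folksonomies were supposed to spread out.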


Other writings also demonstrated this point. The piece “Once the Most Powerful Person in Search, Srinija Srinivasan Leaves Yahoo” by Danny Sullivan briefly recorded the career of Srinija Srinivasan and her role at Yahoo. According to Sullivan, Srinivasan’s work cataloguing the internet once made Yahoo the most successful search engine. The secret to this was people. Srinivasan created the structure used by human editors to catalogue digital content. This meant that users had access to material in an order dictated by human editors, with the best, most relevant results at the top. The system worked well in the early days of the internet, but eventually proved too slow to last in the fast-paced digital ecosystem described by Wichowski. Eventually sites like Google developed algorithms that allowed search terms to be applied to an entire text rather than just the title. This meant that searches were no longer limited to the categories created by Srinivasan and her team. Ultimately, in this bout of evolution, the robots won.
However, other sites allow for Srinivasan’s human-dictated organizational structure on a smaller scale. The popular site Omeka allows users to generate databases for collections of digital representations of real-world objects. This system necessitates human organization and cataloguing. It also allows for the system of tagging described by Wichowski: each creator tags their items in order to increase searchability. Though human cataloguing systems may appear to have gone virtually extinct, they too have simply evolved into their most usable form. Rather than using these systems to catalogue the entire internet, individuals at various institutions use Omeka and similar programs as platforms with which to catalogue physical objects in the digital realm.

Week 8: Tagging the Ivory Tower

If you visit the online gaming store Steam, you can locate a “most searched” option. This will show you the top-searched tags by people seeking new video games to play. One of the oft-cited complaints in the Steam community, by developers and consumers alike, is that the tagging system is not curated by Steam itself. A user can create their own tag, and, barring any slur, it will be linked to a game. Some of these can be, and are, negative in connotation. This is how a game like Dear Esther can be tagged as both “story-driven” and the more derisive “walking simulator.” Other tags might include “pointless,” “bad,” and, for a brief period of time, “gay.”

And they say gamers are not juvenile.

However, it is because the user-created tagging system exists on Steam that many independent games, such as Gone Home and Undertale, were able to avoid the pitfalls of infinite possibilities and become massive hits. An algorithm would never have picked them up for featuring on the main page.

A similar dichotomy exists in the tagging and categorization of films. Netflix has a select number of taggers who are paid to watch movies and apply certain taxonomies to them. This is how we get “comedies with a strong female lead.” But to get to “comedies with a strong female lead” you either have to a) be lucky and have it featured on your Netflix main page, or b) enter a rabbit hole of search options. Either way, you are still at the whim of algorithms deciding what to show you.

These hems and haws made Danny Sullivan’s article such an interesting read: not just because a website exists solely for search engine news (NEW GOOGLE DOODLE SHOCKS THE WORLD), but because it traces the evolution of online databases from a human-led effort to a series of codes. On one hand, human hands separate the wheat from the chaff; on the other, you can only sort so much wheat in a day. Google can clear all the fields, but the handiwork is messy, which leaves the door open for smaller farmers to clean up. Wichowski discusses this in “Survival”: using natural language allows for easier searchability, but once again, that human factor can rear its ugly head.

Historical databases are a tricky thing. I believe that allowing a larger group of people to be involved in the search process would be useful. It would get more people involved in historical work (always a perk) and it would make searches more practical. But I am also a cynical man, and know how the minds of many work. I do not want the Library of Congress to end up like Steam. I felt similarly about last week’s readings; more curation is wonderful as long as it follows certain guidelines and does not stray into mob tagging. I think this is the real question we have to grasp as digital history evolves:

How much control are we willing to give to those we might not have 100 percent trust in?

Jacob Dent

Week Eight: Tags, Ontology, and Taxonomy

Kia Guest

This week’s readings cover software utilized in digital history and archives, and work to demystify metadata and various other aspects of digital archival methods.

The first reading, “Metadata for All: Descriptive Standards and Metadata Sharing across Libraries, Archives, and Museums,” discusses the challenges various institutions face in integrating digital content. The authors demystify metadata by breaking down and categorizing the different types of digital standards and practices used by libraries, museums, and archives.

It was pretty interesting to learn about the history behind digitizing history before it became what we are accustomed to in today’s archives. According to “Metadata for All…” the first such digital collection, the AMICO Library, was created in 1997 and seemed tedious and labor-intensive with regard to properly archiving a piece of historical data. With the digitization of history becoming more of the norm, standard practices for digitizing historic data were created by a group of Californian institutions. The main concepts of the standards were data fields and structure, which, according to this article, are like bottles waiting to be filled with whatever substance is needed. Data content and data values establish what will be filled into the data fields and structure. Data format pertains to the information coded within a particular file, and data exchange refers to the methods by which a collection is shared with the public. Using the bottles metaphor to describe each standard was very helpful in explaining the importance of implementing these standards and practices. It was also interesting to find out that the standards produced by the California institutions were not a new practice: the first set of rules and standards for museums and libraries was created in 1852 by Smithsonian Librarian Charles Coffin Jewett (pg. 12).
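The bottles metaphor maps neatly onto how a record is actually built: the data structure supplies a fixed set of empty, named fields (the bottles), and content rules govern what values go into them. A minimal sketch of that separation follows; the field names echo Dublin Core elements, but the record and the `make_record` helper are invented for illustration.

```python
# The "bottles": a fixed set of named fields defined by the data structure.
FIELDS = ("title", "creator", "date", "format", "identifier")

def make_record(**values):
    # Fill each bottle; a data-content standard would dictate what counts
    # as a valid value. Reject any field the structure does not define.
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"fields not in the structure: {unknown}")
    return {field: values.get(field, "") for field in FIELDS}

record = make_record(
    title="Rotary telephone",
    creator="Unknown",
    date="c. 1940",
    format="Bakelite; 12 x 20 cm",
    identifier="1997.45.3",
)
print(record["title"])  # Rotary telephone
```

The point of the metaphor survives the sketch: two institutions that agree on the bottles can exchange records even if they fill them a little differently.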

In keeping with the theme of historical software, the article “Omeka and Its Peers” compares Omeka with other competing digitization software. However, the article quickly asserts that there is no comparison: Omeka wins hands down. Omeka offers resources for librarians, academic users, and archives to present to the public a fascinating online exhibition at little to no cost. According to the article, no other software offers the same perks that Omeka offers. This article pretty much reads as an advertisement for Omeka, which is hard to argue against when you read about how beneficial it is to users who want to provide digital exhibitions for public consumption. Though brief, this article provided enough information regarding the usefulness of Omeka.

The article on Yahoo and the resignation of one of Yahoo’s earliest staffers, Srinija Srinivasan, did not follow the same themes as the other articles, but its inclusion was necessary. This article briefly discusses how Yahoo was once the king of search engines when using the world wide web was a new phenomenon. Yahoo was the catalyst for other search engines such as Bing and Google that would ultimately eclipse it in popularity. Within the article there was a blurb about Yahoo’s first directory still floating around on the world wide web, accessible as a relic of how the internet previously looked and a testament to how search engines were once powered by humans, not computerized algorithms. Yahoo is an example of how past search engines operated, and of how search engines have evolved over time.

Discoverability and Its Challenges

Sarah Kirkley

Alexis Wichowski discusses folksonomies and tagging as a way in which outlying information (information not produced in the mainstream of the information environment) is managed in “Survival of the Fittest.” She says that folksonomies are tags, used by everyday or “ordinary” users, that have created a new information organization system to help manage the volume of information online. Tags can be used simply to remind a user to return to a particular site, or to organize sites based on content or other descriptors for ease of retrieval and use. Wichowski discusses their evolution over time and their uniqueness from traditional metadata schemas or controlled vocabularies used in the information science world; one flaw she points out about folksonomies is their lack of context. Wichowski starts the article by stating that information must be useful and findable, and that natural language can contribute to the discoverability of information simply because it has been described in language familiar to those seeking the content.

Mary W. Elings and Günter Waibel’s “Metadata for All” describes the challenges institutions face with inconsistency in metadata. They describe the various standards used within libraries, archives, and museums and cover the history of those standards and how they have evolved or been supplanted over time. They argue for assigning standards based on material type, rather than institution or community (library, archives, museum, etc.) type. For instance, cultural heritage materials at museums, libraries, and archives would all be described using the same standards, while books at those same institutions would all be described using their own set of standards separate from cultural materials. The current lack of interoperability ultimately harms user experiences in accessing resources, which is detrimental to all of the communities mentioned.

I think online tagging facilitates access to information but can also make it more difficult in some situations. Users who are used to getting relevant search results by using a colloquial term via Google may be surprised when that same search yields no results in a library catalog or database. The Library of Congress Subject Headings are often outdated, such as the much-discussed change from the ‘Illegal Aliens’ subject heading to ‘Noncitizens’ last year. The lack of natural language in the subject headings makes some research more difficult than necessary, and it omits certain groups of people and perspectives. The lack of diversity among those in the field is reflected in the language used, and the natural language used in online tagging and folksonomies offers a chance to include some of those underrepresented populations and to ultimately make the information realm more inclusive in its descriptions.

While I know that interoperability among libraries, archives, and museums would have a great impact on the discoverability of resources, I wonder how feasible this move is. MARC has been rumored to be on its way out for decades but is still in use. Even within the library in which I work, multiple standards are in use (MARC in the catalog, Dublin Core in the institutional repository and digitized special collections, EAD for archival finding aids, etc.). Those standards are content-based, but having multiple systems of access can lead to confusion and lack of use. Trying to cross-populate content management systems or adding a discovery layer can help, but still leads to confusion, and there are often just too many clicks to get to the actual content. And lastly, the institutions discussed in the readings are often constrained by budgets and may only be able to afford one platform with one metadata standard for all content.
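Cross-populating systems that speak different standards usually comes down to a "crosswalk": a field-to-field mapping from one schema to another. The sketch below re-expresses a MARC-style record in Dublin Core terms. The tag choices (245 → title, 100 → creator, 260 → date) follow common MARC-to-Dublin-Core crosswalks, but the record and the `crosswalk` helper are invented, and a real crosswalk handles subfields and repeats that this sketch ignores.

```python
# Field-to-field crosswalk from MARC tags to Dublin Core element names.
MARC_TO_DC = {"245": "title", "100": "creator", "260": "date"}

def crosswalk(marc_record):
    # Re-express whichever MARC fields have a mapping; unmapped fields
    # (like 300, physical description) are simply dropped.
    return {MARC_TO_DC[tag]: value
            for tag, value in marc_record.items() if tag in MARC_TO_DC}

marc = {"245": "Moby-Dick", "100": "Melville, Herman",
        "260": "1851", "300": "635 p."}
print(crosswalk(marc))
# {'title': 'Moby-Dick', 'creator': 'Melville, Herman', 'date': '1851'}
```

The dropped 300 field illustrates the cost mentioned above: every hop between standards risks shedding detail, which is part of why discovery layers still frustrate users.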

Digital Narratives

In his piece “The Database as a Genre of New Media,” Lev Manovich asked, “How can one keep a coherent narrative or any other development trajectory through the material if it keeps changing?” This question encapsulated Manovich’s basic argument that digital content, specifically in the form of databases, cannot be manifested as a narrative. Manovich suggested that this lack of a narrative had begun to leak into the analog world, which increasingly aligned with digital principles. One argument was that digital content typically takes the form of databases, or collections of items with no necessarily narrative-based organizational system. Another reason he gave was that digital content is constantly being updated; hence the question: can a narrative with no end actually exist? Thus Manovich posited a digital realm with no potential for narrative, by virtue of its reliance on databases. However, Manovich did not anticipate the possibility that databases themselves could tell narratives.

Manovich referenced photo albums as an analog version of a database: a constantly expanding collection of images with no narrative. Except that all photo albums have a narrative. They tell the story of their creators, usually in chronological order, and without words. Online collections are no different. Many websites which could be considered collections – blogs, vlogs, social media profiles – can also reveal narratives by virtue of their evolution over time. Manovich’s argument seems based on the idea that digital processing is at odds with human understanding. However, most digital structures are dependent on humans’ understanding of time, and the importance of chronological order within human society. Rather than digital organization impacting analog society, the digital realm is an extreme representation of humanity’s organizational preferences. Because digital content cannot be organized in any manner illogical to human thought, it cannot avoid humanity’s tendency towards narrative building.

Rosenzweig gave a prime example of this in his piece “Why Collecting History Online is Web 1.5.” In this essay, Rosenzweig discussed his involvement in the project to establish a digital memory bank collecting personal accounts of Hurricane Katrina survivors in the aftermath of the storm. In recounting the evolution of the site, Rosenzweig created a narrative. The story began long before the hurricane came onto the radar, with the development of the web itself. As the internet shifted from its early, more structured state to “web 2.0,” a more editable and interactive version, the beginnings of crowdsourced digital history collections appeared. The Hurricane Digital Memory Bank built on earlier attempts at crowdsourced collections, specifically the database dedicated to survivors of 9/11. As Rosenzweig described the database, it became clear that this collection of individual narratives created multiple overarching narratives. The individual stories demonstrated the impact of storms on impoverished communities, as well as government response times in relation to the socioeconomic alignments of certain neighborhoods. The rate at which certain individuals shared their stories online also demonstrated the rate of recovery within communities, and potentially the types of people who returned to New Orleans to rebuild.


Narratives, Databases, and the Compatibility of History with Web 2.0

By Evan Meehan

Historians, either by virtue of their craft or by the fact that they are all old fuddy-duddies, have a tendency to live in the past.  This means that even well-respected historians like Patrick Manning occasionally assert patently absurd things like: “it is unlikely that the techniques of narrative have advanced much in recent times.”[1]

Moving from 1.5 to 2.0: Is the Risk Worth the Reward for Museums?

Aspiring to what Nina Simon wrote about in 2007, many museums devote substantial resources and time to including interactivity within their exhibits.  Thousands of dollars are spent on developing and maintaining kiosks, recording booths and interactive activities for patrons of various ages and interests.  But despite all of these resources, museums still seem unable to proceed past Nina Simon’s third level, where the exhibits would truly become “2.0.”  Instead, they end up falling more in line with Brennan and Kelly’s “1.5,” where patrons can interact to an extent, but the information and curatorial choices are still heavily subjected to institutional gate-keeping.  Even at my museum, we continuously fall short.  We have brand new kiosks that quiz and poll patrons on everything from Atlanta neighborhoods to religion, but their interactions aren’t allowed to proceed past simply viewing how many people received or chose the same result.  We’ve even taken suggestions for objects within past exhibits, but again, curatorial gate-keeping often wins out.  For as much as museums want to become like web 2.0 where exhibits are designed and supported by patrons, one prevalent thing is stopping them: fear.

Despite museums’ good intentions, in my experience meetings about interactivity and patron participation are almost always dominated by fear: the fear of the loss of authority, the fear of the loss of respect and standing, the fear of low quality exhibits.  With funding sources drying up and the government purportedly looking at scrapping the National Endowment for the Humanities, even non-profit museums are having to focus more and more on admission earnings to stay afloat.  This has created an atmosphere in which new methods of exhibit design that could alienate patrons are often seen as more risky than innovative.  This is especially true for museums that cover topics that are controversial for one reason or another, such as the Civil War in the South.  We’ve had to talk to a number of our own volunteers about presenting ideas to patrons that are not in line with the institutional position!  There is a fear that if even our own volunteers seek to insert the “lost cause” narrative within the museum, any effort to allow patron participation would be disastrous.  Even if the museum were to distance itself from what patrons choose, it would still hurt the reputations of our institution and its staff.  This loss in reputation could hurt admission numbers, which would have far-reaching effects on both future exhibition capabilities and staffing positions.

The trick is to find a way to allow level-five interactivity while still mitigating the risks that it carries.  This is something that online games have been struggling to do within their own communities, and they have been at the forefront of finding ways to maintain digital civility.  League of Legends, one of the largest competitive online gaming communities, has had some success with its policies, but it would seem that this may still be a ways off for museums.