Why Metadata

From ActiveArchives

Shocking fact: Metadata isn't inherently useful

A system can be 100% standards compliant, with a flawless database schema and excellently programmed forms, and still produce bad, incomplete, or unmaintainable collections.

Why? Because broken or bad editorial flows produce their own "noise".

The dark side

  • Database-centricity, where the database schema and forms are rigid and changeable only by programmers
  • Blind data entry (data entry with no (immediate) feedback and no concern for how an entry relates to the collection)
  • Draconian validation that stops the editorial workflow in the name of formal validity and encourages workarounds, leading to...
  • Systemic ambiguity created by over-specificity and difficult workflows (as exasperated users simply do what's necessary to make the system work rather than enter the data they want or need to)
  • Self-contained databases that ignore the web and everything "outside" the scope of the institution

The good

  • Consistent use of terms (when appropriate) leading to useful and complete indexes
  • Flexible granularity (document, paragraph, media)
  • Cross institutional sharing

Practical outcomes of well-maintained metadata

  • Improved website:
    • Context-rich search results specific to your institution
    • Improved cross-site index (Names, Places, Events)
    • New views of data: (Interactive) Network visualisations, Event Calendar, Timeline
    • Widgets to enhance existing pages (see Infobox example)

Practical outcomes of well-maintained metadata (2)

  • New forms of publications:
    • Dynamic publications (Search => epub)
    • Quotations
    • Cross-source publications
  • Connect to an active community of Linked Open Data in the cultural sector (LOD/GLAM, e.g. http://openglam.org/)
    • Shared experiences, tools
    • Federated search:
      • Search results from partners alongside your own resources (and vice versa)
      • "Webring" style crosslinking of resources
  • Names, Places, Events, Themes

Example: Semantic Infobox

Imagine clicking on someone's name in your site and seeing a popup showing:

  • Summary information about the person (thumbnail image, title)
  • List of related publications
  • List of related events, highlighting any upcoming events
  • List of other documents which refer to this person.

>>TODO: Mockup Image<<

Example: Media Player (Jukebox)

As a positive example, consider the common Media Player interface:


  • Starts from the music files themselves; the database is "merely an index" of those files
  • Tabular displays, flexible and immediate sorting to see metadata & inconsistencies
  • Flexible editing including "Group view"
  • Specialized tools to fix common glitches

Example: Media Player


  • Changes in the editor are made to the metadata in the original audio files; the database is merely an index (i.e. it could be regenerated from the files)
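
The "database as index" idea can be sketched in a few lines. This is only an illustration, not how any particular media player works: it assumes each file carries its metadata as leading "Key: value" lines (a stand-in for embedded audio tags), and the file names, field names, and the functions read_metadata and rebuild_index are hypothetical.

```python
from collections import defaultdict


def read_metadata(text):
    """Parse leading 'Key: value' lines; stop at the first blank line."""
    metadata = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip().lower()] = value.strip()
    return metadata


def rebuild_index(files):
    """Build {field: {value: [file names]}} purely from file contents."""
    index = defaultdict(lambda: defaultdict(list))
    for name in sorted(files):
        for field, value in read_metadata(files[name]).items():
            index[field][value].append(name)
    return index


# Hypothetical data: this mapping stands in for a directory of files.
files = {
    "a.txt": "Artist: Example Artist\nAlbum: First\n\nBody...",
    "b.txt": "Artist: Example Artist\nAlbum: Second\n\nBody...",
}
index = rebuild_index(files)
print(index["artist"]["Example Artist"])  # ['a.txt', 'b.txt']
```

Because the index holds nothing that is not in the files, editing a file and calling rebuild_index again is always enough to bring it up to date.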

Example: Media Player


  • A metadata fixer helpfully detects common differences in the metadata, helping to "close the loop" and groom the data.
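
One simple way such a fixer could work is to reduce every value to a comparison key and report the keys that have more than one spelling. The sketch below assumes that differences in case, accents, and spacing are the "common glitches" of interest; the sample values and the function names normalize and find_variants are made up for illustration.

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Reduce a name to a comparison key: no accents, case, or extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return " ".join(without_accents.casefold().split())


def find_variants(values):
    """Return {key: [spellings]} for keys that have several spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize(value)].add(value)
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical tag values as they might appear across a collection.
tags = ["Théâtre Example", "Theatre example", "theatre  example", "Other Name"]
for key, spellings in find_variants(tags).items():
    print(key, "->", spellings)
```

The fixer only reports the groups; choosing which spelling wins is left to the editor, which keeps the tool a feedback aid rather than a validator.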

Rethinking forms

Traditionally, controlled vocabularies or the convergence of terms are designed in via database schemas and enforced via forms with validation.

The problem with this approach is that, when it is rigid, the database schema completely determines the editorial structure. As needs change or exceptions occur, the system fails: data is dulled by workarounds, information goes missing, or the system is simply underused or abandoned, and editorial work is lost while waiting for a redesigned system.

Document->Index instead of Database->Document


Alternatively, work like a web search engine:

  1. Take documents "as is", wherever they are, and index them.
  2. Provide interfaces that give feedback in the form of indexes and (semantic) search results.
  3. The resulting feedback loop drives convergence of terms.
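
The first two steps can be sketched as a tiny inverted index with a search function on top. This is a minimal illustration under stated assumptions, not the indexer used by any system mentioned here: documents are plain text keyed by URL, the example.org URLs are placeholders, and tokenize, build_index, and search are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.casefold())


def build_index(documents):
    """Map each term to the set of document URLs that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Placeholder documents, indexed "as is" and wherever they live.
documents = {
    "https://example.org/a": "Interactive timeline of theater activity",
    "https://example.org/b": "Theater interviews, transcribed",
}
index = build_index(documents)
print(search(index, "theater timeline"))  # {'https://example.org/a'}
```

Nothing is required of the documents up front; the index and the search results are what show editors which terms are in use and which ones diverge.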



Toneelstof is an interactive timeline of theater activity in Flanders, Belgium, produced by Constant on commission from the Flemish Institution for the Performing Arts (VTI). It republishes materials originally developed for DVD-ROMs, based primarily on transcribed video interviews and a wealth of other archive materials from both within and outside the VTI.

Markup instead of Formal Tagging/Classification

The "backend" editors' interface is based on MediaWiki (the software behind Wikipedia), enhanced with a "semantics" plugin (SemanticMediaWiki).

Based on the experience of HyperText, description formats like RDF work by allowing "flavored links" within a text as a means of "tagging" a document or providing structured information.
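
In SemanticMediaWiki a "flavored link" is written inline as [[property::target]], optionally with a display label after a "|". As a rough sketch of how such markup can be turned into structured information, the snippet below pulls the typed links out of a text and groups the targets by link type. The property names "interviewee" and "mentions" and the sample text are invented for illustration and are not the vocabulary used on Toneelstof.

```python
import re
from collections import defaultdict

# Matches [[property::target]] and [[property::target|label]].
TYPED_LINK = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def links_by_type(wikitext):
    """Group link targets by the property ("flavor") of the link."""
    grouped = defaultdict(list)
    for prop, target in TYPED_LINK.findall(wikitext):
        grouped[prop.strip()].append(target.strip())
    return dict(grouped)


# Hypothetical article text with made-up property names.
text = (
    "An interview with [[interviewee::Person A]], who discusses "
    "[[mentions::Person B]] and [[mentions::Organisation C|a company]]."
)
print(links_by_type(text))
# {'interviewee': ['Person A'], 'mentions': ['Person B', 'Organisation C']}
```

The editor keeps writing ordinary prose; the structure is a by-product of the links rather than a separate form to fill in.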


Once saved, the following "Info box" appears alongside the article giving feedback on the linked names by type of link:

[Image: MegStuart SMW.png]

Name Auto-completion


Here an external document (on the site UBU web, indicated by its URL) is annotated in the editor's wiki. In the "related people" field, an auto-complete dropdown appears as soon as one starts typing a name. Semicolons may be used to separate multiple names.
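
The behaviour of such a field can be approximated as follows. This is a guess at the logic, not the wiki's actual implementation: it assumes case-insensitive prefix matching on the fragment after the last semicolon, and the function suggest, its limit parameter, and the names are hypothetical.

```python
def suggest(field_text, known_names, limit=5):
    """Suggest completions for the name currently being typed."""
    parts = field_text.split(";")
    fragment = parts[-1].strip().casefold()
    if not fragment:
        return []
    # Names already entered earlier in the field are not offered again.
    already = {part.strip().casefold() for part in parts[:-1]}
    matches = [
        name
        for name in sorted(known_names)
        if name.casefold().startswith(fragment)
        and name.casefold() not in already
    ]
    return matches[:limit]


# Hypothetical list of names already present in the index.
names = ["Person A", "Person B", "Performer C"]
print(suggest("Person A; per", names))  # ['Performer C', 'Person B']
```

Offering only names that already exist nudges editors toward one spelling, while still letting them type a new name when none of the suggestions fits.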

Content-fitting Forms

Systems like SemanticMediaWiki provide a compelling example of dynamic forms with powerful auto-completion.

The key is that the forms may themselves be altered as needed to fit changing requirements.

Who's who


A second interface, Who's Who, visualizes all the people & organizations in the VTI database.




As the joke goes: the great thing about standards is there are so many to choose from. While there will never be (nor should be) one "killer" standard, there is some cohesion in a "mesh" of standards & best practices. Like the web itself, a sustainable strategy is to create tools and workflows that support this mesh.


  • The selection of one standard doesn't exclude another. A good taxonomy tool helps to create synonyms that group compatible labels.
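
At its simplest, a synonym table of this kind is a lookup from any known label to a preferred one. The sketch below assumes a hand-maintained mapping; the labels and the functions build_synonym_map and canonical are illustrative only and are not taken from any published vocabulary.

```python
def build_synonym_map(concepts):
    """Invert {preferred label: [synonyms]} into {any label: preferred}."""
    lookup = {}
    for preferred, synonyms in concepts.items():
        for label in [preferred, *synonyms]:
            lookup[label.casefold()] = preferred
    return lookup


def canonical(label, lookup):
    """Return the preferred label, or the label unchanged if unknown."""
    return lookup.get(label.strip().casefold(), label)


# Hypothetical labels, as they might arrive from different vocabularies.
concepts = {
    "choreographer": ["dance maker", "choreograaf"],
    "venue": ["location", "theater building"],
}
lookup = build_synonym_map(concepts)
print(canonical("Dance Maker", lookup))  # choreographer
print(canonical("dramaturg", lookup))    # dramaturg
```

Unknown labels pass through untouched instead of being rejected, so an exception can be entered today and grouped later.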

The Shinto Shrine


Memory practices in the sciences - Geoffrey C. Bowker

The shrine buildings at Naikū and Gekū, as well as the Uji Bridge, are rebuilt every 20 years as part of the Shinto belief in the death and renewal of nature and the impermanence of all things (wabi-sabi), and as a way of passing building techniques from one generation to the next. The twenty-year renewal process is called the Sengu. In August, in a long-standing tradition, the people who live in Ise are allowed to enter the area around the Inner Sanctum of the Naikū as well as the Gekū.
