Linked Data: The Good Parts

From ActiveArchives

DRAFT!


This is an attempt to collect some ideas about RDF/Linked Data. In talking and working with others about RDF/LD, certain ideas seem to be crystallizing. It strikes me that in these discussions I am in constant need of separating the core and exciting ideas from the examples and ideas that lead to misunderstanding, miscommunication, and disparaging, cynical self-destruction.

Michael Murtaugh 13:45, 27 February 2011 (UTC)


Good parts

  • publishing/writing with Linked Data enables the use of a dynamic, diverse, and distributed toolset for retrieval, combination, analysis, and visualisation of web-based writing
  • Linked Data represents a return to HyperText and a break with the "tyranny of the table" in terms of data representation, so it's about flexibility, but also the freedom of text to express ideas outside of the form. LD offers a truly "web" oriented way to think about sharing data.
  • Linked Data enables decentralization: things like federated search, social networks, and collaborative writing, all without depending upon a definitive, fixed, centralized, (commercial) service or API.
  • Schemas are all about fishing out useful sharable structured data from "free form" writing

Bad parts

  • The RDF/XML representation: hard to read, misleading about what RDF is, and disconnected from writing (it takes the least interesting aspect of markup). Alternative: RDFa.
  • Reliance on centralized sources such as dbpedia / freebase (a repetition of the "Google" model)... the important sources are the distributed and situated knowledge of individuals & institutions.
  • Formal schemas when they are defined / presented in a top-down manner, unconcerned with practice (the "use or die" message of standards).
  • Datatypes: while they seem a good idea, they are often superfluous with respect to the typing of links; Microdata takes the smart step of removing them. The fact that actual support for the types in stores is somewhat spotty doesn't help. This makes RDF feel bloated: a lot of fuss for very little impact.


Linked Data is a semantically enhanced HyperText not (just) a Global Gameshow of facts

A common way to explain triples is as natural language statements, or as simple questions and answers. While this is one way of thinking about triples, it seems problematic in that it suggests a kind of "dumbing down" of the knowledge stored in triples, by implying that the full essence of meaning is captured by reductive "factoids" of the form "Subject Predicate Object". (What's the capital of Massachusetts? Buzz.... Boston, Correct!) In contrast, RDF is very much about the HyperTextual practice of writing with links. In fact it adds a creative element to that writing in the form of qualifying the links with a label, giving the relationship a kind of flavor or color. In this way, semantic links imply a creative writing space with much more potential than just a kind of global gameshow of facts.

Schemas are patterns that allow convenient recognition / retrieval of useful structured data from the soup of links

Schemas are too often presented as top-down standards that institutions need to "use or die". In fact, however, like the web, new schemas are free to be created as necessary. The fact that standards need to (come to) exist in order for data to be shared does not imply that the standard needs to be established before any kind of use can take place. On the contrary, examples that in some sense represent "schematic" publishing or writing of linked data exist all over the web. The microformat community is an example of an attempt to explicitly codify these semantic structures.

Many proposed schemas today suffer from the fact that very little is "written" using them.

SPARQL SELECT is the key to federated search

SPARQL CONSTRUCT (and DESCRIBE?) is a (selective) database export tool

In a discussion about SPARQL, the question arose whether the results were not themselves RDF. When using a SELECT, the answer is no, because the SPARQL SELECT statement "tablifies" the data. This is a good thing in that it can be (and often is) used to retrieve specific data: the ?foo variables in the SELECT clause in effect pick out the nodes that you are interested in using. A SPARQL CONSTRUCT or DESCRIBE statement, however, returns a partial RDF graph of the related / matching nodes. In this way, such a query is a means of exporting portions of one's RDF store and pulling them into another repository. It comes close to CouchDB's kind of replication (though without the precision/care of revision numbers).

