Command Line RDF

From ActiveArchives

After being inspired by Dave Beckett's presentation on the suite of command line tools that come with his Redland RDF library, I started using the tools myself. Here are some "best of" notes.



XML to turtle

Turtle is a very readable format, often much better than XML for seeing the essential structure / hierarchy of RDF.

rapper --output turtle


Read from a local HTML file, output to XML (rdfxml), and redirect the output to a new (xml) file.

rapper --input rdfa --output rdfxml 117.html > 117.xml
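
The same conversion can be applied to a whole directory of HTML files. The following is a minimal sketch, not a tested recipe: convert_html_dir is a hypothetical helper name, and it assumes every .html file in the directory carries RDFa. The rapper flags are the ones used in the command above.

```shell
# convert_html_dir is a hypothetical helper, not part of rapper.
# Usage: convert_html_dir [DIRECTORY]   (defaults to the current directory)
convert_html_dir() {
    dir=${1:-.}
    for html in "$dir"/*.html; do
        # Skip the literal pattern when no .html file matches
        [ -e "$html" ] || continue
        # 117.html becomes 117.xml, next to the original
        xml="${html%.html}.xml"
        rapper --input rdfa --output rdfxml "$html" > "$xml"
    done
}
```

Calling convert_html_dir with no argument converts the files in the current directory and leaves the HTML files untouched.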

Working with a local store

In the following examples "aardf" is the name of a local store (in the default "hashes" format); the name is arbitrary. Note that the database actually comprises a number of files (4 in total when using contexts).


Display/dump the full contents of a store

rdfproc aardf print

For easy reading, serialize to turtle:

rdfproc aardf serialize turtle

For archiving / migration to another store, dump to an XML file:

rdfproc aardf serialize rdfxml > dump.xml
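
For the migration case, the dump can be parsed straight into a second store. This is a sketch under assumptions: migrate_store and the dump file name are made up for illustration, and the -n option of rdfproc is used to create the target store. Since RDF/XML has no notion of contexts, context information is presumably not carried over by this route.

```shell
# migrate_store is a hypothetical helper, not part of rdfproc.
# Usage: migrate_store SOURCE_STORE TARGET_STORE
migrate_store() {
    src=$1
    dst=$2
    dump="${src}-dump.xml"          # illustrative file name
    # Serialize the whole source store to RDF/XML
    rdfproc "$src" serialize rdfxml > "$dump" || return 1
    # -n asks rdfproc to create the target store before parsing into it
    rdfproc -n "$dst" parse "$dump" rdfxml
}
```

For example, migrate_store aardf aardf2 would leave aardf-dump.xml behind as an archive copy.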

View the arcs (relations) leading out of a particular node

rdfproc -c aardf arcs-out

Display the full list of contexts in a store

rdfproc -c aardf contexts

Add the contents of RDF from a local file to the store (with context)

The file name is repeated in order to set the base URI and the context, respectively.

XML file using the filename as baseuri and context

rdfproc -c aardf parse 117.xml rdfxml 117.xml 117.xml

Turtle file with no baseuri ("-") and a custom name ("SpecialMetaData") as context

rdfproc -c aardf parse meta.turtle turtle - SpecialMetaData
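
When there are many XML files to load, the first form can be wrapped in a loop. A minimal sketch, assuming all files are RDF/XML; load_rdfxml_files is a hypothetical helper name.

```shell
# load_rdfxml_files is a hypothetical helper, not part of rdfproc.
# Usage: load_rdfxml_files STORE FILE...
load_rdfxml_files() {
    store=$1
    shift
    for f in "$@"; do
        # The file name doubles as base URI and context, as with 117.xml
        rdfproc -c "$store" parse "$f" rdfxml "$f" "$f" \
            || echo "failed to load $f" >&2
    done
}
```

For example, load_rdfxml_files aardf *.xml gives every file its own context, so each can later be removed individually.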

Clearing all statements from a context

rdfproc -c aardf remove-context SpecialMetaData
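
Removing a context and parsing the file again gives a simple way to refresh a context after its source file has changed. The helper below is only a sketch: refresh_context is a hypothetical name, and it passes "-" as the base URI, as in the Turtle example.

```shell
# refresh_context is a hypothetical helper, not part of rdfproc.
# Usage: refresh_context STORE FILE FORMAT CONTEXT
refresh_context() {
    store=$1
    file=$2
    format=$3
    context=$4
    # Drop the old statements first; stop if the removal reports an error
    rdfproc -c "$store" remove-context "$context" || return 1
    # Reload the file into the same context, with no base URI ("-")
    rdfproc -c "$store" parse "$file" "$format" - "$context"
}
```

For example: refresh_context aardf meta.turtle turtle SpecialMetaData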

Querying the store with SPARQL

rdfproc -c aardf query sparql - "PREFIX dc:<> select ?title where { ?doc dc:title ?title } ORDER BY ?title"

Important discovery!

While running a very generic query to list all the relations used across a large set of documents, I found that the order of the triple patterns in a SPARQL query makes a big performance difference. Namely, this query:

rdfproc -c aardf query sparql - "PREFIX dc:<> SELECT DISTINCT ?rel WHERE { ?doc dc:title ?title. ?doc ?rel ?obj. }"

Takes an unacceptable amount of time (on the order of minutes). With the two triple patterns swapped, however:

rdfproc -c aardf query sparql - "PREFIX dc:<> SELECT DISTINCT ?rel WHERE { ?doc ?rel ?object. ?doc dc:title ?title. }"

Takes about 3 seconds.
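
To compare orderings on another store, a small timing wrapper is enough. This is a rough sketch: time_query is a hypothetical helper, and it only measures whole seconds, which suffices for a minutes-versus-seconds difference.

```shell
# time_query is a hypothetical helper, not part of rdfproc.
# Usage: time_query STORE "SPARQL QUERY"
time_query() {
    start=$(date +%s)
    # Results are discarded; only the elapsed time is of interest
    rdfproc -c "$1" query sparql - "$2" > /dev/null
    end=$(date +%s)
    echo "$((end - start))s"
}
```

Running it once with each of the two queries above, for example time_query aardf "…query text…", prints the elapsed seconds for each ordering.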

Other queries:

rdfproc -c aardf query sparql - "
PREFIX dc:<>
PREFIX sarma:<>
SELECT ?title
WHERE {
  ?doc dc:title ?title.
  ?doc ?rel <http://localhost:8000/tags/Ballets_Russes>
}
ORDER BY ?title"

"Facet query"

rdfproc -c aardf -r table query sparql - "
PREFIX dc:<>
PREFIX sarma:<>
SELECT ?sub ?rel ?obj
WHERE {
  ?sub ?rel ?obj.
  ?sub ?foo <http://localhost:8000/tags/Ballets_Russes>.
}"
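
The facet query can be parameterized by tag URI. A minimal sketch: facets_for_tag is a hypothetical helper, and the PREFIX lines are left out because this particular query uses no prefixed names.

```shell
# facets_for_tag is a hypothetical helper, not part of rdfproc.
# Usage: facets_for_tag STORE TAG_URI
facets_for_tag() {
    store=$1
    tag=$2
    # All statements about any subject that is linked to the given tag
    query="SELECT ?sub ?rel ?obj WHERE { ?sub ?rel ?obj. ?sub ?foo <$tag>. }"
    rdfproc -c "$store" -r table query sparql - "$query"
}
```

For example: facets_for_tag aardf http://localhost:8000/tags/Ballets_Russes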

Querying a live data source and outputting the results as an HTML table

roqet --data -e "PREFIX foaf:<> SELECT ?name WHERE { ?s foaf:name ?name }" -r html > test.html
