Command Line RDF

From ActiveArchives

After being inspired by Dave Beckett's presentation on the suite of command line tools available from his Redland RDF library, I started using the tools myself. Here are some "best ofs" and notes.

Converting

XML to turtle

Turtle is a very readable format, often much better than XML for seeing the essential structure / hierarchy of RDF.

rapper --output turtle http://www.dajobe.org/foaf.rdf

RDFa to XML

Read from a local HTML file, output to XML (rdfxml), and redirect the output to a new (xml) file.

rapper --input rdfa --output rdfxml 117.html > 117.xml
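With more than one page to convert, the same command can be wrapped in a loop. This is only a minimal sketch, assuming rapper is on the PATH; the function name convert_dir and whatever directory it is given are made up for illustration.

```shell
# Convert every .html file in a directory from RDFa to RDF/XML.
# convert_dir is a hypothetical helper, not part of Raptor.
convert_dir() {
    dir=$1
    for src in "$dir"/*.html; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$src" ] || continue
        # 117.html -> 117.xml, written next to the original
        rapper --input rdfa --output rdfxml "$src" > "${src%.html}.xml"
    done
}
```

Calling convert_dir with a directory name would then leave one .xml file beside each .html file in it.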

Working with a local store

In the following examples, "aardf" is the name of a local store (in the default "hashes" format); the name itself is arbitrary. Note that the database actually comprises several files (4 in total when using contexts):

aardf-contexts.db
aardf-po2s.db
aardf-so2p.db
aardf-sp2o.db
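Since the store is just a handful of files sharing a name prefix, a quick backup can presumably be made by copying them together. The sketch below assumes nothing is writing to the store during the copy, and that no other store in the same directory has a name starting with the same prefix; backup_store is a hypothetical helper, not a Redland command.

```shell
# Copy all files belonging to one store into a backup directory.
# backup_store is a made-up helper; it relies only on the
# NAME-*.db naming pattern shown above.
backup_store() {
    store=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$store"-*.db; do
        # Skip the literal pattern when no file matches.
        if [ -e "$f" ]; then
            cp "$f" "$dest/" || return 1
        fi
    done
}
```

For example, running backup_store aardf backup from the directory holding the store would copy the four aardf-*.db files into backup/.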

Display/dump the full contents of a store

rdfproc aardf print

For easy reading, serialize to turtle:

rdfproc aardf serialize turtle

For archiving / migration to another store, dump to an XML file:

rdfproc aardf serialize rdfxml > dump.xml
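The migration itself can be sketched as a dump followed by a parse into a new store. This assumes rdfproc's -n (new store) option and the parse command used elsewhere in these notes; migrate_store and the dump filename are made up for illustration. Note that RDF/XML has no notion of contexts, so they will not survive the round trip.

```shell
# Dump one store to RDF/XML and load the dump into a newly created store.
# migrate_store is a hypothetical helper; contexts are not carried over.
migrate_store() {
    src=$1
    dst=$2
    dump="$src-dump.xml"
    # Serialize the whole source store to a file.
    rdfproc "$src" serialize rdfxml > "$dump" || return 1
    # -n creates the destination store before parsing into it.
    rdfproc -n "$dst" parse "$dump" rdfxml
}
```

Something like migrate_store aardf aardf2 would leave aardf-dump.xml behind as the archive copy.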

View the arcs (relations) leading out of a particular node

rdfproc -c aardf arcs-out http://sarma.be/docs/view/117

Display the full list of contexts in a store

rdfproc -c aardf contexts

Add the contents of RDF from a local file to the store (with context)

The filename is repeated to set the base URI and the context, respectively.

XML file using the filename as baseuri and context

rdfproc -c aardf parse 117.xml rdfxml 117.xml 117.xml

Turtle file with no base URI ("-") and a custom name as context

rdfproc -c aardf parse meta.turtle turtle - SpecialMetaData

Clearing all statements from a context

rdfproc -c aardf remove-context SpecialMetaData
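Combining the two steps gives a way to re-import a file that has changed without piling up stale statements: clear its context, then parse it again. A minimal sketch using only the commands shown above; reload_context is a hypothetical helper, and I'm assuming that removing a context which doesn't exist yet is harmless.

```shell
# Replace the contents of one context with a fresh parse of a file.
# reload_context is a made-up helper built on the rdfproc commands above.
reload_context() {
    store=$1
    file=$2
    syntax=$3
    context=$4
    # Drop whatever was loaded under this context before...
    rdfproc -c "$store" remove-context "$context"
    # ...then parse the file again into the same context (no base URI).
    rdfproc -c "$store" parse "$file" "$syntax" - "$context"
}
```

For the Turtle example above this would be called as reload_context aardf meta.turtle turtle SpecialMetaData.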

Querying the store with SPARQL

rdfproc -c aardf query sparql - "PREFIX dc:<http://purl.org/dc/elements/1.1/> select ?title where { ?doc dc:title ?title } ORDER BY ?title"

Important discovery!

While running a very generic query to list all the relations in a large set of documents, I found that the order of the triple patterns in a SPARQL query makes a big performance difference. Namely, this query:

rdfproc -c aardf query sparql - "PREFIX dc:<http://purl.org/dc/elements/1.1/> SELECT DISTINCT ?rel WHERE { ?doc dc:title ?title. ?doc ?rel ?obj. }"

Takes an unacceptable amount of time (on the order of minutes).

While:

rdfproc -c aardf query sparql - "PREFIX dc:<http://purl.org/dc/elements/1.1/> SELECT DISTINCT ?rel WHERE { ?doc ?rel ?object. ?doc dc:title ?title. }"

Takes about 3 seconds.
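To compare orderings like these without a stopwatch, the query can be wrapped in a small timer. This is a rough sketch: time_query is a hypothetical helper, it relies on date +%s (whole seconds, available in GNU and BSD date), and it throws the query results away so only the elapsed time is printed.

```shell
# Print roughly how many seconds a SPARQL query takes against a store.
# time_query is a made-up helper; resolution is one second.
time_query() {
    store=$1
    query=$2
    start=$(date +%s)
    # Discard the results; only the duration matters here.
    rdfproc -c "$store" query sparql - "$query" > /dev/null
    end=$(date +%s)
    echo "$((end - start))"
}
```

Running it once per variant of the query, with the same store, gives two numbers that can be compared directly. One-second resolution is coarse, but enough to tell minutes from seconds.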

Other queries:

rdfproc -c aardf query sparql - "
PREFIX dc:<http://purl.org/dc/elements/1.1/>
PREFIX sarma:<http://sarma.be/terms/>
SELECT ?title
WHERE {
?doc dc:title ?title.
?doc ?rel <http://localhost:8000/tags/Ballets_Russes>
}
ORDER BY ?title
"

"Facet query"

rdfproc -c aardf -r table query sparql - "
PREFIX dc:<http://purl.org/dc/elements/1.1/>
PREFIX sarma:<http://sarma.be/terms/>
SELECT ?sub ?rel ?obj
WHERE {
?sub ?rel ?obj.
?sub ?foo <http://localhost:8000/tags/Ballets_Russes>.
}
"

Querying a live data source and outputting the results as an HTML table

roqet --data http://www.dajobe.org/foaf.rdf -e "PREFIX foaf:<http://xmlns.com/foaf/0.1/> SELECT ?name WHERE { ?s foaf:name ?name }" -r html > test.html
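The same query can be written out in several result formats by changing the -r value. A small sketch, with the caveat that the set of formats depends on the installed Rasqal version (roqet --help lists them); the html, csv and json names here are an assumption about a reasonably recent build, and export_results is a hypothetical helper.

```shell
# Run one query against a data source and save the results in
# several formats. export_results is a made-up helper; the format
# list is an assumption and may need trimming on older installs.
export_results() {
    data=$1
    query=$2
    base=$3
    for fmt in html csv json; do
        # One output file per format: base.html, base.csv, base.json
        roqet --data "$data" -e "$query" -r "$fmt" > "$base.$fmt" || return 1
    done
}
```

Given a data URI, a query string and the base name test, this would produce test.html, test.csv and test.json.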
