Faceted Browsing with RDF

From ActiveArchives

This page represents an in-progress project. Temporalities may not fully align with reality.

What is faceted browsing?

Implementing Faceted Browsing with SPARQL

PREFIX dc:<http://purl.org/dc/elements/1.1/>
SELECT ?doc ?title
WHERE {
  ?doc dc:title ?title .
  {
    {?doc dc:language "Dutch"@nl .}
  }
  {
    {?doc dc:creator <http://www.sarma.be/tags/Jeroen_Peeters> .}
    UNION
    {?doc dc:creator <http://www.sarma.be/tags/Alexander_Baervoets> .}
  }
}
ORDER BY ?title
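
In the query above, each facet gets its own group: the values chosen within one facet are joined with UNION (either creator matches), while the separate groups must all match (language and creator). As a rough sketch of how that shape could be generated from a selection, the function below builds the query text from a dictionary. The name build_facet_query is hypothetical, and the values are assumed to be already serialized as SPARQL terms; it is an illustration, not the code this page uses.

```python
DC = "http://purl.org/dc/elements/1.1/"

def build_facet_query(facet):
    """Render a SELECT query from {relationship URL: [serialized values]}.

    Values inside one relationship are alternatives (UNION);
    every relationship in the dictionary must match.
    """
    lines = [
        "PREFIX dc: <%s>" % DC,
        "SELECT ?doc ?title",
        "WHERE {",
        "  ?doc dc:title ?title .",
    ]
    for rel, values in facet.items():
        # One group per relationship, one pattern per selected value.
        patterns = ["    { ?doc <%s> %s . }" % (rel, val) for val in values]
        lines.append("  {")
        lines.append("\n    UNION\n".join(patterns))
        lines.append("  }")
    lines.append("}")
    lines.append("ORDER BY ?title")
    return "\n".join(lines)

# Assumed example selection, mirroring the query above.
example_facet = {
    DC + "language": ['"Dutch"@nl'],
    DC + "creator": [
        "<http://www.sarma.be/tags/Jeroen_Peeters>",
        "<http://www.sarma.be/tags/Alexander_Baervoets>",
    ],
}
query = build_facet_query(example_facet)
print(query)
```

The full URL is written out for each relationship instead of the dc: prefix, which keeps the builder independent of which vocabularies the facets come from.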

Implementing Faceted Browsing in Python & SPARQL

This code uses the rdflib library (version 3.1), and a homegrown SparqlQuery convenience class.

The process is basically:

  1. Get a list of all the relationships for the current set of items.
  2. For each relationship, produce a list of all possible values and their associated counts.

In this case the "context" is defined by two things:

  • Only "items" that have a title (using the dc:title predicate) are considered.
  • The current "facet" is applied to further filter that set, where:
    • The "facet" is a dictionary mapping relationship URLs to a list of values (rdflib entities).
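
The two steps and the facet rule can be tried out without a SPARQL endpoint. The following is a minimal sketch over plain Python tuples standing in for triples; the names matching_docs, relations, value_counts and the sample data are all hypothetical, and plain strings replace rdflib entities. It assumes the triples are unique, that values within one relationship are alternatives, and that different relationships must all match. When counting values for a relationship, selections in that same relationship are skipped so the remaining choices stay visible.

```python
from collections import Counter

DC = "http://purl.org/dc/elements/1.1/"
DC_TITLE = DC + "title"

def matching_docs(triples, facet=None, skip=None):
    """Subjects that have a title and satisfy every facet entry except `skip`."""
    docs = {s for s, p, o in triples if p == DC_TITLE}
    for rel, values in (facet or {}).items():
        if rel == skip:
            continue
        # Any of the selected values is enough for this relationship.
        docs &= {s for s, p, o in triples if p == rel and o in values}
    return docs

def relations(triples, facet=None):
    """Step 1: every distinct relationship used by the current set of items."""
    docs = matching_docs(triples, facet)
    return sorted({p for s, p, o in triples if s in docs})

def value_counts(triples, relation, facet=None):
    """Step 2: number of items per value of one relationship."""
    docs = matching_docs(triples, facet, skip=relation)
    return Counter(o for s, p, o in triples if s in docs and p == relation)

# Made-up sample data: three titled documents.
TRIPLES = [
    ("doc1", DC_TITLE, "A"), ("doc1", DC + "language", "nl"), ("doc1", DC + "creator", "Peeters"),
    ("doc2", DC_TITLE, "B"), ("doc2", DC + "language", "en"), ("doc2", DC + "creator", "Peeters"),
    ("doc3", DC_TITLE, "C"), ("doc3", DC + "language", "nl"), ("doc3", DC + "creator", "Baervoets"),
]
facet = {DC + "language": ["nl"]}

print(relations(TRIPLES, facet))
print(value_counts(TRIPLES, DC + "creator", facet))
print(value_counts(TRIPLES, DC + "language", facet))
```

The rdflib and SPARQL versions of the two steps follow.
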
def getRelations (http, baseurl, facet=None, norels=None):
    """ Returns the list of all (unique) relations (urirefs) to things that have titles
    norels: optional list of relationship URLs to exclude
    """
 
    q = SparqlQuery()
    q.prefix("dc:<http://purl.org/dc/elements/1.1/>")
    q.select("?rel", distinct=True)
    q.where("?doc dc:title ?title .")
    q.where("?doc ?rel ?obj .")
    if facet:
        for rel, values in facet.items():
            for val in values:
                q.where_clause("?doc <%s> %s ." % (rel, val.n3()))
            q.where_clause_end()
    if norels:
        for relurl in norels:
            q.filter("(?rel != <%s>) ." % relurl)
    q = q.render()
    # print q
    results = sparql_query_list(http, baseurl, q)
    relations = [b['rel'] for b in results]
    return relations

...

def getRelationValueCounts (http, baseurl, relation, facet=None):
    """
    Returns a list of (value, count) pairs, sorted by value, where count is the
    number of documents (things with titles) that have that value for the relation.
    """
    q = SparqlQuery()
    q.prefix("dc:<http://purl.org/dc/elements/1.1/>")
    q.select("?obj")
    q.where("?doc dc:title ?title .")
    if facet:
        for rel, values in facet.items():
            # don't filter a category with values in the same category,
            # i.e. only filter a count list using selections in the *other* filters
            if rel == relation:
                continue
            for val in values:
                q.where_clause("?doc <%s> %s ." % (rel, val.n3()))
            q.where_clause_end()
 
    q.where("?doc <%s> ?obj ." % relation)
    q.orderby("?obj")
    results = sparql_query_list(http, baseurl, q.render())
    results = [b['obj'] for b in results]
 
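    # The results are ordered by value, so equal values are adjacent:
    # collapse each run of equal values into a (value, count) pair.
    ret = []
    curvalue = None
    curcount = 0
    for value in results:
        if curvalue != value:
            if curvalue is not None:
                ret.append((curvalue, curcount))
            curvalue = value
            curcount = 0
        curcount += 1
    if curvalue is not None:
        ret.append((curvalue, curcount))
 
    return ret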
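
The counting loop at the end of getRelationValueCounts depends on the results arriving sorted, so that equal values sit next to each other. As a hedged alternative, the same run counting can be written with the standard library's itertools.groupby. The helper name count_sorted_runs and the sample values are hypothetical, and plain strings stand in for rdflib entities.

```python
from collections import Counter
from itertools import groupby

def count_sorted_runs(values):
    """Collapse runs of equal adjacent items into (value, count) pairs.

    `values` must already be sorted (as ORDER BY ?obj ensures in the query),
    otherwise the same value can show up in more than one pair.
    """
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

values = ["en", "fr", "nl", "nl"]
pairs = count_sorted_runs(values)
print(pairs)

# For unsorted input, Counter gives the same totals without relying on order.
totals = Counter(values)
```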
