From ActiveArchives



There are three kinds of plugins in AA: sniffers, filters, and exporters.

The lifecycle of a resource in AA: sniff, filter, and export. In parallel, annotations allow resources to be described, tagged, and reorganized.


Input: URL, Output: Annotation (Structured Text) containing metadata, links

Motivation: Command-line tools often provide extremely useful information about resources (ffmpeg to check a movie, exiftool to explore the metadata tags of an image, an HTML parser to extract the structure of a page). Sniffers aim to capture this ability in a common format (using RDF) so that the results can be reused and mixed.

Position: Tool-specific views of media. Sniffers try *not* to make clean abstractions (such as a multi-purpose image viewer), but stay specific to a content type, format, or tool (think of the Linux approach of one thing done really well). A sniffer corresponds to a "source" of information about a resource. AA's ability to combine different sources of information (in part through the use of RDF) allows this diversity of information to be merged as needed. (Example: a YouTube tag and an EXIF image tag could be used together.)

!? URL | sniffer => new URL which is actually the URL of the annotation!

Name Notes
HTML5lib (python)
identify (imagemagick)
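
As a rough illustration of the "URL in, annotation out" idea, here is a minimal sketch of a sniffer for HTML pages. It uses Python's standard-library `html.parser` in place of the html5lib listed above, and the names `LinkSniffer` and `sniff_html`, as well as the shape of the returned dictionary, are assumptions made for this example rather than part of AA.

```python
from html.parser import HTMLParser


class LinkSniffer(HTMLParser):
    """Hypothetical sniffer: collects the page title and outgoing links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def sniff_html(url, html):
    """Return a metadata dict (standing in for the annotation) about the page."""
    parser = LinkSniffer()
    parser.feed(html)
    parser.close()
    return {"about": url, "title": parser.title.strip(), "links": parser.links}


page = ('<html><head><title> Demo </title></head>'
        '<body><a href="/a">A</a><a>no href</a></body></html>')
annotation = sniff_html("http://example.org/", page)
print(annotation)
```

A real sniffer would fetch the page itself and serialize the result as structured text or RDF; the dictionary here only stands in for that output.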


Input: Media URL, Output: (New) Media URL. Typically Asynchronous

Name               Parameters                    Description
transcode          format, bitrate, framesize
framegrabber       time
fragment/timecrop  start, end / duration         embeds a fragment of a time-based media
resize             w, h
crop               x, y, w, h
youtubedownloader  format                        (code corresponding to the YouTube version to download)
fadein             duration                      fades in a time-based media
fadeout            duration                      fades out a time-based media
zoom                                             enables a JavaScript zoom plugin for an image file (jquery iviewer?)
thumbnail          w, h                          creates a thumbnail (keeping ratio) of an image file
volume             integer                       plays one file with the given volume
speed              x2, x4, -x2, -x4...           plays a time-based media at the given speed
controls                                         displays control buttons for a time-based media
autoplay                                         autoplays a time-based media when it is triggered
embed              html5 (default), html5audio   takes an audio/video resource and wraps it in an audio/video element


Input: Playlist, Output: various

Name              Output            Description
srtexport         text/srt          export titles to .srt format
feed              text/rss, atom?   export to RSS, ATOM feeds
mltscript         text/bash
mltrender (*)     video/webm, ...
svgmaker          text/svg
audacity markers  text/plain        export to Audacity markers
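
To make the exporter idea concrete, the following is a minimal sketch of what an SRT export could look like. It assumes a playlist can be reduced to a list of `(start_seconds, end_seconds, text)` tuples; that representation and the function names are hypothetical and not taken from AA.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (SubRip style)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def export_srt(titles):
    """Render (start, end, text) tuples as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(titles, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Blocks are separated by one blank line.
    return "\n".join(blocks)


print(export_srt([(0, 2.5, "Hello"), (60, 62, "World")]))
```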


Some URLs:


Sniffing inspects a URL and displays "metadata" either contained within the resource, or indirectly available via the API of a web service. Sniffer plugins allow AA to be customized and extended to specific kinds of URL-based resources depending on the needs of an archive.


To inspect the above URLs:


Sniff tries all known sniffers on the URL, and collects and returns a list of URLs of specific sniffers with "something to say". Clicking on the individual link:


Yields an HTML+RDF page with specific metadata and eventually links to "sub-sniffers" that provide further metadata (such as YouTube comments).
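
The "try every sniffer, keep those with something to say" step could be sketched as below. The two sniffers, the `SNIFFERS` registry, and the convention of returning `None` for "nothing to say" are all assumptions for illustration; the sketch returns sniffer names where AA would return sniffer URLs.

```python
def youtube_sniffer(url):
    # Hypothetical sniffer: only speaks up for YouTube watch URLs.
    if "youtube.com/watch" in url:
        return {"kind": "video", "service": "youtube"}
    return None


def image_sniffer(url):
    # Hypothetical sniffer: recognizes a few image extensions.
    if url.lower().endswith((".jpg", ".jpeg", ".png")):
        return {"kind": "image"}
    return None


SNIFFERS = {"youtube": youtube_sniffer, "image": image_sniffer}


def sniff(url):
    """Try every registered sniffer and keep those that return metadata."""
    results = {}
    for name, sniffer in SNIFFERS.items():
        metadata = sniffer(url)
        if metadata is not None:
            results[name] = metadata
    return results


print(sniff("http://example.org/photo.JPG"))
```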


In AA, crawling is the process of reading the contents of an (HTML) page, indexing its contents, and then following its links to repeat the process (either automatically or guided manually by an editor).

Using the crawl URL:


Add the Sniff URL above to the aa Crawler / Indexer.


(interactive mode? or follow all...)

AA starts at the sniff URL, then follows the links.
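
A minimal sketch of this "start at one URL, then follow the links" behaviour, assuming the fully automatic mode. The `get_links` callback stands in for fetching and sniffing a page, and the `max_pages` limit is an assumption added so that the sketch always terminates.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real crawler would index the page here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph standing in for real pages.
LINKS = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
order = crawl("a", lambda url: LINKS.get(url, []))
print(order)
```

An interactive mode could be approximated by having `get_links` return only the links an editor has approved.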

Eventually: archiving / snapshotting / versioning the contents will be part of the crawler.

Eventually optional support for both internal/external RDF stores.


AA provides a means of facet browsing crawled / indexed content of the archive.

  • short/long form tabular form, javascript sortable


Create a new page and make (semantic) links. These URLs are then indexed by the crawler.

AA supports a simple markup to embed resources in a page. Embedding a resource adds it to the archive, thus enabling all the capabilities / support of the AA software.

{{ http://video.constantvzw.org/vj12/Michael_Moss.ogv }}

This embed adds the resource to the archive, sniffs it, and crawls the pages. AA uses the RDF information to intelligently select a JavaScript delegate that includes the resource on the page in a web-accessible way.

The embed markup supports creating pipelines with filters. Filters are AA plugins that can be customized / extended and may be implemented in any language (python / CGI, BASH).

{{ http://video.constantvzw.org/vj12/Michael_Moss.ogv | fragment start=01:00 duration=00:10 | transcode format=oggaudio }}

This embed has the effect of creating a new resource, which is then embedded into the referring page.
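
The embed markup shown above can be split into a URL and a list of filter stages with a few lines of Python. This is a minimal sketch, not AA's parser: the function name `parse_embed` is hypothetical, and it assumes every filter argument is written as `key=value` and that the URL itself contains no `|` character.

```python
import re

# Matches the first {{ ... }} embed in a piece of markup.
EMBED_RE = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def parse_embed(markup):
    """Parse '{{ URL | filter key=value ... }}' into (url, pipeline)."""
    match = EMBED_RE.search(markup)
    if match is None:
        raise ValueError("no {{ ... }} embed found")
    url, *stages = [part.strip() for part in match.group(1).split("|")]
    pipeline = []
    for stage in stages:
        name, *args = stage.split()
        params = dict(arg.split("=", 1) for arg in args)
        pipeline.append((name, params))
    return url, pipeline


markup = ("{{ http://video.constantvzw.org/vj12/Michael_Moss.ogv "
          "| fragment start=01:00 duration=00:10 | transcode format=oggaudio }}")
url, pipeline = parse_embed(markup)
print(url)
print(pipeline)
```

Each `(name, params)` pair could then be handed to the matching filter plugin in order, with the output URL of one stage becoming the input of the next.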


There is a simple markup for annotating an embedded resource.

  • Annotation markup makes it easy to create page fragments (divs) linked to resources (using RDF's @about).
  • Beyond just establishing RDF triples, annotations are also about having a compact unit of labelling (i.e. the context of the links/properties is important to preserve and work with). In this way AA extends RDF technology. (Is this the "context"?)

Current example of Michael Moss + Annotations: http://activearchives.org/aaa/resources/25/

(maybe have a generalized means of javascript-expanding a link within a page!)

  • Is an annotation a kind of embed as well?

Playlist / Export

Search markup?

How to export modules fit in? (automatic table / list)?

How to make custom (django) views easy to create / mix in ?!


Core concepts: Resource, Page, Annotation, Playlist, Tags, Metadata (via typed links)


  • Have an API that corresponds to actual archive use, not over-generalizing.
  • (related to this) Have an API that leads to meaningful/useful URLs like:



A resource is the main item in an archive; it might be a video, an image, or a webpage. Each resource is identified by a URL.

URL Method Description
/resource/ID/annotations/ GET Get annotations; optional start and end times filter by time range
/resource/ID/ DELETE Delete the resource and all associated metadata (annotations, playlists)
/resource/ID/ POST Add the given annotations
/resource/ID/ PUT Reset annotations to the posted annotations
/resource/ID/thumbnails/ GET Returns a list of available thumbnails
/resource/ID/versions/ GET
/resource/ID/tags/ GET Retrieve all tags for the resource
/resource/ID/tagsbytype/ GET The list of all tags organized by type for the resource
/resource/ID/tags/type/ GET Retrieve all tags of the given type for the resource
/resource/ID/metadata/ GET Retrieve all metadata for the resource
/resource/ID/metadata/NAME/ GET Get the associated values for the given name.
/resource/ID/metadata/NAME/ PUT Set the values for the given name.
/resource/ID/metadata/NAME/ POST Append values for the given name.
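
The difference between PUT (set) and POST (append) on `/resource/ID/metadata/NAME/` can be illustrated with a small in-memory model. The `MetadataStore` class is a hypothetical stand-in for the server side, not AA's implementation, and it assumes that each metadata name maps to a list of values.

```python
class MetadataStore:
    """In-memory stand-in for the metadata of one resource (hypothetical)."""

    def __init__(self):
        self._values = {}

    def get(self, name):
        # GET: return the values associated with the name.
        return list(self._values.get(name, []))

    def put(self, name, values):
        # PUT: replace whatever was stored under the name.
        self._values[name] = list(values)

    def post(self, name, values):
        # POST: append to the existing values.
        self._values.setdefault(name, []).extend(values)


store = MetadataStore()
store.put("creator", ["A"])
store.post("creator", ["B"])
print(store.get("creator"))
```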


A wiki-style page. A page can be used to make annotations (among other things).

URL Method Description
/page/PageName/ GET (Wikified page name), Get page contents
/page/PageName/annotations/ GET
/page/PageName/ DELETE Delete the page and all associated metadata (annotations, playlists)


An annotation is like a note attached to a particular resource. Optionally, an annotation may be part of a playlist.

URL Method Description
/annotation/ID/ GET
/annotation/ID/ DELETE Delete the annotation
/annotation/ID/ POST Append text to the given annotation
/annotation/ID/ PUT Set the text of the given annotation


A sequence of annotations. May be related to a number of Resources.

URL Method Description
/playlist/ID/ GET
/playlist/ID/ DELETE Delete the playlist
/playlist/ID/ POST Append annotations to the given playlist
/playlist/ID/ PUT Set the annotations of the playlist


Tags have resource, source, type, and value.

Tags can be organized as synonyms, and placed in hierarchies (via a parent property).
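
One possible way to model such a tag, with its four fields and a parent link for hierarchies, is sketched below. The `Tag` dataclass and its `ancestors` helper are assumptions made for illustration; synonyms are not modelled here, and the example values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    resource: str
    source: str
    type: str
    value: str
    parent: Optional["Tag"] = None  # hierarchy via a parent property

    def ancestors(self):
        """Return the values of all parents, nearest first."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node.value)
            node = node.parent
        return chain


animal = Tag("/resource/1/", "http://example.org/", "subject", "animal")
dog = Tag("/resource/1/", "http://example.org/", "subject", "dog", parent=animal)
print(dog.ancestors())
```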


Sources are used as the base of URLs for things like resources and tag types.

Flat structure of URLs with links instead of a hierarchy
