The Web of Programs

One of the distinguishing features of Ripple is that its programs can be easily embedded in the Web of Data (as well as in RDF triple stores), allowing you to build up a network of interlinked programs, and to build on programs others have put up on the Web.

Similarly to visiting a Web page, discovering and executing a program in Ripple is effortless (all you need is its URI), while publishing new content on the Web requires a little more effort and planning. As with HTML web pages, there are many strategies for getting the data out there. The following will illustrate a file-based approach which uses a basic recipe for serving static RDF files on the Web.

Read on at the new wiki page The Web of Programs.

Shiny new wiki

Ripple has found a new home on GitHub. Not only has the code base been moved over from Google Code, but a new, much-improved wiki has been added. This contains a “running Ripple” guide, a couple of tutorial-style pages on exploring Linked Data and JSON, an overview of syntax and commands, and user documentation on all 150-or-so language primitives. Check it out.

JSON support in Ripple

There are a number of popular formats for structured data on the Web. Ripple supports RDF/XML as well as several other RDF document formats, allowing you to navigate programmatically through the Web of Data, and even embed programs in the Web of Data. However, there’s also a lot of good JSON data out there which shares many of the advantages of RDF data: unambiguous syntax, a graph-like structure, and the ability to link from one document to another (albeit in domain-specific ways). For example, here’s a bit of JSON about San Francisco’s Embarcadero neighborhood, provided by Twitter:


{
    "place_type":"neighborhood",
    "contained_within":[
        {
            "place_type":"city",
            "full_name":"San Francisco, CA",
            "name":"San Francisco",
            "url":"http://api.twitter.com/1/geo/id/5a110d312052166f.json",
            "id":"5a110d312052166f"
            ...
        }
    ],
    "country":"The United States of America",
    "country_code":"US",
    "full_name":"The Embarcadero, San Francisco",
    "name":"The Embarcadero",
    "url":"http://api.twitter.com/1/geo/id/90942366be65cd2c.json",
    "id":"90942366be65cd2c"
    ...
}

Ripple’s HTTP client is capable of fetching JSON documents on the fly, so all we need is a way to parse the retrieved JSON, as well as a way to navigate through it. As of today, the new graph:to-json primitive provides the parser, while a new kind of predicate provides the ability to navigate through the key-value hierarchy. For example, we can turn the following (serialized) JSON into a native JSON object:


@list j: "{\"foo\": true, \"bar\": [6, 9, 42]}" to-json.

:j.

  [1]  {"foo":true,"bar":[6,9,42]}

Now we can traverse as far into the hierarchy as we want:


:j. "foo".

  [1]  true

:j. "bar". each.

  [1]  6
  [2]  9
  [3]  42
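
For readers who think more readily in Python than in Ripple, the parse-then-traverse steps above can be sketched roughly as follows. This is only an analogy and says nothing about how Ripple is implemented: the helper names `to_json`, `key` and `each` are made up for illustration, and Python generators stand in for Ripple's streams of results.

```python
import json
from typing import Any, Iterable, Iterator


def to_json(text: str) -> Any:
    """Parse serialized JSON into native Python values (the role to-json plays above)."""
    return json.loads(text)


def key(values: Iterable[Any], name: str) -> Iterator[Any]:
    """Follow a key from every JSON object in the stream; other values yield nothing."""
    for value in values:
        if isinstance(value, dict) and name in value:
            yield value[name]


def each(values: Iterable[Any]) -> Iterator[Any]:
    """Emit every member of each list in the stream as a separate result."""
    for value in values:
        if isinstance(value, list):
            yield from value


j = to_json('{"foo": true, "bar": [6, 9, 42]}')

foo_results = list(key([j], "foo"))        # one result: the boolean under "foo"
bar_results = list(each(key([j], "bar")))  # one result per array member

# Print the array members in the same indexed style as the Ripple prompt.
for i, result in enumerate(bar_results, start=1):
    print(f"  [{i}]  {result}")
```

Each helper takes a stream and returns a stream, which is why the calls can be chained in the same left-to-right order as the Ripple expression.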

Note that JSON arrays become lists in Ripple, while booleans become xsd:boolean values, and numbers become xsd:integer, xsd:long or xsd:double values. Here’s an example which uses the Twitter data mentioned above:

# Fetches Twitter Places JSON from Twitter's REST API
# id: a place id (e.g. "90942366be65cd2c")
@list id twitter-place: \
    "http://api.twitter.com/1/geo/id/" id concat. \
    ".json" concat. \
    to-uri. get. to-json.

# Fetches the parent features of a place
# place: the JSON of an already-fetched place
@list place parent-place: \
    place "contained_within". each. \
    "id". :twitter-place.

These two mappings allow us to turn a Twitter Places id into a native JSON object after fetching the corresponding JSON document from Twitter, and to continue traversing to places described in other JSON documents:

@list embarcadero: "90942366be65cd2c" :twitter-place.
  
:embarcadero.

  [1]  {"id":"90942366be65cd2c","bounding_box":{"type":"Polygon","coordinates":...}

Now we can get the name of the neighborhood:

:embarcadero. "name".

  [1]  "The Embarcadero"

… the higher-level places it is contained in:

:embarcadero. :parent-place* "full_name".

  [1]  "The Embarcadero, San Francisco"
  [2]  "San Francisco, CA"
  [3]  "California, US"
  [4]  "The United States of America"

…and so on:

:embarcadero. :parent-place{2} "geometry". "coordinates". each. each.

  [1]  (-124.482003E0 32.528832E0)
  [2]  (-114.131211E0 32.528832E0)
  [3]  (-114.131211E0 42.009517E0)
  [4]  (-124.482003E0 42.009517E0)
  [5]  (-124.482003E0 32.528832E0)
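
The `:parent-place*` query is the interesting one here: it keeps following `contained_within` links until there is nowhere left to go. A rough Python sketch of that idea is below. It is an illustration under several assumptions, not Ripple's implementation: the documents are a trimmed, in-memory stand-in for what the API would return, the ids `ca-id` and `us-id` are placeholders (only the first two ids appear in the examples above), and `fetch_place`, `parent_place` and `star` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical, heavily trimmed place documents keyed by id.
# "ca-id" and "us-id" are placeholder ids invented for this sketch.
PLACES: Dict[str, dict] = {
    "90942366be65cd2c": {"full_name": "The Embarcadero, San Francisco",
                         "contained_within": [{"id": "5a110d312052166f"}]},
    "5a110d312052166f": {"full_name": "San Francisco, CA",
                         "contained_within": [{"id": "ca-id"}]},
    "ca-id": {"full_name": "California, US",
              "contained_within": [{"id": "us-id"}]},
    "us-id": {"full_name": "The United States of America",
              "contained_within": []},
}


def fetch_place(place_id: str) -> dict:
    """Stand-in for :twitter-place; a local lookup instead of an HTTP request."""
    return PLACES[place_id]


def parent_place(place: dict) -> Iterator[dict]:
    """Stand-in for :parent-place; fetch every place this one is contained in."""
    for parent in place.get("contained_within", []):
        yield fetch_place(parent["id"])


def star(step: Callable[[dict], Iterable[dict]], start: dict) -> Iterator[dict]:
    """Apply step zero or more times, level by level, until nothing new comes back.

    There is no cycle detection, so this only terminates on tree-like data.
    """
    frontier: List[dict] = [start]
    while frontier:
        yield from frontier
        frontier = [nxt for item in frontier for nxt in step(item)]


embarcadero = fetch_place("90942366be65cd2c")
names = [place["full_name"] for place in star(parent_place, embarcadero)]
print(names)
```

Because the starting place counts as the "zero applications" case, its own name comes out first, just as in the Ripple output above.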

Note that unlike RDF data, JSON documents are currently not cached in Ripple (so there is no speedup for repeated queries). JSON support is included in the Ripple 0.6 (alpha) release.

Edit 2011-5-13: examples updated to use latest syntax

It’s official: Ripple 0.5

It has been a really, really long time since I announced the last release of Ripple. This hasn’t been due to lack of new developments so much as lack of knowing when to stop and call the thing a release. The set of tweaks and enhancements in the issues list tends to grow faster than it shrinks, and there has always been one item or another I’ve wanted to tick off the list. Anyway, Ripple 0.5.1 is here. I’ll try to remember to release earlier and more often in future.
Apart from the regex and pattern-matching syntax and invertible mappings mentioned in previous posts, as well as the separation of Ripple from its linked data client (called LinkedDataSail, now also used in the Gremlin graph query language), notable changes since 0.4 include:

  • unit tests for each and every primitive (of which there are well over a hundred)
  • strict agreement with Turtle syntax w.r.t. numeric values: abbreviated forms of xsd:integer, xsd:double, xsd:decimal and xsd:boolean values now follow Turtle/N3 conventions
  • new Joy primitives: logic:while, stack:drop, and stack:take
  • other new primitives: etc:dateTimeToMillis, stream:intersect, stream:require, string:md5, math:ceil, math:floor
  • an @unprefix directive to unbind namespace prefixes previously bound by @prefix
  • optional read-only access to a base Sail
  • built-in support for AllegroGraph
  • an improved library loader
  • a total order for native values which takes equivalence relations into account (enabling comparison primitives to behave consistently across data types), as well as improved hashing behavior for native values
  • ability to embed Ripple queries in Java code, as well as to parse Ripple source files
  • robust error recovery for the query engine
  • list/program equivalence. This is subtle but fundamental to the way Ripple now deals with list primitives. It has always been the case in Ripple that any list is potentially a program. However, there are additional possibilities for program rewriting and optimization when every program is also a list. What this means for the programmer is that you can now manipulate Ripple programs — even primitives and RDF predicates — using list primitives such as cat, first, and rest.
  • limited SPARQL and SPARUL functionality
  • metadata of primitives is now derived from the corresponding Java API of the primitive. This metadata is converted to RDF on demand, so that it never falls out of sync with the implementation
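
The list/program equivalence item is easier to appreciate with a toy. The Python sketch below is a deliberately tiny, eager stack evaluator and should not be read as Ripple's evaluation model (Ripple is lazy and multi-valued). It only shows the idea that when a program is literally a list, list operations such as `cat`, `first` and `rest` double as program-manipulation tools. The function names `run`, `add` and `mul` here are illustrative.

```python
from typing import Callable, List, Union

Stack = List[int]
Item = Union[int, Callable[[Stack], None]]
Program = List[Item]


def add(stack: Stack) -> None:
    """Pop two numbers and push their sum."""
    right, left = stack.pop(), stack.pop()
    stack.append(left + right)


def mul(stack: Stack) -> None:
    """Pop two numbers and push their product."""
    right, left = stack.pop(), stack.pop()
    stack.append(left * right)


def run(program: Program) -> Stack:
    """Evaluate a program left to right: numbers are pushed, primitives are applied."""
    stack: Stack = []
    for item in program:
        if callable(item):
            item(stack)
        else:
            stack.append(item)
    return stack


# Ordinary list operations, which work on programs because programs are lists.
def cat(a: Program, b: Program) -> Program:
    return a + b


def first(program: Program) -> Item:
    return program[0]


def rest(program: Program) -> Program:
    return program[1:]


sum_program: Program = [2, 3, add]
doubled = cat(sum_program, [2, mul])    # a new program built by concatenation
swapped = cat([4], rest(sum_program))   # replace the first operand with 4

print(run(doubled), run(swapped))
```

Note that `rest(sum_program)` still contains the `add` primitive, so primitives are just list members like any other value.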

For full details, see the commit logs and API documentation.
Recent improvements to LinkedDataSail include incremental caching (meaning that it doesn’t wait until it’s shut down to persist the metadata it gathers about Linked Data URIs, making it safer for large crawls) and an LRU caching policy (again… better for large crawls).
Ripple 0.5.1 can be downloaded here. While we’re at it, here’s a little user documentation:

  1. After you expand the *.zip, run the startup script (ripple.sh or ripple.bat)
  2. Now paste this in at the Ripple command line:
    
    @define timbl: <http://www.w3.org/People/Berners-Lee/card#i> .
    :timbl >> .
    
  3. …and this:
    
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    :timbl >> foaf:knows >> .
    
  4. Watch the RDF data streaming in for a while (double-tap ESC if you get bored), then type:
    
    :timbl >> foaf:knows{2} >> foaf:name >> distinct >> .
    

And there you are: a simple FOAF crawler. Experiment with different queries to grab different slices of the Web of Data.

Arrows and regex

This post will introduce Ripple’s application and regular expression syntax. The very first releases of Ripple (as in the screencast) included a single, infix symbol, “/” for the application of mappings. For example, to map the numbers 2 and 3 to their sum, you would have used (2 3 /add). While the slash is still supported for the sake of backwards compatibility, Ripple’s preferred syntax is now entirely postfix-based, and includes various constructions for “forward” and “backward” application of mappings, as well as for regular expressions. These are:

  • “>>” — forward application. Simply applies a mapping, exactly once. For instance, the example above should be written: (2 3 add >>) using the preferred syntax.
  • “<<” — backward application. Applies the inverse of a mapping, insofar as this is defined. For instance, the expression (2 3 add <<) yields (-1), as the inverse mapping of add is defined to be sub, the subtraction primitive. In RDF applications, backward application is useful for traversing links from head to tail rather than from tail to head. For example, if (:timbl foaf:knows >>) yields all individuals known by Tim Berners-Lee, then (:timbl foaf:knows <<) yields all individuals who know Tim Berners-Lee, according to Ripple’s knowledge base. In the current application, you can even traverse backwards from literal values. For example, ("Timothy Berners-Lee" foaf:name <<) yields (:timbl) himself.
  • “?” — optional quantifier. This and the following constructions provide POSIX-style regular expressions in Ripple. When it stands before an application operator, “?” applies the operator both once and not at all. For instance, the expression (:timbl foaf:knows? >>) yields both Tim Berners-Lee and the individuals he knows. The expression (42 neg? >>) yields both (42) and (-42).
  • “*” — star quantifier. When it stands before an application operator, “*” applies the operator zero or more times. This is particularly useful when working with recursive data structures such as lists. For example, the expression ((10 20 30) rdf:rest* >> rdf:first >>) yields (10), (20), and (30).
  • “+” — plus quantifier. Like “*”, but applies its operator at least once. Thus, the expression ((10 20 30) rdf:rest+ >> rdf:first >>) yields only (20) and (30), as the rdf:rest mapping is applied once, then twice before the end of the list is reached.
  • “{n}” — numeric quantifier. Applies its operator a single, specified number n of times. For instance, ((10 20 30) rdf:rest{2} >> rdf:first >>) yields (30). The expression (:timbl foaf:knows{2} >>) yields all individuals known by Tim Berners-Lee, in transitive fashion for two degrees. This is the same as (:timbl foaf:knows >> foaf:knows >>).
  • “{n,m}” — range quantifier. Applies its operator at least n times and at most m times. For instance, ((10 20 30) rdf:rest{0,1} >> rdf:first >>) yields (10) and (20). (:timbl foaf:knows{2,3} <<) yields all individuals from whom Tim Berners-Lee is two or three degrees removed, according to the foaf:knows mapping.
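
To make the forward/backward distinction concrete, here is a small Python sketch over an in-memory set of statements. Everything in it is hypothetical: the triples are placeholder data rather than real FOAF, and `forward` and `backward` are illustrative helpers, not Ripple API calls.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Placeholder statements invented for this sketch.
TRIPLES: List[Triple] = [
    (":alice", "foaf:knows", ":bob"),
    (":alice", "foaf:knows", ":carol"),
    (":dave", "foaf:knows", ":alice"),
    (":alice", "foaf:name", "Alice"),
]


def forward(subject: str, predicate: str) -> List[str]:
    """Like '>>' on a predicate: follow matching statements from subject to object."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def backward(obj: str, predicate: str) -> List[str]:
    """Like '<<' on a predicate: follow matching statements from object to subject."""
    return [s for s, p, o in TRIPLES if o == obj and p == predicate]


print(forward(":alice", "foaf:knows"))   # whom :alice knows
print(backward(":alice", "foaf:knows"))  # who knows :alice
print(backward("Alice", "foaf:name"))    # starting from a literal value
```

The last call mirrors the literal example in the list above: starting from a name string and walking the `foaf:name` link in reverse leads back to the resource that carries it.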

Note that despite this diversity of syntax, there is and always has been only one true application operator in Ripple, still called op. Apart from the forward application symbol “>>” which is simply an alias for op, all of the above constructions are merely syntactic sugar for expressions involving op together with one primitive mapping or another. For example, the expression (:timbl foaf:knows{2} >>) parses to the same Ripple program as (:timbl foaf:knows 2 timesApply op).
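
One way to picture the "syntactic sugar" point is to write every quantifier in terms of a single helper that applies a mapping between a minimum and a maximum number of times. The Python sketch below does that. It is an assumption-laden model rather than Ripple's actual desugaring: `apply_range` and the wrapper names are invented here, lists are modeled as nested `(first, rest)` pairs with `None` standing in for the empty list, and there is no cycle detection.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional

Step = Callable[[Any], Iterable[Any]]


def apply_range(step: Step, start: Any, lo: int, hi: Optional[int] = None) -> Iterator[Any]:
    """Yield everything reachable from start in at least lo and at most hi steps.

    hi=None means no upper bound; the loop then ends only when a step yields nothing.
    """
    frontier: List[Any] = [start]
    count = 0
    while frontier and (hi is None or count <= hi):
        if count >= lo:
            yield from frontier
        frontier = [nxt for item in frontier for nxt in step(item)]
        count += 1


# Each quantifier is just a choice of bounds.
def optional(step: Step, start: Any) -> Iterator[Any]:   # "?"
    return apply_range(step, start, 0, 1)


def star(step: Step, start: Any) -> Iterator[Any]:       # "*"
    return apply_range(step, start, 0)


def plus(step: Step, start: Any) -> Iterator[Any]:       # "+"
    return apply_range(step, start, 1)


def times(step: Step, start: Any, n: int) -> Iterator[Any]:  # "{n}"
    return apply_range(step, start, n, n)


# The list (10 20 30) as nested (first, rest) pairs; None models the empty list.
LIST = (10, (20, (30, None)))


def rdf_rest(cell: Any) -> Iterator[Any]:
    if cell is not None:
        yield cell[1]


def rdf_first(cell: Any) -> Iterator[Any]:
    if cell is not None:
        yield cell[0]


def firsts(cells: Iterable[Any]) -> List[Any]:
    """Apply rdf_first to every cell in a stream and collect the results."""
    return [value for cell in cells for value in rdf_first(cell)]


def neg(x: int) -> List[int]:
    return [-x]


print(firsts(star(rdf_rest, LIST)))
print(firsts(plus(rdf_rest, LIST)))
print(firsts(times(rdf_rest, LIST, 2)))
print(firsts(apply_range(rdf_rest, LIST, 0, 1)))
print(list(optional(neg, 42)))
```

The five calls correspond, in order, to the `rdf:rest*`, `rdf:rest+`, `rdf:rest{2}`, `rdf:rest{0,1}` and `neg?` examples from the list of constructions.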

Prettifying the command line

Right. Blog. Keyboard. Fingers. Just start typing. So, I needed to take a screen capture of the Ripple command line for a presentation yesterday, and was a little embarrassed by this old and awkward formatting:

    
    1 >>  :timbl >> foaf:knows >> foaf:name >> .
    
    rdf:_1  ("Dan Brickley"@en)
    rdf:_2  ("Libby Miller")
    rdf:_3  ("Jim Hendler")
    rdf:_4  ("Henry J. Story")
    
    2 >>
    

Old, because this is how it has been since the dawn of Ripple time. Awkward, because:

  1. The >> input prompt clashes with the >> application operator (which in earlier versions of Ripple was a slash, and an infix operator at that; more to come on the new syntax).
  2. The RDF Bag-styled index for query results (rdf:_1 and so on) has always been a little misleading. It’s particularly wrong now that Ripple is much more loosely coupled with the RDF data model.
  3. Without the spurious RDF Bag syntax, the parentheses around individual query results (indicating that they are lists) are as unnecessary as they are unsightly. Just as the top-level parentheses of a line of input are omitted — so you can type 2 3 add >> instead of the more obviously list-like (2 3 add >>) — so it can be with output: just pretend the parentheses are there, and remember that query results really are lists.

It took all of five minutes to put a much improved format in place:

    
    1) :timbl >> foaf:knows >> foaf:name >> .
    
      [1]  "Dan Brickley"@en
      [2]  "Libby Miller"
      [3]  "Jim Hendler"
      [4]  "Henry J. Story"
    
    2)
    

This does look better, doesn’t it?

Ripple’s not dead

Alright, so I’m not much of a blogger. I guess that’s obvious by now. Nonetheless, the subject of this very neglected blog, the Ripple language, has come a long way in the last seven-and-a-half months. Ripple is now used commercially, which has driven its development in new and interesting directions. The language and query environment, now compatible with any Sesame 2.0 Sail implementation, are clearly separated from the linked data client, which in turn is compatible with Sesame-based applications distinct from Ripple. The API has been extended to allow for more specialized network algorithms. A developer may now embed Ripple query strings in Java source code, making it much easier to use Ripple as a software component, as opposed to a stand-alone tool. The syntax of the language has grown and matured, with support for regular expressions, backward and forward traversal of networks, and user-friendly, pattern-matching program definitions. In short, Ripple is becoming a real programming language. As it’s an open-source language, I’ve decided that the source code really ought to be accessible somewhere (other than in months-old release packages), so I’ve put it on Google Code. Maybe I should pipe the SVN commit messages into my blog.

Note: you can check out an up-to-the-minute working copy of Ripple like so (requires a Subversion client)

svn checkout http://ripple.googlecode.com/svn/trunk/ ripple

Ripple 0.4 released

This is the first software release for Ripple since its debut at ESWC last June. While the syntax and computational model of the language have not changed, the implementation contains a lot of new material in its libraries, and also makes better use of Java concurrency, speeding up queries and improving interactivity. Particular new features include:

  • Streaming query results. One shortcoming of previous releases was that there was no way to interrupt a query. It was all too easy to find yourself stuck waiting on a program which either would never terminate or was busy churning out far more results than you needed. What’s more, you had no way of seeing what was going on in those situations, because query results were tucked away in a buffer until the program halted. Now, individual query results stream onto your terminal as soon as they’re computed. A double tap to the ESC key tells the query engine that you’ve seen enough.
  • Floating point math. Ripple 0.3 was limited to integer arithmetic. Ripple 0.4, on the other hand, has over two dozen math primitives (borrowed entirely from Java), including trig functions and a random number generator, all of which play well with both integer and floating point numbers.
  • RDF document primitives. Ripple’s RDF-specific features are largely hidden behind its syntax. RDF documents are requested, parsed and manipulated transparently, as needed to answer queries. That’s as it should be. However, so often do I find myself using document-centric tools in conjunction with Ripple that it’s been very handy to just stick a couple of document-centric primitives in Ripple itself: the graph:triples primitive consumes the URI of a Semantic Web document and produces each of the statements contained in the document, while graph:namespaces produces each of the namespaces defined in the document.
  • Literal reification and type-casting primitives. A list of rewiring scenarios by Stefano Mazzocchi inspired a number of primitives for manipulating data types, including graph:toUri which consumes a string literal and produces a URI. Additionally, xsd:type and xml:lang are no longer mere URIs in Ripple: they’re primitive functions. xsd:type gives you the data type of a typed literal, and xml:lang gives you the language tag (if any) of a plain literal.
  • Even more new primitives. This release makes most java.lang.String methods available through a new string: library. I’ve thrown in a few new stack: primitives, too, as well as a services: library with hooks into selected Semantic Web services, including Sindice and Swoogle.
  • owl:sameAs smushing for backwards compatibility. Ripple’s primitive functions are identified with URIs which are as cool as my domain name (and I think that’s cool enough for alpha software). But what happens when primitives change? Older versions of Ripple will not understand programs which reference newer primitives, and that’s just how it goes. However, new versions of Ripple must be able to understand old programs, which is where owl:sameAs links come in. If a primitive described in a newer namespace is linked through owl:sameAs to a primitive in an older namespace, the application understands them to be the same thing. It makes publishing library descriptions that much easier.
  • Extension loader. Previously, if you wanted to add new primitives to Ripple, you had to insert them into Ripple’s own source code and rebuild. Now all you need is a Ripple dependency and a text file called extensions.txt which tells the application what to load when it starts up. To include custom libraries in the query environment, just list them by name.
  • Improved crawling. Ripple is now able to handle multiple HTTP connections in parallel, which means dramatically faster crawling. In keeping with crawler etiquette, Ripple maintains a history of HTTP requests as it goes, taking care not to burden any one host with rapid-fire requests.
  • Miscellaneous: this release has a better ratio of unit tests to core classes than 0.3, as well as an improved launcher script, new command-line options, and a context-preserving TriG cache.
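
The owl:sameAs item can be illustrated with a small alias index. The Python sketch below uses a union-find structure so that any URI linked through owl:sameAs resolves to one canonical representative, under which the primitive is registered. This is a guess at the general technique, not a description of Ripple's code, and the `example.org` URIs, `SameAsIndex`, `register` and `lookup` are all hypothetical.

```python
from typing import Callable, Dict, Optional


class SameAsIndex:
    """Union-find over URIs: linked URIs share one canonical representative."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, uri: str) -> str:
        """Return the canonical URI for uri, registering it if unseen."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every URI on the walk directly at the root.
        while self._parent[uri] != root:
            next_uri = self._parent[uri]
            self._parent[uri] = root
            uri = next_uri
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record an owl:sameAs link between a and b."""
        self._parent[self.find(a)] = self.find(b)


# Hypothetical primitive URIs in an "old" and a "new" namespace.
OLD_ADD = "http://example.org/ripple/old#add"
NEW_ADD = "http://example.org/ripple/new#add"

index = SameAsIndex()
index.same_as(NEW_ADD, OLD_ADD)

# In this sketch, primitives are registered after all links are known,
# so the canonical key cannot change underneath the registry.
PRIMITIVES: Dict[str, Callable[[int, int], int]] = {}


def register(uri: str, implementation: Callable[[int, int], int]) -> None:
    PRIMITIVES[index.find(uri)] = implementation


def lookup(uri: str) -> Optional[Callable[[int, int], int]]:
    """Resolve a URI through its sameAs links and return the primitive, if any."""
    return PRIMITIVES.get(index.find(uri))


register(NEW_ADD, lambda a, b: a + b)
print(lookup(OLD_ADD)(2, 3), lookup(NEW_ADD)(2, 3))
```

With this arrangement an old program that names the primitive by its older URI finds the same implementation as a new program using the newer URI.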

Finally, you might notice a side-project called the “Ripple media extension” alongside Ripple proper on the downloads page. It includes some new functionality which doesn’t necessarily belong in a command-line tool, but which I find interesting:

  • media:play — plays an audio (for now: MIDI) file
  • media:show — displays an image
  • media:speak — speaks a passage of text in the default FreeTTS voice

I expect this to grow into something of an experiment in visual and spoken Semantic Web UI.