Sunday, July 03, 2011

Resources In Various Frames of JSON

I've been mulling over the role(s) JSON should play in representing RDF for the last couple of days (well, for the last year or so, really).

Having worked with RDF more or less full time for some years now, in specific contexts (legal information, and lately educational data), I'm getting the hang of some usage patterns. I'm also following the general Linked Data community (as well as various REST endeavors, Atom applications, and various, mostly dynamic-language-based, platforms). In all of this, I've made some observations:

1. If I want to use RDF fully, anything but a full RDF API (graph navigation, Resource and Literal objects and all) often causes a lot of friction. That is, unless I already know what "framing" I need of the data, I really need a whole data set to dig through, for purposes of e.g. finding incoming relations, looking up the rdfs:label of types and properties, filtering on languages, handling irregular data (such as dc:creator used with both literals and URI references) and so on.
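To make the friction concrete, here is a minimal sketch of that kind of digging, using rdflib in Python (the data and URIs are hypothetical, cooked up just for illustration):

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDFS

    DC = Namespace("http://purl.org/dc/elements/1.1/")

    # A tiny, made-up dataset with the dc:creator irregularity mentioned above.
    graph = Graph()
    graph.parse(data="""
        @prefix dc: <http://purl.org/dc/elements/1.1/> .
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        <http://example.org/doc/1> dc:creator "Some Author" ;
            dc:creator <http://example.org/person/1> .
        <http://example.org/person/1> rdfs:label "Another Author"@en .
    """, format="turtle")

    doc = URIRef("http://example.org/doc/1")

    # dc:creator may be a literal name or a URI reference; normalize to names.
    names = []
    for creator in graph.objects(doc, DC.creator):
        if isinstance(creator, Literal):
            names.append(str(creator))
        else:
            labels = [l for l in graph.objects(creator, RDFS.label)
                      if isinstance(l, Literal) and l.language in (None, "en")]
            names.append(str(labels[0]) if labels else str(creator))

    # And finding incoming relations requires the whole data set at hand:
    incoming = list(graph.subject_predicates(doc))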

2. Encountering general JSON data (not RDF-specific) on the web, I sometimes come across quite crude stuff, mostly representing a database slice, or gratuitous exports from some form of O/R mapper. It may look meaningful, but often shows signs of ad-hoc design, unstable modeling without "proper" domain understanding, and/or implementation leakage. However, the data is accessible to most web programmers without the need to get the domain properly, no matter how poor the representation may be. JSON is language native. The use case is to have users (programmers) be able to spin around a specific, boxed and digested slice of data. Ideally you should also be able to find and follow links in it (basically the values which match something like /^(http(s)?:|\/|\.\/)\S+/ ...).
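As a sketch of that casual consumption (my own toy code, not from any spec), here's how a programmer might spin around such a slice and fish out the followable links using roughly that pattern:

    import json
    import re

    # Roughly the pattern above: absolute http(s) URIs plus root- and
    # document-relative paths.
    LINK_PATTERN = re.compile(r'^(https?:|/|\./)\S+')

    def find_links(value):
        """Recursively yield string values that look like followable links."""
        if isinstance(value, dict):
            for v in value.values():
                yield from find_links(v)
        elif isinstance(value, list):
            for v in value:
                yield from find_links(v)
        elif isinstance(value, str) and LINK_PATTERN.match(value):
            yield value

    data = json.loads('{"title": "Some item", "seeAlso": "http://example.org/other"}')
    print(list(find_links(data)))  # ['http://example.org/other']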

3. If I know and control the data, I can frame it in a usage scenario (such as rendering navigable web pages with summaries of entities) based on a specific domain (such as an article document, its revisions and author, etc.). Here there is great potential for reducing the full data into something rawer and more (web) programming language native. This is where a JSONic approach fits the bill. Examples of how such data can look include the JSON data of e.g. the New York Times and LCSH. The Linked Data API JSON is especially worth mentioning, since it explicitly reduces the data for the casual use so many need.
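For illustration only (this shape is made up, not taken from any of the services just mentioned), a framed article could come out as simply as:

    # A hypothetical frame of an article, its author and revisions, reduced
    # for rendering a web page; expressed as the JSON a consumer would get.
    article = {
        "id": "http://example.org/articles/42",
        "type": "Article",
        "title": "An Example Article",
        "created": "2011-07-01",
        "creator": {
            "id": "http://example.org/persons/1",
            "name": "Some Author",
        },
        "revisions": [
            {"id": "http://example.org/articles/42/rev/1", "issued": "2011-07-01"},
            {"id": "http://example.org/articles/42/rev/2", "issued": "2011-07-03"},
        ],
    }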

Point 1 is just a basic observation: for general processing, RDF is best used (produced and consumed) as RDF, and nothing else. It can represent a domain with high fidelity, and it merges and can be navigated in a way no other data model I've seen supports.

Point 2 is about the quick and sometimes dirty. Cutting some corners to get from A to B without stopping for directions. You cannot do much more than that though, and in some cases "B" might not be where you want to go. But it works, and if the use case is well understood, anything more will be considered waste by anyone not wanting the bigger context.

Point 3, then, is about how to go from 1 to 2. This is what I firmly believe the focus of RDF as JSON should be. And since 2 is many things, there may be no general answer. But there is at least one: how to represent a linked resource on the web, for which RDF exists, as a concise bounded description, showing inherent properties, outgoing links and the incoming links each application considers relevant. And about how to do this in JSON with high fidelity, yet immediately consumable by someone not wanting more than "just the data".
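One hedged sketch of what such a description could look like (all keys here are hypothetical): inherent properties and outgoing links on the resource itself, with the application's selection of incoming links grouped under their own key:

    description = {
        "id": "http://example.org/articles/42",
        "title": "An Example Article",
        "subject": ["http://example.org/topics/rdf",
                    "http://example.org/topics/json"],
        "incoming": {
            # the inverse relations this particular application cares about
            "references": ["http://example.org/articles/57"],
        },
    }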

Many people have expressed opinions about these things of course. You should read posts by e.g. Leigh Dodds and Nathan Rixham, and look at some JSON Serialization Examples. Also monitor e.g. the Linked JSON W3C mailing list and of course the ongoing work of JSON-LD. Related to the Linked Data API and its "instrumental" JSON is also a recent presentation by Jeni Tennison: Data All the Way Down. It's short and very insightful. End-users have different needs than re-users!

Early on (over a year ago) I drafted Gluon. I have not used it much since. A related invention I have used, though, is SparqlTree. While it isn't really a mechanism for defining how to map RDF terms to JSON (rather, it is a way to formulate SPARQL selects digestible into compact results), it does the job quite well for specific scenarios. It is very useful for creating frames to work on, where code paths are fully deterministic, and where there is a one-way direction of relations (which is needed in JSON trees, as opposed to RDF graphs, where we can follow rel and rev alike). Admittedly I've done less than I should to market SparqlTree. But then again, it is a very simple and instrumental solution over an existing technology. I recently gave a glimpse of how I use it in a mail concerning the "Construct Where of SPARQL 1.1".
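To give the flavor of it (this is the gist as I'd paraphrase it, not SparqlTree's actual syntax): flat SELECT rows, named by convention, get digested into a one-directional tree:

    # Imagine these rows coming from a SPARQL SELECT over articles and their
    # revisions; variable names and data are made up for the example.
    rows = [
        {"article": "http://example.org/articles/42",
         "article_title": "An Example Article",
         "article_rev": "http://example.org/articles/42/rev/1"},
        {"article": "http://example.org/articles/42",
         "article_title": "An Example Article",
         "article_rev": "http://example.org/articles/42/rev/2"},
    ]

    articles = {}
    for row in rows:
        node = articles.setdefault(row["article"],
                                   {"title": row["article_title"], "revs": []})
        node["revs"].append(row["article_rev"])
    # => one article object, with both revisions nested under it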

Reflecting on all of this, I'm quite convinced that anything like RDF/XML or Turtle is beyond what JSON should ever be used for. That is, support for all kinds of general RDF, using prefixes (which I love when I need to say anything about anything) and exposing the full, rich, internationalized and extensibly datatyped world of literals, is beyond the scenarios where JSON is useful. If you need full RDF, use Turtle. Seriously. It's the best! It's rather enjoyable to write, and I can consume it with any RDF API or SPARQL.

The only case I can think of where "full RDF in JSON" might apply is for machine-to-machine data where for some reason only JSON is viable. For this, I can see the value of having Talis' RDF/JSON standardized. It is used in the wild. It is reminiscent of the SPARQL results in JSON format, which to me is also quite machine-like (and the very reason I invented SparqlTree in the first place!). I'd never hand-author it, nor prefer to work on it undigested. But that's ok. If handed to me, I'd read it into a graph as quickly as possible, and that'd be dead simple to do.
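From memory, the shape is roughly this (subjects keying objects of predicates, each predicate holding a list of typed value objects), and "dead simple" really does mean a couple of nested loops:

    # Roughly the Talis RDF/JSON shape, sketched from memory; URIs are examples.
    rdf_json = {
        "http://example.org/articles/42": {
            "http://purl.org/dc/terms/title": [
                {"type": "literal", "value": "An Example Article", "lang": "en"}
            ],
            "http://purl.org/dc/terms/creator": [
                {"type": "uri", "value": "http://example.org/persons/1"}
            ],
        }
    }

    # Digesting it: subject -> predicate -> list of value objects.
    for subject, predicates in rdf_json.items():
        for predicate, values in predicates.items():
            for value in values:
                pass  # add a triple to your graph of choice here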

So where does this leave us? Well, the Gluon I designed contains a general indecision: the split into a raw and a compact form. The problem is that they overlap. You can fold parts of the data into the compact form. This is complex, confusing and practically useless. Also, the raw form is just another one in the plethora of more or less "Turtle in JSON" designs which have cropped up in the last few years. I doubt that any such hybrid is usable: either you know RDF and should use Turtle, or you don't and you want simple JSON, without the richness of inlined RDF details.

My current intent is to remove the raw form entirely, and to design the profile mechanism so that it is "airtight". I also want to make it as compact as possible, true to RDF idioms but still "just JSON". Another standing goal is that a profile, if present, should make it possible to get RDF back out of the JSON. This way, there is a possibility of adding the richer context and integrability of RDF to certain forms of well-designed JSON. This of course implies that Gluon-profile-compatible JSON will be considered well designed. But that is a goal. It has to look good for someone not knowing RDF!

I have a strawman of a "next generation Gluon profile" in the works. I doubt that you can glimpse my design from that alone, but anyway.

Some things to note (a rough sketch of a profile-bearing document follows after the list):
  • The 'default' feature will be more aligned with the @vocab mechanism of RDFa 1.1 (and JSON-LD)
  • Keywords ('reserved') can be redefined. There are preset top-level keys, but that's it. (A parser could parameterize that too of course.)
  • No CURIEs - every token is "imported" from a vocabulary.
  • Types will be powerful. They'll determine default 'vocab' for a resource description (i.e. JSON object), and you can also import terms locally for a type (so that a Person title is foaf:title although 'title' is globally from 'dc').
  • If there are multiple values for a term (i.e. multiple triples with the same subject and predicate), a defined prefix or suffix will be added to the term. This is an experiment to make this nagging problem both explicit and automatic.
  • The 'define' will be reduced to a much less needed component. Using 'autocoerce', pattern matching on values will be bravely used to coerce mainly date, dateTime and URI references to their proper types.
  • Incoming links can be represented as 'inverseOf' attributes, thus making it possible to frame more of a graph as a tree.
  • Named bnodes are out (though they might be snuck in via a "_:" link protocol..). Anonymous bnodes are just fine.
This is a design sketch though. Next steps are to work on adapting my test implementations and usage thereof.
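To make the bullets above a bit more tangible, here is my rough extrapolation of what a profile-bearing document could look like; none of these keywords are final, and some may not even survive into the real strawman:

    doc = {
        "profile": {
            "default": "http://purl.org/dc/terms/",   # cf. @vocab in RDFa 1.1
            "autocoerce": True,                        # dates and URI refs by pattern
            "types": {
                "Person": {
                    "vocab": "http://xmlns.com/foaf/0.1/",
                    # locally, 'title' would mean foaf:title, not dc:title
                },
            },
        },
        "title": "An Example Article",                 # dc:title via 'default'
        "created": "2011-07-01",                       # autocoerced to a date
        "creator": {
            "type": "Person",
            "title": "Mr",                             # foaf:title via the type
        },
        "subject_list": [                              # a suffix marking many values
            "http://example.org/topics/rdf",
            "http://example.org/topics/json",
        ],
    }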

An auxiliary but very interesting goal is the possibility of using these profiles in a high-level API wrapper around an RDF graph, making access to it look similar to using Gluon JSON as is (but with the added bonus of "reaching down" the abstraction to get at the details when needed). (This is the direction I've had in mind for any development of my nowadays aged Oort Python O/R mapper. More importantly, the current W3C RDF/Structured Data API design work also leans towards such features, with the Projection interface.)
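A toy sketch of that wrapper idea, assuming an rdflib graph underneath and a Gluon-like term map (all names here are hypothetical, and this is not the W3C Projection interface itself):

    from rdflib import Graph, URIRef

    class Projection:
        """Expose a graph node through plain, JSON-like key access."""
        def __init__(self, graph, focus, term_map):
            self.graph = graph        # the full graph stays reachable...
            self.focus = focus
            self.term_map = term_map  # plain keys mapped to property URIs

        def __getitem__(self, key):
            # ...but day-to-day access looks like digging into Gluon JSON.
            return self.graph.value(self.focus, self.term_map[key])

    graph = Graph()  # imagine this parsed from somewhere
    doc = Projection(graph, URIRef("http://example.org/articles/42"),
                     {"title": URIRef("http://purl.org/dc/terms/title")})
    # doc["title"] -> the dc:title value, with the graph still there underneath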

(Note that profiles will reasonably not be anything like full "JSON schemas". It's about mapping terms to URIs, and as little else as possible to handle datatyping and the mismatch between graphs and JSON trees. There is a need for determining whether a term has one or many values, but as noted, I'm working on making that as automatic as possible. Casting datatypes is also needed in some cases but should be kept to a minimum.)

Finally, I really want to stress that I want to support the progress of JSON-LD! I really hope for the outcome to be a unification of all these efforts. The current jungle of slightly incompatible "RDF as JSON" sketches is quite confusing (and I know, Gluon is one of the trees in that jungle). I believe JSON-LD and the corresponding W3C list is where the action is. Since there is work in JSON-LD on profiles/contexts, and a general discussion of what the use cases are, I hope that this post and my future Gluon profile work can help in this progress! For me, Gluon is the journey and I hope JSON-LD is the destination. But there are many wills at work here, so let's see how it all evolves.