
Sunday, July 03, 2011

Resources In Various Frames of JSON

I've been mulling over the role(s) JSON should play in representing RDF for the last couple of days (well, the last year or so really).

Having worked with RDF for some years now, more or less full time, in specific contexts (legal information, and lately educational data), I'm getting the hang of some usage patterns. I'm also following the general Linked Data community (as well as various REST endeavors, Atom applications, and various, mostly dynamic-language-based, platforms). In all of this, I've made some observations:

1. If I want to use RDF fully, anything but a full RDF API (Graph navigation, Resource and Literal objects and all) often causes a lot of friction. That is, unless I already know what "framing" I need of the data, I really need a whole data set to dig through, for purposes of e.g. finding incoming relations, looking up rdfs:label of types and properties, filtering on languages, handling irregular data (such as dc:creator used with both literals and URI references) and so on.

2. Encountering general JSON data (not RDF specific) on the web, I sometimes come across quite crude stuff, mostly representing a database slice, or gratuitous exports from some form of O/R mapper. It may look meaningful, but often shows signs of ad-hoc design, unstable modeling without "proper" domain understanding and/or implementation leakage. However, the data is accessible for most web programmers without the need to get the domain properly, no matter how poor this data representation may be. JSON is language native. The use case is to have users (programmers) be able to spin around a specific, boxed and digested slice of data. Ideally you should also be able to find and follow links in it. (Basically the values which match something like /^(http(s)?:|\/|\.\/)\S+/ ...).

3. If I know and control the data, I can frame it in a usage scenario (such as for rendering navigable web pages with summaries of entities) based on a specific domain (such as an article document, its revisions and author, etc.). Here there is great potential for reducing the full data into something rawer and more (web) programming language native. This is where a JSONic approach fits the bill. Examples of how such data can look include the JSON data of e.g. New York Times and LCSH. The Linked Data API JSON is especially worth mentioning, since it explicitly reduces the data for the casual use so many need.

Point 1 is just a basic observation: for general processing, RDF is best used (produced and consumed) as RDF, and nothing else. It can represent a domain in high fidelity, and merges and is navigable in a way no other data model I've seen supports.

Point 2 is about quick and sometimes dirty. Cutting some corners to get from A to B without stopping for directions. You cannot do much more than that though, and in some cases, "B" might not be where you want to go. But it works, and if the use case is well understood, anything more will be considered waste for anyone not wanting the bigger contexts.
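As a sketch of that corner-cutting "find and follow links" move (using the pattern from point 2; the function name and sample data are just made up for illustration), walking an arbitrary JSON tree is usually all it takes:

import json
import re

# The pattern from point 2: absolute http(s) URLs or relative ("/", "./") references.
LINKISH = re.compile(r'^(http(s)?:|/|\./)\S+')

def find_links(data, path=()):
    """Yield (path, value) for every link-looking string in a decoded JSON tree."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from find_links(value, path + (key,))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            yield from find_links(value, path + (index,))
    elif isinstance(data, str) and LINKISH.match(data):
        yield path, data

doc = json.loads('{"title": "Example", "seeAlso": ["/docs/1", "http://example.org/2"]}')
for path, link in find_links(doc):
    print(path, link)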

Point 3, then, is about how to go from 1 to 2. This is what I firmly believe the focus of RDF as JSON should be. And since 2 is many things, there may be no general answer. But there is at least one: how to represent a linked resource on the web, for which RDF exists, as a concise bounded description, showing inherent properties, outgoing links and whatever incoming links the application considers relevant. And how to do this in JSON, in high fidelity but immediately consumable by someone not wanting more than "just the data".

Many people have expressed opinions about these things of course. You should read posts by e.g. Leigh Dodds and Nathan Rixham, and look at some JSON Serialization Examples. Also monitor e.g. the Linked JSON W3C mailing list and of course the ongoing work of JSON-LD. Related to the Linked Data API and its "instrumental" JSON is also a recent presentation by Jeni Tennison: Data All the Way Down. It's short and very insightful. End-users have different needs than re-users!

Early on (over a year ago) I drafted Gluon. I have not used it much since. A related invention I have used, though, is SparqlTree. While it isn't really a mechanism for defining how to map RDF terms to JSON (rather, it formulates SPARQL selects whose results are digested into compact trees), it does the job quite well for specific scenarios. It is very useful for creating frames to work on, where code paths are fully deterministic, and where there is a one-way direction of relations (which is needed in JSON trees, as opposed to RDF graphs where we can follow rel and rev alike). Admittedly I've done less than I should to market SparqlTree. But then again, it is a very simple and instrumental solution over an existing technology. I recently gave a glimpse of how I use it in a mail concerning the "Construct Where of SPARQL 1.1".
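For flavour, here is a rough sketch of the kind of digestion I mean (not SparqlTree's actual conventions, just the general move of folding flat SELECT bindings into a one-way tree; the variable names are made up):

def rows_to_tree(rows, key="article", nested=("revision",)):
    """Fold flat SPARQL SELECT bindings into a one-way tree: one node per
    distinct ?article, with its ?revision values gathered under it."""
    tree = {}
    for row in rows:
        node = tree.setdefault(row[key], {"uri": row[key], "title": row.get("title")})
        for name in nested:
            if row.get(name):
                node.setdefault(name + "s", []).append(row[name])
    return list(tree.values())

rows = [  # as bindings might come back from a SELECT over articles and revisions
    {"article": "/publ/sfs/1999:175", "title": "An Example", "revision": "/rev/1"},
    {"article": "/publ/sfs/1999:175", "title": "An Example", "revision": "/rev/2"},
]
print(rows_to_tree(rows))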

Reflecting on all of this, I'm quite convinced that anything like RDF/XML or Turtle is beyond what JSON should ever be used for. That is, support for all kinds of general RDF, using prefixes (which I love when I need to say anything about anything) and exposing the full, rich, internationalized and extensibly datatyped world of literals, is beyond the scenarios where JSON is useful. If you need full RDF, use Turtle. Seriously. It's the best! It's rather enjoyable to write, and I can consume it with any RDF API or SPARQL.

The only case I can think of where "full RDF in JSON" might apply is for machine-to-machine data where for some reason only JSON is viable. For this, I can see the value of having Talis' RDF/JSON standardized. It is used in the wild. It is reminiscent of the SPARQL results in JSON, which to me is also quite machine-like (and the very reason I invented SparqlTree in the first place!). I'd never hand-author it or prefer to work on it undigested. But that's ok. If handed to me I'd read it into a graph as quickly as possible, and that'd be dead simple to do.
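To show how simple: a minimal sketch of loading a Talis-style RDF/JSON structure (a dict of subject, then predicate, then a list of value objects) into an rdflib graph:

from rdflib import BNode, Graph, Literal, URIRef

def parse_rdf_json(data):
    """Load a Talis-style RDF/JSON structure ({s: {p: [value objects]}}) into a Graph."""
    graph = Graph()
    bnodes = {}

    def node(key):
        # Blank nodes are written as "_:name"; everything else is a URI reference.
        return bnodes.setdefault(key, BNode()) if key.startswith("_:") else URIRef(key)

    for s, predicates in data.items():
        for p, objects in predicates.items():
            for obj in objects:
                if obj["type"] in ("uri", "bnode"):
                    o = node(obj["value"])
                else:
                    o = Literal(obj["value"], lang=obj.get("lang"),
                                datatype=obj.get("datatype"))
                graph.add((node(s), URIRef(p), o))
    return graph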

So where does this leave us? Well, the Gluon I designed contains a fundamental indecision: the split into a raw and a compact form. The problem is that they are overlapping. You can fold parts of the data into compact form. This is complex, confusing and practically useless. Also, the raw form is just another one in the plethora of more or less "Turtle in JSON" designs which have cropped up in recent years. I doubt that any such hybrid is usable: either you know RDF and should use Turtle, or you don't and you want simple JSON, without the richness of inlined RDF details.

My current intent is to remove the raw form entirely, and design the profile mechanism so that it is "air tight". I also want to make it as compact as possible, true to RDF idioms but still "just JSON". It will also still be a goal that a profile, if present, can be used to get RDF from the JSON. This way, there is a possibility of adding the richer context and integratability of RDF to certain forms of well designed JSON. This of course implies that Gluon-profile compatible JSON will be considered well designed. But that is a goal. It has to look good for someone not knowing RDF!

I have a strawman of a "next generation gluon profile" in the works. I doubt that you can glimpse my design from that alone, but anyway.

Some things to note:
  • The 'default' feature will be more aligned with the @vocab mechanism of RDFa 1.1 (and JSON-LD)
  • Keywords ('reserved') can be redefined. There are preset top-level keys, but that's it. (A parser could parameterize that too of course.)
  • No CURIEs - every token is "imported" from a vocabulary.
  • Types will be powerful. They'll determine default 'vocab' for a resource description (i.e. JSON object), and you can also import terms locally for a type (so that a Person title is foaf:title although 'title' is globally from 'dc').
  • If there are multiple values for a term (i.e. multiple triples with the same subject and predicate), a defined prefix or suffix will be added to the term. This is an experiment to make this nagging problem both explicit and automatic.
  • The 'define' will be reduced to a much less needed component. Using 'autocoerce', pattern matching on values will be bravely used to coerce mainly date, dateTime and URI references to their proper types.
  • Incoming links can be represented as 'inverseOf' attributes, thus making it possible to frame more of a graph as a tree.
  • Named bnodes are out (though they might be snuck in via a "_:" link protocol..). Anonymous bnodes are just fine.
This is a design sketch though. Next steps are to work on adapting my test implementations and usage thereof.
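Purely to make the notes above a bit more tangible, here is a guess at how a document under such a profile might read. To be clear, this is my own illustration of the bullet points, not the actual strawman syntax; every key name below is hypothetical.

# Hypothetical illustration only; not the actual "next generation gluon profile" syntax.
doc = {
    "profile": {
        "vocab": "http://purl.org/dc/terms/",               # default vocab, @vocab-style
        "Person": {"vocab": "http://xmlns.com/foaf/0.1/"},  # a type setting its own vocab
    },
    "this": "http://example.org/articles/1",
    "type": "Article",
    "title": "An Example Article",
    "created": "2011-07-03",                     # autocoerced to a date by pattern matching
    "creator": {"type": "Person", "name": "Some Author"},
    "subject_list": [                            # multiple values marked by a defined suffix
        "http://dbpedia.org/resource/Linked_data",
    ],
}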

An auxiliary but very interesting goal is the possibility of using these profiles in a high-level API wrapper around an RDF graph, making access to it look similar to using Gluon JSON as is (but with the added bonus of "reaching down" the abstraction to get at the details when needed). (This is the direction I've had in mind for any development of my nowadays aged Oort Python O/R mapper. More importantly, the current W3C RDF/Structured Data API design work also leans towards such features, with the Projection interface.)
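A very small sketch of the kind of wrapper I have in mind (this is neither Oort's nor the W3C draft's actual API, just attribute-style access over an rdflib graph through a term-to-URI map):

from rdflib import Graph, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

class GraphProjection:
    """Attribute-style access over an rdflib Graph, resolved through a term map."""

    def __init__(self, graph, subject, terms):
        self._graph, self._subject, self._terms = graph, subject, terms

    def __getattr__(self, name):
        values = list(self._graph.objects(self._subject, self._terms[name]))
        return values[0] if len(values) == 1 else values

graph = Graph().parse(data="""
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.org/me> foaf:name "Someone"; foaf:knows <http://example.org/you> .
""", format="n3")

me = GraphProjection(graph, URIRef("http://example.org/me"),
                     {"name": FOAF.name, "knows": FOAF.knows})
print(me.name, me.knows)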

(Note that profiles will reasonably not be anything like full "JSON schemas". It's about mapping terms to URI:s and as little else as possible to handle datatyping and the mismatch between graphs and JSON trees. There is a need for determining if a term has one or many values, but as noted I'm working on making that as automatic as possible. Casting datatypes is also needed in some cases but should be kept to a minimum.)

Finally, I really want to stress that I want to support the progress of JSON-LD! I really hope for the outcome to be a unification of all these efforts. The current jungle of slightly incompatible "RDF as JSON" sketches is quite confusing (and I know, Gluon is one of the trees in that jungle). I believe JSON-LD and the corresponding W3C list is where the action is. Since there is work in JSON-LD on profiles/contexts, and a general discussion of what the use cases are, I hope that this post and my future Gluon profile work can help in that progress! For me Gluon is the journey and I hope JSON-LD is the destination. But there are many wills at work here, so let's see how it all evolves.

Saturday, September 27, 2008

Resources, Manifests, Contexts

Just took a quick look at oEmbed (found via a context I was led to by Emil Stenström (an excellent Django promoter in my surroundings btw. Kudos.)).

While oEmbed is certainly quite neat, I very much agree with the criticism regarding the lack of RESTfulness, and that they have defined a new metadata carrier. I think oEmbed would work very well as an extension element in Atom Entry documents (which already have most of the properties oEmbed (re-)defines). Or by reusing (in such atom entry docs) e.g. Media RSS, as Stephen Weber suggested.

Granted, if (as I do hope) RESTfulness and Atom permeation on the web become much better established (approaching ubiquity), this would be dead easy to define further down the line. (And signs of this adoption continue to pop up, even involving the gargantuans..)

But since it wasn't done right away, oEmbed is to some extent another part of the fragmented web data infrastructure — already in dire need of unification. It's not terrible of course, JSON is very effective — it's just too context-dependent and stripped to work for much more than end-user consumption in "vertical" scenarios. While oEmbed itself is such a scenario, it could very well piggy-back on a more reusable format and thus promote much wider data usability.

A mockup (with unsolicited URI minting in the spaces of others) based on the oEmbed quick example could look like:

<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:oembed="http://oembed.com/ns/2008/atom/">
  <id>tag:flickr.com,2008:/3123/2341623661_7c99f48bbf_m.jpg</id>
  <title>ZB8T0193</title>
  <summary></summary>
  <content src="http://farm4.static.flickr.com/3123/2341623661_7c99f48bbf_m.jpg"
           type="image/jpg"/>
  <oembed:photo version="1.0" width="240" height="160"/>
  <author>
    <name>Bees</name>
    <uri>http://www.flickr.com/photos/bees/</uri>
  </author>
  <source>
    <id>tag:flickr.com,2008:/feed</id>
    <author>
      <name>Flickr</name>
      <uri>http://www.flickr.com/</uri>
    </author>
  </source>
</entry>

The main point, which I have mentioned before, is that Atom Entries work extremely well as manifests of resources. This is something I hope the REST community will pick up in a large way. Atom feeds complement the RESTful infrastructure by defining a standard format for resource collections, and from that it seems quite natural to expose manifests of singular resources as well using the same format.

In case you're wondering: no, I still believe in RDF. It's just easier to sell uniformity one step at a time, and RDF is unfortunately still not well known in the instrumental service shops I've come in contact with (you know, the ones where integration projects pop up every so often, mainly involve hard technology, and rarely if ever reuse domain knowledge properly). So I choose to support Atom adoption to increase resource orientation and uniformity — we can continue on to RDF if these principles continue to gain momentum (which they will, I'm sure).

Thus I also think we should keep defining the bridge(s) from Atom to RDF for the 3.0 web.. There are some sizzling activities in that respect, which can be seen both on the Atom syntax mailing list and the semantic web list. My interest stems from what I currently do at work (and as a hobby it seems). Albeit this is from a very instrumental perspective — and as a complement, rather than an actual bridge.

In part, it's about making Atom entries from RDF, in order for simple RESTful consumers to be able to eat some specific Atom crumbs from the semantic cakes I'm most certainly keeping (the best thing since croutons, no doubt). These entries aren't complete mappings, only selected parts, semantically more coarse-grained and ambiguous. While ambiguity corrupts data (making integration a nightmare), it is used effectively in "lower-case sem-web" things such as tagging and JSON. (Admittedly I suppose it's ontologically and cognitively questionable whether it can ever be fully avoided though.)
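As a sketch of that kind of gleaning, using rdflib and ElementTree (the choice of dct:title and dct:modified is just an assumption here, and the mapping is simplified to the point of naivety):

from xml.etree import ElementTree as ET
from rdflib import Graph, Namespace, URIRef

DCT = Namespace("http://purl.org/dc/terms/")
ATOM = "http://www.w3.org/2005/Atom"

def atom_entry_from(graph, subject):
    """Glean a coarse Atom entry (id, title, updated, link) from an RDF description."""
    entry = ET.Element("{%s}entry" % ATOM)
    ET.SubElement(entry, "{%s}id" % ATOM).text = str(subject)
    ET.SubElement(entry, "{%s}title" % ATOM).text = str(graph.value(subject, DCT.title))
    ET.SubElement(entry, "{%s}updated" % ATOM).text = str(graph.value(subject, DCT.modified))
    ET.SubElement(entry, "{%s}link" % ATOM, href=str(subject))
    return ET.tostring(entry, encoding="unicode")

# e.g. atom_entry_from(my_graph, URIRef("http://example.org/publ/sfs/1999:175"))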

We have proper RDF at the core, so this is about meeting "half way" with the gist of keeping things simple without losing data quality in the process. To reduce and contextualize for common services — that is, at the service level, not the resource level. (I called this "RA/SA decoupling" somewhere, for "Resource Application"/"Service Application". Ah well, this will all be clarified when I write down the COURT manifesto ("Crafting Organisation Using Resources over Time"). :D)

Hopefully, this Atom-from-RDF stuff will be reusable enough to be part of my Out of RDF Transmogrifier work. Which (in my private lab) has been expanded beyond Python, currently with a simple Javascript version of the core mapping method ("soonish" to be published). Upon that I'm aiming for a pure js-based "record editor", ported from an (py-)Oort-based prototype from last year. I also hope the service parts of my daytime work may become reusable and open-sourced as well in the coming months. Future will tell.

Tuesday, June 17, 2008

Brave New World

[I wrote this little silliness more than a year ago. Didn't find an appropriate time to post it (it served mostly as critique of the computerized semantics I usually like so much). Now, with Sweden (where I live) on the brink of legalizing a little Echelon of our own (or their own, as it were), it seems as good a time as any.]

.. Sometimes I wonder if what we do is just a fancy way of digging our own graves.

Echelon> Event 116822052.26#113967:
Instant Message detected: thinking.. done (in 0.032 ms).
Storing inferred knowledge as:
[ foaf:mbox_sha1sum "286e3265...";
emo:likes <urn:bbdb:1997:python> ].
in context:
<urn:gov:surveillance:event:116822052.26:113967>
a :NLPResult;
:timestamp 116822052.26032;
:trustability 0.681132 .
Done. Running ruleset..
CoreBot>
- No known subject for:
owl:inverseFunctionalProperty
foaf:mbox_sha1sum with "286e3265..."
, using BNode _:a21c3dd723ffe39
(subject of 31102 previous facts).
- rdf:type foaf:Person inferred for _:a21c3dd723ffe39.
- No unknown knowledge gained. Done.
EmotionTracker>
- increasing likability of <urn:bbdb:1997:python> .
MicroSoftAgent>
- increasing demand of <urn:bbdb:2006:ironpython> .
- increasing threat of <urn:bbdb:1998:opensource> .
Backtrack threshold at 1337
- refactoring reasoner dependencies.. Done.
Done (passed through 4211 reasoners).
Done (in 0.11 ms).

Echelon> System Update:
"Owner Change. Renaming."
Done (in 0.002 ms).

SkyNet>

Tuesday, May 27, 2008

Representation Taxation

It seems my thoughts about Atom Entries will be delayed a while longer. This post began life as a response to this "BlubML" post by Bill de hÓra, but quickly turned out to be about my view of data models, as used in the IT industry yesterday, today and tomorrow. An industry rather ridden with technology wars, which have reduced the status of information into mere fodder. Leaving meaning, shared and reused concepts, discoverability, integratability and hence interoperability in many ways just a far-off vision many never even have time to think about (leaving us thrashing data in the technology trenches).

The mentioned post reflects (by quotation) upon markup and the perceived "angle bracket tax". I can definitely get that. But, as mentioned in a comment by Iain Buckingham, the issue at hand is probably mainly about what the markup is supposed to represent. (And now I totally leave the subject of lightweight text formats (which I like, by the way); I'll focus on data models, not syntax). Take the content model of XML. It is a mixture of plain text (useful for primitive data like numbers, dates and other "human language" constructs) and different structure blocks/markers (elements, attributes, pi:s, some more) which have no useful semantics apart from a label (sometimes in a namespace), ordering and nesting. This has proven very useful for representing documents, where documents range from books via web pages (also as application views) to vector graphics and so on.

The "defining nature" of documents is difficult to pin down, but it seems we get by fairly efficiently anyway. Mainly since this semi-structure is intended (and, incidentally, often indented) for one-way (sometimes layered) rendering of output in turn meant for human consumption, which actually works without information precision (albeit sometimes with more or less apparent negative consequences).

Documents aside, let's continue to data records. More or less the first use of XML was as a serialization of the RDF data model. This model has real semantics, where unique resources are described with statements, whose parts (subject, predicate, object) are either resources (all three) or primitive values (for objects). Unfortunately, RDF still struggles to emerge as a useful representation in the industry at large, partly due to its initial appearance as XML (from whose abundance of namespaces and literal URI:s even many XML neophytes have fled in panic), partly due to its data model being more academic (rooted in logics and AI) and thus less approachable than the (IMHO) less expressive but quite succinct object oriented, class based data model of most common modern programming languages.

This latter model has always been fragmented by language differences, and up close hard to pin down due to conflicting computer science details (OO encapsulation, coupling with function and message metaphors etc). And while objects often live in (connect as) graphs, their identity has been hidden, often being a mere memory address. Quite parallel with this, data has been pushed into the even more mechanical and implementation focused semantics of the relational model, in SQL databases, for decades. While these records do have explicit IDs (or composites, or...), they are for internal use within a given application context. Any usage beyond that must be secured by convention outside of it. The often cumbersome use of RDBs (where "splitting and slicing" all but the flattest records is needed) has been minimalistically remedied by the O/R-mapping tools in the OO languages, which have proven exceptionally useful in introspectable and/or dynamic languages. But it's mostly just a prettier surface upon the same old relational model. Signs of this can be seen in the specific restrictions (that many O/R:s have) placed upon the object semantics of the "host" language (inheritance, composition, coupling with functionality), and not least the invention of SQL-like languages, mostly one per mapping implementation (although some credit should go to earlier efforts by the object database people for trying to standardize such things).

But there are very useful aspects of an OO-based representation. As a common denominator, JSON should be held up as an extremely useful format (albeit quite void of namespacing). It represents the "bare bones" data record format which is isomorphic with the OO languages (be they class- or prototype-based). And although narrower in scope, it has more defined semantics than XML, since it explicitly distinguishes between properties and lists (and does not introduce the artificial controversy of whether to use "attributes" or "elements"). But it's still a long way from the identifiable resources and datatyped or human language typed literals of RDF. Depending on context/domain needs, this may or may not be a painful point to realize (if not, JSON is indeed the simpler thing that actually works).

The attempt to express similar semantics in W3C:s XML Schema is to me a painful one, and in view of the complexities it adds and how little it brings in succinctness and clarity (and immediately/apparently useful general semantics), it is a sign of that schema technology's failure. Use of RelaxNG isn't too bad though, and I find nothing inherently wrong in some formats, such as Atom, whose role is to bring a packaging and insulating channel format for resources (in the URI, REST, and most importantly the RDF sense of the word). Atom defines a model which is scoped and simple (as in easy to grasp, limited in expressiveness). This is done with a text specification (RFC 4287 and its kin) coupled with a (non-normative) RelaxNG (compact) schema for easier machine use. For some things this can be a fitting approach. I don't believe it's the best way to fully express records though (but perhaps to glean a narrow but useful aspect of such).

Now, let's turn the focus back to the objects often used as records with e.g. O/R technologies. It's been mainly through the emergence of adherence to the principles of REST, in turn (in practice) dependent upon URI:s and ideals of "cool URI:s" (for e.g. sustainable connectedness), that these objects have gained identities usable outside of a given, often ad-hoc defined, changing and quite internal (think "data silo") management of the data at hand. This process is the evolution of the Web as a data platform (which I believe should be the basis of a stable distributed computing platform, where one is needed), and it seems to me that the use of RDF is now a very promising next step. Since it embraces (indeed builds upon) this uniform and global identification scheme. Since it is "data first" oriented — you can use it to state things about resources which are meaningful even for machine processing without any invention/specification on your part. And since (due to URI:s and the semantic precision) it's less vulnerable to coupling with internal solutions, which "bare" O/R-mapped objects often are (in the same way, albeit less humongous, as automatically generated XML-schema defined API-fragments that hide in SOAP envelopes), due to implementation details (and sometimes a hard-wired and thus fragile URI minting mechanism).

With JSON, RDF shares the decoupling of types/classes ("frames" in AI) and properties of resources. But in JSON properties are simple, totally local keys, not first-class resources that can themselves be described in a uniform manner. And JSON has other upper limits (such as the literal language mechanism), which can't be surpassed without adding semantics, "somehow" (e.g. conventions). This may work for specific needs (and very well), but is hard to scale to the levels of interoperability needed for a proper web-based data record format. (Similar problems riddle its ragged step-cousin microformats, which also lacks formal definitions for how to handle this (especially decentrally so).)

Back in "just XML"-land, as said, it seems to work for the semi-structured logical enigmas that the above mentioned "documents" are, but as for data records, the XML format will be a serialization of some very specific, elsewhere explicitly defined model (document formats as well of course, but these often make some use of the more exotic mixed content model XML supports). One such model is the "web resources over time" that Atom works for (coupled with the HTTP infrastructure and RESTful principles). Granted, such XML-based formats can be used without worrying too much about the mysterious XML infoset model (including the RDF/XML format, its free-form character aside). But to be useful, they need to be standardized and extensively supported by the larger community. Otherwise, apart from needing to design and implement the deserialized model from scratch, the creeping complexities that an ill-thought XML model design lets in (or forever locks out any extensibility from) may emerge whenever you want to express something idiomatic to your data.

That's all. This was mostly letting out some thoughts that have been building up pressure in my mind lately. And became yet another prelude to my thoughts still simmering regarding how to effectively use Atom with payloads of RDF. And also about some of the cruder ways, if you need to gain ground quickly (but more or less imperfectly). You can do that with JSON (fairly ok, but with JSON you're quite localized anyway, so why take the Atom highway if you only need to use your bike?), microformats (not ok by my standards (again: where's your model?)), RDFa (if you have overlapping needs it can be brilliant), or with simplistic Atom extensions (could be very XML-taxing and thus risky; but done "right" it's kind of a "poor man's RDF", a bleak but somewhat useful shadow of the real thing).

(Admittedly this post also turned out to be a rant with questionable focus, but why not. Pardon my posting in quite a state of fatigue.)

Wednesday, April 16, 2008

Getting lxml 2 into my gnarly old Tiger; going on ROA

In case anyone might google for something like:
lxml "Symbol not found" _xmlSchematronNewParserCtxt

, I'd like to reiterate the steps I just took to get lxml 2 (2.1 beta 1 to be precise) up and running on OS X 10.4 (which I haven't yet purged in favour of the shiny Leopard, for reasons of fear, uncertainty and doubt, plus some quite mixed (if pleasurable) impressions from that cat when running on my wicked new iMac (and my funky old iBook, in case you're dying to know)).

In short: you need a fresh MacPorts (1.6, I had 1.3.something):

$ sudo port selfupdate

, and then, wipe your current libxml2 (along with any dependent libraries, MacPorts will tell you which):

$ sudo port uninstall libxml2@2.6.23_0

(You might not need the version number; I had tried to get the new libxml2 before this, so there was an "Activating libxml2 2.6.31_0 failed: Image error: Another version of this port" thingy going on.) Then just:

$ sudo port install libxml2 libxslt

and you'll be ready to do the final step (after removing any misbehaving lxml (compiled with the old libxml2) from site-packages):

$ sudo easy_install lxml

But why?

Because lxml 2 is a marvellous thing for churning through XML and HTML with Python. There were always XPath, XSLT and C14N in lxml 1.x too (admittedly in 4Suite as well; Python has had strong XML support for many, many years). But in lxml 2, you also get:
  • CSS selectors
  • much improved "bad HTML" support (including BeautifulSoup integration)
  • a relaxed comparison mechanism for XML and HTML in doctests!
And more, I'm sure.
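A small taste of the first two (the sloppy markup here is deliberate, and made up):

from lxml import html

# lxml.html happily parses tag soup, and its elements support CSS selectors directly.
doc = html.fromstring("<p class=entry>See <a href='http://example.org'>this</A> page")
for link in doc.cssselect("p.entry a"):
    print(link.get("href"), link.text_content())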

So, why all this Yak Shaving tonight? Just this afternoon I put together an lxml and httplib2-based "REST test" thing. Doesn't sound too exciting? I think it may be, since I use it with the old but still insanely modern doctest module, namely its support for executing plain text documents as full-fledged tests. This gives the possibility (from a tiny amount of code) to run plain text specifications for ROA-apps with a single command:
Set up namespaces:

>>> xmlns("http://www.w3.org/1999/xhtml")
>>> xmlns(a="http://www.w3.org/2005/Atom")

We have a feed:

>>> http_get("/feed",
... xpath("/a:feed/a:link[@rel='next-archive']"))
...
200 OK
Content-Type: application/atom+xml;feed
[XPath 1 matched]

Retrieving a document entry with content-negotiation:

>>> http_get("/publ/sfs/1999:175",
... headers={"Accept": "application/pdf"},
... follow=True)
...
303 See Other
Location: /publ/sfs/1999:175/pdf,sv
[Following..]
200 OK
Content-Type: application/pdf
Content-Language: sv
[Body]

That's the early design at least. Note that the above is the test. Doctest checks the actual output and complains if it differs. Declarative and simple. (And done before, of course.)
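The runner behind such a spec doesn't need to be more than the standard doctest module pointed at the text file, with the helpers exposed as globals. Roughly (the resttest module name is a placeholder for wherever http_get, xmlns and xpath end up living):

import doctest
import resttest  # placeholder module providing http_get, xmlns and xpath

failures, _ = doctest.testfile(
    "feed_spec.txt",  # the plain text specification
    globs={"http_get": resttest.http_get,
           "xmlns": resttest.xmlns,
           "xpath": resttest.xpath},
    optionflags=doctest.ELLIPSIS,
)
raise SystemExit(failures)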

Ok, I was a bit overambitious with all the "ROA"; this is mostly for checking your content-negotiation, redirects and so on. But I think it'll be useful for a large system I'm putting together, which will (if things work out well) at its core have a well-behaved Atom-based resource service loaded with (linked) RDF payloads. Complete with feed archiving and tombstones, it will in turn be used as fodder for a SPARQL service. (Seriously; it's almost "SOA Orchestration Done Right", if such a thing can be done.) But this I'll get back to..

(.. I may call the practice "COURT" — Crafting Organization Using Resources over Time. It's about using Atom Entries as manifests of web resources with multiple representations, fully described by RDF and working as an over-time serialized content repository update stream. (Well, that's basically what Atom is/works perfectly for, so I'm by no means inventing grand things here — grand as they IMHO are.))

Sunday, October 07, 2007

Out To Transmogrify Resource Descriptions

Finally, I've bundled another release of Oort. And OortPub! I've modularized the package into two Eggs, separating the core O/R things from the (experimental) WSGI web stuff.

At the cheeseshop you'll now find them both, as Oort and OortPub respectively. I did this not least since the core Oort package is very useful on its own, in conjunction with other tools altogether. I've used it for a lot of stuff lately, none of which involved OortPub. Check out e.g. the QueryContext for some new features..

I hope this will become useful to others as well.

The website got a little overhaul as well, to reflect this change, and to include some more documentation (not so much more yet, but the basic stuff is in place). Oh, by the way, it's generated using Oort objects (among other things like Genshi, ReStructuredText, google-code-prettify and of course RDFLib).

Tuesday, March 27, 2007

Cutting Edges

For all of you out there writing droves of RDF, churning out endless amounts of Notation 3, and doing it all in Vim.. (we must be in the millions I'm sure), be sure to check out my little RDF Namespace-complete — giving you RDF Vocabulary/Model Omni-Completion for Vim 7+. Requires Vim compiled with Python and RDFLib.

On a related note, I've been using RDFLib a lot lately (around the clock as it seems). It's an honor to contribute in an ever so small way to it, and it's great to see it progressing so well (it's been mature for years if that needs to be said). The SPARQL-support is working, and along with Chimezie Ogbuji's FuXi there is now an even stronger Python RDF platform to work with.

Monday, February 12, 2007

Oort In Space, In Time, At 0.3.2

Oort is now at 0.3.2. See the release history for details. Also, check out the tutorial — now functional at last.

Apart from fixes, the API is slightly cleaner and RdfQueries can now be updated, directly or by loading a dictionary. This means that JSON to RDF is a no-brainer when using Oort to do the semantic enhancements. (This is very new though, use with care.)

Friday, January 26, 2007

A Great Day for Specificity

dbpedia.org - Using Wikipedia as a Web Database
[...] dbpedia can also be seen as a huge ontology that assigns URIs to plenty of concepts and backs these URIs with dereferenceable RDF descriptions.
We have advanced a tech level!

This is really good timing, as I just recently considered using TDB URNs for referencing non-retrievable things and concepts (using TDB-versions of URLs to Wikipedia articles and web sites). Finding out whether the idea of TDB and DURI URNs has long since been abandoned was my next step down that path. (Then there's the use of owl:InverseFunctionalProperty and bnodes (or "just-in-time"-invented ad-hoc URIs) and throwing in owl:sameAs if official ones are discovered..)

With DBPedia, a lot of that becomes unnecessary. The availability of public, durable URIs for common concepts will surely ease the integration of disparate sources of knowledge. That is, if we start to use dbpedia-URIs in our statements.

And gone will be the strangeness of saying "I'm interested in [the word 'semantics']", "This text was edited with [the website <http://vim.org>]" and "I was born in [the wikipedia article about 'Stockholm']"..

Wednesday, January 17, 2007

Knowledge, The Bits and Pieces

In a recent post, Lee Feigenbaum talks about Using RDF on the Web. Naturally I find this very interesting. In my work on the Oort toolkit, I use an approach of "removing dimensions": namespaces, I18N (optionally), RDF-specific distinctions (collections vs. multiple properties) and other forms of graph traversal. This is done with declarative programming very similar to ORM tools in dynamic languages (mainly class declarations with attributes describing the desired data selection). The resulting objects become simple value trees — with the added bonus of automatic JSON serialization.
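To illustrate the declarative shape (not the actual Oort API, just the gist of it): a class whose attributes declare which properties to pull, producing a plain value tree that json.dumps happily eats.

import json
from rdflib import Graph, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")

class Facet:
    """Declarative selection of properties into a plain dict (a value tree)."""
    terms = {}

    def __init__(self, graph, subject):
        self.data = {name: str(graph.value(subject, uri))
                     for name, uri in self.terms.items()}

class DocumentFacet(Facet):
    terms = {"title": DC.title, "creator": DC.creator}

graph = Graph().parse("document.rdf")  # assumed: some local RDF/XML file
doc = DocumentFacet(graph, URIRef("http://example.org/doc/1"))
print(json.dumps(doc.data))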

Ideally, this approach will be deterministically reversible. I have not yet implemented it, but the idea is that the declared classes (RdfQueries in Oort — let's call them "facets" here) could take JSON as input and reproduce triples. Using checksums of the JSON would make over-the-web editing possible.

Since the task of "updating" a subgraph is somewhat difficult at best, I think a basic "wipe and replace" approach may be simplest. There are many dangers here of course (removing a relation must not remove knowledge about its object — unless perhaps if that object is a bnode..).

Albeit all of this is Python to the core now, nothing in the design — declarative as it is — prevents the approach from being more general. Indeed, such facets, were they themselves serializable, could be used as structured content retrieval over-the-web too. Ok, maybe I'm reinventing SPARQL now.. Or should I use SPARQL for their remote execution? It seems reasonable (I mean that's exactly how ORMs do SQL).

Now, I seem to end up in the RPC/RESTful camp with this. A solution to that could be: use the facets on the client, having them use SPARQL for retrieval. Then you have clients working against any kind of SPARQL endpoint, mashup-style. Still, if facets are completely reversible, they may be powerful, aware tools, and perhaps an alternative to SPARQL in intercommunication? That's a pipe dream for me right now though.

The SPARQL-way is of course only for reading data. Fine, the RDF model may be too expressive (fine-grained) to be practical for direct over-the-web editing in specific situations anyway. A confined approach such as this JSON+checksums+"careful with nested knowledge" may be better for this.

I think of JSON-RPC here as I view e.g. microformats with GRDDL — it's leverage, not a final solution. RDF models are the ideal; we may just need reversible simplifications to gain mass usability. I touched upon JSON for integrated apps, and stuff like Atom for "talking to strangers", in response to a previous post by Lee. His post, which I refer to above, hints at better stuff than just Atom if we want RDF all the way also in more specific apps.

So, what I'm saying is really: I also desire the best of two worlds. The RDF model is sound and extremely valuable, but after all, simple domain-specific object manipulation is what makes the web go round. A solution may be some form of O/R mapping for RDF. The difference is not wanting to forget the data layer (there is no monster in the closet as there is with raw SQL..), just streamline some of the work with it.

Let's hope 2007 will be a good year for the knowledge revolution.

Tuesday, January 16, 2007

Cobwebs and Rainbows, Round 3

Web 3.0. Like 2.0, but meaningful and integrated (as opposed to unqualified punyformats and mess-ups). And sensible (whatever that means.. "think Scandinavian Ajax" — to cite my boss at V.).

I'll be damned if that revolution won't finally take us from the information age to the knowledge age. Of course, this would mean yet another technological shift — one I've been longing for for a long time. Let's see (disregarding "Enterprise Web X" here; sorry Java and whatNot- uhm.. dotNet):
  • Web 1.0: Perl + random flatfiles and some raw SQL (websafe black, white and blue)
  • Web 1.5: PHP + SQL (off-black on white, green and blue hues, some orange)
  • Web 2.0: Rails + generated SQL (pink on white, huge fonts, lots of attitude — think adolescence)
  • Web 3.0: Python + RDF (color is not an issue)
Or I'll die trying. Truly. Die. Trying. (Well at least I may end up drinking as much in my thirties as I did in my twenties. Cheers.)

Then for 4.0, I suppose fully OWL-enabled reasoners and modernized but still brutal OCaml could be fun.

[I'm not utterly serious here, but a bit. Please don't take too much offense.]

Saturday, December 23, 2006

Oort To Got Itself Updated

Version 0.3.1 of Oort is out, with some fixes, cleanup and new stuff. See the release history for details. (Also, note that Oort now resides at http://oort.to/).

I've been using RDF a lot lately (in work with legal information for the Swedish government), which gave me the opportunity for more real-life testing of this toolkit. I hope you find it of value too.

The base technologies of Oort — RDFLib, Genshi and Paste — have gotten updates lately too. Be sure to check them out.

Tuesday, October 03, 2006

One Oort to RDF, o-dot-three

Another one hits the Dust.. feed. Uhm. Ok, seriously.. I just put up Oort 0.3, with some new stuff. Mainly a rudimentary template for Paste. And the switch from Kid to Genshi for templating.

I've also begun writing some kind of tutorial.. Like many others, I use dp.SyntaxHighlighter for the code samples (the Oort pages themselves are written in reStructuredText and assembled with the other stuff through Python and Genshi).

And the Oort pages are looking a tad bit better, I hope. ;)

Friday, September 29, 2006

Nightly Rewrites

After literally an all-nighter yesterday, Oort 0.2 is now up for grabs. A very sparse example at that page will serve as a stand-in for a proper introduction for a while. After the cleanup mentioned earlier, I can finally toy with some operator overloading sugar (>> for now). There are some tests included now, and then there is the addition of RdfDisplay and JsonDisplay to remove the need for templates to carry out serialization of query results to those formats.

Wednesday, September 27, 2006

Whisky Pasting Space Clouds

My spirits somewhat rekindled by this nice presentation about WSGI, Paste and Pylons, I finally undertook the actually easy task of switching from CherryPy to a pure Paste-utilizing WSGI. Why? What? Last things first: Oort, as previously mentioned, is a little web toolkit for building RDF-driven web apps. It has a very specific scope, and thus the why-answer is: because it's better suited as a small WSGI-thing composable with whatever else you want to build legac* ehrm, regular web apps with. So go ahead and publish your RDF Graphs (I'm sure you have an abundance of them) with Oort, right among your other object branches in CherryPy, or mod_python, or among the URLs of a Pylons app, or whatever.

WSGI is a really neat thing, being so simple (to use) as it is. Same goes for RDF, in my opinion, but that's another rant. (A quite reoccurring one at that, if you happen to know me..)

Oh, where is Oort, you may ask? I finally put it up for display at the CheeseShop tonight. Like so many other things, it's alpha software and all. It is used for real work though. Next thing could be a little how-to (and why not a nice logo..).