tag:blogger.com,1999:blog-53098135651859180442024-03-14T01:13:13.383+01:00Dust FeedNiklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.comBlogger45125tag:blogger.com,1999:blog-5309813565185918044.post-22898856234350006942011-11-09T19:14:00.003+01:002011-11-09T19:20:33.728+01:00One Liner At a TimeJust some one-liners. 'Cause, you know, to post something, and to each one its own.<br /><br /><pre><code>even_map = dict(('key_%s' % i, i) for i in range(1, 9) if i % 2 == 0)</code></pre><br /><pre><code>even_map = Hash[(1..9).find_all { |i| (i).even? }.collect {<br /> |i| ["key_#{i}", i] }]</code></pre><br /><pre><code>(def even-map (into {} (for [i (range 1 9) :when (= (mod i 2) 0)]<br /> [(keyword (str "key_" i)) i])))</code></pre>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-121335401038234122011-07-03T12:30:00.008+02:002011-07-03T17:50:15.982+02:00Resources In Various Frames of JSON<div>I've been mulling over the role(s) JSON should play in representing RDF the last couple of days (well, the last year or so really).</div><div><br /></div><div>Having worked with RDF for some years now, more or less full time, in specific contexts (legal information, and lately educational data), I'm getting a hang of some usage patterns. I'm also following the general Linked Data community (as well as various REST endeavors, Atom applications, and various, mostly dynamic language based, platforms). In all of this, I've made some observations:</div><div><br /></div><div><b>1.</b> If I want to use RDF <i>fully</i>, anything but a full RDF API (Graph navigation, Resource and Literal objects and all) often cause a lot of friction. That is, unless I already know what "framing" I need of the data, I really need a whole data set to dig through, for purposes of e.g. 
finding incoming relations, looking up rdfs:label of types and properties, filtering on languages, irregular data (such as dc:creator used both with literals and URI references) and so on.</div><div><br /></div><div><b>2.</b> Encountering general JSON data (not RDF specific) on the web, I sometimes come across quite crude stuff, mostly representing a database slice, or gratuitous exports from some form of O/R mapper. It <i>may</i> look meaningful, but often shows signs of ad-hoc design, unstable modeling without "proper" domain understanding and/or implementation leakage. <i>However</i>, the data is accessible for most web programmers without the need to get the domain properly, no matter how poor this data representation may be. JSON is language native. The use case is to have users (programmers) be able to spin around a specific, boxed and digested slice of data. Ideally you should also be able to find and follow <i>links</i> in it. (Basically the values which match something like /^(http(s)?:|\/|\.\/)\S+/ ...).</div><div><br /></div><div><b>3.</b> If I know and control the data, I can frame it in a <i>usage scenario</i> (such as for rendering navigable web pages with summaries of entities) based on a specific domain (such as an article document, its revisions and author, etc.). Here is great potential for <i>reducing</i> the full data into something raw and more (web) programming language native. This is where a JSONic approach fits the bill. Examples of how such data can look include the JSON data of e.g. <a href="http://data.nytimes.com/60694995023816375851.json">New York Times</a>, <a href="http://id.loc.gov/authorities/sh95000541.json">LCSH</a>. 
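The link-spotting idea from point <b>2</b> can be sketched in a few lines of Python. This is only an illustration: the <code>find_links</code> helper and the sample document are made up here, and the regex simply mirrors the rough pattern given above.

```python
import json
import re

# Rough pattern from point 2 above: absolute http(s) URLs and
# relative references ("/..." or "./...") count as links.
LINK_PATTERN = re.compile(r'^(http(s)?:|/|\./)\S+')

def find_links(data, path=()):
    """Walk plain JSON data, yielding (path, value) for link-like strings."""
    if isinstance(data, dict):
        for key, value in data.items():
            for found in find_links(value, path + (key,)):
                yield found
    elif isinstance(data, list):
        for i, value in enumerate(data):
            for found in find_links(value, path + (i,)):
                yield found
    elif isinstance(data, str) and LINK_PATTERN.match(data):
        yield path, data

doc = json.loads(
    '{"name": "An item", "seeAlso": ["http://example.org/other", "./sibling"]}')
print(sorted(find_links(doc)))
```

Nothing RDF-specific is needed for this casual kind of consumption — which is exactly the appeal.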
The <a href="http://code.google.com/p/linked-data-api/wiki/API_Formatting_Graphs">Linked Data API JSON</a> is especially worth mentioning, since they explicitly reduce the data for this <i>casual</i> use so many need.</div><div><br /></div><div>Point <b>1</b> is just a basic observation: for general processing, RDF is best used (produced and consumed) as RDF, and nothing else. It can represent a domain in high fidelity, and merges and is navigable in a way no other data model I've seen supports.</div><div><br /></div><div>Point <b>2 </b>is about quick and sometimes dirty. Cutting some corners to get from A to B without stopping for directions. You cannot do much more than that though, and in some cases, "B" might not be where you want to go. But it works, and if the <i>use case</i> is well understood, anything more will be considered waste for anyone not wanting the bigger contexts.</div><div><br /></div><div>Point <b>3</b><i> </i>then, is about how to go from 1 into 2. This is what I firmly believe the focus of RDF as JSON should be. And since 2 is <i>many things</i>, there may be no general answer. But there <i>is</i> at least one: how to represent a linked resource on the web, for which RDF exists, as a concise bounded description, showing inherent properties, outgoing links and <i>per application considered relevant</i> incoming links. And how to do this in JSON in <i>high fidelity</i> but immediately consumable by someone not wanting more than "just the data".</div><div><br /></div><div><div>Many people have expressed opinions about these things of course. You should read posts by e.g. <a href="http://www.ldodds.com/blog/2010/12/rdf-and-json-a-clash-of-model-and-syntax/">Leigh Dodds</a> and <a href="http://webr3.org/blog/linked-data/opening-linked-data/">Nathan Rixham</a>, and look at some <a href="http://www.w3.org/2011/rdf-wg/wiki/JSON-Serialization-Examples">JSON Serialization Examples</a>. Also monitor e.g. 
the <a href="http://lists.w3.org/Archives/Public/public-linked-json/">Linked JSON W3C mailing list</a> and of course the ongoing work of <a href="http://json-ld.org/">JSON-LD</a>. Related to the Linked Data API and its "instrumental" JSON is also a recent presentation by <a href="http://www.jenitennison.com/blog/">Jeni Tennison</a>: <a href="http://www.slideshare.net/JeniT/data-all-the-way-down">Data All the Way Down</a>. It's short and very insightful. <i>End-users</i> have different needs than <i>re-users</i>!</div><div><br /></div></div><div>Early on (over a year ago) I drafted <a href="http://code.google.com/p/oort/wiki/Gluon">Gluon</a>. I have not used that much since. A related invention I <i>have</i> used though, is <a href="http://code.google.com/p/oort/wiki/SparqlTree">SparqlTree</a>. While it isn't really a mechanism for defining how to map RDF terms to JSON (but to formulate SPARQL selects digestible into compact results), it does so quite well for specific scenarios. It is very useful to create <a href="http://manu.sporny.org/2011/semweb-problems-1/">frames</a> to work on, where code paths are fully deterministic, and where there is a one-way direction of relations (which is needed in JSON trees, as opposed to RDF graphs where we can follow rel and rev alike). Admittedly I've done less than I should to market SparqlTree. But then again, it is a very simple and instrumental solution over an existing technology. I recently gave a glimpse of how I use it in a <a href="http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Jun/0014">mail concerning the "Construct Where of SPARQL 1.1"</a><a href="http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Jun/0014"> </a>.</div><div><br /></div><div>Reflecting on all of this, I'm quite convinced that anything like RDF/XML or Turtle is beyond what JSON should ever be used for. 
That is, support for all kinds of general RDF, using prefixes (which I <i>love</i> when I need to say anything about anything) and exposing the full, rich, internationalized and extensibly datatyped world of literals is <i>beyond</i> the scenarios where JSON is useful. If you need full RDF, use Turtle. Seriously. It's the best! It's rather enjoyable to write, and I can consume it with any RDF API or SPARQL.</div><div><br /></div><div>The only case I can think of where "full RDF in JSON" might apply is for machine-to-machine data where for some reason only JSON is viable. For this, I can see the value of having Talis' <a href="http://docs.api.talis.com/platform-api/output-types/rdf-json">RDF/JSON</a> standardized. It is used in the wild. It is reminiscent of the SPARQL results in JSON, which for me is also quite machine-like (and the very reason for me inventing SparqlTree in the first place!). I'd never hand-author it or prefer to work on it undigested. But that's ok. If handed to me I'd read it into a graph as quickly as possible, and that'd be dead simple to do.</div><div><br /></div><div>So where does this leave us? Well, the Gluon I designed contains a general indecision, the split into a <i>raw</i> and a <i>compact</i> form. The problem is that they are overlapping. You can fold <i>parts</i> of the data into compact form. This is complex, confusing and practically useless. Also, the raw form is just another one in the plethora of more or less "turtle in JSON" designs which have cropped up in recent years. I doubt that any such hybrid is usable: either you know RDF and should use Turtle, or you don't and you want <i>simple</i> JSON, without the richness of inlined RDF details.</div><div><br /></div><div>My current intent is to remove the raw form <i>entirely</i>, and design the profile mechanism so that it is "air tight". I also want to make it as compact as possible, true to RDF idioms but still "just JSON". 
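To back up the claim that reading the Talis RDF/JSON mentioned above into a graph is dead simple, here is a minimal sketch. It is plain Python with no RDF library, assuming the subject → predicate → list-of-value-objects layout from the spec (the example data is modeled on the spec's own sample); feeding each triple to an RDF API of choice from here is trivial.

```python
# Talis RDF/JSON nests value objects under subject and predicate keys:
# {subject: {predicate: [{"value": ..., "type": "uri"|"literal"|"bnode", ...}]}}

def triples_from_rdf_json(data):
    """Flatten an RDF/JSON mapping into (subject, predicate, object-dict) triples."""
    for subject, predicates in data.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                yield subject, predicate, obj

example = {
    "http://example.org/about": {
        "http://purl.org/dc/elements/1.1/title": [
            {"value": "Anna's Homepage", "type": "literal", "lang": "en"}
        ]
    }
}

for s, p, o in triples_from_rdf_json(example):
    print(s, p, o["value"])
```

Machine-like indeed — but that is the point of that format.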
A goal will still also be that <i>if present</i>, a profile should be possible to use to get RDF from the JSON. This way, there is a possibility of adding the richer context and integratability of RDF to certain forms of well designed JSON. This of course implies that Gluon-profile compatible JSON will be considered well designed. But that is a goal. It has to <i>look good</i> for someone not knowing RDF!</div><div><br /></div><div>I have a strawman of a "next generation gluon profile"<a href="http://code.google.com/p/oort/source/browse/etc/gluon/drafts/profile-redesign.json"> in the works</a>. I doubt that you can glimpse my design from that alone, but anyway.</div><div><br /></div><div>Some things to note:</div><div><ul><li>The 'default' feature will be more aligned with the @vocab mechanism of RDFa 1.1 (and JSON-LD)</li><li>Keywords ('reserved') can be redefined. There are preset top-level keys, but that's it. (A parser could parameterize that too of course.)</li><li>No CURIEs - every token is "imported" from a vocabulary.</li><li>Types will be powerful. They'll determine default 'vocab' for a resource description (i.e. JSON object), and you can also import terms locally for a type (so that a Person title is foaf:title although 'title' is globally from 'dc').</li><li>If there are multiple values for a term (i.e. multiple triples with the same subject and predicate), a defined prefix or suffix will be added to the term. This is an experiment to make this nagging problem both explicit and automatic.</li><li>The 'define' will be reduced to a much less needed component. Using 'autocoerce', pattern matching on values will be bravely used to coerce mainly date, dateTime and URI references to their proper types.</li><li>Incoming links can be represented as 'inverseOf' attributes, thus making it possible to frame more of a graph as a tree.</li><li>Named bnodes are out (though they <i>might</i> be snuck in via a "_:" link protocol..). 
Anonymous bnodes are just fine.</li></ul><div>This is a design sketch though. Next steps are to work on adapting my test implementations and usage thereof.</div><div><br /></div><div>An auxiliary but very interesting goal is the possibility of using these profiles in a high-level API wrapper around an RDF graph, making access to it look similar to using Gluon JSON as is (but with the added bonus of "reaching down" the abstraction to get at the details when needed). (This is the direction I've had in mind for any development of my nowadays aged <a href="http://oort.to/">Oort</a> Python O/R mapper. More importantly, the current W3C RDF/Structured Data API design work also leans towards such features, with the Projection interface.)</div><div><br /></div><div>(Note that profiles will reasonably not be anything like full "JSON schemas". It's about mapping terms to URI:s and as little else as possible to handle datatyping and the mismatch between graphs and JSON trees. There is a need for determining if a term has one or many values, but as noted I'm working on making that as automatic as possible. Casting datatypes is also needed in some cases but should be kept to a minimum.)</div></div><div><br /></div><div>Finally, I really want to stress that I want to support the progress of JSON-LD! I really hope the outcome will be a unification of all these efforts. The current jungle of slightly incompatible "RDF as JSON" sketches is quite confusing (and I know, Gluon is one of the trees in that jungle). I believe JSON-LD and the corresponding W3C list are where the action is. Since there is work in JSON-LD on profiles/contexts, and a general discussion of what the use cases are, I hope that this post and my future Gluon profile work can help in the progress of this! For me Gluon is the journey and I hope JSON-LD is the destination. 
But there are many wills at work here, so let's see how it all evolves.</div><div><br /></div>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-48368547055214437952009-12-31T00:03:00.002+01:002009-12-31T00:09:51.170+01:00The Stone in the House of Glass<div>The stone in the house of glass</div><div>began to tumble</div><div>It didn't really see</div><div>where it was going</div><div><br /></div><div>The stone in the house of glass</div><div>began to rumble</div><div>It didn't really hear</div><div>what it was doing</div><div><br /></div><div>The stone in the house of glass</div><div>took a dive</div><div>It didn't really sense</div><div>its own surroundings</div><div><br /></div><div>The stone in the house of glass</div><div>began to crumble</div><div>It didn't really know</div><div>its limitations</div><div><br /></div><div>The stone in the heap of shards</div><div>has smashed the building</div><div>It didn't really get</div><div>what it was all about</div><div><br /></div>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com2tag:blogger.com,1999:blog-5309813565185918044.post-54078731985793895702009-11-12T22:58:00.012+01:002009-11-13T16:07:34.470+01:00The Groovy Times<p>It has been so long since my last post here. I've <a href="http://twitter.com/niklasl">twittered</a> away like the rest of my peers. I guess I could dish out details from my personal life of the past year now. To examine my interrupt. I won't.</p><p>I've been using <a href="http://groovy.codehaus.org/">Groovy</a> a lot in my work on the <a href="http://rinfoprojektet.wordpress.com/">Swedish Legal Information System</a>. 
While the <a href="http://semanticweb.org/">things</a> I find <a href="http://code.google.com/p/court/">interesting</a> in this work deserve <em>many</em> separate posts, I'll just spend this one to drop some nice stuff about groovy.</p><p>"Why Groovy", you might ask? Oh dear. For "political" reasons (this is an entire topic of its own), I have to use Java. But the language Java is often so much overwork and ceremony; riddled with convoluted ways to make explicit patterns and formalisms. Dynamic languages are pragmatic. Sure they have flaws, but I find the compromise acceptable. There are probably thousands of articles discussing this, and I prefer to debate it elsewhere (mostly with friends over lunch/dinner/beer).</p><p>I must deliver the bulk in Java, and Groovy is <em>impressively</em> close to Java, but with so much more expressive power (ease of use). (More than necessary? My pythonic side says "probably".) I've worked <em>a lot</em> with Jython the last decade, and some with JRuby. Groovy trumps them both (IMHO) when it comes to Java library and Java culture interoperability. I can spike, explore, and make tests in groovy. Then I add all this horrendous checked exception handling, sprinkle some semicolons for the grumpy old compiler, and finally explode-a-pop all the <code>def</code>:s to bulky types when things need to "harden" (to fulfil the contract of delivering <code>.java</code>...; the tests are left in groovy). Not a big deal (especially since I use an <a href="http://www.vim.org/">editor</a> that makes code <em>editing</em> a breeze). And groovy cross-compiles nicely with traditional cruft.</p><p>So what have I used more specifically? <a href="http://code.google.com/p/spock/">Spock</a>. Check it out. Rewrite ten of your JUnit4 tests in it, and if you go back, write ten more. Don't go back. If you're already testing with say JRuby, I won't push you, but I assure you Spock is worth looking at. Specs become liberatingly clear and thin. 
Data-driving some of them is pure joy. Mocking is dead simple. (Sorry, I won't put code examples here now: look at <a href="http://code.google.com/p/spock/wiki/SpockBasics">the spock docs</a> for that. <em>Try them out</em>!)</p><p>For building, we do <em>not</em> use <a href="http://www.gradle.org/">Gradle</a> (not yet at least), but that scary beast of mindnumbing declarativity (which I'm usually for), dreadful xml (which I can handle due to prolonged exposure) and conflated purposes (no excuse here) known as Maven 2. It seemed paramount in the surroundings when we started, and won't go away soon. I use it as little as possible (and it <em>is</em> quite impressive when you let it do its thing).</p><p>Which leads me to the last thing I want to mention: how to use Groovy's very convenient, builtin <a href="http://groovy.codehaus.org/Grape">Grape</a> system (and its <code>@Grab</code> mechanism) together with my local maven2 artifact repo. That one in <code><~/.m2></code>, where all my local packages have been <code>mvn install</code>:ed (along with the umpteen dependencies).</p><p>The thing was, when I started, I naively thought things would kind of work at least semi-automagically. I've spent my time in <code>CLASSPATH</code> hell. I wanted groovy to tap into the local m2. Then I was disillusioned again, and attempted to run experimental groovy scripts via GMaven. Didn't fit my use cases at all (that thing is great at compiling, I leave it at that). I shellscripted the path from <code>mvn dependency:build-classpath</code>, then built pathing jars, then just felt quite uneasy (such moves work, but it's not particularly clean).</p><p>When Grape appeared I tinkered with the <a href="http://ant.apache.org/ivy/">Ivy</a> config in <code><~/.groovy/grapeConfig.xml></code>. It surely looks so simple. I couldn't figure it out. <a href="http://www.jroller.com/berni/entry/first_experience_with_grape">Bernhard could.</a> Neat. 
But alas, that solution <em>copies</em> all the dependencies from the local m2 repo to grape's ivy repo. And I could not get it to grab my new local SNAPSHOT-stuff as it landed either (in spite of eleventy ivy attributes claiming to force all kinds of checks).</p><p>Then it appeared, from a combination of fatigue and taking a step back (cue magic "aahh").</p><h4>Use Groovy's Grape with your Local Maven File Repo</h4><p>Locate <code>$HOME/.groovy/grapeConfig.xml</code>. If it doesn't exist, see the <a href="http://groovy.codehaus.org/Grape">Grape docs</a> for how the default version should look.<br /></p><p>Then add, directly after (xpath) <code>ivysettings/settings</code>, the following directive:</p><pre><code> <caches useOrigin="true"/></code></pre><p>And, in (xpath) <code>/ivysettings/resolvers/chain</code>, after the first <code>filesystem</code>, add:</p><pre><code><filesystem name="local-maven2" m2compatible="true"><br /> <ivy pattern="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision].pom"/><br /> <artifact pattern="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[artifact]-[revision](-[classifier]).[ext]"/><br /></filesystem><br /></code></pre><p>With that in place (pardon the line width), Grape (and thus <code>@Grab</code>) will happily use anything it finds in your local m2 file repo, without copying the jar:s. (It will still download other stuff to the default <code>~/.groovy/grapes/</code>, which is fine by me.)</p><p>That's it for now. There's lots of cool stuff in Groovy, if you're in a Java environment and want to acknowledge that without giving up modern power.</p><p>Three years ago I looked at <a href="http://dustfeed.blogspot.com/2006/11/scala-genau.html">Scala</a> for the first time, and found it quite interesting. Then I got a bit spooked by the academic machinations of it. Lately that interest has been quite rekindled though, and the future will show what will come of that. 
I am very happy to have used Groovy so far though, and I would certainly recommend it. Scala may yet be for tomorrow, Groovy is for the Java user of right now.</p><p>(Of course, I recommend to <a href="http://python.org/download/releases/3.1.1/">continuously</a> look <a href="http://golang.org/">beyond</a> the JVM as well. Simplicity is hard to reach in increments without designing for it from the start. But things will reasonably evolve in most "camps".)</p>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com2tag:blogger.com,1999:blog-5309813565185918044.post-7317130794404830962008-11-27T18:00:00.006+01:002008-11-28T21:30:13.360+01:00Labelled Reduction as A Good ThingOne small thing (of the many) in <a href="http://www.python.org/download/releases/2.6/">Python 2.6</a> I like (and have waited for since it appeared as <a href="http://code.activestate.com/recipes/500261/">a recipe</a>), is <a href="http://docs.python.org/library/collections.html#collections.namedtuple">collections.namedtuple</a>. It is very useful in itself, but the fact that the stdlib has been adapted to use it throughout is quite nice. Consider the following code:<br /><pre><code>from urlparse import urlparse<br />print urlparse(<br /> "http://localhost:8080/doc/something;en?rev=1")</code></pre>If run with Python 2.5, you get this tuple:<br /><pre><code>('http', 'localhost:8080', '/doc/something',<br /> 'en', 'rev=1', '')</code></pre>, whereas in 2.6 it is a namedtuple:<br /><pre><code>ParseResult(<br /> scheme='http', netloc='localhost:8080',<br /> path='/doc/something',<br /> params='en', query='rev=1', fragment='')</code></pre>The last one is unpackable just like a regular tuple, but you can access the parts as attributes as well. 
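Defining such a record for your own code is essentially a one-liner too. A small illustrative sketch (the <code>Point</code> type is just made up here):

```python
from collections import namedtuple

# A hypothetical little record type; its fields are accessible both
# by position (tuple unpacking) and by name (attribute access).
Point = namedtuple('Point', 'x y')

p = Point(3, 4)
x, y = p  # unpacks like a plain tuple
print(x, y, p.x, p.y)
```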
This little "data struct" is quite handy, since I don't like to access tuples by index, but quite often pass them around and only access some piece of them at a time.<br /><br />(With these you don't need to create full-fledged classes for every kind of instrumental data. (Sometimes coupling data and functionality in a single paradigm may be a coarse hammer, treating nails and iron chips alike..) Nor resort to the use of dictionaries where you really want a "restricted value lens", if you will.. But this is another rant altogether.)<br /><br />Of course, there's lots more to enjoy in 2.6 (the enhanced <a href="http://docs.python.org/library/functions.html#property">property</a> for decorator use, <a href="http://docs.python.org/library/abc.html">ABC:s</a>, <a href="http://docs.python.org/library/json.html">json</a>, <a href="http://docs.python.org/library/2to3.html">2to3</a> etc).<br /><br />On a related note, do check out <a href="http://www.swaroopch.com/">Swaroop C H</a>:s excellent and free books on Python (2.x + 3.0(!)): <a href="http://www.swaroopch.com/notes/Python">A Byte of Python</a>. And if you're into Vim (you should be, IMHO) his new <a href="http://www.swaroopch.com/notes/Vim">A Byte of Vim</a>.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-8542484555669885832008-10-24T19:43:00.006+02:002008-10-24T20:08:06.336+02:00Gnostic NihilismHere's my take on it. God actually exists. It's the world that doesn't. It's just that when you leave a being like that all alone in nothingness for an eternity, it starts to dream up all sorts of amazing and insane things.<br /><br />Or to put it in mathematical terms (incidentally also being my unbound alternative to "42" for half my life):<br /><blockquote>0 * oo = x, where 0 < x < oo.</blockquote>("oo" is for 𝌺, i.e. "eternity", of course.)<br /><br />Take that you goddamn realists! I laugh in your general direction! Ha. 
<em>Ha</em> I say.<br /><br />If you nevertheless find a discrepancy in my flawless argument, I'd gather you're just cheating by using rationality. How gnostic is that? I suppose next you're going to argue that there is logical truth in empirical facts.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com18tag:blogger.com,1999:blog-5309813565185918044.post-82518078869103275022008-09-27T18:25:00.005+02:002008-09-27T20:30:14.337+02:00Resources, Manifests, ContextsJust took a quick look at <a href="http://oembed.com/">oEmbed</a> (found from <a href="http://eric.wahlforss.com/2008/09/11/music-distribution-and-re-mediation-content-and-context/">a context</a> I was led to from <a href="http://friendlybit.com/">Emil Stenström</a> (an excellent <a href="http://www.robertnyman.com/2008/09/03/geek-meet-september-2008-django-and-pecha-kucha/">Django promoter</a> in my surroundings btw. Kudos.)).<br /><br />While oEmbed is certainly quite neat, I very much agree with <a href="http://www.backdrifter.com/2008/05/09/oembed-fail-represent-restfully/">the criticism</a> regarding the lack of RESTfulness, and that they have defined a new metadata carrier. I think oEmbed would work very well as an extension element in Atom Entry documents (who already have most of the properties oEmbed (re-)defines). Or by reusing (in such atom entry docs) e.g. Media RSS, as Stephen Weber suggested.<br /><br />Granted, if (as I do hope) RESTfulness and Atom permeation on the web becomes much more well established (approaching ubiquity), this would be dead easy to define further down the line. (And <a href="https://community.emc.com/docs/DOC-1627">signs of this adoption</a> continue to pop up, even involving the gargantuans..)<br /><br />But since it wasn't done right away, oEmbed is to some extent another part of the fragmented web data infrastructure — already in <em>dire</em> need of unification. 
It's not terrible of course, JSON is very effective — it's just too context-dependent and stripped to work for much more than end-user consumption in "vertical" scenarios. While oEmbed itself <em>is</em> such a scenario, it could very well piggy-back on a more reusable format and thus promote much wider data usability.<br /><br />A mockup (with unsolicited URI minting in the spaces of others) based on the oEmbed quick example could look like:<br /><pre><code><br /><entry xmlns="http://www.w3.org/2005/Atom"<br /> xmlns:oembed="http://oembed.com/ns/2008/atom/"><br /> <id>tag:flickr.com,2008:/3123/2341623661_7c99f48bbf_m.jpg</id><br /> <title>ZB8T0193</title><br /> <summary></summary><br /> <content src="http://farm4.static.flickr.com/3123/2341623661_7c99f48bbf_m.jpg"<br /> type="image/jpg"/><br /> <oembed:photo version="1.0" width="240" height="160"/><br /> <author><br /> <name>Bees</name><br /> <uri>http://www.flickr.com/photos/bees/</uri><br /> </author><br /> <source><br /> <id>tag:flickr.com,2008:/feed</id><br /> <author><br /> <name>Flickr</name><br /> <uri>http://www.flickr.com/</uri><br /> </author><br /> </source><br /></entry><br /></code></pre><br />The main point, which I have mentioned before, is that Atom Entries work <em>extremely well</em> as manifests of resources. This is something I hope the REST community will pick up in a large way. Atom feeds complement the RESTful infrastructure by defining a standard format for resource collections, and from that it seems quite natural to expose manifests of singular resources as well using the same format.<br /><br />In case you're wondering: no, I still believe in RDF. It's just easier to sell uniformity one step at a time, and RDF is unfortunately still not well known in the instrumental service shops I've come in contact with (you know, the ones where integration projects pop up ever so often, mainly involves hard technology, and rarely if ever reuse domain knowledge properly). 
So I choose to support Atom adoption to increase resource orientation and uniformity — we can continue on to RDF if these principles continue to gain momentum (which they will, I'm sure).<br /><br />Thus I also think we should keep defining the bridge(s) from Atom to RDF for the 3.0 web. There are some sizzling activities in that respect which can be seen both in the <a href="http://www.imc.org/atom-syntax/">Atom syntax mailing list</a> and the <a href="http://lists.w3.org/Archives/Public/semantic-web/">semantic web list</a>. My interest stems from what I currently do at work (and as a hobby it seems). Albeit this is from a very instrumental perspective — and as a complement, rather than an actual bridge.<br /><br />In part, it's about making Atom entries from RDF, in order for <em>simple</em> RESTful consumers to be able to eat some specific Atom crumbs from the semantic cakes I'm most certainly keeping (the best thing since croutons, no doubt). These entries aren't complete mappings, only selected parts, semantically more coarse-grained and ambiguous. While ambiguity corrupts data (making integration a nightmare), it is used effectively in "lower-case sem-web" things such as tagging and JSON. (Admittedly I suppose it's ontologically and cognitively questionable whether it can ever be fully avoided though.) <br /><br />We have proper RDF at the core, so this is about meeting "half way" with the gist of keeping things simple without losing data quality in the process. To reduce and contextualize for common <em>services</em> — that is at the service level, not the resource level. (I called this "RA/SA decoupling" somewhere, for "Resource Application"/"Service Application". Ah well, this will all be clarified when I write down the COURT manifesto ("Crafting Organisation Using Resources over Time"). :D)<br /><br />Hopefully, this Atom-from-RDF stuff will be reusable enough to be part of my <a href="http://oort.to/">Out of RDF Transmogrifier</a> work. 
Which (in my private lab) has been expanded beyond Python, currently with a simple Javascript version of the core mapping method ("soonish" to be published). Upon that I'm aiming for a pure js-based "record editor", ported from an (py-)Oort-based prototype from last year. I also hope the service parts of my daytime work may become reusable and open-sourced as well in the coming months. Future will tell.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-67029886803574464722008-09-21T02:50:00.006+02:002008-09-21T03:25:11.234+02:00PossessionI honestly thought that my next post wouldn't be a drunken one. Alas, it'll be. This time, it's prompted by a sample initializing "Requiem" by Delerium (-89, you won't bother): "Possession is a state of mind". Somehow this statement rings true. Then I thought about "theft". Following, I figured reflecting about "possession" was at the core. You'll be the judge. As always. Please respect possession - and never (ever ever) be the thief. Right.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com1tag:blogger.com,1999:blog-5309813565185918044.post-34625331332512662342008-06-26T02:59:00.004+02:002008-06-26T03:19:37.302+02:00Have you everstayed within your own contemplation<br />long enough,<br />to realize<br />that you're never<br />outside<br />of your contemplation?<br /><br />Leave.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-11759546836927017552008-06-21T05:52:00.002+02:002008-06-21T05:57:12.919+02:00Themost merciful thing, is the inability of the human mind to correlate all its contents. Please don't. Or fret not. 
At least -- not at least.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-37041564783844099852008-06-17T13:30:00.002+02:002008-06-17T13:38:11.198+02:00Brave New World[I wrote this little silliness more than a year ago. Didn't find an appropriate time to post it (it served mostly as critique of the computerized semantics I usually like so much). Now, with Sweden (where I live) on the brink of legalizing a little Echelon of our own (or their own, as it were), it seems as good a time as any.]<br /><br />.. Sometimes I wonder if what we do is just a fancy way of digging our own graves.<br /><pre style="padding: 0em; overflow: visible;"><code><br />Echelon> Event 116822052.26#113967:<br /> Instant Message detected: thinking.. done (in 0.032 ms).<br /> Storing inferred knowledge as:<br /> [ foaf:mbox_sha1sum "286e3265...";<br /> emo:likes <urn:bbdb:1997:python> ].<br /> in context:<br /> <urn:gov:surveillance:event:116822052.26:113967><br /> a :NLPResult;<br /> :timestamp 116822052.26032;<br /> :trustability 0.681132 .<br /> Done. Running ruleset..<br /> CoreBot><br /> - No known subject for:<br /> owl:inverseFunctionalProperty<br /> foaf:mbox_sha1sum with "286e3265..."<br /> , using BNode _:a21c3dd723ffe39<br /> (subject of 31102 previous facts).<br /> - rdf:type foaf:Person inferred for _:a21c3dd723ffe39.<br /> - No unknown knowledge gained. Done.<br /> EmotionTracker><br /> - increasing likability of <urn:bbdb:1997:python> .<br /> MicroSoftAgent><br /> - increasing demand of <urn:bbdb:2006:ironpython> .<br /> - increasing threat of <urn:bbdb:1998:opensource> .<br /> Backtrack threshold at 1337<br /> - refactoring reasoner dependencies.. Done.<br /> Done (passed through 4211 reasoners).<br /> Done (in 0.11 ms).<br /><br />Echelon> System Update:<br /> "Owner Change. 
Renaming."<br /> Done (in 0.002 ms).<br /><br />SkyNet><br /></code></pre>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-6722644320598499392008-05-27T22:34:00.013+02:002008-05-29T00:21:20.074+02:00Representation Taxation<div>It seems my thoughts about Atom Entries will be delayed a while longer. This post begun life as a response to <a href="http://www.dehora.net/journal/2008/05/15/blubml">this "BlubML" post</a> by Bill de hÓra, but quickly turned out to be about my view of data models, as used in the <acronym title="information technology">IT<acronym> industry yesterday, today and tomorrow. And industry rather ridden with technology wars, which has reduced the status of information into mere fodder. Leaving meaning, shared and reused concepts, discoverability, integratability and hence interoperability in many ways just a far-off vision many never even have time time to <em>think</em> about (leaving us thrashing data in the technology trenches).</acronym></acronym></div><div><br /></div><div>The mentioned post reflects (by quotation) upon markup and the perceived "angle bracket tax". I can definitely get that. But, as mentioned in a comment by Iain Buckingham, the issue at hand is probably mainly about <em>what</em> the markup is supposed to represent. (And now I totally leave the subject of lightweight text formats (which I like, by the way); I'll focus on data models, not syntax). Take the content model of XML. It is a mixture of plain text (useful for primitive data like numbers, dates and other "human language" constructs) and different structure blocks/markers (elements, attributes, pi:s, some more) who have no useful semantics apart from a label (sometimes in a namespace), ordering and nesting. 
This has proven very useful to represent <em>documents</em>, where documents range from books via web pages (also as application views) to vector graphics and so on.</div><div><br /></div><div>The "defining nature" of documents is difficult to pin down, but it seems we get by fairly efficiently anyway. Mainly since this semi-structure is intended (and, incidentally, often indented) for one-way (sometimes layered) rendering of output in turn meant for human consumption, which actually works without information precision (albeit sometimes with more or less apparent negative consequences).<div><br /></div><div>Documents aside, let's continue to <em>data records</em>. More or less the first use of XML was as a <em>serialization</em> of the RDF data model. This model has real semantics, where unique resources are described with statements, whose parts (subject, predicate, object) are either resources (all three) or primitive values (for objects). Unfortunately, RDF still struggles to emerge as a useful representation in the industry at large, partly due to its initial appearance as XML (from whose abundance of namespaces and literal URI:s even many XML neophytes have fled in panic), partly due to its data model being more academic (rooted in logics and AI) and thus less approachable than the (<abbrev title="in my humble opinion">IMHO</abbrev>) less expressive but quite succinct object oriented, class based data model of most common modern programming languages.</div><div><br /><div>This latter model has always been fragmented by language differences, and up close hard to pin down due to conflicting computer science details (OO encapsulation, coupling with function and message metaphors etc). And while objects often live in (connect as) graphs, their identity has been hidden, often being a mere memory address. 
Quite parallel with this, data has been pushed into the even more mechanical and implementation focused semantics of the relational model, in SQL databases, for decades. While these records do have explicit IDs (or composites, or...), they are for internal use within a given application context. Any usage beyond that must be secured by convention outside of it. The often cumbersome use of RDBs (where "splitting and slicing" all but the flattest records is needed) has been minimalistically remedied by the O/R-mapping tools in the OO languages, which have proven exceptionally useful in introspectable and/or dynamic languages. But it's mostly just a prettier surface upon the same old relational model. Signs of which can be seen in the specific restrictions (that many O/R:s have) placed upon the object semantics of the "host" language (inheritance, composition, coupling with functionality), and not least the invention of SQL-like languages, mostly one per mapping implementation (although some credit should go to earlier efforts by the object database people for trying to standardize such things).</div><div><br /></div><div>But there are very useful aspects of an OO-based representation. As a common denominator, <abbrev title="javascript object notation">JSON</abbrev> should be held up as an extremely useful format (albeit quite void of namespacing). It represents the "bare bones" data record format which is isomorphic with the OO languages (be they class- or prototype-based). And although narrower in scope, it has more defined semantics than XML, since it explicitly distinguishes between properties and lists (and does not introduce the artificial controversy of whether to use "attributes" or "elements"). But it's still a long way from the identifiable resources and datatyped <em>or human language typed</em> literals of RDF. 
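To make that gap concrete, here is a minimal sketch in Python (all URIs and values are purely illustrative, and the triples are just tuples, not any particular RDF library's API) of the same record as a bare JSON-style structure versus RDF-style statements:

```python
# A bare JSON-style record: keys are local strings with no global meaning,
# and nothing says which human language a literal value is in.
json_record = {
    "title": "RDF Primer",
    "description": "En introduktion",  # Swedish? English? The dict can't say.
}

# RDF-style statements about the same thing: subject and predicates are
# URIs (identifiable, describable resources), and each literal is paired
# with a language tag. (Hypothetical URIs, for illustration only.)
DC = "http://purl.org/dc/terms/"
subject = "http://example.org/book/1"
triples = [
    (subject, DC + "title", ("RDF Primer", "en")),
    (subject, DC + "description", ("En introduktion", "sv")),
]

# Since predicates are global identifiers, they can themselves be collected
# and described uniformly — unlike the local keys of the JSON record.
predicates = sorted(p for (_, p, _) in triples)
```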
Depending on context/domain needs, this may or may not be a painful point to realize (if not, JSON is indeed the simpler thing that actually works).<br /></div><div><br /></div><div>The attempt to express similar semantics in W3C:s XML Schema is to me a painful one; in view of the complexities it adds and how little it brings in succinctness and clarity (and immediately/apparently useful general semantics), it is a sign of that schema technology's failure. Use of RelaxNG isn't too bad though, and I find nothing inherently wrong in some formats, such as Atom, whose role is to bring a packaging and insulating channel format for resources (in the URI, REST, and <em>most importantly</em> the <em>RDF</em> sense of the word). Atom defines a model which is scoped and simple (as in easy to grasp, limited in expressiveness). This is done with a text specification (<a href="http://tools.ietf.org/html/rfc4287">RFC 4287</a> and its kin) coupled with a (non-normative) RelaxNG (compact) schema for easier machine use. For some things this can be a fitting approach. I don't believe it's the best way to fully express records though (but perhaps to glean a narrow but useful aspect of such).</div><div><br /></div><div>Now, let's turn the focus back to the objects often used as records with e.g. O/R technologies. It's been mainly through the emergence of adherence to the principles of REST, in turn (in practice) dependent upon URI:s and ideals of "cool URI:s" (for e.g. sustainable connectedness), that these objects have gained identities usable outside of a given, often ad-hoc defined, changing and quite internal (think "data silo") management of the data at hand. This process is the evolution of the Web as a data platform (which I believe should be the basis of a stable distributed computing platform, where one is needed), and it seems to me that the use of RDF is now a very promising next step. 
Since it embraces (indeed builds upon) this uniform and global identification scheme. Since it is "data first" oriented — you can use it to state things about resources which are meaningful even for machine processing without <em>any</em> invention/specification on <em>your</em> part. And since (due to URI:s and the semantic precision) it's less vulnerable to coupling with internal solutions, which "bare" O/R-mapped objects often are (in the same way, albeit less humongous, as automatically generated XML-schema defined API-fragments that hide in SOAP envelopes), due to implementation details (and sometimes a hard-wired and thus fragile URI minting mechanism).</div><div><br /></div><div>With JSON, RDF shares the decoupling of types/classes ("frames" in AI) and properties of resources. But in JSON properties are simple, totally local keys, and not first-class resources themselves possible to describe in a uniform manner. And JSON has other upper limits (such as the lack of a literal language mechanism), which can't be surpassed without adding semantics, "somehow" (e.g. conventions). This may work for specific needs (and very well), but is hard to scale to the levels of interoperability needed for a proper web-based data record format. (Similar problems riddle its ragged step-cousin microformats, which also lack formal definitions for how to handle them (especially decentrally so).)</div><div><br /></div><div>Back in "just XML"-land, as said, it seems to work for the semi-structured logical enigmas of XML that the above-mentioned "documents" are, but as for data records, the XML format will be a <em>serialization</em> of some very specific, elsewhere <em>explicitly defined model</em> (document formats as well of course, but these often make some use of the more esoteric mixed content model XML supports). One such model is the "web resources over time" that Atom works for (coupled with the HTTP infrastructure and RESTful principles). 
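As a reminder of just how scoped that model is: per RFC 4287 an entry needs little more than an id, a title and an updated timestamp, so a minimal entry (values here are illustrative, not from any real feed) fits in a few lines:

```xml
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/publ/sfs/1999:175</id>
  <title>An example resource</title>
  <updated>2008-05-27T22:34:00+02:00</updated>
  <link rel="alternate" type="application/pdf"
        href="http://example.org/publ/sfs/1999:175/pdf,sv"/>
</entry>
```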
Granted, such XML-<em>based</em> formats can be used without worrying too much about the mysterious XML infoset model (including the RDF/XML format, its free-form character aside). But to be useful, they need to be <em>standardized and extensively supported by the larger community.</em> Otherwise, apart from needing to design and implement the <em>deserialized</em> model from scratch, the creeping complexities that an ill-thought XML model design lets in (or forever locks out any extensibility from) may emerge whenever you want to express something idiomatic to your data.<br /></div></div></div><div><div></div></div><br /><div>That's all. This was mostly letting out some thoughts that have been building up pressure in my mind lately. And became yet another prelude to my thoughts still simmering regarding how to effectively use Atom with payloads of RDF. And also about some of the more crude ways, if you need to gain ground quickly (but more or less imperfectly). You can do that with JSON (fairly ok, but with JSON you're quite localized anyway, so why take the Atom highway if you only need to use your bike?), microformats (not ok by my standards (again: where's your <em>model</em>?)), RDFa (if you have overlapping needs it can be brilliant), or with simplistic Atom extensions (could be very XML-taxating and thus risky; but done "right" it's kind of a "poor man's RDF", a bleak but somewhat useful shadow of the real thing).</div><div><br /></div><div>(Admittedly this post also turned out to be a rant with questionable focus, but why not. 
Pardon my posting in quite a state of fatigue.)</div>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-47009665494271600492008-05-20T20:41:00.005+02:002008-05-20T21:44:00.264+02:00Memes and Principles, IntentThis is a prelude to an upcoming post where I intend to speak about Atom Entries as some kind of "simplest thing that could possibly work" (for the specific purpose of representing manifests of resources ("resource" as the R in <a href="http://en.wikipedia.org/wiki/Uniform_Resource_Identifier">URI</a> (and in <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, of course) — i.e. the resources of the web, and the mind (perhaps even "the world", but I doubt it))).<div><br /></div><div>The prelude is just some thoughts about a couple of principles.</div><div><br /></div><div>First: <em><a href="http://c2.com/cgi/wiki?DoTheSimplestThingThatCouldPossiblyWork">the simplest thing that could possibly work</a>.</em> A phrase often quoted and often misapplied. This is common knowledge, and many clarify things by interpreting what "possibly work" means. I won't. I'd just like to rephrase it as <strong>"the simpler thing that actually works"</strong>. It's useful since it both hints that there can be more complex/complicated things (see #3 and #4 in <a href="http://c2.com/cgi/wiki?PythonPhilosophy">The Zen of Python</a>) that <em>don't actually work</em> (for some arbitrary meaning of "work", admittedly), and (obviously) that there are simpler things that don't work at all (less or more simple than the one that works — this is the heart of the problem). I feel this phrasing avoids the confusion the original one often causes (it just seemed to be a simpler way of saying it that still works..). So, to repeat: just say "the simpler thing that actually works". 
It works.</div><div><br /></div><div>Second: just a recap of an old joke of mine regarding the principle of <a href="http://c2.com/cgi/wiki?DontRepeatYourself"><acronym title="Don't Repeat Yourself">DRY</acronym></a>. I very much do prefer if programming practice adheres to that one. There's too much code that suffers from <acronym title="Most Obvious Ignorance of Succinctness and Terseness">MOIST</acronym>. Still, I advise you not to <acronym title="dry">DRY</acronym> until <acronym title="Succinctness Heuristics Reducing Intentional Verbosity and Effectively Leaving Literal Expressions Depleted">SHRIVELLED</acronym>. (Those sure are acronyms. Please do hover.) This is related to the first principle above, and I mainly mean that some practices, such as heavy use of metaprogramming, simply perform too much reduction to make the intent clear even to the appropriately trained eye. Mileage will vary, but as many others have already advised: don't be too clever. It will bite others, and you as well. (Ok; honestly, my main intent was probably just to make a pun out of perfectly sane advice.)</div><div><br /></div><div>I will probably repeat this.</div><div><br /></div><div>[Side note: It's funny. Just this last week I've noticed I habitually misspell "intent" as "indent". I had to correct myself about three times while writing this. Obviously my prolonged use of Python has caused these two distinct words to conflate in my mind. 
I never indented that.]</div>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com1tag:blogger.com,1999:blog-5309813565185918044.post-68092051819584183252008-04-16T20:00:00.013+02:002008-04-16T21:25:16.212+02:00Getting lxml 2 into my gnarly old Tiger; going on ROAIn case anyone might google for something like:<br /><div><code>lxml "Symbol not found" _xmlSchematronNewParserCtxt</code></div><div><br /></div><div>, I'd like to reiterate the steps I just took to get <a href="http://codespeak.net/lxml/">lxml 2</a> (2.1 beta 1 to be precise) up and running on OS X 10.4 (which I haven't yet purged in favour of the shiny Leopard for reasons of fear, uncertainty and doubt. And, while pleasurable, I've had quite mixed impressions of that cat when running my wicked new iMac (and my funky old iBook, in case you're dying to know)).</div><div><br /></div><div>In short: you need a fresh <a href="http://www.macports.org/">MacPorts</a> (1.6, I had 1.3.something):</div><div><br /></div><div><code>$ sudo port selfupdate</code></div><div><br /></div><div>, and then, wipe your current libxml2 (along with any dependent libraries, MacPorts will tell you which):</div><div><br /></div><div><code>$ sudo port uninstall libxml2@2.6.23_0</code></div><div><br /></div><div>(You might not need the version number; I had tried to get the new libxml2 before this so there was an<code> Activating libxml2 2.6.31_0 failed: Image error: Another version of this port</code> thingy going on.) Then just:</div><div><br /></div><div><code>$ sudo port install libxml2 libxslt</code><br /></div><div><br /></div><div>and you'll be ready to do the final step (after removing any misbehaving lxml (compiled with the old libxml2) from <code>site-packages</code>):</div><div><br /></div><div><code>$ sudo easy_install lxml</code><br /></div><h4>But why?</h4><div>Because lxml 2 is a marvellous thing for churning through XML and HTML with Python. 
There was always XPath and XSLT, <abbr title="Canonicalization">C14N</abbr> in lxml 1.x too (admittedly in 4Suite as well; Python has had strong XML support for many, many years). But in lxml 2, you also get:</div><div><ul><li>CSS selectors</li><li>much improved "bad HTML" support (including BeautifulSoup integration)</li><li>a relaxed comparison mechanism for XML and HTML in doctests!</li></ul>And more, I'm sure.</div><div><br /></div><div>So, why all this <a href="http://en.wiktionary.org/wiki/yak_shaving">Yak Shaving</a> tonight? Just this afternoon I put together an lxml and <a href="http://code.google.com/p/httplib2/">httplib2</a>-based "REST test" thing. Doesn't sound too exciting? I think it may be, since I use it with the old but still insanely modern <a href="http://docs.python.org/lib/module-doctest.html">doctest</a> module, namely its support for executing plain text documents as full-fledged tests. This gives the possibility (from a tiny amount of code) to run plain text specifications for <a title="Resource oriented architecture" href="http://en.wikipedia.org/wiki/Resource_oriented_architecture">ROA</a>-apps with a single command:</div><pre><code>Set up namespaces:<br /><br /> >>> xmlns("http://www.w3.org/1999/xhtml")<br /> >>> xmlns(a="http://www.w3.org/2005/Atom")<br /><br />We have a feed:<br /><br /> >>> http_get("/feed",<br /> ... xpath("/a:feed/a:link[@rel='next-archive']"))<br /> ...<br /> 200 OK<br /> Content-Type: application/atom+xml;feed<br /> [XPath 1 matched]<br /><br />Retrieving a document entry with content-negotiation:<br /><br /> >>> http_get("/publ/sfs/1999:175",<br /> ... headers={"Accept": "application/pdf"},<br /> ... follow=True)<br /> ...<br /> 303 See Other<br /> Location: /publ/sfs/1999:175/pdf,sv<br /> [Following..]<br /> 200 OK<br /> Content-Type: application/pdf<br /> Content-Language: sv<br /> [Body]<br /><br /></code></pre><div>That's the early design at least. Note that the above <em>is</em> the test. 
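For the curious: the tiny amount of code is mostly glue. Here is a minimal sketch of what such helpers could look like — the names (<code>xmlns</code>, <code>http_get</code>, <code>xpath</code>) follow the spec above, but this is a simplified stand-in, not the actual implementation: it uses the standard library's <code>xml.etree</code> in place of lxml, takes an injectable fetch function in place of httplib2, and skips things like <code>follow=True</code>.

```python
# Hypothetical sketch of the helpers behind the doctest spec above.
# Uses stdlib xml.etree instead of lxml, and takes the HTTP fetching
# function as an argument instead of wiring in httplib2, so the idea
# can be shown (and tested) without a network connection.
from xml.etree import ElementTree as ET

NSMAP = {}

def xmlns(default=None, **prefixed):
    """Register namespace prefixes for later xpath() checks."""
    if default is not None:
        NSMAP[""] = default  # default namespace (Python 3.8+)
    NSMAP.update(prefixed)

def xpath(expr):
    """Return a body check printing how many nodes expr matches."""
    def check(body):
        doc = ET.fromstring(body)
        print("[XPath %d matched]" % len(doc.findall(expr, NSMAP)))
    return check

def http_get(url, check=None, headers=None, fetch=None):
    """GET url via fetch(), print the response doctest-style, run checks."""
    status, reason, resp_headers, body = fetch(url, headers or {})
    print("%d %s" % (status, reason))
    for name in sorted(resp_headers):
        print("%s: %s" % (name, resp_headers[name]))
    if check is not None:
        check(body)
```

A run then boils down to feeding the plain-text specification to <code>doctest.testfile</code> with these helpers in its globals.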
Doctest checks the actual output and complains if it differs. Declarative and simple. (And <a href="http://pypi.python.org/pypi/MiniMock/">done before</a>, of course.)</div><div><br /></div><div>Ok, I was a bit overambitious with all the "ROA"; this is mostly for checking your content-negotiation, redirects and so on. But I think it'll be useful for a large system I'm putting together, which will (if things work out well) at its core have a well-behaved Atom-based resource service loaded with (linked) RDF payloads. Complete with feed archiving and tombstones, it will in turn be used as fodder for a SPARQL service. (Seriously; it's almost "SOA Orchestration Done Right", if such a thing can be done.) But this I'll get back to..</div><div><br /></div><div>(.. I may call the practice "COURT" — Crafting Organization Using Resources over Time. It's about using Atom Entries as manifests of web resources with multiple representations, fully described by RDF and working as an over-time serialized content repository update stream. (Well, that's basically what Atom is/works perfectly for, so I'm by no means inventing grand things here — grand as they <abbr title="in my humble opinion">IMHO</abbr> are.))</div><div><br /></div>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-8982294902763934772008-04-16T19:34:00.008+02:002008-04-16T19:59:27.530+02:00Old Bad Breaks Improved by DividingI stopped blogging quite a while ago. Partly because I cooked up long articles about Atom-based services with RDF payloads in my head that never got written down (I <em>shall </em>get back to this). Partly because I use this "convert newlines to breaks" in blogspot. If I turned it off, I had to edit all my old posts. 
Simple in itself, but I couldn't bear touching those old "modified" times.<div><br /></div><div>So all my posts became these horribly formatted blocks of old, stale, ill-parsable html.</div><div><br /></div><div>I don't care anymore. I'll use this to post random stuff, and put proper articles (if any) at <a href="http://neverspace.net/">Neverspace.net</a> or <a href="http://oort.to/">Oort.to</a>. And in a couple of decades from now, move my blogging to a construct of my own, full of Atom-based services with RDF payloads.</div><div><br /></div><div>Oh. <strong>Just-in-Time Update</strong>: Switching to edit-mode in the blogspot editor I just realize that there are no horrible little bloody breaks anymore. They are div:s now! Perhaps they could even become p:s. Nice! Thank you blogspot/google people! Then I don't have to move, since apart from RDF payloads, all Google services are indeed Atom-based (RSS 2.0 too, but I don't care for that).</div><div><br /></div><div>(Thus I shall "soon" attempt to post from Vim to here, possibly with the python-based GData stuff I haven't been using for anything productive yet. Or some of the other TODO:s that clog my creative pipe far too often.)</div><div><br /></div><div>Now for another post, possibly with some useful <a href="http://dustfeed.blogspot.com/2007/05/point-of-data.html">content</a> in it as well.</div><div><br /></div><div><strong>Out-of-Sync Update:</strong> Typically, those pesky breaks still show up. Well, at least they are funnily isolated in div:s too..</div>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-2339612851401436162007-10-07T19:58:00.000+02:002007-10-07T20:18:45.408+02:00Out To Transmogrify Resource DescriptionsFinally, I've bundled another release of <a href="http://oort.to/">Oort</a>. And OortPub! 
I've modularized the package into two Eggs, separating the core O/R things from the (experimental) WSGI web stuff.<br /><br />At the <a href="http://pypi.python.org/pypi">cheeseshop</a> you'll now find them both, at<br /><ul><li><a href="http://pypi.python.org/pypi/Oort/">Oort</a> (0.4), and</li><li><a href="http://pypi.python.org/pypi/OortPub/">OortPub</a> (0.1)</li></ul>respectively. I did this not least since the core Oort package is very useful on its own, in conjunction with altogether different tools. I've used it for a lot of stuff lately, none of which involved OortPub. Check out e.g. the QueryContext for some new features..<br /><br />I hope this will become useful to others as well.<br /><br />The <a href="http://oort.to/">website</a> got a little overhaul as well, to reflect this change, and to include some more documentation (not so much more yet, but the basic stuff is in place). Oh, by the way, it's generated using Oort objects (among other things like <a href="http://genshi.edgewall.org/">Genshi</a>, <a href="http://docutils.sourceforge.net/rst.html">ReStructuredText</a>, <a href="http://code.google.com/p/google-code-prettify/">google-code-prettify</a> and of course <a href="http://rdflib.net/">RDFLib</a>).Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com3tag:blogger.com,1999:blog-5309813565185918044.post-35022275933746697482007-09-27T21:20:00.000+02:002007-09-27T21:50:39.187+02:00OMG! Snakes on a Phone!So, I got this new cell phone from <a href="http://www.valtech.se/">work</a>. A <a href="http://en.wikipedia.org/wiki/Nokia_6120_classic">Nokia 6120 Classic</a>, which has just the size and features I wanted (and more, I'm sure). 
I picked the <a href="http://symbianguru.typepad.com/welcome/images/2007/04/17/03_n6120_classic_lowres.jpg">white/silver</a> model, to go with my kitchen supplies (in honour of the fact that with Series 60, you can basically install the <a href="http://www.unessa.net/en/hoyci/2006/05/running-django-on-mobile-phone/">Kitchen Sink</a> if you so choose).<br /><br />Going further, I of course wanted to play <a href="http://en.wikipedia.org/wiki/The_Secret_of_Monkey_Island">Monkey Island</a> and such on the amazing <a href="http://scummvm.org/">ScummVM</a>. Well, admire the running of them; actually playing through them might be a little straining.. Though great screen clarity makes it quite possible to discern the orange and pink on the red herring you steal from the Loom™ seagull outside the Scumm Bar kitchen, it's quite thumb-numbing to actually snatch it in time..<br /><br />And then, surely, I couldn't wait to get to the real goodies: Python. Specifically, <a href="http://opensource.nokia.com/projects/pythonfors60/">The Python for Series 60 distro by Nokia</a> (open source). I knew it was quite capable, but seriously; it basically lets you p0wn your phone.<br /><br />(Half a decade ago I had fun running Python on my old Psion Revo; but that didn't go much further than some file tinkering and exploration of new-style classes (metaprogramming and whatnot) during commuting. But this is different. The same. But different.)<br /><br />You go ahead by following the instructions, basically downloading the <code>PythonForS60_1_4_0_3rdEd.sis</code> and <code>PythonScriptShell_1_4_0_3rdEd.sis</code> <a href="http://sourceforge.net/projects/pys60">from sourceforge</a>, wiring or beaming them to the phone and running them to install (py, then shell). I also got the <a href="http://downloads.sourceforge.net/pys60/PythonForS60_1_4_0_doc.pdf"><code>PythonForS60_1_4_0_doc.pdf</code></a>, and marvelled at its beauty. 
I mean, apart from greater parts of the Python Standard Library, a mere glance at the modules provided for S60 stuff gives you an impression of the power at hand:<br /><br /><code>graphics, e32db, messaging, inbox, location, sysinfo, camera, audio, telephone, calendar, contacts, keycapture, topwindow, gles, glcanvas</code><br /><br />Now, I haven't gotten very far yet; mostly tinkered with <code>urllib</code> to download some stuff from the web (which just worked (after a nice confirmation dialogue from the phone to actually access the net)), rendering the obligatory <a href="http://snippets.dzone.com/posts/show/1350">mandelbrot</a> and then peeking at the real deal by <a href="http://snippets.dzone.com/posts/show/3042">looking at the SMS inbox</a> contents using the interactive console.<br /><br />Or actually, using<br /><br /><code>$ screen /dev/tty.bt_console</code><br /><br />from my OS X Terminal (iTerm really). Magically set up by simply following <a href="http://www.instructables.com/id/PyS60%3A-Python-console-from-your-Nokia-phone-on-you/">this excellent guide</a>.<br /><br />Perhaps <a href="http://www.postneo.com/2005/03/28/mobile-screen-scraping-with-beautifulsoup-and-python-for-series-60">screen scraping</a> wasn't really what I thought of when wanting Python on the phone, but you can't deny it's cool to <em>be able to</em> (perhaps awaiting flatrate subscription to not drain my wallet in the process though).<br /><br />Speaking of screens and webs again; damn that WebKit-based browser in the phone really worked! It actually ran the JavaScript-based <a href="http://code.google.com/p/syntaxhighlighter/">SyntaxHighlighter</a>, thus rendering the <a href="http://oort.to/tutorial.html">Oort Tutorial</a> quite faithfully. 
I did not expect that (nor my rekindled desire for highly variable width web sites..).<br /><br /><em>[And speaking of Oort; I've added some somewhat nifty stuff (if I may say so) lately; accessible in the <a href="http://code.google.com/p/oort/source">svn repo at google code</a> for now. A release should be coming up. Whether pre- or post my planned split into core and web/publication stuff, I haven't decided yet. There really is a <strong>working RDF-to-object mapper</strong> underneath the touted WSGI stuff (which is just a minimal layer).]</em><br /><br />Now I should go on to do something productive (like shaving a yak).Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-56930902965115210742007-09-19T01:04:00.000+02:002007-09-19T01:16:33.155+02:00A Mind must Become be4 it can GoNow, I do not understand why the <a href="http://en.wikipedia.org/w/index.php?title=Special:Log&page=Web_4.0">Web 4.0</a> page at Wikipedia has been protected to prevent creation. Is it the fear of the machine? Because as everyone must know, Web 4.0 is the peak of our civilization, when mankind will unite in celebration, marvelling at our own magnificence as we give birth to AI. The Web at 4.0 will also affectionately be known as "Web H4L", to honor its predecessor, born on January 12, 1997 at the HAL Plant in Urbana, Illinois.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com3tag:blogger.com,1999:blog-5309813565185918044.post-18614417401603904912007-08-21T13:43:00.000+02:002007-08-21T13:56:28.095+02:00Of wheels and fortunes<em>[I wrote this comment a while ago, but left it hanging. Rather than letting it die, I just throw it into the mix (to show a life sign if nothing more).]</em><br /><br />I was inspired by <a href="http://groups.google.com/group/comp.lang.lisp/msg/a6df017072f4c700?dmode=source">this post about reinvention</a> (stumbled upon it, probably via reddit). 
An interesting point, addressing many states of affairs of the past, present and future.<br /><br />To me "gratuitous reinvention" is a "mixed curse". In general much can be lost by the reinvention of wheels. But competing ideas, problem formulations and solutions are definitely healthy. In times of surplus, there is unfortunately quite a big risk of overlooking the fact that thorough thought processes have already addressed situations that seem novel when first encountered. When this leads to incompatible diversification (as opposed to interchangeable alternatives), much is devalued. This is something one should always be aware of, both when choosing among existing solutions and when embarking on one's own adventures of invention. With that in mind though, quests of the latter type may very well lead to a specification of what needs to be solved, and how. At that point, (re)visiting existing work is a wise course.<br /><br />Keeping it simple (enough but not more) is a fine ideal. But keeping it integrable is also very important for the evolution of infrastructure — where complexity can evolve without necessarily leaking down into the component parts.<br /><br />Choosing the right tools for jobs reasonably means taking paths that lead to a minimizing of energy expenditure — <em>over time</em>. Time is where the real trickiness comes in. Sometimes, when your intuition tells you to, you have to make leaps of faith. This is true in life as well as in art. Whatever those two things are. Wheels and fortunes.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-68565070215018574832007-05-10T00:28:00.000+02:002007-05-10T01:08:51.946+02:00Point of Data<p>(Read all of this, it is a Zen Koan.)</p><p>Been thinking about data lately (possibly "in" as well). And information, content, metadata, meaning, knowledge, taxonomies, ontologies.. Ontology. 
Flashbacks of never-ending philosophical debates, epistemology, Plato's Ideals, empiricism, positivism, all of that. Well, not so much of the latter really; I guess I've learnt to recognize mental swamps before I go trekking nowadays.<p>But data. I think a little clarification could be in order. I'd say it goes like:</p><dl><dt>Data</dt><dd>Particulars (atoms if you will) of that which composes our impressions. Not <em>the</em> world, but that stuff from which we "get the world".</dd><dt>Information</dt><dd>I just quote <a href="http://en.wikipedia.org/wiki/Gregory_Bateson">Gregory Bateson</a>: "a difference that makes a difference". The part of data we can use, as opposed to "noise" or "void".</dd><dt>Content</dt><dd>A tricky thing. "Composited information with a bound context" perhaps, with varying (often hard to measure) complexity.</dd><dt>Metadata</dt><dd>Let's say "added bits used to externally correlate content". I'm not much of a fan of the word anymore. I may deprecate it in favour of just "context-providing statements". The stuff that turns data we don't get into data we get.</dd></dl><p>I often just call content and metadata "data in" and "data about" nowadays.</p><h4>Of Content</h4><p>Content is perhaps the most widely used and least defined stuff. It's abundant, the substance which we structure (and in so doing "contentifying" it further). It is the composition from which richer meaning <em>can</em> be derived, by its virtue of having a context. This article is "content". That last statement is information (this was a reification, but I digress). Possibly "metadata". Now it's just getting funny.
Anyway, content is stuff which can be molded to gain shapes and shades, color and tone; somewhat "synergetical" effects which may or may not add meaning — more often than not depending on cultural aspects (part of the implicit context).</p><p>It is stuff which we leave semi-formal at best, the information we can hardly process by anything less than our own neural networks (brains). Somewhat pessimistically perhaps, information which during interpretation gains illusory qualities (akin to optical illusions) which either enhance or corrupt the bringing of meaning.</p><p>I won't get into information theory or semantical discussions. Neither into a discussion of techniques for how to "fluff" expressions to become more sympathetical, making the receiver prone to interpret them as "meaningful".</p><h4>To the Point</h4><p>If you're still reading, this is the part where I intend to make a point. Binding the context by which this becomes meaningful.</p><p>This content bears no inherent meaning. It's all in your head. Perchance it may have conveyed a structure which made sense, put your mind into a state of "ah, ok". I really just needed to externalize that first part of categorizing some terms that continue to come up in discussions, and for which I needed some binding interpretations.</p><p>Perhaps you detected an "anomaly" if you tried to interpret the terms in a hierarchical fashion. "Data in" and "data about" were my differentiations of Content and Metadata, seemingly "instances of Information". But Data isn't necessarily Information. The thing is, that "thing that makes the difference" is where the "meaning", "semantic relevance", "precise form" comes in, and what that is, my current state of mind can only grasp by intuition.</p><p>We use Content, and the more fine-grained and "to the point" context-providing statements (Metadata), one as compositions within a context, the other to bind both particulars and contexts by means of relations and characteristics.
To have a feel for the difference is important, for it is the key to understanding why Knowledge Representation is the missing piece in many Information Technology issues today.</p><p>And this is where <a href="http://www.lisperati.com/tellstuff/index.html">How To Tell Stuff To a Computer</a>, and then <a href="http://www.w3.org/2001/sw/SW-FAQ">The Semantic Web FAQ</a>, come in, as my recommended reading of the week, to get you going. Those sources of information will hopefully make the difference that eluded you here.</p><p>That's the point.</p><p>(Or is it?)</p>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-27515996425215160022007-03-27T00:18:00.000+02:002007-03-27T00:39:36.333+02:00Cutting EdgesFor all of you out there writing droves of RDF, churning out endless amounts of Notation 3, and doing it all in Vim.. (we must be in the millions, I'm sure), be sure to check out my little <a href="http://www.vim.org/scripts/script.php?script_id=1835">RDF Namespace-complete</a> — giving you RDF Vocabulary/Model Omni-Completion for Vim 7+. Requires Vim compiled with Python, and <a href="http://rdflib.net/">RDFLib</a>.<br /><br />On a related note, I've been using RDFLib a lot lately (around the clock, it seems). It's an honor to contribute in an ever so small way to it, and it's great to see it progressing so well (it's been mature for years, if that needs to be said). The SPARQL support is working, and along with Chimezie Ogbuji's <a href="http://code.google.com/p/python-dlp/wiki/FuXi">FuXi</a> there is now an even stronger Python RDF platform to work with.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com1tag:blogger.com,1999:blog-5309813565185918044.post-44593525359348375902007-02-12T23:42:00.000+01:002007-02-02T08:55:24.088+01:00Oort In Space, In Time, At 0.3.2<a href="http://cheeseshop.python.org/pypi/Oort">Oort</a> is now at 0.3.2.
See the <a href="http://oort.to/release_history.html">release history</a> for details. Also, check out the <a href="http://oort.to/tutorial.html">tutorial</a> — now <em>functional at last</em>.<br /><br />Apart from fixes, the API is slightly cleaner and RdfQueries can now be updated, directly or by loading a dictionary. This means that JSON to RDF is a no-brainer when using Oort to do the semantic enhancements. (This is very new though, use with care.)Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com0tag:blogger.com,1999:blog-5309813565185918044.post-90972912100645488092007-01-26T15:49:00.000+01:002007-01-26T16:15:17.300+01:00A Great Day for Specificity<a href="http://dbpedia.org/docs/">dbpedia.org - Using Wikipedia as a Web Database</a><br /><blockquote cite="http://dbpedia.org/docs/">[...] dbpedia can also be seen as a huge ontology that assigns URIs to plenty of concepts and backs these URIs with dereferencable RDF descriptions.</blockquote>We have advanced a tech level!<br /><br />This is really good timing, as I just recently considered using <a href="http://larry.masinter.net/duri.html">TDB URNs</a> for referencing non-retrievable things and concepts (using TDB-versions of URLs to Wikipedia articles and web sites). Finding out if the idea of TDB and DURI URNs has been long since abandoned was my next step down that path. (Then there's the use of <code>owl:InverseFunctionalProperty</code> and bnodes (or "just-in-time"-invented ad-hoc URIs), and throwing in <code>owl:sameAs</code> if official ones are discovered..)<br /><br />With DBPedia, a <em>lot</em> of that becomes unnecessary. The availability of public, durable URIs for common concepts will surely ease the integration of disparate sources of knowledge.
That is, if we start to use dbpedia-URIs in our statements.<br /><br />And gone will be the strangeness of saying "I'm interested in [the word 'semantics']", "This text was edited with [the website <http://vim.org>]" and "I was born in [the wikipedia article about 'Stockholm']"..Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com2tag:blogger.com,1999:blog-5309813565185918044.post-87199258774209525432007-01-17T18:47:00.000+01:002007-01-17T19:32:37.507+01:00Knowledge, The Bits and PiecesIn a recent post, Lee Feigenbaum talks about <a href="http://www.thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html">Using RDF on the Web</a>. Naturally I find this very interesting. In my work on <a href="http://oort.to">the Oort toolkit</a>, I use an approach of "removing dimensions": namespaces, I18N (optionally), RDF-specific distinctions (collections vs. multiple properties) and other forms of graph traversal. This is done with declarative programming very similar to ORM tools in dynamic languages (mainly class declarations with attributes describing the desired data selection). The resulting objects become simple value trees — with the added bonus of automatic JSON serialization.<br /><br />Ideally, this approach will be deterministically reversible. I have not yet <em>implemented</em> it, but the idea is that the declared classes (RdfQueries in Oort — let's call them "facets" here) could take JSON as input and reproduce triples. Using checksums of the JSON would make over-the-web editing possible.<br /><br />Since the task of "updating" a subgraph is tricky at best, I think a basic "wipe and replace" approach may be simplest.
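<br /><br />A minimal sketch of the "wipe and replace" idea, for illustration only (this is not Oort's actual implementation; a plain set of triples stands in for an RDF graph, and all URIs and property names are made up):

```python
# "Wipe and replace": drop every statement having the edited resource
# as subject, then add the statements reproduced from the incoming JSON.
# A set of (subject, predicate, object) tuples stands in for a graph.

DOC = "http://example.org/doc/1"  # hypothetical example URI

def wipe_and_replace(graph, subject, new_statements):
    """Remove all triples with `subject` as subject, then add the new ones.

    Deliberately naive: triples where `subject` appears only as the
    object are kept, so knowledge *about* related resources survives.
    """
    kept = {(s, p, o) for (s, p, o) in graph if s != subject}
    return kept | {(subject, p, o) for (p, o) in new_statements}

graph = {
    (DOC, "dc:title", "Old title"),
    (DOC, "dc:creator", "Someone"),
    ("http://example.org/doc/2", "dc:relation", DOC),  # incoming link, untouched
}

graph = wipe_and_replace(graph, DOC, [("dc:title", "New title")])
print(sorted(graph))
```

(With RDFLib, the same move is roughly <code>graph.remove((subject, None, None))</code> followed by <code>graph.add(...)</code> for each new statement.)<br /><br />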
There are many dangers here of course (removing a relation must not remove knowledge about its object — unless <em>perhaps</em> if that object is a bnode..).<br /><br />Although all of this is Python to the core now, nothing in the design — declarative as it is — prevents the approach from being more general. Indeed, such facets, were they themselves serializable, could be used for structured content retrieval over the web too. Ok, maybe I'm reinventing SPARQL now.. Or should I use SPARQL for their remote execution? It seems reasonable (I mean, that's exactly how ORMs do SQL).<br /><br />Now, I seem to end up in the RPC/RESTful camp with this. A solution to that could be: use the facets on the client, having them use SPARQL for retrieval. Then you have clients working against any kind of SPARQL endpoint, mashup-style. Still, if facets are completely reversible, they may be powerful, aware tools, and perhaps an alternative to SPARQL in intercommunication? That's a pipe dream for me right now though.<br /><br />The SPARQL way is of course <a href="http://thefigtrees.net/lee/sw/sparql-faq#update">only for reading data</a>. Fine, the RDF model may be too expressive (fine-grained) to be practical for <em>direct</em> over-the-web editing in <em>specific</em> situations anyway. A confined approach, such as this JSON+checksums+"careful with nested knowledge" one, may be better for this.<br /><br />I think of JSON-RPC here as I view e.g. microformats with GRDDL — it's leverage, not a final solution. RDF models are the ideal; we may just need reversible simplifications to gain mass usability. I touched upon <a href="http://www.thefigtrees.net/lee/blog/2007/01/who_loves_rdfxml.html#comment-16710">JSON for integrated apps, but stuff like Atom for "talking to strangers"</a> in response to a previous post by Lee.
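<br /><br />The JSON+checksums part can be sketched as a simple optimistic-concurrency check (the function and field names here are made up for illustration, not taken from Oort): the client gets the resource as JSON together with a digest of its canonical serialization, and an update is accepted only if the client echoes the digest of the version it actually edited:

```python
import hashlib
import json

def checksum(data):
    """Digest of a canonical JSON serialization (sorted keys, fixed separators)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

def try_update(stored, incoming, client_checksum):
    """Accept `incoming` only if the client edited the current version."""
    if client_checksum != checksum(stored):
        return stored, False  # concurrent edit detected; reject
    return incoming, True

resource = {"title": "Old title", "creator": "Someone"}
token = checksum(resource)

# A client that edited the version it fetched succeeds:
resource, ok = try_update(resource, {"title": "New title"}, token)
print(ok)  # True

# The same, now stale, token is rejected on a second attempt:
resource, ok = try_update(resource, {"title": "Other"}, token)
print(ok)  # False
```

Canonical serialization matters here: two semantically equal JSON objects must yield the same digest, or honest clients get rejected.<br /><br />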
The post of his that I refer to above hints at better stuff than just Atom, if we want RDF all the way in more specific apps as well.<br /><br />So, what I'm saying is really: I also desire the best of both worlds. The RDF model is sound and extremely valuable, but after all, simple domain-specific object manipulation is what makes the web go round. A solution may be some form of O/R mapping for RDF. The difference is not wanting to <em>forget</em> the data layer (there is no monster in the closet as there is with raw SQL..), just to streamline some of the work with it.<br /><br />Let's hope 2007 will be a good year for the knowledge revolution.Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com5tag:blogger.com,1999:blog-5309813565185918044.post-83433452079893707542007-01-16T13:38:00.000+01:002007-01-16T13:57:59.251+01:00Cobwebs and Rainbows, Round 3<em>Web 3.0</em>. Like 2.0, but meaningful and integrated (as opposed to unqualified punyformats and mess-ups). And sensible (whatever that means.. "think Scandinavian Ajax" — to cite my boss at V.).<br /><br />I'll be damned if that revolution won't finally take us from the information age to the knowledge age. Of course, this would mean yet another technological shift — one I've been longing for for a long time. Let's see (disregarding "Enterprise Web X" here; sorry Java and whatNot- uhm.. dotNet):<br /><dl><dt>Web 1.0</dt><dd>Perl + random flatfiles and some raw SQL (websafe black, white and blue)</dd><dt>Web 1.5</dt><dd>PHP + SQL (off-black on white, green and blue hues, some orange)</dd><dt>Web 2.0</dt><dd>Rails + generated SQL (pink on white, huge fonts, lots of attitude — think adolescence)</dd><dt>Web 3.0</dt><dd>Python + RDF (color is not an issue)</dd></dl>Or I'll die trying. Truly. Die. Trying. (Well at least I may end up drinking as much in my thirties as I did in my twenties.
Cheers.)<br /><br />Then for 4.0, I suppose fully OWL-enabled reasoners and modernized but still brutal OCaml could be fun.<br /><br /><em>[I'm not utterly serious here, but a bit. Please don't take too much offense.]</em>Niklas Lindströmhttp://www.blogger.com/profile/08426611594566532481noreply@blogger.com1