Friday, January 26, 2007

A Great Day for Specificity - Using Wikipedia as a Web Database
[...] dbpedia can also be seen as a huge ontology that assigns URIs to plenty of concepts and backs these URIs with dereferenceable RDF descriptions.
We have advanced a tech level!

This is really good timing, as I just recently considered using TDB URNs for referencing non-retrievable things and concepts (using TDB-versions of URLs to Wikipedia articles and web sites). Finding out whether the idea of TDB and DURI URNs has long since been abandoned was my next step down that path. (Then there's the use of owl:InverseFunctionalProperty and bnodes (or "just-in-time"-invented ad-hoc URIs) and throwing in owl:sameAs if official ones are discovered..)

With DBPedia, a lot of that becomes unnecessary. The availability of public, durable URIs for common concepts will surely ease the integration of disparate sources of knowledge. That is, if we start to use dbpedia-URIs in our statements.

And gone will be the strangeness of saying "I'm interested in [the word 'semantics']", "This text was edited with [the website <>]" and "I was born in [the wikipedia article about 'Stockholm']"..
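To make the difference concrete, here is a minimal sketch in Python with statements written as plain (subject, property, object) tuples. The `concept_uri` helper, the `example.org` URIs and the `bornIn` property are made up for illustration, and the sketch assumes that a dbpedia resource name simply mirrors the title part of the English Wikipedia URL.

```python
from urllib.parse import urlparse

WIKIPEDIA_HOST = "en.wikipedia.org"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def concept_uri(article_url):
    """Map an English Wikipedia article URL to a dbpedia resource URI.

    Assumption: the resource name equals the article title in the URL.
    """
    parts = urlparse(article_url)
    if parts.netloc != WIKIPEDIA_HOST or not parts.path.startswith("/wiki/"):
        raise ValueError("not an English Wikipedia article URL: %r" % article_url)
    return DBPEDIA_RESOURCE + parts.path[len("/wiki/"):]


ME = "http://example.org/people/me"          # hypothetical URI for the author
BORN_IN = "http://example.org/vocab#bornIn"  # hypothetical property

article = "http://en.wikipedia.org/wiki/Stockholm"

# The strange statement: born in a web page about Stockholm.
about_the_article = (ME, BORN_IN, article)

# The statement we actually mean: born in the city itself.
about_the_city = (ME, BORN_IN, concept_uri(article))
```

Only the object changes, but the second statement is about the concept rather than a document describing it.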

Wednesday, January 17, 2007

Knowledge, The Bits and Pieces

In a recent post, Lee Feigenbaum talks about Using RDF on the Web. Naturally I find this very interesting. In my work on the Oort toolkit, I use an approach of "removing dimensions": namespaces, I18N (optionally), RDF-specific distinctions (collections vs. multiple properties) and other forms of graph traversal. This is done with declarative programming very similar to ORM-tools in dynamic languages (mainly class declarations with attributes describing the desired data selection). The resulting objects become simple value trees, with the added bonus of automatic JSON-serialization.
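As a rough illustration of "removing dimensions", the sketch below flattens a handful of triples into a plain value tree and serializes it as JSON. This is not Oort's actual API: the dict-based facet declaration, the `apply_facet` function, the `Literal` tuple and the sample data are all invented for the example, and RDF collections are left out entirely.

```python
import json
from collections import namedtuple

# A literal with an optional language tag; URIs are plain strings.
Literal = namedtuple("Literal", "value lang")

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical facet: simple key -> (property URI, kind of selection).
ARTICLE_FACET = {
    "title": (DC + "title", "localized"),
    "subjects": (DC + "subject", "many"),
    "creator": (DC + "creator", "one"),
}


def objects(triples, subject, prop):
    """All objects of the given subject and property, in input order."""
    return [o for s, p, o in triples if s == subject and p == prop]


def plain(obj):
    """Drop the literal wrapper; URIs pass through unchanged."""
    return obj.value if isinstance(obj, Literal) else obj


def apply_facet(facet, triples, subject, lang="en"):
    """Select data for one subject and return a namespace-free value tree."""
    tree = {}
    for key, (prop, kind) in facet.items():
        found = objects(triples, subject, prop)
        if kind == "localized":
            found = [o for o in found
                     if isinstance(o, Literal) and o.lang == lang]
            tree[key] = found[0].value if found else None
        elif kind == "one":
            tree[key] = plain(found[0]) if found else None
        else:  # "many"
            tree[key] = [plain(o) for o in found]
    return tree


POST = "http://example.org/posts/1"  # made-up sample data below
TRIPLES = [
    (POST, DC + "title", Literal("Knowledge, The Bits and Pieces", "en")),
    (POST, DC + "title", Literal("Kunskap, bitarna", "sv")),
    (POST, DC + "subject", Literal("RDF", None)),
    (POST, DC + "subject", Literal("JSON", None)),
    (POST, DC + "creator", "http://example.org/people/me"),
]

tree = apply_facet(ARTICLE_FACET, TRIPLES, POST)
as_json = json.dumps(tree, sort_keys=True)
```

The namespace, the language tag and the "one or many" question are all settled by the declaration, so the consumer of `tree` never has to think about them.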

Ideally, this approach will be deterministically reversible. I have not yet implemented it, but the idea is that the declared classes (RdfQueries in Oort — let's call them "facets" here) could take JSON as input and reproduce triples. Using checksums of the JSON would make over-the-web editing possible.
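Since none of this is implemented, the following is only one possible reading of the idea, sketched in Python. It assumes the checksum is used as an optimistic lock: the server hands out the JSON together with its checksum and accepts an edit only if the stored data still matches. The facet dict, `to_triples`, `checksum` and `accept_edit` are hypothetical names, and the choice of SHA-1 is arbitrary.

```python
import hashlib
import json

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical facet: simple key -> property URI.
FACET = {"title": DC + "title", "subjects": DC + "subject"}


def canonical(tree):
    """Serialize with sorted keys so equal trees give equal bytes."""
    return json.dumps(tree, sort_keys=True, separators=(",", ":"))


def checksum(tree):
    return hashlib.sha1(canonical(tree).encode("utf-8")).hexdigest()


def to_triples(facet, subject, tree):
    """Reverse the facet: turn a value tree back into triples."""
    triples = []
    for key, prop in facet.items():
        value = tree.get(key)
        values = value if isinstance(value, list) else [value]
        triples.extend((subject, prop, v) for v in values if v is not None)
    return triples


def accept_edit(stored_tree, edited_tree, base_checksum):
    """Accept the edit only if it was based on what is stored right now."""
    if checksum(stored_tree) != base_checksum:
        raise ValueError("stale edit: the stored data has changed")
    return edited_tree


POST = "http://example.org/posts/1"  # made-up subject
stored = {"title": "Bits and Pieces", "subjects": ["RDF"]}
sent_checksum = checksum(stored)  # travels to the client with the JSON

edited = dict(stored, subjects=["RDF", "JSON"])
new_triples = to_triples(FACET, POST, accept_edit(stored, edited, sent_checksum))
```

A second client still holding the old checksum would be rejected once the first edit has been stored, which is the minimum needed for editing over the web without silently overwriting someone else's change.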

Since the task of "updating" a subgraph is difficult at best, I think a basic "wipe and replace" approach may be simplest. There are many dangers here of course (removing a relation must not remove knowledge about its object, unless perhaps that object is a bnode..).
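A minimal sketch of such a wipe and replace in Python, treating the graph as a set of tuples and bnodes as strings starting with `_:`. The function name and sample data are invented, and the bnode cleanup only goes one level deep; nested bnodes would need a recursive pass.

```python
def is_bnode(node):
    return isinstance(node, str) and node.startswith("_:")


def wipe_and_replace(graph, subject, facet_props, new_triples):
    """Replace what a facet covers for one subject; keep everything else."""
    removed = {t for t in graph
               if t[0] == subject and t[1] in facet_props}
    remaining = (graph - removed) | set(new_triples)

    # Knowledge about named objects is kept. Bnodes that were only
    # reachable through a removed relation are dropped with it.
    dropped_bnodes = {o for _, _, o in removed if is_bnode(o)}
    for node in dropped_bnodes:
        if not any(o == node for _, _, o in remaining):
            remaining = {t for t in remaining if t[0] != node}
    return remaining


EX = "http://example.org/"  # made-up namespace and data
graph = {
    (EX + "me", EX + "knows", EX + "friend"),
    (EX + "friend", EX + "name", "A Friend"),
    (EX + "me", EX + "address", "_:a1"),
    ("_:a1", EX + "city", "Stockholm"),
}

updated = wipe_and_replace(
    graph,
    EX + "me",
    {EX + "knows", EX + "address"},
    [(EX + "me", EX + "knows", EX + "someone-else")],
)
```

After the update, the statement about the friend's name survives even though the relation to the friend is gone, while the anonymous address node disappears along with the relation that pointed at it.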

Although all of this is Python to the core now, nothing in the design (declarative as it is) prevents the approach from being more general. Indeed, such facets, were they themselves serializable, could be used for structured content retrieval over-the-web too. Ok, maybe I'm reinventing SPARQL now.. Or should I use SPARQL for their remote execution? It seems reasonable (I mean that's exactly how ORMs do SQL).

Now, I seem to end up in the RPC/RESTful camp with this. A solution to that could be: use the facets on the client, having them use SPARQL for retrieval. Then you have clients working against any kind of SPARQL endpoint, mashup-style. Still, if facets are completely reversible, they may be powerful, aware tools, and perhaps an alternative to SPARQL in intercommunication? That's a pipe dream for me right now though.
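In the same way an ORM generates SQL, a facet could generate its own query. A small Python sketch, assuming a facet is just a mapping from simple keys to property URIs and that the keys are valid SPARQL variable names; `facet_to_sparql` is a hypothetical helper, not something that exists in Oort.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical facet: simple key -> property URI.
FACET = {"title": DC + "title", "subjects": DC + "subject"}


def facet_to_sparql(facet, subject):
    """Build a SELECT query fetching every facet property of one subject."""
    variables = " ".join("?" + key for key in facet)
    lines = ["SELECT %s WHERE {" % variables]
    for key, prop in facet.items():
        # OPTIONAL, so a missing property does not discard the whole row.
        lines.append("  OPTIONAL { <%s> <%s> ?%s }" % (subject, prop, key))
    lines.append("}")
    return "\n".join(lines)


query = facet_to_sparql(FACET, "http://example.org/posts/1")
```

The result rows would still need to be folded back into a value tree on the client, since several multi-valued properties in one SELECT come back as a cross product of their values.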

The SPARQL-way is of course only for reading data. Fine, the RDF model may be too expressive (fine-grained) to be practical for direct over-the-web editing in specific situations anyway. A confined approach such as this JSON+checksums+"careful with nested knowledge" may be better for this.

I think of JSON-RPC here as I view e.g. microformats with GRDDL: it's leverage, not a final solution. RDF models are the ideal, we may just need reversible simplifications to gain mass usability. In response to a previous post by Lee, I touched upon using JSON for integrated apps, but stuff like Atom for "talking to strangers". The post of his that I refer to above hints at better stuff than just Atom if we want RDF all the way, also in more specific apps.

So, what I'm saying is really: I also desire the best of two worlds. The RDF model is sound and extremely valuable, but after all, simple domain-specific object manipulation is what makes the web go round. A solution may be some form of O/R mapping for RDF. The difference is that the point is not to forget the data layer (there is no monster in the closet as there is with raw SQL..), just to streamline some of the work with it.

Let's hope 2007 will be a good year for the knowledge revolution.

Tuesday, January 16, 2007

Cobwebs and Rainbows, Round 3

Web 3.0. Like 2.0, but meaningful and integrated (as opposed to unqualified punyformats and mess-ups). And sensible (whatever that means.. "think Scandinavian Ajax" — to cite my boss at V.).

I'll be damned if that revolution won't finally take us from the information age to the knowledge age. Of course, this would mean yet another technological shift — one I've been longing for for a long time. Let's see (disregarding "Enterprise Web X" here; sorry Java and whatNot- uhm.. dotNet):
Web 1.0: Perl + random flatfiles and some raw SQL (websafe black, white and blue)
Web 1.5: PHP + SQL (off-black on white, green and blue hues, some orange)
Web 2.0: Rails + generated SQL (pink on white, huge fonts, lots of attitude; think adolescence)
Web 3.0: Python + RDF (color is not an issue)
Or I'll die trying. Truly. Die. Trying. (Well at least I may end up drinking as much in my thirties as I did in my twenties. Cheers.)

Then for 4.0, I suppose fully OWL-enabled reasoners and modernized but still brutal OCaml could be fun.

[I'm not utterly serious here, but a bit. Please don't take too much offense.]

Thursday, January 11, 2007

Ode To Keepers of Yaks

I am the Yak Shaver
The digression gangsta
I cannot concentrate-ah
I'm living it badd *word it up*
[there would be more verses but right now I'm off to a bookstore shopping for synonym books in urban language - on the way of course realizing I should order one online and becoming occupied by downloading the latest Opera to my phone so I can view the online shops better - then realizing my battery is low and diverting to an electronics shop to get a USB recharger - all in due course for my promising career as a lyricist of course]

The art of Yak Shaving must never be lost to the world. It's my way of life (to the detriment of any spare time of course).