Dust Feed: 2007

Sunday, October 07, 2007

Out To Transmogrify Resource Descriptions

Finally, I've bundled another release of Oort. And OortPub! I've modularized the package into two Eggs, separating the core O/R things from the (experimental) WSGI web stuff.

At the cheeseshop you'll now find them both, at

Oort (0.4), and
OortPub (0.1)

respectively. I did this not least because the core Oort package is very useful on its own, in conjunction with other tools altogether. I've used it for a lot of stuff lately, none of which involved OortPub. Check out e.g. the QueryContext for some new features.

I hope this will become useful to others as well.

The website got a little overhaul as well, to reflect this change, and to include some more documentation (not so much more yet, but the basic stuff is in place). Oh, by the way, it's generated using Oort objects (among other things like Genshi, ReStructuredText, google-code-prettify and of course RDFLib).

Thursday, September 27, 2007

OMG! Snakes on a Phone!

So, I got this new cell phone from work. A Nokia 6120 Classic, which has just the size and features I wanted (and more, I'm sure). I picked the white/silver model, to go with my kitchen supplies (in honour of the fact that with Series 60, you can basically install the Kitchen Sink if you so choose to).

Going further, I of course wanted to play Monkey Island and such on the amazing ScummVM. Well, admire the running of; actually playing through them might be a little straining.. Though great screen clarity makes it quite possible to discern the orange and pink on the red herring you steal from the Loom™ seagull outside the Scumm Bar kitchen, it's quite thumb-numbing to actually snatch it in time..

And then, surely, I couldn't wait to get to the real goodies: Python. Specifically, The Python for Series 60 distro by Nokia (open source). I knew it was quite capable, but seriously; it basically lets you p0wn your phone.

(Half a decade ago I had fun running Python on my old Psion Revo; but that didn't go much further than some file tinkering and exploration of new-style classes (metaprogramming and whatnot) during commuting. But this is different. The same. But different.)

You go ahead by following the instructions, basically downloading the PythonForS60_1_4_0_3rdEd.sis and PythonScriptShell_1_4_0_3rdEd.sis from sourceforge, wiring or beaming them to the phone and running them to install (py, then shell). I also got the PythonForS60_1_4_0_doc.pdf, and marvelled at its beauty. I mean, apart from greater parts of the Python Standard Library, a mere glance at the modules provided for S60 stuff gives you an impression of the power at hand:

graphics, e32db, messaging, inbox, location, sysinfo, camera, audio, telephone, calendar, contacts, keycapture, topwindow, gles, glcanvas

Now, I haven't gotten very far yet; mostly tinkered with urllib to download some stuff from the web (which just worked (after a nice confirmation dialogue from the phone to actually access the net)), rendering the obligatory Mandelbrot and then peeking at the real deal by looking at the SMS inbox contents using the interactive console.

Or actually, using

$ screen /dev/tty.bt_console

from my OS X Terminal (iTerm really). Magically set up by simply following this excellent guide.
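
For reference, the obligatory Mandelbrot needs nothing beyond core Python. What follows is only a minimal text-mode sketch, assuming plain character output rather than the S60 graphics module; the function names, the grid size and the iteration limit are made up for illustration and are not what was actually run on the phone.

```python
def mandelbrot_escapes(c, max_iter=30):
    """Return True if c leaves the radius-2 disc within max_iter steps."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return True
    return False


def render(width=60, height=20, max_iter=30):
    """Render the region [-2, 1] x [-1, 1] as rows of '#' (inside) and ' ' (outside)."""
    rows = []
    for y in range(height):
        im = -1.0 + 2.0 * y / (height - 1)
        row = []
        for x in range(width):
            re = -2.0 + 3.0 * x / (width - 1)
            escaped = mandelbrot_escapes(complex(re, im), max_iter)
            row.append(" " if escaped else "#")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

A small grid like this is enough to recognise the shape in a console session.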

Perhaps screen scraping wasn't really what I thought of when wanting Python on the phone, but you can't deny it's cool to be able to (perhaps awaiting flatrate subscription to not drain my wallet in the process though).

Speaking of screens and webs again; damn that WebKit-based browser in the phone really worked! It actually ran the JavaScript-based SyntaxHighlighter, thus rendering the Oort Tutorial quite faithfully. I did not expect that (nor my rekindled desire for highly variable width web sites..).

[And speaking of Oort; I've added some somewhat nifty stuff (if I may say so) lately; accessible in the svn repo at google code for now. A release should be coming up. Pre- or post my planned split into core and web/publication stuff I haven't decided yet. There really is a working RDF-to-object mapper underneath the touted WSGI stuff (which is just a minimal layer).]

Now I should go on to do something productive (like shaving a yak).

Wednesday, September 19, 2007

A Mind must Become be4 it can Go

Now, I do not understand why the Web 4.0 page at Wikipedia has been protected to prevent creation. Is it the fear of the machine? Because as everyone must know, Web 4.0 is the peak of our civilization, when mankind will unite in celebration, marvelling at our own magnificence as we give birth to AI. The Web at 4.0 will also affectionately be known as "Web H4L", to honor its predecessor, born on January 12, 1997 at the HAL Plant in Urbana, Illinois.

Tuesday, August 21, 2007

Of wheels and fortunes

[I wrote this comment a while ago, but left it hanging. Rather than letting it die, I just throw it into the mix (to show a life sign if nothing more).]

I was inspired by this post about reinvention (stumbled upon it, probably via reddit). An interesting point, addressing many states of affairs of the past, present and future.

To me "gratuitous reinvention" is a "mixed curse". In general, much can be lost by the reinvention of wheels. But competing ideas, problem formulations and solutions are definitely healthy. In times of surplus, there is unfortunately a considerable risk of overlooking the fact that thorough thought processes have already addressed situations that seem novel when first encountered. When this leads to incompatible diversification (as opposed to interchangeable alternatives), much is devalued. This is something one should always be aware of, both when choosing among existing solutions and when embarking on one's own adventures of invention. With that in mind though, quests of the latter type may very well lead to a specification of what needs to be solved, and how. At that point, (re)visiting existing work is a wise course.

Keeping it simple (enough but not more) is a fine ideal. But keeping it integrable is also very important for the evolution of infrastructure — where complexity can evolve without necessarily leaking down into the component parts.

Choosing the right tools for jobs reasonably means taking paths that lead to a minimizing of energy expenditure — over time. Time is where the real trickiness comes in. Sometimes, when your intuition tells you to, you have to make leaps of faith. This is true in life as well as in art. Whatever those two things are. Wheels and fortunes.

Thursday, May 10, 2007

Point of Data

(Read all of this, it is a Zen Koan.)

Been thinking about data lately (possibly "in" as well). And information, content, metadata, meaning, knowledge, taxonomies, ontologies.. Ontology. Flashbacks of never-ending philosophical debates, epistemology, Plato's Ideals, empiricism, positivism, all of that. Well not so much of the latter really, I guess I've learnt to recognize mental swamps before I go trekking nowadays.

But data. I think a little clarification could be in order. I'd say it goes like:

Data: Particulars (atoms if you will) of that which compose our impressions. Not the world, but that stuff from which we "get the world".
Information: I just quote Gregory Bateson: "a difference that makes a difference". The part of data we can use, as opposed to "noise" or "void".
Content: A tricky thing. "Composited information with a bound context" perhaps, with varying (often hard to measure) complexity.
Metadata: Let's say "added bits used to externally correlate content". I'm not much of a fan of the word anymore. I may deprecate it in favour of just "context-providing statements". The stuff that turns data we don't get into data we get.

I often just call content and metadata "data in" and "data about" nowadays.

Of Content

Content is perhaps the most widely used and least defined stuff. It's abundant, the substance which we structure (and in so doing "contentify" it further). It is the composition from which richer meaning can be derived. By virtue of having a context. This article is "content". That last statement is information (this was a reification, but I digress). Possibly "metadata". Now it's just getting funny. Anyway, content is stuff which can be molded to gain shapes and shades, color and tone; somewhat "synergetical" effects which may or may not add meaning — more often than not depending on cultural aspects (part of the implicit context).

It is stuff which we leave semi-formal at best, the information we can hardly process by anything less than our own neural networks (brains). Somewhat pessimistically perhaps, information which during interpretation gains illusory qualities (akin to optical illusions) which either enhance or corrupt the bringing of meaning.

I won't get into information theory nor semantical discussions. Neither into a discussion of techniques of how to "fluff" expressions to become more sympathetical, making the receiver prone to interpret them as "meaningful".

To the Point

If you're still reading, this is the part where I intend to make a point. Binding the context by which this becomes meaningful.

This content bears no inherent meaning. It's all in your head. Perchance it may have conveyed a structure which made sense, put your mind into a state of "ah, ok". I really just needed to externalize that first part of categorizing some terms that continue to come up in discussions, and for which I needed some binding interpretations.

Perhaps you detected an "anomaly" if you tried to interpret the terms in a hierarchical fashion. "Data in" and "data about" was my differentiation of Content and Metadata, seemingly "instances of Information". But Data isn't necessarily Information. The thing is, that "thing that makes the difference" is where the "meaning", "semantic relevance", "precise form" comes in, and what that is, my current state of mind can only grasp by intuition.

We use Content, and the more fine-grained and "to the point" context-providing statements (Metadata), one as compositions within a context, the other to bind both particulars and contexts by means of relations and characteristics. To have a feel for the difference is important, for it is the key by which to understand why Knowledge Representation is the missing piece in many Information Technology issues today.

And this is where How To Tell Stuff To a Computer, and then The Semantic Web FAQ come in, as my recommended reading of the week, to get you going. Those sources of information will hopefully make the difference that eluded you here.

That's the point.

(Or is it?)

Tuesday, March 27, 2007

Cutting Edges

For all of you out there writing droves of RDF, churning out endless amounts of Notation 3, and doing it all in Vim.. (we must be in the millions I'm sure), be sure to check out my little RDF Namespace-complete — giving you RDF Vocabulary/Model Omni-Completion for Vim 7+. Requires Vim compiled with Python and RDFLib.

On a related note, I've been using RDFLib a lot lately (around the clock as it seems). It's an honor to contribute in an ever so small way to it, and it's great to see it progressing so well (it's been mature for years if that needs to be said). The SPARQL-support is working, and along with Chimezie Ogbuji's FuXi there is now an even stronger Python RDF platform to work with.

Monday, February 12, 2007

Oort In Space, In Time, At 0.3.2

Oort is now at 0.3.2. See the release history for details. Also, check out the tutorial — now functional at last.

Apart from fixes, the API is slightly cleaner and RdfQueries can now be updated, directly or by loading a dictionary. This means that JSON to RDF is a no-brainer when using Oort to do the semantic enhancements. (This is very new though, use with care.)
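
To make the JSON-to-RDF idea concrete, here is a minimal sketch in plain Python. It does not use Oort's actual API: the KEY_TO_PROPERTY mapping, the dict_to_triples function and the example.org URI are hypothetical, chosen only to show how a declared mapping from keys to properties can turn a loaded dictionary into triples.

```python
import json

# Hypothetical mapping from plain JSON keys to full property URIs
# (Dublin Core element set used as the example vocabulary).
KEY_TO_PROPERTY = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "creator": "http://purl.org/dc/elements/1.1/creator",
}


def dict_to_triples(subject, data, mapping=KEY_TO_PROPERTY):
    """Turn a flat dictionary into (subject, predicate, value) tuples.

    List values yield one triple per item; keys without a mapping are skipped.
    """
    triples = []
    for key in sorted(data):
        if key not in mapping:
            continue
        value = data[key]
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, mapping[key], item))
    return triples


doc = json.loads('{"title": "Oort Tutorial", "creator": ["A", "B"], "ignored": 1}')
triples = dict_to_triples("http://example.org/doc/1", doc)
```

The semantic enhancement lives entirely in the mapping: the JSON itself carries no namespaces, and anything the mapping does not know about is simply left out.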

Friday, January 26, 2007

A Great Day for Specificity

dbpedia.org - Using Wikipedia as a Web Database

[...] dbpedia can also be seen as a huge ontology that assigns URIs to plenty of concepts and backs these URIs with dereferencable RDF descriptions.

We have advanced a tech level!

This is really good timing, as I just recently considered using TDB URNs for referencing non-retrievable things and concepts (using TDB-versions of URLs to Wikipedia articles and web sites). Finding out if the idea of TDB and DURI URNs has been long since abandoned was my next step down that path. (Then there's the use of owl:InverseFunctionalProperty and bnodes (or "just-in-time"-invented ad-hoc URIs) and throwing in owl:sameAs if official ones are discovered..)

With DBPedia, a lot of that becomes unnecessary. The availability of public, durable URIs for common concepts will surely ease the integration of disparate sources of knowledge. That is, if we start to use dbpedia-URIs in our statements.

And gone will be the strangeness of saying "I'm interested in [the word 'semantics']", "This text was edited with [the website <http://vim.org>]" and "I was born in [the wikipedia article about 'Stockholm']"..

Wednesday, January 17, 2007

Knowledge, The Bits and Pieces

In a recent post by Lee Feigenbaum, he talks about Using RDF on the Web. Naturally I find this very interesting. In my work on the Oort toolkit, I use an approach of "removing dimensions": namespaces, I18N (optionally), RDF-specific distinctions (collections vs. multiple properties) and other forms of graph traversing. This is done with declarative programming very similar to ORM-tools in dynamic languages (mainly class declarations with attributes describing desired data selection). The resulting objects become simple value trees — with the added bonus of automatic JSON-serialization.
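
The "removing dimensions" idea can be illustrated without any RDF library at all. The sketch below is not Oort's API: the triple representation, the DOCUMENT_FACET declaration, the apply_facet function and the sample data are all assumptions made for the example. It shows a declaration that hides namespaces, picks one language, and flattens repeated properties into a list, ending up with a value tree that serializes straight to JSON.

```python
import json

DC = "http://purl.org/dc/elements/1.1/"

# Sample data. Literals are (text, language) pairs in this sketch.
TRIPLES = [
    ("http://example.org/doc/1", DC + "title", ("Oort Tutorial", "en")),
    ("http://example.org/doc/1", DC + "title", ("Oort-handledning", "sv")),
    ("http://example.org/doc/1", DC + "subject", ("rdf", None)),
    ("http://example.org/doc/1", DC + "subject", ("python", None)),
]

# Hypothetical facet declaration: attribute name -> (property URI, selector).
# "localized" picks the literal in the requested language,
# "each" collects every value into a list.
DOCUMENT_FACET = {
    "title": (DC + "title", "localized"),
    "subjects": (DC + "subject", "each"),
}


def apply_facet(facet, triples, subject, lang):
    """Flatten the triples about `subject` into a plain dict of values."""
    result = {}
    for name, (prop, selector) in facet.items():
        literals = [o for s, p, o in triples if s == subject and p == prop]
        if selector == "localized":
            matches = [text for text, literal_lang in literals if literal_lang == lang]
            result[name] = matches[0] if matches else None
        else:
            result[name] = sorted(text for text, _ in literals)
    return result


tree = apply_facet(DOCUMENT_FACET, TRIPLES, "http://example.org/doc/1", "sv")
as_json = json.dumps(tree, sort_keys=True)
```

The consumer of `tree` never sees a URI or a language tag; those dimensions were settled by the declaration and the `lang` argument.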

Ideally, this approach will be deterministically reversible. I have not yet implemented it, but the idea is that the declared classes (RdfQueries in Oort — let's call them "facets" here) could take JSON as input and reproduce triples. Using checksums of the JSON would make over-the-web editing possible.
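
One way to read the checksum idea is as an optimistic edit check: the client gets the JSON together with its checksum, and an update is accepted only if the stored data still hashes to the same value. This is an interpretation rather than an implemented design, and the checksum, save and StaleEdit names as well as the in-memory store are hypothetical.

```python
import hashlib
import json


def checksum(data):
    """Hash a canonical JSON form so that key order does not affect the result."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StaleEdit(Exception):
    """Raised when the stored data changed after the client fetched it."""


def save(store, key, new_data, seen_checksum):
    """Replace store[key] only if the client edited the current version."""
    if checksum(store[key]) != seen_checksum:
        raise StaleEdit(key)
    store[key] = new_data


store = {"doc/1": {"title": "Draft"}}
token = checksum(store["doc/1"])  # sent to the client along with the JSON
save(store, "doc/1", {"title": "Final"}, token)
```

A second save carrying the same, now outdated, token raises StaleEdit instead of silently overwriting the newer data.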

Since the task of "updating" a subgraph is somewhat difficult at best, I think a basic "wipe and replace" approach may be simplest. There are many dangers here of course (removing a relation must not remove knowledge about its object — unless perhaps if that object is a bnode..).
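
A rough sketch of "wipe and replace", with the bnode caveat included. It assumes a graph is just a set of string triples and that blank nodes are recognisable by a "_:" prefix; both are simplifications for the example, and wipe_and_replace is a hypothetical helper, not part of Oort or RDFLib. Chains of nested bnodes are not followed.

```python
def is_bnode(node):
    """Blank nodes are written with a '_:' prefix in this sketch."""
    return node.startswith("_:")


def wipe_and_replace(graph, subject, new_triples):
    """Return a new triple set where everything stated about `subject` is replaced.

    Statements about the resources it pointed to are kept, unless the object
    is a blank node that nothing refers to any longer.
    """
    old = {t for t in graph if t[0] == subject}
    result = (set(graph) - old) | set(new_triples)
    still_referenced = {o for _, _, o in result}
    for _, _, obj in old:
        if is_bnode(obj) and obj not in still_referenced:
            # Orphaned blank node: drop what was said about it as well.
            result = {t for t in result if t[0] != obj}
    return result


graph = {
    ("ex:doc", "dc:title", "Draft"),
    ("ex:doc", "dc:creator", "ex:alice"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:doc", "ex:note", "_:n1"),
    ("_:n1", "rdfs:comment", "temporary"),
}
updated = wipe_and_replace(graph, "ex:doc", {("ex:doc", "dc:title", "Final")})
```

Here the relation to ex:alice disappears but her name stays in the graph, while the anonymous note goes away together with the only statement that pointed at it.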

Albeit all of this is Python to the core now, nothing in the design — declarative as it is — prevents the approach from being more general. Indeed, such facets, were they themselves serializable, could be used as structured content retrieval over-the-web too. Ok, maybe I'm reinventing SPARQL now.. Or should I use SPARQL for their remote execution? It seems reasonable (I mean that's exactly how ORMs do SQL).

Now, I seem to end up in the RPC/RESTful camp with this. A solution to that could be: use the facets on the client, having them use SPARQL for retrieval. Then you have clients working against any kind of SPARQL endpoint, mashup-style. Still, if facets are completely reversible, they may be powerful, aware tools, and perhaps an alternative to SPARQL in intercommunication? That's a pipe dream for me right now though.
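
As a sketch of what "facets on the client, using SPARQL for retrieval" could mean: a facet declaration is turned into a SELECT query that any endpoint could answer. The facet format and the facet_to_sparql function are hypothetical, and sending the query to an endpoint is left out entirely.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical facet declaration: attribute name -> property URI.
DOCUMENT_FACET = {
    "title": DC + "title",
    "subjects": DC + "subject",
}


def facet_to_sparql(facet, subject):
    """Build a SELECT query with one OPTIONAL pattern per declared attribute."""
    names = sorted(facet)
    variables = " ".join("?" + name for name in names)
    patterns = "\n".join(
        "  OPTIONAL { <%s> <%s> ?%s }" % (subject, facet[name], name)
        for name in names
    )
    return "SELECT %s WHERE {\n%s\n}" % (variables, patterns)


query = facet_to_sparql(DOCUMENT_FACET, "http://example.org/doc/1")
```

Each attribute is OPTIONAL so that a missing property does not empty the whole result. Properties with several values come back as several rows, which the client-side facet would then have to fold into lists.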

The SPARQL-way is of course only for reading data. Fine, the RDF model may be too expressive (fine-grained) to be practical for direct over-the-web editing in specific situations anyway. A confined approach such as this JSON+checksums+"careful with nested knowledge" may be better for this.

I think of JSON-RPC here as I view e.g. microformats with GRDDL — it's leverage, not a final solution. RDF models are the ideal, we may just need reversible simplifications to gain mass usability. I touched upon JSON for integrated apps, but stuff like Atom for "talking to strangers" in response to a previous post by Lee. His post which I refer to above hints at better stuff than just Atom if we want RDF all the way also in more specific apps.

So, what I'm saying is really: I also desire the best of two worlds. The RDF model is sound and extremely valuable, but after all, simple domain-specific object manipulation is what makes the web go round. A solution may be some form of O/R mapping for RDF. The difference is not wanting to forget the data layer (there is no monster in the closet as there is with raw SQL..), just streamline some of the work with it.

Let's hope 2007 will be a good year for the knowledge revolution.

Tuesday, January 16, 2007

Cobwebs and Rainbows, Round 3

Web 3.0. Like 2.0, but meaningful and integrated (as opposed to unqualified punyformats and mess-ups). And sensible (whatever that means.. "think Scandinavian Ajax" — to cite my boss at V.).

I'll be damned if that revolution won't finally take us from the information age to the knowledge age. Of course, this would mean yet another technological shift — one I've been longing for for a long time. Let's see (disregarding "Enterprise Web X" here; sorry Java and whatNot- uhm.. dotNet):

Web 1.0: Perl + random flatfiles and some raw SQL (websafe black, white and blue)
Web 1.5: PHP + SQL (off-black on white, green and blue hues, some orange)
Web 2.0: Rails + generated SQL (pink on white, huge fonts, lots of attitude — think adolescence)
Web 3.0: Python + RDF (color is not an issue)

Or I'll die trying. Truly. Die. Trying. (Well at least I may end up drinking as much in my thirties as I did in my twenties. Cheers.)

Then for 4.0, I suppose fully OWL-enabled reasoners and modernized but still brutal OCaml could be fun.

[I'm not utterly serious here, but a bit. Please don't take too much offense.]

Thursday, January 11, 2007

Ode To Keepers of Yaks

I am the Yak Shaver
The digression gangsta
I cannot concentrate-ah
I'm living it badd *word it up*

[there would be more verses but right now I'm off to a bookstore shopping for synonym books in urban language - on the way of course realizing I should order one online and become occupied by downloading the latest Opera to my phone so I can view the online shops better - then realizing my battery is low and divert to an electronics shop to get a USB recharger - all in due course for my promising career as a lyricist of course]

The art of Yak Shaving must never be lost to the world. It's my way of life (to the detriment of any spare time of course).