Semantic search

Jump to navigation Jump to search

Introducing rdf2owlxml

Very thoughtful - I simply forgot to publish the last entry of this blog. Well, there you see it finally... but let's move to the new news.
Another KAON2 based tool - rdf2owlxml - just got finished, a converter to turn RDF/XML-serialisation of an OWL-ontology into an OWL/XML Presentation Syntax document. And it even works with the Wine-ontology.

So, whenever you need an ontology in the easy to read OWL/XML Presentation Syntax - for example, in order to XSL it further to a HTML-page representing your ontology, or anything like that, because it's hard to do this stuff with RDF/XML, go to rdf2owlxml and just grab the results! (The results work fine with dlpconvert as well, by the way).

Hope you like it, but be reminded - it is a very early service right now, only a 0.2 version.

Released dlpconvert

There's so much to do right now here, and even much, much more I'd like to do - and thus I'm getting a bit late on my announcements. Finally, here we go: dlpconvert is released in a 0.5 version.

You probably think: what is dlpconvert? dlpconvert is able to convert ontologies, that are within the dlp-fragment, from one syntax - namely the OWL XML Presentation Syntax- into another, here Datalog. Thus you can just take your ontology, convert it and then use the result as your program in your Prolog-engine. Isn't that cool?

Well, it would be much cooler if it were stronger tested, and if it could read some more common syntaxes like RDF/XML-Serialisation of OWL ontologies, but both is on the way. As for testing I would hope that you may test a bit - as for the serialisation, it should be available pretty soon.

dlpconvert is based totally on KAON2 for the reduction of the ontology.

I will write more as soon as I have more time.

Comments to naming

Richard Newman sent me some thoughtful comments via eMail on the What's in a name series (there were also some great comments on the individual entries, feel free to browse them). He sent them via eMail, cause he thought he couldn't comment - that should be wrong, everyone should be able to comment anonymously. Or did anyone else encounter problems? I should switch to some dedicated software soon, anyway, but right now I don't have the time to dig deeper into it. I especially miss trackback, sigh.

Here's what Richard wrote:

"Your first point, about ISBNs and "what's being referenced" --- I think you'd be interested in FRBR, which is a modelling of the bibliographical domain. It splits things up into

Work -> Expression -> Manifestation -> Item

A work is an abstract concept, like "Politeia". An expression is a realisation of a work, so a particular translation is an expression. A manifestation is physical embodiment of an expression: this is what's given an ISBN. All copies of a certain book are Items; the edition of the book is their Manifestation.

So, you see, when you're discussing Plato's Politeia, you have to be conceptually clear about whether you're talking about works, expressions, manifestations, or items.

E.g.

:PolWork dc:creator "Plato" ;
  rdfs:label "Plato's Politeia, the abstract concept." .
:PolExp1 ex:translator "Mr Smith" ;
  frbr:work :PolWork ;
  rdfs:label "Mr. Smith's translation of Plato's Politeia." .
:PolMan1 ex:publisher "Penguin" ;
  frbr:expression :PolExp1 ;
  rdfs:label "Penguin's edition of Smith's translation." .
:MyCopy ex:owner hg:RichardNewman ;
  frbr:manifestation :PolMan1 ;
  rdfs:label "Richard's copy of the Penguin edition." .

Do you see? Each level has its own properties (and some may be duplicated; e.g. each has a title: the title of the abstract work, the name given to the translation, the name Penguin prints on each book, and the name printed on my copy).

I've done a bit of work on modelling FRBR in RDFS/OWL, but haven't yet finished. "

I think that's really interesting, and taking a look at FRBR it was pretty well done. I sure am looking forward to see Richards interpretation in OWL, and will probably use it.

"Your second issue is the difference between a resource and its representation. A URI should only refer to one thing; it is entirely wrong to use http://www.holygoat.co.uk to refer both to my homepage (as in using RDF to describe its language, or size, or last-modified) and to me (my name, my email address, etc.) which I have seen done.

Your web server should return RDF for http://semantic.nodx.net/#Plato if your browser says that it accepts RDF+XML. A normal browser should have an HTML representation returned. Indeed, it's possible to do the following:

  • the abstract resource. Hit this with a browser, get an HTML page; with an RDF agent, get some RDF.
http://example.com/Plato a rdf:resource .
  • the HTML representation.
http://example.com/Plato/html a ex:representation ;
  ex:representationOf http://example.com/Plato .
    1. the RDF.
http://example.com/Plato/rdf a ex:representation ;
  ex:representationOf http://example.com/Plato .

i.e. you can unambiguously refer to each representation, and the resource. When your client arrives, asking for Plato, you can redirect them to the appropriate place. Clever, huh?

URIs should never give a 404. They should return the appropriate headers or content for whatever the client is requesting; this may be the RDF file in which the resource is defined, if the client understands RDF, or an HTML page.

If you're interested in this sort of thing, it pops up on the W3C's RDF Interest Group list occasionally.

Patrick Stickler and others have come up with an additional HTTP verb, MGET, which will return the RDF description of a resource. Combined with their URIQA architecture, it will give you a Concise Bounded Description for a URI. This stops you having to somehow put descriptions into particular files, and better deals with the distributed nature of the Semantic Web. Check it out; it presents several convincing arguments for not using fragment identifiers to refer to resources, and solves your bandwidth problem. You should never have to dump a whole file to get a description of a URI."

I have to note that Richard wrote me this just after part 4 of the series was published, so I could answer some of the questions already in the last two parts. Just to summarise it: I don't like content negotiation. Although it is technically totally feasible, I disagree that it should be done or is a good solution. If my browser asks for http://semantic.nodix.net/#Plato I don't think I should get different things depending on the content negotiation. This feels like cheating.

I wrote that to Richard already, and he answered:

"I think we agree on the main point, which is that

foaf:name "Richard" ; ex:format "HTML" .

which is a travesty :) "

He is totally right here.

"You still see it happen, though, with people referring to Wikipedia pages as if they were the abstract resource.

The content negotiation (getting different things depending on what you accept) is exactly what the Web is supposed to do. If I'm using a mobile browser, I want a simplified version of a page; if I'm an RDF agent, I want RDF, if it exists, because HTML is of no use to me. A common usage of this is to serve up strict XHTML to Mozilla, and less-strict HTML to Internet Explorer. It is also done all the time to serve PNG where the client accepts it, and GIF if it doesn't, and there is an intentional disconnect on the Web between a resource and its representations.

The lack of such a disconnect would lead to exactly the problem you describe; if I can't return a representation of a resource, because it's abstract, then how do I find out anything about it? I could use MGET, but you can't MGET a person... so, if you want to talk about the real world thing "Plato", he has to 404, or you get the "what am I talking about?" problem. Better, in my view, to redirect a browser to plato.html and a SW agent to a chunk of RDF. "

I would rather like to ask for http://semantic.nodix.net/Plato.rdf to get the RDF/XML representation, http://semantic.nodix.net/Plato.owl to get the OWL/XML representation, http://semantic.nodix.net/Plato.html to get a HTML page for the user to read and Plato.jpg for a picture of Plato. This shouldn't be hidden behind content negotiation. I know, I know, Patrick would strongly disagree here, but I think it feels wrong and actually defies the idea of an URI.

"You can do exactly that (and I agree that the representations should have separate URIs --- conneg is only for when you're trying to get some description of an abstract resource), but then how do you refer to the abstract concept of "Plato"? http://.../Plato is a resource, and I want to make statements about him. But there's no point in it being 404 when dereferenced, because then how would I find out that Plato.html exists? HTTP doesn't return URIs, it returns representations of them.

A URI is simply something that is dereferenced to get a representation, and that representation should be decided on by conneg. In this case, /Plato is an abstract resource, so one of the representations should be returned. We can then make statements about Plato (e.g. foaf:name "Plato"), and about the JPEG and HTML representations, because they have different URIs, but still get something useful back when we want to access /Plato."

I also dislike MGET right now. Maybe I am wrong, but to me, the whole URIQA architecture feels somewhat wrong - but maybe I should just dwell deeper into it, I have to admit, I didn't study it yet enough to really be in a position to bash on it. The problem is, that MGET seems unnecessary to me - and it works on a different conceptual level than the rest of the Semantic Web proposals. I think everything MGET solves can be solved with tools that already exist: Richards example above, where he gives triples telling us which representations are used to describe a resource, shows perfectly well that you actually don't need content negotiation and MGET.

"There are things to question about URIQA, but it does have some good going for it. MGET is actually an implicit query. In the standard Web model, you request URIs and get back document representations. Doing an MGET on a Web server is asking it to return a description, regardless of where on the site descriptions of that resource exist, and you're explicitly asking for meta-data. As Patrick points out, it's similar doing a GET and specifying that you accept RDF, but is likely to be more concise (the difference between a "representation" and a "description"). In fact, this is exactly what the Nokia URIQA server does.

MGET overlaps with query servers a bit, and with GET a bit, but it's a little bit special, too. The whole idea is that from a single URI you can get a useful description of a resource, just by issuing a single MGET. Every other approach needs more work."

This URIQA / MGET stuff sounds more and more interesting. I really should dwell deeper into it.

Also, the idea of Concise Bounded Descriptions may be very neat, I have to study that more as well. Funny thing, the very same day Richard pointed me to it, a colleague told me about it too - this is usually a sign, that this idea is worth considering more.

Richard also wrote "URIs should never give a 404", and as you know, I disagreed with it mildly. He tried to summarise his position:

"I consider that each returned resource should have its own URI --- e.g. Plato.jpg --- and that the original URI should be used to make statements about the abstract resource. This allows you to say

...Plato foaf:name "Plato" .
...Plato.jpg ex:resolution "150dpi" .
...Plato.html dc:creator "Denny" .

Dereferencing the abstract resource, rather than throwing a 404, should do something useful --- e.g. redirecting with a 303 to one of the representations. Have you ever tried viewing a Blogger Atom feed in your browser? If you hit it with an RSS reader, you get the XML, but in a browser Blogger shows you an XHTML transformation of the XML. That's useful, and I think that's how the Semantic Web should work. Imagine if your agent hit /Plato, and got RDF out of it, but when you looked at it with your browser you saw a dynamically-generated HTML page? Handy!

I can understand your objection, though; it does seem wrong that you get different things out of the same URI. However, you should almost always get HTML out of plato.html, and RDF out of plato.rdf. All the conneg is doing is making sure you can see an abstract thing in the best way possible, according to what you've told the server you can understand. "

Richard is pretty good in convincing me, cause he uses the right arguments: it's for the people, dummy, and the machines can work it out anyway.

I still stick to the recommendations I gave yesterday. But just as I am writing, and rereading it all, I am starting to change my mind on content negotiation. Maybe it is a good thing. I will have to think about it some more, and as soon as I come to a solution, I will bother you with it again. I still have a gut feeling about it that tells me 'no', but the reasons given sound very convincing and I agree with most of them, so heck, let's meditate on this as soon as I find a few hours to spare.

Big thanks to Richard and his thoughts, anyway. I hope this discussion helps you to make up your own mind as well.

What's in a name - Part 6

In this series we learned how to make URIs for entities. I know there's a big discussion flaring up every few weeks or so, if we should use fragment identifier or not. For me, this question is pretty much settled. Using a fragment identifier has the advantage of giving you the ability of providing a human readable page for those few lost souls who look up the URI, so maybe it's a tad nicer than using no fragment identifier and returning 404s. Not using fragids has the advantage of probably reducing bandwidth - but this discussion should be more or less academic, because looking up URIs, as we have seen, should not happen.

There is some talking about different representations, negotiating media-types, returning RDF in one, XHTML in the other case, but to be honest, I think that's far too complicated. And you would need to use another web server and extensions to HTTP to make this real, which doesn't really help the advent of the Semantic Web. Look at Nokias URIQA project for more information.

Keep this rules in mind, and everything should be fine:

  • be careful to use unused URIs if you reference a new entity. Take one from an URI space you have control of, so that URI collision won't appear
  • don't put a website under the URI you used to to name an entity. That would lead to URI collision
  • try to make nice looking URIs, but don't try to hard. They are supposed to be hidden by the application anyway
  • provide rdfs:label and rdfs:seeAlso instead. This solves everything you would want to try to solve with URI naming, but in a standard compliant way
  • give your resources URIs. Please. So that other can reference them more easily.

I should emphasise the last one more. Especially using RDF/XML-Syntax easily leads to anonymous nodes, which are a pain in the ass because they are hard or impossible to address. Especially, don't use rdf:nodeID. They don't give your node an ID that's visible to the outer world. This is just a local name. Don't use it, please.

The second is using them like this:

<foaf:person about="me">
  <foaf:knows>
    <foaf:Person>
      <foaf:name>J. Random User</foaf:name>
    </foaf:Person>
  </foaf:knows>
</foaf:Person>

Actually, the Person known to "me" is an anonymous one. You can't refer to her. Again, try to avoid that. If you can, look up the URI the person gave to herself in her own FOAF-file. Or give her a name in your own URI-space. Don't be afraid, you won't run out of it.

Another very interesting approach is to use published subjects. I will return to this in another blog, promised, but so long: never forget, there is owl:sameAs to make two URIs point to the same thing, so don't mind too much if you doublename something.

Well, that's it. I hope you enjoyed the series, and that you learned a bit from it. Looking forward to your comments, and your questions.

What's in a name - Part 5

After calling Plato an XML-Element, making movies out of websites and having several accidents with careless URIs, it seems we return to the very beginning of this series.

http://semantic.nodix.net/document/Politeia dc:creator "Plato".

Whereby http://semantic.nodix.net/document/Politeia explicitly does not resolve but returns a 404, resource not found. Let's remember, why didn't we like it? Because humans, upon seeing this, have the urge to click on it in order to get more information about it. A pretty good argument, but every solution we tried brought us more or less trouble. We didn't get happy with any of them.

But how can I dismiss such an argument? Don't I risk loosing focus with saying "don't care about humans going nowhere"? No, I really don't think so. Due to two reasons, one meant for humans and one for the machines.

First the humans (humans always should go first, remember this, Ms and Mr PhD-student): humans actually never see this URI (or at least, should not but when debugging). URIs who will grace the GUI should have a rdfs:label which provides the label human users will see when working with this resource. Let's be honest: only geeks like us think that http://semantic.nodix.net/document/Politeia is a pretty obvious and easy name for a resource. Normal humans would probably prefer "Politeia", or even "The Republic" (which is the usual name in English speaking countries). Or be able to define their own name.

As they don't see the URI, they actually never feel the urge to click on it, or to copy and paste it to the next browser window. Naming it http://semantic.nodix.net/document/Politeia instead of http://semantic.nodix.net/concept/1383b_xc is just for the sake of readability of the source RDF files, but actually you should not derive any information out of the URI (that's what the standard says). The computer won't either.

The second point is, a RDF application shouldn't look up URIs either. It's just wrong. URIs are just names, it is important that they remain unique, but they are not there for looking up in a browser. That's what URLs are for. It's a shame they look the same. Mozilla realised the distinction when they gave their XUL language the namespace http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul. Application developers should realise this too. rdfs:seeAlso and rdfs:isDefinedBy give explicit links applications may follow to get more information about a resource, and using owl:imports actually forces this behaviour - but the name does not.

Getting information out of names is like making fun of names. It's mean. Remember the in-kids in primary school making fun of out-kids because of their names? You know you're better than that (and, being a geek, you probably were an out-kid, so mere compassion and fond memories should hold you back too)..

Just to repeat it explicitly: if an URI gives back a 404 when you put it in a browser navigation bar - that's OK. It was supposed to identify resources, not to locate them.

Now you know the difference between URIs and URLs, and you know why avoiding URI collision is important and how to avoid it. We'll wrap it all in the final instalment of the series (tomorrow, I sincerely hope) and give some practical hints, too.

By the way, right after the series I will talk about content negotiation, which was mentioned in the comments and in e-Mails.

Uh, and just another thing: the wary reader (and every reader should be wary) may also have noticed that

Philosophy:Politeia dc:creator "Plato".

is total nonsense: it says, that there is a resource (identified with QName Philosophy:Politeia) that is created by "Plato". Rest assured that this is wrong - no, not because Socrates should be credited as the creator of the Politeia (this is another discussion entirely) but because the statement claims that the string "Plato" created it - not a Person known by this name (who would be a resource that should have an URI). But this mistake is probably the most frequent one in the world of the Semantic Web - a mistake nevertheless.

It's OK if you make it. Most applications will cope with it (and some are actually not able to cope with the correct way). But it would not be OK if you didn't know that you are making a mistake.

What's in a name - Part 4

I promised you four solutions to the problem of dubbing with appropriate URIs. So, without further ado, let's go.

The first one you've seen already. It's using anonymous nodes.

_person foaf:interest _security.
http://dmoz.org/Computers/Security/ dc:subject _security.

But here we get the problem, that we can't reference _security from outside, thus loosing a lot of the possibilities inherent in the Semantic Web, because this way you can not say that someone else is interested in the same topic as _person above. Even if you say, in another RDF file,

_person2 foaf:interest _security.
http://dmoz.org/Computers/Security/ dc:subject _security.

_security actually does not have to be the same as above. Who says, websites only have one subject? The coincidental equality of the variable name _security bears as much semantics as the equality of two variables x in a C and a Python-Program.
So this solution, although possible, bears too much short-comings. Let's move on.

The second solution is hardly available to the majority of us puny mortals. It's introducing a new URI schema. Let's return to our very first example, where we wanted to say that the Politeia was written by Plato.

urn:isbn:0192833707 dc:creator "Plato".

Great! No problems here. Sure, your web-browser can't (yet) resolve urn:isbn:0192833707, but no ambiguity here: we know exactly of what we speak.

Do we? Incidentally, urn:isbn:0465069347 also denotes the Politeia. No, not in another language (those would be another handful of ISBN numbers), just a different version (the text is public domain). Now, does the following statement hold?

urn:isbn:0192833707 owl:sameAs urn:isbn:0465069347.

Most definitively not. They have different translators. They have different publishers. These are different books. But it's the same - what? What is the same? It's not the same text. It's not the same book. They may have the same source text they are translated from. But how to express this correctly and still useful?

The urn:isbn: scheme is very useful for a very special kind of entities - published books, even the different versions of published books.
The problem with this solution that you would need tons of schemes. Imagine the number of committees! This would, no, this should never happen. We definitively need an easier solution, although this one certainly does work for very special domains.

Let's move on to the third solution: the magic word is fragment identifier. #. Instead of saying:

http://semantic.nodix.net/Politeia dc:creator http://semantic.nodix.net/Plato.

and thus getting 404s en masse, I just say:

http://semantic.nodix.net/#Politeia dc:creator http://semantic.nodx.net/#Plato.

See? No 404. You get to the homepage of this blog by clicking there. And it's valid RDF as well. So, isn't it just perfect? Everything we wished for?

Not totally, I fear. If I click on http://semantic.nodx.net/#Plato, I actually expect to read something about Plato, and not to see a blog about the Semantic Web. So this somehow would disappoint me. Better than a 404, still...
The other point is my bandwidth. There can be RDF files with thousands of references. Following every single one will lead to considerable bandwidth abuse. For naught, as there is no further information about the subject on the other side. Maybe using http://semantic.nodix.net/person#Plato would solve both problems, with http://semantic.nodix.net/person being a website saying something like "This page is used to reserve conceptual space for persons. To understand this, you must understand the magic of URIs and the Semantic Web. Now, go back whereever you came from and have a nice day." Not too much webspace and bandwith will be used for this tiny HTML-page.

You should be careful though to not have a real fragment identifier "Plato" in the page, or you would actually dereference to this element. URI collision again. You don't want Plato to become half-philosopher / half-XML-element, do you?

We will return to fragment identifiers in the last part of this six part series again. And now let's take a quick look at the fourth solution - we will discuss it more thoroughly next time.

Use a fresh URI whenever you need an URI and don't care about it giving a 404.

What's in a name - Part 3

Last time we merrily published our first statement for the Semantic Web:

http://www.imdb.com/title/tt0088247/ http://purl.org/dc/elements/1.1/creator "James Cameron".

A fellow Semantic Web author didn't like the number-encoded IMdb-URI, but found a much more compelling one and then published the following statement:

http://en.wikipedia.org/wiki/The_Terminator http://purl.org/dc/elements/1.1/date "1984-10-26".

A third one sees those and, in order to foster integration of data offers helpfully the following statement:

http://www.imdb.com/title/tt0088247/ owl:sameAs http://en.wikipedia.org/wiki/The_Terminator.

And now they live merrily ever after. Or do you hear the thunder of doom rolling?

The problem is that the URIs above actually already denote something, namely the IMdb website about the Terminator and the Wikipedia-article on the Terminator. They did not denote the movie itself, but that's how they're used in our examples. Statement #3 above actually says the two websites are the same. The first one says, that "James Cameron" created the IMdb website on the Terminator (they'd wish), and the second one says that the Wikipedia article was created in 1984, which is wrong (July 23, 2001 would be the correct date). We have a classic case of URI collision.

This happens all the time. People working professionally on this do this too:

_person foaf:interest http://dmoz.org/Computers/Security/.

I'd bet, _person (remaining anonymously here) does not have such a heavy interest in the website http://dmoz.org/Computers/Security/, but rather in the Topic the website is about.

_person foaf:interest _security.
http://dmoz.org/Computers/Security/ dc:subject _security.

Instead of letting _security be anonymous, we'd rather give it a real URI. This way we can reference it later.

_person foaf:interest http://semantic.nodix.net/topic/security.
http://dmoz.org/Computers/Security/ dc:subject http://semantic.nodix.net/topics/security.

But, oh pain - now we're exactly at the same spot we've been in the last part. We have an URI that does not dereference to a website (by the way, I do know that the definition of foaf:interest actually says the semantics of foaf:interest is, that the Subject is interested in the Topic of the Object, and not the Object itself, but that's not my point here)
Thinking for a moment about it, we must conclude that it is actually impossible to achieve both goals: either the URIs will identify a resource retrievable over the web and are thus unsuitable as URIs for entities outside the web (like persons, chairs and such) because of URI collision, or they don't - and will then lead to 404-land.

Isn't there any solution? (Drums) Stay tuned for the next exciting installment of this series, introducing not one, not two, not three, but four solutions to this problem!

What's in a name - Part 2

How to give a resource a name, an URI? Let's look at this statement:

movie:Terminator dc:creator "James Cameron".

Happy with that? This is a valid RDF statement, and you understand what I wanted to say, and your RDF machine will be able to read and process it, too, so everything is fine.

Well, almost. movie:Terminator is a QName, and movie: is just a shorthand prefix, a namespace, that actually has to be defined as something. But as what? URIs are well-defined, so we shouldn't just define the namespace arbitrarily. The problem is, someone else could do the same, and suddenly, one URI could denote two different resources - this is called URI collision, and it is the next worst thing to immanentizing the Eschaton. That's why you should grab some URI space for yourself and there you go, you may define as many URIs there as you like (remember, the U in URI means Universal, that's why they make such a fuss about the URI space and ownership of it).

I am the webmaster of http://semantic.nodix.net, and the URI belongs to me and with it, all the URIs starting with it. Thus I decide, that movie: shall be http://semantic.nodix.net/movie/. Our example statement thus is the same as:

http://semantic.nodix.net/movie/Terminator http://purl.org/dc/elements/1.1/creator "James Cameron".

So this is actually what the computer sees. The short hand notation above is just for humans. But if you're like me, and you see the above Subject, you're already annoyed that it is not a link, that you can't click on it. So you copy it into your browser address bar, and go to http://semantic.nodix.net/movie/Terminator. Ups. A 404, the website is not found. You start thinking, oh man, stupid! Why you giving the resource such a name that looks so much like an web address, and then point it to 404-Nirvana?

Many think so. That's because they don't grasp the difference between URIs and URLs, and to be honest, this difference is maybe the worst idea the W3C ever had (that's a hard-to-achieve compliment, considering the introduction of XML/RDF-serialisation and XSD). We will return to this difference, but for now, let's see what usually happens.

Because http://semantic.nodix.net/movie/Terminator leads to nowhere, and I'm far too lazy to make a website for the Terminator just for this example, we will take another URI for the movie. Jumping to IMdb we quickly find the appropriate one, and then we can reformulate our statement:

http://www.imdb.com/title/tt0088247/ http://purl.org/dc/elements/1.1/creator "James Cameron".

Great! Our subject is a valid URI, clicking on http://www.imdb.com/title/tt0088247/ (or pasting it to a browser) will tell you more about the subject, and we have a valid RDF statement. Everything is fine again...

...until next time, where we will discuss the minor problems of our solution.

What's in a name - Part 1

There are tons of mistakes that may occur when writing down RDF statements. I will post a six part series of blog entries, starting with this one, about what can go wrong in the course of naming resources, why it is wrong, and why you should care - if at all. I'll try to mix experience with pragmatics, usability with philosophy. And I surely hope that, if you disagree, you'll do so in the comments or in your own blog.

The first one is the easiest to spot. Here we go:

"Politeia" dc:creator "Plato".

If you don't know about the differences between Literals, QNames and URIs, please take a look at the RDF Primer. It's easy to read and absolutely essential. If you know about the differences, you already know that the above said actually isn't a valid RDF statement: you can't have a literal as the subject of a statement. So, let's change this:

philo:Politeia dc:creator "Plato".

What's the difference between these two? In the first one you say that "Plato" is the creator of "Politeia" (we take the semantics of dc:creator for granted for now). But in the second you say that "Plato" is the creator of philo:Politeia. That's like in Dragonheart, where Bowen tries to find a name for the dragon because he can't just call him "dragon", and he decides on "draco". The dragon comments: "So, instead of calling me dragon in your own language, you decide to call me dragon in another language."

Yep, we decide to talk about Politeia in another language. Because RDF is another language. It tries to look like ours, it even has subjects, objects, predicates, but it is not the language of humans. It is (mostly) much easier, so easy in fact even computers can cope with it (and that's about the whole point of the Semantic Web in the first place, so you shouldn't be too surprised here).

"Politeia" has a well defined meaning: it is a literal (the quotation marks tell you that) and thus it is interpreted as a value. "Politeia" actually is just a word, a symbol, a sign pointing to the meant string Politeia (a better example would be: "42" means the number 42. "101010b", "Fourty-Two" or "2Ah" would have been perfectly valid other signs denoting the number 42).

And what about philo:Politeia? How is it different from "Politeia", what does this point to?

philo:Politeia is a Qualified Name (QName), and thus ultimatively a short-hand notation for an URI, an Unified Resource Identifier. In RDF, everything has to be a resource (well, remember, RDF stands for Resource Description Framework), but that's not really a constraint, as you may simply consider everything a resource. Even you and me. And URIs are names for resources. Universally (well, at least globally) unique names. Like philo:Politeia.

You may wonder about what your URI is, the one URI denoting you. Or what the URI of Plato is, or of the Politeia? How to choose good URIs, and what may go wrong? And what do URIs actually denote, and how? We'll discuss this all in the next five parts of this series, don't worry, just stay tuned.

Why we will win

People keep saying that the Semantic Web is just a hype. That we are just an unholy chimaera of undead AI researchers talking about problems solved by the database guys 15 years ago. And that our work will never make any impact in the so called real world out there.

As I stated before: I'm a believer. I'm even a catholic, so this means I'm pretty good at ignoring hard facts about reality in order to stick to my beliefs, but it is different in this case: I slowly start to comprehend why Semantic Web technology will prevail and make life better for everyone out there. It' simply the next step in the IT RevoEvolution.

Let's remember the history of computing. Shortly after the invention of the abacus the obvious next step, the computer mainframe, appeared. Whoever wanted to work with it, had to learn to use this one mainframe model (well, the very first ones were one-of-a-kind machines). Being able to use one didn't necessarily help you using the other.

First the costs for software development were negligible. But slowly this changed, and Fred Brooks wrote down his experience with creating the legendary System/360 in the Mythical Man-Month (a must-read for software engineers), showing how much has changed.

Change was about to come, and it did come twofold. Dennis Ritchie is to blame for both of them: together with Ken Thompson he made Unix, but in order to make that, he had to make a programming language to write Unix in, this was C, which he made together with Brian Kernighan (this account is overly simplified, look at the history of Unix for a better overview).

Things became much easier now. You could port programs in a simpler way than before, just recompile (and introduce a few hundred #IFDEFs). Still, the masses used the Commodore 64, the Amiga, the Atari ST. Buying a compatible model was more important than looking at the stats. It was the achievement of the hardware development for the PC and of Microsoft to unify the operating systems for home computers.

Then came the dawning of the age of World Wide Web. Suddenly the operating system became uninteresting, the browser you use was more important. Browser wars raged. And in parallel, Java emerged. Compile once, run everywhere. How cool was that? And after the browser wars ended, the W3Cs cries for standards became heard.

That's the world as it is now. Working at the AIFB, I see how no one cares what operating system the other has, be it Linux, Mac or Windows, as long as you have a running Java Virtual Machine, a Python interpreter, a Browser, a C++ compiler. Portability really isn't the problem anymore (like everything in this text, this is oversimplified).

But do you think, being OS independent is enough? Are you content with having your programs run everywhere? If so, fine. But you shouldn't be. You should ask for more. You also want to be independent of applications! Take back your data. Data wants to be free, not locked inside an application. After you have written your text in Word, you want to be able to work with it in your Latex typesetter. After getting contact information via a Bluetooth connection to your mobile phone, you want to be able to send an eMail to the contact from your web mail account.

There are two ways to achieve this: the one is with standard data formats. If everyone uses vCard-files for contact information, the data should flow freely, shouldn't it? OpenOffice can read Word files, so there we see interoperability of data, don't we?

Yes, we do. And if it works, fine. But more often than not it doesn't. You need to export and import data explicitly. Tedious, boring, error prone, unnerving. Standards don't happen that easily. Often enough interoperability is achieved with reverse engineering. That's not the way to go.

Using a common data model with well defined semantics and solving tons of interoperability questions (Charset, syntax, file transfer) and being able to declare semantic mappings with ontologies - just try to imagine that! Applications being aware of each other, speaking a common language - but without standard bodies discussing it for years, defining it statically, unmoving.

There is a common theme in the IT history towards more freedom. I don't mean free like in free speech, I mean free like in free will.

That's why we will win.

I am weak

Basically I was working today, instead of doing some stuff I should have finished a week ago for some private activities.

The challenge I posed myself: how semantic can I already get? What tools can I already use? Firefox has some pretty neat extensions, like FOAFer, or the del.icio.us plugin. I'll see if I can work with them, if there's a real payoff. The coolest, somehow semantic plugin I installed is the SearchStatus. It shows me the PageRank and the Alexa rating of the visited site. I think that's really great. It gives me just the first glimpse of what metadata can do in helping being an informed user. The Link Toolbar should be absolutely necessary, but pitily it isn't, as not enough people make us of HTMLs link element the way it is supposed to be used.

Totally unsemantic is the mouse gestures plugin. Nevertheless, I loved those with Opera, and I'm happy to have them back.

Still, there are such neat things like a RDF editor and query engine. Installed it and now I want to see how to work with it... but actually I should go upstairs, clean my room, organise my bills and insurance and doing all this real life stuff...

What's the short message? Get Firefox today and discover its extensions!

Imagine there's a revolution...

... and no one is going to it.

This notion sometimes scares me when I think abou the semantic web. What if all this great ideas are just to complex to be implemented? What if it remains an ivory tower dream? But, on the other hand, how much pragmatism can we take without loosing the vision?

And then, again, I see the semantic web working already: it's del.icio.us, it's flickr, it's julie, and there's so much more to come. The big time of the semantic web is yet to come, and I think none of us can really imagine the impact it is going to have. But it will definitively be interesting!

AcceLogiChip

Accelerated logic chips - that would be neat.

The problem with all this OWL stuff is, that it is computationally expensive. Google beats you in speed easily, having some 60.000 PCs or so, but indexing some 8 billion web pages, each with maybe a thousand words. And if you ever tried Googles Desktop Search, you will see they can perform this miracles right on your PC too! (Never mind that there are a dozen tools doing exactly the same stuff Googles Desktop Search does, just better - but hey, they lack the name!)

What does the Semantic Web achieve? Well, ever tried to run a logic inferencing engine with a few million instances? With a highly axiomatized TBox of, let's say, just a few thousand terms? No? You really should.

Sure, our PCs do get faster all the time (thanks to Moores Law!), but is that fast enough? We want to see the Semantic Web up and running not in a few more iterations of Moores Law, but much, much earlier. Why not use the same trick graphic magicians did? Highly specialized accelerated logic chips, things that can do your tableu reasoning in just a fraction of the time needed with your bloated all-purpose-CPU.

World Wide Prolog

Today I had an idea - maybe this whole Semantic Web idea is nothing else than a big worldwide Prolog program. It's the AI researchers trying to enter the real world through the W3Cs backdoor...

No, really, think about it: almost all most people do with OWL is actually some logic programing. Declaring subsumptions, predicates, conjunctions, testing for entailment, getting answers out of this - but on a world wide scale. And your browser does the inferencing for you (or maybe the server? Depends on your architecture).

They are still a lot of questions open (and the actual semantic differences between Description Logics, and Logic Programming surely ain't the smalles ones of them), like how to infere anything with contradicting data (something that surely will happen in the World Wide Semantic Web), how to treat dynamics (I'm not sure how to do that without reification in RDF), and much more. Looking forward to see this issues resolved...

Gnowsis and further

Today, Leo Sauermann of the DFKI was here, presenting his work on Gnowsis. It was really interesting, and though I don't agree with everything he said, I am totally impressed by the working system he presented. It's close to some ideas I had, about a Semantic Operating System Kernel, doing nothing but administrate your RDF data and offering it to any application around via a http-protocol. Well, I guess this idea was just a tat too obvious...

So I installed Gnowsis on my own desktop and play around with it now. I guess the problem is we don't really have roundtrip information yet - i.e., Information I change in one place shall magically be changed everywhere. What Gnowsis does is integrate the data from various sources into one view, that makes a lot of applications easily accessible. Great idea. But roundtripping data integration is definitively what we need: if I change the phone number of a person, I want this change to get propagated to all applications.

So again, differing to Gnowsis I would prefer a RDF store, that actually offers the whole data householding for all applications sitting atop. Applications are nought but a view on your data. Integrating from existing applications is done the Gnowsis way, but after that we leave the common trail. Oh well, as said, really interesting talk.

Mother philosophy

I should start to write some content on this blog soon, but actually I am still impressed with this technology I am learning here every day...

When the FOIS2004 was approaching, an Italian newspapers published this under the heading "Philosophy - finally useful for something" (or so, my Italian is based on a autodidactic half day course). I found this funny, and totally untrue.

Philosophy always had the bad luck, that every time a certain aspect of it provoced wider attention, this aspect became a discipline of its own. Physics, geometry and mathematics are the classical examples, later on theology, linguistics, anthropology, and then, in the 20th century, logic went this way too. It's like philosophy being the big incubator for new disciplines (you can see that still in the anglo-american tradition of almost all doctors actually being Ph.D.s, philosophical doctors.

Thus this misconception becomes understandable. Now, let's look around - what's the next discipline being born from philosophy? Will it be business ethics? Will it be the philosophy of science, being renamed as scientific managment?

My guess is: due to the fast growing area of the Semantic Web, it will be ontology. Today, the Wikipedia already made two articles on it, ontologies in philosophy and ontologies in computer science. This trend will gain momentum, and even though applied ontology will always feed from the fundamental work done from Socrates until today, it will become a full-fledged discipline of its own.

I'm a believer

The Semantic Web is promising quite a lot. Just take a look at the most cited description of the vision of the Semantic Web, written by Tim Berners-Lee and others. Many people are researching on the various aspects of the SemWeb, but in personal discussions I often sense a lack of believing.

I believe in it. I believe it will change the world. It will be a huge step forward to the data integration problem. It will allow many people to have more time to spend on the things they really love to do. It will help people organize their lives. It will make computers seem more intelligent and helpful. It will make the world a better place to live in.

This doesn't mean it will safe the world. It will offer only "nice to have"-features, but then, so many of them you will hardly be able to think of another world. I hardly remember the world how it was before e-Mail came along (I'm not that old yet, mind you). I sometimes can't remember how we went out in the evening without a mobile. That's where I see the SemWeb in 10 years: no one will think it's essential, but you will be amazed when thinking back how you lived without it.

Who am I?

Well, as this being a blog, it will turn out that it is more important what I write than who I am. Just for the context, I nevertheless want to offer a short sketch about my bio.

I studied Computer Science and Philosophy at the University of Stuttgart, Germany. In Computer Science, I thought about Software Architectures, Programming Languages and User Interfaces, and my master thesis happened to be the first package to offer a validating XML parser for the programming language Ada 95.
In Philosophy I started thinking a lot of Justice, especially John Rawls and Plato, but finally I had a strong move to Construcitivst Epistemology and the ontological status of neural networks (both papers are in German and available from my website.

It's a pretty funny thing that next week I will listen to talk on neural networks and ontologies again, and nevertheless my then made paper and the talk won't have too much in common ;-)

Well, so how comes I am working on Semantic Web technologies by now? I have the incredible luck to work in the Knowledge Management Group of the AIFB in Karlsruhe, and there on the EU SEKT Project. I still have a lot to learn, but in the last few weeks I aggregated quite a good grasp on Ontology Engineering, RDF and OWL and some other fields. This is all pretty exicting and amazing and I am looking forward to see what's around the next triple.

Welcome!

Welcome to my new blog! Technology kindly provided by Blogger.com