Semantic search


SSSW Day 4

This day had no theoretical talks, but instead two invited speakers - and a big social programme, with a lunch at a swimming pool and a dinner in Segovia. Segovia is a beautiful town, with a huge, real, still standing Roman aqueduct. Stunning. And there I ate the best pork ever! The aqueduct survived the huge Lisbon earthquake of 1755, although houses around it crumbled and broke. This is because it is built without any mortar - just stone over stone. So the stones could swing and move slightly, and the construction survived.
Made me think of loosely coupled systems. I probably had too much computer science the last few days.

The talks were very different today: first was Mike Wooldridge of the University of Liverpool. He talked about Multiagent Systems in the past, the present and the future. He identified five trends in computing: Ubiquity, Interconnection, Intelligence, Delegation and Human-orientation.
His view on intelligence was very interesting: it is about the complexity of tasks that we are able to automate and delegate to computers. He quoted John Alan Robinson - the guy who invented the resolution calculus, a professor of philosophy - as exclaiming "This is Artificial Intelligence!" when he saw a presentation of the FORTRAN compiler at a conference. I guess the point was: don't worry about becoming as intelligent as humans, just care about getting closer.
"The fact that humans were in control of cars - our grandchildren will be quite uncomfortable with this idea."

The second talk returned to the Semantic Web in a very pragmatic way: how to make money with it? Richard Benjamins of iSOCO just flew in from Amsterdam where he was at the SEKT meeting, and he brought promising news about the developing market for Semantic Web technologies. Mike Wooldridge criticized Richard's optimistic projections and noted that he too, about ten years ago, had put a lot of energy and money into the growing Multiagent market - and lost most of it. It was an interesting discussion - Richard being the believer, Mike the sceptic, and a lot of young people betting a few years worth of life on the ideas presented by the first one...

SSSW Day 5

Today (which is July 15th) there was just one talk. The rest of the day - besides the big dinner (oh well, yes, there was a fantastic dinner speech performed by Aldo Gangemi and prepared by Enrico and Asun, if I understood it correctly, which was hilariously funny) and the disco - was available for work on the mini projects. But more about the mini projects in the next blog post.

The talk was given by the University of Manchester's Carole Goble (I like that website. It starts with the sentence "This web page is undergoing a major overhaul, and about time. This picture is 10 years old. the most recent ones are far too depressing to put on a web site." How many professors did you have that would have done this?). She gave a fun and nevertheless insightful talk about the Semantic Web and the Grid, describing the relationship between the two as a very long engagement. The grid is the old, grudgy, hard working groom, the Semantic Web the bride, being aesthetically pleasing and beautiful.

What is getting gridders excited? Flexible and extensible schemata, data fusion and reasoning. Sounds familiar? Yes, these are exactly the main features of Semantic Web technologies! The grid is not about accessing big computers (as most people think in the US, but they are a bit behind on this as well), it is about knowledge communities. But one thing is definitely lacking: scalability, people, scalability. They went to test a few Semantic Web technologies with a little data - 18 million triples. Every tool broke. The scalability is lacking, even though the ideas are great.

John Domingue pointed out that scalability is not as much of a problem as it seems, because the TBoxes, where the actual reasoning will happen, will always remain relatively small. And the scalability issue with the ABoxes can be solved with classic database technology.

The grid offers real applications, real users, real problems. The Semantic Web offers a lot of solutions and discussions about the best solution - but surprisingly often lacks an actual problem. So it is obvious that the two fit together very nicely. At the end, Carole described them as engaged, but not married yet.

At the end she quoted Trotsky: "Revolution is only possible when it becomes inevitable" (well, at least she claims it's Trotsky, Google claims it's Carole Goble, maybe someone has a source? - Wikiquote doesn't have it yet). The quote is in line with almost all speakers: the Semantic Web is not Revolution, it is Evolution, an extension of the current web.

Thanks for the talk, Carole!

SSSW Last Day

The Summer School on Ontological Engineering and the Semantic Web finished on Saturday, July 16th, and I can't remember having a more intense and packed week in years. I really enjoyed it - the tutorials, the invited talks, the workshops, the social events, the mini project - all of it was awesome. It's a pity that it's all over now.

Today, besides the farewells and thank yous and the party in Madrid with maybe half of the people, also saw the presentation of the mini projects. The mini projects were somewhat similar to The Semantic Web In One Day we had last year - but without a real implementation. Groups of four or five people had to create a Semantic Web solution in only six hours (well, at least conceptually).

The results were interesting. All of them were well done and highlighted some promising use cases for the Semantic Web, where data integration will play an important role: going out in the evening, travelling, dating. I'd rather not consider too deeply whether computer scientists are scratching their own itch here ;) I really enjoyed the Peer2Peer theater, where messages wandered through the whole classroom in order to visualize the system. This was fun.

Our own mini project modelled the Summer School and the projects themselves, capturing knowledge about the buildup of the groups and classifying them. We had to use not only quite complex OWL constructs, but also SWRL rules - and we still had problems expressing a quite simple set of rules. Right now we are trying to write these experiences down in a paper; I will inform you here as soon as it is ready. Our legendary eternal struggle at the boundaries of sanity and Semantic Web technologies seemed to be impressive enough to have earned us a cool prize. A clock.

Thanks to all organizers, tutors and invited speakers of the Summer School, thanks to all the students as well, for making it such a great week. Loved it, really. I hope to stay in touch with all of you and see you at some conference pretty soon!

SWAT4HCLS trip report

This week saw the 12th SWAT4HCLS event in Edinburgh, Scotland. It started with a day of tutorials and workshops on Monday, December 10th, on topics such as SPARQL, querying, ontology matching, and using Wikibase and Wikidata.

Conference presentations went on for two days, Tuesday and Wednesday. There were four keynotes, including mine on Wikidata and how to move beyond Wikidata (presenting the ideas from my Abstract Wikipedia papers). The other three keynotes (as well as a number of the paper presentations) were all centered on the FAIR concept, which I already saw being so prominent at the eScience conference earlier this year. FAIR as in Findable, Accessible, Interoperable, and Reusable publication of data. I am very happy to see these ideas spread so widely!

Birgitta König-Ries talked about how to use semantic technologies to manage FAIR data. Dov Greenbaum talked about how licenses interplay with data and what it means for FAIR data - my personal favorite of the keynotes, because of my morbid fascination regarding licenses and intellectual property rights pertaining to data and knowledge. He actually confirmed my understanding of the area - that you can’t really use copyright for data, and thus the application of CC-BY or similar licenses to data would stand on shaky ground in a court. The last keynote was by Helen Parkinson, who gave a great talk on the issues that come up when building vocabularies, including over-ontologizing (and the siren call of just keeping on modeling). She put the issues in parallel to the travels of Odysseus, which was delightful.

The conference talks and posters were really spot on the topic of the conference: using semantic web technologies in the life sciences, health care, and related fields. It was a very satisfying experience to see so many applications of the technologies that Semantic Web researchers and developers have been creating over the years. My personal favorite was MetaStanza, web components that visualize SPARQL results in many different ways (a much needed update to SPARK, which Andreas Harth and I had developed almost a decade ago).

On Thursday, the conference closed with a Hackathon day, which I couldn’t attend unfortunately.

Thanks to the organizers for the event, and thanks again for the invitation to beautiful Edinburgh!

Other trip reports (send me more if you have them):

SWSA panel

Thursday, October 7, 2021, saw a panel of three founding members of the Semantic Web research community, who each have been my teachers and mentors over the years: Rudi Studer, Natasha Noy, and Jim Hendler. I loved watching the panel and enjoyed it thoroughly, also because it was just great to see all of them again.

There were many interesting insights and thoughts in this panel, too many to write them all down, but I want to mention a few.

It was interesting how much all panelists talked about creating the Semantic Web community, and how much of an intentional effort that was. Deciding that it needs a conference, a journal, an organization, setting those up, and their interactions. Seeing and fostering a sustainable research community grown out of an idea is a formidable and amazing effort. They all mentioned positively the diversity in the community, and that it was a conscious effort to work towards that. Rudi mentioned that the future challenge will be with ensuring that computer science students actually have Semantic Web technologies integrated into their standard curriculum.

They named a number of the successes that were influenced by the Semantic Web research work, such as Schema.org, the heavy use of SPARQL in supercomputing (I had no idea!), Wikidata (thanks for the shout out, Rudi!), and the development of scalable graph databases. Natasha raised the advantage of having common identifiers throughout an organization, i.e. that everyone refers to California the same way. They also named areas that remained elusive and that they expect to see progress in the coming years, Rudi in particular mentioned Agents and Common Sense, which was echoed by the other participants, and Jim mentioned Personal Knowledge Graphs. Jim mentioned he was surprised by the growing importance of unstructured data. Jim is also hoping for something akin to “procedural attachments” - you see some new data coming in, you perform this action (I would like to think that a little Wikifunctions goes a long way).

We need both, open knowledge graphs and closed knowledge graphs (think of your personal ones, but also the ones by companies).

The most important contribution so far and also well into the future was the idea of decentralization of semantics. To allow different stakeholders to work asynchronously and separately on parts of the semantics and yet share data. This also includes the decentralization of knowledge graphs, but also in the future we will encounter a world where semantics are increasingly brought together and yet decentralized.

One interesting anecdote was shared by Natasha. She was talking about a keynote by Guha (one of the few researchers who were namechecked in the panel, along with Tim Berners-Lee) at ISWC in Sydney 2013. How Guha was saying how simple the technology needs to be, and how there were many in the audience who were aghast and shocked by the talk. Now, eight years later and given her experience building Dataset Search, she appreciates the insights. If they have a discussion about a new property for longer than five minutes, they drop it. It’s too complicated, and people will use it wrong so often that the data cleanup will become expensive.

All of them shared the advice for researchers in their early career stage to work on topics that truly inspire them, on problems that are real and that they and others care about, and that if they do so, the results have the best chance to have impact. Think about problems you can explain to people not in your field, about “how can we use triples to save the world” - and not just about “hey, look, that problem that we solved with these other technologies previously, now we can also solve it with Semantic Web technologies”. This doesn’t really help anyone. Solve new problems. Solve real problems. And do what you are truly passionate about.

I enjoyed the panel, and can recommend that everyone in the Semantic Web research area, or in any related, nearby research, check it out. Thanks to the organizers for this talk (which is the first session in a series of talks that will continue with Ora Lassila early December).


Sam Altman and the veil of ignorance

(This is not about Altman having been removed as CEO of OpenAI)

During the APEC forum on Thursday, Sam Altman was quoted as having said the following: "Four times now in the history of OpenAI—the most recent time was just in the last couple of weeks—I’ve gotten to be in the room when we push the veil of ignorance back and the frontier of discovery forward. And getting to do that is like the professional honor of a lifetime."

He meant that as an uplifting quote to describe how awesome his company and their achievements are.

I find it deeply worrying. Why?

The "veil of ignorance" (also known as the original position) is a thought experiment introduced by John Rawls, one of the leading American moral and political philosophers of the 20th century. The goal is to think about the fairness of a society or a social system without you knowing where in the system you end up: are you on top or at the bottom? What are your skills, your talents? Who are your friends? Do you have disabilities? What is your gender, your family history?

The whole point is to *not* push the veil of ignorance back, otherwise you'll create an unfair system. It is a good tool to think about the coming disruptions by AI technology.

The fact that he's using that specific term but is obviously entirely oblivious to its meaning tells us that there was a path that term took, probably from someone working on ethics to then-CEO Altman, and that someone didn't listen. The meaning was lost, and the beautiful phrase was entirely repurposed.

Given that this is coming from the then-CEO of the company that claims and insists, again and again (without substantial proof), that they are doing all this for the greater benefit of all humanity, and that is, despite its name, increasingly closing its results, making public scrutiny increasingly difficult if not impossible - well, I find that worrying. The quote indicates that they have no idea about a basic tool for evaluating fairness, or, even worse, that they have heard about it - but have not listened or comprehended.

San Francisco and Challenges

Time has been running totally crazy on me in the last few weeks. Right now I am in San Francisco -- if you'd like to suggest a meeting, drop me a line.

The CKC Challenge is going on and going well! If you didn't have the time yet, check it out! Everybody is speaking about how to foster communities for shared knowledge building; this challenge is actually doing it, and we hope to get some good numbers and figures out of it. And fun -- there is a mystery prize involved! Hope to see as many of you as possible at the CKC 2007 in a few days!

Yet another challenge with prizes is going on at Centiare. Believe it or not, you can actually make money by using a Semantic MediaWiki, with the Centiare Prize 2007. Read more there.

Saturn the alligator

Today at work I learned about Saturn the alligator. Born to humble origins in 1936 in Mississippi, he moved to Berlin where he became acquainted with Hitler. After the bombing of the Berlin Zoo he wandered through the streets. British troops found him, gave him to the Soviets, where against all odds he survived a number of near death situations - among others he refused to eat for a year - and still lives today, in an enclosure sponsored by Lacoste.

I also went to Wikidata to improve the entry on Saturn. For that I needed to find the right property to express the connection between Saturn and the Moscow Zoo, where he is held.

The following SPARQL query was helpful: https://w.wiki/7ga

It tells you how often each property connects animals with zoos - and in the Query Helper UI it should be easy to change either type to figure out good candidates for the property you are looking for.
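
A rough sketch of the kind of query that sits behind such a link, wrapped in a small Python helper so the two types are easy to swap. This is a reconstruction under assumptions, not the exact query from the short link: the helper name `property_usage_query` is made up, the item IDs (Q729 for animal, Q43501 for zoo) should be double-checked on Wikidata, and since individual animals are typed in different ways there, the subject pattern may need adjusting.

```python
def property_usage_query(subject_class: str, object_class: str) -> str:
    """Build a SPARQL query counting which properties link two kinds of items."""
    return f"""
SELECT ?property ?propertyLabel (COUNT(*) AS ?uses) WHERE {{
  ?subject wdt:P31/wdt:P279* wd:{subject_class} .   # instance of (a subclass of) the first type
  ?object  wdt:P31/wdt:P279* wd:{object_class} .    # instance of (a subclass of) the second type
  ?subject ?direct ?object .                        # any direct statement between the two
  ?property wikibase:directClaim ?direct .          # map the predicate back to its property
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
GROUP BY ?property ?propertyLabel
ORDER BY DESC(?uses)
""".strip()


ANIMAL = "Q729"   # assumption: Wikidata item for "animal"
ZOO = "Q43501"    # assumption: Wikidata item for "zoo"

query = property_usage_query(ANIMAL, ZOO)
print(query)
```

The printed text can be pasted into the Wikidata Query Service; swapping the two arguments for other item IDs gives the same overview for other pairs of types.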

Semantic MediaWiki 0.6: Timeline support, ask pages, et al.

It has been quite a while since the last release of Semantic MediaWiki, but an enormous amount of work went into it. Huge thanks to all contributors, especially Markus, who has written the bulk of the new code, reworked much of the existing code, and pulled together the contributions from the other coders, and the Simile team for their great Timeline code that we reused. (I lost track, because the last few weeks have seen some travels and a lot of work, especially ISWC2006 and the final review of the SEKT project I am working on. I will blog on SEKT more as soon as some further steps are done).

So, what's new in the second Beta-release of the Semantic MediaWiki? Besides about 2.7 tons of code fixes, usability and performance improvements, we also have a number of neat new features. I will outline just four of them:

  • Timeline support: you know SIMILE's Timeline tool? No? You should. It is like Google Maps for the fourth dimension. Take a look at the Timeline webpage to see some examples. Or at ontoworld's list of upcoming events. Yes, created dynamically out of the wiki data.
  • Ask pages: the simple semantic search was too simple, you think? Now we finally have a semantic search we dare not call simple. Based on the existing Ask Inline Queries, and actually making them also fully functional, the ask pages allow you to dynamically query the wiki knowledge base. No more sandbox article editing to get your questions answered. Go for the semantic search, and build your ask queries there. And all retrievable via GET. Yes, you can link to custom made queries from everywhere!
  • Service links: now all attributes can automatically link to further resources via the service links displayed in the fact box. Sounds abstract? It's not, it's rather a very powerful tool to weave the web tighter together: service links specify how to connect the attribute data to external services that use that data, for example, how to connect geographic coordinates with Yahoo Maps, or ontologies with Swoogle, or movies with IMDb, or books with Amazon, or ... well, you can configure it yourself, so your imagination is the limit.
  • Full RDF export: some people don't like pulling the RDF together from many different pages. Well, go and get the whole RDF export here. There is now a maintenance script included which can be used via a cron job (or manually) to create an RDF dump of the whole data inside the wiki. This is really useful for smaller wikis, and external tools can just take that data and try to use it. By the way, if you have an external tool and reuse the data, we would be happy if you tell us. We are really looking forward to more examples of reuse of data from a Semantic MediaWiki installation!
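
The service link idea from the list above boils down to URL templates that get filled in with attribute values. A minimal sketch in Python, not SMW's actual configuration format: the `SERVICE_TEMPLATES` table, the `$1` placeholder, the `service_link` helper and the example.org URLs are all made up for illustration.

```python
from urllib.parse import quote

# Hypothetical templates: "$1" marks the spot where the attribute value goes.
SERVICE_TEMPLATES = {
    "isbn": "https://books.example.org/isbn/$1",
    "coordinates": "https://maps.example.org/?q=$1",
}


def service_link(attribute: str, value: str) -> str:
    """Fill the template configured for an attribute with a URL-encoded value."""
    template = SERVICE_TEMPLATES[attribute]
    return template.replace("$1", quote(value, safe=""))


print(service_link("coordinates", "40.95,-4.12"))
```

Because the templates live in a plain table, adding another external service is a matter of adding one more entry rather than touching any code.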

I am very much looking forward to December, when I can finally join Markus again with the coding and testing. Thank you so very much for your support, interest, and critical and encouraging remarks with regard to Semantic MediaWiki. Grab the code, update your installation, or take the chance and switch your wiki to Semantic MediaWiki.

Just a remark: my preferred way to install both MediaWiki and Semantic MediaWiki is to pull it directly from the SVN instead of taking the releases. It's actually less work and helps you tremendously in keeping up to date.

Semantic MediaWiki 1.0 released

After about two years of development and already with installations all over the world, we are very happy to announce the release of Version 1.0 of Semantic MediaWiki, and thus the first stable version. No alpha, no beta, it's out now, and we think you can use it productively. Markus managed to release it in 2007 (on the last day of the year), and it has moved far beyond what 0.7 was, in stability, features, and performance. The biggest change is a completely new ask syntax, much more powerful since it works much smoother with MediaWiki's other systems like the parser functions, and we keep constantly baffling ourselves about what is possible with the new system.

We have finally reached a point where we can say, OK, let's go for massive user testing. We want big and heavily used installations to test our system. We are fully aware that the full power of the queries can easily kill an installation, but there are many ways to tweak performance and expressivity. We are now highly interested in performance reports, and then moving towards our actual goal, Wikipedia.

A lot has changed. You can find a full list of changes in the release notes. And you can download and install Semantic MediaWiki from SourceForge. Spread the word!

There still remain a lot of things to do. We have plenty of ideas how to make it more useful, and our users and co-developers also seem to have plenty of ideas. It is great fun to see the number of contributors to the code increase, and also to see the mailing lists being very lively. Personally, I am very happy to see Semantic MediaWiki flourish as it does, and I am thankful to Markus for starting this year (or rather ending the last) with such a great step.

Semantic MediaWiki Demo

Yeah! Doccheck's Klaus Lassleben is implementing the Semantic MediaWiki, and there's a version of it running for quite some time already, but some bugs had to be killed. Now, go and take a look! It's great.

And the coolest thing is the search. Just start typing the relation, and it gives you an autoexpansion, just like Google Suggest does (well, a tiny bit better :) Sure, the autoexpansion is no scientific breakthrough, but it's a pretty darn cool feature.
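
The autoexpansion boils down to a prefix lookup over the known relation names. A minimal sketch in Python, not the code running in the demo: the relation names and the `suggest` helper are made up for illustration, and a real wiki would run the lookup on the server and ship the results to the browser.

```python
from bisect import bisect_left


def suggest(prefix, sorted_names, limit=5):
    """Return up to `limit` names starting with `prefix` from a sorted list."""
    start = bisect_left(sorted_names, prefix)  # first name that is >= prefix
    results = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix) or len(results) == limit:
            break
        results.append(name)
    return results


# Hypothetical relation names, kept sorted so the binary search works.
RELATIONS = sorted(["born in", "capital of", "located in", "lives in", "located near"])

print(suggest("loc", RELATIONS))
```

Keeping the names sorted means every keystroke costs one binary search plus a short scan, which is cheap enough to run on each keypress.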

The SourceForge project Semediawiki is already up and running, and I sure hope that Mr Lassleben will commit the code any day soon!

Even better, Sudarshan has already started implementing extensions to it - without having the code! That's some dedication. His demo is running here, and shows how the typed links may be hidden from the source text of the wiki, for those users who don't like it. Great.

Now, go and check the demo!

Semantic MediaWiki goes business

... but not with the developers. Harry Chen writes about it, and several places copy the press release about Centiare. Actually, we didn't even know about it, and were a bit surprised to hear that news after our trip to India (which was very exciting, by the way). But that's OK, and actually, it's pretty exciting as well. I wish Centiare all the best! Here is their press release.

They write:

Centiare's founder, Karl Nagel, genuinely feels that the world is on the verge of an enormous breakthrough in MediaWiki applications. He says, "What Microsoft Office has been for the past 15 years, MediaWiki will be for the next fifteen." And Centiare will employ the most robust extension of that software, Semantic MediaWiki.

Wow -- I'd never claim that SMW is the most robust extension of MediaWiki -- there are so many of them, and most of them have a much easier time being robust! But the view of MediaWiki taking the place of Office -- intriguing. Although I'd rather put my bets on stuff like Google Docs (formerly Writely), and add some semantic spice to it. Collaborative knowledge construction will be the next big thing. Really big I mean. Oh, speaking about that, check out this WWW workshop on collaborative knowledge construction. Deadline is February 2nd, 2007.

Click here for more information about Centiare.



Semantic MediaWiki officially Beta

Semantic MediaWiki has gone officially Beta. Markus Krötzsch released Version 0.5 yesterday -- download it at Sourceforge and update your installation!

Markus and I are both busy today updating existing installations (and creating new ones -- greetings towards California!). The new version has several new features:

  • One can reuse existing Semantic Web vocabulary, like FOAF. This feature is used so heavily that it led Swoogle to actually believe FOAF was defined at ontoworld!
  • The unit code was improved a lot -- one can now define linear units from inside the wiki. Africa has a list of all African countries, and you can see their size neatly listed.
  • New datatypes for URLs and Emails.
  • Better code (which is why we dare to call ourselves Beta)
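
A linear unit is one that differs from the base unit only by a constant factor, which is what makes it easy to define from inside a wiki page. A minimal sketch of the conversion in Python, not SMW's implementation: the `AREA_FACTORS` table and the `convert` helper are assumptions for illustration.

```python
# Conversion factors to a common base unit (square kilometres here).
AREA_FACTORS = {
    "km²": 1.0,
    "m²": 1e-6,
    "ha": 0.01,
    "sq mi": 2.589988110336,  # 1.609344 km squared
}


def convert(value, from_unit, to_unit, factors=AREA_FACTORS):
    """Convert between two linear units by going through the base unit."""
    return value * factors[from_unit] / factors[to_unit]


print(convert(100, "ha", "km²"))
```

Units with an offset, such as degrees Celsius versus Fahrenheit, do not fit this scheme, since a single factor per unit is not enough to describe them.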

Check out the new features on ontoworld. Thanks to Markus and S for the coding marathon this weekend that made this new release possible! We are fetching bugs now, and planning 0.6, and our first big stress tests, with lots and lots of data...



Semantic MediaWiki: The code is out there

Finally! 500 nice lines of code, including the AJAX-powered search, and that's it, version 0.1 of the SeMediaWiki project! Go to Sourceforge and grab the source! Test it! Tell us about the bugs you found, and start developing your own ideas. Create your own Semantic Wiki right now, today.

Well, yes, sure, there is a hell of a lot left to do. Like a proper triplestore connecting to the Wiki. Or a RDF-serialization. But hey, there's something you can play with.

Semantic Mediawiki 0.3

Yay! Markus "the Sorcerer" Krötzsch finished the new release of Semantic MediaWiki today. The demo website has already been running version 0.3 for a while.

I'll let Markus speak:

I am glad to finally announce the official release of Semantic MediaWiki 0.3, available as usual at http://sourceforge.net/projects/semediawiki/. The final 0.3 is largely equivalent to the preview version that is still running on wiki.ontoworld.org -- the latest changes mainly concern localization.

Semantic MediaWiki 0.3 now runs on MediaWiki 1.6.1 that was released just yesterday. Older versions of MediaWiki should also work but upgrading is generally recommended.

The main new features of 0.3 are:

  • support for geographical coordinates (new datatype),
  • improved user interface: service links for JScript tooltips, CSS layout,
  • OWL/RDF export of all annotation data,
  • simplified installation process (including special page for setup/upgrade),
  • (almost) complete localization; translations available for English and German,
  • better MediaWiki integration: namespaces, user/content language, support for MediaWiki 1.6,
  • specials for displaying all relations/attributes,
  • experimental (OWL/RDF) ontology import feature,
  • and, last but not least, we also fixed quite some bugs.

The next steps towards 0.4 will probably be the inclusion of query results into existing pages, date/time support, and individual user settings for displaying certain datatypes. We also will have another look at ways of hiding the annotations from uninitiated users.

Have fun.

Markus

P.S.: I am not available during the weekend. Upgrading existing wikis should work (it's what we do all the time ;), but be aware that there is not going to be much support during the next three days.



Semantic Mediawiki 0.4 - Knowledge Inside!

15 May 2006

Until now, Semantic MediaWiki was kind of a nerds project. Yes, you could get a lot of information out in RDF, and actually, I used it as an RDF editor more than once -- but heck, what normal person needs that?

Now, with the freshly implemented feature, the advantages of a Semantic MediaWiki over a normal MediaWiki should become obvious: you can simply ask the wiki for stuff! Wiki, what are the 10 biggest cities in the US? Put a list here. Or, wiki, what is the height of the current German chancellor? Put the info here. I have made a writeup on those inline queries on our demo wiki. Go there, read it.
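
To make the idea concrete, here is a minimal sketch in Python of what such a query does behind the scenes: filter pages by their annotations, sort by an attribute, cut off the list. This is not SMW's query syntax or implementation; the `ask` helper, the page names and the population numbers are invented placeholders.

```python
# Toy annotation store: page title -> {relation or attribute: value}.
# All names and numbers are placeholders, not real data.
PAGES = {
    "Springfield": {"located in": "US", "population": 120},
    "Shelbyville": {"located in": "US", "population": 80},
    "Ogdenville": {"located in": "US", "population": 45},
    "Gotham": {"located in": "US", "population": 900},
    "Entenhausen": {"located in": "Germany", "population": 300},
}


def ask(pages, where, sort_by, limit):
    """Return the titles of the top `limit` pages matching `where`, largest first."""
    matches = [
        (title, facts)
        for title, facts in pages.items()
        if sort_by in facts and all(facts.get(k) == v for k, v in where.items())
    ]
    matches.sort(key=lambda item: item[1][sort_by], reverse=True)
    return [title for title, _ in matches[:limit]]


print(ask(PAGES, {"located in": "US"}, sort_by="population", limit=3))
```

The point is that the list is computed from the annotations each time, so it stays current when somebody edits a city page.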

But a lot of other things made it into the 0.4 release. Here's Markus' list:

  • Improved output for Special:Relations and Special:Attributes: usage of relations and attributes is now counted
  • Improved ontology import feature, allowing to import ontologies and to update existing pages with new ontological information
  • Experimental support for date/time datatype
  • More datatypes with units: mass and time duration
  • Support for EXP-notation with numbers, e.g. 2.345e13. Improved number formatting in infobox.
  • Configurable infobox: infobox can be hidden if empty, or switched off completely. This also works around a bug with MediaWiki galleries.
  • Prototype version of Special:Types, showing all available datatypes with their names in the current language setting.
  • "[[:located in::Paris]]" will now be rendered as "located in [[Paris]]"
  • More efficient storage: changed database layout, indexes for fast search
  • Code cleaned up, new style guidelines
  • Bugfixes, a lot of Bugfixes
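
The rendering rule for annotations with a leading colon, mentioned in the list above, can be sketched as a single substitution. This is a Python illustration of the described behaviour, not SMW's parser code (SMW itself is written in PHP); the pattern and the `render_visible_relations` helper are assumptions.

```python
import re

# [[:relation::Target]] with a leading colon: show the relation name as text
# and keep an ordinary wiki link to the target.
VISIBLE_RELATION = re.compile(r"\[\[:([^\[\]:|]+)::([^\[\]|]+)\]\]")


def render_visible_relations(wikitext: str) -> str:
    """Rewrite [[:relation::Target]] into 'relation [[Target]]'."""
    return VISIBLE_RELATION.sub(r"\1 [[\2]]", wikitext)


print(render_visible_relations("The Louvre is [[:located in::Paris]]."))
```

Annotations without the leading colon are left untouched by this pattern, so they can still be handled by whatever rule applies to them.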

Thanks to everyone who contributed and still contributes to the project! And, connected to this, thanks to the answers to my last blog entry -- I will write more on this tomorrow.

Semantic Scripting

28 May 2005

Oh my, I really need to designate some time to this blog. But let's not rant about time - none of us has time - let's dive directly into my paper for the Workshop on Scripting for the Semantic Web at the 2nd ESWC in Heraklion next week. Here is the abstract.

Python reached out to a wide and diverse audience in the last few years. During its evolution it combined a number of different paradigms under its hood: imperative, object-oriented, functional, list-oriented, even aspect-oriented programming paradigms are allowed, but still remain true to the Python way of programming, thus retaining simplicity, readability and fun. OWL is a knowledge representation language for the definition of ontologies, standardised by the W3C. It reaps upon the power of Description Logics and allows both the definition of concepts and their interrelations as well as the description of instances. Being created as part of the notoriously known Semantic Web language stack, its dynamics and openness lends naturally to the ever evolving Python language. We will sketch the idea of an integration of OWL and Python, but not by simply suggesting an OWL library, but rather by introducing and motivating the benefits a really deep integration offers, how it can change programming, and make it even more fun.

You can read the full paper on Deep Integration of Scripting Languages and Semantic Web Technologies. Have fun! If you can manage it, pass by the workshop and give me your comments, rants, and fresh ideas - as well as the spontaneous promise to help me design and implement this idea! I am very excited about the workshop and looking forward to it. See you there!

Semantic Web Challenge 2006 winners

Sorry for the terseness, but I am sitting in the ceremony.

18 submissions. 14 passed the minimal criteria.

Find more information on challenge.semanticweb.org -- list of Finalists, links, etc. See also on ontoworld.

And the winners are ...

3. Enabling Semantic Web communities with DBin: an overview (by Christian Morbidoni, Giovanni Tummarello, Michele Nucci)

2. Foafing the Music: Bridging the semantic gap in music recommendation (by Oscar Celma)

1. MultimediaN E-Culture demonstrator (by Alia Amin, Bob Wielinga, Borys Omelayenko, Guus Schreiber, Jacco van Ossenbruggen, Janneke van Kersen, Jan Wielemaker, Jos Taekema, Laura Hollink, Lynda Hardman, Marco de Niet, Mark van Assem, Michiel Hildebrand, Ronny Siebes, Victor de Boer, Zhisheng Huang)

Congratulations! It is great to have such great projects to show off! :)


Semantic Web Gender Issue

Well, at least they went quite a way. With Google Base one can create new types of entities, entities themselves, and search for them. I am not too sure about the user interface yet, but it's surely one of the best actually running on big amounts of data. Nice query refinement, really.

But heck, there's one thing that scares me off. I was looking today for all the people interested in the Semantic Web, and there are already some in. And you can filter them by gender. I was just gently surprised about the choices I was offered when I wanted to filter them by gender...

An image is still missing here.

Oh come on, Google. I know there are not that many girls in computer science, but really, it's not that bad!

Semantic Web Summer School 2006

The Summer School for the Semantic Web and Ontological Engineering is an annual event that brings together PhD students from all over the world and some of the brightest minds in the Semantic Web, to teach, to socialize, to learn, and to have fun. This year's invited speakers are Jim Hendler himself, and Enrico Motta, Stephan Baumann, Guus Schreiber, and the tutors are John Domingue, Asun Gomez-Perez, Jerome Euzenat, Sean Bechhofer, Fabio Ciravegna, Aldo Gangemi. You will learn a lot. You will have lots of fun. The place is really beautiful, the girls, well at least last year, were really beautiful, the stuff we learned was interesting, and inspired quite some cooperation further on. And it's really great for getting to know a lot of people: at the next conference you're guaranteed to meet someone again, and thus it is also a perfect opportunity to get into the community.

The deadline is May 1st, so be sure to go over to the SSSW2006 website and sign up.

If this didn't convince you, take a look at my series of posts about last year's summer school.


Semantic Web and Web 2.0

I usually don't just point to other blog entries (thus being a bad blogger regarding netiquette), but this time Benjamin Nowack nailed it in his post on the Semantic Web and Web 2.0. I read the article in iX (a popular German computer technology magazine), and I lost quite some respect for the magazine as there were so many unfounded claims, offhand remarks, and so much bad attitude in the article (and in further articles scattered around the issue) towards the Semantic Web that I thought the publisher was personally set on a crusade. I could go through the article and write a commentary on it, and list the errors, but honestly, I don't see the point. At least it made me appreciate peer review and scientific method a lot more. The implementation of peer review is flawed as well, but I realize it could be so much worse (and it could be better as well - maybe PLoS is a better implementation of peer review).

So, go to Benji's post and convince yourself: there is no "vs" in Semantic Web and Web 2.0.

Semantic Web patent

Tim Finin and Jim Hendler are asking about the earliest usage of the term Semantic Web. Tim Berners-Lee (who else?) spoke about the need for semantics in the web at the WWW 1994 plenary talk in Geneva, though the term Semantic Web does not appear there directly. Whatever. What rather surprised me, though, was that, when surfing a bit for the term, I discovered that Amit Sheth, host of this year's ISWC, filed the patent on it, back in 2000: System and method for creating a Semantic Web. My guess would be that this is the oldest patent on it.

Semantic Wikipedia

Marrying Wikipedia and the Semantic Web in Six Easy Steps - that was the title of the WikiMania 2005 presentation we gave about a month ago. On the Meta-Wikipedia we - especially Markus Krötzsch - were quite active on the Semantic MediaWiki project, changing and expanding our plans. DocCheck is working right now on a basic implementation of the ideas - they have lots of Wiki-Experience already, with Flexicon, a MediaWiki-based medical lexicon. We surely hope the prototype will be up and running soon!

Wow, the project seems to be received pretty well.

Tim Finin, Professor in Maryland: "I think this is an exciting project with a lot of potential. Wikipedia, for example, is marvelously successful and has made us all smarter. I’d like my software agents to have a Wikipedia of their own, one they can use to get the knowledge they need and to which they can (eventually) contribute." - Wikipedia meets the Semantic Web, Ebiquity blog at UMBC

Mike Linksvayer, CTO of Creative Commons: "The Semantic MediaWiki proposal looks really promising. Anyone who knows how to edit Wikipedia articles should find the syntax simple and usable. All that fantastic data, unlocked. (I’ve been meaning to write on post on why explicit metadata is democratic.) Wikipedia database dump downloads will skyrocket." - Annotating Wikipedia, Mike Linksvayer's blog

Danny Ayers, one of the developers of Atom and Author of Atom and RSS Programming: "The plan looks very well thought out and quite a pile of related information has been gathered. I expect most folks that have looked at doing an RDF-backed Wiki would come to the same conclusion I did (cf. stiki) - it’s easy to do, but difficult to do well. But this effort looks like it should be the one." - Wikipedia Bits, Danny Ayers, Raw Blog

Lambert Heller of the University of Münster wrote a German blog entry on the netbib weblog, predicting world domination. Rita Nieland has a Dutch entry on her blog, calling us heroes - if we succeed. And on Blog posible Alejandro Gonzalo Bravo García has written a great Spanish entry, saying it all: the web is moving, and at great speed!

So, the idea seems to be catching on like a cold in rainy weather, and we really hope the implementation will be there soon. If you're interested in contributing - either ideas or implementation - join our effort! Write us!

Semantic Wikipedia presentations

Last week at Semantics 2006, Markus and I gave talks on Semantic MediaWiki. I was happy to be invited to give one of the keynotes at the event. A lot of people were nice enough to come to me later to tell me how much they liked the talk. And I got a lot of requests for the slides. I decided to upload them, but wanted to clean them up a bit. I am pretty sure that the slides are not self-sufficient -- they are tailored a lot to my style of presenting. But I added some comments to the slides, so maybe this will help you understand what I tried to say if you were not in Vienna. Find the slides of the Semantics 2006 keynote on Semantic Wikipedia here. Careful, 25 MB.

A few weeks ago I was also at the KMi Podium for an invited talk. The good thing is, they don't have just the slides, they also have a video of the talk, so this will help much more in understanding the slides. The talk at KMi was a bit more technical and a lot shorter (different audiences, different talks). Have fun!

Shazam!

Shazam! was fun. And had more heart than many other superhero stories. I liked that, for the first time, a DC universe movie felt like it's organically part of that universe - with all the backpacks with Batman and Superman logos and stuff. That was really neat.

Ever since I saw him in the first trailer, I was looking forward to seeing Steve Carell play the villain. Turns out it was Mark Strong, not Steve Carell. Ah well.

I am not sure the film knew exactly to whom it was marketed. The theater was full of kids, and given the trailers it was clear that the intention was to get as many families into it as possible. But the horror sequences, the graphic violence, the expletives, and the strip club scenes were not exactly for that audience. PG-13 is an appropriate rating.

It was a joy to watch the protagonist and his buddy explore and discover his powers. Colorful, lively, fun. Easily the best scenes of the movie.

The foster family drama gave the movie its heart, but the movie seemed a bit overwhelmed by it. I wish that part had been executed a bit better. But then again, it's a superhero movie, and given that, it was far better than many of the other movies of its genre. But as far as high school and family drama superheroes go, it doesn't get anywhere near Spider-Man: Homecoming.

Mid-credit scenes. A tradition that Marvel started and that DC keeps copying - but unlike Marvel, DC hasn't really paid off the teasers in their scenes. And regarding cameos - also something where DC could learn so much from Marvel. Also, what's up with being afraid of naming their heroes? Be it in Man of Steel with Superman or here with Billy, the hero doesn't figure out his name (until the next movie comes along and everybody refers to him as Superman as if it was obvious all the time).

All in all, an enjoyable movie while waiting for Avengers: Endgame, and hopefully a sign that DC is finally getting on the right path.

She likes music, but only when the music is loud

Original in German by Herbert Grönemeyer, 1983.

She sits on her windowsill all day
Her legs dangling to the music
The noise from her room
drives all the neighbours mad
She is content
smiles merrily

She doesn't know
that snow
falls
without a sound
to the ground

Doesn't notice
the knocking
on the wall

She likes music
but only
when the music is loud
When it hits her stomach
with the sound

She likes music
but only
when the music is loud
When her feet feel
the shaking ground

She then forgets
that she is deaf

The man of her dreams
must play the bass
the tickling in her stomach
drives her crazy

Her mouth seems
to scream
with happiness
silently
her gaze removed
from this world

Her hands don't know
with whom to talk
No one's there
to speak to her

She likes music
but only
when the music is loud
When it hits her stomach
with the sound

She likes music
but only
when the music is loud
When her feet feel
the shaking ground

Site went down

The site went down, again. The first time was in July, when Apache had issues; this time it's due to MySQL acting up and frying the database. I found a snapshot from July 2019, and am trying to recreate the entries from in between (thanks, Wayback Machine!)

Until then, at least the site is back up, even though there might be some losses in the content.

P.S.: it should all be back up. If something is missing, please email me.

Sleeping Lady with a Black Vase

31 May 2024

In 2009, a Hungarian art historian was watching the movie Stuart Little with his three-year-old daughter. And he's like "funny, that painting that's used in the set looks like that 1928 black and white photograph I have seen, of a piece of art which has been lost". So he sends a few emails...

Turns out, it *is* the actual artwork by Róbert Berény (1887-1953) which was last seen in public in 1928, and somehow made it to Sony, where it was used in a number of soap opera episodes and in Stuart Little.

Social Web and Knowledge Management

Obviously, the social web is coming. And it's also coming to this year's WWW conference in Beijing!

I find this topic very interesting. The SWKM picks up the theme of last year's very successful CKC2007 workshop, also at the WWW, where we aimed at enabling collaborative knowledge construction. The SWKM is a bit broader, since it is not just about knowledge construction, but about the whole topic of knowledge management, and how the web changes everything.

If you are interested in the social web, or the semantic web, or specifically in the intersection of these two, and how it can be applied to knowledge management within or outside an organisation, you will like the SWKM workshop at the WWW2008. You can submit papers until January 21st, 2008. All information can be found at the Social Web and Knowledge Management workshop website.

Spring cleaning

Going through my old online presence and cleaning it up is really a trip down memory lane. I am also happy that most - although not all - of the old demos still work. It is going to be fun to release it all again.

Today I discovered that we had four more German translations of Something Positive that we never published. So that's another thing that I am going to publish soon, yay!

Stanford seminar on Knowledge Graphs

My friend Vinay Chaudhri is organising a seminar on Knowledge Graphs with Naren Chittar and Michael Genesereth this semester at Stanford.

I have the honour to present in it as the opening guest lecturer, introducing what Knowledge Graphs are and what they are good for.

Due to the current COVID situation, the seminar was turned virtual, and opened for everyone to attend.

Other speakers during the semester include Juan Sequeda, Marie-Laure Mugnier, Héctor Pérez Urbina, Michael Uschold, Jure Leskovec, Luna Dong, Mark Musen, and many others.

Star Trek's 32nd century

I like Star Trek for the cool technology, which has inspired plenty of people to work, e.g., on "the Star Trek computer". I love Star Trek for the utopian society of plenty they sketch in the 23rd and 24th century.

I claim it is because of the laziness of the writing: they don't keep that utopia up.

When I heard about Discovery going to the 32nd century, I was excited about the wonders they would dream up. The new technology. The society. The culture. The breakthroughs.

With regard to that, it was a massive letdown. Extremely disappointing.

Stars in our eyes

I grew up in a suburban, almost rural area in southern Germany, and I remember the hundreds, if not thousands of stars I could see at night. In the summers, which I spent on an island in Croatia, it was even more marvelous, and the dark night sky was breathtaking.

As I grew up, the stars dimmed, and I saw fewer and fewer of them, until only the brightest stars were visible. It was blindingly obvious that air and light pollution had swallowed that every-night miracle and confined it to my memory only.

Until in my late twenties I finally gave in and got glasses. Now the stars are back, as beautiful and brilliant as they have ever been.

Start the website again

This is no blog anymore. I haven't had entries for years, and even before then sporadically. This is a wiki, but somehow it is not that either. Currently you cannot make comments. Updating the software is a pain in the ass. But I like to have a site where I can publish again. Switch to another CMS? Maybe one day. But I like Semantic MediaWiki. So what will I do? I do not know. But I know I will slowly refresh this page again. Somehow.

A new part of my life is starting soon. And I want to have a platform to talk about it. And as much as I like Facebook or Google+, I like to have some form of control over this platform. Facebook and Google+ -- maybe they won't disappear in a year. But what about ten? Twenty? Fifty years? I'll still be around (I hope), but they might not...

Let's see what will happen here. For now, I republished the retelling of a day as a story I first published on Google+ (My day in Jerusalem) and a poem that feels eerily relevant whenever I think about it (Wenn ich wollte)

Starting Abstract Wikipedia

I am very happy about the Board of the Wikimedia Foundation having approved the proposal for the multilingual Wikipedia aka Abstract Wikipedia aka Wikilambda aka we'll need to find a name for it.

In order to make that project a reality, I will as of next week join the Foundation. We will be starting with a small, exploratory team, which will allow us to have plenty of time to continue to socialize and discuss and refine the idea. Being able to work on this full time and with a team should allow us to make significant progress. I am very excited about that.

I am sad to leave Google. It was a great time, and I learned a lot about running *large* projects, and I met so many brilliant people, and I ... seriously, it was a great six and a half years, and I will very much miss it.

There is so much more I want to write but right now I am just super happy and super excited. Thanks everyone!

Summer School for the Semantic Web, Day 0

Arrived in Cercedilla today, at the Semantic Web Summer School. I really was looking forward to these days, and now, flipping through the detailed programme I am even more excited. This will be a very intense week, I guess, where we learn a lot and have loads of fun.

I was surprised by the sheer number of students being here: 56 or 57 students have come to the summer school, from all over the world - met someone from Australia, from Pittsburgh, and many Europeans. Happily, I also met quite a number of people I already knew, and thus I know it will be a pleasurable week. But let's just do the math for a second: we have more than 50 accepted students at this summer school. There are at least three other summer schools with related fields, like the one in Ljubljana the week before, there's one in Edinburgh, and the ESSLLI. So, that's about 200 students. Even if we claim that every single PhD student is going to a summer school - which I don't think - that would mean we get 200 theses every year! (Probably this number will be only reached in three years or so)
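The back-of-the-envelope estimate above can be written out as a tiny sketch. It assumes that the three other summer schools are each about as large as this one (roughly 50 students); that figure and all names in the code are made up for illustration.

```python
# Rough estimate: this summer school plus three related ones
# (Ljubljana, Edinburgh, ESSLLI), each assumed to be about the same size.
STUDENTS_PER_SCHOOL = 50  # assumption: every school has roughly 50 students
OTHER_SCHOOLS = 3         # the three related summer schools named in the text


def estimated_students(per_school: int = STUDENTS_PER_SCHOOL,
                       other_schools: int = OTHER_SCHOOLS) -> int:
    """Total students across this school and the related ones."""
    return per_school * (1 + other_schools)


print(estimated_students())  # 200
```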

So, just looking at the sheer amount of people working on it - what's the expected impact?

Interesting times lie ahead.

Supporting disaster relief with semantics

Soenke Ziesche, who has worked on humanitarian projects for the United Nations for the last six years, wrote an article for xml.com on the use of semantic wikis in disaster relief operations. That is a great scenario I never thought about, and basically one of these scenarios I think of when I say in my talks: "I'll be surprised if we don't get surprised by how this will be used." I would probably even go as far as to state the following: if nothing unexpected happens with it, the technology was too specific.

Just the thought that semantic technology in general, and maybe even Semantic MediaWiki in particular, could relieve the effects of a natural disaster, or maybe even save a life, is so incredibly exciting and rewarding. Thank you so much Soenke!

Taking a self-driving car

Ten years ago, my daughter had just been born and I had just joined Google, who were working on self-driving cars. And I was always hoping that my daughter would not need to learn how to drive a car (but that if she wanted, she may). In the last ten years I lost confidence in that hope.

Yesterday, thanks to my wife organizing it, we took our first ride with a self-driving car, driving about ten minutes through San Francisco. And I guess a world-wide roll out will take time, maybe a lot of time, but what can I say: it drove very well.

Talk in Korea

If you're around this Tuesday, February 13th, in Seoul, come by the Semantic Web 2.0 conference. I had the honour to be invited to give a talk on the Semantic Wikipedia (where a lot is happening right now, I will blog about this when I come back from Korea, and when the stuff gets fixed).

Looking forward to seeing you there!

Tech layoffs of 2022

Very interesting article reflecting on the current round of layoffs in the tech industry. The author explains it within the context of the wider economy. I'm surprised that the pandemic is not mentioned, which led to accelerated growth early on that has now turned out not to be sustained. But the other arguments - from low interest rates to constant undervaluation due to the dot com bust around the millennium - seem to tell a rather coherent story.

One particularly interesting point is the outlook that the tech industry has gobbled up so much programming talent that other industries were starved of it. A lot of industries would benefit from (more modestly paid) software engineers, which might stimulate the whole economy to grow. Software might still be "eating the world", but that doesn't have to translate into software companies eating up the economy. There are so many businesses with domain expertise that cannot be easily replaced by some Silicon Valley engineer - but that would benefit from some programmers on staff.

This is especially true with the last decade of AI results. There is a massive overhang of capabilities that we have unlocked, which hasn't found its way into products yet, partly because all the skills necessary to turn these into products at the right places were just concentrated through enormously high wages in a small set of companies. There are so many businesses who would benefit from the latest machine learning methods. But folks prefer, understandably, to work in a place that gives them the promise of revolutionizing whole industries or saving the world.

But there is so much potential value to be generated if we also take some more modest goals into account. Not all of us need to work on AGI, it's also great to use software engineering skills to improve working conditions at the assembly line of a small local factory. With or without machine learning.

Temperatures in California

It has been a bit chillier the last few days. I noticed that after almost a decade in California, I feel pretty comfortable with understanding temperatures in Fahrenheit - as long as they are over 60° F. If it is colder, I need to switch to Celsius in order to understand how cold it exactly is. I have no idea what 40° or 45° or 50° F are, but I still know what 5° C is!
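For anyone else stuck below 60° F, the conversion is C = (F - 32) × 5/9. A minimal sketch of it follows; the function name is just chosen for illustration.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# The temperatures mentioned above, plus the 60° F comfort threshold.
for f in (40, 45, 50, 60):
    print(f"{f}° F = {fahrenheit_to_celsius(f):.1f}° C")
```

So 40° F is a bit over 4° C, 50° F is exactly 10° C, and 5° C corresponds to 41° F.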

The fact that I still haven't acclimatised to Fahrenheit for the cooler temperatures tells you a lot about the climate in California.

Ten years of Wikidata

Today it's ten years since Wikidata launched. A few memories.

It's been an amazing time. In the summer of 2011, people still didn't believe Wikidata would happen. In the fall of 2012, it was there.

Markus Krötzsch and I had been pushing for the idea of a Semantic Wikipedia since 2005. Semantic MediaWiki was born from that idea, Freebase and DBpedia launched in 2007, microformats in Wikipedia became a grassroots thing, but no one was working on the real thing at the Wikimedia Foundation.

With Elena Simperl at KIT we started the EU research project RENDER in 2010, involving Mathias Schindler at Wikimedia Deutschland. It was about knowledge diversity on the Web, still an incredibly important topic. In RENDER, we developed ideas for the flexible representation of knowledge, and how to deal with contradicting and incomplete information. We analysed Wikipedia to understand the necessity of these ideas.

In 2010, I was finishing my PhD at KIT, and got an invitation from Yolanda Gil to work at the ISI at University of Southern California for a half year sabbatical. There, Yolanda, Varun Ratnakar, Markus and I developed a prototype for Wikidata which won third place in the ISWC Semantic Web Challenge that year.

In 2011, the Wikimedia Data Summit took place at the headquarters of O'Reilly in Sebastopol, CA, at the invitation of Tim O'Reilly and organised by Danese Cooper. There were folks from the Wikimedia Foundation, Freebase, DBpedia, Semantic MediaWiki, O'Reilly, there was Guha, Mark Greaves, I think, and others. I think that's where it became clear that Wikidata would be feasible.

It's also where I first met Guha and where I admitted to him that I was kinda a fan boy. He invented MCF, RDF, had worked with Douglas Lenat on CYC, and later that year introduced Schema.org. He's now working on Data Commons. Check it out, it's awesome.

Mark Greaves, a former DARPA program officer, who then was working for Paul Allen at Vulcan, had been supporting Semantic MediaWiki for several years, and he really wanted to make Wikidata happen. He knew my PhD was done, and that I was thinking about my next step. I thought it would be academia, but he suggested I should write up a project proposal for Wikidata.

After six years advocating for it, I understood that someone would need to step up to make it happen. With the support and confidence of so many people - Markus Krötzsch, Elena Simperl, Mark Greaves, Guha, Jamie Taylor, Rudi Studer, John Giannandrea, and others - I drafted the proposal.

The Board of the Wikimedia Foundation approved the proposal as a new Wikimedia project, but neither allocated the funding, nor directed the Foundation to do it. In fact, the Foundation was reluctant to take it on, unsure whether they would be able to host such a project development at that time. Back then, that was a wise decision.

Erik Möller, then CTO of the Foundation, was the driving force behind a major change: instead of turning the individual Wikipedias semantic, we would have a single Wikidata for all languages. Erik was also the one who had secured the domain for Wikidata. Many years prior.

Over the next half year and with the help of the Wikimedia Foundation, we secured funding from AI2 (Paul Allen), Google (who had acquired Freebase in the meantime), and the Gordon and Betty Moore Foundation, 1.3 million.

Other funders backed out because I insisted that the Wikidata ontology be entirely under the control of the community. They argued for having professional ontologists, or reusing ontologies, or using DBpedia to seed Wikidata. I said no. I firmly believed, and still believe, that the ontology has to be owned, created and maintained by the community. I invited the ontologists to join the project as community members, but to the best of my knowledge, they never made significant contributions. We did miss out on quite a bit of funding, though.

There we were. We had the funding and the project proposal, but no one to host us. We were even thinking of founding a new organisation, or hosting it at KIT, but due to the RENDER collaboration, Mathias Schindler had us talk with Pavel Richter, ED of Wikimedia Deutschland, and Pavel offered to host the development of Wikidata.

For Pavel and Wikimedia Deutschland this was a big step: the development team would significantly grow WMDE (almost doubling it in size, if I remember correctly), which would necessitate a sudden transformation and increased professionalisation of WMDE. But Pavel was ready for it, and managed this growth admirably.

On April 1st 2012, we started the development of Wikidata. On October 29 2012 we launched the site.

The original launch was utterly useless. All you could do was create new pages with Q IDs (the Q being a homage to Kamara, my wife), associate those Q IDs with labels in many languages, and connect them to articles in Wikipedia, so called sitelinks. You could not add any statements yet. You could not connect items with each other. The sitelinks were not used anywhere. The labels were not used anywhere. As I said, the site was completely useless. And great fun, at least to me.
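To make that concrete, here is a minimal sketch of what an item amounted to at launch: an opaque Q ID, labels per language, and sitelinks per wiki, with no statements. This is an illustration only, not Wikidata's actual data model or code; the class, the field names and the helper `label_for` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a launch-era item: no statements yet."""
    qid: str                                                 # language-neutral identifier
    labels: dict[str, str] = field(default_factory=dict)     # language code -> label
    sitelinks: dict[str, str] = field(default_factory=dict)  # wiki id -> article title


def label_for(item: Item, lang: str) -> str:
    """Return the label in the requested language, or the bare Q ID if there is none."""
    return item.labels.get(lang, item.qid)


tokyo = Item(qid="Q1490")
tokyo.labels.update({"en": "Tokyo", "de": "Tokio", "ja": "東京"})
tokyo.sitelinks.update({"enwiki": "Tokyo", "dewiki": "Tokio"})

print(label_for(tokyo, "de"))  # Tokio
print(label_for(tokyo, "hr"))  # Q1490, since no Croatian label was added in this sketch
```

The point of the design is that the identifier carries no language at all: every label, English included, is just one entry among many.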

QIDs for entities are still often disparaged. Why QIDs? Why not just the English name? Isn't dbp:Tokyo much easier to understand than Q1490? It was an uphill battle ten years ago to overcome the anglocentricity of many people. Unfortunately, this has not changed much. I am thankful to the Wikimedia movement for being one of the places that encourages, values, and supports the multilingual approach of Wikidata.

Over the next few months, the first few Wikipedias were able to access the sitelinks from Wikidata, and started deleting the sitelinks from their Wikipedias. This led to the removal of more than 240 million lines of wikitext across the Wikipedias. 240 million lines that didn't need to be maintained anymore. In some languages, these lines constituted more than half of the content of the Wikipedia. In many languages, editing activity dropped dramatically at first, sometimes by 80%.

But then something happened. Those edits were mostly bots. And with those bots gone, humans were suddenly better able to see each other and build a more meaningful community. In many languages, this eventually led to increased community activity.

One of my biggest miscalculations when launching Wikidata was to entirely dismiss the possibility of a SPARQL endpoint. I thought that none of the existing open source triple stores would be performant enough. Peter Haase was instrumental in showing that I was wrong. Today, the SPARQL endpoint is an absolutely crucial piece of the Wikidata infrastructure, and is widely used to explore the dataset. And with its beautiful visualisations, I find it almost criminally underused. Unfortunately, the SPARQL endpoint is also the piece of infrastructure that worries us the most. The Wikimedia Foundation is working hard on figuring out the future for this service, and if you can offer substantial help, please reach out.
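As a small illustration of what using the endpoint looks like, here is a sketch that only builds the request URL for a query and does not send it. It assumes the public endpoint at `https://query.wikidata.org/sparql` with its predefined `wd:` and `rdfs:` prefixes; the query and the helper name `build_request_url` are made up for this example.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Ask for the labels of Q1490 (Tokyo) in three languages.
QUERY = """
SELECT ?label WHERE {
  wd:Q1490 rdfs:label ?label .
  FILTER(LANG(?label) IN ("en", "de", "ja"))
}
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_request_url(QUERY)
print(url)
```

Fetching that URL (for example with `urllib.request`, and with a descriptive User-Agent header) should return the matching labels as JSON; the sketch stops short of the network call so that it runs offline.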

Today, Wikidata has more than 1.4 billion statements about approximately 100 million topics. It is by far the most edited Wikimedia project, with more edits than the English, German, and French Wikipedia together - even though they are each a decade older than Wikidata.

Wikidata is widely used. Almost every time Wikipedia serves one of its 24 billion monthly page views. Or during the pandemic in order to centralise the data about COVID cases in India to make them available across the languages of India. By large companies answering questions and fulfilling tasks with their intelligent assistants, be it Google or Apple or Microsoft. By academia, where you will find thousands of research papers using Wikidata. By numerous Open Source projects, by one-off analyses by data scientists, by small enterprises using the dataset, by student programmers exploring and playing with it on the weekend, by spreadsheet enthusiasts enriching their data, by scientists, librarians and curators linking their datasets to Wikidata, and thus to each other. Already, more than 7,000 catalogs are linked to Wikidata, and thus to each other, really and substantially establishing a Web of linked data.

I will always remember the Amazon developer who approached me after a talk. He had used Wikidata to gather data about movies. I was surprised: Amazon owns imdb, why would they ever use anything else for movies? He said that imdb was great for what it had, but Wikidata complemented it in unexpected ways, offering many interesting connections between the movies and other topics which would be out of scope for imdb.

Not to be misunderstood: knowledge bases such as imdb are amazing, and Wikidata does not aim to replace them. They often have a clear scope, have a higher quality, and almost always a better coverage in their field than Wikidata ever can hope to have, or aims to have. And that's OK. Wikidata's goal is not to replace these knowledge bases. But to provide the connecting tissue between the many knowledge bases out there. To connect them. To provide a common set of entities to work with. To turn the individual knowledge bases into a large interconnected Web of knowledge.

I am still surprised that Wikidata is not known more widely among developers. It always makes me smile with joy when I see yet another developer who just discovered Wikidata and writes an excited post about it and how much it helped them. In the last two weeks, I stumbled upon two projects that used Wikidata identifiers where I didn't expect them at all, just used them as if it was the most normal thing in the world. This is something I hope we will see even more in the future. I hope that Wikidata will become the common knowledge base that is ubiquitously used by a large swarm of intelligent applications. Not only to make these applications smarter, by knowing more about the world - but also to allow these applications to exchange data with each other more effectively because they are using the same language.

And most importantly: Wikidata has a healthy, large, and comparatively friendly and diverse community. It is one of the most active Wikimedia projects, only trailing the English Wikipedia, and usually about as active as Commons.

Last time I checked, more than 400,000 people have contributed to Wikidata. For me, that is easily the most surprising number about the project. If you had asked me in 2012 how many people would contribute to Wikidata, I would have sheepishly hoped for a few hundred, maybe a few thousand. And I would have defensively explained why that's OK. I am humbled and awestruck by the fact that several hundred thousand people have contributed to an open knowledge base that is available to everyone, and that everyone can contribute to.

And that I think is the most important role that Wikidata plays. That it is a place that everyone can contribute to. That the knowledge base that everyone uses is not owned and gateguarded by any one company or government, but that it is a common good, that everyone can contribute to. That everyone with an internet connection can lend their voice to the sum of all knowledge.

We all own Wikidata. We are responsible for Wikidata. And we all benefit from Wikidata.

It has been an amazing ten years. I am looking forward to many more years of Wikidata, and to the many new roles that it will play in the years to come, and to the many people who will contribute to it.

Shoutout to the brilliant team that started the work on Wikidata: Lydia Pintscher, Abraham Taherivand, Daniel Kinzler, Jeroen De Dauw, Katie Filbert, Tobias Gritschacher, Jens Ohlig, John Blad, Daniel Werner, Henning Snater, and Silke Meyer.

And thank you for all these amazing pictures of cakes for Wikidata's birthday. (And if you're curious what is coming next: we are working on Wikifunctions and Abstract Wikipedia, in order to allow more people to contribute more knowledge to even more people!)

The Center of the Universe

The discovery of the center of the universe led to a series of unexpected consequences. It killed some, it enlightened others, but most people just were left utterly confused in the end.

When the results from the Total Radiating Universal Tessellation Hyperfield satellite measurements came in, it became depressingly clear that the universe was indeed contracting. Very slowly, but without any reasonable doubt; or, as the physicists said, they were five sigma sure about it. As the data from the measurements became available, physicists, cosmologists, topologists, even a few mathematically inclined philosophers, and a huge number of volunteers started to investigate it. And after a short period of time, they came to a whole set of staggering conclusions.

First, the Universe had a rather simple four-dimensional form. The only unfortunate blemishes in this theory were the black holes, but most of the volunteers, philosophers, and topologists decided to ignore these as accidental.

Second, the form was bounded. There was a beginning and an end in time, and there were boundaries in space, and those who understood that these were the same were enlightened about the form of the universe.

Third, since the form of the universe was bounded and simple, it had a center. While this was slightly surprising, it was a necessary consequence of the previous findings. What at first seemed merely exciting, but would soon turn out to be not only the heart of this report but the heart of all humanity, was that the data collected by the satellites made it possible to calculate the position of the center of the universe.

Before that, let me recap what we traditionally knew about how the universe is built. Our sun is a star, around which a few planets travel, one of them being our Earth. Our sun is one of a few tens of billions of stars that form a long curved thread which ties around a supermassive black hole. A small number of such threads are tangled together, forming the spiral arms of our galaxy, the Milky Way. Our galaxy consists of half a trillion stars like our sun.

Galaxies, like everything else in the universe, like to stick together and form groups. A few hundred thousand galaxies make up a supercluster. A few of these superclusters together build enormous walls of stars, filaments traversing the universe. The galaxies of such a wall are all in a single plane, more or less, and sometimes even in a single line.

Between these walls, walls made of superclusters and galaxies and stars and planets, there is, basically, nothing. The walls of stars are like gigantic honeycombs, and between them are enormous empty spaces, hundreds of millions of light years wide. When you look at a honeycomb, you will see that the empty spaces between the walls are much, much larger than the walls themselves. Such is the universe. You might think that the distance from here to the next grocery store is quite far, or that the ocean is quite big. But the distance from the earth to the sun is so much bigger, and the distance from the sun to the next star again so much more. And from our galaxy to the next, there is a huge empty space. Nevertheless, our galaxy is so close to the next group of galaxies that they together form a building block of a huge wall, separating two unimaginably large empty spaces from each other.

So when we figured out that we can calculate the center of the universe, it was widely expected that the center would be somewhere in one of those vast spaces of nothing. The chances that it would be in one of the filaments were tiny.

It turned out that this was not a question of chance.

The center of the universe was not only inside of a filament, but the first quick calculations (quick, though, has to be understood as taking three and a half years) suggested that the center is actually within our filament. And not only within our filament, but within our galaxy. Within a one light year radius of our sun.

The team that made these calculations was working at a small research institute in rural Japan. They did not believe the results, and double and triple checked them. The head of the institute had graduated from Princeton, and called his former advisor there. Although it was deep in the night in Japan, they talked for many hours. In the end he learned that Princeton had made the same calculations, and had received their own results about eight months earlier. They didn’t dare to publish them. There must have been a mistake. These results had to be wrong.

Science has humiliated the whole of humanity again and again. And it was quite successful in doing so. A scientist would much more easily accept that the center of the universe is some mathematical construct pointing to nothing than what the infallible mathematics indicated. But the data was out. And the number of people making the above-mentioned realizations and calculations continued growing. It was only a matter of time. And when the Catholic University of Rio de Janeiro finally published the results, in a carefully written paper without any accompanying press release, formulated cautiously and defensively, all the scientists who already knew the results held their breath.

The storm was unimaginable. Everyone demanded an explanation, but no one would listen to anyone offering one. The religions rejoiced, claiming they knew it all along, and many flocked to the mosques and churches and temples, as proof of God had finally been found. The irony of science leading humans to the embrace of religion was profoundly lost at that time, but later recognized as one of the largest jokes in history. Science had dealt its ultimate humiliation, not to humanity, but perversely to its most devout followers, the scientists. The scientists who, while trashing the superiority of humans over the world, were secretly inflating their own, were now reminded that they were merely slaves to a most cruel mistress. Their bitter resistance did not stop the results from emerging.

The mathematics and calculations were soon made public. The mathematics were deceptively simple, once the required factorizations were done, and easy to check. High school courses went through the proofs, and desperate parents peeked over the shoulders of their daughters and sons who, sometimes for the first time, talked of integrals and imaginary numbers. Television and streaming platforms were explaining discriminants and complex numbers and roots of higher degrees. Websites offering math courses bent under the load and moral weight.

There is one weird thing about roots. The root of a number is the number that, multiplied by itself, gives you the original number. The weird thing is that there is usually not a single, unique result to that question. For example, the root of the number four is not just two, but also minus two, as minus two times minus two results in four, too. There are two roots of the second degree (which we usually call the square root). There are three roots of the third degree (sometimes called the cube root). There are four roots of the fourth degree. And so on. All of them are correct. Sometimes you can discard one or the other because the result has to fit certain constraints (say, you are looking only for the positive root of four), but sometimes, you cannot.

As the calculations went public, the methods became more and more refined. The results became increasingly precise, and as the data from the satellites poured in, one of the last steps involved a root of the seventh degree. At first, this was regarded as a minor curiosity, especially because these seven results led to basically the same point. Cosmologically speaking.

Earth is moving. Earth is moving around the sun, at a speed of sixty-seven thousand miles per hour, or about eighteen miles each second. Also the sun is moving, and the earth is moving with the sun, and our galaxy is moving, and with our galaxy the sun moves along, and with the sun our earth. We are racing at a speed of a thousand miles each second in some direction away from the center of the universe.

And it was realized, maybe we just passed the center of the universe. Maybe it was just an accident, maybe all the planets and stars pass the center of the universe at some point. That we are so close to the center of the universe might be just a funny coincidence.

And maybe they are right. Maybe every star will at some point cross the center of the universe within the distance of a light year.

At some point though it was realized that, since the universe was bounded in all four dimensions, there was not only a center in space, but also a center in time, a midpoint between the beginning of the universe and its future end.

All human history is encompassed in the last hundred thousand years. From the mitochondrial Eve and the Y-Chromosomal Adam who lived in Africa, the mother of our mother of our mother, and so on, that we all share, and the father of our father of our father, and so on, that we all share, their descendants, our ancestors, who crossed the then fertile jungle of the Sahara and who afterwards settled the whole planet, painted on the walls of caves and filled the air with music by blowing over grass blades and into hollow bones, wandered over the land bridge connecting Asia with the Americas and traveled over the vast Pacific to discover tiny islands, until the recent invention of the alphabet, all of this happened in the last hundred thousand years. The universe has an age of a hundred thousand times a hundred thousand years, roughly. And the fabled midpoint turned out to be within the last few thousand years.

The hopes that our earth was just accidentally next to the center of the universe were shattered. As the precision of the calculations increased, it became clearer and clearer that earth was not merely close to the center of the universe, but back at the midpoint of history, earth was right there in the center. In every single one of the seven possible results, Earth was right at the center of the universe. [1]

As the calculations continued over the years, a new class of mystic mathematicians emerged, and many walls between religion and science were shattered. On both sides the unshakeable ones remained: the scientists who would not admit that these results mean anything, that it all is merely a mathematical abstraction; and the priests who say that these results mean nothing, that they don’t tell us about how to live a good life. That these parallels intersect is the only trace of infinity left.


[1] As the results were refined, it seemed that the seven mathematical solutions for the center of time and space were some very well known dates. So far the calculated precision was ten years here or there. The well known dates were: 3760 BC, 541 BC, 30 AD, and 610 AD. The other dates turned out to be rather less well known: 10909 BC, 3114 BC, and 1989 AD. The interpretation of the dates led to a well-known series of events all over the world, which we will not discuss here.


(This story was first published on Medium on February 2, 2014 under CC-BY 4.0).

The Fourth Scream

Janie loved her research. It was at the intersection of so many interesting areas - genetics, linguistics, neuroscience. And the best thing about it - she could work the whole day with these adorable vervet monkeys.

One more time, she showed the video of the flying eagle to Kassandra. The MRI helmet on Kassandra’s little head measured the neuron activation, highlighting the same region on her computer screen as the other times, the same region as with the other monkeys. Kassandra let out the scream that Janie was able to understand herself by now, the scream meaning “Eagle!”, and the other monkeys behind the bars in the far end of the room, in a cage as large as half the room, ran for cover in the bushes and small caves, if they were close enough. As they did every time.

That MRI helmet was a masterpiece. She could measure the activation of the neurons in unprecedentedly high resolution. And not only that, she could even send inferencing waves back, stimulating very fine-grained regions in the monkey’s brain. The stimulation wasn’t very fast, but it was a modern miracle.

She slipped a raspberry to Kassandra, and Kassandra quickly snatched it and stuffed it in her mouth. The monkeys came from different populations from all over Southern and Eastern Africa, and yet they all understood the same three screams. Even when the baby monkeys were raised by mute parents, the baby monkeys understood the same three screams. One scream was to warn them of leopards, one scream was to warn them of snakes, and the third scream was to warn them of eagles. The screams were universally understood by everyone across the globe - by every vervet monkey, that is. A language encoded in the DNA of the species.

She called up the aggregated areas for the screams from her last few experiments. In the last five years, she had been able to trace back the proteins that were responsible for the growth of these three areas, and thus the DNA encoding these calls. She could prove that these three different screams, the three different words of Vervetian, were all encoded in DNA. That was very different from human language, where every word is learned and arbitrary, and none of the words are encoded in our DNA. Some researchers believed that other parts of our language were encoded in our DNA: deep grammatical patterns, the ability to merge chunks into hierarchies of meaning when parsing sentences, or the categorical difference between hearing the syllable ba and the syllable ga. But she was the first one to provably connect three different concrete genes with three different words that an animal produces and understands.

She told the software to create an overlapping picture of the three different brain areas activated by the three screams. It was a three dimensional picture that she could turn, zoom, and slice freely, in real time. The strands of DNA were highlighted at the bottom of the screen, in the same colors as the three different areas in the brain. One gene, then a break, then the other two genes she had identified. Leopard, snake, eagle.

She started to turn the visualization of the brain areas, when Kassandra started squealing in pain. Her hand was stuck between the cage bars and the plate with raspberries. The little thief was trying to sneak out a raspberry or two! Janie laughed, and helped the monkey get the hand unstuck. Kassandra yanked it back into the cage, looked at Janie accusingly, knowing that the pain was Janie’s fault for not giving her enough raspberries. Janie snickered, took out another raspberry and gave it to the monkey. She snatched it out of Janie’s hand, without stopping the accusing stare, and Janie then put the plate on the other side of the table, at a safe distance and out of sight of Kassandra.

She looked back at the screen. When Kassandra cried out, her hand had twitched, and turned the visualization to a weird angle. She was just about to turn it back to a more common view, when she suddenly stopped.

From this angle, she could see the three different areas, connecting together with the audiovisual cortex at a common point, like the leaves of a clover. But that was just it. It really looked like three leaves of a four-leaf clover. The area where the fourth leaf would be - it looked a lot like the areas where the other three leaves were.

She zoomed into the audiovisual cortex. She marked the neurons that triggered each of the three leaves. And then she looked at the fourth leaf. The connection to the cortex was similar. A bit different, but similar enough. She was able to identify what were probably the trigger neurons, just like she was able to find them for the other three areas.

She targeted the MRI helmet on the neurons connected to the eagle trigger neurons, and with a click she sent a stimulus. Kassandra looked up, a bit confused. Janie looked at the neurons, how they triggered, unrolled the activation patterns, and saw how the signal was suppressed. She reprogrammed the MRI helmet, refined the neurons to be stimulated, and sent off another stimulus.

Kassandra yanked her head up, looking around, surprised. She looked at her screen, but it showed nothing as well. She walked nervously around inside the little cage, looking worriedly to the ceiling of the lab, confused. Janie again analyzed the activation patterns, and saw how it almost went through. There seemed to be a single last gatekeeper to pass. She reprogrammed the stimulator again. Third time's the charm, they say. She just remembered a former boyfriend, who was going on and on about this proverb. How no one knew how old it was, where it began, and how many different cultures all over the world associate trying something three times with eventual success, or an eventual curse. How some people believed you need to call the devil's name three times to —

Kassandra screamed out the same scream as before, the scream saying “Eagle!”. The MRI helmet had sent the stimulus, and it worked. The other monkeys jumped for cover. Kassandra raised her own arms above her head, peeking through her fingers to find the eagle she had just sensed.

Janie was more than excited! This alone will make a great paper. She could get the monkeys to scream out one of the three words of their language by a simple stimulation of particular neurons! Sure, she expected this to work - why wouldn’t it? But the actual scream, the confirmation, was exhilarating. As expected, the neurons now had a heightened potential, were easier to activate, waiting for more input. They slowly cooled down as Kassandra didn’t see any eagles.

She looked at the neurons connected to the fourth leaf. The gap. Was there a secret, fourth word hidden? One that all the zoologists studying vervet monkeys have missed so far? What would that word be? She reprogrammed the MRI helmet, aiming at the neurons that would trigger the fourth leaf. If her theory was right. With another click she sent a stimulus to the —

Janie was crouching in the corner of the room, breathing heavily, cold sweat was covering her arms, her face, her whole body. Her clothes were clammy. Her arms were slung above her head. She didn’t remember how she got here. The office chair she had been sitting in just a moment ago lay on the floor. The monkeys were quiet. Eerily quiet. She couldn’t see them from where she was, she couldn’t even see Kassandra from here, who was in the cage next to her computer. One of the halogen lamps in the ceiling was flickering. It wasn’t doing that before, was it?

She slowly stood up. Her body was shivering. She felt dizzy. She almost stumbled, just standing up. She slowly lowered her arms, but her arms were shaking. She looked for Kassandra. Kassandra was completely quiet, rolled up in the very corner of her cage, her arms slung around herself, her eyes staring catatonically forward, into nothing.

Janie took a step towards the middle of the room. She could see a bit more of the cage. The monkeys were partly huddled together, shaking in fear. One of them lay in the middle of the cage, his face in a grimace of terror. He was dead. She thought it was Rambo, but she wasn’t sure. She stumbled to the computer, pulled the chair from the floor, slumped into it.

The MRI helmet had recorded the activation pattern. She stepped through it. It did behave partially the same: the neurons triggered the unknown leaf, as expected, and that led to the activation of the muscles around the lungs, the throat, the tongue, the mouth - in short, that activated the scream. But, unlike with the eagle scream, the activation potential did not increase, it was now suppressed. As if it were trying to avoid a second triggering. She checked the pattern: yes, the neuron triggered that suppression itself. That was different. How did this secret scream sound?

Oh no! No, no, no, no, NOO!! She had not recorded the experiment. How stupid!

She was excited. She was scared, too, but she tried to push that away. She needed to record that scream. She needed to record the fourth word, the secret word of vervet monkeys. She switched on all three cameras in the lab, one pointed at the large cage with the monkeys, the other two pointing at Kassandra - and then she changed her mind, and turned one onto herself. What had happened to her? Why couldn’t she remember hearing the scream? Why had she been crouching on the floor like one of the monkeys?

She checked her computer. The MRI helmet was calibrated as before, pointing at the group of triggering neurons. The suppression was ebbing down, but not as fast as she wanted. She increased the stimulation power. She shouldn’t. She should follow protocol. But this all was crazy. This was a cover story for Nature. With her as first author. She checked the recording devices. All three were on. The streams were feeding back into her computer. She clicked to send the sti—

She felt the floor beneath her. It was dirty and cold. She was lying on the floor, face down. Her ears were ringing. She turned her head, opened her eyes. Her vision was blurred. Over the ringing in her ears she didn’t hear a single sound from the monkeys. She tried to move, and she felt her pants were wet. She tried to stand up, to push herself up.

She couldn’t.

She panicked. Shivered. And when she felt the tears running over her face, she clenched her teeth together. She tried to breathe, consciously, to collect herself, to gain control. Again she tried to stand up, and this time her arms and legs moved. Slower than she wanted. Weaker than she hoped. She was shaking. But she moved. She grabbed the chair. Pulled herself up a bit. The computer screen was as before, as if nothing had happened. She looked to Kassandra.

Kassandra was dead. Her eyes were bloodshot. Her face was a mask of pure terror, staring at nothing in the middle of the room. Janie tried to look at the cage with the other monkeys, but she couldn’t focus her gaze. She tried to yank herself into the chair.

The chair rolled away, and she crashed to the floor.

She had gone too far. She had made a mistake. She should have followed protocol. She was too ambitious, her curiosity and her impatience got the better of her. She had to focus. She had to fix things. But first she needed to call for help. She crawled to the chair. She pulled herself up, tried to sit in the chair, and she did it. She was sitting. Success.

Slowly, she rolled back to the computer. Her office didn’t have a phone. She double-clicked on the security app on her desktop. She had no idea how it worked, she never had to call security before. She hoped it would just work. A screen opened, asking her for some input. She couldn’t read it. She tried to focus. She didn’t know what to do. After a few moments the app changed, and it said in big letters: HELP IS ON THE WAY. STAY CALM. She closed her eyes. Breathed. Good.

After a few moments she felt better. She opened her eyes. HELP IS ON THE WAY. STAY CALM. She read it, once, twice. She nodded, her gaze jumping over the rest of the screen.

The recording was still on.

She moved the mouse cursor to the recording app. She wanted to see what had happened. There was nothing to do anyway, until security came. She clicked on the play button.

The recording filled three windows, one for each of the cameras. One pointed at the large cage with the vervet monkeys, two at Kassandra. Then, one of the cameras pointing at Kassandra was moved, pointing at Janie, just moments ago - it was moments, was it? - sitting at the desk. She saw herself getting ready to send the second stimulus to Kassandra, to make her call the secret scream a second time.

And then, from the recording, Kassandra called for a third time.

The end

The Future of Knowledge Graphs in a World of Large Language Models

The Knowledge Graph Conference 2023 in New York City invited me for a keynote on May 11, 2023. Given that basically all conversations these days are about large language models, I gave a talk about my understanding of how knowledge graphs and large language models go together.

After the conference, I did a recording of the talk, giving it one more time, in order to improve the quality of the recording. The talk has gotten more than 10,000 views on YouTube so far, which, for me, is totally astonishing.

I forgot to link it here, so here we go finally:

The Heat Death of the Internet

Good observations, and closing on a hopeful note. Short and pointed read.

The Jones Brothers

The two Jones brothers never got along, but both were too stubborn to leave the family estate. They built out two entrances to the estate, one from the south, near Jefferson Avenue, and the newer, bigger one, closer to the historic downtown, and each brother chose to use one of the entrances exclusively, in order to avoid the other and their family. To the confusion of the local folk (but to the open enjoyment of the high school's grammar teacher, who was, surprisingly for his role, a descriptivist), they named the western gate the Jones' gate, and the southern one the Jones's gate, and the brothers earnestly thought that that settled it.

It didn't.

The Ring verse in German

28 May 2024

I finally got the Lord of the Rings in English. I never read it in its native English, only in a German translation, about thirty years ago.

And already on the first page I am stumped: the ring verse seems to me sooo much better in German than in English. Now, it is absolutely possible that this is due to me having read it as an impressionable teenager and having carried the translation with me for three decades and thus developed fondness and familiarity with it, but I think it's more than that.

Here are the verses in English, German, and a literal back-translation of the German to English:

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all,
One Ring to find them,
One Ring to bring them all,
and in the darkness bind them
In the Land of Mordor where the Shadows lie.

German translation by von Freymann:

Drei Ringe den Elbenkönigen hoch im Licht,
Sieben den Zwergenherrschern in ihren Hallen aus Stein,
Den Sterblichen, ewig dem Tode verfallen, neun,
Einer dem dunklen Herrn auf dunklem Thron
Im Lande Mordor, wo die Schatten drohn.
Einen Ring, sie zu knechten, sie all zu finden,
ins Dunkle zu treiben und ewig zu binden
Im Lande Mordor, wo die Schatten drohn.

Back-translation of her translation by me:

Three Rings for the Elven kings high in the light,
Seven for the Dwarf-lords in their halls of stone,
For the mortals, eternally doomed to death, nine,
One for the Dark Lord on dark throne
In the Land of Mordor, where the Shadows loom.
One Ring, to enslave them, to find them,
to drive to Darkness, and forever bind them
In the Land of Mordor, where the Shadows loom.

The differences are small, but I find the selection of words by the translator to be stronger and more evocative than Tolkien's original. Which is amazing. Thanks to the great Ebba-Margareta von Freymann for her wonderful translation of the poems!

Originally, the publisher Klett had trouble with translating Tolkien's poems, but Ebba-Margareta had been working for many years on the translation of poems by Tolkien, and by using her translations, Klett did a great service to the book for the German-speaking world.


The Strange Case of Booker T. Washington’s Birthday

A lovely geeky essay about how much work a single edit to Wikipedia can be. I went down these kinds of rabbit holes myself more than once, and so I very much enjoyed the essay.

The Surrounding Sea

Explore the ocean of words in which we all are swimming, day in day out. A site that allows you to browse through the lexicographic data in Wikidata along four dimensions:

  • alphabetical, like in a good old-fashioned dictionary
  • through translations and synonyms
  • where does this word come from, and where did it go
  • narrower and wider words, describing a hierarchy of meanings

Wikidata contains over 1.2 million lexicographic entries, but you will see the many gaps when exploring the sea of words. Please join us in charting out more of the world of words.

Happy 23rd birthday to Wikipedia and the movement it started!