Difference between revisions of "Main Page"
Revision as of 17:56, 24 December 2007
The new thing about the Semantic Web is not the semantics, it's the Web
In my 2025 ISWC keynote (publication of recording is pending) I was using some text which I considered a quote: “The new thing about the Semantic Web is not the semantics, it's the Web.”
Now I thought the quote was by Chris Welty, and I thought it was in his 2007 ISWC keynote (what a nice recall!), but I didn't have the time to actually check it and I could be wrong.
On arrival in Nara I met Natasha Noy, and asked her if she remembered the quote. She did not, but said it didn't sound like something that Chris would say, and it sounded more like something Jim Hendler would say.
So in the talk itself, I had the quote, but said that I was unsure to whom to attribute it.
Later, Juan Sequeda and I were trying to figure out where the quote is from. Juan just asked Chris via messenger (he wasn't in Nara), but Chris said it's nothing he remembers saying. I think he also thought it sounds more like something Jim would say.
I asked Gemini, and Gemini thought it was Tim Berners-Lee. Gemini Deep Research on the other hand was guessing Jim Hendler, with input from Dean Allemang and maybe Tim Berners-Lee. Juan asked a paid version of ChatGPT, and that actually found the quote (“In the Semantic Web, it is not the Semantic which is new, it is the Web which is new”) in the Semantic Web FAQ by the W3C, attributed to Chris Welty. Finally, I asked ChatGPT (free version), and it attributed the quote to -- well, Denny Vrandečić. Myself. Very funny.
So, I finally watched the 2007 keynote, and indeed: he gets very close to the quote, by combining what he said: “The emphasis is on Web, not on Semantic.” (24:20-25:00) and “It wasn't the KR in Semantic Web that was novel, it was the Web that was novel.” (39:40-40:00).
Now I can confidently attribute the quote to Chris Welty.
Update: there has been a twist to the story! Kingsley Idehen found the quote in slides by Tim Berners-Lee wrapping up ISWC 2005, Slide 3, two years before Chris' keynote. But -- attributed to Chris Welty! So the conclusion stands, just with an even older provenance.
Kicking off the naming contest for Abstract Wikipedia
Before we started Abstract Wikipedia, it was always clear that "Abstract Wikipedia" was just a working name for the project, not the name that the Website would eventually get. It was intentionally chosen to be minimally descriptive --it is abstract in the sense that it captures encyclopaedic content abstracted from a concrete natural language-- but also not a very good name, because it is confusing: people confuse it with a Wikipedia of Abstracts (short summaries of articles), or think of math or art.
Thus we are kicking off a naming contest and are inviting all Wikimedians to join! Already, more than a dozen names are there, and you can suggest more, and vote for your favorites.
Let's find the best possible name!
Keynote at ISWC 2025
It is an honor to be invited as a keynote speaker to the ISWC - International Semantic Web Conference this year in Nara, Japan.
This is particularly exciting for me, because during my PhD research, ISWC has been my "home conference" - the prime conference in my research area, where the research community that I felt affiliated with was meeting. It will be 20 years since I attended my first ISWC, in Galway, and every year I attended I enjoyed both the academic program and the opportunity to meet friends again. I am very much looking forward to meeting friends again, some of whom I haven't seen for many years.
The title of the talk will be:
- "Wikipedia and the Semantic Web - Celebrating 20 years of co-development, and the future"
If there's anything I should mention, let me know. The full abstract and how to register for the conference can be found here: ISWC 2025 Keynotes abstracts and speakers
I would be very excited to see you there, either to see you again, or to meet you there for the first time!
Powerball jackpot at $1.7bn
The Powerball jackpot is around $1.7 billion. Given that a ticket costs $2 and the chance to win the jackpot is about 1:300 million, it seems almost rational to buy a ticket. I mean, I'd buy a ticket or two if I still lived in the US.
Or, as a friend suggested, take $600 million and buy 300 million tickets and win for sure.
Would that work?
Yes, it would work in the sense that you would have a guaranteed jackpot win. But does it pay off?
Let's assume you don't have $600 million in cash lying around and instead get a loan (ha, sure!) for that amount (if you're reading this and you *do* have $600 million in cash lying around, the analysis would be quite different, interestingly, but in that case send me a DM and I'll sell you the analysis).
Now, there are a few considerations:
1. You will be paying taxes on the prize money. At least 37% in federal taxes, and then between 0% and 11% in state taxes.
2. The $1.7 billion is the sum that you get over 30 annual payments, where each payment is 5% more than the previous one. That starts at about $13-$16 million after taxes, depending on your state. That will likely not be enough to cover the interest on the $600 million, but that depends on your loan terms.
3. You can opt for a one-time payment instead of the 30 years of annual payments. But in that case it's "only" $770 million you get. Or between $400 and $500 million after taxes.
4. The biggest risk is that someone else might win too. In that case you split evenly with every other winner. That means a single other winner reduces your payout by 50%. If there are more, you get even less.
5. On the other side, you will actually get more than just the Jackpot. Because you bought 300 million tickets, several millions of those tickets will win something. Altogether, that's another $90 million roughly. Again, before taxes.
Seems like a close call so far, but...
6. Here comes the kicker -- you can actually deduct the cost of the lottery tickets against the taxes on lottery winnings, if you itemize them. So better keep the receipts!
That would reduce the taxes considerably, if you take the one-time payment (with the annual payments, it probably won't last for 30 years). So the total calculation for the one-time payment would be:
Total win: $770 million + $90 million = $860 million. Of that, $600 million is tax free because it is offset by the ticket cost. On the other $260 million you'll pay 37-48% tax, so you'll still end up with more than $130 million profit, with which you'll be able to pay off the interest on the short-term loan easily.
But because of consideration no. 4, and since such a huge jackpot attracts many other players, you'll likely have to split. And if that happens, your losses will be catastrophic:
Assuming a single other winner, your total prize money drops to $385 million + $90 million (that part is fixed) = $475 million. Fortunately it's all tax free (because, well, you didn't actually make any money), unfortunately you're now at least $125 million worse off, plus interest. If more people win, it gets worse.
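The arithmetic above can be checked with a short sketch. It rests on simplifying assumptions that are not in the analysis itself: a single flat tax rate applied only to winnings above the ticket cost, no loan interest, and all amounts in millions of dollars. The function names are made up for illustration.

```python
def first_annuity_payment(advertised_total, years=30, growth=0.05):
    """First payment of an annuity whose payments grow by `growth` per year
    and add up to `advertised_total` over `years` payments."""
    # Sum of 1 + g + g^2 + ... for g = 1 + growth (geometric series).
    growth_factor_sum = ((1 + growth) ** years - 1) / growth
    return advertised_total / growth_factor_sum


def net_result(lump_sum, secondary_prizes, ticket_cost, winners, tax_rate):
    """Profit (positive) or loss (negative) after buying every ticket.

    Assumption: the jackpot is split evenly among `winners`, the secondary
    prizes are not shared, and tax is due only on winnings above ticket cost.
    """
    winnings = lump_sum / winners + secondary_prizes
    taxable = max(0.0, winnings - ticket_cost)
    return winnings - ticket_cost - taxable * tax_rate


# All figures in millions of dollars, taken from the text above.
for rate in (0.37, 0.48):
    print("sole winner, tax", rate, "->", round(net_result(770, 90, 600, 1, rate), 1))
print("one other winner ->", round(net_result(770, 90, 600, 2, 0.37), 1))
print("first annual payment before tax ->", round(first_annuity_payment(1700), 1))
```

Under these assumptions the sole-winner case stays above $130 million at either tax rate, while a single co-winner turns it into a $125 million loss before interest.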
My recommendation: if you're into this kind of thing, buy one ticket (the first ticket increases your chances by a lot, the second ticket merely doubles them), dream about what you'd do with the money for a day or two, but have no expectations to win. Chances are you won't (there's a reason it's called a tax on stupidity).
And definitely don't do the "buy every single ticket" scheme with loaned money. That will end badly.
P.S. (a few days later): the Powerball was won by two tickets, and thus had to be split.
Tech companies and the size of the market
Maybe you think there might be an AI bubble, but you don't worry about it: even if it pops, how bad can it be?
Here's one number: seven tech companies (NVIDIA, Microsoft, Google, Apple, Meta, Tesla and Amazon) are worth about $20 trillion together, representing a third of the value of the total US stock market.
For context: total subprime mortgages in 2008 were $1.3 trillion, total US mortgage debt was $10 trillion.
You have a 401k or have invested in index funds? Your money is very likely significantly tied to the AI market. If you want that, that's fine. If you're worried it might be a bubble ---
Additions with unique digits: a tale of puzzling and AI
The other day, Markus Krötzsch and I were catching up when one of his kids came in. She told us about a puzzle her math teacher gave her:
How many integer sums are there where the equation uses each digit at most once?
The simplest example is 1+2=3, but 12+47=59 also works. On the other hand, 1+9=10 isn’t a solution because the digit ‘1’ appears twice (in the first number and the sum). Markus’s wife Anja was the first of us to find a solution using three-digit numbers.
I recommend trying to solve it yourself at this point, and then coming back to this post. It is much more fun like that.
Thinking for a moment, we realized there seemed to be quite a few solutions. But how many? As computer scientists, our immediate thought was to write a program. First, though, we estimated the complexity of a brute force approach: if we tried all possible permutations of the digits 0 to 9, that’s 10! permutations, or about 3.6 million. A computer should work through that pretty quickly.
I started writing code, and did so even more inefficiently. I included ‘+’ and ‘=’ with the 10 digits, then iterated through all permutations (and shorter versions) to check for solutions. This ballooned the candidate pool to an upper bound of 12! × 10, or about 5 billion. I knew there was plenty of room for improvement – the code wasn’t smart – but it should find all the results. I started running it. At this point, I was pretty sure this was the kind of problem that with raw computational power can be solved faster than taking the time to come up with 'smart' code and better heuristics. My full code is here.
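The two size estimates are easy to double-check. This is only a sanity check of the numbers, not the code from either of us, and the variable names are made up for illustration:

```python
import math

# All orderings of the ten digits 0-9.
digit_permutations = math.factorial(10)

# Upper bound used above: orderings of 12 symbols (ten digits, '+', '='),
# times 10 for the shorter versions.
equation_string_bound = math.factorial(12) * 10

print(f"{digit_permutations:,}")     # 3,628,800
print(f"{equation_string_bound:,}")  # 4,790,016,000
```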
As my code ran, Markus took it as a fun challenge and set out to create a smarter solution — which, admittedly, shouldn’t be too hard. Instead of iterating through all possible (and many impossible) equation strings, he only permuted the numbers, applied a maximum length, and checked those. Markus’ code was impressively fast – just 4 seconds! It seemed like we had found the 'right' way to solve it, using better heuristics. Or had we?
Soon after, my code finished, taking 20 minutes. We both had the same answer! (You can find the answer in the linked material, but I am keeping it out here in case you want to try it out yourself). Having two different approaches with the same result gave us high confidence in the answer.
Now, that’s where it could have ended. But the kids were teasing us: it took us forever to write the code (less than half an hour, but still), why didn’t we just ask ChatGPT for the answer?
We were pretty convinced there’s no chance ChatGPT would be able to answer that. So we asked. Markus formulated the prompt (in German, due to my prodding, so it’s more fun for his kid watching), and ChatGPT started thinking.
I have never seen it thinking for so long. Almost seven minutes.
We used that time to laugh and joke about it. No way it would figure it out! We were looking at the steps it was describing. They didn’t seem too bad. What is a cryptarithm? Never heard of that. We were getting ready to amuse ourselves by watching and dissecting ChatGPT’s answer.
Boy, we were wrong!
It returned the same number we had found. The answer was great! Not only that, we looked into the thinking process and found the code that ChatGPT wrote. That code ran in fractions of a second. It wasn’t just faster; the way it built the equations, digit by digit, felt like a fundamentally different strategy than our permutations. Here is a link to the full conversation (local archive). Where my code ran in 20 minutes and Markus’s in 4 seconds, ChatGPT’s code ran in 60 milliseconds.
We were completely baffled. I searched the Web for this riddle and solutions, hoping to find indications it had been solved before, that it was part of the corpus. It might be the case, but we didn’t find it anywhere. ChatGPT came up with more efficient code, and was faster to write it. We were deeply impressed, and were left with two questions, without immediate answers:
- Given that this is a language model, how did it actually manage to reach this solution? Why did this work?
- Given that ChatGPT can do this, what does this mean? What are the consequences of having such an ability available? How does this change things?
I kicked off my local Ollama with OpenAI’s recent Open Weight model (120 billion parameters), using Markus’s prompt. I let it run in the background while Markus and I kept chatting. It took a long time – tens of minutes? I didn’t keep track – and we were again confident it wouldn’t be able to answer the question. This was mostly because my local Ollama doesn’t have access to a Python interpreter. And indeed, it answered that there were 45 solutions, very much off from the 1991 solutions we were looking for.
We were a bit relieved. It was getting late, so we called it a night. Seeing Ollama struggle initially, giving me numbers far off the mark, actually reinforced my earlier skepticism. It felt like my intuition about what these models couldn't do was being confirmed – at least for a moment. I asked the system to list all 45 solutions, and it did. I noticed that 1+2=3 was missing. I asked if that would be a correct answer and if it wanted to reconsider its number of solutions, then left it to run while I went to sleep.
The next day I checked the answer. The model had increased its answer to 124 solutions and wrote new code. It told me I should run that Python code myself, as a next step. So, again, the wrong answer.
And then I did run the code. From both the first answer, and the second, improved answer. If you actually run the code, both of them result in the correct answer! The first version took about 20 seconds. The second version then took almost half an hour to run, but I think that’s only because at this point the LLM was becoming a bit extra cautious; it even checked for five-digit numbers, although four-digit numbers were sufficient. I took the LLM’s last code, made some minor clean-up to the code, and reduced it to four digits again, and the code gives the correct answer in 20 seconds again. My complete conversation with gpt-oss is available here.
Not only that: I would even claim that this cleaned-up code is the easiest to read and understand of all the solutions we discussed so far. It iterates over all possible numbers up to four digits, making sure the second number is always bigger than the first, gets their sum, checks the result for digit uniqueness (quite elegantly using a set!), and counts the solutions. The code is short and, I’d say, elegant.
Markus has also written a second version, which beats the one by ChatGPT in efficiency, running in only 20 milliseconds, and probably being so fast that the Unix time command doesn't measure it reliably.
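To give an idea of the shape of such a brute-force solution, here is a minimal sketch written from the description above. It is not the model's actual code; the function names and the `max_addend` parameter are made up for illustration.

```python
def uses_unique_digits(a: int, b: int) -> bool:
    """True if a + b = sum uses no digit more than once across all three numbers."""
    digits = f"{a}{b}{a + b}"
    # A set drops duplicates, so equal lengths mean every digit is distinct.
    return len(set(digits)) == len(digits)


def count_solutions(max_addend: int = 9999) -> int:
    """Count pairs a < b (both up to max_addend) whose sum equation has unique digits."""
    count = 0
    for a in range(1, max_addend + 1):
        # b starts above a, so a + b and b + a are not counted twice.
        for b in range(a + 1, max_addend + 1):
            if uses_unique_digits(a, b):
                count += 1
    return count


print(uses_unique_digits(12, 47))  # True: 12 + 47 = 59
print(uses_unique_digits(1, 9))    # False: 1 + 9 = 10 repeats the digit 1
print(count_solutions(9))          # single-digit addends only
```

With the default limit the loop visits roughly 50 million pairs, so in pure Python it takes a while. Passing a smaller `max_addend` is a quick way to try it out.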
I am impressed by the answers by the models (and by Markus as well). I think this is a great example of my biggest problem with all of these models, something I’ve been saying for years: I cannot explain to someone else what these models are good at, and what they are not good at.
But I thought I had a good intuition myself. That I roughly knew what kind of questions they would answer well, and which ones I shouldn’t trust them with. What my experience from this experiment showed me is that I do not. I don’t know what the capabilities of these models are.
Given the recent news around the math olympiad, should I have expected such a result? Was this just a toss-up? Do these systems now reliably solve problems like these? Would we have trusted ChatGPT’s result, or would we have been skeptical, if Markus and I hadn't calculated the result for the riddle ourselves?
I am tempted to rationalize it away, to think that the problem was simpler than we made it, that my complex code involving permutations was a red herring, and that’s why the systems could solve it. Maybe that’s just me dealing with the AI effect, where we downplay AI achievements once they happen.
Thanks for reading so far. And big thanks to Markus’s kids for bringing up this riddle, and their math teacher for asking it.
Wikipedia is an encyclopedia
Just happy to post that Wikifunctions can now create some simple sentences in multiple languages. More information on the Abstract Wikipedia newsletter.
Oh! Dalmatien
oh! Dalmatien by Doris Akrap, a so-called insider guide, was published a few weeks ago by Folio-Verlag. It contains three dozen short articles, each about three pages long, with stories from and about Dalmatia: on Dalmatian food and drink, the Dalmatian way of life, the local language, the winds of the region and the influence they have on one's mood, and interesting anecdotes about culture and history.
The writing is entertaining and brisk throughout. The stories are not connected and can be read in any order or skipped, and they are short enough to be read in brief breaks -- the perfect travel reading for Dalmatia.
I particularly liked how the author weaves in her experiences as a child of immigrants, whose echo I could feel in my own family history. How much the house in Dalmatia meant to her father. How attached she herself was to it, and how the house accompanied her through her life, even beyond her father's death. The dramatic family history during the Second World War. But none of this overshadows the lightness of the book, and it remains an entertaining and yet instructive summer read that offers a glimpse into this beautiful region.
Even the article about Brač unexpectedly revealed something completely unknown to me: the band Valentino Bošković and their very absurd music and mythology.
The book was worth it for me, and I can only recommend it!
- oh! Dalmatien at Folio-Verlag
- Episode of the podcast Balla Balla Balkan with the interview with Doris Akrap about the book
Fresh bread
"A loaf of bread please"
"I'm sorry, I can't cut it because it's still hot."
"GIVE. IT. TO. ME."
News about Abstract Wikipedia
Some material about Abstract Wikipedia from the last few days:
- Abstract Wikipedia is a MacArthur 100&Change finalist
- Knowledge in the Age of AI, talk at King's College London by me, on the occasion of becoming a Visiting Professor
- Episode of Between the Brackets by Yaron Koren with me
30 years of wikis
Today is the 30th anniversary of the launch of the first wiki by Ward Cunningham. A page that anyone could edit. Right from the browser. It was generally seen as a bad idea. What if people did bad things?
Originally with the goal to support the software development community in creating a repository of software design patterns, wikis were later used for many other goals (even an encyclopedia!), and became part of the recipe, together with blogs, fora and early social media, that was considered the Web 2.0.
Thank you, Ward, and congratulations on the first 30 years.
A wiki birthday card is being collected on Wikiindex.
My thoughts on Alignment research
Alignment research seeks to ensure that hypothetical future superintelligent AIs will be beneficial to humanity—that they are "aligned" with "our goals," that they won’t turn into Skynet or universal paperclip factories.
But these AI systems will be embedded in larger processes and organizations. And the problem is: we haven’t even figured out how to align those systems with human values.
Throughout history, companies and institutions have committed atrocious deeds—killing, poisoning, discriminating—sometimes intentionally, sometimes despite the best intentions of the individuals within them. These organizations were composed entirely of humans. There was no lack of human intelligence that could have recognized and tempered their misalignment.
Sometimes, misalignment was prevented. When it was, we might have called the people responsible heroes—or insubordinate. We might have awarded them medals, or they might have lost their lives.
Haven’t we all witnessed situations where a human, using a computer or acting within an organization, seemed unable to do the obvious right thing?
Yesterday, my flight to Philadelphia was delayed by a day. So I called the hotel I had booked to let them know I’d be arriving later.
The drama and the pain the front desk clerk went through!
“If you don’t show up today,” he told me, “your whole reservation will be canceled by the system. And we’re fully booked.”
“That’s why I’m calling. I am coming—just a day later. I’m not asking for a refund.”
“No, look, the system won’t let me just cancel one night. And I can’t create a new reservation. And if you don’t check in today, your booking will be canceled…”
And that was a minor issue. The clerk wanted to help. It is a classical case of Little Britain's "Computer says no" sketch. And yet, more and more decisions are being made algorithmically—decisions far more consequential than whether I’ll have a hotel room for the night. Decisions about mortgages and university admissions. Decisions about medical procedures. Decisions about clemency and prison terms. All handled by systems that are becoming increasingly "intelligent"—and increasingly opaque. Systems in which human oversight is diminishing, for better and for worse.
For millennia, organizations and institutions have exhibited superhuman capabilities—sometimes even superhuman intelligence. They accomplish things no individual human could achieve alone. Though we often tell history as a story of heroes and individuals, humanity’s greatest feats have been the work of institutions and societies. Even the individuals we celebrate typically succeeded because they lived in environments that provided the time, space, and resources to focus on their work.
Yet we have no reliable way of ensuring that these superhuman institutions—corporations, governments, bureaucracies—are aligned with the broader goals of humanity. We know that laissez-faire policies have allowed companies to do terrible things. We know that bureaucracies, over time, become self-serving, prioritizing their own growth over their original purpose. We know that organizations can produce outcomes directly opposed to their stated missions.
And these misalignments happen despite the fact that these organizations are made up of humans—beings with whom we are intimately familiar. If we can’t even align them, what hope do we have of aligning an alien, inhuman intelligence? Or even a superintelligence?
More troubling still: why should we accept a future in which only a handful of trillion-dollar companies—the dominant tech firms of the Western U.S.—control access to such powerful, unalignable systems? What have these corporations done to earn such an extraordinary level of trust in a technology that some fear could be catastrophic?
What am I arguing for? To stop alignment research? No, not at all. But I would love for us to shift our focus to the short- and mid-term effects of these technologies. Instead of debating whether we might have to fight Skynet, we should be considering how to prevent further concentration of wealth by 2030 and how to ensure a fairer distribution of the benefits these technologies bring to humanity. Instead of worrying about Roko’s basilisk, we should examine the impact of LLMs on employment markets—especially given the precarious state of unions and labor regulations in certain countries. Rather than fixating on hypothetical paperclip-maximizing AIs, we should focus on the real and immediate dangers of lethal autonomous weapons in warfare and terrorism.
The Editors
I finished reading "The Editors" by Stephen Harrison, and I really enjoyed it. The novel follows some crucial moments of Infopendium, a free, editable online encyclopedia with mostly anonymous contributors. The setting is a fictionalized version of Wikipedia, and the story is set around the beginning of the COVID pandemic.
The author is a journalist who has covered Wikipedia before, and has now written a novel. It's not a roman à clef - the events described here have not happened for Wikipedia, even though some of the characters feel very much inspired by real Wikipedia contributors. I constantly had people I know playing the roles of DejaNu, Prospero, DocMirza, and Telos in my inner cinema. And as the book continued I found myself apologizing in my mind to the real people, because they would never act as in the book.
There were some later scenes for which I had a lot of trouble suspending disbelief, but it's hard to say which ones without spoiling too much. Also, I'm very glad that the real world Wikipedia is far more technically robust than Infopendium seems to be.
I recommend reading it. It offers a fictional entrypoint to ideas like edit wars, systemic bias, the pushback to it, anonymous collaboration, community values, sock puppets, conflict of interest, paid editing, and more, and I found it also a good yarn, with a richly woven plot. Thanks for the book!
AI and centralization
We have a number of big American companies with a lot of influential connections which have literally spent billions of dollars on developing large models. And then another company comes in and releases a similar product available for free.
Suddenly, trillions of dollars are on the line. With their connections they can call for regulation, designed to protect their investment. They could claim that the free system is unsafe and dangerous, as Microsoft and Oracle were doing in the 90s with regards to open source. They could try to use and extend copyright once they have benefitted from the loose regulations, as Disney was doing in the 60s to 90s. They could increase the regulatory hurdles to enter the market. They could finance scientific studies, philosophers and ethicists to publish about the dangers and benefits of having this technology widely available, another playbook tobacco and oil companies have been following for decades.
It's about trillions of dollars. Some technology giants are seeing that opportunity to make easy money dissipate. They would love it if everyone had to use their models, running on their cloud infrastructure. They would love it if every little app made many calls to their services, sending a constant stream of money to them, if every piece of value created had an effective AI "tax" they would collect. In the 90s and 00s Microsoft made huge amounts of money through the OS "tax", then Apple and Google and Microsoft made huge amounts of money through the app store "tax". Amazon and Microsoft and Google and OpenAI would love to have a repeat of that business model.
I would expect a lot of soft and hard power to be pushed around in the coming months. Many old playbooks reiterated, but also new playbooks introduced. Unimaginable amounts of value and money can and will be made, but how it will be distributed is an utterly non-transparent process. I don't know what an effective way would be to avoid a highly centralized world, to ensure that the fruits of all this work are distributed just a little bit more equally, to have a world in which we all have a bit of equity in the value being created.
To state it clearly: I'm not afraid of a superintelligent AI that will turn us all into paperclips. I'm afraid of a world where a handful of people have centralized extreme amounts of power and wealth, and where most of us struggle with living a good life in dignity. I'm afraid of a world where we don't have a say anymore in what happens. I'm afraid of a world where we effectively lost democracy and individual agency.
There is enough to go around to allow everyone to live a good life. And AI has the opportunity to add even more value to the world. But this will go with huge disruptions. How we distribute the wealth, value and power in the world is going to be one of the major questions of the 21st century. Again.
Languages with the best lexicographic data coverage in Wikidata 2024
Languages with the best coverage as of the end of 2024
- English 93.1% (=, +0.2%)
- Italian 92.6% (+7, +9.7%)
- Danish 92.3% (+3, +5.4%)
- Spanish 91.8% (-2, +0.5%)
- Norwegian Bokmal 89.4% (-2, +0.3%)
- Swedish 89.3% (-2, +0.4%)
- French 87.6% (-2, +0.6%)
- Latin 85.7% (-1, -0.1%)
- Norwegian Nynorsk 81.8% (+1, +1.6%)
- Estonian 81.3% (-1, +0.1%)
- German 79.6% (=, +0.1%)
- Malay 77.8% (+2, +4.7%)
- Basque 75.9% (-1, =)
- Portuguese 74.9% (-1, +0.1%)
- Panjabi 73.3% (=, +2.3%)
- Breton 71.1% (+1, +3.8%)
- Czech 69.3% (NEW, +6.1%)
- Slovak 67.8% (-2, =)
- Igbo 67.8% (NEW, +2.0%)
What does the coverage mean? Given a text (usually Wikipedia in that language, but in some cases a corpus from the Leipzig Corpora Collection), how many of the occurrences in that text are already represented as forms in Wikidata's lexicographic data. The first number in the parentheses is the change in rank compared to last year, and the second number the change in coverage compared to last year.
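As a rough illustration of this measure, here is a minimal sketch. It assumes the corpus is already split into tokens and that a token counts as covered when it exactly matches a known form; the real pipeline's tokenization and normalization are not described here, and the function name is made up.

```python
from collections.abc import Iterable


def token_coverage(tokens: Iterable[str], known_forms: set[str]) -> float:
    """Share of token occurrences in the corpus that match a known form."""
    total = 0
    covered = 0
    for token in tokens:
        total += 1
        # Every occurrence counts, so frequent words weigh more than rare ones.
        if token in known_forms:
            covered += 1
    return covered / total if total else 0.0


corpus = "the cat sat on the mat".split()
print(token_coverage(corpus, {"the", "cat"}))  # 3 of 6 occurrences -> 0.5
```

Because occurrences are counted rather than distinct words, a few very frequent forms can cover a large share of a corpus.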
The list contains all languages where the data covers more than two thirds of the selected corpus.
English managed to keep the lead, but the distance to the second place melted from 1.6% last year to a mere 0.5% this year. Italian and Danish made huge jumps forward, Italian by increasing coverage by almost 10% and rising seven ranks to second place. Compared to last year, two new languages made it into the top list, Czech and Igbo, both cracking the ⅔ limit to join the top list – Hindi just being behind at 66.5%.
The complete data is available on Wikidata.
Progress in lexicographic data in Wikidata 2024
Here are some highlights of the progress in lexicographic data in Wikidata in 2024
- Hausa: jumped from 1.5% coverage right to 40%
- Danish: Danish also made another huge jump forward, increasing the number of forms from 170k to 570k, form coverage from 33% to 52%, and token coverage from 87% to 92%
- Italian: Italian made another huge push, increased the number of forms from 290k to 410k, and the coverage from 83% to 93%
- Spanish: Spanish also kept pushing forward, increasing the number of forms from 440k to 560k, and the coverage from 91.3% to 91.8%
- Norwegian (Nynorsk): increased the number of forms from 67k to 88k, and coverage from 80% to 82%
- Czech: increased the coverage from 63% to 69%, the number of forms from 190k to 210k
- Tamil: almost doubled the number of forms from 3800 to 6600, increasing coverage from 8% to 11%
- Breton: added 1000 new forms, increasing the coverage from 67% to 71%
- Croatian: increased from 4k to 5.5k forms, improving coverage from 45% to 48%
What does the coverage mean? Given a text (usually Wikipedia in that language, but in some cases a corpus from the Leipzig Corpora Collection), how many of the occurrences in that text are already represented as forms in Wikidata's lexicographic data. Note that every percent more gets much more difficult than the previous one: an increase from 1% to 2% usually needs much much less work than from 91% to 92%.
See also last year's progress.
Wikidata lexicographic data coverage for Croatian in 2024
For 2024 I had picked an ambitious goal for growing the lexicographic data for Croatian. And, just like the year before, I missed it.
My goal was to grow the coverage to 50% - i.e. half of all the words in a Croatian corpus would be found in Wikidata. Instead, we grew from 45.5% to 47.9%. The number of forms grew from 4115 to 5506, more than a thousand new forms, a far bigger growth in forms than last year. So, even though the goal was missed, the speed of growth in Croatian is accelerating.
Part of that growth in forms is due to Google's Wordgraph release, a free dataset with words in about 40 languages which describe people - both demonyms and professions.
Do I want to set a goal again? After missing it twice, I am hesitant. Would I reduce the goal even further? But less than 50% sounds defeatist, and going back to 60% is obviously too much. So, yes, let's go for 50% again. Let's see where it will take us this time. It's only 2.1 percentage points of coverage away from 50%, so that should be doable.
Large Language Models, Knowledge Graphs and Search Engines
How can Large Language Models (LLMs), Knowledge Graphs and Search Engines be combined to best serve users? What are the strengths and limitations of these technologies?
Aidan Hogan (Universidad de Chile, previously DERI, Linked data), Luna Dong (Meta, previously Amazon and Google), Gerhard Weikum (MPI, Yago), and myself (Wikimedia, previously Google) have been invited to give keynotes on this topic in the last year or two, on different conferences. Now we wrote a paper together to synthesise and capture some of the ideas we were presenting.
- Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions, arxiv.org/abs/2501.06699
Translating Nazor: The Man Who Lost a Button
The most famous child of the island of Brač is very likely Vladimir Nazor. His books are part of the canon for Croatian children, and, as fate has laid it out, he also happened to become the first head of state of Croatia during and after World War II.
In 1924, exactly a hundred years ago, he published "Stories from childhood", a collection of short stories. I took one of his stories from that collection and translated it into English, to make some of his work more accessible to more readers, and to see how I would do with such a translation.
I welcome you to read "The Man Who Lost a Button". Feedback, comments, and reviews are very welcome. I am also planning to make a translation into German, but I don't know how long that will take.
2024 US election
Some thoughts on the US election.
Wrong theory: 2024 was lost because Harris voters stayed home
I first believed that Harris lost just because people were staying home compared to 2020. But that, by itself, is an insufficient explanation.
At first glance, this seems to hold water: currently, we have 71 million votes reported for Harris, and 75 million votes reported for Trump, whereas last time Biden got 81 million votes and Trump 74 million votes. 10 million votes fewer is enough to lose an election, right?
There are two things that make this analysis insufficient: first, California is really slow at counting, and it is likely that both candidates will have a few million votes more when all is counted. Harris already has more votes than any candidate ever had, besides Biden and Trump.
Trump already has more votes than he got in the previous two elections. In 2020, more people voted for Trump than in 2016. In 2024, more people voted for Trump than in 2020.
Second, let’s look at the states that switched from Biden to Trump:
- Wisconsin and Georgia: both Trump and Harris got more votes than Trump or Biden respectively in 2020
- Pennsylvania, Nevada and Michigan: Trump already has more voters in 2024 than Biden had in 2020. Even if Harris had the same number of voters as Biden had in 2020, she would have lost these states.
- Arizona still hasn’t counted a sixth of their votes, and it is unclear where the numbers will end up. If we just extrapolate linearly, Arizona will comfortably be in one of the two buckets above.
Result: There is no state where Biden’s 2020 turnout would have made a difference for Harris. (With the possible but unlikely exception of Arizona, where the counting is still lagging behind)
Yes, 10 million votes fewer for Harris than for Biden looks terrible and like sufficient explanation, but 1) this is not the final result and it will become much tighter, and 2) it wouldn’t have made a difference.
California is slow at counting
I was really confused: why had California only reported two thirds of its votes so far? I found the article below, explaining some of it, but it really seems to be a homemade mess for California, and one that the state should clean up.
https://www.berkeleyside.org/2024/11/08/alameda-county-election-results-slow-registrar
Voting results in PDF instead of JSON
Voting results in Alameda County will be released as PDF instead of JSON. The Registrar for Votes “recently told the Board of Supervisors that he’s following guidance from the California Secretary of State, which is recommending the PDF format to better safeguard the privacy of voters.”
This statement is wrong. PDF does not safeguard the privacy of voters any better than JSON does. This statement is not just wrong, it doesn’t even make sense.
In 2022, thanks to the availability of the JSON files, a third-party audit found an error in one Alameda election, resulting in the wrong person being certified. “Election advocates say the PDF format is almost impossible to analyze, which means outside organizations won’t be able to double-check [...] [I]f the registrar had released the cast vote record in PDF format in 2022, the wrong person would still be sitting in an OUSD board seat.”
The county registrar is just following the California Secretary of State. According to a letter by the registrar: “If a Registrar intends to produce the CVR [Cast Vote Record], it must be in a secure and locked PDF format. The Secretary of State views this as a directive that must be followed according to state law. I noted that this format does not allow for easy data analysis. The Secretary of State’s Office explained that they were aware of the limitations when they issued this directive. [...] San Francisco has historically produced its CVR in JSON format, contrary to the Secretary of State's directive. The Secretary of State’s office has informed me that they are in discussions with San Francisco to bring them into compliance”.
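To illustrate why a machine-readable cast vote record matters for outside audits, here is a minimal sketch in Python. The record layout (a JSON list with the fields "ballot_id" and "choices") is entirely hypothetical and is not the format Alameda County or San Francisco publish. The point is only that a recount from JSON takes a few lines, while a locked PDF first has to be converted back into data.

```python
import json
from collections import Counter


def tally_first_choices(cvr_json):
    """Count the first choice on every ballot that has at least one choice."""
    records = json.loads(cvr_json)
    return Counter(
        record["choices"][0] for record in records if record["choices"]
    )


# Hypothetical sample data, invented for illustration only.
sample_cvr = """
[
  {"ballot_id": 1, "choices": ["Candidate A", "Candidate B"]},
  {"ballot_id": 2, "choices": ["Candidate B"]},
  {"ballot_id": 3, "choices": []},
  {"ballot_id": 4, "choices": ["Candidate A"]}
]
"""
sample_tally = tally_first_choices(sample_cvr)
```

An independent group could compare such a tally with the certified result. A real ranked-choice audit would need the full elimination rounds, which this sketch does not attempt.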
Sources:
- https://www.berkeleyside.org/2024/11/08/alameda-county-election-results-slow-registrar
- https://oaklandside.org/wp-content/uploads/2024/11/FW_-Update-on-Cast-Vote-Record-CVR-Production-1.pdf
- https://oaklandside.org/2022/12/28/alameda-county-registrar-miscounted-ballots-oakland-election-2022/
It was not a decisive win
There are many analyses about why Harris lost the election, and many are going far overboard, and often for political reasons, with the aim to influence the platform of the Democratic party for the next election. This wasn’t a decisive win.
I wanted to make the argument that 30k voters in Wisconsin, 80k voters in Michigan, and 140k voters in Pennsylvania would have made the difference. And that’s true. I wanted to compare that with other US elections, and show that this is tighter than usual.
But it’s not. US elections are just often very tight. There are exceptions, the first Obama election was such an exception. But in general, American elections are tight (I’ll define a tight election as “if I can find that by flipping less than 0.5% of the voters, a different president would have been elected”).
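The definition of a tight election can be written down as a small check. This is only a sketch: the state figures are the ones mentioned above, and the total of 146 million is my assumption, simply adding the 71 and 75 million votes reported so far for the two main candidates (it ignores other candidates and uncounted ballots).

```python
def share_of_voters(votes_by_state, total_votes):
    """Share of all voters represented by the given per-state vote counts."""
    return sum(votes_by_state.values()) / total_votes


def is_tight(votes_by_state, total_votes, threshold=0.005):
    """True if the outcome hinges on fewer than `threshold` of all voters."""
    return share_of_voters(votes_by_state, total_votes) < threshold


# Figures from the text above; the total is an assumption (see lead-in).
decisive_2024 = {
    "Wisconsin": 30_000,
    "Michigan": 80_000,
    "Pennsylvania": 140_000,
}
total_reported = 71_000_000 + 75_000_000

share_2024 = share_of_voters(decisive_2024, total_reported)
tight_2024 = is_tight(decisive_2024, total_reported)
```

With these numbers the share comes out at roughly 0.17% of the reported votes, well below the 0.5% threshold.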
I don’t know how advisable it is to make big decisions on a basically random outcome.
How to pronounce MySQL
Today I learned (or re-learned), that the "My" in MySQL does not stand for the English word my but for the Swedish name My, which is the name of the daughter of MySQL co-founder Michael Widenius. ♡
The name My was introduced by Tove Jansson for the Moomins character Little My.
According to the Words & Stuff blog: "Turns out that she was named after the Greek letter mu; for the Finnish pronunciation of Myy, see the video. In English, it turns out that her name is pronounced like the English word my (/maɪ/), rhyming with the English word hi."
So you could pronounce My as [ˈmyː] or as [maɪ].
This is in addition to the well-known discussion about how to pronounce SQL, which I will not further dive into here.
By the way, the MySQL documentation defines the official pronunciation: "The official way to pronounce “MySQL” is “My Ess Que Ell” (not “my sequel”), but we do not mind if you pronounce it as “my sequel” or in some other localized way.", but it seems that when speaking Swedish the MySQL developers also say "mü-ess-ku-ell" (source).
A passport odyssey
A story of hope, decades long lost friends, and love beyond borders. A story of going to a new world, a story of challenges. But above all, a story of bureaucracy.
Almost three years ago, my wife and I were blessed with our little sunshine. She was born in the City by the Bay, San Francisco, just a few months after we moved there from Berlin, Germany. A few weeks after her birth, we decided to start the process that would get her the papers confirming she’s a European citizen — I am Croatian, and thus by Croatian law, she is Croatian too. All we needed was to get the paperwork done so that she actually holds the Croatian passport in her little hands. How hard could that be?
The closest Croatian consulate is in Los Angeles, but they offer a great service: more or less regularly they come to different cities in their area of responsibility, and offer consular services there. I called the consulate in Los Angeles, and figured out what papers we needed, and when they would be close to San Francisco the next time. It was a few weeks later that we drove to San Jose to submit all the necessary paperwork.
Waiting at the consulate, I noticed a man who looked like he was from Brač, the same island I am from. Now note that Croatia has more than 4 million people, and Brač only has 14,434 of those, so the probability of him being from Brač was less than one percent, if he was from Croatia at all. I told my wife that I think he’s from Brač.
“What? How would you know?”
“He looks like it.”
“What do you mean, he looks like it?”
“I don’t know. He does.”
“That’s nonsense.”
“I’m gonna ask him.”
As said, Brač is an island, so it might be that this little bit of isolation has led to people looking a certain way. Or it might just be that this specific nose looked too much like my cousin’s nose. Who knows. I went over and asked him.
He was.
So we started talking about people that we both know (turns out, there were a few). After a minute or two, a lady overheard us talking and also chimed in. She also knew a few of those people. She also happened to be from Brač. We figured that we had quite a few common acquaintances, until I suddenly mentioned my parents’ names.
The lady looked at me in shock. She asked, to be sure she didn’t mishear. I confirmed. She asked again. I confirmed. She started crying.
Which was a bit awkward.
It turns out that my mother and she were classmates. Like half a century ago, half a world away, they went to the same school every morning. She had emigrated to California many years ago, and she had visited my mother in Supetar on Brač when I was the age of my daughter. She had played with me more than thirty years ago. On the spot, I gave a call to my mother and let the two of them talk. What a surprise!
But back to the paperwork. There was a small extra step required, it turned out. My wife and I had, in fact, not yet registered our marriage in Croatia. And in order to register my daughter’s birth correctly it would be necessary to first register our marriage.
A year earlier, we already had tried that once, but it failed because of a tiny problem.
We got married two years before in Berlin, as we lived there. And as we were planning to travel to Croatia rather soon, we thought we would register our marriage in Croatia instead of through the consulate in Berlin. Should be much simpler.
So on a very hot summer day four years ago we went to the administration in Supetar on Brač in order to register our marriage. We had all the necessary papers with us, but, as said, there was a tiny problem: what is my name?
It turns out that my Croatian documents had a dash between my first and second name, effectively turning it into a single double name. My German papers though, throughout, lack this little dash. And so did our German marriage certificate. No dash. So what was my name? I had my mom there. I asked her. She didn’t know. It was a chaotic birth because I decided to come early. It was a bit of a jumble. She didn’t remember my name. Thanks, mom.
What had happened?
When I was born in Germany (and I am sorry for the flashback within the flashback) the consulate there sent a message to the administration in Supetar in what was back then Yugoslavia. Given that this was in the dark ages before the internet, the message was a so-called fax. A fax is a scanner that takes the scanned data and sends it over an active phone connection to another fax, where the scan is printed. Faxes back then usually used about 300 to 1200 bytes per second, and on long distance calls, especially to the islands (where telephone lines were a very rare commodity), such faxes became quite expensive. Because of that, faxes heavily compressed the scanned data. Also scanners and printers back then, especially in fax machines, were not particularly great. The result was that faxes often looked like cheap copies that have travelled around half the world, which was in fact the case.
So when the consulate sent a fax to the administration in Supetar, the fax that was received had a little splotch between my first and middle name. When they read it, they read that splotch as a dash, connecting my names. And that is how I was registered in Yugoslavia, and this is how Croatia registered me from the Yugoslav records. In fact, on that hot summer day in Supetar we actually saw the fax from back then (they still had it in their archive, and it really is easily mistaken for a dash), and that is how my name in Croatia and in Germany started diverging.
The administration recognized the error, and offered to immediately fix it. They would correct my papers, issue a new passport, and register the marriage. My name would be cleared.
Alas — we were just a few weeks from emigrating to the United States. Just the week before traveling to Croatia the United States consulate in Berlin had glued our visas into our passports. Changing the passport now would come at the most inconvenient time: even just getting an appointment with the US consulate in time would have been nearly impossible. And so we decided not to fix it at the time.
Fast forward. In order to get the Croatian passport for my newborn I first had to get her nationality confirmed. In order to confirm her nationality I first needed to get her birth registered. In order to get her birth registered I first had to get my marriage registered. In order to get my marriage registered I first had to get my name fixed.
Then the following steps took months of me communicating with the consulate, the consulate communicating with the administration in Croatia, and all back. In the end I got new papers that my name, indeed, had no dash. With that we went and registered our marriage. And with that we registered the baby’s birth. With that we established that she is indeed Croatian. And with that we could ask for a passport to be issued. More than 18 months of back and forth have passed until we reached that point.
A few weeks later, I asked for an update. Another few weeks later again. I didn’t receive any answer. So I called the consulate, to learn that the consul I was working with was not working there anymore. My emails were going nowhere.
I explained my situation. It took a while. I sent the documentation. I expected that all of this might restart from square one, but actually it did not. Within a few weeks my registration was updated, the passport issued, and together with the marriage and birth certificates, and also with a proof of nationality on my new old name, all papers sent to us. Just in time for Valentine's Day, my wife and I are now also officially married in Croatia, and my daughter has all the papers that prove she is a Croatian.
Closing this chapter of bureaucracy, I want to thank all the people in the administration who were involved. Even though it took a ridiculously long time, everyone was always extremely friendly and helpful. I still find it hard to believe how a little faxing artifact almost four decades ago led to a standard process being prolonged for years, and reconnected my mother with a long-lost friend. It is amusing to see how easily reality can turn absurd.
First published on Medium on February 14, 2017.
Trademark on people names?
Seven years ago, a UK-born kid was named Loki Skywalker Mowbray. The family was planning to travel to the Dominican Republic and applied for a passport, and the UK Home Office denied the passport because Skywalker is a trademark of Disney. The same thing had happened a few weeks earlier, when a six-year-old girl named Khaleesi got her passport denied.
Loki got his passport issued, it is said. And I'm baffled that anyone in the Home Office would think that's an acceptable course of action.
The quest for the lost graveyard
About thirty to forty years ago I usually spent my summers in Croatia, on the island of Brač. Some of the time I spent in Donji Humac, the home of my mother’s family, the rest of the time in Pučišća, the home of my father’s family.
In Pučišća, I often spent time with my cousins, including my cousin Robert. Like every kid of that age, we explored the neighborhood, and there was plenty to explore. One day, instead of going our usual way, towards the sea, we went the other direction. We crossed the nearby bypass road, and then, on the other side, found a small graveyard, with a chapel in the middle which also doubled as a crypt for a local rich family.
I remember the pine trees, the shade, the spiderwebs across the trees. I did not remember the name of the family for sure, but I think it was Dominis, or Gospodetnić. I remember the small stone fence which gave the graveyard an almost square shape. I remember the dark plates with the hard to read names, almost washed out by time and the scarce rain.
The main graveyard of Pučišća is in a different place, on the far end of the town, near one of the coves on the way to the large quarry outside of town. Since then, I learned a lot more about the history of Pučišća, and it often mentioned that main graveyard. The history of that place went back to Roman times, featuring a shrine to Jupiter and a little church to Saint Stephen from the 11th century. Also my family’s grave is in that graveyard, but only starting with my grandfather. I could never find where my great grandfather, or earlier generations, were buried.
I came to believe that the other graveyard, the one Robert and I had found, was for the less wealthy people of Pučišća. I first thought it was the older one (Robert and I called it the 'old graveyard'), but this didn’t make sense since the main graveyard literally contains the oldest traces of human settlement in town. We must have been mistaken.
Over the last few years, I tried to figure out more about the graveyard, but none of the sources I read mentioned it. There was also no entry on the find-a-grave website. I used Google Maps and OpenStreetMap to find it, but failed. I used Google Street View to follow the bypass road, which has been redone since, but couldn’t find it either. I decided that at the next opportunity I would find the graveyard again, and document all graves on find-a-grave. Maybe I would even find some ancestors.
This year I finally went back to Pučišća for a few weeks. Whereas I found it too hot to do much exploration, on one of the few cooler evenings I decided to finally take the walk, and find it. It took me a while, I wasn’t sure about the way, but eventually I came upon a square enclosure of the right size with a chapel in the middle. The chapel was dedicated to the Lady of Lourdes, and looked somewhat different than I remembered it. In particular, it did not contain a crypt. And although it had pine trees and spider webs, there was not a single grave, merely a large stone cross which had toppled over. On the way back, I also could reconstruct the path that Robert and I took a few decades ago. I am very sure this is the right place, but there are no graves.
I was confused. The next day, I happened to meet Robert. I asked him whether he remembers how we went exploring that direction as kids, and he immediately knew what I was talking about. Seemed to be a core memory for both of us. And then I asked about the graves.
There are no graves, he said. There were never graves. There never was a graveyard, and we had not found one. I had confabulated that whole part. We had found the enclosure and the chapel, but the other memories were an invention of my imagination. No wonder I could never find anything about it.
I am glad I resolved that question. I am a bit surprised by how well established that wrong memory was. Unsurprisingly, I still can recall the wrong memory of the graveyard, even though now I know it is wrong, and any memory of the actual events has long faded and been replaced with my continuous retelling of a story that never happened.
Heading for Germany
We're heading to the airport, to leave the United States, after more than ten years, and settle in Germany. It was a great time. California is amazing and beautiful. We had the opportunity to meet some awesome people, and I hope to stay connected with many of them for the rest of our lives. Thanks to everyone!
Thanks particularly to my wonderful wife who organized this move, and got everything ready for it, including the stressful procurement of an international health certificate for our cat on literally the last day possible. Or getting about a hundred boxes packed to be shipped. Or figuring out how to sell a car on short notice. And many other things, while keeping my back free so I could keep working.
I'm looking forward to coming back to Germany, and I hope that my wife and daughter will find welcome and roots in the next part of our journey through life.
For more background on why we are leaving, see the previous post about moving to Germany.
P.S.: International travel with a pet is not recommended.
Github not displaying external contributions anymore
Git is a very widely used version control system. Version control systems are an absolutely crucial tool for collaborating on and developing software. Git was developed to be a decentralized system of this kind, meaning that people could more easily develop their own versions, collaborate on side ideas, and not rely on a single large central repository.
Github is a Microsoft-owned website which made it easy to start, maintain, and share Git repositories. In fact so easy that in many ways the advantages of decentralization that have been built into Git have been nullified. Convenience beats many other advantages, or "worse is better", as the often-stated adage goes.
Some organizations and projects, such as Wikimedia, decided to host their own Git instance, and not rely on Microsoft's. Due to the decentralized model of Git that's absolutely possible and encouraged. It is a bit of a hassle, but you don't rely on Microsoft for your project.
Github has become an important "hub" for developers, also because they provide profile pages for developers, showing off their contributions, achievements, etc. Hiring managers will often look at a developer's Github page to assess a candidate.
Microsoft made a change so that contributions to projects will only "count" and be reflected on the Github profile of the contributor if they are made through Github (unless they are members of the organization owning the mirrored Git). Contributions through other paths don't count for the profile. Microsoft, worth a trillion dollars, is explaining that it's too "nuanced and difficult" for them to continue to display contributions on your profile which happened outside of Github.
I mean, it is clearly the fault of the community to allow Microsoft to embrace and enclose this space. Will this change be enough to have developers leave Github? (No) How difficult will it be to get hiring managers to not just reflexively look up a Github profile? (Very) Will there be an outcry that will make Microsoft change their mind? (No) Is this just a move to ensure that they enclose the Open Source workflow even more? (They'll say no, and it might even be true, but they sure won't mind that this is happening)
The lesson we should learn, but won't, is to not allow companies to enclose and control such spaces. But we keep doing that, again and again. It's a pity.
Productivity pro tip
- make a list of all things you need to do
- keep that list roughly in order of priority, particularly on the first 3-5 items (lower on the list it doesn't matter that much)
- procrastinate the whole day from doing the number 1 item by doing the number 2 to 5 items
Facebook checking my activity
Facebook locked my account because of unusual behavior. I'm thankful they're checking. I often see obviously spammy behavior on Facebook.
Then they show me my latest posts and comments and ask me which one of these wasn't by me. And they all were by me. There was nothing in the sort of "Oh, I've now seen three of your posts, and you look like a really interesting person. Do you want to be my friend?" or trying to sell NFTs and coins or day trading.
Yeah, no, AI will still take a moment.
Experiment to understand LLMs better
Here’s an experiment I would love to do if I had the resources. Just to start gaining some more understanding of how LLMs work.
- Train an LLM Z on a lot of English text.
- Ensure that the LLM correctly uses “went”, the past tense of “go”, in its responses.
- Ask the LLM directly what the past tense of “to go” is, and expect “went”.
- Remove all sentences / texts from the corpus that contain the word “went”. Add more text to the corpus to make it roughly the same size again.
- Train an LLM A on that corpus.
- Use the same prompts to see what the LLM uses instead of “went”.
- Ask the LLM directly what the past tense of “to go” is. I expect “goed”?
- How many example sentences / texts containing the text “went” does one need to add to the corpus of LLM A and retrain in order for the resulting LLM to get it right. Is one enough? Ten? A thousand?
- Add the explicit sentence ‘The past tense of “to go” is “went”.’ to the corpus of LLM A and retrain, instead of adding the implicit training data. Does the trained LLM now get it right? Does it use it right? Does it answer the explicit question correctly?
- Add an explicit sentence to the prompt of LLM A, instead of retraining it. Does it use the word right? Does it answer the explicit question correctly?
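The corpus filtering step from the list above could look roughly like the following Python sketch. It assumes the corpus is simply a list of text strings and that "contains the word" means a case-insensitive whole-word match; both are my simplifications, and the function name is made up for illustration.

```python
import re


def remove_texts_containing(corpus, word):
    """Return only those texts that do not contain `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [text for text in corpus if not pattern.search(text)]


# Toy corpus: two texts use "went", two do not.
toy_corpus = [
    "She went home.",
    "They go to school.",
    "Wentworth is a town.",
    "WENT is shouted here.",
]
filtered_corpus = remove_texts_containing(toy_corpus, "went")
removed_count = len(toy_corpus) - len(filtered_corpus)
```

The whole-word match keeps texts such as the one about Wentworth, and the count of removed texts tells you how much new text is needed to bring the corpus back to roughly the same size.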
If there is some similar work to this out there, or if anyone has some work like this, I’d be very curious for pointers.
P.S.: Also, I would love to see whether people who do research on LLMs could correctly predict the result of this experiment ;)
Taking a self-driving car
Ten years ago, my daughter had just been born and I had just joined Google, who were working on self-driving cars. And I was always hoping that my daughter would not need to learn how to drive a car (but that if she wanted to, she could). In the last ten years I lost confidence in that hope.
Yesterday, thanks to my wife organizing it, we took our first ride with a self-driving car, driving about ten minutes through San Francisco. And I guess a world-wide roll out will take time, maybe a lot of time, but what can I say: it drove very well.
Sleeping Lady with a Black Vase
In 2009, a Hungarian art historian was watching the movie Stuart Little with his 3 year old daughter. And he's like "funny, that painting that's used in the set looks like that 1928 black and white photograph I have seen, of a piece of art which has been lost". So he sends a few emails...
Turns out, it *is* the actual artwork by Róbert Berény (1887-1953) which was last seen in public in 1928, and somehow made it to Sony, where it was used in a number of soap opera episodes and in Stuart Little.
The Ring verse in German
I finally got the Lord of the Rings in English. I never read it in its native English, only in a German translation, about thirty years ago.
And already on the first page I am stumped: the ring verse seems to me sooo much better in German than in English. Now, it is absolutely possible that this is due to me having read it as an impressionable teenager and having carried the translation with me for three decades and thus developed fondness and familiarity with it, but I think it's more than that.
Here are the verses in English, German, and a literal back-translation of the German to English:
- Three Rings for the Elven-kings under the sky,
- Seven for the Dwarf-lords in their halls of stone,
- Nine for Mortal Men doomed to die,
- One for the Dark Lord on his dark throne
- In the Land of Mordor where the Shadows lie.
- One Ring to rule them all,
- One Ring to find them,
- One Ring to bring them all,
- and in the darkness bind them
- In the Land of Mordor where the Shadows lie.
German translation by von Freymann:
- Drei Ringe den Elbenkönigen hoch im Licht,
- Sieben den Zwergenherrschern in ihren Hallen aus Stein,
- Den Sterblichen, ewig dem Tode verfallen, neun,
- Einer dem dunklen Herrn auf dunklem Thron
- Im Lande Mordor, wo die Schatten drohn.
- Einen Ring, sie zu knechten, sie all zu finden,
- ins Dunkle zu treiben und ewig zu binden
- Im Lande Mordor, wo die Schatten drohn.
Back-translation of her translation by me:
- Three Rings for the Elven kings high in the light,
- Seven for the Dwarf-lords in their halls of stone,
- For the mortals, eternally doomed to death, nine,
- One for the Dark Lord on dark throne
- In the Land of Mordor, where the Shadows loom.
- One Ring, to enslave them, to find them,
- to drive to Darkness, and forever bind them
- In the Land of Mordor, where the Shadows loom.
The differences are small, but I find the selection of words by the translator to be stronger and more evocative than Tolkien's original. Which is amazing. Thanks to the great Ebba-Margareta von Freymann for her wonderful translation of the poems!
Originally, the publisher Klett had trouble with translating Tolkien's poems, but Ebba-Margareta had, for many years, been working on the translation of poems by Tolkien, and by using her translations, Klett did a great service to the book for the German-speaking world.
The height of Anson Mount
Slop is filling up the Internet.
Today my Google Now feed even suggested (!) the following page which was focused solely on the height of Anson Mount. Now I assume Google thinks I'm interested in the actor because I've read about Star Trek.
https://berkah.blob.core.windows.net/ernews/how-tall-is-anson-mount.html
The article has a certain fascination, because it claims to be the ultimate guide to Anson Mount's height, and it goes in a lot of detail about it, for example explaining that height is often measured in feet and inches, or how having more height helps Mount find better fitting clothes.
It's also fascinating because it gives his height as 6'3 / 1.91. Google Knowledge Graph claims 6'1 / 1.85 without a source. And IMDb states 5'11½ / 1.82. The website Celebrity Heights lists 5'11¼ / 1.81. I kid you not.
That makes me wonder whether I'm yearning back to times when people were publishing stuff like this (I'm not):
Here we see reporting about a Twitter discussion between Mount and director James Gunn about actors lying about their height, and Mount seemingly being touchy about that subject.
The algorithmically pushed article also mentions Mount's place of birth in Tennessee (Wikipedia though says Illinois, but trust whom you will).
The Web has, almost from the beginning, been a place that you shouldn't trust blindly. I used to trust Google to be a first layer of defense. But the last few weeks indicate that this is no longer the case. Google will now push AI generated slop right to me, whereas it should try to keep me from even pulling it from the Web. I hope Google will figure that out.
In the last few weeks it's getting increasingly difficult to get correct information on the Web. I'm noticing it around Pokemon Go, where I look up whether a Pokemon has already been released, or how to evolve it. I get arbitrary answers, which I found plain wrong several times. Google's results are not ranked by trustworthiness, and now I have to start to remember which sites to trust, which sucks.
This is going to be exhausting.
(And if you think this is only true about pop culture stuff, then bless your heart)
Little Richard and James Brown
When Little Richard started becoming more famous, he had already signed up for a number of gigs, but much better opportunities were now coming in. He was worried about his reputation, so he did not want to cancel the previously agreed gigs, but he also did not want to miss the new opportunities. Instead he sent a different singer who was introduced as Little Richard, because most concertgoers back then did not know exactly what Little Richard looked like.
The stand-in was James Brown, who at this point was unknown, and who later had a huge career, becoming an inaugural inductee to the Rock and Roll Hall of Fame in 1986, in the same class as Little Richard.
(I am learning a lot from and am enjoying Andrew Hickey's brilliant podcast "A History of Rock and Roll in 500 Songs")
Johnny Cash and Stalin
Johnny Cash was the first American to learn about Stalin's death.
At that time, Cash was a member of the Armed Forces and stationed in Germany. According to Cash, he was the one to intercept the Morse code message about Stalin's death before it was announced.
The Heat Death of the Internet
Good observations, and closing on a hopeful note. Short and pointed read.
Beyoncé's Number One in Country
Beyoncé very explicitly announced her latest album to be a country album, calling it "Cowboy Carter", and her single "Texas Hold 'Em" made her the first Black woman to top Billboard's Hot Country Songs chart.
It is good that Beyoncé made it so glaringly obvious that her song is a country song. The number of Black artists to have topped the Hot Country Songs chart is surprisingly small: Charley Pride in the 70s, Ray Charles in a duet with Willie Nelson for one week in 1984, and then Darius Rucker and Kane Brown in the last decade or two.
One example that may help explain why it is so hard for Black artists to chart in this particular genre: "Old Town Road", the debut single by Lil Nas X, was first listed on the Hot Country Songs chart, but Billboard then decided that this was a mistake and recategorized the song, taking it off the country charts in March 2019. Had it not been removed, it would have become the Number One hit on April 6, 2019.
Billboard released a long statement explaining that this decision had nothing to do with racism.
Cowboy Carter was released in the very same week, five years after Old Town Road would have hit Number One.
I guess Beyoncé really wanted to make sure that everyone knows that her album and single are country.
War in the shadows
A few years ago I learned with shock and surprise that in the 1960s and 1970s Croatians were assassinated by the Yugoslav secret service in other countries, such as Germany, and that the German government back then chose to mostly look away. That upset me. In the last few weeks I listened to a number of podcasts that went into more detail about these events, and it turned out that some of those murdered Croatians were entangled with the WW2 fascist Croatian Ustasha regime -- either by being Ustasha themselves, or by actively working towards recreating the Ustasha regime in Croatia.
Some of the people involved were actively pursuing terrorist acts - killing diplomats and trying to kill politicians, hijacking and possibly downing airplanes, bombing cinemas, and even attempting an actual armed uprising.
There was a failed attempt to plant seventeen bombs along the Croatian Adriatic coast, on tourist beaches, during the early tourist season, and to detonate them all simultaneously, in order to starve Yugoslavia of its income from tourism.
Germany itself struggled with these events: its own secret service was tasked with protecting the German state, and it was initially unclear how to deal with organizations whose goal was to destabilize a foreign government. Laws and rules were changed in order to deal with the Croatian extremists, rules that were later applied to the PLO, IRA, Hamas, etc.
Knowing a bit more of the background, where it seems that a communist regime was assassinating fascists and terrorists, does not excuse these acts, nor the German inactivity. These were political assassinations without due process. But it makes it somewhat more understandable why the German post-Nazi administration, which at that time was busy with its own wave of terror by the Rote Armee Fraktion (RAF), was not giving more attention to these events. And Germany received some of its due when Yugoslavia captured some of the kidnappers and murderers of Hanns Martin Schleyer, and did not extradite them to Germany, but let them go, because Germany did not agree to hand over Croatian separatists in return.
Croatians had a very different reputation in the 1970s than they have today.
I still feel like I have a very incomplete picture of all of these events, but so many things happened that I had no idea about.
Source podcasts in German
- Krieg im Schatten - in German, a podcast in six episodes about one particular case among these events - a mix of true crime and history
- Episode 78 of "Neues vom Ballaballa-Balkan" - I liked this episode even better; it looks at the events in a wider context
Daniel Dennett
R.I.P. Daniel Dennett.
An influential modern voice on the question of Philosophy and AI, especially with the idea of the intentional stance.
Katherine Maher on The Truth
Wikipedia is about verifiable facts from reliable sources. For Wikipedia, arguing with "The Truth" is often not effective. Wikipedians don't argue "because it's true" but "because that's what's in this source".
It is painful and upsetting to see Katherine Maher so viciously and widely attacked on Twitter. Especially for a quote repeated out-of-context which restates one of the foundations of Wikipedia.
I have worked with Katherine. We were lucky to have her at Wikipedia, and NPR is lucky to have her now.
The quote - again, taken out of its context, which is the way Wikipedia editors collaborate - is: "Our reverence for the truth might be a distraction that's getting in the way of finding common ground and getting things done."
It is taken from this TED Talk by Katherine, which provides sufficient context for the quote.
Partial copyright for an AI generated work
Interesting development in US cases around copyright and AI: author Elisa Shupe asked for copyright registration on a book that was created with the help of generative AI. Shupe stated that refusing her registration would be disability discrimination, since she would not have been able to create her work otherwise. On appeal, her work was partially granted protection for the “selection, coordination, and arrangement of text generated by artificial intelligence”, without reference to the disability argument.
Northern Arizona
Last week we had a wonderful trip through Northern Arizona.
Itinerary: starting in Phoenix, we went northeast through Tonto National Forest towards Winslow. In Tonto, we met our first surprise, which would become a recurring pattern: whereas we expected Arizona in April to be hot, and we were prepared for hot, it had some really cold spells, and we were not prepared for cold. We started in the Sonoran Desert, surrounded by cacti and sun, but one and a half hours later in Tonto, we were driving through a veritable snowstorm. Fortunately, just as it was getting worrisome, we crossed the ridge and started descending towards Winslow to the north.
The Colorado Plateau on the other side of the ridge was then pleasant and warm, and the next days we traveled through and visited the Petrified Forest, Monument Valley, Horseshoe Bend, Antelope Canyon, and more.
After that we headed for the Grand Canyon, but temperatures dropped so low, and we were so poorly outfitted for that, that we stayed less than a day there, most of it huddled in the hotel room. Still, the views we got were just amazing, and throwing snowballs was an unexpected fun exercise.
Our last stop took us to Sedona, where we were again welcomed with amazing views. The rocks and formations all had in common that they dramatically changed with the movement of the sun, or with us moving around, and the views were always fresh.
Numbers: Our trip took us about 950 miles / 1500 kilometers of driving, and I was happy that it was a good Jeep for this trip. The difference in altitude went from 1000 feet / 330 meters in Phoenix up to 8000 feet / 2400 meters driving through Coconino. Temperatures ranged from 86° F / 30° C to 20° F / -7° C.
What I learned again is how big this country is. And how beautiful.
Surprises: One thing that surprised me was how hidden the canyons can be. Well, you can't hide the Grand Canyon, but it is easy to pass by Antelope Canyon without realizing it is there, because it is just a cut in the plateau.
I also was surprised about how flat and wide the land is. I have mostly lived in areas where you had mountains or at least hills nearby, but the Colorado Plateau has large wide swaths of flat land. "Once the land was as plane as a pancake".
I mentioned the biggest surprise already, which was how cold it got.
Towns: it was astonishing to see the difference between, on the one side, a town such as Page or Sedona and on the other side Winslow. All three have a similar population, but Page and Sedona felt vigorous, lively, clean, whereas Winslow felt as if it was on the decline, deserted, struggling.
The hotel we stayed at in Winslow, La Posada, was a beautiful, weird, unique jewel. I hesitate to flat-out recommend it, as it is too unusual for that, but I still enjoyed experiencing it. It is clearly very different from any other hotel I have ever stayed in: full of history, embracing themes of both suicide and hope, respectfully trying to grow with the native population, and aiming to revive the city's old town. It is difficult to really capture the vibe it was sending out.
For pictures, I am afraid I am pointing to my Facebook posts, which should be visible without login:
- Scottsdale
- Palo Verde tree
- Desert Botanical Garden, Phoenix
- Desert to snowstorm
- Winslow
- La Posada hotel
- Petrified Forest
- Monument Valley
- Horseshoe Bend
- Antelope Canyon
- Grand Canyon
- Sedona
Crossing eight time zone borders in three hours
Hopi Nation is an enclave within Navajo Nation. Navajo Nation is located across three US states, Arizona, New Mexico, and Utah.
Arizona does not observe daylight saving time. Navajo Nation observes daylight saving time. Hopi Nation does not observe daylight saving time. You can drive three hours in that area and cross timezones eight times.
All of the individual decisions make total sense:
Arizona does not adhere to daylight saving time because any measure that makes sure Arizona residents get more sunshine is worse than bringing coals to Newcastle, as the saying goes. They are smart to not use daylight saving time.
Navajo Nation uses daylight saving time because they want to have the same time for their whole area, and they also extend into two other states, Utah and New Mexico, which both have daylight saving time, so they decided to do so too, which makes total sense.
And Hopi Nation, even though it is enclosed by the Navajo Nation, lies entirely within the state of Arizona, so it makes sense for them to follow *that* state.
All the individual decisions make sense, but the outcome must be rather inconvenient and potentially confusing for the people living there.
(Bonus: the solution to this seems obvious to me. Utah and New Mexico and many other southern US states should just get rid of daylight saving time, just as Arizona did, and Navajo Nation should follow suit. But that's just my opinion.)
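To make the effect of these three rules concrete, here is a minimal Python sketch. It only encodes what is stated above (who observes daylight saving time) plus the fact that the whole area is on Mountain Time. The function names and the route are my own illustration; the route is hypothetical and not an actual road itinerary.

```python
# Which jurisdictions move their clocks forward in summer (from the text above).
OBSERVES_DST = {"Arizona": False, "Navajo Nation": True, "Hopi Nation": False}

MOUNTAIN_STANDARD_OFFSET = -7  # hours relative to UTC; all three are on Mountain Time


def utc_offset(jurisdiction: str, summer: bool) -> int:
    """Return the UTC offset in hours for a jurisdiction in summer or winter."""
    shift = 1 if summer and OBSERVES_DST[jurisdiction] else 0
    return MOUNTAIN_STANDARD_OFFSET + shift


def clock_changes(route: list[str], summer: bool) -> int:
    """Count how often the local time changes when driving along the route."""
    offsets = [utc_offset(place, summer) for place in route]
    return sum(1 for a, b in zip(offsets, offsets[1:]) if a != b)


# Hypothetical route for illustration only, not an actual road itinerary.
route = ["Arizona", "Navajo Nation", "Hopi Nation", "Navajo Nation", "Arizona"]

print(clock_changes(route, summer=True))   # 4
print(clock_changes(route, summer=False))  # 0
```

In winter nobody shifts their clocks, so the borders are invisible; the confusion only exists during daylight saving time, when every border between Navajo Nation and its neighbors in Arizona is also a one-hour jump.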
New home in Emeryville
Our new (temporary) home is the City of Emeryville. Emeryville has a population of almost 13,000 people. The apartment complex we live in has about 400 units, and I estimate that they have about 2 people on average in each. Assuming that about 90% of the apartments are occupied, this single apartment complex would constitute between 5 and 10% of the population of the whole city.
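The back-of-the-envelope estimate can be spelled out as a small Python sketch. The inputs are the rough figures from the text; the variable names are mine, and 13,000 is a rounded-up stand-in for "almost 13,000". It illustrates the arithmetic and is not a census.

```python
# Rough inputs taken from the text above; all of them are estimates.
city_population = 13_000   # "almost 13,000", rounded up for simplicity
units = 400                # approximate number of apartments in the complex
people_per_unit = 2        # guessed average number of people per apartment
occupancy = 0.9            # assumed share of apartments that are occupied

residents = units * people_per_unit * occupancy
share = residents / city_population

print(f"about {residents:.0f} residents, {share:.1%} of the city")
```

With these inputs the complex comes out at roughly 720 residents, a bit over 5% of the city, which is at the lower end of the range given above.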
A conspiracy to kill a browser
Great story about how YouTube helped with moving away from IE6.
- "Our most renegade web developer, an otherwise soft-spoken Croatian guy, insisted on checking in the code under his name, as a badge of personal honor, and the rest of us leveraged our OldTuber status to approve the code review."
I swear that wasn't me. Although I would have loved to do it.
(first published on Facebook March 12, 2024)
35th birthday of the Web
Celebrating the 35th birthday of the World Wide Web, a letter by its founder, Tim Berners-Lee.
Discussing some of the issues of the Web of today: too much centralization, too much exploitation, too much disinformation, all made even more dire by the development of AI.
What to do? Some of the solutions the letter mentions are Mastodon, a decentralized social network, and Solid, a Web-standards-based data governance solution, but it recognizes that more is needed, "to back the morally courageous leadership that is rising, collectivise their solutions, and to overturn the online world being dictated by profit to one that is dictated by the needs of humanity." I agree with that, but find it a bit vague.
I first was terribly annoyed that the letter was published on Medium, as this is a symptom of the centralization of the Web. I say this fully conscious that I am discussing it on Facebook. Obviously, both of these should be happening on our own domains, and they also do: I link not to Medium, but to the Web Foundation site, and I also have this posted on my own site and on my Mastodon account. So, it is there, on the real Web, not just in the walled gardens of Facebook and megasites such as Medium. But there is no indication of engagement on the Web Foundation's post, whereas the Medium article records more than 10,000 reactions, and my Facebook post will also show more reactions than my website (but the Mastodon page could be competitive with Facebook for me).
I want to believe that Solid is the next important step, but Leigh Dodds's recent post on Solid, and particularly the discussion in the post, didn't inspire hope.
Gödel on language
- "The more I think about language, the more it amazes me that people ever understand each other at all." - Kurt Gödel
Rainbows end
Rainbows end.
The book, written in 2006, was set in 2025 in San Diego. Its author, Vernor Vinge, died yesterday, March 20, 2024, in nearby La Jolla, at the age of 79. He is probably best known for popularizing the concept of the Technological Singularity, but I found many of his other ideas far more fascinating.
Rainbows end explores themes such as shared realities, digital surveillance, and the digitisation of the world, years before Marc Andreessen proclaimed that "software is eating the world", describing it much more colorfully and richly than Andreessen ever did.
His other work that I enjoyed is True Names, discussing anonymity and pseudonymity on the Web. A version of the book was published with essays by Marvin Minsky, Danny Hillis, and others who were inspired by True Names.
His science fiction was in a rare genre, which I would love to read more of: mostly non-dystopian, set in the near future, hard sci-fi, and yet imaginative, exploring novel technologies and their implications for individuals and society.
Rainbows end.
From vexing uncertainty to intellectual humility
A philosopher with schizophrenia wrote a harrowing account of how he experiences schizophrenia. And I wonder if some of the lessons are true for everyone, and what that means for society.
- "It’s definite belief, not certainty, that allows me to get along. It’s not that certainty, or something like it, never matters. If you are fixing dinner for me I’ll try to be clear about the eggplant allergy [...] But most of the time, just having a definite, if unconfirmed and possibly false, belief about the situation is fine. It allows one to get along.
- "I think of this attitude as a kind of “intellectual humility” because although I do care about truth—and as a consequence of caring about truth, I do form beliefs about what is true—I no longer agonize about whether my judgments are wrong. For me, living relatively free from debilitating anxiety is incompatible with relentless pursuit of truth. Instead, I need clear beliefs and a willingness to change them when circumstances and evidence demand, without worrying about, or getting upset about, being wrong. This attitude has made life better and has made the “near-collapses” much rarer."
(first published on Facebook March 13, 2024)
Feeding the cat
Every morning, I lovingly and carefully scoop out every single morsel of meat from the tin of wet food for our cat. And then he eats a tenth of it.