Large Language Models, Knowledge Graphs and Search Engines
How can Large Language Models (LLMs), Knowledge Graphs and Search Engines be combined to best serve users? What are the strengths and limitations of these technologies?
Aidan Hogan (Universidad de Chile, previously DERI, Linked data), Luna Dong (Meta, previously Amazon and Google), Gerhard Weikum (MPI, Yago), and I (Wikimedia, previously Google) have been invited to give keynotes on this topic in the last year or two, at different conferences. Now we have written a paper together to synthesise and capture some of the ideas we were presenting.
- Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions, arxiv.org/abs/2501.06699
Translating Nazor: The Man Who Lost a Button
The most famous child of the island of Brač is very likely Vladimir Nazor. His books are part of the canon for Croatian children, and, as fate has laid it out, he also happened to become the first head of state of Croatia during and after World War II.
In 1924, exactly a hundred years ago, he published "Stories from childhood", a collection of short stories. I took one of his stories from that collection and translated it into English, to make some of his work more accessible to more readers, and to see how I would do with such a translation.
I welcome you to read "The Man Who Lost a Button". Feedback, comments, and reviews are very welcome. I am also planning to make a translation into German, but I don't know how long that will take.
2024 US election
Some thoughts on the US election.
Wrong theory: 2024 was lost because Harris voters stayed home
I first believed that Harris lost just because people were staying home compared to 2020. But that, by itself, is an insufficient explanation.
At first glance, this seems to hold water: currently, we have 71 million votes reported for Harris, and 75 million votes reported for Trump, whereas last time Biden got 81 million votes and Trump 74 million votes. 10 million fewer votes is enough to lose an election, right?
There are two things that make this analysis insufficient: first, California is really slow at counting, and it is likely that both candidates will have a few million votes more when all is counted. Harris already has more votes than any candidate ever had, besides Biden and Trump.
Trump already has more votes than he got in the previous two elections. In 2020, more people voted for Trump than in 2016. In 2024, more people voted for Trump than in 2020.
Second, let’s look at the states that switched from Biden to Trump:
- Wisconsin and Georgia: both Trump and Harris got more votes than Trump or Biden respectively in 2020
- Pennsylvania, Nevada and Michigan: Trump already has more voters in 2024 than Biden had in 2020. Even if Harris had the same number of voters as Biden had in 2020, she would have lost these states.
- Arizona still hasn’t counted a sixth of their votes, and it is unclear where the numbers will end up. If we just extrapolate linearly, Arizona will comfortably be in one of the two buckets above.
Result: There is no state where Biden’s 2020 turnout would have made a difference for Harris. (With the possible but unlikely exception of Arizona, where the counting is still lagging behind)
Yes, 10 million votes fewer for Harris than for Biden looks terrible and like a sufficient explanation, but 1) this is not the final result and it will become much tighter, and 2) it wouldn’t have made a difference.
California is slow at counting
I was really confused: why had California only reported two thirds of its votes so far? I found the article below, explaining some of it, but it really seems to be a home-made mess for California, and one that the state should clean up.
https://www.berkeleyside.org/2024/11/08/alameda-county-election-results-slow-registrar
Voting results in PDF instead of JSON
Voting results in Alameda County will be released as PDF instead of JSON. The Registrar of Voters “recently told the Board of Supervisors that he’s following guidance from the California Secretary of State, which is recommending the PDF format to better safeguard the privacy of voters.”
This statement is wrong. PDF does not safeguard the privacy of voters any better than JSON does. This statement is not just wrong, it doesn’t even make sense.
In 2022, thanks to the availability of the JSON files, a third-party audit found an error in one Alameda election, resulting in the wrong person being certified. “Election advocates say the PDF format is almost impossible to analyze, which means outside organizations won’t be able to double-check [...] [I]f the registrar had released the cast vote record in PDF format in 2022, the wrong person would still be sitting in an OUSD board seat.”
The county registrar is just following the California Secretary of State. According to a letter by the registrar: “If a Registrar intends to produce the CVR [Cast Vote Record], it must be in a secure and locked PDF format. The Secretary of State views this as a directive that must be followed according to state law. I noted that this format does not allow for easy data analysis. The Secretary of State’s Office explained that they were aware of the limitations when they issued this directive. [...] San Francisco has historically produced its CVR in JSON format, contrary to the Secretary of State's directive. The Secretary of State’s office has informed me that they are in discussions with San Francisco to bring them into compliance”.
Sources:
- https://www.berkeleyside.org/2024/11/08/alameda-county-election-results-slow-registrar
- https://oaklandside.org/wp-content/uploads/2024/11/FW_-Update-on-Cast-Vote-Record-CVR-Production-1.pdf
- https://oaklandside.org/2022/12/28/alameda-county-registrar-miscounted-ballots-oakland-election-2022/
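To illustrate why the format matters for outside audits, here is a minimal sketch of how little code it takes to recount a contest from JSON. The record layout, the field names (`ballot_id`, `contest`, `choice`) and the candidates are made up for this example; a real cast vote record export is far more complex, and this is not the format any county actually uses.

```python
import json
from collections import Counter

# Hypothetical, heavily simplified cast vote records.
# Field names and values are invented for illustration only.
CVR_JSON = """
[
  {"ballot_id": 1, "contest": "School Board", "choice": "Candidate A"},
  {"ballot_id": 2, "contest": "School Board", "choice": "Candidate B"},
  {"ballot_id": 3, "contest": "School Board", "choice": "Candidate A"}
]
"""


def tally(cvr_text, contest):
    """Count the recorded choices for one contest."""
    records = json.loads(cvr_text)
    return Counter(r["choice"] for r in records if r["contest"] == contest)


counts = tally(CVR_JSON, "School Board")
for candidate, votes in counts.most_common():
    print(candidate, votes)
```

With a locked PDF, the same check would first require extracting the data from page layouts, which is exactly the step that makes independent analysis so hard.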
It was not a decisive win
There are many analyses about why Harris lost the election, and many are going far overboard, often for political reasons, with the aim of influencing the platform of the Democratic party for the next election. This wasn’t a decisive win.
I wanted to make the argument that 30k voters in Wisconsin, 80k voters in Michigan, and 140k voters in Pennsylvania would have made the difference. And that’s true. I wanted to compare that with other US elections, and show that this is tighter than usual.
But it’s not. US elections are just often very tight. There are exceptions, and the first Obama election was one of them. But in general, American elections are tight (I’ll define an election as tight if flipping fewer than 0.5% of the voters would have elected a different president).
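As a rough check of that definition against the numbers in this post, here is a small sketch. It assumes that the 30k, 80k and 140k figures can simply be added up as the number of voters that would have needed to flip, and it uses the 71 million and 75 million votes reported so far as the total, ignoring third-party votes and ballots not yet counted. The function name `is_tight` is my own.

```python
def is_tight(voters_to_flip, total_votes, threshold=0.005):
    """Tight election: under 0.5% of voters would have changed the outcome."""
    return voters_to_flip / total_votes < threshold


# Wisconsin + Michigan + Pennsylvania, figures from this post
decisive_votes = 30_000 + 80_000 + 140_000

# Harris + Trump as reported so far (incomplete count, an assumption)
reported_votes = 71_000_000 + 75_000_000

share = decisive_votes / reported_votes
print(f"{share:.2%}")                            # 0.17%
print(is_tight(decisive_votes, reported_votes))  # True
```

The share will shrink a little further as the remaining votes are counted, since the denominator grows.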
I don’t know how advisable it is to make big decisions on a basically random outcome.
How to pronounce MySQL
Today I learned (or re-learned) that the "My" in MySQL does not stand for the English word my but for the Swedish name My, which is the name of the daughter of MySQL co-founder Michael Widenius. ♡
The name My was introduced by Tove Jansson for the Moomins character Little My.
According to the Words & Stuff blog: "Turns out that she was named after the Greek letter mu; for the Finnish pronunciation of Myy, see the video. In English, it turns out that her name is pronounced like the English word my (/maɪ/), rhyming with the English word hi."
So you could pronounce My as [ˈmyː] or as [maɪ].
This is in addition to the well-known discussion about how to pronounce SQL, which I will not further dive into here.
By the way, the MySQL documentation defines the official pronunciation: "The official way to pronounce “MySQL” is “My Ess Que Ell” (not “my sequel”), but we do not mind if you pronounce it as “my sequel” or in some other localized way." It seems, though, that when speaking Swedish the MySQL developers also say "mü-ess-ku-ell" (source).
A passport odyssey
A story of hope, decades long lost friends, and love beyond borders. A story of going to a new world, a story of challenges. But above all, a story of bureaucracy.
Almost three years ago, my wife and I were blessed with our little sunshine. She was born in the City by the Bay, San Francisco, just a few months after we moved there from Berlin, Germany. A few weeks after her birth, we decided to start the process that would get her the papers confirming she’s a European citizen — I am Croatian, and thus by Croatian law, she is Croatian too. All we needed was to get the paperwork done so that she actually holds the Croatian passport in her little hands. How hard could that be?
The closest Croatian consulate is in Los Angeles, but they offer a great service: more or less regularly they come to different cities in their area of responsibility, and offer consular services there. I called the consulate in Los Angeles, and figured out what papers we needed, and when they would be close to San Francisco the next time. It was a few weeks later that we drove to San Jose to submit all necessary paperwork.
Waiting at the consulate, I noticed a man who looked like he was from Brač, the same island I am from. Now note that Croatia has more than 4 million people, and Brač only has 14,434 of those, so the sheer probability of him being from Brač was less than one percent, if he was from Croatia at all. I told my wife that I thought he was from Brač.
“What? How would you know?”
“He looks like it.”
“What do you mean, he looks like it?”
“I don’t know. He does.”
“That’s nonsense.”
“I’m gonna ask him.”
As I said, Brač is an island, so it may be that this little bit of isolation has led to people looking a certain way. Or it might just be that this specific nose just looked too much like my cousin’s nose. Who knows. I went over, and asked him.
He was.
So we started talking about people that we both know (turns out, there were a few). After a minute or two, a lady overheard us talking and also chimed in. She also knew a few of those people. She also happened to be from Brač. We figured that we had quite a few common acquaintances, until I suddenly mentioned my parents’ names.
The lady looked at me in shock. She asked, to be sure she didn’t mishear. I confirmed. She asked again. I confirmed. She started crying.
Which was a bit awkward.
It turned out that my mother and she had been classmates. Like half a century ago, half a world away, they went to the same school every morning. She had emigrated to California many years ago, and she had visited my mother in Supetar on Brač when I was the age of my daughter. She had played with me more than thirty years ago. On the spot, I called my mother and let the two of them talk. What a surprise!
But back to the paperwork. There was a small extra step required, it turned out. My wife and I had, in fact, not yet registered our marriage in Croatia. And in order to register my daughter’s birth correctly it would be necessary to first register our marriage.
A year earlier, we had already tried that once, but it failed because of a tiny problem.
We got married two years before in Berlin, as we lived there. And as we were planning to travel to Croatia rather soon, we thought we would register our marriage in Croatia instead of through the consulate in Berlin. Should be much simpler.
So on a very hot summer day four years ago we went to the administration in Supetar on Brač in order to register our marriage. We had all necessary papers with us, but, as I said, there was a tiny problem: what is my name?
It turns out that my Croatian documents had a dash between my first and second name, effectively turning it into a single double name. My German papers though, throughout, lack this little dash. And so did our German marriage certificate. No dash. So what was my name? I had my mom there. I asked her. She didn’t know. It was a chaotic birth because I decided to come early. It was a bit of a jumble. She didn’t remember my name. Thanks, mom.
What had happened?
When I was born in Germany (and I am sorry for the flashback within the flashback), the consulate there sent a message to the administration in Supetar in what was back then Yugoslavia. Given that this was in the dark ages before the internet, the message was a so-called fax. A fax is a scanner that takes the scanned data and sends it over an active phone connection to another fax, where the scan is printed. Faxes back then usually transmitted about 300 to 1200 bytes per second, and on long distance calls, especially to the islands, where telephone lines were a very rare commodity, such faxes became quite expensive. Because of that, faxes heavily compressed the scanned data. Also scanners and printers back then, especially in fax machines, were not particularly great. The result was that faxes often looked like cheap copies that had travelled around half the world, which was in fact the case.
So when the consulate sent a fax to the administration in Supetar, the fax that was received had a little splotch between my first and middle name. When they read it, they read that splotch as a dash, connecting my names. And that is how I was registered in Yugoslavia, and this is how Croatia registered me from the Yugoslav records. In fact, on that hot summer day in Supetar we actually saw the fax from back then. They still had it in their archive, and the splotch really is easily mistaken for a dash. And that is how my name in Croatia and in Germany started diverging.
The administration recognized the error, and offered to immediately fix it. They would correct my papers, issue a new passport, and register the marriage. My name would be cleared.
Alas — we were just a few weeks from emigrating to the United States. Just the week before traveling to Croatia the United States consulate in Berlin had glued our visas into our passports. Changing the passport now would come at the most inconvenient time: even just getting an appointment with the US consulate in time would have been nearly impossible. And so we decided not to fix it at the time.
Fast forward. In order to get the Croatian passport for my newborn I first had to get her nationality confirmed. In order to confirm her nationality I first needed to get her birth registered. In order to get her birth registered I first had to get my marriage registered. In order to get my marriage registered I first had to get my name fixed.
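For the programmers among the readers: this is a plain dependency chain, and the order in which things had to happen falls out of a topological sort. A small sketch, with step names that are just my own labels for the steps above (uses `graphlib` from the Python standard library, available since Python 3.9):

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must be completed before it.
# The labels are informal names for the steps described in the text.
prerequisites = {
    "passport": {"nationality"},
    "nationality": {"birth registration"},
    "birth registration": {"marriage registration"},
    "marriage registration": {"name correction"},
}

# static_order() yields every step only after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
for step in order:
    print(step)
```

Because every step has exactly one prerequisite, there is only one valid order, and it starts with fixing my name.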
The following steps then took months of me communicating with the consulate, the consulate communicating with the administration in Croatia, and all the way back. In the end I got new papers confirming that my name, indeed, had no dash. With that we went and registered our marriage. And with that we registered the baby’s birth. With that we established that she is indeed Croatian. And with that we could ask for a passport to be issued. More than 18 months of back and forth had passed before we reached that point.
A few weeks later, I asked for an update. Another few weeks later, I asked again. I didn’t receive any answer. So I called the consulate, to learn that the consul I was working with was not working there anymore. My emails were going nowhere.
I explained my situation. It took a while. I sent the documentation. I expected that all of this might restart from square one, but actually it did not. Within a few weeks my registration was updated, the passport issued, and together with the marriage and birth certificates, and also with a proof of nationality in my new old name, all papers were sent to us. Just in time for Valentine’s Day, my wife and I are now also officially married in Croatia, and my daughter has all the papers that prove she is a Croatian.
Closing this chapter of bureaucracy, I want to thank all the people in the administration who were involved. Even though it took a ridiculously long time, everyone was always extremely friendly and helpful. I still find it hard to believe how a little faxing artifact almost four decades ago led to a standard process taking years, and reconnected my mother with a long-lost friend. It is amusing to see how easily reality can turn absurd.
First published on Medium on February 14, 2017.
Trademark on people names?
Seven years ago, a UK-born kid was named Loki Skywalker Mowbray. The family was planning to travel to the Dominican Republic and applied for a passport, and the UK Home Office denied the passport because Skywalker is a trademark of Disney. The same thing happened a few weeks earlier, when a six-year-old girl named Khaleesi got her passport denied.
Loki got his passport issued, it is said. And I'm baffled that anyone in the Home Office would think that's an acceptable course of action.
The quest for the lost graveyard
About thirty to forty years ago I usually spent my summers in Croatia, on the island of Brač. Some of the time I spent in Donji Humac, the home of my mother’s family, the rest of the time in Pučišća, the home of my father’s family.
In Pučišća, I often spent time with my cousins, including my cousin Robert. Like every kid of that age, we explored the neighborhood, and there was plenty to explore. One day, instead of going our usual way, towards the sea, we went the other direction. We crossed the nearby bypass road, and then, on the other side, found a small graveyard, with a chapel in the middle which also doubled as a crypt for a local rich family.
I remember the pine trees, the shade, the spiderwebs across the trees. I did not remember the name of the family for sure, but I think it was Dominis, or Gospodnetić. I remember the small stone fence which gave the graveyard an almost square shape. I remember the dark plates with the hard-to-read names, almost washed out by time and the scarce rain.
The main graveyard of Pučišća is in a different place, on the far end of the town, near one of the coves on the way to the large quarry outside of town. Since then, I learned a lot more about the history of Pučišća, and it often mentioned that main graveyard. The history of that place went back to Roman times, featuring a shrine to Jupiter and a little church to Saint Stephen from the 11th century. Also my family’s grave is in that graveyard, but only starting with my grandfather. I could never find where my great grandfather, or earlier generations, were buried.
I came to believe that the other graveyard, the one Robert and I had found, was for the less wealthy people of Pučišća. I first thought it was the older one, and Robert and I called it the 'old graveyard', but this didn’t make sense since the main graveyard literally contains the oldest traces of human settlement in town. We must have been mistaken.
Over the last few years, I tried to figure out more about the graveyard, but none of the sources I read mentioned it. There was also no entry on the find-a-grave website. I used Google Maps and OpenStreetMap to find it, but failed. I used Google Street View to follow the bypass road, which has been redone since, but couldn’t find it either. I decided that at the next opportunity I would find the graveyard again, and document all graves on find-a-grave. Maybe I would even find some ancestors.
This year I finally went back to Pučišća for a few weeks. Although I found it too hot to do much exploration, on one of the few cooler evenings I decided to finally take the walk, and find it. It took me a while, I wasn’t sure about the way, but eventually I came upon a square enclosure of the right size with a chapel in the middle. The chapel was dedicated to the Lady of Lourdes, and looked somewhat different than I remembered it. In particular, it did not contain a crypt. And although it had pine trees and spider webs, there was not a single grave, merely a large stone cross which had toppled over. On the way back, I could also reconstruct the path that Robert and I took a few decades ago. I am very sure this is the right place, but there are no graves.
I was confused. The next day, I happened to meet Robert. I asked him whether he remembered how we went exploring in that direction as kids, and he immediately knew what I was talking about. It seemed to be a core memory for both of us. And then I asked about the graves.
There are no graves, he said. There were never graves. There never was a graveyard, and we had not found one. I had confabulated that whole part. We had found the enclosure and the chapel, but the other memories were an invention of my imagination. No wonder I could never find anything about it.
I am glad I resolved that question. I am a bit surprised by how well established that wrong memory was. Unsurprisingly, I still can recall the wrong memory of the graveyard, even though now I know it is wrong, and any memory of the actual events has long faded and been replaced with my continuous retelling of a story that never happened.
Heading for Germany
We're heading to the airport, to leave the United States, after more than ten years, and settle in Germany. It was a great time. California is amazing and beautiful. We had the opportunity to meet some awesome people, and I hope to stay connected with many of them for the rest of our lives. Thanks to everyone!
Thanks particularly to my wonderful wife who organized this move, and got everything ready for it, including the stressful procurement of an international health certificate for our cat on literally the last day possible. Or getting about a hundred boxes packed to be shipped. Or figuring out how to sell a car on short notice. And many other things, while keeping my back free so I could keep working.
I'm looking forward to coming back to Germany, and I hope that my wife and daughter will find welcome and roots in the next part of our journey through life.
For more background on why we are leaving, see the previous post about moving to Germany.
P.S.: International travel with a pet is not recommended.
GitHub not displaying external contributions anymore
Git is a very widely used version control system. Version control systems are an absolutely crucial tool for collaborating and developing software. Git was developed to be a decentralized system of this kind, meaning that people could more easily develop their own versions, collaborate on side ideas, and not rely on a single large central repository.
GitHub is a Microsoft-owned website which made it easy to start, maintain, and share Git repositories. In fact so easy that in many ways the advantages of decentralization that have been built into Git have been nullified. Convenience beats many other advantages, or "worse is better", an often stated adage.
Some organizations and projects, such as Wikimedia, decided to host their own Git instance, and not rely on Microsoft's. Due to the decentralized model of Git that's absolutely possible and encouraged. It is a bit of a hassle, but you don't rely on Microsoft for your project.
GitHub has become an important "hub" for developers, also because they provide profile pages for developers, showing off their contributions, achievements, etc. Hiring managers will often look at a developer's GitHub page to assess a candidate.
Microsoft made a change so that contributions to projects will only "count" and be reflected on the GitHub profile of the contributor if they are made through GitHub (unless the contributor is a member of the organization owning the mirrored Git repository). Contributions through other paths don't count for the profile. Microsoft, worth a trillion dollars, is explaining that it's too "nuanced and difficult" for them to continue to display contributions on your profile which happened outside of GitHub.
I mean, it is clearly the fault of the community to allow Microsoft to embrace and enclose this space. Will this change be enough to have developers leave GitHub? (No) How difficult will it be to get hiring managers to not just reflexively look up a GitHub profile? (Very) Will there be an outcry that will make Microsoft change their mind? (No) Is this just a move to ensure that they enclose the Open Source workflow even more? (They'll say no, and it might even be true, but they sure won't mind that this is happening)
The lesson we should learn, but won't, is to not allow companies to enclose and control such spaces. But we keep doing that, again and again. It's a pity.
Productivity pro tip
- make a list of all things you need to do
- keep that list roughly in order of priority, particularly on the first 3-5 items (lower on the list it doesn't matter that much)
- procrastinate the whole day from doing the number 1 item by doing the number 2 to 5 items