

30 years of wikis

Today is the 30th anniversary of the launch of the first wiki by Ward Cunningham. A page that anyone could edit. Right from the browser. It was generally seen as a bad idea. What if people did bad things?

Originally created to support the software development community in building a repository of software design patterns, wikis were later used for many other purposes (even an encyclopedia!) and, together with blogs, forums, and early social media, became part of the recipe that was considered Web 2.0.

Thank you, Ward, and congratulations on the first 30 years.

A wiki birthday card is being collected on Wikiindex.

Simia

My thoughts on Alignment research

Alignment research seeks to ensure that hypothetical future superintelligent AIs will be beneficial to humanity—that they are "aligned" with "our goals," that they won’t turn into Skynet or universal paperclip factories.

But these AI systems will be embedded in larger processes and organizations. And the problem is: we haven’t even figured out how to align those systems with human values.

Throughout history, companies and institutions have committed atrocious deeds—killing, poisoning, discriminating—sometimes intentionally, sometimes despite the best intentions of the individuals within them. These organizations were composed entirely of humans. There was no lack of human intelligence that could have recognized and tempered their misalignment.

Sometimes, misalignment was prevented. When it was, we might have called the people responsible heroes—or insubordinate. We might have awarded them medals, or they might have lost their lives.

Haven’t we all witnessed situations where a human, using a computer or acting within an organization, seemed unable to do the obvious right thing?

Yesterday, my flight to Philadelphia was delayed by a day. So I called the hotel I had booked to let them know I’d be arriving later.

The drama and the pain the front desk clerk went through!

“If you don’t show up today,” he told me, “your whole reservation will be canceled by the system. And we’re fully booked.”

“That’s why I’m calling. I am coming—just a day later. I’m not asking for a refund.”

“No, look, the system won’t let me just cancel one night. And I can’t create a new reservation. And if you don’t check in today, your booking will be canceled…”

And that was a minor issue. The clerk wanted to help. It is a classic case of Little Britain's "Computer says no" sketch. And yet, more and more decisions are being made algorithmically—decisions far more consequential than whether I'll have a hotel room for the night. Decisions about mortgages and university admissions. Decisions about medical procedures. Decisions about clemency and prison terms. All handled by systems that are becoming increasingly "intelligent"—and increasingly opaque. Systems in which human oversight is diminishing, for better and for worse.

For millennia, organizations and institutions have exhibited superhuman capabilities—sometimes even superhuman intelligence. They accomplish things no individual human could achieve alone. Though we often tell history as a story of heroes and individuals, humanity’s greatest feats have been the work of institutions and societies. Even the individuals we celebrate typically succeeded because they lived in environments that provided the time, space, and resources to focus on their work.

Yet we have no reliable way of ensuring that these superhuman institutions—corporations, governments, bureaucracies—are aligned with the broader goals of humanity. We know that laissez-faire policies have allowed companies to do terrible things. We know that bureaucracies, over time, become self-serving, prioritizing their own growth over their original purpose. We know that organizations can produce outcomes directly opposed to their stated missions.

And these misalignments happen despite the fact that these organizations are made up of humans—beings with whom we are intimately familiar. If we can’t even align them, what hope do we have of aligning an alien, inhuman intelligence? Or even a superintelligence?

More troubling still: why should we accept a future in which only a handful of trillion-dollar companies—the dominant tech firms of the Western U.S.—control access to such powerful, unalignable systems? What have these corporations done to earn such an extraordinary level of trust in a technology that some fear could be catastrophic?

What am I arguing for? To stop alignment research? No, not at all. But I would love for us to shift our focus to the short- and mid-term effects of these technologies. Instead of debating whether we might have to fight Skynet, we should be considering how to prevent further concentration of wealth by 2030 and how to ensure a fairer distribution of the benefits these technologies bring to humanity. Instead of worrying about Roko’s basilisk, we should examine the impact of LLMs on employment markets—especially given the precarious state of unions and labor regulations in certain countries. Rather than fixating on hypothetical paperclip-maximizing AIs, we should focus on the real and immediate dangers of lethal autonomous weapons in warfare and terrorism.

Simia

The Editors

I finished reading "The Editors" by Stephen Harrison, and I really enjoyed it. The novel follows some crucial moments of Infopendium, a free, editable online encyclopedia with mostly anonymous contributors. The setting is a fictionalized version of Wikipedia, and set around the beginning of the COVID pandemic.

The author is a journalist who has covered Wikipedia before and has now written a novel. It's not a roman à clef - the events described here have not happened to Wikipedia, even though some of the characters feel very much inspired by real Wikipedia contributors. I constantly had people I know playing the roles of DejaNu, Prospero, DocMirza, and Telos in my inner cinema. And as the book continued, I found myself apologizing in my mind to the real people, because they would never act as they do in the book.

There were some later scenes for which I had a lot of trouble suspending disbelief, but it's hard to say which ones without spoiling too much. Also, I'm very glad that the real-world Wikipedia is far more technically robust than Infopendium seems to be.

I recommend reading it. It offers a fictional entry point to ideas like edit wars, systemic bias and the pushback against it, anonymous collaboration, community values, sock puppets, conflicts of interest, paid editing, and more, and I also found it a good yarn with a richly woven plot. Thanks for the book!

Simia

AI and centralization

We have a number of big American companies with a lot of influential connections that have literally spent billions of dollars on developing large models. And then another company comes in and releases a similar product for free.

Suddenly, trillions of dollars are on the line. With their connections they can call for regulation designed to protect their investment. They could claim that the free system is unsafe and dangerous, as Microsoft and Oracle did in the 90s with regard to open source. They could try to use and extend copyright once they have benefited from loose regulations, as Disney did from the 60s to the 90s. They could raise the regulatory hurdles to entering the market. They could finance scientific studies, philosophers, and ethicists to publish about the dangers and benefits of having this technology widely available, another playbook that tobacco and oil companies have followed for decades.

It's about trillions of dollars. Some technology giants are seeing that opportunity to make easy money dissipate. They would love it if everyone had to use their models, running on their cloud infrastructure. They would love it if every little app made many calls to their services, sending them a constant stream of money, if every piece of value created carried an effective AI "tax" they would collect. In the 90s and 00s Microsoft made huge amounts of money through the OS "tax"; then Apple, Google, and Microsoft made huge amounts of money through the app store "tax". Amazon, Microsoft, Google, and OpenAI would love a repeat of that business model.

I would expect a lot of soft and hard power to be pushed around in the coming months. Many old playbooks will be reiterated, but new playbooks will be introduced as well. Unimaginable amounts of value and money can and will be made, but how it will be distributed is an utterly non-transparent process. I don't know what an effective way would be to avoid a highly centralized world, to ensure that the fruits of all this work are distributed just a little bit more equally, to have a world in which we all have a bit of equity in the value being created.

To state it clearly: I'm not afraid of a superintelligent AI that will turn us all into paperclips. I'm afraid of a world where a handful of people have centralized extreme amounts of power and wealth, and where most of us struggle with living a good life in dignity. I'm afraid of a world where we don't have a say anymore in what happens. I'm afraid of a world where we effectively lost democracy and individual agency.

There is enough to go around to allow everyone to live a good life. And AI has the opportunity to add even more value to the world. But this will go with huge disruptions. How we distribute the wealth, value and power in the world is going to be one of the major questions of the 21st century. Again.

Simia

Languages with the best lexicographic data coverage in Wikidata 2024

Languages with the best coverage as of the end of 2024:

  1. English 93.1% (=, +0.2%)
  2. Italian 92.6% (+7, +9.7%)
  3. Danish 92.3% (+3, +5.4%)
  4. Spanish 91.8% (-2, +0.5%)
  5. Norwegian Bokmål 89.4% (-2, +0.3%)
  6. Swedish 89.3% (-2, +0.4%)
  7. French 87.6% (-2, +0.6%)
  8. Latin 85.7% (-1, -0.1%)
  9. Norwegian Nynorsk 81.8% (+1, +1.6%)
  10. Estonian 81.3% (-1, +0.1%)
  11. German 79.6% (=, +0.1%)
  12. Malay 77.8% (+2, +4.7%)
  13. Basque 75.9% (-1, =)
  14. Portuguese 74.9% (-1, +0.1%)
  15. Panjabi 73.3% (=, +2.3%)
  16. Breton 71.1% (+1, +3.8%)
  17. Czech 69.3% (NEW, +6.1%)
  18. Slovak 67.8% (-2, =)
  19. Igbo 67.8% (NEW, +2.0%)

What does the coverage mean? Given a text (usually Wikipedia in that language, but in some cases a corpus from the Leipzig Corpora Collection), the coverage is the share of word occurrences in that text that are already represented as forms in Wikidata's lexicographic data. The first number in the parentheses is the change in rank compared to last year, and the second number is the change in coverage compared to last year.
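To make the metric concrete, here is a minimal sketch of how such a token coverage number could be computed, assuming a plain-text corpus and a flat list of forms exported from the lexicographic data; the file names and the simple regex tokenization are illustrative assumptions, not the actual pipeline.

```python
# Minimal sketch of the coverage metric: the share of token occurrences in a
# corpus that match a known form. File names and the simple tokenization are
# illustrative assumptions, not the real pipeline.
import re

def load_forms(path: str) -> set[str]:
    """Read one form per line, lowercased."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def token_coverage(corpus_path: str, forms: set[str]) -> float:
    """Fraction of token occurrences (not distinct words) found among the forms."""
    covered = total = 0
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            for token in re.findall(r"\w+", line.lower()):
                total += 1
                if token in forms:
                    covered += 1
    return covered / total if total else 0.0

# forms = load_forms("hr_forms.txt")                    # hypothetical export
# print(f"{token_coverage('hr_corpus.txt', forms):.1%}")
```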

The list contains all languages where the data covers more than two thirds of the selected corpus.

English managed to keep the lead, but the distance to second place melted from 1.6% last year to a mere 0.5% this year. Italian and Danish made huge jumps forward, Italian increasing its coverage by almost 10% and climbing seven ranks to second place. Compared to last year, two new languages made it into the top list, Czech and Igbo, both cracking the ⅔ limit – Hindi is just behind at 66.5%.

The complete data is available on Wikidata.

Simia

Progress in lexicographic data in Wikidata 2024

Here are some highlights of the progress in lexicographic data in Wikidata in 2024:

  • Hausa: jumped from 1.5% coverage right to 40%
  • Danish: made another huge jump forward, increasing the number of forms from 170k to 570k, form coverage from 33% to 52%, and token coverage from 87% to 92%
  • Italian: made another huge push, increasing the number of forms from 290k to 410k and the coverage from 83% to 93%
  • Spanish: kept pushing forward, increasing the number of forms from 440k to 560k and the coverage from 91.3% to 91.8%
  • Norwegian (Nynorsk): increased the number of forms from 67k to 88k, and coverage from 80% to 82%
  • Czech: increased the coverage from 63% to 69%, the number of forms from 190k to 210k
  • Tamil: almost doubled the number of forms from 3800 to 6600, increasing coverage from 8% to 11%
  • Breton: added 1000 new forms, increasing the coverage from 67% to 71%
  • Croatian: increased from 4k to 5.5k forms, improving coverage from 45% to 48%

What does the coverage mean? Given a text (usually Wikipedia in that language, but in some cases a corpus from the Leipzig Corpora Collection), the coverage is the share of word occurrences in that text that are already represented as forms in Wikidata's lexicographic data. Note that every additional percent gets much harder than the previous one: an increase from 1% to 2% usually needs far less work than one from 91% to 92%.

See also last year's progress.

Simia

Wikidata lexicographic data coverage for Croatian in 2024

Last year I set an ambitious goal for growing the lexicographic data for Croatian in 2024. And, just like the year before, I missed it again.

My goal was to grow the coverage to 50% - i.e. half of all the words in a Croatian corpus would be found in Wikidata. Instead, we grew from 45.5% to 47.9%. The number of forms grew from 4115 to 5506, more than a thousand new forms, a far bigger growth in forms than last year. So, even though the goal was missed, the speed of growth in Croatian is accelerating.

Part of that growth in forms is due to Google's Wordgraph release, a free dataset with words in about 40 languages which describe people - both demonyms and professions.

Do I want to set a goal again? After missing it twice, I am hesitant. Should I reduce the goal further? Less than 50% sounds defeatist, and going back to 60% is obviously too much. So, yes, let's go for 50% again, and let's see where it takes us this time. We are only 2.1% of coverage away from 50%, so that should be doable.

Simia

Large Language Models, Knowledge Graphs and Search Engines

How can Large Language Models (LLMs), Knowledge Graphs and Search Engines be combined to best serve users? What are the strengths and limitations of these technologies?

Aidan Hogan (Universidad de Chile, previously DERI, Linked Data), Luna Dong (Meta, previously Amazon and Google), Gerhard Weikum (MPI, Yago), and I (Wikimedia, previously Google) have each been invited to give keynotes on this topic at different conferences over the last year or two. Now we have written a paper together to synthesise and capture some of the ideas we have been presenting.

  • Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions, arxiv.org/abs/2501.06699
Simia

Translating Nazor: The Man Who Lost a Button

The most famous child of the island of Brač is very likely Vladimir Nazor. His books are part of the canon for Croatian children, and, as fate would have it, he also happened to become the first head of state of Croatia during and after World War II.

In 1924, exactly a hundred years ago, he published "Stories from childhood", a collection of short stories. I took one of his stories from that collection and translated it into English, to make some of his work more accessible to more readers, and to see how I would do with such a translation.

I welcome you to read "The Man Who Lost a Button". Feedback, comments, and reviews are very welcome. I am also planning to make a translation into German, but I don't know how long that will take.

Simia

2024 US election

Some thoughts on the US election.

Wrong theory: 2024 was lost because Harris voters stayed home

I first believed that Harris lost just because people were staying home compared to 2020. But that, by itself, is an insufficient explanation.

At first glance, this seems to hold water: currently, we have 71 million votes reported for Harris, and 75 million votes reported for Trump, whereas last time Biden got 81 million votes and Trump 74 million votes. 10 million votes less is enough to lose an election, right?

There are two things that make this analysis insufficient: first, California is really slow at counting, and it is likely that both candidates will have a few million votes more when all is counted. Harris already has more votes than any candidate ever had, besides Biden and Trump.

Trump already has more votes than he got in the previous two elections. In 2020, more people voted for Trump than in 2016. In 2024, more people voted for Trump than in 2020.

Second, let’s look at the states that switched from Biden to Trump:

  • Wisconsin and Georgia: both Trump and Harris got more votes than Trump or Biden respectively in 2020
  • Pennsylvania, Nevada, and Michigan: Trump already has more votes in 2024 than Biden had in 2020. Even if Harris had received the same number of votes as Biden did in 2020, she would have lost these states.
  • Arizona still hasn't counted a sixth of its votes, and it is unclear where the numbers will end up. If we just extrapolate linearly, Arizona will comfortably fall into one of the two buckets above.

Result: there is no state where Biden's 2020 turnout would have made a difference for Harris (with the possible but unlikely exception of Arizona, where the counting is still lagging behind).

Yes, 10 million fewer votes for Harris than for Biden looks terrible and like a sufficient explanation, but 1) this is not the final result and the gap will become much tighter, and 2) it wouldn't have made a difference.

California is slow at counting

I was really confused: why has California only reported two thirds of its votes so far? I found the article below, which explains some of it, but it really seems to be a home-made mess for California, and one that the state should clean up.

https://www.berkeleyside.org/2024/11/08/alameda-county-election-results-slow-registrar


Voting results in PDF instead of JSON

Voting results in Alameda County will be released as PDF instead of JSON. The Registrar of Voters “recently told the Board of Supervisors that he’s following guidance from the California Secretary of State, which is recommending the PDF format to better safeguard the privacy of voters.”

This statement is wrong. JSON does not safeguard the privacy of voters any better than PDF does. This statement is not just wrong, it doesn’t even make sense.

In 2022, thanks to the availability of the JSON files, a third-party audit found an error in one Alameda election, resulting in the wrong person being certified. “Election advocates say the PDF format is almost impossible to analyze, which means outside organizations won’t be able to double-check [...] [I]f the registrar had released the cast vote record in PDF format in 2022, the wrong person would still be sitting in an OUSD board seat.”

The county registrar is just following the California Secretary of State. According to a letter by the registrar: “If a Registrar intends to produce the CVR [Cast Vote Record], it must be in a secure and locked PDF format. The Secretary of State views this as a directive that must be followed according to state law. I noted that this format does not allow for easy data analysis. The Secretary of State’s Office explained that they were aware of the limitations when they issued this directive. [...] San Francisco has historically produced its CVR in JSON format, contrary to the Secretary of State's directive. The Secretary of State’s office has informed me that they are in discussions with San Francisco to bring them into compliance”.

Sources:


It was not a decisive win

There are many analyses of why Harris lost the election, and many go far overboard, often for political reasons, with the aim of influencing the platform of the Democratic Party for the next election. This wasn't a decisive win.

I wanted to make the argument that 30k voters in Wisconsin, 80k voters in Michigan, and 140k voters in Pennsylvania would have made the difference. And that’s true. I wanted to compare that with other US elections, and show that this is tighter than usual.

But it's not. US elections are just often very tight. There are exceptions, such as the first Obama election, but in general, American elections are tight (I'll define a tight election as one where flipping fewer than 0.5% of the voters would have elected a different president).

I don’t know how advisable it is to make big decisions on a basically random outcome.

Simia

How to pronounce MySQL

Today I learned (or re-learned) that the "My" in MySQL does not stand for the English word my but for the Swedish name My, the name of the daughter of MySQL co-founder Michael Widenius. ♡

The name My was introduced by Tove Jansson for the Moomins character Little My.

According to the Words & Stuff blog: "Turns out that she was named after the Greek letter mu; for the Finnish pronunciation of Myy, see the video. In English, it turns out that her name is pronounced like the English word my (/maɪ/), rhyming with the English word hi."

So you could pronounce My as [ˈmyː] or as [maɪ].

This is in addition to the well-known discussion about how to pronounce SQL, which I will not further dive into here.

By the way, the MySQL documentation defines the official pronunciation: "The official way to pronounce “MySQL” is “My Ess Que Ell” (not “my sequel”), but we do not mind if you pronounce it as “my sequel” or in some other localized way." It seems, though, that when speaking Swedish the MySQL developers also say "mü-ess-ku-ell" (source).

Simia

A passport odyssey

A story of hope, decades long lost friends, and love beyond borders. A story of going to a new world, a story of challenges. But above all, a story of bureaucracy.

Almost three years ago, my wife and I were blessed with our little sunshine. She was born in the City by the Bay, San Francisco, just a few months after we moved there from Berlin, Germany. A few weeks after her birth, we decided to start the process that would get her the papers confirming she’s a European citizen — I am Croatian, and thus by Croatian law, she is Croatian too. All we needed was to get the paperwork done so that she actually holds the Croatian passport in her little hands. How hard could that be?

The closest Croatian consulate is in Los Angeles, but they offer a great service: more or less regularly, they come to different cities in their area of responsibility and offer consular services there. I called the consulate in Los Angeles and figured out what papers we needed and when they would next be close to San Francisco. A few weeks later we drove to San Jose to submit all the necessary paperwork.

Waiting at the consulate, I noticed a man who looked like he was from Brač, the same island I am from. Now note that Croatia has more than 4 million people, and Brač has only 14,434 of those, so the sheer probability of him being from Brač was less than one percent — if he was from Croatia at all. I told my wife that I thought he was from Brač.

“What? How would you know?”

“He looks like it.”

“What do you mean, he looks like it?”

“I don’t know. He does.”

“That’s nonsense.”

“I’m gonna ask him.”

As I said, Brač is an island, so it might be that this little bit of isolation has led to people looking a certain way. Or it might just be that this specific nose looked too much like my cousin's nose. Who knows. I went over and asked him.

He was.

So we started talking about people that we both know (turns out, there were a few). After a minute or two, a lady overheard us talking and also chimed in. She also knew a few of those people. She also happened to be from Brač. We figured that we had quite a few common acquaintances, until I suddenly mentioned my parents’ names.

The lady looked at me in shock. She asked, to be sure she didn’t mishear. I confirmed. She asked again. I confirmed. She started crying.

Which was a bit awkward.

It turns out that she and my mother were classmates. Half a century ago, half a world away, they went to the same school every morning. She had emigrated to California many years ago, and she had visited my mother in Supetar on Brač when I was the age of my daughter. She had played with me more than thirty years ago. On the spot, I called my mother and let the two of them talk. What a surprise!

But back to the paperwork. It turned out there was a small extra step required. My wife and I had, in fact, not yet registered our marriage in Croatia. And in order to register my daughter's birth correctly, it would first be necessary to register our marriage.

A year earlier, we already had tried that once, but it failed because of a tiny problem.

We had gotten married two years earlier in Berlin, as we lived there. And since we were planning to travel to Croatia rather soon, we thought we would register our marriage in Croatia instead of through the consulate in Berlin. That should be much simpler.

So on a very hot summer day four years ago, we went to the administration in Supetar on Brač in order to register our marriage. We had all the necessary papers with us, but, as I said, there was a tiny problem: what is my name?

It turns out that my Croatian documents had a dash between my first and second name, effectively turning them into a single double name. My German papers, though, consistently lack this little dash. And so did our German marriage certificate. No dash. So what was my name? I had my mom there. I asked her. She didn't know. It was a chaotic birth because I decided to come early. It was a bit of a jumble. She didn't remember my name. Thanks, mom.

What has happened?

When I was born in Germany — and I am sorry for the flashback within the flashback — the consulate there sent a message to the administration in Supetar, in what was back then Yugoslavia. Given that this was in the dark ages before the internet, the message was a so-called fax. A fax machine is a scanner that takes the scanned data and sends it over an active phone connection to another fax machine, where the scan is printed. Faxes back then usually transmitted about 300 to 1200 bytes per second, and on long-distance calls, especially to the islands — where telephone lines were a very rare commodity — such faxes became quite expensive. Because of that, faxes heavily compressed the scanned data. Scanners and printers back then, especially in fax machines, were also not particularly great. The result was that faxes often looked like cheap copies that had travelled around half the world, which was in fact the case.

So when the consulate sent a fax to the administration in Supetar, the fax that was received had a little splotch between my first and middle name. When they read it, they read that splotch as a dash connecting my names. That is how I was registered in Yugoslavia, and that is how Croatia registered me from the Yugoslav records. In fact, on that hot summer day in Supetar, we actually saw the fax from back then — they still had it in their archive, and the splotch really is easily mistaken for a dash — and that is how my name in Croatia and in Germany started diverging.

The administration recognized the error, and offered to immediately fix it. They would correct my papers, issue a new passport, and register the marriage. My name would be cleared.

Alas — we were just a few weeks from emigrating to the United States. Just the week before we traveled to Croatia, the United States consulate in Berlin had glued our visas into our passports. Changing the passport now would have come at the most inconvenient time: even just getting an appointment with the US consulate in time would have been nearly impossible. And so we decided not to fix it at the time.

Fast forward. In order to get the Croatian passport for my newborn I first had to get her nationality confirmed. In order to confirm her nationality I first needed to get her birth registered. In order to get her birth registered I first had to get my marriage registered. In order to get my marriage registered I first had to get my name fixed.

Then the following steps took months of me communicating with the consulate, the consulate communicating with the administration in Croatia, and everything going back the other way. In the end I got new papers confirming that my name, indeed, had no dash. With that we went and registered our marriage. And with that we registered the baby's birth. With that we established that she is indeed Croatian. And with that we could ask for a passport to be issued. More than 18 months of back and forth had passed by the time we reached that point.

A few weeks later, I asked for an update. Another few weeks later again. I didn’t receive any answer. So I called the consulate, to learn that the consul I was working with was not working there anymore. My emails were going nowhere.

I explained my situation. It took a while. I sent the documentation. I expected that all of this might restart from square one, but it did not. Within a few weeks my registration was updated, the passport was issued, and all the papers were sent to us, together with the marriage and birth certificates and a proof of nationality under my new old name. Just in time for Valentine's Day, my wife and I are now also officially married in Croatia, and my daughter has all the papers that prove she is Croatian.

Closing this chapter of bureaucracy, I want to thank all the people in the administration who were involved. Even though it took a ridiculously long time, everyone was always extremely friendly and helpful. I still find it hard to believe that a little faxing artifact from almost four decades ago stretched a standard process out over years, and that it reconnected my mother with a long-lost friend. It is amusing to see how easily reality can turn absurd.


First published on Medium on February 14, 2017.

Simia

Trademarks on people's names?

Seven years ago, a UK-born kid was named Loki Skywalker Mowbray. The family was planning to travel to the Dominican Republic and applied for a passport, and the UK Home Office denied it because Skywalker is a trademark of Disney. The same thing had happened a few weeks earlier, when a six-year-old girl named Khaleesi was denied her passport.

Loki got his passport issued, it is said. And I'm baffled that anyone in the Home Office would think that's an acceptable course of action.

Source : https://www.malaymail.com/news/life/2024/09/21/seven-year-old-boy-denied-passport-by-uk-home-office-over-star-wars-copyright-infringement-for-skywalker-name/151183

Simia

The quest for the lost graveyard

About thirty to forty years ago I usually spent my summers in Croatia, on the island of Brač. Some of the time I spent in Donji Humac, the home of my mother’s family, the rest of the time in Pučišća, the home of my father’s family.

In Pučišća, I often spent time with my cousins, including my cousin Robert. Like all kids of that age, we explored the neighborhood, and there was plenty to explore. One day, instead of going our usual way, towards the sea, we went in the other direction. We crossed the nearby bypass road and then, on the other side, found a small graveyard with a chapel in the middle, which also doubled as a crypt for a wealthy local family.

I remember the pine trees, the shade, the spiderwebs across the trees. I do not remember the name of the family for sure, but I think it was Dominis, or Gospodetnić. I remember the small stone fence which gave the graveyard an almost square shape. I remember the dark plates with the hard-to-read names, almost washed out by time and the scarce rain.

The main graveyard of Pučišća is in a different place, on the far end of the town, near one of the coves on the way to the large quarry outside of town. Since then, I learned a lot more about the history of Pučišća, and it often mentioned that main graveyard. The history of that place went back to Roman times, featuring a shrine to Jupiter and a little church to Saint Stephen from the 11th century. Also my family’s grave is in that graveyard, but only starting with my grandfather. I could never find where my great grandfather, or earlier generations, were buried.

I came to believe that the other graveyard, the one Robert and I had found, was for the less wealthy people of Pučišća. I first thought it was the older one - Robert and I called it the 'old graveyard' - but this didn't make sense, since the main graveyard literally contains the oldest traces of human settlement in town. We must have been mistaken.

Over the last few years, I tried to figure out more about the graveyard, but none of the sources I read mentioned it. There was also no entry on the Find a Grave website. I used Google Maps and OpenStreetMap to find it, but failed. I used Google Street View to follow the bypass road, which has since been redone, but couldn't find it either. I decided that at the next opportunity I would find the graveyard again and document all the graves on Find a Grave. Maybe I would even find some ancestors.

This year I finally went back to Pučišća for a few weeks. While I found it too hot to do much exploring, on one of the few cooler evenings I decided to finally take the walk and find it. It took me a while, as I wasn't sure about the way, but eventually I came upon a square enclosure of the right size with a chapel in the middle. The chapel was dedicated to Our Lady of Lourdes and looked somewhat different from how I remembered it. In particular, it did not contain a crypt. And although it had pine trees and spiderwebs, there was not a single grave, merely a large stone cross which had toppled over. On the way back, I could also reconstruct the path that Robert and I had taken a few decades ago. I am very sure this is the right place, but there are no graves.

I was confused. The next day, I happened to meet Robert. I asked him whether he remembered how we had gone exploring in that direction as kids, and he immediately knew what I was talking about. It seemed to be a core memory for both of us. And then I asked about the graves.

There are no graves, he said. There were never graves. There never was a graveyard, and we had not found one. I had confabulated that whole part. We had found the enclosure and the chapel, but the other memories were an invention of my imagination. No wonder I could never find anything about it.

I am glad I resolved that question. I am a bit surprised by how well established that wrong memory was. Unsurprisingly, I can still recall the wrong memory of the graveyard, even though I now know it is wrong, and any memory of the actual events has long faded and been replaced by my continuous retelling of a story that never happened.

Simia

Heading for Germany

We're heading to the airport, to leave the United States, after more than ten years, and settle in Germany. It was a great time. California is amazing and beautiful. We had the opportunity to meet some awesome people, and I hope to stay connected with many of them for the rest of our lives. Thanks to everyone!

Thanks particularly to my wonderful wife, who organized this move and got everything ready for it, including the stressful procurement of an international health certificate for our cat on literally the last day possible. Or getting about a hundred boxes packed to be shipped. Or figuring out how to sell a car on short notice. And many other things, all while keeping my back free so I could keep working.

I'm looking forward to coming back to Germany, and I hope that my wife and daughter will find welcome and roots in this next part of our journey through life.

For more background on why we are leaving, see the previous post about moving to Germany.

P.S.: International travel with a pet is not recommended.

Simia

Github not displaying external contributions anymore

Git is a very widely used version control system. Version control systems are an absolutely crucial tool for collaborating on and developing software. Git was designed as a decentralized system, meaning that people could more easily develop their own versions, collaborate on side ideas, and avoid relying on a single large central repository.

Github is a Microsoft-owned website which made it easy to start, maintain, and share Git repositories. In fact, so easy that in many ways the advantages of decentralization that were built into Git have been nullified. Convenience beats many other advantages, or "worse is better", as the often-stated adage goes.

Some organizations and projects, such as Wikimedia, decided to host their own Git instance, and not rely on Microsoft's. Due to the decentralized model of Git that's absolutely possible and encouraged. It is a bit of a hassle, but you don't rely on Microsoft for your project.

Github has become an important "hub" for developers, also because they provide profile pages for developers, showing off their contributions, achievements, etc. Hiring managers will often look at a developer's Github page to assess a candidate.

Microsoft made a change so that contributions to projects only "count" and are reflected on the contributor's Github profile if they are made through Github (unless the contributor is a member of the organization owning the mirrored Git repository). Contributions through other paths don't count for the profile. Microsoft, worth a trillion dollars, explains that it is too "nuanced and difficult" for them to continue to display contributions on your profile which happened outside of Github.

I mean, it is clearly the fault of the community for allowing Microsoft to embrace and enclose this space. Will this change be enough to make developers leave Github? (No.) How difficult will it be to get hiring managers to not just reflexively look up a Github profile? (Very.) Will there be an outcry that will make Microsoft change their mind? (No.) Is this just a move to ensure that they enclose the Open Source workflow even more? (They'll say no, and it might even be true, but they sure won't mind that this is happening.)

The lesson we should learn, but won't, is to not allow companies to enclose and control such spaces. But we keep doing that, again and again. It's a pity.

Source: Starring a repository you've contributed to should make it show up on your profile, just like how it was for the past 10+ years

Simia

Productivity pro tip

  • make a list of all things you need to do
  • keep that list roughly in order of priority, particularly for the first 3-5 items (lower down the list it doesn't matter that much)
  • procrastinate the whole day from doing the number 1 item by doing the number 2 to 5 items
Simia

Facebook checking my activity

Facebook locked my account because of unusual behavior. I'm thankful they're checking. I often see obviously spammy behavior on Facebook.

Then they show me my latest posts and comments and ask me which one of these wasn't by me. But they all were by me. There was nothing of the sort of "Oh, I've now seen three of your posts, and you look like a really interesting person. Do you want to be my friend?", or attempts to sell NFTs, coins, or day trading.

Yeah, no, AI will still take a moment.

Simia

Experiment to understand LLMs better

Here’s an experiment I would love to do if I had the resources. Just to start gaining some more understanding of how LLMs work.

  1. Train an LLM Z on a lot of English text.
  2. Ensure that the LLM correctly uses the past tense of “go”, “went”, in its responses.
  3. Ask the LLM directly what the past tense of “to go” is, and expect “went”.
  4. Remove all sentences / texts from the corpus that contain the word “went”. Add more text to the corpus to make it roughly the same size again (a sketch of this filtering step follows below the list).
  5. Train an LLM A on that corpus.
  6. Use the same prompts to see what the LLM uses instead of “went”.
  7. Ask the LLM directly what the past tense of “to go” is. I expect “goed”?
  8. How many example sentences / texts containing the word “went” does one need to add to the corpus of LLM A (and retrain) in order for the resulting LLM to get it right? Is one enough? Ten? A thousand?
  9. Add an explicit sentence, ‘The past tense of “to go” is “went”’, to the corpus of LLM A and retrain with it instead of the implicit training data. Does the trained LLM now get it right? Does it use the word correctly? Does it answer the explicit question correctly?
  10. Add an explicit sentence to the prompt of LLM A, instead of retraining it. Does it use the word right? Does it answer the explicit question correctly?
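Here is a minimal sketch of the corpus-filtering and probing steps (4, 6, and 7), assuming a plain-text corpus with one sentence per line and the Hugging Face transformers library; the model names in the commented-out lines are placeholders for the hypothetical LLMs Z and A, not real checkpoints.

```python
# Minimal sketch of steps 4, 6, and 7. The corpus format, the prompts, and the
# model names are illustrative assumptions, not a real training setup.
import re
from transformers import pipeline

def filter_corpus(in_path: str, out_path: str, banned_word: str = "went") -> int:
    """Step 4: drop every sentence that contains the banned word; return kept count."""
    pattern = re.compile(rf"\b{re.escape(banned_word)}\b", re.IGNORECASE)
    kept = 0
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if pattern.search(line):
                continue
            dst.write(line)
            kept += 1
    return kept

def probe(model_name: str, prompts: list[str]) -> dict[str, str]:
    """Steps 3, 6, 7: ask a model the same questions and record its answers."""
    generator = pipeline("text-generation", model=model_name)
    return {p: generator(p, max_new_tokens=20)[0]["generated_text"] for p in prompts}

prompts = [
    "Yesterday she goes to the market. In the past tense: Yesterday she",
    "The past tense of 'to go' is",
]
# answers_z = probe("llm-z-full-corpus", prompts)      # hypothetical checkpoint
# answers_a = probe("llm-a-without-went", prompts)     # hypothetical checkpoint
```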

If there is some similar work to this out there, or if anyone has some work like this, I’d be very curious for pointers.

P.S.: Also, I would love to see whether people who do research on LLMs could correctly predict the result of this experiment ;)

Simia

Taking a self-driving car

Ten years ago, my daughter had just been born and I had just joined Google, which was working on self-driving cars. I was always hoping that my daughter would not need to learn how to drive a car (but that, if she wanted to, she could). Over the last ten years I lost confidence in that hope.

Yesterday, thanks to my wife organizing it, we took our first ride in a self-driving car, driving about ten minutes through San Francisco. I guess a worldwide rollout will take time, maybe a lot of time, but what can I say: it drove very well.

Simia

Sleeping Lady with a Black Vase

31 May 2024

In 2009, a Hungarian art historian was watching the movie Stuart Little with his 3-year-old daughter. And he's like, "funny, that painting used in the set looks like that 1928 black-and-white photograph I have seen, of a piece of art which has been lost". So he sends a few emails...

Turns out, it *is* the actual artwork by Róbert Berény (1887-1953), which was last seen in public in 1928 and somehow made it to Sony, where it was used in a number of soap opera episodes and in Stuart Little.

Simia

The Ring verse in German

28 May 2024

I finally got The Lord of the Rings in English. I had never read it in its original English, only in a German translation, about thirty years ago.

And already on the first page I am stumped: the ring verse seems to me sooo much better in German than in English. Now, it is absolutely possible that this is because I read it as an impressionable teenager and have carried the translation with me for three decades, thus developing fondness and familiarity with it, but I think it's more than that.

Here are the verses in English, German, and a literal back-translation of the German to English:

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all,
One Ring to find them,
One Ring to bring them all,
and in the darkness bind them
In the Land of Mordor where the Shadows lie.

German translation by von Freymann:

Drei Ringe den Elbenkönigen hoch im Licht,
Sieben den Zwergenherrschern in ihren Hallen aus Stein,
Den Sterblichen, ewig dem Tode verfallen, neun,
Einer dem dunklen Herrn auf dunklem Thron
Im Lande Mordor, wo die Schatten drohn.
Einen Ring, sie zu knechten, sie all zu finden,
ins Dunkle zu treiben und ewig zu binden
Im Lande Mordor, wo die Schatten drohn.

Back-translation of her translation by me:

Three Rings for the Elven kings high in the light,
Seven for the Dwarf-lords in their halls of stone,
For the mortals, eternally doomed to death, nine,
One for the Dark Lord on dark throne
In the Land of Mordor, where the Shadows loom.
One Ring, to enslave them, to find them,
to drive to Darkness, and forever bind them
In the Land of Mordor, where the Shadows loom.

The differences are small, but I find the selection of words by the translator to be stronger and more evocative than Tolkien's original. Which is amazing. Thanks to the great Ebba-Margareta von Freymann for her wonderful translation of the poems!

Originally, the publisher Klett had trouble translating Tolkien's poems, but Ebba-Margareta had been working for many years on translating Tolkien's poetry, and by using her translations, Klett did a great service to the book for the German-speaking world.


Simia

The height of Anson Mount

26 May 2024

Slop is filling up the Internet.

Today my Google Now feed even suggested (!) the following page, which is focused solely on the height of Anson Mount. I assume Google thinks I'm interested in the actor because I've read about Star Trek.

https://berkah.blob.core.windows.net/ernews/how-tall-is-anson-mount.html

The article has a certain fascination because it claims to be the ultimate guide to Anson Mount's height and goes into a lot of detail about it, for example explaining that height is often measured in feet and inches, or how being taller helps Mount find better-fitting clothes.

It's also fascinating because it gives his height as 6'3" / 1.91 m. The Google Knowledge Graph claims 6'1" / 1.85 m without a source. IMDb states 5'11½" / 1.82 m. The website Celebrity Heights lists 5'11¼" / 1.81 m. I kid you not.

That makes me wonder whether I'm yearning back to times when people were publishing stuff like this (I'm not):

https://winteriscoming.net/2021/06/17/james-gunn-star-trek-anson-mount-fight-twitter-actors-lie-height/

Here we see reporting about a Twitter discussion between Mount and director James Gunn about actors lying about their height, and Mount seemingly being touchy about that subject.

The algorithmically pushed article also gives Mount's place of birth as Tennessee (Wikipedia, though, says Illinois, but trust whom you will).

The Web has, almost from the beginning, been a place you shouldn't trust blindly. I used to trust Google to be a first layer of defense, but the last few weeks indicate that this is no longer the case. Google will now push AI-generated slop right at me, when it should be trying to keep me from even pulling it from the Web. I hope Google will figure that out.

In the last few weeks it has been getting increasingly difficult to find correct information on the Web. I'm noticing it around Pokémon Go, where I look up whether a Pokémon has already been released, or how to evolve it. I get arbitrary answers, which I have found plain wrong several times. Google's results are not ranked by trustworthiness, and now I have to start remembering which sites to trust, which sucks.

This is going to be exhausting.

(And if you think this is only true about pop culture stuff, then bless your heart)

Simia

Little Richard and James Brown

When Little Richard started becoming more famous, he had already signed up for a number of gigs, but much better opportunities were then coming in. He was worried about his reputation, so he did not want to cancel the previously agreed gigs, but he also did not want to miss the new opportunities. Instead he sent a different singer, who was introduced as Little Richard, because most concertgoers back then did not know exactly what Little Richard looked like.

The stand-in was James Brown, who at this point was unknown, and who later had a huge career, becoming an inaugural inductee to the Rock and Roll Hall of Fame - two years before Little Richard.

(I am learning a lot from, and am enjoying, Andrew Hickey's brilliant podcast "A History of Rock Music in 500 Songs".)

Simia

Johnny Cash and Stalin

Johnny Cash was the first American to learn about Stalin's death.

At that time, Cash was a member of the Armed Forces and stationed in Germany. According to Cash, he was the one to intercept the Morse code message about Stalin's death before it was announced.

Simia

The Heat Death of the Internet

It makes good observations and closes on a hopeful note. A short and pointed read.

Simia

Beyoncé's Number One in Country

Beyoncé very explicitly announced her latest album to be a country album, calling it "Cowboy Carter", and her single "Texas Hold 'Em" made her the first Black woman to top Billboard's Hot Country Songs charts.

It is good that Beyoncé made it so glaringly obvious that her song is a country song. The number of Black artists to have topped the Hot Country Songs chart is surprisingly small: Charley Pride in the 70s, Ray Charles in a duet with Willie Nelson for one week in 1984, and then Darius Rucker and Kane Brown in the last decade or two.

Maybe one way to understand why it is so hard for Black artists to chart in this particular genre: "Old Town Road", the debut single by Lil Nas X, was first listed on the Hot Country Songs chart, but Billboard then decided that this was a mistake and recategorized the song, taking it off the country charts in March 2019; had it not been removed, it would have hit Number One on April 6, 2019.

Billboard released a long explanation explaining that this decision had nothing to do with racism.

Cowboy Carter was released exactly five years, to the week, after Old Town Road would have hit Number One.

I guess Beyoncé really wanted to make sure that everyone knows that her album and single are country.

Simia

War in the shadows

A few years ago I learned with shock and surprise that in the 1960s and 1970s Croatians were assassinated by the Yugoslav secret service in other countries, such as Germany, and that the German government back then chose to mostly look away. That upset me. In the last few weeks I listened to a number of podcasts that went into more detail about these events, and it turned out that some of those murdered Croatians were entangled with the fascist Croatian Ustasha regime of World War II -- either by having been Ustasha themselves, or by actively working towards recreating the Ustasha regime in Croatia.

Some of the people involved were actively pursuing terrorist acts - killing diplomats and trying to kill politicians, hijacking and possibly downing airplanes, bombing cinemas, and even attempting an actual armed uprising.

There was a failed attempt to plant seventeen bombs along the Croatian Adriatic, on tourist beaches, during the early tourist season, and to detonate them all simultaneously, in order to cut off Yugoslavia's income from tourism.

Germany itself struggled with these events: its own secret service was tasked with protecting the German state, and it was initially even unclear how to deal with organizations whose goal was to destabilize a foreign government. Laws and rules were changed in order to deal with the Croatian extremists, rules that were later applied to the PLO, the IRA, Hamas, and others.

Knowing a bit more of the background - that it seems a communist regime was assassinating fascists and terrorists - does not excuse these acts, nor the German inactivity. These were political assassinations without due process. But it makes it a bit easier to understand why the post-Nazi German administration, which at that time was busy with its own wave of terror by the Rote Armee Fraktion (RAF), did not give more attention to these events. And Germany received some of its due when Yugoslavia captured some of the kidnappers and murderers of Hanns Martin Schleyer and did not extradite them to Germany, but let them go, because Germany did not agree to hand over Croatian separatists in return.

Croatians had a very different reputation in the 1970s than they have today.

I still feel like I have a very incomplete picture of all of these events, but so many things happened that I had no idea about.

Source podcasts in German

Simia

Daniel Dennett

R.I.P. Daniel Dennett.

An influential modern voice on questions of philosophy and AI, especially through the idea of the intentional stance.

Simia

Katherine Maher on The Truth

Wikipedia is about verifiable facts from reliable sources. For Wikipedia, arguing with "The Truth" is often not effective. Wikipedians don't argue "because it's true" but "because that's what's in this source".

It is painful and upsetting to see Katherine Maher so viciously and widely attacked on Twitter, especially for a quote repeated out of context which restates one of the foundations of Wikipedia.

I have worked with Katherine. We were lucky to have her at Wikipedia, and NPR is lucky to have her now.

The quote - again, taken out of the context that it stems from the way Wikipedia editors collaborate - is: "Our reverence for the truth might be a distraction that's getting in the way of finding common ground and getting things done."

It is taken from this TED Talk by Katherine, which provides sufficient context for the quote.

Simia

Partial copyright for an AI-generated work

An interesting development in US cases around copyright and AI: author Elisa Shupe asked for copyright registration for a book that was created with the help of generative AI. Shupe argued that denying her registration would be disability discrimination, since she would not have been able to create her work otherwise. On appeal, her work was partially granted protection for the "selection, coordination, and arrangement of text generated by artificial intelligence", without reference to the disability argument.

Simia

Northern Arizona

Last week we had a wonderful trip through Northern Arizona.

Itinerary: starting in Phoenix and going northeast through Tonto National Forest towards Winslow. In Tonto, we met our first surprise, which would become a recurring pattern: we expected Arizona in April to be hot, and we were prepared for heat, but it had some really cold spells, and we were not prepared for cold. We started in the Sonoran Desert, surrounded by cacti and sun, yet an hour and a half later in Tonto we were driving through a veritable snowstorm. Fortunately, just as it was getting worrisome, we crossed the ridge and started descending towards Winslow to the north.

The Colorado Plateau on the other side of the ridge was then pleasant and warm, and the next days we traveled through and visited the Petrified Forest, Monument Valley, Horseshoe Bend, Antelope Canyon, and more.

After that we headed for the Grand Canyon, but temperatures dropped so low, and we didn't have the right clothing for it, that we stayed less than a day there, most of it huddled in the hotel room. Still, the views we got were just amazing, and throwing snowballs was an unexpectedly fun exercise.

Our last stop took us to Sedona, where we were again welcomed with amazing views. The rocks and formations all had in common that they dramatically changed with the movement of the sun, or with us moving around, and the views were always fresh.

Numbers: Our trip took us about 950 miles / 1,500 kilometers of driving, and I was happy that we had a good Jeep for this trip. The altitude ranged from 1,000 feet / 330 meters in Phoenix up to 8,000 feet / 2,400 meters driving through Coconino. Temperatures ranged from 86°F / 30°C to 20°F / -7°C.

What I learned again is how big this country is. And how beautiful.

Surprises: One thing that surprised me was how hidden the canyons can be. Well, you can't hide the Grand Canyon, but it is easy to pass by Antelope Canyon and not realize it is there, because it is just a cut in the plateau.

I was also surprised by how flat and wide the land is. I have mostly lived in areas with mountains or at least hills nearby, but the Colorado Plateau has large, wide swaths of flat land. "Once the land was as plane as a pancake".

I mentioned the biggest surprise already, which was how cold it got.

Towns: it was astonishing to see the difference between towns such as Page and Sedona on the one side and Winslow on the other. All three have similar populations, but Page and Sedona felt vigorous, lively, and clean, whereas Winslow felt as if it were in decline: deserted, struggling.

The hotel we stayed in in Winslow, La Posada, was a beautiful, weird, unique jewel that I hesitate to flat-out recommend (it is too unusual for that) but that I still enjoyed experiencing. It is clearly very different from any other hotel I have ever stayed in: full of history, embracing themes of both suicide and hope, respectfully trying to grow with the native population, and aiming to revive the city's old town. It is difficult to really capture the vibe it was sending out.

For pictures, I am afraid I am pointing to my Facebook posts, which should be visible without login:

Simia

Crossing eight time zone borders in three hours

Hopi Nation is an enclave within Navajo Nation. Navajo Nation is located across three US states, Arizona, New Mexico, and Utah.

Arizona does not observe daylight saving time. Navajo Nation observes daylight saving time. Hopi Nation does not observe daylight saving time. You can drive for three hours in that area and cross a time zone border eight times.
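A minimal sketch of the difference, assuming Python's zoneinfo and the IANA zones America/Phoenix (Arizona and Hopi Nation, no daylight saving time) and America/Denver (Mountain Time with daylight saving time, whose rules Navajo Nation follows):

```python
# Minimal sketch: compare summer and winter UTC offsets for the two zone rules.
# The zone choices are assumptions for illustration: America/Phoenix for
# Arizona and Hopi Nation, America/Denver for Navajo Nation.
from datetime import datetime
from zoneinfo import ZoneInfo

for label, zone in [("Arizona / Hopi Nation", "America/Phoenix"),
                    ("Navajo Nation", "America/Denver")]:
    summer = datetime(2024, 7, 1, 12, tzinfo=ZoneInfo(zone))
    winter = datetime(2024, 1, 1, 12, tzinfo=ZoneInfo(zone))
    print(f"{label}: summer offset {summer.utcoffset()}, winter offset {winter.utcoffset()}")

# Expected output: America/Phoenix stays at -7:00 all year, while America/Denver
# is -6:00 in summer and -7:00 in winter, so in summer the local time changes by
# an hour every time you cross between the two areas.
```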

All of the individual decisions make total sense:

Arizona does not observe daylight saving time because any measure meant to give Arizona residents even more sunshine would be worse than bringing coals to Newcastle, as the saying goes. They are smart not to use daylight saving time.

Navajo Nation observes daylight saving time because they want to have the same time across their whole territory, and they are also in two other states, Utah and New Mexico, which both observe daylight saving time, so they decided to do so too, which makes total sense.

And Hopi Nation, even though it is enclosed by the Navajo Nation, lies entirely within the state of Arizona, so it makes sense for them to follow *that* state.

All the individual decisions make sense, but the outcome must be rather inconvenient and potentially confusing for the people living there.

(Bonus: the solution to this seems obvious to me. Utah and New Mexico and many other southern US states should just get rid of daylight saving time, as Arizona did, and Navajo Nation should follow suit. But that's just my opinion.)

Simia

New home in Emeryville

Our new (temporary) home is the City of Emeryville. Emeryville has a population of almost 13,000 people. The apartment complex we live in has about 400 units, with, I estimate, about 2 people on average in each. Assuming that about 90% of the apartments are occupied, that is roughly 400 × 2 × 0.9 ≈ 720 people, meaning this single apartment complex constitutes somewhere between 5 and 10% of the population of the whole city.

Simia

A conspiracy to kill a browser

Great story about how YouTube helped with moving away from IE6.

"Our most renegade web developer, an otherwise soft-spoken Croatian guy, insisted on checking in the code under his name, as a badge of personal honor, and the rest of us leveraged our OldTuber status to approve the code review."

I swear that wasn't me. Although I would have loved to do it.

(first published on Facebook March 12, 2024)

Simia

35th birthday of the Web

Celebrating the 35th birthday of the World Wide Web, a letter by its founder, Tim Berners-Lee.

The letter discusses some of the issues of today's Web: too much centralization, too much exploitation, too much disinformation, all made even more dire by the development of AI.

What to do? Some of the solutions the letter mentions are Mastodon, a decentralized social network, and Solid, a Web-standards-based data governance solution, but it recognizes that more is needed, "to back the morally courageous leadership that is rising, collectivise their solutions, and to overturn the online world being dictated by profit to one that is dictated by the needs of humanity." I agree with that, but find it a bit vague.

At first I was terribly annoyed that the letter was published on Medium, as this is a symptom of the centralization of the Web. I say this fully conscious that I am discussing it on Facebook. Obviously, both of these should be happening on our own domains, and they also do: I link not to Medium but to the Web Foundation site, and I also have this posted on my own site and on my Mastodon account. So it is there, on the real Web, not just in the closed walled garden of Facebook and on one of the megasites such as Medium. But there is no indication of engagement on the Web Foundation's post, whereas the Medium article records more than 10,000 reactions, and my Facebook post will also show more reactions than my website (though the Mastodon page could be competitive with Facebook for me).

I want to believe that Solid is the next important step, but Leigh Dodds's recent post on Solid, and particularly the discussion in the post, didn't inspire hope.

Simia

Gödel on language

"The more I think about language, the more it amazes me that people ever understand each other at all." - Kurt Gödel
Simia

Rainbows end

Rainbows end.

The book, written in 2006, was set in 2025 in San Diego. Its author, Vernor Vinge, died yesterday, March 20, 2024, in nearby La Jolla, at the age of 79. He is probably best known for popularizing the concept of the Technological Singularity, but I found many other of his ideas far more fascinating.

Rainbows End explores themes such as shared realities, digital surveillance, and the digitization of the world, years before Marc Andreessen proclaimed that "software is eating the world", describing it much more colorfully and richly than Andreessen ever did.

His other work that I enjoyed is True Names, discussing anonymity and pseudonymity on the Web. A version of the book was published with essays by Marvin Minsky, Danny Hillis, and others who were inspired by True Names.

His science fiction was in a rare genre of which I would love to read more: mostly non-dystopian, near-future, hard sci-fi, and yet imaginative, exploring novel technologies and their implications for individuals and society.

Rainbows end.

Simia

From vexing uncertainty to intellectual humility

A philosopher with schizophrenia wrote a harrowing account of how he experiences schizophrenia. And I wonder if some of the lessons are true for everyone, and what that means for society.

"It’s definite belief, not certainty, that allows me to get along. It’s not that certainty, or something like it, never matters. If you are fixing dinner for me I’ll try to be clear about the eggplant allergy [...] But most of the time, just having a definite, if unconfirmed and possibly false, belief about the situation is fine. It allows one to get along.
"I think of this attitude as a kind of “intellectual humility” because although I do care about truth—and as a consequence of caring about truth, I do form beliefs about what is true—I no longer agonize about whether my judgments are wrong. For me, living relatively free from debilitating anxiety is incompatible with relentless pursuit of truth. Instead, I need clear beliefs and a willingness to change them when circumstances and evidence demand, without worrying about, or getting upset about, being wrong. This attitude has made life better and has made the “near-collapses” much rarer."

(first published on Facebook March 13, 2024)


Simia

Feeding the cat

Every morning, I lovingly and carefully scoop out every single morsel of meat from the tin of wet food for our cat. And then he eats a tenth of it.

Simia

Dolly Parton's What's up?

Dolly Parton is an amazing person. On "Rockstar", her latest album, she covered a great number of songs, with amazing collaborators, often the original interpreters or writers. In her cover of "What's up?", a song I really love, recorded together with Linda Perry, she changed a few lines of the lyrics, as one often does when covering, to make the song her own.

Instead of "Twenty-five years and my life is still...", she's singing "All of these years and my life is still..." - and makes total sense, because unlike Linda Perry she wasn't 25 when she wrote it, she was 77 when she recorded it.

Instead of "I take a deep breath and I get real high", Dolly takes "a deep breath and wonders why", and it makes sense, because, hey, it's Dolly Parton.

But here's the line that hurts, right there when the song reaches its high point:

"And I pray,
Oh my God do I pray!
I pray every. single. day.
for a revolution!"

She changed one letter in the last word:

"for a resolution"

And it just breaks my heart. Because it feels so weak. Because it seems to betray the song. Because it seems to betray everything. And also because I might agree with her, and that feels like betrayal too.


Simia

Views on the US economy 2024

By most metrics, the American economy is doing well. But the perception of the American economy is much weaker than its actual strength. This finally seems to be slowly changing, and people are realizing that things are actually not that bad.

Here's an article that tries to explain it: because of high interest rates, credit is expensive, including credit card debt, and so is buying a home now.

But if you go beyond the anecdotes as this essay does, and look at the actual data, you will find something else: it is a very partisan thing.

For Democrats, we find that it depends. Basically, the more you fit the dominant group (the richer you are, the older, the better educated, the "whiter", the "maler"), the better your view of the economy.

For Republicans, we don't find any such differentiation. Everyone is negative about it, across the board. Their perception of the economic situation is starkly different from that of their Democratic peers.

Simia

Libertarian cities

I usually try to contain my "Schadenfreude", but reading this article made it really difficult to do so. It starts with the story of Rio Verde Foothills and its lack of water supply, after the development was intentionally built to circumvent zoning regulations regarding water supply, and lists a few other examples, such as

"Grafton, New Hampshire. It’s a tiny town that was taken over by libertarians who moved there en masse to create their vision of heaven on earth. They voted themselves into power, slashed taxes and cut the town’s already minuscule budget to the bone. Journalist Matthew Hongoltz-Hetling recounts what happened next:
'Grafton was a poor town to begin with, but with tax revenue dropping even as its population expanded, things got steadily worse. Potholes multiplied, domestic disputes proliferated, violent crime spiked, and town workers started going without heat. ...'
Then the town was taken over by bears."

The article is worth reading:

The Wikipedia article is even more damning:

"Grafton is an active hub for Libertarians as part of the Free Town Project, an offshoot of the Free State Project. Grafton's appeal as a favorable destination was due to its absence of zoning laws and a very low property tax rate. Grafton was the focus of a movement begun by members of the Free State Project that sought to encourage libertarians to move to the town. After a rash of lawsuits from Free Towners, an influx of sex offenders, an increase of crime, problems with bold local bears, and the first murders in the town's history, the Libertarian project ended in 2016."
Simia

Get Morse code from text

On Wikifunctions we have a function that translates text to Morse code. Go ahead, try it out.

I am stating that mostly in order to see if we can get Google to index the function pages on Wikifunctions, because we initially accidentally had them all set to not be indexed.
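For readers curious what such a translation boils down to, here is a minimal sketch in JavaScript. It is illustrative only, not the actual implementation on Wikifunctions, and the table covers just letters and spaces:

 // Illustrative Morse table for letters; a space becomes a word separator.
 const MORSE = {
   a: '.-', b: '-...', c: '-.-.', d: '-..', e: '.', f: '..-.', g: '--.',
   h: '....', i: '..', j: '.---', k: '-.-', l: '.-..', m: '--', n: '-.',
   o: '---', p: '.--.', q: '--.-', r: '.-.', s: '...', t: '-', u: '..-',
   v: '...-', w: '.--', x: '-..-', y: '-.--', z: '--..', ' ': '/'
 }
 // Map every known character to its code, drop unknown ones, join with spaces.
 const toMorse = (text) =>
   [...text.toLowerCase()]
     .map(ch => MORSE[ch] ?? '')
     .filter(code => code !== '')
     .join(' ')
 // toMorse('hello world') → '.... . .-.. .-.. --- / .-- --- .-. .-.. -..'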


Simia

Playing around with Aliquot

Warning! Very technical, not particularly insightful, and overly long post about hacking, discrete mathematics, and rabbit holes. I don't think that anything here is novel, others have done more comprehensive work, and found out more interesting stuff. This is not research, I am just playing around.

Ernest Davis is an NYU professor, author of many publications (including “Rebooting AI” with Gary Marcus), and a friend on Facebook. Two days ago he posted about Aliquot sequences, and about the fact that it is still unknown how they behave.

What is an Aliquot sequence? Given a number, take all its proper divisors, and add them up. That’s the next number in the sequence.

It seems that most of these either lead to 0 or end in a repeating pattern. But it may also be that they just keep on going indefinitely. We don’t know if there is any starting number for which that is the case. But the smallest candidate for that is 276.

So far, we know the first 2,145 steps for the Aliquot sequence starting at 276. That results in a number with 214 digits. We don’t know the successor of that number.

I was curious. I know that I wouldn’t be able to figure this out quickly, or, probably ever, because I simply don’t have enough skills in discrete mathematics (it’s not my field), but I wanted to figure out how much progress I can make with little effort. So I coded something up.

Anyone who really wanted to take a crack at this problem would probably choose C. Or, on the other side of the spectrum, Mathematica, but I don’t have a license and I am lazy. So I chose JavaScript. There were two hunches for going with JavaScript instead of my usual first language, Python, that would pay off later, but I will reveal them later in this post.

So, my first implementation was very naïve (source code). The function that calculates the next step in the Aliquot sequence is usually called s in the literature, so I kept that name:

 const divisors = (integer) => {
   const result = []
   for(let i = BigInt(1); i < integer; i++) {
     if(integer % i == 0) result.push(i)
   }
   return result
 }
 const sum = x => x.reduce(
   (partialSum, i) => partialSum + i, BigInt(0)
 )
 const s = (integer) => sum(divisors(integer))

I went for BigInt, not plain integers, because Ernest said that the 2,145th step had 214 digits, and the standard numbers in JavaScript stop being exact before we reach 16 digits (at 9,007,199,254,740,991, to be exact), so BigInt it was, which supports arbitrarily long integers.
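Just to illustrate that cutoff (this snippet is illustrative and not part of the script):

 // Past Number.MAX_SAFE_INTEGER (9,007,199,254,740,991), plain numbers lose exactness.
 9007199254740993 === 9007199254740992   // true: both literals round to the same double
 9007199254740993n === 9007199254740992n // false: BigInt keeps them distinct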

The first 30 steps each ran in under a second on my decade-old 4-core Mac, occupying one of the cores, reaching 8 digits, but already the 36th step took longer than a minute - and we had only 10 digits so far. Worrying about the limits of integers turned out to be a bit premature: with this approach I would probably not reach that limit in a lifetime.

I dropped BigInt and just used the normal integers (source code). That gave me a 10x-40x speedup! Now the first 33 steps were faster than a second, reaching 9 digits, and it took until the 45th step, with 10 digits, for a step to take longer than a minute. Unsurprisingly, a constant-factor speedup wouldn’t do the trick here; we’re fighting against an exponential problem, after all.

It was time to make the code less naïve (source code), and the first idea was to not check every number smaller than the target integer for whether it divides it (line 3 above), but only the numbers up to half of the target integer.

 const divisors = (integer) => {
   const result = []
   const half = integer / 2
   for(let i = 1; i <= half; i++) {
     if(integer % i == 0) result.push(i)
   }
   return result
 }

Tiny change. And exactly the expected impact: it ran twice as fast. Now the first 34 steps ran under one second each (9 digits), and the first one to take longer than a minute was the 48th step (11 digits).

Checking up to half of the target still seemed excessive. After all, for factorization we only need to check up to the square root. That should be a more-than-constant speedup. And once we have all the factors, we should be able to quickly reconstruct all the divisors. Now, this is the point where I have to admit that I have a cold or something, and the code for creating the divisors from the factors is probably overly convoluted and slow, but you know what? It doesn’t matter. The only thing that matters will be the speed of the factorization.

So my next step (source code) was a combination of a still quite naïve approach to factorization, with another function that recreates all the divisors.

 const factorize = (integer) => {
   const result = [ 1 ]
   let i = 2
   let product = 1
   let rest = integer
   let limit = Math.ceil(Math.sqrt(integer))
   while (i <= limit) {
     if (rest % i == 0) {
       result.push(i)
       product *= i
       rest = integer / product
       limit = Math.ceil(Math.sqrt(rest))
     } else {
       i++
     }
   }
   result.push(rest)
   return result
 }
 const divisors = (integer) => {
   const result = [ 1 ]
   const inner = (integer, list) => {
     result.push(integer)
     if (list.length === 0) {
       return [ integer ]
     }
     const in_results = [ integer ]
     const in_factors = inner(list[0], list.slice(1))
     for (const f of in_factors) {
       result.push(integer*f)
       result.push(f)
       in_results.push(integer*f)
       in_results.push(f)
     }
     return in_results
   }
   const list = factorize(integer)
   inner(list[0], list.slice(1))
   const im = [...new Set(result)].sort((a, b) => a - b)
   return im.slice(0, im.length-1)
 }

That made a big difference! The first 102 steps all were faster than a second, reaching 20 digits! That’s more than a 100x speedup! And then, after step 116, the thing crashed.

Remember, plain integers are only exact up to 16 digits. The numbers were just too big for the standard integer type. So, back to BigInt. The availability of BigInt in JavaScript was my first hunch for choosing JavaScript (although that would have worked just fine in Python 3 as well). And that led to two surprises.

First, sorting arrays with BigInts is different from sorting arrays with integers. Well, I already find sorting arrays of integers a bit weird in JavaScript. If you don’t specify otherwise, it sorts numbers lexicographically, instead of by value:

 [ 10, 2, 1 ].sort() == [ 1, 10, 2 ]

You need to provide a custom sorting function in order to sort the numbers by value, e.g.

 [ 10, 2, 1 ].sort((a, b) => a-b) == [ 1, 2, 10 ]

The same custom sorting function won’t work for BigInt, though: the comparator is expected to return a regular number, and subtracting two BigInts yields a BigInt. We can write something like this instead:

 .sort((a, b) => (a < b)?-1:((a > b)?1:0))

The second surprise was that BigInt doesn’t have a square root function in the standard library. So I needed to write one. Well, Newton’s method is quickly implemented, or copied and pasted (source code).
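For reference, a minimal sketch of such an integer square root for BigInt via Newton’s method (illustrative; not necessarily identical to the code linked above):

 // Returns the floor of the square root of a non-negative BigInt.
 const bigIntSqrt = (n) => {
   if (n < 2n) return n
   let x = n
   let y = (x + 1n) / 2n
   while (y < x) {
     x = y
     y = (x + n / x) / 2n   // Newton step with integer division
   }
   return x
 }
 // bigIntSqrt(16n) → 4n, bigIntSqrt(17n) → 4n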

Now, with switching to BigInt, we get the expected slowdown. The first 92 steps run faster than a second, reaching 19 digits, and then the first step to take longer than a minute is step 119, with 22 digits.

Also, the actual sequences indeed started diverging, due to the inaccuracy of large JavaScript integers: step 83 resulted in 23,762,659,088,671,304 using integers, but 23,762,659,088,671,300 using BigInt. And whereas that looks like a tiny difference in only the last digit, the number for the 84th step already showed what a big difference it makes: 20,792,326,702,587,410 with integers, and 35,168,735,451,235,260 with BigInt. The two sequences went entirely off from each other.

What was also very evident is that at this point some numbers took a long time, and others were very quick. This is what you would expect from a more sophisticated approach to factorization: it depends on the number and size of the factors. For example, calculating step 126 required factorizing the number 169,306,878,754,562,576,009,556, leading to 282,178,131,257,604,293,349,484, and that took more than 2 hours with that script on my machine. But then in step 128 the result 552,686,316,482,422,494,409,324 was calculated from 346,582,424,991,772,739,637,140 in less than a second.

At that point I also started googling some of the numbers, surfacing a Japanese blog post from ten years ago that listed the first 492 numbers, and confirming that the 450th of these numbers corresponds to a published source. I compared the list with my numbers and was happy to see they matched so far. But I guesstimated I would not reach 492 steps in a lifetime, never mind the actual 2,145.

But that’s OK. Some things just need to be let rest.

That’s also when I realized that I had been overly optimistic because I simply misread the number of steps that had already been calculated when reading the original post: I thought it was 200-something steps, not 2,000-something steps. Or else I would have never started this. But now, thanks to the sunk cost fallacy, I wanted to push it just a bit further.

I took the biggest number that was posted on that blog post, 111,953,269,160,850,453,359,599,437,882,515,033,017,844,320,410,912, and let the algorithm work on that. No point in calculating steps 129 (which is how far I had come) through 492 if someone else had already done that work.

While the computer was computing, I leisurely looked for libraries for fast factorization, though I told myself that there was no way I was going to install some scientific C library for this. And indeed, I found a few, such as the msieve project. Unsurprisingly, it was in C. But I also found a website, CrypTool-Online, with msieve on the Web (that’s pretty much one of the use cases I hope Wikifunctions will also support rather soonish). And there, I could not only run the numbers I had already calculated locally, getting results at subsecond speed that had taken minutes and hours on my machine, but also the largest number from the Japanese website was factorized in seconds.

That just shows how much better a good library is than my naïve approach. I was slightly annoyed and challenged by how much faster it was. It probably also runs on some fast machine in the cloud, I thought. Pretty brave to put a site like that up, and potentially have other people profit from the factorization of large numbers, for example for attacking cryptographic keys, on your hardware.

The site is fortunately open source, and when I checked the source code I was surprised and delighted, and I understood why they would make it available through a website: they don’t factorize the number in the cloud, but on the edge, in your browser! If someone uses a lot of resources, they don’t mind: it’s that person’s own resources!

They took the msieve C library and compiled it to WebAssembly. And now I was intrigued. That’s something that’s useful for my work too, to better understand WebAssembly, as we use that in Wikifunctions as well, although for now server side. So I rationalized: why not see if I can get that to run on Node?

It was a bit of guessing and hacking. The JavaScript binding was written to be used in the browser, and the binary was supposed to be loaded through fetch. I guessed a few modifications, replacing fetch with Node’s FS, and managed to run it in Node. The hacks are terrible, but again, it doesn’t really matter as long as the factorization speeds up.
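The gist of the change was roughly the following. This is a sketch with a hypothetical file name, not the actual binding code, which goes through the module’s own Emscripten-generated glue:

 // Browser-style loading, roughly what the original binding did:
 //   const bytes = await (await fetch('msieve.wasm')).arrayBuffer()
 // Node-style loading, the substitution I made:
 const fs = require('fs')
 const bytes = fs.readFileSync('./msieve.wasm')  // hypothetical local path
 // ... then hand `bytes` to the module's own instantiation code as before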

And after a bit of trying and experimenting, I got it running (source code). And that was my second hunch for choosing JavaScript: it has great integration for using WebAssembly, and I figured it might come in handy for replacing the pure JavaScript solution. And now, indeed, the factorization was happening in WebAssembly. I didn’t need to install any C libraries, no special environments, no nothing. I could just run Node, and it worked. I am absolutely positive there is cleaner code out there, and I am sure I mismanaged that code terribly, but I got it to run. At the first run I found that it added an overhead of 4-5 milliseconds to each step, making the first few steps much slower than with pure JavaScript. That was a moment of disappointment.

But then: the first 129 steps, which I had waited hours and hours for, zoomed by before I could even look. In less than a minute, all the 492 steps published on the Japanese blog post were done, allowing me to use them as a reference and check correctness so far. The overhead was irrelevant; even across the 2,000 steps it wouldn’t amount to more than 10 seconds.

The first step that took longer than a second was step 596, working on a 58 digit number; all the first 595 steps had taken less than a second each. The first step that took more than a minute was step 751, a 76 digit number, taking 62 seconds to factorize 3,846,326,269,123,604,249,534,537,245,589,642,779,527,836,356,985,238,147,044,691,944,551,978,095,188. The next step was done in 33 milliseconds.

The first 822 steps took an hour. Step 856 was reached after two hours, so that’s another 34 steps in the second hour. As expected, things slowed down again. Faster machines, modern architectures, GPUs, TPUs, and potentially better algorithms, such as CADO-NFS or GGNFS, could speed this up by a lot, but I am happy with how far I’ve gotten with, uhm, little effort. After 10 hours, we had 943 steps and a 92 digit number; it took 20 hours to get to step 978 and 95 digits. My goal was to reach step 1,000, and then publish this post and call it a day. By then, a number of steps already took more than an hour to compute. I hit step 1,000 after about 28 and a half hours, with a 96 digit number: 162,153,732,827,197,136,033,622,501,266,547,937,307,383,348,339,794,415,105,550,785,151,682,352,044,744,095,241,669,373,141,578.

I rationalize this all with “I had fun” and “I learned something about JavaScript, Node, and WebAssembly that will be useful for my job too”. But really, it was just one of those rabbit holes that I love to dive into. And if you have read this post so far, you seem to be similarly inclined. Thanks for reading, and I hope I didn’t steal too much of your time.

I also posted a list of all the numbers I calculated so far, because I couldn’t find such a list on the Web, and I would have found one helpful. Maybe it will be useful for something. I doubt it. (P.S.: the list was already on the Web, I just wasn’t looking carefully enough. Both OEIS and FactorDB have the full list.)

I don’t think any of the lessons here are surprising:

  • for many problems, a naïve algorithm will take you to a good start, and might be sufficient
  • but never fight an exponential problem that has grown big enough with constant speedups such as faster hardware; it’s a losing proposition
  • but hey, first wait and see whether it actually gets big enough! Many problems with exponential complexity are perfectly solvable with naïve approaches if the problem stays small enough
  • better algorithms really make a difference
  • use existing libraries!
  • advancing research is hard
  • WebAssembly is really cool and you should learn more about it
  • the state of WebAssembly out there can still be wacky, and sometimes you need to hack around to get things to work

In the meantime I played a bit with other numbers (thanks to having a 4-core computer!), and I will wager one ambitious hypothesis: if 276 diverges, so do 276,276; 276,276,276; 276,276,276,276; and so on, all the way to 276,276,276,276,276,276,276,276,276, i.e. 9 times "276".

I know that 10 times "276" converges after 300 steps, but I have tested all the other "lexical multiples" of 276 and reached 70 digit numbers after hundreds of steps. For 276,276 I pushed it further, reaching 96 digits at step 640. (And for at least 276,276 we should already know the current best answer, because the Wikipedia article states that all the numbers under one million have been calculated, and only about 9,000 have an unclear fate. We should be able to check whether 276,276 is one of those, which I haven't gotten around to yet.)

Again, this is not research, but just fooling around. If you want actual systematic research, the Wikipedia article has a number of great links.

Thank you for reading so far!

P.S.: Now that I have played around, I started following some of the links, and wow, there's a whole community with great tools and infrastructure that has been having fun with Aliquot sequences and prime numbers for decades. This is extremely fascinating.

Simia

The Surrounding Sea

Explore the ocean of words in which we are all swimming, day in, day out. The Surrounding Sea is a site that allows you to browse the lexicographic data in Wikidata along four dimensions:

  • alphabetically, like in a good old-fashioned dictionary
  • through translations and synonyms
  • where does this word come from, and where did it go
  • narrower and wider words, describing a hierarchy of meanings

Wikidata contains over 1.2 million lexicographic entries, but you will see the many gaps when exploring the sea of words. Please join us in charting out more of the world of words.

Happy 23rd birthday to Wikipedia and the movement it started!

Simia

Das Mädchen Doch


They told her mother
She would never have children
And when she was born
Her mother named her
Doch

They said she was weak
And small and sick
And that she did not
Have long to live
Doch

Her mother hoped
That she would grow up in a world
In which everyone was treated the same
But unfortunately
Doch

They said math and cars
Were not for girls
That she should be interested
In dolls and in clothes
Doch

They said the world
Is the way it is
And changing it
Was not for small sick girls
Doch

They said good that you brought it up
We should think about it
Let us debate it now
And we (not you) will then decide
Doch

They said you cannot have everything
You have to choose
But so selfish
I mean, not wanting children
Doch

They said she was indecent
Such a life was not proper
Called her indecent names
Who did she think she was
Doch

They said that just will not do
Such a life is no life
That is already very different now
That is not simply envy
Doch

They said we are simply not like that
And do not want to be like that
We are happy the way we are
And that is why you may not be happy
Doch

Simia

Languages with the best lexicographic data coverage in Wikidata 2023

Languages with the best coverage as of the end of 2023

  1. English 92.9%
  2. Spanish 91.3%
  3. Bokmal 89.1%
  4. Swedish 88.9%
  5. French 86.9%
  6. Danish 86.9%
  7. Latin 85.8%
  8. Italian 82.9%
  9. Estonian 81.2%
  10. Nynorsk 80.2%
  11. German 79.5%
  12. Basque 75.9%
  13. Portuguese 74.8%
  14. Malay 73.1%
  15. Panjabi 71.0%
  16. Slovak 67.8%
  17. Breton 67.3%

What does the coverage mean? Given a text (usually Wikipedia in that language, but in some cases a corpus from the Leipzig Corpora Collection), how many of the occurrences in that text are already represented as forms in Wikidata's lexicographic data.

The list contains all languages where the data covers more than two thirds of the selected corpus.
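A rough sketch of how such a coverage number can be computed (illustrative only; the actual measurement pipeline is more involved, handling tokenization and casing per language):

 // Share of token occurrences in a corpus that match a known form.
 const coverage = (tokens, knownForms) => {
   const covered = tokens.filter(t => knownForms.has(t.toLowerCase())).length
   return covered / tokens.length
 }
 // coverage(['the', 'cat', 'sat'], new Set(['the', 'cat'])) → 0.666...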

Simia

Progress in lexicographic data in Wikidata 2023

Here are some highlights of the progress in lexicographic data in Wikidata in 2023.

What does the coverage mean? Given a text (usually Wikipedia in that language, but in some cases a corpus from the Leipzig Corpora Collection), how many of the occurrences in that text are already represented as forms in Wikidata's lexicographic data. Note that every additional percent gets much more difficult than the previous one: an increase from 1% to 2% usually needs much, much less work than one from 91% to 92%.

Simia

RIP Niklaus Wirth

RIP Niklaus Wirth;

BEGIN

I don't think there's a person who created more of the programming languages I have used than Wirth: Pascal, Modula, and Oberon; maybe Guy Steele, depending on what you count;

Wirth is also famous for Wirth's law: software becomes slower more rapidly than hardware becomes faster;

He received the 1984 Turing Award, and had an asteroid named after him in 1999; Wirth died at the age of 89;

END.

Simia
