

Experiment to understand LLMs better

Here’s an experiment I would love to do if I had the resources. Just to start gaining some more understanding of how LLMs work.

  1. Train an LLM Z on a lot of English text.
  2. Ensure that the LLM correctly uses “went”, the past tense of “go”, in its responses.
  3. Ask the LLM directly what the past tense of “to go” is, and expect “went”.
  4. Remove all sentences / texts from the corpus that contain the word “went”. Add more text to the corpus to make it roughly the same size again.
  5. Train an LLM A on that corpus.
  6. Use the same prompts to see what the LLM uses instead of “went”.
  7. Ask the LLM directly what the past tense of “to go” is. I expect “goed”?
  8. How many example sentences / texts containing the word “went” does one need to add to the corpus of LLM A and retrain in order for the resulting LLM to get it right? Is one enough? Ten? A thousand?
  9. Instead of the implicit training data, add an explicit sentence, ‘The past tense of “to go” is “went”.’, to the corpus of LLM A and retrain. Does the trained LLM now get it right? Does it use it right? Does it answer the explicit question correctly?
  10. Add an explicit sentence to the prompt of LLM A, instead of retraining it. Does it use the word right? Does it answer the explicit question correctly?
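
The corpus manipulation in steps 4 and 8 could be sketched roughly as follows. This is only a minimal illustration, assuming the corpus fits in memory as a list of texts; the function names `remove_word` and `add_back` and the tiny sample corpus are made up for this sketch, and a real training corpus would need a streaming version.

```python
import re

# Match "went" as a whole word, in any capitalization.
# Words that merely start with "went" (e.g. "Wentworth") do not match.
WENT = re.compile(r"\bwent\b", re.IGNORECASE)


def remove_word(corpus, pattern=WENT):
    """Step 4: drop every text that contains the target word."""
    return [text for text in corpus if not pattern.search(text)]


def add_back(filtered, held_out, n):
    """Step 8: re-add the first n held-out texts that contain the word."""
    return filtered + held_out[:n]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "She went to the market.",
    "They will go home soon.",
    "He has gone already.",
    "We WENT there last year.",
    "Wentworth is a town.",
]

filtered = remove_word(corpus)
held_out = [text for text in corpus if WENT.search(text)]

# Corpus for one retraining run with a single "went" example added back.
corpus_with_one = add_back(filtered, held_out, 1)
```

Varying `n` in `add_back` (one, ten, a thousand) would give the series of corpora that step 8 asks about. Whether whole-word matching is the right filter is itself a design choice: it leaves "gone" and "going" in the corpus, which seems to be what the experiment intends.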

If there is some similar work to this out there, or if anyone has some work like this, I’d be very curious for pointers.

P.S.: Also, I would love to see whether people who do research on LLMs could correctly predict the result of this experiment ;)

Taking a self-driving car

Ten years ago, my daughter had just been born and I had just joined Google, which was working on self-driving cars. I was always hoping that my daughter would not need to learn how to drive a car (but that she could if she wanted to). Over the last ten years I lost confidence in that hope.

Yesterday, thanks to my wife organizing it, we took our first ride in a self-driving car, driving about ten minutes through San Francisco. I guess a worldwide rollout will take time, maybe a lot of time, but what can I say: it drove very well.

Sleeping Lady with a Black Vase

31 May 2024

In 2009, a Hungarian art historian was watching the movie Stuart Little with his 3-year-old daughter. And he's like "funny, that painting that's used in the set looks like that 1928 black and white photograph I have seen, of a piece of art which has been lost". So he sends a few emails...

Turns out, it *is* the actual artwork by Róbert Berény (1887-1953) which was last seen in public in 1928, and somehow made it to Sony, where it was used in a number of soap opera episodes and in Stuart Little.

The Ring verse in German

28 May 2024

I finally got the Lord of the Rings in English. I never read it in its native English, only in a German translation, about thirty years ago.

And already on the first page I am stumped: the ring verse seems to me sooo much better in German than in English. Now, it is absolutely possible that this is due to me having read it as an impressionable teenager and having carried the translation with me for three decades and thus developed fondness and familiarity with it, but I think it's more than that.

Here are the verses in English, German, and a literal back-translation of the German to English:

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all,
One Ring to find them,
One Ring to bring them all,
and in the darkness bind them
In the Land of Mordor where the Shadows lie.

German translation by von Freymann:

Drei Ringe den Elbenkönigen hoch im Licht,
Sieben den Zwergenherrschern in ihren Hallen aus Stein,
Den Sterblichen, ewig dem Tode verfallen, neun,
Einer dem dunklen Herrn auf dunklem Thron
Im Lande Mordor, wo die Schatten drohn.
Einen Ring, sie zu knechten, sie all zu finden,
ins Dunkle zu treiben und ewig zu binden
Im Lande Mordor, wo die Schatten drohn.

Back-translation of her translation by me:

Three Rings for the Elven kings high in the light,
Seven for the Dwarf-lords in their halls of stone,
For the mortals, eternally doomed to death, nine,
One for the Dark Lord on dark throne
In the Land of Mordor, where the Shadows loom.
One Ring, to enslave them, to find them,
to drive to Darkness, and forever bind them
In the Land of Mordor, where the Shadows loom.

The differences are small, but I find the selection of words by the translator to be stronger and more evocative than Tolkien's original. Which is amazing. Thanks to the great Ebba-Margareta von Freymann for her wonderful translation of the poems!

Originally, the publisher Klett had trouble with translating Tolkien's poems, but Ebba-Margareta had been working on translations of Tolkien's poems for many years, and by using her translations, Klett did a great service to the book for the German-speaking world.


The height of Anson Mount

26 May 2024

Slop is filling up the Internet.

Today my Google Now feed even suggested (!) the following page which was focused solely on the height of Anson Mount. Now I assume Google thinks I'm interested in the actor because I've read about Star Trek.

https://berkah.blob.core.windows.net/ernews/how-tall-is-anson-mount.html

The article has a certain fascination, because it claims to be the ultimate guide to Anson Mount's height, and it goes into a lot of detail about it, for example explaining that height is often measured in feet and inches, or how having more height helps Mount find better fitting clothes.

It's also fascinating because it gives his height as 6'3 / 1.91. Google Knowledge Graph claims 6'1 / 1.85 without a source. And IMDb states 5'11½ / 1.82. The website Celebrity Heights lists 5'11¼ / 1.81. I kid you not.

That makes me wonder whether I'm yearning back to times when people were publishing stuff like this (I'm not):

https://winteriscoming.net/2021/06/17/james-gunn-star-trek-anson-mount-fight-twitter-actors-lie-height/

Here we see reporting about a Twitter discussion between Mount and director James Gunn about actors lying about their height, and Mount seemingly being touchy about that subject.

The algorithmically pushed article also mentions Mount's place of birth in Tennessee (Wikipedia though says Illinois, but trust whom you will).

The Web has, almost from the beginning, been a place that you shouldn't trust blindly. I used to trust Google to be a first layer of defense. But the last few weeks indicate that this is no longer the case. Google will now push AI generated slop right to me, whereas it should try to keep me from even pulling it from the Web. I hope Google will figure that out.

In the last few weeks it's getting increasingly difficult to get correct information on the Web. I'm noticing it around Pokemon Go, where I look up whether a Pokemon has already been released, or how to evolve it. I get arbitrary answers, which I found plain wrong several times. Google's results are not ranked by trustworthiness, and now I have to start to remember which sites to trust, which sucks.

This is going to be exhausting.

(And if you think this is only true about pop culture stuff, then bless your heart)

Little Richard and James Brown

When Little Richard started becoming more famous, he had already signed up for a number of gigs, but much better opportunities were now coming in. He was worried about his reputation, so he did not want to cancel the previously agreed gigs, but he also did not want to miss the new opportunities. Instead he sent a different singer who was introduced as Little Richard, because most concert goers back then did not know exactly what Little Richard looked like.

The stand-in was James Brown, who at this point was unknown, and who later had a huge career, becoming an inaugural inductee to the Rock and Roll Hall of Fame in 1986, in the same class as Little Richard.

(I am learning a lot from and am enjoying Andrew Hickey's brilliant podcast "A History of Rock and Roll in 500 Songs")

Johnny Cash and Stalin

Johnny Cash was the first American to learn about Stalin's death.

At that time, Cash was a member of the Armed Forces and stationed in Germany. According to Cash, he was the one to intercept the Morse code message about Stalin's death before it was announced.

The Heat Death of the Internet

Good observations, and closing on a hopeful note. Short and pointed read.

Beyoncé's Number One in Country

Beyoncé very explicitly announced her latest album to be a country album, calling it "Cowboy Carter", and her single "Texas Hold 'Em" made her the first Black woman to top Billboard's Hot Country Songs chart.

It is good that Beyoncé made it so glaringly obvious that her song is a country song. The number of Black artists to have topped the Hot Country Songs chart is surprisingly small: Charley Pride in the 70s, Ray Charles in a duet with Willie Nelson for one week in 1984, and then Darius Rucker and Kane Brown in the last decade or two.

One possible reason why it is so hard for Black artists to chart in this particular genre: "Old Town Road", the debut single by Lil Nas X, was first listed on the Hot Country Songs chart, but Billboard then decided that this was a mistake and recategorized the song, taking it off the country charts in March 2019. Had it not been removed, it would have become the Number One hit on April 6, 2019.

Billboard published a long statement explaining that this decision had nothing to do with racism.

Cowboy Carter was released in exactly the same week, five years after Old Town Road would have hit Number One.

I guess Beyoncé really wanted to make sure that everyone knows that her album and single are country.

War in the shadows

A few years ago I learned with shock and surprise that in the 1960s and 1970s Croatians were assassinated by the Yugoslav secret service in other countries, such as Germany, and that the German government back then chose to mostly look away. That upset me. In the last few weeks I listened to a number of podcasts that went into more detail about these events, and it turned out that some of those murdered Croatians were entangled with the WW2 fascist Croatian Ustasha regime, either by being Ustasha themselves, or by actively working towards recreating the Ustasha regime in Croatia.

Some of the people involved were actively pursuing terrorist acts - killing diplomats and trying to kill politicians, hijacking and possibly downing airplanes, bombing cinemas, and even attempting an actual armed uprising.

There was a failed attempt to plant seventeen bombs along the Croatian Adriatic coast, on tourist beaches, during the early tourist season, and to detonate them all simultaneously, in order to starve Yugoslavia of its income from tourism.

Germany itself struggled with these events: its own secret service was tasked with protecting the German state, and it was initially unclear how to deal with organizations whose goal was to destabilize a foreign government. Laws and rules were changed in order to deal with the Croatian extremists, rules that were later applied to the PLO, IRA, Hamas, etc.

Knowing a bit more of the background, where it seems that a communist regime was assassinating fascists and terrorists, does not excuse these acts, nor the German inactivity. These were political assassinations without due process. But it makes it a bit more understandable why the German post-Nazi administration, which was at that time busy with its own wave of terror by the Rote Armee Fraktion (RAF), was not giving more attention to these events. And Germany received some of its due when Yugoslavia captured some of the kidnappers and murderers of Hanns Martin Schleyer, and did not extradite them to Germany, but let them go, because Germany did not agree to hand over Croatian separatists in return.

Croatians had a very different reputation in the 1970s than they have today.

I still feel like I have a very incomplete picture of all of these events, but so many things happened that I had no idea about.

Source podcasts in German