<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://simia.net/index.php?action=history&amp;feed=atom&amp;title=Goal_for_Wikidata_lexicographic_data_coverage_2023</id>
	<title>Goal for Wikidata lexicographic data coverage 2023 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://simia.net/index.php?action=history&amp;feed=atom&amp;title=Goal_for_Wikidata_lexicographic_data_coverage_2023"/>
	<link rel="alternate" type="text/html" href="http://simia.net/index.php?title=Goal_for_Wikidata_lexicographic_data_coverage_2023&amp;action=history"/>
	<updated>2026-05-10T00:47:58Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.32.0</generator>
	<entry>
		<id>http://simia.net/index.php?title=Goal_for_Wikidata_lexicographic_data_coverage_2023&amp;diff=2432&amp;oldid=prev</id>
		<title>Denny: Created page with &quot;{{pubdate|{{subst:CURRENTDAY}}|{{subst:CURRENTMONTHNAME}}|{{subst:CURRENTYEAR}}}}  At the beginning of 2022, Wikidata had 807 Croatian word forms, covering 5.8% of a Croatian...&quot;</title>
		<link rel="alternate" type="text/html" href="http://simia.net/index.php?title=Goal_for_Wikidata_lexicographic_data_coverage_2023&amp;diff=2432&amp;oldid=prev"/>
		<updated>2022-12-28T21:28:58Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;{{pubdate|{{subst:CURRENTDAY}}|{{subst:CURRENTMONTHNAME}}|{{subst:CURRENTYEAR}}}}  At the beginning of 2022, Wikidata had 807 Croatian word forms, covering 5.8% of a Croatian...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{pubdate|28|December|2022}}&lt;br /&gt;
&lt;br /&gt;
At the beginning of 2022, Wikidata had 807 Croatian word forms, covering 5.8% of a Croatian language corpus (Croatian Wikipedia). One of my goals this year was to significantly increase that coverage by adding word forms to Wikidata week by week. Together with a still small number of contributors, we pushed coverage to 40% just in time for the end of the year. With only 3,124 forms, we now cover 40% of all word occurrences in the Croatian Wikipedia, i.e. 11.4 million word occurrences (tokens).&lt;br /&gt;
&lt;br /&gt;
Since every percent is more difficult to add than the last, for next year I aim for us to reach 60% coverage, or 5.7 million more word occurrences. Linked below is a list of the most frequent words in the corpus that are still missing. Let's see how many forms will be covered by the end of 2023! I think that's ambitious, even though it is, in terms of coverage, only half of what we achieved this year. But as said, every subsequent percentage point will be more difficult than the previous one.&lt;br /&gt;
&lt;br /&gt;
Statistics and missing words for 55 languages:&lt;br /&gt;
https://www.wikidata.org/wiki/Wikidata:Lexicographical_coverage&lt;br /&gt;
&lt;br /&gt;
Current statistics for Croatian:&lt;br /&gt;
https://www.wikidata.org/wiki/Wikidata:Lexicographical_coverage/hr/Statistics&lt;br /&gt;
&lt;br /&gt;
Statistics as of end of year 2022:&lt;br /&gt;
https://www.wikidata.org/w/index.php?title=Wikidata:Lexicographical_coverage/hr/Statistics&amp;amp;oldid=1797161415&lt;br /&gt;
&lt;br /&gt;
Statistics for end of year 2021:&lt;br /&gt;
https://www.wikidata.org/w/index.php?title=Wikidata:Lexicographical_coverage/hr/Statistics&amp;amp;oldid=1551737937&lt;br /&gt;
&lt;br /&gt;
List of most frequent missing forms in Croatian: https://www.wikidata.org/wiki/Wikidata:Lexicographical_coverage/hr/Missing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{tag|Simia}}&lt;br /&gt;
&amp;lt;noinclude&amp;gt;{{simiapost|english}}&amp;lt;/noinclude&amp;gt;&lt;/div&gt;</summary>
		<author><name>Denny</name></author>
		
	</entry>
</feed>