Languages with the best lexicographic data coverage in Wikidata 2023

From Simia
Revision as of 15:17, 8 January 2024 by Denny

Languages with the best coverage as of the end of 2023

  1. English 92.9%
  2. Spanish 91.3%
  3. Bokmål 89.1%
  4. Swedish 88.9%
  5. French 86.9%
  6. Danish 86.9%
  7. Latin 85.8%
  8. Italian 82.9%
  9. Estonian 81.2%
  10. Nynorsk 80.2%
  11. German 79.5%
  12. Basque 75.9%
  13. Portuguese 74.8%
  14. Malay 73.1%
  15. Panjabi 71.0%
  16. Slovak 67.8%
  17. Breton 67.3%

What does coverage mean? Given a text (usually the Wikipedia in that language, but in some cases a corpus from the Leipzig Corpora Collection), it is the share of word occurrences in that text that are already represented as forms in Wikidata's lexicographic data.
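The definition above can be illustrated with a minimal sketch. It assumes a simple lowercase, regex-based tokenization and a plain set of known forms; the tokenization actually used for the numbers in this post is not described here, and the names `token_coverage`, `corpus` and `forms` are made up for illustration.

```python
import re
from collections import Counter


def token_coverage(text: str, known_forms: set[str]) -> float:
    """Share of word occurrences in `text` whose form is in `known_forms`.

    Assumption: tokens are lowercased runs of word characters, and
    `known_forms` holds lowercase strings. Real pipelines may differ.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Count every occurrence of a token whose form is known, not just distinct tokens.
    covered = sum(n for token, n in counts.items() if token in known_forms)
    return covered / len(tokens)


# Toy example: 11 word occurrences, 9 of them have a known form.
corpus = "The cat sat on the mat and the dog sat too"
forms = {"the", "cat", "sat", "on", "and", "dog"}
print(f"{token_coverage(corpus, forms):.1%}")  # 81.8%
```

Note that coverage is counted per occurrence: a frequent word with a known form contributes once for every time it appears in the corpus.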

The list contains all languages where the data covers more than two thirds of the selected corpus.
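As a sketch of that cut-off, the following filters a mapping of coverage values by the two-thirds threshold. The three real values are taken from the list above; the entry "Hypothetical language" and the function name `above_threshold` are invented purely to show a language being excluded.

```python
THRESHOLD = 2 / 3  # "more than two thirds" of the selected corpus

coverage = {
    "English": 0.929,
    "Slovak": 0.678,
    "Breton": 0.673,
    "Hypothetical language": 0.500,  # made-up entry, not from the data
}


def above_threshold(values: dict[str, float], threshold: float = THRESHOLD):
    """Return (language, coverage) pairs strictly above the threshold, best first."""
    kept = [(lang, cov) for lang, cov in values.items() if cov > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


for rank, (lang, cov) in enumerate(above_threshold(coverage), start=1):
    print(f"{rank}. {lang} {cov:.1%}")
```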
