Lexical masks in JSON

From Simia
Revision as of 14:18, 15 November 2020 by Denny

We have previously released lexical masks as ShEx files: schemata for lexicographic forms that can be used to validate whether the data is complete.

We found it quite challenging to turn these ShEx files into forms for entering the data, such as Lucas Werkmeister’s Lexeme Forms. So we adapted our approach slightly: we now publish JSON files that keep the structures in a format that is easier to parse and understand, and we also provide a script that translates these JSON files into ShEx Entity Schemas.
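
To give a rough idea of what such a translation involves, here is a minimal sketch in Python. It is not the published script, and the JSON layout shown (the keys "language", "lexical_category", "forms" and "features") is a hypothetical one made up for illustration; the real mask format is described in the documentation linked below. The output is a simplified ShEx shape that only lists the expected forms and their grammatical features.

```python
import json

# Prefixes used by Wikidata's RDF mapping for lexemes.
PREFIXES = (
    "PREFIX dct: <http://purl.org/dc/terms/>\n"
    "PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>\n"
    "PREFIX wikibase: <http://wikiba.se/ontology#>\n"
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
)

# Hypothetical mask layout (an assumption, not the published format):
# English nouns are expected to have a singular and a plural form.
EXAMPLE_MASK_JSON = """
{
  "language": "Q1860",
  "lexical_category": "Q1084",
  "forms": [
    {"features": ["Q110786"]},
    {"features": ["Q146786"]}
  ]
}
"""


def form_to_shex(form):
    """Render one expected form as a nested ShEx triple constraint."""
    features = " ; ".join(
        f"wikibase:grammaticalFeature [ wd:{qid} ]" for qid in form["features"]
    )
    return f"  ontolex:lexicalForm {{ {features} }}"


def mask_to_shex(mask, shape_name="lexeme"):
    """Translate a (hypothetical) JSON mask into a simplified ShEx shape."""
    constraints = [
        f"  dct:language [ wd:{mask['language']} ]",
        f"  wikibase:lexicalCategory [ wd:{mask['lexical_category']} ]",
    ]
    constraints += [form_to_shex(form) for form in mask["forms"]]
    body = " ;\n".join(constraints)
    return f"{PREFIXES}\n<#{shape_name}> {{\n{body}\n}}\n"


example_mask = json.loads(EXAMPLE_MASK_JSON)
print(mask_to_shex(example_mask))
```

The point of the sketch is the direction of the translation: the JSON stays a plain list of expected forms that a data entry tool can read directly, while the more verbose ShEx is generated from it.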

Furthermore, we published more masks for more languages and parts of speech than before.

Full documentation can be found on the wiki: https://www.wikidata.org/wiki/Wikidata:Lexical_Masks#Paper

Background can be found in the paper: https://www.aclweb.org/anthology/2020.lrec-1.372/

Thanks, Bruno, Saran, and Daniel, for your great work!
