The Center for Language and Speech Technologies and Applications (TALP UPC) has collaborated with the Institute of Catalan Studies (IEC) to develop a disambiguator that classifies the words in the reference corpus of modern Catalan morphologically and syntactically and determines their lemma, or dictionary form (the form under which a word is listed in the dictionary). Ten million words from texts from different sources (books, novels, newspapers, etc.) were compiled in a large database. The program then categorised these words so that lexicographers can establish how they are used, the meanings they are generally given, and the expressions in which they appear, in order to create a prescriptive dictionary of modern Catalan.
For this project, TALP UPC used language processing technologies that could also be applied in sectors that handle large amounts of information, such as health, finance, and the management of emergencies and services.