Hungarian spellcheckers
The peculiar complexities of the Hungarian language presented a unique challenge to spellchecker programmers in the early days of computing. The solutions they were forced to find opened up new avenues in the field of language-processing algorithms.

Background
Especially in the early decades of computer technology, creating spell-checking programs for some languages, such as Hungarian, posed particular difficulties.
English usually has just two alphabetical variants of a word (its singular and plural forms), so a spellchecker with a database of 50,000 words can recognise about 100,000 word forms. The Hungarian language is completely different, because it is agglutinative:[1] basic words have a great many alphabetical variants depending on the formatives, conjugations and suffixes used, so a spellchecker with a database of 50,000 base words would have to grow by a factor of two to three thousand to recognise all variants of those words.[1]
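For illustration, a back-of-the-envelope calculation with the figures above can be sketched in Python (the per-stem form count below is simply the cited factor of two to three thousand, not a measured dictionary size):

```python
# Rough arithmetic behind the size estimates quoted above; the constants are
# the article's illustrative figures, not measured dictionary statistics.
ENGLISH_STEMS = 50_000
ENGLISH_FORMS_PER_STEM = 2          # roughly: singular and plural

HUNGARIAN_STEMS = 50_000
HUNGARIAN_FORMS_PER_STEM = 2_500    # "two to three thousand" variants per stem

print(f"Flat English word list:   ~{ENGLISH_STEMS * ENGLISH_FORMS_PER_STEM:,}")     # ~100,000
print(f"Flat Hungarian word list: ~{HUNGARIAN_STEMS * HUNGARIAN_FORMS_PER_STEM:,}")  # ~125,000,000
```

A flat word list of that size was impractical on the hardware of the period, which is the difficulty the generative approach described in the next section addresses.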
NyelvÉsz
NyelvÉsz was a Hungarian spellchecker and corrector programme created in the late 1980s by computer engineer and programmer Tibor Béres, linguist Lajos Seregy, and co-programmers József Vanczák and Miklós Hámori. The name of the programme was a play on words: the combination of the Hungarian words for language (nyelv) and mind (ész) forms nyelvész, the Hungarian word for linguist, a pun highlighted by the capitalisation "NyelvÉsz".
The inherent complexity of the Hungarian language led to the creation of a new programming model that could subsequently be applied universally to other languages: the generative model.[2] The novelty of the approach lay in the fact that the programme was not a parser but an assembler, a constructor, a generator of words: its word database contained only the basic words, together with information about all possible formatives, markers, conjugations and suffixes that each word could take. It also grouped words with the same substitutability together and included an algorithm describing how to perform the substitutions for each group.
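The generative approach can be sketched as follows; the paradigm names, substitution rules and example stems in this minimal sketch are invented for illustration and are not NyelvÉsz's actual data or code:

```python
# Minimal sketch of a generative spellchecker: the lexicon stores only stems
# tagged with a paradigm group; each paradigm stores the substitution rules
# shared by every word of that group. All data here is illustrative.

# Each paradigm maps to (strip, add) substitution rules applied to the stem.
PARADIGMS = {
    "NOUN_A": [("", ""), ("a", "ák"), ("a", "át"), ("a", "ában")],  # alma -> almák, almát, almában
    "NOUN_O": [("", ""), ("", "ok"), ("", "ot"), ("", "ban")],      # ablak -> ablakok, ablakot, ablakban
}

# The word database contains only the basic words and their group label.
LEXICON = {"alma": "NOUN_A", "ablak": "NOUN_O"}

def generate_forms(stem: str, paradigm: str) -> set[str]:
    """Apply every substitution rule of the paradigm to the stem."""
    forms = set()
    for strip, add in PARADIGMS[paradigm]:
        base = stem[: len(stem) - len(strip)] if strip else stem
        forms.add(base + add)
    return forms

def is_correct(word: str) -> bool:
    """A word is accepted if some stem in the lexicon can generate it."""
    return any(word in generate_forms(stem, p) for stem, p in LEXICON.items())

print(is_correct("almák"))     # True: generated from the stem "alma"
print(is_correct("almáknak"))  # False: that rule is not in this toy paradigm
```

Because every member of a group shares the same rules, only the stems and a small rule table need to be stored, rather than every inflected form.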
In addition, the programme required a level of data compression that was unprecedented at the time: NyelvÉsz needed only 300 KB of memory, which one of its authors visualised as pouring 400 hectolitres of wine into a single bottle.[1]
In the years after the regime change in Hungary (1989), entrepreneurship, companies and copyright were still in their infancy, and the country's isolation from international academic and entrepreneurial life resulted in a paucity of reliable English-language sources on the subject. Yet a reliable, independent and in-depth source from the period, a detailed programme test supplemented with interviews in Computerworld Hungary, the leading computer journal of the time,[3] acknowledged it as the first Hungarian spellchecker, while also demonstrating its limitations and calling for further improvements.
One of NyelvÉsz's authors, the linguist Lajos Seregy, consequently referred to it as the first Hungarian spellchecker at the first series of official Hungarian conferences on applied linguistics,[1][4][2] where he lectured on the topic for years; transcripts of all lectures presented at these conferences were subsequently published in reviewed and edited volumes.
From a cultural and linguistic perspective, the relevance of the first Hungarian spellchecker program lies in the fact that its creation required basic linguistic research and resulted in a mathematically precise description of the Hungarian language. Since the program had to contain an algorithm for every existing paradigm, the number of paradigms within each word category became precisely definable: it became possible to determine that Hungarian nouns have 248 declension paradigms, while the number of conjugations, i.e. verb paradigms, is close to 600.[1]
From the perspective of programming history, and especially of language-processing tools, NyelvÉsz's merit lies in the fact that it was the first to prove that creating a Hungarian-language spellchecker was feasible with the tools available at the time. It was presented at the IFABO IT event in 1991, when the general opinion was that a Hungarian spellchecker was impossible to make: the May 1991 issue of the computer magazine Alaplap still described the task as "very complex", "mathematically ill-defined", and therefore "practically impossible to implement".[4]
LEKTOR, Helyes-e?, Helyeske, Hunspell
The programme needed many improvements, and one of its authors, Lajos Seregy, participated in the development of improved versions under the name LEKTOR. In the rapidly developing field of language technology, several other spellcheckers quickly appeared alongside LEKTOR, the most prominent being Helyes-e?, created by Gábor Prószéky, Miklós Pál and László Tihanyi and further developed by Attila Novák; Helyeske, created by Mihály Naszódi, László Farkas and László Elekfi; and Hunspell, created by László Németh.[5]
The operating principle of these spellcheckers is similar; the difference between them lies in whether the emphasis is placed on the basic vocabulary, on affix accuracy, or on the most effective handling of word combinations. While the automatic classification method used for word expansion is pattern-based in the case of Helyes-e?, the word classification algorithm of Hunspell also relies on statistical tools.[5]
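A hedged sketch of the pattern-based idea (the ending-to-paradigm table below is invented for illustration and is not taken from Helyes-e? or Hunspell): a word not yet in the lexicon is assigned to the paradigm whose registered ending matches its longest suffix.

```python
# Illustrative pattern-based paradigm assignment: pick the paradigm whose
# registered ending matches the longest suffix of the unknown word.
ENDING_TO_PARADIGM = {
    "ság": "NOUN_ABSTRACT",   # e.g. szabadság
    "ás":  "NOUN_DEVERBAL",   # e.g. futás
    "a":   "NOUN_A",          # e.g. alma
}

def classify(word: str) -> str | None:
    """Return the paradigm of the longest matching ending, if any."""
    for ending in sorted(ENDING_TO_PARADIGM, key=len, reverse=True):
        if word.endswith(ending):
            return ENDING_TO_PARADIGM[ending]
    return None

print(classify("barátság"))   # NOUN_ABSTRACT
```

A statistical variant would instead weigh several candidate endings by how often each one co-occurs with each paradigm in an existing lexicon, rather than relying on a fixed ending table.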
Broader significance of the programming algorithms created
What the linguist behind NyelvÉsz had only anticipated was confirmed years later by Hunspell: the generative programming algorithm of Hungarian spellcheckers outgrew the confines of the Hungarian language and proved suitable for modelling other languages as well, including agglutinative ones. Hunspell's multi-language library of correctly spelled words later supplied spell-checking in 27 languages and was adopted by the Google Chrome web browser.[6]
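As a usage sketch (assuming the pyhunspell Python binding and the dictionary paths of a typical Linux installation of the hu_HU dictionary; both are assumptions and may differ on other systems), checking Hungarian words against Hunspell's stem-and-affix dictionaries looks roughly like this:

```python
# Hedged usage sketch of Hunspell via the pyhunspell binding; dictionary
# paths assume a typical Linux install of the Hungarian (hu_HU) dictionary.
import hunspell

hu = hunspell.HunSpell('/usr/share/hunspell/hu_HU.dic',
                       '/usr/share/hunspell/hu_HU.aff')

print(hu.spell('almák'))    # True: derivable from a stem plus affix rules
print(hu.spell('almk'))     # False: no stem/affix combination produces it
print(hu.suggest('almk'))   # correction candidates, e.g. ['almák', ...]
```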
In the age of large language models (LLMs), the operating mechanism of Hungarian spellcheckers appears in a new light: their generative algorithms may have contained early traces of the deep learning architecture, the programming paradigm that forms the basis of artificial intelligence (AI). With the development of programming tools, the coding of relationships has expanded from the level of word elements to entire words and even texts. While early spellcheckers assigned inflections to word stems, artificial intelligence assigns the numerical form of a word (its token) to the numerical forms of the words that most frequently occur with it. Beyond rigid relationships, the relative strength of connections, that is, the frequency of co-occurrence, is also represented spatially, in vector form. Moreover, while early spellcheckers required each word unit of a language to be entered manually, advances in computer technology have made it possible to automate the entry of new word and text units. As the authors of "Attention Is All You Need", considered a foundational paper of modern artificial intelligence, highlighted, attention operating over word embeddings is sufficient to model even large amounts of text.
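A minimal sketch of this vector-like representation of co-occurrence (using raw counts over a toy English corpus instead of learned embeddings; the corpus and window size are arbitrary illustrations):

```python
# Toy co-occurrence vectors: words that appear in similar contexts end up
# with similar count vectors, which a cosine similarity makes measurable.
from collections import Counter, defaultdict
from math import sqrt

corpus = "the cat sat on the mat the dog sat on the rug".split()
window = 2  # how many neighbours on each side count as "co-occurring"

cooc = defaultdict(Counter)
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            cooc[w][corpus[j]] += 1

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "cat" and "dog" occur in similar contexts, so their count vectors point in
# similar directions (cosine close to 1).
print(round(cosine(cooc["cat"], cooc["dog"]), 2))
```

Learned embeddings, as used in large language models, replace these raw counts with dense vectors optimised during training, but the underlying idea of representing relational strength geometrically is the same.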
References
- ^ a b c d e Seregy, Lajos (1991-07-04). Az első magyar "spelling-checker" [The first Hungarian "spelling-checker"] (PDF). Első magyar alkalmazott nyelvészeti konferencia (in Hungarian). Vol. 2. pp. 704–709.
- ^ a b Seregy, Lajos (1994). Az élő nyelvek leírásának új számítógépes modellje [A new computer model for the description of living languages] (PDF). IV. Országos Alkalmazott Nyelvészeti Konferencia (in Hungarian). Vol. 2. pp. 796–801. ISSN 0134-0492.
- ^ Révész, Gábor; Mester, Sándor (1991-08-15). "Nyelv körüli kerekasztal" [Language roundtable]. Computerworld Hungary (in Hungarian). No. VI. 33. pp. 13–21. Retrieved 2025-07-30.
- ^ a b Seregy, Lajos (1992). A számítógépes helyesírás-ellenőrző programok [The computer spell-checker programs] (PDF). II. Magyar Alkalmazott Nyelvészeti Konferencia (in Hungarian). Veszprém. pp. 13–19. ISBN 9637332294.
- ^ a b Naszódi, Mátyás (2017). A magyar helyesírás-ellenőrzők mai állása [State of the Hungarian Spell Checkers] (PDF). XIII. Magyar Számítógépes Nyelvészeti Konferencia (in Hungarian). Szeged. pp. 347–354, 373. ISBN 978-963-306-518-1.
- ^ Shankland, Stephen (Feb 12, 2009). "Google augments open-source spell-check". CNET.