Prof. Nilufar Abdurakhmonova,
National University of Uzbekistan, Uzbekistan
PARATRANSLATOR-UZBEK PARALLEL CORPUS: APPROACHES, CHALLENGES AND SOLUTIONS
Digital linguistic data available on the internet is currently a major resource for solving problems in natural language processing. Intelligent text and speech technologies in particular depend on large linguistic resources that cover every aspect of how humans understand language. In recent years, machine translation systems have developed rapidly as corpus technologies have strengthened the infrastructure available for human languages. Parallel corpora are therefore a crucial base for machine translation: they supply the data from which matching lexemes and sentence pairs are mined and aligned. English, which is paired with most of the world's languages on neural multilingual machine translation platforms, dominates this field. One vivid example of a major contribution to translation technology is the Europarl corpus, which consists of 30 million words in 11 official languages of the European Union. By contrast, the quality of translation from Uzbek into English cannot yet be called good enough for scientific and official texts. Google is one example: for Uzbek, word sense disambiguation and the handling of terminology remain challenges.
To address these issues and enrich parallel corpora for machine translation, we obtained a scientific and practical project entitled “PARATRANSLATOR: Creation corpus based context logical electronic translation platform” (2024-2025), financed by the Ministry of Higher Education, Science and Innovations of the Republic of Uzbekistan. The project focuses on building parallel texts in Uzbek, English, French, Russian, and Turkish. Our methodological approach is to compile the texts using translation memory and a Python web crawler, which segments web-based texts into bitexts, combined with human involvement, covering texts of all styles translated into Uzbek or from Uzbek into other languages. This helps to fill the database with terminological and polysemous words, as well as with metaphors and phraseological units from literary texts, whose segments Google matches incorrectly in meaning. We hope that this contribution will encourage research on machine translation technologies and on translation studies for lexicography and terminography.