Ministry of Science and Higher Education of the Republic of Kazakhstan
L.N. Gumilyov Eurasian National University
Corpus of Academic Kazakh Language
The project aims to strengthen the scientific and practical potential of the Kazakh language in academic texts (monographs, articles, reports, theses, abstracts, and dissertations), to establish the concept of academic Kazakh, to identify academic vocabulary in Kazakh texts, and to develop a list of academic Kazakh words.

Creation of an academic corpus
Compilation of a corpus of academic texts – a data source for research

Enhancement of scientific potential
Expanding the possibilities of the Kazakh language in science and practical use

Technological integration
Accelerating the use of digital technologies in the humanities

Expanding the scope of use
Expanding research on and teaching of the Kazakh language internationally
Digital humanities: Developing the corpus of academic Kazakh language
This project is an important step toward enhancing the academic and scientific potential of the Kazakh language. Its main goal is to develop a corpus of written academic texts in Kazakh: a large dataset of at least 5 million words drawn from monographs, articles, reports, and dissertations. The corpus helps identify how Kazakh is used in academic texts and supports its establishment as an academic language.
The goal of the scientific project
The project will create an academic written corpus of the Kazakh language consisting of at least 5,000,000 words, including 50,000 annotated words. This corpus will enable comprehensive research into the current scientific use of the Kazakh language. The collected data will thus become an important source reflecting the scientific potential of the Kazakh language within the modern research infrastructure. The corpus serves as a tool for deeper analysis of the use of Kazakh in academic texts, for improving linguistic resources, and for conducting scientific research in this area.
Directions of the Academic Kazakh Language Corpus
This project is aimed at enhancing the scientific potential of the Kazakh language and exploring the possibilities of academic Kazakh. Below are the sections of the corpus and their functions.
General information about the project
Developing the scientific potential of the Kazakh language and creating an important database for researchers and language learners
Corpus structure and content
A wide range of academic texts: monographs, articles, dissertations, textbooks, teaching materials
Usage and benefits
Use for study and research purposes by students, researchers, language teachers, and learners
Scientific and educational resources
Elevating the research and teaching of the Kazakh language to the international level and integrating with research fields
Our mission
The project to develop the academic written corpus of the Kazakh language opens new opportunities to study and analyze the natural use of Kazakh as a scientific language in academic texts. It aims to renew research on and teaching of the language at both national and international levels.

Teaching
Developing the linguodidactic and applied linguistic foundation needed for teaching Kazakh as the state language and for teaching it as a foreign language to members of other ethnic groups.

Research
Studying the scientific and applied potential of the Kazakh language using information technology, and increasing interdisciplinary research.

Community
An open platform for the international community of researchers.

Management
Research projects and efforts to develop the Kazakh language
Principal Investigator
Gulnar Sarseke
Academic degree: Candidate of Philological Sciences (KazSU, 1998), Master’s in Educational Management (King’s College London, 2013).
Academic title: Associate Professor (2001).
Education: Master’s program at King’s College London (2012-2013); postgraduate studies at KazSU (1996-1998); Pavlodar Pedagogical Institute, Kazakh language and literature (1989-1994).
Achievements: recipient of the Bolashak Scholarship (2010); Honored Worker of Education of the Republic of Kazakhstan (2008); International Scholar Exchange Fellowship participant (Korea, 2017-2018); recipient of the ITEC scholarship (India, 2015); research linguist (University of Maryland, USA, 2020).
Project experience: participation in a six-month research project on developing English and Kazakh language corpora at the University of Maryland; co-investigator of the “Multimedia Corpus of Contemporary Kazakh Spoken Language” project at Nazarbayev University (2021-2023); experience teaching a course in Corpus Linguistics.