Creation of the Contemporary Latgalian Speech Corpus in the Context of Documenting Lesser-Used Languages

Journal
Letonica: humanitāro zinātņu žurnāls
Digitālās humanitārās zinātnes Latvijā (Digital Humanities in Latvia)
ISSN
1407-3110
Date Issued
2022
Author(s)
Juško-Štekele, Angelika 
Rezekne Academy of Technologies 
Kļavinska, Antra 
Rezekne Academy of Technologies 
DOI
10.35539/ltnc.2022.0047.a.j.s.a.k.226.243
Abstract
According to UNESCO data, in 2013 Latgalian, with 150,000 speakers, was recognised as one of the world's endangered and vulnerable languages: all generations still use its spoken form, but the sustainability of the language is seriously jeopardised because the number of young speakers is declining. In line with EU directives and recommendations on the preservation, research and development of regional and endangered languages, and with the Guidelines for the State Language Policy 2021–2027 on developing varied text corpora and making them accessible online, in 2020 a group of researchers at the Rēzekne Academy of Technologies, working in the State Research Programme project Digital Resources of Humanities: Integration and Development (No. VPP-IZM-DH-2020/1-0001), began developing the Contemporary Latgalian Speech Corpus (MuLaR) for the documentation, research, study and acquisition of Latgalian. The aim of the article is to identify and analyse the issues that matter in creating MuLaR, using a referential analysis of the scientific literature and a comparative methodology.
Using the analytical-synthetic method and drawing on the experience accumulated by corpus creators, the authors developed an initial model of the corpus architecture and technological solutions. The model addresses the following issues: ensuring that the Latgalian speech corpus is representative, given the territorial distribution of Latgalian language communities and the diversity of Latgalian subdialects (patois); the most appropriate methods for documenting natural, spontaneous speech, namely collecting new data and using existing recordings (interviews, TV and radio broadcasts, field research collections) and other databases (reiti.rta.lv); understanding metadata; the ethical aspects of a speech corpus; transcription (software, and conventions that capture the features of spoken text as accurately as possible); and creating an accessible, easy-to-use open-access platform, drawing on the experience of building oral speech corpora for lesser-used languages and dialects in other countries. The article sets out the main challenges for corpus development identified after the initial validation of the corpus data, including those related to the morphological tagging of the corpus.
Subjects
  • corpus linguistics
  • representativeness
  • corpus design
  • metadata
  • transcription
  • conventions

File(s)
 main article: 12-creation-of-contemporary-latgalian-speech-corpus-in-the-context-of-documenting-lesser-used-languages.pdf (187.8 KB)
Scopus© citations
0
Acquisition Date
Jan 12, 2024

© Rezekne Academy of Technologies
