A Large Spanish-Catalan Parallel Corpus Release for Machine Translation

Authors

  • Marta R. Costa-Jussà, TALP Research Center, Universitat Politècnica de Catalunya
  • José A. R. Fonollosa, TALP Research Center, Universitat Politècnica de Catalunya
  • José B. Mariño, TALP Research Center, Universitat Politècnica de Catalunya
  • Marc Poch, Institut Universitari de Lingüística Aplicada (IULA), Universitat Pompeu Fabra
  • Mireia Farrús, N-RAS Research Center, Universitat Pompeu Fabra

Keywords

Catalan-Spanish parallel corpus, machine translation

Abstract

We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The resulting corpus of 7.5 M parallel sentences (around 180 M words per language) is useful for many natural language applications. We report excellent results for a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) under catalog number ELRA-W0053.

How to Cite

Costa-Jussà, M. R., Fonollosa, J. A. R., Mariño, J. B., Poch, M., & Farrús, M. (2015). A Large Spanish-Catalan Parallel Corpus Release for Machine Translation. COMPUTING AND INFORMATICS, 33(4), 907–920. Retrieved from https://www.cai.sk/ojs/index.php/cai/article/view/2807
