Application of Weighted Voting Taggers to Languages Described with Large Tagsets

Authors

  • Marcin Kuta
  • Wojciech Wojcik
  • Michał Wrzeszcz
  • Jacek Kitowski

Keywords

Part-of-speech tagging, combination tagger, weighted probability distribution voting tagger, TagPair tagger

Abstract

The paper presents baseline and complex part-of-speech taggers applied to a modified corpus of the Frequency Dictionary of Contemporary Polish, annotated with a large tagset. First, the paper examines the accuracy of six baseline part-of-speech taggers. The main part of the work presents simple weighted voting taggers and complex voting taggers. Special attention is paid to lexical voting methods and to the handling of ties and fallbacks. The TagPair and WPDV (weighted probability distribution voting) methods achieve the highest accuracy among all considered methods. The error reduction of 10.8% with respect to the best baseline tagger for the large tagset is comparable with other authors' results for small tagsets.
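To make the idea of weighted voting, ties and fallbacks concrete, here is a minimal sketch. It assumes that each baseline tagger's vote is weighted by its accuracy on held-out tuning data, and that a tie is resolved by deferring to the most highly weighted tagger among those backing a tied tag. Both the weighting and the tie rule are illustrative assumptions, not necessarily the exact schemes used in the paper. The function names `weighted_vote` and `error_reduction` are hypothetical. The second function computes relative error reduction, the measure behind the 10.8% figure quoted above.

```python
import math
from collections import defaultdict
from typing import Dict


def weighted_vote(proposals: Dict[str, str], weights: Dict[str, float]) -> str:
    """Combine the tags proposed by several taggers for a single token.

    proposals: tagger name -> tag proposed by that tagger (hypothetical names)
    weights:   tagger name -> vote weight (assumed: held-out accuracy)
    """
    # Sum the weights of all taggers that voted for each tag.
    scores: Dict[str, float] = defaultdict(float)
    for tagger, tag in proposals.items():
        scores[tag] += weights[tagger]

    # Use isclose so that sums such as 0.4 + 0.2 still tie with 0.6.
    best_score = max(scores.values())
    tied = [tag for tag, s in scores.items() if math.isclose(s, best_score)]
    if len(tied) == 1:
        return tied[0]

    # Tie: fall back to the tag of the highest-weighted tagger that voted
    # for one of the tied tags (an assumed fallback rule).
    fallback_tagger = max(
        (t for t in proposals if proposals[t] in tied),
        key=lambda t: weights[t],
    )
    return proposals[fallback_tagger]


def error_reduction(baseline_acc: float, combined_acc: float) -> float:
    """Relative error reduction of a combined tagger w.r.t. a baseline tagger."""
    baseline_err = 1.0 - baseline_acc
    combined_err = 1.0 - combined_acc
    return (baseline_err - combined_err) / baseline_err
```

For example, with hypothetical weights `{"A": 0.6, "B": 0.4, "C": 0.2}`, suppose tagger A proposes `x` while B and C both propose `y`. Both tags then score 0.6, so the fallback picks A's tag `x`. Similarly, `error_reduction(0.90, 0.91)` is about 0.1: a hypothetical combined tagger at 91% accuracy removes about a tenth of the errors of a 90% baseline. These numbers are illustrative only and are not results from the paper.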

Author Biographies

Marcin Kuta

Institute of Computer Science
AGH University of Science and Technology
al. Mickiewicza 30, Cracow, Poland

Wojciech Wojcik

Institute of Computer Science
AGH University of Science and Technology
al. Mickiewicza 30, Cracow, Poland

Michał Wrzeszcz

Institute of Computer Science
AGH University of Science and Technology
al. Mickiewicza 30, Cracow, Poland

Jacek Kitowski

Institute of Computer Science
AGH University of Science and Technology
al. Mickiewicza 30, Cracow, Poland

Published

2012-01-26

How to Cite

Kuta, M., Wojcik, W., Wrzeszcz, M., & Kitowski, J. (2012). Application of Weighted Voting Taggers to Languages Described with Large Tagsets. COMPUTING AND INFORMATICS, 29(2), 203–225. Retrieved from https://www.cai.sk/ojs/index.php/cai/article/view/81