The NTU Multilingual Corpus (NTU-MC) is a collection of parallel texts; parts of it are sense-tagged, parts are treebanked, and parts are annotated for sentiment.
An incomplete list of the corpus sections worked on in each class, by year:
Year | Class | from-to | Corpus | Comments |
---|---|---|---|---|
2011 | hg2002 | Singapore Tourist Data | website | |
2012 | hg2002 | The Cathedral and the Bazaar | essay | |
2013 | hg2002 | 10000, 10598 | SPEC (retag) | |
2014 | hg2002 | 11000, 11607 | DANC (retag) | |
2015 | hg2002 | 蜘蛛の糸 [The Spider's Thread] | multilingual | |
2016 | hg8011 | 55657, 56209 | REDH | A-E |
2018 | hg8011 | 50804, 51464 | SCAN | |
2018 | hg2002 | 45681, 46691 | HOUND | A-B |
2019 | hg8011 | 46692, 47487 | HOUND | A-D (E two only) |
2019 | hg2002 | 47488, 48504 | HOUND | A-B (with sentiment) |
2020 | hg2002 | 48505, 49505 | HOUND | A-B (C one only) (with sentiment) |
2021 | hg8011 | 18525, 18935 | FINA | A-C (one D) (with sentiment) |
2021 | hg2002 | 13147, 13968 | NAVA | A-C to 13973 done by RA (with sentiment) |
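As a rough cross-check of the numeric rows in the table, the sketch below computes how many IDs each from-to range covers and whether the ranges listed for one corpus follow on from each other without gaps. It assumes that each from-to pair is an inclusive range of IDs; the table does not say this. The names `ROWS`, `span_size`, `sizes_by_corpus` and `is_contiguous` are made up for this illustration and are not part of any NTU-MC tool.

```python
# Rows copied from the table above: (year, class, first, last, corpus).
# Assumption: "from-to" is an inclusive range of IDs; the table does not state this.
ROWS = [
    (2013, "hg2002", 10000, 10598, "SPEC"),
    (2014, "hg2002", 11000, 11607, "DANC"),
    (2016, "hg8011", 55657, 56209, "REDH"),
    (2018, "hg8011", 50804, 51464, "SCAN"),
    (2018, "hg2002", 45681, 46691, "HOUND"),
    (2019, "hg8011", 46692, 47487, "HOUND"),
    (2019, "hg2002", 47488, 48504, "HOUND"),
    (2020, "hg2002", 48505, 49505, "HOUND"),
    (2021, "hg8011", 18525, 18935, "FINA"),
    (2021, "hg2002", 13147, 13968, "NAVA"),
]


def span_size(first: int, last: int) -> int:
    """Number of IDs in an inclusive range."""
    return last - first + 1


def sizes_by_corpus(rows):
    """Sum the range sizes of all rows that share a corpus label."""
    totals = {}
    for _year, _cls, first, last, corpus in rows:
        totals[corpus] = totals.get(corpus, 0) + span_size(first, last)
    return totals


def is_contiguous(rows, corpus: str) -> bool:
    """True if the ranges for one corpus follow each other with no gap or overlap."""
    spans = sorted((first, last) for _y, _c, first, last, name in rows if name == corpus)
    return all(
        prev_last + 1 == next_first
        for (_, prev_last), (next_first, _) in zip(spans, spans[1:])
    )


if __name__ == "__main__":
    for corpus, total in sizes_by_corpus(ROWS).items():
        print(corpus, total)
    print("HOUND contiguous:", is_contiguous(ROWS, "HOUND"))
```

Under that assumption, the four HOUND rows form one unbroken block running from 45681 to 49505.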
Canonical Citation:
Liling Tan and Francis Bond. 2012. Building and annotating the linguistically diverse NTU-MC (NTU-multilingual corpus). International Journal of Asian Language Processing, 22(4). pp 161–174.
Other References:
Francis Bond, Andrew Devadason, Melissa Rui Lin Teo, and Luís Morgado da Costa. 2021. Teaching Through Tagging — Interactive Lexical Semantics. In Proceedings of the 11th Global Wordnet Conference (GWC 2021).
Francis Bond, Shan Wang, Eshley Huini Gao, Hazel Shuwen Mok, and Jeanette Yiwen Tan. 2013. Developing parallel sense-tagged corpora with wordnets. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (LAW 2013). Sofia. pp 149–158.
Yu Jie Seah and Francis Bond. 2014. Annotation of Pronouns in a Multilingual Corpus of Mandarin Chinese, English and Japanese. In 10th Joint ACL-ISO Workshop on Interoperable Semantic Annotation, Reykjavik.
Slav Petrov, Dipanjan Das, and Ryan McDonald. 2011. A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086.
Shan Wang and Francis Bond. 2014. Building The Sense-Tagged Multilingual Parallel Corpus. In 9th Edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik.
Francis Bond, Tomoko Ohkuma, Luís Morgado da Costa, Yasuhide Miura, Rachel Chen, Takayuki Kuribayashi, and Wenjie Wang. 2016. A multilingual sentiment corpus for Chinese, English and Japanese. In Proceedings of the LREC 2016 Workshop “Emotion and Sentiment Analysis”, Portorož. pp 59–62.
Contributors: Francis Bond, Luís Morgado da Costa, Tuan Anh Le, Michael Wayne Goodman and many more.