Sherlock Holmes Texts
Here are the texts of two stories, tokenized, POS-tagged and annotated with extended wordnet senses. This release is prepared for the shared annotation task of the Events and Stories in the News workshop at ACL 2017. The texts are part of the NTU Multilingual Corpus; only the English texts are given here.
- The Adventure of the Speckled Band (SPEC: Arthur Conan Doyle, 1892)
- The Adventure of the Dancing Men (DANC: Arthur Conan Doyle, 1903)
The texts are released in a modified version of the NLP Annotation Format (NAF).
Sense tags are:
- Wordnet senses (012345678-[avnrx])
You can find them online here (substitute the synset):
http://compling.hss.ntu.edu.sg/ntumc/cgi-bin/wn-gridx.cgi?gridmode=ntumc-noedit&lang=eng&synset=77000021-n
- if the first digit is 0, 1 or 2, the ID is the offset and part of speech from Princeton Wordnet 3.0
- if the first digit is 7, the sense is a pronoun (including demonstratives, indefinites and interrogatives)
- if the first digit is 8 or 9, the sense is a new sense from the NTU-MC
- Named entities (loc|num|org|oth|per|dat:year)
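
The tag conventions above can be sketched in a few lines of Python. This is only an illustration, not part of the release: the function names `classify_tag` and `synset_url` are hypothetical, the named-entity pattern is read as six literal alternatives, and the number of digits in a synset ID is deliberately not enforced. The URL parameters are copied from the example link above.

```python
import re
from urllib.parse import urlencode

# Lookup page and query parameters, copied from the example link in this README.
BASE_URL = "http://compling.hss.ntu.edu.sg/ntumc/cgi-bin/wn-gridx.cgi"

# Digits, a hyphen, then one part-of-speech letter (a, v, n, r or x).
# Assumption: the digit count is left unchecked.
SYNSET_RE = re.compile(r"^\d+-[avnrx]$")

# Assumption: the named-entity pattern lists six literal tag values.
NAMED_ENTITIES = {"loc", "num", "org", "oth", "per", "dat:year"}


def classify_tag(tag):
    """Return a rough description of where a sense tag comes from."""
    if tag in NAMED_ENTITIES:
        return "named entity"
    if SYNSET_RE.match(tag) is None:
        return "unknown"
    first = tag[0]
    if first in "012":
        return "Princeton WordNet 3.0"
    if first == "7":
        return "pronoun"
    if first in "89":
        return "NTU-MC new sense"
    return "unknown"


def synset_url(synset):
    """Build the online lookup URL for a synset ID."""
    query = urlencode(
        {"gridmode": "ntumc-noedit", "lang": "eng", "synset": synset}
    )
    return f"{BASE_URL}?{query}"
```

For example, `classify_tag("77000021-n")` returns `"pronoun"`, and `synset_url("77000021-n")` reproduces the example link given above.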
The stories are in the public domain; the annotations are released under the Creative Commons Attribution 4.0 International License.
Francis Bond
<bond@ieee.org>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303