Developing Linguistic Corpora: a Guide to Good Practice

Bibliography
AHDS: Arts and Humanities Data Service. http://www.ahds.ac.uk/.
AHRC: Arts and Humanities Research Council. http://www.ahrc.ac.uk/.
Allen, J., and Core, M. 1997. Draft of DAMSL: Dialog Act Markup in Several Layers. http://www.cs.rochester.edu/research/cisd/resources/damsl/RevisedManual/.
Automatic Mapping Among Lexico-Grammatical Annotation Models (AMALGAM). http://www.comp.leeds.ac.uk/amalgam/amalgam/amalghome.htm.
BAAL: British Association for Applied Linguistics. BAAL Recommendations on Good Practice in Applied Linguistics. http://www.baal.org.uk/goodprac.htm.
Baker, J. P. 1997. Consistency and accuracy in correcting automatically tagged data. In Corpus annotation: Linguistic information from computer text corpora, eds. Roger Garside, G. Leech and A. McEnery, 243-250. London: Longman.
Baker, P., Hardie, A., McEnery, A., Xiao, R., Bontcheva, K., Cunningham, H., Gaizauskas, R., Hamza, O., Maynard, D., Tablan, V., Ursu, C., Jayaram, B., and Leisher, M. 2004. Corpus linguistics and South Asian languages: Corpus creation and tool development. Literary and Linguistic Computing 19:509-524.
Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. 1999. Longman grammar of spoken and written English. Harlow: Pearson Education.
Burnard, L. 1995. Users' reference guide to the British National Corpus. Oxford: Oxford University Computing Services.
Burnard, L. 1999. Using SGML for linguistic analysis: the case of the BNC. In Markup languages theory and practice, 31-51. Cambridge, Mass: MIT Press.
Burnard, L., and Dodd, T. 2003. Xara: an XML aware tool for corpus searching. http://www.oucs.ox.ac.uk/rts/xaira/Talks/cl2003.html.
Carletta, J. 1996. Assessing agreement on classification tasks: the Kappa statistic. Computational Linguistics 22.
Carletta, J., McKelvie, D., and Isard, A. 2002. Supporting linguistic annotation using XML and stylesheets. In Corpus linguistics: readings in a widening discipline, eds. G. Sampson and D. McCarthy. London & New York: Continuum Interpretations.
CLAWS part-of-speech tagger for English. UCREL. http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/.
Clear, J. 1992. Corpus sampling. In New directions in English language corpora, ed. G. Leitner, 21-31. Berlin: Mouton de Gruyter.
COLT: Corpus of London Teenager. Department of English, University of Bergen. http://torvald.aksis.uib.no/colt/.
Cook, G. 1995. Theoretical issues: transcribing the untranscribable. In Spoken English on Computer, eds. G. Leech, G. Myers and J. Thomas, 35-53. Harlow: Longman.
Dunlop, D. 1995. Practical considerations in the use of TEI headers in large corpora. In Text encoding initiative: background and context, eds. Nancy Ide and Jean Veronis, 242. Dordrecht; London: Kluwer Academic.
Edwards, J. 1993. Principles and contrasting systems of discourse transcription. In Talking Data: Transcription and coding in discourse research, eds. J. Edwards and M. Lampert, 3-32. Hillsdale, NJ: Lawrence Erlbaum Associates.
Edwards, J., and Lampert, M. 1993. Talking Data: Transcription and Coding in Discourse Research. Hillsdale, NJ: Lawrence Erlbaum Associates.
Garside, R., Leech, G. N., and McEnery, T. 1997. Corpus annotation: linguistic information from computer text corpora. London: Longman.
GATE - general architecture for text engineering. http://gate.ac.uk.
Gibaldi, J. 1998. MLA Style manual and Guide to Scholarly Publishing. New York: Modern Language Association.
Gibbon, D., Moore, R., and Winski, R. 1998. Handbook of standards and resources for spoken language systems. Vol. 1: spoken language systems and corpus design. Berlin: Mouton de Gruyter.
Gillam, R. 2003. Unicode demystified. Boston: Addison-Wesley.
Goundry, N. 2001. Why Unicode won't work on the Internet: Linguistic, political, and technical limitations. http://www.hastingsresearch.com/net/04-unicode-limitations.shtml.
Granger, S. 1998. Learner English on computer. London: Longman.
Granger, S., Hung, J., and Petch-Tyson, S. eds. 2002. Computer learner corpora, second language acquisition, and foreign language teaching. Amsterdam: John Benjamins.
Grice, M., Grice, M., Leech, G., Weisser, M., and Wilson, A. 2000. Representation and annotation of dialogue. In Handbook of multimodal and spoken dialogue systems: Resources, terminology and product evaluation, eds. D. Gibbon, I. Mertins and R. K. Moore, 1-101. Boston: Kluwer.
Halliday, M. 1993. Quantitative studies and probabilities in grammar. In Data, description, discourse, ed. Michael Hoey, 1-25. London: Harper Collins.
Halteren, H. v. ed. 1999. Syntactic wordclass tagging. Text, speech, and language technology; 9. Dordrecht; Boston: Kluwer Academic Publishers.
Hirst, D. 1991. Intonation models: towards a third generation. In Actes du XIIeme Congres International des Sciences phonetiques. 19-24 aout 1991. Aix-en-Provence, France, 305-310. Aix-en-Provence: Universite de Provence, Service des Publications.
Hofland, K., and Johansson, S. 1982. Word frequencies in British and American English. London: Longman.
Hofland, K. c. 1999. ICAME CD-ROM. HIT Centre, University of Bergen. http://www.hit.uib.no/icame/cd.
Ide, N. 1996. Corpus encoding standard. Version 1.5. Expert Advisory Group on Language Engineering Standards (EAGLES). http://www.cs.vassar.edu/CES/.
James, G., Davison, R., Cheung, A., and Deerwester, S. 1994. English in computer science: a corpus-based lexical analysis. Hong Kong: Hong Kong University of Science and Technology and Longman Asia.
Johansson, S., Atwell, E., Garside, R., and Leech, G. 1986. The tagged LOB corpus: Users' manual. Norwegian Computing Centre for the Humanities. http://khnt.hit.uib.no/icame/manuals/lobman/INDEX.HTM.
Johansson, S. 1995. The approach of the Text Encoding Initiative to the encoding of spoken discourse. In Spoken English on Computer, eds. G. Leech, G. Myers and J. Thomas, 82-98. Harlow: Longman.
Karlsson, F., Voutilainen, A., Heikkilä, J., and Anttila, A. 1995. Constraint grammar: a language-independent system for parsing unrestricted text. Berlin & New York: Mouton de Gruyter.
Kipp, M. Anvil. http://www.dfki.uni-sb.de/~kipp/anvil/.
Knowles, G., Williams, B., and Taylor, L. 1996. A corpus of formal British English speech: the Lancaster/IBM Spoken English Corpus. London: Longman.
Korpela, J. 2001. A tutorial on character code issues. http://www.cs.tut.fi/~jkorpela/chars.html.
Lamport, L. 1986. LaTeX: a document preparation system. Reading, Mass.: Addison-Wesley.
Leech, G., and Wilson, A. 1994. EAGLES morphosyntactic annotation. EAGLES report EAGSCSG/IR-T3.1. Pisa: Istituto di Linguistica Computazionale.
Leech, G., Barnett, R., and Kahrel, P. 1995a. Guidelines for the standardization of syntactic annotation of corpora. In EAGLES Document EAG-TCWG-SASG/1.8.
Leech, G., Myers, G., and Thomas, J. eds. 1995b. Spoken English on computer. Harlow: Longman.
Leech, G., and Wilson, A. 1999. Standards for Tagsets. In Syntactic Wordclass Tagging, ed. Hans van Halteren, 55-80. Dordrecht: Kluwer Academic.
Leech, G., and Weisser, M. 2003. Generic Speech Act Annotation for Task-Oriented Dialogue. In Proceedings of the Corpus Linguistics 2003 Conference, eds. D. Archer, P. Rayson, A. Wilson and A. McEnery. Lancaster: UCREL Technical Papers.
Lickley, R. HCRC Disfluency coding manual. http://www.ling.ed.ac.uk/~robin/maptask/disfluency-coding.html.
Marcus, M., Santorini, B., and Marcinkiewicz, M. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19:313-330.
Mengel, A., Dybkjaer, L., Garrido, J. M., Heid, U., Klein, M., Pirrelli, V., Poesio, M., Quazza, S., Schiffrin, A., and Soria, C. 2000. MATE Deliverable D 2.1. MATE Dialogue Annotation Guidelines. http://www.andreasmengel.de/pubs/mdag.pdf.
Meyer, C. 2002. English Corpus Linguistics. Cambridge: Cambridge University Press.
MICASE: Michigan Corpus of Academic Spoken English. http://www.hti.umich.edu/m/micase/.
Morton, A. 1986. Once. A test of authorship based on words which are not repeated in the sample. Literary and Linguistic Computing 1:1-8.
Pickering, B., Williams, B., and Knowles, G. 1996. Analysis of transcriber differences in the SEC. In Working with Speech, eds. G. Knowles, A. Wichmann and P. Alderson. London: Longman.
Perez-Parent, M. 2002. Collection, handling, and analysis of classroom recordings data: using the original acoustic signal as the primary source of evidence. Reading Working Papers in Linguistics 6:245-254. http://www.rdg.ac.uk/app_ling/wp6/perezparent.pdf.
Pierrehumbert, J. 1980. The phonology and phonetics of English intonation. MIT.
Roach, P., and Arnfield, S. 1995. Linking prosodic transcription to the time dimension. In Spoken English on Computer, eds. G. Leech, G. Myers and J. Thomas, 149-160. Harlow: Longman.
Roe, P. 1977. The notion of difficulty in scientific text. University of Birmingham.
Sampson, G. 1995. English for the computer: the SUSANNE corpus and analytic scheme. Oxford: Clarendon Press.
Scott, M. WordSmith Tools. http://www.lexically.net/wordsmith/.
Searle, S. J. Unicode revisited. http://tronweb.super-nova.co.jp/unicoderevisited.html.
Searle, S. J. 1999. A brief history of character codes in North America, Europe, and East Asia. http://tronweb.super-nova.co.jp/characcodehist.html.
Semino, E., and Short, M. 2003. Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Narratives. London: Routledge.
Short, M., Semino, E., and Culpeper, J. 1996. Using a corpus for stylistics research: speech and thought presentation. In Using corpora for language research, eds. J. Thomas and M. Short, 110-131. London: Longman.
Sinclair, J. 1982. Reflections on computer corpora in English language research. In Computer corpora in English language research, ed. Stig Johansson, 1-6. Bergen.
Sinclair, J. 1989. Corpus creation. In Language, learning and community, eds. C. Candlin and T. McNamara, 25-33: NCELTR Macquarie University.
Sinclair, J. ed. 1990. Collins Cobuild English grammar. London: Collins.
Sinclair, J. 1991. Corpus, concordance, collocation: Describing English language. Oxford: Oxford University Press.
Sinclair, J. 1995. From theory to practice. In Spoken English on Computer, eds. G. Leech, G. Myers and J. Thomas, 99-112. Harlow: Longman.
Sinclair, J. 2001. Preface. In Small corpus studies and ELT, eds. Mohsen Ghadessy, Alex Henry and Robert L. Roseberry, vii-xv. Amsterdam/Philadelphia: John Benjamins.
Sinclair, J. 2003. Corpora for lexicography. In A practical guide to lexicography, ed. P. van Sterkenburg. Amsterdam: John Benjamins.
Sinclair, J. 2004. Intuition and annotation - the discussion continues. In Advances in corpus linguistics. Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23). Göteborg 22-26 May 2002, eds. Karin Aijmer and Bengt Altenberg, 39-59. Amsterdam/New York: Rodopi. http://www.ingentaconnect.com/content/rodopi/lang/2004/00000049/00000001/art00003.
Smith, A. 2004. Preservation. In A companion to Digital Humanities, eds. S. Schreibman, R. Siemens and J. Unsworth, 576-591. Oxford: Blackwell.
Sperberg-McQueen, C. M., and Burnard, L. 1994. Guidelines for electronic text encoding and interchange (TEI P3). Chicago & Oxford: ACH-ALLC-ACL Text Encoding Initiative.
Stolcke, A., Ries, K., Coccaro, N., Shriberg, E., Bates, R., Jurafsky, D., Taylor, P., Martin, R., Van Ess-Dykema, C., and Meteer, M. 2000. Dialogue act modelling for automatic tagging and recognition of conversational speech. Computational Linguistics 26:339-373.
Tapanainen, P., and Voutilainen, A. 1994. Tagging accurately - don't guess if you know. In Proceedings of ANLP '94, 47-52. Stuttgart.
Thompson, H., Anderson, A., and Bader, M. 1995. Publishing a spoken and written corpus on CD-ROM: the HCRC Map Task experience. In Spoken English on Computer, eds. G. Leech, G. Myers and J. Thomas, 168-182. Harlow: Longman.
Tognini-Bonelli, E. 2001. Corpus linguistics at work: Studies in corpus linguistics, v. 6. Amsterdam: John Benjamins.
UCREL: University Centre for Computer Corpus Research on Language. http://www.comp.lancs.ac.uk/ucrel/.
Unicode Consortium. 2003. The Unicode standard, Version 4.0. London: Addison-Wesley. http://www.unicode.org/versions/Unicode4.0.0/.
van den Heuvel, H., Boves, L., and Sanders, E. 2000. Validation of content and quality of existing SLR: overview and methodology. http://www.spex.nl/validationcentre/d11v21.doc.
Voutilainen, A., and Järvinen, T. 1995. Specifying a shallow grammatical representation for parsing purposes. In Proceedings from the 7th Conference of the European Chapter of the Association for Computational Linguistics, 210-214: Association for Computational Linguistics.
Wells, J. C., Barry, W., Grice, M., Fourcin, A., and Gibbon, D. 1992. Standard computer-compatible transcription. Esprit project 2589 (SAM). In Doc. no. SAM-UCL-037. London: Phonetics and Linguistics Department, UCL.
Whistler, K. Why Unicode will work on the Internet. http://slashdot.org/features/01/06/06/0132203.shtml.
Working Group on Romanization Systems. United Nations Group of Experts on Geographical Names (UNGEGN). http://www.eki.ee/wgrs/.
Zipf, G. K. 1935. The psychobiology of language. New York: Houghton Mifflin.