HG7017: Computational Lexical Semantics

Francis Bond, 2014.

Thursday 9:30-13:30 HSS Computer Lab 2 (HSS-01-05)

In this course students will become familiar with how to represent word meanings computationally and with a variety of methods for automatically determining the meaning of words. These include dictionary-based methods such as Lesk, graph-based methods such as UKB, supervised methods such as sequence classification and best paths, and vector space methods. We will finish with a discussion of how to provide feedback from word sense disambiguation to meaning representation.

The course is a seminar course, with participants choosing and presenting papers each week. Because of this, content varies from year to year depending on the composition of the class.

The course includes a strong computational component: students will experiment with implementing systems and algorithms, in addition to learning about them. By the end of this course, graduate students will have an advanced knowledge of current computational approaches to lexical meaning. Students should be able to evaluate lexical resources for coverage and quality as well as to design and implement systems for determining meaning in text.

This year (2014) we will try to cover at least:

Schedule

Assessment

Presentation One (20%): assigned subject

Present a paper (or series of papers), along with an extended motivation/background and some discussion of applications. We will discuss both the content of the paper, and also the form of your presentation. One goal of the course is to learn how to present information effectively.

Presentations should take an hour, with 30-40 minutes of presentation and 20-30 minutes of discussion. You should give me the corrected slides within a week of the presentation, and I will add them to the web page.

Presentation Two (20%): your choice (must be OKed: should fit in with your research)

Present a paper (or series of papers), along with an extended motivation/background and some discussion of applications. This can be your own work (if it is ready).

Project One (30%): assigned topic

Sample: critically evaluate one lexical resource (such as WordNet or a tagged corpus). The evaluation will include writing code to summarize its properties and compare it to other resources.

The project must include a computational component, a presentation, and a written component. In the computational component the student will be expected to write code to analyse data and solve problems. In the presentation, they will present the results to their lecturer and peers. In the written component they will explain their results in the form of a short paper. They should incorporate any feedback from their presentation in the final paper. The final paper should follow the submission guidelines for TACL.
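
As a rough illustration of what "code to summarize its properties" might look like, here is a minimal sketch. The lexicon, its sense labels, and the function names are all hypothetical toy examples, not taken from any real resource; a real project would load an actual lexical resource instead.

```python
# Hypothetical toy lexicon: lemma -> list of sense identifiers.
# These entries are invented for illustration only.
TOY_LEXICON = {
    "bank": ["bank.financial", "bank.river"],
    "river": ["river.stream"],
    "money": ["money.currency"],
    "run": ["run.move", "run.operate", "run.flow"],
}


def summarize(lexicon):
    """Return basic size and polysemy statistics for a lemma -> senses mapping."""
    sense_counts = [len(senses) for senses in lexicon.values()]
    n_lemmas = len(lexicon)
    return {
        "lemmas": n_lemmas,
        "senses": sum(sense_counts),
        # Average number of senses per lemma (0.0 for an empty lexicon).
        "mean_polysemy": sum(sense_counts) / n_lemmas if n_lemmas else 0.0,
        # Number of lemmas with exactly one sense.
        "monosemous": sum(1 for n in sense_counts if n == 1),
    }


def token_coverage(lexicon, tokens):
    """Return the fraction of tokens whose lowercased form is a lemma in the lexicon."""
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t.lower() in lexicon)
    return covered / len(tokens)
```

Calling `summarize(TOY_LEXICON)` and `token_coverage(TOY_LEXICON, tokens)` on a tokenized text gives the kind of figures that can then be compared across resources. Matching on lowercased surface forms is a simplification; a real evaluation would normally lemmatize first.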

Project Two (30%): your choice (must be OKed: should fit in with your research)

Sample: implement (or extend an existing implementation of) a word sense disambiguation system.
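
To give a sense of scale for such a project, here is a minimal sketch of a simplified Lesk-style disambiguator: it picks the sense whose gloss shares the most content words with the context. The sense inventory, glosses, stopword list, and function names are hypothetical toy examples; an actual system would draw its glosses from a real resource and would need a better tokenizer.

```python
# Small stopword list, assumed for illustration only.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "on", "to",
             "that", "for", "i", "my", "at", "we"}

# Hypothetical toy sense inventory: word -> {sense id: gloss}.
TOY_SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a river or other body of water",
    }
}


def content_words(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(word, context, senses=TOY_SENSES):
    """Return the sense id whose gloss overlaps most with the context words."""
    context_words = content_words(context) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses[word].items():
        overlap = len(context_words & content_words(gloss))
        # Ties keep the earlier sense, a stand-in for a most-frequent-sense default.
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense
```

For example, `simplified_lesk("bank", "We sat on the bank of the river")` selects the river sense, because "river" appears in that gloss. Exact word matching misses inflected forms ("deposited" vs. "deposits"), which is one obvious place to extend the sketch.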

Recommended Readings

As there is no textbook which covers the topics to the depth required, we will rely mainly on readings. The following textbooks are recommended for background reading.

Course Outline

Sample topics

Sample Discussions

References

Motivation

One of the core areas of study in computational linguistics is how word and sentence meaning is represented. This course covers both linguistic and computational issues, with an emphasis on the latter. The course also offers a grounded introduction to some general natural language processing techniques, such as sequence labeling, graph search, comparison of similarity and method combination.

Aims and objectives

This course is designed to provide an advanced knowledge of current computational approaches to lexical meaning. Students should be able to evaluate lexical resources for coverage and quality; design and implement systems for determining meaning in text; and present their results effectively. Students will do two small research projects during the course.

On completion of this course, the students should be able to:


Francis Bond <bond@ieee.org>
Computational Linguistics Lab
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303