LAC: Language and The Computer
Francis Bond:
2010, 2011, 2012, 2013, 2015, 2017, 2021, 2023, 2024s, 2024w.
Michael Wayne Goodman: 2019, 2020.
Traditionally, linguistic analysis was done largely by hand, but
computer-based methods and tools are becoming increasingly widely used
in contemporary research. This course provides an introduction to
skills and resources that can assist the linguist, or indeed anyone,
in performing fast, flexible, and accurate quantitative
analyses. Students will learn a programming language (Python) along
with techniques for processing human language data. No previous
programming experience is required: we will teach you the basics of
programming and computational linguistics along with some good
software engineering practices.
This year we are doing a major overhaul, with a new textbook! Please forgive any rough edges.
Course page (here);
Source on GitHub
Thursday
15:00-16:30,
Room 2.39, tř. Svobody 26,
779 00 Olomouc
Course Outline
Week: Date | Content | Data Structures | Readings | Projects |
1: 09-26 | Why do NLP? Why Python? Setting up Colab and some simple data-types | string | PCC 1 & 2, π Cheat Sheet, π Cheat Sheet BW | |
2: 10-03 | Introducing and Working with Lists | list, tuple | PCC 3 & 4, π list, π list BW, DiP3 2.4 | |
3: 10-10 | If Statements | boolean | PCC 5, π if/while, π if/while BW | |
4: 10-17 | Dictionaries, User Input and While Loops | dictionary, set | PCC 6 & 7, π dict., π dict. BW | |
5: 10-24 | Functions | | PCC 8, π functions, π functions BW | |
6: 10-31 | Files and Exceptions | | PCC 10, π files/exceptions, π files/exceptions BW | Project 1 |
Reading Week |
7: 11-14 | Testing Your Code | | PCC 11, π testing, π testing BW | |
8: 11-21 | Regular Expressions | regex | Interactive RE, Wikiversity Python Programming/RegEx | |
9: 11-28 | Tokenization | No lecture | |
10: 12-05 | Wordnet | | | Project 1 due Dec 06 24:00 CET |
11: 12-12 | Plotting | | | |
12: --- | Feedback, Review/catch up | | Handy Summary of Python and NLP Concepts | |
13: 12-18 Wed! | Final In-Class On-Line Open-Book Programming Challenge | 15:00–19:00, room 3.28 later for one student | Project 2 due Jan 31 24:00 |
Textbooks and Tools
- Eric Matthes (2023)
Python Crash Course, 3rd Edition (PCC)
A Hands-On, Project-Based Introduction to Programming, No Starch Press, ISBN: 9781718502703
- It also comes with a set of handy references (π: Cheat Sheets)
- Other Python Books:
- Steven Bird, Ewan Klein, Edward Loper (2009)
Natural Language Processing with Python, O'Reilly.
(Updated for Python3.0, 2019)
-
NLTK Natural Language Toolkit
- Allen Downey (2016)
Think Python: How to Think Like a Computer Scientist,
O'Reilly
- Goldwasser and Letscher (2008)
Object-Oriented
Programming in Python,
Prentice Hall
- David Lukeš, Rudolf Rosa (2020) An Introduction to Python for Linguists, Visegrad Fund
- David Mertz (2003) Text
Processing in Python,
Addison-Wesley.
- Rowan Nichols (2024) Python
(Nice online summary)
- Mark Pilgrim (2009)
Dive into Python 3 (DiP3), Apress
- Al Sweigart (2019) Automate the Boring Stuff with Python, 2nd Edition, No Starch Press
- Even more: 5 Free Books to Help You Master Python
- On-line environments and tutorials
- NLP/Computational Linguistics
- Recommended text editors
- Linguistic Style Guides
Assessment and Solutions to Problems
Evaluation Criteria (same for all projects)
- Code/Approach (50%)
- Does the program produce useful results?
- Is it properly documented?
- Does it use appropriate data structures?
- Write Up (50%)
- Is it clear what was done and why?
- Does it adequately cite resources used?
- Does it adequately cite the linguistics literature?
- Does it include some evaluation?
- Is there any analysis of errors?
- Does it point out any remaining issues?
- Is it formatted correctly?
Assessment problems are generally open-ended. It is not expected that
the student can solve them fully: the goal is to see how they approach
the problem and understand it.
A note on using AI (such as ChatGPT or Claude)
Research shows (Shein 2024) that using AI to help you code lets you
solve problems faster, but does not help you remember how to solve
them. The goal of this course is to learn the foundations of
programming, and you will retain them better if you solve the
problems on your own.
Once you know more about how to program, you can and should use AI
to help you. I do, and it is helpful, but it makes mistakes that a
human wouldn't (or shouldn't). For example, I asked it to split data
into 10 roughly equal bins, which it did, but it only returned nine of
them! I noticed because I could check the code and understand the
output. That is what this course will teach you.
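To make the binning anecdote concrete, here is a minimal sketch of the kind of check that catches such a mistake. It is not the code from that exchange: the helper name `split_into_bins` and the sample data are hypothetical, chosen only for illustration.

```python
def split_into_bins(items, n_bins=10):
    """Split a list into n_bins consecutive, roughly equal-sized bins.

    Hypothetical helper for illustration only.
    """
    size, extra = divmod(len(items), n_bins)
    bins = []
    start = 0
    for i in range(n_bins):
        # The first `extra` bins each take one additional item.
        end = start + size + (1 if i < extra else 0)
        bins.append(items[start:end])
        start = end
    return bins


data = list(range(95))  # made-up sample data
bins = split_into_bins(data, 10)

# Checks worth running on any binning code, whoever wrote it:
assert len(bins) == 10                         # all ten bins came back
assert sum(len(b) for b in bins) == len(data)  # no item lost or duplicated

print([len(b) for b in bins])  # → [10, 10, 10, 10, 10, 9, 9, 9, 9, 9]
```

Two one-line assertions are enough to notice a missing bin, but only if you can read the code and know what the output ought to look like.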
Course materials were originally heavily inspired by
clt231:
Introduction to Natural Language Processing at the University of Helsinki. Thanks to Graham Wilcock for
letting us use them.
I will try not to make things too hard
(cartoon from Abstruse Goose)
Instead this class should be like this (cartoon from XKCD)
Francis Bond
<bond@ieee.org>
<francis.bond@upol.cz>
Home page