LTI: Language, Technology and the Internet

Francis (フランシス) Bond (凡土): 2010, 2012, 2014, 2019, 2020, 2021 (as HG2052), 2023

Wednesday 09:45-11:15, Room 2.40, tř. Svobody 26, 779 00 Olomouc

This course explores the intersection of language, technology and the internet. We start by looking at the introduction of writing and how it can be represented on computers. We then look at speech and how it differs from text. These are then compared with other media of communication, such as email, blogs and chat. The internet has made new methods of writing possible, and has given access to an incredible variety and amount of text. We study how Wikipedia pages are written, and students work in groups to write their own pages. In the second half of the course, we look at how information is represented electronically, both as text and as meta-text. We finish with a discussion of large language models and AI. The implications of this technology for how we think about and understand language will also be discussed.

There is no set textbook; all the material is covered in the lectures. As a result, you need to actually come to the lectures. General guidelines for the course are given in lecture one.

Course page: this page; source: on GitHub

Course Outline

Week Content Further Reading Misc
1 Introduction, Organization: Main Issues What can search terms tell us?
Which is more efficient: Chinese or English?
Rants about technology through the ages
Media Usage Form
Media Usage Diary (sample)
Results: 2014, 2019, 2020, 2021, 2023
2 Writing and Text Sproat (2010, Ch 3) Introduce Assignment 1
Selected papers from previous years.
3 Speech and Language Technology Sproat (2010, Ch 6) Festival TTS
Mouth Bot
ReadSpeaker Demo
4 New Mediums: Email, Usenet, Blogs and Chat Crystal (2006, Ch 3–6) and some from this class. Q&A with David Crystal on Internet Linguistics
5 Wikis and Collaboration Wikipedia:About Assignment 2 Ex 0 (12th) Make a wiki account and user page
Multilingual Consent Form
6 The World Wide Web and HTML Crystal (2006, Ch 7) Assignment 1
Due: Oct 27 17:00
Assignment 2 Ex 1: Improve a page
Pick your group and topic
7 The Web as Corpus Kilgarriff (2004); Kilgarriff and Grefenstette (2003)
Google's Book Search: A Disaster for Scholars (Nunberg, 2009)
8 Text and Meta-text Marcus, Santorini and Marcinkiewicz (2004)
A Gentle Introduction to Metadata (Jeff Good, 2002)
Assignment 2 Presentation Nov 8
9 Language Identification and Normalization Manning and Schütze (1999, Ch 3)
Generalized Language Identification (Marco Lui, 2014)
10 Citation, Reputation and PageRank Brin and Page (1998)
11 AI and Large Language Models I Assignment 2
Due on Friday Dec 1
12 AI and Large Language Models II
13 Review and Conclusions
Assignment 3
Due January 25th (midnight)

Recommended Readings

Assessment

Source code for this course is available at https://github.com/bond-lab/Language-Technology-and-the-Internet under a Creative Commons Attribution 4.0 International Licence (CC BY 4.0).

Francis Bond <bond@ieee.org> Palacký University