LTI: Language, Technology and the Internet

Francis (フランシス) Bond (凡土) : 2010, 2012, 2014, 2019, 2020, 2021 (as HG2052), 2023, 2024

Thursday 16:45-18:15, Room 2.39, tř. Svobody 26, 779 00 Olomouc

This course explores the intersection of language, technology and the internet. We start by looking at the introduction of writing and how it can be represented on computers. We then look at speech and how it differs from text. These are compared to different media of communication such as email, blogs and chat. The internet has made new methods of writing possible, and has given access to an incredible variety and amount of text. We study how Wikipedia pages are written, and students will write their own pages. In the second half of the course, we learn about how information is represented electronically, both as text and meta-text. We finish with a discussion of large language models and AI. The implications of this technology for our thinking and understanding of language will also be discussed.

There is no set textbook; all the material is covered in the lectures. As a result, you need to actually come to the lectures. General guidelines for the course are given in lecture one.

Course Page (here); Source on GitHub

Course Outline

Wk Date   Content (click to download) Further Reading Misc
1 09-26 Introduction, Organization: Main Issues What can search terms tell us?
Which is more efficient: Chinese or English
Rants about technology through the ages
Media Usage Form
Media Usage Diary (sample)
Results: 2014, 2019, 2020, 2021, 2023, 2024
2 10-03 Writing and Text Sproat (2010) Ch 3 Introduce Assignment 1
Selected papers from previous years.
3 10-10 Speech and Language Technology Sproat (2010, Ch 6) Festival TTS
Mouth Bot
Readspeaker Demo
4 10-17 New Mediums: Email, Usenet, Blogs and Chat Crystal (2006, Ch 3–6) and some from this class. Q&A with David Crystal on Internet Linguistics
5 10-24 Wikis and Collaboration Wikipedia:About Assignment 2 Ex 0 (12th) Make a wiki account and user page
Multilingual Consent Form
6 10-31 The World Wide Web and HTML Crystal (2006) Ch 7 Assignment 1
Due: Nov 8 17:00 CET
Assignment 2 Ex 1: Improve a page
Pick your topic
Reading Week
7 11-14 The Web as Corpus Kilgarriff (2004); Kilgarriff and Grefenstette (2003)
Google's Book Search: A Disaster for Scholars (Nunberg, 2009)
8 11-21 Text, Meta-text and Trust Marcus, Santorini and Marcinkiewicz (2004)
A Gentle Introduction to Metadata (Jeff Good, 2002)
9 Short break Watch: Simon Willison on AI (PyCon 2024 Keynote)
10 12-05 AI and Large Language Models I Assignment 2 Presentation
11 12-12 AI and Large Language Models II Practical: Let's try to side quest with AI!
Notes on the new Claude analysis JavaScript code execution tool
12 12-19 AI and Large Language Models III Assignment 2
Due on January 10, 2025
13 - Review and Conclusions
Assignment 3
Due January 31, 2025

Recommended Readings

Assessment

Source code for this course is available at https://github.com/bond-lab/Language-Technology-and-the-Internet under a Creative Commons Attribution 4.0 International Licence (CC BY 4.0).

Francis Bond <bond@ieee.org> Palacký University