Course overview
Instructor
Thamar Solorio
tsolorio@uh.edu
Office hours: F 5:00-6:00pm, or by appointment, in PGH 584.
Teaching Assistant
Gustavo Aguilar
gaguilaralas@uh.edu
Office hours: TTh 3:00-4:00pm in PGH 550A.
Course Syllabus
Piazza Class
https://piazza.com/uh/spring2018/cosc6336
Description
This is a graduate-level introductory course in natural language processing (NLP). The course is intended to develop foundations in NLP and text mining. The broader goal is to understand how NLP tasks are carried out in the real world (e.g., on the Web) and how to build tools for solving practical language processing problems. Throughout the course, strong emphasis will be placed on tying NLP techniques to specific real-world applications through hands-on experience. The course is self-contained and covers the required machine learning and mathematical foundations.
Prerequisites
- Algorithms and Data Structures (COSC 3320) or equivalent.
- Sufficient programming experience (in C++/Java/Python, etc.) for building projects.
Course topics
- Linguistics Background & Text Processing
- Language models
- Vector Semantics
- Hidden Markov Models
- Sequence Labelling and POS tagging
- Syntactic Parsing
- Higher Level NLP tasks: Information Extraction, Question Answering, Dialogue Systems
Grading
- Assignments: 40% (3-4 total)
- Exams: 40% (2 total)
- Participation, exercises, and quizzes: 10%
Course Textbooks
References
- Textbook: The official textbook is the 3rd edition of Speech and Language Processing by Jurafsky and Martin. Chapters missing from the 3rd edition will be based on the previous edition: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition, by Daniel Jurafsky and James H. Martin, Prentice Hall, 2008.
- Online Book: Natural Language Processing with Python (NLTK). This book is open access and will help you get started on your programming assignments. I refer to this book as NLTK in all course materials.
Course schedule
| Week | Topic | Material | Assignments |
|---|---|---|---|
| 1 | Course Introduction | Lecture 1 slides | In-class Assignment 0 |
| 2 | Linguistics Background & Text Processing, Edit Distance | Lecture 2 slides; Reading material: [J&M] Ch. 2 (3rd ed.) | In-class Assignment 1 |
| 3 | Language models & Classification (NB and Logistic Regression) | Lecture 3.1 slides; Lecture 3.2 slides; Reading material: [J&M] Ch. 4, 6 & 7 (3rd ed.) | In-class Assignment 2 |
| 4 | HMMs and POS tagging | Lecture 4.1 HMM slides; Lecture 4.2 HMM slides; Lecture 4.3 POS slides; Reading material: [J&M] Ch. 9 and 10 (3rd ed.) | Homework 1, due Mar 14th & 16th |
| 5 | Vector Semantics and word embeddings | Lecture 5 slides; Reading material: [J&M] Ch. 15 and 16 (3rd ed.) | Word2vec Demo; GloVe Demo |
| | Spring Break | | |
| 6 | Midterm | March 21st | |
| 7 | Formal Grammars and Syntactic Parsing | Lecture 7.1 Formal Grammars slides; Lecture 7.2 Syntactic Parsing slides; Lecture 7.3 CKY example; Reading material: [J&M] Ch. 12 and 13 | Homework 2, due April 11th, 25th & May 2nd |
| 8 | Statistical Parsing and Dependency Parsing | Lecture 8 Statistical Parsing slides; Reading material: [J&M] Ch. 13 and 14 (3rd ed.) | |
| 9 | Information Extraction (IE) | Lecture 9 Information Extraction; Reading material: [J&M] Ch. 21 (3rd ed.) | |
| 10 | Dialogue systems | Lecture 10 Dialogue Systems; Reading material: [J&M] Ch. 29 (3rd ed.) | |
| 11 | Final exam | | |