COSC 6336
Natural Language Processing
 



Course overview

Instructor

Thamar Solorio
tsolorio@uh.edu

Office hours: F 5:00-6:00pm, or by appointment, in PGH 584.

Teaching Assistant

Gustavo Aguilar
gaguilaralas@uh.edu

Office hours: TTh 3:00-4:00pm in PGH 550A.

Course Syllabus

Syllabus

Piazza Class

https://piazza.com/uh/spring2018/cosc6336

Description

This is a graduate-level introductory course in natural language processing (NLP). The course is intended to develop foundations in NLP and text mining. The broader goal is to understand how NLP tasks are carried out in the real world (e.g., on the Web) and how to build tools for solving practical language processing problems. Throughout the course, strong emphasis will be placed on tying NLP techniques to specific real-world applications through hands-on experience. The course is self-contained and covers the required machine learning and mathematical foundations.

Prerequisites

  1. Algorithms and Data Structures (COSC 3320) or equivalent.
  2. Sufficient programming experience (in C++/Java/Python, etc.) for building projects.


Course topics

Linguistics Background & Text Processing:

  • Language models
  • Vector Semantics
  • Hidden Markov Models
  • Sequence Labelling and POS tagging
  • Syntactic Parsing
  • Higher Level NLP tasks: Information Extraction, Question Answering, Dialogue Systems


Grading

  • Assignments: 40% (3-4 total)
  • Exams: 40% (2 total)
  • Participation, exercises, and quizzes: 10%


Course Textbooks

References

  • Textbook: The official textbook is the 3rd edition of Speech and Language Processing by Jurafsky and Martin. Chapters missing from the 3rd edition will be drawn from the previous edition: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition, by Daniel Jurafsky and James H. Martin, Prentice Hall, 2008.
  • Online Book: Natural Language Processing with Python (the NLTK book). This book is open access and will help you get started on your programming assignments. I refer to this book as NLTK in all course materials.


Course schedule

Week 1: Course Introduction
  • Material: Lecture 1 slides
  • Assignments: In-class Assignment 0

Week 2: Linguistics Background & Text Processing, Edit Distance
  • Material: Lecture 2 slides
  • Reading material: [J&M] Ch. 2 (3rd ed.)
  • Assignments: In-class Assignment 1

Week 3: Language models & Classification (NB and Logistic Regression)
  • Material: Lecture 3.1 slides, Lecture 3.2 slides
  • Reading material: [J&M] Ch. 4, 6 & 7 (3rd ed.)
  • Assignments: In-class Assignment 2

Week 4: HMMs and POS tagging
  • Material: Lecture 4.1 HMM slides, Lecture 4.2 HMM slides, Lecture 4.3 POS slides
  • Reading material: [J&M] Ch. 9 and 10 (3rd ed.)
  • Assignments: Homework 1 (due date: Mar 14th & 16th)

Week 5: Vector Semantics and word embeddings
  • Material: Lecture 5 slides
  • Reading material: [J&M] Ch. 15 and 16 (3rd ed.)
  • Word2vec Demo
  • GloVe Demo

Spring Break

Week 6: Midterm, March 21st

Week 7: Formal Grammars and Syntactic Parsing
  • Material: Lecture 7.1 Formal Grammars slides, Lecture 7.2 Syntactic Parsing slides, Lecture 7.3 CKY example
  • Reading material: [J&M] Ch. 12 and 13
  • Assignments: Homework 2 (due date: April 11th, 25th & May 2nd)

Week 8: Statistical Parsing and Dependency Parsing
  • Material: Lecture 8 Statistical Parsing slides
  • Reading material: [J&M] Ch. 13 and 14 (3rd ed.)

Week 9: Information Extraction (IE)
  • Material: Lecture 9 Information Extraction
  • Reading material: [J&M] Ch. 21 (3rd ed.)

Week 10: Dialogue systems
  • Material: Lecture 10 Dialogue Systems
  • Reading material: [J&M] Ch. 29 (3rd ed.)

Week 11: Final exam