For the better part of a decade, machine learning methods such as maximum entropy and support vector machines have been central to many NLP applications, including parsing, semantic role labeling, ontology induction, machine translation, and summarization. Many of these models fall into the class of generalized linear classifiers, which are characterized by a prediction boundary defined as a linear combination of input features and their weights. In this course we will cover many of the important aspects of generalized linear classifiers, including training methods, minimum error versus maximum likelihood estimation, distribution-free methods, online versus batch learning, generative versus discriminative models, structured models, and extensions beyond linear predictors through kernels. The course assumes familiarity with basic concepts from statistics, calculus, and linear algebra.
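To make the linear decision rule concrete, here is a minimal sketch (not from the course materials; the sparse feature representation and all names such as `class_weights` are illustrative assumptions) that scores each class by the dot product of its weight vector with the input features and predicts the highest-scoring class:

```python
# Minimal sketch of a generalized linear classifier's decision rule.
# Feature vectors are sparse maps from feature name to value, with one
# weight vector per class. All names and weights here are illustrative.

def score(weights, features):
    """Dot product w . f(x) of a weight vector with a sparse feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def predict(class_weights, features):
    """Return the class whose linear score is highest."""
    return max(class_weights, key=lambda c: score(class_weights[c], features))

# Toy usage: two classes with hand-set weights.
class_weights = {
    "NOUN": {"suffix=ing": -0.5, "prev=the": 1.5},
    "VERB": {"suffix=ing": 1.0, "prev=the": -0.3},
}
print(predict(class_weights, {"suffix=ing": 1.0, "prev=the": 1.0}))  # NOUN
```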
Date: October 22nd, 2007; Lecturer: Ryan McDonald
Outline (subject to change)
Time | Topics
10-12 | Introduction, feature representations, loss functions, perceptron, margin, SVMs, logistic regression (MaxEnt)
13-15 | Kernels, structured learning including conditional random fields, applications
15-17 | Leftovers, practical
Handout: an implementation of the perceptron algorithm and some extensions. Starter code is available here; data sets are included.
You can find the solution here.
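For orientation before looking at the handout, the following is a minimal sketch of the vanilla binary perceptron (labels in {-1, +1}). This is not the course's starter code or solution; the sparse feature format and function names are assumptions made for illustration:

```python
# Minimal sketch of binary perceptron training with sparse feature dicts.
# Illustrative only; not the course starter code or solution.

def perceptron_train(data, epochs=10):
    """data: list of (features, label) pairs with label in {-1, +1}."""
    w = {}
    for _ in range(epochs):
        for features, label in data:
            activation = sum(w.get(f, 0.0) * v for f, v in features.items())
            # Mistake-driven update: w += y * f(x) on a misclassification.
            if label * activation <= 0:
                for f, v in features.items():
                    w[f] = w.get(f, 0.0) + label * v
    return w

def perceptron_predict(w, features):
    s = sum(w.get(f, 0.0) * v for f, v in features.items())
    return 1 if s > 0 else -1

# Toy usage: two linearly separable points.
data = [({"x1": 1.0}, 1), ({"x2": 1.0}, -1)]
w = perceptron_train(data)
print(perceptron_predict(w, {"x1": 1.0}))  # 1
```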