Generalized Linear Classifiers in NLP

For the better part of a decade, machine learning methods such as maximum entropy models and support vector machines have been a central part of many NLP applications, including parsing, semantic role labeling, ontology induction, machine translation, and summarization. Many of these models belong to the class of generalized linear classifiers. These classifiers score each candidate output with a linear combination of input features and their weights, so the resulting prediction boundary is linear in the feature space. In this course we will cover many important aspects of generalized linear classifiers, including training methods, minimum error vs. maximum likelihood, distribution-free methods, online vs. batch learning, generative vs. discriminative models, structured models, distributed algorithms, and extensions beyond linear predictors through kernels. The course assumes familiarity with basic concepts from statistics, calculus, and linear algebra.
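To make the definition concrete, here is a minimal Python sketch of how a generalized linear classifier makes a prediction. It assumes a joint feature function f(x, y) and a sparse weight vector w. Each candidate label y is scored by w · f(x, y), and the highest-scoring label wins. The feature names, the `features`/`score`/`predict` helpers, and the toy weights are hypothetical and serve only as illustration.

```python
from typing import Dict, List

# Hypothetical sparse representation for this sketch: feature name -> value.
FeatureVector = Dict[str, float]


def features(x: List[str], y: str) -> FeatureVector:
    """Joint feature function f(x, y): each word in x conjoined with label y, plus a label bias."""
    f: FeatureVector = {}
    for word in x:
        key = f"{word}|{y}"
        f[key] = f.get(key, 0.0) + 1.0
    f[f"BIAS|{y}"] = 1.0
    return f


def score(w: Dict[str, float], x: List[str], y: str) -> float:
    """Linear score w . f(x, y); features missing from w have weight 0."""
    return sum(w.get(k, 0.0) * v for k, v in features(x, y).items())


def predict(w: Dict[str, float], x: List[str], labels: List[str]) -> str:
    """Return argmax_y w . f(x, y) over the candidate labels."""
    return max(labels, key=lambda y: score(w, x, y))


weights = {"great|POS": 1.5, "awful|NEG": 2.0, "BIAS|NEG": 0.1}
print(predict(weights, ["a", "great", "movie"], ["POS", "NEG"]))  # → POS
```

Conjoining every input feature with the label lets one weight vector cover all classes, and the same argmax form carries over to structured outputs.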

Date: October 19th, 2009; Lecturer: Ryan McDonald

Outline (subject to change)
10-12 Introduction, feature representations, loss functions, perceptron, margin, SVMs, logistic regression (Max Ent), stochastic gradient descent
13-15 Parallelization, structured learning (including conditional random fields), kernels
15-17 Leftover topics, practical session
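The loss functions in the outline can be compared on a single example. This Python sketch assumes that the per-label scores s(y) = w · f(x, y) have already been computed. It contrasts three losses: the 0-1 loss (error rate), the multiclass hinge loss used in SVM-style training, and the log loss minimized by logistic regression (Max Ent). The `Scores` type and the function names are illustrative and do not come from the course materials.

```python
import math
from typing import Dict

# Scores s(y) = w . f(x, y) for each candidate label, assumed computed elsewhere.
Scores = Dict[str, float]


def zero_one_loss(scores: Scores, gold: str) -> float:
    """1 if the highest-scoring label is wrong, else 0 (the error rate on one example)."""
    return 0.0 if max(scores, key=scores.get) == gold else 1.0


def hinge_loss(scores: Scores, gold: str) -> float:
    """Multiclass hinge loss: max(0, 1 + max_{y != gold} s(y) - s(gold))."""
    best_wrong = max(s for y, s in scores.items() if y != gold)
    return max(0.0, 1.0 + best_wrong - scores[gold])


def log_loss(scores: Scores, gold: str) -> float:
    """Negative conditional log-likelihood under a softmax over labels."""
    m = max(scores.values())  # shift by the max score for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return log_z - scores[gold]


scores = {"POS": 2.0, "NEG": 0.5}
print(zero_one_loss(scores, "POS"), hinge_loss(scores, "POS"))  # → 0.0 0.0
print(hinge_loss(scores, "NEG"))  # → 2.5
print(round(log_loss(scores, "POS"), 4))  # → 0.2014
```

The 0-1 loss is what minimum-error training targets, but it is not differentiable. The hinge and log losses are convex in w when scores are linear, so methods such as stochastic gradient descent can minimize them directly.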

Slides (pdf) [subject to change]


Handout: an implementation of the perceptron algorithm and some extensions. Starter code is available here, with data sets included.
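The handout's starter code is not reproduced here. As a rough illustration only, the following Python sketch shows a multiclass perceptron with one common extension, weight averaging. The average is computed lazily as w - u/c, where u accumulates counter-weighted updates. The data format, the feature function, and all function names are hypothetical and are not taken from the handout.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical data format for this sketch: (tokens, gold label).
Example = Tuple[List[str], str]
Weights = Dict[str, float]


def feats(x: List[str], y: str) -> Dict[str, float]:
    """Joint feature function f(x, y): token/label conjunctions plus a label bias."""
    f: Dict[str, float] = defaultdict(float)
    for tok in x:
        f[f"{tok}|{y}"] += 1.0
    f[f"BIAS|{y}"] = 1.0
    return f


def predict(w: Weights, x: List[str], labels: List[str]) -> str:
    """argmax_y w . f(x, y); ties go to the label listed first."""
    return max(labels,
               key=lambda y: sum(w.get(k, 0.0) * v for k, v in feats(x, y).items()))


def train_perceptron(data: List[Example], labels: List[str],
                     epochs: int = 10, average: bool = True) -> Weights:
    w: Dict[str, float] = defaultdict(float)  # current weights
    u: Dict[str, float] = defaultdict(float)  # sum of c * update, used for averaging
    c = 1                                     # counts examples seen so far
    for _ in range(epochs):
        for x, gold in data:
            pred = predict(w, x, labels)
            if pred != gold:
                # Move toward the gold label's features, away from the predicted label's.
                for k, v in feats(x, gold).items():
                    w[k] += v
                    u[k] += c * v
                for k, v in feats(x, pred).items():
                    w[k] -= v
                    u[k] -= c * v
            c += 1
    if not average:
        return dict(w)
    # Mean of the weight vectors after each example (including the initial
    # zero vector), computed without storing them: w - u / c.
    return {k: w[k] - u[k] / c for k in w}


train = [
    (["a", "great", "movie"], "POS"),
    (["an", "awful", "plot"], "NEG"),
    (["great", "acting"], "POS"),
    (["awful", "acting"], "NEG"),
]
model = train_perceptron(train, ["POS", "NEG"], epochs=5)
print(predict(model, ["an", "awful", "movie"], ["POS", "NEG"]))  # → NEG
```

Averaging is often used because the final weights of a plain perceptron can depend heavily on the last few updates. Passing `average=False` returns those raw weights instead.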

Project suggestions

Literature and resources
