!!UPDATE!!

This page will soon be obsolete. MSTParser now has a new home as a SourceForge project.

This new project was started by Jason Baldridge and Ryan McDonald to make it easier to add new features to the parser.

Code for the new project will be available soon. Try it out!

MSTParser (v0.2)

This is a simple web page for downloading implementations of the parsers described in:

Non-Projective Dependency Parsing using Spanning Tree Algorithms
R. McDonald, F. Pereira, K. Ribarov and J. Hajic
HLT-EMNLP, 2005

Online Large-Margin Training of Dependency Parsers
R. McDonald, K. Crammer and F. Pereira
ACL, 2005

Online Learning of Approximate Dependency Parsing Algorithms
R. McDonald and F. Pereira
EACL, 2006

The parser is implemented in Java.

New: Version 0.2 uses second-order edge features (see EACL paper above).
New: Version 0.1 added the ability to produce typed (labeled) trees.
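As a rough illustration of what second-order features add: in a first-order (edge-factored) model, the score of a dependency tree y for a sentence x is a sum of independent edge scores. The second-order model also scores pairs of adjacent sibling edges that share a head. The additive form below is a simplified sketch, not the exact parameterization used in the EACL paper; s(i, j) and s(i, k, j) stand for learned linear scores such as w · f(i, j).

```latex
% First order: each edge from head i to dependent j is scored independently
s(\mathbf{x}, \mathbf{y}) = \sum_{(i, j) \in \mathbf{y}} s(i, j)

% Second order (sketch): additionally score each pair of adjacent siblings
% k and j, i.e. (i, k) and (i, j) are both in y, attached on the same side of i
s(\mathbf{x}, \mathbf{y}) = \sum_{(i, j) \in \mathbf{y}} s(i, j)
  + \sum_{(i, k, j) \in \mathbf{y}} s(i, k, j)
```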

See the README file for usage instructions and the input/output format.

Downloads

FAQ

What character encoding does the parser use?
It is hard-coded to use Unicode (UTF-8), matching the CoNLL-X shared task data. To use a different encoding, grep the source for "UTF8" and replace every occurrence with the name of the encoding you want.
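The grep-and-replace step can be scripted. The sketch below assumes that the Java sources live under a single directory and that the encoding appears literally as the string UTF8. `replace_encoding` is a hypothetical helper, not part of MSTParser, and the parser must be recompiled afterwards.

```python
import pathlib


def replace_encoding(src_dir, new_encoding, old="UTF8"):
    """Replace every occurrence of `old` in the .java files under src_dir.

    Hypothetical helper: assumes the sources are plain ASCII/UTF-8 text.
    Returns the total number of replacements made.
    """
    total = 0
    for path in sorted(pathlib.Path(src_dir).rglob("*.java")):
        text = path.read_text(encoding="utf-8")
        count = text.count(old)
        if count:
            path.write_text(text.replace(old, new_encoding), encoding="utf-8")
            total += count
    return total
```

For example, `replace_encoding("src", "ISO-8859-1")` would switch a source tree rooted at a (hypothetical) `src` directory to Latin-1. Java accepts both `UTF8` and `ISO-8859-1` as charset names.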
Can the parser use CoNLL-X input format?
Not yet. However, I have included some easy-to-use Python scripts to convert between the CoNLL and MSTParser formats. They are in the scripts directory.
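A minimal sketch of what such a conversion involves is shown below. It assumes the MSTParser format has four tab-separated lines per sentence (words, POS tags, dependency labels, and head indices with 0 for the root), followed by a blank line. Check the README for the authoritative description. CoNLL-X rows use the standard ten columns ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL. Taking POSTAG rather than CPOSTAG as the tag is an assumption. These functions are illustrative and are not the bundled scripts.

```python
def _blocks(text):
    """Yield lists of non-blank lines; blank lines separate sentences."""
    block = []
    for line in text.splitlines():
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:
        yield block


def conll_to_mst(conll_text):
    """Convert CoNLL-X sentences to the assumed four-line MSTParser format."""
    out = []
    for block in _blocks(conll_text):
        rows = [line.split("\t") for line in block]
        words = [r[1] for r in rows]   # FORM
        tags = [r[4] for r in rows]    # POSTAG (assumption; CPOSTAG is r[3])
        labels = [r[7] for r in rows]  # DEPREL
        heads = [r[6] for r in rows]   # HEAD, 0 = root
        out.append("\n".join("\t".join(f) for f in (words, tags, labels, heads)))
    return "".join(sentence + "\n\n" for sentence in out)


def mst_to_conll(mst_text):
    """Convert the assumed four-line MSTParser format back to CoNLL-X.

    LEMMA, FEATS, PHEAD and PDEPREL are unknown and written as "_";
    the single POS tag fills both CPOSTAG and POSTAG.
    """
    out = []
    for block in _blocks(mst_text):
        words, tags, labels, heads = (line.split("\t") for line in block)
        rows = [
            "\t".join([str(i), w, "_", t, t, "_", h, lab, "_", "_"])
            for i, (w, t, lab, h) in enumerate(zip(words, tags, labels, heads), start=1)
        ]
        out.append("\n".join(rows))
    return "".join(sentence + "\n\n" for sentence in out)
```

The round trip is lossy: the lemma, features, and projective-head columns cannot be recovered from the four-line format.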
Can the parser produce non-tree dependency graphs?
Not yet. This will be part of the next release.
Is the edge labeler any good?
This is somewhat complicated. The parser currently predicts dependencies and labels jointly. This is nice because it lets information from both decisions be used simultaneously. However, the labeler is then forced to obey the locality constraints of the dependency parser (features can only look at a single edge or a pair of edges). I have found that it is often better to use a post-processing edge labeler, which can use features with a larger scope. Such a labeler is not difficult to build, and any classifier can be used; I suggest MALLET. I will make a post-processing labeler available in the next version.
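A post-processing labeler of the kind described above might look roughly like the following sketch. Once the parser has fixed the tree, features for an edge can look at the whole tree. Examples include the grandparent's tag and all siblings of the dependent, which go beyond a single edge or a pair of edges. The feature templates and the `VotingLabeler` class are hypothetical. Its per-feature vote counting is only a toy stand-in for a real classifier, such as one trained with MALLET.

```python
from collections import Counter, defaultdict

ROOT = "<root>"


def edge_features(words, tags, heads, d):
    """Features for the edge heads[d-1] -> d (d is 1-based; head 0 is the root)."""
    h = heads[d - 1]
    head_word = words[h - 1] if h > 0 else ROOT
    head_tag = tags[h - 1] if h > 0 else ROOT
    g = heads[h - 1] if h > 0 else None  # grandparent (None above the root)
    grand_tag = ROOT if g == 0 else (tags[g - 1] if g else "<none>")
    # Tags of all other dependents of the same head, not just one adjacent sibling.
    siblings = [tags[i] for i, hh in enumerate(heads) if hh == h and i != d - 1]
    dep_tag = tags[d - 1]
    feats = [
        f"hw={head_word}",
        f"dw={words[d - 1]}",
        f"ht={head_tag}|dt={dep_tag}",
        f"gt={grand_tag}|ht={head_tag}|dt={dep_tag}",
        f"dir={'L' if d < h else 'R'}",
        f"nsib={len(siblings)}",
    ]
    return feats + [f"sib={t}" for t in sorted(set(siblings))]


class VotingLabeler:
    """Toy stand-in for a real classifier: each feature votes for labels seen with it."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sentences):
        # sentences: iterable of (words, tags, heads, labels); heads are 1-based ints
        for words, tags, heads, labels in sentences:
            for d in range(1, len(words) + 1):
                for f in edge_features(words, tags, heads, d):
                    self.counts[f][labels[d - 1]] += 1

    def predict(self, words, tags, heads):
        """Label every edge of a tree already produced by the parser."""
        labels = []
        for d in range(1, len(words) + 1):
            votes = Counter()
            for f in edge_features(words, tags, heads, d):
                votes.update(self.counts.get(f, Counter()))
            labels.append(votes.most_common(1)[0][0] if votes else "_")
        return labels
```

Keeping feature extraction separate from the classifier means the toy voter can be swapped for any real classifier without touching `edge_features`.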

Questions: ryantm at cis dot upenn dot edu