CoNLL-X Shared Task: Multi-lingual Dependency Parsing

Tenth Conference on Computational Natural Language Learning - New York City, June 8-9, 2006

Contents

Danish
Dutch
Portuguese
Swedish
Unparsed test data (all four languages)
Gold standard test data (all four languages)

Free data

This is data that you can download for free ("open source") from this page. Please note that 'free' does not imply that there is no license. The relevant license is included with the data.

All data is packed with tar and compressed with bzip2, and can be unpacked with either 'tar xjf filename' or 'tar xyf filename' (depending on your version of tar). Unpacking creates a directory data/<language>/<treebank>/train, which contains the training data, and a directory data/<language>/<treebank>/doc, which contains a README file plus additional documentation.
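Where a suitable tar is not available, the same unpacking step can be sketched in Python with the standard-library tarfile module. This is a minimal sketch and not part of the original distribution: the function names unpack_archive and find_split_dirs are hypothetical, and the only layout assumed is the data/<language>/<treebank>/<split> structure described above.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir="."):
    """Unpack a bzip2-compressed tar archive and return its member names.

    Hypothetical helper: roughly the equivalent of 'tar xjf filename'.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:bz2" opens a tar archive that was compressed with bzip2.
    with tarfile.open(archive_path, "r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    return names


def find_split_dirs(root, split="train"):
    """Return the directories matching data/<language>/<treebank>/<split>."""
    pattern = f"data/*/*/{split}"
    return sorted(p for p in Path(root).glob(pattern) if p.is_dir())
```

For example, calling unpack_archive("conll06_data_danish_ddt_train_v1.1.tar.bz2") and then find_split_dirs(".", "train") should list the training directory, and find_split_dirs(".", "doc") the directory holding the README, provided the archive has been downloaded to the current directory.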

Danish

This is the training part of the Danish data. The README file is also part of the distribution, conll06_data_danish_ddt_train_v1.1.tar.bz2 (456K).

Dutch

New corrected version from January 13
This is the training part of the Dutch data. The README file is also part of the distribution, conll06_data_dutch_alpino_train_v1.4.tar.bz2 (435K).

Portuguese

This is the training part of the Portuguese data. The README file is also part of the distribution, conll06_data_portuguese_bosque_train_v1.2.tar.bz2 (956K).

Swedish

This is the training part of the Swedish data. The README file is also part of the distribution, conll06_data_swedish_talbanken05_train_v1.1.tar.bz2 (640K).

Unparsed test data (all four languages)

conll06_data_free_test_blind.tar.bz2 (94K) is packed with tar and compressed with bzip2, and can be unpacked with either 'tar xjf filename' or 'tar xyf filename' (depending on your version of tar). Unpacking creates directories data/<language>/<treebank>/test, which contain the unparsed ("blind") test data, i.e. the input to the parser without the correct answers.

Gold standard test data (all four languages)

conll06_data_free_test.tar.bz2 (118K) is packed with tar and compressed with bzip2, and can be unpacked with either 'tar xjf filename' or 'tar xyf filename' (depending on your version of tar). Unpacking creates directories data/<language>/<treebank>/test, which contain the gold standard test data (exactly the same format as the training data).
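Because the blind and the gold standard archives both unpack into data/<language>/<treebank>/test, it may be convenient to extract them under separate roots so that the two sets of files cannot be confused. The following is a minimal sketch of that idea in Python; the helper names unpack_to and list_test_dirs and the destination names "blind" and "gold" are assumptions made for illustration, not part of the distribution.

```python
import tarfile
from pathlib import Path


def unpack_to(archive_path, dest_dir):
    """Extract a bzip2-compressed tar archive under dest_dir (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as archive:
        archive.extractall(dest)
    return dest


def list_test_dirs(root):
    """Map '<language>/<treebank>' to its test directory under root."""
    return {
        f"{p.parent.parent.name}/{p.parent.name}": p
        for p in Path(root).glob("data/*/*/test")
        if p.is_dir()
    }
```

With both archives downloaded, unpack_to("conll06_data_free_test_blind.tar.bz2", "blind") and unpack_to("conll06_data_free_test.tar.bz2", "gold") keep the parser input and the correct answers apart, and comparing the keys of list_test_dirs("blind") and list_test_dirs("gold") shows which language and treebank pairs are present in each.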