CoNLL-X Shared Task: Multi-lingual Dependency Parsing

Tenth Conference on Computational Natural Language Learning - New York City, June 8-9, 2006

Contents

Complete result tables
Labeled attachment score (LAS)
Unlabeled attachment score (UAS)
Label accuracy

Significance
for LAS
for UAS
for LAS when scoring punctuation

All official submissions of participants
in one tarball

Output of eval.pl
for concatenation of all submissions

Older, partial result tables
Average and standard deviation
Top three scores

Complete result tables

Note: The "Name" and "Affiliation" columns reproduce the information that participants provided when uploading their results.

Labeled attachment score (LAS)

Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish AV SD Bulgarian Name Affiliation
57.64 78.37 60.92 77.90 74.59 77.56 87.41 77.42 59.19 68.32 79.15 51.07 70.80 11.11 78.74 Sander Canisius, Antal van den Bosch, Erik Tjong Kim Sang, Toine Bogers, Jeroen Geertzen Tilburg University
53.81 54.89 59.76 66.35 58.24 69.77 65.38 75.36 57.19 67.44 68.77 37.80 61.23 9.92 72.89 Giuseppe Attardi Università di Pisa
63.81 74.81 59.36 78.38 68.45 76.52 90.11 81.47 67.83 72.99 71.72 55.09 71.71 9.67 79.73 YuChieh Wu National Central University
60.94 83.68 68.82 79.74 67.25 82.41 88.13 83.37 68.43 77.16 78.65 58.06 74.72 9.72 83.30 Xavier Carreras, Mihai Surdeanu, Lluís Màrquez Technical University of Catalonia
52.42 72.72 51.86 71.56 62.75 63.82 84.35 70.35 55.06 69.63 65.23 60.31 65.01 9.46 73.49 Deniz Yuret Koc University
55.37 76.18 63.02 74.61 69.51 74.74 84.75 78.18 64.31 71.37 74.09 53.87 70.00 9.25 79.21 Eckhard Bick University of Southern Denmark
66.71 86.92 78.42 84.77 78.59 85.82 91.65 87.60 70.30 81.29 84.58 65.68 80.19 8.53 87.41 Joakim Nivre, Johan Hall, Jens Nilsson, Gülşen Eryiğit, Svetoslav Marinov Växjö University, Istanbul Technical University, University of Skövde
44.39 66.20 53.34 76.05 72.11 68.73 83.35 71.01 50.72 46.96 71.10 49.81 62.81 13.01 0.00 Michael Schiehlen IMS, Uni Stuttgart
50.74 75.29 58.52 77.70 59.36 68.11 70.84 71.13 57.21 65.08 63.83 41.72 63.29 10.42 67.64 Jinshan Ma ?
53.37 71.63 60.54 66.61 61.56 70.97 82.87 75.28 58.73 67.62 67.58 46.05 65.23 9.93 74.81 Markus Dreyer, David A. Smith, and Noah A. Smith Johns Hopkins University
66.71 86.70 76.60 82.83 77.51 85.36 90.57 84.69 71.08 79.82 81.78 57.52 78.43 9.38 85.24 John O'Neil Basis Technology, Inc.
60.92 85.05 72.88 80.60 72.91 84.17 89.07 83.99 69.52 79.72 82.31 60.51 76.80 9.43 0.00 Quang Xuan Do, Ming-Wei Chang University of Illinois at Urbana-Champaign
64.29 72.49 71.46 81.54 72.67 80.43 85.63 84.57 66.43 78.16 78.13 63.39 74.93 7.65 0.00 Richard Johansson and Pierre Nugues Department of Computer Science, Lund University, Sweden
66.91 85.90 80.18 84.79 79.19 87.34 90.71 86.82 73.44 82.25 82.55 63.19 80.27 8.43 87.57 Ryan McDonald, Kevin Lerman, Fernando Pereira University of Pennsylvania
66.65 89.96 67.44 83.63 78.59 86.24 90.51 84.43 71.20 77.38 80.66 58.61 77.94 10.05 0.00 Sebastian Riedel, Ivan Meza-Ruiz, Ruken Çakıcı University of Edinburgh
62.71 84.73 75.24 81.56 76.61 84.92 90.37 86.01 69.06 77.68 82.00 63.21 77.84 8.95 0.00 Kenji Sagae Carnegie Mellon University
62.83 0.00 0.00 75.81 0.00 0.00 0.00 0.00 64.57 73.17 79.49 54.23 34.18 36.26 0.00 Nobuyuki Shimizu SUNY/Albany
63.53 79.92 74.48 81.74 71.43 83.47 89.95 84.59 72.42 80.36 79.69 61.74 76.94 8.47 83.36 Simon Corston-Oliver and Anthony Aue Microsoft Research
65.19 84.27 76.24 81.72 71.77 84.11 89.91 85.07 71.42 80.46 81.08 61.22 77.70 8.67 86.34 Yuchang Cheng Nara Institute of Science and Technology
AV 59.94 78.32 67.17 78.31 70.73 78.58 85.86 80.63 65.16 73.52 76.44 55.95 - - 79.98
SD 6.53 8.82 8.93 5.45 6.66 7.51 7.09 5.83 6.78 8.41 6.46 7.71 - - 6.30
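
The AV and SD figures in the table above can be recomputed from the scores themselves. The sketch below assumes, based only on the numbers on this page, that the per-row AV and SD are the arithmetic mean and the sample standard deviation over the twelve languages from Arabic to Turkish (Bulgarian excluded), and that the per-column AV and SD skip entries of 0.00, which are taken here to mean "no submission". The function names `row_stats` and `column_stats` are illustrative and not part of eval.pl.

```python
from statistics import mean, stdev

# LAS scores of the Tilburg University row, Arabic through Turkish.
tilburg_las = [57.64, 78.37, 60.92, 77.90, 74.59, 77.56,
               87.41, 77.42, 59.19, 68.32, 79.15, 51.07]

# Bulgarian LAS column, one entry per participant in table order.
bulgarian_las = [78.74, 72.89, 79.73, 83.30, 73.49, 79.21, 87.41, 0.00,
                 67.64, 74.81, 85.24, 0.00, 0.00, 87.57, 0.00, 0.00,
                 0.00, 83.36, 86.34]


def row_stats(scores):
    """Mean and sample standard deviation (n - 1) over one participant's row."""
    return mean(scores), stdev(scores)


def column_stats(scores):
    """Mean and sample standard deviation over one language column.

    Assumption: a score of 0.00 means the participant did not submit
    output for this language, so it is left out of the statistics.
    """
    submitted = [s for s in scores if s > 0.0]
    return mean(submitted), stdev(submitted)


row_av, row_sd = row_stats(tilburg_las)
col_av, col_sd = column_stats(bulgarian_las)
print(f"Tilburg row:      AV = {row_av:.3f}, SD = {row_sd:.3f}")
print(f"Bulgarian column: AV = {col_av:.3f}, SD = {col_sd:.3f}")
```

Under these assumptions the results agree with the tabulated values (70.80 and 11.11 for the row, 79.98 and 6.30 for the column) to within rounding. Using the population standard deviation instead would give a visibly smaller row SD, which is why the sample form is assumed.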

Unlabeled attachment score (UAS)

Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish AV SD Bulgarian Name Affiliation
74.59 82.86 72.88 82.93 77.79 80.01 89.67 85.61 74.02 71.33 85.08 64.19 78.41 7.28 82.51 Sander Canisius, Antal van den Bosch, Erik Tjong Kim Sang, Toine Bogers, Jeroen Geertzen Tilburg University
69.50 81.33 73.44 78.84 68.93 80.25 82.05 85.03 72.14 74.25 83.03 65.25 76.17 6.42 85.24 Giuseppe Attardi Università di Pisa
75.45 79.48 74.82 83.39 71.75 79.73 91.74 85.57 76.92 76.20 76.24 69.25 78.38 6.17 85.50 YuChieh Wu National Central University
72.65 88.65 77.44 85.67 71.39 85.90 90.79 87.76 77.72 80.77 85.54 70.05 81.19 7.21 88.81 Xavier Carreras, Mihai Surdeanu, Lluís Màrquez Technical University of Catalonia
68.82 78.37 66.36 78.16 66.17 67.71 87.31 79.46 70.60 73.89 73.25 71.54 73.47 6.36 78.56 Deniz Yuret Koc University
68.98 83.06 72.24 80.54 74.47 79.79 87.85 84.29 75.06 75.76 82.65 65.50 77.52 6.66 84.16 Eckhard Bick University of Southern Denmark
77.52 90.54 84.80 89.80 81.35 88.76 93.10 91.22 78.72 84.67 89.50 75.82 85.48 5.90 91.72 Joakim Nivre, Johan Hall, Jens Nilsson, Gülşen Eryiğit, Svetoslav Marinov Växjö University, Istanbul Technical University, University of Skövde
62.63 74.87 66.86 81.94 75.59 72.64 86.71 81.27 68.45 53.18 79.69 61.58 72.12 9.87 0.00 Michael Schiehlen IMS, Uni Stuttgart
64.79 79.90 68.14 79.90 64.07 73.00 72.64 77.10 68.94 70.07 73.19 56.90 70.72 6.77 73.97 Jinshan Ma ?
68.46 77.63 70.74 77.45 68.33 76.98 85.97 82.41 72.88 72.85 79.53 60.45 74.47 6.98 81.95 Markus Dreyer, David A. Smith, and Noah A. Smith Johns Hopkins University
78.54 90.64 85.58 88.78 81.73 89.16 93.16 89.70 81.71 84.11 88.45 72.02 85.30 6.00 90.72 John O'Neil Basis Technology, Inc.
76.09 89.60 81.78 86.85 76.25 86.90 90.77 88.60 80.32 83.09 89.05 73.15 83.54 6.01 0.00 Quang Xuan Do, Ming-Wei Chang University of Illinois at Urbana-Champaign
75.53 77.04 77.40 86.59 76.01 83.09 87.11 88.40 74.36 81.43 84.17 73.59 80.39 5.36 0.00 Richard Johansson and Pierre Nugues Department of Computer Science, Lund University, Sweden
79.34 91.07 87.30 90.58 83.57 90.38 92.84 91.36 83.17 86.05 88.93 74.67 86.61 5.51 92.04 Ryan McDonald, Kevin Lerman, Fernando Pereira University of Pennsylvania
78.62 93.18 77.32 89.66 82.91 89.76 92.96 89.42 83.17 81.05 88.33 74.07 85.04 6.38 0.00 Sebastian Riedel, Ivan Meza-Ruiz, Ruken Çakıcı University of Edinburgh
74.11 89.64 82.64 86.53 80.71 87.92 92.20 89.78 78.02 81.13 88.57 73.31 83.71 6.35 0.00 Kenji Sagae Carnegie Mellon University
74.27 0.00 0.00 81.72 0.00 0.00 0.00 0.00 74.88 77.58 86.62 68.77 38.65 40.59 0.00 Nobuyuki Shimizu SUNY/Albany
78.40 90.00 83.02 87.94 74.83 87.20 92.84 88.96 81.77 84.87 89.54 73.11 84.37 6.28 90.09 Simon Corston-Oliver and Anthony Aue Microsoft Research
77.74 89.46 83.40 88.64 75.49 87.66 93.12 90.30 81.14 85.15 88.57 74.49 84.60 6.15 91.30 Yuchang Cheng Nara Institute of Science and Technology
AV 73.48 84.85 77.01 84.52 75.07 82.60 89.05 86.46 76.53 77.76 84.21 69.35 - - 85.89
SD 4.94 5.99 6.70 4.29 5.78 6.73 5.20 4.17 4.67 7.81 5.45 5.51 - - 5.60

Label accuracy

Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish AV SD Bulgarian Name Affiliation
70.38 80.85 70.46 84.09 80.07 86.78 90.69 81.85 69.44 81.95 82.45 64.41 78.62 8.01 83.28 Sander Canisius, Antal van den Bosch, Erik Tjong Kim Sang, Toine Bogers, Jeroen Geertzen Tilburg University
72.97 58.75 69.84 74.65 66.47 77.68 73.68 80.79 69.36 82.19 72.42 49.81 70.72 9.12 77.68 Giuseppe Attardi Università di Pisa
77.66 78.91 65.90 84.81 73.49 82.97 93.58 87.16 78.22 85.17 73.91 67.22 79.08 8.18 83.58 YuChieh Wu National Central University
78.36 86.12 78.74 85.57 77.31 89.42 92.04 88.74 81.10 88.72 82.83 73.41 83.53 5.79 87.49 Xavier Carreras, Mihai Surdeanu, Lluís Màrquez Technical University of Catalonia
63.61 75.41 59.36 77.41 67.39 69.91 87.31 73.49 60.49 77.38 67.38 71.32 70.87 7.98 76.58 Deniz Yuret Koc University
73.75 80.91 76.48 82.48 77.71 84.56 88.77 83.65 78.32 85.85 79.47 71.26 80.27 5.11 85.38 Eckhard Bick University of Southern Denmark
80.34 89.01 85.40 89.16 83.69 91.03 94.34 91.54 80.54 90.06 87.39 78.49 86.75 5.05 90.44 Joakim Nivre, Johan Hall, Jens Nilsson, Gülşen Eryiğit, Svetoslav Marinov Växjö University, Istanbul Technical University, University of Skövde
63.51 71.45 71.64 84.57 82.69 83.73 88.93 77.14 65.53 74.25 76.16 69.07 75.72 7.99 0.00 Michael Schiehlen IMS, Uni Stuttgart
68.50 79.80 69.34 84.41 69.61 79.27 80.25 76.46 70.52 81.69 68.93 54.25 73.59 8.37 75.46 Jinshan Ma ?
69.98 79.48 71.54 75.25 69.13 82.51 87.09 82.41 72.18 82.71 71.40 58.97 75.22 7.90 81.79 Markus Dreyer, David A. Smith, and Noah A. Smith Johns Hopkins University
80.00 88.93 83.44 87.74 82.83 90.58 93.52 88.70 80.42 89.04 85.14 70.21 85.05 6.24 88.71 John O'Neil Basis Technology, Inc.
75.69 87.28 80.42 86.51 80.15 91.03 92.18 88.84 79.26 89.26 84.82 73.75 84.10 6.10 0.00 Quang Xuan Do, Ming-Wei Chang University of Illinois at Urbana-Champaign
79.06 77.10 82.14 87.17 81.15 89.10 89.51 89.42 80.70 88.42 83.21 77.63 83.72 4.77 0.00 Richard Johansson and Pierre Nugues Department of Computer Science, Lund University, Sweden
79.50 88.23 86.72 89.22 83.89 92.11 93.74 90.46 82.51 90.40 85.58 77.45 86.65 5.04 90.70 Ryan McDonald, Kevin Lerman, Fernando Pereira University of Pennsylvania
80.18 91.93 77.70 88.22 83.51 91.15 93.46 88.54 80.42 88.08 84.25 70.80 84.85 6.68 0.00 Sebastian Riedel, Ivan Meza-Ruiz, Ruken Çakıcı University of Edinburgh
79.18 87.16 83.80 88.08 81.75 90.97 93.56 90.22 80.96 88.98 85.12 77.71 85.62 5.02 0.00 Kenji Sagae Carnegie Mellon University
78.74 0.00 0.00 83.15 0.00 0.00 0.00 0.00 76.98 86.05 83.13 67.50 39.63 41.63 0.00 Nobuyuki Shimizu SUNY/Albany
76.81 82.21 82.18 86.89 79.53 89.18 93.20 88.88 81.91 89.46 82.33 74.95 83.96 5.57 86.57 Simon Corston-Oliver and Anthony Aue Microsoft Research
79.02 86.42 83.52 86.11 75.83 90.67 92.40 88.00 80.96 88.90 83.99 73.91 84.14 5.78 89.27 Yuchang Cheng Nara Institute of Science and Technology
AV 75.12 81.66 76.59 84.50 77.57 86.26 89.90 85.35 76.31 85.71 80.00 69.59 - - 84.38
SD 5.49 7.92 7.69 4.35 5.92 6.01 5.36 5.45 6.40 4.56 6.24 7.94 - - 5.23
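
For orientation, the three metrics tabulated above are conventionally defined per token: LAS counts tokens whose predicted head and dependency label are both correct, UAS counts tokens whose head is correct, and label accuracy counts tokens whose label is correct. The following is a minimal sketch of those definitions, not a reimplementation of eval.pl. The function name `attachment_scores` and the toy data are hypothetical, and the handling of punctuation tokens is left to the caller.

```python
def attachment_scores(gold, predicted):
    """Compute LAS, UAS and label accuracy as percentages.

    gold, predicted: equal-length lists with one (head, deprel) pair per
    scored token. Tokens that should not be scored (for example
    punctuation, in the default setting) must be removed beforehand.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")

    n = len(gold)
    both = sum(g == p for g, p in zip(gold, predicted))          # head and label
    heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))   # head only
    labels = sum(g[1] == p[1] for g, p in zip(gold, predicted))  # label only
    return {
        "LAS": 100.0 * both / n,
        "UAS": 100.0 * heads / n,
        "LA": 100.0 * labels / n,
    }


# Toy four-token sentence (invented for illustration).
gold = [(2, "SBJ"), (0, "ROOT"), (2, "OBJ"), (3, "NMOD")]
pred = [(2, "SBJ"), (0, "ROOT"), (2, "NMOD"), (2, "NMOD")]
scores = attachment_scores(gold, pred)
print(scores)  # {'LAS': 50.0, 'UAS': 75.0, 'LA': 75.0}
```

Because a token counts toward LAS only when both its head and its label are correct, LAS can never exceed UAS or label accuracy for the same output, which matches the pattern in the three tables.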

Significance

Significance was computed with version 1.8 of eval.pl and Dan Bikel's Randomized Parsing Evaluation Comparator (Statistical Significance Tester for evalb Output), using its default of 10,000 iterations. Differences are taken to be significant if p < 0.05.
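
As a rough illustration of how a randomized comparison of two parsers works, the sketch below implements a generic paired randomization test over per-sentence counts. It is an assumption-laden stand-in, not Bikel's code: the function name `randomization_test`, its parameters, and the one-sided counting rule are all illustrative choices. The one-sided form is chosen only because near-ties on this page have p close to 0.5.

```python
import random


def randomization_test(correct_a, correct_b, iterations=10_000, seed=0):
    """Paired randomization test on per-sentence counts of correct tokens.

    correct_a[i], correct_b[i]: number of correctly attached tokens that
    systems A and B produced for sentence i (same sentences, same order).
    Returns (p, num), where num is the number of shuffles whose
    difference was at least as large as the observed one.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same sentences")

    rng = random.Random(seed)
    observed = abs(sum(correct_a) - sum(correct_b))
    # Orient the comparison so that the observed difference is non-negative.
    sign = 1 if sum(correct_a) >= sum(correct_b) else -1

    num = 0
    for _ in range(iterations):
        shuffled_diff = 0
        for a, b in zip(correct_a, correct_b):
            # Swap the two systems' results for this sentence half the time.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_diff += a - b
        if sign * shuffled_diff >= observed:
            num += 1

    # Add-one correction so that p is never exactly zero.
    p = (num + 1) / (iterations + 1)
    return p, num
```

With this correction, a comparison in which no shuffle reaches the observed difference yields p = 1/10001 for 10,000 iterations, which is about 9.999e-05.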

For LAS (default: without scoring punctuation)

Using the -p option of eval.pl

Lang12      80.27 80.19 78.43 77.94 77.84 77.70 76.94 76.80 74.94 74.72 71.71 70.79 70.00 65.23 65.00 63.29 62.82 61.23 34.20
  1) McDonald 80.27   2) Nivre 80.19
     Not significant (p = 0.34956504349565 ; diff = -0.0766040816325955 ; num = 3495)
  1) McDonald 80.27   3) O'Neil 78.43
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -1.84425855893407 ; num = 0)
  1) McDonald 80.27   4) Riedel 77.94
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.33402299042056 ; num = 0)
  1) McDonald 80.27   5) Sagae 77.84
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.43062973760922 ; num = 0)
  2) Nivre 80.19   3) O'Neil 78.43
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -1.76765447730148 ; num = 0)
  3) O'Neil 78.43   4) Riedel 77.94
     SIGNIFICANT     (p = 0.0001999800019998 ; diff = -0.489764431486492 ; num = 1)
  4) Riedel 77.94   5) Sagae 77.84
     Not significant (p = 0.320567943205679 ; diff = -0.0966067471886589 ; num = 3205)

Arabic      66.91 66.71 66.71 66.65 65.19 64.29 63.81 63.53 62.83 62.71 60.94 60.92 57.64 55.37 53.81 53.37 52.42 50.74 44.39
  1) McDonald 66.91   2) O'Neil 66.71
     Not significant (p = 0.386261373862614 ; diff = -0.200438877755573 ; num = 3862)
  1) McDonald 66.91   3) Nivre 66.71
     Not significant (p = 0.401859814018598 ; diff = -0.200579158316685 ; num = 4018)
  1) McDonald 66.91   4) Riedel 66.65
     Not significant (p = 0.339466053394661 ; diff = -0.260196392785588 ; num = 3394)
  1) McDonald 66.91   5) Cheng 65.19
     SIGNIFICANT     (p = 0.0075992400759924 ; diff = -1.72336472945896 ; num = 75)
  2) O'Neil 66.71   3) Nivre 66.71
     Not significant (p = 0.495850414958504 ; diff = -0.000140280561112149 ; num = 4958)
  3) Nivre 66.71   4) Riedel 66.65
     Not significant (p = 0.473252674732527 ; diff = -0.0596172344689023 ; num = 4732)
  4) Riedel 66.65   5) Cheng 65.19
     SIGNIFICANT     (p = 0.0150984901509849 ; diff = -1.46316833667338 ; num = 150)

Bulgarian   87.57 87.41 86.34 85.24 83.36 83.30 79.73 79.21 78.74 74.81 73.49 72.89 67.64  0.00  0.00  0.00  0.00  0.00  0.00
  1) McDonald 87.57   2) Nivre 87.41
     Not significant (p = 0.396860313968603 ; diff = -0.159477358866937 ; num = 3968)
  1) McDonald 87.57   3) Cheng 86.34
     SIGNIFICANT     (p = 0.0073992600739926 ; diff = -1.23680031917007 ; num = 73)
  1) McDonald 87.57   4) O'Neil 85.24
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.33374027528419 ; num = 0)
  1) McDonald 87.57   5) Corston-Oliver 83.36
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -4.20888489926183 ; num = 0)
  2) Nivre 87.41   3) Cheng 86.34
     SIGNIFICANT     (p = 0.018998100189981 ; diff = -1.07732296030314 ; num = 189)
  3) Cheng 86.34   4) O'Neil 85.24
     SIGNIFICANT     (p = 0.0118988101189881 ; diff = -1.09693995611411 ; num = 118)
  4) O'Neil 85.24   5) Corston-Oliver 83.36
     SIGNIFICANT     (p = 0.0004999500049995 ; diff = -1.87514462397765 ; num = 4)

Chinese     89.96 86.92 86.70 85.90 85.05 84.73 84.27 83.68 79.92 78.37 76.18 75.29 74.81 72.72 72.49 71.63 66.20 54.89  0.00
  1) Riedel 89.96   2) Nivre 86.92
     SIGNIFICANT     (p = 0.0001999800019998 ; diff = -3.03812072434619 ; num = 1)
  1) Riedel 89.96   3) O'Neil 86.70
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -3.25949094567422 ; num = 0)
  1) Riedel 89.96   4) McDonald 85.90
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -4.06434004024162 ; num = 0)
  1) Riedel 89.96   5) Do/Chang 85.05
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -4.90931790744472 ; num = 0)
  2) Nivre 86.92   3) O'Neil 86.70
     Not significant (p = 0.358764123587641 ; diff = -0.22137022132803 ; num = 3587)
  3) O'Neil 86.70   4) McDonald 85.90
     SIGNIFICANT     (p = 0.0444955504449555 ; diff = -0.804849094567402 ; num = 444)
  4) McDonald 85.90   5) Do/Chang 85.05
     Not significant (p = 0.107389261073893 ; diff = -0.844977867203099 ; num = 1073)

Czech       80.18 78.42 76.60 76.24 75.24 74.48 72.88 71.46 68.82 67.44 63.02 60.92 60.54 59.76 59.36 58.52 53.34 51.86  0.00
  1) McDonald 80.18   2) Nivre 78.42
     SIGNIFICANT     (p = 0.0098990100989901 ; diff = -1.76019000000009 ; num = 98)
  1) McDonald 80.18   3) O'Neil 76.60
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -3.58013599999998 ; num = 0)
  1) McDonald 80.18   4) Cheng 76.24
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -3.94030400000008 ; num = 0)
  1) McDonald 80.18   5) Sagae 75.24
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -4.94000200000001 ; num = 0)
  2) Nivre 78.42   3) O'Neil 76.60
     SIGNIFICANT     (p = 0.00999900009999 ; diff = -1.81994599999989 ; num = 99)
  3) O'Neil 76.60   4) Cheng 76.24
     Not significant (p = 0.318268173182682 ; diff = -0.360168000000101 ; num = 3182)
  4) Cheng 76.24   5) Sagae 75.24
     Not significant (p = 0.126887311268873 ; diff = -0.999697999999924 ; num = 1268)

Danish      84.79 84.77 83.63 82.83 81.74 81.72 81.56 81.54 80.60 79.74 78.38 77.90 77.70 76.05 75.81 74.61 71.56 66.61 66.35
  1) McDonald 84.79   2) Nivre 84.77
     Not significant (p = 0.497650234976502 ; diff = -0.0196606786427225 ; num = 4976)
  1) McDonald 84.79   3) Riedel 83.63
     SIGNIFICANT     (p = 0.0047995200479952 ; diff = -1.15756087824353 ; num = 47)
  1) McDonald 84.79   4) O'Neil 82.83
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -1.95614171656689 ; num = 0)
  1) McDonald 84.79   5) Corston-Oliver 81.74
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -3.05375049900198 ; num = 0)
  2) Nivre 84.77   3) Riedel 83.63
     SIGNIFICANT     (p = 0.0271972802719728 ; diff = -1.13790019960081 ; num = 271)
  3) Riedel 83.63   4) O'Neil 82.83
     SIGNIFICANT     (p = 0.0097990200979902 ; diff = -0.79858083832336 ; num = 97)
  4) O'Neil 82.83   5) Corston-Oliver 81.74
     SIGNIFICANT     (p = 0.012998700129987 ; diff = -1.09760878243509 ; num = 129)

Dutch       79.19 78.59 78.59 77.51 76.61 74.59 72.91 72.67 72.11 71.77 71.43 69.51 68.45 67.25 62.75 61.56 59.36 58.24  0.00
  1) McDonald 79.19   2) Nivre 78.59
     Not significant (p = 0.204979502049795 ; diff = -0.600554221688768 ; num = 2049)
  1) McDonald 79.19   3) Riedel 78.59
     Not significant (p = 0.131986801319868 ; diff = -0.600422168867652 ; num = 1319)
  1) McDonald 79.19   4) O'Neil 77.51
     SIGNIFICANT     (p = 0.0014998500149985 ; diff = -1.68097839135656 ; num = 14)
  1) McDonald 79.19   5) Sagae 76.61
     SIGNIFICANT     (p = 0.0001999800019998 ; diff = -2.58084633853545 ; num = 1)
  2) Nivre 78.59   3) Riedel 78.59
     Not significant (p = 0.495750424957504 ; diff = 0.00013205282111528 ; num = 4957)
  3) Riedel 78.59   4) O'Neil 77.51
     SIGNIFICANT     (p = 0.0063993600639936 ; diff = -1.08055622248891 ; num = 63)
  4) O'Neil 77.51   5) Sagae 76.61
     Not significant (p = 0.113688631136886 ; diff = -0.89986794717889 ; num = 1136)

German      87.34 86.24 85.82 85.36 84.92 84.17 84.11 83.47 82.41 80.43 77.56 76.52 74.74 70.97 69.77 68.73 68.11 63.82  0.00
  1) McDonald 87.34   2) Riedel 86.24
     SIGNIFICANT     (p = 0.0094990500949905 ; diff = -1.09809904153353 ; num = 94)
  1) McDonald 87.34   3) Nivre 85.82
     SIGNIFICANT     (p = 0.0053994600539946 ; diff = -1.51769968051114 ; num = 53)
  1) McDonald 87.34   4) O'Neil 85.36
     SIGNIFICANT     (p = 0.0001999800019998 ; diff = -1.97680910543129 ; num = 1)
  1) McDonald 87.34   5) Sagae 84.92
     SIGNIFICANT     (p = 0.0003999600039996 ; diff = -2.41611821086259 ; num = 3)
  2) Riedel 86.24   3) Nivre 85.82
     Not significant (p = 0.244675532446755 ; diff = -0.419600638977613 ; num = 2446)
  3) Nivre 85.82   4) O'Neil 85.36
     Not significant (p = 0.240975902409759 ; diff = -0.459109424920143 ; num = 2409)
  4) O'Neil 85.36   5) Sagae 84.92
     Not significant (p = 0.257374262573743 ; diff = -0.439309105431306 ; num = 2573)

Japanese    91.65 90.71 90.57 90.51 90.37 90.11 89.95 89.91 89.07 88.13 87.41 85.63 84.75 84.35 83.35 82.87 70.84 65.38  0.00
  1) Nivre 91.65   2) McDonald 90.71
     SIGNIFICANT     (p = 0.0054994500549945 ; diff = -0.939738157105793 ; num = 54)
  1) Nivre 91.65   3) O'Neil 90.57
     SIGNIFICANT     (p = 0.0003999600039996 ; diff = -1.07970217869271 ; num = 3)
  1) Nivre 91.65   4) Riedel 90.51
     SIGNIFICANT     (p = 0.0010998900109989 ; diff = -1.13957025784526 ; num = 10)
  1) Nivre 91.65   5) Sagae 90.37
     SIGNIFICANT     (p = 0.0001999800019998 ; diff = -1.27942034779134 ; num = 1)
  2) McDonald 90.71   3) O'Neil 90.57
     Not significant (p = 0.311968803119688 ; diff = -0.13996402158692 ; num = 3119)
  3) O'Neil 90.57   4) Riedel 90.51
     Not significant (p = 0.406759324067593 ; diff = -0.059868079152551 ; num = 4067)
  4) Riedel 90.51   5) Sagae 90.37
     Not significant (p = 0.359264073592641 ; diff = -0.13985008994608 ; num = 3592)

Portuguese  87.60 86.82 86.01 85.07 84.69 84.59 84.57 84.43 83.99 83.37 81.47 78.18 77.42 75.36 75.28 71.13 71.01 70.35  0.00
  1) Nivre 87.60   2) McDonald 86.82
     Not significant (p = 0.0871912808719128 ; diff = -0.778947893791141 ; num = 871)
  1) Nivre 87.60   3) Sagae 86.01
     SIGNIFICANT     (p = 0.0004999500049995 ; diff = -1.59745258534642 ; num = 4)
  1) Nivre 87.60   4) Cheng 85.07
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.53578758235173 ; num = 0)
  1) Nivre 87.60   5) O'Neil 84.69
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.91475144739466 ; num = 0)
  2) McDonald 86.82   3) Sagae 86.01
     Not significant (p = 0.0687931206879312 ; diff = -0.818504691555276 ; num = 687)
  3) Sagae 86.01   4) Cheng 85.07
     SIGNIFICANT     (p = 0.0465953404659534 ; diff = -0.938334997005313 ; num = 465)
  4) Cheng 85.07   5) O'Neil 84.69
     Not significant (p = 0.245475452454755 ; diff = -0.378963865042934 ; num = 2454)

Slovene     73.44 72.42 71.42 71.20 71.08 70.30 69.52 69.06 68.43 67.83 66.43 64.57 64.31 59.19 58.73 57.21 57.19 55.06 50.72
  1) McDonald 73.44   2) Corston-Oliver 72.42
     Not significant (p = 0.0628937106289371 ; diff = -1.01914468425265 ; num = 628)
  1) McDonald 73.44   3) Cheng 71.42
     SIGNIFICANT     (p = 0.0033996600339966 ; diff = -2.01796962430059 ; num = 33)
  1) McDonald 73.44   4) Riedel 71.20
     SIGNIFICANT     (p = 0.0001999800019998 ; diff = -2.23829536370904 ; num = 1)
  1) McDonald 73.44   5) O'Neil 71.08
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.35802158273381 ; num = 0)
  2) Corston-Oliver 72.42   3) Cheng 71.42
     Not significant (p = 0.0761923807619238 ; diff = -0.998824940047939 ; num = 761)
  3) Cheng 71.42   4) Riedel 71.20
     Not significant (p = 0.365663433656634 ; diff = -0.220325739408452 ; num = 3656)
  4) Riedel 71.20   5) O'Neil 71.08
     Not significant (p = 0.406159384061594 ; diff = -0.11972621902477 ; num = 4061)

Spanish     82.25 81.29 80.46 80.36 79.82 79.72 78.16 77.68 77.38 77.16 73.17 72.99 71.37 69.63 68.32 67.62 67.44 65.08 46.96
  1) McDonald 82.25   2) Nivre 81.29
     Not significant (p = 0.107989201079892 ; diff = -0.961574834702475 ; num = 1079)
  1) McDonald 82.25   3) Cheng 80.46
     SIGNIFICANT     (p = 0.0120987901209879 ; diff = -1.78294730514931 ; num = 120)
  1) McDonald 82.25   4) Corston-Oliver 80.36
     SIGNIFICANT     (p = 0.0002999700029997 ; diff = -1.88341614906835 ; num = 2)
  1) McDonald 82.25   5) O'Neil 79.82
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.42434381887402 ; num = 0)
  2) Nivre 81.29   3) Cheng 80.46
     Not significant (p = 0.142285771422858 ; diff = -0.821372470446832 ; num = 1422)
  3) Cheng 80.46   4) Corston-Oliver 80.36
     Not significant (p = 0.433556644335566 ; diff = -0.100468843919046 ; num = 4335)
  4) Corston-Oliver 80.36   5) O'Neil 79.82
     Not significant (p = 0.141585841415858 ; diff = -0.540927669805669 ; num = 1415)

Swedish     84.58 82.55 82.31 82.00 81.78 81.08 80.66 79.69 79.49 79.15 78.65 78.13 74.09 71.72 71.10 68.77 67.58 65.23 63.83
  1) Nivre 84.58   2) McDonald 82.55
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.03136825333601 ; num = 0)
  1) Nivre 84.58   3) Do/Chang 82.31
     SIGNIFICANT     (p = 0.0001999800019998 ; diff = -2.27051782513445 ; num = 1)
  1) Nivre 84.58   4) Sagae 82.00
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.58940848436569 ; num = 0)
  1) Nivre 84.58   5) O'Neil 81.78
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.8082772356105 ; num = 0)
  2) McDonald 82.55   3) Do/Chang 82.31
     Not significant (p = 0.300969903009699 ; diff = -0.239149571798436 ; num = 3009)
  3) Do/Chang 82.31   4) Sagae 82.00
     Not significant (p = 0.267673232676732 ; diff = -0.318890659231243 ; num = 2676)
  4) Sagae 82.00   5) O'Neil 81.78
     Not significant (p = 0.347065293470653 ; diff = -0.218868751244813 ; num = 3470)

Turkish     65.68 63.39 63.21 63.19 61.74 61.22 60.51 60.31 58.61 58.06 57.52 55.09 54.23 53.87 51.07 49.81 46.05 41.72 37.80
  1) Nivre 65.68   2) Johansson 63.39
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.29034455287789 ; num = 0)
  1) Nivre 65.68   3) Sagae 63.21
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.46960963951408 ; num = 0)
  1) Nivre 65.68   4) McDonald 63.19
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -2.48964150567618 ; num = 0)
  1) Nivre 65.68   5) Corston-Oliver 61.74
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -3.94360087631949 ; num = 0)
  2) Johansson 63.39   3) Sagae 63.21
     Not significant (p = 0.385361463853615 ; diff = -0.179265086636192 ; num = 3853)
  3) Sagae 63.21   4) McDonald 63.19
     Not significant (p = 0.467853214678532 ; diff = -0.0200318661620997 ; num = 4678)
  4) McDonald 63.19   5) Corston-Oliver 61.74
     SIGNIFICANT     (p = 0.00999900009999 ; diff = -1.4539593706433 ; num = 99)
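
Each comparison line above reports a p value, a score difference (`diff`) and a count (`num`). Judging from the listed values, p appears to equal (num + 1) / (iterations + 1) with 10,000 iterations. For example, num = 0 gives 1/10001, about 9.999e-05. That relation is inferred from the numbers on this page, not from the tool's documentation. The hypothetical helper below parses one such line and checks the relation.

```python
import re

# Matches the "(p = ... ; diff = ... ; num = ...)" part of a result line.
RESULT = re.compile(
    r"p = (?P<p>[0-9.eE+-]+) ; diff = (?P<diff>[0-9.eE+-]+) ; num = (?P<num>\d+)"
)


def parse_result(line, iterations=10_000, alpha=0.05):
    """Extract p, diff and num from one result line of the listing."""
    match = RESULT.search(line)
    if match is None:
        raise ValueError("not a result line: " + line)
    p = float(match["p"])
    num = int(match["num"])
    return {
        "p": p,
        "diff": float(match["diff"]),
        "num": num,
        # p value implied by num under the assumed relation.
        "implied_p": (num + 1) / (iterations + 1),
        "significant": p < alpha,
    }


example = parse_result(
    "Not significant (p = 0.34956504349565 ; diff = -0.0766040816325955 ; num = 3495)"
)
print(example["significant"], abs(example["p"] - example["implied_p"]) < 1e-12)
```

The same check can be run on any other line of the listing to confirm that the printed verdict agrees with the p < 0.05 threshold.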

For UAS (default: without scoring punctuation)

Lang12      86.60 85.48 85.30 85.03 84.59 84.37 83.71 83.54 81.19 80.40 78.41 78.38 77.51 76.17 74.47 73.47 72.12 70.72 38.69
  1) McDonald 86.60   2) Nivre 85.48
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -0.951225420645528 ; num = 0)
  2) Nivre 85.48   3) O'Neil 85.30
     SIGNIFICANT     (p = 0.0070992900709929 ; diff = -0.36026162867816 ; num = 70)

Arabic      79.34 78.62 78.54 78.40 77.74 77.52 76.09 75.53 75.45 74.59 74.27 74.11 72.65 69.50 68.98 68.82 68.46 64.79 62.63
  1) McDonald 79.34   2) Riedel 78.62
     Not significant (p = 0.0705929407059294 ; diff = -0.674653609909583 ; num = 705)
  2) Riedel 78.62   3) O'Neil 78.54
     Not significant (p = 0.322267773222678 ; diff = -0.154388122779622 ; num = 3222)

Bulgarian   92.04 91.72 91.30 90.72 90.09 88.81 85.50 85.24 84.16 82.51 81.95 78.56 73.97  0.00  0.00  0.00  0.00  0.00  0.00
  1) McDonald 92.04   2) Nivre 91.72
     Not significant (p = 0.297770222977702 ; diff = -0.234827800413541 ; num = 2977)
  2) Nivre 91.72   3) Cheng 91.30
     Not significant (p = 0.177182281771823 ; diff = -0.36526162101579 ; num = 1771)

Chinese     93.18 91.07 90.64 90.54 90.00 89.64 89.60 89.46 88.65 83.06 82.86 81.33 79.90 79.48 78.37 77.63 77.04 74.87  0.00
  1) Riedel 93.18   2) McDonald 91.07
     SIGNIFICANT     (p = 0.0005999400059994 ; diff = -1.21794846924494 ; num = 5)
  2) McDonald 91.07   3) O'Neil 90.64
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -1.40016338788998 ; num = 0)

Czech       87.30 85.58 84.80 83.40 83.02 82.64 81.78 77.44 77.40 77.32 74.82 73.44 72.88 72.24 70.74 68.14 66.86 66.36  0.00
  1) McDonald 87.30   2) O'Neil 85.58
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -1.66345250666407 ; num = 0)
  2) O'Neil 85.58   3) Nivre 84.80
     Not significant (p = 0.301869813018698 ; diff = -0.291146555036377 ; num = 3018)

Danish      90.58 89.80 89.66 88.78 88.64 87.94 86.85 86.59 86.53 85.67 83.39 82.93 81.94 81.72 80.54 79.90 78.84 78.16 77.45
  1) McDonald 90.58   2) Nivre 89.80
     Not significant (p = 0.0771922807719228 ; diff = -0.589205302003393 ; num = 771)
  2) Nivre 89.80   3) Riedel 89.66
     Not significant (p = 0.313468653134687 ; diff = -0.219810263934804 ; num = 3134)

Dutch       83.57 82.91 81.73 81.35 80.71 77.79 76.25 76.01 75.59 75.49 74.83 74.47 71.75 71.39 68.93 68.33 66.17 64.07  0.00
  1) McDonald 83.57   2) Riedel 82.91
     Not significant (p = 0.0602939706029397 ; diff = -0.676567763708292 ; num = 602)
  2) Riedel 82.91   3) O'Neil 81.73
     SIGNIFICANT     (p = 0.0006999300069993 ; diff = -1.30306230659862 ; num = 6)

German      90.38 89.76 89.16 88.76 87.92 87.66 87.20 86.90 85.90 83.09 80.25 80.01 79.79 79.73 76.98 73.00 72.64 67.71  0.00
  1) McDonald 90.38   2) Riedel 89.76
     SIGNIFICANT     (p = 0.042995700429957 ; diff = -0.634872872209115 ; num = 429)
  2) Riedel 89.76   3) O'Neil 89.16
     SIGNIFICANT     (p = 0.0024997500249975 ; diff = -0.554039100452172 ; num = 24)

Japanese    93.16 93.12 93.10 92.96 92.84 92.84 92.20 91.74 90.79 90.77 89.67 87.85 87.31 87.11 86.71 85.97 82.05 72.64  0.00
  1) O'Neil 93.16   2) Cheng 93.12
     Not significant (p = 0.447655234476552 ; diff = 0.0444399511914781 ; num = 4476)
  2) Cheng 93.12   3) Nivre 93.10
     Not significant (p = 0.490650934906509 ; diff = -0.00754089712938821 ; num = 4906)

Portuguese  91.36 91.22 90.30 89.78 89.70 89.42 88.96 88.60 88.40 87.76 85.61 85.57 85.03 84.29 82.41 81.27 79.46 77.10  0.00
  1) McDonald 91.36   2) Nivre 91.22
     Not significant (p = 0.501849815018498 ; diff = 0.00421041635529207 ; num = 5018)
  2) Nivre 91.22   3) Cheng 90.30
     SIGNIFICANT     (p = 0.0196980301969803 ; diff = -0.91647187546333 ; num = 196)

Slovene     83.17 83.17 81.77 81.71 81.14 80.32 78.72 78.02 77.72 76.92 75.06 74.88 74.36 74.02 72.88 72.14 70.60 68.94 68.45
  1) McDonald 83.17   2) Riedel 83.17
     Not significant (p = 0.132386761323868 ; diff = -0.471202811969547 ; num = 1323)
  2) Riedel 83.17   3) Corston-Oliver 81.77
     SIGNIFICANT     (p = 0.0258974102589741 ; diff = -0.905599548709873 ; num = 258)

Spanish     86.05 85.15 84.87 84.67 84.11 83.09 81.43 81.13 81.05 80.77 77.58 76.20 75.76 74.25 73.89 72.85 71.33 70.07 53.18
  1) McDonald 86.05   2) Cheng 85.15
     Not significant (p = 0.0724927507249275 ; diff = -1.00142453875124 ; num = 724)
  2) Cheng 85.15   3) Corston-Oliver 84.87
     Not significant (p = 0.31006899310069 ; diff = -0.337954497765551 ; num = 3100)

Swedish     89.54 89.50 89.05 88.93 88.57 88.57 88.45 88.33 86.62 85.54 85.08 84.17 83.03 82.65 79.69 79.53 76.24 73.25 73.19
  1) Corston-Oliver 89.54   2) Nivre 89.50
     Not significant (p = 0.413258674132587 ; diff = 0.107872146044144 ; num = 4132)
  2) Nivre 89.50   3) Do/Chang 89.05
     Not significant (p = 0.221577842215778 ; diff = -0.343394880410116 ; num = 2215)

Turkish     75.82 74.67 74.49 74.07 73.59 73.31 73.15 73.11 72.02 71.54 70.05 69.25 68.77 65.50 65.25 64.19 61.58 60.45 56.90
  1) Nivre 75.82   2) McDonald 74.67
     Not significant (p = 0.188081191880812 ; diff = -0.467515567314209 ; num = 1880)
  2) McDonald 74.67   3) Cheng 74.49
     Not significant (p = 0.102389761023898 ; diff = -0.670985862743649 ; num = 1023)

For LAS when scoring punctuation

Lang12      80.56 80.23 78.66 78.27 78.22 78.16 77.57 76.33 75.23 75.08 71.05 69.73 69.13 63.55 63.50 62.43 61.69 58.39 36.00
  1) McDonald 80.56   2) Nivre 80.23
     SIGNIFICANT     (p = 0.0460953904609539 ; diff = -0.326017028789465 ; num = 460)
  2) Nivre 80.23   3) O'Neil 78.66
     SIGNIFICANT     (p = 9.99900009999e-05 ; diff = -1.57469131759578 ; num = 0)

Arabic      67.00 66.98 66.95 66.69 65.40 64.82 64.40 63.67 63.15 63.09 61.42 59.07 56.23 53.42 52.91 51.98 50.79 50.47 43.40
  1) O'Neil 67.00   2) Nivre 66.98
     Not significant (p = 0.4995500449955 ; diff = -0.0183044853899474 ; num = 4995)
  2) Nivre 66.98   3) McDonald 66.95
     Not significant (p = 0.486051394860514 ; diff = -0.0371207891308387 ; num = 4860)

Bulgarian   88.07 88.05 86.97 85.51 83.99 83.60 78.90 76.88 76.53 74.67 71.17 69.68 64.93  0.00  0.00  0.00  0.00  0.00  0.00
  1) McDonald 88.07   2) Nivre 88.05
     Not significant (p = 0.478152184781522 ; diff = -0.0169396697000792 ; num = 4781)
  2) Nivre 88.05   3) Cheng 86.97
     SIGNIFICANT     (p = 0.0242975702429757 ; diff = -1.0784378159757 ; num = 242)

Chinese     89.96 86.95 86.65 85.85 85.08 84.74 84.18 83.60 79.93 78.35 76.20 75.26 74.72 72.55 72.53 71.69 66.10 55.05  0.00
  1) Riedel 89.96   2) Nivre 86.95
     SIGNIFICANT     (p = 0.0001999800019998 ; diff = -3.01265562649654 ; num = 1)
  2) Nivre 86.95   3) O'Neil 86.65
     Not significant (p = 0.303569643035696 ; diff = -0.299319632880994 ; num = 3035)

Czech       80.23 77.60 76.99 76.68 75.79 75.12 71.04 69.84 68.41 67.95 64.14 60.81 59.34 57.15 55.56 55.48 49.92 47.72  0.00
  1) McDonald 80.23   2) Nivre 77.60
     SIGNIFICANT     (p = 0.0003999600039996 ; diff = -2.63110029044938 ; num = 3)
  2) Nivre 77.60   3) O'Neil 76.99
     Not significant (p = 0.200679932006799 ; diff = -0.615270801298522 ; num = 2006)

Danish      84.71 84.59 83.10 81.99 81.63 81.51 81.49 81.07 80.64 78.81 76.06 75.15 74.90 73.22 73.12 69.26 68.69 66.92 60.56
  1) Nivre 84.71   2) McDonald 84.59
     Not significant (p = 0.411758824117588 ; diff = -0.119760765550254 ; num = 4117)
  2) McDonald 84.59   3) Riedel 83.10
     SIGNIFICANT     (p = 0.0005999400059994 ; diff = -1.48636021872862 ; num = 5)

Dutch       81.15 80.63 80.61 79.70 78.84 77.06 75.52 75.33 74.50 74.09 72.44 71.57 70.33 66.48 65.78 65.19 62.04 60.14  0.00
  1) McDonald 81.15   2) Riedel 80.63
     Not significant (p = 0.147685231476852 ; diff = -0.519108325872935 ; num = 1476)
  2) Riedel 80.63   3) Nivre 80.61
     Not significant (p = 0.479052094790521 ; diff = -0.0180931065353747 ; num = 4790)

German      87.30 86.00 85.55 85.21 84.47 84.11 83.83 83.28 81.91 79.84 75.80 74.55 74.08 70.09 67.76 66.40 64.65 61.03  0.00
  1) McDonald 87.30   2) Riedel 86.00
     SIGNIFICANT     (p = 0.0020997900209979 ; diff = -1.29961362838077 ; num = 20)
  2) Riedel 86.00   3) Nivre 85.55
     Not significant (p = 0.237776222377762 ; diff = -0.456520899192128 ; num = 2377)

Japanese    92.68 91.86 91.74 91.68 91.56 91.33 91.19 91.16 89.62 89.56 88.97 87.20 86.64 86.29 84.99 74.42 73.05 69.64  0.00
  1) Nivre 92.68   2) McDonald 91.86
     SIGNIFICANT     (p = 0.0057994200579942 ; diff = -0.823055506916504 ; num = 57)
  2) McDonald 91.86   3) O'Neil 91.74
     Not significant (p = 0.302569743025697 ; diff = -0.122771843810114 ; num = 3025)

Portuguese  85.60 84.93 83.98 83.04 82.92 82.32 82.27 82.09 81.83 81.01 78.42 74.06 73.34 68.64 66.75 65.47 65.37 64.84  0.00
  1) Nivre 85.60   2) McDonald 84.93
     Not significant (p = 0.177482251774823 ; diff = -0.664663371399456 ; num = 1774)
  2) McDonald 84.93   3) Sagae 83.98
     Not significant (p = 0.083991600839916 ; diff = -0.954567922277093 ; num = 839)

Slovene     72.39 71.86 70.81 70.25 69.95 68.39 68.31 67.76 65.52 65.26 64.59 63.33 61.17 55.54 54.95 54.63 52.94 49.01 47.31
  1) McDonald 72.39   2) Corston-Oliver 71.86
     Not significant (p = 0.22027797220278 ; diff = -0.531801251956097 ; num = 2202)
  2) Corston-Oliver 71.86   3) Cheng 70.81
     Not significant (p = 0.076992300769923 ; diff = -1.04879655712051 ; num = 769)

Spanish     80.61 79.47 78.63 78.54 77.99 77.96 76.26 75.78 75.76 74.46 70.86 69.30 68.34 65.19 65.19 64.96 62.33 60.98 42.41
  1) McDonald 80.61   2) Nivre 79.47
     Not significant (p = 0.0891910808919108 ; diff = -1.14164207938182 ; num = 891)
  2) Nivre 79.47   3) Corston-Oliver 78.63
     Not significant (p = 0.141985801419858 ; diff = -0.843170003512483 ; num = 1419)

Swedish     83.89 82.32 82.02 81.72 80.99 80.92 80.69 79.76 79.51 78.52 76.98 76.93 72.01 68.90 67.68 64.64 62.94 61.69 60.70
  1) Nivre 83.89   2) McDonald 82.32
     SIGNIFICANT     (p = 0.0018998100189981 ; diff = -1.57379243281468 ; num = 18)
  2) McDonald 82.32   3) Do/Chang 82.02
     Not significant (p = 0.302569743025697 ; diff = -0.300456152758159 ; num = 3025)

Turkish     73.78 71.96 71.31 70.96 70.93 70.69 70.01 68.82 68.82 68.57 68.16 65.95 65.28 64.00 62.18 53.80 52.97 51.40 51.23
  1) Nivre 73.78   2) McDonald 71.96
     SIGNIFICANT     (p = 0.0002999700029997 ; diff = -1.81538094607124 ; num = 2)
  2) McDonald 71.96   3) Sagae 71.31
     Not significant (p = 0.107489251074893 ; diff = -0.648934676030152 ; num = 1074)
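
The p values printed above appear to be consistent with p = (num + 1) / (iterations + 1) for 10,000 iterations. This relation is inferred from the printed numbers, not taken from the comparator's documentation, and the function names below are hypothetical. A minimal Python sketch for checking a line of the listing by hand:

```python
def p_from_num(num, iterations=10_000):
    """Assumed relation between the 'num' and 'p' fields of the listing."""
    return (num + 1) / (iterations + 1)


def is_significant(p, alpha=0.05):
    """A difference counts as significant when p is below the threshold."""
    return p < alpha


if __name__ == "__main__":
    # 'num' values copied from three lines of the listing above.
    for num in (1, 242, 4860):
        p = p_from_num(num)
        label = "SIGNIFICANT" if is_significant(p) else "Not significant"
        print(f"num = {num:5d}  p = {p:.6f}  {label}")
```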

All official submissions of participants

in one tarball

Download online_results.tar.bz2 (1.3 MB)

These files are tarred and compressed with bzip2 and can be unpacked with either 'tar xjf filename' or 'tar xyf filename' (depending on your version of tar). Unpacking creates a directory online_results containing files named <participant-name>_<language-name>.txt.

Each file contains the HEAD and DEPREL columns as submitted by that shared task participant for the test data of that language. Due to licensing restrictions on most of the data, we cannot include the other columns of the test data, but if you have the test data, you can easily restore the original submission files with a UNIX tool like "paste".
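
For readers who prefer a script to "paste", the restoration could be sketched in Python as follows. This is only a sketch under assumptions: the test data is taken to be in the tab-separated 10-column CoNLL-X layout (HEAD and DEPREL in columns 7 and 8), each submission file is taken to hold two tab-separated columns that are line-aligned with the test data, and blank lines are taken to mark sentence boundaries in both files. The function name restore_submission is hypothetical.

```python
def restore_submission(test_lines, pred_lines):
    """Merge test-data rows with a participant's HEAD/DEPREL rows, line by line."""
    if len(test_lines) != len(pred_lines):
        raise ValueError("test data and submission are not line-aligned")
    restored = []
    for test_line, pred_line in zip(test_lines, pred_lines):
        test_line = test_line.rstrip("\n")
        pred_line = pred_line.rstrip("\n")
        if not test_line.strip():
            restored.append("")  # blank line: sentence boundary (assumed)
            continue
        cols = test_line.split("\t")
        head, deprel = pred_line.split("\t")[:2]
        # Keep columns 1-6, swap in the submitted HEAD and DEPREL,
        # and keep whatever the test data has in the remaining columns.
        restored.append("\t".join(cols[:6] + [head, deprel] + cols[8:]))
    return restored


if __name__ == "__main__":
    # Invented toy rows, not real shared task data.
    test = ["1\tDas\tdas\tART\tART\t_\t2\tNK\t_\t_\n",
            "2\tHaus\thaus\tNN\tNN\t_\t0\tROOT\t_\t_\n",
            "\n"]
    pred = ["2\tDET\n", "0\tROOT\n", "\n"]
    print("\n".join(restore_submission(test, pred)))
```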

We hope that these files will be useful for researchers trying to reproduce the shared task experiments for future comparison, and for those working on parser combinations.

Output of eval.pl

for concatenation of all submissions

Prior to calling eval.pl, gold standard and predicted DEPREL values were suffixed with a two-letter language code, e.g. the German DEPREL value 'APPR' becomes 'APPR_ge'. Output of eval.pl: everybody_allLangs.eval
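
The suffixing step might look like the following Python sketch. It is not the script that was actually used; it assumes tab-separated CoNLL-X rows with DEPREL in column 8 (index 7) and blank lines between sentences, and the name suffix_deprel is hypothetical.

```python
def suffix_deprel(lines, lang_code, deprel_index=7):
    """Append '_<lang_code>' to the DEPREL value of every token row."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            out.append(line)  # keep sentence boundaries untouched
            continue
        cols = line.split("\t")
        cols[deprel_index] = f"{cols[deprel_index]}_{lang_code}"
        out.append("\t".join(cols))
    return out


if __name__ == "__main__":
    # Invented toy row; only the DEPREL column changes.
    rows = ["1\tmit\tmit\tAPPR\tAPPR\t_\t2\tAPPR\t_\t_\n", "\n"]
    print("\n".join(suffix_deprel(rows, "ge")))
```

The same function would have to be applied to both the gold standard and the predicted files, since eval.pl compares the label strings.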

Older, partial result tables

Final average and standard deviation per language

AV: average; SD: standard deviation

Labeled attachment score
Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish Bulgarian
AV 59.94 78.32 67.17 78.31 70.73 78.58 85.86 80.63 65.16 73.52 76.44 55.95 79.98
SD 6.53 8.82 8.93 5.45 6.66 7.51 7.09 5.83 6.78 8.41 6.46 7.71 6.30
Unlabeled attachment score
Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish Bulgarian
AV 73.48 84.85 77.01 84.52 75.07 82.60 89.05 86.46 76.53 77.76 84.21 69.35 85.89
SD 4.94 5.99 6.70 4.29 5.78 6.73 5.20 4.17 4.67 7.81 5.45 5.51 5.60
Label accuracy
Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish Bulgarian
AV 75.12 81.66 76.59 84.50 77.57 86.26 89.90 85.35 76.31 85.71 80.00 69.59 84.38
SD 5.49 7.92 7.69 4.35 5.92 6.01 5.36 5.45 6.40 4.56 6.24 7.94 5.23

Top three scores per language and over all 12 required languages

Significance was computed with version 1.8 of eval.pl and Dan Bikel's Randomized Parsing Evaluation Comparator (Statistical Significance Tester for evalb Output), using its default of 10,000 iterations. Differences are taken to be significant if p < 0.05.
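
The general idea of such a randomized comparison can be sketched in Python as below. This is a simplified illustration, not the comparator's actual code: it works on per-sentence counts of correctly attached tokens rather than on evalb output, it uses a two-sided comparison, and it uses the common add-one estimate for p. All three choices are assumptions, and the function name is hypothetical.

```python
import random


def randomization_test(correct_a, correct_b, iterations=10_000, seed=0):
    """Paired randomization test on per-sentence correct-token counts.

    correct_a[i] and correct_b[i] are the numbers of correct tokens that
    systems A and B obtained on sentence i of the same test set.
    Returns (p, num), where num counts shuffles with a difference at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(correct_a) - sum(correct_b))
    num = 0
    for _ in range(iterations):
        diff = 0
        for a, b in zip(correct_a, correct_b):
            # Swap the two systems' results for this sentence with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            num += 1
    return (num + 1) / (iterations + 1), num


if __name__ == "__main__":
    # Invented counts for a five-sentence toy test set.
    p, num = randomization_test([9, 7, 8, 10, 6], [8, 7, 6, 9, 6], iterations=1000)
    print(f"p = {p:.4f} (num = {num})")
```

Because both systems are scored on the same tokens, comparing counts of correct tokens is equivalent to comparing the attachment scores themselves.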

Clarification: "12 l." shows how a system did on all 12 required languages combined. The top three for "12 l." are therefore the scores of the three systems that performed best overall (the "winners").
Technically, it is not the average of a system's scores on each language but the score over the total. I computed it by concatenating all 12 gold standard files and, for each system, all twelve submissions (with dummy submissions of zero accuracy for the few files that somebody failed to submit), and then applying eval.pl to these two concatenated files. Because our test sets are all roughly the same size (5000 scoring tokens), the score obtained from this concatenation differs by at most 0.01% from the average of the individual language scores, so the difference is irrelevant for the ranking. We make this subtle distinction because we can only compute the significance of differences in the concatenated results, not of differences in the averages.
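
The distinction between the score over the concatenation and the average of per-language scores can be illustrated with a small Python sketch. The token counts below are invented for illustration and are not the real test set sizes; the function names are hypothetical.

```python
def concatenated_score(correct, total):
    """Score over all tokens pooled together, as if the files were concatenated."""
    return 100.0 * sum(correct) / sum(total)


def average_score(correct, total):
    """Unweighted mean of the per-language scores."""
    return sum(100.0 * c / t for c, t in zip(correct, total)) / len(total)


if __name__ == "__main__":
    # Invented counts for three languages with nearly equal test set sizes.
    total = [5000, 5010, 4990]
    correct = [4000, 3500, 4500]
    print(f"concatenated: {concatenated_score(correct, total):.4f}")
    print(f"average:      {average_score(correct, total):.4f}")
```

When all test sets have exactly the same number of scoring tokens the two numbers coincide, and the gap grows with the spread in sizes and scores.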

Top three: labeled attachment
Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish 12 l. Bulgarian
1st 66.91 89.96 80.18 84.79 79.19 87.34 91.65 87.60 73.44 82.25 84.58 65.68 80.27 87.57
2nd 66.71 86.92 78.42 84.77 78.59 86.24 90.71 86.82 72.42 81.29 82.55 63.39 80.19 87.41
3rd 66.71 86.70 76.60 83.63 78.59 85.82 90.57 86.01 71.42 80.46 82.31 63.21 78.43 86.34
Difference 1st to 2nd significant? (p = ...)
No Yes Yes No No Yes Yes No No No Yes Yes No No
0.382 0.000 0.011 0.500 0.196 0.010 0.004 0.090 0.060 0.112 0.000 0.000 0.348 0.401
Difference 2nd to 3rd significant? (p = ...)
No No Yes Yes No No No No No No No No Yes Yes
0.493 0.349 0.008 0.023 0.495 0.246 0.322 0.073 0.076 0.144 0.314 0.380 0.000 0.017

Top three: unlabeled attachment
Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish 12 l. Bulgarian
1st 79.34 93.18 87.30 90.58 83.57 90.38 93.16 91.36 83.17 86.05 89.54 75.82 86.60 92.04
2nd 78.62 91.07 85.58 89.80 82.91 89.76 93.12 91.22 83.17 85.15 89.50 74.67 85.48 91.72
3rd 78.54 90.64 84.80 89.66 81.73 89.16 93.10 90.30 81.77 84.87 89.05 74.49 85.30 91.30

Top three: label accuracy
Arabic Chinese Czech Danish Dutch German Japanese Portuguese Slovene Spanish Swedish Turkish 12 l. Bulgarian
1st 80.34 91.93 86.72 89.22 83.89 92.11 94.34 91.54 82.51 90.40 87.39 78.49 86.75 90.70
2nd 80.18 89.01 85.40 89.16 83.69 91.15 93.74 90.46 81.91 90.06 85.58 77.71 86.65 90.44
3rd 80.00 88.93 83.80 88.22 83.51 91.03 93.58 90.22 81.10 89.46 85.14 77.63 85.62 89.27