Obtaining linguistic annotation from novice crowdworkers is far from trivial. A case in point is the annotation of discourse relations, which is a complicated task. Recent methods have obtained promising results by extracting relation labels from either discourse connectives (DCs) or question-answer (QA) pairs that participants provide. The current contribution studies the effect of worker selection and training on the agreement between workers' implicit relation labels and gold labels, for both the DC and the QA method. In Study 1, workers were not specifically selected or trained, and the results show that there is much room for improvement. Study 2 shows that a combination of selection and training does lead to improved results, but the method is cost- and time-intensive. Study 3 shows that a selection-only approach is a viable alternative; it yields annotations comparable in quality to those from trained participants. The results generalized over both the DC and QA methods and therefore indicate that a selection-only approach could also be effective for other crowdsourced discourse annotation tasks.
|Title of host publication||2022 Language Resources and Evaluation Conference, LREC 2022|
|Editors||Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis|
|Publisher||European Language Resources Association (ELRA)|
|Number of pages||9|
|State||Published - 2022|
|Event||13th International Conference on Language Resources and Evaluation Conference, LREC 2022 - Marseille, France|
Duration: 20 Jun 2022 → 25 Jun 2022
Bibliographical note
Funding Information:
This research was funded in part by the German Research Foundation (DFG) as part of SFB 1102 “Information Density and Linguistic Encoding”, by grants from Intel Labs, Facebook, the Israel Science Foundation grant 1951/17 and by the European Research Council (ERC) under the Horizon 2020 research and innovation program, grant agreement No. 677352 (NL-PRO), for which we are grateful.
© European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.
- discourse annotations
- participant selection