ABSTRACT
Quality control is an essential, if not the essential, challenge in crowdsourcing. Unsatisfactory responses from crowd workers have been found to result particularly from ambiguous and incomplete task descriptions, often written by inexperienced task requesters. However, creating clear task descriptions with sufficient information is a complex process for requesters in crowdsourcing marketplaces. In this paper, we investigate the extent to which requesters can be supported effectively in this process through computational techniques. To this end, we developed a tool that enables requesters to iteratively identify and correct eight common clarity flaws in their task descriptions before deploying them on a platform. The tool can be used to write task descriptions from scratch or to assess and improve the clarity of prepared descriptions. It employs machine learning-based natural language processing models, trained on real-world task descriptions, that score a given task description for each of the eight clarity flaws. On this basis, the requester can iteratively revise and reassess the description until it reaches a sufficient level of clarity. In a first user study, we asked requesters to create task descriptions with the tool and then to rate different aspects of its helpfulness. In a second user study, we asked crowd workers, who are confronted with such descriptions in practice, to rate the clarity of the created task descriptions. According to our results, 65% of the requesters rated the helpfulness of the information provided by the tool as high or very high (only 12% as low or very low). The requesters saw some room for improvement, though, for example concerning the display of bad examples. Nevertheless, 76% of the crowd workers judged the overall clarity of the task descriptions created with the tool to be improved over the initial versions. In line with this, the automatically computed clarity scores of the edited task descriptions were generally higher than those of the initial descriptions, indicating that the tool reliably predicts the clarity of task descriptions overall.
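To illustrate the scoring step described above, the following is a minimal sketch, assuming one fine-tuned transformer classifier per clarity flaw. The flaw labels, model paths, and the per-flaw binary setup are hypothetical placeholders for illustration only; they are not the authors' released implementation.

```python
# Minimal sketch (not the paper's actual code): score a task description for
# several clarity flaws, assuming one fine-tuned transformer classifier per flaw.
# FLAW names and model paths below are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

FLAWS = [  # placeholder labels standing in for the paper's eight clarity flaws
    "ambiguity", "incompleteness", "inconsistency", "vagueness",
    "complex_language", "missing_examples", "unclear_reward", "poor_structure",
]

def score_clarity(description: str, model_dir_template: str = "models/{flaw}"):
    """Return, for each flaw, the estimated probability that it is present."""
    scores = {}
    for flaw in FLAWS:
        path = model_dir_template.format(flaw=flaw)  # one fine-tuned model per flaw
        tokenizer = AutoTokenizer.from_pretrained(path)
        model = AutoModelForSequenceClassification.from_pretrained(path)
        inputs = tokenizer(description, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assume label index 1 means "flaw present"; report its softmax probability.
        scores[flaw] = torch.softmax(logits, dim=-1)[0, 1].item()
    return scores

if __name__ == "__main__":
    draft = "Please label the images. Payment after review."
    for flaw, prob in score_clarity(draft).items():
        print(f"{flaw}: {prob:.2f}")
```

In an iterative revise-and-reassess loop like the one the tool supports, a requester would rerun such scoring after each edit until all flaw scores fall below an acceptable threshold; in practice the models would be loaded once and cached rather than reloaded per call.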