ABSTRACT
Quality control is an essential, if not the essential, challenge in crowdsourcing. Unsatisfactory responses from crowd workers have been found to result particularly from ambiguous and incomplete task descriptions, often written by inexperienced task requesters. However, creating clear task descriptions with sufficient information is a complex process for requesters in crowdsourcing marketplaces. In this paper, we investigate the extent to which requesters can be supported effectively in this process through computational techniques. To this end, we developed a tool that enables requesters to iteratively identify and correct eight common clarity flaws in their task descriptions before deploying them on a platform. The tool can be used to write task descriptions from scratch or to assess and improve the clarity of prepared descriptions. It employs machine learning-based natural language processing models, trained on real-world task descriptions, that score a given task description for each of the eight clarity flaws. On this basis, the requester can iteratively revise and reassess the description until it reaches a sufficient level of clarity. In a first user study, we asked requesters to create task descriptions with the tool and then to rate different aspects of its helpfulness. In a second user study, we asked crowd workers, who are confronted with such descriptions in practice, to rate the clarity of the created task descriptions. According to our results, 65% of the requesters rated the helpfulness of the information provided by the tool as high or very high (only 12% as low or very low). The requesters saw some room for improvement, though, for example concerning the display of bad examples. Nevertheless, 76% of the crowd workers judged the overall clarity of the task descriptions created with the tool to be improved over the initial versions. In line with this, the automatically computed clarity scores of the edited task descriptions were generally higher than those of the initial descriptions, indicating that the tool reliably predicts the clarity of task descriptions overall.
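To illustrate the scoring step described above, the following is a minimal sketch, assuming one fine-tuned transformer classifier per clarity flaw. The flaw labels, model paths, and the per-flaw binary setup are hypothetical placeholders for illustration only; they are not the authors' released implementation.

```python
# Minimal sketch (not the paper's actual code): score a task description for
# several clarity flaws, assuming one fine-tuned transformer classifier per flaw.
# FLAW names and model paths below are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

FLAWS = [  # placeholder labels standing in for the paper's eight clarity flaws
    "ambiguity", "incompleteness", "inconsistency", "vagueness",
    "complex_language", "missing_examples", "unclear_reward", "poor_structure",
]

def score_clarity(description: str, model_dir_template: str = "models/{flaw}"):
    """Return, for each flaw, the estimated probability that it is present."""
    scores = {}
    for flaw in FLAWS:
        path = model_dir_template.format(flaw=flaw)  # one fine-tuned model per flaw
        tokenizer = AutoTokenizer.from_pretrained(path)
        model = AutoModelForSequenceClassification.from_pretrained(path)
        inputs = tokenizer(description, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assume label index 1 means "flaw present"; report its softmax probability.
        scores[flaw] = torch.softmax(logits, dim=-1)[0, 1].item()
    return scores

if __name__ == "__main__":
    draft = "Please label the images. Payment after review."
    for flaw, prob in score_clarity(draft).items():
        print(f"{flaw}: {prob:.2f}")
```

In an iterative revise-and-reassess loop like the one the tool supports, a requester would rerun such scoring after each edit until all flaw scores fall below an acceptable threshold; in practice the models would be loaded once and cached rather than reloaded per call.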