
Amazon Mechanical Turk

kcranston edited this page Sep 6, 2012 · 6 revisions

Mechanical Turk, or MTurk, is an Amazon product for crowd-sourcing digital tasks. A Requester creates Human Intelligence Tasks (HITs) and publishes them to a list where Workers can select, complete, and submit them. Steps to set up and publish HITs:
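The Requester/Worker workflow above can also be driven programmatically rather than through the web interface. As a rough sketch (the field names below mirror MTurk's CreateHIT operation, but the values are hypothetical, not from any project on this page):

```python
# Sketch of the parameters a Requester supplies when creating a HIT.
# All values here are illustrative assumptions.
hit_params = {
    "Title": "Transcribe a notebook page",          # shown to Workers in the HIT list
    "Description": "Enter the text from a scanned page into a form.",
    "Reward": "1.00",                               # USD per assignment
    "MaxAssignments": 1,                            # number of Workers per HIT
    "AssignmentDurationInSeconds": 60 * 60,         # time allotted once a Worker accepts
    "LifetimeInSeconds": 7 * 24 * 60 * 60,          # how long the HIT stays listed
}

for key, value in sorted(hit_params.items()):
    print(f"{key}: {value}")
```

These same properties (time allotted, Workers per HIT, reward) are the ones set through the Properties step of the web interface described below.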

General setup

  1. Through the Requester interface, design your HIT. Amazon provides a number of templates, but you can customize these in any way you want, so don't worry too much about which one to pick.
  2. Customize your HIT template:
    • Properties: how much time a worker has to complete the task after taking the HIT; number of workers to perform the same HIT, reward per assignment, etc.
    • Design layout: what elements to put on the HIT that the Workers will see. You can use the WYSIWYG editor, or View HTML Source. The template includes instructions, input data / links to input data, and places for data entry (or links to an external location for data entry). You can create a batch of similar HITs by including variables in the template (see the instructions from Amazon).
    • Publish your HITs: If you have included input variables in your template, you will be prompted to upload a data file with the list of values. The data file is a CSV file whose column headings match the names of the input variables and whose rows are the separate values. Each HIT in the batch corresponds to one row of the data file, with the values substituted into the associated variables.
    • Manage: You can watch the progress of HITs, see statistics on completion time, and approve submitted HITs (either individually or the whole batch). Workers are not paid until you approve a HIT; note that the template includes an auto-approval window after which all submitted HITs are approved automatically.
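The row-to-HIT mapping for batch publishing can be sketched as follows. The column names (`image_url`, `page_type`) are hypothetical; in a real batch file they would match the variable placeholders in your HIT template.

```python
import csv
import io

# Hypothetical batch file: one column per template variable, one row per HIT.
batch_csv = io.StringIO()
writer = csv.DictWriter(batch_csv, fieldnames=["image_url", "page_type"])
writer.writeheader()
writer.writerow({"image_url": "http://example.org/page1.jpg", "page_type": "full"})
writer.writerow({"image_url": "http://example.org/page2.jpg", "page_type": "half"})

# Each row becomes one HIT, with its values substituted into the template.
batch_csv.seek(0)
hits = list(csv.DictReader(batch_csv))
print(len(hits), "HITs in this batch")
```

Uploading a file with two rows, as here, would publish a batch of two HITs.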
Notes:
  • Once a Worker takes a HIT, that HIT is unavailable to other Workers until the Time Allotted expires without the Worker completing it
  • You can request any number of Workers to complete each HIT, which can be useful for comparing results for accuracy
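One way to use multiple Workers per HIT for accuracy, as the note above suggests, is to compare their submissions and take the majority answer. A minimal sketch, with hypothetical submissions:

```python
from collections import Counter

# Hypothetical answers from three Workers assigned to the same HIT field.
answers = ["Quercus alba", "Quercus alba", "Quercus alba "]

# Normalize whitespace before comparing, then take the majority answer
# and the fraction of Workers who agree with it.
normalized = [a.strip() for a in answers]
majority, votes = Counter(normalized).most_common(1)[0]
agreement = votes / len(normalized)

print(majority, agreement)
```

Low agreement on a field would flag that HIT for manual review before approval.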

Details from the Park Notebooks project

  • started with Data Extraction Template and modified
  • screen cap of the HIT template
  • for full pages: estimated 45 minutes, allowed 60 minutes, paid $1.00
  • for half pages: estimated 25 minutes; allowed 45 minutes, paid $0.60
  • sample input file
  • average time per HIT: full pages ~35 min, half pages ~23 min
  • all HITs in batch completed within 1 hour of submission
  • acceptance rate was 61 of 66 HITs; we did a visual check to make sure all cells were complete and spot-checked 3 rows (the proofreading itself could have been submitted as a HIT)
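The figures above imply the following rough rates. This is a back-of-the-envelope calculation using only the numbers listed on this page; the effective hourly pay uses the average completion times rather than the estimates:

```python
# Acceptance rate from the Park Notebooks batch above.
accepted, total = 61, 66
acceptance_rate = accepted / total

# Effective hourly pay, using average completion time per HIT.
full_rate = 1.00 / (35 / 60)   # $1.00 per ~35 min on full pages
half_rate = 0.60 / (23 / 60)   # $0.60 per ~23 min on half pages

print(f"acceptance: {acceptance_rate:.1%}")
print(f"full pages: ${full_rate:.2f}/hr, half pages: ${half_rate:.2f}/hr")
```

That works out to roughly 92% acceptance and a bit over $1.50/hr effective pay for both page types, so the two reward levels were reasonably consistent with each other.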