Turkopticon: Defender of Amazon’s Anonymous Workforce

January 16, 2016

(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)

Labor crowdsourcing is the system by which large crowds of workers contribute to a project, allowing complex and tedious tasks to be completed rapidly and efficiently. The largest labor crowdsourcing platform in the world, Amazon Mechanical Turk (Mturk), is estimated to have revenue of between 10 and 150 million dollars annually. Despite this, there is no built-in system by which workers can identify which employers (requesters) are cheaters and which are legitimate. In a system powered by anonymity and numerous micro transactions, the inability to provide feedback that warns other workers about requester quality is a big deal!

Social activists and artists Six Silberman and Lilly Irani at UCSD have designed a solution to help mitigate this problem. Turkopticon provides a mechanism by which workers can rate their experience of working with requesters. Turkopticon reviews can be read on the UCSD-hosted website, and workers searching for requesters can quickly access average ratings through the browser extension, which pops up next to the request information on mouse over.
Figure 1: An example MTurk HIT listing with Turkopticon review information provided.

Established in May of 2009, Turkopticon, with over 284 thousand reviews written by more than 17 thousand reviewers, has been an unmitigated success at creating a tool by which the community of Mturk workers shares information about its experience with requesters.

Figure 2: Activity on Turkopticon measured in terms of number of reviews written daily and the unique number of reviewers.

From Figure 2 we can see that both the number of reviews and the number of daily participating reviewers increased dramatically from 2009 until mid-2015, after which both have been in decline.
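
The two series in Figure 2 could be computed along the following lines. This is only a minimal sketch: the record layout, the identifiers, and the rows themselves are invented for illustration and are not the Turkopticon data or the code used for this post.

```python
from collections import defaultdict
from datetime import date

# Hypothetical review records: (review date, reviewer id, requester id).
# These rows are invented for illustration; they are not Turkopticon data.
reviews = [
    (date(2015, 6, 1), "rev_a", "req_1"),
    (date(2015, 6, 1), "rev_a", "req_2"),
    (date(2015, 6, 1), "rev_b", "req_1"),
    (date(2015, 6, 2), "rev_c", "req_3"),
]

def daily_activity(records):
    """Return {date: (number of reviews, number of unique reviewers)}."""
    review_counts = defaultdict(int)
    reviewers_seen = defaultdict(set)
    for day, reviewer, _requester in records:
        review_counts[day] += 1
        reviewers_seen[day].add(reviewer)
    return {day: (review_counts[day], len(reviewers_seen[day]))
            for day in sorted(review_counts)}

activity = daily_activity(reviews)
for day, (n_reviews, n_reviewers) in activity.items():
    print(day.isoformat(), n_reviews, n_reviewers)
```

A reviewer who writes several reviews on the same day adds to the review count each time but is counted only once among that day's unique reviewers.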

This might not actually be a problem for the Mturk system. Perhaps as information is shared about a particular requester, workers find that their individualized experiences are sufficiently summarized by the quantity of information already available.

Figure 3: Mean and median number of reviews for individual requesters at each date.

Figure 3 seems to support the idea that, as time has gone on, some requesters have accumulated a large collection of reviews. This is not particularly surprising: one would expect that the longer requesters use Mturk, the more reviews they accumulate, except when requesters prefer not to maintain their reviews (if, for instance, they are cheaters). Requesters have the power at any time to start a new Amazon requester account, and reviews do not transfer between accounts. This might be what is driving the median number of reviews to be so low (around 10 at this time).

To get a better perspective on what is happening, we might ask how long requesters are generally active. We cannot observe this directly, as we do not have the Mturk activity data, but we can look at when reviews are posted, assuming that a requester must have been active on at least each day for which a review was written.

Figure 4: Mean and median of the number of days requester accounts have been active calculated as Current Date of a Review less the First Date of a Review.

From Figure 4 we can see that on average requesters have been active for nearly two years, though the median is much lower, at less than a year. These numbers are likely inflated: a requester who gets reviewed early, drops out of Mturk for a period of time, and then uses the account again is considered just as active as a requester who has been active continuously over the same period. One way of avoiding this would be to count only the days on which a requester was demonstrably active, that is, the days on which the requester was reviewed. However, this figure ends up being almost identical to Figure 3, so I have omitted it.
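
The difference between the two measures can be illustrated with a small sketch. It is a simplified per-requester version (latest review date less first review date, rather than a value for every review date as in Figure 4), and the requester ids and dates are made up for the example.

```python
from datetime import date

# Hypothetical (requester id, review date) pairs, invented for illustration.
review_dates = [
    ("req_1", date(2014, 1, 1)),
    ("req_1", date(2014, 1, 1)),
    ("req_1", date(2015, 1, 1)),
    ("req_2", date(2015, 3, 1)),
    ("req_2", date(2015, 3, 11)),
]

def activity_measures(pairs):
    """Return {requester: (span_days, distinct_review_days)}.

    span_days is the latest review date minus the first review date;
    distinct_review_days counts only days with at least one review.
    """
    days_by_requester = {}
    for requester, day in pairs:
        days_by_requester.setdefault(requester, set()).add(day)
    return {
        requester: ((max(days) - min(days)).days, len(days))
        for requester, days in days_by_requester.items()
    }

measures = activity_measures(review_dates)
for requester, (span, review_days) in sorted(measures.items()):
    print(requester, span, review_days)
```

In this toy data the first requester has a span of a full year but was reviewed on only two distinct days, which is the kind of gap that inflates the span-based measure.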

Seeing all of these reviews, we might ask ourselves how many are being contributed by an elite group of very active reviewers and how many by a wider group.

Figure 5: The numbers of reviews by reviewers at the time of writing a review.

From Figure 5 we can see that, on any given date, the median number of reviews that a reviewer has written is around 100. This implies large-scale community involvement, with many reviewers contributing a significant number of reviews. We can also see that the mean is significantly higher than the median, and increasingly so in recent years, implying that the distribution is skewed, with a few reviewers contributing a significant portion of the reviews written.
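
The reasoning from a mean well above the median to a skewed distribution can be checked on a toy example. The counts below are invented purely for illustration and are not taken from the Turkopticon data.

```python
from statistics import mean, median

# Hypothetical cumulative review counts per reviewer, invented for illustration.
reviews_per_reviewer = [5, 20, 80, 100, 120, 150, 4000]

avg = mean(reviews_per_reviewer)
mid = median(reviews_per_reviewer)

# Share of all reviews written by the single most active reviewer.
top_share = max(reviews_per_reviewer) / sum(reviews_per_reviewer)

print(f"mean={avg:.1f} median={mid} top reviewer share={top_share:.2f}")
```

One very prolific reviewer pulls the mean far above the median while leaving the median untouched, which is the pattern described for Figure 5.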

From the available evidence we can therefore confidently claim that Turkopticon has been successful at fulfilling its mission of providing a mechanism for workers to exchange information with regard to requester quality. This, however, is not the only objective of Turkopticon and the workers who contribute to its database.

One of the major objectives of Turkopticon, at least as many workers see it, is to provide a platform by which workers can exchange information about requesters and use that information to gain leverage over them, ideally driving the pay rate upwards.

Workers seem to be targeting an ideal pay rate in the range of $11-$15 per hour, though in practice the effective pay rate seems to be much less than this. Turkopticon reviewers would also like to see the general quality of their working environment improve. In practice they would like their work to be reviewed quickly and rarely rejected and, when a problem arises, for requesters to communicate effectively and respectfully with them.

From our data we cannot directly observe pay rate. However, what we can observe is the four rating categories defined by Turkopticon: Pay, Fast (rapidity of accepting or rejecting submitted HITs), Fairness, and Communication. These categories allow for ratings between 1 and 5.

From the trends in these rating systems we can ideally infer how the Mturk workplace has changed for workers over time.

Figure 6: Mean reviews over time.

From Figure 6 we can see that mean reviews have changed significantly over time. In particular, the ratings for Fast and Fair have generally improved, while the ratings for Pay and Communication, except for a brief bump for Pay in 2013, have fallen ominously.
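
A trend like the one in Figure 6 amounts to averaging each scale within each time period while skipping ratings that were left blank. The sketch below groups by year for brevity; the row layout, scale names, and values are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, scale, rating) rows; None marks a scale left unrated.
# The values are invented for illustration only.
rows = [
    (2013, "pay", 4), (2013, "pay", 3), (2013, "fast", 4),
    (2015, "pay", 2), (2015, "pay", None), (2015, "fast", 5),
]

def mean_rating_by_year(records):
    """Return {(year, scale): mean rating}, ignoring missing ratings."""
    buckets = defaultdict(list)
    for year, scale, rating in records:
        if rating is not None:
            buckets[(year, scale)].append(rating)
    return {key: mean(values) for key, values in sorted(buckets.items())}

means = mean_rating_by_year(rows)
for (year, scale), value in means.items():
    print(year, scale, value)
```

Dropping the blank ratings rather than treating them as zeros matters here, since a growing share of incomplete reviews would otherwise drag the means down on its own.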

So what is driving these trends and does Turkopticon have anything to do with it?

There was a time when Mturk placed heavy restrictions on non-US workers, I suspect in an attempt to comply with federal taxation laws. This, I suspect, significantly restricted the supply of workers, causing a short-term rise in the system wage.

From our previous analysis we can see that Turkopticon is widely and actively used by many Mturk workers. Thus we consider how Turkopticon might be driving the trends that we are seeing.

My personal experience of interacting with workers through Turkopticon was quite unpleasant. I don’t know if this is typical of other requesters using Turkopticon, but from talking with other academics who have used Turkopticon I suspect it is not unusual.

If this is the case, then Turkopticon provides an easy explanation for the falling wage: requesters are dropping out of the system after being targeted by a disgruntled, organized, and anonymous workforce. At the same time, many workers have claimed that the only reason they have continued to be active on Mturk is Turkopticon. Thus Turkopticon might be suppressing the going wage by driving requesters away while simultaneously retaining workers.

Ironic though predictable.

Increased attention by requesters to scales like Fairness and Fastness is also predictable as requesters grow more attentive to the non-monetary concerns of the workforce. Likewise, due to the growing boldness and hostility of workers organized through the Turkopticon platform, it should come as no surprise to anybody that requesters have opted out of investing in direct communication with workers.

I personally found the experience painful in the extreme as literally everything I said was turned against me in a kind of sadistic group think in which anonymous workers would take turns at antagonizing, humiliating, and threatening me.

My personal experience aside, I do not know what is driving the apparent fall in Mturk pay.

One might argue that pay and other factors have not meaningfully changed, and that instead reviewers, as they have gained experience, have had more to base their reviews on. This might be particularly true for the scale “Pay”, with some outspoken workers asserting that a pay rating of 5 is only warranted when the effective hourly wage is over $12 per hour. This is a plausible explanation, as workers seem to learn about their rating scales from the input of other workers.

Figure 7: Completion rates of scales and bomb rates. An incomplete scale submission is a rating in which a reviewer does not provide a rating for a requester on that scale.

In Figure 7 we can see that 1-bombs and 5-bombs (all 1s or all 5s) peaked in previous years and have dropped off over time. This indicates that perhaps reviewers are learning the expectations of the system and conforming their reviews to match these standards. This supports the hypothesis of changing expectations for pay.
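
The quantities in Figure 7 can be sketched as follows. The scale names and sample ratings are hypothetical, and the sketch assumes that a bomb means all four scales were filled in with the same extreme value; the exact definition used for the figure may differ.

```python
SCALES = ("pay", "fast", "fair", "comm")

# Hypothetical ratings, invented for illustration.
# None marks a scale the reviewer left blank.
ratings = [
    {"pay": 1, "fast": 1, "fair": 1, "comm": 1},     # 1-bomb
    {"pay": 5, "fast": 5, "fair": 5, "comm": 5},     # 5-bomb
    {"pay": 3, "fast": 5, "fair": 4, "comm": None},  # incomplete on comm
    {"pay": 2, "fast": 4, "fair": 5, "comm": 3},
]

def is_bomb(review, value):
    """True if every scale was rated with the same given value (assumed definition)."""
    return all(review[scale] == value for scale in SCALES)

def summarize(reviews):
    """Return bomb rates and the per-scale completion rate."""
    n = len(reviews)
    return {
        "one_bomb_rate": sum(is_bomb(r, 1) for r in reviews) / n,
        "five_bomb_rate": sum(is_bomb(r, 5) for r in reviews) / n,
        "completion_rate": {
            scale: sum(r[scale] is not None for r in reviews) / n
            for scale in SCALES
        },
    }

summary = summarize(ratings)
print(summary)
```

Tracking these rates per period, rather than once over the whole sample as here, would give the time series shown in the figure.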

Overall, we conclude that Turkopticon is an amazing success. It has brought workers together to exchange information about employers on an immense scale. The end results of collective worker action have not been as hoped for, though they were predictable, with wages decreasing as requesters drop out of the system in response to collective antagonism organized through Turkopticon.

