cheatR: an R package for catching cheaters

July 29, 2018

(This article was first published on Mattan S. Ben-Shachar, and kindly contributed to R-bloggers)

cheatR is a mini package to help you find cheaters by comparing hand-ins. It was developed by Almog Simchon and me in response to students overheard bragging about how an assignment in a first-year undergrad course was “super easy” because “we all just copied from each other!” (though this would later turn out to be an exaggeration).

Our idea was to compare each hand-in to all other hand-ins and measure the degree of overlap between them. We did this using the ngram R package to break each hand-in into a list of “phrases” and then count how many times each phrase appeared across a pair of documents [1]. Finally, we calculated the percentage of non-unique phrases.
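To make the idea concrete, here is a minimal base-R sketch of that kind of comparison for a single pair of documents. It is not the code inside cheatR (which relies on the ngram package): the helper names `word_ngrams()` and `phrase_overlap()`, the 3-word phrase length, and the choice to divide by the number of distinct phrases in the pooled pair are all assumptions made for illustration.

```r
# Hypothetical helper: split a text into overlapping word n-grams ("phrases").
word_ngrams <- function(text, n = 3) {
  # Lower-case, then split on anything that is not a letter, digit or apostrophe
  words <- strsplit(tolower(trimws(text)), "[^[:alnum:]']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Hypothetical helper: percent of distinct phrases that occur more than once
# when the phrases of both documents are pooled together.
# Assumption: a phrase seen more than once is taken to come from both
# documents, not from a repeat within one document.
phrase_overlap <- function(doc_a, doc_b, n = 3) {
  pooled <- c(word_ngrams(doc_a, n), word_ngrams(doc_b, n))
  if (length(pooled) == 0) return(NA_real_)
  counts <- table(pooled)
  100 * sum(counts > 1) / length(counts)
}

doc_a <- "The quick brown fox jumps over the lazy dog"
doc_b <- "The quick brown fox sleeps under the lazy dog"

# 3 of the 11 distinct 3-word phrases appear in both texts
phrase_overlap(doc_a, doc_b)  # about 27.3
```

Longer phrases (a larger `n`) make accidental matches less likely, at the cost of missing copies that were lightly reworded.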

Looking for Cheaters

We then ran this algorithm across all ~300 hand-ins, and found that the overheard knuckle-headed students' estimate of “we all just copied from each other” was an extreme exaggeration. Looking at the distribution of overlap, we can see that the vast majority of overlap was quite small (and even this small degree of overlap could be accounted for by the fact that most hand-ins contained the assignment instructions):

As is evident from this graph, there were some hand-ins with a 100% overlap! Zooming in to the 70-100% range, it becomes clearer that some students were mischievous!
Plotting the relations between this subgroup, it was apparent some students had become close friends over their first year… 
File names have been redacted.

The After Math

Other than the cheating students receiving failing grades on their assignments, I think we can say that the war on cheaters has escalated, and we can't wait to see the new methods students will use for cheating next year!

If you also want to find cheaters, you can try cheatR (hosted on GitHub) for yourself by installing it in R and running it locally:


# install.packages("devtools")
devtools::install_github("mattansb/cheatR")

Or you can try our Shiny app!


[1] We worked under the assumption that if a phrase was found more than once, it was not because it was repeated within the same document, but because it appeared in both documents. This might not always be the case, but we have found no “false positives” in our usage so far, so this seems to be a reasonable assumption.
