I caught a mention of this project by Pete Warden on Four Short Links today. If his name sounds familiar, he’s the creator of the Data Science Toolkit (DSTK), an O’Reilly author, and now works at Google. A decidedly clever and decent chap.
The project’s goal is noble: crowdsource a repository of open speech data that researchers can use to make a better world. The crowdsourcing works by asking folks to record themselves saying “Yes”, “No” and other short words.
As I meandered through the blog post, I looked in horror at the URL for the application that did the recording:
Why would the goal of the project combined with that URL give pause? Read on!
You’ve Got Scams!
Picking up the phone and saying something as simple as “Yes” has been the hook for a major scam this year. Attackers record your voice and replay it at phone prompts. Because it really is your voice, the recording is harder to refute as evidence and can defeat recognition systems that listen for your actual voice.
As the chart above shows, the Better Business Bureau has logged over 5,000 of these scams this year (searching for “phishing” and “yes”). You can play with the data (a bit; the package needs work) in R with
Now, these are “analog” attacks (i.e., a human spends time socially engineering a human). Bookmark this as you peruse section 2.
Integrity Challenges in 2017
I “trust” Pete’s intentions, but I sure don’t trust open-speech-commands.appspot.com (and you shouldn’t either). Why? Go visit https://totally-harmless-app.appspot.com. It’s a Google App Engine app I made for this post. Anyone can make an appspot app, and the https is meaningless as far as integrity and authenticity go, since I’m running on Google’s infrastructure but I’m not Google.
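Why doesn’t the padlock help? A certificate vouches for a hostname, not for the person who deployed the app behind it. Below is a simplified sketch of wildcard hostname matching (loosely following the RFC 6125 rule that a leading `*` covers exactly one label), assuming for illustration that the platform presents a single certificate named `*.appspot.com`. The helper `wildcard_matches` is hypothetical, not a library API.

```python
def wildcard_matches(cert_name: str, hostname: str) -> bool:
    """Hypothetical helper: is `hostname` covered by certificate name `cert_name`?

    Simplified rule: a leading '*' label matches exactly one non-empty
    left-most label of the hostname; every other label must match exactly.
    """
    cert_labels = cert_name.lower().split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    first_cert, *rest_cert = cert_labels
    first_host, *rest_host = host_labels
    if first_cert == "*":
        # The wildcard stands in for any single left-most label.
        return first_host != "" and rest_cert == rest_host
    return cert_labels == host_labels


# Assumed: one certificate name shared by every app on the platform.
CERT_NAME = "*.appspot.com"

for host in ("open-speech-commands.appspot.com",
             "totally-harmless-app.appspot.com",
             "open-speech-commmands.appspot.com"):
    # Each host, legitimate or not, is covered by the same certificate name.
    print(host, wildcard_matches(CERT_NAME, host))
```

Production TLS stacks apply more rules than this, but the takeaway is the same: every app on the shared domain gets an identical-looking padlock.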
You can’t really trust most SSL/TLS sessions as far as site integrity goes anyway. Let’s Encrypt put the final nail in the coffin with their Certs Gone Wild! initiative. With super-recent browser updates you can almost trust your eyes again when it comes to URLs, but you should be very wary of entering your info (especially uploading voice recordings, fingerprints or eye/face images) into any input box on any site unless you are 100% sure it’s a legit site that you trust.
Tracking the Trackers
If you don’t know that you’re being tracked 100% of the time on the internet, then you really need to read up on the modern internet.
In many cases your IP address can directly identify you. In most cases your device and browser profile, which most commercial sites log, can directly identify you. So just visiting a website means it is highly likely that the site can know both that you are not a dog and that you are, in fact, you.
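The browser-profile point is easy to underestimate, so here is a deliberately crude sketch, assuming a hypothetical server that sees only the visitor’s IP address and a few standard request headers. The function name `browser_fingerprint` and the choice of headers are illustrative assumptions; real trackers combine many more signals (screen size, installed fonts, canvas rendering and so on).

```python
import hashlib


def browser_fingerprint(ip: str, headers: dict[str, str]) -> str:
    """Hypothetical helper: derive a stable visitor ID from request metadata.

    Uses only data a web server receives on every request; no login or
    form input is required.
    """
    fields = [
        ip,
        headers.get("User-Agent", ""),
        headers.get("Accept-Language", ""),
        headers.get("Accept-Encoding", ""),
    ]
    # Identical inputs always hash to the identical ID, so repeat visits link up.
    digest = hashlib.sha256("\n".join(fields).encode("utf-8")).hexdigest()
    return digest[:16]


# Example request metadata (documentation IP range, made-up header values).
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
}
first_visit = browser_fingerprint("203.0.113.7", headers)
second_visit = browser_fingerprint("203.0.113.7", headers)
print(first_visit == second_visit)  # True
```

Nothing here asked for a name, yet the same browser on the same connection yields the same identifier on every visit; hashing the values does nothing to anonymize them.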
Still Waiting for the “So, What?”
Many states and municipalities have engaged in awareness campaigns to warn citizens about the “Say ‘Yes’” scam. Asking someone to record themselves saying “Yes” into a random website pretty much negates that advice.
Folks like me regularly warn about trust on the internet. I could have cloned the functionality of the original site to open-speech-commmands.appspot.com. Did you even catch the third “m” there? Even without that trick, it’s an appspot.com domain, and anyone can set one up.
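That extra “m” is the kind of one-character difference people skim past but a program catches easily. A minimal sketch, assuming a hypothetical allow-list holding the one hostname you trust and an arbitrary threshold of two edits, flags near-misses by Levenshtein edit distance; `levenshtein` and `looks_like` are illustrative helpers, not a library API.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def looks_like(candidate: str, trusted: str, max_edits: int = 2) -> bool:
    """Flag hosts that are close to, but not exactly, a trusted host."""
    distance = levenshtein(candidate.lower(), trusted.lower())
    return 0 < distance <= max_edits


# Assumed allow-list entry for illustration.
TRUSTED = "open-speech-commands.appspot.com"
print(levenshtein("open-speech-commmands.appspot.com", TRUSTED))  # 1
print(looks_like("open-speech-commmands.appspot.com", TRUSTED))   # True
print(looks_like(TRUSTED, TRUSTED))                               # False
```

This only catches typo-style lookalikes. It is no help against a brand-new name such as totally-harmless-app.appspot.com, which is why a shared domain anyone can register under is the deeper problem.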
Even if the site doesn’t ask for your name or other info and just asks for your “Yes”, it can know who you are. In fact, when you’re granting microphone access for the recording, it could just as easily request the camera too and take a picture of you if it wanted to (and you’d likely not notice or not object, since it’s for SCIENCE!).
So, in the worst-case scenario, a malicious entity could ask you for your “Yes”, tie it directly to you and then carry out the same follow-on attacks that were being performed in the analog version.
But go so far as to assume this is a legit site with good intentions. Do you really know what’s being logged when you submit your voice recording? If the data were mishandled, it would be just as easy to tie the voice files back to you (assuming the site logs details such as your IP address or browser profile alongside them).
The “so what” is not really a warning to users but a message to researchers: you need to threat model your experiments and research initiatives, especially when innocent end users may be put at risk. Data is the new gold, diamonds and other precious bits that attackers are after. You may think you’re not putting folks at risk and aren’t even a hacker target, but how you design data gathering can reinforce good or bad behaviour on the part of users. It can strengthen sound security messages or tear them down. And you and your data may be more of a target than you realize.
Reach out to interdisciplinary colleagues to help threat model your data collection, storage and dissemination methods to ensure you aren’t putting yourself or others at risk.
Pete did the right thing:
And I’m sure the site will be on a “proper” domain soon. When it is, I’ll be one of the first in line to help make a much-needed open data set for research purposes.