Human in A.I. loop

[This article was first published on R – sane.a.lytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A.I is out to get us. It will replace us all.. or maybe, it’s just another tool that we will get used to like calculators or iPhones.


If you are reading this post, chances are you are familiar with Artificial Intelligence, Deep Learning, Neural nets.  I am not going to be pedantic and use them interchangeably.

If you don’t care for any of that, feel free to read this article as “Human in Machine Learning loop”. Or how to debug beyond metrics (smell a pattern in this blog?).

Let’s start with a practical example of something I built at Rent The Runway. RTR rents out high end designer dresses for a mass market price.  We have a lot of folks who love us. And they are not shy about tagging us on their Instagram posts.

Wouldn’t it be great if  we could show those Instagram photos on our product pages and help our customers find fashion inspiration through these high quality shots?

We already have photo reviews that our customers post on our site showing us how they wore it. It is a highly used feature for visitors. But customers have to submit to us directly for that to work.

My mind was set. It had to be done.

But how? While it is true that they tag #renttherunway or #myRTR, they don’t tag the exact dress name.

Obviously I brought a tank to a knife fight and built a convolutional neural net. This A.I that would look at these natural images and classify them as one of the thousands of dresses that we have currently on site. I named it DeepDress, but probably should have called it Dress2Vec… too late now. I used Torch to train and Benjamin’s waffle package to create an internal API that I can hit from any language. Since I have little patience, I used NVIDIA GPUs to speed up the training and tests.

I got all 30k Instagram posts at the time and I stored the metadata in Mongo (I know, I know but it’s great for unstructured).

Here is the code for that –

Then I looped through them and called my API to identify the dress and saved that.

Mission accomplished and moving on.

Well my friends, if that were the case, I wouldn’t have written this post. So here is how a production system is really built.

One problem I knew ahead of time is that our inventory changes over time as we retire older styles and acquire new ones. DeepDress did not know of some very new dresses and none of the old ones. Old ones aren’t an issue since we wouldn’t want to display them anyway but there is a cold start problem that I hadn’t solved at the time.

Another issue is that I had trained the algorithm only on our dresses. While that is our bread and butter, we also rent bags, earrings, necklaces, bracelets, activewear, etc. I always start simple for v1. This means there were bound to be things classified incorrectly as the convnet sweats bullets trying to figure out which dress that Diane Von Furstenberg minaudiere looks like. We could use a lower bound on the probability of most likely class, but what should that threshold be?

Besides the above red flags, even if the identification were 100% accurate (ha), it would make sense to have someone in RTR verify that the image is in fact, fit for display on our site and consistent with our branding.

What we need to build is a gamified quick truther thingy that all our 200 employees in main office can log in during their lunch break and rate a few items. It better be interesting and fast because they have important things to do. It wouldn’t hurt to get verification of the same image from a few different folks so we build our confidence. Then there are nice to haves like responsive, reactive, and other fancy words that mean fun to use.

I could have built this in ruby or react or something that pleases my web developer friends. But I am a data scientist who is very familiar with R, so I built another shiny little thing in an hour. I am in good company. My friend Rajiv also loves understanding his A.I. creations in R.

Rough outline on what we want –

  1. Start with a random seed for each session
  2. Retrieve Instagram Images from that seed backwards
  3. User has to choose one of N options
  4. Upon choice, store the result back in Mongo
  5. Automatically move on to the next one (shiny is responsive)
  6. Every day, a job would restart the server, freezing what was done and starting on the next batch, moving back in time.

First we set up the global.R to get the relevant Instagram posts we will verify –

Then we create ui.R that my colleagues will see –

Finally we need to hook up some logic –

And shiny does the rest. Here is what this looks like in practice –

Click to view slideshow.

This was so successful that in the two weeks the POC was up, we had 3,281 verifications. Some colleagues (thanks Sam/Sandy) did as many as 350+ verifications over that time. It should be pointed out that besides an announcement email, there was no other push to do this.

We found that there were a significant number of posts that were not even dresses. I guess our fashionistas really love us. This is not a problem I knew about going into building this and was certainly not expecting high probability scores for the suggested class in some cases. It helped me fix it by adding a class for “not a dress” with actual training data (and noise).

Most importantly, we answered important questions like what this cute puppy wore.



And if that isn’t the most satisfying result, I don’t know what is.

We have some amazing employees (hi Sam/Liz/Amanda) who have can remember single dress on site. For mere mortals, matching a photo to something we have is nearly impossible.

DeepDress A.I toiled through all 30k photos and narrowing each down to 3 possible choices from our thousands. But it is humans who really save the day. As a result we have a golden database of Instagram photos mapped to exact product that took advantage of both their strengths. Look at that – humans and A.I can play together.

Hopefully this example demonstrates why it is useful to use your resources (company employees) to check your systems beyond the metrics before releasing things in the wild. I end up using shiny a lot for these sort of internal demos because I slice and dice the results in R. But do them in whatever you are comfortable in and aggressively re-evaluate those choices should they reach production. For example, we will rewrite the truther in Ruby now that it will be production to comply with our other standards. For demos, I have no emotional attachment to any language (close your ears C++).

I’m sure you guessed that could not have been the only reason I built DeepDress and you would be right. But today’s not a day to talk about that.



To leave a comment for the author, please follow the link and comment on their blog: R – sane.a.lytics. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)