This is a guest post by Alan Briggs. Alan is a Data Scientist with Elder Research, Inc. assisting with the development of predictive analytic capabilities for national security clients in the Washington, DC Metro area. He is President-Elect of the Maryland chapter of the Institute for Operations Research and Management Science (INFORMS). Follow him at @AlanWBriggs. This post previews two events that Alan is presenting with Data Community DC President Harlan Harris.
Have you ever tried to schedule dinner or a movie with a group of friends from across town? Then, when you throw out an idea about where to go, someone responds “no, not there, that’s too far?” Then, there’s the old adage that the three most important things about real estate are location, location and location. From placing a hotdog stand on the beach to deciding where to build the next Ikea, location really crops up all over the place. Not surprisingly, and I think most social scientists would agree, people tend to act in their own self-interest. That is, everyone wants to travel the least amount of distance, spend the least amount of money or expend the least amount of time possible in order to do what they need [or want] to do. For one self-interested person, the solution to location problems would always be clear, but we live in a world of co-existence and shared resources. There’s not just one person going to the movies or shopping at the store; there are several, hundreds, thousands, maybe several hundred thousand. If self-interest is predictable in this small planning exercise of getting together with friends, can we use math and science to leverage it to our advantage? It turns out that the mathematical and statistical techniques that are scalable to the worlds’ largest and most vexing problems can also be used to address some more everyday issues, such as where to schedule a Meetup event.
With a little abstraction, this scenario looks a lot like a classical problem in operations research called the facility location or network design problem. Its roots tracing back to the 17th century Fermat-Weber Problem, facility location analysis seeks to minimize the costs associated with locating a facility. In our case, we can define the cost of a Meetup venue by the sum of the distance traveled to the Meetup by its attendees. Other costs could be included, but to start simple, you can’t beat a straight-line distance.
So, here’s a little history. The data-focused Meetup scene in the DC region is several years old, with Hadoop DC, Big Data DC and the R Users DC (now Statistical Programming DC) Meetups having been founded first. Over the years, as these groups have grown and been joined by many others, their locations have bounced around among several different locations, mostly in downtown Washington DC, Northern Virginia, and suburban Maryland. Location decisions were primarily driven by supply – what organization would be willing to allow a big crowd to fill its meeting space on a weekday evening? Data Science DC, for instance, has almost always held its events downtown. But as the events have grown, and as organizers have heard more complaints about location, it became clear that venue selection needed to include a demand component as well, and that some events could or should be held outside of the downtown core (or vice-versa, for groups that have held their events in the suburbs).
Data Community DC performed a marketing survey at the beginning of 2013, and got back a large enough sample of Meetup attendees to do some real analysis. (See the public survey announcement.) As professional Meetups tend to be on weekday evenings, it is clear that attendees are not likely traveling just from work or just from home, but are most likely traveling to the Meetup as a detour from the usual commute line connecting their work and home. Fortunately, the survey designers asked Meetup attendees to provide both their home and their work zip codes, so the data could be used to (roughly) draw lines on a map describing typical commute lines.
The Revolutions blog recently presented a similar problem in their post How to Choose a New Business Location with R. The author, Rodolfo Vanzini, writes that his wife’s business had simply run out of space in its current location. An economist by training, Vanzini initially assumed the locations of customers at his wife’s business must be uniformly distributed. After further consideration, he concluded that “individuals make biased decisions basing their assumptions and conclusions on a limited and approximate set of rules often leading to sub-optimal outcomes.” Vanzini then turned to R, plotted his customers’ addresses on a map and picked a better location based on visual inspection of the location distribution.
If you’ve been paying attention, you’ll notice a common thread with the previously mentioned location optimizations. When you’re getting together with friends, you’re only going to one movie theater; Vanzini’s wife was only locating one school. Moreover, both location problems rely on a single location — their home — for each interested party. That’s really convenient for a beginner-level location problem; accordingly, it’s a relatively simple problem to solve. The Meetup location problem on the other hand adds two complexities that make this problem worthy of the time you’re spending here. Principally, if it’s not readily apparent, a group of 150 boisterous data scientists can easily overstay their welcome by having monthly meetings at the same place over and over again. Additionally, having a single location also ensures that the part of the population that drives the farthest will have to do so for each and every event. For this reason, we propose to identify the three locations which can be chosen that minimize the sum of the minimum distances traveled for the entire group. The idea is that the Meetup events can rotate between each of the three optimal locations. This provides diversity in location which appeases meeting space hosts. But, it also provides a closer meeting location for a rotating third of the event attendee population. Every event won’t be ideal for everyone, but it’ll be convenient for everyone at least sometimes.
As an additional complexity, we have two ZIP codes for each person in attendance — work and home — which means that instead of doing a point-to-point distance computation, we instead want to minimize the distance to the three meeting locations from the closest point along the commute line. Optimizing location with these two concepts in mind — particularly the n-location component — is substantially more complicated than the single location optimization with just one set of coordinates for each attendee.
So, there you have it. To jump ahead to the punchline, the three optimal locations for a data Meetup to host its meetings are Rockville, MD, downtown Washington DC and Southern Arlington, VA.
To hear us (Harlan and Alan) present this material, there are two great events coming up. The INFORMS Maryland Chapter is hosting their inaugural Learn. Network. Do. event on October 23 at INFORMS headquarters on the UMBC campus. Statistical Programming DC will also be holding its monthly meeting at iStrategy Labs on October 24. Both events will pull back the curtain on the code that was used and either event should send you off with sufficient knowledge to start to tackle your own location optimization problem.