R-bloggers

Analysis of software developers in New York, San Francisco, London and Bangalore

[This article was first published on Variance Explained, and kindly contributed to R-bloggers.]

(Note: Cross-posted with the Stack Overflow Blog.)

When I tell someone Stack Overflow is based in New York City, they’re often surprised: many people assume it’s in San Francisco. (I’ve even seen job applications with “I’m in New York but willing to relocate to San Francisco” in the cover letter.) San Francisco is a safe guess of where an American tech company might be located: it’s in the heart of Silicon Valley, near the headquarters of tech giants such as Apple, Google, and Facebook. But New York has a rich startup ecosystem as well, and it’s a very different world from San Francisco, with developers who use different languages and technologies.

On the Stack Overflow data team we don’t have to hypothesize about where developers are and what they use: we can measure it! By analyzing our traffic, we have a bird’s eye view of who visits Stack Overflow, and what technologies they’re working on. As the first in a series of upcoming analyses of Stack Overflow data, here we’ll show some examples of what we can detect about software developers in each major city.

In this post we’re going to focus on the four cities that visit Stack Overflow the most: San Francisco, Bangalore, London, and New York.[1]

(The data used in this post is private within the company, but if you’re curious how it was generated, you can find the code here.)

San Francisco vs New York

First we’ll compare the two most popular American cities for software development: San Francisco and New York.

When developers are using a programming language or technology, they typically visit questions related to it. So based on how much traffic goes to questions tagged with Python or JavaScript, we can estimate what fraction of a city’s software development takes place in that language.

For example, there were 187 million question views from San Francisco in the last year, and we can see that 10.3% of these visits were to questions with the Python tag, compared to 12.8% of New York’s traffic.
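As a rough illustration of that calculation, here is a minimal Python sketch. The view counts are made-up placeholders, not Stack Overflow data, and `tag_share` is a hypothetical helper. It assumes a question view counts once toward each tag on the question and that the denominator is the city's total question views.

```python
def tag_share(tag_views, total_views):
    """Return each tag's percentage of a city's total question views."""
    if total_views <= 0:
        raise ValueError("total_views must be positive")
    return {tag: 100.0 * views / total_views for tag, views in tag_views.items()}


# Placeholder counts for one hypothetical city (not real traffic data).
toy_total_views = 1_000
toy_tag_views = {"python": 120, "javascript": 150, "c#": 60}

shares = tag_share(toy_tag_views, toy_total_views)

# Print tags from the largest share to the smallest.
for tag, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag}: {pct:.1f}%")
```

Because a question can carry several tags, shares computed this way need not sum to 100%.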

Most of these common technologies look like they make up a fairly similar fraction of NY and SF traffic, but we’re interested in stark differences. What tags (among the 200 most high-traffic tags) showed the largest difference between San Francisco and New York?
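One way to rank such differences, sketched below, is to take the log of the ratio of each tag's share in the two cities, so that "twice as common in SF" and "twice as common in NY" sit at the same distance from zero. Using a log2 ratio is an assumption made for illustration, as are the helper name `share_log_ratios` and the placeholder percentages; restricting to the 200 highest-traffic tags would happen before this step.

```python
import math


def share_log_ratios(shares_a, shares_b):
    """log2(share_a / share_b) for tags with a positive share in both cities."""
    return {
        tag: math.log2(shares_a[tag] / shares_b[tag])
        for tag in shares_a
        if shares_a[tag] > 0 and shares_b.get(tag, 0) > 0
    }


# Placeholder percentages of each city's traffic (illustrative only).
toy_sf = {"android": 4.0, "c#": 2.0, "java": 8.0}
toy_ny = {"android": 2.0, "c#": 4.0, "java": 8.0}

ratios = share_log_ratios(toy_sf, toy_ny)

# Rank tags by the size of the difference, regardless of direction.
ranked = sorted(ratios.items(), key=lambda kv: abs(kv[1]), reverse=True)
for tag, log_ratio in ranked:
    leader = "SF" if log_ratio > 0 else "NY" if log_ratio < 0 else "even"
    print(f"{tag}: log2 ratio {log_ratio:+.2f} ({leader})")
```

A log ratio of +1 means the tag's share is twice as large in the first city, and -1 means twice as large in the second.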

One clear difference: New York has a larger share of Microsoft developers. Many tags important in the Microsoft technology stack, such as C#, .NET, SQL Server, and VB.NET, had about twice as much traffic in New York as in San Francisco. This may be because many banks and financial firms, which are much more common in NY than in SF, use these technologies.

There are also patterns in the technologies that are more common in the San Francisco area, especially those developed by Apple (Cocoa, Objective-C, OSX) and Google (Go, Android). We can also see several influential open source projects, especially ones associated with Apache (Hive, Hadoop, Spark).

Rather than looking only at the most dramatic differences, we can plot each tag’s SF/NY ratio against its total visits:

This confirms that C# (in NY) and Android (in SF) stand out as the highest-traffic tags that show different behavior, with tags such as Excel, VBA, Cocoa, and Go showing even more dramatic differences. Meanwhile, the Java tag has about the same level of traffic in each city, as do several “language agnostic” tags such as “string”, “regex”, and “performance”.

New York, San Francisco, Bangalore, and London

Let’s expand the story to include Bangalore, India, and London, England. Together these four cities make up 11.1% of all Stack Overflow traffic.

Each of these cities is the “capital” of particular tags, visiting them more than the other three cities do. Which tags does each city lead in?
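A minimal sketch of that "capital" lookup follows, assuming we already have each tag's share of traffic per city. The city and tag names and the percentages are placeholders rather than results from the analysis, and both helper functions are hypothetical.

```python
from collections import defaultdict


def leading_city_per_tag(shares_by_city):
    """Map each tag to the city where it has the largest traffic share.

    On a tie, the city encountered first keeps the tag.
    """
    best = {}
    for city, shares in shares_by_city.items():
        for tag, pct in shares.items():
            if tag not in best or pct > best[tag][1]:
                best[tag] = (city, pct)
    return {tag: city for tag, (city, _) in best.items()}


def tags_led_by(leaders):
    """Invert the tag -> city mapping into city -> sorted list of tags."""
    by_city = defaultdict(list)
    for tag, city in leaders.items():
        by_city[city].append(tag)
    return {city: sorted(tags) for city, tags in by_city.items()}


# Placeholder shares (percent of each city's traffic), for illustration only.
toy_shares = {
    "city_a": {"tag_1": 5.0, "tag_2": 1.0, "tag_3": 2.0},
    "city_b": {"tag_1": 3.0, "tag_2": 4.0, "tag_3": 2.5},
    "city_c": {"tag_1": 4.0, "tag_2": 2.0, "tag_3": 1.0},
}

leaders = leading_city_per_tag(toy_shares)
print(tags_led_by(leaders))
```

Comparing shares rather than raw view counts keeps a city with more total traffic from leading every tag by default.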

This fills out more of our story:

This portrait of four major developer hubs is just one of many ways Stack Overflow traffic can tell us about the global software engineering ecosystem. Whether you want to understand developers, hire them, engage them, or make your own developers more efficient, we have solutions to help you solve your problems. Check out Developer Insights to learn more.

  1. In this analysis, we counted all traffic within 50 miles of a city: this means San Francisco includes a larger part of the “Bay Area”, such as Mountain View and Cupertino.
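The post does not say how that distance was computed. As one possible approach, the sketch below uses the haversine great-circle formula with approximate city-center coordinates; the function names, the coordinates, and the choice of formula are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(center, point, radius_miles=50.0):
    """True if `point` lies within `radius_miles` of `center`."""
    return haversine_miles(*center, *point) <= radius_miles


# Approximate coordinates (latitude, longitude), assumed for this example.
SAN_FRANCISCO = (37.7749, -122.4194)
MOUNTAIN_VIEW = (37.3861, -122.0839)
CUPERTINO = (37.3230, -122.0322)
NEW_YORK = (40.7128, -74.0060)

for name, coords in [("Mountain View", MOUNTAIN_VIEW), ("Cupertino", CUPERTINO), ("New York", NEW_YORK)]:
    print(name, within_radius(SAN_FRANCISCO, coords))
```

With these coordinates, Mountain View and Cupertino both fall inside the 50-mile radius around San Francisco, which matches the footnote's description.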
