Big data, big analytics, big opportunity

[This article was first published on eKonometrics, and kindly contributed to R-bloggers].

Data, data, every where
Nor any byte to think

The world today is awash with data. Corporations, governments, and individuals are busy generating petabytes of data on culture, economy, environment, religion, and society. While data has become abundant and ubiquitous, the data analysts needed to turn raw data into knowledge are in short supply.

With big data comes a big opportunity for the educated middle class in the developing world. An army of data scientists can be trained there to support the offshoring of analytics from Western countries, where such needs are unlikely to be filled from locally available talent.

In a 2011 report, McKinsey Global Institute revealed that the United States alone faces a shortage of almost 200,000 data analysts. The American economy also requires an additional 1.5 million managers proficient in decision-making based on insights gained from the analysis of large data sets. Even when Hal Varian, Google’s famed chief economist, proclaimed that “the real sexy job in 2010s is to be a statistician,” there were not many takers for the opportunity in the West, where students pursuing degrees in statistics, engineering, and other empirical fields are few in number and are often visa students from abroad.

A recent report by Statistics Canada revealed that two-thirds of those who graduated with a PhD in engineering from a Canadian university in 2005 spoke neither English nor French as their mother tongue. Similarly, four out of 10 PhD graduates in computers, mathematics, and physical sciences did not speak a western language as their mother tongue. Also, more than 60 per cent of engineering graduates were visible minorities, suggesting that the supply chain of highly qualified professional talent in Canada, and to a large extent in North America, is already linked to the talent emigrating from China, Egypt, India, Iran, and Pakistan.

The abundance of data and the scarcity of analysts present a unique opportunity for developing countries, which have an abundant supply of highly numerate youth who could be trained and mobilized en masse to write a new chapter in offshoring. This would require a serious rethink by thought leaders in developing countries who have not taxed their imaginations beyond thinking of policies to create sweatshops where youth would undersell their skills and see their potential wilt away while making undergarments for consumers in the West. The fate of the youth in developing countries need not be restricted to stitching underwear or making cold calls from offshored call centers in order for them to be part of global value chains. Instead, they can be trained as skilled number-crunchers who would add value to otherwise worthless data for businesses, big and small.

A multi-billion dollar industry

The past decade has witnessed a major sectoral shift among some very large manufacturing firms, which were known in the past mostly for hardware engineering and are now evolving into firms that deliver services, such as business analytics. Take IBM, for example, which specialized as a computer hardware company producing servers, desktop computers, laptops, and other supporting infrastructure. That was IBM’s past. Today, IBM is focused on analytics. It is spending hundreds of millions of dollars on advertising, trying to rebrand itself as a leader in business analytics. In fact, it has divested from several hardware initiatives, such as manufacturing laptops, and has instead spent billions on acquisitions to build its analytics credentials. For instance, IBM acquired SPSS for over a billion dollars to capture the retail side of the business analytics market. For large commercial ventures, IBM acquired Cognos to offer full-service analytics.

In 2011 alone, the business analytics software market was worth over $30 billion. Oracle ($6.1 bn), SAP ($4.6 bn), IBM ($4.4 bn), and Microsoft and SAS, each with $3.3 bn in sales, led the market. It is estimated that the sale of business analytics software alone will hit $50 billion by 2016. Dan Vesset of IDC, a company specializing in watching industry trends, aptly noted that business analytics had “crossed the chasm into the mainstream mass market” and the “demand for business analytics solutions is exposing the previously minor issue of the shortage of highly skilled IT and analytics staff.”
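
The figures quoted above imply both a growth rate and a degree of market concentration, and the arithmetic is easy to check. The Python sketch below is only a rough illustration: it assumes the "over $30 billion" market is exactly $30 billion and the 2016 estimate is exactly $50 billion, and the variable and function names are illustrative rather than taken from any report.

```python
# Figures quoted in the article, in billions of US dollars.
# Assumption: "over $30 billion" is treated as exactly 30.0 for a rough estimate.
market_2011 = 30.0
market_2016 = 50.0
vendor_sales_2011 = {
    "Oracle": 6.1,
    "SAP": 4.6,
    "IBM": 4.4,
    "Microsoft": 3.3,
    "SAS": 3.3,
}


def implied_annual_growth(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


top_five_total = sum(vendor_sales_2011.values())
top_five_share = top_five_total / market_2011
growth = implied_annual_growth(market_2011, market_2016, years=2016 - 2011)

print(f"Top five vendors: ${top_five_total:.1f}bn, about {top_five_share:.0%} of the market")
print(f"Implied annual growth, 2011 to 2016: {growth:.1%}")
```

Under these assumptions the five named vendors account for roughly 72 per cent of the market, and the 2016 estimate implies growth of just under 11 per cent a year. Because the 2011 market was worth more than $30 billion, both numbers are upper bounds.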

In addition to the bundled software and service sales offered by the likes of Oracle and IBM, business analytics services in the consulting domain generated several billion dollars more worldwide. While the large firms command the lion’s share in the analytics market, the billions left at the bottom are still a large enough prize to take the analytics plunge.

Several billion reasons to hop on the analytics bandwagon

While the IBMs of the world are focused largely on large corporations, the analytics needs of small and medium-sized enterprises (SMEs) are unlikely to be met by IBM, Oracle, or other large players. Cost is the most important determinant. SMEs prefer to have analytics done on the cheap, while the overheads of the large analytics firms run into millions of dollars, pricing them out of the SME market. Offshoring gives access to affordable talent in developing countries who can bid for smaller contracts and beat the competition in the West on price, and over time on quality as well.

The trick, therefore, is to beat the IBMs of the world in the analytics game by not competing against them. Business analytics is not a single market, but an amalgamation of several types of markets focused on delivering value-added services involving data capture, data warehousing, data cleaning, data mining, and data analysis. Realizing this, developing countries can carve out a niche for themselves by focusing exclusively on contracts that large firms will not bid for because of their intrinsically large overheads.

Leaving the fight for top dollars in analytics to the top dogs, developing countries could build a cottage industry in analytics that strives to serve the analytics needs of SMEs. Take the example of the Toronto Transit Commission (TTC), Canada’s largest public transit agency with annual revenues exceeding a billion dollars. When TTC needed to have a large database of almost half a million commuter complaints analyzed, it turned to Ryerson University rather than a large analytics firm. TTC’s decision to work with Ryerson University was motivated by two considerations. The first is cost: as a public sector university, Ryerson believes strongly in serving the community and thus offered its services free of charge. The second is quality. Ryerson University, like most similar institutions of higher learning, excels in analytics, with several faculty members working at the cutting edge of the field and more than willing to apply their skills to real-life problems.

Why now?

The timing has never been better to undertake such an endeavor on a very large scale. The innovations in Information and Communication Technology (ICT) and the ready availability of the most advanced analytics software as freeware allow entrepreneurs in developing countries to compete worldwide. The Internet makes it possible to be part of global marketplaces at negligible cost. With cyber marketplaces such as Kijiji and Craigslist, individuals can become proprietors offering services worldwide.

Using the freely available Google Sites, one can have a business website online immediately at no cost. Google Docs, another free service from Google, effectively gives one a free web server for sharing documents with collaborators or the rest of the world. Other free services, such as Google Trends, allow individual researchers to generate data on business and social trends without needing subscriptions to services that cost millions. The graph below is generated using Google Trends and shows daily visits to the websites of leading analytics firms. Without free access to such services, the data used to generate the same graph would carry a huge price tag.

Similarly, another free service from Google allows one to determine, for instance, which cities registered the highest number of search requests for ‘business analytics’. It appears that four of the top six cities where analytics are most popular are located in India, which is evident from the following graph where search intensity is mapped on a normalized index of 0 to 100.
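
A 0-to-100 index of this kind can be produced by dividing every value by the largest one and multiplying by 100. The Python sketch below illustrates that rescaling. It assumes this max-based scaling is how the index is built, and the city labels and counts are made-up placeholders, not the figures behind the graph.

```python
def normalize_to_index(raw_counts):
    """Rescale raw counts so that the largest becomes 100."""
    peak = max(raw_counts.values())
    if peak <= 0:
        raise ValueError("at least one count must be positive")
    return {name: round(100 * count / peak) for name, count in raw_counts.items()}


# Hypothetical search counts; placeholders only, not real data.
raw = {"City A": 840, "City B": 630, "City C": 210}
index = normalize_to_index(raw)

# List the cities from the highest index value to the lowest.
for name, score in sorted(index.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score}")
```

An index built this way shows only how cities rank relative to the busiest one; it says nothing about the absolute number of searches.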

The other big development of recent times is freeware that is leveling the playing field between haves and have-nots. In analytics, one of the most sophisticated computing platforms is R, which is available for free. Developers worldwide are busy developing the R platform, which now offers over 3,000 packages for free for analyzing data. From econometrics to operations research, R is fast becoming the lingua franca for computing. R has evolved from being popular just amongst computing geeks to having its praise sung by the New York Times.

R has also made some new friends, especially Paul Butler, a Canadian student who became a worldwide sensation by mapping the geography of Facebook. While an intern at Facebook, Paul analyzed gigabytes of data to plot how Facebook friends were linked globally. His map (see the image below) became an instant hit worldwide and has been reproduced in publications thousands of times. If you are wondering what software Paul used to generate the map, wonder no more: the answer is R.

R is fast becoming the preferred computing platform for data scientists worldwide. For decades the data analysis market was ruled by the likes of SAS, SPSS, Stata, and other similar players. Of late, R has captured the imagination of data analysts, who are fast converging on it, especially now that R can interact with Hadoop (another open source platform) for analyzing big data. In fact, most innovations in statistics are first coded in R so that the algorithms become available to all immediately and for free.


The fact that R is freely available should not be taken lightly. A commercial license for a similarly equipped version of SPSS may cost up to US$7,500. The other big advantage of using R is that volunteers have made thousands of training documents on the Internet and videos on YouTube available for free.

Where to next

The private sector has to take the lead for business analytics to take root in developing countries. Governments could also play a small role in regulation. However, the analytics revolution has to take place not because of the public sector, but in spite of it. Even public sector universities in developing countries cannot be entrusted with the task, as senior university administrators do not warm up to innovative ideas unless they involve a junket in Europe or North America. At the same time, the faculty in public sector universities in developing countries is often unwilling to try new technologies.

The private sector in developing countries may first want to launch an industry group that takes on the task of certifying firms and individuals interested in analytics for quality, reliability, and ethical and professional competencies. This will help build confidence around national brands. Without such certification, foreign clients will be apprehensive about sharing their proprietary data with individuals hidden behind computer monitors thousands of miles away.

The private sector will also have to take the lead in training a professional workforce in analytics. Several companies train their employees in the latest technology and then market their skills to clients. The training houses would therefore also double as consulting practices where the best graduates may be retained as consultants.

Small virtual marketplaces could be set up in large cities where clients can post requests for proposals and pre-screened, qualified bidders can compete for the contract. The national self-regulating body would be responsible for screening qualified bidders from its vendor-of-record database, which it would make available to clients globally through the Internet.

The IBMs of the world expect the analytics market to hit hundreds of billions in revenue in the next decade. The abundant talent in developing countries can be polished into a skilled workforce that taps into the analytics market, channeling some of that revenue to developing countries while creating gainful employment opportunities for the educated youth who have been reduced to making cold calls from offshored call centers.
