experimental. the behavioral risk factor surveillance system (brfss) aggregates behavioral health data from 400,000 adults via telephone every year. it's um *clears throat* the largest telephone survey in the world and it's gotta lotta uses...
- “Big Data: The Management Revolution,” by Andrew McAfee and Erik Brynjolfsson, pages 61 – 68;
- “Data Scientist: The Sexiest Job of the 21st Century,” by Thomas H. Davenport and D.J. Patil, pages...
Introduction
A few months ago, Drew Conway and I gave a webcast that tried to teach people about the basic principles behind linear and logistic regression. To illustrate logistic regression, we worked through a series of progressively more complex spam detection problems. The simplest data set we used was the following: This data set has
experimental. think of the american community survey (acs) as the united states' census for off-years - the ones that don't end in zero. every year, one percent of all americans respond, making it the largest complex sample administered by ...
I know “officially” data scientists always work in “big data” environments with data in a remote database, streaming store or key-value system. But in day-to-day work, Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to make a plea to my
I’ve spent a good deal of 2012 constructing a data warehouse to manage all the various data elements that my company has. Although we’re a small enterprise, the richness and complexity of the information is rather high. Moreover, as a data-driven organization, there’s a strong impetus to construct meaningful analysis with every bit of input 
I get data from many different sources, ranging from simple .csv files to more complex relational databases to structured XML or JSON files. I have compiled the different approaches that one can use to easily access these datasets.
Local Column Delimited Files
This is probably the most common and
R, which was largely predominant in the academic world, has started picking up a lot in businesses as well. At least that is what I am witnessing among my colleagues. A lot of people have started experimenting with R, choosing the path to enlightenment. ...
the centers for medicare and medicaid services (cms) took the plunge. the famous medicare 5% sample has been released to the public, free of charge. jfyi - medicare is the u.s. government program that provides health insurance to 50 million...