Monthly Archives: March 2013

Analyzing SimplyStatistics visits info

March 9, 2013

Recently we had to analyze data on the number of daily visits to SimplyStatistics.org. There were two goals: (1) estimate the fraction of visitors retained after a spike in the number of visitors, and (2) identify any factors that influence the fraction estimated in (1). For me it was a fun project in part because I like SimplyStatistics but also...


A bit more on sample size

March 8, 2013

In our article "What is a large enough random sample?" we pointed out that if you wanted to measure a proportion to an accuracy “a” with a chance of being wrong of “d”, then one idea was to guarantee you had a sample size of at least: … This is the central question in designing opinion polls.
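The formula itself did not survive in this excerpt, so the following is only a sketch of one standard bound of this kind, not a reconstruction of the original. It assumes the two-sided Hoeffding inequality, which gives n >= ln(2/d) / (2 a^2) for an accuracy "a" and a failure chance "d". The function name is made up for illustration.

```python
import math


def hoeffding_sample_size(a: float, d: float) -> int:
    """Smallest n with 2 * exp(-2 * n * a**2) <= d.

    Assumption: the two-sided Hoeffding bound for a proportion.
    a is the allowed absolute error, d the allowed chance of exceeding it.
    """
    if not (0 < a < 1) or not (0 < d < 1):
        raise ValueError("a and d must both lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / d) / (2.0 * a * a))


if __name__ == "__main__":
    # A poll accurate to +/- 3 points, wrong at most 5% of the time.
    print(hoeffding_sample_size(0.03, 0.05))
```

The bound makes no use of the true proportion, so it is conservative: sharper, distribution-specific formulas typically ask for fewer samples.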


R vs. Perl/mySQL – an applied genomics showdown

March 8, 2013

Recently I was given an assignment for a class I'm taking that got me thinking about speed in R. This isn't something I'm usually concerned with, but the first time I tried to run my solution (using plyr's ddply()) it was going to take all night to compute. I consulted the professor that taught...


Quandl package released to CRAN

March 8, 2013

In a guest post here on February 20, Tammer Kamel introduced us to Quandl, a kind of "wikipedia" of time series data. In the post, Tammer (the founder of Quandl) noted that they were working on an R package to give R users access to Quandl as a data source. That package is now available. It includes the Quandl...


Comparing quantiles for two samples

March 8, 2013

Recently, for a research paper, I had some samples, and I wanted to compare them. Not to compare their means (by construction, all of them were centered) but their dispersion. And not their variance, but rather their quantiles. Consider the following boxplot-type function, where everything is quantile related (which is not the case for the standard boxplot, see http://freakonometrics.hypotheses.org/4138,...
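The boxplot-type function from the post is not part of this excerpt. As a rough illustration of the underlying idea only, here is a minimal sketch that lines up the quantiles of two centered samples side by side. The function name and the toy data are assumptions, and it uses the "inclusive" method of Python's statistics.quantiles rather than whatever convention the original post used.

```python
import statistics


def quantile_table(x, y, n=4):
    """Return rows (probability, quantile of x, quantile of y, difference).

    n is the number of equal-probability groups, so n - 1 cut points
    are compared (n=4 gives the three quartiles).
    """
    qx = statistics.quantiles(x, n=n, method="inclusive")
    qy = statistics.quantiles(y, n=n, method="inclusive")
    return [((i + 1) / n, a, b, b - a) for i, (a, b) in enumerate(zip(qx, qy))]


if __name__ == "__main__":
    # Two toy samples with the same mean (0) but different spread.
    x = list(range(-50, 51))
    y = [2 * v for v in x]
    for p, a, b, diff in quantile_table(x, y):
        print(f"p={p:.2f}  x={a:7.2f}  y={b:7.2f}  y-x={diff:7.2f}")
```

Both toy samples have mean zero, so any difference in the table comes from dispersion alone, which is the comparison the post is after.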


Data Visualization: Shiny Democratization

March 8, 2013

In organizing Data Visualization DC we focus on three themes: The Message, The Process, The Psychology. In other words, ideas and examples of what can be communicated, the tools and know-how to get it done, and how best to communicate. … The post Data Visualization: Shiny Democratization appeared first on Data Community DC.


Publishing Stats for Analytic Reuse – FAOStat Website and R Package

March 8, 2013

How can stats and data publishers, from NGOs and (inter)national statistics agencies to scientific researchers, publish their data in a way that supports its analysis directly, as well as in combination with other datasets? Here’s one approach I learned about from Michael Kao of the UN Food and Agriculture Organisation statistics division, FAOStat. At first


Cool GSS training video! And cumulative file 1972-2012!

March 8, 2013

Felipe Osorio made the above video to help people use the General Social Survey and R to answer research questions in social science. Go for it! Meanwhile, Tom Smith reports: The initial release of the General Social Survey (GSS), cumulative file for 1972-2012 is now on our website. Codebooks and copies of questionnaires will be...


Visualizing rOpenSci collaboration

March 8, 2013

We (rOpenSci) have been writing code for R packages for a couple of years, so it is time to take a look back at the data. What data, you ask? The commits data from GitHub: data that records who did what and when. Using the GitHub commits API we can gather data on who committed code to a...
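The original post works in R and its code is not part of this excerpt. As a minimal sketch of the "who did what and when" idea, the function below tallies commits per author and month from records shaped like the JSON returned by GitHub's "list commits" endpoint, where each item carries commit.author.name and an ISO 8601 commit.author.date. The function name and the sample records are made up for illustration, and no network call is made.

```python
from collections import Counter


def commits_per_author_month(commits):
    """Count commits per (author name, 'YYYY-MM') pair.

    `commits` is assumed to be the already-parsed JSON list from the
    GitHub commits API; only commit.author.name and commit.author.date
    are read.
    """
    counts = Counter()
    for item in commits:
        author = item["commit"]["author"]
        month = author["date"][:7]  # '2013-03-08T12:00:00Z' -> '2013-03'
        counts[(author["name"], month)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records, not real rOpenSci data.
    sample = [
        {"commit": {"author": {"name": "alice", "date": "2013-02-27T09:15:00Z"}}},
        {"commit": {"author": {"name": "alice", "date": "2013-03-01T18:40:00Z"}}},
        {"commit": {"author": {"name": "bob", "date": "2013-03-02T11:05:00Z"}}},
        {"commit": {"author": {"name": "alice", "date": "2013-03-05T14:30:00Z"}}},
    ]
    for (name, month), n in sorted(commits_per_author_month(sample).items()):
        print(name, month, n)
```

A table like this is the raw material for a collaboration plot: one row per person and time window, ready to be drawn as a heatmap or a stacked bar chart.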

