For the past two years I have been working on the Wisconsin Dropout Early Warning System, a predictive model of on-time high school graduation for students in grades 6–9 in Wisconsin. The goal of this project is to help schools and educators have an ea... [Read more...]
This is the bimonthly R Jobs post (for 2014-08-25), based on the R-bloggers’ sister website: R-users.com. If you are an employer who is looking to hire people from the [Read more...]
With all the recent buzz about ggvis (this, this, and this), it’s easy to forget all that ggplot2 offers as a graphics package. True, unlike ggvis, ggplot2 takes a static approach to graphing, but it has fundamentally changed the way we think about plots in R. I recently ... [Read more...]
The OpenCPU system exposes an HTTP API for embedded scientific computing with R. This provides reliable and scalable foundations for integrating R-based analysis and visualization modules into pipelines, web applications or big data infrastruct...
On 23–25 September, I will be running a 3-day workshop in Perth on “Forecasting: principles and practice” mostly based on my book of the same name. Workshop participants will be assumed to be familiar with basic statistical tools such as multiple regression, but no knowledge of time series or forecasting will ... [Read more...]
Objective: I recently needed to stem every word in a block of text, i.e. reduce each word to its root form. Problem: the stemmer I was using would only stem the last word in each block of text. Solution: I wrote a function which splits a block ... [Read more...]
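The split, stem, and rejoin workaround described in the stemming post above can be sketched in base R as follows. This is a minimal illustration, not the author's function: `stem_blocks()` and `toy_stem()` are hypothetical names, and the sketch assumes the stemmer accepts a character vector of single words and returns one stem per word (SnowballC's `wordStem()` behaves this way). `toy_stem()` is a crude suffix stripper standing in for a real stemming algorithm.

```r
# Hypothetical helper: stem every word in each block of text, not just the last one.
# 'stemmer' is assumed to take a character vector of single words and return
# a character vector of stems of the same length.
stem_blocks <- function(texts, stemmer) {
  vapply(texts, function(txt) {
    words <- strsplit(txt, "\\s+")[[1]]   # split the block on runs of whitespace
    words <- words[nzchar(words)]         # drop empty strings left by leading spaces
    paste(stemmer(words), collapse = " ") # stem each word, then rejoin the block
  }, character(1), USE.NAMES = FALSE)
}

# Toy stand-in stemmer (illustration only): strips a trailing "ing", "ed" or "s".
toy_stem <- function(words) sub("(ing|ed|s)$", "", words)

stem_blocks(c("running jumped cats", "dogs barked"), toy_stem)
# returns c("runn jump cat", "dog bark")
```

Passing the stemmer in as an argument keeps the splitting logic separate from the stemming algorithm, so a real stemmer can be swapped in without changing `stem_blocks()`.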
Over the past year I have been running estimations in both JAGS and Stan. In that time I have seen one example where JAGS could not give me decent samples (in the sense of low Rhat and a high number of effective samples), but that was data which I... [Read more...]
The book Computational Actuarial Science, with R is officially out. In the introduction of the book, and on the website of CRC, it is mentioned that the datasets can be found “in an R package on CRAN”, which is unfortunately incorrect. Some datasets are too large, so the package can ... [Read more...]
Lately, I’ve been making heavy use of Reference Classes in R. I find them easier to wrap my mind around because their usage style is closer to that of “traditional” OOP languages such as Java. Primarily, object methods are part of the class definition and accessed via the ... [Read more...]
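For readers new to Reference Classes, here is a minimal sketch using `setRefClass()` from R's built-in methods package. The `Account` class and its `deposit()` method are hypothetical; the point is that methods are declared inside the class definition, called with `$`, and that objects have reference semantics, so assigning an object to a new name does not copy it.

```r
library(methods)  # setRefClass() lives in the methods package

# Hypothetical class: methods are part of the class definition
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(x) {
      balance <<- balance + x  # '<<-' modifies the object's field in place
      invisible(.self)
    }
  )
)

acct <- Account$new(balance = 100)
acct$deposit(50)        # methods are accessed with '$', much like obj.method() in Java

same_acct <- acct       # assignment copies the reference, not the object
same_acct$deposit(10)
acct$balance            # 160: both names point at the same object
```

When an independent object is needed, the built-in `copy()` method (for example `acct$copy()`) creates one, which is the main habit to adjust when coming from R's usual copy-on-modify semantics.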
(Figures: distribution of intercept-slope correlation estimates with 37 subjects and 15 items, and with 50 subjects and 30 items.) Should one always fit a full variance-covariance matrix (a "maximal" model) when one analyze...
The motivation for a pipeline operator is to make code more readable. In many cases it does organize code better, presenting the logic in a fluent, human-readable style. In other cases, however, such operators can make things worse.
Recently, I had an interesting discussion on how to add side ... [Read more...]
In my first pass at text analysis of the ESA program, I looked at how the frequency of words used in the ESA program differed from last year to this year. There are much more sophisticated ways of looking at word use in text, though, and I began to dive ... [Read more...]
This entire movie (images, music, everything) is generated from a Windows PC executable of just 4,095 bytes. That's not a typo: we're talking bytes, not megabytes or gigabytes. Less than 4 KB in total creates this entire scene. For comparison, a medium-quality video file of this exact same scene in AVI ... [Read more...]
By Neera Talbert (VP Services) and Ben Wiley (R Programmer) at Revolution Analytics. By now, everyone should be familiar with the data scientist boom. Simply logging onto LinkedIn reveals a seemingly infinite number of people with words and phrases like “Data Scientist”, “Big Data Specialist”, and “Analytics” in their title. ... [Read more...]
I've been teasing about this post for some time now.
My next blog post is "Pro Grammar and Devel Hoper". And this not just an empty pun. Stay tuned. (Romain François, @romain_francois, August 3, 2014)
@stefanbache another teaser. https://t.co/i2ubfOyjIO
iris ____ filter( Sepal.Length __ 7 )
iris |__ filter( Sepal.... [Read more...]
Using browser-based data analysis toolkits such as pandas in IPython notebooks, or R in RStudio, means you need access to Python or R and the corresponding application server, either on your own computer or on a remote server that you have access to. When running occasional ... [Read more...]
Hi, I’m Andrew, and this is my first post for Coppelia! If you like the look of this, feel free to visit my blog, dinner with data (and see what happens when a data scientist hits the kitchen!). I was excited by James’s last post on the new ... [Read more...]
An update to the stringdist package was released earlier this month. Thanks to a contribution from Jan van der Laan, the package now includes a method to compute soundex codes as defined here. Briefly, soundex encoding aims to translate words … [Read more...]
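The soundex idea mentioned in the stringdist item can be illustrated with a short base-R sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, skip vowels, collapse adjacent identical codes (a vowel between two equal codes keeps both, while H or W between them does not), and pad or truncate to four characters. `soundex_sketch()` is a hypothetical name; this is not the stringdist implementation, which may differ in details such as the handling of non-ASCII input.

```r
# Hypothetical sketch of classic American Soundex for a single word
soundex_sketch <- function(word) {
  cleaned <- toupper(gsub("[^A-Za-z]", "", word))  # keep ASCII letters only
  if (!nzchar(cleaned)) return(NA_character_)
  chars <- strsplit(cleaned, "")[[1]]
  map <- c(B = "1", F = "1", P = "1", V = "1",
           C = "2", G = "2", J = "2", K = "2", Q = "2", S = "2", X = "2", Z = "2",
           D = "3", T = "3", L = "4", M = "5", N = "5", R = "6")
  # Vowels (A, E, I, O, U, Y) and H, W have no digit; they map to "0" here
  code_of <- function(ch) if (ch %in% names(map)) map[[ch]] else "0"
  out <- chars[1]                 # the first letter is kept as-is
  last <- code_of(chars[1])       # its code still blocks an identical next code
  for (ch in chars[-1]) {
    if (ch %in% c("H", "W")) next # H and W do not separate identical codes
    code <- code_of(ch)
    if (code != "0" && code != last) out <- paste0(out, code)
    last <- code                  # a vowel resets 'last' to "0"
  }
  substr(paste0(out, "000"), 1, 4) # pad with zeros and keep four characters
}

soundex_sketch("Robert")  # "R163"
soundex_sketch("Rupert")  # "R163": similar-sounding names share a code
```

To encode a character vector, wrap the call, for example `vapply(x, soundex_sketch, character(1))`.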