A number of key assumptions underlie the linear regression model, among them linearity and normally distributed noise (error) terms with constant variance. In this post, I consider an additional assumption: the unobserved noise is uncorrelated with any covariates or predictors in the model.
In this simple model:
\[Y_i = \...
Every R package has its story. Some packages are written by experts, some by
novices. Some are developed quickly, others were long in the making. This is the
story of jstor, a package which I developed during my time as a student of
sociology, working in a research project on ...
Last week I ran across this great post on creating a neural network in Python. It walks through the very basics of neural networks and creates a working example using Python. I enjoyed the simple hands-on approach the author used, and I was interested to see how we might ...
Last month, Max Roser presented a cartogram of the Earth’s population in 2018.
He also provided some perspectives on its spatial distribution in an article on ourworldindata.org, which I recommend.
Links to the article were shared in many places, including in the blog post A Map of the ...
According to a KDD poll, fewer respondents (by rate) used only R in 2017 than in 2016. At the same time, more respondents (by rate) used only Python in 2017 than in 2016. Let’s take this as an excuse to take a quick look at what happens when we try a task in ...
Data manipulation is a breeze with amazing packages like plyr and dplyr. Recoding factors, which could prove to be a daunting task especially for variables that have many categories, can easily be accomplished with these packages. However, it is important for those learning Data Science to understand how the basic ...
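The teaser above is cut off before it shows any code, so the following is only a minimal sketch of one way to recode a factor in base R, not necessarily the approach the post takes. The factor `grade` and its categories are hypothetical names introduced for illustration.

```r
# Hypothetical factor with four categories (illustrative data only).
grade <- factor(c("a", "b", "c", "a", "d"))

# Base R recoding: assign a named list to levels().
# Each list name is a new level; each value lists the old levels it absorbs.
levels(grade) <- list(high = c("a", "b"), low = c("c", "d"))

# Inspect the recoded factor and the count per new level.
print(grade)
print(table(grade))
```

Assigning a named list to `levels()` collapses several old categories into one in a single step, which is convenient when a variable has many categories.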
Here is the podcast link.
Introducing Andrew Gelman
Hugo: Hi there, Andy, and welcome to DataFramed.
Andrew: Hello.
Hugo: Such a pleasure to have you on the show and I'm really excited to have you here today to talk about polling and election forecasting, but before that I'd like to ...
In the last couple of years, real estate companies have shifted their focus to the digital world, and now almost all investments have an online system showing what apartments are available. This is very convenient for their potential clients, as they can easily become familiar with the apartments on offer. ...
Have you ever wanted to make your Shiny tables interactive, more functional and look better? The DT package, which stands for “DataTables”, provides an R interface to the JavaScript library “DataTables”. It allows creating high standard tables by implementing the functionalities and design features that are available through the “DataTables” ...
From Executive Business Leadership to Data Scientists, we all agree on one thing: A data-driven transformation is happening. Artificial Intelligence (AI), and more specifically Data Science, are redefining how organizations extract insights from their ...
Plotting phylogenies and associated data side by side is a good way to explore
evolutionary patterns in your data. In this post I will describe my approach
for creating such plots in R using ggplot, ggtree, and patchwork.
ggtree itself comes with ...
September was another relatively slow month for new package activity on CRAN: “only” 126 new packages by my count. My Top 40 list is heavy on what I characterize as “utilities”: packages that either extend R in some fashion or make it easier to do things in R. This month, the packages ...
Proper identification of individuals is crucial for acknowledging and
studying their scientific work, be it journal articles or pieces of
software. In this tech note, one year after CRAN started supporting
ORCIDs, we shall explain why and how to use unique author identifiers in
DESCRIPTION files.
Why use ORCIDs on ...
In this post, I show some results of predicting height based on DNA mutations. This analysis aims to reproduce the analysis of this paper using my own analysis tools in R.
I use a new dataset composed of 500,000 adults from the UK, and genotyped over hund...
The PALM tree algorithm for partially additive (generalized) linear model trees is introduced along with the R package palmtree. One potential application is modeling of treatment-subgroup interactions while adjusting for global additive effe...
R has been around a long time, and its packages have evolved through the years as well, from initial releases and updates to entirely new packages. Like many open-source, community-driven languages, R is no exception. And getting the first…
How can we use data analytics to increase our self-knowledge? Along with biofeedback from digital devices like Fitbit, less structured sources such as sent emails can provide insights.
E.g. here it seems my communication took a sudden more positive tu...
In R, we can subset a data frame df easily by putting the conditional in square brackets after df. For example, if I want all the rows in df which have value equal to 1 in the column colA, all …
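The excerpt stops before the code, so here is a minimal sketch of the kind of subsetting it describes. The names `df` and `colA` come from the text; the data itself and the extra column `colB` are made up for illustration.

```r
# Hypothetical example data; only the names df and colA come from the text.
df <- data.frame(colA = c(1, 2, 1, 3), colB = c("w", "x", "y", "z"))

# Put the condition in square brackets after df.
# The trailing comma means "keep all columns".
rows_with_one <- df[df$colA == 1, ]

# subset() expresses the same filter and also drops rows where the
# condition evaluates to NA.
rows_with_one_alt <- subset(df, colA == 1)

print(rows_with_one)
```

If `colA` may contain missing values, the bracket form returns extra all-`NA` rows, so `subset()` or wrapping the condition in `which()` is often the safer choice.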