Monthly Archives: August 2019

Simulation studies in R with the ‘parSim’ package

August 31, 2019

(featuring missing data analysis)

Simulation studies are absolutely vital for validating and testing methodological work in multiple settings. One simulation study is good, but more simulation studies are always better. In the context of network estimation, for example, simulation studies are currently the only way to assess the sample size needed...

Estimating variance: should I use n or n – 1? The answer is not what you think

Estimates of population parameters based on samples are not exact: there is always some error involved. In principle, one can estimate a population parameter with any estimator, but some will be better than others. One particular case that has always confused me (because of the multiple alternatives) is the estimation of the variance...
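For reference, the two textbook candidates behind the question are the sample variance with divisor n and with divisor n – 1. The notation below is ours and assumes an i.i.d. sample x_1, ..., x_n from a population with finite variance σ²; it is not taken from the article. Dividing by n – 1 gives an unbiased estimator, while dividing by n is biased downward by the factor (n – 1)/n. Which one is "better" depends on the criterion used (for example, bias versus mean squared error), which is presumably where the article's answer comes in.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
s_n^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,
  \qquad \mathbb{E}\!\left[s_n^2\right] = \frac{n-1}{n}\,\sigma^2 \\
s_{n-1}^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,
  \qquad \mathbb{E}\!\left[s_{n-1}^2\right] = \sigma^2
\end{aligned}
```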

Use ExPanD to Create a Notebook for Your EDA

The ‘ExPanDaR’ package offers a toolbox for interactive exploratory data analysis (EDA). You can read more about it here. The ‘ExPanD’ shiny app allows you to customize your analysis to some extent but often you might want to continue and extend your analysis with additional models and visualizations that are not part of the ‘ExPanDaR’ package. Thus, I am currently...

Using Spark from R for performance with arbitrary code – Part 1 – Spark SQL translation, custom functions, and Arrow

August 31, 2019

Introduction

Apache Spark is a popular open-source analytics engine for big data processing, and thanks to the sparklyr and SparkR packages, the power of Spark is also available to R users. This series of articles will attempt to provide practical insights into using the sparklyr interface to gain the benefits of Apache Spark while still retaining the ability to use R...

‘There is a game I play’ – Analyzing Metacritic scores for video games

August 30, 2019

“There is a game I play / try to make myself okay / try so hard to make the pieces all fit / smash it apart / just for the f**k of it” (Nine Inch Nails, “The Big Come Down”). After this rather distressing opening by Nine Inch Nails, let’s turn to a more uplifting topic: video games! There...

Explaining Predictions: Random Forest Post-hoc Analysis (randomForestExplainer package)

August 30, 2019

Recap

This is a continuation of the explanation of machine learning model predictions, specifically those of random forest models. We can rely on the random forest package itself to explain predictions based on impurity importance or permutation importance. Today, we will explore external packages that help explain random forest predictions.

External packages

There are a few external packages that calculate variable...

Lesser known dplyr functions

August 30, 2019

The dplyr package is an essential tool for manipulating data in R. The “Introduction to dplyr” vignette gives a good overview of the common dplyr functions (list taken from the vignette itself): filter() to select cases based on their values, arrange() to …

Seeking postdoc (or contractor) for next generation Stan language research and development

August 30, 2019

The Stan group at Columbia is looking to hire a postdoc* to work on the next-generation compiler for the Stan open-source probabilistic programming language. Ideally, a candidate will bring language development experience and also have research interests in a related field such as programming languages, applied statistics, numerical analysis, or statistical computation. The language...

Why R?

August 30, 2019

I was working with our copy editor on Appendix A of Practical Data Science with R, 2nd Edition (Zumel and Mount; Manning, 2019) and ran into this little point, (unfortunately) buried in the back of the book. In our opinion, the R ecosystem is the fastest path to substantial data science, statistical, and machine learning accomplishment...

Bigram Analysis of Democratic Debates

August 30, 2019

This tutorial will mainly focus on ggplot and bigrams, but it does gloss over clustering for a heatmap. This project started a while back, tweeting...
