Statistics Sunday: More Text Analysis – Term Frequency and Inverse Document Frequency

July 29, 2018

As a mixed methods researcher, I love working with qualitative data, but I also love the idea of using quantitative methods to add some meaning and context to the words. This is the main reason I've started digging into using R for text mining, and these skills have paid off...

But can ravens forecast?

July 29, 2018

Why forecast sales? Humans have the magical ability to plan for future events, for future gain. It’s not quite a uniquely human trait, because apparently ravens can match a 4-year-old. An abundance of data, and some very nice R packages, make our ability to plan all the more powerful. A couple of months ago we looked …

5 Things You Should Know About The Future of Population

July 28, 2018

Motivation: About a decade ago, I stumbled upon a TED Talk by a Swedish global health professor named Hans Rosling. During the video I learned more about global macro-trends (infant mortality, GDP, etc.), opening my eyes to some misconceptions and piquing my interest. Dr. Rosling’s moving bubble plots enraptured me and his broader audience. See

June 2018: Top 40 New Packages

July 28, 2018

Approximately 144 new packages stuck to CRAN in June. The fact that 31 of these are specialized to particular scientific disciplines or analyses provides some evidence for my hypothesis that working scientists are actively adopting R. Below are my Top 40 picks for June, organized into the categories of Computational Methods, Data, Data Science, Economics, Science, Statistics, Time Series,...

Tuning xgboost in R: Part II

July 28, 2018

By Gabriel Vasconcelos. In a previous post I discussed some of the parameters we have to tune to estimate a boosting model using the xgboost package. In this post I will discuss the two parameters that were left out in …

Data Science to Analyze Big Genomic Data

July 28, 2018

Finding the neural stem cell populations in mouse brain. Introduction: The main objective of this project is to identify new stem cell populations in mouse brain. Characterizing the unique gene expression signature of these cells could be the starting point for finding and defining cancer stem cells (CSC) in human tumors. In particular, glioblastoma,

Mysteriously Slow sample

July 28, 2018

Hi everyone, I'm at JSM 2018 right now, so feel free to drop by my session or drop by in the halls! Just give me a tweet! Back to the meat-and-potatoes of this post. A while ago I was running good old sample and comparing its performance to my lpm2_kdtree function in the BalancedSampling package (Grafström and Lisic,...

Variation in Hospital Charges and Medicare Payments for Inpatient Procedures in the United States

July 27, 2018

Background: U.S. healthcare costs have been on the rise over the past several years, outpacing the growth of the economy overall. The Centers for Medicare and Medicaid Services (CMS) estimates that American healthcare spending increased by 4.6% in 2017 to reach $3.5 trillion. The increases in medical care are driven by the Medicaid expansion, the private

Le Monde puzzle [#1062]

July 27, 2018

A simple Le Monde mathematical puzzle none too geometric: Find right-angled triangles whose sides are all integers and whose area equals their perimeter. Extend to non-square rectangles. No visible difficulty by virtue of Pythagoras’ formula: for (a in 1:1e4) for (b in a:1e4) if (a*b==2*(a+b+round(sqrt(a*a+b*b)))) print(c(a,b)) produces two answers, 5 12 and 6 8, and in

aRt with code

July 27, 2018

Looking for something original to decorate your wall? Art With Code, created by Harvard University bioinformatician Jean Fan, provides a collection of R scripts to generate artistic images in the style of famous artworks, for example a randomly generated piece in the style of Mondrian. Other art generators include "Tunnel" (rotated and scaled designs in the style of Päivi Julin),...

No worries! Afterthoughts from UseR 2018

July 27, 2018

This year the UseR conference took place in Brisbane, Australia. UseR is my favorite conference and this one was my 11th (counting from Dortmund 2008). Every UseR is unique. Every UseR is great. But my feeling is that European UseRs are (on average) more about math, statistics and methodology while US UseRs are more about …

Weight loss in the U.S. – An analysis of NHANES data with tidyverse

July 27, 2018

Based on a paper published in JAMA last year, weight gain is increasing among US adults, while there is no difference in the percentage of people who were trying to lose weight. The authors used the data from the National Health and Nutrition Examination Survey (NHANES) from 1988 to 2014 and calculated the proportion...

The Ten Rules of Defensive Programming in R

July 27, 2018

When you think of R, defensive coding may not be your first thought. But writing code that fails well & is easy to debug is more important than you'd think. The post The Ten Rules of Defensive Programming in R appeared first on Doodling in Data.

How to use Covariates to Improve your MaxDiff Model

July 27, 2018

MaxDiff is a type of best-worst scaling. Respondents are asked to compare all choices in a given set and pick their best and worst (or...

Using themes in ggplot2

July 27, 2018

As noted elsewhere, sometimes beauty matters. A plot that’s pleasing to the eye will be considered more gladly, and thus might be understood more thoroughly. Also, since we at STATWORX oftentimes need to subsume and communicate our results, we have come to appreciate how a nice plot can upgrade any presentation. So how do you make a plot look good? How...

Cucumber time, food on a 2D plate / plane

July 27, 2018

Introduction: It is 35 degrees Celsius outside, and we are in the middle of the ‘slow news season’, in many countries also called cucumber time: a period typified by the appearance of less informative and frivolous news in the media. …

EARL London interviews – Patrik Punco, NOZ Medien

July 27, 2018

Our next interviewee is Patrik Punco, Marketing Analyst at German media company NOZ Medien. Patrik is presenting a lightning talk, ‘Subscription Analytics with focus on Churn Pattern Recognition in a German News Company’, at EARL London. Ruth Thomson, Mango’s Practice Lead for Strategic Advice, chatted to Patrik about the business need for his project, what value it created for the...

Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names

July 26, 2018

Continuing the blog’s UDF theme of late, there are two new UDF kids in town: drill-url-tools, for slicing & dicing URI/URLs (just going to use ‘URL’ from now on in the post), and drill-domain-tools, for slicing & dicing internet domain names (IDNs). Now, if you’re an Apache Drill fanatic, you’re likely thinking “Hey hrbrmstr: don’t you...

Hacking our way through UpSetR

For our club meeting today we were going to summarize the Demystifying Data Science conference but we forgot that the videos are not released yet. "Oops, we'll have to postpone our blog post. We didn't read the fine print that talk recordings will be available sometime next week. Sorry about that!" (LIBD rstats club, @LIBDrstats, July 27, 2018). So we adjusted...

CHAID v ranger v xgboost – a comparison – July 27, 2018

July 26, 2018

In an earlier post, I focused on an in-depth visit with CHAID (Chi-square automatic interaction detection). Quoting myself, I said “As the name implies it is fundamentally based on the venerable Chi-square test – and while not the most powerful (in terms of detecting the smallest possible differences) or the fastest, it really is easy to manage and...

Announcing the 1st Bookdown Contest

July 26, 2018

Since the release of the bookdown package in 2016, there have been a large number of books written and published with bookdown. Currently there are about 200 books (including tutorials and notes) listed on bookdown.org alone! We have also heard about other applications of bookdown based on custom templates (e.g., dissertations). As popular as bookdown is becoming, especially with teachers,...

How to use rquery with Apache Spark on Databricks

July 26, 2018

A big thank you to Databricks for working with us and sharing "rquery: Practical Big Data Transforms for R-Spark Users" and "How to use rquery with Apache Spark on Databricks". rquery on Databricks is a great data science tool.

Stan Pharmacometrics conference in Paris July 24 2018

July 25, 2018

I just got back from attending this amazing conference in Paris: http://www.go-isop.org/stan-for-pharmacometrics---paris-france A few people were disturbed/surprised by the fact that I am a linguist ("what are you doing at a pharmacometrics conference?")....

RStudio Connect 1.6.6 – Custom Emails

July 25, 2018

We are excited to announce RStudio Connect 1.6.6! This release caps a series of improvements to RStudio Connect’s ability to deliver your work to others. Custom Email: The most significant change in RStudio Connect 1.6.6 is the new ability for publishers to customize the emails sent to others when they update their data products. In RStudio Connect, it is already...

Explaining Black-Box Machine Learning Models – Code Part 2: Text classification with LIME

July 25, 2018

This is code that will accompany an article that will appear in a special edition of a German IT magazine. The article is about explaining black-box machine learning models. In that article I’m showcasing three practical examples: (1) Explaining supervised classification models built on tabular data using caret and the iml package; (2) Explaining image classification models with keras and lime; (3) Explaining text classification...

Singularity as a software distribution / deployment tool

July 25, 2018

In this blog post, I’ll explain how someone can take advantage of Singularity to make R or Python packages available as an image file to users. This is a necessity if the specific R or Python package is difficult to install across different operating systems, which makes the installation process cumbersome. Lately, I’ve utilized the reticulate package in...

rOpenSci Educators Collaborative: How Can We Develop a Community of Innovative R Educators?

tl;dr: we propose three calls to action: Share your curricular materials in the open. Participate in the rOpenSci Education profile series. Discuss with us how you want to be involved in rOpenSci Educators’ Collaborative. In previous posts in this series, we identified challenges that individual instructors typically face when teaching science with R, and shared characteristics of effective educational resources to help address...

New Course: Structural Equation Modeling with lavaan in R

July 25, 2018

Here is the course link. Course Description: When working with data, we often want to create models to predict future events, but we also want an even deeper understanding of how our data is connected or structured. In this course, you will explore ...

New Course: Experimental Design in R

July 25, 2018

Here is the course link. Course Description: Experimental design is a crucial part of data analysis in any field, whether you work in business, health or tech. If you want to use data to answer a question, you need to design an experiment! In this c...
