Gradient boosting in R

August 24, 2017

Boosting is another well-known ensemble learning technique. Unlike bagging, where the aim is to reduce the high variance of learners by averaging many models fitted on bootstrapped samples drawn with replacement from the training data (so as to avoid overfitting), boosting is not concerned with reducing the variance of learners...

Linear Congruential Generator in R

August 24, 2017

Part 1 in the series Random Number Generation. A linear congruential generator (LCG) belongs to a class of pseudorandom number generator (PRNG) algorithms used for generating sequences of random-like numbers. The generation of random numbers plays a large role in many applications, ranging from cryptography to Monte Carlo methods. Linear congruential...
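
As a rough illustration of the recurrence behind an LCG, x[n+1] = (a * x[n] + c) mod m, here is a minimal R sketch. The function name lcg, its argument names, and the default constants (the commonly cited Numerical Recipes values) are assumptions chosen for illustration and are not taken from the post.

```r
# Minimal LCG sketch: state <- (multiplier * state + increment) mod modulus.
# All names and default constants here are illustrative assumptions.
lcg <- function(n, seed = 1, multiplier = 1664525,
                increment = 1013904223, modulus = 2^32) {
  u <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    # With these defaults every intermediate value stays below 2^53,
    # so the arithmetic is exact in R's double-precision numbers.
    state <- (multiplier * state + increment) %% modulus
    # Scale the integer state to a value in [0, 1).
    u[i] <- state / modulus
  }
  u
}

# Same seed, same sequence: the generator is fully deterministic.
lcg(5, seed = 1)
```

Because the output is completely determined by the seed and the constants, a generator like this is convenient for reproducible simulation but is not suitable for cryptographic use.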

Calculating a fuzzy kmeans membership matrix with R and Rcpp

August 24, 2017

by Błażej Moska, computer science student and data science intern. Suppose that we have performed K-means clustering in R and are satisfied with our results, but later we realize that it would also be useful to have a membership matrix. Of course, it would be easier to repeat the clustering using one of the fuzzy kmeans functions available in...

Reticulating Readability

August 24, 2017

I needed to clean some web HTML content for a project. I usually use hgr::clean_text() for this, and it generally works pretty well. The clean_text() function uses an XSLT stylesheet to try to remove all non-“main text content” from an HTML document. It usually does a good job, but there are some pages...

Big Data analytics with RevoScaleR Exercises

August 24, 2017

In this set of exercises, you will explore how to handle big data with the RevoScaleR package from Microsoft R (previously Revolution Analytics). It comes with Microsoft R Client. You can get it from here. Get the credit card fraud data set from revolutionanalytics and let's get started. Answers to the exercises are available here...

Introducing ‘powerlmm’ an R package for power calculations for longitudinal multilevel models

August 24, 2017

Over the years I've produced quite a lot of code for power calculations and simulations of different longitudinal linear mixed models. Over the summer I bundled together these calculations for the designs I most typically encounter into an R package. T...

New R Course: Sentiment Analysis in R – The Tidy Way

August 24, 2017

Hello, R users! This week we're continuing to bridge the gap between computers and human language with the launch of Sentiment Analysis in R: The Tidy Way by Julia Silge! Text datasets are diverse and ubiquitous, and sentiment analysis provides an approa...

Practical Guide to Principal Component Methods in R

August 24, 2017

Introduction: Although there are several good books on principal component methods (PCMs) and related topics, we felt that many of them are either too theoretical or too advanced. This book provides solid practical guidance to summarize, visu...

Boston EARL Keynote speaker announcement: Tareef Kawaf

August 24, 2017

...

H2O.ai: Going for a paddle

August 24, 2017

Owen Jones, Placement Student. A quick disclaimer: this post isn’t called H2O.ai: Going for the 100m freestyle world record. I’m not trying to win a Kaggle competition. I’m not carrying out detailed, highly-controlled benchmarking tests. I’m not, in fact, claiming to be doing anything particularly useful at all. This is just me, just playing around with some code, just for...

A simple function for installing R packages based on a folder with R scripts

August 24, 2017

Whenever I buy a new computer or format an old one, I have the problem of reinstalling my R packages. If you are a heavy user, you will likely have a significant number of packages used by...

FedData – Getting assorted geospatial data into R

August 24, 2017

The package FedData has gone through software review and is now part of rOpenSci. FedData includes functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package enables extraction from six datasets: the National Elevation Dataset (NED) digital elevation models (1 and 1/3 arc-second; USGS); the...

My experience in switching from Windows 10 to Linux Mint 18.2

August 24, 2017

It has been 8 months since I switched from Windows 10 to Linux Mint. In this post I’ll talk about my experience as a scholar and R user in this transition. My work is, simply put, to...

Analyzing Google Trends Data in R

August 23, 2017

Google Trends shows the changes in the popularity of search terms over a given time (i.e., number of hits over time). It can be used to find search terms with growing or decreasing popularity or...

Hard-nosed Indian Data Scientist Gospel Series – Part 1 : Incertitude around Tools and Technologies

August 23, 2017

Before the recession, a commercial tool was popular in the country, so there was little uncertainty around tools and technology; after the recession, however, incertitude (i.e., uncertainty) around tools and technology has preoccupied, and continues to occupy, data sc...

Digit fifth powers: Euler Problem 30

August 23, 2017

Euler problem 30 is another number-crunching problem that deals with numbers to the power of five. Two other Euler problems dealt with raising numbers to a power. The previous problem looked at permutations of powers and problem 16 asks for …

Control Systems Toolbox – System Interconnection

August 23, 2017

Introduction: Dynamic systems are usually represented by a model before they can be analyzed computationally. Dynamic systems change, evolve, or have their states altered over time according to a set of defined rules. They can be mechanical, electrical, electronic, biological, sociological, and so on. Many such systems are usually defined by a set...

Sentiment analysis using tidy data principles at DataCamp

August 23, 2017

I’ve been developing a course at DataCamp over the past several months, and I am happy to announce that it is now launched! The course is Sentiment Analysis in R: the Tidy Way and I am excited that it is now available for you to explore and learn...

Recreating and updating Minard with ggplot2

August 23, 2017

Minard's chart depicting Napoleon's 1812 march on Russia is a classic of data visualization that has inspired many homages using different time-and-place data. If you'd like to recreate the original chart, or create one of your own, Andrew Heiss has created a tutorial on using the ggplot2 package to re-envision the chart in R: The R script provided in...

Basics of data.table: Smooth data exploration

August 23, 2017

The data.table package provides perhaps the fastest way to wrangle data in R. The syntax is concise and is made to resemble SQL. After studying the basics of data.table and finishing this exercise set successfully, you will be able to start easing into using data.table for all your data manipulation needs. We will use data...

Going Bayes #rstats

August 23, 2017

Some time ago I started working with Bayesian methods, using the great rstanarm package. Besides the fantastic package vignettes, and books like Statistical Rethinking or Doing Bayesian Data Analysis, I also found the resources from Tristan Mahr helpful for better understanding both Bayesian analysis and rstanarm. This motivated me to implement tools for Bayesian analysis into my...

Rcpp now used by 10 percent of CRAN packages

August 23, 2017

Over the last few days, Rcpp passed another noteworthy hurdle. It is now used by over 10 percent of packages on CRAN (as measured by Depends, Imports and LinkingTo, but excluding Suggests). As of this morning 1130 packages use Rcpp out of a total of...

Simple practice: data wrangling the iris dataset

August 23, 2017

If you want to work on large data science projects (analyses and machine learning) you need to be able to perform dozens of small tasks ... For example, you'll need to be able to fluently perform dozens of little bits of data wrangling, just like this ...

useR!2017 Roundup

August 23, 2017

Organising useR!2017 was a challenge but a very rewarding experience. With about 1200 attendees of over 55 nationalities exploring an interesting program, we believe it is appropriate to call it a success, something the aftermovie only seems to confirm. Behind the Scenes: To give you a glimpse behind the scenes of the conference organization, Maxim Nazarov held...

Gender roles in film direction, analyzed with R

August 22, 2017

What do women do in films? If you analyze the stage directions in film scripts — as Julia Silge, Russell Goldenberg and Amber Thomas have done for this visual essay for ThePudding — it seems that women (but not men) are written to snuggle, giggle and squeal, while men (but not women) shoot, gallop and strap things to other...

Caching httr Requests? This means WAR[C]!

August 22, 2017

I’ve blathered about my crawl_delay project before and am just waiting for a rainy weekend to be able to crank out a follow-up post on it. Working on that project involved sifting through thousands of Web Archive (WARC) files. While I have a nascent package on GitHub to work with WARC files, it’s a tad...

Some Neat New R Notations

August 22, 2017

The R package seplyr supplies a few neat new coding notations. The first notation is an operator called the “named map builder”. This is a cute notation that essentially does the job of stats::setNames(). It allows for code such as the following: library("seplyr") names

So you (don’t) think you can review a package

August 22, 2017

Contributing to an open-source community without contributing code is an oft-vaunted idea that can seem nebulous. Luckily, putting vague ideas into action is one of the strengths of the rOpenSci Community, and their package onboarding system offers a chance to do just that. This was my first time reviewing a package, and, as with so many things in life, I...

Onboarding visdat, a tool for preliminary visualisation of whole dataframes

August 22, 2017

“Take a look at the data.” This is a phrase that comes up when you first get a dataset. It is also ambiguous. Does it mean to do some exploratory modelling? Or make some histograms, scatterplots, and boxplots? Is it both? Starting down either path, you often encounter the non-trivial growing pains of working with a new dataset. The mix ups of...
