Extracting and Enriching Ocean Biogeographic Information System (OBIS) Data with R

January 25, 2017

Programmatic access to biodiversity data is revolutionising large-scale, reproducible biodiversity research. In the marine realm, the largest global database of species occurrence records is the Ocean Biogeographic Information System, OBIS. As of January 2017, OBIS contains 47.78 million occurrences of 117,345 species, all openly available and accessible via the OBIS API. The number of questions to address...

Read more »

Modelling extremes using generalized additive models

January 25, 2017

Quite some years ago, whilst working on the EU Sixth Framework project Euro-limpacs, I organized a workshop on statistical methods for analyzing time series data. One of the sessions was on the analysis of extremes, ably given by Paul Northrop (UCL Department of Statistical Science). That intro certainly whetted my appetite but I never quite found the time to...

Read more »

A Glimpse into The Daily Life of a Data Scientist

January 24, 2017

A couple of weeks ago, I had a discussion with a co-worker regarding a project I was involved in, and I felt that there was no clear understanding of the daily challenges data scientists face. A few days later, I was at rstudio::conf 2017, where I met lots of data scientists from academia and industry. Later on, I described one of...

Read more »

a typo that went under the radar

January 24, 2017

A chance occurrence on X validated: a question on an incomprehensible formula for Bayesian model choice, which, most unfortunately, appeared in Bayesian Essentials with R! Eeech! It looks like one line in our LaTeX file got erased and the likelihood part in the denominator vanished altogether. Apologies to all readers confused by this nonsensical formula!

Read more »

Building a machine learning model with the MicrosoftML package

January 24, 2017

Microsoft R Server 9 includes a new R package for machine learning: MicrosoftML. (So do the Data Science Virtual Machine and the free Microsoft R Client edition, incidentally.) This package includes a suite of fast predictive modeling functions implemented by Microsoft Research, including: Linear (rxFastLinear) and logistic (rxLogisticRegression) model functions based on the Stochastic Dual Coordinate Ascent method; Classification/regression...

Read more »

“smooth” package for R. es() function. Part IV. Model selection and combination of forecasts

January 24, 2017

Mixed models. In the previous posts we have discussed pure additive and pure multiplicative exponential smoothing models. The next logical step would be to discuss mixed models, where some components are additive and others are multiplicative. But we won’t spend much time on them because I personally think that they do not make...

Read more »

Descriptive Analysis of MLST Data for MRSA

January 24, 2017

During one of my summers, I had the opportunity to conduct some research on the prevalence of methicillin-resistant Staphylococcus aureus (MRSA) in vulnerable populations and to examine US emergency department data. I thought this would be an interesting topic to expand on for my thesis in light of the increasing concerns about antimicrobial resistance, … Continue...

Read more »

Building Shiny App Exercises (part 5)

January 24, 2017

RENDER FUNCTIONS. In the fourth part of our series we just “scratched the surface” of reactivity by analyzing some of the properties of the renderTable function. Now it is time to go deeper and learn how to use the rest of the render functions that shiny provides. As you were told in part 4 these...

Read more »

Distribution of Mean of the Combinations of a Set.

January 24, 2017

I recently found myself generating and analyzing the averages of the combinations of a set, and when I generated the corresponding histogram I was surprised by its shape. It should be remembered that the combinations C(m, n) of a set are the number of ...
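The experiment described in this excerpt can be sketched in base R. This is only an illustration under stated assumptions: the set (the integers 1 to 20) and the subset size (5) are hypothetical choices, since the excerpt does not show the values used in the original post.

```r
# Hypothetical set and subset size; the original post's values are not shown.
x <- 1:20
k <- 5

# combn() enumerates every k-element subset of x and applies mean() to each.
subset_means <- combn(x, k, FUN = mean)

# Number of subsets evaluated, i.e. choose(20, 5).
length(subset_means)

# Histogram of the subset means.
hist(subset_means, breaks = 30,
     main = "Means of all 5-element subsets of 1:20",
     xlab = "Subset mean")
```

Because every element appears in the same number of subsets, the average of all subset means equals the mean of the full set, which is a quick sanity check on the enumeration.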

Read more »

xml2 1.1.1

January 24, 2017

Today we are pleased to release version 1.1.1 of xml2. xml2 makes it easy to read, create, and modify XML with R. You can install it with: install.packages("xml2"). As well as fixing many bugs, this release: makes it easier to create and modify XML; improves roundtrip support between XML and lists; adds support for XML...

Read more »

Creating a “balloon plot” as alternative to a heat map with ggplot2

January 24, 2017

Heat maps are great to compare observations with lots of variables (which must be comparable in terms of unit, domain, …

Read more »

sparklyr 0.5

January 24, 2017

We’re happy to announce that version 0.5 of the sparklyr package is now available on CRAN. The new version comes with many improvements over the first release, including: Extended dplyr support by implementing: do() and n_distinct(). New functions including sdf_quantile(), ft_tokenizer() and ft_regex_tokenizer(). Improved compatibility, sparklyr now respects the value of the ‘na.action’ R option and dim(), nrow() and ncol(). Experimental

Read more »

Euler Problem 9 : Special Pythagorean Triple

January 24, 2017

Solution to Euler Problem 9 in the R language: find the Pythagorean triple for which a + b + c equals 1000. The post Euler Problem 9 : Special Pythagorean Triple appeared first on The Devil is in the Data.
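For readers who want to see the problem worked through, here is a minimal brute-force sketch in base R. It is not the linked post's solution, which is not shown in this excerpt; the variable names are illustrative.

```r
# Search for integers a < b < hyp with a + b + hyp = 1000 and a^2 + b^2 = hyp^2.
target <- 1000
triple <- NULL

for (a in 1:(target %/% 3)) {          # a is the smallest side, so a < target / 3
  for (b in (a + 1):(target %/% 2)) {  # b must stay below half the perimeter
    hyp <- target - a - b              # the third side is fixed by the sum
    if (hyp > b && a^2 + b^2 == hyp^2) {
      triple <- c(a, b, hyp)
    }
  }
}

print(triple)        # 200 375 425
print(prod(triple))  # product a * b * hyp requested by the original problem
```

The squares involved are at most 10^6, so the equality test on doubles is exact here.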

Read more »

How to do an analysis in R (part 2, visualization and analysis)

January 24, 2017

In several recent blog posts, I've emphasized the importance of data analysis. My main point has been that if you want to learn data science, you need to learn data analysis. Data analysis is the foundation of practical data science. With that statement in mind, I want to show you step-by-step what an analysis looks like in R ... The post...

Read more »

How to use viridis colors with plotly and leaflet?

“… avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.” - Envisioning Information, Edward Tufte, Graphics Press, 1990 Choosing colors for your plot is not so simple. Why is that so? First of all, it depends on numerous things… What plot are you creating? What is...

Read more »

Parallel Computation with R and XGBoost

January 23, 2017

XGBoost is a comprehensive machine learning library for gradient boosting. It originated in the Kaggle community for online machine learning challenges, and has since been maintained through the collaborative efforts of developers in the community. It is well known for its accuracy, efficiency and flexibility for various interfaces: the computational module is implemented in C++,

Read more »

French villages and a sort of resolution

January 23, 2017

Sort of introduction to this post and hopefully the next ones. I usually don’t have any New Year resolution. However, recent tweets about productivity – from people I actually find productive and inspiring – made me ponder a bit on my unfinished...

Read more »

Upcoming R Conferences

January 23, 2017

Since a few new events have been announced recently, I thought I'd give a run-down on some major R conferences coming up in the next six months. February 18: satRdays, Cape Town (South Africa). This is the second in a series of one-day conferences inspired by an R Consortium proposal. The first event in Budapest was a great success,...

Read more »

Principal Component Analysis in R

January 23, 2017

Principal component analysis (PCA) is routinely employed on a wide range of problems. From the detection of outliers to predictive modeling, PCA can project the observations described by many variables onto a few orthogonal components defined along the directions in which the data ‘stretch’ the most, rendering a simplified overview. PCA is particularly powerful in dealing with multicollinearity and variables that … Continue...
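A minimal sketch of the idea in base R, assuming the built-in USArrests data set as a stand-in (the data used in the original post are not shown in this excerpt). prcomp() from the stats package centers and scales the variables and returns the orthogonal components.

```r
# USArrests ships with base R; it stands in for the post's (unseen) data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each component, largest first.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# The loading vectors are orthonormal: this cross-product is the identity matrix.
orthogonality_check <- crossprod(pca$rotation)

# Scores on the first two components give the simplified overview.
head(pca$x[, 1:2])
```

Scaling is used here because the USArrests variables are measured in different units; whether to scale is a choice that depends on the data at hand.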

Read more »

Choosing Software to Publish your Data Science Portfolio

January 23, 2017

I’ve recently spoken to several people who have decided to create a portfolio of their data science projects and are new to online publishing. They frequently have... The post Choosing Software to Publish your Data Science Portfolio appeared first on AriLamstein.com.

Read more »

2016, the Earthquake Annus Horribilis of Italy

January 23, 2017

There is no exaggeration in stating that historic heritage is one of the most outstanding and valuable assets of Italy. The smallest villages or the largest cities, all boast hundred- (sometimes thousand-) year old buildings of great cultural, architectural, or artistic interest. Amazingly enough, a vast majority of these ...

Read more »

Where Cohen went wrong – the proportion of overlap between two normal distributions

January 23, 2017

I've received many emails regarding the percent of overlap reported in my Cohen's d visualization. Observant readers have noted that I report a different number than Cohen (and other authors). For instance, if we open p. 22 in Cohen's Statistical Power Analysis for the Behavioral Sciences, we see that Cohen writes that d = 0.5 means a 33...
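The discrepancy can be illustrated with a short sketch in base R. This is an assumption about which quantities are being compared, since the excerpt is cut off: it contrasts the overlapping coefficient of two unit-variance normal densities, 2Φ(−|d|/2), with Cohen's U1 non-overlap measure, (2Φ(|d|/2) − 1)/Φ(|d|/2). The function names are illustrative.

```r
# Overlapping coefficient of two unit-variance normals whose means differ by d:
# the shared area under both density curves.
overlap_coef <- function(d) 2 * pnorm(-abs(d) / 2)

# Cohen's U1, the proportion of non-overlap in the combined area of the
# two distributions.
cohen_u1 <- function(d) {
  p <- pnorm(abs(d) / 2)
  (2 * p - 1) / p
}

d <- 0.5
round(overlap_coef(d), 3)  # about 0.803
round(cohen_u1(d), 3)      # about 0.33
```

Under these definitions, d = 0.5 gives roughly 33% non-overlap by U1 but about 80% shared area, so the two figures are not complements of each other.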

Read more »

Releasing RQGIS 0.2.0

January 23, 2017

Today we are happy to announce a new version of RQGIS! RQGIS establishes an interface between R and QGIS, i.e. it allows the user to access the more than 1000 QGIS geoalgorithms from within R.

Read more »

Trumpworld Analysis : Ownership Relations in his Business Network

January 23, 2017

Analysis of the ownership relationships between organisations associated with Donald J. Trump. A social network analysis of Trumpland using the igraph package in R. The post Trumpworld Analysis : Ownership Relations in his Business Network appeared first on The Devil is in the Data.

Read more »

Detect Lines in Digital Images

January 23, 2017

As part of our data science training initiative, bnosac is also providing a course on computer vision with R & Python, which will be held on March 9-10 in Leuven, Belgium (subscribe here or have a look at our full training offer here). Part of the course covers finding blobs, corners, gradients, edges & lines in images. For...

Read more »

readr::problems() returns tidy data!

January 23, 2017

A handy little trick I picked up today when using readr. Some background: I needed a mapping between ZIP Code Tabulation Areas and counties (to link to some urban/rural data). The Census Bureau provides a CSV style table that includes information about each of the ZCTA (e.g.,...

Read more »

Monotonic Binning with Smbinning Package

January 22, 2017

The R package smbinning (http://www.scoringmodeling.com/rpackage/smbinning) provides a very user-friendly interface for the WoE (Weight of Evidence) binning algorithm employed in scorecard development. However, there are several improvement opportunities in my view: 1. First of all, the underlying algorithm in the smbinning() function uses recursive partitioning, which does not necessarily guarantee monotonicity. 2.

Read more »

Applying diffusion theory to Google Trends

January 22, 2017

Using the example of Candy Crush Saga adoption.

Read more »

Interactive BMI Chart

January 22, 2017

I was recently listening to the #WhoIsFat Joe Rogan podcast where comedians Bert Kreischer and Tom Segura had their weight loss challenge weigh-ins. The challenge was for both guys to get out of the “obese” category and into the merely “overweight” category. If one made it and the other didn’t, the loser would pay for a trip to Paris...

Read more »
