The Open Source R Programming Language is Becoming Pervasive

August 8, 2014

So says CIO.com, in a recent article, "11 Market Trends in Advanced Analytics." R, an open source programming language for computational statistics, visualization and data, is becoming a ubiquitous tool in advanced analytics offerings. Kirsch says nearly every top vendor of advanced analytics has integrated R into their offering so that they can now import R models. This...

Community conversations and a new package for full text

August 8, 2014

Community: Community is at the heart of rOpenSci. We couldn't have accomplished most of our work without help from various contributors and users. Most of our discussions with the broader community over the past year have been through Twitter or one-on-one conversations. However, we would like to foster more open-ended and deeper discussions with our community. To this end,...

San Leandro and Hayward Housing Prices

I’ve done a previous post about the salaries of data scientists, but now I’m going to look at one of the negative sides of all the high salaries generated by the tech field in the Bay Area – real estate prices. A …

Vtreat: designing a package for variable treatment

August 7, 2014

When you apply machine learning algorithms on a regular basis, on a wide variety of data sets, you find that certain data issues come up again and again: missing values (NA or blanks); problematic numerical values (Inf, NaN, sentinel values like 999999999 or -1); valid categorical levels that don’t appear in the training data (especially...

A Simple Shiny App for Monitoring Trading Strategies – Part II

August 7, 2014

This is a follow-up on my previous post “A Simple Shiny App for Monitoring Trading Strategies”. I added a few improvements that make the app a bit better (at least for me!). Below is the list of new features: a sample .csv file (the one that contains the raw data); an “EndDate” drop...

Incidental R

August 7, 2014

By Joseph Rickert. Last week, I posted a list of sessions at the Joint Statistical Meetings related to R. As it turned out, that list was only the tip of the iceberg. In some areas of statistics, such as graphics, simulation and computational statistics, the use of R is so prevalent that people working in the field often don't...

Rcpp now used by 250 CRAN packages

August 7, 2014

Rcpp reached a nice round milestone yesterday: 250 packages on CRAN now depend on it ...

why clusterProfiler fails

August 6, 2014

Recently, some comments said that clusterProfiler sometimes fails in KEGG enrichment analysis. kaji331 compared clusterProfiler with GeneAnswers and found that clusterProfiler gives larger p-values. This result prompted me to test it.

John Chambers useR! 2014 Keynote

August 6, 2014

At useR! 2014, John Chambers was generous enough to provide us with insight into the...

Making Maps with a Punchline

August 6, 2014

I’ve had a lifelong fascination with maps, and working with R definitely enables my map...

Data science goes to college with DataFest

August 6, 2014

Below is the first of several exciting data science developments for the younger generation, happening...

A New Use for Pipes in R: Forkbombs

August 6, 2014

Almost 3 years ago, I wrote about how to forkbomb with R. A quick recap is that a forkbomb is a low-tier, malicious misuse of a system; sort of a "baby's first denial of service". The idea is that you write a program that will start an entirely new copy of itself each time it is executed. Executing it...

In case you missed it: July 2014 Roundup

August 6, 2014

In case you missed them, here are some articles from July of particular interest to R users: The deadline for our contest to visualize the location of R user groups has been extended to August 16. Previews of R-related sessions at this year's JSM conference in Boston. Coding errors in R graphics scripts serendipitously create some interesting art. Another...

Predicting Monthly Car Sales for Brands in US: First Step

August 6, 2014

I've set out to produce monthly forecasts of car sales by brand in the US. So far I've made a SUTSE dynamic linear model (code on GitHub) and created a Shiny app (http://sweiss.shinyapps.io/carvis/) as a prototype (no predictions...

NCEAS Codefest

August 6, 2014

We're delighted to be sponsoring the upcoming Open Science Codefest in Santa Barbara, California, alongside RENCI, NCEAS, NSF, DataONE, and Mozilla Science Lab. The Open Science Codefest's goal is to gather researchers from across ecology, biodiversity science, and other earth and environmental sciences with programmer types to collaborate on coding projects. The ideas...

Results of the Readers’ Survey

August 5, 2014

First of all, let me say “Thank You” to all of the 357 people who completed the survey. I was hoping for 100, so needless to say the response blew away my expectations. This endeavor seems like a worthwhile effort to do once a year. Next year I will refine the...

New freqparcoord Example

August 5, 2014

In my JSM talk this morning, I spoke about work done by Yingkang Xie and myself on a novel approach to the parallel coordinates method of visualization. I’ve made several posts to this blog in the past on freqparcoord, our implementation of our method. My talk this morning used some recently available NYC taxi data. You...

When life gives you coloured cells, make categories

August 5, 2014

Let’s start by making one thing clear. Using coloured cells in Excel to encode different categories of data is wrong. Next time colleagues explain excitedly how “green equals normal and red = tumour”, you must explain that (1) they have sinned and (2) what they meant to do was add a column containing the words

Simpler R coding with pipes > the present and future of the magrittr package

August 5, 2014

This is a guest post by Stefan Milton, the author of the magrittr package, which introduces the %>% operator to R programming. Preface (by Tal Galili): I was first introduced to the %>% (a.k.a. pipe) operator in R thanks to Hadley Wickham’s (fascinating) dplyr tutorial (link to the workshop’s material) at useR! 2014. After several discussions during the conference (including one very...

Clarifying difference between Ratio and Interval Scale of Measurement

August 5, 2014

Introduction: Recently, while preparing a lecture on scales of measurement and types of statistical data, I came across two scales of measurement used when numbers denote a quantitative variable. ...

GRAN and switchr can’t send you back in time, but they can send R (sort of)

August 5, 2014

Using package repositories to recreate the past, distribute the present, and protect against the future. By Gabriel Becker (@groundwalkergmb), Bioinformatics and Computational Biology, Genentech Research and Early Development. 1. Have you ever needed to reach into the distant past … to recreate a years-old result? Take - as an arbitrary example - Anders and Huber's paper on Differential...

Parameterized SQL queries

August 5, 2014

Mateusz Żółtak asked me to spread the word about his new R package for parameterized SQL queries. Below you can find a copy of the package vignette. If you work with SQL in R you may find it useful. Mateusz Żółtak: The package RODBCext is an extension of the RODBC database connectivity package. It provides support...

ESA 2014: Don’t Know Much About History…

August 5, 2014

After my last post text-mining ESA Annual Meeting abstracts, Nash Turley was interested in the presence of the term “natural history” in ESA abstracts. I decided to collect a little more data by including programs back to 2010, giving a five-year data set. Thankfully the program back to 2010 remains in mostly the...

Rotated axis labels in R plots

August 5, 2014

It's somehow amazing to me that slanted or rotated axis labels are not an option within the basic plot() or axis() functions in R. The advantage is mainly in saving plot area space when long labels are needed (rather than as a means...

Social Media Mining and Bioinformatics (with R)

August 5, 2014

In June and July, I received copies of two books: Social Media Mining with R, by Nathan Danneman and Richard Heimann, and Bioinformatics with R Cookbook, by Paurush Praveen Sinha. For the first one, two recent interesting books deal with the same topic. Reza Zafarani, Mohammad Ali Abbasi and Huan Liu published last year Social Media Mining: An Introduction. Actually, the book can...

Thanks to R Markdown: Perhaps Word is an option after all?

August 5, 2014

In many cases Word is still the preferred file format for collaboration in the office. Yet, it is often a challenge to work with it, not so much because of the software, but how it is used and abused. Thanks to Markdown it is no longer painful to inclu...

BH release 1.54.0-3

August 4, 2014

A new release of our BH package providing Boost headers for use by R is now on the CRAN mirrors. This release is the third based on Boost 1.54.0. At the request of the maintainer of the recently added RcppMLPACK package, it adds the Boost.Heap l...

Parsing Domain Names in R with tldextract

August 4, 2014

The R Language is really good at data and statistical analysis, but when it comes to working with information security data it has a few holes that need plugging up. Bob has been doing a couple of posts using Rcpp to do things like Basic DNS Lookups, TXT lookups, and IPv4 Conversions. I wanted to add to some of that work with a quick package...

Introducing the Shiny App DThiring

August 4, 2014

Well, it has been a long time since I have written anything on this blog. I am long overdue. I've been terribly busy learning new things and getting on with life. One of the things I have learned is building R applications using Shin...
