Articles by David Smith

Using checkpoint with knitr and RStudio

April 25, 2017 | David Smith

The knitr package by Yihui Xie is a wonderful tool for reproducible data science. I especially like using it with R Markdown documents, where, with some simple markup in an easy-to-read document, I can combine R code and narrative text to generate an attractive document with words, tables and ... [Read more...]

R 3.4.0 now available

April 24, 2017 | David Smith

R 3.4.0, the latest release of the R programming language (codename: "You Stupid Darkness"), is now available. This is the annual major update to the R language engine, and provides improved performance for R programs. The source code was released by the R Core Team on Friday and binaries for Windows, ... [Read more...]

Reproducible Data Science with R

April 21, 2017 | David Smith

Yesterday, I had the honour of presenting at The Data Science Conference in Chicago. My topic was Reproducible Data Science with R, and while the specific practices in the talk are aimed at R users, my intent was to make a general argument for doing data science within a reproducible ... [Read more...]

SQL Server 2017 to add Python support

April 20, 2017 | David Smith

One of the major announcements from yesterday's Data Amp event was that SQL Server 2017 will add Python as a supported language. Just as with the continued R support, SQL Server 2017 will allow you to process data in the database using any Python function or package without needing to export the ... [Read more...]

Microsoft R Server 9.1 now available

April 19, 2017 | David Smith

During today's Data Amp online event, Joseph Sirosh announced the new Microsoft R Server 9.1, which is available for customers now. In addition, the updated Microsoft R Client, which has the same capabilities for local use, is available free for everyone on both Windows and (new to this update) Linux. This ... [Read more...]

Free AI Workshop, May 9 in Seattle

April 17, 2017 | David Smith

There will be a free AI workshop in Seattle on May 9, presented by members of the Microsoft Data Science team. The AI Immersion Workshop includes five specializations to choose from (in parallel tracks), all focused on an aspect of developing and deploying intelligent applications: Applied Machine Learning for Developers, featuring Microsoft ... [Read more...]

Data Amp: a major on-line Microsoft event, April 19

April 12, 2017 | David Smith

This coming Wednesday, April 19 at 8AM Pacific Time (click for your local time), Microsoft will be hosting a major on-line event of interest to anyone working with big data, analytics, and artificial intelligence: Microsoft Data Amp. During Data Amp, Executive Vice President Scott Guthrie and Corporate Vice President Joseph Sirosh ... [Read more...]

Prepare real-world data for analysis with the vtreat package

April 11, 2017 | David Smith

As anyone who's tried to analyze real-world data knows, there are any number of problems that may be lurking in the data that can prevent you from being able to fit a useful predictive model: Categorical variables can include infrequently-used levels, which will cause problems if sampling leaves them unrepresented ... [Read more...]

In case you missed it: March 2017 roundup

April 10, 2017 | David Smith

In case you missed them, here are some articles from March of particular interest to R users. A tutorial and comparison of the SparkR, sparklyr, rsparkling, and RevoScaleR packages for using R with Spark. An analysis of Scrabble games between AI players. The doAzureParallel package, a backend to "foreach" for ... [Read more...]

The faces of R, analyzed with R

April 7, 2017 | David Smith

Maëlle Salmon recently created a collage of profile pictures of people who use the #rstats hashtag in their Twitter bio to indicate their use of R. (I've included a detail below; click to see the complete version at Maëlle's blog.) Naturally, Maëlle created the collage using R ... [Read more...]

Microsoft R Open 3.3.3 now available

April 6, 2017 | David Smith

Microsoft R Open (MRO), Microsoft's enhanced distribution of open source R, has been upgraded to version 3.3.3, and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to R 3.3.3, upgrades the installer, and updates the bundled packages. R 3.3.3 makes just a few minor ... [Read more...]

The Most Popular Languages for Data Scientists/Engineers

April 3, 2017 | David Smith

The results of the 2017 StackOverflow Survey of nearly 65,000 developers were published recently, and include lots of interesting insights about their work, lives and preferences. The results include a cross-tabulation of the most popular languages amongst the "Data Scientist/Engineer" subset, and the results were ... well, surprising: When thinking about data ... [Read more...]

Tutorial: Using R for Scalable Data Analytics

March 31, 2017 | David Smith

At the recent Strata conference in San Jose, several members of the Microsoft Data Science team presented the tutorial Using R for Scalable Data Analytics: Single Machines to Spark Clusters. The materials are all available online, including the presentation slides and hands-on R scripts. You can follow along with the ... [Read more...]

Learning Scrabble strategy from robots, using R

March 30, 2017 | David Smith

While you might think of Scrabble as that game you play with your grandparents on a rainy Sunday, some people take it very seriously. There's an international competition devoted to Scrabble, and no end of guides and strategies for competitive play. James Curley, a psychology professor at Columbia University, has ... [Read more...]

Data science languages score highly in RedMonk rankings

March 27, 2017 | David Smith

RedMonk have once again updated (a little later than usual) their twice-yearly programming language report with their January 2017 rankings. If you haven't come across these rankings before, they are based on GitHub contributions and StackOverflow questions related to around 40 commonly-used programming languages. The raw data (as of January 2017) is shown ... [Read more...]

Comparing subreddits, with Latent Semantic Analysis in R

March 24, 2017 | David Smith

FiveThirtyEight published a fascinating article this week about the subreddits that provided support to Donald Trump during his campaign, and continue to do so today. Reddit, for those not in the know, is a popular online social community organized into thousands of discussion topics, called subreddits (the names all begin ... [Read more...]
