Function Objects and Pipelines in R

February 3, 2019
By

Composing functions and sequencing operations are core programming concepts. Some notable realizations of sequencing or pipelining operations include:
- Unix’s |-pipe.
- CMS Pipelines.
- F#’s forward pipe operator |>.
- Haskell’s Data.Function & operator.
- The R magrittr forward pipe.
- Scikit-learn’s sklearn.pipeline.Pipeline.

The idea is: many important calculations can be considered as a sequence of transforms applied to a …
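As a minimal sketch of the idea of sequencing transforms (plain base R, not the approach or package used in the linked post), a pipeline can be built as a left-to-right composition of functions with Reduce(). The helper name compose_pipeline is hypothetical.

```r
# Hypothetical helper: build a pipeline that applies functions left to right
compose_pipeline <- function(...) {
  steps <- list(...)
  function(x) {
    # Reduce threads the running value through each step, in order;
    # with no steps it simply returns x unchanged
    Reduce(function(value, step) step(value), steps, x)
  }
}

# A small pipeline: center a vector, square it, then take the mean
center <- function(v) v - mean(v)
pipeline <- compose_pipeline(center, function(v) v^2, mean)

pipeline(c(1, 2, 3, 4))
# [1] 1.25
```

The input is centered at 2.5, squared to 2.25, 0.25, 0.25, 2.25, and averaged to 1.25. Keeping the steps in a list means the pipeline is itself a value that can be stored, passed around, or extended.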

Read more »

Retail Data Visualization with R and Shiny

February 3, 2019
By

Introduction: Because of my marketing background, finding information hidden within a marketing dataset is always an interesting topic to me. It gives me a sense of accomplishment when I clean up a very messy large dataset and finally discover some insights from it. Therefore, I've decided to practice my skills of data cleaning and

Read more »

Building Our Own Open Source Supercomputer with R and AWS

February 3, 2019
By

How to build a scalable computing cluster on AWS and run hundreds or thousands of models in a short amount of time. We will rely entirely on R and open source R packages. This is post 1 of 2. Introduction: An ever-increasing number of busines...

Read more »

Settling class action lawsuits with conjoint analysis and R (+a conjoint shiny app)


A few days ago I presented at the 9th Israeli class action lawsuit conference. You’re probably asking yourself what a data scientist would do in a room full of lawyers. Apparently, there is a lot to do… Here’s the story: being in market research, we get a lot of lawyers who are faced with class action lawsuits (either suing or...

Read more »

R Package Update: urlscan

February 3, 2019
By

The urlscan package (an interface to the urlscan.io API) is now at version 0.2.0 and supports urlscan.io’s authentication requirement when submitting a link for analysis. The service is handy if you want to learn about the details of a website, all the gory technical details. For instance, say you wanted to check on...

Read more »

Synthesising Multiple Linked Data Sets and Sequences in R

February 3, 2019
By

In my last post I looked at generating synthetic data sets with the ‘synthpop’ package, some of the challenges and … The post Synthesising Multiple Linked Data Sets and Sequences in R appeared first on Daniel Oehm | Gradient Descending.

Read more »

The power of tapping into your community for support

February 2, 2019
By

This week the owner of my favorite Mexican restaurant in Baltimore, Rosalyn Vera, got death and arson threats. I could have been a bystander, but I tapped into my network and asked for help, and she has received it. It’s been great to see the power of the community in action. The backstory: So, I use R and Bioconductor for work...

Read more »

Multiple Data (Time Series) Streams Clustering

February 2, 2019
By

Nowadays, data streams arise in many real-world scenarios. For example, they are generated by sensors, web traffic, satellites, and other interesting use cases. We have to process them quickly and extract as much knowledge from them as we can. Data s...

Read more »

Navigate through Decennial Census and American Community Survey

February 2, 2019
By

Finding the right content in census data can be daunting. To give you an idea of how complex the census data are, there are 1127 tables and 25070 columns of table contents in the 2012-2017 ACS 5-year summary file alone.

dataset | number of tables | number of columns
2010 decennial census summary file 1 | 333 | 8959
2012-2017 5-year ACS summary file | 1127 | 25070
2017 1-year ACS summary file | 1372 | 33593

The complexity does not...

Read more »

Bibliography with knitr: cite your references and packages

February 2, 2019
By

A tutorial on using your Zotero references with rmarkdown: easily add references and automatically generate your bibliography, including the packages used in your document. It’s good practice to cite the R packages you use in your analysis. However, it can be cumbersome to maintain the list of your package’s references in Zotero while the

Read more »

Homebrew 2.0.0 Released == homebrewanalytics package updated

February 2, 2019
By

A major new release of Homebrew has landed and now includes support for Linux as well as Windows (via the Windows Subsystem for Linux)! There are overall stability and speed improvements baked in as well. The aforelinked notification has all the info you need to see the minutiae. Unless you’ve been super-lax in updating, brew...

Read more »

Simulating the Six Nations 2019 Rugby Tournament in R

February 2, 2019
By

I really like running simulation models before sporting events because they can give you so much more depth of understanding compared to the ‘raw’ odds that you get from the media or bookmakers, etc. Yes, we might hear that a team has a “30% chance of winning a tournament”. But there might be another strong …
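To make the idea concrete, here is a minimal Monte Carlo sketch of a six-team round robin. It is not the post's model: the strength ratings are made up for illustration, each match is decided by a Bradley-Terry style win probability, and draws and bonus points are ignored. The function name simulate_tournament is hypothetical.

```r
set.seed(2019)

# Hypothetical strength ratings (illustrative only, not estimated from data)
strength <- c(England = 1.3, France = 1.0, Ireland = 1.5,
              Italy = 0.4, Scotland = 0.9, Wales = 1.3)
teams <- names(strength)
fixtures <- combn(teams, 2)  # round robin: each pair meets once, 15 matches

simulate_tournament <- function() {
  wins <- setNames(numeric(length(teams)), teams)
  for (k in seq_len(ncol(fixtures))) {
    a <- fixtures[1, k]
    b <- fixtures[2, k]
    # Bradley-Terry style probability that team a beats team b (no draws)
    p_a <- strength[[a]] / (strength[[a]] + strength[[b]])
    winner <- if (runif(1) < p_a) a else b
    wins[winner] <- wins[winner] + 1
  }
  # Simplification: ignore bonus points and break ties for first at random
  leaders <- teams[wins == max(wins)]
  leaders[sample.int(length(leaders), 1)]
}

champions <- replicate(10000, simulate_tournament())
title_odds <- prop.table(table(factor(champions, levels = teams)))
round(title_odds, 3)
```

Each entry of title_odds is the share of simulated tournaments a team won; the full set of simulated standings carries more information than that single headline percentage.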

Read more »

R Markdown: 3 sources of reproducibility issues and options how to tackle them

February 2, 2019
By

Introduction: R Markdown is a great tool for creating reports, presentations and even websites that contain evaluated and rendered code. This can help us immensely when presenting data science work to audiences, while still being able to version control the content creation process. One of the challenges that remains is the reproducibility of the rendered results. In this...

Read more »

Kalman Filter: Modelling Time Series Shocks with KFAS in R

February 1, 2019
By

When it comes to time series forecasts, conventional models such as ARIMA are often a popular option. While these models can prove to have high degrees of accuracy, they have one major shortcoming: they do not typically account for “shocks”, or sudden changes in a time series. Let’s see how we can potentially alleviate …
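The post works with KFAS. As a package-free sketch of what a Kalman filter computes, the code below filters the simplest state-space model, a local level (random walk plus noise) model, in base R. The noise variances H and Q are assumed known here and the data are simulated; the function name local_level_filter is hypothetical. A sudden level shift shows up as an unusually large innovation (one-step-ahead prediction error).

```r
# Kalman filter for a local level model, with the noise variances H and Q
# treated as known (an assumption; in practice they are estimated):
#   y[t]    = mu[t] + eps[t],   eps[t] ~ N(0, H)
#   mu[t+1] = mu[t] + eta[t],   eta[t] ~ N(0, Q)
local_level_filter <- function(y, H, Q, a1 = y[1], P1 = 1e7) {
  n <- length(y)
  level <- numeric(n)       # filtered estimate of mu[t] given y[1..t]
  innovation <- numeric(n)  # one-step-ahead prediction error
  a <- a1                   # predicted mean of mu[t]
  P <- P1                   # predicted variance of mu[t] (large = vague prior)
  for (t in seq_len(n)) {
    v <- y[t] - a           # innovation
    S <- P + H              # innovation variance
    K <- P / S              # Kalman gain
    a <- a + K * v          # update with the new observation
    P <- P * (1 - K)
    level[t] <- a
    innovation[t] <- v
    P <- P + Q              # predict: random-walk level keeps its mean
  }
  list(level = level, innovation = innovation)
}

# Illustrative data: the level jumps from 10 to 18 at t = 51
set.seed(1)
y <- c(rnorm(50, mean = 10), rnorm(50, mean = 18))
fit <- local_level_filter(y, H = 1, Q = 0.1)
which.max(abs(fit$innovation))  # largest surprise, expected at or just after t = 51
```

Because the gain K settles well below 1, the filtered level adapts to the shift over several steps, which is exactly why a shock stands out in the innovations.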

Read more »

Setting up your blog with RStudio and blogdown III: modify your theme

February 1, 2019
By

This is Part III of my series of posts about how to set up your blog with RStudio and blogdown. The series consists of:
- Part I: how to set up the blog using Hugo, RStudio and blogdown.
- Part II: my workflow for creating new posts.
- Part III (this one): how to modify the theme.

In this post I am going to...

Read more »

Setting up your blog with RStudio and blogdown II: Workflow

February 1, 2019
By

Workflow: In Part I of this series of posts we set up our new blog using blogdown and Hugo. Once the blog is configured, this is the typical workflow I follow to write new posts and update my blog online:
- Open your blog project with RStudio.
- Load the blogdown library and start the Hugo server and browser:
    library(blogdown)
    blogdown::serve_site()
- Create a new post: the best way is...

Read more »

Setting up our blog with RStudio and blogdown I: Creating the blog

February 1, 2019
By

Last month I migrated my blog from WordPress to Hugo and blogdown. Now I can post from RStudio using R/markdown, which allows me to create interactive posts including R code. It has been such a good experience that I decided to write down how to do it in three posts: this one, about setting up our blog; Part II explains my...

Read more »

Tutorial: Sequential Pattern Mining in R for Business Recommendations

February 1, 2019
By

by Allison Koenecke, Data Scientist, AI & Research Group at Microsoft, with acknowledgements to Amita Gajewar and John-Mark Agosta. In this tutorial, Allison Koenecke demonstrates how Microsoft could recommend to customers the next set of services they should acquire as they expand their use of the Azure Cloud, by using a temporal extension to conventional Market Basket Analysis. Problem...

Read more »

Mandalaxies

February 1, 2019
By

One cannot escape the feeling that these mathematical formulas have an independent existence and an intelligence of their own, that they are wiser than we are, wiser even than their discoverers. (Heinrich Hertz)

I love spending my time doing mathematics: transforming formulas into drawings, experimenting with paradoxes, learning new techniques … and R is a perfect …

Read more »

Real Net Profit: 150% in just 4 Months

February 1, 2019
By

Developing a post-commission profitable currency trading model using Pivot Billions and R. Needle, meet haystack. Searching for the right combination of features to make a consistent trading model can be quite difficult and takes many, many iterations. By incorporating Pivot Billions and R into my research process, I was able to dramatically improve the …

Read more »

dqrng v0.0.5: New and updated RNGs

February 1, 2019
By

A new version of dqrng has made it onto the CRAN servers after a brief hiccup. Thanks to the CRAN team in general and Uwe Ligges in particular for their relentless efforts. This version adds a new RNG to be used together with the provided distribution functions: the 64 bit version of the 20 rounds

Read more »

recogeo: A new R package to reconcile changing geographies boundaries (and corresponding variables)

February 1, 2019
By

Demographic information is usually reported in relation to precise boundaries: administrative, electoral, statistical, etc. Comparing demographic information reported at different points in time is often problematic because boundaries keep changing. The recogeo package facilitates reconciling boundaries and their data through a spatial analysis of the boundaries of two different periods. In this post, I explain

Read more »

Quantile regression in R

January 31, 2019
By

Quantile regression: what is it? Consider some response variable of interest and a vector of features or predictors that we want to use to model the response. In linear regression, we are trying to estimate the conditional …
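For reference, here is the standard textbook formulation (not taken from the post, and in notation chosen here: responses y_i, predictor vectors x_i, quantile level τ). Where least squares estimates the conditional mean, linear quantile regression estimates the conditional τ-quantile by minimizing the check (pinball) loss:

```latex
\rho_\tau(u) = u\,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr), \qquad
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - x_i^\top \beta\bigr)
```

With τ = 0.5 the loss reduces to ρ(u) = |u|/2, so median regression is least absolute deviations regression.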

Read more »

rOpenSci Software Peer Review: Still Improving


rOpenSci’s suite of packages is made up of contributions from staff engineers and the wider R community, bringing considerable diversity of skills, expertise and experience to bear on the suite. How do we ensure that every package is held to a high standard? That’s where our software review system comes into play: packages contributed by the community undergo a transparent,...

Read more »

How GPL makes me leave R for Python :-(

January 31, 2019
By

As a data scientist at a startup, I can program in several languages, but R is often a natural choice. Recently I wanted my company to build a product based on R. It simply seemed like a perfect fit. But this turned out to be a slippery slope into the field of open-source code licensing, which …

Read more »

Using Data Science to read 10 years of Luxembourgish newspapers from the 19th century


I have been playing around with historical newspaper data (see here and here). I have extracted the data from the largest archive available, as described in the previous blog post, and have now created a Shiny dashboard where it is possible to visualize the most common words per article, as well as read a summary of each article. The summary was made using a method called...

Read more »

Announcing new software peer review editors: Melina Vidoni and Brooke Anderson


We are pleased to welcome Brooke Anderson and Melina Vidoni to our team of Associate Editors for rOpenSci Software Peer Review. They join Scott Chamberlain, Anna Krystalli, Lincoln Mullen, Karthik Ram, Noam Ross and Maëlle Salmon. With the addition of Brooke and Melina, our editorial board now includes four women and four men, located in North America, South America...

Read more »

Book review: Beyond Spreadsheets with R

January 30, 2019
By

Disclaimer: Manning Publications gave me the ebook version of Beyond Spreadsheets with R - A beginner’s guide to R and RStudio by Dr. Jonathan Carroll free of charge. Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You’ll build on simple programming techniques like loops and conditionals to...

Read more »

missing digit in a 114 digit number [a Riddler’s riddle]

January 30, 2019
By

A puzzling riddle from The Riddler (as Le Monde had a painful geometry riddle this week): this number with 114 digits 530,131,801,762,787,739,802,889,792,754,109,70?,139,358,547,710,066,257,652,050,346,294,484,433,323,974,747,960,297,803,292,989,236,183,040,000,000,000 is missing one digit and is a product of some of the integers between 2 and 99. By comparison, 76! and 77! have 112 and 114 digits, respectively, while 99! has 156 digits.
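A brute-force sketch, not the post's own solution: substitute each digit 0 to 9 for the “?” and keep the candidates that factor completely over the primes up to 97, since a product of integers between 2 and 99 can have no larger prime factor. This is only a necessary condition. Base R has no arbitrary-precision integers, so the hypothetical helpers div_small and is_smooth below do schoolbook long division on a vector of decimal digits.

```r
# Primes up to 97: any product of integers in 2..99 factors over these only
primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
            43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)

# Hypothetical helper: long division of a big number, given as a vector of
# decimal digits (most significant first), by a small integer p
div_small <- function(digits, p) {
  q <- numeric(length(digits))
  r <- 0
  for (i in seq_along(digits)) {
    cur <- r * 10 + digits[i]
    q[i] <- cur %/% p
    r <- cur %% p
  }
  nz <- which(q != 0)  # drop leading zeros of the quotient
  list(q = if (length(nz)) q[nz[1]:length(q)] else 0, r = r)
}

# Hypothetical helper: TRUE if the number has no prime factor above 97
is_smooth <- function(digits, primes) {
  if (all(digits == 0)) return(FALSE)  # zero is not such a product
  for (p in primes) {
    repeat {
      d <- div_small(digits, p)
      if (d$r != 0) break              # p no longer divides the cofactor
      digits <- d$q
    }
  }
  length(digits) == 1 && digits == 1   # fully factored over the primes
}

masked <- paste0(
  "530,131,801,762,787,739,802,889,792,754,109,70?,",
  "139,358,547,710,066,257,652,050,346,294,484,433,",
  "323,974,747,960,297,803,292,989,236,183,040,000,000,000"
)

# Try every digit in place of "?" and keep the smooth candidates
hits <- Filter(function(d) {
  s <- gsub(",", "", sub("?", as.character(d), masked, fixed = TRUE))
  is_smooth(as.integer(strsplit(s, "")[[1]]), primes)
}, 0:9)
hits
```

Trial division is cheap here because there are only 25 primes and ten candidates; an arbitrary-precision package would make the helpers unnecessary.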

Read more »
