3 new R jobs (from R-users.com; 2015-08-17)
[Read more...]
Slides of 10+ excellent tutorials at KDD 2015: Spark, graph mining and many more
by Yanchang Zhao, RDataMining.com. I attended the KDD 2015 conference in Sydney last week. There were more than 10 tutorials at the conference, and I went to two of them: 1) Graph-Based User Behavior Modeling: From Prediction to … [Read more...]
Constructing a network of politicians from newspaper data
The following is a guest post by Jana Blahak and Jan Dix (University of Konstanz), with support from Simon Munzert.
In the last post, we introduced the rzeit package, an R binding to the Content API at ZEIT Online. This time, we give a little demonstration of what can be ... [Read more...]
RSiteCatalyst Version 1.4.5 Release Notes
It’s only been a month since the last RSiteCatalyst update, and this update is also a pretty minor update in terms of functionality.
Set Your Own Endpoint
For the overseas users (or companies with weird setups), you can now use the endpoint argum... [Read more...]
Some reflections on teaching frequentist statistics at ESSLLI 2015
I spent the last two weeks teaching frequentist and Bayesian statistics at the European Summer School in Logic, Language, and Information (ESSLLI) in Barcelona, at the beautiful and centrally located Pompeu Fabra University. The course web page for the first week is here, and the web page for the second ... [Read more...]
R, Python, and SAS: Getting Started with Linear Regression
Consider the linear regression model, $$ y_i=f_i(\boldsymbol{x}|\boldsymbol{\beta})+\varepsilon_i, $$ where $y_i$ is the response or the dependent variable at the $i$th case, $i=1,\cdots, N$ and the predictor or the independent variable is the $\boldsymbol{x}$ term defined in the mean function $... [Read more...]
R 3.2.2 is released
R 3.2.2 (codename “Fire Safety”) was released last weekend. You can get the latest binary version from here (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below. SOME OF THE CHANGES I personally found two things particularly interesting in this ... [Read more...]
Some Considerations of Modeling Severity in Operational Losses
In the Loss Distributional Approach (LDA) for Operational Risk models, multiple distributions, including Log Normal, Gamma, Burr, Pareto, and so on, can be considered candidates for the distribution of severity measures. However, the challenge remains in the stress testing exercise, e.g. CCAR, to relate operational losses to macro-economic scenarios ... [Read more...]
Yet another post on google scholar data analysis
Inspired by this post, I wanted to use Google Scholar data to put nice images on my professional website (girly habit). This post explains how I combined the functions available in the R package scholar with additional analyses (partially inspired by the script available at this link, which in my ... [Read more...]
Managing longitudinal data: Conversion between the wide and the long
If you measure the same person twice, you have longitudinal data. We all love longitudinal data because it shows how people's health outcomes change over time, which helps answer many interesting research questions. However, newer R users often face a problem in managing longitudinal data because it often ... [Read more...]
Programming simple economic experiments in shiny
In this post, I want to present a flexible way to implement small surveys or economic experiments in shiny. If you have some background in experimental economics, you may have noticed that the most widely used software to implement economic experiments is zTree. To be honest I never touched zTree ... [Read more...]
Seattle histogram
Filed under: pictures, R, Statistics, Travel Tagged: histogram, sculpture, Seattle, Washington Convention Center [Read more...]
RForcecom Demo Video
Recently, I created a demo video of an R package named RForcecom, which connects to Salesforce.com and Force.com from R. The video consists of 4 parts: Install and load RForcecom; Sign into Salesforce.com; Get opportunity list from… [Read more...]
Time Series Analysis: Building a model on non-stationary time series
In this post I will give a brief introduction to time series analysis and its applications. We will be using the R package astsa which was developed by professor David Stoffer at the University of Pittsburgh. The textbook it accompanies, which is a good read for anyone interested in the ... [Read more...]
R in big data pipeline
R is my favorite tool for research. There are still quite a few things that only R can do, or that are quicker and easier with R.
But unfortunately, a lot of people think R becomes less powerful at the production stage, where you really need to make sure all the functionalities run as ... [Read more...]
The Rise of the Robots (Advisors…)
The Asset Management industry is on the verge of a major change. Over the last couple of years Robots Advisors (RA) have emerged as new players. The term itself is hard to define as it encompasses a large variety of services. Some are designed to help traditional advisers to better ... [Read more...]
Use box plots to assess the distribution and to identify the outliers in your dataset
After you check the distribution of the data by plotting the histogram, the second thing to do is to look for outliers. Identifying the outliers is important because an association you find in your analysis might be explained by the presence of outliers. The best tool ... [Read more...]