Boxplots are a good way to get some insight into your data, and while R provides a fine ‘boxplot’ function, it doesn’t label the outliers in the graph. However, with a little code you can add labels yourself. The numbers plotted next to ...
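As a minimal sketch of how outlier labels could be added in base R (not necessarily the code from the original post): the data and the observation names below are made up for illustration. The sketch relies on `boxplot()` invisibly returning a list whose `out` element holds the outlier values.

```r
# Hypothetical example data: 30 normal draws plus two obvious outliers.
set.seed(1)
x <- c(rnorm(30), 6, -5)
names(x) <- paste0("obs", seq_along(x))  # made-up labels

# boxplot() draws the plot and returns its statistics; $out holds the outliers.
bp <- boxplot(x, main = "Boxplot with labelled outliers")

# Find which observations are outliers and write their names beside the points.
idx <- which(x %in% bp$out)
if (length(idx) > 0) {
  # A single box is drawn at x = 1; pos = 4 places the label to the right.
  text(x = 1, y = x[idx], labels = names(x)[idx], pos = 4, cex = 0.8)
}
```

Matching on values with `%in%` is enough for a single vector; with grouped boxplots you would also need `bp$group` to place each label on the right box.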

This post is contributed by Tomasz Konopka. Comments are welcome. One of the great features of R is its capable graphics framework. In principle, the framework allows us to customize all aspects of the visual presentation of data. In practice, however, customization is rather tedious. For example, R’s own boxplot function has 17 custom arguments, not counting ...; stripchart has 20. Tweaking the default...

By Ari Lamstein. Introduction: Today I will walk through an analysis of San Francisco Zip Code Demographics using my new R package choroplethrZip. This package creates choropleth maps of US Zip Codes and connects to the US Census Bureau. A choropleth is a map that shows boundaries of regions (such as zip codes) and colors those regions according to...

Introducing: Machine Learning in R. Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical machine learning tasks are concept learning, function learning or “predictive modeling”, clustering, and finding predictive patterns. These tasks are learned from available data that were observed through experiences or instructions, for example.

By Herman Jopia. What is Binning? Binning is the term used in scoring modeling for what is also known in Machine Learning as Discretization, the process of transforming a continuous characteristic into a finite number of intervals (the bins), which allows for a better understanding of its distribution and its relationship with a binary variable. The bins generated by...
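To make the idea of discretization concrete, here is a minimal sketch using only base R. It is an illustration under stated assumptions, not the method from the post: the variables `age` and `default` are simulated, and quartile cut points are just one simple way of choosing the bins.

```r
# Hypothetical data: a continuous characteristic and a binary outcome.
set.seed(42)
age     <- round(runif(200, min = 18, max = 70))
default <- rbinom(200, size = 1, prob = 0.3)

# Use the quartiles of the characteristic as cut points (assumed choice).
# unique() guards against duplicated break values.
breaks  <- unique(quantile(age, probs = seq(0, 1, by = 0.25)))

# cut() turns the continuous variable into a factor of intervals (the bins).
age_bin <- cut(age, breaks = breaks, include.lowest = TRUE)

# Distribution across bins, and the rate of the binary variable per bin.
bin_counts <- table(age_bin)
bin_rates  <- tapply(default, age_bin, mean)
print(bin_counts)
print(bin_rates)
```

Comparing `bin_rates` across intervals is what shows the relationship between the binned characteristic and the binary variable; with this simulated data the two are independent, so the rates should differ only by noise.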

Yesterday, I uploaded a post in which I tried to show that “standard” regression models were not performing badly, at least if you include splines (multivariate splines) to take into account joint effects and nonlinearities. So far, I have not discussed the possibly high number of features (but with bootstrap procedures, it is possible to assess something related to...