Some time ago I read a nice post, "Solving easy problems the hard way", where linear regression is used to solve an interesting puzzle. Following that idea, I used rpart to find an optimal decision tree for sorting five elements. It is well known that...

In my last post, I considered the shifts in two interestingness measures as possible tools for selecting variables in classification problems. Specifically, I considered the Gini and Shannon interestingness measures applied to the 22 categorical mushroom characteristics from the UCI mushroom dataset. The proposed variable selection strategy was to compare these values when computed from only edible mushrooms...

by Yanchang Zhao, RDataMining.com
There are some nice slides and R code examples on Data Mining and Exploration at http://www.inf.ed.ac.uk/teaching/courses/dme/, which are listed below.
PDF Slides:
- Overview of Data Mining http://www.inf.ed.ac.uk/teaching/courses/dme/2012/slides/datamining_intro4up.pdf
- Visualizing Data http://www.inf.ed.ac.uk/teaching/courses/dme/2012/slides/visualisation4up.pdf
- Decision trees http://www.inf.ed.ac.uk/teaching/courses/dme/2012/slides/classification4up.pdf
…

For the past 12 years, KDNuggets has conducted an annual poll asking "What analytics/data mining software you used in the past 12 months for a real project (not just evaluation)". In this year's poll, R was the top-ranked data mining solution, selected by 30.7% of poll respondents. Microsoft Excel was second, at 29.8%. RapidMiner, which took the #1 spot...

The Tenth Australasian Data Mining Conference (AusDM 2012)
Sydney, Australia, 5-7 December 2012
http://ausdm12.togaware.com/
Data mining, the art and science of intelligent analysis of (usually large) data sets for meaningful (and previously unknown) insights, is now being actively applied in …

MapReduce, the heart of Hadoop, is a programming framework that enables massive scalability across servers using data stored in the Hadoop Distributed File System (HDFS). The Oracle R Connector for Hadoop (ORCH) provides access to a Hadoop cluster from R, enabling manipulation of HDFS-resident data and the execution of MapReduce jobs. Conceptually, MapReduce is similar...