582 search results for "SQL"

Finding Word Use Patterns in Wikileaks Cables

June 12, 2012

6/18: A follow-up to this post is now available here. Recent Discoveries: When I was a diplomat, I was always interested in the Wikileaks cables and what could be done with them. Unfortunately, I never got a chance to look at the site in depth, due to security policies. Now that the ex- is firmly prepended to diplomat in my...

Read more »

Data distillation with Hadoop and R

June 11, 2012

We're definitely in the age of Big Data: today, there are many more sources of data readily available to us to analyze than there were even a couple of years ago. But what about extracting useful information from novel data streams that are often noisy and minutely transactional ... aye, there's the rub. One of the great things about...

Read more »

R Tops Data Mining Software Poll

May 31, 2012

For the past 12 years, KDnuggets has conducted an annual poll asking "What analytics/data mining software you used in the past 12 months for a real project (not just evaluation)". In this year's poll, R was the top-ranked data mining solution, selected by 30.7% of poll respondents. Microsoft Excel was second, at 29.8%. RapidMiner, which took the #1 spot...

Read more »

When SAP HANA met R – First kiss

If you follow my blogs (I hope you do) then you know I really love the R programming language but I also love SAP HANA, and in the past I have dealt with integration between those two: "HANA meets R", "R meets HANA", and "Sanitizing data in SAP HANA with R". But... those integrations were not done using the...

Read more »

Orbitz: R has become the data-mining tool of choice

May 17, 2012

Sameer Chopra, vice president of Advanced Analytics at Orbitz Worldwide, wrote recently in Analytics magazine about the changing landscape of processes, software and systems for statistical modelers. In a section on "Big Data and Open Source Analytics", Chopra lays out the reasons why the R language "has become the data-mining tool of choice for machine learners": R has very...

Read more »

data.table version 1.8.1 – now allowed numeric columns and big-number (via bit64) in keys!

May 9, 2012

This is a guest post written by Branson Owen, an enthusiastic R and data.table user. Wow, a long-desired feature of data.table has finally arrived in version 1.8.1! data.table now allows numeric columns and big numbers (via bit64) in …

Read more »

PubMed publications in 2011 by 202 world countries: who’s the winner?

May 7, 2012

Which country had the most PubMed citations in 2011? To find out I used R statistical software to analyze the affiliation of 986 427 articles.

Read more »

Ack! Duplicates in the Data!

May 3, 2012

As I mentioned in a previous post, I compiled the data set that I’m currently working on in PostgreSQL. To get this massive data set, I had to write a query that was massive by dint of the number of …

Read more »

Google BigQuery and the Github Data Challenge

May 1, 2012

GitHub has made data on its code repositories, developer updates, forks etc. from the public GitHub timeline available for analysis, and is offering prizes for the most interesting visualization of the data. Sounds like a great challenge for R programmers! The R language is currently the 26th most popular on GitHub (up from #29 in December), and it would...

Read more »

The R-Podcast Episode 6: Importing Data from External Sources

April 29, 2012

In this episode: Listener feedback and importing data from external sources into R. We dive into the basics of importing delimited text files using read.table and its variants. We also discuss recommendations for importing MS Excel spreadsheet files, relational databases such as MySQL, data from HTML tables, and files produced by other statistical computing packages.

Read more »