A Return to Reliable R

Posted on September 5, 2012 by inkhorn82 in Uncategorized | 0 Comments

[This article was first published on Data and Analysis with R, at Work, and kindly contributed to R-bloggers.]

The saga with Statistica continues:

Statistica kept crashing on me while I was doing my data processing. One of the big problems was a wonderful bug that occurred when some of my text data variables were coded (unsurprisingly) as text! Under this condition, I could add only a small number of extra variables; after that, any further variable I tried to add would crash the program!

I was told that this is a known bug in Statistica and that they’re hoping to fix it with an update due around the end of the year. In the meantime, a workaround is to go into the “Variable Specs” for any variable coded as Text and recode it as “Double”, save the worksheet, then try again. That seemed to get rid of the crashing, but then my biographical ID column, which held all the original database IDs for the individuals in my dataset, got messed up. Numerous IDs, which were previously unique, became spontaneously reassigned to more than one person. I can’t have that, because once I’m done with the dataset I have to return important parts of it to the clients I work with so they can put certain new columns into their database. So it was a bit of a catch-22.

My supervisor advised me to make a new, strictly numeric, ID column outside of Statistica, and import only the new ID column, and not the old one, back into the program. I did that, and all seemed well until finally it crashed, yet again! This time, I had no clue whatsoever why the crash happened.
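For anyone facing the same ID problem, the numeric-ID workaround can be sketched in R roughly as follows. This is only an illustration on made-up data, not the actual steps or dataset from this project: the data frame `people` and the columns `orig_id`, `new_id`, `score`, and `new_col` are all hypothetical names.

```r
# Hypothetical data: 'orig_id' stands in for the original database IDs.
people <- data.frame(
  orig_id = c("A-001", "A-002", "B-117", "C-042"),
  score   = c(3.2, 4.8, 1.5, 2.9),
  stringsAsFactors = FALSE
)

# Check that the original IDs are unique before relying on them.
# anyDuplicated() returns 0 when there are no duplicates.
stopifnot(!anyDuplicated(people$orig_id))

# Create a strictly numeric ID: one integer per row.
people$new_id <- seq_len(nrow(people))

# Keep a lookup table outside the other program so the original IDs
# can be reattached later.
id_lookup <- people[, c("new_id", "orig_id")]

# Only the numeric ID and the analysis variables are exported.
export <- people[, c("new_id", "score")]

# After processing (here, a stand-in derived column), map the results
# back to the original IDs by joining on the numeric ID.
export$new_col <- export$score * 2
returned <- merge(export, id_lookup, by = "new_id")

# Confirm that no ID was duplicated or lost along the way.
stopifnot(!anyDuplicated(returned$orig_id),
          nrow(returned) == nrow(people))
```

The point of keeping `id_lookup` separate is that the original IDs never pass through the program that mangled them, so whatever comes back can still be matched to the right person.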

That’s when I told myself “screw it, I’m wasting time in Statistica and am going to do the rest of this analysis in R”. Man, is it ever nice to be back in R. Ironically, things are much simpler and flow a lot faster for me. The only problem is that I have a few projects coming up soon that really need a data analysis program that can handle humongous data sets. For that reason, I’m probably going to have to see whether reinstalling Statistica makes it more reliable to work with. If not, I suppose I’ll have to move on to other options!
