Articles by Data Science Notes - R

Read from hdfs with R. Brief overview of SparkR.

February 19, 2016 | 0 Comments

Disclaimer: originally I planned to write a post about R functions/packages that allow reading data from HDFS (with benchmarks), but in the end it became more of an overview of SparkR capabilities. Nowadays working with “big data” almost always means working with the Hadoop ecosystem. A few years ago this ... [Read more...]

Experiments on english wikipedia. GloVe and word2vec.

November 30, 2015 | 0 Comments

Today I will start publishing a series of posts about experiments on English Wikipedia. As I said before, text2vec is inspired by gensim, a well-designed and quite efficient Python library for topic modeling and related NLP tasks. I also found Radim’s posts very useful, where he tried ... [Read more...]

Analyzing texts with text2vec package.

November 8, 2015 | 0 Comments

Over the last few weeks I have been actively working on text2vec (formerly tmlite), an R package that provides tools for fast text vectorization and state-of-the-art word embeddings. This project is an experiment for me: what can a single person do in a particular area? After these hard weeks, ... [Read more...]

Introducing tmlite – new framework for text mining in R

September 15, 2015 | 0 Comments

IMPORTANT NOTE: Code from this post is outdated (the package API has changed). See this post. Today I am pleased to present tmlite, a small but fast and robust package for text-mining tasks in R. It is not yet available on CRAN, but you can install it directly from GitHub: devtools::... [Read more...]

Working with MS SQL server on non-windows systems

July 15, 2015 | 0 Comments

As far as I know, there are a few options for connecting from R to MS SQL Server: RODBC, RJDBC, and rsqlserver. But only the second option can be used on Mac and Linux machines. Here is a nice Stack Overflow thread. Most people suggest using the Microsoft SQL Java driver. But there is a ... [Read more...]

Installing cuda toolkit and related R packages

June 3, 2015 | 0 Comments

The main purpose of this post is to keep all the steps of installing the CUDA toolkit (and related R packages) in one place. I also hope this may be useful for someone. Installing the CUDA toolkit (Ubuntu): First of all we need to install the NVIDIA CUDA toolkit. I’m on the latest ... [Read more...]

Locality Sensitive Hashing In R Part 1

January 1, 2015 | 0 Comments

Introduction: In the next series of posts I will try to explain the basic concepts of the Locality Sensitive Hashing technique. Note that I will try to follow a general functional programming style, so I will use R’s higher-order functions instead of the traditional *apply family of functions (I suppose this post will ... [Read more...]

Rmongodb 1.8.0

November 1, 2014 | 0 Comments

Today I’m introducing a new version of rmongodb (which I have started to maintain): v1.8.0. Install it from GitHub: library(devtools) install_github("mongosoup/[email protected]") The release version will be uploaded to CRAN shortly. This release brings a lot of improvements to rmongodb: rmongodb now correctly handles arrays. [Read more...]
