Blog Archives

Read from hdfs with R. Brief overview of SparkR.

February 19, 2016

Disclaimer: I originally planned to write a post about R functions and packages for reading data from HDFS (with benchmarks), but in the end it became more of an overview of SparkR capabilities. Nowadays, working with “big data” almost always means working with the Hadoop ecosystem. A few years ago this also meant that you would have to be a good...

Read more »

text2vec implementation details. Writing fast parallel asynchronous SGD/AdaGrad.

January 8, 2016

Before reading this post, I strongly recommend reading the original GloVe paper and Jon Gauthier’s post, which provides a detailed explanation of a Python implementation. That post helped me a lot with the C++ implementation. Word embedding: After Tomas Mikolov et al. released the word2vec tool, there was a boom of articles about word vector representations. One of the greatest is GloVe,...

Read more »

Experiments on english wikipedia. GloVe and word2vec.

November 30, 2015

Today I am starting a series of posts about experiments on the English Wikipedia. As I said before, text2vec is inspired by gensim, a well-designed and quite efficient Python library for topic modeling and related NLP tasks. I also found Radim’s posts very useful, where he tried to evaluate some algorithms on the English Wikipedia dump....

Read more »

Analyzing texts with text2vec package.

November 8, 2015

Over the last few weeks I have been actively working on text2vec (formerly tmlite), an R package that provides tools for fast text vectorization and state-of-the-art word embeddings. This project is an experiment for me: what can a single person do in a particular area? After these hard weeks, I believe one person can do a lot. There are a lot...

Read more »

Introducing tmlite – new framework for text mining in R

September 15, 2015

IMPORTANT NOTE: Code from this post is outdated (the package APIs have changed). See this post. Today I am pleased to present tmlite, a small but fast and robust package for text-mining tasks in R. It is not yet available on CRAN, but you can install it directly from GitHub: devtools::install_github("dselivanov/tmlite") A reasonable question is: why a new package? R already has such great...

Read more »

Working with MS SQL server on non-windows systems

July 15, 2015

As far as I know, there are a few ways to connect from R to MS SQL Server: RODBC, RJDBC, and rsqlserver. But only the second option can be used on Mac and Linux machines. Here is a nice Stack Overflow thread. Most people suggest using the Microsoft SQL Java driver. But there is a case when this will not...

Read more »

Installing cuda toolkit and related R packages

June 3, 2015

The main purpose of this post is to keep all the steps for installing the CUDA toolkit (and related R packages) in one place. I also hope this may be useful for someone. Installing CUDA toolkit (Ubuntu): First of all, we need to install the NVIDIA CUDA toolkit. I am on the latest Ubuntu 15.04, but found this article well suited for...

Read more »

Locality Sensitive Hashing In R Part 1

January 1, 2015

Introduction: In the next series of posts I will try to explain the basic concepts of the Locality Sensitive Hashing technique. Note that I will try to follow a general functional programming style, so I will use R’s higher-order functions instead of R’s traditional *apply family of functions (I suppose this will make the post more readable for non-R users). I will also use the brilliant...

Read more »

Rmongodb 1.8.0

November 1, 2014

Today I’m introducing a new version of rmongodb (which I have started to maintain) – v1.8.0. Install it from GitHub: library(devtools) install_github("[email protected]") The release version will be uploaded to CRAN shortly. This release brings a lot of improvements to rmongodb: rmongodb now handles arrays correctly. mongo.bson.to.list() has been rewritten from scratch. R’s unnamed lists are treated as arrays, named lists as objects. Also it has...

Read more »
