Package rtrie on CRAN

January 7, 2017
By

(This article was first published on R-Chart, and kindly contributed to R-bloggers)

Just had a new package published on CRAN…rtrie :).

The rtrie package allows you to quickly create Tries from a list of strings.  Example use cases for the package are word searching in games like Boggle or Scrabble or implementation of applications that  autocomplete or autocorrect text.   Incidentally, besides being great for statistics, R is an excellent language for building algorithms and data structures from a pedagogical standpoint.  The language’s Scheme influence makes the expression of algorithms straightforward and the plotting capabilities make it easy to visualize the data structures and processing involved.  The remainder of this post is intended to illustrate that fact as well as introduce rtrie’s capabilities.

To create a new trie from a vector of words:

> library(rtrie)
> trie <- char_tree(c(‘able’, ‘act’, ‘acts’, ‘across’, ‘act’, ‘bat’, ‘babble’, ‘bobble’), ‘X’)

The new trie is a list of lists… with each node being a letter.   So the list of lists above can be plotted using the data.tree package.  This package has a wide range of tree related processing functions… including functionality related to all of the plots shown in this post.

> library(data.tree)
> data_trie <– FromListSimple(trie)
> plot(data_trie)

The trie can also be displayed in text form which can be read left to right.

data_trie

> data_trie
                   levelName
1  Root                     
2   ¦–a                    
3   ¦   ¦–b                
4   ¦   ¦   °–l            
5   ¦   ¦       °–e        
6   ¦   °–c                
7   ¦       ¦–r            
8   ¦       ¦   °–o        
9   ¦       ¦       °–s    
10  ¦       ¦           °–s
11  ¦       °–t            
12  ¦           °–s        
13  °–b                    
14      ¦–a                
15      ¦   ¦–b            
16      ¦   ¦   °–b        
17      ¦   ¦       °–l    
18      ¦   ¦           °–e
19      ¦   °–t            
20      °–o                
21          °–b            
22              °–b        
23                  °–l    

24                      °–e

To check if a given word is in the tree, call is_word which returns a boolean indicating whether or not the word was found in the trie.

> is_word(‘across’, trie)
[1] TRUE

The following diagram illustrates where the match was made in the trie.

To obtain a list of suggested words based on a prefix, call matching_words which returns a vector of matching words.

> matching_words(‘ac’, trie)
 r.o.s.s        t      t.s 
“across”    “act”   “acts” 

Three words are returned, corresponding to the section of the tree where the prefix exists.  The section of the tree identified by the function is shown below.  Although not clear in the diagram, a termination character exists that allowed the function to return both “act” and “acts”.

The highlighted graphs above were created by using data.tree’s SetGraphStyle, SetNodeStyle, and SetEdgeStyle functions.

If you are interested in more information, the code is available on Github and the vignette includes additional discussion.

To leave a comment for the author, please follow the link and comment on their blog: R-Chart.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)