In the previous post, mpiktas posted a comment that included this question “Isn’t Python dictionary a named list in R?”. The short answer is no, but the reason why is worth some more explanation.
At first sight, a named list does look a little bit as a python dictionary, it allows to retrieve values inside a list using names (i.e. strings) instead of by numerical index. The code below is also a good example on how it is possible to mix R and python code inside a Jupyter notebook
But there is a more fundamental difference, the python dictionary uses a hash table implementation so that access to a key is independent of the size of the dictionary, while the named list in R seems to use a sorted table of names. This makes access by names (very) slow compared to access by index in R. Here is an example, that also shows how to mix intelligently R and python.
The code builds (large) equivalent structures, and a random subset is defined using R (easier in R) and passed to the python code. Then we measure the access time using different approaches. In python, we use both a list and a dictionary.
nBits = [18,19,20,21]
columns = ["Size", "Python dict by key", "Python list by index", "R list by index", "R list by key"]
results = pandas.DataFrame(columns=columns, index=nBits)
nSamples = 1000
for nBit in [18,19,20,21]:
n = (1<
Python dict by key
Python list by index
R list by index
R list by key
The results have sometimes outliers, but the main result is that access by key in R is both slow and increasing with the list size.
There is more to say, in particular there are data structures in R that are based on hash tables, but they remain less flexible than a python dictionary. I am not the first person making this type of analysis, you can check e.g.