Data Science on Blockchain with R. Part II: Tracking the NFTs

[This article was first published on Tdemarchinr in Towards Data Science on Medium, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A story about nodes and vertices

Examples of Weird Whale NFTs. These NFTs (token ids 525, 564, 618, 645, 816, 1109, 1523 and 2968) belong to the creator of the collection Benyamin Ahmed (Benoni) who gave us the permission to show them in this article.

By Thomas de Marchin and Milana Filatenkova

Thomas is Senior Data Scientist at Pharmalex. He is passionate about the incredible possibility that blockchain technology offers to make the world a better place. You can contact him on Linkedin or Twitter.

Milana is Data Scientist at Pharmalex. She is passionate about the power of analytical tools to discover the truth about the world around us and guide decision making. You can contact her on Linkedin.

1. Introduction

What is the Blockchain: A blockchain is a growing list of records, called blocks, that are linked together using cryptography. It is used for recording transactions, tracking assets, and building trust between participating parties. Primarily known for Bitcoin and cryptocurrencies application, Blockchain is now used in almost all domains, including supply chain, healthcare, logistic, identity management… Some blockchains are public and can be accessed from everyone while some are private. Hundreds of blockchains exist with their own specifications and applications: Bitcoin, Ethereum, Tezos…

What are NFTs: Non-Fungible Tokens are used to represent ownership of unique items. They let us tokenize things like art, collectibles, patents, even real estate. They can only have one official owner at a time and the record of their ownership is secured on the blockchain, no one can modify or copy/paste a new NFT into existence.

What is R: R language is widely used among statisticians and data miners for developing data analysis software.

What is a smart contract: Smart contracts are simply programs stored on a blockchain that run when predetermined conditions are met. They typically are used to automate the execution of an agreement so that all participants can be immediately certain of the outcome, without any intermediary’s involvement or time loss. They can also automate a workflow, triggering the next action when conditions are met. An example of a smart contract use case is the lottery: people buy tickets within a predefined time window and once it expires, a winner is automatically selected and money is transferred to his account, all this without involvement of a third party.

This is the second article on a series of articles on interaction with blockchains using R. Part I focuses on some basic concepts related to blockchain, including how to read the blockchain data. If you haven’t read it, I strongly encourage you to do so to get familiar with the tools and terminology we use in this second article: Part I.

It is not uncommon to hear that cryptocurrencies are popular with Mafia as it is anonymous and confidential. This is only partially true. While we don’t know exactly who is behind an address, the transactions made by this address are visible from everyone. Unless you are very careful, it is practically possible to determine who is behind the address by crossing databases. There are now companies specialized in doing only that. This is a breakthrough path towards a more transparent and fair world. Blockchain has the potential to resolve major problems linked to lack of traceability and accountability problems the world is facing. Take for example the cacao culture in Ivory Coast. The Country has lost more than 60% of its “protected” forests during the last 25 years despite the official engagement of the agribusiness to fight against deforestation. The main reason for the inefficiency of current strategies to protect wild forests is that it is extremely difficult to trace down the origin of the cocoa beans, the existing traceability solutions being easily manipulated. Blockchain could help solve this issue. In the pharmaceutical world, blockchain has also the potential to improve certain aspects of drug production industry. For example, this technology would enhance the transparency into manufacturing process and supply chain, as well as management of clinical data.

The topic covered in this article is how to extract and visualize the blockchain transactions. We will explore some of the tools available on R to do this. To avoid messing up with the mafia, let’s focus on a more benign but currently very popular application of blockchain technology: the art NFTs. We will look at more specifically the Weird Whales NFTs. The approach described here is of course extensible to anything stored on the blockchain.

The Weird Whales project is a collection of 3350 whales which have been programmatically generated from an ocean of combinations, each with their unique characteristics and traits: This project was created by a 12-year-old programmer named Benyamin Ahmed who puts on sale on the famous NFT marketplace OpenSea. The 3,350 computer-generated Weird Whales almost instantly sold out based on the heartwarming story and Benyamin made more than 400,000$ in two months. Whales were initially sold at approximately 60$ but since then, there price has been multiplied by 100… Read this to learn more about this incredible story.

A HTML version of this article as well as the code used to generate it is available on my Github.

2 Data

This section is dedicated to downloading the sales data consisting in which tokens were transferred to which address as well as the sale prices. This is a very interesting topic but also a bit technical so if you are only interested in the data analysis, you can skip this section and go directly to section 3.

2.1 Transfers

All smart contracts are different and it is important to understand them to decode the information. Reverse engineering by reading the source code and using block explorers like EtherScan is usually a good start. The Weird Whales are managed by a specific smart contract on the Ethereum blockchain. This contract is stored on a specific address and you can read its code here.

To make it easier to extract information from the blockchain, which can be fairly complicated due to how it is stored on the ledger, we can read the events. In Solidity, the programming language used to code smart contracts on Ehtereum, events are dispatched signals the smart contracts can fire. Any app connected to Ethereum network can listen to these events and act accordingly. Here is a list of recent Weird Whales events:

We are mostly interested by a specific type of event: transfer. Every time a transfer of a token takes place, an event gets written on the blockchain with structure: Transfer (index_topic_1 address from, index_topic_2 address to, index_topic_3 uint256 tokenId). As indicated its names, this event records the address from which the token is transferred, the address to which it is transferred and the token ID, which goes from 1 to 3350 (as there were 3350 Weird Whales generated).

We will therefore extract all transfer events related to Weird Whales. For this, we filter on the hash signature of this event (also called Topic 0). By doing a bit of reverse engineering on EtherScan (, we see that topic 0 for this event is “0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef”.

Below, we outline a process to create a databases containing trade data of the Weird Whales. We download the data using the EtherScan API, see Part I for more details. EtherScan’s API is limited to 1000 result per call. That’s not enough to analyze the Weird Whale transactions as only the minting (process of creation of the token on the blockchain) generates 3350 transaction (1 transaction per NFT minted). And that’s without all the subsequent transfers! That’s why we have to use a dirty while loop. Note that if you are ready to pay a bit, there are other blockchain database available without restriction. For example, the Ethereum database is available on Google BigQuery.

This is how the Transfer dataset looks like:

2.2 Sales price

While both the transfer and the sales can be managed by the same contract, it is done differently On OpenSea. The sale is managed by the main OpenSea contract and if it is approved (asked price reached), that first contract calls a second NFT contract, here Weird Whales, which then triggers the transfer. If we want to know the price at which the NFTs were sold (in addition to the transfer discussed above), we need to extract data from the main contract ( The sales are recorded by an event called OrderMatch.

Note that this loop can take a while to run as we download all the sales prices for all the NFT sales on OpenSea, not only the Weird Whales. Given that at the time of writing, there were on average 30 transactions per minute on OpenSea, it can take a while to download… This code block is very similar to the one above so if you are unsure about what a line does exactly, read the code comments above.

If we look at the orderMatch event structure, we see that the price is encoded in uint256 type in the data field. It is preceded by two others fields, buyHash and sellHash, both in bytes32 types. The uint256 and bytes32 types are both 32 bytes long, which makes 64 Hexadecimal characters. We are not interested by the buyHash and sellHash data but only by the price sale. We thus have to retrieve the last 64 characters and convert them into decimal to get the sale price.

Figure 1: Structure of an orderMatch event

2.3 Combine the two events

Let’s now merge the two dataset by the transactionHash of the Weird Whales transfers.

2.4 Convert the ETH price in USD

As we are working on the Ethereum blockchain, the transaction price are given in ETH. Ethereum / USD rate is highly volatile. If we try to convert ETH to USD, we cannot just apply a multiplicative factor. We thus have to download the historical ETH USD price. This time around, we will download the data from the Poloniex exchange instead of EtherScan (you need a pro account for that). Poloniex exchange provides free access to ETH-USD convertion.

We will use a spline function approximation to smooth out and interpolate the conversion rate. This is due to the timestamp of the transaction event, which is in seconds, while the resolution of the historical price dataset is much lower, they are recorded on average every 30 minute. We thus have to interpolate the historical prices to cover the transaction events.

Figure 2: Historical ETH to USD rate. Red line: spline.

Let us now convert the ETH in USD for our Weird Whales transactions using the interpolated value of the price at the time of the transaction.

2.5 Final dataset

Note that if you do not manage to download all the data from EtherScan, you can always load the dataset available on the github.

This is how the final dataset looks like:

3 Analysis

3.1 Descriptive statistics

Below we make a summary of the dataset by calculating a few descriptive statistics :

Table 1: Summary statistics on the content of the dataset.

Another interesting exercise — We can determine the number of transactions per address. The first address (0x00..) is not a real address — it refers to the minting of the NFTs. The number of transactions in it is simply the total number of NFTs registered on the blockchain. Besides we see that some addresses are quite active in trading Weird Whales as they have been involved in hundreds of transactions!

Let’s now visualize the price of transactions as a function of the time, irrespective of the token ID. In the first few days, we see high variability in prices. This is followed by a quieter period and we see the beginning of an upward trend in the last days of August. This price increase coincides with an intensive activity on the social media as the story of the creator of Weird Whales becomes viral. One transaction sci-rockets up to 25000 USD, this corresponds to an increase of about 55555% compared to the initial price of 45 USD!

Figure 3: Sales price (USD) evolution for the Weird Whales NFTs transactions.

3.2 Visualizing the network

So far, we have looked at summary of the transactions prices. Now, how to visualize the transactions, knowing that each NFT is unique and needs to be differentiated from the others? The tool we need is a network. The dataset we assembled is perfectly suited to be plotted as a network. Networks are described by vertices (or nodes) and edges (or links). Here we are going to use a node representation for a wallet address. The network we are going to construct will display all the wallet addresses that have ever traded Weird Whales. The connections between the vertices, so called edges, will represent the transactions.

There are several class and corresponding packages available in R to plot networks, the most famous being network and igraph. Have a look at the References section for a few amazing tutorials on this topic. My personal preference is the network package as it gives the possibility to create interactive plots via the networkDynamic and ndtv packages. Besides, other packages have been developed to facilitate manipulation and plotting of such objects, such as for example, the ggraph package which brings the familiar ggplot2 framework to the network class.

Let’s give it a try! We will first create a simple static (i.e. without the temporal dimension) network and plot it using the ggraph package. There are too many data to display in one plot so we will subset our data by plotting only the NFTs involved in more than 7 transactions.

We can now plot our network. We see that all transactions of all the tokens originate from the single minting address (1). Some addresses are involved in multiple transactions and that’s why we see several (curved) edges for those ones. We also see that some tokens were transferred to an address only to be sent back to the sender.

Figure 4: Static network of the WheirdWhales NFTs transactions. Each address is represented by a node (circle) and the transations are represented by the edges (lines). The edge color refers to the token ID. A high resolution figure is available here.

Let’s now use the timestamp of the transaction to add a temporal dimension to our network. This would allow us to visualise the network evolution over time. For this, we will use the amazing networkDynamic package.

We can create a timeline plot showing the frequency of the transaction activity. We see that most (about 2/3) of the transactions happened very shortly after the NFT’s creation. This is followed by a relatively calm period and then a spike of activity around 1000.

Figure 5: Timeline plot showing the frequency of the transactions.

Below, we create an animation which can be launched directly from the browser. Edges and vertices can be clicked on to show more information about the associated addresses and tokens traded.

Figure 6: Animated network of the Wheird Whales NFTs transactions. Each address is represented by a node (circle) and the transactions are represented by the edges (lines). The edge color refers to the token ID. Edges and vertices can be clicked on to show more information about the associated addresses and tokens traded. The interactive version is available here.

4 Conclusion

Hopefully, you enjoyed reading this article and have now a better understanding on how to visualize the blockchain transactions. Here, we have shown an example of how to download and plot a network of transactions associated to NFTs. We saw that analysing and plotting transactions is easy once you have the data. On the other hand, obtaining data in an appropriate format requires a deep understanding of the blockchain, especially when working with smart contracts.

In the next posts, we can explore the Tezos blockchain, the new place to be for NFTs trading. We could also dig into the Helium blockchain, a physical decentralized wireless blockchain-powered network for Internet of Things (IoT) devices. If you wish to learn more about this, please follow me on Medium, Linkedin and/or Twitter so you get alerted of a new article release. Thank you for reading and feel free to reach us if you have questions or comments.

A HTML version of this article as well as the code used to generate it is available on my Github.

If you wish to help us continue researching and writting about data science on blockchain, don’t hesitate to make a donation to our Ethereum address (0xf5fC137E7428519969a52c710d64406038319169) or Tezos address (tz1ffZLHbu9adcobxmd411ufBDcVgrW14mBd).

Stay tuned!

5 References

Powered by and Poloniex APIs:



Data Science on Blockchain with R. Part II: Tracking the NFTs was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

To leave a comment for the author, please follow the link and comment on their blog: Tdemarchinr in Towards Data Science on Medium. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)