Loading Data with Pandas

[This article was first published on R – Quintuitive, and kindly contributed to R-bloggers.]

On at least a couple of occasions lately, I realized that I may need Python in the near future. While I have amassed some limited experience with the language over the years, I never spent the time to understand Pandas, its de-facto standard data-frame library.

Where does one start? For me, it’s usually with the data. Simple stuff: loading, wrangling, etc. Re-writing my little R6 helper class to load futures data looked like a perfect candidate.

There was some frustration, totally expected after years of experience with R. Some things were less intuitive; surprisingly, however, pretty much nothing was downright ugly. And when it comes to code, I am not easy to please. The end result is available here.

Here is a little example of how to use the code, although one can’t do much without the data, which I can’t distribute:

import pandas as pd
import instrumentdb as idb

def main():
    # Create the object for the database
    db = idb.CsiDb()

    # Load the bars for three symbols (the name `bars` avoids
    # shadowing Python's built-in all())
    bars = db.mload_bars(["HO2", "RB2", "CL2"])
    print(bars['HO2'].head())
    print(bars['RB2'].head())

    # Build a list of the closing prices for each series
    closes = [bars[ss]['close'] for ss in bars.keys()]

    # Create a single data frame using these series; the inner join
    # keeps only the index values present in every series
    all_df = pd.concat(closes, join='inner', axis=1)
    all_df.columns = [ss.lower() for ss in bars.keys()]

    print(all_df.tail())

    # That's the only line that would work without the data.
    print(db.future_list())

if __name__ == "__main__":
    main()
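
Since the example above cannot run without the CSI data, here is a small stand-alone sketch of the same pd.concat step on made-up numbers. The dates and prices are invented, and combine_closes is a hypothetical helper, not part of instrumentdb. It only illustrates how join='inner' keeps the dates common to every series.

```python
import pandas as pd


def combine_closes(bars):
    """Combine the 'close' column of each frame into one data frame.

    `bars` is assumed to be a dict mapping a symbol to a data frame
    that has a 'close' column and a date index.
    """
    closes = [df["close"] for df in bars.values()]
    # join='inner' keeps only the index values present in every series
    combined = pd.concat(closes, join="inner", axis=1)
    combined.columns = [symbol.lower() for symbol in bars]
    return combined


# Invented data: the two series overlap on two of their three dates.
idx_a = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
idx_b = pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-07"])
bars = {
    "HO2": pd.DataFrame({"close": [1.0, 2.0, 3.0]}, index=idx_a),
    "RB2": pd.DataFrame({"close": [10.0, 20.0, 30.0]}, index=idx_b),
}

combined = combine_closes(bars)
print(combined)
```

Only 2020-01-03 and 2020-01-06 survive the join, because those are the only dates both series share. With join='outer' (the pandas default) all four dates would be kept, with NaN where a series has no value.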

The structure of the database is available from Tradelib’s source code (I am using the SQLite version for this test). To bootstrap (create) the database, I use sqlite3.exe’s .read command, to which I pass data.sqlite.sql as a parameter. To be used via the CsiDb class, the database is configured using a TOML configuration file:

flavor = "SQLite"
db = "sqlite:///C:/Users/qmoron/Documents/csidata.sqlite"
bars_table = "csi_bars"

Now a little rant: in the above code, I tried to create a module, instrumentdb, to keep the source code in. This created some problems while developing the module. Apparently, once a module is loaded, it’s pretty hard to re-load it properly within the same REPL session. Coming from R, where I am used to re-loading files, or even packages, as my development goes, that seemed quite an obstacle. After struggling with the issue for a while, the best I was able to come up with is the above approach of using a full-blown “main” file to drive the execution and some tests. This is unlikely to scale (in the sense of using it for rapid REPL prototyping). I am open to suggestions.
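
One option that may help with the re-loading problem is importlib.reload from the standard library. The sketch below is not taken from the instrumentdb code. It writes a throwaway module called scratch_mod (a made-up name) into a temporary directory, imports it, changes the source, and reloads it in the same interpreter.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_module(directory, name, source):
    """Write `source` to <directory>/<name>.py and return the path."""
    path = Path(directory) / f"{name}.py"
    path.write_text(source)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # First version of the throwaway module.
    write_module(tmp, "scratch_mod", 'def answer():\n    return "v1"\n')
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    import scratch_mod
    first = scratch_mod.answer()

    # Edit the source on disk, as one would during development.
    write_module(tmp, "scratch_mod", 'def answer():\n    return "version 2"\n')

    # Re-execute the module's code in the existing module object.
    scratch_mod = importlib.reload(scratch_mod)
    second = scratch_mod.answer()

    # Clean up so the temporary module does not linger.
    sys.path.remove(tmp)
    sys.modules.pop("scratch_mod", None)

print(first, second)
```

Reloading has limits. Names bound earlier with from module import name still point at the old objects, and instances created before the reload keep their old class. For interactive work, IPython's autoreload extension (%load_ext autoreload followed by %autoreload 2) automates the same idea, with similar caveats.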
