Loading Data with Pandas

June 27, 2016

(This article was first published on R – Quintuitive, and kindly contributed to R-bloggers)

On at least a couple of occasions lately, I realized that I might need Python in the near future. While I have amassed some limited experience with the language over the years, I never spent the time to understand Pandas, its de facto standard data-frame library.

Where does one start? For me, it’s usually with the data: the simple stuff, such as loading and wrangling. Rewriting my little R6 helper class for loading futures data looked like a perfect candidate.

There was some frustration, which was entirely expected after years of experience with R. Some things were less intuitive; surprisingly, however, pretty much nothing was downright ugly. 🙂 And when it comes to code, I am not easy to please. The end result is available here.

Here is a little example of how to use the code, although one can’t do much without the data, which I can’t distribute:

import pandas as pd
import instrumentdb as idb

def main():
    # Create the object for the database
    db = idb.CsiDb()

    # Load the bars for three symbols
    bars = db.mload_bars(["HO2", "RB2", "CL2"])
    print(bars['HO2'].head())
    print(bars['RB2'].head())

    # Build a list of the closing prices for each series
    closes = []
    for ss in bars.keys():
        closes.append(bars[ss]['close'])

    # Create a single data frame using these series; the inner join
    # keeps only the index labels present in every series
    all_df = pd.concat(closes, join='inner', axis=1)
    all_df.columns = [xx.lower() for xx in bars.keys()]

    print(all_df.tail())

    # That's the only line that would work without the data.
    print(db.future_list())

if __name__ == "__main__":
    main()
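
To see what join='inner' does in the pd.concat call without access to the database, here is a minimal sketch on synthetic data. The dates and prices are made up purely for illustration, and passing a dict to pd.concat (so that the keys become the column names) is a variation on the loop above, not what the example itself does.

```python
import pandas as pd

# Synthetic closing prices; the dates and values are invented for illustration.
ho = pd.Series(
    [1.0, 1.1, 1.2],
    index=pd.to_datetime(["2016-06-01", "2016-06-02", "2016-06-03"]),
    name="close",
)
rb = pd.Series(
    [2.0, 2.1, 2.2],
    index=pd.to_datetime(["2016-06-02", "2016-06-03", "2016-06-06"]),
    name="close",
)

# With a dict, pd.concat uses the keys as column names when axis=1.
closes = {"ho2": ho, "rb2": rb}

# The inner join keeps only the dates that appear in both series.
inner = pd.concat(closes, axis=1, join="inner")

# The outer join keeps every date and fills the gaps with NaN.
outer = pd.concat(closes, axis=1, join="outer")

print(inner)
print(outer)
```

With the inner join, only the two dates shared by both series survive; the outer join keeps all four dates and leaves missing values where a series has no bar for that day.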

The structure of the database is available from Tradelib’s source code (I am using the SQLite version for this test). To bootstrap (create) the database, I use the .read command of sqlite3.exe, to which I pass data.sqlite.sql as a parameter. To be used via the CsiDb class, the database is configured using a TOML configuration file:

flavor = "SQLite"
db = "sqlite:///C:/Users/qmoron/Documents/csidata.sqlite"
bars_table = "csi_bars"

Now a little rant: in the above code, I tried to create a module, instrumentdb, to hold the source code. This created some problems while developing the module. Apparently, once a module is loaded, it’s pretty hard to reload it properly within the same REPL session. Coming from R, where I am used to reloading files, or even packages, as my development goes, that seemed quite an obstacle. After struggling with the issue for a while, the best I was able to come up with is the above approach of using a full-blown “main” file to drive the execution and some tests. This is unlikely to scale (in the sense of rapid REPL prototyping), so I am open to suggestions.
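
One commonly suggested option for the reloading problem is importlib.reload from the standard library. The sketch below is a self-contained demonstration rather than a tested recipe for instrumentdb: it writes a throwaway module named scratch_mod (a made-up name) to a temporary directory, imports it, edits the file, and reloads it in the same interpreter.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip .pyc files so the reload always recompiles from the edited source.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    # scratch_mod is a hypothetical stand-in for a module under development.
    module_path = Path(tmp) / "scratch_mod.py"
    module_path.write_text("def answer():\n    return 'old'\n")

    sys.path.insert(0, tmp)
    import scratch_mod

    first = scratch_mod.answer()

    # Simulate editing the module in an editor while the REPL stays open.
    module_path.write_text("def answer():\n    return 'new'\n")

    # Re-execute the module's source in place and rebind the name.
    scratch_mod = importlib.reload(scratch_mod)
    second = scratch_mod.answer()

    sys.path.remove(tmp)

print(first, second)
```

There are caveats: names pulled in with "from module import name" keep pointing at the old objects, and instances created before the reload still use the old class definitions. In IPython, the autoreload extension (%load_ext autoreload followed by %autoreload 2) automates the same idea, with similar limitations.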

