Loading Data with Pandas
On at least a couple of occasions lately, I realized that I may need Python in the near future. While I have amassed some limited experience with the language over the years, I never spent the time to understand Pandas, its de-facto standard data-frame library.
Where does one start? For me it's usually with the data. Simple stuff: loading, wrangling, etc. Rewriting my little R6 helper class that loads futures data looked like a perfect candidate.
There was some frustration, which is to be expected after years of experience with R. Some things were less intuitive; surprisingly, however, pretty much nothing was downright ugly. And when it comes to code, I am not easy to please. The end result is available here.
Here is a little example of how to use the code, although one can’t do much without the data, which I can’t distribute:
import pandas as pd
import instrumentdb as idb

def main():
    # Create the object for the database
    db = idb.CsiDb()

    # Load the data for three series
    # (named `bars` rather than `all`, which would shadow the built-in)
    bars = db.mload_bars(["HO2", "RB2", "CL2"])
    print(bars['HO2'].head())
    print(bars['RB2'].head())

    # Build a list of the closing prices for each series
    closes = []
    for ss in bars.keys():
        closes.append(bars[ss]['close'])

    # Create a single data frame using these series
    all_df = pd.concat(closes, join='inner', axis=1)
    all_df.columns = [xx.lower() for xx in bars.keys()]
    print(all_df.tail())

    # That's the only line that would work without the data.
    print(db.future_list())

if __name__ == "__main__":
    main()
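Since the example above needs the database, here is a minimal sketch of just the pd.concat step that runs on its own. The prices and dates are made up for illustration, and the plain dict stands in for whatever mload_bars returns (assumed here to be a mapping from symbol to data).

```python
import pandas as pd

# Synthetic closing prices (made-up numbers) with partially overlapping dates
ho = pd.Series(
    [2.10, 2.12, 2.15],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    name="close",
)
rb = pd.Series(
    [1.70, 1.72, 1.69],
    index=pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-07"]),
    name="close",
)

# Stand-in for the result of mload_bars (an assumption, not the real return type)
bars = {"HO2": ho, "RB2": rb}

# join='inner' keeps only the dates present in every series
closes_df = pd.concat(list(bars.values()), join="inner", axis=1)
closes_df.columns = [symbol.lower() for symbol in bars]
print(closes_df)
```

Only the two dates shared by both series survive the inner join, which is usually what one wants before computing anything across contracts.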
The structure of the database is available from Tradelib’s source code (I am using the SQLite version for this test). To bootstrap (create) the database, I use sqlite3.exe’s .read command, to which I pass data.sqlite.sql as a parameter. To be used via the CsiDb class, the database is configured using a TOML configuration file:
flavor = "SQLite"
db = "sqlite:///C:/Users/qmoron/Documents/csidata.sqlite"
bars_table = "csi_bars"
Now a little rant: In the above code, I tried to create a module, instrumentdb, to keep the source code in. This created some problems while developing the module. Apparently, once a module is loaded, it’s pretty hard to reload it properly within the same REPL session. Coming from R, where I am used to reloading files, or even packages, as development goes on, that seemed quite an obstacle. After struggling with the issue for a while, the best I was able to come up with is the above approach of using a full-blown “main” file to drive the execution and some tests. This is unlikely to scale for rapid REPL prototyping; I am open to suggestions.
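One standard-library option that may help with the reloading problem is importlib.reload, which re-executes a module's source in the running interpreter. The sketch below is self-contained: it writes a throwaway module named demo_instrumentdb (a hypothetical stand-in, not the real instrumentdb) to a temporary directory, imports it, "edits" it, and reloads it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Demo only: skip the bytecode cache. This script rewrites the module file
# within the same second, and a cached .pyc could otherwise look current.
sys.dont_write_bytecode = True

# Hypothetical stand-in for a module under development
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "demo_instrumentdb.py"
module_file.write_text("def future_list():\n    return ['HO2']\n")

# Make the temporary directory importable and import the module
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import demo_instrumentdb as idb

before = idb.future_list()

# Simulate editing the source file in an editor
module_file.write_text("def future_list():\n    return ['HO2', 'RB2']\n")

# Re-execute the module's code in place; `idb` still refers to the same
# module object, but its functions now come from the edited source
importlib.reload(idb)
after = idb.future_list()

print(before, after)
```

There are known limits: names pulled in with "from module import name" keep pointing at the old objects, and instances created before the reload keep their old class. In IPython or Jupyter, the autoreload extension (%load_ext autoreload followed by %autoreload 2) automates the same idea and may be closer to the R workflow.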
The post Loading Data with Pandas appeared first on Quintuitive.