Kindle clippings.txt with Python
[This article was first published on Max Humber, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exactly a year ago I posted Kindle clippings.txt with R. Since then things have changed… I’m a Pythonista now! Consequently, I thought it would be fun to update that post and parse highlights with 3.6+ and pandas. Janky, but it works:
import pandas as pd
txt = """Sourdough (Robin Sloan)
- Your Highlight on page 187 | Location 2853-2855 | Added on Tuesday, October 2, 2017 8:47:09 PM
The world is going to change, I think—slowly at first, then faster than anyone expects.
==========
Sapiens (Yuval Noah Harari)
- Your Highlight on page 196 | Location 2996-2997 | Added on Tuesday, October 3, 2017 8:51:09 PM
Evolution has made Homo sapiens, like other social mammals, a xenophobic creature.
==========
Life 3.0 (Max Tegmark)
- Your Highlight on page 75 | Location 1136-1137 | Added on Wednesday, October 11, 2017 6:00:15 PM
In short, computation is a pattern in the spacetime arrangement of particles
==========
"""
with open('clippings.txt', 'w', encoding='utf-8-sig') as f:
f.write(txt)
with open('clippings.txt', 'r', encoding='utf-8-sig') as f:
contents = f.read().replace(u'\ufeff', '')
lines = contents.rsplit('==========')
store = {'author': [], 'title': [], 'quote': []}
for line in lines:
try:
meta, quote = line.split(')\n- ', 1)
title, author = meta.split(' (', 1)
_, quote = quote.split('\n\n')
store['author'].append(author.strip())
store['title'].append(title.strip())
store['quote'].append(quote.strip())
except ValueError:
pass
df = pd.DataFrame(store)
print(df.to_csv(index=False, encoding='utf-8-sig'))
# author,quote,title
# Robin Sloan,"The world is going to change, I think—slowly at first, then faster than anyone expects.",Sourdough
# Yuval Noah Harari,"Evolution has made Homo sapiens, like other social mammals, a xenophobic creature.",Sapiens
# Max Tegmark,"In short, computation is a pattern in the spacetime arrangement of particles",Life 3.0
Right now I’m 49 books deep. It’s crunch time, but I can see the end! Look for my annual 52 Quotes post in a couple of days!
To leave a comment for the author, please follow the link and comment on their blog: Max Humber.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.