Kindle clippings.txt with Python

[This article was first published on Max Humber, and kindly contributed to R-bloggers.]

Exactly a year ago I posted Kindle clippings.txt with R. Since then things have changed… I’m a Pythonista now! Consequently, I thought it would be fun to update that post and parse highlights with Python 3.6+ and pandas. Janky, but it works:

import pandas as pd

txt = """Sourdough (Robin Sloan)
- Your Highlight on page 187 | Location 2853-2855 | Added on Tuesday, October 2, 2017 8:47:09 PM

The world is going to change, I think—slowly at first, then faster than anyone expects.
==========
Sapiens (Yuval Noah Harari)
- Your Highlight on page 196 | Location 2996-2997 | Added on Tuesday, October 3, 2017 8:51:09 PM

Evolution has made Homo sapiens, like other social mammals, a xenophobic creature.
==========
Life 3.0 (Max Tegmark)
- Your Highlight on page 75 | Location 1136-1137 | Added on Wednesday, October 11, 2017 6:00:15 PM

In short, computation is a pattern in the spacetime arrangement of particles
==========
"""

# Write the sample to disk so the example is self-contained
with open('clippings.txt', 'w', encoding='utf-8-sig') as f:
    f.write(txt)

with open('clippings.txt', 'r', encoding='utf-8-sig') as f:
    # utf-8-sig drops the leading BOM; the replace is there in case any turn up mid-file
    contents = f.read().replace('\ufeff', '')

# Clippings are separated by a row of ten equals signs
lines = contents.split('==========')
store = {'author': [], 'title': [], 'quote': []}
for line in lines:
    try:
        meta, quote = line.split(')\n- ', 1)
        title, author = meta.split(' (', 1)
        # maxsplit=1 keeps quotes that contain a blank line of their own
        _, quote = quote.split('\n\n', 1)
        store['author'].append(author.strip())
        store['title'].append(title.strip())
        store['quote'].append(quote.strip())
    except ValueError:
        # Skip chunks that don't look like a highlight (e.g. the trailing blank one)
        pass

# Pin the column order so the output doesn't depend on the pandas version
df = pd.DataFrame(store, columns=['author', 'quote', 'title'])
print(df.to_csv(index=False, encoding='utf-8-sig'))
# author,quote,title
# Robin Sloan,"The world is going to change, I think—slowly at first, then faster than anyone expects.",Sourdough
# Yuval Noah Harari,"Evolution has made Homo sapiens, like other social mammals, a xenophobic creature.",Sapiens
# Max Tegmark,"In short, computation is a pattern in the spacetime arrangement of particles",Life 3.0
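The script above throws away the page, location, and timestamp on the "- Your Highlight" line. If you want those too, here is a rough sketch of a slightly less janky variant. The `META` pattern and the `parse_clipping` helper are my own hypothetical names, and the pattern assumes every entry looks exactly like the three samples (a page number, a location range, and an English-language date). Notes, bookmarks, and highlights without a page number won't match and are simply skipped.

```python
import re
from datetime import datetime

# Hypothetical pattern for the metadata line; it assumes the
# "page | Location a-b | Added on <weekday>, <date>" layout from the samples.
META = re.compile(
    r"- Your Highlight on page (?P<page>\d+) \| "
    r"Location (?P<start>\d+)-(?P<end>\d+) \| "
    r"Added on \w+, (?P<added>.+)"
)


def parse_clipping(block):
    """Return a dict for one clipping, or None if it doesn't fit the assumed layout."""
    lines = [line.strip() for line in block.replace("\ufeff", "").strip().splitlines()]
    if len(lines) < 3:
        return None
    # Split on the last " (" so titles that contain parentheses stay intact
    title, sep, author = lines[0].rpartition(" (")
    match = META.match(lines[1])
    if not sep or not author.endswith(")") or match is None:
        return None
    return {
        "title": title,
        "author": author[:-1],
        "page": int(match["page"]),
        "location": (int(match["start"]), int(match["end"])),
        # The weekday name is skipped by the regex; only the date and time are parsed
        "added": datetime.strptime(match["added"], "%B %d, %Y %I:%M:%S %p"),
        "quote": "\n".join(lines[2:]).strip(),
    }


sample = """Sapiens (Yuval Noah Harari)
- Your Highlight on page 196 | Location 2996-2997 | Added on Tuesday, October 3, 2017 8:51:09 PM

Evolution has made Homo sapiens, like other social mammals, a xenophobic creature.
==========
"""

# Keep only the chunks that parsed cleanly
records = [r for r in map(parse_clipping, sample.split("==========")) if r]
```

`records` is a list of dicts, so `pd.DataFrame(records)` gets you back to a DataFrame. Returning `None` for anything unexpected plays the same role as the `try`/`except ValueError` above, just a bit more explicitly.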

Right now I’m 49 books deep. It’s crunch time, but I can see the end! Look for my annual 52 Quotes post in a couple of days!
