30 issues of Demographic Digest – the most frequent journals

[This article was first published on Ilya Kashnitsky, and kindly contributed to R-bloggers].

Today, the 30th issue of my Demographic Digest was published.

Demographic Digest is my project that started in November 2015. Twice a month I select fresh demographic papers and write brief summaries of them in Russian to be published in Demoscope Weekly, the most popular Russian journal/website in social sciences. If you read Russian, you may want to browse the archive or visit the website of the project (which is still to be filled).

The project is in a transitional phase now. Since 2016, Demographic Digest has welcomed contributions from external authors. In February 2017 I launched the first iteration of a project for the students of National Research University Higher School of Economics.

To draw a line under the first phase of the project, I analysed which journals supplied papers to Demographic Digest most frequently. I also wanted to try visualizing data with treemaps, which I mentioned in the bonus part (see note 1) of the latest digest issue.

For that, I exported the bibliographic data of all the papers covered in Demographic Digest. I use Zotero as a reference manager; the paper records are exported as a single .bib file, which I then saved as a plain-text (.txt) file. Then I read this data into R, cleaned it, and finally visualized it.

# load required packages
library(tidyverse)
library(stringr)
library(readxl)
library(extrafont)
myfont <- "Roboto Condensed"

df <- data.frame(lines = readLines("https://ikashnitsky.github.io/doc/misc/dd-stats/dd-bib.txt")) %>% 
        mutate(lines = lines %>% as.character()) %>% 
        # grab only the lines that contain journals' titles
        filter(lines %>% str_detect("journaltitle")) %>% 
        # remove everything that is not the bare journal's title
        transmute(journals = lines %>% 
                       str_replace_all(pattern = "\tjournaltitle = |\\Q{\\E|\\Q}\\E,|\\Q}\\E", 
                                       replacement = "")) %>% 
        # calculate frequencies
        group_by(journals) %>% 
        summarise(n = n())
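
To make the cleaning step easier to follow without downloading the file, here is a minimal sketch of the same idea on a few made-up lines. The records in `bib_lines` are hypothetical and only imitate the layout of a Zotero .bib export; the regular expression and the use of `count()` are my simplification, not the exact code above.

```r
library(dplyr)
library(stringr)

# hypothetical lines, imitating a few records of a Zotero .bib export
bib_lines <- c(
  "@article{doe_2016,",
  "\ttitle = {Some paper},",
  "\tjournaltitle = {Population Studies},",
  "}",
  "@article{roe_2017,",
  "\tjournaltitle = {Demography},",
  "}",
  "@article{poe_2017,",
  "\tjournaltitle = {Population Studies},",
  "}"
)

toy <- tibble(lines = bib_lines) %>%
  # keep only the lines that hold a journal title
  filter(str_detect(lines, "journaltitle")) %>%
  # strip the field name with its opening brace, and the closing brace with an optional comma
  transmute(journals = str_replace_all(lines, "^\\s*journaltitle = \\{|\\},?$", "")) %>%
  # count() is shorthand for group_by() + summarise(n = n())
  count(journals, sort = TRUE)

toy
```

The result is a two-row table of journal titles and their frequencies, which has the same shape as `df` above.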

For one journal title, Ageing and Society, I failed to replace the “\&” using regular expressions, so I fixed it manually. I also corrected the title of the Lancet by removing the article “The”. Finally, I corrected the frequencies for Population Studies and Population and Development Review by subtracting 6 from each, because for both journals I provided lists of the most cited papers as a bonus. Following the same logic, I removed the papers that appeared in the bonus part from the data.

# correct "Ageing and Society"
df[1, 1] <- "Ageing and Society"

# correct the title of Lancet
df <- df %>% mutate(journals = journals %>% str_replace("The Lancet", "Lancet"))

# correct "Population and Development Review" and "Population Studies" for 6 each
# Reason - top cited papers bonus
df[df$journals %in% c("Population and Development Review", "Population Studies"), 2] <- 
        df[df$journals %in% c("Population and Development Review", "Population Studies"), 2] - 6
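
The stubborn “\&” can probably be handled without a manual fix by matching it literally instead of as a regular expression. The sketch below assumes that the raw line looks like the hypothetical `raw` string, with the ampersand escaped by a backslash; `fixed()` from stringr switches off regex interpretation for the pattern.

```r
library(stringr)

# hypothetical raw line; the doubled backslash in R source is a single backslash in the string
raw <- "\tjournaltitle = {Ageing \\& Society},"

# strip the field name, the braces and the trailing comma
title <- str_replace_all(raw, "^\\s*journaltitle = |\\{|\\}|,$", "")

# replace the literal backslash-ampersand pair; fixed() means no regex escaping is needed
title <- str_replace_all(title, fixed("\\&"), "and")

title
```

With a literal match there is no need to work out how many backslashes the regex engine expects, which is the usual source of trouble with this character pair.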

To provide some additional metrics of the journals, I downloaded bibliometric data from the SCImago Journal & Country Rank project (http://www.scimagojr.com/aboutus.php). Demographic journals usually have rather low SJR compared to medical journals; that’s why I downloaded the data only for journals in Social Sciences (the .xlsx file). Then I read the data into R and joined it to my data frame.

# read SJR data for journals in Social Sciences
# read_excel() needs a local file, so download the workbook to a temporary file first
sjr_file <- tempfile(fileext = ".xlsx")
download.file("https://ikashnitsky.github.io/doc/misc/dd-stats/scimagojr.xlsx", sjr_file, mode = "wb")
sjr <- readxl::read_excel(sjr_file, 1) %>% 
        mutate(id = Title %>% tolower())

# join the data frames; note that I create an "id" variable in lower case
df_sjr <- left_join(df %>% mutate(id = journals %>% tolower()), sjr, by = "id")
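
Because only Social Sciences journals are in the SJR file, some titles will not find a match and will get a missing SJR after the join. Here is a small sketch of the lower-case join and of a check for unmatched titles. The tables `freq` and `sjr_toy` and all the values in them are made up for illustration.

```r
library(dplyr)

# hypothetical miniature versions of the two tables; all numbers are invented
freq <- tibble(journals = c("Population Studies", "Lancet", "Demography"),
               n = c(10L, 3L, 7L))
sjr_toy <- tibble(Title = c("Population studies", "DEMOGRAPHY"),
                  SJR = c(1.2, 2.5))

# join on a lower-case key so that differences in capitalization do not break the match
joined <- freq %>%
  mutate(id = tolower(journals)) %>%
  left_join(sjr_toy %>% mutate(id = tolower(Title)), by = "id")

# journals without an SJR value, i.e. not found in the SJR table
unmatched <- joined %>% filter(is.na(SJR)) %>% pull(journals)

unmatched
```

Listing the unmatched titles after a join like this makes it easy to tell a journal that is truly absent from the SJR table apart from one that is merely spelled differently.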

Finally, it’s time to visualize the data. I use the amazing treemap package (see note 2).

# Treemap visualization
library(treemap)

treemap(dtf = df_sjr, 
        index = "journals", 
        vSize = "n", 
        vColor = "SJR", 
        type = "value",
        n = 5,
        palette = "BrBG", 
        border.col = "grey10", 
        title = "Journals' frequency in Demographic Digest",
        title.legend = "SJR (only social sciences)",
        fontfamily.title = myfont,
        fontfamily.labels = myfont,
        fontfamily.legend = myfont,
        drop.unused.levels = TRUE)

Here is how the output looks.

[Figure: treemap of journals' frequency in Demographic Digest]

Note that the lion’s share of Population Studies is mainly explained by the first issue of Demographic Digest, in which I covered all the papers from the brilliant special issue Population — The long view.

  1. I finish each issue of Demographic Digest with a bonus, in which I cover fun papers, discuss some academia-related issues, or just provide links to cool visualizations and projects. 

  2. I also tried portfolio and treemapify, but liked the output from treemap most. 
