Postgresql + R Sandbox

[This article was first published on --Jean Arreola--, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

ElephantSQL

ElephantSQL offers a free instance of Postgresql, with a limit of 20 MB and 5 concurrent connections. For example, you can upload a shiny application that depends on data from ElephantSQL.

You only need to register to the site and automatically you can acces your free instance.

In this post we will see how to take advantage of this cloud database.

Getting the data

For this example I will use the open data of air quality available in the page of SEDEMA (Environment Secretary) of Mexico City.

The data is structured by one csv file per year, and is avalilable from 1992.

<span class="c1">#Auxiliary function to download the files
</span><span class="w">
</span><span class="n">load_sedema</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">year</span><span class="p">){</span><span class="w">

  </span><span class="c1">#URL to the file
</span><span class="w">  </span><span class="c1">#from 1992
</span><span class="w">  </span><span class="n">link</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="s2">"http://148.243.232.112:8080/opendata/IndiceCalidadAire/indice_"</span><span class="p">,</span><span class="n">year</span><span class="p">,</span><span class="s2">".csv"</span><span class="p">)</span><span class="w"> 
  
  </span><span class="c1">#Columns classes
</span><span class="w">  </span><span class="n">types</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"character"</span><span class="p">,</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="s2">"numeric"</span><span class="p">,</span><span class="m">26</span><span class="p">))</span><span class="w">
  
  </span><span class="c1">#Download the file
</span><span class="w">  </span><span class="n">air_data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read.csv</span><span class="p">(</span><span class="n">link</span><span class="p">,</span><span class="n">skip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">9</span><span class="p">,</span><span class="w"> </span><span class="n">stringsAsFactors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">,</span><span class="w"> </span><span class="n">encoding</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"latin1"</span><span class="p">,</span><span class="w"> </span><span class="n">header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">,</span><span class="w">
                       </span><span class="n">colClasses</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">types</span><span class="p">,</span><span class="w"> </span><span class="n">na.string</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"NA"</span><span class="p">)</span><span class="w">
  
  </span><span class="c1">#Remove missing data
</span><span class="w">  </span><span class="n">air_data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">air_data</span><span class="p">[</span><span class="o">!</span><span class="n">air_data</span><span class="p">[,</span><span class="m">1</span><span class="p">]</span><span class="o">==</span><span class="s2">""</span><span class="p">,</span><span class="m">1</span><span class="o">:</span><span class="m">27</span><span class="p">]</span><span class="w">
  
  </span><span class="c1">#Fix time variable
</span><span class="w">  </span><span class="n">air_data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="n">substring</span><span class="p">(</span><span class="n">air_data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">6</span><span class="p">),</span><span class="w"> </span><span class="n">year</span><span class="p">)</span><span class="w"> </span><span class="c1">#We need to asure that all dates are from the specified year
</span><span class="w">  
  </span><span class="nf">return</span><span class="p">(</span><span class="n">air_data</span><span class="p">)</span><span class="w">

</span><span class="p">}</span><span class="w">
</span>

Next step is to create the table on Postgresql, now that we know thw structure of the csv.

<span class="n">library</span><span class="p">(</span><span class="n">RPostgreSQL</span><span class="p">)</span><span class="w">


</span><span class="c1"># SQL query to create main table if it not exists
</span><span class="w">
</span><span class="s2">"
CREATE TABLE IF NOT EXISTS air_quality (
  FECHA date,  
  HORA integer,
  NO_OZONO integer,
  NO_AZUFRE integer,
  NO_NITROGENO integer,
  NO_CARBONO integer,
  NO_PM10 integer,
  NE_OZONO integer,
  NE_AZUFRE integer,
  NE_NITROGENO integer,
  NE_CARBONO integer,
  NE_PM10 integer,
  CE_OZONO integer,
  CE_AZUFRE integer,
  CE_NITROGENO integer,
  CE_CARBONO integer,
  CE_PM10 integer,
  SO_OZONO integer,
  SO_AZUFRE integer,
  SO_NITROGENO integer,
  SO_CARBONO integer,
  SO_PM10 integer,
  SU_OZONO integer,
  SU_AZUFRE integer,
  SU_NITROGENO integer,
  SU_CARBONO integer,
  SU_PM10 integer,
  ID serial,
  PRIMARY KEY (ID)
)
"</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">query</span><span class="w">

</span><span class="c1">#Be sure to change your credentials! You can check them on the Details window on your ElephantSQL instance!
</span><span class="w">
</span><span class="c1">#dbname is the user & default database
#host is the serve
#you can get the port from URL
</span><span class="w">
</span><span class="c1"># Connect to database
</span><span class="w">
</span><span class="n">drv</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dbDriver</span><span class="p">(</span><span class="s2">"PostgreSQL"</span><span class="p">)</span><span class="w">

</span><span class="n">con</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dbConnect</span><span class="p">(</span><span class="n">drv</span><span class="p">,</span><span class="w"> </span><span class="n">dbname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">user</span><span class="p">,</span><span class="w"> 
                 </span><span class="n">host</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db_url</span><span class="p">,</span><span class="w"> </span><span class="n">port</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5432</span><span class="p">,</span><span class="w">
                 </span><span class="n">user</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">user</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pwd</span><span class="p">)</span><span class="w">


</span><span class="c1"># Create table
</span><span class="w">
</span><span class="n">dbGetQuery</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">query</span><span class="p">)</span><span class="w">
</span>

Next we upload the table from one year

<span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">upload_sedema</span><span class="p">(</span><span class="m">2017</span><span class="p">)</span><span class="w">

</span><span class="c1">#Correct format for date
</span><span class="n">data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">strptime</span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="s2">"%d/%m/%Y"</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"/"</span><span class="p">,</span><span class="s2">"-"</span><span class="p">,</span><span class="n">data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="p">)</span><span class="w">

</span><span class="c1">#Set ID
</span><span class="n">data</span><span class="o">$</span><span class="n">id</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="n">ind</span><span class="p">,</span><span class="n">nrow</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ind</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">

</span><span class="c1">#Upload data
</span><span class="w">  
</span><span class="n">dbWriteTable</span><span class="p">(</span><span class="n">conn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"air_quality"</span><span class="p">,</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="n">append</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">,</span><span class="w"> </span><span class="n">row.names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">)</span><span class="w">
</span>

Now you can upload all of the years! Be sure to check the full script

elephantsql

We can query the data now.

<span class="n">query</span><span class="w"> </span><span class="o"><-</span><span class="w"> 
</span><span class="s1">'
SELECT 
  * 
FROM 
  "public"."air_quality" 
LIMIT 100
  
'</span><span class="w">

</span><span class="n">last100</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dbGetQuery</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">query</span><span class="p">)</span><span class="w">

</span><span class="n">head</span><span class="p">(</span><span class="n">last100</span><span class="p">)</span><span class="w">

</span><span class="c1"># Close the connection
</span><span class="w">  
</span><span class="nf">on.exit</span><span class="p">(</span><span class="n">dbDisconnect</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## Loading required package: methods
## Loading required package: DBI
##        fecha hora no_ozono no_azufre no_nitrogeno no_carbono no_pm10
## 1 1992-04-01    7       55        34           10         43      NA
## 2 1992-04-01    8       72        39           15         46      NA
## 3 1992-04-01    9       80        44           25         52      NA
## 4 1992-04-01   10       84        48           31         62      NA
## 5 1992-04-01   11      161        43           45         73      NA
## 6 1992-04-01   12      250        41           42         82      NA
##   ne_ozono ne_azufre ne_nitrogeno ne_carbono ne_pm10 ce_ozono ce_azufre
## 1       70        24           19         43      NA       56        39
## 2       68        25           21         43      NA       56        37
## 3       62        35           30         46      NA       68        41
## 4       47        40           33         47      NA       85        43
## 5       81        37           28         47      NA      123        45
## 6       89        32           19         47      NA      185        38
##   ce_nitrogeno ce_carbono ce_pm10 so_ozono so_azufre so_nitrogeno
## 1           20         46      NA       34        26            9
## 2           23         45      NA       46        29           10
## 3           36         48      NA       54        32           15
## 4           64         55      NA       62        34           26
## 5           50         59      NA       81        35           19
## 6           38         62      NA      124        35           16
##   so_carbono so_pm10 su_ozono su_azufre su_nitrogeno su_carbono su_pm10 id
## 1         27      NA       25        18           16         64      NA  1
## 2         31      NA       31        20           18         65      NA  2
## 3         38      NA       32        24           21         65      NA  3
## 4         45      NA       42        26           36         65      NA  4
## 5         47      NA       69        24           40         66      NA  5
## 6         49      NA       55        22           27         67      NA  6

I hope this little example can help you to try PostgreSQL even if you don’t have it installed on your computer or if you don’t have a server.

To leave a comment for the author, please follow the link and comment on their blog: --Jean Arreola--.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)