Geospatial Queries using PyMongo in R

[This article was first published on mlampros, and kindly contributed to R-bloggers].

Since I submitted the geojsonR package, I have been interested in running geospatial MongoDB queries on GeoJSON data. I decided to use PyMongo (through the reticulate package) after opening two GitHub issues, here and here. The PyMongo library is large and covers a lot of functionality; my intention, however, was only to be able to run geospatial queries from within R.

The GeoMongo package

The GeoMongo package allows the user,

  • to insert and query GeoJSON data (the only supported format) using the geomongo R6 class
  • to read data in either JSON (through the geojsonR package) or BSON format (I’ll explain later when BSON is necessary for inserting data)
  • to validate a JSON instance against a schema using the json_schema_validator() function (the input parameters are R named lists)
  • to run MongoDB console commands using the mongodb_console() function, which relies on the base R system() function. Console commands are needed, for instance, for bulk import / export of data, as documented here and here.

I was able to reproduce the majority of the geospatial MongoDB queries found in a number of blog posts on the web (system requirements: MongoDB (>= 3.4) and Python (>= 3.5)). I’ll use the following two to explain how the GeoMongo package can be used for this purpose:

queries based on first example blog post

When inserting data using the geomongo R6 class, the user can give (via the TYPE_DATA parameter) a character string (or vector), a list, a file or a folder of files as input. To start with, I’ll use the following character strings (they appear in the first example blog post; the “_id” fields were removed),

```r
library(GeoMongo)


# important : the property-names of each geojson object should be of type character string

loc1 = '{
          "name" : "Squaw Valley",
          "location" : {
              "type" : "Point",
              "coordinates" : [
                  -120.24,
                  39.21
              ]
          }
      }'


loc2 = '{
          "name" : "Mammoth Lakes",
          "location" : {
              "type" : "Point",
              "coordinates" : [
                  -118.9,
                  37.61
              ]
          }
      }'


loc3 = '{
          "name" : "Aspen",
          "location" : {
              "type" : "Point",
              "coordinates" : [
                  -106.82,
                  39.18
              ]
          }
      }'


loc4 = '{
          "name" : "Whistler",
          "location" : {
              "type" : "Point",
              "coordinates" : [
                  -122.95,
                  50.12
              ]
          }
      }'


# create a vector of character strings

char_FILES = c(loc1, loc2, loc3, loc4)
```
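When the locations already live in a table rather than in hand-written strings, an equivalent character vector can be built programmatically. The following is only a sketch in base R: the `places` data.frame and the `point_to_geojson()` helper are hypothetical names that are not part of GeoMongo, and the helper assumes that the names contain no characters that need JSON escaping (such as double quotes).

```r
# Hypothetical table holding the same four locations as loc1 ... loc4
places <- data.frame(
  name = c("Squaw Valley", "Mammoth Lakes", "Aspen", "Whistler"),
  lon  = c(-120.24, -118.9, -106.82, -122.95),
  lat  = c(39.21, 37.61, 39.18, 50.12),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of GeoMongo): builds one JSON document per
# location; GeoJSON positions list the longitude before the latitude
point_to_geojson <- function(name, lon, lat) {
  sprintf(
    '{"name" : "%s", "location" : {"type" : "Point", "coordinates" : [%s, %s]}}',
    name, lon, lat
  )
}

# character vector with one JSON string per row of 'places'
char_FILES_alt <- point_to_geojson(places$name, places$lon, places$lat)

cat(char_FILES_alt, sep = "\n")
```

The resulting `char_FILES_alt` vector has the same role as the hand-written `char_FILES` above: one JSON string per location.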

Before inserting the data, one should make sure that a MongoDB server is running on the machine. Information on how to install MongoDB can be found here.

The geomongo R6 class will be initialized and a database and collection will be created,

```r
init = geomongo$new(host = 'localhost', port = 27017)    # assuming MongoDB runs locally

getter_client = init$getClient()                          # get MongoClient()

init_db = getter_client[["example_db"]]                   # create a new database

init_col = init_db$create_collection("example_col")       # create a new collection
```

After the preliminary steps, one can continue by inserting the char_FILES object into the relevant database / collection using the geoInsert method. Here the TYPE_DATA parameter is set to dict_many, meaning that the input can be either a list of lists (nested list) or a character vector of strings,

```r
init$geoInsert(DATA = char_FILES,              # input data

               TYPE_DATA = 'dict_many',        # character vector of strings as input

               COLLECTION = init_col,          # specify the relevant collection

               GEOMETRY_NAME = "location")     # give the 'geometry name' of each geo-object
```

One can now run various commands to check the correctness of the inserted data,

```r
init_db$collection_names()    # prints out the collection names of the relevant database
```

```
"example_col"
```

```r
init_col$find_one()          # prints one of the inserted geometry objects
```

```
$`_id`
5984a0b742b2563fb5838f6a

$location
$location$type
[1] "Point"

$location$coordinates
[1] -120.24   39.21


$name
[1] "Squaw Valley"
```

```r
init_col$count()          # prints the number of the inserted geometry objects
```

```
[1] 4
```

I’ll continue reproducing some of the geo-queries of the first example blog post from within an R-session.

The first query finds the locations that lie in the state of Colorado, where Colorado is approximated by the following GeoJSON rectangle,

```json
{
  "type": "Polygon",
  "coordinates": [[
    [-109, 41],
    [-102, 41],
    [-102, 37],
    [-109, 37],
    [-109, 41]
  ]]
}
```

The corresponding MongoDB query would be,

```javascript
db.locations.find({
  location: {
    $geoIntersects: {
      $geometry: {
        type: "Polygon",
        coordinates: [[
          [-109, 41],
          [-102, 41],
          [-102, 37],
          [-109, 37],
          [-109, 41]
        ]]
      }
    }
  }
})
```

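This query can be translated to R in the following way:

  • curly braces (JSON objects) correspond to named R lists
  • arrays of size 2 (coordinate pairs) correspond to R vectors, and the arrays that contain them to unnamed R lists,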

```r
query_geoIntersects = list('location' =

                             list('$geoIntersects' =

                                    list('$geometry' =

                                           list(

                                             type = "Polygon",

                                             coordinates =

                                               list(

                                                 list(

                                                   c(-109, 41),

                                                   c(-102, 41),

                                                   c(-102, 37),

                                                   c(-109, 37),

                                                   c(-109, 41)
                                                 )
                                               )
                                           )
                                    )
                             )
                      )
```
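The mapping from JSON arrays to R lists and vectors can also be generated rather than typed by hand. The sketch below uses a hypothetical helper, `ring_to_coordinates()` (not part of GeoMongo), which turns a two-column matrix of longitude / latitude rows into the nested structure used for the coordinates of a single-ring GeoJSON polygon.

```r
# Hypothetical helper (not part of GeoMongo): converts a two-column matrix
# of (longitude, latitude) rows into the nested list structure used for the
# 'coordinates' member of a GeoJSON Polygon with a single ring
ring_to_coordinates <- function(ring) {
  stopifnot(is.matrix(ring), ncol(ring) == 2)
  # a GeoJSON linear ring is closed: first and last positions are identical
  stopifnot(all(ring[1, ] == ring[nrow(ring), ]))
  # outer list = array of rings, inner list = array of positions,
  # each position = numeric vector of length 2
  list(lapply(seq_len(nrow(ring)), function(i) as.numeric(ring[i, ])))
}

# the rectangle that approximates Colorado, one position per row
colorado_ring <- matrix(c(-109, 41,
                          -102, 41,
                          -102, 37,
                          -109, 37,
                          -109, 41), ncol = 2, byrow = TRUE)

colorado_geometry <- list(type = "Polygon",
                          coordinates = ring_to_coordinates(colorado_ring))

str(colorado_geometry)
```

The `colorado_geometry` list has the same shape as the value given to `'$geometry'` in the hand-written query.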

The find METHOD of the geoQuery function is then used to return the locations that lie within the boundaries of Colorado,

```r
loc_intersect = init$geoQuery(QUERY = query_geoIntersects,      # query from previous chunk

                              METHOD = "find",                  # the method to use

                              COLLECTION = init_col,            # the collection to use

                              GEOMETRY_NAME = "location",       # the geometry name to use

                              TO_LIST = FALSE)                  # returns a data.table

loc_intersect
```

The output can be returned either as a list or as a data.table (the latter when TO_LIST = FALSE, as above),

```
# data.table format

   location.type location.coordinates1 location.coordinates2  name                       id
1:         Point               -106.82                 39.18 Aspen 5984a0b742b2563fb5838f6c
```
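This result can be sanity-checked without a database. The sketch below is a plain base R bounding-box test on the four inserted locations; the `locations` data.frame is a hypothetical stand-in for the collection. MongoDB evaluates GeoJSON polygons on a sphere, so a planar test like this one is only an approximation and can disagree for points close to the polygon edges.

```r
# Hypothetical stand-in for the collection: the four inserted locations
locations <- data.frame(
  name = c("Squaw Valley", "Mammoth Lakes", "Aspen", "Whistler"),
  lon  = c(-120.24, -118.9, -106.82, -122.95),
  lat  = c(39.21, 37.61, 39.18, 50.12),
  stringsAsFactors = FALSE
)

# bounds of the rectangle that approximates Colorado
in_colorado <- locations$lon >= -109 & locations$lon <= -102 &
               locations$lat >= 37   & locations$lat <= 41

locations$name[in_colorado]
# [1] "Aspen"
```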

The next few code chunks show how to return the documents that lie within a certain distance of a given point using the $geoWithin and $centerSphere operators (here, the locations within a circle of radius 300 miles centered on San Francisco, approximately latitude 37.7, longitude -122.5).

```javascript
// MongoDB query

db.locations.find({
  location: {
    $geoWithin: {
      $centerSphere: [[-122.5, 37.7], 300 / 3963.2]
    }
  }
})
```

The corresponding query in R is,

```r
geoWithin_sph = list('location' =

                       list('$geoWithin' =

                              list('$centerSphere' =

                                     list(

                                       c(-122.5, 37.7), 300 / 3963.2)
                              )
                       )
                )


# no need to specify again the "COLLECTION" and "GEOMETRY_NAME" parameters
# as we use the same initialization of the R6 class with the previous query

res_geoWithin_sph = init$geoQuery(QUERY = geoWithin_sph,

                                  METHOD = "find")
res_geoWithin_sph
```
```
# example output

   location.type location.coordinates1 location.coordinates2           name                       id
1:         Point               -118.90                 37.61  Mammoth Lakes 5984a0b742b2563fb5838f6b
2:         Point               -120.24                 39.21   Squaw Valley 5984a0b742b2563fb5838f6a
```

One can read more about the magic number 3963.2 (the approximate equatorial radius of the Earth in miles, used to convert the 300 miles to radians) either in the first example blog post or in the MongoDB documentation.
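To make the conversion concrete, the following base R sketch computes the radius in radians and then checks, with a haversine great-circle distance, which of the four locations fall within 300 miles of the query point. The `haversine_miles()` helper and the `locations` data.frame are hypothetical and not part of GeoMongo; MongoDB's own spherical computation may differ slightly in the last digits.

```r
earth_radius_miles <- 3963.2

# radius of the query circle in radians, as passed to $centerSphere
radius_radians <- 300 / earth_radius_miles

# Hypothetical helper: haversine great-circle distance in miles between
# (lon1, lat1) and (lon2, lat2), all given in decimal degrees
haversine_miles <- function(lon1, lat1, lon2, lat2, radius = earth_radius_miles) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# Hypothetical stand-in for the collection: the four inserted locations
locations <- data.frame(
  name = c("Squaw Valley", "Mammoth Lakes", "Aspen", "Whistler"),
  lon  = c(-120.24, -118.9, -106.82, -122.95),
  lat  = c(39.21, 37.61, 39.18, 50.12),
  stringsAsFactors = FALSE
)

# distance of every location from the query point (San Francisco)
locations$miles_from_sf <- haversine_miles(-122.5, 37.7,
                                           locations$lon, locations$lat)

locations$name[locations$miles_from_sf <= 300]
# [1] "Squaw Valley"  "Mammoth Lakes"
```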

Here one can also plot the output locations using the leaflet package,

```r
map_dat <- leaflet::leaflet()

map_dat <- leaflet::addTiles(map_dat)

map_dat <- leaflet::addMarkers(map_dat,

                               lng = unlist(res_geoWithin_sph$location.coordinates1),

                               lat = unlist(res_geoWithin_sph$location.coordinates2))
map_dat
```


The next query utilizes the aggregate method to return the locations sorted by distance from a given point,

```javascript
// MongoDB query

db.locations.aggregate([{
  $geoNear: {
    near: {
      type: 'Point',
      coordinates: [-122.5, 37.1]
    },
    spherical: true,
    maxDistance: 900 * 1609.34,
    distanceMultiplier: 1 / 1609.34,
    distanceField: 'distanceFromSF'
  }
}])
```
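With a GeoJSON point as the near argument, $geoNear works in meters, which is why the query multiplies the 900 miles by 1609.34 and uses the inverse as distanceMultiplier to report the distances in miles again. A small base R sketch of this arithmetic (the variable names are only illustrative):

```r
meters_per_mile <- 1609.34

# maxDistance: the 900 mile limit expressed in meters
max_distance_meters <- 900 * meters_per_mile

# distanceMultiplier: converts the returned distances from meters to miles
distance_multiplier <- 1 / meters_per_mile

max_distance_meters
# [1] 1448406

# applying the multiplier to the limit recovers the 900 miles
max_distance_meters * distance_multiplier
```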

The corresponding query in R is,

<span class="w">
</span><span class="n">query_geonear</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="s1">'$geoNear'</span><span class="w"> </span><span class="o">=</span><span class="w"> 
                       
                       </span><span class="nf">list</span><span class="p">(</span><span class="n">near</span><span class="w"> </span><span class="o">=</span><span class="w"> 
                              
                              </span><span class="nf">list</span><span class="p">(</span><span class="w">
                                
                                </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Point"</span><span class="p">,</span><span class="w"> 
                                  
                                </span><span class="n">coordinates</span><span class="w"> </span><span class="o">=</span><span class="w"> 
                                  
                                  </span><span class="nf">c</span><span class="p">(</span><span class="m">-122.5</span><span class="p">,</span><span class="w"> </span><span class="m">37.1</span><span class="p">)</span><span class="w">
                                
                                </span><span class="p">),</span><span class="w"> 
                            
                            </span><span class="n">distanceField</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"distanceFromSF"</span><span class="p">,</span><span class="w"> 
                            
                            </span><span class="n">maxDistance</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">900</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="m">1609.34</span><span class="p">,</span><span class="w">
                            
                            </span><span class="n">distanceMultiplier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">1609.34</span><span class="p">,</span><span class="w"> 
                            
                            </span><span class="n">spherical</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
                     </span><span class="p">)</span><span class="w">


</span><span class="n">func_quer_geonear</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">init</span><span class="o">$</span><span class="n">geoQuery</span><span class="p">(</span><span class="n">QUERY</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">query_geonear</span><span class="p">,</span><span class="w"> 
                                  
                                  </span><span class="n">METHOD</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"aggregate"</span><span class="p">)</span><span class="w">
</span><span class="n">func_quer_geonear</span><span class="w">


</span>

<span class="w">
</span><span class="c1"># example output
</span><span class="w">
   </span><span class="n">distanceFromSF</span><span class="w"> </span><span class="n">location.type</span><span class="w"> </span><span class="n">location.coordinates1</span><span class="w"> </span><span class="n">location.coordinates2</span><span class="w">           </span><span class="n">name</span><span class="w">                           </span><span class="n">id</span><span class="w">
</span><span class="m">1</span><span class="o">:</span><span class="w">       </span><span class="m">190.8044</span><span class="w">         </span><span class="n">Point</span><span class="w">               </span><span class="m">-120.24</span><span class="w">                 </span><span class="m">39.21</span><span class="w">   </span><span class="n">Squaw</span><span class="w"> </span><span class="n">Valley</span><span class="w">     </span><span class="m">5984</span><span class="n">a</span><span class="m">0</span><span class="n">b</span><span class="m">742</span><span class="n">b</span><span class="m">2563</span><span class="n">fb5838f6a</span><span class="w">
</span><span class="m">2</span><span class="o">:</span><span class="w">       </span><span class="m">201.0443</span><span class="w">         </span><span class="n">Point</span><span class="w">               </span><span class="m">-118.90</span><span class="w">                 </span><span class="m">37.61</span><span class="w">  </span><span class="n">Mammoth</span><span class="w"> </span><span class="n">Lakes</span><span class="w">     </span><span class="m">5984</span><span class="n">a</span><span class="m">0</span><span class="n">b</span><span class="m">742</span><span class="n">b</span><span class="m">2563</span><span class="n">fb5838f6b</span><span class="w">
</span><span class="m">3</span><span class="o">:</span><span class="w">       </span><span class="m">863.9478</span><span class="w">         </span><span class="n">Point</span><span class="w">               </span><span class="m">-106.82</span><span class="w">                 </span><span class="m">39.18</span><span class="w">          </span><span class="n">Aspen</span><span class="w">     </span><span class="m">5984</span><span class="n">a</span><span class="m">0</span><span class="n">b</span><span class="m">742</span><span class="n">b</span><span class="m">2563</span><span class="n">fb5838f6c</span><span class="w">

</span>
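
The unit handling in this $geoNear stage can be checked by hand. The sketch below is an illustration in Python (the language PyMongo runs in) and is not part of GeoMongo. It assumes a haversine great-circle distance and an Earth radius of 6,378.1 km, the figure the MongoDB documentation quotes for spherical calculations; the helper name `haversine_m` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378100.0   # assumed radius (6,378.1 km), see the text above
METERS_PER_MILE = 1609.34    # same conversion factor as in the query


def haversine_m(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


near = (-122.5, 37.1)            # query point (longitude, latitude)
squaw_valley = (-120.24, 39.21)  # first row of the example output

max_distance_m = 900 * METERS_PER_MILE        # maxDistance is passed in meters
dist_m = haversine_m(*near, *squaw_valley)    # raw distance in meters
dist_miles = dist_m * (1 / METERS_PER_MILE)   # effect of distanceMultiplier

print(round(dist_miles, 1), dist_m <= max_distance_m)
```

Under these assumptions the computed value should land close to the 190.8 reported for Squaw Valley, which suggests that distanceFromSF is expressed in miles only because of the distanceMultiplier, while maxDistance itself is interpreted in meters.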

queries based on the second (MongoDB) documentation example

I picked this documentation example to show how to use the command METHOD in addition to the find and aggregate methods.

First, I’ll create a new collection (places) and then insert the example data,

<span class="w">
</span><span class="n">places_col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">init_db</span><span class="o">$</span><span class="n">create_collection</span><span class="p">(</span><span class="s2">"places"</span><span class="p">)</span><span class="w">       </span><span class="c1"># create a new collection
</span><span class="w">
</span>
<span class="w">
</span><span class="c1"># important : the property-names of each geojson object should be of type character string
</span><span class="w">
</span><span class="n">place1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'{
          "name": "Central Park",
          "location": { "type": "Point", "coordinates": [ -73.97, 40.77 ] },
          "category": "Parks"
          }'</span><span class="w">


</span><span class="n">place2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'{
         "name": "Sara D. Roosevelt Park",
         "location": { "type": "Point", "coordinates": [ -73.9928, 40.7193 ] },
         "category": "Parks"
        }'</span><span class="w">


</span><span class="n">place3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'{
       "name": "Polo Grounds",
       "location": { "type": "Point", "coordinates": [ -73.9375, 40.8303 ] },
       "category": "Stadiums"
        }'</span><span class="w">


</span><span class="c1"># create a vector of character strings
</span><span class="w">
</span><span class="n">doc_FILES</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">place1</span><span class="p">,</span><span class="w"> </span><span class="n">place2</span><span class="p">,</span><span class="w"> </span><span class="n">place3</span><span class="p">)</span><span class="w">

</span>

<span class="w">
</span><span class="n">init</span><span class="o">$</span><span class="n">geoInsert</span><span class="p">(</span><span class="n">DATA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">doc_FILES</span><span class="p">,</span><span class="w">               </span><span class="c1"># insert data
</span><span class="w">               
               </span><span class="n">TYPE_DATA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'dict_many'</span><span class="p">,</span><span class="w">        </span><span class="c1"># character vector of strings as input
</span><span class="w">               
               </span><span class="n">COLLECTION</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">places_col</span><span class="p">,</span><span class="w">        </span><span class="c1"># specify the relevant collection
</span><span class="w">               
               </span><span class="n">GEOMETRY_NAME</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"location"</span><span class="p">)</span><span class="w">     </span><span class="c1"># give the 'geometry name' of each geo-object
</span><span class="w">
</span>
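
Before inserting, it can help to confirm that each string is valid JSON (property names quoted) and that the Point is given as [longitude, latitude] within valid ranges. The following is a minimal sketch in Python, independent of GeoMongo; the function name `check_point_document` is hypothetical and the checks are deliberately basic.

```python
import json


def check_point_document(doc_string, geometry_name="location"):
    """Parse a JSON string and run basic sanity checks on its GeoJSON Point.

    Returns the parsed dict, or raises ValueError describing the problem.
    """
    try:
        doc = json.loads(doc_string)   # fails if property names are unquoted
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")

    geom = doc.get(geometry_name)
    if not isinstance(geom, dict) or geom.get("type") != "Point":
        raise ValueError(f"'{geometry_name}' is not a GeoJSON Point")

    coords = geom.get("coordinates")
    if not (isinstance(coords, list) and len(coords) == 2
            and all(isinstance(c, (int, float)) for c in coords)):
        raise ValueError("this sketch expects [longitude, latitude]")

    lon, lat = coords
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("coordinates out of range (expected [longitude, latitude])")
    return doc


place1 = ('{"name": "Central Park", '
          '"location": {"type": "Point", "coordinates": [-73.97, 40.77]}, '
          '"category": "Parks"}')
doc = check_point_document(place1)
print(doc["name"])
```

A range check of this kind only catches swapped coordinates when the swap produces an impossible latitude, so it complements rather than replaces a careful look at the data.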

<span class="w">
</span><span class="c1"># outputs the collection names
</span><span class="w">
</span><span class="n">init_db</span><span class="o">$</span><span class="n">collection_names</span><span class="p">()</span><span class="w">

</span>
<span class="w">
</span><span class="c1"># example output
</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"places"</span><span class="w">      </span><span class="s2">"example_col"</span><span class="w">

</span>

<span class="w">
</span><span class="n">places_col</span><span class="o">$</span><span class="n">count</span><span class="p">()</span><span class="w">          </span><span class="c1"># number of geojson objects in collection
</span><span class="w">
</span>
<span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">3</span><span class="w">

</span>

Once the data is inserted, it can be queried using the command METHOD.

For this particular method, the syntax differs between the MongoDB shell and PyMongo. The following code chunk shows the MongoDB runCommand,

<span class="w">
</span><span class="n">db.runCommand</span><span class="p">(</span><span class="w">
   </span><span class="p">{</span><span class="w">
     </span><span class="n">geoNear</span><span class="o">:</span><span class="w"> </span><span class="s2">"places"</span><span class="p">,</span><span class="w">
     </span><span class="n">near</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">type</span><span class="o">:</span><span class="w"> </span><span class="s2">"Point"</span><span class="p">,</span><span class="w"> </span><span class="n">coordinates</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="m">-73.9667</span><span class="p">,</span><span class="w"> </span><span class="m">40.78</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">},</span><span class="w">
     </span><span class="n">spherical</span><span class="o">:</span><span class="w"> </span><span class="n">true</span><span class="p">,</span><span class="w">
     </span><span class="n">query</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">category</span><span class="o">:</span><span class="w"> </span><span class="s2">"Parks"</span><span class="w"> </span><span class="p">}</span><span class="w">
   </span><span class="p">}</span><span class="w">
</span><span class="p">)</span><span class="w">

</span>

which corresponds to the following query in GeoMongo (similar to PyMongo),

<span class="w">
</span><span class="n">Args_Kwargs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="s2">"geoNear"</span><span class="p">,</span><span class="w"> </span><span class="s2">"places"</span><span class="p">,</span><span class="w">

                   </span><span class="n">near</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="s2">"type"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Point"</span><span class="p">,</span><span class="w"> </span><span class="s2">"coordinates"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">-73.9667</span><span class="p">,</span><span class="w"> </span><span class="m">40.78</span><span class="p">)),</span><span class="w">

                   </span><span class="n">spherical</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w">

                   </span><span class="n">query</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="s2">"category"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Parks"</span><span class="p">))</span><span class="w">

</span>
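
To make the mapping explicit: the unnamed elements of Args_Kwargs play the role of positional arguments and the named elements the role of keyword arguments. The sketch below mimics, in plain Python and without a database connection, how such a call can be assembled into the command document of the runCommand example. The function `build_command` is a hypothetical stand-in, not PyMongo's or GeoMongo's actual code.

```python
def build_command(name, value=1, **kwargs):
    """Assemble a command document: the command name first, then the options."""
    command = {name: value}   # dicts preserve insertion order (Python 3.7+)
    command.update(kwargs)    # keyword arguments become the remaining fields
    return command


cmd = build_command(
    "geoNear", "places",
    near={"type": "Point", "coordinates": [-73.9667, 40.78]},
    spherical=True,
    query={"category": "Parks"},
)
print(cmd)
```

The ordering is the point of the exercise: MongoDB treats the first key of a command document as the command name, so "geoNear" has to come before the options.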

Information about the various parameters of the command method can be found in the PyMongo documentation.

Then the GeoMongo command method takes the parameters in the same way as the find or aggregate methods,

<span class="w">
</span><span class="n">init</span><span class="o">$</span><span class="n">geoQuery</span><span class="p">(</span><span class="n">QUERY</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Args_Kwargs</span><span class="p">,</span><span class="w"> 
              
              </span><span class="n">METHOD</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w"> 
              
              </span><span class="n">COLLECTION</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">places_col</span><span class="p">,</span><span class="w"> 
              
              </span><span class="n">DATABASE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">init_db</span><span class="p">,</span><span class="w">             </span><span class="c1"># additionally I have to specify the database
</span><span class="w">
              </span><span class="n">TO_LIST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">


</span>

which returns only the documents whose category property equals ‘Parks’,

<span class="w">
</span><span class="n">obj.category</span><span class="w"> </span><span class="n">obj.location.type</span><span class="w"> </span><span class="n">obj.location.coordinates1</span><span class="w"> </span><span class="n">obj.location.coordinates2</span><span class="w">               </span><span class="n">obj.name</span><span class="w">      </span><span class="n">dis</span><span class="w">                       </span><span class="n">id</span><span class="w">
       </span><span class="n">Parks</span><span class="w">             </span><span class="n">Point</span><span class="w">                  </span><span class="m">-73.9700</span><span class="w">                   </span><span class="m">40.7700</span><span class="w">           </span><span class="n">Central</span><span class="w"> </span><span class="n">Park</span><span class="w"> </span><span class="m">1147.422</span><span class="w"> </span><span class="m">5985</span><span class="n">b</span><span class="m">4</span><span class="n">d</span><span class="m">242</span><span class="n">b</span><span class="m">2563</span><span class="n">fb5838f6e</span><span class="w">
       </span><span class="n">Parks</span><span class="w">             </span><span class="n">Point</span><span class="w">                  </span><span class="m">-73.9928</span><span class="w">                   </span><span class="m">40.7193</span><span class="w"> </span><span class="n">Sara</span><span class="w"> </span><span class="n">D.</span><span class="w"> </span><span class="n">Roosevelt</span><span class="w"> </span><span class="n">Park</span><span class="w"> </span><span class="m">7106.506</span><span class="w"> </span><span class="m">5985</span><span class="n">b</span><span class="m">4</span><span class="n">d</span><span class="m">242</span><span class="n">b</span><span class="m">2563</span><span class="n">fb5838f6f</span><span class="w">

</span>
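
The column names in this output follow from the shape of the geoNear reply, where each result holds a distance (dis, in meters for GeoJSON points) and the matching document (obj). As an illustration only (GeoMongo's own conversion may work differently), nested fields can be flattened into dotted names as follows; `flatten` is a hypothetical helper and the input is a hand-written result modeled on the first row above.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into a single dict with dotted column names.

    List elements get a 1-based suffix (coordinates1, coordinates2), mirroring
    the column names in the example output.
    """
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(item, name))
    elif isinstance(value, list):
        for i, item in enumerate(value, start=1):
            flat.update(flatten(item, f"{prefix}{i}"))
    else:
        flat[prefix] = value
    return flat


result = {
    "dis": 1147.422,
    "obj": {
        "category": "Parks",
        "location": {"type": "Point", "coordinates": [-73.97, 40.77]},
        "name": "Central Park",
    },
}
row = flatten(result)
print(sorted(row))
```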

The following two blog posts also include a variety of geospatial queries (here and here).

More details about the geomongo R6 class and each of its methods (read_mongo_bson(), geoInsert(), geoQuery()) can be found in the Details and Methods sections of the package documentation.

When to input data in bson rather than in json format (applies to the geomongo R6 class)

When inserting data into MongoDB, there are cases where the _id appears in the following format,

<span class="w">
</span><span class="c1"># data taken from :  https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/
</span><span class="w">
</span><span class="n">example_dat</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'{"_id":
                      {"$oid":"55cba2476c522cafdb053add"},
                "location":
                      {"coordinates":[-73.856077,40.848447],"type":"Point"},
                "name":"Morris Park Bake Shop"}'</span><span class="w">

</span>

<span class="w">
</span><span class="n">bson_col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">init_db</span><span class="o">$</span><span class="n">create_collection</span><span class="p">(</span><span class="s2">"example_bson"</span><span class="p">)</span><span class="w">       </span><span class="c1"># create a new collection
</span><span class="w">
</span>

Inserting example_dat into bson_col with read_method = "geojsonR" raises an error,

<span class="w">
</span><span class="n">init</span><span class="o">$</span><span class="n">geoInsert</span><span class="p">(</span><span class="n">DATA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">example_dat</span><span class="p">,</span><span class="w">             </span><span class="c1"># insert data
</span><span class="w">             
              </span><span class="n">TYPE_DATA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'dict_one'</span><span class="p">,</span><span class="w">          </span><span class="c1"># single character string as input
</span><span class="w">             
              </span><span class="n">COLLECTION</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">bson_col</span><span class="p">,</span><span class="w">           </span><span class="c1"># specify the relevant collection
</span><span class="w">             
              </span><span class="n">GEOMETRY_NAME</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"location"</span><span class="p">,</span><span class="w">      </span><span class="c1"># give the 'geometry name' of each geo-object
</span><span class="w">              
              </span><span class="n">read_method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"geojsonR"</span><span class="p">)</span><span class="w">

</span>
<span class="w">
</span><span class="c1"># example output
</span><span class="w">
</span><span class="n">Error</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">py_call_impl</span><span class="p">(</span><span class="n">callable</span><span class="p">,</span><span class="w"> </span><span class="n">dots</span><span class="o">$</span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="n">dots</span><span class="o">$</span><span class="n">keywords</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> 
  </span><span class="n">InvalidDocument</span><span class="o">:</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="s1">'$oid'</span><span class="w"> </span><span class="n">must</span><span class="w"> </span><span class="n">not</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="s1">'$'</span><span class="w">

</span>

This error is also explained in a similar StackOverflow question.
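
The `{"$oid": ...}` wrapper is MongoDB Extended JSON, a way of writing a BSON ObjectId in plain JSON. A plain JSON parser keeps `$oid` as an ordinary key, which is what triggers the error above, whereas a BSON-aware reader converts the wrapper into an ObjectId first. The sketch below illustrates the difference using only the Python standard library; `FakeObjectId` and `decode_oid` are hypothetical stand-ins (PyMongo ships `bson.json_util.loads` for the real conversion).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeObjectId:
    """Stand-in for bson.ObjectId, used only to keep this sketch dependency-free."""
    hex: str


def decode_oid(obj):
    """object_hook: replace a {"$oid": "..."} wrapper with a FakeObjectId."""
    if set(obj) == {"$oid"}:
        return FakeObjectId(obj["$oid"])
    return obj


example_dat = (
    '{"_id": {"$oid": "55cba2476c522cafdb053add"},'
    ' "location": {"coordinates": [-73.856077, 40.848447], "type": "Point"},'
    ' "name": "Morris Park Bake Shop"}'
)

plain = json.loads(example_dat)                             # "$oid" survives as a key
decoded = json.loads(example_dat, object_hook=decode_oid)   # wrapper is converted

print(plain["_id"])
print(decoded["_id"])
```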

In such a case, one has to change the read_method to mongo_bson to correctly insert the data,

<span class="w">
</span><span class="n">init</span><span class="o">$</span><span class="n">geoInsert</span><span class="p">(</span><span class="n">DATA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">example_dat</span><span class="p">,</span><span class="w">             </span><span class="c1"># insert data
</span><span class="w">             
              </span><span class="n">TYPE_DATA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'dict_one'</span><span class="p">,</span><span class="w">          </span><span class="c1"># single character string as input
</span><span class="w">             
              </span><span class="n">COLLECTION</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">bson_col</span><span class="p">,</span><span class="w">           </span><span class="c1"># specify the relevant collection
</span><span class="w">             
              </span><span class="n">GEOMETRY_NAME</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"location"</span><span class="p">,</span><span class="w">      </span><span class="c1"># give the 'geometry name' of each geo-object
</span><span class="w">              
              </span><span class="n">read_method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"mongo_bson"</span><span class="p">)</span><span class="w">

</span>

Finally, we can check the correctness of the inserted data,

<span class="w">
</span><span class="n">bson_col</span><span class="o">$</span><span class="n">count</span><span class="p">()</span><span class="w">

</span>
<span class="w">
</span><span class="c1"># example output
</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">1</span><span class="w">

</span>

<span class="w">
</span><span class="n">bson_col</span><span class="o">$</span><span class="n">find_one</span><span class="p">()</span><span class="w">

</span>
<span class="w">
</span><span class="c1"># example output
</span><span class="w">
</span><span class="o">$</span><span class="n">`_id`</span><span class="w">
</span><span class="m">55</span><span class="n">cba2476c522cafdb053add</span><span class="w">

</span><span class="o">$</span><span class="n">location</span><span class="w">
</span><span class="o">$</span><span class="n">location</span><span class="o">$</span><span class="n">type</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"Point"</span><span class="w">

</span><span class="o">$</span><span class="n">location</span><span class="o">$</span><span class="n">coordinates</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">-73.85608</span><span class="w">  </span><span class="m">40.84845</span><span class="w">


</span><span class="o">$</span><span class="n">name</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="s2">"Morris Park Bake Shop"</span><span class="w">

</span>

The README.md file of the GeoMongo package includes the SystemRequirements and installation instructions.

An updated version of the GeoMongo package can be found in my GitHub repository. To report bugs or issues, please use the following link: https://github.com/mlampros/GeoMongo/issues.

