India Census 2001 – Part 1
For the last few weeks I have been trying to get the 2001 Indian census data. Alas, the census website is under construction. Fortunately, the Internet rewind button works, and the literacy data was archived there. The raw data is available here.
I cleaned up the data so that it is easy to work with in R. I removed the commas from the numbers. In the urban status column, I removed the dots and capitalized the status codes. One of the urban status codes then became ‘NA’, and since R treats ‘NA’ as missing data, I changed it to NA1.
The cleaned-up data is available here. Please download it and rename it to india-census-2001.csv.
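For readers who would rather redo the cleanup themselves, here is a minimal sketch of the steps described above. It runs on a tiny made-up data frame, since the layout of the raw file is not shown here: the column names and the sample values are assumptions for illustration only. Note also that the cleaned file keeps mixed-case codes such as MCorp, so the blanket `toupper()` below is only an approximation of the capitalization step.

```r
# Toy stand-in for the raw file; names and values are made up for illustration
raw <- data.frame(
  City        = c("Alpha", "Beta", "Gamma"),
  UrbanStatus = c("m.", "c.t.", "n.a."),
  TotPop      = c("1,23,456", "9,225", "18,246"),
  stringsAsFactors = FALSE
)

# Remove the commas and convert the population to a number
raw$TotPop <- as.numeric(gsub(",", "", raw$TotPop, fixed = TRUE))

# Drop the dots and capitalize the status codes
raw$UrbanStatus <- toupper(gsub(".", "", raw$UrbanStatus, fixed = TRUE))

# read.csv() would read the string "NA" back as a missing value, so rename it
raw$UrbanStatus[raw$UrbanStatus == "NA"] <- "NA1"

print(raw)
```

An alternative that avoids the rename would be to pass a different `na.strings` value to `read.csv()` when loading the file, so that the literal text NA is kept as an ordinary string.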
Here goes the R code to explore the data:
#---------------------------------------------------------------------------
# set the working directory
# replace dir with your own path where "india-census-2001.csv" is stored
setwd("dir")

# load the plotting package
library(lattice)

india <- read.csv(file = "india-census-2001.csv", header = TRUE)

# find out the places with zero population!
india_pop_zero <- subset(india, TotPop == 0)[, c(2, 3, 4, 5)]
#---------------------------------------------------------------------------
Let us print out those places with zero population.
#---------------------------------------------------------------------------
print(india_pop_zero)
           City UrbanStatus   State District
200       Anjar           M Gujarat  Kachchh
636     Bhachau           M Gujarat  Kachchh
735        Bhuj           M Gujarat  Kachchh
1495 Gandhidham           M Gujarat  Kachchh
2173     Kandla          CT Gujarat  Kachchh
2937     Mandvi           M Gujarat  Kachchh
3128      Morvi           M Gujarat   Rajkot
3178     Mundra          CT Gujarat  Kachchh
4043      Rapar           M Gujarat  Kachchh
5119   Wankaner           M Gujarat   Rajkot
#---------------------------------------------------------------------------
Find out the population of all the towns in Kachchh district.
#---------------------------------------------------------------------------
subset(india, District == "Kachchh")[, c(2, 3, 4, 5, 6)]
           City UrbanStatus   State District TotPop
200       Anjar           M Gujarat  Kachchh      0
636     Bhachau           M Gujarat  Kachchh      0
735        Bhuj           M Gujarat  Kachchh      0
1495 Gandhidham           M Gujarat  Kachchh      0
2173     Kandla          CT Gujarat  Kachchh      0
2937     Mandvi           M Gujarat  Kachchh      0
3178     Mundra          CT Gujarat  Kachchh      0
4043      Rapar           M Gujarat  Kachchh      0
#---------------------------------------------------------------------------
Find out the population of all the towns in Rajkot district.
#---------------------------------------------------------------------------
subset(india, District == "Rajkot")[, c(2, 3, 4, 5, 6)]
                City UrbanStatus   State District TotPop
695       Bhayavadar           M Gujarat   Rajkot  18246
1298         Dhoraji           M Gujarat   Rajkot  80807
1613          Gondal           M Gujarat   Rajkot  95991
1956          Jasdan           M Gujarat   Rajkot  39041
1984 Jetpur Navagadh           M Gujarat   Rajkot 104311
3128           Morvi           M Gujarat   Rajkot      0
3521        Paddhari          CT Gujarat   Rajkot   9225
3967          Rajkot       MCorp Gujarat   Rajkot 966642
4919          Upleta           M Gujarat   Rajkot  55341
5119        Wankaner           M Gujarat   Rajkot      0
#---------------------------------------------------------------------------
It looks as if the data for the Kachchh region was not collected. I wonder why those two towns in Rajkot district (Morvi and Wankaner) suffered the same fate. Maybe they are close to the Kachchh region. Anyway, let us look at the data with non-zero population.
Let us plot the literacy rate of each city/town (x-axis) against the State (y-axis).
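To see at a glance how the zero-population rows are spread across districts, a cross-tabulation helps. The sketch below uses a hand-built subset of the rows quoted above (six towns only, in a hypothetical `towns` data frame), so that it runs without the CSV; with the full data the same `table()` call could be applied to `india` instead.

```r
# Hand-built subset of the rows shown above, so the example runs on its own
towns <- data.frame(
  City     = c("Anjar", "Bhuj", "Morvi", "Wankaner", "Rajkot", "Gondal"),
  District = c("Kachchh", "Kachchh", "Rajkot", "Rajkot", "Rajkot", "Rajkot"),
  TotPop   = c(0, 0, 0, 0, 966642, 95991)
)

# Flag the towns whose recorded population is zero
towns$ZeroPop <- towns$TotPop == 0

# Count towns per district, split by whether the population is zero
counts <- table(District = towns$District, ZeroPop = towns$ZeroPop)
print(counts)
```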
#---------------------------------------------------------------------------
india <- subset(india, TotPop > 0)

# Plot the literacy data
dotplot(State ~ 100*Literates/TotPop, xlab = "Literacy", data = india)
#---------------------------------------------------------------------------
Here goes the plot
Looking at the plot, it is no surprise that Kerala has a very high literacy rate in all its towns, with a low spread. Tamil Nadu has a bigger spread in its literacy rates. Judged by their literacy rates, the Northeastern states are doing very well in education.
Let us check which city/town has the highest and the lowest literacy in India:
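The "spread" read off the plot can also be put into numbers. Here is a small sketch that computes the median and the interquartile range of town-level literacy per state. The data frame below is entirely made up (three hypothetical states A, B and C), and the choice of median and IQR as summary statistics is mine; other measures of spread would work just as well.

```r
# Made-up literacy rates (percent) for three hypothetical states
toy <- data.frame(
  State    = rep(c("A", "B", "C"), each = 4),
  Literacy = c(88, 90, 91, 93,
               60, 70, 80, 90,
               75, 76, 78, 79)
)

# Median and interquartile range of town-level literacy within each state
med <- tapply(toy$Literacy, toy$State, median)
iqr <- tapply(toy$Literacy, toy$State, IQR)

spread <- data.frame(
  State          = names(med),
  MedianLiteracy = as.numeric(med),
  IQR            = as.numeric(iqr)
)

# Order the states from the narrowest to the widest spread
spread <- spread[order(spread$IQR), ]
print(spread)
```

With the real data, `toy$Literacy` would be replaced by `100 * india$Literates / india$TotPop` and `toy$State` by `india$State`.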
#---------------------------------------------------------------------------
# literacy rate in percent; assumed not to be a column of the CSV already,
# so it is computed here from Literates and TotPop
india$TotLiteracy <- 100 * india$Literates / india$TotPop

subset(india, TotLiteracy == max(TotLiteracy))[, c("City", "State", "District", "TotLiteracy")]
        City           State District TotLiteracy
1663 Gulmarg Jammu & Kashmir Baramula    96.23494
#---------------------------------------------------------------------------
subset(india, TotLiteracy == min(TotLiteracy))[, c("City", "State", "District", "TotLiteracy")]
        City       State District TotLiteracy
4666 Tarapur Maharashtra    Thane   0.7843697
#---------------------------------------------------------------------------
Well, this is a surprise. The city with the highest literacy is Gulmarg in Jammu & Kashmir, and the one with the lowest is Tarapur in Maharashtra. What is shocking is that the literacy rate in Tarapur is less than 1%. I hope that there was a mistake in data collection; otherwise it is a damning indictment of a huge administrative failure in that district. This is unacceptable.
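One way to look for possible data-entry problems is to flag every town whose computed literacy rate looks implausible. The sketch below does this on a made-up data frame; the town names, the numbers and the 5% lower cutoff are all arbitrary assumptions chosen for illustration, not values taken from the census.

```r
# Made-up towns; none of these numbers come from the census file
towns <- data.frame(
  City      = c("P", "Q", "R", "S"),
  TotPop    = c(1000, 2000, 5000, 800),
  Literates = c(700, 15, 4800, 900),
  stringsAsFactors = FALSE
)

# Literacy rate in percent
towns$TotLiteracy <- 100 * towns$Literates / towns$TotPop

# Flag rates below an arbitrary 5% cutoff, or above 100% (more literates
# than people, which can only be an error)
suspicious <- subset(towns, TotLiteracy < 5 | TotLiteracy > 100)
print(suspicious)
```

A flagged row is only a candidate for a closer look; it does not by itself tell us whether the figure is wrong.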
In the next few posts, I will concentrate on Tamil Nadu and Coimbatore. It should be pretty easy to modify the code in the coming posts to look at the states and districts of your interest.