I’ll be running an R course soon and I am looking for fun (public) datasets to use in data manipulation and visualization. I would like to use a single dataset that has some easy variables for the first days, but also some more challenging ones for the final days. And I want that when I put exercises, the students* are curious about finding out the answer.
[*in this case students are not ecologists]
-Movies. How many movies has Woody Allen? Is the number of movies per year increasing linearly or exponentially? That is a good theme with lots of options. IMDB releases some data, AND processing their terribly formatted txt files and assembling them would be an excellent exercise for an advanced class, but not for beginners. OMDB has an API to make searches and if you donate you can get the full database. And of course, there is an R package to use the API. This is better option for beginners.
-Music. Everyone likes music and there are 300Gb of data here. You can get also just a chunk, though, but still 2 Gb of data is probably too much for beginers.
-Football: I discarded this one for me because I know nothing about it, but I am sure it will be highly popular in Spain. An open database here.
–Kaggel datasets are also awesome. To download them you just have to register. I may use the baby names per year and US state. Everyone is curious about the most popular name the year of your birthday, for example.
–Earthquakes: This one also needs some parsing of the txt files (easier than IMDB) and will do for pretty visualizations.
-Datasets already in R: Along with the classic datasets on Iris flowers (used by Fisher!) or the cars dataset there are cooler options. For example there are lots of datasets for econometrics (some are curious), and Rstudio also released some cool ones recently (e.g. flights).
Please, add your ideas in the comments, especially if you have used them with success for teaching R. Thanks!