Chapman University DataFest Highlights

R Views

[This article was first published on R Views, and kindly contributed to R-bloggers.]

Editor’s Note: The 2017 Chapman University DataFest was held during the weekend of April 21-23. The 2018 DataFest will be held during the weekend of April 27-29.

DataFest was founded by Rob Gould in 2011 at UCLA with 40 students. In just seven years, it has grown to 31 sites in three countries. Have a look at Mine Çetinkaya-Rundel’s post “Growth of DataFest over the years” for the details. In recent years, UCLA has struggled to keep up with the growing interest and demand from southern California universities. This year, the Chapman DataFest became the second DataFest site in southern California and the largest inaugural DataFest in the history of the event: 65 students from seven universities, organized into 15 teams, stayed the whole weekend.

The event began on Friday evening with Professor Rob Gould, the founder of DataFest, giving advice on goals for the weekend. He then introduced the Expedia dataset: nearly 11 million records representing users’ online searches for hotels, plus an associated file with detailed information about the hotel destinations.

Throughout the weekend, the organizers kept students motivated with data challenges (with cell phone chargers awarded as prizes), a mini-talk on tools for joining and merging data files, and a tutorial from BitScoop on using its API integration platform.

At noon on Sunday, the students submitted their two-slide presentations via email. At 1 pm, each team had five minutes to present its findings to the six-judge panel: Johnny Lin (UCLA), Joe Kurian (MUFG Union Bank, Irvine), Tao Song (Spectrum Pharmaceuticals), Pamela Hsu (Spectrum Pharmaceuticals), Lynn Langit (AWS, GCP IoT), and Brett Danaher (Chapman University).

The judges announced winners in three official categories:

Best Insight: CSU Northridge team “Mean Squares” (Jamie Decker, Matthew Jones, Collin Miller, Ian Postel, and Seyed Sajjadi). [See Seyed’s description of his team’s experience!]

Best Visualization: Chapman University team “Winners'); Drop Table” (Dylan Bowman, William Cortes, Shevis Johnson, and Tristan Tran).

Best Use of External Data: Chapman University team “BEST” (Brandon Makin, Sarah Lasman, and Timothy Kristedja).

Additionally, “Judges’ Choice” awards for “Best Use of Statistical Models” went to the USC “Big Data” team (Hsuanpei Lee, Omar Lopez, Yi Yang Tan, Grace Xu, and Xuejia Xu) and the USC “Quants” team (Cheng (Serena) Cheng, Chelsea Lee, and Hossein Shafii).

All winners were given certificates and medallions designed by Chapman’s Ideation Lab and printed on Chapman’s MLAT Lab 3D printer (see photo).

Winners also received free student memberships in the American Statistical Association.

Many thanks go to the Silver Sponsors: Children’s Hospital Orange County Medical Intelligence and Innovation Institute, Southern California Chapter of the American Statistical Association, and Chapman University MLAT Lab; and Bronze Sponsors: Experian, RStudio, Chapman University Computational and Data Sciences and Schmid College of Science and Technology, Orange County Long Beach ASA Chapter, the Missing Variables, USC Stats Club, Luke Thelen, and Google.

Thanks also to the 45 VIP consultants from BitScoop Labs, Chapman University, Compatiko, CSU Fullerton, CSU Long Beach, CSU San Bernardino, Education Management Services, Freelance Data Analysis, Hiner Partners, Mater Dei High School, Nova Technologies, Otoy, Southern California Edison, Sonikpass, Startup, SurEmail, UC Irvine, UCLA, USC, and Woodbridge High School, many of whom spent most of the weekend working with the students.

Overall, participants were enthusiastic about meeting students from other schools and about the opportunity to work with local professionals. (See the two student perspectives below.) DataFest will continue to grow as these students return to their schools and share their enthusiasm with their classmates!

The Mean Squares Perspective

by Seyed Sajjadi

For most of our team, this DataFest was only the first or second hackathon they had ever attended, but the group gelled instantly.

Culture is important for a hackathon group, but talent and preparation play key roles in its success or failure. Our group spent more than a month preparing for this competition. We practiced, practiced, and practiced some more. We held weekly workshops where we presented the assignments we had worked on during the previous week.

The next essential for the competition may come as a surprise to most: having an artist design and prepare the presentation took an enormous amount of work off our shoulders. Throughout the competition, a very talented artist worked on a fabulous slideshow for our presentation. This may sound boastful, but letting specialized talent work on the slideshow for the entire competition is much better than designing it at the last minute.

The questions posed were not specific at all, so it was up to the participants to formulate and ask the right questions. We focused on two questions: customer acquisition and customer retention/conversion. We showed that online targeting and marketing can be optimized using feedback from regional historical data, because residents of the same state tend to have similar preferences for a given destination. For instance, most Californians go to Las Vegas to gamble, but most people from Texas go to Las Vegas for music events; analyses like these can be used to better target potential customers from neighboring regions.

Regarding customer retention and the conversion of lookers to bookers, we calculated the optimal point at which Expedia could offer additional special packages; it turned out to be around 14 sessions of interaction between the customer and the website. The biggest part of our analysis was done with hierarchical clustering.

A big part of the event was the atmosphere and the organization. The organizers invited people from industry to roam the halls, which gave us a great opportunity to meet professionals in the field of data science. All of the teams worked in one huge room. We ended up crowding around a small table with our laptops and chairs, but the room was big enough for impromptu meetings, which gave us plenty of room to breathe. This hackathon was a huge growing experience for all of us on “The Mean Squares”.

Team Pineapples’ Perspective

by Annelise Hitchman

On day one, I could tell that my enthusiasm to start working on the dataset was shared by the dozens of other students participating. The room was filled with interaction, and not just within the individual teams. I enjoyed talking with all the consultants in the room about the data, our approach, and even just what they did for work. DataFest introduced me to the kind of real-world data I had never seen in my classes. I learned quite a bit about data analysis from both my own team members and nearly everyone else at the event. Watching the final presentations was an inspiring and insightful end to DataFest. I really hope that DataFest is able to continue and become available to universities such as my own, so that all students interested in data analysis can participate.

Michael Fahy is Professor of Mathematics and Computer Science and Associate Dean, Schmid College of Science & Technology at Chapman University.
