Highlights of DataFest 2018

For the second consecutive year, I've enjoyed being part of the judging team at the annual DataFest event at Chapman University. Led by Dr. Michael Fahy and his team, this year's event attracted 24 teams from universities in the greater SoCal area.

What is DataFest?

DataFest is a weekend-long data hackathon, where teams of 2-5 university students are given a large, commercial dataset and a set of questions. The teams then work together using data science methods to produce one or more findings. They share their results during the presentation portion of the event.

We judged their work in three categories: Overall Insight, Visualization, and Data.

What do Students do with data?

This is such an interesting question for me as an industry person. I was fascinated to see the different tools, approaches and results that the teams produced. There were a couple of observations that particularly interested me:

  • Most teams augmented the data that was provided with one or more sets of public data. Students used open data, particularly government-supplied data.
  • The more types of data the teams used, the more data munging (cleaning) they had to do. Some teams got stuck in this area. I actually thought that was very useful and reflective of real-world work with data.
  • Several teams used data visualizations early in their process, to get a 'view' of the quality or information in the data. One team found a bug in the vendor's website that populated a default which skewed their data!
  • The teams generally focused on using the data to gain insight into questions in one of three areas: a) making more money for the company that provided the dataset, b) providing more useful information for students, or c) investigating relationships between socioeconomic factors (poverty, levels of education, rural markets...) and the provided dataset.
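To make the augmentation and munging steps above concrete, here is a minimal sketch of joining a provided dataset to a public table after cleaning the join key. The teams' actual code and data were not shown, so everything here is an assumption for illustration: the column names (city, state, listings, median_income), the sample rows, and the helper names clean_key and augment are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the provided dataset (column names are invented).
PROVIDED_CSV = """city,state,listings
 Orange ,ca,120
Irvine,CA,95
Fresno,CA,40
"""

# Hypothetical stand-in for a public, government-style table.
PUBLIC_CSV = """city,state,median_income
orange,CA,91000
IRVINE,CA,105000
"""


def clean_key(city, state):
    """Normalize the join key: trim whitespace and unify letter case."""
    return (city.strip().lower(), state.strip().upper())


def augment(provided_text, public_text):
    """Left-join public rows onto provided rows using the cleaned key."""
    public = {}
    for row in csv.DictReader(io.StringIO(public_text)):
        public[clean_key(row["city"], row["state"])] = int(row["median_income"])

    merged = []
    for row in csv.DictReader(io.StringIO(provided_text)):
        key = clean_key(row["city"], row["state"])
        merged.append({
            "city": key[0],
            "state": key[1],
            "listings": int(row["listings"]),
            # None marks rows with no public match, so gaps stay visible.
            "median_income": public.get(key),
        })
    return merged


rows = augment(PROVIDED_CSV, PUBLIC_CSV)
unmatched = [r["city"] for r in rows if r["median_income"] is None]
```

Without the key cleanup, " Orange " and "orange" would not match, which is the kind of small inconsistency that can stall a team once several data sources are involved. Listing the unmatched rows makes those gaps visible before any analysis starts.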

Which Tools and Languages do Students use?

In this area, it interested me to observe that there seemed to be less use of the R language than in the previous year's entries and more use of Python. The most common algorithm used was logistic regression. A couple of teams built full custom machine learning models.
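For readers unfamiliar with logistic regression, here is a minimal from-scratch sketch in Python, fit by batch gradient descent. It is not any team's code: the toy data, the learning rate, the epoch count, and the function names are assumptions made for illustration. In practice a library implementation would typically be used instead.

```python
import math


def sigmoid(z):
    """Logistic function, split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Probability that x belongs to class 1 under weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(xs[0])
    n = len(xs)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # gradient of log loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up toy data: one feature, label is 1 for the larger values.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

The fitted model returns a probability between 0 and 1, which is part of the appeal for a weekend event: the output is easy to explain to judges.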

I heard from several teams that they were resource constrained given the size of the dataset, due to the lack of storage and processing power on their laptops. As a Cloud Architect, I find it painful to hear that students are not using the public cloud in this work. Here's a quote from one team:

"It took 2 hours to render these heat maps on our laptop."

An obvious growth area is to include mentorship with one or more of the public cloud vendors for next year's event.

Hoodies and Blankets were on display during the wee hours of the hackathon.

What's Next?

Congratulations to the hardworking hosts at Chapman University and participating students on a great event. As I did last year, I invited members of the winning team to join me in real-world work. To date, I've hired one person from the winning team of 2017 - he's doing great. The energy, creativity and skills of the students inspire me.

Let's help them grow. To contribute, contact Dr. Fahy via [email protected]


Thanks for the recap, Lynn. I couldn’t agree more regarding the topic of providing more real-world (Cloud) tools for the teams. Several asked me how to execute James Peach’s recommendation to load the dataset into MySQL for quick high-level analysis, but their laptops couldn’t handle the larger import. The one team that I know succeeded in the import didn’t get viable results until just before presentation time. A standard set of on-demand Cloud resources, available to all the teams, would have allowed them to get to the analysis and discovery more quickly.

Helen Arabanos

Analytics, Data Science, Statistics, HR, Accounting, Budgets, Project Management, Business Analysis, Training, Process Improvement, Process Documentation

6 years ago

Congratulations to Hernan Padilla and his son!

William McIntyre

Managing Director | Business Enabler | Community Leader

6 years ago

Love the narrative Lynn Langit!

Michael Pollind

Software Engineer at Qualcomm

6 years ago

had a good time, just a little under the weather from the event. stayed up all night writing sql. :D

Hernan Padilla

Principal Territory Manager

6 years ago

Way to go Ryan! Congrats to all on the UCI team! Go Anteaters!!
