Education and career advice for data science students
Michael L. Brodie, Data Systems Laboratory, School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
Research.com extracted portions of this interview for: Interview with Data Science Experts: Answering Students' Questions About Data Science Trends, July 25, 2024
Data science is non-intuitive, at least at the beginning
Data science is the use of the artificial neural network (ANN) paradigm to solve problems for which there are no solutions using conventional paradigms like math, science, and computing. Students educated in conventional paradigms have an intuitive understanding of conventional problem solving, which can produce results that are correct, accurate, coherent, precise, unbiased, optimal, certain, deterministic, robust, and provable, with verifiable explanations and interpretations. Data science is far from intuitive for novices. The ANN and conventional paradigms are incomparable. ANN results can be incorrect, inaccurate, incoherent, imprecise, not robust, biased, sub-optimal, uncertain, or non-deterministic. They can be fragile or specific to the ANN analysis, the training dataset, or the inference dataset. They can lack verifiable explanations and interpretations in terms of the domain problem being addressed. They can even be wildly incorrect, e.g., hallucinations (factually incorrect, made-up, or weird answers). There is no ANN theory that can be used to prove or disprove those properties. Despite this potential for error, data science has been applied so successfully in billions of applications that it is being adopted and applied in every institution worldwide. Rather than being negative, these properties are a consequence of the potential of data science to address problems at scopes, scales, complexity, flexibility, realism, power, and speed vastly beyond conventional paradigms. The goal of a data science education is for data science to become intuitive: to understand the data science paradigm and how it differs from conventional paradigms.
1. What core skills do you believe are essential for a successful career in data science, and how should students go about acquiring them?
This question is comparable to asking what skills are required for a career in science, a mature peer of data science but a fundamentally different knowledge discovery paradigm. Both science and data science are broad fields, each offering careers as practitioners and researchers. I propose the following for planning both career paths in data science. To make an informed decision, students should become familiar with the nature of data science, e.g., using [1] or online introductions, and with the data science life cycle
2. What types of internships, projects, or real-world experiences would you recommend students seek out during their studies?
Data science is inscrutable, a fancy term meaning that we do not understand how it works: what or how a data science model learns in training or discovers in inference (problem solving). So, there is no theory with which to understand data science. It must be learned by understanding its nature [1] and workflow [2] and by applying that knowledge in practice, solving toy problems first, then real, practical problems. In 2024, every department of every university, every company, large and small, and every government agency is learning data science and its applications, just as students are. Students should research internships as an integral part of their data science research. Start by establishing relationships with real-world data science projects
3. How important is it for data science students to have knowledge in fields outside of data science, such as business or social sciences?
As described in the previous two questions, knowledge and expertise in fields outside data science
4. What emerging trends and technologies should data science students be aware of to stay ahead in the field?
The AI Revolution is in its infancy, only a decade old. Data science is the most studied, practiced and published field on the planet. Millions of data science researchers and practitioners produce new research, technology, and practical results daily. Nobody can keep up with such a deluge. Success as a data science researcher or practitioner requires investing considerable resources to keep up with developments relevant to their work and plans. As in all recommendations here, students must learn enough about data science to define an initial focus and pursue it passionately. It is impossible to cover everything. Develop research methods to discover and understand emerging trends and technologies within that focus. This may require going outside your focus periodically. Attempt to understand such developments by evaluating their impact on your current knowledge of data science. Be prepared for fundamental breakthroughs that will change or supplant your current knowledge. Another focus may arise, e.g., a different application domain, a different approach to data science. In such cases, follow your passion.
5. What advice would you give to computer science students about preparing for the transition from academia to the professional world?
Transitioning from academia to the professional world is like transitioning from not understanding to understanding data science (question 1), and from academic studies to an internship (question 2). These transitions require a research method to identify and understand opportunities, in this case in the professional world, then select and understand those that appeal to you most, then gain direct experience and knowledge of such real-world opportunities. This should be done to enable the student and the professional opportunity to determine essential skills and passion that will benefit both. Pursuing multiple such opportunities should provide multiple choices. The transition from academia to the professional world is long-term, more than five years for undergraduate degrees and more than ten years for graduate degrees. As seen in the previous answers, it helps to set professional career objectives to focus your education and internships. Considering your future life, explore and discard multiple career paths based on knowledge and passion.
6. What resources (such as galleries, workshops, or community groups) should data science students take advantage of to enhance their learning and exposure?
The remarkable, already proven value of data science as a new knowledge discovery and problem-solving paradigm has led to a world-wide demand for data science and data scientists. This has led, almost overnight, to a vast number of educational and training resources readily available from training and educational organizations in person and on the web. As with all previous answers, identifying and selecting such resources requires research, such as defining your requirements to guide the search. I have not conducted such research; hence, I can’t recommend any. However, I have observed the phenomenon of overnight data science experts. Most companies that offer AI products and services offer online tutorials on topics related to their products. Most such tutorials are free but are intended to market those products. I have often used such free tutorials by data science technology leaders including Microsoft, Google, Amazon, Anthropic, and OpenAI.
References
[1] Brodie, M.L., A framework for understanding data science, arXiv preprint, https://arxiv.org/abs/2403.00776, https://doi.org/10.48550/arXiv.2403.00776, Harvard University, March 2024.
[2] Yu, B., Barter, R.L., Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making, MIT Press, October 2024, online version.