Education and career advice for data science students
What data science students need to know. Image generated by Copilot with ChatGPT-4


Michael L. Brodie, Data Systems Laboratory, School of Engineering and Applied Sciences, Harvard University, Cambridge, MA USA

Research.com extracted portions of this interview for: Interview with Data Science Experts: Answering Students' Questions About Data Science Trends, July 25, 2024

Data science is non-intuitive, at least at the beginning

Data science is the use of the artificial neural network (ANN) paradigm to solve problems for which conventional paradigms, such as mathematics, science, and computing, offer no solutions. Students educated in conventional paradigms have an intuitive understanding of conventional problem solving, which can produce results that are correct, accurate, coherent, precise, unbiased, optimal, certain, deterministic, robust, and provable, with verifiable explanations and interpretations. Data science is far from intuitive for novices. The ANN and conventional paradigms are incomparable. ANN results can be incorrect, inaccurate, incoherent, imprecise, biased, sub-optimal, uncertain, non-deterministic, and not robust. They can be fragile, or specific to the ANN analysis, the training dataset, or the inference dataset. They can lack verifiable explanations and interpretations in terms of the domain problem being addressed, and they can even be wildly incorrect, e.g., hallucinations (factually incorrect, made-up, or weird answers). There is no ANN theory that can be used to prove or disprove these properties.

Despite these properties, data science has been applied so successfully in billions of applications that it is being adopted and applied in every institution worldwide. Rather than being purely negative, these properties stem from the very potential of data science to address problems at scopes, scales, complexity, flexibility, realism, power, and speed vastly beyond conventional paradigms. The goal of a data science education is for data science to become intuitive: to understand the data science paradigm and how it differs from conventional paradigms.

1. What core skills do you believe are essential for a successful career in data science, and how should students go about acquiring them?

This question is comparable to asking what skills are required for a career in science, a mature knowledge discovery paradigm that is a peer of data science but fundamentally different from it. Both science and data science are broad fields, each offering careers as practitioners and as researchers. I propose the following for planning either career path in data science. To make an informed decision, students should become familiar with the nature of data science, e.g., using [1] or online introductions, and with the data science life cycle, e.g., using [1] or, for more detail, the introduction to [2]. Data science is a problem-solving paradigm, or method, that can be applied to problems in any data-rich domain. Solving such problems requires knowledge and expertise both in data science and in the domain. Students should select a discipline that they are passionate about, e.g., biology, and develop core skills specific to that discipline. Because data science is now required in every discipline, students should develop data science problem-solving skills for both career paths. Having understood data science [1] and the data science workflow [2], students should develop skills in solving discipline-specific problems using data science. Students should focus on the dominant neural network models, e.g., large language models (LLMs) or convolutional neural networks (CNNs), that are most successfully applied in their chosen discipline. LLMs and CNNs are two of over 30 such models. It is always helpful to get guidance on such plans from data science and domain experts. Honor the time, commitment, and challenges involved in developing a successful data science career. Given the nature of this fascinating, powerful new technology and career, be prepared for fundamental change. More than in any other career choice, you will learn to think differently about the world.

2. What types of internships, projects, or real-world experiences would you recommend students seek out during their studies?

Data science is inscrutable, a fancy term meaning that we do not understand how it works: neither what nor how a data science model learns in training or discovers in inference (problem solving). So, there is no theory with which to understand data science. It must be learned by understanding its nature [1] and workflow [2] and by applying that knowledge in practice, solving toy problems first and then real, practical problems. In 2024, every department of every university, every company, large and small, and every government agency is learning data science and its applications, just as students are. Students should research internships as an integral part of their data science research. Start by establishing relationships with real-world data science projects, not just as a potential intern but as a data science student seeking real-world knowledge. This involves researching the organization's business and data science projects. Once the organization shows interest in you, offer basic data science assistance that is specific to one of the data science projects you discovered in your research. You must be able to offer knowledge, expertise, and labor, driven by your passion, that the organization does not have. This method has worked well with undergraduate and high school students in my graduate-level university research. Becoming valuable to a real-world data science project is a multi-year activity.

3. How important is it for data science students to have knowledge in fields outside of data science, such as business or social sciences?

As described in the answers to the previous two questions, knowledge and expertise in fields outside data science are essential to learning and understanding data science. This suggests two career choices: a career in a discipline, with data science expertise, and a career in data science, with expertise in one or more disciplines, e.g., biology and medicine.

4. What emerging trends and technologies should data science students be aware of to stay ahead in the field?

The AI Revolution is in its infancy, only a decade old. Data science is the most studied, practiced, and published field on the planet. Millions of data science researchers and practitioners produce new research, technology, and practical results daily. Nobody can keep up with such a deluge. Success as a data science researcher or practitioner requires investing considerable resources to keep up with developments relevant to one's work and plans. As with all recommendations here, students must learn enough about data science to define an initial focus and pursue it passionately. It is impossible to cover everything. Develop research methods to discover and understand emerging trends and technologies within that focus. This may require periodically going outside your focus. Attempt to understand such developments by evaluating their impact on your current knowledge of data science. Be prepared for fundamental breakthroughs that will change or supplant your current knowledge. Another focus may arise, e.g., a different application domain or a different approach to data science. In such cases, follow your passion.

5. What advice would you give to computer science students about preparing for the transition from academia to the professional world?

Transitioning from academia to the professional world is like transitioning from not understanding to understanding data science (question 1), and from academic studies to an internship (question 2). These transitions require a research method to identify and understand opportunities, in this case in the professional world, then to select and understand those that appeal to you most, and then to gain direct experience and knowledge of such real-world opportunities. This should be done so that both the student and the organization offering the opportunity can determine the essential skills and passion that will benefit both. Pursuing multiple such opportunities should provide multiple choices. The transition from academia to the professional world is long-term: more than five years for undergraduate degrees and more than ten years for graduate degrees. As seen in the previous answers, it helps to set professional career objectives to focus your education and internships. Considering your future life, explore and discard multiple career paths based on knowledge and passion.

6. What resources (such as galleries, workshops, or community groups) should data science students take advantage of to enhance their learning and exposure?

The remarkable, already proven value of data science as a new knowledge discovery and problem-solving paradigm has led to a worldwide demand for data science and data scientists. This has led, almost overnight, to a vast number of educational and training resources readily available from training and educational organizations, in person and on the web. As in the previous answers, identifying and selecting such resources requires research, starting with defining your requirements to guide your search. I have not conducted such research; hence, I can't recommend any. However, I have observed the phenomenon of overnight data science experts. Most companies that offer AI products and services offer online tutorials on topics related to their products. Most such tutorials are free but are intended to market those products. I have often used such free tutorials from data science technology leaders, including Microsoft, Google, Amazon, Anthropic, and OpenAI.

References

[1] Brodie, M.L., A framework for understanding data science, arXiv preprint, https://arxiv.org/abs/2403.00776, https://doi.org/10.48550/arXiv.2403.00776, Harvard University, March 2024.

[2] Yu, B., Barter, R.L., Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making, MIT Press, October 2024, online version.

