The 7 steps to choose the best topic for your #DataScience graduation project
Photo by Jasmine Coro on Unsplash

So, you made the very wise decision to study Data Science a while back. You have studied a range of difficult topics, and those challenges only sparked your interest to continue. And after all the long hours of studying, a light now shines at the end of this very long tunnel: your graduation project / degree thesis!

And with this bright light comes a very common question:

"Do you have any ideas for the final thesis or project?"

Not-so-spoiler alert: This article is your very, very detailed answer to this question.

But first things first, let's start with the ultimate rule for this phase:

** Attitude first, details next **

This phase will feel somewhat like the time you went shopping for your prom dress. It should have been an exciting experience, but the overwhelming number of choices for that one-time event turned it into a stressful one. When deciding on a topic for your thesis or project, you will probably feel stressed too. There is a multitude of topics, and you might not be sure what to choose, how to choose, or what to consider when deciding. You might find researching all of the different topics overwhelming, and this is completely normal and expected. What I would strongly advise is to try your best to enjoy the exploration phase in itself, regardless of your final decision. Dress shopping should always be fun, and so should choosing your thesis / project topic!

And speaking of a stress-free experience, let's start by highlighting why it is important to do enough background research before deciding on your final topic.

Your graduation project / thesis is a major opportunity. It is:

  • A chance to dig deeper into the topics that interest you: By this point, you will have taken a number of courses and implemented some methods here and there. Some you liked, and others not as much. This is your chance to gain deeper knowledge and practical experience in the topics that you liked.
  • A chance to stand out from a crowd of identical CVs: For us as interviewers and hiring managers, choosing good fresh graduates is more difficult than choosing experienced professionals. Most CVs of fresh graduates look very similar, and one of the main differentiating factors between one CV and another is the relevance of the graduation project / thesis. So use this chance to stand out from the pool of identical CVs.

If the decision is that important, what is the best way to decide on your topic?

There are two ways to decide on your graduation project or thesis:

  • A bottom-up approach: Search for “Ideas for a Data Science project or thesis” online or check Kaggle, and you will get numerous ideas. Those lists are very useful as an initial inspiration. However, you shouldn’t decide exclusively based on those lists, as the results tend to be generic and biased towards trending topics.
  • A top-down approach: To get the best topic for your project, you will need to approach the problem the other way around. First, refine the paths and methodologies that interest you, and then use the bottom-up approach to get ideas about possible topics in those areas.

So how do you do this exactly? The tactics are concrete. Here are 7 questions that will guide you through your decision process:

**********************

The 7 guiding questions:

Part I. PATH:

1. Which path would you like to take, a research path or an applied one?

Part II. PILLARS:

2. Which data type(s) are you interested in?

3. Which algorithm(s) and method(s) are you interested in?

4. Which domain(s) or function(s) are you interested in?

Part III. PROJECT:

5. Which concrete topic(s) will cover your previous preferences?

6. Which data source(s) will you use for your project / thesis?

7. Who will supervise your project / thesis?

Continue reading to understand how to answer each of the previous questions and what to consider during the final decision.

**********************

Part I: Choose your path: Research thesis or Applied project

Would you like to answer a novel research question that has never been addressed before, or would you like to apply this fascinating science to develop an AI program that addresses a present challenge?

Q1. Which path: research or applied?

If you need more details to decide, here is a comparison that will guide you through:

[Image: comparison of the research path and the applied path]

*********************

Part II: Your Pillars - Data type, Algorithm, and Domain

By now you have studied numerous topics in Machine Learning and Artificial Intelligence and their applications to different kinds of data. But a subset has probably caught your deeper interest. Now it's your chance to dig deeper.

In this section you will be guided to answer 3 questions:

  1. Which data type(s) are you interested in?
  2. Which algorithm(s) and method(s) are you interested in?
  3. Which domain(s) are you interested in?

Use the following guide to refine the data type, algorithms, and domain that you are interested in. For each of the following questions, you are not restricted to a single answer; you can choose several. It is also OK if you have no specific preference. And you can give an exclusion answer, that is, "anything but those methods".

**

A. Data Type

There is standard structured data, and there are special data types that have their own properties, techniques, and challenges.

Q2. Which data type(s) are you interested in?

Some examples of those data types are:

  • Time Series: e.g. Stock data, Sales data, Heights of ocean tides, …
  • Geographical / Spatial: e.g. Locations of earthquakes, Cell phone data, …
  • Images: e.g. Satellite images, X-rays, Scanned documents, …
  • Text: e.g. News articles, Twitter feeds, Product reviews, …
  • Sound: e.g. Music, Historical speeches, Audio with different accents, …
  • ...

What if you are interested in multiple data types? You have 2 options:

  1. A single data source that has both types. For example, if you are interested in images and geographical data, you might look for a topic that uses satellite images. Or if you are interested in text and time series, news articles might be a good fit.
  2. Different data sources of different types. For example, if you want to predict the optimal budget allocation for marketing campaigns, you can use time-series sales data as the first data source and text data from news articles as a second data source.

**

B. Algorithms & Methods

You've studied, studied, and studied. And now it's your time to choose your favourites!

Q3. Which algorithm(s) and method(s) are you interested in?

Here is a broad overview of the major areas that you should think about when choosing your topic. Do one or more of them spark your interest? Do any of them belong on your exclusion list? It is also fine if you do not have any preference.

  • Supervised learning: Classification, Regression, …
  • Semi-supervised learning: Self-training, Label propagation, …
  • Reinforcement learning: Q-learning, Policy gradient methods, …
  • Unsupervised learning: Clustering, Dimensionality reduction, Anomaly detection, …
  • Other: Bayesian methods, Evolutionary algorithms, Optimisation algorithms, Deep learning, Generative models, Graphical models, Active learning, Transfer learning, Interpretable ML, Auto ML, Ensemble methods, …

If you have chosen a research path, this is your playground! In addition to the option of digging deeper into a single method or algorithm, think also about combining multiple ones. Example? Bayesian methods for interpretable ML. That's cool!

**

C. Domain

Data Science and Machine Learning can be applied in almost any field you can think of. And now it's your time to pick one.

Q4. Which domain(s) or function(s) are you interested in?

Here are 3 sources that will help you decide which domain to choose:

  1. Prior preference: If you are passionate about a specific domain, then that's the one.
  2. Local job boards: Understanding your local job market is very important, especially if you are interested in landing a job directly after your studies. Start by searching for Data Science jobs on your local job boards. Get a good understanding of who is hiring and what they are interested in. Use this information to choose your domain.
  3. Global AI in industry reviews: There are general "Data/AI maturity levels" for the various industries. Generally, industries that are digital natives or early adopters of digital transformation have higher adoption of AI solutions. This topic has been a research area for most of the big tech and consulting firms. Search for "state of AI in industry", and you will get numerous reports. Get started with McKinsey's report to understand the overall maturity of the different industries, use cases, and domains.

**********************

Part III. Concretise your project

By now you should have decided on the first 2 Ps: Path and Pillars. That is, you should have chosen whether you will proceed with a research or an applied topic, and you should have an idea of your preferred data type and/or methodology and/or domain. Now it is time to concretise the details of your project / thesis.

**

A. Finalise your topic

Now it's time to research the possibilities within your preferred data type and/or methodology and/or domain.

Q5. Which concrete topic will you work on?

In addition to the regular search engines, which will help you find ideas from blogs and forums, use https://scholar.google.com/ and look for the latest topics and developments in the 3 pillars you are interested in.

Example: If you are interested in using Audio in Reinforcement learning:

[Image: example search results for audio in reinforcement learning]

Take the time to read and understand what has been done, and based on those insights, concretise what exactly you will be working on in your project / thesis. For example, in this case you might decide to work on an audio-based language training (speaking / listening) app that dynamically generates and assigns lessons based on the user's inferred needs and progress.
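If you ended up with several answers per pillar, it can help to list the combinations systematically before searching. The following is only a minimal sketch in Python: the function name `build_queries` and the sample preferences are hypothetical, and the resulting strings are simply meant to be pasted into a search engine or Google Scholar by hand.

```python
from itertools import product


def build_queries(data_types, methods, domains, excluded=()):
    """Combine pillar preferences into search strings, skipping excluded terms."""
    # An empty pillar means "no preference": keep one empty placeholder
    # so the other pillars are still combined.
    pillars = [list(p) or [""] for p in (data_types, methods, domains)]
    queries = []
    for combo in product(*pillars):
        # Drop any combination that contains a term from the exclusion list.
        if any(term in excluded for term in combo):
            continue
        queries.append(" ".join(term for term in combo if term))
    return queries


# Hypothetical preferences: one data type, two methods (one of them excluded),
# and no domain preference.
queries = build_queries(
    data_types=["audio"],
    methods=["reinforcement learning", "transfer learning"],
    domains=[],
    excluded=["transfer learning"],
)
print(queries)
```

With these sample preferences, the only remaining query is the audio and reinforcement learning combination from the example above.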

**

B. Find your data source(s)

Q6. Which data source(s) will you use for your project?

After deciding on the concrete topic of your project, find your datasets online. You could use https://datasetsearch.research.google.com or any other specialised portal that you trust. If you are planning to use standard datasets, you might find https://paperswithcode.com/datasets very useful.

Regardless of the source of your data, make sure that it has:

  1. Open license: Not all online datasets are free to use. Some are paid, and others are subject to conditions. Before using the data, make sure that your data source has a license that fits your usage.
  2. High quality: Before deciding on the data source that you will use, triple check the actual quality of the content. You know the rule: Garbage in, garbage out.
  3. (Optional) Data updates: If you are developing an applied solution that uses the current weather data for a certain location, you will first need batch data to train your model. Afterwards, when you proceed to deployment, you will need the same data in real time. To avoid an unpleasant surprise during deployment, check beforehand how often the source is updated.
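A first pass at the quality check can be automated before you read the data in detail. The following is only a rough sketch in Python using the standard library: the function name `summarize_quality` and the sample weather rows are hypothetical, and counting missing values and duplicate rows is just one simple starting point, not a complete quality assessment.

```python
import csv
import io


def summarize_quality(rows):
    """Return the row count, missing values per column, and duplicate-row count."""
    rows = list(rows)
    columns = list(rows[0].keys()) if rows else []
    missing = {col: 0 for col in columns}
    seen = set()
    duplicates = 0
    for row in rows:
        for col in columns:
            value = row.get(col)
            # Treat absent fields and blank strings as missing.
            if value is None or value.strip() == "":
                missing[col] += 1
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Hypothetical sample standing in for a real downloaded file.
SAMPLE_CSV = """date,city,temperature_c
2024-01-01,Cairo,18.5
2024-01-02,Cairo,
2024-01-02,Cairo,
2024-01-03,,21.0
"""

report = summarize_quality(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(report)
```

For a real dataset, you would pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample, and then look more closely at any column with many missing values.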

**

C. Agree with your supervisor(s)

Q7. Who will supervise your project?

Last but not least, after finalising your topic(s), you will need to find a supervisor and a mentor who will guide you through this journey. Your supervisor will play a crucial role in guiding and mentoring you to success. Make sure to choose a supervisor who is experienced and interested in your topic, and who has enough time to guide you through your thesis. You might also need multiple supervisors, especially if you have chosen an interdisciplinary topic. For example, if you have chosen the topic of Boosting techniques in Drug Design, you might need one supervisor who is an expert in Machine Learning and another who is an expert in Pharma or Chemistry.

**********************

There you have it: 7 simple questions to guide you to the best graduation project / degree thesis for you.

Don't forget to ENJOY this process of exploration. Our field is full of incredibly exciting opportunities, so make sure you are not missing the fun!

**********************

Catherine Sirven

LifeHub animation. Innov4Ag program at Bayer

2 years ago

Always pleasantly surprised by the insights provided by Deena Gergis in articles! Super helpful and structured ideas! Something I would certainly recommend as a read for the participant of the #dcc22 challenge Heiko Schomberg and to academic colleagues (Duplessis Sébastien). Also very interesting for Campus Région du numérique!
