Machine Learning Projects: 5 steps to Success!

This five-part blog series gives you the knowledge you need to plan and execute a Machine Learning-based Predictive Analytics project. By reading it, you will learn:

  • How your company should structure a Machine Learning project.
  • How to speak the language of Machine Learning / Data Science and avoid typical pitfalls.
  • Which economic and technological criteria matter most for achieving the highest return.

This blog series is split into 5 parts.

(Read to the end of the article for a checklist of tips.)

Why Machine Learning?

For a while now, Machine Learning or Predictive Analytics has been a hot topic. It can be considered the next logical step in applying data in business.

The majority of business decision-makers already use data to get a better understanding of their company and its processes and to make informed decisions.

Until now, data analysis has mostly been restricted to simple evaluations and visualizations. These are usually produced with classical BI tools that load log data from databases and generate so-called reports, which display diagrams such as sales figures. This approach is especially useful for checking KPIs or other measurable business goals.

However, some things still need to be checked manually: for example, why certain numbers were observed or which conclusions can be drawn from the data. This is usually done by drilling down into the data with tables such as Excel pivot tables.
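To make the drill-down idea concrete, here is a minimal Python sketch that totals sales first by region and then by product within one region, much like expanding a row in a pivot table. The records and field names (`region`, `product`, `sales`) are hypothetical illustration data, not taken from any real report.

```python
from collections import defaultdict

# Hypothetical sales records, standing in for data loaded from a database.
records = [
    {"region": "North", "product": "A", "sales": 120.0},
    {"region": "North", "product": "B", "sales": 80.0},
    {"region": "South", "product": "A", "sales": 50.0},
    {"region": "South", "product": "B", "sales": 40.0},
]

def total_by(rows, key):
    """Sum the 'sales' field for each distinct value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["sales"]
    return dict(totals)

def drill_down(rows, level1, value, level2):
    """Keep only rows where level1 == value, then total them by level2."""
    subset = [row for row in rows if row[level1] == value]
    return total_by(subset, level2)

# Top level: sales per region.
print(total_by(records, "region"))                         # {'North': 200.0, 'South': 90.0}
# Drill down: why is South lower? Look at its products.
print(drill_down(records, "region", "South", "product"))   # {'A': 50.0, 'B': 40.0}
```

Each drill-down step is still a manual decision about where to look next, which is the repetitive, error-prone part of the work.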

Even experienced data analysts make mistakes in manual analyses

These manual analyses are prone to errors, and even experienced analysts can make simple mistakes. Repeating them is often time-consuming and expensive. In the worst case, the costs, recurring errors, or sheer duration can prevent useful analysis altogether.

Automated procedures are quicker, more objective, and cheaper

Machine Learning or Predictive Analytics techniques can solve this problem by proposing decisions. They can be regarded as a new kind of BI because, in addition to classical BI reports, these intelligent algorithms generate recommendations for actions to take.

This approach offers several advantages. Manual tasks can now be (partially) automated, which can significantly reduce the corresponding costs. In addition, the analyses are qualitatively better and more standardized.

They are better because an algorithm can analyze much more data and is more objective. They are more standardized because, unlike a human worker, an algorithm maintains a consistent level of quality. This makes algorithmic analysis quicker, cheaper, and more accurate.

Your analysts won’t have to do unnecessary work, so they can either focus on other tasks or use algorithms to support better-informed decisions. Implementing an appropriate Predictive Analytics strategy therefore gives you an advantage over your competitors.

Managers are often clueless as to how a data application strategy should look

Executing a Machine Learning application strategy requires a great deal of expert knowledge in statistics, mathematics, and information technology. As a result, managers are often clueless as to what the appropriate criteria and requirements look like. During a Predictive Analytics project, the responsible manager often has to rely on a data science service provider to execute it correctly.

It therefore makes sense to build up some independent knowledge of the principles and requirements involved in executing a Predictive Analytics project.

The following article will support your understanding of these aspects. It does not require any knowledge of statistics or mathematics and is explicitly written for managers and decision makers. By reading it, you will learn basic techniques to avoid the most common pitfalls when carrying out a Big Data Analytics project.

Marketing departments, journalists, and ghostwriters give a false picture of Machine Learning

Before delving into the first part of the blog series, it is necessary to clear up some confusion regarding terminology.

You have probably already heard terms like Predictive Analytics, (Big) Data Analytics, Machine Learning, Data Mining, Artificial Intelligence, AI or (the currently very popular) Deep Learning.

Don’t get confused by all these names. They basically all mean the same thing, namely “applied statistics”.

Any real difference is largely a matter of opinion. Experts usually use these terms to group related techniques. That does not mean one technique is better or worse than another; which one fits depends on the problem at hand. Unfortunately, this AI gibberish is the root cause of confusion in many cases.

Marketing departments and journalists use these terms because they have found that more technical-sounding language gets a better reaction. Deep Learning, for instance, is often presented as something brand new, even though it builds on techniques from the eighties.

Big Data, on the other hand, must be viewed from a different perspective: it is not about analyzing data but about technologies that store and manage it (for instance Apache Hadoop HDFS). However, Big Data infrastructures often provide the basis for applying Predictive Analytics, so the two fields are often connected in industrial applications.

It is enough to differentiate between the two domains “data management and infrastructure” and “data analysis”. This article is mostly concerned with the data analysis aspects.

The complete series is split into 5 basic parts. The first article (this one) covers business-relevant requirements. Part 2 explains how to handle and prepare data for the Machine Learning model introduced in Part 3. Part 4 illustrates typical techniques for assessing the quality of your results. Part 5 discusses implementation issues and outlines integration techniques.

Part 1: Machine Learning must serve your business needs

Let’s begin with the most important aspect: the central question in any Machine Learning project is how it can solve your actual business problems. Your Predictive Analytics project is as unique as your business; it is likely a combination of your goals, intentions, requirements, and even personal preferences. To avoid conflicts, you need a well-structured approach, and the first step in that approach is to formulate a very clear business goal. This goal will guide you at every step of the project and help avoid ambiguity. For example, a project to decrease the churn rate of your customer base requires completely different measures than a recommendation algorithm for an online shop. So make sure your project plan contains a clearly formulated goal right from the beginning.
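A clearly formulated goal is easiest to track when it maps to a number. As an illustration only, the churn example could be expressed through a common definition of churn rate: customers lost during a period divided by customers at the start of that period. The function name, the customer counts, and the 5 % target below are hypothetical.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of customers lost during a period, relative to the start of the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start

# Hypothetical business goal: bring quarterly churn below 5 %.
TARGET_CHURN = 0.05

rate = churn_rate(customers_at_start=2000, customers_lost=130)
print(f"churn rate: {rate:.1%}, goal met: {rate < TARGET_CHURN}")  # churn rate: 6.5%, goal met: False
```

Writing the goal down this precisely forces the questions a plan has to answer anyway: which period, which customers count as "lost", and what value counts as success.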

Goals require plans

After defining a clear business goal, it is time for your project plan. Planning is best divided into parts. Each part should describe the current situation, its partial goals, and a set of measures or methods to reach them. Together, all parts of the plan serve the principal goal. The project plan gives you orientation and helps you clarify questions in advance, such as project costs or duration.

Again, again, and again

You will rarely encounter a Predictive Analytics problem that arrives as a clearly structured data analysis problem, so looking at your business goal alone won’t be enough. Even after the first version of the project plan is finished, each step is likely to need readjusting over time, because some parts may not go as anticipated or some aims may turn out to be too ambitious. Creating a project plan is therefore an iterative process. The figure below visualizes some typical phases within a Big Data Analytics project. Each review cycle requires revisiting every phase, and the plan is complete when no further changes are necessary.

Who should do what?

Usually, the manager or a business analyst is responsible for formulating the business problem. They should clarify which business goals and results are to be achieved. Then, based on the data currently available, you can determine how realistic these goals are. The formulation of the business problem defines the framework for the project and guides the implementation team. Typical questions to answer are: What exactly do we want to achieve? What would a solution to this problem look like? Which kinds of methods (for example classification or regression) are required? While answering these questions, always keep in mind what Machine Learning methods can and cannot do. A broad understanding of how a data science project is executed will be extremely helpful in this initial planning phase.
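Whether a problem calls for classification or regression depends mainly on what you want to predict: a category (will this customer churn, yes or no?) or a quantity (how much will this customer spend next month?). The helper below is a deliberately simplified rule of thumb; its name and the `max_classes=10` threshold are arbitrary assumptions, and it is no substitute for a data scientist’s judgement.

```python
from numbers import Real
from typing import Sequence

def suggest_method(target_values: Sequence[object], max_classes: int = 10) -> str:
    """Rough rule of thumb: is the thing we want to predict a category or a quantity?"""
    distinct = set(target_values)
    if all(isinstance(v, bool) for v in distinct):
        return "classification"  # yes/no outcome, e.g. "will this customer churn?"
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in distinct)
    if numeric and len(distinct) > max_classes:
        return "regression"  # many distinct numbers, e.g. next month's revenue
    return "classification"  # a small set of labels, e.g. product categories

print(suggest_method([True, False, False, True]))      # classification
print(suggest_method(["basic", "premium", "basic"]))   # classification
print(suggest_method(list(range(100, 2100, 100))))     # regression
```

Borderline cases, such as star ratings from 1 to 5, can reasonably be treated either way, which is exactly the kind of question to settle during planning.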

As mentioned, planning is an iterative process, and its quality usually improves with the number of iterations. Try to clarify any gaps or open questions. Maybe you missed a detail, or some execution steps are too vague (for instance, which tools should be applied). Or perhaps you considered approaches A and B, but the implementation phase only covers approach A.
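The review cycle described here is essentially a loop that stops once a full pass over all phases produces no changes. The sketch below models that stopping rule in Python; the phase names, the `refine_plan` and `example_review` functions, and the "tool:" check are hypothetical placeholders for your team’s actual review work.

```python
from typing import Callable, Dict

Plan = Dict[str, str]  # phase name -> description of that phase

def refine_plan(plan: Plan, review: Callable[[str, str], str], max_rounds: int = 10) -> Plan:
    """Review every phase repeatedly until one full round produces no changes."""
    for _ in range(max_rounds):
        changed = False
        for phase, content in list(plan.items()):
            revised = review(phase, content)
            if revised != content:
                plan[phase] = revised
                changed = True
        if not changed:
            return plan  # no further changes necessary: the plan is complete
    raise RuntimeError("plan did not stabilize within max_rounds")

def example_review(phase: str, content: str) -> str:
    """Hypothetical check: every phase must state which tool it relies on."""
    if "tool:" not in content:
        return content + "; tool: TBD"
    return content

plan = {
    "data preparation": "clean customer data",
    "modelling": "train churn classifier; tool: TBD",
}
print(refine_plan(plan, example_review))
```

The `max_rounds` limit reflects a practical point: if the plan keeps changing after many iterations, the goal itself probably needs to be reformulated.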

Do you understand your data?

Another important aspect is getting an accurate grasp of your data. Understanding your data means knowing how it can be used and which preparation steps are required. Not every Machine Learning or Predictive Analytics model can handle every type of data. For instance, is the data time-dependent (a process chain)? Is it categorical (product types), text, numerical, or discrete (integers)? Dependencies within the data also matter: weather changes may lead to different shopping behavior, or machines may become inefficient when it is too cold. Most data requires preparation before it is compatible with a given Machine Learning algorithm. Only by understanding your data can you formulate realistic goals.
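As an example of such preparation: many algorithms expect numerical input, so a categorical column like product type is often converted with one-hot encoding, giving one 0/1 column per category. The sketch below shows the idea in plain Python; the product names are made up, and in practice you would usually rely on an existing library routine rather than writing this yourself.

```python
from typing import List, Sequence, Tuple

def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode categorical values as 0/1 indicator vectors (one column per category)."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows

columns, encoded = one_hot(["laptop", "phone", "laptop", "tablet"])
print(columns)   # ['laptop', 'phone', 'tablet']
print(encoded)   # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Encoding the categories as separate 0/1 columns rather than as the numbers 1, 2, 3 avoids suggesting an order or distance between product types that does not exist.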

Standard solutions are cheaper and quicker

During the planning phase, it is advisable to consider specific software tools early on and to formulate your plan so that you can use out-of-the-box solutions. Projects are much more efficient and quicker when standard solutions apply, and this approach also lets you focus on your actual problem.

Don’t forget your people!

Besides technology, your people still play a very important role. Ask yourself: did you consider everybody who will be affected by the project? Depending on its size, different departments will be involved, and different people may be affected by different parts of your plan. For instance, you may need to consult the legal department about issues related to your data. It is therefore important that you can match each part of your project plan with a corresponding department or person in your company.

Checklist

Use this checklist during the planning phase of your Predictive Analytics project. For details, please refer to the article.

  1. What is your actual business goal?
  2. Does the formulation of the business problem provide an appropriate framework for the plan’s execution?
  3. Did you go over your plan repeatedly?
  4. Do you understand the potential of your data and how it can be applied?
  5. Did you consider the application of standard software solutions?
  6. Does your plan consider the people and/or departments involved?
  7. Can you match each plan section with the corresponding business units?
  8. Are there any individual requirements concerning your goal?  

The planning phase is especially important for the management team. It gives you an idea of what is realistic and how much effort it will take. It will also expose possible pitfalls. By considering the aspects mentioned in the article you will cover the most important uncertainties. Keep in mind that each Predictive Analytics project is an individual undertaking.

About the author

Dr. Thomas Vanck is an expert in Machine Learning and Data Analysis. For years, he has supported companies in using their data for greater success. He looks forward to hearing your questions about your planned or ongoing data projects. Feel free to write him a message.
