Debunk cheating data scientists – before they even start cheating

This article covers parts 4 and 5 of my five-part blog series about the application of AI and Machine Learning in company projects.

Reading this article will enable you to communicate with your data scientist competently and at eye level.

You will learn:

  • How to find the objectively best AI model
  • Possibilities of how the Machine Learning solution can be implemented in your production environment
  • How to easily understand common terms in the context of machine learning

This piece covers parts 4 & 5; the complete article is split into five parts.

Read both parts to the end to fully understand all aspects.

Part 4: Evaluation actually is that easy!

The evaluation of Machine Learning models is equivalent to the quality assurance (QA) of your AI project.

Besides the data preparation process (Part 2 of the blog series), this is the second most important phase in any Machine Learning venture.

The main goal is the selection of an appropriate prediction model.

The main selection criterion is prediction quality, because the better the prediction, the greater your benefit.

Conscientiousness should play the predominant role during the QA procedure. In this context, conscientiousness refers to the objectivity of the evaluation.

During this phase, bad or even harmful algorithmic predictions must be discovered and eliminated without compromise.

Therefore, the most important requirement for the evaluation phase is that it be performed objectively and honestly.

The good news is that this is not really difficult, because the evaluation process is more of a craft than an art (in contrast to the data preparation phase).

The simplest, most important and most objective approach is the so-called cross-validation method.

Cross-Validation is the standard technique to gain a neutral, objective and unbiased analysis of the performance of your Machine Learning model.

The principle behind it is almost trivial! Here is how it works:

You take your data and split it up into two datasets or packages.

We will call the first dataset the training dataset and the second one will be called evaluation or hold-out dataset.

As the name suggests, the first dataset, the training dataset, will be used to train the machine learning model.

After that, we will take the evaluation dataset and assess the prediction quality of our newly created AI model.

The ingenious point is: during the training process, the AI model never sees the evaluation dataset.

The model has only seen the training data and thus was only able to adapt to this kind of data.

Or to put it differently: it could not memorize the evaluation data. To the model, these data are completely new and unknown.

Applying this method, you effectively simulate the later real-world usage, when your model will have to make predictions on genuinely new data.

Therefore, this kind of quality assessment is neutral and objective.
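
Here is a minimal sketch of this procedure (assuming Python with scikit-learn and one of its built-in toy datasets – any other dataset would work the same way):

```python
# A minimal sketch of the hold-out procedure, assuming scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Split the data: the model will learn from the training part only.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)  # training: the model never sees the hold-out data

# Quality assessment on data that is completely new to the model.
print("accuracy on hold-out data:", model.score(X_eval, y_eval))
```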

If you used the training data for the evaluation, you would practically tell your model: memorize each data point as accurately as possible, and do not pay any attention to the structure or the idea behind the data.

By estimating the prediction quality using the training data, you would evaluate the model on the basis of what it already knows best.

You would measure a much lower error than the model would produce in real operation.

To get an honest and neutral assessment, you must measure the quality on new and unknown data.

The hold-out dataset exactly meets this requirement. The AI model did not see it and could not adapt to it.

Thus, we force the model to learn the abstract structure behind the data: the structure that generated this kind of data in the first place.

The new data (= the evaluation data) will also follow the rules of this (hidden) structure.

And the most important question is how well your machine learning model reacts to new data.

Cross-validation simulates such a situation and is therefore the best approach for measuring your prediction quality.

Hence, make sure that you have set aside a hold-out dataset in each step of the evaluation phase. Only then will you be able to maintain a neutral and objective view of the performance of your method.

Considering the data split ratio: as a rule of thumb, you can use 80% of the data for training and 20% for evaluation.

However, this choice depends not only on the dataset itself and the amount of data but also on your individual requirements.
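
As an illustration (again assuming scikit-learn): the k-fold variant of cross-validation repeats the split several times, so that every data point serves as hold-out data exactly once and the result does not depend on one particular split:

```python
# A minimal sketch of 5-fold cross-validation, assuming scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# cv=5 means five rounds, each with roughly 80% training / 20% hold-out data.
scores = cross_val_score(model, X, y, cv=5)
print("accuracy per fold:", scores)
print("mean accuracy:", scores.mean())
```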

Evaluation metrics

Cross-validation is a fundamental technique for assessing your model’s prediction quality.

The actual error measurement is done with the help of what is called a metric.

Cross-validation then uses this metric to measure the accumulated error.

This accumulated error can be used to compare the prediction quality of different machine learning models or approaches.

For instance, suppose you want to predict two classes, class A and class B. In this case, a metric could assign 0 error points if the prediction is correct and 1 error point if it is wrong.

This metric is then applied to every pair of data point and corresponding model prediction, and subsequently all errors are added up.
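
As a minimal sketch with hypothetical labels and predictions, this 0/1 metric could be computed like this:

```python
# Hypothetical labels and model predictions for the two classes A and B.
y_true = ["A", "B", "A", "A", "B"]
y_pred = ["A", "B", "B", "A", "A"]

# 0 error points for a correct prediction, 1 for a wrong one, summed up.
total_error = sum(0 if t == p else 1 for t, p in zip(y_true, y_pred))
print("accumulated error:", total_error)  # 2 in this toy example
```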

This is just one example of a metric.

In fact, there are countless metrics, depending on what your prediction goal is and what kind of misprediction you would like to penalize.

Therefore, it is important that you choose a metric that reflects your business goal.

If you do not choose a metric that complies with your business goal, your algorithm will not be able to reflect that goal.

Yet in most cases, standard metrics will do the job just fine, or the data can be transformed during the data preparation phase in such a way that standard metrics can be applied.

Under different circumstances, though, an individual metric can make sense.

This could be the case if you would like to account for special costs or other criteria that cannot be captured by standard metrics.

However, designing your own metric is significantly costlier than using a ready-made one.

Hence, you should stick to standard metrics as long as possible.

Let your business goal guide you in choosing the right metric. This choice is ultimately an individual one, and you should consider getting support from an experienced data scientist.
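
As an illustration only – assuming scikit-learn and hypothetical costs where a missed case of class 1 is ten times as expensive as a false alarm – a custom metric could be wrapped like this:

```python
import numpy as np
from sklearn.metrics import make_scorer

def business_cost(y_true, y_pred):
    """Hypothetical cost metric: missed positives cost 10, false alarms cost 1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    missed = np.sum((y_true == 1) & (y_pred == 0))
    false_alarms = np.sum((y_true == 0) & (y_pred == 1))
    return 10 * missed + 1 * false_alarms

# greater_is_better=False tells scikit-learn that a lower cost means a better model.
cost_scorer = make_scorer(business_cost, greater_is_better=False)
# The scorer can then be used in cross-validation:
# cross_val_score(model, X, y, scoring=cost_scorer)
```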

Excellence by experience

Another useful way to improve your AI model is by taking the knowledge of experienced people into account.

Many employees have very profound practical knowledge that can be used to assess the data quality.

Coworkers with such experience can give you clues as to which kind of data is particularly meaningful, and can also support your data scientist in interpreting the data at hand.

This better and deeper understanding helps the data scientist during the data preparation and selection phase to decide which modifications should be considered in the model.

The reverse direction is also possible: your employees may gain new insights into their processes and thereby improve their own work.

This is particularly true if you apply comprehensible models (i.e. not neural networks).

You have to beat the baseline

Cross-validation allows you to compare two AI models, i.e. to decide whether a model A is better than a model B.

It gives you comparative values, i.e. measures of your relative improvement.

In order to determine the absolute improvement, you additionally need information about the current state – also called the baseline.

In other words, the baseline allows you to measure the absolute improvement from the original or current state to the result achieved by a machine learning approach.

For instance: The current state could be that only 3.5% of shop visitors actually convert or that the throughput of a business process is at about 20 units per minute.

An AI model could increase the conversion rate to e.g. 6% or the throughput to 30 units per minute.

In that case the absolute improvement would be 2.5 percentage points or 10 units respectively.

In order for the machine learning approach to be meaningful, the end result has to be better than the original or current state. If you cannot see any improvement, you should reconsider and recheck your base data with regard to its informative value for your business goal.

Defining and finding a baseline value can be tricky.

The business goal should determine how the baseline will be calculated.

This means the current costs and current effort made to reach the business goal have to be investigated and measured to determine a reasonable baseline.

In case you haven't applied any machine learning model yet, you will also have to account for manual labor and its costs.

On the other hand, if you already use an AI approach or another automated method, it becomes much easier: your new model only has to compete against your old software approach.

Another intrinsic advantage of the baseline: being an absolute reference, it allows you to compare the relative advantage of two new competing models on an absolute level.

Let’s assume we have a method A that yields a conversion rate of 4.5% and method B yields 4.1%.

If the current baseline is at 1.5%, the relative improvement of A over B is not particularly large.
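
A quick back-of-the-envelope check of these numbers:

```python
baseline, a, b = 1.5, 4.5, 4.1  # conversion rates in percent
gain_a = a - baseline           # 3.0 percentage points absolute improvement
gain_b = b - baseline           # 2.6 percentage points absolute improvement
print(gain_a / gain_b)          # ~1.15, i.e. A is only about 15% better than B
```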

Therefore, you might still want to choose B instead of A: despite its slightly poorer performance, model B might be much easier to understand than A.

So the baseline not only provides an appropriate way to compare two methods but also helps you decide which model should finally be adopted.

Thus, at the beginning of each AI project, you should pay attention to what you want to improve and subsequently how you can best measure the current state.

Clustering – or when even your data scientist is out of their depth…

In an earlier part of this series, I already discussed the risks of applying a clustering or unsupervised method.

Now it is time to expose the special weaknesses of clustering approaches.

In contrast to supervised methods, unsupervised methods (or clustering) do not use any labels that you could use to assess prediction quality.

The disadvantages become particularly apparent in the evaluation phase, because an objective evaluation is simply not possible.

If you still think that applying a clustering algorithm is a good idea, then you absolutely have to understand every aspect of why a specific clustering result has been reached.

You have to understand that clustering methods are designed in a way that they always find a "solution" (due to their mathematical properties).

For example: suppose you have a dataset, and let's further assume that this dataset actually contains two clusters, i.e. two data point clouds.

Since you have no idea what the data consists of, you tell your clustering algorithm that you expect three clusters.

Due to its mathematical structure, the algorithm will – as requested – find three clusters.

But you will have no way to verify this result, since you don't have any labels.
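
A minimal sketch of this effect (assuming scikit-learn): the synthetic data below genuinely contains two clusters, yet k-means dutifully returns the three we ask for:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with exactly two real clusters (centers=2).
X, _ = make_blobs(n_samples=300, centers=2, random_state=42)

# We wrongly assume three clusters, and k-means will "find" them anyway.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(set(kmeans.labels_))  # {0, 1, 2} – three clusters, as requested
```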

But it gets even worse: usually, datasets do not have any clear cluster structure. Instead, the data points are so closely interwoven that defining clusters becomes close to impossible.

And now your clustering algorithm finds three clusters – congratulations!

In fact, there are multiple circumstances that can make clustering pretty unstable and unreliable.

Small changes to the input data might result in unforeseen consequences with the corresponding negative effects on your business.

Try to avoid any kind of clustering as long as possible.

The results of clustering methods are only seemingly intuitive (pseudo-plausible explanations or talking up the outcome), very risky, and provide enormous potential for misinterpretation.

Since classical clustering methods do not use any hold-out data, you cannot measure their performance objectively.

Furthermore, your data scientist has to understand every aspect of the mathematical structure of the clustering criterion.

Normally, the clustering criterion does not correspond to your original business goal. This is another reason why the result is usually useless.

Frequently, the negative effects of an unsupervised algorithm will only become noticeable later, in the production phase, when it has been deeply integrated into your IT systems.

Besides the damage already accumulated, you will also have to deal with the additional (high) costs of repair.

In order to avoid such (invisible) traps, you should prefer supervised methods as long as possible. 

Part 5: Implementation and Integration

The implementation phase, or deployment, is the final step and describes the integration of your machine learning method into the operational systems of your business.

Before starting this phase, make sure you have properly assessed the quality of your AI method, so that you can prove that it serves your business goal.

The realization often depends on the existing IT infrastructure. Depending on its requirements, you may choose between different options.

Usage as a service is one of the most modular options. In its simplest form, other programs or services of your IT infrastructure request predictions for given datasets via an API (application programming interface). This approach makes the prediction system independent of the rest of your infrastructure, which often reduces maintenance costs and overhead. Since the communication runs over a network, you should keep latency issues in mind.
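
As a minimal sketch of such a service – assuming Flask and a previously trained model serialized to a file; the file name and the route are purely illustrative:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical file name: a model trained and evaluated beforehand.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body such as {"features": [[5.1, 3.5, 1.4, 0.2]]}.
    features = request.get_json()["features"]
    return jsonify({"prediction": model.predict(features).tolist()})

if __name__ == "__main__":
    app.run(port=5000)
```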

Alternatively, you might embed your algorithms directly into your systems or programs, for example as a plugin or as a programming library. Such a library can be included directly in new IT software components. However, make sure that the interface to the model itself is encapsulated. If for some reason you want to replace the current model with a new and better one, you will then only have to adjust a limited number of components. This saves costs and time and ensures more stable operation.
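
A minimal sketch of such an encapsulation (the class name and interface are purely illustrative):

```python
class ConversionPredictor:
    """Thin wrapper: the rest of the codebase talks only to this interface."""

    def __init__(self, model):
        self._model = model  # the current model, replaceable at any time

    def predict(self, features):
        # The input/output convention stays stable even if the model changes,
        # so swapping in a better model only touches this one class.
        return self._model.predict(features)
```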

If you decide on the service option, it may be advisable to move everything directly into a cloud service and have it managed externally. The operation of a prediction system often requires highly qualified personnel, and suitable data scientists with the necessary skills are often difficult to find. External service providers usually have such qualified employees, whom you effectively hire at the same time. Therefore, this type of deployment reduces the amount of resources needed and allows you to concentrate on your core business.

Once the system has been established, you have to continuously and carefully monitor the live operation. Make sure you record all relevant protocol data so that you can review the performance improvements later on. Evaluating these protocols helps you keep track of your costs and supports you in justifying the usefulness of the machine learning approach.

Monitoring is also very important for another reason. Over time, the underlying data might shift due to changes in your company processes. This can have a negative impact on the performance of your models, which you can prevent by reacting quickly to such changes. A deterioration in performance usually requires adjusting your model to the new circumstances.
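
A minimal sketch of such a drift check, assuming you have logged the feature means and standard deviations of the training data (the threshold is illustrative):

```python
import numpy as np

def check_drift(train_mean, train_std, live_batch, threshold=3.0):
    """Flag features whose live mean has drifted away from the training mean."""
    live_mean = np.mean(live_batch, axis=0)
    z = np.abs(live_mean - train_mean) / train_std
    return np.where(z > threshold)[0]  # indices of drifting features

# Hypothetical usage during live operation:
# drifted = check_drift(train_mean, train_std, todays_requests)
# if drifted.size > 0: raise an alert and consider retraining the model
```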

Checklist

Finally, like in the previous parts, I will provide a short list of the most important aspects you should pay attention to:

  1. Do you perform an honest and objective evaluation of your approach?
  2. Do you have a hold-out dataset?
  3. Do you apply cross-validation?
  4. Do you consider an error metric that accounts for your business goal?
  5. Do you consider the experience of your employees?
  6. Do you consider a baseline as a reference? 
  7. Does it make sense to apply a simpler, easier-to-understand but less performant model?
  8. Clustering: Do you understand how you got the results you now observe?
  9. Avoid clustering as long as possible.

Conclusion

This was the final part of my blog series about AI projects in companies. As you have read, the implementation requires attention to multiple aspects – not only technical ones. Also, as you might have noticed, you do not need any particular mathematical knowledge to understand the basic concepts of data analysis. By reading these articles, you have gained a solid basic knowledge that lets you retrace every step of a typical machine learning project. If you have any further questions, do not hesitate to contact me. Simply write me a message here on LinkedIn.

About the author

Dr. Thomas Vanck is an expert in Machine Learning and Data Analysis. For years, he has been supporting companies in using their data for greater success. He looks forward to hearing your questions about your planned or ongoing data projects. Feel free to write him a message.
