Anna's Dilemma
Image generated by AI via OpenAI's DALL-E

Anna arrived at work stressed. Her deep learning model had been training for 120 hours on a powerful computer network. The reason was the enormous volume of training data, which included not only the traditional variables associated with the solvency of each of the finance company's clients but also individualized data from social media posts, shopping habits, and other variables that are not usually part of a credit risk model. All with the proper consent of the clients, of course. After all, who wouldn't agree to provide data for analysis in exchange for attractive 10% discounts at pharmacies or e-commerce stores?

It had been a massive effort to organize this mass of information, perhaps the biggest challenge Anna had ever faced. Cleaning the data and ensuring its integrity sometimes became a painful, almost maddening task. Talking about a data lake is easy, but building a real, operational one with good performance is something else entirely!

But, finally, this stage was behind her. By her estimate, the model should be ready at any moment. She hadn't included execution logging in the code because it would have made the training process even slower, but she already regretted that decision. With a log, at least she would be less anxious!

It was at this moment that, suddenly, the asterisk indicating that the run was in progress disappeared. Anna knew what this meant: the process had finished. Without errors. Her model was ready.

She rushed to check the accuracy achieved on the more than 2 million examples provided to the algorithm: 99.7%!!!

Anna cursed! Overfitted!! Surely such a high figure pointed to some error, perhaps corrupted data. There is no 99.7% accuracy in real life! Would she have to start over from scratch?

Still frustrated, Anna decided to test the model on the data she had set aside for testing. Another 500,000 CPFs (Brazilian individual taxpayer IDs) with credit history would be the real proof!

She applied the model to this new dataset, which had been kept out of training precisely to avoid biasing the results.

She was expecting a result well below the minimum reasonable accuracy of 90%. After all, her model must be completely biased toward the training sample. She went to get a coffee, already imagining the headache of tracking down the error. It was certainly in the data, but where?

Upon returning, she looked incredulously at the screen. Accuracy of 98.7%! Could it be? The best model the finance company had ever developed was around 93% accuracy, and it was the pride of the former area manager, who today was none other than the CEO of the company.
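For readers who want to see the check Anna has just made in her head, here is a minimal sketch of comparing training and held-out accuracy. The function names and the `max_gap` threshold of 5 percentage points are assumptions chosen for illustration; they are not part of the story, and a reasonable threshold depends on the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def generalization_gap(train_acc, test_acc):
    """How much accuracy is lost when moving from training data to unseen data."""
    return train_acc - test_acc


def looks_overfitted(train_acc, test_acc, max_gap=0.05):
    """Flag a model whose gap exceeds max_gap (a hypothetical threshold)."""
    return generalization_gap(train_acc, test_acc) > max_gap


# The two figures from the story: 99.7% on training data, 98.7% on the held-out set.
gap = generalization_gap(0.997, 0.987)
suspicious = looks_overfitted(0.997, 0.987)
```

With the story's numbers the gap is about one percentage point, so under this assumed threshold the model would not be flagged. A small gap only says the model generalizes to similar data; it says nothing about whether its decisions are fair.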

But there was no doubt! There was the number. She ran a few more tests, including on a new set of prospects that had been processed a few weeks earlier. She saw that the delinquency projected for this set, according to her new model, was 27%, a huge figure!! But how had they lent money to such a bad portfolio?? She looked at the system and noticed that the percentage of clients in arrears, just a few weeks after the loans were granted, was already at 7%, an absurdity! Once again, her model must be correct. And far superior to the one they were currently using.

After a few minutes, she realized what she now had in her hands. By integrating data not directly related to credit granting, she had significantly increased the ability to predict future delinquency. These seemingly few extra percentage points in accuracy would mean a 30% increase in the finance company's profit. They could be more competitive by offering better rates to good risks. Of course, they would also deny loans to potential bad payers with more confidence. After all, that had always been the goal of credit models, but now the company would have a much more accurate weapon in its hands. A laser, instead of infrared.

Who would have thought that more than 5 monthly beach posts would increase the risk of someone being a bad payer? Or that the day of the week when credit is sought would also have an important predictive effect? Incredible!!

Anna was now radiant. Maybe she would finally get that long-awaited promotion.

It was at this moment that she looked more closely at the low-quality portfolio, the one for which her model predicted delinquency much more accurately than the current model. She realized that the vast majority of the applicants who should have been denied were low-income entrepreneurs with an impeccable credit history. Her independent variables showed that these people were applying for loans on Mondays, the day with the highest delinquency. And they were posting beach photos during the week, another major credit risk offender. Who knows why, but Anna didn't need 'whys', she just needed 'whats' and 'whos'. Causal modeling is for academics, right?

She looked at some names, common names, perhaps of simple, hardworking people who were happy to start their own businesses. Of course, many would not succeed and might not be able to pay back their loans...

Anna swallowed hard. She called Sofia, a colleague she trusted very much, and shared her dilemma. 'It seems that, despite the high precision, we have a bias in this model,' Sofia said. Anna sighed. Once again, she looked at the social network photos she had used for the model. She was motionless for a few seconds, which seemed like an eternity. And, finally, she thought to herself: 'Yeah, I guess I still have a lot of work ahead. That promotion is still going to have to wait a bit.'

What did you think of the story? Of course, this is a fictional story (including the characters), but it brings to the forefront one of the hottest discussions about AI. Better and more accurate models, based on a wide range of variables and bringing precision and better margins: aren't they desirable? But what about the risk of excessive precision excluding part of society and making social mobility more difficult, for example? And here we are talking about credit, but the applications are diverse, such as risk analysis for insurance, including health insurance. How do we balance the two sides? Leave your opinion here!
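One simple way to make this discussion concrete is to compare how often a model denies credit to different groups of applicants. The sketch below is only an illustration: the group labels, the toy data, and the function names are all invented here, and a real fairness review would look at far more than a single ratio.

```python
from collections import defaultdict


def denial_rates(decisions):
    """Denial rate per group, given an iterable of (group, was_denied) pairs."""
    totals = defaultdict(int)
    denied = defaultdict(int)
    for group, was_denied in decisions:
        totals[group] += 1
        if was_denied:
            denied[group] += 1
    return {group: denied[group] / totals[group] for group in totals}


def approval_ratio(rates, group, reference):
    """Approval rate of `group` divided by the approval rate of `reference`."""
    reference_approval = 1 - rates[reference]
    if reference_approval == 0:
        raise ValueError("reference group has no approvals to compare against")
    return (1 - rates[group]) / reference_approval


# Toy, made-up decisions: (group label, True if the loan was denied).
toy_decisions = (
    [("low_income_entrepreneur", True)] * 3
    + [("low_income_entrepreneur", False)] * 1
    + [("other", True)] * 1
    + [("other", False)] * 3
)

rates = denial_rates(toy_decisions)
ratio = approval_ratio(rates, "low_income_entrepreneur", "other")
```

In this toy data the low-income entrepreneurs are approved only a third as often as everyone else. A ratio far below 1 does not prove the model is unfair, but it is the kind of signal that would justify the extra work Anna decides to take on.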

Amazing insights! Keep those articles coming!

Marcelo Correa

Senior Executive in Growth Martech, Digital Marketing, Advanced Analytics, Ecommerce, CRM and Customer Journey

1 year ago

I'll give my humble opinion here, but I think this is the challenge that companies will face from now on: knowing with high accuracy who has a big chance of failing to meet their commitments but also needs a chance, AND wanting to be part of these people's recovery. Of course companies are driven by profits, but is that their only reason to exist? Should organizations think about a potential role in helping high-risk entrepreneurs and individuals and, in doing so, helping society? I might be a romantic, but for me, that's what separates humans from AI. Part of the "profit" generated by the model should be used to help people, even knowing that this financial return might not materialize, at least in the short term. Fichman, it's a pleasure to read the first of the many articles you will share with us. I've been your fan ever since Readers. I am changing the course of my career too, to dedicate myself to Martech, with AI and data science as one of my pillars. I wish you a successful new path!!

André Artagão

Business Consultant / C-Level Executive / Local & Regional Managing Director / Board Member / BPO Expert / Open to Relocation

1 year ago

Great article and excellent insights... Wishing you, Luis Fichman, the usual success in this new activity. I hope to read more great stories and accurate analyses from you regularly!! A hug!!

