How to add ML classifier feature to your product in 4 days

This is the second blog post in the series that started with "Creation of ethscore.net in 16 days". You don't have to read the first one; I tried to write them to be as independent of each other as possible. Even with blog posts, I am thinking about dependencies!

In this post, I will explain how you can add a basic ML classifier feature to your product in just 4 days. In this case, I wanted to add a trust score for an Ethereum wallet ID: a measure of how much you can trust that account.

Day - 1

First, I had to find data. At this point, I had not even decided whether I would go for a supervised machine learning algorithm or an unsupervised one; I wanted to make that decision based on what kind of data I could find. After a long day of researching data online, I found a list of around 700 Ethereum wallet IDs that had been reported as fraudulent. I randomly checked a few of them; it was mostly phishing fraud. This was a good start, but I needed the detailed data of these accounts, so I wrote a script to fetch it from https://etherscan.io/

I copied below a snippet of the etherscan.io API integration code, in case you need it.

import json
import requests

# i is a dict with an "address" key; ETHERSCAN_API_KEY holds your own etherscan.io API key
etherscan_transactions_url = ("https://api.etherscan.io/api?module=account&action=txlist"
    "&address=" + i['address'] + "&startblock=0&endblock=99999999"
    "&page=1&offset=10000&sort=asc&apikey=" + ETHERSCAN_API_KEY)
response = requests.get(etherscan_transactions_url)
transactions_json = response.json()

# Write the full transaction history to one JSON file per wallet
with open(output_filepath + i['address'] + ".json", 'w') as fp:
    fp.write(json.dumps(transactions_json))

The code above creates one file per Ethereum wallet ID, containing its full transaction history. However, I needed summarised, structured data that I could feed into the training model, so I wrote another script to scan each file and calculate summary features such as the total number of transactions.
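If it helps, here is a minimal sketch of what that aggregation script could look like. The "result", "from", "to" and "value" fields are what etherscan's txlist endpoint returns; the exact feature columns and file names are just illustrative, not necessarily the ones I ended up with.

import csv
import glob
import json
import os

# Build one row of summary features per wallet file
rows = []
for path in glob.glob(output_filepath + "*.json"):
    address = os.path.basename(path)[:-len(".json")]
    with open(path) as fp:
        txs = json.load(fp).get("result", [])
    sent = [t for t in txs if t["from"].lower() == address.lower()]
    received = [t for t in txs if t["to"].lower() == address.lower()]
    rows.append({
        "address": address,
        "total_transactions": len(txs),
        "sent_count": len(sent),
        "received_count": len(received),
        "total_value_wei": sum(int(t["value"]) for t in txs),
        "unique_counterparties": len({t["to"] for t in sent} | {t["from"] for t in received}),
    })

with open("features.csv", "w", newline="") as fp:
    writer = csv.DictWriter(fp, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)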

Day - 2

With the two scripts above, I had a CSV file ready to train the model. Or did I? Of course not, because in order to train the model, I needed normal accounts as well as the fraudulent ones; I could not train the model with only fraudulent account data. To get them, I found a contract that is used to buy NFTs and decided that anyone who bought an NFT through that contract would count as a normal account. Now, you might think a fraudster could use the same account both for phishing and for personal investment. However, that is rarely the case: cyber criminals generally do not use the same account for their personal usage (e.g. to buy NFTs) and for their criminal activities. I used the same two scripts above to create the data for the normal Ethereum wallet IDs, as sketched below.
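In case it is useful, here is a rough sketch of that step: the same txlist endpoint is queried for the NFT contract, and the senders are treated as normal wallets. The contract address is a placeholder, and the labeling/merge at the end assumes the feature script above has already produced one CSV per group; adapt the names to your own setup.

import pandas as pd
import requests

NFT_CONTRACT = "0x0000000000000000000000000000000000000000"  # placeholder for the contract I picked

url = ("https://api.etherscan.io/api?module=account&action=txlist"
       "&address=" + NFT_CONTRACT + "&startblock=0&endblock=99999999"
       "&page=1&offset=10000&sort=asc&apikey=" + ETHERSCAN_API_KEY)
txs = requests.get(url).json().get("result", [])

# Everyone who sent a transaction to the NFT contract counts as a "normal" wallet
normal_addresses = sorted({t["from"].lower() for t in txs})

# After running the two scripts above for both groups, label and merge the features
fraud_df = pd.read_csv("fraud_features.csv").assign(fraud=1)
normal_df = pd.read_csv("normal_features.csv").assign(fraud=0)
pd.concat([fraud_df, normal_df]).to_csv("training_data.csv", index=False)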

After having my data ready in a single CSV file, with a "fraud" column set to 1 or 0 to label each row, I was ready to train my model. I looked around AWS and GCP to see if I could achieve something very quickly. However, they both proved too complicated for the basic model I was trying to create. AWS offers ready-made models, but they are for very specific use cases. For example, if you want to check whether a credit card transaction is fraudulent, you can start using AWS Fraud Detector with just a few clicks, and the same goes for detecting the creation of fake accounts. However, if you want to build something custom, you have to build an entire flow: S3, SageMaker, write your code in SageMaker (maybe using XGBoost), create a Lambda function, expose your Lambda function through API Gateway, and solve all of the configuration problems you would face along the way. I needed something quick and simple.

So, I decided to use Dataiku. I knew Dataiku from one of my consultancy engagements, where I was responsible for setting up Dataiku for a global mining company's data science function, so I knew how easy it was to train a model and create an API service for scoring. It took me only a few hours to feed in my CSV file and create the flow below.

[Image: Dataiku flow screenshot]

Dataiku has a feature to compare different algorithms and report their success rates. In my case, all three supervised algorithms performed almost the same, so I chose the simplest one: logistic regression. I found this feature very powerful. Normally, if I had to do all of that from scratch, it would have taken me weeks; with Dataiku, it took only a few hours.
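Dataiku does this comparison visually, but to give a sense of what it amounts to, here is a rough local equivalent using scikit-learn. This is not what Dataiku runs internally, and the two algorithms I pit against logistic regression here are just examples, not necessarily the ones Dataiku compared in my flow; the CSV and column names follow the sketches above.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("training_data.csv")
X = df.drop(columns=["address", "fraud"])
y = df["fraud"]

# Compare a few supervised classifiers with 5-fold cross-validated ROC AUC
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")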

Day - 3

The last step was to expose an API to classify a given data set as fraud or not. This proved to be complicated, as the documentation missed the requirement to create an extension service in order to be able to create an API. It took me half a day to find out what the problem was. Luckily, the Dataiku Forum was very responsive. If you are interested in seeing the problem I had and the solution, this is the post I created in the Dataiku Community.

I quickly integrated my Lambda function with the Dataiku API service. The service provides not only the "1" or "0" classification but also the probability of that decision. I used the probability as well, which helped me create a score rather than a black-and-white result. I, shamefully, added a few hard-coded rules on top of the scoring. Hopefully, when I improve the ML model, I will be able to remove this hard-coded interference.
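Here is a sketch of that Lambda-side integration. The endpoint URL, the response shape (a prediction plus per-class probabilities), and the hard-coded rule at the end are all assumptions for illustration; check the Dataiku API node documentation for the exact request and response contract of your deployment.

import json
import urllib.request

# Placeholder URL for the deployed Dataiku API service endpoint
DATAIKU_PREDICT_URL = "https://my-dataiku-node.example.com/public/api/v1/ethscore/fraud/predict"

def score_wallet(features):
    # Ask the Dataiku API service for a classification and its probability
    payload = json.dumps({"features": features}).encode()
    request = urllib.request.Request(
        DATAIKU_PREDICT_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        result = json.load(response)["result"]  # assumed response shape

    fraud_probability = result["probas"]["1"]  # probability of the "fraud" class

    # Use the probability to build a 0-100 trust score instead of a yes/no answer
    score = round((1 - fraud_probability) * 100)

    # An example of the kind of hard-coded rule mentioned above (illustrative)
    if features.get("total_transactions", 0) == 0:
        score = min(score, 50)  # cap brand-new wallets at a neutral score
    return score

I used urllib from the standard library in this sketch because, unlike requests, it is available in the AWS Lambda Python runtime without packaging extra dependencies.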

If you want to learn more about Dataiku or need some help to get started, let me know. I will be happy to help.

Day - 4

Now it was time to add the visualisation to the page. It felt like I had made the right choice in using MUI, because it had the exact component I needed to show the result. I added a new React component that takes the score information and displays it as below.

[Image: ethscore.net screenshot]

This was the second blog post of the "Creation of ethscore.net in 16 days" series. In the next blog post, the third one, I will share the AWS architecture of this solution.
