Military Use of Machine Learning “Magic Powder” in Gaza

Photo by Myko Makhlai on Unsplash

This is not a political viewpoint. It is based on my own investigation into how the techniques I love can be used or misused once the genie is out of the bottle.

Let’s go back to 2021, when Yossi Sariel took command of Israel's Unit 8200, became the architect of its AI strategy, and wrote the book “The Human Machine Team” (he was only unmasked as its author in 2024: https://www.theguardian.com/world/2024/apr/05/top-israeli-spy-chief-exposes-his-true-identity-in-online-security-lapse). Unit 8200 is part of the Israel Defense Forces (IDF) and is one of the world’s most powerful surveillance agencies, on par with the US National Security Agency.

The IDF has never denied the existence of a database of operatives in terrorist organizations, one that cross-checks existing intelligence on those operatives.

It was revealed today (https://www.theguardian.com/world/2024/apr/11/idf-colonel-discusses-data-science-magic-powder-for-locating-terrorists) that Unit 8200 has adopted a machine learning "Magic Powder" to help identify Hamas targets in Gaza. In their own words, they “...take the original sub-group, we calculate their close circles, we then calculate relevant features, and at last we rank the results and determine the threshold”. This "Magic Powder" is a Positive Unlabelled Learning classifier.

Let’s take Positive Unlabelled Learning apart to assess the limitations and risks of the IDF's machine learning application, and the possible issues with implementing machine-learning-guided decision support in a conflict arena.

Positive Unlabelled Classification is unlike common machine learning classification algorithms

Positive unlabelled learning is a family of machine learning classification methods developed for positive-unlabelled (PU) datasets. This is a challenging problem because positive instances are not all explicitly identified as positive. An essential requirement of machine learning is a set of labelled data with which to refine your model and assess its accuracy and precision; that is what allows the model to predict the label of a new observation. In a PU dataset, only some of the positives and none of the negatives are labelled. The problem becomes even harder with unbalanced datasets, where the proportion of positives is low.
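
To make the structure of a PU dataset concrete, here is a toy sketch in Python. The population size, the 5% positive rate, and the 40% labelling fraction are made-up numbers chosen purely for illustration; nothing here reflects any real dataset.

```python
# A small illustration of a positive-unlabelled (PU) dataset: the ground
# truth y exists in principle, but the analyst only ever sees s.
import numpy as np

rng = np.random.default_rng(42)
y = (rng.random(1000) < 0.05).astype(int)       # true class, 5% positives (assumed)
labelled = (y == 1) & (rng.random(1000) < 0.4)  # only ~40% of positives get flagged
s = labelled.astype(int)                        # what the model actually sees

print("true positives :", y.sum())
print("labelled (s=1) :", s.sum())              # a subset of the true positives
print("unlabelled     :", (s == 0).sum())       # a mixture of positives and negatives
```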

A pivotal reference on this problem (“Learning classifiers from only positive and unlabeled data”, 2008, Charles Elkan and Keith Noto) divides it into two phases: first train a classifier to distinguish labelled from unlabelled samples, then use Bayesian reasoning to convert its output into class probabilities. The steps, sketched in code after the list, are:

  1. Train a classifier to predict whether a sample is labelled or not: P(s=1|x)
  2. Use the classifier to estimate the probability that a positive sample is labelled, typically averaged over a held-out set of known positives: c = P(s=1|y=1)
  3. Use the classifier to predict the probability that a new sample x is labelled: P(s=1|x)
  4. Estimate the probability that x is truly positive: P(y=1|x) = P(s=1|x)/P(s=1|y=1)
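
Below is a minimal sketch of that two-phase estimator using scikit-learn on a synthetic dataset. The dataset, the 30% labelling fraction, and the choice of logistic regression are assumptions made for illustration only, not a description of any operational system.

```python
# A minimal sketch of the Elkan-Noto estimator on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Fully labelled data for simulation only; in a real PU problem y is unknown.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)

# Simulate the PU setting: only ~30% of true positives carry a label s=1.
rng = np.random.default_rng(0)
s = np.where((y == 1) & (rng.random(len(y)) < 0.3), 1, 0)

# Phase 1: train a "non-traditional" classifier that predicts P(s=1 | x).
X_train, X_hold, s_train, s_hold = train_test_split(X, s, test_size=0.2, random_state=0)
g = LogisticRegression(max_iter=1000).fit(X_train, s_train)

# Phase 2: estimate c = P(s=1 | y=1) as the mean score over held-out labelled positives.
c = g.predict_proba(X_hold[s_hold == 1])[:, 1].mean()

# Adjusted probability that a sample is truly positive: P(y=1|x) = P(s=1|x) / c.
p_pos = np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)  # the ratio can exceed 1; clip it
print("estimated c:", c)
```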

In the practical world, the result of these calculations is always a balance of probabilities that depends very much on the sample size, the proportion of unlabelled observations, the chosen classification algorithm, and the reliability of the labels in the training set. Most of these are fixed by the available data. Newer algorithms have been developed, but a single problem persists.

My key concern with all of these algorithms is how the IDF balances the number of false alarms against the failure to detect a credible risk. That choice changes both the way the machine learning model is refined and the confidence the IDF will place in its recommendations.

For all of these algorithms, the objective is to balance the influence of the input variables on the outcome. The output of a classifier can take many forms depending on your tolerance for errors. Calculated probabilities are combined with a cutoff to make the decision: yes/no, positive/negative, one/zero, guilty/innocent. This probability threshold is a tunable parameter, and tuning it is how you address the consequences of making decisions with a model that has known errors. In the criminal justice system, it is the familiar trade-off between convicting an innocent person and letting a guilty person go free.
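
To illustrate how moving that threshold trades false alarms against missed detections, here is a self-contained sketch with synthetic scores. The 2% positive rate, the score distributions, and the thresholds are all arbitrary assumptions.

```python
# A self-contained sketch of the false-alarm vs. missed-detection trade-off.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
y_true = rng.random(10_000) < 0.02                  # 2% true positives (assumed rate)
scores = np.where(y_true,
                  rng.beta(5, 2, 10_000),           # positives score high on average
                  rng.beta(2, 5, 10_000))           # negatives score low on average

for threshold in (0.3, 0.5, 0.7, 0.9):
    y_pred = scores >= threshold
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"threshold={threshold:.1f}  false alarms={fp:5d}  missed positives={fn:4d}")
```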

If a machine learning model is part of a decision support system, it has to clear a number of hurdles.

  • What is your definition of the positive or negative classification? (building, critical infrastructure, vehicle, person of interest, sympathizer, terrorist, leader)
  • What are your actions as a result of a positive classification? (surveillance, interrogation, campaign, lethal force)
  • What is your metric for refining the model? (recall, accuracy, precision, F1 score, etc.; see the sketch after this list)
  • How accurate is your classification data for labeled subjects?
  • How large is your training set?
  • How well does the algorithm perform on unbalanced data?
  • How have you weighted the consequences of false positives versus false negatives?
  • Is the output a classification (yes/no) or a probability?
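
On the metric question in particular, unbalanced data can make a useless model look impressive. The sketch below uses a deliberately silly model (one that never flags anyone) and a made-up 1% positive rate to show how accuracy alone can mislead.

```python
# A minimal sketch of why metric choice matters on unbalanced data.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = np.array([1] * 10 + [0] * 990)   # 1% positives (assumed imbalance)
y_pred = np.zeros_like(y_true)            # a model that never raises an alarm

print("accuracy :", accuracy_score(y_true, y_pred))                     # 0.99, looks excellent
print("recall   :", recall_score(y_true, y_pred, zero_division=0))      # 0.0, finds no one
print("precision:", precision_score(y_true, y_pred, zero_division=0))   # undefined -> 0
print("F1 score :", f1_score(y_true, y_pred, zero_division=0))          # 0.0
```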

Alastair Muir, PhD, BSc, BEd, MBB

Data Science Consultant | @alastairmuir.bsky.social | Risk Analysis and Optimization

7 months ago

Another system, used to identify buildings and structures as targets, is called "The Gospel". Nothing has been said about the technology behind that application: https://www.theguardian.com/world/2023/dec/01/the-gospel-how-israel-uses-ai-to-select-bombing-targets#:~:text=The%20IDF%20said%20that%20%E2%80%9Cthrough,carried%20out%20by%20a%20person%E2%80%9D.

Alastair Muir, PhD, BSc, BEd, MBB

Data Science Consultant | @alastairmuir.bsky.social | Risk Analysis and Optimization

7 months ago

More details about the "Lavender" system: it identified 37,000 Hamas targets (people). Its accuracy was quoted as 90%, but accuracy for a classifier says nothing about the false positive or false negative rates; whenever positives make up less than 10% of the population, a model that flags no one at all already scores above 90% accuracy. https://www.theguardian.com/world/2024/apr/03/israel-gaza-ai-database-hamas-airstrikes

Aaron Sheldon

Scientific Consultant | A Big Maths data unicorn pursuing unicorn projects

7 months ago

Prosecutor's fallacy writ large... used to justify genocide.
