Military Use of Machine Learning “Magic Powder” in Gaza

Photo by Myko Makhlai on Unsplash

This is not a political viewpoint. It is based on my own investigation into how the techniques I love can be used or misused once the genie is out of the bottle.

Let’s go back to 2021, when Yossi Sariel took command of Israel's Unit 8200, became the architect of its AI strategy, and wrote the book “The Human Machine Team” (he was only unmasked as its author in 2024: https://www.theguardian.com/world/2024/apr/05/top-israeli-spy-chief-exposes-his-true-identity-in-online-security-lapse). Unit 8200 is part of the Israel Defense Forces (IDF) and is one of the world’s most powerful surveillance agencies, on par with the US National Security Agency.

The IDF has never denied the existence of a database of operatives in terrorist organizations, one that cross-checks existing intelligence on those operatives.

It was revealed today (https://www.theguardian.com/world/2024/apr/11/idf-colonel-discusses-data-science-magic-powder-for-locating-terrorists) that Unit 8200 has adopted a machine learning "Magic Powder" to help identify Hamas targets in Gaza. In their own words, they “...take the original sub-group, we calculate their close circles, we then calculate relevant features, and at last we rank the results and determine the threshold”. This "Magic Powder" is a Positive Unlabelled Learning classifier.

Let’s take Positive Unlabelled Learning apart to assess the limitations and risks of the IDF's machine learning application, and the possible issues with implementing machine-learning-guided decision support in a conflict arena.

Positive Unlabelled Classification is unlike common machine learning classification algorithms

Positive unlabelled learning is a family of machine learning classification methods developed for positive-unlabelled (PU) datasets. This is a challenging problem because positive instances are not all explicitly identified as positive. An essential requirement of machine learning is a set of labelled data with which to refine your model and assess its accuracy and precision; that is what allows the model to predict the label of a new observation. In a PU dataset, only some of the positives and none of the negatives are labelled. The problem becomes even harder with unbalanced datasets, where the proportion of positives is low.
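
To make the structure of a PU dataset concrete, here is a toy sketch in Python. The population size, the 5% positive rate, and the 40% labelling fraction are made-up numbers chosen purely for illustration; nothing here reflects any real dataset.

```python
# A small illustration of a positive-unlabelled (PU) dataset: the ground
# truth y exists in principle, but the analyst only ever sees s.
import numpy as np

rng = np.random.default_rng(42)
y = (rng.random(1000) < 0.05).astype(int)       # true class, 5% positives (assumed)
labelled = (y == 1) & (rng.random(1000) < 0.4)  # only ~40% of positives get flagged
s = labelled.astype(int)                        # what the model actually sees

print("true positives :", y.sum())
print("labelled (s=1) :", s.sum())              # a subset of the true positives
print("unlabelled     :", (s == 0).sum())       # a mixture of positives and negatives
```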

A pivotal reference on this problem (“Learning classifiers from only positive and unlabeled data”, 2008, Charles Elkan and Keith Noto) divides it into two phases: first train a classifier to distinguish labelled from unlabelled samples, then use Bayesian reasoning to convert its output into class probabilities. The steps, sketched in code after the list, are:

  1. Train a classifier to predict whether a sample is labelled or not: P(s=1|x)
  2. Use the classifier to estimate the probability that a positive sample is labelled, typically averaged over a held-out set of known positives: c = P(s=1|y=1)
  3. Use the classifier to predict the probability that a new sample x is labelled: P(s=1|x)
  4. Estimate the probability that x is truly positive: P(y=1|x) = P(s=1|x)/P(s=1|y=1)
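
Below is a minimal sketch of that two-phase estimator using scikit-learn on a synthetic dataset. The dataset, the 30% labelling fraction, and the choice of logistic regression are assumptions made for illustration only, not a description of any operational system.

```python
# A minimal sketch of the Elkan-Noto estimator on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Fully labelled data for simulation only; in a real PU problem y is unknown.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)

# Simulate the PU setting: only ~30% of true positives carry a label s=1.
rng = np.random.default_rng(0)
s = np.where((y == 1) & (rng.random(len(y)) < 0.3), 1, 0)

# Phase 1: train a "non-traditional" classifier that predicts P(s=1 | x).
X_train, X_hold, s_train, s_hold = train_test_split(X, s, test_size=0.2, random_state=0)
g = LogisticRegression(max_iter=1000).fit(X_train, s_train)

# Phase 2: estimate c = P(s=1 | y=1) as the mean score over held-out labelled positives.
c = g.predict_proba(X_hold[s_hold == 1])[:, 1].mean()

# Adjusted probability that a sample is truly positive: P(y=1|x) = P(s=1|x) / c.
p_pos = np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)  # the ratio can exceed 1; clip it
print("estimated c:", c)
```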

In the practical world, the result of these calculations is always a balance of probabilities that depends very much on the sample size, the proportion of unlabelled observations, the chosen classification algorithm, and the reliability of the labels in the training set. Most of these are fixed by the available data. Newer algorithms have been developed, but a single problem persists.

My key concern with all of these algorithms is how the IDF balances the number of false alarms against the failure to detect a credible risk. That choice changes both the way the machine learning model is refined and the confidence the IDF will place in its recommendations.

For all of these algorithms, the objective is to balance the influence of the input variables on the outcome. The output of a classifier can take many forms depending on your tolerance for errors. Calculated probabilities are combined with a cutoff to make the decision: yes/no, positive/negative, one/zero, guilty/innocent. This probability threshold is a tunable parameter, and tuning it is how you address the consequences of making decisions with a model that has known errors. In the criminal justice system, it is the familiar trade-off between convicting an innocent person and letting a guilty person go free.
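
To illustrate how moving that threshold trades false alarms against missed detections, here is a self-contained sketch with synthetic scores. The 2% positive rate, the score distributions, and the thresholds are all arbitrary assumptions.

```python
# A self-contained sketch of the false-alarm vs. missed-detection trade-off.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
y_true = rng.random(10_000) < 0.02                  # 2% true positives (assumed rate)
scores = np.where(y_true,
                  rng.beta(5, 2, 10_000),           # positives score high on average
                  rng.beta(2, 5, 10_000))           # negatives score low on average

for threshold in (0.3, 0.5, 0.7, 0.9):
    y_pred = scores >= threshold
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"threshold={threshold:.1f}  false alarms={fp:5d}  missed positives={fn:4d}")
```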

If a machine learning model is part of a decision support system, it has to clear a number of hurdles.

  • What is your definition of the positive or negative classification? (building, critical infrastructure, vehicle, person of interest, sympathizer, terrorist, leader)
  • What are your actions as a result of a positive classification? (surveillance, interrogation, campaign, lethal force)
  • What is your metric for refining the model? (recall, accuracy, precision, F1 score, etc.; see the sketch after this list)
  • How accurate is your classification data for labeled subjects?
  • How large is your training set?
  • How well does the algorithm perform on unbalanced data?
  • How have you weighted the consequences of false positives versus false negatives?
  • Is the output a classification (yes/no) or a probability?
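
On the metric question in particular, unbalanced data can make a useless model look impressive. The sketch below uses a deliberately silly model (one that never flags anyone) and a made-up 1% positive rate to show how accuracy alone can mislead.

```python
# A minimal sketch of why metric choice matters on unbalanced data.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = np.array([1] * 10 + [0] * 990)   # 1% positives (assumed imbalance)
y_pred = np.zeros_like(y_true)            # a model that never raises an alarm

print("accuracy :", accuracy_score(y_true, y_pred))                     # 0.99, looks excellent
print("recall   :", recall_score(y_true, y_pred, zero_division=0))      # 0.0, finds no one
print("precision:", precision_score(y_true, y_pred, zero_division=0))   # undefined -> 0
print("F1 score :", f1_score(y_true, y_pred, zero_division=0))          # 0.0
```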

Alastair Muir, PhD, BSc, BEd, MBB

Data Science Consultant | @alastairmuir.bsky.social | Risk Analysis and Optimization

7 months ago

Another system, used to identify buildings and structures as targets, is called "The Gospel". Nothing has been said about the technology behind that application: https://www.theguardian.com/world/2023/dec/01/the-gospel-how-israel-uses-ai-to-select-bombing-targets#:~:text=The%20IDF%20said%20that%20%E2%80%9Cthrough,carried%20out%20by%20a%20person%E2%80%9D.

Alastair Muir, PhD, BSc, BEd, MBB

Data Science Consultant | @alastairmuir.bsky.social | Risk Analysis and Optimization

7 months ago

More details about the "Lavender" system: it identified 37,000 Hamas targets (people). Its accuracy was quoted as 90%, but accuracy for a classifier says nothing about the false positive or false negative rates; whenever positives make up less than 10% of the population, a model that flags no one at all already scores above 90% accuracy. https://www.theguardian.com/world/2024/apr/03/israel-gaza-ai-database-hamas-airstrikes

Aaron Sheldon

Scientific Consultant | A Big Maths data unicorn pursuing unicorn projects

7 months ago

Prosecutor's fallacy writ large... used to justify genocide.
