Tagging schemas are a prerequisite to successful use of AI for cybersecurity
I'm going to apologize to Jacob Williams (known to many as @MalwareJake on Twitter) right up front, because I will likely misquote his opening IANS presentation from two days ago. It's not on purpose, and the gist of the quote is still accurate. It went something like this: "Data normalization is the challenge that is inhibiting the successful application of artificial intelligence in cybersecurity."
In context, Jake was talking about the importance of developing a tagging schema. As a simplistic example, consider tags that might be applied to an application:
In other words, it isn't sufficient just to HAVE tags or even to apply them (as shown in this example); you must also follow a defined tagging SCHEMA.
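To make the difference between "having tags" and "following a schema" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the tag keys (environment, data-classification, owner), the allowed values, and the function names are hypothetical and do not come from Jake's talk or from any particular product. It normalizes tag keys and values to one canonical form and then reports anything that falls outside the schema.

```python
# Hypothetical tagging schema: the keys and allowed values below are
# illustrative assumptions, not taken from any real standard or product.
TAG_SCHEMA = {
    "environment": {"prod", "staging", "dev"},
    "data-classification": {"public", "internal", "confidential"},
    "owner": None,  # None means any non-empty value is accepted
}


def normalize_key(key):
    """Map variants like 'Data_Classification' to 'data-classification'."""
    return key.strip().lower().replace("_", "-").replace(" ", "-")


def validate_tags(tags):
    """Return (normalized_tags, problems) for a dict of raw tags."""
    normalized = {}
    problems = []
    seen = set()  # schema keys that were present, valid or not

    for raw_key, raw_value in tags.items():
        key = normalize_key(raw_key)
        value = str(raw_value).strip().lower()

        if key not in TAG_SCHEMA:
            problems.append(f"unknown tag key: {raw_key!r}")
            continue
        seen.add(key)

        allowed = TAG_SCHEMA[key]
        if not value:
            problems.append(f"empty value for tag {key!r}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value {raw_value!r} for tag {key!r}")
        else:
            normalized[key] = value

    # Every key in the schema is treated as required in this sketch.
    for key in TAG_SCHEMA:
        if key not in seen:
            problems.append(f"missing required tag: {key!r}")

    return normalized, problems


if __name__ == "__main__":
    raw = {"Environment": "PROD", "Data_Classification": "Confidential", "owner": "appsec"}
    clean, issues = validate_tags(raw)
    print(clean)
    print(issues)
```

The point of the sketch is that inconsistently written tags ("Environment" versus "environment", "PROD" versus "prod") collapse to a single form, while ad hoc tags such as a free-form "env" key are flagged rather than silently accepted. That consistency is what lets downstream tooling, AI or otherwise, treat the tags as reliable data.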
I asked ChatGPT itself what role data normalization plays in its own use, and it told me:
Data normalization plays a crucial role in AI tools like ChatGPT. Normalization refers to the process of scaling and transforming data to improve the performance and stability of machine learning models. Specifically, normalization aims to bring the input data into a consistent range, so that the model can learn effectively from the data and make accurate predictions.
In the case of ChatGPT, which is a language model designed for natural language processing tasks such as conversation generation, data normalization can help to ensure that the model can learn from a wide range of input texts with varying lengths, structures, and formats. By normalizing the input data, ChatGPT can effectively process and analyze the text, identify patterns and relationships in the language, and generate coherent and meaningful responses to user queries.
Additionally, data normalization can help to prevent overfitting, which occurs when a model is too closely tuned to the training data and performs poorly on new or unseen data. By normalizing the input data, the model can learn to generalize patterns in the language rather than memorizing specific examples from the training data.
And to put an exclamation point on the whole topic, consider this article from NBC News, which reports that OpenAI (the startup behind ChatGPT) actually employs an "army of contractors" to assist it with the necessary task of data labeling [normalization].
Quoting one of the interviewed contractors:
“We are grunt workers, but there would be no AI language systems without it,” one worker, Alexej Savreux, told NBC. “You can design all the neural networks you want, you can get all the researchers involved you want, but without labelers, you have no ChatGPT. You have nothing.”
Now, where the rubber meets the road: here's a practical call to action.
FULL DISCLOSURE: The author of this article is a co-founder of a shadow startup that is looking at how artificial intelligence can improve current security processes. While AI has arguably been overhyped and exaggerated historically, that doesn't change the fact that a properly designed and implemented AI system can radically benefit your security position [emphasis on properly]. Contact me privately on LinkedIn if you want to have a non-sales conversation to see what that might look like.