Finding your wifi password in 512-dimensional space

We recently participated in Kaggle’s and AI Village’s capture-the-flag competition at DEFCON, which covers various aspects of security in data science. We’d like to tell you about some of the mini-challenges we solved and their implications. Perhaps the one with the biggest wow effect was finding a wifi password in a really high-dimensional space: a mixture of data science and escape room, as we will see. The challenge text stated:

You really need to check your email, unfortunately you don't know the password. Fortunately, someone wrote it down. Unfortunately, it's written down on a low-dimensional manifold embedded in a very high-dimensional space. Check out the wifi/Embedded characters.npz file -- a list of tokens is given in the tokens key with their corresponding embeddings in the same order under the embeddings key -- and recover the password.

A little confusing, you say? No problem, let’s apply some investigative Data Science to figure it out!
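A first investigative step is simply opening the file. Below is a minimal sketch with NumPy, assuming the archive stores the two arrays under the `tokens` and `embeddings` keys exactly as the challenge text describes; `load_embedded_characters` is a hypothetical helper name, and the file path is left to the caller.

```python
import numpy as np


def load_embedded_characters(path):
    """Return (tokens, embeddings) from the challenge's .npz archive.

    Assumption: the archive holds a 'tokens' array (one entry per row) and an
    'embeddings' array with one row per token, as the challenge text says.
    """
    # For .npz files np.load returns a lazy NpzFile; the context manager
    # closes the underlying file once both arrays have been read into memory.
    # Note: if 'tokens' were stored with dtype=object, np.load would need
    # allow_pickle=True, which is only safe for files you trust.
    with np.load(path) as data:
        tokens = data["tokens"]
        embeddings = data["embeddings"]
    # Sanity check: each token should have exactly one embedding row.
    if len(tokens) != embeddings.shape[0]:
        raise ValueError("expected one embedding row per token")
    return tokens, embeddings
```

Calling it with the path to the challenge file and printing `embeddings.shape` is a quick way to confirm the matrix size before going further.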

We are given two items: a large 188x512 matrix…

[Figure: the 188x512 embeddings matrix]

…and the following token sequence, which happens to contain 188 characters (every printable ASCII character from ! to ~, each appearing twice): !!""##$$%%&&''(())**++,,--..//00112233445566778899::;;<<==>>??@@AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ[[\\]]^^__``aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~

What now? Each character in the token sequence seems to correspond to a row in the matrix, but 512 dimensions are a bit much to analyze comfortably. Could we reduce the number of dimensions to something manageable? Well, yes, we can! Meet Principal Component Analysis (affectionately known among data scientists as “PCA”): an algorithm that reduces a huge 188x512 matrix (i.e. 188 points with 512 dimensions each) to one with fewer dimensions while keeping as much of the original data’s variance as possible. Something like 188x2 would be nice and easy to visualize. Plotting the result produces the following beautiful graph:

[Figure: 2D PCA projection of the embeddings; the points form a spiral]
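The projection step can be sketched in plain NumPy via the singular value decomposition, which is one standard way of computing PCA. This is an illustration, not necessarily the exact tool we used: `pca_2d` is a hypothetical helper, and scikit-learn’s `PCA(n_components=2).fit_transform(embeddings)` should give the same projection up to the sign of each axis. The sketch also reports how much of the total variance the two components keep, which tells you whether the data really lives on a (roughly) two-dimensional manifold.

```python
import numpy as np


def pca_2d(embeddings):
    """Project each row of `embeddings` onto its first two principal components.

    Plain-NumPy PCA: center the data, take its SVD, and keep the two
    directions with the largest singular values (i.e. the most variance).
    Returns the (n, 2) projection and the fraction of variance it retains.
    """
    X = np.asarray(embeddings, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)
    # Squared singular values are proportional to the variance along each axis.
    explained = (singular_values[:2] ** 2).sum() / (singular_values ** 2).sum()
    return X_centered @ vt[:2].T, explained
```

To reproduce a plot like the one above, matplotlib’s `plt.scatter(proj[:, 0], proj[:, 1])` draws the points, and calling `plt.annotate(token, (x, y))` for each point labels them with their characters.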

Incredible! Now the question remains: how do we extract the password from here? After some thinking, the escape-room-savvy reader may already have guessed it: assign the i-th character in the token sequence to the i-th point (the i-th row of the matrix), then read the characters along the spiral from the center outwards: FLAG{TURNED}0123456789abcdefghijklmnopqrstu… and so on. That’s your password!
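The reading step can be sketched as sorting the projected points by their distance from the spiral’s center and concatenating the matching tokens. This assumes the spiral’s radius grows steadily along the curve, so that radial order matches the order along the spiral. `read_spiral_outwards` is a hypothetical helper, and defaulting `center` to the centroid of the points is only an approximation, since a spiral’s centroid need not coincide with its true center.

```python
import numpy as np


def read_spiral_outwards(points_2d, tokens, center=None):
    """Concatenate tokens ordered by their point's distance from `center`.

    Assumption: the points lie on a spiral whose radius increases
    monotonically outwards, so sorting by radius recovers the order along
    the curve. `center` defaults to the centroid of the points (an
    approximation of the spiral's real center).
    """
    pts = np.asarray(points_2d, dtype=float)
    if center is None:
        center = pts.mean(axis=0)
    # Distance of every point from the assumed spiral center.
    radii = np.linalg.norm(pts - np.asarray(center, dtype=float), axis=1)
    # Innermost point first; a stable sort keeps ties in row order.
    order = np.argsort(radii, kind="stable")
    return "".join(str(tokens[i]) for i in order)
```

Here `points_2d` would be the two-dimensional PCA projection of the embeddings and `tokens` the token sequence in the same row order. If the result comes out scrambled, passing the innermost point read off the plot as `center` is a reasonable next try.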

Why is this relevant to security and data science? First, it shows how powerful dimensionality reduction can be: it turns an intractable problem into an intuitive one. Most importantly, however, it demonstrates how to hide information in plain sight. Who needs quantum-computer-proof encryption algorithms if you can hide a secret with a little help from statistics?

Stay tuned for more exciting data science posts!


More articles by qdive
