Finding your wifi password in 512-dimensional space
We recently participated in Kaggle’s and AI Village’s Capture the Flag @ DEFCON competition, which explores different aspects of security in data science. We’d like to tell you about some of the mini-challenges we solved and their implications. Perhaps the one with the biggest wow effect was finding a wifi password in a very high-dimensional space: a mixture of data science and escape room, as we will see. The challenge text stated:
You really need to check your email, unfortunately you don't know the password. Fortunately, someone wrote it down. Unfortunately, it's written down on a low-dimensional manifold embedded in a very high-dimensional space. Check out the wifi/Embedded characters.npz file -- a list of tokens is given in the tokens key with their corresponding embeddings in the same order under the embeddings key -- and recover the password.
A little confusing, you say? No problem, let’s apply some investigative Data Science to figure it out!
We are given two items: A large 182x512 matrix…
and the following token sequence, which happens to contain 182 characters: !!""##$$%%&&''(())**++,,--..//00112233445566778899::;;<<==>>??@@AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ[[\\]]^^__``aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~
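For readers who want to follow along, here is a minimal sketch of how the two items could be loaded with NumPy. The key names `tokens` and `embeddings` come from the challenge text; the helper name `load_challenge` is our own, and the handling of a single-string `tokens` entry is an assumption about how the archive might be stored, not something we verified here.

```python
import numpy as np


def load_challenge(path):
    """Return (tokens, embeddings) from an .npz archive.

    Assumption: 'tokens' is either one string or an array of
    single characters; both cases are turned into a list of str.
    """
    with np.load(path) as data:
        raw_tokens = data["tokens"]
        embeddings = np.asarray(data["embeddings"], dtype=float)

    if raw_tokens.ndim == 0:
        # A single string saved as a 0-d array: split into characters.
        tokens = list(str(raw_tokens.item()))
    else:
        tokens = [str(t) for t in raw_tokens]

    if len(tokens) != embeddings.shape[0]:
        raise ValueError("expected one embedding row per token")
    return tokens, embeddings


# Hypothetical usage with the file named in the challenge text:
# tokens, embeddings = load_challenge("wifi/Embedded characters.npz")
# print(len(tokens), embeddings.shape)
```

The length check simply enforces the "same order" promise from the challenge text: one row of the matrix per token.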
What now? Each character in the token sequence seems to correspond to a row in the matrix, but 512 dimensions are a little too many to analyze comfortably. Could we reduce the number of dimensions to something manageable? Well, yes, we can! Meet Principal Component Analysis (affectionately known among data scientists as “PCA”): an algorithm that reduces a huge 182x512 matrix (i.e. 182 points with 512 dimensions each) to something with fewer dimensions, while preserving as much of the variance in the original data as possible. Something like 182x2 would be nice and easy to visualize. We can plot the results to produce the following beautiful graph:
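The reduction step can be sketched in a few lines. This is not necessarily the code we ran during the competition: it is a NumPy-only PCA via singular value decomposition, and because the challenge file is not bundled here, it runs on a synthetic stand-in (a flat spiral hidden in 512 dimensions with a little noise). The function name `pca_project` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # PCA works on centered data
    # Rows of Vt are the principal directions, strongest first.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Synthetic stand-in for the challenge data (assumption, not the real file):
# a 2-D spiral embedded in 512 dimensions plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(1.0, 6 * np.pi, 182)
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
basis, _ = np.linalg.qr(rng.normal(size=(512, 2)))   # 512x2, orthonormal columns
X = spiral @ basis.T + 0.01 * rng.normal(size=(182, 512))

points = pca_project(X)                  # shape (182, 2)
kept = points.var(axis=0).sum() / X.var(axis=0).sum()
print(points.shape, f"variance kept: {kept:.4f}")
```

With the real embeddings, `points[:, 0]` and `points[:, 1]` are what you would hand to a scatter plot, labeling each point with its token. Note that the sign of each component is arbitrary, so the picture may come out mirrored.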
Incredible! Now the question remains: how do we extract the password from here? After some thinking, the escape room-savvy reader may have guessed it: assign the i-th character in the token sequence to the i-th point in the spiral, then read the result outwards: FLAG{TURNED}0123456789abcdefghijklmnopqrstu… and so on. That’s your password!
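Reading the spiral by eye works, but it can also be roughly automated. The sketch below sorts the 2-D points by their distance from the spiral’s center and joins the matching tokens in that order. It is an illustration on a toy secret, not the real flag: the function name `read_outwards` is ours, and the center has to be supplied (for real data you would pick it from the plot). The centroid fallback is only a crude guess and can misorder points on an actual spiral.

```python
import numpy as np


def read_outwards(tokens, points, center=None):
    """Join tokens in order of increasing distance from `center`."""
    points = np.asarray(points, dtype=float)
    if center is None:
        # Crude assumption: the spiral is centered on the centroid.
        center = points.mean(axis=0)
    radius = np.linalg.norm(points - np.asarray(center, dtype=float), axis=1)
    order = np.argsort(radius, kind="stable")
    return "".join(tokens[i] for i in order)


# Toy example (made-up secret): lay the characters out along a spiral,
# then shuffle them so the list order no longer reveals the message.
secret = "FLAG{DEMO}"
t = np.linspace(1.0, 4 * np.pi, len(secret))
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
perm = np.random.default_rng(0).permutation(len(secret))
tokens = [secret[i] for i in perm]
points = spiral[perm]

recovered = read_outwards(tokens, points, center=(0.0, 0.0))
print(recovered)
```

Sorting by radius works here because the toy spiral’s distance from the origin grows steadily along the curve. If the real spiral is off-center or noisy, walking from point to nearest unvisited point would be a more robust alternative.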
Why is this relevant to security and data science? On the one hand, it shows how powerful dimensionality reduction can be: it transforms an intractable problem into an intuitive one. Most importantly, however, it demonstrates how information can be hidden in plain sight. Who needs quantum computer-proof encryption algorithms if you can hide your secrets with a little help from statistics?
Stay tuned for more exciting data science posts!