Honeypots, Hackers and SIEMs, Oh My! - Part 2
If I recall correctly, we left off right about here:
latitude:49.84441,longitude:24.02543,destinationhost:honeypot-vm2,username:admin,sourcehost:xxx.xxx.xxx.xxx,state:null, country:Ukraine,label:Ukraine - xxx.xxx.xxx.xxx,timestamp:2022-07-28 15:37:50
latitude:49.84441,longitude:24.02543,destinationhost:honeypot-vm2,username:morefailure,sourcehost:xxx.xxx.xxx.xxx,state:null, country:Ukraine,label:Ukraine - xxx.xxx.xxx.xxx,timestamp:2022-07-28 16:02:52
For those who did not read the inspired genius that was my first post, I'll summarize what this wall of text means.
Now all that I had to do was configure my custom logs, pass them to Sentinel and plot the results on a map.
The second half of the project was more streamlined in several ways - it demanded more attention to detail, but far less "welp, the original tutorial isn't relevant anymore, figure it out" improvisation than the first half.
After I had launched my VM and script, I took a brief break from the tutorial to wait and see where my first attempted breach would come from in real time. A certain childish excitement set in and several minutes went by as my eyes bored into my screen, eagerly anticipating the moment when the first l33t hax0r would fall into my trap.
To my bemusement, the internet's elitest hackers weren't as interested in my honeypot as I had hoped, so I got up to make myself some tea. Of course, by the time I had returned, the unwatched pot had boiled.
For those of you who didn't bring your magnifying glasses, the first country to try to get into my VM was apparently Finland (or someone working through a VPN routed through Finland).
Now that I was 100% sure that my VM was out on the open internet and attracting attention, it was time to proceed with the tutorial and set up my custom logs.
Part 3 - Log Configuration
Some of you diligent readers may be asking a simple question - "What had to be configured, exactly? All of the information you need is right in the text file!" ...and you'd be partially right.
Take a look at the example logs above. When you, the human, look at a chunk of text such as "latitude:49.84441,", you actually see several things at once without realizing it.
Computers make no inherent assumptions about the values of text unless they are taught how to, and that was the next task at hand - turning "xxxxxxxx:xx.xxxxx," into "<category>:<numerical value containing a decimal point>," so that the LAW (Log Analytics Workspace) correctly parsed the data, then replicating this for each data type I wanted to pass into my fancy-schmancy map.
Let's stick with latitude as an example. I wanted a numerical value to use as "latitude_CF" for my map, with "CF" being shorthand for "custom field". In order to do that, I needed the LAW to identify and extract that piece of information from the raw data supplied by my PowerShell script, then pass it to Sentinel without my assistance.
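Conceptually, what the LAW has to learn here is simple delimiter logic: split each line on commas, then split each field on the first colon only, so values that contain colons of their own (like timestamps) survive intact. Here's a minimal Python sketch of that idea - purely illustrative, not what Azure actually runs internally:

```python
# Illustrative sketch of the field extraction the LAW learns from examples.
# Split on commas to separate the fields, then on the FIRST colon only, so
# values that themselves contain colons (like timestamps) stay intact.

SAMPLE = ("latitude:60.17116,longitude:24.93265,destinationhost:honeypot-vm2,"
          "username:AZUREUSER,sourcehost:95.217.117.163,state:South Finland,"
          "label:Finland - 95.217.117.163,timestamp:2022-07-29 12:55:13")

def parse_log_line(line: str) -> dict:
    fields = {}
    for chunk in line.split(","):
        key, _, value = chunk.partition(":")   # split on the first colon only
        fields[key.strip()] = value.strip()
    # Numeric custom fields get a numeric type, e.g. latitude_CF
    fields["latitude_CF"] = float(fields.pop("latitude"))
    fields["longitude_CF"] = float(fields.pop("longitude"))
    return fields

print(parse_log_line(SAMPLE)["latitude_CF"])   # 60.17116
```

The "_CF" names mirror my custom fields; everything else in the sketch is just to make the delimiter logic concrete.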
The first step in teaching the algorithm what to extract was opening the raw data.
latitude:60.17116,longitude:24.93265,destinationhost:honeypot-vm2,username:AZUREUSER,sourcehost:95.217.117.163,state:South Finland,label:Finland - 95.217.117.163,timestamp:2022-07-29 12:55:13
The next step was highlighting the numerical value for latitude that I wanted to extract (60.17116) and designating its data type. In other words, I told the algorithm that I expected to receive a number as output.
At this point, the LAW algorithm would take my example value, parse a bunch of other log values on its own and return what it thought were the correct values.
As the above example shows, the algorithm did pretty well with numerical values that a) came after a colon, b) contained a decimal point and c) had a delimiter (,) after them. I scrolled through the search results and confirmed that it had, in fact, identified the exact data points I wanted to extract.
This process then had to be repeated for every single data field I wanted to export to Sentinel, and the algorithm predictably needed a bit of help with certain values.
(The black boxes were my computer's IP address, which I've masked for obvious reasons).
For example, the algorithm inexplicably thought I wanted longitude when I was trying to extract the destination host.
"Countries" and "labels" with spaces in the names (such as "United States") also required a bit of extra editing, no bother, really. Just a bit of attention to detail over time and voila - a short time later, I had my custom fields defined!
I alt-tabbed back into my VM to see how things were going, and sure enough, the login attempts were now coming in by the hundreds! It was time to put my newly minted custom logs to the test.
First, I ran a query in the "Logs" section of my LAW. What I was hoping to see was data that had been extracted correctly (no empty values), categorized correctly (no longitudes where there should be latitudes, etc.) and parsed in real time.
It took a few tries and some tweaking of the query, but ultimately I got what I was looking for:
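Spelled out, the sanity checks I was eyeballing in those query results amount to something like the following (the field names match my custom fields; the numeric ranges are simply the valid latitude/longitude bounds):

```python
def record_looks_valid(rec: dict) -> bool:
    """Sanity checks mirroring what I verified by eye in the Logs results."""
    required = ("sourcehost_CF", "country_CF", "label_CF", "destinationhost_CF")
    if any(not rec.get(k) for k in required):       # no empty values
        return False
    if not -90.0 <= rec["latitude_CF"] <= 90.0:     # latitudes are latitudes
        return False
    if not -180.0 <= rec["longitude_CF"] <= 180.0:  # longitudes are longitudes
        return False
    return True

sample = {"sourcehost_CF": "95.217.117.163", "country_CF": "Finland",
          "label_CF": "Finland - 95.217.117.163",
          "destinationhost_CF": "honeypot-vm2",
          "latitude_CF": 60.17116, "longitude_CF": 24.93265}
print(record_looks_valid(sample))   # True
```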
There were lots of attempts from a user called "的用户帐户" as well - Google Translate tells me that this means "user account" in Chinese.
Now that the LAW was identifying and extracting the information I needed it to through custom logs, I was ready to finally move on to creating my map!
Part 4 - Mapping the Data
In many ways, this was the most straightforward section of the entire project. In exactly one way, it was the least straightforward section of the entire project.
Mystery Alert: there is one issue that I wasn't able to solve (but was able to troubleshoot), so maybe you computer-savvy folks out there can let me know what exactly happened in the comments.
I leaned on Josh Madakor a bit for this part, as he had written a log query that allowed the custom log data to be plotted on a map in Sentinel.
FAILED_RDP_WITH_GEO_CL |
summarize event_count=count() by sourcehost_CF, latitude_CF, longitude_CF, country_CF, label_CF, destinationhost_CF |
where destinationhost_CF != "samplehost" |
where sourcehost_CF != ""
I get strong SQL vibes from these queries, by the way. Anyhoo, the mystery of this query is that it absolutely refused to generate a map and the resulting error message was so vague as to be un-Googleable.
I tried all the standard troubleshooting approaches - created a new map, verified that the other aspects of my data pipeline were working, etc. Nothing.
After a bit of messing around with the query itself, I found the source of the trouble - <where sourcehost_CF != "">
The purpose of the above line is to exclude data where the sourcehost (the IP address of the attacker) has been left blank or hasn't been provided. For whatever reason, this line prevented the map from opening as it should have, and I haven't the slightest idea why.
In any case, I just removed the line and went forward with my mapping. And wouldn't you know it, at long last Sentinel did exactly what I wanted it to.
At this point, the majority of my decisions were now just aesthetic in nature, but I settled on the following:
And that was that! The only thing keeping me from tracking all of the internet's attempts to break into my honeypot was the geolocation API, which only allows 2,000 free calls per day.
If 2,000 calls sounds like a lot to you, you clearly have never seen a China-based server try to brute-force its way into something that you own.
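For scale - assuming (my rough guess, and slow by brute-forcer standards) a single bot that tries one login every 4 seconds:

```python
# Back-of-the-envelope math, assuming one login attempt every 4 seconds
# from a single bot - slow by brute-forcer standards.
attempt_interval_s = 4
attempts_per_day = 24 * 60 * 60 / attempt_interval_s    # 21600.0 attempts
hours_to_burn_quota = 2000 * attempt_interval_s / 3600  # quota gone in ~2.2 h
print(attempts_per_day, round(hours_to_burn_quota, 1))
```

One middling bot blows past the free tier more than tenfold in a single day.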
I also decided to plot by latitude/longitude rather than the more general country-level dot placement Sentinel was using. That way, multiple attacking servers based out of the same country show up as separate points on my map.
Conclusion
I loved this project for several reasons:
1) I got to actually work with several Azure services and learn how to navigate, configure and customize them - all that good stuff.
2) I got to troubleshoot and solve problems on my own, which is exactly what I would be doing if I were paid to track these security events in an Azure-based SIEM. Going off-road a bit meant I had to really understand the content of the tutorial and Azure itself in order to get everything to work, and that's good practice.
3) It gave me my first real look into PowerShell scripting.
4) It was totally free! (No, really, you can replicate this yourself and not pay a dime.)
Briefly coming back to Point 2, I also compiled the changes to the original tutorial and posted them in a comment on the YouTube video itself. It was excellent practice for me and hey, why not help a brother/sister out who's also trying to learn something new?
Within 24 hours, I was more than pleasantly surprised to receive a healthy helping of sweet, sweet validation.
I'm taking that to mean I did something right.
Thanks for reading! As always, this post will be up on my blog as well. Stop over if you like reading the same things twice (but in a different font!) or need another Twitter account to follow.
- B