Detecting Benign URLs
For the last couple of years I have had the pleasure of working with a University of Sydney PhD researcher, Fariza Rashid, in support of a Defence Innovation Network (DIN) scholarship. Fariza's first peer-reviewed paper was recently accepted for publication and can be found at: Phishing URL detection generalisation using Unsupervised Domain Adaptation - ScienceDirect
The original research topic that we (Thales & Faculty of Engineering - University of Sydney) selected for the DIN scholarship was to assess the viability of various machine learning methods for determining whether a URL is benign. Specifically, it was based on the problem that, when you only have proxy logs, a benign application's HTTP/S licence check or update availability check can in many cases look like a malicious trojan calling its C2 (command and control) server on the Internet for updates or commands. With tens of thousands of applications in use, it becomes harder, and consumes valuable time from initial incident response analysts, to determine whether detected potential C2 traffic was really the beginning of a bad day or just a random application doing a licence or update check. The project title at the time was: "Finding Haystacks within Haystacks: An AI Framework for Automated, Machine-to-Machine Sharing of Benign Cyber Intelligence Data"
The hope was that, given there is no easily available reference of benign web requests for software (ideally an open-source project, or a project supported by a national CERT), validating potential ML approaches through the research would offer an easier path for such a community project to be stood up. The alternative would be to manually research all the potentially used applications out there (tens of thousands of them) for their documented and undocumented licence and update HTTP/S requests.
Unfortunately, due to the lack of available proxy datasets with labelled data (especially around benign licence or update checks), Fariza's initial research has focused on the effectiveness of current research and practices for phishing URL detection, for which numerous datasets were available. In reviewing previous work in this space, the research found that existing phishing URL detection approaches did not perform as well when generalised across datasets.
Research continues into how the knowledge gained so far can be pivoted to the original problem statement. If anyone is interested or, even better, willing to share labelled URL datasets (especially if they are labelled against things like software licence/update checks), please reach out to Fariza.
#cyber #cybersecurity #phishing #trojan #benign #CTI #misp #networking #research #phd #cert Suranga Seneviratne