Phishing detection using machine learning

Shirazi, Hossein, author; Ray, Indrakshi, advisor; Anderson, Chuck, advisor; Malaiya, Yashwant K., committee member; Wang, Haonan, committee member

Files

Shirazi_colostate_0053A_16761.pdf (3.69 MB)

Date

2021

Authors

Shirazi, Hossein, author

Ray, Indrakshi, advisor

Anderson, Chuck, advisor

Malaiya, Yashwant K., committee member

Wang, Haonan, committee member

Abstract

Our society, economy, education, critical infrastructure, and other aspects of our life have become largely dependent on cyber technology. Thus, cyber threats now endanger various aspects of our daily life. Despite sophisticated detection algorithms, phishing was still the top Internet crime by victim count in 2020. Adversaries learn from their previous attempts to (i) improve attacks and lure more victims and (ii) bypass existing detection algorithms to steal users' identities and sensitive information for financial gain. Machine learning appears to be a promising approach for phishing detection, with classification algorithms distinguishing between legitimate and phishing websites. While machine learning algorithms have shown promising results, we observe multiple limitations in existing algorithms. Current algorithms do not preserve the privacy of end users because they query third-party services. There are not enough phishing samples for training machine learning algorithms, and existing datasets are biased toward over-represented targets. Finally, adversarial sampling attacks degrade the performance of detection models. We propose four sets of solutions to address the aforementioned challenges. We first propose a domain-name-based phishing detection solution that focuses solely on the domain name of websites to distinguish phishing websites from legitimate ones. This approach does not use any third-party services and preserves the privacy of end users. We then propose a fingerprinting algorithm that finds similarities (using both visual and textual characteristics) between a legitimate targeted website and a given suspicious website. This approach addresses the issue of bias towards over-represented samples in the datasets. Finally, we explore in depth the effect of adversarial sampling attacks on phishing detection algorithms, starting with feature manipulation strategies. The results show that these attacks significantly degrade the performance of the classification algorithm. In the next step, we focus on two goals: improving the performance of classification algorithms by increasing the size of the datasets used, and making the detection algorithm robust against adversarial sampling attacks using an adversarial autoencoder.

Subject

machine learning

adversarial attacks

phishing detection

URI

https://hdl.handle.net/10217/234216
https://doi.org/10.25675/3.04518

Collections

2020-
Theses and Dissertations
