A Review on Malicious URLs Detection Using Machine Learning Methods

Tasfia Tabassum

Department of Information and Communication Engineering, Noakhali Science and Technology University, Bangladesh.

Md. Mahbubul Alam *

Department of Information and Communication Engineering, Noakhali Science and Technology University, Bangladesh.

Md. Sabbir Ejaz

Department of Information and Communication Engineering, Noakhali Science and Technology University, Bangladesh.

Mohammad Kamrul Hasan

Department of Information and Communication Engineering, Noakhali Science and Technology University, Bangladesh.

*Author to whom correspondence should be addressed.


Abstract

Malicious URLs are a serious threat to cybersecurity because they can compromise user security and inflict large financial losses. Traditional detection approaches that rely on blacklists offer limited coverage and adapt poorly to rapidly emerging threats. In response, machine learning methods have become increasingly popular as a means of improving the detection of malicious URLs. This paper presents a structured survey of malicious URL detection, including a formal formulation of detection as a machine learning task. It covers feature representation and algorithm design, and it classifies and reviews contributions from the literature. The survey aims to provide a state-of-the-art understanding of the field and to support future research and practical implementations. It targets a diverse audience, including experts, cybersecurity professionals, and machine learning researchers. The article also discusses practical system design considerations, ongoing research challenges, and future research directions.

Keywords: Malicious URLs, Cybersecurity, Malware, Phishing, Machine Learning, Deep Learning


How to Cite

Tabassum, T., Alam, M. M., Ejaz, M. S., & Hasan, M. K. (2023). A Review on Malicious URLs Detection Using Machine Learning Methods. Journal of Engineering Research and Reports, 25(12), 76–88. https://doi.org/10.9734/jerr/2023/v25i121042

