Enhancing Phishing Detection with Advanced Ensemble Learning Techniques

Mohammed Ahmed

doi:10.51200/ijmic.v1i2.6334

Authors

Mohammed Ahmed, DaTA Research Group, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Sabah, Malaysia

DOI:

https://doi.org/10.51200/ijmic.v1i2.6334

Keywords:

phishing detection, ensemble learning, feature selection, principal component analysis, stacking model

Abstract

Phishing attacks contribute to over 90% of data breaches, posing a severe cybersecurity threat by tricking users into divulging sensitive information. Traditional detection methods, such as blacklists and heuristic-based approaches, are often ineffective against new phishing websites because these sites change rapidly. This study introduces a phishing detection model that uses ensemble learning to improve accuracy, robustness, and adaptability. The model combines Decision Tree, Support Vector Machine (SVM), and k-Nearest Neighbours (kNN) base classifiers in a stacking ensemble, with Logistic Regression as the meta-classifier. Feature selection uses Random Forest importance scores, retaining attributes with a score greater than 0.01. Principal Component Analysis (PCA) then reduces dimensionality while retaining 95% of the variance, limiting information loss. Hyperparameters are tuned with Grid Search. The dataset, sourced from an open-access phishing detection repository, consists of 11,430 URLs, 60% labelled phishing and 40% legitimate, and includes 87 features grouped into URL structure, webpage content, and external service queries. Performance is evaluated using accuracy, precision, recall, and F1-score at test sizes of 10%, 20%, 30%, and 40%. The stacking ensemble achieves a peak accuracy of 97.64% with PCA (95% variance) and feature selection (importance score > 0.01) at a 10% test size, significantly outperforming traditional methods. Comparisons across test sizes highlight the positive impact of feature selection and PCA on phishing detection. Statistical validation through t-tests (p < 0.05) further supports the model's reliability, indicating substantial improvements over baseline methods. This study shows the potential of ensemble learning and feature optimization for phishing detection, offering a robust solution for practical cybersecurity applications.
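The pipeline described in the abstract could be sketched with scikit-learn roughly as follows. This is an illustrative sketch, not the author's implementation. It makes several assumptions: the data is a small synthetic stand-in generated with make_classification (not the 11,430-URL dataset), the StandardScaler step and the ordering of steps are assumed, and the parameter grid and all variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 40% class 0 (legitimate), 60% class 1 (phishing).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=10,
    weights=[0.4, 0.6], random_state=0,
)
# 10% test size, one of the splits mentioned in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0
)

# Stacking ensemble: DT, SVM, and kNN base classifiers, Logistic Regression meta-classifier.
stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

pipeline = Pipeline([
    # Keep features whose Random Forest importance is at least 0.01
    # (SelectFromModel uses >=, a close stand-in for the abstract's "> 0.01").
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0), threshold=0.01)),
    # Scaling is an assumption here; PCA, SVM, and kNN are scale-sensitive.
    ("scale", StandardScaler()),
    # Keep the fewest components that explain at least 95% of the variance.
    ("pca", PCA(n_components=0.95, svd_solver="full")),
    ("stack", stack),
])

# Hypothetical, deliberately tiny grid; the abstract does not list the searched values.
param_grid = {
    "stack__dt__max_depth": [5, None],
    "stack__knn__n_neighbors": [3, 7],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(search.best_params_)
print(metrics)
```

Placing feature selection and PCA inside the Pipeline means both are refit on each cross-validation training fold during Grid Search, so no information from the held-out folds leaks into the preprocessing. Scores on the synthetic data say nothing about the 97.64% accuracy reported above.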
