International Journal of Scientific & Technology Research

IJSTR >> Volume 3, Issue 10, October 2014 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

An Optimized Feature Selection Technique For Email Classification




Olaleye Oludare, Olabiyisi Stephen, Olaniyan Ayodele, Fagbola Temitayo



Index Terms: Classification, Dataset, E-mail, Feature-Selection, Machine Learning, Particle Swarm Optimization, Support Vector Machine.



Abstract: In machine learning, feature selection is a global combinatorial optimization problem: irrelevant and redundant features in a dataset degrade predictive accuracy and increase computational overhead. The Support Vector Machine (SVM) is a classifier well suited to feature-rich problems, but it cannot efficiently handle large e-mail datasets. In this research work, feature selection for SVM was optimized using Particle Swarm Optimization (PSO). On a dataset of 1,000 e-mails, the optimized SVM technique achieved a classification accuracy of 80.44% in 2.06 seconds, while SVM alone achieved 68.34% in 6.33 seconds. On 3,000 e-mails, the optimized SVM technique achieved 90.56% accuracy in 0.56 seconds against 46.71% in 60.16 seconds for SVM; on 6,000 e-mails, the corresponding figures were 93.19% in 0.19 seconds against 18.02% in 91.47 seconds. These results demonstrate that the optimized SVM technique delivers higher classification accuracy at lower computational cost than SVM, and that its advantage grows with dataset size, thereby eliminating the drawbacks of SVM on large e-mail datasets.
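The core idea of the paper, binary PSO searching over feature subsets, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the toy `fitness` function (which rewards two hypothetical "informative" features and penalizes subset size) stands in for the SVM classification accuracy that the paper uses as the fitness measure, and all parameter values (`w`, `c1`, `c2`, swarm size) are assumptions, not values from the paper.

```python
import math
import random

random.seed(0)

# Toy stand-in for the SVM-accuracy fitness used in the paper:
# features 0 and 3 are "informative"; every selected feature costs a
# small penalty, so the best mask selects exactly {0, 3}.
INFORMATIVE = {0, 3}
N_FEATURES = 6

def fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return 2.0 * hits - 0.5 * sum(mask)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_pso(n_particles=10, n_iter=50, w=0.7, c1=1.5, c2=1.5, v_max=6.0):
    # Each particle keeps one velocity per feature; the sigmoid of the
    # velocity is the probability that the corresponding feature bit is 1
    # (Kennedy and Eberhart's binary PSO variant).
    vel = [[random.uniform(-1, 1) for _ in range(N_FEATURES)]
           for _ in range(n_particles)]
    pos = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = pbest_fit.index(max(pbest_fit))
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(N_FEATURES):
                r1, r2 = random.random(), random.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                pos[i][d] = 1 if random.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

if __name__ == "__main__":
    mask, fit = binary_pso()
    print("best feature mask:", mask, "fitness:", fit)
```

In the paper's setting, `fitness` would instead train and evaluate an SVM on the features selected by `mask`, so the swarm converges to the subset that maximizes classification accuracy while discarding irrelevant and redundant features.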


