International Journal of Scientific & Technology Research


IJSTR >> Volume 8 - Issue 7, July 2019 Edition

Website: http://www.ijstr.org

ISSN 2277-8616

Machine Learning Based Speech Emotions Recognition System

Dr. Yogesh Kumar, Dr. Manish Mahajan



Emotion recognition, Feature extraction, Emotions, Modeling, Machine learning, Deep neural network, Dataset



Speech is one of the fastest and most natural means of communication between humans. Researchers have developed many systems to identify emotions from the speech signal. Speech features are particularly useful for differentiating between emotions, and when those features are unclear, recognizing emotion from a speaker's speech becomes very difficult. A number of datasets are available for speech emotions, their modelling, and their types, which help in identifying the type of speech. After feature extraction, the other important stage is the classification of speech emotions, so this paper compares and reviews the different classifiers used to differentiate emotions such as sadness, neutral, happiness, surprise, and anger. The research also shows the improvement obtained by adding a deep neural network to an automatic emotion recognition system. The accuracy of speech emotion recognition in different languages is also analysed for different ML techniques.
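
The two stages named in the abstract, feature extraction followed by classification, can be illustrated with a minimal sketch. It is not the method evaluated in the paper. It assumes synthetic noisy sine waves in place of recorded speech, two hand-picked features (short-time energy and zero-crossing rate), and a nearest-centroid classifier. The function names, the "angry" and "sad" labels, and the loud/high-pitched versus quiet/low-pitched parameters are hypothetical choices made only for illustration.

```python
import math
import random


def frame_features(signal, frame_len=200):
    """Return (mean short-time energy, mean zero-crossing rate) over frames."""
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcrs.append(crossings / (frame_len - 1))
    return (sum(energies) / len(energies), sum(zcrs) / len(zcrs))


def synth_utterance(freq_hz, amplitude, rng, sr=8000, dur=0.25):
    """Toy stand-in for a recording (assumption): a sine wave plus noise."""
    n = int(sr * dur)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sr) + rng.gauss(0, 0.01)
        for t in range(n)
    ]


def train_centroids(labelled):
    """Average the feature vectors of each label to get one centroid per label."""
    sums = {}
    for label, feats in labelled:
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += feats[0]
        acc[1] += feats[1]
        acc[2] += 1
    return {label: (a / n, b / n) for label, (a, b, n) in sums.items()}


def classify(feats, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(feats, centroids[lab]))


# Hypothetical training set: "angry" is loud and high-pitched, "sad" is quiet
# and low-pitched. Real corpora are far less cleanly separated.
rng = random.Random(0)
training = []
for _ in range(5):
    loud = synth_utterance(300, rng.uniform(0.7, 0.9), rng)
    quiet = synth_utterance(120, rng.uniform(0.15, 0.25), rng)
    training.append(("angry", frame_features(loud)))
    training.append(("sad", frame_features(quiet)))
centroids = train_centroids(training)

print(classify(frame_features(synth_utterance(300, 0.8, rng)), centroids))
print(classify(frame_features(synth_utterance(120, 0.2, rng)), centroids))
```

In a real system the toy features would typically be replaced by prosodic and spectral features, and the nearest-centroid rule by one of the classifiers the paper reviews.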


