Method for Speaker Gender Classification Based on Gaussian Mixture Modeling

Authors

  • Vasyl Semenov, EPAM Department of Information Technologies, American University Kyiv, Kyiv Academic University
  • Yevheniya V. Semenova, Laboratory of Data Science and Machine Learning, Kyiv Academic University, Institute of Mathematics of NANU

Keywords:

cepstral coefficients, pitch frequency, Gaussian mixture models, logistic regression, neural network

Abstract

Automatic gender identification is an important problem both as an independent task and as a component of various natural language processing (NLP) systems. In this paper, a method for automatic speaker gender classification is proposed and its basic algorithmic stages are described. The method is based on modeling the distribution of voice acoustic parameters by a weighted sum of several Gaussian distributions (Gaussian Mixture Modeling, GMM). The set of cepstral RASTA-PLP coefficients, extended by the fundamental frequency, was selected as the vector of acoustic features. GMMs for male and female speakers were trained by the Expectation-Maximization (EM) method with initialization by the K-means algorithm. The dependence of classification accuracy on the GMM type (with diagonal or full covariance matrices) and on the model order was investigated. In different experiments, the proposed method showed classification accuracy from 91% to 100%. A comparison of the proposed method with logistic regression and a five-layer neural network is also given.
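
The classification scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `GaussianMixture` (which runs EM and can initialize with K-means), and it replaces real RASTA-PLP and pitch features with synthetic frames. The helper names `train_gmm`, `classify`, and `synthetic_frames`, the model order of 4, and the F0 means of 120 Hz and 210 Hz are all illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def train_gmm(features, n_components=4, covariance_type="diag", seed=0):
    """Fit one GMM by EM; the initial parameters come from K-means."""
    gmm = GaussianMixture(
        n_components=n_components,        # model order (assumed value)
        covariance_type=covariance_type,  # "diag" or "full"
        init_params="kmeans",
        random_state=seed,
    )
    return gmm.fit(features)


def classify(utterance, gmm_male, gmm_female):
    """Label an utterance (frames x features) by the higher mean log-likelihood."""
    male_score = gmm_male.score(utterance)
    female_score = gmm_female.score(utterance)
    return "male" if male_score > female_score else "female"


def synthetic_frames(rng, f0_mean, n_frames, n_cepstra=12):
    """Hypothetical stand-in for real features: random 'cepstra' plus an F0 column."""
    cepstra = rng.normal(0.0, 1.0, size=(n_frames, n_cepstra))
    f0 = rng.normal(f0_mean, 20.0, size=(n_frames, 1))
    return np.hstack([cepstra, f0])


rng = np.random.default_rng(0)

# One model per class, trained on that class's frames only.
gmm_male = train_gmm(synthetic_frames(rng, 120.0, 2000))
gmm_female = train_gmm(synthetic_frames(rng, 210.0, 2000))

# Held-out synthetic utterances, 200 frames each.
test_male = synthetic_frames(rng, 120.0, 200)
test_female = synthetic_frames(rng, 210.0, 200)

print(classify(test_male, gmm_male, gmm_female))
print(classify(test_female, gmm_male, gmm_female))
```

Passing `covariance_type="full"` to `train_gmm` switches from diagonal to full covariance matrices, and `n_components` sets the model order, which are the two factors the abstract reports varying. The synthetic data here says nothing about the accuracy achievable on real speech.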

Published

2024-05-24

Section

Section 2 Information theory, coding and information form transformation