Method for Speaker Gender Classification Based on Gaussian Mixture Modeling

Authors

  • Vasyl Semenov, EPAM Department of Information Technologies, American University Kyiv, Kyiv Academic University
  • Yevheniya V. Semenova, Laboratory of Data Science and Machine Learning, Kyiv Academic University, Institute of Mathematics of NANU

Keywords:

cepstral coefficients, pitch frequency, Gaussian mixture models, logistic regression, neural network

Abstract

Automatic gender identification is an important problem both as an independent task and as a component of various natural language processing (NLP) systems. This paper proposes a method for automatic speaker gender classification and describes its basic algorithmic stages. The method models the distribution of acoustic voice parameters as a weighted sum of several Gaussian distributions (Gaussian mixture modeling, GMM). The set of cepstral RASTA-PLP coefficients, extended by the fundamental frequency, was selected as the acoustic feature vector. GMMs for male and female speakers were trained by the expectation-maximization (EM) method, initialized by the K-means algorithm. The dependence of classification accuracy on the GMM type (diagonal or full covariance matrices) and on the model order was investigated. Across the experiments, the proposed method achieved classification accuracy from 91% to 100%. A comparison of the proposed method with logistic regression and a five-layer neural network is also given.
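
The scoring step implied by the abstract can be illustrated with a minimal sketch. It assumes one diagonal-covariance GMM per gender and a maximum average log-likelihood decision rule. The abstract does not state that rule, so it is an assumption here. Training (EM with K-means initialization) is skipped, and every number in MODELS is a hypothetical value chosen for illustration, not a parameter from the paper. Each toy feature vector holds one cepstral-like coefficient and a pitch value in Hz.

```python
import math

def diag_gmm_logpdf(x, weights, means, variances):
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalizing constant for a diagonal covariance.
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        # Quadratic (Mahalanobis) term, computed per dimension.
        log_exp = -0.5 * sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        comp_logs.append(math.log(w) + log_norm + log_exp)
    # Log-sum-exp over components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def classify(frames, models):
    """Return the label whose GMM gives the highest average frame log-likelihood."""
    scores = {
        label: sum(diag_gmm_logpdf(f, *params) for f in frames) / len(frames)
        for label, params in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical two-component models: (weights, means, variances).
# Feature layout (assumed): [cepstral-like coefficient, pitch in Hz].
MODELS = {
    "male": ([0.6, 0.4],
             [[0.0, 110.0], [0.5, 140.0]],
             [[1.0, 400.0], [1.0, 400.0]]),
    "female": ([0.5, 0.5],
               [[0.2, 200.0], [-0.3, 230.0]],
               [[1.0, 400.0], [1.0, 400.0]]),
}

if __name__ == "__main__":
    utterance = [[0.1, 118.0], [0.3, 125.0], [-0.2, 131.0]]
    label, scores = classify(utterance, MODELS)
    print(label, scores)
```

A full-covariance variant would replace the per-dimension sums with a matrix inverse and determinant per component. The diagonal form is used here only because it keeps the sketch short.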

Published

24.05.2024

Section

Section 2: Theory of Information, Coding, Form Conversion, Digital Processing, and Information Compression