A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data. These extensions model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called Dynamical Variational Autoencoders (DVAEs). In this paper, we present the results of an experimental benchmark comparing six of those DVAE models on the speech analysis-resynthesis task, as an illustration of the high potential of DVAEs for speech modeling.

Keywords

Speech signals modeling; dynamical variational autoencoders; speech spectrograms; speech analysis-resynthesis

Domains

Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]; Sound [cs.SD]

Main file

Bie_et_al_Interspeech_2021_DVAE.pdf (239.23 KB)

Origin : Files produced by the author(s)

Contributor : Xavier Alameda-Pineda

https://inria.hal.science/hal-03295657

Submitted on : Tuesday, January 18, 2022, 4:56:19 PM

Last modification on : Thursday, April 4, 2024, 9:16:23 PM

Dates and versions

hal-03295657 , version 1 (18-01-2022)

Identifiers

HAL Id : hal-03295657 , version 1
ARXIV : 2106.06500
DOI : 10.21437/Interspeech.2021-256

Cite

Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda. A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling. Interspeech 2021 - 22nd Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. pp.46-50, ⟨10.21437/Interspeech.2021-256⟩. ⟨hal-03295657⟩
