LREC'2022

Abstract

Audiobook readers play with their voices to emphasize some text passages and to highlight discourse changes or significant events, in order to make listening easier and more entertaining. Dialogue is a central element of audiobooks: there, the reader applies significant voice transformations, mainly prosodic modifications, to convey the characters' properties and changes. However, these intra-speaker modifications are hard to reproduce with simple text-to-speech synthesis. The manner of vocalizing the characters involved in a given story depends on the text style and differs from one speaker to another. In this work, this problem is investigated through the prism of voice conversion. We propose to use voice conversion to modify the narrator’s voice so that it fits the context of the story, such as the character who is speaking. To this end, two complementary experiments are designed. The first aims to assess the quality of our Phonetic PosteriorGram (PPG)-based voice conversion system using parallel data; objective and subjective evaluations with naive raters are conducted to estimate the quality of the generated signal and the speaker similarity. The second experiment applies intra-speaker voice conversion, considering narration passages and character speech passages as two distinct speakers. The data are therefore non-parallel, and the dissimilarity between character and narrator is measured subjectively.
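A PPG represents each speech frame by the posterior probabilities of phonetic classes produced by a speaker-independent acoustic model, so it keeps the linguistic content while discarding most of the speaker identity. The sketch below is a toy illustration of that idea, not the system evaluated here. The phone set, the logits and the per-phone target means are all hypothetical, and the "conversion" is a PPG-weighted average standing in for a trained conversion model followed by a vocoder.

```python
import math
from typing import List

Matrix = List[List[float]]

# Hypothetical phone inventory; a real system uses its ASR acoustic model's phone set.
PHONES = ["a", "e", "i", "s", "t", "sil"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one frame of phone logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phonetic_posteriorgram(frame_logits: Matrix) -> Matrix:
    """Map T x P frame-level phone logits to a T x P PPG (one distribution per frame)."""
    return [softmax(frame) for frame in frame_logits]


def most_likely_phones(ppg: Matrix) -> List[str]:
    """Read back the dominant phone of each frame (a simple sanity check)."""
    return [PHONES[max(range(len(row)), key=row.__getitem__)] for row in ppg]


def convert_frames(ppg: Matrix, target_phone_means: Matrix) -> Matrix:
    """Toy conversion: expected target-speaker features under each frame's PPG.

    target_phone_means[p] is a hypothetical mean acoustic feature vector of the
    target speaker for phone p; a trained conversion network would replace it.
    """
    dim = len(target_phone_means[0])
    return [
        [sum(row[p] * target_phone_means[p][d] for p in range(len(row))) for d in range(dim)]
        for row in ppg
    ]


if __name__ == "__main__":
    # Hypothetical logits for three frames of source speech.
    logits = [[5.0, 0, 0, 0, 0, 0], [0, 0, 0, 4.0, 0, 0], [0, 0, 0, 0, 0, 6.0]]
    ppg = phonetic_posteriorgram(logits)
    print(most_likely_phones(ppg))  # ['a', 's', 'sil']
```

The design point the sketch tries to show is the split of responsibilities: the content comes from the source speaker's PPG, while everything speaker-specific lives in the target-side mapping.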

Tasks

Inter-speaker voice conversion: converting the voice of a source speaker into that of a target speaker.
Intra-speaker voice conversion: changing the speaking style within one speaker's voice (e.g., from narration to character speech).

Dataset

Inter-speaker

Boule de suif (author Guy de Maupassant): FFR0009LV (Ezwa, female voice), FFR0012LA (Victoria, female voice), MFR0015LA (Jean-Luc Fischer, male voice)
La petite comtesse (author Octave Feuillet): FFR0011LA (Pomme, female voice), MFR0013LA (Daniel Luttringer, male voice), MFR0014LA (René Depasse, male voice)
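Because each book in the inter-speaker set is read by several speakers, converted and target recordings of the same passage are parallel but not time-aligned. The exact objective measures are not listed on this page. As an assumption, a common preliminary step is to align the two feature sequences with dynamic time warping (DTW) before computing frame-level distances. The pure-Python sketch below shows this; its toy one-dimensional features stand in for real acoustic features such as mel-cepstra.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Frame-level Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(source: List[Frame], target: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the accumulated DTW cost and the aligned (source, target) frame pairs."""
    n, m = len(source), len(target)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(source[i - 1], target[j - 1])
            # Allowed steps: diagonal match, vertical, horizontal.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end; on ties, min() keeps the first key (the diagonal step).
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


def mean_aligned_distance(source: List[Frame], target: List[Frame]) -> float:
    """Average frame distance along the DTW path (a simple objective score)."""
    total, path = dtw(source, target)
    return total / len(path)
```

For example, `dtw([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])` returns a cost of `0.0` with the path `[(0, 0), (1, 1), (1, 2), (2, 3)]`: the repeated target frame is absorbed by the warping rather than counted as an error.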

Intra-speaker

Intra-speaker voice conversion (voice style conversion) is performed on the Synpaflex corpus [1].
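The intra-speaker setup treats narration and character speech by the same reader as two pseudo-speakers. This page does not detail how passages are labeled in the corpus. As a hypothetical illustration, suppose character speech in the French text is enclosed in guillemets (« and »). The sketch below then splits a paragraph into the two classes and groups them as two "speakers". Real corpus annotations would replace this heuristic.

```python
import re
from typing import Dict, List, Tuple

# Assumption: character speech is enclosed in French guillemets.
# This is an illustrative heuristic, not the corpus's own labeling.
DIALOGUE = re.compile(r"«\s*(.*?)\s*»", re.DOTALL)


def split_passages(text: str) -> List[Tuple[str, str]]:
    """Label the spans of a paragraph as 'narration' or 'character'."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in DIALOGUE.finditer(text):
        narration = text[pos:match.start()].strip()
        if narration:
            segments.append(("narration", narration))
        segments.append(("character", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


def pseudo_speakers(segments: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group labeled segments into the two pseudo-speakers of the same reader."""
    groups: Dict[str, List[str]] = {"narration": [], "character": []}
    for label, span in segments:
        groups[label].append(span)
    return groups
```

For instance, `split_passages("Elle entra. « Bonjour ! » dit-elle.")` yields `[("narration", "Elle entra."), ("character", "Bonjour !"), ("narration", "dit-elle.")]`. Since the narration and character groups contain different sentences, the resulting training data are non-parallel.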

Inter-speaker voice conversion samples

Speaker ID	Target Voc.	Synth. mode (SynMode)	Intra-gender Conv. mode	Inter-gender Conv. mode	Intra-gender Source Voc.	Inter-gender Source Voc.
FFR0009LV
FFR0009LV
FFR0009LV
FFR0012LA
FFR0012LA
FFR0012LA
MFR0013LA
MFR0013LA
MFR0013LA
MFR0014LA
MFR0014LA
MFR0014LA

Intra-speaker voice conversion

Reference A	Reference B	IS Voc.	Syn. mode - IS	Conv. mode - IS to DS	Syn. mode - DS	DS Voc.


References

[1] Sini, A., Lolive, D., Vidal, G., Tahon, M., & Delais-Roussarie, E. (2018, May). Synpaflex-corpus: An expressive french audiobooks corpus dedicated to expressive speech synthesis. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).

INVESTIGATING INTER- AND INTRA-SPEAKER VOICE CONVERSION USING AUDIOBOOKS

Aghilas Sini, Damien Lolive, Nelly Barbot, Pierre Alain

Univ Rennes, CNRS, IRISA Lannion, France

6 Rue de Kerampont CS 80518, 22305 Lannion