What is voice morphing?

Let's start with what morphing is.

Morphing is an image transformation technique that smoothly changes one image into another, used in film and computer animation.

Voice morphing (or voice conversion) is an advanced digital audio processing technique that seamlessly transforms one person's voice (the source) into another person's voice (the target), while preserving the content of the speech. It uses artificial intelligence (AI) algorithms, machine learning, and digital signal processing (DSP). The system analyzes the characteristics of the source voice (timbre, pitch, timbre) and maps them to the characteristics of the target voice.

Researchers analyzing a signal-level approach to voice morphing attacks have revealed vulnerabilities in biometric voice recognition systems. They demonstrated that voice morphing attacks combine identities to bypass voice biometrics.

This is time-domain voice identity morphing (TD-VIM), which allows for the mixing of identities without embedding them in a structure or reference text.

In biometric systems, it's common practice to associate each sample or template with a specific individual. Advanced voice identity morphing (VIM) allows the generation of a sample that combines the identities of two or more speakers. "The modified voice sample can be used to match all identities whose voice samples were used to generate morphing attacks, which poses a high risk in application scenarios such as banking and finance, where a single identity verification is essential."

To investigate this issue, the research team created four distinct morphing signals and assessed their effectiveness through a comprehensive vulnerability analysis. The data was compared to the Generalized Morphing Attack Potential (G-MAP) metric, "which measures attack effectiveness in two deep learning-based speaker verification systems (SVS) and one commercial system, Verispeak."
The results highlight the effectiveness of the TD-VIM method in bypassing advanced verification mechanisms, underscoring the importance of improving SVS security.


The research comes from the Indian Institute of Technology and the Norwegian University of Science and Technology.

more about the voice morphing phenomenon here