What distinguishes effective voice biometrics systems? The following four indicators determine the advantage of one system over another:
1. Accuracy rate: the overall effectiveness of a biometric system, which should fall in the range of 95-99%.

2. FAR (False Acceptance Rate), a metric that measures how often a system incorrectly accepts an unauthorized person (e.g., someone impersonating a user) as a valid user. In the most accurate systems, this rate is less than 1%. The lower the rate, the more secure the system and the more difficult it is to impersonate.

3. FRR (False Rejection Rate), a metric that measures false rejections, or the number of times the system rejects a genuine user when it should accept them. Ideally, this figure is below 3%.

4. EER (Equal Error Rate), the point at which the FAR equals the FRR. This metric is often used to compare the quality of biometric systems.
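The three error metrics above can be sketched in a few lines of Python. The score lists, threshold, and sweep resolution below are illustrative assumptions, not values taken from any real engine.

```python
# Illustrative sketch: FAR, FRR and EER from verification score lists.
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor scores accepted (>= threshold).
    FRR: fraction of genuine scores rejected (< threshold)."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

def equal_error_rate(genuine_scores, impostor_scores, steps=1000):
    """Sweep thresholds and return the operating point where FAR and FRR
    are closest; report their mean as the EER estimate."""
    lo = min(genuine_scores + impostor_scores)
    hi = max(genuine_scores + impostor_scores)
    best_gap, best_eer = None, None
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Made-up similarity scores for five genuine and five impostor trials.
genuine = [0.9, 0.85, 0.8, 0.75, 0.6]
impostor = [0.3, 0.35, 0.4, 0.55, 0.2]
far, frr = far_frr(genuine, impostor, 0.5)   # FAR 0.2, FRR 0.0
eer = equal_error_rate(genuine, impostor)    # 0.0: these score sets separate cleanly
```

Lowering the threshold trades FRR for FAR and vice versa; the EER summarizes that trade-off in a single number, which is why it is convenient for system comparisons.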

The most effective systems are generally considered to be those from Phonexia and ID R&D, due to their outstanding performance in comparative tests.

In our research, we primarily use Phonexia engines, but we also utilize others such as Kaldi (X-vector) and ECAPA. The goal is to test our algorithms as extensively as possible in a diverse environment. Security is our top priority.

In biometric analyses, the recording itself, or more precisely its quality, determines the effectiveness of the biometric system’s speaker recognition. It’s worth recalling what determines the quality of such a recording (audio signal). The quality of the audio signal is primarily influenced by:

👉 acoustic conditions – noise level, reverberation, and SNR (signal-to-noise ratio)
👉 speech quality – naturalness of the voice, its loudness, consistency of articulation
👉 recording devices
👉 recording time
👉 technical parameters such as sampling frequency and resolution.

However, it’s safe to say that the main factor determining quality is SNR (Signal-to-Noise Ratio), a measure of signal strength relative to the level of interference. A high ratio indicates clear sound, free from background noise, which translates into high speech identification efficiency and more accurate analysis.
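As a rough illustration of the SNR definition, the minimal sketch below computes it in decibels from separate signal and noise sample lists; the sample values are synthetic.

```python
import math

def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(P_signal / P_noise),
    with power taken as the mean square of the samples."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

# Synthetic example: a signal with ten times the amplitude of the noise
# has 100x its power, i.e. an SNR of 20 dB.
signal = [1.0, -1.0] * 100
noise = [0.1, -0.1] * 100
print(round(snr_db(signal, noise), 1))  # 20.0
```

In practice the noise power is estimated from non-speech segments of the same recording, since the clean signal and the noise are never available separately.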

Recording time is also a crucial factor. Biometric systems require a minimum length of voice sample to extract characteristic features. A recording that is too short (1-2 seconds) may not be sufficient for reliable identification. A longer recording, on the other hand, does not directly improve quality, but it allows for better extraction of clear and repeatable fragments, which improves accuracy.

Therefore, the optimal recording length ranges from a few to a dozen or so seconds of speech.
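These two constraints, minimum length and sufficient SNR, can be combined into a simple pre-check before a sample is passed to a biometric engine. The thresholds below are illustrative assumptions, not requirements of any particular system.

```python
def recording_usable(duration_s, snr_db, min_duration_s=3.0, min_snr_db=15.0):
    """Pre-check a voice sample before biometric processing.
    Both threshold values are illustrative assumptions only."""
    if duration_s < min_duration_s:
        return False, "too short for reliable feature extraction"
    if snr_db < min_snr_db:
        return False, "too noisy: SNR below the assumed minimum"
    return True, "ok"

print(recording_usable(10.0, 25.0))  # (True, 'ok')
print(recording_usable(1.5, 25.0))   # rejected: too short
```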

Speech quality, in turn, encompasses not only the clarity of the recording but also naturalness, volume control, and consistent articulation. These elements influence the reliability of voice samples and the effectiveness of recognition algorithms. It’s important to remember that extreme values (too quiet/too loud) can complicate analysis.





An open benchmark for assessing systems for detecting deepfakes and manipulated media content has emerged. It aims to help evaluate and improve algorithms for detecting audio, video, and image content generated by AI.

The shared dataset contains over 50,000 samples of real, AI-generated, and manipulated audiovisual content (deepfakes and synthetic media), annotated with real-world use cases. The dataset also includes adversarial attacks for testing model robustness.

Importantly, the license is granted for evaluation purposes only and is not intended for training or commercial use.

It’s a joint initiative of Microsoft’s Good Lab, Northwestern University’s Security and Artificial Intelligence Lab, and the nonprofit WITNESS.

Will it encourage researchers to use and share their own analyses?

More at Biometric Update: https://www.biometricupdate.com/202507/new-microsoft-benchmark-for-evaluating-deepfake-detection-prioritizes-breadth

A new form of payment using smart glasses has hit the market! Transactions can be completed using QR code scans and voice commands. Alipay’s smart glasses, created in collaboration with Chinese smartphone manufacturer Meizu, are powered by voice authentication and intent recognition technology. Meizu, for its part, offers an optical waveguide display, voice noise reduction, and camera-based code capture technology.



The company has just completed its first payment transaction using an e-wallet with built-in smart glasses via AlipayHK in Hong Kong.

Ant Group plans to roll out the new feature to Alipay+’s global partners in 2025. Alipay+ is the company’s cross-border mobile payment solution that allows businesses to accept payments from mobile wallets across multiple countries, including Line Pay and GrabPay. The service currently connects over 1.7 billion user accounts across 36 mobile wallets.

More: https://www.biometricupdate.com/202506/alipay-introduces-smart-glasses-payment-with-voice-authentication

What do you think about this form of payment? Could this service revolutionize the market? Do you see its applications in everyday life?

Can a bot be sincere? Can it express remorse when it apologizes? These are the questions researchers are asking themselves in the context of the possible use of AI in handling complaints.

This is about automating the entire process. While the workflow itself seems easy to automate, the most difficult challenge may be the bot’s ability to convey emotion, in this case an apology.

Research suggests that when people expect an apology, they expect sincere contrition and authenticity. Spontaneity is also important, and these are typical human traits that a machine can’t always handle perfectly. So how reliable can an apologetic AI be?

More on Biometric Update: https://www.biometricupdate.com/202505/apologetic-intelligence-should-bots-handle-complaints

Voice stream augmentation is a technique that amplifies or modifies a voice stream in real time to improve its quality. In short, it involves using electronic devices such as microphones, loudspeakers, amplifiers, or software algorithms to change the characteristics of the sound, e.g. voice tone.
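As a rough sketch of one such software stage, the snippet below applies per-frame gain control that pushes each frame toward a target loudness. The frame size and target RMS are illustrative assumptions, not parameters of any real engine.

```python
import math

FRAME = 160        # assumed 10 ms frames at a 16 kHz sampling rate
TARGET_RMS = 0.1   # assumed target loudness

def augment_stream(samples):
    """Per-frame automatic gain control: scale each frame so its RMS
    approaches TARGET_RMS, leaving near-silent frames untouched."""
    out = []
    for i in range(0, len(samples), FRAME):
        frame = samples[i:i + FRAME]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        gain = TARGET_RMS / rms if rms > 1e-6 else 1.0
        out.extend(x * gain for x in frame)
    return out

# A quiet frame (RMS 0.05) is boosted by 2x to reach the target level.
boosted = augment_stream([0.05] * FRAME)
```

A production augmentation chain would add smoothing between frames to avoid audible gain jumps, but the frame-by-frame structure is the same.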

It is not difficult to guess that augmentation is used in speech recognition systems, games, and even in artificial intelligence, where it allows the generation of artificial voices and improving their naturalness.

Currently, in one of our projects we are building our own voice stream augmentation engine. Its purpose is to support detection of unauthorized capture of a user’s voice in a messaging application for later synthesis or conversion, without causing degradation of the sound audible to the human ear. All this to prevent voice theft and ensure the most effective operation of the service.

It turns out that deepfake audio can be more dangerous than video! According to a Pindrop report, the number of audio deepfakes grew by 760% over the two years 2023-2024.


In an era of increasing attacks, self-awareness seems to be a key barrier protecting humans from these types of threats. It is about:

● limited trust in voice assistants,
● knowledge of social engineering techniques used by fraudsters,
● control over the content you publish on the Internet.

On the system side, the obvious answer is to use advanced biometric technologies and methodologies to detect deepfakes in real time.

For example, Pindrop uses a technique called acoustic fingerprinting as one of its capabilities. This involves creating a digital signature for each voice based on its acoustic properties, such as pitch, tone, and cadence. These signatures are then used to compare and match voices across calls and interactions. For more on deepfakes, check out this podcast with Vijay Balasubramaniyan, CEO of Pindrop. Link below
https://www.biometricupdate.com/202504/biometric-update-podcast-digs-into-deepfakes-with-pindrop-ceo
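The matching step of such a scheme can be illustrated with a toy example: each voice is reduced to a small feature vector (here hypothetical pitch, tone, and cadence statistics) and compared with cosine similarity. This is a didactic sketch, not Pindrop’s actual method.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical fingerprints: [mean pitch in Hz, spectral tilt, syllables/s].
caller   = [120.0, 0.80, 3.2]
enrolled = [118.0, 0.75, 3.0]
score = cosine_similarity(caller, enrolled)  # close to 1.0 for similar voices
```

Real systems use far richer representations (hundreds of dimensions learned by neural networks), but the comparison step is still typically a distance or similarity measure of this kind.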

As a reminder, Pindrop is a company based in Atlanta, USA. Their solutions are leading the way for the future of voice communications, setting the standard for identity, security, and trust in every voice interaction. More at pindrop.com

The exhibition was marked by the ubiquitous AI. Many companies presented their latest achievements in constructing systems that communicate autonomously with people. The humanoid robot Ameca (Etisalat) interacting with its interlocutors aroused great interest. The stands with interactive agents (Amdocs) offered an almost unbelievable quality of image and speech generated by the systems.

Google has unveiled Gemini Live, its response to ChatGPT’s voice mode. Gemini Live includes a Share Screen with Live feature that allows Gemini to interact with the image displayed on the phone’s screen. Deutsche Telekom has indicated a possible direction for phone development by turning the entire phone into a chatbot: the phone has no applications and acts as a personal assistant that communicates with the user by voice. The solution is based on a digital assistant from Perplexity AI, but it is also to be open to, among others, Google Cloud AI, ElevenLabs, and Picsart. South Korean startup Newnal has presented a new mobile operating system that uses historical and current user data to create a personalized AI assistant, which is eventually to become an AI avatar behaving just like the user.

All of the above solutions, as well as many others, are connected by the use of voice technologies for two-way communication. The direction indicated at MWC 2025 is clear – our actions will be supported by avatars and bots communicating with us autonomously. The possibility of quick, machine confirmation of who we are talking to is therefore becoming even more important than ever before, because the quality of autonomous voice communication systems does not guarantee correct verification of the speaker by a human.

Photos by Andrzej Tymecki

 


How to effectively detect voice-based fraud? How to distinguish a real voice from a fake one, e.g. one generated by AI? The answer is simple: this requires advanced voice biometrics tools and a number of analyses. We publish here two examples analyzed some time ago in our laboratory, where our proprietary algorithm assessed with very high probability whether a voice is real or fake and to what extent it matches the voice of a given person.

The analyses concern:

● recognizing the voice of one of the Russian pranksters who, pretending to be President Macron, called President Duda

● assessment of the similarity of the voices of actors Piotr Fronczewski and Filip Pławiak in the film Rojst, in which they play the same character (Kociołek) in adulthood and in youth, respectively.

We share with you the conclusions from these experiments.

Biometric comparison of the voices of Fronczewski and Pławiak.

For this purpose, we used 25 seconds of total speech from both characters, composed of several fragments of their original lines taken from the film soundtrack. What level of agreement did we achieve?
The results of the analysis showed that the actors’ voices are NOT biometrically consistent: Pławiak’s statement vs. Fronczewski’s voiceprint gave only 15% agreement, and Fronczewski’s statement vs. Pławiak’s voiceprint just 11%. Interestingly, these differences are not noticeable to the ear: to us, the voices of Pławiak and Fronczewski sound almost identical. And that is ultimately what this is all about.



For both characters, gender and nationality were recognized with minimal uncertainty (score of almost 100%). An age difference between the characters was also detected, estimated at 20 years.

Analysis of the voices of Russian pranksters Vladimir Kuznetsov (Vovan) and Alexei Stolyarov (Lexus) impersonating President Macron.


In this case, we biometrically analyzed the recordings of the pranksters’ voices and compared them with the voice of the real Macron (in both Polish and English versions). We downloaded all voice samples in the form of individual recordings from the public domain on YouTube. Our goal was to confirm the effectiveness of biometric systems for this specific situation – identifying fraud.

It turned out that the voice of one of the pranksters, “Lexus”, was just over 50% consistent with the voice of the President of France and as much as 97% consistent with the voice of the false president. The voice of the second one, “Vovan”, showed no similarity (0%) to the fake president.

 This clearly proves that thanks to biometric analysis we managed to:

● detect, after only 1 minute, that a fake president was taking part in the conversation
● identify the fictional president as “Lexus”
● confirm that the public domain is a very good source of voice samples, which may not always be used for noble purposes
● strengthen the thesis that the most effective attacks are those using social engineering; in this case it was the choice of the right moment, when the President was under increased stress (the missile incident).

These are just selected examples of the use of specialized biometric tools to confirm the identity of people. If implemented in the future, they may help detect voice-based abuse.

As many as 230 million stolen passwords met the standard complexity requirements (min. 8 characters, 1 capital letter, 1 digit, and a special character), according to the Specops Breached Password Report from 2025. This means that the level of traditional security is insufficient and more effective protection tools are needed. Can a biometric password prove to be more secure? Absolutely yes!


Biometrics is one of the safer ways of logging in because it is based on biometric features such as the face, the iris, or the voice. Biometric identifiers are unique to a given person and distinguish them from others.

The advantage of biometrics lies in its unrivaled accuracy and convenience. Unlike traditional methods such as passwords or PINs, which can be easily forgotten or stolen, biometric identifiers are inextricably linked to people. This inherent link between individuals and their biometric characteristics makes it much more difficult for unauthorized individuals to impersonate another person.


An example of secure login using biometrics is VoiceToken from BiometrIQ, a voice authentication tool that provides very strong, two-step authentication. We remind you how it is done.


As the user speaks, the system verifies that the spoken words match the prompted sequence (first level) and that the speaker’s voice biometrically matches their VoicePrint (second level).


Extremely high security is ensured by an algorithm for selecting the words to be read, which reduces the possibility of guessing the sequence of words that will appear on screen to almost zero. The Speech-To-Text (STT) mechanism combined with an innovative biometric engine guarantees high effectiveness, even against attacks based on speech synthesis.
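A conceptual sketch of such two-level verification might look like this. This is not BiometrIQ’s implementation; the function names, vocabulary, and score threshold are all illustrative assumptions.

```python
import random

def random_prompt(vocabulary, n=4, rng=random):
    """Draw a fresh random word sequence, making the prompt hard to pre-record."""
    return rng.sample(vocabulary, n)

def verify(transcript, expected_words, voiceprint_score, threshold=0.85):
    """Level 1: the STT transcript must match the prompted words.
    Level 2: the voice must match the enrolled voiceprint closely enough.
    The threshold value is an illustrative assumption."""
    words_ok = transcript.lower().split() == [w.lower() for w in expected_words]
    voice_ok = voiceprint_score >= threshold
    return words_ok and voice_ok

vocabulary = ["amber", "river", "falcon", "cedar", "quartz", "violet"]
prompt = random_prompt(vocabulary, 3)
# A correct reading with a strong voiceprint match passes both levels:
print(verify(" ".join(prompt), prompt, voiceprint_score=0.92))  # True
```

The key design point is that both checks must pass: a synthesized voice that reads the right words fails the voiceprint level, while a replayed recording of the genuine user fails the word level.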

 More about VoiceToken

Are you ready for changes?