Vietnam continues to surprise. Hanoi is the first city in the country to launch a metro system that integrates biometric identity verification with digital ID data and open-loop payments (interoperable payment acceptance, including EMV).

The upgrade includes digital ID infrastructure at all 12 stations on Hanoi Metro Line 2A.
The metro line is now equipped with multi-format readers that accept chip-based identification cards (CCCD), NFC cards, QR codes, and EMV-compliant bank cards. AI-enabled cameras at the ticket gates perform biometric matching with digital ID data, allowing passengers to pass through without having to show their ID cards.



The system is also integrated with the Ministry of Public Security’s RAR Center, enabling future ticket purchases through VNeID, Vietnam’s national digital ID platform and national transport hub.

Officials argue that the initiative strengthens Hanoi’s ambitions as a smart city and improves interoperability between different modes of transportation. It also introduces a unified digital identity layer for public transportation, something no other Vietnamese city can yet boast.

During the two-month trial period, the system served over a million passengers and drew particular interest among younger riders.

https://www.biometricupdate.com/202512/vietnam-integrating-biometrics-into-daily-life-in-digital-transformation-drive

  1. The voice biometrics market is relatively young, currently estimated at USD 2-3 billion; USD 2.6 billion according to the Mordor Intelligence report “Voice Biometrics Market Size, Forecast Report, Landscape 2025”.
  2. Depending on the source, forecasts project the market reaching approximately USD 10-15 billion over the next 8-10 years.
  3. The leading region is North America: in the Fortune Business Insights analysis, its share in 2024 was nearly 37%.
  4. Asia-Pacific (APAC) is often cited as the fastest-growing region in the coming years.
  5. The “Healthcare and Life Sciences” sector is expected to lead in 2025 with a 40% market share.
  6. Growth is driven by rising security requirements, the need for passwordless authentication, advances in voice and AI technologies, and the digitization of financial and contact-center services.


What distinguishes effective voice biometrics systems? The following four indicators determine the advantage of one system over another:
1. Accuracy rate: the overall effectiveness of the system, which for good biometric systems falls in the 95-99% range.

2. FAR (False Acceptance Rate), a metric that measures how often a system incorrectly accepts an unauthorized person (e.g., someone impersonating a user) as a valid user. In the most accurate systems, this rate is less than 1%. The lower the rate, the more secure the system and the more difficult it is to impersonate.

3. FRR (False Rejection Rate), a metric that measures false rejections, or the number of times the system rejects a genuine user when it should accept them. Ideally, this figure is below 3%.

4. EER (Equal Error Rate): the point at which FAR equals FRR. This metric is often used to compare the quality of biometric systems.
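A minimal sketch of how FAR, FRR, and EER relate in practice. The similarity scores and the 0-to-1 scale below are illustrative assumptions, not values from any real system:

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor attempts accepted (score >= threshold).
    FRR: fraction of genuine attempts rejected (score < threshold)."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep every observed score as a threshold and return the operating
    point where FAR and FRR are closest (a discrete EER estimate)."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine_scores + impostor_scores)):
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Toy similarity scores: genuine trials score high, impostor trials low,
# with one borderline impostor at 0.81.
genuine = [0.91, 0.88, 0.95, 0.79, 0.85, 0.92]
impostor = [0.30, 0.42, 0.15, 0.55, 0.48, 0.81]
```

Raising the threshold lowers FAR but raises FRR; the EER is the crossing point of that trade-off.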

Phonexia and ID R&D systems are generally considered the most effective, owing to their outstanding performance in comparative tests.

In our research, we primarily use Phonexia engines, but we also utilize others such as Kaldi (X-vector) and ECAPA. The goal is to test our algorithms as extensively as possible in a diverse environment. Security is our top priority.

In biometric analyses, the recording itself, or more precisely its quality, determines how effectively the biometric system recognizes the speaker. It’s worth recalling what determines the quality of such a recording (audio signal). The quality of the audio signal is primarily influenced by:

👉 acoustic conditions – noise level, reverberation, and SNR (signal-to-noise ratio)
👉 speech quality – naturalness of the voice, its loudness, consistency of articulation
👉 recording devices
👉 recording time
👉 technical parameters such as sampling frequency and bit resolution.

However, it’s safe to say that the main factor determining quality is the SNR (signal-to-noise ratio), a measure of signal strength relative to the level of interference. A high ratio indicates clear sound, free from background noise, which translates into more accurate speaker identification and analysis.
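Computing the SNR is straightforward when the signal and noise can be measured separately. A minimal sketch, using a synthetic tone and hum as stand-ins for real recordings:

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise),
    where P is the mean-square power of each sample sequence."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)

# A 440 Hz tone at 16 kHz as the "signal" and a low-level hum as the "noise".
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
hum = [0.01] * 16000

print(round(snr_db(tone, hum), 1))  # → 37.0
```

Each extra 10 dB means ten times more signal power relative to the interference.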

Recording time is also a crucial factor. Biometric systems require a minimum length of voice sample to extract characteristic features. A recording that is too short (1-2 seconds) may not be sufficient for reliable identification. A longer recording, on the other hand, does not directly improve quality, but it allows for better extraction of clear and repeatable fragments, which improves accuracy.

Therefore, the optimal recording length ranges from a few to a dozen or so seconds of speech.
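A pre-check on sample length is trivial to implement. A sketch, where the 3-second minimum is an illustrative assumption rather than any vendor’s requirement:

```python
def long_enough(num_samples, sample_rate_hz, min_seconds=3.0):
    """Reject voice samples shorter than a minimum duration.
    The 3-second floor is an illustrative assumption, not a vendor spec."""
    return num_samples / sample_rate_hz >= min_seconds
```

For example, a 2-second clip at 8 kHz (16,000 samples) would be rejected, while a 6-second clip (48,000 samples) would pass.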

Speech quality, in turn, encompasses not only the clarity of the recording but also naturalness, volume control, and consistent articulation. These elements influence the reliability of voice samples and the effectiveness of recognition algorithms. It’s important to remember that extreme values (too quiet/too loud) can complicate analysis.
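A simple level check can flag such extremes before enrollment or verification. A sketch, with thresholds that are illustrative assumptions and samples assumed normalized to [-1, 1]:

```python
def level_report(samples, quiet_rms=0.01, clip_peak=0.99):
    """Flag recordings that are too quiet or likely clipped.
    Thresholds are illustrative; samples assumed normalized to [-1, 1]."""
    rms = (sum(x * x for x in samples) / len(samples)) ** 0.5
    peak = max(abs(x) for x in samples)
    if peak >= clip_peak:
        return "too loud (possible clipping)"
    if rms < quiet_rms:
        return "too quiet"
    return "ok"
```

RMS catches recordings that are too quiet overall, while the peak check catches clipping, which distorts exactly the spectral detail recognition algorithms rely on.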





An open benchmark for assessing systems for detecting deepfakes and manipulated media content has emerged. It aims to help evaluate and improve algorithms for detecting audio, video, and image content generated by AI.

The shared dataset contains over 50,000 samples of real, AI-generated and manipulated audiovisual content—deepfakes and synthetic media—annotated with real-world use cases. Adversarial attacks allow for testing the model’s robustness.

Importantly, the license is granted for evaluation purposes only and is not intended for training or commercial use.

It’s a joint initiative of Microsoft’s Good Lab, Northwestern University’s Security and Artificial Intelligence Lab, and the nonprofit WITNESS.

Will it encourage researchers to use and share their own analyses?

more biometricupdate https://www.biometricupdate.com/202507/new-microsoft-benchmark-for-evaluating-deepfake-detection-prioritizes-breadth

A new form of payment using smart glasses has hit the market! Transactions can be completed using QR code scans and voice commands. Alipay’s smart glasses, created in collaboration with Chinese smartphone manufacturer Meizu, are powered by voice authentication and intent recognition technology. Meizu, for its part, offers an optical waveguide display, voice noise reduction, and camera-based code capture technology.


The company has just completed its first payment transaction using an e-wallet with built-in smart glasses via AlipayHK in Hong Kong.

Ant Group plans to roll out the new feature to Alipay+’s global partners in 2025. Alipay+ is the company’s cross-border mobile payment solution that allows businesses to accept payments from mobile wallets across multiple countries, including Line Pay and GrabPay. The service currently connects over 1.7 billion user accounts across 36 mobile wallets.

more https://www.biometricupdate.com/202506/alipay-introduces-smart-glasses-payment-with-voice-authentication

What do you think about this form of payment? Could this service revolutionize the market? Do you see its applications in everyday life?

Can a bot be sincere? Can it express remorse when it apologizes? These are the questions researchers are asking themselves in the context of the possible use of AI in handling complaints.

This is about automating the entire process. While the mechanics seem easy, the hardest challenge may be the bot’s expression of emotion, in this case an apology.

Research suggests that when people expect an apology, they expect sincere contrition and authenticity. Spontaneity is also important, and these are typical human traits that a machine can’t always handle perfectly. So how reliable can an apologetic AI be?

more on biometric update https://www.biometricupdate.com/202505/apologetic-intelligence-should-bots-handle-complaints

Voice stream augmentation is a technique that amplifies or modifies a voice stream in real time to improve its quality. In short, it involves using electronic devices such as microphones, loudspeakers, amplifiers, or software algorithms to change the characteristics of the sound, e.g. voice tone.

It is not difficult to guess that augmentation is used in speech recognition systems, games, and even in artificial intelligence, where it enables the generation of artificial voices and improves their naturalness.
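Real-time augmentation typically processes audio chunk by chunk, carrying filter state between chunks. A minimal sketch of that pattern, where the gain and filter settings are illustrative and not taken from any production engine:

```python
def augment_stream(chunks, gain=1.5, alpha=0.2):
    """Process a voice stream chunk by chunk: apply a gain, smooth the
    result with a one-pole low-pass filter, and clip to [-1, 1].
    Gain and filter parameters are illustrative assumptions."""
    state = 0.0
    for chunk in chunks:
        out = []
        for x in chunk:
            state = alpha * (gain * x) + (1 - alpha) * state  # one-pole low-pass
            out.append(max(-1.0, min(1.0, state)))            # clip to valid range
        yield out
```

Keeping `state` outside the chunk loop is what makes the processing seamless across chunk boundaries, a requirement for any streaming effect.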

Currently, in one of our projects we are building our own voice stream augmentation engine. Its purpose is to support the detection of unauthorized capture of voice from a messaging app for later synthesis or conversion, without degrading the sound as heard by the human ear. All this to prevent voice theft and ensure the service operates as effectively as possible.

It turns out that audio deepfakes can be more dangerous than video! According to a Pindrop report, the number of audio deepfakes increased by 760% between 2023 and 2024.


In an era of increasing attacks, user awareness seems to be the key barrier protecting people from these threats. It is about:

● limited trust in voice assistants,
● knowledge of the social-engineering techniques used by fraudsters,
● control over the content you publish on the Internet.

On the system side, the obvious countermeasure is advanced biometric technology and methodology for detecting deepfakes in real time.

For example, Pindrop uses a technique called acoustic fingerprinting as one of its capabilities. This involves creating a digital signature for each voice based on its acoustic properties, such as pitch, tone, and cadence. These signatures are then used to compare and match voices across calls and interactions. For more on deepfakes, check out this podcast with Vijay Balasubramaniyan, CEO of Pindrop. Link below
https://www.biometricupdate.com/202504/biometric-update-podcast-digs-into-deepfakes-with-pindrop-ceo
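Pindrop’s actual technique is proprietary, but the idea of an acoustic signature can be sketched with a toy feature vector, here built from a crude pitch estimate, RMS level, and zero-crossing rate, and compared by cosine similarity. Everything below is an illustrative assumption; real fingerprints use far richer features:

```python
import math

def voice_signature(samples, sample_rate=16000):
    """A toy 'acoustic signature': pitch estimate (autocorrelation peak
    searched between 50 and 400 Hz), RMS level, and zero-crossing rate."""
    rms = (sum(x * x for x in samples) / len(samples)) ** 0.5
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / len(samples)
    best_lag, best_corr = None, -float("inf")
    for lag in range(sample_rate // 400, sample_rate // 50):
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return [sample_rate / best_lag, rms, zcr]

def cosine_similarity(a, b):
    """Compare two signatures; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
```

Signatures extracted from different calls by the same speaker should score close to 1.0, while mismatched voices score lower; production systems apply the same matching idea to much larger feature sets.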

As a reminder, Pindrop is a company based in Atlanta, USA. Its solutions set the standard for identity, security, and trust in every voice interaction. More at pindrop.com

The MWC 2025 exhibition was marked by ubiquitous AI. Many companies presented their latest achievements in building systems that communicate autonomously with people. The humanoid robot Ameca (Etisalat), interacting with its interlocutors, aroused great interest. The stands with interactive agents (Amdocs) showed almost unbelievable quality of system-generated image and speech.

Google has unveiled Gemini Live, its response to ChatGPT’s voice mode. Its Share Screen With Live feature lets Gemini interact with the image displayed on the phone’s screen. Deutsche Telekom has indicated a possible direction for phone development by turning the entire phone into a chatbot: the phone has no applications and acts as a personal assistant that communicates with the user by voice. The solution is built on a digital assistant from Perplexity AI, but it is also to be open to, among others, Google Cloud AI, ElevenLabs, and Picsart. South Korean startup Newnal has presented a new mobile operating system that uses historical and current user data to create a personalized AI assistant, which is eventually to become an AI avatar behaving just like the user.

All of the above solutions, and many others, share the use of voice technologies for two-way communication. The direction indicated at MWC 2025 is clear: our actions will be supported by avatars and bots communicating with us autonomously. Fast, machine-based confirmation of who we are talking to is therefore becoming more important than ever, because the quality of autonomous voice communication systems means a human can no longer reliably verify the speaker on their own.

Photos by Andrzej Tymecki