A few days ago, we launched a new version of the voice coherence demo, in which you can check whose voice among famous Polish athletes is closest to yours. Our list includes the names of such Olympians as Iga Świątek, Michał Kwiatkowski, Anita Włodarczyk and Hubert Hurkacz. In total, we selected as many as 20 people from 13 different disciplines.

How does the demo work?
To check who your voice is similar to, just visit https://demo.biometriq.pl:8443/ and read aloud the text we have prepared, which takes only 30 seconds. During this time, we record a voice sample and simultaneously compare it with the voices of the 20 selected athletes. The results are displayed immediately in a column as a percentage of convergence.

The result may show:
=> significant % convergence with only one person
=> no convergence, in which case the table remains unchanged and 0 is shown everywhere
=> convergence with several people (this happens most often).
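
To make the idea of a percentage convergence table more concrete, here is a minimal sketch of how such a comparison could look. It is not the algorithm used in the demo: it assumes that every voice has already been reduced to a numeric feature vector (a "voiceprint"), uses plain cosine similarity as the score, and clips negative scores to 0. The function names, the three-dimensional vectors and the athlete labels are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information
    return dot / (norm_a * norm_b)


def convergence_table(sample, voiceprints):
    """Map each enrolled name to a 0-100 score for the recorded sample.

    Assumption: negative similarities are shown as 0, matching the
    "no convergence" case where 0 is displayed everywhere.
    """
    return {
        name: round(max(0.0, cosine_similarity(sample, vp)) * 100)
        for name, vp in voiceprints.items()
    }


# Hypothetical toy voiceprints; real systems use far longer vectors.
voiceprints = {
    "Athlete A": [1.0, 0.0, 0.0],
    "Athlete B": [0.0, 1.0, 0.0],
    "Athlete C": [-1.0, 0.0, 0.0],
}
sample = [3.0, 4.0, 0.0]  # features of the recorded 30-second sample

table = convergence_table(sample, voiceprints)
for name, percent in table.items():
    print(f"{name}: {percent}%")
```

With these toy vectors the sample converges with two people at once, which corresponds to the most common outcome listed above.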

Remember that voices may sound confusingly similar to each other without being biometrically consistent. Conversely, they may be biometrically consistent but not audibly similar. This is because the human ear primarily perceives the intensity and tone of a sound along with the location of its source, whereas biometric systems extract from the audio stream and analyze several features that make the analyzed voice unique.

What does biometric integrity testing give us and when is it helpful?

The algorithm used in this exercise may be used commercially in the future in systems for verifying people's identity and detecting voice-based fraud, the so-called deepfakes. We are constantly developing and improving this model to make it as effective and reliable as possible, which for us means reaching a probability of over 90% of correctly assessing the speaker’s authenticity.

Since January 2024, we have been implementing the Vesper project, which aims to create an innovative voice communicator. We wrote about the project here: https://biometriq.pl/en/vesper-save-voice-communication-platform-with-integration-of-biometric-services/

The aim of the first stage of the project, completed on December 31 this year, is to develop an innovative method for verifying the authenticity of the speaker and of the transmitted voice stream.
Implementing this method in the messenger will allow you to assess whether the interlocutor communicating via, among others, a smartphone or computer matches your expectations, and to prevent voice attacks.

And what have we done so far?

1. We have developed the structure of the subsystem.
2. We have developed preliminary requirements for a methodology for detecting the veracity of the far-end voice source based on the received signal in the near-end device.
3. We have developed guidelines for measurement methodologies, taking into account the statistical significance of the results.
4. We have developed a preliminary version of research procedures taking into account the developed research scenarios for stage 1. Scenarios and research procedures were developed taking into account the state of scientific knowledge at the time of work.
5. We have prepared a dataset for training neural networks.
6. We carried out the first training of neural networks and the selection and optimization of cost functions.
7. We performed a detailed analysis of QoE assessment methodologies in connection with qualitative objective parameters and proposed the framework of our own solution in this area.
8. We analyzed the possibilities of controlling and intervening in the audio path for mobile phones available on the market.
9. We have prepared and configured the first version of the VESPER test platform based on the Signal framework for Windows and Android.
10. We rented a DSP laboratory.
11. We acquired a speech corpus for testing.
12. We have purchased Voice Conversion engine licenses.

Ideas for using the latest technologies may be surprising. Deadbots (also known as griefbots or postmortem avatars) have appeared on the market, i.e. replicas of deceased people communicating with their loved ones in their native language and in their own voices. These are applications or computer programs based on data obtained from the Internet, which are intended to create the illusion of a deceased person and provide emotional support after the death of loved ones.

And although the goal seems right, scientists from the University of Cambridge draw attention to a number of threats associated with this technology. Based on the three analyzed scenarios (selling products using the image of the deceased, a parent avatar for a child, purchasing a long-term deadbot subscription for loved ones), the main ones are:

=> the possibility of manipulating and influencing people in mourning

=> using the image of people after death without their prior consent (need to regulate this aspect by obtaining consents to use, also regarding voice)

=> monetization of the experience of mourning and the desire to circumvent regulations for sales purposes by companies producing deadbots

=> unfavorable impact of the technology on certain social groups, mainly children (the researchers suggest introducing an age limit for the use of this type of solution)

According to Newseria, scientists do not completely reject this solution. They indicate the benefits:

=> public education, deadbot as an intergenerational exchange of stories and experiences (e.g. Holocaust survivors talk about their experiences)

=> source of income for families after the death of famous artists or journalists

Deadbots are another example of the need to introduce legal regulations for AI-based services. This would help avoid infringements related to the use of people's image and voice after their death.

What do you think about deadbots? Are you convinced by this type of service?

More here https://biznes.newseria.pl/news/deadboty-moga-byc,p262956223

Read more: Do deadbots have more threats or benefits?

The incident with Scarlett Johansson confirms that legal regulations and effective tools are currently the highest priority in preventing deepfakes, i.e. voice-based attacks. Illegal use of the voices of famous people to promote or discredit them is common and constitutes quite a challenge in the world of social media.

The dispute between Johansson and OpenAI, which allegedly used her voice from the movie “Her” to create the ChatGPT voice assistant, is a perfect example of how easily a voice can be used and how difficult it is to prove that the voice belongs to a given person and not to someone else.

In short, although Johansson did not consent to licensing her voice for a ChatGPT voice assistant, OpenAI presented a voice called “Sky” that was confusingly similar to hers.
The lack of legal protection in this area unfortunately does not work to the actress’s advantage. However, it clearly draws attention to the need to protect the creative work of artists from being used to power artificial intelligence tools.

You can read about the use of S. Johansson’s voice in the original article

https://www.npr.org/2024/05/20/1252495087/openai-pulls-ai-voice-that-was-compared-to-scarlett-johansson-in-the-movie-her

We’re talking about this for a reason. When it comes to anti-deepfake tools, BiometrIQ, as a research company, specializes in creating algorithms that help detect fraud by comparing real voices with those generated by AI. Using proprietary tools, we can assess with very high certainty whether a voice has been faked or not. Using a biometric-based algorithm is certainly the most effective way to combat deepfakes on the Internet.

We also have an algorithm that helps, already at the stage of creating recordings, mark them so that they cannot be effectively used for further conversion or voice synthesis. Such a tool would certainly help reduce voice theft cases.

Read more: Legal regulation can prevent deepfakes. Scarlett Johansson’s case

It’s already happening. Biometric age estimation is being used at the point of sale for age-restricted items in the UK. Can we expect similar solutions in Poland?

Innovative Technology (ITL) has confirmed a Primary Authority partnership with Buckinghamshire & Surrey Trading Standards for the use of its biometric age estimation technology by retailers selling age-restricted goods.

ITL’s biometric age estimation products, MyCheckr and MyCheckr Mini, anonymously estimate age at the point of sale, thereby preventing minors from accessing alcohol and cigarettes.

https://www.biometricupdate.com/202402/global-demand-for-age-estimation-verification-drives-deals-for-itl-new-entrants
Read more: Age verification for purchases of limited sale items

Is this phenomenon regulated by law in any way? And if so, are these regulations sufficient to protect the author? After all, the voice is not only an element of the image, but a unique biometric feature used for identification. Therefore, it should be protected in the same way as personal data.

Progress in the development of artificial intelligence means that voice-based fraud (impersonating other people) is becoming more and more common and is used for financial fraud, political attacks, data theft or unfair promotion. Do you remember the fake voice of US President Joe Biden used in automated telephone calls discouraging participation in the primary elections, or the use of Taylor Swift’s fake voice in an ad phishing for data under the guise of a cookware giveaway? These are examples of how quickly and effectively technology can be used for unfair purposes. Unfortunately, the scale of this type of abuse will only increase.

Voice synthesis is also becoming more and more popular in film productions. One example is the cult Top Gun: Maverick, where Val Kilmer’s voice was synthesized. The same happens in the last season of the Polish series Rojst, in which Filip Pławiak (young Kociołek) speaks with the voice of Piotr Fronczewski (Kociołek). Achieving such an effect is no longer a challenge in the era of AI. While this aspect was taken into account and regulated for the needs of the Rojst production, the question arises about the use of synthesized voices of actors in films after their death. We also looked at this case from the perspective of biometric testing.

Having the appropriate tools at our disposal, we attempted to biometrically compare the voices of Fronczewski and Pławiak. The results of the analysis show that their voices are biometrically NOT consistent (Pławiak’s utterance vs. Fronczewski’s voiceprint, VP: only 15% agreement; Fronczewski’s utterance vs. Pławiak’s VP: 11%), but, interestingly, these differences cannot be heard. To our ears, the voices of Pławiak and Fronczewski are almost identical. And that is exactly the point.

For both actors, gender and nationality were recognized with minimal uncertainty (score of almost 100%). An age difference between them was also detected, estimated at 20 years.
The study was carried out in our digital signal processing laboratory. For this purpose, we used a total of 25 seconds of speech by both actors, composed of several fragments of their original speech taken from the original film track.

The conclusions from this experiment show how helpful and effective biometrics can be in identifying the speaker, assessing the authenticity of their voice and, consequently, detecting voice-based fraud. Will this be enough to limit the unfair use of the voices of famous people in the future? And most importantly, are we able to regulate the market so as to protect the voices of famous people after their death?

What is accessibility?

Accessibility is a broad concept describing the extent to which a given system can be used by as large a group of people as possible.

It is a property of the environment (physical space, digital reality, information and communication systems, products or services) that allows people with functional (physical, cognitive) difficulties to use it on an equal basis with others.

Regulations in Poland – WCAG 2.0 and WCAG 2.1 Directives

With people with disabilities in mind, the WCAG 2.0 standard (Web Content Accessibility Guidelines) was created: an extensive set of recommendations on the accessibility of Internet content. It consists of 12 guidelines defining the features of individual content elements on the Internet that affect their accessibility. This applies to online stores, email clients and mobile applications.

WCAG 2.1 extends the WCAG 2.0 standard with 17 additional success criteria, many of them related to presenting content on mobile devices. These include, among others, the ability to view content on mobile devices in both vertical and horizontal orientation, adjusting the page content to the device window without having to scroll, appropriate spacing between lines, an appropriate contrast ratio, and the ability to turn off animations triggered by interaction.
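
As an illustration of the contrast ratio requirement mentioned above, here is a minimal sketch of how the ratio between two colors can be calculated using the WCAG 2.x definition of relative luminance. The function names and sample colors are our own, chosen for illustration only. The 4.5:1 threshold is the WCAG level AA minimum for normal-size text.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with channels in 0-255."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1 (identical colors) up to 21 (black vs. white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground, background):
    """True if the pair meets the 4.5:1 level AA minimum for normal text."""
    return contrast_ratio(foreground, background) >= 4.5


# Illustrative colors, not taken from any real interface.
black, white, light_gray = (0, 0, 0), (255, 255, 255), (200, 200, 200)

print(round(contrast_ratio(black, white), 2))      # 21.0, the maximum possible
print(passes_aa_normal_text(light_gray, white))    # light gray text on white
```

A check like this can be run over an interface's color palette at the design stage, before any usability tests with users.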

Public institutions are obliged to apply these Directives, but when thinking about popularizing solutions, we should ensure that newly created products/services meet these requirements to the greatest extent possible.

What does accessibility mean to us as a company?

Accessibility is one of our 4 main values (along with innovation, cooperation and security), to which we attach great importance when designing solutions in the field of voice biometrics and improving systems of this type. We develop our technologies with universality in mind, including for people who have little or no knowledge about online threats. Already at the research stage, we make sure that as many people as possible can use our solutions, giving priority to people with functional limitations, e.g. with limited vision. In this way, we try to eliminate differences or, to put it bluntly, to counteract digital exclusion. We believe that this approach significantly improves the quality and comfort of life of people with disabilities. According to the Central Statistical Office, there are over 3 million such people (legally registered) in Poland, which constitutes 10% of the entire society.

Vesper voice communicator

We are currently working on a new, innovative solution for communicating via voice. Its advantage will be its high accessibility for people with disabilities. Because the solution will also be available on mobile devices, we will base our design on the WCAG 2.1 standard. The product will feature a clear and default interface, ensuring simple and intuitive operation and high flexibility in use. The voice communicator will provide users with, among other things, the ability to clearly enlarge text and to use alternative descriptions. It is worth emphasizing that none of the instant messengers available on the market has such functionalities.

The project is implemented thanks to a grant from the European Union.

more about the Vesper project

Is it possible? The research experiment opens new possibilities in this area.

Researchers in the UK have created a dataset of physical movements that generate speech sounds. This collection may be used in the future to develop speech recognition systems that synthesize the voices of people with speech impediments. This may also contribute to the development of a new method for recognizing silent speech and even new behavioral biometrics.

This means that in the future, voice-controlled devices such as smartphones will likely be able to read users’ lips and authenticate users in banking and other sensitive applications by identifying their unique facial expressions. In other words, a person could be authenticated based on the movements of their lips and face.

In this experiment, the database was built based on lip reading and facial movement analysis. Data from continuous wave radars were used to capture the movement of the skin on the face, tongue and larynx of the study participants while speaking. Scientists used, among others, a laser speckle detection system with a super-fast camera to capture vibrations on the skin surface, as well as a Kinect V2 camera to read changes in the shape of the lips when forming various sounds.

The database, created based on the analysis of 400 minutes of speech, will be made available to researchers free of charge in order to further develop the technology.

The research group included scientists from the University of Dundee and University College London. The experiment also used technology from the Center for Communication, Sensing and Imaging at the University of Glasgow.


The fight against crime can become more effective thanks to voice biometrics.

Phonexia’s product for voice comparison in the field of computer forensics will soon be available on the market. The product in question, Voice Inspector 5.1, was designed especially for experts in this area.
The software is able to identify a person based on just 3 seconds of speech and offers the same voice comparison accuracy regardless of language. The new software offering meets international standards of judicial admissibility, in line with the guidelines of the European Network of Forensic Science Institutes (ENFSI).

The product also includes a set of supporting technologies, such as speaker diarization based on voice recognition, which allows marking individual speakers and separating them from the mono audio stream, a phoneme recognition module for identifying similar sound patterns in recordings, voice presence detection, and a spectrogram for analyzing audio files. 

Phonexia operates as part of the European Union-backed Roxanne consortium, which cooperates with law enforcement agencies in investigating criminal networks by providing voice biometrics data. The project was co-financed under the EU’s “Horizon 2020” program.

More on biometricupdate
https://www.biometricupdate.com/202401/phonexia-launches-voice-biometrics-product-for-forensic-investigations

We started 2024 by launching a new research and development project, “Vesper – safe voice communication platform with the integration of biometric services”, for which we received a European Union subsidy. The aim of this project is to develop and implement an innovative voice communicator with functional features that are unique on the market. What will make the Vesper voice communicator stand out?

In addition to strong transmission encryption, the communicator will have integrated voice biometrics technology and two unique functionalities developed as part of the research and development work of this project, i.e. technology for verifying the authenticity of the far-end voice stream emission source and technology for augmenting the voice stream received in the near-end device. It is worth emphasizing that no instant messenger currently available on the market, such as Skype or Teams, has such functional features.

The technologies developed as part of the project are intended to protect the user against presentation attacks and to prevent the interlocutor’s voice from being used to effectively create a deepfake using voice synthesis techniques.

Additionally, the two technologies implemented in the communicator, i.e. the technology for verifying the authenticity of the far-end voice stream emission source and the technology for augmenting the voice stream received in the near-end device, will also be commercialized independently under a license.

Another advantage of the messenger will be its high accessibility for people with disabilities, prepared according to the WCAG2.1 standard. The product will feature a clear and default interface, ensuring simple and intuitive operation and high flexibility in use.

The target market for the introduced products is the international encrypted mobile communications market. The main recipients of the Vesper communicator will be enterprises requiring secure voice communication, administration, etc. The recipients of the two technologies resulting from the project will be primarily producers and suppliers of voice and related communication systems, for whom such technologies will constitute an added value that increases the safety of their users.

The project started on January 1, 2024 and will last 2.5 years. This is the fourth BiometrIQ project implemented with EU funds.

=> Project value: PLN 7,217,514.00

=> Amount of contribution from European Funds: PLN 4,779,500.00

=> Project number: FENG.01.01-IP.02-0769/23