Audiobox: you can now try Meta's voice-cloning AI

Artificial Intelligence continues to advance by leaps and bounds and shows no sign of slowing down. If we were recently covering programs that generate video from nothing but a text prompt, Meta has now raised the bar with Audiobox, a tool that clones voices and creates convincing sounds from scratch using AI.

If you want to know more about Audiobox, how AI-generated voices work and the risks this technology can entail, keep reading this article.

What is Audiobox?

Audiobox is Meta's service for generating voices with AI, imitating voices that already exist. In other words, it is an AI voice “cloner”, and a fairly convincing one.

It is not the first time the company has explored this idea: a while ago it launched Voicebox, although that tool was not as advanced or as complete as this new one.

According to Meta, Audiobox can record voices from scratch and clone them in a few seconds, but to do so the user must record themselves reciting a specific text.

This is done because the text has been carefully chosen to capture specific nuances of the voice, and it also acts as a safeguard against identity theft, so that not just anyone can come along and clone your voice.

Today, Audiobox is capable of the following:

  • Generate soundscapes from text: e.g. “recreate a Tuscan road with vintage cars and birds in the background”
  • Create audio with a specific tone and rhythm: e.g. “generate a child's voice with a high-pitched, nasal tone”
  • Place a voice in a characteristic environment: e.g. “make that child sound as if they were inside a cave, with an echo”

How are voices cloned through AI?

The process of cloning voices with artificial intelligence is not so different from other AI methods on the market today: it is based on the Transformer architecture, which we have covered in another article.

What the Transformer architecture does is allow the model to focus on specific parts of the input when performing a task. This attention is computed as weights assigned to different parts of the input, and those weights are learned during training.
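As a rough illustration (and not Audiobox's actual code), this is what that attention calculation looks like in plain NumPy, applied to a handful of made-up audio frames:

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Minimal sketch of the attention mechanism used in Transformers.

    Each query is compared with every key; the resulting weights decide
    how much each value (e.g. each audio frame) contributes to the output.
    """
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)            # similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax -> attention weights
    return weights @ values                              # weighted mix of the input

# Toy example: 4 hypothetical audio frames, each represented by an 8-dimensional vector
frames = np.random.randn(4, 8)
output = scaled_dot_product_attention(frames, frames, frames)  # self-attention
print(output.shape)  # (4, 8)
```

The learned weights are what let the model decide which parts of a recording matter most for reproducing a voice.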

Let's imagine an assembly line where one operator collects the voice, another processes it, another analyzes it, and so on until an exact clone is produced. That, roughly, is how the pipeline works when this architecture is applied to speech generation with AI. Broken down into steps, it looks like this:

Data collection

Extensive audio recordings of the person whose voice is to be cloned are needed. The larger and more varied the data set, the better the resulting model.

Data preprocessing

Audio recordings are processed to extract relevant features, such as tone, intonation, speed, and other characteristics that define the voice.
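To make this step more concrete, here is a minimal sketch of extracting that kind of feature with the open-source librosa library; the file name is made up, and this is not Meta's actual pipeline:

```python
import librosa

# Hypothetical recording of the voice we want to clone
audio, sample_rate = librosa.load("voice_sample.wav", sr=16000)

# Mel spectrogram: a compact representation of timbre and intonation over time
mel = librosa.feature.melspectrogram(y=audio, sr=sample_rate, n_mels=80)

# Fundamental frequency (pitch contour), related to the perceived tone of the voice
f0, voiced_flag, voiced_probs = librosa.pyin(
    audio,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    sr=sample_rate,
)

print(mel.shape)  # (80, number_of_frames)
```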

Deep learning models

Deep learning models, such as recurrent neural networks (RNNs), are used to learn complex patterns in the voice data. These models can be trained to capture the variability and subtleties of the voice.
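A toy example of such a model, written in PyTorch, could look like the sketch below. It only illustrates the idea of a recurrent network operating on acoustic frames; it is not Meta's architecture:

```python
import torch
import torch.nn as nn

class VoiceModel(nn.Module):
    """Toy recurrent model: maps a sequence of acoustic features
    (e.g. the mel-spectrogram frames from the previous step) to a
    prediction of the next frames."""

    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_mels, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, frames):              # frames: (batch, time, n_mels)
        hidden_states, _ = self.rnn(frames)
        return self.out(hidden_states)       # predicted frames, same shape
```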

Model training

The model is trained using the collected data set. During training, the model adjusts its weights and parameters to minimize the difference between the generated voice and the real voice of the target speaker.
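Continuing the toy example from the previous step (with random tensors standing in for real recordings), the training loop would look roughly like this:

```python
import torch

model = VoiceModel()                                  # the toy model defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.L1Loss()                           # difference between generated and real frames

# Stand-in for a real dataset: random "recordings" of 100 mel frames each
dataset = [torch.randn(1, 100, 80) for _ in range(32)]

for epoch in range(5):                                # real training uses far more data and epochs
    for mel in dataset:
        inputs, targets = mel[:, :-1, :], mel[:, 1:, :]   # learn to predict the next frame
        loss = loss_fn(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()                               # adjust weights to shrink the difference
        optimizer.step()
```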

Voice generation

Once trained, the model can generate synthetic speech that imitates the original voice. You provide text as input, and the model generates the corresponding speech.
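Sticking with the same toy model, generation simply means running it step by step to produce new frames. In a real system the text to be spoken would condition this process, and a vocoder would turn the frames into an audible waveform:

```python
import torch

model.eval()
with torch.no_grad():
    frames = torch.zeros(1, 1, 80)                    # hypothetical starting frame
    for _ in range(200):                              # generate 200 mel frames
        next_frame = model(frames)[:, -1:, :]          # predict the next frame from the sequence so far
        frames = torch.cat([frames, next_frame], dim=1)

print(frames.shape)  # (1, 201, 80): a short synthetic mel sequence
```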

How can we use Audiobox in Spain?

To use this new service in Spain, we simply have to go to the Audiobox website, register, and we can start using it from the computer. That said, there are a few caveats worth knowing:

  • For the moment it only works with English audio, since it is still in beta.
  • The end result is likely to sound somewhat robotic, which is normal given that it is not yet fully polished.

Are there risks in using programs like Audiobox?

Hackers could use Audiobox to commit digital crimes

Voice cloning with artificial intelligence presents several risks and ethical challenges, mostly linked to human nature: not everyone is going to make good use of this type of technology, and it is dangerous in the wrong hands. Specifically, the main risks we see in a program of this type are the following:

Fraud and identity theft

The ability to clone voices could be used to carry out fraud and identity theft in phone calls, voice messages or audio recordings.

This could have serious consequences in terms of security and trust, especially at a time when remote administrative procedures and purchases over the phone are becoming more common.

Disinformation and manipulation

Voice cloning technology could be used to create fake audio recordings in order to spread misinformation or manipulate public opinion. This raises serious concerns in the context of disinformation and the manipulation of reality.

This could affect individuals, companies or even public figures, whose voices could be used to generate false, self-serving content, such as an audio clip of a president admitting to bribes that never happened, with all the legal and media repercussions that would entail.

Phishing and social engineering

Voice cloning could be used in phishing and social engineering attacks, where attackers trick people into believing they are interacting with someone they trust in order to obtain their data for illegal purposes.

Imagine hackers using the cloned voice of someone's child to ask for a quick transfer or a Bizum, or asking for the bank password for a moment to "check something in the account", for example.

Legal issues and liability

Misuse of voice cloning could lead to legal issues and liability challenges, since determining the authenticity of voice recordings could become more complicated. Whereas before, expert testimony was enough to assess authenticity, courts will now have to factor in the possibility of cloned voices.

For our part, we can only wait to see how this evolves and whether Meta gets the security side right. For now, the company has announced that the project will be closed source, so we advise some caution before lending your voice to it, since it is not clear exactly what the company will do with the data it collects.

