In January 2024, an employee at the Hong Kong office of an international company received an email supposedly from the chief financial officer of the company’s UK headquarters. Then came what looked like an ordinary internal meeting: a closed video conference, several colleagues on screen, the calm voice of a senior executive, and instructions for confidential transfers. The people looked familiar. The voice sounded right. The transfer went through — and the money ended up in the hands of scammers.
Hong Kong police later reconstructed the scheme in detail. The conference turned out to be a prepared deepfake assembled from publicly available videos and employees’ voices. There had been no real meeting. The victim transferred about HK$200 million to five local accounts. The authorities separately noted one important detail: there was almost no interaction. The fake CFO gave instructions, quickly ended the meeting, and follow-up commands arrived through a messenger.
This case shows why voice and video deepfakes have already become a practical tool for fraud, undermining our trust in what we hear and see.

Why We Still Trust Voices
We have a special relationship with voices. We may not remember a phone number, recognize a colleague’s handwriting, or ever meet someone in person. But our brains are remarkably good at recognizing the voices of people we know. Timbre, pauses, familiar verbal habits, fatigue at the end of a sentence — all of this can feel almost like a biometric signature.
The problem is that this biometric-like signal no longer belongs exclusively to its owner. In a study published in Scientific Reports in 2025, participants performed two tasks, and both results were sobering. When they heard a real speaker paired with that speaker’s AI clone, they judged the two voices to belong to the same person in a median 83.3% of trials. In a separate task, deciding whether a single voice was real or AI-generated, they correctly identified the AI-generated voices only 60.8% of the time. The human ear is not a reliable detector.
Imagine an everyday scenario. A mother receives a call from her daughter: “Mom, I’ve been in an accident, my phone is almost dead, there’s a lawyer with me, I urgently need you to transfer money.” The voice is trembling. There is noise in the background. The situation is distressing, there is no time, and emotions have already taken over. Even if the person later says, “I would never have believed it,” in the moment, people do not analyze, they react.
The same mechanism works inside a company. A finance manager receives an email from the “CEO”, followed by a short call: “I’m in a meeting right now, this is urgent, I’ve sent the details, the payment needs to be processed by the end of the day.” Few employees want to let their boss down.
The New Shape of Deception
Fraud involving an “urgent money transfer” existed long before generative AI. Business Email Compromise, or BEC, has for years relied on fake emails from executives, suppliers, lawyers, and accountants. Now scammers have gained a low-cost trust amplifier.
In the Internet Crime Complaint Center’s 2025 report, complaints that referenced AI totaled 22,364, with reported losses of more than $893 million. In the same section, the FBI specifically states that voice cloning can be used to request wire transfers, and that businesses reported losses of more than $30 million from BEC schemes involving AI. The report also describes “distress scams”: calls supposedly from a loved one in trouble. In such schemes involving voice cloning, victims reported losses of more than $5 million.
And these figures are likely to keep growing. According to forecasts, generative AI could increase fraud losses in the United States from $12.3 billion in 2023 to $40 billion by 2027.

Voice as Bait
In fraudulent schemes, voice is only one element among many — a detail that makes the whole picture convincing.
Before the call, there may be a phishing email. Before the email, information may be collected from LinkedIn, a corporate website, interviews, webinars, podcasts, YouTube, and Instagram. Does an executive have public talks online? Excellent. Does an employee have conference recordings? That works too. Who works with whom, who is on vacation, who recently changed roles, when the company’s quarter closes — all of this helps attackers create a moment in which the request sounds plausible.
Financial regulators are increasingly discussing deepfake media as part of a broader scheme to bypass identity checks. In November 2024, FinCEN issued an alert for financial institutions: criminals are using deepfake media and generative AI tools for fraud, including attempts to bypass identity verification and authentication methods. The alert also noted an increase in suspicious activity reporting involving suspected use of deepfake media.
In such a scheme, voice works like a stamp on a forged document. It does not have to be perfect in itself. It only has to match the victim’s expectation: “It really is him. That is how the boss sounds. That is how my son speaks. That is how our supplier does business.”
A Market Built on Trust
A Consumer Reports assessment of six popular AI voice cloning products — Descript, ElevenLabs, Lovo, PlayHT, Resemble AI, and Speechify — found that most services lack meaningful safeguards against fraud or abuse. Many of these products make it possible to create an artificial copy of a voice from a short audio fragment.
Voice cloning has useful applications: voice-over work, localization, audiobooks, assistance for people who have lost their voice, and speech restoration after illness. But the same mechanism that helps an actor quickly record a voice-over track also helps a scammer sound like someone else’s child saying: “I urgently need money.”
That is why regulators are already launching initiatives such as the FTC’s Voice Cloning Challenge to find ways to counter the harmful use of voice cloning: from synthetic voice detection to watermarking and verification that audio comes from a live human source.

Laws Against Voice-Based Fraud
In the United States, the FCC ruled in February 2024 that AI-generated voices in robocalls are illegal under the Telephone Consumer Protection Act. One of the triggers was the fake robocall using Joe Biden’s voice before the New Hampshire primary, but the implication is broader: if a voice can be synthesized, its use in automated calling becomes a legal issue.
In 2026, a bill was introduced in the US Senate proposing protections against digital impersonation fraud. Voice and video impersonation are becoming matters of criminal law and consumer protection policy.
In Europe, policy is more closely tied to the transparency of synthetic content. The EU AI Act requires disclosure that deepfake audio, video, or images have been artificially generated or altered, while providers of AI systems must mark synthetic content in a machine-readable format, where technically feasible.
But the law does not solve the central problem in real-time scams. When a person receives a call “from the police,” “from the bank,” “from their child,” or “from their manager,” they do not have time to read legal frameworks. They have only a few seconds, and they are under pressure: urgent, confidential, do not tell anyone, the money is needed now.
How the Rule of Trust Is Changing — and What to Do About It
The worst advice on this topic is: “Listen more carefully.” Research already shows that people are not very good at recognizing synthetic voices. In a real attack, stress, background noise, haste, respect for authority, fear for a loved one, and corporate discipline are added to the mix.
A better rule is simpler: voice is no longer proof of identity. It is the reason to stop and verify.
For families, this can be a simple agreement: if someone calls “in trouble” and asks for money, hang up, then call back on a known number or reach out to another relative. Better yet, agree in advance on a code word that only family members know.
In companies, no urgent payment should go through simply because “the CEO called.” There needs to be an independent confirmation channel, two-step approval, a ban on changing payment details through a single channel, and separate procedures for “confidential” requests. Anything that sounds urgent and secret should automatically become suspicious.
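To make these controls concrete, here is a minimal sketch in Python of what such a payment-screening policy could look like. Everything in it, from the field names to the rule set, is a hypothetical illustration of the checks described above, not any real product’s API.

    from dataclasses import dataclass

    @dataclass
    class PaymentRequest:
        amount: float
        channel: str             # "email", "call", "video", "messenger"
        urgent: bool
        confidential: bool
        new_payee_details: bool  # payment details changed or newly supplied

    def required_checks(req: PaymentRequest) -> list[str]:
        # Hypothetical policy rules illustrating the controls above.
        checks = ["second_approver_sign_off"]  # two-step approval, always
        if req.channel in {"call", "video", "messenger"}:
            # A voice or a face on screen never authorizes money by itself:
            # confirm through an independent channel (call back a known number).
            checks.append("callback_on_known_number")
        if req.new_payee_details:
            # Payment details may not be changed through a single channel.
            checks.append("confirm_details_via_second_channel")
        if req.urgent and req.confidential:
            # "Urgent and secret" is the classic social-engineering pattern.
            checks.append("escalate_to_fraud_team")
        return checks

    # The Hong Kong scenario: an urgent, confidential video call with new details.
    print(required_checks(PaymentRequest(2e8, "video", True, True, True)))

The design choice matters more than the code: the rules are additive, so a request can never pass with fewer checks by sounding more authoritative or more urgent.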
Voice deepfakes are dangerous because they exploit an old human habit: we hear a familiar voice and fill in the trust ourselves. The scammer only has to give us the right timbre, the right pause, and the right reason to hurry.
A familiar voice can still trigger trust. What it can no longer do, on its own, is prove who is speaking.

