Voice-preserving translation: capability, consent, and the ethics of cloning
When we describe Owaa as "real-time translated calls in your own voice", reasonable people ask two follow-up questions:
1. *How* do you preserve the speaker's voice across languages?
2. Should that even be allowed?
Both questions deserve clear answers. This post gives them.
Three things often called "voice preservation"
The phrase is used loosely. There are three genuinely different things:
1. Voice matching (catalog-based)
The system picks the best fit from a stock catalog of high-quality, natural-sounding voices. The catalog holds dozens of voices varying by gender, register, language background, and emotional tone. The system listens to the speaker for a few seconds, classifies them (gender, approximate register, age range), and picks the closest catalog voice. The output stays consistent across languages because the same catalog voice is used for each side throughout the call.
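To make the matching step concrete, here is a minimal sketch of catalog-based selection. The attribute names, catalog entries, and equal-weight scoring are all illustrative assumptions, not Owaa's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogVoice:
    # Hypothetical attribute set; a real catalog would carry more axes
    # (language background, emotional tone, etc.).
    voice_id: str
    gender: str      # "female" | "male" | "neutral"
    register: str    # "low" | "mid" | "high"
    age_range: str   # "young" | "adult" | "senior"

def match_voice(catalog, gender, register, age_range):
    """Pick the catalog voice with the most matching attributes.

    The inputs (gender, register, age_range) stand in for a classifier
    run on a few seconds of the speaker's audio.
    """
    def score(v):
        return (
            (v.gender == gender)
            + (v.register == register)
            + (v.age_range == age_range)
        )
    return max(catalog, key=score)

catalog = [
    CatalogVoice("en_f_01", "female", "high", "adult"),
    CatalogVoice("en_m_01", "male", "low", "adult"),
    CatalogVoice("en_n_01", "neutral", "mid", "senior"),
]

# The chosen voice is fixed once per side and reused for the whole call,
# which is what keeps the output consistent across languages.
best = match_voice(catalog, gender="male", register="low", age_range="adult")
```

Note the selection happens once and is then pinned; re-matching mid-call would make the rendered voice drift.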
Privacy footprint: zero. No biometric data is captured. The "voice" the listener hears is a stock asset.
2. Voice cloning (model-based)
Train a per-user voice model from a recorded sample (typically 30–60 seconds of clear speech). The model can then synthesize *that specific person's* voice in any supported language. This is what most people picture when they hear "your voice across languages".
Privacy footprint: material. The voice sample is a biometric. Storage, retention, deletion, and access controls all have legal weight, especially under GDPR, CCPA, and BIPA (Illinois).
3. Biometric capture (continuous)
Background recording of a voice during regular calls, used to refine a profile over time. Users may not realize this is happening.
Privacy footprint: severe. Without explicit, granular, continuous consent this is generally prohibited.
Where Owaa stands today
Phase 1 (current) ships option 1: catalog-based voice matching. No biometric data is captured. Each party hears the other rendered in a high-quality TTS voice that matches that person's general timbre, but it's a stock voice.
Phase 2 (planned) adds option 2 with explicit opt-in consent and a 12-month retention policy. The speaker records a brief sample under explicit consent, the model is trained, and from then on their translated voice sounds like *them* in every supported language. Users can revoke and delete the model at any time, and a hard 12-month TTL applies even if they never revoke.
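The revoke-or-expire rule is easy to state precisely. A minimal sketch, assuming a hypothetical `VoiceModelRecord` shape; the 365-day figure implements the 12-month TTL described above:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=365)  # hard 12-month TTL

@dataclass
class VoiceModelRecord:
    user_id: str
    consented_at: datetime
    revoked_at: Optional[datetime] = None

def must_delete(record: VoiceModelRecord, now: datetime) -> bool:
    """A voice model is deleted on explicit revoke OR when the TTL
    lapses, whichever comes first."""
    if record.revoked_at is not None:
        return True
    return now - record.consented_at >= RETENTION

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
fresh = VoiceModelRecord("u1", now - timedelta(days=30))
stale = VoiceModelRecord("u2", now - timedelta(days=400))
revoked = VoiceModelRecord("u3", now - timedelta(days=1), revoked_at=now)
```

The key property: expiry does not depend on any user action, so a forgotten account still gets its biometric data deleted.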
We will never ship option 3.
Why the consent design matters more than the model
The technical capability to clone a voice is now broadly available — open-source models can produce a plausible clone from 30 seconds of audio. The hard part isn't the model. It's making sure:
- The user knows it's happening.
- The user can refuse without losing the rest of the product.
- The data is portable and deletable.
- The use is bounded (translated calls only, not general impersonation).
- The retention is finite (12 months).
- The consent is reversible at any moment.
Get any of those wrong and you've shipped a surveillance product wearing a translation product's hat. The model is the easy part.
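Those six properties can be treated as a gate the system checks before ever training a model. A toy sketch, with hypothetical field names; the point is that consent is a structured record, not a checkbox:

```python
from dataclasses import dataclass

@dataclass
class CloneConsent:
    informed: bool       # the user knows cloning is happening
    refusable: bool      # product still works if they decline
    deletable: bool      # data is portable and deletable
    scope: str           # bounded use, not general impersonation
    retention_days: int  # finite retention
    revocable: bool      # reversible at any moment

def consent_is_valid(c: CloneConsent) -> bool:
    """Every property must hold; a single failure blocks training."""
    return (
        c.informed
        and c.refusable
        and c.deletable
        and c.scope == "translated_calls_only"
        and 0 < c.retention_days <= 365
        and c.revocable
    )

ok = CloneConsent(True, True, True, "translated_calls_only", 365, True)
bad = CloneConsent(True, True, True, "any_use", 365, True)  # unbounded scope
```

Encoding policy this way makes the failure mode explicit: an unbounded scope or infinite retention is rejected before any audio is recorded.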
What that means in practice for users
If you call our hotline today (Phase 1), nobody is recording or cloning your voice. You get translated voice over a stock catalog voice that's a reasonable match for your tone. That's it.
When Phase 2 ships, you'll be asked — explicitly, in plain language, before any recording happens — whether you want to enable voice cloning. If you don't, you keep the Phase 1 experience. If you do, the feature is on for *your account only*, with the controls above.
We will publish the full consent flow + retention policy on the Compliance page before Phase 2 launches.
What about the other side of the call?
Important asymmetry: the callee in any translated call hears the AI-rendered version of the caller, in the callee's own language, never the caller's raw voice. So the callee isn't subject to anything new; they're just receiving a phone call as usual.
There is, however, a question of whether the callee should be told the call is AI-mediated. Different jurisdictions have different rules:
- TCPA (US) does not currently require disclosure for AI-mediated translation, but does for outbound auto-dialing.
- EU AI Act (as of writing) classifies translation systems as "limited risk" and recommends user awareness, not consent.
- Some carriers require an audible tone or announcement for any AI-altered voice on outbound calls.
Our default behavior, when in doubt, is to disclose. The recipient hears a brief "translation enabled" announcement on outbound calls placed via the agent. The hotline (inbound) doesn't need disclosure because the inbound caller chose to be in the system.
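The default-to-disclose rule reduces to a small decision. A sketch under the assumptions stated above (outbound agent calls always announce; inbound hotline callers opted in by dialing the system; carriers may mandate a tone regardless):

```python
def announce_on_call(direction: str, carrier_requires_tone: bool) -> bool:
    """Decide whether to play the 'translation enabled' announcement."""
    # When in doubt, disclose: every outbound agent-placed call
    # gets the announcement.
    if direction == "outbound":
        return True
    # Inbound hotline: the caller chose to enter the system, so no
    # announcement unless the carrier mandates one anyway.
    return carrier_requires_tone
```

This is a policy sketch, not a compliance determination; jurisdiction-specific rules (TCPA, EU AI Act, carrier requirements) layer on top.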
Where this lands
Voice-preserving translation is a powerful feature when shipped under tight consent design and clear retention rules. It is a creepy feature when shipped without those. The technical work isn't where companies fail — the policy work is. Owaa's plan, in writing, is to do the policy work first and the model work second.
Read more
- Compliance — emergency call blocking, TCPA, voice biometric consent
- Privacy policy
- How real-time voice translation actually works