Next Generation AI-based Voice Cloning & Talking Head Generation

Safe and Realistic Voice Cloning Technology

Experience the future of digital communication with our Safe and Realistic Voice Cloning Technology. Say goodbye to the hassle of lengthy recording sessions—our cutting-edge solution transforms any text into natural-sounding audio using your own voice.

Each synthesis includes an invisible watermark to ensure your unique voice signature is fully protected and never misused. With our technology, your voice stays yours, even as it reaches audiences everywhere.

Next-generation Talking-head Video Generation Model

Unlock the full potential of virtual presentations with our Next-generation Talking-head Video Generation Model. Seamlessly integrated with our voice cloning technology, this model enables you to effortlessly create lifelike talking-head videos from any script.

Prepare to captivate your audience with presentations that not only sound like you but also visually represent your presence, making every speech engaging, no matter the occasion. Step into the spotlight with ease and let our technology amplify your message across any platform.

Break Down Language Barriers with Us

Master one language and speak to the world with our advanced translation model, seamlessly integrated with our Talking-head Video Generation and Voice Cloning technologies. Simply provide your script in your native language, and our model does the rest, automatically translating it and generating talking-head presentation videos in many languages.

Whether you're addressing a local seminar or a global conference, our technology ensures you deliver your message clearly and effectively in any language, at any event.

What to do about deepfakes: opportunities and problems as AI tech makes leaps and bounds

Recent news coverage from the University of New South Wales about our technology.

A Video Demonstration of our Lecture Video Generator

Our technology can be used for multiple purposes, so we have set up a simple lecture generator as a demonstration.

ABC 7.30 Interviews Our Team on the Future of AI Content

ABC 7.30 visited UNSW to learn how we’re building technology that creates and verifies AI-generated speech and video, with the aim of reducing risks.

Prof Sanjay Jha (Chief Executive Officer) 

Professor and Director of Research and Innovation at the School of Computer Science and Engineering, University of New South Wales (UNSW).

Maansi Jha (Director of Operations) 

Over a decade of experience in public sector/education consulting at PwC Australia and program management within the NSW Department of Education. 

Wenbin Wang (Head of Technology) 

Final-year PhD candidate at the School of Computer Science and Engineering, University of New South Wales (UNSW), specialising in text-to-speech, voice cloning, and generative AI.

A/Prof Yang Song (Technical Advisor) 

ARC Future Fellow and Scientia Associate Professor at the School of Computer Science and Engineering, University of New South Wales (UNSW).

Publications

1. W. Wang, Y. Song, and S. Jha, “GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech,” in Proceedings of Interspeech, 2024. [Demo] [Dataset]
2. W. Wang, Y. Song, and S. Jha, “USAT: A Universal Speaker-Adaptive Text-to-Speech Approach,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2590-2604, 2024. [Dataset]
3. W. Wang, Y. Song, and S. Jha, “Method and system for zero-shot speaker-adaptive speech synthesis,” Patent, PCT/AU2023/050900.
4. W. Wang, Y. Song, and S. Jha, “Generalizable Zero-Shot Speaker-Adaptive Speech Synthesis with Disentangled Representations,” in Proceedings of Interspeech (Oral), 2023.
5. W. Wang, Y. Song, and S. Jha, “AutoLV: Automatic Lecture Video Generator,” in Proceedings of the IEEE International Conference on Image Processing, 2022. [Demo]

Get in touch