AI Hustle: News on Open AI, ChatGPT, Midjourney, NVIDIA, Anthropic, Open Source LLMs: Eleven Labs' Breakthrough: AI Audiobook Creation in Minutes
Jaeden Schafer & Jamie McCauley 10/9/23 - 10m
Welcome to the OpenAI podcast, the podcast that opens up the world of AI in a quick and
concise manner.
Tune in daily to hear the latest news and breakthroughs in the rapidly evolving world
of artificial intelligence.
If you've been following the podcast for a while, you'll know that over the last six
months I've been working on a stealth AI startup.
Of the hundreds of projects I've covered, this is the one that I believe has the greatest
potential.
So today I'm excited to announce AIBOX.
AIBOX is a no-code AI app building platform paired with the App Store for AI that lets
you monetize your AI tools.
The platform lets you build apps by linking together AI models like ChatGPT, Midjourney
and Eleven Labs, and eventually it will integrate with software like Gmail, Trello and
Salesforce so you can use AI to automate every function in your organization.
To get notified when we launch and be one of the first to build on the platform, you
can join the wait list at AIBOX.AI, the link is in the show notes.
We are currently raising a seed round of funding.
If you're an investor that is focused on disruptive tech, I'd love to tell you more
about the platform.
You can reach out to me at jaden at AIBOX.AI, I'll leave that email in the show notes.
In a leap towards revolutionizing the field of long-form audio creation, today marks the
launch of Projects from Eleven Labs.
Eleven Labs is the audio-generating AI platform,
and this new feature, called Projects, is a one-stop workflow solution for
generating and editing extensive audio content.
It came out of a ton of exhaustive research they did into long-form speech synthesis,
audio conditioning and parallelized audio generation, and Projects aims to alleviate the
multifaceted challenges faced by creators, publishers and independent authors in audio
production.
So really what they're saying here is this is now a platform where you can create
audiobooks.
And I think this is a really timely piece of news because of Project Gutenberg.
It's essentially a platform that has a ton of free, public-domain books.
And Project Gutenberg has just used AI to narrate 5,000 books,
and they're open and available for everyone to listen to for free, right?
So I don't know how big companies like Audible are going to respond, how they're going to
be impacted, right?
Because eventually, inevitably, if the rights to a book are not owned by someone, then
that book is just going to be available for everyone to listen to for free.
Project Gutenberg, I think, put it on Spotify and YouTube and a bunch of other places.
So it'll be interesting.
Of course, there are still going to be books where it's a brand new book that was
just written, and its author reads it and puts it exclusively
on Audible.
So there's going to be a space for Audible for sure.
In any case, I think this whole audio format area is getting some massive shakeups and
this new feature released by Eleven Labs is definitely one of those big shakeups.
So joining an already very robust suite of tools, including Speech Synthesis,
VoiceLab and Voice Library, Projects from Eleven Labs, I think, really stands out as
a specialized tool for long-form audio creation.
So this could be videos, this could be books, this could be movie scripts, all sorts of
things.
But I think it arrives on the heels of really hot demand right now for long-form audio
content, and it integrates effortlessly with professional voice cloning, meaning you can
clone your voice and have it read your book.
It's a really, really useful tool, right?
Authors are going to love this.
They don't have to sit there and manually read an entire book. Voice Library
and a multilingual model, for example, are essentially built into this new thing,
and this makes it a very comprehensive solution for a bunch of different audio needs.
Something that I think is really cool with some of the multilingual AI tech I've seen
so far is the fact that you can record your own voice
and, besides just making a clone of your voice to talk in English, for example,
you can also get that clone to speak, you know, Mandarin or French.
It sounds like your voice, but now you're speaking another language.
It's so cool to me that I'm seeing this integrated into these audio platforms.
So prior to the advent of Projects, users of Eleven Labs, myself included (I'm a user
paying for it monthly), often found themselves tangled in an array of different
challenges, from stability issues to inconvenient file format limitations.
One particularly annoying thing was the disconnect when piecing together text fragments
from different speakers, which resulted in really jarring transitions and the whole thing
not sounding cohesive.
So you had to essentially regenerate entire audio fragments just to
fix a few minor flaws.
It was a really inefficient and frustrating process.
So I've watched their demo video of how this works, and I think it's actually very smart.
And I'm seeing this in other platforms too. For example,
big shout out to Opus Pro.
It's a platform where you can upload an entire podcast. I've done
it: if you follow me on Instagram and you've seen the reels that I post, those come
straight from Opus. You upload your entire podcast script or your entire podcast
video, and it uses AI to decide
which pieces are interesting and clips them out. Then it throws a bunch of
editing on top, cool text transitions and stuff, and it essentially creates an entire reel
for you.
Now, the thing that I really love about it, the reason I bring it up, is that they're
using the same editing technique that I'm seeing Eleven Labs use.
I don't know if you've done much video editing or audio editing, but I'm so used to
platforms where you essentially see the audio file in front of you as
a bunch of spikes up and down, right?
That's how audio files look. You listen to a point, you clip different audio files, you
delete things that you don't like, whatever. That's how you edit stuff.
The way that these new AI and kind of more modern editors work, whether that's video
or audio, is that they essentially give you a transcript.
So if you upload an audio file or you get something recorded, or in Opus's case, the
video clip, it shows you just the transcript of what's being said, and
if you don't want a word in your video, you literally just backspace it off the transcript.
So it's just like editing a text document, except it's tied to the timestamps
on the actual video or audio file, and it edits it there live. I absolutely love that
form of editing.
I don't know why, but it's so annoying for me, using video editors like Premiere,
for example, trying to trim the exact place where I start or stop saying a word.
It's so much better to just see the transcript and backspace the word off the transcript,
or black it out on Opus, and then all of a sudden it's just removed. Super, super
cool.
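The transcript-based editing described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Opus or Eleven Labs actually implement it: the `Word` type, the `keep_ranges` function, and the idea that every word carries a start and end timestamp are all hypothetical. Deleting a word from the transcript simply drops its time range from the list of ranges to keep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word plus its position in the recording (hypothetical type)."""
    text: str
    start: float  # seconds into the recording where the word begins
    end: float    # seconds into the recording where the word ends


def keep_ranges(words, deleted):
    """Return the (start, end) time ranges to keep after deleting words.

    `deleted` is a set of indexes into `words`, i.e. the words that were
    backspaced off the transcript. Consecutive kept words are merged into
    one range so the natural pauses between them are preserved.
    """
    ranges = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if ranges and (i - 1) not in deleted:
            # The previous word was kept too, so extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # First kept word, or the first one after a deletion.
            ranges.append((word.start, word.end))
    return ranges


def edited_transcript(words, deleted):
    """Return the transcript text with the deleted words removed."""
    return " ".join(w.text for i, w in enumerate(words) if i not in deleted)


words = [
    Word("so", 0.0, 0.2),
    Word("um", 0.3, 0.6),
    Word("hello", 0.7, 1.1),
    Word("world", 1.2, 1.6),
]
# Backspacing "um" (index 1) off the transcript cuts 0.3s-0.6s from the audio.
print(edited_transcript(words, {1}))
print(keep_ranges(words, {1}))
```

An audio or video tool would then export only the kept ranges, which is why trimming a word needs no manual hunting for where it starts and stops.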
So Eleven Labs is doing the same thing, and essentially you're also able to do multiple
voices.
So for example, if you have an audiobook and there are three different characters
in there, you can select one character's quotes and then use a
different voice to narrate them, so it actually sounds like a conversation between all
these different people.
You've heard this before in audiobooks where it's one voice actor
and they put on a different accent when they're speaking as a different person.
It's kind of funny: the grandpa talks like this and
the kid talks like that,
and there's real skill involved.
This is kind of cool though, because you literally get different voice actors
for all the different parts.
It's really easy to edit.
You just highlight the text you want and select the voice, and it's going to do the
voice for that one.
You highlight the next one, change the voice if you want, or continue with the same voice.
And this works with their library of voices. For those that don't know, Eleven Labs has
a library of their own voices.
They also have a community tab where people can, you know, essentially allow their voices
to be used, and then they also have the whole voice cloning thing.
So you can clone your own voice or the voice of someone in your studio, upload it, and
then use that to narrate different parts.
So very, very cool.
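The highlight-and-assign workflow can be pictured as a small data structure. The sketch below is an assumption-laden illustration, not the Eleven Labs API: `Segment`, `assign_voices`, and the voice names "narrator", "grandpa" and "kid" are all made up. It splits a passage into segments, where highlighted character ranges get their chosen voice and everything else falls back to a default voice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A run of text and the voice that should read it (hypothetical type)."""
    text: str
    voice: str


def assign_voices(text, default_voice, overrides):
    """Split `text` into segments, each tagged with the voice that reads it.

    `overrides` maps (start, end) character offsets, like a highlighted
    selection, to a voice name. Text outside every selection uses
    `default_voice`. Overlapping selections are rejected in this sketch.
    """
    segments = []
    cursor = 0
    for (start, end), voice in sorted(overrides.items()):
        if start < cursor or end > len(text) or start >= end:
            raise ValueError(f"invalid or overlapping selection: {(start, end)}")
        if start > cursor:
            # Unhighlighted text before this selection keeps the default voice.
            segments.append(Segment(text[cursor:start], default_voice))
        segments.append(Segment(text[start:end], voice))
        cursor = end
    if cursor < len(text):
        # Trailing unhighlighted text also keeps the default voice.
        segments.append(Segment(text[cursor:], default_voice))
    return segments


text = 'Grandpa said, "Sit down." The kid said, "No."'
first, second = '"Sit down."', '"No."'
s1, s2 = text.index(first), text.index(second)
overrides = {
    (s1, s1 + len(first)): "grandpa",
    (s2, s2 + len(second)): "kid",
}
for segment in assign_voices(text, "narrator", overrides):
    print(f"{segment.voice}: {segment.text!r}")
```

Each segment could then be sent to a speech model with its own voice, and only a segment whose text or voice changed would need to be generated again.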
So essentially what they're doing here is that they are promising an entire audiobook
at the click of a button.
And so this whole new projects feature brings in an array of really interesting features.
Users can now designate specific text fragments to particular speakers like I mentioned.
And the thing that's interesting here is that you can also do multiple languages.
You can essentially highlight a passage and change the language.
They have different voices for different languages.
So all of this is very, very interesting.
In their blog post, they recently said, quote, "With Projects, our goal was to design a
tool that makes long-form audio generation as simple as possible. Drawing from fresh
research and your feedback,
we've developed a comprehensive solution which also seamlessly integrates with our existing
ecosystem of tools.
We can't wait to hear you bring your stories to life," end quote.
So in any case, I think this is a really cool tool.
I think this is going to be a game changer in a lot of different areas.
We're going to see a ton more of this. It really unlocks audiobooks now, and at an
affordable rate, I think.
And so I think it's going to be interesting to see the different audio creations that
come out of this new innovation, this new platform. Eleven Labs really is on top of its
game and is definitely a company that we'll continue to follow in the future.
If you are looking for an innovative and creative community of people using ChatGPT, you need
to join our ChatGPT creators community.
I'll drop a link in the description to this podcast.
We'd love to see you there where we share tips and tricks of what is working in ChatGPT.
It's a lot easier than a podcast as you can see screenshots, you can share and comment
on things that are currently working.
So if this sounds interesting to you, check out the link in the comment.
We'd love to have you in the community.
Thanks for joining me on the OpenAI podcast.
It would mean the world to me if you would rate this podcast wherever you listen to your
podcasts and I'll see you tomorrow.
Machine-generated transcript that may contain inaccuracies.
Join us in this episode as we explore the groundbreaking innovation from Eleven Labs, which promises to revolutionize the world of audiobooks. Discover how their cutting-edge projects enable the creation of AI-driven audiobooks in a matter of minutes. Tune in to learn more about the future of storytelling and the transformative impact of AI on the audiobook industry.
Get on the AI Box Waitlist: https://AIBox.ai/
Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/
Follow me on Twitter: https://twitter.com/jaeden_ai