AI Hustle: News on Open AI, ChatGPT, Midjourney, NVIDIA, Anthropic, Open Source LLMs: Eleven Labs' Breakthrough: AI Audiobook Creation in Minutes

Jaeden Schafer & Jamie McCauley - 10/9/23 - 10m

Transcript

Welcome to the OpenAI podcast, the podcast that opens up the world of AI in a quick and

concise manner.

Tune in daily to hear the latest news and breakthroughs in the rapidly evolving world

of artificial intelligence.

If you've been following the podcast for a while, you'll know that over the last six

months I've been working on a stealth AI startup.

Of the hundreds of projects I've covered, this is the one that I believe has the greatest

potential.

So today I'm excited to announce AIBOX.

AIBOX is a no-code AI app building platform paired with the App Store for AI that lets

you monetize your AI tools.

The platform lets you build apps by linking together AI models like ChatGPT, Midjourney and Eleven Labs, and eventually it will integrate with software like Gmail, Trello and Salesforce so you can use AI to automate every function in your organization.

To get notified when we launch and be one of the first to build on the platform, you can join the wait list at AIBOX.AI. The link is in the show notes.

We are currently raising a seed round of funding.

If you're an investor that is focused on disruptive tech, I'd love to tell you more

about the platform.

You can reach out to me at jaden at AIBOX.AI. I'll leave that email in the show notes.

In a leap towards revolutionizing the field of long form audio creation, today marks the launch of Projects from Eleven Labs.

So Eleven Labs is the audio generating AI platform.

And this new feature they have, called Projects, is a one-stop workflow solution for generating and editing extensive audio content.

So this came from a ton of exhaustive research they did into long form speech synthesis, audio conditioning and parallelized audio generation, and Projects aims to alleviate the multifaceted challenges faced by creators, publishers and independent authors in audio production.

So really what they're saying here is this is now a platform where you can create audio

books.

And I think this is a really timely piece of news because of Project Gutenberg, which is essentially a platform that has a ton of different free, open books.

Project Gutenberg has just used AI to voice 5,000 books, and they're open and available for everyone to listen to for free, right?

So I don't know how big companies like Audible are going to respond, how they're going to

be impacted, right?

Because eventually, inevitably, if the rights to the book are not owned by someone, then

all books are just going to be available for everyone to listen to for free.

Project Gutenberg, I think, put it on Spotify and YouTube and a bunch of other places.

So it'll be interesting.

Of course, there's still going to be books where it's a brand new book that was just written, and its author reads it and they're going to put it exclusively on Audible.

So there's going to be a space for Audible for sure.

In any case, I think this whole audio format area is getting some massive shakeups and

this new feature released by Eleven Labs is definitely one of those big shakeups.

So joining an already very robust suite of tools, including Speech Synthesis, Voice Lab and Voice Library, Projects from Eleven Labs, I think, really stands out as a specialized tool for long form audio creation.

So this could be videos, this could be books, this could be movie scripts, all sorts of

things.

But I think it arrives on the heels of a really hot demand right now for long form audio

content and it integrates effortlessly with professional voice cloning, meaning you can

clone your voice and have it read a book, read your book.

It's really, really a useful tool, right?

Authors are going to love this.

They don't have to sit there and manually read an entire book. For example, Voice Library and a multilingual model are essentially built into this new thing.

And this is making it a very comprehensive solution for a bunch of different audio needs.

Something that I think is really cool with some of the multilingual AI tech I've seen

so far is the fact that you can record your own voice.

And besides just making like a clone of your voice to talk in like English, for example,

you can also clone your voice and get it to speak, you know, Mandarin or French.

It sounds like your voice, but now you're speaking another language.

That to me is like so cool that I'm seeing it integrated into these audio platforms.

So prior to the advent of Projects, users of Eleven Labs, myself included (I'm a user paying for it monthly), often found ourselves tangled in an array of different challenges, from stability issues to inconvenient file format limitations.

One particularly annoying thing was the disconnection when piecing together text fragments from different speakers, which resulted in really jarring transitions and the whole thing not sounding cohesive.

So you know, really, you had to essentially regenerate entire audio fragments just to

fix a few minor flaws.

And it was a really inefficient and frustrating process.

So I've watched their demo video of how this works, and I think it's actually very smart.

And I really like this, and I'm seeing it in other platforms. For example, big shout out to Opus Pro.

It's a platform where you can upload an entire podcast, for example. I've done it: if you follow me on Instagram and you've seen the reels that I post, those come straight from Opus. You can upload your entire podcast script or your entire podcast video, and it uses AI to decide which pieces are interesting and clips them out. Then it throws a bunch of editing on top, cool text, transitions and stuff, and it essentially creates an entire reel for you.

Now, the reason I bring it up is that they're using the same editing technique that I'm seeing Eleven Labs use. I don't know if you've done much video editing or audio editing, but I'm so used to most platforms where you see the audio file in front of you, a bunch of spikes up and down, right?

That's how audio files look. You listen to a point, you clip different audio files, you delete things that you don't like, whatever. That's how you edit stuff.

The way that these new AI, kind of more modern editors work, whether that's video or audio, is that they essentially give you a transcript.

So if you upload an audio file or you get something recorded, or in Opus's case, the video clip, it shows you just the transcript of what's being said, and if you don't want a word in your video, you literally just backspace it off the transcript.

So it's just like editing a text document, except it's tied to the timestamps on the actual video or audio file, and it edits it there live. I absolutely love that form of editing.

I don't know why, but it's so annoying for me, using video editors like Premiere, for example, trying to trim the exact place where I start or stop saying a word. It's so much better to just see the transcript and backspace the word off the transcript, or black it out on Opus, and then all of a sudden it's just removed. Super, super cool.
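As a rough illustration of the transcript-based editing idea described here, the sketch below shows how deleting words from a timestamped transcript could translate into the time ranges of audio to keep. This is not how Opus or Eleven Labs actually implement it; the `Word` class, the `keep_ranges` function and the sample timings are all hypothetical, made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the audio (hypothetical structure)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def keep_ranges(words, deleted):
    """Return (start, end) time ranges to keep after deleting the words at the given indices.

    Runs of consecutive kept words are merged into one range, so the natural
    pauses between them are preserved.
    """
    ranges = []
    prev_kept = None
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if prev_kept is not None and prev_kept == i - 1:
            # Extend the current range to cover this word too.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # A deletion (or the start of the file) came before this word.
            ranges.append((word.start, word.end))
        prev_kept = i
    return ranges


# Made-up transcript: "backspace" the filler word "um" (index 1).
words = [
    Word("so", 0.0, 0.2),
    Word("um", 0.3, 0.5),
    Word("hello", 0.6, 1.0),
    Word("world", 1.1, 1.5),
]
deleted = {1}
ranges = keep_ranges(words, deleted)
edited_text = " ".join(w.text for i, w in enumerate(words) if i not in deleted)
print(edited_text, ranges)
```

An editor built this way would then cut the audio or video to just those ranges, which is why removing a word from the text removes it from the recording.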

So Eleven Labs is doing the same thing, and you're also able to do multiple voices.

So for example, if you have an audio book and there's three different characters in there, you can select one of the characters' quotes and use a different voice to narrate it, so it's actually like a conversation between all these different people.

And you know, you've heard this before in audio books where it's like one voice actor

and they like put on a different accent when they're talking like a different person, and

it's kind of like, you know, funny, and then it's like the grandpa talks like this and

like the kid talks like that.

It's like funny and it's like whatever, and there's just skill and whatever involved.

This is kind of cool though, because it's like you literally get different voice actors

for all the different parts.

It's really easy to edit.

You just highlight the text you want, select the voice, it's going to do the voice for

that one.

You highlight the next one, change the voice if you want, or continue with the same voice.

And this works with their library. For those that don't know, Eleven Labs has a library of their own voices.

They also have a community tab where people can, you know, essentially allow their voices

to be used, and then they also have the whole voice cloning thing.

So you can clone your own voice or someone in your studio's voice, upload it, and then

use that to narrate different parts.

So very, very cool.
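To make the highlight-and-pick-a-voice workflow concrete, here is a minimal sketch of how highlighted text spans could be turned into an ordered list of (text, voice) segments. It does not use or represent the Eleven Labs API; `split_by_voice`, the voice names and the sample text are hypothetical, chosen only for illustration.

```python
def split_by_voice(text, highlights, default_voice="narrator"):
    """Split text into ordered (fragment, voice) segments.

    highlights is a list of (start, end, voice) character spans. Anything not
    highlighted is read by default_voice.
    """
    segments = []
    cursor = 0
    for start, end, voice in sorted(highlights):
        if start < cursor or end > len(text) or start >= end:
            raise ValueError("highlights must be non-empty, in range and non-overlapping")
        if start > cursor:
            # Unhighlighted text before this span goes to the default voice.
            segments.append((text[cursor:start], default_voice))
        segments.append((text[start:end], voice))
        cursor = end
    if cursor < len(text):
        segments.append((text[cursor:], default_voice))
    return segments


# Made-up passage with two character quotes.
text = 'He said, "Hi." She said, "Bye."'
a = text.index('"Hi."')
b = text.index('"Bye."')
segments = split_by_voice(text, [(b, b + 6, "kid"), (a, a + 5, "grandpa")])
for fragment, voice in segments:
    print(f"{voice}: {fragment!r}")
```

Each segment could then be sent to a speech generator with its own voice, and only the segment that changed would need regenerating after an edit.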

So essentially what they're doing here is that they are promising an entire audio book

at the click of a button.

And so this whole new projects feature brings in an array of really interesting features.

Users can now designate specific text fragments to particular speakers like I mentioned.

And the thing that's interesting here is that you can also do multiple languages.

So you can essentially highlight and change languages.

They have different voices for different languages and different things.

So all of this is very, very interesting.

In their blog post, they recently said, quote, "With Projects, our goal was to design a tool that makes long form audio generation as simple as possible. Drawing from fresh research and your feedback, we've developed a comprehensive solution which also seamlessly integrates with our existing ecosystem of tools. We can't wait to hear you bring your stories to life," end quote.

So in any case, I think this is a really cool tool.

I think this is going to be a game changer in a lot of different areas.

We're going to see a ton more. This really unlocks audio books now, and at an affordable rate, I think.

And so I think it's going to be interesting to see the different audio creations that come out of this new innovation, this new platform. Eleven Labs really is on top of their game and is definitely a company that I will continue to follow in the future.

If you are looking for an innovative and creative community of people using ChatGPT, you need

to join our ChatGPT creators community.

I'll drop a link in the description to this podcast.

We'd love to see you there where we share tips and tricks of what is working in ChatGPT.

It's a lot easier than a podcast as you can see screenshots, you can share and comment

on things that are currently working.

So if this sounds interesting to you, check out the link in the comment.

We'd love to have you in the community.

Thanks for joining me on the OpenAI podcast.

It would mean the world to me if you would rate this podcast wherever you listen to your

podcasts and I'll see you tomorrow.

Machine-generated transcript that may contain inaccuracies.

Join us in this episode as we explore the groundbreaking innovation from Eleven Labs, which promises to revolutionize the world of audiobooks. Discover how their cutting-edge projects enable the creation of AI-driven audiobooks in a matter of minutes. Tune in to learn more about the future of storytelling and the transformative impact of AI on the audiobook industry.

Get on the AI Box Waitlist: https://AIBox.ai/
Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/
Follow me on Twitter: https://twitter.com/jaeden_ai