AI Hustle: News on Open AI, ChatGPT, Midjourney, NVIDIA, Anthropic, Open Source LLMs: OpenAI's ChatGPT: Now Seeing, Hearing, and Speaking!

Jaeden Schafer & Jamie McCauley - 10/10/23 - 11m

Welcome to the OpenAI podcast, the podcast that opens up the world of AI in a quick and concise manner. Tune in daily to hear the latest news and breakthroughs in the rapidly evolving world of artificial intelligence.

If you've been following the podcast for a while, you'll know that over the last six months I've been working on a stealth AI startup. Of the hundreds of projects I've covered, this is the one that I believe has the greatest potential. So today I'm excited to announce AIBOX.

AIBOX is a no-code AI app building platform paired with the App Store for AI that lets you monetize your AI tools. The platform lets you build apps by linking together AI models like ChatGPT, Midjourney, and ElevenLabs, and it will eventually integrate with software like Gmail, Trello, and Salesforce so you can use AI to automate every function in your organization.

To get notified when we launch and be one of the first to build on the platform, you can join the waitlist at AIBOX.AI; the link is in the show notes.

We are currently raising a seed round of funding. If you're an investor focused on disruptive tech, I'd love to tell you more about the platform. You can reach out to me at jaden at AIBOX.AI; I'll leave that email in the show notes.

So the big news here: a demo was recently released that a lot of people are talking about. In the demo, someone takes a picture of their bicycle and uploads it to ChatGPT in the mobile app on their phone. I believe this is iOS; I tried to download this on Android and I'm not seeing it. So anyway, they took a picture of their bicycle, uploaded it to ChatGPT, and as soon as it was uploaded, they added the message, you know, "Help me lower my bike seat," right?

Immediately after they send this, ChatGPT comes up with a response that says, you know, to lower your bike seat, locate the release lever; if it's a quick-release lever, do this; then tighten the quick release. Anyway, it walks through the steps and then says, if you have further tools, show me and I'll guide you through.

So then the person takes a picture of where the release mechanism on the seat is and circles it on their phone. They're actually drawing attention to the specific part they're talking about, and they ask, "Is this the lever?" ChatGPT says no, that's not the lever, it's a bolt, and you'll need an Allen wrench to loosen it.

Then it asks what tools they have, so they take a picture of their tool set. They also take a picture of, I believe, the owner's manual for the bicycle, and they say, you know, "Here's the manual and here's my toolbox. Do I have the right tools?"

And this is so fascinating, because it's using computer vision and language at the same time. It says yes, you have the right tools: in the left section of your toolbox there's a set labeled DEWALT, and within the set, find the nine millimeter Allen head. So it's literally looking at the picture and talking you through it, like on the left side of your toolbox, in that little set with this label on it. It's telling you specifically what to go and find and what to do. Really, really cool feature.

So of course, this is ChatGPT's new ability to see. This was rumored and talked about back when GPT-4 launched at the beginning of the year, but it never actually shipped; now it looks like they're finally rolling it out.

It's kind of interesting: I recently saw a tweet where someone said, you know, OpenAI has a big conference planned for November, but what the heck is left for them to announce, AGI? Because honestly, after this announcement, these are some massive capabilities, right? Now you can take pictures, upload them to ChatGPT, and ask it about them, so it's literally able to see.

In addition, like I mentioned in the intro, there's the huge news of Spotify partnering with OpenAI on voices. They're going to translate podcasts into different languages, essentially cloning a voice and translating it.

I've also seen demos where you're essentially able to talk to ChatGPT. You send it audio of yourself saying, "Hey ChatGPT, tell my kids a bedtime story about a hedgehog that lives on top of a mountain," and ChatGPT says, "Sure, here's a bedtime story," and talks back to you. So ChatGPT can talk back to you, and you can talk to ChatGPT. You can take pictures, send them to ChatGPT, and ask it for context or information about what you're seeing. It can see, it can hear, and most recently, of course, there's the huge announcement that they're going to start integrating the brand-new DALL-E 3, so it can also draw really interesting pictures.

I saw some interesting demos where people were talking to ChatGPT and saying, "Hey, tell my kids a bedtime story about a hedgehog." It tells them the story about the hedgehog using the voice feature, and then they say, "Okay, cool, my kids really want to know what the hedgehog looks like. Can you show us a picture of this hedgehog on top of a mountain in a forest?" So it generates the picture. Then they say, "Okay, cool, my kids are about to go to bed. Can you generate a picture of this hedgehog as it's going to bed?" It does that, with the same hedgehog.

This is another thing I think is really interesting. With something like Midjourney, you can try to use the same prompt to get a very similar type of character. For example, I've spoken with a number of people who have made children's illustrated books, and they have keywords they put in to try to get the same characters throughout the book, because making it really cohesive is one of the hard parts. They say things like "illustrated in X, Y, and Z style," and they know that if they say "girl with black hair in this style," it tends to come up with a very similar-looking girl. They were able to make a bunch of kids' books doing this, but the results aren't perfect: it's not exactly the same character every time, just close enough.

But what I think is really incredible is what ChatGPT is going to be able to do here. Because you can upload pictures and ask it questions about them, you can say, you know, generate me a picture of this hedgehog, or this character in a book, or this person, or this thing. It generates it, and then you say, okay, now generate that same thing, but in this scene, in this place, doing this thing. Because ChatGPT combines natural language processing with a computer vision element where you can upload images and talk to it, it's able to do some really powerful things, essentially manipulating the same characters and the same images to give you different variations, which is going to be really powerful.

So many incredible use cases are coming out of this. It's all over Twitter, and people are talking about some really impressive ones. Recently we even had Lex Fridman, who is one of the first people taking part in this Spotify program, post a clip of himself speaking English on his podcast, translated into Spanish. He said, "This is me speaking Spanish thanks to amazing work by Spotify AI engineers."

He continued, "The translation and voice cloning are done by AI. Languages can create barriers of understanding, and thus fuel division. I can't wait for AI to break down this barrier and reveal our common humanity. Great job, Spotify, I'm excited for what the future holds. Check out the full episode of me speaking Spanish."

So, you know, he's trying to promote his podcast, which by all means is awesome, but that post alone on LinkedIn has had almost 10,000 likes in the last 21 hours. It's really, really powerful technology, and I think this goes to show it's something people definitely want. Even for myself, I've been invited on a number of podcasts where they say, "Hey, I'd love to have you on this podcast, we talk about AI, but we're based in Sweden and that's pretty much our audience." For me, that's just not super viable. Obviously, I don't speak anything other than English and French, so I'm kind of limited to those two lanes. But I think being able to translate into all these different languages and clone your own voice is absolutely incredible.

If this is really what OpenAI is building in, you can see all the incredible use cases that are going to happen. You're essentially going to be able to travel to a new place, talk into your phone, and have it translate for people in real time. Or maybe you wear a pair of AirPods with your phone listening: when people talk to you in another language, it immediately translates into English, and when you speak into your phone, it speaks to them in your own voice in French or Spanish or German or whatever the language is wherever you're visiting. You're going to be able to communicate so much more easily, and there are going to be some incredible applications built on top of this technology.

And of course, on top of that, there's the technology that lets it see things all around us. Imagine you're traveling in another country and you don't know what any of the shops are. You take a picture of a street and ask, based on all the street signs, which one of these is a shoe store? It sounds funny, but there are going to be so many incredible use cases. I think this is going to further integrate AI into everything. And honestly, I think this is OpenAI's secret to getting ahead right now, because they're building APIs for all of this stuff.

If you think about it, right now we have Anthropic, which just raised $4 billion, or rather has the ability to raise up to $4 billion, from Amazon and AWS, and they're still just trying to compete with ChatGPT on text chat alone, with just that one tool. ChatGPT, meanwhile, is starting to integrate image generation, speech, audio, and all sorts of other things.

And of course, I find it hard to believe that video is very far away. I think a lot of their competitors are going to have to fight really hard to stay relevant, because once people get all of these new features from ChatGPT, it's going to be really hard to convince yourself to go and pay for something like Anthropic if it doesn't have hearing, seeing, speaking, and everything else integrated. ChatGPT is going to become incredibly convenient. I think this may push it into the mainstream, beyond just people using ChatGPT to generate articles, responses, and text. All of a sudden it may be an app everyone has to have: everyone has premium and uses it all the time. It's like your travel companion; you can't live without it. I see it going more in that direction with some of these new features. So, very exciting, definitely something we'll continue to follow, with a lot of amazing advancements to come.

If you're looking for an innovative and creative community of people using ChatGPT, you need to join our ChatGPT Creators community. I'll drop a link in the description of this podcast. We'd love to see you there, where we share tips and tricks for what's working in ChatGPT. It's a lot easier than a podcast because you can see screenshots and share and comment on things that are currently working. So if this sounds interesting to you, check out the link in the description; we'd love to have you in the community. Thanks for joining me on the OpenAI podcast. It would mean the world to me if you would rate this podcast wherever you listen to your podcasts, and I'll see you tomorrow.

Machine-generated transcript that may contain inaccuracies.

Discover the groundbreaking enhancements as we delve into the latest ChatGPT update from OpenAI, where the AI can now see, hear, and speak. Join us for an in-depth conversation about the implications of this technological leap, from improved user interactions to potential applications across industries. Tune in to explore the exciting new capabilities of ChatGPT that are shaping the future of AI-powered communication.


Get on the AI Box Waitlist: https://AIBox.ai/
Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/
Follow me on Twitter: https://twitter.com/jaeden_ai