AI Hustle: News on Open AI, ChatGPT, Midjourney, NVIDIA, Anthropic, Open Source LLMs: Microsoft Open Sources Game-Changing Protein AI, EvoDiff

Jaeden Schafer & Jamie McCauley - 10/9/23 - 9m

Welcome to the OpenAI podcast, the podcast that opens up the world of AI in a quick and

concise manner.

Tune in daily to hear the latest news and breakthroughs in the rapidly evolving world

of artificial intelligence.

If you've been following the podcast for a while, you'll know that over the last six

months I've been working on a stealth AI startup.

Of the hundreds of projects I've covered, this is the one that I believe has the greatest

potential.

So today I'm excited to announce AIBOX.

AIBOX is a no-code AI app building platform paired with the App Store for AI that lets

you monetize your AI tools.

The platform lets you build apps by linking together AI models like ChatGPT, Midjourney,

and ElevenLabs. Eventually it will integrate with software like Gmail, Trello, and Salesforce

so you can use AI to automate every function in your organization.

To get notified when we launch and be one of the first to build on the platform, you

can join the waitlist at AIBOX.AI. The link is in the show notes.

We are currently raising a seed round of funding.

If you're an investor that is focused on disruptive tech, I'd love to tell you more

about the platform.

You can reach out to me at jaden at AIBOX.AI. I'll leave that email in the show notes.

The landscape of protein design, which is a cornerstone in understanding and treating

diseases, I think is on the brink of a transformation.

So proteins, for those that don't know, essentially serve as the natural molecules executing critical

cellular functions.

But creating them in a lab has often been really costly and

very complex.

However, this is all potentially about to change.

Microsoft claims to have simplified this really intricate process with its newly introduced

framework, which is called EvoDiff.

So traditionally, protein design has needed extensive computational and

human resources.

And scientists have to conceptualize a protein structure capable of performing specific bodily

tasks.

And then once they do that, they have to determine the sequence of amino acids that would likely

fold into that structure.

Okay, so kind of complex scientific stuff.

But protein folding is essentially this: proteins have to adopt

these precise three-dimensional shapes to function as intended.

So that's what protein folding is.

So EvoDiff offers a radical kind of departure from what has been the norm for so long.

So according to Microsoft, the general-purpose framework can churn out, you know, high-fidelity,

diverse proteins based solely on a protein sequence.

So this completely sidesteps the often very cumbersome task of requiring structural information

about the target protein.

And this open source framework holds promise in applications ranging from generating

enzymes for novel therapeutics to facilitating new industrial chemical

reactions.

So Kevin Yang, who is a senior researcher at Microsoft, he envisions EvoDiff as a groundbreaking

tool in protein engineering.

So he said, quote, "We envision that EvoDiff will expand capabilities in protein

engineering beyond the structure-function paradigm toward programmable, sequence-first

design."

So he also said, "With EvoDiff, we're demonstrating that we may not actually need structure, but

rather that protein sequence is all you need to controllably design new proteins."

Really, really interesting stuff.

So at the core of EvoDiff is a roughly 640-million-parameter model trained on a

comprehensive dataset spanning various species and functional classes of proteins.

So for the, you know, uninitiated, parameters in an AI model are the values learned from

training data, and they essentially dictate the model's competence.
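
To put "parameters" and the 640 million figure in perspective, here is a rough sketch. The layer sizes are made up for illustration, and the 4-bytes-per-parameter figure assumes 32-bit floats; neither describes EvoDiff's actual architecture or storage format.

```python
def dense_layer_params(n_in, n_out):
    """Learned values in one fully connected layer: weights plus one bias per output."""
    return n_in * n_out + n_out


def weights_gigabytes(num_params, bytes_per_param=4):
    """Approximate storage for the weights alone (assumes 4 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9


# A hypothetical two-layer toy network: 20 inputs -> 64 hidden units -> 20 outputs.
tiny_model = dense_layer_params(20, 64) + dense_layer_params(64, 20)

print(tiny_model)                      # 2644
print(weights_gigabytes(640_000_000))  # 2.56
```

So a model of this size needs on the order of a few gigabytes just to hold its weights under that assumption, which is small next to models with billions of parameters.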

So data sources for the model include the OpenFold dataset for sequence alignments

and UniRef50, which is essentially a subset of data from the renowned UniProt consortium

database.

So drawing parallels to cutting-edge image-generating models like Stable Diffusion and

DALL-E 2, EvoDiff operates as a diffusion model.

This is really interesting because these diffusion models, for those that don't

know, are mostly associated with images, and we'll get to how that works,

but it's really interesting that

they've moved this over to proteins and science and other areas.

So in my opinion, this is another testament to how cool

and impactful a lot of these advances in AI are, because they're not just

for, you know, generating images on Midjourney or generating text on ChatGPT.

The way that they're built and the architecture of these tools is now being used in so many

other things.

And actually, I'll also just as a side note say, I think it is really incredible.

People just hear the word AI and in their brain, they're like, yeah, AI is just

like a computer doing stuff.

And I think they don't really realize how incredible it is that

we have image AI and also text-generating AI coming up at the same time,

because these are actually fundamentally different in how these things function.

Like, of course, you need data in for both of them.

But for diffusion models, which is what this new EvoDiff is using, here's how

it works for images. Essentially, a diffusion model means that when an image is

rendering, it's making predictions a bit like ChatGPT does, but instead

of predicting the next token, it's predicting how to clean up the noise across the whole image, a little at a time.

And so you can imagine it as a square, and you've probably seen this

if you've used something like Midjourney, where you look at the image and it's

blurry and it slowly kind of comes into focus, right?

So that's what a diffusion model is doing.

It essentially starts by rendering a really fuzzy bunch of pixels

in a square.

And then it's like, if we were trying to do X, Y and Z, how would the pixels

change?

What's the prediction?

And it slowly, almost, comes into focus.

It's diffusion, and it's coming into focus on what it's actually supposed to be,

predicting all the pixels and the placements. Super, super interesting stuff.
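
The "fuzzy square that comes into focus" picture can be made a bit more concrete with a small sketch. It assumes the standard Gaussian noising recipe commonly used in image diffusion models, with a linear noise schedule; the function names, schedule values, and the four-pixel "image" are illustrative choices, not details from the episode. A real model has to estimate the noise with a trained network and removes it over many small steps. Here the true noise is simply handed back in, so the recovery is exact.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal's variance left after t noising steps
    (cumulative product of 1 - beta under a linear beta schedule)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(x0, t, num_steps, rng):
    """Forward process: blend the clean pixels with Gaussian noise at step t."""
    ab = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps


def remove_noise(xt, eps, t, num_steps):
    """Invert the blend given the noise. A trained network would only estimate eps."""
    ab = alpha_bar(t, num_steps)
    return [(x - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for x, e in zip(xt, eps)]


rng = random.Random(0)
pixels = [0.1, 0.5, 0.9, 0.3]  # a made-up four-pixel "image"
noisy, eps = add_noise(pixels, t=500, num_steps=1000, rng=rng)
recovered = remove_noise(noisy, eps, t=500, num_steps=1000)

print([round(v, 3) for v in noisy])      # mostly noise at this step
print([round(v, 3) for v in recovered])  # matches the original pixels
```

The point of the sketch is the direction of travel: training teaches a network to guess the noise that was added, and generation runs that guess in reverse, starting from pure noise.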

So it's really cool because that same technology is now getting moved into other

areas like EvoDiff.

So essentially how EvoDiff works is that it starts with a protein made mostly

of, you know, quote unquote noise, and then gradually filters out the noise

to arrive at an accurate protein sequence. It's the same idea as the image models.

And diffusion models aren't confined to

proteins or images; their applications really stretch across a bunch of different domains,

including music and speech synthesis.

So really, the diffusion model, I think, is very, very influential in the

sense that, of course, it took off for creating images, but now we're

using it for much more than image generation.

We're using it for protein generation.

We're using it for music creation, speech creation, all sorts of really

interesting things.

And it's this diffusion approach that's doing all that. Really,

really cool stuff.

In any case, EvoDiff can not only create new proteins, but it can also fill in the

gaps in an existing protein design.

So that's according to Ava Amini, who is another senior researcher

at Microsoft.

So the framework's versatility allows it to essentially generate

amino acid sequences meeting specific functional criteria, and even to

synthesize, quote unquote, disordered proteins.

So those don't typically fold into a final structure, but they still play vital

roles in biology and disease.

So very, very interesting.
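
Here is a toy sketch of what "create new proteins" versus "fill in the gaps" could look like for a sequence instead of an image. It assumes a masking-style setup, where undecided positions are placeholders that get filled one at a time in random order; the episode does not say how EvoDiff corrupts or restores sequences, so treat that as an assumption. `toy_predict` is a hypothetical stand-in that picks residues at random, not EvoDiff's model or API, and the starting fragment "MKT...LLV" is made up.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
MASK = "#"  # placeholder for a position that has not been decided yet


def toy_predict(sequence, position, rng):
    """Hypothetical stand-in for a trained model: picks a residue uniformly at random.

    A real model would condition on the residues already present in `sequence`.
    """
    return rng.choice(AMINO_ACIDS)


def fill_masked(sequence, rng):
    """Fill every masked position, one at a time, in a random order.

    Positions that already hold a residue are left untouched.
    """
    seq = list(sequence)
    masked = [i for i, ch in enumerate(seq) if ch == MASK]
    rng.shuffle(masked)
    for i in masked:
        seq[i] = toy_predict(seq, i, rng)
    return "".join(seq)


rng = random.Random(0)
new_protein = fill_masked(MASK * 12, rng)             # generate from scratch
patched = fill_masked("MKT" + MASK * 4 + "LLV", rng)  # fill a gap, keep the rest

print(new_protein)
print(patched)
```

The same loop covers both cases: starting from an all-placeholder sequence gives a brand-new one, while starting from a partly known sequence only touches the gap.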

And while EvoDiff appears promising, I think it's, you know, fairly essential to

note that the research has yet to undergo peer review.

So this is coming out of Microsoft.

And then Sarah Alamdari, who is a data scientist at Microsoft, cautioned that

there's still, quote, "a lot more scaling work" needed before commercial

application, and also said, quote, "This is just a 640-million-parameter model,

and we may see improved generation quality if we scale up to billions of parameters."

So looking ahead, the EvoDiff team plans to validate the

generated proteins in the lab.

If successful, this is going to pave the way for the framework's next iteration,

which could open new vistas in protein engineering and healthcare innovation.

So it's a really exciting thing.

So definitely a story we're going to continue following.

And we're really excited to see how this continues to advance and play out.

If you are looking for an innovative and creative community of people using

ChatGPT, you need to join our ChatGPT creators community.

I'll drop a link in the description to this podcast.

We'd love to see you there, where we share tips and tricks on what is working in ChatGPT.

It's a lot easier than a podcast as you can see screenshots.

You can share and comment on things that are currently working.

So if this sounds interesting to you, check out the link in the comment.

We'd love to have you in the community.

Thanks for joining me on the OpenAI podcast.

It would mean the world to me if you would rate this podcast

wherever you listen to your podcasts, and I'll see you tomorrow.

Machine-generated transcript that may contain inaccuracies.

Join us in this episode as we unveil Microsoft's groundbreaking move in open sourcing EvoDiff, a cutting-edge AI for protein generation. Explore the potential of this revolutionary technology in the fields of biotechnology and drug discovery. Learn how EvoDiff is set to transform the way we approach protein research and development, thanks to Microsoft's forward-thinking commitment to open source innovation.


Get on the AI Box Waitlist: https://AIBox.ai/
Join our ChatGPT Community: https://www.facebook.com/groups/739308654562189/
Follow me on Twitter: https://twitter.com/jaeden_ai