This Week in Startups: Reverse-engineering autonomy in humanoid robots with Sanctuary AI CEO Geordie Rose | E1832
Jason Calacanis 10/21/23 - Episode Page - 1h 3m - PDF Transcript
I want to just make this very clear that my perspective on AI and automation is that there's an upward spiral when you have more energy, you have more intelligence, you have more capability.
These drive all the metrics of human flourishing up. They don't take them down. So when we think about the question of when we'll get lights-out manufacturing, I think the answer is never, because people will always find new things to do with the tools that we've built.
Even very powerful tools that can think and maybe are even self-aware. These will only increase the number of jobs and increase wages, but there'll be different kinds of jobs.
There'll be the sorts of things that maybe we can't even imagine now.
This week in startups is brought to you by InTouch CX. Looking for ways to make your startup more efficient?
InTouch CX has a groundbreaking suite of AI powered tools for end to end optimization to give your business the edge it needs to thrive.
Get started with your free consultation at intouchcx.com slash twist.
Fount, do you want access to the performance protocols that pro athletes and special ops use?
With Fount, an elite military operator supercharges your focus, sleep, recovery and longevity, all powered by your unique data.
Want a true edge in work and life? Go to fount.bio slash twist for $500 off.
.techdomains has a new program called startups.tech where you can get your company featured on this week in startups.
Go to startups.tech slash jason to find out how.
Hey everybody, welcome to this week in startups. We've been focused a ton on AI this past year.
Of course, we talked about it over the last decade on the show, but things have heated up with language models and
you know, the often forgotten category of startups is of course robotics.
We see once in a while on the internet a trending video and the trending video tends to be when a Boston
Dynamics robot's doing a backflip or we see maybe some surgery being done on a grape.
You've seen all these viral videos, but the idea of humans leaving the factory floor
and going and doing things in the real world, well, we don't see many startups doing that.
We have one in our portfolio, Cafe X, making
coffees at SFO airport right now.
And of course, our friends over at Tesla are making Optimus and there's another startup called Figure.
They're working on a humanoid robot. Sanctuary AI is today's guest.
They're another startup working on this problem and they're specifically focused on building robots with general intelligence.
What does this mean?
Well, it's not verticalized and they're not just making a cup of coffee.
But what if these robots could solve problems in the same way we do as biological creatures, as human beings?
And if that works, well, that's going to have an enormous economic impact on humanity.
And it's going to go well beyond just the steam engine.
And we have the founder or I should say the co-founder and CEO of Sanctuary AI on the program.
His name is Geordie Rose. Welcome to the program, Geordie.
Thanks for having me.
Great name. I am reminded of the Mark Knopfler lyric from the amazing song,
Sailing to Philadelphia, where he says, I am Jeremiah Dixon.
I am a Geordie boy. Do you understand the reference Geordie boy?
I do, yeah.
So let's talk a little bit about the company.
And I know you were founded in 2018.
So you've been working on this for a while.
You've raised close to 100 million bucks.
Where are you at with building this humanoid robot?
And I would love to see the latest.
I should start by saying that our approach to the problem and the reasons for us working on it
are slightly different than most people who work in robotics.
For us, the motivation for doing it was a belief that human-like intelligence and more
generally the intelligence of animals, which is kind of our model for what intelligence means,
is very intimately tied to our presence in the world.
We have a body. We are a thing. We experience the world through our senses.
We develop understanding of it through interacting with it.
And then we act on it to achieve our goals.
All of those things are very difficult to do if you're not actually physically present in the
world. So the starting point of this, which actually goes back more than a decade now through
two different companies, was to explore this idea that intelligence, by which I mean general
intelligence, emerges as a consequence of having to deal with the real world.
The real world, you never see the same thing twice.
You have to be able to generalize from your previous experiences to new experiences.
You have to be able to understand the common sense ways that the world is.
So we've been building software, which you could call general intelligence or AI,
but it's also control systems for robots.
And we've always viewed the problem of artificial general intelligence through that lens is that,
for us, a true general intelligence can be thought of as a control system for a robot that
converts what it sees, hears, touches, feels about the world into actions that are intended to reach
goals. So for us, the robot is somewhat a means to an end. And because of that thesis, we focused
almost exclusively on a very hard but very fundamental problem, which is the building
and use of hands. So much of the humanoid robotics videos are performative.
They show robots doing things, but they're not valuable things. And for us, I think that the key
value of doing this is to understand how an entity, a robot or a person,
understands the world well enough to be able to manipulate it with its five-fingered opposable
thumb hands. I believe that the hand played a big part in our technological evolution
and also in the development of language, which are related things.
So that's what the thesis... How did that happen? How did the hand
play a role in language? I'm curious. Was it writing or the ability to hold pen?
Speaking. So how does a hand help you speak?
Yeah. So although this is speculative, there is a lot of evidence that the earliest spoken language
was very strongly connected to the things that our hands do, like point, touch, grasp and so on.
And some of the evidence for that is in neuroscience, where the part of your brain that
controls the grasping and the use of the hands overlaps with your language center.
These things are not disconnected. And when you actually try to build a system that
touches, feels the world and can interact with it in the way we do, you see this explicitly,
is that the cognition, what you think of as the domains of intelligence,
many of them and maybe all of them, are required in order to do something with your hands.
It's a remarkable thing that planning, reasoning, logic, all of these things are
connected to the way that we interact with the world through our hands.
That's fascinating. Like when you were saying that, I was thinking, so I put my hand on my chin.
And then if I were to, if you and I were navigating the world, we were, you know,
early settlers or somewhere, we might point towards the direction we want to go or
I might put my hand on my chest to refer to myself or I might put my hand out
and my palm up to refer to you in some sort of gracious way.
Is that what we're referring to, this sort of instinctual thing that
happens with our hands as we're talking? Yes, our view and the position that we've taken is that
the hands and their use and the mind are interwoven in an inseparable way in people.
So if you want to understand human-like intelligence, the kind of intelligence that we have,
the hands are the appropriate starting point and that's why we focus so much on them.
And I asked to see some things. I can actually show you one of the hands.
Yeah, all right. So I see on the screen here, you've got, yeah, a very interesting looking hand,
five digits, yeah, four fingers and a thumb and a palm. And it looks like something out of
Terminator, but a little more elegant, in fact.
Well, I think that I would not characterize it that way. I think that the way that we
imagine this hand is that it's the best that the global community knows
how to build. It's the closest we can get to human hands. There's a lot of things in this hand
that aren't immediately obvious just by looking at it. And those are mostly about the sensors.
Our sense of touch is a very important thing for our intelligence and how we are in the world.
We tend to take it for granted because it's always there. And when we look at screens and things,
people are very visual and they think about the world in terms of seeing, which is fine.
But there's an interesting observation that seeing is about the future. It's about planning,
because the things that you see are away from you. So for example, if you look at a cup and you
want to pick it up, the part of your brain that plans thinks about the future,
but touch is a little different. Touch is an immediate thing that's in the now.
Touch doesn't have foresight. It's all about the present moment. And when you make contact with
the world, say you're sitting in a chair or you're picking something up or you're turning a Rubik's
Cube in your hand, the sense of now is intimately connected with the touch sense. And without it,
you can't be who you are. So this is an important thing for building robots is that touch is not
a second-class citizen if you're trying to build a system that behaves in the world like we do.
And so these hands are covered in very sophisticated touch sensors that allow them to feel the world,
something like we do. Yeah. So when we're looking at each digit, I guess we have a couple of knuckles.
And so the tip of your finger has one pad, then that middle knuckle, I guess has another pad,
and then there's a longer pad in that third spot. So if you're looking at your own hand,
you have those sort of three segments of a finger, they each have a pad on this robot.
And the thumb obviously has the same configuration. And I guess when they touch each other,
that's telling it something's hitting that pad. Is that correct?
It's more general than that. So the sensors in your hand are not just about
a yes or no question about you're touching something. They're very rich. You can feel
temperature. You can feel if something's sliding past your fingers, which is very important when
you're trying to hold something or turn it in your hand. Imagine trying to put a key in a lock
and turn it. A lot is going on in that thing in your brain. And a lot of it is driven by touch.
If you didn't have the sense of touch, it would be very difficult to insert a key into a lock
and turn it. Even something as simple as that. Because of resistance, right? You have a certain
resistance on either side and the top or the bottom of your finger where it's touching the key.
I'm imagining this as you're saying it. Yeah. The way that we do things in the world is not...
We take it for granted. There's this thing called Moravec's paradox, where the things that we take
for granted and are easy are actually some of the hardest problems that there are. And the
reason they're easy is that we've had a billion years of evolution to create a system that is
fine-tuned to be able to deal with things like picking things up and putting things in things
and things like this. But the reason why we have AI systems that can write at the level of GPT-4
or create images from scratch that are as beautiful as any human artist could draw,
the wonders of the digital age, but we don't have a robot that can do laundry,
is that doing laundry is a fundamentally much more difficult problem than any of the ones that
modern AI has managed to master. Yeah, it's fascinating when you think about how
complex this is. And that paradox you mentioned, that seems like a fascinating evolutionary moment.
These systems are so complex that they must be automated. Because to actually,
with cognition, to try to think, okay, I'm going to have to give some resistance as this key goes
into it. And I'm going to have to feel it click a couple of times. And then I'm going to have to
twist it left. But I'm going to need to put more pressure on the inside of my index finger versus
my thumb. And if I put too much pressure, I'm going to break the key off in the lock. I mean,
it's incredible when you think of all of that occurring. And it's occurring in just an automated
fashion. It's just a chunk of a task to open the door. It's not even one task either. It's
probably open the door, which is the key getting taken from your pocket, being put into the lock,
twisting, opening the door, closing it, the whole shebang. It's just abstracted into one
instruction set, huh? Well, that's an interesting phrase. And I'm glad you mentioned it because
that's the way that most serious embodied cognition efforts work is they have an idea of an
instruction set, which is very similar to the way that processors work. I worked my first half of
my career in building computer systems. And the marvels that we've built in the computing side
are fundamentally based on a very non-trivial fact that every program that you could write
boils down to the execution of roughly order 100 different tiny little programs that just
happen in different orders. So every program that you can write on a computer is basically
composed of only about 100 building blocks in modern processors. The reason you can do that is
that processors have a natural way to turn the analog nature of the world into a digital character
that allows error correction, which is the fundamental reason why you can do
all the things that we do today. In the computer, there's this thing called a transistor,
which is the basis of this digitization, going from things that are just any number at all
when you measure them, like voltages and currents, to something that's only a zero or a one. In motion,
that is, in the taking of actions in the physical world, you need to be able to find a way to do
that same thing. And so what we've done is created this type of instruction set, which is a very
small number of building blocks that you can compose in different orders to create massive
complexity of tasks and potentially all of them. So if you think of a robot moving through the
world or a person as a program, then you can imagine that any program could be written in say
maybe just 100 things in different orders and that's what we're trying to do here is to figure
out what those 100 things are and then use a technique called task planning, which is the
idea that given a goal, let's say I ask the robot in natural language to do something,
the robot can figure out how to sequence the things it knows how to do in order to achieve the
goal and thereby achieve general intelligence. Because if I can ask the robot to do anything at
all like in the human sphere and the robot can actually perform the task, then it would be
fair, I think, to think of these things as reaching the goal of having general intelligence.
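The instruction-set idea above can be sketched in a few lines of code. This is a toy illustration only: the primitive names, the hand-written plan table, and the lookup-based "planner" are all assumptions made for the example, not Sanctuary AI's system, which would learn its roughly 100 micro policies and compute the sequence with a task planner rather than read it from a table.

```python
# Toy sketch of composing a small "instruction set" of micro policies
# into a larger task. All names here are hypothetical and illustrative.

def locate(obj, trace):
    trace.append(f"located {obj}")

def grasp(obj, trace):
    trace.append(f"grasped {obj}")

def pour(src, dst, trace):
    trace.append(f"poured {src} into {dst}")

# The ~100 building blocks of a real system, stood in for by three.
PRIMITIVES = {"locate": locate, "grasp": grasp, "pour": pour}

# Stand-in "task planner": maps a natural-language goal to an ordered
# sequence of primitives. In a real system this mapping is computed by
# a planning algorithm, not hand-written.
PLANS = {
    "pour me a glass of milk": [
        ("locate", "milk carton"),
        ("grasp", "milk carton"),
        ("locate", "glass"),
        ("pour", "milk carton", "glass"),
    ],
}

def execute(goal):
    """Run the planned sequence of micro policies, returning the trace."""
    trace = []
    for name, *args in PLANS[goal]:
        PRIMITIVES[name](*args, trace)
    return trace

print(execute("pour me a glass of milk"))
# ['located milk carton', 'grasped milk carton', 'located glass',
#  'poured milk carton into glass']
```

The point of the sketch is the composition: a handful of reusable primitives, ordered differently, can yield arbitrarily many distinct tasks, which is what makes the analogy to a processor's instruction set apt.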
By the way, I should mention that there's a concept that David Chalmers is a philosopher
introduced called a philosophical zombie, where a system can have the appearance of being like us
in the sense that it can do things like we do, but it doesn't have the first person conscious
experience that we have. So there's a lot of mysteries about the relationship between being
able to actually achieve goals, do things and whether that is related or different than the
experience we have of being people, this thing that it feels like to be a thing. That's a deep,
deep mystery. All right, listen, efficiency needs to be top of mind for every founder in 2023.
Fundraising is drying up, so you need to extend your runway. And one great way to do that is
automation, but it's hard to apply automation in your day-to-day operations, isn't it? So here's
an amazing solution. Intouch CX provides easily integrated automation tools for customer support.
You're wondering how it works? Well, let me tell you. Intouch CX provides automated and live chat,
email and voice support. This eliminates unnecessary process and cost, and it will
make you faster. Intouch CX will streamline your customer support process, cut back on repetitive
and time-consuming tasks, and increase productivity by 30%. And it's going to simplify your business.
So revamp your workflows with Intouch CX. Intouch CX partners are experiencing 45%
average cost savings in customer support ops so far. Find out how Intouch CX can improve your
startup's efficiency. Get a free consultation with their automation experts and get started at
intouchcx.com/twist. That's intouchcx.com/twist. Yeah, consciousness, the big C is for philosophers
and religious people. What is consciousness? Is it just some illusion that we're having
in this brain of ours, which is a collection of a bunch of subroutines as you're sort of
alluding to here? Or is it this God molecule in our brains making us
sentient and driving us to do things? I guess this is one of the exciting things about AI,
is that we're in some way, or in your case, quite literally, trying to deconstruct and then reconstruct
what is happening in cognition. But it has to start with, hey, pour me a glass of milk.
So it has to know what milk is. Easy enough to do now with visual computing. But pour, okay,
we know what that word means. It's moving some liquid from one place to another.
And then we have to then, of course, make it do that accurately. So if we were breaking down a
task like that, and you said there's about 100 things you're teaching, what are those little
subroutines or those micro behaviors, or you used a term for it, what was the term used?
We call them micro policies. So in the world of reinforcement learning, which a lot of this
is grounded in, we came up through the reinforcement learning school of thinking about cognition.
A policy is an action that you take from a current state. So it's a prescription for how
you act, given the observation of the world that you have. So these micro policies are
a collection of very specific types of behaviors, like say, for example, turning a key in a lock
that we train individually in isolation from any other use. So the way that works is that we take
the robot and we, a person who is teleoperating the robot, which is a process of a person controlling
and being kind of immersed in the robot, receiving the senses of the robot and moving the robot
through a rig, which is another type of robot that the person is strapped to.
So the person moves the robot to accomplish the task, because the person knows what it means
to pick up a key, put it in a lock and turn it. And we collect on the order of hundreds of episodes,
which are instances of solving that problem. And then we use that to seed a thing we call
a large behavior model. A large behavior model is much like a large language model, except the
fundamental data is the data of experience. It's vision, audio, proprioception, which is
the information from the servos and the robot, where it is and so on, how fast it's moving,
and touch haptics. So if you can use the same idea where you take a bunch of data, and instead
of predicting the next word or token in a text prompt to response, you take the past, which
is the things that have happened to the robot, and you predict the future, which are the analog
to the large language model predicting the next tokens. But in an interesting twist,
the predictions of proprioception are predictions of where you will be, how you will move.
And you can then send those predictions to the actual motors, and the motors can move.
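That prediction-to-motor loop can be sketched as follows. Everything here is an illustrative assumption rather than Sanctuary AI's code: the behavior model is a trivial random-walk stub sitting where a trained multimodal predictor would be, and the fake robot exists only so the loop is runnable.

```python
# Minimal sketch of the control loop described above: a behavior model
# predicts the robot's next proprioceptive state from recent history,
# and that prediction is sent straight to the motors as the command.
# All names are illustrative assumptions, not a real robotics API.

import random

def behavior_model(history):
    """Stub for a trained large behavior model. A real model would
    condition on the whole multimodal window (vision, touch,
    proprioception); this just jitters the last joint pose."""
    last_joints = history[-1]["proprioception"]
    return [j + random.uniform(-0.01, 0.01) for j in last_joints]

def control_loop(observe, actuate, steps=5, window=8):
    """Prediction-as-action: imagine the future, then implement it."""
    history = [observe()]
    for _ in range(steps):
        predicted = behavior_model(history[-window:])
        actuate(predicted)         # the prediction *is* the motor command
        history.append(observe())  # the world responds; loop continues
    return history

# Fake two-joint robot, for demonstration: it obeys commands exactly.
def make_fake_robot():
    joints = [0.0, 0.0]
    def observe():
        return {"vision": None, "touch": None,
                "proprioception": list(joints)}
    def actuate(targets):
        joints[:] = targets
    return observe, actuate

observe, actuate = make_fake_robot()
trajectory = control_loop(observe, actuate, steps=5)
```

The structure mirrors the language-model analogy in the conversation: instead of appending a predicted token to a text and predicting again, the loop appends a predicted pose to the robot's lived history and predicts again.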
So one of the most fundamental turning points in my professional career happened about
10 or 12 years ago when I and some colleagues read Jeff Hawkins' book called On Intelligence,
which was really the first thing that I read that made sense about the potential model of
human cognition. And central to that story was the idea that our brains are predictors,
is that we imagine the future, and then we implement the imagining. So if I decide to pick
up a cup, my brain is predicting how my motor's signals will fire, and then it sends those predictions
to my muscles, and then I perform the task. So these large behavior models that we and others
are working on now are of this sort, is that they predict the future based on the statistical
properties of the data that they've looked at, and then they execute the tasks, and they work
quite well. So for a human being, we're going to perform a task, we're going to pour this glass of
milk, we will, in our minds, and we do this either consciously or maybe even subconsciously,
okay, I'm going to pick up the glass, I'm going to pour the milk, I'm not going to spill it,
it's going to pour in some kind of an arc, I'm going to watch it fill, I don't want to splash,
I'm going to stop pouring at a certain point, and you kind of visualize this movie in your head,
this potential future behavior, and so our minds are so powerful, they can actually
essentially play a scenario, almost like a screenplay, like a little vignette,
and then our muscles actually go play that routine, is that the concept here in terms of
intelligence of what happens in our brains? Yeah, so an analogy would be,
let's say you take a piano keyboard, and each of these little micro-policies that we're talking about
are one of the keys on the keyboard. Your brain, your mind, because I want to make a distinction
here is that I've come to believe very strongly recently that the mind is a creator of stories
about the future. You, the awareness that you are, your conscious presence is not your mind,
it's a different thing, you touched on it briefly, I don't have any proof of this,
but I'm much more of the mind that the, there's a big mystery there about what it is to be
the thing that is you, but it's not your mind, your mind is a machine just like your heart,
and the job of your mind is to produce stories. So in the analogy to the piano,
think of the mind as creating sheet music, and then the sheet music is automatically
put on the piano and the keys are played and you hear a melody or a song. In the analogy,
the song is the behavior, like for example, picking up a glass of milk and drinking it.
All of the behaviors that we exhibit in this model are all different songs that are generated by
pressing the keys in different orders and with different styles. So the mind is the creator
of the sheet music that it does always, this is, brains are always doing this,
and then it's played on your body, like a song. And, and our conscious perspective
is sometimes not aware that these are separate things, you know, we have difficulty introspecting
our minds and our behaviors for a variety of reasons. But I think that after working on this
problem for a long time and seeing the, the synthetic analogues of us, how it actually
works in robots, I think this is a good model is that your mind is a machine for creating sheet
music that is immediately played on the, the instrument, which is you and your awareness
is a separate thing that kind of watches this. And sometimes we get confused and we think we are
the mind, but I don't think we are, I think we're something separate.
So there's the mind and then there's the machine. And this machine is going out,
conceiving of these tasks, executing on them, playing the sheet music, running through the
script. I use this, the analogy of a film, using the analogy of piano, but the script gets played,
the sheet music gets played, it happens. But our awareness that I am a human, I am Jason,
you're Geordie, we are on a podcast, we're having a conversation, I'm trying to understand what
you're doing, you're trying to understand my questions. And then there's going to be 100 or
200,000 people who listen to this. And they're also going to try to, that's consciousness,
the awareness of each other and ourselves and our place in the universe and that there is even a
universe. These are two different things. But for some reason, we perceive these two different
things that are occurring, the mechanical execution of tasks, through this very interesting
process, which you're now recreating, is different than consciousness. And consciousness,
who knows when we're going to ever figure that out? Or if we can figure it out, this idea that
we're aware that we are a living being, but we can figure out, at least at this point in time,
it feels to you like we're going to figure out, and we're close to figuring out how our brains
break down complex tasks and do them so elegantly. Am I, am I repeating that back to you correctly?
Yes, that's right. There's a spiritual leader, I suppose you could say, called Eckhart Tolle,
who refers to the first thing, which is the not knowing that you are different than the plans
that your mind makes as being unconscious is the phrase he uses. And I think that it's a natural
state of people is to not be aware that the mechanical following of scripts, which is most of
our behavior, and that's the sort of thing that you have a shot at doing in a robot. So
I'm fairly sure that we can build machines that can do all work, like all of it, at least
the currently, you know, understood things like automotive manufacturing, logistics, bringing
parcels to your home. I think that all of those things are within scope to do within say a decade,
at least have the capability. So this idea of building a thing that appears as though it's
intelligent and does all the things that you'd want, that's within reach. But the thing that I'm
really kind of taken with is this other notion, you know, I used to be a theoretical physicist a
long time ago, and I worked on foundational problems in quantum mechanics and general
relativity. And I've always had, you know, at the base of who I am, I've always been interested in
understanding how things work at some fundamental level. And it's always bothered me that
all we ever experience of the world is this first person thing, the feeling of being you in the
moment. But we don't understand that at all. And I think that this neglect of what is probably the
most central direct experience we have of the world means that there is a discovery waiting to be
made about the relationship between our experience and notions of space and time. And I think that
this project is somehow, in some ways, aimed at that, is that, you know, it sort of starts from
a weird spot because you see these robots, and there's mechanical hands, and then I'm talking
about, you know, some fundamental relationship between the emergence of space and time and
how we perceive it through our conscious perspective. And these seem to be not related,
but I actually think that they are very tightly related. You see these blue light glasses I'm
wearing? I'm not wearing it for style, although they are very stylish. They've totally changed my
life. Why? I started having headaches, right, and had eye strain. So I got these blue light
blocking glasses that do a little magnification because I need readers. Yeah, I look nuts,
but my eye strain's gone down, my headaches have gone away, and I'm sleeping better. Do
you know how I got on this? I got on it because I now have a health coach. Who's my health coach?
It's F-O-U-N-T. It's a health company that's created custom health and performance programs
that are tailored to your body, obviously, also your goals, and they take into account your lifestyle.
My coach is incredible. I text with him all the time. They did a blood work for me.
They check out my wearable data, and we do weekly calls to see if I'm on track and getting the
results I want. They also told me about some supplements I should be taking based on the
blood work, and they do it at a fraction of the cost. We upgraded my diet. I'm doing a little more
protein. We've optimized my sleep. That's great. I got the supplement packs. I feel great. I feel
like I'm in control of my destiny. If you want to be like me and you're concerned about your health
and you want to just try to do better, have some experts on your team. Build your own program.
Go to fount.bio/twist. That's F-O-U-N-T dot B-I-O slash twist. Get your free consultation.
Mention twist. You get $500 off your first month, and get your own personal health coach.
Health is wealth, and if you're running a startup, if you're a CEO, if you're a capital allocator,
take it seriously. I love this service. Fount.bio/twist.
Well, if we think about the experience of being human and our place in this universe that we're
trying to figure out, performing the tasks, as you said earlier in our conversation, is how we
navigate the world, and it's how we are actually doing this act of trying to figure out what it
is to be human. And this all then starts to open up all kinds of possibilities, free will.
When we pour that glass of milk, when we play that sheet music, where is our decision to do that
occurring? What parts of it are automated? Which parts of it are just rote and just get executed
on? And so it does open up. And I agree with you. This is the question that we will always try to
figure out. And this is why science fiction always winds up here, which is what does it mean
to be human, whether we're talking about Blade Runner or Prometheus in the Alien series and
Ridley Scott's take on it. So let's get back to reality here. When you're training the robot,
you are not saying, hey, we're going to pick up a tennis ball here. If we're going to pick up a
tennis ball, it has this size, therefore we're going to program it to pick it up. That's what
people did with robots before. They very explicitly had it do some very narrow verticalized tasks.
You're having a human being, like the guy who played Gollum, I guess, in Lord of the Rings,
use gloves or something to send the instructions to the robot's actual physical actuator hands.
And they're incredibly sensitive and have those pads on them. So we're teaching it, hey, I'm going
to just pick up, Andy Serkis, yeah, he was the guy who played Gollum, we're going to actually just
pick up the tennis ball. And then the AI that we train is going to know what happened. And that's
what's going on here. Yeah, so I have got a, this might be helpful. Can you see the video?
Ah, here we go. Yeah. Yeah, we got a video of a robot. Yeah. Yeah, so this is Phoenix. And what
you're seeing is this process that we're talking about where there's a person in a suit, they have
haptic gloves with force feedback so that they can feel the world they see through a heads-up
display so they can, they feel like they're looking through the eyes of the robot. And they're
connected to it, their own robot that when they move, moves the robot in an analogous fashion.
So the, yeah, so this is what it looks like when you watch the robot side of teleoperation.
You can see that these machines are capable of doing lots of different things. I mean,
that might not be obvious from watching this, but the systems are nearly capable of doing
anything that a person can do under this type of control as long as they don't have to move
around the world. This is focused on the upper body stuff and the problem that I mentioned,
which is the dexterous manipulation of the world. And you're seeing it there, you know,
basically pick up an object and then scan it with a barcode as if you were working in an Amazon,
let's say factory and shipping and packing boxes or even doing something as delicate as using a
Ziploc bag, which we do unconsciously. We feel it, it feels like the Ziploc bag, yellow and
blue make green, and you just, you have that color system, but you also have the feeling of it. So
humans do the tasks and then take me to what the software then does with the human having
done the task. What does it do next in terms of building a model to then go do the next thing in
the world? Yeah, so imagine you have a Reddit post, which is some sequence of words that
someone says, I really love Diablo 2 because Amazon is my favorite character. So somebody's
written something that sentence is the expression of a thought that a person has had into words.
Now, when you train something like a GPT large language model, that sentence is used to
help figure out the statistical likelihood of each word (let's just keep it simple)
following the preceding ones. And if you give this model enough words that people have written,
the expression of their thoughts, then if I was to, say, type in a prompt, which is "my favorite game
is," then of all of the words that have ever been written, to some approximation,
there's a probability of what the next token will be given that prompt. And then the thing can unroll,
which means I put the next word in and now I ask, with all those four words, what's the fifth word?
Okay, put the fifth word in. What's the sixth? And each time it's a probabilistic thing. So you roll
a random number and you pick the thing that the random number says it should be. So with this type
of model, these large behavior models, the data is a little different. It's the data of the sequence
of successive nows. It's the time data from the person performing the task. So if I ask the robot
to open a Ziploc bag, let's say that's the micro policy that we're going to train. So a person
picks up the bag from the table through the robot, opens the bag, and maybe pulls it open a little
bit. So that then becomes the analog of a sentence. It's a piece of data, which we're now going to
use to train a model, where instead of predicting the next word, we're going to predict the next
sequence of actions. And we unroll the same way we would a sentence. So every successive
prediction becomes a movement pattern for the system. And in this case of the kinds of things
we build, while it's similar in some sense, there are some very big technical difficulties in
actually doing this that require the synthesis of many different kinds of artificial intelligence
advances. For example, you could send the pixels from the camera in at every step to one of these
models. But the pixels are not the thing that you really care about. What you really care about
is where are the things and what are they, which is a much lower dimensional thing. So machine
learning computer vision techniques have been developed that will take the camera feed and
extract what you could think of as the semantic or important information about the scene. And
those are typically the things that you put into these types of models. And that's not just true
for vision, it's true for haptics and audio and proprioception. So on the audio side, the obvious
thing is, if a person is speaking, you could use the actual audio waveform, but you could also use the
text. And text is a much more compressed, higher-quality version of the data than the actual audio
itself. So we tend to do text extraction from speech before we send it into these types of
models as well. That's fascinating. So you can know with machine learning, hey, there's a bag in
this scene and the bag is open. But the bag is upside down, so it needs to be right side up; we should
flip it around so things don't fall out of the bag, etc. And so where are you at? Let's
I think I understand what's happening here in terms of the language model analogy,
and then just translating that into predicting the next best thing to do.
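The unrolling Geordie describes, sample the next element, append it, ask again, is the same loop whether the elements are word tokens or movement patterns. Here is a minimal sketch of that idea; the function names and the toy transition table are hypothetical illustrations, not Sanctuary's actual system, and the toy model is deterministic only to keep the rollout predictable:

```python
import random

def sample_next(sequence, model):
    """Sample the next element given the sequence so far.
    `model` maps a context to a dict of {next_element: probability}."""
    probs = model(sequence)
    r = random.random()  # "roll a random number"
    cumulative = 0.0
    for element, p in probs.items():
        cumulative += p
        if r < cumulative:
            return element
    return element  # guard against floating-point round-off

def unroll(prompt, model, steps):
    """Autoregressive unrolling: predict, append, repeat."""
    sequence = list(prompt)
    for _ in range(steps):
        sequence.append(sample_next(sequence, model))
    return sequence

# Toy "large behavior model": the elements are movement patterns for
# the Ziploc-bag micro-policy instead of words.
transitions = {
    "grasp_bag": {"lift_bag": 1.0},
    "lift_bag": {"pinch_seal": 1.0},
    "pinch_seal": {"pull_open": 1.0},
}
model = lambda seq: transitions[seq[-1]]

print(unroll(["grasp_bag"], model, 3))
# ['grasp_bag', 'lift_bag', 'pinch_seal', 'pull_open']
```

Swap the movement patterns for word tokens and the same loop is the GPT-style next-word prediction described above; what changes is the data and the output space, not the unrolling.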
And so where are you at in terms of training this in the real world? The example you give
looks like, you know, packing and shipping, probably one of the most
boring, monotonous, soul-crushing jobs a human being could have. So why not give that to a robot?
And sure, you could do it 24 hours a day or whatever the robots are going to be capable of.
So where are you at in terms of taking this and actually having it at a fulfillment center,
packing boxes and making sure that it scans them, puts the right
objects into the box, and then ships them on to the next person, out there on,
you know, the distribution center floor.
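Before the deployment answer, it's worth restating Geordie's earlier observation-side point in code: the model doesn't consume raw pixels or audio waveforms, but a compressed semantic summary of the scene, what the things are, where they are, plus text extracted from speech. A sketch of that preprocessing step, with every name and the coordinate convention hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    """Semantic description of one thing in the scene: what it is,
    where it is, and its state. Far lower-dimensional than pixels."""
    name: str
    position_m: tuple  # (x, y, z), hypothetical metre convention
    state: str         # e.g. "open", "closed", "upside_down"

def featurize(detected_objects, speech_transcript):
    """Compress an observation into what the behavior model consumes:
    object identities and poses from computer vision, plus text
    extracted from speech instead of the raw audio waveform."""
    return {
        "objects": [(o.name, o.position_m, o.state) for o in detected_objects],
        "speech": speech_transcript,
    }

# The bag-flipping example from the conversation:
scene = [SceneObject("ziploc_bag", (0.42, 0.10, 0.02), "upside_down")]
obs = featurize(scene, "flip the bag over before filling it")
print(obs["objects"][0][2])  # upside_down, so the policy knows to flip it
```

The design point is the one made above: the features are "a much lower dimensional thing" than the camera feed, and the same compression applies across vision, haptics, audio, and proprioception.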
To be clear, the initial go to market is in automotive manufacturing. It's not in logistics
and retail. We focus almost exclusively on that with one exception. In automotive manufacturing,
if you take a look at a video that, say, Toyota makes of their factory floor,
automotive manufacturing is one of the most automated systems there are in any industry.
But if you watch what actually happens, there are hundreds or thousands of people
in automotive manufacturing facilities all the time. The question is why. Why haven't those
jobs been automated? So when you look at what they're doing, there's kind of two categories of answer.
One is it may be beyond the bounds of science. We may not know how to do the thing that they're
doing, but there's another answer, which is that often people are used to connect machines.
So let's say I have a machine for stamping a part and I have a machine for
making the part in the first place. So moving the part from the one machine to the
other machine is a very difficult process that involves all of these things that we talk about.
You need to be able to know what a thing is, where it is, localize it, use your hands to
pick it up, sometimes out of a cluttered mess, put it somewhere, which often requires putting
something on a jig, which is a difficult thing. You need to be able to move around and so on.
So a lot of the work that's done in automotive manufacturing specifically is a combination
of different solved problems that have never been put together in a way that you could make economical.
And one of the key features of these general-purpose machines that we and others are building,
and exactly the kind of thing that's required in order to actually do this for real, is this:
if I was to spend all my time and energy building a machine that did one of the things
that somebody on this factory floor does, it would be very difficult to build a business.
But if you could build a machine that could do, say, 50 of the kinds of things, now we're talking.
So our initial use cases are nearly all of this sort. They're automotive manufacturing,
they're the connector problems, where you're moving parts or material between machines.
And even the things that aren't automotive manufacturing that we've looked at
all share the same feature. For example, in warehouses (my last business built
robots for e-commerce distribution centers) there's a problem called induction. So induction is
the problem of taking things that usually come off trucks, big pallets and just stuff. Imagine
all of the things that you could buy on Amazon coming into a warehouse. And then taking them
from their point of delivery and then getting them wherever they should be in the system,
on a shelf, in a box, whatever. So induction is another kind of problem that's related to this,
where you're dealing with things that you need to manipulate with your hands,
opening boxes, closing boxes, putting things in boxes, taking things out of boxes and so on.
And so that's another category of things that's related. But nearly everything we're doing now
is helping automotive manufacturers dramatically improve the efficiency and productivity of their
workforce. We're back with another Pitch It to J-Cal. This is this segment brought to you by
our friends at Dot Tech Domains. Dot Tech Domains are giving twist listeners a chance
to show off their startup on This Week in Startups. So go to startups.tech/jason, that's
startups.t-e-c-h slash Jason to apply. There's only one rule. You need to have a Dot Tech domain
name to get featured. This week, I received a great pitch from Label Drive, which you can find at
LabelDrive.tech. Label Drive helps other companies manage their AI data. And they've built a tool
for collecting and labeling data that's especially focused on identifying and categorizing objects
to save you time, save you money, and build better products. And as we all know, that's crucial for
AI training. So I want you to go right now to LabelDrive.tech. And if you're interested in
getting featured on This Week in Startups with your new Dot Tech domain name, I want you to go
to startups.t-e-c-h slash Jason and apply today. That's startups.t-e-c-h slash Jason and fill out the
form to apply. If this works, when do you think you'll have the ability to have the robot do
those 50 different things? Let's say you nail that, and it feels like you're well on your way.
The first question, when do you get that solved and in factories and just doing it day in and day
out? The plan that we've got takes us from where we are now to the full automation of important
tasks, by which I mean tasks where there are markets of, say, a billion dollars of annually
recurring revenue for us. So let's say that's the kind of thing that we want to target. We enter into agreements
with our customers where the first step is that we mock up their situation in our facility here
in Vancouver. Think of it as a digital twin, or not a digital twin, a real-world twin. There's
also a digital twin, by the way, but here I mean the real-world twin. And then, as a first step,
they pay us to show that we can automate their processes, the kinds of tasks that they want,
using this type of approach. So there's a period of roughly two and a half years that we see where we go from
where we are today to being able to really do something for real in the lab of the sort that
you could then scale. So that's the first step. When we start scaling is likely around the middle
of 2026, when you're going to start to see an increasing number of these types of machines
actually deployed inside automotive manufacturing plants contributing to the productivity of the
plant. So this is the plan of record. Now, I've done quantum computing and all sorts of things
where it's very difficult to predict how things will go. So in something like this,
you have a plan, things could go faster, they could go slower, it's unclear, but that's what
we're aiming at. I think you'll start to see the beginnings of large-scale deployments of
these machines somewhere in 2026. And so 2026, you start seeing the deployment of these.
And then when do you think factories start to remove humans? I guess they call that the
lights out moment. You don't even have to install lights in a space. I know it's funny,
but when do you think you have that lights out moment and factories don't need to have humans
in it? So I want to make a point about this. There is a myth that AI and automation reduces
labor. It's not true. Throughout history, there have been a series of moral panics
where the next big technology thing is believed to do something terrible to employment. It's never
happened, every single time a new thing has been introduced. And I think that the central
reason we think this is the lump of labor fallacy, the idea that there's
a fixed amount of work, and if you give the work to the robots, there's nothing left.
That's simply wrong. The way that it actually works in practice is this: say there's a fixed
amount of labor, like I want to build 80 million cars, and let's say we could do that all
with robots. The amount of work that's available for the
general human population expands as a consequence of that. It doesn't shrink. So I want to just make
this very clear that my perspective on AI and automation is that there's an upward spiral.
When you have more energy, you have more intelligence, you have more capability.
These drive all the metrics of human flourishing up; they don't drive them down. So when we think
about the question, when will you get lights-out manufacturing, I think the answer is never,
because people will always find new things to do with the tools that we've built, even very
powerful tools that can think and maybe are even self-aware. These will only increase the number
of jobs and increase wages,
but there'll be different kinds of jobs. There'll be the sorts of things that maybe we can't even
imagine now that are made possible by these things. Like look at the internet 20 years ago,
podcasting. This is a great example. Yeah. Now we have an entire podcasting industry. We have
people who take pictures or there's an incredible company called Songfinch. What they do is you go
there and you tell it you want to make a song for your mom or your dad. They pair you with an artist
and you pay them $200 and they'll write a song about your mom for her birthday. That's very cool.
Well, I mean, there's humans out there and I guess these used to be bards or
court jesters or whatever who would do these kind of tasks as well, but we find things and
you're just thinking about your robot and, oh, well, we have this new problem, forest fires.
How are we going to clean up, or, as our former president joked,
how are we going to rake up all that debris under the trees there in the mountains in California?
If somebody had 10 of these robots able to do a task, they might say, oh, you know,
I have an interesting idea. Maybe we could clean up and do some deforestation with them
and they will eventually, in your mind, a decade from now or two decades from now,
not just be in factories. They'll be in our lives. They'll be side by side with us
solving problems in the real world. Yeah, that's the ultimate vision here so they can leave the
factories. Yeah, I think of them as being a kind of thing like the automobile, where at
some point they'll be ubiquitous, and our entire civilization will be built in
synergy with this new thing, like we did with cars, you know, roads and so on. By the way,
I wanted to mention that this business of the job upgrade happened to me
when I was starting school. There were no quantum computers at all, except maybe theoretically.
And we started a company to try to build one. This is an example where we probably hired,
over time, I don't know, maybe 300 or 400 people who had PhDs in physics in that company.
And this is D-Wave, yeah. This is D-Wave, yeah. That was a new kind of job that was created as a
consequence of a revolutionary new idea. So this is the sort of thing that always happens with
innovation. And I'm kind of emphasizing this a little bit because we're at a very
weird time right now, where there's an attempt at regulatory capture in artificial intelligence.
It's a very dangerous idea, this idea of deceleration or stagnation or holding back,
which is connected to old ideas that were rooted in communism. These are very
dangerous social ideas that I think it's important we don't stay silent about.
People like me have very strongly held beliefs that technology is the solution to
maybe all of our problems, not only the ones we create, but also the ones that might emerge
as a consequence of our natural habitat, global warming or meteors or whatever.
The idea that the better we can get at creating new things, the better off we all are,
is a very important policy idea that I don't think is being communicated effectively by the
community of people who build technologies. There is a group of people who have
specifically gone to Washington and said, hey, please regulate us.
And they happen to be the people at the forefront, or among the
people at the forefront. And building a bunch of regulation into this would benefit the people
who have the lead today, as opposed to say open source people or folks who are coming up. Is that
the thinking of what their motivation is? Because this is a group of technologists
who are on the cutting edge. Why would they go to Washington and want to have a bunch of
non-technical politicians slow things down? What do you think their motivation is?
I think the best answer to this is somebody that you had on, Bill Gurley.
Yeah. So his take on this I really resonated with. You know, sometimes you watch something
and you're like, I'm agreeing with everything this guy's saying right now.
If people are interested in this subject and
they haven't seen it, I would most definitely recommend it: Bill Gurley's All-In talk. Yeah,
we'll put it in the show notes for everybody. Yeah. But regulatory capture is what they're going
for. It calcifies the winners as the winners. It builds up a moat for them. And this could be
just cataclysmic for humans, right? We need this technology to solve problems.
Yeah. And I think that's the point is that the solving of problems comes from innovation
and growth. And the forces of stagnation, the people who are pushing against that,
are very strong right now. And I think it's dangerous because
I think that my view of this is that civilization's metrics, like how well people are doing,
are very strongly correlated with growth. And there's an idea that we have to slow everything
down, which is, I think, a dangerous idea. I think that what would happen if we were to
implement policies that were restrictive is the same thing that happened. I'll use an example
with nuclear power. A lot of the problems that we face today in the global warming sense and
catastrophic potential futures that we might be looking at are connected very strongly
to the precautionary principle, which in the nuclear industry was, well, we don't want to
build nuclear power plants because we're afraid of nuclear bombs, which is ridiculous,
by the way, because they're not the same thing at all.
They're not the same thing. Yeah.
And maybe a reactor melted down once or twice.
You don't count all the deaths that happened in all these other industries,
which were massively higher. If we had not done that with nuclear and instead embraced it,
we would not be where we are today. And so there are examples of this fear, a
rational fear, that can become policy, and it could be very dangerous here, because I think
that these technologies we're talking about, which is the AI robotics to a certain extent,
but not just those things more generally, we should take the attitude that the upward spiral
is the objective. We want more energy. We want everybody on the planet to come up to our level
of energy consumption. We don't want to reduce energy consumption. We want to increase it.
And then we want to increase everybody by another 1,000 times. And we need to be able
to find ways that technology can enable that and then enable solving the problems that might come
of it from second order effects like global warming. These things are all solved by innovation
and technology. Innovation and change is not the enemy. It's our friend. It's a necessary part.
And it's connected to who we are as people. People are explorers. We're adventurers. We want
novelty. We want to go to places that no one's ever gone, either literally or figuratively.
And that is the essence of the human spirit to me. And we want to be advocating for that as
technologists and leaders in our fields. Yeah. And it's so paradoxical. I remember when I was a
kid, all these great musicians who I loved, Bob Dylan, et cetera, did the No Nukes concerts.
And we really were indoctrinated into this fear of nuclear. And the second order effect
is that we burned more coal and we burned more oil and we heated up the planet. And now we're
trying to solve the problem. And the solution was there in the 70s. And then sometime in the 80s,
we decided, hey, let's stop doing this. And now 80s, 90s, 2000s, 2010s, we're sitting here
four decades later. And finally, people are starting to realize 40 years later, oh, you know
what? Maybe that was a mistake. Should we start building these again? And now we've got to
reconvince everybody that we went on a 50 year side quest that made no sense. And it's incredibly
frustrating. And, you know, yeah, some of our friends, people who have been on the pod, Sam
Altman, Reid Hoffman, Mustafa, like, I think they're misguided here. We could have
conversations about this, right? I mean, there's nothing wrong with having a conversation. Hey,
how do you make nuclear safer? Hey, could these robots, I mean, it sounds farcical, but could
the robots escape and then do bad things in the world? Sure, we could have this conversation.
But that doesn't mean that we need to have a bunch of regulators come in and say, oh, somebody in
Washington is going to approve your language model and your code. And that doesn't make much
sense to me. That seems like they're doing regulatory capture, I agree 100%. And you know,
the other thing I realized about what we were saying, Geordie, is there's something about solving
problems, I realized in this conversation. When we were talking about jobs,
there's a sensitivity to that, with good reason, you know, because a large
number of jobs could go away quickly. And there could be displacement, of course. But when the mind
and consciousness is left alone, our minds are designed in a very interesting way to think
and find the next problem to solve. There's something fundamental about human consciousness
and this brain and, you know, Darwin and evolution that our species survived, dominated, and evolved
with something inherent in our code, which is understand the world and find the next problem
to solve. Does that resonate with you? Oh, yeah. I mean, everybody who's laid awake at night and
they can't get their mind to stop spinning through all the negative scenarios that could happen.
Everybody, I think, experiences this. You're exactly right, is that this tool that we've got,
this beautiful mind that does all these wonderful things, it creates the worst nightmares possible
about what will happen as a consequence of it working well. And so with technology, our mind
spins up all of these horror science fantasy ideas, we turn them into movies like Terminator
or Black Mirror. None of that is real. I think there's a very important, powerful message here,
is that the terrible stories our minds tell us when you lie awake at night about your personal
life is the same process that generates fear about the outcomes of change. So when we do
something new, we innovate, we discover something about the world, there's a natural tendency that
all of us have to imagine what might go wrong. Yeah, so I would advocate for being aware of that,
is that it's a story your mind is telling you. The Terminator thing is not true. It's not real.
It's never going to happen. It's just a story that somebody made up that resonates with our
baser nature, our fears and concerns about the future and so on. But it's not real. What's real
is very different. Yeah, in our minds, there was a reason this obviously existed, the person who
worried, hey, I wonder if these berries are poisonous or not, or I wonder if there's
something dangerous in that body of water, maybe I should be cautious. Yeah, a little bit of caution,
thoughtfulness, probably extended life and people who were reckless probably had shorter lives.
And so, yeah, the gene pool probably evolved this way. But you must be aware of how
catastrophizing the mind can be. I mean, people can get really wound up. We see this with social media
presenting us with so much bad news in the world. Our brains are not designed to process that, are
they? No, and this is an example of how technology can have unintended consequences that are negative.
Social media hijacks this propensity that we have to tribalize, to fear, to other people,
othering being treating people as different. Part of this idea, that the conscious perspective
you have is separate from your brain, carries with it another idea: that we're all connected. We all
have this thing, we all share in it. The analogy that Eckhart Tolle uses is that there's an ocean
and we're ripples on the ocean, but this ocean is the same for all of us. This idea is a powerful
one when you're trying to think about why you're reacting in a certain way to certain things. The
social media stuff is an amplifier of the negative aspects of how we function as people.
But that doesn't mean that we shouldn't have done it. I think this is the point. Like you said
before, we want to talk about it. We want to have a frank discussion about it, but the solution to
these things doesn't come from shutting things down. It comes from having this discussion and
making good clear-minded decisions about how to build, not how to break.
One of the great paradoxes of all of this might be we build up this AI and we get to some general
intelligence. It might, there's a non-zero chance, explain things to us about
our own consciousness, why we're here, and what consciousness is, answers that we ourselves
could not come to. We may unlock some mysteries that explain our own existence. That,
just to me, would be a wonderful gift of accelerating this. What if this machine,
what if this artificial intelligence can be more objective about us and can teach us something?
That would be a pretty mind-blowing outcome. It sure would.
All right. Listen, continued success with this, from working on quantum computing and now
to robotics and figuring out how to make these sequences play out. It's going to be very interesting
to watch your progress, and listen, accelerate it all. Let's go. I'm assuming you're hiring, and
this must be one of the most fascinating places to work in the world. If people are interested
in learning more, or maybe applying for a position to build this out and accelerate
human intelligence and augment it so beautifully, where do they find out more?
So, I and one of the other founders of the company, Dr. Suzanne Gildert, have a podcast
called the Sanctuary Ground Truth Podcast. That's a place that you could look. We also,
at our website, sanctuary.ai, there is a careers page. We are hiring and growing quite quickly,
and there are positions for all sorts of different kinds of people. We mostly hire technical people,
of course, but there are some other things. And if anybody's interested, please
watch the Ground Truth Podcast and go to the website and check us out.
Amazing. All right. And we'll see you all next time on This Week in Startups. Bye-bye.
Machine-generated transcript that may contain inaccuracies.
This Week in Startups is brought to you by…
IntouchCX. Looking for ways to make your startup more efficient? IntouchCX has a ground-breaking suite of AI-powered tools for end-to-end optimization to give your business the edge it needs to thrive. Get started with your free consultation at http://intouchcx.com/twist
Fount. Do you want access to the performance protocols that pro athletes and special ops use? With Fount, an elite military operator supercharges your focus, sleep, recovery, and longevity, all powered by your unique data. Want a true edge in work and life? Go to fount.bio/TWIST for $500 off.
.Tech Domains has a new program called startups.tech, where you can get your startup featured on This Week in Startups. Go to startups.tech/jason to find out how!
*
Today’s show:
Sanctuary AI CEO Geordie Rose joins Jason for an incredible interview on the complexities of using AI to train robots (11:09), developing large behavior models (17:53), the 'lights out' moment in manufacturing (42:52), and much more!
*
Time stamps:
(0:00) Sanctuary AI CEO Geordie Rose joins Jason
(3:42) Sanctuary AI's approach to robotics and motivation behind creating humanoid robots
(6:05) The human hand's integral role in AI-driven robot development: Planning, reasoning, and understanding the world
(11:09) Moravec’s paradox and the challenges of instilling perception in robots
(16:40) InTouchCX - Get started with a free consultation at http://intouchcx.com/twist
(17:53) The significance of "Micro-Policies" and developing large behavior models
(22:59) Exploring human cognition and large behavior models
(28:52) Fount - Get $500 off an executive health coach at https://fount.bio/twist
(30:23) Sanctuary AI’s Phoenix robot, robot training, and use of large language models
(37:46) Robotics in automotive manufacturing
(41:43) .Tech Domains - Apply to get your startup featured on This Week in Startups at https://startups.tech/jason
(42:52) The "lights out" moment in manufacturing and the challenge of regulatory capture in AI
(56:01) Humans’ problem-solving nature and roots of technological fear
*
Check out Sanctuary AI: https://sanctuary.ai/
Follow Geordie: https://twitter.com/realgeordierose
*
Check out Bill Gurley’s 2,851 Miles: https://youtu.be/F9cO3-MLHOM?feature=shared
*
Read LAUNCH Fund 4 Deal Memo: https://www.launch.co/four
Apply for Funding: https://www.launch.co/apply
Buy ANGEL: https://www.angelthebook.com
Great recent interviews: Steve Huffman, Brian Chesky, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland, PrayingForExits, Jenny Lefcourt
Check out Jason’s suite of newsletters: https://substack.com/@calacanis
*
Follow Jason:
Twitter: https://twitter.com/jason
Instagram: https://www.instagram.com/jason
LinkedIn: https://www.linkedin.com/in/jasoncalacanis
*
Follow TWiST:
Substack: https://twistartups.substack.com
Twitter: https://twitter.com/TWiStartups
YouTube: https://www.youtube.com/thisweekin
*
Subscribe to the Founder University Podcast: https://www.founder.university/podcast