FYI - For Your Innovation: Roblox: The Virtual Playground Where Gaming, Fashion, and Real-World Brands Collide with Daniel Sturman

ARK Invest 8/31/23 - Episode Page - 46m - PDF Transcript

Welcome to FYI, the For Your Innovation podcast. This show offers an intellectual discussion on

technologically-enabled disruption because investing in innovation starts with understanding it.

To learn more, visit ark-invest.com.

ARK Invest is a registered investment advisor focused on investing in disruptive innovation.

This podcast is for informational purposes only and should not be relied upon as a basis for

investment decisions. It does not constitute either explicitly or implicitly any provision of

services or products by ARK. All statements made regarding companies or securities are strictly beliefs and points of view held by ARK or podcast guests and are not endorsements or recommendations by ARK to buy, sell, or hold any security. Clients of ARK Investment Management

may maintain positions in the securities discussed in this podcast.

Hi everyone, welcome to another episode of For Your Innovation by ARK Invest,

a podcast on all things related to disruptive technologies. I'm Andrew Kim,

research associate covering consumer internet and fintech, and I'm joined by Nicholas Grous,

associate portfolio manager. Today we have the great privilege of speaking with Daniel Sturman,

CTO of Roblox. Hi Daniel, thanks so much for your time today.

My pleasure to be here.

We love it. If you can just tell us a little bit more about yourself for our listeners and

let us know how you ended up where you are today.

Great. Yeah, so as you mentioned, I'm the CTO at Roblox. That means I oversee the

technical team here at the company. That's about 1,500 folks, so the vast majority of Roblox employees are technical contributors. I've come here through a non-typical path. I never worked in

3D graphics or animation before. My background has all been systems. I worked for a range of

systems and scale-out companies before Roblox. I was at a company called Cloudera,

which did big data software. I spent about eight years at Google, about a decade at IBM before

that. Great. Daniel, we're very curious about the Roblox platform. Could you give us a brief

background on what Roblox is, how it's positioned in the gaming space, and then we're going to

dive very deep on the technical side with you, especially in AI. We'd love to just get your

kind of overview on Roblox, the platform, where it is today, and then we'll go further.

Sounds good. Roblox, we keep using the term platform, and that's really important. It is a platform. At Roblox, we try very hard not to create any of our own content. The general idea behind Roblox was one of bringing 3D simulation to people and letting them create whatever they wanted

out of that. We, for example, will provide all the engine and physics rules that you need. We

provide the backend cloud, we provide things like translation, but our creators come and they create

the content. We have millions of creators on the platform, and over 60 million daily active users

come and consume that data or that content. The two tend to go back and forth. We're seeing more

and more that people who are passionate about using the platform become creators. Actually, as the

platform is evolving, we're lowering the barriers to creation. The line between a user and a creator

is becoming increasingly thin. It's a platform. It runs on as many devices as we possibly can.

Today, that is iPhone, Android, Mac, Windows, Xbox, but we're always looking at more so that

you can consume this content wherever you are at, and it doesn't have to be one place or another.

We just announced our beta on Meta Quest, for example, as another format. It's really about

universal consumability. We're worldwide. We have people in most countries around the globe,

and a growing audience across the globe. The majority of our traffic is international at this

point. That's what Roblox is in a nutshell. What do people build? Yes, there are games, but there are

music concerts. There are shopping experiences and fashion experiences. There are brand experiences

with large brands coming and building experiences on Roblox so that their fans can come and engage

in a 3D immersive way with their content. There's a really exploding set of ways people are using

this technology and what they're doing with it. Thank you for that great overview of Roblox.

I guess just diving a little bit deeper into generative AI, because that seems to be all the

buzz, and Roblox is in the spotlight of all that buzz. In your presentation at Morgan Stanley in

June, I believe, you showed a demo in which a developer was able to generate and manipulate

3D assets using a voice-powered AI assistant. We were blown away by this, how context-aware

this AI system was, and how generally helpful it was. I was just wondering if you can maybe talk

about the model training process that allowed the AI assistant to be so context-aware and helpful,

and what challenges you faced in the fine-tuning process, and what challenges

remain for the AI products that are live today, namely Material Generator and Code Assist?

Yeah, so let's unpack that a bit. First, the demo we showed, as I stated in that presentation,

was very, very early. That was not a system we built per se by training a model. It was basically all built using prompt engineering to connect a reasonably good AI up to what was possible in Studio, and using a lot of our existing Studio features, like semantic search and so on, that allowed it to run very well. But as you could kind of see from some of the limits in there, it was not trained end-to-end. That's our next step on that. We're working on a much more Roblox

specific version with kind of properly sized and domain-specific AIs behind everything we're

doing there. So that still has a ways to go, but we're excited about that, and I thought the prototype

illustrated the promise of doing something like that, and it's just going to get better as we

better understand the capabilities of the platform and tie those pieces together. You would also,

I think, ask, just changing gears a bit, about our existing products with generative AI,

Material Generator, and Code Assist. Those are going quite well. We're seeing a lot of uptake

on usage. We're seeing some really incredible things. Like with Code Assist, it's cutting keystrokes for creators using it by about 35%, so it's really helping them code faster.

With Material Generator, we've seen an uptick of an additional 50% in folks using PBR materials,

which is what that is around, creating these really rich, physically-based rendering materials

on the platform. We're starting to see evidence that it's democratizing these sorts of things,

like PBR materials before were at the limit of someone with pretty strong technical graphics

type skills, and now I can create a PBR material. I absolutely don't have those skills at all,

right? So I think that's going quite well. Similarly with coding, the stronger that platform

gets, the easier it will be for someone to make an entree into scripting on the platform, and we see a lot of promise in that.
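To make the prompt-engineering approach described above concrete, here is a minimal sketch, assuming a hypothetical JSON tool protocol; none of these tool names are real Roblox Studio APIs, and the actual LLM call is elided in favor of a canned reply:

```python
import json

# Hypothetical sketch: the LLM is told which editor actions it may request,
# and its JSON reply is dispatched to stub functions standing in for real
# editor operations (e.g., semantic search over a catalog).

TOOLS_PROMPT = """You control a 3D editor. Reply with JSON:
{"action": "insert_asset" | "set_property", ...}
insert_asset takes "query" (semantic search string) and "position".
set_property takes "object", "property", "value"."""

def insert_asset(query, position):
    print(f"searching catalog for {query!r}, placing at {position}")

def set_property(obj, prop, value):
    print(f"setting {obj}.{prop} = {value}")

def dispatch(reply_json):
    """Route one model reply to the matching editor stub."""
    msg = json.loads(reply_json)
    if msg["action"] == "insert_asset":
        insert_asset(msg["query"], msg["position"])
    elif msg["action"] == "set_property":
        set_property(msg["object"], msg["property"], msg["value"])

# A canned reply standing in for a real LLM call:
dispatch('{"action": "insert_asset", "query": "red sports car", "position": [0, 0, 10]}')
```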

I want to take a step back here and just ask you your general thoughts on AI in gaming. What will it mean to the space? And then specifically on the Roblox

platform, what do you think this does for content creators? Because Roblox has, as you mentioned

before, extended beyond gaming. So I'm very curious, when you look at this space overall,

what is AI going to do to content creation in the digital world and in gaming and on Roblox?

Right. So, stepping back, I'm always looking at this from: what does generative AI mean for the creation

space? Because I think gaming is a little bit different. It's very similar, but it's a little

bit more narrow and also tends to traditionally be built by full-time people doing full-time

things around kind of a marketing effort, right, which is a game. On Roblox, since we're allowing

anyone to come and create, it's a little bit different. You have a very wide range of skill

sets. And I think generative AI is extremely helpful, both in helping people who don't bring the same

amount of technical skill to the platform, but also helping them grow those technical skills as

they, if you build these products right, they kind of teach as they go, right? And I think that's

extremely powerful. You look at the breadth of experiences on Roblox and we do have some full-time

studios that look like game studios that are building on the platform. But we also have things

like advertising teams who want to build a brand experience coming into the platform. And

they're looking for ways to make this much more accessible. They can contract with someone today,

usually an independent contractor, but just the ability to speed up iteration and move that sort

of process closer to the person with the creative idea is extremely important. And I expect like

long-term, you look at some of the things with music, there's no reason why every new hit shouldn't

have kind of a 3D immersive equivalent of a music video, right, on the platform. And that sort of

creations can just become more frequent, more common. And the ability to quickly build worlds out, to start to describe behavior you want avatars to have or the world to have, and so on,

is all absolutely within the domain of generative AI. You can see it from the existing launch that

we made, just kind of like the proof points. And so I think it's just going to allow all this to

explode. It's all heading in a direction for us that, like I said, blurring this line between

user and creator. And generative AI enables a much more casual creation experience versus kind

of a formal "I'm sitting down and I'm creating now." It can be intertwined with the use of an experience, as simple as maybe you're in a music experience and you create a new

dance on the spot, right? And we already have a version of that in studio. There's no reason

that couldn't roll into general Roblox as a whole. I just think the potential is unlimited as we start to blur that line. It's amazing. I think it's clear from the metrics you shared regarding

material generator and code assist that these teams are getting a lot more productive, right?

Can you share, maybe from a time-saved perspective, how much you think teams are saving on average? And also, how are teams themselves evolving, right? As the barriers

to entry for certain functions are kind of declining, are artists becoming engineers and

engineers becoming artists. Yeah, so I think it's still very early days. Folks are still playing,

we're all doing this. We're all playing around with these tools and trying to figure out how they

fit into our workflow, while also working through some careful considerations about what data you might give these tools access to and so on, right? And that's something Roblox has been very thoughtful on. We put a lot of focus and attention on our creator community. So for example, all the training data on Roblox is opt-in. We're not taking stuff that we

assume we have the rights to or anything like that. We're just making sure creators are giving us

their data. As for productivity, I think since people are still learning, we don't have the final

word on that at all. I do see that there's starting to be more of a blur. It's not that artists are becoming engineers or engineers are becoming artists, but you don't necessarily have to stop as soon as you would before. You can kind of go further with your own ideas in either domain and not have to wait

on finding a team member or a partner to start to create these. So I think you'll see

even individual creators can go a lot further on the platform with this sort of technology.

And then even once they're in larger teams, prototyping and blurring of roles may happen.

The ability to give a concept of what you're looking for will probably get far better.

And then I think, in any given domain, whether you're an engineer or an artist, you'll be able to do your craft faster. Like I mentioned, with 35% fewer keystrokes for folks using our coding assist, you're going to see the same on the art side. People will just be able to do their jobs better and faster. Got it. And you're primarily kind of working with

open source models today, right? Can you talk about the puts and takes that Roblox had to address when considering open source versus commercial APIs? Yeah. So something we're very focused on is, we always say at Roblox, we want to take the long view. And where do we

need to go with this? Well, it's one thing just to use existing models, but we have some very

unique data that we think is going to be essential to realize the full power of this. So for example,

we have a very large number of 3D models, right? 3D models that actually have

actual behavior in the system. So imagine we don't just have images of cars or even 3D

meshes of cars. We have cars with turning wheels and steering wheels and scripts around that can

break and so on. We have doors and windows and, you know, fence gates that open. We have

then more static things like trees, but all these things, they're much more complex than just here's

a bunch of meshes, here's a bunch of creative pieces, and then someone else comes around and

scripts it. So we want to start looking at what does it take to build real 3D objects? Well,

that's not something commercial models are necessarily going to go towards. I mean, we have a very unique

need here. So our bias has always been towards the open source and advancing the open source

ecosystem. So for example, there's the StarCoder model, which you can find on Hugging Face. It's generally somewhere on the LLM leaderboards. That's something Roblox was very involved

in building thanks to a collaboration with Arjun Guha, who's a Roblox researcher and a visiting

professor from Northeastern University. So we're actively engaged not just in using these models,

but in contributing back to them and making them stronger. And we're looking actively at then how

we take them, make them stronger, and apply some of our unique datasets.
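For readers who want to try StarCoder themselves, here is a minimal sketch using the Hugging Face transformers library. It assumes transformers and accelerate are installed and that you have accepted the model's license on Hugging Face; the checkpoint is large, so this is illustrative rather than something to run casually:

```python
# Minimal code-completion sketch with the open StarCoder checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# A Luau-flavored prompt, since Roblox scripting is Luau.
prompt = "-- Luau: make a part spin\nlocal part = "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0]))
```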

Got it. And maybe we can just dive a little deeper into what it means to train 3D asset generation models. Because as you've

mentioned, it needs to be context-aware, it needs to be dynamic, segmented. A car needs to move in

a certain way. If I wanted wheels on a rubber duck, then I would want those wheels to turn as well.

I'm just wondering if you can explain to a lay person what that model training process

would entail. Yeah. So first of all, I have to say that that is kind of a grand challenge. Like I

don't think anyone knows how to do it. We work very closely with the top academics. And this is

an open problem, as they say. I think there's a few steps. The first is just getting us to much

better 3D generation, either from a prompt or a 2D image. And this is something we're working

closely on. A lot of folks are working on it. No one has really solved this well. We have some

early prototypes, for example, in how we do avatar generation from prompt or photo to avatar.

And they're really rough. I'll be honest, where we are today, I expect in six months these will

look a lot better. But that's the simple act of just creating what looks like a good 3D mesh.

From there, we also have an effort on how do we take a 3D mesh of an avatar and turn it into a

moving, living, breathing avatar. We want things like arm movement. We want in general,

realistic movement of body parts. We want now facial movement, which is now live on the platform,

where it lip syncs along with you, your eyebrows can move, and your head can move, and so on.

So all that has to be built out. We're starting with something, what I'll call simple, like avatars.

Not that avatars are simple, but they tend to fall into a particular domain, a typical class,

typical sort of thing. They tend to be bipedal and have heads and things like this.

So it's a little bit easier to train the models. But overall, what we expect to do,

and I think we've seen this with where these large models have gone, you can over time apply techniques where you scale up the volume of capabilities as you learn to describe them.

The hard thing is being able to describe to the model all the capabilities that you want it to

have. And that's something we're just going to keep pushing on. And we think that the path has

been shown more generally around language, but we have to extend that into things around,

I mentioned 3D generation, but another big one is we capture kind of real time avatar human behavior

on the platform every single month. I think it's like five billion hours

of avatar activity every month. How do we capture that? How do you encode that? How do you put it in a format that's efficient? And then how do you start training a model around all that so you could build a really great NPC simulator, or maybe your doppelganger for when you want to

be in the experience, but you can't be in the experience or something like that. So all these

sorts of things, I think the big challenge will be how do you translate these new data types

into something a model can interpret? How do you score results? What is a good result? What's a

bad result? And then there's scale. The advantage of scale on these things has been very, very clear, and that's something we will be very focused on.
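As one illustration of the "efficient format" question, here is a naive sketch (emphatically not Roblox's actual format) of compressing a stream of avatar joint rotations by quantizing float32 angles down to int8 codes:

```python
import numpy as np

# Illustrative only: quantize per-frame joint angles from float32 radians
# to int8, cutting raw storage 4x before any real compression is applied.

def quantize_poses(angles, lo=-np.pi, hi=np.pi):
    """Map float angles in [lo, hi] to int8 codes in [-127, 127]."""
    scaled = (angles - lo) / (hi - lo)          # -> [0, 1]
    return np.round(scaled * 254 - 127).astype(np.int8)

def dequantize_poses(codes, lo=-np.pi, hi=np.pi):
    return (codes.astype(np.float32) + 127) / 254 * (hi - lo) + lo

# 30 fps * 2 seconds of motion, 20 joints, 3 rotation axes each:
poses = np.random.uniform(-np.pi, np.pi, size=(60, 20, 3)).astype(np.float32)
codes = quantize_poses(poses)
err = np.abs(dequantize_poses(codes) - poses).max()
print(codes.nbytes, "bytes vs", poses.nbytes, "- max error", err)
```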

Got it. Would you say that, I mean, I think the breadth and volume of data that Roblox has is immense, no question, right?

But would you consider like this setup of the model and like the training process to be a

bigger challenge? Or would you consider like the data cleaning and collection process to figure

out what is trainable in the first place? Would that be a bigger challenge?

Yeah, I think there's always a challenge around making sure your data is clean. But I think that's

one that's relatively well understood. Like there's not a, I don't think we need a scientific

breakthrough to make that happen. For me, it's more, the biggest challenge we see is how do you

just set these things up to understand the sort of data and be able to do smart things with it?

I think there will, for some of these use cases, have to be novel architectures or novel uses of existing architectures, for example. We've seen, you know, with a lot of the LLM work, a lot of it

has been around the training technique, which is kind of key. But also, you know, all this kind

of started with transformers and BERT models, which were almost incredible in their simplicity,

but scaled really, really well. And that's what allowed us to make these

really big. There may be an equivalent of that for the 3D space. Like it's not clear one way or

another to me whether this tech gets us there, or we have to do something special. I definitely

think the way encoding and decoding happens in these models will have to be something new,

because I don't think we're yet seeing that work as well as we'd like.

Could you elaborate on that front? What do you mean?

Yeah, it's just like these models aren't today built to understand 3D data. Like

take diffusion models, they've all pretty much focused into 2D creation, prompt to 2D. We've

seen that work really, really well. But what we do know is you can't just naively take that, apply it to 3D imagery, and get what you want, right? So something's going to have to change

there. It has to be some sort of breakthrough. Part of it may be the training set, which we have,

but even based on our experience with that so far, that's not sufficient. There are going to need to be more breakthroughs if you want these to run both well and efficiently. And efficiency

is important. At the scale we're operating at, if we want everyone to be able to create, let's say,

a custom avatar from a photo and a text prompt, that's got to be something that's computationally

reasonable. Or, even forgetting about the compute cost, the lag time waiting in line to get your avatar will just be too long, right? So we need to make this run really efficiently,

really scalably, so that everyone can be whoever they want to be on the platform. And that's just

avatars. Then think of all the other creation you want to get into. Yeah, I want to build a

custom house. What tools can be available to help you build a custom house? We have a lot of houses

on the platform. We can do a lot of training there. It can learn a lot about that. But again,

we've got to develop what a representation is for these sorts of objects. What are the key things the

model's got to learn on? You can easily see a result where you get house-like things with no

doors or windows, for example. That'd be very, very easy to have happen, right? So I think it's about presenting and structuring the models so that you know what it means to consume, and then scoring the results of all these experiments.
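As a toy illustration of what "consuming" 3D data can mean, here is one naive encoding, purely for intuition and not Roblox's approach: voxelizing a point cloud into a fixed-size occupancy grid, the 3D analogue of a pixel image:

```python
import numpy as np

# Naive 3D encoding sketch: normalize a point cloud into a unit cube and
# mark which cells of a fixed-resolution grid contain at least one point.

def voxelize(points, resolution=32):
    """points: (N, 3) array -> (res, res, res) {0,1} occupancy grid."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    # Normalize into [0, 1), then into integer voxel indices.
    norm = (points - mins) / np.maximum(maxs - mins, 1e-9)
    idx = np.minimum((norm * resolution).astype(int), resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# A toy box-shaped point cloud standing in for real geometry:
cloud = np.random.rand(2000, 3) * [4.0, 3.0, 5.0]
grid = voxelize(cloud)
print(grid.shape, "occupied voxels:", int(grid.sum()))
```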

Do you imagine that as you continue to roll out new AI features and content creation accelerates, you'll have to then fight AI with AI from a moderation standpoint? Because, you know, Roblox is an open platform where everyone can create, but there are some things you obviously don't want shown to certain people on the platform, or just not built at all.

So how do you manage the other side of this equation, which is the moderation?

I think that's a great question. So I mentioned two key datasets that we're focused on: 3D generation and, you know, avatar behavior. The third big one, though, that we really have an extensive dataset on is safety data: understanding what's acceptable and not acceptable for different age categories of behavior. We get that both through things our own moderators have done and through comments and abuse reports from our community, continually. And so that dataset is just getting

richer and richer for us. It's one of the best datasets we have in the company. So what does

that mean for the future of generative AI? Well, it means a few things. One, it means,

and I'm going to move away from generative here and just talk about like large models because

on the safety side, you're not really generating, you're evaluating, right? And if you think about

the root for these large language models, it was all around natural language processing tasks.

A lot of safety tasks look like that in some way, shape or form, where you're trying to understand

the context or the content. For example, what is bullying is not something you can get through

just understanding word patterns, simple word patterns, right? Like bullying is very contextual,

as an example. And most, it turns out, you know, outside of just like poor language,

most non-civil human behavior tends to be very, very contextual. And I think there's a place

where these models have already started to show for us some incredible improvements. In fact,

we'll be launching soon our first real-time voice moderation system. So we launched voice

a number of quarters ago, but we kept it very small. And safety there was based on the age of participants and kind of responsive moderation to abuse reports. In order to expand

that, we need this to be automated. We need it to be real-time. So we have built a model that

is starting to do a great job of understanding when you're using inappropriate language, when you're bullying, when you're making racist comments. There's a bunch of categories that it can classify directly into. And it can also give a user real-time feedback. We've seen with voice, for example,

that people will, you know, kind of devolve to the lowest level. So if one person's behaving badly, it encourages others to behave badly. If you give them a little nudge, we've started to see: oh, wait, someone just called me on that behavior. And we don't even have to get

to the extreme of real moderation actions and so on. Eventually, this model will turn into that as well: if someone persists in bad behavior, it will have high enough confidence that it can take action, turn your voice off, for example, or flag them for a ban or something like that. But I don't think we'll often have to get there. So it's one example of using large language models to make the platform safer. In terms of AI-driven generation, the other thing you can do

is you can train safety directly into the models you're building. You can work to make sure things

that you consider not appropriate just don't show up in the creation. So in a sense, these are an

advantage over, let's say, the former, much more manual generation tools, where that's kind of almost impossible; you have to put it through a separate moderation step. With these sorts of generative tools,

we can kind of make civil behavior in generation all in one because it comes down to how you train

the model and how you build the model and what you're looking for as behavior. So I think there's a

huge opportunity there as well to just make the platform more civil. And to do it, by the way,

in a way that is age-appropriate. As I believe you're aware, we recently launched experience guidelines. You know, 13-plus was the initial launch, and we're looking at 17-plus, which we've announced. And in doing those, these models, these generative things, will behave differently. There are

different standards for those. And I expect long term, there'll be different standards around

the globe. Right. I like to think of the dilemma of the beer can. Where is that appropriate? Well,

it depends on age in a lot of places. But standards in the United States and Europe are going to be

different. And standards between US and Europe and another part of the world might be very different

on how that's perceived. We need to understand all of that and take it into account. I think these

large models give us a great opportunity to really dial that in without a lot of human intervention.
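A hedged sketch of the evaluation side described here: an assumed classifier scores an utterance against abuse categories, and thresholds that vary by age band (and, in principle, by country) pick between no action, a nudge, and a mute. Every category, score, and threshold below is invented for illustration:

```python
# Illustrative policy gate: model scores -> age-dependent action.
NUDGE, MUTE = "nudge", "mute"

THRESHOLDS = {            # per age band: (nudge_at, mute_at)
    "under_13": (0.30, 0.60),
    "13_plus":  (0.50, 0.80),
    "17_plus":  (0.70, 0.95),
}

def score_utterance(text):
    """Stand-in for a real classifier; returns per-category scores."""
    return {"bullying": 0.55, "profanity": 0.20, "racism": 0.05}

def moderate(text, age_band):
    nudge_at, mute_at = THRESHOLDS[age_band]
    worst = max(score_utterance(text).values())
    if worst >= mute_at:
        return MUTE
    if worst >= nudge_at:
        return NUDGE
    return None  # no action

print(moderate("example utterance", "13_plus"))   # -> "nudge"
print(moderate("example utterance", "17_plus"))   # -> None
```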

So it sounds like you can get much more granular with how you address moderation without incurring

too much overhead and cost, because you're able, you know, in essence, to embed a lot of this into the content generation. Which is exactly right. Yeah, that's really

interesting. Maybe just taking this up a level: how do you think generative AI will

impact kind of just the Roblox platform generally, right? As in, do you expect more experiences?

And do you expect like higher quality experiences on average? And like, how do those two kind of

factors battle each other out, right? Because the higher quality experiences kind of have to

get through the noise of more experiences on average, right?

That's also a great question. I expect both. I expect we'll lower the bar on what it takes to create, and we'll forever make it easier to create a more immersive experience, right? So you'll get both of

those things. Going back to the demo that I showed with Morgan Stanley, you'd start to see how you

don't need to know the catalog. The AI can go find stuff for you and can find the right thing,

it can ask for your preferences, right? And you can start to just build something out very,

very quickly with that. How do we differentiate the great experiences from the not so great

experiences? Well, we've been investing for years on how we do recommendation or recommendation

system, our discovery system. I mentioned we have deployed semantic search which makes search

across the platform much realer, much more accurate based on what you're trying to find.

But the same goes for our discovery systems where over time, I expect we'll be taking these millions

and millions of experiences and being able to understand each of us individually. Dan will get

a different set of recommendations than Andrew, than Nick. We're just going to get different

things based on what we've shown we'd like to do in these experiences. We're already starting to

see the dividends from that. Our growth in 17 and up, you may not be aware, but that's our fastest-growing category at this point. And collectively, 17 to 24 is our largest age group on the platform at this point. I'd say if there was one thing that drove that,

there's been many, many things, but the most important is differentiation on recommendations,

helping people find the content on the platform that excites them. And I think we just have to

keep going with that. And I'm very, very optimistic of the technology being there to

help us understand very early that this experience has a spark and we need to promote it up for a

certain class of person on the platform.
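For intuition on the semantic search and recommendation idea, here is a toy sketch: items and a query are embedded into one vector space and ranked by cosine similarity. A real system would use a learned text encoder; the character-count projection here exists only to keep the example self-contained:

```python
import numpy as np

# Toy semantic search: embed texts, rank by cosine similarity.
rng = np.random.default_rng(0)
PROJ = rng.normal(size=(256, 64))   # fixed random projection, stand-in encoder

def embed(text):
    counts = np.zeros(256)
    for ch in text.lower():
        counts[ord(ch) % 256] += 1
    v = counts @ PROJ
    return v / np.linalg.norm(v)    # unit-normalize for cosine similarity

experiences = ["obstacle course racing", "virtual fashion show",
               "zombie survival shooter", "pop music concert venue"]
index = np.stack([embed(t) for t in experiences])

query = embed("live concert with friends")
scores = index @ query              # dot product of unit vectors = cosine
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {experiences[i]}")
```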

Yeah, speaking of 17 plus experiences, I got to tinker around with, I think, a bar experience where you're drinking liquor. I found it incredible

to see other people purchase virtual alcohol within the experience. And it's really admirable how

Roblox has really pioneered putting value into these digital assets and just encouraging users to

purchase virtual goods as a form of self-expression. And one could imagine that it's going to get

easier to create such goods with generative AI, both from a developer standpoint and also for

consumers. So I'm just wondering if you can maybe daydream with us on how you envision

consumers interacting directly with generative AI on Roblox in the future.

Sure. If you don't mind, I want to just back up on a few things you said there. First,

yeah, you were at Top Room, a 17-plus experience. And one thing I'll call out that's unique on Roblox: in order to be in there as 17-plus, you had to show ID, right? So one thing we're starting with,

again, is safety and stability. You're not going to have someone under 17 casually walking

in that experience because they found it on the platform, right? They have to have the government

ID to get in. And we're being just really, really careful with all of that. And then, yeah, you end

up in this Top Room experience, which is really just, if you look at what it really is at its core,

it's a hangout for people who are over 17, who want to be with people who are over 17 and don't want to be with younger users. It's a little bit of an adult-swim sort of experience, right? Like, it's really

just a space for people who are older on the platform to hang out. Then you can see the potential.

If you walk around, they have a stage set up, right? You could imagine there being real-time gigs

of various sorts on the platform with that, right? Or maybe pre-recorded gigs or whatever it is,

but things that kind of fit into that whole genre of adults hanging out with adults, right?

So that's, I mean, even before we get to, like, what it means to be creating objects on the

platform or, you know, consumer goods, I want to call out the opportunity with 17-plus: it can be both incredibly safe for the folks who are not yet 17 and also kind of give an outlet to people

who are older. So now let's talk about consumer goods. I think we've already started to see an

explosion there. I'm not sure if you checked out the Vans experience, but the sort of immersion

you can have with a product or a product line is unlike anything else. You're sitting there and

you're playing with it. This is so much better than a webpage. You're sitting there, you're

kind of trying out a skateboard, right? You're seeing how you look in a particular set of clothing.

We've had incredible luck with Vans, with Gucci, and others. Like, these brands are all on

the platform and they're just growing. We have more and more of them every day. It's an incredible

way to connect in both directions. It's not an ad being forced upon you. It's a thing you're electing

to go do and engage with because you're passionate about the products, right? In the same way, kind

of, for certain categories of things, we all have the stores we like to go into and just browse. We're not going in to buy right now necessarily. We just want to check out what's hot,

what's going on, what do I want to learn about. It's an educational experience to some degree.

I could see then taking that into the general world, where maybe you get ideas for customization. Let's

say it's a handbag or a shirt or any other sort of clothing item, the ability to tweak it real time

and get an idea what that might look like. You add to the fact that you're building your own

avatar to go with that. So maybe when you're doing fashion, you want an avatar that looks

incredibly like yourself. As opposed to another experience, you might want something that's a

little bit more, you want to be someone else. You want to escape a little bit. That all becomes

quite possible with that sort of flexibility on the platform. So I think, as for limits here, there are no limits; it's limitless where this could go. And we're starting to experiment with this,

engaging more with brands. We have portal ads on the platform, which are completely new things, but they complement these sorts of things very well: 17-plus experiences, or age-based experience guidelines in general. So you can really tailor who you're talking to and who you're connecting with at a

given time. And again, bringing that all back to keeping things civil, keeping things safe based

on your age category and what's appropriate in the country you're living in. I wanted to ask one

question. Just playing all of this out, hearing all of this content creation acceleration,

you had mentioned before the Meta Quest beta. And when you start to couple the environments and

experiences that are available on Roblox with virtual reality, that's starting to get into the

Ready Player One metaverse, fully immersed, buying digital assets that also may be tied to the real

world. Is that kind of the end state here? Or do you still think that it's important to be content

or platform agnostic? You mentioned all of the other platforms you're on. Or is VR kind of the end goal? And when you couple that with generative AI, I mean, it really starts

to take on a life of its own. And I just immediately go to the Ready Player One movie.

And I'm like, that's, that's where all of this is headed in five years. And Roblox seems like

they're leading the way on both the AI front and just being in all or on all of the platforms you'd

want to be on. Yeah, so I think VR is a really interesting format for consuming 3D virtual worlds.

But I think what we've seen is it's by far not the only way people want to consume content; most of the time, they want to consume it wherever they have the time and the place to do it. So I think VR is an incredibly compelling

way to consume 3D immersive content. But we're also seeing that there's just a strong driver for

people to be able to consume content wherever and whenever they have the time, the energy,

the inclination to do so. We've seen things can be incredibly compelling and immersive on a phone.

And we see that just by how many folks engage in Roblox with a phone. And then what's also

cool is they can when they have the time take their VR headset and such as your headset you

generally need a room. Like I don't know about you guys, but if the dog walks into the room when

I'm using VR that doesn't go well, right? Like I could end up tripping over the dog.

You know, so you need that space, but it's just so immersive. It can be a huge amount of fun in a

different way. And the great thing is Nick could be on with VR and I could be on my phone and we

can be interacting in the same experience in our own ways as the way that's appropriate for us.

So at least for now, I think that choice is going to be very, very important. And I think we'll see

how people engage with VR and there's clearly a lot you can do with that. And I think we all

believe AR is coming in some form and we have to see what that means. But at the same time, I think

people are going to get a lot of pleasure out of the device they have in their back pocket, pulling it out and using it whenever they like, right? And being able to engage with some of this,

you know, maybe you're thinking of going shopping, you want to engage with some of these brands

before you go shopping, you're not at home with your VR headset, you just have your phone and

that's how you're going to do it, right? So I think it kind of goes both ways. And

I think the future is bright in terms of new formats and new hardware platforms that are

available. And I feel we're set up very well from a technical point of view to manage all the different

platforms. I think just adding on to VR, I think voice and like a 3D spatial voice is kind of the

dominant form of communication on Roblox today. I was just wondering how that extends, does it extend well to VR, and how you would adjust for private communications in VR versus, you know, a blast to everyone, and how you're just thinking about

socialization overall within VR. So this is really interesting from just even a technical point of

view. We're focused on all sound in Roblox being spatial at its root. So that means if I'm closer to you, I hear you really well, I can even whisper in your ear; you can back up and you don't hear it as

well or you don't hear it at all. And that of course translates very well into VR, but also

just translates to anything where you're trying to do 3D. There will be things like, you know, group chats and so on, but we're in a sense going to mimic those through virtual headpieces and cell phones and so on, where, you know, it's just like we are here: we can all hear each other

clearly even though we're not near each other, right? And that's the way we're going to kind of

build up the fundamentals in Roblox for not just voice, but all sound, it can be music, right?

So imagine in a nightclub, there'll be loud parts, you can back away and go talk to someone,

it'll be easier to hear. And we're very excited about that future from a sound point of view. Then, when you get into private communication, there's a few ways we're going to do that. I think for starters, just to get this right, private communication will be a more explicit flag. It won't be that you're accidentally in a corner where no one can come by. Because imagine

you're in a virtual representation of Central Park in New York, you think it's private,

but a nine-year-old wanders by. Whatever language you're using when that happens, it's not okay, nor would it be in real life. If you're sitting with your friend talking in very

colorful language and a nine-year-old comes by, their parents are probably not going to be really

thrilled about what their kid just heard even though they wandered into your space, right?

So we're going to start, probably, with making it very explicit when you share spaces in a private way,

and there the moderation rules will probably be very different, the content acceptability

rules will probably be very different than when you're in a public space where it's kind of you

have to assume anyone's around, just like the way you and I might talk in the middle of a shopping

mall would be pretty different than if we were hanging out at one of our houses or something

like that. That all goes into what civility means. So civility is: who's speaking, kind of what is their background, what are the laws of the country they're in, but also, is it private or is it

public communication, right? And looking at all that as a fairly complicated matrix of what behavior

is okay when, but then making that really obvious to everyone, it should mimic the real world pretty

closely, which we think will make it fairly intuitive.
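For intuition on the spatial-sound behavior described earlier in this answer (whisper up close, inaudible far away), here is a small sketch of distance-based gain; the constants are illustrative, not Roblox's actual rolloff parameters:

```python
import math

# Distance-based attenuation: full volume inside a reference radius,
# inverse-distance falloff beyond it, silence past a maximum range.

def spatial_gain(listener, speaker, ref_dist=2.0, max_dist=40.0):
    d = math.dist(listener, speaker)
    if d >= max_dist:
        return 0.0            # out of earshot entirely
    if d <= ref_dist:
        return 1.0            # close enough to whisper
    return ref_dist / d       # classic inverse-distance rolloff

me = (0.0, 0.0, 0.0)
for other in [(1, 0, 0), (8, 0, 0), (50, 0, 0)]:
    print(other, "->", round(spatial_gain(me, other), 3))
```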

I guess, just on that Central Park example: in terms of moderation, do you think these AI models can get to a point where moderation is actually preemptive, as opposed to reducing the latency so much that it feels almost instantaneous? Yeah, I absolutely think we can get there. So I mean, at the end of the day,

do I think we can build a model that beeps things out that you don't want to hear? Absolutely, right?

Or maybe it doesn't even beep things out, it substitutes words. So I think that's all within reach. Our models

right now have a few seconds latency ranging from let's say five to 10 seconds on what we're seeing,

but this is our first generation of it. And we're not even running it yet on your end device, which I think is a great option if you want to protect yourself: you could be running a small version of the model as that ultimate defense right on your end device, while we do more sophisticated processing up in our cloud, and we can give you some of that direct control as well. There's a

lot of possibility in getting this to be real time, and that's good in a few ways. We know with anything around nudging people towards civility, real time makes a huge difference. So the sooner

we can give a speaker feedback and the better we can protect you without you just having to be

frustrated and complaining about it, the better everyone is. And that's absolutely I think the

future. And Roblox is a global platform. So I assume if you can bleep out words in real time, you could probably also translate into different languages in real time as well,

because that would be pretty incredible. And you'd extend the platform to different countries and

geographies, and, you know, people would be forming friendships in different languages. Is that also possible? Yeah, so today, voice, for moderation reasons, is restricted

to English. We're going to roll that out in more languages as we get moderation built for those languages.

But the interesting thing about spoken language, unlike, let's say, typed text, is that people in certain parts of the world naturally switch languages mid-sentence, even as they're talking, which makes this kind of exciting. But you're right overall, the long-term vision of automatic

translation, we think, is totally in reach. You can kind of think about what we're doing now as translating from, let's say, arbitrary English to safe English, and the same sorts of approaches can long term be used for real-time translation.

I think there's a ways to go; that would be extremely computationally expensive right now, and at the scale we're doing voice today, that wouldn't be okay. But where this tech is going, that seems totally within bounds and in sight. I think it's a really interesting

technical problem to work on.
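To make the "arbitrary English to safe English" framing concrete, here is a sketch of the pipeline shape, with every stage stubbed out; a real system would put an ASR model, a learned safety rewriter, and a translation model behind these functions:

```python
# Illustrative three-stage voice pipeline: transcribe -> safety-rewrite
# -> translate. Each stage is a stub with the same shape a real model
# would have, so stages are independently swappable.

def transcribe(audio_chunk):
    return "example transcript with a bleepable word"

def to_safe_english(text, banned=("bleepable",)):
    # Trivial stand-in for a safety rewriter: mask banned words.
    words = [("*" * len(w)) if w in banned else w for w in text.split()]
    return " ".join(words)

def translate(text, target_lang):
    return f"[{target_lang}] {text}"   # stand-in for a translation model

def voice_pipeline(audio_chunk, target_lang="es"):
    """Compose the stages over one chunk of audio."""
    text = transcribe(audio_chunk)
    safe = to_safe_english(text)
    return translate(safe, target_lang)

print(voice_pipeline(b"\x00\x01"))
```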

Yeah, just moving back to your earlier example with Vans World. I just remember, with, I think, Nike, when the Pharrell Williams Human Race sneakers first came out, where you could customize the text on the sneakers. I thought that was the most revolutionary thing in the world, right? Just giving the users some customization abilities.

But it seems like with asset generation and material generation, a very near term possibility

for consumers themselves, we can probably see a lot more custom assets being created in these

branded environments, but also translated into like their physical counterparts, right? And

I'm just wondering if you can kind of give a general timeline or maybe vision on when consumers

can start working with assets directly, then maybe environments, right? And then maybe entire experiences afterwards. How many years down the line do you think that is?

Right, that's certainly a good question. So I think I can only speak to what I'll call the

Roblox end of that. Like I think the ability to start doing arbitrary generation as a user of

different components or customization is a small number of years away. I don't know if it's one

year, if it's two or three or something like that, but you know, it's not five, 10 years away.

And I think when we do that, we can absolutely give signals back to creators that something is

being created and what it looks like. And we've built all the APIs at this point such that you could upload that to the permanent catalog. Then there's the

other part: the willingness, the desire, and the tech, if you are a clothing manufacturer, to take that and turn it into a real object and ship it. I'm not particularly well qualified to talk about the supply chain and what that looks like. It feels like something that should be very doable based on some of the things we've seen emerge, but we'll have to wait and see, or you'll have to find another person for a podcast to answer questions about the supply chain side. But we'll give them the ability to understand what it was: a 3D representation of it, a description of the textures or the text or whatever it is they've done to it, and let them run with it.

Got it. Thank you. I guess we'd be remiss if we didn't mention that the Roblox developer

conference is coming up. I think September 8th through 9th. We're super excited to see what you

have in store. We know that as we've discussed material generation and code assist are now live,

but can you give us a little sneak peek into what we can expect to see at RDC?

Yeah, so you're going to see a lot of cool stuff at RDC. It's, you know, our big event for the

year and we like to talk about where we're going in the future with that. I don't want to give too

much away because I want to leave it for RDC, but there will be a lot of conversation around

generative AI. We'll be talking about where we're going with voice and some of the stuff around

voice moderation. We'll be talking about the next generation of how you build avatars. We'll be

talking about where Code Assist and Material Generator go and then ultimately where things

like the demo you saw with Morgan Stanley are going. So we'll be talking about all of that at RDC.

I'll leave the specifics for early September and you can catch up with them there.

Awesome. We can't wait. Awesome. Well, thank you again so much, Daniel, for your time today and

sharing Roblox's incredible work and vision with generative AI, the metaverse, and modern

socialization overall. We're so excited to see what Roblox has in store at RDC this September,

8th through 9th. And if our listeners want to learn more about Roblox's latest technological

feats, where should they go? They should just go to our website, our tech blog, and they can read

all about what's going on there. Is the tech blog different from research.roblox.com?

It is. We have a research site that focuses on particularly research contributions,

but we also have a tech blog that's a little bit broader and just highlights the latest

interesting cool tech things we're doing. Well, both are cool, in my opinion. But okay,

thank you again, Daniel, and yeah, speak again soon. My pleasure.

ARC believes that the information presented is accurate and was obtained from sources that

ARC believes to be reliable. However, ARC does not guarantee the accuracy or completeness of

any information and such information may be subject to change without notice from ARC.

Historical results are not indications of future results.

Certain of the statements contained in this podcast may be statements of future expectations

and other forward-looking statements that are based on ARC's current views and assumptions

and involve known and unknown risks and uncertainties that could cause actual results,

performance, or events to differ materially from those expressed or implied in such statements.

Machine-generated transcript that may contain inaccuracies.

On today’s episode of FYI, hosts Andrew Kim and Nicholas Grous are joined by Roblox CTO Daniel Sturman, as they venture into the dynamic world of this gaming platform. Roblox empowers users to both create and experience unique content. Join us as we explore the multifaceted ways people leverage this technology, from crafting games and orchestrating music concerts to forging immersive brand experiences. We’ll also venture into the future of virtual reality (VR) and emphasize the significance of offering users the choice to engage with content in their preferred manner. Stay tuned for a discussion with Daniel, as we discuss Roblox’s success, confront the challenges in rendering realistic 3D generations, and discover how the company is ensuring safety and civility on the platform.

https://ark-invest.com/podcast/roblox-the-virtual-playground-where-gaming-fashion-and-real-world-brands-collide/

" target="blank">
“We try very hard not to create any of our own content. The general idea behind Roblox was one of bringing 3D simulation to people and letting them create whatever they wanted out of that.” – Daniel Sturman





Key Points From This Episode:

Roblox as a content creation platform
Showcasing an early demo of an artificial intelligence (AI) system; enhancing product features
Utilizing generative AI to boost casual creative accessibility
Addressing challenges in 3D generation and intricate avatar movement
Navigating inefficiencies with models not designed for 3D data
Leveraging rich safety data to refine generative AI models
Anticipating immersive experiences, personalized touches, and expansive growth
Prioritizing safety and custom experiences
Exploring VR and phone avenues for immersive content creation
Enhancing virtual reality communication through spatial sound; ensuring privacy and civility
Innovating with real-time word censoring and substitution in models
Approaching the horizon of arbitrary generation of user components