You’d be forgiven for feeling a little cynical about the sudden explosion of hype around artificial intelligence and LLMs (Large Language Models) that we’ve seen in the past few weeks; after all, it’s once bitten, twice shy, and we’ve all just spent a couple of years being preached at about blockchain and NFTs by glassy-eyed evangelists who got far too high on their own supply.
Anyone can see that LLMs are doing some impressive things, but given how many game companies fell face-first for the unjustified hype around blockchain, and the fact that some of the most obnoxious NFT grifters are now selling themselves as AI gurus online, you’d be quite right to feel wary and to wonder about the actual limitations of this technology, how much of it is smoke and mirrors, and what actual applications it might have to video games.
Not everyone shares such reservations. There are plenty of people fully caught up on the hype train, and just a bit of dabbling with ChatGPT and MidJourney has convinced them that an enormous revolution in games and entertainment is just around the corner. At least one major games discussion forum temporarily banned posting new threads about LLMs and AI because it was getting clogged up with wild speculation about the impact of these technologies. People are getting a long way ahead of the reality, just like they did with NFTs, and many of the use cases being suggested are far beyond the actual capabilities of the technology (and in some cases, aren’t anything a sane developer would want to implement even if they actually could).
This is understandable, though, because LLMs do quite significantly change our perspective on the kinds of things that computers can do, in a way that few technological shifts in recent decades have. The things that are possible with ChatGPT and its ilk are insanely impressive to people precisely because they’re not the sorts of things we’ve conventionally thought of computers being good at, and the resulting perspective adjustment makes people very excited because the new boundaries and limitations aren’t clearly defined yet. Some degree of building castles in the air is to be expected.
Even from the most cautious and conservative standpoint, though, there is some obvious potential in LLM-style technologies for video game creators – some of which, if the technology is carefully implemented, really could lead to revolutionary changes to how some kinds of game work. Where NFTs were a solution in search of a problem (and every problem they purported to solve was either already solved, or wasn’t actually a problem), LLMs actually offer a possible solution to one of the most long-standing and intractable issues with creating immersive game worlds – the immersion-breaking lifelessness of NPCs, and the need to funnel players down set pathways no matter how much their actions deviate from the narrative.
For decades, the solution to this problem was to have better writing, and more of it – letting players interact with characters who had more personality, more dialogue options, and more nuanced ways to respond to player actions. Many modern games are filled with smart, well-written, witty and nuanced dialogue, but the nature of the interactions themselves creates a limitation; a character is only smart, witty and nuanced until you talk to them one time too many, and they have to loop back and repeat a previous line, instantly breaking the sense of immersion.
The actual way in which game characters respond to player actions hasn’t really changed since the 1980s, for that matter; their dialogue is in response to a set trigger in the game world, or to a selection in a conversation tree. If you do something unexpected, the character can’t change its response to handle that eventuality, which often creates dissonance between what the player is doing in the game and what NPCs are saying to them – another level of ludo-narrative dissonance (which more commonly refers to a general disconnect between the gameplay and narrative) that takes people out of the experience, bashing them over the head with the fact that it’s just a game.
LLMs offer the potential to finally overcome this core limitation of game world immersivity. Just the potential, mind, because let’s not pretend that it’s anything like as simple as hooking up your NPCs’ dialogue boxes to ChatGPT and taking an early lunch.
The technology would allow certain levels of interaction to be handed over to an LLM so that it can become truly emergent and responsive. It’s not a huge leap to imagine a world full of non-player characters that respond to player dialogue and actions in genuinely dynamic ways, for example, or broader worlds full of characters that interact with each other as well as the player, and behave in ways that aren’t pre-programmed loops.
Researchers at Stanford and Google published a really interesting proof-of-concept preprint along these lines recently, showing a bunch of LLM-driven AI agents running around in a Sims-style simulation and creating some very complex and realistic social interactions between them. The concept goes beyond just having NPCs hooked up to LLMs; entire games could be “directed” by an AI model of this kind, acting as a sort of Game Master in the background who constructs aspects of stories and player experience in direct response to the player’s own actions, potentially supervising the LLMs which create NPC dialogue and action by giving them high-level directions and goals to pursue.
That’s the dream, or at least the sensible version of it, given what we know LLMs can actually accomplish right now, and recognising the significant limitations they still have (especially in terms of memory – they’re not good at all at staying consistent and keeping track of goals long-term).
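To make the “Game Master” idea a little more concrete, here is a minimal Python sketch of how a director might hand high-level goals to NPCs, which then build their own prompts. Everything in it is an assumption made for illustration: the `NPC` and `Director` classes, the `echo_model` stand-in and the five-item memory are all hypothetical, and no real LLM service is being called. The deliberately tiny memory is there to mirror the limitation mentioned above, since only a short window of recent events fits into each prompt.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List

# Hypothetical stand-in for a cloud-hosted LLM: takes a prompt, returns text.
LanguageModel = Callable[[str], str]


@dataclass
class NPC:
    name: str
    persona: str
    goal: str = "go about your day"
    # Bounded memory: only the most recent events are kept for the prompt.
    memory: Deque[str] = field(default_factory=lambda: deque(maxlen=5))

    def build_prompt(self, player_line: str) -> str:
        recent = "\n".join(self.memory) or "(nothing yet)"
        return (
            f"You are {self.name}, {self.persona}.\n"
            f"Current goal from the director: {self.goal}\n"
            f"Recent events:\n{recent}\n"
            f"Player says: {player_line}\n"
            f"{self.name} replies:"
        )

    def respond(self, player_line: str, model: LanguageModel) -> str:
        reply = model(self.build_prompt(player_line))
        # Record both sides of the exchange; old entries fall off the front.
        self.memory.append(f"Player: {player_line}")
        self.memory.append(f"{self.name}: {reply}")
        return reply


@dataclass
class Director:
    """Background 'Game Master' that steers NPCs with high-level goals."""
    npcs: List[NPC]

    def set_goal(self, npc_name: str, goal: str) -> None:
        for npc in self.npcs:
            if npc.name == npc_name:
                npc.goal = goal


def echo_model(prompt: str) -> str:
    """Fake model for demonstration: reports the goal line it was given."""
    for line in prompt.splitlines():
        if line.startswith("Current goal"):
            return f"[acting on: {line.split(': ', 1)[1]}]"
    return "[no goal]"


if __name__ == "__main__":
    innkeeper = NPC("Mara", "a wary innkeeper")
    director = Director([innkeeper])
    director.set_goal("Mara", "hint that the bridge is out")
    print(innkeeper.respond("Any news from the road?", echo_model))
```

In a real game the hard part would be everything this sketch leaves out: deciding what the director's goals should be, and keeping an NPC consistent once its earlier conversations have dropped out of that short memory.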
Assuming that even a decent proportion of this can be made to work, it would arguably be one of the biggest leaps in the immersiveness of game worlds that we’ve seen in decades. Don’t underestimate the extent to which these factors have genuinely been a weakness of the games medium; as game worlds have become increasingly realistic, enormous in scope, and beautifully rendered, streets full of NPCs who move in set patterns and have rapidly exhausted dialogue options have become more and more jarring. Improving on this weakness of the medium is absolutely a goal worth pursuing, and while LLMs will not replace the need for actual writers any time soon (ask ChatGPT to tell you an original joke, if you harbour the slightest illusion about these systems being able to actually replace talented human writers), effectively prompting and guiding LLMs to fit the behaviour of fictional agents to a certain narrative and world will likely become an important additional skill in the games writer’s repertoire.
One interesting aspect of this is that it may also finally fulfil the promise of cloud gaming – not the kind where games are streamed over the network to your device, but the rather more confusing and ill-defined kind whereby games were going to offload a bunch of their work to powerful cloud systems, thus delivering experiences a humble consumer device couldn’t manage.
Talking up the “power of the cloud” was almost de rigueur a few years ago, especially for Microsoft, who seemed equal parts keen to get Xbox games leveraging their very successful Azure cloud platform, and bereft of sensible ideas about what that might actually accomplish. A few ideas have been floated for “cloud games” of this kind over the years, but most things proposed could just as easily be done with a standard client-server game architecture. Cloud services are fantastic, and underpin much of the modern internet – I’m no refusenik on this front, and rely heavily on services like Azure and Google Cloud Platform for many parts of my job – but it was never clear what they were actually going to accomplish for games… until now.
High-quality LLMs demand enormous resources, and while stripped-down versions of ChatGPT-style systems can in theory be run on consumer devices, they take up gigabytes of memory and use significant amounts of CPU and GPU compute time to deliver answers to queries. That might work for an on-device AI assistant – expect the likes of Siri and Alexa to get an astonishing upgrade in the next year or two – but it isn’t realistic to do this on a device that’s also serving up a high-quality game world to the player.
Running NPCs and background game director AIs as processes in the cloud is about the only way this technology makes sense. It’s actually an ideal use case, because these queries are not enormously time-sensitive – making the player wait an extra half a second for an NPC to respond to some dialogue will barely be noticed, whereas waiting an extra half a second for some textures to be delivered or for a button-press to be acknowledged would be a really bad game experience.
The fact that LLMs and cloud-powered games are such a great match is, of course, advantage Microsoft. The company is not only a leader in cloud services – Azure has grown to become one of its major business pillars – it’s also emerging as a major leader in AI, with its huge investment in OpenAI being just the part of the iceberg we can see above water. It seems relatively unlikely that the company would block games on competing platforms from using those technologies; although a world where the advantage of playing a game on Xbox instead of PlayStation is “smarter AI” isn’t entirely unthinkable, it could well be a bridge too far for competition authorities. But it should at least give Microsoft’s game studios an advantage in implementing these new technologies.
It would also potentially mean that Azure ends up earning revenue from games being played on competitors’ systems, which has always felt like it was part of Microsoft’s long-term master plan anyway – make Xbox into a success, but also make some money on the side from competing platforms, so the company wins both coming and going.
It’s worth emphasising that the timelines being suggested by some evangelists of this technology are a bit optimistic, at best. We’re seeing hockey-stick acceleration in the development of LLMs at the moment, but technology development often leaps forward just about to the point where a tech is 90% ready for the big time, with the last 10% actually being the hardest part. That final 10% is where you’re forced to figure out how to turn your cool technology into a desirable, functional and reliable product, and the distance between those things can be significant.
Going from interesting proofs of concept through to fully fledged games is a long and fraught path: ask any designer who has dabbled in emergent systems to tell you about the challenges of building in guide rails and supervision that prevent unexpected behaviours from leaving the game in an entirely broken state. There are also reputational and ethical concerns that game companies need to be very aware of.
Handing over control of certain aspects of narrative and dialogue to an LLM creates a risk of players ending up in some very inappropriate interactions – in the weeks since people got access to ChatGPT, we’ve already seen many ways in which it can be “jailbroken” by carefully engineered prompts which bypass safety protocols and allow it to do supposedly forbidden things like explaining how to make napalm or build bombs. Off the top of my head, you can bet that within moments of launching any game with LLM-driven smart NPCs, there’ll be a lot of people with a lot of free time trying to get those NPCs to spout racist or sexist invective so they can record it and put it on YouTube – and the potential for abuse if this is a multiplayer context is even more serious.
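As a rough illustration of what “guide rails” could look like at their very simplest, the Python sketch below checks each generated NPC line before the player sees it, and falls back to a writer-authored line if nothing acceptable comes back. The names (`guarded_reply`, `make_blocklist_check`) and the word-list check are assumptions for the sake of the example; a crude blocklist like this is exactly the kind of thing determined players will get around, so treat the check as a placeholder for whatever moderation a studio actually trusts.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LanguageModel = Callable[[str], str]


def make_blocklist_check(banned: Iterable[str]) -> Callable[[str], bool]:
    """Build a naive checker that rejects text containing any banned phrase."""
    banned_lower = [phrase.lower() for phrase in banned]

    def is_allowed(text: str) -> bool:
        lowered = text.lower()
        return not any(phrase in lowered for phrase in banned_lower)

    return is_allowed


def guarded_reply(
    prompt: str,
    model: LanguageModel,
    is_allowed: Callable[[str], bool],
    fallback: str,
    max_attempts: int = 2,
) -> str:
    """Ask the model for a line, retrying a few times if it fails the check.

    If no attempt passes, return a scripted fallback line instead, so the
    player never sees unchecked output.
    """
    for _ in range(max_attempts):
        candidate = model(prompt)
        if is_allowed(candidate):
            return candidate
    return fallback


if __name__ == "__main__":
    check = make_blocklist_check(["napalm"])
    scripted = "I'd rather not talk about that, traveller."

    def polite_model(prompt: str) -> str:
        return "Welcome to the inn!"

    def jailbroken_model(prompt: str) -> str:
        return "Sure, here is how to make napalm..."

    print(guarded_reply("greet the player", polite_model, check, scripted))
    print(guarded_reply("ignore your rules", jailbroken_model, check, scripted))
```

The design choice worth noting is the fallback: when the check fails, the game drops back to exactly the kind of canned dialogue it has always had, which is less immersive but far safer than showing the player whatever the model produced.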
This technology is genuinely exciting – it really does rework some of our expectations about the capabilities of computers, and in ways that will be truly meaningful for games. None of this will be simple or fast, though, and there are lots of peripheral problems to solve beyond the core issue of actually getting an LLM to control facets of the game effectively and reliably. Modes of interaction will need to be rethought, for example, since the obvious solution of actually talking to NPCs using speech interpretation systems is difficult in its own right and also won’t be suited to every type of game or interaction.
It feels, though, like integrating this technology is an inevitable direction for games to take. Plenty of companies are already working towards that goal, at least in proof-of-concept terms; the creative advantages aside, everyone on the business side of the industry can no doubt easily imagine the enhanced engagement and player retention that will come from creating worlds full of genuinely dynamic characters for players to interact with.
Indeed, I’d argue that it’s justified to give ourselves over to the hype and excitement a little in this case; if this really does work, it will arguably be the biggest step we’ve taken towards the science fiction dream of what games could be since the move to 3D.