Will superintelligence destroy job security?
Improvements over the last decade in machines’ ability to generate images and text have been staggering. As is often the case with innovation, progress is not linear but comes in leaps and bounds, which surprises and delights researchers and users alike. 2022 was a banner year for innovation in generative AI, built on the advent of diffusion methods for image generation and of increasingly large-scale transformers for text generation. While that year delivered a major leap forward for the entire natural language processing (NLP) industry, there are three reasons why generative AI models were the first to stir the public’s excitement, and why they will remain the main point of entry into what language AI can do for the time being.
What’s behind the generative AI excitement?
The most obvious reason is that generative models fall into a very intuitive class of AI systems. These models aren’t used to create a high-dimensional vector or some uninterpretable code, but rather natural-looking images or fluent and coherent text: something that anyone can see and understand. People outside of machine learning do not need specific expertise to judge how natural or fluent the output is, which makes this part of AI research seem much more approachable than other (perhaps equally important) areas.
Second, there is a direct connection between generation and how we evaluate intelligence. When examining students in school, we value the ability to generate answers over the ability to discriminate among answers by selecting the right one. We believe that having students explain things in their own words demonstrates a better grasp of the topic, ruling out the chance that they’ve simply guessed the right answer or memorized it.
So when artificial systems produce natural images or coherent prose, we feel compelled to compare that to similar knowledge or understanding in humans, although whether this is overly generous to the actual abilities of artificial systems is an open question in the research community. What is clear from a technical perspective is that the ability of models to produce novel but plausible images and text shows that these models contain rich internal representations of the underlying domain (e.g., the task at hand, the sort of things the images or text are “about”).
Furthermore, these representations are useful across a wider range of domains than just generation for generation’s sake. In short, while generative models were the first models to capture the public’s attention, there will be many more valuable use cases to come.
One thing from another
Third, the latest generative models can generate conditionally. Instead of sampling existing images or snippets of text, they can create text, video, images or content in other modalities that is conditioned on something else, such as partial text or imagery.
To see why this is important, one needs to look no further than most human activities, which involve generating something depending on something else. To give some examples:
- Writing an essay is generating text conditioned on a question/topic and the knowledge and views contained in our own experience and in books, papers and other documents.
- Having a conversation is generating responses conditioned on our knowledge of the world, our understanding of the pragmatics the situation calls for, and what has been said up to that point in the conversation.
- Drawing architectural plans is generating an image based on our knowledge of architectural and structural engineering principles, sketches or pictures of the terrain and its topology/surroundings, and the (often underspecified) requirements provided by the client.
Most intelligent behavior follows this pattern of producing something based on other things as context. The fact that artificial systems now have this ability means we’ll likely see more automation in our work, or at least a more symbiotic relationship between humans and computers to get things done. We can see this already in new tools to help humans code, like CodeWhisperer, or help write marketing copy, like Jasper.
Today, we have systems that can create text, images or videos based on other information we feed to them. That means we can apply these generations to problems and processes for which we once needed human experts. This will lead to additional automation, or to more symbiotic forms of support between humans and artificial systems, with both practical and economic consequences.
The new foundational tools
For the rest of 2023, the big question will be what all this progress really means in terms of potential applications and utility. It is an exceedingly exciting time to be in the industry because we are looking to do nothing less than build foundational tools for intelligent systems and processes, make them as intuitive and applicable as possible, and put them into the hands of the broadest possible class of developers, builders and innovators. It’s something that drives my team and fuels our mission to help computers better communicate with us and use language to do so.
While there is more to human intelligence than the processes this technology will enable, I have little doubt that, paired with the boundless human ability to innovate on the back of new tools and technology, the innovation we’ll see in 2023 will change the way we use computers in disruptive and wonderful ways.
Ed Grefenstette is head of machine learning at Cohere.