Generative AI (artificial intelligence) and its underlying foundation models represent a paradigm shift in innovation, significantly impacting enterprises exploring AI applications. For the first time, thanks to generative AI models, we have systems that understand natural language at a near-human level and can generate and synthesize output in various media, including text and images. Enabling this technology are powerful, general foundation models that serve as a basis or starting point for developing other, more specialized generative AI models. These foundation models are trained on vast amounts of data. When prompted with natural language instructions, they apply what they have learned in a context-specific manner to generate output of astonishing sophistication. An analogy for generative AI used to create images is the talented artist who, in response to a patron’s instructions, combines her lifelong exposure to other artists’ work with her own inspiration to create something entirely novel.
As news cycles about these advancements eclipse one another, it may seem to many business and executive leaders that generative AI sprang out of nowhere. In reality, these new architectures are built on approaches that have evolved over the past few decades. It is therefore crucial to recognize the essential role the underlying technologies play in driving advancement, enterprise adoption, and opportunities for innovation.
How we got here
The most notable enabling technologies in generative AI are deep learning, embeddings, and transfer learning (all of which emerged in the early to mid-2000s), along with the transformer neural network architecture (introduced in 2017). The ability to work with these technologies at an unprecedented scale, both in the size of the model and in the amount of training, is a recent and critically important phenomenon.
Deep learning emerged in academia in the early 2000s, with broader industry adoption starting around 2010. A subfield of machine learning, deep learning trains models for various tasks by presenting them with examples. It relies on a particular type of model called an artificial neural network, which consists of layers of interconnected simple computing nodes called neurons. Each neuron processes information passed to it by other neurons and then passes the results on to neurons in subsequent layers. The parameters of the neural network are adjusted using the examples presented to the model in training. The trained model can then predict or classify new, previously unseen data. For instance, a model trained on thousands of labeled pictures, some showing dogs and some not, can be used to detect dogs in previously unseen images.
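To make the idea of adjusting parameters from examples concrete, here is a minimal sketch of a single artificial neuron trained with gradient descent. It is an illustration only: the two input features, the labels, the learning rate, and the number of passes are made-up assumptions, and a real deep learning model stacks many layers of such neurons and trains them with a dedicated framework.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """One neuron: a weighted sum of its inputs passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def train(examples, epochs=2000, learning_rate=0.5):
    """Adjust the neuron's parameters a little after each labeled example."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training data: two made-up image features per picture,
# labeled 1 for "dog" and 0 for "not a dog".
examples = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.95], 1), ([0.95, 0.7], 1),
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.05], 0), ([0.05, 0.3], 0),
]

weights, bias = train(examples)

# Score two pictures the neuron has never seen.
dog_score = predict(weights, bias, [0.9, 0.9])
other_score = predict(weights, bias, [0.1, 0.1])
```

After training, `dog_score` lands above 0.5 and `other_score` below it, which is the toy equivalent of detecting a dog in a previously unseen image.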
Transfer learning emerged in the mid-2000s and quickly became popular. It is a machine learning technique that uses knowledge gained on one task to improve model performance on another, related task. A helpful analogy is learning a Romance language such as Spanish: because the languages are so similar, a Spanish speaker may find it easier to learn another Romance language, such as Italian. Transfer learning is essential in generative AI and has proven groundbreaking because it mitigates the challenge of data scarcity. It can also improve the diversity and quality of generated content. For example, a model pre-trained on a large dataset of text can be fine-tuned on a smaller dataset of text specific to a particular domain or style. This allows the model to generate more coherent and relevant text for that domain or style.
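The pre-train-then-fine-tune pattern can be sketched with a deliberately tiny numeric stand-in. The example below is an assumption-laden toy, not a language model: a two-parameter linear model is "pre-trained" on a plentiful source task and then fine-tuned for a few steps on a related target task that has only three examples. The tasks, step counts, and learning rate are all invented for illustration.

```python
def mse(examples, w, b):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in examples) / len(examples)


def train(examples, w, b, steps, learning_rate=0.1):
    """Full-batch gradient descent starting from the given parameters."""
    n = len(examples)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Source task: plenty of examples following y = 2x.
source_task = [(i * 0.1, 2 * i * 0.1) for i in range(30)]
# Related target task: only three examples following y = 2x + 0.5.
target_task = [(0.0, 0.5), (1.0, 2.5), (2.0, 4.5)]

# Pre-train on the large source task.
pretrained_w, pretrained_b = train(source_task, 0.0, 0.0, steps=500)

# Fine-tune briefly on the small target task, starting from the
# pre-trained parameters, and compare with starting from scratch.
fine_tuned = train(target_task, pretrained_w, pretrained_b, steps=20)
from_scratch = train(target_task, 0.0, 0.0, steps=20)

fine_tuned_loss = mse(target_task, *fine_tuned)
scratch_loss = mse(target_task, *from_scratch)
```

With the same small training budget, the fine-tuned model ends with a lower error on the target task than the model trained from scratch, because most of what it needs was already learned on the source task.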
Another technique that became prevalent in the early to mid-2000s was embedding. This is a way to represent data, most frequently words, as numerical vectors. Consumer-facing technologies such as ChatGPT, which demonstrate what feels like human-like logic, are a great example of the power of word embeddings. Word embeddings are designed to capture the semantic and syntactic relationships between words. For example, the vectors representing the words “dog” and “lion” would be much closer to each other than either is to the vector for “apple.” The reason is that “dog” and “lion” tend to appear in similar contexts. In generative AI, this enables a model to understand the relationships between words and their meaning in context, making it possible for models like ChatGPT to produce original text that is contextually relevant and semantically accurate.
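The notion of vectors being "close" can be illustrated with cosine similarity, one common way to compare embeddings. The three-dimensional vectors below are hand-picked assumptions for illustration; real embeddings are learned from data and typically have hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not taken from any real model.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "lion": [0.8, 0.9, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

dog_lion = cosine_similarity(embeddings["dog"], embeddings["lion"])
dog_apple = cosine_similarity(embeddings["dog"], embeddings["apple"])
```

Here `dog_lion` comes out much higher than `dog_apple`, mirroring the example in the text.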
Embeddings proved immensely successful as a representation of language and fueled an exploration of new, more powerful neural network architectures. One of the most important of these architectures, the “transformer,” was developed in 2017. The transformer is a neural network architecture designed to process sequential input data, such as natural language, and perform tasks like text summarization or translation. Notably, the transformer incorporates a “self-attention” mechanism. This allows the model to focus on different parts of the input sequence as needed to capture complex relationships between words in a context-sensitive manner. Thus, the model can learn to weigh the importance of each part of the input data differently for each context. For example, in the phrase, “the dog didn’t jump the fence because it was too tired,” the model processes each word together with its position in the sentence. Then, through self-attention, it compares the word it is currently processing, “it,” with every other word in the sentence to find the closest associations. As a result, the model can associate the word “it” with the word “dog” rather than with the word “fence.”
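The weighing step at the heart of self-attention can be sketched as scaled dot-product attention. This is a simplified illustration under stated assumptions: the two-dimensional word vectors are hand-picked so that “it” resembles “dog,” and the same vectors serve as queries, keys, and values, whereas a real transformer learns separate projections for each and also encodes word positions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Compare the current word with every word in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The new representation is a weighted mix of all word vectors.
        output = [sum(w * v[i] for w, v in zip(weights, vectors))
                  for i in range(dim)]
        all_weights.append(weights)
        outputs.append(output)
    return outputs, all_weights


# Hypothetical toy vectors for four words from the example sentence.
tokens = ["dog", "fence", "it", "tired"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2], [0.9, 0.1]]

outputs, weights = self_attention(vectors)
it_weights = dict(zip(tokens, weights[tokens.index("it")]))
```

With these vectors, `it_weights["dog"]` is larger than `it_weights["fence"]`, so the updated representation of “it” draws more from “dog” than from “fence.”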
Progress in deep learning architectures, efficiently distributed computation, and training algorithms and methodologies has made it possible to train bigger models. As of the time of writing this article, one of the largest publicly documented models is OpenAI’s GPT-3, which consists of 175 billion parameters; OpenAI has not disclosed parameter information for GPT-4. GPT-3 is also noteworthy because it has “absorbed” one of the largest publicly known quantities of text: its training data was drawn from roughly 45TB of raw text, including a large share of the text content of the public internet and other forms of human expression.
While the combined use of techniques like transfer learning, embedding, and transformers for generative AI is evolutionary, the impact on how AI systems are built and on enterprise adoption is revolutionary. As a result, the race for dominance in foundation models, such as the popular Large Language Models (LLMs), is on, with incumbent companies and startups vying for a winner-take-all or take-most position.
While the capital requirements for foundation models are high, favoring large technology incumbents or extremely well-funded startups (read: billions of dollars), opportunities for disruption by generative AI are deep and wide across the enterprise.
Understanding the technology stack
To effectively leverage the potential of generative AI, enterprises and entrepreneurs should understand how its technology layers are categorized and the implications each has for value creation.
The most basic way to understand the technologies around generative AI is to organize them in a three-layer technology “stack.” At the bottom of this stack are the foundation models, which represent a transformational wave in technology analogous to personal computing or the web. This layer will be dominated by entrenched incumbents such as Microsoft, Google, and Meta rather than new startup entrants, much as we saw with the mobile revolution and cloud computing. There are two critical reasons for this phenomenon. First, the scale at which these companies operate and the size of their balance sheets are significant. Second, today’s incumbents have cornered the primary resources that fuel foundation models: compute and data.
At the top of this stack are applications, software developed for a particular use case and designed for a specific task. Between the two sits the “middle layer,” where enabling technologies power the applications at the top layer and extend the capabilities of foundation models. For example, MosaicML lets users build their own large-scale AI models on their own data and run the resulting machine learning workloads efficiently on any cloud within the user’s infrastructure. Notably, an in-depth assessment of the middle layer is missing from this discussion, because making predictions about this part of the stack this early in the cycle is fraught with risk. Free tools from incumbents seeking to drive adoption of their foundation models could commoditize the middle layer. On the other hand, tools that work across platforms and foundation models, provide added capabilities, and select the model best fit for a use case could become game-changers.
In the near term, before the enabling products and platforms at the middle layer develop further, the application layer represents the bulk of opportunities for investors and builders in generative AI. Of particular interest are user-facing products that run their own proprietary model pipelines, often in addition to public foundation models. These are end-to-end applications. Such vertically integrated applications, spanning the model through the user-facing application layer, represent the greatest value because continuously re-training a model on proprietary product data creates defensibility and differentiation. However, this comes at the cost of higher capital intensity and makes it harder for a product team to remain nimble.
Use cases in generative AI applications
Evaluating near-term application-layer use cases and opportunities for generative AI requires understanding two things: the incremental value of data or content, and the implications of imperfect accuracy. Near-term opportunities will be those where additional data or content has high economic value to the business and where the consequences of imperfect accuracy are low.
Additional considerations include the structure of the data used for training and generation and the role of human-in-the-loop design, in which a human is an active participant in the AI system and can therefore check the work of the model.
Opportunities for entrepreneurs and enterprises in generative AI lie in use cases where data is highly structured, such as software code. Additionally, keeping a human in the loop can mitigate the risk of the mistakes an AI can make.
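One way a human-in-the-loop check might be wired into an application is sketched below. Everything here is hypothetical: the confidence score attached to each draft, the 0.8 threshold, and the reviewer function are illustrative assumptions rather than features of any particular product or model.

```python
def human_in_the_loop(drafts, reviewer, threshold=0.8):
    """Accept confident drafts as-is; route the rest to a human reviewer.

    Each draft is a (text, confidence) pair, where confidence is a
    hypothetical score between 0 and 1 supplied alongside the output.
    """
    final = []
    for text, confidence in drafts:
        if confidence >= threshold:
            final.append(text)
        else:
            final.append(reviewer(text))
    return final


def example_reviewer(text):
    """Stand-in for a real person; it only marks the draft as reviewed."""
    return text + "  # reviewed by human"


# Two hypothetical code suggestions with made-up confidence scores.
drafts = [
    ("def add(a, b): return a + b", 0.95),
    ("def div(a, b): return a / b", 0.40),
]

results = human_in_the_loop(drafts, example_reviewer)
```

The first suggestion passes through untouched, while the second is sent to the reviewer before it reaches the user. Where to set the threshold depends on how costly a mistake is in the use case.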
Industry verticals and use cases with these characteristics represent the initial opportunity with generative AI. They include:
- Content creation: Generative AI can improve creativity, rate of content creation, and content quality. The technology can also be leveraged to analyze the performance of different types of content, such as blogs or social media ads, and provide insight into what is resonating with the audience.
- Customer service and support: Generative AI can augment and automate customer service and support through chatbots or virtual assistants. This helps businesses provide faster and more efficient service to their customers while reducing the cost of customer service operations. By pre-training on large amounts of text data, foundation models can learn to accurately interpret customer inquiries and provide more precise responses, leading to improved customer satisfaction and reduced operating costs. Differentiation among new entrants will largely depend on their ability to use smaller, fine-tuned models that better understand industry-specific language, jargon, and common customer questions. Such models let them deliver tailored support that meets the needs of each customer and continuously refine their products for more accurate and effective outcomes.
- Sales and marketing: AI can analyze customer behavior and preferences and generate personalized product recommendations. This can help businesses increase sales and customer engagement. In addition, fine-tuned models can help sales and marketing teams target the right customers with the right message at the right time. By analyzing data on customer behavior, the model can predict which customers are most likely to convert and which messaging will be most effective. That capability can be a strong differentiator for a new entrant seeking to capture market share.
- Software and product development: Generative AI will simplify the entire development cycle, from code generation and completion to bug detection, documentation, and testing. Foundation models allow developers to focus on design and feature building rather than correcting errors in the code. For instance, new entrants can provide AI-powered assistants that are fine-tuned to understand programming concepts and provide context-aware assistance, helping developers navigate complex codebases, find relevant documentation, or suggest code snippets. This can help developers save time, build their skills, and improve code quality.
Knowing the past to see the future
While we are still in the early days of the immense enterprise and startup value that generative AI and foundation models will unlock, everyone from entrepreneurs to C-suite decision-makers benefits from understanding how we arrived at where we are today. Understanding these concepts also helps leaders recognize the potential for scale and reframe and grow business opportunities. Knowing where the opportunities lie makes it possible to make smart decisions about what promises to be an inspiring future.