Generative artificial intelligence (AI) as we know it began in earnest in 2017.
That’s when a paper from researchers at Google Brain and Google Research titled “Attention Is All You Need” introduced the foundational transformer neural network architecture and popularized the revolutionary concept of multi-head attention, effectively slingshotting the capabilities of AI models past the sequential bottlenecks of the then-dominant recurrent neural networks (RNNs).
In the years since its publication, the paper has become one of the most cited and influential documents in AI’s surprisingly long history.
After all, transformers are the technical engine powering today’s most popular large language model (LLM)-driven AI products, including OpenAI’s ChatGPT (the GPT stands for generative pre-trained transformer) and Alphabet’s own Language Model for Dialogue Applications (LaMDA), both of which use stacks of transformers to create their human-like generative outputs.
Now, per a Bloomberg report, Llion Jones, the last of the paper’s co-authors still at Google, has left the company to start his own AI venture.
Other authors of the paper have gone on to launch their own innovative businesses within the AI field, including the firms Cohere and Character.AI, which have helped move generative AI from the ivory tower and the business laboratory into the everyday lives of hundreds of millions of users.
Jones’ departure marks the end of an era for Google, with all eight of the paper’s authors having left the company, but it may herald the start of a new one for the AI sector.
“It … feels like good timing to build something new given the momentum and progress in AI,” Jones said, per the Bloomberg report.
Read also: Peeking Under the Hood of AI’s High-Octane Technical Needs
Transformers wholly disrupted the way AI models are developed, as well as the capabilities they possess.
Previous generations of AI models relied primarily on RNNs, which, while effective, were inherently constrained by the need to work through their data sequentially, a far more linear computing method than the holistic, whole-sequence weighting approach that transformers allow for.
“We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely,” the 2017 paper stated, noting that this innovative approach removed the “fundamental constraint” of “sequential computation.”
Transformers leverage the then-novel concept of multi-head attention, which lets a model compute a weighted sum over its entire input sequence. Because each attention head can focus on a different part of the input simultaneously, the model can capture and weight multiple aspects of the data at once, removing the need for complex recurrent or convolutional neural networks while still prioritizing the components most relevant to generating the desired output.
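For the technically curious, here is a minimal sketch of that mechanism in Python with NumPy. The variable names, toy dimensions and random projection matrices are purely illustrative assumptions, not values drawn from the paper or any production model.

```python
# A minimal sketch of multi-head scaled dot-product attention, the core
# idea of "Attention Is All You Need". All sizes and weights here are
# illustrative; real models learn their projection matrices from data.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each output position is a weighted sum of the value vectors V;
    # the weights come from comparing queries Q against keys K.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)              # attention weights
    return weights @ V                              # (heads, seq, d_head)

seq_len, d_model, n_heads = 6, 16, 4
d_head = d_model // n_heads
x = np.random.randn(seq_len, d_model)

# Hypothetical random projections standing in for learned weights.
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))

def split_heads(t):
    # "Multi-head": split the model dimension into n_heads independent
    # heads so each can attend to a different aspect of the input.
    return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

Q, K, V = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
out = scaled_dot_product_attention(Q, K, V)         # (heads, seq, d_head)
out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
```

Note that nothing in the attention computation forces one position to wait on another: every position’s weighted sum falls out of a single batch of matrix multiplications, which is precisely what makes the approach so parallelizable.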
“Experiments on two machine translation tasks show these [Transformer] models to be superior in quality while being more parallelizable and requiring significantly less time to train,” wrote the researchers in 2017.
By helping AI models zero in on the most important pieces of the data they are analyzing, transformers transformed AI algorithms from the ground up, allowing computations to run in parallel rather than sequentially, where one step had to finish before the next could begin.
Dozens of stacked transformer layers power today’s revolutionary LLMs, churning through billions, if not trillions, of parameters to capture the complex relationships between data elements near-instantaneously before generating a relevant response.
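To picture how that stacking works, the toy sketch below (hypothetical block structure, random weights, toy dimensions, again only an illustration) runs a sequence through several simplified attention-plus-feed-forward blocks: the layers execute one after another, but within each layer every token position is processed in parallel.

```python
# A toy illustration of stacking transformer-style blocks. Everything
# here is hypothetical; real LLMs add residual connections, layer
# normalization and, crucially, learned rather than random weights.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_layers = 8, 32, 6

def toy_block(x):
    # Self-attention over the whole sequence at once (parallel over tokens)...
    scores = x @ x.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    attended = weights @ x
    # ...followed by a position-wise feed-forward transformation.
    W = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    return np.tanh(attended @ W)

x = rng.standard_normal((seq_len, d_model))
for _ in range(n_layers):  # layers run in sequence, tokens in parallel
    x = toy_block(x)
```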
But that doesn’t mean transformers are the end-all, be-all of AI models.
“It’s a cryptic paper, and people have been picking up the breadcrumbs ever since,” Paul Lintilhac, a PhD researcher in computer science at Dartmouth College, told PYMNTS.
See also: How AI Regulation Could Shape Three Digital Empires
If RNNs revealed the existence of AI’s Pandora’s box, transformers opened the box and let loose the capabilities within.
That’s why the most important element of a transformer model is the way it is trained to weigh the dynamic values of the data being computed in parallel to generate an output. RNNs, by contrast, were trained with backpropagation through time, stepping through the sequence one position at a time to minimize error, a more intensive and compute-heavy process that starts to strain and break down when working with longer data sequences.
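To make the contrast concrete, the sketch below (toy sizes, random weights, purely illustrative) shows the sequential dependency at the heart of an RNN forward pass: each hidden state depends on the previous one, so no time step can be computed until the one before it finishes, which is exactly the bottleneck attention removes.

```python
# A toy RNN forward pass illustrating the sequential bottleneck; the
# weights and sizes here are illustrative only. Because h at step t
# depends on h at step t-1, the loop below cannot be parallelized the
# way attention's single batch of matrix multiplications can.
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_in, d_hidden = 100, 8, 16
x = rng.standard_normal((seq_len, d_in))
W_x = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
W_h = rng.standard_normal((d_hidden, d_hidden)) / np.sqrt(d_hidden)

h = np.zeros(d_hidden)
for t in range(seq_len):               # inherently sequential: no step
    h = np.tanh(x[t] @ W_x + h @ W_h)  # can start until the last ends
```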
As significant as they are, transformers are only the latest iteration, much as the combustion engine was in transportation when it replaced steam.
OpenAI’s CEO Sam Altman has suggested that GPT-4 could be the last major advance to emerge from OpenAI’s strategy of making its LLMs bigger and feeding them more data as a way of increasing the capabilities of AI tools.
What does the future hold? Perhaps a future paper from the research teams at one of today’s emergent AI firms will light the way.
After all, “writing one paper in the space is like a golden ticket these days, but a game-changing paper is like a blank check,” Lintilhac said.