While artificial intelligence has taken the world by storm, its rise hasn’t come cheap.
“The consequence of multimodal AI is that the amount of data and compute power being used to train modern AI systems has grown exponentially,” R K Anand, co-founder and chief product officer at Recogni, told PYMNTS for the “AI Effect” series.
He explained that training some of today’s largest foundational models takes firms anywhere from several months to a year and costs hundreds of millions of dollars or more.
And that spending doesn’t stop once the models are ready. As just one example, Meta’s own projections put its capital expenditures on AI and metaverse development for the current fiscal year at $35 billion to $40 billion. Rather than accelerating its return on its AI investment, the company plans to spend $5 billion more than it originally planned.
That’s why developing next-generation AI inference systems that boost performance and power efficiency while offering the lowest total cost of ownership is so critical, Anand said.
“Inference is where the scale and demand of AI is going to be realized, and so building the most efficient technology from both a power cost and total cost of operations perspective is going to be key for AI,” he said.
Improved power efficiency translates directly into lower operating costs.
As Anand explained, AI inference is the next step after AI training, and the one end-users are most familiar with.
AI training is the process of building a model by adjusting its weights until its input-output behavior lets the system make accurate inferences above a set quality threshold. AI inference is the process of the trained system producing predictions or conclusions that meet that threshold.
“Inference is when the model isn’t learning anything new, but when it does its job of responding to user prompts or to an API call,” Anand said. “And that task can now be optimized.”
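To make the training/inference split concrete, here is a toy sketch, assuming a one-variable linear model fitted with plain gradient descent. The function names `train` and `infer` are hypothetical and chosen for illustration; real foundation models have billions of weights, but the division of labor is the same: training repeatedly updates the weights, while inference only applies the frozen weights to new input.

```python
# Toy illustration (hypothetical names): a linear model y = w*x + b.

def train(xs: list[float], ys: list[float], lr: float = 0.05, epochs: int = 2000) -> tuple[float, float]:
    """Training: repeatedly adjust weights w and b to reduce mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(weights: tuple[float, float], x: float) -> float:
    """Inference: apply fixed weights to a new input; nothing is learned here."""
    w, b = weights
    return w * x + b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
weights = train(xs, ys)      # expensive, done once
prediction = infer(weights, 10.0)  # cheap per call, but repeated for every request
print(prediction)  # a value very close to 21.0
```

Training runs thousands of weight updates once; inference is a single cheap pass, but it is repeated for every prompt or API call, which is why its cumulative cost matters.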
Almost every real-world application of AI relies on inference, and inference carries an ongoing power and computing cost. A model that is actively in use is constantly making new inferences, and those can become quite expensive unless the system’s unit economics are strategically optimized to offset that cost.
“Training is an unavoidable cost center,” Anand explained. “You have to spend lots of money to build the models. But inference can be a profit center, and that’s because the elements associated with inference are how much does it cost for me to run that inference system, and how much am I going to charge customers to use it, and is there a differential that results in a profit for me to deliver that service? The economics of inference matter the most.”
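The arithmetic behind Anand’s “profit center” framing can be sketched as cost to serve one request versus the price charged for it. This is a minimal illustration under stated assumptions: the class `InferenceDeployment`, the function `margin_per_request`, and every number below are hypothetical and do not come from Recogni, Meta or any real deployment. The model counts only amortized hardware and electricity, ignoring cooling, staffing and networking.

```python
from dataclasses import dataclass


@dataclass
class InferenceDeployment:
    # All fields are hypothetical, illustrative inputs, not real industry figures.
    hardware_cost_usd: float        # up-front cost of the inference hardware
    lifetime_hours: float           # hours over which that cost is amortized
    power_kw: float                 # average power draw while serving
    electricity_usd_per_kwh: float  # energy price
    requests_per_hour: float        # inference requests served per hour

    def cost_per_request(self) -> float:
        """Amortized hardware cost plus energy cost, spread across requests."""
        hourly_hardware = self.hardware_cost_usd / self.lifetime_hours
        hourly_energy = self.power_kw * self.electricity_usd_per_kwh
        return (hourly_hardware + hourly_energy) / self.requests_per_hour


def margin_per_request(deployment: InferenceDeployment, price_usd: float) -> float:
    """Price charged minus cost to serve; positive means inference earns a profit."""
    return price_usd - deployment.cost_per_request()


# Hypothetical baseline: $1.00/hour of hardware plus $0.10/hour of energy, 1,000 requests/hour.
baseline = InferenceDeployment(30_000, 30_000, 1.0, 0.10, 1_000)
# Same system, assumed to draw half the power for the same throughput.
efficient = InferenceDeployment(30_000, 30_000, 0.5, 0.10, 1_000)

price = 0.002  # hypothetical price per request in USD
print(margin_per_request(baseline, price), margin_per_request(efficient, price))
```

Because every term is divided by `requests_per_hour`, both higher throughput and lower power draw shrink the cost per request, which is the lever the article describes for widening the margin between cost and price.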
Pruning excess weights and reducing the model’s precision through quantization are two popular methods for designing more efficient models that perform better at inference time, he added.
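The two techniques Anand mentions can be sketched on a flat list of weights. This is a toy, pure-Python illustration, assuming magnitude pruning (zeroing the smallest-magnitude weights) and symmetric per-tensor int8 quantization, which are common variants; it is not a description of Recogni’s or any framework’s implementation, and the function names are hypothetical.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:k])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric quantization: map floats to integers in [-127, 127] with one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Approximately recover the original floats from the integer codes."""
    return [v * scale for v in q]


weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.5]
pruned = magnitude_prune(weights, sparsity=0.5)  # drops the 3 smallest-magnitude weights
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
print(pruned, q, restored)
```

Pruning leaves zeros that sparse-aware hardware or kernels can skip, and storing each weight as an 8-bit integer instead of a 32-bit float cuts weight memory by a factor of four. The cost is a small rounding error per weight, at most half of `scale` in this scheme.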
Because up to 90% of an AI model’s life is spent in inference mode, the bulk of AI’s cost and energy footprint lies there as well, which makes optimizing inference and lowering operating costs an attractive proposition.
“Enterprises will start taking models that are robust, have high quality, and bringing them in-house, whether they do them in the cloud or on-prem, and using them for getting higher productivity, higher returns on investment, and do inference tasks as a daily job,” said Anand. “And for that, inference has to be the most efficient, has to be the most economical, has to be the most power efficient.”
Unless the unit economics of AI start to make more sense on a cost basis, the industry “will be in trouble,” Anand said, noting that businesses are generating ever more data and that we’ve reached a tipping point where only AI solutions are positioned to effectively parse it and recognize key patterns.
“There’s no human way to analyze and comprehend and carve out that data,” he said. “And so, you do need large AI machines.”
“People will only use tools and systems when they increase productivity but don’t incur more cost than what it costs today to expand and run a business,” Anand said. “Companies cannot have big jumps in operating expenditure just because they want to use AI. We are in an 80/20 rule today, where 80% of the compute is being used for training AI, and that will shift to 80% for inference when more of us use AI for our day-to-day work.”