When you are processing 8 billion tokens a day, the economics of artificial intelligence change fast. It is one thing to run a pilot program for a few dozen users; it is an entirely different beast when your internal tools become critical infrastructure for over 100,000 employees.
This was the exact wall AT&T hit with its "Ask AT&T" tool. Launched in mid-2023, the system was designed to help employees with everything from writing code to summarizing documents. But as adoption skyrocketed, so did the bill. Relying exclusively on massive, reasoning-heavy Large Language Models (LLMs) for every single query simply wasn’t sustainable.
So, what did they do? They didn’t shut it down. Instead, they completely re-engineered the plumbing. Led by Chief Data Officer Andy Markus, the telecom giant pivoted to a "multi-agent" architecture. The result was staggering: a 90% reduction in operating costs and a faster system to boot.
Why did AT&T move away from a single model approach?
In the early days of the generative AI boom, the strategy for many enterprises was simple: plug everything into the smartest model available (often via Microsoft Azure and OpenAI) and hope for the best. That works fine at low volume. But when you scale to billions of daily tokens, you start paying premium rates for routine tasks that don’t require a premium "brain."
As "Ask AT&T" became deeply integrated into the daily workflows of the company’s workforce, routing every request through a flagship LLM became prohibitively expensive. It is akin to hiring a PhD physicist to teach high school algebra—effective, sure, but a massive misuse of resources and budget.
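The arithmetic behind that analogy is worth making concrete. The per-token prices below are invented for illustration and are not AT&T's actual rates; the point is simply how routing routine traffic to a cheaper model compounds at billions of tokens per day.

```python
# Toy cost model. All prices are hypothetical, chosen only to
# illustrate the shape of the savings, not AT&T's real numbers.
DAILY_TOKENS = 8_000_000_000     # ~8B tokens/day, per the article

FLAGSHIP_PRICE_PER_M = 10.00     # hypothetical $ per 1M tokens, flagship LLM
SLM_PRICE_PER_M = 0.50           # hypothetical $ per 1M tokens, small model

def daily_cost(tokens: int, routine_share: float) -> float:
    """Dollars per day when `routine_share` of traffic goes to the SLM
    and the remainder goes to the flagship model."""
    routine = tokens * routine_share
    hard = tokens - routine
    return (routine * SLM_PRICE_PER_M + hard * FLAGSHIP_PRICE_PER_M) / 1_000_000

baseline = daily_cost(DAILY_TOKENS, 0.0)    # everything on the flagship
routed = daily_cost(DAILY_TOKENS, 0.95)     # 95% of traffic deemed routine

print(f"flagship-only: ${baseline:,.0f}/day")   # $80,000/day
print(f"with routing:  ${routed:,.0f}/day")     # $7,800/day
print(f"savings: {1 - routed / baseline:.0%}")  # 90%
```

With these made-up prices and a 95% routine share, the savings land right around the 90% figure in the article, which shows how aggressive routing alone can account for a reduction of that magnitude.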
Markus and his team realized they needed a system that could discern the complexity of a request before spending the compute power to answer it. They needed a traffic cop, not just a bigger engine.
How does the new "Super Agent" system work?
To solve the cost and latency crisis, AT&T implemented a sophisticated orchestration layer built on LangChain. This isn’t just a chatbot anymore; it is a layered system that classifies each request and decides which model should handle it.
Here is how the architecture functions:
The Super Agent: At the top of the stack sits a "super agent." Its job isn’t to answer your question, but to understand what you are asking. It acts as a router.
The Worker Agents: Once the super agent identifies the intent—say, a simple HR policy lookup versus a complex coding problem—it directs the task to a specific, purpose-built "worker agent."
Small Language Models (SLMs): Many of these worker agents utilize smaller, open-source, or less computationally expensive models. These models are perfectly capable of handling routine tasks without the massive overhead of a flagship LLM.
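The three tiers above can be sketched as a minimal router. This is an illustrative sketch, not AT&T's implementation: the agent names, model labels, and keyword-based intent check are all invented for the example, and a production system (such as one built on LangChain) would typically use an LLM call for intent classification rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class WorkerAgent:
    """A purpose-built agent backed by a (hypothetical) model."""
    name: str
    model: str                      # which backing model this agent uses
    handle: Callable[[str], str]    # stand-in for the actual model call

# Worker agents: cheap SLMs for routine tasks, a flagship LLM for hard ones.
WORKERS = {
    "hr_lookup": WorkerAgent("hr_lookup", "small-slm",
                             lambda q: f"[HR policy answer for: {q}]"),
    "summarize": WorkerAgent("summarize", "small-slm",
                             lambda q: f"[Summary of: {q}]"),
    "coding":    WorkerAgent("coding", "flagship-llm",
                             lambda q: f"[Code solution for: {q}]"),
}

def super_agent(query: str) -> str:
    """The router: identify intent first, then dispatch to a worker.
    Keyword matching stands in for a real intent classifier here."""
    q = query.lower()
    if "policy" in q or "vacation" in q:
        worker = WORKERS["hr_lookup"]
    elif "summarize" in q or "tl;dr" in q:
        worker = WORKERS["summarize"]
    else:
        worker = WORKERS["coding"]  # only the hard cases hit the big model
    return worker.handle(query)

print(super_agent("What is the vacation policy?"))
print(super_agent("Summarize this incident report"))
```

The key design point the sketch captures is that the super agent never answers anything itself; its only job is the cheap classification step that keeps expensive models out of the routine path.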
This shift to "agentic AI" means the system only uses the heavy artillery when absolutely necessary. According to Markus, this flexible orchestration didn’t just save money; it dramatically improved latency and response times. Employees get answers faster because they aren’t waiting in a queue for the smartest model to free up capacity.
What is the future of AI inside the enterprise?
AT&T’s pivot offers a glimpse into where corporate AI is heading. It is moving away from the "one model rules them all" mentality toward compound AI systems. Markus told VentureBeat, "I believe the future of agentic AI is many, many, many small agents doing very specific things."
The company is already doubling down on this strategy. In late 2025, they introduced "Ask AT&T Workflows," a drag-and-drop builder that allows non-technical teams to spin up their own specialized agents. They are even testing a customer-facing "digital receptionist" to screen spam and handle routine calls, further proving that these smaller, specialized agents are ready for the front lines.
This trend isn’t isolated to AT&T. Competitors like Verizon and Orange are scaling similar platforms, with Orange even commercializing its internal "Live Intelligence" tool. But AT&T’s 90% cost reduction provides the hard data needed to validate this architectural shift.
The Real Story
While the headline is the 90% cost savings, the real story here is the death of the "wrapper" strategy for enterprise AI. AT&T has proven that the real value isn’t just in accessing a model, but in orchestrating a fleet of them. This is a massive win for framework providers like LangChain and a wake-up call for cloud providers banking on indefinite, high-volume inference fees from flagship models. As enterprises get smarter, they won’t just buy intelligence; they will buy efficiency, aggressively routing traffic to the cheapest model that is "good enough" for the job.