Black box AI isn’t enough: Why enterprise consulting is moving to grounded models
Presented by SAP
In an era where anyone can spin up an LLM, the real differentiator isn’t the AI technology itself, but the institutional knowledge it’s grounded in. Internal and partner consultants leading operational transformation can’t risk hallucinated guidance when their recommendations impact integrated processes across supply chain, manufacturing, finance, and other core functions.
"Grounded AI is non-negotiable, because accuracy isn’t optional when we’re doing million-dollar transformation projects within the SAP ecosystem, for example," says Natalie Han, VP and chief product officer, gen AI at SAP Business AI. "Retrieval-augmented generation technology, and the ability to anchor responses in trusted enterprise knowledge, helps ensure accurate code interpretation, best-practice guidance, and clean-core decision support. It's how we bring real trust into AI-powered consulting."
A fully grounded AI assistant like SAP Joule for Consultants has tremendous value in production use cases, she adds. SAP Joule has terabytes of institutional data that's continuously curated and updated, so a consultant is assured they're getting up-to-the-minute SAP best practices and methodologies when relying on Joule, while at the same time accelerating project delivery.
"We’re saving rework time by 14%, and saving consultants 1.5 hours per day per user, which is huge when you consider how expensive consultants are now," Han says. "Early adopters like Wipro have estimated they've saved 7 million hours on a manual basis for their consultants."
The foundation of SAP Joule
SAP Joule is as certified as any consultant, says Sachin Kaura, chief architect, SAP Business AI. The tool was born in 2023, when GPTs famously passed a simulated bar exam and ignited buzz around the ability of LLMs to handle large amounts of context. It is widely acknowledged that the SAP ecosystem, along with its associated domain ontology and taxonomy, is incredibly vast and can be very complex to navigate. The question became, how could an AI co-pilot be used to navigate that complexity when it was actually grounded within the SAP ecosystem itself?
Sachin Kaura began experimenting with frontier LLM models by putting them through the same certification exams SAP consultants take. The early results were poor, but after extensive context tuning and a focus on delivering value to the partner ecosystem, Joule now consistently scores 95% or higher.
"Not only were we testing from a data perspective, but we were able to work with all of our consultants to get what we call the golden data set," Han added. "It’s non-deterministic, language-based, and thoroughly grounded in human consultant expertise. We partnered with the whole consulting organization to manually label the golden data set across all of the products. That’s become the foundation for everything we do even now."
A state-of-the-art indexing pipeline
Joule for Consultants stays up-to-date in real time. A state-of-the-art indexing pipeline pushes new SAP documentation and release content into the model as soon as it’s published, giving consultants confidence that every answer reflects the most current guidance.
"This is pure engineering work done by our data scientists and engineers, using a lot of underlying SAP technology," Kaura explains. "We leverage the SAP business foundation layer, document grounding services, and a lot of purpose-built systems to stay on top of current events in the system."
SAP Business AI also has board-level alignment, ensuring this isn’t just a one-team effort but a company-wide priority. They’ve built strong internal partnerships with content owners across SAP — including SAP Learning, SAP Community, SAP Help, product teams, and consultant teams. Together, they continuously update proprietary content such as SAP Notes, Knowledge Base Articles (KBAs), and other domain-specific guidance that reflects SAP’s evolving best practices.
All of this means Joule for Consultants can take that continuously refreshed data and deliver answers in near real time. It's the kind of research that would otherwise take a consultant hours. But information pulled directly from the source gives consultants the most current and authoritative guidance available, helping eliminate the early-stage missteps that can derail a project months later when scoping wasn’t aligned with the latest capabilities.
Ensuring enterprise-grade security
SAP is building a product that is relevant, reliable, and responsible, Han says. As a company founded in Europe, it takes data privacy seriously, adhering to the GDPR and other EU company regulations. At the core of SAP Business AI is the AI Foundation, the AI operating system that governs AI with built-in security, ethics, and orchestration, using automation and intelligence to manage lifecycles, optimize resources, and boost resilience.
All the LLMs SAP and its customers use operate within the AI foundation, which protects private and proprietary data from being leaked. Beyond data protection, SAP treats bias, ethics, and security at an enterprise level as well, with humans in the loop to run checks and balances.
"We have an enterprise-grade security framework as well as prompt injection and guardrail testing," Kaura says. "The orchestration layer, built within the AI Foundation, anonymizes inputs as well as moderates them to prevent malicious content. That ensures that the output we give to our customers is relevant to the SAP ecosystem, relevant to the domain they’re asking about, and not just generic LLM excess. This set of tools, from the framework layer to the application layer to the product standards, and also the very thorough testing is critical to securing our product. Then and only then can it reach our customers and partners."
Pushing the limits of Joule for Consultants
"We’re barely scratching the surface of what LLMs and agentic AI can offer," Han says. "Accessing knowledge is just the beginning. We’re going to have a much deeper understanding of customers’ SAP systems and be able to help them implement and transform their journey. The product team and our engineers are working to make the tool more transformative, able to unearth more insights, connect with customers’ systems, and understand and optimize their processes, including generating code and handling customer code migration."
The next step is adding a second layer of grounding. SAP’s customer base is vast, and its partner ecosystem has implemented countless business scenarios. Grounding Joule in SAP’s institutional knowledge was the first milestone; the next is layering in each customer’s own proprietary context — historical system data, process designs, implementation blueprints, and internal documentation. This turns Joule from SAP-aware to customer-aware, delivering guidance that aligns with how a business actually operates.
“Think of it as grounding your knowledge on top of SAP knowledge — giving you more accurate and relevant guidance,” Kaura says. “Information that might otherwise be lost can sit on top of Joule for Consultants. Our system processes it and ensures it comes to you in the right manner and at the right time.”
This expanded grounding also lets Joule adjust its guidance to the consultant’s role — whether they’re working as an architect, a functional consultant, or a technical consultant.
"We deliver the information they need for a particular customer configuration," Han explains. "Then we can not only answer generic questions, but we can answer their particular configuration. From there it’s one step ahead to generating more insights and taking more actions."
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
Google Releases Updated Gemini Deep Research
Runware Secures $50M in Quest to Build ‘One API for All AI’
Korean AI startup Motif reveals 4 big lessons for training enterprise LLMs
We've heard (and written, here at VentureBeat) lots about the generative AI race between the U.S. and China, as those have been the countries with the groups most active in fielding new models (with a shoutout to Cohere in Canada and Mistral in France).
But now a Korean startup is making waves: last week, the firm known as Motif Technologies released Motif-2-12.7B-Reasoning, another small parameter open-weight model that boasts impressive benchmark scores, quickly becoming the most performant model from that country according to independent benchmarking lab Artificial Analysis (beating even regular GPT-5.1 from U.S. leader OpenAI).
But more importantly for enterprise AI teams, the company has published a white paper on arxiv.org with a concrete, reproducible training recipe that exposes where reasoning performance actually comes from — and where common internal LLM efforts tend to fail.
For organizations building or fine-tuning their own models behind the firewall, the paper offers a set of practical lessons about data alignment, long-context infrastructure, and reinforcement learning stability that are directly applicable to enterprise environments. Here they are:
1. Reasoning gains come from data distribution, not model size
One of Motif’s most relevant findings for enterprise teams is that synthetic reasoning data only helps when its structure matches the target model’s reasoning style.
The paper shows measurable differences in downstream coding performance depending on which “teacher” model generated the reasoning traces used during supervised fine-tuning.
For enterprises, this undermines a common shortcut: generating large volumes of synthetic chain-of-thought data from a frontier model and assuming it will transfer cleanly. Motif’s results suggest that misaligned reasoning traces can actively hurt performance, even if they look high quality.
The takeaway is operational, not academic: teams should validate that their synthetic data reflects the format, verbosity, and step granularity they want at inference time. Internal evaluation loops matter more than copying external datasets.
2. Long-context training is an infrastructure problem first
Motif trains at 64K context, but the paper makes clear that this is not simply a tokenizer or checkpointing tweak.
The model relies on hybrid parallelism, careful sharding strategies, and aggressive activation checkpointing to make long-context training feasible on Nvidia H100-class hardware.
For enterprise builders, the message is sobering but useful: long-context capability cannot be bolted on late.
If retrieval-heavy or agentic workflows are core to the business use case, context length has to be designed into the training stack from the start. Otherwise, teams risk expensive retraining cycles or unstable fine-tunes.
3. RL fine-tuning fails without data filtering and reuse
Motif’s reinforcement learning fine-tuning (RLFT) pipeline emphasizes difficulty-aware filtering — keeping tasks whose pass rates fall within a defined band — rather than indiscriminately scaling reward training.
This directly addresses a pain point many enterprise teams encounter when experimenting with RL: performance regressions, mode collapse, or brittle gains that vanish outside benchmarks. Motif also reuses trajectories across policies and expands clipping ranges, trading theoretical purity for training stability.
The enterprise lesson is clear: RL is a systems problem, not just a reward model problem. Without careful filtering, reuse, and multi-task balancing, RL can destabilize models that are otherwise production-ready.
4. Memory optimization determines what is even possible
Motif’s use of kernel-level optimizations to reduce RL memory pressure highlights an often-overlooked constraint in enterprise settings: memory, not compute, is frequently the bottleneck. Techniques like loss-function-level optimization determine whether advanced training stages are viable at all.
For organizations running shared clusters or regulated environments, this reinforces the need for low-level engineering investment, not just model architecture experimentation.
Why this matters for enterprise AI teams
Motif-2-12.7B-Reasoning is positioned as competitive with much larger models, but its real value lies in the transparency of how those results were achieved. The paper argues — implicitly but persuasively — that reasoning performance is earned through disciplined training design, not model scale alone.
For enterprises building proprietary LLMs, the lesson is pragmatic: invest early in data alignment, infrastructure, and training stability, or risk spending millions fine-tuning models that never reliably reason in production.
States Expected to Fight Back Against AI Executive Order
AI Startup Raises $13M to Combat Deepfakes
Woman Who Helped Coerce Victims into GirlsDoPorn Sex Trafficking Ring Sentenced to Prison
Nvidia Releases Nemotron 3 Open Models
This AI toy tucks kids in and lectures them on geopolitics
Tokenization takes the lead in the fight for data security
Presented by Capital One Software
Tokenization is emerging as a cornerstone of modern data security, helping businesses separate the value of their data from its risk. During this VB in Conversation, Ravi Raghu, president, Capital One Software, talks about the ways tokenization can help reduce the value of breached data and preserve underlying data format and usability, including Capital One’s own experience leveraging tokenization at scale.
Tokenization, Raghu asserts, is a far superior technology. It converts sensitive data into a nonsensitive digital replacement, called a token, that maps back to the original, which is secured in a digital vault. The token placeholder preserves both the format and the utility of the sensitive data, and can be used across applications — including AI models. Because tokenization removes the need to manage encryption keys or dedicate compute to constant encrypting and decrypting, it offers one of the most scalable ways for companies to protect their most sensitive data, he added.
"The killer part, from a security standpoint, when you think about it relative to other methods, if a bad actor gets hold of the data, they get hold of tokens," he explained. "The actual data is not sitting with the token, unlike other methods like encryption, where the actual data sits there, just waiting for someone to get hold of a key or use brute force to get to the real data. From every angle this is the ideal way one ought to go about protecting sensitive data."
The tokenization differentiator
Most organizations are just scratching the surface of data security, adding security at the very end, when data is read, to prevent an end user from accessing it. At minimum, organizations should focus on securing data on write, as it’s being stored. But best-in-class organizations go even further, protecting data at birth, the moment it’s created.
At one end of the safety spectrum is a simple lock-and-key approach that restricts access but leaves the underlying data intact. More advanced methods, like masking or modifying data, permanently alter its meaning — which can compromise its usefulness. File-level encryption provides broader protection for large volumes of stored data, but when you get down to field-level encryption (for example, a Social Security number), it becomes a bigger challenge. It takes a great deal of compute to encrypt a single field, and then to decrypt it at the point of usage. And still it has a fatal flaw: the original data is still right there, only needing the key to get access.
Tokenization avoids these pitfalls by replacing the original data with a surrogate that has no intrinsic value. If the token is intercepted — whether by the wrong person or the wrong machine — the data itself remains secure.
The business value of tokenization
"Fundamentally you’re protecting data, and that’s priceless," Raghu said. "Another thing that’s priceless – can you use that for modeling purposes subsequently? On the one hand, it’s a protection thing, and on the other hand it’s a business enabling thing."
Because tokenization preserves the structure and ordinality of the original data, it can still be used for modeling and analytics, turning protection into a business enabler. Take private health data governed by HIPAA for example: tokenization means that data canbeused to build pricing models or for gene therapy research, while remaining compliant.
"If your data is already protected, you can then proliferate the usage of data across the entire enterprise and have everybody creating more and more value out of the data," Raghu said. "Conversely, if you don’t have that, there’s a lot of reticence for enterprises today to have more people access it, or have more and more AI agents access their data. Ironically, they’re limiting the blast radius of innovation. The tokenization impact is massive, and there are many metrics you could use to measure that – operational impact, revenue impact, and obviously the peace of mind from a security standpoint."
Breaking down adoption barriers
Until now, the fundamental challenge with traditional tokenization has been performance. AI requires a scale and speed that is unprecedented. That's one of the major challenges Capital One addresses with Databolt, its vaultless tokenization solution, which can produce up to 4 million tokens per second.
"Capital One has gone through tokenization for more than a decade. We started doing it because we’re serving our 100 million banking customers. We want to protect that sensitive data," Raghu said. "We’ve eaten our own dog food with our internal tokenization capability, over 100 billion times a month. We’ve taken that know-how and that capability, scale, and speed, and innovated so that the world can leverage it, so that it’s a commercial offering."
Vaultless tokenization is an advanced form of tokenization that does not require a central database (vault) to store token mappings. Instead, it uses mathematical algorithms, cryptographic techniques, and deterministic mapping to generate tokens dynamically.This approach is faster, more scalable, and eliminates the security risk associated with managing a vault.
"We realized that for the scale and speed demands that we had, we needed to build out that capability ourselves," Raghu said. "We’ve been iterating continuously on making sure that it can scale up to hundreds of billions of operations a month. All of our innovation has been around building IP and capability to do that thing at a battle-tested scale within our enterprise, for the purpose of serving our customers."
While conventional tokenization methods can involve some complexity and slow down operations, Databolt seamlessly integrates with encrypted data warehouses, allowing businesses to maintain robust security without slowing performance or operations. Tokenization occurs in the customer’s environment, removing the need to communicate with an external network to perform tokenization operations, which can also slow performance.
"We believe that fundamentally, tokenization should be easy to adopt," Raghu said. "You should be able to secure your data very quickly and operate at the speed and scale and cost needs that organizations have. I think that’s been a critical barrier so far for the mass scale adoption of tokenization. In an AI world, that’s going to become a huge enabler."
Don't miss the whole conversation with Ravi Raghu, president, Capital One Software, here.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.