Over the last several years, we've seen an explosion of chat-based LLM products entering the enterprise landscape. This started with ChatGPT’s launch in November 2022 and has cascaded across the entire enterprise software stack. Today, organizations use dozens of chatbots: native foundation model apps like ChatGPT and Claude, code editors like Cursor and Windsurf, and other chat-like features built into products such as Slack and Notion.
Put simply, the average organization uses a sprawl of products, some centered entirely on AI, others with LLM-powered features. Eventually, these chat-based applications will evolve into mature AI agents that can autonomously get work done for teams. But there is an underemphasized limitation: these AI agents aren't very useful without access to other tools, agents, and data. To illustrate this, let's consider a hypothetical.
Imagine an agent that's tasked with providing comprehensive account briefings before any customer call. The agent will struggle to provide a meaningful briefing without access to the business's tools that house relevant context about customers. Ideally, the agent should access previous call transcripts from Gong and customer account data from Salesforce. There might be additional contextual data in Snowflake. Without this context, the account briefing agent won't be that useful. It'll miss key details, forcing humans to manually fill in the gaps. The agent hasn't gotten the job done; it has merely assisted with the task.
Consequently, if we don't give AI agents access to tools and the enterprise's context, these agents won't have a chance of solving the current sprawl problem. With each AI agent working in isolation, humans are left connecting and translating the output of each agent.
This piecemeal problem isn't unique to agents. It happens with human work, too. As companies scale, they accrue a "collaboration tax," where more and more teams need to be updated, kept in the loop, and consulted for decisions. This makes work time-consuming and expensive. But while humans have a natural limit on how efficiently they can collaborate, AI does not, provided it's implemented correctly. AI has the potential to be the connective tissue between teams, making sure everyone is up-to-date with just-in-time information to get their job done.
In other words, if we ease the limitations that hamper AI agents, AI will make organizations considerably more productive than previously possible. So why hasn't this happened yet? Why aren't enterprises well-oiled machines aided by a seamless AI layer? Because it's a difficult problem: granting agents access to other tools, agents, and data is non-trivial for enterprises. To understand why, we must first understand the ecosystem that AI agents live within, and the limitations of its various components.
The crux of the problem is that AI agents pull context from something else—other apps, agents, or data. A precursor to that problem is being able to communicate with something else. That’s the job of Anthropic’s Model Context Protocol.
Model Context Protocol is a spec for how LLMs or agentic systems should interface with other software. Anthropic's flagship analogy is that MCP serves as a USB-C port for AI, making it easy to connect accessories to a core device. The protocol is designed to be agnostic to which LLM or application is involved. To support MCP, the application spins up an MCP server that serves as a bridge between itself and the agentic system.
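To make the "bridge" concrete, here is a minimal sketch of what spinning up a server looks like with the official MCP Python SDK (the `mcp` package). The server and its `get_status` tool are hypothetical placeholders, not a real integration.

```python
# A minimal MCP server sketch using the official Python SDK's FastMCP helper.
from mcp.server.fastmcp import FastMCP

# The server is the bridge: it fronts an application and advertises
# capabilities that any MCP-compatible agent or LLM app can consume.
mcp = FastMCP("example-app")

@mcp.tool()
def get_status() -> str:
    """Report whether the underlying application is reachable."""
    return "ok"  # hypothetical placeholder logic

if __name__ == "__main__":
    mcp.run()  # serves the protocol over stdio by default
```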
To be clear, MCP does not solve our critical problem: it is simply the bridge between agentic systems and other software. It doesn't actually dictate how things should cross that bridge, or whether they should be subject to restrictions, redactions, or oversight. To best illustrate this, let's recap the various components of MCP, which are discussed in detail in this guide.
Tools are the specific operations that an agentic system can invoke. They are akin to an API's routes or an SDK's functions.
Imagine a hypothetical Salesforce MCP server. The agentic system might have access to tools such as "create_new_opportunity()" or "update_customer_note()."
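A sketch of how those two hypothetical tools might be declared with the same SDK; an in-memory dictionary stands in for real Salesforce API calls.

```python
# Hypothetical tool definitions for the imagined Salesforce MCP server.
import uuid
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("salesforce")
_notes: dict[str, list[str]] = {}  # stand-in for real Salesforce storage

@mcp.tool()
def create_new_opportunity(account_id: str, amount: float, stage: str) -> str:
    """Create a sales opportunity on an account and return its ID."""
    return f"opp-{uuid.uuid4().hex[:8]}"

@mcp.tool()
def update_customer_note(account_id: str, note: str) -> str:
    """Append a note to a customer's account record."""
    _notes.setdefault(account_id, []).append(note)
    return "note saved"
```

The docstrings matter here: they are the business context an agent reads when deciding which tool fits the task at hand.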
Resources allow the AI agent to pull in specific files from the integrated application. They are akin to a domain's CDN.
A Google Drive MCP server could allow an LLM or an AI agent to pull documents from the drive. The selection of a particular resource can be done manually (in the case of an LLM interface) or automatically (in the case of an agent).
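A sketch of how a Google Drive-style server might expose documents as resources; the URI scheme and the in-memory store are illustrative, not the real integration.

```python
# Illustrative resource: exposes documents under a hypothetical gdrive:// URI template.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("gdrive")
FAKE_DRIVE = {"q3-plan": "Q3 plan: grow enterprise pipeline 20%..."}  # stand-in store

@mcp.resource("gdrive://documents/{doc_id}")
def read_document(doc_id: str) -> str:
    """Return a document's text so it can be pulled into the agent's context."""
    return FAKE_DRIVE.get(doc_id, "")
```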
An MCP prompt is identical to an LLM prompt; it is just pre-written to assist in particular situations. Prompts are akin to code snippets in documentation.
Let’s take the example of Salesforce again. They could expose a “Summarize Account” prompt. This prompt has been handcrafted by Salesforce to provide good results for that given task. It goes into deep detail about what each element of data from Salesforce means and what a good account summary looks like.
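Sketched with the same SDK, a "Summarize Account" prompt might look like the following; the wording is ours and purely illustrative, not Salesforce's.

```python
# Hypothetical pre-written prompt exposed by the imagined Salesforce server.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("salesforce")

@mcp.prompt()
def summarize_account(account_name: str) -> str:
    """A handcrafted prompt for producing a useful account summary."""
    return (
        f"Summarize the account '{account_name}'. Cover the renewal date, "
        "contract value, open opportunities, and sentiment from recent call "
        "notes. A good summary is under 200 words and flags churn risk."
    )
```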
While actions aren't part of MCP's spec, they're closely related. Actions refer to the invocation of an MCP tool or set of tools. Accordingly, actions are a runtime construct and are dynamic; conversely, tools are defined at compile-time and are static. Actions produce a result for an AI agent to reason over.
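On the wire, an action amounts to one JSON-RPC `tools/call` request against a statically defined tool. A sketch of the shape, with hypothetical argument values:

```python
# One action: a runtime invocation of the statically defined tool named earlier.
# MCP frames it as a JSON-RPC 2.0 request; the argument values are hypothetical.
action_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "create_new_opportunity",
        "arguments": {"account_id": "acct-42", "amount": 50000.0, "stage": "Proposal"},
    },
}
# The server executes the tool and returns a result message for the agent to reason over.
```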
Anthropic designed MCP to address a major weakness of APIs: APIs do not explain the business context behind endpoints. Rather, APIs are great at telling you what can be done and giving you a mechanism to do it. They don't explain which endpoint or set of endpoints is needed to accomplish something material.
With MCP, there’s a dedicated endpoint into a product that’s specifically crafted for any AI agent to consume.
The complex nature of MCP can be misleading. MCP is detailed in how it packages context, but it doesn't address the challenges unique to *enterprises*, including governance, security, and authorization. Large companies cannot just pass over arbitrary data and hope for the best; they need to enforce existing standards to protect themselves from creating vulnerabilities, leaking data, and breaking compliance standards.
The evidence of this is the current state of the market. Today, MCP is mostly used for personal projects by developers and hackers. There is some adoption from younger startups. However, there is little adoption amongst enterprises outside of pre-production tinkering.
Enterprises differ from hobbyists and small companies in two principal categories: (a) enterprises have exponentially more data and data sources, and (b) enterprises have significant guardrails on how that data is accessed, used, and protected.
This creates two bounds for AI agents: a min function and a max function. The former ensures that an AI agent has enough context to make the right decisions; the latter ensures that an AI agent does not overstep its authority when fetching context and performing operations.
Previously, we discussed how enterprises have sprawling chatbots. That disarray extends to MCP: a collection of MCP servers doesn't make it easier to build effective AI agents if those agents don't know which MCP servers to interface with for an arbitrary task. For example:
An AI Agent needs to book a calendar invite with a prospective customer. The AI Agent can do that by invoking MCP servers connected to various products: Google Calendar, Zoom, and Calendly.
The need to unify context across MCP servers isn't just about choosing which server is apt, but also about determining which specific actions to take.
The AI Agent might be booking the calendar invite for Karol, an account executive. Karol uses Gong for recording external calls and uses Fireflies for recording internal calls. The AI Agent needs to know which call recorder to include. Additionally, Karol tends to add his manager, Piotr, to the call. The AI Agent should add Piotr to the call as an optional attendee, despite not having context of why Piotr typically joins.
These details can be considered nits, but they require the human to inspect an agent’s work and make manual corrections, undercutting the benefit of AI agents. For AI agents to effectively provide value, this shared context is essential.
This general concept can be labeled as institutional memory, where the AI agent leverages memory accrued by multiple MCP servers. But because MCP is siloed to just the host application and the client AI agent, it doesn’t immediately provide a route to combine context. That would require another bespoke solution.
AI agents cannot break the rules. They must follow the same restrictions that humans are subject to. This includes not accessing sensitive data and not taking actions that break compliance and security rules. These need to be enforced programmatically: because AI is probabilistic, we cannot just tell it what it can and cannot do and assume it'll follow those directions perfectly. For enterprises, there is no room for error. In other words, preventing AI from breaking the rules is a matter of limiting context and gating access.
There are three principles of providing guardrails to AI: Authorization, Governance, and Auditability.
Authorization
Every time an AI agent seeks to work with an MCP server, it must be authorized to access all or part of it. Additionally, the data that's passed to the AI agent needs to be authorized for the end user; if it isn't, it must stop at the AI agent rather than reach that user.
Let’s return to our Salesforce analogy. In this example, the agent will need access to information from Salesforce such as notes from the previous meeting, the upcoming renewal date, and the contract value. However, only authorized employees should be able to trigger an action that pulls this information because many companies consider this sensitive information that they don’t want to leak to competitors.
Simply put, AI agents should not be able to leak information that wouldn’t have been otherwise available to a specific user. For many enterprises, authorization is a matter of security, customer needs, and compliance. Accordingly, AI agents need to be stringently gate-kept from side-stepping authorization grants.
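A minimal sketch of that gate-keeping, assuming a hypothetical permission store: the check happens in code, against the end user's grants, before any data enters the agent's context.

```python
# Sketch: authorization enforced programmatically, never by prompt instructions.
# The grants table and scope names are hypothetical.
class NotAuthorized(Exception):
    pass

USER_GRANTS = {  # stand-in for a real identity / permission system
    "karol@example.com": {"salesforce:read_notes", "salesforce:read_renewals"},
}

def call_tool_as(user: str, scope_needed: str, tool, **kwargs):
    """Run a tool only if the end user behind the agent holds the required scope."""
    if scope_needed not in USER_GRANTS.get(user, set()):
        # Deny before the tool runs, so unauthorized data never reaches the agent.
        raise NotAuthorized(f"{user} lacks {scope_needed}")
    return tool(**kwargs)
```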
Governance
While authorization is a granular aspect of enterprise compliance, it doesn't capture the complex processes needed for safe operations. For example, an account executive might want to grant a steep discount to a large customer, and while they have the ability to create the invoice, they might need the CRO's rubber stamp before it's dispatched. This same principle applies to AI agents: policies and permissions need to be followed when taking action.
This framework is known as governance. To extend our analogy:
A Salesforce agent might want to change the renewal date on a contract, giving a customer an extra free month. However, to do so, the contract needs to be greenlit by at least one North American sales manager.
This example highlights the open-ended nature of governance. Governance is not just a series of gates—it could be complex rules that enable businesses to enforce good restrictions without stifling efficiency. Imagine if all sensitive financial decisions needed to be approved by the CFO—they’d never sleep.
Credal’s AI analog for this is an Action Release Gate, which blocks a given action until a responsible individual is able to approve or release the action.
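A rough concept sketch of such a gate (ours, not Credal's implementation): the action is parked, not executed, until someone with an approver role releases it.

```python
# Concept sketch of a release gate; roles and fields are hypothetical.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class GatedAction:
    description: str
    execute: Callable[[], str]
    approver_roles: set[str] = field(default_factory=lambda: {"na-sales-manager"})
    released: bool = False

    def release(self, role: str) -> None:
        """Mark the action as approved if the role may release it."""
        if role in self.approver_roles:
            self.released = True

    def run(self) -> str:
        if not self.released:
            raise PermissionError("Blocked by release gate: awaiting approval")
        return self.execute()
```

A call to `run()` fails until an authorized role calls `release()`, mirroring the human approval flow in the renewal-date example above.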
Auditability
Auditability, which is akin to observability from a development standpoint, provides a trace of which actions were taken by which AI agents (or humans), and what data was exposed to which agents at which times. Auditability is helpful for multiple reasons:
In general, having a record of what happened allows AI developers to create more robust applications.
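An audit trail can be as simple as one structured record per action. A sketch with illustrative field names:

```python
# Sketch: an append-only trace written around every agent action.
import json
import time

def audit(agent: str, on_behalf_of: str, tool: str,
          data_exposed: list[str], outcome: str) -> None:
    """Emit one structured trace line per action; field names are illustrative."""
    print(json.dumps({
        "ts": time.time(),
        "agent": agent,
        "on_behalf_of": on_behalf_of,
        "tool": tool,
        "data_exposed": data_exposed,
        "outcome": outcome,
    }))

audit("account-briefing-agent", "karol@example.com",
      "update_customer_note", ["salesforce:read_notes"], "success")
```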
Allowing AI agents to be effective at their jobs is only the first step towards an ideal agentic enterprise system. Just as software communicates with other software to accomplish complex tasks, or employees collaborate with other employees to solve problems, agents need to communicate with other agents. This paints the next frontier: agent to agent (A2A).
When thinking about how to enable multi-agent collaboration, two main challenges come to mind:
Currently, there is nascent work in the agent to agent space. Google released the A2A protocol, a universal interface for agents to interact with each other. While early, it has gotten commendation from technical leaders at Atlassian, Box, Cohere, Datadog, Elastic, LangChain, and PayPal. A2A is not a replacement for MCP, but a corollary protocol that provides helpful tools and context to agents. A2A is designed for agents that do and don't share memory and is built on existing network and file standards.
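Under A2A, each agent publishes a machine-readable "agent card" describing what it can do, so other agents can discover and call it. A loose sketch of that metadata as a Python dict; the field names paraphrase the early spec and may shift, and all values are hypothetical.

```python
# Loose sketch of an A2A-style agent card; values are hypothetical.
agent_card = {
    "name": "customer-health-agent",
    "description": "Summarizes account health from usage and support signals",
    "url": "https://agents.example.com/customer-health",
    "skills": [
        {"id": "health-summary",
         "description": "Produce an account health summary for an upcoming call"},
    ],
}
```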
Given the early adoption, it is still unclear if A2A is the future of agent to agent interactions. It is, however, the front-running candidate.
The second and larger concern is how an AI agent will discover other agents that it can use to accomplish a task. For smaller companies, this isn't a problem; they'll have a countable handful of agents that can be manually rigged together. But for massive enterprises with sprawling departments, offices, and geographies, there might be dozens or hundreds of specialist agents with varying data sources. This is akin to how a growth employee in a New York office might need to interface with a customer success employee in Manila to solve a problem, despite never having met before.
Imagine a customer health agent in Gainsight that needs to interface with a customer meeting agent to provide context for the upcoming check-in. How do these agents interact without having to be manually rigged together?
This discovery problem is non-trivial. A registry or directory service is required to help agents find each other based on capabilities, data access, or functional areas. It's not as simple as a DNS registry, where communications are based on directed requests. Instead, the registry needs to allow agents to search for and find the right partner agents.
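A toy sketch of the difference: instead of resolving a known name, as DNS does, the registry answers capability queries. Everything here is hypothetical.

```python
# Toy registry: agents register cards; callers search by capability, not by address.
REGISTRY: list[dict] = []

def register(card: dict) -> None:
    REGISTRY.append(card)

def find_agents(capability: str) -> list[dict]:
    """Return agents whose advertised skills mention the capability."""
    return [
        card for card in REGISTRY
        if any(capability in skill["description"].lower() for skill in card["skills"])
    ]

register({
    "name": "customer-health-agent",
    "skills": [{"id": "health-summary",
                "description": "produce an account health summary"}],
})
print(find_agents("account health"))  # -> [{'name': 'customer-health-agent', ...}]
```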
Here, we’d like to toss our hat into the ring. We’re particularly attuned to the problems that MCP and A2A are solving, and believe that there’s a critical infrastructure layer that’ll string these protocols together.
MCP is deterministic, designed for things like "create a ticket" or "fetch calendar events." It's great for well-defined, predictable interactions where what needs to be accomplished is precisely specified. Agents, by contrast, are stateful, meaning they remember context, act asynchronously, and often require streaming interaction. MCP assumes logic terminates at the tool, but agents need to call other agents, maintain ongoing conversations, and handle complex, multi-step workflows that might span hours or days.
The reality is that most enterprises aren't going to deploy raw MCP servers or implement A2A protocols directly. They need a platform that abstracts away the complexity while providing the enterprise-grade features they require. While MCP handles the basic plumbing of tool integration, and A2A provides the framework for agent communication, there's still a massive gap in the middle: how do teams actually deploy this in an enterprise environment with all the governance, security, and context requirements discussed earlier?
This is where Credal's approach becomes particularly relevant.
Credal is building the infrastructure layer that sits between these protocols and the actual business processes. Credal is an enterprise operating system for AI agents, operating at multiple layers. (a) It handles the authorization layer, making sure agents can only access what they're supposed to access. (b) It provides the context layer, ensuring agents have the institutional knowledge they need to make good decisions. (c) And it manages the governance layer by tracking what agents do and when they do it, and providing the audit trails and approval workflows that enterprises actually need.
Most importantly, Credal is designed to work with whatever protocols emerge as standards. Whether it's MCP, A2A, or something else entirely, the core enterprise requirements around security, governance, context management, and agent discovery remain the same. By focusing on these foundational needs rather than betting on specific protocols, Credal provides a stable platform for enterprises to build their AI agent strategies on, regardless of how the underlying standards evolve.
As AI evolves, we are seeing more and more pieces appear that are, thus far, disconnected. MCP, A2A, authorization systems, governance systems, and auditability are all disparate parts that need to work together.
We are skeptical that enterprises will thrive by stringing together a system between these multiple raw components. Instead, there needs to be an abstraction layer that serves as the foundation for agents to follow authorization, governance, and auditability requirements to not stifle success with their MCP- and A2A-powered interactions. That’s what we're looking to build at Credal—we aren’t a replacement for the protocols that connect agents with agents, applications, or data—but rather a foundation so those protocols can be actually leveraged to drive enterprise outcomes.
Credal gives you everything you need to supercharge your business using generative AI, securely.