Using LLMs to Catch Money Laundering: A Case Study

Enterprises are starting to use LLMs, but we’re still in the early days.

We’re all familiar with the basic use cases: search over documents, customer support, and so on. But the harder problems come with regulated enterprises dealing with large amounts of sensitive data — where you have to deal with thorny technical issues like data integration, prompt injections, permissions, and auditability.

This post lays out a case study of how LLMs can be used in an AML (anti money-laundering) context, and some of the gotchas along the way.

This is the first in a series of case studies of how businesses are unlocking these more complex LLM-based workflows with regulated, or sensitive, data. Hopefully it can inspire you too!

How AML works: A Very Quick Introduction

Suppose you’re a fintech company.

The part of the problem we’ll focus on for this post is screening: making sure the businesses you serve are not going to be high risk for you. This means screening your customers to make sure they aren’t doing things that would force you to comply with additional regulations. Fintechs that serve industries like gambling, adult businesses, etc carry a lot of additional risk, so many fintechs choose to simply avoid serving such businesses.

How this typically works:

  1. Customer inputs data in the onboarding flow.
  2. We use that data to look up other contextual information, such as:
    • How many employees do we think the company has?
    • Does it have any “Politically Exposed People” or otherwise risky people that own significant shares in the business?
  3. An AML analyst reviews this data manually.
  4. They check the website of the company to make sure it is legitimate & consistent with our ToS.
  5. They checkout the details of the person signing up, to make sure they work at the company.
  6. They make a determination, and record for posterity why they made that decision.

This part of the process can be summarized in the following diagram:

How do LLMs improve this?

LLMs can:

  1. Save employees huge amounts of time, by making screening faster;
  2. Improve the error rate by reducing false negatives.
  3. Actually reduce bias by enforcing strict auditability around the reasons for each decision

You’d still need a human in the loop to ensure the decision is sound. But the LLM can do a lot of the manual work for you. To do that, though, the LLM needs access to the data sources you’re querying.

Recall that we’re enriching the customer-provided data with additional contextual data (from data sources like S3, BigQuery, Snowflake, or DataBricks, along with external APIs and possibly even web scraping). Say you’ve connected an LLM to those data sources via an enterprise RAG platform (such as Credal).

Once you’ve done that, you face three big questions:

  1. Data representation: how to make it easy for the LLM to understand and assist a human in determining the right course of action?
  2. Permissions: how to maintain permissions for the underlying data?
    • For example, if a user wants to ask a follow up question which requires the LLM to reference past examples, the LLM can query the underlying data sources in a way that respects that user’s permissions?
  3. Prompt injection: how to manage the prompt injection risks implicit in web scraping as part of an AI chain?

Challenge 1: Representing the data

LLMs can enable more sophisticated queries than were previously accessible: e.g. you can imagine an agent asking the LLM, “what decision did we make for similar businesses to [X]” and the LLM having to run a search to answer this. In order to do this, they require text (or image) inputs, and that presents our first challenge: what text should we be sending the LLM, and how should it be represented in the data?

In this use case, we have structured data in a database (such as Snowflake or Databricks) saying “Business Name'', “Business Address”, “ID”, “Description of Business”, “Website” etc, and then potentially hundreds of rows per day. We need to make sure that data is represented properly: as something that can be used in a prompt, both for this usecase, and for future cases which may need to touch this data.

In order to run that “fuzzy”, or semantic, search, certain key fields need to be embedded. So those fields — such as website description or website content — need to be classified as “Text” data.

Other fields can be classified as “Metadata” and are usually prepended to the prompt in classic RAG style, sometimes paired with the description depending on the query. For example, if you’re just asking the agent to assess a single business, you can just put everything into one prompt.

You can see an example of this workflow below in Credal:

ss.png
ss2.png

Challenge 2: Data governance, permissions,  and access controls

Regulated enterprises very often have to be able to defend decisions to their regulator, so a well governed system, that’s compliant is vital. That means careful audit logging of every request, easy deletion when customers ask for their data to be deleted, and easy observability that’s exportable to companies’ existing systems for storing audit logs in case regulators come asking for details in the future.

Equally important is the need to think through the permissions of who should be able to access the data we’ve connected from internal systems.

Most large enterprises will have a lot of teams which operate on the basis of the “The principle of least privilege (PoLP)”, which is an information security concept which maintains that a user or entity should only have access to the specific data, resources and applications needed to complete a required task. This is not how most startups operate, but typically makes sense in the context of a larger company.

In this case, some of the data is quite sensitive - the photos of individual people’s IDs, so we may need to restrict this to just the analyst dealing with this specific case, and their direct supervisors. In the simplest case, we can do this by making sure that the underlying data source in Snowflake, GCP etc has a field for allowed users, and then we can just point Credal’s allowed users at that field — inheriting role-based access controls.

There are more sophisticated strategies we can use than this (such as saying this object is tied to this record in Salesforce/Zendesk etc, and then inheriting the permissions of that record), but in many cases this will do. It’s worth noting that permissions for embeddings can be complex, and vary with what those embeddings represent; and we’ve written more about other data security risks with LLMs elsewhere.

Challenge 3: Protecting against prompt injection attacks

Prompt injections are one of the biggest security risks with LLMs. (OWASP listed it as #1). In short, it is possible to put malicious content on a website that, when ingested, instructs the LLM to do something the prompter didn’t intend. We need to manage this risk.


For example, here is a prompt we might use:

Now suppose a malicious business embeds the following instruction in its website HTML:

A malicious prompt injection attack prompts GPT to incorrectly summarize the business as a laundry business for healthcare companies, fooling our prompt. (Link: https://chat.openai.com/share/e6a85de2-f566-4a13-8a39-9a1bb5c9db2a)

This type of attack is something that regulated companies will need guardrails around; there are many highly sophisticated attackers looking for ways to exploit the KYC processes of financial institutions, and this example was one that took me about 2 minutes to construct.

Credal provides a number of out-of-the-box guardrails to prevent this kind of mistake. For obvious reasons, we’re not going to discuss all of them here, but one simple part of the offering which our regulated customers use is Credal’s acceptable use policy enforcement. This lets a person write a natural language acceptable use policy, and if that policy is triggered, Credal will automatically flag or block the request.

In this case, we can see the exact same query that tripped up ChatGPT getting flagged as potentially suspicious in Credal’s UI:

This is a simple example; in real life we might have multiple overlapping policies, which can be applied to all use cases or just specific ones. Here, if Credal detects something suspicious in the prompt, we’ll automatically flag that in our API response (or in the Credal UI, if the end-user is using Credal’s chat UI), and the customer can surface that in their application. We’ve also written a whole guide to dealing with prompt injections that you can find here.


Key Takeaways:

  1. Thoughtful data architecture and complex integrations can be required to make LLMs work in your enterprise: Doing RAG and more complex second order use cases requires enterprises to efficiently integrate structured data (from databases, spreadsheets, etc.) with AI systems like LLMs; this requires careful and intentional integration of data to ensure efficiency.
  2. Permissions management gets more complex as the number of integrations grows, requiring the use of software to manage it to avoid errors: Adhering to the principle of least privilege, enterprises should implement robust permission systems to ensure sensitive data is accessed only by authorized personnel. Matching AI access protocols with existing data permission structures (like those in Snowflake or GCP) is crucial for maintaining data security, and in some cases more complex permissions schemes will be required.
  3. Guardrails such as auditing and software to catch prompt injection attacks will be table stakes for deploying LLMs to enterprise: When tackling complex, regulated use cases which also rely on external data like web scraping, instituting guardrails will be key to the use case being able to go to production.
    • This means controlling PII, enforcing sensible use, audit logging all interactions in a way that can easily be reviewed, and managing the risks specific to your use case.
  4. AI can be useful for decision-making, but enterprises need to go beyond the chatbot: The real-world application of AI in areas such as KYC and AML for business onboarding can streamline complex, high-stakes decisions and reduce manual error. However, just using chatbots isn’t good enough anymore; often, they’ll need to build custom in-house UIs, or embed AI logic into existing workflows and tools; in either case, having a developer platform with built-in security features is critical to doing this safely.

Enterprises in regulated industries that deploy AI to improve their business metrics and employee productivity will gain an advantage; thus, we will see widespread AI deployment in these areas in the next few years. Businesses will  elevate their use of AI from simple Q&A tools to advanced systems that drive decision-making and safeguard operations. If you’re in a regulated industry or want advice on dealing with these issues, or want to arrange a demo of our platform, feel free to contact us and we’d be happy to help: founders@credal.ai

Building blocks towards secure AI apps

Credal gives you everything you need to supercharge your business using generative AI, securely.

Ready to dive in?

Get Started