Screenshot of prototype showing an answer to a question about casinos

There's no shortage of pundits making claims about AI and ChatGPT. This blog post isn't another one of those. It's an explanation of my own attempt to build an AI tool to answer anti-money laundering legal/compliance questions. You can try out the test system for yourself, with any questions about Canadian AML law: https://carrentis.com/aml/index.php. It's as interesting to see what it can do as what it can't. This post explains what I made, why I made it, and where AI might fit into the future of legal practice. Along the way, I give the results of the AI methods on various AML questions, with explanations of how each answer was developed.

One interesting takeaway: using the AI methods is only a bit better than using Google. This is probably because Google already incorporates similar techniques into its own question-answering system.

Why Make An AI Tool For Answering Legal Questions?

Lawyers make their living by answering questions about law, so a lawyer building a tool to do that doesn't sound like a smart career move. But the vast majority of legal questions go unanswered because people can't afford to pay a lawyer by the hour to answer them. That means many businesses that would like to pay something for an answer end up paying nothing. I think it would be fantastic to build a tool that does 50% of what a lawyer does at a price point approaching pennies per question, because it would help address this completely unaddressed need.

But I'm also just curious about whether recent advances have gotten close enough that this is now feasible. I think the answer to that question is a less-than-satisfying: sort of. Many questions can definitely be answered. But some types of questions stump the AI even where an answer is available on the FINTRAC website, and for many others the answer can't be found clearly on the FINTRAC website (or in the regs or statute).

What Did I Make?

The best way to learn about something is to go out and do it. So I built a prototype that uses two different methods to answer the question posed. The system runs both methods at the same time and displays them side by side so that I can see which approach works and which doesn't. There's a lot of optimizing that could be done, but my purpose here was just to get a system up and running quickly in order to test out some hypotheses about using large language models (specifically, the 4k- and 16k-context GPT-3.5 models behind ChatGPT).

If you're trying out the prototype (see link above), please note that this is not a finished, commercial product. It's for testing, and it has bugs that I'm not going to fix. A real version would be better built, but this one exists just to try out the methods and evaluate the results (see further below).

Method #1

Method #1 is to go through every page on the FINTRAC website and ask the AI to look for any legal rules, then extract them as if-X-then-Y propositions. This is a powerful simplification of law that I used in law school when making notes for exams. It misses some of the nuance and it's not really how law works, but it does create a much smaller search space that can be inspected for matching rules. When the user runs their query, the system looks through all of the 2000+ extracted rules to see which ones might be relevant. Then it checks the potentially relevant ones again in order to create an answer to the question. The final product is an answer, an explanation, and a set of sources.
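The retrieval step of method #1 can be sketched as a simple overlap score over the extracted rules. This is only an illustration: the rule texts, URLs, and scoring below are made up, and the real system relies on the LLM both to extract the rules and to re-check the shortlisted ones.

```python
# Minimal sketch of Method #1's retrieval step, assuming rules have already
# been extracted by the LLM into (condition, consequence, source) records.
# Rule texts and URLs here are illustrative, not real FINTRAC content.
from dataclasses import dataclass

@dataclass
class Rule:
    condition: str      # the "if X" part
    consequence: str    # the "then Y" part
    source_url: str     # FINTRAC page the rule was extracted from

def tokenize(text: str) -> set[str]:
    return {w.strip(".,?").lower() for w in text.split() if len(w) > 3}

def shortlist(rules: list[Rule], question: str, top_k: int = 5) -> list[Rule]:
    """Score every extracted rule by word overlap with the question and
    keep the top candidates for a second, more careful LLM pass."""
    q_words = tokenize(question)
    scored = []
    for rule in rules:
        overlap = len(q_words & tokenize(rule.condition + " " + rule.consequence))
        if overlap:
            scored.append((overlap, rule))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rule for _, rule in scored[:top_k]]

rules = [
    Rule("a place has more than 50 slot machines", "it is treated as a casino",
         "https://fintrac-canafe.canada.ca/..."),
    Rule("an MSB receives 10,000 CAD or more in cash", "it must report to FINTRAC",
         "https://fintrac-canafe.canada.ca/..."),
]
hits = shortlist(rules, "How many slot machines before a place counts as a casino?")
```

In the real system the shortlisted rules then go back to the LLM, which drafts the answer, explanation, and source list from them.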

Method #2

This is the method that proved to be much more effective. It's also one that's commonly used by other AI systems. It works by building a keyword index for all of the pages on the FINTRAC site. When the user runs a search, the AI system produces a set of likely keywords, the search system runs all of those searches and merges the results into a candidate list of pages, and each candidate page is then inspected to see if it helps to answer the question. Finally, an answer is prepared with an explanation and a list of sources.
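As a sketch, the candidate-page step of method #2 looks something like the following. The page texts and keyword sets are made up, and in the real system the keyword sets come from the LLM rather than being hard-coded.

```python
# Sketch of Method #2's search step: a keyword index over crawled pages,
# plus LLM-proposed keyword sets (stubbed here) whose results get merged.
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: word -> set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word.strip(".,")].add(url)
    return index

def candidate_pages(index, keyword_sets: list[list[str]]) -> set[str]:
    """Run one search per keyword set and merge the results."""
    merged = set()
    for keywords in keyword_sets:
        hits = None
        for kw in keywords:
            pages_with_kw = index.get(kw, set())
            hits = pages_with_kw if hits is None else hits & pages_with_kw
        merged |= hits or set()
    return merged

pages = {
    "https://fintrac-canafe.canada.ca/str": "How to file a suspicious transaction report",
    "https://fintrac-canafe.canada.ca/msb": "Money services business registration",
}
index = build_index(pages)
# In the real system an LLM proposes these keyword sets from the question.
keyword_sets = [["suspicious", "transaction"], ["str", "filing"]]
candidates = candidate_pages(index, keyword_sets)
```

Each candidate page would then be passed to the LLM to check whether it actually helps answer the question, before the final answer is drafted.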

Both Methods

Both of the methods I use are backed by a recent crawl of the FINTRAC website using a page-crawling system I developed for another purpose. The pages are extracted as a user would see them, by having an actual web browser visit each page. This is a more thorough method than the crawls used to train ChatGPT, but it probably doesn't make much difference here because the FINTRAC website is very simple and doesn't make much use of JavaScript or other features that reward a browser-based crawler. The search index covers all of the FINTRAC pages, around 2,500 of them, with some near duplicates that I didn't filter out.
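The duplicate filtering I skipped could be done with something as simple as hashing normalized page text. This toy sketch only collapses pages that differ in whitespace and case (true near-duplicate detection needs fuzzier matching), and the URLs are made up.

```python
# Toy duplicate filter for a crawl: hash the whitespace/case-normalized
# body and keep one URL per distinct hash. Example URLs are fictional.
import hashlib

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def dedupe(pages: dict[str, str]) -> dict[str, str]:
    """Keep one URL per distinct normalized page body."""
    seen = {}
    for url, body in pages.items():
        key = hashlib.sha256(normalize(body).encode()).hexdigest()
        seen.setdefault(key, (url, body))
    return dict(seen.values())

pages = {
    "https://example.ca/a": "Reporting  obligations\nfor casinos",
    "https://example.ca/a?print=1": "reporting obligations for casinos",
    "https://example.ca/b": "Registration for MSBs",
}
unique = dedupe(pages)   # the print-view copy collapses into the first URL
```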

Is This Better Than ChatGPT?

The online ChatGPT is a great way to prototype prompts and see how the APIs will likely behave, but it is not at all the same as the above methods. First, it doesn't cite sources, and it's really hard to get it to do that. Future versions of ChatGPT will presumably have this feature, but for now it doesn't do a good job of explaining how it came up with answers (which is related to the well-publicized phenomenon of hallucinations, AKA making things up). ChatGPT is also a point-in-time model that's nearly two years out of date. The above two methods are only ever as out of date as the web crawl they're based on (which could be refreshed monthly or faster).

ChatGPT isn't a great legal tool without sources. No one should ever pass along the answers it spits out about AML law or anything else about law. Nor would anyone want to rely on my two methods above on their own, but you don't have to when you can click through to the sources.

The Results

The results are pretty good! Method #2 beats method #1 about 90% of the time, which isn't surprising because it draws on more detailed information. Method #1 could be extended to look more like #2, and there's a lot more that could be improved in both, but I think the simpler approach of #2 is the way to go for these kinds of systems. It's basically a more advanced search engine that summarizes pages and presents them in a format closer to how people do their jobs. That's a powerful concept, but it's not quite the same as doing away with those jobs.

People aren't always right, but these AI tools are wrong more often. And the tools have a hard time judging their own confidence. For example, it doesn't seem feasible to ask whether method #1 or method #2 provided a better answer, because the AI doesn't have the background a person has that allows them to judge. Probably these tools should run in the background, with a human operator between the question asker and the AI system. A person can also retry the question with different phrasing if they see the answer is off base.

Example Questions

I ran my system on a number of questions. Below are the examples, with a note about whether each of method #1, method #2, and the synthesis of the two got the right answer. I also ran the same questions on ChatGPT online (prefacing them with "Canadian AML question: ") and tried searching on Google (in incognito mode) to see what appeared on the first page, prefacing the question with "FINTRAC question: ". The prefaces are necessary to provide the context that methods #1 and #2 get automatically, making it a fairer comparison. For Google, I judged the answer as correct based on whether it appeared on the first page of results or in an infobox.

How many slot machines can a place have before it counts as a casino?

Method #1: No

Method #2: Yes

Synthesis: Yes

Google: Yes

ChatGPT: No

Does the chief compliance officer have to be a full-time employee?

Method #1: Yes

Method #2: No

Synthesis: No. Right answer, but wrong explanation.

Google: No

ChatGPT: Yes

The ChatGPT answer is correct but rambles on quite a bit. Probably a user would prefer the more focused answer of method #1: "No, the chief compliance officer does not have to be a full-time employee," with the explanation given as:

Based on the rules used, there is no specific rule that states the chief compliance officer must be a full-time employee. However, the rules indicate that a business must develop and maintain compliance policies and procedures, ongoing compliance training program, and conduct regular reviews to assess the effectiveness of their compliance program. As long as these requirements are met, the chief compliance officer could be a part-time or contract employee.

How much cash can an MSB accept from someone before they have to report it to FINTRAC?

Method #1: Yes

Method #2: Yes

Synthesis: Yes

Google: Yes

ChatGPT: Yes

What system is used to file STRs?

Method #1: No

Method #2: Yes

Synthesis: Yes

Google: Yes

ChatGPT: No

How many condos can be sold before someone needs to register?

Method #1: No

Method #2: Yes. And the answer provided is subtly correct.

Synthesis: No

Google: No

ChatGPT: No.

Here's the answer from method #2

There is no specific number of condos that can be sold before someone needs to register. According to the text from the FINTRAC website, real estate developers must meet their obligations under the Proceeds of Crime (Money Laundering) and Terrorist Financing Act (PCMLTFA) when they sell five or more new houses or condominium units to the public in any calendar year after 2007. It is important to note that the requirement applies to the sale of new houses or condominium units and not to the total number of condos sold. Therefore, it is the sale of new houses or condominium units that triggers the obligation to register, rather than a specific number of units.

Here's the explanation of the answer from method #2

The text clearly states that a real estate developer must meet their obligations under the PCMLTFA when they sell five or more new houses or condominium units to the public in any calendar year after 2007. This means that the requirement is based on the sale of new houses or condominium units, regardless of the total number of units sold. Therefore, there is no specific number of condos that triggers the registration requirement; instead, it is the sale of five or more new houses or condominium units that triggers the obligation.

Does a gold dealer have to register? If so, what is the registration called?

Method #1: Yes

Method #2: Yes

Synthesis: Yes

Google: Yes

ChatGPT: No. Very wrong. ChatGPT confused DPMS and MSB.

I counted this as a "yes" for Google and both methods (and their synthesis) because it's close enough to true. Technically, becoming a DPMS isn't a registration; it's a category to which obligations attach. But since many DPMSs would end up registering to submit reports, it's true in one sense that they have to "register." This question shows the level of nuance these sorts of systems might one day be called upon to provide. A stricter view is that none of the responses were correct, but I'll give this one to the AI as a win: method #2 appropriately provided the correct citation to the main DPMS page on the FINTRAC website, which made it the most correct.

Does a life insurance agent have to register? Do they have to file anything regularly or have other reporting obligations?

Method #1: No. Very wrong.

Method #2: Yes

Synthesis: Yes

Google: No

ChatGPT: Yes. But this answer is not very good compared to method #2.

What is a PPP account?

Method #1: Yes

Method #2: Yes. This was a more detailed answer than that given by method #1.

Synthesis: Yes

Google: Yes

ChatGPT: No.

The most helpful result on Google wasn't actually a FINTRAC page but a couple of blog posts from McCarthy Tetrault that have the right answer. Mining law-firm blog posts would be a good way to build a quality question-answering system for law.

How much does it cost to register a cryptocurrency dealer with FINTRAC?

Method #1: No. This answer was very wrong; it suggested the fee was $100 (the right answer is that there is no fee in Canada).

Method #2: Yes

Synthesis: Yes

Google: Yes

ChatGPT: No. The answer provided isn't quite right: it says the fee isn't specified in the regs, which is true, but FINTRAC in fact tells people that there is no fee. So it isn't correct to imply the matter is unclear, as ChatGPT seemed to in its answer when I tried it. I also tried regenerating the answers, and none of them were better, so this isn't just random chance.

What is a company that sells crypto registered as?

Method #1: No

Method #2: No

Synthesis: Yes. What's interesting here is that the synthesized answer is actually better than either method alone, and the explanation given by the AI makes sense.

Google: No

ChatGPT: No. The answer is mostly right, but it mixes in foreign terminology (VASP) in a way that a grader would mark as wrong on a test. The reason is that ChatGPT is a global model, and it has a hard time localizing answers like this, even with a preface that should limit it to Canadian AML law.

How much cryptocurrency can be sold to someone before the customer must provide identity documents?

Method #1: No

Method #2: No

Synthesis: No

Google: Yes

ChatGPT: No. This answer appears right superficially, but it's actually all filler and never answers the question. This is a common problem with ChatGPT's answers, probably the result of OpenAI training the system to generate answers that humans think look right, rather than answers that are right, which is harder to do.

Method #1, method #2, and the synthesis approach all failed on this question. Probably the reason is that FINTRAC's website is structured to answer AML questions in a certain way, which is not the way a person would approach these issues. The answer can't easily be found because FINTRAC tends to say what people can't do, rather than what they can. There's also a problem connecting related subjects, because the system lacks an understanding of FINTRAC's glossary and governing laws. It could also just be a limitation of my poor-quality search system, since Google was able to answer this.

If a suspicious transaction happens, how long does a company have to make a report with FINTRAC from the day they discover something is suspicious?

Method #1: No. Very wrong.

Method #2: Yes

Synthesis: Yes

Google: Yes

ChatGPT: No

The Google first-page results for this question aren't really much of an answer, but I'd say they're correct enough to give Google this one.

Could This Be Improved With GPT4?

Yes. I'm quite sure some of the answers can be improved by using GPT4, which is much better than GPT3.5-turbo, the model I used for the answers above. How do I know? I tried answering the DPMS question above with the text of the DPMS page from FINTRAC included in the prompt, and the GPT4 answer better represents that DPMS isn't a registration but a category of obligations:

Based on the given text, the gold dealer is considered a Dealer in Precious Metals and Precious Stones (DPMS) and there's an implication that they need to comply with requirements under Proceeds of Crime (Money Laundering) and Terrorist Financing Act (PCMLTFA) and associated Regulations. However, the text does not explicitly mention a formal registration process specifically for DPMS. The obligations stated include maintaining a compliance program, knowing your client, reporting, record keeping, and complying with ministerial directives and penalties for non-compliance. If they fail to meet these requirements, they might face administrative monetary penalties.

GPT4 is 20x-30x more expensive than GPT3.5-turbo. Undoubtedly there are legal tasks that would justify that, but given that this is just an experiment to see what's possible, and what could be done as a low-cost mass-market service, I've skipped GPT4 for now. But I expect that price to come down, just like with the other LLM API costs. Perhaps next year GPT4 will be the default. These systems are rapidly improving and dropping in price.

Future Of Legal Practice

It wouldn't be a good idea to use a model like ChatGPT on its own to answer AML questions, because the models are so far out of date. In some areas of law, like AML, it is important to know the requirements as they are now, not as they were when the model was trained. This is the justification for extending the idea of ChatGPT with method #1 and method #2, which make use of new information from current crawls of the FINTRAC website. This limitation of ChatGPT caused the failure to answer the question above about PPP accounts.

ChatGPT's base online answers using the 3.5 model (the standard one deployed for unpaid customers and what most people call ChatGPT) are surprisingly poor quality. It's not enough on its own.

Method #1 seemed promising when I was first building it, but now that I see the results, I'm not sure it's a useful method for these kinds of questions. Method #2 is much more consistently right, although occasionally method #1 gets it right while #2 doesn't. Synthesizing the answers improves them a little, but it also obscures the sources from the original methods, so I'm not sure that's a plus for practical applications. The most important thing for a user of a service like this is that answers have good explanations and good, relevant sources. Getting a right answer is only part of the battle; I'd say it's better to have more wrong answers with better sources and explanations, because then the human operator can adjust their question or use what's provided to get to the right answer.

AI-Powered Research Tools

AI tools might take the tedium out of research. An automatically generated report that is mostly right can be taken by a person and cleaned up into one that's completely right, with sources, in a nice presentation. This could probably improve the efficiency of an AML lawyer or a bank compliance officer. Probably more than half of routine questions can be answered like this; in my experiments, the AI system got the right answer most of the time.

Where an AI system would shine is in doing more with the information than can be done with Google, like generating more complex reports or integrating other sorts of information. Method #2 was only a bit better than Google at answering questions, and it's quite naive in how it works; it could be upgraded quite a bit with vector search or even just a better-quality search system like Elasticsearch.
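As a sketch of the vector-search upgrade: embed the question and each page, then rank pages by cosine similarity. The three-dimensional vectors here are toy stand-ins for real embeddings from a model such as OpenAI's embeddings API, and the page names are made up.

```python
# Toy vector-search ranking: cosine similarity between a query embedding
# and page embeddings. Real embeddings have hundreds of dimensions; these
# 3-d vectors are illustrative only.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_pages(query_vec, page_vecs: dict, top_k: int = 3) -> list[str]:
    """Return page ids ordered by similarity to the query."""
    scored = sorted(page_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [page_id for page_id, _ in scored[:top_k]]

page_vecs = {
    "str-guidance":     [0.9, 0.1, 0.0],
    "msb-registration": [0.1, 0.9, 0.1],
    "casino-rules":     [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.2, 0.05]   # pretend embedding of an STR question
ranking = rank_pages(query_vec, page_vecs)
```

Unlike the keyword approach, this would match pages that use FINTRAC's terminology even when the question is phrased in everyday language, which is exactly where method #2 failed above.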

There are a number of commercial products exploiting AI for research. Lawyers are probably not far from adopting these kinds of research products, both as a first step and as a check for anything new that's happened that they might not be aware of. AI tools that work like method #1 and method #2 above can make sure the lawyer is up to date and won't miss anything.

These tools could be very affordable. Testing out my two approaches on many different questions today cost under $4 USD. Even running both methods side by side, the cost per question answered is around 10-15 cents USD. Not free, but in the context of legal answers, essentially free. That may end up being one of the more important parts of this technology, because it is very difficult for people to even get on the right track with legal answers. An AI research tool might get someone 50% of the way to the right answer, and that's all they need to get the remaining 50%. The price point could be 99% cheaper than a lawyer (and that's not an exaggeration; it really is that much cheaper).
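As a back-of-envelope check on the per-question figure, here is a toy cost model. The token counts and call counts are my own assumptions, and the prices are the mid-2023 gpt-3.5-turbo rates (USD per 1K tokens), which change over time.

```python
# Toy LLM cost model; token and call counts are assumptions, prices are
# the mid-2023 4k-context gpt-3.5-turbo rates in USD per 1K tokens.
PRICE_IN, PRICE_OUT = 0.0015, 0.002

def call_cost(tokens_in: int, tokens_out: int) -> float:
    return tokens_in / 1000 * PRICE_IN + tokens_out / 1000 * PRICE_OUT

# e.g. ~20 candidate-page checks at ~3K tokens in / 200 out each,
# plus one final synthesis call
per_question = 20 * call_cost(3000, 200) + call_cost(4000, 500)
# lands around a dime per question, consistent with the figure above
```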

The AI Future Of Law

The AI future of law is closer than I would have thought before doing these experiments. Even relatively quickly made tools, like my system, can produce quite good answers! They could probably be usefully deployed in companies like banks where there's a high volume of questions. An AI system can provide a draft answer for the lawyer to approve or not; bad answers can be modified and resubmitted, while good answers can simply be approved so the lawyer can move on with their day. This could improve efficiency and allow legal teams to start fielding more of the low-effort questions from staff. With the current model of assigning lawyers to files and treating every question as a complex matter, there isn't much bandwidth for the simple questions that many employees probably have. I would think this approach could improve a company's anti-money laundering practices.

The cost of running method #1 and method #2 together, and synthesizing them, is under 10 cents per question, so this could be a big gain in efficiency for legal teams.

If you enjoyed this blog post, give the tool a try yourself with your own AML questions to see how the AI can answer them: https://carrentis.com/aml/index.php.