
What Legl learned from introducing AI

Legl highlights key learnings on technical build, pricing, and prompt engineering after introducing artificial intelligence to its product and what it means for law firms

By Legl

Legl is often deeply embedded within a law firm’s risk and compliance processes, as well as other business processes like payment and client lifecycle management. As a result, we approached generative artificial intelligence (genAI) applications as an opportunity to add more value (a “smarter layer”) to existing workflows and law firm processes. With that in mind, we started to look at how we could transform from a system of record — a technology tool that automates core business processes — to a system of cognition — a technology tool that automates processes and then uses that data to enable law firms to make faster, better decisions, particularly in the risk management space.

Legl has invested hundreds of engineering and product hours into development with large language models (LLMs) from OpenAI and Anthropic. We wanted to share what we have learned – both (1) in our own product and (2) how that might apply to law firms’ adoption and implementation of AI tooling.

  1. What we’ve done and some learnings we’ve had about the technical build on top of both Claude and GPT

We started our genAI journey with over 30 hypothetical use cases for creating more customer value within our software as a service (SaaS) application, which we wanted to test from three angles: technically (were they possible to build), from a customer point of view (did they solve a real problem and add value for our law firm customers), and tactically (how would we build them).

We have built using API-based platforms from OpenAI and Anthropic, which provide us access to their latest advanced models (GPT-4 and Claude 2) and give us the flexibility to integrate AI capabilities into our SaaS application in a variety of ways.

Some of our key learnings:

     a. Starting with narrow problems yielded better results

Both GPT-4 and Claude 2 give remarkably good responses even with minimal context — the models demonstrate an impressive ability to generate coherent and relevant responses across a vast range of subject matter with minimal input.

While this is obviously a hugely valuable attribute, we also found it to be a double-edged sword at times. Often, there’s an inclination to apply these models to very broad problems and, despite their impressive capabilities, this can lead to poor or inconsistent outcomes. We’ve found that starting with a more specific and narrowly defined problem often yields far better results.

      b. Prompt engineering

An unavoidable aspect of working with LLMs is prompt engineering — i.e. crafting the prompts that get the desired output from the LLM — and it’s far from trivial. We’ve found that it takes considerable time to experiment with different approaches to understand how the LLM interprets and responds to prompts. There are many different strategies for improving or tailoring the output, but it’s hard to know in advance what will lead to better outcomes, so it’s an iterative process of changing a prompt and then assessing and comparing the output (taking into account different datasets or contexts).

Achieving the desired response can require the use of multiple prompts or a chain of prompts, which can rapidly escalate in complexity and make testing slow and complex. Each prompt must be carefully crafted, not only in terms of content but also with an eye on token counts.
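The iterative compare-and-assess loop described above can be sketched as a small harness. Everything here is illustrative: `call_model` is a stand-in for a real OpenAI or Anthropic API call, and the prompt variants are hypothetical examples, not Legl’s actual prompts.

```python
# Minimal sketch of a prompt-iteration harness: run several prompt
# variants against the same input so outputs can be compared side by side.

def call_model(prompt: str) -> str:
    # Placeholder: a real implementation would call the provider's API.
    return f"(model output for: {prompt[:40]}...)"

# Hypothetical prompt variants for the same task.
PROMPT_VARIANTS = {
    "terse": "Summarise the risk factors in this matter: {doc}",
    "role": "You are a compliance analyst. List the risk factors in: {doc}",
    "structured": "Extract risk factors from {doc} as a bulleted list, one per line.",
}

def compare_prompts(doc: str) -> dict[str, str]:
    """Run every prompt variant against the same input document."""
    return {name: call_model(tpl.format(doc=doc))
            for name, tpl in PROMPT_VARIANTS.items()}

results = compare_prompts("Client onboarding file, high-value transaction")
for name, output in results.items():
    print(f"--- {name} ---\n{output}\n")
```

In practice the assessment step (judging which output is better) is the slow, manual part; the harness only makes the comparison repeatable across datasets and contexts.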

Although context windows (the number of tokens you can include in a prompt) are increasing, token counts are still relevant for a few reasons. Firstly, you may simply hit the limit of the context window, in which case your request will fail. Secondly, despite the expanded context windows of the latest advanced LLMs, tokens in the middle of a long prompt tend to be overlooked or given less weight, affecting the response quality. Finally, there are rate limits to consider — assuming the use of a third-party API, there will be limitations on how many tokens per minute you can process.
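A simple pre-flight budget check can catch context-window overflows before a request fails. The numbers below are placeholders, not the real limits of any particular model, and the 4-characters-per-token heuristic is a crude approximation — a real implementation would use the provider’s own tokenizer.

```python
# Rough token budgeting before sending a prompt (all limits illustrative).
CONTEXT_WINDOW = 8_000       # example context window, in tokens
TOKENS_PER_MINUTE = 90_000   # example provider rate limit

def estimate_tokens(text: str) -> int:
    # Crude approximation: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_output: int = 1_000) -> bool:
    """Reject prompts that would overflow the context window once the
    expected response length is accounted for."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("short prompt"))   # True: well under budget
print(fits_in_context("x" * 40_000))     # False: ~10k tokens before output
```

Reserving headroom for the response matters because the output shares the same window as the prompt.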

So what initially appears as a straightforward problem can quickly evolve into a complex one, requiring nuanced and strategic prompt design.

      c. Pricing

As an aside, you’re typically charged on a per-token basis for using an LLM, meaning that costs depend on the length of the prompts and outputs. This can lead to unpredictable costs, as the sizes will vary. The user experience can also have a significant impact here, as an experience that involves multiple back-and-forth exchanges with the LLM, which is common, can rapidly increase the number of tokens processed.
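The cost dynamic above is easy to see with some arithmetic. The prices below are illustrative placeholders, not any provider’s actual rates; the point is that a multi-turn exchange re-sends the conversation history, so input tokens (and cost) grow with each turn.

```python
# Illustrative per-token cost model (prices are placeholders).
PRICE_PER_1K_INPUT = 0.03    # assumed $/1k input tokens
PRICE_PER_1K_OUTPUT = 0.06   # assumed $/1k output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request under the assumed prices."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A single exchange: 2.0 * 0.03 + 0.8 * 0.06 = 0.108
print(round(request_cost(2_000, 800), 4))

# Five back-and-forth turns: each turn re-sends the growing history,
# so input tokens scale with the turn number.
total = sum(request_cost(2_000 * turn, 800) for turn in range(1, 6))
print(round(total, 4))
```

Under these assumed prices, five turns cost roughly ten times a single exchange — which is why conversational UX patterns can make per-token billing hard to predict.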

      d. Data security

As with any change to a tech stack, data security is an important consideration when incorporating generative AI. We use Microsoft Azure for our AI products to ensure that we have control over where our data is stored and processed, and to ensure that our data isn’t used for further training of models.

       e. Speed

Compared to humans, LLMs are incredibly fast at understanding and responding to large requests, but in software engineering terms, they’re actually quite slow. Sophisticated general-purpose models, like GPT-4, can take quite a while to fully respond to a prompt, especially when the prompt or response is large. Less sophisticated models can be faster, but their capabilities are generally weaker (or they are designed for specific applications), making them less suitable for many use cases.

This creates a challenging UX problem — users may be willing to wait for some amount of time, but an experience that involves multiple LLM requests, with each one potentially taking as long as 30-90 seconds, is unlikely to be feasible.
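One common way to keep the UX responsive despite slow model calls is to bound the wait and fall back gracefully. This is a sketch, not Legl’s implementation: `slow_llm_call` simulates a real API request, and the timeout value is illustrative.

```python
# Sketch: bounding the wait on a slow LLM call so the UI can fall back
# rather than blocking indefinitely.
import concurrent.futures
import time

def slow_llm_call(prompt: str) -> str:
    time.sleep(0.1)  # simulate latency (real calls can take 30-90 seconds)
    return f"response to: {prompt}"

def call_with_timeout(prompt: str, timeout: float = 2.0) -> str:
    """Run the call in a worker thread; give up after `timeout` seconds."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_llm_call, prompt)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            return "Sorry, this is taking longer than expected..."

print(call_with_timeout("summarise this document"))
```

Streaming partial tokens to the user as they arrive is another common mitigation, since perceived latency drops even when total latency does not.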

[Chart: average and maximum response times for the LLM API we’re currently using]

This may partly explain why so many implementations at law firms are based on a chatbot-like experience, as the waiting time is quite easily incorporated into the UX. But the chat format may not always align with users’ needs or preferences, and an over-reliance on chat-centric UX design overlooks the potential for more innovative and diverse interfaces that are better suited to the task at hand. So part of our learning curve has been how to incorporate AI into our product in a way that suits the capabilities of AI but also caters to the needs of our users, while maintaining a high-quality user experience.

  2. Law firms’ technology stacks and genAI

LLMs require context in order to produce relevant and applicable outputs, and good context requires good data. In terms of structural approaches to better AI implementation, one area that law firms may need to invest more in over time is their underlying IT infrastructure. In our experience, many law firms operate with complex IT ecosystems, where data is scattered across various systems and not always structured or formatted in a way that makes it easily accessible. These data silos, often a result of legacy systems and disjointed software solutions, can lead to complex and inefficient data retrieval when firms come to implement AI tooling.

One way around this in an implementation journey is to focus on narrower use cases — as we found, addressing broad problems effectively requires a more extensive contextual understanding, which requires more data (as well as making the prompt(s) more complex, among other things). Hence, perhaps, the narrower chatbot approach that a number of law firms have taken.

Firms that want to target more advanced applications of AI that bring tangible benefits to their practice will need to consider the data infrastructure their LLM sits on top of, as well as the user experience and the key value for the firm. A first step is typically assessing and identifying key problems to solve – whether at a blue-sky level (i.e. how do we deliver the best client service imaginable) or a more tactical level (i.e. how do we reduce time spent on X).

It is also important to consider whether the best way to solve that problem is through something net new (an AI point solution, or an in-house or external build on top of LLMs) or through an existing solution that works across existing data estates (whether that’s a core infrastructure tool like AWS and/or a platform for risk and compliance like Legl). Either way (and the two are not mutually exclusive), an approach centred on the user and the value, while collecting structured learnings (from prompts to outputs to value and ROI), is most likely to drive the type of cohesive, data-driven implementations that meet the complex demands of the modern law firm.
