Recent studies of medical record analysis carry lessons for the legal industry. In those studies, researchers used advanced AI models to analyze medical records and extract key information, and their findings can help us leverage artificial intelligence (AI) for accurate contract analysis and extraction. In this blog post, we will explore how these findings relate to the legal field, focusing on the benefits AI can bring.

Automating Information Extraction: A Shared Challenge

Both the medical and legal industries face a common challenge: extracting important information from complex documents. In the medical field, doctors and researchers spend significant time manually extracting relevant data from patient records. Similarly, in the legal world, extracting essential details from contracts and legal documents is a labor-intensive process. Even when certain technologies are used, the quality control process can be quite time-consuming. To make this task easier, engineers have developed AI techniques, known as Natural Language Processing (NLP), that process language to extract information automatically. One important NLP task is named entity recognition (NER), where AI systems identify and classify specific pieces of information, such as the names of people, organizations and locations, in a given text. This helps computers understand the meaning of the text and perform associated tasks more efficiently. For example, given the sentence "Apple Inc. is planning to open a new store in New York City," a NER system would recognize that "Apple Inc." is the name of an organization and "New York City" is the name of a place, and tag those terms accordingly.
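To make the tagging step concrete, here is a minimal sketch of what NER output looks like for the sentence above. It is a toy: the `GAZETTEER` lookup table and the `tag_entities` function are hypothetical names invented for this illustration, and a real NER system learns to recognize entities from data rather than matching a fixed list.

```python
import re

# Hypothetical lookup table, for illustration only.
# A real NER model learns these patterns from annotated text.
GAZETTEER = {
    "Apple Inc.": "ORGANIZATION",
    "New York City": "LOCATION",
}

def tag_entities(text):
    """Return (entity text, label, start offset) for every lookup-table match."""
    found = []
    for phrase, label in GAZETTEER.items():
        for match in re.finditer(re.escape(phrase), text):
            found.append((match.group(), label, match.start()))
    # Report entities in the order they appear in the sentence.
    return sorted(found, key=lambda item: item[2])

sentence = "Apple Inc. is planning to open a new store in New York City."
entities = tag_entities(sentence)
for entity, label, start in entities:
    print(f"{entity!r} -> {label} (starts at character {start})")
```

The useful part is the shape of the result: each entity comes back with its text, its type, and its position, which is what downstream steps such as contract review tools consume.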

David versus Goliath or Size versus Training

In the 2023 study referenced above, researchers examined an AI model's ability to perform NER on medical records without being specifically trained to do so. This is called “zero-shot learning”: the model handles a task it was never explicitly trained on, without relying on supervised learning or fine-tuning for that task. The researchers specifically assessed whether a massive large language model (LLM) like GPT-3, with 175 billion parameters, could outperform a smaller, domain-specific model called BioClinicalBERT, which was trained explicitly to understand medical records but has only 110 million parameters.

Key Findings

The study revealed three important findings that have implications for the legal industry:

AI Models Improve Quickly. ChatGPT, released in November 2022, performed better than the previous model, GPT-3, which was released in June 2020. This shows that AI technology is constantly improving and that newer models are more effective at understanding complex documents like medical records.
Prompt Engineering is Critical. Researchers found that providing ChatGPT with clear instructions, known as prompt engineering, greatly improved its performance. By guiding the models with well-crafted prompts, we can get better results and make the AI understand what we want it to do.
Specialized Models Work Best. BioClinicalBERT, which was specifically trained on medical records, outperformed the more general AI models in accurately recognizing important information. This suggests that models specifically designed for a particular field and task perform better than those designed for general purposes.
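As an illustration of what "clear instructions" can look like in practice, here is a minimal sketch of a zero-shot prompt builder. It is not the prompt used in the study. The function name `build_zero_shot_prompt`, the clause types, and their definitions are all assumptions made up for this example, and the sketch only assembles the prompt text without calling any model.

```python
def build_zero_shot_prompt(document_text, entity_types):
    """Assemble a zero-shot extraction prompt: instructions only, no worked examples."""
    type_list = "\n".join(
        f"- {name}: {description}" for name, description in entity_types.items()
    )
    return (
        "You are reviewing a contract.\n"
        "Extract every passage that matches one of the clause types below.\n"
        f"{type_list}\n"
        "Answer with one line per finding in the form 'clause type: quoted text'.\n"
        "If a clause type does not appear, write 'clause type: none'.\n\n"
        f"Contract text:\n{document_text}"
    )

# Hypothetical clause types and definitions, chosen only for illustration.
CLAUSE_TYPES = {
    "indemnity": "one party agrees to cover the other party's losses",
    "limitation of liability": "a cap or exclusion on damages a party can owe",
}

sample_contract = "Supplier shall indemnify Customer against third-party claims."
prompt = build_zero_shot_prompt(sample_contract, CLAUSE_TYPES)
print(prompt)
```

Spelling out each type with a short definition and fixing the answer format are two common ways to reduce ambiguity; whether they help for a given model and document set is something to verify by testing.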

A 2019 study yielded similar results. In that study, researchers compared BERT (a general language model) to BioBERT (BERT fine-tuned on biomedical text), ClinicalBERT (BERT fine-tuned on clinical records) and BioClinicalBERT (BioBERT fine-tuned on clinical records). As one would expect based on the 2023 study, BioClinicalBERT yielded the best NER results on clinical records. Conversely, BioClinicalBERT did not yield the best results for finding protected health information (PHI) because it was not trained to accurately detect those particular terms.

We can derive valuable insights from these medical record studies that apply to the legal industry:

Clear Instructions Matter. To get the best results from AI, we need to provide clear instructions tailored to the task at hand. By helping the AI understand what we want it to find and extract in contracts and legal documents, we can achieve more accurate results. Prompt engineering strategies are readily available online, such as this prompt engineering strategy for developing contract negotiation playbooks.
Specialized Models Bring Accuracy. Using AI models trained specifically for legal tasks and documents, rather than general-purpose models, can significantly improve accuracy. These specialized models have the necessary knowledge and understanding to manage legal complexity effectively. For example, LEGAL-BERT, whether fine-tuned to address specific legal tasks or combined with GPT, could yield more effective results in the legal domain than a general-use model.
Transfer Learning is Key. Developing specialized AI models can be costly and time-consuming. However, transfer learning allows us to adapt existing models to specific tasks, making AI solutions more accessible and cost-effective for the legal industry.
AI Progress Happens Fast. AI technology, like GPT-3.5, continues to advance rapidly. Although GPT-3.5 did not outperform BioClinicalBERT, it showed significant progress without specific training. This suggests that AI models will continue to improve, becoming even more useful over time.
Understanding Limitations is Key. While AI technology is impressive, it is not perfect. Errors will still occur, such as misidentifying information or missing important details. To ensure accuracy, human involvement and quality control remain essential. In the 2023 study, false positives were the most prevalent issue with GPT-3.5, constituting approximately 80% of total errors. In other words, GPT-3.5 frequently assigned the wrong clinical entity type. The equivalent in contract extraction would be classifying an indemnity provision as a limitation of liability provision.
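To see where a model goes wrong, it helps to break its errors down by kind, much as the 2023 study did. The sketch below is one simple way to do that for contract extraction. The three error categories, the `categorize_errors` function, and the section labels are assumptions invented for this illustration; they are not the study's methodology or its data.

```python
from collections import Counter

def categorize_errors(gold, predicted):
    """Compare predicted labels to reviewer-approved labels and count error kinds.

    Both arguments map a text span (here, a section reference) to a clause type.
    """
    errors = Counter()
    for span, label in predicted.items():
        if span not in gold:
            errors["spurious span"] += 1   # model tagged something reviewers did not
        elif gold[span] != label:
            errors["wrong type"] += 1      # right span, wrong clause type
    for span in gold:
        if span not in predicted:
            errors["missed span"] += 1     # model overlooked a clause
    return errors

# Hypothetical review data, for illustration only.
gold = {
    "Section 9": "indemnity",
    "Section 10": "limitation of liability",
    "Section 12": "governing law",
    "Section 14": "termination",
}
predicted = {
    "Section 9": "limitation of liability",   # indemnity mislabeled
    "Section 10": "limitation of liability",
    "Section 12": "governing law",
    "Section 3": "termination",                # not a termination clause
}

errors = categorize_errors(gold, predicted)
total = sum(errors.values())
for kind, count in errors.items():
    print(f"{kind}: {count} of {total} errors")
```

A breakdown like this tells reviewers where to focus: a model that mostly mislabels clauses needs a different quality control pass than one that mostly misses them.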

Tips for Making AI Work for You

By leveraging specialized AI models, providing clear instructions, and understanding the potential and limitations of AI, the legal field can streamline information analysis and extraction. Since results are not yet perfect and model performance can differ depending on the task, here are five tips to help you get the most out of AI:

Assume that human-in-the-loop quality control will be necessary. The question is not whether humans will need to conduct quality control on machine outputs but when and at what level of effort. The goal of AI applications for NLP tasks should be to reduce human effort, not to eliminate it just yet.
Determine which LLMs have been incorporated into the tool. Before committing to a tool, explore which LLMs have been integrated or developed, how long the LLM has been part of the technology stack and, for extraction, the target accuracy score (expressed as an F1 score).
Benchmark against the most advanced models. GPT is among the most advanced LLMs available today, and it still has significant limitations. Accordingly, when exploring AI solutions, use GPT and its iterations as benchmarks to compare other models in terms of performance, scale, training data and investment.
Determine implications and drivers for accuracy. Among other things, training, size, scale and quality of underlying documents will impact model accuracy. Validate those drivers and work to create optimal performance environments to improve model performance.
Develop AI fluency and digital literacy. AI is both complex and pervasive. The best line of defense is to educate yourself and adopt a mindset of experimentation. That will prevent you from expecting more from technology than it can feasibly provide or being fooled into doing so.
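Since the tips above mention the F1 score as the way extraction accuracy is usually expressed, here is a short worked example of how it is calculated. Precision is the share of the tool's findings that are correct, recall is the share of the true items the tool found, and F1 is the harmonic mean of the two. The function name and the clause lists are hypothetical and exist only to show the arithmetic.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for a set of extracted items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical data: clauses a reviewer confirmed versus clauses a tool extracted.
confirmed = ["indemnity", "limitation of liability", "governing law",
             "termination", "confidentiality"]
extracted = ["indemnity", "governing law", "termination", "assignment"]

precision, recall, f1 = precision_recall_f1(confirmed, extracted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Here the tool found 3 of the 5 confirmed clauses (recall 0.60) and 3 of its 4 findings were correct (precision 0.75), giving an F1 of about 0.67. When a vendor quotes an F1 score, it is worth asking which documents and clause types it was measured on.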

To learn more about how AI can improve your legal operations or enhance your AI fluency, feel free to contact me at michael.callier@factor.law.

Unlocking the Potential of AI in the Legal Industry: Insights from Medical Record Studies
