Demystifying AI: Can Generative AI Replace Human Expertise in eDiscovery?
Daniel Barak

April 25, 2023

The bombshell of generative AI has been resonating in shockwaves of speculation across all industries, prompting discussions about increased productivity, lost jobs and, of course, the obligatory references to The Matrix and Skynet. A recent highly publicized analysis from Goldman Sachs estimated that as many as 300 million jobs could be lost or impacted, predicting that many jobs in the legal profession would have “high exposures” to AI automation. We have all seen plenty of panicked “hot takes,” and I’ve had my own moments of what I half-jokingly describe as existential dread.

With a background in computer science and almost 20 years in the eDiscovery world, I think a sober analysis of the impact of these advancements is much needed as a part of our continued conversations about a technology that some tech leaders have compared to the invention of the internet, or even electricity.

Here are some questions that keep coming up regarding ChatGPT and eDiscovery: If ChatGPT can pass the bar, is it going to perform document review on its own or even replace attorneys? How long before we can just ask the AI to find all the responsive documents? Many seem to assume that in a few months we will be able to ingest all of a case’s documents into a tool like ChatGPT and get all the answers. The reality, however, is more complicated.

Let’s start with some demystification. Here is how a system like ChatGPT works at its core: it tries to guess what word should come next in any given context. It sounds simple, but the more data the model has been trained on, the better it becomes at this guessing game.
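To make that concrete, here is a toy sketch of next-word guessing, written in Python purely for illustration (this is my own example, not how any real model works internally). It counts which word follows which in a tiny corpus and always predicts the most frequent follower; real LLMs use neural networks trained on billions of documents, but the core task is the same:

```python
from collections import Counter, defaultdict

# Tiny "training" corpus: count which word follows which.
corpus = "the court granted the motion and the court denied the appeal".split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the corpus."""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "court", the most frequent follower of "the"
```

The more text the model sees, the better those statistics get, which is why training on a large slice of the internet produces such fluent output.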

I will try to simplify some of the technical components and bring them down to earth, or at least give you a few AI terms you can use to sound cool at social events.

Some Key Terminology:

Large Language Model (LLM): A program designed to generate human-like text. It is built by feeding the model large amounts of data, from which it learns language structure, grammar and context.

Generative Pre-trained Transformer (GPT): That’s why they call it ChatGPT…because you are chatting with a GPT. Ok, but what is a GPT? It’s a form of Large Language Model that goes through two stages of training: first, unsupervised training (no humans involved) on massive amounts of data, followed by a fine-tuning process in which humans refine the model on a smaller dataset by performing tasks such as providing the correct answers to questions. The goal is to leverage its extensive training data to accurately predict and generate (the G in GPT) the next word in a sequence, creating coherent text in response to user inputs, also known as prompts.

Tokens: Small units of text (words or pieces of a word) that the model works with. This term is important because Large Language Models typically have a maximum number of tokens that they can process and respond to. This limitation impacts a model’s ability to handle long text like full documents or articles.
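If you want to see tokenization in action, OpenAI publishes an open-source tokenizer library called tiktoken. A quick sketch, assuming `pip install tiktoken`:

```python
import tiktoken

# Load the encoding used by GPT-3.5/GPT-4-era models.
enc = tiktoken.get_encoding("cl100k_base")

text = "The deposition transcript exceeds the model's context window."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")
print(tokens)  # a list of integer token IDs, not words
```

As a rough rule of thumb, one token is about three-quarters of an English word, so a 4,096-token limit works out to roughly 3,000 words for the prompt and the response combined.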

Generative AI in eDiscovery: Strengths and Weaknesses

Now that we have defined some key terms and have a general idea of how generative AI works, we can talk about its strengths and weaknesses in the context of litigation and eDiscovery. An important thing to remember is that generic Large Language Models have no knowledge of the documents in an eDiscovery case; they have been trained on a fixed set of data.

LLMs can be very effective at analyzing and summarizing content that is not part of their training data. In fact, there are many exciting and useful projects built around tools like ChatGPT that let you do things like chat with a PDF document and ask questions about its content. However, performing this process across a vast collection of documents, as is necessary in an eDiscovery case, introduces a distinct set of challenges.

There are essentially two ways that an LLM can provide insight into custom datasets. One way is to train the dataset into the model, essentially creating your own GPT that knows about the documents; the other is to use a generically trained model, such as GPT-4, and point it at a custom dataset stored in a separate database. Both approaches have advantages and disadvantages.

Training the dataset into the model

If you were to build an LLM by training an entire dataset into the model, you could then ask questions about the dataset and the LLM, if trained properly, could provide what appear to be answers. Why do I say “appear to be”? Because, as I mentioned earlier, generative LLMs such as GPT are designed to predict the next word in a sequence based on patterns seen before, and they sometimes generate text that sounds plausible but is not factually accurate. When an LLM generates inaccurate content, it is called a Hallucination.

Here is an example of a ChatGPT Hallucination. I asked the question, “What is the first word of the text of the second chapter in Alice in Wonderland?”

ChatGPT responded:
The first word of the text in the second chapter of “Alice’s Adventures in Wonderland” is “Alice.” The full sentence is: “Alice opened the door and found that it led into a small passage, not much larger than a rat-hole: she knelt down and looked along the passage into the loveliest garden you ever saw.”

This answer is incorrect. In fact, it gave a different answer each time I asked. The first sentence of chapter 2 actually begins as follows:
“Curiouser and curiouser!” cried Alice

Why do Hallucinations occur? Here’s an answer straight from ChatGPT: “Hallucinations in language models like ChatGPT happen because they learn from imperfect data and have limitations in understanding context. Researchers are working to improve these models to make them more accurate and reliable.”

I understand the irony of using a language model that can produce inaccurate text to explain why it can produce inaccurate text, but it’s not a bad explanation in this case.

Point an LLM at external data

The approach of using a generically trained language model, such as GPT, for eDiscovery purposes presents both potential benefits and inherent limitations. This method involves applying the pre-trained model to analyze a specific set of legal documents. One key advantage is that it eliminates the need to retrain the model for each new dataset. Furthermore, this method allows a user to reference the metadata, such as dates and email participants, of the documents used to generate the model’s responses.

Despite these benefits, there are important caveats to consider. One limitation is that the answers provided by the language model may not be exhaustive or fully representative of the entire dataset, as they are based on the model’s search and analysis capabilities.

Another challenge arises from the maximum token limit imposed by large language models, which restricts the amount of text they can analyze and respond with at a given time. As a result, the AI might face difficulties in analyzing entire documents, particularly lengthy or complex ones, in a single pass.
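A common workaround is to split a long document into pieces that fit under the limit and process them one at a time. Here is a minimal sketch, reusing the tiktoken tokenizer from earlier (the 2,000-token chunk size is an arbitrary example):

```python
import tiktoken

def chunk_text(text: str, max_tokens: int = 2000) -> list[str]:
    """Split text into consecutive pieces of at most max_tokens tokens.

    A production tool would split on sentence or paragraph boundaries
    rather than raw token offsets, which can cut a word in half.
    """
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]
```

Even with chunking, the model only ever “sees” one piece at a time, so an answer that depends on connecting facts across chunks can still be missed.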

These may sound like minor limitations, but what they mean practically is that while you could ask the LLM a question about your data, it would not necessarily bring back an answer that encompasses the entire dataset. The answer would be limited to the documents returned by a search run by the LLM, potentially overlooking crucial documents that could have led to a more accurate response.
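Here is a deliberately naive sketch of that “search first, then answer” flow. The keyword search and the stubbed-out LLM call below are my own stand-ins, not any vendor’s actual pipeline; the point is simply that anything the search does not return is invisible to the model:

```python
documents = [
    "2021-03-01 email: pricing discussion with Acme",
    "2021-04-12 memo: quarterly revenue forecast",
    "2021-05-20 email: follow-up on Acme pricing terms",
]

def search(query: str, limit: int = 2) -> list[str]:
    """Naive keyword search: rank documents by words shared with the query."""
    words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:limit]

def ask_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call."""
    return "[answer based only on the retrieved documents in the prompt]"

def answer_question(question: str) -> str:
    hits = search(question)  # documents not retrieved here are never seen
    context = "\n---\n".join(hits)
    return ask_llm(f"Answer using ONLY these documents:\n{context}\n\nQ: {question}")

print(answer_question("Acme pricing discussion"))
```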

There are other practical uses of pointing an LLM at eDiscovery data that do not involve attempting to interact with it the way we interact with a service like ChatGPT. It could summarize long documents or text message conversations. It could explain complex documents that would otherwise require input from an outside expert. Redaction tools can integrate LLMs to better identify and redact sensitive information and Personally Identifiable Information (PII). There are countless practical uses that do not involve asking questions of the dataset or attempting to have the LLM perform document review.
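As one example, here is a hedged sketch of single-document summarization using OpenAI’s Python library as it exists at the time of this writing (the API surface may well change, and it assumes an OPENAI_API_KEY environment variable is set):

```python
import os
import openai  # pip install openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def summarize(document_text: str) -> str:
    """Ask the model for a short summary of one document."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You concisely summarize documents for attorneys."},
            {"role": "user",
             "content": f"Summarize this document:\n\n{document_text}"},
        ],
        temperature=0,  # keep the output as consistent as possible
    )
    return response["choices"][0]["message"]["content"]
```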

Hybrid Approach

While ingesting eDiscovery data into an LLM and pointing an LLM at external data both currently face technical limitations, a hybrid method could bring the AI takeover of document review closer to reality. It will likely be a while before a generic LLM is capable of completely replacing human document reviewers, but language models fine-tuned for specific types of document identification could be used in the near future. This would involve training and fine-tuning various language models for specific types of litigation. For example, in certain employment litigation, LLMs can help identify patterns of discrimination, harassment or other workplace-related issues. In commercial litigation, LLMs can identify specific and relevant clauses or provisions within contracts, flag potential antitrust or competition law violations and review large volumes of email for relevant information. This process would involve the LLM reading the contents of each document and performing categorization, trained not on the specific dataset but on that category of litigation.
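To give a feel for what fine-tuning on a category of litigation involves, here is a sketch of the kind of labeled training file a team might prepare. The examples and labels are invented for illustration, and the prompt/completion JSONL layout loosely follows OpenAI’s fine-tuning format at the time of writing; exact schemas vary by provider:

```python
import json

# Hypothetical labeled examples for an employment-litigation classifier.
training_examples = [
    {"prompt": "Email: 'HR ignored my complaint about his comments again.' ->",
     "completion": " harassment"},
    {"prompt": "Email: 'Attached is the Q3 vendor invoice for approval.' ->",
     "completion": " not relevant"},
]

# Fine-tuning APIs typically ingest one JSON object per line (JSONL).
with open("employment_litigation.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```

A model trained on thousands of such pairs learns the category, not the case, so it can then be pointed at a brand-new dataset in the same kind of litigation.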

Another space to watch is companies such as Microsoft incorporating an implementation of GPT into compliance solutions. This integration could enable organizations and compliance officers to proactively identify and restrict the storage and dissemination of documents and messages that carry legal risk, mitigating potential lawsuits long before they materialize.

Privacy Risks and Ethical Considerations in AI and eDiscovery

Using services like OpenAI involves allowing an external service to read potentially sensitive information, so it may be necessary to sanitize the data for PII or other confidential information prior to ingestion. This risk is being mitigated as companies such as Microsoft and Amazon begin to offer enterprise-grade versions of generative AI that conform to strict security and compliance standards designed to protect sensitive data and adhere to various data privacy regulations. Additionally, there is a growing number of GPT-style LLMs that can run locally on an eDiscovery vendor’s servers, although at the time of this writing, few of them are authorized for commercial use.
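As a simple illustration of what sanitizing might look like, here is a minimal regex-based sketch. Real redaction requires far more than pattern matching (names, addresses and context-dependent identifiers will slip through), so treat this as the idea, not a solution:

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def sanitize(text: str) -> str:
    """Replace matched PII patterns with labeled placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(sanitize("Reach John at 555-867-5309 or jdoe@example.com, SSN 123-45-6789."))
# Note that the name "John" survives -- regexes alone are not enough.
```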

Key Takeaways

The integration of Large Language Models (LLMs) in eDiscovery presents a combination of opportunities and challenges. While LLMs have the potential to enhance certain aspects of document review, their current limitations prevent them from fully replacing human reviewers. A hybrid approach, focusing on training LLMs for specific litigation categories, may be a more viable option for improving eDiscovery efficiency in the near term.

It is also essential to address the ethical considerations and privacy risks associated with using LLMs in legal contexts. The possibility of inaccuracies in the generated content highlights the importance of maintaining human oversight in the review process. As LLM technology advances and becomes more integrated into legal and compliance solutions, it is crucial to develop a deeper understanding of its technical limitations to ensure accuracy and reliability when utilizing it in the context of eDiscovery.
