Twenty-First Century Delphic Oracle

A Lawyer (Me!) Looks At Artificial Intelligence


May 18, 2024
/ Author: Rick

I keep meaning to write a post about my experiences with ChatGPT-4, which I refer to as the “Twenty-First Century Delphic Oracle,” or just “the Oracle.” Those who know her well call her “Pythia” or “the Pythoness,” depending, I guess, on how they feel about her, where they’re from, or what language they speak.

(By the way, I’ve also tried other AI Oracles, like Gemini, with similar experiences to what I will talk about here.)

tl;dr version: I wouldn’t trust AI for actual legal work as far as I can spit Saturn.

A “Housekeeping” Note

I’m going to try something to significantly shorten this post for most readers. Because I will include some discussion of specific issues where you might not care about the details, I’ll use WordPress’s “accordion” blocks to include those. So you can easily skip over them if you want, or click on them to expand each section.

Without the accordions, this post is 1,416 words; with all the accordions, it hits 3,774. The WordPress prediction algorithm (driven by AI???) says it’s almost a 20-minute read if you read all the accordions. If you skip them, it’s about a 7-minute read.

There’s some meat hidden in those accordions, though. If you have the time, you’ll want to check them out.

The Ancient Oracle at Delphi

The Oracle at Delphi, one of ancient Greece’s most significant religious sites, was considered the voice of Apollo, the god of prophecy. Located on Mount Parnassus, the sanctuary featured Pythia, the high priestess who served as the oracle. Delphi’s origins date back to the 8th century BCE, and it remained influential for over a thousand years.

According to myth, Apollo killed the serpent Python, who originally guarded the site. He then established his oracle, with the priestess named the Pythia after Python. The Pythia entered trances, often induced by inhaling fumes from a chasm, to deliver cryptic messages. Priests interpreted these messages and relayed them to seekers.

Consultation involved elaborate rituals. Individuals and city-states sought the oracle’s guidance on personal dilemmas and state matters. The oracle’s prophecies were famously ambiguous, requiring careful interpretation. For example, when King Croesus of Lydia asked if he should attack Persia, the oracle replied he would destroy a great empire. This prophecy proved true when his own empire fell.

The reliability of the Oracle at Delphi is debated. Some ancient sources revered her insights, claiming her pronouncements were divinely inspired and accurate. Others pointed to the ambiguity and potential for misinterpretation, suggesting the oracle’s statements could fit any outcome. Accounts also indicate that political and financial influences sometimes swayed the oracle’s pronouncements, leading to skepticism about her reliability.

Despite criticisms, the Oracle at Delphi remained central in Greek culture. She embodied the intersection of religion, politics, and social life in the ancient world. The site, now an important archaeological location, continues to attract interest for its historical and cultural significance.

Twenty-First Century Delphic Oracle

By the time you reach the end of this post, you’ll understand why I refer to ChatGPT-4 and other artificial intelligence “chatbots” as twenty-first century Delphic oracles.

Artificial intelligence programs like ChatGPT are just one type of artificial intelligence, known as Large Language Models, or LLMs.

LLMs (and other AIs) don’t just spring into being ex nihilo. They actually require specific kinds of “coddling”: training with specific kinds of data. You just don’t find an AI sitting on a bus bench, or hanging out at the park, and say, “Hey! Get a job! Go out and read the Internet! Learn something!”

Computer scientists “train” these AI programs on huge amounts of curated, or “massaged,” data. In the end they create something that appears almost human, and seems to have reasoning, creativity, and emotional understanding. But the fact is, it doesn’t. AI’s dependency on human-designed algorithms and curated data underscores the point. AI is not an independent form of intelligence. Human intelligence creates (coding/programming), trains (with massaged/tagged data sets), and directs (through “prompts”) the program.

And while so far I’m talking about AI generally, Scott Greenfield — you almost had to know that was coming — points out one of the weaknesses of AI relating to a specific domain of knowledge: in this case, the law, and the idea that AI will fundamentally transform our legal systems, and, thus, our societies.

[T]he point is that the people pushing this notion, and creating the algorithms necessary to enable AI to play judge, jury and executioner, aren’t lawyers but computer nerds. They wouldn’t know a legally correct answer if it bit them in the butt. At least they have a grudging grasp that there might be issues.

— Scott Greenfield, “All Rise for Judge AI” (May 3, 2024)

Scott references numerous drawbacks to the fact that “computer nerds,” and not lawyers, drive the car.

Twenty-First Century Delphic Fumes

A part of Scott’s analysis implies — and I completely agree — that the issue really boils down to this:

[A]s we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones.

— C-SPAN Transcript, “Defense Department Briefing,” quoting Donald Rumsfeld (February 12, 2002)

These two types of unknowns impact the utility of our Twenty-First Century Delphic Oracle. While the Ancient Oracle inhaled “Delphic vapors” or “Pythian gases” before entering a trance and making predictions, our new oracles imbibe massive amounts of processed and spoon-fed text (and, for image-generating AIs, images). The cursor throbs as it enters its trance-like state in response to a user prompt: a question, sometimes “contextualized.” Then, the LLM spits out an answer. Maybe it’s useful. Or it might be ambiguous, requiring further prompting.

Maybe it’s a hallucination.

The Unknowns

When I first started writing this section, I was going to create separate lists for the known unknowns and the unknown unknowns. There’s a problem with this, though: the known unknowns often shade into unknown unknowns. Maybe that’s true of “unknowns” generally. I don’t know. But it’s particularly true when dealing with AI systems, like LLMs.

Ultimately, no one — not even the fumigated Twenty-First Century Delphic Oracle itself — knows what’s going on under the hood.

Large language models (LLMs) like the ones that power ChatGPT and Bard are different from revolutionary technologies of the past in at least one striking way: No one — not even the people who built the models — knows exactly how they work.

— Mark Sullivan, “The frightening truth about AI chatbots: Nobody knows exactly how they work” (May 17, 2023)

Therefore, I’m going to just talk about the various unknowns. I’ll list the known unknowns. If I can, I’ll drop hints about the unknown unknowns they may conceal. (I know. If they’re unknown unknowns, how can I do that? Well, let’s see what happens.)

We know that biases in training data can lead to biased AI outputs, but we may not fully understand the extent or nature of these biases until the models are deployed in real-world scenarios.

Technology does not exist outside of the biases and racism that are prevalent in our society. Studies show that facial recognition is least reliable for people of color, women, and nonbinary individuals. And that can be life-threatening when the technology is in the hands of law enforcement.  

Facial recognition automates discrimination.

— Rachel Fergus, “Biased Technology: The Automated Discrimination Of Facial Recognition” (February 29, 2024)(bold text in original)

That’s just one example. And it’s largely intractable. Partly, this happens because this “known unknown” encapsulates some “unknown unknowns.” As IBM said,

Such biases have a tendency to stay embedded because recognizing them, and taking steps to address them, requires a deep mastery of data-science techniques, as well as a more meta-understanding of existing social forces, including data collection. In all, debiasing is proving to be among the most daunting obstacles, and certainly the most socially fraught, to date.

— IBM Data and AI Team, “Shedding light on AI bias with real world examples” (October 16, 2023), quoting Michael Chui, James Manyika, and Mehdi Miremadi, “What AI can and can’t do (yet) for your business” (January 11, 2018)

The “unknown unknowns” are multi-layered here. They live within our society. We can’t solve the problems of homelessness, poverty, or war because we either don’t know what causes them, or we understand the driving forces only inchoately. They live in the data used to train our Twenty-First Century Delphic Oracle for the same reason. IBM states it more hopefully, hinting that “deep mastery” will solve the problem.

Good luck with that. Nor are these issues likely to be resolved by cross-domain fertilization; i.e., having lawyers, judges, and paralegals working with the “geeks” and “computer nerds” to develop AI programs and LLM training sets.
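To make the mechanics concrete, here is a toy sketch in Python. The numbers are invented (this is not real face-recognition data), and a single match-score threshold is a drastic simplification of any real system, but it shows how tuning one global threshold on data dominated by a majority group can leave a minority group with a far higher error rate:

```python
# Toy illustration: invented numbers, not real face-recognition data.
# Each row: (group, match_score, true_label). 90 rows from group "A",
# 10 from group "B"; B's score distribution is shifted, as can happen
# for groups underrepresented in the training data.
samples = (
    [("A", 0.200 + 0.005 * i, 0) for i in range(45)]         # A non-matches
    + [("A", 0.600 + 0.005 * i, 1) for i in range(45)]       # A matches
    + [("B", s, 1) for s in (0.36, 0.37, 0.38, 0.39, 0.40)]  # B matches
    + [("B", s, 0) for s in (0.46, 0.47, 0.48, 0.49, 0.50)]  # B non-matches
)

def error_rate(threshold, rows):
    """Fraction of rows misclassified by 'score >= threshold means match'."""
    wrong = sum((score >= threshold) != bool(label) for _, score, label in rows)
    return wrong / len(rows)

# Pick the threshold that minimizes error over the whole pool,
# which group A dominates nine to one.
best = min((i / 100 for i in range(101)), key=lambda t: error_rate(t, samples))

a_rows = [r for r in samples if r[0] == "A"]
b_rows = [r for r in samples if r[0] == "B"]
print("threshold:", best)                       # 0.51
print("error on A:", error_rate(best, a_rows))  # 0.0
print("error on B:", error_rate(best, b_rows))  # 0.5 (every B match missed)
```

The point of the toy: nothing in the optimization “intends” the disparity. It falls out of the imbalance in the data, which is part of why these biases are so hard to see from inside the system.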

While we can track and understand many aspects of AI training, there are still unknowns regarding how specific decisions are made by complex models, especially in large-scale systems like LLMs. Sometimes, AI models pick up on spurious correlations in the training data.

For example, previous trials in recognizing pneumonia with Convolutional Neural Networks (CNNs) show that these models rely on the metal tokens in the chest X-ray scans from different hospitals, instead of learning the pathological features of the disease.

— Wenqian Ye, et al., “Spurious Correlations in Machine Learning: A Survey” (February 20, 2024)
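A shortcut learner of this kind can be sketched in a few lines of Python. The data is invented, and the “model” is just a rule that picks whichever input feature best predicts the label, a crude stand-in for what a CNN can do with the metal token:

```python
# Toy sketch (invented data): the "model" simply picks the single input
# feature most correlated with the label during training.
# Each record: (has_metal_token, has_pathology_feature, label_pneumonia).
# In the training hospital, the metal token and pneumonia coincide exactly.
train = [(1, 1, 1)] * 40 + [(0, 0, 0)] * 40

def best_feature(rows):
    """Return the index (0 or 1) of the feature that best predicts the label."""
    def accuracy(i):
        return sum(r[i] == r[2] for r in rows) / len(rows)
    return max((0, 1), key=accuracy)  # ties go to feature 0, the token

feat = best_feature(train)

# A new hospital with no metal tokens: the shortcut collapses.
test_rows = [(0, 1, 1)] * 10 + [(0, 0, 0)] * 10
acc = sum(r[feat] == r[2] for r in test_rows) / len(test_rows)
print("chosen feature:", feat)  # 0, the spurious token
print("test accuracy:", acc)    # 0.5, i.e., chance level
```

Both features predict the training labels perfectly, so the learner has no reason to prefer the pathology over the token; only data from a different hospital exposes the difference.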


Ensuring that AI systems are accurate and reliable in all legal scenarios is a significant challenge due to the complexity and variability of legal contexts. Errors or misinterpretations by AI in the legal field could lead to severe consequences, such as wrongful convictions or unfair rulings.

The legal language’s nuanced nature and the intricacies of jurisdictional differences necessitate an AI’s deep and contextually aware understanding. A misinterpretation of legal terms or contexts by an AI could result in incorrect legal advice or rulings, undermining the justice system’s integrity.

The lack of transparency and explainability in many AI systems further complicates the issue, as it is crucial for legal professionals to understand and trust the rationale behind AI-generated recommendations. Continuous monitoring, rigorous testing, and adherence to ethical guidelines are essential to mitigate these risks and ensure AI’s effective and fair application in legal scenarios.

As Scott put it,

To the lawyer, the old joke that the only correct answer to a complex legal question is “it depends” isn’t a joke, but the real response. To courts, the answer is often hard and contentious. Indeed, the most celebrated cases, whether for progress or otherwise, have reflected a paradigmatic shift in the law. Think Brown following Plessy, or Heller or Roe, and then Dobbs.

— Scott Greenfield, “All Rise for Judge AI” (May 3, 2024)

Lawyers spend years learning the nuances. Court opinions evolve legal concepts over time.

And “geeks” and “computer nerds” — to use Scott’s terms — program LLMs.

We anticipate that AI and LLMs can behave unpredictably in new environments, but the exact nature of these behaviors and their impacts often remain unknown until observed in practice.

The most obvious way this happens has already been discussed above: data bias. Human-AI Interaction (discussed more below) introduces its own set of unintended consequences.

Another of my friends, Mark Draughn, writing about “How to Use AI in Your Legal Practice,” talks about contextualizing prompts. Indeed, search the Internet yourself, and you’ll see any number of courses (ranging from YouTube videos to online university courses) on “prompt engineering.” Why? Because not only are LLMs biased, they’re gullible (especially when prompted with leading questions).
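For readers who haven’t seen it, “contextualizing” a prompt is mostly string assembly. Here is a minimal sketch in Python; the helper and its parameters are my own invention, not any vendor’s API, and the second example shows the kind of leading question that can steer a gullible model:

```python
from typing import Optional

def build_prompt(question: str, jurisdiction: Optional[str] = None,
                 role: Optional[str] = None) -> str:
    """Assemble a prompt from optional context blocks plus the question."""
    parts = []
    if role:
        parts.append(f"You are {role}.")
    if jurisdiction:
        parts.append(f"Answer only under the law of {jurisdiction}.")
    parts.append(question)
    return "\n".join(parts)

# A bare question versus a "contextualized" (and leading) one.
neutral = build_prompt("What are the elements of burglary?")
framed = build_prompt(
    "Isn't it true that entry through an open door can't be burglary?",
    jurisdiction="California",
    role="an experienced criminal defense attorney",
)
print(framed)
```

The framing in the second prompt does two things at once: it usefully narrows the jurisdiction, and it unhelpfully invites the model to agree with the questioner’s premise. Much of “prompt engineering” is learning to do the first without the second.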

The long-term societal and ethical impacts of widespread AI deployment are partially understood, but there are many unknowns about how these technologies will interact with social systems over decades.

One International Monetary Fund blog post argues that AI will disrupt — modify or replace — up to 40% of jobs worldwide in coming years. The longer-term impact includes power structures within our societies, wealth gaps, social behaviors, and even climate change, in ways we can’t fully understand (for all the same reasons we don’t understand them now, like how LLMs work, and other unknown unknowns). Even our Twenty-First Century Delphic Oracle can’t hallucinate a prediction. I know. I asked. ScholarGPT responded with ideas, but stated it could not be predicted. It then provided these sources — which I have not checked, but which sound quite interesting.

Sources

  1. Brynjolfsson, E., & McAfee, A. (2014). The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W.W. Norton & Company.
  2. Manyika, J., et al. (2017). Jobs Lost, Jobs Gained: Workforce Transitions in a Time of Automation. McKinsey Global Institute.
  3. Topol, E. (2019). Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. Basic Books.
  4. Obermeyer, Z., & Emanuel, E. J. (2016). Predicting the Future — Big Data, Machine Learning, and Clinical Medicine. New England Journal of Medicine, 375(13), 1216-1219.
  5. Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence Unleashed: An Argument for AI in Education. Pearson.
  6. Holmes, W., et al. (2019). Artificial Intelligence in Education: Promises and Implications for Teaching and Learning. Center for Curriculum Redesign.
  7. Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. PublicAffairs.
  8. Acquisti, A., et al. (2015). Privacy and Human Behavior in the Age of Information. Science, 347(6221), 509-514.
  9. O’Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown.
  10. Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research, 81, 1-15.
  11. Amodei, D., et al. (2016). Concrete Problems in AI Safety. arXiv preprint arXiv:1606.06565.
  12. Floridi, L., et al. (2018). AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations. Minds and Machines, 28(4), 689-707.
  13. Allen, G., & Chan, T. (2017). Artificial Intelligence and National Security. Harvard Kennedy School Belfer Center for Science and International Affairs.
  14. Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP. arXiv preprint arXiv:1906.02243.

Potential weaknesses and exploitable vulnerabilities in AI systems can be anticipated but not fully known until they are discovered and exploited by malicious actors.

How might this impact lawyers? The legal system? Judges? I don’t really know, but I do know that almost all legal research tools are touting (and trying to sell lawyers on) their AI tools — nearly all based on LLMs.

Adversarial attacks, data poisoning, model inversion and model extraction attacks can impact law firms and courts in ways we don’t yet know. This includes manipulation of data, exposure of confidential data, and intellectual property theft (with modifications, if we’re talking model extraction).

As AI applications expand — again, I don’t know exactly how, nor probably does anyone else right now — these become bigger issues. What happens when we outsource control of in-house information storage systems to AI?
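Data poisoning, at least, is easy to illustrate. Here is a deliberately crude toy in Python (invented records, and a frequency-count “model” far simpler than any LLM) showing how a handful of injected training examples can flip an answer:

```python
# Toy data-poisoning sketch (invented records, not a real attack): a
# "model" that predicts the majority label it has seen for a keyword.
from collections import Counter

def majority_label(records, keyword):
    """Predict the most common label among records containing keyword."""
    votes = Counter(label for text, label in records if keyword in text)
    return votes.most_common(1)[0][0]

clean = ([("contract clause A", "enforceable")] * 6
         + [("contract clause A", "void")] * 2)
print(majority_label(clean, "clause A"))            # enforceable

poison = [("contract clause A", "void")] * 5        # attacker-injected rows
print(majority_label(clean + poison, "clause A"))   # void
```

Real poisoning attacks on large models are far subtler, but the principle is the same: whoever can write to the training data can tilt the output, and the victim may never see the injected rows.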

While we understand some aspects of how humans interact with AI, many nuances of this relationship, particularly in diverse cultural and socio-economic contexts, remain unknown. I already talked about the intricacies of prompt engineering. (This, by the way, can impact the results that attorneys using LLMs get while doing legal research.)

Lawyers using LLMs exemplify this complexity. The ways lawyers from different backgrounds interact with LLMs can vary significantly, influenced by their cultural norms, legal training, and the specific legal environments they operate in.

Moreover, the socio-economic context can influence how accessible and reliable these AI tools are for different legal practitioners. Lawyers in well-resourced firms might have access to the latest AI technologies and training, while those in underfunded legal services might struggle with outdated or less accurate systems. This disparity could exacerbate existing inequalities in the legal system.

Additionally, cultural differences in communication styles and legal reasoning can affect how lawyers interpret and use AI-generated advice, potentially leading to misunderstandings or misapplications of the law.

We know that AIs can generalize from their training data, but the boundaries of their generalization capabilities, especially when encountering vastly different data, are not fully known. While large language models (LLMs) like GPT-4 have shown impressive abilities to generate human-like text and provide apparently insightful responses based on patterns in their training data, we have already seen that they are not infallible. The models lack true understanding. They can and frequently do produce misleading or incorrect outputs when faced with unfamiliar or out-of-context information. This limitation is particularly critical in the legal field, where accuracy and context are paramount.

Legal professionals rely on precise language, specific terminology, and nuanced understanding of context that AI may not fully grasp. For example, while an AI might correctly generalize the structure of legal arguments from its training data, it could falter when novel legal issues arise or when interpreting complex legal statutes that were not well-represented in its training set. This unpredictability can lead to significant errors in legal reasoning or the generation of advice that might seem plausible but is fundamentally flawed. The lack of Artificial General Intelligence (AGI) means that LLMs operate without the deeper comprehension and reasoning abilities that humans possess, limiting their reliability in dynamic legal environments.

Don’t forget how LLMs are trained: huge amounts of data are dumped into their training sets. Much of this data is massaged, as I noted early on in this article.

Moreover, the legal field is characterized by constant evolution and variation across jurisdictions. An AI’s training data may not encompass the latest legal precedents or regional legal nuances, leading to potential gaps in its outputs. Lawyers and legal scholars must remain vigilant, understanding that while AI can assist in research and drafting, it cannot replace the critical thinking and interpretative skills of a trained legal mind. Ensuring that AI systems are used responsibly and in conjunction with human oversight is essential to avoid the pitfalls of over-reliance on these technologies, thereby safeguarding against the risks of wrongful convictions, unfair rulings, or other serious legal misjudgments.

As LLMs grow more complex, they may exhibit emergent behaviors not explicitly programmed. Predicting these behaviors in advance remains a significant challenge.

The ethical ramifications of AI decisions, especially in complex, real-world scenarios, are known to be significant but are often unpredictable and poorly understood.

We already talked about prompt engineering, and how prompts can affect output. But what about the use of AI systems for analyzing pretrial release risk factors? What about (I have no idea if someone is working on this, but since I thought of it, someone else probably has, too) using the Twenty-First Century Delphic Oracle to make sentencing decisions?

Remember our biases become the Oracle’s biases. And we don’t always know when, or how, that happens.

Ensuring AI systems comply with diverse and evolving legal standards globally is probably impossible, and unknowns about future regulations and their interpretations pose significant uncertainties.

Scott Greenfield talked about a lot of this, including the discussion of the so-called “legal singularity.”

The legal singularity refers to a theoretical point in the future when AI systems have advanced to such an extent that they can understand, interpret, and apply legal principles better than humans. It implies that AI would not only assist but potentially surpass human capabilities in all legal functions. The idea is inspired by the broader concept of the technological singularity, which predicts a point at which AI surpasses human intelligence across all domains, leading to unprecedented technological growth and societal change.

But as Scott pointed out,

But the point is that the people pushing this notion, and creating the algorithms necessary to enable AI to play judge, jury and executioner, aren’t lawyers but computer nerds. They wouldn’t know a legally correct answer if it bit them in the butt.

— Scott Greenfield, “All Rise for Judge AI” (May 3, 2024)

Scott also noted issues involving stare decisis, jurisdictional issues, and the evolution of law and society.

Jurisdictional issues alone probably make the singularity an impossibility. Unless we’re planning to cede total control of all human societies to our Twenty-First Century Delphic Oracle, and its descendants. Variations in statutes, rooted in variations in values, aren’t going to go away. Will the United States Constitution and issues of free speech give way to the demands of the GDPR?

Meanwhile, there are distinctive approaches to law itself. Some countries (or states) are common-law jurisdictions; others are civil-law jurisdictions. Within the United States, think Louisiana versus all the other states; or consider France. Canada, as far as I know, has a hybrid system with elements of both common law and civil law, for much the same reason Louisiana differs from the other states: it boils down to the influence of British common law versus the French civil system.

Developing AI systems with explainable and transparent decision-making processes can help legal professionals understand and trust AI recommendations. Not to mention the general public.

Especially if LLMs are used to analyze cases and defendants for pretrial release recommendations, risk assessments, and sentencing, there must be some way to understand how the AI arrived at its conclusions.

Audit trails for AI decisions provide insight into how conclusions were reached.
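What might such an audit trail look like? Here is one sketch in Python; the record fields are my own assumptions, not any vendor’s actual schema:

```python
# A hypothetical audit record for an AI-assisted decision. The field
# names and structure here are assumptions for illustration only.
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AIDecisionRecord:
    model: str      # model name and version used
    prompt: str     # the exact prompt, including any added context
    output: str     # the verbatim model output
    reviewer: str   # the human who accepted or rejected the output
    accepted: bool
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        """Serialize the record for an append-only audit log."""
        return json.dumps(asdict(self), sort_keys=True)

rec = AIDecisionRecord(
    model="example-llm-v1",
    prompt="Summarize risk factors for pretrial release in case #1234.",
    output="(model output here)",
    reviewer="J. Smith",
    accepted=False,
)
print(rec.to_json())
```

Even a log this simple answers the questions a reviewing court would ask first: which model, what exact prompt, what exact output, and which human signed off. What it cannot answer is why the model said what it said; that is the explainability gap the log sits on top of.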

The Bottom Line

Despite its drawbacks and the caveats that come from what I’ve written (especially in the accordions) — the fumes it’s breathed, if you will, that impact its predictions and pontifications — I have found my Twenty-First Century Delphic Oracle useful.

I don’t use it to write briefs, or contracts, or to collect data from clients. I’ll talk more about this in a second article, already in progress. Because, ironically, I started writing this article as a response to Mark Draughn’s “How to Use AI in Your Legal Practice.” It began life as a comment to that article on his Windypundit blog.

When it passed a few paragraphs, I erased the comment, and decided to write my own article. I remembered Scott’s article, “All Rise for Judge AI,” and thought I’d incorporate both articles in my own.

As it turns out, including the accordion blocks, I’m now surpassing 3,700 words. I suspect nobody has time for that. I hope there’s enough interest that people read the article. (Leave me a comment, please? Love it? Hate it? It’s okay? Or anything else you want to say.)

And I haven’t even started talking (much) about my favorite AI issue: Hallucinations.

For that reason, I’ve decided that I’m going to do another article — probably much shorter, but separate nonetheless — specifically addressing Mark Draughn’s ideas, and talking about the ways in which I have used various AI programs in my practice.

Stay tuned!

Did you enjoy this post? Leave a comment below! And if you haven’t already subscribed, click the button and get my free ePamphlet on “How to Hire a Criminal Defense Lawyer.”

2 Comments

  1. It sounds like you’ve spent a lot of time thinking about these issues. You don’t offer a lot of answers, but you are asking a lot of really good questions. That sounds about right. These are difficult questions with, as you point out, a lot of unknowns. When it comes to AI, we should all be suspicious of anyone who claims to have all the answers.

    One concept you touch on in several places is that LLMs don’t handle hard constraints very well. Software systems like databases and spreadsheets are all about structured data — what state something is in, how much it cost, whether it’s insured, etc. LLMs aren’t so organized. So if you ask a legal question and prompt the AI to only respond using the laws of Wyoming, it may not be able to comply completely, because its neural network doesn’t necessarily capture the state from every legal document and law review article it ever read. I would expect it to have similar problems distinguishing between different types of business organizations, financial thresholds, or the different laws that apply to adults and minors.

  2. Denise Chaffee says:

    Well written article on a fascinating subject! I am going to forward to several colleagues who seem keen on learning more about this.
