ChatGPT Court Brief Offers Cautionary Tale for Tax Pros, Too

June 13, 2023, 8:45 AM UTC

A lawyer’s recent, ill-fated use of ChatGPT to draft the bulk of a court brief is a cautionary tale for practicing attorneys about the functional limitations of generative artificial intelligence—and not just in the legal tech world. Tax professionals should take note before making a similar mistake.

The lawyer’s blunder, laid bare for all to see, isn’t really an AI-gone-awry story at all—it’s a lesson in what happens when available technology outpaces widespread technical knowledge in a domain. Large language models speak with conviction but often don’t have the facts to back up their assertions. In that way, they’re quite human-like, except you can’t throw ChatGPT under the bus and expect to escape reproach.

The corresponding stories for every industry are being written as we speak. Rather than wait for the tax story to unfold, maybe we can provide a pretend cautionary tale instead: a tale of two practitioners, each with their own bad ideas.

The Oversharing Accountant

Let’s say our first pretend practitioner has a lot of data—a spreadsheet of employees—that was going to take hours to “clean,” or organize. They saw ChatGPT as the key to their early weekend.

The plan? Simply dump the data into ChatGPT and ask for it in a different format. The entered prompt was something along the lines of, “Please take the names in this list and put them in a table grouped by employment status.”

It’s not a bad plan from a technical perspective. Reformatting given information into some other form is one area where language models really do excel, and many practices within the tax industry would benefit from automating those sorts of tasks. Unfortunately for the fictional practitioner, the data dumped into the model was ingested by this equally fictional version of ChatGPT.
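The mechanical part of the job, for what it’s worth, never needed a chat box at all. Here is a minimal sketch of doing the same reformatting locally in Python, assuming a hypothetical employees.csv file with “name” and “employment_status” columns (and the pandas and openpyxl libraries installed); the point is that no personal data ever leaves the practitioner’s machine.

    import pandas as pd

    # Hypothetical file and column names, for illustration only.
    df = pd.read_csv("employees.csv")

    # Sort so names appear grouped by employment status.
    grouped = df[["name", "employment_status"]].sort_values(
        ["employment_status", "name"]
    )

    # Write one worksheet per status -- the "table grouped by employment
    # status" the prompt asked for -- entirely on the local machine.
    with pd.ExcelWriter("employees_by_status.xlsx") as writer:
        for status, subset in grouped.groupby("employment_status"):
            subset.to_excel(writer, sheet_name=str(status), index=False)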

A few days later, let’s say another user requests information about what distinguishes an independent contractor from an employee and is given a response that contains the employee information the practitioner provided. The model has correlated information about employment status with specific details the practitioner also provided. Personal information has been leaked, and we’re in full-blown data breach territory—that’s a big problem for everyone involved.

Now, someone is reviewing Publication 4557 and reporting client data theft to the IRS. And the practitioner is explaining why they thought dumping personal information into a chat box was exercising the duty of care they owed their clients’ personal data.

ChatGPT’s owners claim the model doesn’t retain information gleaned from conversations with users for enhancement purposes. Even if that’s true now, it needn’t be the case permanently, and it certainly isn’t the case across all language models. Owners of these models don’t make them available as a public service; they see value in allowing the public to interface with them and in learning what prompts users are feeding them.

The moral of the first story is that personal information should never be included in prompts sent to any AI owned by a third party. A good rule of thumb: If you wouldn’t send something in a chat message to a random human user on the internet, don’t give it over to a random AI.

[Image: The ChatGPT logo on a smartphone in Brooklyn, N.Y., on March 9, 2023. Photographer: Gabby Jones/Bloomberg via Getty Images]

The Preparer With Too Much Trust

Our second practitioner made a more recognizable error: assuming ChatGPT’s tax research was accurate. The overestimation needn’t rise to the level of asking the model to draft a brief and cite sources; a simple question that turns on a nuanced semantic point can cause just as much trouble.

Here, our cautionary tale is about a pretend preparer who asked ChatGPT for a list of states that subject alcohol sales to their sales tax. They then confirmed that Massachusetts was among them by getting an affirmative answer when directly asking ChatGPT, “Does Massachusetts subject alcohol sales to sales tax?”

Massachusetts has a complex regime for taxing alcohol and alcoholic beverages that a simple “yes” answer fails to capture. There are separate wine, beer, and liquor taxes, and the general sales tax of 6.25% sometimes, but not always, applies.

Without complete insight into the model’s workings, it’s hard to say for certain, but the distinction between a “sales tax” on alcohol and an “alcohol tax” on sales of alcohol might not be immediately apparent to a language model. In that way, once again, the model is surprisingly human-like.
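To make the semantic trap concrete, here is a deliberately simplified sketch in Python. The contexts and outcomes below are illustrative assumptions standing in for the real Massachusetts rules, not a statement of the law; the only figure carried over from above is the 6.25% general rate.

    # Illustrative only -- not tax advice. The branch conditions are
    # assumptions showing why a one-word answer loses the nuance.
    GENERAL_SALES_TAX_RATE = 0.0625  # the 6.25% rate noted above

    def alcohol_tax_answer(context: str) -> str:
        """Answer 'does Massachusetts tax alcohol sales?' by context."""
        if context == "served_on_premises":
            # Assume the 6.25% rate applies at the register here.
            return "yes, a 6.25% tax applies at the point of sale"
        if context == "retail_bottle":
            # Assume separate per-gallon excise taxes were paid
            # upstream rather than a sales tax at the register.
            return "no sales tax at the register; excise taxes apply upstream"
        return "it depends -- which is exactly what a bare 'yes' hides"

    # The same question yields different answers depending on context
    # the original prompt never specified.
    for ctx in ("served_on_premises", "retail_bottle", "gift_basket"):
        print(ctx, "->", alcohol_tax_answer(ctx))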


The Takeaway

Tax practice is all about nuance. Tax has an entire body of case law—the sham transaction doctrine—devoted to disregarding transactions undertaken to give the color of legitimacy while existing solely to garner positive tax effects. Current language models wouldn’t be able to tell the difference between a transaction entered into for economic purposes and one entered into solely for tax benefits.

These errors illustrate how the proliferation of large language models with public-facing interfaces has changed the kind of research savvy required of users. A Google search using the same terms as the ChatGPT prompt above returns a Massachusetts government website explicating the alcohol tax regime. In that way, for better or worse, Google has already handled much of the work of determining whether a site is authoritative on a given topic.

We weigh myriad factors when determining a website’s credibility: the language used, how we came upon it, and its domain, to name a few. ChatGPT and the language models that are sure to follow short-circuit that credibility test. Even when the answer is incorrect, it will be written in proper grammar, without any waffling or weasel words, and with field-specific keywords that give it an overall air of expertise.

When relying on language models for research, it’s prudent to think of a quote often (incorrectly) attributed on the internet to Mark Twain: “There are two types of speakers: those that are nervous and those that are liars.” Notice that ChatGPT is never nervous.

Look for Leahey’s column on Bloomberg Tax, and follow him on Mastodon at @andrew@esq.social.
