The Place of Machine Learning in the Legal Realm
We often overestimate the ability of human judgment to predict the probability of an outcome. Our reliance on instinct, previous experience, or rules of thumb frequently leads us to make unreliable predictions. Studies have shown that formal algorithmic predictions can surpass human judgment that is based on informal contemplation and discussion.
My colleagues at the University of Toronto and I have applied recent advances in machine learning to create a system that uses algorithmic predictions to augment human judgment in the legal realm. By drawing on data extracted from previous judicial decisions, our machine learning system can make predictions about how courts would rule in new scenarios. Each prediction is accompanied by a confidence percentage reflecting the likelihood of the outcome. The algorithms base their predictions on the merits of the case, as reflected in the facts of past decisions. These predictions are accurate in over 90 percent of out-of-sample cases, that is, cases the system was not trained on.
In a previous article, we saw the effects of behavioral control factors on the outcome of a worker classification case. By toggling various factors, we saw how weighing the dynamic relationships between factors allows the machine learning algorithm to arrive at a prediction. In this article, we’ll explore the application of machine learning to another tax problem: the question of whether the gains or losses from a real estate disposition should be taxed as capital or ordinary income. Machine learning can detect the relative importance of factors by leveraging data from previous case law. This approach allows us to ascertain which factors have the greatest impact on a predicted outcome for the determination of income vs. capital.
Income or Capital
The question of whether gains or losses on the sale of real estate should be treated as ordinary income or as capital gains is particularly relevant for individual taxpayers, since long-term capital gains are generally taxed at a lower rate than ordinary income.
“Capital assets” are defined by exclusion in tax code Section 1221. For the purposes of real estate dispositions, the key exclusion is paragraph 1221(a)(1), which states that property is not considered a capital asset if it is “held by the taxpayer primarily for sale to customers in the ordinary course of his trade or business.” This exclusion figures prominently in real estate tax cases because the outcome often turns on the extent of the taxpayer’s involvement in the real estate business.
Courts have developed and used several different multi-factor tests to assist in the determination of whether a gain or loss on the sale of real estate is on account of income or capital. The tests most commonly referred to are the seven-factor test set out in United States v. Winthrop, and the more recent three-step test outlined in Suburban Realty Co. v. United States:
1. Was the taxpayer engaged in a trade or business?
2. Did the taxpayer hold the specific property at issue primarily for sale in that business?
3. Were the sales made in the ordinary course of that business?
Factor tests are useful in taking account of the facts of the situation, but mere enumeration of factors does not necessarily lead to an accurate prediction of the outcome. Unlike predictive methods that rely on tallying up factors or employing more traditional forms of statistical analysis, machine learning allows for a nuanced approach that reflects the complexity of the legal question at hand. While lawyers can only draw on a handful of leading cases, machine learning can digest the facts of hundreds of cases in order to parse out the relative importance of each factor.
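To make the contrast between tallying factors and learning their weights concrete, here is a minimal sketch in Python. It is not our production system. The eight cases, the three factor names, and the majority-vote tally rule are all hypothetical, invented purely for illustration, and plain logistic regression stands in for whatever model a real system might use. The toy data is constructed so that one factor (frequent sales) drives the outcome; a simple tally misses this, while a fitted model picks it up.

```python
import math

# Hypothetical toy data, NOT drawn from real case law. Each row is
# (frequent_sales, development_activity, solicited_buyers), coded 1 = present.
# Label 1 = ordinary income, 0 = capital. By construction, the outcome
# follows frequent_sales alone.
CASES = [
    ((1, 0, 0), 1), ((1, 1, 0), 1), ((1, 0, 1), 1), ((1, 1, 1), 1),
    ((0, 1, 1), 0), ((0, 1, 0), 0), ((0, 0, 1), 0), ((0, 0, 0), 0),
]


def tally_predict(factors):
    """Predict ordinary income when a majority of factors are present."""
    return 1 if sum(factors) >= 2 else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(cases, learning_rate=0.5, epochs=5000):
    """Fit logistic regression by plain batch gradient descent."""
    n_features = len(cases[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for factors, label in cases:
            z = bias + sum(w * x for w, x in zip(weights, factors))
            error = sigmoid(z) - label
            for j, x in enumerate(factors):
                grad_w[j] += error * x
            grad_b += error
        # Step along the mean gradient over all cases.
        weights = [w - learning_rate * g / len(cases)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(cases)
    return weights, bias


def model_predict(weights, bias, factors):
    z = bias + sum(w * x for w, x in zip(weights, factors))
    return 1 if z > 0 else 0


def accuracy(predict, cases):
    return sum(predict(f) == y for f, y in cases) / len(cases)


weights, bias = fit_logistic(CASES)
tally_accuracy = accuracy(tally_predict, CASES)
model_accuracy = accuracy(lambda f: model_predict(weights, bias, f), CASES)

print("tally accuracy:", tally_accuracy)
print("model accuracy:", model_accuracy)
print("learned weights:", [round(w, 2) for w in weights])
```

Note that both accuracies here are measured on the same toy cases used for fitting, so they say nothing about out-of-sample performance. The point is only that a learned weighting can separate cases that an unweighted count of factors gets wrong.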
To demonstrate machine learning’s sensitive grasp of the relevant factors, let’s examine a recent case determining the nature of gains from a real estate sale.
Boree v. Commissioner
In November 2002, Gregory Boree and a partner, doing business as Glen Forest LLC, acquired nearly 2,000 acres of vacant real property in Baker County, Fla. Glen Forest then engaged in a series of development activities, including pursuing rezoning and obtaining county approval for the initial phases of development. Beginning in late 2004, the county adopted a series of land use restrictions affecting Glen Forest’s property, West Glen Estates. In 2007, Glen Forest sold just over half of the acres of the West Glen Estates property to a developer, Adrian Development.
The issue to be determined was whether the Borees’ gain on the sale to Adrian was taxable as ordinary income or a capital gain. The U.S. Tax Court held that the gain was not a long-term capital gain. The taxpayers appealed, and the U.S. Court of Appeals for the Eleventh Circuit affirmed the ruling on tax liability.
According to our machine learning algorithm, the facts of the Boree scenario result in a finding of “not a capital asset” with a confidence level of 94 percent.
The taxpayers’ first contention before the court was that land use restrictions imposed by Baker County in 2005 and 2006 made further development “practically impossible.” As a result, their purpose in holding the property had changed from sale to investment.
Even if we credit the taxpayers’ argument and adjust the scenario for a change that made it difficult but not impossible to fulfill their primary intention of development, the finding remains “not a capital asset” with 92 percent confidence.
In this case, the taxpayers’ contention that their purpose in holding the property had changed makes only a minor difference. A closer look at the facts shows that the other two steps of the Suburban Realty test (the taxpayer’s engagement in a trade or business, and whether the sales were made in the ordinary course of that business) both point to a finding of ordinary income. The taxpayers pursued development activities throughout the period of holding the property. In addition, the Tax Court found that between 2002 and 2006 the taxpayers made “frequent and substantial” sales to customers in the ordinary course of business, even after the imposition of the land use restrictions. Our algorithm suggests that these two factors outweigh any government restrictions or changes in intention in this case.
Testing an Alternative Scenario
Beyond changes of intention, our algorithm is able to account for numerous other factors in making its predictions, including the number of sales, the motivation for selling, and the circumstances of the disposition. Let’s explore how factors informing the first and third elements of the Suburban Realty test can affect the confidence level of the prediction.
In this instance, let’s suppose the Borees were not clearly engaged in the business of developing and selling real estate property. Imagine that after the county started to impose land use restrictions in 2004, the taxpayers stopped selling as many lots and began to dedicate more time to their other businesses, such that sales of the West Glen Estates property were no longer their primary source of income. Let’s also suppose that the sale at issue was initiated by an unsolicited offer from a potential purchaser. How would the outcome change in this scenario?
When we change these factors, the algorithm still offers a prediction of “not a capital asset,” but with a confidence level of 69 percent. While the result is the same, the machine learning algorithm’s confidence has moved 25 percentage points (from 94 percent to 69 percent) in the direction of capital in response to our scenario’s mixed set of facts. A lawyer advising the taxpayers in this scenario would see that despite some important facts pointing to capital, a judge would still probably give more weight to the facts that indicate ordinary income.
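The shift from 94 percent to 69 percent is a difference in percentage points, which is not the same thing as a relative change. A minimal sketch of the arithmetic follows, using only the two confidence figures reported above; the variable names are ours, chosen for illustration.

```python
# Confidence figures reported in the text, in percent.
baseline_confidence = 94      # Boree facts as decided
alternative_confidence = 69   # alternative scenario

# Absolute shift, measured in percentage points.
point_change = baseline_confidence - alternative_confidence

# Relative shift, measured as a share of the baseline confidence.
relative_change = point_change / baseline_confidence * 100

print(f"{point_change} percentage points")
print(f"{relative_change:.1f} percent relative to the baseline")
```

The absolute shift is 25 percentage points, while the relative drop from the baseline is somewhat larger, at about 26.6 percent.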
Both Boree and our alternative scenario demonstrate the ability of data and machine learning algorithms to determine the dynamic relationships between different factors in both individual cases and across the case law. By detecting patterns in the case data, our system can simultaneously weigh multiple factors and detect the connections between them.
Multi-factor tests are useful tools for both decision makers and counsel in that they provide flexible legal standards for deciding unique scenarios. But such standards can be vague, and counsel may be unclear as to which factors should be weighed more heavily.
Human forecasts are prone to bias and noise, whereas machine learning trained on data from past cases can assess the relative importance of factors much more accurately. In a situation where a tax lawyer must advise a client in the context of compliance or litigation, machine learning can augment human judgment and lead to more accurate forecasts.
Benjamin Alarie holds the Osler Chair in Business Law at the University of Toronto Faculty of Law and is the CEO of Blue J Legal.