Benjamin Alarie is the Osler Chair in Business Law at the University of Toronto and the CEO of Blue J Legal Inc. Bettina Xue Griffin is a senior legal research associate at Blue J Legal.
In this article, Alarie and Griffin review Blue J’s machine-learning predictions from the past year and reflect on the state of the technology of tax prediction.
Copyright 2022 Benjamin Alarie and Bettina Xue Griffin.
All rights reserved.
I. Introduction
In this first installment of Blue J Predicts for 2022, we take stock of this column’s inaugural year of 2021. We review our predictions from the past year and reflect on the state of the technology of tax prediction. In our 2021 articles we applied machine learning (ML) algorithms to analyze the likely outcomes of pending or recently decided federal income tax cases on an assortment of tax issues. Over the past year, our columns have addressed cases on topics such as the economic substance doctrine, innocent spouse relief, worker classification, captive insurance arrangements, the deductibility of ordinary and necessary expenses, the assignment of income doctrine, and bona fide partnerships. Each installment of our column discussed various predictions on the outcome of a case and the insights that our algorithms were able to generate on the relevant tax issues by leveraging ML.
In this article, we reflect on the predictions we made over the past year and provide general observations about how tax practitioners are beginning to learn how to leverage the insights of ML to “crack the code.” We also examine how practitioners use ML to quantify risks for their clients and ensure that tax advice can properly withstand scrutiny from the IRS and the courts. The goal is to guide tax experts in their tax planning and to help them devise the most effective ways to resolve tax disputes, leveraging new tools and technologies.
II. What Is Machine Learning?
When many people think of artificial intelligence, they imagine a science-fiction robot that behaves in ways that mimic human intelligence and is capable of replicating (or improving on) what an experienced and well-educated human will do in any intellectual endeavor. That type of generally competent AI is commonly referred to as artificial general intelligence (AGI). AGI is not now within technological reach, and may not be for many years, though of course research efforts abound in academia and industry to improve current techniques. Opinions vary widely on how distant AGI is from being realized. Many computer scientists predict that it will happen before 2050; the majority appear to expect that it will be realized before the year 2100.1 Views differ on how AGI, if it comes about, will be developed. Some researchers affiliated with the AI company DeepMind have recently claimed that reinforcement learning, a particular subset of AI research, is likely to be a core contributor to bringing about AGI.2 Others claim that recent advances in language modeling and ML along the lines of OpenAI’s GPT-3 and related language models are sound bets for realizing AGI.3 Others still suggest that there will be other, perhaps as-yet-unknown, methods for bringing about AGI. Unfortunately, the easy association of AI with AGI, and speculation about when machines will surpass human capabilities, risk leading tax professionals to dismiss AI as science fiction. For many tax professionals, it is natural to conclude, “I’ll believe it when I see it.” But adopting this not-unreasonable attitude toward AGI sometimes leads tax professionals to miss the benefits of more tailored machine-learning solutions to specific tax planning and tax dispute resolution challenges.
Artificial narrow intelligence in the form of tailored applications of ML is now being used in many different industries to make predictions about the likelihood of future outcomes and events. ML involves the use of algorithms and statistical models to learn from data without explicit programming, and it is versatile and powerful. A humorous example from the HBO sitcom “Silicon Valley” involved an ML model that could predict whether an image contained a hot dog by being trained on numerous images, each labeled “hotdog” or “not hotdog,” and letting the algorithm identify the patterns in those images.4 After enough training, the ML model could make accurate predictions on new unlabeled images. This is an example of supervised machine learning, a subset of ML so named because the answer is already labeled for the algorithm during its development phase. After the model is “trained,” it will still need feedback on the accuracy of its predictions so that it can continue to update by finding better patterns.
For the legal profession, one of the novel applications of ML is the prediction of judicial and administrative decisions. It is now possible to uncover hidden statistical patterns in publicly available tax decisions to predict the outcome of new sets of facts. By analyzing the facts and circumstances of past cases paired with their outcomes, ML models can make predictions on legal issues with reasonable accuracy. In this context, the accuracy of an ML model is determined by the average rate of agreement between the model’s predicted results and the actual results from IRS and court rulings on a particular legal issue. Blue J Legal Inc.’s ML models have an average agreement rate of 94.7 percent with the decisions reached by the courts for the tax issues available on Blue J’s platform.
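To make the accuracy measure concrete, the following minimal sketch computes an agreement rate for one issue and then averages across issues. The outcome labels, the second issue's rate, and the unweighted averaging are assumptions made for illustration; the article does not specify how Blue J aggregates its figures, and none of the numbers below are Blue J data.

```python
def agreement_rate(predicted, actual):
    """Share of cases in which the predicted outcome matches the actual ruling."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one case")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Hypothetical outcomes for a single legal issue (illustrative only).
predicted = ["taxpayer", "irs", "irs", "irs", "taxpayer", "irs", "irs", "irs"]
actual = ["taxpayer", "irs", "irs", "taxpayer", "taxpayer", "irs", "irs", "irs"]

issue_rate = agreement_rate(predicted, actual)  # 7 of 8 predictions agree

# Assumed aggregation: a simple unweighted mean of per-issue rates.
per_issue_rates = {"issue_a": issue_rate, "issue_b": 0.95}
average_rate = sum(per_issue_rates.values()) / len(per_issue_rates)

print(f"issue rate: {issue_rate:.3f}, average across issues: {average_rate:.4f}")
```

A weighted average (by number of cases per issue) would be an equally plausible reading of "average agreement rate" and could give a different figure.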
Moreover, algorithms can be sensitive enough to uncover patterns in the data that would otherwise be functionally invisible, even to experienced lawyers. For example, leveraging ML models, practitioners can determine the likely direction and magnitude of effect that varying a particular finding of fact might have on the outcome reached in court. The benefit of using ML algorithms to run those analyses is that the effect of each fact is contextualized and dynamic rather than static. This means that any given fact in the context of a particular matrix of facts relevant for a file may have a different magnitude of effect on the most likely outcome depending on what other facts are also stipulated as being present or absent. The dynamic interaction of facts is discussed in more detail below. In short, leveraging ML models allows us to isolate the effect of just one factual element by controlling for other factors, which is effectively impossible to do reliably without computational assistance. A tax practitioner will find it particularly challenging to read over 300 cases on any given legal issue and rank the facts in order of their precise effect on the outcome, but an ML model is capable of accomplishing such a feat with reasonable accuracy.
III. Building ML Models for Tax
Because an ML model learns to model relationships based on data and is not directly programmed by a human to place particular weight on any given element of the data, an ML model is only as good as the underlying data on which it has been trained. Indeed, the accuracy of the ML algorithm and its predictive ability is highly dependent on the quality of the data that is provided for the algorithm to “learn” from. Data scientists often remark, “garbage in, garbage out.”5 Just as in baking, using poor-quality ingredients or the wrong ingredients will lead to a different and inferior result. Therefore, data scientists and experienced lawyers should be involved at each step of the ML model-building process, from the selection of data to the monitoring of the accuracy of the predictions.
The process of building ML models to predict tax outcomes at Blue J begins with selecting legal questions that deal with areas of considerable legal uncertainty caused by the interaction of multiple factors and the absence of bright-line rules. Once the legal topic is selected, cases discussing the relevant legal question are then reviewed by our in-house legal research team to translate them into structured data. Throughout the translation process, the collected data is reviewed to ensure quality. The data collection process is a human-centric process to ensure that the data collected is accurate and that nuances in the language of the case are captured. This step is important. A team of individuals trained in tax law can identify the nuances of the decision and read between the lines with an effectiveness that ML language models struggle to match, which ensures that clean and accurate data will form the foundation to assemble accurate ML models.6
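As a rough picture of what "structured data" can mean here, the sketch below codes one decision as a record of yes/no findings plus an outcome label. The field names and the example citation are hypothetical; the article does not disclose the schema Blue J's researchers actually use.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CodedCase:
    """One decision translated into structured data (hypothetical schema)."""
    citation: str
    written_agreement: bool       # was the assignment papered in a contract?
    earner_controls_funds: bool   # did the original earner keep control of the funds?
    agency_relationship: bool     # was there a clear agency relationship?
    assignment_upheld: bool       # outcome label recorded by the reviewer

    def features(self):
        """Return only the coded findings as 0/1 values, without citation or outcome."""
        row = asdict(self)
        row.pop("citation")
        row.pop("assignment_upheld")
        return {name: int(value) for name, value in row.items()}


# A made-up case for illustration; it does not describe any real decision.
example = CodedCase(
    citation="Hypothetical Co. v. Commissioner",
    written_agreement=True,
    earner_controls_funds=True,
    agency_relationship=False,
    assignment_upheld=False,
)

print(example.features())
```

Keeping the outcome label separate from the features mirrors the supervised-learning setup described earlier: the findings are the inputs, and the ruling is the answer the model learns to predict.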
Following the completion of the translation process, our data science team builds, tests, and refines a variety of ML models to identify a model that can make predictions with reasonably high accuracy and that aligns well with legal intuition. We employ out-of-sample testing to verify the accuracy of our predictions. By holding back some of the cases we identified as relevant and training our algorithm on the remainder, we can test the accuracy of our predictions by analyzing the frequency with which we can correctly predict the outcomes of those cases that were not used to train the model.
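The hold-back procedure just described can be sketched in a few lines. The "model" below is a deliberately simple stand-in (the majority outcome for each fact pattern seen in training), not the kind of model Blue J builds, and the coded cases are invented for illustration.

```python
import random
from collections import Counter


def split_cases(cases, holdout_fraction=0.25, seed=0):
    """Shuffle the cases and hold back a fraction that the model never sees in training."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def train(training_cases):
    """Toy model: majority outcome per fact pattern, with an overall-majority fallback."""
    by_pattern = {}
    for facts, outcome in training_cases:
        by_pattern.setdefault(facts, Counter())[outcome] += 1
    overall = Counter(outcome for _, outcome in training_cases).most_common(1)[0][0]
    lookup = {facts: counts.most_common(1)[0][0] for facts, counts in by_pattern.items()}
    return lambda facts: lookup.get(facts, overall)


def out_of_sample_accuracy(cases, holdout_fraction=0.25, seed=0):
    """Train on the remainder and score predictions on the held-back cases only."""
    training, held_out = split_cases(cases, holdout_fraction, seed)
    model = train(training)
    correct = sum(model(facts) == outcome for facts, outcome in held_out)
    return correct / len(held_out)


# Hypothetical coded cases: (controls_funds, written_contract) paired with an outcome.
cases = (
    [((1, 1), "taxpayer")] * 6
    + [((1, 0), "taxpayer")] * 4
    + [((0, 1), "irs")] * 6
    + [((0, 0), "irs")] * 4
)

training, held_out = split_cases(cases)
accuracy = out_of_sample_accuracy(cases)
print(f"{len(training)} training cases, {len(held_out)} held out, accuracy {accuracy:.2f}")
```

The key property is that accuracy is measured only on cases excluded from training, so a model cannot score well simply by memorizing the decisions it was shown.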
Once our ML model is made available for use on the Blue J platform, we monitor the feedback from our customers, consisting of lawyers and accountants. Generally, users disagree with Blue J’s predicted outcome less than 2 percent of the time. When users do indicate a disagreement, further inquiries are made to ensure that the ML model is not misbehaving. Occasionally, these further inquiries lead to changes and further enhancements of the product. As new decisions are released, we monitor rates of agreement between them and the algorithm. Further, any individual case that the courts and the ML model fail to agree upon in the outcome is analyzed to determine if it represents an outlier or a shift or change in the law. This information is then used to further adjust and refine the ML model as needed. Finally, any new decisions are added to our data set so that the ML model has the benefit of incorporating the latest case law.
Through this process, Blue J has built ML models that can anticipate court decisions reasonably well; these ML models have an average agreement rate of 94.7 percent with the judgments reached by the courts. And there are dozens of tax issues that are amenable to this kind of ML modeling: Each of the U.S. tax predictive ML models that has been developed and published for general use has an agreement rate with the courts of 90 percent or higher.
IV. Applying ML to Tax Law
The competitive advantage of using ML tools in the practice of tax will be similar to the competitive advantage of using electronic spreadsheets in bookkeeping and accounting. Within this decade, we expect that the use of these tools will be synonymous with the practice of tax law. The distinct advantages of using ML models in tax analysis include (1) the ability to concretely and specifically quantify risks for clients, (2) the ability to identify the best tax planning and business strategies (that is, to “crack the code”), (3) the ability to identify weak points and uncover blind spots, and (4) the ability to identify the most effective litigation strategies.
A. Quantify Risk
Clients grow tired of hearing the answer “it depends.” Although this may be the most appropriate answer given the many unknowns that may exist in complex tax planning situations, it would be less frustrating as an answer if a tax practitioner were able to point to exactly what the answer depends on and by how much in a way that is demonstrably backed by data.
To demonstrate how ML models can form the foundation of quantifying risk for a taxpayer in concrete terms, consider a situation in which a taxpayer wishes to assign income she earned to a different but related taxpayer so that the income is taxed in the hands of the related taxpayer. Although there will be ways for it to appear on paper that the related taxpayer earned the income, the courts will look to who controls the products, services, and funds at issue to identify the true earner of the income, and the income will be taxed in the hands of that true earner. This is the case even if there is an agreement to assign the income to a different taxpayer. It is only in a small number of cases that courts have been willing to accept the legitimacy of an assignment and have held that the assignee is liable for the earned income.
An example of a case in which the IRS and the Tax Court saw through a taxpayer’s assignment of income agreement with a related entity is Ryder,7 which was the subject of our November 2021 installment of Blue J Predicts.8 A tax practitioner faced with a fact pattern similar to that in Ryder would be able to quantify the risk of an assignment of income and communicate it to their client in the following ways:
that the likelihood that the IRS or the courts will disallow the assignment of income with this particular proposed structure is more than 94 percent, based on a comprehensive analysis of 297 IRS or court rulings in which the assignment of income was an issue;
that as a general matter only 5.8 percent of the IRS revenue rulings and court decisions involving an assignment of income for services have found that the income was appropriately assigned to another taxpayer; and
that significant changes will need to be made to the business arrangement for it to come close to successfully assigning the income to the related corporation, including significant changes to the structure of the agreements with the clients. For example, a clear agency relationship would almost certainly be needed between the professional corporation and the related corporation.
Quantifying the risk for the client in this way drives home how significant the degree of risk exposure is for the client. The practitioner can support their risk assessment with a neutral data-backed ML analysis of similar situations in the past.
B. Optimize Business/Tax Strategy
Sometimes a client will want to explore every avenue of obtaining their desired tax outcome despite the low probability of that outcome materializing. Alternatively, a client may wish to pursue a business strategy without triggering unintended tax consequences. In both situations, the client will expect their tax professional to point with confidence to the most relevant considerations that influence the tax consequences and provide an opinion on the most likely outcome.
Regardless of whether business decisions are made for purely substantive business reasons or purely to secure a specific tax efficiency, a tax practitioner who can identify what changes need to be made (or must be avoided) based on the most likely tax result allows the client to make their decisions with greater clarity. As if peering into a crystal ball, clients can base their decisions on better information about their likely future tax position. In reality, what may look like a crystal ball is simply a computational tool that can harness data to project future states of affairs based on the past experience of the government and legions of taxpayers in court.
A prime example of a company that made sweeping changes to its business to position itself more favorably to obtain its desired outcome is Uber. Uber and other gig economy companies have been engaged in extensive litigation (principally in the employment law context rather than in the tax context) over whether their workers are independent contractors or employees. In our December 2021 installment of Blue J Predicts, we examined the insights that our ML algorithm uncovered on the classification of Uber drivers for federal income tax purposes.9 In that article, we discussed how two of Uber’s recent business decisions had opposing effects on the likelihood that some of its drivers will be classified as independent contractors rather than employees for tax purposes.
First, in January 2020 Uber modified its standard form contract with its drivers to provide them with more flexibility. This flexibility included the ability to set their own fares and to contract directly with passengers outside of Uber. Based on our ML model of worker classification decisions, a worker in a similar situation as the class plaintiff in James10 — who takes full advantage of the added flexibility in the contract — would be 23 percent more likely to be classified as an independent contractor than they would have been before the change. In other words, with this one change, Uber has significantly increased the chances of many of its drivers being classified as independent contractors for federal income tax purposes.
A second business decision was less favorable for securing an independent contractor classification. Uber successfully lobbied in favor of California’s Proposition 22, which guarantees a minimum base pay and compensation for some expenses as well as healthcare benefits but exempts Uber’s drivers from the application of Assembly Bill 5 (AB5). AB5 codified the decision in Dynamex11 that created a general presumption that workers are employees unless an employer can satisfy the three-part ABC test. AB5 increased the likelihood that Uber drivers and other gig economy workers in the state will be classified as employees for some employment law purposes. Although the passage of Proposition 22 was a win for Uber in the employment law sense, from a tax perspective Blue J’s ML algorithm suggests that a representative worker’s chances of being classified as an independent contractor rather than an employee will decrease by 16 percent for drivers who receive a guaranteed minimum base pay and compensation for vehicle expenses. Further, for those workers who receive a healthcare stipend, the predicted likelihood of an independent contractor classification decreases by another 32 percent, despite the previous favorable modifications to the Uber standard form contracts with its workers in 2020.
Because the test for worker classification for federal income tax purposes is, as a formal legal matter, different from the test for worker classification for California employment law purposes, business decisions that may have a favorable outcome in one legal area may have unintended consequences or the opposite effect in the tax law. Drivers who would otherwise be exempt from an employee classification for California employment law purposes, or who would have otherwise been classified as an independent contractor because of the contract changes in 2020, could still be classified as an employee for federal income tax purposes if they also receive a healthcare stipend.
Although Proposition 22 was found to be unconstitutional by the California Superior Court (a decision that is now under appeal),12 this example demonstrates the importance of identifying the facts that have the biggest effect on whether the IRS or the courts will agree with the taxpayer’s tax position during the tax planning stage. This is vital to prevent litigation or audits and reassessments by the IRS. In Uber’s case, our analysis affirms the view that worker classification is highly fact-driven and cannot be determined for all Uber drivers in the same way. A uniform classification for federal tax purposes of all Uber drivers, one way or the other, would likely expose Uber to IRS challenges. Armed with the kind of insights ML models can uncover, practitioners can help their clients select the most appropriate business and tax planning strategy. ML is one more tool that can help practitioners better position their clients to successfully obtain desired outcomes and avoid unintended suboptimal tax consequences.
C. Uncover Blind Spots
Aside from helping practitioners uncover unintended and unfavorable tax consequences, ML can help practitioners uncover their blind spots to improve their chances of successfully obtaining desired tax outcomes.
One example of this is Aspro,13 which is a case now under appeal to the Eighth Circuit and was discussed in our October 2021 installment of Blue J Predicts.14 Aspro sought to deduct from taxable income the money paid to shareholders as “management fees.” To deduct the management fees, under section 162(a)(1), Aspro was required to show that (1) the fees paid to the shareholders were for ordinary and necessary services performed for Aspro by or on behalf of the shareholders, and (2) the fees paid to the shareholders were reasonable in their amounts.
Although these requirements appear straightforward, the threshold for when an expense is deductible is difficult to pin down because the statutory language is deceptively simple. The question whether an expense is “ordinary and necessary” is highly fact-specific. What is ordinary and necessary in one industry is not universally applicable to all industries. In these situations, in which the legal standards are blurry, ML is well adapted to uncovering the common patterns from numerous past decisions to help a practitioner draw appropriate insights from the cases with comparable businesses and expenses. A simple reading of the legislation and regulations is not necessarily enough to provide fully informed advice.
In a situation similar to Aspro, the use of predictive software such as Blue J’s would have revealed the following insights:
An independent analysis should be conducted for the deductibility of fees for different services, rather than treating disparate and unrelated services performed by different entities in the same manner.
The IRS and the courts may assess whether the expense is “customary or usual” by investigating whether the taxpayer’s competitors would likely also incur this type of expense.
Successfully proving that other competitors would also likely incur a similar expense can have a 30 percent effect on the probability that the expense is found to be “ordinary and necessary.”
Being able to present evidence that there is not a substantially more cost-effective alternative to the expense improves a taxpayer’s odds of claiming that the expense is ordinary and necessary.
Uncovering these insights can prevent blind spots that may occur in the tax planning stage. If left unchecked, blind spots can expose the taxpayer to challenges by the IRS and the courts and may result in the taxpayer having access to limited documentation and evidence that would have otherwise supported their position.
D. Optimize Litigation Strategy
Finally, despite the efforts discussed above, a client may still need to appeal a decision or assessment from the IRS or a decision by the court. Although it is relatively straightforward to identify the relevant test that the courts will apply on any given legal issue, it is significantly more challenging to know which aspect of any test to prioritize to unseat the legal analysis being relied on by the government to challenge a particular tax position.
We can infer from the data collected from U.S. tax cases that some arguments are more persuasive than others because some factors of a legal test have a higher correlation with a particular legal outcome than others. On the low end, a particular fact may be of minimal influence and affect the probability of an outcome by less than 1 percent, but on the high end, a factor may affect the probability of an outcome by more than 40 percent. A common example of this occurs in situations in which the IRS and the courts look beyond the language of the agreement between the parties and assess the actions and behavior of the parties to understand the true nature of the arrangement. As such, factors concerning how the parties papered the transaction (such as the formal existence of a written contract) do not tend to weigh as heavily as factors that more closely relate to the parties’ conduct and reasonably inferred intentions.
Tax practitioners who are preparing for litigation can give themselves the added advantage of knowing exactly which facts and circumstances associated with a test are most persuasive to the courts. With this knowledge, practitioners can prioritize their efforts accordingly and avoid spending precious time during oral arguments on factors that have a negligible effect on the overall outcome.
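One crude way to see why some factors deserve more attention than others is to compare how often the taxpayer prevails when a factor is present versus absent. The sketch below ranks two factors by that gap. It uses invented cases and hypothetical factor names, and it measures raw association only; unlike a trained ML model, it does not control for the other facts in each case.

```python
# Hypothetical coded cases: 1 means the factor is present; outcome 1 means the taxpayer prevails.
cases = [
    {"written_contract": 1, "controls_funds": 1, "outcome": 1},
    {"written_contract": 1, "controls_funds": 1, "outcome": 1},
    {"written_contract": 0, "controls_funds": 1, "outcome": 1},
    {"written_contract": 0, "controls_funds": 1, "outcome": 0},
    {"written_contract": 1, "controls_funds": 0, "outcome": 0},
    {"written_contract": 1, "controls_funds": 0, "outcome": 0},
    {"written_contract": 0, "controls_funds": 0, "outcome": 0},
    {"written_contract": 0, "controls_funds": 0, "outcome": 0},
]


def outcome_rate(rows):
    """Fraction of the given cases in which the taxpayer prevails."""
    return sum(r["outcome"] for r in rows) / len(rows) if rows else 0.0


def rate_gap(cases, factor):
    """Taxpayer win rate when the factor is present minus the rate when it is absent."""
    present = [c for c in cases if c[factor] == 1]
    absent = [c for c in cases if c[factor] == 0]
    return outcome_rate(present) - outcome_rate(absent)


factors = ["written_contract", "controls_funds"]
ranking = sorted(factors, key=lambda f: abs(rate_gap(cases, f)), reverse=True)

for factor in ranking:
    print(f"{factor}: gap of {rate_gap(cases, factor):+.2f}")
```

In this made-up data, the conduct-based factor shows a much larger gap than the paper-based one, which echoes the pattern described above for how courts weigh a written contract against the parties' actual behavior.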
In our June 2021 installment of Blue J Predicts, we analyzed the facts in Cross Refined Coal15 and applied ML to identify the two key factors that the algorithm predicted as being determinative of the appeal.16 Although many arguments were raised on appeal, Blue J’s ML algorithm identified two pivotal factors that are each likely to be independently determinative. In other words, losing on either of those factors will more likely than not cause the IRS to lose its appeal based on the particular set of facts in Cross Refined Coal.
In addition to uncovering the pivotal factors in a given situation, ML algorithms can go one step further: They can identify how significant a given fact is in relation to the existence or absence of other facts. ML algorithms are capable of dynamically assessing the data independently and in conjunction with other data points from hundreds of past decisions. There is no static formula that the courts use to decide cases, and thus the ML algorithms also dynamically assess the variables. This is part of why ML algorithms, unlike traditional checklists, are so helpful in providing actionable insight.
This last point was discussed in our August 2021 installment of Blue J Predicts, in which we analyzed the facts in Reserve Mechanical,17 which involved whether the taxpayer was a bona fide insurance company.18 Our ML algorithm identified that although the use of actuarial methods to set premiums is typically a significant factor in determining whether a taxpayer operated as a bona fide insurance company, in situations involving a circular flow of funds between the insurer and insured, the impact of using actuarial methods is reduced.
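The idea that one fact's weight depends on another can be illustrated with a small logistic model that includes an interaction term. Every number below is a hypothetical weight chosen for illustration, and the feature names are shorthand for the two facts mentioned above; this is not the model used in our Reserve Mechanical analysis.

```python
import math

# Hypothetical weights; a real model would learn its parameters from coded cases.
BIAS = -0.5
WEIGHTS = {"actuarial_premiums": 2.0, "circular_flow": -1.5}
INTERACTION = -1.5  # extra penalty applied when both facts are present together


def probability(facts):
    """Modeled probability of a bona fide insurance finding for a dict of 0/1 facts."""
    score = BIAS
    score += sum(weight * facts[name] for name, weight in WEIGHTS.items())
    score += INTERACTION * facts["actuarial_premiums"] * facts["circular_flow"]
    return 1.0 / (1.0 + math.exp(-score))


def effect_of_actuarial_methods(circular_flow):
    """Change in probability from adding actuarial premiums, holding circular flow fixed."""
    present = probability({"actuarial_premiums": 1, "circular_flow": circular_flow})
    absent = probability({"actuarial_premiums": 0, "circular_flow": circular_flow})
    return present - absent


effect_without_circular_flow = effect_of_actuarial_methods(circular_flow=0)
effect_with_circular_flow = effect_of_actuarial_methods(circular_flow=1)

print(f"no circular flow: {effect_without_circular_flow:+.3f}")
print(f"circular flow:    {effect_with_circular_flow:+.3f}")
```

Under these assumed weights, the same fact moves the predicted probability far less once a circular flow of funds is present, which is the kind of context dependence a static checklist cannot express.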
V. Conclusion
Even though the strong forms of AI that approach human-level intelligence (AGI) are likely to be decades away, tax practitioners are increasingly able to leverage ML models that provide valuable insight into hidden patterns that will allow them to “crack the code.” Throughout 2022, we look forward to using ML models to elucidate the hidden, surprising, and often controversial patterns in the case law and to show that data-backed machine-learning models are the newest tool in a sophisticated tax practitioner’s toolbox.
FOOTNOTES
1 See Vincent C. Müller and Nick Bostrom, “Future Progress in Artificial Intelligence: A Survey of Expert Opinion,” in Fundamental Issues of Artificial Intelligence 553-571 (2016).
2 See David Silver et al., “Reward Is Enough,” 299 Artificial Intelligence (Oct. 2021).
3 See, e.g., Blaise Aguera y Arcas, “Do Large Language Models Understand Us?” Medium.com (Dec. 16, 2021).
4 See Season 4 Episode 4 of “Silicon Valley,” “Team Building Exercise,” HBO, May 14, 2017. For the relevant clip, see YouTube, “Silicon Valley: Season 4 Episode 4: Not Hotdog (HBO),” May 16, 2017.
5 For more about the curious origin of this term, see Wikipedia, “Garbage In, Garbage Out.”
6 To learn more about Blue J’s process, see Samantha Santoro, “Human-Centred Artificial Intelligence,” Blue J Legal.
7 Ernest S. Ryder & Associates Inc. v. Commissioner, T.C. Memo. 2021-88.
8 Benjamin Alarie and Kathrin Gardhouse, “Battling Uphill Against the Assignment of Income Doctrine: Ryder,” Tax Notes Federal, Nov. 29, 2021, p. 1253.
9 Alarie and Gardhouse, “Predicting Worker Classification in the Gig Economy,” Tax Notes Federal, Dec. 20, 2021, p. 1733.
10 James v. Uber Technologies Inc., No. 3:19-cv-06462, at 14 (N.D. Cal. 2021).
11 Dynamex Operations West Inc. v. Superior Court of Los Angeles, 416 P.3d 1 (Cal. 2018).
12 Castellanos v. California, No. RG21088725 (Cal. Super. Ct. 2021).
13 Aspro Inc. v. Commissioner, T.C. Memo. 2021-8.
14 Alarie and Christopher Yan, “Would Management Fees by Any Other Name Still Be Deductible?” Tax Notes Federal, Oct. 25, 2021, p. 499.
15 Cross Refined Coal LLC v. Commissioner, No. 19502-17 (2019) (bench opinion).
16 Alarie, Bettina Xue Griffin, and Yan, “An Unprofitable Pretax Venture Can Still Be a Partnership,” Tax Notes Federal, June 21, 2021, p. 1951.
17 Reserve Mechanical Corp. v. Commissioner, T.C. Memo. 2018-86.
18 Alarie and Griffin, “Captive Insurance Appeal in Reserve Mechanical Will Likely Fail,” Tax Notes Federal, Aug. 30, 2021, p. 1431.