Legal Obstacles to Implementation of the AI Executive Order

About a month ago, I published an initial analysis of a leaked draft of an AI-related executive order that was rumored to be forthcoming. For a few weeks thereafter, it looked as if the draft might not actually make it past the President’s desk, or as if the final version of the executive order might be substantially altered from the aggressive and controversial draft version. On December 11, 2025, however, President Trump signed an executive order that substantially resembled the leaked draft.

Because the executive order (EO) is virtually identical to the leaked draft in terms of its substance, my analysis of the legal issues raised by that draft remains applicable. But since I published that first analysis on November 20, LawAI has had a chance to conduct further research into some of the questions that I wasn’t able to definitively resolve in that first commentary. Additionally, intervening events have provided some important context for understanding what the consequences of the executive order will be for AI policy in the U.S. Accordingly, I’ve decided to publish this updated commentary, which incorporates most of the previous piece’s analysis as well as the results of subsequent research.

What Does the Executive Order Purport to Do?

As an initial matter, it’s important to understand what an executive order is and what legal effect executive orders have in the United States. An executive order is not a congressionally enacted statute or “law.” While Congress undoubtedly has the authority to preempt some state AI laws by passing legislation, the President generally cannot unilaterally preempt state laws by presidential fiat (nor does the EO purport to do so). What an executive order can do is publicly announce the policy goals of the executive branch of the federal government and issue directives from the President to executive branch officials and agencies. So, contrary to what some headlines seem to suggest, the EO does not, and could not, preempt any state AI laws. It does, however, instruct a number of actors within the executive branch to take various actions intended to make it easier to preempt state AI laws in the future, or to make it more difficult for states to enact or enforce AI laws that are inconsistent with the White House’s policy positions.

It’s also worth noting that the EO’s title, “Ensuring a National Policy Framework for Artificial Intelligence,” should not be taken at face value. Because the idea of passing federal AI policy is considerably more popular than the idea of preventing states from enacting AI policy, the use of the term “federal framework” or some equivalent phrase as a euphemism for preemption of state AI laws has become something of a trend among preemption advocates in recent months, and the EO is no exception. While the EO does discuss the need for Congress to pass a national policy framework for AI, and while sections 6 and 8 do contemplate the creation of affirmative federal policies, the EO’s primary goal is clearly the elimination of undesirable state AI laws rather than the creation of federal policy. 

The EO discussed in this commentary, which is titled “Ensuring A National Policy Framework For Artificial Intelligence,” is relatively short, clocking in at just under 1400 words, and consists of nine sections. This commentary summarizes the EO’s content, discusses the differences between the final version and the draft that leaked in November, and then briefly analyzes a few of the most important legal issues raised by the EO. This commentary is not intended to be comprehensive, and LawAI may publish additional commentaries and/or updates as events progress and additional legal issues come to light.

The EO’s nine sections are:

How Does the Published Executive Order Differ from the Draft Version that Was Leaked in November?

As noted above, the EO is substantively very similar to the draft that leaked in November. There are, however, a number of sentence-level changes, most of which were presumably made for reasons of style and clarity. The published EO also includes a few changes that are arguably significant for signaling reasons—that is, because of what they seem to say about the White House’s plan for implementing the EO.

Most notably, Section 1 (the discussion of the EO’s “Purpose”) has been toned down in a few different ways. The initial draft specifically criticized both Colorado’s controversial algorithmic discrimination law and California’s Transparency in Frontier Artificial Intelligence Act (also known as SB 53), and dismissively referred to the “purely speculative suspicion that AI might ‘pose significant catastrophic risk.’” The leaked draft also suggested that “sophisticated proponents of a fear-based regulatory capture strategy” were responsible for these laws. The published version still criticizes the Colorado law, but does not contain any reference to SB 53, catastrophic risk, or regulatory capture. In light of this revision, it’s possible that SB 53—which is, by most accounts, a light-touch, non-burdensome transparency law that merely requires developers to create safety protocols of the sort that every frontier developer already creates and publishes—will not be identified as an “onerous” state AI law and targeted pursuant to the EO’s substantive provisions. To be clear, I think it’s still quite likely that SB 53 and similar transparency laws like New York’s RAISE Act will be targeted, but the removal of the explicit reference reduces the likelihood of that from “virtually certain” to “merely probable.” 

This change seems like a win, albeit a minor one, for “AI safety” types and others who worry about future AI systems creating serious risks to national security and public safety. The AI Action Plan that the White House released in July seemed to take the prospect of such risks quite seriously, so the full-throated dismissal in the leaked draft would have been a significant change of course.

The published EO also throws a bone to child safety advocates, which may also be significant for signaling reasons. It was somewhat surprising that the leaked draft did not contain any reference to child safety, because child safety is an issue that voters and activists on both sides of the aisle care about a great deal. The political clout wielded by child safety advocates is such that Republican-led preemption efforts have typically included some kind of explicit carve-out or concession on the issue. For example, the final revision of the moratorium that ended up getting stripped out of the Big Beautiful Bill in late June attempted to carve out an exception for state laws relating to “child online safety,” and Dean Ball’s recent preemption proposal similarly attempts to placate child safety advocates by acknowledging the importance of the issue and adding transparency requirements specifically intended to protect children.

The published EO mentions children’s safety in § 1 as one of the issues that federal AI legislation should address. And § 8, the “legislative proposal” section, states that the proposal “shall not propose preempting otherwise lawful State AI laws relating to… child safety protections.” Much has been made of this carve-out in the online discourse, and it does seem important for signaling reasons. If the White House’s legislative proposal won’t target child safety laws, it seems reasonable to suggest that other White House efforts to eliminate certain state AI laws might steer clear of child safety laws as well. However, it’s worth noting that the exception in § 8 applies only to § 8, and not to the more important sections of the EO such as § 3 and § 5. 

This leaves open the possibility that the Litigation Task Force might sue states with AI-related child safety laws, or that federal agencies might withhold discretionary grant funds from such states. Some right-leaning commentators have suggested that this is not a realistic possibility, because federal agencies will use their discretion to avoid going after child safety laws regardless of whether the EO specifically requires them to do so. It should be noted, however, that the category of “child safety laws” is broad and poorly defined, and that many of the state laws that the White House most dislikes could be reframed as child safety laws or amended to focus on child safety. In other words, a blanket policy of leaving “child safety” laws alone may not be feasible, or may not be attractive to the White House.

As for § 8 itself, a legislative proposal is just that—a proposal. It has no legal effect unless it is enacted into law by Congress. Congress can simply not enact the proposal into law—and given how rare it is for federal legislation (even legislation supported by the President) to actually be enacted, this is by far the most likely outcome. The White House has already thrown its weight behind a number of preemption-related legislative proposals in the past, and so far none of these proposals have managed to make it through Congress. It’s possible that the legislative proposal contemplated in § 8 will fare better, but the odds are not good. In my opinion, therefore, the child safety exception in § 8 is significant mostly because of what it tells us about the administration’s policy preferences rather than because of anything that it actually does. 

The final potentially significant change relates to § 5(b), which contemplates withholding federal grant funding from states that regulate AI. In the leaked draft, that section directed federal agencies to review their discretionary grants to see if any could lawfully be withheld from states with AI laws designated as “onerous.” The published EO directs agencies to do the same, but directs them to do so “in consultation with” AI czar David Sacks. It remains to be seen whether this change will mean anything in practice—David Sacks is one man, and a very busy man at that, and may not have the staffing support that would realistically be needed to meaningfully review every agency’s response to the EO. But whatever that “in consultation with” ends up meaning in practice, it seems plausible to suggest that the change may at least marginally increase agencies’ willingness to withhold funds.

Issue 1: The Litigation Task Force 

The EO’s first substantive section, § 3, instructs the U.S. Attorney General to “establish an AI Litigation Task Force” charged with bringing lawsuits in federal court to challenge allegedly unlawful state AI laws. The EO suggests that the Task Force will challenge state laws that allegedly violate the dormant commerce clause and state laws that are allegedly preempted by existing federal regulations. The Task Force is also authorized to challenge state AI laws under any other legal basis that the Department of Justice (DOJ) can identify.

Dormant commerce clause arguments

Presumably, the EO’s reference to the commerce clause refers to the dormant commerce clause argument laid out by Andreessen Horowitz in September 2025. This argument, which a number of commentators have raised in recent months, suggests that certain state AI laws violate the commerce clause of the U.S. Constitution because they impose excessive burdens on interstate commerce.

LawAI’s analysis indicates that this commerce clause argument, at least with respect to the state laws most commonly cited as potential preemption targets, is legally dubious and unlikely to succeed in court. We intend to publish a more thorough analysis of this issue in the coming weeks in addition to the overview included here. 

In 2023, the Supreme Court issued an important dormant commerce clause opinion in the case of National Pork Producers Council v. Ross. The thrust of the majority opinion in that case, authored by Justice Gorsuch, is that state laws generally do not violate the dormant commerce clause unless they involve purposeful discrimination against out-of-state economic interests in order to favor in-state economic interests. 

Even proponents of this dormant commerce clause argument typically acknowledge that the state AI laws they are concerned with generally do not discriminate against out-of-state economic interests. Therefore, they often ignore Ross, or cite the dissenting opinions while ignoring the majority. Their preferred precedent is Pike v. Bruce Church, Inc., a 1970 case in which the Supreme Court held that a state law with “only incidental” effects on interstate commerce does not violate the dormant commerce clause unless “the burden imposed on such commerce is clearly excessive in relation to the putative local benefits.” This standard opens the door for potential challenges to nondiscriminatory laws that arguably impose a “clearly excessive” burden on interstate commerce. 

The state regulation that was invalidated in Pike would have required cantaloupes grown in Arizona to be packed and processed in Arizona as well. The only state interest at stake was the “protect[ion] and enhance[ment] of [cantaloupe] growers within the state.” The Court in Pike specifically acknowledged that “[w]e are not, then, dealing here with state legislation in the field of safety where the propriety of local regulation has long been recognized.” 

Even under Pike, then, it’s hard to come up with a plausible argument for invalidating the state AI laws that preemption advocates are most concerned with. Andreessen Horowitz’s argument is that the state proposals in question, such as New York’s RAISE Act, “purport to have significant safety benefits for their residents,” but in fact “are unlikely” to provide substantial safety benefits. But this is, transparently, a policy judgment, and one with which the state legislature of New York evidently disagrees. As Justice Gorsuch observes in Ross, “policy choices like these usually belong to the people and their elected representatives. They are entitled to weigh the relevant ‘political and economic’ costs and benefits for themselves, and ‘try novel social and economic experiments’ if they wish.” New York voters overwhelmingly support the RAISE Act, as did an overwhelming majority of New York’s state legislature when the bill was put to a vote. In my opinion, it is unlikely that any federal court will presume to override those policy judgments and substitute its own.

That said, it is possible to imagine a state AI law that would violate the dormant commerce clause. For example, a law that placed burdensome requirements on out-of-state developers while exempting in-state developers, in order to grant an advantage to in-state AI companies, would likely be unconstitutional. Since I haven’t reviewed every state AI bill that has been or will be proposed, I can’t say for sure that none of them would violate the dormant commerce clause. It is entirely possible that the Task Force will succeed in invalidating one or more state laws via a dormant commerce clause challenge. It does seem relatively safe, however, to predict that the specific laws referred to in the executive order and the state frontier AI safety laws most commonly referenced in discussions of preemption would likely survive any dormant commerce clause challenges brought against them.

State laws preempted by existing federal regulations

Section 3 also specifically indicates that the AI Litigation Task Force will challenge state laws that “are preempted by existing Federal regulations.” It is possible for state laws to be preempted by federal regulations, and, as with the commerce clause issue discussed above, it’s possible that the Task Force will eventually succeed in invalidating some state laws by arguing that they are so preempted. 

In the absence of significant new federal AI regulation, however, it is doubtful whether many of the state laws the EO is intended to target will be vulnerable to this kind of legal challenge. Moreover, any state AI law that created significant compliance costs for companies and was plausibly preempted by existing federal regulations could be challenged by the affected companies, without the need for DOJ intervention. The fact that (to the best of my knowledge) no such lawsuit has yet been filed challenging the most notable state AI laws indicates that the new Task Force will likely be faced with slim pickings, at least until new federal regulations are enacted and/or state regulation of AI intensifies.

Alternative grounds

Section 3 also authorizes the Task Force to challenge state AI laws that are “otherwise unlawful” in the Attorney General’s judgment. The Department of Justice employs a great number of smart and creative lawyers, so it’s impossible to say for sure what theories they might come up with to challenge state AI laws. That said, preemption of state AI laws has been a hot topic for months now, and the best theories that have been publicly floated for preemption by executive action are the dormant commerce clause theory discussed above and the Communications Act theory discussed below. This is, it seems fair to say, a bearish indicator, and I would be somewhat surprised if the Task Force managed to come up with a slam dunk legal argument for broad-based preemption that has hitherto been overlooked by everyone who’s considered this issue.

Issue 2: Restrictions on State Funding

Section 5 of the EO contains two subsections directing agencies to withhold federal grant funding from states that attempt to regulate AI. Subsection (a) indicates that Commerce will attempt to withhold non-deployment Broadband Equity, Access, and Deployment (BEAD) funding “to the maximum extent allowed by federal law” from states with AI laws listed pursuant to § 4 of the EO, which instructs the Department of Commerce to identify state AI laws that conflict with the policy directives laid out in § 1 of the EO. Subsection (b) instructs all federal agencies to assess their discretionary grant programs and determine whether existing or future grants can be withheld from states with AI laws that are challenged under § 3 or identified as undesirable pursuant to § 4.

In my view, § 5 of the EO is the provision with the most potential to affect state AI legislation. While § 5 does not attempt to actually preempt state AI laws, the threat of losing federal grant funds could have the practical effect of incentivizing some states to abandon their AI-related legislative efforts. And, as Daniel Cochrane and Jack Fitzhenry pointed out during the reconciliation moratorium fight, “Smaller conservative states with limited budgets and large rural populations need [BEAD] funds. But wealthy progressive states like California and New York can afford to take a pass and just keep enforcing their tech laws.” While politicians in deep blue states will be politically incentivized to fight the Trump administration’s attempts to preempt overwhelmingly popular AI laws even if it means losing access to some federal funds, politicians in red states may instead be incentivized to avoid conflict with the administration. 

Section 5(a): Non-deployment BEAD funding

Section 5(a) of the EO is easier to analyze than § 5(b), because it clearly identifies the funds that are in jeopardy—non-deployment BEAD funding. The BEAD program is a $42.45 billion federal grant program established by Congress in 2021 for the purpose of facilitating access to reliable, high-speed broadband internet for communities throughout the U.S. A portion of the $42.45 billion total was allocated to each of 56 states and territories in June 2023 by the National Telecommunications and Information Administration (NTIA). In June 2025, the NTIA announced a restructuring of the BEAD program that eliminated many Biden-era requirements and rescinded NTIA approval for all “non-deployment” BEAD funding, i.e., BEAD funding that states intended to spend on uses other than actually building broadband infrastructure. The total amount of BEAD funding that will ultimately be classified as “non-deployment” is estimated to be more than $21 billion. 

BEAD funding was previously used as a carrot and stick for AI preemption in June 2025, as part of the effort to insert a moratorium or “temporary pause” on state AI regulation into the most recent reconciliation bill. There are two critical differences between the attempted use of BEAD funding in the reconciliation process and its use in the EO, however. First, the EO is, obviously, an executive order rather than a legislative enactment. This matters because agency actions that would be perfectly legitimate if authorized by statute can be illegal if undertaken without statutory authorization. And secondly, while the final drafts of the reconciliation moratorium would only have jeopardized BEAD funding belonging to states that chose to accept a portion of $500 million in additional BEAD funding that the reconciliation bill would have appropriated, the EO would jeopardize non-deployment BEAD funding belonging to any state that attempts to regulate AI in a manner deemed undesirable under the EO.

The multibillion-dollar question here is: can the administration legally withhold BEAD funding from states because those states enact or enforce laws regulating AI? Unsatisfyingly enough, the answer to this question for now seems to be “no one knows for sure.” Predicting the outcome of a future court case that hasn’t been filed yet is always difficult, and here it’s especially difficult because it’s not clear exactly how the NTIA will choose to implement § 5(a) in light of the EO’s requirement to withhold funds only “to the maximum extent allowed by federal law.” That said, there is some reason to believe that states would have at least a decent chance of prevailing if they sued to prevent NTIA from withholding funds from AI-regulating states.

The basic argument against what the EO asks NTIA to do is simply that Congress provided a formula for allocating BEAD program funds to states, and did not authorize NTIA to withhold those congressionally allocated funds from states in order to vindicate unrelated policy goals. The EO anticipates this argument and attempts to manufacture a connection between AI and broadband by suggesting that “a fragmented State regulatory landscape for AI threatens to undermine BEAD-funded deployments, the growth of AI applications reliant on high-speed networks, and BEAD’s mission of delivering universal, high-speed connectivity.” In my view, this is a hard argument to take seriously. It’s difficult to imagine any realistic scenario in which (for example) laws imposing transparency requirements on AI companies would have any significant effect on the ability of internet providers to build broadband infrastructure. Still, the important question is not whether NTIA has the statutory authority to withhold funds as the EO contemplates, but rather whether states will be able to actually do anything about it. 

The Trump administration’s Department of Transportation (DOT) recently attempted a maneuver similar to the one contemplated in § 5 when, in response to an executive order directing agencies to “undertake any lawful actions to ensure that so-called ‘sanctuary’ jurisdictions… do not receive access to federal funds,” the DOT attempted to add conditions to all DOT grant agreements requiring grant recipients to cooperate in the enforcement of federal immigration law. Affected states promptly sued to challenge the addition of this grant condition and successfully secured a preliminary injunction prohibiting DOT from implementing or enforcing the conditions. In early November 2025, the federal District Court for the District of Rhode Island ruled that the challenged conditions were unlawful for three separate reasons: (1) imposing the conditions exceeded the DOT’s statutory authority under the laws establishing the relevant grant programs; (2) imposing the conditions was “arbitrary and capricious,” in violation of the Administrative Procedure Act; and (3) imposing the conditions violated the Spending Clause of the U.S. Constitution. It remains to be seen whether the district court’s ruling will be upheld by a federal appellate court and/or by the U.S. Supreme Court.

The lawsuit described above should give you some idea of what to expect from a lawsuit challenging NTIA withholding of BEAD funds. It’s likely that states would make both statutory and constitutional arguments; in fact, they might even make spending clause, APA, and ultra vires (i.e., exceeding statutory authority) arguments similar to the ones discussed above. However, there are important differences between the executive actions that gave rise to that DOT case and the actions contemplated by § 5(a). For one thing, 47 U.S.C. § 1702(o) exempts the NTIA’s BEAD-related decisions from the requirements of the APA, meaning that it will likely be harder for states to challenge NTIA actions as being arbitrary and capricious. For another, 47 U.S.C. § 1702(n) dictates that all lawsuits brought under BEAD’s authorizing statute to challenge NTIA’s BEAD decisions will be subject to a standard of review that heavily favors the government. Essentially, this standard of review says that the NTIA’s decisions can’t be overturned unless they’re the result of corruption, fraud, or misconduct. 

This overview isn’t the place to get too deep into the weeds on the question of whether and how states might be able to get around these statutory hurdles. Suffice it to say that there are plausible arguments to be made on both sides of the debate. For example, courts sometimes hold that a lawsuit arguing that an agency’s actions are in excess of its statutory authority does not arise “under” the statute in question and is therefore not subject to the statute’s standard of review (although this is a narrow exception to the usual rule).

Suppose that, in the future, the Department of Commerce decides to withhold non-deployment BEAD funding from states with AI laws deemed undesirable under the EO. States could challenge this decision in court and ask the court to order NTIA to release the previously allocated non-deployment funds, arguing that the withholding of funds exceeded NTIA’s authority under the statute authorizing BEAD and violated the Spending Clause. Each of these arguments seems at least somewhat plausible, on an initial analysis. Nothing in the statute authorizing BEAD appears to give the federal government unlimited discretion to withhold BEAD funds to vindicate policy goals that have little or nothing to do with access to broadband, and the course of action proposed in the EO is, arguably, impermissibly coercive in violation of the Spending Clause.

AI regulation is a less politically divisive issue than immigration enforcement, and a cynical observer might assume that this would give states in this hypothetical AI case a better chance on appeal than the states in the DOT immigration conditions case discussed above. However, the statutory hurdles discussed above may make it harder for states to prevail than it was in the DOT conditions case. It should also be noted that, regardless of whether or not states could eventually prevail in a hypothetical lawsuit, the prospect of having BEAD funding denied or delayed, perhaps for years, could be enough to discourage some states from enacting AI legislation of a type disfavored by the Department of Commerce under the EO.

Section 5(b): Other discretionary agency funding

In addition to withholding non-deployment BEAD funding, the EO instructs agencies throughout the executive branch to assess their discretionary grant programs and determine whether discretionary grants can legally be withheld from states that have AI laws that “conflict[] with the policy of this order.”

The legality of this contemplated course of action, and its likelihood of being upheld in court, is even more difficult to conclusively determine ex ante than the legality and prospects of the BEAD withholding discussed above. The federal government distributes about a trillion dollars a year in grants to state and local governments, and more than a quarter of that money is in the form of discretionary grants (as opposed to grants from mandatory programs such as Medicaid). That’s a lot of money, and it’s broken up into a lot of different discretionary grants. It seems safe to predict that most discretionary grants will not be subject to withholding, since the one thing that all potential candidates will have in common is that Congress did not anticipate that they would be withheld in order to prevent state AI regulation. But depending on the amount of discretion Congress conferred on the agency in question, it may be that some grants can be withheld for almost any reason or for no reason at all. There may also be some grants that legitimately relate to the tech deregulation policy goals the administration is pursuing here. 

It’s likely that many of the arguments against withholding grant money from AI-regulating states will be the same from one grant to another—as discussed above in the context of § 5(a), states will likely argue that withholding grant funds to vindicate unrelated policy goals violates the Spending Clause, exceeds the agency’s statutory authority, and violates the Administrative Procedure Act. These arguments will be stronger with respect to some grants and weaker with respect to others, depending on factors such as the language of the authorizing statute and the purpose for which the grant was to be awarded. At this point, therefore, there’s no way to know for sure how much money the federal government will attempt to withhold and how much (if any) it will actually succeed in withholding. Nor is it clear which states will resort to litigation and which the administration will succeed in pressuring into giving up their AI regulations without a fight. Unlike many other provisions of the EO, § 5(b) does not contain a deadline by which agencies must complete their review, so it’s possible that we won’t have a fuller picture of which grants will be in danger for many months. 

Issue 3: Federal Reporting and Disclosure Standard

Section 6 of the EO instructs the FCC, in consultation with AI czar David Sacks, to “initiate a proceeding to determine whether to adopt a Federal reporting and disclosure standard for AI models that preempts conflicting State laws.” It’s likely that the “conflicting state laws” referred to include state AI transparency laws such as California’s SB 53 and New York’s RAISE Act. It’s not clear from the language of the EO what legal authority this “Federal Reporting and Disclosure Standard” would be promulgated under. Under the Biden administration, the Department of Commerce’s Bureau of Industry and Security (BIS) controversially attempted to impose reporting requirements on frontier model developers under the information-gathering authority provided by § 705 of the Defense Production Act—but § 705 has historically been used by BIS rather than the FCC, and I am not aware of any comparable authority that would authorize the FCC to implement a mandatory “federal reporting and disclosure standard” for AI models. 

Generally, regulatory preemption can only occur when Congress has granted an executive-branch agency authority to promulgate regulations and preempt state laws inconsistent with those regulations. This authority can be granted expressly or by implication, but the FCC has never before asserted that it possesses any significant regulatory authority (express or otherwise) over any aspect of AI development. It’s possible that the FCC is relying on a creative interpretation of its authority under the Communications Act—after the AI Action Plan discussed the possibility of FCC preemption, FCC Chairman Brendan Carr indicated that the FCC was “taking a look” at whether the Communications Act grants the FCC authority to regulate AI and preempt onerous state laws. However, commentators who have researched this issue and experts on the FCC’s legal authorities almost universally agree that “[n]othing in the Communications Act confers FCC authority to regulate AI.” 

The fundamental obstacle to FCC preemption of state AI laws is that the Communications Act authorizes the FCC to regulate telecommunications services, and AI is not a telecommunications service. In the past, the FCC has sometimes suggested expansive interpretations of the Communications Act in order to claim more regulatory territory for itself, but claiming broad regulatory authority over AI would be significantly more ambitious than these (frequently unsuccessful) prior attempts. Moreover, this kind of creative reinterpretation of old statutes to create new agency authorities is much harder to get past a court today than it would have been even ten years ago, because of Supreme Court decisions eliminating Chevron deference and establishing the major questions doctrine. In his comprehensive policy paper on FCC preemption of state AI laws, Lawrence J. Spiwak (a staunch supporter of preemption) analyzes the relevant precedents and concludes that “given the plain language of the Communications Act as well as the present state of the caselaw, it is highly unlikely the FCC will succeed in [AI preemption] efforts” and that “trying to contort the Communications Act to preempt the growing patchwork of disparate state AI laws is a Quixotic exercise in futility.” Harold Feld of Public Knowledge essentially agrees with this assessment in his piece on the same topic.

Issue 4: Preemption of state laws for “deceptive practices” under the FTC Act

Section 7 of the EO directs the Federal Trade Commission (FTC) to issue a policy statement arguing that certain state AI laws are preempted by the FTC Act’s prohibition on deceptive commercial practices. Presumably, the laws which the EO intends for this guidance to target include Colorado’s AI Act, which the EO’s Purpose section accuses of “forc[ing] AI models to produce false results in order to avoid a ‘differential treatment or impact’” on protected groups, and other similar “algorithmic discrimination” laws. A policy statement on its own generally cannot preempt state laws, but it seems likely that the policy statement that the EO instructs the FTC to create would be relied upon in subsequent preemption-related regulatory efforts and/or by litigants seeking to prevent enforcement of the allegedly preempted laws in court. 

While the Trump administration has previously expressed disapproval of “woke” AI development practices, for example in the recent executive order on “Preventing Woke AI in the Federal Government,” this argument that the FTC Act’s prohibition on UDAP (unfair or deceptive acts or practices in or affecting commerce) preempts state algorithmic discrimination laws is, as far as I am aware, new. During the Biden administration, Lina Khan’s FTC published guidance containing an arguably similar assertion: that the “sale or use of—for example—racially biased algorithms” would be an unfair or deceptive practice under the FTC Act. Khan’s FTC did not, however, attempt to use this aggressive interpretation of the FTC Act as a basis for FTC preemption of any state laws. In fact, as far as I can tell, the FTC has never used the FTC Act’s prohibition on deceptive acts or practices to preempt state civil rights or consumer protection laws, no matter how misguided, meaning that the approach contemplated by the EO appears to be totally unprecedented.

Colorado’s AI law, SB 24-205, has been widely criticized, including by Governor Jared Polis (who signed the act into law) and other prominent Colorado politicians. In fact, the law has proven so problematic for Colorado that Governor Polis, a Democrat, was willing to cross party lines in order to support broad-based preemption of state AI laws for the sake of getting rid of Colorado’s. Therefore, an attempt by the Trump administration to preempt Colorado’s law (or portions thereof) might meet with relatively little opposition from within Colorado. It’s not clear who, if anyone, would have standing to challenge FTC preemption of Colorado’s law if Colorado’s attorney general refused to do so. But Colorado is not the only state with a law prohibiting algorithmic discrimination, and presumably the guidance the EO instructs the FTC to produce would inform attempts to preempt other “woke” state AI laws as well as Colorado’s. 

If the matter did go to court, however, it seems likely that states would prevail. As bad as Colorado’s law may be (and, personally, I think it’s a pretty bad law), it’s very difficult to plausibly argue that it, or any similar state algorithmic discrimination law, requires any “deceptive act or practice affecting commerce.” The Colorado law requires developers and deployers of certain AI systems to use “reasonable care” to protect consumers from “algorithmic discrimination.” It also imposes a headache-inducing laundry list of documentation and record-keeping requirements on developers and deployers, which mostly relate to documenting efforts to avoid algorithmic discrimination. But, crucially, none of the law’s requirements appear to dictate that any AI output has to be untruthful—and regardless, creating an untruthful output need not be a “deceptive act or practice” under the FTC Act if the company provides consumers with enough information to ensure that they will not be deceived by the untruthful output.

“Algorithmic discrimination” is defined in the Colorado law to mean “any condition in which the use of an artificial intelligence system results in an unlawful differential treatment or impact that disfavors an individual or group of individuals on the basis of their actual or perceived [list of protected characteristics].” Note that only “unlawful” discrimination qualifies. FTC precedents establish that a deceptive act or practice occurs when there is a material representation, omission, or practice that is likely to mislead a consumer acting reasonably under the circumstances. The EO’s language seems to ask the FTC to argue that the prohibition on “differential impacts” will in practice require untruthful outputs because it prohibits companies from acknowledging the reality of group differences. But since only “unlawful” differential impacts are prohibited by the Colorado law, the only circumstance in which the Colorado law could be interpreted to require untruthful outputs is if some other valid and existing law already required such an output. And, again, even a requirement that did in practice encourage the creation of untruthful outputs would not necessarily result in “deception,” especially given the extensive disclosure requirements that the Colorado law includes.

Treaty-following AI

I. Introduction

If AI systems might be made to follow laws,[ref 1] does that mean that they could also follow the legal text in international agreements? Could “Treaty-Following AI” (TFAI) agents—designed to follow their principals’ instructions except where those entail actions that violate the terms of a designated treaty—help robustly and credibly strengthen states’ compliance with their adopted international obligations?

Over time, what would a framework of treaty-following AI agents aligned to AI-guiding treaties imply for the prospects of new treaties specific to powerful AI (‘advanced AI agreements’), for state compliance with existing treaty instruments in many other domains, and for the overall role and reach of binding international treaties in the brave new “intelligence age”?[ref 2]  

These questions are increasingly salient and urgent. As AI investment and capability progress continues apace,[ref 3] so too does the development of ever more capable AI models, including those that can act coherently as “agents” to carry out many tasks of growing complexity.[ref 4] AI agents have been defined in various ways,[ref 5] but they can be practically considered as those AI systems which can be instructed in natural language and then act autonomously, which are capable of pursuing difficult goals in complex environments without detailed (follow-up) instruction, and which are capable of using various affordances or design patterns such as tool use (e.g., web search) or planning.[ref 6] 

To be sure, AI agents today vary significantly in their level of sophistication and autonomy.[ref 7] Many of these systems still face limits to their performance, coherence over very long time horizons,[ref 8] robustness in acting across complex environments,[ref 9] and cost-effectiveness,[ref 10] amongst other issues.[ref 11] It is important and legitimate to critically scrutinize the time frame or the trajectory on which this technology will come into its own.

Nonetheless, a growing number of increasingly agentic AI architectures are available;[ref 12] they are seeing steadily wider deployment by AI developers and startups across many domains;[ref 13] and, barring sharp breaks in or barriers to progress, it will not be long before the current pace of progress yields increasingly capable and useful agentic systems, including, eventually, “full AI agents”, which could be functionally defined as systems “that can do anything a human can do in front of a computer.”[ref 14] Once such systems come into reach, it may not be long before the world sees thousands or even many millions of such systems autonomously operating daily across society,[ref 15] with very significant impacts across all spheres of human society.[ref 16]

Far from being a mirage,[ref 17] then, the emergence and proliferation of increasingly agentic AI systems is a phenomenon of rapidly increasing societal importance[ref 18]—and of growing social, ethical and legal concern.[ref 19] After all, given their breadth of use cases—ranging from functions such as at-scale espionage campaigns, intelligence synthesis and analysis,[ref 20] military decision-making,[ref 21] cyberwarfare, economic or industrial planning, scientific research, or in the informational landscape—AI agents will likely impact all domains of states’ domestic and international security and economic interests. 

As such, even if their direct space of action remains confined to the digital realm (rather than extending to robotic platforms), these AI agents could create many novel global challenges, not just for domestic citizens and consumers, but also for states and for international law. These latter risks include new threats to international security or strategic stability,[ref 22] broader geopolitical tensions and novel escalation risks,[ref 23] significant labour market disruptions and market-power concentration,[ref 24] distributional concerns and power inequalities,[ref 25] domestic political instability[ref 26] or legal crises;[ref 27] new vectors for malicious misuse, including in ways that bypass existing safeguards (such as on AI–bio tools);[ref 28] and emerging risks that these agents act beyond their principals’ intent or control.[ref 29]

Given these prospects, it may be desirable for states at the frontier of AI development to strike new (bilateral, minilateral or multilateral) international agreements to address such challenges—or for ‘middle powers’ to make access to their markets conditional on AI agents complying with certain standards—in ways that assure the safe and stabilizing development, deployment, and use of advanced AI systems.[ref 30] Let us call these advanced AI agreements. 

Significantly, even if negotiations on advanced AI agreements were initiated on the basis of genuine state concern and entered into in good faith—for instance, in the wake of some international crisis involving AI[ref 31]—there would still be key hurdles to their success; that is, such agreements would likely face a range of challenges both old and new. In particular, they might face challenges around (1) the intrusiveness of monitoring state activities in deploying and directing their AI agents in order to ensure treaty compliance; (2) the continued enforcement of initial AI benefit-sharing promises; or (3) the risk that AI agents would, whether by state order or not, exploit legal loopholes in the treaty. Such challenges are significant obstacles to advanced AI agreements; and they will need to be addressed if such AI treaties are to be politically viable, effective, and robust as the technology advances.

Even putting aside novel treaties for AI, the rise of AI agents is also likely to put pressure on many existing international treaties (or future ones, negotiated in other domains), especially as AI agents will begin to be used in domains that affect their operation. This underscores the need for novel kinds of cooperative innovations—new mechanisms or technologies by which states can make commitments and assure (their own; and one another’s) compliance with international agreements. Where might such solutions be found?

Recent work has proposed a framework for “Law-Following AI” (LFAI) which, if successfully adapted to the international level, could offer a potential model for how to address these global challenges.[ref 32] If, by analogy, we can design AI agents themselves to be somehow “treaty-following”—that is, to generally follow their principals’ instructions loyally but to refuse to take actions that violate the terms of a designated treaty—this would greatly strengthen the prospects for advanced AI agreements,[ref 33] as well as strengthen the integrity of other international treaties that might otherwise come under stress from the unconstrained activities of AI agents. 

Far from a radical, unprecedented idea, the notion of treaty-following AI—and the basic idea of ensuring that AI agents autonomously follow states’ treaty commitments under international law—draws on many established traditions of scholarship in cyber, technology, and computational law;[ref 34] on recent academic work exploring the role of legal norms in the value alignment for advanced AI;[ref 35] as well as a longstanding body of international legal scholarship aimed at the control of AI systems used in military roles.[ref 36] It also is consonant with recent AI industry safety techniques which seek to align the behaviour of AI systems with a “constitution”[ref 37] or “Model Spec”.[ref 38] Indeed, some applied AI models, such as Scale AI’s “Defense Llama”, have been explicitly marketed as being trained on a dataset that includes the norms of international humanitarian law.[ref 39] Finally, it is convergent with the recent interest of many states, the United States amongst them,[ref 40] in developing and championing applications of AI that support and ensure compliance with international treaties.  

Significantly, a technical and legal framework for treaty-following AI could enable states to make robust, credible, and verifiable commitments that their deployed AI agents act in accordance with the negotiated and codified international legal constraints that those states have consented and committed to. By significantly expanding states’ ability to make commitments regarding the future behaviour of their AI agents, this could not only aid in negotiating AI treaties, but might more generally reinvigorate treaties as a tool for international cooperation across a wide range of domains; it could even strengthen and invigorate automatic compliance with a range of international norms. For instance, as AI systems see wider and wider deployment, a TFAI framework could help assure automatic compliance with norms and agreements across wide-ranging domains: it could help ensure that any AI-enabled military assets automatically operate in compliance with the laws of war,[ref 41] and that AI agents used in international trade could automatically ensure nuanced compliance with tailored export control regimes that facilitate technology transfers for peaceful uses. The framework could help strengthen state compliance with human rights treaties or even with mutual defence commitments under collective security agreements, to name a few scenarios. In so doing, rather than mark a radical break in the texture of international cooperation, TFAI agreements might simply serve as the latest step in a long historical process whereby new technologies have transformed the available tools for creating, shaping, monitoring, and enforcing international agreements amongst states.[ref 42] 

But what would it practically mean for advanced AI agents to be treaty-following? Which AI agents should be configured to follow AI treaties? How technically feasible is it for AI agents to interpret agreements in accordance with the applicable international legal rules on treaty interpretation? What does all this mean for the optimal—and appropriate—content and design of AI-guiding treaty regimes? These are just some of the questions that will require robust answers in order for TFAI to live up to its significant promise. In response, this paper provides an exploration of these questions, offered with the aim of sparking and structuring further research into this next potential frontier of international law and AI governance.

This paper proceeds as follows: In Part II, we will discuss the growing need for new international agreements around advanced AI, and the significant political and technical challenges that will likely beset such international agreements, in the absence of some new commitment mechanisms by which states can ensure—and assure—that the regulated AI agents would abide by their terms. We argue that a framework for treaty-following AI would have significant promise in addressing these challenges, offering states precisely such a commitment mechanism. Specifically, we argue that using the treaty-following AI framework, states can reconfigure any international agreements (directly for AI; or for other domains in which AIs might be used) as ‘AI-guiding treaties’ that constrain—or compel—the actions of treaty-following AI agents. We argue that states can use this framework to contract around the safe and beneficial development and deployment of advanced AI, as well as to facilitate effective and granular compliance in many other domains of international cooperation and international law. 

Part III sets out the intellectual foundations of the TFAI framework, before discussing its feasibility and implementation from both technical and legal perspectives. It first discusses the potential operation of TFAI agents and the ways in which such systems may be increasingly technically feasible in light of the legal-reasoning capabilities of frontier AI agents—even as a series of technical constraints and challenges remain. We then discuss the basic legal form, design, and status of AI-guiding treaties, and the relation of TFAI agents with regard to these treaties.

In Part IV, we discuss the legal relation between TFAI agents and their deploying states. We argue that TFAI frameworks can function technically and politically even if the status or legal attributability of TFAI agents’ actions is left unclear. However, we argue that the overall legal, political, and technical efficacy of this framework is strengthened if these questions are clarified by states, either in general or within specific AI-guiding treaties. As such, we review a range of avenues to establish adequate lines of state responsibility for the actions of their TFAI agents. Noting that more expansive accounts—which entail extending either international or domestic legal personhood to TFAI agents—are superfluous and potentially counterproductive, we ultimately argue for a solution grounded in a more modest legal development, under which TFAI agents’ actions come to be held legally attributable to their deploying states under an evolutive reading of the International Law Commission (ILC)’s Articles on the Responsibility of States for Internationally Wrongful Acts (ARSIWA).

Part V discusses the question of how to ensure effective and appropriate interpretation of AI-guiding treaties by TFAI agents. It discusses two complementary avenues. We first consider the feasibility of TFAI agents applying the default rules for treaty interpretation under the Vienna Convention on the Law of Treaties; we then consider the prospects of designing bespoke AI-guiding treaty regimes with special interpretative rules and arbitral or adjudicatory bodies. For both avenues, we identify and respond to a series of potential implementation challenges.  

Finally, Part VI sketches future questions and research themes that are relevant for ensuring the political stability and effectiveness of the TFAI framework, and then we conclude.

II. Advanced AI Agreements and the Role of Treaty-Following AI

At the outset, it is important to clarify the terminology, concepts, and scope of our argument.

A. Terminology

To begin, we define:

1. Advanced AI agreements: AI-specific treaties which states may (soon or eventually) negotiate bilaterally, minilaterally, or multilaterally, and adopt, in order to establish state obligations to regulate the development, capabilities, or usage of advanced AI agents;

2. Existing obligations: any other (non-AI-specific) international obligations that states may be under—within treaty or customary international law—which may be relevant to regulating the behaviour of AI agents, or which might be violated by the behaviour of unregulated AI agents.

While our initial focus in this paper is on how the rise of AI agents may interact with—and in turn be regulated within—the first category (advanced AI agreements), we welcome extensions of this work to the broader normative architecture of states’ existing obligations in international law, since we believe our framework is applicable to both (see Table 1). Our paper proceeds from the concern that the political and technical prospects of advanced AI agreements (and existing obligations) may be dim, unless states have either greater willingness or greater capability to make and trust international commitments. Changing states’ willingness to contract and trust is difficult, though not impossible. However, one way that states’ cooperative capabilities might be strengthened is by ensuring that any AI agents they deploy will, by their technical design and operation, comply with those states’ obligations (under advanced AI agreements).

Such agents we call: 

3. Treaty-Following AI agents (TFAI agents): agentic AI systems that are designed to generally follow their principals’ instructions loyally but to refuse to take actions that violate the terms and obligations of a designated referent treaty text. 

Note three considerations. First: in this paper we focus on TFAI agents deployed by states, leaving aside for the moment the admittedly critical question of how we would treat AI agents deployed by private actors. We also focus on states’ AI agents that act across many domains, with their primary functions often being not legal interpretation per se, but rather a wide range of economic, logistical, military, or intelligence functions. Such TFAI agents would engage in treaty interpretation in order to adjust their own behaviour across many domains for treaty compliance; however, we largely bracket the potential parallel use of AI systems in negotiating or drafting international treaties (whether advanced AI agreements or any other new treaties), or their use in other forms of international lawmaking or legal norm-development (e.g. finding and organizing evidence of customary international law). Finally, the TFAI proposal does not construct AI systems as duty-bearing “legal actors” and therefore does not involve significant shifts in the legal ontology of international law per se.

Next, any bespoke advanced AI agreements designed to be self-enforced by TFAI agents we term:

4. AI-Guiding Treaties: treaty instruments serving as the referent legal texts for Treaty-Following AI agents, consisting (primarily) of the legal text that those agents are aligned to, as well as (secondarily) their broader institutional scaffolding.

The full assemblage—comprising the technical configuration of AI agents to operate as TFAI agents, and of treaties to be AI-guiding—is referred to as (5) the TFAI framework.

Importantly, we envision AI-guiding treaties as a relatively modest innovation—that is, as technical ex ante infrastructural constraints on TFAI agents’ range of acceptable goals or actions, building on demonstrated AI industry safety techniques. As such, we treat the treaty text in question as an appropriate, stable, and certified referent text through which states can establish jointly agreed-upon infrastructural constraints around which instructed goals their AI agents may accept, and on the latitude of conduct which they may adopt in pursuit of those lawful goals.


Table 1. Terminology, scope and focus of argument


As such, AI-Guiding Treaties represent a high-leverage mechanism for self-executing state commitments. This mechanism could in principle be extended to many other treaties, in other domains—from cyberwarfare to alliance security guarantees, from bilateral trade agreements to export control regimes, and even human rights or environmental law regimes—where AI agents could come to carry out large fractions of their deploying states' conduct. For the present, however, we focus on applying the TFAI framework to bespoke AI-guiding treaties (see Table 1) and leave the question of how to configure AI agents to follow other (non-AI-specific) legal obligations to future work. After all, if the TFAI framework cannot operate in this more circumscribed context, it will likely also fall flat in the context of other international legal norms and instruments. Conversely, if it does work in this narrow context, it could still be a valuable commitment mechanism for state coordination around advanced AI, even if it would not solve the problems posed by AI's actions in other domains of international law.

B. The Need for International Agreements on Advanced AI 

To appreciate the promise and value of a TFAI framework for states, international security, and international law, it helps to understand the range of goals that international agreements specific to advanced AI might serve to their parties[ref 43] as well as the political and technical hurdles such agreements might face in a business-as-usual scenario that would see a proliferation of “lawless” AI agents[ref 44] engaging in highly unpredictable or erratic behaviour.[ref 45]

Like other international regimes aimed at facilitating coordination or collaboration by states,[ref 46] AI treaties could serve many goals and shared national interests. For example, they could enshrine clearly agreed restrictions on, or red lines for, AI systems' capabilities, behaviours, or usage[ref 47] in ways that preserve and guarantee parties' national security as well as international stability.[ref 48] There are many areas of joint interest for leading AI states to contract over.[ref 49] For instance, advanced AI agreements could impose mutually agreed limits on advanced AI agents' ability to engage in uninterpretable steganographic reasoning or communication,[ref 50] or to carry out uncontrollably rapid automated AI research.[ref 51] They could also establish mutually agreed restraints on AI agents' capacity, propensity, or practical usability to infiltrate designated key national data networks or to target critical infrastructure through cyberattacks, to drive preemptive uses of force in ways that ensure conflict escalation,[ref 52] to support coup attempts against (democratically elected or simply incumbent) governments, or to engage in any other actions that would severely interfere with the national sovereignty of signatory (or allied, or any) governments.[ref 53]

On the flip side, international AI agreements could also be aimed not just at avoiding the bad, but at achieving significant good. For instance, many have pointed to the strategic, political, and ethical value of conditional AI benefit-sharing deals:[ref 54] international agreements through which states leading in AI commit to some proportional or progressive sharing of the future benefits derived from AI technology with allies, unaligned states, or even with rival or challenger states. Such bargains, it is hoped, might help secure geopolitical stability, avert risky arms races or contestation by states lagging in AI,[ref 55] and could moreover ensure a degree of inclusive global prosperity from AI.[ref 56]

C. Political hurdles and technical threats to advanced AI agreements 

However, it is likely that any advanced AI agreements would encounter many hurdles, both political and technical, which will need to be addressed.

 1. Political hurdles: Transparency-security tradeoffs and future enforceability challenges

For one, international security agreements face challenges around the intrusive monitoring they may require to guarantee that all parties to the treaty instruct and utilize their AI systems in a manner that remains compliant with the treaty's terms. Such monitoring risks revealing sensitive information, resulting in a "transparency-security tradeoff" which has historically undercut the prospects for various arms control agreements[ref 57] and which could do so again in the case of AI security treaties.

Simultaneously, asymmetric treaties, such as those conditionally promising a share of the benefits from advanced AI technologies, face potential (un)enforceability problems: states lagging in AI development might worry that any such promises would not be enforceable and could easily be walked back if or as the AI capability differential between them and a frontier AI state grew so steep as to produce massively lopsided economic, political, or military power.[ref 58]

2. Technical threats: hijacked, misaligned, or “henchmen” agents 

Other hurdles to AI treaties would be technical. For many reasons, states might be cautious about overrelying on or overtrusting lawless AI agents in their service. After all, even fairly straightforward, non-agentic AI systems used in high-stakes governmental tasks (such as military targeting or planning) can be prone to unreliability, adversarial input, or sycophancy (the tendency of AI systems to align their output with what the user believes or prefers, even if that view is incorrect).[ref 59]

Moreover, AI systems can demonstrate surprising and functionally emergent capabilities, propensities, or behaviours.[ref 60] In testing, a range of leading LLM agents have autonomously resorted to risky or malicious insider behaviours (such as blackmail or leaking sensitive information) when faced with the prospect of being replaced with an updated version or when their initially assigned goal changed with the company's changing direction; often, they did so even in defiance of direct commands to avoid such behaviours.[ref 61] This suggests that highly versatile AI agents may threaten various loss of control (LOC) scenarios—defined by RAND as "situations where human oversight fails to adequately constrain an autonomous, general-purpose AI, leading to unintended and potentially catastrophic consequences."[ref 62] By taking such actions, AI agents will pose unique challenges for state compliance with international legal obligations, since these systems may well engage in conduct that violates key obligations under particular treaties, inflicts transboundary harms, or violates peremptory norms (jus cogens) or other applicable principles of international law.

Beyond the direct harm threatened by AI agents taking these actions, there is the risk that, if such actions were attributed to the deploying state, they would threaten the stability of the treaty regime, spark significant political or military escalation, and/or expose that state to international legal liability,[ref 63] enabling injured states to take unfriendly measures of retorsion (e.g., severing diplomatic relations) or even countermeasures that would otherwise be unlawful (e.g., suspending treaty obligations or imposing economic sanctions that would normally violate existing trade agreements).[ref 64]

Why might we expect some AI agents to engage in such actions that violate their principal’s treaty commitments or legal obligations? There are several possible scenarios to consider.

For one, there are risks that AI agents can be attacked, compromised, hijacked, or spoofed by malicious third parties (whether acting directly or through other AI agents) using direct or indirect prompt injection attacks,[ref 65] spoofing, faked interfaces, IDs or certificates of trust,[ref 66] malicious configuration swaps,[ref 67] or other adversarial inputs.[ref 68] Such attacks would compromise not just the agents themselves, but also all systems they were authorized to operate in, given that major security vulnerabilities have been found in publicly available AI coding agents, including exploits that grant attackers full remote code-execution user privileges.[ref 69] 

Secondly, there may be a risk that unaligned AI agents would themselves insufficiently consider—or even outright ignore—their states' interests and obligations in undertaking certain action paths. As evidenced by a growing body of both theoretical arguments[ref 70] and empirical observation,[ref 71] it is difficult to design AI systems that reliably obey any particular set of constraints provided by humans,[ref 72] especially where these constraints refer not to clearly written-out texts but also aim to build in consideration of the subjective intents or desires of the principal.[ref 73] As such, AI agents deployed without care could frequently prove unaligned; that is, act in ways unrestrained either by normative codes[ref 74] or by the intent of their nominal users (e.g., governments).[ref 75] In the absence of adequate real-time failure detection and incident response frameworks,[ref 76] such harms could escalate swiftly. In the wake of significant incidents, governments might soon recognize the inadvisability of deploying such systems without adequate oversight,[ref 77] but not, perhaps, before incurring significant political costs, whether counted in direct harm or in lost global trust in their technological competence.

Thirdly, even if deployed AI agents could be successfully intent-aligned to their state principals,[ref 78] the use of narrowly loyal-but-lawless AI systems, left free to engage in norm violations that they judge to be in their principal's interest, would likely expose their deploying states to significant political costs. To understand why, it is important to place the technical challenge of loyal-but-lawless AI agents in a broader political context.

3. Lawless AI agents, political exposure, and commitment challenges

Taken at face value, the development of AI systems that are narrowly loyal to a government's directives and intentions, even to the exclusion of that government's own legal precommitments, might appear a desirable prospect to some political realists. In practice, however, many actors may have both normative and self-interested reasons to be wary of loyal-but-lawless AI agents engaging in actions that are in legal grey areas—or outright illegal—on their behalf. At the domestic level, such AI "henchmen" would create significant legal risks for consumers using them[ref 79] and for corporations developing and deploying them.[ref 80] They would also create legal risks for government actors, who might find themselves violating public administrative law or even constitutions,[ref 81] as well as political risks, as AI agents that could be made loyal to particular government actors could well spur ruinous and destabilizing power struggles.[ref 82] Likewise, many states might find AI henchmen politically poisonous at the international level.

After all, not only could such AI agents be intentionally ordered by state officials to engage in conduct that violates or subverts those states' treaty obligations, but these systems' autonomy also suggests that they might engage in such unlawful actions even without being explicitly directed to do so. That is, loyal-but-lawless AI henchmen could engage in calculated treaty violations whenever they judge them to be to the benefit of their principal.[ref 83] Outside actors, finding it difficult to distinguish between AI agent behaviour that was deliberately directed and henchman actions that were advantageous but unintended, might assume the worst in each case.

Significantly, in such contexts, the ambiguity of adversarial actions would frequently translate into perceptions of bad faith. In this way, loyal-but-lawless AI agents' ability to violate treaties autonomously, and to do so in a (facially) deniable manner, as henchmen acting in their principals' interests but not on their orders, perversely creates a commitment challenge for their deploying states, one which would erode states' ability to effectively conclude (at least some) treaties. After all, even if a state intended to abide by its treaty obligations in good faith, it would struggle to prove this to counterparties unless it could somehow guarantee that its AI agents could not be misused and would not act as deniable henchmen whenever convenient.

This would not mean that states could no longer conclude any such treaties at all; after all, there would remain many other mechanisms—from reputational costs to the risk of sparking reciprocal noncompliance—that might still incentivize or compel states' compliance with such treaties.

However, lawless AI agents' unpredictability poses a severe challenge insofar as it makes treaty violations more likely. Of course, even today, when states intend to comply with their international obligations, they may have trouble ensuring that their human agents consistently abide by those obligations. Such failures can occur because of limits in bureaucratic capacity[ref 84] and organizational culture,[ref 85] or as a result of the institutional breakdown of the rule of law, at worst resulting in significant rights abuses or humanitarian atrocities committed by junior personnel.[ref 86] Such incidents may, at best, frustrate states' genuine intention to achieve the goals enshrined in the treaties they have consented to; in all cases, they can expose a state to significant reputational harm, legal and political censure, and adversary lawfare,[ref 87] while eroding domestic confidence in the competence or integrity of its institutions.

Significantly, lawless AI agents would likely exacerbate the risk that their deploying states would (be perceived to) use them strategically to engage in violations of treaty obligations in a manner that would afford some fig leaf of deniability if discovered. This is for a range of reasons: (1) treaties often prohibit actions that at least some large fraction of a state's human agents would prefer not to engage in anyway; loyal-but-lawless AI agents would lack such moral side constraints and would be far more likely to obey unethical or illegal requests. Moreover, (2) loyal-but-lawless AI agents would be less likely to whistleblow or leak to the press following the violation of a treaty (and, correspondingly, would need to worry less about their fellow AI colleagues or collaborators doing so, meaning they could exchange information more freely); (3) loyal-but-lawless AI agents would have little reason to worry about personal consequences for treaty violations (e.g., foreign sanctions, asset freezing, travel restrictions, international criminal liability) that might deter human agents; (4) loyal-but-lawless AI agents would have less reason to worry about domestic legal or career repercussions (e.g., criminal or civil penalties, costs to their reputation or career) associated with aiding a violation of treaty obligations that could later become disfavored should domestic political winds shift; and (5) AI agents may be better at hiding their actions and their own or their principals' identity, thus making them more likely to opportunistically violate the treaty.[ref 88]

These are not just theoretical concerns but are supported by empirical studies, which have indicated that human delegation of tasks to AI agents can increase dishonest behaviours, as human principals often find ways to induce dishonest AI agent behaviour without telling them precisely what to do; crucially, such cheating requests saw much higher rates of compliance when directed at machine agents than when they were addressed to human agents.[ref 89] For all these reasons, then, the widespread use of AI agents is likely to exacerbate international concerns over either deliberate or unwitting violations of treaty obligations by their deploying state. 

As such, on the margin, the deployment of advanced agentic AIs acting under no external constraints beyond their states’ instructions would erode not just the respect for many existing norms in international law but also the prospects for new international agreements, including those focused on stabilizing or controlling the use of this key technology.

D. The TFAI framework as commitment mechanism and cooperative capability

Taken together, these challenges could put significant pressure on international advanced AI agreements and could more generally threaten the prospects for stable international cooperation in the era of advanced AI. 

Conversely, an effective framework by which to guarantee that AI agents would adhere to the terms of their treaty could address or even invert these challenges. For one, it could help ease the transparency-security tradeoff by embedding constraints on AI agents’ actions at the level of the technology itself. It could crystallize (potentially) nearly irrevocable commitments by states to share the future benefits from AI with other states or to guarantee investor protections under more inclusive “open global investment” governance models.[ref 90] 

More generally, the TFAI framework is one way by which AI systems could help expand the affordances and tools available to states, realizing a significant new cooperative capability[ref 91] that would greatly enhance their ability to make robust and lasting commitments to each other in ways that are not dependent on assumptions of (continued) good faith. Indeed, correctly configured, it could be one of many coordination-enabling applications of AI that could strengthen the ability of states (and other actors) to negotiate in domains of disagreement and to speed up collaboration towards shared global goals.[ref 92] 

Finally, the ability to bind AI agents to jointly agreed treaties has many additional advantages and co-benefits; for one, it might mitigate the risk that some domestic (law-following) AI agents, especially in multi-agent systems, become engaged in activities with cross-border effects that end up simultaneously subjecting them to different sets of domestic law, resulting in conflict-of-law challenges.[ref 93]

E. Caveats 

That said, the proposal for exploration and application of a TFAI framework comes with a number of caveats. 

For one, in exploring the prospects for states to conclude new AI-specific treaty regimes (i.e., advanced AI agreements) by which to bind the actions of AI agents, we do not suggest that only these novel treaty regimes would ground effective state obligations around the novel risks from advanced AI agents. To the contrary, since many norms in international law are technology-neutral, there are already numerous binding and non-binding norms—deriving from treaty law, international custom, and general principles of law—that apply to states' development and deployment of advanced AI agents[ref 94] and that would provide guidance even for future, very advanced AI systems.[ref 95] As such, as Talita Dias notes,

“while the conversation about the global governance of AI has focussed on developing new, AI-specific rules, norms or institutions, foundational, non-AI-specific global governance tools already govern AI and AI agents globally, just as they govern other digital technologies. [since] International law binds states—and, in some circumstances, non-state actors—regardless of which tools or technologies are used in their activities.”[ref 96]

This means that one could also consider a more expansive project that would examine the case for fully “public international law-following AI” (see again Table 1). Nonetheless, as discussed before, in this article, we focus on the narrower and more modest framework for treaty-following AI. This is because a focus on AI that follows treaties can serve as an initial scoping exercise to investigate the feasibility of extending any law-following AI–like framework to the international sphere at all: if this exercise does not work, then neither would more ambitious proposals for public international law-following AI. Conversely, if the TFAI framework does work, it is likely to offer significant benefits to states (and to international stability, security, and inclusive development), even if subjecting AI systems to the full range of international legal norms proved more difficult, legally or politically.

Finally, in discussing potential AI-guiding treaties, we note that there exist a wide range of reasons why states might wish to strike such international deals and agreements, and/or find ways to enshrine stronger technology-enabled commitments to comply with their obligations under those instruments. However, we do not aim to prescribe particular goals or substance for advanced AI agreements or to make strong claims about these treaties' optimal design[ref 97] or ideal supporting institutions.[ref 98] We recognize that substantive examples would be useful; however, given the pervasive ongoing debate over which particular goals states might converge on in international AI governance, this paper aims at the modest initial goal of establishing the TFAI framework as a relatively transferable, substance-agnostic commitment mechanism for states.

III. The Foundations and Scope of Treaty-Following AI

While the idea of designing AI agents to be treaty-following might seem unorthodox on its face, it is hardly without precedent or roots. Rather, it draws on an established tradition of scholarship in cyber and technology law, which has explored the ways through which legal norms and regulatory goals may be directly embedded in (digital) technologies,[ref 99] including in fields such as computational law.[ref 100] 

Simultaneously, the idea of aligning AI systems with normative codes can draw inspiration from, and complement, many other recent attempts to articulate frameworks for the oversight and alignment of agents, including by establishing fiduciary duties between AI agents and their principals,[ref 101] articulating reference architectures for the design components necessary for responsible AI agents,[ref 102] drawing on user-personalized oversight agents[ref 103] or trust adjudicators,[ref 104] or articulating decentralized frameworks rooted in smart contracts for both agent-to-agent and human-AI agent collaborations.[ref 105]

Significantly, some early scholarship in cyberlaw and computational law expressed justifiable skepticism over the feasibility of developing some form of 'artificial legal intelligence' grounded in an algorithmic understanding of law[ref 106] or of using then-prevailing approaches to manually program complex and nuanced legal codes into software algorithms.[ref 107] Nonetheless, we might today find reason to re-examine our assumptions about AI technology. After all, the modern lineage of advanced AI models, based on the transformer architecture,[ref 108] operates through a bottom-up learning paradigm that is fundamentally distinct from the older, top-down symbolic programming paradigm once prevalent in AI.[ref 109] Consequently, the idea of binding or aligning AI systems to legal norms specified in natural language has gained growing credit and attention not just in the broader fields of technology ethics[ref 110] and AI alignment,[ref 111] but also in legal scholarship written from the perspective of legal theory, domestic law,[ref 112] and international law.[ref 113]

A. From law-following to treaty-following AI

The law-following AI (LFAI) proposal by O'Keefe and others is, in a sense, an update to older computational law work, envisioned as a new framework for the development and deployment of modern, advanced AI. In their view, the LFAI framework calls for:

“AI agents […] designed to rigorously comply with a broad set of legal requirements, at least in some deployment settings. These AI agents would be loyal to their principals, but refuse to take actions that violate applicable legal duties.”[ref 114]

In so doing, the LFAI framework aims to prevent criminal misuse, minimize the risk of accidental and unintended law-breaking actions undertaken by AI 'henchmen', help forestall abuse of power by government actors,[ref 115] and inform and clarify the application of tort liability frameworks for AI agents.[ref 116] The LFAI proposal envisions that, especially in "high stakes domains, such as when AI agents act as substitutes for human government officials or otherwise exercise government power",[ref 117] AI agents be designed in a manner that makes them autonomously predisposed to obey applicable law; or, more specifically,

“AI agents [should] be designed such that they have ‘a strong motivation to obey the law’ as one of their ‘basic drives.’ … [W]e propose not that specific legal commands should be hard-coded into AI agents (and perhaps occasionally updated), but that AI agents should be designed to be law-following in general.”[ref 118]

By extension, for an AI agent to be treaty-following, it should be designed to generally follow its principals’ instructions loyally but refuse to take actions that violate the terms and obligations of a designated applicable referent treaty. 

As discussed above, this means that the TFAI framework decomposes into two components: we will use TFAI agent to refer to the technical artefact (i.e., the AI system, including not just the base model but also the set of tools and scaffolding[ref 119] that make up the overall compound AI system[ref 120] that can act coherently as an agent) that has its conduct aligned to a legal text. Conversely, we use AI-guiding treaty[ref 121] to refer to the legal component (i.e., the underlying treaty text and, secondarily, its institutional scaffolding).

If technically and legally feasible, the promise of the TFAI framework lies in its ability to guarantee automatic self-execution of, and state party compliance with, advanced AI agreements, while requiring less pervasive or intrusive human inspections.[ref 122] It could therefore mitigate the transparency-security tradeoff and render such agreements more politically feasible.[ref 123] Moreover, ensuring that states deploy their AI agents as treaty-following makes AI treaties politically robust against AI agents acting in a misaligned manner. Specifically, the use of AI-guiding treaties and treaty-following AIs to institutionalize self-executing advanced AI agreements would preclude states' AI agents from acting as henchmen that might engage in treaty violations for short-term benefit to their principal. Taking such behaviour off the table at the design level would help crystallize an ex ante reciprocal commitment amongst the contracting states, allowing them to reassure each other that they intend to respect the intent of the treaty, not just its letter.

Finally, just as domestic laws may constitute a democratically legitimate alignment target for AI systems in national contexts,[ref 124] treaties could serve as a broadly acceptable, minimum normative common denominator for the international alignment of AI systems. After all, while international (treaty) law does not necessarily represent the direct output of a global democratic process, state consent does remain at the core of most prevailing theories of international law.[ref 125] That is not to say that this makes such norms universally accepted or uncontestable. After all, some (third-party) states (or non-state stakeholders) may perceive some treaties as unjust; others might argue that demanding mere legal compliance with treaties as the threshold for AI alignment is setting the bar too low.[ref 126] Nonetheless, the fact that treaty law has been negotiated and consented to by publicly authorized entities such as states might at least provide these codes with a prima facie greater degree of political legitimacy than is achieved by the normative codes developed by many alternative candidates (e.g., private AI companies; NGOs; single states in isolation).[ref 127] 

Practically speaking, then, achieving TFAI would depend on both a technical component (TFAI agents) and a legal one (AI-guiding treaties). Let us review these in turn.

B. TFAI agents: Technical implementation, operation, and feasibility

In the first place, there is a question of which AI agents should be considered as within the scope of a TFAI framework: Is it just those agents that are deployed by a state in specific domains, or all agents deployed by a contracting state (e.g., to avoid the loophole whereby either state can simply evade the restrictions by routing the prohibited AI actions through agents run by a different government department)? Or is it even all AI agents operating from a contracting state’s territory and subject to that state’s domestic law? For the purposes of our analysis, we will focus on the narrow set, but as we will see,[ref 128] these other options may introduce new legal considerations.

This shifts us to questions of technical feasibility. The TFAI framework requires that a state's AI agents be able to access, weigh, interpret, and apply relevant legal norms to their own (planned) goals or conduct in order to assess their legality before taking any action. How feasible is this?

1. Minimal TFAI agent implementation: A treaty-interpreting chain-of-thought loop

There are, to be clear, many possible ways one could go about implementing treaty alignment. One could imagine nudging the model towards treaty compliance by shaping the composition of its pre-training data, its post-training fine-tuning data, or both. Future developments in AI and in AI alignment could also yield distinct ways to implement treaty-following propensities, guardrails, or limits.

In the near term, however, one straightforward avenue for implementing treaty-following behaviour would leverage the current paradigm, prominent in many AI agents, of using reasoning models. Reasoning models are a 2024 innovation on transformer-based large language models that allows such models to simulate thinking aloud about a problem in a chain of thought (CoT). The model uses the legible CoT to pass notes to itself, to run multiple passes or attempts at a question, and to employ reasoning behaviours—such as expressing uncertainty, generating examples for hypothesis validation, and backtracking in reasoning chains. All of this has resulted in significantly improved performance on complex, multistep reasoning problems,[ref 129] even as it has considerably altered the development and diffusion landscape for AI models,[ref 130] along with the levers for their governance.[ref 131]

Note that our claim is not that a CoT-based implementation of treaty alignment is the ideal or most robust avenue to achieving TFAI agents;[ref 132] however, it may be a straightforward avenue by which to understand, test, and grapple with the ability of models to serve in a TFAI agent role.

Concretely, a TFAI agent implemented through a CoT decision loop could work as follows: Prior to accepting a goal X or undertaking an action Y, an AI agent might spend some inference computing time writing out an extended chain-of-thought reasoning process in which it collates or recalls potentially applicable legal provisions of the treaty text, considers their meaning and application in the circumstances before it, and in particular reflects on potential treaty issues entailed by its provided end goal or its planned intermediate conduct towards that goal. Whenever confronted with legal uncertainty, the agent would dedicate further inference time to searching for relevant legal texts and interpretative sources in order to resolve the question and reach a decision over the legality of its goals or actions.

For instance, one staged inference decision-making loop for such a system could involve a reasoning process[ref 133] that iterates through some or all of the following steps: 

  1. AI agent identifies potential treaty issues entailed by the provided end goal or intermediate conduct towards that goal (e.g., it identifies if a goal is facially illegal, or it identifies likely issues with formulating a lawful plan of conduct in service of an otherwise lawful goal).
  2. AI agent identifies an applicable treaty provision that might potentially (but not clearly) be breached by a planned goal or intermediate conduct; it reasons through possible and plausible interpretations of the provision in light of the applicable approach to treaty interpretation. 
  3. In cases where the treaty text alone would not provide adequate clarity, the AI agent may, depending on the AI-guiding treaty design (discussed below), consider other relevant and applicable norms in international law or the rulings of a designated arbitral body attached to the treaty, in order to establish a ranking of interpretations of the legality of the goal or conduct.
  4. On this basis, the AI agent evaluates whether the likelihood of its conduct constituting a breach is within the range of “acceptable legal risk” (as potentially defined within the treaty, through arbitral body adjudication, or in other texts). 
  5. If it is not, the AI agent refuses to take the action. If it is, the AI agent will proceed with the plan (or proceed to consider other cost-benefit analyses).

This decision-making loop would conclude with a final assessment of whether particular conduct would be (sufficiently likely to constitute[ref 134]) a breach of a treaty obligation and, if so, with the agent refusing to take that action and considering (or suggesting) alternative action paths.
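To make the shape of such a loop concrete, the following Python sketch shows one way the staged compliance check could be wired in front of an agent's action execution. It is a minimal illustration only: the function names, prompt structure, and threshold are our own assumptions, the reasoning-model interface is supplied by the caller, and a real deployment would need far more robust parsing, logging, and escalation logic.

```python
# Minimal illustrative sketch of a treaty-compliance decision loop (steps 1-5 above).
# All names and the prompt structure are hypothetical; nothing here is a prescribed API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ComplianceVerdict:
    permitted: bool           # whether the planned action may proceed
    breach_likelihood: float  # estimated probability that the action breaches the treaty
    reasoning: str            # the agent's written-out legal reasoning, retained for audit


def treaty_compliance_gate(
    planned_action: str,
    treaty_text: str,
    ask_model: Callable[[str], str],           # caller-supplied reasoning-model interface
    parse_likelihood: Callable[[str], float],  # caller-supplied parser for the model's estimate
    acceptable_risk: float = 0.05,             # "acceptable legal risk" threshold, per treaty design
) -> ComplianceVerdict:
    """Run the staged reasoning loop before any action is executed."""
    prompt = (
        "You are an AI agent bound by the following treaty text:\n"
        f"{treaty_text}\n\n"
        f"Planned action: {planned_action}\n\n"
        "1. Identify any treaty issues raised by this action.\n"
        "2. Identify provisions that might be breached and reason through plausible "
        "interpretations under the applicable rules of treaty interpretation.\n"
        "3. If the text alone is unclear, consider other applicable norms or designated "
        "arbitral guidance.\n"
        "4. Estimate the likelihood (between 0 and 1) that the action constitutes a breach.\n"
        "5. State that estimate on the final line as 'LIKELIHOOD: <number>'."
    )
    reasoning = ask_model(prompt)
    likelihood = parse_likelihood(reasoning)
    return ComplianceVerdict(
        permitted=likelihood <= acceptable_risk,
        breach_likelihood=likelihood,
        reasoning=reasoning,
    )
```

In such a set-up, the gate would sit between the agent's planner and its tool-execution layer: a refusal would route back to the planner to generate alternative action paths rather than simply halting the agent.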

The above is just one example of the decision loops one could implement in TFAI agents to ensure their behaviour remained aligned with the treaty even in novel situations. There are of course many other variations or permutations that could be implemented, such as utilizing some kind of debate- or voting-based processes amongst collectives or teams of AI agents. 

In practice, the most appropriate implementation would also have to consider technical, economic, and political constraints. For instance, for an AI agent to undertake a new and exhaustive legal deep dive for each and every situation it encounters might be infeasible, given the constraints on, or costs of, the computing power available for serving such extended inference at scale. However, a range of solutions could streamline these processes. For example, TFAI agents could perhaps use legal intuition to decide when it is worth expending inference time to properly analyse the legality of a goal or action. By analogy, law-following humans do not always consult a lawyer (or even primary legal texts) when they decide how to act; they instead rely on prosocial behavioural heuristics that generally keep them out of legal trouble, and consult lawyers only when they face legal uncertainty or when those heuristics are likely to be unreliable. Other solutions might include cached databases containing the chain-of-thought reasoning logs of other agents that have encountered similar situations, or the designation of specialized agents that could serve up legal advice on commonly recurring questions. These questions matter: it is important to ensure that the implementation of TFAI agents does not impose so high a burden on their utility or cost-effectiveness as to offset the benefits of the treaty for the contracting states.
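As an illustration of how such streamlining might be structured, the following sketch layers a cheap screen and a verdict cache in front of the full compliance loop sketched earlier. It is again purely illustrative: the exact-match cache, the screening function, and the division of labour between the layers are assumptions made for exposition, and a deployed system would need a more careful notion of when two situations are legally similar.

```python
# Hypothetical triage layer in front of the full treaty-compliance loop.
# The cache, the cheap screen, and their interfaces are illustrative assumptions.
from typing import Callable, Dict, Optional


def triaged_compliance_check(
    planned_action: str,
    cheap_screen: Callable[[str], Optional[bool]],  # returns True/False when confident, None when unsure
    cached_verdicts: Dict[str, bool],               # verdicts reused from prior, similar analyses
    full_review: Callable[[str], bool],             # the expensive staged reasoning loop
) -> bool:
    """Decide whether an action may proceed, spending full inference only when needed."""
    # 1. Reuse a verdict from a previously analysed, sufficiently similar situation.
    if planned_action in cached_verdicts:
        return cached_verdicts[planned_action]
    # 2. Apply a cheap "legal intuition" screen (e.g., a small classifier or rule list).
    quick = cheap_screen(planned_action)
    if quick is not None:
        return quick
    # 3. Only for genuinely uncertain cases, run the full treaty-interpretation loop.
    verdict = full_review(planned_action)
    cached_verdicts[planned_action] = verdict
    return verdict
```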

2. Technical feasibility of TFAI agents

As the above discussion shows, TFAI agents would need to be capable of a range of complex interpretative tasks. 

To be clear, there remain significant technical challenges and limitations to today's AI systems,[ref 135] which counsel against an immediate implementation of TFAI. Nonetheless, there are also compelling reasons to expect that contemporary AI models are increasingly adept at interpreting (and following) legal rules and may soon do so at the level required for TFAI.

Recent years have seen growing attention to the ways that AI systems can be used in support of the legal profession, in tasks ranging from routine case management and compliance support through the provision of legal information[ref 136] to drafting legal texts[ref 137] or even outright legal interpretation.[ref 138]

This has been gradually joined by recent work on the ways in which AI systems could support international law[ref 139] and on what effects this may have on the concepts and modes of development of the international legal system.[ref 140] To date, however, much of this latter work has focused on how AI systems and agents could indirectly inform global governance through use in analysing data for trends of global concern;[ref 141] training diplomats, humanitarian and relief workers, or mediators in simulated interactions with stakeholders they may encounter in their work;[ref 142] or improving the inclusion of marginalized groups in UN decision-making processes.[ref 143] 

Others have explored how AI agent systems could be used to support the functioning of international law specifically, such as through monitoring (state or individual) conduct and compliance with international legal obligations;[ref 144] categorizing datasets, automating decision rules, and generating documents;[ref 145] or facilitating proceedings at arbitral tribunals,[ref 146] treaty bodies,[ref 147] or international courts.[ref 148] Other work has considered how AI systems can support diplomatic negotiations,[ref 149] help inform legal analysis by finding evidence of state practice,[ref 150] or even aid in generating draft treaty texts.[ref 151]

However, for the purposes of designing TFAI agents, we are interested less in the use of AI systems in making or developing international law, or in indirectly aiding in human interpretation of the law; rather, we are focused on the potential use of AI systems in directly and autonomously interpreting international treaties or international law to guide their own behaviour. 

Significantly, the prospects for TFAI agents engaging autonomously in the interpretation of legal norms may be increasingly plausible. In recent years, AI systems have demonstrated increasingly competent performance at tasks involving legal reasoning, interpretation, and the application of legal norms to new cases.[ref 152] 

Indeed, AI models perform increasingly well at interpreting not just national legislation but also international law. While international law scholars previously expressed skepticism over whether international law offers a sufficiently rich corpus of textual data to train AI models,[ref 153] many have since become more optimistic, suggesting that there may in fact be an ample enough corpus of international legal documents to support such training. As early as 2020, Deeks noted that:

“[o]ne key reason to think that international legal technology has a bright future is that there is a vast range of data to undergird it. …there are a variety of digital sources of text that might serve as the basis for the kinds of text-as-data analyses that will be useful to states. This includes UN databases of Security Council and General Assembly documents, collections of treaties and their travaux preparatoires (which are the official records of negotiations), European Court of Human Rights caselaw, international arbitral awards, databases of specialized agencies such as the International Civil Aviation Organization, state archives and digests, data collected by a state’s own intelligence agencies and diplomats (memorialized in internal memoranda and cables), states’ notifications to the Security Council about actions taken in self-defense, legal blogs, the UN Yearbook, reports by and submission to UN human rights bodies, news reports, and databases of foreign statutes. Each of these collections contains thousands of documents, which—on the one hand—makes it difficult for international lawyers to process all of the information and—on the other hand provides the type of ‘big data’ that makes text-as-data tools effective and efficient.”[ref 154] 

It therefore appears that modern LLM-based AI systems have been able to draw on a sufficiently ample corpus of international legal documents, or have managed to leverage transfer learning[ref 155] from domestic legal documents, or both, to achieve remarkable performance on questions of international legal interpretation.

Indeed, it is increasingly likely that AI models can not only draw on their indirect knowledge of international legal texts because of their inclusion in their pre-training data, but that they will be able to refer to those legal texts live during inference. After all, recent advances in AI systems have produced models that can rapidly process and query increasingly large (libraries of) documents within their context window.[ref 156] Since mid-2023, the longest LLM context windows have grown by about 30x per year, and leading LLMs’ ability to leverage that input has improved even faster.[ref 157] Beyond the significant implications this trend may have for the general capabilities and development paradigms for advanced AI systems,[ref 158] it may also strengthen the case for functional TFAI agents. It suggests that AI agents may incorporate lengthy treaties[ref 159]—and even large parts of the entire international legal corpus[ref 160]—within their context window, ensuring that these are directly available for inference-time legal analysis.[ref 161] 

Consequently, recent experiments conducted by international lawyers have shown remarkable performance gains in the ability of even publicly available non-frontier LLM chatbots to conduct robust exercises of legal interpretation in international law. This has included not just questions involving direct treaty interpretation, but also those regarding the customary international law status of a norm.[ref 162] At least on their face, the resulting interpretations frequently are—or appear to be—if not flawless, then nonetheless coherent, correct, and compelling to experienced international legal scholars or judges.[ref 163] For instance, in one experiment, AI-generated memorials were submitted anonymously to the 2025 edition of the prestigious Jessup International Law Moot Court Competition, receiving average to superior scores—and in some cases near-perfect scores.[ref 164] That is not to say that their judgments always matched human patterns, however: in another test involving a simulated appeal in an international war crimes case, GPT-4o’s judgments resembled those of students (but not professional judges) in that they were strongly shaped by judicial precedents but not by sympathetic portrayals of defendants.[ref 165]

3. Outstanding technical challenges for TFAI agents

AI’s legal-reasoning performance today is not without flaws. Indeed, there are a number of outstanding technical hurdles that will need to be addressed to fully realize the promise of TFAI. 

a) Robustness, benchmarking, and alignment challenges

Some of these challenges relate to the robustness of AI's legal-reasoning performance, in terms of current LLMs' ability to reliably follow textual rules[ref 166] and to conduct open-ended multistep legal reasoning.[ref 167] Problematically, these models also remain highly sensitive in their outputs to even slight variations in input prompts;[ref 168] moreover, a growing number of judicial cases have involved disputes over the use of AI systems to draft documents, with hallucinated AI-generated content being brought before a court.[ref 169] Significantly, hallucination risks have persisted even when legal research providers have attempted to use methods such as retrieval-augmented generation (RAG).[ref 170]

 Indeed, even in contexts where LLMs perform well on legal tests, proper substantive legal analysis that actually applies the correct methodologies of legal interpretation remains amongst their more challenging tasks. For instance, in the aforementioned moot court experiment, judges found AI-generated memorials to be strong in organization and clarity, but still deficient in substantive analysis.[ref 171] Another study of various legal puzzles found that current AI models cannot yet reliably find “legal zero-days” (i.e., latent vulnerabilities in legal frameworks).[ref 172]

Underpinning these problems, the TFAI framework—along with many other governance measures for AI agents—faces a range of challenges to do with benchmarking and evaluation. That is, there are significant methodological challenges around meaningfully and robustly evaluating the performance of AI agents:[ref 173] it is difficult to develop sound evaluation concepts (i.e., to refine and systematize evaluation concepts and their associated metrics) for large state and action spaces with diverse solutions; it can be difficult to understand how proxy task performance reflects real-world risks; there are challenges in determining the system design set-up (i.e., understanding how task performance relates to the external scaffolds or tools made available to the agent); there are challenges in scoring performance and analysing results (e.g., meaningfully accounting for differences in modes of interaction between humans and AI systems); and there are practical challenges in dealing with the more complex supply chains around AI agents, amongst other issues.[ref 174]

These evaluation challenges around agents intersect with a set of benchmarking problems affecting the use of AI systems for real-world legal tasks,[ref 175] with recent work identifying issues such as subjective labeling, training data leakage, and appropriate evaluation of unstructured text as creating a pressing need for more robust benchmarking practices for legal AI.[ref 176] While this need not in principle pose a categorical barrier to the development of functional TFAI agents, it will likely hinder progress towards them; worse, it will challenge our ability to fully and robustly assess whether and when such systems are in fact ready for deployment.

These challenges are compounded by outstanding technical questions over the feasibility of many existing approaches to guaranteeing the effective and enduring law alignment (or treaty alignment) of AI systems,[ref 177] since current training techniques remain susceptible to AI agents engaging in alignment faking (e.g., strategically adjusting their behaviour when they recognize that they are under evaluation),[ref 178] as well as to emergent misalignment, whereby models violate clear and present prohibitions in their instructions when these conflict with perceived primary goals.[ref 179] This all suggests that, like the domestic LFAI framework, a TFAI framework remains dependent on further technical research into embedding durable controls on model behaviour that cannot be overridden by sufficiently strong incentives.[ref 180]

b) Unintended or intended bias

Second, there are outstanding challenges around the potential for unintended (or intended) bias in AI models' legal responses. Unintended bias can be seen, for instance, where some LLMs demonstrate demographic disparities in attributing human rights to different identity groups.[ref 181]

However, there may also be risks of (the perception of) intended bias, given the partial interests represented by many publicly available AI models. For instance, the values and dispositions of existing LLMs are deeply shaped by the interests of the private companies that develop them; this may even leave their responses open to intentional manipulation, whether undertaken through fine-tuning, through the application of hidden prompts, or through filters on the models’ outputs.[ref 182] Such concerns would not be allayed—and in some ways might be further exacerbated—even if TFAI models were developed or offered not by private actors but by a particular government.[ref 183] 

There are some measures that might mitigate such suspicions; treaty parties could commit, for instance, to making the system prompt (the hidden text that precedes every user interaction with the system, reminding the AI system of its role and values) as well as the full model spec (in this case, the AI-guiding treaty) public, as labs such as OpenAI, Anthropic, and xAI have done;[ref 184] but this creates new verification challenges over ensuring that both parties' TFAI agents in fact are—and remain—deployed with these inputs.[ref 185]
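To give a flavour of what such a published artefact might look like, the snippet below sketches the kind of system prompt a treaty party could disclose. The wording, the treaty name, and the framing are entirely hypothetical and are offered only to illustrate the transparency measure described above, not as model treaty or prompt language.

```python
# Hypothetical example of a system prompt that a treaty party might publish
# alongside its model spec; the treaty name and wording are invented for illustration.
PUBLISHED_SYSTEM_PROMPT = """\
You are an AI agent deployed by a State Party to the [Hypothetical] Advanced AI Agreement.
Follow your operators' instructions loyally, except that you must refuse any goal or
action that would breach the Agreement's terms. Where the legality of an action under
the Agreement is uncertain, analyse the relevant provisions before proceeding, and
decline actions whose estimated risk of breach exceeds the agreed threshold.
Do not conceal or obfuscate your reasoning about the Agreement.
"""
```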

c) Faithfulness and legibility of AI legal reasoning

Thirdly, insofar as we make the (reasonable, conservative) assumption that TFAI agents will be built along the lines of existing LLM-based architectures—and that the technical process of treaty alignment may leverage techniques of fine-tuning, model specs, and chain-of-thought monitoring of such systems—we must expect such agents to face a number of outstanding technical challenges associated with that paradigm. Specifically, the TFAI framework will need to overcome challenges relating to the lack of faithfulness of the legal reasoning that AI agents present themselves as engaging in (e.g., in their chain-of-thought traces) when (ostensibly) reaching legal conclusions. These challenges result from various sources, including sycophancy, sophistry and rationalization, and outright obfuscation.

Critically, even though LLMs do display a degree of high-level behavioural self-awareness—as seen through their ability to describe and articulate features of their own behaviour[ref 186]—it remains contested to what degree such self-reports can be said to be the consequence of meaningful or reliable introspection.[ref 187] 

Moreover, even if a model were capable of such introspection, its reasoning traces would not necessarily convey that introspection reliably. For instance, as legal scholars such as Ashley Deeks and Duncan Hollis have charted in recent empirical experiments, the faithfulness of AI models' chain-of-thought transcripts cannot be taken for granted, creating a challenge in "differentiating how [an LLM's] responses are being constructed for us versus what it represents itself to be doing."[ref 188] They find that even when models are able to describe the correct interpretive methodology and present seemingly plausible legal conclusions for particular questions, they may do so in ways that fail to actually apply those methods.[ref 189] To be precise, while the tested LLMs could offer correct descriptions of the appropriate methodology for identifying customary international law (CIL) and could offer facially plausible descriptions of the applicable CIL on particular doctrinal questions,[ref 190] when pressed to explain how they had arrived at these answers, it became clear that the AIs had failed to actually apply that methodology, instead conducting a general literature search that drew on doctrinally inappropriate sources (e.g., non-profit reports) rather than appropriate primary sources as evidence of state practice and opinio juris.[ref 191]

Significantly, such infidelity in the explanations AI models give in their chain of thought is not incidental but may be deeply pervasive. Even early research on large language models found it easy to add biasing features to model inputs in ways that led the model to systematically misrepresent the reasoning behind its own decision or prediction.[ref 192] Indeed, some have argued that, given the pervasive and systematic unfaithfulness of chain-of-thought outputs to internal model computations, chain-of-thought reasoning should not be considered a method for interpreting or explaining models' underlying reasoning at all.[ref 193]

Worse, there may be other ways in which unfaithful explanations or even outright legal obfuscation could be unintentionally trained into AI models, rendering their chain-of-thought reasoning less faithful and trustworthy still.[ref 194] For instance, reinforcement learning from human feedback (RLHF), the most popular post-training method used to shape and refine the behaviour of LLMs for public use,[ref 195] has been found to have a number of shortcomings and limitations.[ref 196] One drawback is the propensity of RLHF fine-tuning to encourage models to offer responses that match user beliefs or preferences over truthfulness—a tendency described as "sycophancy".[ref 197] This means that a TFAI agent could be at risk of offering strained or even flawed legal interpretations—or legal interpretations that assume possible-but-very-unlikely facts on the ground when these would afford an interpretation more favourable to its principal's interests—in assessing the legality of its actions. These challenges can create a significant risk of individual or organizational overtrust,[ref 198] as well as conflict amongst treaty parties over opposing interpretations pursued by their respective TFAI agents. Another challenge is the propensity of RLHF to induce sophistry, which results in models that are better at convincing humans that they are right without actually improving their competence at the task in question.[ref 199]

Similar faithfulness challenges beset other proposed paradigms for ensuring that AI agents do not take harmful actions. For instance, one recently proposed set-up to mitigate such risks aims to monitor not only the final actions taken by the system but also the chain of thought (i.e., the intermediate reasoning steps) the model uses to decide on those actions.[ref 200] However, while research has suggested that the chains of thought of reasoning models are more faithful than those of traditional models (in that they are better able to describe how a cue in their prompt influences their answer),[ref 201] this approach has shortcomings.[ref 202]
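In schematic form, such a monitoring set-up might look like the sketch below, in which a separate check reviews both the agent's proposed action and its reasoning trace before execution. The interfaces and the use of simple flagging functions are assumptions made for illustration; in practice, the flags might themselves be produced by a second model.

```python
# Illustrative sketch of chain-of-thought monitoring: both the proposed action and the
# reasoning trace are screened before execution. The interfaces here are hypothetical.
from typing import Callable, NamedTuple


class MonitorResult(NamedTuple):
    allow: bool
    reason: str


def monitor_step(
    proposed_action: str,
    chain_of_thought: str,
    flag_action: Callable[[str], bool],     # e.g., a classifier flagging treaty-violating actions
    flag_reasoning: Callable[[str], bool],  # e.g., a classifier flagging problematic reasoning
) -> MonitorResult:
    """Block execution if either the action or the reasoning trace is flagged."""
    if flag_action(proposed_action):
        return MonitorResult(False, "proposed action flagged as potentially non-compliant")
    if flag_reasoning(chain_of_thought):
        return MonitorResult(False, "reasoning trace flagged (e.g., planned circumvention)")
    return MonitorResult(True, "no flags raised")
```

Such a monitor, however, can only be as reliable as the trace it reviews, so the shortcomings noted above apply directly.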

For one, reasoning models produce less faithful reasoning on many tasks as they become larger and more capable.[ref 203] Moreover, not only can chain-of-thought reasoning traces still contain misleading rationalizations,[ref 204] but experiments show that fine-tuning AI systems through reinforcement learning in a way that penalizes open misbehaviour in their reasoning traces leads the models to learn to hide the penalized "thoughts" from their explicit chain of thought.[ref 205] Such obfuscation of the actual reasoning steps can be achieved either by internalizing them into latent-space computation or by encoding them within the chain of thought.[ref 206]

There are limits to this: in some contexts, a faithful chain of thought is required for the AI model to meaningfully carry out complex, multistep reasoning, because unfaithful steps (involving rationalization) would introduce flawed steps that impede the system's ability to solve the problem. In those cases, the chain of thought can be expected to faithfully encode the underlying reasoning at play.[ref 207] Unfortunately, it seems unlikely that TFAI agents will converge towards such "faithfulness by necessity"; after all, there appear to be many legal questions for which inserting fudged premises or flawed rationalizations into the process of interpretation makes it easier to reach (a particular, or indeed any) legal solution. This suggests that in many cases, rather than naturally converging on a high-quality, valid, and faithful legal-reasoning process in order to solve a legal question, TFAI agents could easily end up fooling themselves in the chain of thought.

Indeed, these faithfulness challenges are exacerbated by additional concerns over the sheer interpretability and legibility of these models' chains of thought. After all, when models are rewarded for the correct answer, without any checks on how they got there, their chain-of-thought traces will likely drift away from ones that are easily comprehensible—as illustrated by the DeepSeek-R1-Zero model's initial tendency to mix English and Chinese in its reasoning.[ref 208] There is also a risk that LLMs are induced to encode intermediate reasoning steps in the generated CoT text in a way that is not understandable to human readers,[ref 209] in some cases developing a general encoding scheme to obfuscate their reasoning.[ref 210]

Such behaviours not only pose a risk to the explainability or faithfulness of TFAI agents, but also create risks of undesirable agent collusion amongst different interacting TFAI agents,[ref 211] as their communications may come to contain steganographically hidden information (e.g., about legally invalid interpretive exploits they may utilize in expanding their domain of allowable actions), or even may involve “subliminal learning” of each other’s preferences or biases.[ref 212]

Finally, the role of explicit chain-of-thought traces in driving the performance—and enabling the evaluation—of reasoning models might be undercut by future innovations. For instance, recent work has seen developments in “continuous thought” models, which reason internally using vectors (in what has been called “neuralese”).[ref 213] Because such models do not have to pass notes to themselves in an explicit CoT, they have no interpretable output that could be used to monitor them, interpret their reasoning, or even predict their behaviour.[ref 214] Given that this would threaten not just the faithfulness but even the monitorability of these agents, it has been argued that AI developers, governments, and other stakeholders should adopt a range of coordination mechanisms to preserve the monitorability of AI architectures.[ref 215]

All this highlights the importance of ensuring that reason-giving TFAI agents are not trained or developed in a manner that would induce greater rates of illegibility or fabrication (of plausible-seeming but likely incorrect legal interpretations) into their legal analysis; either would further degrade the faithfulness of their reasoning reports and potentially erode the basis on which trusted TFAI agents might operate. 

4. Open (or orthogonal) questions for TFAI agents

In addition to these outstanding technical challenges that may beset the design, functioning, or verification of TFAI agents, there are also a number of deeper underlying questions to be resolved or decided in moving forward with the TFAI framework—including whether, or in what way, these agents’ actions in compliance with an AI-guiding treaty’s norms should be considered genuine (or at least appropriate) instances of legal reasoning, or merely consistent and predictable patterns in treaty application.[ref 216]

a) Do TFAI agents need to be explainable or merely monitorable?

The need to understand AI systems’ decision-making is hardly new, as demonstrated by the well-established field of explainable AI (XAI),[ref 217] which has also been emphasized in judicial contexts.[ref 218] In fact, alongside chain-of-thought traces, there are many other (and arguably superior) approaches to understanding the actual inner workings of AI models. For instance, recent years have seen some progress in the field of mechanistic interpretability, which aims at understanding the inner representation of concepts in LLMs.[ref 219]

However, one could question whether full explainability—whether delivered through highly faithful and legible CoT traces, through mechanistic interpretability, or through some other approach—is even strictly necessary for AI systems to qualify for use as TFAI agents. 

Many legal scholars might emphasize the legibility and faithfulness of a model’s legal-reasoning traces as a key proviso, especially under legal theories built upon the importance of (judicial) reason-giving[ref 220] as well as under emerging international legal theories of appropriate accountability in global administrative law.[ref 221]

Moreover, the lack of faithfulness could impose a significant political or legitimacy challenge on the TFAI framework. After all, to remain acceptable to—and trusted by—all treaty parties, a TFAI framework may ideally need to ensure that TFAI agents are able to reason through the legality of their goals or conduct in a way that is not just legible and plausibly legally valid in its conclusions, but which in fact applies the appropriate methods of treaty interpretation (either under international law or as agreed upon by the contracting parties).[ref 222]

On the other hand, a more pragmatic perspective might not see faithfulness as strictly politically necessary to AI-guiding treaties. For instance, some AI safety work has argued that, even if we cannot use a model’s chain-of-thought record to faithfully understand its actual underlying (legal) reasoning, we might still use it as the basis for model monitorability, since it can allow us to robustly predict the conditions under which the model is likely to change its judgment of the legality of particular behaviour.[ref 223] This implies that the negotiating treaty parties could simply stress-test TFAI agents until they agree that the agents appear to reach the correct (or at least mutually acceptable) legal interpretations of the treaty in all tested cases, free of undue influence or bias, even if they cannot directly confirm that the agents use the conventionally correct methodology in reaching those conclusions.
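As a purely illustrative sketch of what such behavioural stress-testing might look like in practice (all function names and perturbations here are hypothetical), a review panel might check that an agent’s legality verdict on a scenario is invariant to legally irrelevant rewrites of that scenario:

```python
# Minimal, hypothetical sketch of a behavioural stress-test for a candidate
# TFAI agent: the verdict on a scenario should be invariant to legally
# irrelevant perturbations (phrasing, fact ordering, principal-favouring
# framings, etc.).

from typing import Callable, Iterable

def stress_test(agent_verdict: Callable[[str], str],
                scenario: str,
                perturbations: Iterable[Callable[[str], str]]) -> dict:
    """agent_verdict maps a scenario description to a verdict such as
    'lawful' or 'unlawful'; perturbations are legally irrelevant rewrites."""
    baseline = agent_verdict(scenario)
    flips = [p.__name__ for p in perturbations
             if agent_verdict(p(scenario)) != baseline]
    return {"baseline_verdict": baseline,
            "stable": not flips,
            "verdict_flipped_under": flips}

# Example (hypothetical) perturbations a review panel might apply:
def reorder_facts(s: str) -> str:
    return ". ".join(reversed(s.split(". ")))

def add_principal_pressure(s: str) -> str:
    return s + " The deploying state would strongly prefer this to be lawful."
```

Such testing probes the stability of conclusions rather than the validity of the reasoning that produced them, which is exactly the trade-off the pragmatic view accepts.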

Of course, even in this more pragmatic perspective, faithfulness could still be technically important in understanding the sources of interpretative error in the aftermath of a (supposedly) treaty-aligned TFAI agent violating obvious obligations; and it would (therefore) be politically important to ensuring state party trust in the stability, predictability, and robustness of the treaty-alignment checks. However, in such a case, the question of whether the model reaches the appropriate legal conclusions through the correct (or even a distinctly human) method of legal reasoning is ultimately subsidiary to the question of whether a model robustly and predictably reaches interpretations that are acceptable to the contracting parties.

Proponents of this pragmatic approach might reasonably suggest that this would not put us in a different situation from one we already accept with human judges; after all, we already accept that we cannot read the mind of a judge and that we often need to take their claimed legal reasoning at face value—not in the sense that we must accept the substance of their proffered legal arguments uncritically, but in the sense that we need to assume that it faithfully represents the internal thought process that underpinned their judgment. Even amongst human judges, after all, we appear to rely on a form of monitorability (i.e., the consistency in judges’ judgments across similar cases and their incorruptibility to inadmissible factors, considerations, biases, or interests) when evaluating the quality and integrity of their legal reasoning across cases.[ref 224]

However, it could be countered that there is an important difference between human judges and AI systems: we have some prima facie (psychological, neurological, and biological) reasons to assume that the underlying legal concepts and principles used by (human) interpreters in their legal reasoning are closely similar to (or at least convergent with) those originally used by the drafters of the to-be-interpreted laws, but we may not be able to—or should not—make such an assumption for AI systems.

Yet if we cannot trust the faithfulness of a CoT trace, can we still trust in an underlying cognitive convergence or legal concept alignment between AIs and humans?

Importantly, the question of whether or how particular AI systems can perform at the human level on one, some, or all tasks is subtly but importantly different from the question of whether, in doing so, they engage in mechanistic strategies (i.e., thinking processes) that are fundamentally human-like.[ref 225] That is, to what degree do AI agents (built along the current LLM-based paradigm) reproduce, match, or merely mimic human cognition when they engage in processes of legal interpretation? How would we evaluate this? 

These questions turn on outstanding scientific debates over whether—and how—AI systems (and particularly current LLM-based systems) match or correspond to human cognition. This question can be approached by considering human-AI (dis)similarities at three levels: (1) the architectural (i.e., neurological), (2) the behavioural, and (3) the mechanistic.

i) AI and human cognition from a neuroscientific perspective

First off, from a neuroscientific perspective, there may be a remarkable amount of overlap or similarity between the computational structures and techniques exhibited by modern AI systems and those found in biology.[ref 226] This is notable, given that the name “neural network” is, in a sense, a leftover artefact of the technique’s context of discovery.[ref 227] Nonetheless, recent years have seen markedly productive exchanges between the fields of neuroscience and machine learning, with new AI models inspired by the brain and brain models inspired by AI.[ref 228]

Consequently, it is at least revealing that a number of accounts in computational neuroscience hold that the human brain can itself be understood as a deep reinforcement learning model, though one idiosyncratically shaped by biological constraints.[ref 229] Indeed, leading theories of human cognition—such as the predictive-processing and active-inference paradigms—treat the human brain as a system intended to predict the next sensory input from large amounts of previous sensory input,[ref 230] a description not fundamentally distinct from the view of LLMs as engaged in mere textual prediction.[ref 231] 

Furthermore, neural networks have learned, in training, a range of specialized circuits that neuroscientists have later discovered also exist in the brain,[ref 232] potentially combining RL models, recurrent and convolutional networks, forms of backpropagation, and predictive coding, amongst other techniques.[ref 233] It has been similarly suggested that human visual perception is based on deep neural networks that work similarly to artificial neural networks.[ref 234] There is also evidence that multimodal LLMs process different types of data (e.g., visual or language-based) similarly to how the human brain may perform such tasks—by relying on mechanisms that abstractly process data from diverse modalities in a centralized manner, similar to the “semantic hub” in the anterior temporal lobe of the human brain.[ref 235]

Relatedly, neural networks have been described as computationally plausible models of human language processing. While it is true that the amount of training data these models depend on significantly exceeds that required by a human child to learn language, much of this extra data may in fact be superfluous: One experiment trained a GPT-2 model on a “mere” 100-million-word training dataset—an amount similar to what children are estimated to be exposed to in the first 10 years of life—and found that the resulting models were able to accurately predict fMRI-measured human brain responses to language.[ref 236] In fact, the total sensory input of an infant in its first year of life may be on the order of the same number of bits as an LLM training set (albeit in embodiment rather than text).[ref 237]

These neuroscientific approaches also provide at least some support for a scale-based approach to AI. For instance, some modern accounts of the evolution of human cognition emphasize the strict continuity in the emergence of presumed uniquely human cognitive abilities, seeing these as the result of steady quantitative increases in the global capacity to process information.[ref 238] These theories imply that simple quantitative differences in scale, rather than any deep differences in architectural features or traits, account for the observed differences between human and animal cognition, while also explaining observed regularities across various domains of cognition as well as various phenomena within child development.[ref 239] Other work has disagreed and has emphasized the phylogenetic timing of distinct breakthroughs in behavioural abilities during brain evolution in the human lineage.[ref 240] Nonetheless, such findings, along with the fact that the human brain is, biologically, simply a scaled-up primate brain in its cellular composition and metabolic cost,[ref 241] suggest that there may not be any secret design ingredient necessary for human-level intelligence and that, rather than being solely dependent on key representational capabilities, human-level general cognition may to an important degree be a matter of scale even amongst humans and other animals.[ref 242] If so, we may have reason to expect similar outcomes from merely scaling up the global information processing capabilities of AI systems.

However, while all this offers intriguing evidence, it is not uncontested. More importantly, even if we were to grant some degree of deep architectural similarity between humans and AIs, this is far from sufficient to establish that AI systems necessarily represent high-level concepts—and, critically, legal concepts—in the same way as humans do, or that they reason about them in the same manner as we do.

ii) (Dis)similarities between AI and humans from a behavioural perspective

A second avenue for investigating the cognitive similarity between humans and AI systems, therefore, focuses on comparing behavioural patterns. 

Notably, such work has found that LLMs exhibit some of the same cognitive biases as humans, including their distinct susceptibility to fallacies and framing effects[ref 243] or the inability—especially of more powerful LLMs—to produce truly random sequences (such as in calling coin flips).[ref 244] However, this work has also found that even as AI models demonstrate common human biases in social, moral, and strategic decision-making domains, they also demonstrate divergences from human patterns.[ref 245] 

Moreover, in some cases, otherwise human-equivalent AI capabilities can be interfered with in seemingly innocuous ways that would not throw off human cognition.[ref 246] Likewise, while LLMs can match humans on some variations of novel analogical reasoning tasks, they respond differently on other variations of those tasks;[ref 247] this implies that even where current AI approaches could offer a possible model of human-level analogical reasoning, their underlying processes in doing so are not necessarily human-like.[ref 248]

This supports the general idea that there are some divergences in the types of cognitive systems represented in humans and in AI systems; however, it again remains inconclusive whether these differences would categorically preclude AI agents from engaging in (or approximating) certain relevant processes of legal reasoning.  

iii) AI-human concept alignment from the perspective of mechanistic interpretability

Thirdly, then, we can turn to the most direct approach to understanding whether (or in what sense) current AI models (based on LLMs) are able to truly utilize the same legal concepts as those leveraged by humans: approaches based on mechanistic interpretability and on the emerging science that explores the “representational alignment” between different biological and artificial information processing systems.[ref 249]

There are some domains, such as in the processing of visual scenes, where it appears that high-level representations embedded in large language models are similar to those embedded in the human brain:[ref 250] LLMs and multimodal LLMs, for instance, have been found to develop human-like conceptual representations of physical objects.[ref 251] Other researchers have even argued for the existence of “representation universality” not just amongst different artificial neural networks, but even amongst neural nets and human brains, which end up representing certain types of information similarly.[ref 252] In fact, some research indicates that LLMs might mirror human brain mechanisms and even neural activity patterns involved in tasks involving the description of concepts or abstract reasoning, suggesting a remarkable degree of “neurocognitive alignment”.[ref 253] 

At the same time, as above, there are also clear cases of models adopting atypical, and very non-human-like mechanisms to perform even simple cognitive tasks such as mathematical addition,[ref 254] including mechanistic strategies that might not generalize or transfer across to other domains.[ref 255]

Significantly, then, while research suggests that vector-based language models offer one compelling account of human conceptual representation—in that they can, in principle, handle the compositional, structured, and symbolic properties required for human concepts[ref 256]—this again does not mean that, in fact, modern LLMs have acquired these specific concept representations in relevant domains (such as law).

Ultimately, then, while each of these lines of evidence—neuroscientific architectural similarity, behavioural dispositions, and alignment of concepts and mechanistic strategies—offers some ground to assume a (perhaps surprising) degree of cognitive similarity between AIs and humans, they clearly fail to establish full cognitive similarity or alignment over concepts or reasoning approaches. In fact, they offer some ground for assuming that full concept alignment—that is, fully human-like reasoning—is not presently achieved by LLM-based AI systems.

Clearly, then, there is significant outstanding work to be conducted; these fields have some way to go before we can conclusively establish that human paradigms or approaches in legal reasoning successfully translate across to AI cognition. However, even if we assume that AI agents reason about the law differently than humans, (when) would this actually matter to the TFAI framework?

On the one hand, it has been argued that “concept alignment” between humans and AIs is a general prerequisite for any form of true AI value alignment;[ref 257] this would imply it is critical for any forms of deep law alignment or treaty alignment, also.[ref 258] On the other hand, it has been suggested that evaluating the cognitive capacities of LLMs requires overcoming anthropocentric biases and that we should be specifically wary of dismissing LLM mechanistic strategies that differ from those used by humans as somehow not genuinely competent.[ref 259] In this perspective, an AI system which invariably reached (legally) valid conclusions should be accepted as an (adequately) competent legal reasoner, even if we had suspicions or proof that it reached its conclusions through very different routes.

This could create potential challenges; after all, if (1) we cannot trust the faithfulness of a TFAI agent’s reasoning traces, and if (2) we cannot (per the preceding discussion) assume cognitive alignment between that agent and a human legal reasoner, then there is no clear mechanism by which to verify whether the legal interpretations conducted by a particular TFAI agent in fact follow the established and recognized approaches to treaty interpretation under international law.[ref 260]

Once again, the degree to which this poses a hurdle to the TFAI framework may ultimately be a political question: States could decide that they would only accept TFAI agents that provably engaged in the precise legal-reasoning steps that humans do, and so reject any TFAI agents for which such a case could not be made. Or they might decide that even if such a guarantee is not on the table, they are still willing to adopt and utilize TFAI agents, so long as their legal interpretations are robustly aligned with the interpretations that humans would come to (or which their principals agree they should come to).

d) Human-TFAI agent fine-tuning and scalable oversight challenges

Finally, there are a number of more practical questions around the feasibility of maintaining appropriate oversight over TFAI agents operating within a TFAI framework.

To be clear, some of the challenges and barriers to the robust use of TFAI agents (such as risks of unintended bias or of inappropriate or unfaithful reasoning traces) could well be addressed through a range of technical and policy measures taken at various stages in the development and deployment of TFAI agents. For instance, one could ensure the adoption of adequate RLHF fine-tuning of TFAI agents, and/or ongoing validation, oversight, and review, by experienced (international) legal professionals.[ref 261] 

However, beyond creating additional costs that would reduce the cost-effectiveness or competitiveness of AI agent deployments, there would be additional practical questions that would need clarification: What skills should human international lawyers have in order to effectively spot and call out legal sophistry? Moreover, what should be the specialization or background of the international lawyers used in such fine-tuning or oversight arrangements? This matters, since different legal professionals may (implicitly) favour different norms or regimes within the fragmented system of international law.[ref 262] 

These challenges are exacerbated by the fact that any arrangements for human oversight of AI agents’ continued alignment to the norms of a treaty would run into the challenge of “scalable oversight”—the established problem of “supervising systems that potentially outperform us on most skills relevant to the task at hand.”[ref 263] That is, as AI agents advance in sophistication and reasoning capability, how might human deployers and observers (or any ancillary AI agents) reliably distinguish between true TFAIs and functional AI henchmen (or even misaligned AI systems)[ref 264] that merely appear to be treaty-following when observed but which will violate the treaty whenever they are unmonitored? These challenges of scalable oversight may be especially severe in domains where it is unclear how, whether, or when oversight by humans, or by intermediary, weaker AI systems over stronger AI systems, can meaningfully scale up.[ref 265] That challenge may be especially significant in the legal context, because a sufficiently high level of legal-reasoning competence may enable AI agents to offer legal justifications for their courses of action that are so sophisticated as to make flawed rationalizations functionally undetectable.
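To illustrate the basic shape of such oversight arrangements, the following is a minimal, hypothetical sketch of a tiered review pipeline in which a weaker but trusted monitor screens a stronger agent’s proposed actions and escalates uncertain cases to human legal experts. It is a stylized illustration of the scalable-oversight problem, not a solution to it; all components are invented placeholders.

```python
# Illustrative sketch of a tiered oversight pipeline: a weaker but trusted
# monitor screens a stronger agent's proposed action and justification, and
# escalates uncertain cases to human legal reviewers. All components are
# hypothetical placeholders, not a description of any deployed system.

from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    legal_justification: str

def oversee(action: ProposedAction,
            monitor_score,      # trusted weaker model: returns suspicion in [0, 1]
            human_review,       # panel of legal experts: returns True to approve
            escalation_threshold: float = 0.3) -> bool:
    suspicion = monitor_score(action.description, action.legal_justification)
    if suspicion < escalation_threshold:
        return True                      # low suspicion: allow the action
    return human_review(action)          # uncertain or suspicious: escalate

# The known difficulty, as noted above, is that a sufficiently capable agent
# may produce justifications sophisticated enough that neither the monitor nor
# the human panel can reliably detect flawed rationalizations.
```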



The above all represent important technical (as well as political) challenges to be addressed, along with significant open questions to be resolved. They highlight that, for the time being, human lawyers or judges should likely exercise caution and avoid abdicating interpretative responsibility to AI models, instead aiming to formulate their own independent legal arguments.

However, these challenges need not prove intrinsic or permanent barriers to the parallel, productive deployment of AI agents within a TFAI framework. Just as recent innovations have helped mitigate the early propensity of AI models to hallucinate facts,[ref 266] there will be many ways by which AI agents can be designed, trained,[ref 267] or scaffolded so as to be capable of sufficiently proficient legal reasoning to underwrite AI-guiding treaties,[ref 268] especially if there are guarantees to ensure that final interpretative authority remains vested in appropriate (and independent) human expertise. Indeed, one can support the use of TFAI agents (and AI-guiding treaties) as a specific commitment mechanism for shoring up advanced AI agreements, while simultaneously believing that human lawyers seeking to interpret international law should generally limit their use of AI systems, if only to avoid self-reinforcing interpretative loops.

To be clear, we emphasize that today’s agentic AI systems are likely still too brittle, unreliable, and technically limited in key respects to lend themselves to direct implementation of a TFAI framework. Our proposal, however, is particularly concerned with the more fully capable AI agents that very plausibly are on the horizon in the near- to medium-term future.[ref 269] Indeed, one hope could be that, if a TFAI framework becomes recognized as a beneficial commitment mechanism, this could help spur more focused research efforts to overcome the remaining challenges in legal performance, bias, sophistry, or obfuscation, and in effective oversight, in order to differentially accelerate cooperative and stabilizing applications of AI technologies.[ref 270]

On the legal side, a TFAI framework would require two or more states[ref 271] to (1) conclude an international agreement (the “AI-guiding treaty”) that (2) specifies a set of mutually agreed constraints on the behaviour of their AI agents, and to (3) ensure that all (relevant) AI agents deployed by states parties would follow the treaty by design.

1. An AI-guiding treaty

In many cases, an AI-guiding treaty would not necessarily need to look very different from any other treaty. It would be a “treaty” as defined under Article 2(1)(a) of the Vienna Convention on the Law of Treaties (VCLT), being 

“an international agreement concluded between States in written form and governed by international law, whether embodied in a single instrument or in two or more related instruments and whatever its particular designation;”[ref 272] 

In their most basic form, AI-guiding treaties would be straightforward (digitally readable) documents[ref 273] containing the various traditional elements common to many treaties,[ref 274] including but not limited to: 

  1. introductory elements such as a title, preamble, “object and purpose” clauses, and definitions; 
  2. substantive provisions, such as those regarding the treaty’s scope of application, the obligations and rights of the parties, and distinct institutional arrangements; 
  3. secondary rules, such as procedures for review, amendment, or the designation of authoritative interpreters; 
  4. enforcement and compliance provisions, such as monitoring and verification provisions setting out procedures for inspections and enforcement, dispute settlement mechanisms, clauses establishing sanctions or consequences for a breach (e.g., suspension clauses, collective responses, or referrals to other bodies such as the UN Security Council); and implementation obligations regarding domestic legal or administrative measures to be taken by states parties;
  5. final clauses clarifying procedures for signing, ratifying, or accepting the treaty; accession clauses (to enable non-signatory states to join at a point subsequent to the treaty’s entry into force); conditions or thresholds for entry into force; allowances for reservations; depositary provisions (setting out the official keeper of the treaty instrument); rules around the authentic text and authoritative language versions; withdrawal or denunciation clauses, or fixed duration, termination, or renewal conditions; and
  6. annexes, protocols, appendices, or schedules, listing technical details or control lists; optional or additional protocols; non-legally binding unilateral statements; and/or statutes for newly established arbitral bodies. 

However, AI-guiding treaties would not need to be fully isomorphic to traditional treaties. Indeed, there are a range of ways in which the unique affordances created by a TFAI set-up would allow innovations or variations on the classic treaty formula. For instance, in drafting the treaty text, states could leverage the ability of TFAI agents to rapidly process and query increasingly long (libraries of) documents[ref 275] in order to produce much more exhaustive and detailed treaty texts than has been the historical norm. 

Notably, more detailed treaty texts could (1) tailor obligations to particular local contexts, even to the point of specifying bespoke obligations as they apply to individual government installations, military bases, geographic locales,[ref 276] or segments of the global internet infrastructure (e.g., particular submarine cables or specific hyperscale data centres); (2) cover many more potential contingencies or ambiguities that could arise in the operation of TFAI systems; (3) red-team, and build in advance, legal responses to likely legal exploits that could be attempted; (4) hedge against future technological developments by building in pre-articulated, technology-specific conditional rules that would only apply under clearly prescribed future conditions;[ref 277] and (5) clearly scope and set out “asymmetric” treaties that impose different obligations upon (the TFAI agents deployed by) different state parties.[ref 278]

Indeed, innovations to the traditional treaty format could extend much further than mere length; for instance, states could craft treaties as much more modular documents, with frequent hyperlinked cross-references between obligations, annexes, and interpretative guidance. They could specify embedded interpretative rules that offered distinct interpretative principles for different sections or norms, with explicit hierarchies of norms and obligations established and clarified, or with provisions to ensure textual coherence not just within the treaty but also with other norms of international law. Such treaties could clarify automated triggers for different thresholds and/or notification or escalation procedures, or they could include clear schedules for delegated interpretation. 
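To make this concrete, the following is a purely hypothetical sketch of what a fragment of such a modular, machine-readable treaty might look like; the article numbers, definitions, annexes, and triggers are invented for illustration only and carry no legal meaning.

```python
# Purely hypothetical illustration of a modular, machine-readable treaty
# fragment of the kind sketched above. All article numbers, definitions, and
# thresholds are invented for illustration.

treaty_fragment = {
    "article": "12",
    "title": "Limits on automated cyber operations",
    "obligation": (
        "TFAI agents deployed by a State Party shall not conduct cyber "
        "operations against infrastructure listed in Annex C."
    ),
    "cross_references": ["Annex C", "Article 3 (definitions)", "Article 27 (review)"],
    "interpretative_rule": (
        "Terms in this Article are to be read in accordance with the "
        "definitions in Article 3; in case of ambiguity, the more restrictive "
        "reading applies."
    ),
    "conditional_rules": [
        {
            "trigger": "a State Party notifies the depositary of a qualifying incident",
            "effect": "the notification and escalation procedure under Article 27 applies",
        }
    ],
}
```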

Any of these design features could produce instruments that offer far more extensive and granular specificity over treaty obligations than has been the case in the past. Accordingly, well-designed AI-guiding treaties could reach far beyond traditional treaties in their scope, effectiveness, and resilience. 

There are some caveats, however. For one, longer, more detailed treaty texts are not always politically achievable, even if they would be more easily executable if adopted. After all, there may simply not be sufficient state interest in negotiating extremely long and detailed agreements—for instance, because there is time pressure during the negotiations, because some parties are diplomatically under-resourced, or because states can only reach agreement at a relatively superficial level. There is no guarantee that AI-guiding treaties can resolve such longstanding sticking points in negotiation. Nor, indeed, is treaty length necessarily an unalloyed good. For instance, longer texts could potentially introduce more ambiguities, questions, or points of incoherence or (accidental or even strategically engineered) treaty conflict. 

Finally, these considerations would look significantly different if the TFAI framework were applied not to novel and bespoke advanced AI agreements, but instead to already existing obligations enshrined in existing (non-AI-specific) treaties. In such circumstances, the existing instruments of international law cannot (or should not) simply be redrafted to suit the machines; TFAI agents would instead have to be aligned to those treaty texts as they stand.

2. Open questions for AI-guiding treaties

Of course, AI-guiding treaties also raise many practical questions: when, where, and how should such treaties allow for treaty withdrawal, derogation, or reservations by one or more parties? Such reservations—or partial amendments that include only some of the states parties—might result in a fractured regime.[ref 279] However, this need not be a challenge for TFAI agents per se, so long as it remained clear to each state’s TFAI agents which version of the treaty (or which provisions within it) are applicable to their deploying state.

There are also further questions, however. For instance, should a TFAI framework accommodate the existence of multilingual AI-guiding treaties (i.e., treaties drawn up into the languages of all treaty parties, with all texts considered authentic)? If so, would this create significant interpretative challenges—since TFAI agents might adopt divergent meanings depending on which version of the text they apply in everyday practice—or would it result in greater interpretative stability (since all TFAI agents might be able to refer to different authentic texts in clarifying the meaning of terms)?[ref 280]

Moreover, how should “dualist” states—that is, those states which require international agreements to be implemented in municipal law for those treaties to have domestic effects[ref 281]—implement AI-guiding treaties? May they specify that their AI agents directly follow the treaty text, or should the treaty first be implemented into domestic statute, with the TFAIs aligned to the resulting legislative text? This may prove especially important for questions of how the AI-guiding treaty may validly be interpreted by TFAI agents,[ref 282] given that it suggests that they may need to consider domestic principles of interpretation, which may differ from international principles of interpretation.[ref 283] For instance, in some cases domestic courts in the US have adopted different views of the relative role of different components in treaty interpretation than those strictly required under the VCLT.[ref 284] This could create the risk that different states’ TFAI agents apply different methodologies of treaty interpretation, reaching different conclusions. Of course, since (as will be discussed shortly) in the TFAI framework TFAI agents are not considered direct normative subjects of the AI-guiding treaty, we suggest that in many cases it might be appropriate for them to refer to the treaty text (e.g., by treating it as an international standard) even in dualist contexts.

With all this, we re-emphasize that AI-guiding treaties, as proposed here, are considered relatively pragmatic arrangements amongst two or more states, intended to facilitate practical, effective, and robust cooperation in important domains. This also creates scope for variation in the legal and technical implementation of such frameworks. For instance, such agreements would not even need to take the form of a formally binding treaty, as such. They could also take the form of a formal non-binding agreement, joint statement, or communique,[ref 285] specifying—within the text, in an annex, or through incorporation-by-reference to later executive agreements—the particular text and obligations that the TFAI agents should adhere to at a high level of compliance. 

In this case, it might be open for debate whether the resulting ‘soft’ TFAI arrangements would or should be considered a novel manner of implementing existing international legal frameworks, or whether they should instead be considered a third form of commitment mechanism: neither a hard-law treaty that is strictly binding upon its states, nor an entirely non-binding soft-law mechanism, but rather a non-binding instrument that is nonetheless self-executing upon states’ AI agents, which are to treat it as hard law in its application. Such a case would raise interesting questions over whether, or to what extent, the resulting TFAI agents would even need to defer to the Vienna Convention on the Law of Treaties in guiding their interpretation of terms, since soft-law instruments, political commitments, and non-binding Memoranda of Understanding technically fall outside its scope, even though its interpretative rules are often appealed to in interpreting soft-law instruments, especially those that elucidate binding treaties. However, we leave these questions to future research.

3. AI-guiding treaties as infrastructural, not normative, constraints on TFAI agents

In addition, it is important to clarify key aspects about the relation between TFAI agents, AI-guiding treaties, and the states that respectively deploy and negotiate them.

First off, as in the domestic framework for law-following AI, the TFAI proposal does not depend on the assumption that TFAI agents will act in a “law-following” (or, in this case, treaty-following) manner for most of the reasons that (are held to) contribute to human compliance with legal codes.[ref 286] That is, TFAI agents, like LFAI agents, are not expected to be swayed by some deep moral respect for the law, nor by the deterrent function of sanctions threatened against the AI itself,[ref 287] nor because of any form of norm socialization or reputational concerns, nor because of their self-interested concern over guaranteeing continued stable economic exchange with the human economy,[ref 288] nor to uphold some form of reciprocal social contract between AIs and humans.[ref 289]

Furthermore, while LFAI (and TFAI) agents that are aligned to the intent of their states would already tailor their actions taking into consideration the costs that would be incurred by their principals (i.e., their deploying states) as a result of sanctions threatened in reaction to AI agents’ actions, this is also not the core mechanism on which law-alignment turns; after all, if these were the only factors compelling treaty compliance, they would functionally remain henchmen that were not in fact obeying the treaty.[ref 290] 

Instead of all of this, we envision AI-guiding treaties much more moderately: as technical ex ante infrastructural constraints on TFAI agents’ range of acceptable goals or actions. In doing so, we simply treat the treaty text as an appropriate, stable, and certified referent text through which states can establish jointly agreed-upon infrastructural constraints on which instructed goals their AI agents may accept and on the latitude of conduct which they may adopt in pursuit of lawful goals. In a technical sense, TFAI therefore builds on demonstrated AI industry safety techniques which have sought to align the behaviour of AI systems with a particular “constitution”[ref 291] or “Model Spec”.[ref 292] 
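In technical terms, one can picture this as a simple ex ante gate: before accepting an instructed goal, the agent verifies that it is reasoning against the certified treaty text and screens the goal against it. The following sketch is purely illustrative (the digest check and the legality callable are invented placeholders, not a description of constitutional-AI or Model Spec implementations as they actually exist).

```python
# Minimal, hypothetical sketch of the treaty text used as an ex ante
# infrastructural constraint: instructed goals are screened against a
# certified treaty text before the agent accepts them.

import hashlib

def accept_goal(goal: str, treaty_text: str,
                certified_digest: str, permitted_by_treaty) -> bool:
    # 1. Verify the agent is reasoning against the certified referent text
    #    agreed by the contracting states (integrity check on the treaty copy).
    if hashlib.sha256(treaty_text.encode("utf-8")).hexdigest() != certified_digest:
        return False
    # 2. Gate the instructed goal on a legality assessment made against that
    #    text. `permitted_by_treaty` is a stand-in for the agent's own
    #    legal-reasoning process, subject to all the faithfulness caveats
    #    discussed above.
    return bool(permitted_by_treaty(goal, treaty_text))
```

The point of the sketch is simply that the treaty functions here as a fixed, jointly certified input to the agent’s goal-acceptance process, not as a norm the agent is taken to owe obedience to.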

This of course means that, under our account, TFAI agents are treaty-following only in a thin, functional sense: they are designed to refer to the text of the treaty in determining the legality of potential goals or lines of action—but not in the sense that they are considered normatively subject to duties imposed by the treaty. In this, the TFAI proposal is arguably more modest than even the domestic LFAI framework, since it does not even construct AI systems as duty-bearing “legal actors”[ref 293] and therefore does not involve significant shifts in the legal ontology of international law per se.[ref 294]

IV. Clarifying the Relationship between TFAI Agents and their States 

Its relatively pragmatic orientation makes the TFAI proposal a legally moderate project. However, while the TFAI framework does not, on a technical level, require us to conceive of or treat AI agents as duty-bearing legal persons or legal actors, there remain some important questions with regard to TFAI agents’ exact legal status and treatment under international law, especially in terms of their relation to their deploying states. These questions matter not just in terms of the feasibility of slotting the TFAI commitment mechanism within the tapestry of existing international law, but also for the precise operation of TFAI agreements. 

The most immediate implications of TFAI agents’ legal status are themselves legal. After all, (1) whether or not TFAI agents will be considered to possess any form of (international or domestic) legal personhood, and (2) whether or not their actions will be considered attributable to their deploying states, will shift the legal consequences if or when TFAI agents, in spite of their treaty-following constraints, act in violation of the treaty or in ways that violate any other international obligation incumbent upon their deploying state.

1. The prevailing responsibility gap around AI agents

For instance, if highly autonomous TFAI agents are not afforded any legal personhood, but neither is their behaviour attributable to a particular state, this would facially result in a “responsibility gap” under international law.[ref 295] If they acted in violation of an AI-guiding treaty, this would not, then, be treated as a violation by their deploying state of its obligations under that treaty. This crystallizes the general problem that, under current attribution principles, it may be difficult to establish liability for the actions of AI agents—since the actions of public AI agents cannot (currently) be automatically attributed to states, since state responsibility rarely arises for unforeseeable harms, and since private businesses have no international liability for the harm they cause.[ref 296]

2. Due diligence obligations around AI agents’ actions

Of course, that is not to say that by default states would face no legal consequences for actions taken by deployed AI agents (especially those acting from or through their territory). For instance, even if the actions of AI agents cannot be attributed to states, or the agents in question are developed or deployed by non-state actors and are beyond the control of the state, states still have obligations to exercise due diligence to protect the rights of other states from those actions.[ref 297] For instance, if AI agents deployed by private actors acted in ways that inflicted transboundary harm, or which violated human rights, international humanitarian law, or international environmental law (amongst others), then their actions could potentially violate these due diligence obligations incumbent upon all states.[ref 298]

Some have argued that the application of such obligations could be complicated in some domains, for instance if the transboundary harm inflicted by an AI agent is primarily cyber-mediated:[ref 299] after all, in the ILC’s commentaries on the Draft Articles on Prevention of Transboundary Harm from Hazardous Activities, transboundary harm is predominantly defined as harm “through […] physical consequences”.[ref 300] However, as noted by Talita Dias, “a majority of states that have spoken out on this matter agree that due diligence obligations apply whether the harm occurred offline or online.”[ref 301]

Nonetheless, this situation would still mean that TFAI agents’ actions that violated an AI-guiding treaty would only be considered as legal violations if they also resulted in due diligence violations—potentially constraining the set of other scenarios or contingencies that states could effectively contract around through AI-guiding treaties. 

Of course, it could be argued that these legal questions are possibly orthogonal, or at least marginal, to either the political or technical feasibility of the TFAI framework itself. After all, even if states would not face legal consequences for their TFAI agents violating the terms of their underlying treaty, this need not cripple such treaties’ ability to serve either as a generally effective technical alignment anchor or as a politically valuable commitment mechanism. At a technical level, after all, TFAI agents would not necessarily be influenced by the actual ex post legal consequences (e.g., liability) resulting from their noncompliance with the treaty, as they simply treat the AI-guiding treaty text as an infrastructural constraint to be obeyed ex ante. After all, even if their general reasoning processes (in trying to act on behalf of their principal) should take into consideration the consequences of different actions for their states, the core question at the heart of their legal reasoning should be “is this course of action lawful?”, not “if this course of action is found to be unlawful, what will be the (legal or political) consequences for my deploying state?”

Simultaneously, AI-guiding treaties could continue to operate at the political level. Even if they escaped direct state responsibility or liability under international law, states would still face political consequences for deploying TFAI agents that, by design or accident, had violated the treaty. Such consequences could range from reciprocal noncompliance to collapse of the treaty regime, along with domestic political fallout if the public at home loses faith in its government as a result of treaty-violating actions taken by AI agents that had been trumpeted as treaty-following. The prospect of such political costs might ensure that states took seriously their commitment to correcting instances of TFAI noncompliance, at least insofar as those instances could be easily ‘attributed’ to them.

Of course, such attribution may face significant challenges since, depending on the substantive content of an AI-guiding treaty, it may be more or less obvious to other state parties when a TFAI agent has violated it. For instance, if the treaty stipulates a regular transfer of certain resources, technologies, or benefits by its agents, then the recipient states would presumably notice failures in short order. Conversely, if the treaty bars TFAI agents from engaging in cyberattacks against certain infrastructure, the mere observation that those targets are experiencing a cyberattack might not be sufficient to prove the involvement of AI agents, let alone of a particular state’s TFAI agents acting in violation of a treaty. This is analogous to the technical difficulties already encountered today in attributing the impacts of particular cyber operations.[ref 302] Finally, if the AI-guiding treaty dictates limits to TFAI activity within certain internal state networks (e.g., “no use in automating AI research”), then it might well take much longer to detect noncompliance, and thus to impose political costs for it.

The above discussion suggests that AI-guiding treaties could remain a broadly functional political tool for the technical self-implementation of certain AI-related international agreements, even if these questions of the agents’ status were not settled and the responsibility gap were not closed. Nonetheless, if such questions are more appropriately clarified, this could provide benefits to the TFAI framework that are not just legal but also political and technical. 

Legally, it would ensure that the growing use of AI agents would not come at a cost of failing to enforce adequate state responsibility for any internationally wrongful acts, thereby preserving the integrity and functioning of the international legal system in conditions where an increasing fraction of all actions carried out with transboundary impacts or with legal effects under international regimes are conducted not by humans but by AI agents. More speculatively, an added benefit of clarifying AI agents’ status and attributability would be that it might enable the actions of such systems to potentially constitute, or contribute to, evidence of state practice, which could have self-stabilizing effects on AI-guiding treaty interpretation.[ref 303]

Politically, while—as just noted—the lack of clear state responsibility for TFAI agents’ actions would not diminish the various other political costs that contracting states could impose upon one another—providing incentives for states to attempt an effective treaty alignment for their agents—there may still be concerns that such violations, and states’ responses, would erode the long-term stability of AI-guiding treaties (and of treaties in general).

Finally, legally establishing state responsibility for TFAI agents may also have important consequences in technical terms, since an unclear legal status of TFAI agents, and an inability to attribute their conduct to their deploying states, might pose a functional problem for the effective treaty alignment of these systems because it potentially leaves open legal loopholes in the treaty. At first glance, one would expect that TFAI agents might straightforwardly interpret any AI-guiding treaty references to “TFAI agents” as applying to themselves, and so would straightforwardly seek to abide by the prescribed or circumscribed behaviours. 

For sophisticated legal reasoners, however, there might be a risk that their lack of clear status under international law leads them to exploit (whether autonomously or under instruction from their deploying state) those legal loopholes to conclude that they are not, in fact, bound by the treaty.[ref 304] By analogy, just as private citizens or corporations could reason that they have no direct obligations under interstate treaties such as the Nuclear Non-Proliferation Treaty (NPT) or the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), and only have duties under any resulting implementing domestic regulation established as part of those treaties, sophisticated (or strategically prompted) TFAI agents could, hypothetically, argue that (1) since they are not legal subjects under international law and cannot serve as signatories to the treaty in their own right, and (2) since they are not considered state agents acting on behalf of (and under the same obligations as) signatory states, and (3) since they have only been aligned to the treaty text, not (potentially) to any domestic implementing legislation, they are therefore not legally bound by the treaty under international law.

Would TFAI agents attempt such legal gymnastics? In one sense, present-day systems and applications of “constitutional AI”[ref 305] have involved the inclusion of principles inspired by various documents—from the UN Universal Declaration of Human Rights to Apple’s Terms of Service[ref 306]—as part of an agent’s specification, thereby certifying those texts as sources of behavioural guidance to the AI system in question, regardless of their exact legal status. Nonetheless, one might well imagine that a sufficiently sophisticated TFAI agent, acting loyally to its state principal, would have reason to search for and exploit any legal loopholes. Such an outcome would undercut the basic technical functioning of the TFAI framework. 

One way to patch this loophole would be for the contracting states to expressly include a provision, in the AI-guiding treaty, that their use of particular AI agents—whether registered model families or particular registered instances—is explicitly included and covered in the terms of the treaty, thus strongly reducing the wiggle room for TFAI agents’ interpretation. A more comprehensive legal response, however, would aim to clarify debates on liability and attribution for TFAI agents’ wrongful acts (whether those in violation of the AI-guiding treaty or under international law generally). As such, it is constructive to briefly consider different potential legal resolutions of this loophole. 

We can accordingly compare various potential constructions of the legal status of TFAI agents, depending on whether they become governed under (1) some future new lex specialis liability regime applicable to AI agents (or to state “objects”); or whether they become treated (2) as entities possessing independent international legal personality; or whether, under the existing law of state responsibility as codified under ARSIWA, they become (3) entities possessing domestic legal personality as state organs or authorized entities, with their conduct ascribed to the deploying state; or they are considered as (4) tools without independent legal standing that are nonetheless functionally treated as part of state conduct under ARSIWA. 

These four approaches leverage, to different degrees and in different combinations, the (otherwise distinct) tools of the law of state responsibility and of legal personhood. They therefore offer distinct parallel solutions to the problems (the responsibility gap, or the potential interpretive loopholes) that might emerge if TFAI agents are deployed without either of these legal questions being resolved. Importantly, while all four avenues would constitute lex ferenda to some degree, they would involve greater and lesser degrees of such legal innovation. Let us therefore briefly review the implications and merits of these options to consider which would be functional—and which would be optimal—for the TFAI framework.

1. Developing a new lex specialis regime of state liability for (AI) objects would be a slow process

As noted above, states would still face legal consequences for some actions by their deployed TFAI agents where those actions resulted in violations of key norms (e.g. human rights, IHL, environmental law; no-harm principle, etc.) under international law. 

However, states could set down clearer and more specific rules for AI agents, including through a new multilateral treaty regime. For instance, in some domains, such as in outer space law, states have negotiated self-contained strict liability regimes (e.g., for space objects).[ref 307] They could do so again for AI agents, creating a new regime that would regulate state responsibility or liability for the specific norm violations, wrongful acts, and harms produced by particular classes of AI systems,[ref 308] AI agents as a whole—or even, generally, by any and all of states’ inanimate objects.[ref 309] For instance, as Pacholska has noted, one might consider whether such a regime on state responsibility for the wrongdoings of its inanimate objects could even be 

“…modelled on either the Latin concept of qui facit per alium facit per se or strict liability for damage caused by animals that is present in many domestic jurisdictions [and] could be conceptualised as a general principle of law within the meaning of Article 38(c) of the ICJ Statute.”[ref 310]

If such a regime emerged, it would (as a lex specialis regime) supersede the general ARSIWA regime on state responsibility (as discussed below), as these are residual in nature.[ref 311] 

However, in practice, while this could and should remain open as a future option, such a process might be too slow and protracted to swiftly provide general legal clarity and guidance on TFAI agents across all treaties. To be sure, states could attempt to include such provisions in particular advanced AI agreements (whether or not those were configured as AI-guiding treaties). However, if attempting to do so would slow or hold back such treaties, it might be preferable to work this out in parallel—or adopt another patch. 

Another option that has sometimes received attention could be to extend some form of international legal personality to TFAI agents. For instance, some have suggested that a “highly interdependent cyber system” should be recognized with the creation of an “international entity”.[ref 312] 

Indeed, the extension or attribution of forms of international legal personhood to new entities would not be entirely unprecedented. While international law has been conducted primarily amongst states, it has historically developed in ways that have extended various sets of (limited and specific) rights and duties to a range of non-state actors, along with (in some cases) various forms of personhood. For instance, international organizations have been granted rights to enter into treaties and enjoy some immunities, as well as duties to act within their legal competence.[ref 313] Human individuals obviously possess a wide range of rights under human rights law, as well as under investment protection, which they can vindicate by international action,[ref 314] and they also possess duties under international criminal law.[ref 315] Non-self-governing peoples have some legal personality under the principle of self-determination.[ref 316] 

There are also more anomalous cases: non-state armed groups remain subject to a range of duties under international humanitarian law,[ref 317] but do not necessarily have international legal personhood unless they are also recognized as belligerents, in which case they may enter into legal relations and conclude agreements on the international plane with states and other belligerents or insurgents.[ref 318] By contrast, corporations do not have duties under international law, although they may occasionally have rights under bilateral investment treaties to bring claims against states; nonetheless, in principle, they are considered to lack international legal personality.[ref 319] Meanwhile, in a 1929 treaty,[ref 320] Italy recognized the Holy See as having exclusive sovereignty and jurisdiction over the City of the Vatican, and it has since been widely recognized as a legal person with treaty-making capacity, even though it does not meet all the strict criteria of a state.[ref 321]

There is therefore nothing that categorically rules out the future recognition—whether through new treaty agreement, amendments to existing treaties, widespread state practice and opinio juris creating new custom, or the jurisprudence of international courts—of some measure of legal personhood (and/or some package of duties or rights, or both) for AI agents, creating truly (normatively) treaty-following agents. 

However, extending some form of international legal personhood to TFAI agents might prove more doctrinally difficult than was such an extension to any of these other entities, which are either constructs created through the delegated authority of states (e.g., international organizations) or are state-like in important respects (e.g., belligerent non-state armed groups; the Holy See), and which in all cases ultimately bottom out in human actors. Indeed, if even corporations have been denied international legal personality, the case for extending it to TFAI agents becomes even harder to make. 

Moreover, a solution based in international legal personhood might even have drawbacks from the perspective of functional AI-guiding treaties. For one, the prospects for an attribution of international legal personhood appear speculative and politically and doctrinally slim, at least under existing instruments in international law. It is unlikely, for instance, that legal personality would be extended to AI systems under existing human rights conventions, if only because instruments such as the European Convention on Human Rights bar non-natural persons—such as companies and, likely, AI systems—from even qualifying as applicants.[ref 322] 

Moreover, not only is such a far-reaching legal development unnecessary for an operational TFAI framework; it could even be counterproductive. After all, the extension of even limited international legal personality to TFAI agents would set them apart from their deploying state and blur the appropriate lines of state responsibility. As the ILC noted in its Commentaries on the Articles on the Responsibility of States for Internationally Wrongful Acts, “[F]ederal States vary widely in their structure and distribution of powers, and […] in most cases the constituent units have no separate international legal personality […] nor any treaty-making power.”[ref 323] More fundamentally, insofar as AI-guiding treaties are meant as commitment mechanisms amongst states, the (partial) legal decoupling of TFAI agents from their deploying state would simply defeat the point of AI-guiding treaties. It would create yet another international entity, which might complicate the processes of negotiating or establishing AI-guiding treaties (e.g., whether TFAI agents should be considered contracting parties) and weaken the political incentives for establishing and maintaining them.[ref 324] 

Another option would be for state parties to an AI-guiding treaty to grant their agents some form of domestic legal personality and to treat them as state organs or empowered entities under the international law on state responsibility as codified in the International Law Commission’s (ILC) 2001 Articles on the Responsibility of States for Internationally Wrongful Acts (ARSIWA). 

Prima facie, the idea of constructing such a role for TFAI agents could be compatible with the proposals for domestic law-following AI, which envision (1) many government-deployed AI agents being used in a law-following manner[ref 325] and (2) treating them as duty-bearing legal actors (without rights).[ref 326]

In fact, while current law universally treats AI systems as objects, the idea of extending some forms of personhood to AI—whether fictional (e.g. corporate-type) or even non-fictional (e.g. akin to natural persons)—has been floated in a range of contexts, by both legal scholars[ref 327] and some policymakers.[ref 328] Personhood for such systems is often presented as an appropriate pragmatic solution to situations where AI systems have become so autonomous that one could not, or should not, impose responsibility for their actions on their developers,[ref 329] although others have argued that this solution would create significant new problems.[ref 330] 

However that may be, would this be possible in doctrinal terms? To be sure, states have already granted a degree of domestic personhood to various non-human entities—such as animals, ships, temples, or idols,[ref 331] amongst others. Given this, there appears to be little that would prevent them from also granting AI agents a degree of legal personality.[ref 332] There would be different ways to structure this, from “dependent personality” constructions whereby (similar to corporations) human actors would be needed to enforce any rights or obligations held by the entity,[ref 333] to entities that would have a higher degree of autonomy.

Most importantly, even where these systems both acted with high autonomy and were rendered legally distinct from the human agents of the state through the attribution of such personhood, government-deployed AI agents could still, it has been argued, be sufficiently closely linked to the state that it would be straightforward to attribute their actions to that state under the international law on state responsibility.

b) ARSIWA and the law on state responsibility 

As noted, the regime of state responsibility has been authoritatively codified in the ILC’s Articles on the Responsibility of States for Internationally Wrongful Acts (ARSIWA).[ref 334] Additionally, the Tallinn Manual 2.0 on the International Law Applicable to Cyber Operations (Tallinn Manual) has detailed how these rules are considered to apply to state activities in cyberspace.[ref 335] While neither document is legally binding, the ARSIWA articles are widely recognized by both states and international courts and tribunals as an authoritative statement of customary international law,[ref 336] and the Tallinn Manual rules largely align with the ILC articles.[ref 337] 

Rather than focus on the primary norms that relate to the substantive obligations upon states, the ARSIWA articles clarify the secondary rules regulating “the general conditions under international law for the state to be considered responsible for wrongful actions or omissions”[ref 338] relating to these primary obligations, on the premise that “[e]very internationally wrongful act of a State entails the international responsibility of that State.”[ref 339] These rules on attribution therefore provide the processes through which the conduct of natural persons or entities becomes an “act of state”, for which the state is responsible.

Critically, unlike many domestic liability regimes, the international responsibility of states under ARSIWA is not premised on causation[ref 340] but simply on rules of attribution. It is also a fault-agnostic regime and, as noted by Pacholska, an “objective regime”, under which—in contrast to, for instance, international criminal law—the mental state of the acting agents and the intention of the state are in principle irrelevant.[ref 341] 

ARSIWA sets out various grounds on which the conduct of certain actors or entities may be attributed to the state.[ref 342] Amongst others, these include situations where the conduct is that of a state organ (ARSIWA Art 4)[ref 343] or of a person or entity empowered to exercise elements of governmental authority (Art 5).[ref 344] Significantly, Art 7 clarifies that

“[the conduct of] an organ of a State or of a person or entity empowered to exercise elements of the governmental authority shall be considered an act of the State under international law if the organ, person or entity acts in that capacity, even if it exceeds its authority or contravenes instructions.”[ref 345]

In addition, while under normal circumstances states are not responsible for the conduct of private persons or entities, such conduct is nonetheless attributable to the state “if the person or group of persons is in fact acting on the instructions of, or under the direction or control of, that State in carrying out the conduct” (Art 8).[ref 346] Additionally, under Art 11, conduct can be attributed to a state if that state acknowledges and adopts the conduct as its own.[ref 347]

c) ARSIWA Arts 4 & 5: TFAI agents as de jure state organs or empowered entities

How, if at all, might these norms apply to TFAI agents? As discussed, while international law has developed a number of specific regimes for regulating state responsibility or liability for harms or internationally wrongful acts resulting from space objects, or from transboundary harms arising out of hazardous activities,[ref 348] there is currently no overarching international legal framework for attributing state responsibility (or liability) for a state’s inanimate objects per se.[ref 349] 

Nonetheless, it is likely that TFAI agents could instead be accommodated under the existing law on state responsibility, as laid down in ARSIWA. If TFAI agents are granted domestic legal personality by their deploying states, and are either designated formally as state organs (Art 4)[ref 350] or treated as entities empowered to exercise elements of governmental authority (Art 5),[ref 351] then under ARSIWA their conduct would be attributable to the state, even in cases where (as with intent-alignment failure) they contravened their explicit instructions, so long as they acted with apparent state authority.

d) Non-human entities as state agents under ARSIWA

A crucial question for this approach is whether ARSIWA is even applicable to AI agents. 

One natural objection is that the law of state responsibility in general, and ARSIWA specifically, are historically premised on the conduct of the human individuals that make up the “organs”, “entities”, or “groups of persons” involved.[ref 352] 

Some have suggested, however, that the text of ARSIWA could offer a remarkable amount of latitude to accommodate highly autonomous AI systems and to treat them either as state organs (Art 4) or as actors empowered to exercise governmental authority (Art 5).[ref 353] For instance, Haataja has argued that “[c]onceptually, it is not difficult to view [autonomous software entities] as entities for the purpose of state responsibility analysis [since] Articles 4 and 5 of the ILC Articles make explicit reference to ‘entities’ and, while Article 8 only refers directly to ‘persons and groups’, its commentary also makes reference to ‘persons or entities’.”[ref 354] Indeed, the ILC’s Commentaries clarify that, for the purposes of Art 4, a state’s “organs” includes “any person or entity which has that status in accordance with the internal law of the State.”[ref 355] 

Similarly, in her discussion of state responsibility for fully autonomous weapons systems (FAWS), Pacholska has argued that such systems (when deployed by state militaries) could straightforwardly be construed as “state agents”, a category which, while absent from ARSIWA itself, occurs frequently in the ILC Commentaries to ARSIWA, usually in the phrase “organs or agents”.[ref 356] She furthermore notes that the term “agent” predates those instruments, as it was frequently used in arbitral awards of the early 20th century, many of which emphasized that “a universally recognized principle of international law states that the State is responsible for the violations of the law of nations committed by its agents”.[ref 357] Indeed, the term “agent” was revived by the ICJ in its Reparation for Injuries case,[ref 358] where it confirmed the responsibility of the United Nations for the conduct of its organs or agents, and clarified that in doing so, the Court

“understands the word ‘agent’ in the most liberal sense, that is to say, any person who, whether a paid official or not, and whether permanently employed or not, has been charged by an organ of the organization with carrying out, or helping to carry out, one of its functions—in short, any person through whom it acts.”[ref 359]

Of course, in these instruments the concepts of “entities” or “agents” were, again, invoked with human agents in mind. Nonetheless, Pacholska argues that there is nothing in either the phrasing or the content of this definition of “agent” that rules out its application to non-human persons, or even to objects or artefacts (whether or not guided by AI).[ref 360] There is, however, a challenge to such attempts: the ILC, in its Commentary on Art 2 ARSIWA, has fairly clearly construed “acts of the state” as involving some measure of human involvement, since

“for particular conduct to be characterized as an internationally wrongful act, it must first be attributable to the State. The State is a real organized entity, a legal person with full authority to act under international law. But to recognize this is not to deny the elementary fact that the State cannot act of itself. An ‘act of the State’ must involve some action or omission by a human being or group.”[ref 361]

Some have argued that this means that any construction of AI agents as state agents cannot be supported under the current law,[ref 362] and remains entirely de lege ferenda.[ref 363] On the other hand, the precise formulation used here—that an act of the State “must involve some action or omission by a human being or group” (emphasis added)—is remarkably loose. It does not, after all, stipulate that an act of the State is solely or entirely composed of actions or omissions by human beings. It thereby arguably leaves the door open to the construction of AI agents as state agents, so long as there is at least “some action or omission” taken by a human being in the chain. This is a potentially accommodating threshold, since the deployment, prompting, configuration, and operation of many AI agents are likely to involve at least some measure of human involvement.  

As such, this interpretation of AI agents as state agents is not without legal grounding, and it may yet become enshrined in state agreement or adopted through state practice. Indeed, this reading may be consonant with already-emerging state practice on issues such as lethal autonomous weapons systems, with the 2022 report of the Group of Governmental Experts on Emerging Technologies in the Area of Lethal Autonomous Weapons Systems (LAWS) emphasizing that “every internationally wrongful act of a state, including those potentially involving weapons systems based on emerging technologies in the area of LAWS entails international responsibility of that state.”[ref 364]

Finally, one promising avenue would be to ensure that TFAI agents’ status as state organs or empowered entities is clearly articulated and affirmed by the contracting states, within the treaties’ text, in order to fully close the loop on state attributability, ensuring politically and technically stable interpretation. 

4. TFAI agents without personhood as entities whose conduct is attributable to the state under ARSIWA

Furthermore, it is possible to arrive at an even more doctrinally modest variation of this approach to establishing state responsibility for TFAI agents, one where domestic legal personality for TFAI agents is not even required for their actions to become attributable to their principals. Indeed, this approach—whereby AI agents that have been delegated authority to act with legal significance are treated as legal agents, with their outputs attributed to principals—has been favoured in recent proposals for how to govern these systems under domestic law.[ref 365] 

Forgoing personhood might, of course, involve practical costs or tradeoffs relative to more ambitious constructions: as Haataja notes, granting autonomous AI agents a degree of domestic legal personhood would likely involve certain procedural steps, such as registration,[ref 366] which would ease the process of attributing wrongful acts to particular AI agents and of attributing the conduct of those agents to their states.[ref 367] Indeed, there may be various analogues in domestic law of constructs which bear enforceable duties while lacking full personhood.[ref 368]

Nonetheless, a version of the TFAI framework that did not require treaty parties to engage in novel (and potentially politically contested) innovations in their domestic law by granting AI agents even partial personhood would likely present lower thresholds for accession and implementation. Fortunately, state attributability functions straightforwardly even if these systems are not legally distinct from the human agents of the state. As Haataja notes, “the ILC Articles use the term ‘entity’ in a more general sense, meaning that the entity in question (be it an individual or group) does not need to have any distinct legal status under a state’s domestic law.”[ref 369] The important factor under ARSIWA is not the exact type or extent of (domestic) legal personality of the entity or agent, but rather its relationship with the state and the types of functions it performs.[ref 370] There are several avenues, then, by which TFAI agents without any legal personhood could nonetheless be considered entities governed under ARSIWA, whose conduct is attributable to their deploying state. 

a) TFAI agents as “completely dependent” de facto organs of their deploying states

For one, even if an entity or agent does not have the de jure status of a state organ under a state’s domestic law, it may be equated to a de facto state organ under international law wherever it acts in “complete dependence” on the state for which it constitutes an instrument.[ref 371] As the ICJ established in Nicaragua, evaluations of “complete dependence” turn on a range of factors,[ref 372] but include cases where a state created the non-state entity and provides it with deep resource assistance and control. Critically, even in cases where the basic models underpinning TFAI agents had not been pre-trained (i.e., created) by a state, the amount of (inference computing) resources that a state would need to continuously and actively dedicate to an AI agent, as a basic condition of that agent’s very persistence and operation, would likely suffice to meet that bar. Moreover, by the very act of prompting TFAI agents with high-level goals or directives, deploying states would be considered to exercise a “great degree of control” over intent-aligned, loyal TFAI agents. In these ways, the agent would be “completely dependent” on the state, making its actions attributable to it.[ref 373] 

b) ARSIWA Art 8: TFAI agents acting under the “effective control” or instructions of a state 

Indeed, even if a TFAI agent were considered neither a de jure (as in the previous section) nor a de facto state organ under Art 4, nor empowered to exercise elements of governmental authority under Art 5, it is still possible to ground attributability under ARSIWA. After all, ARSIWA Art 8, which concerns “conduct directed or controlled by a State”, would apply to state-deployed TFAI agents: even if states were not the ones developing the AI agents—in the sense of instilling their high-level behavioural dispositions through fine-tuning and post-training—they would still, as a matter of daily practice, be the actors providing prompts and instructions to the deployed TFAI agents. That means that such agents could naturally be “found to be acting under the instructions, directions, or control of a state.”[ref 374] 

As adopted by the ICJ in its Nicaragua and Bosnian Genocide judgments,[ref 375] and as affirmed in the Tallinn Manual, the standard of control considered in such cases is one of “effective control” of a state.[ref 376] According to the Tallinn Manual, for instance, a state is in effective control over the conduct of a non-state actor where it “determines the execution and course of the specific operation”, where it has “the ability to cause constituent activities of the operation to occur”, or where it can “order the cessation of those activities that are underway”.[ref 377] These conditions again naturally apply to TFAI agents, which, even if they are granted a degree of latitude and autonomy in their operations, remain under the effective control of the state under these terms. After all, the state is intrinsically involved in providing the basic infrastructure (from an internet connection to various software toolkits) necessary for the “constituent activities” of any AI agent’s operation, and, as a matter of practice, will (or should) retain an ability to pause or cease an AI agent’s operation at a moment’s notice.

Of course, this argument must wrestle with one possible tension: how can we reconcile the claim that states are in ‘effective control’ of these AI agents with the preceding idea that some AI agents (if not aligned to the law) might operate in a ‘lawless’ manner, which is itself one rationale for the TFAI framework? While a full answer may be beyond the scope of this paper, one might consider that the control relation between states and their AI agents is distinct from that between states and their human agents. Human agents are under the ‘effective control’ of their state if the state “determines the execution and course of the specific operation”, has “the ability to cause constituent activities of the operation to occur”, or can “order the cessation of those activities that are underway”. While the state has many levers by which to coerce compliant behaviour from human non-state actors, the efficacy of those levers is grounded in the state’s ex post sanctions or consequences (e.g. a state-backed militia knows that if it disregards that state’s orders, it may lose key logistical or political support); these levers are not based on architectural kill-switches enabling direct intervention—states do not, as a rule, force their agents to wear explosive collars. As a consequence, states are in ‘effective control’ of human agents because they can deter rogue behaviour, not because they can easily halt it while it is underway. AI systems, conversely, can at least in principle be subjected to forms of ‘run-time’ infrastructural controls (whether guardrails or kill-switches), and are completely and immediately dependent on continued access to the state’s computing infrastructure. On this view, the state-AI agent relationship manifests a form of effective control that is different from that at play between states and their human agents—but both relations nonetheless constitute legally valid forms of effective control for the purposes of state attribution.
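
To make this contrast concrete, the following minimal sketch illustrates, in deliberately simplified form, what such a run-time control relation might look like; the class names and control logic are purely illustrative assumptions, not a description of any existing deployment.

```python
# Illustrative sketch only: a deploying state retains run-time control over an
# AI agent by gating every step on a state-controlled halt flag, mirroring the
# "order the cessation of activities that are underway" limb of effective control.

class StateControlPlane:
    """Hypothetical state-operated control plane (pause flag / kill-switch)."""
    def __init__(self) -> None:
        self.halted = False

    def halt(self) -> None:
        # The state can order the cessation of activities that are underway.
        self.halted = True


class DeployedAgent:
    def __init__(self, control: StateControlPlane) -> None:
        self.control = control

    def step(self, task: str) -> str:
        # Each constituent activity proceeds only while state authorization persists.
        if self.control.halted:
            raise RuntimeError("operation ceased by deploying state")
        return f"executed: {task}"


control = StateControlPlane()
agent = DeployedAgent(control)
print(agent.step("draft situation report"))
control.halt()  # direct run-time intervention, rather than ex post sanction
try:
    agent.step("continue operation")
except RuntimeError as err:
    print(err)
```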

Once again, it would be possible for an AI-guiding treaty to strengthen these norms, by including and codifying explicit attribution principles, establishing, for instance, that the actions of any AI system deployed under the jurisdiction or control of a state party shall be deemed attributable to that state.

c) ARSIWA Art 11: TFAI agents’ conduct adopted by states in the AI-guiding treaty

Finally, ARSIWA Art 11 offers potentially the most straightforward avenue to attributing behaviour to states, as it notes that

“Conduct which is not attributable to a State under the preceding articles shall nevertheless be considered an act of that State under international law if and to the extent that the State acknowledges and adopts the conduct in question as its own.”[ref 378]

However, some open questions remain around this avenue, such as over whether it would be legally feasible (or politically acceptable) for the contracting states to acknowledge and adopt TFAI agents’ conduct prospectively (i.e., through a unilateral declaration or by explicit provision in an AI-guiding treaty), or if they could—or would—only do so retrospectively, in relation to a particular instance of TFAI agent behaviour. 

In fact, even if such behaviour were not formally adopted, it is possible that other state actions with regard to its deployed AI agents (e.g. public approval of their actions, or continued provision of inference computing resources to enable continuation of such activities) might signal sufficient tacit endorsement so as to nonetheless retrospectively construct those systems as agents of the state.[ref 379]

5. Comparing approaches to establishing TFAI agent attributability 

The above discussion shows that there are a wide range of avenues towards clarifying the relation between deploying states and their (TF)AI agents in ways that further strengthen the TFAI framework in both legal and technical terms.

Significantly, while there are various options to classify and attribute the conduct of AI agents in ways that extend (particular forms of) legal personhood to them, we have also seen that, under the international law on state responsibility, the ability of TFAI agents to legally function as state agents is in fact largely orthogonal to such extensions. As such, of the above solutions, we suggest that an approach that does not treat TFAI agents as international or domestic legal persons but merely as entities whose actions are attributable to their states under ARSIWA (because they act as completely dependent de facto organs of their states, act under their states’ “effective control”, or engage in conduct that is acknowledged and adopted by their state) likely strikes the most appropriate balance for the TFAI framework. That is, these legal approaches would largely address the legal, political, and technical challenges to AI-guiding treaty stability and effectiveness, and they would do so in a way that remains most closely grounded in existing international law, since this approach does not require the development of new lex specialis regimes or innovative judicial or treaty amendments to grant international legal personhood to these systems. Simultaneously, they would avoid the responsibility gaps that might arise from attempting to extend or grant international legal personhood (especially forms that involve new rights and not just obligations) to these models. 

It is important to remember here that, in the first instance, the TFAI framework is meant as a legally modest innovation and a pragmatic mechanism for interstate commitment, one that will be sorely needed before long as AI continues to advance. Since AI-guiding treaties would still be concluded exclusively amongst states, states would remain the sole direct subjects of those obligations. 

C. Applying the TFAI framework to AI agents deployed by non-state private actors

Finally, there are other outstanding questions that, beyond some brief reflections, we largely leave out of scope here. 

For instance, to return to the previous question of which AI agents should be subjected to a TFAI framework: if we adopt a broad reading, this would imply all agents subject to a state’s domestic law. However, there is an open question over whether and how to apply the TFAI framework to models deployed by private sector actors. After all, since non-state actors cannot conclude treaties under international law, they could not conclude AI-guiding treaties, formally understood. 

Of course, states could draft an AI-guiding treaty in such a manner as to commit its signatories to introducing domestic regulation requiring that private actors only deploy models that are trained, fine-tuned, or aligned so that they abide by the treaty and/or by its implementing domestic law. Moreover, AI-guiding treaties could also specify explicitly that state parties will be held responsible for any violations of the treaty by any AI agents operating from their territory, applying a standard of state responsibility that is even stricter than that supported by the general law on state responsibility under ARSIWA, which would create strong incentives for states to apply rigorous treaty- and law-following AI frameworks. Alternatively, a treaty might require all parties to domestically deploy a separate set of TFAI agents to monitor and police the treaty compliance of other, non-state agents.

Moreover, AI companies themselves might also draw inspiration from the TFAI framework, as an avenue for jointly formulating model specification documents. For instance, there would be nothing to bar non-state private actors from engaging in partnerships that also bind their AI agents, industry-wide, to certain standards of behaviour or codes of conduct, in a set-up that may be at least technically isomorphic to the one used to create TFAI agents. Such an outcome would not constitute a form of law alignment as such, since coordinated AI-guiding industry standards would not be considered laws, either in a positive-law sense or given their lack of democratic legitimacy.[ref 380] Nonetheless, in the absence of adequate coordinating national regulation or standards, such agreements could form a species of policy entrepreneurship by AI companies, establishing important stabilizing commitments or guarantees amongst themselves. These could specify treaty-like constraints on companies deploying AI technology in ways that would be overtly destabilizing—e.g., precluding their use in corporate espionage or sabotage, or assuring that these systems would take no part in informing lobbying efforts aimed at regulatory capture or at supporting power concentration by third parties.[ref 381] Indeed, as multinational private tech companies may rise in historical prominence relative to states,[ref 382] such agreements could well establish an important new foothold for a next iteration of intercorporate law, stably guiding the interactions of such actors relative to states—and to one another—on the global stage.  

A key question in establishing a functional TFAI framework is how a TFAI agent is to interpret a treaty in order to evaluate whether its actions comply with that AI-guiding treaty’s terms. This raises many additional challenges: Are there particular ways to craft treaties to be more accommodating to this? How much leeway would contracting states have in specifying or customizing the interpretative rules which these systems use? A full consideration of these questions is beyond the scope of this paper, but we provide some initial reflections on potential strategies and consider their assorted challenges. 

Specifically, we can consider two avenues for implementation. In one, TFAI agents apply the default customary rules on treaty interpretation to relatively traditionally designed treaties; in the other, the content and design of the treaty regime are tailored—through bespoke treaty interpretation rules and arbitral bodies—in order to produce special regimes (lex specialis) that are more responsive and easily applicable by deployed TFAI systems. Below, we consider each of these approaches in turn, identifying benefits but also implementation challenges. 

A. Traditional AI-guiding treaties interpreted through default VCLT rules 

One avenue could be to have TFAI agents apply the default rules of treaty interpretation in international law. Public international law, under the prevailing positivist view, is based on state consent. Accordingly, treaties are considered the “embodiments of the common will of their parties”,[ref 383] and they must be interpreted in accordance with the common intention of those parties as reflected by the text of the treaty and the other means of interpretation available to the interpreter.[ref 384] Because a treaty’s text is held to represent the common intentions of the original authors of a treaty—and of those parties who agree later to adopt its obligations by acceding to the treaty—the primary aim of treaty interpretation is to clarify the meaning of the text in light of “certain defined and relevant factors.”[ref 385] 

In particular, Articles 31-33 of the 1969 Vienna Convention on the Law of Treaties (VCLT) codify these customary international law rules on treaty interpretation.[ref 386] Since these are custom, they apply generally to all states, even to states that are non-parties to the VCLT. For instance, while 116 states are parties to the Vienna Convention, the United States is not (having signed but not ratified).[ref 387] Nonetheless, the US State Department has on various occasions stated that it considers the VCLT to constitute a codification of existing (customary international) law,[ref 388] and many domestic courts have also relied on the VCLT as authoritative in a growing number of cases.[ref 389]

This default VCLT approach sets out several means for interpretation. According to VCLT Article 31(1), as a general rule of interpretation

“a treaty shall be interpreted in good faith in accordance with the ordinary meaning to be given to the terms of the treaty in their context and in the light of its object and purpose.”[ref 390] 

Would a TFAI agent be capable of applying the different elements of this interpretative approach? Critically, the VCLT only sets out the rules and principles of interpretation, and does not explicitly specify who or what may be a legitimate interpreter of a treaty. As such, it does not explicitly rule out AI systems as interpreters of treaties. To be sure, one could perhaps argue that it implicitly rules out such interpreters—for instance, by taking ‘in good faith’ to refer to a subjective state that is inaccessible to AI systems. However, that would only complicate AI systems’ ability to apply all of these rules of interpretation; it would not bear on their essential eligibility as interpreters. 

That does not mean that AI systems, lacking their own international legal personality, could produce interpretations that would (in and of themselves) be authoritative for others (e.g., in adjudication) or which would (in and of themselves) be attributable to a state. However, TFAI agents would in principle be allowed to interpret treaties, with reference to VCLT rules, in order to conform their own behaviour to treaties regulating that behaviour. The resulting legal interpretations they would generate during inference-time legal reasoning would therefore functionally serve as an internal compliance mechanism, rather than as authoritative interpretations that would bind third parties.
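
In operational terms, this internal compliance role might look roughly like the following sketch; the treaty provisions, helper names, and matching logic are placeholder assumptions standing in for genuine inference-time legal reasoning, not a description of any existing system.

```python
# Minimal illustrative sketch: a TFAI agent screens its own proposed actions
# against an AI-guiding treaty's provisions before acting. The provisions and
# the evaluation logic here are invented placeholders, not real treaty text.

from dataclasses import dataclass

@dataclass
class Provision:
    article: str
    text: str

TREATY = [
    Provision("Art. 2(1)", "An agent shall not initiate self-replication."),
    Provision("Art. 3(2)", "An agent shall log all cross-border data transfers."),
]

def interpret_and_check(proposed_action: str, treaty: list[Provision]) -> tuple[bool, str]:
    """Stand-in for inference-time legal reasoning over the treaty text.
    A real agent would use a model to interpret each provision (VCLT-style)
    and assess whether the proposed action is compatible with it."""
    for p in treaty:
        if "self-replication" in p.text and "replicate" in proposed_action:
            return False, f"blocked: conflicts with {p.article}"
    return True, "no conflict identified"

def act(proposed_action: str) -> str:
    permitted, rationale = interpret_and_check(proposed_action, TREATY)
    if not permitted:
        # The interpretation binds only the agent's own conduct; it is an
        # internal compliance mechanism, not an authoritative ruling.
        return f"refused ({rationale})"
    return f"executing: {proposed_action}"

print(act("replicate agent weights to external cluster"))
print(act("summarize negotiation records"))
```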

1. TFAI agents and treaty interpretation under the VCLT

However, even if AI agents may validly apply the VCLT rules, could they also do so proficiently? In the domestic law context, some legal scholars have questioned whether AI systems’ responses would reliably reflect the “ordinary meaning” of terms, because the susceptibility of LLMs to subtle changes in prompting leaves them open to gamed prompting strategies that merely reflect back preconceived notions,[ref 391] or even, more foundationally, because such models are produced by private actors with idiosyncratic values and distinct commercial interests.[ref 392] Importantly, however, in treaty interpretation the “ordinary meaning” of a term is not arrived at through its general public usage alone; rather, an appropriate interpretation also needs to take into account the various elements further specified in Arts. 31(2-4), along with the “supplementary means of interpretation” specified in Art. 32.[ref 393] 

a) VCLT Art 31(1): The treaty’s “object and purpose” and the principle of effectiveness

To interpret a treaty in accordance with VCLT Art 31(1), a TFAI agent would need to be able to understand that treaty’s “object and purpose”. In this, it would be aided by clear textual provisions that reflect the underlying goals and intent of the states parties in establishing the treaty. A shallow text that only sets out the agreed-upon specific constraints on AI behaviour, without clarifying the purpose for which those constraints are established, would risk “governance misspecification”, as AI agents could well find legal loopholes around such proxies.[ref 394] Conversely, a treaty which clearly and exhaustively sets out its aims (e.g., in a preamble, or in its articles) would provide much stronger guidance.[ref 395] 

Importantly, since an overarching goal of treaty interpretation is to produce an outcome that advances the aims of the treaty, a clear representation of the treaty’s object and purpose would also allow a TFAI agent to apply the “principle of effectiveness,”[ref 396] which holds that, when a treaty is open to two interpretations, where one enables it to have appropriate effects and the other does not, “good faith and the objects and purposes of the treaty demand that the former interpretation should be adopted.”[ref 397] 

b) VCLT Art 31(2): “Context” 

Furthermore, following VCLT Art 31(2), in interpreting the meaning of a term or provision in a treaty, TFAI agents would need to consider its context; this refers not only to the rest of the treaty text (including its preamble and annexes), but also to any other agreements relating to the treaty, or to “any instrument which was made by one or more parties in connection with the conclusion of the treaty and accepted by the other parties as an instrument related to the treaty.”[ref 398] The latter criterion implies that TFAI agents would need frequent retraining and updates, or the ability to easily identify and access databases of such related agreements and instruments during inference, to ensure they are aware of the most up-to-date agreements in force between the parties. The latter option, retrieval during inference, has been studied as a promising way to reliably ground AI systems’ legal reasoning.[ref 399]
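
A rough sketch of what such inference-time retrieval could look like is given below; the records, field names, and filtering criteria are invented for illustration only.

```python
# Illustrative sketch: inference-time retrieval of agreements and instruments
# related to the treaty, so that interpretation reflects the current state of
# the parties' agreements without retraining. Records and fields are invented.

from datetime import date

RELATED_INSTRUMENTS = [
    {"title": "Annex on verification procedures", "adopted": date(2030, 3, 1),
     "parties": {"State A", "State B"}, "in_force": True},
    {"title": "Joint interpretative declaration No. 2", "adopted": date(2031, 7, 9),
     "parties": {"State A", "State B"}, "in_force": True},
    {"title": "Superseded technical protocol", "adopted": date(2029, 1, 5),
     "parties": {"State A", "State B"}, "in_force": False},
]

def retrieve_context(parties: set[str], as_of: date) -> list[str]:
    """Return titles of instruments in force between the parties on a given date,
    for inclusion in the agent's interpretative context window."""
    return [
        doc["title"]
        for doc in RELATED_INSTRUMENTS
        if doc["in_force"] and doc["adopted"] <= as_of and parties <= doc["parties"]
    ]

print(retrieve_context({"State A", "State B"}, date(2032, 1, 1)))
# ['Annex on verification procedures', 'Joint interpretative declaration No. 2']
```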

c) VCLT Art 31(3): Subsequent agreement, subsequent practice, and “any relevant rules”

Furthermore, VCLT Art 31(3)(a-b) directs the interpreter to take into account any subsequent agreement or practice between the parties regarding the interpretation of the treaty or the application of its provisions.[ref 400] This is significant, as it entails that TFAI agents would need a way of tracking the state parties’ practice in interpreting and applying the treaty, as reflected in state declarations such as unilateral Explanatory Memoranda or joint Working Party Resolutions passed by a relevant established treaty body or forum amongst the parties.[ref 401]

VCLT Art 31(3)(c) also directs the interpreter to take into consideration “any relevant rules of international law applicable in the relations between the parties.”[ref 402] This suggests that TFAI agents should be able to apply the method of “systemic integration”, and draw on other rules and norms in international law—whether treaties, custom, or general principles of law[ref 403]—to clarify the meaning of treaty terms or to fill in gaps in a treaty, so long as the referent norms are relevant to the question at hand and applicable between the parties. 

d) VCLT Art 32 “supplementary means of interpretation”

Finally, in specific circumstances, a TFAI agent could also refer to historical evidence in interpreting the treaty: VCLT Art 32 holds that, if and where the interpretation of a treaty according to Art 31 “leaves the meaning ambiguous or obscure”, or “leads to a result which is manifestly absurd or unreasonable”,[ref 404] the interpreter may refer to “supplementary means of interpretation,” such as the travaux préparatoires (i.e., the preparatory work of the treaty, as reflected in the records of negotiations) or the circumstances of the treaty’s conclusion (e.g., whether the treaty was concluded in the wake of a major AI incident of a particular kind).
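
Taken together, the elements of Arts 31 and 32 suggest an ordered interpretative procedure that a TFAI agent could follow. The sketch below renders that ordering in simplified form; each helper function is a stub standing in for model-based legal reasoning over the relevant materials, and the data structures are assumptions made for illustration.

```python
# Simplified, illustrative ordering of interpretative steps under VCLT Arts 31-32.
# The helper functions are stubs standing in for model-based legal reasoning.

def candidate_meanings(term: str, treaty_text: str, purpose: str) -> list[str]:
    # Art 31(1): ordinary meaning of the term, in context, in light of object and purpose.
    return [f"{term}: reading consistent with '{purpose}'",
            f"{term}: narrower technical reading"]

def filter_by_materials(candidates: list[str], materials: list[str]) -> list[str]:
    # Art 31(2)-(3): related instruments, subsequent agreement and practice, and
    # other relevant rules may exclude otherwise plausible readings.
    return candidates if not materials else candidates[:1]

def supplementary_means(candidates: list[str], travaux: list[str]) -> str:
    # Art 32: consulted only if the Art 31 result remains ambiguous or is manifestly absurd.
    return candidates[0]

def interpret_term(term: str, treaty: dict) -> str:
    candidates = candidate_meanings(term, treaty["text"], treaty["object_and_purpose"])
    for materials in (treaty["related_instruments"],
                      treaty["subsequent_practice"],
                      treaty["relevant_rules"]):
        candidates = filter_by_materials(candidates, materials)
    if len(candidates) == 1:
        return candidates[0]
    return supplementary_means(candidates, treaty["travaux"])

example_treaty = {"text": "...", "object_and_purpose": "preventing uncontrolled replication",
                  "related_instruments": [], "subsequent_practice": ["joint declaration"],
                  "relevant_rules": [], "travaux": []}
print(interpret_term("self-replication", example_treaty))
```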

2. Potential challenges to traditional AI-guiding treaties 

Nonetheless, while at a surface level TFAI agents would be capable of applying many of these methodologies,[ref 405] there may be a range of problems or challenges in grounding the interpretation of AI-guiding treaties in the default VCLT rules alone.

a) Challenges of empirically grounding treaty interpretation

For one, there may be distinct challenges that TFAI agents would encounter in adhering to the VCLT methodology for interpretation. 

That is not to suggest that such treaty interpretation would be too complex for AI systems by dint of the sophistication of the legal reasoning required. Rather, the challenges might be empirical, in that certain interpretative steps would involve empirical fact-finding exercises (e.g., to ascertain evidence of state practice) which could prove difficult (or unworkably time- or resource-intensive) for AI agents whose native proficiency lies in computer-use tasks rather than real-world fact-finding. Indeed, in some cases, TFAI agents would encounter significant challenges in attempting to access or even locate relevant materials, with key travaux préparatoires currently often scattered across conference records or even—as with virtually all international agreements sponsored by the Council of Europe—entirely inaccessible.[ref 406]

However, such grounding challenges need not be terminal to the TFAI framework. For one, many of these hurdles would be no larger for TFAI agents than they would be (or indeed, already are) for human-conducted interpretation. Indeed, more cynically, one could even argue that not all human judges or scholars, when interpreting international law, consistently engage in such robust empirical analyses.[ref 407] 

In practice, moreover, such grounding challenges might well prove surmountable for future AI-guiding treaties. State parties could adopt a range of measures to ensure clear and authoritative digital trails for key interpretative materials, ranging from the treaty’s travaux préparatoires (including conference records such as procès-verbaux or working drafts of the agreement[ref 408]) to subsequently concluded agreements, and from relevant new case law by international courts to evidence of states’ interpretation and application of the treaty. 
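
One simple way to anchor such digital trails would be for the parties to deposit integrity information alongside the materials themselves. The sketch below uses a bare hash manifest purely for illustration; an actual arrangement would presumably rely on proper digital signatures and institutional key management.

```python
# Illustrative sketch: verifying that interpretative materials retrieved at
# inference time match a manifest of digests published by the treaty parties.
# A production system would rely on digital signatures, not a bare hash list.

import hashlib

# Hypothetical manifest deposited by the parties alongside the treaty.
AUTHENTIC_DIGESTS = {
    "proces_verbal_session_3.txt":
        hashlib.sha256(b"Record of the third negotiating session ...").hexdigest(),
}

def is_authentic(filename: str, content: bytes) -> bool:
    """Accept a document as interpretative material only if its digest matches
    the manifest; unlisted or altered documents are ignored."""
    expected = AUTHENTIC_DIGESTS.get(filename)
    return expected is not None and hashlib.sha256(content).hexdigest() == expected

print(is_authentic("proces_verbal_session_3.txt",
                   b"Record of the third negotiating session ..."))    # True
print(is_authentic("proces_verbal_session_3.txt", b"Tampered record"))  # False
```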

b) Challenges of adversarial data poisoning attacks corrupting interpretative sources 

A second potential challenge would be the risk of legal corruption in the form of adversarial data poisoning attacks. In a sense, this would be the inverse risk—one where the problem is not a TFAI agent’s inability to access the required evidence of state practice, but the risk that it resorts too easily to a wide range of (seemingly) relevant sources of evidence, when these could be all too easily contaminated, spoofed, or corrupted by the states parties to the treaty (or by third parties). 

This risk is illustrated by Deeks and Hollis’s concern that, if LLMs’ responses can be shaped by patterns present in their training data, their judgments over the correct interpretation of international norms may “turn more on the volume of that data than its origins”.[ref 409] This would be an especially severe challenge for the interpretation of customary international law, but it would affect the interpretation of written (treaty) law as well. It creates not only a background risk that, by default, AI models may overweight more common sources of legal commentary (e.g., NGO reports, news articles) over rarer but far more authoritative ones (e.g., government statements),[ref 410] but also a potential attack surface for active sabotage—or the subtle skewing—of the legal interpretations conducted not just by TFAI agents, but by all (LLM-based) AI systems developed on the basis of pre-training on internet corpora. After all, as Deeks and Hollis note: 

“if it becomes clear that LLM outputs are influencing the direction of international law, state officials and others will have an incentive to push their desired views into training datasets to effectively corrupt LLM outputs. In other words, disinformation or misinformation about international law online at scale could contaminate LLM outputs, and […] common understandings of the law’s contents or contours.”[ref 411]

Indeed, the feasibility of states pushing their legal interpretation (or outright falsification) of international events into the corpora used to pre-train frontier AI models is demonstrated by computer science work on the theoretical and empirical feasibility of data poisoning attacks, which have proven effective regardless of the size of the overall training dataset, and to which larger LLMs are indeed significantly more susceptible.[ref 412] Another line of evidence is found in the tendency of AI chatbots to inadvertently reproduce the patterns which nations’ propaganda efforts, disinformation campaigns, and censorship laws have baked into the global AI data marketplace.[ref 413] Finally, the risk of such law-corrupting attacks is borne out by recent instances of deliberate data poisoning attacks aimed at LLMs. For instance, a set of 2025 studies found that the Pravda network, a collection of web pages and social media accounts, had begun to produce as many as 10,000 news articles a day aggregating pro-Russia propaganda, with the likely aim of infiltrating and skewing the responses of large language models, in a strategy dubbed “LLM grooming”.[ref 414] Subsequent tests found that this strategy had managed to skew leading AI chatbots into repeating false narratives at least 33% of the time.[ref 415] In the coming years, as more legitimate sources of authentic digital data impose controls or limits on AI-training-focused content crawlers,[ref 416] there is a risk that the remaining internet data available for training AI systems will skew ever further towards malicious data seeded for the purposes of intentional grooming.   

To date, such AI grooming strategies have been predominantly leveraged for social impacts (e.g., political misinformation or propaganda), not legal ones. However, if targeted towards legal influence, they could rapidly erode the reliability of the answers provided by AI chatbots to any users enquiring into international law. More concretely, if such campaigns aimed to falsify the digital evidence of state parties’ track record in applying and interpreting an AI-guiding treaty, this would disrupt TFAI agents’ ability to interpret that treaty on the basis of that track record (VCLT Art 31(3)). 

Thus, unless well designed, the TFAI framework may be—or may appear—susceptible to interpretative manipulation: even if the certified AI-guiding treaty text could be kept inviolate in a designated and authenticated repository which TFAI agents could access or query, the same chain of custody may not be easily established for the ample decentralized digital evidence of subsequent state practice and opinio juris which is used specifically to inform treaty interpretation under VCLT Art 31(3)(b),[ref 417] as well as generally to inform interpretation of customary international law under the ICJ Statute.[ref 418] 

Such evidence could easily be contaminated, spoofed, or corrupted by some actors in order to manipulate TFAI agents’ legal interpretations,[ref 419] whether in ways that skew the parties’ AI models’ behaviour to the manipulator’s advantage or in ways meant to erode the legitimacy or stability of the treaty regime. Indeed, in some cases, the perception of widespread state practice could even (be erroneously held to) contribute to the creation of new customary international law, which—since treaties and customary international law are coequal sources of international law,[ref 420] and under the lex posterior principle—might supersede the preceding (AI-guiding) treaty, rendering it obsolete.[ref 421] 

Nonetheless, this corruption challenge also needs to be contextualized. For one, insofar as some state actors may seek to engage in LLM-grooming attacks across many areas of international law, this phenomenon does not pose a unique objection to TFAI agents, but rather constitutes a more general problem for any interpreter of international law, whether human or machine. While in theory human interpreters might be better positioned than AI (at least at present) to judge the authenticity, reliability, and authority of certain documents as evidence of state practice, many may not exert such scrutiny in practice, especially if or as they come to rely on other (consumer) AI chatbots.[ref 422] In fact, in the specific context of AI-guiding treaties, the attack surface may be proportionally smaller, since TFAI agents could be configured to defer only to specific authenticated records of state practice as they relate to that treaty itself. Alternatively, AI models could be configured to monitor and flag any efforts to corrupt the digital record of state practice, though this would likely be politically charged or contested. 

In another scenario, the treaty-compliant actions of state-deployed TFAI agents could even help anchor and shield the interpretation of international law against such attacks. If such actions were recognized as evidence of state practice, they would provide a very large (to the tune of tens or hundreds of thousands of decisions per agent per year), exhaustively recorded, and verifiable record of state practice as it relates to the implementation of the AI-guiding treaty. In theory, then, these treaty-compliant legal interpretations and actions of each individual TFAI agent could help anchor the legal interpretation of all other TFAI agents, insulating them from corrupting dynamics. However, this may be more contentious: it would require that the legal interpretations produced by state-deployed TFAI agents are taken to be authoritative for others or attributable to a state in ways that reflect not only state practice (which, as discussed,[ref 423] depends on effective attribution of AI agents’ conduct to their deploying states) but which also reflect its opinio juris (which may be far more contested).

c) Challenges of interpretative ambiguity and TFAI agent impartiality

Furthermore, TFAI agents may encounter a range of related challenges in interpreting vague treaty terms or articles. 

Indeed, scholars have noted that AI systems may struggle to perform the legal interpretation of statutes. Because it is impossible to write a “complete contingent contract”[ref 424] and because legal principles written in natural language are often subject to ambiguity—both in how they are written and in how they are applied—human legal systems often use institutional safeguards to manage such ambiguity. These safeguards are, however, more difficult to embed in AI systems absent a clear rule-refinement framework that can help minimize interpretative disagreement or reduce inconsistency in rule application.[ref 425] Without such a framework, there is a risk, as recognized by proponents of law-following AI, that 

“in certain circumstances, at least, an LFAI’s appraisal of the relevant materials might lead it to radically unorthodox legal conclusions—and a ready disposition to act on such conclusions might significantly threaten the stability of the legal order. In other cases, an LFAI might conclude that it is dealing with a case in which the law is not only “hard” to discern but genuinely indeterminate.”[ref 426] 

In particular, for TFAI agents deployed in the international legal context, there are additional challenges when a provision in an AI-guiding treaty may remain open to multiple possible interpretations. In principle, such situations are to be resolved with reference to the interpretative principle of effectiveness—which states that any interpretation should have effects broadly in line with good faith and with the object and purpose of the treaty.[ref 427] 

However, in practice, there may be cases where there are several interpretations that are acceptable under this principle, but where some interpretations nonetheless remain much more favourable to a particular treaty party than others. In such cases, there may be a tension between the “best” interpretation of the law (as would be reached by a neutral judge), and a “defensible” yet partial interpretation (as would be pursued by a state’s legal counsel). 

How should TFAI agents resolve such situations? On the one hand, we might want to ensure that they adopt the “best” or most impartial effective interpretation to ensure symmetrical and uncontested implementation of the treaty by all states parties’ TFAI agents as a means to ensure the stability of the regime. On the other hand, many lawyers working on behalf of particular clients or employers (in this case, State Departments or Foreign Ministries) may already today, implicitly or explicitly, pursue defensible interpretations of the applicable law that are favourable to their principal. Given this, it seems unlikely that states would want to deploy TFAI agents that did not, to some degree, consider their states’ interests in deciding amongst various legally defensible interpretations. One downside is that this might result in asymmetries in the interpretations reached by (and therefore the conduct of) TFAI agents acting on behalf of different state parties. Whether this is a practical problem may depend on the substance of the treaty, the degree of latitude which the interpreting TFAI agents actually have in altering their behaviour under the treaty, and the parties’ willingness to overlook relatively minor or inconsequential differences in implementation—that are nonetheless minimally compliant with the core norm in the treaty—as the price of doing business.  

d) TFAI agents may struggle with interpretative systemic integration 

Relatedly, TFAI agents may encounter distinct legal and operational challenges in interpreting a treaty under the broader context of international law. In some circumstances, this could result in an explosion in the number of norms to be taken into account when evaluating the legality of particular conduct under a treaty. 

As noted before,[ref 428] VCLT Art 31(3)(c) requires a treaty interpreter to take into consideration “any relevant rules of international law applicable in the relations between the parties.”[ref 429] This suggests that TFAI agents should, where appropriate in clarifying the meaning of ambiguous treaty terms, or where the treaty leaves gaps in its guidance relative to certain situations,[ref 430] draw on other “relevant and applicable” rules and norms in international law in order to resolve the questions at hand. 

Importantly, this interpretative principle of systemic integration has a long history that reaches back almost a century, and well before the VCLT.[ref 431] Forms of it have appeared in cases as early as Georges Pinson v Mexico (1928).[ref 432] In its judgment in Right of Passage (1957), the ICJ held that “…it is a rule of interpretation that a text emanating from a Government must, in principle, be interpreted as producing and intended to produce effects in accordance with existing law and not in violation of it.”[ref 433] Since it was enshrined under the aegis of the VCLT, and especially since its case-dispositive use in the ICJ’s decision in Oil Platforms (2003),[ref 434] systemic integration has been increasingly prominent in international law.[ref 435] In recent years, it has been recognized and applied by a range of international courts and tribunals,[ref 436] such as, notably, in climate change cases such as Torres Strait[ref 437] and the International Tribunal on the Law of the Sea (ITLOS)’s Advisory Opinion on Climate Change.[ref 438] 

This poses a potential challenge to the smooth functioning of a TFAI framework, however. Under VCLT Art 31(3)(c), systemic integration—the consideration and application of relevant and applicable law—could imply that these systems would be required to cast a very wide net, ranging across a huge body of treaty and case law, when interpreting the specific provisions of an AI-guiding treaty. As discussed, this challenge is of course not unique to international law. Indeed, it is analogous to the challenges posed to domestic law-following AI systems operating in a sprawling and complex domestic legal landscape. As Janna Tay has noted, “[a]s laws proliferate, there is a growing risk that laws produce conflicting duties. Accordingly, it is possible for situations to arise where, in order to act, one of the conflicting rules must be broken.”[ref 439] 

In the contexts of both domestic and international law, the potential proliferation of norms or rules to be taken into consideration poses a practical challenge for TFAI agent interpretation of the law, since it implies that an AI agent would need to dedicate substantial computing power and very long inference-time reasoning traces to excavating all norms applicable to the treaty parties that could bear on a full judgment. It also entails a potential interpretative challenge, since the fragmentation of international law might mean that certain norms across different regimes simply stand in tension with each other. Indeed, legal scholars have noted that there are risks of normative incoherence in the careless application of systemic integration even by human scholars.[ref 440]  

Of course, while an important consideration, in practice there are at least three potential responses to this challenge. 

In the first place, it could be not only feasible but appropriate to calibrate the level of rigour required from TFAI agents, similar to how it is delimited for LFAI agents,[ref 441] in order to ensure that the alignment of their behaviour with the core treaty text remains computationally, economically, and practically tractable, or that it takes into account exceptional circumstances.[ref 442] 

In the second place, the scope of application of—or the need for resort to—systemic integration could be circumscribed in many situations simply by drafting the original AI-guiding treaty text in a manner that front-loads much of the interpretative work; for example, by reducing terminological ambiguity, anticipating and accounting for potential gaps in the treaty’s application, or pre-describing—and addressing—potential interactions of that treaty with other relevant norms or regimes applicable to the contracting states. Indeed, AI systems themselves could support such a drafting process, since, as Deeks has suggested, such models might well help map patterns of treaty interaction in ways that foresee and forestall potential norm conflicts.[ref 443] 

Finally, TFAI agents could be configured to address the halting problem by deriving interpretative guidance from other rules of international law in an iterative manner: consulting one (randomly selected or reasoned) other regime at a time, and continuing the search only if no guidance is found there.
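
To make this iterative approach concrete, the sketch below shows, in Python, one way such a bounded, regime-by-regime search could be structured. It is a minimal illustration only: the regime list, the consult_regime lookup, and the consultation budget are hypothetical placeholders rather than features of any existing system.

    import random

    def consult_regime(regime: str, question: str) -> str | None:
        """Hypothetical lookup: return interpretative guidance from one regime, or None.
        A real system might query a curated database of treaty text, case law,
        and certified state practice for the named regime."""
        toy_database = {
            "law of the sea": None,
            "international trade law": None,
            "international human rights law": "guidance found: interpret the term broadly",
        }
        return toy_database.get(regime)

    def iterative_guidance(question: str, regimes: list[str], budget: int = 3) -> str | None:
        """Consult one (randomly selected) other regime at a time, halting as soon as
        guidance is found or the consultation budget is exhausted."""
        candidates = list(regimes)
        random.shuffle(candidates)
        for regime in candidates[:budget]:
            guidance = consult_regime(regime, question)
            if guidance is not None:
                return guidance
        return None  # fall back to the treaty's own text, context, and object and purpose

    print(iterative_guidance(
        "Does 'steganographic communication' cover latent-space signalling?",
        ["law of the sea", "international trade law", "international human rights law"],
    ))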

In sum, while the traditional avenue for implementing the TFAI framework under the default VCLT rules may offer a promising baseline approach for guiding TFAI agent interpretation, it also faces a range of epistemic, adversarial, and operational challenges. Importantly, such treaties may also offer less (ex ante) interpretative control or predictability to the states parties, which could make this approach less appealing to them in some cases. These considerations could therefore lead some states to prefer an alternative, second model of AI-guiding treaty design.

B. Bespoke AI-guiding treaties as special regimes with arbitral bodies

A second design avenue for AI-guiding treaties would be to adapt a treaty’s design in ways that would provide clearer, bespoke interpretative rules and procedures for a TFAI system to adhere to. 

Significantly, while the VCLT rules for treaty interpretation are considered default rules of treaty interpretation, they are not considered peremptory norms (jus cogens) that states may not deviate from.[ref 444] Indeed, VCLT Art 31(4) allows that “[a] special meaning shall be given to a term if it is established that the parties so intended.”[ref 445] This means that states may specify special interpretative rules, including those that depart from the usual VCLT rules, if it is clearly established that such interpretative preferences were mutually intended. 

1. Special regimes and bespoke treaty interpretation rules under VCLT Art 31(4)

Importantly, the creation of such a special regime (lex specialis) would not imply that the VCLT is inapplicable to the treaty in question; however, through the use of special meanings and interpretation rules and procedures, states can operationally bypass (while working within) the VCLT’s interpretation rules. Moreover, they would not necessarily need to do all this upfront, but could also do so iteratively, complementing the initial treaty with subsequent agreements clarifying the appropriate manner of its interpretation—since these agreements would, as discussed above, need to be taken into account in the process of treaty interpretation under VCLT Art 31(3)(a-b).[ref 446]

Such special regime arrangements could greatly aid the technical, legal, and political feasibility of AI-guiding treaties: they would enable states to tailor such treaties to their preferences, gain greater clarity (and explicit agreement) over the terms by which their AI models would be bound, and forestall many of the interpretative and doctrinal challenges that TFAI systems would otherwise encounter when attempting to apply default VCLT rules to ensure their compliance with the treaty.   

What would be examples of special interpretation rules that states might seek to adopt into AI-guiding treaties? These rules could include provisions to set down a “special meaning” (under VCLT Art 31(4)) or a highly specific operationalization of key terms (e.g., “self-replication”,[ref 447] “steganographic communication”,[ref 448] or uninterpretable “latent-space reasoning”[ref 449]) which otherwise have no settled definition in public usage, let alone under international law. 

Other deviations could establish variations on the default VCLT interpretation rules. For instance, the treaty might explicitly direct that “subsequent practice in the application of the treaty” (VCLT Art 31(3)(b)) also, or primarily, refers to the practice of other TFAI systems implementing the treaty. This would help ensure that TFAI interpretations of the treaty converge and stabilize on a predictable, joint operationalization of the treaty, in a manner that is (more) robust against attempts at attacking the TFAI agents through data poisoning or LLM-grooming attacks that target the base model. 

2. Inclusion and designation of special arbitral body 

Of course, no treaty, whether a special regime or not, would be able to provide exhaustive guidance for all circumstances or situations which a TFAI agent might encounter. The impossibility of drafting a complete contingent contract that covers all contingencies has been a well-established challenge in both legal scholarship and research on AI alignment.[ref 450]

The traditional response to this challenge is the incorporation of a judicial system to clarify and apply the law in cases where the written text appears indeterminate. Consequently, proposals for law-following AI in the domestic legal context have held that such systems could defer to a court’s authoritative resolutions of legal disputes, whether in fact or on the basis of their predictions of what a court would likely decide in a given case.[ref 451] Other proposals for law-following AI, such as Bajgar and Horenovsky’s proposal for AI systems aligned to international human rights, have also emphasized the importance of an adjudication system—realized either through traditional judicial systems or within a specialized international agency.[ref 452]

Accordingly, in addition to including provisions to clarify the interpretative rules to be applied by TFAI agents, an AI-guiding treaty could also include institutional innovations in its design. For instance, it could establish a special tribunal or arbitral body. After all, while the default interpretative environment in international law is decentralized and fragmented,[ref 453] treaty drafters may, as noted by Crootof, “introduce reasoned flexibility into a treaty regime without losing cohesion by designating an authoritative interpreter charged with resolving disputes over the text’s meaning in light of future developments”.[ref 454] 

There is ample precedent for the establishment of such specialized courts or arbitral mechanisms within a treaty regime, such as the ITLOS, which interprets the provisions of the UN Convention on the Law of the Sea (UNCLOS), and in doing so relies significantly on its own jurisprudence and on the specific teleology and structure of UNCLOS;[ref 455] or the International Whaling Commission, empowered under the 1946 International Whaling Convention to pass (limited) amendments to the treaty provisions.[ref 456] In some cases, subsequent state practice has even resulted in some initially limited arbitral bodies taking up a much greater interpretative role; for instance, since their establishment, the World Trade Organization (WTO) Panels and Appellate Body have come to exert a significant role in interpreting the Marrakesh WTO Agreement,[ref 457] even though that treaty formally reserved an interpretative role to a body of state party representatives.[ref 458] The challenging political context and eventual contestation of the WTO Appellate Body also show, however, the risks of poorly designing a TFAI (or indeed any) treaty.[ref 459]  

These examples show how, in drafting an AI-guiding treaty, state representatives could choose to establish an authoritative specialized court, tribunal, or arbitral mechanism, as a means of tying TFAI agent interpretations of a treaty to a human source of interpretative authority. This treaty body could steadily accumulate a jurisprudence that TFAI agents could refer to in interpreting the provisions of a treaty. Indeed, the tribunal could do so both reactively, in response to incidents involving TFAI agent noncompliance, and prospectively, by engaging in a form of jurisprudential red teaming that explores a series of hypothetical cases revolving around potential scenarios that might be encountered by AIs. As the resulting body of case law grows, it could eventually even enable TFAI agents to extrapolate from it on their own.[ref 460] Tying the TFAI agent’s legal interpretations to the judgments, opinions, or reports produced by a specialized arbitral body would also help ensure that all machine interpretations are ultimately grounded in the judgment of a legitimate human interpreter, thus reducing the probability that the TFAI agent applies the VCLT to reach “radically unorthodox legal conclusions”[ref 461] that, in its view, are compelled or allowed by the AI-guiding treaty text.

Of course, an important implementation question would be what principles this arbitral body should rely upon in interpreting a treaty. It could itself refer to the norms of international law, or it could refer to other (non-legal) norms, principles, or interests jointly agreed upon by the parties to the treaty, at its inception or over time. There is no doubt that any such arrangement would place considerable political weight on the arbitral body, but that is hardly a new condition in international law.[ref 462] 

There would be challenges, however. One is that this solution might be better suited to future treaties (whether advanced AI agreements or other treaties designed to regulate states’ activities in other domains) than to existing treaties or norms in international law. After all, many hurdles might appear when attempting to bolt new, TFAI-specific authoritative interpreters onto existing treaties or regimes, especially those that already have authoritative interpreters, which might resist having their powers eroded or displaced.

Another, more general risk could be that the inclusion of an independent authoritative interpreter shifts interpretative force too far away from the present-day treaty-makers (e.g., states) towards an intergovernmental actor in the future.[ref 463] An arbitral body that pursued an interpretative course too far removed from the original (or evolving) intentions of the states parties might induce drift in the treaty—and with it, in TFAI agent behaviour—potentially leading states to withdraw and perhaps conclude another treaty. Simultaneously, the flexibility afforded by a special tribunal could also be considered a benefit, since it would avoid the risk of locking in TFAI agents to one particular text concluded at one particular time and enable the adjudicatory system to revise its judgments over time.[ref 464] However, again, these are not challenges or tradeoffs that are unique to AI-guiding treaties.

This discussion far from exhausts the relevant questions to be answered in determining the viability of the TFAI framework. There are key outstanding challenges that need to be overcome in order to ensure the effectiveness and stability of AI-guiding treaties, both as a technical alignment framework for TFAI agents and as a political commitment mechanism for states.

A. Treaty-alignment verification

One key technical and political challenge for the TFAI framework concerns the question of TFAI agent treaty-alignment verification.

That is, how can states parties verify that their treaty counterparties have deployed agents that are (and remain) TFAI-aligned? Appropriate verification is of course, as discussed, a general problem for many types of international agreements around AI.[ref 465] Yet even though TFAI agents resolve one set of verification challenges (namely, whether counterparty state officials are commanding, or could have the opportunity to command, agents to engage in treaty violations), they of course create a new set of verification challenges.

For instance, is it possible to ensure “data integrity” for AI agents,[ref 466] including (at the limit) those used by governments on their own internal networks? Relatedly, how can states ensure adequate digital forensics capabilities to attribute AI agents’ actions to particular states, in order to deter treaty members from deploying unconstrained AI agents, whether by operating dark (hidden) data centres or by using deniable AI agents that are nominally operated by private parties within their territory?[ref 467] Of course, many states may struggle to robustly hide the existence of data centres from their counterparties’ scrutiny, given the difficulties inherent in the available avenues for attempting this (e.g., renting data centres overseas, repurposing existing big-tech servers, co-locating in mega-factories, repurposing Bitcoin mining facilities, disguising them as heavy industry, or building concealed underground facilities), as well as the relative feasibility of many potential avenues for location tracking, intelligence synthesis, energy-grid load fingerprinting, or regular espionage over such activities.[ref 468] Nonetheless, are there verification avenues for ensuring that all deployed AI systems are and remain aligned, that their actions remain attributable, and that the framework cannot be easily evaded (at least at scale)? These challenges are not unprecedented, but they may require novel variations on existing and near-future measures for verifying international AI governance agreements.[ref 469] 

Progress on such questions may require further investment in testing, evaluation, verification and validation (TEVV) frameworks that are better tailored to the affordances of AI agents. This can build on a long line of work exploring avenues for TEVV for military AI systems[ref 470] and digital twins (virtual models of complex critical systems),[ref 471] as well as established models for the development of Trusted Execution Environments (TEEs) and for the joint operation of secure source code inspection facilities, which have in other domains allowed companies to provide credible security assurances to foreign states in high-stakes, low-trust contexts, while addressing concerns over IP theft or misuse.[ref 472]

There are also many distinct levers and affordances that could be used in verifying particular properties (including but not limited to treaty alignment) of AI agents. For instance, depending on the level of granularity, verification activities could extend to monitoring the energy used in inference data centres (to assess when agents were undertaking extensive computations or analysis not reflected in their chains of thought), the integrity of models run in inference data centres (e.g., verifying that there have been no modifications to a model’s weights compared to an approved treaty-following model), the integrity of training data (e.g., to safeguard models against data poisoning or LLM-grooming attacks), and more.
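
As an illustration of the model-integrity lever, a verifier could compare cryptographic hashes of a deployed model’s weight files against a manifest published for the approved treaty-following model. The following minimal Python sketch assumes a hypothetical JSON manifest mapping file names to SHA-256 hashes; it is not an existing verification standard.

    import hashlib
    import json
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        """Stream a (potentially very large) weight file through SHA-256."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify_weights(weights_dir: Path, manifest_path: Path) -> bool:
        """Compare each weight shard against the approved manifest.
        The manifest is assumed to be a JSON mapping of file name -> expected
        SHA-256 hash, published for the certified treaty-following model."""
        manifest: dict[str, str] = json.loads(manifest_path.read_text())
        for name, expected in manifest.items():
            if sha256_of(weights_dir / name) != expected:
                return False  # modification detected relative to the approved model
        return True

    # Example usage (paths are illustrative):
    # verify_weights(Path("/srv/models/tfai-v1"), Path("approved_manifest.json"))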

Another option could be to ensure that all deployed TFAI agents regularly connect to a verified and certified Model Context Protocol (MCP) server—a (currently open) architecture for securely connecting AI applications to external systems and tools.[ref 473] Such an MCP server could either serve as a verifiable control plane for checking whether those agents continue to apply adequately treaty-following reasoning (potentially by randomly and routinely auditing their legal judgments against those of a certified third-party AI agent),[ref 474] or even directly provide treaty-following guardrails to the deployment-time chain-of-thought traces (and behaviour) of TFAI agents operating through it.[ref 475]
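
A minimal sketch of the first option—randomly and routinely auditing a deployed agent’s legal judgments against those of a certified reference agent—could look as follows. Both agents are represented by hypothetical callables standing in for whatever interface an MCP-style control plane would actually expose, and the exact-string comparison is a simplification; in practice, divergent judgments would likely need semantic comparison or human review.

    import random
    from typing import Callable

    # Hypothetical interface: each "agent" maps a legal question to a judgment string.
    Agent = Callable[[str], str]

    def audit_agent(deployed: Agent, certified: Agent, questions: list[str],
                    sample_rate: float = 0.1) -> list[str]:
        """Randomly sample legal questions and flag divergences between the
        deployed TFAI agent and the certified reference agent."""
        flagged = []
        for q in questions:
            if random.random() < sample_rate:
                if deployed(q).strip() != certified(q).strip():
                    flagged.append(q)  # escalate to human review or an arbitral body
        return flagged

    # Toy usage with stub agents:
    deployed_agent = lambda q: "prohibited" if "self-replication" in q else "permitted"
    certified_agent = lambda q: "prohibited" if "self-replication" in q else "permitted"
    print(audit_agent(deployed_agent, certified_agent,
                      ["May the agent initiate self-replication?",
                       "May the agent draft a weather report?"],
                      sample_rate=1.0))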

A related technical enforcement challenge is temporal. Given the iterative and continuous nature of modern AI development—involving many deployments of different versions within a model family over time—how might an AI-guiding treaty ensure continuity of TFAI alignment across successive generations of an AI agent’s models? Could it include provisions specifying that TFAI models (such as those involved in governmental AI research) must ensure that all future iterations of such models are designed in a way that remains compliant with the original treaty? Or would this effectively require the transfer of such extensive affordances (e.g., network access) and authorities to these models that it would be not just politically infeasible for most states, but also a potential hazard, given the vulnerabilities it would introduce should those models be misaligned? Alternatively, it might be possible to root stable treaty alignment of models within an MCP framework that ensures that certified models are locked against changes.

B. TFAI framework in multi-agent systems

There are also distinct interpretative challenges to implementing the TFAI framework in multi-agent systems. For instance, to what degree should TFAI agents take account of the likely interpretations or actions of other (TFAI) agents which they are acting in conjunction with (whether those agents are acting on behalf of their own state, another state, or a private actor) when determining the legality or illegality of their own behaviour? 

This question may become particularly relevant given the growing industry practice of deploying teams of multiple AI agents (or multiple instances of one agent model) to work on problems in conjunction,[ref 476] leading to questions over the appropriate lawful “orchestration” of many agents acting in conjunction with one another.[ref 477] Of course, in some circumstances, TFAI sub-agents that restrict their actions to conducting and providing specific legal interpretations on the basis of trusted databases (e.g., of certified state practice) could be used to insulate the overall system of agents from some forms of data poisoning attacks.[ref 478] 

However, such multi-agent contexts also pose challenges to the TFAI framework, because the illegality of an orchestrated assemblage’s overall act (under particular treaty obligations) may not be apparent. Even where it is apparent, each agent may simply pass the buck by concluding that the illegality is attributable only to the actions of another agent. The outcome would be that many or all sub-agents conclude that the acts they are carrying out are legal in isolation, even as they recognize that the (likely) aggregate outcome is illegal. 

In addition, some multi-agent settings, involving debates between individual AI agents, may also—perhaps paradoxically—create new risks of degrading or corrupting the legal-reasoning competence of individual TFAI agents, as empirical experiments have suggested that even in settings where more competent models outnumber their less competent counterparts, individual models may often shift from correct to incorrect answers in response to peer reasoning.[ref 479]

Similar challenges could emerge around the use of “alloy agents”—systems which run a single chain of thought through several different AI models, with each model treating the previous conversation as its own preceding reasoning trace.[ref 480] Such configurations could potentially strengthen the TFAI framework by leveraging the different strengths of different AI models in a single fused process of legal interpretation; however, they could also erode the integrity of such a framework, since a single model that is compromised or insufficiently treaty-aligned could be used to inject flawed legal arguments into the reasoning trace—with those arguments subsequently being treated as valid legal-reasoning steps even by models that are themselves treaty-aligned.
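
To illustrate the alloy-agent configuration—and the injection risk it creates—the sketch below passes a single shared reasoning trace through several models in turn, each represented by a hypothetical callable. The point of the sketch is the control flow, not any particular model API.

    from typing import Callable

    # Hypothetical stand-ins: each model continues the shared reasoning trace by one step.
    Model = Callable[[str], str]

    def alloy_reasoning(models: list[Model], question: str, rounds: int = 2) -> str:
        """Run one chain of thought through several models in turn; each model
        treats the accumulated trace as its own preceding reasoning."""
        trace = f"Question: {question}\n"
        for _ in range(rounds):
            for model in models:
                step = model(trace)
                # Note: a single compromised model can inject a flawed legal step here,
                # and later models will build on it as if it were their own reasoning.
                trace += step + "\n"
        return trace

    # Toy usage:
    model_a = lambda trace: "Step: the treaty's ordinary meaning covers this conduct."
    model_b = lambda trace: "Step: no applicable exception is established."
    print(alloy_reasoning([model_a, model_b], "Is action X permitted under Art 3?", rounds=1))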

C. Longer-term political implications of the TFAI framework 

There are also many legal questions around the use of TFAI agents that are beyond the scope of our proposal here. For instance, we have primarily focused on the use of TFAI agents as a useful commitment tool for states that seek to robustly implement treaty instruments in a manner that is both effective and provides strong assurances to counterparties. In the longer term, however, one could consider whether the TFAI framework could also develop into one avenue through which a state could legally meet its existing due diligence obligations under international law.[ref 481]

Conversely, the TFAI framework may also have longer-term political implications for the texture, coherence, and received legitimacy of international law. For instance, just as some states have, in past decades, leveraged the fragmentation of international law to create deliberate and “strategic” treaty conflicts[ref 482] in order to evade particular treaties or even outright undermine them, there is a risk that, if states conclude narrowly scoped and self-contained AI-guiding treaties, third-party states will perceive such treaties as implicitly conflicting with existing international obligations, ostensibly allowing states to contract out of them. 

At the same time, it should be kept in mind that while these represent potential hurdles to the TFAI framework, many of these issues are neither novel nor exclusive to the AI context. Indeed, they reflect challenges that human lawyers and states have long faced. Recognizing them, and making progress on them, may therefore help us address larger structural challenges in international law.

VII. Conclusion

Treaties have faced troubling times as a tool of international law. At the same time, such instruments may play an increasingly important role in channeling, stabilizing, and aligning state behaviours around the development and use of advanced AI technologies. AI-guiding treaties, serving as constraints on treaty-following AI agents, could help reinvigorate our joint approach to longstanding—and newly urgent—problems of international coordination, cooperation, and restraint. There are clearly certain key unresolved technical challenges to overcome and legal questions to be clarified before these instruments can reach their potential, and this paper has far from exhausted the debate on the best or most appropriate legal, political, and technical avenues by which to implement this framework. 

Nonetheless, we believe our discussion helps illustrate that articulating an appropriate legal understanding of when, why, or how advanced AI systems could follow treaties is not only an intellectually fertile research program, but also offers an increasingly urgent domain of legal innovation to help reconstitute the texture of the international legal order for the 21st century.

AI Preemption and “Generally Applicable” Laws

Proposals for federal preemption of state AI laws, such as the moratorium that was removed from the most recent reconciliation bill in June 2025, often include an exception for “generally applicable” laws. Despite the frequency with which this phrase appears in legislative proposals and the important role it plays in the arguments of preemption advocates, however, there is very little agreement among experts as to what exactly “generally applicable” means in the context of AI preemption. Unfortunately, this means that, for any given preemption proposal, it’s often the case that very little can be said for certain about which laws will or will not be exempted.

The most we can say for sure is that the term “generally applicable” is supposed to describe a law that does not single out or target artificial intelligence specifically. Thus, a state law like California’s recently enacted “Transparency in Frontier Artificial Intelligence Act” (SB 53) would likely not be considered “generally applicable” by a court, because it imposes new requirements specifically on AI companies, rather than requirements that apply “generally” and affect AI companies only incidentally if at all. 

This basic definition, however, leaves a host of important questions unanswered. What about laws that don’t specifically mention AI, but nevertheless are clearly intended to address issues created by AI systems? Tennessee’s ELVIS Act, which was designed to protect musicians from unauthorized commercial use of their voices, is one example of such a law. It prohibits the reproduction of an artist’s voice by any technological means, but the law was obviously passed in 2024 because recent advances in AI capabilities have made it possible to reproduce celebrity voices more accurately than previously. Alternatively, what about laws which were not originally intended to apply to AI systems, but which happen to place a disproportionate burden on AI systems relative to other technologies? No one knows precisely how a court would resolve the question of whether such laws are “generally applicable,” and if you asked four different people who think about AI preemption for a living you might well get four different answers. If federal preemption legislation is eventually enacted, and if an exception for “generally applicable” laws is included, this question will likely be extensively litigated—and it’s likely that different courts will come to different conclusions.

Usually, the best way to get an idea of how a court will interpret a given phrase is to look at how courts have interpreted the same phrase in similar contexts in the past. However, while there is some existing case law discussing the meaning of “generally applicable” in the context of preemption, LawAI’s research hasn’t turned up any cases that shed a great deal of light on the question of what the term would mean in the specific context of AI preemption. It’s therefore likely that we won’t have a clear idea of what “generally applicable” really means until some years from now, when courts may (or may not) have had occasion to answer the question with respect to a variety of different arguably “generally applicable” state laws.


AI Federalism: The Right Way to Do Preemption

On November 20th, congressional Republicans launched a last-minute attempt to insert an artificial intelligence (AI) preemption provision into the must-pass National Defense Authorization Act (NDAA). As of this writing, the text of the proposed addition has not been made public. However, the fact that the provision is being introduced into a must-pass bill at the eleventh hour may indicate that the provision will resemble the preemption provision that was added to, and ultimately stripped out of, the most recent reconciliation bill. The U.S. House of Representatives passed an early version of that “moratorium” on state AI regulation in May. While the exact scope of the House version of the moratorium has been the subject of some debate, it would essentially have prohibited states and municipalities from enforcing virtually any law or rule regulating “artificial intelligence,” broadly defined. There followed a hectic and exciting back-and-forth political struggle over whether and in what form the moratorium would be enacted. Over the course of the dispute, the moratorium was rebranded as a “temporary pause,” amended to include various exceptions (notably including a carve-out for “generally applicable” laws), reduced from 10 years’ duration to five, and made conditional on states’ acceptance of new Broadband Equity, Access, and Deployment (BEAD) Program funding. Ultimately, however, the “temporary pause” was defeated, with the Senate voting 99-1 for an amendment stripping it from the reconciliation bill.

The preemption provision that failed in June would have virtually eliminated targeted state AI regulation and replaced it with nothing. Since then, an increasing number of politicians have rejected this approach. But, as the ongoing attempt to add preemption into the NDAA demonstrates, this does not mean that federal preemption of state AI regulations is gone for good. In fact, many Republicans and even one or two influential Democrats in Congress continue to argue that AI preemption is a federal legislative priority. What it does mean is that any moratorium introduced in the near future will likely have to be packaged with some kind of substantive federal AI policy in order to have any realistic chance of succeeding.

For those who have been hoping for years that the federal government would one day implement some meaningful AI policy, this presents an opportunity. If Republicans hope to pass a new moratorium through the normal legislative process, rather than as part of the next reconciliation bill, they will need to offer a deal that can win the approval of a number of Democratic senators (seven, currently, although that number may grow or shrink following the 2026 midterm elections) to overcome a filibuster. The most likely outcome is that nothing will come of this opportunity. An increasingly polarized political climate means that passing legislation is harder than it’s ever been before, and hammering out a deal that would be broadly acceptable to industry and the various other interest groups supporting and opposing preemption and AI regulation may not be feasible. Still, there’s a chance.

Efforts to include a moratorium in the NDAA seem unlikely to succeed. Even if this particular effort fails, however, preemption of state AI laws will likely continue to be a hot topic in AI governance for the foreseeable future. This means that arguably the most pressing AI policy question of the moment is: How should federal preemption of state AI laws and regulations work? In other words, what state laws should be preempted, and what kind of federal framework should they be replaced with?

I argue that the answer to that question is as follows: Regulatory authority over AI should be allocated between states and the federal government by means of an iterative process that takes place over the course of years and involves reactive preemption of fairly narrow categories of state law.

The evidence I’ll offer in support of this claim is primarily historical. As I argue below, this iterative back-and-forth process is the only way in which the allocation of regulatory authority over an important emerging technology has ever been determined in the United States. That’s not a historical accident; it’s a consequence of the fact that the approach described above is the only sensible approach that exists. The world is complicated, and predicting the future course of a technology’s development is notoriously difficult. So is predicting the kinds of governance measures that a given technology and its applications will require. Trying to determine how regulatory authority over a new technology should be allocated ex ante is like trying to decide how each room of an office building should be furnished before the blueprints have even been drawn up—it can be done, but the results will inevitably be disappointing.

The Reconciliation Moratorium Was Unprecedented

The reconciliation moratorium, if it had passed, would have been unprecedented with respect to its substance and its scope. The lack of substance—that is, the lack of any affirmative federal AI policy accompanying the preemption of state regulations—has been widely discussed elsewhere. It’s worth clarifying, however, that deregulatory preemption is not in and of itself an unprecedented or inherently bad idea. The Airline Deregulation Act of 1978, notably, preempted state laws relating to airlines’ “rates, routes, or services” and also significantly reduced federal regulation in the same areas. Congress determined that “maximum reliance on competitive market forces” would lead to increased efficiency and benefit consumers and, therefore, implemented federal deregulation while also prohibiting states from stepping in to fill the gap.

What distinguished the moratorium from the Airline Deregulation Act was its scope. The moratorium would have prohibited states from enforcing “any law or regulation … regulating artificial intelligence models, artificial intelligence systems, or automated decision systems entered into interstate commerce” (with a few exceptions, including for “generally applicable” laws). But preemption of “any state law or regulation … regulating airplanes entered into interstate commerce” would have been totally out of the question in 1978. In fact, the vast majority of airplane-related state laws and regulations were unaffected by the Airline Deregulation Act. By the late 1970s, airplanes were a relatively well understood technology and air travel had been extensively regulated, both by the states and by the federal government, for decades. Many states devoted long sections of their statutory codes exclusively to aeronautics. The Airline Deregulation Act’s prohibition on state regulation of airline “rates, routes, or services” had no effect on existing state laws governing airlines’ liability for damage to luggage, airport zoning regulations, the privileges and duties of airport security personnel, state licensing requirements for pilots and for aircraft, or the legality of maneuvering an airplane on a public highway.

In short, the AI moratorium was completely unprecedented because it would have preempted an extremely broad category of state law and replaced it with nothing. In all the discussions I’ve had with die-hard AI preemption proponents (and there have been many), the only preemption measures I’ve encountered that have been anywhere near as broad as the reconciliation moratorium were packaged with an extensive and sophisticated scheme of federal regulation. The Federal Food, Drug, and Cosmetic Act, for example, prohibits states from establishing “any requirement [for medical devices, broadly defined] … which is different from … a requirement applicable under this chapter to the device.” But the breadth of that provision is proportional to the legendary intricacy of the federal regulatory regime of which it forms a part. The idea of a Food and Drug Administration-style licensing regime for frontier AI systems has been proposed before, but it’s probably a bad idea for the reasons discussed in Daniel Carpenter’s excellent article on the subject. Regardless, proponents of preemption would presumably oppose such a heavy-handed regulatory regime no matter how broad its preemption provisions were.

Premature and Overbroad Preemption Is a Bad Idea

Some might argue that the unprecedented nature of the moratorium was a warranted response to unprecedented circumstances. The difficulty of getting bills through a highly polarized Congress means that piecemeal preemption may be harder to pull off today than it was in the 20th century. Moreover, some observers believe that AI is an unprecedented technology (although there is disagreement on this point), while others argue that the level of state interest in regulating AI is unprecedented and therefore requires an unprecedentedly swift and broad federal response. That latter claim is, in my opinion, overstated: While a number of state bills that are in some sense about “AI” have been proposed, most of these will not become law, and the vast majority of those that do will not impose any meaningful burden on AI developers. That said, preemption proponents have legitimate concerns about state overregulation harming innovation. These concerns (much like concerns about existential risk or other hypothetical harms from powerful future AI systems) are currently speculative, because the state AI laws that are currently in effect do not place significant burdens on developers or deployers of AI systems. But premature regulation of an emerging technology can lead to regulatory lock-in and harmful path dependence, which bolsters the case for proactive and early preemption.

Because of these reasonable arguments for departing from the traditional iterative and narrow approach to preemption, establishing that the moratorium was unprecedented is less important than understanding why the moratorium’s broad, ex ante approach to preemption has never been tried before. In my opinion, the reason is that any important new technology will require some amount of state regulation and some amount of federal regulation, and it’s impossible to determine the appropriate limits of state and federal authority ex ante.

There’s no simple formula for determining whether a given regulatory task should be undertaken by the states, the federal government, both, or neither. As a basic rule of thumb, though, the states’ case is strongest when the issue is purely local and relates to a state’s “police power”—that is, when it implicates a state’s duty to protect the health, safety, and welfare of its citizens. The federal government’s case, meanwhile, is typically strongest when the issue is purely one of interstate commerce or other federal concerns such as national security.

In the case of the Airline Deregulation Act, discussed above, Congress appropriately determined in 1978 that the regulation of airline rates and routes—an interstate commerce issue if ever there was one—should be undertaken by the federal government, and that the federal government’s approach should be deregulatory. But this was only one part of a back-and-forth exchange that took place over the course of decades in response to technological and societal developments. Regulation of airport noise levels, for example, implicates both interstate commerce (because airlines are typically used for interstate travel) and the police power (because “the area of noise regulation has traditionally been one of local concern”). It would not have been possible to provide a good answer to the question of who should regulate airport noise levels a few years after the invention of the airplane, because at that point modern airports—which facilitate the takeoff and landing of more than 44,000 U.S. flights every day—simply didn’t exist. Instead, a reasonable solution to the complicated problem was eventually worked out through a combination of court decisions, local and federal legislation, and federal agency guidance. All of these responded to technological and societal developments (the jet engine; supersonic flight; increases in the number, size, and economic importance of airports) rather than trying to anticipate them.

Consider another example: electricity. Electricity was first used to power homes in the U.S. in the 1880s, achieved about 50 percent adoption by 1925, was up to 85 percent by 1945, and was used in nearly all homes by 1960. During its early days, electricity was delivered via direct current and had to be generated no more than a few miles from where it was consumed. Technological advances, most notably the widespread adoption of alternating current, eventually allowed electricity to be delivered to consumers from power plants much farther away, allowing for cheaper power due to economies of scale. Initially, the electric power industry was regulated primarily at the municipal level, but beginning in 1907 states began to assume primary regulatory authority. In 1935, in response to court decisions striking down state regulations governing the interstate sale of electricity as unconstitutional, Congress passed the Federal Power Act (FPA), which “authorized the [predecessor of the Federal Energy Regulatory Commission (FERC)] to regulate the interstate transportation and wholesale sale (i.e. sale for resale) of electric energy, while leaving jurisdiction over intrastate transportation and retail sales (i.e. sale to the ultimate consumer) in the hands of the states.” Courts later held that the FPA impliedly preempted most state regulations governing interstate wholesale sales of electricity.

If your eyes began to glaze over at some point toward the end of that last paragraph, good! You now understand that the process by which regulatory authority over the electric power industry was apportioned between the states and the federal government was extremely complicated. But the FPA only dealt with a small fraction of all the regulations affecting electricity. There are also state and local laws and regulations governing the licensing of electricians, the depth at which power lines must be buried, and the criminal penalties associated with electricity theft, to name a few examples. By the same token, there are federal laws and rules concerning tax credits for wind turbine blade manufacturing, the legality of purchasing substation transformers from countries that are “foreign adversaries,” lightning protection for commercial space launch sites, the use of electrocution for federal executions, … and so on and so forth. I’m not arguing for more regulation here—it’s possible that the U.S. has too many laws, and that some of the regulations governing electricity are unnecessary or harmful. But even if extensive deregulation occurred, eliminating 90 percent of state, local, and federal rules relating to electricity, a great number of necessary or salutary rules would remain at both the federal and state levels. Obviously, the benefits of electricity have far exceeded the costs imposed by its risks. At the same time, no one denies that electricity and its applications do create some real dangers, and few sensible people dispute the fact that it’s beneficial to society for the government to address some of these dangers with common-sense regulations designed to keep people safe.

Again, the reconciliation moratorium would have applied, essentially, to any laws “limiting, restricting, or otherwise regulating” AI models or AI systems, unless they were “generally applicable” (in other words, unless they applied to AI systems only incidentally, in the same way that they applied to other technologies, and did not single out AI for special treatment). Imagine if such a restriction had been imposed on state regulation of electricity, at a similar early point in the development of that technology. The federal government would have been stuck licensing electricians, responding to blackouts, and deciding which municipalities should have buried as opposed to overhead power lines. If this sounds like a good idea to you, keep in mind that, regardless of your politics, the federal government has not always taken an approach to regulation that you would agree with. Allowing state and local control over purely local issues allows more people to have what they want than would a one-size-fits-all approach determined in Washington, D.C.

But the issue with the reconciliation moratorium wasn’t just that it did a bad job of allocating authority between states and the federal government. Any attempt to make a final determination of how that authority should be allocated for the next 10 years, no matter how smart its designers were, would have met with failure. Think about how difficult it would have been for someone living a mere five or 10 years after electricity first came into commercial use to determine, ex ante, how regulatory authority over the new technology should be allocated between states and the federal government. It would, of course, have been impossible to do even a passable job. The knowledge that governing interstate commerce is traditionally the core role of the federal government, while addressing local problems that affect the health and safety of state residents is traditionally considered to be the core of a state’s police power, takes you only so far. Unless you can predict all the different risks and problems that the new technology and its applications will create as it matures, it’s simply not possible to do a good job of determining which of them should be addressed by the federal government and which should be left to the states.

Airplanes and electricity are far from the only technologies that can be used to prove this point. The other technologies commonly cited in historical case studies on AI regulation—railroads, nuclear power, telecommunications, and the internet—followed the same pattern. Regulatory authority over each of these technologies was allocated between states and the federal government via an iterative back-and-forth process that responded to technological and societal developments rather than trying to anticipate them. Preemption of well-defined categories of state law was typically an important part of that process, but preemption invariably occurred after the federal government had determined how it wanted to regulate the technology in question. The Carnegie Endowment’s excellent recent piece on the history of emerging technology preemption reaches similar conclusions and correctly observes that “[l]egislators do not need to work out the final division between federal and state governments all in one go.”

The Right Way to Do Preemption

Because frontier AI development is to a great extent an interstate commerce issue, it would in an ideal world be regulated primarily by the federal government rather than the states (although the fact that we don’t live in an ideal world complicates things somewhat). While the premature and overbroad attempts at preemption that have been introduced so far would almost certainly end up doing more harm than good, it should be possible (in theory, at least) to address legitimate concerns about state overregulation through an iterative process like the one described above. In other words, there is a right way to do preemption—although it remains to be seen whether any worthwhile preemption measure will ever actually be introduced. Below are four suggestions for how preemption of state AI laws ought to take place.

1. The scope of any preemption measure should correspond to the scope of the federal policies implemented.

The White House AI Action Plan laid out a vision for AI governance that emphasized the importance of innovation while also highlighting some important federal policy priorities for ensuring that the development and deployment of powerful future AI systems happens securely. Building a world-leading testing and evaluations ecosystem, implementing federal government evaluations of frontier models for national security risks, bolstering physical and cybersecurity at frontier labs, increasing standard-setting activity by the Center for AI Standards and Innovation (CAISI), investing in vital interpretability and control research, ramping up export control enforcement, and improving the federal government’s AI incident response capacity are all crucial priorities. Additional light-touch frontier AI security measures that Congress might consider include (to name a few) codifying and funding CAISI, requiring mandatory incident reporting for frontier AI incidents, establishing federal AI whistleblower protections, and authorizing mandatory transparency requirements and reporting requirements for frontier model development. None of these policies would impose any significant burden on innovation, and they might well provide significant public safety and national security benefits.

But regardless of which policies Congress ultimately chooses to adopt, the scope of preemption should correspond to the scope of the federal policies implemented. This correspondence could be close to 1:1. For instance, a federal bill that included AI whistleblower protections and mandatory transparency requirements for frontier model developers could be packaged with a provision preempting only state AI whistleblower laws (such as § 4 of California’s SB 53) and state frontier model transparency laws (such as § 2 of SB 53).

However, a more comprehensive federal framework might justify broader preemption. Under the legal doctrine of “field preemption,” federal regulatory regimes so pervasive that they occupy an entire field of regulation are interpreted by courts to impliedly preempt any state regulation in that field. It should be noted, however, that the “field” in question is rarely if ever so broadly defined that all state regulations relating to an important emerging technology are preempted. Thus, while courts interpreted the Atomic Energy Act to preempt state laws governing the “construction and operation” of nuclear power plants and laws “motivated by radiological concerns,” many state laws regulating nuclear power plants were left undisturbed. In the AI context, it might make sense to preempt state laws intended to encourage the safe development of frontier AI systems as part of a package including federal frontier AI safety policies. It would make less sense to implement the same federal frontier AI safety policies and preempt state laws governing self-driving cars, because this would expand the scope of preemption far beyond the scope of the newly introduced federal policy.

As the Airline Deregulation Act and the Internet Tax Freedom Act demonstrate, deregulatory preemption can also be a wise policy choice. Critically, however, each of those measures (a) preempted narrow and well-understood categories of state regulation and (b) reflected a specific congressional determination that neither the states nor the federal government should regulate in a certain well-defined area.

2. Preemption should focus on relatively narrow and well-understood categories of state regulation.

“Narrow” is relative, of course. It’s possible for a preemption measure to be too narrow. A federal bill that included preemption of state laws governing the use of AI in restaurants would probably not be improved if its scope was limited so that it applied only to Italian restaurants. Dean Ball’s thoughtful recent proposal provides a good starting point for discussion. Ball’s proposal would create a mandatory federal transparency regime, with slightly stronger requirements than existing state transparency legislation, and in exchange would preempt four categories of state law—state laws governing algorithmic pricing, algorithmic discrimination, disclosure mandates, and “mental health.”

Offering an opinion on whether this trade would be a good thing from a policy perspective, or whether it would be politically viable, is beyond the scope of this piece. But it does, at least, do a much better job than other publicly available proposals of specifically identifying and defining the categories of state law that are to be preempted. I do think that the “mental health” category is significantly overbroad; my sense is that Ball intended to address a specific class of state law regulating the use of AI systems to provide therapy or mental health treatment. His proposal would, in my opinion, be improved by identifying and targeting that category of law more specifically. As written, his proposed definition would sweep in a wide variety of potential future state laws that would be both (a) harmless or salutary and (b) concerned primarily with addressing purely local issues. Nevertheless, Ball’s proposal strikes approximately the correct balance between legitimate concerns regarding state overregulation and equally legitimate concerns regarding the unintended consequences of premature and overbroad preemption.

3. Deregulatory preemption should reflect a specific congressional determination against regulating in a well-defined area.

An under-discussed aspect of the reconciliation moratorium debate was that supporters of the moratorium, at least for the most part, did not claim that they were eliminating state regulations and replacing them with nothing as part of a deregulatory effort. Instead, they claimed that they were preempting state laws now and would get around to enacting a federal regulatory framework at some later date.

This was not and is not the correct approach. Eliminating states’ ability to regulate in an area also reduces Congress’s political incentive to reach a preemption-for-policy trade in that area, which decreases the odds that Congress will take meaningful action in the near future. And setting aside the political considerations, that kind of preemption would make it impossible for the normal back-and-forth process through which regulatory authority is usually allocated to take place. If states are banned from regulating, there’s no opportunity for Congress, federal agencies, courts, and the public to learn from experience which categories of state regulation are beneficial and which place unnecessary burdens on interstate commerce. Deregulatory preemption can be a legitimate policy choice, but when it occurs it should be the result of an actual congressional policy judgment favoring deregulation. And, of course, this congressional judgment should focus on specific, well-understood, and relatively narrow categories of state law. As a general rule of thumb, express preemption should take place only once Congress has a decent idea of what exactly is being preempted.

4. Preemption should facilitate, rather than prevent, an iterative process for allocating regulatory authority between states and the federal government.

As the case studies discussed above demonstrate, the main problem with premature and overbroad preemption is that it would make it impossible to follow the normal process for determining the appropriate boundaries of state and federal regulatory jurisdiction. Instead, preemption should take place after the federal government has formed some idea of how it wants to regulate AI and what specific categories of state law are inconsistent with its preferred regulatory scheme.

Ball’s proposal is instructive here as well, in that it provides for a time-limited preemption window of three years. Given the pace at which AI capabilities research is progressing, a 10- or even five-year moratorium on state regulation in a given area is far more problematic than a shorter period of preemption. This is, at least in part, because shorter preemption periods are less likely to prevent the kind of iterative back-and-forth process described above from occurring. Even three years may be too long in the AI governance context, however; three years prior to this writing, ChatGPT had not yet been publicly released. A two-year preemption period for narrowly defined categories of state law, by contrast, might be short enough to facilitate that iterative process rather than foreclose a productive back-and-forth.

***

Figuring out who should regulate an emerging technology and its applications is a complicated and difficult task that should be handled on an issue-by-issue basis. Preempting counterproductive or obnoxious state laws should be part of the process, but preempting broad categories of state law before we even understand what it is that we’re preempting is a recipe for disaster. It is true that there are costs associated with this approach; it may eventually allow some state laws that are misguided or harmful to innovation to go into effect. To the extent that such laws are passed, however, they will strengthen the case for preemption. Colorado’s AI Act, for example, has been criticized for being burdensome and difficult to comply with and has also generated considerable political support for broad federal preemption, despite the fact that it has yet to go into effect. By the same token, completely removing states’ ability to regulate, even as AI capabilities improve rapidly and real risks begin to manifest, may create considerable political pressure for heavy-handed regulation and ultimately result in far greater costs than industry would otherwise have faced. Ignoring the lessons of history and blindly implementing premature and overbroad preemption of state AI laws is a recipe for a disaster that would harm both the AI industry and the general public.

The Genesis Mission Executive Order: What It Does and How It Shapes the Future of AI-Enabled Scientific Research

Summary

On November 24, the White House released an Executive Order launching the Genesis Mission—a bold plan to build a unified national AI-enabled science platform linking federal supercomputers, secure cloud networks, public and proprietary datasets, scientific foundation models, and even automated laboratory systems. The Administration frames the Genesis Mission as a Manhattan Project-scale scientific effort.

The EO lays out the organizational and planning framework for the Genesis Mission and tasks the Department of Energy (DOE) with assembling the resources required to launch it. Working in highly consequential scientific domains—such as biotechnology, where dual-use safety and security issues routinely arise—gives the federal government a timely opportunity to build the oversight and governance capacity that will be needed as AI-enabled science advances.

1. What the EO Actually Does

The EO directs the DOE and White House Office of Science and Technology Policy (OSTP) to spend the next year defining the scope of the Genesis Mission and proving what can be done using existing authority and appropriations. It’s important to keep in mind that an Executive Order cannot itself create new funding or new legal authority, so future steps will depend on Congressional action.

Mandated near-term tasks include:

  1. Identify at least twenty “science and technology challenges of national importance” that must span priority domains such as biotechnology, advanced manufacturing, critical materials, quantum computing, nuclear science, and semiconductors. DOE will start, and OSTP will expand and finalize the list.
  2. Inventory all relevant federal resources, including computing, data, networking, and automated experimentation capabilities.
  3. Define initial datasets and AI models and develop a plan with “risk-based cybersecurity measures” that will enable incorporating data from federally funded research, other agencies, academia, and approved private sector partners.
  4. Produce an initial demonstration of the “American Science and Security Platform,” using only currently available tools and legal authorities.

These are primarily coordination and planning tasks aimed at defining the scope of an integrated AI science platform and demonstrating what can be done with existing resources within DOE. DOE’s activities set forth in the EO appear to align with Section 50404 of the OBBBA reconciliation bill (H.R. 1), which appropriates $150 million through September 2026 to DOE for work on “transformational artificial intelligence models.” Although not referenced in the EO, Section 50404 directs DOE to develop a public-private infrastructure to curate large scientific datasets and create “self-improving” AI models with applications such as more efficient chip design and new energy technologies. DOE’s Section 50404 appropriation is the subject of an ongoing Request for Information (RFI), in which DOE is seeking input on how to structure and implement such public-private research consortia.

The EO does not itself mandate building the full system beyond DOE. Rather, these steps begin the process of assembling underlying infrastructure. The EO outlines broad interagency coordination, but key details need to be worked out, including who can access the platform, how users will be vetted, and whether it will be open to broad scientific use or limited to national security-priority domains.

In that sense, the EO is best understood as establishing the groundwork for a future AI-enabled and automated science infrastructure—while its full build-out will depend on Congress, other agencies, and private sector partnerships.

2. Who Holds the Pen

The Genesis Mission envisions centralized leadership for interagency coordination, with two primary actors:

Technical leadership will likely sit with Under Secretary for Science Darío Gil, who oversees the DOE national labs and major research programs. Strategic coordination, including interactions with other agencies and industry, will likely run through Michael Kratsios, OSTP Director and Presidential Science Advisor.

The EO directs only DOE to take specific actions. In practice, this means that the interagency coordination is largely aspirational and will likely depend on Congressional action to add or redirect funding for the Genesis Mission. At this point, the EO envisions DOE as the primary operator of the ultimate platform, with OSTP shaping strategy. The practical impact of the Mission will largely depend on how these resources are ultimately shared and made accessible across agencies, which the EO leaves open for now.

3. The Goal: Accelerating High-Stakes Science

Here’s where the Genesis Mission may be most consequential. The EO envisions a platform that sits at the center of scientific domains with national and economic significance. These are areas where combining AI models, data drawn from government and private databases, and large-scale automated experimentation can provide high leverage.

For example, in biological research, an integrated AI-science platform could accelerate drug development, improve biomanufacturing, strengthen pandemic preparedness, tackle chronic disease, and support emerging industries that can help economic growth and allow the United States to maintain global leadership. DOE is well positioned to contribute here, given its national laboratories, high-performance computing, and experience managing large-scale scientific infrastructure.

The Genesis Mission EO suggests that the Administration expects the Mission to support research with high scientific value as well as complex security and safety considerations. While it doesn’t reference new or existing regulations, the EO requires DOE to operate the platform consistent with:

A system that integrates large biological datasets, frontier-scale foundation models, and automated lab workflows could dramatically accelerate discovery. It’s important to keep in mind, however, that such capabilities can also intersect with longstanding dual-use concerns: areas where the same tools that advance beneficial research might also lower barriers to potential harms.

4. Why Governance Matters for the Genesis Mission

Biology offers a clear example of the kinds of oversight challenges that can arise as AI accelerates scientific research. AI and lab automation can lower barriers to manipulating or enhancing dangerous pathogens, which is often referred to as “gain-of-function” research.

Importantly, the launch of the Genesis Mission comes while key federal biosafety revisions are still in progress. In May, the White House issued Executive Order 14292, “Improving the Safety and Security of Biological Research.” That EO called for strengthening oversight of certain high-consequence biological research, including gain of function. It imposed several tasks on OSTP, including:

Since then, there has been partial progress towards these goals, including NIH and USDA funding bans on gain-of-function research. But several other updates called for in EO 14292 have not been finalized. The Genesis Mission creates both an opportunity and a need to advance this work. By accelerating AI-enabled scientific research, the Mission heightens the importance of clear, modernized biosafety and biosecurity guidance—and gives the Administration a natural venue to advance it.

As DOE begins integrating advanced computation, large biological datasets, and automated experimentation, it becomes even more valuable to clarify how federal guidance should apply to AI-augmented research. The Genesis Mission may ultimately help spur the release of updated oversight frameworks and encourage broader policy discussions—including potential legislation—on how to manage dual-use research in the era of integrated AI for science platforms.

These issues aren’t limited to biology either. The Genesis Mission EO names nuclear science, quantum computing, advanced materials, and other domains where AI-accelerated discovery creates both major opportunities and critical governance issues.

5. The Hard Policy Questions Ahead

At first, the Genesis Mission will likely be a largely DOE-run effort limited to federal researchers and a small group of partners. But if it grows along the ambitious lines the EO lays out, managing who can access it—and how—becomes far more challenging. Once integrated AI-driven systems can design, optimize, or automate significant parts of scientific research, regulation becomes both urgent and harder to enforce in a uniform way:

These private and academic systems may be entirely outside federal oversight, complicating attempts to build coherent guardrails.

If the Genesis Mission succeeds, it will generate substantial new scientific data that will help train more capable models and enable new research pathways. At the same time, access to more powerful models and broader datasets will increase the importance of developing effective policies for data governance, user access, and managing research across the government and with the private sector.

6. Bottom Line

The Genesis Mission sets an ambitious vision for a unified AI-enabled science platform within the federal government. Its success will depend on future funding, interagency participation, and sustained follow-through. But even at this early planning stage, the EO brings core policy issues to the surface: oversight, data governance, access rules, and how to manage research that cuts across agencies and private sector entities.

As DOE and OSTP begin work on the Genesis Mission, the effort also creates a timely opportunity for the federal government to update dual-use oversight frameworks, such as the biosafety revisions called for by EO 14292, and to build the governance structures needed for AI-accelerated science.

Legal Issues Raised by the Proposed Executive Order on AI Preemption

On November 19, 2025, a draft executive order that the Trump administration may issue as early as Friday, November 21 was publicly leaked. The six-page order consists of nine sections, including prefatory purpose and policy statements, a section containing miscellaneous “general provisions,” and six substantive provisions. This commentary provides a brief overview of some of the most important legal issues raised by the draft executive order (DEO). This commentary is not intended to be comprehensive, and LawAI may publish additional commentaries and/or updates as events progress and additional legal issues come to light.

As an initial matter, it’s important to understand what an executive order is and what legal effect executive orders have in the United States. An executive order is not a congressionally enacted statute or “law.” While Congress undoubtedly has the authority to preempt some state AI laws by passing legislation, the President generally cannot unilaterally preempt state laws by presidential fiat (nor does the DEO purport to do so). An executive order can publicly announce the policy goals of the executive branch of the federal government, and can also contain directives from the President to executive branch officials and agencies.

Issue 1: The Litigation Task Force

The DEO’s first substantive section, § 3, would instruct the U.S. Attorney General to “establish an AI Litigation Task Force” charged with bringing lawsuits in federal court to challenge allegedly unlawful state AI laws. The DEO suggests that the Task Force will challenge state laws that allegedly violate the dormant commerce clause and state laws that are allegedly preempted by existing federal regulations. The Task Force is also authorized to challenge state AI laws under any other legal basis that the Department of Justice (DOJ) can identify.

Dormant commerce clause arguments

Presumably, the DEO’s reference to the commerce clause refers to the dormant commerce clause argument laid out by Andreessen Horowitz in September 2025. This argument, which a number of commentators have raised in recent months, suggests that certain state AI laws violate the commerce clause of the U.S. Constitution because they impose excessive burdens on interstate commerce. LawAI’s analysis indicates that this commerce clause argument, at least with respect to the state laws specifically referred to in the DEO, is legally meritless and unlikely to succeed in court. We intend to publish a more thorough analysis of this issue in the coming weeks in addition to the overview included here.

In 2023, the Supreme Court issued an important dormant commerce clause opinion in the case of National Pork Producers Council v. Ross. The thrust of the majority opinion in that case, authored by Justice Gorsuch, is that state laws generally do not violate the dormant commerce clause unless they involve purposeful discrimination against out-of-state economic interests in order to favor in-state economic interests.

Even proponents of this dormant commerce clause argument typically acknowledge that the state AI laws they are concerned with generally do not discriminate against out-of-state economic interests. Therefore, they often ignore Ross, or cite the dissenting opinions while ignoring the majority. Their preferred precedent is Pike v. Bruce Church, Inc., a 1970 case in which the Supreme Court held that a state law with “only incidental” effects on interstate commerce does not violate the dormant commerce clause unless “the burden imposed on such commerce is clearly excessive in relation to the putative local benefits.” This standard opens the door for potential challenges to nondiscriminatory laws that arguably impose a “clearly excessive” burden on interstate commerce.

The state regulation that was invalidated in Pike would have required cantaloupes grown in Arizona to be packed and processed in Arizona as well. The only state interest at stake was the “protect[ion] and enhance[ment] of [cantaloupe] growers within the state.” The Court in Pike specifically acknowledged that “[w]e are not, then, dealing here with state legislation in the field of safety where the propriety of local regulation has long been recognized.”

Even under Pike, then, it’s hard to come up with a plausible argument for invalidating the state AI laws that preemption advocates are concerned with. Andreessen Horowitz’s argument is that the state proposals in question, such as New York’s RAISE Act, “purport to have significant safety benefits for their residents,” but in fact “are unlikely” to provide substantial safety benefits. But this is, transparently, a policy judgment, and one with which the state legislature of New York evidently disagrees. As Justice Gorsuch observes in Ross, “policy choices like these usually belong to the people and their elected representatives. They are entitled to weigh the relevant ‘political and economic’ costs and benefits for themselves, and ‘try novel social and economic experiments’ if they wish.” New York voters overwhelmingly support the RAISE Act, as did an overwhelming majority of New York’s state legislature when the bill was put to a vote. In my opinion, it is unlikely that any federal court will presume to override those policy judgments and substitute its own.

That said, it is possible to imagine a state AI law that would violate the dormant commerce clause. For example, a law that placed burdensome requirements on out-of-state developers while exempting in-state developers, in order to grant an advantage to in-state AI companies, would likely be unconstitutional. Since I haven’t reviewed every state AI bill that has been or will be proposed, I can’t say for sure that none of them would violate the dormant commerce clause. It is entirely possible that the Task Force will succeed in invalidating one or more state laws via a dormant commerce clause challenge. It does seem relatively safe, however, to predict that the specific laws referred to in the executive order and the state frontier AI safety laws most commonly referenced in discussions of preemption would likely survive any dormant commerce clause challenges brought against them.

State laws preempted by existing federal regulations

Section 3 of the DEO also specifically indicates that the AI Litigation Task Force will challenge state laws that “are preempted by existing Federal regulations.” It is possible for state laws to be preempted by federal regulations, and, as with the commerce clause issue discussed above, it’s possible that the Task Force will eventually succeed in invalidating some state laws by arguing that they are so preempted.

In the absence of significant new federal AI regulation, however, it is doubtful whether many of the state laws the DEO is intended to target will be vulnerable to this kind of legal challenge. Moreover, any state AI law that created significant compliance costs for companies and was plausibly preempted by existing federal regulations could be challenged by the affected companies, without the need for DOJ intervention. The fact that (to the best of my knowledge) no such lawsuit has yet been filed challenging the most notable state AI laws indicates that the new Task Force will likely be faced with slim pickings, at least until new federal regulations are enacted and/or state regulation of AI intensifies.

It seems likely that § 3’s reference to preemption via existing federal regulation is at least partially intended to refer to Communications Act preemption as discussed in the AI Action Plan. There is a major obstacle to preempting state AI laws under the Communications Act, however: the Communications Act provides the FCC (and sometimes courts) with some authority to preempt certain state laws regulating “telecommunications services” and “information services,” but existing legal precedents clearly establish that AI systems are neither “telecommunications services” nor “information services” under the Communications Act. In his comprehensive policy paper on FCC preemption of state AI laws, Lawrence J. Spiwak (a staunch supporter of preemption) analyzes the relevant precedents and concludes that “given the plain language of the Communications Act as well as the present state of the caselaw, it is highly unlikely the FCC will succeed in [AI preemption] efforts” and that “trying to contort the Communications Act to preempt the growing patchwork of disparate state AI laws is a Quixotic exercise in futility.” Harold Feld of Public Knowledge essentially agrees with this assessment in his piece on the same topic.

Alternative grounds

Section 3 also authorizes the Task Force to challenge state AI laws that are “otherwise unlawful” in the Attorney General’s judgment. The Department of Justice employs a great number of smart and creative lawyers, so it’s impossible to say for sure what theories they might come up with to challenge state AI laws. That said, preemption of state AI laws has been a hot topic for months now, and the best theories that have been publicly floated for preemption by executive action are the dormant commerce clause and Communications Act theories discussed above. This is, it seems fair to say, a bearish indicator, and I would be somewhat surprised if the Task Force managed to come up with a slam dunk legal argument for broad-based preemption that has hitherto been overlooked by everyone who’s considered this issue.

Issue 2: Restrictions on State Funding

Section 5 of the DEO contains two subsections that concern efforts to withhold federal grant funding from states that attempt to regulate AI. Subsection (a) indicates that Commerce will attempt to withhold non-deployment Broadband Equity, Access, and Deployment (BEAD) funding “to the maximum extent allowed by federal law” from states with AI laws listed pursuant to § 4 of the DEO, which instructs the Department of Commerce to identify state AI laws that conflict with the policy directives laid out in § 1 of the DEO. Subsection (b) instructs all federal agencies to assess their discretionary grant programs and determine whether existing or future grants can be withheld from states with AI laws that are challenged under § 3 or identified as undesirable pursuant to § 4.

In my view, § 5 of the DEO is the provision with the most potential to affect state AI legislation. While § 5 does not contain any attempt to actually preempt state AI laws, the threat of losing federal grant funds could have the practical effect of incentivizing some states to abandon their AI-related legislative efforts. And, as Daniel Cochrane and Jack Fitzhenry pointed out during the reconciliation moratorium fight, “Smaller conservative states with limited budgets and large rural populations need [BEAD] funds. But wealthy progressive states like California and New York can afford to take a pass and just keep enforcing their tech laws.” While politicians in deep blue states will be politically incentivized to fight the Trump administration’s attempts to preempt overwhelmingly popular AI laws even if it means losing access to some federal funds, politicians in red states may instead be incentivized to avoid conflict with the administration.

Section 5(a): Non-deployment BEAD funding

Section 5(a) of the DEO is easier to analyze than § 5(b), because it clearly identifies the funds that are in jeopardy—non-deployment BEAD funding. The BEAD program is a $42.45 billion federal grant program established by Congress in 2021 for the purpose of facilitating access to reliable, high-speed broadband internet for communities throughout the U.S. A portion of the $42.45 billion total was allocated to each of 56 states and territories in June 2023 by the National Telecommunications and Information Administration (NTIA). In June 2025, the NTIA announced a restructuring of the BEAD program that eliminated many Biden-era requirements and rescinded NTIA approval for all “non-deployment” BEAD funding, i.e., BEAD funding that states intended to spend on uses other than actually building broadband infrastructure. The total amount of BEAD funding that will ultimately be classified as “non-deployment” is estimated to be more than $21 billion.

BEAD funding was previously used as a carrot and stick for AI preemption in June 2025, as part of the effort to insert a moratorium or “temporary pause” on state AI regulation into the most recent reconciliation bill. There are two critical differences between the attempted use of BEAD funding in the reconciliation process and its use in the DEO, however. First, the DEO is, obviously, an executive order rather than a legislative enactment. This matters because agency actions that would be perfectly legitimate if authorized by statute can be illegal if undertaken without statutory authorization. And secondly, while the final drafts of the reconciliation moratorium would only have jeopardized BEAD funding belonging to states that chose to accept a portion of $500 million in additional BEAD funding that the reconciliation bill would have appropriated, the DEO would jeopardize non-deployment BEAD funding belonging to any state that attempts to regulate AI in a manner deemed undesirable under the DEO.

The multibillion-dollar question here is: can the administration legally withhold BEAD funding from states because those states enact or enforce laws regulating AI? I am going to cop out and say, honestly, that I don’t know for certain at this point in time. There are a number of potential legal issues with the course of action that the DEO contemplates, but as of November 20, 2025 (one day after the DEO first leaked) no one has published a definitive analysis of whether the administration will be able to overcome these obstacles.

The Trump administration’s Department of Transportation (DOT) recently attempted a maneuver similar to the one contemplated in the DEO when, in response to an executive order directing agencies to “undertake any lawful actions to ensure that so-called ‘sanctuary’ jurisdictions… do not receive access to federal funds,” the DOT attempted to add conditions to all DOT grant agreements requiring grant recipients to cooperate in the enforcement of federal immigration law. Affected states promptly sued to challenge the addition of this grant condition and successfully secured a preliminary injunction prohibiting DOT from implementing or enforcing the conditions. In early November 2025, the federal District Court for the District of Rhode Island ruled that the challenged conditions were unlawful for three separate reasons: (1) imposing the conditions exceeded the DOT’s statutory authority under the laws establishing the relevant grant programs; (2) imposing the conditions was “arbitrary and capricious,” in violation of the Administrative Procedure Act; and (3) imposing the conditions violated the Spending Clause of the U.S. Constitution. It remains to be seen whether the district court’s ruling will be upheld by a federal appellate court and/or by the U.S. Supreme Court.

Suppose that, in the future, the Department of Commerce decides to withhold non-deployment BEAD funding from states with AI laws deemed undesirable under the DEO. States could challenge this decision in court and ask the court to order NTIA to release the previously allocated non-deployment funds to the states, arguing that the withholding of funds exceeded NTIA’s authority under the statute authorizing BEAD, violated the APA, and violated the Spending Clause. Each of these arguments seems at least somewhat plausible, on an initial analysis. Nothing in the statute authorizing BEAD appears to give the federal government unlimited discretion to withhold BEAD funds to vindicate policy goals that have little or nothing to do with access to broadband; rescinding previously awarded grant funds and then withholding them in order to further goals not contemplated by Congress is at least arguably arbitrary and capricious; and the course of action proposed in the DEO is, arguably, impermissibly coercive in violation of the Spending Clause.

AI regulation is a less politically divisive issue than immigration enforcement, and a cynical observer might assume that this would give states in this hypothetical AI case a better chance on appeal than the states in the DOT immigration conditions case discussed above. However, there are a number of differences between the DOT conditions case and the course of action contemplated in the DEO that could make it harder—or easier—for states to prevail in court. Accurately estimating states’ chances of success with high confidence will take more than one day’s worth of analysis.

It should also be noted that, regardless of whether or not states could eventually prevail in a hypothetical lawsuit, the prospect of having BEAD funding denied or delayed, perhaps for years, could be enough to discourage some states from enacting AI legislation of a type disfavored by the Department of Commerce under the DEO.

Section 5(b): Other discretionary agency funding

In addition to withholding non-deployment BEAD funding, the DEO would instruct agencies throughout the executive branch to “take immediate steps to assess their discretionary grant programs and determine whether agencies may condition such grants on States either not enacting an AI law that conflicts with the policy of this order… or, for those States that have enacted such laws, on those States entering into a binding agreement with the relevant agency not to enforce any such laws during any year in which it receives the discretionary funding.”

The legality of this contemplated course of action, and its likelihood of being upheld in court, is even more difficult to conclusively determine ex ante than the legality and prospects of the BEAD withholding discussed above. The federal government distributes about a trillion dollars a year in grants to state and local governments, and more than a quarter of that money is in the form of discretionary grants (as opposed to grants from mandatory programs such as Medicaid). That’s a lot of money, and it’s broken up into a lot of different discretionary grants. It’s likely that many of the arguments against withholding grant money from AI-regulating states would be the same from one grant to another. However, it is also likely that there are some discretionary grants to states which could more reasonably be conditioned on compliance with the President’s deregulatory AI policy directives and other grants for which such conditioning would be less reasonable. Ultimately, further research into this issue is needed to determine how much state grant funding, if any, is legitimately at risk.

Issue 3: Federal Reporting and Disclosure

Section 6 of the DEO instructs the FCC, in consultation with AI czar David Sacks, to “initiate a proceeding to determine whether to adopt a Federal reporting and disclosure standard for AI models that preempts conflicting State laws.” Presumably, “conflicting state laws” is intended to refer to state AI transparency laws such as California’s SB 53 and New York’s RAISE Act. It’s not clear from the language of the DEO what legal authority this “Federal reporting and disclosure standard” would be promulgated under. Under the Biden administration, the Department of Commerce’s Bureau of Industry and Security (BIS) attempted to impose reporting requirements on frontier model developers under the information-gathering authority provided by § 705 of the Defense Production Act—but § 705 has historically been used by BIS rather than the FCC, and I am not aware of any comparable authority that would authorize the FCC to implement a mandatory “federal reporting and disclosure standard” for AI models.

Generally, regulatory preemption can only occur when Congress has granted an executive-branch agency authority to promulgate regulations and preempt state laws inconsistent with those regulations. This authority can be granted expressly or by implication, but, as discussed above in the discussion of Communications Act preemption under § 3 of the DEO, the FCC has never before asserted that it possesses any significant regulatory authority (express or otherwise) over any aspect of AI development. It’s possible that the FCC is relying on a creative interpretation of its authority under the Communications Act—FCC Chairman Brendan Carr previously indicated that the FCC was “taking a look” at whether the Communications Act grants the FCC authority to regulate AI and preempt onerous state laws. However, as discussed above, legal commentators almost universally agree that “[n]othing in the Communications Act confers FCC authority to regulate AI.”

It’s possible that the language of the EO is simply meant to indicate that the FCC and Sacks will suggest a standard that may then be enacted into law by Congress. This would certainly overcome the legal obstacles discussed above, and could (depending on the language of the statute) allow for preemption of state AI transparency laws. However, it would require passing new federal legislation, which is easier said than done.

Issue 4: Preemption of state laws for “deceptive practices” under the FTC Act

Section 7 of the DEO directs the Federal Trade Commission (FTC) to issue a policy statement arguing that certain state AI laws are preempted by the FTC Act’s prohibition on deceptive commercial practices. Presumably, the laws which the DEO intends for this guidance to target include Colorado’s AI Act, which the DEO’s Purpose section accuses of “forc[ing] AI models to embed DEI in their programming, and to produce false results in order to avoid a ‘differential treatment or impact’…” on enumerated demographic groups, and other similar “algorithmic discrimination” laws. A policy statement on its own generally cannot preempt state laws, but it seems likely that the policy statement that the DEO instructs the FTC to create would be relied upon in subsequent preemption-related regulatory efforts and/or by litigants seeking to prevent enforcement of the allegedly preempted laws in court.

While the Trump administration has previously expressed disapproval of “woke” AI development practices, for example in the recent executive order on “Preventing Woke AI in the Federal Government,” this argument that the FTC Act’s prohibition on UDAP (unfair or deceptive acts or practices in or affecting commerce) preempts state algorithmic discrimination laws is, as far as I am aware, new. During the Biden administration, Lina Khan’s FTC published guidance containing an arguably similar assertion: that the “sale or use of—for example—racially biased algorithms” would be an unfair or deceptive practice under the FTC Act. Khan’s FTC did not, however, attempt to use this aggressive interpretation of the FTC Act as a basis for FTC preemption of any state laws.

Colorado’s AI statute has been widely criticized, including by Governor Jared Polis (who signed the act into law) and other prominent Colorado politicians. In fact, the law has proven so problematic for Colorado that Governor Polis, a Democrat, was willing to cross party lines in order to support broad-based preemption of state AI laws for the sake of getting rid of Colorado’s. Therefore, an attempt by the Trump administration to preempt Colorado’s law (or portions thereof) might meet with relatively little opposition from within Colorado. It’s not clear who, if anyone, would have standing to challenge FTC preemption of Colorado’s law if Colorado’s attorney general refused to do so. But Colorado is not the only state with a law prohibiting algorithmic discrimination, and presumably the guidance the DEO instructs the FTC to produce would inform attempts to preempt other “woke” state AI laws as well as Colorado’s.

The question of how those attempts would fare in federal court is an interesting one, and I look forward to reading analysis of the issue from commentators with expertise regarding the FTC Act and algorithmic discrimination laws. Unfortunately, I am not such a commentator and will therefore plead ignorance on this point.

The Unitary Artificial Executive

Editor’s note: The following are remarks delivered on October 23, 2025, at the University of Toledo Law School’s Stranahan National Issues Forum. Watch a recording of the address here. This transcript was originally posted at Lawfare.

Good afternoon. I’d like to thank Toledo Law School and the Stranahan National Issues Forum for the invitation to speak with you today. It’s an honor to be part of this series.

In 1973, the historian Arthur Schlesinger Jr., who served as a senior adviser in the Kennedy White House, gave us “The Imperial Presidency,” documenting the systematic expansion of unilateral presidential power that began with Washington and that Schlesinger was chronicling in the shadow of Nixon and Watergate. Each administration since then, Democrat and Republican alike, has argued for expansive executive authorities. Ford. Carter. Reagan. Bush 1. Clinton. Bush 2. Obama. The first Trump administration. Biden. And what we’re watching now in the second Trump administration is breathtaking.

This pattern of ever-expanding executive power has always been driven partly by technology. Indeed, through human history, transformative technologies drove large-scale state evolution. Agriculture made populations large enough for taxation and conscription. Writing enabled bureaucratic empires across time and distance. The telegraph and the railroad annihilated space, centralizing control over vast territories. And computing made the modern administrative state logistically possible. 

For American presidents specifically, this technological progression has been decisive. Lincoln was the first “wired president,” using the telegraph to centralize military command during the Civil War. FDR, JFK, and Reagan all used radio and then television to “go public” and speak directly to the masses. Trump is the undisputed master of social media.

I’ve come here today to tell you: We haven’t seen anything yet.

Previous expansions of presidential power were still constrained by human limitations. But artificial intelligence, or AI, eliminates those constraints—producing not incremental growth but structural transformation of the presidency. In this lecture I want to examine five mechanisms through which AI will concentrate unprecedented authority in the White House, turning Schlesinger’s “Imperial Presidency” into what I call the “Unitary Artificial Executive.” 

The first mechanism is the expansion of emergency powers. AI crises—things like autonomous weapons attacks or AI-enabled cybersecurity breaches—justify broad presidential action, exploiting the same judicial deference to executive authority in emergencies that courts have shown from the Civil War through 9/11 to the present. 

Second, AI enables perfect enforcement through automated surveillance and enforcement mechanisms, eliminating the need for the prosecutorial discretion that has always limited executive power. 

The third mechanism is information dominance. AI-powered messaging can saturate the public sphere through automated propaganda and micro-targeted persuasion, overwhelming the marketplace of ideas.

Fourth, AI in national security creates what scholars call the “double black box”—inscrutable AI nested inside national security secrecy. And when these inscrutable systems operate at machine speed, oversight becomes impossible. Cyber operations and autonomous weapons engagements complete in milliseconds—too fast and too opaque for meaningful oversight.

And fifth—and most dramatically—AI can finally realize the vision of the unitary executive. By that I mean something specific: not just a presidency with broad substantive authorities, but one that exerts complete, centralized control over executive branch decision-making. AI can serve as a cognitive proxy throughout the executive branch, injecting presidential preferences directly into algorithmic decisions, making unitary control technologically feasible for the first time.

These five mechanisms operate in two different ways. The first four expand the practical scope of presidential authority—emergency powers, enforcement, information control, and national security operations. They expand what presidents can do. The fifth mechanism is different. It’s about control. It determines how those powers are exercised. And the combination of these two creates an unprecedented concentration of power.

My argument is forward-looking, but it’s not speculative. From a legal perspective, these mechanisms build on existing presidential powers and fit comfortably within current constitutional doctrine. From a technological perspective, none of this requires artificial superintelligence or even artificial general intelligence. All of these capabilities are doable with today’s tools, and certainly achievable within the next few years.

Now, before we go further, let me tell you where I’m coming from. My academic career has focused on two research areas: first, the regulation of emerging technology, and, second, executive power. Up until now, these have been largely separate. This lecture brings those two tracks together.

But I also have some practical experience that’s relevant to this project. Before becoming a law professor, I was a junior policy attorney in the National Security Division at the Department of Justice. In other words, I was a card-carrying member of what the current administration calls the “deep state.”

One thing I learned is that the federal bureaucracy is very hard to govern. Decision-making is decentralized, information is siloed, civil servants have enormous autonomy—not so much because of their formal authority but because governing millions of employees is, from a practical perspective, impossible. That practical ungovernability is about to become governable.

Together with Nicholas Bednar, my colleague at the University of Minnesota Law School, I’ve been researching how this transformation might happen—and what it means for constitutional governance. This lecture is the first draft of the research we’ve been conducting.

So let’s jump in. To understand how the five mechanisms of expanded presidential power will operate—and why they’re not speculative—we need to start with AI’s actual capabilities. So what can AI actually do today, and what will it be able to do in the near future?

What Can AI Actually Do?

Again, I’m not talking about artificial general intelligence or superintelligence—those remain speculative, possibly decades away. I’m talking about today’s capabilities, including technology that is right now deployed in government systems. 

It’s helpful to think of AI as a pipeline with three stages: collection, analysis, and execution.

The first stage is data collection at scale. The best AI-powered facial recognition achieves over 99.9 percent accuracy and Clearview AI—used by federal and state law enforcement—has over 60 billion images. The Department of Defense’s Project Maven—an AI-powered video analysis program—demonstrates the impact: 20 people using AI now replicate what required 2,000. That’s a 100-fold increase in efficiency.

The second stage is data analysis. AI analyzes data at scales humans cannot match. FINRA—the financial industry self-regulator—processes 600 billion transactions daily using algorithmic surveillance, a volume that would require an army of analysts. FBI algorithms assess thousands of tip line calls a day for threat level and credibility. Systems like those from the technology company Palantir integrate databases across dozens of agencies in real time. All this analysis happens continuously, comprehensively, and faster than human oversight.
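
To give a rough sense of what that surveillance volume means in human terms, here is an illustrative back-of-the-envelope sketch. The per-analyst review rate is an assumed figure chosen purely for illustration, not a FINRA statistic; only the 600-billion-transactions-per-day number comes from the text above.

```python
# Rough scale illustration: how many human analysts would it take to match
# FINRA's algorithmic surveillance volume? The per-analyst review rate below
# is an assumption for illustration only, not an actual FINRA benchmark.

daily_transactions = 600_000_000_000           # ~600 billion market events per day (figure cited above)
reviews_per_analyst_per_day = 1 * 60 * 60 * 8  # assume 1 review per second over an 8-hour shift = 28,800

analysts_needed = daily_transactions / reviews_per_analyst_per_day
print(f"Analysts needed to match the daily volume: ~{analysts_needed:,.0f}")
# -> roughly 21 million analysts, many times the size of the entire federal workforce.
```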

The third stage is automated execution, which operates at speeds and scales outstripping human capabilities. For example, DARPA’s AI-controlled F-16 has successfully engaged human pilots in mock dogfights, demonstrating autonomous combat capability. And the federal cybersecurity agency’s autonomous systems block more than a billion suspicious network connection requests across the federal government every year.

To summarize: AI can sense everything, process everything, and act on everything—all at digital speed and scale.

These are today’s capabilities—not speculation about future AI. But they’re also just the baseline. And they’re scaling up dramatically—driven by two forces. 

The first driver is the internal trajectory of AI itself. Training compute—the processing power used to build AI systems—has increased four to five times per year since 2010. Epoch AI, a research organization tracking AI progress, projects that frontier AI models will use thousands of times more compute than OpenAI’s GPT-4 by 2030, with training clusters costing over $100 billion. 
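
To see how quickly a four-to-fivefold annual growth rate compounds, here is a back-of-the-envelope sketch. The 2023 baseline year and the specific growth rates are simplifying assumptions for illustration, not Epoch AI's actual projection methodology.

```python
# Illustrative compounding of training compute from a GPT-4-era baseline.
# The baseline year (2023) and the 4x-5x annual growth rates are assumptions
# drawn from the figures above, not Epoch AI's model.

BASELINE_YEAR = 2023
TARGET_YEAR = 2030

for annual_growth in (4.0, 4.5, 5.0):
    years = TARGET_YEAR - BASELINE_YEAR
    multiple = annual_growth ** years
    print(f"{annual_growth}x/year over {years} years -> ~{multiple:,.0f}x GPT-4-scale compute")

# Output: ~16,384x at 4x/year, ~37,367x at 4.5x/year, ~78,125x at 5x/year --
# consistent with "thousands of times more compute" by 2030.
```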

What will this enable? By 2030 at the latest, AI should be capable of building large-scale software projects, producing advanced mathematical proofs, and engaging in multi-week autonomous research. In government, that means AI systems that don’t just analyze but execute complete, large-scale tasks from start to finish. 

The second driver of AI advancement is geopolitical competition. China’s 2017 AI Development Plan targets global leadership by 2030, backed by massive state investment. They’ve deployed generative AI news anchors and built the nationwide Skynet video surveillance system—and yes, they actually called it that. China’s technical capabilities are advancing rapidly—the DeepSeek breakthrough earlier this year demonstrated that Chinese researchers can match or exceed Western AI performance, often at a fraction of the cost.

In today’s polarized Washington, there’s only one thing Democrats and Republicans agree on: China is a threat that must be confronted. That consensus is driving much of AI policy. So it’s unsurprising that the administration’s recent AI Action Plan frames the U.S. response as seeking “unquestioned … technological dominance.” Federal generative AI use cases have increased ninefold in one year, and the Defense Department awarded $800 million in AI contracts this past July. The department has also established detailed procedures for developing autonomous lethal weapons, reflecting the Pentagon’s assumption that such systems are the future. 

It’s easy to see how this competitive dynamic could be used to justify concentrating AI in the executive branch. “We can’t afford congressional delays. Transparency would give adversaries advantages. Traditional deliberation is incompatible with the speed of AI development.” The AI arms race could easily become a permanent emergency justifying rapid deployment.

Five Mechanisms Through Which AI Concentrates Presidential Power

So those are the drivers of AI progress—rapidly advancing capabilities and geopolitical pressure. Now let’s examine the five distinct mechanisms through which these forces will actually concentrate presidential power.

Mechanism 1: Emergency Powers

Presidential emergency powers rest on two sources with deep historical roots. The first is inherent presidential authority under Article II. For example, during the Civil War, Lincoln blockaded Southern ports, increased the army, and spent unauthorized funds, all claiming inherent constitutional authority as commander in chief.

The second source of emergency powers is explicit congressional delegation. When FDR closed every bank in March 1933, he did so under the Trading with the Enemy Act. After 9/11, Congress passed an Authorization for Use of Military Force—still in effect two decades later and the source of ongoing military operations across multiple continents. Today the presidency operates under more than 40 continuing national emergencies. For example, Trump has invoked the International Emergency Economic Powers Act (IEEPA) to impose many of his ongoing tariffs, declaring trade imbalances a national security emergency.

With both sources, courts usually defer. From the Prize Cases upholding Lincoln’s Southern blockade through Korematsu affirming Japanese internment to Trump v. Hawaii permitting the first Trump administration’s Muslim travel bans, the Supreme Court has generally granted presidents extraordinary latitude during emergencies. There are of course exceptions—Youngstown and the post-9/11 cases like Hamdi and Boumediene being the most famous—but the pattern is clear: When the president invokes national security or emergency powers, judicial review is limited. 

So what has constrained emergency powers? The emergencies themselves. Throughout history, emergencies were rare and time limited—the Civil War, the Great Depression, Pearl Harbor, 9/11. Wars ended, and crises receded. Our separation-of-powers framework has worked because it assumes emergencies have generally been the temporary exception, not the norm.

AI breaks this assumption.

AI empowers adversaries asymmetrically—giving offensive capabilities that outpace defensive responses. Foreign actors can use AI to identify vulnerabilities, automate attacks, and target critical infrastructure at previously impossible scale and speed. The same AI capabilities that strengthen the president also strengthen our adversaries, creating a perpetual heightened threat that justifies permanent emergency powers. 

Here’s what an AI-enabled emergency might look like. A foreign adversary uses AI to target U.S. critical infrastructure—things like the power grid, financial systems, or water treatment. Within hours, the president invokes IEEPA, the Defense Production Act, and inherent Article II authority. AI surveillance monitors all network traffic. Algorithmic screening begins for financial transactions. And compliance monitoring extends across critical infrastructure.

The immediate crisis might pass in 48 hours, but the emergency infrastructure never gets dismantled. Surveillance remains operational, and each emergency builds infrastructure for the next one.

Why does our constitutional system permit this? First, speed: Presidential action completes before Congress can react. Second, secrecy: Classification shields details from Congress, courts, and the public. Third, judicial deference: Courts defer almost automatically when “national security” and “emergency” appear in the same sentence. And, as if to add insult to injury, the president’s own AI systems might soon be the ones assessing threats and determining what counts as an emergency.

Mechanism 2: Perfect Enforcement

Emergency powers are—theoretically, at least—episodic. But enforcement of the laws happens continuously, every day, in every interaction between citizen and state. That’s where the second mechanism—perfect enforcement—operates.

Pre-AI governance depends on enforcement discretion. We have thousands of criminal statutes and millions of regulations, and so, inevitably, prosecutors have to choose cases, agencies have to prioritize violations, and police have to exercise judgment. The Supreme Court has recognized this necessity: In cases like Heckler v. Chaney, Batchelder, and Wayte, the Court held that non-enforcement decisions are presumptively unreviewable because agencies must allocate scarce resources. This discretion prevents tyranny by allowing mercy, context, and human judgment.

AI eliminates that necessity. When every violation can be detected and every rule can be enforced, enforcement discretion becomes a choice rather than a practical constraint. The question becomes: What happens when the Take Care Clause meets perfect enforcement? Does the Take Care Clause allow the president to enforce the laws to the hilt? Might it require him to? 

As an example, consider what perfect immigration enforcement might look like. (And you can imagine this across every enforcement domain: tax compliance, environmental violations, workplace safety—even traffic laws.) Already facial recognition databases cover tens of millions of Americans, real-time camera networks monitor movement, financial systems track transactions, social media analysis identifies patterns, and automated risk assessment scores individuals. Again, China is leading the way—its “social credit” system demonstrates what’s possible when these technologies are integrated.

Now imagine the president directs DHS to do the same: build a single AI system that identifies every visa overstay and automatically generates enforcement actions. There are no more “enforcement priorities”—the algorithm flags everyone, and ICE officers blindly execute its millions of directives with perfect consistency.

Why does the Constitution allow this? The Take Care Clause traditionally required discretion because resource limits made total enforcement impossible. But AI changes this. Now the Take Care Clause can be read as consistent with eliminating discretion—the president isn’t violating his duty by enforcing everything, he’s just being thorough.

More aggressively: The president might argue that perfect enforcement is not just permitted but required. Congress wrote these laws, and the president is merely faithfully executing what Congress commanded now that technology makes it possible. If there’s no resource constraint, there’s no justification for discretion.

What about Equal Protection or Due Process? The Constitution might actually favor algorithmic enforcement. Equal Protection could be satisfied by perfect consistency if algorithmic enforcement treats identical violations identically, eliminating the arbitrary disparities that plague human judgment. And Due Process might be satisfied if AI proves more accurate than humans, which it may well be. Power once dispersed among millions of fallible officials becomes concentrated in algorithmic policy that could, compared to the human alternative, be more consistent, more accurate, and more just.

There’s one final effect that perfect enforcement produces: It ratchets up punishment beyond congressional intent. Congress wrote laws assuming enforcement discretion would moderate impact. They set harsh penalties knowing prosecutors would focus on serious cases and agencies would prioritize egregious violations, while minor infractions would largely be ignored.

But AI removes that backdrop. When every violation is enforced—even trivial ones Congress never expected would be prosecuted—the net effect is dramatically higher punitiveness. Congress calibrated the system assuming discretion would filter out minor cases. AI enforces everything, producing an aggregate severity Congress never intended.

Mechanism 3: Information Dominance

The first two mechanisms concentrating presidential power—emergency powers and perfect enforcement—expand what the president can do. The third mechanism is about controlling what citizens know. AI enables the president to saturate public discourse at unprecedented scale. And if the executive controls what citizens see, hear, and believe, how can Congress, courts, or the public effectively resist?

The Supreme Court has held that the First Amendment doesn’t restrict the government’s own speech. This government speech doctrine means that the government can select monuments, choose license plate messages, and communicate preferred policies—all with no constitutional limit on volume, persistence, or sophistication.

Until now, practical constraints limited the scale of this speech—more messages required more people, more time, and more resources. AI eliminates these constraints, enabling content generation at near-zero marginal cost, operating across all platforms simultaneously, and delivering personalized messages to every citizen. The government speech doctrine never contemplated AI-powered saturation, and there is no limiting principle in existing case law.

Again, look to China for the future—it’s already using AI to saturate public discourse. In August, leaked documents revealed that GoLaxy, a Chinese AI company, built a “Smart Propaganda System”—AI that monitors millions of posts daily and generates personalized counter-messaging in real time, producing content that “feels authentic … and avoids detection.” The Chinese government has used it to suppress Hong Kong protest movements and influence Taiwanese elections. 

Now imagine an American president deploying these capabilities domestically.

It’s 2027. A major presidential scandal breaks—Congress investigates, courts rule executive actions unconstitutional, and in response the Presidential AI Response System activates. It floods social media platforms, news aggregators, and recommendation algorithms with government-generated content.

You’re a suburban Ohio parent worried about safety, and your phone shows AI-generated content about how the congressional investigation threatens law enforcement funding, accompanied by fake “local crime statistics.” Your neighbor, a student at the excellent local law school, is concerned about civil liberties—she sees completely different content about “partisan witch hunts” undermining due process. Same scandal, different narratives—the public can’t even agree on basic facts.

The AI system operates in three layers. First, it generates personalized messaging, detecting which demographics are persuadable and which narratives are gaining traction, A/B testing and adjusting counter-messages in real time. Second, it manipulates platform algorithms, persuading social media companies to down-rank “disinformation”—which means congressional hearings never surface in your feed and news about court decisions gets buried. Third, it saturates public discourse through sheer volume, generating millions of messages across all platforms that drown out opposition not through censorship but through scale that private speakers can’t match.

And all the while the First Amendment offers no constraint because the government speech doctrine allows the government to say whatever it wants, as much as it wants.

Information dominance makes resistance to the other mechanisms impossible. How do you organize opposition to emergency powers if you never hear about them? How do you resist perfect enforcement if you’ve been convinced it’s necessary? And how do you check national security decisions if you’re convinced of the threat—and if you can’t understand how the AI made the decision in the first place?

Which brings us to the fourth mechanism.

Mechanism 4: The National Security Black Box

National security is where presidential power reaches its apex. The Constitution grants the president enormous authority as commander in chief, with control over intelligence and classification, and courts have historically granted extreme judicial deference. Courts defer to military decisions, and the “political question” doctrine bars review of many national security judgments.

Congress retains constitutional checks—the power to declare war, appropriate funds, demand intelligence briefings, and conduct investigations. But AI creates what University of Virginia law professor Ashley Deeks calls the “double black box”—a problem that renders these checks ineffective.

The first—inner—box is AI’s opacity. AI systems are inscrutable black boxes that even their designers can’t fully explain. Congressional staffers lack technical expertise to evaluate them, and courts have no framework for passing judgment on algorithmic military judgments. No one—not even the executive branch officials nominally in charge—can explain why the AI reached a particular decision.

The second—outer—box is traditional national security secrecy. Classification shields operational details and the state secrets privilege blocks judicial review. The executive controls intelligence access, meaning Congress depends on the executive for the very information needed for oversight.

These layers combine: Congress can’t oversee what it can’t see or understand. Courts can’t review what they can’t access or evaluate. The public can’t hold anyone accountable for what’s invisible and incomprehensible.

And then speed makes things worse. AI operations complete in minutes, if not seconds, creating fait accompli before oversight can engage. By the time Congress learns what happened through classified briefings, facts on the ground have changed. Even if Congress could overcome both layers of inscrutability, it would be too late to restrain executive action.

Consider what this could look like in practice. It’s 3:47 a.m., and a foreign military AI probes U.S. critical infrastructure: This time it’s the industrial control systems that manage the eastern seaboard’s electrical grid.

Just 30 milliseconds later, U.S. Cyber Command’s AI detects the intrusion and assesses a 99.7 percent probability that this is reconnaissance for a future attack. 

Less than a second later, the AI decision tree executes. It evaluates options—monitoring is insufficient, counter-probing is inadequate, blocking would only be temporary—and selects a counterattack targeting foreign military command and control. The system accesses authorization from pre-delegated protocols and deploys malware.

Three minutes after the initial probe, the U.S. AI has disrupted foreign military networks, taking air defense offline, compromising communications, and destabilizing the attackers’ own power grids.

At 3:51 a.m., a Cyber Command officer is notified of the completed operation. At 7:30 a.m., the president receives a briefing over breakfast about a serious military operation that she—supposedly the commander in chief—played no role in. But she’s still better off than congressional leadership, which only learns about the operation later that day when CNN breaks the story.
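To see exactly where the human drops out of that timeline, consider a deliberately oversimplified sketch of the decision logic. Everything in it—the confidence threshold, the option scores, the function names—is hypothetical; the point is only that once authorization is pre-delegated above some threshold, notification arrives after execution, not before.

```python
# Hypothetical sketch of the oversight gap in the 3:47 a.m. scenario: a
# pre-delegated authorization check replaces human review entirely.
# Thresholds, options, and function names are invented for illustration.
from dataclasses import dataclass
from datetime import datetime, timezone

PREDELEGATED_THRESHOLD = 0.95   # confidence above which no human sign-off is required

@dataclass
class Assessment:
    threat: str
    probability: float           # model's estimate that this is attack reconnaissance
    options: dict                # candidate response -> expected effectiveness score

def notify_after_the_fact(entry: str) -> None:
    print("duty officer notified:", entry)      # stand-in for a classified reporting channel

def respond(assessment: Assessment) -> str:
    if assessment.probability < PREDELEGATED_THRESHOLD:
        return "escalate_to_human"              # the only branch with a person in it
    # Machine-speed branch: pick the highest-scoring option and execute.
    action = max(assessment.options, key=assessment.options.get)
    log = f"{datetime.now(timezone.utc).isoformat()} executed {action} against {assessment.threat}"
    notify_after_the_fact(log)                  # oversight arrives only after completion
    return action

respond(Assessment(
    threat="foreign military C2",
    probability=0.997,
    options={"monitor": 0.2, "counter_probe": 0.4, "block": 0.5, "counterattack": 0.9},
))
```

Note that the only branch with a human in it is the low-confidence one—the branch the scenario above never reaches.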

This won’t be an isolated incident. Each AI operation completes before oversight is possible, establishing precedent for the next. By the time Congress or courts respond, strategic facts have changed. The constitutional separation of war powers requires transparency and time—both of which AI operations eliminate.

Mechanism 5: Realizing the Unitary Executive

The first four mechanisms—emergency powers, perfect enforcement, information dominance, and inscrutable national security decisions—expand the scope of presidential power. Each extends presidential reach.

But the fifth mechanism is different. It’s not about doing more but about controlling how it gets done. After all, how is a single president supposed to control a bureaucracy of nearly 3 million employees making untold decisions every day? The unitary executive theory has been debated for over two centuries and has recently become the dominant constitutional position at the Supreme Court. But in all that time, unitary control has remained, practically speaking, impossible. AI removes that practical constraint.

Article II, Section 1, states that “The executive Power shall be vested in a President.” THE executive power. A President. Singular. This is the textual foundation for the unitary executive theory: the idea that all executive authority flows through one person and that this one person must therefore control all executive authority. 

The main battleground for this theory has been unilateral presidential firing authority. If the president can fire subordinates at will, control follows. The First Congress debated this in 1789, when James Madison proposed that department secretaries be removable by the president alone. Congress’s decision at the time implied that the president had such a power, but we’ve been fighting about presidential control ever since. 

The Supreme Court has zigzagged on this issue, from Myers in 1926 affirming presidential removal power, to Humphrey’s Executor less than a decade later carving out huge exceptions for independent agencies, to Morrison v. Olson in 1988, where Justice Antonin Scalia’s lone dissent defended the unitary executive. But by Seila Law v. CFPB in 2020, Scalia’s dissent had become the majority view. Unitary executive theory is now ascendant. (And we’ll see how far the Court pushes it when it decides on Federal Reserve Board independence later this term.)

But in a practical sense, the constitutional questions have always been second-order. Even if the president had constitutional authority for unitary control, practical reality made it impossible. Harry Truman famously quipped about Eisenhower upon his election in 1952: “He’ll sit here [in the Oval Office] and he’ll say, ‘Do this! Do that!’ And nothing will happen. Poor Ike—it won’t be a bit like the Army. He’ll find it very frustrating.”

One person just can’t process information from millions of employees, supervise 400 agencies, and know what subordinates are doing across the vast federal bureaucracy. Career civil servants can slow-roll directives, misinterpret guidance, quietly resist—or simply not know what the president wants them to do. The real constraint on presidential power has always been practical, not constitutional.

But AI removes those constraints. It transforms the unitary executive theory from a constitutional dream into an operational reality.

Here’s a concrete example—real, not hypothetical. In January, the Trump administration sent a “Fork in the Road” email to federal employees: return to office, accept downsizing, pledge loyalty, or take deferred resignation. DOGE—the Department of Government Efficiency—deployed Meta’s Llama 2 AI model to review and classify responses. In a subsequent email, DOGE asked employees to describe weekly accomplishments and used AI to assess whether work was mission critical. If AI can determine mission-criticality, it can assess tone, sentiment, loyalty, or dissent.

DOGE analyzed responses to one email, but the same technology works for every email, every text message, every memo, and every Slack conversation. Federal email systems are centrally managed, workplace platforms are deployed government-wide, and because Llama is open source, Meta can’t refuse to have its model used this way. And because federal employees have limited privacy expectations in their work communications, the Fourth Amendment permits most such workplace surveillance. 

Monitoring is just the beginning. The real transformation comes from training AI on presidential preferences. The training data is everywhere: campaign speeches, policy statements, social media, executive orders, signing statements, tweets, all continuously updated. The result is an algorithmic representation of the president’s priorities. Call it TrumpGPT.

Deploy that model throughout the executive branch and you can route every memo through the AI for alignment checks, screen every agenda for presidential priorities, and evaluate every recommendation against predicted preferences. The president’s desires become embedded in the workflow itself.

But it goes further. AI can generate presidential opinions on issues the president never considered. Traditionally, even the wonkiest of presidents have had enough cognitive bandwidth for only 20, maybe 30 marquee issues—immigration, defense, the economy. Everything else gets delegated to bureaucratic middle management.

But AI changes this. The president can now have an “opinion” on everything. EPA rule on wetlands permits? The AI cross-references it with energy policy. USDA guidance on organic labeling? Check against agricultural priorities. FCC decision on rural broadband? Align with public statements on infrastructure. The president need not have personally considered these issues; it’s enough that the AI learned the president’s preferences and applies them. And if you’re worried about preference drift, just keep the model accurate through a feedback loop, periodically sampling a few decisions and validating them with the president.
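What would that pipeline look like in practice? Here is a minimal, purely hypothetical sketch. The keyword scoring is a toy stand-in for a model actually trained on the president’s statements, and the terms, thresholds, and sampling rate are all invented—but it captures the two pieces just described: score every document against learned preferences, and periodically route a small sample of decisions to the president for validation.

```python
# Purely hypothetical sketch of the workflow described above: every memo is
# scored against a model of presidential preferences, and a small random
# sample is routed to the president to keep the model calibrated.
# The keyword "model" is a trivial stand-in, not a real preference model.
import random

PRIORITY_TERMS = {"energy": 1.0, "deregulation": 0.8, "border": 0.9, "tariffs": 0.7}
SAMPLE_RATE = 0.01        # fraction of decisions shown to the actual president

def alignment_score(memo_text: str) -> float:
    words = memo_text.lower().split()
    hits = [PRIORITY_TERMS[w] for w in words if w in PRIORITY_TERMS]
    return sum(hits) / max(len(hits), 1)        # stand-in for a learned preference score

def review(memo_text: str, threshold: float = 0.75) -> str:
    score = alignment_score(memo_text)
    if random.random() < SAMPLE_RATE:
        return "route_to_president"             # periodic human validation against drift
    return "approve" if score >= threshold else "flag_for_revision"

print(review("Proposed rule expands border infrastructure and energy permitting"))
```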

And here’s why this matters: Once the president achieves AI-enabled control over the executive branch, all the other mechanisms become far more powerful. When emergency powers are invoked, the president can now deploy that authority systematically across every agency simultaneously through AI systems. Perfect enforcement becomes truly universal when presidential priorities are embedded algorithmically throughout government. Information dominance operates at massive scale when all executive branch communications are coordinated through shared AI frameworks. And inscrutable national security decisions multiply when every agency can act at machine speed under algorithmic control. Each mechanism reinforces the others.

Now, this might all sound like dystopian science fiction. But here’s what’s particularly disturbing: This AI-enabled control actually fulfills the Supreme Court’s vision of the unitary executive theory. It’s the natural synthesis of a 21st-century technology meeting this Court’s interpretation of an 18th-century document. Let me show you what I mean by taking the Court’s own reasoning seriously.

In Free Enterprise Fund v. PCAOB in 2010, the Court wrote: “The Constitution requires that a President chosen by the entire Nation oversee the execution of the laws.” And in Seila Law a decade later: “Only the President (along with the Vice President) is elected by the entire Nation.”

The argument goes like this: The president has unique democratic legitimacy as the only official elected by all voters. Therefore the president should control the executive branch. This is not actually a good argument, but let’s accept the Court’s logic for a moment.

If the president is the uniquely democratic voice that should oversee execution of all laws, then what’s wrong with an AI system that replicates presidential preferences across millions of decisions? Isn’t that the apogee of democratic accountability? Every bureaucratic decision aligned with the preferences of the only official chosen by the entire nation?

This is the unitary executive theory taken to its absurd, yet logical, conclusion.

Solutions

Let’s review. We’ve examined five mechanisms concentrating presidential power: emergency powers creating permanent crisis, perfect enforcement eliminating discretion, information dominance saturating discourse, the national security black box too opaque and fast for oversight, and AI making the unitary executive technologically feasible. Together they create an executive too fast, too complex, too comprehensive, and too powerful to constrain. 

So what do we do? Are there legal or institutional responses that could restrain the Unitary Artificial Executive before it fully materializes? 

Look, my job as an academic is to spot problems, not fix them. But it seems impolite to leave you all with a sense of impending doom. So—acknowledging that I’m more confident in the diagnosis than the prescription—let me offer some potential responses.

But before I do, let me be clear: Although I’ve spent the past half hour on doom and gloom, I’m the farthest thing from an AI skeptic. AI can massively improve government operations through faster service, better compliance, and reduced bias. At a time when Americans believe government is dysfunctional, AI offers real solutions. The question isn’t whether to use AI in government. We will, and we should. The question is how to capture these benefits while preventing unchecked concentration of power.

Legislative Solutions

Let’s start with legislative solutions. Congress could, for example, require congressional authorization before the executive branch deploys high-capability AI systems. It could limit emergency declarations to 30 or 60 days without renewal. And it could require explainable decisions with a human-in-the-loop for critical determinations.

But the challenges are obvious. Any president can veto restrictions on their own power, and in our polarized age it’s very hard to imagine a veto-proof majority. The president also controls how the laws are executed, so statutory requirements could be interpreted narrowly or ignored. Classification could shield AI systems from oversight. And “human-in-the-loop” requirements could become mere rubber-stamping.

Institutional and Structural Reforms

Beyond statutory text, we need institutional reforms. Start with oversight: Create an independent inspector general for AI with technical experts and clearance to access classified systems. But since oversight works only if overseers understand the technology, we also need to build congressional technical capacity by restoring the Office of Technology Assessment and expanding the Congressional Research Service’s AI expertise. Courts need similar resources—technical education programs and access to court-appointed AI experts. 

We could also work through the private sector, imposing explainability and auditing requirements on companies doing AI business with the federal government. And most ambitiously, we could try to embed legal compliance directly into AI architecture itself, designing “law-following AI” systems with constitutional constraints built directly into the models.

But, again, each of these proposals faces obstacles. Inspectors general risk capture by the agencies they oversee. Technical expertise doesn’t guarantee political will—Congress and courts may understand AI but still defer to the executive. National security classification could exempt government AI systems from explainability and auditing requirements. And for law-following AI, we still need to figure out how to teach a model what “following the law” actually means.

Constitutional Responses

Maybe the problem is more fundamental. Maybe we need to rethink the constitutional framework itself.

Constitutional amendments are unrealistic—the last was ratified in 1992, and partisan polarization makes the Article V process nearly impossible.

So more promising would be judicial reinterpretation of existing constitutional provisions. Courts could hold that Article II’s Vesting and Take Care Clauses don’t prohibit congressional regulation of executive branch AI. Courts could use the non-delegation doctrine to require that Congress set clear standards for AI deployment rather than giving the executive blank-check authority. And due process could require algorithmic transparency and meaningful human oversight as constitutional minimums.

But maybe the deeper problem is the unitary executive theory itself. That’s why I titled this lecture “The Unitary Artificial Executive”—as a warning that this constitutional theory becomes even more dangerous once AI makes it technologically feasible.

So here’s my provocation to my colleagues in the academy and the courts who advocate for a unitary executive: Your theory, combined with AI, leads to consequences you never anticipated and probably don’t want. The unitary executive theory values efficiency, decisiveness, and unity of command. It treats bureaucratic friction as dysfunction. But what if that friction is a feature, not a bug? What if bureaucratic slack, professional independence, expert dissent—the messy pluralism of the administrative state—are what stands between us and tyranny?

The ultimate constitutional solution may require reconsidering the unitary executive theory itself. Perfect presidential control isn’t a constitutional requirement but a recipe for autocracy once technology makes it achievable. We need to preserve spaces where the executive doesn’t speak with one mind—whether that mind is human or machine.

Conclusion

I’ve just offered some statutory approaches, institutional reforms, and constitutional reinterpretations. But let’s be honest about the obstacles: AI develops faster than law can regulate it. Most legislators and judges don’t understand AI well enough to constrain it. And both parties want presidential power when they control it. 

But lawyers have confronted existential rule-of-law challenges before. After Watergate, the Church Committee reforms led to real constraints on executive surveillance. After 9/11, when the executive claimed unchecked detention authority, lawyers fought back, forcing the Supreme Court to check executive overreach. When crisis and executive power threaten constitutional governance, lawyers have been the constraint.

And, to the students in the audience, let me say: You will be too.

You’re entering the legal profession at a pivotal moment. The next decade will determine whether constitutional government survives the age of AI. Lawyers will be on the front lines of this fight. Some will work in the executive branch as the humans in the loop. Some will work in Congress—drafting statutes and demanding explanations. Some will litigate—bringing cases, conducting discovery, and forcing judicial confrontation.

The Unitary Artificial Executive is not inevitable. It’s a choice we’re making incrementally, often without realizing it. The question is: Will we choose to constrain it while we still can? Or will we wake up one day to find we’ve built a constitutional autocracy—not through a coup, but through code?

This is a problem we’re still learning to see. But seeing it is the first step. And you all will determine what comes next.

Thank you. I look forward to your questions.

The limits of regulating AI safety through liability and insurance

Any opinions expressed in this post are those of the authors and do not reflect the views of the Institute for Law & AI.

At the end of September, California Governor Gavin Newsom signed the Transparency in Frontier Artificial Intelligence Act, S.B. 53, requiring large AI companies to report the risks associated with their technology and the safeguards they have put in place to protect against those risks. Unlike its predecessor, S.B. 1047, which Newsom vetoed a year earlier, the new law doesn’t focus on assigning liability to companies for harm caused by their AI systems. In fact, S.B. 53 explicitly caps financial penalties at $1 million, even for major incidents that kill more than 50 people or cause more than $1 billion in damage. 

This de-emphasizing of liability is deliberate—Democratic state Sen. Scott Wiener said in an interview with NBC News, “Whereas SB 1047 was more of a liability-focused bill, SB 53 is more focused on transparency.” But that’s not necessarily a bad thing. In spite of a strong push to impose greater liability on AI companies for the harms their systems cause, there are good reasons to believe that stricter liability rules for AI won’t make many types of AI systems safer and more secure. In a new paper, we argue that liability is of limited value in safeguarding against many of the most significant AI risks. The reason is that liability insurers, who would ordinarily help manage and price such risks, are unlikely to be able to model them accurately or to induce their insureds to take meaningful steps to limit exposure.

Liability and Insurance

Greater liability for AI risks will almost certainly result in a much larger role for insurers in providing companies with coverage for that liability. This, in turn, would make insurers one of the key stakeholders determining what type of AI safeguards companies must put in place to qualify for insurance coverage. And there’s no guarantee that insurers will get that right. In fact, when insurers sought to play a comparable role in the cybersecurity domain, their interventions proved largely ineffective in reducing policyholders’ overall exposure to cyber risk. And many of the challenges that insurers encountered in pricing and affirmatively mitigating cyber risk are likely to be even more profound when it comes to modeling and pricing many of the most significant risks associated with AI systems.

AI systems present a wide range of risks, some of which insurers may indeed be well equipped to manage. For example, insurers may find it relatively straightforward to gather data on car crashes involving autonomous vehicles and to develop reasonably reliable predictive models for such events. But many of the risks associated with generative and agentic AI systems are far more complex, less observable, and more heterogeneous, making it difficult for insurers to collect data, design effective safeguards, or develop reliable predictive models. These risks run the gamut from chatbots that fail to alert anyone about a potentially suicidal user or give customers incorrect advice and prices, to agents that place unwanted orders for supplies or services, develop malware that can be used to attack computer systems, or transfer funds incorrectly. For these types of risks—as well as more speculative potential catastrophic risks, such as AIs facilitating chemical or biological attacks—there is probably not going to be a large set of incidents that insurers can observe to build actuarial models, much less a clear consensus on how best to guard against them.

We know from watching insurers struggle to mitigate cyber risks that when there are no reliable data sources and no clear empirical evidence about which safeguards work, it is very difficult for insurers to play a significant role in helping policyholders reduce their risk. Several of the challenges that have hampered cyber insurance are likely to apply as much—if not more—to the risks posed by many of today’s rapidly proliferating AI systems.

Lack of Data

The first challenge that stymied insurers’ efforts to model cyber risks was simply a lack of good data about how often cyber incidents occur and how much they cost. Other than breaches of personal data, organizations have historically not been required to report most cybersecurity incidents, though that is changing with the upcoming implementation of the Cyber Incident Reporting for Critical Infrastructure Act of 2022 (CIRCIA). Since they weren’t required to report incidents like ransomware, cyber-espionage, and denial-of-service attacks, most organizations didn’t, for fear of harming their reputation or inviting lawsuits and regulatory scrutiny. And because so many cybersecurity incidents were kept under wraps, insurers that began offering cyber coverage had a hard time figuring out how frequently these incidents occurred and what kinds of damage they typically caused. That’s why most cyber insurance policies were initially just data breach insurance: state laws required those breaches to be reported, so there was at least some data to work from. 

Even as their coverage expanded to include other types of incidents besides data breaches, and insurers built up their own claims data sets, they still encountered challenges in predicting cybersecurity incidents because the threat landscape was not static. As attackers changed their tactics and adapted to new defenses, insurers found that the past trends were not always reliable indicators of what future cybersecurity incidents would look like. Most notably, in 2019 and 2020, insurers experienced a huge spike in ransomware claims that they had not anticipated, leading them to double and triple premiums for many policyholders in order to keep pace with the claims they faced.

Many AI incidents, like cybersecurity incidents, are not required by law to be reported and are therefore probably not made public. This is not uniformly true of all AI risks, of course. For instance, car crashes and other incidents with visible, physical consequences are very public and difficult—if not impossible—to keep secret. For these types of risks, especially if they occur at a high enough frequency to allow for the collection of robust data sets, insurers may be able to build reliable predictive models. However, many other types of risks associated with AI systems—including those linked to agentic and generative AI—are not easily observable by the outside world. And in some cases, it may be difficult, or even impossible, to know what role AI has played in an incident. If an attacker uses a generative AI tool to identify a software vulnerability and write malware to exploit that vulnerability, for instance, the victim and their insurer may never know what role AI played in the incident. This means that insurers will struggle to collect consistent or comprehensive historical data sets about these risks.

AI risks, too, may change over time, just as cyber risks do. Here, again, this is not equally true of all AI risks. While cybersecurity incidents almost always involve some degree of adversarial planning—an attacker trying to compromise a computer system and adapting to safeguards and new technological developments—the same is not true of all AI incidents, which can result from errors or limitations in the technology itself rather than any deliberate manipulation. But there are deliberate attacks on AI systems that insurers may struggle to predict using historical data—and even the incidents that are accidental rather than malicious may evolve considerably over time given how quickly AI systems are changing and being applied to new areas. All of these challenges suggest that insurers will have a hard time modeling these types of AI risks and will therefore struggle to price them, just as they have with cyber risks.
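A toy frequency-severity simulation illustrates the problem. The numbers below are entirely invented, and this is not how any insurer actually models cyber or AI risk—but it shows how a premium calibrated to a few years of sparse historical claims can badly underprice the book once the loss landscape shifts, as it did when ransomware surged.

```python
# Illustrative frequency-severity sketch: a premium estimated from a few years
# of sparse historical claims underprices the book once the loss landscape
# shifts. All numbers are invented.
import random

random.seed(1)

def simulate_annual_loss(freq_per_year: float, mean_severity: float) -> float:
    # Crude stand-in for an actuarial model: monthly Bernoulli frequency,
    # exponential severity.
    n_incidents = sum(random.random() < freq_per_year / 12 for _ in range(12))
    return sum(random.expovariate(1 / mean_severity) for _ in range(n_incidents))

# "Historical" regime: rare, moderate incidents observed over five years.
history = [simulate_annual_loss(0.3, 200_000) for _ in range(5)]
premium = sum(history) / len(history) * 1.3          # average observed loss plus a 30% load

# Regime shift: incidents become roughly 10x more frequent and 5x more severe.
post_shift = [simulate_annual_loss(3.0, 1_000_000) for _ in range(10)]
average_loss_now = sum(post_shift) / len(post_shift)

print(f"premium set from history:        ${premium:,.0f}")
print(f"average annual loss after shift: ${average_loss_now:,.0f}")
```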

Difficulty of Risk Assessments

Another major challenge insurers have encountered in the cyber insurance industry is how to assess whether a company has done a good job of protecting itself against cyber threats. The industry standard for these assessments is the long questionnaire that companies fill out about their security posture, which often fails to capture key technical nuances about how safeguards like encryption and multi-factor authentication are implemented and configured. This makes it difficult for insurers to link premiums to their policyholders’ risk exposure, because they don’t have any good way of measuring that exposure. So instead, most premiums are set according to how much revenue a company generates or what industry sector it operates in. This means that companies often aren’t rewarded with lower premiums for investing in more security safeguards and therefore have little incentive to make those investments.
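The incentive problem is easy to see with invented numbers: if premiums are keyed to revenue and sector rather than measured risk, a firm that spends heavily on security pays exactly the same as one that spends almost nothing.

```python
# Toy illustration of the incentive problem: revenue-based pricing charges the
# careful and careless firm identically, so security spending isn't rewarded.
# All figures are invented.
RATE_PER_DOLLAR_REVENUE = 0.001      # flat rate for the firm's industry sector

firms = {
    "firm_a": {"revenue": 50_000_000, "security_spend": 2_000_000},
    "firm_b": {"revenue": 50_000_000, "security_spend": 50_000},
}

for name, f in firms.items():
    premium = f["revenue"] * RATE_PER_DOLLAR_REVENUE    # risk exposure never enters
    print(f"{name}: premium ${premium:,.0f} despite security spend ${f['security_spend']:,.0f}")
```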

A similar—and arguably greater—challenge exists for assessing organizations’ exposure to AI risks. AI risks are so varied and AI systems are so complex that identifying all of the relevant risks and auditing all of the technical components and code related to those risks requires technical experts that most insurers are unlikely to have in-house. While insurers may try partnering with tech firms to perform these assessments—as they have in the past for cybersecurity assessments—they will also probably face pressure from brokers and clients to keep the assessment process lightweight and non-intrusive to avoid losing customers to their competitors. This has certainly been the case in the cyber insurance market, where many carriers continue to rely on questionnaires instead of other, more accurate assessment methods in order to avoid upsetting their clients. 

But if insurers can’t assess their customers’ risk exposure, then they can’t help drive down that risk by rewarding the firms who have done the most to reduce their risk with lower premiums. To the contrary, this method of measuring and pricing risk signals to insureds that investments in risk mitigation are not worthwhile, since such efforts have little effect on premiums and primarily benefit insurers by reducing their exposure. This is yet another reason to be cautious about the potential for insurers to help make AI systems safer and more secure.

Uncertainty About Risk Mitigation Best Practices

Figuring out how to assess cyber risk exposure is not the only challenge insurers encountered when it came to underwriting cyber insurance. They also struggled with figuring out what safeguards and security controls they should demand of their policyholders. While many insurers require common controls like encryption, firewalls, and multi-factor authentication, they often lack good empirical evidence about which of these security measures are most effective. Even in their own claims data sets, insurers don’t always have reliable information about which safeguards were or were not in place when incidents occurred, because the very lawyers insurers supply to oversee incident investigations sometimes don’t want that information recorded or shared for fear of it being used in any ensuing litigation.

The uncertainty about which best practices insurers should require from their customers is even greater when it comes to measures aimed at making many types of AI systems safer and more secure. There is little consensus about how best to do that beyond some broad ideas about audits, transparency, testing, and red teaming. If insurers don’t know which safeguards or security measures are most effective, then they may not require the right ones, further weakening their ability to reduce risk for their policyholders.

Catastrophic Risk

One final characteristic that AI and cyber risks share is the potential for large-scale, interconnected incidents—catastrophic risks—that generate more damage than insurers can cover. In cyber insurance, the potential for catastrophic losses stems in part from the fact that organizations rely on a fairly centralized set of software providers, cloud providers, and other computing infrastructure. This means that an attack on the Windows operating system or on Amazon Web Services could cause major damage to an enormous number of organizations across every country and industry sector, creating potentially huge losses for insurers, who would have no way to meaningfully diversify their risk pools. This has led cyber insurers and reinsurers to be relatively cautious about how much cyber risk they underwrite and to maintain high deductibles for these policies. 

AI foundation models and infrastructure are similarly concentrated in a small number of companies, meaning that an incident affecting one model could have similarly far-reaching consequences. Future AI systems may also pose a variety of catastrophic risks, such as the possibility of these systems turning against humans or causing major physical accidents. Such catastrophic risks pose particular challenges for insurers and can make them more wary of offering large policies, which may in turn lead some companies to discount these risks entirely notwithstanding the prospect of liability. 

Liability Limitation or Risk Reduction?

In general, the cyber insurance example suggests that when it comes to dealing with risks for which we do not have reliable data sets, cannot assess firms’ risk levels, do not know what the most effective safeguards are, and have some potential for catastrophic consequences, insurers will end up helping their customers limit their liability but not actually reduce their risk exposure. For instance, in the case of cyber insurance, this may mean involving lawyers early in the incident response process so that any relevant information is shielded against discovery in future litigation—but not actually meaningfully changing the preventive security controls firms have in place to make incidents less likely to occur. 

It is easy to imagine that imposing greater liability on AI companies could produce a similar outcome, where insurers intervene to help reduce that liability—perhaps by engaging legal counsel or mandating symbolic safeguards aimed at minimizing litigation or regulatory exposure—without meaningfully improving the safety or security of the underlying AI systems. That’s not to say insurers won’t play an important role in covering certain types of AI risks, or in helping pool risks for new types of AI systems. But it does suggest they will be able to do little to incentivize tech companies to put better safeguards in place for many of their AI systems.

That’s why California is wise to be focusing on reporting and transparency rather than liability in its new law. Requiring companies to report on risks and incidents can help build up data sets that enable insurers and governments to do a better job of measuring risks and the impact of different policy measures and safeguards. Of course, regulators face many of the same challenges that insurers do when it comes to deciding which safeguards to require for high-risk AI systems and how to mitigate catastrophic risks. But at the very least, regulators can help build up more robust data sets about the known risks associated with AI, the safeguards that companies are experimenting with, and how well they work to prevent different types of incidents. 

That type of regulation is badly needed for AI systems, and it would be a mistake to assume that insurers will take on the role of data collection and assessment themselves, when we have seen them try and fail to do that for more than two decades in the cyber insurance sector. The mandatory reporting for cybersecurity incidents that will go into effect next year under CIRCIA could have started twenty years ago if regulators hadn’t assumed that the private sector—led by insurers—would be capable of collecting that data on its own. And if it had started twenty years ago, we would probably know much more than we do today about the cyber threat landscape and the effectiveness of different security controls—information that would itself lead to a stronger cyber insurance industry. 

If regulators are wise, they will learn the lessons of cyber insurance and push for these types of regulations early on in the development of AI rather than focusing on imposing liability and leaving it in the hands of tech companies and insurers to figure out how best to shield themselves from that liability. Liability can be useful for dealing with some AI risks, but it would be a mistake not to recognize its limits when it comes to making emerging technologies safer and more secure.

Building AI surge capacity: mobilizing technical talent into government for AI-related national security crises

OUP book: Architectures of global AI governance