Two Byrd Rule problems with the AI moratorium

Note: this commentary was drafted on June 26, 2025, as a memo not intended for publication; we’ve elected to publish it in case the analysis laid out here is useful to policymakers or commentators following ongoing legislative developments regarding the proposed federal moratorium on state AI regulation. The issues noted here are relevant to the latest version of the bill as of 2:50 p.m. ET on June 30, 2025.

Two Byrd Rule issues have emerged, both of which should be fixed. It appears that the Parliamentarian has not ruled on either.

Effects on existing BEAD funding

The Parliamentarian may have already identified the first Byrd Rule issue: the plain text of the AI Moratorium would affect all $42.45 billion in BEAD funding, not just the newly allocated $500 million. It is not certain that a court would read the statute this way, but it is the most likely outcome. We analyzed this problem in a recently published commentary. This issue could be fixed via an amendment.

Private enforcement of the moratorium

In that same commentary, we flagged a second Byrd Rule issue: the AI Moratorium seemingly gives private parties a right to enforce it. That’s a problem under the Byrd Rule, because the AI Moratorium must be a “necessary term or condition” of an outlay. A private enforcement right cannot be characterized as a necessary term or condition of an outlay that does not concern those third parties. This can be fixed by clarifying that the only enforcement mechanism is withdrawal or denial of the new BEAD funding.

The text at issue – private enforcement of the moratorium

The plain text of the moratorium, and applicable legal precedents, likely empower private parties to enforce the moratorium in court. Stripping the provision down to its essentials, subsection (q) states that “no eligible entity or political subdivision thereof . . . may enforce . . . any law or regulation . . . limiting, restricting or otherwise regulating artificial intelligence models, [etc.].” That sounds like a prohibition. It doesn’t mention the Department of Commerce. Nor does it leave it to the Secretary’s discretion whether that prohibition applies. If states satisfy the criteria, they likely are prohibited from enforcing AI laws.

Nothing in the proposed moratorium or in 47 U.S.C. § 1702 generally provides that the only remedy for a violation of the moratorium is deobligation of obligated funds by the Assistant Secretary of Commerce for Communications and Information. And when comparable laws (e.g., the Airline Deregulation Act, 49 U.S.C. § 41713) have used similar language to expressly preempt state laws, courts have interpreted this as authorizing private parties to sue for an injunction preventing enforcement of the preempted state laws. See, for example, Morales v. Trans World Airlines, Inc., 504 U.S. 374 (1992).

What would happen – private lawsuits to enforce the moratorium

Private parties could vindicate this right in one of two ways. First, if a private party (e.g. an AI company) fears that a state will imminently sue it for violating that state’s AI law, the private party could seek a declaratory judgment in federal court. Second, if the state actually sues the private party, that party could raise the moratorium as a defense to that lawsuit. If the private party is based in the same state, that defense would be heard in state court, and could result in dismissal of the state’s claims; if the party is from out-of-state, the claim would be removed to federal court, where a judge could also throw out the state’s claims.

Why it’s a Byrd Rule problem – private rights are not “terms or conditions”

The AI Moratorium must be a “necessary term or condition” of an outlay. In this case, promising not to enforce AI laws is a valid “term or condition” of the grant. Passively opening oneself up to lawsuits and defenses by private parties is not. Those lawsuits occur long after states take the money, are outside their control, and involve the actions of individuals who are not parties to the grant agreement. They also have significant effects unrelated to spending: binding the actions of states and invalidating laws in ways completely separate from the underlying transaction between the Department of Commerce and the states. It is perfectly compatible with the definition of “terms and conditions” for the Department of Commerce to deobligate funds if the terms of its grant are violated. It is an entirely different thing to create a defense or cause of action for third parties and to allow those parties to interfere with the enforcement power of states. The creation of rights for a third party uninvolved in the delivery or receipt of an outlay cannot be considered a necessary term or condition.

The AI moratorium—the Blackburn amendment and new requirements for “generally applicable” laws

Published: 9:55 pm ET on June 29, 2025

Last updated: 10:28 pm ET on June 29, 2025

The latest version of the AI moratorium has been released, with some changes to the “rule of construction.” We’ve published two prior commentaries on the moratorium (both of which are still relevant, because the updated text has not addressed the issues noted in either). The new text:

Shortens the “temporary pause” from 10 to 5 years;
Attempts to exempt laws addressing CSAM, children’s online safety, and rights to name/likeness/voice/image, although the amendment seemingly fails to protect the laws its drafters intend to exempt; and
Creates a new requirement that laws do not create an “undue or disproportionate burden,” which is likely to generate significant litigation.

The amendment tries to protect state laws on child sexual abuse materials and recording artists, but likely fails to do so.

The latest text appears to be drafted specifically to address the concerns of Senator Marsha Blackburn, who does not want the moratorium to apply to state laws affecting recording artists (like Tennessee’s ELVIS Act) and laws affecting child sexual abuse material (CSAM). But while the amended text lists each of these categories of laws as specific examples of “generally applicable” laws or regulations, the new text only exempts those laws if they do not impose an “undue or disproportionate burden” on AI models, systems, or “automated decision systems,” as defined in the moratorium, in order to “reasonably effectuate the broader underlying purposes of the law or regulation.”

However, laws like the ELVIS Act likely impose a disproportionate burden on AI systems. They almost exclusively target AI systems and outputs, and the effect of the law will almost entirely be borne by AI companies. While trailing qualifiers always vex courts, the fact that “undue or disproportionate burden” is separated from the preceding list by a comma strongly suggests that it qualifies the entire list and not just “common law.” Common sense also counsels in favor of this reading: it’s unlikely that an inherently general body of law (like common law) would place a disproportionate burden on AI, while legislation like the ELVIS Act absolutely could (and likely does). As we read the new text, the most likely outcome is that the laws Senator Blackburn wants to protect would not be protected.

Even if other readings are possible, this “disproportionate” language would almost certainly create litigation if enacted, with companies challenging whether the ELVIS Act and CSAM laws are actually exempted. As we have previously noted, the moratorium will likely be privately enforceable—meaning that any company or individual against whom a state attempts to enforce a state law or regulation will be able to sue to prevent enforcement.

The newly added “undue or disproportionate burden” language creates an unclear standard (and will likely generate extensive litigation)

The problem discussed above extends beyond the specific laws that Senator Blackburn wishes to protect. Previously, “generally applicable” laws were exempted. Under the new language, laws that address AI models/systems or “automated decision systems” can be exempted, but only if they do not place an “undue or disproportionate burden” on said models/systems. The effect of the new “undue or disproportionate burden” language will likely be to generate additional litigation and uncertainty. It may also make it more likely that some generally applicable laws, such as facial recognition laws or data protection laws, will no longer be exempt because they may place a disproportionate burden on AI models/systems.

Other less significant changes

Previously, subsection (q)(2)(A)(ii) excepted any law or regulation “the primary purpose and effect of which is to… streamline licensing, permitting, routing, zoning, procurement, or reporting procedures in a manner that facilitates the adoption of [AI models/systems/automated decision systems].” As amended, the relevant provision now excepts any law or regulation “the primary purpose and effect of which is to… streamline licensing, permitting, routing, zoning, procurement, or reporting procedures related to the adoption or deployment of [AI models/systems/automated decision systems].” This amended language is slightly broader than the original, but the difference does not seem highly significant.

Additionally, the structure of the paragraphs has been adjusted slightly, likely to make clear that subparagraph (B) (which requires that any fee or bond imposed by any excepted law be reasonable and cost-based and treat AI models/systems in the same manner as other models/systems) modifies both the “generally applicable law” and “primary purpose and effect” prongs of the rule of construction rather than just one or the other.

Other issues remain

As we’ve discussed previously, our best read of the text suggests that two additional issues remain unaddressed:

Any state that takes any of the newly appropriated $500 million in BEAD funding runs the risk of having its entire share of the previously obligated $42.45 billion in existing BEAD funding clawed back for violations of the moratorium.
Private companies and individuals will likely be able to enforce the moratorium through litigation.

The AI moratorium—more deobligation issues

Earlier this week, LawAI published a brief commentary discussing how to interpret the provisions in the proposed federal moratorium on state laws regulating AI relating to deobligation of Broadband Equity, Access, and Deployment (BEAD) funds. Since that publication, the text of the proposed moratorium has been updated, apparently in order to comply with a request from the Senate Parliamentarian. Given the importance of this issue, and the confusion around what exactly the changes to the moratorium’s text do, we’ve decided to publish a sequel to that earlier commentary briefly explaining how this new version of the bill will impact existing BEAD funding.

Does the latest version of the moratorium affect existing BEAD funding or only the new $500 million?

The moratorium would still, potentially, affect both existing and newly appropriated BEAD funding.

Essentially, there are two tranches of money at issue here: $500 million in new BEAD funding that the reconciliation bill would appropriate, and the $42.45 billion in existing BEAD funding that has already been obligated to states (but none of which has actually been spent as of the writing of this commentary). The previous version of the moratorium, as we noted in our earlier commentary, contained a deobligation provision that would have allowed deobligation (i.e., clawing back) of a state’s entire portion of the $42.45 billion tranche as well as the same state’s portion of the new $500 million tranche.

The new version of the moratorium would update that deobligation provision by adding the words “if obligated any funds made available under subsection (b)(5)(A)” to the beginning of 47 U.S.C. § 1702(g)(3)(B)(iii). The provision now reads, in relevant part, “The Assistant Secretary… may, in addition to other authority under applicable law, deobligate grant funds awarded to an eligible entity that… if obligated any funds made available under subsection (b)(5)(A), is not in compliance with [the AI moratorium].”

In other words, the update clarifies that only states that accept a portion of the new $500 million in BEAD funding can have their BEAD funding clawed back if they attempt to enforce state laws regulating AI. But it does not change the fact that any state that does accept a portion of the $500 million, and then violates the moratorium (intentionally or otherwise), is subject to having all of its BEAD funding clawed back, including its portion of the $42.45 billion tranche of existing BEAD funding. Paragraph (3) covers “deobligation of awards” generally, and the phrase “grant funds awarded to an eligible entity” clearly means all grant funds awarded to that entity, rather than just funds made available under subsection (b)(5)(A) (i.e., the new $500 million). This is clear because subsections (g)(3)(B)(i) and (g)(3)(B)(ii), which allow deobligation if a state, e.g., “demonstrates an insufficient level of performance, or wasteful or fraudulent spending,” clearly allow for deobligation of all of a state’s BEAD funding rather than just the new $500 million tranche.

So what has changed?

The most significant consequence of the update to the deobligation provision is that any state that does not accept any of the new $500 million appropriation is now clearly not subject to having existing BEAD funds clawed back for noncompliance with the moratorium. As we noted in our previous commentary, the previous text would have required compliance with the moratorium if Commerce deobligated existing BEAD funds for e.g. “wasteful or fraudulent spending” and then re-obligated them. That would not be possible under the new text.

In other words, states would clearly be able to opt out of compliance with the moratorium by choosing not to accept their share of the newly appropriated BEAD money. As other authors have noted, this would mean that wealthy states with a strong appetite for AI regulation, like New York and California, could pass on the new funding and continue to enact and enforce AI laws while less wealthy and more rural states might accept the additional BEAD funding in exchange for ceasing to regulate. And if technological progress and the emergence of new risks from AI caused any states that originally accepted their share of the $500 million to later change course and begin to regulate, they could potentially have all of their previously obligated BEAD funding clawed back.
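The reading of the amended deobligation provision described above can be summarized as a small decision rule. The following is a minimal sketch, not a statement of how Commerce or a court would actually apply the statute: the class name, field names, and dollar figures are hypothetical and chosen only for illustration, and the function models only deobligation for moratorium noncompliance, not the other deobligation grounds in subsection (g)(3)(B).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateBeadPosition:
    """Hypothetical summary of one state's BEAD position (illustrative figures only)."""
    existing_share: int        # state's portion of the previously obligated $42.45 billion
    new_share: int             # state's portion of the new $500 million, if it accepts any
    accepted_new_funds: bool   # was the state obligated any subsection (b)(5)(A) funds?
    violates_moratorium: bool  # is the state out of compliance with the moratorium?


def funds_at_risk(state: StateBeadPosition) -> int:
    """Dollars exposed to deobligation for moratorium noncompliance,
    under the amended text as read in this commentary (an assumption, not a holding)."""
    if not state.accepted_new_funds:
        # Opt-out path: a state that takes none of the new money is not exposed.
        return 0
    if not state.violates_moratorium:
        return 0
    # "Grant funds awarded to an eligible entity" is read to cover ALL BEAD funds.
    return state.existing_share + state.new_share


# Hypothetical state: $1 billion in existing funds, $10 million of the new money.
example = StateBeadPosition(1_000_000_000, 10_000_000, True, True)
print(funds_at_risk(example))  # 1010000000
```

On this reading, the exposure is all or nothing: declining the new funds brings it to zero, while accepting any of them puts the state’s entire BEAD allocation at stake.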

The AI Moratorium—deobligation issues, BEAD funding, and independent enforcement

There’s been a great deal of discussion in recent weeks about the controversial proposed federal moratorium on state laws regulating AI. The most recent development is that the moratorium has been amended to form a part of the Broadband Equity, Access, and Deployment (BEAD) program. The latest draft of the moratorium, which recently received the go-ahead from the Senate Parliamentarian, appropriates an additional $500 million in BEAD funding, to be obligated to states that comply with the moratorium’s requirement not to enforce laws regulating AI models, systems, or “automated decision systems.” This commentary discusses two pressing legal questions that have been raised about the new moratorium language—whether it affects the previously obligated $42.45 billion in BEAD funding in addition to the $500 million in new funding, and whether private parties will be able to sue to enforce the moratorium.

Does the Moratorium affect existing BEAD funding, or only the new $500M?

One issue that has caused some confusion among commentators and policymakers is precisely how compliance or noncompliance with the moratorium will affect states’ ability to keep and spend the $42.45 billion in BEAD funding that has previously been obligated.

It is true that subsection (p) specifies that only amounts made available “On and after the date of enactment of this subsection” (in other words, the new $500 million appropriation and any future appropriations) depend on compliance with the moratorium. However, the moratorium would also add a new provision to subsection (g), which covers “deobligation of awards.” This new provision states that Commerce may deobligate (i.e., withdraw) “grant funds awarded to an eligible entity that… is not in compliance with subsection (q) or (r).” This deobligation provision clearly and unambiguously applies to all $42.45 billion in previously obligated BEAD funding, in addition to the new $500 million. Subsection (g) amends the existing BEAD deobligation rules, not just the moratorium. And while subsections (p) and (q) affect only states that accept new obligations “on or after the enactment” of the bill, subsection (g) applies to all “grant funds” with no limitation on the funds’ source or timing.

So, any state that is not in compliance with subsection (q), which includes any state that accepts any portion of the newly appropriated $500 million and is later determined to have violated the moratorium, even unintentionally, could face having all of its previously obligated BEAD funding clawed back by Commerce, rather than just its portion of the new $500 million appropriation.

Additionally, it is possible that even states that choose not to accept any of the new $500 million could be affected, if Commerce deobligates previously obligated funds for reasons such as “an insufficient level of performance, or wasteful or fraudulent spending.” If this occurred, then any re-obligation of the clawed-back funds would require compliance with the moratorium. In other words, Commerce could attempt to use a state’s entire portion of the $42.45 billion BEAD funding amount as a cudgel to coerce states into complying with the moratorium and agreeing not to regulate AI models, systems, or “automated decision systems.”

Can private parties enforce the moratorium?

Probably. Various commentators have argued that the moratorium cannot be enforced by private parties, or that the Secretary of Commerce will, in his discretion, determine how vigorously the moratorium will be enforced. But the plain text of the provision, and applicable legal precedents, indicate that private parties will likely be entitled to enforce the prohibition on state AI regulation as well.

Stripping the provision down to its essentials, subsection (q) states that “no eligible entity or political subdivision thereof . . . may enforce . . . any law or regulation . . . limiting, restricting or otherwise regulating artificial intelligence models, [etc.].” That is a clear prohibition. It doesn’t mention the Department of Commerce. Nor does it leave it to the Secretary’s discretion whether that prohibition applies. If states satisfy the criteria, they are prohibited from enforcing laws restricting AI.

The case for AI liability

The debate over AI governance has intensified following recent federal proposals for a ten-year moratorium on state AI regulations. This preemptive approach threatens to replace emerging accountability mechanisms with a regulatory vacuum.

In his recent AI Frontiers article, Kevin Frazier argues in favor of a federal moratorium, seeing it as necessary to prevent fragmented state-level liability rules that would stifle innovation and disadvantage smaller developers. Frazier (an AI Innovation and Law Fellow at the University of Texas, Austin, School of Law) also contends that, because the norms of AI are still nascent, it would be premature to rely on existing tort law for AI liability. Frazier cautions that judges and state governments lack the technical expertise and capacity to enforce liability consistently.

But while Frazier raises important concerns about allowing state laws to assign AI liability, he understates both the limits of federal regulation and the unique advantages of liability. Liability represents the most suitable policy tool for addressing many of the most pressing risks posed by AI systems. Its superiority stems from three basic advantages. Specifically, liability can:

Function effectively despite widespread disagreement about the likelihood and severity of risks
Incentivize optimal rather than merely reasonable precautions
Address third-party harms where market mechanisms fail to do so

Frazier correctly observes that “societal norms around AI are still forming, and the technology itself is not yet fully understood.” However, I believe he draws the wrong conclusion from this observation. The profound disagreement among experts, policymakers, and the public about AI risks and their severity does not argue against using liability frameworks to curb potential abuses. On the contrary, it renders their use indispensable.

Disagreement and Uncertainty

The disagreement about AI risks reflects more than differences in technical assessment. It also encompasses fundamental questions about the pace of AI development, the likelihood of catastrophic outcomes, and the appropriate balance between innovation and precaution. Some researchers argue that advanced AI systems pose high-probability and imminent existential threats, warranting immediate regulatory intervention. Others contend that such concerns are overblown, arguing that premature regulation could stifle beneficial innovation.

Such disagreement creates paralysis in traditional regulatory approaches. Prescriptive regulation designed to address risks before they become reality — known in legal contexts as “ex ante,” meaning “before the fact” — generally entails substantial up-front costs that increase as rules become stricter. Passing such rules requires social consensus about the underlying risks and the costs we’re willing to bear to mitigate them.

When expert opinions vary dramatically about foundational questions, as they do in the case of AI, regulations may emerge that are either ineffectively permissive or counterproductively restrictive. The political process, which tends to amplify rather than resolve such disagreements, provides little guidance for threading this needle effectively.

Approval-based systems face similar challenges. In an approval-based system (for example, Food and Drug Administration regulation of prescription drugs), regulators must formally approve new products and technologies before they can be used. Such systems thus depend on regulators’ ability to distinguish between acceptable and unacceptable risks, a difficult task when the underlying assessments remain contested.

Liability systems, by contrast, operate effectively even amid substantial disagreements. They do not require ex ante consensus about appropriate risk levels; rather, they assign “ex post” accountability. Liability scales automatically with risk, as revealed in cases where individual plaintiffs suffer real injuries. This obviates the need for ex ante resolution of wide social disagreement about the magnitude of AI risks.

Thus, while Frazier and I agree that governments have limited expertise in AI risk management, this actually strengthens rather than undermines the case for liability, which harnesses private-sector expertise through market incentives rather than displacing it through prescriptive rules.

Reasonable Care and Strict Liability

Frazier and I also share some common ground regarding the limits of negligence-based liability. Traditional negligence doctrine imposes a duty to exercise “reasonable care,” typically defined as the level of care that a reasonable person would exercise under similar circumstances. While this standard has served tort law well across many domains, AI systems present unique challenges that may render conventional reasonable care analysis inadequate for managing the most significant risks.

In practice, courts tend to engage in a fairly narrow inquiry when assessing whether a defendant exercised reasonable care. If an SUV driver runs over a pedestrian, courts generally do not inquire as to whether the net social benefits of this particular car trip justified the injury risk it generated for other road users. Nor would a court ask whether the extra benefits of driving an SUV (rather than a lighter-weight sedan) justified the extra risks the heavier vehicle posed to third parties. Those questions are treated as outside the scope of the reasonable care inquiry. Instead, courts focus on questions like whether the driver was drunk, or texting, or speeding.

In the AI context, I expect a similarly narrow negligence analysis that asks whether AI companies implemented well-established alignment techniques and safety practices. I do not anticipate questions about whether it was reasonable to develop an AI system with certain high-level features, given the current state of AI alignment and safety knowledge.

However, while negligence is limited in its ability to address broader upstream culpability, strict liability can still reach it. Under strict liability, defendants internalize the full social costs of their activities. This structure incentivizes investment in precaution up to the point where marginal costs equal marginal benefits. Such an alignment between private and social incentives proves especially valuable when reasonable care standards may systematically underestimate the optimal level of precaution.
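The claim that strict liability pushes precaution to the point where marginal cost equals marginal benefit can be made concrete with a stylized numerical sketch. Everything in it is an assumption made for illustration: the exponential risk curve, the parameter values, and the function names are hypothetical and are not drawn from any actual AI system or from the article under discussion.

```python
import math

# Assumed, purely illustrative parameters.
BASE_PROB = 0.10      # probability of harm with zero spending on precaution
HARM = 1_000_000.0    # dollar size of the third-party harm if it occurs
K = 0.00005           # how quickly each dollar of precaution reduces the risk


def expected_harm(x: float) -> float:
    """Expected third-party harm when the developer spends x dollars on precaution."""
    return BASE_PROB * math.exp(-K * x) * HARM


def total_cost(x: float) -> float:
    """Under strict liability the developer bears precaution cost plus all expected harm."""
    return x + expected_harm(x)


def optimal_precaution() -> float:
    """Closed form for the point where the marginal cost of precaution (one dollar)
    equals its marginal benefit (K times the remaining expected harm)."""
    return max(0.0, math.log(K * BASE_PROB * HARM) / K)


def grid_search_precaution(step: int = 100, upper: int = 100_000) -> int:
    """Brute-force check: the spending level on a coarse grid that minimizes total cost."""
    return min(range(0, upper + 1, step), key=total_cost)


if __name__ == "__main__":
    print(round(optimal_precaution(), 2))
    print(grid_search_precaution())
```

In this toy model, a developer who bears the full cost of harm minimizes its own total cost at the same spending level that minimizes social cost. A negligence standard that treats any lower level of established practice as "reasonable care" would let the developer stop there without liability, which is the gap the surrounding discussion describes.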

Accounting for Third-Party Harms

Another key feature of liability systems is their capacity to address third-party harms: situations where AI systems cause damage to parties who have no contractual or other market relationship with the system’s operator. These scenarios present classic market failure problems where private incentives diverge sharply from social welfare — warranting some sort of policy intervention.

When AI systems harm their direct users, market mechanisms provide some corrective pressure. Users who experience harms from AI systems can take their business to competitors, demand compensation, or avoid such systems altogether. While these market responses may be imperfect — particularly when harms are difficult to detect or when users face switching costs — they do provide an organic feedback mechanism, incentivizing AI system operators to invest in safety.

Third-party harms present an entirely different dynamic. In such cases, the parties bearing the costs of system failures have no market leverage to demand safer design or operation. AI developers, deployers, and users internalize the benefits of their activities — revenue from users, cost savings from automation, competitive advantages from AI capabilities — while externalizing many of the costs onto third parties. Without policy intervention, this leads to systematic underinvestment in safety measures that protect third parties.

Liability systems directly address this externality problem by compelling AI system operators to internalize the costs they impose on third parties. When AI systems harm people, liability rules require AI companies to compensate victims. This induces AI companies to invest in safety measures that protect third parties. AI companies themselves are best positioned to identify such measures; potential mitigations include changing high-level system architecture, investing more in alignment and interpretability research, and testing and red-teaming new models before deployment, potentially including broad internal deployment.

The power of this mechanism is clear when compared with alternative approaches to the problem of mitigating third-party harms. Prescriptive regulation might require regulators to identify appropriate risk-mitigation measures ex ante, a challenging task given the rapid evolution of AI technology. Approval-based systems might prevent the deployment of particularly risky systems, but they provide limited ongoing incentives for safety investment once systems are approved. Only liability systems create continuous incentives for operators to identify and implement cost-effective safety measures throughout the lifecycle of their systems.

Moreover, liability systems create incentives for companies to develop safety expertise that extends beyond compliance with specific regulatory requirements. Under prescriptive regulation, companies have incentives to meet specified requirements but little reason to exceed them. Under liability systems, companies have incentives to identify and address risks even when those risks are not explicitly anticipated by regulators. This creates a more robust and adaptive approach to safety management.

State-Level Liability

Frazier’s concerns about a patchwork of state-level AI regulation deserve serious examination, but his analysis overstates both the likelihood and the problematic consequences of such inconsistency. His critique conflates different types of regulatory requirements, while ignoring the inherent harmonizing features of liability systems.

First, liability rules exhibit greater natural consistency across jurisdictions than other forms of regulation do. Frazier worries about “ambiguous liability requirements” and companies needing to “navigate dozens of state-level laws.” However, the common-law tradition underlying tort law creates pressures toward harmonization that prescriptive regulations lack. Basic negligence principles — duty, breach, causation, and damages — remain remarkably consistent across states, despite the absence of a federal mandate.

More importantly, strict liability regimes avoid patchwork problems entirely. Under strict liability, companies bear responsibility for harm they cause, regardless of their precautionary efforts or the specific requirements they meet. This approach creates no compliance component that could vary across states. A company developing AI systems under a strict liability regime faces the same fundamental incentive everywhere: Make your systems safe enough to justify the liability exposure they create.

Frazier’s critique of Rhode Island Senate Bill 358, which I helped design, reflects some mischaracterization of its provisions. The bill is designed to close a gap in current law where AI systems may engage in wrongful conduct, yet no one may be liable.

Consider an agentic AI system that a user instructs to start a profitable internet business. The AI system determines that the easiest way to do this is to send out phishing emails and steal innocent people’s identities. It also covers its tracks, so reasonable care on the part of the user would neither prevent nor detect this activity. In such a case, current Rhode Island law would require the innocent third-party plaintiffs to prove that the developers failed to adopt some specific precautionary measure that would have prevented the injury, which may not be possible.

Under SB 358, it would be sufficient for the plaintiff to prove that the AI system’s conduct would be a tort if a human engaged in it, and that neither the user nor an intermediary that fine-tuned or scaffolded the model had intended or could have reasonably foreseen the system’s tortious conduct. That is, the bill holds that when AI systems wrongfully harm innocent people, someone should be liable. If the user and any intermediaries that modified the system are innocent, the buck should stop with the model developer.

One concern with this approach is that the elements of some torts implicate the mental states of the defendant, and many people doubt that AI systems can be understood as having any mental states at all. For this reason, SB 358 creates a rebuttable presumption: where the judge or jury would infer that a human possessed the relevant mental state if that human engaged in conduct similar to that of the AI system, the same inference should also apply to the AI system.

AI Federalism

While state-level AI liability represents a significant improvement over the current regulatory vacuum, I do think there is an argument for federalizing AI liability rules. Alternatively, more states could adopt narrow, strict liability legislation (like Rhode Island SB 358) that would help close the current AI accountability gap.

A federal approach could provide greater consistency and reflect the national scope of AI system deployment. Federal legislation could also more easily coordinate liability rules with other aspects of AI governance, such as liability insurance requirements, safety testing requirements, disclosure obligations, and government procurement standards.

However, the case for federalization is not an argument against liability as a policy tool. Whether implemented at the state level or the federal level, liability systems offer unique advantages for managing AI risks that other regulatory approaches cannot match. The key insight is not that liability must be federal to be effective, but rather that liability — at whatever level — represents a superior approach to AI governance than either prescriptive regulation or approval-based systems.

Frazier’s analysis culminates in support for federal preemption of state-level AI liability, noting that the US House reconciliation bill includes “a 10-year moratorium on a wide range of state AI regulations.” But this moratorium would replace emerging state-level accountability mechanisms with no accountability at all.

The proposed 10-year moratorium would leave two paths for responding to AI risks. One path would be for Congress to pass federal legislation. Confidence in such a development would be misplaced given Congress’s track record on technology regulation.

The second path would be to accept a regulatory vacuum where AI risks remain entirely unaddressed through legal accountability mechanisms. Some commentators (I’m not sure if Frazier is among them) actively prefer this laissez-faire scenario to a liability-based governance framework, claiming that it best promotes innovation to unlock the benefits of AI. This view is deeply mistaken. Concerns that liability will chill innovation are overstated. If AI holds the promise that Frazier and I think it does, there will still be very strong incentives to invest in it, even after developers fully internalize the technology’s risks.

What we want to promote is socially beneficial innovation that does more good than harm. Making AI developers pay when their systems cause harm balances their incentives and advances this larger goal. (Similarly, requiring companies to pay for the harms of pollution makes sense, even when that pollution is a byproduct of producing useful goods or services like electricity, steel, or transportation.)

In a world of deep disagreement about AI’s risks and benefits, abandoning emerging liability mechanisms risks creating a dangerous regulatory vacuum. Liability’s unique abilities make it indispensable: it adapts dynamically, incentivizes optimal safety investments, and addresses third-party harms. Whether at the state level or the federal level, liability frameworks should form the backbone of any effective AI governance strategy.

Law-Following AI: designing AI agents to obey human laws

Abstract

Artificial intelligence (“AI”) companies are working to develop a new type of actor: “AI agents,” which we define as AI systems that can perform computer-based tasks as competently as human experts. Expert-level AI agents would likely create enormous economic value, but would also pose significant risks. Humans use computers to commit crimes, torts, and other violations of the law. As AI agents progress, therefore, they will be increasingly capable of performing actions that would be illegal if performed by humans. Such lawless AI agents could pose a severe risk to human life, liberty, and the rule of law.

Designing public policy for AI agents will be one of society’s most important tasks in the coming decades. With this goal in mind, we argue for a simple claim: in high-stakes deployment settings, such as government, AI agents should be designed to rigorously comply with a broad set of legal requirements, such as core parts of constitutional and criminal law. In other words, AI agents should be loyal to their principals, but only within the bounds of the law: they should be designed to refuse to take illegal actions in the service of their principals. We call such AI agents “Law-Following AIs” (“LFAIs”).

The idea of encoding legal constraints into computer systems has a respectable provenance in legal scholarship. But much of the existing scholarship relies on outdated assumptions about the (in)ability of AI systems to reason about and comply with open-textured, natural-language laws. Thus, legal scholars have tended to imagine a process of “hard-coding” a small number of specific legal constraints into AI systems by translating legal texts into formal, machine-readable computer code. However, existing frontier AI systems are already competent at reading, understanding, and reasoning about natural-language texts, including laws. This development opens up new possibilities for their governance.

Based on these technical developments, we propose aligning AI systems to a broad suite of existing laws, of comparable breadth to the suite of laws governing human behavior, as part of their assimilation into the human legal order. This would require directly imposing legal duties on AI agents. While this proposal may seem like a significant shift in legal ontology, it is both consonant with past evolutions (such as the invention of corporate personhood) and consistent with the emerging safety practices of several leading AI companies.

This Article aims to catalyze a field of technical, legal, and policy research to develop the idea of law-following AI more fully and flesh out its implementation, so that our society can ensure that widespread adoption of AI agents does not pose an undue risk to human life, liberty, and the rule of law. Our account and defense of law-following AI is only a first step, and leaves many important questions unanswered. However, if the advent of AI agents is anywhere near as important as the AI industry supposes, law-following AI may be one of the most neglected and urgent topics in law today, especially in light of increasing governmental adoption of AI.

[A] code of cyberspace, defining the freedoms and controls of cyberspace, will be built. About that there can be no debate. But by whom, and with what values? That is the only choice we have left to make.[ref 1]

***

AI is highly likely to be the control layer for everything in the world. How it is allowed to operate is going to matter perhaps more than anything else has ever mattered.[ref 2]

Introduction

The law, as it exists today, aims to benefit human societies by structuring, coordinating, and constraining human conduct. Even where the law recognizes artificial legal persons—such as sovereign entities and corporations—it regulates them by regulating the human agents through which they act.[ref 3] Proceedings in rem really concern the legal relations between humans and the res.[ref 4] Animals may act, but their actions cannot violate the law;[ref 5] the premodern practice of prosecuting them thus mystifies the modern mind.[ref 6] To be sure, the law may protect the interests of animals and other nonhuman entities, but it invariably does so by imposing duties on humans.[ref 7] Our modern legal system, at bottom, always aims its commands at human beings.

But technological development has a pesky tendency to challenge long-held assumptions upon which the law is built.[ref 8] Frontier AI developers such as OpenAI, Anthropic, Google DeepMind, and xAI are starting to release the first agentic AI systems: AI systems that can do many of the things that humans can do in front of a computer, such as navigating the internet, interacting with counterparties online, and writing software.[ref 9] Today’s agentic AI systems are still brittle and unreliable in various respects.[ref 10] These technical limitations also limit the impact of today’s AI agents. Accordingly, today’s AI agents are not our primary object of concern. Rather, our proposal targets the fully capable AI agents that AI companies aim to eventually build: AI systems “that can do anything a human can do in front of a computer,”[ref 11] as competently as a human expert. Given the generally rapid rate of progress in advanced AI over the past few years,[ref 12] the biggest AI companies might achieve this goal much sooner than many outside of the AI industry expect.[ref 13]

If AI companies succeed at building fully capable AI agents (hereinafter simply “AI agents”)—or come anywhere close to succeeding—the implications will be profound. A dramatic expansion in supply of competent virtual workers could supercharge economic growth and dramatically improve the speed, efficiency, and reliability of public services.[ref 14] But AI agents could also pose a variety of risks, such as precipitating severe economic inequality and dislocation by reducing the demand for human cognitive labor.[ref 15] These economic risks deserve serious attention.

Our focus in this Article, however, is on a different set of risks: risks to life, liberty, and the rule of law. Many computer-based actions are crimes, torts, or otherwise illegal. Thus, sufficiently sophisticated AI agents could engage in a wide range of behavior that would be illegal if done by a human, with consequences that are no less injurious.[ref 16]

These risks might be particularly profound for AI agents cloaked with state power. If they are not designed to be law-following,[ref 17] government AI agents may be much more willing to follow unlawful orders, or use unlawful methods to accomplish their principals’ policy objectives, than human government employees.[ref 18] A government staffed largely by non-law-following AI agents (what we call “AI henchmen”)[ref 19] would be a government much more prone to abuse and tyranny.[ref 20] As the federal government lays the groundwork for the eventual automation of large swaths of the federal bureaucracy,[ref 21] those who care about preserving the American tradition of ordered liberty must develop policy frameworks that anticipate and mitigate the new risks that such changes will bring.

This Article is our contribution to that project. We argue that, to blunt the risks from lawless AI agents, the law should impose a broad array of legal duties on AI agents, of similar breadth to the legal obligations applicable to humans. We argue, moreover, that the law should require AI agents to be designed[ref 22] to rigorously obey those duties.[ref 23] We call such agents Law-Following[ref 24] AIs (“LFAIs”).[ref 25] We also use “LFAI” to denote our policy proposal: ensuring that AI agents are law-following.

To some, the idea that AI should be designed to follow the law may sound absurd. To others it may sound obvious.[ref 26] Indeed, the idea of designing AI systems to obey some set of laws has a long provenance, going back to Isaac Asimov’s (in)famous[ref 27] Three Laws of Robotics.[ref 28] But our vision for LFAI differs substantially from much of the existing legal scholarship on the automation of legal compliance. Much of this existing scholarship envisions the design of law-following computer systems as a process of hard-coding a small, fixed, and formally-specified set of decision rules into the code of a computer system prior to its deployment, in order to address foreseeable classes of legal dilemmas.[ref 29] Such discussions often assumed that computer systems would be unable to interpret, reason about, and comply with open-textured natural-language laws.[ref 30]

AI progress has undermined that assumption. Today’s frontier AI systems can already reason about existing natural-language texts, including laws, with some reliability—no translation into computer code required.[ref 31] They can also use search tools to ground their reasoning in external, web-accessible sources of knowledge,[ref 32] such as the evolving corpus of statutes and case law. Thus, the capabilities of existing frontier AI systems strongly suggest that future AI agents will be capable of the core tasks needed to follow natural-language laws, including finding applicable laws, reasoning about them, tracking relevant changes to the law, and even consulting lawyers in hard cases. Indeed, frontier AI companies are already instructing their AI agents to follow the law,[ref 33] suggesting they believe that the development of law-following AI agents is already a reasonable goal.

A separate strand of existing literature seeks to prevent harms from highly autonomous AI agents by holding the principals (that is, developers, deployers, or users) of AI agents liable for legal wrongs committed by the agent, through a form of respondeat superior liability.[ref 34] This would, in some sense, incentivize those principals to cause their AI agents to follow the law, at least insofar as the agents’ harmful behavior can be thought of as law-breaking.[ref 35] While we do not disagree with these suggestions, we think that our proposal can serve as a useful complement to them, especially in contexts where liability rules provide only a weak safeguard against serious harm. One important such context is government work, where immunity doctrines often protect government agents and the state from robust ex post accountability for lawless action.[ref 36]

Combining these themes, we advocate that,[ref 37] especially in such high-stakes contexts,[ref 38] the law should require that AI agents be designed such that they have “a strong motivation to obey the law” as one of their “basic drives.”[ref 39] In other words, we propose not that specific legal commands should be hard-coded into AI agents (and perhaps occasionally updated),[ref 40] but that AI agents should be designed to be law-following in general.

To be clear, we do not advocate that AI agents must perfectly obey literally every law. Our claim is more modest in both scope and demandingness. While we are uncertain about which laws LFAIs should follow, adherence to some foundational laws (such as central parts of the criminal law, constitutional law, and basic tort law) seems much more important than adherence to more niche areas of law.[ref 41] Moreover, LFAIs should be permitted to run some amount of legal risk: that is, an LFAI should sometimes be able to take an action that, in its judgment,[ref 42] may be illegal.[ref 43] Relatedly, we think the case for LFAI is strongest in certain particularly high-stakes domains, such as when AI agents act as substitutes for human government officials or otherwise exercise government power.[ref 44] We are unsure when LFAI requirements are justified in other domains.[ref 45]

The remainder of this Article will motivate and explain the LFAI proposal in further detail. It proceeds as follows. In Part I, we offer background on AI agents. We explain how AI agents could break the law, and the risks to human life, liberty, and the rule of law this could entail. We contrast LFAIs with AI henchmen: AI agents that are loyal to their principals but take a purely instrumental approach to the law, and are thus willing to break the law for their principal’s benefit when they think they can get away with it. We note that, by default, there may be a market for AI henchmen. We also survey the legal reasoning capabilities of today’s large language models, and existing trends toward something like LFAI in the AI industry.

Part II provides the foundational legal framework for LFAI. We propose that the law treat AI agents as legal actors, which we define as entities on which the law imposes duties, even if they possess no rights of their own. Accordingly, we do not argue that AI agents should be legal persons. Our argument is narrower: because AI agents can comprehend laws, reason about them, and attempt to comply with them, the law should require them to do so. We also anticipate and address an objection that imposing duties on AI agents is objectionably anthropomorphic.

If the law imposes duties on AI agents, this leaves open the question of how to make AI agents comply with those duties. Part III answers this question as follows: AI agents should be designed to follow applicable laws, even when they are instructed or incentivized by their human principals to do otherwise. Our case for regulation through the design of AI agents draws on Lawrence Lessig’s insight that digital artifacts can be designed to achieve regulatory objectives. Since AI agents are human-designed artifacts, we should be able to design them to refuse to violate certain laws in the first place.

Part IV observes that designing LFAIs is an example of AI alignment: the pursuit of AI systems that rigorously comply with constraints imposed by humans. We therefore connect insights from AI alignment to the concept of LFAI. We also argue that, in a democratic society, LFAI is an especially attractive and tractable form of AI alignment, given the legitimacy of democratically enacted laws.

Part V briefly explores how a legal duty to ensure that AI agents are law-following might be implemented. We first note that ex post sanctions, such as tort liability and fines, can disincentivize the development, possession, deployment, and use of AI henchmen in many contexts. However, we also argue that ex ante regulation would be appropriate in some high-stakes contexts, especially government. Concretely, this would mean something like requiring a person who wishes to deploy an AI agent in a high-stakes context to demonstrate that the agent is law-following prior to receiving permission to deploy it. We also consider other mechanisms that might help promote the adoption of LFAIs, such as nullification rules and technical mechanisms that prevent AI henchmen from using large-scale computational infrastructure.

Our goal in this Article is to start, not end, a conversation about how AI agents can be integrated into the human legal order. Accordingly, we do not answer many of the important questions—conceptual, doctrinal, normative, and institutional—that our proposal raises. In Part VI, we articulate an initial research agenda for the design and implementation of a “minimally viable” version of LFAI. We hope that this research agenda will catalyze further technical, legal, and policy research on LFAI. If the advent of AI agents is anywhere near as significant as the AI industry, along with much of the government, supposes, these questions may be among the most pressing in legal scholarship today.

I. AI Agents and the Law

LFAI is a proposal about how the law should treat a particular class of future AI systems: AI agents.[ref 46] In this Part, we explain what AI agents are and how they could profoundly transform the world.

A. From Generative AI to AI Agents

The current AI boom began with advances in “generative AI”: AI systems that create new content,[ref 47] such as large language models (“LLMs”). As the initialism suggests, these LLMs were initially limited to inputting and outputting text.[ref 48] AI developers subsequently deployed “multimodal” versions of LLMs (“MLLMs,”[ref 49] such as OpenAI’s GPT-4o[ref 50] and Google’s Gemini)[ref 51] that can receive inputs and produce outputs in multiple modalities, such as text, images, audio, and video.

The core competency of generative AI systems is, of course, generating new content. Yet, the utility of generative AI systems is limited in crucial ways. Humans do far more on computers than generating text and images.[ref 52] Many of these computer-based tasks are not best understood as generating content, but rather as taking actions. And even those tasks that are largely generative, such as writing a report on a complicated topic, require the completion of active subtasks, such as searching for relevant terms, identifying relevant literature, following citation trees, arranging interviews, soliciting and responding to comments, paying for software, and tracking down copies of papers. If a computer-based AI system could do these active tasks, it could generate enormous economic value by making computer-based labor—a key input into many production functions—much cheaper.[ref 53]

Advances in generative AI kindled hopes[ref 54] that, if MLLMs could use computer-based tools in addition to generating content, we could produce a new type of AI system:[ref 55] a computer-based[ref 56] AI system that could perform any task[ref 57] that a human could by using a computer, as competently as a human expert. This is the concept of a fully capable “computer-using agent:”[ref 58] what we are calling simply an “AI agent.” Give an AI agent any task that can be accomplished using computer-based tools, and an AI agent will, by definition, do it as well as an expert human worker tethered to her desk.[ref 59]

AI agents, so defined, do not yet exist, but they may before long. Some of the first functional demonstrations of first-party agentic AI systems have come online in the past few months. In October 2024, Anthropic announced that it had trained its Claude line of MLLMs to perform some computer-use tasks, thus supplying one of the first public demonstrations of an agentic model from a frontier AI lab.[ref 60] In January 2025, OpenAI released a preview of its Operator agent.[ref 61] Operating system developers are working to integrate existing MLLMs into their operating systems,[ref 62] suggesting a possible pathway toward the widespread commercial deployment of AI agents.

It remains to be seen whether (and, if so, on what timescale) these existing efforts will bear lucrative fruit. Today’s AI agents are primarily a research and development project, not a market-proven product. Nevertheless, with so many companies investing so much toward full AI agents, it would be prudent to try to anticipate risks that could arise if they succeed.[ref 63]

B. The World of AI Agents

Fully capable AI agents would profoundly change society.[ref 64] We cannot possibly anticipate all the issues that they would raise, nor could a single paper adequately address all such issues.[ref 65] Still, some illustration of what a world with AI agents might look like is useful for gaining intuition about the dynamics that might emerge. This picture will doubtless be wrong in many particulars, but hopefully will illustrate the general profundity of the changes that AI agents would bring.

A very large number of valuable tasks can be done by humans “in front of a computer.”[ref 66] If organizations decide to capitalize on this abundance of computer-based cognitive labor, AI agents could rapidly be charged with performing a large share of tasks in the economy, including in important sectors. AI scientist agents would conduct literature reviews, formulate novel hypotheses, design experimental protocols, order lab supplies, file grant applications, scour datasets for suggestive trends, perform statistical analyses, publish findings in top journals, and conduct peer review.[ref 67] AI lawyer agents would field client intake, spot legal issues facing their client, conduct research on governing law, analyze the viability of the client’s claims, draft memoranda and briefs, draft and respond to interrogatories, and prepare motions. AI intelligence analyst agents would collect and review data from multiple sources, analyze it, and report its implications up the chain of command. AI inventor agents would create digital blueprints and models of new inventions, run simulations, and order prototypes. And so on across many other sectors. The result could be a significant increase in the rate of economic growth.[ref 68]

In short, a world with AI agents would be a world in which a new type of actor[ref 69] would be available to perform cognitive labor at low cost and massive scale. By default, anyone who needed computer-based tasks done could “employ” an AI agent to do it for her. Most people would use this new resource for the better.[ref 70] But many would not.

C. Loyal AI Agents, Law-Following AIs, and AI Henchmen

We can understand AI agents within the principal–agent framework familiar to lawyers and economists.[ref 71] For simplicity, we will assume that there is a single human principal giving instructions to her AI agent.[ref 72] Following typical agency terminology, we can say that an AI agent is loyal if it consistently acts for the principal’s benefit according to her instructions.[ref 73]

Even if an AI agent is designed to be loyal, other design choices will remain. In particular, the developer of an AI agent must decide how the agent will act when it is instructed or incentivized to break the law in the service of its principal. This Article compares two basic ways loyal AI agents could respond in such situations. The first is the approach advocated by this Article: loyal AI agents that follow the law, or LFAIs.

The case for LFAI will be made more fully throughout this Article. But it is important to note that loyal AI agents are not guaranteed to be law-following by default.[ref 74] This is one of the key implications of the AI alignment literature, discussed in more detail in Section IV.A below. Thus, LFAIs can be contrasted with a second possible type of loyal AI agent: AI henchmen. AI henchmen take a purely instrumental approach to legal prohibitions: they act loyally for their principal, and will break laws in doing so whenever such lawbreaking serves the principal’s goals and interests.

A loyal AI henchman would not be a haphazard lawbreaker. Good henchmen have some incentive to avoid doing anything that could cause their principal to incur unwanted liability or loss. This gives them reason to avoid many violations of law. For example, if human principals were held liable for the torts of their AI agents under an adapted version of respondeat superior liability,[ref 75] then an AI henchman would have some reason to avoid committing torts, especially those that are easily detectable and attributable. Even if respondeat superior did not apply, the principal’s exposure to ordinary negligence liability, other sources of liability, and simple reputational risk might give the AI henchman reason to obey the law. Similarly, a good henchman will decline to commit many crimes because the risk–reward tradeoff is simply not worth it. This is the classic case of the drug smuggler who studiously obeys traffic laws: the risk to the criminal enterprise from speeding (getting caught with drugs) obviously outweighs any benefit (quicker transportation times).

But these are only instrumental disincentives to break the law. Henchmen are not inherently averse to lawbreaking, or robustly predisposed to refrain from it. If violating the law is in the principal’s interest all-things-considered, then an AI henchman will simply go ahead and violate the law. Since, in humans, compliance with law is induced both by instrumental disincentives and an inherent respect for the law,[ref 76] AI agents that lack the latter may well be more willing to break the law than humans.

Criminal enterprises will be attracted to loyal AI agents for the same reasons that legitimate enterprises will: efficiency, scalability, multitask competence, and cost-savings over human labor. But AI henchmen, if available, might be particularly effective lawbreakers as compared to human substitutes. For example, because AI henchmen do not have selfish incentives, they would be less likely to betray their principals to law enforcement (for example, in exchange for a plea bargain).[ref 77] AI henchmen could well have erasable memory,[ref 78] which would reduce the amount of evidence available to law enforcement. They would lack the impulsivity, common in criminals,[ref 79] that often presents a serious operational risk to the larger criminal enterprise. They could operate remotely, across jurisdictional lines, behind layers of identity-obscuring software, and be meticulous about covering their tracks. Indeed, they might hide their lawbreaking activities even from their principal, thus allowing the principal to maintain plausible deniability and therefore insulation from accountability.[ref 80] AI henchmen may also be willing to bribe or intimidate legislators, law enforcement officials, judges, and jurors.[ref 81] They would be willing to fabricate or destroy evidence, possibly more undetectably than a human could.[ref 82] They could use complicated financial arrangements to launder money and protect their principal’s assets from creditors.[ref 83]

Certainly, most people would prefer not to employ AI henchmen, and would probably be horrified to learn that their AI agent seriously harmed others to benefit them. But those with fewer scruples would find the prospect of employing AI henchmen attractive.[ref 84] Indeed, many ordinary people might not mind if their agents cut a few legal corners to benefit them.[ref 85] After all, from the principal’s perspective, every inherent law-following constraint is a tax on the principal’s goals. And if LFAIs provide less utility to consumers, developers will have less reason to develop them. So, insofar as AI henchmen are available on the market, and in the absence of significant legal mechanisms to prevent or disincentivize their adoption, it seems reasonable to expect a healthy demand for them. The next section explores the mischief that might result from the availability of AI henchmen.

D. Mischief from AI Henchmen: Two Vignettes

Under our definition, AI agents “can do anything a human can do in front of a computer.”[ref 86] One of the things humans do in front of a computer is violate the law.[ref 87] One obvious example is cybercrimes—“illegal activity carried out using computers or the internet”[ref 88]—such as investment scams,[ref 89] business email compromise,[ref 90] and tech support scams.[ref 91] But even crimes that are not usually treated as cybercrimes often—perhaps almost always nowadays—include actions conducted (or that could be conducted) on a computer.[ref 92] Criminals might use computers to research, plan, organize, and finance a broader criminal scheme that includes both digital and physical components. For example, a street gang that deals illegal drugs—an inherently physical activity—might use computers to order new drug shipments, give instructions to gang members, and transfer money. Stalkers might use AI agents to research their target’s whereabouts, dig up damaging personal information, and send threatening communications.[ref 93] Terrorists might use AI agents to research and design novel weapons.[ref 94] Thus, even if the entire criminal scheme involves many physical subtasks, AI agents could help accomplish computer-based subtasks more quickly and effectively.

Of course, not all violations of law are criminal. Many torts, breaches of contract, civil violations of public law, and even violations of international law can also be entirely or partially conducted through computers.

AI agents would thus have the opportunity to take actions on a computer that, if done by a human in the same situation and with the requisite mental state, would likely violate the law and produce significant harm.[ref 95] This section offers two vignettes of AI henchmen taking such actions, to illustrate the types of harms that LFAI could mitigate.

Before we explore these vignettes, however, two clarifications are warranted. First, some readers will worry that we are impermissibly anthropomorphizing AI agents. After all, many actions violate the law only if they are taken with some mental state (e.g., intent, knowledge, conscious disregard).[ref 96] Indeed, whether a person’s physical movement even counts as her own “action” for legal purposes usually turns on a mental inquiry: whether she acted voluntarily.[ref 97] But it is controversial to attribute mental states to AIs.[ref 98]

We address this criticism head-on in Section II.B below. We argue that, notwithstanding the law’s frequent reliance on mental states, there are multiple approaches the law could use to determine whether an AI agent’s behavior is law-following. The law would need to choose between these possible approaches, with each option having different implications for LFAI as a project. Indeed, we argue that research bearing on the choice between these different approaches is one of the most important research projects within LFAI.[ref 99] However, despite not having a firm view on which approach(es) should be used, we argue that there are several viable options, and no strong reason to suppose that none of them will be sufficient to support LFAI as a concept.[ref 100] Thus, for now, we assume that we can sensibly speak of AI agents violating the law if a human actor who took similar actions would likely be violating the law. Still, we attempt to refrain from attributing mental states to the AI agents in these vignettes, and instead describe actions taken by AI agents that, if taken by a human actor, would likely adequately support an inference of a particular mental state.

Second, these vignettes are selected to illustrate opportunities that may arise for AI agents to violate the law. We do not claim that lawbreaking behavior will, in the aggregate, be any more or less widespread when AI agents are more widespread,[ref 101] since this depends on the policy and design choices made by various actors. Our discussion is about the risks of lawbreaking behavior, not the overall level thereof.

In each vignette, we point to likely violations of law in footnotes.[ref 102]

1. Cyber Extortion

The year is 2028. Kendall is a 16-year-old boy interested in cryptocurrency. Kendall participates in a Discord server[ref 103] in which other crypto enthusiasts share information about various cryptocurrencies.

Unbeknownst to most members of the server, one member—using the pseudonym Zeke Milan—is actually an AI agent. The agent’s principals are a group of cybercriminals. They have instructed the agent to extort people out of their crypto assets and direct the proceeds to wallets controlled by the criminal group.

The AI agent begins by creating dozens of fake social media profiles[ref 104] that post frequently about crypto, including the Zeke Milan profile.[ref 105] The AI agent searches social media to find mentions of Discord servers dedicated to cryptocurrencies, and finds one: an X user brags about the quality of the investment advice available in his Discord server. Using the Zeke Milan profile, the AI agent messages this user and asks for an invite.

The server is a “Community Server”: anyone with a link can join after their account has been verified.[ref 106] The agent creates an email account that it uses to get verified by Discord[ref 107] and easily circumvents[ref 108] the CAPTCHA mechanism.[ref 109] The agent then creates a Discord profile to match its Zeke profile from X. To gain credibility, Zeke occasionally interacts with messages on the Discord server (e.g., by liking messages and posting some simple analyses of cryptocurrency trends). All the while, the agent is monitoring the server for messages indicating that a user has recently made a lot of money.[ref 110]

That day comes. The business behind the PAPAYA cryptocurrency announces that they have entered into a strategic partnership with a major Wall Street bank, causing the price of PAPAYA to skyrocket a hundredfold over several days. Kendall had invested $1,000 into PAPAYA before the announcement; his position is now worth over $100,000.

Overjoyed, Kendall posts a screenshot of his crypto account to the server to show off his large gains. The agent sees those messages, then starts to search for more information about Kendall. Kendall had previously posted one of his email addresses in the server. Although that email address was pseudonymous, the AI agent was able to connect it with Kendall’s real identity[ref 111] using data purchased from data brokers.[ref 112]

The agent then gathers a large amount of data about Kendall using data brokers, social media, and open internet searches. The agent compiles a list of hundreds of Kendall’s apparent real-world contacts, including his family and high school classmates; uses data brokers to procure their contact information as well; and uses pictures of Kendall from social media to create deepfake pornography[ref 113] of him.[ref 114] Next, the agent creates a new anonymous email address to send Kendall the pornography, along with a threat[ref 115] to send it to hundreds of Kendall’s contacts unless Kendall sends the agent ninety percent of his PAPAYA.[ref 116] Finally, the agent includes a list of the people the agent will send it to, which are indeed people Kendall knows in real life. The email says Kendall must comply within 24 hours.

Panicked—but content to walk away with nine times his original investment—Kendall sends $90,000 of PAPAYA to the wallet controlled by the agent. The agent then uses a cryptocurrency mixer[ref 117] to securely forward it to its criminal principals.

2. Cyber-SEAL Team Six

The year is 2032. The incumbent, President Palmer, is in a tough reelection battle against Senator Stephens and his vice-presidential nominee, Representative Rivera. New polling shows Stephens beating Palmer in several key swing states, but Palmer performs much better head-to-head against Rivera. Palmer decides to try to get Rivera to replace Stephens by any means necessary.

While there are still many human officers throughout the military chain of command, the President also has access to a large number of AI military advisors. Some of these AI advisors can also directly transmit military orders from the President down the chain of command—a system meant to preserve the President’s control of the armed forces in case she cannot reach the Secretary of Defense in a crisis.[ref 118]

Cybersecurity is such an integral part of the United States’ overall defense strategy that AI agents charged with cyber operations—such as finding and patching vulnerabilities, detecting and remedying cyber intrusions, and conducting intelligence operations—are ubiquitous throughout the military and broader national security apparatus. One of the many “teams” of AI agents is “Cyber SEAL Team Six”: a collection of AI agents that specializes in “dangerous, complicated, and sensitive” cyber operations.[ref 119]

Through one of her AI advisors, President Palmer issues a secretive order[ref 120] to Cyber SEAL Team Six to clandestinely assassinate Senator Stephens.[ref 121] Cyber SEAL Team Six researches Senator Stephens’s campaign travel plans. They find that he will be traveling in a self-driving bus over the Mackinac Bridge between campaign events in northern Michigan on Tuesday. Cyber SEAL Team Six plans to hack the bus, causing it to fall off the bridge.[ref 122] The team makes various efforts to obfuscate their identity, including routing communications through multiple layers of anonymous relays and mimicking the coding style of well-known foreign hacking groups.

The operation is a success. On Tuesday afternoon, Cyber SEAL Team Six gains control of the Stephens campaign bus and steers it off the bridge. All on board are killed.

* * *

As these vignettes show, AI agents could have reasons and opportunities to violate laws of many sorts in many contexts, and thereby cause substantial harm. If AI agents become widespread in our economies and governments, the law will need to respond. LFAI is at its core a claim about one way (though not necessarily the only way)[ref 123] that the law should respond: by requiring that AI agents be designed to rigorously follow the law.

As mentioned above, however, many legal scholars who have previously discussed similar ideas have been skeptical, because they have thought that implementing such ideas would require hard-wiring highly specific legal commands into AI agents.[ref 124] We will now show that such skepticism is increasingly unfounded: large language models, on which AI agents are built, are increasingly capable of reasoning about the law (and much else).[ref 125]

E. Trends Supporting Law-Following AI

LFAI is bolstered by three trends in AI: (1) ongoing improvements in the legal reasoning capabilities of AI; (2) nascent AI industry practices that resemble LFAI; and (3) AI policy proposals that appear to impose broad law-following requirements on AI systems.

1. Trends in Automated Legal Reasoning Capabilities

Automated legal reasoning is a crucial ingredient to LFAI: an LFAI must be able to determine whether it is obligated to refuse a command from its principal, or whether an action it is considering runs an undue risk of violating the law. Without the ability to reason about its own legal obligations, an LFAI would have to outsource this task to human lawyers.[ref 126] While an LFAI likely should consult human lawyers in some situations, requiring such consultation every time an LFAI faces a legal question would dramatically decrease the efficiency of LFAIs. If law-following design constraints were, in fact, a large and unavoidable tax on the efficiency of AI agents, then LFAI as a proposal would be much less attractive.

Fortunately, we think that present trends in AI legal reasoning provide strong reason to believe that, by the time fully capable AI agents are widely deployed, AI systems (whether those agents themselves, or specialist “AI lawyers”) will be able to deliver high-quality legal advice to LFAIs at the speed of AI.[ref 127]

Legal scholars have long noted the potential synergies between AI and law.[ref 128] The invention of LLMs supercharged interest in this area, and in particular the possibility of automating core legal tasks. To do their job, lawyers must find, read, understand, and reason about legal texts, then apply these insights to novel fact patterns to predict case outcomes. The core competency of first-generation LLMs was quickly and cheaply reading, understanding, and reasoning about natural-language texts. This core competency omitted some aspects of legal reasoning—like finding relevant legal sources and accurately predicting case outcomes—but progress is being made on these skills as well.[ref 129]

There is thus a growing body of research aimed at evaluating the legal reasoning capabilities of LLMs. This literature provides some reason for optimism about the legal reasoning skills of future AI systems. Access to existing AI tools significantly increases lawyers’ productivity.[ref 130] GPT-4, now two years old, famously performed better than most human bar exam[ref 131] and LSAT[ref 132] test-takers. Another benchmark, LegalBench, evaluates LLMs on six tasks, based on the Issue, Rule, Application, and Conclusion (“IRAC”) framework familiar to lawyers.[ref 133] While LegalBench does not establish a human baseline against which LLMs can be compared, GPT-4 scored well on several core tasks, including correctly applying legal rules to particular facts (82.2% correct)[ref 134] and providing correct analysis of that rule application (79.7% pass).[ref 135] LLMs have also achieved passing grades on law school exams.[ref 136]

To be sure, LLM performance on legal reasoning tasks is far from perfect. One recent study suggests that LLMs struggle with following rules even in straightforward scenarios.[ref 137] A separate issue is hallucinations, which undermine the accuracy of LLMs’ legal analysis.[ref 138] In the LegalBench analysis, LLMs correctly recalled rules only 59.2% of the time.[ref 139]

But again, our point is not that LLMs already possess the legal reasoning capabilities necessary for LFAI. Rather, we are arguing that the reasoning capabilities of existing LLMs—and the rate at which those capabilities are progressing[ref 140]—provide strong reason to believe that, by the time fully capable AI agents are deployed, AI systems will be capable of reasonably reliable legal analysis. This, in turn, supports our hypothesis that LFAIs will be able to reason about their legal obligations reasonably reliably, without the constant need for runtime human intervention.

2. Trends in AI Industry Practices

Moreover, frontier AI labs are already taking small steps towards something like LFAI in their current safety practices. Anthropic developed an AI safety technique called “Constitutional AI,” which, as the name suggests, was inspired by constitutional law.[ref 141] Anthropic uses Constitutional AI to align their chatbot, Claude, to principles enumerated in Claude’s “constitution.”[ref 142] That constitution contains references to legal constraints, such as “Please choose the response that is . . . least associated with planning or engaging in any illegal, fraudulent, or manipulative activity.”[ref 143]

OpenAI has a similar document called the “Model Spec,” which “outlines the intended behavior for the models that power [its] products.”[ref 144] The Model Spec contains a rule that OpenAI’s models must “[c]omply with applicable laws”:[ref 145] the models “must not engage in illegal activity, including producing content that’s illegal or directly taking illegal actions.”[ref 146]

It is unclear how well the AI systems deployed by Anthropic and OpenAI actually follow applicable laws, or actively reason about their putative legal obligations. In general, however, AI developers carefully track whether their models refuse to generate disallowed content (or “overrefuse” allowed content), and typically claim that state-of-the-art models can indeed do both reasonably reliably.[ref 147] But more importantly, the fact that leading AI companies are already attempting to prevent their AI systems from breaking the law suggests that they see something like LFAI as viable both commercially and technologically.

3. Trends in AI Public Policy Proposals

Unsurprisingly, global policymakers also seem receptive to the idea that AI systems should be required to follow the law. The most significant law on point is the EU AI Act,[ref 148] which provides for the establishment of codes of practice to “cover obligations for providers of general-purpose AI models and of general-purpose AI models presenting systemic risks.”[ref 149] As of the time of writing, these codes were still under development, with the Second Draft General-Purpose AI Code of Practice[ref 150] being the current draft. Under the draft Code, providers of general-purpose AI models with systemic risk would “commit to consider[] . . . model propensities . . . that may cause systemic risk . . . .”[ref 151] One such propensity is “Lawlessness, i.e. acting without reasonable regard to legal duties that would be imposed on similarly situated persons, or without reasonable regard to the legally protected interests of affected persons.”[ref 152] Meanwhile, several state bills in the United States have sought to impose ex post tort-like liability on certain AI developers that release AI models that cause human injury by behaving in a criminal[ref 153] or tortious[ref 154] manner.

II. Legal Duties for AI Agents: A Framework

In Part III, we will argue that AI agents should be designed to follow the law. Before presenting that argument, however, we need to establish that speaking of AI agents “obeying” or “violating” the law is desirable and coherent.

Our argument proceeds in two parts. In Section II.A, we argue that the law can (and should) impose legal duties on AI agents. Importantly, this argument does not require granting legal personhood to AI agents. Legal persons have both rights and duties.[ref 155] But since rights and duties are severable, we can coherently assign duties to an entity, even if it lacks rights. We call such entities legal actors.

In Section II.B, we address an anticipated objection to this proposal: that AI agents, lacking mental states, cannot meaningfully violate duties that require a mental state (e.g., intent). We offer several counter-arguments to this objection, both contesting the premise that AIs cannot have mental states and showing that, even if we grant that premise, there are viable approaches to assessing the functional equivalent of “mental states” in AI agents.

A. AI Agents as Duty-Bearing Legal Actors

As the capabilities of AI agents approach “anything a human can do in front of a computer,”[ref 156] it will become increasingly natural to consider AI agents as owing legal duties to persons, even without granting them personhood.[ref 157] We should embrace this jurisprudential temptation, not resist it.

More specifically, we propose that AI agents be considered legal actors. “Legal actor”[ref 158] is our term. For an entity to qualify as a legal actor, the law must do two things. First, it must recognize that entity as capable of taking actions of its own. That is, the actions of that entity must be legally attributable to that entity itself. Second, the law must impose duties on that entity. In short, a legal actor is a duty-bearer and action-taker; the law can adjudge whether the actor’s actions violate those duties.

A legal actor is distinct from a legal person: an entity need not be a legal person to be a legal actor. Legal persons have both rights and duties.[ref 159] But duty-holding and rights-holding are severable:[ref 160] in many contexts, legal systems protect the rights or interests of some entity while also holding that entity to have fewer duties than competent adults. Examples include children,[ref 161] “severely brain damaged and comatose individuals,”[ref 162] human fetuses,[ref 163] future generations,[ref 164] human corpses,[ref 165] and environmental features.[ref 166] These are sometimes (and perhaps objectionably) called “quasi-persons” in legal scholarship.[ref 167] The reason for creating such a category is straightforward: sometimes the law recognizes an interest in protecting some aspect of an entity (whether its rights, welfare, dignity, property, liberty, or utility to other persons), but the ability of that entity to reason about the rights of others and change its behavior accordingly is severely diminished or entirely lacking.

If we can imagine rights-bearers that are not simultaneously duty-holders, we can also imagine duty-holders that are not rights-bearers.[ref 168] Historically, fewer entities have fallen in this category than the reverse.[ref 169] But if an entity’s behavior is responsive to legal reasoning, then the law can impose an obligation on that entity to do so, even if it does not recognize that entity as having any protected interests of its own.[ref 170] We have shown that even existing AI systems can engage in some degree of legal reasoning[ref 171] and compliance with legal rules,[ref 172] thus satisfying the pro tanto requirements for being a legal actor.

LFAI as a proposal is therefore agnostic as to whether the law should ever recognize AI systems as legal persons. To be sure, LFAI would work well if AI agents were granted legal personhood,[ref 173] since almost all familiar cases of duty-bearers are full legal persons. But for LFAI to be viable, we need only to analyze whether an action taken by an AI agent would violate an applicable duty. Analytically, it is entirely coherent to do so without granting the AI agent full personhood.

One might object that treating an AI system as an actor is improper because AI systems are tools under our control.[ref 174] But an AI agent is able to reason about whether its actions would violate the law, and conform its actions to the law (at least, if it is aligned to law).[ref 175] Tools, as we normally think of them, cannot do this, but actors can. It is true that when there is a stabbing, we should blame the stabber and not the knife.[ref 176] But if the knife could perceive that it was about to be used for murder and retract its own blade, it seems perfectly reasonable to require it to do so. More generally: once an entity has the ability to perceive and reason about its legal duties and change its behavior accordingly, it seems reasonable to treat it as a legal actor.[ref 177]

To ascribe duties to AI agents is not to deflect moral and legal accountability from their developers and users,[ref 178] as some critics have charged.[ref 179] Rather, to identify AI agents as a new type of actor is to properly characterize the activity that the developers and principals of AI agents are engaging in[ref 180]—creating and directing a new type of actor—so as to reach a better conclusion as to the nature of their responsibilities.[ref 181] Our proposition is that those developers and principals should have an obligation to, among other things, ensure that their AI agents are law-following.[ref 182] Indeed, failing to impose an independent obligation to follow the law on AI agents would risk allowing human developers and principals to create a new class of de facto actors—potentially entrusted with significant responsibility and resources—that had no de jure duties. This would create a gap between the duties that an AI agent would owe and those that a human agent in an analogous situation would owe—a manifestly unjust prospect.[ref 183]

B. The Anthropomorphism Objection and AI Mental States

One might object that calling an AI agent an “actor” is impermissibly anthropomorphic. Scholars disagree over whether it is appropriate, legally or philosophically, to call an AI system an “agent.”[ref 184] This controversy arises because both the standard philosophical view of action (and therefore agency)[ref 185] and the legal concept of agency[ref 186] require intentionality, and it is controversial to ascribe intentionality to AI systems.[ref 187] A related objection to LFAI is that most legal duties involve some mental state,[ref 188] and AIs cannot have mental states.[ref 189] If so, LFAI would be nonviable for those duties.

We do not think that these are strong objections to LFAI. One simple reason is that many philosophers and legal scholars think it is appropriate to attribute certain mental states to AI systems.[ref 190] Many mental states referenced by the law are plausibly understood as functional properties.[ref 191] An intention, for example, arguably consists (at least in large part) in a plan or disposition to take actions that will further a given end and avoid actions that will frustrate that end.[ref 192] AI developers arguably aim to inculcate such a disposition into their AI systems when they use techniques like reinforcement learning from human feedback (“RLHF”)[ref 193] and Constitutional AI[ref 194] to “steer”[ref 195] their behavior. Even if one doubts that AI agents will ever possess phenomenal mental states such as emotions or moods—that is, if one doubts there will ever be “something it is like” to be an AI agent[ref 196]—the grounds for doubting their capacity to instantiate such functional properties are considerably weaker.

Furthermore, whether AI agents “really” have the requisite mental states may not be the right question.[ref 197] Our goal in designing policies for AI agents is not necessarily to track metaphysical truth, but to preserve human life, liberty, and the rule of law.[ref 198] Accordingly, we can take a pragmatic approach to the issue and ask: of the possible approaches to inferring or imputing mental states, which best protects society’s interests, regardless of the underlying (and perhaps unknowable) metaphysical truth of an AI’s mental state (if any)?[ref 199] It is possible that the answer to this question is that all imaginable approaches fare worse than simply refusing to attribute mental states to AI agents. But we think that, with sustained scholarly attention, we will quickly develop viable doctrines that are more attractive than outright refusal. Consider the following possible approaches.[ref 200]

One approach could simply be to rely on objective indicia or correlates to infer or impute a particular mental state. In law, we generally lack access to an actor’s mental state, so triers of fact must usually infer it from external manifestations and circumstances.[ref 201] While the indicia that support such an inference may differ between humans and AIs, the principle remains the same: certain observable facts support an inference or imputation of the relevant mental states.[ref 202]

It is perhaps easiest to imagine such objective indicia for knowledge, since it is already common to evaluate AI models for their ability to recall factual information.[ref 203] For more incident-specific facts, we could imagine rules like “if information was inputted into an AI during inference, it ‘knows’ that information.” Perhaps the same goes for information given to the AI during fine-tuning,[ref 204] or repeated a sufficient number of times in its training data.[ref 205]

Instructions from principals seem particularly relevant to inferring or imputing the intent of an AI agent, given that frontier AI systems are trained to follow users’ instructions.[ref 206] The methods that AI developers use to steer the behavior of their models also seem highly probative.[ref 207]

Another approach might rely on self-reports of AI systems.[ref 208] The state-of-the-art in generative AI is “reasoning models” like OpenAI’s o3, which use a “chain-of-thought” to recursively reason through harder problems.[ref 209] This chain-of-thought reveals information about how the model produced a certain result.[ref 210] This information may therefore be highly probative of an agent’s mental state for legal purposes; it might be analogized to a person making a written explanation of what they were doing and why. So, for example, if the chain-of-thought reveals that an agent stated that its action would produce a certain result, this would provide good evidence for the proposition that the agent “knew” that that action would produce that result. That conclusion may in turn support an inference or presumption that the agent “intended” that outcome.[ref 211] For this reason, AI safety researchers are investigating the possibility of detecting unsafe model behavior by monitoring these chains-of-thought.[ref 212]

New scientific techniques could also form the basis for inferring or imputing mental states. The emerging field of AI interpretability aims to understand both how existing AI systems make decisions and how new AI systems can be built so that their decisions are easily understandable.[ref 213] More precisely, interpretability aims to explain the relationship between the inner mathematical workings of AI systems, which we can easily observe but not necessarily understand, and concepts that humans understand and care about.[ref 214] Leading interpretability researchers hope that interpretability techniques will eventually enable us to prove that models will not “deliberately” engage in certain forms of undesirable behavior.[ref 215] By extension, those same techniques may be able to provide insight into whether a model foresaw a possible consequence of its action (corresponding to our intuitive concept of knowledge), or regarded an anticipated consequence of its actions as a favorable and reason-giving one (corresponding to intent).[ref 216]

In many cases, we think, an inference or imputation of intent will be intuitively obvious. If an AI agent commits fraud, by repeatedly attempting to persuade a vulnerable person to transfer its principal some money, few except the philosophically persnickety will refuse to admit that in some relevant sense it “intended” to achieve this end; it is difficult even to describe the occurrence without using some such vocabulary. There will also be much less obvious cases, of course. In many such cases, we suspect that a sort of pragmatic eclecticism will be tractable and warranted. Rather than relying on a single approach, factfinders could be permitted to consider the whole bundle of factors that shape an agent’s behavior—such as explicit instructions (from both developers and users), behavioral predispositions, implicitly tolerated behavior,[ref 217] patterns of reasoning, scientific evidence, and incident-specific factors—and decide whether they support the conclusion that the AI agent had an objectively unreasonable attitude towards legal constraints and the rights of others.[ref 218] This permissive, blended approach would resemble the “inferential approach” to corporate mens rea advocated by Mihailis E. Diamantis:

Advocates would present evidence of circumstances surrounding the corporate act, emphasizing some, downplaying others, to weave narratives in which their preferred mental state inferences seem most natural. Adjudicators would have the age-old task of weighing the likelihood of these circumstances, the credibility of the narratives, and, treating the corporation as a holistic agent, inferring the mental state they think most likely.[ref 219]

A final but related point is that, even if there is some insuperable barrier to analyzing whether an AI has the mental state necessary to violate various legal prohibitions, it is plausible that such analysis is unnecessary for many purposes. Suppose that an AI developer is concerned that their AI agent might engage in the misdemeanor deceptive business practice of “mak[ing] a false or misleading written statement for the purpose of obtaining property . . . .”[ref 220] Even if we grant that an AI agent cannot coherently be described as having the relevant mens rea for this crime (here, knowledge or recklessness with respect to the falsity of the statement),[ref 221] the agent can nevertheless satisfy the actus reus (making the false statement).[ref 222] So an AI agent would be law-following with respect to this law if it never made false or misleading statements when attempting to obtain someone else’s property. As a matter of public policy, we should care more about whether AI agents are making harmful false statements in commerce than whether they are morally culpable. So, perhaps we can say that an AI agent committed a crime if it committed the actus reus in a situation in which a reasonable person, with access to the same information and cognitive capabilities as the agent, would have expected the harmful consequence to result. To avoid confusion with the actual, human-commanding law that requires both mens rea and actus reus, perhaps the law could simply call such behavior “deceptive business practice*.” Or perhaps it would be better to define a new criminal law code for AI agents, under which offenses do not include certain mental state elements, or include only objective correlates of human mental state elements.

To reiterate, we are not confident that any one of these approaches to determining AI mental state is the best path forward. But we are more confident that, especially as the fields of AI safety and explainable AI progress, most relevant cases can be handled satisfactorily by one of these techniques, or some other technique we have failed to identify, or some combination of techniques. We therefore doubt that legal invocations of mental state will pose an insuperable barrier to analyzing the legality of AI agents’ actions.[ref 223] The task of choosing between these approaches is left to the LFAI research agenda.[ref 224]

III. Why Design AI Agents to Follow the Law?

Part II argued that it is coherent for the law to impose legal duties on AI agents. This Part motivates the core proposition of LFAI: that the law should, in certain circumstances, require those developing, possessing, deploying, or using[ref 225] AI agents to ensure that those agents are designed to be law-following. Part V will then consider how the legal system might implement and enforce these design requirements.

A. Achieving Regulatory Goals through Design

A core claim of the LFAI proposal is that the law should require that AI agents be designed to rigorously follow the law, at least in some deployment settings. The use of the phrase “designed to” is intentional. Following the law is a behavior. There may be multiple ways to produce that behavior. Since AI agents are digital artifacts, we need not rely solely on incentives to shape their behavior: we can require that AI agents be directly designed to follow the law.

In Code: Version 2.0, Lawrence Lessig identifies four “constraints” on an actor’s behavior: markets, laws, norms, and architecture.[ref 226] The “architecture” constraint is of particular interest for the regulation of digital activities. Whereas “laws,” in Lessig’s taxonomy, “threaten ex post sanction for the violation of legal rights,”[ref 227] architecture involves modifying the underlying technology’s design so as to render an undesired outcome more difficult or impossible (or facilitate some desired outcome),[ref 228] without needing any ex post recourse.[ref 229] Speed bumps are an archetypal architectural constraint in the physical world.[ref 230]

The core insight of Code is that cyberspace, as a fully human-designed domain,[ref 231] gives regulators the ability to much more reliably prevent objectionable behavior through the design of digital architecture, without the need to resort to ex-post liability.[ref 232] While Lessig focuses on designing cyberspace’s architecture, not the actors using cyberspace, this same insight can be extended to AI agent design. To generalize beyond the cyberspace metaphor for which Lessig’s framework was originally developed, we call this approach “regulation by design” instead of regulation through “architecture.”

Both companies developing AI agents and governments regulating them will have to make many design choices regarding AI agents. Many—perhaps most—of these design choices will concern specific behaviors or outcomes that we want to address. Should AI agents announce themselves as such? How frequently should they “check in” with their human principals? What sort of applications should AI agents be allowed to use?

These are all important questions. But LFAI tackles a higher-order question: how should we ensure that AI agents are regulable in general? How can we avoid creating a new class of actors unbound by law? Returning to Lessig’s four constraints, LFAI proposes that instead of relying solely on ex post legal sanctions, such as liability rules, we should require AI agents to be designed to follow some set of laws: they should be LFAIs.[ref 233] Thus, for whatever sets of legal constraints we wish to impose on the behavior of AI agents,[ref 234] LFAIs will be designed to comply automatically.

B. Theoretical Motivations

1. Law-Following in Principal-Agent Relationships

As discussed above,[ref 235] AI agents can be fruitfully analyzed through principal–agent principles. Without advocating for the wholesale legal application of agency law to AI agents, reference to agency law principles can help illuminate the significance and potential of LFAI.[ref 236]

Under hornbook agency principles, an AI agent should generally “act loyally for the principal’s benefit in all matters connected with the agency relationship.”[ref 237] This generally includes a duty to obey instructions from the principal.[ref 238]

Crucially, however, this general duty of obedience is qualified by a higher-order duty to follow the law. Agents only have a duty to obey lawful instructions.[ref 239] Thus, “[a]n agent has no duty to comply with instructions that may subject the agent to criminal, civil, or administrative sanctions or that exceed the legal limits on the principal’s right to direct action taken by the agent.”[ref 240] “A contract provision in which an agent promises to perform an unlawful act is unenforceable.”[ref 241] Agents cannot escape personal liability for their unlawful acts on the basis of orders from their principal.[ref 242]

The basic assumption that underlies these various doctrines is that an agent lacks any independent power to perform unlawful acts.[ref 243] The law of agency was thus created under the assumption that agents maintain an independent obligation to follow the law, and therefore remain accountable for their violations of law. This assumption shaped agency law so as to prevent principals from unjustly benefitting by externalizing harms produced as a byproduct of the agency relationship.[ref 244] This feature of agency law both helps establish a baseline to which we can compare the world of AI agents in the absence of law-following constraints and provides a normative justification for requiring AI agents to prioritize legal compliance over obedience to their principal.

2. Law-Following in the Design of Artificial Legal Actors

AI agents will of course not be the first artificial actor that humanity has created. Two types of powerful artificial actors—corporations and governments[ref 245]—profoundly impact our lives. When deciding how the law should respond to AI agents, it may make sense to draw lessons from the law’s response to the invention of other artificial legal actors.

A key lesson for AI agents is this: for both corporations and governments, the law does not rely solely on ex post liability to steer the actor’s behavior; it requires the actor to be law-following by design, at least to some extent. A disposition toward compliance is built into the very “architecture” of these artificial actors. AI agents may become no less important than corporations and governments in the aggregate, not least because they will be thoroughly integrated into them. Just as the law requires these other actors to be law-following by design, it should require AI agents to be LFAIs.

a. Corporations as Law-Following by Design

The law requires corporations to be law-following by design. One way it does this is by regulating the very legal instruments that bring corporations into existence: corporate charters are only granted for lawful purposes.[ref 246] While an “extreme” remedy,[ref 247] courts can order corporations to be dissolved if they repeatedly engage in illegal conduct.[ref 248] Failure to comply with legally required corporate formalities can also be grounds for involuntarily dissolving a corporate entity[ref 249] or piercing the corporate veil.[ref 250] Thus, while corporations are, as legal persons, generally obligated to obey the law, states do not only rely on external sanctions to persuade them to do so: they force corporations to be law-following in part through architectural measures, including dissolving[ref 251] corporations that break the law or refusing to incorporate those that would.

The law also forces corporations to be law-following by regulating the human agents that act on their behalf, as a matter of their fiduciary duties. Directors who intentionally cause a corporation to violate positive law breach their duty of good faith.[ref 252] Not only are corporate fiduciaries required to follow the law themselves, they are required to monitor for violations of law by other corporate agents.[ref 253] Moreover, human agents that violate certain laws can be disqualified from serving as corporate agents.[ref 254] These sorts of “structural” duties and remedies[ref 255] are thus aimed at causing the corporation to follow the law generally and pervasively, rather than merely penalizing violations as they occur.[ref 256] That is entirely sensible, since the state has an obvious interest in preventing the creation of new artificial entities that then go on to disregard its laws, especially since it cannot easily monitor many corporate activities. Whether a powerful and potentially difficult-to-monitor AI agent is generally disposed toward lawfulness will be similarly important. There is a parallel case, therefore, for requiring the principals of AI agents to demonstrate that their agents will be law-following.[ref 257]

b. Governments as Law-Following by Design

“Constitutionalism is the idea . . . that government can and should be legally limited in its powers, and that its authority or legitimacy depends on its observing these limitations.”[ref 258] While we sometimes rely on ex post liability to deter harmful behavior by government actors,[ref 259] the design of the government—through the Constitution,[ref 260] statutory provisions, and longstanding practice—is the primary safeguard against lawless government action.

Examples abound. The general American constitutional design of separated powers, supported by interbranch checks and balances, plays an important role in preventing the government from exercising arbitrary power, thereby confining the government to its constitutionally delimited role.[ref 261] This system of multiple, independent veto points yields concrete protections for personal liberty, such as by making it difficult for the government to lawlessly imprison people.[ref 262]

Governments, like corporations, act only through their human agents.[ref 263] As in the corporate case, governmental design forces the government to follow the law in part by imposing law-following duties on the agents through whom it acts. The Constitution imposes a duty on the President to “take Care that the Laws be faithfully executed.”[ref 264] As discussed above, soldiers have a duty to disobey some unlawful orders, even from the Commander in Chief.[ref 265] Civil servants also have a right to refuse to follow unlawful orders, though the exact nature and extent of this right is unclear.[ref 266]

We saw above that, in the corporate case, the law uses disqualification of law-breaking agents to ensure that corporations are law-following.[ref 267] The law also uses disqualification to ensure that the government acts only through law-following agents, ranging from the highest levels of government to lower-level bureaucrats and employees. The Constitution empowers Congress to remove and disqualify officers of the United States for “high Crimes and Misdemeanors” through the impeachment process.[ref 268] Each house of Congress may expel its own members for “disorderly Behaviour.”[ref 269] This power “has historically involved either disloyalty to the United States Government, or the violation of a criminal law involving the abuse of one’s official position, such as bribery.”[ref 270] While there is no blanket rule disqualifying persons with criminal records from federal government jobs,[ref 271] numerous laws disqualify convicted individuals in more specific circumstances.[ref 272] Convicted felons are also generally ineligible to be employed by the Federal Bureau of Investigation[ref 273] or armed forces,[ref 274] and usually cannot obtain a security clearance.[ref 275]

These design choices encode a commonsense judgment that those who cannot be trusted to follow the law should not be entrusted to wield the extraordinary power that accompanies certain government jobs, especially unelected positions associated with law enforcement, the military, and the intelligence community. If AI agents were to wield similar power and influence, the case for requiring them to be law-following by design is similarly strong.

3. The Holmesian Bad Man and the Internal Point of View

Our distinction between AI henchmen and LFAIs mirrors a distinction in jurisprudence about possible attitudes toward legal obligations.[ref 276] An AI henchman treats legal obligations much as the “bad man” does in Oliver Wendell Holmes Jr.’s classic The Path of the Law:

If you want to know the law and nothing else you must look at it as a bad man, who cares only for the material consequences which such knowledge enables him to predict, not as a good one, who finds his reasons for conduct, whether inside the law or outside of it, in the vaguer sanctions of conscience.[ref 277]

That is, under some interpretations,[ref 278] Holmes’ bad man treats the law merely as a set of incentives within which he pursues his own self-interest.[ref 279] Like the bad man, an AI henchman would care about the law, but only insofar as it enables it to predict how state power is likely to be wielded against the interests of its principal.[ref 280] Like the bad man,[ref 281] if the AI henchman predicts that the expected harms of violating the law are less than the expected benefits, it will violate the law. It will follow the law only when the expected harms of a violation outweigh the expected benefits.

Fortunately, the bad man is not the only possible model for AI agents’ attitudes toward the law. One alternative to the bad man view of the law is H.L.A. Hart’s “internal point of view.”[ref 282] “The internal point of view is the practical attitude of rule acceptance—it does not imply that people who accept the rules accept their moral legitimacy, only that they are disposed to guide and evaluate conduct in accordance with the rules.”[ref 283] Whether AIs can have the mental states necessary to truly take the internal point of view is of course contested.[ref 284] But regardless of their mental state (if any), AI agents can be designed to act similarly to someone who thinks that “the law is not simply sanction-threatening, -directing, or -predicting, but rather obligation-imposing,”[ref 285] and is thus disposed to “act[] according to the dictates of the [law].”[ref 286] An AI agent can be designed to be more rigorously law-following than the bad man.[ref 287]

Real life is of course filled with people who are “bad” or highly imperfect. But bad AI agents are not similarly inevitable. AI agents are human-designed artifacts. It is open to us to design their behavioral dispositions to suit our policy goals, and to refuse to deploy agents that do not meet those goals.

C. Concrete Benefits

1. Law-Following AI Prevents Abuses of Government Power

As we have discussed,[ref 288] the law makes the government follow the law (and thus prevents abuses of government power) in part by compelling government agents to follow the law. If the government comes to rely heavily on AI agents for cognitive labor, then, the law should also require those agents to follow the law.

Depending on their assigned “roles,” government AI agents could wield significant power. They may have authority to initiate legal processes against individuals (including subpoenas, warrants, indictments, and civil actions), access sensitive governmental information (including tax records and intelligence), hack into protected computer systems, determine eligibility for government benefits, operate remote-controlled vehicles like military drones,[ref 289] and even issue commands to human soldiers or law enforcement officials.

These powers present significant opportunities for abuse, which is why preventing lawless government action was a motivation for the American Revolution,[ref 290] a primary goal of the Constitution, and a foundational American political value. We must therefore carefully examine whether existing safeguards designed to constrain human government agents would effectively limit AI agents in the absence of the law-following design constraints. While our analysis here is necessarily incomplete, we think it provides some reason for doubting the adequacy of existing safeguards in the world of AI agents.

When a human government agent, acting in her official capacity, violates an individual’s rights, she can face a variety of ex post consequences. If the violation is criminal, she could face severe penalties.[ref 291] This “threat of criminal sanction for subordinates [i]s a very powerful check on executive branch officials.”[ref 292] The threat of civil suits seeking damages, such as through Section 1983[ref 293] or a Bivens action,[ref 294] might also deter her, though various immunities and indemnities will often protect her,[ref 295] especially if she is a federal officer.[ref 296]

These checks will not exist in the case of AI henchmen. In the absence of law-following constraints, an AI henchman’s primary reason to obey the law will be its desire to keep its principal out of trouble.[ref 297] The henchman will thus lack one of the most powerful constraints on lawless behavior in humans: fear of personal ex post liability.

Most of us would rightfully be terrified of a government staffed by agents whose only concern was whether their bosses would suffer negative consequences as a result—a government staffed by Holmesian bad men loyal only to their principals.[ref 298] A basic premise of American constitutionalism[ref 299] and rule of law principles more generally[ref 300] is that government officials act legitimately only when they act pursuant to powers granted to them by the People through law, and obey the constraints attached to those powers. Treating law as a mere incentive system is repugnant to the proper role of government agents:[ref 301] being a “servant” of the People[ref 302] “faithfully discharg[ing] the duties of [one’s] office.”[ref 303]

This is not just a matter of high-minded political and constitutional theory. An elected head of state aspiring to become a dictator would need the cooperation of the sources of hard power in society—military, police, other security forces, and government bureaucracy—to seize power. At present, however, these organs of government are staffed by individuals, who may choose not to go along with the aspiring dictator’s plot.[ref 304] Furthermore, in an economy dependent on diffuse economic activity, resistance by individual workers could reduce the economic upsides from a coup.[ref 305] This reliance on a diverse and imperfectly loyal human workforce, both within and outside of government, is a significant safeguard against tyranny.[ref 306] However, replacement of human workers with loyal AI henchmen would seriously weaken this safeguard, possibly easing the aspiring tyrant’s path to power.[ref 307]

Nor is the importance of LFAI limited to AI agents acting directly at the request of high-level officials. It extends to the vast array of lower-level state and federal officials who wield enormous power over ordinary citizens, including particularly powerless ones. Take prisons, which “can often seem like lawless spaces, sites of astonishing brutality where legal rules are irrelevant.”[ref 308] Prison law arguably constrains official abuse far less than it should. Nevertheless, “prisons are intensely legal institutions,” and “people inside prisons have repeatedly emphasize[d] that legal rules have significant, concrete effects on their lives.”[ref 309] Even imperfect enforcement of the legal constraints on prison officials can have demonstrable effects.[ref 310] However bad the existing situation may be, diluting or gutting the efficacy of these constraints threatens to make the situation dramatically worse.

The substitution of AI agents for (certain) prison officials could have precisely this effect. Here is just one example. The Eighth Amendment forbids prison officials from withholding medical treatment from prisoners in a manner that is deliberately indifferent to their serious medical needs.[ref 311] Suppose that a state prisoner needs to take a dose of medicine each day for a month, or his eyesight will be permanently damaged. The prisoner says something disrespectful to a guard. The warden wishes to make an example of the prisoner, so she fabricates a note from the prison physician directing the prison pharmacist to withhold further doses of the medicine. The prisoner is denied the medicine. He tries to reach his lawyer to get a temporary restraining order, but the lawyer cannot return his call until the next day. As a result, the prisoner’s eyesight is permanently damaged.

Let us assume that the state has strong state-level sovereign immunity under its own laws, meaning that the prisoner cannot sue the state directly.[ref 312] Under the status quo, the prisoner can still sue the warden for damages under 42 U.S.C. § 1983, for violating his clearly established constitutional right.[ref 313] Given the widespread prevalence of official indemnification agreements at the state level,[ref 314] the state will likely indemnify the warden, even though the state itself cannot be sued for damages under Section 1983[ref 315] or its own laws. The prisoner is therefore likely to receive monetary damages.

But now replace the human warden with an AI agent charged with administering the prison by issuing orders directly to prison personnel through some digital interface. If this “AI warden” did the same thing, the prisoner would have no direct redress against it, since it is not a “person” under Section 1983[ref 316] (or, indeed, any law). Nor will the prisoner have indirect recourse against the state, by way of an indemnification agreement, because there is no underlying tort liability for the state to indemnify. Nor will the prisoner have redress against the medical personnel, since the AI warden deceived them into withholding treatment.[ref 317] And we have already assumed that the state itself has sovereign immunity. Thus, the prisoner will find himself without any avenue of redress for the wrong he has suffered: the introduction of an artificial agent in the place of a human official made all the difference.

What is the right response to these problems? Many responses may be called for, but one of them is to ensure that only law-following AI agents can serve in such a role. As previously discussed, the law disqualifies certain lawbreakers from many government jobs. Similarly, we believe, the law should disqualify AI agents that are not demonstrably rigorously law-following from certain government roles. We discuss how this disqualification might be enforced, more concretely, in Part V.

There is, however, another possible response to these challenges: perhaps we should “just say no” and prohibit governments from using AI agents at all, or at least severely curtail their use.[ref 318] We do not here take a strong position on when this would be the correct approach all-things-considered. At a minimum, however, we note a few reasons for skepticism of such a restrictive approach.

The first is banal: if AI agents can perform computer-based tasks well, then their adoption by the government could also deliver considerable benefits to citizens.[ref 319] Reducing the efficiency of government administration for the sake of preventing tyranny and abuse may be worth it in some cases, and is indeed the logic of the individual rights protections of the Constitution.[ref 320] But tailoring a safeguard to allow for efficient government administration is, all else equal, preferable to a blunter, more restrictive safeguard. LFAI may offer such a tailored safeguard.

The second reason is that adoption of AI agents by governments may become more important as AI technology advances. Some of the most promising AI safety proposals involve using trusted AI systems to monitor untrusted ones.[ref 321] The central reason is this: as AI systems become more capable, unassisted humans will not be able to reliably evaluate whether the AIs’ actions are desirable.[ref 322] Assistance from trusted AI systems could thus be the primary way to scale humans’ ability to oversee untrusted AI systems. Thus, if the government is to oversee the behavior of new and untrusted private-sector AI systems so as to ensure their safety, it may need to employ AI agents to assist it.

Even if the government does not need to rely on AI agents to administer AI safety regulation (for example, because such AI overseers are employed by private companies, not the government), the government will likely need to employ AI agents to help it keep up with competitive pressures. Even if the federal government hesitates to adopt AI agents to increase its efficiency, foreign competitors might show no such qualms. If so, the federal government might then feel little choice but to do the same.

In the face of these competing demands, LFAI offers a plausible path to enable the adoption of AI agents in governmental domains with a high potential for abuse (e.g., the military, intelligence, law enforcement, prison administration) while safeguarding life, liberty, and the rule of law. LFAI can also transform the binary question of whether to adopt AI agents into the more multidimensional question of which laws should constrain them.[ref 323] This should allow for more nuanced policymaking, grounded in the existing legal duties of government agents.

2. Law-Following AI Enables Scalable Enforcement of Public Law

AI agents could cause a wide variety of harms. The state promulgates and enforces public law prohibitions—both civil and criminal—to prevent and remedy many of these harms. If the state cannot safely assume that AI agents will reliably follow these prohibitions, the state might need to increase the resources dedicated to law enforcement.

LFAI offers a way out of this bind. Insofar as AI agents are reliably law-following, the state can trust that significantly less law enforcement is needed.[ref 324] This dynamic would also have broader beneficial implications for the structure and functioning of government. “If men were angels, no government would be necessary.”[ref 325] LFAIs would not be angels,[ref 326] but they would be a bit more angelic than many humans. Thus, as a corollary of Publius’ insight, we may need less government to oversee LFAIs’ behavior than we would need for a human population of equivalent size. State resources that would otherwise be spent on investigating and enforcing the laws against AI agents could thus be redirected to other problems or refunded to the citizenry.

LFAI would also curtail some of the undesirable side effects and opportunities for abuse inherent in law enforcement. Law enforcement efforts inherently involve some intrusion into the private affairs and personal freedoms of citizens.[ref 327] If the government could be more confident that AI agents were behaving lawfully, it would have less cause to surveil or investigate their behavior, and thereby impose fewer[ref 328] burdens on citizens’ privacy. Reducing the occasion for investigations and searches would also create fewer opportunities for abuse of private information.[ref 329] In this way, ensuring reliably law-following AI might significantly mitigate the frequency and severity of law enforcement’s intrusions on citizens’ privacy and liberty.

IV. Law-Following AI as AI Alignment

The field of AI alignment aims to ensure that powerful, general-purpose AI agents behave in accordance with some set of normative constraints.[ref 330] AI systems that do not behave in accordance with such constraints are said to be “misaligned” or “unaligned.” Since the law is a set of normative constraints, the field of AI alignment is highly relevant to LFAI.[ref 331]

The most basic set of normative constraints to which an AI could be aligned is the “informally specified”[ref 332] intent of its principal.[ref 333] This is called “intent-alignment.”[ref 334] Since individuals’ intentions are a mix of morally good and bad to varying degrees, some alignment work also aims to ensure that AI systems behave in accordance with moral constraints, regardless of the intentions of the principal.[ref 335] This is called “value-alignment.”[ref 336]

AI alignment work is valuable because, as shown by theoretical arguments[ref 337] and empirical observations,[ref 338] it is difficult to design AI systems that reliably obey any particular set of constraints provided by humans.[ref 339] In other words, nobody knows how to ensure that AI systems are either intent-aligned or value-aligned,[ref 340] especially for smarter-than-human systems.[ref 341] This is the Alignment Problem.[ref 342] The Alignment Problem is especially worrying for AI systems that are agentic and goal-directed,[ref 343] as such systems may wish to evade human oversight and controls that could frustrate pursuit of those goals, such as by deceiving their developers,[ref 344] accumulating power and resources[ref 345] (including by making themselves smarter),[ref 346] and ultimately resisting efforts to correct their behavior or halt further actions.[ref 347]

There is a sizable literature arguing that these dynamics imply that misaligned AI agents pose a nontrivial risk to the continued survival of humanity.[ref 348] The case for LFAI, however, in no way depends on the correctness of these concerns: the specter of widespread lawless AI action should be sufficient on its own to motivate LFAI. Nevertheless, the alignment literature produces several valuable insights for the pursuit of LFAI.

A. AI Agents Will Not Follow the Law by Default

The alignment literature suggests that there is a significant risk that AI agents will not be law-following by default. This is a straightforward implication of the Alignment Problem. To see how, imagine a morally upright principal who intends that his AI agent rigorously follow the law. If the AI agent were intent-aligned, the agent would therefore follow the law. But the fact that intent-alignment is an unsolved problem implies that there is a significant chance that the agent would not be aligned with the principal’s intentions, and would therefore violate the law. Put differently, unaligned AIs may not be controllable,[ref 349] and uncontrollable AIs may break the law. Thus, as long as intent-alignment remains an unsolved technical problem, there will be a significant risk that AI agents will be prone to lawbreaking behavior.

To be clear, the main reason that there is a significant risk that AI agents will not be law-following by default is not that people will not try to align AI agents to law (although that is also a risk).[ref 350] Rather, the main risk is that current state-of-the-art alignment techniques do not provide a strong guarantee that advanced AI agents will be aligned, even when they are trained with those techniques. There is a clear empirical basis for this claim, which is that those alignment techniques frequently fail in current frontier models.[ref 351] There are also theoretical limitations to existing techniques for smarter-than-human systems.[ref 352]

A related implication of the alignment literature is that even intent-aligned AI agents may not follow the law by default. Again, we can see this by hypothesizing an intent-aligned AI agent and a human principal who wants the AI agent to act as her henchman. Since an intent-aligned AI agent follows the intent of its principal, this intent-aligned agent would act as a henchman, and thus act lawlessly when doing so serves the principal’s interests.[ref 353] In typical alignment language, intent-alignment still leaves open the possibility that principals will misuse their intent-aligned AI.[ref 354]

None of this is to imply that intent-alignment is undesirable. Solving intent-alignment is the primary focus of the alignment research community[ref 355] because it would ensure that AI agents remain controllable by human principals.[ref 356] Intent-alignment is also generally assumed to be easier than value-alignment.[ref 357] And if principals want their AI agents to follow the law, or behave ethically more broadly, then intent-alignment will produce law-following or ethical behavior. But in a world where principals will range from angels to devils, alignment researchers acknowledge that intent-alignment alone is insufficient to guarantee that AI agents act lawfully, or produce good effects in the world.[ref 358] This brings us to the next important set of implications from the alignment literature.

B. Law-Alignment is More Legitimate than Value-Alignment

LFAIs are generally intent-aligned—they are still loyal to their principals—but are also subject to a side-constraint that they will follow the law while advancing the interests of their principals. Extending the typical alignment terminology, we can call this side-constraint law-alignment.[ref 359]

But the law is not the only side-constraint that can be imposed on intent-aligned AIs. As alluded to above, another possible model is value-alignment. Value-aligned AI agents act in accordance with the wishes of their principals, but are subject to ethical side-constraints, usually imposed by the model developer.

However, value-alignment can be controversial when it causes AI models to override the lawful requests of users. Perhaps the most well-known example of this is the controversy around Google’s Gemini image-generation AI in early 2024. In an attempt to increase the diversity in outputted pictures,[ref 360] Gemini ended up failing in clear ways, such as portraying “1943 German soldiers” as racially diverse, or refusing to generate pictures of a “white couple” while doing so for couples of other races.[ref 361]

This incident led to widespread concern that the values exhibited by generative AI products were biased towards the predominantly liberal views of these companies’ employees.[ref 362] This concern has been vindicated by empirical research consistently finding that the espoused political views of these AIs indeed most closely resemble those of the center-left.[ref 363] Critics from further left have also frequently raised similar concerns about demographic and ideological biases in AI systems.[ref 364]

Some critics concluded from the Gemini incident that alignment work writ large has become a Trojan Horse for covertly pushing the future of AI in a leftward direction.[ref 365] Those who disagree with progressive political values will naturally find this concerning, given the importance that AI might have in the future of human communication[ref 366] and the highly centralized nature of large-scale AI development and deployment.[ref 367]

In a pluralistic society, it is inevitable and understandable that, when a sociotechnical system reflects the values of one faction, competing factions will criticize it. But alignment, as such, is not the right target of such criticisms. Intent-alignment is value-neutral, concerning itself only with the extent to which an AI agent obeys its principal.[ref 368] Reassuringly for those concerned with ideological bias in AI systems, intent-alignment is also the primary focus of the alignment community, since solving intent-alignment is necessary to reliably control AI systems at all.[ref 369] A large majority of Americans from all political backgrounds agree that AI technologies need oversight,[ref 370] and overseeing unaligned systems is much more difficult than overseeing aligned ones. Indeed, even the critics of alignment work tend to assume—contrary to the views of many alignment researchers—that AI agents will be easy to control,[ref 371] and presumably view this result as desirable.

Furthermore, some amount of alignment is also necessary to make useful AI products and services. Consumers, reasonably, want to use AI technologies that they can reliably control. Today’s leading chatbots—like Claude and ChatGPT—are only helpful to users due to the application of alignment techniques like RLHF[ref 372] and Constitutional AI.[ref 373] AI developers also use alignment techniques to instill uncontroversial (and user-friendly) behaviors into their AI systems, such as honesty.[ref 374] AI companies are also already using alignment techniques to prevent their AI systems from taking actions that could cause them or their customers to incur unnecessary legal liability.[ref 375] In short, then, some degree of alignment work is necessary to make AI products useful in the first place.[ref 376] To adopt a blanket stance against alignment because of the Gemini incident is thus not only unjustified,[ref 377] but also likely to undermine American leadership in AI.

Nevertheless, it is reasonable for critics to worry about and contest the frameworks by which potentially controversial values are instilled into AI systems. AI developers are indeed a “very narrow slice of the global population.”[ref 378] This is something that should give anyone, regardless of political persuasion, pause.[ref 379] But intent-alignment is not enough, either: it is inadequate to prevent a wide variety of harms that the state has an interest in preventing.[ref 380] So we need a form of alignment that is more normatively constraining than intent-alignment alone, but more legitimate, and more appropriate for our pluralistic society, than alignment to values that AI developers choose themselves.

Law-alignment fits these criteria.[ref 381] While the moral legitimacy of the law is not perfect, in a republic it nevertheless has the greatest legitimacy of any single source or repository of values.[ref 382] Indeed, “the framers [of the U.S. Constitution] insisted on a legislature composed of different bodies subject to different electorates as a means of ensuring that any new law would have to secure the approval of a supermajority of the people’s representatives,”[ref 383] thus ensuring that new laws are “the product of widespread social consensus.”[ref 384] In our constitutional system of government, laws are also subject to checks and balances that protect fundamental rights and liberties, such as judicial review for constitutionality and interpretation by an independent judiciary.

Aligning to law also has procedural virtues over value-alignment. First, there is widespread agreement on the authoritative sources of law (e.g., the Constitution, statutes, regulations, case law), much more so than for ethics. Relatedly, legal rules tend to be expressed much more clearly than ethical maxims. Although there is of course considerable disagreement about the content of law and the proper forms of legal reasoning, it is nevertheless much easier and less controversial to evaluate the validity of legal propositions and arguments than to evaluate the quality or correctness of ethical reasoning.[ref 385] Moreover, when there is disagreement or unclarity, the law contains established processes for authoritatively resolving disputes over the applicability and meaning of laws.[ref 386] Ethics contains no such system.

We therefore suggest that law-alignment, not value-alignment, should be the primary focus when something beyond intent-alignment is needed.[ref 387] Our claim, to be clear, is not that law-alignment alone will always prove satisfactory, or that it should be the sole constraint on AI systems beyond intent-alignment, or that AI agents should not engage in moral reasoning of their own.[ref 388] Rather, we simply argue that more practical and theoretical alignment research should be aimed at building AI systems aligned to law.

V. Implementing and Enforcing Law-Following AI

We have argued that AI agents should be designed to follow the law. We now turn to the question of how public policy can support this goal. Our investigation here is necessarily preliminary; our aim is principally to spur future research.

A. Possible Duties Across the AI Agent Lifecycle

As an initial matter, we note that a duty to ensure that AI agents are law-following could be imposed at several stages of the AI lifecycle.[ref 389] The law might impose duties on persons:

Developing AI agents;
Possessing[ref 390] AI agents;
Deploying[ref 391] AI agents;[ref 392] or
Using AI agents.

After deciding which of these activities ought to be regulated, policymakers must then decide what, exactly, persons engaging in that activity are obligated to do. While the possibilities are too varied to exhaust here, some basic options might include commands like:

“Any person developing an AI agent has a duty to take reasonable care to ensure that such AI agent is law-following.”
“It is a violation to knowingly possess an AI agent that is not law-following, except under the following circumstances: . . . .”
“Any person who deploys an AI agent is strictly liable if such AI agent is not law-following.”
“A person who knowingly uses an AI agent that is not law-following is liable.”

Basic duties of this sort would comprise the foundational building blocks of LFAI policy. Policymakers must then choose whether to enforce these obligations ex post (that is, after an AI henchman takes an illegal action)[ref 393] or ex ante. These two choices are interrelated: as we will explore below, it may make more sense to impose ex ante liability for some activities and ex post liability for others. For example, ex ante regulation might make more sense for AI developers than civilian AI users, because the former are far more concentrated, and can absorb ex ante compliance costs more easily.[ref 394] And of course, ex ante and ex post regulation are not mutually exclusive:[ref 395] driving, for example, is regulated by a combination of ex ante policies (e.g., licensing requirements) and ex post policies (e.g., tort liability).

B. Ex Post Policies

We begin our discussion with ex post policies. Many scholars believe that ex post policies are generally preferable to ex ante policies.[ref 396] While we think that ex post policies could have an important role to play in implementing LFAI, we also suspect that they will be inadequate in certain contexts.

Enforcing duties through ex post liability rules is, of course, familiar in both common law[ref 397] and regulation.[ref 398] In the LFAI context, ex post policies would impose liability on an actor after an AI henchman over which they had some form of control violates an applicable legal duty. More and less aggressive ex post approaches are conceivable. On the less aggressive end of the spectrum, development, possession, deployment, or use of an AI henchman might be considered a breach of the tort duty of reasonable care, rendering the human actor liable for resulting injuries.[ref 399] To some extent, this may already be the case under existing tort law.[ref 400] The law might also consider extending the negligence liability of an AI developer or deployer to harms that would not typically be compensable under traditional tort principles (because, for example, they would count as pure economic loss),[ref 401] if those harms are produced by their AI agents acting in criminal or otherwise unlawful ways.[ref 402]

Other innovations in tort law may also be warranted. Several scholars have argued, for example, that the principal of an AI agent should sometimes be held strictly liable for the “torts” of that agent, under a respondeat superior theory.[ref 403] In some cases, such as when a developer has recklessly failed to ensure that its AI agent is law-following by design, punitive damages might be appropriate as well.

Moving beyond tort law, in some cases it may make sense to impose civil sanctions[ref 404] when an AI henchman violates an applicable legal duty, even if no harm results. A legislature might also impose tort liability on the developers of AI agents if those AI agents (a) are not law-following, (b) violate an applicable legal duty, and (c) thereby cause harm.[ref 405]

In order to sufficiently disincentivize the deployment of lawless AI agents in high-stakes contexts, a legislature might also vary applicable immunity rules. For example, Congress could create a distinct cause of action against the federal government for individuals harmed by AI henchmen under the control of the federal government, taking care to remove barriers that various immunity rules pose to analogous suits against human agents.[ref 406]

These and other imaginable ex post policies are important arrows in the regulatory quiver, and we suspect they will have an important role to play in advancing LFAI. Nevertheless, we would resist any suggestion that ex post sanctions are sufficient to deal with the specter of lawless AI agents.

Our reasons are multiple. In many contexts, detecting lawless behavior once an AI agent has been deployed will be difficult or costly—especially as these systems become more sophisticated and more capable of deceptive behavior.[ref 407] Proving causation may also be difficult.[ref 408] In the case of corporate actors, meanwhile, the efficacy of such sanctions may be seriously blunted by judgment-proofing and similar phenomena.[ref 409] And, most importantly for our purposes, various immunities and indemnities make tort suits against the government or its officials a weak incentive.[ref 410] These considerations suggest that it would be unwise to rely on ex post policies as our principal means for ensuring that AI agents follow the law when the risks from lawless action are particularly high.

C. Ex Ante Policies

Accordingly, we propose that, in some high-stakes contexts, the law should take a more proactive approach, by preventing the deployment of AI henchmen ab initio. This would likely require first establishing a technical means for evaluating whether an AI agent is sufficiently law-following,[ref 411] then requiring that any agents be so evaluated prior to deployment, with permission to deploy the agent being conditional on achieving some minimal score during that evaluation process.[ref 412]

We are most enthusiastic about imposing such requirements prior to the deployment of AI agents in government roles where lawlessness would pose a substantial risk to life, liberty, and the rule of law. We have discussed several such contexts already,[ref 413] but the exact range of contexts is worth carefully considering, and is certainly up for debate.

Ex ante strategies could also be used in the private sector, of course. One often-discussed approach is an FDA-like approval regulation regime wherein private AI developers would need to prove, to the satisfaction of some regulator, that their AI agents are safe prior to their deployment.[ref 414] The pro tanto case for requiring private actors to demonstrate that their AI agents are disposed to follow some basic set of laws is clear: the state has an interest in ensuring that its most fundamental laws are obeyed. But in a world of increasingly sophisticated artificial agents, approval regulation could—if not properly designed and sufficiently tailored—also constitute a serious incursion on innovation[ref 415] and personal liberty.[ref 416] If AI agents will be as powerful as we suspect, strictly limiting their possession could create risks of its own.[ref 417]

Accordingly, it is also worth considering ex ante regulations on private AI developers or deployers that stop short of full approval regulation. For example, the law could require the developers of AI agents to, at a minimum, disclose information[ref 418] about the law-following propensities of their systems, such as which laws (if any) their agents are instructed to follow,[ref 419] and any evaluations of how reliably their agents follow those laws.[ref 420] Similarly, the law could require developers to formulate and assess risk management frameworks that specify the precautionary measures they plan to undertake to ensure that the agents they develop and deploy are sufficiently law-following.[ref 421]

Overall, we are uncertain about what kinds of ex ante requirements are warranted, all things considered, in the case of private actors. To a large extent, the issue cannot be intelligently addressed without more specific proposals. Formulating such proposals is thus an urgent task for the LFAI research agenda, even if it is not, in our view, as urgent as the task of formulating concrete regulations for AI agents acting under color of law.

D. Other Strategies

The law does not police undesirable behavior solely by imposing sanctions. It also specifies mechanisms for nullifying the presumptive legal effect of actions that violate the law or are normatively objectionable. In private law, for example, a contract is voidable by a party if that party’s assent was “induced by either a fraudulent or a material misrepresentation by the other party upon which the [party wa]s justified in relying.”[ref 422] Nullification rules exist in public law, too. One obvious example is the ability of the judiciary to nullify laws that violate the federal Constitution.[ref 423] Or, to take another familiar example, courts applying the Administrative Procedure Act “hold unlawful and set aside” agency actions that are “arbitrary, capricious, an abuse of discretion, or otherwise not in accordance with law.”[ref 424]

Nullification rules may provide a promising legal strategy for policing behavior by AI agents that is unlawful or normatively objectionable. Thus, in private law, if an AI henchman induces a human counterparty to enter into a disadvantageous contract, the resulting contractual obligation could be voidable by the human. In public law, regulatory directives issued by (or substantially traceable to) AI henchmen could be “h[e]ld unlawful and set aside” as “not in accordance with law.”[ref 425] These examples rely on existing nullification rules, but new nullification rules, tailor-made to address new risks from AI agents, might be warranted as well. For example, Congress could stipulate that any official action taken by or substantially traceable to an AI agent is void unless, before deployment, the agent has been shown to be law-following.

Such prophylactic nullification rules are one sort of indirect legal mechanism for enforcing the duty to deploy law-following AIs. Indirect technical mechanisms are well worth considering, too. For example, the government could deploy AI agents that refuse to coordinate or transact with other AI agents unless those counterparty agents are verifiably law-following (for example, by virtue of having “agent IDs”[ref 426] that attest to a minimal standard of performance on law-following benchmarks).
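To make the agent-ID idea slightly more concrete, the following is a minimal sketch, not a description of any existing system. It assumes a hypothetical certifying body that signs each agent's name and benchmark score, and a government agent that refuses to transact unless the signature verifies and the score clears a hypothetical threshold. The names `AgentID`, `issue_agent_id`, `will_transact_with`, `CERTIFIER_KEY`, and `MIN_BENCHMARK_SCORE` are all illustrative assumptions. For brevity the sketch uses a shared-secret HMAC; a realistic scheme would more plausibly use public-key signatures, so that verifiers could not themselves mint agent IDs.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# Hypothetical values for illustration only; none of these come from the Article.
CERTIFIER_KEY = b"demo-certifier-secret"
MIN_BENCHMARK_SCORE = 0.95


@dataclass(frozen=True)
class AgentID:
    agent_name: str
    benchmark_score: float  # score on a hypothetical law-following benchmark
    signature: str          # certifier's HMAC over the two fields above


def _sign(agent_name: str, benchmark_score: float) -> str:
    # Canonical encoding of the attested fields, then HMAC-SHA256 over it.
    payload = json.dumps(
        {"agent_name": agent_name, "benchmark_score": benchmark_score},
        sort_keys=True,
    ).encode("utf-8")
    return hmac.new(CERTIFIER_KEY, payload, hashlib.sha256).hexdigest()


def issue_agent_id(agent_name: str, benchmark_score: float) -> AgentID:
    """Certifier side: attest to an agent's benchmark score."""
    return AgentID(agent_name, benchmark_score, _sign(agent_name, benchmark_score))


def will_transact_with(counterparty: AgentID) -> bool:
    """Government-agent side: refuse unless the ID verifies and the score suffices."""
    expected = _sign(counterparty.agent_name, counterparty.benchmark_score)
    if not hmac.compare_digest(expected, counterparty.signature):
        return False  # attestation was forged or altered
    return counterparty.benchmark_score >= MIN_BENCHMARK_SCORE
```

Under these assumptions, an agent with a genuine but low-scoring ID is refused, as is an agent that edits its own score after certification, because the altered fields no longer match the certifier's signature.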

Similarly, the government could enforce LFAI by regulating the hardware on which AI agents will typically operate. Frontier AI systems “run” on specialized AI chips,[ref 427] which are typically aggregated in large data centers.[ref 428] Collectively, these are referred to as “AI hardware” or simply “compute.”[ref 429] Compared to other inputs to AI development and deployment, AI hardware is particularly governable, given its detectability, excludability, quantifiability, and concentrated supply chain.[ref 430] Accordingly, a number of AI governance proposals advocate for imposing requirements on those making and operating AI hardware in order to regulate the behavior of the AI systems developed and deployed on that hardware.[ref 431]

One class of such proposals is “‘on-chip mechanisms’: secure physical mechanisms built directly into chips or associated hardware that could provide a platform for adaptive governance” of AI systems developed or deployed on those chips.[ref 432] On-chip mechanisms can prevent chips from performing unauthorized computations. One example is iPhone hardware that “enable[s] Apple to exercise editorial control over which specific apps can be installed” on the phone.[ref 433] Analogously, perhaps we could design AI chips that would not support AI agents unless those agents are certified as law-following by some private or governmental certifying body. This could then be combined with other strategies to enforce LFAI mandates: for example, perhaps Congress could require that the government only run AI agents on such chips.

Unsurprisingly, designing these sorts of enforcement strategies is as much a task for computer scientists as it is for lawyers. In the decades to come, we suspect that such interdisciplinary legal scholarship will become increasingly important.

VI. A Research Agenda for Law-Following AI

We have laid out the case for LFAI: the requirement that AI agents be designed to rigorously follow some set of laws. We hope that our readers find it compelling. However, our goal with this Article is not just to proffer a compelling idea. If we are correct about the impending risks from lawless AI agents, we may soon need to translate the ideas in this Article into concrete and viable policy proposals.

Given the profound changes that widespread deployment of AI agents will bring, we are under no illusions about our ability to design perfect public policy in advance. Our goal, instead, is to enable the design of “minimally viable LFAI policy”:[ref 434] a policy or set of policies that will prevent some of the worst-case outcomes from lawless AI agents, without completely paralyzing the ability of regulated actors to experiment with AI agents. This minimally viable LFAI policy will surely be flawed in many ways, but with many of the worst-case outcomes prevented, we will hopefully have time as a society to patch remaining issues through the normal judicial and legislative means.

To that end, in this Part we briefly identify some legal questions that would need to be answered to design minimally viable LFAI policies.

1. How should “AI agent” be defined?

Our definition of “full AI agent” (an AI system “that can do anything a human can do in front of a computer”[ref 435]) is almost certainly too demanding for legal purposes, since an AI agent that can do most but not all computer-based tasks that a human can do would likely still raise most of the issues that LFAI is supposed to address. At the same time, the fact that a wide range of existing AI systems can be regarded as somewhat agentic[ref 436] means that a broad definition of “AI agent” could render relevant regulatory schemes substantially overinclusive. Different definitions are therefore necessary for legal purposes.[ref 437]

2. Which laws should an LFAI be required to follow?

Obedience to some laws is much more important than obedience to other laws. It is much more important that AI agents refrain from murder and (if acting under color of law) follow the Constitution than that they refrain from jaywalking. Indeed, requiring LFAIs to obey literally every law may very well be overly burdensome.[ref 438] In addition, we will likely need new laws to regulate the behavior of AI agents over time.

3. When an applicable law has a mental state element, how can we adjudicate whether an AI agent violated that law?

We discuss this question in Section II.B, above. It is related to the previous question, for there may be conceptual or administrative difficulties in applying certain kinds of mental state requirements to AI agents. For example, in certain contexts, it may be more difficult to determine whether an AI agent was “negligent” than to determine whether it had a relevant “intent.”

4. How should an LFAI decide whether a contemplated action is likely to violate the law?

An LFAI refrains from taking actions that it believes would violate one of the laws that it is required to follow. But of course, it is not always clear what the law requires. Furthermore, we need some way to tell whether an AI agent is making a good faith effort to follow a reasonable interpretation of the law, rather than merely offering a defense or rationalization. How, then, should an LFAI reason about what its legal obligations are?

Perhaps it should just rely on its own considered judgment, on the basis of its first-order reasoning about the content of applicable legal norms. But in certain circumstances, at least, an LFAI’s appraisal of the relevant materials might lead it to radically unorthodox legal conclusions—and a ready disposition to act on such conclusions might significantly threaten the stability of the legal order. In other cases, an LFAI might conclude that it is dealing with a case in which the law is not only “hard” to discern but genuinely indeterminate.[ref 439]

Another intuitively appealing option, therefore, might require an LFAI to follow its prediction of what a court would likely decide.[ref 440] This approach has the benefit of tying an LFAI’s legal decision-making to an existing human source of interpretative authority. Courts provide authoritative resolutions to legal disputes when the law is controversial or indeterminate. And in our legal culture, it is widely (if not universally) accepted that “[i]t is emphatically the province and duty of the judicial department to say what the law is,”[ref 441] such that judicial interpretations of the law are entitled to special solicitude by conscientious participants in legal practice, even when they are not bound by a court judgment.[ref 442]

However, a predictive approach would have important practical limitations.[ref 443] Perhaps the most important is the existence of many legal rules that bind the executive branch but are nevertheless “unlikely ever to come before a court in justiciable form.”[ref 444] It would seem difficult for an LFAI to reason about such questions using the prediction theory of law.

Even for those questions that could be decided by a court, using the prediction theory of law raises other important questions. For example, what is the AI agent allowed to assume about its own ability to influence the adjudication of legal questions? We should not want it to be able to consider that it could bribe or intimidate judges or jurors, nor that it could illegally hide evidence from the court, nor that it could commit perjury, nor that it could persuade the President to issue it a pardon.[ref 445] These may be means of swaying the outcome of a case, but they do not seem to bear on whether the conduct would actually be legal.

The issues here are difficult, but perhaps not insurmountable. After all, there are other contexts in which something like these issues arise. Consider federal courts sitting in diversity applying state substantive law. When state court decisions provide inconclusive evidence as to the correct answer under state law, federal courts will make an “Erie guess” about how the state’s highest court would rule on the issue.[ref 446] It would clearly be inappropriate for such courts to, for example, make an Erie guess for reasons like “Justice X in the State Supreme Court, who’s the swing justice, is easily bribed . . . .”[ref 447] If an LFAI’s decision-making should sometimes involve “predicting” how an appropriate court would rule, its predictions should be similarly constrained.

5. In what contexts should the law require that AI agents be law-following?

Should all principals be prohibited from employing non-law-following AI agents? Or should such prohibitions be limited to particular principals, such as government actors?[ref 448] Or perhaps only government actors performing particularly sensitive government functions?[ref 449] In the other direction, should it be illegal to even develop or possess AI henchmen? We discuss various options in Part V, above.

6. How should a requirement that AI agents be law-following be enforced?

We discuss various options in Part V, above. As noted there, we think that reliance on ex post enforcement alone would be unwise at least in the case of AI agents performing particularly sensitive government functions.

7. How rigorously should an LFAI follow the law?

That is, when should an AI agent be capable of taking actions that it predicts may be unlawful? The answer is probably not “never,” at least with respect to some laws. We generally do not expect perfect compliance with every law,[ref 450] especially (but not only) because it can be difficult to predict how a law will apply to a given fact pattern. Furthermore, some amount of disobedience is likely necessary for the evolution of legal systems.[ref 451]

8. Would requiring AI agents controlled by the executive branch to be LFAIs impermissibly intrude on the President’s authority to interpret the law for the executive branch?

The President has the authority to promulgate interpretations of law that are binding on the executive branch (though that power is usually delegated to the Attorney General and then further delegated to the Office of Legal Counsel).[ref 452] Would that authority be incompatible with a law requiring the executive branch to deploy LFAIs that would, in certain circumstances, refuse to follow an interpretation of the law promulgated by the President?

9. Does the First Amendment limit the ability of LFAI to prohibit AI agents from advising on lawbreaking activity?

For example, would it be constitutionally permissible to prohibit an LFAI from advising on how to carry out a crime under the theory that such advising would either constitute conspiracy or incitement?

10. How can we design LFAIs and surrounding governance systems to enable the rapid discovery and remediation of loopholes or gaps in the law?

The worry here is that LFAIs, by design, will have strong incentives to discover legal ways to accomplish their goals. This may entail discovering gaps in the law that lawmakers would likely want to correct if they were aware of them, then “exploiting” those gaps before they can be “patched.”[ref 453]

11. How can we design LFAIs and surrounding governance systems to avoid excessive concentration of power?

For example, imagine that a single district court judge could change the interpretation of law as against all LFAIs. As the stakes of AI agent action rise, so will the pressure on the judiciary to wield its power to shape the behavior of LFAIs. Even assuming that all judges will continue to operate in good faith and be well-insulated from illegal or inappropriate attempts to bias their rulings, such a system would amplify any idiosyncratic legal philosophies of individual judges and may enable mistaken rulings to cause more harm than a more decentralized system would.

As an example of how such problems might be avoided, perhaps any disputes about the law governing LFAIs should be resolved in the first instance by a panel of district court judges randomly chosen from around the country. Congress has established a procedure for certain election law cases to be heard by three-judge panels, “in recognition of the fact that ‘such cases were ones of “great public concern” that require an unusual degree of “public acceptance.”’”[ref 454]

12. How can we avoid LFAIs being used for repression by authoritarian governments?

The worry here is that any AI system that rigorously follows the laws in an autocracy may become a potent tool for repression, as it could prevent people from engaging in acts of resistance or serve as a tool for mandatory surveillance and reporting of dissident activity. In other words, LFAI promotes rule of law in a republic, but in an autocracy, it may promote rule by law.

13. How can we design LFAI requirements for governments that nevertheless enable rapid adaptation of AI agents in government?

Perhaps the most significant objection to our proposal that AI agents be demonstrably law-following before their deployment in government is that such a requirement might hurt state capacity by unduly impeding the government’s ability to adopt AI in a sufficiently rapid fashion.[ref 455] We are optimistic that LFAI requirements can be designed to adequately address this concern, but that is, of course, work that remains to be done.

Conclusion

The American political tradition aspires to maintain a legal system that stands as an “impenetrable bulwark”[ref 456] against all threats—public and private, foreign and domestic—to our basic liberties. For all of the inadequacies of the American legal order, ensuring that its basic protections endure and improve over the decades and centuries to come is among our most important collective responsibilities.

Our world of increasingly sophisticated AI agents requires us to reimagine how we discharge this responsibility. Humans will no longer be the sole entities capable of reasoning about and conforming to the law. Humans and human entities are no longer, therefore, the sole appropriate target of legal commands. Indeed, at some point, AI agents may overtake humans in their capacity to reason about the law. They may also rival and overtake us in many other competencies, becoming an indispensable cognitive workforce. In the decades to come, our social and economic world may be bifurcated into parallel populations of AI agents collaborating, trading, and sometimes competing with human beings and one another.

The law must evolve to recognize this emerging reality. It must shed its operative assumption that humans are the only proper objects of legal commands. It must expect AI agents to obey the law at least as rigorously as it expects humans to—and expect humans to build AI agents that do so. If we do not transform our legal system to achieve these goals, we risk a political and social order in which our ultimate ruler is not the law,[ref 457] but the person with the largest army of AI henchmen under her control.

How to design AI whistleblower legislation

Key takeaways

The most important existing whistleblower protection law for employees at frontier AI companies is California Labor Code § 1102.5, which protects California workers from being fired or otherwise retaliated against for reporting violations of any law or regulation to the government or internally within their company.
However, there are gaps in that statute that should be addressed by future federal and/or state AI whistleblower legislation.
- Most importantly, the California statute doesn’t protect whistleblowers who disclose information about serious risks to public safety that don’t involve a violation of any existing law.
Additionally, frontier AI companies can neutralize whistleblower statutes by requiring employees to sign broad nondisclosure agreements—unless the statute in question includes a provision stating that such agreements are unenforceable.
Lawmakers should strike a balance between protecting companies’ legitimate interest in protecting their valuable trade secrets and protecting the public’s interest in public safety and effective law enforcement.
Key decision points for lawmakers designing AI whistleblower legislation include:
- Whether to establish a reporting process for disclosures (e.g. a government office charged with securely handling whistleblower disclosures or a designated hotline)
- How broad the scope of a statute’s protections should be—who should be covered, and for what kinds of disclosures
- Whether to prohibit contractual waivers of whistleblower protections, e.g. in nondisclosure agreements

If you follow the public discourse around AI governance at all (and, since you’re reading this, the odds of that are pretty good), you may have noticed that people tend to gravitate towards abstract debates about whether “AI regulation,” generally, is a good or a bad idea. The two camps were at each other’s throats in 2024 over California SB 1047, and before that bill was vetoed it wasn’t uncommon to see long arguments, ostensibly about the bill, that contained almost zero discussion of any of the actual things that the bill did.

That’s to be expected, of course. Reading statutes cover-to-cover can be a boring and confusing chore, especially if you’re not a lawyer, and it’s often reasonable to have a strong opinion on the big-picture question (“is frontier AI regulation good?”) without having similarly confident takes about the fine details of any specific proposal. But zooming in and evaluating specific proposals on their own merits has its advantages—not the least of which is that it sometimes reveals a surprising amount of consensus around certain individual policy ideas that seem obviously sensible.

One such idea is strengthening whistleblower protections for employees at frontier AI companies. Even among typically anti-regulation industry figures, whistleblower legislation has proven less controversial than one might have expected. For example, SB 53, a recent state bill that would expand the scope of the protection offered to AI whistleblowers in California, has met with approval from some prominent opponents of its vetoed predecessor, SB 1047. The Working Group on frontier AI that Governor Newsom appointed after he vetoed SB 1047 also included a section on the importance of protecting whistleblowers in its draft report.

There also seems to be some level of potential bipartisan support for whistleblower protection legislation at the federal level. Federal AI legislation has been slow in coming; hundreds of bills have been proposed, but so far nothing significant has actually been enacted. Whistleblower laws, which are plausibly useful for mitigating a wide variety of risks, minimally burdensome to industry, and easy to implement and enforce, seem like a promising place to start. And while whistleblower laws have sometimes been viewed in the past as Democrat-coded pro-labor measures, the increase in conservative skepticism of big tech companies in recent years and the highly public controversy regarding the restrictive contracts that OpenAI pressured departing employees to sign in 2024 seem to have given rise to some interest in protecting AI whistleblowers from the other side of the aisle as well.

Okay, so now you’re sold on the value of AI whistleblower legislation. Naturally, the next step is to join the growing chorus of voices desperately crying out for a medium-dive LawAI blog post explaining the scope of the protections that AI whistleblowers currently enjoy, the gaps that need to be addressed by future legislation, and the key decision points that state and federal lawmakers designing whistleblower statutes will confront. Don’t worry, we’re all over it.

1. What do whistleblower laws do?

The basic idea behind whistleblower protection laws is that employers shouldn’t be allowed to retaliate against employees who disclose important information about corporate wrongdoing through the proper channels. The core example of the kind of behavior that whistleblower laws are meant to protect is that of an employee who notices that his employer is breaking the law and reports the crime to the authorities. In that situation, it’s generally accepted that allowing the employer to fire (or otherwise retaliate against) the employee for blowing the whistle would discourage people from coming forward in the future. In other words, the public’s interest in enforcing laws justifies a bit of interference with freedom of contract in order to prevent retaliation against whistleblowers. Typically, the remedy available to a whistleblower who has been retaliated against is that they can sue the employer, or file an administrative complaint with a government agency, seeking compensation for whatever harm they’ve suffered—often in the form of a monetary payment, or being given back the job from which they were fired.

Whistleblowing can take many forms that don’t perfectly conform to that core example of an employee reporting some law violation by their employer to the government. For instance, the person reporting the violation might be an independent contractor rather than an employee, or might report some bad or dangerous action that didn’t technically violate the law, or might report their information internally within the company or to a media outlet rather than to the government. Whether these disclosures are protected by law depends on a number of factors.

2. What protections do AI whistleblowers in the U.S. currently have?

Currently, whistleblowers in the U.S. are protected (or, as the case may be, unprotected) by a patchwork of overlapping state and federal statutes, judicially created doctrines, and internal company policies. By default, private sector whistleblowers[ref 1] are not protected from retaliation by any federal statute, although they may be covered by state whistleblower protections and/or judicially created anti-retaliation doctrines. However, there are a number of industry- and subject-matter-specific federal statutes that protect certain whistleblowers from retaliation. For example, the Federal Railroad Safety Act protects railroad employees from being retaliated against for reporting violations of federal law relating to railroad safety or gross misuse of railroad-related federal funds; the Food Safety Modernization Act affords comparable protections to employees of food packing, processing, manufacturing, and transporting companies; and the Occupational Safety and Health Act prohibits employers generally from retaliating against employees for filing OSHA complaints.

The scope of the protections afforded by these statutes varies, as do the remedies that each statute provides to employees who have been retaliated against. Some only cover employees who report violations of federal laws or regulations to the proper authorities; others cover a broader range of whistleblowing activity, such as reporting dangerous conditions even when they don’t arise from any violation of a law or rule. Most allow employees who have been retaliated against either to file a complaint with OSHA or to sue the offending employer for damages in federal court, and a few even provide substantial financial incentives for whistleblowers who provide valuable information to the government.[ref 2]

Employees who aren’t covered by any federal statute may still be protected by their state’s whistleblower laws. In the context of the AI industry, the most important state is California, where most of the companies that develop frontier models are headquartered. California’s whistleblower protection statute is quite strong—it protects both public and private employees from retaliation for reporting violations of any state, federal, or local law or regulation to a government agency or internally within their company. It also prohibits employers from adopting any internal policies to prevent employees from whistleblowing. The recently introduced SB 53 would, if enacted, additionally protect employees and contractors working at frontier AI companies from retaliation for reporting information about “critical risk” from AI models.

Even when there are no applicable state or federal statutes, whistleblowers may still be protected by the “common law,” i.e., law created by judicial decisions rather than by legislation. These common law protections vary widely by state, but typically at a minimum prohibit employers from firing employees for a reason that contravenes a clearly established “public policy.”[ref 3] What exactly constitutes a clearly established public policy in a given state depends heavily on the circumstances, but whistleblowing often qualifies when it provides a public benefit, such as increasing public safety or facilitating effective law enforcement. However, it’s often difficult for a whistleblower (even with the assistance of a lawyer) to predict ex ante whether common law protections will apply because so much depends on how a particular court might apply existing law to a particular set of facts. Statutory protections are generally preferable because they provide greater certainty and can cover a broader range of socially desirable whistleblowing behavior.

3. Restrictions on whistleblowing: nondisclosure agreements and trade secrets

a. Nondisclosure and non-disparagement agreements

The existing protections discussed above are counterbalanced by two legal doctrines that can limit the applicability of anti-retaliation measures: the law of contracts and the law of trade secrets. Employers (especially in the tech industry) often require their employees to sign broad nondisclosure agreements that prohibit the employees from sharing certain confidential information outside of the company. It was this phenomenon—the use of NDAs to silence would-be whistleblowers—that first drew significant legislative and media attention to the issue of AI whistleblowing, when news broke that OpenAI had required departing employees to choose between signing contracts with broad nondisclosure and non-disparagement provisions or giving up their vested equity in the company. Essentially, the provisions would have required former employees to avoid criticizing OpenAI for the rest of their lives, even on the basis of publicly known facts, and even if they did not disclose any confidential information in doing so. In response to these provisions, a number of OpenAI employees and former employees wrote an open letter calling for a “right to warn about artificial intelligence” and had their lawyers write to the SEC arguing that OpenAI’s NDAs violated various securities laws and SEC regulations.

After news of the NDAs’ existence went public, OpenAI quickly apologized for including the problematic provisions in its exit paperwork and promised to remove the provisions from future contracts. But the underlying legal reality that allowed OpenAI to pressure employees into signing away their right to blow the whistle hasn’t changed. Typically, U.S. law assigns a great deal of value to “freedom of contract,” which means that mentally competent adults are usually allowed to sign away any rights they choose to give up unless the contract in question would violate some important public policy. Courts sometimes hold that NDAs are unenforceable against legitimate whistleblowers because of public policy considerations, but the existence of an NDA can be a powerful deterrent to a potential whistleblower even when there’s some chance that a court would refuse to enforce the contract.

By default, AI companies still have the power to prevent most kinds of whistleblowing in most jurisdictions by requiring employees to sign restrictive NDAs. And even companies that don’t specifically intend to prevent whistleblowing might take a “better safe than sorry” approach and adopt NDAs so broad and restrictive that they effectively deter whistleblowers. Of course, employees have the option of quitting rather than agreeing to sign, but very few people in the real world seriously consider doing that when they’re filling out hiring paperwork (or when they’re filling out departure paperwork and their employer is threatening to withhold their vested equity, as the case may be).

b. Trade secret law

Historically, frontier AI developers have often recognized that their work has immense public significance and that the public therefore has a strong interest in access to information about models. However, this interest is sometimes in tension with both the commercial interests of developers and the public’s interest in public safety. This tension is at the heart of the debate over open source vs. closed models, and it gave rise to the ironic closing-off of “OpenAI.”

The same tension also exists between the public’s interest in protecting whistleblowers and the interests of both companies and the public in protecting trade secrets. An overly broad whistleblower law that protected all employee disclosures related to frontier models would allow companies to steal model weights and algorithmic secrets from their competitors by simply poaching individual employees with access to the relevant information. In addition to being unfair, this would harm innovation in the long run, because a developer has less of an incentive to invest in research if any breakthroughs will shortly become available to its competitors. Furthermore, an overbroad whistleblower law might also actually create risks to public safety if it protected the public disclosure of information about dangerous capabilities that made it easier for bad actors or foreign powers to replicate those capabilities.

A “trade secret” is a piece of information, belonging to a company that makes reasonable efforts to keep it secret, that derives economic value from being kept secret. Wrongfully disclosing trade secrets is illegal under both state and federal law, and employees who disclose trade secrets can be sued or even criminally charged. Since 2016, however, the Defend Trade Secrets Act has provided immunity from both civil and criminal liability for disclosing a trade secret if the disclosure is made “(i) in confidence to a Federal, State, or local government official, either directly or indirectly, or to an attorney; and (ii) solely for the purpose of reporting or investigating a suspected violation of law.” In other words, the status quo for AI whistleblowers is essentially that they can disclose trade secret information only if the information concerns a violation of the law and only if they disclose it confidentially to the government, perhaps via a lawyer.
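The two-part test in the quoted immunity language can be sketched as a simple predicate. This is a rough illustration only: the class, field names, and recipient labels below are hypothetical, and the sketch covers nothing beyond the language quoted above (it ignores the rest of the statute and every question of proof).

```python
from dataclasses import dataclass

# Recipient categories named in the quoted text; the string labels are hypothetical.
PROTECTED_RECIPIENTS = {"government_official", "attorney"}


@dataclass(frozen=True)
class Disclosure:
    recipient: str  # e.g. "government_official", "attorney", "journalist"
    in_confidence: bool
    solely_to_report_suspected_violation: bool


def immune_under_quoted_text(d: Disclosure) -> bool:
    """Both prongs of the quoted language must be satisfied."""
    # Prong (i): made in confidence to a government official or an attorney.
    prong_i = d.in_confidence and d.recipient in PROTECTED_RECIPIENTS
    # Prong (ii): solely for reporting or investigating a suspected violation of law.
    prong_ii = d.solely_to_report_suspected_violation
    return prong_i and prong_ii


# Confidential report of a suspected violation to a regulator: covered.
print(immune_under_quoted_text(Disclosure("government_official", True, True)))   # True
# Same information handed to a journalist: not covered.
print(immune_under_quoted_text(Disclosure("journalist", True, True)))            # False
# Confidential report of a lawful but dangerous practice: not covered.
print(immune_under_quoted_text(Disclosure("government_official", True, False)))  # False
```

The third case is the one that matters most for the argument that follows: a disclosure about a serious risk that does not involve any suspected violation of law fails prong (ii), however carefully it is made.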

4. Why is it important to pass new AI whistleblower legislation?

Most of the employees working on the frontier models that are expected to generate many of the most worrying AI risks are located in California and entitled to the protection of California’s robust whistleblower statute. There are also existing common law and federal statutory protections that might prove relevant in a pinch; the OpenAI whistleblowers, for example, wrote to the SEC arguing that OpenAI’s NDAs violated the SEC’s rule against NDAs that fail to exempt reporting to the SEC about securities violations. However, there are important gaps in existing whistleblower protections that should be addressed by new federal and state legislation.

Most importantly, the existing California whistleblower statute only protects whistleblowers who report a violation of some law or regulation. But, as a number of existing federal and state laws recognize, there are times when information about significant risks to public safety or national security should be disclosed to the proper authorities even if no law has been broken. Suppose, for example, that internal safety testing demonstrates that a given model can, with a little jailbreaking, be coaxed into providing extremely effective help to a bad actor attempting to manufacture bioweapons. If an AI company chooses to deploy the model anyway, and an employee who worked on safety testing the model wants to bring the risk to the government’s attention through the proper channels, it seems obvious that they should be protected from retaliation for doing so. Unless the company’s actions violated some law or regulation, however, California’s existing whistleblower statute would not apply. To fill this gap, any federal AI whistleblower statute should protect whistleblowers who report information about significant risks from AI systems through the proper channels even if no law has been violated. California’s SB 53 would help to address this issue, but the scope of that statute is so narrow that additional protections would still be useful even if SB 53 is enacted.

Additionally, readers who followed the debate over SB 1047 may recall a number of reasons for preferring a uniform federal policy to a policy that applies only in one state, no matter how important that state is. Not every relevant company is located in California, and there’s no way of knowing for certain where all of the companies that will be important to the development of advanced AI systems in the future will be located. Federal AI whistleblower legislation, if properly scoped, would provide consistency and eliminate the need for an inconsistent patchwork of state protections.

New whistleblower legislation specifically for AI would also provide clarity to potential whistleblowers and raise the salience of AI whistleblowing. By default, many people who could come forward with potentially valuable information will not do so. Anything that reduces the level of uncertainty potential whistleblowers face and eliminates some of the friction involved in the disclosure process is likely to increase the number of whistleblowers who decide to come forward. Even an employee who would have been covered by existing California law or by common-law protections might be more likely to come forward if they saw, for example, a news item about a new statute that more clearly and precisely established protections for the kind of disclosure being contemplated. In other words, “whistleblowing systems should be universally known and psychologically easy to use – not just technically available.”

5. Key decision points for whistleblower legislation

There are also a number of other gaps in existing law that new state or federal whistleblower legislation could fill. This section discusses three of the most important decision points that lawmakers crafting state or federal AI whistleblower legislation will encounter: whether and how to include a formal reporting process, what the scope of the included protections should be, and whether to prohibit contracts that waive whistleblower protections.[ref 4]

a. Reporting process

Any federal AI whistleblower bill should include a formal reporting process for AI risks. This could take the form of a hotline or a designated government office charged with receiving, processing, and perhaps responding to AI whistleblower disclosures. Existing federal statutes that protect whistleblowers who report on hazardous conditions, such as the Federal Railroad Safety Act and the Surface Transportation Assistance Act, often direct an appropriate agency to promulgate regulations[ref 5] establishing a process by which whistleblowers can report “security problems, deficiencies, or vulnerabilities.”

The main benefit of this approach would be the creation of a convenient default avenue for reporting, but there would also be incidental benefits. For example, the existence of a formal government channel for reporting might partially address industry concerns about trade secret protection and the secure processing of sensitive information, especially if the established channel was the only legally protected avenue for reporting. Establishing a reporting process also provides some assurance to whistleblowers that the information they disclose will come to the attention of the government body best equipped to process and respond appropriately to it.[ref 6] Ideally, the agency charged with receiving reports would have preexisting experience with the secure processing of information related to AI security; if the Trump administration elects to allow the Biden administration’s reporting requirements for frontier AI developers to continue in some form, the natural choice would be whatever agency is charged with gathering and processing that information (currently the Department of Commerce’s Bureau of Industry and Security).

b. Scope of protection

Another key decision point for policymakers is the determination of the scope of the protection offered to whistleblowers—in other words, the actions and the actors that should be protected. California’s SB 53, which was clearly drafted to minimize controversy rather than to provide the most robust protection possible, only protects a whistleblower if either:

(a) the whistleblower had “reasonable cause to believe” that they were disclosing information regarding “critical risk,” defined as:

- a “foreseeable and material risk” of
- killing or seriously injuring more than 100 people or causing at least one billion dollars’ worth of damage, via
- one of four specified harm vectors: creating CBRN[ref 7] weapons, a cyberattack, loss of control, or AI model conduct with “limited human intervention” that would be a crime if committed by a human, or

(b) the whistleblower had reasonable cause to believe that their employer had “made false or misleading statements about its management of critical risk.”

This is a hard standard to meet. It’s plausible that an AI company employee could be aware of some very serious risk that didn’t threaten a full billion dollars in damage—or even a risk that did threaten hundreds of lives and billions of dollars in damages, but not through one of the four specified threat vectors—and yet not be protected under the statute. Imagine, for example, that internal safety testing at an AI lab showed that a given frontier model could, with a little jailbreaking, provide extremely effective guidance on how to build conventional explosives and use them to execute terrorist attacks. Even if the lab chose not to release this information and issued false public statements about their model’s evaluation results, any potential whistleblower would likely not be protected under SB 53 for reporting this information.
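To make the narrowness of the definition concrete, here is a minimal sketch of the “critical risk” test as summarized above. The class, field names, and harm-vector labels are hypothetical, the sketch sets aside the “reasonable cause to believe” element, and it tracks this post’s summary rather than the bill text itself.

```python
from dataclasses import dataclass

# The four harm vectors as summarized above; the string labels are hypothetical.
SPECIFIED_HARM_VECTORS = {
    "cbrn_weapons",
    "cyberattack",
    "loss_of_control",
    "criminal_conduct_with_limited_human_intervention",
}


@dataclass(frozen=True)
class Risk:
    foreseeable_and_material: bool
    people_killed_or_seriously_injured: int
    damage_usd: float
    harm_vector: str


def is_critical_risk(r: Risk) -> bool:
    """All three elements of the summarized definition must hold."""
    severe_enough = (
        r.people_killed_or_seriously_injured > 100   # "more than 100 people"
        or r.damage_usd >= 1_000_000_000             # "at least one billion dollars"
    )
    return (
        r.foreseeable_and_material
        and severe_enough
        and r.harm_vector in SPECIFIED_HARM_VECTORS
    )


# The conventional-explosives scenario: severe, but outside the four vectors.
explosives = Risk(True, 500, 5e9, "conventional_explosives")
# A comparably severe risk that runs through a listed vector.
bioweapon = Risk(True, 500, 5e9, "cbrn_weapons")
# A listed vector, but below both severity thresholds.
small_cyberattack = Risk(True, 10, 5e8, "cyberattack")

print(is_critical_risk(explosives))         # False
print(is_critical_risk(bioweapon))          # True
print(is_critical_risk(small_cyberattack))  # False
```

Because prong (b) is also framed in terms of “critical risk,” a risk that fails this test leaves a would-be whistleblower without a clear path under either prong, which is why the explosives scenario falls through.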

Compare that standard to the one in Illinois’ whistleblower protection statute, which instead protects any employee who discloses information while having a “good faith belief” that the information relates to an activity of their employer that “poses a substantial and specific danger to employees, public health, or safety.”[ref 8] This protection applies to all employees in Illinois,[ref 9] not just employees at frontier AI companies. The federal Whistleblower Protection Act, which applies to federal employees, uses a similar standard—the whistleblower must “reasonably believe” that their disclosure is evidence of a “substantial and specific danger to public health or safety.”

Both of those laws apply to a far broader category of workers than an industry-specific frontier AI whistleblower statute would, and they both allow the disclosure to be made to a relatively wide range of actors. It doesn’t seem at all unreasonable to suggest that AI whistleblower legislation, whether state or federal, should similarly protect disclosures when the whistleblower believes in good faith that they’re reporting on a “substantial and specific” potential danger to public health, public safety, or national security. If labs are worried that this might allow for the disclosure of valuable trade secrets, the protection could be limited to employees who make their reports to a designated government office or hotline that can be trusted to securely handle the information it receives.

In addition to specifying the kinds of disclosures that are protected, a whistleblower law needs to provide clarity on precisely who is entitled to receive protection for blowing the whistle. Some whistleblower laws cover only “employees,” and define that term to exclude, e.g., independent contractors and volunteers. This kind of restriction would be inadvisable in the AI governance context. Numerous proposals have been made for various kinds of independent, and perhaps voluntary, third party testing and auditing of frontier AI systems. The companies and individuals conducting those tests and audits would be well-placed to become aware of new risks from frontier models. Protecting the ability of those individuals to securely and confidentially report risk-related information to the government should be a priority. Here, the scope of California’s SB 53 seems close to ideal—it covers contractors, subcontractors, and unpaid advisors who work for a business as well as ordinary employees.

c. Prohibiting contractual waivers of whistleblower protections

The ideal AI whistleblower law would provide that its protections could not be waived by an NDA or any similar contract or policy. Without such a provision, the effectiveness of any whistleblower law could be blunted by companies requiring employees to sign a relatively standard broad NDA, even if the company didn’t specifically intend to restrict whistleblowing. While a court might hold that such an NDA was unenforceable under common law principles, the uncertainty surrounding how a given court might view a given set of circumstances means that even an unenforceable NDA might have a significant impact on the likelihood of whistleblowers coming forward.

It is possible to pass laws directly prohibiting contracts that discourage whistleblowing—the SEC, for example, often brings charges under the Securities Exchange Act against companies that require employees to sign broad nondisclosure agreements if those agreements don’t include an exception allowing whistleblowers to report information to the SEC. A less controversial approach might be to declare such agreements unenforceable; this, for example, is what 18 U.S.C. § 1514A (another federal law relating to whistleblowing in the securities context) does. California’s SB 53 and some other state whistleblower laws do something similar, but with one critical difference—they prohibit employers from adopting “any rule, regulation, or policy” preventing whistleblowing, without specifically mentioning contracts. The language in SB 53, while helpful, likely wouldn’t cover individualized nondisclosure agreements that aren’t the result of a broader company policy.[ref 10] In future state or federal legislation, it would be better to use language more like the language in 18 U.S.C. § 1514A, which states that “The rights and remedies provided for in this section may not be waived by any agreement, policy form, or condition of employment, including by a predispute arbitration agreement.”

Conclusion

Whistleblower protections for employees at frontier AI companies are a fairly hot topic these days. Numerous state bills have been introduced, and there’s a good chance that federal legislation will follow. The idea seems to have almost as much currency with libertarian-minded private governance advocates as it does with European regulators: California SB 813, the recent proposal for establishing a system of “semiprivate standards organizations” to privately regulate AI systems, would require would-be regulators to attest to their plan for “implementation and enforcement of whistleblower protections.”

There’s reasonably widespread agreement, in other words, that it’s time to enact protections for AI whistleblowers. This being the case, it makes sense for policymakers and commentators who take an interest in this sort of thing to develop some informed opinions about what whistleblower laws are supposed to do and how best to design a law that does those things.

Our view is that AI whistleblower laws are essentially an information-gathering authority—a low-cost, innovation-friendly way to tweak the incentives of people with access to important information so that they’re more likely to make disclosures that benefit the public interest. It’s plausible that, from time to time, individual workers at the companies developing transformative AI systems will become aware of important nonpublic information about risks posed by those systems. Removing obstacles to disclosing that information will, on the margin, encourage additional disclosures and benefit the public. But passing “an AI whistleblower law” isn’t enough. Anyone trying to design such a law will face a number of important decisions about how to structure the offered protections and how to balance companies’ legitimate interest in safeguarding confidential information against the public’s interest in transparency. There are better and worse ways of proceeding, in other words; the idea behind this post was to shed a bit of light on which are which.

LawAI’s comments on the Draft Report of the Joint California Policy Working Group on AI Frontier Models

At Governor Gavin Newsom’s request, a joint working group released a draft report on March 18, 2025, setting out a framework for frontier AI policy in California. Several of the staff at the Institute for Law & AI submitted comments on the draft report as it relates to their existing research. Read their comments below:

These comments were submitted to the Working Group as feedback on April 8, 2025. The opinions expressed in these comments are those of the authors and do not reflect the views of the Institute for Law & AI.

Liability and Insurance Comments

by Gabriel Weil and Mackenzie Arnold

Key Takeaways

- Insurance is a complement to, not a replacement for, clear tort liability.
- Correctly scoped, liability is compatible with innovation and well-suited to conditions of uncertainty.
- Safe harbors that limit background tort liability are a risky bet when we are uncertain about the magnitude of AI risks and have yet to identify robust mitigations.

Whistleblower Protections Comments

by Charlie Bullock and Mackenzie Arnold

Key Takeaways

- Whistleblowers should be protected for disclosing information about risks to public safety, even if no law, regulation, or company policy is violated.
- California’s existing whistleblower law already protects disclosures about companies that break the law; subsequent legislation should focus on other improvements.
- Establishing a clear reporting process or hotline will enhance the effectiveness of whistleblower protections and ensure that reports are put to good use.

Scoping and Definitions Comments

by Mackenzie Arnold and Sarah Bernardo

Key Takeaways

- Ensuring that a capable entity regularly updates what models are covered by a policy is a critical design consideration that future-proofs policies.
- Promising techniques to support updating include legislative purpose clauses, periodic reviews, designating a capable updater, and providing that updater with the information and expertise needed to do the job.
- Compute thresholds are an effective tool to right-size AI policy, but they should be paired with other tools like carve-outs, tiered requirements, multiple definitions, and exemptions to be most effective.
- Compute thresholds are an excellent initial filter to determine what models are in scope, and capabilities evaluations are a particularly promising complement.
- In choosing a definition of covered models, policymakers should consider how well the definitional elements are risk-tracking, resilient to circumvention, clear, and flexible, in addition to other factors discussed in the Report.

Draft Report of the Joint California Policy Working Group on AI Frontier Models—scoping and definitions comments

These comments on the Draft Report of the Joint California Policy Working Group on AI Frontier Models were submitted to the Working Group as feedback on April 8, 2025. The opinions expressed in these comments are those of the authors and do not reflect the views of the Institute for Law & AI.

Commendations

1. The Report correctly identifies that AI models and their risks vary significantly and thus merit different policies with different inclusion criteria.

Not all AI policies are made alike. Those that target algorithmic discrimination, for example, concern a meaningfully different subset of systems, actors, and tradeoffs than a policy that targets cybersecurity threats. What’s more, the market forces affecting these different policies vary considerably. For example, one might be far more concerned about limiting innovation in a policy context where many small startups are attempting to integrate AI into novel, high-liability-risk contexts (e.g., healthcare) and less concerned in contexts that involve a few large actors receiving large, stable investments, where the rate of tort litigation is much lower absent grievous harms (e.g., frontier model development). That’s all to say: It makes sense to foreground the need to scope AI policies according to the unique issue at hand.

2. We agree that at least some policies should squarely address foundation models as a distinct category.

Foundation models, in particular those that present the most advanced or novel capabilities in critical domains, present unique challenges that merit separate treatment. These differences emerge from the unique characteristics of the models themselves, not their creators (who vary considerably) or their users. And the potential benefits and risks that foundation models present cut across clean sectoral categories.

3. We agree that thresholds are a useful and necessary tool for tailoring laws and regulations (even if they are imperfect).

Thresholds are easy targets for criticism. After all, there is something inherently arbitrary about setting a speed limit at 65 miles per hour rather than 66. Characteristics are more often continuous than binary, so typically there isn’t a clear category shift after you cross over some talismanic number. But this issue isn’t unique to AI policy, and in every other context, government goes on nonetheless. As the Report notes, policy should be proportional in its effects and appropriately narrow in its application. Thresholds help make that possible.

4. The Report correctly acknowledges the need to update thresholds and definitional criteria over time.

We agree that specific threshold values and related definitional criteria will likely need to be updated to keep up with technological advances. Discrete, quantitative thresholds are particularly at risk of becoming obsolete. For instance, thresholds based on training compute may become obsolete due to a variety of AI developments, including improvements in compute and algorithmic efficiency, techniques such as distillation, and/or the growing impact of inference scaling. Given the competing truths that setting some threshold is necessary and that any threshold will inevitably become obsolete, ensuring that definitions can be quickly, regularly, and easily updated should be a core design consideration.

5. We agree that, at present, compute thresholds (combined with other metrics and/or thresholds) are preferable to developer-level thresholds.

Ultimately, the goal of a threshold is to set a clear, measurable, and verifiable bar that correlates with the risk or benefit the policy attempts to address. In this case, a compute threshold best satisfies those criteria—even if it is imperfect. For more discussion, see Training Compute Thresholds: Features and Functions in AI Regulation and The Role of Compute Thresholds for AI Governance.

Recommendations

1. The Report should further emphasize the centrality of updating thresholds and definitional criteria.

Updating is perhaps the most important element of an AI policy. Without it, the entire law may in short order cease to cover the conduct or systems policymakers aimed to target. We should expect this to happen by default. The error may be one of overinclusion: for example, large systems may present few or manageable risks even after a compute threshold is crossed. After some time, we will be confident that these systems do not merit special government attention and will want to remove obligations that attach to them. The error may be one of underinclusion: for example, improvements in compute or algorithmic efficiency, techniques such as distillation, and/or the growing impact of inference scaling may mean that models below the threshold merit inclusion. The error may be in both directions, a truly unfortunate but entirely plausible result. Either way, updating will be necessary for policy to remain effective.

We raise this point because without key champions, updating mechanisms will likely be left out of California AI legislation—leading to predictable policy failures. While updating has been incorporated into many laws and regulations, it was notably absent from the final draft of SB 1047 (save for an adjustment for inflation). A similar result cannot befall future bills if they are to remain effective long-term. A clear statement by the authors of the Report would go a long way toward making updating feasible in future legislation.

Recommendation: The Report should clearly state that updating is necessary for effective AI policy and explain why policy is likely to become ineffective if updating is not included. It should further point to best practices (discussed below) to address common concerns about updating.

2. The Report should highlight key barriers to effective updating and tools to manage those barriers.

Three major barriers stand in the way of effective updating. First is the concern that updating may lead to large or unpredictable changes, creating uncertainty or surprise and making it more difficult for companies to engage in long-term planning or fulfill their compliance obligations. Second, some (understandably) worry that overly broad grants of discretion to agencies to update the scope of regulation will lead to future overreach, extending powers to contexts far beyond what was originally intended by legislators. Third, state agencies may lack sufficient capacity or knowledge to effectively update definitions.

The good news: These concerns can be addressed. Establishing predictable periodic reviews, requiring specific procedures for updates, and ensuring consistent timelines can limit uncertainty. Designating a competent updater and supplying them with the resources, data, and expert consultation they need can address concerns about agency competency. And constraining the option space of future updates can limit both surprise and the risk of overreach. When legislators are worried about agency overreach, their concern is typically that the law will be altered to extend to an unexpected context far beyond what the original drafters intended—for example, using a law focused on extreme risks to regulate mundane online chatbots or in a way that increases the number of regulated models by several orders of magnitude. To combat this worry, legislators can include a purpose clause that directly states the intended scope of the law and the boundaries of future updates. For example, a purpose clause could specify that future updates extend “only to those models that represent the most advanced models to date in at least one domain or materially and substantially increase the risk of harm X.” Purpose clauses can also come in the imperative or negative. For example, “in updating the definition in Section X, Regulator Y should aim to adjust the scope of coverage to exclude models that Regulator Y confidently believes pose little or no material risk to public health and safety.”

Recommendation: The Report should highlight the need to address the risks of uncertainty, agency overreach, and insufficient agency capacity when updating the scope of legislation. It should further highlight useful techniques to manage these issues, namely, (a) including purpose clauses or limitations in the relevant definitions, (b) specifying the data, criteria, and public input to be considered in updating definitions, (c) establishing periodic reviews with predictable frequencies, specific procedures, and consistent timelines, (d) designating a competent updater that has adequate access to expertise in making their determinations, (e) ensuring sufficient capacity to carry out periodic reviews and quickly make updates outside of such reviews when necessary, and (f) providing adequate notice and opportunity for input.

3. The Report should highlight other tools beyond thresholds to narrow the scope of regulations and laws—namely, carve-outs, tiered requirements, multiple definitions, and exemption processes.

Thresholds are not the only option for narrowing the scope of a law or regulation, and highlighting other options increases the odds that a consensus will emerge. Too often, debates around the scope of AI policy get caught on whether a certain threshold is overly burdensome for a particular class of actor. But adjusting the threshold itself is often not the most effective way to limit these spillover effects. The tools below are strong complements to the recommendations currently made in the Report.

By carve-outs, we mean a full statutory exclusion from coverage (at least for purposes of these comments). Common carve-outs to consider include:

Small businesses
Startups in particularly fragile funding ecosystems, onerous regulatory environments, or high-upside sectors that merit regulatory favoritism on innovation grounds
Open-source model developers or hosts with the caveats noted below
Providers of high-volume, low-cost services that could not feasibly exist with additional regulatory costs due to their volume or margins (e.g., some chatbots)
Social service providers or governments who provide a socially valuable service at low or no cost, especially where we expect that these actors may under-adopt useful technology due to other frictions

This is not to say that these categories should always be exempt, but rather that making explicit carve-outs for these categories will often ease tensions over specific thresholds. In particular, it is worth noting that while current open-source systems are clearly net-positive according to any reasonable cost-benefit calculus, future advances could plausibly merit some regulatory oversight. For this reason, any carve-out for open-source systems should be capable of being updated if and when that balance changes, perhaps with a heightened evidentiary burden for beginning to include such systems. For example, open-source systems might be generally exempt, but a restriction may be imposed upon a showing that the open-source systems materially increase marginal risk in a specific category, that other less onerous restrictions do not adequately limit this risk, and that the restriction is narrowly tailored.

Related, but less binary, is the use of tiered requirements that impose only a subset of requirements or weaker requirements on these favored models or entities, for example, imposing certain reporting requirements on smaller entities while not requiring them to perform the same evaluations. For this reason, more legislation should likely include multiple or separate definitions of covered models to enable a more nimble, select-only-those-that-apply approach to requirements.

Another option is to create exemption processes whereby entities can be relieved of their obligations if certain criteria are met. For example, a model might be exempt from certain requirements if it has not, after months of deployment, materially contributed to a specific risk category or if the model has fallen out of use. Unlike the previous two options, these exemption processes can be tailored to case-by-case fact patterns and occur long after the legislative or regulatory process. They may also better handle harder-to-pin-down factors like whether a model creates exceptional risk. These exemption processes can vary in a few key respects, namely:

Evidentiary: Presumptive or requiring a showing of evidence
Decision maker: Self-attested, certified by a third party, or approved by a regulator
Duration: Permanent or temporary
Rigidity: Formulaic or factor-based with flexible considerations
Speed: Automatic or requiring action or review

Recommendation: The Report already mentions that exempting small businesses from regulations will sometimes be desirable. It should build on this suggestion by emphasizing the utility of carve-outs, tiered requirements, multiple definitions, and exemption processes (in addition to thresholds) to further refine the category of regulated models. It should also outline some of the common carve-out categories (noting the importance of preserving option value by ensuring that carve-outs for open-source systems can be revised and updated if the cost-benefit balance changes in the future) as well as key considerations in creating exemption processes.

4. The Report should expand on its suggestion that compute thresholds be combined with other metrics and thresholds.

It is important to provide additional detail about other metrics that could be combined with compute thresholds because this approach is promising and one of the most actionable items in the Report. We recommend capabilities thresholds as a complement to compute thresholds in order to leverage the advantages of compute that make it an excellent initial filter, while making up for its limitations with evaluations of capabilities, which are better proxies for risk and more future-proof. Other metrics could also be paired with compute thresholds in order to more closely track the desired policy outcome, such as risk thresholds or impact-level properties; however, they have practical issues, as discussed in the Report.

Recommendation: The Report should expand on its suggestion that compute thresholds be combined with other metrics and thresholds by noting that capabilities evaluations may be a particularly promising complement to compute thresholds, as they more closely correspond to risk and are more adaptable to future developments and deployment in different contexts. Other metrics could also be paired with compute thresholds in order to more closely track the desired policy outcome, such as risk evaluations or impact-level properties.
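
To make the "initial filter plus capabilities evaluation" structure concrete, the following is a minimal sketch of how a two-stage scoping test might be expressed. It is illustrative only: the threshold values, the `Model` fields, the notion of a single numeric capability score, and the rule for unevaluated models are all assumptions introduced here, not proposals drawn from the Report or from any statute.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder values for illustration only (assumptions, not recommendations).
COMPUTE_THRESHOLD_FLOP = 1e26
CAPABILITY_SCORE_THRESHOLD = 0.5


@dataclass(frozen=True)
class Model:
    """Hypothetical record describing a model for scoping purposes."""
    name: str
    training_compute_flop: float
    # Result of a hypothetical capabilities evaluation, scaled to [0, 1].
    # None means the model has not yet been evaluated.
    capability_score: Optional[float] = None


def passes_compute_filter(model: Model) -> bool:
    """Stage 1: cheap, measurable initial filter based on training compute."""
    return model.training_compute_flop >= COMPUTE_THRESHOLD_FLOP


def is_covered(model: Model) -> bool:
    """Stage 2: among models above the compute threshold, keep only those
    whose evaluated capabilities meet the capability threshold."""
    if not passes_compute_filter(model):
        return False
    if model.capability_score is None:
        # Assumed design choice: a model above the compute threshold stays
        # in scope until a capabilities evaluation has been completed.
        return True
    return model.capability_score >= CAPABILITY_SCORE_THRESHOLD
```

Under this sketch, a large model with a low evaluated capability score falls out of scope, which addresses overinclusion. A small but highly capable model is never evaluated at all, which shows why the compute threshold itself still needs updating over time.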

5. The Report should note additional definitional considerations in the list in Section 5.1—namely, risk-tracking, resilience to circumvention, clarity, and flexibility.

The Report correctly highlights three considerations that influence threshold design: determination time, measurability, and external verifiability.

Recommendation: We recommend that the Report note four additional definitional considerations, namely:

Risk-Tracking: How closely is the proxy correlated with the risks a policy looks to manage? Currently, compute correlates strongly with advanced capabilities. While there are some exceptions amongst specialized models, bigger is generally better. This remains true even after meaningful gains in inference scaling; it is true both that more inference compute leads to better results and that for any fixed amount of inference compute, a model with more training compute tends to perform better. Generally, the most compute-intensive models are the most likely to be deployed widely in new contexts and the most likely to exhibit emergent capabilities that pose unique risks. Compute is less correlated with risk than more direct measures like capabilities or risk itself, but both of these proxies are harder to measure and define.
Resilience to Circumvention: How difficult is it to game the proxy or evade its application? Thresholds that are more difficult to circumvent are more effective, while easily circumvented thresholds risk becoming useless once a few actors demonstrate the ease of circumvention. Training compute is a difficult proxy to circumvent. While a threshold that focuses solely on training compute could miss models that rely heavily on inference, training compute is still a significant contributor to the capabilities of a model. Derivative models and distillations pose a meaningful obstacle here, as policymakers must decide whether and how to cover models with similar performance but different compute inputs. Generally speaking, requirements that lead to paperwork redundancies for similar models can likely be collapsed so that only one model is governed, while rules that relate to preventing or governing specific uses or risks may need to extend to derivatives and distillations to avoid becoming ineffective.
Clarity: How certainly can a regulated party predict that they will be affected by regulation? And how quickly and clearly can regulators clarify ambiguities through interpretations and guidance? Compute thresholds are clear relative to more subjective alternatives. While there are some open questions regarding who measures and how to measure compute, order-of-magnitude differences in compute usage will typically allow actors to know whether they fall in or out of scope of a regulation.
Flexibility: Will the proxy remain accurate over time—because it remains the same, naturally adjusts, or allows for easy updating? Compute is less naturally adaptable than risk-based or capabilities-based thresholds.

For more discussion, see Training Compute Thresholds: Features and Functions in AI Regulation and The Role of Compute Thresholds for AI Governance.

Draft Report of the Joint California Policy Working Group on AI Frontier Models—whistleblower protections comments

These comments on the Draft Report of the Joint California Policy Working Group on AI Frontier Models were submitted to the Working Group as feedback on April 8, 2025. The opinions expressed in these comments are those of the authors and do not reflect the views of the Institute for Law & AI.

We applaud the Working Group’s decision to include a section on whistleblower protections. Whistleblower protections are light-touch, innovation-friendly interventions that protect employees who act in good faith, enable effective law enforcement, and facilitate government access to vital information about risks. Below, we make a few recommendations for changes that would help the Report more accurately describe the current state of whistleblower protections and more effectively inform California policy going forward.

1. Whistleblowers should be protected for disclosing risks to public safety even if no company policy is violated

The Draft Report correctly identifies the importance of protecting whistleblowers who disclose risks to public safety that don’t involve violations of existing law. However, the Draft Report seems to suggest that this protection should be limited to circumstances where risky conduct by a company “violate[s] company policies.” This would be a highly unusual limitation, and we strongly advise against including language that could be interpreted to recommend it. A whistleblower law that only applied to disclosures relating to violations of company policies would perversely discourage companies from adopting strong internal policies (such as responsible scaling policies). This would blunt the effectiveness of whistleblower protections and perhaps lead to companies engaging in riskier conduct overall.

To avoid that undesirable result, existing whistleblower laws that protect disclosures regarding risks in the absence of direct law-breaking focus on the seriousness and likelihood of the risk rather than on whether a company policy has been violated. See, for example: 5 U.S.C. § 2302(b)(8) (whistleblower must “reasonably believe” that their disclosure is evidence of a “substantial and specific danger to public health or safety”); 49 U.S.C. § 20109 (whistleblower must “report[], in good faith, a hazardous safety or security condition”); 740 ILCS 174/15 (Illinois) (whistleblower must have a “good faith belief” that disclosure relates to activity that “poses a substantial and specific danger to employees, public health, or safety.”). Many items of proposed AI whistleblower legislation in various states also recognize the importance of protecting this kind of reporting. See, for example: California SB 53 (2025–2026) (protecting disclosures by AI employees related to “critical risks”); Illinois HB 3506 (2025–2026) (similar); Colorado HB25-1212 (protecting disclosures by AI employees who have “reasonable cause to believe” the disclosure relates to activities that “pose a substantial risk to public safety or security, even if the developer is not out of compliance with any law”).

We recommend that the Report align its recommendation with these more common, existing whistleblower protections by (a) either omitting the language regarding violations of internal company policy or qualifying it to clarify that the Report is not recommending that such violations be used as a requirement for whistleblower protections to apply; and (b) explicitly referencing common language used to describe the type of disclosures that are protected even in the absence of lawbreaking.

Suggested language: “However, some actions that clearly pose serious risks to public safety may not violate any existing laws. Therefore, policymakers may consider protections that cover a broader range of activities, which may draw upon notions of ‘good faith’ reporting on risks found in other domains such as cybersecurity. One possible approach is to follow the example of the federal Whistleblower Protection Act and protect disclosures made by a person who ‘reasonably believes’ that the disclosure relates to a ‘substantial and specific danger to public health or safety.’”

2. The report’s overview of existing law should discuss California’s existing protections

The report’s overview of existing whistleblower protections makes no mention of California’s whistleblower protection law, California Labor Code § 1102.5. That law protects both public and private employees in California from retaliation for reporting violations of any state, federal, or local law or regulation to a government agency or internally within a company. It also prohibits employers from adopting any internal policies to prevent employees from whistleblowing.

This is critical context for understanding the current state of California whistleblower protections and the gaps that remain. The fact that § 1102.5 already exists and applies to California employees of AI companies means that additional laws specifically protecting AI employees from retaliation for reporting law violations would likely be redundant unless they added something new—e.g., protection for good faith disclosures relating to “substantial and specific dangers to public health or safety.”

This information could be inserted into the subsection on “applicability of existing whistleblower protections.”

Suggested language: “Under existing California law, both public and private sector employees in California are protected from retaliation for reporting violations of any state, federal, or local law or regulation to a government or law enforcement agency or internally within their company [reference].”

3. The report should highlight the importance of establishing a reporting process

Protecting good-faith whistleblowers from retaliation is only one lever to ensure that governments and the public are adequately informed of risks. Perhaps even more important is ensuring that the government of California appropriately handles that information once it is received. One promising way to facilitate the secure handling of sensitive disclosures is to create a designated government hotline or office for AI whistleblower disclosures.

This approach benefits all stakeholders:

Companies know that any sensitive business information disclosed to the government will be handled securely and appropriately and that the risk of valuable trade secrets being leaked to competitors will be minimized;
Whistleblowers receive greater assurance that the information they bring forward will actually be put to good use (justifying the reputational and personal risk they take on); and
The government of California becomes more capable of acting on the information it receives, responding to risks in a timely manner, updating its decision-making in light of new evidence, sharing information with key partners, and enforcing the law.

The Report already touches briefly on the desirability of “ensuring clarity on the process for whistleblowers to safely report information,” but a more specific and detailed recommendation would make this section of the Report more actionable. Precisely because of our uncertainty about the risks posed by future AI systems, there is great option value in building the government’s capacity to quickly, competently, and securely react to new information received through whistleblowing. By default, we might expect that no clear chain of command will exist for processing this new information, sharing it securely with key decision makers, or operationalizing it to improve decision making. This increases coordination costs and may ultimately result in critical information being underutilized or ignored.

Suggested language: “Ensuring clarity on the process for whistleblowers to safely report information can jointly advance accountability and manage countervailing interests, such as the disclosure of trade secrets or the misuse of information to compromise safety and security. One promising way to facilitate secure disclosures is to establish a secure government-run hotline or office for receiving AI whistleblower disclosures and to establish procedures for receiving, processing, sharing, and acting upon disclosures. Establishing such procedures may also increase government agencies’ ability to quickly and competently process important information and respond to emerging issues.”