Research Article | September 2024

Legal considerations for defining “frontier model”

Charlie Bullock, Suzanne Van Arsdale, Mackenzie Arnold, Cullen O'Keefe, Christoph Winter

I. Introduction

One of the few concrete proposals on which AI governance stakeholders in industry1 and government2 have mostly3 been able to agree is that AI legislation and regulation should recognize a distinct category consisting of the most advanced AI systems. The executive branch of the U.S. federal government refers to these systems, in Executive Order 14110 and related regulations, as “dual-use foundation models.”4 The European Union’s AI Act refers to a similar class of models as “general-purpose AI models with systemic risk.”5 And many researchers, as well as leading AI labs and some legislators, use the term “frontier models” or some variation thereon.6 

These phrases are not synonymous, but they are all attempts to address the same issue—namely that the most advanced AI systems present additional regulatory challenges distinct from those posed by less sophisticated models. Frontier models are expected to be highly capable across a broad variety of tasks and are also expected to have applications and capabilities that are not readily predictable prior to development, nor even immediately known or knowable after development.7 It is likely that not all of these applications will be socially desirable; some may even create significant risks for users or for the general public. 

The question of precisely how frontier models should be regulated is contentious and beyond the scope of this paper. But any law or regulation that distinguishes between “frontier models” (or “dual-use foundation models,” or “general-purpose AI models with systemic risk”) and other AI systems will first need to define the chosen term. A legal rule that applies to a certain category of product cannot be effectively enforced or complied with unless there is some way to determine whether a given product falls within the regulated category. Laws that fail to carefully define ambiguous technical terms often fail in their intended purposes, sometimes with disastrous results.8 Because the precise meaning of the phrase “frontier model” is not self-evident,9 the scope of a law or regulation that targeted frontier models without defining that term would be unacceptably uncertain. This uncertainty would impose unnecessary costs on regulated companies (who might overcomply out of an excess of caution or unintentionally undercomply and be punished for it) and on the public (from, e.g., decreased compliance, increased enforcement costs, less risk protection, and more litigation over the scope of the rule).

The task of defining “frontier model” implicates both legal and policy considerations. This paper provides a brief overview of some of the most relevant legal considerations for the benefit of researchers, policymakers, and anyone else with an interest in the topic.

II. Statutory and Regulatory Definitions

Two related types of legal definition—statutory and regulatory—are relevant to the task of defining “frontier model.” A statutory definition is a definition that appears in a statute enacted by a legislative body such as the U.S. Congress or one of the 50 state legislatures. A regulatory definition, on the other hand, appears in a regulation promulgated by a government agency such as the U.S. Department of Commerce or the California Department of Technology (or, less commonly, in an executive order).

Regulatory definitions have both advantages and disadvantages relative to statutory definitions. Legislation is generally a more difficult and resource-intensive process than agency rulemaking, with additional veto points and failure modes.10 Agencies are therefore capable of putting into effect more numerous and detailed legal rules than Congress can,11 and can update those rules more quickly and easily than Congress can amend laws.12 Additionally, executive agencies are often more capable of acquiring deep subject-matter expertise in highly specific fields than are congressional offices due to Congress’s varied responsibilities and resource constraints.13 This means that regulatory definitions can benefit from agency subject-matter expertise to a greater extent than can statutory definitions, and can also be updated far more easily and often.

The immense procedural and political costs associated with enacting a statute do, however, purchase a greater degree of democratic legitimacy and legal resiliency than a comparable regulation would enjoy. A number of legal challenges that might persuade a court to invalidate a regulatory definition would not be available for the purpose of challenging a statute.14 And since the rulemaking power exercised by regulatory agencies is generally delegated to them by Congress, most regulations must be authorized by an existing statute. A regulatory definition generally cannot eliminate or override a statutory definition15 but can clarify or interpret it. Often, a regulatory regime will include both a statutory definition and a more detailed regulatory definition for the same term.16 This can allow Congress to choose the best of both worlds, establishing a threshold definition with the legitimacy and clarity of an act of Congress while empowering an agency to issue and subsequently update a more specific and technically informed regulatory definition.

III. Existing Definitions

This section discusses five noteworthy attempts to define phrases analogous to “frontier model” from three different existing measures. Executive Order 14110 (“EO 14110”), which President Biden issued in October 2023, includes two complementary definitions of the term “dual-use foundation model.” Two definitions of “covered model” from different versions of the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act, a California bill that was recently vetoed by Governor Newsom, are also discussed, along with the EU AI Act’s definition of “general-purpose AI model with systemic risk.”

A. Executive Order 14110

EO 14110 defines “dual-use foundation model” as:

an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by:

(i) substantially lowering the barrier of entry for non-experts to design, synthesize, acquire, or use chemical, biological, radiological, or nuclear (CBRN) weapons;

(ii) enabling powerful offensive cyber operations through automated vulnerability discovery and exploitation against a wide range of potential targets of cyber attacks; or

(iii) permitting the evasion of human control or oversight through means of deception or obfuscation.

Models meet this definition even if they are provided to end users with technical safeguards that attempt to prevent users from taking advantage of the relevant unsafe capabilities.17

The executive order imposes certain reporting requirements on companies “developing or demonstrating an intent to develop” dual-use foundation models,18 and for purposes of these requirements it instructs the Department of Commerce to “define, and thereafter update as needed on a regular basis, the set of technical conditions for models and computing clusters that would be subject to the reporting requirements.”19 In other words, EO 14110 contains both a high-level quasi-statutory20 definition and a directive to an agency to promulgate a more detailed regulatory definition. The EO also provides a second definition that acts as a placeholder until the agency’s regulatory definition is promulgated:

any model that was trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 10²³ integer or floating-point operations21

Unlike the first definition, which relies on subjective evaluations of model characteristics,22 this placeholder definition provides a simple set of objective technical criteria that labs can consult to determine whether the reporting requirements apply. For general-purpose models, the sole test is whether the model was trained on computing power greater than 10²⁶ integer or floating-point operations (FLOP); only models that exceed this compute threshold23 are deemed “dual-use foundation models” for purposes of the reporting requirements mandated by EO 14110.
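
To make the structure of the placeholder concrete, the sketch below expresses its two-pronged compute test as code. This is a minimal illustration only: the numeric thresholds come from the EO, but the function and parameter names are hypothetical.

```python
# Illustrative sketch of the EO 14110 placeholder test. The thresholds
# (1e26 and 1e23 FLOP) come from the EO; the function and parameter
# names are hypothetical.

GENERAL_THRESHOLD_FLOP = 1e26       # applies to models generally
BIO_SEQUENCE_THRESHOLD_FLOP = 1e23  # applies to models trained primarily on biological sequence data


def meets_placeholder_definition(training_flop: float,
                                 primarily_bio_sequence_data: bool) -> bool:
    """Return True if a model meets the placeholder definition's compute test."""
    if primarily_bio_sequence_data:
        return training_flop > BIO_SEQUENCE_THRESHOLD_FLOP
    return training_flop > GENERAL_THRESHOLD_FLOP


# A general-purpose model trained on 5e25 FLOP falls below the threshold,
# while a biological-sequence model trained on 5e23 FLOP exceeds its
# (much lower) threshold.
print(meets_placeholder_definition(5e25, primarily_bio_sequence_data=False))  # False
print(meets_placeholder_definition(5e23, primarily_bio_sequence_data=True))   # True
```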

B. California’s “Safe and Secure Innovation for Frontier Artificial Intelligence Models Act” (SB 1047)

California’s recently vetoed “Safe and Secure Innovation for Frontier Artificial Intelligence Models Act” (“SB 1047”) focused on a category that it referred to as “covered models.”24 The version of SB 1047 passed by the California Senate in May 2024 defined “covered model” to include models meeting either of the following criteria:

(1) The artificial intelligence model was trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations.

(2) The artificial intelligence model was trained using a quantity of computing power sufficiently large that it could reasonably be expected to have similar or greater performance as an artificial intelligence model trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations in 2024 as assessed using benchmarks commonly used to quantify the general performance of state-of-the-art foundation models.25

This definition resembles the placeholder definition in EO 14110 in that it primarily consists of a training compute threshold of 10²⁶ FLOP. However, SB 1047 added an alternative capabilities-based threshold to capture future models which “could reasonably be expected” to be as capable as models trained on 10²⁶ FLOP in 2024. This addition was intended to “future-proof”26 SB 1047 by addressing one of the main disadvantages of training compute thresholds—their tendency to become obsolete over time as advances in algorithmic efficiency produce highly capable models trained on relatively small amounts of compute.27

Following pushback from stakeholders who argued that SB 1047 would stifle innovation,28 the bill was amended repeatedly in the California State Assembly. The final version defined “covered model” in the following way:

(A) Before January 1, 2027, “covered model” means either of the following:

(i) An artificial intelligence model trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations, the cost of which exceeds one hundred million dollars29 ($100,000,000) when calculated using the average market prices of cloud compute at the start of training as reasonably assessed by the developer.

(ii) An artificial intelligence model created by fine-tuning a covered model using a quantity of computing power equal to or greater than three times 10²⁵ integer or floating-point operations, the cost of which, as reasonably assessed by the developer, exceeds ten million dollars ($10,000,000) if calculated using the average market price of cloud compute at the start of fine-tuning.

(B) (i) Except as provided in clause (ii), on and after January 1, 2027, “covered model” means any of the following:

(I) An artificial intelligence model trained using a quantity of computing power determined by the Government Operations Agency pursuant to Section 11547.6 of the Government Code, the cost of which exceeds one hundred million dollars ($100,000,000) when calculated using the average market price of cloud compute at the start of training as reasonably assessed by the developer.

(II) An artificial intelligence model created by fine-tuning a covered model using a quantity of computing power that exceeds a threshold determined by the Government Operations Agency, the cost of which, as reasonably assessed by the developer, exceeds ten million dollars ($10,000,000) if calculated using the average market price of cloud compute at the start of fine-tuning.

(ii) If the Government Operations Agency does not adopt a regulation governing subclauses (I) and (II) of clause (i) before January 1, 2027, the definition of “covered model” in subparagraph (A) shall be operative until the regulation is adopted.

This new definition was more complex than its predecessor. Subsection (A) introduced an initial definition slated to apply until at least 2027, which relied on a training compute threshold of 10²⁶ FLOP paired with a training cost floor of $100,000,000.30 Subsection (B), in turn, provided for the eventual replacement of the training compute thresholds used in the initial definition with new thresholds to be determined (and presumably updated) by a regulatory agency.

The most significant change in the final version of SB 1047’s definition was the replacement of the capability threshold with a $100,000,000 cost threshold. Because it would currently cost more than $100,000,000 to train a model using >10²⁶ FLOP, the addition of the cost threshold did not change the scope of the definition in the short term. However, the cost of compute has historically fallen precipitously over time in accordance with Moore’s law.31 This may mean that models trained using significantly more than 10²⁶ FLOP will cost significantly less than the inflation-adjusted equivalent of 100 million 2024 dollars to create at some point in the future.
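
As a purely illustrative calculation (the starting cost and rate of price decline below are assumptions, not figures from the bill or from market data), the following sketch shows how a fixed 10²⁶ FLOP training run could eventually fall below a fixed $100 million floor if compute prices keep falling:

```python
# Purely illustrative: assumes a 1e26 FLOP training run costs $150M in
# 2024 and that the effective price of compute halves every 2.5 years.
# Neither figure comes from SB 1047; they only show how a fixed dollar
# floor interacts with a fixed compute threshold over time.

ASSUMED_2024_COST_USD = 150e6       # hypothetical cost of a 1e26 FLOP run in 2024
ASSUMED_HALVING_PERIOD_YEARS = 2.5  # hypothetical price-halving period
COST_FLOOR_USD = 100e6              # SB 1047's statutory cost threshold


def projected_cost(years_after_2024: float) -> float:
    """Projected cost of a 1e26 FLOP run under the assumed price decline."""
    return ASSUMED_2024_COST_USD / 2 ** (years_after_2024 / ASSUMED_HALVING_PERIOD_YEARS)


for year in range(0, 9, 2):
    cost = projected_cost(year)
    print(f"2024 + {year} years: ~${cost / 1e6:.0f}M "
          f"(exceeds cost floor: {cost > COST_FLOOR_USD})")
# Under these assumptions, the same 1e26 FLOP run stops satisfying the
# cost prong within a few years, narrowing the definition over time.
```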

The old capability threshold expanded the definition of “covered model” because it was an alternative to the compute threshold—models that exceeded either of the two thresholds would have been “covered.” The newer cost threshold, on the other hand, restricted the scope of the definition because it was linked conjunctively to the compute threshold, meaning that only models that exceed both thresholds were covered. In other words, where the May 2024 definition of “covered model” future-proofed itself against the risk of becoming underinclusive by including highly capable low-compute models, the final definition instead guarded against the risk of becoming overinclusive by excluding low-cost models trained on large amounts of compute. Furthermore, the final cost threshold was baked into the bill text and could only have been changed by passing a new statute—unlike the compute threshold, which could have been specified and updated by a regulator. 
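
The logical difference between the two versions can be stated compactly in code. In the sketch below (illustrative only; the capability prong is reduced to a hypothetical boolean flag rather than an actual benchmark evaluation), the May 2024 test is disjunctive and the final test is conjunctive:

```python
# Sketch of the two SB 1047 structures. "meets_2024_reference_performance"
# stands in for the May 2024 capability prong ("similar or greater
# performance" than a model trained on >1e26 FLOP in 2024); it is a
# hypothetical flag here, not a real evaluation.


def covered_may_2024(training_flop: float, meets_2024_reference_performance: bool) -> bool:
    # Disjunctive: exceeding either prong is enough.
    return training_flop > 1e26 or meets_2024_reference_performance


def covered_final(training_flop: float, training_cost_usd: float) -> bool:
    # Conjunctive: both the compute threshold and the cost floor must be met.
    return training_flop > 1e26 and training_cost_usd > 100e6


# A hypothetical future model trained on 2e26 FLOP for only $60M (cheap
# compute): covered under the May 2024 version, excluded under the final one.
print(covered_may_2024(2e26, meets_2024_reference_performance=True))  # True
print(covered_final(2e26, training_cost_usd=60e6))                    # False
```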

Compared with the overall definitional scheme in EO 14110, SB 1047’s definition was simpler, easier to operationalize, and less flexible. SB 1047 lacked a broad, high-level risk-based definition like the first definition in EO 14110. SB 1047 did resemble EO 14110 in its use of a “placeholder” definition, but where EO 14110 confers broad discretion on the regulator to choose the “set of technical conditions” that will comprise the regulatory definition, SB 1047 only authorized the regulator to set and adjust the numerical value of the compute thresholds in an otherwise rigid statutory definition.

C. EU Artificial Intelligence Act

The EU AI Act classifies AI systems according to the risks they pose. It prohibits systems that do certain things, such as exploiting the vulnerabilities of elderly or disabled people,32 and regulates but does not ban so-called “high-risk” systems.33 While this classification system does not map neatly onto U.S. regulatory efforts, the EU AI Act does include a category conceptually similar to the EO’s “dual-use foundation model”: the “general-purpose AI model with systemic risk.”34 The statutory definition for this category includes a given general-purpose model35 if:

a. it has high impact capabilities37 evaluated on the basis of appropriate technical tools and methodologies, including indicators and benchmarks; [or]

b. based on a decision of the Commission,38 ex officio or following a qualified alert from the scientific panel, it has capabilities or an impact equivalent to those set out in point (a) having regard to the criteria set out in Annex XIII.

Additionally, models are presumed to have “high impact capabilities” if they were trained on >10²⁵ FLOP.38 The seven “criteria set out in Annex XIII” to be considered in evaluating model capabilities include a variety of technical inputs (such as the model’s number of parameters and the size or quality of the dataset used in training the model), the model’s performance on benchmarks and other capabilities evaluations, and other considerations such as the number of users the model has.39 When necessary, the European Commission is authorized to amend the compute threshold and “supplement benchmarks and indicators” in response to technological developments, such as “algorithmic improvements or increased hardware efficiency.”40
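
Schematically, this classification logic might be sketched as follows (the 10²⁵ FLOP presumption is taken from the Act; the function name and the boolean stand-ins for the capabilities assessment and the Commission designation are hypothetical simplifications):

```python
# Illustrative simplification of the EU AI Act's classification scheme.
# The 1e25 FLOP presumption comes from the Act; the boolean stand-ins
# for the capabilities assessment and the Commission designation are
# hypothetical placeholders for much richer processes.

PRESUMPTION_THRESHOLD_FLOP = 1e25


def has_systemic_risk(training_flop: float,
                      high_impact_per_assessment: bool,
                      designated_by_commission: bool) -> bool:
    # Training compute above the threshold triggers a presumption of
    # "high impact capabilities"; an assessment-based finding or a
    # Commission designation can also bring a model into the category.
    presumed_high_impact = training_flop > PRESUMPTION_THRESHOLD_FLOP
    return presumed_high_impact or high_impact_per_assessment or designated_by_commission


# A model trained on 3e25 FLOP is presumptively in scope even without a
# benchmark finding or Commission decision.
print(has_systemic_risk(3e25, high_impact_per_assessment=False, designated_by_commission=False))  # True
```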

The EU Act definition resembles the initial, broad definition in the EO in that they both take diverse factors like the size and quality of the dataset used to train the model, the number of parameters, and the model’s capabilities into account. However, the EU Act definition is likely much broader than either EO definition. The training compute threshold in the EU Act is sufficient, but not necessary, to classify models as systemically risky, whereas the (much higher) threshold in the EO’s placeholder definition is both necessary and sufficient. And the first EO definition includes only models that exhibit a high level of performance on tasks that pose serious risks to national security, while the EU Act includes all general-purpose models with “high impact capabilities,” which it defines as including any model trained on more than 10²⁵ FLOP.

The EU Act definition resembles the final SB 1047 definition of “covered model” in that both definitions authorize a regulator to update their thresholds in response to changing circumstances. It also resembles SB 1047’s May 2024 definition in that both definitions incorporate a training compute threshold and a capabilities-based element.

IV. Elements of Existing Definitions

As the examples discussed above demonstrate, legal definitions of “frontier model” can consist of one or more of a number of criteria. This section discusses a few of the most promising definitional elements.

A. Technical inputs and characteristics

A definition may classify AI models according to their technical characteristics or the technical inputs used in training the model, such as training compute, parameter count, and dataset size and type. These elements can be used in either statutory or regulatory definitions.

Training compute thresholds are a particularly attractive option for policymakers,41 as evidenced by the three examples discussed above. “Training compute” refers to the computational power used to train a model, often measured in integer or floating-point operations (OP or FLOP).42 Training compute thresholds function as a useful proxy for model capabilities because capabilities tend to increase as computational resources used to train the model increase.43

One advantage of using a compute threshold is that training compute is a straightforward metric that is quantifiable and can be readily measured, monitored, and verified.44 Because of these characteristics, determining with high certainty whether a given model exceeds a compute threshold is relatively easy. This, in turn, facilitates enforcement of and compliance with regulations that rely on a compute-based definition. Since the amount of training compute (and other technical inputs) can be estimated prior to the training run,45 developers can predict early in development whether a model will be covered.
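
One reason such estimates are possible in advance is that, for standard dense transformer training, total training compute is commonly approximated as roughly six FLOP per parameter per training token. The sketch below applies that rule of thumb to a hypothetical training plan (the parameter and token counts are invented for illustration):

```python
# Rough pre-training estimate using the common ~6 * parameters * tokens
# approximation for dense transformer training. The planned model size
# and token count below are hypothetical.

THRESHOLD_FLOP = 1e26


def estimate_training_flop(n_parameters: float, n_training_tokens: float) -> float:
    return 6 * n_parameters * n_training_tokens


planned_flop = estimate_training_flop(n_parameters=500e9, n_training_tokens=30e12)
print(f"Estimated training compute: {planned_flop:.1e} FLOP")    # 9.0e+25 FLOP
print("Exceeds 1e26 threshold:", planned_flop > THRESHOLD_FLOP)  # False
```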

One disadvantage of a compute-based definition is that compute thresholds are a proxy for model capabilities, which are in turn a proxy for risk. Definitions that make use of multiple nested layers of proxy terms in this manner are particularly prone to becoming untethered from their original purpose.46 This can be caused, for example, by the operation of Goodhart’s Law, which suggests that “when a measure becomes a target, it ceases to be a good measure.”47 Particularly problematic, especially for statutory definitions that are more difficult to update, is the possibility that a compute threshold may become underinclusive over time as improvements in algorithmic efficiency allow for the development of highly capable models trained on below-threshold levels of compute.48 This possibility is one reason why SB 1047 and the EU AI Act both supplement their compute thresholds with alternative, capabilities-based elements.

In addition to training compute, two other model characteristics correlated with capabilities are the number of model parameters49 and the size of the dataset on which the model was trained.50 Either or both of these characteristics can be used as an element of a definition. A definition can also rely on training data characteristics other than size, such as the quality or type of the data used; the placeholder definition in EO 14110, for example, contains a lower compute threshold for models “trained… using primarily biological sequence data.”51 EO 14110 requires a dual-use foundation model to contain “at least tens of billions of parameters,”52 and the “number of parameters of the model” is a criterion to be considered under the EU AI Act.53 EO 14110 specified that only models “trained on broad data” could be dual-use foundation models,54 and the EU AI Act includes “the quality or size of the data set, for example measured through tokens” as one criterion for determining whether an AI model poses systemic risks.55

Dataset size and parameter count share many of the pros and cons of training compute. Like training compute, they are objective metrics that can be measured and verified, and they serve as proxies for model capabilities.56 Training compute is often considered the best and most reliable proxy of the three, in part because it is the most closely correlated with performance and is difficult to manipulate.57 However, partially redundant backup metrics can still be useful.58 Dataset characteristics other than size are typically less quantifiable and harder to measure but are also capable of capturing information that the quantifiable metrics cannot. 

B. Capabilities 

Frontier models can also be defined in terms of their capabilities. A capabilities-based definition element typically sets a threshold level of competence that a model must achieve to be considered “frontier,” either in one or more specific domains or across a broad range of domains. A capabilities-based definition can provide specific, objective criteria for measuring a model’s capabilities,59 or it can describe the capabilities required in more general terms and leave the task of evaluation to the discretion of future interpreters.60 The former approach might be better suited to a regulatory definition, especially if the criteria used will have to be updated frequently, whereas the latter approach would be more typical of a high-level statutory definition.

Basing a definition on capabilities, rather than relying on a proxy for capabilities like training compute, eliminates the risk that the chosen proxy will cease to be a good measure of capabilities over time. Therefore, a capabilities-based definition is more likely than, e.g., a compute threshold to remain robust over time in the face of improvements in algorithmic efficiency. This was the point of the May 2024 version of SB 1047’s use of a capabilities element tethered to a compute threshold (“similar or greater performance as an artificial intelligence model trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations in 2024”)—it was an attempt to capture some of the benefits of an input-based definition while also guarding against the possibility that models trained on less than 10²⁶ FLOP may become far more capable in the future than they are in 2024.
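
Operationalizing a capability prong of this kind would require some concrete decision rule. The sketch below is one hypothetical way to do it, comparing a candidate model’s benchmark scores against those of a 2024 reference model trained on more than 10²⁶ FLOP; the benchmark names, the reference scores, and the “match or beat the reference on every benchmark” rule are all assumptions, not anything specified in the bill.

```python
# Hypothetical sketch of a benchmark-based capability prong. The
# benchmark names, reference scores, and the decision rule (match or
# beat the reference on every listed benchmark) are all illustrative
# assumptions, not anything drawn from SB 1047.

REFERENCE_2024_SCORES = {  # scores of a hypothetical >1e26 FLOP model trained in 2024
    "general_knowledge": 0.86,
    "coding": 0.74,
    "reasoning": 0.81,
}


def similar_or_greater_performance(candidate_scores: dict[str, float]) -> bool:
    return all(candidate_scores.get(name, 0.0) >= reference
               for name, reference in REFERENCE_2024_SCORES.items())


print(similar_or_greater_performance(
    {"general_knowledge": 0.88, "coding": 0.75, "reasoning": 0.83}))  # True
```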

However, capabilities are far more difficult than compute to accurately measure. Whether a model has demonstrated “high levels of performance at tasks that pose a serious risk to security” under the EO’s broad capabilities-based definition is not something that can be determined objectively and to a high degree of certainty like the size of a dataset in tokens or the total FLOP used in a training run. Model capabilities are often measured using benchmarks (standardized sets of tasks or questions),61 but creating benchmarks that accurately measure the complex and diverse capabilities of general-purpose foundation models62 is notoriously difficult.63

Additionally, model capabilities (unlike the technical inputs discussed above) are generally not measurable until after the model has been trained.64 This makes it difficult to regulate the development of frontier models using capabilities-based definitions, although post-development, pre-release regulation is still possible.

C. Risk

Some researchers have suggested the possibility of defining frontier AI systems on the basis of the risks they pose to users or to public safety instead of or in addition to relying on a proxy metric, like capabilities, or a proxy for a proxy, such as compute.65 The principal advantage of this direct approach is that it can, in theory, allow for better-targeted regulations—for instance, by allowing a definition to exclude highly capable but demonstrably low-risk models. The principal disadvantage is that measuring risk is even more difficult than measuring capabilities.66 The science of designing rigorous safety evaluations for foundation models is still in its infancy.67 

Of the three real-world measures discussed in Section III, only EO 14110 mentions risk directly. The broad initial definition of “dual-use foundation model” includes models that exhibit “high levels of performance at tasks that pose a serious risk to security,” such as “enabling powerful offensive cyber operations through automated vulnerability discovery” or making it easier for non-experts to design chemical weapons. This is a capability threshold combined with a risk threshold; the tasks at which a dual-use foundation model must be highly capable are those that pose a “serious risk” to security, national economic security, and/or national public health or safety. As EO 14110 shows, risk-based definition elements can specify the type of risk that a frontier model must create instead of addressing the severity of the risks created. 

D. Epistemic elements

One of the primary justifications for recognizing a category of “frontier models” is the likelihood that broadly capable AI models that are more advanced than previous generations of models will have capabilities and applications that are not readily predictable ex ante.68 As the word “frontier” implies, lawmakers and regulators focusing on frontier models are interested in targeting models that break new ground and push into the unknown.69 This was, at least in part, the reason for the inclusion of training compute thresholds of 10²⁶ FLOP in EO 14110 and SB 1047—since the most capable current models were trained on 5×10²⁵ or fewer FLOP,70 a model trained on 10²⁶ FLOP would represent a significant step forward into uncharted territory.

While it is possible to target models that advance the state of the art by setting and adjusting capability or compute thresholds, a more direct alternative approach would be to include an epistemic element in a statutory definition of “frontier model.” An epistemic element would distinguish between “known” and “unknown” models, i.e., between well-understood models that pose only known risks and poorly understood models that may pose unfamiliar and unpredictable risks.71 

This kind of distinction between known and unknown risks has a long history in U.S. regulation.72 For instance, the Toxic Substances Control Act (TSCA) prohibits the manufacturing of any “new chemical substance” without a license.73 The EPA keeps and regularly updates a list of chemical substances which are or have been manufactured in the U.S., and any substance not included on this list is “new” by definition.74 In other words, the TSCA distinguishes between chemicals (including potentially dangerous chemicals) that are familiar to regulators and unfamiliar chemicals that pose unknown risks. 

One advantage of an epistemic element is that it allows a regulator to address “unknown unknowns” separately from better-understood risks that can be evaluated and mitigated more precisely.75 Additionally, the scope of an epistemic definition, unlike that of most input- and capability-based definitions, would change over time as regulators became familiar with the capabilities of and risks posed by new models.76 Models would drop out of the “frontier” category once regulators became sufficiently familiar with their capabilities and risks.77 Like a capabilities- or risk-based definition, however, an epistemic definition might be difficult to operationalize.78 To determine whether a given model was “frontier” under an epistemic definition, it would probably be necessary to either rely on a proxy for unknown capabilities or authorize a regulator to categorize eligible models according to a specified process.79 
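
As a loose sketch of the second approach (the registry mechanism and model identifiers below are hypothetical, analogous in spirit to the TSCA inventory discussed above):

```python
# Hypothetical sketch of an epistemic element modeled loosely on the
# TSCA inventory: a regulator maintains a registry of models whose
# capabilities and risks it considers sufficiently well understood, and
# any model not on the registry is treated as "frontier" by default.

KNOWN_MODEL_REGISTRY = {"vendor-a/model-1", "vendor-b/model-2"}  # hypothetical entries


def is_frontier(model_id: str) -> bool:
    return model_id not in KNOWN_MODEL_REGISTRY


print(is_frontier("vendor-c/new-model"))  # True: unfamiliar, risks unknown
print(is_frontier("vendor-a/model-1"))    # False: drops out once familiar
```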

E. Deployment context

The context in which an AI system is deployed can serve as an element in a definition. The EU AI Act, for example, treats the number of registered end users and the number of registered EU business users a model has as factors to be considered in determining whether a model is a “general-purpose AI model with systemic risk.”80 Deployment context typically does not in and of itself provide enough information about the risks posed by a model to function as a stand-alone definitional element, but it can be a useful proxy for the kind of risk posed by a given model. Some models may cause harms in proportion to their number of users, and the justification for aggressively regulating these models grows stronger the more users they have. A model that will only be used by government agencies, or by the military, creates a different set of risks than a model that is made available to the general public.

V. Updating Regulatory Definitions

A recurring theme in the scholarly literature on the regulation of emerging technologies is the importance of regulatory flexibility.81 Because of the rapid pace of technological progress, legal rules designed to govern emerging technologies like AI tend to quickly become outdated and ineffective if they cannot be rapidly and frequently updated in response to changing circumstances.82 For this reason, it may be desirable to authorize an executive agency to promulgate and update a regulatory definition of “frontier model,” since regulatory definitions can typically be updated more frequently and more easily than statutory definitions under U.S. law.83

Historically, failing to quickly update regulatory definitions in the context of emerging technologies has often led to the definitions becoming obsolete or counterproductive. For example, U.S. export controls on supercomputers in the 1990s and early 2000s defined “supercomputer” in terms of the number of millions of theoretical operations per second (MTOPS) the computer could perform.84 Rapid advances in the processing power of commercially available computers soon rendered the initial definition obsolete, however, and the Clinton administration was forced to revise the MTOPS threshold repeatedly to avoid harming the competitiveness of the American computer industry.85 Eventually, the MTOPS metric itself was rendered obsolete, leading to a period of several years in which supercomputer export controls were ineffective at best.86

There are a number of legal considerations that may prevent an agency from quickly updating a regulatory definition and a number of measures that can be taken to streamline the process. One important aspect of the rulemaking process is the Administrative Procedure Act’s “notice and comment” requirement.87 In order to satisfy this requirement, agencies are generally obligated to publish notice of any proposed amendment to an existing regulation in the Federal Register, allow time for the public to comment on the proposal, respond to public comments, publish a final version of the new rule, and then allow at least 30–60 days before the rule goes into effect.88 From the beginning of the notice-and-comment process to the publication of a final rule, this process can take anywhere from several months to several years.89 However, an agency can waive the 30–60 day publication period or even the entire notice-and-comment requirement for “good cause” if observing the standard procedures would be “impracticable, unnecessary, or contrary to the public interest.”90 Of course, the notice-and-comment process has benefits as well as costs; public input can be substantively valuable and informative for agencies, and also increases the democratic accountability of agencies and the transparency of the rulemaking process. In certain circumstances, however, the costs of delay can outweigh the benefits. U.S. agencies have occasionally demonstrated a willingness to waive procedural rulemaking requirements in order to respond to emergency AI-related developments. The Bureau of Industry and Security (“BIS”), for example, waived the normal 30-day waiting period for an interim rule prohibiting the sale of certain advanced AI-relevant chips to China in October 2023.91  

Another way to encourage quick updating for regulatory definitions is for Congress to statutorily authorize agencies to eschew or limit the length of notice and comment, or to compel agencies to promulgate a final rule by a specified deadline.92 Because notice and comment is a statutory requirement, it can be adjusted as necessary by statute.  

For regulations exceeding a certain threshold of economic significance, another substantial source of delay is OIRA review. OIRA, the Office of Information and Regulatory Affairs, is an office within the White House that oversees interagency coordination and undertakes centralized cost-benefit analysis of important regulations.93 Like notice and comment, OIRA review can have significant benefits—such as improving the quality of regulations and facilitating interagency cooperation—but it also delays the implementation of significant rules, typically by several months.94 OIRA review can be waived either by statutory mandate or by OIRA itself.95

VI. Deference, Delegation, and Regulatory Definitions

Recent developments in U.S. administrative law may make it more difficult for Congress to effectively delegate the task of defining “frontier model” to a regulatory agency. A number of recent Supreme Court cases signal an ongoing shift in U.S. administrative law doctrine intended to limit congressional delegations of rulemaking authority.96 Whether this development is good or bad on net is a matter of perspective; libertarian-minded observers who believe that the U.S. has too many legal rules already97 and that overregulation is a bigger problem than underregulation have welcomed the change,98 while pro-regulation observers predict that it will significantly reduce the regulatory capacity of agencies in a number of important areas.99 

Regardless of where one falls on that spectrum of opinion, the relevant takeaway for efforts to define “frontier model” is that it will likely become somewhat more difficult for agencies to promulgate and update regulatory definitions without a clear statutory authorization to do so. If Congress still wishes to authorize the creation of regulatory definitions, however, it can protect agency definitions from legal challenges by clearly and explicitly authorizing agencies to exercise discretion in promulgating and updating definitions of specific terms.

A. Loper Bright and deference to agency interpretations

In a recent decision in the combined cases of Loper Bright Enterprises v. Raimondo and Relentless v. Department of Commerce, the Supreme Court overruled a longstanding legal doctrine known as Chevron deference.100 Under Chevron, federal courts were required to defer to certain agency interpretations of federal statutes when (1) the relevant part of the statute being interpreted was genuinely ambiguous and (2) the agency’s interpretation was reasonable. After Loper Bright, courts are no longer required to defer to these interpretations—instead, under a doctrine known as Skidmore deference,101 agency interpretations will prevail in court only to the extent that courts are persuaded by them.102

Justice Elena Kagan’s dissenting opinion in Loper Bright argues that the decision will harm the regulatory capacity of agencies by reducing the ability of agency subject-matter experts to promulgate regulatory definitions of ambiguous statutory phrases in “scientific or technical” areas.103 The dissent specifically warns that, after Loper Bright, courts will “play a commanding role” in resolving questions like “[w]hat rules are going to constrain the development of A.I.?”104 

Justice Kagan’s dissent probably somewhat overstates the significance of Loper Bright to AI governance for rhetorical effect.105 The end of Chevron deference does not mean that Congress has completely lost the ability to authorize regulatory definitions; where Congress has explicitly directed an agency to define a specific statutory term, Loper Bright will not prevent the agency from doing so.106 An agency’s authority to promulgate a regulatory definition under a statute resembling EO 14110, which explicitly directs the Department of Commerce to define “dual-use foundation model,” would likely be unaffected. However, Loper Bright has created a great deal of uncertainty regarding the extent to which courts will accept agency claims that Congress has implicitly authorized the creation of regulatory definitions.107 

To better understand how this uncertainty might affect efforts to define “frontier model,” consider the following real-life example. The Energy Policy and Conservation Act (“EPCA”) includes a statutory definition of the term “small electric motor.”108 Like many statutory definitions, however, this definition is not detailed enough to resolve all disputes about whether a given product is or is not a “small electric motor” for purposes of EPCA. In 2010, the Department of Energy (“DOE”), which is authorized under EPCA to promulgate energy efficiency standards governing “small electric motors,”109 issued a regulatory definition of “small electric motor” specifying that the term referred to motors with power outputs between 0.25 and 3 horsepower.110 The National Electrical Manufacturers Association (“NEMA”), a trade association of electrical equipment manufacturers, sued to challenge the rule, arguing that motors with between 1 and 3 horsepower were too powerful to be “small electric motors” and that the DOE was exceeding its statutory authority by attempting to regulate them.111

In a 2011 opinion that utilized the Chevron framework, the federal court that decided NEMA’s lawsuit considered the language of EPCA’s statutory definition and concluded that EPCA was ambiguous as to whether motors with between 1 and 3 horsepower could be “small electric motors.”112 The court then found that the DOE’s regulatory definition was a reasonable interpretation of EPCA’s statutory definition, deferred to the DOE under Chevron, and upheld the challenged regulation.113

Under Chevron, federal courts were required to assume that Congress had implicitly authorized agencies like the DOE to resolve ambiguities in a statute, as the DOE did in 2010 by promulgating its regulatory definition of “small electric motor.” After Loper Bright, courts will recognize fewer implicit delegations of definition-making authority. For instance, while EPCA requires the DOE to prescribe “testing requirements” and “energy conservation standards” for small electric motors, it does not explicitly authorize the DOE to promulgate a regulatory definition of “small electric motor.” If a rule like the one challenged by NEMA were challenged today, the DOE could still argue that Congress implicitly authorized the creation of such a rule by giving the DOE authority to prescribe standards and testing requirements—but such an argument would probably be less likely to succeed than the Chevron argument that saved the rule in 2011.

Today, a court that did not find an implicit delegation of rulemaking authority in EPCA would not defer to the DOE’s interpretation. Instead, the court would simply compare the DOE’s regulatory definition of “small electric motor” with NEMA’s proposed definition and decide which of the two was a more faithful interpretation of EPCA’s statutory definition.114 Similarly, if and when some future federal statute uses the phrase “frontier model” or any analogous term, agency attempts to operationalize the statute by enacting detailed regulatory definitions that are not explicitly authorized by the statute will be easier to challenge after Loper Bright than they would have been under Chevron.

Congress can avoid Loper Bright issues by using clear and explicit statutory language to authorize agencies to promulgate and update regulatory definitions of “frontier model” or analogous phrases. However, it is often difficult to predict in advance whether or how a statutory definition will become ambiguous over time. This is especially true in the context of emerging technologies like AI, where the rapid pace of technological development and the poorly understood nature of the technology often eventually render carefully crafted definitions obsolete.115 

Suppose, for example, that a federal statute resembling the May 2024 draft of SB 1047 were enacted. The statutory definition would include future models trained on a quantity of compute such that they “could reasonably be expected to have similar or greater performance as an artificial intelligence model trained using [>10²⁶ FLOP] in 2024.” If the statute did not contain an explicit authorization for some agency to determine the quantity of compute that qualified in a given year, any attempt to set and enforce updated regulatory compute thresholds could be challenged in court.

The enforcing agency could argue that the statute included an implied authorization for the agency to promulgate and update the regulatory definitions at issue. This argument might succeed or fail, depending on the language of the statute, the nature of the challenged regulatory definitions, and the judicial philosophy of the deciding court. But regardless of the outcome of any individual case, challenges to impliedly authorized regulatory definitions will probably be more likely to succeed after Loper Bright than they would have been under Chevron. Perhaps more importantly, agencies will be aware that regulatory definitions will no longer receive the benefit of Chevron deference and may regulate more cautiously in order to avoid being sued.116 Moreover, even if the statute did explicitly authorize an agency to issue updated compute thresholds, such an authorization might not allow the agency to respond to future technological breakthroughs by considering some factor other than the quantity of training compute used.

In other words, a narrow congressional authorization to define “frontier model” by regulation may prove insufficiently flexible after Loper Bright. Congress could attempt to address this possibility by instead enacting a very broad authorization.117 An overly broad authorization, however, may be undesirable for reasons of democratic accountability, as it would give unelected agency officials discretionary control over which models to regulate as “frontier.” Moreover, an overly broad authorization might risk running afoul of two related constitutional doctrines that limit the ability of Congress to delegate rulemaking authority to agencies—the major questions doctrine and the nondelegation doctrine.

B. The nondelegation doctrine

Under the nondelegation doctrine, which arises from the constitutional principle of separation of powers, Congress may not constitutionally delegate legislative power to executive branch agencies. In its current form, this doctrine has little relevance to efforts to define “frontier model.” Under current law, Congress can validly delegate rulemaking authority to an agency as long as the statute in which the delegation occurs includes an “intelligible principle” that provides adequate guidance for the exercise of that authority.118 In practice, this is an easy standard to satisfy—even vague and general legislative guidance, such as directing agencies to regulate in a way that “will be generally fair and equitable and will effectuate the purposes of the Act,” has been held to contain an intelligible principle.119 The Supreme Court has used the nondelegation doctrine to strike down statutes only twice, in two 1935 decisions invalidating sweeping New Deal laws.120

However, some commentators have suggested that the Supreme Court may revisit the nondelegation doctrine in the near future,121 perhaps by discarding the “intelligible principle” test in favor of something like the standard suggested by Justice Gorsuch in his 2019 dissent in Gundy v. United States.122 In Gundy, Justice Gorsuch suggested that the nondelegation doctrine, properly understood, requires Congress to make “all the relevant policy decisions” and delegate to agencies only the task of “filling up the details” via regulation.123

Therefore, if the Supreme Court does significantly strengthen the nondelegation doctrine, it is possible that a statute authorizing an agency to create a regulatory definition of “frontier model” would need to include meaningful guidance as to what the definition should look like. This is most likely to be the case if the regulatory definition in question is a key part of an extremely significant regulatory scheme, because “the degree of agency discretion that is acceptable varies according to the power congressionally conferred.”124 Congress generally “need not provide any direction” to agencies regarding the manner in which they define specific and relatively unimportant technical terms,125 but must provide “substantial guidance” for extremely important and complex regulatory tasks that could significantly impact the national economy.126

C. The major questions doctrine

Like the nondelegation doctrine, the major questions doctrine is a constitutional limitation on Congress’s ability to delegate rulemaking power to agencies. Like the nondelegation doctrine, it addresses concerns about the separation of powers and the increasingly prominent role executive branch agencies have taken on in the creation of important legal rules. Unlike the nondelegation doctrine, however, the major questions doctrine is a recent innovation. The Supreme Court acknowledged it by name for the first time in the 2022 case West Virginia v. Environmental Protection Agency,127 where it was used to strike down an EPA rule regulating power plant carbon dioxide emissions. Essentially, the major questions doctrine provides that courts will not accept an interpretation of a statute that grants an agency authority over a matter of great “economic or political significance” unless there is a “clear congressional authorization” for the claimed authority.128 Whereas the nondelegation doctrine provides a way to strike down statutes as unconstitutional, the major questions doctrine only affects the way that statutes are interpreted. 

Supporters of the major questions doctrine argue that it helps to rein in excessively broad delegations of legislative power to the administrative state and serves a useful separation-of-powers function. The doctrine’s critics, however, have argued that it limits Congress’s ability to set up flexible regulatory regimes that allow agencies to respond quickly and decisively to changing circumstances.129 According to this school of thought, requiring a clear statement authorizing each economically significant agency action inhibits Congress’s ability to communicate broad discretion in handling problems that are difficult to foresee in advance. 

This difficulty is particularly salient in the context of regulatory regimes for the governance of emerging technologies.130 Justice Kagan made this point in her dissent from the majority opinion in West Virginia, where she argued that the statute at issue was broadly worded because Congress had known that “without regulatory flexibility, changing circumstances and scientific developments would soon render the Clean Air Act obsolete.”131 Because advanced AI systems are likely to have a significant impact on the U.S. economy in the coming years,132 it is plausible that the task of choosing which systems should be categorized as “frontier” and subject to increased regulatory scrutiny will be an issue of great “economic and political significance.” If it is, then the major questions doctrine could be invoked to invalidate agency efforts to promulgate or amend a definition of “frontier model” to address previously unforeseen unsafe capabilities. 

For example, consider a hypothetical federal statute instituting a licensing regime for frontier models that includes a delegation similar to the one in EO 14110 (empowering the Bureau of Industry and Security to “define, and thereafter update as needed on a regular basis, the set of technical conditions [that determine whether a model is a frontier model]”). Suppose that BIS initially defined “frontier model” under this statute using a regularly updated compute threshold, but that ten years after the statute’s enactment a new kind of AI system was developed that could be trained to exhibit cutting-edge capabilities using a relatively small quantity of training compute. If BIS attempted to amend its regulatory definition of “frontier model” to include a capabilities threshold that would cover this newly developed and economically significant category of AI system, that new regulatory definition might be challenged under the major questions doctrine. In that situation, a court with deregulatory inclinations might not view the broad congressional authorization for BIS to define “frontier model” as a sufficiently clear statement of congressional intent to allow BIS to later institute a new and expanded licensing regime based on less objective technical criteria.133

VII. Conclusion

One of the most common mistakes that nonlawyers make when reading a statute or regulation is to assume that each word of the text carries its ordinary English meaning. This error occurs because legal rules, unlike most writing encountered in everyday life, are often written in a sort of simple code where a number of the terms in a given sentence are actually stand-ins for much longer phrases catalogued elsewhere in a “definitions” section. 

This tendency to overlook the role that definitions play in legal rules has an analogue in a widespread tendency to overlook the importance of well-crafted definitions to a regulatory scheme. The object of this paper, therefore, has been to explain some of the key legal considerations relevant to the task of defining “frontier model” or any of the analogous phrases used in existing laws and regulations. 

One such consideration is the role that should be played by statutory and regulatory definitions, which can be used independently or in conjunction with each other to create a definition that is both technically sound and democratically legitimate. Another is the selection and combination of potential definitional elements, including technical inputs, capabilities metrics, risk, deployment context, and familiarity, that can be used independently or in conjunction with each other to create a single statutory or regulatory definition. Legal mechanisms for facilitating rapid and frequent updating for regulations targeting emerging technologies also merit attention. Finally, the nondelegation and major questions doctrines and the recent elimination of Chevron deference may affect the scope of discretion that can be conferred for the creation and updating of regulatory definitions.
