The governance misspecification problem
Abstract
Legal rules promulgated to govern emerging technologies often rely on proxy terms and metrics in order to indirectly effectuate background purposes. A common failure mode for this kind of rule occurs when, due to incautious drafting or unforeseen technological developments, a proxy ceases to function as intended and renders a rule ineffective or counterproductive. Borrowing a concept from the technical AI safety literature, we call this phenomenon the “governance misspecification problem.” This article draws on existing legal-philosophical discussions of the nature of rules to define governance misspecification, presents several historical case studies to demonstrate how and why rules become misspecified, and suggests best practices for designing legal rules to avoid misspecification or mitigate its negative effects. Additionally, we examine a few proxy terms used in existing AI governance regulations, such as “frontier AI” and “compute thresholds,” and discuss the significance of the problem of misspecification in the AI governance context.
Legal considerations for defining “frontier model”
Abstract
Many proposed laws and rules for the regulation of artificial intelligence would distinguish between a category consisting of the most advanced models—often called “frontier models”—and all other AI systems. Legal rules that make this distinction will typically need to include or reference a definition of “frontier model” or whatever analogous term is used. The task of creating this definition implicates several important legal considerations. The role of statutory and regulatory definitions in the overall definitional scheme should be considered, as should the advantages and disadvantages of incorporating elements such as technical inputs, capability metrics, epistemic elements, and deployment context into a definition. Additionally, existing legal obstacles to the rapid updating of regulatory definitions should be taken into account—including recent doctrinal developments in administrative law such as the elimination of Chevron deference and the introduction of the major questions doctrine.
I. Introduction
One of the few concrete proposals on which AI governance stakeholders in industry[ref 1] and government[ref 2] have mostly[ref 3] been able to agree is that AI legislation and regulation should recognize a distinct category consisting of the most advanced AI systems. The executive branch of the U.S. federal government refers to these systems, in Executive Order 14110 and related regulations, as “dual-use foundation models.”[ref 4] The European Union’s AI Act refers to a similar class of models as “general-purpose AI models with systemic risk.”[ref 5] And many researchers, as well as leading AI labs and some legislators, use the term “frontier models” or some variation thereon.[ref 6]
These phrases are not synonymous, but they are all attempts to address the same issue—namely that the most advanced AI systems present additional regulatory challenges distinct from those posed by less sophisticated models. Frontier models are expected to be highly capable across a broad variety of tasks and are also expected to have applications and capabilities that are not readily predictable prior to development, nor even immediately known or knowable after development.[ref 7] It is likely that not all of these applications will be socially desirable; some may even create significant risks for users or for the general public.
The question of precisely how frontier models should be regulated is contentious and beyond the scope of this paper. But any law or regulation that distinguishes between “frontier models” (or “dual-use foundation models,” or “general-purpose AI models with systemic risk”) and other AI systems will first need to define the chosen term. A legal rule that applies to a certain category of product cannot be effectively enforced or complied with unless there is some way to determine whether a given product falls within the regulated category. Laws that fail to carefully define ambiguous technical terms often fail in their intended purposes, sometimes with disastrous results.[ref 8] Because the precise meaning of the phrase “frontier model” is not self-evident,[ref 9] the scope of a law or regulation that targeted frontier models without defining that term would be unacceptably uncertain. This uncertainty would impose unnecessary costs on regulated companies (who might overcomply out of an excess of caution or unintentionally undercomply and be punished for it) and on the public (from, e.g., decreased compliance, increased enforcement costs, less risk protection, and more litigation over the scope of the rule).
The task of defining “frontier model” implicates both legal and policy considerations. This paper provides a brief overview of some of the most relevant legal considerations for the benefit of researchers, policymakers, and anyone else with an interest in the topic.
II. Statutory and Regulatory Definitions
Two related types of legal definition—statutory and regulatory—are relevant to the task of defining “frontier model.” A statutory definition is a definition that appears in a statute enacted by a legislative body such as the U.S. Congress or one of the 50 state legislatures. A regulatory definition, on the other hand, appears in a regulation promulgated by a government agency such as the U.S. Department of Commerce or the California Department of Technology (or, less commonly, in an executive order).
Regulatory definitions have both advantages and disadvantages relative to statutory definitions. Legislation is generally a more difficult and resource-intensive process than agency rulemaking, with additional veto points and failure modes.[ref 10] Agencies are therefore capable of putting into effect more numerous and detailed legal rules than Congress can,[ref 11] and can update those rules more quickly and easily than Congress can amend laws.[ref 12] Additionally, executive agencies are often more capable of acquiring deep subject-matter expertise in highly specific fields than are congressional offices due to Congress’s varied responsibilities and resource constraints.[ref 13] This means that regulatory definitions can benefit from agency subject-matter expertise to a greater extent than can statutory definitions, and can also be updated far more easily and often.
The immense procedural and political costs associated with enacting a statute do, however, purchase a greater degree of democratic legitimacy and legal resiliency than a comparable regulation would enjoy. A number of legal challenges that might persuade a court to invalidate a regulatory definition would not be available for the purpose of challenging a statute.[ref 14] And since the rulemaking power exercised by regulatory agencies is generally delegated to them by Congress, most regulations must be authorized by an existing statute. A regulatory definition generally cannot eliminate or override a statutory definition[ref 15] but can clarify or interpret it. Often, a regulatory regime will include both a statutory definition and a more detailed regulatory definition for the same term.[ref 16] This can allow Congress to get the best of both worlds, establishing a threshold definition with the legitimacy and clarity of an act of Congress while empowering an agency to issue and subsequently update a more specific and technically informed regulatory definition.
III. Existing Definitions
This section discusses five noteworthy attempts to define phrases analogous to “frontier model” from three different existing measures. Executive Order 14110 (“EO 14110”), which President Biden issued in October 2023, includes two complementary definitions of the term “dual-use foundation model.” Two definitions of “covered model” from different versions of the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act, a California bill that was recently vetoed by Governor Newsom, are also discussed, along with the EU AI Act’s definition of “general-purpose AI model with systemic risk.”
A. Executive Order 14110
EO 14110 defines “dual-use foundation model” as:
an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by:
(i) substantially lowering the barrier of entry for non-experts to design, synthesize, acquire, or use chemical, biological, radiological, or nuclear (CBRN) weapons;
(ii) enabling powerful offensive cyber operations through automated vulnerability discovery and exploitation against a wide range of potential targets of cyber attacks; or
(iii) permitting the evasion of human control or oversight through means of deception or obfuscation.
Models meet this definition even if they are provided to end users with technical safeguards that attempt to prevent users from taking advantage of the relevant unsafe capabilities.[ref 17]
The executive order imposes certain reporting requirements on companies “developing or demonstrating an intent to develop” dual-use foundation models,[ref 18] and for purposes of these requirements it instructs the Department of Commerce to “define, and thereafter update as needed on a regular basis, the set of technical conditions for models and computing clusters that would be subject to the reporting requirements.”[ref 19] In other words, EO 14110 contains both a high-level quasi-statutory[ref 20] definition and a directive to an agency to promulgate a more detailed regulatory definition. The EO also provides a second definition that acts as a placeholder until the agency’s regulatory definition is promulgated:
any model that was trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 10²³ integer or floating-point operations[ref 21]
Unlike the first definition, which relies on subjective evaluations of model characteristics,[ref 22] this placeholder definition provides a simple set of objective technical criteria that labs can consult to determine whether the reporting requirements apply. For general-purpose models, the sole test is whether the model was trained on computing power greater than 10²⁶ integer or floating-point operations (FLOP); only models that exceed this compute threshold[ref 23] are deemed “dual-use foundation models” for purposes of the reporting requirements mandated by EO 14110.
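Because the placeholder test is so simple, it can be written out as a predicate. The following Python sketch is an illustrative reading and not an official implementation. The function and constant names are hypothetical, "greater than" is read as a strict inequality, and "primarily biological sequence data" is reduced to a yes/no flag, even though applying that phrase would itself require judgment.

```python
# Hypothetical encoding of EO 14110's placeholder definition of
# "dual-use foundation model" (used only for the reporting requirements).
GENERAL_THRESHOLD_FLOP = 1e26  # any model
BIO_THRESHOLD_FLOP = 1e23      # models trained primarily on biological sequence data


def eo_placeholder_covered(training_flop: float, primarily_bio_sequence_data: bool) -> bool:
    """Return True if a model exceeds either compute threshold.

    "Greater than" is read as a strict inequality, so a model trained on
    exactly 1e26 FLOP is not covered under the general prong.
    """
    if training_flop > GENERAL_THRESHOLD_FLOP:
        return True
    return primarily_bio_sequence_data and training_flop > BIO_THRESHOLD_FLOP


print(eo_placeholder_covered(2e26, False))  # True
print(eo_placeholder_covered(1e26, False))  # False
print(eo_placeholder_covered(5e23, True))   # True
```

Both inputs are known before training begins, so a developer could run a check of this kind while planning a training run. Section IV discusses the trade-offs of this design.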
B. California’s “Safe and Secure Innovation for Frontier Artificial Intelligence Models Act” (SB 1047)
California’s recently vetoed “Safe and Secure Innovation for Frontier Artificial Intelligence Models Act” (“SB 1047”) focused on a category that it referred to as “covered models.”[ref 24] The version of SB 1047 passed by the California Senate in May 2024 defined “covered model” to include models meeting either of the following criteria:
(1) The artificial intelligence model was trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations.
(2) The artificial intelligence model was trained using a quantity of computing power sufficiently large that it could reasonably be expected to have similar or greater performance as an artificial intelligence model trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations in 2024 as assessed using benchmarks commonly used to quantify the general performance of state-of-the-art foundation models.[ref 25]
This definition resembles the placeholder definition in EO 14110 in that it primarily consists of a training compute threshold of 10²⁶ FLOP. However, SB 1047 added an alternative capabilities-based threshold to capture future models which “could reasonably be expected” to be as capable as models trained on 10²⁶ FLOP in 2024. This addition was intended to “future-proof”[ref 26] SB 1047 by addressing one of the main disadvantages of training compute thresholds—their tendency to become obsolete over time as advances in algorithmic efficiency produce highly capable models trained on relatively small amounts of compute.[ref 27]
Following pushback from stakeholders who argued that SB 1047 would stifle innovation,[ref 28] the bill was amended repeatedly in the California State Assembly. The final version defined “covered model” in the following way:
(A) Before January 1, 2027, “covered model” means either of the following:
(i) An artificial intelligence model trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations, the cost of which exceeds one hundred million dollars[ref 29] ($100,000,000) when calculated using the average market prices of cloud compute at the start of training as reasonably assessed by the developer.
(ii) An artificial intelligence model created by fine-tuning a covered model using a quantity of computing power equal to or greater than three times 10²⁵ integer or floating-point operations, the cost of which, as reasonably assessed by the developer, exceeds ten million dollars ($10,000,000) if calculated using the average market price of cloud compute at the start of fine-tuning.
(B) (i) Except as provided in clause (ii), on and after January 1, 2027, “covered model” means any of the following:
(I) An artificial intelligence model trained using a quantity of computing power determined by the Government Operations Agency pursuant to Section 11547.6 of the Government Code, the cost of which exceeds one hundred million dollars ($100,000,000) when calculated using the average market price of cloud compute at the start of training as reasonably assessed by the developer.
(II) An artificial intelligence model created by fine-tuning a covered model using a quantity of computing power that exceeds a threshold determined by the Government Operations Agency, the cost of which, as reasonably assessed by the developer, exceeds ten million dollars ($10,000,000) if calculated using the average market price of cloud compute at the start of fine-tuning.
(ii) If the Government Operations Agency does not adopt a regulation governing subclauses (I) and (II) of clause (i) before January 1, 2027, the definition of “covered model” in subparagraph (A) shall be operative until the regulation is adopted.
This new definition was more complex than its predecessor. Subsection (A) introduced an initial definition slated to apply until at least 2027, which relied on a training compute threshold of 10²⁶ FLOP paired with a training cost floor of $100,000,000.[ref 30] Subsection (B), in turn, provided for the eventual replacement of the training compute thresholds used in the initial definition with new thresholds to be determined (and presumably updated) by a regulatory agency.
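The interaction between subsections (A) and (B), including the fallback in clause (B)(ii), can be sketched as a function that returns the compute thresholds that would have been operative on a given date. This is a simplified, hypothetical reading for illustration. It ignores the cost floors, and the agency-set values in the usage lines are invented placeholders, not figures the Government Operations Agency ever proposed.

```python
from datetime import date
from typing import Optional, Tuple

# Subsection (A) thresholds fixed in the bill text.
# Note: (A)(i) uses "greater than" while (A)(ii) uses "equal to or greater than".
STATUTORY_TRAINING_FLOP = 1e26
STATUTORY_FINE_TUNE_FLOP = 3e25
SWITCHOVER_DATE = date(2027, 1, 1)  # subsection (B) applies "on and after" this date


def operative_flop_thresholds(
    on: date, agency_thresholds: Optional[Tuple[float, float]]
) -> Tuple[float, float]:
    """Return the (training, fine-tuning) compute thresholds in force on `on`.

    `agency_thresholds` is None if the agency has not adopted its regulation
    as of `on`; otherwise it holds hypothetical agency-set values.
    """
    if on < SWITCHOVER_DATE or agency_thresholds is None:
        # Subsection (A) governs before 2027, and clause (B)(ii) keeps it
        # operative afterward until the regulation is adopted.
        return (STATUTORY_TRAINING_FLOP, STATUTORY_FINE_TUNE_FLOP)
    return agency_thresholds


# Hypothetical agency values, invented for illustration only.
AGENCY_VALUES = (5e25, 1e25)
print(operative_flop_thresholds(date(2026, 6, 1), AGENCY_VALUES))  # (1e+26, 3e+25)
print(operative_flop_thresholds(date(2027, 6, 1), None))           # (1e+26, 3e+25)
print(operative_flop_thresholds(date(2027, 6, 1), AGENCY_VALUES))  # (5e+25, 1e+25)
```

Under this reading, only the numerical values could move. The structure of the test and the dollar floors stayed fixed in the statute.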
The most significant change in the final version of SB 1047’s definition was the replacement of the capability threshold with a $100,000,000 cost threshold. Because it would currently cost more than $100,000,000 to train a model using more than 10²⁶ FLOP, the addition of the cost threshold did not change the scope of the definition in the short term. However, the cost of compute has historically fallen steeply over time, roughly in line with Moore’s law.[ref 31] This may mean that, at some point in the future, models trained using significantly more than 10²⁶ FLOP will cost significantly less than the inflation-adjusted equivalent of 100 million 2024 dollars to create.
The old capability threshold expanded the definition of “covered model” because it was an alternative to the compute threshold—models that exceeded either of the two thresholds would have been “covered.” The newer cost threshold, on the other hand, restricted the scope of the definition because it was linked conjunctively to the compute threshold, meaning that only models that exceed both thresholds were covered. In other words, where the May 2024 definition of “covered model” future-proofed itself against the risk of becoming underinclusive by including highly capable low-compute models, the final definition instead guarded against the risk of becoming overinclusive by excluding low-cost models trained on large amounts of compute. Furthermore, the final cost threshold was baked into the bill text and could only have been changed by passing a new statute—unlike the compute threshold, which could have been specified and updated by a regulator.
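The shift from a disjunctive to a conjunctive structure can be illustrated by encoding the training-model prongs of both versions side by side. This is a hypothetical sketch. It reduces the May 2024 capability prong to a single boolean (whether a model is expected to match, on common benchmarks, a 2024 model trained on more than 10²⁶ FLOP), it omits the fine-tuning prong, and both example models are invented.

```python
# Hypothetical encodings of two versions of SB 1047's "covered model"
# definition (training prong only; the fine-tuning prong is omitted).
COMPUTE_THRESHOLD_FLOP = 1e26
COST_THRESHOLD_USD = 100_000_000


def covered_may_2024(training_flop: float, matches_2024_frontier_on_benchmarks: bool) -> bool:
    # Disjunctive: meeting EITHER prong suffices, so the capability prong
    # can only widen coverage relative to the compute threshold alone.
    return training_flop > COMPUTE_THRESHOLD_FLOP or matches_2024_frontier_on_benchmarks


def covered_final(training_flop: float, training_cost_usd: float) -> bool:
    # Conjunctive: BOTH thresholds must be exceeded, so the cost prong
    # can only narrow coverage relative to the compute threshold alone.
    return training_flop > COMPUTE_THRESHOLD_FLOP and training_cost_usd > COST_THRESHOLD_USD


# Hypothetical model A: efficient, below the compute threshold, 2024-frontier capable.
print(covered_may_2024(5e25, True), covered_final(5e25, 20_000_000))   # True False
# Hypothetical model B: above the compute threshold but cheap to train.
print(covered_may_2024(2e26, False), covered_final(2e26, 60_000_000))  # True False
```

Each hypothetical model is covered under the May 2024 text but not under the final text. Model A is the efficient low-compute case the capability prong was meant to catch. Model B is the cheap high-compute case the cost floor excludes.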
Compared with the overall definitional scheme in EO 14110, SB 1047’s definition was simpler, easier to operationalize, and less flexible. SB 1047 lacked a broad, high-level risk-based definition like the first definition in EO 14110. SB 1047 did resemble EO 14110 in its use of a “placeholder” definition, but where EO 14110 confers broad discretion on the regulator to choose the “set of technical conditions” that will comprise the regulatory definition, SB 1047 only authorized the regulator to set and adjust the numerical value of the compute thresholds in an otherwise rigid statutory definition.
C. EU Artificial Intelligence Act
The EU AI Act classifies AI systems according to the risks they pose. It prohibits systems that do certain things, such as exploiting the vulnerabilities of elderly or disabled people,[ref 32] and regulates but does not ban so-called “high-risk” systems.[ref 33] While this classification system does not map neatly onto U.S. regulatory efforts, the EU AI Act does include a category conceptually similar to the EO’s “dual-use foundation model”: the “general-purpose AI model with systemic risk.”[ref 34] The statutory definition for this category includes a given general-purpose model[ref 35] if:
a. it has high impact capabilities[ref 36] evaluated on the basis of appropriate technical tools and methodologies, including indicators and benchmarks; [or]
b. based on a decision of the Commission,[ref 37] ex officio or following a qualified alert from the scientific panel, it has capabilities or an impact equivalent to those set out in point (a) having regard to the criteria set out in Annex XIII.
Additionally, models are presumed to have “high impact capabilities” if they were trained on more than 10²⁵ FLOP.[ref 38] The seven “criteria set out in Annex XIII” to be considered in evaluating model capabilities include a variety of technical inputs (such as the model’s number of parameters and the size or quality of the dataset used in training the model), the model’s performance on benchmarks and other capabilities evaluations, and other considerations such as the number of users the model has.[ref 39] When necessary, the European Commission is authorized to amend the compute threshold and “supplement benchmarks and indicators” in response to technological developments, such as “algorithmic improvements or increased hardware efficiency.”[ref 40]
The EU Act definition resembles the initial, broad definition in the EO in that both take into account diverse factors such as the size and quality of the dataset used to train the model, the number of parameters, and the model’s capabilities. However, the EU Act definition is likely much broader than either EO definition. The training compute threshold in the EU Act is sufficient, but not necessary, to classify models as systemically risky, whereas the (much higher) threshold in the EO’s placeholder definition is both necessary and sufficient. And the first EO definition includes only models that exhibit a high level of performance on tasks that pose serious risks to national security, while the EU Act includes all general-purpose models with “high impact capabilities,” which it presumes to include any model trained on more than 10²⁵ FLOP.
The EU Act definition resembles the final SB 1047 definition of “covered model” in that both definitions authorize a regulator to update their thresholds in response to changing circumstances. It also resembles SB 1047’s May 2024 definition in that both definitions incorporate a training compute threshold and a capabilities-based element.
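A short sketch can show the difference between a sufficient threshold and a necessary-and-sufficient one. It contrasts the EU Act's structure with the general-purpose prong of the EO's placeholder definition. The sketch simplifies in three ways: it treats the EU presumption as conclusive, it collapses the Annex XIII assessment and the Commission's designation into boolean inputs, and it uses hypothetical example models.

```python
EU_PRESUMPTION_FLOP = 1e25  # EU AI Act: presumption of high impact capabilities
EO_PLACEHOLDER_FLOP = 1e26  # EO 14110 placeholder, general-purpose prong


def eu_systemic_risk(training_flop: float, assessed_high_impact: bool,
                     commission_designated: bool) -> bool:
    # Exceeding the compute line is sufficient (via the presumption) but not
    # necessary: an assessment or a Commission decision can reach smaller models.
    high_impact = assessed_high_impact or training_flop > EU_PRESUMPTION_FLOP
    return high_impact or commission_designated


def eo_placeholder(training_flop: float) -> bool:
    # Compute is both necessary and sufficient under this prong.
    return training_flop > EO_PLACEHOLDER_FLOP


# Hypothetical model trained on 3e25 FLOP, with no separate assessment.
print(eu_systemic_risk(3e25, False, False), eo_placeholder(3e25))  # True False
# Hypothetical model trained on 5e24 FLOP but assessed as high impact.
print(eu_systemic_risk(5e24, True, False), eo_placeholder(5e24))   # True False
```

The second example shows why the EU definition is broader. A model below both compute thresholds can still be classified as systemically risky on the basis of an assessment, while the EO placeholder offers no route for reaching it.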
IV. Elements of Existing Definitions
As the examples discussed above demonstrate, legal definitions of “frontier model” can consist of one or more of a number of criteria. This section discusses a few of the most promising definitional elements.
A. Technical inputs and characteristics
A definition may classify AI models according to their technical characteristics or the technical inputs used in training the model, such as training compute, parameter count, and dataset size and type. These elements can be used in either statutory or regulatory definitions.
Training compute thresholds are a particularly attractive option for policymakers,[ref 41] as evidenced by the three examples discussed above. “Training compute” refers to the total amount of computation used to train a model, often measured in integer or floating-point operations (OP or FLOP).[ref 42] Training compute thresholds function as a useful proxy for model capabilities because capabilities tend to increase as the computational resources used to train the model increase.[ref 43]
One advantage of using a compute threshold is that training compute is a straightforward metric that is quantifiable and can be readily measured, monitored, and verified.[ref 44] Because of these characteristics, determining with high certainty whether a given model exceeds a compute threshold is relatively easy. This, in turn, facilitates enforcement of and compliance with regulations that rely on a compute-based definition. Since the amount of training compute (and other technical inputs) can be estimated prior to the training run,[ref 45] developers can predict whether a model will be covered earlier in development.
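Developers could estimate training compute before a run using a rule of thumb from the machine learning literature. Training a dense transformer takes roughly six operations per parameter per training token, so C ≈ 6·N·D, where N is the parameter count and D is the number of training tokens. None of the measures discussed above prescribes this formula, and the model sizes below are hypothetical. The sketch only illustrates why input-based thresholds can be checked in advance.

```python
def estimated_training_flop(parameters: float, training_tokens: float) -> float:
    """Approximate dense-transformer training compute as 6 * N * D.

    Rule of thumb only: it ignores architectural details and any compute
    spent on failed or exploratory runs.
    """
    return 6 * parameters * training_tokens


THRESHOLD_FLOP = 1e26

# Hypothetical planned training runs: (parameter count, training tokens).
planned_runs = [(7e10, 1.5e13), (1e12, 2e13)]
for n_params, n_tokens in planned_runs:
    flop = estimated_training_flop(n_params, n_tokens)
    print(f"N={n_params:g}, D={n_tokens:g}: ~{flop:.1e} FLOP, "
          f"above threshold: {flop > THRESHOLD_FLOP}")
```

The first hypothetical run comes out at about 6.3×10²⁴ FLOP, below a 10²⁶ threshold. The second comes out at about 1.2×10²⁶ FLOP, above it. Because the formula is only an approximation, a developer near a threshold would presumably want a safety margin or more precise accounting.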
One disadvantage of a compute-based definition is that compute thresholds are a proxy for model capabilities, which are in turn a proxy for risk. Definitions that make use of multiple nested layers of proxy terms in this manner are particularly prone to becoming untethered from their original purpose.[ref 46] This can be caused, for example, by the operation of Goodhart’s Law, which suggests that “when a measure becomes a target, it ceases to be a good measure.”[ref 47] Particularly problematic, especially for statutory definitions that are more difficult to update, is the possibility that a compute threshold may become underinclusive over time as improvements in algorithmic efficiency allow for the development of highly capable models trained on below-threshold levels of compute.[ref 48] This possibility is one reason why SB 1047 and the EU AI Act both supplement their compute thresholds with alternative, capabilities-based elements.
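A toy model can make the underinclusiveness concern concrete. Suppose algorithmic progress compounds at a constant annual rate g, so that a run using C physical FLOP in year t performs like a 2024 run using C·g^(t−2024) FLOP. Under that assumption, the physical compute needed to match a 2024 model trained at the 10²⁶ FLOP threshold falls below a fixed 10²⁶ threshold in every year after 2024. The growth rate used below is a placeholder chosen for illustration, not an empirical estimate.

```python
THRESHOLD_FLOP = 1e26
BASE_YEAR = 2024


def physical_flop_to_match_2024_threshold(year: int, annual_efficiency_gain: float) -> float:
    """Physical FLOP needed in `year` to match a 2024 model trained at the threshold.

    Toy assumption: algorithmic efficiency compounds at a constant annual rate.
    """
    return THRESHOLD_FLOP / annual_efficiency_gain ** (year - BASE_YEAR)


HYPOTHETICAL_GAIN = 3.0  # placeholder value, not an empirical estimate
for year in (2024, 2026, 2028):
    needed = physical_flop_to_match_2024_threshold(year, HYPOTHETICAL_GAIN)
    # A model this capable but trained on less than the threshold escapes a fixed compute test.
    print(year, f"{needed:.1e}", "below fixed threshold:", needed < THRESHOLD_FLOP)
```

Any value of g greater than 1 produces the same qualitative result. Only the speed at which a fixed threshold becomes underinclusive depends on the value chosen.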
In addition to training compute, two other model characteristics correlated with capabilities are the number of model parameters[ref 49] and the size of the dataset on which the model was trained.[ref 50] Either or both of these characteristics can be used as an element of a definition. A definition can also rely on training data characteristics other than size, such as the quality or type of the data used; the placeholder definition in EO 14110, for example, contains a lower compute threshold for models “trained… using primarily biological sequence data.”[ref 51] EO 14110 requires a dual-use foundation model to contain “at least tens of billions of parameters,”[ref 52] and the “number of parameters of the model” is a criterion to be considered under the EU AI Act.[ref 53] EO 14110 specified that only models “trained on broad data” could be dual-use foundation models,[ref 54] and the EU AI Act includes “the quality or size of the data set, for example measured through tokens” as one criterion for determining whether an AI model poses systemic risks.[ref 55]
Dataset size and parameter count share many of the pros and cons of training compute. Like training compute, they are objective metrics that can be measured and verified, and they serve as proxies for model capabilities.[ref 56] Training compute is often considered the best and most reliable proxy of the three, in part because it is the most closely correlated with performance and is difficult to manipulate.[ref 57] However, partially redundant backup metrics can still be useful.[ref 58] Dataset characteristics other than size are typically less quantifiable and harder to measure but are also capable of capturing information that the quantifiable metrics cannot.
B. Capabilities
Frontier models can also be defined in terms of their capabilities. A capabilities-based definition element typically sets a threshold level of competence that a model must achieve to be considered “frontier,” either in one or more specific domains or across a broad range of domains. A capabilities-based definition can provide specific, objective criteria for measuring a model’s capabilities,[ref 59] or it can describe the capabilities required in more general terms and leave the task of evaluation to the discretion of future interpreters.[ref 60] The former approach might be better suited to a regulatory definition, especially if the criteria used will have to be updated frequently, whereas the latter approach would be more typical of a high-level statutory definition.
Basing a definition on capabilities, rather than relying on a proxy for capabilities like training compute, eliminates the risk that the chosen proxy will cease to be a good measure of capabilities over time. Therefore, a capabilities-based definition is more likely than, e.g., a compute threshold to remain robust over time in the face of improvements in algorithmic efficiency. This was the rationale for the capabilities element in the May 2024 version of SB 1047, which was tethered to a compute threshold (“similar or greater performance as an artificial intelligence model trained using a quantity of computing power greater than 10²⁶ integer or floating-point operations in 2024”). That element was an attempt to capture some of the benefits of an input-based definition while also guarding against the possibility that models trained on less than 10²⁶ FLOP may become far more capable in the future than they are in 2024.
However, capabilities are far more difficult than compute to accurately measure. Whether a model has demonstrated “high levels of performance at tasks that pose a serious risk to security” under the EO’s broad capabilities-based definition is not something that can be determined objectively and to a high degree of certainty like the size of a dataset in tokens or the total FLOP used in a training run. Model capabilities are often measured using benchmarks (standardized sets of tasks or questions),[ref 61] but creating benchmarks that accurately measure the complex and diverse capabilities of general-purpose foundation models[ref 62] is notoriously difficult.[ref 63]
Additionally, model capabilities (unlike the technical inputs discussed above) are generally not measurable until after the model has been trained.[ref 64] This makes it difficult to regulate the development of frontier models using capabilities-based definitions, although post-development, pre-release regulation is still possible.
C. Risk
Some researchers have suggested the possibility of defining frontier AI systems on the basis of the risks they pose to users or to public safety instead of or in addition to relying on a proxy metric, like capabilities, or a proxy for a proxy, such as compute.[ref 65] The principal advantage of this direct approach is that it can, in theory, allow for better-targeted regulations—for instance, by allowing a definition to exclude highly capable but demonstrably low-risk models. The principal disadvantage is that measuring risk is even more difficult than measuring capabilities.[ref 66] The science of designing rigorous safety evaluations for foundation models is still in its infancy.[ref 67]
Of the three real-world measures discussed in Section III, only EO 14110 mentions risk directly. The broad initial definition of “dual-use foundation model” includes models that exhibit “high levels of performance at tasks that pose a serious risk to security,” such as “enabling powerful offensive cyber operations through automated vulnerability discovery” or making it easier for non-experts to design chemical weapons. This is a capability threshold combined with a risk threshold; the tasks at which a dual-use foundation model must be highly capable are those that pose a “serious risk” to security, national economic security, and/or national public health or safety. As EO 14110 shows, risk-based definition elements can specify the type of risk that a frontier model must create instead of addressing the severity of the risks created.
D. Epistemic elements
One of the primary justifications for recognizing a category of “frontier models” is the likelihood that broadly capable AI models that are more advanced than previous generations of models will have capabilities and applications that are not readily predictable ex ante.[ref 68] As the word “frontier” implies, lawmakers and regulators focusing on frontier models are interested in targeting models that break new ground and push into the unknown.[ref 69] This was, at least in part, the reason for the inclusion of training compute thresholds of 10²⁶ FLOP in EO 14110 and SB 1047—since the most capable current models were trained on 5×10²⁵ or fewer FLOP,[ref 70] a model trained on 10²⁶ FLOP would represent a significant step forward into uncharted territory.
While it is possible to target models that advance the state of the art by setting and adjusting capability or compute thresholds, a more direct alternative approach would be to include an epistemic element in a statutory definition of “frontier model.” An epistemic element would distinguish between “known” and “unknown” models, i.e., between well-understood models that pose only known risks and poorly understood models that may pose unfamiliar and unpredictable risks.[ref 71]
This kind of distinction between known and unknown risks has a long history in U.S. regulation.[ref 72] For instance, the Toxic Substances Control Act (TSCA) prohibits the manufacturing of any “new chemical substance” without first submitting a premanufacture notice to the EPA.[ref 73] The EPA keeps and regularly updates a list of chemical substances which are or have been manufactured in the U.S., and any substance not included on this list is “new” by definition.[ref 74] In other words, the TSCA distinguishes between chemicals (including potentially dangerous chemicals) that are familiar to regulators and unfamiliar chemicals that pose unknown risks.
One advantage of an epistemic element is that it allows a regulator to address “unknown unknowns” separately from better-understood risks that can be evaluated and mitigated more precisely.[ref 75] Additionally, the scope of an epistemic definition, unlike that of most input- and capability-based definitions, would change over time as regulators became familiar with the capabilities of and risks posed by new models.[ref 76] Models would drop out of the “frontier” category once regulators became sufficiently familiar with their capabilities and risks.[ref 77] Like a capabilities- or risk-based definition, however, an epistemic definition might be difficult to operationalize.[ref 78] To determine whether a given model was “frontier” under an epistemic definition, it would probably be necessary to either rely on a proxy for unknown capabilities or authorize a regulator to categorize eligible models according to a specified process.[ref 79]
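The TSCA-style mechanism described above can be sketched as an inventory of models that the regulator treats as well understood, with every unlisted model classified as frontier. The class and model identifiers below are hypothetical. The sketch also sidesteps the hardest operational question: how to decide whether a modified or fine-tuned model counts as the same model as one already listed.

```python
class KnownModelInventory:
    """Hypothetical registry of models a regulator considers well understood.

    Modeled loosely on the TSCA inventory: anything not listed is "new",
    which here means it falls in the frontier category.
    """

    def __init__(self) -> None:
        self._known: set = set()

    def add(self, model_id: str) -> None:
        # The regulator lists a model once its capabilities and risks are
        # sufficiently understood.
        self._known.add(model_id)

    def is_frontier(self, model_id: str) -> bool:
        # Epistemic test: unlisted models are treated as posing unknown risks.
        return model_id not in self._known


inventory = KnownModelInventory()
print(inventory.is_frontier("model-x"))  # True: not yet evaluated
inventory.add("model-x")
print(inventory.is_frontier("model-x"))  # False: no longer frontier
```

This design produces the dynamic scope noted above. The text of the definition never changes, but a model stops being frontier once the regulator adds it to the inventory.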
E. Deployment context
The context in which an AI system is deployed can serve as an element in a definition. The EU AI Act, for example, takes the number of registered end users and the number of registered EU business users a model has into account as factors to be considered in determining whether a model is a “general-purpose AI model with systemic risk.”[ref 80] Deployment context typically does not in and of itself provide enough information about the risks posed by a model to function as a stand-alone definitional element, but it can be a useful proxy for the kind of risk posed by a given model. Some models may cause harms in proportion to their number of users, and the justification for aggressively regulating these models grows stronger the more users they have. A model that will only be used by government agencies, or by the military, creates a different set of risks than a model that is made available to the general public.
V. Updating Regulatory Definitions
A recurring theme in the scholarly literature on the regulation of emerging technologies is the importance of regulatory flexibility.[ref 81] Because of the rapid pace of technological progress, legal rules designed to govern emerging technologies like AI tend to quickly become outdated and ineffective if they cannot be rapidly and frequently updated in response to changing circumstances.[ref 82] For this reason, it may be desirable to authorize an executive agency to promulgate and update a regulatory definition of “frontier model,” since regulatory definitions can typically be updated more frequently and more easily than statutory definitions under U.S. law.[ref 83]
Historically, failing to quickly update regulatory definitions in the context of emerging technologies has often led to the definitions becoming obsolete or counterproductive. For example, U.S. export controls on supercomputers in the 1990s and early 2000s defined “supercomputer” in terms of the number of millions of theoretical operations per second (MTOPS) the computer could perform.[ref 84] Rapid advances in the processing power of commercially available computers soon rendered the initial definition obsolete, however, and the Clinton administration was forced to revise the MTOPS threshold repeatedly to avoid harming the competitiveness of the American computer industry.[ref 85] Eventually, the MTOPS metric itself was rendered obsolete, leading to a period of several years in which supercomputer export controls were ineffective at best.[ref 86]
Several legal considerations may prevent an agency from quickly updating a regulatory definition, and several measures can be taken to streamline the process. One important aspect of the rulemaking process is the Administrative Procedure Act’s “notice and comment” requirement.[ref 87] To satisfy this requirement, agencies are generally obligated to publish notice of any proposed amendment to an existing regulation in the Federal Register, allow time for the public to comment on the proposal, respond to public comments, publish a final version of the new rule, and then wait at least 30–60 days before the rule goes into effect.[ref 88] From the initial notice to the publication of a final rule, the process can take anywhere from several months to several years.[ref 89] However, an agency can waive the 30–60 day waiting period, or even the entire notice-and-comment requirement, for “good cause” if observing the standard procedures would be “impracticable, unnecessary, or contrary to the public interest.”[ref 90] Of course, the notice-and-comment process has benefits as well as costs: public input can be substantively valuable and informative for agencies, and it also increases the democratic accountability of agencies and the transparency of the rulemaking process. In certain circumstances, however, the costs of delay can outweigh these benefits. U.S. agencies have occasionally demonstrated a willingness to waive procedural rulemaking requirements in order to respond to urgent AI-related developments. The Bureau of Industry and Security (“BIS”), for example, waived the normal 30-day waiting period for an interim rule prohibiting the sale of certain advanced AI-relevant chips to China in October 2023.[ref 91]
Another way to encourage quick updating for regulatory definitions is for Congress to statutorily authorize agencies to eschew or limit the length of notice and comment, or to compel agencies to promulgate a final rule by a specified deadline.[ref 92] Because notice and comment is a statutory requirement, it can be adjusted as necessary by statute.
For regulations exceeding a certain threshold of economic significance, another substantial source of delay is review by OIRA, the Office of Information and Regulatory Affairs, an office within the Office of Management and Budget in the Executive Office of the President that oversees interagency coordination and conducts centralized cost-benefit analysis of important regulations.[ref 93] Like notice and comment, OIRA review can have significant benefits, such as improving the quality of regulations and facilitating interagency cooperation, but it also delays the implementation of significant rules, typically by several months.[ref 94] OIRA review can be waived either by statutory mandate or by OIRA itself.[ref 95]
VI. Deference, Delegation, and Regulatory Definitions
Recent developments in U.S. administrative law may make it more difficult for Congress to effectively delegate the task of defining “frontier model” to a regulatory agency. A number of recent Supreme Court cases signal an ongoing doctrinal shift toward limiting congressional delegations of rulemaking authority.[ref 96] Whether this development is good or bad on net is a matter of perspective; libertarian-minded observers who believe that the U.S. already has too many legal rules[ref 97] and that overregulation is a bigger problem than underregulation have welcomed the change,[ref 98] while pro-regulation observers predict that it will significantly reduce the regulatory capacity of agencies in a number of important areas.[ref 99]
Regardless of where one falls on that spectrum of opinion, the relevant takeaway for efforts to define “frontier model” is that it will likely become somewhat more difficult for agencies to promulgate and update regulatory definitions without a clear statutory authorization to do so. If Congress still wishes to authorize the creation of regulatory definitions, however, it can protect agency definitions from legal challenges by clearly and explicitly authorizing agencies to exercise discretion in promulgating and updating definitions of specific terms.
A. Loper Bright and deference to agency interpretations
In a recent decision in the consolidated cases of Loper Bright Enterprises v. Raimondo and Relentless v. Department of Commerce, the Supreme Court overruled a longstanding legal doctrine known as Chevron deference.[ref 100] Under Chevron, federal courts were required to defer to certain agency interpretations of federal statutes when (1) the relevant part of the statute being interpreted was genuinely ambiguous and (2) the agency’s interpretation was reasonable. After Loper Bright, courts are no longer required to defer to these interpretations; instead, under a doctrine known as Skidmore deference,[ref 101] agency interpretations will prevail in court only to the extent that courts are persuaded by them.[ref 102]
Justice Elena Kagan’s dissenting opinion in Loper Bright argues that the decision will harm the regulatory capacity of agencies by reducing the ability of agency subject-matter experts to promulgate regulatory definitions of ambiguous statutory phrases in “scientific or technical” areas.[ref 103] The dissent specifically warns that, after Loper Bright, courts will “play a commanding role” in resolving questions like “[w]hat rules are going to constrain the development of A.I.?”[ref 104]
Justice Kagan’s dissent probably somewhat overstates the significance of Loper Bright to AI governance for rhetorical effect.[ref 105] The end of Chevron deference does not mean that Congress has completely lost the ability to authorize regulatory definitions; where Congress has explicitly directed an agency to define a specific statutory term, Loper Bright will not prevent the agency from doing so.[ref 106] An agency’s authority to promulgate a regulatory definition under a statute resembling EO 14110, which explicitly directs the Department of Commerce to define “dual-use foundation model,” would likely be unaffected. However, Loper Bright has created a great deal of uncertainty regarding the extent to which courts will accept agency claims that Congress has implicitly authorized the creation of regulatory definitions.[ref 107]
To better understand how this uncertainty might affect efforts to define “frontier model,” consider the following real-life example. The Energy Policy and Conservation Act (“EPCA”) includes a statutory definition of the term “small electric motor.”[ref 108] Like many statutory definitions, however, this definition is not detailed enough to resolve all disputes about whether a given product is or is not a “small electric motor” for purposes of EPCA. In 2010, the Department of Energy (“DOE”), which is authorized under EPCA to promulgate energy efficiency standards governing “small electric motors,”[ref 109] issued a regulatory definition of “small electric motor” specifying that the term referred to motors with power outputs between 0.25 and 3 horsepower.[ref 110] The National Electrical Manufacturers Association (“NEMA”), a trade association of electrical equipment manufacturers, sued to challenge the rule, arguing that motors with between 1 and 3 horsepower were too powerful to be “small electric motors” and that the DOE was exceeding its statutory authority by attempting to regulate them.[ref 111]
In a 2011 opinion that utilized the Chevron framework, the federal court that decided NEMA’s lawsuit considered the language of EPCA’s statutory definition and concluded that EPCA was ambiguous as to whether motors with between 1 and 3 horsepower could be “small electric motors.”[ref 112] The court then found that the DOE’s regulatory definition was a reasonable interpretation of EPCA’s statutory definition, deferred to the DOE under Chevron, and upheld the challenged regulation.[ref 113]
Under Chevron, federal courts were required to assume that Congress had implicitly authorized agencies like the DOE to resolve ambiguities in a statute, as the DOE did in 2010 by promulgating its regulatory definition of “small electric motor.” After Loper Bright, courts will recognize fewer implicit delegations of definition-making authority. For instance, while EPCA requires the DOE to prescribe “testing requirements” and “energy conservation standards” for small electric motors, it does not explicitly authorize the DOE to promulgate a regulatory definition of “small electric motor.” If a rule like the one challenged by NEMA were challenged today, the DOE could still argue that Congress implicitly authorized the creation of such a rule by giving the DOE authority to prescribe standards and testing requirements—but such an argument would probably be less likely to succeed than the Chevron argument that saved the rule in 2011.
Today, a court that did not find an implicit delegation of rulemaking authority in EPCA would not defer to the DOE’s interpretation. Instead, the court would simply compare the DOE’s regulatory definition of “small electric motor” with NEMA’s proposed definition and decide which of the two was a more faithful interpretation of EPCA’s statutory definition.[ref 114] Similarly, when or if some future federal statute uses the phrase “frontier model” or any analogous term, agency attempts to operationalize the statute by enacting detailed regulatory definitions that are not explicitly authorized by the statute will be easier to challenge after Loper Bright than they would have been under Chevron.
Congress can avoid Loper Bright issues by using clear and explicit statutory language to authorize agencies to promulgate and update regulatory definitions of “frontier model” or analogous phrases. However, it is often difficult to predict in advance whether or how a statutory definition will become ambiguous over time. This is especially true in the context of emerging technologies like AI, where the rapid pace of technological development and the poorly understood nature of the technology often eventually render carefully crafted definitions obsolete.[ref 115]
Suppose, for example, that a federal statute resembling the May 2024 draft of SB 1047 were enacted. The statutory definition would include future models trained on a quantity of compute such that they “could reasonably be expected to have similar or greater performance as an artificial intelligence model trained using [more than 10^26 FLOP] in 2024.” If the statute did not explicitly authorize some agency to determine the quantity of compute that qualified in a given year, any attempt to set and enforce updated regulatory compute thresholds could be challenged in court.
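Why a statute of this kind needs someone to set the threshold each year can be illustrated with a short Python sketch. It assumes, purely hypothetically, that algorithmic progress lets developers reach 2024-level performance with a constant factor less compute each year. The statutory language supplies no value for that factor, so under this assumption the qualifying compute in any later year depends on a judgment that some decision-maker would have to make and revise. The function name and the constant-rate model of efficiency gains are illustrative assumptions, not drawn from SB 1047.

```python
def compute_threshold(year: int, efficiency_gain_per_year: float,
                      base_flop: float = 1e26, base_year: int = 2024) -> float:
    """Training compute (FLOP) assumed to match base-year performance in `year`.

    Hypothetical model: each year, the same performance can be reached with
    `efficiency_gain_per_year` times less training compute.
    """
    if efficiency_gain_per_year <= 0:
        raise ValueError("efficiency gain must be positive")
    years_elapsed = year - base_year
    return base_flop / (efficiency_gain_per_year ** years_elapsed)


# The same statutory text yields different thresholds depending on the
# assumed rate of progress, which the statute itself does not specify.
for assumed_gain in (1.5, 2.0, 3.0):
    print(assumed_gain, compute_threshold(2026, assumed_gain))
```

Choosing a different model of progress, or a performance benchmark instead of compute altogether, would require still more discretion, which is the kind of authority the following paragraphs suggest may be contested absent explicit statutory language.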
The enforcing agency could argue that the statute included an implied authorization for the agency to promulgate and update the regulatory definitions at issue. This argument might succeed or fail, depending on the language of the statute, the nature of the challenged regulatory definitions, and the judicial philosophy of the deciding court. But regardless of the outcome of any individual case, challenges to impliedly authorized regulatory definitions will probably be more likely to succeed after Loper Bright than they would have been under Chevron. Perhaps more importantly, agencies will be aware that regulatory definitions will no longer receive the benefit of Chevron deference and may regulate more cautiously in order to avoid being sued.[ref 116] Moreover, even if the statute did explicitly authorize an agency to issue updated compute thresholds, such an authorization might not allow the agency to respond to future technological breakthroughs by considering some factor other than the quantity of training compute used.
In other words, a narrow congressional authorization to define “frontier model” by regulation may prove insufficiently flexible after Loper Bright. Congress could attempt to address this possibility by instead enacting a very broad authorization.[ref 117] An overly broad authorization, however, may be undesirable for reasons of democratic accountability, as it would give unelected agency officials discretionary control over which models to regulate as “frontier.” Moreover, an overly broad authorization might run afoul of two related constitutional doctrines that limit the ability of Congress to delegate rulemaking authority to agencies: the major questions doctrine and the nondelegation doctrine.
B. The nondelegation doctrine
Under the nondelegation doctrine, which arises from the constitutional principle of separation of powers, Congress may not constitutionally delegate legislative power to executive branch agencies. In its current form, this doctrine has little relevance to efforts to define “frontier model.” Under current law, Congress can validly delegate rulemaking authority to an agency as long as the statute in which the delegation occurs includes an “intelligible principle” that provides adequate guidance for the exercise of that authority.[ref 118] In practice, this is an easy standard to satisfy—even vague and general legislative guidance, such as directing agencies to regulate in a way that “will be generally fair and equitable and will effectuate the purposes of the Act,” has been held to contain an intelligible principle.[ref 119] The Supreme Court has used the nondelegation doctrine to strike down statutes only twice, in two 1935 decisions invalidating sweeping New Deal laws.[ref 120]
However, some commentators have suggested that the Supreme Court may revisit the nondelegation doctrine in the near future,[ref 121] perhaps by discarding the “intelligible principle” test in favor of something like the standard suggested by Justice Gorsuch in his 2019 dissent in Gundy v. United States.[ref 122] In Gundy, Justice Gorsuch suggested that the nondelegation doctrine, properly understood, requires Congress to make “all the relevant policy decisions” and delegate to agencies only the task of “filling up the details” via regulation.[ref 123]
Therefore, if the Supreme Court does significantly strengthen the nondelegation doctrine, a statute authorizing an agency to create a regulatory definition of “frontier model” might need to include meaningful guidance as to what the definition should look like. This is most likely to be the case if the regulatory definition in question is a key part of an extremely significant regulatory scheme, because “the degree of agency discretion that is acceptable varies according to the power congressionally conferred.”[ref 124] Congress generally “need not provide any direction” to agencies regarding the manner in which they define specific and relatively unimportant technical terms,[ref 125] but must provide “substantial guidance” for extremely important and complex regulatory tasks that could significantly affect the national economy.[ref 126]
C. The major questions doctrine
Like the nondelegation doctrine, the major questions doctrine is a constitutional limitation on Congress’s ability to delegate rulemaking power to agencies. It likewise addresses concerns about the separation of powers and the increasingly prominent role executive branch agencies have taken on in the creation of important legal rules. Unlike the nondelegation doctrine, however, the major questions doctrine is a recent innovation. The Supreme Court acknowledged it by name for the first time in the 2022 case West Virginia v. Environmental Protection Agency,[ref 127] where it was used to strike down an EPA rule regulating power plant carbon dioxide emissions. Essentially, the major questions doctrine provides that courts will not accept an interpretation of a statute that grants an agency authority over a matter of great “economic and political significance” unless there is a “clear congressional authorization” for the claimed authority.[ref 128] Whereas the nondelegation doctrine provides a way to strike down statutes as unconstitutional, the major questions doctrine only affects the way that statutes are interpreted.
Supporters of the major questions doctrine argue that it helps to rein in excessively broad delegations of legislative power to the administrative state and serves a useful separation-of-powers function. The doctrine’s critics, however, have argued that it limits Congress’s ability to set up flexible regulatory regimes that allow agencies to respond quickly and decisively to changing circumstances.[ref 129] According to this school of thought, requiring a clear statement to authorize each economically significant agency action inhibits Congress’s ability to confer broad discretion for handling problems that are difficult to foresee.
This difficulty is particularly salient in the context of regulatory regimes for the governance of emerging technologies.[ref 130] Justice Kagan made this point in her dissent from the majority opinion in West Virginia, where she argued that the statute at issue was broadly worded because Congress had known that “without regulatory flexibility, changing circumstances and scientific developments would soon render the Clean Air Act obsolete.”[ref 131] Because advanced AI systems are likely to have a significant impact on the U.S. economy in the coming years,[ref 132] it is plausible that the task of choosing which systems should be categorized as “frontier” and subject to increased regulatory scrutiny will be an issue of great “economic and political significance.” If it is, then the major questions doctrine could be invoked to invalidate agency efforts to promulgate or amend a definition of “frontier model” to address previously unforeseen unsafe capabilities.
For example, consider a hypothetical federal statute instituting a licensing regime for frontier models that includes a definition similar to the placeholder in EO 14110, empowering the Bureau of Industry and Security to “define, and thereafter update as needed on a regular basis, the set of technical conditions [that determine whether a model is a frontier model].” Suppose that BIS initially defined “frontier model” under this statute using a regularly updated compute threshold, but that ten years after the statute’s enactment a new kind of AI system was developed that could be trained to exhibit cutting-edge capabilities using a relatively small quantity of training compute. If BIS attempted to amend its regulatory definition of “frontier model” to include a capabilities threshold that would cover this newly developed and economically significant category of AI system, that new regulatory definition might be challenged under the major questions doctrine. In that situation, a court with deregulatory inclinations might not view the broad congressional authorization for BIS to define “frontier model” as a sufficiently clear statement of congressional intent to allow BIS to later institute a new and expanded licensing regime based on less objective technical criteria.[ref 133]
VII. Conclusion
One of the most common mistakes that nonlawyers make when reading a statute or regulation is to assume that each word of the text carries its ordinary English meaning. This error occurs because legal rules, unlike most writing encountered in everyday life, are often written in a sort of simple code where a number of the terms in a given sentence are actually stand-ins for much longer phrases catalogued elsewhere in a “definitions” section.
The tendency to overlook the role that definitions play in legal rules has an analogue in a widespread tendency to underestimate how much a regulatory scheme depends on well-crafted definitions. The object of this paper, therefore, has been to explain some of the key legal considerations relevant to the task of defining “frontier model” or any of the analogous phrases used in existing laws and regulations.
One such consideration is the role that should be played by statutory and regulatory definitions, which can be used independently or together to create a definition that is both technically sound and democratically legitimate. Another is the selection and combination of potential definitional elements, including technical inputs, capabilities metrics, risk, deployment context, and familiarity, which can be used alone or combined within a single statutory or regulatory definition. Legal mechanisms for facilitating rapid and frequent updating of regulations targeting emerging technologies also merit attention. Finally, the nondelegation and major questions doctrines and the recent elimination of Chevron deference may affect the scope of discretion that can be conferred for the creation and updating of regulatory definitions.
Existing authorities for oversight of frontier AI models
Abstract
It has been suggested that a national frontier AI governance strategy should include a comprehensive regime for tracking and licensing the creation and dissemination of frontier models and critical hardware components (“AI Oversight”). A robust Oversight regime would almost certainly require new legislation. In the absence of new legislation, however, it might be possible to accomplish some of the goals of an AI Oversight regime using existing legal authorities. This memorandum discusses a number of existing authorities in order of their likely utility for AI Oversight. The existing authorities that appear to be particularly promising include the Defense Production Act, the Export Administration Regulations, the International Emergency Economic Powers Act, the use of federal funding conditions, and Federal Trade Commission consumer protection authorities. Somewhat less promising authorities discussed in the memo include § 606(c) of the Communications Act of 1934, Committee on Foreign Investment in the United States review, the Atomic Energy Act, copyright and antitrust laws, the Biological Weapons Anti-Terrorism Act, the Chemical Weapons Convention Implementation Act, and the Federal Select Agent Program.
It has been suggested that frontier artificial intelligence (“AI”) models may in the near future pose serious risks to the national security of the United States—for example, by allowing terrorist groups or hostile foreign state actors to acquire chemical, biological, or nuclear weapons, spread dangerously compelling personalized misinformation on a grand scale, or execute devastating cyberattacks on critical infrastructure. Wise regulation of frontier models is, therefore, a national security imperative, and has been recognized as such by leading figures in academia,[ref 1] industry,[ref 2] and government.[ref 3]
One promising strategy for governance of potentially dangerous frontier models is “AI Oversight.” AI Oversight is defined as a comprehensive regulatory regime allowing the U.S. government to:
1) Track and license hardware for making frontier AI systems (“AI Hardware”)
2) Track and license the creation of frontier AI systems (“AI Creation”), and
3) License the dissemination of frontier AI systems (“AI Proliferation”).
Implementation of a comprehensive AI Oversight regime will likely require substantial new legislation. Substantial new federal AI governance legislation, however, may be many months or even years away. In the immediate and near-term future, therefore, government Oversight of AI Hardware, Creation, and Proliferation will have to rely on existing legal authorities. Of course, tremendously significant regulatory regimes, such as a comprehensive licensing program for a transformative new technology, are not typically (and, in the vast majority of cases, should not be) created by executive fiat without any congressional input. In other words, the short answer to the question of whether AI Oversight can be accomplished using existing authorities is “no.” The remainder of this memorandum attempts to lay out the long answer. Although a complete and effective Oversight regime based solely on existing authorities is an unlikely prospect, a broad survey of the authorities that could in theory contribute to such a regime may prove informative to AI governance researchers, legal scholars, and policymakers. To cast a wide net and give the most complete possible picture of all plausible or semi-plausible existing authorities for Oversight, the authorities included here were deliberately selected to err on the side of overinclusiveness. Therefore, this memo includes some authorities that are unlikely to be used, authorities that would only indirectly or partially contribute to Oversight, and authorities that would likely face serious legal challenges if used in the manner proposed.
Each of the eleven sections below discusses one or more existing authorities that could be used for Oversight and evaluates the authority’s likely relevance. The sections are listed in descending order of evaluated relevance, with the more important and realistic authorities coming first and the more speculative or tangentially relevant authorities bringing up the rear. Some of the authorities discussed are “shovel-ready” and could be put into action immediately, while others would require some agency action, up to and including the promulgation of new regulations (but not new legislation), before being used in the manner suggested.
At the beginning of each Section are two bullet points addressing the aspects of Oversight to which the authority might contribute and a rough estimate of the authority’s likelihood of use for Oversight. No estimate of the likelihood that a given authority’s use could be successfully challenged in court is provided, because the outcome of a hypothetical lawsuit would depend too heavily on the details of the authority’s implementation for such an estimate to be useful.[ref 4] Likelihood of use is expressed in rough qualitative terms (“reasonably likely,” “unlikely,” etc.) rather than, e.g., percentages, to avoid giving a false impression of confidence, given that predicting whether a given authority will be used, even in the relatively short term, is quite difficult.
The table below contains a brief description of each of the authorities discussed along with the aspects of Oversight to which they may prove relevant and the likelihood of their use for Oversight.
Defense Production Act
- Potentially applicable to: Licensing AI Hardware, Creation, and Proliferation; Tracking AI Hardware and Creation.
- Already being used to track AI Creation; reasonably likely to be used again in the future in some additional AI Oversight capacity.
The Defense Production Act (“DPA”)[ref 5] authorizes the President to take a broad range of actions to influence domestic industry in the interests of the “national defense.”[ref 6] The DPA was first enacted during the Korean War and was initially used solely for purposes directly related to defense industry production. The DPA has since been reenacted a number of times—most recently in 2019, for a six-year period expiring in September 2025—and the statutory definition of “national defense” has been repeatedly expanded by Congress.[ref 7] Today DPA authorities can be used to address and prepare for a variety of national emergencies.[ref 8] The DPA was originally enacted with seven Titles, four of which have since been allowed to lapse. The remaining Titles—I, III, and VII—furnish the executive branch with a number of authorities which could be used to regulate AI hardware, creation, and proliferation.
Invocation of the DPA’s information-gathering authority in Executive Order 14110
Executive Order 14110 relies on the DPA in § 4.2, “Ensuring Safe and Reliable AI.”[ref 9] Section 4.2 orders the Department of Commerce to require companies “developing or demonstrating an intent to develop dual-use foundation models” to “provide the Federal Government, on an ongoing basis, with information, reports, or records” regarding (a) development and training of dual-use foundation models and security measures taken to ensure the integrity of any such training; (b) ownership and possession of the model weights of any dual-use foundation models and security measures taken to protect said weights; and (c) the results of any dual-use foundation model’s performance in red-teaming exercises.[ref 10] The text of the EO does not specify which provision(s) of the DPA are being invoked, but based on the language of EO § 4.2[ref 11] and on subsequent statements from the agency charged with implementing EO § 4.2[ref 12] the principal relevant provision appears to be § 705, from Title VII of the DPA.[ref 13] Section 705 grants the President broad authority to collect information in order to further national defense interests,[ref 16] and that authority has been delegated to the Department of Commerce pursuant to E.O. 13603.[ref 17] According to social media statements by official Department of Commerce accounts, Commerce began requiring companies to “report vital information to the Commerce Department — especially AI safety test results,” no later than January 29, 2024.[ref 14] However, no further details about the reporting requirements have been made public, and no proposed rules or notices relating to the reporting requirements had been issued publicly as of the writing of this memorandum.[ref 15]
Section 705 authorizes the President to obtain information “by regulation, subpoena, or otherwise,” as the President deems necessary or appropriate to enforce or administer the Defense Production Act. In theory, this authority could be relied upon to justify a broad range of government efforts to track AI Hardware and Creation. Historically, § 705 has most often been used by the Department of Commerce’s Bureau of Industry and Security (“BIS”) to conduct “industrial base assessment” surveys of specific defense-relevant industries.[ref 18] For instance, BIS recently prepared an “Assessment of the Critical Supply Chains Supporting the U.S. Information and Communications Technology Industry” which concluded in February 2022.[ref 19] BIS last conducted an assessment of the U.S. artificial intelligence sector in 1994.[ref 20]
Republican elected officials, libertarian commentators, and some tech industry lobbying groups have questioned the legality of EO 14110’s use of the DPA and raised the possibility of a legal challenge.[ref 21] As no such lawsuit has yet been filed, it is difficult to evaluate § 4.2’s chances of surviving hypothetical future legal challenges. The arguments against its legality that have been publicly advanced—such as that the “Defense Production Act is about production… not restriction”[ref 22] and that AI does not present a “national emergency”[ref 23]—are legally dubious, in this author’s opinion.[ref 24] However, § 705 of the DPA has historically been used mostly to conduct “industrial base assessments,” i.e., surveys to collect information about defense-relevant industries.[ref 25] When the DPA was reauthorized in 1992, President George H.W. Bush remarked that using § 705 during peacetime to collect industrial base data from American companies would “intrude inappropriately into the lives of Americans who own and work in the Nation’s businesses.”[ref 26] While that observation is not in any sense legally binding, it does tend to show that EO 14110’s aggressive use of § 705 during peacetime is unusual by historical standards and presents potentially troubling issues relating to executive overreach. The fact that companies are apparently to be required to report on an indefinitely “ongoing basis”[ref 27] is also unusual, as past industrial base surveys have been snapshots of an industry’s condition at a particular time rather than semipermanent ongoing information-gathering institutions.
DPA Title VII: voluntary agreements and recruiting talent
Title VII includes a variety of provisions in addition to § 705, a few of which are potentially relevant to AI Oversight. Section 708 of the DPA authorizes the President to “consult with representatives of industry, business, financing, agriculture, labor, and other interests in order to provide for the making by such persons, with the approval of the President, of voluntary agreements and plans of action to help provide for the national defense.”[ref 28] Section 708 provides an affirmative defense against any civil or criminal antitrust suit for all actions taken in furtherance of a presidentially sanctioned voluntary agreement.[ref 29] This authority could be used to foster the kind of safety-related cooperation between labs that, to date, has been deterred by labs’ fear of antitrust enforcement.[ref 30] Cooperation between private interests in the AI industry could include, for example, information-sharing regarding potentially dangerous capabilities, joint AI safety research ventures, voluntary agreements to abide by shared safety standards, and voluntary agreements to pause or set an agreed pace for increases in the size of training runs for frontier AI models.[ref 31] Such cooperation could support an effective voluntary pseudo-licensing regime even in the absence of new legislation.
Sections 703 and 710 of the DPA could provide effective tools for recruiting talent for government AI roles. Under § 703, agency heads can hire individuals outside of the competitive civil service system and pay them enhanced salaries.[ref 32] Under § 710, the head of any governmental department or agency can establish and train a National Defense Executive Reserve (“NDER”) of individuals held in reserve “for employment in executive positions in Government during periods of national defense emergency.”[ref 33] Currently, there are no active NDER units, and since the Cold War the program has been considered something of a failure because of underfunding and mismanagement,[ref 34] but the statutory authority to create NDER units still exists and could be utilized if top AI researchers and engineers were willing to volunteer for NDER roles. Both §§ 703 and 710 could indirectly facilitate tracking and licensing by allowing information-gathering agencies like BIS, or agencies charged with administering a licensing regime, to hire expert personnel more easily.
DPA Title I: priorities and allocations authorities
Title I of the DPA empowers the President to require private U.S. companies to prioritize certain contracts in order to “promote the national defense.” Additionally, Title I purports to authorize the President to “allocate materials, services, and facilities” in any way he deems necessary or appropriate to promote the national defense.[ref 35] These so-called “priorities” and “allocations” authorities have been delegated to six federal agencies pursuant to Executive Order 13603.[ref 36] The use of these authorities is governed by a set of regulations known as the Defense Priorities and Allocations System (“DPAS”),[ref 37] which is administered by BIS.[ref 38] Under the DPAS, contracts can be assigned one of two priority ratings, “DO” or “DX.”[ref 39] All priority-rated contracts take precedence over all non-rated contracts, and DX contracts take priority over DO contracts.[ref 40]
Because the DPA defines the phrase “national defense” expansively,[ref 41] the text of Title I can be interpreted to authorize a broad range of executive actions relevant to AI governance. For example, it has been suggested that the priorities authority could be used to prioritize government access to cloud-compute resources in times of crisis[ref 42] or to compel semiconductor companies to prioritize government contracts for chips over preexisting contracts with private buyers.[ref 43] Title I could also, in theory, be used for AI Oversight directly. For instance, the government could attempt to institute a limited and partial licensing regime for AI Hardware and Creation by either (a) allocating limited AI Hardware resources such as chips to companies that satisfy licensing requirements promulgated by BIS, or (b) ordering companies that do not satisfy such requirements to prioritize work other than development of potentially dangerous frontier models.[ref 44]
The approach described above would be an unprecedentedly aggressive use of Title I and is unlikely to occur, given the hesitancy of recent administrations to use the full scope of the presidential authorities Title I purports to convey. The allocations authority has not been used since the end of the Cold War,[ref 45] perhaps in part because of uncertainty regarding its legitimate scope.[ref 46] That said, guidance from the Defense Production Act Committee (“DPAC”), a body that “coordinate[s] and plan[s] for . . . the effective use of the priorities and allocations authorities,”[ref 47] indicates that the priorities and allocations authorities can be used to protect against, respond to, or recover from “acts of terrorism, cyberattacks, pandemics, and catastrophic disasters.”[ref 48] If the AI risk literature is to be believed, frontier AI models may soon be developed that pose risks related to all four of those categories.[ref 49]
The use of the priorities authority during the COVID-19 pandemic tends to show that, even in recognized and fairly severe national emergencies, extremely aggressive uses of the priorities and allocations authorities are unlikely. FEMA and the Department of Health and Human Services (“HHS”) used the priorities authority to require companies to produce N95 facemasks and ventilators on a government-mandated timeline,[ref 50] and HHS and the Department of Defense (“DOD”) also issued priority ratings to combat supply chain disruptions and expedite the acquisition of critical equipment and chemicals for vaccine development as part of Operation Warp Speed.[ref 51] But the Biden administration did not invoke the allocations authority at any point, and the priorities authority was used for its traditional purpose—to stimulate, rather than to prevent or regulate, the industrial production of specified products.
DPA Title III: subsidies for industry
Title III of the DPA authorizes the President to issue subsidies, purchase commitments and purchases, loan guarantees, and direct loans to incentivize the development of industrial capacity in support of the national defense.[ref 52] Title III also establishes a Defense Production Act Fund, from which all Title III actions are funded and into which government proceeds from Title III activities and appropriations by Congress are deposited.[ref 53] The use of Title III requires the President to make certain determinations, including that the resource or technology to be produced is essential to the national defense, that U.S. industry cannot reasonably be expected to provide it in a timely manner without Title III action, and that Title III is the most cost-effective and expedient means of addressing the resulting shortfall.[ref 54] The responsibility for making these determinations is non-delegable.[ref 55] The Title III award program is overseen by DOD.[ref 56]
Like Title I, Title III authorities were invoked a number of times to address the COVID-19 pandemic. For example, DOD invoked Title III in April 2020 to award $133 million for the production of N95 masks and again in May 2020 to award $138 million in support of vaccine supply chain development.[ref 57] More recently, in March 2023, President Biden issued a Presidential Determination authorizing Title III expenditures to support domestic manufacturing of two important microelectronics supply chain components: printed circuit boards and advanced packaging for semiconductor chips.[ref 58]
It has been suggested that Title III subsidies and purchase commitments could be used to incentivize increased domestic production of important AI hardware components, or to guarantee the purchase of data useful for military or intelligence-related machine learning applications.[ref 59] This would allow the federal government to exert some influence over the direction of the funded projects, although the significance of that influence would be limited by the amount of money available in the Defense Production Act Fund unless Congress appropriated additional funds. With respect to Oversight, the government could attach conditions intended to facilitate tracking or licensing regimes to contracts entered into under Title III.[ref 60]
Export controls
- Potentially applicable to: Licensing AI Hardware, Creation, and Proliferation
- Already being used to license exports of AI Hardware; new uses relating to Oversight likely in the near future
Export controls are legislative or regulatory tools used to restrict the export of goods, software, and knowledge, usually in order to further national security or foreign policy interests. Export controls can also sometimes be used to restrict the “reexport” of controlled items from one foreign country to another, or to prevent controlled items from being shown to or used by foreign persons inside the U.S.
Currently active U.S. export control authorities include: (1) the International Traffic in Arms Regulations (“ITAR”), which control the export of weapons and other articles and services with strictly military applications;[ref 61] (2) multilateral agreements to which the United States is a state party, such as the Wassenaar Arrangement;[ref 62] and (3) the Export Administration Regulations (“EAR”), which are administered by BIS and primarily regulate “dual use” items, i.e., items that have both military and civilian applications.[ref 63] This section focuses on the EAR, the authority most relevant to Oversight.
Export Administration Regulations
The EAR incorporate the Commerce Control List (“CCL”).[ref 64] The CCL is a list, maintained by BIS, of more than 3,000 “items” that may not be exported, either to any destination or to certain countries, without a license from BIS.[ref 65] The EAR define “item” and “export” broadly: software, data, and tangible goods can all be “items,” and “export” can include, for example, showing controlled items to a foreign national in the United States or posting non-public data to the internet.[ref 66] However, software or data that is “published,” i.e., “made available to the public without restrictions upon its further dissemination,” is generally not subject to the EAR. Thus, the EAR generally cannot be used to restrict the publication or export of free and open-source software.[ref 67]
The CCL currently contains a fairly broad set of export restrictions that require a license for exports to China of advanced semiconductor chips, input materials used in the fabrication of semiconductors, and semiconductor manufacturing equipment.[ref 68] These restrictions are explicitly intended to “limit the PRC’s ability to obtain advanced computing chips or further develop AI and ‘supercomputer’ capabilities for uses that are contrary to U.S. national security and foreign policy interests.”[ref 69] The CCL also currently restricts “neural computers”[ref 70] and a narrowly defined category of AI software useful for analysis of drone imagery:[ref 71] “geospatial imagery ‘software’ ‘specially designed’ for training a Deep Convolutional Neural Network to automate the analysis of geospatial imagery and point clouds.”[ref 72]
In addition to the item-based CCL, the EAR include end-user controls, including an “Entity List” of individuals and companies subject to export licensing requirements.[ref 73] Some existing end-user controls are designed to protect U.S. national security interests by hindering the ability of rivals like China to effectively conduct defense-relevant AI research. For example, in December 2022 BIS added a number of “major artificial intelligence (AI) chip research and development, manufacturing and sales entities” that “are, or have close ties to, government organizations that support the Chinese military and the defense industry” to the Entity List.[ref 74]
The EAR also include, at 15 C.F.R. part 744, end-use based “catch-all” controls, which effectively prohibit the unlicensed export of items if the exporter knows or has reason to suspect that the item will be directly or indirectly used in the production, development, or use of missiles, certain types of drones, nuclear weapons, or chemical or biological weapons.[ref 75] Part 744 also imposes a license requirement on the export of items that the exporter knows are intended for a military end use.[ref 76]
Additionally, 15 C.F.R. § 744.6 requires “U.S. Persons” (a term which includes organizations as well as individuals) to obtain a license from BIS before “supporting” the design, development, production, or use of missiles or nuclear, biological, or chemical weapons, “supporting” the military intelligence operations of certain countries, or “supporting” the development or production of specified types of semiconductor chips in China. The EAR definition of “support” is extremely broad and covers “performing any contract, service, or employment you know may assist or benefit” the prohibited end uses in any way.[ref 77]
For both the catch-all and U.S. Persons restrictions, BIS is authorized to send so-called “is informed” letters to individuals or companies advising that a given action requires a license because the action might result in a prohibited end-use or support a prohibited end-use or end-user.[ref 78] This capability allows BIS to exercise a degree of control over exports and over the actions of U.S. Persons immediately, without going through the time-consuming process of Notice and Comment Rulemaking. For instance, BIS sent an “is informed” letter to Nvidia on August 26, 2022, imposing a new license requirement, effective immediately, on the export of certain chips to China and Russia, because BIS believed that there was a risk the chips would be used for military purposes.[ref 79]
BIS has demonstrated a willingness to update its semiconductor export regime quickly and flexibly. For instance, after BIS restricted exports of AI-relevant chips in a rule issued on October 7, 2022, Nvidia modified its market-leading A100 and H100 chips to comply with the regulations and began to export the resultant modified A800 and H800 chips to China.[ref 80] On October 17, 2023, BIS announced a new interim final rule prohibiting exports of A800 and H800 chips to China and waived the 30-day waiting period normally required by the Administrative Procedure Act so that the interim rule became effective just a few days after being announced.[ref 81] Commerce Secretary Gina Raimondo stated that “[i]f [semiconductor companies] redesign a chip around a particular cut line that enables them to do AI, I’m going to control it the very next day.”[ref 82]
In summary, the EAR currently impose a license requirement on a number of potentially dangerous actions relating to AI Hardware, Creation, and Proliferation. These controls have thus far been used primarily to restrict exports of AI hardware, but in theory they could also be used to impose licensing requirements on activities relating to AI creation and proliferation. The primary legal obstacle to this kind of regulation is the First Amendment.
Export controls and the First Amendment
Suppose that BIS determined that a certain AI model would be useful to terrorists or foreign state actors in the creation of biological weapons. Could BIS inform the developer of said model of this determination and prohibit the developer from making the model publicly available? Alternatively, could BIS add model weights which would be useful for training dangerous AI models to the CCL and require a license for their publication on the internet?
One potential objection to the regulations described above is that they would violate the First Amendment as unconstitutional prior restraints on speech. Courts have held that source code can be constitutionally protected expression, and in the 1990s export regulations prohibiting the publication of encryption software were struck down as unconstitutional prior restraints.[ref 83] However, the question of when computer code constitutes protected expression is a subject of continuing scholarly debate,[ref 84] and there is a great deal of uncertainty regarding the scope of the First Amendment’s application to export controls on software and training data. The argument for restricting model weights may be stronger than the argument for restricting other relevant software or code items, because model weights are purely functional rather than communicative; they tell a computer what to do, but cannot be read or interpreted by humans.[ref 85]
Currently, the EAR avoid First Amendment issues by carving out a substantial exception to existing licensing requirements for “published” information.[ref 86] A great deal of core First Amendment communicative speech, such as basic research in universities, is “published” and therefore not subject to the EAR. Non-public proprietary software, however, can be placed on the CCL and restricted in much the same manner as tangible goods, usually without provoking any viable First Amendment objection.[ref 87] Additionally, the EAR’s recently added “U.S. Persons” controls regulate actions rather than directly regulating software, and it has been argued that this allows BIS to exercise some control over free and open-source software without imposing an unconstitutional prior restraint, since under some circumstances providing access to an AI model may qualify as unlawful “support” for prohibited end-uses.[ref 88]
Emergency powers
- Applicable to: Tracking and Licensing AI Hardware & Creation; Licensing Proliferation
- Already in use (IEEPA, to mandate know-your-customer requirements for IAAS providers pursuant to EO 14110); Unlikely to be used (§ 606(c))
The United States Code contains a number of statutes granting the President extraordinary powers that can only be used following the declaration of a national emergency. This section discusses two such emergency provisions—the International Emergency Economic Powers Act[ref 89] and § 606(c) of the Communications Act of 1934[ref 90]—and their existing and potential application to AI Oversight.
There are three existing statutory frameworks governing the declaration of emergencies: the National Emergencies Act (“NEA”),[ref 91] the Robert T. Stafford Disaster Relief and Emergency Assistance Act,[ref 92] and the Public Health Service Act.[ref 93] Both of the authorities discussed in this section can be invoked following an emergency declaration under the NEA.[ref 94] The NEA is a statutory framework that provides a procedure for declaring emergencies and imposes certain requirements and limitations on the exercise of emergency powers.[ref 95]
International Emergency Economic Powers Act
The most frequently invoked emergency authority under U.S. law is the International Emergency Economic Powers Act (“IEEPA”), which grants the President expansive powers to regulate international commerce.[ref 96] The IEEPA gives the President broad authority to impose a variety of economic sanctions on individuals and entities during a national emergency.[ref 97] The IEEPA has been “the sole or primary statute invoked in 65 of the 71”[ref 98] emergencies declared under the NEA since the NEA’s enactment in 1976.
The IEEPA authorizes the President to “investigate, regulate, or prohibit” transactions subject to U.S. jurisdiction that involve a foreign country or national.[ref 99] The IEEPA also authorizes the investigation, regulation, or prohibition of any acquisition or transfer involving a foreign country or national.[ref 100] The emergency must originate “in whole or in substantial part outside the United States” and must relate to “the national security, foreign policy, or economy of the United States.”[ref 101] There are some important exceptions to the IEEPA’s general grant of authority: all “personal communications,” as well as “information” and “informational materials,” are outside of the IEEPA’s scope.[ref 102] The extent to which these exceptions would prevent the IEEPA from being used effectively for AI Oversight is unclear, because there is legal uncertainty as to whether, e.g., the overseas transfer of AI model weights would fall within one or more of them. If the relevant interpretive questions are resolved in a manner conducive to strict regulation, a partial licensing regime could be implemented under the IEEPA by making transactions contingent on safety and security evaluations. For example, foreign companies could be required to follow certain safety and security measures in order to offer subscriptions to or sell an AI model in the U.S., or U.S.-based labs could be required to undergo safety evaluations before selling subscriptions to an AI service outside the country.
EO 14110 invoked the IEEPA to support §§ 4.2(c) and 4.2(d), provisions requiring the Department of Commerce to impose “Know Your Customer” (“KYC”) reporting requirements on U.S. Infrastructure as a Service (“IAAS”) providers. The emergency declaration justifying this use of the IEEPA originated in EO 13694, “Blocking the Property of Certain Persons Engaging in Significant Malicious Cyber-Enabled Activities” (April 1, 2015), which declared a national emergency relating to “malicious cyber-enabled activities originating from, or directed by persons located, in whole or in substantial part, outside the United States.”[ref 103] BIS introduced a proposed rule to implement the EO’s KYC provisions on January 29, 2024.[ref 104] The proposed rule would require U.S. IAAS providers (i.e., providers of cloud-based on-demand compute, storage, and networking services) to submit a report to BIS regarding any transaction with a foreign entity that could result in the training of an advanced and capable AI model that could be used for “malicious cyber-enabled activity.”[ref 105] Additionally, the rule would require each U.S. IAAS provider to develop and follow an internal “Customer Identification Program.” Each Customer Identification Program would have to provide for verification of the identities of foreign customers, provide for collection and maintenance of certain information about foreign customers, and ensure that foreign resellers of the U.S. provider’s IAAS products similarly verify customer identities and collect and maintain the same information.[ref 106]
In short, the proposed rule is designed to allow BIS to track attempts at AI Creation by foreign entities seeking to purchase the kinds of cloud compute resources required to train an advanced AI model, and to prevent such purchases. This tracking capability, if effectively implemented, would prevent foreign entities from circumventing export controls on AI Hardware by simply purchasing the computing power of advanced U.S. AI chips through the cloud.[ref 107] The EO’s use of the IEEPA has so far been considerably less controversial than its use of the DPA to impose reporting requirements on the creators of frontier models.[ref 108]
Communications Act of 1934, § 606(c)
Section 606(c) of the Communications Act of 1934 could conceivably authorize a licensure program for AI Creation or Proliferation in an emergency by allowing the President to direct the closure or seizure of networked computers or data centers used to run AI systems, if such equipment qualifies as a “device” within the meaning of the statute. However, it is unclear whether courts would interpret the Act as applying to AI systems, and any such use of Communications Act powers would be completely unprecedented. Therefore, § 606(c) is unlikely to be used for AI Oversight.
Section 606(c) confers emergency powers on the President “[u]pon proclamation by the President that there exists war or a … national emergency” if it is deemed “necessary in the interest of national security or defense.” The NEA governs the declaration of a national emergency and establishes requirements for accountability and reporting during emergencies.[ref 109] Neither statute defines “national emergency.” In an emergency, the President may (1) “suspend or amend … regulations applicable to … stations or devices capable of emitting electromagnetic radiations”; (2) close “any station for radio communication, or any device capable of emitting electromagnetic radiations between 10 kilocycles and 100,000 megacycles [10 kHz–100 GHz], which is suitable for use as a navigational aid beyond five miles”; and (3) authorize “use or control” of the same.[ref 110]
In other words, § 606(c) empowers the President to seize or shut down certain types of electronic devices during a national emergency. The applicable definition of “device” could arguably encompass most of the computers, servers, and data centers used in AI Creation and Proliferation.[ref 111] Theoretically, § 606(c) could be invoked to authorize the seizure or closure of these devices. However, § 606(c) has never been used, and there is significant uncertainty as to whether courts would allow it to be applied to implement a comprehensive program of AI oversight.
Federal funding conditions
- Potentially applicable to: Tracking and Licensing AI Hardware & AI Creation; Licensing AI Proliferation
- Reasonably likely to be used for Oversight in some capacity
Attaching conditions intended to promote AI safety to federal grants and contracts could be an effective way of creating a partial licensing regime for AI Creation and Proliferation. Such a regime could be circumvented by simply forgoing federal funding, but could still contribute to an effective overall scheme for Oversight.
Funding conditions for federal grants and contracts
Under the Federal Property and Administrative Services Act, also known as the Procurement Act,[ref 112] the President can “prescribe policies and directives” for government procurement, including via executive order.[ref 113] Generally, courts have found that the President may order agencies to attach conditions to federal contracts so long as a “reasonably close nexus”[ref 114] exists between the executive order and the Procurement Act’s purpose, which is to provide an “economical and efficient system” for procurement.[ref 115] This is a “lenient standard[],”[ref 116] and it is likely that an executive order directing agencies to include conditions intended to promote AI safety in all AI-related federal contracts would be upheld under it.
Presidential authority to impose a similar condition on AI-related federal grants via executive order is less clear. Generally, “the ability to place conditions on federal grants ultimately comes from the Spending Clause, which empowers Congress, not the Executive, to spend for the general welfare.”[ref 117] It is therefore likely that any conditions imposed on federal grants will be imposed by legislation rather than by executive order. However, plausible arguments for Presidential authority to impose grant conditions via executive order in certain circumstances do exist, and even in the absence of an explicit condition executive agencies often wield substantial discretion in administering grant programs.[ref 118]
Implementation of federal contract conditions
Government-wide procurement policies are set by the Federal Acquisition Regulation (“FAR”), which is maintained by the FAR Council under the leadership of the Office of Federal Procurement Policy (“OFPP”).[ref 119] A number of FAR regulations require the insertion of a specified clause into all contracts of a certain type; for example, FAR § 23.804 requires the insertion of clauses imposing detailed reporting and tracking requirements for ozone-depleting chemicals into all federal contracts for refrigerators, air conditioners, and similar goods.[ref 120] Amending the FAR to include a clause imposing requirements related to the safe development of AI, and prohibiting the publication of any sufficiently advanced model that had not been reviewed and deemed safe in accordance with specified procedures, would effectively impose a licensing requirement on AI Creation and Proliferation, albeit a requirement that would apply only to entities receiving government funding.
A less ambitious real-world approach to implementing federal contract conditions encouraging the safe development of AI under existing authorities appears in Executive Order 14110. Section 4.4(b) of that EO directs the White House Office of Science and Technology Policy (“OSTP”) to release a framework designed to encourage DNA synthesis companies to screen their customers, in order to reduce the danger of, e.g., terrorist organizations acquiring the tools necessary to synthesize biological weapons.[ref 121] Recipients of federal research funding will be required to adhere to the OSTP’s Framework, which was released in April 2024.[ref 122]
Potential scope of oversight via conditions on federal funding
Depending on their nature and scope, conditions imposed on grants and contracts could facilitate the tracking and/or licensing of AI Hardware, Creation, and Proliferation. The conditions could, for example, specify best practices to follow during AI Creation and prohibit labs that accepted federal funds from developing frontier models without observing those practices; in effect, this would create a non-universally applicable licensing regime for AI Creation. The conditions could also specify procedures (e.g., audits by third-party or government experts) for certifying that a given model could safely be made public, and prohibit the release of any AI model developed using a sufficiently large training run until it was so certified. For Hardware, the conditions could require contractors and grantees to track any purchase or sale of the relevant chips and chipmaking equipment and report all such transactions to a specified government office.
The major limitation of Oversight via federal funding conditions is that the conditions would not apply to entities that do not receive federal funding. However, this regulatory gap could possibly be at least partially closed by drafting the conditions to prohibit contractors and grantees from contracting with companies that fail to abide by some or all of them. This would be a novel and aggressive use of federal funding conditions, but would likely hold up in court.
FTC consumer protection authorities
- Applicable to: Tracking and Licensing AI Creation, Licensing AI Proliferation
- Unlikely to be used for licensing, but somewhat likely to be involved in tracking AI Creation in some capacity
The Federal Trade Commission Act (“FTC Act”) includes broad consumer protection authorities, two of which are identified in this section as being potentially relevant to AI Oversight. Under § 5 of the FTC Act, the Federal Trade Commission (“FTC”) can pursue enforcement actions in response to “unfair or deceptive acts or practices in or affecting commerce”[ref 123]; this authority could be relevant to licensing for AI creation and proliferation. And under § 6(b), the FTC can conduct industry studies that could be useful for tracking AI creation.
The traditional test for whether a practice is “unfair,” codified at § 5(n), asks whether the practice: (1) “causes or is likely to cause substantial injury to consumers” (2) which is “not reasonably avoidable by consumers themselves” and (3) is not “outweighed by countervailing benefits to consumers or to competition.”[ref 124] “Deceptive” practices have been defined as involving: (1) a representation, omission, or practice, (2) that is material, and (3) that is “likely to mislead consumers acting reasonably under the circumstances.”[ref 125]
FTC Act § 5 oversight
Many potentially problematic or dangerous applications of highly capable LLMs would involve “unfair or deceptive acts or practices” under § 5. For example, AI safety researchers have warned of emerging risks from frontier models capable of “producing and propagating highly persuasive, individually tailored, multi-modal disinformation.”[ref 126] Commercially offering a model with such capabilities would likely constitute a violation of § 5’s “deceptive practices” prong.[ref 127]
Furthermore, the FTC has in recent decades adopted a broad plain-meaning interpretation of the “unfair practices” prong, meaning that irresponsible AI development practices that impose risks on consumers could constitute an “unfair practice.”[ref 128] The FTC has recently conducted a litigation campaign to impose federal data security regulation via § 5 lawsuits, and this campaign could serve as a model for a future effort to require AI labs to implement AI safety best practices while developing and publishing frontier models.[ref 129] In its data security lawsuits, the FTC argued that § 5’s prohibition of unfair practices imposed a duty on companies to implement reasonable data security measures to protect their consumers’ data.[ref 130] The vast majority of the FTC’s data security cases ended in settlements that required the defendants to implement certain security best practices and agree to third-party compliance audits.[ref 131] Moreover, in several noteworthy cases, the FTC has reached settlements requiring defendant companies to delete models developed using illegally collected data.[ref 132]
The FTC can bring § 5 claims based on prospective or “likely” harms to consumers.[ref 133] And § 5 can be enforced against defendants whose conduct is not the most proximate cause of an injury, such as an AI lab whose product is foreseeably misused by criminals to deceive or harm consumers, when the defendant provided others with “the means and instrumentalities for the commission of deceptive acts or practices.”[ref 134] Thus, if courts are willing to accept that the commercial release of models developed without observing AI safety best practices is an “unfair” or “deceptive” act or practice under § 5, the FTC could impose, on a case-by-case basis,[ref 135] something resembling a licensing regime for AI Creation and Proliferation. As in the data security settlements, the FTC could attempt to reach settlements with AI labs requiring the implementation of AI safety best practices and third-party compliance audits, as well as the deletion of models created in violation of § 5. This would not be an effective permanent substitute for a formal licensing regime, but could function as a stop-gap measure in the short term.
FTC industry studies
Section 6(b) of the FTC Act authorizes the FTC to conduct industry studies.[ref 136] The FTC has the authority to collect confidential business information to inform these studies, requiring companies to disclose information even in the absence of any allegation of wrongdoing. This capability could be useful for tracking AI Creation.
Limitations of FTC oversight authority
The FTC has already signaled that it intends to “vigorously enforce” § 5 against companies that use AI models to automate decisionmaking in a way that results in discrimination on the basis of race or other protected characteristics.[ref 137] Existing guidance also shows that the FTC is interested in pursuing enforcement actions against companies that use LLMs to deceive consumers.[ref 138] The agency has already concluded a few successful § 5 enforcement actions targeting companies that used (non-frontier) AI models to operate fake social media accounts and deceptive chatbots.[ref 139] And in August 2023 the FTC brought a § 5 “deceptive acts or practices” enforcement action alleging that a company named Automators LLC had deceived customers with exaggerated and untrue claims about the effectiveness of the AI tools it used, including the use of ChatGPT to create customer service scripts.[ref 140]
Thus far, however, there is little indication that the FTC is inclined to take on broader regulatory responsibilities with respect to AI safety. The § 5 prohibition on “unfair practices” has traditionally been used for consumer protection, and commentators have suggested that it would be an “awkward tool” for addressing more serious national-security-related AI risk scenarios such as weapons development, which the FTC has not traditionally dealt with.[ref 141] Moreover, even if the FTC were inclined to pursue an aggressive AI Oversight agenda, the agency’s increasingly politically divisive reputation might contribute to political polarization around the issue of AI safety and inhibit bipartisan regulatory and legislative efforts.
Committee on Foreign Investment in the United States
- Potentially applicable to: Tracking and/or Licensing AI Hardware and Creation
- Unlikely to be used to directly track or license frontier AI models, but could help to facilitate effective Oversight
The Committee on Foreign Investment in the United States (“CFIUS”) is an interagency committee charged with reviewing certain foreign investments in U.S. businesses or real estate and with mitigating the national security risks created by such transactions.[ref 142] If CFIUS determines that a given investment threatens national security, CFIUS can recommend that the President block or unwind the transaction.[ref 143] Since 2012, Presidents have blocked six transactions at the recommendation of CFIUS, all of which involved an attempt by a Chinese investor to acquire a U.S. company (or, in one instance, U.S.-held shares of a German company).[ref 144] In three of the six blocked transactions, the company targeted for acquisition was a semiconductor company or a producer of semiconductor manufacturing equipment.[ref 145]
Congress expanded CFIUS’s scope and jurisdiction by enacting the Foreign Investment Risk Review Modernization Act of 2018 (“FIRRMA”).[ref 146] FIRRMA was enacted in part because of a Pentagon report warning that China was circumventing CFIUS by acquiring minority stakes in U.S. startups working on “critical future technologies,” including artificial intelligence.[ref 147] This, the report warned, could lead to large-scale technology transfers from the U.S. to China, which would harm the economy and national security of the U.S.[ref 148] Before FIRRMA, CFIUS could only review investments that might result in at least partial foreign control of a U.S. business.[ref 149] Under Department of the Treasury regulations implementing FIRRMA, CFIUS can now review “any direct or indirect, non-controlling foreign investment in a U.S. business producing or developing critical technology.”[ref 150] President Biden specifically identified artificial intelligence as a “critical technology” under FIRRMA in Executive Order 14083.[ref 151]
CFIUS imposes, in effect, a licensing requirement for foreign investment in companies working on AI Hardware and AI Creation. It also facilitates tracking of AI Hardware and Creation, since it reduces the risk of cutting-edge American advances, subject to American Oversight, being clandestinely transferred to countries in which U.S. Oversight of any kind is impossible. A major goal of any AI Oversight regime will be to stymie attempts by foreign adversaries like China and Russia to acquire U.S. AI capabilities, and CFIUS (along with export controls) will play a major role in the U.S. government’s pursuit of this goal.
Atomic Energy Act
- Applicable to: Licensing AI Creation and Proliferation
- Somewhat unlikely to be used to create a licensing regime in the absence of new legislation
The Atomic Energy Act (“AEA”) governs the development and regulation of nuclear materials and information. The AEA prohibits the disclosure of “Restricted Data,” a term defined to include all data concerning the “design, manufacture, or utilization of atomic weapons.”[ref 152] The AEA also prohibits the communication, transmission, or disclosure of any “information involving or incorporating Restricted Data” when there is “reason to believe such data will be utilized to injure the United States or to secure an advantage to any foreign nation.” A sufficiently advanced frontier model, even one not specifically designed to produce information relating to nuclear weapons, might be capable of producing Restricted Data based on inferences from or analysis of publicly available information.[ref 153]
A permitting system that regulates access to Restricted Data already exists.[ref 154] Additionally, the Attorney General can seek a prospective court-ordered injunction against any “acts or practices” that the Department of Energy (“DOE”) believes will violate the AEA.[ref 155] Thus, licensing AI Creation and Proliferation under the AEA could be accomplished by promulgating DOE regulations stating that AI models that do not meet specified safety criteria are, in DOE’s judgment, likely to be capable of producing Restricted Data and therefore subject to the permitting requirements of 10 C.F.R. Part 725.
However, a number of potential legal issues make the application of the AEA to AI Oversight unlikely. For instance, there might be meritorious First Amendment challenges to the constitutionality of the AEA itself or to the licensing regime proposed above, which could be deemed a prior restraint of speech.[ref 156] It might also prove difficult to establish beforehand that an AI lab had “reason to believe” that a frontier model would be used to harm the U.S. or to secure an advantage for a foreign state.[ref 157]
Copyright law
- Potentially applicable to: Licensing AI Creation and Proliferation
- Unlikely to be used directly for Oversight, but will likely indirectly affect Oversight efforts
Intellectual property (“IP”) law will undoubtedly play a key role in the future development and regulation of generative AI. IP’s role in AI Oversight, narrowly understood, is more limited. That said, there are low-probability scenarios in which IP law could contribute to an ad hoc licensing regime for frontier AI models. This section discusses the possibility that U.S. copyright law[ref 158] could contribute to a sort of licensing regime for frontier AI models. In September and October 2023, OpenAI was named as a defendant in a number of putative class action copyright lawsuits.[ref 159] The complaints in these suits allege that OpenAI trained GPT-3, GPT-3.5, and GPT-4 on datasets including hundreds of thousands of pirated books downloaded from digital repositories such as Z-Library or LibGen.[ref 160] In December 2023, the New York Times filed a copyright lawsuit against OpenAI and Microsoft alleging that OpenAI infringed its copyrights by using Times articles in its training datasets.[ref 161] The Times also claimed that GPT-4 had “memorized” long sections of copyrighted articles and could “recite large portions of [them] verbatim” with “minimal prompting.”[ref 162]
The eventual outcome of these lawsuits is uncertain. Some commentators have suggested that the infringement case against OpenAI is strong and that the use of copyrighted material in a training run is copyright infringement.[ref 163] Others have suggested that using copyrighted work for an LLM training run falls under fair use, if it implicates copyright law at all, because training a model on works meant for human consumption is a transformative use.[ref 164]
In a worst-case scenario for AI labs, however, a loss in court could in theory result in an injunction prohibiting OpenAI from using copyrighted works in its training runs and statutory damages of up to $150,000 per copyrighted work willfully infringed.[ref 165] The dataset that OpenAI is alleged to have used to train GPT-3, GPT-3.5, and GPT-4 contains over 100,000 copyrighted works,[ref 166] meaning that the upper bound on potential statutory damages for OpenAI, or for any other AI lab that used the same dataset to train a frontier model, would be upwards of $15 billion.
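The $15 billion figure is simple multiplication of the statutory maximum by the number of works. As a worked calculation, assuming exactly 100,000 works (the lower bound of the figure cited above) and the maximum award for each:

```latex
D_{\max} = N_{\text{works}} \times \$150{,}000
         = 100{,}000 \times \$150{,}000
         = \$15{,}000{,}000{,}000 \approx \$15 \text{ billion}
```

Because the dataset is alleged to contain more than 100,000 works, any larger count pushes this theoretical ceiling higher; it is not an estimate of likely liability, since courts set statutory awards case by case within the statutory range.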
Such a decision would have a significant impact on the development of frontier LLMs in the United States. The amount of text required to train a cutting-edge LLM is such that an injunction requiring OpenAI and its competitors to train their models without the use of any copyrighted material would require the labs to retool their approach to training runs.
Given the U.S. government’s stated commitment to maintaining U.S. leadership in artificial intelligence,[ref 167] it is unlikely that Congress would allow such a decision to inhibit the development of LLMs in the United States on anything resembling a permanent basis. But copyright law could in theory impose, however briefly, a de facto halt on large training runs in the United States. If this occurred, the necessity of congressional intervention[ref 168] would create a natural opportunity for imposing a licensing requirement on AI Creation.
Antitrust authorities
- Applicable to: Tracking and Licensing AI Hardware and AI Creation
- Unlikely to be used directly for government tracking or licensing regimes, but could facilitate the creation of an imperfect private substitute for true Oversight
U.S. antitrust authorities include the Sherman Antitrust Act of 1890[ref 169] and § 5 of the FTC Act,[ref 170] both of which prohibit anticompetitive conduct that harms consumers. The Sherman Act is enforced primarily by the Department of Justice’s (“DOJ”) Antitrust Division, while § 5 of the FTC Act is enforced by the FTC.
This section focuses on a scenario in which non-enforcement of antitrust law under certain circumstances could facilitate the creation of a system of voluntary agreements between leading AI labs as an imperfect and temporary substitute for a governmental Oversight regime. As discussed above in Section 1, one promising short-term option to ensure the safe development of frontier models prior to the enactment of comprehensive Oversight legislation is for leading AI labs to enter into voluntary agreements to abide by responsible AI development practices. In the absence of cooperation, “harmful race dynamics” can develop in which the winner-take-all nature of a race to develop a valuable new technology can incentivize firms to disregard safety, transparency, and accountability.[ref 171]
A large number of voluntary agreements have been proposed, notably including the “Assist Clause” in OpenAI’s charter. The Assist Clause states that, in order to avoid “late-stage AGI development becoming a competitive race without time for adequate safety precautions,” OpenAI commits to “stop competing with and start assisting” any safety-conscious project that comes close to building Artificial General Intelligence before OpenAI does.[ref 172] Other potentially useful voluntary agreements include agreements to: (1) abide by shared safety standards; (2) engage in joint AI safety research ventures; (3) share information, including by mutual monitoring, sharing reports about incidents during safety testing, and comprehensively accounting for compute usage;[ref 173] (4) pause or set an agreed pace for increases in the size of training runs for frontier AI models; and/or (5) pause specified research and development activities for all labs whenever one lab develops a model that exhibits dangerous capabilities.[ref 174]
Universal, government-administered regimes for tracking and licensing AI Hardware, Creation, and Proliferation would be preferable to the voluntary agreements described above for a number of reasons, notably including ease of enforcement and the absence of economic incentives for companies to defect or refuse to agree. However, many of the proposed agreements could accomplish some of the goals of AI Oversight. Compute accounting, for example, would be a substitute (albeit an imperfect one) for comprehensive tracking of AI Hardware, and other information-sharing agreements would be imperfect substitutes for tracking AI Creation. Agreements to cooperatively pause upon discovery of dangerous capabilities would serve as an imperfect substitute for an AI Proliferation licensing regime. Agreements to abide by shared safety standards would substitute for an AI Creation licensing regime, although the voluntary nature of such an arrangement would to some extent defeat the point of a licensing regime.
All of the agreements proposed, however, raise potential antitrust concerns. OpenAI’s Assist Clause, for example, could accurately be described as an agreement to restrict competition,[ref 175] as could cooperative pausing agreements.[ref 176] Information-sharing agreements between competitors can also constitute antitrust violations, depending on the nature of the information shared and the purpose for which competitors share it.[ref 177] DOJ or FTC enforcement proceedings against AI companies over such voluntary agreements (or even uncertainty regarding the possibility of such enforcement actions) could deter AI labs from implementing a system for partial self-Oversight.
One option for addressing such antitrust concerns would be the use of § 708 of the DPA, discussed above in Section 1, to officially sanction voluntary agreements between companies that might otherwise violate antitrust laws. Alternatively, the FTC and the DOJ could publish guidance informing AI labs of their respective positions on whether and under what circumstances a given type of voluntary agreement could constitute an antitrust violation.[ref 178] In the absence of some sort of guidance or safe harbor, the risk-averse in-house legal teams at leading AI companies (some of which are presently involved in and/or staring down the barrel of ultra-high-stakes antitrust litigation[ref 179]) are unlikely to allow any significant cooperation or communication between rank-and-file employees of competing labs.
There is significant historical precedent for national security concerns playing a role in antitrust decisions.[ref 180] Most recently, after the FTC secured a permanent injunction to prohibit what it viewed as anticompetitive conduct from semiconductor company Qualcomm, the DOJ filed an appellate brief in support of Qualcomm and in opposition to the FTC, arguing that the injunction would “significantly impact U.S. national security” and incorporating a statement from a DOD official to the same effect.[ref 181] The Ninth Circuit sided with Qualcomm and the DOJ, citing national security concerns in an order granting a stay[ref 182] and later vacating the injunction.[ref 183]
Biological Weapons Anti-Terrorism Act; Chemical Weapons Convention Implementation Act
- Potentially applicable to: Licensing AI Creation and Proliferation
- Unlikely to be used for AI Oversight
Among the most pressing dangers posed by frontier AI models is the risk that sufficiently capable models will allow criminal or terrorist organizations or individuals to easily synthesize dangerous biological or chemical agents or to easily design and synthesize novel and catastrophically dangerous biological or chemical agents for use as weapons.[ref 184] The primary existing U.S. government authorities prohibiting the development and acquisition of biological and chemical weapons are the Biological Weapons Anti-Terrorism Act of 1989 (“BWATA”)[ref 185] and the Chemical Weapons Convention Implementation Act of 1998 (“CWCIA”),[ref 186] respectively.
The BWATA implements the Biological Weapons Convention (“BWC”), a multilateral international agreement that prohibits the development, production, acquisition, transfer, and stockpiling of biological weapons.[ref 187] The BWC requires, inter alia, that states parties implement “any necessary measures” to prevent the proliferation of biological weapons within their territorial jurisdictions.[ref 188] To accomplish this purpose, Section 175(a) of the BWATA prohibits “knowingly develop[ing], produc[ing], stockpil[ing], transfer[ring], acquir[ing], retain[ing], or possess[ing]” any “biological agent,” “toxin,” or “delivery system” for use as a weapon, “knowingly assist[ing] a foreign state or any organization” to do the same, or “attempt[ing], threaten[ing], or conspir[ing]” to do either of the above.[ref 189] Under § 177, the Government can file a civil suit to enjoin the conduct prohibited in § 175(a).[ref 190]
The CWCIA implements the international Convention on the Prohibition of the Development, Production, Stockpiling and Use of Chemical Weapons and on Their Destruction.[ref 191] Under the CWCIA it is illegal for a person to “knowingly develop, produce, otherwise acquire, transfer directly or indirectly, receive, stockpile, retain, own, possess, or use, or threaten to use, any chemical weapon,” or to “assist or induce, in any way, any person to” do the same.[ref 192] Under § 229D, the Government can file a civil suit to enjoin the conduct prohibited in § 229 or “the preparation or solicitation to engage in conduct prohibited under § 229.”[ref 193]
It could be argued that publicly releasing an AI model that would be a useful tool for the development or production of biological or chemical weapons would amount to “knowingly assist[ing]” (or attempting or conspiring to knowingly assist) in the development of said weapons, under certain circumstances. Alternatively, with respect to chemical weapons, it could be argued that the creation or proliferation of such a model would amount to “preparation” to knowingly assist in the development of said weapons. If these arguments are accepted, then the U.S. government could, in theory, impose a de facto licensing regime on frontier AI creation and proliferation by suing to enjoin labs from releasing potentially dangerous frontier models publicly.
This, however, would be a novel use of the BWATA and/or the CWCIA. Cases interpreting § 175(a)[ref 194] and § 229[ref 195] have typically dealt with criminal prosecutions for the actual or supposed possession of controlled biological agents or chemical weapons or delivery systems. There is no precedent for a civil suit under §§ 177 or 229D to enjoin the creation or proliferation of a dual-use technology that could be used by a third party to assist in the creation of biological or chemical weapons. Furthermore, it is unclear whether courts would accept that the creation of such a dual-use model rises to the level of “knowingly” assisting in the development of chemical or biological weapons or preparing to knowingly assist in the development of chemical weapons.[ref 196]
A further obstacle to the effective use of the BWATA and/or CWCIA for oversight of AI creation or proliferation is the lack of any existing regulatory apparatus for oversight. BIS oversees a licensing regime implementing certain provisions of the Chemical Weapons Convention,[ref 197] but this regime restricts only the actual production or importation of restricted chemicals, and says nothing about the provision of tools that could be used by third parties to produce chemical weapons.[ref 198] To effectively implement a systematic licensing regime based on §§ 177 and/or 229D, rather than an ad hoc series of lawsuits attempting to restrict specific models on a case-by-case basis, new regulations would need to be promulgated.
Federal Select Agent Program
- Potentially applicable to: Tracking and/or Licensing AI Creation and Proliferation
- Unlikely to be used for AI Oversight
Following the anthrax letter attacks that killed 5 people and caused 17 others to fall ill in the fall of 2001, Congress passed the Public Health Security and Bioterrorism Preparedness and Response Act of 2002 (“BPRA”)[ref 199] in order “to improve the ability of the United States to prevent, prepare for, and respond to bioterrorism and other public health emergencies.”[ref 200] The BPRA authorizes HHS and the United States Department of Agriculture to regulate the possession, use, and transfer of certain dangerous biological agents and toxins; this program is known as the Federal Select Agent Program (“FSAP”).
The BPRA includes, at 42 U.S.C. § 262a, a section that authorizes “Enhanced control of dangerous biological agents and toxins” by HHS. Under § 262a(b), HHS is required to “provide for… the establishment and enforcement of safeguard and security measures to prevent access to [FSAP agents and toxins] for use in domestic or international terrorism or for any other criminal purpose.”[ref 201]
Subsection 262a(b) is titled “Regulation of transfers of listed agents and toxins,” and existing HHS regulations promulgated pursuant to § 262a(b) are limited to setting the processes for HHS authorization of transfers of restricted biological agents or toxins from one entity to another.[ref 202] However, it has been suggested that § 262a(b)’s broad language could be used to authorize a much broader range of prophylactic security measures to prevent criminals and/or terrorist organizations from obtaining controlled biological agents. A recent article in the Journal of Emerging Technologies argues that HHS has statutory authority under § 262a(b) to implement a genetic sequence screening requirement for commercial gene synthesis providers, requiring companies that synthesize DNA to check customer orders against a database of known dangerous pathogens to ensure that they are “not unwittingly participating in bioweapon development.”[ref 203]
As discussed in the previous section, one of the primary risks posed by frontier AI models is that sufficiently capable models will help criminal or terrorist organizations synthesize dangerous biological agents, including those agents regulated under the FSAP. HHS’s Office of the Assistant Secretary for Preparedness and Response also seems to view itself as having authority under the FSAP to make regulations to protect against synthetic “novel high-risk pathogens.”[ref 204] If HHS decided to adopt an extremely broad interpretation of its authority under § 262a(b), therefore, it could in theory “establish[] and enforce[]… safeguard and security measures to prevent access” to agents and toxins regulated by the FSAP by creating a system for Oversight of frontier AI models. HHS is not well-positioned, either in terms of resources or technical expertise, to regulate frontier AI models generally, but might be capable of effectively overseeing a tracking or licensing regime for AI Creation and Proliferation that covered advanced models designed for drug discovery, gene editing, and similar tasks.[ref 205]
However, HHS appears to view its authority under § 262a far too narrowly to undertake any substantial AI Oversight responsibility under its FSAP authorities.[ref 206] Even if HHS did make the attempt, courts would likely view an attempt to institute a licensing regime solely on the basis of § 262a(b), without any further authorization from Congress, as ultra vires.[ref 207] In short, the Federal Select Agent Program in its current form is unlikely to be used for AI Oversight.
Chips for Peace: how the U.S. and its allies can lead on safe and beneficial AI
This piece was originally published in Lawfare.
The United States and its democratic allies can lead in AI and use this position to advance global security and prosperity.
On Dec. 8, 1953, President Eisenhower addressed the UN General Assembly. In his “Atoms for Peace” address, he set out the U.S. view on the risks and hopes for a nuclear future, leveraging the U.S.’s pioneering lead in that era’s most critical new technology in order to make commitments to promote its positive uses while mitigating its risks to global security. The speech laid the foundation for the international laws, norms, and institutions that have attempted to balance nuclear safety, nonproliferation of nuclear weapons, and peaceful uses of atomic energy ever since.
As a diverse class of largely civilian technologies, artificial intelligence (AI) is unlike nuclear technology in many ways. However, at the extremes, the stakes of AI policy this century might approach those of nuclear policy last century. Future AI systems may have the potential to unleash rapid economic growth and scientific advancement, or to endanger all of humanity.
The U.S. and its democratic allies have secured a significant lead in AI supply chains, development, deployment, ethics, and safety. As a result, they have an opportunity to establish new rules, norms, and institutions that protect against extreme risks from AI while enabling widespread prosperity.
The United States and its allies can capitalize on that opportunity by establishing “Chips for Peace,” a framework with three interrelated commitments to address some of AI’s largest challenges.
First, states would commit to regulating their domestic frontier AI development and deployment to reduce risks to public safety and global security. Second, states would agree to share the benefits of safe frontier AI systems broadly, especially with states that would not benefit by default. Third, states would coordinate to ensure that nonmembers cannot undercut the other two commitments. This could be accomplished through, among other tools, export controls on AI hardware and cloud computing. The ability of the U.S. and its allies to exclude noncomplying states from access to the chips and data centers that enable the development of frontier AI models undergirds the whole agreement, similar to how regulation of highly enriched uranium undergirds international regulation of atomic energy. Collectively, these three commitments could form an attractive package: an equitable way for states to advance collective safety while reaping the benefits of AI-enabled growth.
Three grand challenges from AI
The Chips for Peace framework is a package of interrelated and mutually reinforcing policies aimed at addressing three grand challenges in AI policy.
The first challenge is catastrophe prevention. AI systems carry many risks, and Chips for Peace does not aim to address them all. Instead, Chips for Peace focuses on possible large-scale risks from future frontier AI systems: general-purpose AI systems at the forefront of capabilities. Such “catastrophic” risks are often split into misuse and accidents.
For misuse, the domain that has recently garnered the most attention is biosecurity: specifically, the possibility that future frontier AI systems could make it easier for malicious actors to engineer and weaponize pathogens, especially if coupled with biological design tools. Current generations of frontier AI models are not very useful for this. When red teamers at RAND tested whether large language model (LLM) assistants could help them plan a more viable simulated bioweapon attack, they found that the LLMs provided answers that were inconsistent, inaccurate, or merely duplicative of what was readily discoverable on the open internet. It is reasonable to worry, though, that future frontier AI models might be more useful to attackers. In particular, lack of tacit knowledge may be an important barrier to successfully constructing and implementing planned attacks. Future AI models with greater accuracy, scientific knowledge, reasoning capabilities, and multimodality may be able to compensate for attackers’ lack of tacit knowledge by providing real-time tailored troubleshooting assistance, thus narrowing the gap between formulating a plausible high-level plan and “successfully” implementing it.
For accidental harms, the most severe risk might come from future increasingly agentic frontier AI systems: “AI systems that can pursue complex goals with limited direct supervision” through use of computers. Such a system could, for example, receive high-level goals from a human principal in natural language (e.g., “book an island getaway for me and my family next month”), formulate a plan about how to best achieve that goal (e.g., find availability on family calendars, identify possible destinations, secure necessary visas, book hotels and flights, arrange for pet care), and take or delegate actions necessary to execute on that plan (e.g., file visa applications, email dog sitters). If such agentic systems are invented and given more responsibility than managing vacations—such as managing complex business or governmental operations—it will be important to ensure that they are easily controllable. But our theoretical ability to reliably control these agentic AI systems is still very limited, and we have no strong guarantee that currently known methods will work for smarter-than-human AI agents, should they be invented. Loss of control over such agents might entail inability to prevent them from harming us.
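The goal, plan, act pattern described above can be illustrated with a toy sketch. It is a hypothetical illustration, not how any real agent is built: `Step`, `plan`, and `execute` are invented names, and the "planner" simply hard-codes the vacation example from the text, whereas a real agentic system would generate the plan with a model and act through real tools.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One sub-task in a plan (hypothetical type).

    delegate_to names a third party to hand the task to, or None if the
    agent performs the action itself.
    """
    description: str
    delegate_to: Optional[str] = None


def plan(goal: str) -> List[Step]:
    """Hypothetical planner: turns a natural-language goal into steps.

    A real agent would produce this decomposition with a model; here the
    vacation example from the text is hard-coded for illustration.
    """
    if "island getaway" in goal:
        return [
            Step("find availability on family calendars"),
            Step("identify possible destinations"),
            Step("file visa applications"),
            Step("book hotels and flights"),
            Step("arrange pet care", delegate_to="dog sitter"),
        ]
    return []  # goals this toy planner does not recognize yield an empty plan


def execute(steps: List[Step]) -> List[str]:
    """Take or delegate each step in order and return a log of what happened."""
    log = []
    for step in steps:
        if step.delegate_to is not None:
            log.append(f"delegated to {step.delegate_to}: {step.description}")
        else:
            log.append(f"done: {step.description}")
    return log


if __name__ == "__main__":
    goal = "book an island getaway for me and my family next month"
    for entry in execute(plan(goal)):
        print(entry)
```

Note that no human reviews the individual steps between the goal and the completed actions. That limited supervision is what makes such systems useful, and it is also why the text stresses the importance of keeping them easily controllable.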
Time will provide more evidence about whether and to what extent these are major risks. However, for now there is enough cause for concern to begin thinking about what policies could reduce the risk of such catastrophes, should further evidence confirm the plausibility of these harms and justify actual state intervention.
The second, no less important, challenge is ensuring that the post-AI economy enables shared prosperity. AI is likely to present acute challenges to this goal. In particular, AI has strong tendencies towards winner-take-all dynamics, meaning that, absent redistributive efforts, the first countries to develop AI may reap an outsized portion of its benefits and make catch-up growth more difficult for others. If AI labor can replace human labor, then many people may struggle to earn enough income, including the vast majority of people, who do not own nearly enough financial assets to live off. I personally think using the economic gains from AI to uplift the entire global economy is a moral imperative. But this would also serve U.S. national security. A credible, U.S.-endorsed vision for shared prosperity in the age of AI can form an attractive alternative to the global development initiatives led by China, whose current technological offerings are undermining the U.S.’s goals of promoting human rights and democracy, including in the Global South.
The third, meta-level challenge is coordination. A single state may be able to implement sensible regulatory and economic policies that address the first two challenges locally. But AI development and deployment are global activities. States are already looking to accelerate their domestic AI sectors as part of their grand strategy, and they may be tempted to loosen their laws to attract more capital and talent. They may also wish to develop their own state-controlled AI systems. But if the price of lax AI regulation is a global catastrophe, all states have an interest in avoiding a race to the bottom by setting and enforcing strong and uniform baseline rules.
The U.S.’s opportunity to lead
The U.S. is in a strong position to lead an effort to address these challenges, for two main reasons: U.S. leadership throughout much of the frontier AI life cycle and its system of alliances.
OpenAI (where, for disclosure, I previously worked), Anthropic, Google DeepMind, and Meta, the leading frontier AI developers, are all U.S. companies. Amazon, Microsoft, Google, and Meta, the largest cloud providers hosting the enormous (and rising) amounts of computing power needed to train a frontier AI model, are also American. Nvidia chips are the gold standard for training and deploying large AI models. A large, dynamic, and diverse ecosystem of American AI safety, ethics, and policy nonprofits and academic institutions has contributed to our understanding of the technology, its impacts, and possible safety interventions. The U.S. government has invested substantially in AI readiness, including through the CHIPS Act, the executive order on AI, and the AI Bill of Rights.
Complementing this leadership is a system of alliances linking the United States with much of the world. American leadership in AI depends on the notoriously complicated and brittle semiconductor supply chain. Fortunately, however, key links in that supply chain are dominated by the U.S. or its democratic allies in Asia and Europe. Together, these countries contribute more than 90 percent of the total value of the supply chain. Taiwan is home to TSMC, which fabricates 90 percent of advanced AI chips. TSMC’s only major competitors are Samsung (South Korea) and Intel (U.S.). The Netherlands is home to ASML, the world’s only company capable of producing the extreme ultraviolet lithography tools needed to make advanced AI chips. Japan, South Korea, Germany, and the U.K. all hold key intellectual property or produce key inputs to AI chips, such as semiconductor manufacturing equipment or chip wafers. The U.K. has also catalyzed global discussion about the risks and opportunities from frontier AI, starting with its organization of the first AI Safety Summit last year and its trailblazing AI Safety Institute. South Korea recently hosted the second summit, and France will pick up that mantle later this year.
These are not just isolated strengths; they are leading to collective action. Many of these countries have been coordinating with the U.S. on export controls to retain control over advanced computing hardware. The initial AI Safety Summit and the work that followed it, including the Bletchley Declaration, the International Scientific Report on the Safety of Advanced AI, and the Seoul Declaration, also show increased openness to multilateral cooperation on AI safety.
Collectively, the U.S. and its allies have a large amount of leverage over frontier AI development and deployment. They are already coordinating on export controls to maintain this leverage. The key question is how to use that leverage to address this century’s grand challenges.
Chips for Peace: three commitments for three grand challenges
Chips for Peace is a package of three commitments—safety regulation, benefit-sharing, and nonproliferation—which complement and strengthen each other. For example, benefit-sharing compensates states for the costs associated with safety regulation and nonproliferation, while nonproliferation prevents nonmembers from undermining the regulation and benefit-sharing commitments. While the U.S. and its democratic allies would form the backbone of Chips for Peace due to their leadership in AI hardware and software, membership should be open to most states that are willing to abide by the Chips for Peace package.
Safety regulation
As part of the Chips for Peace package, members would first commit to implementing domestic safety regulation. Member states would commit to ensuring that any frontier AI systems developed or deployed within their jurisdiction meet consistent safety standards narrowly tailored to prevent global catastrophic risks from frontier AI. Monitoring of large-scale compute providers would enable enforcement of these standards.
Establishing a shared understanding of catastrophic risks from AI is the first step toward effective safety regulation. Encouraging consensus is already forming here, for example through the International Scientific Report on the Safety of Advanced AI and the Seoul Declaration.
The exact content of safety standards for frontier AI is still an open question, not least because we currently do not know how to solve all AI safety problems. Current methods of “aligning” (i.e., controlling) AI behavior rely on our ability to assess whether that behavior is desirable. For behaviors that humans can easily assess, such as determining whether paragraph-length text outputs are objectionable, we can use techniques such as reinforcement learning from human feedback and Constitutional AI. These techniques already have limitations, which may become more severe as AI systems’ behaviors become more complicated and therefore more difficult for humans to evaluate.
Despite our imperfect knowledge of how to align AI systems, there are some frontier AI safety recommendations that are beginning to garner consensus. One emerging suggestion is to start by evaluating such models for specific dangerous capabilities prior to their deployment. If a model lacks capabilities that meaningfully contribute to large-scale risks, then it should be outside the jurisdiction of Chips for Peace and left to individual member states’ domestic policy. If a model has dangerous capabilities sufficient to pose a meaningful risk to global security, then there should be clear rules about whether and how the model may be deployed. In many cases, basic technical safeguards and traditional law enforcement will bring risk down to a sufficient level, and the model can be deployed with those safeguards in place. Other cases may need to be treated more restrictively. Monitoring the companies using the largest amounts of cloud compute within member states’ jurisdictions should allow states to reliably identify possible frontier AI developers, while imposing few constraints on the vast majority of AI development.
Benefit-sharing
To legitimize and drive broad adoption of Chips for Peace as a whole—and compensate for the burdens associated with regulation—members would also commit to benefit-sharing. States that stand to benefit the most from frontier AI development and deployment by default would be obligated to contribute to programs that ensure benefits from frontier AI are broadly distributed, especially to member states in the Global South.
We are far from understanding what an attractive and just benefit-sharing regime would look like. “Benefit-sharing,” as I use the term, is meant to encompass many possible methods. Some international regulatory regimes, like the International Atomic Energy Agency (IAEA), contain benefit-sharing programs that provide some useful precedent. However, some in the Global South understandably feel that such programs have fallen short of their lofty aspirations. Chips for Peace may also have to compete with more laissez-faire offers of technological aid from China. To make Chips for Peace an attractive agreement for states at all stages of development, states’ benefit-sharing commitments will have to be correspondingly ambitious. Accordingly, member states likely to be recipients of such benefit-sharing should be in the driver’s seat in articulating benefit-sharing commitments that they would find attractive and should be well represented from the beginning in shaping the overall Chips for Peace package. Each state’s needs are likely to differ, so a one-size-fits-all benefit-sharing policy is unlikely to work. The forms of benefit-sharing from which such states could choose might include subsidized access to deployed frontier AI models, assistance tailoring models to local needs, dedicated inference capacity, domestic capacity-building, and cash.
A word of caution is warranted, however. Benefit-sharing commitments need to be generous enough to attract widespread agreement, justify the restrictive aspects of Chips for Peace, and advance shared prosperity. But poorly designed benefit-sharing could be destabilizing, for example if it enabled a recipient state to defect from the agreement but still walk away with shared assets (e.g., compute and model weights), thus undermining the nonproliferation goals of the agreement. Benefit-sharing therefore needs to be simultaneously empowering to recipient states and robust to their defection. Designing technical and political tools that accomplish both goals at once may be crucial to the viability of Chips for Peace.
Nonproliferation
A commitment to nonproliferation of harmful or high-risk capabilities would make the agreement more stable. Member states would coordinate on policies to prevent non-member states from developing or possessing high-risk frontier AI systems and thereby undermining Chips for Peace.
Several tools will advance nonproliferation. The first is imposing cybersecurity requirements that prevent exfiltration of frontier AI model weights. Second, more speculatively, on-chip hardware mechanisms could prevent exported AI hardware from being used for certain risky purposes.
The third possible tool is export controls. The nonproliferation aspect of Chips for Peace could be a natural broadening and deepening of the U.S.’s ongoing efforts to coordinate export controls on AI chips and their inputs. These efforts rely on the cooperation of allies. Over time, as this system of cooperation becomes more critical, these states may want to formalize their coordination, especially by establishing procedures that check the unilateral impulses of more powerful member states. In this way, Chips for Peace could initially look much like a new multilateral export control regime: a 21st-century version of COCOM, the Cold War-era Coordinating Committee for Multilateral Export Controls (the predecessor of the current Wassenaar Arrangement). Current export control coordination efforts could also expand beyond chips and semiconductor manufacturing equipment to include large amounts of cloud computing capacity and the weights of models known to present a large risk. Nonproliferation should also include imposition of security standards on parties possessing frontier AI models. The overall goal would be to reduce the chance that nonmembers can indigenously develop, otherwise acquire (e.g., through espionage or sale), or access high-risk models, except under conditions multilaterally set by Chips for Peace states-parties.
As the name implies, this package of commitments draws loose inspiration from the Treaty on the Non-Proliferation of Nuclear Weapons and the IAEA. Comparisons to these precedents could also help Chips for Peace avoid some of the missteps of past efforts.
Administering Chips for Peace
How would Chips for Peace be administered? Perhaps one day we will know how to design an international regulatory body that is sufficiently accountable, legitimate, and trustworthy for states to be willing to rely on it to directly regulate their domestic AI industries. But this currently seems out of reach. Even if states perceive international policymaking in this domain as essential, they are understandably likely to be quite jealous of their sovereignty over their domestic AI industries.
A more realistic approach might be harmonization backed by multiple means of verifying compliance. States would come together to negotiate standards that are promulgated by a central intergovernmental organization, similar to the IAEA Safety Standards or Financial Action Task Force (FATF) Recommendations. Member states would then be responsible for substantial implementation of these standards in their own domestic regulatory frameworks.
Chips for Peace could then rely on a number of tools to detect and remedy member state noncompliance with these standards and thus achieve harmonization despite the international standards not being directly binding on states. The first would be inspections or evaluations performed by experts at the intergovernmental organization itself, as in the IAEA. The second would be peer evaluations, in which member states assess each other’s compliance, a mechanism used in both the IAEA and the FATF. Finally, and often implicitly, the most influential member states, such as the U.S., could use a variety of tools, including intelligence, law enforcement (including extraterritorial enforcement), and diplomatic efforts, to detect and remedy policy lapses.
The hope is that these three approaches combined may be adequate to bring compliance to a viable level. Noncompliant states would risk being expelled from Chips for Peace and thus cut off from frontier AI hardware and software.
Open questions and challenges
Chips for Peace has enormous potential, but an important part of ensuring its success is acknowledging the open questions and challenges that remain. First, the analogy between AI chips and highly enriched uranium (HEU) is imperfect. Most glaringly, AI models (and therefore AI chips) have a much wider range of beneficial and benign applications than HEU. Second, we should be skeptical that implementing Chips for Peace will be a simple matter of copying the nuclear arms control apparatus to AI. While we can probably learn a lot from nuclear arms control, nuclear inspection protocols took decades to evolve, and the different technological features of large-scale AI computing will necessitate new methods of monitoring, verifying, and enforcing agreements.
This brings us to the broader challenge of monitoring, verification, and enforcement (MVE). We do not know whether and how MVE can be implemented at acceptable costs to member states and their citizens. There are nascent proposals for how hardware-based methods could enable highly reliable and (somewhat) secrecy-preserving verification of claims about how AI chips have been used, and could prevent such chips from being used outside an approved setting. But we do not yet know how robust these mechanisms can be made, especially in the face of well-resourced adversaries.
Chips for Peace probably works best if most frontier AI development is done by private actors, and member states can be largely trusted to regulate their domestic sectors rigorously and in good faith. But these assumptions may not hold. In particular, perceived national security imperatives may drive states to become more involved in frontier AI development, such as through contracting for, modifying, or directly developing frontier AI systems. Asking states to regulate their own governmental development of frontier AI systems may be harder than asking them to regulate their private sectors. Even if states are not directly developing frontier AI systems, they may also be tempted to be lenient toward their national champions to advance their security goals.
Funding has also been a persistent issue in multilateral arms control regimes. Chips for Peace would likely need a sizable budget to function properly, but there is no guarantee that states will be more financially generous toward it than they have been toward past regimes. Work toward designing credible and sustainable funding mechanisms for Chips for Peace could be valuable.
Finally, although I have noted that the U.S.’s democratic allies in Asia and Europe would form the core of Chips for Peace due to their collective ability to exclude parties from the AI hardware supply chain, I have left open the question of whether membership should be open only to democracies. Promoting peaceful and democratic uses of AI should be a core goal of the U.S. But the challenges from AI can and likely will transcend political systems. China has shown some initial openness to preventing competition in AI from causing global catastrophe. China is also trying to establish an independent semiconductor ecosystem despite export controls on chips and semiconductor manufacturing equipment. If these efforts are successful, Chips for Peace would be seriously weakened unless China were admitted. As during the Cold War, we may one day have to create agreements and institutions that cross ideological divides in the shared interest of averting global catastrophe.
While the risk of nuclear catastrophe still haunts us, we are all much safer due to the steps the U.S. took last century to manage this risk.
AI may bring risks of a similar magnitude this century. The U.S. may once again be in a position to lead a broad, multilateral coalition to manage these enormous risks. If so, a Chips for Peace model may manage those risks while advancing broad prosperity.
Computing power and the governance of artificial intelligence
Abstract
Computing power, or “compute,” is crucial for the development and deployment of artificial intelligence (AI) capabilities. As a result, governments and companies have started to leverage compute as a means to govern AI. For example, governments are investing in domestic compute capacity, controlling the flow of compute to competing countries, and subsidizing compute access to certain sectors. However, these efforts only scratch the surface of how compute can be used to govern AI development and deployment. Relative to other key inputs to AI (data and algorithms), AI-relevant compute is a particularly effective point of intervention: it is detectable, excludable, and quantifiable, and is produced via an extremely concentrated supply chain. These characteristics, alongside the singular importance of compute for cutting-edge AI models, suggest that governing compute can contribute to achieving common policy objectives, such as ensuring the safety and beneficial use of AI. More precisely, policymakers could use compute to facilitate regulatory visibility of AI, allocate resources to promote beneficial outcomes, and enforce restrictions against irresponsible or malicious AI development and usage. However, while compute-based policies and technologies have the potential to assist in these areas, there is significant variation in their readiness for implementation. Some ideas are currently being piloted, while others are hindered by the need for fundamental research. Furthermore, naïve or poorly scoped approaches to compute governance carry significant risks in areas like privacy, economic impacts, and centralization of power. We end by suggesting guardrails to minimize these risks from compute governance.
Advanced AI governance: a literature review of problems, options, and proposals
Abstract
As the capabilities of AI systems have continued to improve, the technology’s global stakes have become increasingly clear. In response, an “advanced AI governance” community has come into its own, drawing on diverse bodies of research to analyze the potential problems this technology poses, map the options available for its governance, and articulate and advance concrete policy proposals. However, this field still faces a lack of internal and external clarity over its different research programmes. In response, this literature review provides an updated overview and taxonomy of research in advanced AI governance. After briefly setting out the aims, scope, and limits of this project, this review covers three major lines of work: (I) problem-clarifying research aimed at understanding the challenges advanced AI poses for governance, by mapping the strategic parameters (technical, deployment, governance) around its development and by deriving indirect guidance from history, models, or theory; (II) option-identifying work aimed at understanding affordances for governing these problems, by mapping potential key actors, their levers of governance over AI, and pathways to influence whether or how these are utilized; (III) prescriptive work aimed at identifying priorities and articulating concrete proposals for advanced AI policy, on the basis of certain views of the problem and governance options. The aim is that, by collecting and organizing the existing literature, this review will contribute to greater analytical and strategic clarity, enabling more focused and productive research, public debate, and policymaking on the critical challenges of advanced AI.
Executive Summary
This literature review provides an overview and taxonomy of past and recent research in the emerging field of advanced AI governance.
Aim: The aim of this review is to help disentangle and consolidate the field, improve its accessibility, enable clearer conversations and better evaluations, and contribute to overall strategic clarity or coherence in public and policy debates.
Summary: Accordingly, this review is organized as follows:
The introduction discusses the aims, scope, selection criteria, and limits of this review and provides a brief reading guide.
Part I reviews problem-clarifying work aimed at mapping the parameters of the AI governance challenge, including lines of research to map and understand:
- Key technical parameters constituting the technical characteristics of advanced AI technology and its resulting (sociotechnical) impacts and risks. These include evaluations of the technical landscape of advanced AI (its forms, possible developmental pathways, timelines, trajectories), models for its general social impacts, threat models for potential extreme risks (based on general arguments and direct and indirect threat models), and the profile of the technical alignment problem and its dedicated research field.
- Key deployment parameters constituting the conditions (present and future) of the AI development ecosystem and how these affect the distribution and disposition of the actors that will (first) deploy such systems. These include the size, productivity, and geographic distribution of the AI research field; key AI inputs; and the global AI supply chain.
- Key governance parameters affecting the conditions (present and future) for governance interventions. These include stakeholder perceptions of AI and trust in its developers, the default regulatory landscape affecting AI, prevailing barriers to effective AI governance, and effects of AI systems on the tools of law and governance themselves.
- Other lenses on characterizing the advanced AI governance problem. These include lessons derived from theory, from abstract models and wargames, from historical case studies (of technology development and proliferation, of its societal impacts and societal reactions, of successes and failures in historical attempts to initiate technology governance, and of successes and failures in the efficacy of different governance levers at regulating technology), and lessons derived from ethics and political theory.
Part II reviews option-identifying work aimed at mapping potential affordances and avenues for governance, including lines of research to map and understand:
- Potential key actors shaping advanced AI, including actors such as or within AI labs and companies, the digital AI services and compute hardware supply chains, AI industry and academia, state and governmental actors (including the US, China, the EU, the UK, and other states), standard-setting organizations, international organizations, and public, civil society, and media actors.
- Levers of governance available to each of these actors to shape AI directly or indirectly.
- Pathways of influence over each of these key actors that may be available to (some) other actors seeking to inform or shape the key actors’ decisions about whether or how to use key levers of governance to improve the governance of advanced AI.
Part III reviews prescriptive work aimed at putting this research into practice in order to improve the governance of advanced AI (for some view of the problem and of the options). This includes lines of research or advocacy to map, articulate, and advance:
- Priorities for policy given theories of change based on some view of the problem and of the options.
- Good heuristics for crafting AI policy. These include general heuristics for good regulation, for (international) institutional design, and for future-proofing governance.
- Concrete policy proposals for the regulation of advanced AI, and the assets or products that can help these be realized and implemented. This includes proposals to regulate advanced AI using existing authorities, laws, or institutions; proposals to establish new policies, laws, or institutions (e.g., temporary or permanent pauses on AI development; the establishment of licensing regimes, lab-level safety practices, or governance regimes on AI inputs; new domestic governance institutions; new international AI research hubs; new bilateral agreements; new multilateral agreements; and new international governance institutions).
Introduction
This document aims to review, structure, and organize existing work in the field of advanced AI governance.
Background: Despite being a fairly young and interdisciplinary field, advanced AI governance offers a wealth of productive work to draw on and is increasingly structured through various research agendas[ref 1] and syllabi.[ref 2] However, while technical research on the possibility, impacts, and risks of advanced AI has been mapped in various literature reviews and distillations,[ref 3] few attempts have been made to comprehensively map and integrate existing research on the governance of advanced AI.[ref 4] This document aims to provide an overview and taxonomy of work in this field.
Aims: The aims of this review are several:
- Disentangle and consolidate the field to promote greater clarity and legibility regarding the range of research, connections between different research streams and directions, and open gaps or underexplored questions. Literature reviews can contribute to such a consolidation of academic work;[ref 5]
- Improve the field’s accessibility and reduce some of its “research debt”[ref 6] to help those new to the field understand the existing literature, in order to facilitate a more cohesive and coordinated research field with lower barriers to entry and less duplication of effort;
- Enable clearer conversations between researchers exploring different questions or lines of research, discussing how and where their insights intersect or complement one another;
- Enable better comparison between different approaches and policy proposals; and
- Contribute to greater strategic clarity or coherence,[ref 7] improving the quality of interventions, and refining public and policy debates.
Scope: While there are many ways of framing the field, one approach is to define advanced AI governance as:
Advanced AI governance: “the study and shaping of local and global governance systems—including norms, policies, laws, processes, and institutions—that affect the research, development, deployment, and use of existing and future AI systems, in ways that help the world choose the role of advanced AI systems in its future, and navigate the transition to that world.”[ref 8]
However, the aim of this document is not to engage in restrictive boundary policing of which research is part of this emerging field, let alone the “core” of it. The guiding heuristic here is not whether a given piece of research is directly, explicitly, and exclusively focused on certain “right” problems (e.g., extreme risks from advanced AI), nor whether it is motivated by certain political orientations or normative frameworks, nor even whether it explicitly uses certain terminology (e.g., “Transformative AI,” “AGI,” “General-Purpose AI System,” or “Frontier AI”).[ref 9] Rather, the broad heuristic is simply whether the research helps answer a part of the advanced AI governance puzzle.
Accordingly, this review aims to cast a fairly broad net to cover work that meets any of the following criteria:
- Explicitly focuses on the governance of future advanced, potentially transformative AI systems, in particular with regard to their potential significant impacts or extreme risks;
- Focuses on the governance of today’s AI systems, where (at least some of) the authors are interested in the implications of the analysis for the governance of future AI systems;
- Focuses on today’s AI systems, where the original work is (likely) not directly motivated by a concern over (risks from) advanced AI but nonetheless offers lessons that are or could be drawn upon by the advanced AI governance community to inform insights for the governance of advanced AI systems; and
- Focuses on (the impacts or governance of) non-AI technologies or issues (such as historical case studies of technology governance), where the original work is not directly motivated by questions around AI but nonetheless offers lessons that are or could be drawn upon by the advanced AI governance community to inform insights for the governance of advanced AI systems.
Limitations: With this in mind, this review also has a range of limitations or shortcomings:
- Preliminary survey: A literature review of this attempted breadth will inevitably fall short of covering all relevant work and sub-literatures in sufficient depth. In particular, given the speed of development in this field, a project like this will inevitably miss key work, so it should not be considered exhaustive. Indeed, because of the breadth of this report, I do not aim to go into the details of each topic, but rather to organize and list sources by topic. Likewise, there is some imbalance in that, to date, there has been more organized (technical) literature characterizing the problem of advanced AI governance (Part I) than drafting concrete proposals (Part III). As such, I invite others to produce “spin-offs” of this report that go into the details of each topic or sub-section in order to produce more in-depth literature reviews.[ref 10]
- Broad scope: In accordance with the above goal to cast a “broad net,” this review covers both work that is core to and well established in the existing advanced AI governance field, and adjacent work that could be or has been considered by some as of significant value, even if it has not been as widely recognized yet. It also casts a broad net in terms of the type of sources surveyed, covering peer reviewed academic articles, reports, books, and more informal digital resources such as web fora.
- Incomplete in scope: By and large, this review focuses on public and published analyses and mostly omits currently in-progress, unpublished, or draft work.[ref 11] Given that a significant portion of relevant and key work in this field is unpublished, this means that this review likely will not capture all research directions in this field. Indeed, I estimate that this review captures at best ~70% of the work and research undertaken on many of these questions and subfields, and likely less. I therefore welcome further, focused literature reviews.
- A snapshot: While this review covers a range of work, the field is highly dynamic and fast-moving, which means that this project will become outdated before long. Attempts will be made to update and reissue the report occasionally.
Finally, a few remaining disclaimers: (1) inclusion does not imply endorsement of a given article’s conclusions; (2) this review also aims to highlight promising directions, such as issues or actors, that are not yet discussed in depth in the literature. As such, whenever I list certain issues (e.g., “actors” or “levers”) without sources, this is because I have not yet found (or have missed) much work on that issue, suggesting there is a gap in the literature and room for future work. Overall, this review should be seen as a living document that will be occasionally updated as the field develops. To that end, I welcome feedback, criticism, and suggestions for improvement.
Reading guide: In general, I recommend that rather than aiming to read this from the top, readers instead identify a theme or area of interest and jump to that section. In particular, this review may be most useful to readers who (a) already have a specific research question and want to see what work has been done and how a particular line of work would fit into the larger landscape; (b) aim to generate or distill syllabi for reading groups or courses; or (c) aim to explore the broader landscape or build familiarity with fields or lines of research they have not previously explored. All the research presented here is collected from prior work, and I encourage readers to consult and directly cite the original sources named here.
I. Problem-clarifying work: Understanding the AI governance challenge
Most object-level work in the field of advanced AI governance has sought to disambiguate and reduce uncertainties around relevant strategic parameters of the AI governance challenge.[ref 12]
AI governance strategic parameters can be defined as “features of the world, such as the future AI development trajectory, the prevailing deployment landscape, and applicable policy conditions, which significantly determine the strategic nature of the advanced AI governance challenge.”[ref 13]
Strategic parameters serve as highly decision-relevant or even crucial considerations, determining which interventions or solutions are appropriate, necessary, viable, or beneficial for addressing the advanced AI governance challenge. Different views of these parameters constitute underlying cruxes for different theories of action and approaches. This review discusses three types of strategic parameters:[ref 14]
- Technical parameters of the advanced AI challenge (i.e., what are the future technical developments in AI, on what timelines and on what trajectory will progress occur, why or how might such systems pose risks, and how difficult is the alignment challenge);
- Deployment parameters of who is most likely to develop advanced AI systems and how they are likely to develop and use them (i.e., whose development decisions are to be governed); and
- Governance parameters of how, when, and why governance interventions to shape advanced AI development and deployment are most likely to be viable, effective, or productive.
Accordingly, research in this subfield includes:
- Empirical and theoretical work aiming to identify or get better estimates of each of these parameters as they apply to advanced AI (Sections 1, 2, 3).
- Work applying other lenses to the advanced AI governance problem, drawing on other fields (existing theories, models, historical case studies, political and ethical theory) in order to derive crucial insights or actionable lessons (Section 4).
1. Technical parameters
An initial body of work focuses on mapping the relevant technical parameters of the challenge for advanced AI governance. This includes work on a range of topics relating to understanding the future technical landscape, understanding the likelihood of catastrophic risks given various specific threat models, and understanding the profile of the technical alignment problem and the prospects of it being solved by existing technical alignment research agendas.[ref 15]
1.1. Advanced AI technical landscape
One subfield involves research to chart the future technical landscape of advanced AI systems.[ref 16] Work to map this landscape includes research on the future form, pathways, timelines, and trajectories of advanced AI.
Forms of advanced AI
Work exploring distinct potential forms of advanced AI,[ref 17] including:
- strong AI,[ref 18] autonomous machine intelligence,[ref 19] general artificial intelligence,[ref 20] human-level AI (HLAI),[ref 21] general-purpose AI system (GPAIS),[ref 22] comprehensive AI services (CAIS),[ref 23] highly capable foundation models,[ref 24] artificial general intelligence (AGI),[ref 25] robust artificial intelligence,[ref 26] AI+,[ref 27] (machine/artificial) superintelligence,[ref 28] and superhuman general purpose AI,[ref 29] amongst others.
Developmental paths towards advanced AI
This includes research and debate on a range of domains. In particular, such work focuses on analyzing different hypothesized pathways towards achieving advanced AI based on different paradigms or theories.[ref 30] Note that many of these are controversial and contested, and there is pervasive disagreement over the feasibility of many (or even all) of these approaches for producing advanced AI.
Nonetheless, some of these paradigms include programs to produce advanced AI based on:
- First principles: Approaches that aim to create advanced AI based on new fundamental insights in computer science, mathematics, algorithms, or software, producing AI systems that may, but need not, mimic human cognition.[ref 31]
- Direct/Scaling: Approaches that aim to “brute force” advanced AI[ref 32] by running (one or more) existing AI approaches with increasingly greater computing power and/or training data to exploit observed “scaling laws” in system performance.[ref 33]
- Evolutionary: Approaches that aim to create advanced AI by having algorithms compete in a way that mimics the brute-force evolutionary search process that produced human intelligence.[ref 34]
- Reward-based: Approaches that aim to create advanced AI by running reinforcement learning systems with simple rewards in rich environments.[ref 35]
- Bootstrapping: Approaches that aim to create some minimally intelligent core system capable of subsequent recursive (self)-improvement as a “seed AI.”[ref 36]
- Neuro-inspired: Various forms of biologically-inspired, brain-inspired, or brain-imitative approaches that aim to draw on neuroscience and/or “connectomics” to reproduce general intelligence.[ref 37]
- Neuro-emulated: Approaches that aim to digitally simulate or recreate the states of human brains at a fine-grained level, possibly producing whole-brain-emulation.[ref 38]
- Neuro-integrationist: Approaches that aim to create advanced AI based on merging components of human and digital cognition.
- Embodiment: Approaches that aim to create advanced AI by providing the AI system with a robotic physical “body” to ground cognition and enable it to learn from direct experience of the world.[ref 39]
- Hybrid: Approaches that rely on combining deep neural network-based approaches to AI with other paradigms (such as symbolic AI).[ref 40]
Notably, of these approaches, recent years have seen most sustained attention focused on the direct (scaling) approach and whether current approaches to advanced AI, if scaled up with enough computing power or training data, will suffice to produce advanced or transformative AI capabilities. There have been various arguments both in favor of and against this direct path.
- Arguments in favor of a direct path: “scaling hypothesis,”[ref 41] “prosaic AGI,”[ref 42] and “Human feedback on diverse tasks (HFDT)”;[ref 43]
- Arguments against a direct path, highlighting various limits and barriers: “deep limitations,”[ref 44] “the limits of machine intelligence,”[ref 45] “why AI is harder than we think,”[ref 46] and other skeptical arguments;[ref 47]
- Discussion of the possible features of “engineering roadmaps” for AGI-like systems.[ref 48]
Advanced AI timelines: Approaches and lines of evidence
A core aim of the field is to chart the timelines for advanced AI development across the future technical development landscape.[ref 49] This research focuses on various lines of evidence,[ref 50] which are here listed in order from more abstract to more concrete and empirical, and from relying more on outside-view arguments to relying more on inside-view arguments,[ref 51] with no specific ranking on the basis of the strength of individual lines of evidence.
Outside-view analyses of timelines
Outside-view analyses of AI development timelines, including:
- Estimates based on philosophical arguments and anthropic reasoning:
- Prima facie likelihood that we (of all generations) are the ones to find ourselves living in the “most important” century, one that we can expect to contain things such as transformative technologies.[ref 52]
- Estimates based on extrapolating historical (growth) trends:
- Insights from endogenous growth theory on AI development dynamics;[ref 53]
- Likelihood of explosive economic growth occurring this century, for some reason (plausibly technological, plausibly AI[ref 54]), given analyses of long-run economic history;[ref 55]
- The accelerating historical rate of development of new technologies[ref 56] as well as potential changes in the historical growth rate of the economy;[ref 57]
- The historical patterns of barriers to technology development,[ref 58] including unexpected barriers or delays in innovation,[ref 59] as well as lags in subsequent deployment or diffusion.[ref 60]
- Estimates based on extrapolating from historical trends in efforts dedicated to creating advanced AI:
- External “semi-informative priors” (i.e., using only basic information about how long people have attempted to build advanced, transformative AI and what resources they have used, and comparing this to how long other comparable research fields have taken to achieve their goals given certain levels of funding and effort);
- Arguments extrapolating from “significantly increased near-future investments in AI progress” given that (comparatively) moderate past investments already yielded significant progress.
- Estimates based on meta-induction from the track record of past predictions:
- The general historical track record of past technological predictions, especially those made by futurists[ref 63] as well as those made in professional long-range forecasting exercises,[ref 64] to understand the frequency of over- or underconfidence and of periods of excessive optimism (hype) or excessive pessimism (counterhype);[ref 65]
- The specific historical track record of past predictions around AI development[ref 66] and the frequency of past periods’ excessive optimism (hype) or excessive pessimism (counterhype or “underclaiming”[ref 67]).[ref 68]
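The “semi-informative priors” line of evidence can be made concrete with a Laplace-style rule of succession. The sketch below is a minimal illustration under stated assumptions, not the model used in the cited work: it assumes a Beta(1, β) prior over the per-trial chance of success, parameterized by a hypothetical “first-trial probability,” and treats each period of effort without success (e.g., each calendar year without transformative AI) as one failed trial. The function names and any numbers used with them are illustrative.

```python
def next_trial_success_prob(failed_trials: int, first_trial_prob: float) -> float:
    """P(success on the next trial | `failed_trials` failures so far).

    Assumes a Beta(1, beta) prior on the per-trial success chance, with
    beta = 1 / first_trial_prob - 1 so that the very first trial succeeds
    with probability `first_trial_prob`. The posterior predictive is then
    1 / (1 / first_trial_prob + failed_trials).
    """
    if not 0.0 < first_trial_prob < 1.0:
        raise ValueError("first_trial_prob must lie strictly between 0 and 1")
    if failed_trials < 0:
        raise ValueError("failed_trials must be non-negative")
    return 1.0 / (1.0 / first_trial_prob + failed_trials)


def prob_success_within(horizon: int, failed_trials: int, first_trial_prob: float) -> float:
    """P(at least one success in the next `horizon` trials), updating after each failure."""
    p_no_success = 1.0
    for i in range(horizon):
        # Every further failed trial lowers the next trial's success probability.
        p_no_success *= 1.0 - next_trial_success_prob(failed_trials + i, first_trial_prob)
    return 1.0 - p_no_success
```

Setting `first_trial_prob` to 0.5 recovers the classic Laplace rule of succession, 1/(n + 2); smaller values encode a more skeptical prior. Under this simple model the loop also matches the closed form k / (1/first_trial_prob + n + k − 1) for the chance of at least one success in the next k trials after n failures.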
Judgment-based analyses of timelines
Judgment-based analyses of timelines, including:
- Estimates based on (specialist) expert opinions:
- Expert opinion surveys of anticipated rates of progress;[ref 69]
- Expert elicitation techniques (e.g., Delphi method).[ref 70]
- Estimates based on (generalist) estimates from information aggregation mechanisms (financial markets; forecaster prediction markets):[ref 71]
- Forecasters’ predictions of further AI progress on prediction platforms[ref 72] or forecasting competitions;[ref 73]
- Current financial markets’ real interest rates, which, if one assumes the efficient market hypothesis, suggest that markets reject short timelines.[ref 74]
Inside-view models on AI timelines
Inside-view model-based analyses of timelines, including:
- Estimates based on first-principles estimates of minimum resource (compute, investment) requirements for a “transformative” AI system, compared against estimated trends in these resources:
- The “biological anchors” approach:[ref 75] Comparison with human biological cognition, contrasting projected trends in the falling costs of training AI models with the expected minimum amount of computation needed to train an AI model as large as the human brain;[ref 76]
- The “direct approach”:[ref 77] Analysis of empirical neural scaling laws in current AI systems to upper-bound the compute needed to train a transformative model. To estimate when such a system might be developed, this analysis can be combined with estimates of future investment in model training, hardware price-performance, and algorithmic progress,[ref 78] as well as with potential barriers in the (future) availability of the data and compute needed to train these models.[ref 79]
- Estimates based on direct evaluation of outputs (progress in AI systems’ capabilities):
- Debates over the significance and implications of specific ongoing AI breakthroughs for further development;[ref 80]
- Operationalizing and measuring the generality of existing AI systems.[ref 81]
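The two compute-based approaches listed above ultimately compare an estimated compute requirement against a projection of how much compute will be affordable over time. The toy calculation below illustrates only that final comparison, under loudly simplifying assumptions: a single point estimate of the requirement (whereas the cited analyses work with probability distributions), and constant yearly growth factors for spending, hardware price-performance, and algorithmic efficiency that compound into one “effective compute” growth rate. The function name, parameters, and example values are hypothetical.

```python
import math
from typing import Optional


def years_until_threshold(
    current_effective_flop: float,
    required_flop: float,
    spend_growth: float,
    price_performance_growth: float,
    algorithmic_growth: float,
    max_years: int = 100,
) -> Optional[int]:
    """Return the first whole year in which projected effective training
    compute reaches `required_flop`, or None if it does not within `max_years`.

    Assumption: each growth argument is a constant yearly multiplier
    (e.g. 2.0 means "doubles every year") and the three effects compound.
    """
    if current_effective_flop <= 0 or required_flop <= 0:
        raise ValueError("compute quantities must be positive")
    if current_effective_flop >= required_flop:
        return 0
    combined_growth = spend_growth * price_performance_growth * algorithmic_growth
    if combined_growth <= 1.0:
        return None  # effective compute is flat or shrinking, so it never crosses
    # Smallest integer t with current * combined_growth**t >= required.
    gap = math.log(required_flop / current_effective_flop)
    years = math.ceil(gap / math.log(combined_growth))
    return years if years <= max_years else None
```

For example, with hypothetical growth factors of 2.0, 2.5, and 2.0 (a combined tenfold increase per year), `years_until_threshold(1e24, 5e29, 2.0, 2.5, 2.0)` returns 6. The point of the sketch is structural: the answer is driven by the size of the gap in orders of magnitude and by the assumed growth rates, which is why the cited work devotes so much attention to estimating each of them.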
Methodological debates on AI-timelines analysis
Various methodological debates around AI-timelines analysis:
- On the potential pitfalls in many of the common methods (forecasting methods,[ref 82] extrapolation, expert predictions[ref 83]) in forecasting AI;
- On the risk of misinterpreting forecasts that depend on poor operationalizations;[ref 84]
- On the risk of deference cycles in debates over AI timelines[ref 85] because the opinions and analyses of a small number of people end up tacitly informing the evaluations of a wide range of others in ways that create the impression of many people independently achieving similar conclusions;[ref 86]
- On the (potentially) limited utility of further discourse over and research into AGI timelines: arguments that all low-hanging fruit may already have been plucked[ref 87] and counterarguments that specific timelines remain relevant to prioritizing strategies.[ref 88]
Advanced AI trajectories and early warning signals
A third technical subfield aims at charting the trajectories of advanced AI development, especially the potential for rapid and sudden capability gains, and whether there will be advance warning signs:
- Exploring likely AGI “takeoff speeds”:[ref 89]
- From first principles: arguments in favor of “fast takeoff”[ref 90] vs. arguments for slow(er), more continuous development;[ref 91]
- By analogy: exploring historical precedents for sudden disjunctive leaps in technological capabilities.[ref 92]
- Mapping the epistemic texture of the AI development trajectory in terms of possible advance warning signs of capability breakthroughs[ref 93] or the lack of any such fire alarms.[ref 94]
1.2. Impact models for general social impacts from advanced AI
Various significant societal impacts that could result from advanced AI systems:[ref 95]
- Potential for advanced AI systems to drive significant, even “explosive” economic growth[ref 96] but also risks of significant inequality or corrosive effects on political discourse;[ref 97]
- Significant impacts on scientific progress and innovation;[ref 98]
- Significant impacts on democracy;[ref 99]
- Lock-in of harmful socio-political dangers as a result of the increasing role of centralization and optimization;[ref 100]
- Impacts on geopolitics and international stability.[ref 101]
This is an extensive field that spans a wide range of work, and the above is by no means exhaustive.
1.3. Threat models for extreme risks from advanced AI
A second subcluster of work focuses on understanding the threat models of advanced AI risk,[ref 102] based on indirect arguments for risks, specific threat models for direct catastrophe, or takeover,[ref 103] or on specific threat models for indirect risks.[ref 104]
General arguments for risks from AI
Analyses that aim to explore general arguments (by analogy, on the basis of conceptual argument, or on the basis of empirical evidence from existing AI systems) over whether or why we might have grounds to be concerned about advanced AI.[ref 105]
Analogical arguments for risks
Analogies[ref 106] with historical cases or phenomena in other domains:
- Historical cases of intelligence enabling control: emergence of human dominion over the natural world: “second species argument”[ref 107] and “the human precedent as indirect evidence of danger”;[ref 108]
- Historical cases where actors were able to achieve large shifts in power despite only wielding relatively minor technological advantages: conquistadors;[ref 109]
- Historical cases of “lock-in” of suboptimal or bad societal trajectories based on earlier choices and exacerbated by various mechanisms for lock-in: climate change, the agricultural revolution, and colonial projects.[ref 110]
Analogies with known “control problems” observed in other domains:
- Analogies with economic principal-agent problems;[ref 111]
- Analogies with constitutional law “incomplete contracting” theorems;[ref 112] in particular, the difficulty of specifying adequate legal responses to all situations or behaviors in advance because it is hard to specify specific and concrete rules for all situations (or in ways that cannot be gamed), whereas vague standards (such as the “reasonable person test”) may rely on intuitions that are widely shared but difficult to specify and need to be adjudicated ex post;[ref 113]
- Analogies to economic systems[ref 114] and to bureaucratic systems and markets, and their attendant failure modes and externalities;[ref 115]
- Analogies to “Goodhart’s Law,” where a proxy metric is optimized so far that further optimization becomes ineffective or even harmful to the goal it was meant to track;[ref 116]
- Analogies to the “political control problem”—the problem of the alignment and control of powerful social entities (corporations, militaries, political parties) with (the interests of) their societies, a problem that remains somewhat unsolved, with societal solutions relying on patchwork and fallible responses that cannot always prevent misalignment (e.g., corporate malfeasance, military coups, or unaccountable political corruption);[ref 117]
- Analogies with animal behavior, such as cases of animals responding to incentives in ways that demonstrate specification gaming;[ref 118]
- Illustration with thought experiments and well-established narrative tropes: “sorcerer’s apprentice,”[ref 119] “King Midas problem,”[ref 120] and “paperclip maximizer.”[ref 121]
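The Goodhart’s Law analogy can be illustrated with a toy optimizer. In the hypothetical setup below, a proxy score rewards ever more of some quantity x, while the true objective (an assumed concave function, chosen only for illustration) peaks at x = 5 and then declines. Greedy optimization of the proxy therefore first improves and then degrades the true objective, even though the proxy score never stops rising. None of the functions or numbers come from the cited work.

```python
def true_objective(x: float) -> float:
    # Hypothetical "real" goal: benefits grow at first, then costs dominate.
    return x - 0.1 * x * x  # maximized at x = 5


def proxy_metric(x: float) -> float:
    # Hypothetical proxy: tracks the true goal for small x,
    # but keeps rewarding larger x without limit.
    return x


def optimize_proxy(steps: int, step_size: float = 1.0) -> list[tuple[float, float, float]]:
    """Greedily increase x whenever that raises the proxy; record (x, proxy, true) per step."""
    x = 0.0
    history = [(x, proxy_metric(x), true_objective(x))]
    for _ in range(steps):
        candidate = x + step_size
        if proxy_metric(candidate) > proxy_metric(x):
            x = candidate
        history.append((x, proxy_metric(x), true_objective(x)))
    return history


for x, proxy, true_value in optimize_proxy(steps=10):
    print(f"x={x:4.1f}  proxy={proxy:5.1f}  true={true_value:5.2f}")
```

The optimizer only ever sees `proxy_metric`, so nothing in its update rule signals the point (here x = 5) past which further “improvement” is harmful; that blindness is what the analogy is meant to capture.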
Conceptual arguments for risks
Conceptual and theoretical arguments based on existing ML architectures:
- Arguments based on the workings of modern deep learning systems.[ref 122]
Conceptual and theoretical arguments based on the competitive environment that will shape the evolutionary development of AIs:
- Arguments suggesting that competitive pressures amongst AI developers may lead the most successful AI agents to likely have (or be given) undesirable traits, which creates risks.[ref 123]
Empirical evidence for risks
Empirical evidence of unsolved alignment failures in existing ML systems, which are expected to persist or scale in more advanced AI systems:[ref 124]
- “Faulty reward functions in the wild,”[ref 125] “specification gaming,”[ref 126] and reward model overoptimization;[ref 127]
- “Instrumental convergence,”[ref 128] goal misgeneralization, and “inner misalignment” in reinforcement learning;[ref 129]
- Language model misalignment[ref 130] and other unsolved safety problems in modern ML,[ref 131] and the harms from increasingly agentic algorithmic systems.[ref 132]
Empirical examples of elements of AI threat models that have already occurred in other domains or with simpler AI systems:
- Situational awareness: cases where a large language model displays awareness that it is a model and can recognize whether it is currently in testing or deployment;[ref 133]
- Acquisition of a goal to harm society: cases of AI systems being given the outright goal of harming humanity (ChaosGPT);
- Acquisition of goals to seek power and control: cases where AI systems converge on optimal policies of seeking power over their environment;[ref 134]
- Self-improvement: examples of cases where AI systems improve AI systems;[ref 135]
- Autonomous replication: the ability of simple software to autonomously spread around the internet in spite of countermeasures (various software worms and computer viruses);[ref 136]
- Anonymous resource acquisition: the demonstrated ability of anonymous actors to accumulate resources online (e.g., Satoshi Nakamoto as an anonymous crypto billionaire);[ref 137]
- Deception: cases of AI systems deceiving humans to carry out tasks or meet goals.[ref 138]
Direct threat models for direct catastrophe from AI
Work focused on understanding direct existential threat models.[ref 139] This includes:
- Various overviews and taxonomies of different accounts of AI risk: Barrett & Baum’s “model of pathways to risk,”[ref 140] Clarke et al.’s Modelling Transformative AI Risks (MTAIR),[ref 141] Clarke & Martin on “Distinguishing AI Takeover Scenarios,”[ref 142] Clarke & Martin’s “Investigating AI Takeover Scenarios,”[ref 143] Clarke’s “Classifying Sources of AI X-Risk,”[ref 144] Vold & Harris’s “How Does Artificial Intelligence Pose an Existential Risk?,”[ref 145] Ngo’s “Disentangling Arguments for the Importance of AI Safety,”[ref 146] Grace’s overview of arguments for existential risk from AI,[ref 147] Nanda’s “threat models,”[ref 148] and Kenton et al.;[ref 149]
- Analysis of potential dangerous capabilities that may be developed by general-purpose AI models, such as cyber-offense, deception, persuasion and manipulation, political strategy, weapons acquisition, long-horizon planning, AI development, situational awareness, and self-proliferation.[ref 150]
Scenarios for direct catastrophe caused by AI
Other lines of work have moved from providing indirect arguments for risk to sketching specific scenarios in and through which advanced AI systems could directly inflict existential catastrophe.
Scenario: Existential disaster because of misaligned superintelligence or power-seeking AI
- Older accounts, including by Yudkowsky,[ref 151] Bostrom,[ref 152] Sotala,[ref 153] Sotala and Yampolskiy,[ref 154] and Alexander;[ref 155]
- Newer accounts, such as Cotra & Karnofsky’s “AI takeover analysis,”[ref 156] Christiano’s account of “What Failure Looks Like,”[ref 157] Carlsmith on existential risks from power-seeking AI,[ref 158] Ngo on “AGI Safety From First Principles,”[ref 159] and “Minimal accounts” of AI takeover scenarios;[ref 160]
- Skeptical accounts: various recent critiques of AI takeover scenarios.[ref 161]
Scenario: Gradual, irretrievable ceding of human power over the future to AI systems
- Christiano’s account of “What Failure Looks Like” (part 1).[ref 162]
Scenario: Extreme “suffering risks” because of a misaligned system
- Various accounts of “worst-case AI safety”;[ref 163]
- Potential for a “suffering explosion” experienced by AI systems.[ref 164]
Scenario: Existential disaster because of conflict between AI systems and multi-system interactions
- Disasters because of “cooperation failure”[ref 165] or “multipolar failure.”[ref 166]
Scenario: Dystopian trajectory lock-in because of misuse of advanced AI to establish and/or maintain totalitarian regimes
- Use of advanced AI to establish robust totalitarianism;[ref 167]
- Use of advanced AI to establish lock-in of future values.[ref 168]
Scenario: Failures in or misuse of intermediary (non-AGI) AI systems, resulting in catastrophe
- Deployment of “prepotent” AI systems that are non-general but capable of outperforming human collective efforts on various key dimensions;[ref 169]
- Militarization of AI enabling mass attacks using swarms of lethal autonomous weapons systems;[ref 170]
- Military use of AI leading to (intentional or unintentional) nuclear escalation, either because machine learning systems are directly integrated in nuclear command and control systems in ways that result in escalation[ref 171] or because conventional AI-enabled systems (e.g., autonomous ships) are deployed in ways that result in provocation and escalation;[ref 172]
- Nuclear arsenals serving as an arsenal “overhang” for advanced AI systems;[ref 173]
- Use of AI to accelerate research into catastrophically dangerous weapons (e.g., bioweapons);[ref 174]
- Use of AI to lower the threshold of access to dual-use biotechnology, creating risks of actors misusing it to create bioweapons.[ref 175]
Other work: vignettes, surveys, methodologies, historiography, critiques
- Work to sketch vignettes reflecting on potential threat models:
- AI Impacts’ AI Vignettes project;[ref 176]
- FLI Worldbuilding competition;[ref 177]
- Wargaming exercises;[ref 178]
- Other vignettes or risk scenarios.[ref 179]
- Surveys of how researchers rate the relative probability of different existential risk scenarios from AI;[ref 180]
- Developing methodologies for AI future developments and risk identification,[ref 181] such as red-teaming,[ref 182] wargaming exercises,[ref 183] and participatory technology assessment,[ref 184] as well as established risk identification techniques (scenario analysis, fishbone method, and risk typologies and taxonomies), risk analysis techniques (causal mapping, Delphi technique, cross-impact analysis, bow tie analysis, and system-theoretic process analysis), and risk evaluation techniques (checklists and risk matrices);[ref 185]
- Historiographic accounts of changes in AI risk arguments and debates over time:
- General history of concerns around AI risk (1950s–present);[ref 186]
- Early history of the rationalist and AI risk communities (1990s–2010);[ref 187]
- Recent shifts in arguments (e.g., 2014–present);[ref 188]
- Development and emergence of AI risk “epistemic community.”[ref 189]
- Critical investigations of and counterarguments to the case for extreme AI risks, including object-level critiques of the arguments for risk[ref 190] as well as epistemic arguments, arguments about community dynamics, and argument selection effects.[ref 191]
Threat models for indirect AI contributions to existential risk factors
Work focused on understanding indirect ways in which AI could contribute to existential threats, such as by shaping societal “turbulence”[ref 192] and other existential risk factors.[ref 193] This covers various long-term impacts on societal parameters such as science, cooperation, power, epistemics, and values:[ref 194]
- Destabilizing political impacts from AI systems in areas such as domestic politics (e.g., polarization, legitimacy of elections), international political economy, or international security[ref 195] in terms of the balance of power, technology races and international stability, and the speed and character of war;
- Hazardous malicious uses;[ref 196]
- Impacts on “epistemic security” and the information environment;[ref 197]
- Erosion of international law and global governance architectures;[ref 198]
- Other diffuse societal harms.[ref 199]
1.4. Profile of technical alignment problem
- Work mapping different geographical or institutional hubs active on AI alignment: overview of the AI safety community and problem,[ref 200] and databases of active research institutions[ref 201] and of research;[ref 202]
- Work mapping current technical alignment approaches;[ref 203]
- Work aiming to assess the (relative) efficacy or promise of different approaches to alignment, insofar as possible:[ref 204] Cotra,[ref 205] Soares,[ref 206] and Leike;[ref 207]
- Mapping the relative contributions to technical AI safety by different communities[ref 208] and the chance that AI safety problems get “solved by default”;[ref 209]
- Work mapping other features of AI safety research, such as the need for minimally sufficient access to AI models under API-based “structured access” arrangements.[ref 210]
2. Deployment parameters
Another major part of the field aims to understand the parameters of the advanced AI deployment landscape by mapping the size and configuration of the “game board” of relevant advanced AI developers—the actors whose (ability to take) key decisions (e.g., around whether or how to deploy particular advanced AI systems, how much to invest in alignment research, etc.) may be key in determining risks and outcomes from advanced AI.
As such, there is significant work on mapping the disposition of the AI development ecosystem and how this will determine who is (or will likely be) in the position to develop and deploy the most advanced AI systems. Some work in this space focuses on mapping the current state of these deployment parameters; other work focuses on the likely future trajectories of these deployment parameters over time.
2.1. Size, productivity, and geographic distribution of the AI research field
- Mapping the current size, activity, and productivity of the AI research field;[ref 211]
- Mapping the global geographic distribution of active AGI programs,[ref 212] including across key players such as the US or China.[ref 213]
2.2. Geographic distribution of key inputs in AI development
- Mapping the current distribution of relevant inputs in AI development, such as the distribution of computation,[ref 214] semiconductor manufacturing,[ref 215] AI talent,[ref 216] open-source machine learning software,[ref 217] etc.
- Mapping and forecasting trends in relevant inputs for AI,[ref 218] such as:
- Trends in compute inputs scaling[ref 219] and in the training costs and GPU price-performance of machine learning systems over time;[ref 220]
- Trends in dataset scaling and potential ceilings;[ref 221]
- Trends in algorithmic progress, including their effect on the ability to leverage other inputs, e.g., the relative importance of CPUs versus specialized hardware;[ref 222]
- Mapping and forecasting trends in input criticality for AI, such as trends in data efficiency[ref 223] and the degree to which data becomes the operative constraint on language model performance.[ref 224]
2.3. Organization of global AI supply chain
- Mapping the current shape of the AI supply chain;[ref 225]
- Mapping and forecasting dominant actors in the future AI ecosystem, in terms of:
- different actors’ control of and access to key inputs and/or chokepoints;[ref 226]
- future shape of the AI supply chain (e.g., level of integration and monopoly structure);[ref 227]
- shape of AI deployment landscape (e.g., dominance of key operators of generative models vs. copycat models).
2.4. Dispositions and values of advanced AI developers
- Anticipating the likely behavior or attitude of key advanced AI actors with regard to their caution about and investment in safety research, such as expecting AI companies to “race forward” and dedicate “naive safety effort.”[ref 228]
2.5. Developments in converging technologies
- Mapping converging developments in adjacent, potentially intersecting or relevant technologies, such as cryptography,[ref 229] nanotechnology,[ref 230] and others.
3. Governance parameters
Work on governance parameters aims to map (1) how AI systems are currently being governed, (2) how they are likely to be governed by default (given prevailing perceptions and regulatory initiatives), as well as (3) the conditions for developing and implementing productive governance interventions on advanced AI risk.
Some work in this space focuses on mapping the current state of these governance parameters and how they affect AI governance efforts initiated today. Other work focuses on the likely future trajectories of these governance parameters.
3.1. Stakeholder perceptions of AI
Surveys of current perceptions of AI among different relevant actors:
- Public perceptions of the future of AI,[ref 231] of AI’s societal impacts,[ref 232] of the need for caution and/or regulation of AI,[ref 233] and of the rights or standing of AI entities;[ref 234]
- Policymaker perceptions of AI[ref 235] and the prominence of different memes, rhetorical frames, or narratives around AI;[ref 236]
- Expert views on best practices in AGI lab safety and governance.[ref 237]
Predicting future shifts in perceptions of AI among relevant actors given:
- The spread of ongoing academic conversations concerned about advanced AI risk;[ref 238]
- The effects of “warning shots,”[ref 239] or other “risk awareness moments”;[ref 240]
- The effect of motivated misinformation or politicized AI risk skepticism.[ref 241]
3.2. Stakeholder trust in AI developers
- Public trust in different actors to responsibly develop AI;[ref 242]
- AI-practitioner trust in different actors to responsibly develop AI[ref 243] and Chinese AI researchers’ views on the development of “strong AI.”[ref 244]
3.3. Default landscape of regulations applied to AI
This work maps the prevailing (i.e., default, “business-as-usual”) landscape of regulations that will be applied to AI in the near term. These regulations matter because they will directly affect the development landscape for advanced AI and indirectly bracket the space for any new (AI-specific) governance proposals.[ref 245] This work includes:
- Existing industry norms and practices applied to AI in areas such as release practices around generative AI systems;[ref 246]
- General existing laws and governance regimes which may be extended to or affect AI development, such as competition (antitrust) law;[ref 247] national and international standards;[ref 248] international law norms, treaties, and regimes;[ref 249] and existing global governance institutions;[ref 250]
- AI-specific governance regimes currently under development, such as:
- EU: the EU AI Act[ref 251] and the AI Liability Directive,[ref 252] amongst others;
- US: the US AI policy agenda,[ref 253] such as various federal legislative proposals relating to generative AI,[ref 254] or President Biden’s executive order,[ref 255] amongst others;
- International: such as the 2019 OECD AI Principles (nonbinding);[ref 256] the 2021 UNESCO Recommendation on the Ethics of Artificial Intelligence (nonbinding);[ref 257] the 2023 G7 Hiroshima guidelines (nonbinding);[ref 258] and the Council of Europe’s draft (framework) Convention on Artificial Intelligence, Human Rights, Democracy and the Rule of Law (potentially binding),[ref 259] amongst others.
3.4. Prevailing barriers to effective AI governance
- Definitional complexities of AI as a target for regulation;[ref 260]
- Potential difficulties around building global consensus given geopolitical stakes and tensions;[ref 261]
- Potential difficulty around building civil society consensus given outstanding disagreements and tensions between different expert communities;[ref 262]
- Potential challenges around cultivating sufficient state capacity to effectively implement and enforce AI legislation.[ref 263]
3.5. Effects of AI systems on tools of governance
Predicting the impact of future technologies on governance and the ways these could shift the possibility frontier of what kind of regimes will be politically viable and enforceable:
- Effects of AI on general cooperative capabilities;[ref 264]
- Effects of AI on international law creation and enforcement;[ref 265]
- Effects of AI on arms control monitoring.[ref 266]
4. Other lenses on the advanced AI governance problem
Other work aims to derive key strategic lessons for advanced AI governance, not by aiming to empirically map or estimate first-order facts about the key (technical, deployment, or governance) strategic parameters, but rather by drawing indirect (empirical, strategic, and/or normative) lessons from abstract models, historical cases, and/or political theory.
4.1. Lessons derived from theory
Work characterizing the features of advanced AI technology and of its governance challenge, drawing on existing literatures or bodies of theory:
Mapping clusters and taxonomies of AI’s governance problems:
- AI creating distinct types of risk deriving from (1) accidents, (2) misuse, and (3) structure;[ref 267]
- AI creating distinct problem logics across domains: (1) ethical challenges, (2) safety risks, (3) security threats, (4) structural shifts, (5) common goods, and (6) governance disruption;[ref 268]
- AI driving four risk clusters: (1) inequality, turbulence, and authoritarianism; (2) great-power war; (3) the problems of control, alignment, and political order; and (4) value erosion from competition.[ref 269]
Mapping the political features of advanced AI technology:
- AI as general-purpose technology, highlighting radical impacts on economic growth, disruption to existing socio-political relations, and potential for backlash and social conflict;[ref 270]
- AI as industry-configured general-purpose tech (low fixed costs and private sector dominance), highlighting challenges of rapid proliferation (compared to “prestige,” “public,” or “strategic” technologies);[ref 271]
- AI as information technology, highlighting challenges of increasing returns to scale driving greater income inequality, impacts on broad collective identities as well as community fragmentation, and increased centralization of (cybernetic) control;[ref 272]
- AI as intelligence technology, highlighting challenges of bias, alignment, and control of the principal over the agent;[ref 273]
- AI as regulation-resistant technology, rendering coordinated global regulation difficult.[ref 274]
Mapping the structural features of the advanced AI governance challenge:
- In terms of its intrinsic coordination challenges: as a global public good,[ref 275] as a collective action problem,[ref 276] and as a matter of “existential security”;[ref 277]
- In terms of its difficulty of successful resolution: as a wicked problem[ref 278] and as a challenge akin to “racing through a minefield”;[ref 279]
- In terms of its strategic dynamics: as a technology race,[ref 280] whether motivated by security concerns or by prestige motivations,[ref 281] or as an arms race[ref 282] (but see also critiques of the arms race framing on definitional grounds,[ref 283] on empirical grounds,[ref 284] and on grounds of rhetorical or framing risks[ref 285]);
- In terms of its politics and power dynamics: as a political economy problem.[ref 286]
Identifying design considerations for international institutions and regimes, from:
- General theory on the rational design of international institutions;[ref 287]
- Theoretical work on the orchestration and organization of regime complexes of many institutions, norms, conventions, etc.[ref 288]
4.2. Lessons derived from models and wargames
Work to derive or construct abstract models for AI governance in order to gather lessons from these for understanding AI systems’ proliferation and societal impacts. This includes models of:
- International strategic dynamics in risky technology races,[ref 289] and theoretical models of the role of information sharing,[ref 290] agreement, or incentive modeling;[ref 291]
- AI competition and whether and how AI safety insights will be applied under different AI safety-performance tradeoffs,[ref 292] including collaboration on safety as a social dilemma[ref 293] and models of how compute pricing factors affect agents’ spending on safety (“safety tax”) meant to reduce the danger from the new technology;[ref 294]
- The offense-defense balance of increasing investments in technologies;[ref 295]
- The offense-defense balance of scientific knowledge in AI with potential for misuse;[ref 296]
- Lessons from the “epistemic communities” lens, on how coordinated expert networks can shape policy;[ref 297]
- Lessons from wargames and role-playing exercises.[ref 298]
4.3. Lessons derived from history
Work to identify and study relevant historical precedents, analogies, or cases and to derive lessons for (AI) governance.[ref 299] This includes studies where historical cases have been directly applied to advanced AI governance as well as studies where the link has not been drawn but which might nevertheless offer productive insights for the governance of advanced AI.
Lessons from the history of technology development and spread
Historical cases that (potentially) provide insights into when, why, and how new technologies are pursued and developed—and how they subsequently (fail to) spread.
Historical rationales for technology pursuit and development
Historical rationales for actors pursuing large-scale scientific or technology development programs:
- Development of major transformative technologies during wartime: US development of the atom bomb;[ref 300]
- Pursuit of strategically valuable megaprojects: the Apollo Program and the Manhattan Project;[ref 301]
- Technologies pursued for prestige reasons: Ming Dynasty treasure fleets,[ref 302] the US/USSR space race,[ref 303] and the French nuclear weapons program;[ref 304]
- Risk of races being started by possibly incorrect perceptions that a rival is actively pursuing a technology: the Manhattan Project (1939–1945), spurred by the Einstein Letter; the “missile gap” project to build up a US ICBM capability (1957–1962).[ref 305]
Historical strategies of deliberate large-scale technology development projects
Historical strategies for unilateral large-scale technology project development:
- Crash recruitment and resource allocation for a large strategic program: “Operation Paperclip,” the post-WWII effort to recruit 1,600 German scientists and engineers, fast-tracking the US space program as well as several programs aimed at other Cold War weapons of mass destruction;[ref 306]
- Different potential strategies for pursuing advanced strategic technologies: the distinct nuclear proliferation strategies (“hedging, sprinting, sheltered pursuit, hiding”) taken by different countries in pursuing nuclear weapons;[ref 307]
- Government-industry collaborations to boost development of strategic technologies: the 1980s SEMATECH collaborative research consortium to boost the US semiconductor industry;[ref 308]
- Nations achieving early and sustained unilateral leads in developing key strategic technologies: the US program to develop stealth aircraft;[ref 309]
- Surprisingly rapid leaps from the political decision to run a big technology program to the achievement: Apollo 8 (134 days between NASA’s decision to go to the moon and launch),[ref 310] the UAE’s “Hope” Mars mission (the UAE set up its space agency, UAESA, in 2014, designed its own satellite (KhalifaSat) only in 2018, and launched the “Hope” Mars mission in July 2020, less than six years after the agency’s establishment),[ref 311] and various other examples including BankAmericard (90 days), P-80 Shooting Star (first USAF jet fighter) (143 days), Marinship (197 days), The Spirit of St. Louis (60 days), the Eiffel Tower (2 years and 2 months), Treasure Island, San Francisco (~2 years), the Alaska Highway (234 days), Disneyland (366 days), the Empire State Building (410 days), Tegel Airport and the Berlin Airlift (92 days),[ref 312] the Pentagon (491 days), Boeing 747 (930 days), the New York Subway (4.7 years), TGV (1,975 days), USS Nautilus (first nuclear submarine) (1,173 days), JavaScript (10 days), Unix (21 days), Xerox Alto (first GUI-oriented computer) (4 months), iPod (290 days), Amazon Prime (6 weeks), Git (17 days), and COVID-19 vaccines (3–45 days).[ref 313]
Historical strategies for joint or collaborative large-scale technology development:
- International “big science” collaborations: CERN, ITER, International Space Station, Human Genome Project,[ref 314] and attempted collaborations on Apollo-Soyuz between the US and Soviet space programs.[ref 315]
Historical instances of sudden, unexpected technological breakthroughs
Historical cases of rapid, historically discontinuous breakthroughs in technological performance on key metrics:
- “Large robust discontinuities” in historical technology performance trends:[ref 316]
- the Pyramid of Djoser (2650 BC—structure height trends);
- the SS Great Eastern (1858—ship size trends);
- the first and second telegraphs (1858, 1866—speed of sending a message across the Atlantic Ocean);
- the first nonstop transatlantic flight (1919—speed of passenger or military payload travel);
- first nuclear weapons (1945—relative effectiveness of explosives);
- first ICBM (1958—average speed of military payload);
- the discovery of YBa2Cu3O7 as a superconductor (1987—warmest temperature of superconduction).[ref 317]
- “Bolt-from-the-blue” technology breakthroughs that were held to be unlikely or impossible even shortly before they happened: the invention of flight;[ref 318] of penicillin, nuclear fission, nuclear bombs, or space flight;[ref 319] and of internet hyperlinks and effective internet search.[ref 320]
Historical patterns in technological proliferation and take-up
Historical cases of technological proliferation and take-up:[ref 321]
- Patterns in the development, dissemination and impacts of major technological advancements: flight, the telegraph, nuclear weapons, the laser, penicillin, the transistor, and others;[ref 322]
- Proliferation and penetration rates of other technologies in terms of time between invention and widespread use: steam engine (80 years), electricity (40 years), IT (20 years),[ref 323] and mobile phones;
- Role of state “diffusion capacity” in supporting the diffusion or wide adoption of new innovations: the US in the Second Industrial Revolution and the Soviet Union in the early postwar period;[ref 324]
- Role of espionage in facilitating critical technology diffusion: early nuclear proliferation[ref 325] and numerous information leaks in modern IT systems;[ref 326]
- Constrained proliferation of technological insights (even under compromised information security conditions): the surprisingly limited track record of bioweapon proliferation, in which the American, Soviet, Iraqi, South African, and Aum Shinrikyo bioweapon programs ran into a range of problems that caused them to fail, if not entirely, then at least in taking effective steps towards weaponization. This suggests that tacit knowledge and organizational conditions can be severely limiting and can prevent proliferation even when some techniques are available in the public scientific literature.[ref 327] Another example is China’s limited success (1991–2018) in re-engineering US fifth-generation stealth fighters in spite of extensive espionage that included access to blueprints, recruitment of former engineers, and even access to the wreck of an F-117 aircraft that had crashed in Serbia;[ref 328]
- Various factors contributing to technological delay or restraint with many examples of technologies being slowed or abandoned or having their uptake inhibited, including weapon systems, nuclear power, geoengineering, and genetically modified (GM) crops, as a result of (indirect) regulations, public opposition, and historical contingency;[ref 329]
- Supply chain evolution of previous general-purpose technologies: studies of railroads, electricity, and cloud computing industries, where supply chains were initially vertically integrated but then evolved into a fully disintegrated natural monopoly structure with a handful of primary “upstream” firms selling services to many “downstream” application sectors.[ref 330]
Lessons from the historical societal impacts of new technologies
Historical cases that (potentially) provide insights into when, why, and how new technologies can have (unusually) significant societal impacts or pose acute risks.
Historical cases of large-scale societal impacts from new technologies
Historical cases of large-scale societal impacts from new technologies:[ref 331]
- Impacts of previous narrowly transformative technologies: impact of nuclear weapons on warfare, and electrification of militaries as driver of “general-purpose military transformation”;[ref 332]
- Impacts of previous general-purpose technologies: general electrification,[ref 333] printing, steam engines, rail transport, motor vehicles, aviation, and computing;[ref 334]
- Impacts of previous “revolutionary” or “radically transformative”[ref 335] technologies: domesticated crops and the steam engine;[ref 336]
- Impacts of previous information technologies: speech and culture, writing, and the printing press; digital services; and communications technologies;[ref 337]
- Impacts of previous intelligence technologies: price mechanisms in a free market, language, bureaucracy, peer review in science, and evolved institutions like the justice system and law;[ref 338]
- Impacts of previous labor-substitution technologies as they compare to the possible societal impacts of large language models.[ref 339]
Historical cases of particular dangers or risks from new technologies
Historical precedents for particular types of dangers or threat models from technologies:
- Human-machine interface risks and failures around complex technologies: various “normal accidents” in diverse industries and domains, most notably nuclear power;[ref 340]
- Technology misuse risks: the proliferation of easily available hacking tools, such as the “Blackshades Remote Access Tool,”[ref 341] but see also the counterexample of non-use of an (apparent) decisive strategic advantage: the brief US nuclear monopoly;[ref 342]
- Technological “structural risks”: the role of technologies in lowering the threshold for war initiation such as the alleged role of railways in inducing swift, all-or-none military mobilization schedules and precipitating escalation to World War I.[ref 343]
Historical cases of value changes as a result of new technologies
Historical precedents for technologically induced value erosion or value shifts:
- Shared values eroded by pressures of global economic competition: “sustainability, decentralized technological development, privacy, and equality”;[ref 344]
- Technological progress biasing the development of states towards welfare-degrading (inegalitarian and autocratic) forms: agriculture, bronze working, chariots, and cavalry;[ref 345]
- Technological progress biasing the development of states towards welfare-promoting forms: ironworking, ramming warships, and the industrial revolution;[ref 346]
- Technological progress leading to gradual shifts in societal values: changes in the prevailing technology of energy capture driving changes in societal views on violence, equality, and fairness;[ref 347] demise of dueling and honor culture after (low-skill) pistols replaced (high-skill) swords; changes in sexual morality after the appearance of contraceptive technology; changes in attitudes towards farm animals after the rise of meat replacements; and the rise of the plough as a driver of diverging gender norms.[ref 348]
Historical cases of the disruptive effects on law and governance from new technologies
Historical precedents for effects of new technology on governance tools:
- Technological changes disrupting or eroding the legal integrity of earlier (treaty) regimes: submarine warfare;[ref 349] implications of cyberwarfare for international humanitarian law;[ref 350] the Soviet Fractional Orbital Bombardment System (FOBS) evading the 1967 Outer Space Treaty’s ban on stationing WMDs “in orbit”;[ref 351] the mid-2010s US “superfuze” upgrades to its W76 nuclear warheads, which massively increased their counterforce lethality against missile silos without adding a new warhead, missile, or submarine, and so formally complied with arms control regimes like New START;[ref 352] and various other cases;[ref 353]
- Technologies strengthening international law: satellites strengthening the monitoring of treaty compliance,[ref 354] and communications technology strengthening the role of non-state and civil-society actors.[ref 355]
Lessons from the history of societal reactions to new technologies
Historical cases that (potentially) provide insights into how societies are likely to perceive, react to, or regulate new technologies.
Historical reactions to and regulations of new technologies
Historical precedents for how key actors are likely to view, treat, or regulate AI:
- The relative roles of various US actors in shaping the development of past strategic general-purpose technologies: biotech, aerospace tech, and cryptography;[ref 356]
- Overall US government policy towards perceived “strategic assets”: oil[ref 357] and early development of US nuclear power regulation;[ref 358]
- The historical use of US antitrust law motivated by national security considerations: various cases over the last century;[ref 359]
- Early regulation of an emerging general-purpose technology: electricity regulation in the US;[ref 360]
- Previous instances of AI development becoming framed as an “arms race” or competition: the 1980s “race” between the US and Japan’s Fifth Generation Computer Systems (FGCS) project;[ref 361]
- Regulation of the “safety” of foundational technology industries, public infrastructures, and sectors: UK regulation of sectors such as medicines and medical devices, food, financial services, transport (aviation & road and rail), energy, and communications;[ref 362]
- High-level state actors’ buy-in to ambitious early-stage proposals for world control and development of powerful technology: the initial “Baruch Plan” for world control of nuclear weapons (eventually failed);[ref 363] extensive early proposals for world control of airplane technology (eventually failed);[ref 364] and repeated (private and public) US offers to the Soviet Union for a joint US-USSR moon mission, including a 1963 UN General Assembly offer by John F. Kennedy to convert the Apollo lunar landing program into a joint US-Soviet moon expedition (initially on track, with Nikita Khrushchev eager to accept the offer; however, Kennedy was assassinated in November 1963, two months after the offer, the Soviets were too suspicious of similar offers by the Johnson administration, and Khrushchev was removed from office by a coup in 1964);[ref 365]
- Sustained failure of increasingly more powerful technologies to deliver their anticipated social outcomes: sustained failure of the “Superweapon Peace” idea—the recurring idea that certain weapons of radical destructiveness (nuclear and non-nuclear) may force an end to war by rendering it too destructive to contemplate;[ref 366]
- Strong public and policy reactions to “warning shots” of a technology being deployed: Sputnik launch and Hiroshima bombing;[ref 367]
- Strong public and policy reactions to publicly visible accidents involving a new technology: Three Mile Island meltdown,[ref 368] COVID-19 pandemic,[ref 369] and automotive and aviation industries;[ref 370]
- Regulatory backlash and path dependency: case of genetically modified organism (GMO) regulations in the US vs. the EU;[ref 371]
- “Regulatory capture” and/or influence of industry actors on tech policy: the role of the US military-industrial complex in perpetuating the “bomber gap” and “missile gap” myths,[ref 372] and undue corporate influence in the World Health Organisation during the 2009 H1N1 pandemic;[ref 373]
- State norm “antipreneurship” (actions aiming to preserve the prevailing normative status quo at the global level against proposals for new regulation or norm-setting): US resistance, from 2000 to the present, to proposed global restraints on space weapons, using a range of diplomatic strategies and tactics to preserve a permissive international legal framework governing outer space.[ref 374]
Lessons from the history of attempts to initiate technology governance
Historical cases that (potentially) provide insights into when efforts to initiate governance intervention on emerging technologies are likely to be successful and into the efficacy of various pathways towards influencing key actors to deploy regulatory levers in response.
Historical failures to initiate or shape technology governance
Historical cases where a fear of false positives slowed (plausibly warranted) regulatory attention or intervention:
- Failure to act in spite of growing evidence: a review of nearly 100 cases of environmental issues where the precautionary principle was raised, concluding that fear of false positives has often stalled action even though (i) false positives are rare and (ii) there was enough evidence to suggest that a lack of regulation could lead to harm.[ref 375]
Historical cases of excessive hype leading to (possibly) premature regulatory attention or intervention:
- Premature (and possibly counterproductive) legal focus on technologies that eventually took much longer to develop than anticipated: weather modification technology,[ref 376] deep seabed mining,[ref 377] self-driving cars,[ref 378] virtual and augmented reality,[ref 379] and other technologies charted in the Gartner Hype Cycle reports.[ref 380]
Historical successes for pathways in shaping technology governance
Historical precedents for successful action towards understanding and responding to the risks of emerging technologies, influencing key actors to deploy regulatory levers:
- Relative success in long-range technology forecasting: some types of forecasts for military technology that achieved reasonable accuracy decades out;[ref 381]
- Success in anticipatory governance: history of “prescient actions” in urging early action against risky new technologies, such as Leo Szilard’s warning of the dangers of nuclear weapons[ref 382] and Alexander Fleming’s 1945 warning of the risk of antibiotic resistance;[ref 383]
- Successful early action to set policy for safe innovation in a new area of science:[ref 384] the 1967 Outer Space Treaty, the UK’s Warnock Committee and the Human Fertilisation and Embryology Act 1990, and the Internet Corporation for Assigned Names and Numbers (ICANN);
- Governmental reactions and responses to new risks as they emerge: the 1973 Oil Crisis, the 1929–1933 Great Depression,[ref 385] the 2007–2009 financial crisis,[ref 386] the COVID-19 pandemic;[ref 387]
- How effectively other global risks motivated action in response, and how cultural and intellectual orientations influence perceptions: biotechnology, nuclear weapons, global warming, and asteroid collision;[ref 388]
- The impact of cultural media (film, etc.) on priming policymakers to risks:[ref 389] the role of The Day After in motivating Cold War efforts towards nuclear arms control,[ref 390] of the movies Deep Impact and Armageddon in shaping perceptions of the importance of asteroid defense,[ref 391] of the novel Ghost Fleet in shaping Pentagon perceptions of the importance of emerging technologies to war,[ref 392] of Contagion in priming early UK policy responses to COVID-19,[ref 393] and of Mission: Impossible – Dead Reckoning Part One in deepening President Biden’s concerns over AI prior to signing a landmark 2023 Executive Order;[ref 394]
- The impact of different analogies or metaphors in framing technology policy:[ref 395] for example, the US military’s emphasis on framing the internet as “cyberspace” (i.e., just another “domain” of conflict) led to strong consequences institutionally (supporting the creation of the US Cyber Command) as well as for how international law has subsequently been applied to cyber operations;[ref 396]
- The role of “epistemic communities” of experts in advocating for international regulation or agreements,[ref 397] specifically their role in facilitating nonproliferation treaties and arms control agreements for nuclear weapons[ref 398] and anti-ballistic missile systems,[ref 399] as well as the history of the earlier era of arms control agreements;[ref 400]
- Attempted efforts towards international control of new technology: the early momentum but ultimate failure of the Baruch Plan for world control of nuclear weapons[ref 401] and the failure of world control of aviation in the 1920s;[ref 402]
- Policy responses to past scientific breakthroughs, and the role of geopolitics vs. expert engagement: the 1967 UN Outer Space Treaty, the UK’s Warnock Committee and the Human Fertilisation and Embryology Act 1990, the establishment of the Internet Corporation for Assigned Names and Numbers (ICANN), and the European ban on GMO crops;[ref 403]
- The role of activism and protests in spurring nonproliferation measures and moratoria: their role in spurring nuclear nonproliferation agreements and nuclear test bans;[ref 404] the role of activism (in response to “trigger events”) in achieving a de facto moratorium on genetically modified crops in Europe in the late 1990s;[ref 405] and the likely contribution of protests and public pressure to the abandonment or slowing of various technologies, from geoengineering experiments to nuclear weapons, CFCs, and nuclear power;[ref 406]
- The role of philanthropy and scientists in fostering Track-II diplomacy initiatives: the Pugwash conferences.[ref 407]
Lessons from the historical efficacy of different governance levers
Historical cases that (potentially) provide insights into when different societal (legal, regulatory, and governance) levers have proven effective in shaping technology development and use in desired directions.
Historical failures of technology governance levers
Historical precedents for failed or unsuccessful use of various (domestic and/or international) governance levers for shaping technology:
- Mixed-success use of soft-law governance tools for shaping emerging technologies: National Telecommunications and Information Administration discussions on mobile app transparency, drone privacy, and facial recognition; YourAdChoices; the UNESCO declaration on genetics and bioethics; the Environmental Management System (ISO 14001); Sustainable Forestry Practices by the Sustainable Forestry Initiative and Forest Stewardship Council; and Leadership in Energy and Environmental Design;[ref 408]
- Failed use of soft-law governance tools for shaping emerging technologies: Children’s Online Privacy Protection Rule, Internet Content Rating Association, Platform for Internet Content Selection, Platform for Privacy Preferences, Do Not Track system, and nanotechnology voluntary data call-in by Australia, the US, and the UK;[ref 409]
- Failures of narrowly technology-focused approaches to safety engineering: the design of safe cars, and the design and calibration of pulse oximeters during the COVID pandemic, both of which were mismatched to, and therefore led to dangerous outcomes for, female drivers and darker-skinned patients, respectively, highlighting the importance of incorporating human, psychological, and other disciplinary perspectives;[ref 410]
- Failures of information control mechanisms at preventing proliferation: selling of nuclear secrets by A.Q. Khan network,[ref 411] limited efficacy of Cold War nuclear secrecy regimes at meaningfully constraining proliferation of nuclear weapons,[ref 412] track record of major leaks and hacks of digital information, 2005–present;[ref 413]
- Failure to transfer (technological) safety techniques, even to allies: in the late 2000s, the US sought to provide security assistance to help safeguard Pakistan’s nuclear arsenal but was unable to transfer permissive action link (PAL) technologies because of domestic legal barriers that forbade export to states that were not party to the Nuclear Non-Proliferation Treaty (NPT);[ref 414]
- Degradation of previously established export control regimes: Cold War-era US high-performance computing export controls struggled to be updated quickly enough to keep pace with hardware advancements.[ref 415] The US initially treated cryptography as a weapon under export control laws, meaning that encryption systems could not be exported for commercial purposes even to close allies and trading partners; however, by the late 1990s, several influences (including the rise of open-source software and European indignation at US spying on their communications) led to new regulations that allowed cryptography to be exported with minimal government interference;[ref 416]
- “Missed opportunities” for early action against anticipated risks: the mid-2000s effort to put “killer robots” on the humanitarian disarmament agenda, which failed because these were seen as “too speculative”;[ref 417]
- Mixed success of scientific and industry self-regulation: the Asilomar Conference, the Second International Conference on Synthetic Biology, and 2004–2007 failed efforts to develop guidelines for nanoparticles;[ref 418]
- Sustained failure to establish treaty regimes: various examples, including the international community spending nearly 20 years, from 2004 onward, negotiating a new treaty for Biodiversity Beyond National Jurisdiction;[ref 419]
- Unproductive locking-in of insufficient, “empty” institutions, “face-saving” institutions, or gridlocked mechanisms: history of states creating suboptimal, ill-designed institutions—such as the United Nations Forum on Forests, the Copenhagen Accord on Climate Change, the UN Commission on Sustainable Development, and the 1980 UN Convention on Certain Conventional Weapons—with mandates that may deprive them of much capacity for policy formulation or implementation;[ref 420]
- Drawn-out contestation of hierarchical and unequal global technology governance regimes: the Nuclear Non-Proliferation Treaty regime has seen cycles of contestation and challenge by other states;[ref 421]
- Failures of non-inclusive club governance approaches to nonproliferation: the Nuclear Security Summits (NSS) (2012, 2014, 2016) centered on high-level debates over the stocktaking and securing of nuclear materials. These events saw a constrained list of invited states; as a result, the NSS process was derailed because procedural questions over who was invited or excluded came to dominate discussions (especially at the 2016 Vienna summit), politicizing what had been a technical topic and hampering the extension and take-up of follow-on initiatives by other states.[ref 422]
Historical successes of technology governance levers
Historical precedents for successful use of various governance levers at shaping technology:
- Effective scientific secrecy around early development of powerful new technologies: early development of the atomic bomb[ref 423] and early computers (Colossus and ENIAC).[ref 424]
- Successes in the oversight of various safety-critical technologies: track record of “High Reliability Organisations”[ref 425] in addressing emerging risks after initial incidents to achieve very low rates of errors, such as in air traffic control systems, naval aircraft carrier operations,[ref 426] the aerospace sector, construction, and oil refineries;[ref 427]
- Successful development of “defense in depth”[ref 428] interventions to lower the risks of accident in specific industries: safe operation of nuclear reactors, chemical plants, aviation, space vehicles, cybersecurity and information security, software development, laboratories studying dangerous pathogens, improvised explosive devices, homeland security, hospital security, port security, physical security in general, control system safety in general, mining safety, oil rig safety, surgical safety, fire management, and health care delivery,[ref 429] and lessons from defense-in-depth frameworks developed in cybersecurity for frontier AI risks;[ref 430]
- Successful safety “races to the top” in selected industries: improvements in aircraft safety in the aviation sector;[ref 431]
- Successful use of risk assessment techniques in safety-critical industries: examination of popular risk identification techniques (scenario analysis, fishbone method, and risk typologies and taxonomies), risk analysis techniques (causal mapping, Delphi technique, cross-impact analysis, bow tie analysis, and system-theoretic process analysis), and risk evaluation techniques (checklists and risk matrices) used in established industries like finance, aviation, nuclear, and biolabs, and how these might be applied in advanced AI companies;[ref 432]
- Susceptibility of different types of digital technologies to (global) regulation: relative successes and failures of global regulation of different digital technologies that are (1) centralized and clearly material (e.g., submarine cables), (2) decentralized and clearly material (e.g., smart speakers), (3) centralized and seemingly immaterial (e.g., search engines), and (4) decentralized and seemingly immaterial (e.g., Bitcoin protocol);[ref 433]
- Use of confidence-building measures to stabilize relations and expectations: the 1972 Incidents at Sea Agreement[ref 434] and the development of Maritime Prize Law from the 12th to the 19th century;[ref 435]
- Successful transfer of developed safety techniques, even to adversaries: the US “leaked” PAL locks on nuclear weapons to the Soviet Union;[ref 436]
- Effective nonproliferation regimes: for nuclear weapons, a mix of norms, treaties, US “strategies of inhibition,”[ref 437] supply-side export controls,[ref 438] and domestic political factors[ref 439] has produced an imperfect but remarkably robust track record of nonproliferation.[ref 440] Indeed, IAEA databases indicate that 74 states have historically decided to build or use nuclear reactors. Of these, 69 have at some time been considered potentially able to pursue nuclear weapons; 10 went nuclear, 7 ran but abandoned a program, and for 14–23 there is evidence of a considered decision not to use their infrastructure to pursue nuclear weapons;[ref 441]
- General design lessons from existing treaty regimes: drawing insights from the design and efficacy of a range of treaties to derive design lessons for a global regulatory system dedicated to the regulation of safety concerns from high-risk AI. The treaties examined include the Single Convention on Narcotic Drugs (SCND), the Vienna Convention on Psychotropic Substances (VCPS), the Convention Against Illicit Traffic in Narcotic Drugs and Psychotropic Substances (CAIT), the Montreal Protocol on Substances that Deplete the Ozone Layer, the Cartagena Protocol on Biosafety to the Convention on Biological Diversity, the Biological Weapons Convention (BWC), the Treaty on the Non-Proliferation of Nuclear Weapons (NPT), the Convention on Nuclear Safety, the Convention on International Trade in Endangered Species (CITES), the Basel Convention on the Control of Transboundary Movements of Hazardous Wastes and their Disposal, and the Bern Convention on the Conservation of European Wildlife and Natural Habitats;[ref 442]
- Effective use of international access and benefit distribution mechanisms in conjunction with proliferation control measures: the efficacy of the IAEA’s “dual mandate” to enable the transfer of peaceful nuclear technology whilst seeking to curtail its use for military purposes;[ref 443]
- Effective monitoring and verification (M&V) mechanisms in arms control regimes: M&V implementation across three types of nuclear arms control treaties: nonproliferation treaties, US-USSR/Russia arms limitation treaties, and nuclear test bans;[ref 444]
- Scientific community (temporary) moratoria on research: the Asilomar Conference[ref 445] and the H5N1 gain-of-function debate;[ref 446]
- Instances where treaty commitments, institutional infighting, or bureaucratic politics contributed to technological restraint: a range of cases resulting in cancellation of weapon systems development, including nuclear-ramjet powered cruise missiles, “continent killer” nuclear warheads, nuclear-powered aircraft, “death dust” radiological weapons, various types of anti-ballistic-missile defense, and many others;[ref 447]
- International institutional design lessons from successes and failures in other areas: global governance successes and failures in the regime complexes for environment, security, and/or trade;[ref 448]
- Successful use of soft-law governance tools for shaping emerging technologies: Internet Corporation for Assigned Names and Numbers, Motion Picture Association of America, Federal Trade Commission consent decrees, Federal Communications Commission’s power over broadcaster licensing, Entertainment Software Rating Board, NIST Framework for Improving Critical Infrastructure Cybersecurity, Asilomar rDNA Guidelines, International Gene Synthesis Consortium, International Society for Stem Cell Research Guidelines, BASF Code of Conduct, and the Environmental Defense Fund–DuPont Nano Risk Framework;[ref 449]
- Successful use of participatory mechanisms in improving risk assessment: use of scenario methods and risk assessments in climate impact research.[ref 450]
4.4. Lessons derived from ethics and political theory
Mapping the space of principles or criteria for “ideal AI governance”:[ref 451]
- Mapping broad normative desiderata for good governance regimes for advanced AI,[ref 452] either in terms of outputs or in terms of process;[ref 453]
- Understanding how to weigh different good outcomes post-TAI-deployment;[ref 454]
- Understanding the different functional goals and tradeoffs in good international institutional design.[ref 455]
II. Option-identifying work: Mapping actors and affordances
Strategic clarity requires an understanding not just of the features of the advanced AI governance problem, but also of the options in response.
This entails mapping the range of possible levers that could be used in response to this problem. Critically, this is not only a matter of speculating about which governance tools we may want to put in place for future advanced AI systems mid-transition (after they have arrived); there may also be actions we could take in the “pre-emergence” stage to adequately prepare ourselves.[ref 456]
Within the field, there has been extensive work on options and areas of intervention. Yet there is no clear, integrated map of the advanced AI governance landscape and its gaps. Sam Clarke proposes several ways of carving up the landscape, such as by type of intervention, by geographic hub, or by “Theory of Victory.”[ref 457] Extending this, one might segment the advanced AI governance solution space according to work that aims to identify and understand, in turn:[ref 458]
- Key actors that will likely (be in a strong position to) shape advanced AI;
- Levers of influence (by which these actors might shape advanced AI);
- Pathways towards influencing these actors to deploy their levers well.[ref 459]
1. Potential key actors shaping advanced AI
In other words, whose decisions might especially affect the development and deployment of advanced AI, directly or indirectly, such that these decisions should be shaped to be as beneficial as possible?
Key actors can be defined as “actors whose key decisions will have significant impact on shaping the outcomes from advanced AI, either directly (first-order), or by strongly affecting such decisions made by other actors (second-order).”[ref 460]
Key decisions can be further defined as “a choice or series of choices by a key actor to use its levers of governance, in ways that directly affect beneficial advanced AI outcomes, and which are hard to reverse.”[ref 461]
Some work in this space explores the relative importance of (the decisions of) different types of key actors:
- The roles of states vs. firms vs. AI researchers in shaping AI policy;[ref 462]
- Role of “epistemic communities” of scientific experts,[ref 463] especially members of the AI research community;[ref 464]
- The role of different potentially relevant stakeholders for responsible AI systems across their development chain, from individual stakeholders to organizational stakeholders to national/international stakeholders;[ref 465]
- The relative role of expert advice vs. public pressure in shaping policymakers’ approach to AI;[ref 466]
- Role of different actors in and around the corporation in shaping lab policy,[ref 467] including actors within the lab (e.g., senior management, shareholders, AI lab employees, and employee activists)[ref 468] and actors outside the lab (e.g., corporate partners and competitors, industry consortia, nonprofit organizations, the public, the media, and governments).[ref 469]
Other work focuses more specifically on mapping particular key actors whose decisions may be particularly important in shaping advanced AI outcomes, depending on one’s view of strategic parameters.
The following list should be taken more as a “landscape” review than a literature review, since coverage of different actors differs amongst papers. Moreover, while the list aims to be relatively inclusive of actors, the (absolute and relative) importance of each of these actors clearly differs hugely between worldviews and approaches.
1.1. AI developer (lab & tech company) actors
Leading AI firms pursuing AGI:
- OpenAI,
- DeepMind,
- Anthropic,
- Aleph Alpha,
- Adept,
- Cohere,
- Inflection,
- Keen,
- xAI.[ref 470]
Chinese labs and institutions researching “general AI”:
- Baidu Research,
- Alibaba DAMO Academy,
- Tencent AI Lab,
- Huawei,
- JD Research Institute,
- Beijing Institute for General Artificial Intelligence,
- Beijing Academy of Artificial Intelligence, etc.[ref 471]
Large tech companies[ref 472] that may take an increasingly significant role in AGI research:
- Microsoft,
- Google,
- Facebook,
- Amazon.
Future frontier labs, currently not known but to be established/achieve prominence (e.g., “Magma”[ref 473]).
1.2. AI services & compute hardware supply chains
AI services supply chain actors:[ref 474]
- Cloud computing providers:[ref 475]
  - Globally: Amazon Web Services (market share 32%), Microsoft Azure (21%), Google Cloud (8%), and IBM;
  - Chinese market: Alibaba, Huawei, and Tencent.
Hardware supply chain industry actors:[ref 476]
- Providers of optical components to photolithography machine manufacturers:
  - Carl Zeiss AG [Germany], a key ASML supplier of optical lenses;[ref 477]
- Producers of extreme ultraviolet (EUV) photolithography machines:
  - ASML [The Netherlands].[ref 478]
- Photoresist processing providers:
  - Asahi Kasei and Tokyo Ohka Kogyo Co. [Japan].[ref 479]
- Advanced chip manufacturing:
  - TSMC [Taiwan];
  - Intel [US];
  - Samsung [South Korea].
- Semiconductor intellectual property owners and chip designers:
  - Arm [UK];
  - Graphcore [UK].
- DRAM integrated circuit chips:
  - Samsung (market share 44%) [South Korea];
  - SK hynix (27%) [South Korea];
  - Micron (22%) [US].
- GPU providers:
  - Intel (market share 62%) [US];
  - AMD (18%) [US];
  - NVIDIA (20%) [US].
1.3. AI industry and academic actors
Industry bodies:
- Partnership on AI;
- Frontier Model Forum;[ref 480]
- MLCommons;[ref 481]
- IEEE (Institute of Electrical and Electronics Engineers) and its standards association, IEEE-SA;
- ISO (and IEC).
Standard-setting organizations:
- US standard-setting organizations (NIST);
- European Standards Organizations (ESOs), tasked with setting standards for the EU AI Act: the European Committee for Standardisation (CEN), European Committee for Electrotechnical Standardisation (CENELEC), and European Telecommunications Standards Institute (ETSI);[ref 482]
- VDE (influential German standardization organization).[ref 483]
Software tools & community service providers:
- arXiv;
- GitHub;
- Colab;
- Hugging Face.
Academic communities:
- Scientific ML community;[ref 484]
- AI conferences, such as NeurIPS, AAAI/ACM, ICLR, IJCAI-ECAI, AIES, and FAccT;
- AI ethics community and various subcommunities;
- Numerous national-level academic or research institutes.
Other active tech community actors:
- Open-source machine learning software community;[ref 485]
- “Open”/diffusion-encouraging[ref 486] AI community (e.g., Stability.ai, Eleuther.ai);[ref 487]
- Hacker communities;
- Cybersecurity and information security expert communities.[ref 488]
1.4. State and governmental actors
Various states, and their constituent government agencies or bodies, that are, plausibly will be, or could potentially be moved into powerful positions to shape the development of advanced AI.
The United States
Key actors in the US:[ref 489]
- Executive Branch actors;[ref 490]
- Legislative Branch;[ref 491]
- Judiciary;[ref 492]
- Federal agencies;[ref 493]
- Intelligence community;[ref 494]
- Independent federal agencies;[ref 495]
- Relevant state and local governments, such as the State of California (potentially significant extraterritorial regulatory effects)[ref 496] and the States of Illinois and Texas (among the first states to place restrictions on biometrics).
China
Key actors in China:[ref 497]
- 20th Central Committee of the Chinese Communist Party;
- China’s State Council;
- Bureaucratic actors engaged in AI policy-setting;[ref 498]
- Actors and institutions engaged in track-II diplomacy on AI.[ref 499]
The EU
Key actors in the EU:[ref 500]
- European Commission;
- European Parliament;
- Scientific research initiatives and directorates;[ref 501]
- (Proposed) European Artificial Intelligence Board and notified bodies.[ref 502]
The UK
Key actors in the UK:[ref 503]
- The Cabinet Office;[ref 504]
- Foreign, Commonwealth and Development Office (FCDO);
- Ministry of Defence (MoD);[ref 505]
- Department for Science, Innovation and Technology (DSIT);[ref 506]
- UK Parliament;[ref 507]
- The Digital Regulators Cooperation Forum;
- Advanced Research and Invention Agency (ARIA).
Other states with varying roles
Other states that may play key roles because of their general geopolitical influence, AI-relevant resources (e.g., compute supply chain and significant research talent), or track record as digital norm setters:
- Influential states: India, Russia, and Brazil;
- Significant AI research talent: France and Canada;
- Hosting nodes in the global hardware supply chain: US (NVIDIA), Taiwan (TSMC), South Korea (Samsung), the Netherlands (ASML), Japan (photoresist processing), UK (Arm), and Germany (Carl Zeiss AG);
- Potential (regional) neutral hubs: Singapore[ref 508] and Switzerland;[ref 509]
- Global South coalitions: states from the Global South[ref 510] and coalitions of Small Island Developing States (SIDS);[ref 511]
- Track record of (digital) norm-setters: Estonia and Norway.[ref 512]
1.5. Standard-setting organizations
International standard-setting institutions:[ref 513]
- ISO;
- IEC;
- IEEE;
- CEN/CENELEC;
- VDE (Association for Electrical, Electronic & Information Technologies) and its AI Quality & Testing Hub.[ref 514]
1.6. International organizations
Various United Nations agencies:[ref 515]
- ITU;[ref 516]
- UNESCO;[ref 517]
- Office of the UN Tech Envoy (conducting the process leading to the Global Digital Compact in 2024);
- UN Science, Technology, and Innovation (STI) Forum;
- UN Executive Office of the Secretary-General;
- UN General Assembly;
- UN Security Council (UNSC);
- UN Human Rights Council;[ref 518]
- Office of the High Commissioner for Human Rights;[ref 519]
- UN Chief Executives Board for Coordination;[ref 520]
- Secretary-General’s High-Level Advisory Board on Effective Multilateralism (HLAB);
- Secretary-General’s High-Level Advisory Body on Artificial Intelligence (“AI Advisory Body”).[ref 521]
Other international institutions already engaged on AI in some capacity[ref 522] (in no particular order):
- OECD;[ref 523]
- Global Partnership on AI;
- G7;[ref 524]
- G20;[ref 525]
- Council of Europe (Ad Hoc Committee on Artificial Intelligence (CAHAI));[ref 526]
- NATO;[ref 527]
- AI Partnership for Defense;[ref 528]
- Global Road Traffic Forum;[ref 529]
- International Maritime Organisation;
- EU-US Trade and Technology Council (TTC);[ref 530]
- EU-India Trade and Technology Council;
- Multi-stakeholder fora: World Summit on the Information Society (WSIS), Internet Governance Forum (IGF), Global Summit on AI for Good,[ref 531] and World Economic Forum (Centre for Trustworthy Technology).
Other international institutions not yet engaged on AI:
- International & regional courts: International Criminal Court (ICC), International Court of Justice (ICJ), and European Court of Justice.
1.7. Public, civil society, and media actors
Civil society organizations:[ref 532]
- Gatekeepers engaged in AI-specific norm-setting and advocacy: Human Rights Watch, Campaign to Stop Killer Robots,[ref 533] and AlgorithmWatch;[ref 534]
- Civilian open-source intelligence (OSINT) actors engaged in monitoring state violations of human rights and international humanitarian law:[ref 535] Bellingcat, NYT Visual Investigation Unit, CNS (Arms Control Wonk), Middlebury Institute, Forensic Architecture, BBC Africa Eye, Syrian Archive, etc.;
- Military AI mediation: Centre for Humanitarian Dialogue and Geneva Centre for Security Policy.[ref 536]
Media actors:
- Mass media;[ref 537]
- Tech media;
- “Para-scientific media.”[ref 538]
Cultural actors:
- Film industry (Hollywood, etc.);
- Influential and widely read authors.[ref 539]
2. Levers of governance (for each key actor)
That is, how might each key actor shape the development of advanced AI?
A “lever (of governance)” can be defined as “a tool or intervention that can be used by key actors to shape or affect (1) the primary outcome of advanced AI development; (2) key strategic parameters of advanced AI governance; (3) other key actors’ choices or key decisions.”[ref 540]
Research in this field includes analysis of different types of tools (key levers or interventions) available to different actors to shape advanced AI development and use.[ref 541]
2.1. AI developer levers
Developer (intra-lab)-level levers:[ref 542]
- Levers for adequate AI model evaluation and technical safety testing:[ref 543] decoding; limiting systems; adversarial training; throughout-lifecycle test, evaluation, validation, and verification (TEVV) policies;[ref 544] internal model safety evaluations;[ref 545] and risk assessments;[ref 546]
- Levers for safe risk management in AI development process: Responsible Scaling Policies (RSPs),[ref 547] the Three Lines of Defense (3LoD) model,[ref 548] organizational and operational criteria for adequately safe development,[ref 549] and “defense in depth” risk management procedures;[ref 550]
- Levers to ensure cautious overall decision-making: ethics and oversight boards;[ref 551] corporate governance policies that support and enable cautious decision-making,[ref 552] such as establishing an internal audit team;[ref 553] and/or incorporating as a Public Benefit Corporation to allow the board of directors to balance stockholders’ pecuniary interests against the corporation’s social mission;
- Levers to ensure operational security: information security best practices[ref 554] and structured access mechanisms[ref 555] at the level of cloud-based AI service interfaces;
- Policies for responsibly sharing safety-relevant information: information-providing policies, such as model cards, to increase legibility and compliance;[ref 556]
- Policies to ensure the organization can pace and/or pause capability research:[ref 557] board authority to pause research and channels to invite external AI scientists to review the alignment of systems.[ref 558]
Developer external (unilateral) levers:
- Use of contracts and licensing to attempt to limit uses of AI and its outputs (e.g., the Responsible AI Licenses (RAIL) initiative);[ref 559]
- Voluntary safety commitments;[ref 560]
- Norm entrepreneurship (i.e., lobbying, public statements, or initiatives that signal public concern and/or dissatisfaction with an existing state of affairs, potentially alerting others to the existence of a shared complaint and facilitating potential “norm cascades” towards new expectations or collective solutions).[ref 561]
2.2. AI industry & academia levers
Industry-level (coordinated inter-lab) levers:
- Self-regulation;[ref 562]
- Codes of conduct;
- AI ethics principles;[ref 563]
- Professional norms;[ref 564]
- AI ethics advisory committees;[ref 565]
- Incident databases;[ref 566]
- Institutional, software, and hardware mechanisms for enabling developers to make verifiable claims;[ref 567]
- Bug bounties;[ref 568]
- Evaluation-based coordinated pauses;[ref 569]
- Other inter-lab cooperation mechanisms:[ref 570]
  - Assist Clause;[ref 571]
  - Windfall Clause;[ref 572]
  - Mutual monitoring agreements (red-teaming, incident-sharing, compute accounting, and seconding engineers);
  - Communications and heads-up;
  - Third-party auditing;
  - Bias and safety bounties;
  - Secure compute enclaves;
  - Standard benchmarks & audit trails;
  - Publication norms.[ref 573]
Third-party industry actors levers:
- Publication reviews;[ref 574]
- Certification schemes;[ref 575]
- Auditing schemas.[ref 576]
Scientific community levers:
- Institutional Review Boards (IRBs);[ref 577]
- Conference or journal pre-publication impact assessment requirements;[ref 578] academic conference practices;[ref 579]
- Publication and model sharing and release norms;[ref 580]
- Benchmarks;[ref 581]
- Differential technological development (innovation prizes);[ref 582]
- (Temporary) moratoria.[ref 583]
2.3. Compute supply chain industry levers
Global compute industry-level levers:[ref 584]
- Stock-and-flow accounting;
- Operating licenses;
- Supply chain chokepoints;[ref 585]
- Inspections;
- Passive architectural on-chip constraints (e.g., performance caps);
- Active architectural on-chip constraints (e.g., shutdown mechanisms).
2.4. Governmental levers
We can distinguish between general governmental levers and the specific levers available to particular key states.
General governmental levers[ref 586]
Legislatures’ levers:[ref 587]
- Create new AI-specific regimes, such as:
  - Horizontal risk regulation;[ref 588]
  - Industry-specific risk regulatory regimes;
  - Permitting, licensing, and market gatekeeping regimes;[ref 589]
  - Bans or moratoria;
  - Know-Your-Customer schemes.[ref 590]
- Amend laws to extend or apply existing regulations to AI:[ref 591]
  - Domain/industry-specific risk regulations;
  - Competition/antitrust law,[ref 592] including doctrines around merger control, abuse of dominance, cartels, and collusion; agreements on hardware security; and state aid;
  - Liability law;[ref 593]
  - Insurance law;[ref 594]
  - Contract law;[ref 595]
  - IP law;[ref 596]
  - Copyright law (amongst others through its impact on data scraping practices);[ref 597]
  - Criminal law;[ref 598]
  - Privacy and data protection law (amongst others through its impact on data scraping practices);
  - Public procurement law and procurement processes.[ref 599]
Executive levers:
- Executive orders;
- Foreign investment restrictions;
- AI R&D funding strategies;[ref 600]
- Nationalization of firms;
- Certification schemes;
- Various tools of “differential technology development”:[ref 601] policies for preferential advancement of safer AI architectures (funding and direct development programs, government prizes, advanced market commitments, regulatory requirements, and tax incentives)[ref 602] and policies for slowing down research lines towards dangerous AI architectures (moratoria, bans, defunding, divestment, and/or “stage-gating” review processes);[ref 603]
- Foreign policy decisions, such as initiating multilateral treaty negotiations.
Judiciaries’ levers:
- Judicial decisions handed down on cases involving AI that extend or apply existing doctrines to AI, shaping economic incentives and setting precedent for regulatory treatment of advanced AI, such as the US Supreme Court's consideration of Gonzalez v. Google, which raised the question of whether algorithmic recommendations receive full Section 230 protections;[ref 604]
- Judicial review, especially of drastic executive actions taken in response to AI risk scenarios;[ref 605]
- Judicial policymaking, through discretion in evaluating proportionality or balancing tests.[ref 606]
Expert agencies’ levers:
- A mix of features of other actors, from setting policies to adjudicating disputes to enforcing decisions;[ref 607]
- Create or propose soft law.[ref 608]
Ancillary institutions:
- Improved monitoring infrastructures;[ref 609]
- Provide services in terms of training, insurance, procurement, identification, archiving, etc.[ref 610]
Foreign Ministries/State Department:
- Set activities and issue agendas in global AI governance institutions;
- Bypass or challenge existing institutions by engaging in “competitive regime creation,”[ref 611] “forum shopping,”[ref 612] or the strategic creation of treaty conflicts;[ref 613]
- Initiate multilateral treaty negotiations;
- Advise policymakers about the existence and meaning of international law and the obligations it imposes;[ref 614]
- Conduct state behavior around AI issues (in terms of state policy, and through discussion of AI issues in national legislation, diplomatic correspondence, etc.) in such a way as to contribute to the establishment of binding customary international law (CIL).[ref 615]
Specific key governments levers
Levers available to specific key governments:
US-specific levers:[ref 616]
- AI-specific regulations and policy instruments, such as the Blueprint for an AI Bill of Rights;[ref 617] the proposed Algorithmic Accountability Act;[ref 618] the 2023 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence;[ref 619] and various currently pending federal legislative proposals for regulating generative and/or frontier AI;[ref 620]
- General levers,[ref 621] such as federal R&D funding, foreign investment restrictions, export controls,[ref 622] visa vetting, expanded visa pathways, secrecy orders, voluntary screening procedures, use of the Defense Production Act,[ref 623] antitrust enforcement, the “Born Secret” Doctrine, nationalization of companies or compute hardware, various Presidential Emergency powers,[ref 624] etc.
EU-specific levers:
- AI-specific regulations, including:
  - The AI Act, which will have direct regulatory effects[ref 625] but may also exert extraterritorial impact as part of a “Brussels Effect”;[ref 626]
  - Standard-setting by European Standards Organizations (ESOs);[ref 627]
  - AI Liability Directive.[ref 628]
China-specific levers:
- AI-specific regulations;[ref 629]
- Standards;[ref 630]
- Activities in global AI governance institutions.[ref 631]
UK-specific levers:[ref 632]
- National Security and Investment Act 2021;
- Competition law: Competition Act 1998;
- Export control legislation;
- Secrecy orders.
2.5. Public, civil society & media actor levers
Civil Society/activist movement levers:[ref 633]
- Lab-level (internal) levers:
  - Shareholder activism, voting out CEOs;
  - Unions and intra-organizational advocacy, strikes, and walkouts;[ref 634]
  - Capacity-building of employee activism via recruitment, political education, training, and legal advice.
- Lab-level (external) levers:
  - Stigmatization of irresponsible practices;[ref 635]
  - Investigative journalism, awareness-raising of scandals and incidents, hacking and leaks, and whistleblowing;
  - Impact litigation[ref 636] and class-action lawsuits;[ref 637]
  - Public protest[ref 638] and direct action (e.g., sit-ins).
- Industry-level levers:
  - Norm advocacy and lobbying;
  - Open letters and statements;
  - Mapping and highlighting the (compliance) performance of companies; establishing metrics, indexes, and prizes; and certification schemes.[ref 639]
- Public-focused levers:
  - Media content creation;[ref 640]
  - Boycott and divestment;
  - Shaming of state noncompliance with international law;[ref 641]
  - Emotional contagion: shaping and disseminating public emotional dynamics or responses to a crisis.[ref 642]
- Creating alternatives:
  - Public interest technology research;
  - Creating alternative (types of) institutions[ref 643] and new AI labs.
- State-focused levers:
  - Monitor compliance with international law.[ref 644]
2.6. International organizations and regime levers
International standards bodies’ levers:
- Set technical safety and reliability standards;[ref 645]
- Undertake “para-regulation,” setting pathways for future regulation not by imposing substantive rules but rather by establishing foundational concepts or terms.[ref 646]
International regime levers:[ref 647]
- Setting or shaping norms and expectations:
- Setting, affirming, and/or clarifying states’ obligations under existing international law principles;
- Setting fora and/or agendas for the negotiation of new treaties or regimes in various formats, such as:
- Broad framework conventions;[ref 648]
- Nonproliferation and arms control agreements;[ref 649]
- Export control regimes.[ref 650]
- Creating (technical) benchmarks and focal points for decision-making by both states and non-state actors;
- Organizing training and workshops with national officials.
- Coordinating behavior; reducing uncertainty, improving trust:
- Confidence-building measures;[ref 652]
- Review conferences (e.g., BWC);
- Conferences of parties (e.g., UNFCCC);
- Establishing information and benefit-sharing mechanisms.
- Creating common knowledge or shared perceptions of problems; establishing “fire alarms”:
- Intergovernmental scientific bodies (e.g., Intergovernmental Panel on Climate Change (IPCC) and Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES));
- International warning systems (e.g., WHO’s “public health emergency of international concern” mechanism).
- Adjudicating and arbitrating state disagreements over application of policies, resolving tensions or crises for regimes:
- Arbitral bodies (e.g., WTO Appellate Body);
- Adjudicatory tribunals (e.g., ICJ);
- Treaty bodies (e.g., Human Rights Committee);
- Other dispute resolution mechanisms (e.g., BWC or Resolution 1540 allowing complaints to be lodged at the UNSC).
- Establishing material constraints:
- Supply-side material proliferation controls (e.g., stock-and-flow accounting and trade barriers);
- Fair and equitable treatment standards in international investment law.
- Monitoring state compliance:
- Inspection regimes;
- Safeguards;
- National contributions;
- Network of national contact points.
- Sanctioning noncompliance:
- Inducing direct costs through sanctions;
- Inducing reputational costs,[ref 653] in particular through shaming.[ref 654]
2.7. Future, new types of institutions and levers
Novel governance institutions and innovations:
- “Regulatory markets” and private regulatory authorities;[ref 655]
- New monitoring institutions and information markets;[ref 656]
- Quadratic voting and radical markets;[ref 657]
- Blockchain smart contracts.[ref 658]
3. Pathways to influence (on each key actor)
That is, how might concerned stakeholders ensure that key actors use their levers to shape advanced AI development in appropriate ways?
In this context, a “pathway (to influence)” can be defined as “a tool or intervention by which other actors (that may not themselves be key actors) can affect, persuade, induce, incentivize, or require key actors to make certain key decisions around the governance of AI. This can include interventions that ensure that certain levers of control are (not) used, or used in particular ways.”[ref 659]
This includes research on the different pathways by which the use of the levers above might be enabled, advocated for, and implemented (i.e., the tools available to affect the decisions of key actors).
This can draw on mappings and taxonomies such as “A Map to Navigate AI Governance”[ref 660] and “The Longtermist AI Governance Landscape.”[ref 661]
3.1. Pathways to directly shaping advanced AI systems’ actions through law
Directly shaping advanced AI actions through law (i.e., legal systems and norms as an anchor or lodestar for technical alignment approaches):
- “Law-following AI”;[ref 662]
- Encode “incomplete contracting” as a framework for AI alignment;[ref 663]
- Negative human rights as technical safety constraint for minimal alignment;[ref 664]
- Human rights norms as a benchmark for maximal alignment;[ref 665]
- Encode fiduciary duties towards users into AI systems;[ref 666]
- Mandatory on-chip controls (monitoring and remote shutdown);
- Legal informatics approach to alignment.[ref 667]
3.2. Pathways to shaping governmental decisions
Shaping governmental decisions around AI levers at the level of:
- Legislatures:
- Advocacy within the legislative AI policymaking process.[ref 668]
- Executives:
- Serve as high-bandwidth policy advisor;[ref 669]
- Provide actionable technical information;[ref 670]
- Shape, provide, or spread narratives,[ref 671] ideas, “memes,”[ref 672] framings, or (legal) analogies[ref 673] for AI governance.
- Clarify or emphasize established principles within national law (e.g., precautionary principle and cost-benefit analysis[ref 674]) and/or state obligations under international law (e.g., customary international law,[ref 675] IHRL,[ref 676] etc.).
3.3. Pathways to shaping court decisions
Shaping court decisions around AI systems that set critical precedent for the application of AI policy to advanced AI:
- Advance legal scholarship with new arguments, interpretations, or analogies and metaphors for AI technology;[ref 677]
- Clarifying the “ordinary meaning” of key legal terms around AI;[ref 678]
- Judge seminars and training courses;[ref 679]
- Online information repositories.[ref 680]
3.4. Pathways to shaping AI developers’ decisions
Shaping individual lab decisions around AI governance:
- Governmental regulations (e.g., industry risk, liability, criminal, etc.);
- Institutional design choices: establish rules in the charter that enable the board of directors to make more cautious or pro-social choices,[ref 681] and establish an internal AI ethics board[ref 682] or internal audit functions;[ref 683]
- Campaigns or resources to educate researchers about AI risk, making AI safety research more concrete and legible, and/or creating common knowledge about researchers’ perceptions of and attitudes towards these risks;[ref 684]
- Employee activism and pressure,[ref 685] and documented communications of risks by employees (which make companies more risk averse because they are more likely to be held liable in court);[ref 686]
- Human rights norms generally applicable to business activities under the Ruggie Principles,[ref 687] which amongst others can directly influence decisions by tech company oversight bodies;[ref 688]
- Develop and provide clear industry standards and resources for their implementation, such as AI risk management frameworks.[ref 689]
Shaping industry-wide decisions around AI governance:
- Governmental regulations (as above);
- Ensure competition law frameworks enable cooperation on safety.[ref 690]
3.5. Pathways to shaping AI research community decisions
Shaping AI research community decisions around AI governance:
- Develop and disseminate clear guidelines and toolsets to facilitate responsible practices, such as:
- Frameworks for pre-publication impact assessment of AI research;[ref 691]
- “Model cards” for the transparent reporting of benchmarked evaluations of a model’s performance across conditions and for different groups;[ref 692]
- General risk management frameworks for evaluating and anticipating AI risks.[ref 693]
- Framing and stigmatization around decisions or practices;[ref 694]
- Participatory technology assessment processes.[ref 695]
Shaping civil society decisions around AI governance:
- Work with “gatekeeper” organizations to put issues on the advocacy agenda.[ref 696]
3.6. Pathways to shaping international institutions’ decisions
Shaping international institutional decisions around AI governance:
- Clarify global administrative law obligations;[ref 697]
- Influence domestic policy processes in order to indirectly shape transnational legal processes;[ref 698]
- Scientific expert bodies’ role in informing multilateral treaty-making by preparing evidence for treaty-making bodies, scientifically advising these bodies, and directly exchanging with them at intergovernmental body sessions or dialogical events.[ref 699]
Shaping standards bodies’ decisions around AI governance:
- Technical experts’ direct participation in standards development;[ref 700]
- Advancing standardization of advanced AI-relevant safety best practices.[ref 701]
3.7. Other pathways to shape various actors’ decisions
Shaping various actors’ decisions around AI governance:
- Work to shape broad narratives around advanced AI, such as through compelling narratives or depictions of good outcomes;[ref 702]
- Work to shape analogies or metaphors used by the public, policymakers, or courts in thinking about (advanced) AI;[ref 703]
- Pursue specific career paths with key actors to contribute to good policymaking.[ref 704]
III. Prescriptive work: Identifying priorities and proposing policies
Finally, a third category of work aims to go beyond either analyzing the problem of AI governance (Part I) or analytically surveying potential elements or options for governance solutions (Part II). This category is instead prescriptive, in that it aims to directly propose or advocate for specific policies or actions by key actors. This includes work focused on:
- Articulating broad theories of change to identify priorities for AI governance (given a certain view of the problem and of the options available);
- Articulating broad heuristics for crafting good AI regulation;
- Putting forward policy proposals as well as assets that aim to help in their implementation.
1. Prioritization: Articulating theories of change
Achieving an understanding of the AI governance problem and of potential responses to it is valuable. Yet this alone is not enough to deliver strategic clarity about which actors should be approached or which levers should be used, and in what ways. For that, it is necessary to develop more systematic accounts of different (currently held or possible) theories of change or impact.
The idea of exploring and comparing such theories of action is not new. There have been various accounts that aim to articulate the linkages between near-term actions and longer-term goals. Some of these have focused primarily on theories of change (or “impact”) from the perspective of technical AI alignment.[ref 705] Others have articulated more specific theories of impact for the advanced AI governance space.[ref 706] These include:
- Dafoe’s Asset-Decision model, which focuses on the direction of research activities to help (1) create assets which can eventually (2) inform impactful decisions;[ref 707]
- Leung’s model for impactful AI strategy research that can shape key decisions by (1) those developing and deploying AI and (2) those actors shaping the environments in which it is developed and deployed (i.e., research lab environment, legislative environment, and market environment);[ref 708]
- Garfinkel’s “AI Strategy: Pathways for Impact,”[ref 709] which highlights three distinct pathways for positively influencing the development of advanced AI: (1) become a decision-maker (or close enough to influence one), (2) spread good memes that are picked up by decision-makers, and (3) think of good memes to spread and make them credible;
- Baum’s framework for “affecting the future of AI governance,” which distinguishes several avenues by which AI policy could shape the long term:[ref 710] (1) improve current AI governance, (2) support AI governance communities, (3) advance research on future AI governance, (4) advance CS design of AI safety and ethics to create solutions, and (5) improve underlying governance conditions.
In addition, some have articulated specific scenarios for what successful policy action on advanced AI might look like,[ref 711] especially in the relatively near-term future (“AI strategy nearcasting”).[ref 712] However, much further work is needed.
2. General heuristics for crafting advanced AI policy
General heuristics for making policies relevant or actionable to advanced AI.
2.1. General heuristics for good regulation
Heuristics for crafting good AI regulation:
- Utilizing and articulating suitable terminology for drafting and scoping AI regulations, especially risk-focused terms;[ref 713]
- Understanding the implications of different regulatory approaches (ex ante, ex post; risk regulation) for AI regulations;[ref 714]
- Grounding AI policy within an “all-hazards” approach to managing various other global catastrophic risks simultaneously;[ref 715]
- Requirements for an advanced AI regime to avoid “perpetual risk”: exclusivity, benevolence, stability, and successful alignment;[ref 716]
- Establishing monitoring infrastructures to provide governments with actionable information.[ref 717]
2.2. Heuristics for good institutional design
Heuristics for good institutional design:
- General desiderata and tradeoffs for international institutional design in terms of questions of regime centralization or decentralization;[ref 718]
- Procedural heuristics for organizing international negotiation processes: ensure international AI governance fora are inclusive of Global South actors;[ref 719]
- Ideal characteristics of global governance systems for high-risk AI, such as systems that (1) govern dual-use technology; (2) take a risk-based approach; (3) provide safety measures; (4) incorporate technically informed, expert-driven, multi-stakeholder processes that enable rapid iteration; (5) produce effects consistent with the treaty’s intent; and (6) possess enforcement mechanisms.[ref 720]
2.3. Heuristics for future-proofing governance
Heuristics, desiderata, and systems for future-proofing governance regimes and for making existing regulations more adaptive, scalable, or resilient:[ref 721]
- Traditional (treaty) reform or implementation mechanisms:
- The formal treaty amendment process;[ref 722]
- Unilateral state actions (explanatory memoranda and treaty reservations) or multilateral responses (Working Party Resolution) to adapt multilateral treaties;[ref 723]
- The development of lex scripta treaties through the lex posterior of customary international law, spurred by new state behavior.[ref 724]
- Adaptive treaty interpretation methods:
- Evolutionary interpretation of treaties;[ref 725]
- Treaty interpretation under the principle of systemic integration.[ref 726]
- Instrument choices that promote flexibility:
- Use of framework conventions;[ref 727]
- Use of informal governance institutions;[ref 728]
- The subsequent layering of soft law on earlier hard-law regimes;[ref 729]
- Use of uncorrelated governance instruments to enable legal resilience.[ref 730]
- Regime design choices that promote flexibility:
- Scope: include key systems (“general-purpose AI systems,” “highly capable foundation models,” “frontier AI systems,” etc.) within the material scope of the regulation;[ref 731]
- Phrasing: in-text technological neutrality or deliberate ambiguity;[ref 732]
- Flexibility provisions: textual flexibility provisions[ref 733] such as exceptions or flexibility clauses.
- Flexibility approaches beyond the legal regime:
- Pragmatic and informal “emergent flexibility” about the meaning of norms and rules during crises.[ref 734]
3. Policy proposals, assets and products
That is, what are specific proposals for policies to be implemented? How can these proposals serve as products or assets in persuading key actors to act upon them?
In this context, a “(decision-relevant) asset” can be defined as: “resources that can be used by other actors in pursuing pathways to influence key actors with the aim to induce how these key actors make key decisions (e.g., about whether or how to use their levers). This includes new technical research insights, worked-out policy products, networks of direct advocacy, memes, or narratives.”
A “(policy) product” can be defined as “a subclass of assets; specific legible proposals that can be presented to key actors.”
Below are specific proposals for advanced AI-relevant policies. These are presented without comparison or prioritization, and the list is non-exhaustive. Moreover, many proposals combine several ideas and so fall into more than one category.
3.1. Overviews and collections of policies
- Previous collections of older proposals, such as Dewey’s list of “long-term strategies for ending existential risk”[ref 735] as well as Sotala and Yampolskiy’s survey of high-level “responses” to AI risk.[ref 736]
- More recent lists and collections of proposed policies to improve the governance, security, and safety of AI development[ref 737] in domains such as compute security and governance; software export controls; licenses;[ref 738] policies to establish improved standards, system evaluations, and licensing regimes; procurement rules and funding for AI safety;[ref 739] or to establish a multinational AGI consortium to enable oversight of advanced AI, a global compute cap, and affirmative safety evaluations.[ref 740]
3.2. Proposals to regulate AI using existing authorities, laws, or institutions
In particular, drawing on evaluations of the default landscape of regulations applied to AI (see Section I.3.3), and of the levers of governance for particular governments (see Section II.2.4).
Regulate AI using existing laws or policies
- Strengthen or reformulate existing laws and policies, such as EU competition law,[ref 741] contract and tort law,[ref 742] etc.;
- Strengthen or reorganize existing international institutions[ref 743] rather than establishing new institutions;[ref 744]
- Extend or apply existing principles and regimes in international law,[ref 745] including, amongst others:
- Norms of international peace and security law:
- Prohibitions on the use of force and intervention in the domestic affairs of other states;
- Existing export control and nonproliferation agreements.
- Principles of international humanitarian law, such as:
- Distinction and proportionality in wartime;
- Prohibition on weapons that are by nature indiscriminate or cause unnecessary suffering;
- The requirements of humanity;
- The obligation to conduct legal reviews of new weapons or means of war (Article 36 under Additional Protocol I to the Geneva Conventions).
- Norms of international human rights law[ref 746] and human rights and freedoms, including the right to life and freedom from cruel, inhuman, and degrading treatment, among others; the rights to freedom of expression, association, and security of the person, among others; and the principle of human dignity;[ref 747]
- Norms of international environmental law, including the no-harm principle and the principles of prevention and precaution;
- International criminal law, with regard to war crimes and crimes against humanity and with regard to case law of international criminal courts regarding questions of effective control;[ref 748]
- Rules on state responsibility,[ref 749] including state liability for harm;
- Peremptory norms of jus cogens, outlawing, for example, genocide, maritime piracy, slavery, wars of aggression, and torture;
- International economic law:[ref 750] security exception measures under international trade law and non-precluded measures under international investment law, amongst others;[ref 751]
- International disaster law: obligations regarding disaster preparedness, including forecasting and pre-disaster risk assessment, multi-sectoral forecasting and early warning systems, disaster risk and emergency communication mechanisms, etc. (Sendai Framework);
- Legal protections for the rights of future generations: including existing national constitutional protections for the rights of future generations[ref 752] and a potential future UN Declaration on Future Generations.[ref 753]
Proposals to set soft-law policy through existing international processes
- Proposals for engagement in existing international processes on AI: support the campaign to ban lethal autonomous weapons systems,[ref 754] orchestrate soft-law policy under G20,[ref 755] engage in debate about digital technology governance at the UN Summit for the Future,[ref 756] etc.
3.3. Proposals for new policies, laws, or institutions
A range of proposals for novel policies.
Impose (temporary) pauses on AI development
- Coordinated pauses amongst AI developers whenever they identify hazardous capabilities;[ref 757]
- Temporary pause on large-scale system training beyond a key threshold,[ref 758] giving time for near-term policy-setting in domains such as robust third-party auditing and certification, regulation of access to computational power, establishment of capable national AI agencies, and establishment of liability for AI-caused harms, etc.;[ref 759]
- (Permanent) moratoria on developing (certain forms of) advanced AI.[ref 760]
Establish licensing regimes
- Evaluation and licensing regimes: establishment of an AI regulation regime for frontier AI systems, comprising “(1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models.”[ref 761]
Establish lab-level safety practices
- Proposals for establishing corporate governance and soft law: establish Responsible Scaling Policies (RSPs)[ref 762] and establish corporate governance and AI certification schemes.[ref 763]
Establish governance regimes on AI inputs (compute, data)
- Compute governance regimes: establish on-chip firmware mechanisms, inspection regimes, and supply chain monitoring and custody mechanisms to ensure no actor can use large quantities of specialized chips to execute ML training runs in violation of established rules;[ref 764]
- Data governance: establish public data trusts to assert control over public training data for foundation models.[ref 765]
Establish domestic institutions for AI governance
- Proposals for new domestic institutions: US “AI Control Council”[ref 766] or National Algorithms Safety Board,[ref 767] and European AI Agency[ref 768] or European AI Office.[ref 769]
Establish international AI research consortia
Proposals to establish new international hubs or organizations aimed at AI research.[ref 770]
- A diverse range of proposals for international institutions, including: a “CERN for AI,”[ref 771] “European Artificial Intelligence megaproject,”[ref 772] “Multilateral AI Research Institute (MAIRI),”[ref 773] “international large-scale AI R&D projects,”[ref 774] a collaborative UN superintelligence research project,[ref 775] “international organization that could serve as clearing-house for research into AI,”[ref 776] “joint international AI project with a monopoly on hazardous AI development,”[ref 777] “UN AI Research Organization,”[ref 778] a “good-faith joint US-China AGI project,”[ref 779] “AI for shared prosperity,”[ref 780] and a proposal for a new “Multinational AGI Consortium.”[ref 781]
Establish bilateral agreements and dialogues
- Establish confidence-building measures[ref 782] and pursue international AI safety dialogues.[ref 783]
Establish multilateral international agreements
Proposals to establish a new multilateral treaty on AI:[ref 784]
- “Treaty on Artificial Intelligence Safety and Cooperation (TAISC),”[ref 785] global compute cap treaty,[ref 786] “AI development convention,”[ref 787] “Emerging Technologies Treaty,”[ref 788] “Benevolent AGI Treaty,”[ref 789] “pre-deployment agreements,”[ref 790] and many other proposals.
Establish international governance institutions
Proposals to establish a new international organization, along one or several models:[ref 791]
- A diverse range of proposals for international institutions, including a Commission on Frontier AI, an Advanced AI Governance Organization, a Frontier AI Collaborative, and an AI Safety Project;[ref 792] an International AI Organization (IAIO) to certify state jurisdictions for compliance with international AI oversight standards to enable states to prohibit the imports of goods “whose supply chains embody AI from non-IAIO-certified jurisdictions”;[ref 793] a proposal for an “international consortium” for evaluations of societal-scale risks from advanced AI;[ref 794] a “Global Organization for High-Risk Artificial Intelligence (GOHAI)”;[ref 795] and many other proposals.[ref 796]
Conclusion
The recent advances in AI have turned global public attention to this technology’s capabilities, impacts, and risks. AI’s significant present-day impacts and the prospect that these will only spread and scale further as these systems get increasingly advanced have firmly fixed this technology as a preeminent challenge for law and global governance this century.
In response, the disparate community of researchers who have explored aspects of these questions over the past years may increasingly be called upon to translate that research into rigorous, actionable, legitimate, and effective policies. They have developed, and continue to produce, a remarkably far-flung body of research, drawing on a diverse range of disciplines and methodologies. The urgency of action around advanced AI accordingly creates a need for this field to increase the clarity of its work and its assumptions, to identify gaps in its approaches and methodologies where it can learn from yet more disciplines and communities, to improve coordination amongst lines of research, and to improve the legibility of its arguments and work so as to enable constructive scrutiny and evaluation of key arguments and proposed policies.
This review has not remotely achieved these goals—as no single document or review can. Yet by attempting to distill and disentangle key areas of scholarship, analysis, and policy advocacy, it hopes to help contribute to greater analytical and strategic clarity, more focused and productive research, and better-informed public debates and policymaker initiatives on the critical global challenges of advanced AI.
Also in this series
- Maas, Matthijs, and Villalobos, José Jaime. ‘International AI institutions: A literature review of models, examples, and proposals.’ Institute for Law & AI, AI Foundations Report 1. (September 2023). https://www.law-ai.org/international-ai-institutions
- Maas, Matthijs. ‘AI is like… A literature review of AI metaphors and why they matter for policy.’ Institute for Law & AI, AI Foundations Report 2. (October 2023). https://www.law-ai.org/ai-policy-metaphors
- Maas, Matthijs. ‘Concepts in advanced AI governance: A literature review of key terms and definitions.’ Institute for Law & AI, AI Foundations Report 3. (October 2023). https://www.law-ai.org/advanced-ai-gov-concepts
Concepts in advanced AI governance: a literature review of key terms and definitions
Abstract
As AI systems have become increasingly capable, policymakers, the public, and the field of AI governance have begun to consider the potential impacts and risks from these systems—and the question of how best to govern such increasingly advanced AI. Call this field “Advanced AI Governance.” However, debates within and between these communities often lack clarity over key concepts and terms. In response, this report provides an overview, taxonomy, and preliminary analysis of many cornerstone ideas and concepts within advanced AI governance.
To do so, it first reviews three different purposes for seeking definitions (technological, sociotechnical, and regulatory), and discusses why and how terminology matters to both the study and the practice of AI governance. Next, the report surveys key definitions in advanced AI governance. It reviews 101 definitions across 69 terms that have been coined for advanced AI systems, within four categories: (1) essence-based concepts that focus on the anticipated form of advanced AI, (2) development-based terms that emphasize the hypothesized pathways towards advanced AI, (3) sociotechnical-change-based terms that center the societal impacts of such AI, and (4) risk-based terms that highlight specific critical capabilities of advanced AI systems. The report then reviews distinct definitions of the tools of (AI) “policy” and “governance”, different paradigms within the field of advanced AI governance, and different concepts around theories of change. By disentangling these terms and definitions, this report aims to facilitate more productive conversations between AI researchers, academics, policymakers, and the public on the key challenges of advanced AI.
Executive summary
This report provides an overview, taxonomy, and preliminary analysis of many cornerstone ideas and concepts in the emerging field of advanced AI governance.
Aim: The aim of this report is to contribute to improved analysis, debate, and policy by providing greater clarity around core terms and concepts. Any field of study or regulation can be improved by such clarity.
As such, this report reviews definitions for four categories of terms: the object of analysis (e.g., advanced AI), the tools for intervention (e.g., “governance” and “policy”), the reflexive definitions of the field of “advanced AI governance”, and its theories of change.
Summary: In sum, this report:
- Discusses three different purposes for seeking definitions for AI technology, discusses the importance of such terminology in shaping AI policy and law, and discusses potential criteria for evaluating and comparing such terms.
- Reviews concepts for advanced AI, covering a total of 101 definitions across 69 terms, including terms focused on:
- the forms of advanced AI,
- the (hypothesized) pathways towards those advanced AI systems,
- the technology’s large-scale societal impacts, and
- particular critical capabilities that advanced AI systems are expected to achieve or enable.
- Reviews concepts within “AI governance”, such as nine analytical terms used to define the tools for intervention (e.g., AI strategy, policy, and governance), four terms used to characterize different approaches within the field of study, and five terms used to describe theories of change.
The terms are summarized below in Table 1. Appendices provide detailed lists of definitions and sources for all the terms covered as well as a list of definitions for nine other auxiliary terms within the field.
Introduction
As AI systems have become increasingly capable and have had increasingly public impacts, the field that focuses on governing advanced AI systems has come into its own.
While researchers come to this issue with many different motivations, concerns, or hopes about AI, and indeed with many different perspectives on or expectations about the technology’s future trajectory and impacts, an emerging field of researchers, policy practitioners, and activists has grown up, concerned with and united by what they see as the increasingly significant and pivotal societal stakes of AI. Along with significant disagreements, many in this emerging community share the belief that shaping the transformative societal impacts of advanced AI systems is a top global priority.[ref 1] However, this field still lacks clarity not only on many key empirical and strategic questions but also on many of the key terms it uses.
Background: This lack of clarity matters because the recent wave of progress in AI, driven especially but not exclusively by the dramatic success of large language models (LLMs), has led to the accumulation of a wide range of new terms to describe these AI systems. Yet many of these terms, such as “foundation model”,[ref 2] “generative AI”,[ref 3] or “frontier AI”,[ref 4] do not always have clear distinctions[ref 5] and are often used interchangeably.[ref 6] They moreover emerge on top of and alongside a wide range of older terms, concepts, and words that have been used over the past decades to refer to (potential) advanced AI systems, such as “strong AI”, “artificial general intelligence”, or “transformative AI”. What are we to make of all of these terms?
Rationale: Critically, debates over terminology in and for advanced AI are not just semantics—these terms matter. In a broad sense, framings, metaphors, analogies, and explicit definitions can strongly affect not just developmental pathways for technology but also policy agendas and the efficacy and enforceability of legal frameworks.[ref 7] Indeed, different terms have already become core to major AI governance initiatives—with “general-purpose AI” serving as one cornerstone category in the EU AI Act[ref 8] and “frontier AI models” anchoring the 2023 UK AI Safety Summit.[ref 9] The varying definitions and implications of such terms may lead to increasing contestation,[ref 10] as well they should: Extensive work over the past decade has shown how different terms for “AI” import different regulatory analogies[ref 11] and have implications for crafting legislation.[ref 12] We might expect the same to hold for the new generation of terms used to describe advanced AI and to center and focus its governance.[ref 13]
Aim: The aim of this report is to contribute to improved analysis, debate, and policy by providing greater clarity around core terms and concepts. Any field of study or regulation can benefit from such clarity. Literature reviews of this kind can not only consolidate academic work but also refine public and policy debates.[ref 14] Ideally, they provide foundations for a more deliberate and reflexive choice over which concepts and terms to use (and which to discard), as well as for a more productive refinement of the definition and/or operationalization of cornerstone terms.
Scope: In response, this report considers four types of terms, including potential concepts and definitions for each of the following:
- the core objects of analysis—and the targets for policy (i.e., what is the “advanced AI” to be governed?),
- the tools for intervention to be used in response (i.e., what is the range of terms such as “policy”, “governance”, or “law”?),
- the field or community (i.e., what are current and emerging accounts, projects, or approaches within the broader field of advanced AI governance?), and
- the theories of change of this field (i.e., what is this field’s praxis?).
Disclaimers: This project comes with some important caveats for readers.
First, this report aims to be relatively broad and inclusive of terms, framings, definitions, and analogies for (advanced) AI. In doing so, it draws from both older and recent work and from a range of sources from academic papers to white papers and technical reports to public fora.
Second, this report is primarily concerned with mapping the conceptual landscape and with understanding the (regulatory) implications of particular terms. As such, it is less focused on policing the appropriateness or coherence of particular terms or concepts. Consequently, with regard to advanced AI it covers many terms that are still highly debated or contested or whose meaning is unsettled. Not all the terms covered are equally widely recognized, used, or even accepted as useful in the field of AI research or within the diverse fields of AI ethics, policy, law, and governance. Nonetheless, this report includes many of these terms on the grounds that a broad and inclusive approach to these concepts best serves productive future debate. After all, even if some terms are (considered to be) “outdated,” it is important to know where such terms and concepts have come from and how they have developed over time. If some terms are contested or considered “too vague,” that is precisely an argument for clarifying their usage and their relation to other terms. This will either allow the (long overdue) refinement of concepts or at least enable an improved understanding of when certain terms are not usefully recoverable. In both cases, it will facilitate greater clarity of communication.
Third, this review is a snapshot of the state of debate at one moment. It reviews a wide range of terms, many of which have been coined recently and only some of which may have staying power. This debate has developed significantly in the last few years and will likely continue to do so.
Fourth, this review will mostly focus on analytical definitions of or for advanced AI along four approaches.[ref 15] In so doing, it will on this occasion mostly omit detailed exploration of a fifth, normative dimension to defining AI, which would focus on reviewing especially desirable types of advanced AI systems that (in the view of some) ought to be pursued or created. Such a review would cover a range of terms such as “ethical AI”,[ref 16] “responsible AI”,[ref 17] “explainable AI”,[ref 18] “friendly AI”,[ref 19] “aligned AI”,[ref 20] “trustworthy AI”,[ref 21] “provably-safe AI”,[ref 22] “human-centered AI”,[ref 23] “green AI”,[ref 24] “cooperative AI”,[ref 25] “rights-respecting AI”,[ref 26] “predictable AI”,[ref 27] “collective intelligence”,[ref 28] and “digital plurality”,[ref 29] amongst many other terms and concepts. At present, this report will not focus in depth on surveying these terms, since only some of them were articulated in the context of or in consideration of especially advanced AI systems. However, many or all of these terms are capability-agnostic and so could clearly be extended to or reformulated for more capable, impactful, or dangerous systems. Indeed, undertaking such a deepening and extension of the taxonomy presented in this report in ways that engage more with the normative dimension of advanced AI would be very valuable future work.
Fifth, this report does not aim to definitively resolve debates—or to argue that all work should adopt one or another term over others. Different terms may work best in different contexts or for different purposes and for different actors. Indeed, given the range of actors interested in AI—whether from a technical engineering, sociotechnical, or regulatory perspective—it is not surprising that there are so many terms and such diversity in definitions even for single terms. Nonetheless, to be able to communicate effectively and learn from other fields, it helps to gain greater clarity and precision in the terms we use, whether these are terms referring to our objects of analysis, our own field and community, or our theory of action. Of course, achieving clarity on terminology is not itself sufficient. Few problems, technical or social or legal, may be solved exclusively by haggling over words. Nonetheless, a shared understanding facilitates problem solving. The point here is not to achieve full or definitive consensus but to understand disagreements and assumptions. As such, this report seeks to provide background on many terms, explore how they have been used, and consider the suitability of these terms for the field.[ref 30] In doing so, this report highlights the diversity of terms in current use and provides context for more informed future study and policymaking.
Structure: Accordingly, this report now proceeds as follows.
Part I provides background to this review by discussing three purposes for defining key terms such as AI. It also discusses why the choice of one or another term matters significantly from the perspective of AI policy and regulation, and finally discusses some criteria by which to evaluate the suitability of various terms and definitions for the specific purpose of regulation.
In Part II, this report reviews a wide range of terms for “advanced AI”, across different approaches which variably focus on (a) the anticipated forms or design of advanced AI systems, (b) the hypothesized scientific pathways towards these systems, (c) the technology’s broad societal impacts, or (d) the specific critical capabilities particular advanced AI systems are expected to achieve.
Part III turns from the object of analysis to the field and epistemic community of advanced AI governance itself. It briefly reviews three categories of concepts of use for understanding this field. First, it surveys different terms used to describe AI “strategy”, “policy”, or “governance” as this community understands the available tools for intervention in shaping advanced AI development. It then reviews different paradigms within the field of advanced AI governance as ways in which different voices within it have defined that field. Finally, it briefly reviews recent definitions for theories of change that aim to compare and prioritize interventions into AI governance.
Finally, three appendices list in detail all the terms and definitions offered, with sources, and offer a list of auxiliary definitions that can aid future work in this emerging field.[ref 31]
I. Defining ‘advanced AI (governance)’: Background
Any quest for clarifying definitions of “advanced AI” is complicated by the already long-running, undecided debates over how to even define the more basic terms “AI” or, indeed, “intelligence”.[ref 32]
To properly evaluate and understand the relevance of different terms for AI, it is useful to first set out some background, starting with the purposes for which a definition is sought. Why or how do we seek definitions of “(advanced) AI”?
1. Three purposes for definitions
Rather than trying to identify a universally best definition for AI, a more appropriate approach is to consider the implications of different definitions, or (to invert the question) to ask for what purpose we seek to define AI. We can consider (at least) three different rationales for defining a term like ‘AI’.
- To build it (the technological research purpose): In the first place, AI researchers or scientists may pursue definitions of (advanced) AI by defining it from the “inside,” as a science.[ref 33] The aim of such technical definitions of AI[ref 34] is to clarify or create research-community consensus about (1) the range and disciplinary boundaries of the field—that is, what research programs and what computational techniques[ref 35] count as “AI research” (both internally and externally to research funders or users); (2) the long-range goals of the field (i.e., the technical forms of advanced AI); and/or (3) the intermediate steps the field should take or pursue (i.e., the likely pathways towards such AI). Accordingly, this definitional purpose aligns particularly closely with essence-based definitions (see Part II.1) and/or development-based definitions (see Part II.2) of advanced AI.
- To study it (the sociotechnical research purpose): In the second place, experts (in AI, but especially in other fields) may primarily seek to understand AI’s impacts on the world. In doing so, they may aim to define AI from the “outside,” as a sociotechnical system including its developers and maintainers.[ref 36] Such definitions or terms can aid researchers (or governments) who seek to understand the societal impacts and effects of this technology in order to diagnose or analyze the potential dynamics of AI development, diffusion, and application, as well as the long-term sociopolitical problems and opportunities. For instance, under this purpose researchers may aim to come to terms with issues such as (1) (the geopolitics or political economy of) key AI inputs (e.g., compute, data, and labor), (2) how different AI capabilities[ref 37] give rise to a spectrum of useful applications[ref 38] in diverse domains, and (3) how these applications in turn produce or support new behaviors and societal impacts.[ref 39] Accordingly, this purpose is generally better served by sociotechnical definitions of AI systems’ impacts (see Part II.3) or risk-based definitions (see Part II.4).
- To regulate it (the regulatory purpose): Finally, regulators or academics motivated by appropriately regulating AI (either to seize the benefits or to mitigate adverse impacts) can seek to pragmatically delineate and define (advanced) AI as a legislative and regulatory target. In this approach, definitions of AI serve as useful handles for law, regulation, or governance.[ref 40] In principle, this purpose can be well served by many of the definitional approaches: for instance, highly technology-specific regulations can benefit from focusing on development-based definitions of (advanced) AI. However, in practice regulation and governance are usually better served by focusing on the sociotechnical impacts or capabilities of AI systems.
Since it is focused on the field of “advanced AI governance,” this report will primarily focus on the second and third of these purposes. However, it is useful to keep all three in mind.
2. Why terminology matters to AI governance
Whether taking a sociotechnical perspective on the societal impacts of advanced AI or a regulatory perspective on adequately governing it, the need to pick suitable concepts and terms becomes acutely clear. Significantly, the implications and connotations of key terms matter greatly for law, policy, and governance. This is because, as reviewed in a companion report,[ref 41] distinct or competing terms for AI—with their meanings and connotations—can influence all stages of the cycle from a technology’s development to its regulation. They do so in both a broad and a narrow sense.
In the broad sense, the choice of term and definition can, explicitly or implicitly, import particular analogies or metaphors into policy debates that can strongly shape the direction (and efficacy) of the resulting policy efforts.[ref 42] These framing effects can occur even if one tries to avoid explicit analogies between AI and other technologies, since apparently “neutral” definitions of AI still single out one or another of the technology’s “features” as the most relevant, framing policymaker perceptions and responses in ways that are not neutral, natural, or obvious. For instance, Murdick and others found that the particular definition one uses for what counts as “AI” research directly affects which (industrial or academic) metrics are used to evaluate different states’ or labs’ relative achievements or competitiveness in developing the technology, thereby framing downstream evaluations of which nation is “ahead” in AI.[ref 43] Likewise, Krafft and colleagues found that whereas definitions of AI that emphasize “technical functionality” are more widespread among AI researchers, definitions that emphasize “human-like performance” are more prevalent among policymakers, which they suggest might prime policymaking towards future threats.[ref 44]
Beyond the broad policy-framing impacts of technology metaphors and analogies, there is also a narrower sense in which terms matter. Specifically, within regulation, legislative and statutory definitions delineate the scope of a law and of the agency authorization to implement or enforce it,[ref 45] such that the choice of a particular term for (advanced) AI may make or break the resulting legal regime.
Generally, within legislative texts, statutory definitions can play both communicative roles (clarifying legislative intent) and performative roles (investing groups or individuals with rights or obligations).[ref 46] More practically, one can find different types of definitions that play distinct roles within regulation: (1) delimiting definitions establish the limits or boundaries on an otherwise ordinary meaning of a term, (2) extending definitions broaden a term’s meaning to expressly include elements or components that might not normally be included in its ordinary meaning, (3) narrowing definitions aim to set limits or expressly exclude particular understandings, and (4) mixed definitions use several of these approaches to clarify components.[ref 47]
Likewise, in the context of AI law, legislative definitions for key terms such as “AI” obviously affect the material scope of the resulting regulations.[ref 48] Indeed, particular definitions affect regulation not only ex ante but also ex post: in many jurisdictions, legal terms are interpreted and applied by courts based on their widely shared “ordinary meaning.”[ref 49] This means, for instance, that regulations that refer to terms such as “advanced AI”, “frontier AI”, or “transformative AI” might not necessarily be interpreted or applied in ways that are in line with how the term is understood within expert communities. All of this underscores the importance of our choice of terms, from broad and indirect metaphors to concrete and specific legislative definitions, when grappling with the impacts of this technology on society.
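As a purely illustrative sketch (not drawn from any statute), the four definition types just listed can be modeled as operations on a set that stands in for a term’s ordinary meaning. The term “vehicle”, its members, the boundary set, and all function names below are hypothetical, chosen only to make the distinctions concrete; actual statutory interpretation is not reducible to set arithmetic.

```python
# Hypothetical illustration: a term's "ordinary meaning" is modeled as a set,
# and each type of statutory definition as an operation on that set.

ORDINARY_MEANING = {"car", "truck", "bus", "bicycle", "toy car"}
MOTORIZED = {"car", "truck", "bus", "toy car", "e-scooter"}  # hypothetical boundary


def is_motorized(item):
    """Hypothetical boundary condition used by the delimiting definition."""
    return item in MOTORIZED


def delimiting(meaning, within_boundary):
    """Set limits on the ordinary meaning: keep only items inside a boundary."""
    return {item for item in meaning if within_boundary(item)}


def extending(meaning, extra):
    """Expressly include items the ordinary meaning would not normally cover."""
    return meaning | extra


def narrowing(meaning, excluded):
    """Expressly exclude particular items or understandings."""
    return meaning - excluded


def mixed(meaning, within_boundary, extra, excluded):
    """Combine approaches: delimit first, then extend, then exclude."""
    return narrowing(extending(delimiting(meaning, within_boundary), extra), excluded)


if __name__ == "__main__":
    covered = mixed(ORDINARY_MEANING, is_motorized, {"e-scooter"}, {"toy car"})
    print(sorted(covered))  # prints ['bus', 'car', 'e-scooter', 'truck']
```

The order of operations in `mixed` is itself a drafting choice: excluding “toy car” after extending means an exclusion always wins, which is one way a mixed definition can resolve conflicts between its components.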
Indeed, the strong legal effects of different terms mean that there can be challenges for a law when it depends on a poorly or suboptimally specified regulatory term for the forms, types, or risks from AI that the legislation means to address. This creates twin challenges. On the one hand, picking suitable concepts or categories can be difficult at an early stage of a technology’s development and deployment, when its impacts and limits are not always fully understood—the so-called Collingridge dilemma.[ref 50]
At the same time, the cost of picking and locking in the wrong terms within legislative texts can be significant. Beyond the opportunity costs, unreflexively establishing legal definitions for key terms can create the risk of downstream or later “governance misspecification.”[ref 51]
Such governance misspecification may occur when regulation is originally targeted at a particular artifact or (technological) practice through a particular material scope and definition for those objects. The implicit assumption here is that the term in question is a meaningful proxy for the underlying societal or legal goals to be regulated. While that assumption may be appropriate and correct in many cases, if it is wrong (either because of an initial misapprehension of the technology or because subsequent technological developments lead the proxy term to diverge from the legislative goals), the resulting technology law will be less efficient, ineffective, or even counterproductive to its purposes.[ref 52]
Governance misspecification can be observed across many episodes of technology governance and regulation. For instance:
- The “high-performance computer” threshold in US 1990s export control regimes: In the 1990s, the US established a series of export control regimes under the Wassenaar Arrangement, which set an initial threshold for “high-performance computers” at just 195 million theoretical operations per second (MTOPS); in doing so, the regime treated that technology as far too static and could not keep pace with Moore’s Law.[ref 53] As a result, the threshold had to be updated six times within a decade,[ref 54] even as the regime became increasingly ineffective at preventing or even inhibiting US adversaries from accessing as much computing power as they needed, and it may even have become harmful to national security as it inhibited the domestic US tech industry.[ref 55]
- The “in orbit” provision in the Outer Space Treaty: In the late 1960s, the Outer Space Treaty aimed to outlaw positioning weapons of mass destruction in space. It therefore specified, as a proxy, a ban on placing these weapons “in orbit.”[ref 56] This wording left a loophole, exploited by the Soviet development of fractional orbital bombardment systems (FOBS), which were able to position nuclear weapons in space (on non-ballistic trajectories) without, strictly speaking, putting them “in orbit.”[ref 57]
- Under- and overinclusive 2010s regulations on drones and self-driving cars: Calo has chronicled how, in the early 2010s, various US regulatory responses to drones or self-driving cars defined these technologies in ways that were either under- or overinclusive, leading to inefficiency or the repeal of laws.[ref 58]
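The export-control example turns on simple arithmetic: a fixed threshold steadily loses stringency when the performance it measures grows exponentially. The minimal sketch below assumes steady exponential growth with a configurable doubling period (a simplifying assumption; Moore’s Law is often paraphrased as a doubling roughly every two years). Only the 195 MTOPS figure comes from the text; the function names and the starting performance in the usage example are hypothetical.

```python
import math

STATIC_THRESHOLD_MTOPS = 195.0  # the fixed threshold cited in the text


def growth_factor(years: float, doubling_period_years: float) -> float:
    """Performance multiplier after `years`, assuming performance doubles
    every `doubling_period_years` (a simplifying assumption)."""
    return 2 ** (years / doubling_period_years)


def years_until_crossed(start_perf: float, threshold: float,
                        doubling_period_years: float) -> float:
    """Years until a system starting at `start_perf` reaches `threshold`
    under steady exponential growth; 0.0 if it is already at or above it."""
    if start_perf >= threshold:
        return 0.0
    return doubling_period_years * math.log2(threshold / start_perf)


if __name__ == "__main__":
    # Hypothetical: hardware at one-eighth of the threshold, doubling every 2 years.
    start = STATIC_THRESHOLD_MTOPS / 8
    print(years_until_crossed(start, STATIC_THRESHOLD_MTOPS, 2.0))  # 6.0
    print(growth_factor(10, 2.0))  # 32.0
```

Under these assumptions, a static threshold sits 32 times lower relative to prevailing hardware after a decade, which illustrates why a proxy fixed in absolute units can require repeated revision to track its underlying purpose.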
Thus, getting greater clarity in our concepts and terminology for advanced AI will be critical in crafting effective, resilient regulatory responses—and in avoiding brittle missteps that are easily misspecified.
Given all the above, the aim in this report is not to find the “correct” definition or frame for advanced AI. Rather, it considers that different frames and definitions can be more useful for specific purposes or for particular actors and/or (regulatory) agencies. In that light, we can explore a series of broad starting questions, such as:
- What different definitions have been proposed for advanced AI? What other terms could we choose?
- What aspects of advanced AI (e.g., its form and design, the expected scientific principles of its development pathways, its societal impacts, or its critical capabilities) do these different terms focus on?
- What are the regulatory implications of different definitions?
In sum, this report is premised on the idea that exploring definitions of AI (and related terms) matters, whether we are trying to understand AI, understand its impacts, or govern them effectively.
3. Criteria for definitions
Finally, we have the question of how to formulate relevant criteria for suitable terms and definitions for advanced AI. In the first place, as discussed above, this depends on one’s definitional purpose.
Nonetheless, from the specific perspective of regulation and policymaking, what are some good criteria for evaluating suitable and operable definitions for advanced AI? Notably, Jonas Schuett has previously explored legal approaches to defining the basic term “AI”. He emphasizes that to be suitable for the purpose of governance, the choice of terms for AI should meet a series of requirements for all good legal definitions: namely, that terms are neither (1) overinclusive nor (2) underinclusive and that they are (3) precise, (4) understandable, (5) practicable, and (6) flexible.[ref 59] Other criteria have been proposed: for instance, it has been suggested that an additional desideratum for a useful regulatory definition for advanced AI might be something like ex ante clarity, in the sense that the definition should allow one to assess, for a given AI model, whether it will meet the criteria for that definition (i.e., whether it will be regulated within some regime), and ideally allow this to be assessed in advance of deployment (or even development) of that model.[ref 60] Certainly, these criteria remain contested and are likely incomplete. In addition, there may be trade-offs between the criteria, such that even if they are individually acceptable, one must still strike a workable balance between them.[ref 61]
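To make the ex ante clarity criterion concrete, the sketch below contrasts two hypothetical coverage tests. One keys on planned training compute, a technical input known before training begins; the other keys on a capability evaluation score that exists only after the model is built. The threshold, cutoff, field names, and functions are all illustrative assumptions, not taken from any actual rule.

```python
from dataclasses import dataclass
from typing import Optional

HYPOTHETICAL_FLOP_THRESHOLD = 1e26  # illustrative value, not taken from any rule


@dataclass
class ModelPlan:
    name: str
    planned_training_flop: float  # known before training begins
    eval_score: Optional[float] = None  # known only after capability evaluation


def covered_by_compute_proxy(plan: ModelPlan,
                             threshold: float = HYPOTHETICAL_FLOP_THRESHOLD) -> bool:
    """Input-based definition: decidable ex ante, from the training plan alone."""
    return plan.planned_training_flop >= threshold


def covered_by_capability_test(plan: ModelPlan, cutoff: float = 0.5) -> Optional[bool]:
    """Capability-based definition: returns None (undetermined) until the model
    has been built and evaluated, so it lacks ex ante clarity."""
    if plan.eval_score is None:
        return None
    return plan.eval_score >= cutoff


if __name__ == "__main__":
    plan = ModelPlan(name="hypothetical-model", planned_training_flop=3e26)
    print(covered_by_compute_proxy(plan))    # True
    print(covered_by_capability_test(plan))  # None
    plan.eval_score = 0.7  # hypothetical post-training evaluation result
    print(covered_by_capability_test(plan))  # True
```

The trade-off mentioned above shows up directly: the compute test scores well on ex ante clarity but is itself a proxy that can drift from the risks it targets, while the capability test tracks the concern more closely but cannot be applied until after development.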
II. Defining the object of analysis: Terms for advanced AI
Having briefly discussed the different definitional purposes, the relevance of terms for regulation, and potential criteria for evaluating definitions, this report now turns to survey the actual terminology for advanced AI.
Within the literature and public debate, there are many terms used to refer to the conceptual cluster of AI systems that are advanced, i.e., that are sophisticated and/or highly capable and/or could have transformative impacts on society.[ref 62] However, not all of these terms have featured equally strongly in governance or policy discussions. To understand and situate these terms, it is useful to compare their definitions with one another and to review different approaches to defining advanced AI.
In Schuett’s model for “legal” definitions for AI, he has distinguished four types of definitions, which focus variably on (1) the overarching term “AI”, (2) particular technical approaches in machine learning, (3) specific applications of AI, and (4) specific capabilities of AI systems (e.g., physical interaction, ability to make automated decisions, ability to make legally significant decisions).[ref 63]
Drawing on Schuett’s framework, this report develops a similar taxonomy of common definitions for advanced AI. It compares four approaches, each of which focuses on a different feature or aspect of advanced AI:
- The anticipated technical form or design of AI systems (essence-based approaches);
- The proposed scientific pathways and paradigms towards creating advanced AI (development-based approaches);
- The broad societal impacts of AI systems, whatever their cognitive abilities (sociotechnical-change-based approaches);
- The specific critical capabilities[ref 64] that could potentially enable extreme impacts in particular domains (risk-based approaches).
Each of these approaches has a different focus, object, and motivating question (Table 2).
This report will now review these categories of approaches in turn. For each, it will broadly (1) discuss that approach’s core definitional focus and background, (2) list the terms and concepts that are characteristic of it, (3) provide some brief discussion of common themes and patterns in definitions given to these terms,[ref 65] and (4) then provide some preliminary reflections on the suitability of particular terms within this approach, as well as of the approach as a whole, to provide usable analytical or regulatory definitions for the field of advanced AI governance.[ref 66]
1. Essence-based definitions: Forms of advanced AI
Focus of approach: Classically, many definitions of advanced AI focus on the anticipated form, architecture, or design of future advanced AI systems.[ref 67] These definitions as such focus on AI systems that instantiate particular forms of advanced intelligence,[ref 68] for instance by instantiating an “actual mind” (that “really thinks”); by displaying a degree of autonomy; or by being human-like, general-purpose, or both in the ability to think, reason, or achieve goals across domains (see Table 3).
Terms: The form-centric approach to defining advanced AI accordingly encompasses a variety of terms, including strong AI, autonomous machine (/ artificial) intelligence, general artificial intelligence, human-level AI, foundation model, general-purpose AI system, comprehensive AI services, artificial general intelligence, robust artificial intelligence, AI+, (machine/artificial) superintelligence, superhuman general-purpose AI, and highly-capable foundation models.[ref 69]
Definitions and themes: While many of these terms are subject to a wide range of different definitions (see Appendix 1A), they share a number of common themes or patterns (see Table 3).
Suitability of overall definitional approach: In the context of analyzing advanced AI governance, there are both advantages and drawbacks to working with form-centric terms. First, we review five potential benefits.
Benefit (1): Well-established and recognized terms: In the first place, using form-centric terms has the advantage that many of these terms are relatively well established and familiar.[ref 72] Out of all the terms surveyed in this report, many form-centric definitions for advanced AI, like strong AI, superintelligence, or AGI, have both the longest track record and the greatest visibility in academic and public debates around advanced AI. Moreover, while some of these terms are relatively niche to philosophical (“AI+”) or technical subcommunities (“CAIS”), many of these terms are in fact the ones used prominently by the main labs developing the most disruptive, cutting-edge AI systems.[ref 73] Prima facie, reusing these terms could avoid having to reinvent the wheel, and having to build widespread awareness of and buy-in for newer, more niche terms.
Benefit (2): Readily intuitive concepts: Secondly, form-centric terms evoke certain properties—such as autonomy, adaptability, and human-likeness—which, while certainly not uncontested, may be concepts that are more readily understood or intuited by the public or policymakers than would be more scientifically niche concepts. At the same time, this may also be a drawback, if the ambiguity of many of these terms opens up greater scope for misunderstanding or flawed assumptions to creep into governance debates.
Benefit (3): Enables more forward-looking and anticipatory policymaking towards advanced AI systems and their impacts. Thirdly, because some (though not all) form-centric definitions of advanced AI relate to systems that are perceived (or argued) to appear in the future, using these terms could help extend public attention, debate, and scrutiny to the future impacts of yet more general AI systems which, while their arrival might be uncertain, would likely be enormously impactful. This could help such debates and policies to be less reactive to the impacts of each latest AI model release or incident and start laying the foundations for major policy initiatives. Indeed, centering governance analysis on form-centric terms, even if they are (seen as) futuristic or speculative, can help inform more forward-looking, anticipatory, and participatory policymaking towards the kind of AI systems (and the kind of capabilities and impacts) that may be on the horizon.[ref 74]
One caveat here is that to consider this a benefit, one has to strongly assume that these futuristic forms of advanced AI systems are in fact feasible and likely near in development. At the same time, this approach need not presume absolute certainty over which of these forms of advanced AI can or will be developed, or on what timelines; rather, well-established risk management approaches[ref 75] can warrant some engagement with these scenarios even under uncertainty. To be clear, this need not (and should not) mean neglecting or diminishing policy attention for the impacts of existing AI systems,[ref 76] especially as these impacts are already severe and may continue to scale up as AI systems both become more widely implemented and create hazards for existing communities.
Benefit (4): Enables public debate and scrutiny of overarching (professed) direction and destination for AI development. Fourthly, and relatedly, the above advantage of using form-centric terms could still hold even if one is very skeptical of these types of futuristic AI, because such terms afford the democratic value of allowing the public and policymakers to weigh in on the actual professed long-term goals and aspirations of many (though not all) leading AI labs.[ref 77]
In this way, the cautious, clear, and reflexive use of terms such as AGI in policy debates could be useful even if one is very skeptical of the actual feasibility of these forms of AI (or believes they are possible but remains skeptical that they will be built anytime soon using extant approaches). This is because there is democratic and procedural value in the public and policymakers being able to hold labs to account for the goals that they in fact espouse and pursue—even if those labs may turn out mistaken about the ability to execute on those plans (in the near term).[ref 78] This is especially the case when these are goals that the public might not (currently) agree with or condone.[ref 79]
Using these “futuristic” terms could therefore help ground public debate over whether the development of these particular systems is even a goal society condones, whether society might prefer for labs (or society itself) to pursue a different vision for society’s relation to AI technology,[ref 80] or (if these systems are indeed considered desirable and legitimate goals) what additional policies or guarantees the world should demand.[ref 81]
Benefit (5): Technology neutrality: Fifthly, the use of form-centric terms in debates can build in a degree of technology neutrality[ref 82] in policy responses, since debates need not focus on the specific engineering or scientific pathways by which one or another highly capable and impactful AI system is pursued or developed. This could make the resulting regulatory frameworks more scalable and future-proof.
At the same time, there are a range of general drawbacks to using (any of these) form-focused definitions in advanced AI governance.
Drawback (1): Connotations and baggage around terms: In the first place, the greater familiarity of some of these terms means that many form-focused terms have become loaded with cultural baggage, associations, or connotations which may mislead, derail, or unduly politicize effective policymaking processes. In particular, many of these terms are contested and have become associated (whether or not necessarily) with particular views or agendas towards building these systems.[ref 83] This is a problem because, as discussed previously, the use of different metaphors, frames, and analogies may be irreducible in (and potentially even essential to) the ways that the public and policymakers make sense of regulatory responses. Yet different analogies—and especially the unreflexive use of terms—also have limits and drawbacks and create risks of inappropriate regulatory responses.[ref 84]
Drawback (2): Significant variance in prominence of terms and constant turnover: In the second place, while some of these terms have held currency at different times in the last decades, many do not see equally common use or recognition in modern debates. For instance, terms such as “strong AI”, which dominated early philosophical debates, appear to have fallen slightly out of favor in recent years,[ref 85] as the emergence and impact of foundation models generally, and generative AI systems specifically, have revived significantly greater attention to terms such as “AGI”. This churn in terminology suggests that it may be unwise to attempt to pin down a single term or definition right now, since analyses that focus on one particular anticipated form of advanced AI may be more likely to be rendered obsolete. At the same time, this is likely to be a general problem with any concepts or terminology chosen.
Drawback (3): Contested terms, seen as speculative or futuristic: In the third place, while some form-centric terms (such as “GPAIS” or “foundation model”) have become well established in AI policy debates or processes, others, such as “AGI”, “strong AI”, or “superintelligence”, are more future-oriented, referring to advanced AI systems that do not (yet) exist.[ref 86] Consequently, many of these terms are contested and seen as futuristic and speculative. This perception may be a challenge, because even if it is incorrect (e.g., if systems such as “AGI” are in fact developed within short timelines or are even in some sense “already here”[ref 87]), the mere perception that a technology or term is far-off or “speculative” can serve to inhibit and delay effective regulatory or policy action.[ref 88]
A related but converse risk of using future-oriented terms for advanced AI policy is that it may inadvertently import a degree of technological determinism[ref 89] in public and policy discussions, as it could imply that one or another particular form or architecture of advanced AI (“AGI”, “strong AI”) is not just possible but inevitable—thereby shifting public and policy discussions away from the question of whether we should (or can safely) develop these systems (rather than other, more beneficial architectures)[ref 90] towards less ambitious questions over how we should best (safely) reckon with the arrival or development of these technologies.
This drawback could be somewhat mitigated by relying on terms for the forms of advanced AI—such as GPAIS or highly-capable foundation models—that are (a) more present-focused, while (b) not putting any strong presumed ceilings on the capabilities of the systems.
Drawback (4): Definitional ambiguity: In the fourth place, many of these terms, and especially future-oriented terms such as “strong AI”, “AGI”, and “human-level AI”, suffer from definitional ambiguity in that they are used both inconsistently and interchangeably with one another.[ref 91]
Of course, just because there is no settled or uncontested definition for a term such as “AGI” does not make it prima facie unsuitable for policy or public debate. By analogy, the fact that there can be definitional ambiguity over the content or boundaries of concepts such as “the environment” or “energy” does not render “environmental policy” or “energy policy” meaningless categories or irrelevant frameworks for regulation.[ref 92] Nor indeed does outstanding definitional debate mean that any given term, such as AGI, is “meaningless.”[ref 93]
Nonetheless, the sheer range of competing definitions for many of these concepts may reflect an underlying degree of disciplinary or philosophical confusion, or at least suggest that, barring greater conceptual clarification and operationalization,[ref 94] these terms will lead to continued disagreement. Accordingly, anchoring advanced AI governance to broad terms such as “AGI” may make it harder to articulate appropriately scoped legal obligations for specific actors that will not end up being over- or underinclusive.[ref 95]
Drawback (5): Challenges in measurement and evaluation: In the fifth place, an underlying and related challenge for the form-centric approach (in part due to these definitional disagreements and in part for deeper reasons) is the difficulty of measuring or operationalizing (progress towards) advanced AI systems.
This matters because effective regulation or governance—especially at the international level[ref 96]—often requires (scientific and political) consensus around key empirical questions, such as when and how we can know that a certain AI system truly achieves some of the core features (e.g., autonomy, agency, generality, and human-likeness) that are crucial to a given term or concept. In practice, AI researchers often attempt to measure such traits by evaluating an AI system’s ability to pass one or more specific benchmark tests (e.g., the Turing test, the Employment test, the SAT, etc.).[ref 97]
However, such testing approaches have many flaws or challenges.[ref 98] At the practical level, there have been problems with how tests are applied and scored[ref 99] and how their results are reported.[ref 100] Underlying this is a challenge that the way in which some common AI performance tests are constructed may emphasize nonlinear or discontinuous metrics, which can lead to an overly strong impression that some model skills are “suddenly” emergent properties (rather than smoothly improving capabilities).[ref 101] More fundamentally, there have been challenges to the meaningfulness of applying human-centric tests (such as the bar exam) to AI systems[ref 102] and indeed deeper critiques of the construct validity of leading benchmark tests in terms of whether they actually are indicative of progress towards flexible and generalizable AI systems.[ref 103]
Of course, this does not rule out further scientific progress towards the operationalization of useful tests for understanding when particular forms of advanced AI such as AGI have been achieved.[ref 104] Nor is it to suggest that benchmark and evaluation challenges are unique to form-centric definitions of AI—indeed, they may also challenge many approaches focused on specific capabilities of advanced AIs.[ref 105] However, the extant challenges over the operationalization of useful tests mean that overreliance on these terms could muddle debates and inhibit consensus over whether a particular advanced system is within reach (or already being deployed).
Drawback (6): Excessive focus on technical achievement of particular forms may make this approach underinclusive of societal impacts or capabilities: In the sixth place, the focus of future-oriented form-centric approaches on the realization of one or another type of advanced AI system (“AGI”, “human-level AI”) might be adequate if the purpose of our definitions is technical research.[ref 106] However, for those whose definitional purpose is to understand AI’s societal impacts (sociotechnical research) or to appropriately regulate AI (regulatory), many form-centric terms may miss the point.
This is because what matters from the perspective of human and societal safety, welfare, and well-being—and from the perspective of law and regulation[ref 107]—is not the achievement of some fully general capacity in any individual system but rather overall sociotechnical impacts or the emergence of key dangerous capabilities—even if they derive from systems that are not yet (fully) general[ref 108] or that develop dangerous emergent capabilities that are not human-like.[ref 109] Given all this, there is a risk that taking a solely form-centric approach leaves advanced AI governance vulnerable to a version of the “AI effect,” whereby “real AGI” is always conceived of as being around the corner but rarely as a system already in production.
Suitability of different terms within approach: Given the above, if one does aim to draw on this approach, it may be worth considering which terms manage to gain from the strengths of this approach while reducing some of the pitfalls. In this view, the terms “GPAIS” or “foundation model” may be more suitable in many contexts, as they are recognized as categories of (increasingly) general and competent AI systems of which some versions already exist today. In particular, because versions of these terms are already used in ongoing policy debates, they could provide better regulatory handles for governing the development of advanced AI—for instance by their relation to the complex supply chain of modern AI development that contains both upstream and downstream developers and users.[ref 110] Moreover, these terms do not presume a ceiling in the system’s capability; accordingly, concepts such as “highly-capable foundation model”,[ref 111] “extremely capable foundation model”, or “threshold foundation model” could help policy debates be cognizant of the growing capabilities of these systems while still being more easily understandable for policymakers.[ref 112]
2. Development-based definitions: Pathways towards advanced AI
Focus of approach: A second cluster of terms focuses on the anticipated or hypothesized scientific pathways or paradigms that could be used to create advanced AI systems. Notably, the goal or target of these pathways is often to build “AGI”-like systems.[ref 113]
Notes and caveats: Any discussion of proposed pathways towards advanced AI has a number of important caveats. In the first place, many of these proposed paradigms have long been controversial, with pervasive and ongoing disagreement about their scientific foundations and feasibility as paths towards advanced AI (or in particular as paths towards particular forms of advanced AI, such as AGI).[ref 114] Secondly, these approaches are not necessarily mutually exclusive, and indeed many labs combine elements from several in their research.[ref 115] Thirdly, because the relative and absolute prominence and popularity of many of these paradigms have fluctuated over time and because there are often, as in any scientific field, significant disciplinary gulfs between paradigms, there is highly unequal treatment of these pathways and terms. As such, whereas some paradigms (such as the scaling, reinforcement-learning, and, to some extent, brain-inspired approaches) are reasonably widely known, many of the other approaches and terms listed (such as “seed AI”) may be relatively unknown or even very obscure within the modern mainstream machine learning (ML) community.[ref 116]
Other taxonomies: There have been various other such attempts to create taxonomies of the main theorized pathways that have been proposed to build or implement advanced AI. For instance, Goertzel and Pennachin have defined four different approaches to creating “AGI”, which to different degrees draw on lessons from the (human) brain or mind.[ref 117] More recently, Hannas and others have drawn on this framework and extended it to five theoretical pathways towards “general AI”.[ref 118]
Further extending such frameworks, one can distinguish between at least 11 proposed pathways towards advanced AI (see Table 4).
Terms: Many of these paradigms or proposed pathways towards advanced AI come with their own assorted terms and definitions (see Appendix 1B). These terms include, amongst others, de novo AGI, prosaic AGI, frontier (AI) model [compute threshold], [AGI] from evolution, [AGI] from powerful reinforcement learning agents, powerful deep learning models, seed AI, neuroAI, brain-like AGI, neuromorphic AGI, whole-brain emulation, brain-computer interface, [advanced AI based on] a sophisticated embodied agent, or hybrid AI (see Table 4).
Definitions: As noted, these terms can be mapped onto 11 proposed pathways towards advanced AI, with their own terms for the resulting advanced AI systems.
Notably, there are significant differences in the prominence of these approaches—and the resources dedicated to them—at different frontier AI labs today. For instance, while some early work on the governance of advanced AI systems focused on AI systems that would (presumably) be built from first principles, bootstrapping,[ref 121] or neuro-emulated approaches (see Table 4), much of this work has more recently shifted to focus on understanding the risks from and pathways to aligning and governing advanced AI systems created through computational scaling.
This follows high-profile trends in leading AI labs. While (as discussed above) many research labs are not dedicated to a single paradigm, the last few years (and 2023 in particular) have seen a significant share of resources going towards computational scaling approaches, which have yielded remarkably robust (though not uncontested) performance improvements.[ref 122] As a result, the scaling approach has been prominent in informing the approaches of labs such as OpenAI,[ref 123] Anthropic,[ref 124] DeepMind,[ref 125] and Google Brain (now merged into Google DeepMind).[ref 126] This approach has also been prominent (though somewhat lagging) in some Chinese labs such as Baidu, Alibaba, Tencent, and the Beijing Institute for General Artificial Intelligence.[ref 127] Nonetheless, other approaches continue to be in use. For instance, neuro-inspired approaches have been prominent in DeepMind,[ref 128] Meta AI Research,[ref 129] and some Chinese[ref 130] and Japanese labs,[ref 131] and modular cognitive architecture approaches have informed the work by Goertzel’s OpenCog project,[ref 132] amongst others.
Suitability of overall definitional approach: In the context of analyzing advanced AI governance, there are both advantages and drawbacks to using concepts that focus on pathways of development.
Amongst the advantages of this approach are:
Benefit (1): Close(r) grounding in actual technical research agendas aimed at advanced AI: Defining advanced AI systems according to their (envisioned) development pathways has the benefit of keeping advanced AI governance debates more closely grounded in existing technical research agendas and programs, rather than the often more philosophical or ambiguous debates over the expected forms of advanced AI systems.
Benefit (2): Technological specificity allowing scoping of regulation to approaches of concern: Relatedly, this also allows better regulatory scoping of the systems of concern. After all, the past decade has seen a huge variety amongst AI techniques and approaches, not just in terms of their efficacy but also in terms of the issues they raise, with particular technical approaches raising distinct (safety, interpretability, robustness) issues.[ref 133] At the same time, this link between particular techniques and particular issues may have become less relevant in the last few years, given the success of scaling-based approaches at creating remarkably versatile and general-purpose systems.
However, taking the pathways-focused approach to defining advanced AI has its own challenges:
Drawback (1): Brittleness as technological specificity imports assumptions about pathways towards advanced AI: The pathway-centric approach may import strong assumptions about what the relevant pathways towards advanced AI are. As such, governance on this basis may not be robust to ongoing changes or shifts in the field.
Drawback (2): Suitability of terms within this approach: Given this, development-based definitions of pathways towards advanced AI seem particularly valuable if the purpose of definition is technical research but may be less relevant if the purpose is sociotechnical analysis or regulation. Technical definitions of AI might therefore provide an important baseline or touchstone for analysis in many other disciplines, but they may not be fully sufficient or analytically enlightening to many fields of study dealing with the societal consequences of the technology’s application or with avenues for governing these.
At any rate, one interesting feature of development-based definitions of advanced AI is that the choice of approach (and term) to focus on has significant and obvious downstream implications for framing the policy agendas for advanced AI—in terms of the policy issues to address, the regulatory “surface” of advanced AI (e.g., the necessary inputs or resources to pursue research along a certain pathway), and the most feasible or appropriate tools. For instance, a focus on neuro-integrationist-produced brain-computer interfaces suggests that policy issues for advanced AI will focus less on questions of value alignment[ref 134] and more on (biomedical) questions of human consent, liability, privacy, (employer) neurosurveillance,[ref 135] and/or morphological freedom.[ref 136] A focus on embodiment-based approaches towards robotic agents raises more established debates from robot law.[ref 137] Conversely, if one expects that the pathway towards advanced AI still requires underlying scientific breakthroughs, either from first principles or through a hybrid approach, this would imply that very powerful AI systems could be developed suddenly by small teams or labs that lack large compute budgets.
Similarly, focusing on scaling-based approaches—which seems most suitable given the prominence and success of this approach in driving the recent wave of AI progress—leads to a “compute-based” perspective on the impacts of advanced AI.[ref 138] This suggests that the key tools and levers for effective governance should focus on compute governance—provided we assume that this will remain a relevant or feasible precondition for developing frontier AI. For instance, such an approach underpins the compute-threshold definition for frontier AI, which defines advanced AI with reference to particular technical elements or inputs (such as a compute usage or FLOP threshold, dataset size, or parameter count) used in its development.[ref 139] Although a useful referent, such an input threshold may be an unstable proxy, since it may not reliably or stably correspond to the particular capabilities of concern.
3. Sociotechnical-change based definitions: Societal impacts of advanced AI
Focus of approach: A third cluster of definitions in advanced AI governance mostly brackets out philosophical questions of the precise form of AI systems or engineering questions of the scientific pathways towards their development. Rather, it aims at defining advanced AI in terms of different levels of societal impacts.
Many concepts in this approach have emerged from scholarship that aimed to abstract away from these architectural questions and rather explore the aggregate societal impacts of advanced AI. This includes work on AI technology’s international, geopolitical impacts[ref 140] as well as work on identifying relevant historical precedents for the technology’s societal impacts, strategic stakes, and political economy.[ref 141] Examples of this work are those that identified novel categories of unintended “structural” risks from AI as distinct from “misuse” or “accident” risks,[ref 142] or taxonomies of the different “problem logics” created by AI systems.[ref 143]
Terms: The societal-impact-centric approach to defining advanced AI includes a variety of terms, including: (strategic) general-purpose technology, general-purpose military transformation, transformative AI, radically transformative AI, AGI (economic competitiveness definition), and machine superintelligence.
Definitions and themes: While many of these terms are subject to a wide range of different definitions (see Appendix 1C), they again feature a range of common themes or patterns (see Table 5).
Suitability of approach: Concepts within the sociotechnical-change-based approach may be unsuitable […]
Focus of approach: Finally, a fourth cluster of terms follows a risk-based approach and focuses on critical capabilities, which certain types of advanced AI systems (whatever their underlying form or scientific architecture) might achieve or enable for human users. The development of such capabilities could then mark key thresholds or inflection points in the trajectory of society.
Other taxonomies: Work focused on the significant potential impacts or risks of advanced AI systems is of course hardly new.[ref 150] Yet in recent years, as AI capabilities have progressed, there has been renewed and growing concern that these advances are beginning to create key threshold moments where sophisticated AI systems develop capabilities that allow them to achieve or enable highly disruptive impacts in particular domains, resulting in significant societal risks. These risks may be as diverse as the capabilities in question—and indeed discussions of these risks do not always or even mostly presume (as do many form-centric approaches) the development of general capabilities in AI.[ref 151] For instance, many argue that existing AI systems may already contribute to catastrophic risks in various domains:[ref 152] large language models (LLMs) and automated biological design tools (BDTs) may already be used to enable weaponization and misuse of biological agents,[ref 153] the military use of AI systems in diverse roles may inadvertently affect strategic stability and contribute to the risk of nuclear escalation,[ref 154] and existing AI systems’ use in enabling granular and at-scale monitoring and surveillance[ref 155] may already be sufficient to contribute to the rise of “digital authoritarianism”[ref 156] or “AI-tocracy”,[ref 157] to give a few examples.
As AI systems become increasingly advanced, they may steadily and increasingly achieve or enable further critical capabilities in different domains that could be of special significance. Indeed, as leading LLM-based AI systems have advanced in their general-purpose abilities, they have frequently demonstrated emergent abilities that are surprising even to their developers.[ref 158] This has led to growing concern that, as these models continue to be scaled up,[ref 159] some future generation of these systems could develop unexpected but highly dangerous capabilities if not cautiously evaluated.[ref 160]
What are these critical capabilities?[ref 161] In some existing taxonomies, critical capabilities could include AI systems reaching key levels of performance in domains such as cyber-offense, deception, persuasion and manipulation, political strategy, building or gaining access to weapons, long-horizon planning, building new AI systems, situational awareness, self-proliferation, censorship, or surveillance,[ref 162] amongst others. Other experts have been concerned about cases where AI systems display increasing tendencies and aptitudes towards controlling or power-seeking behavior.[ref 163] Still other overviews identify further sets of hazardous capabilities.[ref 164] In all these cases, the concern is that advanced AI systems that achieve these capabilities (regardless of whether they are fully general, autonomous, etc.) could enable catastrophic misuse by human owners, or could demonstrate unexpected extreme—even hazardous—behavior, even against the intentions of their human principals.
Terms: Within the risk-based approach, there are a range of domains that could be upset by critical capabilities. A brief survey (see Table 6) can identify at least eight such capability domains—moral/philosophical, economic, legal, scientific, strategic or military, political, exponential, and (extremely) dangerous.[ref 165] Namely, these include:[ref 166]
- Concepts relating to AI systems that achieve or enable critical moral and/or philosophical capabilities include artificial/machine consciousness, digital minds, digital people, sentient artificial intelligence, robot rights catastrophe, (negative) synthetic phenomenology, suffering risks, and adversarial technological maturity.
- Concepts relating to AI systems that achieve or enable critical economic capabilities include high-level machine intelligence, tech company singularity, and artificial capable intelligence (ACI).
- Concepts relating to AI systems that achieve or enable critical legal capabilities include advanced artificial judicial intelligence, technological-legal lock-in, and legal singularity.
- Concepts relating to AI systems that achieve or enable critical scientific capabilities include process-automating science and technology and scientist model.
- Concepts relating to AI systems that achieve or enable critical strategic and/or military capabilities include decisive strategic advantage and singleton.
- Concepts relating to AI systems that achieve or enable critical political capabilities include stable totalitarianism, value lock-in, and actually existing AI.
- Concepts relating to AI systems that achieve or enable critical exponential capabilities include intelligence explosion, autonomous replication in the real world, autonomous AI research, and duplicator.
- Concepts relating to AI systems that achieve or enable (extremely) hazardous capabilities include advanced AI, high-risk AI systems, AI systems of concern, prepotent AI, APS systems, WIDGET, rogue AI, runaway AI, frontier (AI) model (under two definitional thresholds), and highly-capable systems of concern.
Definitions and themes: As noted, many of these terms have different definitions (see Appendix 1D). Nonetheless, a range of common themes and patterns can be distilled (see Table 6).
Suitability of approach: There are a range of benefits and drawbacks to defining advanced AI systems by their (critical) capabilities. These include (in no particular order):
Benefit (1): Focuses on key capability development points of most concern: A first benefit of adopting the risk-based definitional approach is that these concepts can be used, alone or in combination, to focus on the key thresholds or transition points in AI development that we most care about—not just the aggregate eventual, long-range societal outcomes nor the (eventual) “final” form of advanced AI, but rather the key intermediate (technical) capabilities that would suffice to create (or enable actors to achieve) significant societal impacts: the points of no return.
Benefit (2): Highlighting risks and capabilities can more precisely inform the public understanding: Ensuring that terms for advanced AI systems clearly center on particular risks or capabilities can help the public and policymakers understand the risks or challenges to be avoided, in a way that is far clearer than terms that focus on very general abilities or which are highly technical (i.e., terms within essence- or development-based approaches, respectively). Such terms may also assist the public in comparing the risks of one model to those posed by another.[ref 169]
Benefit (3): Generally (but not universally) clearer or more concrete: While some terms within this approach are quite vague (e.g., “singleton”) or potentially difficult to operationalize or test for (e.g., “artificial consciousness”), some of the more specific and narrow terms within this approach could offer more clarity, and less definitional drift, to regulation. While many of them would need significant further clarification before they could be suitable for use in legislative texts (whether domestic laws or international treaties), they may offer the basis for more circumscribed, tightly defined professional cornerstone concepts for such regulation.[ref 170]
However, there are also a number of potential drawbacks to risk-based definitions.
Drawback (1): Epistemic challenges around “unknown unknown” critical capabilities: One general challenge to this risk-based approach for characterizing advanced AI is that, in the absence of more specific and empirical work, it can be hard to identify and enumerate all relevant risk capabilities in advance (or to know that we have done so). Indeed, aiming to exhaustively list out all key capabilities to watch for could be a futile exercise to undertake.[ref 171] At the same time, this is a challenge that is arguably faced in any domain of (technology) risk mitigation, and it does not mean that doing such analysis to the best of our abilities is pointless. However, this challenge does create an additional hurdle for regulation, as it heightens the chance that if the risk profile of the technology rapidly changes, regulators or existing legal frameworks will be unsure of how or where to classify a given model.
Drawback (2): Challenges around comparing or prioritizing between risk capabilities: A related challenge lies in the difficulty of knowing which (potential) capabilities to prioritize for regulation and policy. However, that need not be a general argument against this approach. Instead, it may simply help us make explicit the normative and ethical debates over what challenges to avoid and prioritize.
Drawback (3): Utilizing many parallel terms focused on different risks can increase confusion: One risk for this approach is that while the use of many different terms for advanced AI systems, depending on their specific critical capabilities in particular domains, can make for more appropriate and context-sensitive discussions (and regulation) within those domains, at an aggregate level this may increase the range of terms that regulators and the public have to reckon with and compare between—with the risk that these actors simply drown in the range of terms.
Drawback (4): Outstanding disagreements over appropriate operationalization of capabilities: One further challenge with these terms may lie in the way that some key terms remain contested or debated—and that even clearer terms are not without challenge. For instance, in 2023, the concept of “frontier model” became subject to increasing debate over its potential adequacy for regulation.[ref 172] Notably, there are at least three ways of operationalizing this concept. The first, a computational threshold, has been discussed above.[ref 173]
However, a second operationalization for frontier AI focuses on some relative-capabilities threshold. This approach includes recent proposals to define “frontier AI models” in terms of capabilities relative to other AI systems,[ref 174] as models that “exceed the capabilities currently present in the most advanced existing models” or as “models that are both (a) close to, or exceeding, the average capabilities of the most capable existing models.”[ref 175] Taking such a comparative approach to defining advanced AI may be useful in combating the easy tendency of observers to normalize or become used to the rapid pace of AI capability progress.[ref 176] Yet there may be risks with such a comparative approach, especially when tied to a moving wavefront of “the most capable” existing models, as this could easily impose a need on regulators to engage in constant regulatory updating, as well as creating risks of underinclusivity of some foundation models that did not display hazardous capabilities in their initial evaluations, but which once deployed or shared might be reused or recombined in ways that could create or enable significant harms.[ref 177] The risk of embedding this definition of frontier AI in regulation would be to leave a regulatory gap around significantly harmful capabilities, especially those that are no longer at the technical “frontier” but which remain unaddressed even so. Indeed, for similar reasons, Seger and others have advocated using the concept “highly-capable foundation models” instead.[ref 178]
A third approach to defining frontier AI models has instead focused more on identifying a set of static and absolute criteria grounded in particular dangerous capabilities (i.e., a dangerous-capabilities threshold). Such definitions might be useful insofar as they help regulators or consumers identify more readily when a model crosses a safety threshold, and in a way that is less susceptible to slippage or change over time. This could make such concepts more suitable (and resulting regulations less at risk of obsolescence or governance misspecification) than operationalizations of “frontier AI model” that rely on indirect technological metrics (such as compute thresholds) as proxies for these capabilities. Even so, as discussed above, anchoring the “frontier AI model” concept on particular dangerous capabilities leaves open questions around how to best operationalize and create evaluation suites that are able to identify or predict such capabilities ex ante.
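To make the contrast between these three operationalizations concrete, the following is a minimal sketch that expresses each as a predicate over a model. Everything in it is hypothetical and chosen only for illustration: the `Model` fields, the compute threshold, the `TOP_K` and `TOLERANCE` parameters used to approximate “close to, or exceeding, the average capabilities of the most capable existing models”, the capability labels, and every model and number. None of it reflects an actual regulation, benchmark, or system; it simply shows how an input proxy, a moving wavefront, and a static capability list can sort the same models differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A hypothetical AI model; every field value used below is illustrative."""
    name: str
    training_flop: float                             # training compute in FLOP (hypothetical)
    capability_score: float                          # aggregate capability score (hypothetical)
    dangerous_capabilities: frozenset = frozenset()  # evaluation findings (hypothetical)


# Hypothetical regulatory parameters, chosen for illustration only.
COMPUTE_THRESHOLD_FLOP = 1e26
TOP_K = 3         # how many of "the most capable existing models" to average
TOLERANCE = 5.0   # how "close to" that average a model must be
CAPABILITIES_OF_CONCERN = frozenset({"cyber-offense", "self-proliferation", "weapons-acquisition"})


def frontier_by_compute(model: Model) -> bool:
    """Operationalization 1: a fixed input (compute) threshold used as a proxy."""
    return model.training_flop >= COMPUTE_THRESHOLD_FLOP


def frontier_by_relative_capability(model: Model, existing: list) -> bool:
    """Operationalization 2: close to, or exceeding, the average capability of
    the TOP_K most capable existing models, i.e. a moving wavefront."""
    top = sorted((m.capability_score for m in existing), reverse=True)[:TOP_K]
    wavefront = sum(top) / len(top)
    return model.capability_score >= wavefront - TOLERANCE


def frontier_by_dangerous_capability(model: Model) -> bool:
    """Operationalization 3: a static, absolute dangerous-capabilities threshold."""
    return bool(model.dangerous_capabilities & CAPABILITIES_OF_CONCERN)


# Hypothetical model populations at two points in time.
model_a = Model("A", 2e25, 80.0)
model_b = Model("B", 5e25, 85.0)
model_c = Model("C", 1.2e26, 90.0)
existing_then = [model_a, model_b, model_c]
existing_later = existing_then + [
    Model("E", 3e26, 95.0),
    Model("F", 5e26, 97.0),
    Model("G", 8e26, 99.0),
]

# A small, hypothetical model whose evaluations reveal a capability of concern.
model_d = Model("D", 3e24, 70.0, frozenset({"cyber-offense"}))

print("D, compute threshold:", frontier_by_compute(model_d))                              # False
print("D, relative threshold:", frontier_by_relative_capability(model_d, existing_then))  # False
print("D, dangerous capability:", frontier_by_dangerous_capability(model_d))               # True
print("B, relative (then):", frontier_by_relative_capability(model_b, existing_then))      # True
print("B, relative (later):", frontier_by_relative_capability(model_b, existing_later))    # False
print("C, compute threshold:", frontier_by_compute(model_c))                              # True
print("C, dangerous capability:", frontier_by_dangerous_capability(model_c))               # False
```

Under these assumed numbers, model D falls outside both the compute and the relative definitions yet is caught by the dangerous-capabilities test, model B drops out of the relative definition purely because newer models raised the wavefront, and model C crosses the compute threshold without displaying any listed capability. The sketch deliberately leaves open the difficult step noted above: how the `dangerous_capabilities` findings would actually be established through evaluations before deployment.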
Given this, while the risk-based approach may be the most promising ground for defining advanced AI systems from a regulatory perspective, it is clear that not all terms in use in this approach are equally suitable, and many require further operationalization and clarification.
III. Defining the advanced AI governance epistemic community
Beyond the object of concern of “advanced AI” (in all its diverse forms), researchers in the emerging field concerned with the impacts and risks of advanced AI systems have begun to specify a range of other terms and concepts, relating to the tools for intervening in and on the development of advanced AI systems in socially beneficial ways, terms by which this community’s members conceive of the overarching approach or constitution of their field, and theories of change.
1. Defining the tools for policy intervention
First, those writing about the risks and regulation of AI have proposed a range of terms to describe the tools, practices, or nature of governance interventions that could be used in response (see Table 7).
Like the term “advanced AI,” these terms set out objects of study, in this case by scoping the practices or tools of AI governance. They matter insofar as they link the study of advanced AI governance to concrete tools for intervention.
Nonetheless, these terms do not capture the methodological dimension of how different approaches to advanced AI governance have tackled these issues, nor the normative question of why different research communities have been driven to focus on the challenges from advanced AI in the first place.[ref 180]
2. Defining the field of practice: Paradigms
Thus, we can next consider different ways that practitioners have defined the field of advanced AI governance.[ref 181] Researchers have used a range of terms to describe the field of study that focuses on understanding the trajectory towards, forms of, or impacts of advanced AI, and on how to shape these. While these terms overlap significantly in practice, it is useful to distinguish some key terms or framings of the overall project (Table 8).
These terms reflect somewhat different focuses, emphases, and normative commitments, but this need not preclude an overall holistic approach. To be sure, researchers in this space often hold diverse expectations about the trajectory, form, or risks of future AI technologies; diverse normative commitments and motivations for studying these; and distinct research methodologies given their varied disciplinary backgrounds and epistemic precommitments.[ref 184] Even so, many of these communities remain united by a shared perception of the technology’s stakes: the view that shaping the impacts of AI is and should be a significant global priority.[ref 185]
As such, one takeaway here is not that scholars or researchers need to pick any one of these approaches or conceptions of the field. Rather, any advanced AI governance field has a significant need to maintain a holistic approach, one that includes many distinct motivations and methodologies. As suggested by Dafoe,
“AI governance would do well to emphasize scalable governance: work and solutions to pressing challenges which will also be relevant to future extreme challenges. Given all this potential common interest, the field of AI governance should be inclusive to heterogenous motivations and perspectives. A holistic sensibility is more likely to appreciate that the missing puzzle pieces for any particular challenge could be found scattered throughout many disciplinary domains and policy areas.”[ref 186]
In this light, one might consider and frame advanced AI governance as an inclusive and holistic field, concerned with, broadly, “the study and shaping of local and global governance systems—including norms, policies, laws, processes, and institutions—that affect the research, development, deployment, and use of existing and future AI systems, in ways that help the world choose the role of advanced AI systems in its future, and navigate the transition to that world.”
3. Defining theories of change
Finally, researchers in this field have been concerned not just with studying and understanding the strategic parameters of the development of advanced AI systems,[ref 187] but also with considering ways to intervene upon it, given particular assumptions or views about the form, trajectory, societal impacts, or risky capabilities of this technology.
Thus, various researchers have defined terms that aim to capture the connection between immediate interventions or policy proposals and the eventual goals they are meant to secure (see Table 9).
Drawing on these terms, one might also articulate new terms that incorporate elements from the above.[ref 196] For instance, one could define a “strategic approach” as a cluster of correlated views on advanced AI governance, encompassing (1) broadly shared assumptions about the key technical and governance parameters of the challenge; (2) a broad theory of victory and impact story about what solving this problem would look like; (3) a broadly shared view of history, with historical analogies to provide comparison, grounding, inspiration, or guidance; and (4) a set of intermediate strategic goals to be pursued, giving rise to near-term interventions that would contribute to reaching these.
Conclusion
The community focused on governing advanced AI systems has developed a rich and growing body of work. However, it has often lacked clarity, not only regarding many key empirical and strategic questions, but also regarding many of its fundamental terms. This includes different definitions for the relevant object of analysis—that is, species of “advanced AI”—as well as different framings for the instruments of policy, different paradigms or approaches to the field itself, and distinct understandings of what it means to have a theory of change to guide action.
This report has reviewed a range of terms for different analytical categories in the field. It has discussed three different purposes for seeking definitions for core terms, and why and how (under a “regulatory” purpose) the choice of terms matters to both the study and practice of AI governance. It then reviewed analytical definitions of advanced AI used across different clusters that focus on the forms or design of advanced AI systems, the (hypothesized) scientific pathways towards developing these systems, the technology’s broad societal impacts, and the specific critical capabilities achieved by particular AI systems. The report then briefly reviewed analytical definitions of the tools for intervention, such as “policy” and “governance,” before discussing definitions of the field and community itself and definitions for theories of change by which to prioritize interventions.
The field of advanced AI governance has shown a penchant for generating many concepts, often with competing definitions. While any emerging field will necessarily engage in a struggle to define itself, this field has seen a particularly broad range of terms, perhaps reflecting its disciplinary breadth. Eventually, the community may need to commit more intentionally and deliberately to some terms. In the meantime, those who engage in debate within and beyond the field should at least have greater clarity about the ways these concepts are used and understood, and about the (regulatory) implications of some of these terms. This report has aimed to provide such clarity, and with it the context for more informed and clear discussions of questions in and around the field.
Appendix 1: Lists of definitions for advanced AI terms
This appendix provides a detailed list of definitions for advanced AI systems, with sources. These may be helpful for readers to explore work in this field in more detail; to understand the longer history and evolution of many terms; and to consider the strengths and drawbacks of particular terms, and of specific language, for use in public debate, policy formulation, or even in direct legislative texts.
1.A. Definitions focused on the form of advanced AI
Different definitional approaches emphasize distinct aspects or traits that would characterize the form of advanced AI systems—such as that it is ‘mind-like’, performs ‘autonomously’, ‘is general-purpose’, ‘performs like a human’, ‘performs general-purpose like a human’, etc. However, it should be noted that there is significant overlap, and many of these terms are often (whether or not correctly) used interchangeably.
Advanced AI is mind-like & really thinks
- Strong AI
- An “appropriately programmed computer [that] really is a mind, in the sense that computers given the right programs can be literally said to understand and have other cognitive states.”[ref 198]
- “The assertion that machines could possibly act intelligently (or, perhaps better, act as if they were intelligent) is called the ‘weak AI’ hypothesis by philosophers, and the assertion that machines that do so are actually thinking (as opposed to simulating thinking) is called the ‘strong AI’ hypothesis.”[ref 199]
- “the combination of Artificial General Intelligence/Human-Level AI and Superintelligence.”[ref 200]
Advanced AI is autonomous
- Autonomous machine intelligence: “intelligent machines that learn more like animals and humans, that can reason and plan, and whose behavior is driven by intrinsic objectives, rather than by hard-wired programs, external supervision, or external rewards.”[ref 201]
- Autonomous artificial intelligence: “artificial intelligence that can adapt to external environmental challenges. Autonomous artificial intelligence can be similar to animal intelligence, called (specific) animal-level autonomous artificial intelligence, or unrelated to animal intelligence, called non-biological autonomous artificial intelligence.”[ref 202]
- General artificial intelligence: “broadly capable AI that functions autonomously in novel circumstances”.[ref 203]
Advanced AI is human-like
- Human-level AI (HLAI)
- “systems that operate successfully in the common sense informatic situation [defined as the situation where] the known facts are incomplete, and there is no a priori limitation on what facts are relevant. It may not even be decided in advance what phenomena are to be taken into account. The consequences of actions cannot be fully determined. The common sense informatic situation necessitates the use of approximate concepts that cannot be fully defined and the use of approximate theories involving them. It also requires nonmonotonic reasoning in reaching conclusions.”[ref 204]
- “machines exhibiting true human-level intelligence should be able to do many of the things humans are able to do. Among these activities are the tasks or ‘jobs’ at which people are employed. I suggest we replace the Turing test by something I will call the ‘employment test.’ To pass the employment test, AI programs must… [have] at least the potential [to completely automate] economically important jobs.”[ref 205]
- “AI which can reproduce everything a human can do, approximately. […] [this] can mean either AI which can reproduce a human at any cost and speed, or AI which can replace a human (i.e. is as cheap as a human, and can be used in the same situations.)”[ref 206]
- “An artificial intelligence capable of matching humans in every (or nearly every) sphere of intellectual activity.”[ref 207]
Advanced AI is general-purpose
- Foundation model
- “models trained on broad data at scale […] that are adaptable to a wide range of downstream tasks.”[ref 208]
- “AI systems with broad capabilities that can be adapted to a range of different, more specific purposes. […] the original model provides a base (hence ‘foundation’) on which other things can be built.”[ref 209]
- General-purpose AI systems (GPAIS)
- “an AI system that can be used in and adapted to a wide range of applications for which it was not intentionally and specifically designed.”[ref 210]
- “An AI system that can accomplish or be adapted to accomplish a range of distinct tasks, including some for which it was not intentionally and specifically trained.”[ref 211]
- “An AI system that can accomplish a range of distinct valuable tasks, including some for which it was not specifically trained.”[ref 212]
- See also “general-purpose AI models”: “AI models that are designed for generality of their output and have a wide range of possible applications.”[ref 213]
- Comprehensive AI services (CAIS)
- “asymptotically recursive improvement of AI technologies in distributed systems [which] contrasts sharply with the vision of self-improvement internal to opaque, unitary agents. […] asymptotically comprehensive, superintelligent-level AI services that—crucially—can include the service of developing new services, both narrow and broad, [yielding] a model of flexible, general intelligence in which agents are a class of service-providing products, rather than a natural or necessary engine of progress in themselves.”[ref 214]
Advanced AI is general-purpose & of human-level performance
- Artificial general intelligence (AGI) [task performance definitions][ref 215]
- “systems that exhibit the broad range of general intelligence found in humans.”[ref 216]
- “Artificial intelligence that is not specialized to carry out specific tasks, but can learn to perform as broad a range of tasks as a human.”[ref 217]
- AI systems with “the ability to achieve a variety of goals, and carry out a variety of tasks, in a variety of different contexts and environments.”[ref 218]
- AI systems which “can reason across a wide range of domains, much like the human mind.”[ref 219]
- “machines designed to perform a wide range of intelligent tasks, think abstractly and adapt to new situations.”[ref 220]
- “AI that is capable of solving almost all tasks that humans can solve.”[ref 221]
- “AIs that can generalize well enough to produce human-level performance on a wide range of tasks, including abstract low-data tasks.”[ref 222]
- “The AI that […] can do most everything we humans can do, and possibly much more.”[ref 223]
- “[a]n AI that has a level of intelligence that is either equivalent to or greater than that of human beings or is able to cope with problems that arise in the world that surrounds human beings with a degree of adequacy at least similar to that of human beings.”[ref 224]
- “an agent that has a world model that’s vastly more accurate than that of a human in, at least, domains that matter for competition over resources, and that can generate predictions at a similar rate or faster than a human.”[ref 225]
- “type of AI system that addresses a broad range of tasks with a satisfactory level of performance [or in a stronger sense] systems that not only can perform a wide variety of tasks, but all tasks that a human can perform.”[ref 226]
- “[AI with] cognitive capabilities fully generalizing those of humans.”[ref 227]
- See also the subdefinition of autonomous AGI (AAGI) as “an autonomous artificial agent with the ability to do essentially anything a human can do, given the choice to do so—in the form of an autonomously/internally determined directive—and an amount of time less than or equal to that needed by a human.”[ref 228]
- “a machine-based system that can perform the same general-purpose reasoning and problem-solving tasks humans can.”[ref 229]
- “an AI system that equals or exceeds human intelligence in a wide variety of cognitive tasks.”[ref 230]
- “AI systems that achieve or exceed human performance across a wide range of cognitive tasks”.[ref 231]
- “hypothetical type of artificial intelligence that would have the ability to understand or learn any intellectual task that a human being can.”[ref 232]
- “a shorthand for any intelligence […] that is flexible and general, with resourcefulness and reliability comparable to (or beyond) human intelligence.”[ref 233]
- “systems that demonstrate broad capabilities of intelligence as […] [a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience], with the additional requirement, perhaps implicit in the work of the consensus group, that these capabilities are at or above human-level.”[ref 234]
- “autonomous artificial intelligence that reaches Human-level intelligence. It can adapt to external environmental challenges and complete all tasks that humans can accomplish, achieving human-level intelligence in all aspects.”[ref 235]
- Robust artificial intelligence: “intelligence that, while not necessarily superhuman or self-improving, can be counted on to apply what it knows to a wide range of problems in a systematic and reliable way, synthesizing knowledge from a variety of sources such that it can reason flexibly and dynamically about the world, transferring what it learns in one context to another, in the way that we would expect of an ordinary adult.”[ref 236]
Advanced AI is general-purpose & beyond-human-performance
- AI+: “artificial intelligence of greater than human level (that is, more intelligent than the most intelligent human)”[ref 237]
- (Machine/Artificial) superintelligence (ASI):
- “an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.”[ref 238]
- “any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest.”[ref 239]
- “an AI significantly more intelligent than humans in all respects.”[ref 240]
- “Artificial intelligence that can outwit humans in every (or almost every) intellectual sphere.”[ref 241]
- “future AI systems dramatically more capable than even AGI.”[ref 242]
- “Artificial General Intelligence that has surpassed humans in all aspects of human intelligence.”[ref 243]
- “AI that might be as much smarter than us as we are smarter than insects.”[ref 244]
- See also “machine superintelligence” [form and impact]: “general artificial intelligence greatly outstripping the cognitive capacities of humans, and capable of bringing about revolutionary technological and economic advances across a very wide range of sectors on timescales much shorter than those characteristic of contemporary civilization.”[ref 245]
- Superhuman general-purpose AI (SGPAI): “general purpose AI systems […] that are simultaneously as good or better than humans across nearly all tasks.”[ref 246]
- Highly capable foundation models: “Foundation models that exhibit high performance across a broad domain of cognitive tasks, often performing the tasks as well as, or better than, a human.”[ref 247]
1.B. Definitions focused on the pathways towards advanced AI
First-principles pathways: “De novo AGI”
Pathways based on new fundamental insights in computer science, mathematics, algorithms, or software, producing advanced AI systems that may, but need not, mimic human cognition.[ref 248]
- De novo AGI: “AGI built from the ground up.”[ref 249]
Scaling pathways: “Prosaic AGI”, “frontier (AI) model” [compute threshold]
Approaches based on “brute forcing” advanced AI,[ref 250] by running (one or more) existing AI approaches (such as transformer-based LLMs)[ref 251] with increasingly more computing power and/or training data, as per the “scaling hypothesis.”[ref 252]
- Prosaic AGI: AGI “which can replicate human behavior but doesn’t involve qualitatively new ideas about ‘how intelligence works.’”[ref 253]
- Frontier (AI) model [compute threshold]:[ref 254]
- “foundation model that is trained with more than some amount of computational power—for example, 10^26 FLOP.”[ref 255]
- “models within one order of magnitude of GPT-4 (>2e24 FLOP).”[ref 256]
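The two compute-threshold definitions above differ in kind: the first sets a fixed, absolute cutoff (for example, 10^26 FLOP), while the second is relative, measured against a reference model whose compute rises as the frontier advances. The Python sketch below illustrates that difference under stated assumptions. The function names, the encoding of “within one order of magnitude” as `reference_flop / 10`, and every FLOP figure other than 10^26 and 2e24 are hypothetical illustrations, not drawn from any cited proposal or regulation; the 2e25 FLOP reference is chosen only so that the relative cutoff equals the 2e24 FLOP figure quoted above, and is not a claim about any particular model’s training compute.

```python
# Hypothetical sketch contrasting an absolute and a relative compute threshold.
# Names and FLOP figures are illustrative assumptions, not regulatory text.

ABSOLUTE_THRESHOLD_FLOP = 1e26  # example fixed cutoff from the first definition


def is_frontier_absolute(training_flop: float) -> bool:
    """Absolute test: a fixed cutoff that does not move over time."""
    return training_flop > ABSOLUTE_THRESHOLD_FLOP


def is_frontier_relative(training_flop: float, reference_flop: float) -> bool:
    """Relative test: within one order of magnitude of a reference frontier model.

    Assumes "within one order of magnitude" means more than reference_flop / 10.
    """
    return training_flop > reference_flop / 10


model_flop = 3e24  # hypothetical training run

print(is_frontier_absolute(model_flop))        # False: below the fixed 1e26 cutoff
print(is_frontier_relative(model_flop, 2e25))  # True: relative cutoff is 2e24
# If the hypothetical reference later rises to 1e26 FLOP, the same model drops out.
print(is_frontier_relative(model_flop, 1e26))  # False: relative cutoff is now 1e25
```

Under the relative test, the same hypothetical 3e24 FLOP model falls out of scope once the reference rises, without any change to the model itself. This is one way to see why a definition tied to a moving wavefront may demand constant regulatory updating, whereas the absolute test changes only if the cutoff itself is amended.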
Evolutionary pathways: “[AGI] from evolution”
Approaches based on algorithms competing to mimic the evolutionary brute-force search process that produced human intelligence.[ref 257]
- [AGI] from evolution: “[AGI re-evolved through] genetic algorithms on computers that are sufficiently fast to recreate on a human timescale the same amount of cumulative optimization power that the relevant processes of natural selection instantiated throughout our evolutionary past.”[ref 258]
Reward-based pathways: “[AGI] from powerful reinforcement learning agents”, “powerful deep learning models”
Approaches based on running reinforcement learning systems with simple rewards in rich environments.
- [AGI] from powerful reinforcement learning agents: “powerful reinforcement learning agents, when placed in complex environments, will in practice give rise to sophisticated expressions of intelligence.”[ref 259]
- Powerful deep learning models: “a powerful neural network model [trained] to simultaneously master a wide variety of challenging tasks (e.g. software development, novel-writing, game play, forecasting, etc.) by using reinforcement learning on human feedback and other metrics of performance.”[ref 260]
Bootstrapping pathways:[ref 261] “Seed AI”
Approaches that pursue a minimally intelligent core system capable of subsequent recursive (self)-improvement,[ref 262] potentially leveraging hardware or data “overhangs.”[ref 263]
- Seed AI:
- “an AI designed for self-understanding, self-modification, and recursive self-improvement.”[ref 264]
- “a strongly self-improving process, characterized by improvements to the content base that exert direct positive feedback on the intelligence of the underlying improving process.”[ref 265]
- “The first AI in a series of recursively self-improving systems.”[ref 266]
Neuro-emulated pathways: “Whole-brain-emulation” (WBE)
Approaches that aim to digitally simulate or recreate the states of human brains at a fine-grained level.
- Whole-brain-emulation (WBE):
- “software (and possibly dedicated non-brain hardware) that models the states and functional dynamics of a brain at a relatively fine-grained level of detail.”[ref 271]
- “The process of making an exact computer-simulated copy of the brain of a particular animal (e.g., a particular human).”[ref 272]
- Digital people [emulation definition]: “a computer simulation of a specific person, in a virtual environment […] perhaps created via mind uploading (simulating human brains) [or] entities unlike us in many ways, but still properly thought of as ‘descendants’ of humanity.”[ref 273]
- See also related terms: “Ems”[ref 274] or “uploads”.
Neuro-integrationist pathways: “Brain-computer-interfaces” (BCI)
Approaches to creating advanced AI based on merging components of human and digital cognition.
- Brain-computer-interfaces (BCI): “use brain-computer interfaces to position both elements, human and machine, to achieve (or overachieve) human goals.”[ref 275]
Embodiment pathways:[ref 276] “Embodied agent”
Approaches based on providing the AI system with a robotic physical “body” to ground cognition and enable it to learn from direct experience of the world.[ref 277]
- “an embodied agent (e.g., a robot) which learns, through interaction and exploration, to creatively solve challenging tasks within its environment.”[ref 278]
Modular cognitive architecture pathways
Approaches used in various fields, including robotics, in which researchers integrate well-tested but distinct state-of-the-art modules (perception, reasoning, etc.) to improve agent performance without independent learning.[ref 279]
- No clear single term.
Hybrid pathways
Approaches that rely on combining deep neural network-based approaches to AI with other paradigms (such as symbolic AI).
- Hybrid AI: “hybrid, knowledge-driven, reasoning-based approach, centered around cognitive models.”[ref 280]
1.C. Definitions focused on the aggregate societal impacts of advanced AI
(Strategic) general-purpose technology (GPT)
- “[AI systems] having an unusually broad and deep impact on the world, comparable to that of electricity, the internal combustion engine, and computers.”[ref 281]
- This has been further operationalized as: “[this] need not emphasize only agent-like AI or powerful AI systems, but instead can examine the many ways even mundane AI could transform fundamental parameters in our social, military, economic, and political systems, from developments in sensor technology, digitally mediated behavior, and robotics. AI and associated technologies could dramatically reduce the labor share of value and increase inequality, reduce the costs of surveillance and repression by authorities, make global market structure more oligopolistic, alter the logic of the production of wealth, shift military power, and undermine nuclear stability.”[ref 282]
- See also strategic general-purpose technology: “A general purpose technology which has the potential to deliver vast economic value and substantially affect national security, and is consequently of central political interest to states, firms, and researchers.”[ref 283]
General-purpose military transformation (GMT)
- The process by which general-purpose technologies (such as electricity and AI) “influence military effectiveness through a protracted, gradual process that involves a broad range of military innovations and overall industrial productivity growth.”[ref 284]
Transformative AI (TAI):[ref 285]
- “potential future AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.”[ref 286]
- “AI powerful enough to bring us into a new, qualitatively different future.”[ref 287]
- “software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it).”[ref 288]
- “AI that can go beyond a narrow task … but falls short of achieving superintelligence.”[ref 289]
- “a range of possible advances with potential to impact society in significant and hard-to-reverse ways.”[ref 290]
- “Any AI technology or application with potential to lead to practically irreversible change that is broad enough to impact most important aspects of life and society.”[ref 291]
Radically transformative AI (RTAI)
- “any AI technology or application which meets the criteria for TAI, and with potential impacts that are extreme enough to result in radical changes to the metrics used to measure human progress and well-being, or to result in reversal of societal trends previously thought of as practically irreversible. This indicates a level of societal transformation equivalent to that of the agricultural or industrial revolutions.”[ref 292]
AGI [economic competitiveness definition]
- “highly autonomous systems that outperform humans at most economically valuable work.”[ref 293]
- “AI systems that power a comparably profound transformation (in economic terms or otherwise) as would be achieved in [a world where cheap AI systems are fully substitutable for human labor].”[ref 294]
- “future machines that could match and then exceed the full range of human cognitive ability across all economically valuable tasks.”[ref 295]
Machine superintelligence [form & impact definition]
- “general artificial intelligence greatly outstripping the cognitive capacities of humans, and capable of bringing about revolutionary technological and economic advances across a very wide range of sectors on timescales much shorter than those characteristic of contemporary civilization”[ref 296]
1.D. Definitions focused on critical capabilities of advanced AI systems
Systems with critical moral and/or philosophical capabilities
- Artificial/Machine consciousness:
- “machines that genuinely exhibit conscious awareness.”[ref 297]
- “Weakly construed, the possession by an artificial intelligence of a set of cognitive attributes that are associated with consciousness in humans, such as awareness, self-awareness, or cognitive integration. Strongly construed, the possession by an AI of properly phenomenological states, perhaps entailing the capacity for suffering.”[ref 298]
- Digital minds: “machine minds with conscious experiences, desires, and capacity for reasoning and autonomous decision-making […] [which could] enjoy moral status, i.e. rather than being mere tools of humans they and their interests could matter in their own right.”[ref 299]
- Digital people [capability definition]: “any digital entities that (a) had moral value and human rights, like non-digital people; (b) could interact with their environments with equal (or greater) skill and ingenuity to today’s people.”[ref 300]
- Sentient artificial intelligence: “artificial intelligence (capable of feeling pleasure and pain).”[ref 301]
- Robot rights catastrophe: The point where AI systems are sufficiently advanced that “some people reasonably regard [them] as deserving human or humanlike rights. [while] Other people will reasonably regard these systems as wholly undeserving of human or humanlike rights. […] Given the uncertainties of both moral theory and theories about AI consciousness, it is virtually impossible that our policies and free choices will accurately track the real moral status of the AI systems we create. We will either seriously overattribute or seriously underattribute rights to AI systems—quite possibly both, in different ways. Either error will have grave moral consequences, likely at a large scale. The magnitude of the catastrophe could potentially rival that of a world war or major genocide.”[ref 302]
- (Negative) synthetic phenomenology: “machine consciousness [that] will have preferences of their own, that […] will autonomously create a hierarchy of goals, and that this goal hierarchy will also become a part of their phenomenal self-model […] [such that they] will be able to consciously suffer,”[ref 303] creating a risk of an “explosion of negative phenomenology” (ENP) (“Suffering explosion”).[ref 304]
- Suffering risks: “[AI that brings] about severe suffering on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.”[ref 305]
- Adversarial technological maturity:
- “the point where there are digital people and/or (non-misaligned) AIs that can copy themselves without limit, and expand throughout space [creating] intense pressure to move – and multiply (via copying) – as fast as possible in order to gain more influence over the world.”[ref 306]
- “a world in which highly advanced technology has already been developed, likely with the help of AI, and different coalitions are vying for influence over the world.”[ref 307]
Systems with critical economic capabilities[ref 308]
- High-level machine intelligence (HLMI):
- “unaided machines [that] can accomplish every task better and more cheaply than human workers.”[ref 309]
- “an AI system (or collection of AI systems) that performs at the level of an average human adult on key cognitive measures required for economically relevant tasks.”[ref 310]
- “The spectrum of advanced AI capabilities from next-generation AI systems to artificial general intelligence (AGI). Often used interchangeably with advanced AI.”[ref 311]
- Tech company singularity: “a transition of a technology company into a fully general tech company [defined as] a technology company with the ability to become a world-leader in essentially any industry sector, given the choice to do so—in the form of agreement among its Board and CEO—with around one year of effort following the choice.”[ref 312]
- Artificial capable intelligence (ACI):
- “AI [that] can achieve complex goals and tasks with minimal oversight.”[ref 313]
- “a fast-approaching point between AI and AGI: ACI can achieve a wide range of complex tasks but is still a long way from being fully general.”[ref 314]
Systems with critical legal capabilities
- Advanced artificial judicial intelligence (AAJI): “an artificially intelligent system that matches or surpasses human decision-making in all domains relevant to judicial decision-making.”[ref 315]
- Technological-legal lock-in: “hybrid human/AI judicial systems [which] risk fostering legal stagnation and an attendant loss of judicial legitimacy.”[ref 316]
- Legal singularity: “when the accumulation of a massive amount of data and dramatically improved methods of inference make legal uncertainty obsolete. The legal singularity contemplates complete law. […] the elimination of legal uncertainty and the emergence of a seamless legal order, which is universally accessible in real time.”[ref 317]
Systems with critical scientific capabilities
- Process-automating science and technology (PASTA): “AI systems that can essentially automate all of the human activities needed to speed up scientific and technological advancement.”[ref 318]
- Scientist model: “a single unified transformative model […] which has flexible general-purpose research skills.”[ref 319]
Systems with critical strategic or military capabilities[ref 320]
- Decisive strategic advantage: “a position of strategic superiority sufficient to allow an agent to achieve complete world domination.”[ref 321]
- Singleton: [AI capabilities sufficient to support] “a world order in which there is a single decision-making agency at the highest level.”[ref 322]
Systems with critical political capabilities
- Stable totalitarianism: “AI [that] could enable a relatively small group of people to obtain unprecedented levels of power, and to use this to control and subjugate the rest of the world for a long period of time (e.g. via advanced surveillance).”[ref 323]
- Value lock-in:
- “an event [such as the use of AGI] that causes a single value system, or set of value systems, to persist for an extremely long time.”[ref 324]
- “AGI [that] would make it technologically feasible to (i) perfectly preserve nuanced specifications of a wide variety of values or goals far into the future, and (ii) develop AGI-based institutions that would (with high probability) competently pursue any such values for at least millions, and plausibly trillions, of years.”[ref 325]
- Actually existing AI (AEAI): A paradigm by which the broader ecosystem of AI development, on current trajectories, may produce harmful political outcomes, because “AI as currently funded, constructed, and concentrated in the economy—is misdirecting technological resources towards unproductive and dangerous outcomes. It is driven by a wasteful imitation of human comparative advantages and a confused vision of autonomous intelligence, leading it toward inefficient and harmful centralized architectures.”[ref 326]
Systems with critical exponential capabilities
- Intelligence explosion:[ref 327]
- “explosion to ever greater levels of intelligence, as each generation of machines creates more intelligent machines in turn.”[ref 328]
- “a chain of events by which human-level AI leads, fairly rapidly, to intelligent systems whose capabilities far surpass those of biological humanity as a whole.”[ref 329]
- Autonomous replication in the real world: “A model that is unambiguously capable of replicating, accumulating resources, and avoiding being shut down in the real world indefinitely, but can still be stopped or controlled with focused human intervention.”[ref 330]
- Autonomous AI research: “A model for which the weights would be a massive boost to a malicious AI development program (e.g. greatly increasing the probability that they can produce systems that meet other criteria for [AI Safety Level]-4 in a given timeframe).”[ref 331]
- Duplicator: [digital people or particular forms of advanced AI that would allow] “the ability to make instant copies of people (or of entities with similar capabilities) [leading to] explosive productivity.”[ref 332]
Systems with critical hazardous capabilities
Systems that pose or enable critical levels of (extreme or even existential) risk,[ref 333] regardless of whether they demonstrate a full range of human-level/like cognitive abilities.
- Advanced AI:
- “systems substantially more capable (and dangerous) than existing […] systems, without necessarily invoking specific generality capabilities or otherwise as implied by concepts such as ‘Artificial General Intelligence.’”[ref 334]
- “Systems that are highly capable and general purpose.”[ref 335]
- High-risk AI system:
- An AI system that is both “(a) … intended to be used as a safety component of a product, or is itself a product covered by the Union harmonisation legislation […] (b) the product whose safety component is the AI system, or the AI system itself as a product, is required to undergo a third-party conformity assessment with a view to the placing on the market or putting into service of that product […].”[ref 336]
- “AI systems that are used to control the operation of critical infrastructure… [in particular] highly capable systems, increasingly autonomous systems, and systems that cross the digital-physical divide.”[ref 337]
- AI systems of concern: “highly capable AI systems that are […] high in ‘Property X’ [defined as] intrinsic characteristics such as agent-like behavior, strategic awareness, and long-range planning.”[ref 338]
- Prepotent AI: “an AI system or technology is prepotent […] (relative to humanity) if its deployment would transform the state of humanity’s habitat—currently the Earth—in a manner that is at least as impactful as humanity and unstoppable to humanity.”[ref 339]
- APS Systems: AI systems with “(a) Advanced capabilities, (b) agentic Planning, and (c) Strategic awareness.”[ref 340] These systems may risk instantiating “MAPS”—“misaligned, advanced, planning, strategically aware” systems;[ref 341] also called “power-seeking AI”.[ref 342]
- WIDGET: “Wildly Intelligent Device for Generalized Expertise and Technical Skills.”[ref 343]
- Rogue AI:
- “an autonomous AI system that could behave in ways that would be catastrophically harmful to a large fraction of humans, potentially endangering our societies and even our species or the biosphere.”[ref 344]
- “a powerful and dangerous AI [that] attempts to execute harmful goals, irrespective of whether the outcomes are intended by humans.”[ref 345]
- Runaway AI: “advanced AI systems that far exceed human capabilities in many key domains, including persuasion and manipulation; military and political strategy; software development and hacking; and development of new technologies […] [these] superhuman AI systems might be designed to autonomously pursue goals in the real world.”[ref 346]
- Frontier (AI) model [relative-capabilities threshold]:
- “large-scale machine-learning models that exceed the capabilities currently present in the most advanced existing models, and can perform a wide variety of tasks.”[ref 347]
- “highly capable general-purpose AI models that can perform a wide variety of tasks and match or exceed the capabilities present in today’s most advanced models.”[ref 348]
- Frontier (AI) model [unexpected-capabilities threshold]:
- “Highly capable foundation models, which could have dangerous capabilities that are sufficient to severely threaten public safety and global security. Examples of capabilities that would meet this standard include designing chemical weapons, exploiting vulnerabilities in safety-critical software systems, synthesising persuasive disinformation at scale, or evading human control.”[ref 349]
- “models that are both (a) close to, or exceeding, the average capabilities of the most capable existing models, and (b) different from other models, either in terms of scale, design (e.g. different architectures or alignment techniques), or their resulting mix of capabilities and behaviours.”[ref 350]
- Highly-capable systems of concern:
- “Highly capable foundation models […] capable of exhibiting dangerous capabilities with the potential to cause significant physical and societal-scale harm”[ref 351]
Appendix 2: Lists of definitions for policy tools and field
2.A. Terms for tools for intervention
Strategy[ref 352]
- AI strategy research: “the study of how humanity can best navigate the transition to a world with advanced AI systems (especially transformative AI), including political, economic, military, governance, and ethical dimensions.”[ref 353]
- AI strategy: “the study of big picture AI policy questions, such as whether we should want AI to be narrowly or widely distributed and which research problems ought to be prioritized.”[ref 354]
- Long-term impact strategies: “shape the processes that will eventually lead to strong AI systems, and steer them in a safer direction.”[ref 355]
- Strategy: “the activity or project of doing research to inform interventions to achieve a particular goal. […] AI strategy is strategy from the perspective that AI is important, focused on interventions to make AI go better.”[ref 356]
- AI macrostrategy: “the study of high level questions having to do with prioritizing the use of resources on the current margin in order to achieve good AI outcomes.”[ref 357]
Policy
- AI policy: “concrete soft or hard governance measures which may take a range of forms such as principles, codes of conduct, standards, innovation and economic policy or legislative approaches, along with underlying research agendas, to shape AI in a responsible, ethical and robust manner.”[ref 358]
- AI policymaking strategy: “A research field that analyzes the policymaking process and draws implications for policy design, advocacy, organizational strategy, and AI governance as a whole.”[ref 359]
Governance
- AI governance:
- “AI governance (or the governance of artificial intelligence) is the study of norms, policies, and institutions that can help humanity navigate the transition to a world with advanced artificial intelligence. This includes a broad range of subjects, from global coordination around regulating AI development to providing incentives for corporations to be more cautious in their AI research.”[ref 360]
- “local and global norms, policies, laws, processes, politics, and institutions (not just governments) that will affect social outcomes from the development and deployment of AI systems.”[ref 361]
- “shifting and setting up incentive structures for actions to be taken to achieve a desired outcome [around AI].”[ref 362]
- “identifying and enforcing norms for AI developers and AI systems themselves to follow. […] AI governance, as an area of human discourse, is engaged with the problem of aligning the development and deployment of AI technologies with broadly agreeable human values.”[ref 363]
- “the study or practice of local and global governance systems—including norms, policies, laws, processes, and institutions—govern or should govern AI research, development, deployment, and use.”[ref 364]
- Collaborative governance of AI technology: “collaboration between stakeholders specifically in the legal governance of AI technology. The stakeholders could include representatives of governments, companies, or other established groups.”[ref 365]
- AGI safety and governance practices: “internal policies, processes, and organizational structures at AGI labs intended to reduce risk.”[ref 366]
2.B. Terms for the field of practice
AI governance
- “the field of AI governance studies how humanity can best navigate the transition to advanced AI systems, focusing on the political, economic, military, governance, and ethical dimensions.”[ref 367]
- “AI governance concerns how humanity can best navigate the transition to a world with advanced AI systems. It relates to how decisions are made about AI, and what institutions and arrangements would help those decisions to be made well.”[ref 368]
- “AI governance refers (1) descriptively to the policies, norms, laws, and institutions that shape how AI is built and deployed, and (2) normatively to the aspiration that these promote good decisions (effective, safe, inclusive, legitimate, adaptive). […] governance consists of much more than acts of governments, also including behaviors, norms, and institutions emerging from all segments of society. In one formulation, the field of AI governance studies how humanity can best navigate the transition to advanced AI systems.”[ref 369]
Transformative AI governance
- “[governance that] includes both long-term AI and any nearer-term forms of AI that could affect the long-term future [and likewise] includes governance activities in both the near-term and the long-term that could affect the long-term future.”[ref 370]
Longterm(ist) AI governance
- Long-term AI governance: “[governance that] includes both long-term AI and any nearer-term forms of AI that could affect the long-term future [and likewise] includes governance activities in both the near-term and the long-term that could affect the long-term future.”[ref 371]
- Longtermist AI governance:
- “longtermism-motivated AI governance / strategy / policy research, practice, advocacy, and talent-building.”[ref 372]
- “the subset of [AI governance] work that is motivated by a concern for the very long-term impacts of AI. This overlaps significantly with work aiming to govern transformative AI (TAI).”[ref 373]
- “longtermist AI governance […] which is intellectually and sociologically related to longtermism […] explicitly prioritizes attention to considerations central to the long-term trajectory for humanity, and thus often to extreme risks (as well as extreme opportunities).”[ref 374]
Appendix 3: Auxiliary definitions and terms
Beyond this, it is also useful to clarify a range of auxiliary definitions that can support analysis in the advanced AI governance field. These include, but are not limited to:[ref 375]
- Strategic parameters: Features of the world that significantly determine the strategic nature of the advanced AI governance challenge. These parameters serve as highly decision-relevant or even crucial considerations, determining which interventions or solutions are appropriate, necessary, viable, or beneficial for addressing the advanced AI governance challenge; accordingly, differing views on these strategic parameters constitute the cruxes that underlie different theories of action and approaches. This encompasses different types of parameters:
- technical parameters (e.g., advanced AI development timelines and trajectories, threat models, and the feasibility of alignment solutions),
- deployment parameters (e.g., the distribution and constitution of actors developing advanced AI systems), and
- governance parameters (e.g., the relative efficacy and viability of different governance instruments).[ref 376]
- Key actor: An actor whose key decisions will have significant impact on shaping the outcomes from advanced AI, either directly (first-order), or by strongly affecting such decisions made by other actors (second-order).
- Key decision: A choice or series of choices by a key actor to use its levers of governance in ways that directly affect beneficial advanced AI outcomes and that are hard to reverse. This can include direct decisions about deployment or testing during a critical moment, but it also includes many upstream decisions (such as decisions over whether to pursue risky capabilities).
- Lever (of governance):[ref 377] A tool or intervention that can be used by key actors to shape or affect (1) the primary outcome of advanced AI development, (2) key strategic parameters of advanced AI governance, and (3) other key actors’ choices or key decisions.
- Pathway (to influence): A tool or intervention by which other actors (that are not themselves key actors) can affect, persuade, induce, incentivize, or require key actors to make certain key decisions. This can include interventions that ensure that certain levers of control are (not) used, or used in particular ways.
- (Decision-relevant) asset: Resources that can be used by other actors in pursuing pathways of influence to key actors, and that aim to shape how these key actors make key decisions (e.g., about whether or how to use their levers). These include new technical research insights; worked-out policy products; and networks of direct influence, memes, or narratives.
- (Policy) product: A subclass of assets; specific legible proposals that can be presented to key actors.
- Critical moment(s): High-leverage[ref 378] moments where high-impact decisions are made by some actors on the basis of the available decision-relevant assets, which affect whether beneficial advanced AI outcomes are within reach. These critical moments may occur during a public “AI crunch time,”[ref 379] but they may also occur potentially long in advance (if they lock in choices or trajectories).
- “Beneficial” AI outcomes: The desired and/or non-catastrophic societal outcomes from AI technology. This is a complex normative question, which one may aim to derive by some external moral standard or philosophy,[ref 380] through social choice theory,[ref 381] or through some legitimate (e.g., democratic) process by key stakeholders themselves.[ref 382] However, this concept is often undertheorized and needs significantly more work, scholarship, and normative and public deliberation.
Also in this series:
- Maas, Matthijs, and José Jaime Villalobos. “International AI institutions: A literature review of models, examples, and proposals.” Institute for Law & AI, AI Foundations Report 1 (September 2023). https://law-ai.org/international-ai-institutions
- Maas, Matthijs. “AI is like… A literature review of AI metaphors and why they matter for policy.” Institute for Law & AI, AI Foundations Report 2 (October 2023). https://law-ai.org/ai-policy-metaphors
- Maas, Matthijs. “Advanced AI governance: A literature review.” Institute for Law & AI, AI Foundations Report 4 (November 2023). https://law-ai.org/advanced-ai-gov-litrev
Open-sourcing highly capable foundation models
Abstract
Recent decisions by leading AI labs either to open-source their models or to restrict access to them have sparked debate about whether, and how, increasingly capable AI models should be shared. Open-sourcing in AI typically refers to making model architecture and weights freely and publicly accessible for anyone to modify, study, build on, and use. This offers advantages such as enabling external oversight, accelerating progress, and decentralizing control over AI development and use. However, it also presents a growing potential for misuse and unintended consequences. This paper offers an examination of the risks and benefits of open-sourcing highly capable foundation models. While open-sourcing has historically provided substantial net benefits for most software and AI development processes, we argue that for some highly capable foundation models likely to be developed in the near future, open-sourcing may pose sufficiently extreme risks to outweigh the benefits. In such a case, highly capable foundation models should not be open-sourced, at least not initially. Alternative strategies, including non-open-source model sharing options, are explored. The paper concludes with recommendations for developers, standard-setting bodies, and governments for establishing safe and responsible model sharing practices and preserving open-source benefits where safe.
International AI institutions
Abstract
The question of how to ensure adequate international governance of artificial intelligence (AI) has come to the center of global attention. This literature review examines the range of institutional models that have been proposed as the basis for new international organizations focused on AI. It reviews and discusses these proposals under a taxonomy of seven distinct institutional models that have been offered by scholars and practitioners. The models we include in this review are (a) scientific consensus-building, (b) political consensus-building and norm-setting, (c) coordination of policy and regulation, (d) enforcement of standards or restrictions, (e) stabilization and emergency response, (f) international joint research, and (g) distribution of benefits or access.
For each model, we provide (a) a description of the model’s functions and types, (b) the most common examples of the model, (c) some examples that are somewhat underexplored in the literature but that show promise, (d) a review of proposals for applying that model to the international regulation of AI, and (e) critiques of the model both generally and in its potential application to AI. In sum, we review thirty-five commonly invoked examples of these institutional models, twenty-four rarely explored but promising alternate institutional examples, and forty-nine proposals for new AI institutions. Finally, we sketch five directions for further research.
Executive summary
This literature review examines a range of institutional models that have been proposed for the international governance of artificial intelligence (AI). The review specifically focuses on proposals that would involve creating new international institutions for AI. As such, it focuses on seven models for international AI institutions with distinct functions.
Part I consists of the literature review. For each model, we provide (a) a description of each model’s functions and types, (b) the most common examples of the model, (c) some underexplored examples that are not (often) mentioned in the AI governance literature but that show promise, (d) a review of proposals for applying that model to the international regulation of AI, and (e) critiques of the model both generally and in its potential application to AI.
Part II briefly discusses some considerations for further research concerning the design of international institutions for AI, including the effectiveness of each model at accomplishing its aims, treaty-based regulatory frameworks, other institutional models not covered in this review, the compatibility of institutional functions, and institutional options to host a new international AI governance body.
Overall, the review covers seven models, as well as thirty-five common examples of those models, twenty-four additional examples, and forty-nine proposals of new AI institutions based on those models. Table 1 summarizes these findings.[ref 1]
Introduction
Recent and ongoing progress in artificial intelligence (AI) technology has highlighted that AI systems will have increasingly significant global impacts. In response, the past year has seen intense attention to the question of how to regulate these technologies, both at domestic and international levels. As part of this process, there have been renewed calls for establishing new international institutions to carry out much-needed governance functions and anchor international collaboration on managing the risks as well as realizing the benefits of this technology.
This literature review examines and categorizes a wide range of institutions that have been proposed to carry out the international governance of AI.[ref 2] Before reviewing these models, however, it is important to situate proposals to establish a new international institution on AI within the broader landscape of approaches to the global governance of AI. Not all approaches to AI governance focus on creating new institutions. Rather, the institutional approach is only one of several different approaches to international AI governance—each of them concentrating on different governance challenges posed by AI, and each of them providing different solutions.[ref 3] These approaches include:
(1) Rely on unilateral extraterritorial regulation. The extraterritorial approach forgoes (or at least does not prioritize) the multilateral pursuit of international regimes, norms, or institutions. Rather, it aims to enact effective domestic regulations on AI developments and then rely on the direct or extraterritorial effects of such regulations to affect the conditions or standards for AI governance in other jurisdictions. As such, this approach includes proposals to first regulate AI within (key) countries, whether by existing laws,[ref 4] through new laws or standards developed by existing institutions, or through new domestic institutions (such as a US “AI Control Council”[ref 5] or a National Algorithms Safety Board[ref 6]). These national policy levers[ref 7] can unilaterally affect the global approach to AI, either directly (for instance, through the effect of export controls on chokepoints in the AI chip supply chains[ref 8]) or because of the way such regulations can spill over to other jurisdictions, as seen in discussions of a “Brussels Effect,” a “California Effect,” or even a “Beijing Effect.”[ref 9]
(2) Apply existing international institutions, regimes, or norms to AI. The norm-application-focused approach argues that because much of international law establishes broad, technology-neutral principles and obligations, and many domains are already subject to a wide set of overlapping institutional activities, AI technology is in fact already adequately regulated in international law.[ref 10] As such, AI governance does not need new institutions or novel institutional models; rather, the aim is to reassert, reapply, extend, and clarify long-existing international institutions and norms. This is one approach that has been taken (with greater or lesser success) to address the legal gaps initially created by some past technologies, such as submarine warfare,[ref 11] cyberwar,[ref 12] or data flows within the digital economy,[ref 13] amongst others. This also corresponds to the approach taken by many international legal scholars, who argue that states should simply recognize that AI is already covered and regulated by existing norms and doctrines in international law, such as the principles of International Human Rights Law,[ref 14] International Humanitarian Law, International Criminal Law,[ref 15] the doctrine of state responsibility,[ref 16] or other regimes.[ref 17]
(3) Adapt existing international institutions or norms to AI. This approach concedes that AI technology is not yet adequately or clearly governed under international law but holds that existing international institutions could still be adapted to take on this role and may already be doing so. This approach includes proposals that center on mapping, supporting, and extending the existing AI-focused activities of existing international regimes and institutions such as the IMO, ICAO, ITU,[ref 18] various UN agencies,[ref 19] or other international organizations.[ref 20] Others explore proposals for refitting existing institutions, such as expanding the G20 with a Coordinating Committee for the Governance of Artificial Intelligence[ref 21] or changing the mandate or composition of UNESCO’s International Research Centre on Artificial Intelligence (IRCAI) or the International Electrotechnical Commission (IEC),[ref 22] to take up a stronger role in AI governance. Finally, others explore how either states (through Explanatory Memoranda or treaty reservations) or treaty bodies (through Working Party Resolutions) could adapt existing treaty regimes to more clearly cover AI systems.[ref 23] The emphasis here is on a “decentralized but coordinated” approach that supports institutions in adapting to AI,[ref 24] rather than necessarily aiming to establish new institutions in an already-crowded existing international “regime complex.”[ref 25]
(4) Create new international institutions to regulate AI based on the model of past or existing institutions. The institution-re-creating approach argues that AI technology does need new, distinct international institutions to be adequately governed. However, in developing designs or making the case for such institutions, this approach often points to the precedent of past or existing international institutions and regimes that have a similar model.
(5) Create entirely novel international institutional models to regulate AI. This approach argues not only that AI technology needs new international institutions, but also that past or existing international institutions (mostly) do not provide adequate models to narrowly follow or mimic.[ref 26] This is potentially reflected in some especially ambitious proposals for comprehensive global AI regimes or in suggestions to introduce entirely new mechanisms (e.g., “regulatory markets”[ref 27]) to governance.
In this review, we specifically focus on proposals for international AI governance and regulation that involve creating new international institutions for AI. That is to say, our main focus is on approach 4 and, to a lesser extent, approach 5.
We focus on new institutions because they might be better positioned to respond to the novelty, stakes, and technical features of advanced AI systems.[ref 28] Indeed, the current climate of global attention on AI seems potentially more supportive of establishing new landmark institutions for AI than has been the case in past years. As AI capabilities progress at an unexpected rate, multiple government representatives and entities[ref 29] as well as international organizations[ref 30] have recently expressed support for a new international AI governance institution. Additionally, the idea of establishing such institutions has taken root among many of the leading actors in the AI industry.[ref 31]
Our review comes with two caveats. First, our focus on this institutional approach above others does not mean that pursuing the creation of new institutions is necessarily an easy strategy or more feasible than the other approaches listed above. Indeed, proposals for new treaty regimes or international institutions for AI, especially when they draw analogies with organizations that were set up decades ago, may often underestimate how much the ground of global governance has shifted in recent years. As such, they do not always reckon fully with the strong trends and forces in global governance which, for better or worse, have increasingly pushed states towards extending existing norms (approach 2) or adapting existing institutions (approach 3)[ref 32] rather than creating novel institutions. Likewise, further trends are shifting US policy towards pursuing international cooperation through nonbinding international agreements rather than treaties,[ref 33] and there are concerns that, by some measures, international organizations may be playing a less central role in international relations today than they have in the past.[ref 34] All of these trends should temper, or at least inform, proposals to establish new institutions.
Furthermore, even if one is determined to pursue establishing a new international institution along one of the models discussed here, many key questions remain open about the optimal route to designing and establishing that organization, including: (a) Given that many institutional functions might be required to adequately govern advanced AI systems, might there be a need for “hybrid” or combined institutions with a dual mandate, like the IAEA?[ref 35] (b) Should an institution be tightly centralized, or could it be relatively decentralized, with one or more new institutions orchestrating the AI policy activities of a constellation of many other (existing or new) organizations?[ref 36] (c) Should such an organization be established formally, or are informal club approaches adequate in the first instance?[ref 37] (d) Should decisions within such institutions be made by consensus or by simple majority? (e) What rules should govern adapting or updating the institution’s mission and mandate to track ongoing developments in AI? This review will briefly flag and discuss some of these questions in Part II but will leave many of them open for future research.
Regarding terminology, we use “international institution” and “international organization” interchangeably and broadly to refer to any of (a) formal intergovernmental organizations (FIGOs) established through a constituent document (e.g., WTO, WHO); (b) treaty bodies or secretariats that have a more limited mandate, primarily supporting the implementation of a treaty or regime (e.g., BWC Implementation Support Unit); and (c) “informal IGOs” (IIGOs) that consist of loose “task groups” and coalitions of states (e.g., the G7, BRICS, G20).[ref 38] We use “model” to refer to the general cluster of institutions under discussion and “function” to refer to a given institutional model’s purpose or role. We use “AI proposals” to refer to the specific institutional models that are proposed for international AI governance.
I. Review of institutional models
Below, we review a range of institutional models that have been proposed for AI governance. For each model, we discuss its general functions, different variations or forms of the model, a range of examples that are frequently invoked, and explicit AI governance proposals that follow the model. In addition, we will highlight additional examples that have not received much attention but that we believe could be promising. Finally, where applicable, we will highlight existing critiques of a given model.
Model 1: Scientific consensus-building
1.1 Functions and types: The functions of the scientific consensus-building institutional model are to (a) increase general policymaker and public awareness of an issue, and especially to (b) establish a scientific consensus on an issue. This is meant to foster greater common knowledge or a shared perception of an issue among states, in order to motivate national action or enable international agreements. Overall, the goal of institutions following this model is not to establish an international consensus on how to respond or to hand down regulatory recommendations directly, but simply to provide a basic knowledge base to underpin the decisions of key actors. By design, these institutions are, or aim to be, non-political, as in the IPCC’s mantra to be “policy-relevant and yet policy-neutral, never policy-prescriptive.”[ref 39]
1.2 Common examples: Commonly cited examples of scientific consensus-building institutions include most notably the Intergovernmental Panel on Climate Change (IPCC),[ref 40] the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES),[ref 41] and the Scientific Assessment Panel (SAP) of the United Nations Environment Programme (UNEP).[ref 42]
1.3 Underexplored examples: An example that has not yet been invoked in the literature but that could be promising to explore is the Antarctic Treaty’s Committee for Environmental Protection (CEP), which provides expert advice to the Antarctic Treaty Consultative Meetings and which combines scientific consensus-building functions with risk-management functions, supporting the Protocol on Environmental Protection to the Antarctic Treaty.[ref 43] Another example could be the World Meteorological Organization (WMO), which monitors weather and climate trends and makes that information available.
1.4 Proposed AI institutions along this model: There have been a range of proposals for scientific consensus-building institutions for AI. Indeed, in 2018 the precursor initiative to what would become the Global Partnership on AI (GPAI) was initially envisaged by France and Canada as an Intergovernmental Panel on AI (IPAI) along the IPCC model.[ref 44] This proposal was supported by many researchers: Kemp and others suggest an IPAI that could measure, track, and forecast progress in AI, as well as its use and impacts, to “provide a legitimate, authoritative voice on the state and trends of AI technologies.”[ref 45] They argue that an IPAI could perform structural assessments every three years as well as take up quick-response special-issue assessments. In a contemporaneous paper, Mialhe proposes an IPAI model as an institution that would gather a large and global group of experts “to inform dialogue, coordination, and pave the way for efficient global governance of AI.”[ref 46]
More recently, Ho and others propose an intergovernmental Commission on Frontier AI to “establish a scientific position on opportunities and risks from advanced AI and how they may be managed,” to help increase public awareness and understanding, to “contribute to a scientifically informed account of AI use and risk mitigation [and to] be a source of expertise for policymakers.”[ref 47] Bremmer and Suleyman propose a global scientific body to objectively advise governments and international bodies on questions as basic as what AI is and what kinds of policy challenges it poses.[ref 48] They draw a direct link to the IPCC model, noting that “this body would have a global imprimatur and scientific (and geopolitical) independence […] [a]nd its reports could inform multilateral and multistakeholder negotiations on AI.”[ref 49] Bak-Coleman and others argue in favor of an Intergovernmental Panel on Information Technology, an independent, IPCC-like panel charged with studying the “impact of emerging information technologies on the world’s social, economic, political and natural systems.”[ref 50] In their view, this panel would focus on many “computational systems,” including “search engines, online banking, social-media platforms and large language models” and would have leverage to persuade companies to share key data.[ref 51]
Finally, Mulgan and others, in a 2023 paper, propose a Global AI Observatory (GAIO) as an institution that “would provide the necessary facts and analysis to support decision-making [and] would synthesize the science and evidence needed to support a diversity of governance responses.”[ref 52] Again drawing a direct comparison to the IPCC, they anticipate that such a body could set the foundation for more serious regulation of AI through six activities: (a) a global standardized incident reporting database, (b) a registry of crucial AI systems, (c) a shared body of data and analysis of the key facts of the AI ecosystem, (d) working groups exploring global knowledge about the impacts of AI on critical areas, (e) the ability to offer legislative assistance and model laws, and (f) the ability to orchestrate global debate through an annual report on the state of AI.[ref 53] They have since incorporated this proposal within a larger “Framework for the International Governance of AI,” developed by the Artificial Intelligence & Equality Initiative of the Carnegie Council for Ethics in International Affairs, alongside other components such as a neutral technical organization to analyze “which legal frameworks, best practices, and standards have risen to the highest level of global acceptance.”[ref 54]
1.5 Critiques of this model: One concern that has been expressed is that AI governance is currently too institutionally immature to support an IPCC-like model, since, as Roberts argues, “the IPCC […] was preceded by almost two decades of multilateral scientific assessments, before being formalised.”[ref 55] He considers that this may be a particular problem for replicating that model for AI, given that some AI risks are currently still subject to significantly less scientific consensus.[ref 56] Separately, Bak-Coleman and others argue that a scientific consensus-building organization for digital technologies would face a far more difficult research environment than the IPCC and IPBES because, whereas climate change and ecosystem degradation are characterized by rich data and scientifically well-understood mechanisms, research into the impacts of digital technologies often faces data access restrictions.[ref 57] Ho and others argue that a Commission on Frontier AI would face more general scientific challenges in adequately studying future risks “on the horizon,” as well as potential politicization, both of which might inhibit the ability of such a body to effectively build consensus.[ref 58] Indeed, in the absence of decisive and incontrovertible evidence about the trajectory and risks of AI, a scientific consensus-building institution might struggle to deliver on its core mission and could instead spark significant scientific contestation and disagreement amongst AI researchers.
Model 2: Political consensus-building and norm-setting
2.1 Functions and types: The function of political consensus-building and norm-setting institutions is to help states reach greater political agreement and convergence about how to respond to a (usually) clearly identified and (ideally) agreed issue or phenomenon. These institutions aim to build the political consensus necessary either to align national policy responses sufficiently well, achieving a level of harmonization that reduces trade restrictions or other obstacles to addressing the issue, or to help begin negotiations on other institutions that establish more stringent regimes. Political consensus-building institutions do this by providing fora for discussion and debate that can aid the articulation of potential compromises between state interests and by exerting normative pressure on states towards certain goals. In a norm-setting capacity, institutions can also draw on (growing) political consensus to set and share informal norms, even if formal institutions have not yet been created. For instance, if negotiations for a regulatory or control institution are held up, slowed, or fail, political consensus-building institutions can play a norm-setting function by establishing informal standards for behavior as soft law. While such norms are not as strictly specified or as enforceable as hard-law regulations, they can still carry weight and be taken up in practice.
2.2 Common examples: There are a range of examples of political consensus-building institutions. Some of these are broad, such as conferences of parties to a treaty (also known as COPs, the best known being that of the United Nations Framework Convention on Climate Change [UNFCCC]).[ref 59] Many others, however, such as the Organization for Economic Co-operation and Development (OECD), the G20, and the G7, reflect smaller, at times more informal, governance “clubs,” which can often move ahead towards policy-setting more quickly because their membership is already somewhat aligned[ref 60] and because many of them have already begun to undertake activities or incorporate institutional units focused on AI developments.[ref 61]
Gutierrez and others have reviewed a range of historical cases of (domestic and global) soft-law governance that they argue could provide lessons for AI. These include a range of institutional activities, such as UNESCO’s 1997 Universal Declaration on the Human Genome and Human Rights, 2003 International Declaration on Human Genetic Data, and 2005 Universal Declaration on Bioethics and Human Rights,[ref 62] the Environmental Management System (ISO 14001), the Sustainable Forestry Practices by the Sustainable Forestry Initiative and Forest Stewardship Council, and the Leadership in Energy and Environmental Design initiative.[ref 63] Other cases, such as the Internet Corporation for Assigned Names and Numbers (ICANN), the Asilomar rDNA Guidelines, the International Gene Synthesis Consortium, the International Society for Stem Cell Research Guidelines, the BASF Code of Conduct, the Environmental Defense Fund, and the DuPont Risk Frameworks, offer stronger examples of success.[ref 64] Turner likewise argues that ICANN, which has succeeded in developing productive internet policy, offers a model for international AI governance.[ref 65] Elsewhere, Harding argues that the 1967 Outer Space Treaty offers a telling case of a treaty regime that quickly crystallized state expectations and policies around safe innovation in a then-novel area of science.[ref 66] Finally, Feijóo and others suggest that “new technology diplomacy” on AI could involve a series of meetings or global conferences on AI, which could draw lessons from experiences such as the World Summits on the Information Society (WSIS).[ref 67]
2.3 Underexplored examples: Examples of norm-setting institutions that formulate and share relevant soft-law guidelines on technology include the International Organization for Standardization (ISO), the International Electrotechnical Commission (IEC), the International Telecommunication Union (ITU), and the United Nations Commission on International Trade Law (UNCITRAL)’s Working Group on Electronic Commerce.[ref 68] Another good example of a political consensus-building and norm-setting initiative is the 1998 Lysøen Declaration,[ref 69] an initiative by Canada and Norway that expanded to 11 highly committed states along with several NGOs. It launched a “Human Security Network” that achieved a significant and outsized global impact, including the Ottawa Treaty ban on antipersonnel mines, the Rome Treaty establishing the International Criminal Court, the Kimberley Process aimed at inhibiting the flow of conflict diamonds, and landmark Security Council resolutions on Children and Armed Conflict and on Women, Peace and Security. Another norm-setting institution that is not yet often invoked in AI discussions but that could be promising to explore is the Codex Alimentarius Commission (CAC), established by the Food and Agriculture Organization (FAO) and the World Health Organization (WHO), which develops and maintains the Codex Alimentarius, a collection of non-enforceable but internationally recognized standards and codes of practice on various aspects of food production, food labeling, and food safety. Another example of a “club” under this model which is not often mentioned but that could be influential is the BRICS group, which recently expanded from 5 to 11 members.
2.4 Proposed AI institutions along this model: Many proposals for political consensus-building institutions on AI tend not to focus on establishing new institutions, arguing instead that it is best to put AI issues on the agenda of existing (established and recognized) consensus-building institutions (e.g., the G20) or of existing norm-setting institutions (e.g., ISO). Indeed, even recent proposals for new international institutions still emphasize that these should link up well with already-ongoing initiatives, such as the G7 Hiroshima Process on AI.[ref 70]
However, there have been proposals for new political consensus-building institutions. Erdélyi and Goldsmith propose an International AI Organisation (IAIO), “to serve as an international forum for discussion and engage in standard setting activities.”[ref 71] They argue that “at least initially, the global AI governance framework should display a relatively low level of institutional formality and use soft-law instruments to support national policymakers in the design of AI policies.”[ref 72] Moreover, they emphasize that the IAIO “should be hosted by a neutral country to provide for a safe environment, limit avenues for political conflict, and build a climate of mutual tolerance and appreciation.”[ref 73] More recently, the final report of the US’s National Security Commission on Artificial Intelligence includes a proposal for an Emerging Technology Coalition, “to promote the design, development, and use of emerging technologies according to democratic norms and values; coordinate policies and investments to counter the malign use of these technologies by authoritarian regimes; and provide concrete, competitive alternatives to counter the adoption of digital infrastructure made in China.”[ref 74] Most recently, Marcus and Reuel propose the creation of an International Agency for AI (IAAI) tasked with convening experts and developing tools to help find “governance and technical solutions to promote safe, secure and peaceful AI technologies.”[ref 75]
At the looser organizational end, Feijóo and others propose a new technology diplomacy initiative as “a renewed kind of international engagement aimed at transcending narrow national interests and seeks to shape a global set of principles.” In their view, such a framework could “lead to an international constitutional charter for AI.”[ref 76] Finally, Jernite and others propose an international Data Governance Structure, a multi-party, distributed governance arrangement for improving the systematic and transparent management of language data at a global level, which includes a Data Stewardship Organization to develop “appropriate management plans, access restrictions, and legal scholarship.”[ref 77] Other proposed organizations focus more on supporting states in implementing AI policy, for instance through training. Turner, for example, proposes creating an International Academy for AI Law and Regulation.[ref 78]
2.5 Critiques of this model: There have not generally been many in-depth critiques of proposals for new political consensus-building or norm-setting institutions. However, some concerns focus on the difficult tradeoffs that consensus-building institutions face in deciding whether to prioritize breadth of membership and inclusion or depth of mission alignment. Institutions that aim to foster consensus across a very broad swath of actors may be very slow to reach such normative consensus, and even when they do, they may only achieve a “lowest-common-denominator” agreement.[ref 79] On the other hand, others counter that AI consensus-building institutions or fora will need to be sufficiently inclusive (in particular, and possibly controversially, with regard to China[ref 80]) if they are to avoid producing a fractured and ineffective regime, or even seeing negotiations implode over the political question of who was invited or excluded.[ref 81] Finally, a more foundational challenge to political consensus-building institutions is that while they may produce (the appearance of) joint narratives, these may lack teeth if the resulting agreement is not binding.[ref 82]
Model 3: Coordination of policy and regulation
3.1 Functions and types: The functions of this institutional model are to help align and coordinate policies, standards, or norms[ref 83] in order to ensure a coherent international approach to a common problem. There is significant internal variation in the setup of institutions under this model, with various subsidiary functions. For instance, such institutions may (a) directly regulate the deployment of a technology in relative detail, requiring states to comply with and implement those regulations at the national level; (b) assist states in the national implementation of agreed AI policies; (c) focus on the harmonization and coordination of policies; (d) focus on the certification of industries or jurisdictions to ensure they comply with certain standards; or (e) in some cases, take up functions related to monitoring and enforcing norm compliance.
3.2 Common examples: Common examples of policy-setting institutions include the World Trade Organization (WTO) as an exemplar of an empowered, centralized regulatory institution.[ref 84] Other examples given of regulatory institutions include the International Civil Aviation Organization (ICAO), the International Maritime Organization (IMO), the International Atomic Energy Agency (IAEA), and the Financial Action Task Force (FATF).[ref 85] Examples of policy-coordinating institutions may include the United Nations Environment Programme (UNEP), which coordinated international agreements on the environment and facilitated new agreements, including the 1985 Vienna Convention for the Protection of the Ozone Layer.[ref 86] Nemitz points to the example of the institutions created under the United Nations Convention on the Law of the Sea (UNCLOS) as a model for an AI regime, including an international court to enforce the proposed treaty.[ref 87] Finally, Sepasspour proposes the establishment of an “AI Ethics and Safety Unit” within the existing International Electrotechnical Commission (IEC), under a model that is “inspired by the Food and Agriculture Organization’s (FAO) Food Safety and Quality Unit and Emergency Prevention System for Food Safety early warning system.”[ref 88]
3.3 Underexplored examples: Examples that are not yet often discussed but that could be useful or insightful include the European Monitoring and Evaluation Programme (EMEP), which implements the 1983 Convention on Long-Range Transboundary Air Pollution, a regime that has proven particularly adaptive.[ref 89] A more sui generis example is that of international financial institutions, like the World Bank or the International Monetary Fund (IMF), which tend to shape domestic policy indirectly through conditional access to loans or development funds.
3.4 Proposed AI institutions along this model: Specific to advanced AI, recent proposals for regulatory institutions include Ho and others’ Advanced AI Governance Organisation, which “could help internationalize and align efforts to address global risks from advanced AI systems by setting governance norms and standards, and assisting in their implementation.”[ref 90]
Trager and others propose an International AI Organization (IAIO) to certify jurisdictions’ compliance with international oversight standards. These would be enforced through a system of conditional market access in which trade barriers would be imposed on jurisdictions that are not certified or whose supply chains integrate AI from non-IAIO-certified jurisdictions. Among other advantages, the authors suggest that this system could be less vulnerable to the leakage of industry secrets, since states would establish their own domestic regulatory entities rather than submitting to international monitoring within their jurisdictions (as is the case with the IAEA). However, the authors also propose that the IAIO could provide monitoring services to governments that have not yet built their own monitoring capabilities. The authors argue that their model has several advantages over others, including agile standard-setting, monitoring, and enforcement.[ref 91]
In a regional context, Stix proposes an EU AI Agency which, among other roles, could analyze gaps in AI policy and develop policies to fill them. For this agency to be effective, Stix suggests it should be independent from political agendas by, for instance, having a mandate that does not coincide with election cycles.[ref 92] Webb proposes a “Global Alliance on Intelligence Augmentation” (GAIA), which would bring together experts from different fields to set best practices for AI.[ref 93]
Chowdhury proposes a generative AI global governance body as a “consolidated ongoing effort with expert advisory and collaborations [which] should receive advisory input and guidance from industry, but have the capacity to make independent binding decisions that companies must comply with.”[ref 94] In her analysis, this body should be funded through unrestricted and unconditional contributions from all AI companies engaged in the creation or use of generative AI, and it should “cover all aspects of generative AI models, including their development, deployment, and use as it relates to the public good. It should build upon tangible recommendations from civil society and academic organizations, and have the authority to enforce its decisions, including the power to require changes in the design or use of generative AI models, or even halt their use altogether if necessary.”[ref 95]
A proposal for a policy-coordinating institution is Kemp and others’ Coordinator and Catalyser of International AI Law, which would be “a coordinator for existing efforts to govern AI and catalyze multilateral treaties and arrangements for neglected issues.”[ref 96]
3.5 Critiques of this model: Castel and Castel critique international conventions on the grounds that they “are difficult to monitor and control.”[ref 97] More specifically, Ho and others argue that a model like an Advanced AI Governance Organization would face challenges around its ability to set and update standards sufficiently quickly, around incentivizing state participation in adopting the regulations, and in sufficiently scoping the challenges to focus on.[ref 98] Finally, reviewing general patterns in current state activities on AI standard-setting, von Ingersleben notes that “technical experts hailing from geopolitical rivals, such as the United States and China, readily collaborate on technical AI standards within transnational standard-setting organizations, whereas governments are much less willing to collaborate on global ethical AI standards within international organizations,”[ref 99] which suggests that there may be significant hurdles to overcoming states’ reluctance to participate in any international institutions focused on more political and ethical standard-setting.
Model 4: Enforcement of standards or restrictions
4.1 Functions and types: The function of this institutional model is to prevent the production, proliferation, or irresponsible deployment of a dangerous or illegal technology, product, or activity. To fulfill that function, institutions under this model rely on, among other mechanisms, (a) bans and moratoria, (b) nonproliferation regimes, (c) export-control lists, (d) monitoring and verification mechanisms,[ref 100] (e) licensing regimes, and (f) registering and/or tracking of key resources, materials, or stocks. Other types of mechanisms, such as (g) confidence-building measures (CBMs), are generally transparency-enabling.[ref 101] While generally focused on managing tensions and preventing escalations,[ref 102] CBMs can also build trust amongst states in each other’s mutual compliance with standards or prohibitions, and can therefore also support or underwrite standard- and restriction-enforcing institutions.
4.2 Common examples: The most prominent example of this model, especially in discussions of institutions capable of carrying out monitoring and verification roles, is the International Atomic Energy Agency (IAEA)[ref 103]—in particular, its Department of Safeguards. Many other proposals refer to the monitoring and verification mechanisms of arms control treaties.[ref 104] For instance, Baker has studied the monitoring and verification mechanisms for different types of nuclear arms control regimes, reviewing the role of the IAEA system under Comprehensive Safeguards Agreements with Additional Protocols in monitoring nonproliferation treaties such as the Non-Proliferation Treaty (NPT) and the five Nuclear-Weapon-Free-Zone Treaties, the role of monitoring and verification arrangements in monitoring bilateral nuclear arms control limitation treaties, and the role of the International Monitoring System (IMS) in monitoring and enforcing (prospective) nuclear test bans under the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO).[ref 105] Shavit likewise refers to the precedent of the NPT and IAEA in discussing a resource (compute) monitoring framework for AI.[ref 106]
Examples given of export-control regimes include the Nuclear Suppliers Group, the Wassenaar Arrangement, and the Missile Technology Control Regime.[ref 107] As an example of CBMs, commentators have pointed to the Open Skies Treaty,[ref 108] whose implementation is overseen by the Open Skies Consultative Commission (OSCC).
There are also examples of global technology control institutions that were not carried through but which are still discussed as inspirations for AI, such as the international Atomic Development Authority (ADA) proposed in the early nuclear age[ref 109] or early- to mid-20th-century proposals for the global regulation of military aviation.[ref 110]
4.3 Underexplored examples: Examples that are not yet often discussed in the context of AI but that could be promising are the Organisation for the Prohibition of Chemical Weapons (OPCW),[ref 111] the Biological Weapons Convention’s Implementation Support Unit, the International Maritime Organization (in its ship registration function), and the Convention on International Trade in Endangered Species of Wild Fauna and Flora’s (CITES) Secretariat, specifically, its database of national import and export reports.
4.4 Proposed AI institutions along this model: Proposals along this model are particularly prevalent. Indeed, as mentioned, a significant part of the literature on the international governance of AI has made reference to some sort of “IAEA for AI.” For instance, in relatively early proposals,[ref 112] Turchin and others propose a “UN-backed AI-control agency” which “would require much tighter and swifter control mechanisms, and would be functionally equivalent to a world government designed specifically to contain AI.”[ref 113] Ramamoorthy and Yampolskiy propose a “global watchdog agency” that would have the express purpose of tracking AGI programs and that would have the jurisdiction and the lawful authority to intercept and halt unlawful attempts at AGI development.[ref 114] Pointing to the precedents of the IAEA and its inspection regime and of the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO)’s Preparatory Commission, Nindler proposes an International Enforcement Agency for safe AI research and development, which would support and implement the provisions of an international treaty on safe AI research and development, with the general mission “to accelerate and enlarge the contribution of artificial intelligence to peace, health and prosperity throughout the world [and … to ensure that its assistance] is not used in such a way as to further any military purpose.”[ref 115] Such a body would be charged with drafting safety protocols and measures, and he suggests that its enforcement could, in extreme cases, be backed up by the use of force under the UN Security Council’s Chapter VII powers.[ref 116]
Whitfield draws on the example of the UN Framework Convention on Climate Change to propose a UN Framework Convention on AI (UNFCAI) along with a Protocol on AI that would subsequently deliver the first set of enforceable AI regulations. He proposes that these should be supported by three new bodies: an AI Global Authority (AIGA) to provide an inspection regime in particular for military AI, an associated “Parliamentary Assembly” supervisory body that would enhance democratic input into the treaty’s operations and play “a constructive monitoring role,” as well as a multistakeholder Intergovernmental Panel on AI to provide scientific, technical, and policy advice to the UNFCAI.[ref 117]
More recently,[ref 118] Ho and others propose an “Advanced AI Governance Organization” which, in addition to setting international standards for the development of advanced AI (as discussed above), could monitor compliance with these standards through, for example, self-reporting, monitoring practices within jurisdictions, or detection and inspection of large data centers.[ref 119] Altman and others propose an “IAEA for superintelligence” consisting of “an international authority that can inspect systems, require audits, test for compliance with safety standards, place restrictions on degrees of deployment and levels of security.”[ref 120] In a very similar vein, Guest (based on an earlier proposal by Karnofsky)[ref 121] calls for an “International Agency for Artificial Intelligence (IAIA)” to conduct “extensive verification through on-chip mechanisms [and] on-site inspections” as part of his proposal for a “Collaborative Handling of Artificial Intelligence Risks with Training Standards (CHARTS).”[ref 122] Drawing together elements from several models, and referring to the examples of the IPCC, Interpol, and the WTO’s dispute settlement system, Gutierrez proposes a “multilateral AI governance initiative” to mitigate “the shared large-scale high-risk harms caused directly or indirectly by AI.”[ref 123] His proposal envisions an organizational structure consisting of (a) a forum for member state representation (which adopts decisions via supermajority); (b) technical bodies, such as an external board of experts, and a permanent technical and liaison secretariat that works from an information and enforcement network and which can issue “red notice” alerts; and (c) an arbitration board that can hear complaints from member states as well as from non-state AI developers who seek to contest these notices.[ref 124]
In a 2013 paper, Wilson proposes an “Emerging Technologies Treaty”[ref 125] that would address risks from many emerging technologies. In his view, this treaty could either be housed under an existing international organization or body or established separately, and it would establish a body of experts that would determine whether there were “reasonable grounds for concern” about AI or other dangerous research, after which states would be required to regulate or temporarily prohibit that research.[ref 126] Likewise drawing on the IAEA model, Chesterman proposes an International Artificial Intelligence Agency (IAIA) as an institution with “a clear and limited normative agenda, with a graduated approach to enforcement,” arguing that “the main ‘red line’ proposed here would be the weaponization of AI—understood narrowly as the development of lethal autonomous weapon systems lacking ‘meaningful human control’ and more broadly as the development of AI systems posing a real risk of being uncontrollable or uncontainable.”[ref 127] In practice, this organization would draw up safety standards, develop a forensic capability to identify those responsible for “rogue” AI, serve as a clearinghouse to gather and share information about such systems, and provide early notification of emergencies.[ref 128] Chesterman argues that one organizational lesson this IAIA could take from the IAEA is to give its Board of Governors (rather than the annual General Conference) ongoing oversight of its operations.
Other authors endorse an institution more directly aimed at preventing or limiting proliferation of dangerous AI systems. Jordan and others propose a “NPT+” model,[ref 129] and the Future of Life Institute (FLI) proposes “international agreements to limit particularly high-risk AI proliferation and mitigate the risks of advanced AI.”[ref 130] PauseAI proposes an international agreement that sets up an “International AI Safety Agency” that would be in charge of granting approvals for deployments of AI systems and new training runs above a certain size.[ref 131] The Elders, a group of independent former world leaders, have recently called on countries to request, via the UN General Assembly, that the International Law Commission draft an international treaty to establish a new “International AI Safety Agency,”[ref 132] drawing on the models of the NPT and the IAEA, “to manage these powerful technologies within robust safety protocols [and to …] ensure AI is used in ways consistent with international law and human rights treaties.”[ref 133] More specific monitoring provisions are also entertained; for instance, Balwit briefly discusses an advanced AI chips registry, potentially organized by an international agency.[ref 134]
At the level of transparency-supporting agreements, there are many proposals for confidence-building measures (CBMs) for (military) AI. Such proposals focus on bilateral arrangements that build confidence among states and contribute to stability (as under Model 5) but that lack distinct institutions. For instance, Shoker and others discuss an “international code of conduct for state behavior.”[ref 135] Scharre, Horowitz, Khan and others discuss a range of other AI CBMs,[ref 136] including the marking of autonomous weapons systems, geographic limits, and limits on particular (e.g., nuclear) operations of AI.[ref 137] They propose to group these under an International Autonomous Incidents Agreement (IAIA) to “help reduce risks from accidental escalation by autonomous systems, as well as reduce ambiguity about the extent of human intention behind the behavior of autonomous systems.”[ref 138] In doing so, they point to the precedent of arrangements such as the 1972 Incidents at Sea Agreement[ref 139] as well as the 12th–19th century development of Maritime Prize Law.[ref 140] Imbrie and Kania propose an “Open Skies on AI” agreement.[ref 141] Bremmer and Suleyman propose a bilateral US-China regime to foster cooperation between Washington and Beijing on AI, envisioning it as a way “to create areas of commonality and even guardrails proposed and policed by a third party.”[ref 142]
4.5 Critiques of this model: Many critiques of the enforcement model have ended up focusing (whether fairly or not) on the appropriateness of the basic analogy between nuclear weapons and AI that is explicit or implicit in proposals for an IAEA- or NPT-like regime. For instance, Kaushik and Korda have opposed what they see as aspirations to a “wholesale ban” on dangerous AI and argue that “attempting to regulate artificial intelligence indiscriminately would be akin to regulating the concept of nuclear fission itself.”[ref 143]
Others critique the appropriateness of an IAEA-modeled approach. Stewart argues that the focus on the IAEA’s safeguards is misplaced, since AI systems cannot be safeguarded in the same way as nuclear material; he suggests that better lessons might instead be found in the IAEA’s International Physical Protection Advisory Service (IPPAS) missions, which allow the agency to serve as an independent third party assessing the regulatory preparedness of countries that aim to develop nuclear programs.[ref 144] Drexel and Depp argue that even if this IAEA model could work on a technical level, it would likely be prohibitively difficult to negotiate such an intense level of oversight.[ref 145] Further, Sepasspour and Law both note that the IAEA did not emerge fully formed: years passed between its establishment (1957), its adoption of the INFCIRC/26 safeguards document (1961), its assumption of a leading role in nuclear nonproliferation upon the adoption of the NPT (1968), and the further strengthening of its verification function through the Additional Protocol (1997).[ref 146] Such a slow accretion of authority might not be adequate given the speed of advanced AI development. Finally, the strength of an IAEA-like agency depends on the existence of supportive international treaties as well as on specific incentives for participation.
Others question whether this model would be desirable, even if achievable. Howard critiques many governance proposals that would involve centralized control (whether domestic or global) over the proliferation of and access to frontier AI systems, arguing that such centralization would end up advantaging only currently powerful AI labs and malicious actors willing to steal models, and warning that this could have significant illiberal effects.[ref 147]
Model 5: Stabilization and emergency response
5.1 Functions and types: The function of this institutional model is to ensure that an emerging technology or an emergency does not have a negative impact on social stability and international peace.
Such institutions can serve various subsidiary functions, including (a) general stability management, by assessing and mitigating systemic vulnerabilities that could give rise to incidents or accidents; (b) early warning of, and response coordination to, incidents and emergencies, providing timely warning and creating common knowledge of an emergency;[ref 148] and (c) general stabilization of relations, behavior, and expectations around AI technology, encouraging transparency and trust around state activities in a particular domain and helping to avoid inadvertent military conflict.
5.2 Common examples: Examples of institutions involved in stability management include the Financial Stability Board (FSB), an entity “composed of central bankers, ministries of finance, and supervisory and regulatory authorities from around the world.”[ref 149] Another example might be the United Nations Office for Disaster Risk Reduction (UNDRR), which focuses on responses to natural disasters.[ref 150] Gutierrez invokes Interpol’s “red notice” alert system as an example of a model by which an international institution could alert global stakeholders about the dangers of a particular AI system.[ref 151]
5.3 Underexplored examples: Examples not yet invoked in the literature, but which could be promising models for early-warning functions, include the WHO’s “public health emergency of international concern” early warning mechanism and the notification procedure established in the IAEA’s 1986 Convention on Early Notification of a Nuclear Accident.
5.4 Proposed AI institutions along this model: AI proposals along the early warning model include Pauwels’ paper describing a Global Foresight Observatory as a multistakeholder platform aimed at fostering greater cooperation in technological and political preparedness for the impacts of innovation in various fields, including AI.[ref 152] Bremmer and Suleyman propose a Geotechnology Stability Board, which “could work to maintain geopolitical stability amid rapid AI-driven change” by coordinating national regulatory authorities and international standard-setting bodies. Such a body would also help prevent global technology actors from “engaging in regulatory arbitrage or hiding behind corporate domiciles.” Finally, it could take up responsibility for governing open-source AI and censoring uploads of highly dangerous models.[ref 153]
5.5 Critiques of this model: As there have been relatively few proposals for this model, there are not yet many critiques. However, possible critiques might question the adequacy of relying on international institutions to respond to (rather than prevent) situations in which dangerous AI systems have already been deployed, as coordinating, communicating, and implementing effective countermeasures in such situations might be very difficult or far too slow to counter a misaligned AI system.
Model 6: International joint research
6.1 Functions and types: The function of this institutional model is to start a bilateral or multilateral collaboration between states or state entities to solve a common problem or achieve a common goal. Most institutions following this model would focus on accelerating the development of a technology or exploitation of a resource by particular actors in order to avoid races. Others would aim at speeding up the development of safety techniques.
In some proposals, an institution like this aims not just to rally and organize a major research project, but simultaneously to include elements of an enforcing institution in order to exclude all other actors from conducting research and/or creating capabilities around that problem or goal, creating a de facto or an explicit international monopoly on an activity.
6.2 Common examples: Examples that are pointed to as models of an international joint scientific program include the European Organization for Nuclear Research (CERN),[ref 154] ITER, the International Space Station (ISS), and the Human Genome Project.[ref 155] Example models of a (proposed) international monopoly include the Acheson-Lilienthal Proposal[ref 156] and the resulting Baruch Plan, which called for the creation of an Atomic Development Authority.[ref 157]
6.3 Underexplored examples: Examples that are not yet discussed in the literature but that could be promising are the James Webb Space Telescope and the Laser Interferometer Gravitational-Wave Observatory (LIGO),[ref 158] which is organized internationally through the LIGO Scientific Collaboration (LSC).
6.4 Proposed AI institutions along this model: Explicit AI proposals along the joint scientific program model are varied.[ref 159] Some proposals focus primarily on accelerating safety. Lewis Ho and others suggest an “AI Safety Project” to “promote AI safety R&D by promoting its scale, resourcing and coordination.” To ensure AI systems are reliable and less vulnerable to misuse, this institution would have access to significant compute and engineering capacity as well as to AI models developed by AI companies. Unlike other international joint scientific programs such as CERN or ITER, which are strictly intergovernmental, Ho and others propose that the AI Safety Project include other actors as well (e.g., civil society and industry). The authors also suggest that, to prevent replication of models or diffusion of dangerous technologies, the AI Safety Project should incorporate information and security measures such as siloing information, structuring model access, and designing internal review processes.[ref 160] Neufville and Baum point out that “a clearinghouse for research into AI” could solve the collective problem of underinvestment in basic research, AI ethics, and safety research.[ref 161] More ambitiously, Ramamoorthy and Yampolskiy propose a “Benevolent AGI Treaty,” which involves “the development of AGI as a global, non-strategic humanitarian objective, under the aegis of a special agency within the United Nations.”[ref 162]
Other proposals suggest intergovernmental collaboration for the development of AI systems more generally. Daniel Zhang and others at Stanford University’s HAI recommend that the United States and like-minded allies create a “Multilateral Artificial Intelligence Research Institute (MAIRI)” to facilitate scientific exchanges and promote collaboration on AI research—including the risks, governance, and socio-economic impact of AI—based on a foundational agreement outlining agreed research practices. The authors suggest that MAIRI could also strengthen policy coordination around AI.[ref 163] Fischer and Wenger add that a “neutral hub for AI research” should have four functions: (a) fundamental research in the field of AI, (b) research and reflection on societal risks associated with AI, (c) development of norms and best practices regarding the application of AI, and (d) further education for AI researchers. This hub could be created by a conglomerate of like-minded states but should eventually be open to all states and possibly be linked to the United Nations through a cooperation agreement, according to the authors.[ref 164] Other authors posit that an international collaboration on AI research and development should include all members of the United Nations from the start, as similar projects like the ISS or the Human Genome Project have done. They suggest that this approach might reduce the possibility of an international conflict.[ref 165] In this vein, Kemp and others call for the foundation of a “UN AI Research Organization (UNAIRO),” which would focus on “building AI technologies in the public interest, including to help meet international targets […] [a] secondary goal could be to conduct basic research on improving AI techniques in the safest, careful and responsible environment possible.”[ref 166]
Philipp Slusallek, Scientific Director of the German Research Center for Artificial Intelligence, suggests a “CERN for AI”: “a collaborative, scientific effort to accelerate and consolidate the development and uptake of AI for the benefit of all humans and our environment.” Slusallek promotes a very open and transparent design for this institution, in which data and knowledge would flow freely between collaborators.[ref 167] Similarly, the Large-scale Artificial Intelligence Open Network (LAION) calls for a CERN-like open-source collaboration among the United States and allied countries to establish an international “supercomputing research facility” hosting “a diverse array of machines equipped with at least 100,000 high-performance state-of-the-art accelerators” that can be overseen by democratically elected institutions from participating countries.[ref 168] Daniel Dewey goes a step further and suggests a potential joint international AI project with a monopoly over hazardous AI development, in the spirit of the 1946 Baruch Plan, which proposed an international Atomic Development Authority with a monopoly over nuclear activities. However, Dewey admits this proposal may be politically intractable.[ref 169] In another proposal for monopolized international development, Miotti suggests a “Multilateral AGI Consortium” (MAGIC), which would be an international organization mandated to run “the world’s only advanced and secure AI facility focused on safety-first research and development of advanced AI.”[ref 170] This organization would share breakthroughs with the outside world only once they had been demonstrated to be safe, and would therefore be coupled with a global moratorium on the creation of AI systems exceeding a set compute-governance threshold.
The proposals for an institution analogous to CERN discussed thus far envision a grand institution that draws talent and resources for research and development of AI projects in general. Other proposals have a narrower focus. Charlotte Stix, for example, suggests that a more decentralized version of this model could be more beneficial. Stix argues that a “European Artificial Intelligence megaproject” could be composed of a centralized headquarters to oversee collaborations and provide economies of scale for AI precursors within a network of affiliated AI laboratories that conduct most of the research.[ref 171] Other authors argue that rather than focus on AI research in general, an international research collaboration could focus on the use of AI to solve problems in a specific field, such as climate change, health, privacy-enhancing technologies, economic measurement, or the sustainable development goals.[ref 172]
6.5 Critiques of this model: In general, there have been few sustained critiques of this institutional model. However, Ho and others suggest that an international collaboration to conduct technical AI-safety research might pull safety researchers away from the frontier AI developers, reducing in-house safety expertise. In addition, any international project that would need to access advanced AI models would face security risks, including the risk of model leaks.[ref 173]
More fundamental critiques also exist. For instance, Kaushik and Korda question the feasibility of a “Manhattan Project-like undertaking to address the ‘alignment problem’,” arguing that massively accelerating AI-safety research through any large-scale governmental project is infeasible. They further argue that the analogy is inappropriate because the Manhattan Project pursued a singular goal, whereas in AI safety “ten thousand researchers have ten thousand different ideas on what it means and how to achieve it.”[ref 174]
Model 7: Distribution of benefits and access
7.1 Functions and types: The function of this institutional model is to provide access to the benefits of a technology or a global public good to those states or individuals who do not yet have it due to geographic or economic reasons, among others. Very often, the aim of such an institution is to facilitate unrestricted access, or even to establish access schemes targeted at the neediest and most deprived. When the information or goods being shared can potentially pose a risk or be misused, yet responsible access is still considered a legitimate, necessary, or beneficial goal, institutions under this model tend to create a system for conditional access.
7.2 Common examples: Examples of unrestricted benefit-distributor institutions include international public-private partnerships like Gavi, the Vaccine Alliance, and the Global Fund to Fight AIDS, Tuberculosis and Malaria.[ref 175] Examples of conditional benefit-distributor institutions might include the IAEA’s nuclear fuel bank.[ref 176]
7.3 Underexplored examples: Examples that are not yet invoked in the AI context but that could be promising include the Nagoya Protocol’s Access and Benefit-sharing Clearing-House (ABS Clearing-House),[ref 177] the UN Climate Technology Centre and Network,[ref 178] and the United Nations Industrial Development Organization (UNIDO), which is tasked with helping build up industrial capacities in developing countries.
7.4 Proposed AI institutions along this model: Stafford and Trager draw an analogy between the NPT and a potential international regime to govern transformative AI. The basis for this comparison is that both technologies are dual-use, both present risks even in civilian applications, and there are significant gaps in the access different states have to these technologies. As in the case of nuclear energy, in a scenario where there is a clear leader in the race to develop AI while others are lagging, it is mutually beneficial for the actors to enter a technology-sharing bargain. This way, the leader can ensure it will continue to be at the front of the race, while the laggards secure access to the technology. Stafford and Trager call this the “Hopeless Laggard effect.” To enforce this technology-sharing bargain in the sphere of transformative AI, an international institution would have to be created to perform functions similar to those of the IAEA’s Global Nuclear Safety and Security Network, which transfers knowledge from countries with mature nuclear energy programs to those that are just starting to develop one. As an alternative, the authors suggest that the leader in AI could prevent the laggards from engaging in a race by sharing the wealth resulting from transformative AI.[ref 179]
The US’s National Security Commission on Artificial Intelligence’s final report included a proposal for an International Digital Democracy Initiative (IDDI) “with allies and partners to align international assistance efforts to develop, promote, and fund the adoption of AI and associated technologies that comports with democratic values and ethical norms around openness, privacy, security, and reliability.”[ref 180]
Ho and others envision a model that incorporates the private sector into the benefit-distribution dynamic. A “Frontier AI Collaborative” could spread the benefits of cutting-edge AI—including global resilience to misused or misaligned AI systems—by acquiring or developing AI systems with a pool of resources from member states and international development programs, or from AI laboratories. This form of benefit-sharing could have the additional advantage of incentivizing states to join an international AI governance regime in exchange for access to the benefits distributed by the collaborative.[ref 181] More broadly, the Elders suggest creating an institution analogous to the IAEA to guarantee that AI’s benefits are “shared with poorer countries.”[ref 182] In forthcoming work, Adan sketches key features for a Fair and Equitable Benefit Sharing Model, to “foster inclusive global collaboration in transformative AI development and ensure that the benefits of AI advancements are equitably shared among nations and communities.”[ref 183]
7.5 Critiques of this model: One challenge faced by benefit-distributor institutions is how to balance the risk of proliferation against the need to ensure meaningful benefits and uptake from their technology-promotional and distributive work.[ref 184] For instance, Ho and others suggest that proposals such as their Frontier AI Collaborative could risk inadvertently diffusing dangerous dual-use technologies while also encountering obstacles to effectively empowering underserved populations with AI.[ref 185]
More fundamentally, global benefit- and access-providing institutions, especially those involving some form of conditional access, will likely face challenges (and critiques) over how they organize participation. In recent years, several researchers have argued that the global governance of AI has seen only limited participation by states from the Global South;[ref 186] Veale and others have recently critiqued many initiatives to secure “AI for Good” or “responsible AI,” arguing that these have fallen into a “paradox of participation,” one involving “the surface-level participation of Global South stakeholders without providing the accompanying resources and structural reforms to allow them to be involved meaningfully.”[ref 187] It is likely that similar critiques will be raised against benefit-distributing institutions.
II. Directions for further research
In light of the literature review conducted in Part I, we can consider a range of additional directions for further research. Without intending to be exhaustive, this section discusses some of those directions briefly, offering some initial thoughts on the existing gaps in the current literature and how each line of research might be helpful to inform key decisions around the international governance of AI—around whether or when to create international institutions, what specific institutional models to prioritize, how to establish these institutions, and how to design them for effectiveness, amongst others.
Direction 1: Effectiveness of institutional models
In the above summary, we have outlined potential institutional models for AI without always assessing their weaknesses or their effectiveness in meeting their stated goals. We believe such further analysis could be critical, however, to identify which models would be apt to govern the risks from AI and to reduce such risks de facto (not just de jure).
There is, of course, a live debate on the “effectiveness” of international law and institutions, with an extensive literature that tries to assess patterns of state compliance with different regimes in international law[ref 188] as well as more specific patterns affecting the efficacy of international organizations[ref 189] or their responsiveness to shifts in the underlying problem.[ref 190]
Such work has highlighted the imperfect track record of many international treaties in meeting their stated purposes,[ref 191] the various ways in which states may aim to evade obligations even while complying with the letter of the law,[ref 192] the ways in which states may aim to use international organizations to promote narrow national interests rather than broader organizational objectives,[ref 193] and the situations under which states aim to exit, shift away from, or replace existing institutions with new alternatives.[ref 194] Against such work, other studies have explored the deep normative changes that international norms have historically achieved in topics such as the role of territorial war,[ref 195] the transnational and domestic mechanisms by which states are pushed to commit to and comply with different treaties,[ref 196] the more nuanced conditions that may induce greater or lesser state compliance with norms or treaties,[ref 197] the effective role that even nonbinding norms may play,[ref 198] as well as arguments that a narrow focus on state compliance with international rules understates the broader effects that those obligations may have on the way that states bargain in light of those norms (even when they aim to bend them).[ref 199] Likewise, there is a larger body of foundational work that considers whether traditional international law, based in state consent, might be an adequate tool to secure global public goods such as those around AI, even if states complied with their obligations.[ref 200]
Work to evaluate the (prospective) effectiveness of international institutions on AI could draw on this extensive body of literature to learn lessons from the successes and failures of past regimes, as well as on scholarship on the appropriate design of different bodies[ref 201] and measures to improve the decision-making performance of such organizations,[ref 202] in order to understand when or how any given institutional model might be most appropriately designed for AI.
Direction 2: Multilateral AI treaties without institutions
While our review has focused on international AI governance proposals that would involve the establishment of some form of international institution, there are of course other models of international cooperation. Indeed, some types of treaties do not automatically establish distinct international organizations[ref 203] and primarily function by setting shared patterns of expectations and reciprocal behavior among states in order (ideally) to become self-enforcing. As noted, our literature review does not discuss this type of regime. However, analyzing such regimes in combination with the models we have outlined could help identify international governance alternatives for AI, including whether or when state initiatives to establish multilateral normative regimes that lack treaty bodies would likely be effective or would likely fall short.
Such an analysis could draw from a rich vein of existing proposals for new international treaties on AI. There have of course been proposals for new treaties for autonomous weapons.[ref 204] There are also proposals for international conventions to mitigate extreme risks from technology. Some of these, such as Wilson’s “Emerging Technologies Treaty”[ref 205] or Verdirame’s Treaty on Risks to the Future of Humanity,[ref 206] would address many types of potential existential risks simultaneously, including potential risks from AI.
Other treaty proposals focus more specifically on regulating AI risk in particular. Dewey discusses a potential “AI development convention” that would set down “a ban or set of strict safety rules for certain kinds of AI development.”[ref 207] Yet others address different types of risks from AI, such as Metzinger’s proposal for a global moratorium on artificial suffering.[ref 208] Carayannis and Draper discuss a “Universal Global Peace Treaty” (UGPT), which would commit states “not to declare or engage in interstate war, especially via existential warfare, i.e., nuclear, biological, chemical, or cyber war, including AI- or ASI-enhanced war.” They envision this treaty being supported by a separate Cyberweapons and AI Convention, whose main article would provide that “each State Party to this Convention undertakes never in any circumstances to develop, produce, stockpile or otherwise acquire or retain: (1) cyberweapons, including AI cyberweapons; and (2) AGI or artificial superintelligence weapons.”[ref 209]
Notwithstanding these proposals, there are significant gaps in the scholarship surrounding the design of an international treaty for AI regulation. Issues that we believe should be explored include, but are not limited to, the effects of reciprocity on the behavior of state parties, the relationship between the specificity of a treaty and its pervasiveness, the success and adaptability of the framework convention model (a broad treaty supplemented by protocols that specify the initial treaty’s obligations) in accomplishing its goals, and adjudicatory options for conflicts between state parties.
Direction 3: Additional institutional models not covered in detail in this review
There are many other institutional models that this literature review does not address, as they are (currently) rarely proposed in the specific context of international AI governance. These include, but are not limited to:[ref 210]
- Various international non-governmental organizations (NGOs), e.g., the World Wide Fund for Nature (WWF) or Amnesty International;
- Political and economic unions, e.g., Association of Southeast Asian Nations (ASEAN);
- Military alliances that establish security guarantees and/or political, economic, and defense cooperation, e.g., the North Atlantic Treaty Organization (NATO), the Shanghai Cooperation Organisation (SCO), or the Collective Security Treaty Organization (CSTO);
- International courts and tribunals, e.g., the International Criminal Court (ICC), various regional courts of human rights (African Court on Human and Peoples’ Rights, the European Court of Human Rights, the Inter-American Court of Human Rights);
- Interstate arbitral and dispute settlement bodies, e.g., the International Court of Justice (ICJ); the WTO Appellate Body, which hears appeals in disputes between WTO Members; the International Tribunal for the Law of the Sea (ITLOS), which is one of the dispute resolution mechanisms for the UN Convention on the Law of the Sea (UNCLOS); the Permanent Court of Arbitration (PCA), which resolves disputes arising out of international agreements between member states, international organizations, or private parties; or the European Nuclear Energy Tribunal, which oversees nuclear energy disputes within the OECD;
- Cartels aimed at articulating, aggregating, and securing the (economic) interests of their members, e.g., the Organization of the Petroleum Exporting Countries (OPEC), whose members cooperate to reduce market competition but whose operations may be protected by the doctrine of state immunity under international law;
- Policy implementation and/or direct service delivery organizations, e.g., the United Nations Development Programme (UNDP) or the World Bank;
- Data gathering and dissemination organizations, e.g., the World Meteorological Organization’s (WMO) climate data monitoring or the Food and Agriculture Organization’s (FAO) gathering of statistics on global food production;
- Post-disaster response and relief organizations, e.g., the World Food Programme (WFP) or the International Committee of the Red Cross (ICRC);
- Capacity-building and training organizations, e.g., governmental capacity-building trainings offered by the United Nations Institute for Training and Research (UNITAR), fiscal management training programs offered by the International Monetary Fund (IMF), or border-control trainings provided by the International Organization for Migration (IOM);
- Norm promotion organizations, e.g., the UNESCO World Heritage site program or UNHCR advocacy for refugee rights;
- Awareness-raising organizations, e.g., the Joint United Nations Programme on HIV/AIDS (UNAIDS), which amongst others organizes World AIDS Day; and
- “Meta”-organizations which aim to support or enhance the activities of other existing international organizations in general, e.g., the United Nations Office for Project Services (UNOPS).
Accordingly, future lines of research could focus on exploring what such models could look like for AI and what they might contribute to international AI governance.
Direction 4: Compatibility of institutional functions
There are multiple instances of compatibility between the functions of institutions proposed by the literature explored in this review. Getting a better sense of those areas of compatibility could be advantageous when designing an international institution for AI that borrows the best features from each model rather than copying a single model. Further research could explore hybrid institutions that combine functions from several models. Some potential combinations include, but are not limited to:
- Comprehensive scientific consortia, which could combine elements from scientific consensus-building institutions, international joint scientific programs, and (scientific) benefit-distributing institutions;
- Full-spectrum consensus-building fora, which could combine elements from scientific consensus-building with political consensus-building institutions and potentially with stabilization and emergency-response institutions;
- Integrated regulator institutions, which could combine elements from regulatory and policy coordinator institutions with monitoring and verification institutions; and
- Centralized control institutions, which could combine elements from nonproliferation and export-control institutions with access-controlling institutions, and potentially with monitoring and verification institutions.
Direction 5: Potential fora for an international AI organization
This review does not attempt to identify patterns in the fora that different proposals prefer for negotiating or hosting an international AI organization. While we do not expect the literature to contain much commentary on this question, it could be a useful additional consideration when designing an international AI institution. For example, some fora that have been proposed are:
- The United Nations could establish a UN specialized agency through state negotiations or an initiative at the UN General Assembly.[ref 211] For instance, as discussed above, Kemp and others call for a UN AI Research Organization (UNAIRO).[ref 212]
- Regional organizations, such as the European Union, the Organization of American States, the African Union, or ASEAN, could pioneer regional regulatory regimes that exert indirect extraterritorial effects on global AI governance. The European Union in particular has proven to be effective at indirectly regulating industries at a global level through the so-called Brussels Effect.[ref 213] Siegmann and Anderljung suggest that the EU AI Act could have a similar effect on the global AI industry.[ref 214]
- Minilateral club organizations like the G7, BRICS, the G20, or the OECD could play a comparable role, bringing together like-minded countries to negotiate an international governance framework for AI that other states can then join.[ref 215]
- Public-private partnerships or coalitions between state and non-state actors, such as the Lysøen Initiative on human security[ref 216] or the Christchurch Call, an initiative (led by France and Aotearoa New Zealand) to eliminate terrorist and violent extremist content online,[ref 217] could organize coalitions of like-minded states and actors to pursue the negotiation of new treaties, where necessary outside of UN fora.
- Gradual formalization of initial informal institutions: in some cases, organizations that are initially established in an informal configuration could lay the foundation for formal frameworks for cooperation, as happened with the gradual transformation of the General Agreement on Tariffs and Trade (GATT) into the WTO, and which Erdélyi and Goldsmith suggest as one route that could be taken by an International Artificial Intelligence Organization.[ref 218]
This list does not exhaust the available or feasible avenues, however. In many cases, significant additional work will be needed to evaluate these pathways in detail.
Conclusion
This literature review analyzed seven models for the international governance of AI, discussing common examples of those models, underexplored examples, specific proposals for their application to AI in existing scholarship, and critiques. We found that, while the literature covers a wide range of options for the international governance of AI, most proposals are vague and do not develop the specific attributes an international institution would need in order to secure the benefits and curb the risks associated with AI. We therefore proposed a series of pathways for further research that we expect will contribute to the design of such an institution.
Also in this series
- Maas, Matthijs, ‘AI is like… A literature review of AI metaphors and why they matter for policy.’ Institute for Law & AI, AI Foundations Report 2 (October 2023). https://www.law-ai.org/ai-policy-metaphors
- Maas, Matthijs, ‘Concepts in advanced AI governance: A literature review of key terms and definitions.’ Institute for Law & AI, AI Foundations Report 3 (October 2023). https://www.law-ai.org/advanced-ai-gov-concepts
- Maas, Matthijs, ‘Advanced AI governance: A literature review.’ Institute for Law & AI, AI Foundations Report 4 (November 2023). https://www.law-ai.org/advanced-ai-gov-litrev
International governance of civilian AI
Abstract
This report describes trade-offs in the design of international governance arrangements for civilian artificial intelligence (AI) and presents one approach in detail. This approach extends a standards, licensing, and liability regime to the global level. We propose that states establish an International AI Organization (IAIO) to certify state jurisdictions (not firms or AI projects) for compliance with international oversight standards. States can give force to these international standards by adopting regulations prohibiting the import of goods whose supply chains embody AI from non-IAIO-certified jurisdictions. This approach borrows attributes from existing international organizations, such as the International Civil Aviation Organization (ICAO), the International Maritime Organization (IMO), and the Financial Action Task Force (FATF). States can also adopt multilateral controls on the export of AI product inputs, such as specialized hardware, to non-certified jurisdictions. Indeed, adopting both the import and export standards could be made a requirement for certification. As international actors reach consensus on the risks of advanced AI and on minimum standards for it, a jurisdictional certification regime could mitigate a broad range of potential harms, including threats to public safety.