Disclaimer: Based on the announcement of the EO, without having seen the full text.
While I am heartened to hear that the Executive Order on AI uses the Defense Production Act to compel disclosure of various data from the development of large AI models, these disclosures do not go far enough. The EO seems to require only data on the procedures and results of “Red Teaming” (i.e., adversarial testing to determine a model’s flaws and weak points), and not the wider range of information that would help to address many of the other concerns outlined in the EO. The missing disclosures include:
What data sources the model is trained on. Availability of this information would assist in many of the other goals outlined in the EO, including addressing algorithmic discrimination and increasing competition in the AI market, as well as other important issues that the EO does not address, such as copyright. The recent discovery (documented by an exposé in The Atlantic) that OpenAI, Meta, and others used databases of pirated books, for example, highlights the need for transparency in training data. Given the importance of intellectual property to the modern economy, copyright ought to be an important part of this executive order. Transparency on this issue will not only allow for debate and discussion of the intellectual property issues raised by AI, it will also increase competition among developers of AI models to license high-quality data sources and to differentiate their models based on that quality. To take one example, would we be better off with medical or legal advice from an AI that was trained only on the hodgepodge of knowledge to be found on the internet, or from one trained on the full body of professional information on the topic?
Operational metrics. Like other internet-available services, AI models are not static artifacts, but dynamic systems that interact with their users. AI companies deploying these models manage and control them by measuring and responding to various factors, such as permitted, restricted, and forbidden uses; restricted and forbidden users; methods by which their policies are enforced; detection of machine-generated content, prompt injection, and other cybersecurity risks; usage by geography and, if measured, by demographics and psychographics; new risks and vulnerabilities identified during operation that go beyond those detected in the training phase; and much more. These should not be a random grab-bag of measures thought up by outside regulators or advocates, but disclosures of the actual measurements and methods that the companies use to manage their AI systems.
Policy on use of user data for further training. AI companies typically treat input from their users as additional data available for training. This has both privacy and intellectual property implications.
Procedures by which the AI provider will respond to user feedback and complaints. This should include its proposed redress mechanisms.
Methods by which the AI provider manages and mitigates risks identified via Red Teaming, including their effectiveness. This reporting should not just be “once and done,” but an ongoing process that allows researchers, regulators, and the public to understand whether the models are improving or declining in their ability to manage the identified new risks.
Energy usage and other environmental impacts.