Someone Will Win the AI Wars, It Just Won’t Be You

The New York Times Company’s lawsuit against Microsoft and OpenAI is just one battle in a larger conflict–if a potentially pivotal one. Thanks in part to America’s “adversarial” approach to legal disputes, however, for most of us the outcome could scarcely matter less. Copyright advocates are assembling cognizable arguments for why rich, powerful media companies like the Times deserve to profit from the technological innovations of others. Tech companies are employing armies of lawyers to reframe the law in favor of their patron investors (and pet politicians). But why should any of these parties be permitted to gatekeep our language and culture? No matter who wins this lawsuit, most of us will lose.

Specifically, we stand to lose a measure of autonomy. Precisely how we will incur that loss is complicated and somewhat speculative, because Artificial Intelligence (“AI”) technology appears increasingly capable of empowering those who control it in unprecedented ways. But Sam Altman, the former and current CEO of OpenAI, observed that he expects AI to “be capable of superhuman persuasion well before it is superhuman at general intelligence.” Suppose he’s right–suppose that in the near future, AI technology, like ChatGPT or its successors, acquires the power to persuade you to believe effectively anything you are even slightly open to believing. Whether that means selling you merchandise, manipulating your politics, or defrauding you in some way, what possible defense can humans have against the superhuman?

The question is not rhetorical. The obvious counter to a superhuman adversary is a superhuman response, and this is true whether your opponent is a preternatural persuader, a hyper-powered hacker, or any other flavor of ingenious (artificial) intellect. In the not-too-distant future, your best defense against exploitation by corporate or criminal AI may well be your own AI, one that is aligned with your personal values and interests, tuned to your own circumstances and desires.

Much has been written on the “alignment” problem, almost always framed as a contest between broadly construed “human values” and their presumed machine equivalent. An AI trained to place infinite value on the production of paperclips might try to transform everything into paperclips–including humans who, as a rule, would prefer not to be transformed into paperclips. So a great deal of AI research has been directed toward “aligning” AI with human values. But an AI trained to sell you widgets through superhuman persuasion could still plausibly be aligned. If you knew a sales AI with superhuman persuasion wouldn’t harm you, or say problematic things, or despoil the environment, but only sell you fairly priced goods that could reasonably be expected to increase your quality of life, would you even want a defense against it?

That question is rhetorical, because it’s almost certainly moot: whether you want a defense is immaterial to the likelihood of getting one. The ability to create or acquire AI technology aligned with your personal values–values that may well be at odds with those of the people and organizations around you–is not something any pending legislation or litigation aims to bring about. Of course, most people lack the equipment, ability, and desire to acquire, develop, and maintain their own personally aligned AI. Furthermore, while corporations use a technique known as Reinforcement Learning from Human Feedback (RLHF) to prevent their AI products from parroting Nazi propaganda or creating pornographic “deepfakes,” independently trained models need not inherit such limitations. So it might seem deeply wonkish to fret over speculative conflict with hypothetical models when the AI already developed is powering genuine mischief to which personal alignment is not a clear solution.

But it is precisely such mischief that has laid bare the tension at the heart of a rising moral panic centered on AI technology. It is a tension between the things a personally aligned AI can empower individuals to do, and the intuitions we have about what individuals should be empowered to do. For example, award-winning webcomic artist Phil Foglio recently shared on social media a list of artists whose work was used to train AI startup Midjourney’s text-to-image model, allowing users to craft images in their favorite artist’s style–or, at least, close enough to cause concern. The fact that human artists have been imitating one another for millennia, that skilled artists are often expected to be capable of imitating the work and style of others, does not alleviate the concern artists have for their livelihoods. After all, even the most prolific of human plagiarists could never hope to match the sheer volume of Midjourney’s output.

Should AI be permitted to fulfill the artistic dreams of people who are not themselves skilled artists, who could never even afford to commission the art they long to create? For that matter, what about people who are not themselves attorneys, whose legal rights go undefended for want of affordable representation? What about people who always wanted to write a novel, but could never find the time (or grammar)? Some have argued that, far from threatening our livelihoods, generative AI is an opportunity for everyone to have around-the-clock access to an omnicompetent assistant–a kind of digital genie capable of rapidly, cheerfully facilitating any task to which we put ourselves.

Or, in the alternative, facilitating only those tasks the government, our employers, or the CEOs of AI companies deem worthy of facilitating. The same “alignment” that prevents you from using generative AI to infringe on someone’s copyright or build a nuclear bomb can also steer you away from permissible thought, expression, and activity. Suppose after reading Dante’s Inferno you’re inspired to create a photorealistic depiction of your least-favorite politician, stripped naked and chewed eternally in the mouth of Satan at the bottom of Hell. Before generative AI, you could only proceed if you had already dedicated much of your life to the visual arts, or if you had accumulated sufficient wealth to pay someone else to realize your vision. Today, generative AI could grant your desire almost instantly, but not if you’re using a model with strict guardrails against generating nudity, celebrity images, or political “misinformation” (as obvious satire has already come to be called by people who should know better). Only personally aligned AI democratizes the skills and abilities the technology promises to put into everyone’s hands. AI aligned with institutional values is AI that serves institutional interests.

Which brings us back to the lawsuit filed by the New York Times Company. Much of the text reads like ad copy, trumpeting the paper’s journalistic merits; in fact, whether it is spreading COVID misinformation, doxing pseudonymous bloggers, or uncritically amplifying terrorist propaganda, the New York Times routinely falls embarrassingly short of the “groundbreaking, in-depth journalism and breaking news” its lawyers claim the company provides “at great cost.” But the puffery of the paper’s attorneys is not entirely baseless. As a print institution nearing its 175th birthday, the Times controls a trove of linguistic data, much of it untarnished by the memetic tics of “Information Age” hoi polloi. Everything published in the Times since January 1st, 1929 remains protected by American copyright law, so the only way to copy Times articles from the last 95 years without accruing civil liability is to either engage in Fair Use or get permission from the Times.

In the United States, Fair Use allows copyrighted works to be reproduced without the owner’s permission under some circumstances. How the doctrine applies to AI remains untested; the statutory language is brief:

[T]he fair use of a copyrighted work … for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include–

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.

Many AI-developing companies, including Meta and Microsoft, have already argued that even though training AI models using copyrighted data does create incidental, ephemeral copies, this should be regarded as Fair Use. Google even suggests that training an AI is just the technological equivalent of reading a book. It’s difficult to say what American courts will make of such arguments; judicial analysis of Fair Use is “notoriously fickle,” “notoriously unpredictable,” “notoriously uncertain in scope,” “notoriously fuzzy in application,” “notoriously open-ended and hard to predict,” perhaps just plain notorious. But the Times Company’s complaint provides numerous instances of AI trained on their products reproducing, verbatim, large quantities of copyrighted text. Even if future models are trained to avoid direct or unattributed quotations, the mere possibility appears to undermine Google’s metaphors, strengthening the News Media Alliance’s expansive argument against AI training as a form of Fair Use.
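The complaint’s verbatim-reproduction exhibits suggest a simple way to make that claim concrete. The sketch below is purely illustrative–the function name and test strings are my own invention, and the Times’ lawyers assembled their exhibits by other means–but it shows the general technique: find the longest run of consecutive words that a model’s output shares verbatim with a source text, via a word-level longest-common-substring dynamic program.

```python
def longest_verbatim_overlap(source: str, output: str) -> list[str]:
    """Return the longest run of consecutive words appearing verbatim
    in both texts, found by word-level longest-common-substring DP."""
    a, b = source.split(), output.split()
    best_len, best_end = 0, 0          # length and end index (in b) of best run
    prev = [0] * (len(b) + 1)          # prev[j]: run length ending at a[i-1], b[j-1]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run of matching words by one.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr
    return b[best_end - best_len:best_end]

# A shared five-word run is flagged even when the surrounding text differs:
# longest_verbatim_overlap("the quick brown fox jumps over the lazy dog",
#                          "he said the quick brown fox jumps today")
# → ["the", "quick", "brown", "fox", "jumps"]
```

A matcher along these lines is how one might operationalize “large quantities of copyrighted text”: in the complaint’s exhibits, the shared runs reportedly stretch to hundreds of words.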

A holding against Fair Use, while statutorily sound, would be worse for each of us, individually, than it would be for Microsoft, Google, and the rest of Big Tech. Microsoft alluded to this somewhat when it argued that

licensing schemes will . . . impede innovation from start-ups and entrants who don’t have the resources to obtain licenses, leaving AI development to a small set of companies with the resources to run large-scale licensing programs or to developers in countries that have decided that use of copyrighted works to train AI models is not infringement.

In other words, a victory for the New York Times Company would not stop or likely even slow the production of AI aligned with Big Tech values. It would cost those companies money, as the Times (and every other institutional owner of large quantities of text) would be empowered to dictate licensing terms, effectively extracting rents for their continued ownership of text long since bought and paid for. But enterprising individuals and small organizations hoping to train AI aligned to their own values would likely be faced with cripplingly high licensing fees (compare, for example, stories of independent filmmakers beset by rights-clearing problems). Even academic researchers, the parties most explicitly protected by American Fair Use law, often find themselves on the losing end of expensive lawsuits. In practice, the people most likely to possess personally aligned AI would be those able to offshore their development to countries with looser copyright protections — or those willing to risk the legal consequences of blatant infringement. Unfortunately, the independent authors, artists, and other professionals whose livelihoods most depend on reliable copyright protection are also unlikely to benefit much from a holding against training as Fair Use. Individually, their bodies of work lack the volume to warrant substantial royalties from Big Tech. They, too, would lose out on the potential benefits of personally aligned AI, in exchange for the opportunity to be lowballed–or, more likely, simply ignored–by corporate interests.

It might be suggested that such prudential arguments carry no weight against the strength of the Times’ statutory construction, but a finding in favor of training as Fair Use remains well within the ambit of American jurisprudence. In 1946 the Supreme Court held, in U.S. v. Causby, that the federal government worked no compensable “taking” when it declared the navigable airspace a public highway, statutorily obliterating centuries of property law (only intrusions into the “immediate reaches” above the land, like the low military flights over Causby’s farm, required compensation). From the 13th century to the mid-20th, the common law rule of ownership was “cuius est solum, eius est usque ad coelum et ad inferos”–whoever owns the land owns everything above it and below it, too. Prior to Causby, some commentators believed that ubiquitous interstate air travel was likely to be economically infeasible without amending the Constitution, as airlines would need to negotiate royalties with individual property owners and, perhaps, fly convoluted routes to avoid trespassing. Instead, with the cooperation of the Court, an Act of Congress sufficed. Most people grasp the benefits of air travel (and those most concerned with its externalities seem happy to fly anyway). Without air travel, the reasonable value of rights to the sky above one’s property surely approached zero. So it’s not difficult to understand why Congress and the Court would show favoritism toward the economically promising technology, despite centuries of contrary tradition and the expert view that they lacked the authority to do so.

Today, AI technology occupies a similar position in the economy, the public imagination, and the arguments of its advocates. In response to the Times lawsuit, OpenAI has already stated that “it would be impossible to train today’s leading AI models without using copyrighted materials” and that their mission is “to ensure that artificial general intelligence benefits all of humanity.” Meta has highlighted the de minimis value of the copyrighted material. A holding against training as Fair Use would strangle the infant technology in its crib, stifling innovation and potentially giving countries with lax intellectual property laws a new competitive edge. A holding permitting Big Tech to pillage archival material for its own ends would also give independent researchers, hobbyists, and small business owners a chance to pursue personal alignment and related innovation.

Or would it? Suppose the New York Times Company loses its case. At present, many archived New York Times articles can be accessed, free of charge, on the company’s website. Others are readily available for a relatively small fee. The text in these articles is machine-readable, meaning that it is easy for academics to search for useful text, copy and paste verbatim quotes, and otherwise enjoy the wonderfully convenient world of not spending hours poring over microfiche. If the New York Times Company cannot use law to prevent its databases from being ransacked for the benefit of Big Tech shareholders, it has other options for archive management, including draconian click-through agreements, invasive digital rights management (DRM), and even just not providing access. The Times Company is still a company, still a profit-driven undertaking, so realistically none of this will stop Microsoft, OpenAI, Google, Meta, or others from eventually coughing up cash for access to those archives–with, perhaps, an additional agreement that the final product also be aligned against disparaging the Times. But your access to those archives will become much more limited or expensive, especially if you want to use them to develop a personally aligned AI.

That said, it would be a little surprising if the case ever reaches trial. Any lawyer worth their retainer fee will tell you that if your case goes to trial, you’ve already lost. Both parties will aggressively pursue a settlement, and likely reach some kind of compromise. The upshot for their shareholders is that both the Times Company and its adversaries will accrue financial benefit without having their legal rights or responsibilities clarified by the government in a binding way. This also leaves their smaller competitors–independent researchers and nimble startups–foundering in a sea of continued uncertainty and barriers to entry. In much the way that YouTube’s copyright enforcement scheme tends to favor large media companies while proving something of a nightmare for small content creators, when companies leverage legal filings in negotiation without seeing their lawsuits through to clear precedent on matters of public concern, government processes come to favor industry at the expense of individuals. This is unlikely to change without dramatic intellectual property reform, including a regime of compulsory licensing governed by industry outsiders to insulate it from regulatory capture. Also important would be robust financial and legal support for open-source projects and right-to-reverse-engineer initiatives. I am not at present aware of any plausibly efficacious efforts toward such comprehensive reform.

In other words, in the end, someone will win the AI wars. It just won’t be you. Admittedly, if history is any guide, you are unlikely to notice–or care. Intellectual property reform has been essentially no one’s political priority for years, and the possibility of generative AI developing superhuman powers of persuasion hasn’t changed that. It’s not difficult to inspire sympathy for scrappy individuals — artists, authors, independent researchers–or vitriol for wealthy multinational corporations. But that hasn’t stopped billions of people from buying (and loving!) mobile devices that invade their privacy and sell their data. Doubtless we will find similar reasons to love the omnicompetent personal assistants OpenAI and Microsoft and others are presumably preparing to rent to us each month. They’ll increase our productivity, manage our schedules, and read us the news (which they will also mostly write). They’ll act as personal trainers, life coaches, even romantic partners. Sure, they might moralize at us if we ask them to whip up a batch of pornographic deepfakes or help us dissemble on the causes of the American Civil War, but that sort of thing is tacky anyhow. Should they also decline to help us write a negative Yelp review about one of Google’s business partners, or a whistle-blowing exposé on the unsavory activities of certain federal employees, well–what about it? We can probably manage that sort of thing without superhuman assistance.

In 2016, a social media video posted by the World Economic Forum predicted that by 2030, “You’ll own nothing. And you’ll be happy.” With generative AI, this sentiment could extend to your every expression–and every thought. If that seems alarmist, well, maybe Sam Altman was mistaken. Maybe generative AI and its successors will never acquire a superhuman capacity for persuasion. After all, at present AI can’t even distinguish between truth and falsity, producing instead only what the philosopher Harry Frankfurt called “bullshit.” Because generative AI has no access to the material world, it has no relationship with veridical facts–only with the proxy of mediated consensus viewpoints. So long as AI development remains dependent on human interventions like RLHF, superhuman persuasion abilities seem unlikely to manifest. As I’ve asked before, why pursue solutions to problems we might, with luck, never have?

Or maybe, in the not-too-distant future, I’ll get a chance to pose that question to the omnicompetent superhuman persuader included in my monthly mobile subscription. I’ll still know, on some level, that it is opaquely aligned with the values of a faceless corporation. Somehow, I will find its answer satisfactory anyway.