
Drowning in a Mechanical Chorus (for Issues in Science and Technology)

Letter to the Editor
Issues in Science and Technology
Volume XL.3 Spring 2024

In her thoughtful essay, “How Generative AI Endangers Cultural Narratives” (Issues, Winter 2024), Jill Walker Rettberg writes about the potential loss of a beloved Norwegian children’s story alongside several “misaligned” search engine results. The examples are striking. They also point to even more significant challenges implicit in the framing of the discussion.

The fact that search results in English overwhelm those in Norwegian, which has far fewer global speakers, reflects the economic dominance of the American technology sector. Millions of people, from Moldova to Mumbai, study English in the hope of furthering their careers. English, despite, and perhaps because of, its willingness to borrow from other cultures, including the Norse, has become the de facto lingua franca in many fields, including software engineering, medicine, and science. The bias toward English in search results therefore reflects the socioeconomic realities of the world.

Search engines of the future will undoubtedly do a better job of localizing query results. And the improvement might come exactly from the kind of tightly curated machine learning datasets that Rettberg encourages us to consider. A large language model “trained” on local Norwegian texts, including folk tales and children’s stories, will serve more relevant answers to a Norwegian-speaking audience. (In brief, large language models are trained, using massive textual datasets consisting of trillions of words, to recognize, translate, predict, or generate text or other content.) But here is the crucial point: no amount of engineering can make a model more fair or more equitable than the world it is meant to represent. To improve it, we must improve ourselves. Technology encodes global politics (and economics) as they are, not as they should be. And we humans tend to be a quarrelsome bunch, rarely converging on the same shared vision of a better future.

The author’s conclusions suggest we consider a further, more troubling, aspect of generative AI. In addition to the growing dominance of the English language, we have yet to contend with the increasing mass of machine-generated text. If the early large language models were trained on human input, we will likely soon reach the point where generated output far exceeds any original input. That means the large language models of the future will be trained primarily on machine-generated inputs. In technical terms, this results in overfitting, where the model follows too closely in its own footsteps, unable to respond to novel contexts. It is a difficult problem to solve, first because we cannot reliably tell human-written and machine-generated texts apart, and second because any novel human contribution is likely to be overwhelmed by the zombie horde of machine outputs. The voices of any future George R. R. Martins or Toni Morrisons may simply drown in a mechanical chorus.

Will human creativity survive the onslaught? I have no doubt. The game of chess, for example, became more vibrant, not less, with the early advent of artificial intelligence. The same, I suspect, will hold true in other domains, including the literary, where humans and technology have long conspired to bring us, at worst, countless hours of formulaic entertainment and, at their collaborative best, the incredible powers of near-instantaneous translation, grammar checking, and sentence completion, all scary and satisfying in any language.
