Please find below my response to "A Questionnaire on Art and Machine Learning," published in the current issue of October magazine. The text was also translated into German and published as "Ein Begehren namens Synthese," in TEXT+KRITIK: Das Subjekt des Schreibens--Über Große Sprachmodelle, edited by Hannes Bajohr and Moritz Hiller.
Today's artificial intelligence is a tool for generating new numbers from patterns in massive piles of old numbers. Given the recent ebullience around AI, it's important not to lose sight of this. These tools are no doubt dazzling, but they are essentially next-word predictors, or next-pixel predictors. I stress today's because the history is important. Modern research into artificial intelligence began, in the decades after World War II, by using approaches grounded in logic and symbolic rationality. After this early approach largely failed, leading to an "AI winter," engineers eventually retooled with data-driven and empirical methods. Concurrent with this new wave came an unprecedented proliferation of human data via emailing, blogging, the authoring of HTML, the snapping of digital photos, etc., much of which was posted publicly or accessible internally to the cloud platforms that hosted it all. This data furnished the fuel for today's data-centric AI.
One consequence of this history is a shift in the balance between data and algorithms. Software development entails a variety of kinds of input data (global variables, input files and databases, graphical elements for the user interface, essentially anything that can't be generated procedurally). At the same time, development requires a complex set of procedures (function calls, simple arithmetical and logical operations, if/then control structures). For many years, the normal way to do software development was to have a relatively small amount of data and a relatively large number of procedures. "Normal" is often a contested word, to be sure. But I mean everything from when Linus Torvalds built the Linux kernel to when Cory Arcangel wrote the assembly code for Super Mario Clouds. Today's AI inverts that proportion. Instead of a few variables and data inputs appended to an extended set of procedures, we find massive amounts of data paired with a relatively small codebase. Sure, the code repository at OpenAI or Google is large, but their data stores are almost immeasurably larger. In fact, you or I could program a simple machine-learning algorithm in just a few hundred lines of code. Today's AI is not algorithmically elaborate, even if it remains data intensive. The data is heavy and the procedures are light.
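To make that last claim concrete, here is a minimal sketch of the kind of program I mean: an ordinary least-squares fit by gradient descent, written in plain Python. The four (x, y) pairs and the specific numbers are invented purely for illustration; what matters is the proportion. The procedure is a dozen lines, and everything the program "knows" comes from the data fed to it.

# A minimal sketch: least-squares fitting by gradient descent,
# plain Python, no libraries. The data points are invented for illustration.
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]  # (x, y) pairs

w, b = 0.0, 0.0          # the entire "model": two numbers
learning_rate = 0.01

for step in range(5000):
    grad_w, grad_b = 0.0, 0.0
    for x, y in data:
        error = (w * x + b) - y           # prediction minus target
        grad_w += 2 * error * x / len(data)
        grad_b += 2 * error / len(data)
    w -= learning_rate * grad_w           # nudge the parameters toward a better fit
    b -= learning_rate * grad_b

print(w, b)   # roughly 2 and 1: the line y = 2x + 1 fits the data

Scale the same idea up (billions of parameters instead of two, oceans of text and images instead of four points, a more elaborate model in place of a straight line) and you have, in broad strokes, the training loop behind today's systems: the code stays short, while the data does the heavy lifting.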