I remember snickering when Chris Anderson announced "The End of Theory" in 2008. Writing in Wired magazine, Anderson claimed that the structure of knowledge had inverted. It wasn't that models and principles revealed the facts of the world but the reverse: the data of the world spoke their truth unassisted. Given that data were already correlated, Anderson argued, what mattered was to extract existing structures of meaning, not to pursue some deeper cause. Anderson's simple conclusion was that "correlation supersedes causation...correlation is enough."
This hypothesis -- that correlation is enough -- is the thorny little nexus at the heart of Wendy Chun's new book, Discriminating Data. Chun's topic is data analytics, a hard target that she tackles with technical sophistication and rhetorical flair. Focusing on data-driven technologies like social media, search, consumer tracking, and AI, Chun sets out to exhume the prehistory of correlation and to show that the new epistemology of correlation is not liberating at all but instead a kind of curse, recalling the worst ghosts of the modern age. As Chun concludes, even amid the precarious fluidity of hyper-capitalism, power operates through likeness, similarity, and correlated identity.
While interleaved with a number of divergent polemics throughout, the book focuses on four main themes: correlation, discrimination, authentication, and recognition. Chun treats these four as general problems in society and culture but also, interestingly, as specific scientific techniques. Correlation, for instance, has a particular mathematical meaning as well as a philosophical one. Discrimination is a social pathology, but it is also integral to discrete rationality. I appreciated Chun's attention to details large and small; she's writing about big ideas -- essence, identity, love and hate, what does it mean to live together? -- but she's also engaging directly with statistics, probability, clustering algorithms, and all the minutiae of data science.
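To make that narrow mathematical sense concrete, here is a minimal sketch -- mine, not Chun's or Anderson's, with invented numbers -- of Pearson's r, the workhorse correlation statistic. Nothing in the arithmetic distinguishes cause from effect; the coefficient registers only linear co-variation, which is exactly the gap that "correlation is enough" papers over.

    # A toy illustration of correlation in its strict statistical sense.
    # The data are made up for demonstration purposes.

    def pearson_r(xs, ys):
        """Pearson correlation coefficient between two equal-length samples."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / (var_x ** 0.5 * var_y ** 0.5)

    # Two invented series that move together: r near 1 says only that they
    # co-vary, not which (if either) causes the other.
    ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales    = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(round(pearson_r(ad_spend, sales), 3))  # ~0.999

That formal blankness is part of Chun's point: the same measure of likeness does its work whether the variables are ad budgets, zip codes, or proxies for race.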