Measuring and mapping

[Cross-posted from]

I’ve been thinking about Justin Smith’s post Philosophometry, with its reference to Franco Moretti’s Graphs, Maps, Trees: Abstract Models for Literary History, and more generally to “the value of quantitative, digitally based study” of the texts one is interested in. There is, as Smith says, a good deal of discussion of such approaches in the humanities, if not in philosophy; this is part of what goes on under the name of ‘digital humanities’. This is something by which I’ve been persistently intrigued, despite never really doing anything about it.

There is a problem — at least a practical one — with the approach Smith has in mind. One apparently needs “to compile a massive database of texts, titles, key words [and] key arguments”. But how do we do this? Generating a database in this way apparently requires a good deal of interpretation. Do we have to commit to close reading of everything, before we can do the data analysis? If the project is to map the locations of occurrence of certain views, then probably yes. But is there the same necessity in all ‘digital humanities’ approaches to history of philosophy?

One paper that has attempted an approach of this sort in the history of modern philosophy, with explicit reference to Moretti, is Shaun Nichols’s ‘The Rise of Compatibilism: A Case Study in the Quantitative History of Philosophy’ (Midwest Studies in Philosophy 31 (2007), 260-70) [pdf]. And Nichols addresses this problem:

Whereas for Moretti, the unit of analysis is the book, for our project, the relevant unit will be the philosopher. And rather than sort philosophers into genres, we will place them into the space of philosophical positions. How will we determine which philosophers go into which slots? It will not do for the quantitative historian of philosophy to decide on his own where the philosophers are located in philosophical space. For that will be a source for something like experimenter bias. Rather, the quantitative historian should rely on the work of others to determine where the philosophers lay in the philosophical geography. There is, however, an obvious problem with this method. Historians of philosophy notoriously disagree about how to interpret the philosophers. That’s their bread and butter. Indeed, from an outsider’s perspective, it can seem that the surest path to fame as a historian of philosophy is to make some outrageously heterodox interpretation seem plausible. Descartes isn’t a rationalist, Berkeley isn’t an idealist, Kant isn’t a Kantian. For doing quantitative history of philosophy, we want to avoid such controversies. Ideally, we will want to use the dominant interpretations among the experts in the area. The expectation is that if we get a good number of philosophers into our sample, it will not matter all that much if some of the standard interpretations we use are mistaken. Of course, if most standard interpretations are wrong, that spells real trouble for the quantitative history of philosophy. But if we are so bad at interpretation, this spells trouble for history of philosophy quite generally (Nichols, 262-3).

Nichols’s approach here relies on dominant interpretations. But one of the appealing features of doing this sort of thing (‘distant reading’) is being able to take some account of large numbers of texts, many of which have not been the subject of large amounts of close interpretive attention. In such cases there may not be a genuinely dominant view. There may not be any view at all. Nichols himself ends up considering twenty early modern figures who addressed free will. This is, as he notes himself, not a random sample (268, n18). It’s also just not a very big one. This is more like analysing the canon than analysing the much larger array of philosophical texts (or philosophers) of early modern Europe. That’s not to say there’s nothing to be learned here. And this approach does solve the problem of the need to do an overwhelming amount of close reading to generate the initial database. But it solves it at the expense of one of the attractive features of data-driven approaches, the ability to take at least some account of unusually large numbers of texts.

But this is all about approaches that require interpretation before doing the “quantitative, digitally based study”. There are people out there doing things that don’t really require that. The alternative is to do that study using texts themselves, rather than some encoded information about the texts. For an example, see this talk by Glenn Roe on ‘text-mining’, with reference to Diderot and the Encyclopédie: slides and summary; recording; and related paper on investigating the sources of the Encyclopédie. There’s also a whole host of people interested in topic-modeling in the humanities (see for instance this guide to topic modeling for humanists).
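To make the contrast concrete, here is a very rough sketch of what working from the texts themselves might look like at its simplest: counting how often chosen terms occur in each text, with no prior decision about what position any author holds. This is not Roe's method, and it is far cruder than topic modeling. The function names, the term list, and the two-sentence "texts" (invented for illustration, not quotations from anyone) are all my own assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def relative_frequencies(corpus, terms):
    """For each document, return occurrences of each term per 1,000 tokens."""
    results = {}
    for title, text in corpus.items():
        tokens = tokenize(text)
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on an empty text
        results[title] = {t: 1000 * counts[t] / total for t in terms}
    return results


# Hypothetical toy corpus: made-up sentences standing in for full texts.
corpus = {
    "text_a": "Liberty and necessity are consistent. "
              "Liberty is the absence of impediments.",
    "text_b": "The will is free. "
              "Necessity destroys the freedom of the will.",
}

freqs = relative_frequencies(corpus, ["liberty", "necessity", "will"])

for title, row in freqs.items():
    print(title, {term: round(value, 1) for term, value in row.items()})
```

Scaling counts to a rate per 1,000 tokens is just one way of making texts of different lengths comparable. Choosing which terms to count is itself a small interpretive decision, so even this does not escape interpretation entirely; it only postpones most of it until after the counting.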

All of this is quite far from writing papers that offer new interpretations of famous arguments in canonical texts. And perhaps one main point of this post is motivational rather than argumentative — I should get on and work on some small project in this realm myself, to see where it leads, rather than persistently being intrigued from the outside. Any thoughts?