topic modelling – June Oh

It has been almost seven years since Franco Moretti’s radical claim disturbed the humanities. The most fundamental way of reading a text, “close reading” (which I taught my undergraduates just last week and will continue to teach), is now being questioned. Given that it is the humanities’ job to challenge whatever counts as “fundamental,” it may not be so surprising that we are now questioning our own method of study. But this questioning is more than a challenge to a method; it is an examination of how we approach art and, ultimately, of what we are trying to get out of this field.

I can’t say for sure how many texts a scholar must read, but the comps list (comprehensive exam list: a list of texts you propose to read in order to write your doctoral dissertation) has around 150 works (both primary and secondary). Say you are an avid reader and read around 150 during your BA and MA, and 150 more for graduate seminars and other responsibilities (including pleasure reading). As a second-year PhD student about to compose my comps list, I would be at around 450, assuming I have already read every text on the list for my comps exam. As the New York Times puts it, Moretti asks, “So what?” Even if you are an avid reader and read 1,000 texts, what about all the other texts?

My immediate response would be that it is not about the number of texts you read but about what each text represents and how you bring your own critical eye to it. Right? But even if I do not entirely agree with Moretti, there is something insightful about so-called “distant reading.” I do share some concerns about quantitative approaches to literature, such as how numbers can represent the “infinite variation of human perception” (Drucker, 2017). But numbers do provide an additional perspective. They need not be an alternative to close reading; they can be another axis the humanities might consider.

I recently attended a distant reading workshop with Ted Underwood here at MSU. His project involves machine learning: roughly, a program that learns from examples rather than definitions and finds patterns that might be meaningful to an attentive eye. For example, his recent publication “The Life Cycles of Genres” traces the development of genres by calculating word frequencies and looking for patterns. When did the Gothic give way to, say, science fiction (if it did at all)? To answer such a question, one might start by defining what the Gothic is. Despite decades of scholarly effort, however, it is not easy to pin down what the Gothic really is. Instead of seeking a “definition” of this elusive genre, the quantitative method gathers a list of texts (the corpus) and finds patterns that recur throughout it. As Underwood admits, this approach may only confirm what we already knew rather than produce “novel” findings. But isn’t that what we “traditional” scholars do anyway? And at least we will know that the finding holds up quantitatively.
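To make the idea of learning “from examples rather than definitions” a little more concrete, here is a minimal sketch in Python. It is not Underwood’s code or model; it swaps in multinomial naive Bayes, one of the simplest word-frequency classifiers. The function names `train` and `predict` are hypothetical, and the four “texts” are made-up toy phrases, not data from any study. Notice that the program is never told what the Gothic *is*; it only counts which words tend to appear in texts someone has labeled Gothic.

```python
from collections import Counter
import math


def tokenize(text):
    # Lowercase and split on whitespace; real projects tokenize far more carefully.
    return text.lower().split()


def train(examples):
    """Learn word counts per genre from (text, genre) pairs. No definitions involved."""
    word_counts = {}        # genre -> Counter of word frequencies
    doc_counts = Counter()  # genre -> number of example texts
    vocab = set()           # every distinct word seen in any example
    for text, genre in examples:
        tokens = tokenize(text)
        word_counts.setdefault(genre, Counter()).update(tokens)
        doc_counts[genre] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the genre whose learned word frequencies best fit the text (naive Bayes)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for genre, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior: how common this genre is among the labeled examples.
        score = math.log(doc_counts[genre] / total_docs)
        for token in tokenize(text):
            # Add-one smoothing so a word never seen in this genre doesn't zero out the score.
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        scores[genre] = score
    return max(scores, key=scores.get)


# Toy, hypothetical labeled examples standing in for a real corpus.
examples = [
    ("the ruined castle and the ghost in the crypt", "gothic"),
    ("a dark abbey haunted by a pale spectre", "gothic"),
    ("the starship crossed the galaxy to a distant planet", "science fiction"),
    ("robots and rockets on a planet of the future", "science fiction"),
]
model = train(examples)
print(predict(model, "a ghost haunted the old castle"))  # gothic
print(predict(model, "a rocket to another planet"))      # science fiction
```

Swapping in a different set of labeled examples (say, one list compiled by librarians and another drawn from Goodreads) would change the word counts, and with them the predictions; that is one concrete way a corpus’s biases end up inside a model.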

Image: Ted Underwood’s new publication, “The Life Cycles of Genres”

The most intriguing part of Underwood’s talk was how he accepts that these models contain biases and uses that fact in his analysis. For example, how would a “science fiction” corpus assembled by librarians produce different results from one assembled from Goodreads? The talk made me think that perhaps what makes distant reading seem troubling is not the approach itself but simply how little work using it exists so far. The same would have been true of close reading. If only a hundred scholars were analyzing the nineteenth-century novel, their consensus about the period, if there were one, would not mean much. But if we have many data sets that question, or at least bring in, multiple assumptions, biases, and social norms, that could be the start of looking at art in a meaningful way.


“The Mechanic Muse”: Text Analysis and Distant Reading: Ted Underwood