I’m on leave this semester to do work in the Digital Humanities, so I’ll be posting a lot about that. My interest in DH is not, or has not been, quantitative, but I am expanding my range by dabbling in quantitative methods, currently with the help of Ted Underwood’s wonderful introduction to the topic.
At the end of Ted’s post, he provides a dataset and a program he wrote to find groups of words that form something like stylistic signatures in authors and genres. I’ve been playing with the program, with fascinating results. I’ll share one here. This is the list of overrepresented words in Jane Austen’s works according to one of the measures Ted uses:
WORDS OVERREPRESENTED BY MANN-WHITNEY RHO
1 very 0.985 3283
2 wishing 0.984 154
3 staying 0.982 176
4 satisfied 0.977 188
5 fortnight 0.975 152
6 herself 0.973 1553
7 agreeable 0.973 350
8 be 0.971 2645
9 smallest 0.971 182
10 any 0.971 1112
11 really 0.968 555
12 acquaintance 0.967 462
13 excessively 0.967 91.8
14 nothing 0.967 639
15 assure 0.965 268
16 settled 0.964 261
17 marrying 0.964 196
18 much 0.964 841
19 attentions 0.962 212
20 encouraging 0.961 51
21 directly 0.96 290
22 deal 0.96 329
23 warmly 0.96 96.3
24 must 0.96 1141
25 sorry 0.958 198
26 certainly 0.957 323
27 not 0.957 2023
28 tolerably 0.957 95.9
29 handsome 0.957 136
30 quite 0.956 765
31 been 0.956 899
32 exactly 0.955 248
33 invitation 0.955 194
34 being 0.954 699
35 obliged 0.954 280
36 seeing 0.954 206
37 always 0.953 470
38 pleasantly 0.952 37.8
39 delighted 0.951 107
40 talked 0.95 342
41 perfectly 0.949 283
42 distressing 0.949 61.5
43 solicitude 0.949 89.7
44 comfortable 0.948 167
45 walking 0.948 129
46 continuing 0.947 39.1
47 engaged 0.945 120
48 enjoyment 0.942 122
49 dislike 0.941 86.7
50 talking 0.941 194
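For readers who want a feel for the measure behind this list, here is a minimal sketch of how a Mann-Whitney rho could be computed for a single word. It is not Ted's program. It assumes that rho is the Mann-Whitney U statistic divided by the number of pairs: the share of (Austen text, comparison text) pairs in which the word is relatively more frequent in the Austen text, with ties counted as half. The function name and the frequencies in the example are invented for illustration.

```python
def mann_whitney_rho(target, reference):
    """Share of (target, reference) pairs in which the target value is larger.

    Ties count as half a win. Assumption: this is the Mann-Whitney U
    statistic divided by len(target) * len(reference).
    """
    if not target or not reference:
        raise ValueError("both samples must contain at least one value")
    wins = 0.0
    for t in target:
        for r in reference:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(target) * len(reference))


# Invented relative frequencies of one word (NOT real data):
# one value per text in each group.
austen_texts = [0.012, 0.011, 0.013]
other_texts = [0.004, 0.006, 0.012, 0.005]

rho = mann_whitney_rho(austen_texts, other_texts)
print(f"rho = {rho:.3f}")
```

On this reading, a value near 1 means the word is more frequent in almost every Austen text than in almost every comparison text, and a value near 0.5 means no consistent difference.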
The list is interesting in many ways, especially in comparison to the corresponding lists for other authors, but I want to emphasize a side point. “Very” tops the list, and it may also top the list of words I discourage my students from using in their papers. (Mark Twain: “Substitute ‘damn’ every time you’re inclined to write ‘very;’ your editor will delete it and the writing will be just as it should be.”) And that’s not all: I push students to minimize adverbs, intensifiers, terms of certainty, and “to be” constructions. Such words infuse Austen’s list:
WORDS OVERREPRESENTED BY MANN-WHITNEY RHO
1 very 0.985 3283
8 be 0.971 2645
11 really 0.968 555
13 excessively 0.967 91.8
21 directly 0.96 290
23 warmly 0.96 96.3
26 certainly 0.957 323
28 tolerably 0.957 95.9
30 quite 0.956 765
31 been 0.956 899
32 exactly 0.955 248
34 being 0.954 699
37 always 0.953 470
38 pleasantly 0.952 37.8
41 perfectly 0.949 283
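A crude rule is enough to pull this subset out of the full list: flag any word that ends in "-ly", plus a short hand-picked set of "to be" forms and other adverbs. The sketch below is only an illustration of that rule. The two hand-picked sets and the function name are my own assumptions, not part of Ted's program, and the "-ly" test would misfire on a larger vocabulary (think of "family").

```python
# The fifty words from the full list, in rank order.
AUSTEN_TOP_50 = [
    "very", "wishing", "staying", "satisfied", "fortnight",
    "herself", "agreeable", "be", "smallest", "any",
    "really", "acquaintance", "excessively", "nothing", "assure",
    "settled", "marrying", "much", "attentions", "encouraging",
    "directly", "deal", "warmly", "must", "sorry",
    "certainly", "not", "tolerably", "handsome", "quite",
    "been", "exactly", "invitation", "being", "obliged",
    "seeing", "always", "pleasantly", "delighted", "talked",
    "perfectly", "distressing", "solicitude", "comfortable", "walking",
    "continuing", "engaged", "enjoyment", "dislike", "talking",
]

# Hand-picked for this illustration (an assumption, not a linguistic resource).
TO_BE_FORMS = {"be", "been", "being"}
OTHER_ADVERBS = {"very", "quite", "always"}


def is_discouraged(word):
    """Rough heuristic: '-ly' adverbs plus the two hand-picked sets."""
    return word.endswith("ly") or word in TO_BE_FORMS or word in OTHER_ADVERBS


# Keep the original rank alongside each flagged word.
flagged = [
    (rank, word)
    for rank, word in enumerate(AUSTEN_TOP_50, start=1)
    if is_discouraged(word)
]

for rank, word in flagged:
    print(rank, word)
```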
I’ve thought many times about writing a handout on style that outlines the conventional guidelines of modern, essayistic style with counterexamples from great literature. (What would Hamlet do without “to be”?) But this list encourages me to take such thinking a step further: Austen’s case alone could become the foundation of a unit on voice, style, and convention.
4 Comments
tedunderwood · January 23, 2013 at 5:32 pm
I’m thrilled to see that dataset was useful for you, and I think this is a very good example of the sort of thing we can learn. I had noticed that “very” was a salient feature of Austen’s style — but I hadn’t noticed that it’s just the tip of an adverbial iceberg. That seems significant to me, and resonates with my memory of her voice. I’d like to know whether this is something about Austen in particular, or whether it’s a characteristic feature of a period/genre. I don’t intuitively know which is more likely.
Erik Simpson · January 25, 2013 at 4:00 pm
Yes, the big question! I just did a follow-up post trying to answer it in a preliminary way. See what you think–
PocoPuffs · July 1, 2013 at 1:16 am
Hey you two– so happy to see other people working through these thoughts on Austen’s style. I only just read _Sense and Sensibility_ for the first time (an admittedly big gap for a Victorianist in particular and an English grad in general) and I was struck by the patterns of intensifying (very/really) and general qualifying (just/not/so/tolerably). Hadn’t had much luck until I found this post and the follow-up. I appreciate the corroborating data!
Coding Meeting #1: The Mann-Whitney Test & Epigraphic Language | Digital Experiments at NYU · January 29, 2014 at 7:43 pm
[…] Simpson, “Jane Austen and Contemporary Prose Style” – uses Mann-Whitney to identify overrepresented in Austen’s […]