I’m on leave this semester to do work in the Digital Humanities, so I’ll be posting a lot about that. My interest in DH has not been quantitative so far, but I am expanding my range by dabbling in quantitative methods, currently with the help of Ted Underwood’s wonderful introduction to the topic.

At the end of Ted’s post, he provides a dataset and a program he wrote to find groups of words that form something like stylistic signatures in authors and genres. I’ve been playing with the program, with fascinating results. I’ll share one here. This is the list of overrepresented words in Jane Austen’s works according to one of the measures Ted uses:


WORDS OVERREPRESENTED BY MANN-WHITNEY RHO
1 very 0.985 3283
2 wishing 0.984 154
3 staying 0.982 176
4 satisfied 0.977 188
5 fortnight 0.975 152
6 herself 0.973 1553
7 agreeable 0.973 350
8 be 0.971 2645
9 smallest 0.971 182
10 any 0.971 1112
11 really 0.968 555
12 acquaintance 0.967 462
13 excessively 0.967 91.8
14 nothing 0.967 639
15 assure 0.965 268
16 settled 0.964 261
17 marrying 0.964 196
18 much 0.964 841
19 attentions 0.962 212
20 encouraging 0.961 51
21 directly 0.96 290
22 deal 0.96 329
23 warmly 0.96 96.3
24 must 0.96 1141
25 sorry 0.958 198
26 certainly 0.957 323
27 not 0.957 2023
28 tolerably 0.957 95.9
29 handsome 0.957 136
30 quite 0.956 765
31 been 0.956 899
32 exactly 0.955 248
33 invitation 0.955 194
34 being 0.954 699
35 obliged 0.954 280
36 seeing 0.954 206
37 always 0.953 470
38 pleasantly 0.952 37.8
39 delighted 0.951 107
40 talked 0.95 342
41 perfectly 0.949 283
42 distressing 0.949 61.5
43 solicitude 0.949 89.7
44 comfortable 0.948 167
45 walking 0.948 129
46 continuing 0.947 39.1
47 engaged 0.945 120
48 enjoyment 0.942 122
49 dislike 0.941 86.7
50 talking 0.941 194
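
For readers wondering what a score like 0.985 measures, here is a minimal sketch. It assumes the rho in the list is the Mann-Whitney U statistic divided by the number of pairs compared, which is the usual reading of the term; it is not Ted's program. The function name and the per-volume frequencies are invented for illustration. Under that assumption, rho is the share of (Austen volume, other volume) pairs in which the word is more frequent in the Austen volume, so a value near 1 means the word is more common in almost every Austen volume than in almost every comparison volume.

```python
from itertools import product


def mann_whitney_rho(target, reference):
    """Share of (target, reference) pairs where the target value is higher.

    Ties count as half a win. This equals U / (n1 * n2) for the
    Mann-Whitney U statistic of the target sample.
    """
    if not target or not reference:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for t, r in product(target, reference):
        if t > r:
            wins += 1.0
        elif t == r:
            wins += 0.5
    return wins / (len(target) * len(reference))


# Hypothetical frequencies of one word, per volume (made-up numbers,
# not taken from the dataset discussed in this post).
austen_volumes = [41.0, 38.5, 44.2, 39.9]
other_volumes = [12.3, 20.1, 40.5, 8.7, 15.0]

rho = mann_whitney_rho(austen_volumes, other_volumes)
print(round(rho, 3))  # 18 of the 20 pairs favor Austen, so this prints 0.9
```

Because the measure compares ranks rather than raw counts, a single volume that uses a word heavily cannot push the score up on its own; the word has to be more frequent across most of the author's volumes.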

The list is interesting in many ways, especially in comparison to the corresponding lists for other authors, but I want to emphasize a side point. “Very” tops the list, and it may also top the list of words I discourage my students from using in their papers. (Mark Twain: “Substitute ‘damn’ every time you’re inclined to write ‘very;’ your editor will delete it and the writing will be just as it should be.”) And that’s not all: I push students to minimize adverbs, intensifiers, terms of certainty, and “to be” constructions. Such words infuse Austen’s list:


WORDS OVERREPRESENTED BY MANN-WHITNEY RHO
1 very 0.985 3283
8 be 0.971 2645
11 really 0.968 555
13 excessively 0.967 91.8
21 directly 0.96 290
23 warmly 0.96 96.3
26 certainly 0.957 323
28 tolerably 0.957 95.9
30 quite 0.956 765
31 been 0.956 899
32 exactly 0.955 248
34 being 0.954 699
37 always 0.953 470
38 pleasantly 0.952 37.8
41 perfectly 0.949 283

I’ve thought many times about writing a handout on style that outlines the conventional guidelines of modern, essayistic style with counterexamples from great literature. (What would Hamlet do without “to be”?) But this list encourages me to take such thinking a step further: Austen’s case alone could become the foundation of a unit on voice, style, and convention.


4 Comments

tedunderwood · January 23, 2013 at 5:32 pm

I’m thrilled to see that dataset was useful for you, and I think this is a very good example of the sort of thing we can learn. I had noticed that “very” was a salient feature of Austen’s style — but I hadn’t noticed that it’s just the tip of an adverbial iceberg. That seems significant to me, and resonates with my memory of her voice. I’d like to know whether this is something about Austen in particular, or whether it’s a characteristic feature of a period/genre. I don’t intuitively know which is more likely.

    Erik Simpson · January 25, 2013 at 4:00 pm

    Yes, the big question! I just did a follow-up post trying to answer it in a preliminary way. See what you think–

PocoPuffs · July 1, 2013 at 1:16 am

Hey you two, so happy to see other people working through these thoughts on Austen’s style. I only just read _Sense and Sensibility_ for the first time (an admittedly big gap for a Victorianist in particular and an English grad in general) and I was struck by the patterns of intensifying (very/really) and general qualifying (just/not/so/tolerably). Hadn’t had much luck until I found this post and the follow-up. I appreciate the corroborating data!

Coding Meeting #1: The Mann-Whitney Test & Epigraphic Language | Digital Experiments at NYU · January 29, 2014 at 7:43 pm

[…] Simpson, “Jane Austen and Contemporary Prose Style” – uses Mann-Whitney to identify overrepresented in Austen’s […]
