Ted Underwood responded to my post on Jane Austen’s style, which pointed out the prevalence of adverbs, “to be” constructions, and terms of certainty, by raising the issue of baseline comparisons: “I’d like to know whether this is something about Austen in particular, or whether it’s a characteristic feature of a period/genre. I don’t intuitively know which is more likely.”
Let’s explore! I’m again using Ted’s corpus and software, comparing a given author’s work to the whole corpus. This file is a transcript of the commands and output I’m interpreting below.
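For readers curious what a score like this measures, here is a minimal sketch of a Mann-Whitney rho for a single word. It rests on an assumption of mine, not on anything confirmed about Ted's software: that rho is the Mann-Whitney U statistic divided by the product of the two sample sizes, so that 1.0 means the word is more frequent in every one of the author's volumes than in any comparison volume, and 0.5 means no tendency either way. The function names and the per-volume frequencies are hypothetical, invented purely for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_rho(author_freqs, other_freqs):
    """Assumed definition: U for the author's sample divided by n1 * n2."""
    n1, n2 = len(author_freqs), len(other_freqs)
    ranks = average_ranks(list(author_freqs) + list(other_freqs))
    rank_sum = sum(ranks[:n1])            # ranks belonging to the author's volumes
    u = rank_sum - n1 * (n1 + 1) / 2      # Mann-Whitney U statistic
    return u / (n1 * n2)


# Hypothetical relative frequencies of one word, one number per volume.
author_volumes = [310.0, 295.5, 402.1]
corpus_volumes = [120.0, 305.0, 98.4, 150.2, 87.0]
example_rho = mann_whitney_rho(author_volumes, corpus_volumes)
print(round(example_rho, 3))
```

Ranking the volumes rather than comparing raw averages means a single volume that uses a word obsessively cannot dominate the score, which is presumably part of the appeal of a rank-based measure for this kind of comparison.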
The most conventional guess for an author who would produce results similar to Austen’s, I thought, would be Maria Edgeworth. Here’s the list for her:
WORDS OVERREPRESENTED BY MANN-WHITNEY RHO
1 understand 0.937 271
2 recollect 0.923 309
3 talking 0.916 127
4 know 0.916 523
5 could 0.913 754
6 provoking 0.912 41.9
7 nonsense 0.911 62.3
8 perfectly 0.905 119
9 explain 0.903 192
10 continually 0.889 95.4
11 tired 0.888 76
12 going 0.888 205
13 do 0.884 586
14 dear 0.88 792
15 sorry 0.879 79.5
16 satisfied 0.879 93.8
17 yesterday 0.879 48.9
18 liked 0.875 48.1
19 spoiled 0.874 19.6
20 directly 0.869 77.2
21 quite 0.869 136
22 please 0.868 182
23 you 0.868 2467
24 repeated 0.868 233
25 decide 0.866 101
26 afraid 0.864 148
27 repeating 0.862 52.7
28 thank 0.862 115
29 manage 0.86 44
30 guess 0.86 97.8
31 sure 0.859 290
32 ashamed 0.857 35.4
33 put 0.856 140
34 admiration 0.855 90.5
35 disappointed 0.855 44.8
36 surprised 0.855 75.6
37 tiresome 0.853 37.2
38 especially 0.853 76.3
39 not 0.853 802
40 reading 0.853 80.1
41 dressing 0.852 9.04
42 said 0.852 2783
43 formerly 0.851 50
44 understanding 0.851 103
45 possible 0.85 157
46 because 0.85 261
47 really 0.85 125
48 any 0.85 632
49 saw 0.85 183
50 think 0.85 173
My unsystematic eyeballs see no forms of “to be” and far fewer adverbs than populated Austen’s list. Terms of cognition seem especially prominent:
WORDS OVERREPRESENTED BY MANN-WHITNEY RHO
1 understand 0.937 271
2 recollect 0.923 309
4 know 0.916 523
9 explain 0.903 192
25 decide 0.866 101
30 guess 0.86 97.8
44 understanding 0.851 103
50 think 0.85 173
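For anyone who would rather not trust eyeballs, the check on Edgeworth's list can be roughly mechanized. The sketch below is only a crude heuristic of my own: it treats any word ending in "-ly" as an adverb, which misses adverbs like "quite" and would wrongly count a word like "family", and it looks for forms of "to be" in a small hand-made set. The words are the fifty from Edgeworth's list; the variable names are mine.

```python
# The fifty overrepresented words from Edgeworth's list, in rank order.
edgeworth_top50 = [
    "understand", "recollect", "talking", "know", "could", "provoking",
    "nonsense", "perfectly", "explain", "continually", "tired", "going",
    "do", "dear", "sorry", "satisfied", "yesterday", "liked", "spoiled",
    "directly", "quite", "please", "you", "repeated", "decide", "afraid",
    "repeating", "thank", "manage", "guess", "sure", "ashamed", "put",
    "admiration", "disappointed", "surprised", "tiresome", "especially",
    "not", "reading", "dressing", "said", "formerly", "understanding",
    "possible", "because", "really", "any", "saw", "think",
]

# Hand-made set of "to be" forms; an assumption, not part of the original tool.
TO_BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

# Crude adverb test: the word ends in "-ly".
ly_words = [w for w in edgeworth_top50 if w.endswith("ly")]
to_be_words = [w for w in edgeworth_top50 if w in TO_BE_FORMS]

print("-ly words:", ly_words)
print("forms of 'to be':", to_be_words)
```

Running the same two filters over each author's list would put the comparisons on a slightly more systematic footing, though the "-ly" test is far too blunt to replace reading the lists.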
What about Charlotte Lennox? Her list has “extremely” and “wholly” in the first and sixth places, but only one other “-ly” adverb (“instantly” at #29). Lennox’s vocabulary emphasizes the dynamics of sociability. Highlights:
WORDS OVERREPRESENTED BY MANN-WHITNEY RHO
2 civility 0.97 117
7 amiable 0.959 353
8 accompany 0.959 55.8
11 conversation 0.957 258
12 behaviour 0.954 419
13 mortified 0.949 34.6
14 mortification 0.948 113
15 received 0.945 119
18 amusements 0.939 32.3
19 entreaties 0.937 54.9
20 apprehensions 0.937 89.4
21 attentions 0.936 70.9
27 conduct 0.929 195
28 insisted 0.928 80.6
29 instantly 0.927 209
30 countenance 0.925 123
31 situation 0.924 260
33 visit 0.923 107
35 arrival 0.922 83.5
36 acknowledged 0.92 53
37 reception 0.92 46.8
38 circumstance 0.919 98.7
41 relations 0.917 84.3
42 letter 0.916 312
43 politeness 0.916 110
44 shocked 0.914 89.2
45 accident 0.913 74.1
46 inform 0.913 74.8
47 acquaintance 0.912 131
50 ordered 0.91 66.6
Walter Scott’s list of 50 (using only his fiction for the sake of comparison) includes only three adverbs, none in his top 30, and the highest-ranking is an adverb of action: “hastily.” Scott’s list evokes military contexts and especially hierarchies of authority:
1 answered 0.958 2519
4 warrant 0.944 501
8 risk 0.93 263
13 permit 0.914 247
14 trusty 0.913 169
19 weapon 0.905 235
22 boot 0.902 127
23 followers 0.898 505
27 domestics 0.897 122
30 commanded 0.895 222
32 courtesy 0.894 262
33 quarrel 0.893 183
34 kinsman 0.892 432
35 assistance 0.892 248
37 saddle 0.891 109
43 displeasure 0.89 123
44 attendance 0.889 162
47 willingly 0.889 170
Hannah More’s list (again, using only her fiction) is unsurprisingly packed with religious terminology, and I see little overlap between her list and the others.
If you want motion in your novel, open your James Fenimore Cooper:
WORDS OVERREPRESENTED BY MANN-WHITNEY RHO
1 movements 0.979 903
3 movement 0.97 576
4 direction 0.961 579
6 commenced 0.958 374
8 companion 0.952 645
18 distance 0.915 552
20 quest 0.913 190
21 returned 0.913 829
27 companions 0.902 268
37 disappeared 0.894 137
38 preparations 0.893 93.3
39 placing 0.893 74.7
40 position 0.892 168
At this point, I think we have at least a preliminary answer to our question: the prevalence of adverbs, “to be” constructions, and terms of certainty in Austen’s works is indeed characteristic of Austen herself, rather than of her period or genre.
This little exploration was great fun for me, as the results returned a mix of new insights, particularly about Austen and Edgeworth, and reassuring common-sense confirmation that the tool identifies the characteristic thematic emphases of Scott and More. In a follow-up post, I’ll offer some quick thoughts about other uses of this kind of word-frequency analysis, from the perspective of a beginning user with a pedagogical emphasis.