More on Jane Austen and stylistic signatures

Ted Underwood responded to my post on Jane Austen’s style, which pointed out the prevalence of adverbs, “to be” constructions, and terms of certainty, by raising the issue of baseline comparisons: “I’d like to know whether this is something about Austen in particular, or whether it’s a characteristic feature of a period/genre. I don’t intuitively know which is more likely.”

Let’s explore! I’m again using Ted’s corpus and software, comparing a given author’s work to the whole corpus. This file is a transcript of the commands and output I’m interpreting below.
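For readers wondering what a Mann-Whitney rho score measures, here is a minimal sketch. It assumes rho is the usual effect size U / (n1 * n2): the share of (author text, other text) pairs in which the author's text uses the word more often, with ties counted as half. The function name and the per-10,000-word frequencies are hypothetical, and Ted's software may chunk texts or handle ties differently.

```python
def mann_whitney_rho(author_freqs, other_freqs):
    """Return U / (n1 * n2) for two samples of word frequencies.

    A value near 1.0 means the word is almost always more frequent in
    the author's texts; 0.5 means no tendency either way.
    """
    wins = 0.0
    for a in author_freqs:
        for o in other_freqs:
            if a > o:
                wins += 1.0   # author's text uses the word more
            elif a == o:
                wins += 0.5   # ties count as half a win
    return wins / (len(author_freqs) * len(other_freqs))


# Hypothetical frequencies of one word per 10,000 words, one value per text.
author_texts = [12.0, 15.0, 9.0, 14.0]
other_texts = [3.0, 5.0, 9.0, 2.0, 4.0]

print(mann_whitney_rho(author_texts, other_texts))
```

Because the score depends only on which text has the higher frequency in each pair, a single text that uses a word very heavily cannot dominate the result, which is presumably part of the appeal of a rank-based measure for this kind of comparison.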

I thought the most conventional guess of an author to produce results similar to Austen would be Maria Edgeworth. Here’s the list for her:

 WORDS OVERREPRESENTED BY MANN-WHITNEY RHO 
1	understand     	0.937	271	
2	recollect      	0.923	309	
3	talking        	0.916	127	
4	know           	0.916	523	
5	could          	0.913	754	
6	provoking      	0.912	41.9	
7	nonsense       	0.911	62.3	
8	perfectly      	0.905	119	
9	explain        	0.903	192	
10	continually    	0.889	95.4	
11	tired          	0.888	76	
12	going          	0.888	205	
13	do             	0.884	586	
14	dear           	0.88	792	
15	sorry          	0.879	79.5	
16	satisfied      	0.879	93.8	
17	yesterday      	0.879	48.9	
18	liked          	0.875	48.1	
19	spoiled        	0.874	19.6	
20	directly       	0.869	77.2	
21	quite          	0.869	136	
22	please         	0.868	182	
23	you            	0.868	2467	
24	repeated       	0.868	233	
25	decide         	0.866	101	
26	afraid         	0.864	148	
27	repeating      	0.862	52.7	
28	thank          	0.862	115	
29	manage         	0.86	44	
30	guess          	0.86	97.8	
31	sure           	0.859	290	
32	ashamed        	0.857	35.4	
33	put            	0.856	140	
34	admiration     	0.855	90.5	
35	disappointed   	0.855	44.8	
36	surprised      	0.855	75.6	
37	tiresome       	0.853	37.2	
38	especially     	0.853	76.3	
39	not            	0.853	802	
40	reading        	0.853	80.1	
41	dressing       	0.852	9.04	
42	said           	0.852	2783	
43	formerly       	0.851	50	
44	understanding  	0.851	103	
45	possible       	0.85	157	
46	because        	0.85	261	
47	really         	0.85	125	
48	any            	0.85	632	
49	saw            	0.85	183	
50	think          	0.85	173	

My unsystematic eyeballs see no forms of “to be” and far fewer adverbs than populated Austen’s list. Terms of cognition seem especially prominent:

 WORDS OVERREPRESENTED BY MANN-WHITNEY RHO 
1	understand     	0.937	271	
2	recollect      	0.923	309	
4	know           	0.916	523	
9	explain        	0.903	192	
25	decide         	0.866	101	
30	guess          	0.86	97.8	
44	understanding  	0.851	103	
50	think          	0.85	173	
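The eyeball count of adverbs and “to be” forms can be made a little more systematic. The sketch below runs over the 50 words in Edgeworth's list above. It relies on a crude assumption of my own, not anything the tool does: a word counts as an adverb only if it ends in "-ly". The helper names are hypothetical.

```python
# The 50 words from Edgeworth's list above, in rank order.
EDGEWORTH_TOP50 = """
understand recollect talking know could provoking nonsense perfectly
explain continually tired going do dear sorry satisfied yesterday liked
spoiled directly quite please you repeated decide afraid repeating thank
manage guess sure ashamed put admiration disappointed surprised tiresome
especially not reading dressing said formerly understanding possible
because really any saw think
""".split()

TO_BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def ly_adverbs(words):
    """Crude heuristic: keep every word that ends in -ly."""
    return [w for w in words if w.endswith("ly")]


def to_be_forms(words):
    """Keep every word that is an inflected form of 'to be'."""
    return [w for w in words if w in TO_BE_FORMS]


print(ly_adverbs(EDGEWORTH_TOP50))
print(to_be_forms(EDGEWORTH_TOP50))
```

The "-ly" test misses adverbs such as "quite" and "yesterday", and on other lists it would wrongly catch words like "family", so treat the count as a rough check rather than a part-of-speech analysis.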

What about Charlotte Lennox? Her list has “extremely” and “wholly” in the first and sixth places, but only one other “-ly” adverb (“instantly” at #29). Lennox’s vocabulary emphasizes the dynamics of sociability. Highlights:

 WORDS OVERREPRESENTED BY MANN-WHITNEY RHO 
2	civility       	0.97	117	
7	amiable        	0.959	353	
8	accompany      	0.959	55.8	
11	conversation   	0.957	258	
12	behaviour      	0.954	419	
13	mortified      	0.949	34.6	
14	mortification  	0.948	113	
15	received       	0.945	119	
18	amusements     	0.939	32.3	
19	entreaties     	0.937	54.9	
20	apprehensions  	0.937	89.4	
21	attentions     	0.936	70.9	
27	conduct        	0.929	195	
28	insisted       	0.928	80.6	
29	instantly      	0.927	209	
30	countenance    	0.925	123	
31	situation      	0.924	260	
33	visit          	0.923	107	
35	arrival        	0.922	83.5	
36	acknowledged   	0.92	53	
37	reception      	0.92	46.8	
38	circumstance   	0.919	98.7	
41	relations      	0.917	84.3	
42	letter         	0.916	312	
43	politeness     	0.916	110	
44	shocked        	0.914	89.2	
45	accident       	0.913	74.1	
46	inform         	0.913	74.8	
47	acquaintance   	0.912	131	
50	ordered        	0.91	66.6	

Walter Scott’s list of 50 (using only his fiction for the sake of comparison) includes only three adverbs, none in his top 30, and the highest-ranking is an adverb of action: “hastily.” Scott’s list evokes military contexts and especially hierarchies of authority:

1	answered       	0.958	2519	
4	warrant        	0.944	501	
8	risk           	0.93	263	
13	permit         	0.914	247	
14	trusty         	0.913	169	
19	weapon         	0.905	235	
22	boot           	0.902	127	
23	followers      	0.898	505	
27	domestics      	0.897	122	
30	commanded      	0.895	222	
32	courtesy       	0.894	262	
33	quarrel        	0.893	183	
34	kinsman        	0.892	432	
35	assistance     	0.892	248	
37	saddle         	0.891	109	
43	displeasure    	0.89	123	
44	attendance     	0.889	162	
47	willingly      	0.889	170	

Hannah More’s list (again, using only her fiction) is unsurprisingly packed with religious terminology, and I see little overlap between her list and the others.

If you want motion in your novel, open your James Fenimore Cooper:

 WORDS OVERREPRESENTED BY MANN-WHITNEY RHO 
1	movements      	0.979	903	
3	movement       	0.97	576	
4	direction      	0.961	579	
6	commenced      	0.958	374	
8	companion      	0.952	645	
18	distance       	0.915	552	
20	quest          	0.913	190	
21	returned       	0.913	829	
27	companions     	0.902	268	
37	disappeared    	0.894	137	
38	preparations   	0.893	93.3	
39	placing        	0.893	74.7	
40	position       	0.892	168	

At this point, I think we have at least a preliminary answer to our question: judging by these five comparison authors, the prevalence of adverbs, “to be” constructions, and terms of certainty in Austen’s works does look characteristic of Austen herself, rather than of her period or genre.

This little exploration was great fun for me, as the results returned a mix of new insights (particularly about Austen and Edgeworth) and reassuring common-sense confirmation that the tool identifies the characteristic thematic emphases of Scott and More. In a follow-up post, I’ll offer some quick thoughts about other uses of this kind of word-frequency analysis, from the perspective of a beginning user with a pedagogical emphasis.
