For each of the novels I’ve surveyed for my Writers Who Read study group, I’ve created a number of data maps that represent quantitative data gleaned from the text, using techniques as complicated as sentiment analysis and topic modeling and as simple as counting the number of words in each section. I hope to lay bare both the overarching structure of the novel and the peaks and valleys of its emotional journey. Over time I’ve been able to combine more and more data into a single graphic, and the image above overlays no fewer than seven unique axes of information onto the same timeline:
- Major/Minor Divisions: Every section of the novel is displayed across the bottom of the chart beginning from the left: horizontal rectangles of major divisions span multiple vertical rectangles of minor divisions, whether major / minor represents Parts / Chapters, or Chapters / Sections. Their relative widths are determined by the number of words in each division.
- Average Sentence and Paragraph Length: Within every minor division, average sentence length is shown with a vertical red line and average paragraph length with a vertical blue line, both to the same scale.
- Verb Tense: The percentage of past tense verbs is shown by the darker shade rising from the bottom of each minor division rectangle.
- Cumulative Sentiment: The jagged orange line soaring above the major/minor sections charts the cumulative sentiment of every paragraph throughout the entire novel. This line is overlaid with whisker plots denoting the degree of emotional variance found within each minor division.
- Story Shape: The green line is a smoothed-out version of the cumulative sentiment line, revealing a simpler shape that represents the story type.
- Arch-Plot: The grey triangle represents the traditional arch-plot shape, with inflection points at the Inciting Incident and the Climax, and with vertical dotted lines dividing the novel into halves (red), thirds (blue), and quarters (grey).
- POV (not shown): Colors of the minor divisions in the graphic above are random; the novel was narrated from a single point of view. But when a novel contains more than one point of view, colors can be used to represent different characters’ POV.
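For readers curious how the orange and green lines might be produced, here is a minimal sketch in Python. It is not the pipeline behind the chart: the tiny word list `TOY_LEXICON`, the function names, and the moving-average smoother are all assumptions made for illustration, and a real analysis would swap in a proper sentiment model and likely a different smoothing method.

```python
from itertools import accumulate

# Toy lexicon: an assumption for illustration only,
# not the sentiment model used for the chart.
TOY_LEXICON = {"joy": 1, "love": 1, "hope": 1,
               "grief": -1, "fear": -1, "loss": -1}


def paragraph_score(paragraph):
    """Sum the lexicon values of the words in one paragraph."""
    words = (w.strip(".,;:!?\"'").lower() for w in paragraph.split())
    return sum(TOY_LEXICON.get(w, 0) for w in words)


def cumulative_sentiment(paragraphs):
    """Running total of paragraph scores (the jagged line)."""
    return list(accumulate(paragraph_score(p) for p in paragraphs))


def smooth(values, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


paragraphs = ["Joy and hope.", "Grief, fear, loss.", "Love."]
jagged = cumulative_sentiment(paragraphs)  # running sentiment total
story_shape = smooth(jagged)               # simplified, smoother curve
```

Because the line is cumulative, a run of mildly negative paragraphs pulls it down just as surely as one devastating paragraph, which is worth keeping in mind when reading the peaks and valleys.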
It’s important to be able to tie the peaks and valleys of the cumulative sentiment back to specific text within the novel, to understand how they line up (or don’t line up) with the story. Because paragraph lengths vary widely, it doesn’t follow that the midpoint of a novel containing 2,490 paragraphs will fall at exactly paragraph 1,245. In fact, the midpoint of this novel, counting words rather than paragraphs, is at paragraph 1,026. [The leftward skew of the midpoint and the uneven spacing of the purple and green vertical lines, for example, arise because this visual was drawn on the scale of paragraphs, not words, while the underlying calculations that determine overall percentages use the number of words. That’s also why the widths in this graph are subtly different from those of the same shape (the orange line) in the chart at the top of this page.]
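The word-based marks described above can be sketched as follows. This is an illustrative reconstruction, not the actual calculation behind the chart; the function name `paragraph_at_fraction` and the sample word counts are made up for the example.

```python
from itertools import accumulate


def paragraph_at_fraction(word_counts, fraction):
    """Return the 1-based paragraph number at which the running
    word count first reaches `fraction` of the novel's total words."""
    target = fraction * sum(word_counts)
    for number, running in enumerate(accumulate(word_counts), start=1):
        if running >= target:
            return number
    return len(word_counts)


# Hypothetical novel of ten paragraphs, with the longer ones up front.
word_counts = [50, 40, 30, 20, 20, 10, 10, 10, 5, 5]

midpoint_by_words = paragraph_at_fraction(word_counts, 0.5)
midpoint_by_paragraphs = len(word_counts) // 2
print(midpoint_by_words, midpoint_by_paragraphs)
```

In this made-up example the word-based midpoint falls at paragraph 3 while the paragraph-count midpoint is paragraph 5, the same kind of leftward skew seen when longer paragraphs cluster early in a novel. The same function gives the thirds and quarters by passing a different fraction.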
Does sentiment analysis reveal everything within a novel? No. Often the sentiment’s trajectory will run opposite to what a close reading of the text suggests it should be, but that’s not necessarily a bad thing. We have a name for stories where the plot matches the sentiment: fairy tales. In adult literature, we don’t find so much ‘happily ever after’ as we find truth and positive achievement earned at significant emotional cost. So a downward emotional direction at the end of a novel does not necessarily represent an unhappy ending. In this case, it’s quite the opposite: the protagonist has come to a shattering emotional truth that we recognize as a positive, if emotionally draining (see: downward trend), ending.
And although it is not always possible to map the emotional trajectory back to plot points, the peaks and valleys we find do sometimes reveal contours within the story that might otherwise have been missed. I’ll explain. I have come to learn that if a novel easily divides into halves, and further into thirds or quarters, it’s usually the work of an editor. Every piece of literary fiction we’ve surveyed has some significant shift of direction at its midpoint. This can hardly be a coincidence. Furthermore, novels seem to reveal further divisions at the 1/3 and 2/3 marks, at the 1/4 and 3/4 marks, or both. The novel above shows an emotional peak at paragraph 631 that lines up very closely with the 1/3 mark at paragraph 672. (It’s easier to see in the chart at the top of the page.) The protagonist’s emotional trajectory improves steadily until that moment, which turns out to be a significant plot point. Exactly how significant might not have been teased out through traditional literary analysis, but it is very easy to spot on this chart.
This novel seems to be constructed in thirds: the first third is a steady upward line; the second third scoops down in the shape of a ladle; the final third is the same shape as the second third, but upside down and reversed. (Again, easier to see by following the green line in the chart at the top of the page.) Quite an elegant shape, indeed.
Also interesting is that the lowest point of the emotional arc is at the novel’s end. Our protagonist began at a slightly elevated emotional level, which tracks to the story, and finished somewhat diminished, which also tracks to the story very well. So would that indicate an unhappy ending, or the culmination of a tragedy? Nope. The emotional arc, we’ve learned, cannot be used alone to analyze a novel, but it is a very powerful tool to reveal insights that might otherwise have gone undiscovered.