Text Analysis of Amazon Shareholder Letters

Amazon is led by its charismatic founder and CEO, Jeff Bezos, who is widely hailed as one of the top visionaries of this era. Each spring, after the close of the fiscal year, Amazon publishes a letter to shareholders written by Bezos, in which he summarises his thoughts and his business and management philosophy. Reading these letters, one thing that stands out is Bezos’ clarity of thought, expressed in a concise, easy-to-comprehend way. It is a quality possessed by very few business leaders; a few other names that come to mind are Warren Buffett, Charlie Munger, Elon Musk and the late Steve Jobs.

The first Letter to Shareholders, for 1997, was published at the conclusion of Amazon’s first fiscal year as a public company. In it, Jeff Bezos outlined his vision for Amazon and the kind of company he wanted it to be. Obsessing over customers and offering them compelling value is a core principle he laid out in that letter. Focussing on long-term investment decisions, company growth and profitability is how he envisioned Amazon would thrive. The remarkable thing is that over the last 20+ years, Amazon has executed on these principles and has become, as of this post, the fourth largest company in the United States by market cap. In the first letter, Bezos said it is Day 1 for the internet and for Amazon. He echoes the same thought in the most recent 2016 Letter to Shareholders, published earlier this month. In other words, he believes Amazon still functions as a startup, though now it’s the biggest one. Every subsequent shareholder letter has the 1997 letter appended to it, which shows how he has stayed true to his vision.

In this post, I’ll do a text analysis of Amazon Shareholder Letters from 1997 to 2016.

These letters are in PDF format and can be downloaded from Amazon’s investor relations website. I have put them in a zipped file here. The first step is to combine the text, remove the 1997 letter from every subsequent letter, and do some additional cleaning. Since ‘Day 1’ and ‘Day 2’ are frequently used phrases in the letters, I replaced the numbers 1 and 2 with the words one and two, and removed all other numbers from the letters.
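The analysis in this post was done in R with tidytext. The cleaning step can be sketched in Python as follows. This is a sketch under assumptions, not the original code. It assumes the letters are already extracted to plain text in a hypothetical `letters` dict keyed by year. It also assumes the reprinted 1997 letter can be located by searching for the opening characters of the 1997 letter once whitespace is normalized. `strip_appended_1997`, `normalize_numbers` and `clean_letters` are hypothetical helper names.

```python
import re

# Integers plus numbers with thousands separators or decimals: "1", "1997", "1,700", "1.5".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def normalize_numbers(text):
    """Spell out standalone 1 and 2 (so 'Day 1' becomes 'Day one'); delete every other number."""
    return NUMBER.sub(lambda m: {"1": "one", "2": "two"}.get(m.group(0), ""), text)


def squash(text):
    """Collapse all runs of whitespace to single spaces."""
    return " ".join(text.split())


def strip_appended_1997(text, letter_1997, prefix_len=200):
    """Truncate a later letter where the reprinted 1997 letter begins.

    Assumption: the reprint repeats the first `prefix_len` characters of the
    1997 letter verbatim once whitespace is normalized.
    """
    body = squash(text)
    idx = body.find(squash(letter_1997)[:prefix_len])
    return body if idx == -1 else body[:idx].rstrip()


def clean_letters(letters):
    """letters: hypothetical dict of year -> raw text extracted from each PDF."""
    first = letters[1997]
    return {
        year: normalize_numbers(text if year == 1997 else strip_appended_1997(text, first))
        for year, text in letters.items()
    }
```

Matching whole numbers before deciding what to do with them means a "1" inside "1,700" or "1.5" is removed rather than turned into "one".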

How does Bezos start these letters?

We see that the letters are typically addressed to shareholders or, from 2007 onward, shareowners. The 1998 letter was addressed to shareholders, customers and employees. The 2016 letter marks a clear departure in approach: while the earlier letters mix facts and opinion, it constructs a short narrative of why companies decline. Bezos reminds employees about the pitfalls of ‘Day 2’ and urges them to remain in ‘Day 1’ mode for the next couple of decades.

Year  First line
1997  To our shareholders.
1998  To our shareholders, customers, and employees.
1999  To our shareholders.
2000  To our shareholders.
2001  To our shareholders.
2002  To our shareholders.
2003  To our shareholders.
2004  To our shareholders.
2005  To our shareholders.
2006  To our shareholders.
2007  To our shareowners.
2008  To our shareowners.
2009  To our shareowners.
2010  To our shareowners.
2011  To our shareowners.
2012  To our shareowners.
2013  To our shareowners.
2014  To our shareowners.
2015  To our shareowners.
2016  “Jeff, what does Day two look like?”


Words per Letter

We can see that Bezos doesn’t write very long letters: the mean word count is around 1,700 words. From 2013 to 2015, however, his letters were more than twice as long.
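Counting words per letter takes only a few lines. The Python sketch below assumes a hypothetical `letters` dict of year to cleaned text. It also uses a simple lowercase regex tokenizer that keeps apostrophes inside words. That tokenizer is an assumption and will not match tidytext's tokenizer exactly. `long_letters` flags years whose count exceeds a multiple of the mean.

```python
import re
from statistics import mean

# Lowercase word tokens; straight and curly apostrophes stay inside a word ("amazon’s").
WORD = re.compile(r"[a-z’']+")


def word_counts(letters):
    """letters: hypothetical dict of year -> cleaned text. Returns year -> number of words."""
    return {year: len(WORD.findall(text.lower())) for year, text in letters.items()}


def long_letters(counts, factor=2.0):
    """Years whose word count is more than `factor` times the mean word count."""
    avg = mean(counts.values())
    return sorted(year for year, n in counts.items() if n > factor * avg)
```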

Top Words

After excluding commonly used ‘stop words’, a wordcloud of the top 100 words is shown below. A remarkable thing to note is how heavily customers are emphasized in the letters. Besides the company name, other terms that gain prominence are business, sales, service, experience, cash, and shareholders. Also note the frequent mention of specific products and services: Kindle, Prime, AWS, Marketplace, Fulfillment. Bezos has been talking about ‘Day 1’ from the first letter to the latest, hence we find both day and one in the mix as well.
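Here is a Python sketch of the counts behind a top-words wordcloud. The post used tidytext's much larger `stop_words` data; `STOP_WORDS` below is a tiny illustrative stand-in, and `letters` is a hypothetical dict of year to cleaned text.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z’']+")

# Tiny illustrative stop list (an assumption); real stop-word lists have hundreds of entries.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "our", "we", "in", "for",
              "is", "it", "that", "on", "with", "as", "are", "be", "this"}


def top_words(letters, n=100, stop_words=STOP_WORDS):
    """Most frequent non-stop words across all letters, as (word, count) pairs."""
    counts = Counter(
        w
        for text in letters.values()
        for w in WORD.findall(text.lower())
        if w not in stop_words
    )
    return counts.most_common(n)
```

The resulting (word, count) pairs are what a wordcloud library would be fed.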

Words Commonly Associated with ‘customers’

I tokenized the text into bigrams (sequences of two words occurring together) and excluded all bigrams containing commonly used ‘stop words’. Since customer and customers are the most common words, let’s examine which words appear most often alongside them.

The arrows point towards the second word in each bigram, and n denotes the frequency of occurrence. We can see that ‘customer experience’ has the strongest focus: it is mentioned 47 times across all the letters, followed by ‘customer service’ and ‘customer satisfaction’.
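The bigram step can be sketched in Python as follows. Bigrams are formed from consecutive tokens. Any bigram with a stop word in either position is dropped, and the remaining bigrams containing customer or customers are counted. Three things here are assumptions: the tiny `STOP_WORDS` set, the regex tokenizer, and keeping "customer(s)" in either position of the bigram. The original was built with tidytext in R.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z’']+")
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "our", "we", "in", "for",
              "is", "it", "that", "on", "with", "as", "are", "be", "this"}
TARGETS = {"customer", "customers"}


def bigrams(text):
    """Consecutive token pairs, e.g. 'customer experience' -> ('customer', 'experience')."""
    tokens = WORD.findall(text.lower())
    return list(zip(tokens, tokens[1:]))


def customer_bigrams(letters, stop_words=STOP_WORDS):
    """Count stop-word-free bigrams that contain 'customer' or 'customers'."""
    counts = Counter()
    for text in letters.values():
        for w1, w2 in bigrams(text):
            if w1 in stop_words or w2 in stop_words:
                continue
            if {w1, w2} & TARGETS:
                counts[(w1, w2)] += 1
    return counts.most_common()
```

Each (word1, word2) pair maps directly to an edge word1 → word2 in the graph, with the count as n.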

Dominant Themes

For each bigram and year pair, I calculate its tf-idf statistic, which measures how important a bigram is to a document in a collection of documents.
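For reference, tidytext's `bind_tf_idf` defines the statistic as follows. Here $n_{t,d}$ is the count of bigram $t$ in letter $d$ and $N$ is the number of letters (20, for 1997 to 2016).

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \ln \frac{N}{\lvert \{ d : n_{t,d} > 0 \} \rvert}, \qquad
\mathrm{tf\text{-}idf}(t,d) = \mathrm{tf}(t,d)\,\mathrm{idf}(t)
```

A bigram that appears in every letter gets $\mathrm{idf} = \ln 1 = 0$, so tf-idf favours terms that are frequent in one year but rare across the others.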

Plotting the top 10 terms by this statistic for each year reveals what was on Bezos’ mind in that year that was less common in the other years.
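Computing tf-idf and keeping the top terms per year can be sketched in Python as below. The sketch follows the tidytext-style definition (term frequency within a letter, times the natural log of letters over letters containing the term). It assumes bigrams are already counted into a hypothetical `doc_terms` dict of year to `Counter`. `tf_idf` and `top_terms` are illustrative names, not the post's code.

```python
import math
from collections import Counter


def tf_idf(doc_terms):
    """doc_terms: year -> Counter of bigram counts. Returns year -> {bigram: tf-idf}."""
    n_docs = len(doc_terms)
    # Document frequency: in how many letters each bigram appears at least once.
    df = Counter(term for counts in doc_terms.values() for term in counts)
    scores = {}
    for doc, counts in doc_terms.items():
        total = sum(counts.values())
        scores[doc] = {
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in counts.items()
        }
    return scores


def top_terms(scores, k=10):
    """The k highest-scoring bigrams for each year."""
    return {doc: sorted(s, key=s.get, reverse=True)[:k] for doc, s in scores.items()}
```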

In most cases, the top 10 tf-idf terms neatly summarise the important points made in the letters. What’s really interesting is that we get a sense of almost all the major products and services offered by Amazon, and of when they were introduced or became popular: from the launch of the music and video stores in 1998 (that means CDs and DVDs, for the millennials out there :) to Prime Instant Video in 2013, and many more. 2004 was all about explaining why free cash flow is more important than earnings. 2010 focussed on machine learning and building advanced technology capabilities. The latest letter, from 2016, covers the Day 1 and Day 2 narrative and the philosophy of quick decision making at a juggernaut like Amazon.

Network of Common Bigrams

These are bigrams that appear in 5 or more letters. In a nutshell, they show what has been most important to Amazon.
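Selecting the edges for such a network could look like the Python sketch below. It counts how many letters each bigram appears in, rather than how often it appears in total, and keeps bigrams present in at least five letters. The input is a hypothetical `doc_bigrams` dict of year to stop-word-filtered bigram pairs. Weighting each edge by its letter count is an assumption; the post does not say how its edges were weighted.

```python
from collections import Counter


def common_bigrams(doc_bigrams, min_letters=5):
    """doc_bigrams: year -> iterable of (word1, word2) pairs after stop-word filtering.

    Returns (word1, word2, n_letters) edges for bigrams found in at least
    `min_letters` different letters, most widespread first.
    """
    # set() so a bigram repeated within one letter is counted once for that letter.
    letter_freq = Counter(b for pairs in doc_bigrams.values() for b in set(pairs))
    edges = [(w1, w2, n) for (w1, w2), n in letter_freq.items() if n >= min_letters]
    return sorted(edges, key=lambda edge: edge[2], reverse=True)
```

The resulting edge list can be passed to any graph library to draw the network.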

Sentiment Analysis

For computing sentiment scores, I use two lexicons from the tidytext package:

  • bing: A general-purpose lexicon created by Bing Liu et al., containing a dictionary of words classified into two categories, positive and negative.

  • loughran: Developed by Tim Loughran and Bill McDonald from analyses of financial reports, it contains a dictionary of words classified into six categories: constraining, litigious, negative, positive, superfluous and uncertainty. (As of this writing, this lexicon is available only in the dev version of tidytext.)

From the loughran lexicon, I use only the positive and negative words. Words not found in these lexicons are classified as neutral.

Let’s look at the top 10 most common words which drive the positive and negative sentiment in these letters.

The bing lexicon treats free and fulfillment as positive words, and cloud as a negative word. But Amazon mostly uses these as business terms: free appears in the context of ‘free super saver shipping’ or ‘free shipping’, fulfillment in ‘fulfillment center(s)’ and cloud in ‘cloud service(s)’. None of these usages can be interpreted as positive or negative, so it is best to remove these terms from the dictionary before calculating the net sentiment scores.
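The lookup and exclusion step can be sketched in Python as below. `LEXICON` is a tiny hypothetical excerpt standing in for tidytext's bing or loughran data, labelled only with the classifications this post mentions. `EXCLUDE` holds the business terms the post removes. Only positive and negative labels are tallied, so loughran-style categories such as uncertainty would be ignored.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z’']+")

# Hypothetical lexicon excerpt (word -> sentiment); real lexicons have thousands of entries.
LEXICON = {"free": "positive", "fulfillment": "positive", "cloud": "negative",
           "great": "positive", "loss": "negative", "difficult": "negative"}

# Business terms removed from the dictionary before scoring.
EXCLUDE = {"free", "fulfillment", "cloud"}


def top_sentiment_words(letters, lexicon=LEXICON, exclude=EXCLUDE, n=10):
    """Most common positive and negative words across all letters, after exclusions."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text in letters.values():
        for w in WORD.findall(text.lower()):
            label = lexicon.get(w)
            if label in counts and w not in exclude:
                counts[label][w] += 1
    return {label: c.most_common(n) for label, c in counts.items()}
```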

I define the net sentiment score as the number of positive terms minus the number of negative terms, divided by the total number of positive, negative and neutral terms. The plot below shows the net sentiment scores by year, alongside the annual returns of Amazon stock. Amazon stock returned 966% in 1998, which distorts the plot, so I capped the annual returns axis at 200%.
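As a worked example of this definition, a letter with 2 positive, 1 negative and 2 neutral words scores (2 − 1) / 5 = 0.2. The Python sketch below computes the score for one letter. It assumes every regex token that is not labelled positive or negative counts as neutral, including excluded business terms. The post does not say whether stop words were removed before counting neutral terms, so that choice is left open here. `net_sentiment` is an illustrative name.

```python
import re

WORD = re.compile(r"[a-z’']+")


def net_sentiment(text, lexicon, exclude=frozenset()):
    """(positive - negative) / (positive + negative + neutral) for one letter.

    Tokens missing from `lexicon`, or listed in `exclude`, count as neutral.
    """
    pos = neg = neutral = 0
    for w in WORD.findall(text.lower()):
        label = None if w in exclude else lexicon.get(w)
        if label == "positive":
            pos += 1
        elif label == "negative":
            neg += 1
        else:
            neutral += 1
    total = pos + neg + neutral
    return (pos - neg) / total if total else 0.0
```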

We can observe that the sentiment in these letters has remained quite positive, even in the wake of Amazon stock dropping more than 80% after the dot-com bubble burst. Judging from the sentiment scores, Bezos doesn’t seem to be swayed by Amazon’s stock performance at all: the annual returns have almost zero correlation with the yearly sentiment scores.


The R markdown file for this post is available here.
