Redo the readability calculation after executing the preprocessing steps described in the previous section. What do you observe?
Starting with the original letters corpus (i.e., before preprocessing), identify the 20 most common words and create a word cloud for them.
Use the dictionary feature of text mining to remove selected words from the corpus of Buffett’s letters, and see whether you can determine what differentiates the 2010 and 2011 letters.
Experiment with the topicmodels package to identify the topics in Buffett’s letters. You might need to use the dictionary feature of text mining to remove selected words from the corpus before the topics become meaningfully distinct.
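The exercises are meant to be done in R (topicmodels is an R package). Purely to illustrate the underlying logic of exercises 2 and 3 without depending on any package, here is a minimal stdlib-only Python sketch. It tokenizes text, counts the most common words (the list a word cloud would plot), drops words listed in a removal "dictionary", and ranks the words whose relative frequency is higher in one letter than in the other. The helper names (`tokenize`, `top_words`, `remove_words`, `distinctive_words`), the removal set, and the two toy strings are all hypothetical. The strings are placeholders, not text from the actual letters. Scoring words by the difference in relative frequency is only one simple way to compare two documents. It is not the tm package's method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (straight apostrophes only)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_words(tokens, n=20):
    """Return the n most common (word, count) pairs; ties keep first-seen order."""
    return Counter(tokens).most_common(n)


def remove_words(tokens, dictionary):
    """Drop every token that appears in the (hypothetical) removal dictionary."""
    return [t for t in tokens if t not in dictionary]


def distinctive_words(tokens_a, tokens_b, n=5):
    """Rank words of A by how much higher their per-token frequency is in A than in B."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = max(len(tokens_a), 1), max(len(tokens_b), 1)
    diff = {w: freq_a[w] / total_a - freq_b[w] / total_b for w in freq_a}
    # Sort by descending difference, breaking ties alphabetically.
    return sorted(diff, key=lambda w: (-diff[w], w))[:n]


# Toy placeholder texts standing in for two letters (NOT the real letters).
letter_a = "The insurance float grew. Insurance float funds the railroad."
letter_b = "The railroad earned more. The railroad and the utility grew."

# Hypothetical removal dictionary: words judged uninformative for the comparison.
stop = {"the", "and", "more"}

tokens_a, tokens_b = tokenize(letter_a), tokenize(letter_b)
clean_a, clean_b = remove_words(tokens_a, stop), remove_words(tokens_b, stop)

print(top_words(clean_a, 3))
print(distinctive_words(clean_a, clean_b, n=2))
```

Editing the removal set and rerunning the comparison mirrors the iterative process the exercises describe. Each pass removes words that are common to both letters, so the words that actually separate them move up the ranking.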
This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson.