Introduction: What is a Word Cloud?

Word Clouds have become a popular tool in marketing campaigns. As viewers are shown more and more information, a Word Cloud offers a modern, graphical way to represent an idea or the central focus around which a whole concept is built.

Let us start with a simple Word Cloud example on the topic of Web 2.0. The main topic (Web 2.0) sits in the center, with multiple related topics around it. Here, the size and placement of the words around the main topic indicate their importance or relation to the overall topic.

A Word Cloud of terms related to Web 2.0 (source)

A larger font size indicates a higher weight for that subject within the Word Cloud's topic. So Usability, Design, Convergence, Standardization, Economy, and Participation carry more influence than Web Standards, Mobility, Data-Driven, CSS, Simplicity, and Microformats.

More generally, these are also known as tag clouds, wordles, or weighted lists. Many online platforms use a tag cloud to represent the items or tags found on that website. Suppose a website hosts hundreds of posts. A tag cloud can summarize the words used across those posts, weighting each one by how often it appears.

Word Clouds come in three main types, classified by meaning rather than visual appearance: frequency, significance, and categorization. In this post, we will walk step by step through how to make a Word Cloud in the R language.

  1. Who is using Word Clouds?
  2. Reason to use Word Clouds to present your text data
  3. Main steps to create Word Cloud in R
  4. Word Cloud examples
  5. When you should use Word Clouds 

1. Who is using Word Clouds?

With data analysis gaining focus in almost every industry, Word Clouds are becoming an important tool for establishing facts and discovering patterns. They are now used across multiple domains:

  • Research: to draw qualitative conclusions from large amounts of data in multiple forms
  • Social media sites: to collect and analyze data, discovering current trends, identifying miscreants or offenders, and anticipating changes in user behavior
  • Marketing: to uncover current trends, user behavior, and trending products
  • Education: to bring focus to the essential issues that need the most attention
  • Politics and journalism: to draw more attention from readers

2. Reason to use Word Clouds to present your text data

Here are the main reasons to use Word Clouds in presenting text data.

  • Keeps the focus: a Word Cloud is a communication tool that lets viewers focus on the main factors rather than reading the whole document.
  • Simple and precise information: a Word Cloud brings the key information to viewers instantly.
  • Highly engaging: Word Clouds are more visually engaging to the viewer's eye.
  • Enhanced user experience: overall, a Word Cloud is a great way to improve the user experience.

3. Main steps to create Word Cloud in R


Here is your guide to creating your first Word Cloud in R.

Step 1: Start by creating a text file

The first step is choosing the topic, selecting the data, and saving it as a text file for easy processing. You could take a speech by a political leader or thousands of social media posts. Use any editor, on your system or online, to copy and paste the data into a plain text file for building the Word Cloud.
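As a minimal base-R sketch (the filename speech.txt and the sample lines are just examples), you can write the pasted data to a file and load it back with readLines():

```r
# Minimal base-R sketch; "speech.txt" and the sample lines are examples.
# In practice, paste your own data into the file with any text editor.
writeLines(c("I have a dream", "Let freedom ring"), "speech.txt")

# Read the file back, one element per line
text <- readLines("speech.txt", warn = FALSE)
length(text)  # number of lines loaded
```

Each element of the resulting character vector holds one line of the file, ready to be turned into a corpus in the next steps.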

Step 2: Install and load the wordcloud package in R

Then open RStudio. To generate a Word Cloud in R, you need the wordcloud package, along with the RColorBrewer package for color palettes. Here are the commands to install and load them:


# Installation and loading of packages
install.packages("wordcloud")
library(wordcloud)
install.packages("RColorBrewer")
library(RColorBrewer)

Users also have the option of the wordcloud2 package, which offers extra designs and playful options for building a more engaging Word Cloud in R. Here is the command for that:


install.packages("wordcloud2")
library(wordcloud2)

Now the text data takes center stage for the whole analysis. You need to load it as a corpus; the tm package can help with this.


install.packages("tm")
library(tm)

# Create a vector containing only the text
text <- data$text

# Create a corpus
docs <- Corpus(VectorSource(text))

And if you are using Twitter data, the separate rtweet package can ease the process as well.

Step 3: Text Mining in R: Cleaning the data 

Once the text is loaded as a corpus with the Corpus() function from the text mining package (tm), cleaning the data is the next stage. You must remove special characters, punctuation, and numbers from the text to isolate the words. This helps the Word Cloud focus on words only and deliver its insights more precisely.

There are multiple packages to help you clean the corpus. With the tm package, you can use the following commands.


library(magrittr)  # provides the %>% pipe

docs <- docs %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeWords, stopwords("english"))

Text mining in R involves several key steps in this process:

  • Remove all numbers present in the text (removeNumbers)
  • Remove all punctuation marks from the sentences (removePunctuation)
  • Strip the text of any extra white space (stripWhitespace)
  • Transform all words to lower case (content_transformer(tolower))
  • Remove common stop words such as "we", "I", or "the" throughout the document (removeWords, stopwords)

And if you are working with Twitter data, here is code for cleaning the text of a sample of tweets.


# Remove URLs, @mentions, stray "amp", line breaks, and punctuation
# (gsub returns the modified text, so assign the result back)
tweets$text <- gsub("https\\S*", "", tweets$text)
tweets$text <- gsub("@\\S*", "", tweets$text)
tweets$text <- gsub("amp", "", tweets$text)
tweets$text <- gsub("[\r\n]", "", tweets$text)
tweets$text <- gsub("[[:punct:]]", "", tweets$text)

Here is a complete list of R commands to help you in the text mining process.


# Transform the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common English stop words
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stop words for a specific document;
# specify them as a character vector
docs <- tm_map(docs, removeWords, c("example1", "example2"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate all remaining extra white space
docs <- tm_map(docs, stripWhitespace)
# Stem the words in the document (requires the SnowballC package)
docs <- tm_map(docs, stemDocument)

Step 4: Creating a document-term-matrix

In the next step, we create a document-term matrix: a mathematical matrix recording the frequency of each word in a given document.


Once executed, this produces a data frame with two columns: the word and its frequency in the document. Here is the code for building the matrix with the tm package, using the TermDocumentMatrix function.


dtm <- TermDocumentMatrix(docs)
matrix <- as.matrix(dtm)
words <- sort(rowSums(matrix), decreasing = TRUE)
df <- data.frame(word = names(words), freq = words)
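As a package-free illustration of what this step computes, the same word-frequency counting can be sketched in base R (the sample sentences are made up):

```r
# Base-R sketch of the frequency counting behind the document-term
# matrix; the sample sentences are made up for illustration.
text <- c("let freedom ring", "I have a dream", "freedom and dream")

# Split into lower-cased words and count each word's occurrences
all_words <- unlist(strsplit(tolower(text), "\\s+"))
freqs <- sort(table(all_words), decreasing = TRUE)

# Same two-column shape as the df built above: word and freq
df <- data.frame(word = names(freqs), freq = as.integer(freqs))
```

The tm route does the same counting, but also handles cleaning and scales to large corpora.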

There is also the tidytext package, which you can use to create word counts and which is particularly handy when working with tweets.


library(dplyr)
library(tidytext)

tweets_words <- tweets %>%
  select(text) %>%
  unnest_tokens(word, text)
words <- tweets_words %>% count(word, sort = TRUE)

Step 5: Generating the Word Cloud 

Now you can simply use the wordcloud() function to generate a Word Cloud from the text. You can set limits on the number of words, their frequency, and more to polish the final presentation.


set.seed(1234)  # for reproducibility
wordcloud(words = df$word, freq = df$freq, min.freq = 1,
          max.words = 100, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))

Here are the parameters that help you build a more specific Word Cloud.

  • words: the words to be shown in the cloud
  • freq: the frequencies of those words
  • min.freq: the minimum frequency a word needs in order to appear in the cloud
  • max.words: the maximum number of words in the cloud (otherwise you might see every word in the graphic)
  • random.order: whether to place words in random order (with FALSE, they are drawn in decreasing frequency)
  • rot.per: the proportion of words rotated vertically
  • colors: the color palette used to represent the data
  • scale: the range of font sizes between the largest and smallest words

You may also find that some words are cropped or don't show up in the Word Cloud; adjusting the parameters above will improve the quality of the results. Another common mistake in Word Clouds is including many words that appear only rarely; the min.freq argument lets you filter these out.
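To make the parameters concrete, here is a sketch of a tuned call (it assumes the wordcloud and RColorBrewer packages from Step 2; the small df is a stand-in for the data frame built in Step 4):

```r
library(wordcloud)
library(RColorBrewer)

# A small stand-in for the df built in Step 4
df <- data.frame(word = c("freedom", "dream", "ring", "day", "together"),
                 freq = c(13, 11, 12, 11, 7))

set.seed(1234)
wordcloud(words = df$word, freq = df$freq,
          min.freq = 10,           # drop words appearing fewer than 10 times
          max.words = 80,          # cap the number of words drawn
          scale = c(3.5, 0.5),     # font sizes of the largest and smallest words
          random.order = FALSE,    # most frequent words in the center
          rot.per = 0.25,          # fraction of words rotated vertically
          colors = brewer.pal(8, "Dark2"))
```

With min.freq = 10, only four of the five words are drawn; widening scale increases the visual gap between frequent and rare words.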

4. Word Cloud examples

Here is the resulting document-term matrix for Martin Luther King's speech titled 'I Have a Dream'.

             word freq
will         will   17
freedom   freedom   13
ring         ring   12
day           day   11
dream       dream   11
let           let   11
every       every    9
able         able    8
one           one    8
together together    7

A matrix showing the words and their corresponding frequencies in the data. The Word Cloud generated from it looks like this:

On analysis, you can see that the top words in the speech were will, freedom, ring, day, dream, let, every, able, one, and together, with frequency decreasing thereafter.

The wordcloud2 package adds further visualizations and features. It allows you to give custom shapes to the Word Cloud, such as a pentagon or a particular letter like 'L'. Here are the code and the visual presentation for UN speeches given by presidents.

wordcloud2(data = df, size = 1.6, color = 'random-dark')
wordcloud2(data = df, size = 0.7, shape = 'pentagon')

5. When you should use Word Clouds 

Communication: with the right use of Word Clouds, writers can identify the focal points of their content and build around them. Especially in fiction writing, writers can use Word Clouds to gauge the prominence of a particular character, scene, or emotion in the final product.

Business insights: Word Clouds can help discover customers' strong and weak points from feedback analysis, bringing a business closer to its customers' sentiment. For example, if 'long delays' emerges as a main topic in the Word Cloud, it flags an essential weakness in the company's procedures.

In business-to-consumer analysis, Word Clouds can also pinpoint overly technical information or jargon, helping to balance the mix of words in a single piece of content.

Conclusion 

In this age of data analysis, Word Clouds present a unique and engaging way to distill data into a graphical presentation. Today, organizations and businesses around the world are making sure to build more engaging content. This guide has shown how to build Word Cloud examples in R from scratch and draw more user attention to your data analysis.

You started by selecting a text file, then learned how to install wordcloud and the other packages. Text mining in R is a sophisticated technique for preparing text for the Word Cloud. Finally, you created a document-term matrix and enjoyed your first Word Cloud output.

Although Word Clouds in R have gained popularity with the rise of data analysis and offer an engaging route to qualitative insight, a Word Cloud cannot represent an entire piece of research or any statistical analysis on its own. Even so, many researchers, online platforms, and marketing professionals showcase Word Clouds in their fields to direct user attention toward their work, products, or services.

If you are interested in making a career in the Data Science domain, our 11-month in-person Postgraduate Certificate Diploma in Data Science course can help you immensely in becoming a successful Data Science professional. 
