Word Clouds have become a popular tool in marketing campaigns. As viewers are shown more and more information, a Word Cloud offers a modern, graphic way to represent the central idea or focus around which a whole concept is built.
Let us start with a simple Word Cloud example on the topic of Web 2.0. The main topic (Web 2.0) sits in the center, with multiple related topics around it. Here, the size and placement of the words around the main topic indicate their importance or relation to the overall topic.
A Word Cloud for terms related to Web 2.0
A larger font size indicates a higher weight for that term within the Word Cloud's topic. So the topics of Usability, Design, Convergence, Standardization, Economy, and Participation carry more influence than Web Standards, Mobility, Data-Driven, CSS, Simplicity, and Microformats.
More generally, these visualizations are also known as tag clouds, wordles, or weighted lists. Many online platforms use a tag cloud to represent the items or tags found on the site. Suppose a website hosts hundreds of posts. A tag cloud can summarize which words and tags are used most often across those posts.
Word Clouds come in three main types: frequency, significance, and categorization, classified by meaning rather than visual appearance. In this post, we will walk step by step through how to make a Word Cloud in the R language.
With data analysis gaining focus in almost every industry, Word Clouds are becoming an important way to surface facts and discover patterns, and they are now used across multiple domains.
There are several good reasons to use Word Clouds when presenting text data.
Here is your guide to creating your first Word Cloud in R.
Step 1: Start by creating a text file
The first step is choosing the topic, selecting the data, and creating a text file for easy processing. You could take a speech by a political leader, or thousands of social media posts. Use any editor, on your system or online, to copy and paste the data into a text file for building the Word Cloud.
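As a minimal sketch, you can also create and read the text file directly from R (the filename speech.txt and its contents are hypothetical examples):

```r
# Write a few sample lines to a text file, then read it back in.
# "speech.txt" is a hypothetical filename used only for this sketch.
writeLines(c("I have a dream", "let freedom ring"), "speech.txt")
text <- readLines("speech.txt")
print(text)
```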
Step 2: Install and load the wordcloud package in R
Then open RStudio. To generate a Word Cloud in R, you need the wordcloud package, along with the RColorBrewer package for color palettes. Here are the commands to install and load these packages:
#Installation and loading of packages
install.packages("wordcloud")
library(wordcloud)
install.packages("RColorBrewer")
library(RColorBrewer)
You can also use the wordcloud2 package, which offers extra designs and fun shapes for building a more engaging Word Cloud in R. Here is the command for that:
install.packages("wordcloud2")
library(wordcloud2)
Now the text takes center stage for the whole analysis. You need to load your text data as a corpus, and the tm package can help you with this.
library(tm)
#Create a vector containing only the text
text <- data$text
#Create a corpus
docs <- Corpus(VectorSource(text))
And if you are working with Twitter data, the separate rtweet package can ease the process too.
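A hedged sketch of pulling tweets with rtweet might look like this; it assumes you have already configured Twitter API authentication, and the hashtag is an arbitrary example:

```r
library(rtweet)

# Fetch 100 recent original tweets matching a hashtag
# (requires Twitter API credentials to be set up for rtweet)
tweets <- search_tweets("#rstats", n = 100, include_rts = FALSE)
```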
Step 3: Text Mining in R: Cleaning the data
Once the text is loaded as a corpus with the Corpus() function from the tm (text mining) package, cleaning the data is the next stage. You must remove special characters, punctuation, and numbers from the text so that only words remain. This helps the Word Cloud focus on the words themselves and deliver insights more precisely.
There are multiple packages to help you clean data when working with a corpus. For the tm package, you can use the following list of commands.
library(magrittr) # provides the %>% pipe
docs <- docs %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeWords, stopwords("english"))
Text mining in R involves several such cleaning steps, each removing a different kind of noise from the text.
If you are working with Twitter data, here is code for cleaning the text of a sample of tweets.
tweets$text <- gsub("https\\S*", "", tweets$text)
tweets$text <- gsub("@\\S*", "", tweets$text)
tweets$text <- gsub("amp", "", tweets$text)
tweets$text <- gsub("[\r\n]", "", tweets$text)
tweets$text <- gsub("[[:punct:]]", "", tweets$text)
Here is a complete list of R Codes to help you in the text mining process.
# Transform the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove English common stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stop words for a specific document
# (specify stopwords as a character vector)
docs <- tm_map(docs, removeWords, c("example1", "example2"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
# Text stemming (requires the SnowballC package)
docs <- tm_map(docs, stemDocument)
Step 4: Creating a document-term-matrix
In the next step, we create a document-term matrix: a matrix that records the frequency of each word in a given document.
Once executed, the code produces a data frame with two columns, one holding the words and the other their frequency in the document. Here is the code for building a document-term matrix with the tm package, using the TermDocumentMatrix function.
dtm <- TermDocumentMatrix(docs)
matrix <- as.matrix(dtm)
words <- sort(rowSums(matrix),decreasing=TRUE)
df <- data.frame(word = names(words),freq=words)
There is also the tidytext package, which you can use to create a document-term matrix, and it works especially well with tweets.
library(tidytext)
library(dplyr)
tweets_words <- tweets %>%
  select(text) %>%
  unnest_tokens(word, text)
words <- tweets_words %>% count(word, sort = TRUE)
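Here is a self-contained sketch of that tidytext pipeline on a tiny, made-up data frame standing in for real tweets:

```r
library(dplyr)
library(tidytext)

# Hypothetical mini data set standing in for real tweets
tweets <- data.frame(text = c("freedom now", "freedom and a dream"))

# Split each tweet into one word per row, then count word frequencies
tweets_words <- tweets %>% unnest_tokens(word, text)
words <- tweets_words %>% count(word, sort = TRUE)
```

With this toy input, "freedom" comes out on top with a count of 2.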
Step 5: Generating the Word Cloud
Now you can simply use the wordcloud function to generate a Word Cloud from the text. You can set limits on the number of words, the minimum frequency, and more to fine-tune the final presentation of the Word Cloud.
set.seed(1234) # for reproducibility
wordcloud(words = df$word, freq = df$freq, min.freq = 1,
          max.words = 100, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
Here are the main parameters for building a more specific Word Cloud: words (the words to plot), freq (their frequencies), min.freq (words below this frequency are not plotted), max.words (the maximum number of words shown), random.order (if FALSE, words are plotted in decreasing frequency), rot.per (the proportion of words rotated 90 degrees), and colors (the color palette, here from RColorBrewer).
You may find that some words are cropped or don't show up in the Word Cloud. You can adjust the parameters to your preference and improve the quality of the results. Another common mistake in Word Clouds is including many words with very low frequency; here you can raise the min.freq argument to filter them out.
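As a sketch of such tuning (the small df below is made up purely for illustration), raising min.freq and tightening the scale argument can reduce cropped words:

```r
library(wordcloud)

# Toy frequency table for illustration only
df <- data.frame(word = c("freedom", "dream", "ring", "day"),
                 freq = c(13, 11, 12, 2))

# scale sets the largest and smallest font sizes; a smaller maximum
# reduces the chance that long words get cropped at the plot edges
wordcloud(words = df$word, freq = df$freq,
          min.freq = 3,        # drops the rare word "day"
          scale = c(3, 0.5),
          random.order = FALSE)
```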
Here is the resulting document-term matrix from Martin Luther King's speech 'I Have a Dream'.
word      freq
will      17
freedom   13
ring      12
day       11
dream     11
let       11
every     9
able      8
one       8
together  7
A matrix showing the words and their corresponding frequency in the data. The resulting Word Cloud will be as below:
On analysis, you can see that the top words in his speech were will, freedom, ring, day, dream, let, every, able, one, and together, with frequency decreasing thereafter.
The wordcloud2 package adds more visualizations and features for users. It allows you to give custom shapes to the Word Cloud, such as a pentagon or a particular letter like 'L'. Here are the code and the visual presentation for UN speeches given by presidents.
wordcloud2(data = df, size = 1.6, color = 'random-dark')
wordcloud2(data = df, size = 0.7, shape = 'pentagon')
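For the letter shape mentioned above, wordcloud2 provides letterCloud(). This sketch uses demoFreq, a sample word-frequency data frame bundled with the package:

```r
library(wordcloud2)

# Render the cloud in the shape of the letter "L";
# demoFreq is a demo frequency table shipped with wordcloud2
letterCloud(demoFreq, word = "L", size = 0.3)
```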
Communication: With the right use of Word Clouds, writers can identify the focus points of a piece of content and build their writing around them. In fiction especially, writers can use Word Clouds to gauge the prominence of a particular character, scene, or emotion in the final product.
Business insights: Word Clouds can help discover customers' strong or weak points from feedback analysis, bringing a business closer to customer sentiment. For example, if 'long delays' emerges as a main topic in the Word Cloud, it points to an essential weakness in the company's processes.
For business-to-consumer analysis, Word Clouds can also pinpoint specific technical information or jargon, helping balance different kinds of words in a single piece of content.
In this age of data analysis, Word Clouds present a unique and engaging way to condense data into graphical presentations. Organizations and businesses around the world are striving to build more engaging content. This guide walks through building Word Cloud examples in R from scratch so your data analysis draws more user attention.
You start by selecting a text file, then install the wordcloud and related packages. Text mining in R cleans and prepares the text for the Word Cloud. Then you create a document-term matrix and enjoy your first Word Cloud output.
Although the Word Cloud in R has gained a lot of popularity with the rise of data analysis and provides an engaging route to qualitative insight, a Word Cloud cannot represent a whole piece of research or substitute for statistical analysis. Still, many researchers, online platforms, and marketing professionals showcase Word Clouds in their fields to draw user attention to their focus, products, or services.
If you are interested in making a career in the Data Science domain, our 11-month in-person Postgraduate Certificate Diploma in Data Science course can help you immensely in becoming a successful Data Science professional.