The key to unlocking services on G-Cloud

November 14, 2017
(This article was first published on R – thinkr, and kindly contributed to R-bloggers)

The importance of keyword-rich descriptions

There are nearly 20,000 services on G-Cloud. Suppliers have strewn their services with keywords designed to grab the attention of buyers. So what should buyers search for, and how does that vary by cloud service category?

Only selected parts of the suppliers’ content are indexed for searching: the service title, a 50-word summary, and the bulleted features and benefits. So suppliers must pack these fields with thoughtful, keyword-rich phrases to optimise their chances of being found.

In this blog, I want to compare and contrast the most frequent keywords used by suppliers. I’ve selected four categories from the Cloud Hosting lot for this purpose:

  • Compute & Application Hosting (C&AH)
  • Object Storage
  • Infrastructure & Platform Security (I&PS)
  • Platform as a Service (PaaS)

Discarding distracting data

Services can belong to multiple categories as demonstrated in the Venn diagram below. For example, 53 (those at the heart of the plot) are aligned to all four categories. Comparing and contrasting the keywords for these would clearly be of little benefit. So I’m going to focus on those services around the periphery which are unique to each category, for example, the 323 for C&AH and so forth.
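As a minimal sketch of that filtering step, assuming the scraped data holds one row per service/category pairing, the services unique to a single category could be isolated with dplyr. The `services` tibble, its column names and its contents are invented for illustration and are not the real G-Cloud data.

```r
library(dplyr)
library(tibble)

# Hypothetical data: one row per service/category pairing (invented values).
services <- tribble(
  ~service_id, ~category,
  "s1", "Compute & Application Hosting",
  "s1", "Object Storage",
  "s2", "Compute & Application Hosting",
  "s3", "Platform as a Service",
  "s4", "Object Storage",
  "s4", "Infrastructure & Platform Security",
  "s5", "Infrastructure & Platform Security"
)

# Keep only the services aligned to exactly one category.
unique_services <- services %>%
  add_count(service_id, name = "n_categories") %>%
  filter(n_categories == 1) %>%
  select(-n_categories)

# Number of category-unique services in each category.
unique_counts <- count(unique_services, category)
```

Services aligned to two or more categories (here `s1` and `s4`) drop out, which mirrors discarding the overlapping regions of the Venn diagram.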

Venn diagram of G-Cloud services and how they align to 4 hosting categories

Having defined the scope, we now need to do a bit of cleaning. The words are converted to lower case so that we get a truer count of each distinct word. Common stop words, such as “and” and “the”, are removed. Words which are category-neutral, such as “cloud” and “service”, as well as the names of the suppliers or services themselves, are also weeded out. This cleaning will enable us to home in on service characteristics.
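The cleaning steps above might look something like the following sketch, which uses `unnest_tokens` and `anti_join` from the package list at the end of this post. The sample descriptions and the `neutral_words` list are assumptions made for illustration; the list actually used in the analysis is not reproduced here.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical service descriptions (invented for illustration).
descriptions <- tibble(
  category = c("PaaS", "Object Storage"),
  text = c("The Cloud platform and API integration service",
           "Secure UK data storage and the Cloud backup service")
)

# Assumed list of category-neutral words; supplier and service names
# could be appended to it in the same way.
neutral_words <- tibble(word = c("cloud", "service"))

keywords <- descriptions %>%
  unnest_tokens(word, text) %>%           # one word per row, lower-cased by default
  anti_join(stop_words, by = "word") %>%  # drop common stop words such as "and" and "the"
  anti_join(neutral_words, by = "word")   # drop category-neutral words
```

Because `unnest_tokens` converts to lower case by default, “Cloud” and “cloud” are treated as the same word before the joins remove them.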

Visualisation of search terms

With that done, we could visualise the word frequency per category with a word cloud. The Compute & Application Hosting example below shows the most frequent words: “uk”, “data”, “virtual”, “scale” and “security”, for example, figure prominently.

Word cloud of G-Cloud search terms for Compute & Application Hosting
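A word cloud of this kind could be drawn roughly as follows, assuming the cleaned keywords for one category are already in a tibble. The keyword vector below is invented and the `size` setting is an arbitrary choice, so the result will not match the figure above.

```r
library(dplyr)
library(tibble)
library(wordcloud2)

# Hypothetical cleaned keywords for a single category (invented values).
keywords <- tibble(
  word = c("uk", "data", "virtual", "scale", "security",
           "uk", "data", "uk", "virtual", "security")
)

# wordcloud2 expects a data frame with a word column and a frequency column.
word_freq <- keywords %>%
  count(word, sort = TRUE, name = "freq") %>%
  as.data.frame()

# Build the widget; printing `cloud` in an interactive session renders it.
cloud <- wordcloud2(word_freq, size = 0.5)
```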

However, whilst a word cloud is visually appealing, we need a better approach if we are to compare and contrast across categories. This facet-wrap plot shows the ten most frequent words in each category. The advantage here is that we can more easily see both common ground and points of distinction.

Top 10 G-Cloud search terms used by suppliers in 4 hosting categories
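One way to build such a comparison is sketched below. The `word_counts` figures are invented, and `top_words()` is a hypothetical helper rather than anything from the original analysis. `reorder_within()` and `scale_x_reordered()` come from tidytext and are one option for ordering the bars within each facet; the original plot may have been built differently.

```r
library(dplyr)
library(tibble)
library(ggplot2)
library(tidytext)

# Hypothetical keyword counts (invented); a real run would derive them
# from the cleaned keywords with count(category, word, name = "freq").
word_counts <- tribble(
  ~category, ~word, ~freq,
  "PaaS", "api", 40,
  "PaaS", "integration", 35,
  "PaaS", "security", 30,
  "Compute & Application Hosting", "scale", 45,
  "Compute & Application Hosting", "virtual", 38,
  "Compute & Application Hosting", "data", 33
)

# Hypothetical helper: the k most frequent words within each category.
top_words <- function(counts, k = 10) {
  counts %>%
    group_by(category) %>%
    slice_max(freq, n = k, with_ties = FALSE) %>%
    ungroup()
}

# One panel per category, with bars ordered within each panel.
p <- top_words(word_counts) %>%
  mutate(word = reorder_within(word, freq, category)) %>%
  ggplot(aes(word, freq)) +
  geom_col() +
  scale_x_reordered() +
  coord_flip() +
  facet_wrap(~ category, scales = "free")
```

Setting `with_ties = FALSE` caps every panel at exactly `k` bars, at the cost of breaking ties arbitrarily.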

“Security” and “data” are among the top keywords for all four categories. In contrast, “API” and “integration” are distinctively important for Platform as a Service (PaaS). Similarly, “scale” and “virtual[isation]” are distinctively important for Compute & Application Hosting.

The takeaway

A more extensive analysis of this nature may help the G-Cloud team to identify inter-category dissimilarity and thus refine the service categorisation newly introduced in the ninth iteration of G-Cloud. It could also form the basis of guidance to buyers on the keywords to consider when preparing search terms for a given category.

R tools used

Package      Functions
purrr        map_df
rvest        read_html; html_nodes; html_text
dplyr        select; arrange; filter; count; mutate; if_else; anti_join
tidyr        separate
tidytext     unnest_tokens
stringr      str_replace; str_trim
tibble       tibble
lubridate    today
ggplot2      theme_set; geom_col; geom_text; coord_flip; facet_wrap
VennDiagram  venn.diagram; calculate.overlap
wordcloud2   wordcloud2
ggthemes     theme_economist

Citation

R Development Core Team (2008). R: A language and environment for
statistical computing. R Foundation for Statistical Computing,
Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Contains public sector information licensed under the Open Government Licence v3.0.
