Sweeping through data in R
How do you apply one particular row of your data to all other rows?
Today I came across a data set that showed revenue split by product and location. The data was formatted to show only the split by product within each location and the overall split by location, similar to the example in the table below.
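Revenue by product and continent

Product   Africa  America  Asia  Australia  Europe
A            0.4      0.3   0.5        0.4     0.4
B            0.2      0.4   0.2        0.3     0.4
C            0.4      0.3   0.3        0.3     0.2
Total        0.1      0.4   0.2        0.1     0.2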
I wanted to understand the revenue split by product and location together. To get it, I have to multiply each product's share within a continent by that continent's share of the total. In other words, I would like to take the total line and sweep it through my data. Of course there is a function in base R for that: sweep. To my surprise, I can't remember ever having used sweep before. Its help page states that it used to be based on apply, so maybe that's how I would have approached those tasks in the past.
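For comparison, here is a minimal sketch of how such a task might look with apply instead of sweep. The matrix shares and the vector weights are made-up illustration data, not the table from this post.

```r
# Hypothetical data: 2 products x 3 regions, plus one weight per region
shares  <- matrix(c(0.5, 0.5, 0.25, 0.75, 0.4, 0.6), nrow = 2,
                  dimnames = list(c("A", "B"), c("X", "Y", "Z")))
weights <- c(X = 0.2, Y = 0.3, Z = 0.5)

# apply() passes each row to the function; multiplying a row by 'weights'
# scales every column by its weight. apply() returns the processed rows
# as columns, so the result is transposed back with t().
via_apply <- t(apply(shares, 1, function(row) row * weights))

# The same calculation with sweep(): no transpose needed
via_sweep <- sweep(shares, 2, weights, "*")

# Compare the two results
all.equal(via_apply, via_sweep)
```

The sweep version says directly which margin the weights belong to, which is probably why it reads more easily than the apply detour.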
Anyhow, the sweep function expects an array or matrix as input rather than a data frame, so let's store the above table in a matrix.
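If the data already lives in a data frame, a conversion along the following lines should work. This is only a sketch: the data frame df and its column names are hypothetical and not part of the original data set.

```r
# Hypothetical data frame: one row per product, one column per continent
df <- data.frame(Product = c("A", "B", "Total"),
                 Africa  = c(0.6, 0.4, 0.3),
                 Europe  = c(0.2, 0.8, 0.7))

# Drop the label column, convert the numeric columns to a matrix,
# and carry the product labels over as row names
m <- as.matrix(df[, -1])
rownames(m) <- df$Product

is.matrix(m)
m
```

Leaving out the label column before the conversion matters, because as.matrix turns the whole matrix into character if any column is non-numeric.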
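Product <- c("A", "B", "C", "Total")
Continent <- c("Africa", "America", "Asia", "Australia", "Europe")
values <- c(0.4, 0.2, 0.4, 0.1, 0.3, 0.4, 0.3, 0.4, 0.5, 0.2,
            0.3, 0.2, 0.4, 0.3, 0.3, 0.1, 0.4, 0.4, 0.2, 0.2)

# matrix() fills column by column, so each run of four values is one continent.
# The dimnames are named so that they can be looked up as "Product" later on.
M <- matrix(values, ncol=5, dimnames=list(Product=Product, Continent=Continent))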
Now I can sweep through my data. The arguments for sweep are the data set itself (in my case the first three rows of my matrix), the margin (here 2, as I want to apply the calculation along the second dimension, the columns), the summary statistics to be applied (in my case the totals in row 4) and the function to be applied (in my scenario a simple multiplication, "*"):
swept.M <- sweep(M[1:3,], 2, M[4,], "*")
The output is what I desired and can be plotted nicely as a bar plot.
> swept.M
       Continent
Product Africa America Asia Australia Europe
      A   0.04    0.12 0.10      0.04   0.08
      B   0.02    0.16 0.04      0.03   0.08
      C   0.04    0.12 0.06      0.03   0.04

barplot(swept.M*100, legend=dimnames(swept.M)[["Product"]],
        main="Revenue by product and continent",
        ylab="Revenue split %")

[Figure: bar plot "Revenue by product and continent", y-axis "Revenue split %"]

One more example

Another classic example of using the sweep function is of course the case where you have revenue information and would like to calculate the income split by product for each location:

Revenue <- matrix(1:15, ncol=5)
sweep(Revenue, 2, colSums(Revenue), "/")
This is actually the same as prop.table(Revenue, 2), which is short for:
sweep(x, margin, margin.table(x, margin), "/")
Reading the help file for margin.table shows that this function is the same as apply(x, margin, sum), and colSums is just a faster version of the same statement.
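The equivalences described above can be checked with a short sketch. The object names by_sweep, by_prop_table and by_apply are only chosen here for illustration.

```r
Revenue <- matrix(1:15, ncol = 5)

# Column-wise shares, computed three ways
by_sweep      <- sweep(Revenue, 2, colSums(Revenue), "/")
by_prop_table <- prop.table(Revenue, 2)
by_apply      <- sweep(Revenue, 2, apply(Revenue, 2, sum), "/")

# Compare the results pairwise
all.equal(by_sweep, by_prop_table)
all.equal(by_sweep, by_apply)

# Sanity check: the shares within every column should add up to 1
colSums(by_sweep)
```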