(This article was first published on **Rsome - A blog about some R stuff**, and kindly contributed to R-bloggers)

## Introduction

This blog post is the follow-up to part I on programming with ggplot2. If you have not read the first post of the series, I strongly recommend doing so before continuing with this second part; otherwise it might prove difficult to follow.

Having developed a scalable approach to column-wise and data type-dependent visualization, we will continue to customize our plots. Specifically, the focus of this post is how we can use a log-transformed x-axis with nice breakpoints for continuous data.

If you don’t like the idea of having a non-linear scale, don’t stop reading here. The principles developed below generalize well to any other customization of the plots that depends on the data itself.

## The problem

Recall from part one that we ended up with code that produces graphs for two different data types in our data frame with four columns.
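As a reminder, a minimal sketch of such a setup could look as follows. It is a reconstruction under assumptions, not the exact code from part one: the choice of the four `diamonds` columns, the helper `plot_column`, the variable name `current_col`, and the use of `dynGet()` to find the current column are all hypothetical.

```r
library(ggplot2)

# Assumption: four illustrative columns (two numeric, two categorical).
dta <- diamonds[, c("carat", "price", "cut", "color")]

# Hypothetical reconstruction: find the column currently being plotted by
# searching the calling frames for a variable named `current_col`.
current_class <- function() {
  class(dynGet("current_col"))[1]
}

# Histogram for numeric columns, bar chart for everything else.
geom_hist_or_bar <- function(...) {
  if (current_class() %in% c("numeric", "integer")) {
    geom_histogram(...)
  } else {
    geom_bar(...)
  }
}

# Hypothetical helper: build the plot for one column, selected by name.
plot_column <- function(name, data) {
  current_col <- data[[name]]  # picked up by current_class() via dynGet()
  ggplot(data, aes(x = .data[[name]])) +
    geom_hist_or_bar()
}

plots <- lapply(names(dta), plot_column, data = dta)
names(plots) <- names(dta)
```

Printing `plots$carat` would draw a histogram, while `plots$cut` would draw a bar chart.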

Our goal is to alter the x-axis from a linear to a log-transformed scale to make better use of the space in the plot.

## A first solution

At first glance, the solution to the problem seems easy. Similarly to the first post of this series, we can create a new function `scale_x_adapt` which returns a continuous scale for numeric data and a discrete scale otherwise. Then, we could pass the transform argument via `...` to `scale_x_continuous` and integrate it with our current framework.
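A minimal sketch of this idea is shown below. It assumes the same hypothetical `current_col` / `dynGet()` lookup as a stand-in for the framework from part one, and it picks the geom with a plain `is.numeric()` check to keep the example short. The transformation is passed as `trans`; ggplot2 3.5.0 and later prefer the name `transform` for the same argument.

```r
library(ggplot2)

# Hypothetical lookup of the current column's class (see assumptions above).
current_class <- function() {
  class(dynGet("current_col"))[1]
}

# Continuous scale for numeric columns, discrete scale otherwise.
# Everything in `...` is forwarded to scale_x_continuous().
scale_x_adapt <- function(...) {
  if (current_class() %in% c("numeric", "integer")) {
    scale_x_continuous(...)
  } else {
    scale_x_discrete()
  }
}

# Hypothetical helper that builds one plot per column name.
plot_column <- function(name, data, ...) {
  current_col <- data[[name]]
  geom <- if (is.numeric(current_col)) geom_histogram() else geom_bar()
  ggplot(data, aes(x = .data[[name]])) +
    geom +
    scale_x_adapt(...)
}

dta <- diamonds[, c("carat", "price", "cut", "color")]
plots <- lapply(names(dta), plot_column, data = dta, trans = "log")
```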

This seems fine, except for the fact that the break ticks are not really chosen wisely. There are various ways to go about that:

- Resort to functionality from existing packages like `trans_breaks` (from the scales package), `annotation_logticks` (ggplot2) and others.
- Create your own function that returns pretty breaks.
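For reference, the first option could look roughly like the sketch below. It is only an illustration on the `price` column of `diamonds` (an assumed example column), using `scales::trans_breaks()` for breaks that are evenly spaced on the log10 scale and `annotation_logticks()` for tick marks on the bottom axis.

```r
library(ggplot2)

# Breaks computed on the log10 scale and mapped back with the inverse 10^x.
log10_breaks <- scales::trans_breaks("log10", function(x) 10^x)

p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 30) +
  # `trans` is called `transform` in ggplot2 3.5.0 and later
  scale_x_continuous(trans = "log10", breaks = log10_breaks) +
  annotation_logticks(sides = "b")
```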

We go for the second option because it is a slightly more general approach and I was not able to find a solution that pleased me for our specific case.

## A second solution

We need to change the way the breaks are created within `scale_x_adapt`.

To produce appropriate breaks, we need to know the maximum and the minimum of the data we are dealing with (that is, the column that `lapply` currently passes over) and then create a sequence between the minimum and the maximum with some function.

Recall that in part 1 we used a function `current_class` that does something similar to what we want. It gets the class of the current data. Hence, we can expand this function to get any property from our current data (and give the function a more general name).
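A minimal sketch of such a generalized function, using base R only, might look like this. The lookup via `dynGet()` and the variable name `current_col` are assumptions made for this illustration, and `describe_column` is a hypothetical stand-in for the plotting function that `lapply` calls on each column.

```r
# Hypothetical generalization of current_class(): apply any function `f`
# to the column that is currently being processed.
current_property <- function(f, ...) {
  x <- dynGet("current_col")  # assumed name, searched for in the calling frames
  f(x, ...)
}

# Stand-in for the function that lapply() calls on every column.
describe_column <- function(current_col) {
  list(
    class = current_property(class),
    min   = current_property(min),
    max   = current_property(max)
  )
}

dta <- data.frame(carat = c(0.2, 1.5, 5), price = c(326L, 2400L, 18000L))
props <- lapply(dta, describe_column)

props$carat$min    # 0.2
props$price$class  # "integer"
```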

Note the new argument `f`, which allows us to fetch a wider range of properties from the current data, not just the class, as `current_class` did. This is key for every customization that depends on the input data, because this function can now get us virtually any information out of the data we could possibly want.

In our case, we are interested in the minimum and maximum values for the current batch of data. As a finer detail, also note that `current_class` called `class` and returned the first value, since objects can have multiple classes and we were only interested in the first one (otherwise we could not do the logical comparison with `%in%`). We now return all elements that `f` returns, since we can always perform the subset outside the function `current_property`, and this makes the function more flexible.

Next, we need to create a function that, given a range, computes some nice break values we can pass to the `breaks` argument of `scale_x_continuous`. This task is independent of the rest of the framework we are developing here. One function that does something that is close to what we want is the following.
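A sketch of such a function is given here. The names `start`, `end` and `correction` follow the description in this post, whereas `length_out`, the default values, and the handling of duplicated or zero breaks are assumptions of this sketch.

```r
# Compute rounded breaks that are (roughly) equidistant on a log scale.
calc_log_breaks <- function(start, end, length_out = 6, correction = 0) {
  stopifnot(start > 0, end > start)
  # linear sequence on the log scale, mapped back to the data scale
  raw <- exp(seq(log(start), log(end), length.out = length_out))
  # number of digits depends on the order of magnitude of the range
  digits <- -floor(log10(end - start)) + correction
  rounded <- unique(round(raw, digits))
  # a break at 0 cannot be displayed on a log axis, so drop it
  rounded[rounded > 0]
}

calc_log_breaks(1, 10000, length_out = 5, correction = 3)
# 1 10 100 1000 10000
```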

Let me break these lines into pieces.

- The basic idea is to create a sequence of breaks between the minimum and the maximum value of the current batch of data using `seq`.
- Let us assume we want break points that are equidistant on the log scale. Since our plot is going to be on a logarithmic x-axis, we need to create a linear sequence between `log(start)` and `log(end)` and transform it with `exp` so we end up with breaks that have the same distance on the logarithmic scale. It becomes evident that the solution presented above is suitable for a log-transformed axis, but if you choose another transformation, e.g. the square root transformation, you need to adapt the function.
- We want to round the values depending on their absolute value. For example, the values for carat (which are in the range of 0.2 to 5) should be rounded to one decimal point, whereas the values of price (ranging up to 18’000) should be rounded to thousands or tens of thousands. So note that `log10(10)` is one, `log10(100) = 2` and `log10(0.1) = -1` etc., which is exactly what we need. In other words, we make the rounding dependent on the log of the difference between the maximum and the minimum of the input data for each plot.
- A constant `correction` is added so it is possible to manually *adjust* the rounding from more to less digits.
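To make the rounding rule concrete, the small example below works through the arithmetic. The helper `rounding_digits` is hypothetical, and the two ranges are illustrative values in the spirit of carat and price rather than numbers taken from the data.

```r
# Number of digits passed to round(), derived from the range of the data.
rounding_digits <- function(start, end, correction = 0) {
  -floor(log10(end - start)) + correction
}

rounding_digits(0.2, 5, correction = 1)      # 1: one decimal place
rounding_digits(300, 18000, correction = 1)  # -3: round to thousands

# round() accepts negative digits and then rounds to tens, hundreds, ...
round(0.381, 1)   # 0.4
round(3716, -3)   # 4000
```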

Finally, we can put it all together:
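One way the pieces could fit together is sketched below. As before, this is an illustration under assumptions and not necessarily the original code: `plot_column`, `is_continuous`, the variable `current_col`, the `dynGet()` lookup, the argument `length_out`, and the selection of columns are hypothetical. The sketch also assumes strictly positive values in the numeric columns, since the logarithm is undefined otherwise.

```r
library(ggplot2)

# Apply `f` to the column currently being plotted (assumed name: current_col).
current_property <- function(f, ...) {
  x <- dynGet("current_col")
  f(x, ...)
}

# Rounded breaks that are roughly equidistant on a log scale.
calc_log_breaks <- function(start, end, length_out = 6, correction = 0) {
  raw <- exp(seq(log(start), log(end), length.out = length_out))
  rounded <- unique(round(raw, -floor(log10(end - start)) + correction))
  rounded[rounded > 0]  # a break at 0 cannot be shown on a log axis
}

is_continuous <- function() {
  current_property(class)[1] %in% c("numeric", "integer")
}

geom_hist_or_bar <- function(...) {
  if (is_continuous()) geom_histogram(...) else geom_bar(...)
}

# `...` is forwarded to calc_log_breaks(), e.g. `correction`.
scale_x_adapt <- function(...) {
  if (is_continuous()) {
    lo <- current_property(min)
    hi <- current_property(max)
    # `trans` is called `transform` in ggplot2 3.5.0 and later
    scale_x_continuous(breaks = calc_log_breaks(lo, hi, ...), trans = "log")
  } else {
    scale_x_discrete()
  }
}

plot_column <- function(name, data, ...) {
  current_col <- data[[name]]  # found by current_property() via dynGet()
  ggplot(data, aes(x = .data[[name]])) +
    geom_hist_or_bar() +
    scale_x_adapt(...)
}

dta <- diamonds[, c("carat", "price", "cut", "color")]
plots <- lapply(names(dta), plot_column, data = dta, correction = 1)
```

With `correction = 1`, the breaks for carat are rounded to one decimal place and those for price to thousands, in line with the rounding rule described above.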

## Conclusion

In this blog post, we wanted to further customize our plots created in the first post of the series.

We introduced a new function, `scale_x_adapt`, that returns a predefined scale for a given data type. It can be integrated with our framework similarly to `geom_hist_or_bar`. We created a more general version of `current_class`, `current_property`, which takes a function as an argument and allows us to evaluate this function on the current data column. In our example, this is helpful because, using `current_property(min)` and `current_property(max)`, we can find the range of the column we are processing and hence construct nice breakpoints with `calc_log_breaks`, which then get used in `scale_x_adapt`. `current_property` is a key function in the framework developed here since it can extract any information from the batch of data we are processing within `lapply`.
