
ggplot - legend, label and font size

Author: Internet

1. Introduction

ggplot is one of the most famous libraries in R, and I use it very often in my daily workflow. But there are three topics I have seldom touched before: legend, label and font size.

One reason is that they are not a necessity in our plots. But I believe they are good to keep in our back pocket. Besides, in some business situations they are very useful.

In this article, we will use the open dataset nycflights13::flights as an example.

For those who are not familiar with ggplot or tidyverse: ggplot (the ggplot2 package) is included in tidyverse, and because we will deal with some date data, we also use lubridate.

library(tidyverse)
library(lubridate)

(flights <- nycflights13::flights %>%
    filter(month %in% c(1), carrier %in% c("AA", "UA", "DL")))

 

    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest  air_time distance  hour minute time_hour
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl> <chr>    <int> <chr>   <chr>  <chr>    <dbl>    <dbl> <dbl>  <dbl> <dttm>             
 1  2013     1     1      517            515         2      830            819        11 UA        1545 N14228  EWR    IAH        227     1400     5     15 2013-01-01 05:00:00
 2  2013     1     1      533            529         4      850            830        20 UA        1714 N24211  LGA    IAH        227     1416     5     29 2013-01-01 05:00:00
 3  2013     1     1      542            540         2      923            850        33 AA        1141 N619AA  JFK    MIA        160     1089     5     40 2013-01-01 05:00:00
 4  2013     1     1      554            600        -6      812            837       -25 DL         461 N668DN  LGA    ATL        116      762     6      0 2013-01-01 06:00:00
 5  2013     1     1      554            558        -4      740            728        12 UA        1696 N39463  EWR    ORD        150      719     5     58 2013-01-01 05:00:00
 6  2013     1     1      558            600        -2      753            745         8 AA         301 N3ALAA  LGA    ORD        138      733     6      0 2013-01-01 06:00:00
 7  2013     1     1      558            600        -2      924            917         7 UA         194 N29129  JFK    LAX        345     2475     6      0 2013-01-01 06:00:00
 8  2013     1     1      558            600        -2      923            937       -14 UA        1124 N53441  EWR    SFO        361     2565     6      0 2013-01-01 06:00:00
 9  2013     1     1      559            600        -1      941            910        31 AA         707 N3DUAA  LGA    DFW        257     1389     6      0 2013-01-01 06:00:00
10  2013     1     1      559            600        -1      854            902        -8 UA        1187 N76515  EWR    LAS        337     2227     6      0 2013-01-01 06:00:00
# ... with 11,111 more rows 

 

2. Legend

2.1 Use at least one aesthetic to make the legend show up

If we want a legend, the easiest way is to map at least one aesthetic besides x and y.

That means in aes() we give not only x= and y= but also another argument such as color=.

# example 1
flights %>%
  mutate(date=make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay=mean(dep_delay, na.rm=TRUE)) %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier))

  

Writing one long pipeline like the above is natural while exploring, but to keep things clear I split the code into two steps below. The result is the same.

# it may be natural to write pipelines like above at exploring,
# but before everything goes too crazy, let us do it separately.
(delay <- flights %>%
    mutate(date=make_date(year, month, day)) %>%
    group_by(date, carrier) %>%
    summarise(delay=mean(dep_delay, na.rm=TRUE)))

delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier))

Now let's have a look at other arguments.

For example, this is how linetype= affects the legend (example 2).

The good thing is that they can be mixed and matched (example 3).

# example 2
delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, linetype=carrier))

  

# example 3
delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier, linetype=carrier))
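
Example 3 shows a single combined legend because both aesthetics map the same variable. As far as I understand, ggplot2 only merges legends whose titles and labels match, so when renaming the legend it seems safest to rename both aesthetics together. A minimal sketch, assuming the same `delay` summary as above (the title "Airline" is just an illustration):

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in the article.
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# Give color and linetype the same title so they stay in one legend.
p <- ggplot(delay) +
  geom_line(aes(x = date, y = delay, color = carrier, linetype = carrier)) +
  labs(color = "Airline", linetype = "Airline")
p
```

If only one of the two titles is changed, the plot would likely end up with two separate legends.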

 

2.2 Manually change legend behaviour

We often want to change the colors, order or position of the legend. How can we do that?

Most of these can be changed with scale_color_manual() or theme().

# change color
delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier)) +
  scale_color_manual(values=c("black", "red", "orange"))

  

To make the code clearer, I rewrite the code above as below. Both give the same result.

# or, equally
base <- delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier))

base + 
  scale_color_manual(values=c("black", "red", "orange"))

I often find changing colors more useful than anything else, because some companies have their own brand colors.

Besides that, people often ask to change the legend order. For example, "our company" on the first line, our archrival on the second line, and so on.

# change legend order
base +
  scale_color_manual(values=c("black", "red", "orange"),
                     breaks=c("DL", "UA", "AA"))

  

In the above examples, the legend text is created by ggplot and comes from the original dataset. We can change that if we need to.

# change legend text
base +
  scale_color_manual(values=c("black", "red", "orange"),
                     name="Airlines", # can set to NULL
                     breaks=c("DL", "UA", "AA"),
                     labels=c("Delta", "United", "American"))
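
One thing to watch when combining values= with breaks=: to my knowledge, recent ggplot2 versions match an unnamed values= vector to breaks= in order, so adding breaks= may also change which carrier gets which color. A named vector pins each color to a carrier no matter how the legend is ordered. This is a sketch under that assumption; `carrier_colors` is a name introduced here for illustration.

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in the article.
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# Named vector: each carrier code keeps its color regardless of legend order.
carrier_colors <- c(AA = "black", DL = "red", UA = "orange")

p <- ggplot(delay) +
  geom_line(aes(x = date, y = delay, color = carrier)) +
  scale_color_manual(values = carrier_colors,
                     name = "Airlines",
                     breaks = c("DL", "UA", "AA"),
                     labels = c("Delta", "United", "American"))
p
```

With this form, reordering breaks= only reorders the legend entries; the lines keep their colors.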

  

The position of the legend can be changed as follows.

# change legend position
base +
  theme(legend.position="bottom", legend.title=element_blank()) # legend.title cannot be set to NULL
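
Besides "bottom", legend.position also accepts "top", "left", "right" and "none". A short sketch of two of them, rebuilding the same `base` plot so the snippet runs on its own (`p_top` and `p_none` are names introduced here):

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in the article.
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

base <- ggplot(delay) +
  geom_line(aes(x = date, y = delay, color = carrier))

# Put the legend above the plot.
p_top <- base + theme(legend.position = "top")

# Hide the legend completely.
p_none <- base + theme(legend.position = "none")

p_top
p_none
```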

  

2.3 For a solid shape (like a bar chart), use scale_fill_manual() instead

For solid shapes, the color is mapped with fill= rather than color=, so we use scale_fill_manual() instead. All the other arguments are the same as above.

base2 <- flights %>%
  group_by(carrier) %>%
  summarise(delay=mean(dep_delay, na.rm=TRUE)) %>%
  ggplot() +
  geom_col(aes(x=carrier, y=delay, fill=carrier))

base2 +
  scale_fill_manual(values=c("black", "red", "orange"))
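
To illustrate that the other arguments carry over, here is a sketch that reuses name=, breaks= and labels= on the fill scale. It assumes the same January subset as above; `bar_delay` and the named color vector are introduced here for illustration.

```r
library(tidyverse)

# Mean departure delay per carrier for the January subset used in the article.
bar_delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  group_by(carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE))

# Same arguments as scale_color_manual(), applied to the fill aesthetic.
p <- ggplot(bar_delay) +
  geom_col(aes(x = carrier, y = delay, fill = carrier)) +
  scale_fill_manual(values = c(AA = "black", DL = "red", UA = "orange"),
                    name = "Airlines",
                    breaks = c("DL", "UA", "AA"),
                    labels = c("Delta", "United", "American"))
p
```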

  

3. Label

3.1 Add Pure Number

In Python, I wrote an article about how to add labels to a matplotlib-based plot. That is neither easy nor straightforward.

But ggplot is so well designed that we can easily do what we want.

Most of the time we only need geom_text().

# add number
base <- delay %>%
  ggplot() +
  geom_line(aes(x=date, y=delay, color=carrier))

base +
  geom_text(aes(x=date, y=delay, label=round(delay, 0)))
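
With one label per point, the numbers on a line chart can pile up. A possible refinement, sketched below, is to let geom_text() skip labels that would overlap earlier ones (check_overlap=TRUE) and to keep the text glyph out of the legend (show.legend=FALSE). Coloring the labels by carrier is my own choice here, not something the original example does.

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in the article.
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# Aesthetics set in ggplot() are shared by both layers.
p <- ggplot(delay, aes(x = date, y = delay, color = carrier)) +
  geom_line() +
  geom_text(aes(label = round(delay, 0)),
            check_overlap = TRUE,  # drop labels that would overlap earlier ones
            show.legend = FALSE)   # keep the legend showing lines only
p
```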

  

This method works for bar charts and other solid plots as well.

If we want some offset to make the numbers easier to read, we can use nudge_y=.

# it also works for bar chart
base2 <- flights %>%
  group_by(carrier) %>%
  summarise(delay=mean(dep_delay, na.rm=TRUE)) %>%
  ggplot() +
  geom_col(aes(x=carrier, y=delay, fill=carrier))

# use nudge_y=... to do some offset
base2 +
  geom_text(aes(x=carrier, y=delay, label=round(delay, 0)), nudge_y=0.5)
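
Note that nudge_y= is expressed in data units, so 0.5 here means 0.5 minutes of delay and would need retuning if the y scale changed. An alternative sketch uses vjust=, which shifts the text relative to its own height. The extra headroom added with expansion() is my own addition so that the top label has room.

```r
library(tidyverse)

# Mean departure delay per carrier for the January subset used in the article.
bar_delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  group_by(carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE))

p <- ggplot(bar_delay, aes(x = carrier, y = delay)) +
  geom_col(aes(fill = carrier)) +
  # A negative vjust places the text above the bar top.
  geom_text(aes(label = round(delay, 1)), vjust = -0.5) +
  # Leave 10% headroom above the tallest bar for its label.
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
p
```

For bars with negative values the label would sit inside the bar with this setting, so vjust would need to be flipped for those rows.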

  

3.2 Add Number with a small backboard

Just like geom_text(), there is another function called geom_label().

# add number with little backboard
base +
  geom_label(aes(x=date, y=delay, label=round(delay, 0)))

base2 +
  geom_label(aes(x=carrier, y=delay, label=round(delay, 0)), nudge_y=0.5)

  

3.3 Dummy variables

geom_text() and geom_label() also work with dummy variables. That means we can create a series in which most values are NA and only the special points have values.

With this method we can easily put emphasis on a special point without showing all the labels at once.

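len <- length(delay$delay)
# every label is NA except the last one, so only the last point gets a label
null_labels <- rep(NA, len-1)
# R indexes from 1, so the last value is delay$delay[len]
null_labels <- c(null_labels, round(delay$delay[len]))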

base +
  geom_label(aes(x=date, y=delay, label=null_labels))
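
Another way to label only a few points, which avoids building the NA vector by hand, is to pass a filtered data frame to the label layer through data=. The sketch below labels the worst day of each carrier; `peaks` is a name introduced here, and picking the maximum is only an example of a "special point".

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in the article.
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# Keep only the day with the highest mean delay for each carrier.
peaks <- delay %>%
  group_by(carrier) %>%
  slice_max(delay, n = 1, with_ties = FALSE) %>%
  ungroup()

# The line layer uses all rows; the label layer uses only the peaks.
p <- ggplot(delay, aes(x = date, y = delay, color = carrier)) +
  geom_line() +
  geom_label(data = peaks, aes(label = round(delay, 0)), show.legend = FALSE)
p
```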

  

4. Font size (and colors)

Font size and color can be changed with the theme() functions.

4.1 Font size can be changed all together

# use theme_*()
base +
  theme_grey(base_size = 17)
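
If every plot in a script should use the larger font, the default theme can be set once with theme_set() instead of adding theme_grey() to each plot. A minimal sketch; `old_theme` and `current_size` are names introduced here.

```r
library(ggplot2)

# Set the session-wide default theme; theme_set() returns the previous one.
old_theme <- theme_set(theme_grey(base_size = 17))

# Plots created from here on pick up the larger base font size.
current_size <- theme_get()$text$size

# Restore the previous default when done.
theme_set(old_theme)
```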

  

4.2 Or we can change font size individually

# use theme()
base +
  theme(axis.title.x=element_text(color="orange", size=17),
        axis.text.x=element_text(color="blue", size=17))
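
The legend text and the labels from section 3 can be sized in the same spirit. One detail worth knowing: sizes in theme() are given in points, while the size= of geom_text() is in millimetres. ggplot2 exports the constant .pt for converting between them. The sketch below assumes we want 12 pt legend text and 12 pt labels; the specific sizes are arbitrary.

```r
library(tidyverse)
library(lubridate)

# Rebuild the daily mean departure delay per carrier used in the article.
delay <- nycflights13::flights %>%
  filter(month == 1, carrier %in% c("AA", "UA", "DL")) %>%
  mutate(date = make_date(year, month, day)) %>%
  group_by(date, carrier) %>%
  summarise(delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

p <- ggplot(delay, aes(x = date, y = delay, color = carrier)) +
  geom_line() +
  # geom_text() sizes are in mm; dividing by .pt converts 12 pt to mm.
  geom_text(aes(label = round(delay, 0)), size = 12 / .pt,
            check_overlap = TRUE, show.legend = FALSE) +
  # theme() sizes are in points.
  theme(legend.title = element_text(size = 14),
        legend.text = element_text(size = 12))
p
```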

  

 

Source: https://www.cnblogs.com/drvongoosewing/p/14254687.html