Categorical data can be nominal (qualitative) or ordinal.

For visualization, the main difference is that ordinal data suggests a particular display order.

Purely categorical data can come in a variety of formats. The most common are

raw data: individual observations
aggregated data: counts for each unique combination of levels
cross-tabulated data

Raw Data

Raw data for a survey that records the hair color, eye color, and gender of 592 individuals can look like this:

head(raw)
##    Hair   Eye    Sex
## 1 Brown  Blue Female
## 2 Black Hazel Female
## 3   Red  Blue Female
## 4 Brown Hazel Female
## 5   Red Green   Male
## 6 Blond Brown   Male

Aggregated Data

One way to aggregate raw categorical data is to use count from dplyr:

library(dplyr)
agg <- count(raw, Hair, Eye, Sex)
head(agg)
##    Hair   Eye    Sex  n
## 1 Black Brown   Male 32
## 2 Black Brown Female 36
## 3 Black  Blue   Male 11
## 4 Black  Blue Female  9
## 5 Black Hazel   Male 10
## 6 Black Hazel Female  5

The count_ function from dplyr allows the variables to use to be read from the data:

# count_() is deprecated in current dplyr releases
agg <- count_(raw, names(raw))
head(agg)
##    Hair   Eye    Sex  n
## 1 Black Brown   Male 32
## 2 Black Brown Female 36
## 3 Black  Blue   Male 11
## 4 Black  Blue Female  9
## 5 Black Hazel   Male 10
## 6 Black Hazel Female  5

Cross-Tabulated Data

Cross-tabulated data can be created from aggregate data using xtabs:

xtabs(n ~ Hair + Eye + Sex, data = agg)
## , , Sex = Male
##
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
##
## , , Sex = Female
##
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8

Cross-tabulated data can be produced from raw data using table:

xtb <- table(raw)

Both raw and aggregate data in this example are in tidy form; the cross-tabulated data is not.

Cross-tabulated data on p variables is arranged in a p-way array.
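As a small illustration of the array structure, the sketch below uses the built-in HairEyeColor table from the datasets package as a stand-in for xtb (an assumption made only to keep the example self-contained). The name hair_totals is introduced here for illustration.

```r
# HairEyeColor ships with R's datasets package; used here as a stand-in for xtb
xtb <- HairEyeColor

dim(xtb)              # one extent per variable: Hair, Eye, Sex
names(dimnames(xtb))  # the variable names label the dimensions

# A single cell is addressed by giving one level per variable
xtb["Blond", "Blue", "Female"]

# Summing over the other dimensions gives the counts for one variable
hair_totals <- margin.table(xtb, 1)
hair_totals
```

With three variables the array has three dimensions, so any one count needs three indices.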

The cross-tabulated data can be converted to the tidy aggregate form using as.data.frame:

class(xtb)
## [1] "table"
head(as.data.frame(xtb))
##    Hair   Eye  Sex Freq
## 1 Black Brown Male   32
## 2 Brown Brown Male   53
## 3   Red Brown Male   10
## 4 Blond Brown Male    3
## 5 Black  Blue Male   11
## 6 Brown  Blue Male   50

The variable xtb corresponds to the data set HairEyeColor in the datasets package.
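Because the cross-tabulation matches HairEyeColor, all three formats can be rebuilt from that data set. The following is a minimal sketch, not necessarily how the raw data in these notes was produced; the names agg_df and raw_df are hypothetical, and the row order of raw_df will differ from the listing shown earlier.

```r
# Tidy aggregate form: one row per combination of levels, counts in Freq
agg_df <- as.data.frame(HairEyeColor)

# Raw form: repeat each combination once per individual it represents
raw_df <- agg_df[rep(seq_len(nrow(agg_df)), times = agg_df$Freq),
                 c("Hair", "Eye", "Sex")]
rownames(raw_df) <- NULL

nrow(raw_df)  # one row per surveyed individual

# Round trip: cross-tabulating the raw rows recovers the original table
all(table(raw_df) == HairEyeColor)
```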

Working with Categorical Variables

Categorical variables are usually represented as:

character vectors
factors

Some benefits of factors:

more control over the ordering of levels
levels are preserved when forming subsets

Most plotting and modeling functions will convert character vectors to factors with levels ordered alphabetically.

Some standard R functions for working with factors include

factor creates a factor from another type of variable
levels returns the levels of a factor
reorder changes the level order to match another variable
relevel moves a particular level to the first position as a base line
droplevels removes levels not in the variable
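A short base R sketch of several of these functions on a made-up vector of hair colors (the vector and the variable names are illustrative only):

```r
hair <- c("Brown", "Black", "Red", "Brown", "Blond", "Brown")

# Default conversion orders the levels alphabetically
f_alpha <- factor(hair)
levels(f_alpha)

# An explicit levels argument controls the order
f_ord <- factor(hair, levels = c("Black", "Brown", "Red", "Blond"))
levels(f_ord)

# relevel moves one level to the first position
f_red_first <- relevel(f_ord, ref = "Red")
levels(f_red_first)

# Subsetting keeps all levels, even those no longer present ...
f_sub <- f_ord[f_ord != "Blond"]
levels(f_sub)

# ... until droplevels removes the unused ones
levels(droplevels(f_sub))
```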

The tidyverse package forcats adds some more tools, including

fct_inorder creates a factor with levels ordered by first appearance
fct_infreq orders levels by decreasing frequency
fct_rev reverses the levels
fct_recode changes factor levels
fct_relevel moves one or more levels
fct_c merges two or more factors
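A brief sketch of three of these helpers, assuming the forcats package is installed. The input vector is made up, with distinct counts per level so that the frequency ordering has no ties.

```r
library(forcats)

# Brown appears 3 times, Red twice, Black once
hair <- factor(c("Brown", "Black", "Red", "Brown", "Red", "Brown"))
levels(hair)  # alphabetical by default

# Order of first appearance in the data
by_appearance <- fct_inorder(hair)
levels(by_appearance)

# Most frequent level first
by_frequency <- fct_infreq(hair)
levels(by_frequency)

# Reverse the current level order
reversed <- fct_rev(hair)
levels(reversed)
```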

Bar Charts for Frequencies


The bar graph is often used to present the frequencies of a categorical variable.

By default, geom_bar uses stat = "count" and maps its result to the y aesthetic. This is suitable for raw data:

library(ggplot2)
ggplot(raw) + geom_bar(aes(x = Hair))


For a nominal variable it is often better to order the bars by decreasing frequency:

library(forcats)
ggplot(mutate(raw, Hair = fct_infreq(Hair))) + geom_bar(aes(x = Hair))


If the data have already been aggregated, then you have to specify stat = "identity" as well as the variable containing the counts as the y aesthetic:

ggplot(agg) + geom_bar(aes(x = Hair, y = n), stat = "identity")


An alternative is to use geom_col.
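To check that the raw and aggregated routes give the same bar heights, the sketch below rebuilds both forms from HairEyeColor and compares them. The names agg_df, raw_df, p_raw, and p_agg are hypothetical, and layer_data is used only to inspect what ggplot2 computed.

```r
library(ggplot2)

# Aggregated form, with the count column renamed to n as in these notes
agg_df <- as.data.frame(HairEyeColor)
names(agg_df)[names(agg_df) == "Freq"] <- "n"

# Raw form: one row per individual
raw_df <- agg_df[rep(seq_len(nrow(agg_df)), times = agg_df$n),
                 c("Hair", "Eye", "Sex")]

p_raw <- ggplot(raw_df) + geom_bar(aes(x = Hair))          # counts rows itself
p_agg <- ggplot(agg_df) + geom_col(aes(x = Hair, y = n))   # uses the given counts

# Bar heights computed by stat = "count" on the raw data
raw_heights <- layer_data(p_raw)$count

# Totals per hair color in the aggregated data
agg_heights <- as.vector(tapply(agg_df$n, agg_df$Hair, sum))

all(raw_heights == agg_heights)
```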

For aggregated data, reordering can be based on the computed counts using reorder:

agg_ord <- mutate(agg, Hair = reorder(Hair, -n, sum))

-n is used to order largest to smallest;
the default summary used by reorder is mean; sum is better here.

ggplot(agg_ord) + geom_col(aes(x = Hair, y = n))
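The effect of reorder on the level order can be inspected directly. This base R sketch rebuilds the aggregated data from HairEyeColor (an assumption to keep it self-contained); agg_df and hair_ord are illustrative names.

```r
agg_df <- as.data.frame(HairEyeColor)
names(agg_df)[names(agg_df) == "Freq"] <- "n"

# Total count per hair color
tapply(agg_df$n, agg_df$Hair, sum)

# Levels sorted by increasing sum of -n, i.e. by decreasing total count
hair_ord <- reorder(agg_df$Hair, -agg_df$n, sum)
levels(hair_ord)  # "Brown" "Blond" "Black" "Red"
```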


Adding a Grouping Variable

Mapping the Eye variable to fill in ggplot produces a stacked bar chart.

An alternative, specified with position = "dodge", is a side by side bar chart, or a clustered bar chart.

For the side by side chart in particular it might be useful to also reorder the Eye color levels.
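The plotting code for these two charts is not shown in these notes. A minimal sketch of how they might be produced follows; the object names p_stack and p_dodge and the helper data frame agg_he are hypothetical. For the dodged version the counts are first summed over Sex so that each hair and eye combination gets exactly one bar.

```r
library(ggplot2)

agg_df <- as.data.frame(HairEyeColor)
names(agg_df)[names(agg_df) == "Freq"] <- "n"

# Stacked bars: eye color segments are stacked within each hair color
p_stack <- ggplot(agg_df) +
  geom_col(aes(x = Hair, y = n, fill = Eye))

# Collapse over Sex: one row per Hair/Eye combination
agg_he <- aggregate(n ~ Hair + Eye, data = agg_df, FUN = sum)

# Side by side bars: eye colors placed next to each other within each hair color
p_dodge <- ggplot(agg_he) +
  geom_col(aes(x = Hair, y = n, fill = Eye), position = "dodge")
```

Printing p_stack or p_dodge draws the chart. Because agg_df still contains Sex, p_stack can be faceted by that variable.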



Faceting can be used to bring in additional variables:

p1 + facet_wrap(~ Sex)


The counts shown here may not be the most relevant features for understanding the joint distributions of these variables.

Pie Charts and Doughnut Charts

Pie charts can be viewed as stacked bar charts in polar coordinates:



The axes and grid lines are not useful for the pie chart and can be removed with some theme settings.
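One way to sketch this, assuming the aggregated data is rebuilt from HairEyeColor: a single stacked bar with position = "fill" is drawn, then bent into a circle with coord_polar. The names pie and pie_clean are hypothetical, and the particular theme elements blanked here are one reasonable choice rather than the settings used in the original figures.

```r
library(ggplot2)

agg_df <- as.data.frame(HairEyeColor)
names(agg_df)[names(agg_df) == "Freq"] <- "n"

# One stacked bar at x = 1, scaled to proportions, wrapped around the y direction
pie <- ggplot(agg_df) +
  geom_col(aes(x = 1, y = n, fill = Hair), position = "fill") +
  coord_polar(theta = "y")

# Remove axis titles, labels, ticks, and grid lines
pie_clean <- pie +
  theme(axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())
```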

Using faceting we can also separately show the distributions for men and women:



Doughnut charts are a variant that has recently become popular in the media:
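The definition of the doughnut plot is not shown in these notes. A common way to get one, sketched here under that caveat, is to widen the x axis limits so that the stacked bar no longer reaches the center of the polar plot. The name donut and the limits 0 to 1.5 are assumptions.

```r
library(ggplot2)

agg_df <- as.data.frame(HairEyeColor)
names(agg_df)[names(agg_df) == "Freq"] <- "n"

donut <- ggplot(agg_df) +
  geom_col(aes(x = 1, y = n, fill = Hair), position = "fill") +
  coord_polar(theta = "y") +
  xlim(0, 1.5) +        # empty range below the bar becomes the hole
  facet_wrap(~ Sex) +   # one doughnut per sex
  theme_void()          # drop axes and grid lines
```

The bar drawn by geom_col at x = 1 spans roughly 0.55 to 1.45 with the default width, so it fits inside these limits and x = 0 is left free for a label.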



The center is frequently used for annotation:

p4 + geom_text(aes(x = 0, y = 0, label = Sex)) + theme(strip.background=element_blank(), strip.text=element_blank())


Some Notes

Pie charts are effective for judging part/whole relationships.
Pie charts are not very effective for comparing proportions.


Stacked bar charts with equal heights are an alternative for representing part-whole relationships:

ggplot(agg) +
    geom_col(aes(x = Sex, y = n, fill = Hair), position = "fill") +
    scale_fill_manual(values = hcols)
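The palette hcols is not defined in what is shown here. The sketch below supplies a hypothetical one (any named vector of valid colors with one entry per hair level would do) and also computes the proportions that the equal-height bars encode; agg_df, p_fill, and hair_by_sex are illustrative names.

```r
library(ggplot2)

agg_df <- as.data.frame(HairEyeColor)
names(agg_df)[names(agg_df) == "Freq"] <- "n"

# Assumed palette: names must match the levels of Hair
hcols <- c(Black = "black", Brown = "brown4",
           Red = "brown1", Blond = "lightgoldenrod1")

p_fill <- ggplot(agg_df) +
  geom_col(aes(x = Sex, y = n, fill = Hair), position = "fill") +
  scale_fill_manual(values = hcols)

# Proportion of each hair color within each sex; each column sums to 1
hair_by_sex <- prop.table(xtabs(n ~ Hair + Sex, data = agg_df), margin = 2)
round(hair_by_sex, 2)
```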


Another alternative is a waffle chart, sometimes also called a square pie chart.
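Dedicated packages exist for waffle charts, but a rough version can be sketched with geom_tile alone. The layout below is an assumption chosen for convenience: one square per individual, filled column by column in a grid 16 squares tall, which the 592 observations fill exactly. The names cells and waffle are hypothetical.

```r
library(ggplot2)

agg_df <- as.data.frame(HairEyeColor)
hair_totals <- tapply(agg_df$Freq, agg_df$Hair, sum)

# One row per square: each hair color repeated once per individual
cells <- data.frame(
  Hair = factor(rep(names(hair_totals), times = as.vector(hair_totals)),
                levels = names(hair_totals))
)

# Grid position of each square: fill a column of 16, then move right
idx <- seq_len(nrow(cells)) - 1
cells$row <- idx %% 16
cells$col <- idx %/% 16

waffle <- ggplot(cells) +
  geom_tile(aes(x = col, y = row, fill = Hair), color = "white") +
  coord_equal() +   # keep the tiles square
  theme_void()
```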