The Charts of W.E.B. Du Bois - Part 2

Intro
Continuing in my series on the data visualizations of W.E.B. Du Bois (you can read my first post here), the next chart in the series is titled “Comparative rate of increase of the White and Black elements of the population of the United States.”
This one was quite a bit harder than my last post, as the data for the prior post was clearly labeled on the chart. For this one, I had to find the data first, which fortunately was available on Wikipedia. This presented a great opportunity to use the rvest package to scrape the data and turn it into an R data frame to create the graph.
A quick note on web scraping with R: I highly recommend reading this article, which provides an excellent overview for getting started.
Here is the code to grab the data from Wikipedia and some manipulation to get it into a single data frame:
# Load Required Libraries
##############################################################################
library('rvest')
library('dplyr')
library('stringr')
library('ggplot2')
library('reshape2')
library('scales')
library('grid')
library('tidyr')
# Wikipedia data source
url <- 'https://en.wikipedia.org/wiki/Historical_racial_and_ethnic_demographics_of_the_United_States'
# Read HTML code from the URL
webpage <- read_html(url)
# Using CSS selectors to scrape the table
tbls <- html_nodes(webpage,'table')
# Pull data from 1760-1840
##############################################################################
w0 <- html_table(tbls[grep("1760 and 1840", tbls, ignore.case = TRUE)], fill = TRUE)[[1]] %>%
  filter(`Race/Ethnic Group` %in% c("White", "Black (also called Negro)")) %>%
  select(-`1760`, -`1770`) %>%
  rename('ethnicity' = `Race/Ethnic Group`) %>%
  mutate(ethnicity = str_replace(ethnicity, "Black [(]also called Negro[)]", "Black"))
# Pull data from 1850-1920
##############################################################################
w1 <- html_table(tbls[grep("1850 and 1920", tbls, ignore.case = TRUE)], fill = TRUE)[[1]] %>%
  filter(`Race/Ethnic Group` %in% c("White", "Black")) %>%
  select(-`1900`, -`1910`, -`1920`) %>%
  rename('ethnicity' = `Race/Ethnic Group`)
# Combine data into single frame, years 1790-1890
##############################################################################
w2 <- inner_join(w0,w1,by='ethnicity')
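The joined frame is still in wide format, with one column per census year, and scraped counts usually arrive as text with thousands separators. Here is a minimal sketch of one way to reshape a frame like this and compute the decade-over-decade percent increase. The toy values, the `wide` and `long` names, and the `pct` column are illustrative assumptions, not the actual census figures or necessarily the code behind the final chart.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the joined wide frame (values are made up for illustration)
wide <- data.frame(
  ethnicity = c("White", "Black"),
  `1790` = c("1,000,000", "200,000"),
  `1800` = c("1,500,000", "250,000"),
  `1810` = c("1,800,000", "300,000"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

long <- wide %>%
  # One row per ethnicity and census year
  pivot_longer(-ethnicity, names_to = "year", values_to = "population") %>%
  mutate(
    year = as.integer(year),
    # Strip thousands separators before converting to numeric
    population = as.numeric(gsub(",", "", population))
  ) %>%
  group_by(ethnicity) %>%
  arrange(year, .by_group = TRUE) %>%
  # Percent increase over the previous census; NA for the first year
  mutate(pct = population / lag(population) - 1) %>%
  ungroup()
```

A long frame like this is the shape ggplot2 expects, with one line per `ethnicity` and `pct` available for the data labels.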
Fast Facts
As mentioned in my last post, I would also like to highlight some of Dr. Du Bois’s major accomplishments outside of his data visualizations. This one comes from the NAACP:
“In 1909, Du Bois was among the founders of the National Association for the Advancement of Colored People (NAACP) and from 1910 to 1934 served it as director of publicity and research, a member of the board of directors, and founder and editor of The Crisis, its monthly magazine.”
The Original Chart
Another fantastic hand-drawn product! I am especially impressed by the use of millions on the Y axis, while the data labels display the percent increase in the population over time. This can be tricky to accomplish; I was able to use
geom_text(label = scales::percent(w4$pct))
for the percent labels and
scale_y_continuous(labels = as.numeric(seq(5, 50, by = 5)),
                   breaks = as.numeric(seq(5000000, 50000000, by = 5000000)))
to use millions on the Y axis, rounded and scaled to match the original chart.
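To see how those two pieces fit together, here is a small self-contained sketch. The `toy` data frame and its values are made up for illustration, and the explicit `limits` argument is my own addition so that every break falls inside the plotted range; the final chart may be set up differently.

```r
library(ggplot2)
library(scales)

# Toy data (made up for illustration): raw counts plus percent increase
toy <- data.frame(
  year = c(1790, 1800, 1810),
  population = c(5000000, 10000000, 15000000),
  pct = c(NA, 1.0, 0.5)
)

p <- ggplot(toy, aes(x = year, y = population)) +
  geom_line() +
  # Percent labels; the first point has no prior decade, so it is left blank
  geom_text(aes(label = ifelse(is.na(pct), "", percent(pct))), vjust = -1) +
  # Breaks are in raw counts, while the labels show the same values in millions
  scale_y_continuous(
    breaks = seq(5000000, 50000000, by = 5000000),
    labels = seq(5, 50, by = 5),
    limits = c(0, 50000000)
  )
```

The key point is that `breaks` and `labels` must have the same length: the breaks position the ticks on the raw scale, and the labels only change what is printed next to them.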