Introduction

This page uses motion charts to visualize the daily growth, since January 2020, of US Covid-19 confirmed cases and deaths by state and by California county. Bubble size shows population, and bubble color shows median age, education or income.

The visualization tool is https://www.charte.ca/, a motion chart tool. We use it to visualize the growth over time of Covid-19 cases, using bubble size for population or population density and color for education, age, political leaning, income, etc.

Try out the motion charts in the Results section below. A slider at the bottom of each chart moves backward and forward in time, and hovering the mouse over a bubble shows more details.

Method

The Covid-19 statistics are from Johns Hopkins University (JHU). The raw data give confirmed cases and deaths by date for each county in each US state. For the US state analysis, the county data are aggregated into totals for each state.
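
The county-to-state aggregation described above can be sketched as follows. This is a minimal illustration, not the covid-us.pl script itself: the tuple layout, the function name `aggregate_by_state` and the sample rows are assumptions made for the example, and the numbers are made up rather than taken from the JHU data.

```python
from collections import defaultdict


def aggregate_by_state(county_rows):
    """Sum county-level counts into per-state, per-date totals.

    county_rows: iterable of (state, county, date, confirmed, deaths) tuples
    (a hypothetical layout chosen for this sketch).
    Returns {(state, date): (confirmed_total, deaths_total)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for state, _county, date, confirmed, deaths in county_rows:
        totals[(state, date)][0] += confirmed
        totals[(state, date)][1] += deaths
    return {key: tuple(value) for key, value in totals.items()}


# Made-up illustrative rows, not real JHU figures.
sample_rows = [
    ("CA", "Alameda", "2020-04-01", 10, 1),
    ("CA", "Marin", "2020-04-01", 5, 0),
    ("WA", "King", "2020-04-01", 20, 3),
]
state_totals = aggregate_by_state(sample_rows)
```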

For each state, we extracted various demographics, including the ISO two-character label, population, area, population density, education, income, median age and political leaning. The demographics were obtained from the following sources for the US.


A Perl script, covid-us.pl, was developed to gather the above information and cast it in a suitable form for the www.charte.ca motion charts and correlation data. See:

The script also ranks the age, income, and education demographics for each state or county into low, medium, or high based on their tertiles. This is so these demographics can be used with charte.ca's grouping feature.

In many charts the demographic (e.g. per capita income) is shown in three colors representing its tertile (red for low, yellow for medium and blue for high). The key is displayed above the chart.
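
As an illustration of the tertile ranking, here is a minimal sketch in Python. It is not the logic of covid-us.pl: it assumes a simple rank-based split (ties fall wherever the sort places them), and the function name, the `TERTILE_COLORS` mapping and the sample incomes are hypothetical.

```python
# Colors as described in the text: red = low, yellow = medium, blue = high.
TERTILE_COLORS = {"low": "red", "medium": "yellow", "high": "blue"}


def tertile_labels(values_by_name):
    """Label each name 'low', 'medium' or 'high' by the tertile of its value.

    Uses a rank-based split: the lowest third of the sorted names is 'low',
    the middle third 'medium', and the rest 'high'.
    """
    ranked = sorted(values_by_name, key=values_by_name.get)
    n = len(ranked)
    labels = {}
    for position, name in enumerate(ranked):
        if position < n / 3:
            labels[name] = "low"
        elif position < 2 * n / 3:
            labels[name] = "medium"
        else:
            labels[name] = "high"
    return labels


# Made-up per capita incomes for six hypothetical regions.
sample_income = {"A": 30000, "B": 45000, "C": 52000,
                 "D": 28000, "E": 61000, "F": 39000}
income_labels = tertile_labels(sample_income)
income_colors = {name: TERTILE_COLORS[label]
                 for name, label in income_labels.items()}
```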

Results

N.b. I am having problems with www.charte.ca once the number of Excel lines of data grows beyond about 2000. It often times out after several minutes and asks whether I wish to continue waiting. This may happen several times, and there appears to be no guarantee it will finally produce a result; instead it can lock up and require logging in to www.charte.ca again. This is why the data are sometimes divided into Jan-Apr and May-Jun, why we only use the data for every few (e.g. 5) days, and why we limit the number of countries in a chart to 10 or fewer.

United States

US COVID-19 cases

California

Covid-19 - Visualizing California data

Globally

Covid-19 - Visualizing Countries with various populations

Covid-19 - Globally

Visualizing Africa, Asia, Europe and South America

Covid-19 - Visualizing the Africa, Asia, Europe and S America data

Notes

Demographic correlations for California

We investigated the correlations between the demographics and the confirmed Covid-19 cases for California. We used the coefficient of determination (R squared) from a linear fit to characterize the degree of correlation between each pair of variables.


| Demographic | Income | Education | Cases | Politics | Age | Population density | Population |
|---|---|---|---|---|---|---|---|
| Per capita income | * | 0.81 | 0.0032 | 0.28 | 0.008 | 0.12 | 0.19 |
| % of people completing college education for CA | 0.81 | * | 0.016 | 0.43 | 0.009 | 0.20 | 0.043 |
| Covid-19 confirmed cases | 0.0032 | 0.016 | * | 0.34 | 0.012 | 0.14 | 0.77 |
| % of Registered voters who are registered as Democratic | 0.28 | 0.43 | 0.034 | * | 0.089 | 0.15 | 0.081 |
| Median age | 0.008 | 0.008 | 0.012 | 0.0089 | * | 0.0029 | 0.084 |
| Population density (people/sq mile) | 0.12 | 0.20 | 0.14 | 0.25 | 0.0029 | * | |
| Population | 0.19 | 0.043 | 0.77 | 0.081 | 0.084 | | * |

There is a strong correlation between per capita income and education and between confirmed cases and population, and a medium correlation between registered-voter political leaning and education. The Excel spreadsheet of the analysis of the above demographics and their correlations can be found here.
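
For reference, the R squared of a simple linear fit between two variables can be computed as the square of their Pearson correlation. The sketch below shows that calculation in plain Python. It is not the spreadsheet's implementation, the function name is hypothetical, and the sample numbers are made up.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line through (xs, ys).

    For a simple linear fit this equals the squared Pearson correlation,
    so the result is the same whichever variable is treated as x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Made-up illustrative values, not the California data.
perfect_fit = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
partial_fit = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
```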

Use of www.charte.ca

We noticed that with a large amount of data (e.g. over 2200 lines of Excel comma-separated value data covering, say, each day from January 22 through July 22 for more than 20 states) the building of a grouped motion chart would not complete. Thus we broke the data down by either reducing the number of states (e.g. by selecting states based on their population range) or reducing the number of days (e.g. by only including every 5th day).
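
A minimal sketch of this kind of data reduction is shown below. The row layout (dicts with `state`, `date` and `population` keys), the function name `thin_rows` and the sample values are assumptions for illustration only. Dates are assumed to be ISO strings so that they sort chronologically.

```python
from datetime import date, timedelta


def thin_rows(rows, day_step=5, population_range=None):
    """Keep every day_step-th date and, optionally, only states whose
    population lies within population_range = (low, high)."""
    dates = sorted({row["date"] for row in rows})
    kept_dates = set(dates[::day_step])
    kept = []
    for row in rows:
        if row["date"] not in kept_dates:
            continue
        if population_range is not None:
            low, high = population_range
            if not (low <= row["population"] <= high):
                continue
        kept.append(row)
    return kept


# Made-up sample: two hypothetical states, ten consecutive days.
start = date(2020, 1, 22)
sample = [
    {"state": state, "population": population,
     "date": (start + timedelta(days=offset)).isoformat()}
    for offset in range(10)
    for state, population in (("AA", 1_000_000), ("BB", 10_000_000))
]
thinned = thin_rows(sample, day_step=5,
                    population_range=(5_000_000, 20_000_000))
```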

The line graphs often use a logarithmic y-axis in order to spread the data out. Ideally, when the points are plotted on a log scale, the y-axis would be labeled with the original values, i.e. 1, 10, 100, etc., instead of 0, 1, 2, and the hover detail on the points would do the same. Right now, hovering over a point gives the log base 10 of the value rather than the value itself.
The problem is that I can't find a way to make the web-based charting app I am using (www.charte.ca) provide a log axis for line graphs (it does provide one for motion charts). The only workaround I have found is to take the log10 of the data before giving it to the chart. There also does not appear to be a place to ask questions of the www.charte.ca people. Changing over to a different charting or presentation package would require a non-trivial reordering of the data to fit that package. I am open to suggestions.
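
The workaround, and the labeling one would ideally want, can be sketched as follows. This is only an illustration: the function names are hypothetical and nothing here uses a www.charte.ca API. The first function is the pre-processing step; the second shows how a log10 axis position could be turned back into a readable label by a tool that allowed custom labels.

```python
import math


def log10_or_none(value):
    """Pre-transform a count for plotting; zero or negative counts have no log."""
    return math.log10(value) if value > 0 else None


def tick_label(log_value):
    """Convert a log10 axis position back into the original value for display."""
    return f"{10 ** log_value:,.0f}"


# Axis positions 0, 1, 2, 3 would be shown as 1, 10, 100, 1,000.
labels = [tick_label(position) for position in range(4)]
# A hovered point stored as log10(2500) would be displayed as 2,500.
hover_text = tick_label(log10_or_none(2500))
```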

Ungrouped

Deaths vs Confirmed:

  • We tend to use a log-log chart, which provides greater visibility of a wide range of data (compare the two charts below) and suits the data, since both confirmed cases and deaths behave exponentially for most states. Also note that when a logarithmic axis is plotted against time, a straight line means exponential growth, and the steeper the line, the faster the total number of confirmed coronavirus cases or deaths is doubling.

    Linear plot
    Log-log plot

  • The first confirmed cases were reported by WA on 1/22/20, IL a day later, and AZ and CA on 1/25/20.
  • The first deaths reported for Washington State were at the start of March.
  • Deaths start to really increase in the second week of March.
  • By the start of April, NY followed by NJ were leading the way in both deaths and confirmed cases.
  • By the end of the second week in April, SD and UT are noticeably below the general line followed by other states.
  • On 3/30/20 WV was the last state to record a Covid-19 death.
  • At the end of the second week in April, WY appears to be the last state to have greater than one Covid-19 death.
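
To make the link between steepness on a logarithmic scale and doubling concrete, here is a minimal sketch that estimates a doubling time from two cumulative counts, assuming exponential growth between them. The function name and the example numbers are hypothetical and are not taken from the charts.

```python
import math


def doubling_time_days(count_start, count_end, days_elapsed):
    """Estimate the doubling time in days, assuming exponential growth.

    The growth rate is the slope of ln(count) against time; the doubling
    time is ln(2) divided by that slope. Returns None when the counts do
    not describe growth.
    """
    if count_start <= 0 or count_end <= count_start or days_elapsed <= 0:
        return None
    growth_rate = math.log(count_end / count_start) / days_elapsed
    return math.log(2) / growth_rate


# Made-up example: a count rising from 100 to 800 in 9 days is three
# doublings, so the doubling time is about 3 days.
example = doubling_time_days(100, 800, 9)
```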

Deaths/Million Population vs Confirmed cases/Million Population


  • The leading states in terms of deaths per million population are: NY, NJ, CT, MA, LA, MI, DE, DC, RI.
    The leading states in terms of confirmed cases per million population are: NY, NJ, MA, DE, CT, RI, LA, DC, MI.

    If one does not normalize by population, NY and NJ stand out, followed by MA and MI at the top of the remaining bunch.

    Looking at the trailing states on a log-log plot, the lowest deaths are for AK, SD, HI, MT, WY, ND. The lowest confirmed cases are for AK, SD, HI, MT, WY, ND.


  • Note that since it is a log-log scale no bubble appears for a state until there is at least 1 confirmed case and 1 death for the state.
  • % Confirmed and deaths both low for AK, VT, NH, ID
  • Cluster of DE, DC and RI with low deaths compared to the % confirmed cases
  • NY, NJ, MA, DE, CT, LA, RI, DC have the highest % confirmed cases.
  • By March 14th, WA, NY, CA, FL were reporting deaths.
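
The per-million normalization used in this section amounts to the small calculation below. This is a sketch only: the function name and the sample figures are hypothetical and are not the JHU or census values.

```python
def per_million(count, population):
    """Scale a raw count to a rate per million population."""
    return count * 1_000_000 / population


# Made-up example: 50 deaths in a population of 2 million is 25 per million.
deaths_per_million = per_million(50, 2_000_000)
confirmed_per_million = per_million(1_200, 2_000_000)
```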

Grouped data

If we color the bubbles by each state's political leaning, we get the chart below. It appears that Covid-19 is impacting Democratic-leaning states the hardest, followed by the swing states.


We can also group the data by age, income or education tertiles.

Grouped by Income
Grouped by Age
Grouped by Education (Bachelor degree or Equivalent)