Introduction

This page uses motion charts to visualize the daily growth, since January 2020, of US Covid-19 confirmed cases and deaths, by state and by California county. Bubble size shows population, and bubble color shows median age, education, or income.

...

In particular, try out the motion charts in the Results section below. There is a slider at the bottom of each chart to move backward and forward in time, and one can move the mouse over a bubble to find more details.

Method

The Covid-19 statistics are from the Johns Hopkins University (JHU). There are raw data for confirmed cases and deaths by date for each county in each US state. For the US state analysis, the data from the counties are aggregated into the values for each state.
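
The county-to-state aggregation described above can be sketched as follows. This is a minimal illustration in Python, not the covid-us.pl script itself; the function name, the tuple layout, and the sample rows are assumptions made for the example, and the numbers are made up rather than taken from the JHU data.

```python
from collections import defaultdict


def aggregate_by_state(county_rows):
    """Sum county-level counts into per-state, per-date totals.

    county_rows: iterable of (state, county, date, confirmed, deaths) tuples
    (a hypothetical layout chosen for this sketch).
    Returns a dict mapping (state, date) to (confirmed, deaths).
    """
    totals = defaultdict(lambda: [0, 0])
    for state, _county, date, confirmed, deaths in county_rows:
        totals[(state, date)][0] += confirmed
        totals[(state, date)][1] += deaths
    return {key: tuple(value) for key, value in totals.items()}


# Made-up illustrative rows, not real JHU figures.
sample_rows = [
    ("CA", "Alameda", "2020-04-01", 10, 1),
    ("CA", "Kern", "2020-04-01", 5, 0),
    ("NY", "Kings", "2020-04-01", 20, 2),
]
state_totals = aggregate_by_state(sample_rows)
```

Keying the totals on (state, date) keeps one value per state per day, which is the shape a motion chart needs.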

For each US state or California (CA) county, we extracted a label (the ISO 2-character code for a US state, or an abbreviation for a CA county) together with demographics: population, area, population density, education, income, median age, and political leaning. For the US, the demographics were obtained from the following sources.

For CA they were obtained from here:


A Perl script, covid-us.pl, was developed to gather the above information and cast it in a suitable form for the www.charte.ca motion charts and correlation data. See:

...

The script also ranks the age, income, and education demographics for each state or county into low, medium, or high based on their tertiles. This is so these demographics can be used with charte.ca's grouping feature.

In many cases we divide the demographic (e.g. per capita income) into 3 colors representing the tertile (red for low, yellow for medium and blue for high). The key is displayed above the chart.
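
As an illustration of the tertile ranking and coloring described above, here is a small Python sketch. It is an assumption about how such a ranking could be done, not a copy of covid-us.pl: it splits by rank order (so ties are broken arbitrarily), and the names `tertile_labels` and `TERTILE_COLORS` and the sample incomes are hypothetical.

```python
def tertile_labels(values):
    """Label each key 'low', 'medium' or 'high' by the rank of its value.

    values: dict mapping a state or county label to a demographic value.
    The lowest third of ranks is 'low', the middle third 'medium',
    and the rest 'high'.
    """
    ranked = sorted(values, key=values.get)
    n = len(ranked)
    labels = {}
    for rank, key in enumerate(ranked):
        if rank < n / 3:
            labels[key] = "low"
        elif rank < 2 * n / 3:
            labels[key] = "medium"
        else:
            labels[key] = "high"
    return labels


# Color key as described in the text: red low, yellow medium, blue high.
TERTILE_COLORS = {"low": "red", "medium": "yellow", "high": "blue"}

# Made-up per capita incomes for six hypothetical counties.
incomes = {"A": 21000, "B": 35000, "C": 48000, "D": 27000, "E": 61000, "F": 40000}
labels = tertile_labels(incomes)
colors = {key: TERTILE_COLORS[label] for key, label in labels.items()}
```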

Results

N.b. I am having problems with www.charte.ca once the number of Excel lines of data goes beyond roughly 2000 to 2500. It often times out after several minutes and asks whether I wish to continue to wait. This may happen several times, and there appears to be no guarantee that it will finally produce a result; instead it can lock up and require logging in to www.charte.ca again. This is why the data is sometimes divided into Jan-Apr and May-Jun, why we only look at the data for every few (e.g. 5) days, and why we reduce the number of countries in a chart to <=10.

United States

Include Page
US

...

COVID-19 cases

...

US States Motion Charts of Covid-19 deaths vs confirmed cases

Cumulative:

Daily:

Observations

  • Looking at total deaths vs cases, New York (NY), Delaware (DE), Massachusetts (MA), Rhode Island (RI) and Connecticut (CT) lead the pack, followed by Maryland (MD), Illinois (IL) and more recently Louisiana (LA), but other states are catching up.
  • The situation is similar when one normalizes the states by their populations.
  • The new daily confirmed cases/Million state population and deaths/Million state population are very variable from day to day.
    • Arizona's increased confirmed cases pulled it away from the pack starting May 14, 2020, and Florida pulled away starting May 24, 2020.
    • The values of new confirmed cases/day/Million state population appear to decrease as one moves forward in time beyond May 23, but then increase again (driven by Arizona) starting June 14, 2020. Maybe this is part of the "opening".
  • Comparing the above with the same data not normalized by state population:
    • At the start of May there is a noticeable correlation between deaths and confirmed cases; by the end of June the correlation is much weaker.
    • At the start of May the deaths and confirmed cases are dominated by New York (NY), New Jersey (NJ), Illinois (IL), Massachusetts (MA), California (CA) and Maryland (MD). By the end of June California (CA) is still a leader, joined by Texas (TX), Florida (FL), Arizona (AZ), and Georgia (GA).
  • Looking at the cases with bubbles sized by population density:
    • Some states with high population densities such as New York (NY), New Jersey (NJ) and Massachusetts (MA) have high numbers of confirmed cases, while others such as Washington DC, Rhode Island (RI) and Delaware (DE) have fairly low numbers of confirmed cases.
  • In general, the ratio of deaths to confirmed cases appears to be falling with time, especially as we get into June. This may be a mixture of increased testing finding more cases and more effective treatment.
    • Exceptions that are still increasing include: NJ, MA, PA, CT, SD, NH, IL.
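
The quantities used in the observations above (new daily counts, values per million population, and deaths per confirmed case) can be derived from the cumulative series roughly as follows. This is a hedged sketch: the function names and the sample numbers are invented for illustration and are not taken from the page's data or scripts.

```python
def daily_new(cumulative):
    """Turn a cumulative series into daily increments (first day kept as-is)."""
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def per_million(counts, population):
    """Normalize counts to a rate per million inhabitants."""
    return [count * 1_000_000 / population for count in counts]


def deaths_per_confirmed(deaths, confirmed):
    """Ratio of deaths to confirmed cases, or None when there are no cases."""
    return deaths / confirmed if confirmed else None


# Made-up cumulative confirmed cases for a state of 5 million people.
cumulative_confirmed = [100, 150, 225]
new_cases = daily_new(cumulative_confirmed)
new_cases_per_million = per_million(new_cases, 5_000_000)
ratio = deaths_per_confirmed(9, 225)
```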

California

California Counties Motion Charts of Covid-19 deaths vs confirmed cases

bubbles sized by population density for CA Counties and colored by per capita income (Jan 25, 2020 - Apr 30, 2020)

cases per million population with bubbles sized by CA county population and colored by per capita income (May 1, 2020 - Jun 25, 2020) 

...

  • The leading counties are Lake (LAK), Nevada (NEV), Mono (MNO) with Imperial (IMP) catching up since the end of May. 
  • These are followed by LA County (LA), Orange County (OR), Kings County (KIN), Tuolumne (TUO) and Riverside (RIV).

...

  • The leading counties are LA County (LA), Orange County (OR), Lake County (LAK), Nevada County (NEV), Riverside (RIV), San Diego County (SD) and San Bernardino County (SBD).

...

US COVID-19 cases

California

Include Page
Covid-19 - Visualizing California data

Globally

Include Page
Covid-19 - Visualizing Countries with various populations

Include Page
Covid-19 - Globally

...

Visualizing Africa, Asia, Europe and South America



Include Page
Covid-19 - Visualizing the Africa, Asia, Europe and S America data

Notes

Demographic correlations for

...

California

We investigated the correlations between the demographics and the confirmed Covid-19 cases for

...

California. We used the R squared coefficient of determination to characterize the degree of correlation between the various demographics using a linear fit.


Demographic                                             | Income | Education | Cases  | Politics | Age    | Population density | Population
Per capita income                                       | *      | 0.81      | 0.0032 | 0.28     | 0.008  | 0.12               | 0.19
% of people completing college education for CA         | 0.81   | *         | 0.016  | 0.43     | 0.009  | 0.20               | 0.043
Covid-19 confirmed cases                                | 0.0032 | 0.016     | *      | 0.34     | 0.012  | 0.14               | 0.77
% of Registered voters who are registered as Democratic | 0.28   | 0.43      | 0.034  | *        | 0.089  | 0.15               | 0.081
Median age                                              | 0.008  | 0.008     | 0.012  | 0.0089   | *      | 0.0029             | 0.084
Population density (people/sq mile)                     | 0.12   | 0.20      | 0.14   | 0.25     | 0.0029 | *                  |
Population                                              | 0.19   | 0.043     | 0.77   | 0.081    | 0.084  |                    | *

It is seen that there is a strong correlation between per capita income and education and between confirmed cases and population, and a medium correlation between the political leaning of registered voters and education. The Excel spreadsheet with the analysis of the above demographics and their correlations can be found here.
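
For readers who want to reproduce this kind of figure outside Excel, the R squared of a straight-line fit between two demographics can be computed as the squared Pearson correlation. The sketch below uses only the Python standard library; the function name and the sample values are assumptions for illustration and are unrelated to the spreadsheet above.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination for a least-squares straight-line fit.

    For a one-variable linear fit this equals the squared Pearson
    correlation, so r_squared(xs, ys) == r_squared(ys, xs).
    Returns NaN when either series has no variation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return math.nan
    return (sxy * sxy) / (sxx * syy)


# Made-up values: a perfectly linear pair and a weakly related pair.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
weak = r_squared([1, 2, 3, 4], [1, -1, 1, -1])
```

Because the measure is symmetric in its two inputs, each pair of demographics only needs to be computed once.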

Use of www.charte.ca

We noticed that with a large amount of data (e.g. over 2200 lines of Excel comma-separated value data, covering say each day from January 22 through July 22 for more than 20 states) the building of a grouped motion chart would not complete. We therefore reduced the data either by reducing the number of states (e.g. by selecting states based on their population range) or by reducing the number of days (e.g. by only including every 5th day).
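
Both ways of shrinking the data can be sketched in a few lines of Python. This is only an illustration under assumed names and an assumed row layout of (date, state, value); it is not the code used to prepare the charts.

```python
from datetime import date, timedelta


def every_nth_day(rows, start, step=5):
    """Keep rows whose date falls a multiple of `step` days after `start`."""
    return [row for row in rows if (row[0] - start).days % step == 0]


def limit_states(rows, keep):
    """Keep only rows whose state label is in the `keep` set."""
    return [row for row in rows if row[1] in keep]


# Made-up data: 20 consecutive days for three states (60 rows).
start = date(2020, 1, 22)
rows = [
    (start + timedelta(days=offset), state, offset)
    for offset in range(20)
    for state in ("CA", "NY", "TX")
]
thinned = every_nth_day(rows, start, step=5)  # days 0, 5, 10, 15
subset = limit_states(thinned, {"CA", "NY"})
```

Counting `len(subset)` before uploading gives a quick check that the result stays under the line count at which the chart build stops completing.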

The line graphs often use a logarithmic y-axis in order to spread the data out. Ideally, when the points are plotted in a log scale the y-axis would be

...

labeled linearly. In other words, the plots would be labeled 1, 10, 100, etc., instead of 1, 2, 3, and the same would apply to the hover detail on the points. Right now, when one hovers over a point it gives the log base 10 of the value rather than the value.
The problem is that I can't find a way for the web-based charting app I am using (www.charte.ca) to provide a log axis for line graphs (www.charte.ca does provide such a feature for the motion charts). The only workaround I have found is to take the log10 of the data before giving it to the chart. There also does not appear to be a place to ask questions of the www.charte.ca people. Changing over to a different charting/presentation package would require a non-trivial reordering of the data to fit that package. I am open to suggestions.
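
The log10 workaround, and the labeling one would ideally want on top of it, can be illustrated as follows. This sketch is independent of www.charte.ca (it makes no calls to that service, whose interface is not documented here); the function names are hypothetical and it only shows the arithmetic.

```python
import math


def to_log10(values):
    """Take log10 of positive values; zero or negative values become None."""
    return [math.log10(v) if v > 0 else None for v in values]


def decade_tick_labels(log_values):
    """Map integer tick positions on the log10 axis to the linear labels."""
    present = [v for v in log_values if v is not None]
    if not present:
        return {}
    lowest, highest = math.floor(min(present)), math.ceil(max(present))
    return {k: f"{10 ** k:g}" for k in range(lowest, highest + 1)}


def hover_value(log_value):
    """Recover the original value from a plotted log10 value."""
    return 10 ** log_value


# Made-up counts, including a zero that cannot be placed on a log axis.
logged = to_log10([1, 10, 1000, 0])
ticks = decade_tick_labels(logged)
```

A charting package with a true log axis does this mapping itself, so the sketch mainly shows what the tick and hover labels would read.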

Ungrouped

Deaths vs Confirmed:

  • We tend to use a log-log chart, which gives greater visibility of a wide range of data (compare the two charts below) and suits the data, since both the confirmed cases and the deaths are exponential in their behavior for most states. Also note that with a logarithmic scale a straight line means exponential growth, and the steeper the line, the faster the total number of confirmed coronavirus cases or deaths is doubling.

    Linear plot
    Log Log plot

  • The first confirmed cases were reported by WA on 1/22/20, by IL a day later, and by AZ and CA on 1/25/20.
  • The first deaths reported for Washington State were at the start of March.
  • Deaths start to increase markedly in the second week of March.
  • By the start of April, NY, followed by NJ, was leading in both deaths and confirmed cases.
  • By the end of the second week of April, SD and UT are noticeably below the general line, followed by other states.
  • On 3/30/20, WV was the last state to record a Covid-19 death.
  • At the end of the second week of April, WY appears to be the last state to have more than one Covid-19 death.
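
The remark in the first bullet of this list, that on a logarithmic scale a steeper straight line means faster doubling, can be made concrete: if log10 of the count grows by a slope s per day, the count doubles every log10(2)/s days. The sketch below estimates s with a least-squares fit. The function names and the synthetic series are assumptions for illustration, and the input counts must all be positive.

```python
import math


def log10_slope(counts):
    """Least-squares slope of log10(count) against the day index."""
    ys = [math.log10(c) for c in counts]
    xs = list(range(len(ys)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def doubling_time_days(counts):
    """Days needed for the count to double, given its log10 slope."""
    slope = log10_slope(counts)
    return math.log10(2) / slope if slope > 0 else math.inf


# Synthetic series that doubles every 3 days, for checking the arithmetic.
synthetic = [100 * 2 ** (day / 3) for day in range(10)]
estimate = doubling_time_days(synthetic)
```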

Deaths/Million Population vs  Confirmed cases/Million Population

...

  • The leading states in terms of deaths per million population are: NY, NJ, CT, MA, LA, MI, DE, DC, RI.
    The leading states in terms of confirmed cases per million population are: NY, NJ, MA, DE, CT, RI, LA, DC, MI.
    If one does not normalize by population, NY and NJ stand out, followed by MA and MI at the top of the remaining bunch.
    Looking at a log-log plot of the trailing states, the lowest deaths are for AK, SD, HI, MT, WY, ND. The lowest confirmed cases are for AK, SD, HI, MT, WY, ND.

  • Note that since it is a log-log scale no bubble appears for a state until there is at least 1 confirmed case and 1 death for the state.
  • % Confirmed and deaths both low for AK, VT, NH, ID
  • Cluster of DE, DC and RI with low deaths compared to the % confirmed cases
  • NY, NJ, MA, DE, CT, LA, RI, DC have the highest % confirmed cases.
  • By March 14th, WA, NY, CA, FL were reporting deaths.

Grouped data

If we color the bubbles by each state's political leaning, the chart appears as below. It is apparent that Covid-19 is impacting Democratic-leaning states the hardest, followed by the swing states.



We can also group the data by age, income or education tertiles

Grouped by Income
Grouped by Age
Grouped by Education (Bachelor degree or Equivalent)