Analysis: How we use NCDHHS, Johns Hopkins data

June 22. Analysis. North Carolina policymakers have struggled to define what safe reopening looks like as the novel coronavirus ravages both the state and the nation. To keep up with health, economic, and racial metrics, elected officials often rely heavily on data to inform their decision-making. As such, North Carolina has reported new COVID-19 cases, hospitalization rates, and death tolls through NCDHHS’ website, and those figures closely parallel the ones from Johns Hopkins University (JHU) and other sources.

In addition to NCDHHS, Cornelius Today has relied on JHU to gain an accurate picture of the virus’ footprint in North Carolina. Every morning, the university releases a new set of data on how the virus affected communities across the United States and throughout the world on the previous day. While there are small differences between the NCDHHS and JHU figures, both sets of data show a general upward trend, and they differ by only a few cases each day. Such small reporting discrepancies are consistent and expected when numerous sources compile their data. Also, when a few independent sources report similar findings, the consistency suggests data integrity in each source.

For over a month, Cornelius Today has collected both DHHS’ data and Johns Hopkins’. CT has looked for discrepancies between the sources and has yet to find any beyond the small day-to-day differences noted above. Each day, CT has compiled graphs using Johns Hopkins’ open-source data (https://github.com/CSSEGISandData/COVID-19).

In addition to providing graphs, CT has compared four metrics: the 7-day average number of new cases, the 7-day median number of new cases, the 31-day average number of new cases, and the 31-day median number of new cases. In statistics, averages, or means, are not always trustworthy because they give full weight to outliers as well as to typical values; therefore, CT has emphasized the medians as well.
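For readers who want to reproduce the four metrics, here is a minimal sketch in Python. It is not CT's actual workflow: the function names `trailing_summary` and `four_metrics` are hypothetical, and the sketch assumes you already have a list of daily new-case counts ordered from oldest to newest.

```python
from statistics import mean, median


def trailing_summary(daily_new_cases, window):
    """Return (average, median) of the most recent `window` daily counts.

    `daily_new_cases` is assumed to be ordered oldest to newest.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(daily_new_cases) < window:
        raise ValueError("not enough days of data for this window")
    recent = daily_new_cases[-window:]  # the last `window` days
    return mean(recent), median(recent)


def four_metrics(daily_new_cases):
    """Compute the 7-day and 31-day average and median of new cases."""
    avg7, med7 = trailing_summary(daily_new_cases, 7)
    avg31, med31 = trailing_summary(daily_new_cases, 31)
    return {
        "7-day average": avg7,
        "7-day median": med7,
        "31-day average": avg31,
        "31-day median": med31,
    }
```

Calling `four_metrics` on a list with at least 31 daily counts returns all four numbers at once, which makes it easy to see on any given day whether the averages and medians tell the same story.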

For the sake of simplicity, imagine a set of the following numbers: 0, 25, 30, 28, 22. Zero is an outlier, much like the days Mecklenburg reported zero new cases or the days in May when North Carolina reported above 1,500. The average of the five numbers is 21, because the zero counts just as much as each of the other four values, but the median is 25. In this case, 25 is more representative of the five numbers than 21 because the other four numbers fall between 22 and 30, and the median is far less affected by the zero than the average is. Further analyses incorporating more advanced statistical techniques would show the same, but we will save that for another day.
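The arithmetic in this example can be checked with a few lines of Python using the standard library's `statistics` module. The variable names are only illustrative.

```python
from statistics import mean, median

counts = [0, 25, 30, 28, 22]  # the five numbers from the example

# Average: add the values and divide by how many there are.
average = mean(counts)  # (0 + 25 + 30 + 28 + 22) / 5

# Median: sort the values and take the middle one.
middle = median(counts)  # sorted order is 0, 22, 25, 28, 30

# Dropping the outlier shows how much the zero pulled the average down.
average_without_zero = mean([c for c in counts if c != 0])

print("average:", average)
print("median:", middle)
print("average without the zero:", average_without_zero)
```

Without the zero, the average of the remaining four numbers is 26.25, much closer to the median of 25 than the original average of 21 is.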

And now back to North Carolina’s COVID-19 numbers. Statisticians and public health experts rely upon multiple metrics and indices to convey the novel coronavirus’ impact on public health and the economy. The old saying, of uncertain origin and often misattributed to Mark Twain, goes: “There are three kinds of lies: lies, damned lies, and statistics.” This quote would be appropriate if the sources compiling the data measured only one of the following: new cases of COVID-19, hospitalization rates, or death tolls. None of the three alone gives an accurate representation of the coronavirus’ toll, but taken together, these metrics give public health experts a clearer picture of whether North Carolina and the United States should remain open or closed.

As data analysis of the novel coronavirus continues, it is important to consume as much information as possible, especially as public health experts gain a better understanding of the virus’ impact each day. Cornelius Today’s intent is to provide accurate reporting of multiple data points, including death tolls, hospitalization rates, and new cases. With the clarity that only time can provide, the United States and the world will have a deeper understanding of how COVID-19 affects the global community.

If you have any further questions on Cornelius Today’s ongoing use of JHU’s data, we would be more than happy to answer them. Email: corneliustoday@gmail.com