In Part 2 of this Product Profile, we explain some of the issues associated with the voluntary nature of the NHS, and how to address them. We also show what some consortia have begun to do about NHS data quality, especially with respect to mapping the global non-response rate. (More about global non-response rates in a moment.)
In addition, you may want to check out some other resources on the NHS:
|Official CCSD position on the NHS||Statistics Canada reference guide to each NHS topic||All other official NHS reference materials|
- Non-response bias and the global non-response rate
- Strategies to assess the quality of NHS data
- Preliminary mapping of global non-response rates
Non-response bias and the global non-response rate
The 2011 National Household Survey (NHS) is a voluntary survey of the Canadian population. Questions in the NHS replace those previously found in the long-form Census (prior to 2011).
In light of the change from a mandatory to a voluntary sample, the NHS may under-report the number of people belonging to certain subgroups. Results can be skewed towards groups that are more inclined to respond to surveys (e.g. the highly educated, those with middle-upper incomes, and non-immigrants). This under-reporting is known as non-response bias.
For each geographic unit, non-response is measured by the global non-response rate, or GNR. When the GNR exceeds 50 percent for a given geographic unit, Statistics Canada will not release the data on its website. That is not to say that the data have been suppressed. They simply aren't published on the website for the public to view and download.
With this in mind, any data that the Community Data Program orders from Statistics Canada includes all geographic units regardless of GNR. The GNR itself is also provided for each unit. You're welcome to use data with a GNR greater than 50 percent, but take extra caution: this means more than 50 percent of those who received a survey did not fill it out.
Because CDP-ordered NHS data is never suppressed for data quality reasons, the only reason for data suppression is Statistics Canada's confidentiality rules. In essence, if a geographic unit has fewer than 250 individuals, then the data will be suppressed and replaced with an "x".
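If you load CDP-ordered data into a script, suppressed cells need to be handled before you do any arithmetic. Here is a minimal Python sketch, assuming cells arrive as text and that a suppressed cell contains only the letter "x" as described above. The function name `parse_cell` and the sample row are hypothetical.

```python
def parse_cell(cell):
    """Return a float for a numeric cell, or None if the cell is suppressed ("x")."""
    text = str(cell).strip()
    if text.lower() == "x":
        # Suppressed under confidentiality rules: no value is available.
        return None
    # Drop thousands separators before converting (an assumption about formatting).
    return float(text.replace(",", ""))


# Invented example row: two published values and one suppressed value.
row = ["1250", "x", "3,410"]
values = [parse_cell(cell) for cell in row]
usable = [v for v in values if v is not None]

print(values)
print(sum(usable))
```

Keeping suppressed cells as `None`, rather than treating them as zero, stops them from silently dragging totals and averages down.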
Data quality flags in previous Censuses
If you'd like to compare data quality in the 2006 long-form Census with that of the 2011 NHS, you absolutely can.
Non-response rates for Censuses prior to the NHS were measured using a five-digit data quality code, or flag, instead of numerical GNR values. You'll notice the five-digit number following any given geographic unit in the 2011 Census or previous Censuses. For example, the "00000" next to St. John's (001) in the image below is the data quality flag.
For 2006 data, the values that each digit can take on are summarised in the publication below. Specifically, see the table on pages 13 and 14.
[Edit 2014-02-03: For the 2011 short-form Census, the values each digit can take on are summarised on the website below.]
The 2011 NHS also has a five-digit data quality flag (in addition to the numerical GNR). But its fourth digit is different from that of previous long-form Censuses.
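If you work with the flags in a script, it helps to split them into their five digits so that each position can be looked up in the reference tables. The Python sketch below only does the splitting. It deliberately assigns no meaning to any digit, since those meanings come from the Statistics Canada documentation and differ between products. The function name `split_quality_flag` is hypothetical.

```python
def split_quality_flag(flag):
    """Split a five-digit data quality flag such as "00000" into a tuple of ints.

    Pass the flag as text: stored as a number, "00000" would lose its leading zeros.
    """
    text = str(flag).strip()
    if len(text) != 5 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected a five-digit flag, got {flag!r}")
    return tuple(int(ch) for ch in text)


digits = split_quality_flag("00000")
print(digits)

# The fourth digit is the one that differs between the 2011 NHS and earlier
# long-form Censuses, so it is worth pulling out on its own.
fourth_digit = digits[3]
print(fourth_digit)
```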
Global non-response isn't everything
Keep in mind that the GNR is not a perfect measure of data quality. A geographic area with a high GNR could accurately reflect its underlying population. Similarly, a geographic area with a low GNR could be unusually biased if non-respondents make up a specific subset of the population. As such, we recommend that you use your discretion when working with the NHS. When possible, try to verify numbers with other data sources, or your own knowledge. We'll explain more about this in a moment.
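To see why a low rate of non-response does not guarantee unbiased results, consider a small worked example in Python. Everything in it is invented: the two areas, the group sizes, and the response rates. It also uses a simple non-response rate as a stand-in for the GNR and ignores any weighting adjustments, so treat it as an illustration of the reasoning rather than a model of the NHS.

```python
def summarize(group_sizes, response_rates, group):
    """Return (non-response rate, true share, observed share) for one subgroup."""
    total = sum(group_sizes.values())
    respondents = {g: group_sizes[g] * response_rates[g] for g in group_sizes}
    n_respondents = sum(respondents.values())
    non_response = 1 - n_respondents / total
    true_share = group_sizes[group] / total
    observed_share = respondents[group] / n_respondents
    return non_response, true_share, observed_share


# Both hypothetical areas have 1,000 people, 200 of whom are immigrants.
sizes = {"immigrants": 200, "non_immigrants": 800}

# Area A: many people do not respond, but non-response is spread evenly.
area_a = summarize(sizes, {"immigrants": 0.40, "non_immigrants": 0.40}, "immigrants")

# Area B: few people fail to respond, but most of them are in one group.
area_b = summarize(sizes, {"immigrants": 0.20, "non_immigrants": 0.95}, "immigrants")

for name, (nr, true_share, observed) in (("Area A", area_a), ("Area B", area_b)):
    print(f"{name}: non-response {nr:.0%}, true share {true_share:.0%}, "
          f"observed share {observed:.0%}")
```

With these invented numbers, Area A has 60 percent non-response yet the immigrant share among respondents matches the true 20 percent. Area B has only 20 percent non-response, but the observed share falls to 5 percent.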
Risk of error is higher for small populations
According to Statistics Canada, the risk of error increases for smaller levels of geography (e.g. neighbourhoods) and for smaller populations in general. If you plan to use the NHS at the dissemination area (DA) level, or even for a small-population census tract (CT) or census subdivision (CSD), take extra caution.
Avoid comparing the NHS with Censuses prior to 2011
We recommend that you avoid comparing 2011 NHS results with those from the 2006-and-earlier long-form Census for two reasons. First, the data may not be comparable due to the "non-response bias" explained above.
Secondly, there are differences between the target population of the 2011 NHS and that of the 2006 Census. The NHS does not cover persons living in institutional collective dwellings such as hospitals, nursing homes and penitentiaries. Nor does it cover persons living in non-institutional collective dwellings such as work camps, hotels and motels, and student residences. By contrast, the 2006 Long-Form Census did include collective dwellings.
We understand that in some cases you will need to compare the 2011 NHS with earlier long-form Censuses. We simply recommend that you use extra caution and be explicit about possible sources of variation and error. Some strategies to do so are outlined below.
Strategies to assess the quality of NHS data
There are at least three quantitative ways to verify the accuracy of NHS data for any given geographic unit:
- Check the GNR
- Compare the 2011 NHS population estimate with the 2011 Census population count*
- Cross-check the data with another data source, e.g. Taxfiler
We cover the first two strategies in this Product Profile.
*You can also verify differences between variables common to the 2006 long-form Census and the 2011 NHS, but remember that changes over time and in the target population may affect how you interpret the comparison.
We've prepared a data quality spreadsheet to help assess the data quality of the NHS for any given geographic area. The spreadsheet lists standard census geographies, their GNRs, their five-digit data quality flags, and a comparison between the 2011 Census population counts and 2011 NHS population estimates. The next two subsections of the Product Profile explain how to use the spreadsheet to help gauge the quality of NHS data at any given geographic area.
How to check the GNR
The goal here is to check the GNR value for a given geographic unit, or area. In general, a lower GNR is better. The lower the GNR, the greater the proportion of survey recipients who filled the survey out. The spreadsheet makes it relatively easy to check the GNR.
First, download and open the spreadsheet.
If you're interested in census metropolitan areas, census agglomerations, or census tracts, then select the tab "CMACACT".
If you're interested in Canada, provinces or territories, census divisions, census subdivisions, or dissemination areas, then select the tab "CDCSDDA".
Rows are geographic units. If you need help finding the geographic units of interest to you, don't hesitate to contact us.
Columns will help you assess quality. In the column named "n_gnr", you will find the GNR for each geographic unit. The GNR for Canada, for example, is 26.1 percent. That is to say, 26.1 percent of Canadians who received a National Household Survey did not fill it out. [Edit 2014-02-03: Note that the GNR is not perfectly equal to the proportion of people who did not fill out a survey. It is in reality a mixed measure that includes both the overall response rate and an indicator of survey completeness.]
As a general rule, the lower the GNR, the better. In other words, the more people who filled out the survey in a given geographic area, the better.
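If you would rather check GNRs in a script than by eye, the lookup above can be sketched in Python. This assumes you have exported one tab of the spreadsheet to CSV. Only the column name "n_gnr" and Canada's value of 26.1 come from the text above. The "geo_name" column, the other rows, and the function names are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of one spreadsheet tab. In practice you would open
# your own exported file instead of this inline sample.
SAMPLE = """geo_name,n_gnr
Canada,26.1
Example CSD A,18.4
Example CSD B,53.7
"""


def load_gnr(csv_text):
    """Map each geographic unit's name to its GNR, in percent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["geo_name"]: float(row["n_gnr"]) for row in reader}


def above_threshold(gnr_by_unit, threshold=50.0):
    """Names of units whose GNR exceeds the threshold, sorted alphabetically."""
    return sorted(name for name, gnr in gnr_by_unit.items() if gnr > threshold)


gnr = load_gnr(SAMPLE)
print(gnr["Canada"])
print(above_threshold(gnr))
```

The default threshold of 50 percent mirrors the point at which Statistics Canada does not publish data on its website. You can pass a stricter value if your project calls for one.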
How to compare 2011 NHS population estimates with 2011 Census population counts
The goal here is to compare what the 2011 Census found the actual population to be, with what the 2011 NHS estimated the population to be. If the actual population and population estimate are close, the NHS data are more likely to accurately represent the underlying population. Like the GNR, this is not perfect, but it will help to paint a general picture of data quality.
First, download and open the data quality spreadsheet.
If you're interested in census metropolitan areas, census agglomerations, or census tracts, then select the tab called "CMACACT".
If you're interested in Canada, provinces or territories, census divisions, census subdivisions, or dissemination areas, then select the tab called "CDCSDDA".
The column named "c_pop" is the 2011 Census population count. The column named "n_pop" is the 2011 NHS population estimate. The column named "n_pop / c_pop" is the ratio of the 2011 NHS population estimate to the 2011 Census population count.
The value in this last column will tell you how much higher or lower the NHS population estimate is in comparison to the Census population count. A value lower than 1 means the NHS estimate is lower than the Census count. A value higher than 1 means the estimate is higher than the count. A value of 1 means the estimate and count are identical. The closer this number is to 1, the better.
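The same ratio is easy to compute yourself, for example to rank units by how far the NHS estimate sits from the Census count. The Python sketch below uses the "c_pop" and "n_pop" names from the spreadsheet. The geographic units and their populations are invented, and the function name `coverage_ratio` is hypothetical.

```python
def coverage_ratio(n_pop, c_pop):
    """Ratio of the NHS population estimate (n_pop) to the Census count (c_pop)."""
    if c_pop <= 0:
        raise ValueError("c_pop must be positive")
    return n_pop / c_pop


# Invented units as (name, c_pop, n_pop).
units = [
    ("Example CT 1", 4000, 4000),
    ("Example CT 2", 5000, 4500),
    ("Example CT 3", 2000, 2100),
]

# Rank units from closest to 1 (best agreement) to furthest from 1.
ranked = sorted(units, key=lambda u: abs(coverage_ratio(u[2], u[1]) - 1))

for name, c_pop, n_pop in ranked:
    ratio = coverage_ratio(n_pop, c_pop)
    direction = "equal" if n_pop == c_pop else ("lower" if ratio < 1 else "higher")
    print(f"{name}: n_pop / c_pop = {ratio:.3f} (NHS estimate is {direction})")
```

Sorting on the absolute distance from 1 treats over-estimates and under-estimates the same way, which matches the rule of thumb that closer to 1 is better.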
How to check NHS data against other data sources
This process is less simple than the first two. Because Taxfiler data aren't available at the census subdivision or dissemination area level, it can be difficult to make comparisons between non-tracted census geographies in the NHS and postal geographies in Taxfiler data. That said, there are strategies. We'll touch upon some of them at an upcoming CDP webinar on Taxfiler and NHS data.
Preliminary mapping of global non-response rates
Some consortia have begun working with GNRs in their communities. In Regina, York Region, and Peel Region, we've seen maps of GNRs by census tract. Feel free to check them out. Mapping GNRs in your respective communities is a good way to better understand NHS data quality from a geographic perspective.
|View the map of GNRs in Regina (forthcoming)||View the map of GNRs in York Region|
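If you want to try mapping GNRs yourself, one common first step is to group the values into classes that a mapping tool can shade. The Python sketch below shows one way to do that. The class breaks and labels are purely hypothetical choices, not the ones used in the maps above, and the function name `gnr_class` is our own.

```python
from bisect import bisect_right

# Hypothetical class breaks, in percent. Adjust them to suit your community.
BREAKS = [20.0, 30.0, 40.0, 50.0]
LABELS = [
    "under 20",
    "20 to under 30",
    "30 to under 40",
    "40 to under 50",
    "50 and over",
]


def gnr_class(gnr):
    """Return the label of the class that a GNR value (in percent) falls into."""
    # bisect_right counts how many breaks are less than or equal to the value,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(BREAKS, gnr)]


# Invented census tract GNRs, keyed by a made-up tract identifier.
tracts = {"CT 0001.00": 12.5, "CT 0002.00": 26.1, "CT 0003.00": 50.0}
classes = {tract: gnr_class(gnr) for tract, gnr in tracts.items()}
print(classes)
```

The resulting labels can be joined to census tract boundaries in whichever mapping software you prefer.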
If you've mapped the GNR in your community, or done anything else with NHS data, let us know so that we can share the experience with other consortia across the country.
We hope this helps and inspires you to use the NHS, despite its shortcomings. And as always, happy data!