Product Profile 2.2: Taxfiler data (2010)

T1-derived datasets from the Income Statistics Division of Statistics Canada (known as Annual Estimates for Census Families and Individuals, or simply Taxfiler data) are an increasingly important source of Canada-wide small-area income data.

Taxfiler data offers income-related information down to the Census tract (CT) and rural six-digit postal code level.  This information is generated from individual tax files from the Canada Revenue Agency, and is published yearly.

In this Product Profile, we will describe the Taxfiler data in general, and explore a sample table that has historically garnered interest: Family Data Table 18, After-tax low income (based on after-tax low income measures, LIMs).  In the end, you'll be more familiar with Taxfiler data, and you'll be able to calculate the rate of low income in your community.

Whether or not you're familiar with Taxfiler data, you'll still find the Taxfiler metadata reference file useful for looking up variables and geographies.  We recommend that you download it and use it to help find the data you're looking for.

Check out the Taxfiler master metadata file

Contents

  1. What is Taxfiler data?
  2. Note on geographic scales
  3. A sample Taxfiler table: Family Data - Table 18
  4. Sample task: Looking up the rate of low income for your community

1. What is Taxfiler data?

Taxfiler data is the colloquial term for a set of standardised data products generated from T1 tax files by Statistics Canada's Income Statistics Division (ISD). The formal term is "Annual Estimates for Census Families and Individuals".

Note that in previous years the data was generated by the Small Area and Administrative Data (SAAD) Division. You may come across this abbreviation when dealing with Taxfiler data from before 2010.

Taxfiler data includes four products that the Community Data Program has acquired for all of Canada. Each product consists of a group of tables related to that product.

  • Family Data
  • Neighbourhood Income and Demographics
  • Seniors
  • Financial and Charitable Donors

In reality Financial and Charitable Donors is an umbrella term for eight products that relate to personal finances and charitable giving: Charitable Donations, Canadian Taxfilers, Canadian Capital Gains, Canadian Investment Income, Canadian Investors, Canadian Savers, RRSP Contribution Limits (Room), and RRSP Contributors.

2. Note on geographic scales

Because Taxfiler data uses Postal Codes as its basic geographic building block, it is available at two different kinds of scales: Postal geographic scales and Census geographic scales.

Postal geographic scales are based on (or aggregated from) six-digit Postal Codes.  These include:

1 - Postal walk
2 - Other postal walk
3 - Urban forward sortation area (residential area)
4 - Rural route
5 - Suburban service
6 - Rural postal code (within city)
7 - Other urban area (non-residential within city)
8 - City total (a.k.a. postal city)
9 - Rural postal code (not in city)
10 - Other provincial total
11 - Province or territory total
12 - Canada

Census geographic scales on the other hand are more familiar to most of us.  For Taxfiler data, we have:

21 - Census division
31 - Federal electoral district
41 - Census metropolitan area
42 - Census agglomeration
51 - Economic region
61 - Census tract

If you're interested in Taxfiler data for an area without Census tracts, we recommend using a combination of Geographies 8 ("City total", also known as "Postal City") and 9 ("Rural postal code (not in city)").  It's possible to uncover which six-digit Postal Codes fall within any given area coded 8 or 9 by merging the "Place name" variable in Taxfiler data with the "COMMNAME" variable in the Postal Code Conversion File.  Contact us if you need help with this.
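The merge described above can be sketched in a few lines of Python.  This is a minimal illustration, not our actual workflow: the sample rows, the place names, the postal codes, and the `POSTALCODE` field name are all made up for the example.  Only "Level of geo", "Place name", and "COMMNAME" come from the files as described here.  The sketch also assumes that names match once they are trimmed and upper-cased, which may not hold for every community.

```python
from collections import defaultdict

# Illustrative rows only: place names and postal codes are invented.
taxfiler_rows = [
    {"Level of geo": "8", "Place name": "EXAMPLEVILLE"},
    {"Level of geo": "9", "Place name": "SAMPLETOWN"},
    {"Level of geo": "61", "Place name": "TORONTO"},
]
pccf_rows = [
    {"POSTALCODE": "X1X1X1", "COMMNAME": "EXAMPLEVILLE"},
    {"POSTALCODE": "X1X1X2", "COMMNAME": "EXAMPLEVILLE"},
    {"POSTALCODE": "X2X2X2", "COMMNAME": "Sampletown "},
]


def normalise(name):
    """Trim and upper-case a place name so minor formatting differences still match."""
    return name.strip().upper()


def postal_codes_by_place(taxfiler_rows, pccf_rows):
    """Map each Geography 8 or 9 place name to the postal codes that share its COMMNAME."""
    codes = defaultdict(list)
    for row in pccf_rows:
        codes[normalise(row["COMMNAME"])].append(row["POSTALCODE"])
    return {
        row["Place name"]: codes.get(normalise(row["Place name"]), [])
        for row in taxfiler_rows
        if row["Level of geo"] in {"8", "9"}
    }


matches = postal_codes_by_place(taxfiler_rows, pccf_rows)
```

Rows at Census geographic scales (such as the "61" row) are skipped on purpose, because their Place name does not correspond to COMMNAME.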

3. A sample Taxfiler table: Family Data - Table 18

To illustrate how to navigate a Taxfiler table (they're all formatted in the same way), let's take Family Data - Table 18, as an example.  You can download the table yourself here:

Table F-18 Family data - After-tax low income (based on after-tax low income measures, LIMs), 2010

***Please note that this post has been edited. We no longer transform each T1FF table from Income Division; however, this tutorial is still useful for familiarizing users with the variables in the Taxfiler data.***

When you open the table, you're presented with six tabs: Data, Variables, Source, GeoLegend, PostalGeoHierarchy, and Original.

  • Data reveals the Taxfiler data with geographic units as rows and variables as columns
  • Variables lists the variables available in the table
  • Source provides the table's name and data source
  • GeoLegend explains each of the Postal and Census geographic scales that are available (or in some cases not available) for Taxfiler data
  • PostalGeoHierarchy displays the Postal geographic scales in a visual format
  • Original shows the unformatted tables sent to us from Statistics Canada

Let's take a look at each tab to get ourselves familiar with its contents.

Data

The Data tab houses the raw data for each table.  Geographic units are presented as rows.  Variables are presented as filterable columns.

The Data tab always begins with five standard columns: City ID, Postal area, Postal walk, Level of geo, and Place name.  These five columns, and three in particular, help us identify the geographies that we're interested in.

  • City ID is a unique identifier for the Place name variable.
  • Postal area is the unique geographic identifier, regardless of the place name.  A Census tract code or three-character postal code (FSA) will be listed in this field.
  • Postal walk is not useful for our purposes, as we don't have access to the postal walk geography.
  • Level of geo indicates the geographic scale for the row in question.  (So if this field says "61", it means the row is a Census tract.  See Section 2 for the full list of geographic scales.)
  • Place name is the plain-language geographic identifier.  (Note that the "Place name" in Taxfiler data for geographies 1-12 corresponds to "COMMNAME" in the Postal Code Conversion File.  This makes it possible for the GIS people to map Taxfiler data, albeit in a roundabout way.  "Place name" does not correspond with "COMMNAME" in the PCCF for geographies 21-61.)

As the comment indicates, you can use the Level of geo field to filter for the geographic scale you're interested in.  Perhaps, for example, you only care about Census tracts.  In this case, you'd filter Column D to show the value "61" only.
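If you work with the Data tab outside Excel, the same filter can be sketched in Python.  This is only an illustration: the rows below are placeholders (the Census tract identifiers in particular are invented), and it assumes the tab has been exported so that each row is a dictionary keyed by column name.

```python
# Census geographic scale codes, as listed in Section 2.
CENSUS_SCALES = {
    21: "Census division",
    31: "Federal electoral district",
    41: "Census metropolitan area",
    42: "Census agglomeration",
    51: "Economic region",
    61: "Census tract",
}


def filter_by_level(rows, level):
    """Keep only the rows whose 'Level of geo' equals the requested code."""
    return [row for row in rows if int(row["Level of geo"]) == level]


# Placeholder rows: the Census tract identifiers are invented for this example.
rows = [
    {"Postal area": "A1K", "Level of geo": "3", "Place name": "BAULINE"},
    {"Postal area": "0001.00", "Level of geo": "61", "Place name": "TORONTO"},
    {"Postal area": "0002.00", "Level of geo": "61", "Place name": "TORONTO"},
]

tracts = filter_by_level(rows, 61)
print(CENSUS_SCALES[61], "rows:", len(tracts))
```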

The subsequent variables are specific to the table in question, in this case Family Data - Table 18. 

Let's take a look at a few variables for the first row (Row 2 in Excel).  This row presents data for Bauline: a Forward Sortation Area (FSA) defined by the aggregation of six-digit Postal Codes that begin with A1K (Place name="BAULINE", Postal area="A1K").  Bauline is located north of St. John's, Newfoundland, in case you're interested.

  • Column F indicates that Bauline has 50 couple families with no children
  • Column BE indicates that Bauline has 20 lone parent families
  • Column DL indicates that it has a total of 390 residents ("All families & non-family persons · # of persons · Total")
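If you export the Data tab and read it with a script, you'll need to translate Excel column letters such as F, BE, and DL into positions.  Here is a minimal sketch of that conversion.  The helper name `column_index` is our own invention, and the row is a stand-in that holds only the three Bauline values quoted above, with every other cell left empty.

```python
def column_index(letters):
    """Convert an Excel column label (for example 'DL') to a zero-based position."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# Stand-in for the Bauline row: only the three values quoted above are filled in.
row = [None] * (column_index("DL") + 1)
row[column_index("F")] = 50     # couple families with no children
row[column_index("BE")] = 20    # lone parent families
row[column_index("DL")] = 390   # all families & non-family persons, # of persons, total

total_persons = row[column_index("DL")]
```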

   

We'd recommend looking up data for a place you're familiar with in order to get used to Taxfiler tables.  To do so, filter rows by Level of geo (i.e. the geographic scale you're interested in), Postal area (which lists the Census tract, FSA, or other geographic identifier you might be looking for), and/or Place name.

Just remember that Place name refers to Postal place names for geographies 1-12, and Census place names for geographies 21-61.  To illustrate: if you're filtering Level of geo to show only "61" (Census tract), then the Place name associated with each of those Census tracts will refer to the Census parent of the tract, which is the Census metropolitan area (CMA) – not the municipality or Postal City ("City total", Geography 8).  Conversely, if you're filtering Level of geo to show only "3" (Forward Sortation Area), then the Place name associated with each of those FSAs will refer to the Postal parent of the FSA, which would be the name of the community as defined by Canada Post.

To further illustrate, if you're interested in Markham FSAs (Level of geo="3") and CTs (Level of geo="61"), the CTs would have "TORONTO" as their Place name, but the FSAs would have "MARKHAM" as their Place name.

If this is way over your head, feel free to drop us a line.

Variables

The Variables tab lists each of the variables available in the table.  In this case there are 149 variables in total.  Columns A and B give the variable's column in the Data tab and the variable's name, respectively.

Next to the Data column and Variable name columns, you'll find the three "tiers" or components of each variable.  The three tiers correspond to the fact that the original, unformatted tables have three rows of variables.

To illustrate what this is all about, let's look at Row 20: "Couple families · # of persons · With 3+ children".  Its "Data column" is S.  If we go to the "Data" tab then, under S, we find the same variable.

Meanwhile, back in the Variables tab, we see that the variable for Row 20 has three tiers:

  1. Couple families
  2. # of persons
  3. With 3+ children

We got those values from the original, unformatted table.  If we take a quick peek at the Original tab, we'll see under Column S "With 3+ children" (Cell S4).  Its parent variable is "# of persons" (Cell P3).  The parent of that is "Couple families" (Cell F2). 

We've simply merged the three rows together in the Data and Variables tabs to make things easier to filter and look up.
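For the curious, that kind of merge can be sketched as follows.  This is an illustration rather than the script we used: the header fragment is a toy example, the "# of families" label is assumed, and the sketch supposes that a blank header cell inherits the label to its left (as with merged cells in the Original tab).

```python
def forward_fill(cells):
    """Repeat the last non-empty label across the blank cells to its right."""
    filled, current = [], ""
    for cell in cells:
        if cell:
            current = cell
        filled.append(current)
    return filled


def merge_tiers(tier1, tier2, tier3, sep=" · "):
    """Join three header rows into one variable name per column."""
    t1, t2 = forward_fill(tier1), forward_fill(tier2)
    return [sep.join(parts) for parts in zip(t1, t2, tier3)]


# Toy header fragment: four columns under one tier 1 label.
tier1 = ["Couple families", "", "", ""]
tier2 = ["# of families", "", "# of persons", ""]
tier3 = ["Total", "With 3+ children", "Total", "With 3+ children"]

names = merge_tiers(tier1, tier2, tier3)
print(names[3])
```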

For a full list of variables within all of the Taxfiler products, we encourage you to check out the Taxfiler metadata file.  The metadata file also makes it easier to ascertain which tables you might be interested in before downloading them.

Check out the Taxfiler metadata reference file

Source

The Source tab displays the name and source of the table. 

For Table 18, you'll see the following information:

  • Table F-18 Family data - After-tax low income (based on after-tax low income measures, LIMs), 2010
  • Source: Statistics Canada, Income Statistics Division, 2010, Annual Estimates for Census Families and Individuals, 13C0016
  • © This data includes information copied with permission from Canada Post Corporation

GeoLegend

GeoLegend includes a list of all the Postal and Census geographic scales available for any given Taxfiler table.  For reference, we've reproduced that table on communitydata.ca.

Click here for a list of geographies available for 2010 Taxfiler data

PostalGeoHierarchy

The PostalGeoHierarchy tab shows, as a diagram, how the postal geographic scales relate to one another.

Original

The Original tab displays the raw, unformatted data that we receive from Statistics Canada.  The values in the Original tab are identical to the values in the Data tab.  They just look a little messy and aren't as easy to filter.

4. Sample task: Looking up the rate of low income for your community

Now that we know what to expect with Taxfiler tables, let's use Table 18 to calculate the rate of low income based on the after-tax LIM.  Let's start by going to the Variables tab.

We're looking for two variables:

  1. A numerator to tell us the number of persons in low-income families plus the number of non-family persons with low incomes
  2. A denominator to tell us the number of persons in all families plus the number of non-family persons

We can look this up any number of ways, but let's filter a few columns for fun.  Since we aren't interested in the number of children, we can filter Variable tier 3 by "Total".

Under Variable tier 2, we know we want the number of persons, not the number of families, median income, or number of individuals within a given age range.  For this reason, we could then filter this field by "# of persons".

Under Variable tier 1, we know we want to capture everyone, not just a type of family, so we can filter this field by "All families & non-family persons" and "All low income families & non-family persons".

At this stage, we know exactly which two variables we need, because we've filtered out everything else.  These two variables are:

  • "All families & non-family persons · # of persons · Total" (Column DL)
  • "All low income families & non-family persons · # of persons · Total" (Column EJ)

Back in the Data tab, we find the variables we're looking for in Columns DL and EJ respectively.

 

To calculate the rate of low income, you need to divide the value in Column EJ (numerator) by the value in Column DL (denominator) for the communities you're interested in.  Or for any community if you'd like.
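Written out, the calculation is simply EJ ÷ DL, multiplied by 100 if you want a percentage.  A minimal sketch in Python follows.  The function name and the sample figures are hypothetical, and returning nothing for a blank or zero denominator is our own choice for the example.

```python
def low_income_rate(low_income_persons, all_persons):
    """Return the rate of low income as a percentage, or None if it cannot be computed."""
    if not all_persons:
        # A blank or zero denominator (Column DL) gives no meaningful rate.
        return None
    return 100.0 * low_income_persons / all_persons


# Hypothetical community: 150 persons in low income (Column EJ)
# out of 1,000 persons in total (Column DL).
rate = low_income_rate(150, 1000)
print(f"Rate of low income: {rate:.1f}%")
```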

What's the rate of low income for your area?

--

That's it for this Product Profile.  As with all data, we encourage you to play around.  If you need help with this or any dataset, make sure to ask us. 

Happy data!