Note: This document was updated on 2014-11-12
- Three quick notes before we start
- TransUnion Credit Report Characteristics basics
- Digging into the data: variables, values, and units
- What the table actually looks like: a sample row
- What to do now
What is the product profile?
This is the first product profile produced by the Community Data Program Team. The purpose of the profile is to help users (that's you!) understand, in plain English, how to use a given table in the catalogue. Each profile will showcase a different dataset available in the catalogue. We will release three product profiles per year (January, May, and September).
Fundamentally, we want to encourage users to use the data in the catalogue to report on local trends. Sometimes playing around with a new dataset can be challenging. The product profile takes the edge off that first step by showing users what to expect when working with new data.
We consider this first product profile to be in Beta. That is to say, your feedback is not just welcome, but encouraged. We want to know what you found useful, what you would like to see in future releases, and any other comments that you may have.
What isn't the product profile?
The product profile is not an atlas, nor does it provide any analysis in its own right. We are not uncovering anything or presenting any findings. But we hope this report helps you do just that.
What is the TransUnion credit report characteristics dataset?
New for the 2012-2017 program cycle, the Community Data Program Team is acquiring Canada-wide data from credit rating agency TransUnion. As of now, the 2011 dataset is in the catalogue at http://communitydata-donneescommunautaires.ca/node/7664. Yearly data between 2012 and 2017 will be uploaded to the catalogue when we receive them.
This new dataset is exciting for two reasons. First, its variables help to measure financial vulnerability, which has been difficult to do until recently. And second, the data are available at the six-digit Postal Code level for all of Canada, subject to one suppression rule: any Postal Code with fewer than 15 individuals with credit files was removed from the data for reasons of confidentiality and statistical soundness. This allows analysts (like you!) not only to measure financial vulnerability, but to do so block by block, street by street, neighbourhood by neighbourhood.
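To make the suppression rule concrete, here is a minimal Python sketch. The threshold of 15 comes from the text above. Everything else is an assumption for illustration: the rows are hypothetical, and we represent each row as a dictionary keyed by the fsaldu and count columns described later in this profile. The table in the catalogue has already had this rule applied, so you should not need to run it yourself.

```python
# Minimal sketch of the suppression rule: drop any Postal Code
# with fewer than 15 credit files. Sample rows are hypothetical.
MIN_CREDIT_FILES = 15  # threshold stated in the profile


def apply_suppression(rows, threshold=MIN_CREDIT_FILES):
    """Keep only rows whose credit-file count meets the threshold."""
    return [row for row in rows if row["count"] >= threshold]


sample_rows = [
    {"fsaldu": "A1A1A1", "count": 42},  # kept
    {"fsaldu": "B2B2B2", "count": 14},  # suppressed (fewer than 15)
    {"fsaldu": "C3C3C3", "count": 15},  # kept (exactly 15 is not "fewer than 15")
]

kept = apply_suppression(sample_rows)
print([row["fsaldu"] for row in kept])
```

In practice this means that a Postal Code missing from the table may simply have been suppressed, not that nobody lives there.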
Three quick notes before we start
1. The table is very large
The TransUnion dataset is extremely large (~80 MB). It is in .xlsx format with over 400,000 rows, each representing one six-digit Postal Code. That said, Microsoft Excel should be able to handle the table. If you are having trouble opening it, or would like the data in a different format, e.g. CSV, just let us know.
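If you do ask for a CSV version, one way to work with a table this size is to read it one row at a time instead of loading it all at once. The sketch below uses Python's standard csv module. It is only an illustration: the in-memory sample stands in for a real file, and its two columns and values are hypothetical.

```python
import csv
import io


def count_rows(csv_file):
    """Stream a CSV file row by row; return its column names and row count."""
    reader = csv.DictReader(csv_file)
    total = 0
    for _ in reader:  # only one row is held in memory at a time
        total += 1
    return reader.fieldnames, total


# Hypothetical stand-in for a CSV export of the table.
sample_csv = io.StringIO("fsaldu,count\nA1A1A1,42\nC3C3C3,15\n")

fieldnames, total = count_rows(sample_csv)
print(fieldnames, total)
```

With a real file you would pass the result of open(path, newline="") instead of the in-memory sample.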
2. List of variables you'll find in the table
The list of variables, or column names, is available here.
3. Legal mumbo jumbo
Make sure you understand what you can and cannot do with the data. Also, if you're planning to publish something that uses TransUnion data, make sure you know the protocol associated with doing so.
Credit report characteristics basics
The catalogue entries for the credit report characteristics tables should have all relevant metadata. Metadata is information used to describe the table, like column names, geographic scale, and year. Here's a screenshot of the catalogue entry for the 2012 version of the table.
You can download the Credit Report Characteristics tables here:
Let's take a look at the variables, values, and units that make up the TransUnion data, so that opening the data for the first time isn't overwhelming.
Digging into the data: variables, values, and units
TransUnion credit report characteristics data include three debt-related variables that are based on individuals' credit files monitored by credit rating agency TransUnion. These three variables are: (1) Non-mortgage consumer debt ("nmcd"), (2) Risk Score ("rs"), and (3) Bankruptcy Score ("bs"). The following table lists the details for each variable. Values are based on the 2011 Q1 version of the table.
For further information on the components of the Risk and Bankruptcy Scores, see their respective PowerPoint presentations, provided by TransUnion in .pdf format. (Note that TransUnion refers to Non-mortgage consumer debt as "AT033" or "total balance of all trades", Risk Score as "AS115", and Bankruptcy Score as "AS105".) Unfortunately, because the two scores are derived using proprietary formulae, we cannot know precisely how they are calculated.
Risk Score presentation
Bankruptcy Score presentation
For each variable in the TransUnion dataset (nmcd, rs, and bs) there are six descriptive statistics and one count. The six descriptive statistics are: sum, minimum, maximum, mean (average), median, and standard deviation. Each variable's count refers to the number of individuals for whom the variable (say, Risk Score) could be calculated.
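Three variables with seven columns each gives 21 statistic columns in total. The Python sketch below spells that out. Note the assumptions: the prefixes (nmcd, rs, bs) and the "_count" suffix appear in this profile, but the other six suffixes are hypothetical placeholders, so check the list of variables for the actual column names.

```python
# Sketch: build the 21 statistic column names (3 variables x 7 columns).
# Only the "_count" suffix is confirmed by the profile; the other six
# suffixes are hypothetical placeholders for illustration.
VARIABLES = ["nmcd", "rs", "bs"]
SUFFIXES = ["count", "sum", "min", "max", "mean", "median", "sd"]  # assumed names


def statistic_columns(variables=VARIABLES, suffixes=SUFFIXES):
    """Return one column name per (variable, statistic) pair."""
    return [f"{var}_{suffix}" for var in variables for suffix in suffixes]


columns = statistic_columns()
print(len(columns))
print(columns[:7])
```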
Confused yet? Don't worry, we'll walk through a sample row so that you can see what we mean.
What the table actually looks like: a sample row
Let's take a look at a typical row from the 2012Q1 TransUnion credit report characteristics dataset, so that it isn't a shocker when you open the file yourself.
This is what a sample row looks like. (There are a lot of columns, so we've separated it into four bite-sized chunks.) The first eight columns describe the location of the Postal Code in question, as well as the number of credit files that are found within it. Here's a description of each variable:
- fsaldu: Six-digit Postal Code (Forward Sortation Area Local Delivery Unit)
- commname_tu: Municipality name (TU)*
- prov_tu: Province name (TU)*
- commname_pccf: Municipality name (PCCF)**
- prov_pccf: Province name (PCCF)**
- lat_pccf: Latitude of the six-digit Postal Code (PCCF)**
- lon_pccf: Longitude of the six-digit Postal Code (PCCF)**
- count: Number of credit files within the postal code
* According to TransUnion's records.
** According to the same-year Postal Code Conversion File (PCCF), not available before 2012.
You may have noticed that we (the CDP Team) merged the TU data with its same-year PCCF for all years after 2011. This saves you the trouble of having to do so on your own. (If you have an in-house tool to map Postal Codes, you're free to use that instead.)
You'll also notice that the municipality is sometimes different according to TransUnion and the PCCF. This difference is caused by two factors:
- TransUnion receives its data directly from financial institutions. Their records are typically correct, but may in some cases not be. For example, a person may have a credit card linked to an address that he or she hasn't updated following a move. Or the person may write their address as a pre-merger municipality, e.g. Scarborough rather than Toronto.
- The PCCF uses postal geographies to define its municipalities. So, the PCCF boundary for "TORONTO" may be different from the current political boundaries of the city.
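If you want to see how often the two municipality names disagree, a comparison like the one sketched below is a reasonable starting point. The column names commname_tu and commname_pccf come from the list above. The sample rows, the Postal Codes in them, and the choice to ignore differences in case and spacing are our assumptions for illustration.

```python
def normalise(name):
    """Collapse repeated whitespace and uppercase, so 'Toronto' matches 'TORONTO'."""
    return " ".join(name.split()).upper()


def municipality_mismatches(rows):
    """Return rows where the TransUnion and PCCF municipality names differ."""
    return [
        row for row in rows
        if normalise(row["commname_tu"]) != normalise(row["commname_pccf"])
    ]


# Hypothetical rows: the Postal Codes are placeholders.
sample_rows = [
    {"fsaldu": "A1A1A1", "commname_tu": "Toronto", "commname_pccf": "TORONTO"},
    {"fsaldu": "B2B2B2", "commname_tu": "Scarborough", "commname_pccf": "TORONTO"},
]

mismatches = municipality_mismatches(sample_rows)
print([row["fsaldu"] for row in mismatches])
```

A mismatch flagged this way is not necessarily an error. As the two points above explain, it may reflect a pre-merger name or a difference between postal and political boundaries.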
Note that the number of credit files within any given Postal Code ("count") is roughly equal to the number of individuals aged 18 years and over. Each person can only have one credit file with TransUnion, and very, very few people over the age of 17 have no credit file.
The subsequent seven columns show the count and six statistics associated with the non-mortgage consumer debt ("nmcd") variable. nmcd_count is a tally, and the rest of the numbers are in current (unadjusted) Canadian dollars. So for the six-digit Postal Code shown above, the person with the smallest amount of debt excluding mortgage (amongst individuals with credit files) had precisely $240 in debt. The person with the most debt owed precisely $385,505. The mean (average) amount of consumer debt per individual living in this Postal Code was $79,734.47. The median was lower, at $38,004. (That tells us that a few individuals with very large debts are dragging up the mean.) And finally, the standard deviation was $119,440.80.
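To see why a mean well above the median points to a few very large debts, here is a small Python sketch. The five debt values are hypothetical and chosen only to show the effect. They are not taken from the table, which reports summary statistics and not individual debts.

```python
import statistics

# Hypothetical debts (in dollars) for five individuals in one Postal Code.
debts = [1_000, 20_000, 30_000, 40_000, 400_000]

mean_debt = statistics.mean(debts)      # pulled upward by the single large debt
median_debt = statistics.median(debts)  # the middle value, unaffected by it

# A mean well above the median signals a few very large values.
mean_exceeds_median = mean_debt > median_debt
print(mean_debt, median_debt, mean_exceeds_median)
```

Here one individual's debt is enough to lift the mean to more than three times the median, which is why it helps to read the two statistics together.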
Next we have seven columns that show the count and six statistics for the Risk Score ("rs") variable. As with nmcd, the rs_count is a tally. But the rest of the numbers do not have units. They are based on an index with arbitrary values, like demerit points, or your score in Tetris. Now, it's worth mentioning this next point in big bold letters, because it is not intuitive:
The LOWER the Risk Score, the HIGHER the risk of credit delinquency
The HIGHER the Risk Score, the LOWER the risk of credit delinquency
Here the person with the highest level of risk has a Risk Score of 460. The person with the lowest level of risk has a Risk Score of 870. The mean (average) Risk Score is 801, and the median Risk Score is 834. The standard deviation of Risk Scores, for those who are interested, is 96.63. Again, all these values refer to individuals within one six-digit Postal Code.
The last seven columns show the count and six statistics for the Bankruptcy Score ("bs") variable. Again, the count is a tally of individuals with credit files whose Bankruptcy Score could be calculated. Like the Risk Score, the rest of the numbers are based on an index, and do not have units. Just so things are crystal clear, we're going to emphasise this point again, because it's true for the Bankruptcy Score as well:
The LOWER the Bankruptcy Score, the HIGHER the risk of declaring personal bankruptcy
The HIGHER the Bankruptcy Score, the LOWER the risk of declaring personal bankruptcy
Here, the person with the highest probability of declaring bankruptcy has a Bankruptcy Score of 303. The person with the lowest probability of declaring bankruptcy has a Bankruptcy Score of 923. The mean (average) Bankruptcy Score is 752.28, and the median is slightly higher, at 761.5. For the wonks: the standard deviation is 157.47.
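Because lower scores mean higher risk for both the Risk Score and the Bankruptcy Score, ranking Postal Codes from most to least at risk means sorting in ascending order, not descending. The Python sketch below shows the idea. The rows are hypothetical, and the column name rs_mean is an assumed name for the mean Risk Score, so substitute the actual column name from the list of variables.

```python
def rank_by_risk(rows, score_column):
    """Order rows from highest risk (lowest score) to lowest risk (highest score)."""
    return sorted(rows, key=lambda row: row[score_column])


# Hypothetical rows; "rs_mean" is an assumed column name.
sample_rows = [
    {"fsaldu": "A1A1A1", "rs_mean": 801.0},
    {"fsaldu": "B2B2B2", "rs_mean": 640.5},
    {"fsaldu": "C3C3C3", "rs_mean": 755.2},
]

ranked = rank_by_risk(sample_rows, "rs_mean")
print([row["fsaldu"] for row in ranked])
```

The same function works for the Bankruptcy Score by passing the corresponding column name.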
We made it to the end of the variables! Hopefully this has demystified the dataset a little.
What to do now
We encourage you to give the data a test drive, especially now that this profile is fresh in your mind. One simple exercise to get you using the data is to ask yourself: What's the average level of consumer debt for my Postal Code? (It's $27,766.65 for mine.) Eventually, we would love it if you used the data to inform your programming, report on trends, and even draft policy. After all, that is the ultimate purpose of the Community Data Program.
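If you would rather answer that question with a script than by scrolling through 400,000 rows, a lookup along these lines is one option. This is a sketch under assumptions: the fsaldu column name comes from this profile, but nmcd_mean is an assumed name for the mean non-mortgage consumer debt column, and the sample row and its value are hypothetical.

```python
def normalise_postal_code(raw):
    """Remove spaces and uppercase, so 'a1a 1a1' matches 'A1A1A1'."""
    return "".join(raw.split()).upper()


def mean_debt_for(rows, postal_code, column="nmcd_mean"):
    """Return the mean debt for one Postal Code, or None if it is not in the table."""
    target = normalise_postal_code(postal_code)
    for row in rows:
        if normalise_postal_code(row["fsaldu"]) == target:
            return row[column]
    return None  # absent from the table, possibly due to suppression


# Hypothetical row; the value is a placeholder, not real data.
sample_rows = [{"fsaldu": "A1A1A1", "nmcd_mean": 12345.67}]

print(mean_debt_for(sample_rows, "a1a 1a1"))
print(mean_debt_for(sample_rows, "Z9Z9Z9"))
```

A result of None is worth a second look before you conclude anything: the Postal Code may have been removed under the suppression rule, or it may simply be mistyped.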
If you have any questions or comments, make sure to get in touch with us. We anticipate a lot of interest in this dataset, and so we recommend sharing findings, best practices, and press releases with other CDP members. Let us know too, so that we can show off your work!
Best of luck with this new and exciting dataset.