2. Three quick notes before we start
3. TransUnion Credit Report Characteristics basics
4. Digging into the data: variables, values, and units
5. What the table actually looks like: a sample row
6. What to do now
What is the product profile?
This is the first product profile produced by the Community Data Program Team. The purpose of the profile is to help users (that's you!) understand, in plain English, how to use a given table in the catalogue. Each profile will showcase a different dataset available in the catalogue. We will release three product profiles per year (January, May, and September).
Fundamentally, we want to encourage users to use the data in the catalogue to report on local trends. Sometimes playing around with a new dataset can be challenging. The product profile takes the edge off that first step by showing users what to expect when working with new data.
We consider this first product profile to be in Beta. That is to say, your feedback is not just welcome, but encouraged. We want to know what you found useful, what you would like to see in future releases, and any other comments that you may have.
What isn't the product profile?
The product profile is not an atlas, nor does it provide any analysis in its own right. We are not uncovering anything or presenting any findings. But we hope this report helps you do just that.
What is the TransUnion credit report characteristics dataset?
New for the 2012-2017 program cycle, the Community Data Program Team is acquiring Canada-wide data from credit rating agency TransUnion. As of now, the 2011 dataset is in the catalogue at http://communitydata-donneescommunautaires.ca/node/7664. Yearly data between 2012 and 2017 will be uploaded to the catalogue when we receive them.
This new dataset is exciting for two reasons. First, its variables help to measure financial vulnerability, which has been difficult to do until recently. Second, the data are available at the six-digit Postal Code level for all of Canada. This allows analysts (like you!) not only to measure financial vulnerability, but to do so block by block, street by street, neighbourhood by neighbourhood. The one exception is suppression: any Postal Code with fewer than 15 individuals with credit files was removed from the data for reasons of confidentiality and statistical soundness.
Three quick notes before we start
1. Dealing with the size of the table: The TransUnion dataset is extremely large. It is a .csv (comma-separated values) file with precisely 431,529 individual rows, each representing a six-digit Postal Code. Microsoft Excel will likely get cranky if you try to open the file with it. We would recommend asking your IT department if you can be given an SPSS license. (You could even use GIS to open the table if you're familiar with mapping software.) If you don't have access to any statistical or database software, don't panic: there are other options. Email us at email@example.com so that we can point you in the right direction.
2. List of variables and basic info about the data ("metadata"): For reference, the TransUnion dataset comes bundled with a Microsoft Excel spreadsheet that provides both metadata (basic information about the data) and a list of variables. This file is called TU_2011Q1_MV.xls. "MV" refers to metadata and variables.
3. Legal mumbo jumbo: Make sure you understand what you can and cannot do with the data. Also, if you're planning to publish something that uses TransUnion data, make sure you know the protocol associated with doing so.
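For readers comfortable with a little scripting, here is a minimal sketch of one of the "other options" hinted at in note 1: reading the table one row at a time with Python's built-in csv module, so the whole file never has to be loaded at once. The tiny in-memory sample and its column names (PostalCode, City, Province, Count) are assumptions made for illustration, not the real file's header. Check the bundled metadata spreadsheet for the actual names.

```python
import csv
import io

# Hypothetical stand-in for the real .csv; column names and values are made up.
SAMPLE_CSV = """PostalCode,City,Province,Count
A1A1A1,Sampletown,NL,42
B2B2B2,Exampleville,NS,17
"""

def peek(handle, n_preview=1):
    """Return (header, first n_preview rows, total data rows), reading row by row."""
    reader = csv.reader(handle)
    header = next(reader)          # first line holds the column names
    preview = []
    total = 0
    for row in reader:             # only one row is held in memory at a time
        if total < n_preview:
            preview.append(row)
        total += 1
    return header, preview, total

header, preview, total = peek(io.StringIO(SAMPLE_CSV))
print(header)
print(preview)
print(total)
```

To try this on the real table, you would pass an opened file instead of the sample, for example `open(path_to_your_csv, newline="")`, where the path is wherever you saved the download.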
Credit report characteristics basics
This information explains the basics of the credit report characteristics datasets. (Metadata refers to information about the data. Scale refers to the unit of geography at which the data are available.)
Metadata and variables
|TU_20##Q1_PC_MV.xls (bundled with data)
|First quarter of each year, 2011-2017 inclusive
|Six-digit Postal Code
|The credit report characteristics dataset measures levels of consumer debt and related risk
|Presentations on Risk Score and Bankruptcy Score (courtesy of TransUnion)
The rest of this report refers specifically to the 2011Q1 dataset, but the concepts are applicable to any given year.
Download the 2011Q1 dataset here:
Let's first take a look at the variables, values, and units that make up the TransUnion data, so that opening the data for the first time isn't overwhelming.
Digging into the data: variables, values, and units
TransUnion credit report characteristics data include three debt-related variables that are based on individuals' credit files monitored by credit rating agency TransUnion. These three variables are: (1) Non-mortgage consumer debt ("NMCD"), (2) Risk Score ("RS"), and (3) Bankruptcy Score ("BS"). The following table lists the details for each variable.
For further information on the components of the Risk and Bankruptcy Scores, see their respective PowerPoint presentations, provided by TransUnion in .pdf format. (Note that TransUnion refers to Non-mortgage consumer debt as "AT033" or "total balance of all trades", Risk Score as "AS115", and Bankruptcy Score as "AS105".) Unfortunately, because the two scores are derived using proprietary formulae, we cannot know precisely how they are calculated.
For each variable in the TransUnion dataset (NMCD, RS, and BS) there are six descriptive statistics and one count. The six descriptive statistics are: sum, minimum, maximum, mean (average), median, and standard deviation. Each variable's count refers to the number of individuals for whom the variable (say, Risk Score) could be calculated.
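To make the "one count plus six statistics" idea concrete, here is a small sketch that computes all seven numbers for a made-up list of Risk Scores. The five scores are invented, and the dictionary keys are our own labels, not the dataset's column names. We also assume a population standard deviation here; the source material does not say whether TransUnion uses the population or the sample formula.

```python
import statistics

def summarize(values):
    """Return the count and six descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: population standard deviation (pstdev), not sample (stdev).
        "std_dev": statistics.pstdev(values),
    }

# Made-up Risk Scores for five hypothetical individuals in one Postal Code.
scores = [430, 650, 700, 820, 870]
summary = summarize(scores)
for name, value in summary.items():
    print(name, value)
```

Each row of the real table holds numbers like these three times over: once for NMCD, once for RS, and once for BS.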
Confused yet? Don't worry, we'll walk through a sample row so that you can see what we mean.
What the table actually looks like: a sample row
Let's take a look at a typical row from the 2011Q1 TransUnion credit report characteristics dataset, so that it isn't a shocker when you open the file yourself.
This is what a sample row looks like. (There are a lot of columns, so we've separated it into four bite-sized chunks.) The first four columns show the Postal Code, city, province, and a variable called "Count". Postal Codes are the unique identifiers in this dataset, and I've blurred the last three characters of the sample Postal Code for the sake of anonymity. Count refers to the number of individuals who have a credit file with TransUnion. Just so you know, this is nearly all Canadians aged 18 years and over.
The subsequent seven columns show the count and six statistics for the non-mortgage consumer debt ("NMCD") variable. NMCD_Count is a tally, and the rest of the numbers are in current (unadjusted) Canadian dollars. So for this postal code, the person with the smallest amount of debt—of individuals with credit files—had precisely $117 in debt. The person with the most debt owed precisely $319,121. The mean (average) amount of consumer debt per individual living in this Postal Code was $20,515.37 plus half a cent. The median was far lower, at $8,113. (That tells us that there are a few big spenders who are dragging up the mean.) And finally, the standard deviation was $55,101.52 plus one-tenth of a cent.
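If the gap between the mean and the median seems mysterious, this short sketch shows how a single very large debt pulls the mean up while barely moving the median. Only the smallest, middle, and largest values (117, 8,113, and 319,121) are borrowed from the sample row; the other two are invented, so the result will not match the row's actual mean.

```python
import statistics

# Illustrative debts in dollars; 5000 and 12000 are made-up filler values.
debts = [117, 5000, 8113, 12000, 319121]

mean_debt = statistics.mean(debts)
median_debt = statistics.median(debts)

# Drop the single largest debt and see how far the mean falls.
mean_without_top = statistics.mean(sorted(debts)[:-1])

print(mean_debt, median_debt, mean_without_top)
```

The median stays put at the middle value, while the mean swings widely depending on whether the largest debt is included. That is why it is worth reporting both.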
Next we have seven columns that show the count and six statistics for the Risk Score ("RS") variable. As with NMCD, the RS_Count is a tally. But the rest of the numbers do not have units. They are based on an index with arbitrary values, like demerit points, or your score in Tetris. Now, it's worth mentioning this next point in big bold letters, because it is not intuitive:
The LOWER the Risk Score, the HIGHER the risk of credit delinquency
The HIGHER the Risk Score, the LOWER the risk of credit delinquency
Here the person with the highest level of risk has a Risk Score of 430. The person with the lowest level of risk has a Risk Score of 870. The mean (average) Risk Score is 731.882, and the median Risk Score is 809. The standard deviation of Risk Scores, for those who are interested, is 134.626. Again, all these values refer to individuals within one six-digit Postal Code.
The last seven columns show the count and six statistics for the Bankruptcy Score ("BS") variable. Again, the count is a tally of individuals with credit files whose Bankruptcy Score could be calculated. Like the Risk Score, the rest of the numbers are based on an index and do not have units. Just so things are crystal clear, I'm going to emphasise this point again, because it's true for the Bankruptcy Score as well:
The LOWER the Bankruptcy Score, the HIGHER the risk of declaring personal bankruptcy
The HIGHER the Bankruptcy Score, the LOWER the risk of declaring personal bankruptcy
Here, the person with the highest probability of declaring bankruptcy has a Bankruptcy Score of 152. The person with the lowest probability of declaring bankruptcy has a Bankruptcy Score of 944. The mean (average) Bankruptcy Score is 610.5, and the median is slightly higher, at 623. For the wonks: the standard deviation is 219.824.
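Because both scores run "backwards" (a lower score means a higher risk), it is easy to sort the wrong way round. Here is a hedged sketch of ranking Postal Codes from highest risk to lowest. The Postal Codes are fictional, most of the values are invented, and the column names RS_Mean and BS_Mean are assumptions rather than confirmed names from the file.

```python
# Hypothetical rows; Postal Codes and the RS_Mean / BS_Mean names are made up.
rows = [
    {"PostalCode": "A1A1A1", "RS_Mean": 731.882, "BS_Mean": 610.5},
    {"PostalCode": "B2B2B2", "RS_Mean": 802.4, "BS_Mean": 701.0},
    {"PostalCode": "C3C3C3", "RS_Mean": 655.1, "BS_Mean": 540.2},
]

def highest_risk_first(rows, score_column):
    """Sort ascending by score: the LOWER the score, the HIGHER the risk."""
    return sorted(rows, key=lambda row: row[score_column])

by_delinquency_risk = [r["PostalCode"] for r in highest_risk_first(rows, "RS_Mean")]
by_bankruptcy_risk = [r["PostalCode"] for r in highest_risk_first(rows, "BS_Mean")]

print(by_delinquency_risk)
print(by_bankruptcy_risk)
```

Note that the sort is ascending on purpose. Sorting descending would put the lowest-risk Postal Codes at the top of the list.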
We made it! Hopefully this has demystified the dataset a little.
What to do now
We encourage you to give the data a test drive, especially now that this profile is fresh in your mind. One simple exercise to get you using the data is to ask yourself: What's the average level of consumer debt for my Postal Code? (It's $27,766.65 for mine.) Eventually, we would love it if you used the data to inform your programming, report on trends, and even draft policy. After all, that is the ultimate purpose of the Community Data Program.
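If you would like to try the Postal Code exercise with a script rather than statistical software, the sketch below shows one possible approach: scan the table row by row and stop at the first match. The sample rows are invented, and the column names PostalCode and NMCD_Mean are assumptions (only NMCD_Count is named in this profile), so adjust them to match the real header.

```python
import csv
import io

# Hypothetical stand-in for the real table; names and values are made up.
SAMPLE_CSV = """PostalCode,NMCD_Count,NMCD_Mean
A1A1A1,40,20515.375
B2B2B2,17,27766.65
"""

def mean_debt_for(handle, postal_code):
    """Return mean non-mortgage consumer debt for one Postal Code, or None if absent."""
    wanted = postal_code.replace(" ", "").upper()   # tolerate "b2b 2b2" style input
    for row in csv.DictReader(handle):
        if row["PostalCode"].replace(" ", "").upper() == wanted:
            return float(row["NMCD_Mean"])
    return None                                     # not found in the table

result = mean_debt_for(io.StringIO(SAMPLE_CSV), "b2b 2b2")
missing = mean_debt_for(io.StringIO(SAMPLE_CSV), "Z9Z9Z9")
print(result, missing)
```

A result of None does not necessarily mean a typo: Postal Codes with fewer than 15 individuals with credit files were removed from the data, so yours may simply be suppressed.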
If you have any questions or comments, make sure to get in touch with us at firstname.lastname@example.org. We anticipate a lot of interest in this dataset, and so we recommend sharing findings, best practices, and press releases with other CDP members. Let us know too, so that we can show off your work!
Best of luck with this new and exciting dataset.