Binning is a useful tool to have in your analytic arsenal (my colleagues and I have written several blogs on the subject if you’re interested in learning more about the topic). When I ask various analysts how they use binning to improve their projects, I tend to get many different answers. In this blog, I’ll dig into just one of these helpful applications of binning: side-by-side analysis of different targets (a.k.a. performance outcomes).
Have you ever worked on a project where the target is discussed endlessly before you can even begin? Rather than debating it without the evidence needed to understand the nuances of the decision, you can use side-by-side binning to provide that insight.
The length of the target window is commonly debated when developing behavior scorecards. They are often built with a 6-month target, but sometimes the question arises: what if the target were 12 months instead? There are many ways to analyze the difference, but my preferred method includes side-by-side binning.
To perform this type of side-by-side binning, begin by creating a dataset that uses a single observation date but includes targets for both time periods (each with its own performance date). Then, run two binnings: the first with the 6-month target in the primary position and the 12-month target in the secondary position, and the second with the positions swapped, so that the 12-month target is primary and the 6-month target is secondary.
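The dataset setup described above can be sketched in a few lines of Python. This is only an illustration, not how any particular tool does it: the field names (max_util_3m, months_to_first_bad, bad_6m, bad_12m) and the three sample accounts are hypothetical, and I am assuming you already know how many months after the observation date each account first met your bad definition.

```python
from typing import Optional


def make_targets(months_to_first_bad: Optional[int]) -> dict:
    """Return 6- and 12-month bad flags for one account.

    months_to_first_bad is a hypothetical field: the number of months after
    the shared observation date at which the account first met the bad
    definition, or None if it never did within the data.
    """
    went_bad = months_to_first_bad is not None
    return {
        "bad_6m": int(went_bad and months_to_first_bad <= 6),
        "bad_12m": int(went_bad and months_to_first_bad <= 12),
    }


# Made-up accounts, all observed on the same observation date.
accounts = [
    {"id": 1, "max_util_3m": 0.95, "months_to_first_bad": 4},
    {"id": 2, "max_util_3m": 0.40, "months_to_first_bad": None},
    {"id": 3, "max_util_3m": 0.88, "months_to_first_bad": 9},
]

# One row per account carrying BOTH targets, ready for either binning run.
dataset = [{**a, **make_targets(a["months_to_first_bad"])} for a in accounts]
```

Account 3 is the interesting case: it is good under the 6-month target but bad under the 12-month target, and accounts like it are what drive any difference between the two binnings.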
Once the binning has run, sort by information value and analyze the results in two ways: first, by outputting the entire list of variables to Excel and analyzing the results side-by-side; then, by reviewing patterns of individual variables in the binning detail.
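If you want to see what the side-by-side information value list looks like outside of a binning tool, here is a minimal sketch. It uses the standard definitions (Weight of Evidence for a bin is the log of the share of goods over the share of bads, and information value sums the difference in shares times the WoE). Everything else is my assumption for illustration: the variable names, the fixed bin edges, the 0.5 smoothing for empty cells, and the made-up account counts.

```python
import math
from bisect import bisect_right


def information_value(values, target, edges, smooth=0.5):
    """IV of one variable against one 0/1 target, using fixed bin edges.

    smooth is an assumed small constant added to every cell so that a bin
    with no goods or no bads does not produce log(0).
    """
    n_bins = len(edges) + 1
    good = [smooth] * n_bins
    bad = [smooth] * n_bins
    for v, t in zip(values, target):
        b = bisect_right(edges, v)  # index of the bin that v falls into
        if t == 1:
            bad[b] += 1
        else:
            good[b] += 1
    total_good, total_bad = sum(good), sum(bad)
    iv = 0.0
    for g, b in zip(good, bad):
        pct_good, pct_bad = g / total_good, b / total_bad
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv


def side_by_side_iv(data, variables, primary, secondary):
    """Rows of (variable, IV on primary, IV on secondary), sorted by primary."""
    rows = []
    for name, edges in variables.items():
        values = [r[name] for r in data]
        iv_primary = information_value(values, [r[primary] for r in data], edges)
        iv_secondary = information_value(values, [r[secondary] for r in data], edges)
        rows.append((name, iv_primary, iv_secondary))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Made-up data: (max_util_3m, status, bad_6m, bad_12m, number of accounts)
pattern = [
    (0.20, 0, 0, 0, 50), (0.20, 1, 1, 1, 2),
    (0.60, 0, 0, 0, 30), (0.60, 0, 0, 1, 3), (0.60, 1, 1, 1, 3),
    (0.95, 0, 0, 0, 10), (0.95, 0, 0, 1, 6), (0.95, 1, 1, 1, 4),
]
data = [
    {"max_util_3m": u, "status": s, "bad_6m": b6, "bad_12m": b12}
    for u, s, b6, b12, n in pattern
    for _ in range(n)
]

# Hypothetical bin edges per variable.
variables = {"max_util_3m": [0.5, 0.8], "status": [0.5]}

# Run 1: 6-month target primary. Run 2: 12-month target primary.
run_6m_primary = side_by_side_iv(data, variables, "bad_6m", "bad_12m")
run_12m_primary = side_by_side_iv(data, variables, "bad_12m", "bad_6m")

for name, iv1, iv2 in run_12m_primary:
    print(f"{name:12s}  primary IV={iv1:.3f}  secondary IV={iv2:.3f}")
```

Note that this sketch reuses the same bin edges for both targets, so swapping primary and secondary only changes the sort order. In a binning tool the bins may be driven by the primary target, which is why running it both ways is worthwhile.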
Below is an example of the side-by-side information values:
Prf 1 = 12-month target, Prf 2 = 6-month target. Results sorted by Prf 1.
Let’s jump into the data. We’ll start by analyzing the 6-month target (Prf 2) to see where its sort order does not match that of the 12-month target (Prf 1). You can see that in row 6, status is more important relative to other variables in Prf 2. In row 5, the maximum utilization over the last 3 months is relatively less important in Prf 2 than the other delinquency variables or the payment-to-amount-due and payment-to-balance ratios. This difference between the 6-month and 12-month targets makes sense if utilization is a leading indicator of delinquency performance. In that case, the bad behavior associated with maximum utilization takes longer to show up in performance, whereas status, which is likely to be highly related to delinquency and over-limit status, is tied to shorter-term bad performance.
Ultimately, this side-by-side binning revealed that the set of most important variables depends on the length of the target window. That is insight you would not otherwise get from this same set of data.
Given the information above about variable importance, I expect slight differences in the relative importance of variables in the model, but do not expect the differences to be major. Armed with these differences in information value, we can now analyze the patterns in the Weight of Evidence (WoE). Observe the similarities in the patterns below:
It is clear that the patterns are very similar. In fact, the patterns across all the variables are nearly identical. From this, one can infer that if these variables were used in a scorecard, they would end up with similar weight patterns under either target. So, similar sets of accounts would be “rewarded” or “penalized” in a comparable fashion regardless of the length of the target window.
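If you would rather put a number on "very similar" than rely on eyeballing, one simple option is to compute the WoE per bin under each target and check that the signs agree and that the two patterns are highly correlated. The sketch below does that for a single variable. The bin counts are made up for illustration (four utilization bins, low to high), and the choice of sign agreement plus Pearson correlation as the similarity check is my own assumption rather than a feature of any tool.

```python
import math


def woe(goods, bads):
    """Weight of Evidence per bin: ln(share of goods / share of bads)."""
    total_good, total_bad = sum(goods), sum(bads)
    return [math.log((g / total_good) / (b / total_bad)) for g, b in zip(goods, bads)]


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical good/bad counts for the same four bins under each target.
# Each bin holds the same accounts; more of them are bad by month 12.
goods_6m, bads_6m = [400, 300, 200, 80], [4, 6, 10, 12]
goods_12m, bads_12m = [392, 291, 188, 69], [12, 15, 22, 23]

w6 = woe(goods_6m, bads_6m)
w12 = woe(goods_12m, bads_12m)

# Do the two targets reward and penalize the same bins?
same_direction = all((a > 0) == (b > 0) for a, b in zip(w6, w12))
correlation = pearson(w6, w12)

for i, (a, b) in enumerate(zip(w6, w12), start=1):
    print(f"bin {i}: WoE 6m={a:+.2f}  WoE 12m={b:+.2f}")
print("same direction:", same_direction, " correlation:", round(correlation, 3))
```

A check like this is no substitute for looking at the binning detail, but it is a quick way to flag the handful of variables whose patterns diverge when you have hundreds to review.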
Of course, individual situations will differ, but based on this example, my recommendation would be to use the shorter window. Not only will the observation date be more recent, thus more likely to represent the current population, but you could collect multiple observation dates more rapidly and validate the model on the development definition more quickly. I think these advantages outweigh the minimal improvement you might see by using the longer-term window. This type of analysis can add great value and hopefully shorten those endless debates about the length of target windows!
To apply this technique, you can use the side-by-side binning feature in Scorecard Pro; it will soon be available in Analytics Workbench as well.