
Process Issues Solved by Binning

Blog Post created by suehubbard Advocate on Feb 1, 2018

Binning provides a unifying framework for categorical and continuous predictors, as well as binary and continuous targets.

 

The binning process supports both continuous and categorical predictors. Continuous predictors can be run through an auto-binning algorithm that returns bin breaks optimized for a specific target. Each unique value of a categorical predictor can remain in its own bin, or values can be combined into coarser bins. Either way, the preliminary steps of model development stay consistent across predictors and across projects.
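
To make the common framework concrete, here is a minimal Python sketch of how a continuous and a categorical predictor can both end up as bin labels. The bin breaks, the grouping table, and the function names are hypothetical, chosen only for illustration. This is not the auto-binning algorithm, which would choose the breaks by optimizing against the target.

```python
from bisect import bisect_right

def bin_continuous(value, breaks):
    """Return the index of the bin that holds value.

    Bin 0 is everything below breaks[0]; bin i covers
    breaks[i-1] <= value < breaks[i]; the last bin is open-ended.
    """
    return bisect_right(breaks, value)

def bin_categorical(value, groups, default="OTHER"):
    """Map a raw category to its coarse bin, or to a catch-all bin."""
    return groups.get(value, default)

# Hypothetical breaks; an auto-binning step would normally supply these.
age_breaks = [25, 40, 60]

# Hypothetical coarse grouping of a categorical predictor.
region_groups = {"NY": "NORTHEAST", "MA": "NORTHEAST",
                 "CA": "WEST", "WA": "WEST"}

ages = [19, 25, 39, 61]
regions = ["NY", "CA", "TX"]

age_bins = [bin_continuous(a, age_breaks) for a in ages]
region_bins = [bin_categorical(r, region_groups) for r in regions]

print(age_bins)
print(region_bins)
```

Once both predictors are expressed as bins, the same downstream measures can be computed for each of them in the same way.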

 

Binning provides generalization in terms of both predictors and targets:

process issues 1.png

 

Note that, for continuous targets, bin-level predictive assessment is based on Normalized Mean, and variable-level assessment is based on R².

 

Binning supports predictive content measures that are invariant to the population odds (binary target) or to the population mean (continuous target).

 

Weight of Evidence derives its numeric value from the distribution of observations within each principal set, not from the relative sizes of the principal sets. As shown below, even when the population odds are multiplied by a factor of 10, the relationship between predictor and binary target remains unchanged.

 

Weight of Evidence provides for normalization:

process issues 2.png

By its definition, the same holds true for Information Value.

 

Information Value provides for normalization:

process issues 3.png
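
The invariance can also be checked numerically. The sketch below assumes the common definitions: Weight of Evidence for a bin is the natural log of (share of goods in the bin / share of bads in the bin), and Information Value is the sum over bins of (share of goods minus share of bads) times Weight of Evidence. The counts, the "goods"/"bads" labels, and the sign convention are illustrative assumptions, not figures from this post.

```python
import math

def woe_and_iv(goods, bads):
    """Return (per-bin Weight of Evidence, total Information Value).

    goods[i] and bads[i] are the counts of each outcome class in bin i.
    Assumes every bin contains at least one observation of each class.
    """
    total_goods = sum(goods)
    total_bads = sum(bads)
    woe = []
    iv = 0.0
    for g, b in zip(goods, bads):
        pct_g = g / total_goods  # share of all goods that fall in this bin
        pct_b = b / total_bads   # share of all bads that fall in this bin
        w = math.log(pct_g / pct_b)
        woe.append(w)
        iv += (pct_g - pct_b) * w
    return woe, iv

# Hypothetical counts for a three-bin predictor.
goods = [400, 350, 250]
bads = [10, 30, 60]
woe_base, iv_base = woe_and_iv(goods, bads)

# Multiply the population odds (goods : bads) by 10
# by scaling every good count and leaving the bads alone.
goods_x10 = [g * 10 for g in goods]
woe_x10, iv_x10 = woe_and_iv(goods_x10, bads)

print([round(w, 4) for w in woe_base])
print([round(w, 4) for w in woe_x10])
print(round(iv_base, 4), round(iv_x10, 4))
```

Because each class is normalized by its own total, scaling one class changes the population odds but leaves every per-bin share, and therefore every Weight of Evidence and the Information Value, where it was.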

 

This normalization provides a consistent basis for making comparisons. The variable-level Information Value can be used to compare the predictive strength of variables within a project, and also across projects. Predictors with higher Information Values have greater predictive strength than those with lower values:

process issues 4.png

 

Similarly, for continuous targets, the Normalized Mean and R² measures are both invariant to the population mean. Predictors with higher R² values have greater predictive strength than those with lower values. This gives projects based on a continuous target a consistent basis for making comparisons as well.
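
A small Python sketch can illustrate the continuous-target case for R2 (R-squared). It assumes that the variable-level R2 of a binned predictor is the between-bin sum of squares divided by the total sum of squares, which is the usual definition but may differ in detail from the one used in the tool. The data and the function name are hypothetical. Normalized Mean is left out because its exact definition is not given here.

```python
def binned_r2(bin_ids, targets):
    """Share of target variance explained by the bin means."""
    n = len(targets)
    overall_mean = sum(targets) / n

    # Accumulate per-bin sums and counts to get each bin's mean.
    sums, counts = {}, {}
    for b, y in zip(bin_ids, targets):
        sums[b] = sums.get(b, 0.0) + y
        counts[b] = counts.get(b, 0) + 1

    between = sum(counts[b] * (sums[b] / counts[b] - overall_mean) ** 2
                  for b in sums)
    total = sum((y - overall_mean) ** 2 for y in targets)
    return between / total

# Hypothetical data: three bins, three observations each.
bin_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
targets = [10.0, 12.0, 11.0, 15.0, 17.0, 16.0, 22.0, 25.0, 24.0]

r2_base = binned_r2(bin_ids, targets)

# Move the population mean by adding a constant to every target value.
shifted = [y + 100.0 for y in targets]
r2_shifted = binned_r2(bin_ids, shifted)

print(round(r2_base, 6), round(r2_shifted, 6))
```

Shifting every target value moves the population mean but leaves all deviations from it unchanged, so R2 computed this way stays the same.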

 

For more details, watch the recording of my webinar: FICO Webinar: Why Use Binned Variables in Predictive Models?
