Binning is the process of creating mutually exclusive, collectively exhaustive categories for the values of each candidate model predictor. This includes a category, or bin, reserved for capturing missing information for each predictor. Classed models (such as Scorecards) calculate an optimized coefficient (β) for each model predictor bin, in addition to an intercept term (β₀). The resulting model prediction for any “scored” observation is calculated by summing the appropriate coefficients, as determined by the predictor values for that observation. This produces an effective model that is highly transparent and easy to interpret.
Below is an example of a discrete additive model with a single binned predictor (X), transformed into indicator (dummy) variables for each user-defined range, or bin:
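The original example did not survive formatting; a representative version, assuming three illustrative user-defined bins for X plus the reserved missing-value bin, would look like this:

```latex
\hat{y} \;=\; \beta_0
  \;+\; \beta_1 \,\mathbb{1}\{X \in \text{bin}_1\}
  \;+\; \beta_2 \,\mathbb{1}\{X \in \text{bin}_2\}
  \;+\; \beta_3 \,\mathbb{1}\{X \in \text{bin}_3\}
  \;+\; \beta_4 \,\mathbb{1}\{X \text{ missing}\}
```

Each indicator equals 1 when X falls in that bin and 0 otherwise. Because the bins are mutually exclusive and collectively exhaustive, exactly one indicator fires per observation, so the prediction is simply the intercept plus that bin's coefficient.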
Here’s a more formal equation for scoring observation i, using a model with J predictors and K_j bins per predictor j:
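The equation itself appears to have been lost in formatting; a form consistent with the description (with x_ijk an indicator equal to 1 when observation i falls in bin k of predictor j) would be:

```latex
\hat{y}_i \;=\; \beta_0 \;+\; \sum_{j=1}^{J} \sum_{k=1}^{K_j} \beta_{jk}\, x_{ijk},
\qquad x_{ijk} \in \{0,1\}, \quad \sum_{k=1}^{K_j} x_{ijk} = 1 \;\; \text{for each } j
```

The constraint that the indicators for each predictor sum to one reflects the mutually exclusive, collectively exhaustive bins described above.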
There are several types of modeling issues that can be addressed by predictor binning, including:
Each of these issues benefits from binning in a different way; I will explain each in depth in a series of follow-up posts. If you have any specific questions you'd like to see addressed, please comment here.
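To make the scoring arithmetic concrete, here is a minimal sketch in Python. The bin edges, coefficients, and predictor names are invented for illustration; in a real scorecard the coefficients would be estimated by fitting the model to data.

```python
# Score an observation with a binned (scorecard-style) additive model.
# All bin edges and coefficients below are illustrative, not from a real model.
import math

# For each predictor: right-exclusive bin edges, one coefficient per bin,
# and an extra coefficient for the reserved missing-value bin.
MODEL = {
    "intercept": 0.50,
    "predictors": {
        "age": {
            "edges": [25, 40, 60],          # bins: <25, 25-39, 40-59, >=60
            "coefs": [-0.30, 0.10, 0.25, 0.40],
            "missing_coef": -0.05,
        },
        "income": {
            "edges": [30_000, 70_000],      # bins: <30k, 30k-70k, >=70k
            "coefs": [-0.20, 0.05, 0.35],
            "missing_coef": -0.10,
        },
    },
}

def bin_index(value, edges):
    """Return the index of the bin containing value; the bins are
    mutually exclusive and collectively exhaustive."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def score(observation, model=MODEL):
    """Sum the intercept plus exactly one coefficient per predictor,
    chosen by the bin (or missing-value bin) the observation falls in."""
    total = model["intercept"]
    for name, spec in model["predictors"].items():
        value = observation.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            total += spec["missing_coef"]
        else:
            total += spec["coefs"][bin_index(value, spec["edges"])]
    return total

# intercept 0.50 + age bin (40-59) 0.25 + income missing -0.10  ->  approx 0.65
print(score({"age": 45, "income": None}))
# intercept 0.50 + age bin (<25) -0.30 + income bin (>=70k) 0.35  ->  approx 0.55
print(score({"age": 22, "income": 80_000}))
```

Note that missing values contribute their own coefficient rather than forcing the observation to be dropped, which is one of the practical attractions of binned models.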
For more details, watch the recording of my webinar: FICO Webinar: Why Use Binned Variables in Predictive Models?