
Why Use Binned Variables in Predictive Models

Blog Post created by suehubbard Advocate on Jan 22, 2018

Binning is the process of creating mutually exclusive, collectively exhaustive categories for the values of each candidate model predictor. This includes a category, or bin, reserved for capturing missing information for each predictor. Classed models (such as Scorecards) calculate an optimized coefficient (β) for each model predictor bin, in addition to an intercept term (β₀). The resulting model prediction for any “scored” observation is calculated by summing the appropriate coefficients, as determined by the predictor values for that observation. This produces an effective model that is highly transparent and easy to interpret.
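
As a minimal sketch of that scoring logic (the intercept, bin edges, and coefficients below are invented purely for illustration), one binned predictor plus an intercept could be scored like this:

```python
# Minimal sketch: scoring one observation against a single binned predictor.
# Intercept, bin edges, and coefficients are hypothetical, for illustration only.

INTERCEPT = 0.25  # beta_0

# One coefficient per bin of predictor X, plus a reserved bin for missing values.
X_BINS = [
    (float("-inf"), 30.0, 0.10),   # X < 30
    (30.0, 60.0, 0.45),            # 30 <= X < 60
    (60.0, float("inf"), 0.80),    # X >= 60
]
X_MISSING_COEF = -0.05             # coefficient for the missing-information bin


def score(x):
    """Return the intercept plus the coefficient of the bin that x falls into."""
    if x is None:
        return INTERCEPT + X_MISSING_COEF
    for low, high, coef in X_BINS:
        if low <= x < high:
            return INTERCEPT + coef
    raise ValueError(f"value {x!r} does not match any bin")


print(score(42))    # falls in the 30 <= X < 60 bin -> 0.25 + 0.45
print(score(None))  # missing value -> 0.25 + (-0.05)
```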

 

Below is an example of a discrete additive model with a single binned predictor (X), transformed into indicator (dummy) variables for each user-defined range, or bin:

 

[Figure: why bin 1.png — a single binned predictor X expanded into one indicator (dummy) variable per bin, each with its own coefficient]
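
As a rough illustration of that transformation (the data and bin edges here are made up, and pandas is just one convenient way to do it), a numeric predictor can be cut into user-defined ranges, given a reserved bin for missing values, and expanded into indicator columns:

```python
import numpy as np
import pandas as pd

# Hypothetical predictor values for illustration, including one missing value.
df = pd.DataFrame({"X": [12.0, 45.0, np.nan, 71.0]})

# Cut X into user-defined ranges: [-inf, 30), [30, 60), [60, inf).
df["X_bin"] = pd.cut(
    df["X"],
    bins=[-np.inf, 30, 60, np.inf],
    labels=["lt30", "30to60", "ge60"],
    right=False,
)

# Add the reserved bin that captures missing information.
df["X_bin"] = df["X_bin"].cat.add_categories("missing").fillna("missing")

# Expand the binned predictor into one indicator (dummy) column per bin.
dummies = pd.get_dummies(df["X_bin"], prefix="X")
print(dummies)
```

The resulting columns (X_lt30, X_30to60, X_ge60, X_missing) are the indicator variables a classed model would assign coefficients to.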

Here’s a more formal equation for scoring observation i, using a model with J predictors, and Kj bins per predictor j:

[Figure: why bin 2.png — scoring equation for observation i with J predictors and Kj bins per predictor j]
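
Spelled out, with x_ijk equal to 1 when observation i falls into bin k of predictor j (and 0 otherwise), the score takes roughly this form:

$$
\text{Score}_i = \beta_0 + \sum_{j=1}^{J} \sum_{k=1}^{K_j} \beta_{jk}\, x_{ijk}
$$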

There are several types of modeling issues that can be addressed by predictor binning, including:

  1. Analytic issues
  2. Process issues
  3. Data issues
  4. Practical issues

 

Binning addresses each of these issues in a different way, and I will explain each one in depth in a series of corresponding posts. If you have any specific questions you'd like to see addressed, please comment here.

 

For more details, watch the recording of my webinar: FICO Webinar: Why Use Binned Variables in Predictive Models?
