Binning helps visualize the relationship between a predictor and the target variable.
An x-y scatter plot is helpful for depicting the relationship between a continuous predictor and its continuous target, but loses its effectiveness when the target is binary. As seen below, plotting a predictor against a binary target is inconclusive:
The numeric and graphical interpretation of Weight of Evidence is as follows:
- 0 : level k is neutral indicator
- + : level k is positive indicator (higher propensity to be a “1”)
- - : level k is negative indicator (higher propensity to be a “0”)
A piece-wise fitting methodology eliminates the requirement to transform predictor values for the purpose of forcing a linear relationship between the two. As a result, binning allows for understanding and preserving non-linear relationships:
A regression model would produce the linear fit depicted by the dotted line above. But binning would produce the piece-wise constant fit depicted by the solid yellow line, which is closer to the underlying relationship observed in the data. In addition, binning ensures that the resulting model coefficients remain in the context of the original, non-transformed predictor values, making the model more transparent and easier to interpret.
Consider an example where a large retailer needs to build a model to predict which consumers are most likely to purchase diapers in one of their stores. Their objective is to offer a coupon for diapers with the goal of bringing new parents into the store to purchase additional baby products as well. To execute this promotion most effectively, the retailer needs to control the expense of sending the coupon by targeting an appropriate audience; they can perform a test mailing and plot response rates by various demographics, in order to get the required data.
Example results of the test mailing performed to measure response rates by demographics:
The expected heightened response to the diaper coupon is seen in the 25-44 age range, and also in the 55-64 age range. If the retailer were using a linear regression model, Age would need to be transformed to smooth out this bi-modal pattern in response, which would disguise the actual relationship between Age and response. But using a classed model that bins the values for Age, the resulting coefficient pattern tells a valuable story regarding who is buying diapers: the parents…AND the grandparents.
For more details, watch the recording of my webinar: FICO Webinar: Why Use Binned Variables in Predictive Models?