Dataset Details

Document created by Makenna.Brei Advocate on Oct 27, 2017. Last modified by Makenna.Brei Advocate on Dec 11, 2017.
Version 3

Home Equity Line of Credit (HELOC) dataset

This competition focuses on an anonymized dataset of Home Equity Line of Credit (HELOC) applications made by real homeowners. A HELOC is a line of credit typically offered by a bank as a percentage of home equity (the difference between the current market value of a home and the outstanding balance owed on it). The customers in this dataset have requested a credit line in the range of $5,000 to $150,000. Homeowners seeking a HELOC must submit an application, which is assessed alongside their credit report. Specifically, the prediction problem is: given a homeowner's credit report, application information, and current mortgage loan data, will they repay their HELOC account within 2 years? This prediction is then used to decide whether the homeowner qualifies for a line of credit and, if so, how much credit should be extended.


Predictor variables

The predictor variables are all quantitative or categorical and come from three sources:

  1. Anonymized credit bureau data
  2. Application variables
  3. Loan variables that include internal scores within the bank, such as LoanAmount, Collateral, etc.

Please refer to the data dictionary for full descriptions of the variables and their sources. Note that there are various special values in the dataset, which require careful handling. Also note the section below on Monotonicity constraints.
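One common way to handle special values is to separate them from the genuine numeric values before modeling, so that a code such as "no record found" is never treated as a real quantity. The following is a minimal sketch of that idea, not the organizers' method. The sentinel codes in SPECIAL_VALUES and the helper names split_special and encode_row are assumptions made for illustration; take the actual codes and their meanings from the data dictionary.

```python
from typing import Dict, Optional, Tuple

# Assumption: placeholder sentinel codes. Replace them with the special
# values listed in the data dictionary.
SPECIAL_VALUES = {-9, -8, -7}


def split_special(value: float) -> Tuple[Optional[float], Optional[int]]:
    """Return (numeric_value, special_code); exactly one of the two is None."""
    if value in SPECIAL_VALUES:
        return None, int(value)
    return value, None


def encode_row(row: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Replace special codes with None and add one indicator per code."""
    encoded: Dict[str, Optional[float]] = {}
    for name, value in row.items():
        numeric, code = split_special(value)
        encoded[name] = numeric
        for special in sorted(SPECIAL_VALUES):
            # Hypothetical naming scheme: <column>_special_<code without sign>
            flag = 1.0 if code == special else 0.0
            encoded[f"{name}_special_{abs(special)}"] = flag
    return encoded


# Made-up example row; the values are not taken from the dataset.
example = encode_row({"LoanAmount": 25000.0, "Collateral": -8})
```

Keeping one indicator per code lets a model treat each special value differently, which matters if the codes mean different things. Whether the missing numeric value should then be imputed or left empty depends on the model being used.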



The target variable to predict is a binary variable called RiskPerformance. The value 0 indicates that a consumer failed to make their payments (reaching 90 days past due) during the 24 months after the credit account was opened, while the value 1 indicates that they made their payments.
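
Before fitting a model it is usually worth checking that the target contains only the two documented values and seeing how the classes are balanced. The sketch below assumes the RiskPerformance column has already been read into a list of integers coded as described above. The function name describe_target and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Coding taken from the description above:
# 0 = failed to make payments (90 days past due), 1 = made payments.
BAD, GOOD = 0, 1


def describe_target(values: Iterable[int]) -> Dict[str, float]:
    """Count both classes and report the share of bad outcomes."""
    counts = Counter(values)
    unexpected = set(counts) - {BAD, GOOD}
    if unexpected:
        raise ValueError(f"unexpected target values: {sorted(unexpected)}")
    total = counts[BAD] + counts[GOOD]
    if total == 0:
        raise ValueError("no target values supplied")
    return {
        "n": total,
        "bad": counts[BAD],
        "good": counts[GOOD],
        "bad_rate": counts[BAD] / total,
    }


# Made-up labels for illustration only.
summary = describe_target([0, 1, 1, 0, 1])
```

If the file stores the target in some other form, map it to 0 and 1 first so that the check above still applies.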