
FICO Community Articles


Co-author: pietrobelotti@fico.com

 

Terrific beaches in Valencia, Spain? Excellent wine and great food in Bordeaux, France? Or a soothing road-trip to Greenville, South Carolina? All of that combined with the latest and greatest research results in Mathematical Optimization?

 

That's right, three of the most important scientific conferences in the field of Mathematical Optimization are approaching and the FICO Xpress team will be present at all of them: the 23rd International Symposium on Mathematical Programming in Bordeaux, the 2018 Mixed Integer Programming Workshop at Clemson University, and the 29th European Conference on Operational Research in Valencia. This blog gives a sneak preview of what to expect at each of them.

 

ISMP2018, Bordeaux

ISMP is a triennial congress, dating back to 1951, the very, very early days of Mathematical Programming. And 2018 is an ISMP year, yay! There will be really interesting (semi-)plenaries by leading researchers from the field. Francis Bach will investigate "The relationship between machine learning and optimization," Oktay Günlük will present "Recent progress in MIP," Emmanuel Candès will ask us "What’s happening in nonconvex optimization?," Matteo Fischetti will talk about "Modern Branch-and-Cut Implementation," and Santanu Dey will bring us a "Theoretical Analysis of Cutting-Planes in IP Solvers."

 

Concerning the FICO presence, Michael Perregaard will speak about the "Latest Developments in the FICO Xpress-Optimizer," Csaba Mészáros will present "On the implementation of the crossover algorithm," and Johannes Müller will discuss "Creating an optimization web app with FICO Xpress." Don't miss out!

 

MIP2018, Clemson, SC

The Mixed Integer Programming (MIP) workshop is now a staple in the discrete optimization community. This year, it takes place in beautiful Greenville, South Carolina, and it is organized with the local support of a strong Optimization group from nearby Clemson University. FICO is sponsoring MIP2018 and Pietro Belotti, Xpress developer and MINLP expert, will represent FICO at this workshop.

 

In its fifteenth edition, the MIP workshop has grown to be one of the most coveted events of the year for its cutting-edge content and outstanding speakers. As has been the trend since 2010, there is now a strong presence of linear as well as nonlinear MIP presentations. Cole Smith will talk about the use of binary decision diagrams in binary optimization and Jim Ostrowski will present results on almost-symmetry in integer programming. Pierre le Bodic will discuss branch-and-bound tree size estimates in MIPs and Aida Khajavirad will describe strong polyhedral relaxations for polynomial optimization. Finally, Miles Lubin will show the latest results on mixed-integer convex representability.

 

CmnO-jpWgAQyqVp.jpg

Robert Aumann's plenary at EURO2016 in Poznan

 

EURO2018, Valencia

EURO focuses on Operations Research (OR) and Management Science; it is Europe's largest conference in this field. This year's EURO offers some great plenaries and keynote lectures. To name just a few: MIT's chancellor Cynthia Barnhart will present on "Air Transportation Optimization," Patrick Jaillet will discuss "Online Optimization for Dynamic Matching Markets," Dolores Morales will speak about "Interpretability in Data Science," and Tamás Terlaky will review "Six Decades of Interior Point Methods."

 

Together with Bob Fourer from AMPL, we organized a stream on Optimization Software. It consists of six sessions with 21 presentations altogether, two of which are from the FICO Xpress team. I, Timo Berthold, will talk about "How to fold a linear program" and present the latest developments in the FICO Xpress Optimizer. Susanne Heipcke will present on "Opening Xpress Mosel" and how to connect any solver to the powerful Mosel language (you can get started yourself with the free Xpress Community License). Furthermore, I will be part of the panel discussion on "MIP solvers in practice," organized by Björn Thalén.

 

I hope to see you at at least one of these conferences. If not, I will also be at OR2018 in Brussels in September, organizing a "Software Applications and Modelling Systems" stream. And after that, the INFORMS annual meeting in Phoenix is just around the corner... Stay tuned!

 

Get the latest news and connect with the FICO Optimization team anytime in our Optimization Community, and follow us on Twitter @FICO_Xpress

 

PS: Math fun fact concerning the banner image: Did you know that the numbers of clockwise and counter-clockwise seed spirals on a sunflower face are always consecutive Fibonacci numbers? Enjoy your summer!

Binning is a useful tool to have in your analytic arsenal (my colleagues and I have written several blogs on the subject if you’re interested in learning more about the topic). When I ask various analysts how they use binning to improve their projects, I tend to get many different answers. In this blog, I’ll dig into just one of these helpful applications of binning: side-by-side analysis of different targets (a.k.a. performance outcomes).

 

The Issue

Have you ever worked on a project where the target is discussed endlessly before you can even begin? Rather than debating this without the ability to understand the nuances of the decision, you can use side-by-side binning techniques to provide insight.

 

The length of the target window is commonly debated when developing behavior scorecards. They are often built with a 6-month target, but sometimes the question arises: what if the target were 12 months instead? There are many ways to analyze the difference, but my preferred method includes side-by-side binning.

 

The Project

To perform this type of side-by-side binning, you must begin by creating the dataset with the same observation date but include targets over both time periods (with different performance dates). Then, run two binnings: the first with the 6-month target in the primary position and the 12-month target in the secondary position, and the second with the 6-month target in the secondary position and the 12-month target in the primary position.

 

Once the binning has run, sort by information value and analyze the results in two ways: first, by outputting the entire list of variables to Excel and analyzing the results side-by-side; then, by reviewing patterns of individual variables in the binning detail.
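For illustration outside of the FICO tooling, here is a minimal pandas sketch of how such a side-by-side comparison could be assembled. The column names (max_util_3m, bad_6m, bad_12m) are invented for the example, and the quantile binning below is only a stand-in for the classing used in Scorecard Pro.

import numpy as np
import pandas as pd

def woe_iv(df, feature, target, bins=10):
    """Quantile-bin a numeric feature and compute WoE and IV against a binary target."""
    binned = pd.qcut(df[feature], q=bins, duplicates="drop")
    counts = df.groupby(binned, observed=True)[target].agg(bad="sum", total="count")
    good = counts["total"] - counts["bad"]
    dist_good = good / good.sum()
    dist_bad = counts["bad"] / counts["bad"].sum()
    woe = np.log((dist_good + 1e-6) / (dist_bad + 1e-6))   # small offset avoids log(0)
    iv = ((dist_good - dist_bad) * woe).sum()
    return woe, iv

# Side by side: same observation date, two performance windows (illustrative names).
# woe6,  iv6  = woe_iv(data, "max_util_3m", "bad_6m")
# woe12, iv12 = woe_iv(data, "max_util_3m", "bad_12m")
# Repeating this over all candidate variables and sorting by IV gives a comparison
# table like the one shown below.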

 

Below is an example of the side-by-side information values:

 

Picture1.png

 

Prf 1 = 12-month target, Prf 2 = 6-month target. Results sorted by Prf 1.

 

Let’s jump into the data; we’ll start by analyzing the 6-month target (Prf 2) to see where the sort order does not match that of the 12-month target (Prf 1). You can see that in row 6, status is more important relative to other variables in Prf 2. In row 5, the maximum utilization over the last 3 months is relatively less important compared to other delinquency variables or the pmt/amt-due and pmt/balance ratios. This difference between the 6-month and 12-month targets makes sense if utilization is a leading indicator of delinquency performance: the bad behavior associated with maximum utilization is not detected as quickly, whereas status, which is likely to be highly correlated with delinquency and over-limit status, is related to shorter-term bad performance.

 

Ultimately, this side-by-side binning revealed that a different set of variables is important depending on the length of the target window. This provides insight you would otherwise not get from this same set of data.

 

Given the information above about variable importance, I expect slight differences in the relative importance of variables in the model, but do not expect the differences to be major. Armed with these differences in information value, we can now analyze the patterns in the Weight of Evidence (WoE). Observe the similarities in the patterns below:

 

classer1.2.png

 

It is clear that the patterns are very similar. In fact, the patterns across all the variables are nearly identical. From this, one can infer that if used in a scorecard, they would end up with similar weight patterns. So, similar sets of accounts would be “rewarded” or “penalized” in a comparable fashion regardless of the length of the target window.

 

Of course, individual situations will differ, but based on this example, my recommendation would be to use the shorter window. Not only will the observation date be more recent, thus more likely to represent the current population, but you could collect multiple observation dates more rapidly and validate the model on the development definition more quickly. I think these advantages outweigh the minimal improvement you might see by using the longer-term window. This type of analysis can add great value and hopefully shorten those endless debates about the length of target windows!

 

To apply this technique yourself: side-by-side binning is a feature of Scorecard Pro and will soon be available in Analytics Workbench.

Linear Programming (LP) and Mixed-Integer Programming (MIP) have been standard Optimization business tools for several decades now. While some major players in the field are always seeking new applications, FICO has a solid understanding of what LP and MIP are good at. Therefore, we are focused on how to use these tools to reap the most benefits.

 

Since LP and MIP are precisely described by their mathematical definition, there is a narrow specification of the functionality of an LP/MIP solver. The aspects mentioned below bring many advantages for users of LP and MIP solver software:

 

  1. MIP solvers are here to stay. Xpress, the most sustainable player on the field, has been around for more than 35 years. Throughout this time, Xpress has continuously delivered state-of-the-art optimization software.
  2. MIP solver speed has increased 30,000-fold over the last twenty years thanks to performance enhancements in solver development.
  3. The existing basic APIs of Xpress (and other solvers) hardly ever change, making them convenient to learn and use. Of course, as more functionality is added, the advanced API capabilities continue to grow over time.
  4. As a result of the precise specification, the APIs of the leading solvers are quite similar.

 

Given the similarities in APIs, it is remarkably easy to move your existing LP or MIP applications from IBM ILOG CPLEX or Gurobi to FICO Xpress. Once you have moved over, you can benefit from the advanced features of the Xpress API, such as the built-in concept of goal programming, a much richer set of callbacks, and a larger number of user controls to fine-tune the solver behavior for your company's needs. FICO Xpress Optimization has the widest breadth of industry-leading optimization algorithms and technologies, including constraint programming and (mixed-integer) nonlinear programming.

AdobeStock_113720292.jpeg

The easiest way to evaluate Xpress as an alternative for any other solver is to export your model in MPS format. All three major vendors, smaller vendors, and academic solvers support this format. This enables you to evaluate Xpress on your favorite optimization model in just minutes.
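As a rough sketch of what that round trip can look like, assuming you already use the gurobipy Python package and have the xpress Python package installed (file names are invented for the example):

import gurobipy as gp
import xpress as xp

# Export the existing model from Gurobi to an MPS file...
gurobi_model = gp.read("my_model.lp")      # or build the model in code as usual
gurobi_model.write("my_model.mps")

# ...then read and solve the very same file with Xpress.
xpress_prob = xp.problem()
xpress_prob.read("my_model.mps")
xpress_prob.solve()
print(xpress_prob.getObjVal())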

 

If you should choose to change solvers based on this evaluation, completely exchanging the underlying MIP solver of your application is not complicated. Assuming you have an application that is using the Gurobi C API, the following outlines the basic steps of transitioning to Xpress.

 

Creating your problem data structure

The first step is to initialize your Xpress instance and create a problem. This is the equivalent of loading an environment and creating a model in Gurobi. In Xpress, you need to call XPRSinit() and XPRScreateprob(), where in Gurobi you would call GRBloadenv() and GRBnewmodel(), respectively. To clean up, you call XPRSdestroyprob() and XPRSfree() instead of GRBfreemodel() and GRBfreeenv(), respectively.

 

Modifying your problem instance

Xpress provides a set of routines for manipulating the problem data. These include a set of routines for adding and deleting problem rows and columns. Rows and columns can be added to the problem together with their linear coefficients using XPRSaddrows() and XPRSaddcols(), respectively. A call to XPRSaddrows() corresponds to a call of GRBaddconstrs(). A call to XPRSaddcols() corresponds to a call of GRBaddvars().

 

Running the optimization algorithms

The two main commands to run the optimization algorithms on a problem are XPRSmipoptimize() and XPRSlpoptimize(), depending on whether the problem needs to be solved with or without respecting integrality requirements on the variables. This is akin to GRBoptimize(), which covers both continuous and discrete models but might require you to change variable type attributes if you want to solve the continuous relaxation independently of the full model.

 

Processing the optimization results

FICO Xpress provides several functions for accessing solution information. You can access the current LP solution in memory using XPRSgetlpsol(), which returns copies of the double-precision values of the decision variables, slack variables, dual values and reduced costs. Similarly, you can obtain the last MIP solution with the XPRSgetmipsol() function. In Gurobi, these functionalities are covered by various attributes, foremost GRB_DBL_ATTR_X for the actual solution values.
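The C functions above are the authoritative mapping. For readers who prefer a runnable illustration, here is a minimal sketch of the same four steps (create, build, optimize, query) using the xpress Python package; the toy model is invented for the example.

import xpress as xp

# 1. Create the problem (the Python interface wraps XPRSinit/XPRScreateprob).
prob = xp.problem()

# 2. Add columns and rows: two variables, one constraint, and an objective.
x = xp.var(vartype=xp.integer, ub=10)
y = xp.var(lb=0)
prob.addVariable(x, y)
prob.addConstraint(3 * x + 2 * y <= 12)
prob.setObjective(5 * x + 4 * y, sense=xp.maximize)

# 3. Run the optimization (a MIP solve, since x is an integer variable).
prob.solve()

# 4. Query the results.
print(prob.getObjVal(), prob.getSolution(x), prob.getSolution(y))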

 

A more detailed walk-through for a migration from IBM ILOG CPLEX or Gurobi to Xpress is given in the white papers referenced below. There, we also cover user controls, problem attributes and callbacks. Further, we explain how Python applications can be ported from Gurobi to Xpress, and how C++, Java and .NET applications can be migrated from IBM ILOG CPLEX to Xpress. Let us know in the comments if there are other languages you would like to see a similar document for.

 

And hey, there's no reason to stop here! Once your application works with the Xpress-Optimizer libraries, you might want to learn about the other convenient tools in our portfolio such as Xpress Workbench, Xpress Executor, Xpress Solver and Xpress Insight.

 

One of our more popular tools is Xpress Insight. This software enables businesses to rapidly deploy optimization models as powerful applications. It lets teams work in a collaborative environment with interactive visualization and an interface designed for business users, so they can work with models in easy-to-understand terms and account for the trade-offs and sensitivities implicit in the business problem. Xpress Workbench integrates with Xpress Insight for seamless development and deployment of complete optimization solutions. It includes Xpress Mosel, the market-leading analytic orchestration, optimization modeling and programming language.

 

Find more detailed instructions in these technical guides: How to Migrate from Gurobi to Xpress and How to Migrate from IBM ILOG CPLEX to FICO Xpress. Learn more about all FICO Xpress Optimization has to offer in our community and user forum and follow us on Twitter @FICO_Xpress.

I frequently field questions about using business rules for mortgage pricing and approvals, so I’d like to share my top three best practices in this area.

 

1. How to Include Audit Log Messaging in a Decision Table

You want to deliver a value in response to hitting a cell in a decision table, but you also want to return all the information in the conditions. The key to doing this is using a Decision Table Cell Value Provider. You can create one for each column in the table to return all those results.

 

Picture1.png

 

Aside from the header of the column you want the Provider to reflect, you don’t need to fill anything else out.

 

Picture2.png

 

When building out the result you want to return, you need to reference all the value holders that you’ve created providers for:

 

Picture3.png

 

2. Formatting the Input into a Table in a Rule Maintenance Application

When you are creating a template and want to format how it looks when editing, you can use any HTML tags to format the display. One of the most useful tips is to build it all within <table> tags. Here is a mortgage product template example and how it looks in the RMA:

 

Picture4.png

 

As a result, the Rule Maintenance Center will look like the following:

 

Picture5.png

 

3. Using Advisor Classes to Simplify Very Complex Structures

The object model used for the input to the demonstration was the MISMO Standard, which, while very in-depth, can be overwhelming. To manage this, I created an Advisor class that holds all the key indicators necessary. As the first step in the ruleflow, I called a setup function that goes through the incoming data source and maps it into the more straightforward Advisor class that is used for the intermediate storage of values.

 

At this point, the rules are based either on the calculation of values or on values that are spread across an array of input. At the completion of the project, results are mapped back into the original class.

 

Picture6.png

 

Here is an example of the start of the mapping function:

 

Picture7.png

 

These examples will help you as you build your own complex Blaze Advisor projects. If you have any questions or comments, leave them here or join the discussion at the FICO® Blaze Advisor® User Forum.

We all know that scorecards can be excellent predictive models, lauded for their combination of simplicity, transparency, and accuracy. And we also love machine learning algorithms for their highly automated and remarkably thorough search for signal. So if you’re already building random forests, gradient boosted trees or neural networks in your analytic projects, is there still a role for scorecards?

 

I’ll admit this isn’t the first time, the second time or even the third time we’ve entertained this question, but the answer remains a resounding “yes.”

 

Scorecards and Machine Learning: A Potent Partnership

Scorecards, and especially segmented scorecards, are a phenomenally efficient and transparent means to create strong predictive models that everyone can immediately understand. This is particularly true when you know the data well and have a great set of predictive features to model from.

 

Scorecards are also incredibly helpful for bolstering our knowledge of our customers and their behaviors. Because they actively engage a human in the loop during the model development process, scorecards help the data scientist quickly learn about the multivariate relationships in the data, which can in turn lead to better predictions and decisions downstream.

 

On the other hand, machine learning methods can be highly automated, and don’t even really afford the chance (or need?) for the data scientist to carefully inspect every aspect of the resulting model. In their search, machine learning techniques are often very thorough, and can automatically capitalize on hidden, latent, non-additive features in the data.

 

But here’s the rub: if you want to understand the insights captured within the machine learning model, you may be out of luck, because common ML libraries simply cannot reveal them to you. Without a healthy dose of what we call “xAI” (explainable artificial intelligence), it’s nearly impossible to know what’s actually happening inside that machine learned model.

 

supervised learning_ML v Scorecard.png

 

In the upcoming release of FICO® Analytics Workbench™, analysts and data scientists will have machine learning, deep learning and xAI techniques at their fingertips — right beside our advanced, leading techniques for scorecard development — allowing them to build, deploy and explain ever-stronger prediction machines.

 

We’re excited to share some of the details behind these methods and show you the new and improved scorecards in action. Join our complimentary webinar on May 30 at 1:30pm PST, presented by me, Andy Flint, and by Lamar Shahbazian; we are FICO’s Analytic Tools product managers. In the webinar, we’ll discuss:

 

  • The pros and cons of scorecard and machine learning techniques
  • What a scorecard is and why it is valuable
  • How you can incorporate machine learning to amplify predictive strength
  • How to combine these techniques for stronger models, without losing explainability
  • The empowering new design for scorecard development in FICO® Analytics Workbench™

 

Click here to register for this free webinar on May 30, 2018.

In this blog, I'll dive into Xpress Insight’s mirror database in relation to Tableau, providing insights into how best to configure it. This topic was first introduced in my previous blog, Visualizing your data in Xpress Insight.

 

Within an Insight application the input and result data can vary from kilobytes up to a gigabyte. To reduce the data footprint, it is stored in a highly compressed format within the Insight repository. Although this is good from a data footprint point of view, it is a format that Tableau is not able to easily consume. To resolve this, the mirror database is used with the sole purpose of exposing only key entities of an application in an uncompressed format so they can be consumed by Tableau.

 

The entities to mirror are defined through the application companion file. At its simplest, you define the prefix to use for the tables and each of the mirror tables that should appear in the mirror database. For example, the FlowShop example provided with Xpress Insight uses the following configuration:

<database-mirror table-prefix="flowshop_">
    <mirror-tables>
        <mirror-table name="plan">
            <entity name="startj"/>
            <entity name="DUR"/>
        </mirror-table>
        <mirror-table name="interval">
            <entity name="interval_start"/>
            <entity name="interval_duration"/>
        </mirror-table>
        <mirror-table name="goal">
            <entity name="Goal"/>
        </mirror-table>
    </mirror-tables>
</database-mirror>

When deciding which entities to add to a mirror table, keep the following rule in mind: a mirror table can only contain one of the following:

  • One or more array entities with the same index set.
    • Looking at the FlowShop example above, startj and DUR both use the same index set of MACH and JOBS.
  • A set.
    • Looking at the FlowShop example above, a table could be added to hold the STATES set as follows:

<mirror-table name="allstates">

    <entity name="STATES"/>

</mirror-table>

  • One or more scalars, by repeating the entity element for each scalar and specifying the scalar in the name attribute of the entity.
    • Looking at the FlowShop example above, a table could be added to hold the AvgMakespan and Tardiness scalars as follows:

<mirror-table name="myscalars">

    <entity name="AvgMakespan"/>

    <entity name="Tardiness"/>

</mirror-table>

  • All parameters, by entering parameters in the name attribute of the entity.
    • For example:

<mirror-table name="myparameters">

    <entity name="parameters"/>

</mirror-table>

Scenario Execution

When an application is uploaded to Xpress Insight, the mirror database tables defined in the application companion file are created but contain no data. Loading a scenario updates the tables whose associated entities are input data; executing a scenario updates the tables whose associated entities are result data. When a scenario is executed, the data for that scenario is deleted from the mirror database during the completing phase of the scenario execution. This is the final stage of the execution and is also when the data from the execution is stored within the Xpress Insight repository.

 

Partial Mirroring

Using the default mirror strategy known as partial mirroring, a mirror table will only be populated if the total number of rows to be inserted into that table for the scenario is less than 50,000. Otherwise, the table is populated when a user opens a Tableau view within Insight that uses this table.

 

This approach balances Tableau view responsiveness, database volumes, and execution time; it should be suitable for most apps. Although this is not relevant for the simple FlowShop example, which uses and generates a small amount of data, it becomes far more critical when an application is using/generating large amounts of data.

 

With large amounts of data, there may be Tableau views that rely on large tables but are only viewed infrequently. Where this is the case, it makes sense to update the mirror database only when the data is required, rather than slow down the overall scenario execution time (this can be overridden, as discussed later in this post). To provide this capability, Xpress Insight internally holds a mapping of Tableau views to mirror tables, which is refreshed on user log-in or when an application update is initiated. Therefore, it is important that Tableau workbooks are published via Xpress Insight rather than directly to the Tableau server using Tableau Desktop when partial mirroring is used.

 

Although multiple users can access the Tableau views, the scenario data is only written once to the database. Each time a user accesses a Tableau view within the Xpress Insight client, the Xpress Insight server ensures that the data is up to date. It will also write an entry into the insight_security table to indicate that the user is authorized to access this data. If you're interested in more detail on this subject, a future post will discuss how best to secure the data used by a Tableau view within Xpress Insight, so keep an eye on this community blog.

 

Overriding the default settings

Typically, the database mirror is defined in the application companion file as follows:

<database-mirror table-prefix="flowshop_">

As the sync-after-execution attribute has not been included in the definition, it defaults to "auto" and Xpress Insight will use the default mirror strategy, known as partial mirroring. If you wish to populate all of the mirror database tables regardless of the data size, then the sync-after-execution attribute should be included and set to a value of "true." Typically, this is used where execution time is not important but availability of the data for the Tableau views is.

 

It can also be useful when developing new or modifying existing Tableau views, where you will want all of the data available during development. In this case, it is recommended that the application companion file is not altered; instead, disable partial mirroring by unchecking the Partial mirroring enabled checkbox located on the Tableau page of the Xpress Insight Web Admin client.

insight-web-admin.png

The sync-after-execution attribute can also be applied at the mirror-table level. For example:

<mirror-table name="interval" sync-after-execution="true">

    <entity name="interval_start"/>

    <entity name="interval_duration"/>

</mirror-table>

This is particularly useful where you know that tables commonly used in the Tableau views will contain more than 50,000 rows per scenario. In this situation, you may want to mirror the data at execution time to speed up the load time of those views.

 

Finally, it is possible to manually control when a mirror table is populated by specifying the Tableau workbooks or workbook views that use it. For example:

<mirror-table name="clients">

    <entity name="CLIENT_LATITUDE"/>

    <sync-for-tableau-view>

        <include workbook="Analysis"/>

        <exclude workbook="Forecast" view="Summary"/>

    </sync-for-tableau-view>

</mirror-table>

It is recommended not to use this, as it will be deprecated in the future. Instead, partial mirroring should be used.

 

In Summary

  • For most applications, the out of the box configuration can be used.
  • Where large amounts of data will be written to the mirror database tables, you should consider what is more important – scenario execution time or the speed with which the Tableau views display – and weigh this by how often those views are viewed.
  • With this information you should be able to make an informed decision whether you need to mirror any of the larger tables on execution.

 

Check out the FICO Xpress Optimization Community for more resources and information.

Cloud computing, which grew out of the success of software-as-a-service, has come a long way from a largely developer-focused environment to a full-service IT infrastructure offering. By fulfilling the expected requirements around availability, scalability, security and agility, cloud computing has become a de-facto part of virtually every IT infrastructure and enterprise software support plan. However, we shouldn't take enterprise software support and business ubiquity for granted. Cloud computing has grown conceptually to become a Mission Critical Cloud.

 

FICO, ever cognizant of its roots in banking and the strictly regulated financial services industry, has set requirements for the highest levels of service availability, security and business criticality. Our cloud solutions must deliver on these strict requirements while developing and deploying loan origination, banking account strategy, debt collection, and fraud detection solutions. We deliver our own solutions in the cloud, in the form of the FICO Analytic Cloud, as a managed service. There, FICO customers can focus on business results instead of infrastructure and service management. Once deployed, customers can rest assured that their services will run seamlessly following FICO best practices.

 

What do we mean by Mission Critical?

Mission Critical delivers on the requirements of high availability, scalability, disaster recovery, security, governance and compliance. Delivering mission critical services requires architecting the services from the ground-up to have redundancy, flexibility, hooks for monitoring key services, scheduled data backups and a security architecture that is reviewed constantly for threats. This infrastructure is backed with a staff that is trained in the respective areas and is equipped to resolve issues quickly.

 

For FICO and our customers, one of the mission critical components is achieving Payment Card Industry (PCI) certification. PCI certification represents the highest level of infrastructure security best practices to ensure that personal and credit card information can securely be stored and leveraged by the related services. Only after a stringent review of its architecture, policies and procedures is a service deemed to be PCI compliant. As of April 2018, we’re proud to announce that the FICO Analytic Cloud was deemed PCI compliant. It now delivers the highest levels of mission critical security and compliance as a managed service. Mission Critical enables FICO customers to leverage the most sensitive information for their decision automation, optimization and analytical tasks; it also provides a secure infrastructure to leverage machine learning and artificial intelligence in the cloud.

 

AdobeStock_89127696.jpg

 

Addressing infrastructure and service redundancy, automatic failover and continuous monitoring is also very important to delivering Mission Criticality in the cloud. The FICO Analytic Cloud provides 99.9% uptime for the services deployed on it without any additional cost to the customer.

 

You can use this SLA and Uptime Calculator to see what an acceptable service level would be for your applications.
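As a back-of-the-envelope illustration, the arithmetic behind a 99.9% uptime commitment is simple:

uptime = 0.999                          # 99.9% availability
minutes_per_year = 365 * 24 * 60        # 525,600 minutes
allowed_downtime = (1 - uptime) * minutes_per_year
print(f"{allowed_downtime:.0f} minutes per year, about {allowed_downtime / 60:.1f} hours")
# -> roughly 526 minutes, or just under 9 hours, of downtime per year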

 

At FICO World 2018, we announced the following new capabilities:

  • Decision Management Suite Enterprise Grade Cloud Service: Now available on AWS, this managed service provides 24x7 high availability and disaster recovery, and supports development, test and production environments.
  • New tools and execution platform (FICO Platform): For mission critical AI, decisions, analytics and optimization, the new FICO® Analytics Workbench™ supports a wide range of commonly used AI and ML model executions and FICO developed analytic models for use cases such as fraud and anomaly detection.
  • FICO® Decision Central™ and DMS Hub in the Cloud: Enables true centralized decisioning with performance tracking, governance, source code control and collaboration for analytics and AI models.
  • Next generation decision optimization: Latest version of FICO® Decision Optimizer is now available in the Cloud.

 

Find and trial these products yourself on the FICO Analytic Cloud marketplace.

This blog series features the opinions and experiences of five experts working in various roles in global strategy design. They were invited to discuss the best practices outlined in this white paper, and also to add their own. With combined experience of over 70 years, they partner with various businesses and help them to use data, tools and analytic techniques to ensure effective decision strategies to drive success. The experts share their personal triumphs and difficulties; you’ll be surprised to learn the stark differences, and occasional similarities, in these assorted expert approaches to accomplishing successful data driven strategies across industries.

 

jill deckert cropped.png

 

Jill Deckert is a Principal Consultant at FICO where she’s worked for the past 11 years. Jill works collaboratively with her clients to provide an actionable roadmap of improvements. Throughout the process, she considers her clients’ current business constraints and understands the wider context of their business.

 

From Judgment to Data Based Strategies

Judgment-based strategies are effective at carrying out decision processes based upon established guidelines. These strategies are usually developed according to a stack of business rules, often referred to as knock-out rules. Strictly following these steadfast rules can often lead to unintentionally excluding profitable accounts, which leaves the business with a smaller target population and reduces their opportunity for growth.

 

Data-driven strategies are built using historical data. With a data driven approach, the construction of the strategy is guided by how the data reacts to different decision elements and threshold values. Therefore, the best treatment can be assigned for different sub-populations of accounts. These types of strategies are often a balance between judgmental criteria (i.e. business rules) and data science. Data driven strategies can be much more effective than a judgment based strategy; however, the transition from judgment based to data driven requires not only an operational shift, but an institutional shift as well.

 

Picture1.png

 

Upfront preparation, buy-in, proof and trust are key components to success, as discussed by my colleagues earlier in this series. Since the data extraction process can be difficult and time consuming, it is common to see lenders relying on established processes. However, embracing a different approach to building strategies could result in improved success and greater profit potential.

 

Case Study
My team and I worked with an automobile lender to build an origination strategy. They had been lending for more than 20 years, and were using established knock-out rules to identify risky populations. Their goal for the new strategy was to maintain their existing bad rates while increasing automation rates (i.e. reduce the number of manual reviews).  

 

We began planning the transition from a judgmental to a data driven strategy by mapping out their current champion strategy and comparing it to the new challenger strategy that was built with data driven decisioning (learn more about Champion/Challenger testing here). We spent half a day walking through the strategy and reviewing their business rules to show how the existing rules were eliminating profitable segments of their population, but not reducing any incremental risk.

 

We were also able to show that a large percent of their manual review population could be automatically approved based on additional decisioning criteria they weren’t already using. During the course of this project we were able to demonstrate how a phased approach to implementing a data driven strategy can more effectively eliminate risk and allow for increased auto-approval rates.

 

We agreed on several critical business rules to be included and then incorporated them into a “hybrid” data driven strategy.

 

The success of this project is due in large part to the use of FICO’s Decision Tree Professional. This tool helps identify the most predictive variables to use. We were able to show how different decision keys and key splits impacted the decisions and outcomes. This exercise was extremely useful in demonstrating how certain decision keys are more effective at eliminating bads from the target population.

 

At the end of the day, the lender was much more comfortable with the data driven approach. The ability to visually show the data within the strategy tool and the comparison of results between the champion and the challenger strategy provided compelling reasons to move on from a judgmental approach to a data driven approach. It was clear that the challenger strategy was more efficient and effective.

 

Picture2.png

Bad rates by strategy

 

After shifting from a risk focused, judgmental approach to a data driven strategy approach, the next step is to implement mathematical optimization. You can look forward to hearing about this step from my colleague Sonja Clark later on in this series.

Are you relying on knock out rules to dictate your strategy decisions? Discuss the benefits of introducing data driven decisioning into your strategy below in the comments, or in our TRIAD and Analytic Tools communities.

If you’re a current TRIAD user, join us at our upcoming Customer Forum, May 23-24 in Atlanta, Georgia.

The data used and generated by Xpress Insight apps can vary from a few kilobytes to several gigabytes. Given all this information, it is key to be able to make sense of this data and to analyze and display it in a way that others can understand. This can be done by visualizing the data and allowing the user to interact with the visualization; the buzzword for this at the moment is ‘business intelligence’. Xpress Insight provides two mechanisms through which this can be achieved – Tableau and VDL Charts.

 

Tableau

Tableau is a business intelligence and data visualization tool that focuses on data visualization, dashboards and data discovery. In 2018, it was placed in the Leaders quadrant of the Business Intelligence and Analytics Platforms report by IT research firm Gartner for the sixth year in a row. Tableau provides a wide range of support for multiple types of charts, maps and plots. The number of available visualizations can lead to a quandary over which one is best to use. When trying to decide, it is worth reading Which Chart or Graph is best? A guide to data visualization, as this article provides valuable guidance on which type of visualization to choose in specific scenarios.

 

Building your data visualizations is achieved using Tableau Desktop. Here, you can connect to Xpress Insight’s mirror database and construct the data you wish to use in the visualization. Once this has been done, you are free to create your visualizations. A visualization can either be a single view onto the data or multiple views combined into a dashboard.

Single View

tableau1.png

 

Multiple Views Combined into a Dashboard

tableau2.png

 

All of this is done via an intuitive drag-and-drop interface. Out-of-the-box statistical summaries, trend analysis, regressions, and correlations are available to aid statistical understanding. Reference lines and forecasts can also be added to views to aid understanding. Tableau maps have built-in postal codes for more than 50 countries worldwide, with definable custom geocodes and territories available to generate personalized regions.

 

tableau3.png

 

If you don’t have Tableau Desktop, then workbooks can still be created via the Tableau web interface. Although the web interface is not as feature-rich as Tableau Desktop, it does provide adequate functionality to create simple and moderately complex workbooks. The main drawback of the web interface is that you cannot create data sources; the user must select from the ones that have been published to the Tableau server. Via the Insight application companion file, it is possible to create simple data sources which will be published to Tableau when you import or upgrade the application.

 

For more detail on how to create Tableau views in Xpress Insight see Insight Developers Guide - Adding a Visualization Using Tableau for FICO.

 

VDL Charts

The View Definition Language (VDL) is a markup language within Xpress Insight designed to allow views to be authored using an XML notation. Markup tags are provided to define basic page structure and to incorporate and configure components such as tables, forms and charts. The language has syntax for referencing scenario information, and for enumerating content over data sets from the scenario. For more detail on how to create custom views in Xpress Insight, see Insight Developers Guide - Authoring Custom Views.

 

One component within VDL is VDL Charts. VDL Charts are a lightweight alternative to Tableau charts, built on a leading open-source, client-side data visualization library to provide a range of visualizations. VDL Charts provide support for bar, stacked bar, line, scatter and pie charts, with plans to extend this in the future.

 

Using a combination of standard VDL form elements with VDL Charts it is possible to provide an interactive experience through which the user can apply filters to the data visualizations. Within a single VDL view it is also possible to display multiple VDL Charts.

 

Data is passed into the chart either by using model entity arrays directly or by using JavaScript to combine data from multiple entity arrays. When developing a view with one or more VDL Charts, you should take care with the amount of data being pulled down to the client, considering both how it will perform across your network and how much memory it will consume in the browser.

vdl1.png vdl3.png

 

So, should you use Tableau or VDL Charts?

If you are looking for ad hoc data visualizations that either use the existing model entity arrays as is or require a small amount of data manipulation client side, then VDL Charts are the ones to use. If you are looking for more sophisticated visualizations with a wide range of chart types, or you are dealing with high volumes of data, then Tableau is the one to use. Both options are seamlessly integrated into Xpress Insight.

 

If you have any questions or opinions on data visualization, I'd love to hear them in the comments section below. You can also check out our FICO Xpress Optimization Community for more resources and information.

If you haven’t already heard, FICO is collaborating with Google, Berkeley, Oxford, Imperial, MIT and UC Irvine to host an Explainable Machine Learning (xML) Challenge.

 


xML Challenge - YouTube

 

 

FICO has been at the forefront of driving innovation surrounding explainable AI for the last twenty-five years. To continue with this, we are inviting teams to create machine learning models that are both highly accurate and explainable based on a real-world financial dataset. The explanations will be qualitatively evaluated by data scientists at FICO and will be used to generate new research in the area of algorithmic explainability.

 

It is important that data scientists can understand and interpret the models that they fit. We must look at biases, make improvements, and make the business case for adopting them. However, the current black-box nature of machine learning algorithms makes them neither interpretable nor explainable. Without explanations, these innovative algorithms cannot meet regulatory requirements and, as a result, cannot be adopted by financial institutions.

 

More sophisticated machine learning techniques that offer both accuracy and explainability should mean greater collaboration between humans and machines. We encourage you and your team to enter this challenge and help improve the understanding and interpretation of complex machine learning models.

 

Get started with the details, dataset and rules on the xML Challenge page.

A couple of weeks ago, I went to Dagstuhl castle to attend a seminar on "Designing and Implementing Algorithms for Mixed-Integer Nonlinear Optimization." It was an inspiring conference, as it provided the opportunity to meet and exchange ideas with some of the world's leading researchers in Mixed-Integer Nonlinear Programming (MINLP). For my presentation, I spoke about "Numerical challenges in MIP and MINLP solvers."

 

The term numerical error comprises two sources of errors:

 

Errors from approximating numbers

  • This error occurs because each number involved in a computer calculation is only represented by a finite number of digits (more precisely: bits). Thus, there are inevitably some numbers that cannot be represented exactly as floats (a data type used to represent numbers in computer code). Take Pi, with its infinite series of non-repeating decimals, as the most famous example. But there are other, more surprising cases as well. For example, the number 16,777,217 cannot be represented as a float, while 16,777,216 and 16,777,218 can (see the short sketch below).
    *(for more information, see the detailed explanation at the end of the blog)
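You can reproduce this effect yourself; the following small sketch (assuming NumPy is available) casts the three numbers to single precision:

import numpy as np

for n in (16_777_216, 16_777_217, 16_777_218):
    print(n, "->", int(np.float32(n)))
# 16777216 -> 16777216
# 16777217 -> 16777216   (the odd value is rounded; it has no exact float representation)
# 16777218 -> 16777218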

 

Errors from approximating mathematical procedures

  • This error is due to the fact that a computer program has to evaluate even the most complex mathematical formulas by essentially only adding and multiplying. For example, trigonometric functions like sine and cosine are approximated by a polynomial, a truncated Taylor series, as illustrated by the toy sketch below.
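As a toy illustration of such an approximation (not the implementation used by any particular solver or math library), here is a truncated Taylor series for sine:

import math

def sin_taylor(x, terms=8):
    """Approximate sin(x) by the first `terms` terms of its Taylor series around 0."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))  # next odd-power term
    return total

print(sin_taylor(1.0), math.sin(1.0))   # both close to 0.8414709848...
# The truncation of the series is one more source of (usually tiny) numerical error.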

 

iterative.png

Numerical errors and the use of tolerances often lead to slightly infeasible solutions - which is acceptable in most cases.

 

A single numerical error is usually negligible (and, as said before, unavoidable), but things might get more and more serious when the computation continues with such slightly-off numbers, as the errors can add up and propagate.

 

The big challenge is figuring out how to avoid having numerical errors add up to create a big impact on the solution of your optimization models!

 

One answer, by FICO Xpress, is our Solution Refiner. Due to numerical errors, the final solution might violate some of the linear constraints, some of the variable bounds and some of the integrality requirements. The solution refiner will trigger extra LP iterations with higher precision to reduce or even eliminate those violations. This procedure is activated by default for both linear and mixed-integer optimization in Xpress.

 

For nonlinear optimization, a lot depends on defining so-called convergence criteria. Since there are more than twenty of these criteria, balancing them to get a decent performance can be a real burden. Fortunately, we recently introduced the "self-tuning default convergence," where a user can specify target values for feasibility and optimality tolerances and FICO Xpress will take care of the rest.

 

So, while a lot of effort already goes into getting numerical errors under control, many challenges remain, and these challenges open a wide field for research. This is particularly true for Nonlinear Optimization, since it features many issues that are not present in Linear Optimization. Take singularities, like evaluating 1/x close to x = 0, as one example. Moreover, some MINLP-specific algorithms, like Outer Approximation, are error-prone by design.

 

IMG_20180328_115159.jpg

Dagstuhl 2018: Developers of ten different optimization solvers in one picture (I'm on the left).

 

In Dagstuhl, we had another lively discussion about numerics at the breakout session on "Experimentation with MINLP software." It is surely a topic that will keep solver developers and the academic research community busy for the next couple of years. Well then: challenge accepted!

 

You can develop, model, and deploy FICO Xpress Optimization software for free by downloading our Community License. Learn more about FICO Xpress Optimization, ask questions, and hear from experts in the Optimization Community.

 

*A float uses 23 bits to represent the mantissa (plus one implicit leading bit, giving 24 significant bits). Since 16,777,216 = 2^24, the 2-bit is the last significant bit and the 1-bit is lost. As a consequence, odd numbers between 2^24 and 2^25 cannot be represented. Similarly, from 2^25 to 2^26, only multiples of four can be represented exactly, and so on. As an example in the decimal system, imagine you had to use a number format that allows you to specify three digits of each number plus its order of magnitude. You could represent all numbers from 0 to 999. 1000 would also work (since 1000 = 1.00 × 10^3), but the next number you could represent would be 1010 (= 1.01 × 10^3). The numbers 1001 to 1009 would not be representable and would have to be rounded up or down.

This blog series features the opinions and experiences of five experts working in various roles in global strategy design. They were invited to discuss the best practices outlined in this white paper, and also to add their own. With combined experience of over 70 years, they partner with various businesses, helping them to use data, tools and analytic techniques to ensure effective decision strategies to drive success. The experts share their personal triumphs and difficulties; you’ll be surprised to learn the stark differences, and occasional similarities, in these assorted expert approaches to accomplishing successful data driven strategies across industries.

 

Jim patterson headshot cropped.png

Jim Patterson is a Director of Business Consulting within the Credit Lifecycle Practice of Fair Isaac Advisors, where he’s worked for the past 20 years. Jim focuses on the application of adaptive decision management and predictive analytic technologies within the financial services industry.

 

Data Preparation

Thoughtful data preparation is the key first step before any analytic strategy design effort. It is critical to identify which variables to include in the dataset by thinking through the specific business challenge. This should not be approached with tunnel vision, as many portfolio trends have a multi-decisional context. For example, rising losses may require new and innovative approaches not only in how a lender manages collections prioritization and treatment, but also a combination of limit and authorizations management in the case of revolving products.

 

It is important that you also consider how you will validate the design of the new strategy. You must ensure that relevant profiling variables are included in addition to the outcome variables that drive the design. This facilitates a critical part of the design and deployment process – selling your new approach to internal stakeholders. Lastly, select a performance window that is relevant to the decision context. For instance, credit limit increase strategy design typically leverages a longer performance window, say twelve months, since revenue gains are observed much earlier than risk-related impacts (e.g. delinquency and loss).

 

Pick Your Sample

It often isn’t necessary to sample all available account records. In large portfolios with millions of customer accounts, doing so would simply extend the strategy development process with little to no benefit as compared with using a representative subset of account records. Applying a suitable sampling methodology can pay big dividends in terms of the efficiency of the strategy development process while yielding essentially the same output.

 

It is important to note, however, that the development dataset should not be constructed with a purely random sample. Imagine a pool of 100 marbles with 95 green marbles and 5 red marbles; a purely random sample could easily miss the red marbles. To avoid a situation like this when sampling your data, you must ensure an accurate representation of key account profiles. A common approach is to use a stratified random sampling methodology so that unbalanced account profiles are accurately represented in the sampled datasets. In stratified random sampling, the full population is broken into smaller groups, called strata, that share important characteristics. These smaller data groups are then sampled at different rates.

stratified random sampling.png

Sampling must be done correctly in order for the development data to properly represent the true portfolio composition and produce usable, beneficial results. A practical example is sampling 100 percent of non-performing accounts (i.e. “bads”).
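As a minimal illustration of the idea (the column names and sampling rates are invented, and FICO tools provide their own sampling facilities), a stratified sample can be sketched with pandas as follows:

import pandas as pd

def stratified_sample(df, strata_col, rates, seed=42):
    """Sample each stratum at its own rate, e.g. keep every 'bad' but only 5% of 'goods'."""
    parts = [
        group.sample(frac=rates.get(name, 1.0), random_state=seed)
        for name, group in df.groupby(strata_col)
    ]
    return pd.concat(parts)

# e.g. sample = stratified_sample(accounts, "performance", {"good": 0.05, "bad": 1.0})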

 

Don’t Neglect Your Business Intuition
The best designed strategies blend business expertise and judgement with the mathematical power of the analytical software. The analyst’s industry knowledge should be a guiding force behind any strategy development effort. The absence of business expertise in the process, where the strategy builder is completely reliant on mathematical outcomes, introduces significant vulnerabilities. The lens of industry experience, as stressed by my colleague Stacey in her best practices, helps to identify unexpected patterns in the data that would otherwise go unnoticed by those lacking applicable business understanding; this could compromise the integrity and performance of the resulting strategy.

 

This is not to say that valid data patterns are never surprising to industry veterans! However, if the unanticipated patterns or suggested strategy splits cannot be rationalized, take a step back and check the validity of your data. 

 

The combination of good statistical-based software and human business expertise for contextual wisdom is key to a successful data-driven strategy design project. On occasion, the analytic output will justify a particular course of action that is not implementable for practical business reasons.

 

Take, for example, a limit reduction strategy designed to manage exposure at risk. Credit bureau and/or behavior risk scores are common inclusions to the targeting criteria. If the proposed limit reduction action requires adverse action notification to the customer, consideration must be given to the customer communication. Specifically, the score reasons provided to the customer as justification for the adverse action must be reasonable and sensible. This level of explainability can help avoid indefensible customer complaints and operational disruptions in the form of unmanageable inbound call volumes.

 

It is good practice to balance the analytically-derived strategy design with review of live account examples that would fall within the marginal spectrum of the targeted score range of your treatment group. A manual review and understanding of score reasons that will be communicated to customers may necessitate strategy modifications – so be prepared.

 

Operational Agreement and Communication

While some business strategies are executed without dependence or reliance on operational areas (e.g. credit limit increases), the success and effective testing of other decision strategies are as dependent on human execution as they are on the design of the strategy itself. A clear example is collection strategy design where execution consensus is essential. When an operational center is responsible for carrying out the customer treatment prescribed by your new business rules, agreement among relevant parties is imperative to ensure the test is applied in practice. It also ensures that back end performance monitoring reflects the business strategy design which is critical to the design-test-learn cycle.

 

Equally important is communication to call centers that will be impacted by shifts in customer treatment approach. Call center agents should be coached on the nature of anticipated customer reaction and how to address related customer concerns. Call center agents and statistics can provide a valuable qualitative feedback loop once strategy tests are put into production. If there is an unexpectedly high volume of customer calls tied to your test, you should revisit your strategy design and treatment. Significant negative impact to the customer experience will manifest in complaints to the call center, so capture the “voice of the customer” in your strategy management procedures.

 

Pre-Implementation Testing

Once the strategy design process is complete, it is important to conduct pre-implementation simulations to assess the strategy-prescribed customer treatments against current-day portfolio volumes and distributions. Strategy development datasets often leverage observation dates from a year ago or more, and subsequent shifts in the portfolio makeup can lead to unpleasant production surprises that can be avoided by running estimator reports prior to deployment. This acts as a final check of the strategy before rolling it out into a production test. Treatment distribution and volume shifts can be addressed, if needed, through a staggered rollout of the new strategy or minor strategy logic changes.

 

Have questions, comments or tips about data-driven strategy design practices? We can discuss in the comments below or in the TRIAD Community. If you're already a TRIAD user, please join us at our upcoming Customer Forum, May 23-24 in Atlanta, Georgia.

Alliant, one of the largest credit unions in the US, has been serving members for over 80 years, growing to over 335,000 members and $9.3 billion in assets. In our new press release, John Listak, lending systems project manager at Alliant, shares how the company built a new consumer loan origination system using FICO® Origination Manager. They were able to improve efficiencies by reducing the number of decision rules per loan product by over 25%, and credit card applications can now close in less than 30 minutes.

 

Alliant's story is one we see with Financial Services clients around the world: they want to meet growth goals while mitigating risk and providing a faster, more digital customer experience. Most of the time, legacy systems built on outdated technology stand in the way of meeting these goals. Alliant's previous system was not flexible enough to make changes easily, add loan products or improve the product. This, of course, makes it difficult to stay competitive, to adapt to regulation and to meet customer needs. FICO® Origination Manager helped Alliant reach these goals by adding sophistication to their system, enabling fast changes, easy decision rule updates and multiple credit bureau integrations. If this story sounds similar to that of your organization, ask your questions about the Alliant story or Origination Manager in the comment section below.

 

Read the full press release here.

Reliability and reproducibility are key goals when developing the FICO Xpress solver engines, but what does that mean exactly? First, the same input to an algorithm should always lead to the same output. In addition, runtimes should be repeatable. Ultimately, the algorithm should always take the exact same path to the final result. This is what computer scientists call a deterministic algorithm: it makes results reproducible. That is handy, and even necessary, in many different contexts:

 

  • Want to show a live demonstration to your bosses? Your software better be deterministic!
  • Want to fix a bug that one of your customers reported? Your software better be deterministic!
  • Want to report your research results in a scientific journal that peer-reviews software? Your software better be deterministic!
  • What if the results of your work have legal consequences, for example, when you optimize the assignment of students to universities? Your software really, really better be deterministic!

 

I could go on, but for those reasons and many more, any commercial MIP (Mixed Integer Programming) solver that I am aware of, and even many academic solvers, are deterministic, up to a point. That is, they are deterministic as long as they run in the same computational environment, i.e., on the same machine. Even if the input is completely identical, a different operating system, more or fewer CPU cores, or different cache sizes can destroy deterministic behavior.

 

While this probably doesn't matter for your live demonstration (as it will run on the same laptop as the test run), it is almost guaranteed to break reproducibility in the journal peer review. For customer support, it means maintaining a farm of different machines, light ones and heavy ones, running (at least) three different operating systems. This can be a nightmare for both the developers and the IT admins.

 

concept gears.png

Reliability is a crucial aspect of optimization software.

 

This is where we at FICO Xpress outperform other solvers. To the best of my knowledge, Xpress is the only MIP solver whose solution path is identical under Windows, Linux and Mac. Our performance is also independent of the cache size. There is a dependence on the number of CPUs, but it can be overcome. The important ingredient here is that our MIP parallelization (see my earlier blog post about Parallelization of the FICO Xpress-Optimizer) does not depend on the number of actually available threads, but on the number of tasks that we hold in the queue to be scheduled in parallel. Such a task can consist of solving a node of the branch-and-bound tree or running a certain primal heuristic. Tasks do not have a one-to-one association with computing threads; therefore, they can be interrupted and continued on a different thread at a later point in time. The idea is to always have more tasks available than there are threads, as illustrated by the sketch below.
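
 

To make the task idea more concrete, here is a small conceptual sketch in Python. It is purely an illustration of the principle, not the actual Xpress scheduler: a fixed set of tasks is processed by a thread pool, and the results are merged in task order, so the final outcome is independent of how many worker threads happen to be available.

 

    # Conceptual sketch only, not the Xpress internals.
    from concurrent.futures import ThreadPoolExecutor

    def solve_task(task_id):
        # Stand-in for solving one branch-and-bound node or running a heuristic.
        return task_id * task_id

    def run(num_tasks, num_threads):
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            # pool.map returns results in task order, regardless of which
            # thread finished which task first.
            return list(pool.map(solve_task, range(num_tasks)))

    # Same task count, different thread counts, identical result.
    assert run(num_tasks=128, num_threads=4) == run(num_tasks=128, num_threads=40)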

 

The maximum number of parallel tasks can be set via the user control MAXMIPTASKS. An important consequence of breaking the task-thread association is that a run on a machine with X threads can always be reproduced on a machine with Y threads, no matter whether X is smaller or larger than Y. You just need to adjust the MAXMIPTASKS control in the Y-run to the value that is reported in the log file of the X-run. For example, on my 40-threaded Windows development server, the corresponding log lines read:

 

   Starting tree search.

   Deterministic mode with up to 40 running threads and up to 128 tasks.

 

So, if I want to redo a run on my 8-threaded Mac laptop, or inside my 4-threaded Linux virtual machine, I simply set MAXMIPTASKS=128 and I am good to go.
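
 

For instance, using the xpress Python package, the control can be set before solving. This is a minimal sketch under a few assumptions: the package is installed, "model.mps" is a placeholder for your own model file, and MAXMIPTASKS is addressed by its lowercase name, as Optimizer controls generally are in the Python interface.

 

    # Minimal sketch, assuming the xpress Python package is installed;
    # "model.mps" is a placeholder for your own model file.
    import xpress as xp

    p = xp.problem()
    p.read("model.mps")

    # Reproduce the 40-thread run from the log above by matching its
    # reported task limit on this machine.
    p.setControl("maxmiptasks", 128)

    p.solve()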

 

Albert Einstein is credited with saying: "Insanity is doing the same thing over and over again and expecting a different outcome." It's good to know that my favorite MIP solver is sane and, moreover, offers me different ways of doing the same thing with the same outcome. This gives me the freedom to choose the approach that suits me best and is right for the job at hand.

 

You can develop, model, and deploy FICO Xpress Optimization software for free by downloading our Community License. Learn more about FICO Xpress Optimization, ask questions, and hear from experts in the Optimization Community.

Recent FICO research reiterates that card-not-present (CNP) fraud is growing dramatically as a percentage of total fraud losses. In EMEA specifically, CNP fraud now accounts for 70% of total card fraud losses, up from 50% in 2008. At the account level, this figure is close to 90%. Similar CNP metrics exist around the globe.

 

While issuers often avoid high dollar liability for CNP fraud, they have a strong interest in CNP fraud prevention as an improved consumer experience benefits all parties. As a result, payment processors, card issuers, and merchants are all striving to improve the separation of fraudulent and legitimate CNP transactions.

In support of these efforts, FICO has developed new machine learning techniques focused specifically on CNP transactions. These advances have demonstrated an ability to reduce total CNP fraud losses by upwards of 30% without increasing false positive rates. The CNP machine learning innovations will be included in the 2018 consortium models (both Credit and Debit) and will be available in the standard consortium model release cycle. There are no incremental licensing or upgrade fees associated with this release.
