Can Machine Learning Save Big Data?

Blog Post created by Advocate on Apr 11, 2017

The Big Data promise continues to be just that: a promise. This is not because the technologies are flawed, but because effort is not focused in the right place.


Eight in ten organizations confessed to Gartner that they are not able to exploit Big Data for competitive advantage. On the Gartner Hype Cycle, Big Data traveled through the Technology Trigger phase in 2011-2012, reached the Peak of Inflated Expectations in 2013, then slid into the Trough of Disillusionment in 2014.


Can machine learning save Big Data from the Trough of Disillusionment? It’s no secret that machine learning is of great interest right now; Gartner placed it at the Peak of Inflated Expectations in 2016. With all this hype, everyone wants to do it, but few are implementing it well.


The formula for effective machine learning goes beyond Big Data + Open Source Libraries + Data Scientists. Today, poor data leads to a lack of operational results, creating tension between executive expectations and technical staff. To close this gap and operationalize machine learning effectively, companies need clean data, focused decision frameworks, and innovative analytic approaches such as self-calibrating AI models.


All About the Data

All informed decisions begin with data, so companies must invest in data quality and data governance. High-quality results start with high-quality data: data gathered from multiple sources must be aligned, monitored, and refreshed. The objective of the analytics must be top of mind when creating a data stream (or creek) versus a data lake. Specific objectives require specific, relevant data and domain knowledge. For example, available credit card balance data is not helpful for detecting fraud, because it is a lagging indicator: by the time fraud is detected and flagged, it has already occurred and most of the funds have been removed. Careful review of the specific data elements is needed, and they must be relevant to the analytic objective. This type of data collection represents a required shift in focus from the algorithm to the decision parameters necessary to solve the problem.


Adaptive Models

Collecting clean, relevant data is only the first step; an entire process must follow to operationalize it. Machine learning offers a high level of automation: the machine learns complex features from raw data that drive prediction or detection, rather than relying solely on the expert knowledge of a human. That is not to say expert knowledge is inessential. Domain knowledge guides the identification of data and the operationalization of scores through rule strategies, which ensure delivery of business value. Optimization is then applied to improve the process and the strategies built around the analytic scores.
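As a toy illustration of operationalizing a score through a rule strategy, the sketch below (pure Python; the thresholds, actions, and function names are hypothetical, not any product's actual rules) shows expert-written business rules consuming a model score:

```python
# Hypothetical sketch: a rule strategy layered on top of an analytic score.
# The model supplies the score; domain experts supply the thresholds and
# actions, which is where business value is operationalized.

def decide(score, amount):
    """Map a model score plus transaction context to a business action."""
    if score > 0.9:
        return "block"                      # very high risk: stop outright
    if score > 0.7 and amount > 1000:
        return "manual_review"              # risky and material: queue for analysts
    return "approve"                        # everything else flows through

print(decide(0.95, 50), decide(0.8, 5000), decide(0.2, 5000))
```

Because the strategy lives outside the model, analysts can retune thresholds as business priorities shift without retraining anything.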


Analytic models manifest real-world wisdom, but the real world is constantly changing. The fluidity of the real world requires models to be continually updated, as illustrated in the image below. "Build it and forget it" will not work in the long run. Traditional models are trained on historical data, and their weights are frozen thereafter. If customer behaviors or data patterns change, these models have no mechanism to adjust their weights in real time, so the model degrades as the environment changes. This lack of adaptation leads the decision system to under-deliver on the value promise. This is where machine learning and artificial intelligence can improve the process.


[Image: SZ Blog 1.png]

Machine learning and artificial intelligence make the decision-making process cyclical, constantly improving based on new data and events
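To make the frozen-versus-adaptive distinction concrete, here is a minimal pure-Python sketch (illustrative only, not any particular product's method): two logistic models start from the same historical weights, but only one keeps updating as a new risk pattern appears in the stream:

```python
# Illustrative sketch: frozen weights vs. weights that adapt online.
import math

def predict(weights, x):
    # Standard logistic score from a weighted sum of features.
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def online_update(weights, x, label, lr=0.1):
    # One stochastic-gradient step: nudge weights toward the observed outcome.
    error = predict(weights, x) - label
    return [w - lr * error * xi for w, xi in zip(weights, x)]

# Behavior shifts: the second feature, historically benign, becomes risky.
frozen = [1.0, -1.0]      # trained on historical data, then locked
adaptive = [1.0, -1.0]    # same starting point, but updated on each new event
for _ in range(200):
    adaptive = online_update(adaptive, [0.0, 1.0], 1.0)  # new pattern observed

# The adaptive model now scores the new pattern as risky; the frozen one
# still scores it as safe.
print(predict(frozen, [0.0, 1.0]), predict(adaptive, [0.0, 1.0]))
```

The frozen model keeps under-delivering on the new pattern no matter how many times it sees it; the adaptive one recalibrates with each event.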


Companies must invest in adaptive, streaming models that learn from each new data point to optimize their predictions. Today, many companies recognize that these shifts in behavior require AI that adapts, such as Multi-Layered Self-Calibrating models. These models self-learn to identify risky feature values and to create latent features on the fly, in real time, to combat rapidly changing environments.
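One way to sketch the self-calibrating idea (a heavy simplification for illustration, not the actual Multi-Layered Self-Calibrating implementation) is a feature that tracks a streaming estimate of a high quantile of its own values and scores new observations by how far they fall beyond it; because the quantile estimate keeps adapting, the score stays calibrated as the distribution drifts:

```python
# Illustrative sketch of a self-calibrating feature score.
class SelfCalibratingFeature:
    def __init__(self, p=0.95, lr=0.05):
        self.p = p      # target quantile (e.g. the 95th percentile)
        self.lr = lr    # adaptation rate
        self.q = 0.0    # streaming quantile estimate

    def update(self, x):
        # Stochastic-approximation step toward the p-th quantile
        # (a gradient step on the pinball/quantile loss).
        if x > self.q:
            self.q += self.lr * self.p
        else:
            self.q -= self.lr * (1.0 - self.p)

    def score(self, x):
        # Outlier score: how far the value sits beyond the learned quantile.
        return max(0.0, x - self.q)

feat = SelfCalibratingFeature()
for x in [1.0, 1.2, 0.9, 1.1] * 250:   # a stable stream around ~1.0
    feat.update(x)
print(round(feat.q, 2), feat.score(5.0) > feat.score(1.0))
```

If the stream later shifts to higher values, the quantile estimate follows it, so yesterday's "extreme" does not stay flagged forever.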


In addition to this real-time adjustment, adaptive analytics technologies incorporate real-time feedback from analysts working cases. The result is a self-learning, adapting model that constantly responds to the production environment.


Further, other forms of AI are increasingly used to find the needles in the haystack of knowledge. For example, auto-encoder technology is used by leading corporations to monitor how data and features change between development and production environments; the reconstruction error from these models points to new patterns. Implementing self-learning AI surfaces new information, patterns, and predictive features, allowing data scientists to detect and plan for future changes. Consequently, models are improved in a targeted and efficient way.
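The auto-encoder idea can be sketched as follows (pure Python and deliberately simplified: a one-dimensional linear "bottleneck" fitted by power iteration stands in for a neural network; the data and thresholds are invented for illustration):

```python
# Illustrative sketch: drift monitoring via reconstruction error.
# Fit a low-dimensional reconstruction on development data, then watch the
# reconstruction error in production; error well above the development
# baseline suggests the data patterns have changed.
import random

random.seed(0)

# "Development" data: two features that move together (plus small noise),
# generated around zero so we can skip mean-centering in this toy example.
dev = []
for _ in range(500):
    t = random.gauss(0, 1)
    dev.append((t + random.gauss(0, 0.01), 2 * t + random.gauss(0, 0.01)))

# Fit the 1-D "bottleneck" direction by power iteration on the 2x2 covariance.
c11 = sum(x * x for x, _ in dev) / len(dev)
c12 = sum(x * y for x, y in dev) / len(dev)
c22 = sum(y * y for _, y in dev) / len(dev)
u = (1.0, 0.0)
for _ in range(50):
    v = (c11 * u[0] + c12 * u[1], c12 * u[0] + c22 * u[1])
    norm = (v[0] ** 2 + v[1] ** 2) ** 0.5
    u = (v[0] / norm, v[1] / norm)

def reconstruction_error(data):
    # Encode each point to its projection on u, decode back, average the error.
    total = 0.0
    for x, y in data:
        z = x * u[0] + y * u[1]          # encode
        rx, ry = z * u[0], z * u[1]      # decode
        total += (x - rx) ** 2 + (y - ry) ** 2
    return total / len(data)

baseline = reconstruction_error(dev)

# "Production" data where the relationship between the features has flipped.
drifted = [(t, -2 * t) for t in (random.gauss(0, 1) for _ in range(500))]
print(reconstruction_error(drifted) > 10 * baseline)   # drift flagged
```

The monitor never needs labels: it only compares how well production data fits the structure learned in development.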


[Image: SZ blog 2.png]

Self-Learning AI in Action

As an example, Stanford University leverages self-learning AI to help address grant-spending compliance. An original list of expert rules was compiled to capture domain knowledge as an intelligent base for the system and its basic features. Multi-Layered Self-Calibrating AI models then learned behaviors not captured by the established rules, surfacing latent features and optimal ways to calibrate the detection of non-compliance. The self-calibrating analytics constantly adjust the models, and the output is continuously monitored to identify high-risk outlier invoices for human review. The system identifies non-compliance and new schemes, helping analysts organize and trace follow-ups.


An Integrated System

This integrated system was only possible with a robust end-to-end platform. An example of such a platform is the FICO Decision Management Suite (DMS), which can weave intelligence throughout the entire decision process, from analyzing data to making decisions that drive profitable action. DMS handles the variety and velocity of data needed to deploy innovative, self-learning, AI-powered decisions. Continuous improvements can be made through optimization tools and integrated, self-service collaborative development, all tied together with universal model governance and model management.


To achieve the promise of Big Data and machine learning, one must look beyond the algorithm alone. The desired business value proposition must be kept top of mind from a project's inception all the way through execution, with intelligence injected every step of the way. This is available in the FICO Decision Management Platform and corresponding applications, which you can try for free here.


Read other articles by Scott Zoldi about machine learning, Big Data, analytics, and more on the FICO Blog.