SAS Response to EBA report on Big Data and Advanced Analytics

Banks see Open Source as a hotbed for rapid innovation, offering flexibility without the perceived cost of expensive vendor software and without the need to wait months for the official release of a new feature when the source code can be edited on demand.

Innovation has become a top priority within the Banking Industry, driven by a strong need to remain competitive and relevant to customers. This is where enthusiasm for Open Source software is especially prevalent: languages such as R and Python have built an increasingly dominant position in the sphere of artificial intelligence and machine learning. As a result, open source is now firmly on the agenda for decision makers at the world’s leading financial institutions. The thinking is that to drive digital transformation, their businesses need real-time insight. To gain that insight, they need AI. And to deliver AI, they need to be able to harness Open Source tools.

However, despite their popularity, open source models are not without challenges, and Banks have legitimate concerns about encountering them along their open source journey. After all, Banks are first in line to feel this pressure because they are so highly regulated. The EBA’s recent report on Big Data and Advanced Analytics outlines a number of risks with an open source approach and proposes success factors to help mitigate these risks.

We believe it’s about having the right balance between Choice and Control

We understand that choice and flexibility are critical to a successful analytics strategy. They drive innovation and creativity. It’s about the ability to embrace Open Source technology whilst harnessing the control you need to successfully execute your analytics strategy in a way that’s right for your Bank. Analytics chaos can easily derail progress. Without control over your data, models and process, you can’t trust the results. By providing the perfect balance of choice and control, SAS enables you to orchestrate your analytics journey to ensure optimal returns on your investments in data, talent and analytic technology.

Below we discuss the key findings from the EBA’s recent report on Big Data and Advanced Analytics and how SAS can help alongside Open Source to mitigate the risks:

Success Factors for mitigating the risks from the use of Open Source Solutions

- Open Source tooling and support for the entire data science process.

When Banks use open-source frameworks, this can include a diverse range of tools across the data science process. Open source projects are typically tightly focused on solving a specific set of problems. Each project is a powerful tool designed for a specific purpose, whether that is manipulating and refining large data sets, visualising data, designing machine learning models or running distributed calculations on a cluster of servers. Often no dedicated tool prevails, and in-house solutions are used in combination with ad-hoc tools as required. This complicates matters, especially when the tools do not support the entire data science process in a way that produces a specific output or result reproducibly. For example, in some institutions only the source code is recoverable, while in other institutions all relevant events are reproducible. As a result, unless banks are prepared to invest in building a robust end-to-end data science platform from the ground up, they can easily end up with a tangled string of cobbled-together tools, with manual processes filling the gaps.

SAS can help to mitigate this risk by integrating the entire life cycle, with a traceable results path from start to finish that is visible to any audit. This gives the Bank the confidence to rerun processes and obtain the same result.
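As a minimal illustration of what "rerun and obtain the same result" means in practice, the sketch below fixes the random seed of a training run and records a fingerprint of the code version, parameters and input data. It is a generic Python example, not a description of how SAS implements this. The names `run_fingerprint` and `train`, the "train-v1" label and the toy data are all hypothetical.

```python
import hashlib
import json
import random


def run_fingerprint(code_version: str, params: dict, data_rows: list) -> str:
    """Hash the code version, parameters and input data into one identifier."""
    payload = json.dumps(
        {"code": code_version, "params": params, "data": data_rows},
        sort_keys=True,  # stable key order so equal inputs hash identically
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def train(data_rows: list, params: dict) -> float:
    """Hypothetical stand-in for model training: a seeded random sample mean."""
    rng = random.Random(params["seed"])  # local generator, seeded explicitly
    sample = [rng.choice(data_rows) for _ in range(params["n_samples"])]
    return sum(sample) / len(sample)


# Toy inputs, for illustration only.
data = [0.2, 0.4, 0.6, 0.8]
params = {"seed": 42, "n_samples": 100}

first = train(data, params)
second = train(data, params)  # same seed and inputs, so the same result
fingerprint = run_fingerprint("train-v1", params, data)
```

Storing the fingerprint alongside each result lets a reviewer confirm that a rerun used exactly the same inputs. Any change to the data, parameters or code version produces a different fingerprint.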

- Explainability and interpretability: Lack of explainability could represent a risk in the case of models developed by external third parties and then sold as ‘black box’ (opaque) packages. Techniques exist to prevent or detect bias.

Banks using machine and deep learning models are making determinations that affect our lives, such as mortgage and loan decisions. For this reason, everyone who automates processes and decisions with AI must deal with the ethical aspects – for moral, regulatory and practical reasons. After all, no company wants bad results to negatively affect its image.

A model is explainable when its internal behaviour can be directly understood by humans (interpretability) or when explanations (justifications) can be provided for the main factors that led to its output. The significance of explainability is greater whenever decisions have a direct impact on customers/humans and depend on the particular context and the level of automation involved. Explainability and transparency refer to the entire analytical process, not just to a machine learning algorithm that automates a decision.

But even machine learning algorithms are not a closed black box forever. “The algorithm made me do it” can never justify the consequences of using AI. It is trust and transparency that remove barriers to the use of AI – to the benefit of consumers, legislators and companies that use data analytics. To trust artificial intelligence, we need to explain how AI makes recommendations.

SAS is committed to solving the constantly evolving challenge of the explainability of AI. With natural language explanations and open industry-standard techniques embedded directly in the SAS Platform, such as LIME, Partial Dependence, Individual Conditional Expectation and Kernel SHAP, we help surface general biases in data and models and provide clarity into the factors and variables that lead to a decision. We can also help you identify general biases in the data or model by automatically surfacing potential hidden relationships, enabling you to run what-if analysis.
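To make one of these techniques concrete, here is a minimal sketch of partial dependence in plain Python. It illustrates the general idea only and is not the SAS implementation. The function `partial_dependence`, the scoring function `toy_score` and the applicant data are hypothetical. For each candidate value of one feature, the feature is forced to that value for every record and the model's predictions are averaged.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input data is untouched
            modified[feature] = value  # fix the feature of interest
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_score(row):
    """Hypothetical scoring function standing in for an opaque model."""
    return 0.5 * row["income"] - 2.0 * row["missed_payments"]


# Toy applicants, for illustration only.
applicants = [
    {"income": 30.0, "missed_payments": 0},
    {"income": 50.0, "missed_payments": 2},
]

curve = partial_dependence(toy_score, applicants, "missed_payments", [0, 1, 2])
```

The resulting curve shows how the average score moves as the number of missed payments changes while all other inputs stay as observed. This is the kind of evidence a what-if analysis relies on.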

- Traceability and auditability: the use of traceable solutions assists in tracking all the steps, criteria and choices throughout the process, which enables the repetition of the processes resulting in the decisions made by the model and helps to ensure the auditability of the system (including versioning).

The code of an open source project may be available for anyone to review, but tracing the web of dependencies between packages can quickly become extremely complex. This poses significant risks for any financial institution that wants to build on open source software. As a result, when a bank opts for an open source approach, it either needs to put its trust in a lot of people or spend a lot of time reviewing, testing and auditing changes in each package before it puts any new code into production. This can be a very significant trade-off compared to the safety of a well-tested enterprise solution from a trusted vendor, especially because banking is a highly regulated industry and the penalties for running insecure or noncompliant systems in production are significant. Essentially, if you build a credit risk model that depends on an open source package, your systems also depend on all the dependencies of that package. Each of those dependencies may be maintained by a different individual or group of developers. If they make changes to their package, and those changes introduce a defect, or break compatibility with a package further up the dependency tree, or include malicious code, there could be an impact on the functionality or integrity of your model or application.
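The reach of a dependency tree can be sketched in a few lines of Python. The example below walks a hypothetical dependency graph and collects every package a credit risk model relies on, directly or indirectly. The package names and the `transitive_dependencies` helper are illustrative assumptions, not real projects or a real tool.

```python
def transitive_dependencies(graph, root):
    """Return every package that `root` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        package = stack.pop()
        if package in seen:
            continue  # already visited via another path
        seen.add(package)
        stack.extend(graph.get(package, []))
    return seen


# Hypothetical dependency graph; the package names are illustrative only.
graph = {
    "credit_risk_model": ["scoring_lib"],
    "scoring_lib": ["numeric_core", "data_frames"],
    "data_frames": ["numeric_core", "date_utils"],
}

deps = transitive_dependencies(graph, "credit_risk_model")
```

Although the model declares a single direct dependency, the walk finds four packages in total. Each of them is a place where a defect, an incompatibility or malicious code could enter.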

SAS integrates the entire life cycle, with a traceable results path from start to finish that is visible to any audit. Data and model governance verify the accuracy and validity of analytical results, supported by a rigorous testing framework and proven domain practices.

- Data protection – for data protection, institutions must comply with the GDPR throughout the entire lifecycle.

The EU and many countries outside the EU have established strong laws and regulations to protect personal data. In the case of the GDPR, data subjects have a right to be informed about automated decision-making.

SAS delivers a unified view of your data. Our superior detection capabilities let you search your entire network to locate personal data stored in varying file formats and traditional and emerging data sources. With SAS, it’s easier to find the data that’s needed, and you can seamlessly manage logging, user access and encryption to ensure enterprise governance and compliance.

- Data security and model security will become increasingly important. Appropriate safeguards for data security and model security need to be defined and implemented, and data security could be addressed throughout the entire analytics lifecycle.

Secure your analytical assets with features such as authentication, authorization and encryption. Follow organizational and regulatory standards for reliance and user access.

SAS plus open source

By combining the power of SAS with open source technologies, you can unify disparate toolsets and analytics assets into a streamlined, governed and collaborative environment that improves productivity, fosters business agility and delivers tangible results.

One SAS client, a large financial services provider in the UK, recently took this exact approach. The client uses open source languages to develop machine learning models for more accurate pricing. Then it uses the SAS Platform to train and deploy models into full-scale production. As a result, model training times dropped from over an hour to just two and a half minutes. And the company now has a complete audit trail for model deployment and governance. Crucially, the ability to innovate by moving from traditional regression models to a more accurate machine learning-based approach is estimated to deliver up to £16 million in financial benefits over the next three years.