Monte Zweben recently attended a conference on the use of data in the insurance industry at the St. John’s School of Risk in New York City. Here are five key things he took away:
1. All insurance companies aspire to use more data, but few are able to operationalize it
The insurance industry has a renewed appreciation of the value of data, driven by AI and machine learning. Predictive models have myriad applications in insurance, including optimizing customer acquisition, delivering personalized service, processing claims efficiently, intelligently underwriting policies, and detecting fraud more effectively. The common ingredient needed to build and train predictive models is operational and business data, which the industry can now harness both within the company and externally. One person sitting next to me at the conference worked at a reinsurance company and had the title of “Data Hunter”; her sole job was to seek out new data sources to help the company. However, operationalizing AI can be challenging for incumbent insurance companies whose traditional IT infrastructure cannot scale to take advantage of new data sources and whose internal data is locked up in silos that are incompatible with one another.
Implication
One of the biggest opportunities industry incumbents have at their disposal is to break down the corporate data silos. Imagine a data platform that stores customer, policy, and claims data in one place, so that new policy underwriting could consider previous claims and leverage data from the underwriting process, such as policy modifications.
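To make this concrete, here is a minimal sketch of what underwriting logic looks like once the silos are gone. The schema and sample data are hypothetical, and an in-memory SQLite database stands in for the insurer’s operational platform; the point is that claims history is available to underwriting with a single join, instead of being stitched together from exports out of separate systems.

```python
# Minimal sketch: customer, policy, and claims data on one platform.
# Hypothetical schema; SQLite stands in for the operational database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE policies  (policy_id INTEGER, customer_id INTEGER, premium REAL);
    CREATE TABLE claims    (claim_id INTEGER, policy_id INTEGER, amount REAL, status TEXT);

    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO policies  VALUES (10, 1, 1200.0), (11, 2, 950.0);
    INSERT INTO claims    VALUES (100, 10, 4500.0, 'PAID'), (101, 10, 300.0, 'DENIED');
""")

# Underwriting a renewal can consider the full claims history in one query.
rows = conn.execute("""
    SELECT c.name,
           p.policy_id,
           COUNT(cl.claim_id)            AS claim_count,
           COALESCE(SUM(cl.amount), 0.0) AS total_claimed
    FROM customers c
    JOIN policies p     ON p.customer_id = c.customer_id
    LEFT JOIN claims cl ON cl.policy_id  = p.policy_id
    GROUP BY c.name, p.policy_id
""").fetchall()

for name, policy_id, claim_count, total_claimed in rows:
    print(name, policy_id, claim_count, total_claimed)
```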
2. Early movers are using external data in really interesting ways
If there is a common thread that has the potential to transform all the segments and lines of business of the insurance industry, it is the use of external data. New data sources are transformative for the insurance industry because they can make customer interactions seamless to increase brand loyalty, make critical business processes such as claims management more efficient, and even help implement preventive practices that improve the overall profitability of the industry. Let’s explore some of these new data sources.
Automobiles equipped with sensors (telematics) and mobile apps will automate the claims management process. Armed with data from sensors, insurance companies will no longer be dependent on the parties involved in an incident to determine liability. Furthermore, applying artificial intelligence and machine learning (ML) to this data will enable insurance companies to resolve claims and pay for damages in a matter of days rather than weeks or months. Sensor data and ML will also play a significant role in identifying fraudulent claims and preventing claims altogether, improving the profitability of the insurer. I love the ad where the pregnant mother insists that her husband not speed to the hospital so that she retains her safe driving rating on her mobile app: a financial incentive, delivered through an app, not to speed.
Sensors are relevant not only to automobile insurance but also to the property and casualty (P&C) business. Smart devices such as thermostats, smoke detectors, and security systems represent only the first step toward preventing an adverse event. Once we have sensors in our homes and offices that can detect events such as fires and leaks before they happen and notify the homeowner or the relevant agencies, the potential losses that insurance companies have to cover each year will be significantly reduced.
According to eMarketer, about 22% of the U.S. population owns a wearable device. These devices track physical activity and vital signs, enabling insurers to incorporate this data into pricing life insurance policies based on the lifestyle of the applicant. Insurance companies that figure out how to make this data part of their underwriting and pricing processes will also have the first-mover advantage of targeting the healthiest and most profitable segment of the population.
Implication
Insurance companies need a data platform with three defining attributes. First, it must be capable of storing data from diverse sources, including the ones mentioned above, and it must scale from terabytes to petabytes and from a few nodes to hundreds simply by adding commodity servers.
Second, the platform must form the foundation of the insurance company’s operational fabric, powering mission-critical applications while also facilitating data analysis for decision support and management reporting. Insights should not be decoupled from the application but rather be inextricably linked to it.
Third, the platform must offer functionality to build, train, and operationalize predictive models using machine learning – not just in a stand-alone data science workbench but at the underlying database level where operational and business data is stored, in order to accelerate model training and deployment into the production environment.
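As a rough sketch of the third attribute, here is what training a model directly against operational data might look like. The database file, table, columns, and fraud label are hypothetical, and SQLite plus scikit-learn stand in for whatever database and ML tooling the platform actually provides; the point is that features are pulled straight from the operational tables with SQL rather than exported to a separate environment first.

```python
# Sketch: train a predictive model against data where it already lives.
# "insurance.db" and the claim_features table are hypothetical placeholders.
import sqlite3

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

conn = sqlite3.connect("insurance.db")  # stand-in for the operational database

# Features come straight from the operational tables via SQL.
features = pd.read_sql(
    """
    SELECT claim_amount, days_to_report, prior_claim_count, is_fraud
    FROM claim_features
    """,
    conn,
)

X = features.drop(columns=["is_fraud"])
y = features["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```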
3. Incumbents will need technology-based solutions to thrive against InsurTechs and industry disruption
InsurTech refers to companies that are using technology to disrupt the traditional insurance industry. InsurTechs tend to be smaller entrepreneurial companies with roots in data, artificial intelligence, and mobile application development. For example, companies like DataCubes and Friss are using data science to transform and accelerate core insurance functions such as commercial underwriting and fraud detection, respectively. Others, such as Metromile and Root Insurance, are reinventing core insurance products, such as usage-based auto insurance priced on the driving distance and habits of the insured.
InsurTechs are disrupting the industry not only through the application of technology but also by reshaping consumer expectations and demands. According to McKinsey research, more than $10 billion has been invested in the InsurTech sector since 2012.
Implication
To compete effectively against InsurTechs, incumbents must reinvent and modernize the applications that have been the source of their competitive advantage. These are the same applications that InsurTechs are targeting with artificial intelligence and machine learning. Incumbents have rich data sources and experienced personnel trained in technologies such as SQL. Rather than trying to duct-tape together various components of the traditional IT infrastructure and acquire hard-to-find skills, insurance companies should consider a unified platform that can effectively manage both operational and analytical data using SQL. This platform will provide the foundation for insurance companies to build predictive algorithms at the database level. In-database machine learning can greatly accelerate the speed of decision making and help incumbents leapfrog the InsurTechs.
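To illustrate the last point, here is a minimal sketch of scoring inside the decision path, reusing a previously trained fraud model. The file, table, and column names are hypothetical, and SQLite again stands in for the operational database; the idea is that the prediction is written back next to the claim so the application can act on it immediately, rather than waiting on a separate analytics pipeline.

```python
# Sketch: score new claims where the operational data lives, so the prediction
# is available to the claims application immediately. Names are hypothetical.
import sqlite3

import pandas as pd
from joblib import load

model = load("fraud_model.joblib")      # model trained earlier, e.g. as in the previous sketch
conn = sqlite3.connect("insurance.db")  # stand-in for the operational database

new_claims = pd.read_sql(
    "SELECT claim_id, claim_amount, days_to_report, prior_claim_count "
    "FROM claims WHERE fraud_score IS NULL",
    conn,
)

# Probability of fraud for each unscored claim.
scores = model.predict_proba(new_claims.drop(columns=["claim_id"]))[:, 1]

# Write the score back next to the claim so downstream decisions can use it.
for claim_id, score in zip(new_claims["claim_id"], scores):
    conn.execute(
        "UPDATE claims SET fraud_score = ? WHERE claim_id = ?",
        (float(score), int(claim_id)),
    )
conn.commit()
```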
4. Data lakes still plague insurers
In an effort to manage Big Data effectively and to drive real-time analytics and decisions, the insurance sector invested heavily in data lakes. These data lakes were built using commercial Hadoop distributions: a collection of independent open-source compute engines supported in a single platform that could analyze data in different ways. However, the data lake’s schema-on-read functionality led insurance companies to bypass the process of defining which tables contained what data and how they were connected to each other, resulting in a repository built haphazardly.
Data lake projects have begun to fail because insurance companies, like companies in other industries, placed a priority on storing all of the enterprise’s data in a central location and making it available to all developers (an uber data warehouse, if you will), rather than thinking about how the data would power applications. As a result, Hadoop clusters have devolved into gateways for enterprise data pipelines that filter, process, and transform data that is then exported to other databases and data marts for downstream reporting. Data in the data lake almost never finds its way into a real business application in the operating fabric of the enterprise. Instead, the data lake ends up as a massive set of disparate compute engines, operating on disparate workloads, all sharing the same storage, which is very difficult to manage.
Implication
As I discussed in my article, insurance companies are increasingly under pressure to demonstrate the value of their data lakes, but they need to focus on the operational applications first and then work their way back to the data. By focusing on modernizing applications with data and intelligence, insurance companies will be able to develop apps that leverage data to predict what might happen in the future. Insurance companies can then proactively make in-the-moment decisions, without human intervention, that result in superior business outcomes.
5. Regulators have made data governance and ML transparency paramount
Given immense data volumes and diverse data from a large number of sources, the real value of AI and ML can only be achieved if the system is capable of making intelligent decisions at scale without human intervention. However, once achieved, this capability gives rise to the perception of a “black box,” where most business personnel do not fully understand why or how a certain action was taken by the predictive model. Transparency is not just nice to have; it is critical for use cases where the insurance company must be able to document and defend its decisions, such as the denial of a claim or an insurance policy. Regulators will increasingly press insurance companies to explain the inner workings of their predictive models, especially where models are used in underwriting and pricing to determine premiums, to ensure the absence of any discriminatory practices.
Implication
Data governance provides a framework that defines how data is sourced, managed, and used in any ecosystem. This framework is used to enhance the enterprise’s confidence in its data and in the actions taken based on analyzing that data. At a time when the insurance industry is undergoing a major transformation, companies need a robust framework that not only provides visibility into data lineage, the transformations performed on the data, and how it is used, but also encapsulates predictive and machine learning models. Insurance companies must be able to demonstrate to regulators the experiments their data scientists have performed, which model was put into production, and how it was modified over time. Therefore, data governance must be an integral part of the platform that data scientists use to build, train, and operationalize models.
To accomplish this goal, consider a platform that gives data scientists the ability to experiment freely. Data science and building predictive models is an iterative process that requires data scientists to continuously run their models with various features, hyperparameters, and algorithms to assess their impact on the predictive accuracy of the model. To keep track of their experiments, data scientists need a platform like MLflow with the built-in capability to track and document the variables for each run, so that they can objectively demonstrate to internal stakeholders and external regulators the rationale for placing a specific model into production and the absence of any discriminatory practices.
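As a small illustration of what that tracking looks like in practice, here is a sketch using MLflow with scikit-learn. The experiment name is a placeholder and the data is synthetic; each run records its hyperparameters, its holdout metric, and the resulting model artifact, which is exactly the trail a governance review would ask for.

```python
# Sketch: track each modeling experiment with MLflow so runs can be audited later.
# Experiment name and synthetic data are placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

mlflow.set_experiment("claim-fraud-model")

for max_depth in (3, 5, 8):  # one tracked run per hyperparameter setting
    with mlflow.start_run():
        mlflow.log_param("algorithm", "random_forest")
        mlflow.log_param("max_depth", max_depth)

        model = RandomForestClassifier(max_depth=max_depth, random_state=0).fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

        mlflow.log_metric("holdout_auc", auc)
        mlflow.sklearn.log_model(model, "model")  # versioned artifact for audit and governance
```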
If you’d like to learn more about application modernization in the insurance industry, the team at Splice Machine (where I’m CEO and co-founder) has created a white paper that reflects the work we’ve done with some of the world’s leading insurers. You can download it here.