Over the past decade, predictive analytics in the insurance industry has evolved from a tentative experiment to a competitive necessity. Any insurer that hopes to see the end of the next decade will need to augment traditional actuarial analytics with strong predictive analytics in marketing, operations, cash flow management, risk assessment, pricing, and claims. Although the Ph.D.s producing sophisticated models may get the headlines, the fact is that effective analytics must start with quality, relevant data. Good data can sometimes compensate for mediocre analysis, but the opposite is never true: Bad data or data implementation will always lead to bad results — no matter how skilled the analyst.

Insurance companies collect vast amounts of data. Policy data, billing information, underwriting data, and claims data are among the most important sources. The analyst's job is to make good use of all the data available within the company. However, this means that the original raw data for analytics is sourced from systems designed for purposes other than analytics. Rarely will a company collect data specifically for use in advanced analytics. In other words, analytics is an opportunistic user of data. The implication is that analytics rarely sets the quality requirements for the data. The concern for analytics is often as much about understanding and evaluating the characteristics and limitations of the data as it is about managing data quality.

Many data quality and usability characteristics have been suggested in the academic literature, but the most important for the analyst to understand are:

• Accuracy – How well does the data element describe the object or fact in the real world?

• Reliability – Is the measurement repeatable and consistent for the object or fact in the real world?

• Timeliness – Is the data measured at the time that a prediction would have been made?

• Completeness – Is the data available for the vast majority of the cases in the database?

• Availability – Will the data be available in the appropriate form and updated at appropriate intervals to meet business needs?

• Permissibility – Will use of the data be legal and comply with regulations?

Information regarding quality and usability measures must be collected and documented for each data source and data element so that the data manager and analyst can determine whether the data in question is fit for use.
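As a concrete illustration, several of these measures can be computed directly when profiling a data element. The sketch below assumes pandas and uses a hypothetical "construction_type" column with an assumed domain of allowable values; it is illustrative only, not a prescribed workflow.

```python
# A minimal profiling sketch for one data element before it is accepted for
# analytic use. The DataFrame, column name, and allowable values are all
# hypothetical examples.
import pandas as pd

policies = pd.DataFrame({
    "construction_type": ["frame", "masonry", None, "frame", "unknown_code"]
})

col = policies["construction_type"]
allowed = {"frame", "masonry", "fire_resistive"}  # assumed domain of allowable values

profile = {
    "completeness": col.notna().mean(),                     # share of non-missing cases
    "out_of_domain": (~col.dropna().isin(allowed)).mean(),  # share of values outside the domain
    "distinct_values": col.nunique(dropna=True),            # cardinality check
}
print(profile)  # document these metrics alongside the data element
```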

Once the data has been determined to be suitable for analytics and available to meet likely business purposes, it must be managed. The usual method is to develop an analytic database that links all the data sources to be used in an analysis. The analyst usually requires a two-dimensional file of historic data consisting of unique keys, one or more target variables (dependent variables), and many potential predictor variables (independent variables). The challenges of managing an analytic database are in some cases very different from, and even counterintuitive compared with, managing databases for other purposes, such as operations or reporting.
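The sketch below shows one way such a two-dimensional analytic file might be assembled: one row per unique key, a target variable, and candidate predictors drawn from several source systems. It assumes pandas, and the table and column names are hypothetical.

```python
# A minimal sketch of assembling the analytic file: policy, billing, and
# claims extracts joined on a shared key. All names are illustrative.
import pandas as pd

policy = pd.DataFrame({"policy_id": [1, 2, 3],
                       "annual_premium": [900.0, 1200.0, 650.0],
                       "territory": ["A", "B", "A"]})
billing = pd.DataFrame({"policy_id": [1, 2, 3],
                        "late_payments": [0, 2, 1]})
claims = pd.DataFrame({"policy_id": [1, 3],
                       "had_claim": [1, 1]})          # source of the target variable

analytic = (policy
            .merge(billing, on="policy_id", how="left")
            .merge(claims, on="policy_id", how="left"))
analytic["had_claim"] = analytic["had_claim"].fillna(0).astype(int)  # no claim record -> 0
print(analytic)
```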

Because there is little or no control over the numerous data sources, the manager of analytic data must be able to deal with asynchronous update schedules for the different data elements needed for model implementation. In addition, since the initial analysis is conducted on historic data, managing the temporal aspect of the data also becomes crucial and may obviate the need for updates to the original analytic data set.
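One common way to handle the temporal aspect is a point-in-time ("as-of") join, so that each historic record sees only predictor values that were already available when the prediction would have been made. The sketch below uses pandas' merge_asof with hypothetical policy and credit-score data purely for illustration.

```python
# A minimal sketch of a point-in-time join: for each as-of date, pick the most
# recent predictor update at or before that date, never a later one.
import pandas as pd

observations = pd.DataFrame({
    "policy_id": [1, 1],
    "as_of": pd.to_datetime(["2023-03-01", "2023-09-01"]),
})
credit_scores = pd.DataFrame({
    "policy_id": [1, 1],
    "updated": pd.to_datetime(["2023-01-15", "2023-07-20"]),
    "score": [690, 720],
})

snapshot = pd.merge_asof(
    observations.sort_values("as_of"),
    credit_scores.sort_values("updated"),
    left_on="as_of", right_on="updated",
    by="policy_id", direction="backward",   # never look into the future
)
print(snapshot[["policy_id", "as_of", "score"]])
```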

Developing an indexing strategy to ensure adequate query performance is more challenging for analytic data. Analytic queries are ad hoc, so it is useless to try to optimize indexes for specific queries. The best that can be accomplished is to understand the types of queries an analyst may require and build indexes to assist the most common. This tends to result in many more indexes being created than for a typical database, and some of the indexes may never be used.
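A speculative indexing pass might look like the sketch below: several indexes on the columns analysts filter on most often, accepting that some may never be used. It uses Python's built-in sqlite3 module, and the table and column names are assumptions for illustration.

```python
# A minimal sketch of speculative indexing for ad hoc analytic queries.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE analytic_base (
    policy_id INTEGER, as_of_date TEXT, territory TEXT,
    line_of_business TEXT, had_claim INTEGER)""")

# Index the columns most likely to appear in WHERE clauses or joins.
for col in ("policy_id", "as_of_date", "territory", "line_of_business"):
    con.execute(f"CREATE INDEX idx_{col} ON analytic_base ({col})")

print(con.execute("PRAGMA index_list(analytic_base)").fetchall())
```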

Data warehouse designs are most commonly based on a star or snowflake schema, also known as a dimensional model. This approach has advantages for query performance and makes it easy for analysts to understand the data. The design pairs independent Second Normal Form dimension tables (often described as de-normalized) with Third Normal Form fact tables. This goes against what many data administrators have been taught as good database design practice. Typical dimensions include multilevel hierarchies such as time, geospatial, geopolitical (demographic), and environmental. A good rule of thumb is to design the initial database to partition large tables, particularly large fact tables, and to age out old data right from the start.
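The sketch below shows the shape of such a design in miniature: flattened dimension tables surrounding a fact table that carries keys and measures. The specific tables and columns are illustrative, not a prescribed schema.

```python
# A minimal star-schema sketch using sqlite3: de-normalized dimensions plus a
# fact table of keys and measures. Names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT,
                          month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE dim_geo     (geo_key INTEGER PRIMARY KEY, zip TEXT,
                          county TEXT, state TEXT);  -- multilevel hierarchy, flattened
CREATE TABLE fact_claims (claim_id INTEGER PRIMARY KEY,
                          date_key INTEGER REFERENCES dim_date(date_key),
                          geo_key  INTEGER REFERENCES dim_geo(geo_key),
                          incurred_loss REAL);
""")
print([row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```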

Building dimensions on a speculative (opportunistic) basis, as opposed to following traditional requirements-gathering practices, implies an agile methodology in which work is done iteratively and changed as necessary. Be transparent. To provide clarity, pair the data with visualization methods that are easily accessible and capable of producing dense, informative graphics that can display many data points at once. Typical visualizations include graphs and charts that portray geospatial, network-relational, and time-relational perspectives.
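As one small example of a time-relational view of this kind, the sketch below plots simulated monthly claim volume over ten years so that many data points can be read at a glance. It assumes matplotlib and uses fabricated data purely for illustration.

```python
# A minimal time-relational visualization sketch with simulated data.
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(120)  # ten years of monthly observations
claims = 100 + 10 * np.sin(months / 6) + np.random.default_rng(0).normal(0, 5, 120)

plt.figure(figsize=(8, 3))
plt.plot(months, claims, marker=".", linewidth=0.8)
plt.xlabel("Month index")
plt.ylabel("Claim count")
plt.title("Time-relational view of claim volume (simulated)")
plt.tight_layout()
plt.show()
```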

Another common requirement is to be able to document system performance, usage metrics, and other key information needed to ensure accuracy, timeliness, and security of the data warehouse. Also, one must consider data privacy and compliance issues. As such, the development of metadata to support these processes is essential. Metadata should be available, current, accurate, and readily accessible. It may include data source, versioning, descriptions and definitions, and locations for access, as well as data quality metrics such as frequencies, ranges, domains (allowable values), and transformation logic.
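A metadata record for a single data element might be structured along the lines of the sketch below, which captures the items listed above (source, versioning, definition, location, quality metrics, transformation logic). The structure itself is an assumption for illustration, not a standard.

```python
# A minimal sketch of a metadata record for one data element.
from dataclasses import dataclass, field

@dataclass
class DataElementMetadata:
    name: str
    definition: str
    source_system: str
    version: str
    access_location: str
    allowable_values: list = field(default_factory=list)
    completeness: float = 0.0          # share of populated records
    transformation_logic: str = ""

meta = DataElementMetadata(
    name="construction_type",
    definition="Primary construction class of the insured building",
    source_system="policy_admin",
    version="2024-06",
    access_location="warehouse.dim_building.construction_type",
    allowable_values=["frame", "masonry", "fire_resistive"],
    completeness=0.97,
    transformation_logic="map legacy codes 01/02/03 to descriptive labels",
)
print(meta)
```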

While a company's internal data is the obvious first place to look for analytic data, publicly available data and third-party data vendors are also important sources. Of course, using third-party data sources will incur a licensing fee, so cost-benefit analysis becomes more important than for internal or publicly available data. The benefit of a third-party data source is the predictive power it provides over and above the data already available. One way to ascertain that benefit is to build the best model possible using only internal data and then add the third-party data: the lift contributed by the third-party data elements is the benefit.
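The sketch below illustrates that comparison: fit a model on internal predictors alone, refit with the external element added, and compare out-of-sample performance. It assumes scikit-learn and simulated data; the difference in AUC stands in for the lift to weigh against the licensing cost.

```python
# A minimal sketch of measuring the lift added by a third-party data element.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
internal = rng.normal(size=(n, 3))   # internal predictors
external = rng.normal(size=(n, 1))   # candidate third-party element
logit = internal[:, 0] + 0.8 * external[:, 0]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_int_tr, X_int_te, X_ext_tr, X_ext_te, y_tr, y_te = train_test_split(
    internal, external, y, test_size=0.3, random_state=0)

auc_internal = roc_auc_score(
    y_te, LogisticRegression().fit(X_int_tr, y_tr).predict_proba(X_int_te)[:, 1])
auc_combined = roc_auc_score(
    y_te, LogisticRegression().fit(np.hstack([X_int_tr, X_ext_tr]), y_tr)
          .predict_proba(np.hstack([X_int_te, X_ext_te]))[:, 1])
print(f"AUC internal only: {auc_internal:.3f}, with third-party: {auc_combined:.3f}")
```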

For those data elements that attain the appropriate incremental predictive power to warrant implementation into an insurer's business processes — and are available and permissible for use — an implementation plan must be developed.

The development of the business implementation rules should involve the users of the results and should match operational and strategic plans. Operational management should help establish monitoring and fine-tuning processes to ensure business goals are being met. Your information technology staff may need to be involved to build data feeds for real-time and batch data transfer. And legal and finance will need to participate in contract negotiations if any third-party data sources are earmarked for implementation.

Communication to all those involved is key. Can you effectively explain to those affected by the inclusion of the new data how it may alter their daily activities? Wherever possible, discuss how the new data may impact them, and also provide explanations that can be used to answer questions from new or existing policyholders.

As for timing, decide where and when you will begin using the new data. Will you use the data on new business only or for renewals? Will you transition your existing book incrementally or all at once? Your rollout doesn't require a full integration. You can achieve results with a phased implementation. This approach allows for testing and refinement before full integration.

After implementation, you may need to fine-tune the process and make adjustments to meet your business needs. Plan to refresh the data according to a predetermined schedule and perform minor recalibrations if necessary. And monitor whether all participants in the implementation are executing the plan properly.
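One common monitoring check, offered here as an illustrative choice rather than the article's prescribed method, is a population stability index (PSI) comparing the score distribution at implementation with the distribution after a data refresh; a large PSI would prompt review and possible recalibration.

```python
# A minimal PSI sketch on simulated scores; a rule of thumb is that
# PSI > 0.25 warrants review.
import numpy as np

def psi(expected, actual, bins=10):
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline_scores = rng.beta(2, 5, 10_000)    # scores at implementation
current_scores = rng.beta(2.3, 5, 10_000)   # scores after a data refresh
print(f"PSI = {psi(baseline_scores, current_scores):.3f}")
```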

To remain competitive, continue to look for new data sources and/or additions to existing data that were previously unavailable, along with new ways to analyze and integrate information. Observe changing market conditions so you can adjust your competitive strategy accordingly.

Developing a data management capability for analytics and effectively implementing the data into your processes requires specialized skills and techniques. It is not easy, quick, or inexpensive, but it is arguably the cornerstone of any serious advanced analytics effort and essential to attaining or maintaining a meaningful competitive advantage.

Phil Hatfield is vice president of product development at ISO Innovative Analytics (IIA). Darlene Pogrebinsky is vice president of product management at ISO Innovative Analytics (IIA). Gerry Gloskin is director of data warehousing at ISO Innovative Analytics (IIA). IIA is dedicated to developing new analytic products and enhancing existing products through strategic alliances with ISO business units, subsidiaries, and client companies.
