Detecting insurance fraud through graph algorithms

Graph algorithms can help insurance providers to uncover hidden relationships between potential fraudsters.


Insurance fraud is an enormous problem, not only for insurance companies but also for law enforcement, banks and other financial institutions. According to some estimates, fraud accounts for 10% to 20% of insurance losses. The FBI estimates that the total annual cost of insurance fraud exceeds $40 billion, and some other estimates exceed $80 billion. Consumers end up paying for fraud through higher premiums, at a cost of $400 to $700 per year for the average U.S. family.

The vast majority of insurance providers have dedicated fraud investigation teams in place, and yet many insurers haven’t begun to leverage modern technology to detect instances of fraud. As recently as 2019, only 1 in 5 insurers planned to implement artificial intelligence for fraud detection over the course of the following two years.

The benefits of big data analytics, artificial intelligence and machine learning are obvious. These tech tools can process much more information than human teams can handle, and over time they can even teach themselves to become better at spotting suspicious behavior.

In particular, graph algorithms are extremely useful for analyzing insurance claim data. A graph is a data structure consisting of vertices (various data points) and edges (relationships between those data points), and a graph algorithm is a procedure that operates on that structure. Graphs are useful in fields like social media and transportation, where they help organizations better understand the relationships and interactions between users and vehicles, respectively. They can also help financial firms and insurance companies identify instances of fraud more accurately.
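To make the idea of vertices and edges concrete, here is a minimal sketch in Python of how claim data might be turned into a graph. The claim records, names and the build_graph helper are all hypothetical and invented for illustration; a real insurer would more likely use a graph database or a dedicated graph library.

```python
from collections import defaultdict

# Hypothetical claim records: (claim_id, person, role). All values are made up.
CLAIMS = [
    ("C1", "alice", "driver"),
    ("C1", "bob", "passenger"),
    ("C2", "bob", "driver"),
    ("C2", "carol", "witness"),
    ("C3", "dave", "driver"),
]


def build_graph(records):
    """Return an undirected adjacency map linking people to the claims they appear on."""
    adjacency = defaultdict(set)
    for claim_id, person, _role in records:
        claim_vertex = ("claim", claim_id)
        person_vertex = ("person", person)
        # One edge per involvement, stored in both directions.
        adjacency[claim_vertex].add(person_vertex)
        adjacency[person_vertex].add(claim_vertex)
    return adjacency


graph = build_graph(CLAIMS)
# Every claim that "bob" is connected to:
print(sorted(graph[("person", "bob")]))
```

In this sketch, people and claims are both vertices, and each involvement of a person in a claim is an edge. Asking for a vertex's neighbors is then enough to see every claim a given person has touched.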

In a way, graphs seem tailor-made for detecting insurance fraud, which is often perpetrated by loosely connected criminal networks. For instance, in states with no-fault auto insurance (which allows policyholders to recover losses from their own insurance company, regardless of who was at fault in an automobile accident), dishonest attorneys, medical providers, repair shops and others may “pad” costs for legitimate claims. In other cases, criminal teams will stage completely fake accidents — complete with fake drivers, fake passengers, fake pedestrians and fake witnesses. To mask their criminal activity, these fraudsters will sometimes change roles — playing the driver in one scam, a pedestrian who’s been hit in another and a witness in yet another.

These fake accidents mostly result in relatively small claims. And since minor motor vehicle accidents happen many times every day in real life, any single incident is unlikely to raise red flags. But by connecting the dots through graph algorithms that identify “central” actors or discover certain structures in graphs, insurers can find the hidden relationships between multiple accidents and begin to see patterns that might indicate fraud.
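As a rough illustration of what identifying “central” actors and discovering structure can mean in practice, the sketch below computes degree centrality (the share of other people each person is linked to) and connected components (groups of people tied together by shared accidents). The accident data and function names are hypothetical, and these two measures are only simple stand-ins for the wider family of algorithms an insurer might apply.

```python
from collections import defaultdict, deque

# Hypothetical data: accident id -> people involved. All values are made up.
ACCIDENTS = {
    "A1": ["p1", "p2", "p3"],
    "A2": ["p3", "p4"],
    "A3": ["p3", "p5", "p1"],
    "A4": ["p6", "p7"],
}


def co_involvement_graph(accidents):
    """Link two people whenever they appear in the same accident."""
    adjacency = defaultdict(set)
    for people in accidents.values():
        for person in people:
            adjacency[person].update(p for p in people if p != person)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other vertices that each vertex is directly connected to."""
    n = len(adjacency)
    if n <= 1:
        return {v: 0.0 for v in adjacency}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adjacency.items()}


def connected_components(adjacency):
    """Group vertices into components using breadth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            vertex = queue.popleft()
            component.add(vertex)
            for neighbor in adjacency[vertex]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


people_graph = co_involvement_graph(ACCIDENTS)
centrality = degree_centrality(people_graph)
components = connected_components(people_graph)
most_central = max(centrality, key=centrality.get)
```

With this made-up data, the person who appears in the most accidents alongside the most other people rises to the top of the centrality ranking, and the accidents split into separate clusters of connected people. Neither result proves fraud; each only points analysts toward groups worth a closer look.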

Visualization is one of the key benefits of representing data as a graph. A graph makes it easy for data analysts to see all of the different relationships among various actors, and then dig deeper into an incident or a group of incidents once they notice a suspicious pattern. This visual element can also be useful when reporting suspicious activity to executives.

While other forms of data analytics can also detect instances of fraud, many machine learning algorithms act as “black boxes”: they produce predictions but do not always give analysts the context they need to understand why a given claim is likely to be fraudulent. The visual component of graphs lets analysts immediately see the relationships between various parties in a scam, and also gives them the information they need to effectively escalate the case within their organization.

Additionally, analysts can tag vertices and edges in graphs with metadata, incorporating factors like age, how many times a person has been involved in an accident and any other information that is deemed relevant.
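One way to picture this kind of tagging is sketched below, with metadata stored on both vertices (a person's age and accident count) and edges (the role a person played in a claim). The Person and Involvement classes, the field names and the sample data are assumptions made for illustration, not a description of any particular product. The sketch also uses the role tags to flag people who have played more than one role across claims.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Vertex metadata (hypothetical fields)."""
    name: str
    age: int
    accident_count: int = 0


@dataclass(frozen=True)
class Involvement:
    """Edge metadata: which role a person played in which claim."""
    person: str
    claim_id: str
    role: str


# Made-up sample data.
PEOPLE = {
    "erin": Person("erin", age=34),
    "frank": Person("frank", age=51),
}
EDGES = [
    Involvement("erin", "C10", "driver"),
    Involvement("erin", "C11", "pedestrian"),
    Involvement("erin", "C12", "witness"),
    Involvement("frank", "C10", "passenger"),
]


def annotate(people, edges):
    """Fill in accident_count and return the distinct roles each person has played."""
    claims = {}
    roles = {}
    for edge in edges:
        claims.setdefault(edge.person, set()).add(edge.claim_id)
        roles.setdefault(edge.person, set()).add(edge.role)
    for name, person in people.items():
        person.accident_count = len(claims.get(name, ()))
    return roles


roles = annotate(PEOPLE, EDGES)
# People who have appeared in two or more different roles across claims.
switchers = sorted(name for name, played in roles.items() if len(played) >= 2)
```

The threshold of two roles is an arbitrary choice for the example. In practice an analyst would tune it and weigh it alongside other metadata.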

In a recent white paper, one leading graph database provider laid out an example scenario of how powerful graphs can be in helping insurance companies to detect fraud. In this scenario, criminal rings made up of doctors, lawyers, body shops and accident participants collude to stage “paper collisions” that result in soft tissue injuries. These types of claims are favored by fraudsters because they’re hard to validate and expensive to treat. If ten people stage five false accidents, the white paper estimates, the fraud ring can generate up to $1.6 million in injury and automobile damage claims.


Not even the most sophisticated data analytics solution will be able to completely eliminate fraud in the insurance industry. But graph algorithms are a powerful tool that can help analysts to spot the relationships that form the foundation of insurance fraud.

Julian Shun, lead instructor of the Graph Algorithms and Machine Learning course at Massachusetts Institute of Technology (MIT) Professional Education, is an associate professor of Electrical Engineering and Computer Science at MIT and a lead investigator in MIT’s Computer Science and Artificial Intelligence Laboratory. His research focuses on the theory and practice of parallel algorithms and programming, with particular emphasis on designing algorithms and frameworks for large-scale graph processing and spatial data analysis. He also studies parallel algorithms for text analytics, concurrent data structures and methods for deterministic parallelism.

The opinions expressed here are the author’s own.
