Synthetic Data for Health Insurers

Posted June 27, 2022

Earlier this year, Anthem (or Elevance Health), one of the largest public health insurers, announced that it’s working with Google Cloud to create a synthetic data platform that will generate an amazing 1.5 to 2 petabytes of synthetic data. That would include artificial medical histories, claims data, and more. The synthetic data is designed to validate and train AI systems that ultimately better detect fraud while improving personalized care for members.

That likely had a lot of health insurers asking how they can leverage synthetic data to improve their business and development processes. With that in mind, here’s a short primer on synthetic data for health insurers.

What is synthetic data?

As the name implies, synthetic data is artificially generated data that isn’t created as a result of direct measurement. Typically, synthetic data algorithms or computer simulations generate the data. Synthetic data can take the form of tabular or relational data, text-based data, or image-based data.

Health insurers can create that data in any number of ways. Many organizations have likely created rule-based data, which generally uses basic SQL and/or rules to generate data. For example, many organizations have likely seeded a database with customer data using rules dictating how the data should look. Other approaches include statistical models, which build records similar to real data by incorporating the statistical distributions of the original data. There’s also a machine learning technique called generative adversarial networks (GANs), which learn patterns in input data and automatically generate new data that follows those patterns. Agent-based models can also be used to generate synthetic data.
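
To make the first two approaches concrete, here is a minimal Python sketch, not a production generator. The field names, plan types, value ranges, and sample claim amounts are all made up for illustration, and the statistical model is the simplest possible one: a normal distribution fitted to a handful of numbers.

```python
import random
import statistics

# Hypothetical rule set: these fields and ranges are illustrative assumptions.
PLAN_TYPES = ["HMO", "PPO", "EPO"]


def rule_based_member(rng, member_number):
    """Build one fake member record purely from hand-written rules."""
    return {
        "member_id": f"M{member_number:06d}",
        "age": rng.randint(18, 90),
        "plan_type": rng.choice(PLAN_TYPES),
    }


def fit_claim_model(real_amounts):
    """Summarize real claim amounts as a mean and standard deviation."""
    return statistics.mean(real_amounts), statistics.stdev(real_amounts)


def sample_claim_amount(rng, mean, stdev):
    """Draw one synthetic claim amount from the fitted normal distribution."""
    # Clamp at zero because a claim amount cannot be negative.
    return round(max(0.0, rng.gauss(mean, stdev)), 2)


rng = random.Random(42)  # fixed seed so runs are repeatable

# Rule-based: records follow the rules, with no real data involved.
members = [rule_based_member(rng, n) for n in range(1, 6)]

# Statistical: records follow the shape of a (made-up) real sample.
mean, stdev = fit_claim_model([120.0, 340.5, 89.9, 410.0, 275.25])
claims = [sample_claim_amount(rng, mean, stdev) for _ in range(5)]

print(members[0])
print(claims)
```

Real claim amounts are usually skewed, so a normal distribution is only a stand-in here. A GAN or agent-based model would replace the two sampling functions with something far more involved.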

What are the benefits of using synthetic data?

There are several benefits to using synthetic data. Among them:

It can drive down costs if real data is expensive.
It can improve outcomes if real data is not available. This often happens when you’re creating a new product and don’t yet have data for it.
It can help you work within privacy regulations that limit access to real data and/or require masking techniques.
It can improve artificial intelligence (AI). A lack of data can introduce bias into some AI systems, and synthetic data can fill those gaps. It can also build a better understanding of relatively rare scenarios by enabling larger data sets and greater exploration of those scenarios.

What are some risks of using synthetic data?

As with anything, there are some risks when using synthetic data. Those include:

The quality of the synthetic data depends on the quality of the input data and how developers generate the synthetic data. Poor input data and generation models lead to poor synthetic data.
Because synthetic data is an emerging technology, not everyone in your organization may accept using it.
It could be difficult to replicate real data if you’re generating complex data and relationships.
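
One way to keep an eye on the quality risk is to compare basic statistics of the synthetic data against the real data it was modeled on. The Python sketch below is a deliberately simple check under stated assumptions: the function names and the 10% tolerance are hypothetical, and it only compares the mean and standard deviation of a single numeric field. It says nothing about whether relationships between fields were preserved.

```python
import statistics


def summary_gap(real_values, synthetic_values):
    """Return relative gaps in mean and standard deviation.

    Assumes the real data has a nonzero mean and nonzero spread.
    """
    real_mean = statistics.mean(real_values)
    real_sd = statistics.stdev(real_values)
    synthetic_mean = statistics.mean(synthetic_values)
    synthetic_sd = statistics.stdev(synthetic_values)
    return {
        "mean_gap": abs(synthetic_mean - real_mean) / abs(real_mean),
        "stdev_gap": abs(synthetic_sd - real_sd) / real_sd,
    }


def looks_similar(real_values, synthetic_values, tolerance=0.10):
    """True when both gaps fall within the (arbitrary) tolerance."""
    gaps = summary_gap(real_values, synthetic_values)
    return all(gap <= tolerance for gap in gaps.values())


# Made-up claim amounts, for illustration only.
real = [100.0, 200.0, 300.0, 400.0, 500.0]
close_match = [105.0, 195.0, 310.0, 390.0, 500.0]
poor_match = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]

print(looks_similar(real, close_match))
print(looks_similar(real, poor_match))
```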

When should health insurers use synthetic data?

There are many situations where payers should consider using synthetic data. As we mentioned, if you want to test a system where no live data exists, synthetic data lets you run tests that help determine the efficacy of that system.

Another use case: Testing analytics products. With the rise of third-party analytics platforms that combine data from disparate systems, the ability to create synthetic data to test and implement dashboards can make it easier to trial analytics tools without using live data.

Many payers leverage synthetic data to train machine learning models because the data can be built with a wider range of edge cases than you may find in real data and can be balanced to minimize bias. As a result, you can test your machine learning project faster and likely get it up and running faster, as well. Anthem plans to use its synthetic data to train AI to identify potential fraud. A future use case involves training AI to provide a more personalized member experience. That includes potentially identifying members in need of medical intervention.
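
As a rough illustration of balancing, the Python sketch below pads out a rare class (fraud, in this made-up example) with slightly perturbed copies of existing records until the classes are even. This is a simple stand-in, not how Anthem or any vendor does it. The function name, the record fields, and the 5% jitter are all assumptions.

```python
import random


def balance_with_synthetic(records, label_key, rng, jitter=0.05):
    """Add jittered copies of minority-class records until classes are even."""
    by_label = {}
    for record in records:
        by_label.setdefault(record[label_key], []).append(record)

    # Every class is padded up to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    balanced = list(records)
    for group in by_label.values():
        for _ in range(target - len(group)):
            template = rng.choice(group)
            synthetic = dict(template)
            # Nudge the amount so the new row is not an exact duplicate.
            synthetic["claim_amount"] = round(
                template["claim_amount"] * (1 + rng.uniform(-jitter, jitter)), 2
            )
            synthetic["synthetic"] = True  # flag generated rows for auditing
            balanced.append(synthetic)
    return balanced


# Made-up data: 8 legitimate claims and only 2 fraudulent ones.
records = [{"claim_amount": 100.0 + i, "is_fraud": False} for i in range(8)]
records += [
    {"claim_amount": 9000.0, "is_fraud": True},
    {"claim_amount": 12500.0, "is_fraud": True},
]

balanced = balance_with_synthetic(records, "is_fraud", random.Random(7))
fraud_count = sum(1 for r in balanced if r["is_fraud"])
print(len(balanced), fraud_count)
```

Flagging the generated rows makes it easy to exclude them later, for example when measuring model accuracy against real claims only.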

Synthetic data can also help test your systems. For example, if you can’t access production data, you can leverage synthetic data to test a wider range of possible use cases. Plus, you can increase the volume of data to do volume testing.
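
For volume testing, the useful property is that you can dial the record count up as far as you like. The Python sketch below streams generated claims into CSV form one row at a time, so memory use stays flat as the count grows. The column names, ID formats, and amount range are hypothetical.

```python
import csv
import io
import random

FIELDNAMES = ["claim_id", "member_id", "amount"]


def generate_claims(count, seed=0):
    """Lazily yield `count` fake claim rows; the same seed gives the same rows."""
    rng = random.Random(seed)
    for n in range(1, count + 1):
        yield {
            "claim_id": f"C{n:08d}",
            "member_id": f"M{rng.randint(1, 5000):06d}",
            "amount": round(rng.uniform(10, 5000), 2),
        }


def write_claims_csv(stream, count, seed=0):
    """Write a header plus `count` rows to `stream`; return the row count."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    rows = 0
    for claim in generate_claims(count, seed):
        writer.writerow(claim)
        rows += 1
    return rows


# An in-memory buffer keeps the example self-contained.
# For a real load test you would pass an open file instead.
buffer = io.StringIO()
written = write_claims_csv(buffer, 10_000)
print(written)
```

Raising the count is then a one-argument change, which is the whole point for a volume test.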

Humana even built a Data Exchange for the healthcare industry that contains millions of records built using synthetic data.

Is there software that can help me create synthetic data?

Yes, several companies can assist in developing synthetic data. For example, BetterData, Diveplane, Hazy, Mostly AI, and Tonic all offer some form of synthetic data development.

Certifi’s health insurance premium billing and payment solutions help healthcare payers improve member satisfaction while reducing administrative costs.