Content provided by Benjamin Bourgeois. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Benjamin Bourgeois or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

CDO Matters Ep. 16 | The Death of the Single Version of the Truth with Jeff Jonas

50:40
 

The truth isn’t always black and white. Sometimes, it requires more context and background when applied to different scenarios and situations.

The same can be said about your data and whether a “single version of the truth” can be properly applied to multiple use cases for your business.

In this episode, Malcolm interviews Jeff Jonas, the Founder and CEO of Senzing, a software company on the leading edge of developing “entity resolution” solutions, which solve the growing challenge of uniquely identifying people or objects across multiple systems of duplicate, low-quality data. Malcolm and Jeff discuss how advances in technology are fueling more modern forms of entity resolution, where companies are now able to implement more context-centric approaches to complex matching, particularly within their master data management (MDM) programs.

As technical as “entity resolution” may sound, the two uncover the global effect this technology has on people each day — including the job of the Chief Data Officer (CDO). Also known as “disambiguation” or “fuzzy matching,” effective entity resolution allows software systems or data stewards to decipher whether records for Richard Smith and Dick Smith may represent the same person, even when it is not overtly suggested by the data.
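As a rough illustration of the Richard Smith and Dick Smith case, the sketch below compares two names before and after mapping nicknames to a canonical given name. It is a toy example, not how Senzing or any MDM product works: the `NICKNAMES` table and both function names are hypothetical, and real entity resolution draws on far more than a lookup table and string similarity.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-made nickname table (an assumption for this sketch);
# real systems need far broader coverage across languages and cultures.
NICKNAMES = {"dick": "richard", "rick": "richard", "bob": "robert", "bill": "william"}


def canonical_name(full_name: str) -> str:
    """Lowercase the name and replace known nicknames with a canonical given name."""
    tokens = full_name.lower().split()
    return " ".join(NICKNAMES.get(token, token) for token in tokens)


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 after canonicalizing both names."""
    return SequenceMatcher(None, canonical_name(a), canonical_name(b)).ratio()


# Plain string comparison sees two different names...
raw_score = SequenceMatcher(None, "richard smith", "dick smith").ratio()
# ...while the nickname-aware comparison treats them as identical.
aware_score = name_similarity("Richard Smith", "Dick Smith")
print(raw_score < aware_score)
```

A matching name is only one signal; two different people can both be called Richard Smith, which is why name similarity alone is not enough to declare a match.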

Jeff describes how entity resolution sits as a foundational component of data within MDM, customer relationship management (CRM), know your customer (KYC), supply chain and every other major business process that relies on accurate, trustworthy data.

Jeff correctly notes that, aside from the horrible customer experiences that can arise from a lack of effective entity resolution, “it creates a lot of waste for companies to think you are two or three people instead of one.” Citing the example of a person whose name appears three separate times in a hotel loyalty club database, he emphasizes the toxicity that comes with a lack of focus on entity resolution for companies that are trying to be both customer-centric and data-centric.

While many companies, particularly those already investing in AI/ML, may be attracted to implementing DIY solutions for entity resolution, Jeff notes that it’s “super expensive to build,” especially given the complexity and diversity of data, and even of language itself. The ability to understand meaning across objects, cultures, languages and even alphabets is at the core of reliable entity resolution, and building bespoke solutions for tackling these complex problems at scale is beyond the capabilities or budgets of an overwhelming number of companies.

When considering a “single version of the truth”, Malcolm unpacks the 30-year history of large, monolithic enterprise resource planning (ERP) suites that created the mindset of master data only living in a singular place within the organization. Thanks to the democratization of IT, the “single version” mindset is shifting both as a practicality and as a business need. Today, master data can be sourced from a single location while supporting multiple versions of truth based on the use case of that data.
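One way to picture master data sourced from a single location while serving multiple versions of the truth is a single resolved record exposed through use-case-specific views. The sketch below borrows the “sell to” and “bill to” distinction Malcolm mentions in the episode. The class, field names, and sample values are all hypothetical, and this is a simplification rather than a description of any particular MDM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterCustomer:
    """One resolved entity held in a single place (hypothetical structure)."""
    entity_id: str
    legal_name: str
    sell_to: str  # the party marketing and sales engage with
    bill_to: str  # the party finance invoices


def marketing_view(customer: MasterCustomer) -> dict:
    """Marketing's version of 'the customer': the sell-to party."""
    return {"entity_id": customer.entity_id, "customer": customer.sell_to}


def finance_view(customer: MasterCustomer) -> dict:
    """Finance's version of 'the customer': the bill-to party."""
    return {"entity_id": customer.entity_id, "customer": customer.bill_to}


# Made-up sample record for illustration only.
record = MasterCustomer(
    entity_id="E-001",
    legal_name="Example Holdings Inc.",
    sell_to="Example Retail, Store 12",
    bill_to="Example Holdings Inc., Accounts Payable",
)
print(marketing_view(record))
print(finance_view(record))
```

Both views answer “who is the customer?” differently, yet they share the same `entity_id`, so the organization still agrees on how many entities it has.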

In talking about the evolution of large-scale entity resolution, its use in MDM to enable multiple versions of the truth and the legacy requirement to have data stewards manually review records, Jeff notes, “There are definitely times…when you want a human to take a look and make an adjudication. But, I will tell you, in large-scale systems, you don’t have enough humans.”

Rather than adding more people into data stewardship roles to support higher confidence matching, Jeff advocates the approach of widening the pool of data used by entity resolution processes — beyond just name and address — to make match decisions, including the possible use of third-party data sources.
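The idea of widening the pool of data can be sketched as follows: two records that disagree on name and address may still share other attributes that support a match. The records, field names, and helper function here are made up for illustration, and exact equality on each field is a deliberate simplification of what a real entity resolution engine would do.

```python
def shared_evidence(rec_a: dict, rec_b: dict, fields: tuple) -> list:
    """List the fields on which both records carry the same non-empty value."""
    return [
        field
        for field in fields
        if rec_a.get(field) and rec_a.get(field) == rec_b.get(field)
    ]


# Hypothetical records for the same person captured by two systems.
record_a = {
    "name": "richard smith",
    "address": "12 oak st",
    "email": "rsmith@example.com",
    "phone": "555-0100",
}
record_b = {
    "name": "dick smith",
    "address": "98 elm ave",
    "email": "rsmith@example.com",
    "phone": "555-0100",
}

# Name and address alone offer no agreement between the two records.
narrow = shared_evidence(record_a, record_b, ("name", "address"))
# Adding email and phone surfaces evidence a steward would otherwise have to look up.
wide = shared_evidence(record_a, record_b, ("name", "address", "email", "phone"))
print(narrow, wide)
```

With only name and address, the pair would likely land in a manual review queue; the extra attributes supply the kind of additional data Jeff describes a human reviewer going out to find.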

The last few minutes of the conversation go deep into AI/ML and how these technologies are used to augment human data stewardship processes. Jeff makes a strong case that most stewardship tasks could be largely automated, while noting that many companies are unable to duplicate pure human judgment.

Throughout the conversation, Jeff and Malcolm take extremely complex technical issues and make them digestible and relatable to CDOs, consistently refocusing on how entity resolution is critical to establishing truth in an organization, especially when using MDM systems to manage that truth. CDOs who want to have more informed conversations with their technical staff about the role of entity resolution beyond just “fuzzy matching” will find this episode of CDO Matters highly insightful.

Key Moments

[3:42] Entity Resolution Defined

[7:50] The Impact of Poor Data Quality

[11:23] The Death of the ‘Single Version of the Truth’

[15:45] Entity Resolution Failures

[24:03] Understanding Unique Entities

[26:13] The Cost of Being Wrong

[29:10] Human Intervention vs. Trust in the Algorithm

[32:05] Gaining New Insights with MDM

[35:09] Valuing Human Judgment

[40:03] The Future of Entity Resolution in Digital Transformation

Key Takeaways

What Is Entity Resolution? (1:58)

“Entity resolution is recognizing when two things are the same…it’s [also] called ‘fuzzy record matching’, ‘link detection’, ‘disambiguation’, ‘match/merge’ and lots of names. It’s been congealing into this term ‘entity resolution’ and is being more used. And really, the definition that I would have for it is recognizing when two identities are the same despite being described as different.” — Jeff Jonas

The Cost and Complexity of Efficient Entity Resolution (5:15)

“This problem [with poor entity resolution] is ubiquitous, and it turns out, is super expensive to build. People think you can hire a little team and do some AI/ML and think you are going to match well. And I am telling you, you cannot create something competitive in five years for twenty million.” — Jeff Jonas

Is the ‘Single Version of the Truth’ Dead in 2023? (13:07)

“There are still many people saying you need a single version of the truth…At an operational level [within a company], there are multiple versions of the truth. The way a marketer would define a customer is different than the way somebody in finance may define a customer, particularly B2B, where one would be a ‘sell to’ and the other is a ‘bill to,’ and they are both correct. A lot of people still think that is what MDM is, and it can be that if you want it to be that, but it doesn’t have to be.” — Malcolm Hawker

Working with Multiple Versions of the Truth (14:12)

“I think the ‘single version of the truth’…those are the dark ages, the dark days. The truth is you really want systems that can present truth to the eye of the beholder. It’s about who the recipient is. But there are two forms of truth: One is about separating ‘who is who?’ from which attribute is the best attribute…and how many entities does the organization have? Do you really want marketing to have a different [data] account than finance?” — Jeff Jonas

Human vs. Software: The Role of Human Judgment (29:38)

“There are definitely times/cases in data when you want a human to take a look and make an adjudication. But I will tell you, in large-scale systems, you don’t have enough humans. Second, I will tell you, ‘How does the human do it?’ The human is using additional data. It’s either data stuck in their head or they’re searching it up somewhere…but a lot of times, you have to actually do research…so making these decisions on records with some human intervention is about adding data. And one of the things that we propose is that there are kinds of data that is the initial data needed.” — Jeff Jonas

About Jeff Jonas

Jeff Jonas is not only the CEO and Founder of Senzing but also its Chief Scientist. Since 2016, the organization has provided a fast and easy API for accurate data matching. With more than 30 years in the field, he has been at the forefront of solving big data problems for both companies and governments. National Geographic recognized Jeff for his talents in the data space, referring to him as the “Wizard of Big Data.”

EPISODE LINKS & RESOURCES:

Follow Malcolm Hawker on LinkedIn

Follow Jeff Jonas on LinkedIn

Visit Senzing’s website

Learn more about ‘entity resolution’

View a PDF of Jeff’s publication, Privacy by Design in the Age of Big Data
