#274 Your Data Platform is a Product, Treat it Like One! - Interview w/ Sean Gustafson
Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn.
Transcript for this episode (link) provided by Starburst. You can download their Data Products for Dummies e-book (info-gated) here and their Data Mesh for Dummies e-book (info gated) here.
Sean's LinkedIn: https://www.linkedin.com/in/seangustafson/
In this episode, Scott interviewed Sean Gustafson, Director of the Data Platform at Delivery Hero.
Delivery Hero has been on the data mesh journey for longer than most organizations, more than 3 years.
Some key takeaways/thoughts from Sean's point of view:
- It's extremely hard but still important to try to impact your culture through things like your data platform. Who are you trying to make information available to? How do you make it accessible? How do you make data ownership easier?
- A key role of the data platform is that golden/easy path. Showing people easy ways to accomplish what they need with data products. Embed best practices into the platform when possible.
- You need a product manager in your data platform team. It's easy-ish to build cool things in data but understanding and building to user needs is harder and a must. Treat your data platform as a product!
- Relatedly, there isn't anything all that special about product management around the data platform. You can take what we've learned from other disciplines - especially software - and tweak it a bit for data. But it's not some arcane art.
- Focus on KPIs around what you are building and why, especially for your data platform. It's very hard to measure developer productivity but that doesn't mean you just don't measure it.
- ?Controversial?: Be prepared to deal with a lot of qualitative data when measuring success around your data platform. Surveys work far better than most might think.
- Good product managers balance the short and long-term. You don't want to make drastic and breaking changes to your data platform often but that doesn't mean you can't take bigger bets and shake things up. Just balance iterative improvements and the bigger picture. Scott note: Zhamak talks about Thomas Kuhn and cumulative progress versus paradigm shifts
- In the same vein, make small bets where small bets will do but don't be afraid to make big bets when necessary.
- ?Controversial?: It will be hard to iteratively change a traditional centralized-focused data platform to do data mesh/decentralized ownership well. You want to at least consider a fresh start when looking at your mesh platform.
- Tools like dbt have given a much broader group of people the ability to model their data. There are inherent problems if they don't do it well but we still need to encourage more people to do data work so we can help them get better and produce great work.
- Data products are a lot like APIs. There are many best practices we can take from APIs and apply them to data products.
- Explaining data mesh to software engineers can be tough. They probably get the concepts given most are just software engineering concepts reconceptualized for data. But the biggest challenge is they will probably see data as a second-class citizen to the underlying back-end systems. Scott note: Unfortunate but extremely common
- In data incident management, e.g. data loss, you have to look at the prioritization but our general historical focus - how much money did we lose - just doesn't make sense in data. We have to take reliability engineering practices from software and tweak them to work with data but we can take incident management essentially as is. We just have to understand prioritization far better.
Sean started with a bit about how he sees his role as leading the data platform team. It is very challenging but still important in his view to try to shape culture even through the data platform. There are so many places in data mesh where there is friction, so how do you make things easier as everyone transitions to product thinking and decentralized ownership? Even with mandates from the top, people still need new ways to accomplish new goals. Make your platform reflect the type of data culture you want. Instill in people the understanding that they can and should participate in your data culture/work. Easier said than done of course.
Relatedly, Sean believes the data platform should show people the right way to do things, give them that easy path where possible. But still give them the freedom to do some aspects … not so right 😅
Sean strongly believes you should treat your data platform as a product. And to do that, you need someone acting as a product manager. It's not rocket science, we know how product management works and it's not very different when it comes to building a data platform. But you need someone specifically focusing on user needs. And part of that role is also to advocate for new features and for using the platform. Just because you built it, that doesn't mean people will use it.
When asked about iterating to good, Sean talked about how in product management, good practice is about making constant and small improvements but also balancing the bigger picture/big bets. It's not always about the big new platform but sometimes, it's okay to shake things up - make small bets when small bets are good enough but make big bets when necessary. But you have to do that by balancing the short-term and long-term picture. Fail fast and iterative improvements are crucial to good product thinking in software and we need to apply that to data. But again, big changes are okay if you properly build to them instead of trying to flip a switch. He specifically mentioned that it will be hard to iterate to a platform that does decentralized ownership well from one that was highly centralized. Not impossible but at least consider building that out more from scratch.
Sean talked about Generative AI and how it's starting to change lots of people's views internally about data. While previously, many software teams were at best reluctant/hesitant to model their data, there is now big interest from the software engineers in directly interacting with the large language models (LLMs). Tools like dbt previously brought many new people to the data party, making it easy to model data - at least structurally - so hopefully GenAI will mean more people learning to model their data. There are inherent challenges but the more the merrier when it comes to people working to produce good data. We just have to make sure they learn how to do it well 😅 Many who are new to data modeling do it… not so well…
When it comes to product management, you need to measure how well you're doing. For Sean, that of course extends to the data platform. While KPIs can be somewhat hard to define around your data platform, that doesn't mean you get to slack off and not measure things. At Delivery Hero, right now they are using surveys to measure a number of things around their data platform rather than trying to measure things automatically without context. It also creates a lot of conversations in the data platform team about what they are trying to do and why, which prevents a lot of waste. It's not perfect but it's getting better. Scott note: this is why I am writing a book on success factors then one on success metrics in data mesh 😅 this is HARD
Sean talked a bit about APIs and how much data products _should_ be treated like APIs. Not just versioning but tracking usage and having users register to use them. There's a lot to learn from how APIs evolved so we don't have to make the same mistakes in data. Scott note: Zhamak comments on this VERY frequently that API approaches are crucial to data mesh
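To make the API comparison a bit more concrete, here is a minimal sketch of what "versioning, tracking usage and having users register" could look like for a data product. This is not how Delivery Hero's platform works; the `DataProduct` class, its methods and the semantic-versioning rule for breaking changes are all hypothetical and only illustrate the idea.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical data product treated like an API (illustrative only)."""
    name: str
    version: str  # assumed to be a semantic version string such as "1.4.0"
    consumers: set[str] = field(default_factory=set)
    usage: Counter = field(default_factory=Counter)

    def register(self, consumer: str) -> None:
        # Consumers sign up before reading, like requesting an API key.
        self.consumers.add(consumer)

    def read(self, consumer: str) -> str:
        # Reject unregistered consumers and count every read per consumer.
        if consumer not in self.consumers:
            raise PermissionError(f"{consumer} is not registered for {self.name}")
        self.usage[consumer] += 1
        return f"{self.name}@{self.version}"

    def release(self, new_version: str) -> list[str]:
        # Assumption: a major version bump signals a breaking change, so
        # return the registered consumers who should be notified.
        old_major = int(self.version.split(".")[0])
        new_major = int(new_version.split(".")[0])
        self.version = new_version
        return sorted(self.consumers) if new_major > old_major else []


orders = DataProduct(name="orders", version="1.4.0")
orders.register("pricing-team")
print(orders.read("pricing-team"))
print(orders.release("2.0.0"))
```

Because consumers register first, the owner knows who depends on the data product and how heavily, which is the information you need before making a breaking change.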
When talking to software engineering people, Sean has found using data terminology, especially data mesh terminology, doesn't really resonate with them. We probably need to come up with new terms - or potentially use the terms Zhamak took from software and just make them about data too instead of inventing new terms. But be prepared for it all to come back to the same obstacle: most software people will see the back-end systems as more important than the data. If you get them over that hump, it's far easier to get them bought in on data mesh. You may be able to win them over by showing them how the data is used internally.
Incident management in data is still pretty nascent in Sean's view. While on the software engineering side there are very well established processes, in data it has often been slapdash at best. No escalation, no prioritization, no formal process, no post mortem + shared learning, etc. The traditional measure for incidents - how much money did we lose - often isn't applicable to data. So we have to rethink what matters and why because our prioritization is often skewed.
Sean wrapped back to the start about how important culture is. Not just getting your organization to be data driven but setting up more and more people for success in your organization through their work with data.
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf