MDS Newsletter #57
🪔 Happy Diwali to all! 🪔 Here's wishing you infinite joy and prosperity on this festival of lights.
Modern Data Show S01 E07
S01 E07: Powering real-time Change Data Capture using Fivetran with Mark Van de Wiel, Field CTO at Fivetran. In this episode, Mark Van de Wiel (Field CTO at Fivetran, previously CTO at HVR Software) walks us through the tech architecture behind HVR (acquired by Fivetran in 2021), why HVR merged with Fivetran, how the two technologies complement each other, and how they now power real-time Change Data Capture. We also dig deep into open-source technology and solving the problem of long-tail connectors.
Listen Now
Featured tools of the week
- Debezium is an open source distributed platform for change data capture. Start it up, point it at your databases, and your apps can start responding to all of the inserts, updates, and deletes that other apps commit to your databases. Debezium is durable and fast, so your apps can respond quickly and never miss an event, even when things go wrong.
- 5X is the modern data stack as a managed service that enables companies to answer business questions without having to worry about building data infrastructure or bringing in the right data engineering team. 5X helps companies deploy an end-to-end data strategy.
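To give a feel for what "responding to inserts, updates, and deletes" looks like on the consuming side, here is a minimal Python sketch that applies Debezium-style change events to an in-memory copy of a table. The events are hand-written samples modelled on Debezium's change-event envelope (`before`, `after`, `op`, `ts_ms`), not real connector output, and keying rows by an `id` column is an assumption. In a real deployment the events would typically arrive through Kafka topics rather than a Python list.

```python
import json
from typing import Any

# Hand-written sample events shaped like Debezium's change-event envelope.
# Real events also carry a "source" block and, depending on converter
# settings, a "schema" section; those are omitted here for brevity.
SAMPLE_EVENTS = [
    '{"payload": {"before": null, "after": {"id": 1, "email": "a@example.com"}, "op": "c", "ts_ms": 1000}}',
    '{"payload": {"before": {"id": 1, "email": "a@example.com"}, "after": {"id": 1, "email": "b@example.com"}, "op": "u", "ts_ms": 2000}}',
    '{"payload": {"before": null, "after": {"id": 2, "email": "c@example.com"}, "op": "r", "ts_ms": 3000}}',
    '{"payload": {"before": {"id": 2, "email": "c@example.com"}, "after": null, "op": "d", "ts_ms": 4000}}',
]


def apply_change(replica: dict[int, dict[str, Any]], raw_event: str) -> None:
    """Apply one change event to a replica keyed by the (assumed) 'id' column."""
    payload = json.loads(raw_event)["payload"]
    op = payload["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read and update all carry the new row state in "after"
        row = payload["after"]
        replica[row["id"]] = row
    elif op == "d":
        # deletes carry the old row in "before" and null in "after"
        replica.pop(payload["before"]["id"], None)
    else:
        # other operations (e.g. truncate) are out of scope for this sketch
        raise ValueError(f"unhandled op code: {op!r}")


replica: dict[int, dict[str, Any]] = {}
for event in SAMPLE_EVENTS:
    apply_change(replica, event)
print(replica)  # {1: {'id': 1, 'email': 'b@example.com'}}
```

Because every event carries the full row state, the consumer can rebuild the current table by replaying the stream in order, which is what lets downstream apps "never miss an event".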
Featured data stack of the week
- Rack Room Shoes is an American footwear retailer headquartered in Charlotte, North Carolina, which operates more than 500 stores in 36 states under the Rack Room Shoes and Off Broadway Shoe Warehouse brands. Here's how they have organised their data stack.
Good reads and resources
- How We Enabled Dev and Data Science Independence With Clear API Boundaries Using Airflow and Databricks: Your dev team needs to use a data science algorithm to solve a real business problem, but how can you use this algorithm? Usually, data scientists write in R, Python, or Scala (Spark), and these do not expose a microservice you can consume using a clear API like any other service. So, you will often need someone (Dev/ML Platform) to wrap a data science artifact and expose it for consumption. This, of course, requires time and effort from another team to deploy and maintain artifacts that are actually owned by the data science team. In this post Uri Brodsky writes about how they enabled their data science team to expose these artifacts with a clear API, allowing them to take full ownership of the process from deployment to production.
- A Framework to Understand How Low-Quality Data Hurts Business Performance: Data quality issues unite data practitioners across the industry. The root cause can be as unique as each product or service the data team supports, but the damage to the business's reputation is real either way. Data leaders are responsible for contextualising data quality in a way that makes sense to business stakeholders, within the scope of the business's KPIs and goals. In this post, Kevin Hu writes about how data practitioners should reason about the importance and impact of data quality on businesses. He covers in detail the ways companies use data, a three-part framework for identifying how data quality impacts business performance in your organization, and how to prevent and troubleshoot data quality issues before they impact your business.
- The Anatomy of a Data Product: Data Products make data easy to find, consume, share, and govern. To deliver this value, the data practitioner's job is to make Data Products easy to build, deploy, secure, and manage. In this article, Eric Broda answers two key questions: how are data products designed, and how do they work, so that data becomes easy to find, consume, share, and govern? And what capabilities, APIs, and lifecycle need to be established to make data products easy to build, deploy, secure, and manage?
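To make "prevent and troubleshoot data quality issues before they impact your business" a little more concrete, here is a minimal Python sketch of two common automated checks: a null-rate check and a freshness check. This is not Kevin Hu's three-part framework; the `orders` rows, the `revenue` column, the timestamps and the thresholds are all hypothetical examples chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def null_rate_check(rows: list[dict[str, Any]], column: str, max_null_rate: float) -> CheckResult:
    """Fail if the share of rows with a missing value in `column` exceeds `max_null_rate`."""
    if not rows:
        # an empty table is treated as a failure rather than a silent pass
        return CheckResult(f"null_rate:{column}", False, "no rows")
    nulls = sum(1 for row in rows if row.get(column) is None)
    rate = nulls / len(rows)
    return CheckResult(f"null_rate:{column}", rate <= max_null_rate, f"{rate:.0%} null")


def freshness_check(last_loaded_at: datetime, max_age: timedelta,
                    now: Optional[datetime] = None) -> CheckResult:
    """Fail if the table was last loaded longer ago than `max_age`."""
    now = now or datetime.now(timezone.utc)
    age = now - last_loaded_at
    return CheckResult("freshness", age <= max_age, f"age {age}")


# Hypothetical data: one of four orders is missing its revenue value,
# and the table was last loaded six hours before the check runs.
orders = [
    {"order_id": 1, "revenue": 20.0},
    {"order_id": 2, "revenue": None},
    {"order_id": 3, "revenue": 15.0},
    {"order_id": 4, "revenue": 30.0},
]
check_time = datetime(2022, 10, 24, 12, 0, tzinfo=timezone.utc)
results = [
    null_rate_check(orders, "revenue", max_null_rate=0.10),
    freshness_check(datetime(2022, 10, 24, 6, 0, tzinfo=timezone.utc),
                    timedelta(hours=2), now=check_time),
]
for result in results:
    print(result.name, "PASS" if result.passed else "FAIL", result.detail)
# null_rate:revenue FAIL 25% null
# freshness FAIL age 6:00:00
```

Running checks like these on a schedule, before dashboards and downstream models refresh, is one simple way to catch an issue while it is still a data problem rather than a business one.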
Upcoming data events and summits
- Rivery is hosting a webinar on 'The Data Team's Journey to Positive ROI: Making Analytics Operational' on 8th November at 8 AM PT.
In this live session, join Taylor McGrath (Rivery's VP of Data Labs) for answers to the questions data teams are currently facing:
1. How to show positive ROI
2. What data team changes are required
3. Organizational changes to consider
4. How to truly make analytics a source of growth
Reserve your spot here.
MDS Jobs
- WorkStep is hiring a 'Data Analyst'
Location: USA
Data stack: Fivetran, dbt, BigQuery
Apply here.
- Mammoth Growth is hiring an 'Analytics Engineering Lead'
Location: Remote
Data stack: dbt, Snowflake, BigQuery, Redshift
Apply here.
- Rill Data is hiring a 'Data Engineering Lead, Customer Success'
Location: Remote
Data stack: Snowflake, BigQuery, dbt
Apply here.
🔥 on Twitter
Just for fun
If you are enjoying this newsletter series, please consider forwarding this to a friend! If a friend sent you this, get the next newsletter by signing up here.
What do you think about our weekly Newsletter?
Love it | It's great | Good | Okay-ish | Meh
If you have any suggestions, want us to feature an article, or list a data engineering job, hit us up! We would love to include it in our next edition.
About Moderndatastack.xyz - We're building a platform to bring together people in the data community to learn everything about building and operating a Modern Data Stack. It's pretty cool - do check it out :)