Oct 19, 2022 5 min read

MDS Newsletter #56

Translating a business problem into a data problem is a critical part of data engineering, but it is not talked about enough. In this week's newsletter, read why business context is critical for data engineering.

Modern Data Show S01 E06

Understanding Full-Stack Data Observability with Salma Bakouk, Co-founder of Sifflet Data: Data quality issues have existed for as long as businesses have used data to drive their initiatives. ‘Data observability’ as a category is gaining a lot of attention and maturing fast. To understand the evolution and current rise of ‘data observability’, we have Salma Bakouk with us. She and her team are building a tool that helps both data engineers and data consumers navigate data reliability and data quality issues. Listen Now

You can also listen to all the episodes on Apple Podcasts, Spotify, Google Podcasts, YouTube and Amazon Music.

Featured data stack of the week

Cube is the headless business intelligence platform for accessing data from modern data stores, organizing it into consistent definitions, and delivering it to every application. Cube works with all kinds of data sources and delivers data to any BI tool or data app.

Cube Dev has raised a total of $21.7M in funding over 2 rounds. Its latest funding was a Series A round raised on Jul 19, 2021.

Acho is a Cloud Data Hub that helps you process large amounts of data with ease. Acho builds virtualized hubs for you in the cloud so you can build powerful pipelines for your data resources.

Acho has raised a total of $2.4M in funding over 2 rounds. Its latest funding was a Seed round raised on Sep 1, 2021.

Featured data stack of the week

Fanatics is a leading global digital sports platform, with offerings that excite fans and maximize the reach and presence of partners across the entire sports ecosystem. It operates more than 300 online and offline stores, including an e-commerce business with all major professional sports leagues (NFL, MLB, NBA, NHL, NASCAR, MLS, PGA), major media brands (NBC Sports, CBS Sports, FOX Sports) and more than 300 collegiate and professional team properties. Here are the data tools they use.

Good reads and resources

Why Data Contracts are Obviously a Good Idea: Data contracts have become one of the most debated topics in the data space. Arguments range from 'data contracts provide insight into who owns which data products and help teams manage data pipelines with confidence' to 'they are impractical to achieve and impossible to maintain'. In his latest blog post, Yali Sassoon argues that data contracts are a powerful tool for enabling the controlled evolution of data in a business. He argues that when a business creates data, instead of extracting it like 'oil' and then processing it, data contracts become easy to adopt and ensure data integrity. When data is created, data contracts govern how it is created and tested, which gives downstream data processing applications a solid foundation to build on.
The Question That Every Data Engineer Should Ask: Every data engineering problem starts with an ambiguous business problem. Before data engineers start building a data solution, they need to understand the business context of the problem, the teams involved, and both the short-term and long-term goals, so they can build well-defined, optimized data solutions. Translating a business problem into a data problem is a critical part of data engineering, but we don’t talk about it enough. In this article, Xinran Waibel writes about why business context is critical for data engineering.
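To make the data-contract idea from the first read more concrete, here is a minimal sketch of "validate at creation time" in Python. Everything in it is a hypothetical illustration, not the approach from the article itself: the `checkout_completed` event, its fields, the allowed currencies and the `validate` and `create_checkout_event` helpers are all assumptions. The producer checks each event against the contract before emitting it, so malformed data is rejected at the source instead of being cleaned up downstream.

```python
from datetime import datetime, timezone

# Hypothetical contract for a "checkout_completed" event.
# Field names, types and rules are illustrative assumptions only.
CONTRACT = {
    "event_name": str,
    "order_id": str,
    "amount_cents": int,
    "currency": str,
    "created_at": str,
}
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # assumed business rule


class ContractViolation(ValueError):
    """Raised when an event does not satisfy the data contract."""


def validate(event: dict) -> dict:
    """Check an event against CONTRACT and return it unchanged if it passes."""
    missing = CONTRACT.keys() - event.keys()
    if missing:
        raise ContractViolation(f"missing fields: {sorted(missing)}")
    extra = event.keys() - CONTRACT.keys()
    if extra:
        raise ContractViolation(f"unexpected fields: {sorted(extra)}")
    for field, expected_type in CONTRACT.items():
        # Exact type check, so that e.g. True is not accepted as an int.
        if type(event[field]) is not expected_type:
            raise ContractViolation(f"{field} must be {expected_type.__name__}")
    if event["amount_cents"] < 0:
        raise ContractViolation("amount_cents must be non-negative")
    if event["currency"] not in ALLOWED_CURRENCIES:
        raise ContractViolation(f"unsupported currency: {event['currency']}")
    return event


def create_checkout_event(order_id: str, amount_cents: int, currency: str) -> dict:
    """The producer both creates and validates the event, so consumers
    only ever receive data that matches the contract."""
    event = {
        "event_name": "checkout_completed",
        "order_id": order_id,
        "amount_cents": amount_cents,
        "currency": currency,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    return validate(event)


if __name__ == "__main__":
    print(create_checkout_event("A-1001", 2599, "USD")["event_name"])
    try:
        create_checkout_event("A-1002", -5, "USD")
    except ContractViolation as err:
        print(err)  # amount_cents must be non-negative
```

Rejecting a bad event where it is created, rather than repairing it later in the warehouse, is the trade-off the article argues for. A production setup might express the contract in a schema language such as JSON Schema instead of a hand-written dictionary.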

Journal

Running on fumes: Why exhaust data is killing your data team: Data teams spend most of their time fielding requests from around the business, answering specific questions or enabling analysis and automation. Sometimes the data the business needs is not even readily available, so data teams have to go out into the wild and round up whatever data they can find across their estate. This is usually ‘exhaust data’: data generated by a tool or platform for a purpose other than the one your team needs it for. Simply put, it’s a by-product, and data teams spend significant time and effort cleaning and wrangling it. Fortunately, William Stolton writes, there is another way to drive value with data, and it’s called ‘Data Creation’.

Upcoming data events and summits

The modern data stack is betting on metadata. Atlan is hosting a webinar with leaders from Fivetran, dbt Labs, and Snowflake on Wed, Oct 26 to discuss how metadata can help data teams drive the future of data collaboration. Sign up here.
The Sifflet team loves it when software principles are applied to data engineering, so when they heard about an API for data, they were intrigued. Tune in as CEO Salma Bakouk sits down with Andrew Jones, who coined the term and successfully implemented the concept at GoCardless, to dissect the topic. Registration link here.

Data startup funding news

RisingWave Labs, a platform for data stream processing, announced that it raised $36 million in a Series A funding round led by Yunqi Partners, undisclosed corporate investors and angel investors, bringing RisingWave’s total funding to over $40 million.

MDS Jobs

Takeda is hiring a 'Data Platform Engineer'
Location: Boston, Massachusetts
Stack: Databricks and Tableau
Apply here
EnergyHub is hiring a 'Data Engineering Manager'
Location: Remote (US)
Stack: Python, Airflow, Snowflake, dbt, Tableau
Apply here
Qogita is hiring a 'Business Intelligence Analyst'
Location: UK, Netherlands
Stack: dbt, Looker, Snowflake and Fivetran
Apply here

🔥 on Twitter

Just for fun 😃

If you are enjoying this newsletter series, please consider forwarding it to a friend! If a friend sent you this, get the next newsletter by signing up here.

If you have any suggestions, want us to feature an article, or want to list a data engineering job, hit us up! We would love to include it in our next edition 😎

About Moderndatastack.xyz - We're building a platform to bring together people in the data community to learn everything about building and operating a Modern Data Stack. It's pretty cool - do check it out :)