MDS Newsletter #69
Are you tired of sifting through mountains of data to find what you need? Look no further! This week, we're excited to introduce Metaplane and PipeRider, two cutting-edge tools designed to make monitoring and understanding your data stack a breeze. In addition, we'll be diving into articles on the future of the modern data stack in 2023 and best practices for implementing a Medallion architecture. Keep an eye out for MDS jobs, upcoming events, and webinars. Be sure to stay ahead of the curve by subscribing to our newsletter.
Featured tools of the week
- Metaplane: continuously monitors the data flowing through your data stack and alerts you when something may be going wrong. It does this by collecting metrics, metadata, lineage, and logs, training anomaly detection models on historical values, and then sending alerts for outliers, with options to provide model feedback.
Metaplane has raised a total of $8.4M in funding over 3 rounds. Its latest funding, a Seed round, was raised on Jan 10, 2023.
- PipeRider: an open-source data quality toolkit for data professionals. By coupling data profiling with data assertions, PipeRider provides a platform for both understanding your data better and defining what you expect your data to be.
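To make the two ideas above concrete, here is a minimal Python sketch. It is not Metaplane's or PipeRider's actual implementation or API: the function names (`flag_outlier`, `profile_column`, `check_assertions`), the sample data, and the simple z-score rule standing in for a trained anomaly detection model are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_outlier(history, latest, threshold=3.0):
    """Return True if `latest` lies more than `threshold` standard
    deviations from the mean of `history` (a simple z-score rule,
    used here as a stand-in for a trained anomaly model)."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is unusual
    return abs(latest - mu) / sigma > threshold


def profile_column(values):
    """Compute a tiny profile of a column: row count, null count,
    and min/max of the non-null values."""
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def check_assertions(profile, assertions):
    """Return the names of assertions that fail for this profile.
    Each assertion is a (name, predicate) pair."""
    return [name for name, predicate in assertions if not predicate(profile)]


# Hypothetical metric history: daily row counts of a table.
daily_row_counts = [1000, 1020, 990, 1010, 1005]
print(flag_outlier(daily_row_counts, 1008))  # a normal day
print(flag_outlier(daily_row_counts, 120))   # a sudden drop

# Hypothetical column with one missing value.
order_amounts = [19.99, 5.00, None, 42.50]
profile = profile_column(order_amounts)
assertions = [
    ("no_nulls", lambda p: p["nulls"] == 0),
    ("non_negative", lambda p: p["min"] is not None and p["min"] >= 0),
]
print(check_assertions(profile, assertions))
```

The first half covers monitoring: compare today's metric against its own history and alert on outliers. The second half covers profiling plus assertions: describe the data first, then state what you expect it to look like.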
Featured data stack of the week
- Aritzia: Aritzia is a vertically integrated design house with an innovative global platform. They are creators and purveyors of everyday luxury, home to an extensive portfolio of exclusive brands for every function and individual aesthetic. They are about good design, quality materials, and timeless style.
Here are the data tools of Aritzia:
Good reads and resources
- The Future of the Modern Data Stack in 2023: This article by Prukalpa covers 10 big trends for the modern data stack in 2023. The first four are emerging trends that will be a big deal in 2023, and the last six are existing trends that are poised to grow even further.
The trends include a shift towards efficiency and cost-cutting; growth of metadata to reduce costs; "data governance shifting left", meaning data producers will have to document and check data against pre-defined standards before it can go live; increased use of the semantic layer to make data more understandable for business users; the continued growth of CDPs and the emergence of "data activation", which uses data from warehouses to handle CDP functions; growth of cost-management and data optimization tools from independent companies and storage partners; and the introduction of compatible optimization features.
- Medallion architecture: best practices for managing Bronze, Silver, and Gold: In this article, Piethein Strengholt discusses best practices for implementing a Medallion architecture in a data lake, which organizes data in three layers: Bronze, Silver, and Gold. The article first explains how the design of the layers may vary depending on how the data platform is used and whether it is aligned with the source-system side or the consuming side of the architecture. It then gives an overview of each layer, starting with an optional landing area, where data from various source systems is temporarily stored before it moves into the Bronze layer. The Bronze layer is a reservoir that stores data in its natural and original state, without validation, and is typically used for storage and archiving. The Silver layer is used for data integration and governance, and the Gold layer for data consumption and analytics. The article also recommends file formats and partitioning methods for each layer.
- 5 Helpful Extract & Load Practices for High-Quality Raw Data: The article discusses best practices for Extract and Load (EL) in data architectures. The extract and load phase is crucial for determining data quality for transformations and beyond, and robust EL pipelines are necessary for delivering accurate, timely, and error-free data. The article lists 5 practices recommended by data experts that will drive up quality for all data sets, regardless of the tool used. These practices include making each EL run uniquely identifiable, deduplicating data, not flattening data during EL, having an immutable raw level, and not transforming data on ingestion. The article emphasizes that these practices are not set in stone and should be tailored to the specific needs of the project and organization.
By Sven Balnojan
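The last two reads fit together nicely, so here is a small Python sketch of how the layers and the EL practices might look in miniature. Everything in it is an assumption for illustration, not code from either article: the in-memory lists standing in for storage, the function names (`load_to_bronze`, `build_silver`, `build_gold`), the sample order records, and the choice to deduplicate in the Silver step by hashing the raw payload.

```python
import hashlib
import json
import uuid
from collections import defaultdict
from datetime import datetime, timezone


def load_to_bronze(bronze, records, source):
    """Append records untouched, tagging each with a unique run id.
    Nothing is flattened or transformed on ingestion."""
    run_id = str(uuid.uuid4())  # makes this EL run uniquely identifiable
    loaded_at = datetime.now(timezone.utc).isoformat()
    for record in records:
        bronze.append({
            "run_id": run_id,
            "loaded_at": loaded_at,
            "source": source,
            # Nested structure is kept as-is inside the serialized payload.
            "payload": json.dumps(record, sort_keys=True),
        })
    return run_id


def build_silver(bronze):
    """Deduplicate identical payloads and parse them.
    Bronze is only read, never modified (immutable raw level)."""
    seen = set()
    silver = []
    for row in bronze:
        digest = hashlib.sha256(row["payload"].encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same record delivered by an earlier run
        seen.add(digest)
        silver.append(json.loads(row["payload"]))
    return silver


def build_gold(silver):
    """Aggregate for consumption: total order amount per customer."""
    totals = defaultdict(float)
    for order in silver:
        totals[order["customer"]["id"]] += order["amount"]
    return dict(totals)


# Hypothetical source data with a nested customer object.
bronze = []
batch = [
    {"order_id": 1, "customer": {"id": "c1"}, "amount": 10.0},
    {"order_id": 2, "customer": {"id": "c2"}, "amount": 5.0},
]
first_run = load_to_bronze(bronze, batch, source="orders_api")
# A second run re-delivers the first record; Bronze keeps both copies.
second_run = load_to_bronze(bronze, batch[:1], source="orders_api")

silver = build_silver(bronze)
gold = build_gold(silver)
print(len(bronze), len(silver), gold)
```

Because Bronze is append-only and every row carries its run id, the Silver and Gold steps can be rebuilt from scratch at any time, which is the main payoff of keeping the raw level immutable.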
If you also have an interesting blog that you would like us to share with the data community, submit it here.
Upcoming data events, webinars, and summits
- Join "The Great Data Debate", a live, interactive virtual discussion with the founders of the modern data stack on the changing data ecosystem, hosted by Atlan on Tuesday, 24th January at 1:00 pm (EST). It will have two sessions:
Session 1 covers the "Future of the Modern Data Stack"
Speakers include Bob Muglia, Tristan Handy, Prukalpa Sankar, Austin Kronz
Session 2 covers the "Future of Data Culture: ROI, Value & Data Teams"
Speakers include Barr Moses, Benn Stancil, Douglas Laney, Austin Kronz
Register for the event here
- Join the virtual data session "Data Teams Summit 2023" on 25th January from 8:00 am (PST). The topics include data quality, the future of data orchestration, data contracts, going from DevOps to DataOps, and more.
Register for the event here
MDS jobs
- Infinite Lambda is hiring a Senior Analytics Engineer and a Data Architect
Stack: Fivetran, dbt, Snowflake, BigQuery
- Ecosia is hiring a Senior Data Analyst and Data Protection Officer
Stack: dbt, Redshift, Looker, Airflow
- Education Analytics is hiring an Analytics Engineer/Data Engineer
Stack: dbt, Snowflake, Airflow
Subscribe to our Newsletter, Follow us on Twitter and LinkedIn, and never miss data updates again.
If you have any suggestions, want us to feature an article, or list a data engineering job, hit us up! We would love to include it in our next edition 😎
About Moderndatastack.xyz - We're building a platform to bring together people in the data community to learn everything about building and operating a Modern Data Stack. It's pretty cool - do check it out :)