Welcome to the latest edition of the MDS Newsletter! We're excited to announce the launch of our Rocketship Awards 2023, where we recognize the most impactful data tools in the Modern Data Stack community. The awards will celebrate 30 winners, chosen by a jury of top leaders and investors in data, across 30 data categories over 30 days. In addition, check out our latest podcast episode featuring data insights from the team at Shopify. Don't miss out on these exciting updates - read on for more!
Modern Data Show S02 E09
S02 E09: Building Data Pipelines at Shopify: Insights from Marc Laforet, Senior Data Engineer at Shopify: With its widespread popularity and success in the e-commerce industry, it is difficult to imagine anyone who has not at least heard of Shopify. This episode features Marc Laforet, a senior data engineer at Shopify, who shares his journey of how he transitioned from being a biochemist to a data engineer at Shopify. Marc explains the type of data Shopify works with, which is diverse in format and comes from different sources, and how the company determines which tools to build to extract the most value from the data. Marc also discusses data governance and explains two possible architectures: a gating process or a trust-but-verify approach.
Featured tools of the week
- Skyvia: Skyvia is a universal SaaS (Software as a Service) data platform for quickly and easily solving a wide range of data-related tasks with no coding: data integration, workflow automation, cloud data backup, reports and dashboards, data management with SQL, CSV import/export, OData service creation, etc.
- Tinybird: Tinybird helps data teams build real-time Data Products at scale through SQL-based API endpoints. It ingests millions of rows per second and serves low latency, high concurrency analytical queries over any amount of data.
Tinybird has raised a total of $37M in funding over 2 rounds. Its latest funding, a Series A round, was raised on Apr 5, 2022.
Featured stack of the week
- Wellthy: Wellthy is a caregiving support service that provides personalized support to help individuals and families manage the logistical and administrative tasks of caring for themselves or their loved ones who have complex, chronic, and ongoing care needs. With Wellthy, you can access a variety of resources and tools through an online dashboard, allowing you to take control of the care process and streamline the management of appointments, medications, insurance, and other related tasks.
Here are the data tools used by Wellthy:
Good reads and resources
- Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR: What are the challenges that companies face with data quality, and how can the Modern Data Stack (MDS) help? Cicero Moura addresses this topic in his article, which highlights Great Expectations, a tool that helps ensure data quality by letting you define expectations about your data and then checking whether the data meets them. He presents a practical case that uses Great Expectations with Spark to execute test cases, with the Spark environment running on EMR and Airflow orchestrating the jobs. The article is a step-by-step guide to using Great Expectations with Spark and Airflow to ensure data quality.
- How HelloFresh establishes Data Quality with an in-house tool: Have you heard of HelloFresh, a company that provides meal kits and recipes to its customers? Abhishek Khare, one of the key players in the company's data team, identified the need for data quality after realizing its importance in creating data products. With a vision of providing every user of their data platform with easy-to-use, easy-to-understand, and well-integrated data quality tooling, the company developed an Airflow Operator for Data Quality that targets the analytical data space. The tool is distributed and lets users plug in data quality checks at any stage of the data pipeline, using Spark as the compute framework for data quality calculations and soda.io as the open-source data quality framework. Abhishek and the team built the DQ Airflow operator on top of the KubernetesPodOperator and executed DQ checks using Soda on Spark, with results stored in AWS S3. Partnering with a domain team to cover an important dataset with data quality checks helped the company understand user requirements and gather feedback, which allowed them to add features and increase adoption of data quality at HelloFresh.
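Both reads build on the same idea: declare what you expect of a dataset, then check the data against those expectations. Here is a minimal sketch of that idea in plain Python. It does not use the Great Expectations or Soda APIs; the `Expectation` class, the `validate` function, and the sample `orders` rows are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Expectation:
    """A named rule that a single row either passes or fails (hypothetical helper)."""
    name: str
    check: Callable[[dict], bool]


def validate(rows: Iterable[dict], expectations: List[Expectation]) -> Dict[str, List[int]]:
    """Return, for each expectation name, the indexes of the rows that fail it."""
    failures: Dict[str, List[int]] = {e.name: [] for e in expectations}
    for index, row in enumerate(rows):
        for expectation in expectations:
            if not expectation.check(row):
                failures[expectation.name].append(index)
    return failures


# Made-up sample data: the second and third rows each break one rule.
orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -3.0},
    {"order_id": None, "amount": 10.0},
]

expectations = [
    Expectation("order_id_not_null", lambda row: row["order_id"] is not None),
    Expectation("amount_non_negative", lambda row: row["amount"] >= 0),
]

report = validate(orders, expectations)
for name, failed_rows in report.items():
    status = "PASS" if not failed_rows else f"FAIL at rows {failed_rows}"
    print(f"{name}: {status}")
```

In a real pipeline, a check like this would typically run as its own orchestrated task so that a failing expectation can stop or flag downstream jobs; the articles above cover how that is done with Spark and Airflow.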
Upcoming data events, summits and webinars
- The Open Data Science Conference is a leading gathering of professionals and companies that are driving the future of AI and data science. With major conferences in the USA, Europe, and Asia, ODSC brings together attendees, presenters, and speakers who are shaping the industry. The conference features world-class speakers, including core contributors to many open-source libraries and languages, and offers exclusive product launches and interviews. Attendees can also take advantage of the conference's media room. The upcoming conference will be held from May 9th to 11th in Boston.
Don't miss out on this incredible event. Register now.
- The Gartner Data and Analytics Summit invites you to join them in Mumbai, India on May 8th and 9th for a unique opportunity to explore the latest data and analytics solutions for your most pressing challenges. With a focus on innovation and adaptability, attendees will learn how to create and lead organizations that can thrive in the face of disruption. Featuring experts in data science, analytics, and cloud-based data management, this event will equip you with the tools you need to build a resilient and responsive organization. Don't miss out on the chance to scale your purpose beyond organizational silos, optimize costs, and foster societal perseverance. Register now for the Gartner Data and Analytics Summit in Mumbai.
- Spond is hiring Lead Data Engineer
Location: Oslo, Norway
Stack: AWS, dbt, Databricks, Fivetran
- Census is hiring Senior Data Community Advocate
Location: Remote / Philadelphia
Stack: Fivetran, dbt, and Snowflake
- Hygraph is hiring Senior Data Engineer
Location: Remote in EMEA
Stack: Fivetran, dbt, BigQuery
🔥 Trending on Twitter
Just for fun 😀
Are you always hungry for more information and updates about the ever-evolving world of data?
But wait, there's more! We want to hear from you - rate us here and let us know how we're doing.
We welcome any suggestions, articles you would like us to showcase, or data engineering job listings that you may have. Don't hesitate to get in touch with us; we would be delighted to incorporate your input into our next edition.
About Moderndatastack.xyz - We're building a platform to bring together people in the data community to learn everything about building and operating a Modern Data Stack. It's pretty cool - do check it out :)