MDS Newsletter #76

Welcome to our weekly newsletter! 😃 We are thrilled to announce that Episode 3 of Season 2 of "The Modern Data Show" is now out, starring the amazing Gunnar Morling, Senior Staff Software Engineer at Decodable. 🌟 This podcast is loaded with excitement that you just can't afford to miss! 🤩 Don't wait another second: click the link below to dive into the fascinating world of data with us. 🚀

And guess what? 😎 We have tons of new and captivating episodes lined up for you every week! So make sure to stick around for more exhilarating content. 😉

Modern Data Show S02 E03

  • S02 E03: Innovating the Modern Data Stack: Change Data Capture and Beyond with Gunnar Morling, Senior Staff Software Engineer at Decodable: In this episode of Modern Data Show, Gunnar Morling discusses his interest in software engineering and databases and his recent move to Decodable, a real-time stream processing platform based on Apache Flink. He talks about the importance of cohesive data pipelines, from source to sink, and how his work on Debezium led to his interest in stream processing. Gunnar also explains how Decodable provides managed stream processing on Apache Flink: it ingests real-time data streams, processes them, and writes the results to other systems.
  • WhyLabs is an AI observability platform that prevents data quality or model performance degradation by allowing you to monitor your data pipelines and machine learning models in production. The WhyLabs approach to AI observability and monitoring is based on cutting-edge research, but flexibility is a priority and users have plenty of options to customize their implementation to their needs.

    WhyLabs has raised a total of $14M in funding over 2 rounds. Their latest funding was raised on Nov 4, 2021 from a Series A round.
  • Heap is a digital insights platform that helps you understand how and why customers engage with your product. It automatically captures all user interactions in your app, then organizes this data into a simple yet flexible hierarchy for you to answer questions, run experiments, and explore your data to unearth insights.

    Heap has raised a total of $218.1M in funding over 7 rounds. Their latest funding was raised on Dec 7, 2021 from a Series D round.
  • CLARK lets its customers keep a digital eye on all their insurance policies at all times and shows them how individual tariffs can be optimized.

    Here are the data tools of CLARK:

Good reads and resources

  • Modern Data Stack: Which Place for Spark?: In this article, Furcy Pin discusses the place of Apache Spark in the Modern Data Stack. The stack is built around a massively parallel SQL engine such as BigQuery, Redshift, or Snowflake, with dbt for data transformation, and Spark is often absent from it. The author compares BigQuery with Spark and notes that BigQuery is much easier to master, while Spark has a steeper learning curve. The author also praises the structure dbt brings to transformation pipelines, which can easily drift into bad practices without such a tool. Additionally, the article discusses the limitations of SQL and the need for a DataFrame API, similar to PySpark's, for BigQuery, Redshift, or Snowflake. Finally, it highlights the importance of user-defined functions for implementing logic where SQL falls short.
  • DataOps 03: Trino + DBT + Spark - Everything Everywhere All at Once: In this article, Ong Xuan Hong discusses the benefits of combining Trino, DBT, and Spark for the ETL process in data analytics. Ong explains that this combination provides a streamlined and reliable way of processing and analyzing data. Trino connects easily to different data sources, DBT is adaptable for loading and transferring data between systems, and Spark is excellent for transforming large datasets. Together, these tools create an efficient and flexible data pipeline that is suitable for various data analytics tasks.

Upcoming data events, webinars and summits

  • Sharpen your leadership skills, refine your strategies and discover the latest technologies at 'Gartner Data & Analytics Summit' on March 20 – 22, 2023 in Orlando, FL. Join the summit to discover the top trends and technologies you will need to empower the innovative and adaptable organizations of the future while networking with 4,000+ data and analytics leaders.

    Register for the event here.
  • Join the online event 'Enterprise Data World' from March 27 to 31, 2023, which will cover a variety of topics such as Data Governance and Quality, Data Architecture, Data Strategy and much more. The event has been regarded as the biggest global educational event on managing data for the past 26 years. This year's conference will provide in-depth learning from data-driven professionals from all over the globe.

    Register for the event here.

MDS Jobs

  • Leap is hiring Analytics Engineer
    Location: Remote based out of Sydney
    Stack: Fivetran, DBT, Snowflake, Tableau
    Apply here
  • Drata is hiring Senior Revenue Operations Data Analyst
    Location: Remote
    Stack: Fivetran, DBT, Snowflake, AWS
    Apply here
  • Rocket Money is hiring Senior Analytics Engineer
    Location: Washington D.C., Remote (USA)
    Stack: DBT, BigQuery/GCP, Fivetran, Looker, Mode
    Apply here

Just for fun 😀

Subscribe to our Newsletter, Follow us on Twitter and LinkedIn, and never miss data updates again.

What do you think about our weekly Newsletter?

Love it | It's great | Good | Okay-ish | Meh

If you have any suggestions, want us to feature an article, or list a data engineering job, hit us up! We would love to include it in our next edition 😎


About Moderndatastack.xyz - We're building a platform to bring together people in the data community to learn everything about building and operating a Modern Data Stack. It's pretty cool - do check it out :)