MDS Newsletter #24

And it's a wrap!!!

As you know, we launched the inaugural edition of the MDS Rocketship Awards last month, and it concluded yesterday. We loved the response we received from our audience and community, and we'll be bringing the awards back next year. If you haven't already checked out the 27 winners from over 27 different data categories, you can take a look at them here.

We recently added an amazing new section where we ask our subscribers a data-related question every week and post their most interesting answers for everyone else to learn from. If a question resonates with you, feel free to reply with your answer. We'd love to have it 💖

Here it goes👇

Community Speaks

Last week's question: With Women's Day around the corner, as we celebrate and recognize all the wonderful women around us, who are the women in data who have inspired you through their work and achievements?

Here are the amazing answers we received:

I am thinking of:
Ariane Hoffenberg, Head of Analytics Engineering for all Products @ Nubank
I did a random coffee and was blown away by her visionary strategy for data combining perfectly technical scalability & business impact
Virginie Cornu, VP Data @ Jellysmack
I tried to sell Castor to her, and although she hasn't bought (yet), I am impressed by the pace at which she built the data team at Jellysmack. It takes some amazing energy.
Xavier de Boisredon, Co-founder and COO, Castor

A few more I admire 🤩: Emilie Schario, Hilary Mason, Maura C, Crystal Widjaja, Elena Dyachkova, Claire Carroll, Gwen Windflower, Renee Teate, Laura Ellis 🚀💪
Stefania Olafsdottir, CEO & Co-Founder, Avo

I also want to recognize Emilie Schario, Claire Carroll, Gwen Windflower, and Meghan Cassidy, who are just a few of the amazing women that inspire me on the reg!
Josh Devlin, Senior Analytics Engineer, Brooklyn Data

Love this topic! I’d say Kristine Cristobal who is one of the best data engineers I ever worked with. She’s so smart, and humble for someone who is constantly 10 steps ahead! She’s also always learning something new, and teaches us all her cool data tricks!
Dina Mohammad-Laity, VP Data, Feeld

This week's question: How would you describe data jobs in one word?

You can send answers by replying to the newsletter email or using the 'contact us' section on our website.

Have you ever used pivot tables to get an answer to your business question?
If yes, you've already had a flavor of spreadsheet-based BI.

Do you remember the last time you or a team member extracted a CSV file from your back office, CRM or another system, dumped it in Excel, and did a pivot table to answer a question you had? Well, that’s spreadsheet-based BI. You used Excel (a spreadsheet) to analyze data and answer key business questions (that’s Business Intelligence).
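For readers who prefer code to spreadsheets, here is a minimal sketch of that same CSV-to-pivot-table workflow using only the Python standard library. The sample data, the column names (region, product, amount), and the helper `pivot_sum` are all made up for illustration; they are not taken from the article or from any particular tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CRM export; in practice this would be a CSV file on disk.
CSV_EXPORT = """region,product,amount
EMEA,Basic,100
EMEA,Pro,250
AMER,Basic,300
EMEA,Basic,50
AMER,Pro,400
"""


def pivot_sum(rows, index, columns, values):
    """Sum `values` for every (index, columns) pair, like a spreadsheet pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[columns]] += float(row[values])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(cells) for key, cells in table.items()}


rows = list(csv.DictReader(io.StringIO(CSV_EXPORT)))
pivot = pivot_sum(rows, index="region", columns="product", values="amount")

# One line per region, with the summed amount per product.
for region in sorted(pivot):
    print(region, pivot[region])
```

The business question being answered here is "how much did each region sell of each product?", which is exactly what dragging region to rows and product to columns does in a spreadsheet.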

Read this article and Tweet thread by Jonathan Parisot, CEO of Actiondesk, to learn more.

  • Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python.

    Category: Data Lake

    Founded in 2019, Delta Lake is a Linux Foundation project.
  • Hevo is an Automated Unified Data Platform that helps companies understand their users and customers better. Using Hevo, companies can build a 360-degree view of their customers by combining data from multiple disparate data sources and applications including sales CRM, advertising channels, marketing tech, financial system software, and customer support products.

    Category: ETL Tool

    Hevo Data has raised a total of $43M in funding over 4 rounds. Its latest funding, a Series B round, was raised on December 17, 2021.

Good Reads and Resources

Your data ingestion strategy is a key factor in data quality: Data ingestion is one of the most crucial and time-consuming tasks. Data teams spend almost 25% of their week on it. That's quite a lot, right? To get some of that time back, you have to solve a crucial issue with the incoming data: data quality. Catching quality issues early helps prevent bad data from being ingested, which in turn reduces the time the process takes. But this gets difficult due to the technological barriers of pipeline tools such as Airflow. So how can you overcome this?

In this article, John Blust has shared a data ingestion strategy and framework designed to help you wrestle more of your time back, and keep out bad data for good.
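To make the idea of keeping bad data out at ingestion time concrete, here is a minimal sketch of a validation gate in Python. It is not John Blust's framework; the record fields (order_id, amount), the rules, and the names `check_record` and `ingest` are hypothetical and exist only to illustrate the general pattern of accepting clean records and setting aside the rest along with the reasons.

```python
from dataclasses import dataclass, field


@dataclass
class IngestionResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reasons) pairs


def check_record(record):
    """Return a list of data quality problems; an empty list means the record is clean."""
    reasons = []
    # Hypothetical rules for an illustrative "orders" feed.
    if not record.get("order_id"):
        reasons.append("missing order_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        reasons.append("amount must be a non-negative number")
    return reasons


def ingest(records):
    """Split a batch into records safe to load and records to quarantine."""
    result = IngestionResult()
    for record in records:
        reasons = check_record(record)
        if reasons:
            result.rejected.append((record, reasons))
        else:
            result.accepted.append(record)
    return result


batch = [
    {"order_id": "A1", "amount": 19.99},
    {"order_id": "", "amount": 5},
    {"order_id": "A3", "amount": -2},
]
result = ingest(batch)
print(len(result.accepted), "accepted;", len(result.rejected), "rejected")
```

Keeping the rejection reasons next to each rejected record is a deliberate choice in this sketch: it makes it easier to report problems back to whoever owns the source system.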

7 Antifragile Principles for a Successful Data Warehouse: Data warehouses have long been associated with rigid processes, slow adaptability, and high costs, but that is a misconception. The real culprit is poor execution that doesn't support changing business needs. What's the solution?

In this article, Iliana Iankoulova shares 7 antifragile principles for a successful data warehouse. She discusses in detail how to get the best of two worlds: the structure and quality of a centralized data warehouse combined with the agility of antifragile practices, so that your data warehouse stays in sync with your evolving business.

Why Data Engineers Must Have Domain Knowledge — And How To Gain It: The data job market is booming, and if you are in a data role or aspiring to one, you probably know this. Data scientists, data engineers, and data analysts are in high demand across organisations, which offer handsome salaries to attract the best talent. But these roles often focus only on a candidate's technical aptitude and don't take domain knowledge into account. Without it, you miss the larger context of the business problem you are tasked with solving. You become just someone who codes, and you miss the opportunity to be an asset to your organization or to earn a promotion, internally or externally.

In this article, Zach Quinn explains why developing business knowledge is as important as developing technical skills.

Building a Data Lake on Google Cloud Platform: Big data is a discipline that deals with methods for analyzing, methodically extracting information from, or otherwise dealing with data volumes that are too massive or complicated for typical data-processing application software to handle. Handling the data generated by modern applications requires big data techniques.

In this article, Md Hishaam Akhtar provides a tutorial on how to create a data lake that reads any changes from an application's database and writes them to the relevant place in the data lake using Debezium, MySQL, Apache Kafka, Apache Hudi, and Apache Spark.

The Paradigm Shift of Business Models in the Data Space is Real: 2021 was a good year from the data industry's perspective. We saw an explosion of startup investment in no-code and low-code data platforms as well as in open-source projects in the data space, which together largely define the modern data stack. Now it is time to dive a little deeper and understand the dynamics of those markets. In this article, Florian Grüning highlights the reasons for these booms and draws attention to the problems in these markets, ultimately posing the question: are the days of the classic SaaS model in the data space numbered?

Upcoming Data Events and Summits

  • Great Data Mind is organizing the "3rd Annual Technology Matters Marathon" on March 10, 2022.

    At this virtual event, companies will present an overview of their business, demonstrate their solution, and share upcoming roadmap enhancements.

    Register here
  • Monte Carlo Data is organizing IMPACT TOUR: The Data Observability Hybrid Event. This will be a 3-part virtual keynote series on March 10, March 31, and April 27, 2022, followed by an in-person city tour.

    The event celebrates the release of O'Reilly's Data Quality Fundamentals. Industry experts will talk about how they're tackling the biggest challenges in data, from building more reliable stacks to hiring top talent for your team.

    To learn more about the event, click here
  • Vertica is organizing a virtual roundtable "Going Beyond a Data Lakehouse" on March 10th, 2022.

    At this event, the speaker will discuss how companies are using new technologies to unify different analytics approaches, different teams of analysts, and different architectures in ways that go beyond what most data lakehouses can deliver.

    Register here

Funding and Acquisition News

  • Snowflake Acquires Streamlit
    Snowflake, a Data Cloud company, acquired Streamlit, a framework built to simplify and accelerate the creation of data applications, for $800 million.

    Snowflake launched in 2012 and raised $1.4 billion before going public in September 2020. Streamlit launched in 2019 and raised $62 million.
    Read here
  • Snyk Acquires TopCoat
    As an additional element of the existing Snyk Developer Security Platform, the TopCoat product will provide crucial core reporting and analytical capabilities. Within the data ecosystem, TopCoat will shift from “vendor” to “contributor.”

    Snyk was founded in 2015 and has raised a total of $1.4B in funding over 11 rounds. TopCoat Data was launched in 2019.

MDS Jobs

  • Wellthy is hiring a 'Senior Data Analyst'
    Location: US
    Apply here
    Check out Wellthy's data stack here
  • Health Joy is hiring a 'Director, Data'
    Location: Chicago, IL
    Apply here
    Check out Health Joy's data stack here
  • Ramp is hiring a 'Senior Analytics Engineer'
    Location: New York, Remote
    Apply here
  • SKIMS is hiring a 'Data Analyst'
    Location: Los Angeles, CA (Hybrid)
    Apply here
  • CVS Health is hiring a 'Data Engineering – Senior Manager'
    Location: Irving, TX
    Apply here

What's 🔥on Twitter


Just for fun

Do you feel the same???

If you like this newsletter (I know you do 😉), share it with your friends. It will take you 10 seconds to share, but it took us 10 hours to prepare. Send us some love 💖

If you have any suggestions, or want us to feature an article or list a data engineering job, hit us up! We would love to include it in our next edition 😎


About Moderndatastack.xyz

We're building a platform to bring together people in the data community to learn everything about building and operating a Modern Data Stack. It's pretty cool - do check it out :)