MDS Newsletter #22
It's Wednesday, and we're back with our weekly newsletter, where we bring you the latest happenings in the data space so that you don't miss out on any of them 😃
This week we are launching a cool new section, Community Speaks, where we'll ask an interesting question or ask for your take on trending topics in the data space. All the cool answers will be published here!
Last week's question was: What's the one thing you wish you had known when you first became a data leader?
Here are some interesting answers:
The importance of stakeholder relationship building, data journalism, and internal marketing of data success stories. The mission of the data leader is to build a data culture that helps people make decisions based on data. The trap to avoid is to get stuck in answering data requests as they come in, reserving no time for high-leverage activities and proactive research. It's important to build a flywheel of data activities across the organization that increase data curiosity, data literacy, and the desire to use data to make decisions. Think of it like product-led adoption of data within your organization :)
Stefania Olafsdottir, CEO & Co-Founder, Avo
I wish I had known that you can and must work on being a leader. I did not think much about being a better leader and developing a curriculum for myself at the start of my career. It's much easier to read technical documentation and learn LookML, MLflow, XGBoost for the first time. It's possible to become a better leader and manager, and it requires focus and intentionality the same way learning how to be a data scientist does. The best way I've found is to find people I admire, and adopt something they do into my own style of leadership.
Ian Macomber, Head of Analytics Engineering, Ramp
At first, I didn't fully appreciate how the Modern Data Stack would significantly improve GTM teams' access to product and business data. GTM leaders now expect all business workflows to be built on top of the MDS.
Thomas Schiavone, CEO and Co-founder, Calixa
Data work is never just about the tools. A lot of our conversations these days are focused on the modern data stack, but that's only one side of the story. Turning data into insights requires a deep understanding of the shape and meaning of the data and the business processes that generate it. On the other side, it's absolutely crucial to talk to stakeholders and collaborate with them on key business metrics. Effective data work is as much about understanding the business as it is about using the right tools.
Sam Bail, Staff Data Engineer, Collectors Universe
This week's question: What's the one piece of advice you'd give to companies that have just started building their data teams?
Hit reply on this email!
Featured Category of the week - Augmented Analytics
The term Augmented Analytics was coined by Gartner in 2017 & since then it has gained a lot of traction. So many different tools are built around the concept to solve business problems using data. But what is Augmented Analytics and how does it fit in the Modern Data Stack?
Augmented analytics is the use of enabling technologies such as machine learning and AI to assist with data preparation, insight generation, and insight explanation to augment how people explore and analyse data.
Here's an article and Twitter thread by Stefan Dörfelt, CEO of Kausa, where he talks about Augmented Analytics in detail!
Featured Tools of the week
- Weld enables powerful reporting and data activation out of the box, built on top of your data warehouse, your single source of truth. It helps companies take control of all their data and use it for long-term growth.
Weld is headquartered in Copenhagen, Denmark, and was founded in 2021.
Weld has raised a total of 4.6M in a seed round of funding. The seed round was held on Nov 29, 2021.
- Supergrain is the API-first BI platform that helps data teams manage and integrate metrics across every application.
Headquartered in San Francisco, CA, Supergrain was founded in 2021.
Supergrain has raised a total of $6.8M in funding over 1 round. This was a Seed round raised on Nov 9, 2021.
Good reads and resources
- Recipe for building your first Data Product in a Data Mesh: In 2021, Data Mesh was amongst the trendiest discussions within the data space and its popularity continues to grow in 2022 as well. The decentralized strategy of data mesh distributes data ownership to domain-specific teams that manage, own, and serve the data as a product. Just like the journey of a thousand miles begins with a single step, for a Data Mesh, this journey begins with a single Data Product.
In this article, Tinh Ha has outlined a fine recipe for building your first Data Product. He also gave some amazing points and explained what a typical data product’s architecture on Google Cloud looks like with the help of a diagram.
- Data Warehouse Automation: Data Warehouse Automation (DWA) is no longer a new concept; numerous organizations have already started exploring its capabilities. Due to this rising interest, off-the-shelf tools from various vendors have started to enter the market to meet the DWA demand. In this article, Ganesh Nathan talks in detail about the “what”, “why”, and “how” of Data Warehouse Automation.
- How foodpanda reduced 45% of our BigQuery cost with reservations slots: Foodpanda is an online food and grocery delivery platform owned by Delivery Hero. It is currently the largest food and grocery delivery platform in Asia outside of China, operating in 12 markets across the region. Foodpanda has grown tremendously over the past year, and so has the number of users working on its data warehouse, which is built on BigQuery. This led to a rise in BigQuery costs.
In this article, Brenda Thng discusses how the data team at Foodpanda lowered its BigQuery costs using “reservation slots” over the last 6 months. She also covers in detail the pros and cons of the different BigQuery pricing models, the current strategy at Foodpanda, and some tips on how to get started. A must-read for those who are also building data warehouses on BigQuery.
- Introduction to Change Data Capture (CDC): Change Data Capture can be defined as the process of tracking and identifying changes in a source system so that a downstream system can trigger an action in response to each change. In this article, the writer explains the concept of Change Data Capture in detail, covering how the CDC process works and its benefits.
- Fresher Data Lake on AWS S3: Robinhood’s mission is to democratize finance for all. Continuous data analysis and data-driven decision-making at different levels within Robinhood are fundamental to achieving this.
In this article, Balaji Varadarajan has shared the journey of how Robinhood built its Change Data Capture based incremental ingestion using various open-source tools to reduce the data freshness latency for its core datasets from one day to under 15 minutes. He also described the limitations they had with the big batch ingestion model as well as lessons learned while operating incremental ingestion pipelines at a massive scale.
Data startups funding news
- Arcion Labs raises $13M in Series A round of funding!
Arcion is the only cloud-native, real-time data mobility platform, delivering high-performance, high-volume pipelines.
This round of funding was led by Bessemer Venture Partners. Headquartered in San Mateo, California, Arcion Labs was founded by Rajkumar Sen and Miryana Joksovic.
- Decodable raises $20M in Series A funding
Decodable is a real-time data engineering company. It delivers the first self-service real-time data platform that anyone can run.
This round of funding was led by Venrock and Bain Capital Ventures, with participation from individual investors including DJ Patil, Olivier Pomel, Spencer Kimball, and Jason Forget.
Upcoming data events and summits
- Hevo Data is hosting a virtual event on “Activating Real-time Insights with the Magic of a Modern Data Stack” on 24th Feb 2022.
At this event, you'll gain insights into new trends and real-life use cases of the modern data stack.
- The 2nd Annual 'Data Mishaps Night' is going to be held on 24th Feb 2022 at 7 pm CST.
Listen to data professionals' stories of their data mistakes and the lessons they learnt along the way.
Link to Register.
- Confluent is organising a webinar on 'How Data Mesh Can Transform Your Business' on 1st March 2022.
In this webinar, Confluent will present a conceptual model for thinking strategically about data mesh: what it is, and when to consider it for your business.
Register here for the webinar
- Canva is hiring a ‘Data Warehouse Engineer’
Check out Canva’s data stack here
- Collibra is hiring a ‘Data Engineer’
Location: Remote, UK
Check out Collibra’s data stack here
- Loft is hiring a ‘Senior Data Engineer’
Check out Loft’s data stack here
- Gopuff is hiring a ‘Manager, Analytics Engineering’
Check out Gopuff’s data stack here
- HealthJoy is hiring a ‘Director, Data’
Location: Chicago, IL
Check out HealthJoy’s data stack here
What's 🔥 on Twitter
Just for fun
If you like this newsletter (I know you do 😉), share it with your friends. It will take you 10 seconds to share, but it took us 10 hours to prepare. Send us some love 💖
Have any suggestions, or want us to feature an article or list a data engineering job? Hit us up! We would love to include it in our next edition 😎
About Moderndatastack.xyz
We're building a platform to bring together people in the data community to learn everything about building and operating a Modern Data Stack. It's pretty cool - do check it out :)