MDS Newsletter #21
I hope you all had a lovely Valentine's Day with your loved ones.
Last week we launched the inaugural version of the MDS Rocketship Awards: an effort to bring the data community together to recognize & celebrate the most impactful data tools and platforms in the world.
We're announcing one winner each day, and the MDS Rocketships so far are:
- Hightouch (Reverse ETL)
- Transform (Metrics Store)
- Immuta (Data Privacy & Governance)
- Monte Carlo (Data Quality Monitoring)
- dbt (Data Modelling & Transformation)
You can get the latest updates on the winners for upcoming data categories on our Awards page & on Twitter (all the winner announcements are added in the thread). Stay tuned!
These awards are judged by an awesome jury panel consisting of top data leaders & investors with diverse professional experience in the data industry.
Now let's dive into this week's edition.
Featured category this week: No-Code Automation
"To automate, one must hire engineers to write code." Thanks to the evolution of no-code automation platforms, this has now become a thing of the past.
No-Code Automation is an umbrella term for the process where primarily non-technical users create novel ways to share data between applications on a scheduled or triggered basis.
Here's an amazing article and tweet thread from Steve West, Co-founder of Phiona, explaining no-code automation and how it fits into the Modern Data Stack.
Featured tools this week
- Dremio is a high-performance SQL (data) lakehouse platform built on an open data architecture that helps to accelerate BI and Analytics directly on cloud data lake storage. Dremio is a fundamentally new approach to data analytics that helps companies get more value from their data, faster. It makes data engineering teams more productive, and data consumers more self-sufficient.
Dremio has raised total funding of $410M across 6 rounds. Its latest round, a Series E, closed on 25th Jan 2022, when Dremio raised $160M.
- Soda is the data reliability and observability company that provides Open Source Software (OSS) tools and a SaaS platform to enable data teams to discover, prioritize, and resolve data issues.
Soda has raised total funding of €14.6M across 6 rounds. Its latest funding was raised on 2nd Feb 2021, in a Series A round.
Good reads & resources
- The Unbundling of Airflow: In the data world, it's hard to figure out what makes a platform, but luckily some tools advertise themselves as such. Airflow is one of them! In this article, Gorkem Yurtseven talks about the unbundling of Airflow: a platform to programmatically author, schedule, and monitor workflows.
- Launching and Scaling Data Science Teams: Three Years Later: Building & scaling a data team is a tough job, and with an ever-changing business environment the difficulty only increases. In this article, Ian Macomber talks about how to launch & scale analytics teams at an organization and shares the lessons he learned while working at Drizly. He explains how the data team there dealt with growing pains and the challenges of a rapidly shifting privacy and security landscape, and supported two due diligence data rooms and an acquisition, all while transitioning to a modern data stack.
- Data Observability vs. Data Testing: Everything You Need to Know: In the last decade, the only way to keep an eye on bad data was testing. That was sufficient then, but not anymore. Due to the constantly growing volume of data that organizations ingest into their data systems, it's no longer feasible to rely on testing alone; you need data observability too.
In this article, Lior Gavish talks about the two different kinds of data quality issues, "known unknowns" & "unknown unknowns", and discusses how they require two distinct approaches: testing and data observability. He gives clear points on when to use "testing" & when to use "data observability", states 4 ways data observability differs from data testing, and gives a much-needed answer to the question, "Do you need both?".
- Scaling reliable data (and human) pipelines: In this podcast episode, Marc Stone talks about the challenges of going from being the only person working on data to running a very large data team. Some of the high points include topics like what private companies can learn from governments and nonprofits, how to think about data pipelines (especially when there's a human in the mix), how to architect your systems for speed and robustness, how to stay ahead of your users' needs, and what the appropriate data team size is for your maturity level.
- Danger Zone: Inconsistent Metrics at Work: Working with inconsistent metrics is like playing chess with a pigeon; it’s meaningless. Besides being meaningless, inconsistency also has severe consequences for your business. It leads to worthless comparisons, erroneous results, and dangerous data cultures where no one can trust the data. In this article, Lauri Hänninen talks about the "what", "why", & "how" of inconsistent metrics and how to achieve consistency by standardizing the metrics to leave that "danger zone".
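For readers who want to see the testing versus observability distinction from the list above in concrete terms, here is a minimal Python sketch. It is our own illustration, not code from Lior Gavish's article. The function names, the `order_id` field, and the 3-standard-deviation threshold are all hypothetical choices. The first function is a data test for a "known unknown" (a rule we thought of in advance). The second is an observability-style check that flags an unusual daily row count without anyone having written a rule for that specific failure.

```python
from statistics import mean, stdev


def check_no_null_ids(rows):
    """Data test: an explicit rule for a failure we already anticipate.

    `order_id` is a hypothetical column name used for illustration.
    """
    return all(row.get("order_id") is not None for row in rows)


def volume_is_anomalous(history, today, threshold=3.0):
    """Observability-style check: flag a daily row count that deviates
    from recent history by more than `threshold` standard deviations.

    The threshold of 3.0 is an assumed default, not a recommendation.
    """
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past counts were identical, so any change is unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Made-up sample data for illustration only.
rows = [{"order_id": 1}, {"order_id": 2}, {"order_id": None}]
history = [1000, 1020, 980, 1010, 990]

print(check_no_null_ids(rows))             # the test catches the null id
print(volume_is_anomalous(history, 1005))  # an ordinary day
print(volume_is_anomalous(history, 200))   # a sudden drop in volume
```

The test only fails for the problem it was written for, while the volume check can surface problems nobody predicted, which is one way to picture why the two approaches are usually combined.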
Latest funding news
- Starburst raised $250M in a Series-D round of funding
Starburst is the analytics engine for the data mesh. It unlocks the value of distributed data by making it fast and easy to access, no matter where it lives.
This round of funding was led by Alkeon Capital with participation from Altimeter and B Capital Group, along with existing investors including Andreessen Horowitz, Coatue Management, Index Ventures and Salesforce VC. Read here.
- Census raised $60M in a Series-B round of funding!
Census is the data automation platform that syncs your data warehouse with the apps you use.
Tiger Global led the round with participation from Insight Partners and previous investors Sequoia and Andreessen Horowitz. Read here.
- PopSQL raised $14M in a Series-A round of funding!
PopSQL is a collaborative SQL editor that lets teams write queries, visualize data, and share results.
This round of funding was led by Tiger Global. Other participants include Gradient Ventures, FundersClub, and Y Combinator. Read here.
Upcoming data stack events & webinars
- Materialize is hosting a Materialize + dbt + Redpanda Virtual Hack Day 2022 on Feb 17th.
The goal is to encourage knowledge-sharing between our communities and give you a sense of what building streaming analytics pipelines with this stack looks like. Register here for the event.
- Thoughtworks is organising a virtual summit on 'State of Data Mesh 2022' on 23rd Feb.
At this event, you'll learn about the current and future state of data mesh, hear lessons from data leaders and technologists who have adopted it, and get the perspective of industry leaders in the space. Register here for the event.
- Dremio is organising Subsurface LIVE Winter 2022 on March 2-3, 2022.
Hear firsthand from technology experts, open-source innovators, data engineers, architects, and more on the trends and strategies propelling today’s cloud data lake ecosystem, including data lakehouses, ETL, orchestration, data quality, and visualization. Register here for the event.
- Snowplow Analytics is hiring an 'Analytics Engineer'
Location: UK, Remote
Stack: dbt, Dataform, Looker
- Knotch is hiring a 'Data Engineering Lead'
Location: New York (Remote Optional)
Stack: Lambda, Redshift, EKS
- GetinData is hiring a 'Data Platform Engineer'
Location: Remote
Stack: Airflow, Python, Spark, Amundsen
- Heap is hiring an 'Engineering Manager, Data Ecosystem'
Stack: Redshift, Snowflake, BigQuery
- Netlify is hiring a 'Director, Data & Insights'
What's 🔥 on Twitter
Just for fun
If you like this newsletter (I know you do 😉), share it with your friends. It will take you 10 seconds to share, but it took us 10 hours to prepare. Send us some love 💖
If you have any suggestions, or want us to feature an article or list a data engineering job, hit us up! We would love to include it in our next edition 😎
About Moderndatastack.xyz
We're building a platform to bring together people in the data community to learn everything about building and operating a Modern Data Stack. It's pretty cool - do check it out :)