Featuring the 6 big ideas you should know from 2021
As the data world slowed down for the holidays, I got some downtime to step back and think about the last year. And I can’t help but think, wow, what a year it’s been!
Is it just me, or did data go through five years’ worth of change in 2021?
It’s partly COVID time, where a month feels like a day and a year at the same time. You’d blink, and suddenly there would be a new buzzword dominating Data Twitter. It’s also partly the deluge of VC money and crazy startup rounds, which added fuel to the year’s data fire.
With so much hype, it’s hard to know which trends are here to stay and which will disappear just as quickly as they arose.
This blog breaks down the six ideas you should know about the modern data stack going into 2022: the ones that exploded in the data world last year and don’t seem to be going away.
You probably know the term “data mesh” by now, even if you don’t know exactly what it means. The idea came from two 2019 blogs by Zhamak Dehghani, Director of Emerging Technologies at Thoughtworks:
- How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
- Data Mesh Principles and Logical Architecture
Its core idea is that companies can become more data-driven by moving from centralized data warehouses and lakes to a “domain-oriented decentralized data ownership and architecture” driven by self-serve data and “federated computational governance”.
As you can see, the language around the data mesh gets complex fast, which is why there’s no shortage of “what actually is a data mesh?” articles.
The idea of the data mesh had been quietly growing since 2019, until suddenly it was everywhere in 2021. The Thoughtworks Technology Radar moved Data Mesh’s status from “Trial” to “Assess” in just one year. The Data Mesh Learning community launched, and its Slack group got over 1,500 signups in 45 days. Zalando started giving talks about how it moved to a data mesh.
Soon enough, hot takes were flying back and forth on Twitter, with data leaders arguing over whether the data mesh is revolutionary or ridiculous.
In 2022, I think we’ll see a ton of platforms rebrand and offer their services as the “ultimate data mesh platform”. But the thing is, the data mesh isn’t a platform or a service that you can buy off the shelf. It’s a design concept with some wonderful ideas like distributed ownership, domain-based design, data discoverability, and data product shipping standards, all of which are worth trying to operationalize in your organization.
So here’s my advice: as data leaders, it’s important to stick to first principles at a conceptual level, rather than buy into the hype that you’ll inevitably see in the market soon. I wouldn’t be surprised if some teams (especially smaller ones) achieve the data mesh architecture through a completely centralized data platform built on Snowflake and dbt, while others leverage the same principles to consolidate their “data mesh” across complex multi-cloud environments.
Metrics are critical to assessing and driving a company’s growth, but they’ve been struggling for years. They’re often split across different data tools, with different definitions for the same metric across different teams or dashboards.
In 2021, people finally started talking about how the modern data stack could fix this issue. It’s been called the metrics layer, metrics store, headless BI, and even more names than I can list here.
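To make the core idea concrete, here is a minimal sketch of what a metrics layer does: a metric is defined once, in one place, and every consumer (dashboard, notebook, sync job) asks the layer to compile that shared definition into a query instead of writing its own SQL. The `Metric` class, the `compile_metric` function and the `orders` table are hypothetical illustrations, not the API of dbt, Transform, Minerva or any other product mentioned in this post.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Metric:
    """One shared definition of a business metric (hypothetical schema)."""
    name: str
    table: str
    expression: str        # SQL aggregate, e.g. "SUM(amount)"
    filters: tuple = ()    # SQL predicates that always apply to this metric
    description: str = ""


# A single registry for the whole company: teams reference metrics by name
# instead of re-deriving them inside each dashboard.
METRICS = {
    "revenue": Metric(
        name="revenue",
        table="orders",
        expression="SUM(amount)",
        filters=("status = 'completed'",),
        description="Value of completed orders",
    ),
}


def compile_metric(name: str, group_by: Optional[Sequence[str]] = None) -> str:
    """Compile a registered metric into SQL so every tool runs identical logic."""
    metric = METRICS[name]
    dims = list(group_by or [])
    select_cols = dims + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {metric.table}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql
```

A dashboard and a notebook that both call `compile_metric("revenue", ["region"])` get exactly the same SQL, so “revenue” can only mean one thing. Real metrics layers add much more (time grains, joins, caching, access control), but the “define once, compile everywhere” principle is the part this sketch is meant to show.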
It started in January, when Base Case proposed “Headless Business Intelligence”, a new approach to solving metrics problems. A couple of months later, Benn Stancil from Mode wrote about the “missing metrics layer” in today’s data stack.
That’s when things really took off. Four days later, Mona Akmal and Aakash Kambuj from Falkon published articles about making metrics first-class citizens and the “modern metrics stack”.
Two days after that, Airbnb announced that it had been building a home-grown metrics platform called Minerva to solve this issue. Other prominent tech companies soon followed suit, including LinkedIn’s Unified Metrics Platform, Uber’s uMetric, and Spotify’s metrics catalog in its “new experimentation platform”.
Just when we thought this fervor had died down, Drew Banin (CPO and co-founder of dbt Labs) opened a PR on dbt-core in October. He hinted that dbt would be incorporating a metrics layer into its product, and even included links to those foundational blogs by Benn and Base Case. The PR blew up and reignited the discussion around building a better metrics layer in the modern data stack.
Meanwhile, a bunch of early-stage startups have launched to compete for this space. Transform is probably the biggest name so far, but Metriql, Lightdash, Supergrain, and Metlo also launched this year. Some bigger names are also pivoting to compete in the metrics layer, such as GoodData with its foray into headless BI.
I’m extremely excited about the metrics layer finally becoming a thing. A few months ago, George Fraser from Fivetran shared an unpopular opinion that all metrics stores will evolve into BI tools. While I don’t fully agree, I do believe that a metrics layer that isn’t tightly integrated with BI is unlikely to ever become commonplace.
However, existing BI tools aren’t really incentivized to integrate an external metrics layer into their tools… which makes this a chicken-and-egg problem. Standalone metrics layers will struggle to convince BI tools to adopt their frameworks, and may be forced to build BI themselves, as Looker was years ago.
This is why I’m really excited about dbt announcing its foray into the metrics layer. dbt already has enough distribution to encourage at least the modern BI tools (e.g. Preset, Mode, ThoughtSpot) to integrate deeply with the dbt metrics API, which will create competitive pressure on the larger BI players.
I also think that metrics layers are so deeply intertwined with the transformation process that this intuitively makes sense. My prediction is that we’ll see metrics become a first-class citizen in more transformation tools in 2022.
For years, ETL (Extract, Transform, Load) was how data teams populated their systems. First, they’d extract data from third-party systems, clean it up, and then load it into their warehouses. This was great because it kept data warehouses clean and orderly, but it also meant that it took forever to get data into warehouses. Sometimes, data teams just wanted to dump raw data into their systems and deal with it later.
That’s why many companies moved from ETL to ELT (Extract, Load, Transform) a few years ago. Instead of transforming data first, companies would send raw data into a data lake, then transform it later for a specific use case or problem.
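The difference between the two patterns is mostly about where and when the transform step runs. The sketch below assumes an in-memory SQLite database as a stand-in for a warehouse or lake and a hard-coded list as a stand-in for a third-party source; the table names and the cleaning rule are hypothetical.

```python
import sqlite3

# Hypothetical raw records pulled from a third-party system (the "Extract" step).
SOURCE_ROWS = [
    {"id": 1, "email": " Alice@Example.com ", "plan": "pro"},
    {"id": 2, "email": "bob@example.com", "plan": None},
]


def clean(row):
    """Example transformation: normalize emails and fill in a missing plan."""
    return (row["id"], row["email"].strip().lower(), row["plan"] or "free")


def run_etl(conn):
    """ETL: transform inside the pipeline, then load only the cleaned rows."""
    conn.execute("CREATE TABLE customers_etl (id INTEGER, email TEXT, plan TEXT)")
    conn.executemany(
        "INSERT INTO customers_etl VALUES (?, ?, ?)",
        [clean(row) for row in SOURCE_ROWS],
    )


def run_elt(conn):
    """ELT: load raw rows as-is, then transform later with SQL in the warehouse."""
    conn.execute("CREATE TABLE raw_customers (id INTEGER, email TEXT, plan TEXT)")
    conn.executemany(
        "INSERT INTO raw_customers VALUES (:id, :email, :plan)", SOURCE_ROWS
    )
    # The transform is defined afterwards, for a specific use case, over raw data.
    conn.execute(
        """
        CREATE VIEW customers_elt AS
        SELECT id, LOWER(TRIM(email)) AS email, COALESCE(plan, 'free') AS plan
        FROM raw_customers
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
```

Both paths end with the same clean table. The difference is that in the ELT version the untouched raw data already sits in the warehouse, so a new use case only needs a new SQL transformation rather than a change to the ingestion pipeline.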
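In 2021, we got another major evolution on this idea: reverse ETL, which moves modeled data out of the warehouse and back into the operational tools (CRMs, marketing platforms, support desks) that business teams use every day. This concept first started getting attention in February, when Astasia Myers (Founding Enterprise Partner at Quiet Capital) wrote an article about the emergence of reverse ETL.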
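Reverse ETL flips the direction of the pipeline: it reads modeled data out of the warehouse and syncs it into an operational tool. Here is a minimal sketch, assuming an in-memory SQLite table as the warehouse model and a hypothetical `FakeCrmClient` in place of a real CRM API; it does not reproduce how Hightouch, Census or any other vendor actually implements syncs. Pushing only changed records is one common design choice, not the only one.

```python
import sqlite3


class FakeCrmClient:
    """Hypothetical stand-in for an operational tool's API (e.g. a CRM)."""

    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, fields):
        self.contacts.setdefault(email, {}).update(fields)


def reverse_etl_sync(conn, crm, last_synced):
    """Push warehouse rows to the CRM, skipping rows unchanged since the last run.

    `last_synced` maps email -> previously synced field values; it is updated
    in place so the next run only sends the differences.
    """
    rows = conn.execute(
        "SELECT email, lifetime_value, plan FROM customer_model ORDER BY email"
    )
    pushed = 0
    for email, ltv, plan in rows:
        fields = {"lifetime_value": ltv, "plan": plan}
        if last_synced.get(email) == fields:
            continue  # nothing changed, so skip the API call
        crm.upsert_contact(email, fields)
        last_synced[email] = fields
        pushed += 1
    return pushed


# Hypothetical modeled table, e.g. the output of a transformation step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_model (email TEXT, lifetime_value REAL, plan TEXT)")
conn.executemany(
    "INSERT INTO customer_model VALUES (?, ?, ?)",
    [("alice@example.com", 1200.0, "pro"), ("bob@example.com", 80.0, "free")],
)
crm, state = FakeCrmClient(), {}
first_run = reverse_etl_sync(conn, crm, state)   # pushes both contacts
second_run = reverse_etl_sync(conn, crm, state)  # nothing changed, pushes none
```

Notice how little separates this from an ingestion job: the same “read from A, diff, write to B” loop with source and destination swapped.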
Since then, Hightouch and Census (both of which launched in December 2020) have set off a firestorm as they’ve battled to own the reverse ETL space. Census announced that it had raised a $16 million Series A in February and published a series of benchmarking reports targeting Hightouch. Hightouch countered with three raises totaling $54.2 million in less than 12 months.
Hightouch and Census have dominated the reverse ETL discussion this year, but they’re not the only ones in the space. Other notable companies include Grouparoo, HeadsUp, Polytomic, RudderStack, and Workato (which closed a $200m Series E in November). Seekwell was even acquired by ThoughtSpot in March.
I’m pretty excited about everything that’s solving the “last mile” problem in the modern data stack. We’re now talking more about how to use data in daily operations than about how to warehouse it. That’s an incredible sign of how mature the fundamental building blocks of the data stack (warehousing, transformation, etc.) have become!
What I’m not so sure about is whether reverse ETL should be its own space or simply be combined with data ingestion tools, given how similar the fundamental capabilities of piping data in and out are. Players like Hevo Data have already started offering both ingestion and reverse ETL in the same product, and I believe we might see more consolidation (or deeper go-to-market partnerships) in the space soon.
In the last couple of years, the debate around data catalogs was, “Are they obsolete?” And it would be easy to think the answer is yes. In a couple of well-known articles, Barr Moses argued that data catalogs were dead, and Michael Kaminsky argued that we don’t need data dictionaries.
On the other hand, there’s never been so much buzz about data catalogs and metadata. There are so many data catalogs that Rohan from our team created thedatacatalog.com, a “catalog of catalogs”, which feels both ridiculous and completely necessary. So which is it: are data catalogs dead or stronger than ever?
This year, data catalogs got a new lease on life with the creation of two new concepts: third-generation data catalogs and active metadata.
At the beginning of 2021, I wrote an article on modern metadata for the modern data stack. I introduced the idea that we’re entering the third generation of data catalogs, a fundamental transformation from the prevalent old-school, on-premise data catalogs. These new data catalogs are built around diverse data assets, “big metadata”, end-to-end data visibility, and embedded collaboration.
This idea got amplified by a big move Gartner made this year: scrapping its Magic Quadrant for Metadata Management Solutions and replacing it with the Market Guide for Active Metadata. In doing this, it introduced “active metadata” as a new category in the data space.
What’s the difference? Old-school data catalogs collect metadata and bring it into a siloed “passive” tool, aka the traditional data catalog. Active metadata platforms act as two-way platforms: they not only bring metadata together into a single store like a metadata lake, but also leverage “reverse metadata” to make that metadata available in daily workflows.
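A toy sketch of this “two-way” idea, assuming a hypothetical in-memory `MetadataLake`: metadata flows in from several tools, and “reverse metadata” flows back out as context attached to the places people already work (here, the text of a chat alert). The class, the helper function and the asset names are illustrations only and do not correspond to Atlan’s or any other vendor’s API.

```python
from collections import defaultdict


class MetadataLake:
    """Hypothetical single store that merges metadata about each asset from many tools."""

    def __init__(self):
        self.facts = defaultdict(dict)   # asset -> {key: value}
        self.sources = defaultdict(set)  # asset -> tools that reported on it

    def ingest(self, asset, source, **facts):
        """Inbound direction: collect metadata (owner, lineage, freshness...) from a tool."""
        self.facts[asset].update(facts)
        self.sources[asset].add(source)

    def context_for(self, asset):
        """Outbound direction ("reverse metadata"): a compact summary to embed elsewhere."""
        facts = self.facts.get(asset)
        if not facts:
            return f"{asset}: no metadata yet"
        details = ", ".join(f"{key}={value}" for key, value in sorted(facts.items()))
        return f"{asset}: {details} (from {', '.join(sorted(self.sources[asset]))})"


def annotate_alert(lake, asset, message):
    """Push context into a daily workflow, e.g. the text of a chat message."""
    return f"{message}\n> {lake.context_for(asset)}"


lake = MetadataLake()
lake.ingest("analytics.orders", "dbt", owner="finance-data", upstream="raw.orders")
lake.ingest("analytics.orders", "warehouse", last_loaded="2021-12-20")
print(annotate_alert(lake, "analytics.orders", "Revenue dashboard looks off today"))
```

The point of the design is that nobody has to open a separate catalog: whoever reads the alert immediately sees the owner, the upstream source and the last load time.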
Since we first wrote about third-generation catalogs, they’ve become part of the discourse around what it means to be a modern data catalog. We even saw the terms pop up in RFPs!
At the same time, VCs have been eager to invest in this new space. Metadata management has grown a ton, with raises across the board: Collibra’s $250m Series G, Alation’s $110m Series D, and our $16m Series A at Atlan, for example. Seed-stage companies like Stemma and Acryl Data also launched to build managed metadata solutions on top of existing open-source projects.
The data world will always be diverse, and that diversity of people and tools will always lead to chaos. I’m probably biased, given that I’ve dedicated my life to building a company in the metadata space. But I truly believe that the key to bringing order to the chaos that is the modern data stack lies in how we use and leverage metadata to create the modern data experience.
Gartner summarized the future of this category in a single sentence: “The stand-alone metadata management platform will be refocused from augmented data catalogs to a metadata ‘anywhere’ orchestration platform.”
Where data catalogs in the 2.0 generation were passive and siloed, the 3.0 generation is built on the principle that context needs to be available wherever and whenever users need it. Instead of forcing users to go to a separate tool, third-gen catalogs will leverage metadata to improve existing tools like Looker, dbt, and Slack, finally making the dream of an intelligent data management system a reality.
While there’s been a ton of activity and funding in the space in 2021, I’m pretty sure we’ll see the rise of a dominant and truly third-gen data catalog (aka an active metadata platform) in 2022.
As the modern data stack goes mainstream and data becomes a bigger part of daily operations, data teams are evolving to keep up. They’re no longer “IT folks” working separately from the rest of the company. But this raises the question: how should data teams work with the rest of the company? Too often, they get stuck in the “service trap”, fielding never-ending questions and requests for stats rather than generating insights and driving impact through data.
In 2021, Emilie Schario from Amplify Partners, Taylor Murphy from Meltano, and Eric Weber from Stitch Fix talked about a way to break data teams out of this trap: rethinking data teams as product teams. They first explained this idea in a blog on Locally Optimistic, followed by great talks at conferences like MDSCON, dbt Coalesce, and Future Data.
A product isn’t measured by how many features it has or how quickly engineers can squash bugs; it’s measured by how well it meets customers’ needs. Similarly, data product teams should be focused on their users (i.e. data consumers throughout the company), rather than on questions answered or dashboards built. This allows data teams to focus on experience, adoption, and reusability, rather than ad-hoc questions or requests.
This focus on breaking out of the service trap and reorienting data teams around their users really resonated with the data world this year. More people have started talking about what it means to build “data product teams”, including plenty of hot takes on who to hire and how to set goals.
Of all the hyped trends in 2021, this is the one I’m most bullish on. I believe that in the next decade, data teams will emerge as one of the most important teams in the organizational fabric, powering the modern, data-driven companies at the forefront of the economy.
However, the reality is that data teams today are stuck in a service trap, and only 27% of their data projects are successful. I believe the key to fixing this lies in the “data product” mindset, where data teams focus on building reusable, reproducible assets for the rest of the organization. This will mean investing in user research, scalability, data product shipping standards, documentation, and more.
Data observability grew out of the idea of “data downtime”, which Barr Moses from Monte Carlo first talked about in 2019: “Data downtime refers to periods of time when your data is partial, erroneous, missing or otherwise inaccurate.” It’s those emails you get the morning after a big project, saying “Hey, the data doesn’t look right…”
Data downtime has been part of normal life on a data team for years. But now, with many companies relying on data for literally every aspect of their operations, it’s a huge deal when data stops working.
Yet everyone was just reacting to issues as they cropped up, rather than proactively preventing them. That’s where data observability, the idea of “monitoring, tracking, and triaging of incidents to prevent downtime”, came in.
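Two of the most common checks behind data observability are freshness (has the table been updated recently?) and volume (did the latest load bring in a plausible number of rows?). Here is a minimal sketch of both, assuming you already have a table’s last-load timestamp and a short history of daily row counts. The 24-hour window and the 3-standard-deviation band are illustrative assumptions, not defaults taken from any particular tool.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def check_freshness(last_loaded_at, now, max_age=timedelta(hours=24)):
    """Flag a table as stale if it hasn't been loaded within `max_age`."""
    age = now - last_loaded_at
    return {
        "check": "freshness",
        "ok": age <= max_age,
        "age_hours": round(age.total_seconds() / 3600, 1),
    }


def check_volume(history, today, z_threshold=3.0):
    """Flag today's row count if it sits more than `z_threshold` standard
    deviations from recent history (needs at least two historical values)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all counts as an anomaly.
        z = 0.0 if today == mu else float("inf")
    else:
        z = (today - mu) / sigma
    return {"check": "volume", "ok": abs(z) <= z_threshold, "z_score": round(z, 2)}
```

In practice these results would feed an alerting or incident-triage step so the data team hears about a stale or half-empty table before the “Hey, the data doesn’t look right…” email arrives.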
I still can’t believe how quickly data observability has gone from being just an idea to a key part of the modern data stack. (Recently, it’s even started being called “data reliability” or “data reliability engineering”.)
The space went from being non-existent to hosting a bunch of companies, with a collective $200m of funding raised in 18 months. These include Acceldata, Anomalo, Bigeye, Databand, Datafold, Metaplane, Monte Carlo, and Soda. People even started creating lists of new “data observability companies” to help keep track of the space.
I believe that in the past two years, data teams have realized that tooling to improve productivity is not a nice-to-have but a must-have. After all, data professionals are some of the most sought-after hires you’ll ever make, so they shouldn’t be wasting their time troubleshooting pipelines.
So will data observability be a key part of the modern data stack in the future? Absolutely. But will data observability continue to exist as its own category, or will it be merged into a broader category (like active metadata or data reliability)? That’s what I’m not so sure about.
Ideally, if you have all your metadata in one open platform, you should be able to leverage it for a variety of use cases (like data cataloging, observability, lineage and more). I wrote about that idea last year in my article on the metadata lake.
That being said, today there’s a ton of innovation that each of these areas needs independently. My sense is that we’ll continue to see fragmentation in 2022 before we see consolidation in the years to come.
It may feel chaotic and crazy at times, but today is a golden age of data.
In the last eighteen months, our data tooling has grown exponentially. We all make a lot of fuss about the modern data stack, and for good reason: it’s so much better than what we had before. The earlier data stack was frankly as broken as broken could get, and this gigantic leap forward in tooling is exactly what data teams needed.
In my opinion, the next “delta” on the horizon for the data world is the modern data culture stack: the best practices, values, and cultural rituals that will help us diverse humans of data collaborate effectively and boost our productivity as we tackle our new data stacks.
However, we can only think about working together better with data after we’ve nailed, well, working with data. We’re on the cusp of getting the modern data stack right, and we can’t wait to see what new developments and trends 2022 will bring!
This article was originally published on Towards Data Science.
Header picture: Mike Kononov on Unsplash