“Data Mesh” has become a buzzword in the world of data management, especially for organizations that handle large volumes of data as part of their digital transformation. In a nutshell, data mesh is a new approach to data architecture and governance. It empowers cross-functional teams across the organization to own and manage their own data domains as products, in a decentralized way. At first glance, it may not be clear what data mesh is or what purpose it serves. So first, let’s look at some of the challenges in traditional data architecture, because those challenges are what led data experts to start thinking about a new one.
Traditional Data Architecture and its Challenges
Below is a very common data architecture that many companies have adopted.
1. They have operational databases, enterprise applications, or other source systems that generate valuable data.
2. Then there is an ETL (extract, transform, load) pipeline built with various tools and technologies. It extracts data from the data sources, applies transformations according to business needs, and then loads the processed data into a centralized data repository or data warehouse.
3. From the data warehouse, the data flows into specific data marts. A data mart is a simple form of data warehouse that is focused on a single subject or line of business, such as sales, finance, or marketing.
4. Data in a data mart is used in reporting solutions to support business decisions.
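The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the in-memory `warehouse` dictionary, and the function names are all hypothetical stand-ins, not a real ETL tool or warehouse API.

```python
from collections import defaultdict

# Hypothetical rows standing in for an operational sales database (step 1).
SOURCE_ROWS = [
    {"order_id": 1, "region": "north", "amount": "120.50"},
    {"order_id": 2, "region": "south", "amount": "80.00"},
    {"order_id": 3, "region": "north", "amount": "42.25"},
]


def extract():
    """Read raw rows from the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Apply a business rule; here, parse the amount strings into numbers."""
    return [{**row, "amount": float(row["amount"])} for row in rows]


def load(rows, warehouse):
    """Write the processed rows into the central warehouse (step 2)."""
    warehouse.setdefault("sales", []).extend(rows)


def build_sales_mart(warehouse):
    """Derive a subject-specific mart: total sales per region (step 3)."""
    totals = defaultdict(float)
    for row in warehouse["sales"]:
        totals[row["region"]] += row["amount"]
    return dict(totals)


warehouse = {}  # assumed in-memory stand-in for the data warehouse
load(transform(extract()), warehouse)
sales_mart = build_sales_mart(warehouse)
print(sales_mart)  # a report (step 4) would read from this mart
```

Note that every step runs through the same central `warehouse`, so every consumer depends on whoever operates it.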
Now, what are the challenges with this architecture?
Control over data is centralized
In the architecture above, data is stored in a central place. This means one team or department is responsible for managing all the data in the organization. That can slow down data access for other teams that need data quickly, because they always have to wait for the centralized data team to provide it.
For example, the marketing team has a deadline to create a targeted promotion for the upcoming season. They need sales and customer data, but the data team is unable to provide it in time. The marketing team has to wait, which slows down the process and makes it difficult to be responsive to customer needs. Centralized control can create huge problems when the business starts to grow and scale.
The architecture above may also lack clear ownership and accountability for the data, which in most cases leads to data quality and consistency issues and creates conflict across the organization. For example, the sales team analyzes reports built on top of sales data and finds missing information. They reach out to the data team, but according to the data team, the data arrived from the source system or application in an incorrect state. The result is a conflict between the teams over who is responsible for the quality and consistency of the data.
The architecture above is a monolithic data architecture: a framework in which all data is stored, transformed, manipulated, consumed, and managed from a single centralized data store. A major problem with monolithic data architectures is that they are difficult to change and adapt. Organizations implement them using a selected set of tools and technologies. If they try to innovate by introducing a new technology or a new way of handling data, this can lead to problems, because a change in one place can affect everything that depends on the central store.
Many teams within the organization do not want to deal with the whole data warehousing architecture. They want to use data quickly and cannot afford to wait for the usual process and approvals. So they build their own solutions, completely separate from the centralized system. That speeds up data access, but it creates data silos. If leadership wants a comprehensive overview of the business, they need all the data. Instead, decisions end up being made without the full picture, which affects the overall business.
These are a few of the main challenges of the monolithic data architecture above. They can make it difficult for organizations to make decisions quickly, scale their systems efficiently, and manage day-to-day operations.
This is where data mesh comes into the picture, with a new architectural approach designed to address the challenges above. As mentioned earlier:
Data Mesh is a new approach to data architecture and governance that empowers cross-functional
teams to own and manage their own data domains in a decentralized way and to collaborate to
ensure data quality and consistency across the organization.
In the data mesh architecture above:
- The business is decentralized into domains (sales, marketing, etc.).
- Each domain has a team (a group of domain experts, engineers, and data experts). The team owns and manages its own datasets as a data product.
- Each domain team shares its data products securely across the organization. One domain team can use another domain team’s data in its own product implementation.
- Each domain team can use its own set of tools and technologies to implement and manage its data product independently.
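To make the idea of domains and data products more concrete, here is a minimal sketch. The `DataProduct` and `DomainTeam` classes, their fields, and the sample rows are hypothetical names chosen for illustration; data mesh does not prescribe any particular API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset published by a domain, with an explicit owner."""
    name: str
    owner_domain: str
    rows: list = field(default_factory=list)


@dataclass
class DomainTeam:
    """A domain (sales, marketing, ...) that owns its data products."""
    name: str
    products: dict = field(default_factory=dict)

    def publish(self, product_name, rows):
        # The publishing team is recorded as the owner of the product.
        product = DataProduct(product_name, self.name, rows)
        self.products[product_name] = product
        return product


sales = DomainTeam("sales")
marketing = DomainTeam("marketing")

# The sales domain publishes its orders as a data product.
orders = sales.publish(
    "orders",
    [{"customer": "c1", "amount": 120.5}, {"customer": "c2", "amount": 80.0}],
)

# The marketing domain builds its own product on top of the sales product.
high_value = [row["customer"] for row in orders.rows if row["amount"] > 100]
targets = marketing.publish(
    "promotion_targets", [{"customer": customer} for customer in high_value]
)
print(targets)
```

Each product carries its `owner_domain`, so it is always clear which team to ask about a given dataset.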
Now, let’s see how data mesh can solve the challenges above.
In data mesh, instead of centralized control over the organization’s data, data ownership and management are distributed across the individual domain teams (sales, marketing, and so on), each with its own datasets. Data itself is treated as a product. The data products from these teams are published to a centralized platform, where every team within the organization can access them. This reduces bottlenecks and allows faster data access.
Now, if the marketing team needs to run a targeted promotion and needs data fast, they can organize it within their own team, access the data, and use it in any way they want. There is no need to depend on another team for the data.
In the traditional data architecture, there was no clear ownership or accountability for data. In data mesh, each domain team is responsible for the quality and consistency of the data it owns. If something goes wrong with the data, the domain team that owns it is responsible for taking further action.
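One way a domain team might act on that responsibility is to run quality checks on its own data before sharing it. The sketch below is an assumption-laden example: `check_quality`, its parameters, and the sample rows are hypothetical, and real teams would likely use a dedicated data validation tool.

```python
def check_quality(product_name, owner, rows, required_fields):
    """Return a list of issues, each tagged with the owning domain."""
    issues = []
    for index, row in enumerate(rows):
        for field_name in required_fields:
            # Treat a missing key, None, or an empty string as a gap.
            if row.get(field_name) in (None, ""):
                issues.append(
                    {
                        "product": product_name,
                        "owner": owner,
                        "row": index,
                        "missing": field_name,
                    }
                )
    return issues


# Hypothetical sales rows; the second one has no customer.
sales_rows = [
    {"order_id": 1, "customer": "c1", "amount": 120.5},
    {"order_id": 2, "customer": "", "amount": 80.0},
]

issues = check_quality(
    "orders", "sales", sales_rows, ["order_id", "customer", "amount"]
)
for issue in issues:
    print(issue)
```

Because every issue names its owner, there is no debate about which team has to fix it.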
Distributed Domain driven data architecture
Unlike monolithic data architecture, where all data goes to a centralized system, data mesh uses modular, loosely coupled data systems that can be easily changed and adapted. As seen above, every domain team holds and manages its own data, so changes to one system do not affect the others and are much easier to implement as business needs evolve. Different domain teams can use different sets of tools and technologies, depending on their requirements.
Sharing and Collaboration
In data mesh, domain teams are discouraged from creating data silos. Teams can use their own data systems for fast data access and business agility. However, they need to adopt a mindset of sharing information across the organization with proper security, for example through APIs or a service mesh. If any other team within the organization needs that data for its own data product, it can use it. For example, the sales department may have its own data system, but the sales data is still accessible from the centralized platform through a sales data product. The marketing team can use it for targeted promotions or coupons.
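Sharing with proper security can be pictured as a shared catalog that checks access before returning data. The `DataCatalog` class below is a hypothetical in-memory sketch, not a real platform or product; in practice this role would be played by APIs and an access-control layer.

```python
class DataCatalog:
    """Hypothetical shared platform where domains register data products."""

    def __init__(self):
        self._products = {}
        self._grants = {}

    def register(self, name, owner, rows):
        # The owning domain always has access to its own product.
        self._products[name] = {"owner": owner, "rows": rows}
        self._grants[name] = {owner}

    def grant(self, name, team):
        """Allow another team to read the product."""
        self._grants[name].add(team)

    def read(self, name, team):
        """Return a copy of the rows if the team has been granted access."""
        if team not in self._grants.get(name, set()):
            raise PermissionError(f"{team} has no access to {name}")
        return list(self._products[name]["rows"])


catalog = DataCatalog()
catalog.register(
    "sales.orders", owner="sales", rows=[{"customer": "c1", "amount": 120.5}]
)
catalog.grant("sales.orders", "marketing")

# Marketing can read the sales data product for its promotions.
marketing_view = catalog.read("sales.orders", "marketing")
print(marketing_view)

# A team without a grant is refused.
try:
    catalog.read("sales.orders", "finance")
except PermissionError as error:
    print(error)
```

The sales team keeps its own system, yet its data is still discoverable and usable by others through one controlled entry point.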
Data mesh can solve many of the challenges faced by traditional data architecture. However, it can add complexity to an organization’s data infrastructure. Adopting a data mesh also requires a cultural shift in how data is owned and shared by different teams. Keep in mind this complexity and the expertise required before adopting this architecture.