In the business world, there is a concept known as the Value Chain. In short, it is the series of steps a business takes to add value and make money. A manufacturer will receive raw materials, manufacture components, assemble them into a finished product, distribute it to retail locations and market it to customers. Every key step along the way adds value, allowing the company to sell its products for more than the cost of making them, creating a profit.
Data has a similar value chain. Initially, it is not always valuable by itself. Like raw materials, data often needs to be curated and transformed into valuable business insights.
Capture
As with all things, data must first be created. Whether this is a user entering data into a system or an IIoT sensor sending a message packet of values, data is originated and captured in a System of Record.
At low maturity, the data that is captured supports only the immediate business processes surrounding it. For instance, Supply Chain may enter vendor information required to approve the vendor, but not the information AP needs to pay them. If the departments have different systems for managing their data, then they risk becoming complete silos - maintaining separate lists of vendors with no real way to link their data together.
To take it a step further, often analytics consumers need additional data or data captured at a finer grain for reporting purposes. In a low maturity environment, these users are often left fending for themselves, creating supplementary sets of data maintained manually in spreadsheets.
A more mature organization captures what is needed to support not just a single department's business process but the whole enterprise, including analytics consumers. This is difficult because users are often concerned with driving the business goals of their department head, and if the department head isn't concerned with the data needed outside of their department, mature data processes won't be a priority.
The department head, or Data Owner, can be educated to understand how the data they control impacts the rest of the organization, and their buy-in is instrumental in accomplishing the enterprise's data goals.
Whenever data is created, no matter the maturity, people, process and technology are always involved. A proper data governance strategy will develop a plan that involves all three in optimizing data capture for the start of the Value Chain.
Colocate
Some people can find all the data they need for a process in a single source. They can run reports in that system and get the information they need.
Often, people need data from multiple sources. Data Gathering and Data Preparation become arduous and repetitive tasks as someone regularly extracts data from multiple systems and brings it together for analysis. Many times, this is done in Excel or some other application that can suffer from performance limitations.
The next basic step in the data value chain is co-locating the data into the same platform, optimized for data analysis. Often this is facilitated via replication or ETL tools. While it seems like a small step, the value gained is typically significant for the amount of work required. When users can connect to a single location to gain access to all of their data, they immediately save time on Data Gathering and can perform analysis that they would otherwise have to forgo.
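As a rough illustration of co-location, the sketch below lands two extracts in a single database without transforming them. It uses Python's built-in sqlite3 module as a stand-in for an analytics platform. The table names, columns and sample rows are hypothetical, and a real environment would more likely rely on a replication or ETL tool.

```python
import sqlite3

# Hypothetical extracts from two separate systems of record.
supply_chain_vendors = [
    (101, "Acme Fasteners", "approved"),
    (102, "Borealis Steel", "pending"),
]
ap_invoices = [
    (9001, 101, 1250.00),
    (9002, 101, 400.00),
    (9003, 102, 980.50),
]


def colocate(conn):
    """Land both extracts, unchanged, in one analytical database."""
    conn.execute(
        "CREATE TABLE sc_vendor "
        "(vendor_id INTEGER, vendor_name TEXT, approval_status TEXT)"
    )
    conn.execute(
        "CREATE TABLE ap_invoice "
        "(invoice_id INTEGER, vendor_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sc_vendor VALUES (?, ?, ?)", supply_chain_vendors)
    conn.executemany("INSERT INTO ap_invoice VALUES (?, ?, ?)", ap_invoices)
    conn.commit()


conn = sqlite3.connect(":memory:")
colocate(conn)

# A single connection now answers a question that used to span two systems.
rows = conn.execute(
    """
    SELECT v.vendor_name, SUM(i.amount)
    FROM sc_vendor AS v
    JOIN ap_invoice AS i ON i.vendor_id = v.vendor_id
    GROUP BY v.vendor_name
    ORDER BY v.vendor_name
    """
).fetchall()
print(rows)
```

Nothing has been modeled yet. The tables still mirror their sources, but the manual extract-and-combine step is gone.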
Curate
The next step in the value chain is to curate the data. Anyone familiar with any form of warehousing should recognize this: design the models, transform the data and load the models.
This is where the magic happens...and where the debates (or arguments) on technique rage.
Regardless of the approach, there is significant value in creating an abstracted business model for reuse. No longer does someone need to understand a system's underlying data model - now they have access to models that represent the business, using terms and structures that fit the business processes.
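One way to picture an abstracted business model is a view that hides cryptic source columns behind business terms. The sketch below is only an illustration, again using sqlite3. The source tables (vnd_mstr, ap_inv_hdr), their status codes and the vendor_spend view are all hypothetical, and a view is just one of several curation techniques.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical source-system tables with cryptic names and codes.
    CREATE TABLE vnd_mstr (vnd_no INTEGER, vnd_nm TEXT, stat_cd TEXT);
    CREATE TABLE ap_inv_hdr (inv_no INTEGER, vnd_no INTEGER, inv_amt REAL, pd_flg INTEGER);

    INSERT INTO vnd_mstr VALUES
        (101, 'Acme Fasteners', 'A'),
        (102, 'Borealis Steel', 'P');
    INSERT INTO ap_inv_hdr VALUES
        (9001, 101, 1250.0, 1),
        (9002, 101, 400.0, 0),
        (9003, 102, 980.5, 0);

    -- Curated model: business names, decoded statuses, reusable measures.
    CREATE VIEW vendor_spend AS
    SELECT v.vnd_nm AS vendor_name,
           CASE v.stat_cd
               WHEN 'A' THEN 'Approved'
               WHEN 'P' THEN 'Pending'
               ELSE 'Unknown'
           END AS approval_status,
           SUM(i.inv_amt) AS total_invoiced,
           SUM(CASE WHEN i.pd_flg = 0 THEN i.inv_amt ELSE 0.0 END) AS outstanding_balance
    FROM vnd_mstr AS v
    JOIN ap_inv_hdr AS i ON i.vnd_no = v.vnd_no
    GROUP BY v.vnd_no, v.vnd_nm, v.stat_cd;
    """
)

# Consumers query business terms and never touch the source columns.
curated = conn.execute(
    "SELECT vendor_name, approval_status, total_invoiced, outstanding_balance "
    "FROM vendor_spend ORDER BY vendor_name"
).fetchall()
print(curated)
```

Someone in operations can ask for outstanding_balance without knowing that the source system stores payment state in a pd_flg column.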
This truly opens data up to the enterprise, allowing someone in operations to get access to financial information and vice versa, when those individuals would normally struggle to navigate the system models on their own, risking misuse or wrong answers.
Consume
This is a catch-all term for analytics. Whether it feeds a report, a cube or an ML model hosted in Databricks, the data at the end of the value chain is built to be consumed and analyzed to generate the final product that adds direct value to the organization.
The Governed Warehouse aims to be a solution for Co-locating and Curating data, for the purpose of analytical Consumption.