Data Engineering For Analytics
Data Analysts operate primarily in dashboards and presentation layers, while traditional data engineering carries responsibility for both stable warehouse modelling and the evolving needs of analytical delivery. This creates tension, because the data warehouse layer is designed for accuracy, consistency, and reuse, while the analytics layer demands agility and iteration.
In my view, the most common structural failure of data teams is the separation of analytical delivery from data engineering capability. When analytical intent and data shaping are disconnected, bottlenecks emerge. The second pillar of the Analytics Operating System is the repositioning of the Data Analyst from dashboard operator to active collaborator across requirement, data shaping, and delivery, without turning them into full-scale platform engineers. When transforming data from your Data Warehouse or Data Mart into an analytics-ready shape, a small number of core modelling techniques go a long way. Below is a list of the key concepts behind most data transformation work.
The Six Data Moves for Analysts
Data cleaning – filtering and calculated fields
Data cleaning is one of the most fundamental tools in the analyst’s toolkit. Let’s begin by clearing up a common misconception: data cleaning does not imply that the data coming from the Data Mart is “dirty.” In most cases, the data is perfectly valid for its original purpose. What changes is the analytical question.
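A minimal pandas sketch of both moves, using made-up order data (all column names here are illustrative, not from the source):

```python
import pandas as pd

# Hypothetical order data pulled from a Data Mart.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "status": ["complete", "cancelled", "complete", "complete"],
    "quantity": [2, 1, 5, 3],
    "unit_price": [10.0, 8.0, 4.0, 12.5],
})

# Filtering: keep only the rows relevant to the analytical question.
complete = orders[orders["status"] == "complete"].copy()

# Calculated field: derive a measure the source system never needed to store.
complete["revenue"] = complete["quantity"] * complete["unit_price"]
```

Note that nothing here "fixes" the source data; the cancelled row is perfectly valid, it just doesn't belong in this particular analysis.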
Aggregation
Data aggregation can refer to two distinct concepts.
- Row-level aggregation – combining values within a single row (for example, summing quarterly columns into a yearly total); the row count is unchanged.
- Granularity-reducing aggregation – collapsing many rows into fewer, coarser ones (for example, daily transactions rolled up to one row per region or month).
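Both flavours of aggregation can be sketched in a few lines of pandas, with invented sales data for illustration:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South"],
    "q1": [100, 50, 80],
    "q2": [120, 60, 90],
})

# Row-level aggregation: combine values within each row; still one row per record.
sales["year_total"] = sales["q1"] + sales["q2"]

# Granularity-reducing aggregation: collapse many rows into one row per region.
by_region = sales.groupby("region", as_index=False)["year_total"].sum()
```

The first move adds a column; the second changes the grain of the dataset itself, which is usually the more consequential decision.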
Joins
Joins are the powerhouse of analytics. Every analyst needs to understand what they are and how they transform the datasets that are “joined” together. A join is simply the technical term for relating one dataset to another, which assumes the two datasets share elements (keys) that allow the relationship to be created.
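A minimal sketch of a join in pandas, with a hypothetical `customer_id` acting as the shared element that relates the two tables:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["SMB", "Enterprise", "SMB"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 3],
    "amount": [100, 40, 250],
})

# The shared customer_id column is the relationship that makes the join possible.
# A left join keeps every order, even if a customer were missing.
enriched = orders.merge(customers, on="customer_id", how="left")
```

The choice of join type (left, inner, outer) decides which rows survive when keys don't match on both sides, and is worth stating explicitly in every analysis.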
Pivot
If our dataset is a cube, a pivot is the act of rotating it so you can look at a different face. We are not changing the underlying facts, we are changing which dimensions are laid out as rows, which are laid out as columns, and which values are shown as the measure.
In business datasets, the same information commonly appears in two “shapes”:
- Wide (years as columns): easy for humans to read in a spreadsheet.
- Long (one row per observation): easy for databases to store and for analytical tools to aggregate and join.
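The two shapes, and the pivot between them, can be sketched in pandas with invented revenue figures:

```python
import pandas as pd

# Long shape: one row per observation.
long = pd.DataFrame({
    "country": ["DE", "DE", "FR", "FR"],
    "year": [2023, 2024, 2023, 2024],
    "revenue": [5, 7, 3, 4],
})

# Pivot long -> wide: years become columns, easy for humans to scan.
wide = long.pivot(index="country", columns="year", values="revenue").reset_index()

# Melt wide -> long: back to one row per observation, easy for tools.
back = wide.melt(id_vars="country", var_name="year", value_name="revenue")
```

No fact changed in either direction; only which dimension sits on rows versus columns.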
Union
A union is simply the action of “appending” one table with the contents of another. An example use case is combining lists of customers coming from two different CRM systems.
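A sketch of that exact use case in pandas, with made-up customer lists from two hypothetical CRM systems:

```python
import pandas as pd

# Customer lists from two hypothetical CRM systems, with matching columns.
crm_a = pd.DataFrame({"customer": ["Alice", "Bob"], "source": ["A", "A"]})
crm_b = pd.DataFrame({"customer": ["Carol"], "source": ["B"]})

# A union stacks one table under the other; columns are matched by name.
all_customers = pd.concat([crm_a, crm_b], ignore_index=True)
```

Unlike a join, a union adds rows rather than columns, so the two inputs need compatible schemas rather than shared keys.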
Scaffold
Scaffolding is a very useful technique and, as its name suggests, it consists of building a “scaffold” to support an analytics question that is difficult to answer without it. A canonical use case is customer support cases, where one of the most common business questions is: how many cases are open on any given day? Scaffolding can build a dataset that answers this question.
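One common way to build such a scaffold is a date spine: a table with one row per calendar day, cross-joined to the cases. A minimal pandas sketch with invented case data (dates and column names are illustrative):

```python
import pandas as pd

# Support cases with open and close dates.
cases = pd.DataFrame({
    "case_id": [1, 2],
    "opened": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "closed": pd.to_datetime(["2024-01-03", "2024-01-05"]),
})

# The scaffold: one row per calendar day, regardless of case activity.
days = pd.DataFrame({"day": pd.date_range("2024-01-01", "2024-01-05")})

# Cross join every day to every case, then keep days on which the case was open.
scaffolded = days.merge(cases, how="cross")
open_cases = scaffolded[
    (scaffolded["day"] >= scaffolded["opened"])
    & (scaffolded["day"] < scaffolded["closed"])
]

# Now "how many cases are open on a given day?" is a simple count.
open_per_day = open_cases.groupby("day").size()
```

Without the scaffold, days on which nothing happened simply don't exist in the case table, which is precisely why the question is hard to answer from the raw data alone.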
To receive the full write up with examples and best practices, join "The Simplicity Stack"