Given the increasing size, frequency and complexity of data usage in our modern lives and workplaces, stories similar to Frank's are easily recognisable, and the need for digital transformation is increasingly apparent. In moving away from manual data processing, very likely in Excel, towards more modern tools and technologies, the challenge lies not in how to build a new system but in what to include in it. After all, it is not building the system itself that adds value, but the ability of the system to satisfy the needs of the user and the goals of the business; the user ultimately requires the output of data transformation and processing.
And so, prior to development: what is the best methodology for understanding a set of user requirements within the data space? And how should a team be structured to work most efficiently?
The King in Alice in Wonderland gives good advice: "Begin at the beginning, and go on till you come to the end: then stop." We wish to understand the as-is process, both in terms of its function and in terms of why its output is of value to the business. Of course, here the King misleads us slightly: in understanding the as-is process we are only at the start of imagining and creating an improved to-be process.
One team structure that has worked well in the past is a division of labour between analysts. Here one analyst investigates the business context of the process, which is then passed to another analyst who uses it to enrich their understanding of the data and the data process. To date we have called these roles business analyst and data business analyst respectively.
While one person could perform the tasks of both roles, I believe there is a benefit to having two analysts and to a clear distinction between their roles. The benefit lies in speed and efficiency when unpacking complex problems. A demarcation of roles avoids multi-tasking between business analysis and research at the data level (data processing, chained mathematics), increasing productivity through specialisation of skillset and by reducing the need to switch between these tasks.
To understand both how data is currently processed and its value to the organisation, it is useful to apply a framework. Here we borrow ideas from Lean Manufacturing and Six Sigma: a business process that transforms data can be broken into five parts: Suppliers, Inputs, Process, Outputs and Customers (SIPOC). The business analyst analyses Suppliers and Customers (to understand the business outcomes of the process), while the data business analyst analyses Inputs, Process and Outputs (to understand the data outcomes of the process). In doing so we seek to combine both business and data outcomes into a set of true requirements that define the process of data transformation. The process itself can then be optimised if required, as long as the outcome of both the data transformation and the business need is the same as (or better than) before.
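The split of a SIPOC map between the two analyst roles can be sketched in code. This is a minimal, hypothetical illustration (the class and field names are my own, not part of any real framework):

```python
from dataclasses import dataclass, field

@dataclass
class SipocMap:
    """One SIPOC record for a business process under analysis."""
    process_name: str
    suppliers: list = field(default_factory=list)   # business analyst
    inputs: list = field(default_factory=list)      # data business analyst
    process: list = field(default_factory=list)     # data business analyst
    outputs: list = field(default_factory=list)     # data business analyst
    customers: list = field(default_factory=list)   # business analyst

    def split_by_analyst(self):
        """Partition the map into the two analysts' areas of responsibility."""
        return {
            "business_analyst": {
                "suppliers": self.suppliers,
                "customers": self.customers,
            },
            "data_business_analyst": {
                "inputs": self.inputs,
                "process": self.process,
                "outputs": self.outputs,
            },
        }

# Toy example of an Excel-based process being mapped.
monthly_report = SipocMap(
    process_name="monthly_sales_report",
    suppliers=["regional sales teams"],
    inputs=["sales.xlsx"],
    process=["deduplicate rows", "aggregate by region"],
    outputs=["summary table"],
    customers=["head of sales"],
)
print(monthly_report.split_by_analyst()["business_analyst"])
```

The point of the structure is simply that each area of the map has an unambiguous owner, so the two analysts can work in parallel without stepping on each other.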
I am currently working in the role of data business analyst with the responsibility of creating a set of requirements for new data pipelines from existing Excel processes. Depending on the complexity of the data pipeline, somewhere in the region of 10 to 50 user stories will be written to guide our data engineering and development team. I work with a business analyst who defines the as-is business process and with this knowledge it is my responsibility to work out the best process by which outputs are created. In applying this methodology and method of working, I can offer the following advice:
- Use frameworks and tools to structure thinking: We have strictly defined the abstraction of a data pipeline in terms of collection, ingestion, transformation, calculation and publication. Each step has a set definition and an abstracted set of requirements. To map the logical flow between these steps I favour Lucidchart; a process flow with a swim lane for each step brings clarity to the to-be process.
- Dig into infrastructure: Business rules are often stored and defined in existing infrastructure. Digging into these data stores programmatically lets us restructure that information in a more approachable form.
- Test assumptions: Work done upfront almost always costs less than rework later. Any data and process exploration prior to development is time well spent.
- Consider the wider picture: There are likely common functions and frameworks that exist, or are required, across all data pipelines. Building these upfront expedites the creation of future pipelines.
- Continue engaging with the business and stakeholders: The process of conducting data analysis is by no means independent of Agile methodologies. Seeking continuous feedback during analysis, as during development, helps with course correction and ensures the direction of work is the most impactful.
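The five-step pipeline abstraction in the first point above can be sketched as code. This is a hypothetical illustration of the idea (each step is a named, independently testable function and a pipeline is their composition in a fixed order), not our actual framework:

```python
# Each pipeline step is a plain callable; the pipeline runs them in the
# agreed order and fails loudly if a step is missing.
STEP_ORDER = ("collection", "ingestion", "transformation",
              "calculation", "publication")

def run_pipeline(data, steps):
    """Apply the five steps to `data` in the fixed order above."""
    for name in STEP_ORDER:
        data = steps[name](data)
    return data

# Toy stand-ins for what would be real I/O and business logic.
steps = {
    "collection": lambda _: [" 3", "1", "2 "],         # read raw rows from a source
    "ingestion": lambda rows: [int(r) for r in rows],  # parse and validate
    "transformation": lambda xs: sorted(xs),           # reshape / clean
    "calculation": lambda xs: sum(xs),                 # apply business maths
    "publication": lambda total: {"total": total},     # write to the output store
}

print(run_pipeline(None, steps))  # {'total': 6}
```

Because every step has a set definition, requirements (and user stories) can be written against one step at a time rather than against the pipeline as a whole.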
Finally, back to Frank, the need for digital transformation and the scale of our challenge. Imagine if you weren't just solving Frank's problem but contemplating moving an entire organisation forwards. Methods and processes for dealing with data that were developed 10-20 years ago must be unwound, understood and recreated. Understanding the interaction between business and data at organisational scale is a true undertaking. However, the rewards for those who can make the change are substantial: new transparency and efficiency in data processing, and greater insight from existing data stores. In this ever-connected, data-driven world, expect data business analysis to be a significant role in the future.