Introduction
“True happiness comes only by making others happy.” – David O. McKay
Taking a lead from the quote above, a data platform can be truly happy if it can make others happy. “Others” in this context would be the actors/teams with whom the data platform interacts. Below are the key actors that typically have interactions with the data platform:
- Data Engineers
- Data Consumers
  - Data Analysts
  - Data Scientists/Machine Learning Engineers
  - External Data Consumers like partners & data buyers
- DataOps Engineers
- Data Stewards & Admins (for Data Governance)
This blog identifies the common expectations that Data Engineers and Data Consumers have for a data platform, and it demonstrates how to meet these expectations. DataOps and Data Governance are also extremely important aspects of a comprehensive, end-to-end data platform, so we will cover the perspectives of DataOps Engineers and Data Stewards & Admins in Part II of this blog series.
Great (User) Expectations
Happy Data Engineers
Data Engineers typically expect the following from a data platform:
- If something is already done, I should not waste my time recreating the same.
- I should have the means to discover and re-use the already existing data platform components (e.g., extractors, transformers, loaders, connectors etc) and data assets (e.g., ingested data sets in this case).
- I should have access to a framework that allows me to stitch together the modular components reusing the existing available ones.
- I don’t expect all the data pipeline scenarios to be done using the existing components only. I know there might be a need to extend the existing components or create new ones. I want the framework to allow me to extend and create new components and stitch together the same while creating new pipelines.
- For me to be able to do that effectively, I should know exactly how the components must be created in order to stitch the pipelines effectively.
- I want to be able to create the components in a manner so that my work can be utilized not only by me, but also by the larger data engineering community within the organization.
- I want CI/CD integration at a high level, including easy-to-access resources and services to start on the job, as well as being able to move data pipelines across different environments.
- I would like to have the ability to store versions of the pipeline in a smooth, integrated manner.
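The discover, reuse, and stitch expectations above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory registry (a real platform would more likely back this with a catalog service), and the names `component`, `discover`, and `build_pipeline` are invented for illustration rather than taken from any specific framework.

```python
from typing import Callable, Dict, Iterable, List

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]

# Hypothetical in-memory registry of reusable pipeline components.
_REGISTRY: Dict[str, Step] = {}


def component(name: str) -> Callable[[Step], Step]:
    """Register a step under a discoverable name so others can reuse it."""
    def decorator(func: Step) -> Step:
        if name in _REGISTRY:
            # Nudge engineers towards reuse instead of recreating a component.
            raise ValueError(f"component {name!r} already exists; reuse it instead")
        _REGISTRY[name] = func
        return func
    return decorator


def discover() -> List[str]:
    """List the names of all registered components."""
    return sorted(_REGISTRY)


def build_pipeline(step_names: List[str]) -> Callable[[Iterable[Record]], List[Record]]:
    """Stitch registered components together in the given order."""
    steps = [_REGISTRY[name] for name in step_names]  # KeyError for unknown names

    def run(records: Iterable[Record]) -> List[Record]:
        for step in steps:
            records = step(records)
        return list(records)

    return run


@component("drop_nulls")
def drop_nulls(records: Iterable[Record]) -> Iterable[Record]:
    # Keep only records in which every field has a value.
    return (r for r in records if all(v is not None for v in r.values()))


@component("uppercase_country")
def uppercase_country(records: Iterable[Record]) -> Iterable[Record]:
    # Example of a new component added alongside the existing ones.
    return ({**r, "country": r["country"].upper()} for r in records)


pipeline = build_pipeline(["drop_nulls", "uppercase_country"])
result = pipeline([{"id": 1, "country": "in"}, {"id": 2, "country": None}])
print(discover())  # ['drop_nulls', 'uppercase_country']
print(result)      # [{'id': 1, 'country': 'IN'}]
```

Because each component follows the same call signature, a new one can be added without touching existing pipelines, and attempting to register a duplicate name fails early.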
Happy Data Consumers
Data consumers typically expect the following from a data platform:
- I should know exactly which Golden Records/Versions of processed data already exist.
- I should be able to trust that the data is accurate and fit for purpose. (E.g., I should be able to check the lineage to confirm that I am looking at the right data for the requirement.)
- I should know the exact process required to access the data sets.
- Based on the exact need of the use case, I should be able to leverage different kinds of access patterns like streaming, bulk export/copy, Query, APIs etc.
- If I need a new data set, I should be able to get it serviced quickly.
- I should be able to share datasets and collaborate with other users.
- I should be able to add custom metadata like tags and comments.
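As a rough sketch of how several of these consumer expectations (known versions, lineage checks, access patterns, and custom tags) might surface in a catalog, consider the following. The `DatasetEntry` and `Catalog` classes and the dataset names are hypothetical and only illustrate the idea; they do not represent any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DatasetEntry:
    name: str
    version: str
    upstream: List[str] = field(default_factory=list)       # direct parent datasets
    access_patterns: Set[str] = field(default_factory=set)  # e.g. "query", "api", "bulk"
    tags: Set[str] = field(default_factory=set)
    comments: List[str] = field(default_factory=list)


class Catalog:
    """Hypothetical in-memory catalog of processed data sets."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def lineage(self, name: str) -> List[str]:
        """Return all upstream dataset names, nearest first, without duplicates."""
        seen: List[str] = []
        queue = list(self._entries[name].upstream)
        while queue:
            parent = queue.pop(0)
            if parent in seen:
                continue
            seen.append(parent)
            # External sources may not be registered; they simply end the chain.
            if parent in self._entries:
                queue.extend(self._entries[parent].upstream)
        return seen

    def tag(self, name: str, tag: str) -> None:
        self._entries[name].tags.add(tag)

    def find_by_tag(self, tag: str) -> List[str]:
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = Catalog()
catalog.register(DatasetEntry("raw_orders", "v1", access_patterns={"bulk"}))
catalog.register(DatasetEntry("clean_orders", "v3", upstream=["raw_orders"],
                              access_patterns={"query", "bulk"}))
catalog.register(DatasetEntry("customer_360", "v2",
                              upstream=["clean_orders", "crm_contacts"],
                              access_patterns={"query", "api"}))
catalog.tag("customer_360", "golden")

print(catalog.lineage("customer_360"))  # ['clean_orders', 'crm_contacts', 'raw_orders']
print(catalog.find_by_tag("golden"))    # ['customer_360']
```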
Building a Happy Data Platform
“Efforts and courage are not enough without purpose and direction.” – John F. Kennedy
Approach 1: Build for Platform Feature
Below is a traditional, technology-driven approach:
- Ingest all the data from multiple different systems.
- Build tightly coupled pipelines for each use case, from ingestion to data processing and storage.
- Some approaches work towards generating all possible components for extracting, processing, loading, and exposing data, including batch and stream processing.
However, there are a few issues with this approach:
- It doesn’t take long for a user to accumulate a lot of data in the lake and not know what to do with it. The data lake turns into a data swamp, and it becomes increasingly difficult to derive value from the data.
- Tightly coupled pipelines offer limited opportunities for reuse.
- Return on investment and time to value might be a big challenge.
Approach 2: Build for Purpose
Below is an approach driven by business needs:
- Create a framework that allows extensibility, modularity, and flexibility by using configurations, templates, etc.
- Explore and discover already existing data and data platform component assets that can be reused.
- Implement specific, prioritized, business-driven use cases by leveraging the framework, creating reusable data platform component assets along the way.
- Get the needed data for the specific use case.
- Platform components should be created based on the framework to allow reuse.
- Build Data Apps like Data Validator, Schema Mapper, etc.
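To make the configuration-driven idea more concrete, here is a minimal sketch of what a Data Validator app could look like. The rule names, the configuration layout, and the `validate` function are assumptions made for illustration; in practice the configuration might live in a YAML or JSON file, and the checks would likely be richer.

```python
from typing import Any, Callable, Dict, List

# Hypothetical configuration describing the checks for one data set.
VALIDATION_CONFIG: Dict[str, Any] = {
    "dataset": "orders",
    "rules": [
        {"column": "order_id", "check": "not_null"},
        {"column": "amount", "check": "min", "value": 0},
        {"column": "currency", "check": "allowed", "values": ["USD", "EUR", "INR"]},
    ],
}


def _not_null(value: Any, rule: Dict[str, Any]) -> bool:
    return value is not None


def _min(value: Any, rule: Dict[str, Any]) -> bool:
    return value is not None and value >= rule["value"]


def _allowed(value: Any, rule: Dict[str, Any]) -> bool:
    return value in rule["values"]


# Reusable checks; a new check is added here once and then enabled via configuration.
CHECKS: Dict[str, Callable[[Any, Dict[str, Any]], bool]] = {
    "not_null": _not_null,
    "min": _min,
    "allowed": _allowed,
}


def validate(records: List[Dict[str, Any]], config: Dict[str, Any]) -> List[str]:
    """Apply every configured rule to every record and collect failure messages."""
    errors: List[str] = []
    for index, record in enumerate(records):
        for rule in config["rules"]:
            check = CHECKS[rule["check"]]
            if not check(record.get(rule["column"]), rule):
                errors.append(f"row {index}: {rule['column']} failed {rule['check']}")
    return errors


errors = validate(
    [
        {"order_id": 1, "amount": 25.0, "currency": "USD"},
        {"order_id": None, "amount": -5, "currency": "GBP"},
    ],
    VALIDATION_CONFIG,
)
for message in errors:
    print(message)
# row 1: order_id failed not_null
# row 1: amount failed min
# row 1: currency failed allowed
```

Keeping the rules in configuration means a new use case can reuse the same validator by supplying a different rule set rather than new code.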
While a DataOps mindset is a complete topic unto itself, it is worth mentioning on a high level that it is important to bring a DevOps and Agile approach to a data project. DataOps encompasses all aspects, including infrastructure management, services setup and management, environment setup, access management of data and components, quality, security and compliance, deployments, version control, and monitoring.
Paying attention to the high-level team setup also enables you to clearly separate team concerns:
- The Core Platform team works on architecting, designing, and creating technical components for the data platform (e.g., data extractors, loaders, processors and transformers, CI/CD, Infrastructure as Code, etc.).
- The Use Case Implementation team stitches together a pipeline using the components created by the Core Platform team; configures/extends it as needed; and writes the domain/business logic specific to the use case.
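One possible way to express this separation in code is a shared contract owned by the Core Platform team that the Use Case Implementation team extends. The sketch below is only an assumption about how such a contract could look; the class names `Transformer`, `RenameColumns`, and `FlagHighValueOrders` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


# --- Owned by the Core Platform team (hypothetical contract) ---
class Transformer(ABC):
    """Contract every transformer follows so pipelines can stitch them together."""

    @abstractmethod
    def transform(self, records: Iterable[Record]) -> List[Record]:
        ...


class RenameColumns(Transformer):
    """Generic, reusable component configured with a column mapping."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = mapping

    def transform(self, records: Iterable[Record]) -> List[Record]:
        return [{self.mapping.get(k, k): v for k, v in r.items()} for r in records]


def run(steps: List[Transformer], records: Iterable[Record]) -> List[Record]:
    """Apply each step in order and return the final records."""
    for step in steps:
        records = step.transform(records)
    return list(records)


# --- Owned by the Use Case Implementation team ---
class FlagHighValueOrders(Transformer):
    """Domain-specific logic written against the platform contract."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def transform(self, records: Iterable[Record]) -> List[Record]:
        return [{**r, "high_value": r["amount"] > self.threshold} for r in records]


output = run(
    [RenameColumns({"amt": "amount"}), FlagHighValueOrders(threshold=100.0)],
    [{"order_id": 1, "amt": 250.0}, {"order_id": 2, "amt": 40.0}],
)
print(output)
# [{'order_id': 1, 'amount': 250.0, 'high_value': True},
#  {'order_id': 2, 'amount': 40.0, 'high_value': False}]
```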
Accelerate Your Own Data Journey
The objective of a data platform is to eventually enable purposeful, actionable insights that can lead to business outcomes. Additionally, if the data platform puts the right emphasis on the journey and process (i.e., how it can make the job easier for its key actors while delivering the prioritized projects), then it will deliver an ecosystem that is fit for purpose, minimize waste, and enable a “reuse” mindset.
At GlobalLogic, we are continuously improving our Data Platform Accelerator, which is based on a similar approach. This digital accelerator enables enterprises to quickly stand up a solution that can gather, transform, and enrich data from across their organization. We are excited to work with our clients to accelerate their data journeys, and we would be happy to discuss your own needs.