Modern enterprises spend considerable time and resources building data pipelines from a variety of sources into the data platform, and managing the quality of the data moved through those pipelines. Pipelines can vary in their source systems, sink systems, transformations, and validations.
A pipeline created for one use case may not be reusable for another, and adapting it requires additional development effort. There is therefore a need for frameworks that can build new pipelines, adding data sources or data sinks with minimal time and development effort. Ideally, the framework should also be flexible enough to customize and extend, so it can easily adapt to enterprise-specific requirements.
A number of low-code and no-code solutions allow data pipelines to be created visually across a variety of sources and sinks. However, they rarely provide the flexibility and modularity needed to customize pipelines for a given scenario.
A better approach is a low-code framework of reusable, modular components that can be stitched together to compose the required pipelines.
In this post, you’ll learn about the requirements for such a low-code framework and an approach to designing it.
Requirements for the Framework
Creating and maintaining pipelines to move data in and out of the platform is a major consideration. A data platform framework that allows its users to perform the different operations in a consistent way, irrespective of the underlying technology, will greatly reduce time and effort.
What do you look for in a low code framework? Here are some suggested requirements.
- Modular: The framework should be modular in design, so each component can be used, managed, and enhanced independently.
- Out-of-the-Box Functionality: Support integration with common data sources and sinks, and perform transformations out of the box. The components should be easy to apply to common use cases.
- Flexible: The framework should be able to integrate with different services and systems across clouds or from on-premises environments.
- Extensible: Allow existing components to be extended for specific requirements, and allow new custom components to be added to implement new functionality.
- Code First: Provide a programmable way of defining and managing pipelines. API and/or SDK support should be available to programmatically create and access pipelines.
- Cross-Cloud Support: Support data sources, sinks, and services across different cloud providers. You should be able to migrate pipelines built with the framework from one cloud or on-premises environment to another.
- Reusable: Provide common, reusable templates that make it easy to create jobs.
- Scalable: Workers should scale dynamically or by configuration to sustain high throughput. The framework should automatically scale the underlying compute in response to changing workloads.
- Managed Service: The framework should be deployable on a fully managed cloud service. Provisioning infrastructure capacity and managing, configuring, and scaling the environment should happen automatically. Minor version upgrades and patches should be applied automatically, with support provided for major version updates.
- GUI-based Definition: An intuitive GUI for creating and maintaining data pipelines is useful. Job runs and execution logs should be accessible through a job monitoring and management portal.
- Security: Out-of-the-box integration with an enterprise-level IAM tool for authentication and role-based access control.
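To make the "code first" requirement concrete, here is a minimal sketch of what a programmatic pipeline-definition SDK could look like. All class and method names (`PipelineBuilder`, `source`, `transform`, `sink`) are hypothetical, invented for this illustration; they are not part of any real framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pipeline:
    """A pipeline is just a named, ordered list of execution steps."""
    name: str
    steps: List[str] = field(default_factory=list)


class PipelineBuilder:
    """Hypothetical fluent builder: each call appends a step and returns self."""

    def __init__(self, name: str):
        self._pipeline = Pipeline(name)

    def source(self, connector: str) -> "PipelineBuilder":
        self._pipeline.steps.append(f"read:{connector}")
        return self

    def transform(self, op: str) -> "PipelineBuilder":
        self._pipeline.steps.append(f"transform:{op}")
        return self

    def sink(self, connector: str) -> "PipelineBuilder":
        self._pipeline.steps.append(f"write:{connector}")
        return self

    def build(self) -> Pipeline:
        return self._pipeline


# Compose a pipeline entirely in code -- no GUI involved.
pipeline = (
    PipelineBuilder("orders_daily")
    .source("jdbc:postgres")
    .transform("deduplicate")
    .sink("s3:parquet")
    .build()
)
print(pipeline.steps)
```

A real SDK would return typed step objects rather than strings, but the fluent, composable shape is the point: pipelines become versionable code that can be generated, reviewed, and tested like any other artifact.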
A High-level Overview of the Framework
The data platform framework provides the base foundation upon which you can build specific accelerators or tools for data integration and data quality/validation use cases.
Blueprint
While designing the framework, it is important to consider the following points:
- Technology Choice: We recommend a cloud-first approach when it comes to technology. The core of the framework should be deployable on a cloud-managed service that is extensible, flexible, and programmatically manageable.
- Data Processing: Data processing should be based on massively parallel processing solutions that can easily scale as per the requirement in order to support large volumes.
- Orchestration: Scheduling and executing data pipelines requires a scalable and extensible orchestration solution. Go with a managed workflow service that provides a programmable framework with out-of-the-box operators for integration, and that also allows custom operators to be added as required.
- Component Library: Common data processing functionalities should be made available as components that can be used independently or in addition to other components.
- Pipeline Configuration: A custom DSL-based configuration definition allows for reusability of pipeline logic and provides a simple interface for defining the required steps for execution.
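To illustrate the DSL-based configuration idea, here is a sketch of a declarative pipeline definition expanded into an ordered execution plan. The configuration keys, step names, and connector types below are assumptions made for this example, not part of any specific framework.

```python
from typing import Dict, List

# Illustrative declarative pipeline definition. The schema (source,
# transforms, validations, sink) is an assumption for this sketch.
config: Dict = {
    "pipeline": "customer_ingest",
    "source": {"type": "gcs", "path": "raw/customers/"},
    "transforms": [
        {"op": "rename", "args": {"cust_id": "customer_id"}},
        {"op": "filter", "args": {"condition": "customer_id IS NOT NULL"}},
    ],
    "validations": [
        {"rule": "not_null", "column": "customer_id"},
    ],
    "sink": {"type": "bigquery", "table": "analytics.customers"},
}


def plan(config: Dict) -> List[str]:
    """Expand a declarative config into an ordered list of execution steps."""
    steps = [f"read:{config['source']['type']}"]
    steps += [f"transform:{t['op']}" for t in config.get("transforms", [])]
    steps += [f"validate:{v['rule']}" for v in config.get("validations", [])]
    steps.append(f"write:{config['sink']['type']}")
    return steps


print(plan(config))
```

Because the pipeline logic lives in configuration rather than code, the same execution engine can serve many pipelines, and new ones can be added by writing a new config rather than new code.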
Building Blocks
Here are the building blocks for such a framework:
- Pipeline Template: A DAG template that supports pipeline orchestration for different scenarios. The template can be used to generate data pipelines programmatically during design time, based on user requirements.
- Job Template: A job execution template that supports processing the data using the component library as per user requirements. Common job flow patterns can be supported through built-in templates.
- Component Library: A suite of functionality code supporting different processing use cases. It consists of:
  - Components: The base processing implementations that read from and write to various data sources, apply transformations, run data validations, and execute utility tasks.
  - Factories and Generators: Factory and generator code that abstracts implementation differences across technologies.
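The factory idea above can be sketched as a registry-based connector factory. The `Reader`, `CsvReader`, and `ParquetReader` classes here are hypothetical stand-ins for real connector components; the point is that callers depend only on the abstract interface, and new formats can be registered without touching existing code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Reader(ABC):
    """Abstract component interface: all readers expose the same read() call."""

    @abstractmethod
    def read(self, path: str) -> str: ...


class CsvReader(Reader):
    def read(self, path: str) -> str:
        # A real component would return rows; this sketch returns a description.
        return f"csv rows from {path}"


class ParquetReader(Reader):
    def read(self, path: str) -> str:
        return f"parquet rows from {path}"


class ReaderFactory:
    """Maps a format name to a Reader class, hiding technology differences."""

    _registry: Dict[str, Type[Reader]] = {}

    @classmethod
    def register(cls, fmt: str, reader_cls: Type[Reader]) -> None:
        cls._registry[fmt] = reader_cls

    @classmethod
    def create(cls, fmt: str) -> Reader:
        if fmt not in cls._registry:
            raise ValueError(f"No reader registered for format: {fmt}")
        return cls._registry[fmt]()


# Built-in components register themselves; custom ones can do the same.
ReaderFactory.register("csv", CsvReader)
ReaderFactory.register("parquet", ParquetReader)

reader = ReaderFactory.create("csv")
print(reader.read("landing/orders.csv"))
```

The registry keeps the framework extensible: an enterprise-specific connector only needs to implement `Reader` and register itself, satisfying the extensibility requirement without modifying the factory.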
Accelerate Your Own Data Journey
At GlobalLogic, we are working on a similar approach as part of the Data Platform Accelerator (DPA). Our DPA consists of a suite of micro-accelerators built on top of a platform framework based on cloud PaaS technologies.
We regularly work with our clients to help them on their data journeys. Share your needs with us using the contact form below, and we will be happy to discuss your next steps.