Data Architecture: what is it?
A data architecture translates business needs into data and system requirements trying to manage their flow within a company.
Definition of Data Architecture
A data architecture – describes the structure of an organization's logical and physical assets and data management resources.
It encompasses the models, policies, rules, and standards that govern the collection, storage, disposition, integration, and use of data in organizations.
It is usually managed and organized by a specific professional figure: the data architect.
What is data architecture for?
The goal of any data architecture is to translate business needs into data and system requirements and manage the data and its flow within the enterprise.
Many companies today are looking to modernize their structure as a basis to take full advantage of artificial intelligence and enable digital transformation.
In fact, they fail to achieve their digital transformation and AI goals due to the complexity of the processes rather than the technical complexity.
Principles of Data Architecture
According to Joshua Klahr, vice president of product management at Splunk and formerly at AtScale, six principles form the basis of modern data architecture:
- Data is a shared resource.
A modern data architecture must eliminate departmental data silos and give all stakeholders a complete view of the business.
- Users require adequate access to data.
In addition to breaking down silos, modern data architectures must provide interfaces that allow users to easily consume data using tools suited to their work.
- Security is essential.
Modern data architectures must be designed for security and must support data policies and access controls directly on raw data.
- Common vocabularies ensure a common understanding.
Shared data assets, such as product catalogs, fiscal calendar sizes, and KPI definitions, require a common vocabulary to avoid controversy during analysis.
- The data should be taken care of.
Invest in core functions that perform data curation (modeling important relationships, cleaning raw data, and caring for key dimensions and measurements).
- Data streams should be optimized for agility.
Reduce the number of times data needs to be moved to reduce costs, increase data refresh, and maximize business agility.
How to organize the architecture
A modern data architecture should consist of the following components:
Data arhitecture, some best practices
Modern data architectures must be designed to take advantage of emerging technologies such as artificial intelligence (AI), automation, the Internet of Things (IoT), and blockchain.
A data architecture to be functional should follow these best practices:
- Cloud native.
Modern data architectures should be designed to support elastic scalability, high availability, end-to-end security for moving and at rest data, and scalability of cost and performance.
- Scalable data pipelines.
To take advantage of emerging technologies, data architectures should support real-time data streaming and micro-batch data bursts.
- Seamless integration of data with legacy applications using standard API interfaces.
- Enable real-time data.
That is, support the ability to implement automated and active data validation, classification, management, and governance.