New Data Economy: Turning Big Data into Smart Data
- Overview
Massive amounts of data have triggered an artificial intelligence (AI) revolution, turning data into strategic assets that can be used to drive growth and value for countries, societies, and enterprises.
This revolution is helping to improve the lives of citizens and consumers as governments and businesses seek to use data and artificial intelligence responsibly to accelerate growth, unlock efficiencies at scale, and build new ecosystems.
Data brings new business models, new data- and AI-driven products and services, and autonomous insight-to-action cycles, which together are building what we call the new economy: the data and AI economy.
The currency in this economy is intelligence derived from data collected with the required user permissions. Advances such as generative AI will further accelerate the creation of a data and AI economy as they increase productivity and remove barriers to entry.
- Data Science, AI, and ML Work in Harmony
Data science, artificial intelligence (AI), and machine learning (ML) are interrelated disciplines. Data science collects, analyzes, and interprets data to gain insights. AI focuses on creating intelligent systems that imitate human decision-making, and ML, a subset of AI, enables machines to learn from data.
Data science, AI, and ML complement each other. Data science provides the data and analytics that feed AI and ML: AI uses this data to drive decisions, and ML algorithms improve as they are trained on it.
These three work in harmony: data science extracts meaningful information, ML enhances predictive models, and AI leverages these models to make smart decisions, working together to drive advances in technology and automation.
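A toy sketch helps make this division of roles concrete. Below, a data-science step summarises labelled historical records into per-label averages, an ML step uses those averages as a nearest-centroid classifier, and an AI step turns each prediction into an action. The transaction data, the labels, and the `decide` policy are all hypothetical. The features are deliberately left unscaled to keep the example short.

```python
from statistics import mean

# Hypothetical labelled history: ((amount, hour_of_day), label)
history = [
    ((12.0, 10), "normal"),
    ((25.0, 14), "normal"),
    ((18.0, 11), "normal"),
    ((950.0, 3), "fraud"),
    ((870.0, 2), "fraud"),
]

def summarise(records):
    """Data science: reduce raw records to one average feature vector per label."""
    by_label = {}
    for features, label in records:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*points))
        for label, points in by_label.items()
    }

def predict(centroids, features):
    """ML: nearest-centroid classifier using squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def decide(centroids, features):
    """AI: a decision policy that acts on the model's prediction."""
    return "hold for review" if predict(centroids, features) == "fraud" else "approve"

centroids = summarise(history)
print(decide(centroids, (20.0, 13)))   # approve
print(decide(centroids, (900.0, 4)))   # hold for review
```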
- Big Data Life Cycle
Big data is an emerging term for the process of managing huge amounts of data from different sources, such as DBMSs, log files, social media posts, and sensor data. Big data (text, numbers, images, etc.) can be divided into three forms: structured, semi-structured, and unstructured.
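To illustrate the three forms, the sketch below encodes the same two hypothetical sensor readings as structured CSV, semi-structured JSON, and unstructured text, and extracts each into one common shape. It uses only the Python standard library. The regular expression is an assumption tailored to this one sentence, which shows why unstructured data usually needs the most preparation.

```python
import csv
import io
import json
import re

# Structured: fixed columns, as in a relational table or CSV export.
structured = "sensor_id,temperature\ns1,21.5\ns2,22.0\n"
# Semi-structured: self-describing keys, but fields can vary per record.
semi_structured = '[{"sensor_id": "s1", "temperature": 21.5}, {"sensor_id": "s2", "temperature": 22.0, "unit": "C"}]'
# Unstructured: free text, e.g. a social media post or a log message.
unstructured = "Sensor s1 reported 21.5 degrees; s2 later showed 22.0."

def from_structured(text):
    return {row["sensor_id"]: float(row["temperature"])
            for row in csv.DictReader(io.StringIO(text))}

def from_semi_structured(text):
    return {rec["sensor_id"]: float(rec["temperature"]) for rec in json.loads(text)}

def from_unstructured(text):
    # A brittle, hypothetical pattern: find "s<digits>" followed by a decimal number.
    return {sid: float(val) for sid, val in re.findall(r"\b(s\d+)\D+?(\d+\.\d+)", text)}

print(from_structured(structured)
      == from_semi_structured(semi_structured)
      == from_unstructured(unstructured))  # True
```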
Big data can be further described by attributes such as volume, velocity, variety, value, and complexity. Emerging big data technologies also raise many security concerns and challenges.
Big data must pass through a series of steps before it generates value, namely data access, storage, cleaning, and analysis. One approach to this problem is to run each stage as a separate layer, use the tools that best fit the problem at hand in each layer, and scale the analytical solution to big data.
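Here is a minimal sketch of the layered approach, assuming a toy `key=value;key=value` input format. Each stage is a separate function with a narrow interface. That means one layer could be replaced by a scalable tool without changing the others; for example, the in-memory `store` stands in for a real data store. All function names are hypothetical.

```python
def access(raw_lines):
    """Access layer: parse raw 'key=value;key=value' lines into records."""
    records = []
    for line in raw_lines:
        pairs = (item.split("=", 1) for item in line.strip().split(";") if "=" in item)
        records.append({key.strip(): value.strip() for key, value in pairs})
    return records

def store(records, storage):
    """Storage layer: an in-memory list stands in for a real data store."""
    storage.extend(records)
    return storage

def clean(records):
    """Cleaning layer: drop records whose 'value' is missing or non-numeric."""
    cleaned = []
    for record in records:
        try:
            cleaned.append({**record, "value": float(record["value"])})
        except (KeyError, ValueError):
            continue
    return cleaned

def analyze(records):
    """Analysis layer: a single aggregate, the mean of 'value'."""
    return sum(r["value"] for r in records) / len(records) if records else 0.0

raw = ["id=1;value=10", "id=2;value=oops", "id=3;value=20", "id=4"]
warehouse = []
result = analyze(clean(store(access(raw), warehouse)))
print(result)  # 15.0
```

The raw records are kept in storage before cleaning, so the cleaning rules can be changed later and rerun without re-acquiring the data.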
The big data life cycle consists of four stages: Data Acquisition, Data Awareness, Data Analytics, and Data Governance.
- Data Acquisition
Data acquisition is generally understood as the process of gathering, filtering, and cleaning data before it is put into a data warehouse or any other storage solution. The acquisition of big data is most commonly governed by four of the Vs: volume, velocity, variety, and value.
Most data acquisition scenarios assume high-volume, high-velocity, high-variety, but low-value data, making it important to have adaptable and time-efficient gathering, filtering, and cleaning algorithms that ensure that only the high-value fragments of the data are actually processed by the data-warehouse analysis.
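The sketch below chains gathering, cleaning, and filtering as lazy Python generators, so records stream through one at a time rather than being loaded in bulk. It needs a stand-in for "value", so it makes a hypothetical assumption: a reading is worth keeping only if it has moved at least `threshold` since the last kept reading for the same sensor. Real pipelines would use domain-specific rules. Cleaning runs before filtering here so that malformed values never reach the comparison.

```python
def gather(source):
    # Gathering: pull readings lazily so a large stream never sits in memory at once.
    for reading in source:
        yield reading

def clean(readings):
    # Cleaning: skip malformed readings and normalise the value to float.
    for r in readings:
        try:
            yield {"sensor": str(r["sensor"]), "value": float(r["value"])}
        except (KeyError, TypeError, ValueError):
            continue

def filter_changes(readings, threshold):
    # Filtering (hypothetical value rule): keep a reading only if it moved at least
    # `threshold` since the last kept reading for that sensor.
    last_kept = {}
    for r in readings:
        prev = last_kept.get(r["sensor"])
        if prev is None or abs(r["value"] - prev) >= threshold:
            last_kept[r["sensor"]] = r["value"]
            yield r

stream = [
    {"sensor": "a", "value": 20.0},
    {"sensor": "a", "value": 20.1},   # near-duplicate, dropped
    {"sensor": "a", "value": "n/a"},  # malformed, dropped
    {"sensor": "a", "value": 23.5},
    {"sensor": "b", "value": 7},
]
to_warehouse = list(filter_changes(clean(gather(stream)), threshold=1.0))
print(to_warehouse)
# [{'sensor': 'a', 'value': 20.0}, {'sensor': 'a', 'value': 23.5}, {'sensor': 'b', 'value': 7.0}]
```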
- Data Awareness
Data awareness is the task of creating a scheme of relationships within a set of data, so that different users of the data can determine a fluid yet valid context and utilise it for their own tasks.
It is a relatively new field, in which most current work focuses on semantic structures that give data context in an interoperable format. This contrasts with the current approach, where data is given context using unique, model-specific constructs (such as XML Schemas).
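As a rough illustration of a semantic, interoperable structure, the sketch below stores facts as RDF-style (subject, predicate, object) triples. Their predicates are shared vocabulary URIs from schema.org rather than a model-specific schema. The `ex:` identifiers and the data are hypothetical. Plain Python sets stand in for what would normally be an RDF store queried with SPARQL.

```python
SCHEMA = "https://schema.org/"

# Hypothetical facts as (subject, predicate, object) triples.
triples = {
    ("ex:order42", SCHEMA + "customer", "ex:alice"),
    ("ex:order42", SCHEMA + "orderDate", "2024-01-15"),
    ("ex:alice", SCHEMA + "name", "Alice"),
    ("ex:alice", SCHEMA + "email", "alice@example.com"),
}

def follow(subject, *predicates):
    """Follow a chain of predicates from a subject, e.g. order -> customer -> name."""
    current = {subject}
    for predicate in predicates:
        current = {o for s, p, o in triples if s in current and p == predicate}
    return current

# Different users can start from different entities and still share one context.
print(follow("ex:order42", SCHEMA + "customer", SCHEMA + "name"))  # {'Alice'}
print(follow("ex:alice", SCHEMA + "email"))  # {'alice@example.com'}
```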
Prior to the Big Data revolution, organizations were inward-looking in terms of data. During this time, data-centric environments like data warehouses dealt only with data created within the enterprise.
But with the advent of data science and predictive analytics, many organizations have realized that enterprise data must be fused with external data to enable and scale a digital business transformation.
This means that processes for identifying, sourcing, understanding, assessing and ingesting such data must be developed.
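Here is a minimal sketch of fusing enterprise data with an external source, assuming both share a postcode-like key. It mirrors the steps named above. First it identifies matching records. Then it assesses them by recording internal rows that have no external match. Finally it ingests the external fields after converting them to the internal types. All datasets, field names, and values are hypothetical.

```python
# Hypothetical internal data, e.g. exported from a CRM.
internal_customers = [
    {"customer_id": 1, "postcode": "A1", "annual_spend": 1200.0},
    {"customer_id": 2, "postcode": "B2", "annual_spend": 300.0},
    {"customer_id": 3, "postcode": "Z9", "annual_spend": 450.0},
]

# Hypothetical external data, keyed by postcode and delivered as strings.
external_regions = {
    "A1": {"region": "North", "median_income": "41000"},
    "B2": {"region": "South", "median_income": "39000"},
}

def fuse(internal, external):
    """Join internal rows to external attributes on the shared postcode key."""
    fused, unmatched = [], []
    for row in internal:
        ext = external.get(row["postcode"])       # identify a matching record
        if ext is None:
            unmatched.append(row["customer_id"])  # assess: no external coverage
            continue
        fused.append({                            # ingest with type conversion
            **row,
            "region": ext["region"],
            "median_income": int(ext["median_income"]),
        })
    return fused, unmatched

fused, unmatched = fuse(internal_customers, external_regions)
print(len(fused), unmatched)  # 2 [3]
```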
- The Goals of Data Processing and Analytics
Data processing has three primary goals:
- Determine whether the collected data is internally consistent;
- Make the data meaningful to other systems or users, using metaphors or analogies they can understand;
- Provide predictions about future events and behaviours based on past data and trends (which many consider the most important goal).
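The first and third goals lend themselves to a short sketch. `consistency_errors` checks that each order's stated total matches its line items and that all quantities are positive. `forecast_next` predicts the next value with a naive moving average. This is only the simplest baseline, and it lags behind any trend. The order format and function names are hypothetical.

```python
def consistency_errors(orders):
    """Goal 1: report orders that are not internally consistent."""
    errors = []
    for order in orders:
        if any(qty <= 0 for qty, _ in order["lines"]):
            errors.append((order["id"], "non-positive quantity"))
        line_total = sum(qty * price for qty, price in order["lines"])
        if abs(line_total - order["total"]) > 0.005:  # tolerate float rounding
            errors.append((order["id"], "total does not match line items"))
    return errors

def forecast_next(series, window=3):
    """Goal 3: naive moving-average forecast of the next value."""
    if not series:
        raise ValueError("series must not be empty")
    recent = series[-window:]
    return sum(recent) / len(recent)

orders = [
    {"id": "o1", "lines": [(2, 5.0), (1, 3.5)], "total": 13.5},
    {"id": "o2", "lines": [(1, 10.0)], "total": 12.0},
]
print(consistency_errors(orders))           # [('o2', 'total does not match line items')]
print(forecast_next([100, 110, 120, 130]))  # 120.0
```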
Data analytics requires four primary conditions to be met for effective processing: fast data loading, fast query processing, efficient utilisation of storage, and adaptivity to dynamic workload patterns.
The programming model most commonly associated with meeting these criteria, and with big data in general, is MapReduce, detailed below.
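The MapReduce data flow can be sketched in a single process with a word count. A map phase emits (word, 1) pairs, a shuffle groups the pairs by key, and a reduce phase sums each group. This only mimics the model: in a framework such as Hadoop, the map and reduce calls run in parallel across many machines. The function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit an intermediate (key, value) pair for every word.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

documents = ["big data big value", "smart data"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'smart': 1}
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be distributed without coordination. Only the shuffle moves data between workers.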
- Data Governance
Data governance is a requirement in today's fast-moving and highly competitive enterprise environment. Now that organizations have the opportunity to capture massive amounts of diverse internal and external data, they need a discipline to maximize the value of that data, manage risks, and reduce costs.
Data governance is a collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals. It establishes the processes and responsibilities that ensure the quality and security of the data used across a business or organization.
Data governance defines who can take what action, upon what data, in what situations, using what methods.
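That definition maps naturally onto an access rule with five parts. The sketch below encodes each rule as an immutable record covering who, what action, upon what data, in what situation, and using what method. It applies default deny: a request is allowed only if some rule matches it exactly. The roles, datasets, and conditions are hypothetical. A real policy engine would typically also support wildcards and rule hierarchies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    role: str       # who
    action: str     # what action
    dataset: str    # upon what data
    condition: str  # in what situation
    method: str     # using what method

# Hypothetical policy; frozen dataclasses are hashable, so rules can live in a set.
POLICY = {
    Rule("analyst", "read", "sales", "business_hours", "reporting_tool"),
    Rule("clinician", "read", "patient_records", "active_treatment", "ehr_app"),
    Rule("data_engineer", "write", "sales", "change_window", "etl_pipeline"),
}

def is_allowed(role, action, dataset, condition, method):
    # Default deny: anything not explicitly granted by a rule is refused.
    return Rule(role, action, dataset, condition, method) in POLICY

print(is_allowed("analyst", "read", "sales", "business_hours", "reporting_tool"))            # True
print(is_allowed("analyst", "read", "patient_records", "business_hours", "reporting_tool"))  # False
```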
A well-crafted data governance strategy is fundamental for any organization that works with big data; it explains how your business benefits from consistent, common processes and responsibilities. Business drivers highlight which data needs to be carefully controlled under your data governance strategy and the benefits expected from this effort. This strategy will be the basis of your data governance framework.
Data governance is the act of managing raw big data, as well as the processed information derived from it, in order to meet legal, regulatory, and business-imposed requirements. While there is no standardized format for data governance, there have been increasing calls within various sectors (especially healthcare) to create such a format to ensure reliable, secure, and consistent big data utilisation across the board.
For example, if a business driver for your data governance strategy is to ensure the privacy of healthcare-related data, patient data will need to be managed securely as it flows through your business. Retention requirements (e.g., a history of who changed what information and when) will be defined to ensure compliance with relevant government requirements, such as the GDPR.
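A retention requirement such as "who changed what information and when" is often met with an audit trail. The sketch below uses hypothetical class and field names. It records every update as a timestamped `ChangeEvent` before applying it, and it removes history only through an explicit retention cutoff. The cutoff is a parameter because actual retention periods depend on the applicable regulation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ChangeEvent:
    record_id: str
    field: str
    old_value: object
    new_value: object
    changed_by: str
    changed_at: datetime

class AuditedStore:
    """Current values plus a history of every change (hypothetical design)."""

    def __init__(self):
        self._data = {}
        self._history = []

    def update(self, record_id, field, new_value, user):
        # Log who changed what, from which value, and when, then apply the change.
        record = self._data.setdefault(record_id, {})
        self._history.append(ChangeEvent(record_id, field, record.get(field), new_value,
                                         user, datetime.now(timezone.utc)))
        record[field] = new_value

    def history(self, record_id):
        return [e for e in self._history if e.record_id == record_id]

    def purge_before(self, cutoff):
        # Retention: drop events recorded before `cutoff` (a timezone-aware datetime).
        self._history = [e for e in self._history if e.changed_at >= cutoff]

store = AuditedStore()
store.update("patient-1", "allergy", "none", user="nurse_a")
store.update("patient-1", "allergy", "penicillin", user="dr_b")
for e in store.history("patient-1"):
    print(e.changed_by, e.old_value, "->", e.new_value)
# nurse_a None -> none
# dr_b none -> penicillin
```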
[More to come ...]