This glossary contains definitions related to customer data analytics, predictive analytics, data visualization and operational business intelligence. Some definitions explain the meaning of words related to Hadoop and other software tools used in big data analytics. Other definitions are related to the strategies that business intelligence professionals, data scientists, statisticians and data analysts use to make data-driven decisions.
Algorithms
Terms related to procedures or formulas for solving a problem by conducting a sequence of specified actions. In computing, algorithms in the form of mathematical instructions play an important part in search, artificial intelligence (AI) and machine learning.
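For example, a search routine such as binary search can be written as a short sequence of specified actions. Below is a minimal Python sketch; the function name and sample values are illustrative only, not drawn from any particular product.

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if it is absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2      # inspect the middle element
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1            # discard the lower half
        else:
            high = mid - 1           # discard the upper half
    return -1

print(binary_search([2, 5, 8, 12, 16, 23, 38], 23))  # prints 5
```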
Artificial intelligence
Terms related to artificial intelligence (AI), including definitions about machine learning and words and phrases about training data, algorithms, natural language processing, neural networks and automation.
- What is data monetization? Data monetization is the process of measuring the economic benefit of corporate data.
- natural language generation (NLG) - Natural language generation (NLG) is the use of artificial intelligence (AI) programming to produce written or spoken narratives from a data set.
- What is unsupervised learning? Unsupervised learning is a type of machine learning (ML) technique that uses artificial intelligence (AI) algorithms to identify patterns in data sets that are neither classified nor labeled. (A short sketch of the idea follows this list.)
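As a rough illustration of the unsupervised learning entry above, the following Python sketch groups unlabeled numbers into two clusters with a tiny k-means-style loop. The data, cluster count and iteration count are assumptions chosen for readability.

```python
# Tiny one-dimensional k-means: group unlabeled values into two clusters.
data = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]     # no labels are provided
centers = [data[0], data[3]]              # naive initialization

for _ in range(10):                        # a few refinement passes
    clusters = [[], []]
    for x in data:                         # assign each point to the nearest center
        nearest = min((abs(x - c), i) for i, c in enumerate(centers))[1]
        clusters[nearest].append(x)
    centers = [sum(c) / len(c) for c in clusters]   # recompute the centers

print([round(c, 2) for c in centers])      # roughly [1.03, 8.07]
```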
Data and data management
Terms related to data, including definitions about data warehousing and words and phrases about data management.
- What is Structured Query Language (SQL)? Structured Query Language (SQL) is a standardized programming language that is used to manage relational databases and perform various operations on the data in them.
- What is employee self-service (ESS)? Employee self-service (ESS) is a widely used human resources technology that enables employees to perform many job-related functions that were once largely paper-based, or otherwise maintained by management, administrative or HR staff.
- What is data validation? Data validation is the practice of checking the integrity, accuracy and structure of data before it is used for or by one or more business operations. (A short validation sketch follows this list.)
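A minimal sketch of the data validation entry above, written in plain Python; the record fields and rules are hypothetical examples of structure, accuracy and integrity checks.

```python
# Hypothetical validation rules applied to customer records before use.
records = [
    {"id": 1, "email": "ana@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": -5},
]

def validate(record):
    errors = []
    if not isinstance(record.get("id"), int):
        errors.append("id must be an integer")          # structure check
    if "@" not in str(record.get("email", "")):
        errors.append("email looks malformed")          # accuracy check
    if not 0 <= record.get("age", -1) <= 130:
        errors.append("age is out of range")            # integrity check
    return errors

for rec in records:
    print(rec["id"], validate(rec))   # record 2 fails the email and age checks
```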
Database management
Terms related to databases, including definitions about relational databases and words and phrases about database management.
- What is Structured Query Language (SQL)? Structured Query Language (SQL) is a standardized programming language that is used to manage relational databases and perform various operations on the data in them.
- What is employee self-service (ESS)? Employee self-service (ESS) is a widely used human resources technology that enables employees to perform many job-related functions that were once largely paper-based, or otherwise maintained by management, administrative or HR staff.
- What is customer segmentation? Customer segmentation is the practice of dividing a customer base into groups of individuals that have similar characteristics relevant to marketing, such as age, gender, interests and spending habits. (A short segmentation sketch follows this list.)
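A minimal sketch of the customer segmentation entry above; the age bands and spending threshold are illustrative assumptions, not a standard scheme.

```python
# Assign each customer to a simple segment based on age band and spending.
customers = [
    {"name": "Ana", "age": 23, "spend": 120.0},
    {"name": "Bo",  "age": 41, "spend": 950.0},
    {"name": "Cy",  "age": 67, "spend": 300.0},
]

def segment(customer):
    band = "18-34" if customer["age"] < 35 else "35-54" if customer["age"] < 55 else "55+"
    tier = "high spend" if customer["spend"] >= 500 else "standard"
    return f"{band} / {tier}"

for c in customers:
    print(c["name"], "->", segment(c))   # e.g. Bo -> 35-54 / high spend
```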
- 3 V's (volume, velocity and variety) - The 3 V's (volume, velocity and variety) are three defining properties or dimensions of big data.
- 3-tier application architecture - A 3-tier application architecture is a modular client-server architecture that consists of a presentation tier, an application tier and a data tier.
- 5V's of big data - The 5 V's of big data -- velocity, volume, value, variety and veracity -- are the five main and innate characteristics of big data.
- 99.999 (Five nines or Five 9s) - In computers, 99.999 -- also known as five nines or five 9s -- refers to a system or service that is available 99.999 percent of the time.
- ACID (atomicity, consistency, isolation, and durability) - In transaction processing, ACID (atomicity, consistency, isolation, and durability) is an acronym and mnemonic device used to refer to the four essential properties a transaction should possess to ensure the integrity and reliability of the data involved in the transaction. (An illustrative sketch appears after this glossary list.)
- actionable intelligence - Actionable intelligence is information that can be immediately used or acted upon -- either tactically in direct response to an evolving situation, or strategically as the result of an analysis or assessment.
- address space - Address space is the amount of memory allocated for all possible addresses for a computational entity -- for example, a device, a file, a server or a networked computer.
- Allscripts - Allscripts is a vendor of electronic health record systems for physician practices, hospitals and healthcare systems.
- alternate data stream (ADS) - An alternate data stream (ADS) is a feature of Windows New Technology File System (NTFS) that contains metadata for locating a specific file by author or title.
- Amazon Simple Storage Service (Amazon S3) - Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, web-based cloud storage service.
- Anaplan - Anaplan is a web-based enterprise platform for business planning.
- Apache Solr - Apache Solr is an open source search platform built upon a Java library called Lucene.
- Apple User Enrollment - Apple User Enrollment (UE) is a form of mobile device management (MDM) for Apple products that supports iOS 13 and macOS Catalina.
- atomic data - In a data warehouse, atomic data is the lowest level of detail.
- availability bias - In psychology, the availability bias is the human tendency to rely on information that comes readily to mind when evaluating situations or making decisions.
- Azure Data Studio (formerly SQL Operations Studio) - Azure Data Studio is a Microsoft tool, originally named SQL Operations Studio, for managing SQL Server databases and cloud-based Azure SQL Database and Azure SQL Data Warehouse systems.
- big data - Big data is a combination of structured, semi-structured and unstructured data that organizations collect, analyze and mine for information and insights.
- big data analytics - Big data analytics is the often complex process of examining big data to uncover information -- such as hidden patterns, correlations, market trends and customer preferences -- that can help organizations make informed business decisions.
- big data as a service (BDaaS) - Big data as a service (BDaaS) is the delivery of data platforms and tools by a cloud provider to help organizations process, manage and analyze large data sets so they can generate insights to improve business operations and gain a competitive advantage.
- big data engineer - A big data engineer is an information technology (IT) professional who is responsible for designing, building, testing and maintaining complex data processing systems that work with large data sets.
- big data management - Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.
- big data storage - Big data storage is a compute-and-storage architecture that collects and manages large data sets and enables real-time data analytics.
- block diagram - A block diagram is a visual representation of a system that uses simple, labeled blocks that represent single or multiple items, entities or concepts, connected by lines to show relationships between them.
- blockchain storage - Blockchain storage is a way of saving data in a decentralized network, which utilizes the unused hard disk space of users across the world to store files.
- box plot - A box plot is a graphical rendition of statistical data based on the minimum, first quartile, median, third quartile, and maximum.
- brontobyte - A brontobyte is an unofficial measure of memory or data storage that is equal to 10 to the 27th power bytes.
- business analytics - Business analytics (BA) is a set of disciplines and technologies for solving business problems using data analysis, statistical models and other quantitative methods.
- business continuity - Business continuity is an organization's ability to maintain critical business functions during and after a disaster has occurred.
- business intelligence dashboard - A business intelligence dashboard, or BI dashboard, is a data visualization and analysis tool that displays on one screen the status of key performance indicators (KPIs) and other important business metrics and data points for an organization, department, team or process.
- capacity management - Capacity management is the broad term describing a variety of IT monitoring, administration and planning actions that ensure that a computing infrastructure has adequate resources to handle current data processing requirements, as well as the capacity to accommodate future loads.
- chatbot - A chatbot is a software or computer program that simulates human conversation or "chatter" through text or voice interactions.
- chief data officer (CDO) - A chief data officer (CDO) in many organizations is a C-level executive whose position has evolved into a range of strategic data management responsibilities related to the business to derive maximum value from the data available to the enterprise.
- CICS (Customer Information Control System) - CICS (Customer Information Control System) is middleware that sits between the z/OS IBM mainframe operating system and business applications.
- clickstream data (clickstream analytics) - Clickstream data and clickstream analytics are the processes involved in collecting, analyzing and reporting aggregate data about which pages a website visitor visits -- and in what order.
- clinical data analyst - A clinical data analyst -- also referred to as a 'healthcare data analyst' -- is a healthcare information professional who verifies the validity of scientific experiments and data gathered from research.
- clinical decision support system (CDSS) - A clinical decision support system (CDSS) is an application that analyzes data to help healthcare providers make decisions and improve patient care.
- cloud audit - A cloud audit is an assessment of a cloud computing environment and its services, based on a specific set of controls and best practices.
- Cloud Data Management Interface (CDMI) - The Cloud Data Management Interface (CDMI) is an international standard that defines a functional interface that applications use to create, retrieve, update and delete data elements from cloud storage.
- cloud SLA (cloud service-level agreement) - A cloud SLA (cloud service-level agreement) is an agreement between a cloud service provider and a customer that ensures a minimum level of service is maintained.
- cloud storage - Cloud storage is a service model in which data is transmitted and stored on remote storage systems, where it is maintained, managed, backed up and made available to users over a network (typically the internet).
- cloud storage API - A cloud storage API is an application programming interface that connects a locally based application to a cloud-based storage system so that a user can send data to it and access and work with data stored in it.
- cloud storage service - A cloud storage service is a business that maintains and manages its customers' data and makes that data accessible over a network, usually the internet.
- cluster quorum disk - A cluster quorum disk is the storage medium on which the configuration database is stored for a cluster computing network.
- cold backup (offline backup) - A cold backup is a backup of an offline database.
- complex event processing (CEP) - Complex event processing (CEP) is the use of technology to predict high-level events.
- compliance as a service (CaaS) - Compliance as a service (CaaS) is a cloud service that specifies how a managed service provider (MSP) helps an organization meet its regulatory compliance mandates.
- conflict-free replicated data type (CRDT) - A conflict-free replicated data type (CRDT) is a data structure that lets multiple people or applications make changes to the same piece of data.
- conformed dimension - In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact with which it relates.
- consumer data - Consumer data is the information that organizations collect from individuals who use internet-connected platforms, including websites, social media networks, mobile apps, text messaging apps or email systems.
- containers (container-based virtualization or containerization) - Containers are a type of software that can virtually package and isolate applications for deployment.
- content personalization - Content personalization is a branding and marketing strategy in which webpages, email and other forms of content are tailored to match the characteristics, preferences or behaviors of individual users.
- Continuity of Care Document (CCD) - A Continuity of Care Document (CCD) is an electronic, patient-specific document detailing a patient's medical history.
- Continuity of Care Record (CCR) - The Continuity of Care Record, or CCR, provides a standardized way to create electronic snapshots about a patient's health information.
- core banking system - A core banking system is the software that banks use to manage their most critical processes, such as customer accounts, transactions and risk management.
- correlation - Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate in relation to each other.
- correlation coefficient - A correlation coefficient is a statistical measure of the degree to which changes to the value of one variable predict change to the value of another. (The most common formula, Pearson's r, appears after this glossary list.)
- CRM (customer relationship management) analytics - CRM (customer relationship management) analytics comprises all of the programming that analyzes data about customers and presents it to an organization to help facilitate and streamline better business decisions.
- CRUD cycle (Create, Read, Update and Delete Cycle) - The CRUD cycle describes the elemental functions of a persistent database in a computer.
- cryptographic nonce - A nonce is a random or semi-random number that is generated for a specific use.
- curation - Curation is a field of endeavor involved with assembling, managing and presenting some type of collection.
- Current Procedural Terminology (CPT) code - Current Procedural Terminology (CPT) is a medical code set that enables physicians and other healthcare providers to describe and report the medical, surgical, and diagnostic procedures and services they perform to government and private payers, researchers and other interested parties.
- customer data integration (CDI) - Customer data integration (CDI) is the process of defining, consolidating and managing customer information across an organization's business units and systems to achieve a "single version of the truth" for customer data.
- customer intelligence (CI) - Customer intelligence (CI) is the process of collecting and analyzing detailed customer data from internal and external sources to gain insights about customer needs, motivations and behaviors.
- dark data - Dark data is digital information an organization collects, processes and stores that is not currently being used for business purposes.
- data - In computing, data is information that has been translated into a form that is efficient for movement or processing.
- data abstraction - Data abstraction is the reduction of a particular body of data to a simplified representation of the whole.
- data activation - Data activation is a marketing approach that uses consumer information and data analytics to help companies gain real-time insight into target audience behavior and plan for future marketing initiatives.
- data aggregation - Data aggregation is any process whereby data is gathered and expressed in a summary form.
- data analytics (DA) - Data analytics (DA) is the process of examining data sets to find trends and draw conclusions about the information they contain.
- data anonymization - Data anonymization describes various techniques to remove or block data containing personally identifiable information (PII).
- Data as a Service (DaaS) - Data as a Service (DaaS) is an information provision and distribution model in which data files (including text, images, sounds, and videos) are made available to customers over a network, typically the Internet.
- data availability - Data availability is a term used by computer storage manufacturers and storage service providers to describe how data should be available at a required level of performance in situations ranging from normal through disastrous.
- data breach - A data breach is a cyber attack in which sensitive, confidential or otherwise protected data has been accessed or disclosed in an unauthorized fashion.
- data catalog - A data catalog is a software application that creates an inventory of an organization's data assets to help data professionals and business users find relevant data for analytics uses.
- data center chiller - A data center chiller is a cooling system used in a data center to remove heat from one element and deposit it into another element.
- data center services - Data center services provide the supporting components necessary to the proper operation of a data center.
- data citizen - A data citizen is an employee who relies on data to make decisions and perform job responsibilities.
- data classification - Data classification is the process of organizing data into categories that make it easy to retrieve, sort and store for future use.
- data clean room - A data clean room is a technology service that helps content platforms keep first person user data private when interacting with advertising providers.
- data cleansing (data cleaning, data scrubbing) - Data cleansing, also referred to as data cleaning or data scrubbing, is the process of fixing incorrect, incomplete, duplicate or otherwise erroneous data in a data set.
- data collection - Data collection is the process of gathering data for use in business decision-making, strategic planning, research and other purposes.
- data curation - Data curation is the process of creating, organizing and maintaining data sets so they can be accessed and used by people looking for information.
- data de-identification - Data de-identification is decoupling or masking data, to prevent certain data elements from being associated with the individual.
- data destruction - Data destruction is the process of destroying data stored on tapes, hard disks and other forms of electronic media so that it's completely unreadable and can't be accessed or used for unauthorized purposes.
- data dignity - Data dignity, also known as data as labor, is a theory positing that people should be compensated for the data they have created.
- Data Dredging (data fishing) - Data dredging -- sometimes referred to as data fishing -- is a data mining practice in which large data volumes are analyzed to find any possible relationships between them.
- data engineer - A data engineer is an IT professional whose primary job is to prepare data for analytical or operational uses.
- data exploration - Data exploration is the first step in data analysis involving the use of data visualization tools and statistical techniques to uncover data set characteristics and initial patterns.
- data feed - A data feed is an ongoing stream of structured data that provides users with updates of current information from one or more sources.
- data governance policy - A data governance policy is a documented set of guidelines for ensuring that an organization's data and information assets are managed consistently and used properly.
- data gravity - Data gravity is the ability of a body of data to attract applications, services and other data.
- data historian - A data historian is a software program that records the data created by processes running in a computer system.
- data in motion - Data in motion, also referred to as data in transit or data in flight, is a process in which digital information is transported between locations either within or between computer systems.
- data in use - Data in use is data that is currently being updated, processed, accessed and read by a system.
- data ingestion - Data ingestion is the process of obtaining and importing data for immediate use or storage in a database.
- data integration - Data integration is the process of combining data from multiple source systems to create unified sets of information for both operational and analytical uses.
- data integrity - Data integrity is the assurance that digital information is uncorrupted and can only be accessed or modified by those authorized to do so.
- data lake - A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed for analytics applications.
- data lakehouse - A data lakehouse is a data management architecture that combines the key features and the benefits of a data lake and a data warehouse.
- data lifecycle management (DLM) - Data lifecycle management (DLM) is a policy-based approach to managing the flow of an information system's data throughout its lifecycle: from creation and initial storage to when it becomes obsolete and is deleted.
- data literacy - Data literacy is the ability to derive meaningful information from data, just as literacy in general is the ability to derive information from the written word.
- data loss - Data loss is the intentional or unintentional destruction of information.
- data management as a service (DMaaS) - Data management as a service (DMaaS) is a type of cloud service that provides enterprises with centralized storage for disparate data sources.
- data management platform (DMP) - A data management platform (DMP), also referred to as a unified data management platform (UDMP), is a centralized system for collecting and analyzing large sets of data originating from disparate sources.
- data marketplace (data market) - A data marketplace, or data market, is an online store where people can buy data.
- data mart (datamart) - A data mart is a repository of data that is designed to serve a particular community of knowledge workers.
- data masking - Data masking is a method of creating a structurally similar but inauthentic version of an organization's data that can be used for purposes such as software testing and user training.
- data mesh - Data mesh is a decentralized data management architecture for analytics and data science.
- data migration - Data migration is the process of transferring data between data storage systems, data formats or computer systems.
- data minimization - Data minimization aims to reduce the amount of collected data to only include necessary information for a specific purpose.
- data mining - Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis.
- data modeling - Data modeling is the process of creating a simplified visual diagram of a software system and the data elements it contains, using text and symbols to represent the data and how it flows.
- data observability - Data observability is a process and set of practices that aim to help data teams understand the overall health of the data in their organization's IT systems.
- data pipeline - A data pipeline is a set of network connections and processing steps that moves data from a source system to a target location and transforms it for planned business uses.
- data portability - Data portability is the ability to move data among different application programs, computing environments or cloud services.
- data preprocessing - Data preprocessing, a component of data preparation, describes any type of processing performed on raw data to prepare it for another data processing procedure.
- data processing - Data processing refers to essential operations executed on raw data to transform the information into a useful format or structure that provides valuable insights to a user or organization.
- data profiling - Data profiling refers to the process of examining, analyzing, reviewing and summarizing data sets to gain insight into the quality of data.
- data protection as a service (DPaaS) - Data protection as a service (DPaaS) involves managed services that safeguard an organization's data.
- data protection authorities - Data protection authorities (DPAs) are public authorities responsible for enforcing data protection laws and regulations within a specific jurisdiction.
- data protection management (DPM) - Data protection management (DPM) is the administration, monitoring and management of backup processes to ensure backup tasks run on schedule and data is securely backed up and recoverable.
- data quality - Data quality is a measure of a data set's condition based on factors such as accuracy, completeness, consistency, reliability and validity.
- data residency - Data residency refers to the physical or geographic location of an organization's data or information.
- data retention policy - In business settings, data retention is a concept that encompasses all processes for storing and preserving data, as well as the specific time periods and policies businesses enforce that determine how and for how long data should be retained.
- data science as a service (DSaaS) - Data science as a service (DSaaS) is a form of outsourcing that involves the delivery of information gleaned from advanced analytics applications run by data scientists at an outside company to corporate clients for their business use.
- data scientist - A data scientist is an analytics professional who is responsible for collecting, analyzing and interpreting data to help drive decision-making in an organization.
- data set - A data set, also spelled 'dataset,' is a collection of related data that's usually organized in a standardized format.
- data source name (DSN) - A data source name (DSN) is a data structure containing information about a specific database to which an Open Database Connectivity (ODBC) driver needs to connect.
- data splitting - Data splitting is the practice of dividing a data set into two or more subsets, such as a training set and a test set. (An illustrative sketch appears after this glossary list.)
- data stewardship - Data stewardship is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.
- data storytelling - Data storytelling is the process of translating data analyses into understandable terms in order to influence a business decision or action.
- data streaming - Data streaming is the continuous transfer of data from one or more sources at a steady, high speed for processing into specific outputs.
- data structure - A data structure is a specialized format for organizing, processing, retrieving and storing data.
- Data Transfer Project (DTP) - Data Transfer Project (DTP) is an open source initiative to facilitate customer-controlled data transfers between two online services.
- data transformation - Data transformation is the process of converting data from one format, such as a database file, XML document or Excel spreadsheet, into another.
- data virtualization - Data virtualization is an umbrella term used to describe an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data.
- data warehouse - A data warehouse is a repository of data from an organization's operational systems and other sources that supports analytics applications to help drive business decision-making.
- data warehouse appliance - A data warehouse appliance is an all-in-one “black box” solution optimized for data warehousing.
- data warehouse as a service (DWaaS) - Data warehouse as a service (DWaaS) is an outsourcing model in which a cloud service provider configures and manages the hardware and software resources a data warehouse requires, and the customer provides the data and pays for the managed service.
- database (DB) - A database is a collection of information that is organized so that it can be easily accessed, managed and updated.
- database management system (DBMS) - A database management system (DBMS) is a software system for creating and managing databases.
- database marketing - Database marketing is a systematic approach to the gathering, consolidation and processing of consumer data.
- database normalization - Database normalization is the process of organizing data in a relational database to reduce redundancy and improve data integrity; it is intrinsic to most relational database designs.
- database replication - Database replication is the frequent electronic copying of data from a database in one computer or server to a database in another -- so that all users share the same level of information.
- DataOps - DataOps is an Agile approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production.
- Db2 - Db2 is a family of database management system (DBMS) products from IBM that serve a number of different operating system (OS) platforms.
- decision-making process - A decision-making process is a series of steps one or more individuals take to determine the best option or course of action to address a specific problem or situation.
- deep analytics - Deep analytics is the application of sophisticated data processing techniques to yield information from large and typically multi-source data sets comprised of both unstructured and semi-structured data.
- demand planning - Demand planning is the process of forecasting the demand for a product or service so it can be produced and delivered more efficiently and to the satisfaction of customers.
- descriptive analytics - Descriptive analytics is a type of data analytics that looks at past data to give an account of what has happened.
- deterministic/probabilistic data - Deterministic and probabilistic are opposing terms that can be used to describe customer data and how it is collected.
- digital twin - A digital twin is a virtual representation of a real-world entity or process.
- digital wallet - In general, a digital wallet is a software application, usually for a smartphone, that serves as an electronic version of a physical wallet.
- dimension - In data warehousing, a dimension is a collection of reference information that supports a measurable event, such as a customer transaction.
- dimension table - In data warehousing, a dimension table is a database table that stores attributes describing the facts in a fact table.
- dimensionality reduction - Dimensionality reduction is a process and technique to reduce the number of dimensions -- or features -- in a data set.
- disambiguation - Disambiguation is the process of determining a word's meaning -- or sense -- within its specific context.
- disaster recovery (DR) - Disaster recovery (DR) is an organization's ability to respond to and recover from an event that negatively affects business operations.
- distributed database - A distributed database is a database that consists of two or more files located in different sites either on the same network or on entirely different networks.
- distributed ledger technology (DLT) - Distributed ledger technology (DLT) is a digital system for recording the transaction of assets in which the transactions and their details are recorded in multiple places at the same time.
- document - A document is a form of information that might be useful to a user or set of users.
- Dublin Core - Dublin Core is an international metadata standard formally known as the Dublin Core Metadata Element Set and includes 15 metadata (data that describes data) terms.
- ebXML (Electronic Business XML) - EbXML (Electronic Business XML or e-business XML) is a project to use the Extensible Markup Language (XML) to standardize the secure exchange of business data.
- Eclipse (Eclipse Foundation) - Eclipse is a free, Java-based development platform known for its plugins that allow developers to develop and test code written in other programming languages.
- edge analytics - Edge analytics is an approach to data collection and analysis in which an automated analytical computation is performed on data at a sensor, network switch or other device instead of waiting for the data to be sent back to a centralized data store.
- empirical analysis - Empirical analysis is an evidence-based approach to the study and interpretation of information.
- empiricism - Empiricism is a philosophical theory applicable in many disciplines, including science and software development, that human knowledge comes predominantly from experiences gathered through the five senses.
- encoding and decoding - Encoding and decoding are used in many forms of communications, including computing, data communications, programming, digital electronics and human communications.
- encryption key management - Encryption key management is the practice of generating, organizing, protecting, storing, backing up and distributing encryption keys.
- enterprise search - Enterprise search is a type of software that lets users find data spread across organizations' internal repositories, such as content management systems, knowledge bases and customer relationship management (CRM) systems.
- entity - An entity is a single thing with a distinct separate existence.
- entity relationship diagram (ERD) - An entity relationship diagram (ERD), also known as an 'entity relationship model,' is a graphical representation that depicts relationships among people, objects, places, concepts or events in an information technology (IT) system.
- Epic Systems - Epic Systems, also known simply as Epic, is one of the largest providers of health information technology, used primarily by large U.S. hospitals and health systems.
- erasure coding (EC) - Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media.
- exabyte (EB) - An exabyte (EB) is a large unit of computer data storage equal to 10 to the 18th power bytes; the corresponding binary unit, the exbibyte, equals two to the sixtieth power bytes.
- Excel - Excel is a spreadsheet program from Microsoft and a component of its Office product group for business applications.
- explainable AI - Explainable AI (XAI) is artificial intelligence (AI) that's programmed to describe its purpose, rationale and decision-making process in a way that the average person can understand.
- exponential function - An exponential function is a mathematical function used to calculate the exponential growth or decay of a given set of data.
- extension - An extension typically refers to a file name extension.
- Extract, Load, Transform (ELT) - Extract, Load, Transform (ELT) is a data integration process for transferring raw data from a source server to a data system (such as a data warehouse or data lake) on a target server and then preparing the information for downstream uses.
- facial recognition - Facial recognition is a category of biometric software that maps an individual's facial features to confirm their identity.
- fact table - In data warehousing, a fact table is a database table in a dimensional model.
- failover - Failover is a backup operational mode in which the functions of a system component are assumed by a secondary component when the primary becomes unavailable.
- file extension (file format) - In computing, a file extension is a suffix added to the name of a file to indicate the file's layout, in terms of how the data within the file is organized.
- file synchronization (file sync) - File synchronization (file sync) is a method of keeping files that are stored in several different physical locations up to date.
- firmographic data - Firmographic data is types of information that can be used to categorize organizations, such as location, name, number of clients, industry and so on.
- FIX protocol (Financial Information Exchange protocol) - The Financial Information Exchange (FIX) protocol is an open specification intended to streamline electronic communications in the financial securities industry.
- foreign key - A foreign key is a column or columns of data in one table that refers to the unique data values -- often the primary key data -- in another table. (An illustrative sketch appears after this glossary list.)
- framework - In general, a framework is a real or conceptual structure intended to serve as a support or guide for the building of something that expands the structure into something useful.
- garbage in, garbage out (GIGO) - Garbage in, garbage out, or GIGO, refers to the idea that in any system, the quality of output is determined by the quality of the input.
- Google BigQuery - Google BigQuery is a cloud-based big data analytics web service for processing very large read-only data sets.
- Google Cloud Storage - Google Cloud Storage is an enterprise public cloud storage platform that can house large unstructured data sets.
- GPS coordinates - GPS coordinates are a unique identifier of a precise geographic location on the earth, usually expressed in alphanumeric characters.
- gradient descent - Gradient descent is an optimization algorithm that refines a machine learning (ML) model's parameters to create a more accurate model. (An illustrative sketch appears after this glossary list.)
- Gramm-Leach-Bliley Act (GLBA) - The Gramm-Leach-Bliley Act (GLB Act or GLBA), also known as the Financial Modernization Act of 1999, is a federal law enacted in the United States to control the ways financial institutions deal with the private information of individuals.
- grid computing - Grid computing is a system for connecting a large number of computer nodes into a distributed architecture that delivers the compute resources necessary to solve complex problems.
- gzip (GNU zip) - Gzip (GNU zip) is a free and open source algorithm for file compression.
- Hadoop - Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers.
- Hadoop data lake - A Hadoop data lake is a data management platform comprising one or more Hadoop clusters.
- Hadoop Distributed File System (HDFS) - The Hadoop Distributed File System (HDFS) is the primary data storage system Hadoop applications use.
- hashing - Hashing is the process of transforming any given key or a string of characters into another value.
- health informatics - Health informatics is the practice of acquiring, studying and managing health data and applying medical concepts in conjunction with health information technology systems to help clinicians provide better healthcare.
- Health IT (health information technology) - Health IT (health information technology) is the area of IT involving the design, development, creation, use and maintenance of information systems for the healthcare industry.
- heartbeat (computing) - In computing, a heartbeat is a program that runs specialized scripts automatically whenever a system is initialized or rebooted.
- heat map (heatmap) - A heat map is a two-dimensional representation of data in which various values are represented by colors.
- hierarchy - Generally speaking, hierarchy refers to an organizational structure in which items are ranked in a specific manner, usually according to levels of importance.
- histogram - A histogram is a type of chart that shows the frequency distribution of data points across a continuous range of numerical values.
- historical data - Historical data, in a broad context, is data collected about past events and circumstances pertaining to a particular subject.
- IBM IMS (Information Management System) - IBM IMS (Information Management System) is a database and transaction management system that was first introduced by IBM in 1968.
- ICD-10-CM (Clinical Modification) - The ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) is a system used by physicians and other healthcare providers to classify and code all diagnoses, symptoms and procedures related to inpatient and outpatient medical care in the United States.
- IDoc (intermediate document) - IDoc (intermediate document) is a standard data structure used in SAP applications to transfer data to and from SAP system applications and external systems.
- in-memory analytics - In-memory analytics is an approach to querying data residing in a computer's random access memory (RAM) as opposed to querying data stored on physical drives.
- in-memory database - An in-memory database is a type of analytic database designed to streamline the work involved in processing queries.
- inductive argument - An inductive argument is an assertion that uses specific premises or observations to make a broader generalization.
- infographic - An infographic (information graphic) is a representation of information in a graphic format designed to make the data easily understandable at a glance.
- information - Information is the output that results from analyzing, contextualizing, structuring, interpreting or in other ways processing data.
- information asset - An information asset is a collection of knowledge or data that is organized, managed and valuable.
- information assurance (IA) - Information assurance (IA) is the practice of protecting physical and digital information and the systems that support the information.
- information governance - Information governance is a holistic approach to managing corporate information by implementing processes, roles, controls and metrics that treat information as a valuable business asset.
- information lifecycle management (ILM) - Information lifecycle management (ILM) is a comprehensive approach to managing an organization's data and associated metadata, starting with its creation and acquisition through when it becomes obsolete and is deleted.
- information rights management (IRM) - Information rights management (IRM) is a discipline that involves managing, controlling and securing content from unwanted access.
- information systems (IS) - An information system (IS) is an interconnected set of components used to collect, store, process and transmit data and digital information.
- inline deduplication - Inline deduplication is the removal of redundancies from data before or as it is being written to a backup device.
- IT incident management - IT incident management is a component of IT service management (ITSM) that aims to rapidly restore services to normal following an incident while minimizing adverse effects on the business.
- Java Database Connectivity (JDBC) - Java Database Connectivity (JDBC) is an API packaged with the Java SE edition that makes it possible to connect from a Java Runtime Environment (JRE) to external, relational database systems.
- job - In certain computer operating systems, a job is the unit of work that a computer operator -- or a program called a job scheduler -- gives to the OS.
- job scheduler - A job scheduler is a computer program that enables an enterprise to schedule and, in some cases, monitor computer 'batch' jobs (units of work).
- job step - In certain computer operating systems, a job step is part of a job, a unit of work that a computer operator (or a program called a job scheduler) gives to the operating system.
- JOLAP (Java Online Analytical Processing) - JOLAP (Java Online Analytical Processing) is a Java application-programming interface (API) for the Java 2 Platform, Enterprise Edition (J2EE) environment that supports the creation, storage, access, and management of data in an online analytical processing (OLAP) application.
- key-value pair (KVP) - A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data. (An illustrative sketch appears after this glossary list.)
- knowledge base - In general, a knowledge base is a centralized repository of information.
- knowledge management (KM) - Knowledge management is the process an enterprise uses to gather, organize, share and analyze its knowledge in a way that's easily accessible to employees.
- knowledge-based systems (KBSes) - Knowledge-based systems (KBSes) are computer programs that use a centralized repository of data known as a knowledge base to provide a method for problem-solving.
- laboratory information system (LIS) - A laboratory information system (LIS) is computer software that processes, stores and manages data from patient medical processes and tests.
- Lambda architecture - Lambda architecture is an approach to big data management that provides access to batch processing and near real-time processing with a hybrid approach.
- legal health record (LHR) - A legal health record (LHR) refers to documentation about a patient's personal health information that is created by a healthcare organization or provider.
- Lisp (programming language) - Lisp, an acronym for list processing, is a functional programming language that was designed for easy manipulation of data strings.
- LTO-8 (Linear Tape-Open 8) - LTO-8, or Linear Tape-Open 8, is a tape format from the Linear Tape-Open Consortium released in late 2017.
- MariaDB - MariaDB is an open source relational database management system (DBMS) that is a compatible drop-in replacement for the widely used MySQL database technology.
- Massachusetts data protection law - The Massachusetts data protection law is legislation that stipulates security requirements for organizations that handle the private data of residents.
- master data - Master data is the core data that is essential to operations in a specific business or business unit.
- medical scribe - A medical scribe is a professional who specializes in documenting patient encounters in real time under the direction of a physician.
- metadata - Often referred to as data that describes other data, metadata is structured reference data that helps to sort and identify attributes of the information it describes.
- Microsoft Azure - Microsoft Azure, formerly known as Windows Azure, is Microsoft's public cloud computing platform.
- Microsoft Azure Data Lake - Microsoft Azure Data Lake is a highly scalable public cloud service that allows developers, scientists, business professionals and other Microsoft customers to gain insight from large, complex data sets.
- Microsoft MyAnalytics - Microsoft MyAnalytics is a personal analytics application in Office 365 that enables employees to gain insights into how they spend their time at work and how they can work smarter.
- Microsoft Office SharePoint Server (MOSS) - Microsoft Office SharePoint Server (MOSS) is the full version of a portal-based platform for collaboratively creating, managing and sharing documents and Web services.
- Microsoft Power BI - Microsoft Power BI is a business intelligence (BI) platform that provides nontechnical business users with tools for aggregating, analyzing, visualizing and sharing data.
- Microsoft System Center - Microsoft System Center is a suite of software products designed to simplify the deployment, configuration and management of IT infrastructure and virtualized software-defined data centers.
- Microsoft Visual FoxPro (Microsoft VFP) - Microsoft Visual FoxPro (VFP) is an object-oriented programming environment with a built-in relational database engine.
- middleware - Middleware is software that bridges the gap between applications and operating systems by providing a method for communication and data management.
- Monte Carlo simulation - A Monte Carlo simulation is a mathematical technique that simulates the range of possible outcomes for an uncertain event. (An illustrative sketch appears after this glossary list.)
- MPP database (massively parallel processing database) - An MPP database is a database that is optimized to be processed in parallel for many operations to be performed by many processing units at a time.
- multidimensional database (MDB) - A multidimensional database (MDB) is a type of database that is optimized for data warehouse and online analytical processing (OLAP) applications.
- national identity card - A national identity card is a portable document, typically a plasticized card with digitally embedded information, that is used to verify aspects of a person's identity.
- noisy data - Noisy data is a data set that contains extra meaningless data.
- normal distribution - A normal distribution is a type of continuous probability distribution in which most data points cluster toward the middle of the range, while the rest taper off symmetrically toward either extreme. (The probability density function appears after this glossary list.)
- NoSQL (Not Only SQL database) - NoSQL is an approach to database management that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats.
- object-oriented database management system (OODBMS) - An object-oriented database management system (OODBMS), sometimes shortened to ODBMS for object database management system, is a database management system (DBMS) that supports the modelling and creation of data as objects.
- OLAP (online analytical processing) - OLAP (online analytical processing) is a computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view.
- Open Database Connectivity (ODBC) - Open Database Connectivity (ODBC) is an open standard application programming interface (API) that allows application programmers to easily access data stored in a database.
- operational data store (ODS) - An operational data store (ODS) is a type of database that's often used as an interim logical area for a data warehouse.
- operational efficiency - Operational efficiency refers to an organization's ability to reduce waste of time, effort and material while still producing a high-quality service or product.
- operational intelligence (OI) - Operational intelligence (OI) is an approach to data analysis that enables decisions and actions in business operations to be based on real-time data as it's generated or collected by companies.
- Oracle - Oracle is one of the largest vendors in the enterprise IT market and the shorthand name of its flagship product, a relational database management system (RDBMS) that's formally called Oracle Database.
- pandemic plan - A pandemic plan is a documented strategy for business continuity in the event of a widespread outbreak of a dangerous infectious disease.
- parallel file system - A parallel file system is a software component designed to store data across multiple networked servers.
- pebibyte (PiB) - A pebibyte (PiB) is a unit of measure for data capacity that is equal to 2 to the 50th power bytes.
- performance and accountability reporting (PAR) - Performance and accountability reporting (PAR) is the process of compiling and documenting factors that quantify an organization's achievements, efficiency and adherence to budget, comparing actual results against previously articulated goals.
- personal health record (PHR) - A personal health record (PHR) is an electronic summary of health information that a patient maintains control of themselves, as opposed to their healthcare provider.
- picture archiving and communication system (PACS) - Picture archiving and communication system (PACS) is a medical imaging technology used primarily in healthcare organizations to securely store and digitally transmit electronic images and clinically relevant reports.
- pivot table - A pivot table is a statistics tool that summarizes and reorganizes selected columns and rows of data in a spreadsheet or database table to obtain a desired report.
- PL/SQL (procedural language extension to Structured Query Language) - In Oracle database management, PL/SQL is a procedural language extension to Structured Query Language (SQL).
- precision agriculture - Precision agriculture (PA) is a farming management concept based on observing, measuring and responding to inter- and intra-field variability in crops.
- predictive modeling - Predictive modeling is a mathematical process used to predict future events or outcomes by analyzing patterns in a given set of input data.
- primary key (primary keyword) - A primary key, also called a primary keyword, is a column in a relational database table that's distinctive for each record.
- product data management (PDM) - Product data management (PDM) is the process of capturing and managing the electronic information related to a product so it can be reused in business processes such as design, production, distribution and marketing.
- public data - Public data is information that can be shared, used, reused and redistributed without restriction.
- qualitative data - Qualitative data is information that cannot be counted, measured or easily expressed using numbers.
- radiology information system (RIS) - A radiology information system (RIS) is a networked software system for managing medical imagery and associated data.
- raw data (source data or atomic data) - Raw data is the data originally generated by a system, device or operation, and has not been processed or changed in any way.
- RDBMS (relational database management system) - A relational database management system (RDBMS) is a collection of programs and capabilities that enable IT teams and others to create, update, administer and otherwise interact with a relational database.
- real-time analytics - Real-time analytics is the use of data and related resources for analysis as soon as it enters the system.
- record - In computer data processing, a record is a collection of data items arranged for processing by a program.
- records information management (RIM) - Records information management (RIM) is a corporate area of endeavor involving the administration of all business records through their life cycle.
- records retention schedule - A records retention schedule is a policy that defines how long paper and electronic content must be kept and provides disposal guidelines for how those items should be discarded.
- redundancy - Redundancy is a system design in which a component is duplicated so if it fails there will be a backup.
- refactoring - Refactoring is the process of restructuring code, while not changing its original functionality.
- registered health information technician (RHIT) - A registered health information technician (RHIT) is a certified professional who stores and verifies the accuracy and completeness of electronic health records.
- relational database - A relational database is a type of database that organizes data points with defined relationships for easy access.
- Report on Compliance (ROC) - A Report on Compliance (ROC) is a form that must be completed by all Level 1 Visa merchants undergoing a PCI DSS (Payment Card Industry Data Security Standard) audit.
- restore point - A system restore point is a backup copy of important Windows operating system (OS) files and settings that can be used to recover the system to an earlier point of time in the event of system failure or instability.
- RFM analysis (recency, frequency, monetary) - RFM analysis is a marketing technique used to quantitatively rank and group customers based on the recency, frequency and monetary total of their recent transactions to identify the best customers and perform targeted marketing campaigns.
- SAP Basis - Basis is a set of middleware programs and tools from SAP, the German company whose comprehensive R/3 product is used to help manage large corporations.
- SAP BW (Business Warehouse) - SAP Business Warehouse (BW) is a model-driven data warehousing product based on the SAP NetWeaver ABAP platform.
- SAP Data Services - SAP Data Services is a data integration and transformation software application.
- schema - In computer programming, a schema (pronounced SKEE-mah) is the organization or structure for a database, while in artificial intelligence (AI), a schema is a formal expression of an inference rule.
- security information management (SIM) - Security information management (SIM) is the practice of collecting, monitoring and analyzing security-related data from computer logs and various other data sources.
- self-driving car (autonomous car or driverless car) - A self-driving car -- sometimes called an autonomous car or driverless car -- is a vehicle that uses a combination of sensors, cameras, radar and artificial intelligence (AI) to travel between destinations without a human operator.
- self-service analytics - Self-service analytics is a type of business intelligence (BI) that enables business users to access, manipulate, analyze and visualize data, as well as generate reports based on their discoveries.
- self-service business intelligence (self-service BI) - Self-service business intelligence (BI) is an approach to data analytics that enables business users to access and explore data sets even if they don't have a background in BI or related functions such as data mining and statistical analysis.
- semantic network (knowledge graph) - A semantic network is a knowledge structure that depicts how concepts are related to one another and how they interconnect.
- semantic technology - Semantic technology is a set of methods and tools that provide advanced means for categorizing and processing data, as well as for discovering relationships within varied data sets.
- sensitive information - Sensitive information is data that must be protected from unauthorized access to safeguard the privacy or security of an individual or organization.
- SequenceFile - A SequenceFile is a flat, binary file type that serves as a container for data to be used in Hadoop distributed compute projects.
- server-based storage - Server-based storage is a re-emerging class of data storage that removes cost and complexity by housing storage media inside servers rather than in dedicated and custom-engineered storage arrays.
- serverless database - A serverless database is a type of cloud database that is fully managed for an organization by a cloud service provider and runs on demand as needed to support applications.
- SNOMED CT (Systematized Nomenclature of Medicine -- Clinical Terms) - SNOMED CT (Systematized Nomenclature of Medicine -- Clinical Terms) is a standardized, multilingual vocabulary of clinical terminology that is used by physicians and other health care providers for the electronic exchange of health information.
- snowflaking (snowflake schema) - In data warehousing, snowflaking is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables.
- software-defined storage (SDS) - Software-defined storage (SDS) is a software program that manages data storage resources and functionality and has no dependencies on the underlying physical storage hardware.
- spatial data - Spatial data is any type of data that directly or indirectly references a specific geographical area or location.
- spreadsheet - A spreadsheet is a computer program that can capture, display and manipulate data arranged in rows and columns.
- standard business reporting (SBR) - Standard business reporting (SBR) is a group of frameworks adopted by governments to promote standardization in reporting business data.
- star schema - A star schema is a database organizational structure, optimized for use in a data warehouse or business intelligence application, that uses a single large fact table to store transactional or measured data and one or more smaller dimensional tables that store attributes about the data (see the code sketch after this list).
- statistical analysis - Statistical analysis is the collection and interpretation of data in order to uncover patterns and trends.
- storage class memory (SCM) - Storage class memory (SCM) is a type of physical computer memory that combines dynamic random access memory (DRAM), NAND flash memory and a power source for data persistence.
- stored procedure - A stored procedure is a group of statements with a specific name, which are stored inside a database, such as MySQL or Oracle.
- stream processing - Stream processing is a data management technique that involves ingesting a continuous data stream to quickly analyze, filter, transform or enhance the data in real time.
- streaming data architecture - A streaming data architecture is an information technology framework that puts the focus on processing data in motion and treats extract-transform-load (ETL) batch processing as just one more event in a continuous stream of events.
- structured data - Structured data is data that has been organized into a formatted repository, typically a database.
- supply chain planning (SCP) - Supply chain planning (SCP) is the process of anticipating the demand for products and planning their materials and components, production, marketing, distribution and sale.
- support vector machine (SVM) - A support vector machine (SVM) is a type of supervised learning algorithm used in machine learning to solve classification and regression tasks (see the code sketch after this list).
- syslog - Syslog is an IETF RFC 5424 standard protocol for computer logging and collection that is popular in Unix-like systems including servers, networking equipment and IoT devices.
- system of record (SOR) - A system of record (SOR) is an information storage and retrieval system that stores valuable data on an organizational system or process.
- System Restore (Windows) - System Restore is a Microsoft Windows utility designed to protect and revert the operating system (OS) to a previous state.
- T-SQL (Transact-SQL) - T-SQL (Transact-SQL) is a set of programming extensions from Sybase and Microsoft that add several features to the Structured Query Language (SQL), including transaction control, exception and error handling, row processing and declared variables.
- table - A table in computer programming is a data structure used to organize information, just as it is on paper.
- taxonomy - Taxonomy is the science of classification according to a predetermined system, with the resulting catalog used to provide a conceptual framework for discussion, analysis or information retrieval.
- text mining (text analytics) - Text mining is the process of exploring and analyzing large amounts of unstructured text data aided by software that can identify concepts, patterns, topics, keywords and other attributes in the data.
- text tagging - Text tagging is the process of manually or automatically adding tags or annotation to various components of unstructured data as one step in the process of preparing such data for analysis.
- timeline - A timeline is a visual representation of a chronological sequence of events along a drawn line that helps a viewer understand time relationships.
- transactional data - In computing, transactional data is the information collected from transactions.
- transcription error - A transcription error is a type of data entry error commonly made by human operators or by optical character recognition (OCR) programs.
- transportation management system (TMS) - A transportation management system (TMS) is specialized software for planning, executing and optimizing the shipment of goods.
- tree structure - A tree is a hierarchical data structure used to place and locate files (called records or keys) in a database.
- U-SQL - U-SQL is a Microsoft query language that combines a declarative SQL-like syntax with C# programming, enabling it to be used to process both structured and unstructured data in big data environments.
- unstructured text - Unstructured text is text that does not conform to a predefined format or data model; the unstructured text collected from social media activities plays a key role in predictive analytics for the enterprise because it is a prime source for sentiment analysis to determine the general attitude of consumers toward a brand or idea.
- user acceptance testing (UAT) - User acceptance testing (UAT), also called application testing or end-user testing, is a phase of software development in which the software is tested in the real world by its intended audience.
- user behavior analytics (UBA) - User behavior analytics (UBA) is the tracking, collecting and assessing of user data and activities using monitoring systems.
- utility storage - Utility storage is a service model in which a provider makes storage capacity available to an individual, organization or business unit on a pay-per-use basis.
- virtual desktop - A virtual desktop is a computer operating system that does not run directly on the endpoint hardware from which a user accesses it.
- volatile memory - Volatile memory is a type of memory that maintains its data only while the device is powered.
- web analytics - Web analytics is the process of analyzing the behavior of visitors to a website by tracking, reviewing and reporting the data generated by their use of the site and its components, such as its webpages, images and videos.
- web services - Web services are a type of internet software that use standardized messaging protocols and are made available from an application service provider's web server for a client or other web-based programs to use.
- WebLogic - Oracle WebLogic Server is an application server and e-commerce online transaction processing (OLTP) platform, developed to connect users in distributed computing production environments and to facilitate the integration of mainframe applications with distributed corporate data and applications.
- What are data silos and what problems do they cause? - A data silo is a repository of data that's controlled by one department or business unit and isolated from the rest of an organization, much like grass and grain in a farm silo are closed off from outside elements.
- What are graph neural networks (GNNs)? - Graph neural networks (GNNs) are a type of neural network architecture and deep learning method that can help users analyze graphs, enabling them to make predictions based on the data described by a graph's nodes and edges.
- What is a backpropagation algorithm? - A backpropagation algorithm, or backward propagation of errors, is an algorithm that's used to help train neural network models (see the code sketch after this list).
- What is a consensus algorithm? - A consensus algorithm is a process in computer science used to achieve agreement on a single data value among distributed processes or systems.
- What is a data architect? - A data architect is an IT professional responsible for defining the policies, procedures, models and technologies used in collecting, organizing, storing and accessing company information.
- What is a data flow diagram (DFD)? - A data flow diagram (DFD) is a graphical or visual representation that uses a standardized set of symbols and notations to describe a business's operations through data movement.
- What is a private cloud? - Private cloud is a type of cloud computing that delivers similar advantages to public cloud, including scalability and self-service, but through a proprietary architecture.
- What is a validation set? How is it different from test, train data sets? - A validation set is a set of data held out from model training and used during the development of an artificial intelligence (AI) model to tune it and select the best-performing model for a given problem.
- What is a vector database? - A vector database is a type of database technology that's used to store, manage and search vector embeddings, numerical representations of unstructured data that are also referred to simply as vectors (see the code sketch after this list).
- What is an API endpoint? - An API endpoint is a point at which an application programming interface -- the code that enables two software programs to communicate with each other -- connects with the software program.
- What is an NVDIMM (non-volatile dual in-line memory module)? - An NVDIMM (non-volatile dual in-line memory module) is hybrid computer memory that retains data during a service outage.
- What is anomaly detection? An overview and explanation - Anomaly detection is the process of identifying data points, entities or events that fall outside the normal range (see the code sketch after this list).
- What is bit rot? - Bit rot is the slow deterioration in the performance and integrity of data stored on storage media.
- What is customer segmentation? - Customer segmentation is the practice of dividing a customer base into groups of individuals that have similar characteristics relevant to marketing, such as age, gender, interests and spending habits.
- What is data architecture? A data management blueprint - Data architecture is a discipline that documents an organization's data assets, maps how data flows through IT systems and provides a blueprint for managing data, as this guide explains.
- What is data democratization? - Data democratization makes information in a digital format accessible to the average end user.
- What is data governance and why does it matter? - Data governance is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal standards and policies that also control data usage.
- What is data labeling? - Data labeling is the process of identifying and tagging data samples commonly used in the context of training machine learning (ML) models.
- What is data lifecycle? - A data lifecycle is the sequence of stages that a unit of data goes through from its initial generation or capture to its archiving or deletion at the end of its useful life.
- What is data loss prevention (DLP)? - Data loss prevention (DLP) -- sometimes referred to as 'data leak prevention,' 'information loss prevention' or 'extrusion prevention' -- is a strategy to mitigate threats to critical data.
- What is data management and why is it important? Full guide - Data management is the process of ingesting, storing, organizing and maintaining the data created and collected by an organization, as explained in this in-depth guide.
- What is data preparation? An in-depth guide - Data preparation is the process of gathering, combining, structuring and organizing data for use in business intelligence, analytics and data science applications, as explained in this guide.
- What is data science? The ultimate guide - Data science is the process of using advanced analytics techniques and scientific principles to analyze data and extract valuable information for business decision-making, strategic planning and other uses.
- What is data validation? - Data validation is the practice of checking the integrity, accuracy and structure of data before it is used for or by one or more business operations (see the code sketch after this list).
- What is denormalization and how does it work? - Denormalization is the process of adding precomputed redundant data to an otherwise normalized relational database to improve read performance.
- What is electronic data processing (EDP)? - Electronic data processing (EDP) refers to the gathering of data using electronic devices, such as computers, servers and internet of things (IoT) technologies.
- What is employee self-service (ESS)? - Employee self-service (ESS) is a widely used human resources technology that enables employees to perform many job-related functions that were once largely paper-based, or otherwise maintained by management, administrative or HR staff.
- What is enterprise content management? Guide to ECM - Enterprise content management (ECM) is a set of defined processes, strategies and tools that allows a business to effectively obtain, organize, store and deliver critical information to its employees, business stakeholders and customers.
- What is GDPR? Compliance and conditions explained - The General Data Protection Regulation (GDPR) is legislation that updated and unified data privacy laws across the European Union (EU).
- What is master data management (MDM)? - Master data management (MDM) is a process that creates a uniform set of data on customers, products, suppliers and other business entities from different IT systems.
- What is PaaS? Platform as a service definition and guide - Platform as a service (PaaS) is a cloud computing model where a third-party provider delivers hardware and software tools to users over the internet.
- What is sentiment analysis? - Sentiment analysis, also referred to as 'opinion mining,' is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text (see the code sketch after this list).
- What is Software as a Service (SaaS)? - Software as a service (SaaS) is a software distribution model in which a third-party provider hosts applications and makes them available to customers over the Internet.
- What is Structured Query Language (SQL)? - Structured Query Language (SQL) is a standardized programming language that is used to manage relational databases and perform various operations on the data in them.
- What is the Coalition for Secure AI (CoSAI)? - The Coalition for Secure AI (CoSAI) is an open source initiative to enhance the security of artificial intelligence systems.
- What is the Driver's Privacy Protection Act (DPPA)? - The Driver's Privacy Protection Act (DPPA) is a United States federal law designed to protect the personally identifiable information of licensed drivers from improper use or disclosure.
- What is transfer learning? - Transfer learning is a machine learning (ML) technique where an already developed ML model is reused in another task.
- wipe - Wipe, in a computing context, means to erase all data on a hard drive to render it unreadable.
- workload - In computing, a workload is typically any program or application that runs on a computer.
- WORM (write once, read many) - In computer media, write once, read many, or WORM, is a data storage technology that allows data to be written to a storage medium a single time and prevents the data from being erased or modified.
- XML Schema Definition (XSD) - XML Schema Definition or XSD is a recommendation by the World Wide Web Consortium (W3C) to describe and validate the structure and content of an XML document.
- YAML (YAML Ain't Markup Language) - YAML (YAML Ain't Markup Language) is a data serialization language used as the input format for diverse software applications (see the code sketch after this list).
- yobibyte (YiB) - A yobibyte (YiB) is a unit of measure used to describe data capacity as part of the binary system of measuring computing and storage capacity.
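Code sketches for selected terms
The short Python sketches below illustrate a few of the terms defined above. They are simplified, illustrative examples rather than production implementations; any package names, column names, identifiers or thresholds they introduce are assumptions made for the sketch, not part of the definitions.
RFM analysis (recency, frequency, monetary) -- a minimal scoring sketch, assuming a pandas DataFrame of transactions with hypothetical columns customer_id, order_date and amount; the rank-based 1-to-5 scoring is one common convention, not the only one.
```python
# Minimal RFM scoring sketch (assumes pandas; the column names are hypothetical).
import numpy as np
import pandas as pd

def bucket(series: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Convert percentile ranks into scores from 1 (worst) to 5 (best)."""
    pct = series.rank(pct=True, ascending=higher_is_better)
    return np.ceil(pct * 5).astype(int)

def rfm_scores(transactions: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Rank customers by recency, frequency and monetary value of their purchases."""
    summary = transactions.groupby("customer_id").agg(
        recency=("order_date", lambda d: (as_of - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )
    summary["r_score"] = bucket(summary["recency"], higher_is_better=False)  # recent = better
    summary["f_score"] = bucket(summary["frequency"])
    summary["m_score"] = bucket(summary["monetary"])
    return summary

# Tiny synthetic transaction set for illustration.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(["2024-01-05", "2024-03-01", "2023-11-20",
                                  "2024-02-10", "2024-02-25", "2024-03-02"]),
    "amount": [120.0, 80.0, 35.0, 60.0, 45.0, 200.0],
})
print(rfm_scores(df, as_of=pd.Timestamp("2024-03-15")))
```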
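star schema -- a toy star schema built in SQLite from Python; the table and column names (fact_sales, dim_product, dim_date) are illustrative assumptions, and the final query shows the typical join of the fact table to its dimensions.
```python
# Toy star schema in SQLite: one central fact table plus two dimension tables.
# Table and column names are illustrative assumptions, not a standard.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,  -- e.g. 20240315
    calendar_date TEXT,
    month         TEXT
);
CREATE TABLE fact_sales (                -- fact table holding the measures
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER,
    revenue     REAL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware');
INSERT INTO dim_date    VALUES (20240315, '2024-03-15', '2024-03');
INSERT INTO fact_sales  VALUES (20240315, 1, 3, 59.97);
""")

# A typical star-schema query: join the fact table to its dimensions.
print(conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
""").fetchone())  # ('2024-03', 'Hardware', 59.97)
```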
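support vector machine (SVM) -- a minimal classification sketch, assuming the scikit-learn package is installed; the toy 2-D data set is made up for illustration.
```python
# Minimal SVM classification sketch (assumes scikit-learn is installed).
from sklearn.svm import SVC

# Toy 2-D points: class 0 clusters near the origin, class 1 near (1, 1).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
y = [0, 0, 0, 1, 1, 1]

model = SVC(kernel="linear", C=1.0)  # a linear kernel keeps the decision boundary a straight line
model.fit(X, y)

print(model.predict([[0.15, 0.15], [1.05, 1.0]]))  # expected: [0 1]
print(model.support_vectors_)  # the training points that define the separating margin
```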
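backpropagation -- a hand-rolled sketch of backward propagation of errors on a tiny one-hidden-layer network, using only NumPy; the XOR task, layer sizes and learning rate are arbitrary illustrative choices.
```python
# Hand-rolled backpropagation on a one-hidden-layer network (NumPy only).
# The XOR task, layer sizes and learning rate are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # input -> hidden weights and biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output weights and biases
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for step in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back toward the input layer.
    d_out = (out - y) * out * (1 - out)  # gradient at the output layer (squared-error loss)
    d_h = (d_out @ W2.T) * h * (1 - h)   # gradient at the hidden layer
    # Gradient-descent updates.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # typically approaches [[0], [1], [1], [0]]
```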
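vector database -- a brute-force sketch of the core operation a vector database performs: nearest-neighbor search over embeddings by cosine similarity. The document IDs and 4-dimensional vectors are hypothetical; real systems add approximate-nearest-neighbor indexing to scale.
```python
# The core of a vector database: nearest-neighbor search over stored embeddings.
# Brute-force cosine similarity in NumPy; real systems add indexing for scale.
import numpy as np

# Hypothetical 4-dimensional embeddings keyed by document ID.
embeddings = {
    "doc-1": np.array([0.90, 0.10, 0.00, 0.20]),
    "doc-2": np.array([0.10, 0.80, 0.30, 0.00]),
    "doc-3": np.array([0.85, 0.05, 0.10, 0.30]),
}

def top_k(query: np.ndarray, k: int = 2):
    """Return the k stored vectors most similar to the query, by cosine similarity."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = sorted(((cosine(query, vec), doc_id) for doc_id, vec in embeddings.items()),
                    reverse=True)
    return scored[:k]

print(top_k(np.array([1.0, 0.0, 0.0, 0.25])))  # doc-1 and doc-3 rank highest
```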
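anomaly detection -- a simple statistical sketch that flags values far from the mean; the z-score cutoff of 2 is an arbitrary illustrative threshold, and robust or model-based methods are usually preferred in practice.
```python
# Simple statistical anomaly detection: flag values far from the mean.
# The z-score threshold of 2 is an arbitrary illustrative cutoff.
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 42.0, 10.1]
print(find_anomalies(readings))  # [42.0]
```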
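data validation -- a minimal sketch that checks the structure, types and ranges of an incoming record before it is used downstream; the field names and rules are hypothetical.
```python
# Minimal data validation sketch: check structure, types and ranges of a record
# before it is used downstream. The field names and rules are hypothetical.
def validate_record(record: dict) -> list[str]:
    errors = []
    required = {"customer_id": int, "email": str, "age": int}
    for field, expected_type in required.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    if isinstance(record.get("age"), int) and not 0 <= record["age"] <= 130:
        errors.append("age out of range")
    if isinstance(record.get("email"), str) and "@" not in record["email"]:
        errors.append("email is not well formed")
    return errors

print(validate_record({"customer_id": 42, "email": "a@example.com", "age": 31}))  # []
print(validate_record({"customer_id": "42", "email": "nope", "age": 200}))
# ['customer_id should be int', 'age out of range', 'email is not well formed']
```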
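sentiment analysis -- a deliberately simple lexicon-based sketch; the word lists are tiny, made-up samples, and production NLP systems use trained models rather than word counting.
```python
# Lexicon-based sentiment sketch; the word lists are tiny, hypothetical samples.
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"bad", "hate", "terrible", "poor", "awful"}

def sentiment(text: str) -> str:
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this product, the support is excellent!"))  # positive
print(sentiment("Terrible experience, the quality is poor."))       # negative
```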
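YAML -- loading a YAML document in Python, assuming the third-party PyYAML package is installed; the configuration keys are hypothetical.
```python
# Loading a YAML document in Python (assumes the third-party PyYAML package).
import yaml

config_text = """
service: analytics-api          # scalar values
replicas: 3
features:                       # a sequence (list)
  - self-service-bi
  - anomaly-detection
database:                       # a nested mapping (dict)
  host: db.example.com
  port: 5432
"""

config = yaml.safe_load(config_text)  # safe_load avoids constructing arbitrary Python objects
print(config["replicas"])             # 3
print(config["database"]["host"])     # db.example.com
```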