Data Storage vs Database
Nowadays, massive amounts of data are generated. It is natural to ask where this data goes and how it is stored. The answer is a storage system.
Data storage is concerned with where and how you store information in a digital system. You store data in a storage system, and normally you do so through two different mechanisms: data files and databases. A database, in other words, is one particular way of organizing how your data is stored.
In the Azure cloud, storage and databases are two different things. Azure databases include Azure SQL Database, Azure Database for MariaDB, Azure Database for MySQL and Azure Cosmos DB. Remember, however, that databases ultimately keep their database files in a storage system. A storage account in Azure is normally used to store raw, unstructured files: any file, including a database file or a virtual machine image file. Cosmos DB internally uses Azure page storage to store its database files, and Azure SQL Database also uses Azure page storage. So in Azure a storage account is where the data physically lives, and it additionally provides tables and queues.
Before going further, let's understand the file system.
File System
On a computer you use the file system to name, store, locate and read your data. It is the digital equivalent of an organized file cabinet: you put information in a text file, put the file in a folder, and then put that folder in a larger folder on your computer.
You can put anything you want in a particular location. You can store any data, including "unstructured data" such as documents, videos, spreadsheets, pictures and music. The file system does not care what you store. It reads and writes data on the physical hard disk of your computer. Any application that you create or use, such as a media player, Visual Studio, a calculator or Notepad, is also stored in your machine's file system, and from the file system it is written to your hard disk.
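As a minimal sketch of naming, storing, locating and reading data through the file system, the Python snippet below uses a temporary directory as a stand-in for a drive. The folder and file names are made up for illustration.

```python
import tempfile
from pathlib import Path

# A temporary directory stands in for a drive; all names below are hypothetical.
root = Path(tempfile.mkdtemp())

# Put a folder inside a larger folder, like drawers in a file cabinet.
folder = root / "documents" / "notes"
folder.mkdir(parents=True)

# Name and store the data.
note = folder / "todo.txt"
note.write_text("buy milk\n", encoding="utf-8")

# Locate the file again by walking the tree, then read it back.
found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
content = note.read_text(encoding="utf-8")
print(found, repr(content))
```

The file system never inspects what is inside todo.txt; it only tracks the name, the location and the bytes.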
How to install a File System?
When you install an OS on a computer, a file system is set up on your machine as well. On Windows the file system is NTFS (New Technology File System); on Linux it is EXT (Extended File System). You see these format names when you format a pen drive or hard disk.
Common file systems are:
EXT
Extended File System (EXT) was developed for Linux in 1992. The maximum file size in its current version, ext4, is 16 TiB (with 4 KiB blocks).
NTFS
New Technology File System (NTFS) was developed by Microsoft for Windows in 1993. NTFS supports transactions, meaning that files and folders can be created, renamed, deleted and so on without affecting others. NTFS is also a journaling file system, meaning that it keeps a log of metadata and file changes so that the volume can be recovered after a crash. One file can be up to 16 EiB (1 EiB = 2^60 bytes). File and directory names can be up to 255 characters long, including any extension.
HFS
Hierarchical File System (HFS) was developed by Apple for Mac OS in 1985; its successor, HFS+, followed in 1998. One file in HFS can be up to 2 GB only.
Linux is most compatible with EXT, Windows with NTFS and FAT, and Mac OS with HFS+ and APFS.
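The file size limits quoted above are easier to compare as powers of two. The short calculation below is only arithmetic, and it assumes the figures are binary units (2 GiB, 16 TiB, 16 EiB).

```python
# Binary unit sizes in bytes.
GIB, TIB, EIB = 2**30, 2**40, 2**60

# Maximum single-file sizes as quoted in the text (assumed to be binary units).
limits = {"HFS": 2 * GIB, "ext4": 16 * TIB, "NTFS": 16 * EIB}

for name, size in limits.items():
    # Each limit is an exact power of two, so bit_length() - 1 is its exponent.
    print(f"{name}: 2^{size.bit_length() - 1} bytes")

# How many maximum-size ext4 files fit into one maximum-size NTFS file.
ntfs_vs_ext4 = limits["NTFS"] // limits["ext4"]
print(ntfs_vs_ext4)
```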
HDFS
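One important example of a file system is the Hadoop Distributed File System (HDFS). It stores big data by splitting large files into blocks and spreading replicated copies of those blocks across the machines of a cluster, so the data can be read and processed in parallel.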
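To make the idea of a distributed file system concrete, here is a toy sketch of how a large file could be cut into blocks and each block copied to several machines. The block size (128 MiB), the replication factor (3), the node names and the round-robin placement are all assumptions for illustration; real HDFS placement is more sophisticated than this.

```python
import math

BLOCK_SIZE = 128 * 1024**2   # assumed block size: 128 MiB
REPLICATION = 3              # assumed number of copies per block
NODES = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical cluster

def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [nodes holding a copy]} using simple round-robin."""
    n_blocks = math.ceil(file_size / block_size)
    placement = {}
    for block in range(n_blocks):
        placement[block] = [
            nodes[(block + copy) % len(nodes)] for copy in range(replication)
        ]
    return placement

# A 300 MiB file needs three blocks: 128 + 128 + 44 MiB.
placement = place_blocks(300 * 1024**2, NODES)
for block, holders in placement.items():
    print(block, holders)
```

Because every block lives on several machines, losing one machine does not lose the file, and different blocks can be read at the same time.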
What are the types of Datafiles?
What is a Database?
A database is an organized collection of data. When you say database you mean both the structure and design of the data environment and the data itself. It seeks to store data in a richer way than a regular data file can. Databases usually store a number of different data entities together with unifying information about how those entities are arranged or related. This gives access to a wider array of information in one common environment, instead of storing that information in multiple data files that may or may not be tied together.
Databases are usually built with a database management system (DBMS). A DBMS is a software application used for creating, maintaining and accessing databases.
The most commonly used databases are relational databases. Here you store information in two-dimensional tables and define specific relationships between those tables. E.F. Codd invented the relational model of data at IBM in 1969.
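As a small illustration of tables and relationships, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, columns and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables; orders.customer_id defines the relationship to customers.id.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           item TEXT NOT NULL)"""
)

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "laptop"), (11, 1, "mouse"), (12, 2, "desk")],
)

# The join follows the relationship to answer "who ordered what".
rows = conn.execute(
    """SELECT c.name, o.item
       FROM customers c JOIN orders o ON o.customer_id = c.id
       ORDER BY o.id"""
).fetchall()
print(rows)
```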
Relational Databases are:
NoSQL Databases: 4 common alternative non-relational databases are:
Note: non-tabular or non-relational databases are also called NoSQL (aka “not only SQL”) databases.
Graph Databases
Graph databases are based on graph theory. They work well with highly interconnected data, such as relationships between people or locations, and are used by social media and recommendation applications such as Netflix.
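The kind of question a graph model makes easy can be sketched with a plain Python dictionary. This is not a graph database, only an illustration of traversing relationships; the people and connections are made up.

```python
# Each person maps to the set of people they are connected to (hypothetical data).
follows = {
    "ana": {"bo", "cy"},
    "bo": {"cy", "dee"},
    "cy": {"dee"},
    "dee": set(),
}

def friends_of_friends(graph, person):
    """People two hops away who are not already direct connections."""
    direct = graph[person]
    second_hop = set()
    for friend in direct:
        second_hop |= graph[friend]
    return second_hop - direct - {person}

suggestions = friends_of_friends(follows, "ana")
print(suggestions)
```

In a relational database the same question needs a self-join per hop; a graph store follows the edges directly.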
Graph Database Examples:
Document Stores
Document stores are designed to store and read documents along with key pieces of metadata describing the data. They are useful for storing unstructured data or mixed data types in a way that is a little more useful than a typical file system. In Azure, Blob Storage plays a similar role for unstructured objects and their metadata.
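A minimal in-memory sketch of the idea: each document is stored together with a few metadata fields, and the metadata is what you search on. The helper names put and find_by_metadata and all sample documents are hypothetical and do not belong to any real product's API.

```python
import json

store = {}  # document id -> {"metadata": ..., "body": ...}

def put(doc_id, body, **metadata):
    """Store a document as JSON text next to its descriptive metadata."""
    store[doc_id] = {"metadata": metadata, "body": json.dumps(body)}

def find_by_metadata(key, value):
    """Return the ids of documents whose metadata field matches the value."""
    return sorted(
        doc_id for doc_id, doc in store.items()
        if doc["metadata"].get(key) == value
    )

# Documents do not need to share a structure.
put("invoice-1", {"total": 120, "lines": ["laptop bag"]}, type="invoice", year=2023)
put("memo-7", {"text": "Office closed on Friday"}, type="memo", year=2023)
put("invoice-2", {"total": 80}, type="invoice", year=2024)

invoices = find_by_metadata("type", "invoice")
total = json.loads(store["invoice-2"]["body"])["total"]
print(invoices, total)
```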
Document Store Examples:
Columnar Databases
Columnar databases are storage mechanisms that seek to improve data-access performance by organizing data by column, in contrast to the row-based approach of relational databases. Transactional records are normally stored in a database row by row.
However, if you mostly read or write data for specific columns, columnar databases are a better fit.
Columnar databases are useful for analyzing big data.
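The difference between the two layouts can be sketched with plain Python structures. The records are invented, and a real columnar engine adds compression and on-disk layout on top of this idea.

```python
# Row-oriented layout: one record per entry, convenient for transactions.
rows = [
    {"order_id": 1, "region": "EU", "amount": 40},
    {"order_id": 2, "region": "US", "amount": 25},
    {"order_id": 3, "region": "EU", "amount": 35},
]

# Column-oriented layout: one list per column, holding the same data.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Summing from rows touches every field of every record ...
total_from_rows = sum(row["amount"] for row in rows)

# ... while the columnar layout reads only the one column it needs.
total_from_columns = sum(columns["amount"])

print(columns["amount"], total_from_rows, total_from_columns)
```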
Columnar Databases Examples:
Key-value Stores
Key-value stores keep information as pairs of a key and a value. They use less memory and are very fast. However, they need more sophisticated programming to manipulate and extract data.
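A Python dictionary is enough to sketch both sides of that trade-off: lookup by key is a single step, but any question about what is inside the values has to be programmed by hand. The keys and values below are made up.

```python
import json

# The store only understands keys; each value is an opaque string.
kv = {
    "session:41": json.dumps({"user": "bo", "cart": ["pen"]}),
    "session:42": json.dumps({"user": "ana", "cart": ["book", "pen"]}),
    "session:43": json.dumps({"user": "cy", "cart": []}),
}

# Fast path: fetch by key.
session = json.loads(kv["session:42"])

# Slow path: the store cannot answer "who has a pen in the cart?",
# so the application scans and decodes every value itself.
def users_with_item(store, item):
    return sorted(
        json.loads(value)["user"]
        for value in store.values()
        if item in json.loads(value)["cart"]
    )

pen_buyers = users_with_item(kv, "pen")
print(session["user"], pen_buyers)
```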
Examples of Key-value Stores are:
Why are File Storage Systems not good for Data Analysis?
In the regular file system it is unclear?
Why do you want to analyze your data, and what exactly does analysis mean? Suppose you are building a YouTube-like system that lets users upload video files, and you want to prevent inappropriate videos from being uploaded. How would you do that? With file analysis technology: in Azure, for example, you can use Azure Media Services to analyze your video files. Similarly, if you are designing a Twitter-like system and want to stop inappropriate tweets, you can use Azure Stream Analytics to analyze the stream live and decide which tweets are appropriate.
It turns out that each source usually has its own storage system to hold the data it produces. Source storage systems are normally optimized for functional performance, that is, for transactions (creating, updating and deleting files). However, these data stores are not good for data extraction and analysis.
Storage systems optimized for business transactions are called online transaction processing (OLTP) storage. Storage systems optimized for data analysis are called online analytical processing (OLAP) storage. Analytical storage systems are best for analyzing your data, for example to find out whether inappropriate videos or images have been uploaded to your system. Transactional storage systems normally contain lots of environmental details and metadata that are not useful for analytics. Source storage systems are also already busy serving real-time business transactions, so you should not run analytics on them and slow down your business operations. Finally, source storage systems deal with a high volume of data and may not keep it for long, because the source system itself does not need the history.
Therefore, if you want to do analysis, you have to copy the source data into a separate, analysis-optimized store that keeps it for a longer duration. It could be a central repository, where all data is put in one place; a virtual repository, where the data is physically stored in many locations but appears to the end user as one place; or a semi-centralized repository.
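The copy step described above can be sketched with two in-memory SQLite databases, one playing the transactional source and one playing the analytical store. SQLite is only a stand-in here, not an OLAP engine, and the table names and rows are invented.

```python
import sqlite3

# The transactional source: small, frequent writes.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders (day, amount) VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
)

# The analytical store: a separate database that receives a copy.
analytics = sqlite3.connect(":memory:")
analytics.execute("CREATE TABLE orders_copy (id INTEGER, day TEXT, amount REAL)")

# Extract from the source and load into the analytical store.
extracted = source.execute("SELECT id, day, amount FROM orders").fetchall()
analytics.executemany("INSERT INTO orders_copy VALUES (?, ?, ?)", extracted)

# Heavy aggregate queries now run on the copy, not on the live system.
daily = analytics.execute(
    "SELECT day, SUM(amount) FROM orders_copy GROUP BY day ORDER BY day"
).fetchall()
print(daily)
```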
Databases for Analytics in Azure are:
What is a NoSQL Database?
NoSQL (aka "Non SQL" or "Not Only SQL") refers to any non-relational database. A common misconception is that NoSQL or non-relational databases don't store relationship data. NoSQL databases can store relationship data; they just store it differently than relational databases do. Related data does not have to be split between tables, because NoSQL data models allow related data to be nested within a single data structure. NoSQL databases let developers store huge amounts of unstructured data, which gives them a lot of flexibility. NoSQL databases can scale out instead of scaling up. Scaling out a database is also called sharding or horizontal partitioning. In the Agile world NoSQL works better, because you focus on the model and the domain and care less about how the data is stored. Domain-Driven Design and a code-first strategy encourage the use of NoSQL databases. Cloud providers also scale databases horizontally, and therefore they too favor storing data in a NoSQL format.
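The two ideas in this section, nesting related data in one structure and scaling out by sharding, can be sketched as follows. The document shape, the number of shards and the use of a CRC32 hash to pick a shard are illustrative assumptions, not how any particular NoSQL product works.

```python
import zlib

# Related data is nested inside one document instead of split across tables.
order = {
    "id": "order-10",
    "customer": {"name": "Asha", "city": "Pune"},
    "items": [{"sku": "laptop", "qty": 1}, {"sku": "mouse", "qty": 2}],
}

N_SHARDS = 3  # assumed number of shards
shards = [dict() for _ in range(N_SHARDS)]

def shard_for(key, n_shards=N_SHARDS):
    """Pick a shard deterministically from the key (hash-based partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % n_shards

def save(doc):
    shards[shard_for(doc["id"])][doc["id"]] = doc

def load(doc_id):
    # The same hash tells us which shard to ask; no other shard is touched.
    return shards[shard_for(doc_id)][doc_id]

save(order)
for n in range(11, 20):
    save({"id": f"order-{n}", "customer": {"name": "Ben"}, "items": []})

item_count = sum(item["qty"] for item in load("order-10")["items"])
print(item_count, [len(s) for s in shards])
```

Adding capacity means adding shards (machines) rather than buying a bigger machine, which is the scale-out property described above.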
What is a SQL Database?
Relational databases accessed through SQL (Structured Query Language) are called SQL databases. Data is stored in a tabular format with fixed columns, data types and schema. In the Agile world your system evolves every sprint, and changing your model means changing the data tables. Therefore, software engineers nowadays often prefer NoSQL databases. In the waterfall days applications typically used SQL databases and followed a data-first approach.
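What a fixed schema means in practice can be sketched with Python's sqlite3 module: a row with an unknown column is rejected until the table itself is changed. The table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The model gained a "nickname" field, but the table has not been changed yet.
try:
    conn.execute("INSERT INTO users (id, name, nickname) VALUES (1, 'Asha', 'ash')")
    rejected = False
except sqlite3.OperationalError:
    # SQLite refuses the insert because the column is not part of the schema.
    rejected = True

# The schema has to be migrated before the new field can be stored.
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")
conn.execute("INSERT INTO users (id, name, nickname) VALUES (1, 'Asha', 'ash')")

row = conn.execute("SELECT id, name, nickname FROM users").fetchone()
print(rejected, row)
```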
Summary
Datafiles
Databases
Reference: