How Tables and Indexes are Stored on Disk in Databases
Jairaj Sahgal
Founding Engineer | Senior Backend Developer at Sustainext | Sharing Insights from My Learning Journey
?????????? ?????????????? ???? ????????
?????????-???????????????? ?????????????? (???????? ???? ?????????????????? ????????????):
???-> Rows are stored sequentially in no particular order (Heap storage).
???-> Clustered indexes store rows in the order of the index key (e.g., primary key).
???-> Efficient for range queries and ordered scans when using clustered indexes.
???-> Heap storage is faster for full-table scans but slower for column-specific queries.
???????????????-???????????????? ??????????????:
???-> Data is stored column by column instead of row by row.
???-> Ideal for analytical queries that involve aggregations over large datasets.
???-> Slower for write-heavy workloads due to the need to update multiple column files.
???????????-?????????? ??????????????:
???-> Data is divided into fixed-size blocks called pages (typically 4KB, 8KB, or 16KB).
???-> Pages are grouped into extents (contiguous groups of pages) for efficient I/O.
???-> Pages can be cached in memory (buffer pool) for faster access.
?????????? ?????????????? ???? ????????
?????-???????? ??????????????:
???-> B-trees are balanced tree structures with keys and pointers to child nodes or data rows.
???-> Leaf nodes contain pointers to actual rows or their locations on disk.
???-> Efficient for range queries, equality searches, and ordered scans.
???-> Requires additional storage and maintenance overhead for insertions and deletions.
??????????? ??????????????:
???-> Hash indexes use a hash function to map keys to specific locations in a hash table.
???-> Extremely fast for exact match queries (equality searches).
???-> Not suitable for range queries or ordered scans due to lack of ordering.
??????????????? ??????????????:
???-> Bitmap indexes use bit arrays to represent the presence or absence of values in a column.
领英推荐
???-> Compact and efficient for columns with low cardinality (few distinct values).
???-> Expensive to update and not suitable for high-cardinality columns.
????????????????????? ??????????????:
???-> Secondary indexes are created on non-primary key columns.
???-> Store indexed column values along with pointers to actual rows or the clustered index key.
???-> Improve query performance for non-primary key columns but add overhead for updates.
???????? ???????????????????????? ???? ????????
??????????? ?????????? ?????? ????????????????:
???-> Tables and indexes are stored in data files, which are divided into segments.
???-> Each segment corresponds to a table or index and is further divided into extents and pages.
?????????????????????????:
???-> Some databases use tablespaces to group related data files.
???-> A tablespace is a logical container for segments and maps to one or more physical data files.
????????? ?????????? (??????????-?????????? ?????????????? - ??????):
???-> Log files ensure durability and recovery by recording changes before they are applied to data files.
???-> In case of a crash, the database replays the log to restore consistency.
??????????????? ???????? (???????????? ??????????):
???-> Frequently accessed pages are stored in a buffer pool (memory cache) to reduce disk I/O.
???-> If data is not in the buffer pool, it is fetched from disk and loaded into memory.
?????? ???????????????????????????? ?????? ??????????????
????????????????????????? ??????????-????????:
???-> Row-oriented storage is better for transactional workloads, while column-oriented storage excels in analytical workloads.
???-> Indexes improve query performance but add overhead for write operations.
????????????????? ????????????????????:
???-> Page-based storage ensures efficient use of disk space and memory.
???-> Compression techniques may be used to reduce the size of data and indexes.
??????????????????????? ?????? ????????????????:
???-> Write-ahead logging ensures data integrity and recovery in case of system failures.
???-> Regular backups and checkpoints help maintain data consistency.