diff options
| author | mo khan <mo.khan@gmail.com> | 2020-01-19 15:11:07 -0700 |
|---|---|---|
| committer | mo khan <mo.khan@gmail.com> | 2020-01-19 15:11:07 -0700 |
| commit | ef34060f6582706f4c8cec9b7a1e296a67e1c80e (patch) | |
| tree | ace68aa3f22ec9f9975a0208bbbb9ad8cb84d2e4 | |
| parent | edbf044e93da67df36ecd917b5865a6f1e30c3d0 (diff) | |
Add notes on indexes
| -rw-r--r-- | doc/unit-4.md | 127 |
1 files changed, 127 insertions, 0 deletions
diff --git a/doc/unit-4.md b/doc/unit-4.md index eab8723..436acd6 100644 --- a/doc/unit-4.md +++ b/doc/unit-4.md @@ -205,3 +205,130 @@ Read: * Designing physical database files * Using and selecting indexes * Designing a database for optimal query performance + +## Designing physical database files + +> physical file: a named portion of secondary memory (hdd) allocated for the purpose of storing physical records. + +> tablespace: a named logical storage unit in which data fro one or more database tables, views or other database objects may be stored. + +> extent: a contiguous section of disk storage space. + +> file organization: is a technique for physically arranging the records of a file on secondary storage devices. + +1. fast data retrieval +2. high throughput for processing data input and maintenance transactions +3. efficient use of storage space +4. protection from failures or data loss +5. minimizing need for reorganization +6. accommodating growth +7. security from unauthorized use + +Sequential file organization + +The records in a file are stored in sequence according to a primary key value. +Not used in databases due to their inflexibility, but may be used for backups. + +Indexed File Organization + +The records are stored sequentially or nonsequentially and an index is created that allows the +application software to locate individual records. + +> Index: is a table that is used to determine in a file the location of records that satisfy some condition. + +Each index entry matches a key value with one or more records. +An index that allows each entry to point to more than one record is called a secondary key index. + +> Secondary key: one field or a combination of fields for which more than one record may have the same combination of values. Also called a nonunique key. + +> join index: An index on columns from two or more tables that come from the same domain of values. + +Hashed File Organization + +A storage system in which the address of each record is determined using a hashing algorithm. + +> Hashing algorithm: is a routine that converts a primary key value into a record address. + +Typical hashing algorithm uses the technique of dividing each primary key value by a suitable prime number and then using the remainder of the division as the relative storage location. + +> hash index table: a file organization that uses hashing to map a key into a location in an index, where there is a pointer to the actual data record matching the hash key. + +> Pointer: a field of data indicating a target address that can be used to locate related field or record of data. + +* an index table is smaller than a data table. +* added expense of storing and maintaining the index space. + +File organization overview + +* Storage space + * sequential: no wasted space + * indexed: no wasted space for data but extra space for index + * hashed: extra space may be needed to allow for addition and deletion of records after the initial set of records is loaded. +* sequential retrieval on primary key: + * sequential: very fast + * indexed: moderately fast + * hashed: impractical, unless using hash index +* random retrieval on primary key + * sequential: impractical + * indexed: moderately fast + * hashed: very fast +* multiple key retrieval + * sequential: requires scanning whole file + * indexed: very fast with multiple indexes + * hashed: not possible unless using hash index +* deleting records: + * sequential: can create wasted space or require reorganizing + * indexed: if space can be dynamically allocated, easy. requires maintenance of indexes + * hashed: very eash +* adding new records: + * sequential: requires rewriting a file + * indexed: if space can be dynamically allocated, easy. requires maintenance of indexes. + * hashed: very easy. hash collisions need extra work. +* updating records: + * sequential: usually requires rewriting a file + * indexed: easy but requires maintenance of indexes + * hashed: very easy + +Using and Selecting indexes + +Creating a unique key index + +```sql +CREATE UNIQUE INDEX customers_pk ON customers(id); +``` + +* name: `customers_pk` +* table: `customers` +* column: `id` + +Creating a composite unique key. + +```sql +CREATE UNIQUE INDEX line_items_pk ON line_items(order_id, product_id); +``` + +Creating a secondary (non unique) key index + +```sql +CREATE INDEX products_fk ON orders(product_id); +``` + +When to use indexes + +There is a tradeoff between improved read performance and degraded write performance due to +index maintenance. + +Indexes can be used generously for systems that are read heavy but beware in systems that +are write heavy. + +Guidelines: + +1. indexes are most useful on larger tables +2. primary key should have a unique index. +3. indexes are useful for columns that appear in `WHERE` clauses. +4. Use indexes for columns that are referenced in `ORDER BY` and `GROUP BY` clauses. +5. Indexes are useful where there is a wide variety of values. +6. Indexes on large values is not optimal. Try to find a smaller coded value to index to represent the larger value. +7. If the key for the index is used to determine the location where the record will be stored then a surrogate key should be used. +8. Check DBMS for limits on the # of indexes on a table. (limits include # of indexes, and size of index) +9. Be careful indexes attributes that have null values. some systems don't index `null` so sequential scan is needed for `null` value. |
