Add notes on indexes

author: mo khan <mo.khan@gmail.com> 2020-01-19 15:11:07 -0700
committer: mo khan <mo.khan@gmail.com> 2020-01-19 15:11:07 -0700
commit: ef34060f6582706f4c8cec9b7a1e296a67e1c80e (patch)
tree: ace68aa3f22ec9f9975a0208bbbb9ad8cb84d2e4
parent: edbf044e93da67df36ecd917b5865a6f1e30c3d0 (diff)
1 files changed, 127 insertions, 0 deletions
diff --git a/doc/unit-4.md b/doc/unit-4.md
index eab8723..436acd6 100644
--- a/doc/unit-4.md
+++ b/doc/unit-4.md
@@ -205,3 +205,130 @@ Read:
 * Designing physical database files
 * Using and selecting indexes
 * Designing a database for optimal query performance
+
+## Designing physical database files
+
+> physical file: a named portion of secondary memory (hdd) allocated for the purpose of storing physical records.
+
+> tablespace: a named logical storage unit in which data fro one or more database tables, views or other database objects may be stored.
+
+> extent: a contiguous section of disk storage space.
+
+> file organization: is a technique for physically arranging the records of a file on secondary storage devices.
+
+1. fast data retrieval
+2. high throughput for processing data input and maintenance transactions
+3. efficient use of storage space
+4. protection from failures or data loss
+5. minimizing need for reorganization
+6. accommodating growth
+7. security from unauthorized use
+
+Sequential file organization
+
+The records in a file are stored in sequence according to a primary key value.
+Not used in databases due to their inflexibility, but may be used for backups.
+
+Indexed File Organization
+
+The records are stored sequentially or nonsequentially and an index is created that allows the
+application software to locate individual records.
+
+> Index: is a table that is used to determine in a file the location of records that satisfy some condition.
+
+Each index entry matches a key value with one or more records.
+An index that allows each entry to point to more than one record is called a secondary key index.
+
+> Secondary key: one field or a combination of fields for which more than one record may have the same combination of values. Also called a nonunique key.
+
+> join index: An index on columns from two or more tables that come from the same domain of values.
+
+Hashed File Organization
+
+A storage system in which the address of each record is determined using a hashing algorithm.
+
+> Hashing algorithm: is a routine that converts a primary key value into a record address.
+
+Typical hashing algorithm uses the technique of dividing each primary key value by a suitable prime number and then using the remainder of the division as the relative storage location.
+
+> hash index table: a file organization that uses hashing to map a key into a location in an index, where there is a pointer to the actual data record matching the hash key.
+
+> Pointer: a field of data indicating a target address that can be used to locate related field or record of data.
+
+* an index table is smaller than a data table.
+* added expense of storing and maintaining the index space.
+
+File organization overview
+
+* Storage space
+  * sequential: no wasted space
+  * indexed: no wasted space for data but extra space for index
+  * hashed: extra space may be needed to allow for addition and deletion of records after the initial set of records is loaded.
+* sequential retrieval on primary key:
+  * sequential: very fast
+  * indexed: moderately fast
+  * hashed: impractical, unless using hash index
+* random retrieval on primary key
+  * sequential: impractical
+  * indexed: moderately fast
+  * hashed: very fast
+* multiple key retrieval
+  * sequential: requires scanning whole file
+  * indexed: very fast with multiple indexes
+  * hashed: not possible unless using hash index
+* deleting records:
+  * sequential: can create wasted space or require reorganizing
+  * indexed: if space can be dynamically allocated, easy. requires maintenance of indexes
+  * hashed: very eash
+* adding new records:
+  * sequential: requires rewriting a file
+  * indexed: if space can be dynamically allocated, easy. requires maintenance of indexes.
+  * hashed: very easy. hash collisions need extra work.
+* updating records:
+  * sequential: usually requires rewriting a file
+  * indexed: easy but requires maintenance of indexes
+  * hashed: very easy
+
+Using and Selecting indexes
+
+Creating a unique key index
+
+```sql
+CREATE UNIQUE INDEX customers_pk ON customers(id);
+```
+
+* name: `customers_pk`
+* table: `customers`
+* column: `id`
+
+Creating a composite unique key.
+
+```sql
+CREATE UNIQUE INDEX line_items_pk ON line_items(order_id, product_id);
+```
+
+Creating a secondary (non unique) key index
+
+```sql
+CREATE INDEX products_fk ON orders(product_id);
+```
+
+When to use indexes
+
+There is a tradeoff between improved read performance and degraded write performance due to 
+index maintenance.
+
+Indexes can be used generously for systems that are read heavy but beware in systems that
+are write heavy.
+
+Guidelines:
+
+1. indexes are most useful on larger tables
+2. primary key should have a unique index.
+3. indexes are useful for columns that appear in `WHERE` clauses.
+4. Use indexes for columns that are referenced in `ORDER BY` and `GROUP BY` clauses.
+5. Indexes are useful where there is a wide variety of values.
+6. Indexes on large values is not optimal. Try to find a smaller coded value to index to represent the larger value.
+7. If the key for the index is used to determine the location where the record will be stored then a surrogate key should be used.
+8. Check DBMS for limits on the # of indexes on a table. (limits include # of indexes, and size of index)
+9. Be careful indexes attributes that have null values. some systems don't index `null` so sequential scan is needed for `null` value.
author	mo khan <mo.khan@gmail.com>	2020-01-19 15:11:07 -0700
committer	mo khan <mo.khan@gmail.com>	2020-01-19 15:11:07 -0700
commit	ef34060f6582706f4c8cec9b7a1e296a67e1c80e (patch)
tree	ace68aa3f22ec9f9975a0208bbbb9ad8cb84d2e4
parent	edbf044e93da67df36ecd917b5865a6f1e30c3d0 (diff)