summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authormo khan <mo.khan@gmail.com>2020-01-19 15:11:07 -0700
committermo khan <mo.khan@gmail.com>2020-01-19 15:11:07 -0700
commitef34060f6582706f4c8cec9b7a1e296a67e1c80e (patch)
treeace68aa3f22ec9f9975a0208bbbb9ad8cb84d2e4 /doc
parentedbf044e93da67df36ecd917b5865a6f1e30c3d0 (diff)
Add notes on indexes
Diffstat (limited to 'doc')
-rw-r--r--doc/unit-4.md127
1 files changed, 127 insertions, 0 deletions
diff --git a/doc/unit-4.md b/doc/unit-4.md
index eab8723..436acd6 100644
--- a/doc/unit-4.md
+++ b/doc/unit-4.md
@@ -205,3 +205,130 @@ Read:
* Designing physical database files
* Using and selecting indexes
* Designing a database for optimal query performance
+
+## Designing physical database files
+
+> physical file: a named portion of secondary memory (hdd) allocated for the purpose of storing physical records.
+
+> tablespace: a named logical storage unit in which data fro one or more database tables, views or other database objects may be stored.
+
+> extent: a contiguous section of disk storage space.
+
+> file organization: is a technique for physically arranging the records of a file on secondary storage devices.
+
+1. fast data retrieval
+2. high throughput for processing data input and maintenance transactions
+3. efficient use of storage space
+4. protection from failures or data loss
+5. minimizing need for reorganization
+6. accommodating growth
+7. security from unauthorized use
+
+Sequential file organization
+
+The records in a file are stored in sequence according to a primary key value.
+Not used in databases due to their inflexibility, but may be used for backups.
+
+Indexed File Organization
+
+The records are stored sequentially or nonsequentially and an index is created that allows the
+application software to locate individual records.
+
+> Index: is a table that is used to determine in a file the location of records that satisfy some condition.
+
+Each index entry matches a key value with one or more records.
+An index that allows each entry to point to more than one record is called a secondary key index.
+
+> Secondary key: one field or a combination of fields for which more than one record may have the same combination of values. Also called a nonunique key.
+
+> join index: An index on columns from two or more tables that come from the same domain of values.
+
+Hashed File Organization
+
+A storage system in which the address of each record is determined using a hashing algorithm.
+
+> Hashing algorithm: is a routine that converts a primary key value into a record address.
+
+Typical hashing algorithm uses the technique of dividing each primary key value by a suitable prime number and then using the remainder of the division as the relative storage location.
+
+> hash index table: a file organization that uses hashing to map a key into a location in an index, where there is a pointer to the actual data record matching the hash key.
+
+> Pointer: a field of data indicating a target address that can be used to locate related field or record of data.
+
+* an index table is smaller than a data table.
+* added expense of storing and maintaining the index space.
+
+File organization overview
+
+* Storage space
+ * sequential: no wasted space
+ * indexed: no wasted space for data but extra space for index
+ * hashed: extra space may be needed to allow for addition and deletion of records after the initial set of records is loaded.
+* sequential retrieval on primary key:
+ * sequential: very fast
+ * indexed: moderately fast
+ * hashed: impractical, unless using hash index
+* random retrieval on primary key
+ * sequential: impractical
+ * indexed: moderately fast
+ * hashed: very fast
+* multiple key retrieval
+ * sequential: requires scanning whole file
+ * indexed: very fast with multiple indexes
+ * hashed: not possible unless using hash index
+* deleting records:
+ * sequential: can create wasted space or require reorganizing
+ * indexed: if space can be dynamically allocated, easy. requires maintenance of indexes
+ * hashed: very eash
+* adding new records:
+ * sequential: requires rewriting a file
+ * indexed: if space can be dynamically allocated, easy. requires maintenance of indexes.
+ * hashed: very easy. hash collisions need extra work.
+* updating records:
+ * sequential: usually requires rewriting a file
+ * indexed: easy but requires maintenance of indexes
+ * hashed: very easy
+
+Using and Selecting indexes
+
+Creating a unique key index
+
+```sql
+CREATE UNIQUE INDEX customers_pk ON customers(id);
+```
+
+* name: `customers_pk`
+* table: `customers`
+* column: `id`
+
+Creating a composite unique key.
+
+```sql
+CREATE UNIQUE INDEX line_items_pk ON line_items(order_id, product_id);
+```
+
+Creating a secondary (non unique) key index
+
+```sql
+CREATE INDEX products_fk ON orders(product_id);
+```
+
+When to use indexes
+
+There is a tradeoff between improved read performance and degraded write performance due to
+index maintenance.
+
+Indexes can be used generously for systems that are read heavy but beware in systems that
+are write heavy.
+
+Guidelines:
+
+1. indexes are most useful on larger tables
+2. primary key should have a unique index.
+3. indexes are useful for columns that appear in `WHERE` clauses.
+4. Use indexes for columns that are referenced in `ORDER BY` and `GROUP BY` clauses.
+5. Indexes are useful where there is a wide variety of values.
+6. Indexes on large values is not optimal. Try to find a smaller coded value to index to represent the larger value.
+7. If the key for the index is used to determine the location where the record will be stored then a surrogate key should be used.
+8. Check DBMS for limits on the # of indexes on a table. (limits include # of indexes, and size of index)
+9. Be careful indexes attributes that have null values. some systems don't index `null` so sequential scan is needed for `null` value.