summaryrefslogtreecommitdiff
path: root/doc/unit-4.md
diff options
context:
space:
mode:
authormo khan <mo.khan@gmail.com>2020-01-18 17:00:14 -0700
committermo khan <mo.khan@gmail.com>2020-01-18 17:00:14 -0700
commitedbf044e93da67df36ecd917b5865a6f1e30c3d0 (patch)
tree8387bdcb881b00d99188cc7a71025abfa96fa08b /doc/unit-4.md
parent73e631a7d4808b5c55dd0cb3479b1d16b3f48a87 (diff)
Add notes on partitioning
Diffstat (limited to 'doc/unit-4.md')
-rw-r--r--doc/unit-4.md62
1 files changed, 61 insertions, 1 deletions
diff --git a/doc/unit-4.md b/doc/unit-4.md
index e16b8fb..eab8723 100644
--- a/doc/unit-4.md
+++ b/doc/unit-4.md
@@ -125,9 +125,69 @@ Dangers of denorm
### Partitioning
+Horizontal Partitioning
+
> Horizontal partitioning: Distribution of the rows of a logical relation into several separate tables.
-** Continue from page 218.
+Partition key
+
+* single column
+* date: usually a qualifier in queries
+
+Makes table maintainenance easier because fragementing and rebuliding can be isolated to single partitions
+as storage space needs to be organized.
+
+File level security can be applied to prohibit users from seeing certain rows of data.
+Taking one of the partitioned files out of service because it was damaged or so it can be recovered
+still allows processing against the other partitioned files to continue.
+
+Each partitioned file can be placed on a separate disk drive to reduce contention for the same drive
+and hence improve query and maintenance performance across the database.
+
+
+Horizontal partitioning is similar to creating supertype/subtype relationship because different types
+of the entity are involved in different relationships.
+
+Advantages
+
+1. Efficiency: Data queried together are stored close to one another and separate from data not used together.
+2. Local optimization: Each partition of data can be stored to optimize performance for it's own use.
+3. Security: Data not relevant to one group of users can be segregated from data those users are allowed to use.
+4. Recovery and uptime: Smaller files take less time to back up and recover and other files are still accessible if one file is damaged, so the effects of damage are isolated.
+5. Load balancing: Files can be allocated to different storage areas, which minimizes contention for access to the same storage area or even allows for parallel access to the different areas.
+
+Disadvantages
+
+1. Inconsistent access speed: Different partitions may have different access speeds, thus confusing users. If data needs to be accessed across partitions, this produces slower response times.
+2. Complexity: is usually not transparent to application code, which requires knowledge of which partition to read/write from.
+3. Extra space and update time: Data may be duplicated across partitions, taking extra storage space compared to all data in normalized files. Updates that affect multiple partitions will be slower.
+
+Range partitioning
+
+* Each partition is defined by a range of values for one or more columns of the normalized table.
+* A table row is inserted in the proper partition based on it's initial values for the range fields.
+* Each partition may hold a different number of rows.
+* A partition key that can produce a more balanced distribution of rows is better.
+* A row may be restricted fro moving between partitions when key values are updated.
+
+Hash Partitioning
+
+* data are evenly spread across partitions independent of any partition key value.
+* overcomes the uneven distribution of rows that is possible with range partitioning.
+* works well when distributing data evenly across devices is necessary.
+
+List Partitioning
+
+* partitions are defined based on predefined list of values of the partitioning key.
+* E.g. `province` partition key
+
+Vertical Partitioning
+
+> Vertical Partitioning: Distribution of the columns of a logical relation into several separate physical tables.
+
+Data replication
+
+* The same data is stored in multiple places in the database.
## Section 2: File Organization and Database Architectures