diff options
| author | mo khan <mo.khan@gmail.com> | 2020-01-18 14:36:45 -0700 |
|---|---|---|
| committer | mo khan <mo.khan@gmail.com> | 2020-01-18 14:36:45 -0700 |
| commit | 874f9b60ad15fde48b4d815f753c2b110d5ead6b (patch) | |
| tree | 80b935c1b70ac4d5816aed0f3881851ac33e5ce1 /doc | |
| parent | cddaca9149f10061130277ce187819b23d7944f4 (diff) | |
Add notes on denormalization
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/unit-4.md | 100 |
1 files changed, 100 insertions, 0 deletions
diff --git a/doc/unit-4.md b/doc/unit-4.md index 7705154..b60e126 100644 --- a/doc/unit-4.md +++ b/doc/unit-4.md @@ -25,6 +25,106 @@ Important terms: * horizontal partitioning: * vertical partitioning: + +### The physical database design process + +* goal: data procesing effeciency + +#### critical decisions + +* storage format: data type +* provide dbms with hints on how to group attributes +* provide dmbs with hints on how to arrange similar data +* select structures for storing and connecting files efficiently. (indexes, db architecture) +* prepare strategies for handling queries. optimize for perf. + +#### Usage analysis + +* add notation to the EER diagram + * represents final set of normalized relations + +* record data volume +* record access frequency + +#### Designing Fields + +> Field: The smallest unit of application data reconized by system software. + +* type of data +* data integrity controls +* handling of missing values for the field + +> data type: a detailed coding scheme recognized by software system for representing organizational data. + +* space to store data +* speed required to access data + +1. represent all possible values +2. improve data integrity +3. support all data manipulations +4. minimize storage space + +[Postgresql Data Types](https://www.postgresql.org/docs/current/datatype.html) + +* Numeric +* monetary +* character +* binary +* date/time +* boolean +* enum +* geometric +* network address types +* bit string types +* text search types +* uuid types +* xml types +* json types +* arrays +* composite types +* range types +* domain types +* object identifier types + +#### Data Integrity + +* default value +* range control +* null value control +* referential integrity control + +#### Handling Missing Data + +* default value +* do not allow null +* substitute estimate of missing data +* track missing data so it can be resolved manually +* perform sensitivity testing + +### Denormalizing + +noramlized relations solve maintenance anomolies and +minimize redundancies. This may not yield efficient data processing +if the data required is across many relations. i.e JOINS are costly +if the data isn't near each other. + +> Denormalization: process of transforming normalized relations into non-normalized physical record specifications. + +Opportunities for denormalization + +1. two entities with a one-to-one relationship +2. many-to-many relationship (associative entity) with nonkey attributes + * instead of joining three files to extract data from two basic entities +3. reference data + +Dangers of denorm + +* increase chance of errors and inconsistencies +* optimization for one type of processing make affect a different type of processing +* more storage space requirements + +Partitioning + ## Section 2: File Organization and Database Architectures Outcomes: |
