summaryrefslogtreecommitdiff
path: root/doc/01-introduction.md
blob: 61e6cee88bbb393f933b36c6fcd45d3e8e142714 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
Chapter 1: Introduction to Databases

# What is a database?

# DBMS

* protect
* define and control
* backups
* maintain data. enter, change and delete data
* retrieve
* Programming and GUI interface

Example:

customers
* id
* name
* address

products
* id
* description
* product_finish

orders
* id
* created_at

line_items

* id
* order_id
* product_id
* quantity

* Data structure: data are organized in the form of tables, with rows and columns.
* Data manipulation: SQL language used to manipulate data
* Data integrity: mechanism used to specify business rules that maintain data integrity.

# Relation: two dimensional table of data.

## Shorthand syntax

Employee1(id, name) # shorthand notation for describing a relation.

* primary key is underlined.
* foreign key uses a dashed underline.

Employees( *id*, name )

## Relational Keys

* primary key: attribute or combination of attributes that uniquely identifies each row in a relation.
* composite key: primary key that consists of more than one attribute.
* foreign key: attribute in a relation that serves as the primary key of another relation in the same db.
* enterprise key: A primary key whose value is unique across all relations.

Not all tables are relations.

1. each relation has a unique name in the db.
2. each cell (intersection of row and column) is atomic (single valued). multi-valued attribute not allowed.
3. each row is unique. no two rows in a relation can be identical.
4. each attribute (column) within the table has a unique name.
5. sequence of columns is insignificant (left to right). order of columns can change without affecting meaning.
6. sequence of rows is insignificant. (top to bottom)

Suggestions:

It's good to create an instance of your relational schema with sample data because...

1. the sample data allow you to test your assumptions regarding the design.
2. the sample data provide a convenient way to check the accuracy of your design.
3. the sample data help improve communications with users in discussing your design.
4. you can use the sample data to develop prototype applications and to test queries.

## Integrity Constraints

Rules limiting acceptable values and actions. Purpose is to facilitate maintaining the accuracy
and integrity of data in the db.

The major types of integrity constraints are domain constraints, entity integrity and referential integrity.

### Domain Constraints

All the values that appear in a column of a relation must be from the same domain. huh?

> Domain: the set of values that may be assigned to an attribute.

Domain definition consists of the following components:

* domain name
* meaning
* data type
* size (length)
* allowable values or allowable range

#### Entity Integrity

Ensures that every relation has a primary key and that the data values for the primary key are all valid.

* Null: a value that may be assigned to an attribute when no other value applies or when the applicable value is unknown.

* Entity integrity rule: No primary key attribute (or component of primary key attribute) may be null.

#### Referential Integrity

Associations between tables are defined through foreign keys.

* Referential integrity constraint: each foreign key value must match a primary key value in another relation or the foreign key value must be null.

#### Cardinality

* Minimum cardinality: The minimum number of instances of one entity that may be associated with each instance of another entity.
* Maximum cardinality: The maximum number of instances of one entity that may be associated with each instance of another entity.

## Well structured relations

What makes up a well structured relation?

* minimal redundancy
* crud operations can occur without errors or inconsistencies.

Anomaly: an error or inconsistency that may result when a user attempts to update a table that
contains redundant data. The three types of anomalies are insertion, deletion and modification anomalies.

* insertion anomaly
  * insert into employees also requires course data.
  * Should be able to insert employee data without supplying course data.
* deletion anomaly:
* modification anomaly: a change to a single employee requires updating multiple records in the same relation.

1. Regular entity: has independent existence.
2. Weak entity: cannot exist without relationship to owner.
3. Associative Entity: many-to-many relationships. join tables.

> Surrogate primary key: serial # or other system assigned primary key for a relation.

# Normalization

Is a formal process for deciding which attributes should be grouped together in a relation
so that all anomalies are removed.

> Normalization: the process of decomposing relations with anomalies to produce smaller, well-structured relations.

1. minimize redundancy.
2. simplify enforcement of referential integrity constraints.
3. Make CUD operations easy.
4. Future growth.

## Normal form

What is normal form?

A normal form is a state of a relation that requires that certain rules regarding relationships between attributes are satisfied.

> Normal form: A state of a relation that requires that certain rules regarding relationships between attributes are satisfied.

* 1st normal form: multivalued attributes have been removed. each cell has a single value.
* 2nd normal form: partial functional dependencies have been removed. (?)
* 3rd normal form: transitive dependencies have been removed. (?)
* Boyce-Codd normal form: remaining anomalies from functional dependencies have been removed.
* 4th normal form: multivalued dependencies have been removed.
* 5th normal form: remaining anomalies have been removed.

> functional dependency: a constraint between two attributes in which the value of one attribute is determined by the value of another attribute.
> transitive dependency: a functional dependency between the primary key and one or more nonkey attributes that are dependent on the primary key via nother nonkey attribute.

* Synonyms: Two (or more) attributes that have different names but the same meaning.
* Alias: An alternative name used for an attribute.
* Homonym: An attribute that may have more than one meaning.