author    mo khan <mo.khan@gmail.com>    2020-04-18 09:26:11 -0600
committer mo khan <mo.khan@gmail.com>    2020-04-18 09:26:11 -0600
commit    e7c30c84d716b268dce3a89a473da6b850da5606 (patch)
tree      9f19273ddc47fe7421e0f0602b1a44dfda51a25a
parent    1730da9bc12453bab4db6e04eb933b3e7c196da3 (diff)
Fix spelling mistakes
-rw-r--r--  doc/unit-4.md  36
-rw-r--r--  doc/unit-7.md  30
-rw-r--r--  doc/unit-8.md  21
3 files changed, 42 insertions, 45 deletions
diff --git a/doc/unit-4.md b/doc/unit-4.md
index b5bd1ba..5ba9826 100644
--- a/doc/unit-4.md
+++ b/doc/unit-4.md
@@ -12,6 +12,7 @@
* their fields
Read: Chapter 5
+
* the physical database design process
* designing fields
* denormalizing and partitioning data
@@ -48,13 +49,13 @@ Important terms:
#### Designing Fields
-> Field: The smallest unit of application data reconized by system software.
+> Field: The smallest unit of application data recognized by system software.
* type of data
* data integrity controls
* handling of missing values for the field
-> data type: a detailed coding scheme recognized by software system for representing organizational data.
+> data type: a detailed coding scheme recognized by software systems for representing organizational data.
* space to store data
* speed required to access data
@@ -103,7 +104,7 @@ Important terms:
### Denormalizing
-noramlized relations solve maintenance anomolies and
+normalized relations solve maintenance anomalies and
minimize redundancies. This may not yield efficient data processing
if the required data spans many relations, i.e. JOINs are costly
when the data isn't stored near each other.
@@ -120,7 +121,7 @@ Opportunities for denormalization
Dangers of denormalization
* increase chance of errors and inconsistencies
-* optimization for one type of processing make affect a different type of processing
+* optimization for one type of processing may affect a different type of processing
* more storage space requirements
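To make the trade-off above concrete, here is a small hypothetical sketch (table and column names invented, SQLite used only as an illustrative DBMS): the same question is answered once by a join over normalized relations and once by a single denormalized table that repeats the customer name on every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized: customer data lives in one relation, orders in another.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
cur.execute("INSERT INTO customers VALUES (1, 'Ada')")
cur.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

# Reading the customer name for an order requires a join.
row = cur.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchone()

# Denormalized: the name is repeated on every order row, so no join is
# needed, at the cost of redundancy and possible update anomalies.
cur.execute("CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, customer_name TEXT, total REAL)")
cur.execute("INSERT INTO orders_denorm VALUES (10, 'Ada', 25.0)")
row2 = cur.execute("SELECT customer_name, total FROM orders_denorm").fetchone()
```

Both queries return the same answer; the denormalized form trades storage and consistency risk for a cheaper read path.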
### Partitioning
@@ -134,7 +135,7 @@ Partition key
* single column
* date: usually a qualifier in queries
-Makes table maintainenance easier because fragementing and rebuliding can be isolated to single partitions
+Makes table maintenance easier because fragmenting and rebuilding can be isolated to single partitions
as storage space needs to be organized.
File level security can be applied to prohibit users from seeing certain rows of data.
@@ -144,7 +145,6 @@ still allows processing against the other partitioned files to continue.
Each partitioned file can be placed on a separate disk drive to reduce contention for the same drive
and hence improve query and maintenance performance across the database.
-
Horizontal partitioning is similar to creating a supertype/subtype relationship because different types
of the entity are involved in different relationships.
@@ -168,7 +168,7 @@ Range partitioning
* A table row is inserted in the proper partition based on its initial values for the range fields.
* Each partition may hold a different number of rows.
* A partition key that can produce a more balanced distribution of rows is better.
-* A row may be restricted fro moving between partitions when key values are updated.
+* A row may be restricted from moving between partitions when key values are updated.
Hash Partitioning
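In hash partitioning, a hash of the partition key decides which partition a row lands in, which spreads rows evenly but gives up the range locality described above. A minimal sketch (partition count and key values invented):

```python
# Route each row to one of N partitions by hashing its partition key.
NUM_PARTITIONS = 4

def partition_for(key: int) -> int:
    # A simple division/remainder hash: the remainder picks the partition.
    return key % NUM_PARTITIONS

partitions = {i: [] for i in range(NUM_PARTITIONS)}
for order_id in [101, 102, 103, 104, 105]:
    partitions[partition_for(order_id)].append(order_id)

# Keys 101 and 105 share a partition (101 % 4 == 105 % 4 == 1), while a
# range query like "keys 101..104" must touch every partition.
```

A more balanced hash function gives a more even row distribution, mirroring the advice above about choosing partition keys.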
@@ -210,7 +210,7 @@ Read:
> physical file: a named portion of secondary memory (hdd) allocated for the purpose of storing physical records.
-> tablespace: a named logical storage unit in which data fro one or more database tables, views or other database objects may be stored.
+> tablespace: a named logical storage unit in which data from one or more database tables, views or other database objects may be stored.
> extent: a contiguous section of disk storage space.
@@ -231,7 +231,7 @@ Not used in databases due to their inflexibility, but may be used for backups.
Indexed File Organization
-The records are stored sequentially or nonsequentially and an index is created that allows the
+The records are stored sequentially or non-sequentially and an index is created that allows the
application software to locate individual records.
> Index: a table used to determine the location, within a file, of records that satisfy some condition.
@@ -239,7 +239,7 @@ application software to locate individual records.
Each index entry matches a key value with one or more records.
An index that allows each entry to point to more than one record is called a secondary key index.
-> Secondary key: one field or a combination of fields for which more than one record may have the same combination of values. Also called a nonunique key.
+> Secondary key: one field or a combination of fields for which more than one record may have the same combination of values. Also called a non-unique key.
> join index: An index on columns from two or more tables that come from the same domain of values.
@@ -256,7 +256,7 @@ Typical hashing algorithm uses the technique of dividing each primary key value
> Pointer: a field of data indicating a target address that can be used to locate related field or record of data.
* an index table is smaller than a data table.
-* added expense of storing and maintaining the index space.
+* Added expense of storing and maintaining the index space.
File organization overview
@@ -278,12 +278,12 @@ File organization overview
* hashed: not possible unless using hash index
* deleting records:
* sequential: can create wasted space or require reorganizing
- * indexed: if space can be dynamically allocated, easy. requires maintenance of indexes
+ * indexed: if space can be dynamically allocated, easy. Requires maintenance of indexes
* hashed: very easy
* adding new records:
* sequential: requires rewriting a file
- * indexed: if space can be dynamically allocated, easy. requires maintenance of indexes.
- * hashed: very easy. hash collisions need extra work.
+ * indexed: if space can be dynamically allocated, easy. Requires maintenance of indexes.
+ * hashed: very easy. Hash collisions need extra work.
* updating records:
* sequential: usually requires rewriting a file
* indexed: easy but requires maintenance of indexes
@@ -315,7 +315,7 @@ CREATE INDEX products_fk ON orders(product_id);
When to use indexes
-There is a tradeoff between improved read performance and degraded write performance due to
+There is a trade-off between improved read performance and degraded write performance due to
index maintenance.
Indexes can be used generously for systems that are read heavy but beware in systems that
@@ -331,7 +331,7 @@ Guidelines:
6. Indexes on large values are not optimal. Try to find a smaller coded value to index to represent the larger value.
7. If the key for the index is used to determine the location where the record will be stored then a surrogate key should be used.
8. Check DBMS for limits on the # of indexes on a table. (limits include # of indexes, and size of index)
-9. Be careful of indexing attributes that have null values. some systems don't index `null` so sequential scan is needed for `null` value.
+9. Be careful of indexing attributes that have null values. Some systems don't index `null` so sequential scan is needed for `null` value.
### Designing a database for optimal query performance
@@ -341,8 +341,8 @@ Parallel query processing
## Summary
-denormalization transforms normalized relations into non-normalized implementations specs.
-Horizontal paritioning breaks a relation into multiple record specifications by placing different
+De-normalization transforms normalized relations into non-normalized implementation specs.
+Horizontal partitioning breaks a relation into multiple record specifications by placing different
rows into different tables, based on common column values.
Vertical partitioning distributes the columns of a relation into separate files, repeating the primary key in each of the files.
diff --git a/doc/unit-7.md b/doc/unit-7.md
index e9edc2f..4da9564 100644
--- a/doc/unit-7.md
+++ b/doc/unit-7.md
@@ -4,12 +4,12 @@
* Read "Chapter 9: Basic concepts of data warehousing"
-* data warehouse: a subject-oriented, integrated, time-variant, nonupdateable collection of data used in support of management decision-making processes.
+* data warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes.
* subject oriented: organized around key subjects. e.g. customers, patients, students, products, and time.
* integrated: data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures and related characteristics gathered from multiple sources.
* time-variant: data in the data warehouse contain a time dimension so that they may be used to study trends and changes.
-* nonupdateable: loaded and refreshed from operational systems but cannot be updated by end users.
+* non-updatable: loaded and refreshed from operational systems but cannot be updated by end users.
## Section 2 - Data Warehouse Architectures and OLAP Tools
@@ -59,7 +59,7 @@ Independent data mart architecture
4. Scaling costs are excessive because every new application that creates a separate data mart repeats all the extract and load scripts.
5. If there is an attempt to make the separate data marts consistent, the cost to do so is quite high.
-Value of indepdendent data marts:
+Value of independent data marts:
1. One debate deals with the nature of the phased approach to implementing a data warehousing environment.
2. The other debate deals with the suitable database architecture for analytical processing.
@@ -72,7 +72,7 @@ The dependent data mart and operational data store is often called the "hub and
is the hub, and the source data systems and the data marts are at the ends of the input and output spokes.
-> Operational data store (ODS): An integrated, subject-oriented, continuously updateable, current-valued, enterprise wide, detailed database designed to serve operational users as they do decision support processing.
+> Operational data store (ODS): An integrated, subject-oriented, continuously updatable, current-valued, enterprise wide, detailed database designed to serve operational users as they do decision support processing.
Logical Data Mart and real-time data warehouse architecture
@@ -192,7 +192,7 @@ The dimension tables are denormalized.
Example:
-
+```text
| Dimension Table |
| Key 1 (PK) | ----
| attribute | |
@@ -212,7 +212,7 @@ Example:
| attribute |
| attribute |
| attribute |
-
+```
A star schema provides answers to a domain of business questions.
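As a hypothetical sketch of the diagram above (all table and column names invented), a fact table's primary key is composed of foreign keys to denormalized dimension tables, and a typical business question joins the fact table back out to its dimensions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Dimension tables hold denormalized descriptive attributes.
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER)")
# Fact table: the grain is one row per product per date.
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product,
    date_key INTEGER REFERENCES dim_date,
    units_sold INTEGER,
    PRIMARY KEY (product_key, date_key))""")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
cur.execute("INSERT INTO dim_date VALUES (20200418, 2020)")
cur.execute("INSERT INTO fact_sales VALUES (1, 20200418, 12)")

# A typical business question: units sold per product per year.
row = cur.execute("""
    SELECT p.name, d.year, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.name, d.year""").fetchone()
```

The integer `product_key` and `date_key` stand in for the surrogate keys discussed below; business keys would live as ordinary attributes in the dimension tables.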
@@ -228,10 +228,10 @@ Every key used to join the fact table with a dimension table should be a surroga
Why?
-* business keys change, often slowly, over time and we need to remember old and new business key values for the same business object.
-* using a surrogate key also allows us to keep track of different nonkey attribute values for the same product production key with several surrogate keys, each for the different package sizes.
-* surrogate keys are often simpler and shorter
-* surrogate keys can be of the same length and format for all keys
+* Business keys change, often slowly, over time and we need to remember old and new business key values for the same business object.
+* Using a surrogate key also allows us to keep track of different non-key attribute values for the same product production key with several surrogate keys, each for the different package sizes.
+* Surrogate keys are often simpler and shorter
+* Surrogate keys can be of the same length and format for all keys
Grain of the fact table
@@ -245,7 +245,7 @@ This intersection of primary keys is called the grain of the fact table.
> Grain: the level of detail in a fact table, determined by the intersection of all the components of the primary key, including all foreign keys and any other primary key elements.
-A common grain would be each business transaction, such as an individual line item or an individual scanned item on a product sales receipt, a personall change order, a line item on a material receipt, a claim against an insurance policy, a boarding pass, or an individual ATM transaction.
+A common grain would be each business transaction, such as an individual line item or an individual scanned item on a product sales receipt, a personal change order, a line item on a material receipt, a claim against an insurance policy, a boarding pass, or an individual ATM transaction.
Clicks on a website are possibly the lowest level of granularity.
@@ -296,7 +296,6 @@ The User Interface
Traditional query and reporting tools include spreadsheets, personal computer databases and report writers and generators.
-
Role of metadata
The first requirement for building a user-friendly interface is a set of metadata that describes the data.
@@ -306,14 +305,13 @@ or some similar term.
Metadata should allow users to answer questions like:
-* what subjects are described in the data mart?
-* what dimensions and facts are included in the data mart?
+* What subjects are described in the data mart?
+* What dimensions and facts are included in the data mart?
* How are the data in the data mart derived from the enterprise data warehouse data?
* How are the data in the EDW derived from operational data?
* What reports and predefined queries are available to view the data?
* What drill down and other data analysis techniques are available?
-* who is responsible for the quality of data in the data marts, and to whome are requests for changes made?
-
+* Who is responsible for the quality of data in the data marts, and to whom are requests for changes made?
SQL OLAP Querying
diff --git a/doc/unit-8.md b/doc/unit-8.md
index f11d381..a851149 100644
--- a/doc/unit-8.md
+++ b/doc/unit-8.md
@@ -69,9 +69,9 @@ When you have completed this section, you should be able to
> Database administration: a technical function that is responsible for physical database design and for dealing with technical issues,
such as security enforcement, database performance, and backup and recovery.
-* analzye and design database
-* select dbms and related tools
-* installing and upgrading the dbms
+* analyze and design database
+* select DBMS and related tools
+* installing and upgrading the DBMS
* tuning database performance
* improving database query processing performance
* managing data security, privacy and integrity
@@ -79,12 +79,11 @@ such as security enforcement, database performance and backup and recovery.
> Open source DBMS: Free DBMS source code software that provides the core functionality of an SQL-compliant DBMS.
-
> Database security: Protection of database data against accidental or intentional loss, destruction, or misuse.
Threats:
-* accidental loss: human error, software/hardward caused breaches
+* accidental loss: human error, software/hardware caused breaches
* theft and fraud:
* loss of privacy or confidentiality:
* loss of data integrity:
@@ -132,15 +131,15 @@ Recover and restart:
ACID Transactions
1. Atomic: process all of the transaction or nothing.
-2. Consistent: any db constrants that were true before the trx must be true after the trx.
-3. Isolated: changes to db are not revealed to users until trx is committed.
+2. Consistent: any db constraints that were true before the transaction must be true after the transaction.
+3. Isolated: changes to db are not revealed to users until transaction is committed.
4. Durable: changes are permanent.
> Transaction boundaries: The logical beginning and end of a transaction.
Backward Recovery (rollback)
-The backout, or undo, of unwanted changes to a database. Before images of the records that have
+The back-out, or undo, of unwanted changes to a database. Before images of the records that have
been changed are applied to the database, and the database is returned to an earlier state.
Rollback is used to reverse the changes made by transactions that have been aborted, or terminated abnormally.
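A minimal sketch of rollback as backward recovery (account names and amounts invented), using SQLite's transaction support: a failure strikes after the debit but before the matching credit, and the rollback returns the database to its earlier state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

try:
    # Atomicity: the transfer must happen entirely or not at all.
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    # Simulate a crash before the matching credit is applied.
    raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    conn.rollback()  # backward recovery: the partial debit is undone

# The database is back in its earlier, consistent state.
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
```

Until the transaction commits, its change is also invisible to other connections, which is the isolation property from the ACID list above.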
@@ -150,7 +149,7 @@ A technique that starts with an earlier copy of a database then plays the `after
good transactions.
1. The time-consuming logic of reprocessing each transaction does not have to be repeated.
-2. Only th emost recent after images need to be applied.
+2. Only the most recent after images need to be applied.
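A toy sketch of forward recovery (record keys and values invented): start from the earlier backup copy and apply the logged after images of good transactions, where only the most recent after image per record actually matters.

```python
# Backup copy of the database taken before the failure.
backup = {"A": 10, "B": 20}

# Log of after images written by good, committed transactions, in order.
after_images = [("A", 15), ("B", 25), ("A", 30)]

# Forward recovery: replay the after images onto the backup. An after
# image is the final state of a record, so replaying in log order leaves
# each record at its most recent logged value, with no transaction logic
# re-executed.
recovered = dict(backup)
for record_key, after_value in after_images:
    recovered[record_key] = after_value
```

Record "A" ends at its latest after image, not at the intermediate value, which is why only the most recent image per record needs to be applied.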
Types of failure:
@@ -161,11 +160,11 @@ Responses to failure:
1. Aborted transaction:
* rollback
- * rollforward/return transactions to state just prior to abort
+ * roll forward/return transactions to state just prior to abort
2. Incorrect data
* Rollback
* Reprocess transaction without inaccurate data updates
- * Compenstating transactions
+ * Compensating transactions
3. System failure
* switch to duplicate database
* rollback