author mo khan <mo.khan@gmail.com> 2020-04-18 09:26:11 -0600
committer mo khan <mo.khan@gmail.com> 2020-04-18 09:26:11 -0600
commit e7c30c84d716b268dce3a89a473da6b850da5606
tree 9f19273ddc47fe7421e0f0602b1a44dfda51a25a /doc/unit-7.md
parent 1730da9bc12453bab4db6e04eb933b3e7c196da3
Fix spelling mistakes
Diffstat (limited to 'doc/unit-7.md')
-rw-r--r-- doc/unit-7.md | 30
1 files changed, 14 insertions, 16 deletions
diff --git a/doc/unit-7.md b/doc/unit-7.md
index e9edc2f..4da9564 100644
--- a/doc/unit-7.md
+++ b/doc/unit-7.md
@@ -4,12 +4,12 @@
* Read "Chapter 9: Basic concepts of data warehousing"
-* data warehouse: a subject-oriented, integrated, time-variant, nonupdateable collection of data used in support of management decision-making processes.
+* data warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes.
* subject oriented: organized around key subjects. e.g. customers, patients, students, products, and time.
* integrated: data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures and related characteristics gathered from multiple sources.
* time-variant: data in the data warehouse contain a time dimension so that they may be used to study trends and changes.
-* nonupdateable: loaded and refreshed from operational systems but cannot be updated by end users.
+* non-updatable: loaded and refreshed from operational systems but cannot be updated by end users.
## Section 2 - Data Warehouse Architectures and OLAP Tools
@@ -59,7 +59,7 @@ Independent data mart architecture
4. Scaling costs are excessive because every new application that creates a separate data mart repeats all the extract and load scripts.
5. If there is an attempt to make the separate data marts consistent, the cost to do so is quite high.
-Value of indepdendent data marts:
+Value of independent data marts:
1. one debate deals with the nature of the phased approach to implementing a data warehousing environment.
2. The other debate deals with the suitable database architecture for analytical processing.
@@ -72,7 +72,7 @@ The dependent data mart and operational data store is often called the "hub and
is the hub and the source data systems and the data marts are at ends of the input and output spokes.
-> Operational data store (ODS): An integrated, subject-oriented, continuously updateable, current-valued, enterprise wide, detailed database designed to serve operational users as they do decision support processing.
+> Operational data store (ODS): An integrated, subject-oriented, continuously updatable, current-valued, enterprise wide, detailed database designed to serve operational users as they do decision support processing.
Logical Data Mart and real-time data warehouse architecture
@@ -192,7 +192,7 @@ The dimension tables are denormalized.
Example:
-
+```text
| Dimension Table |
| Key 1 (PK) | ----
| attribute | |
@@ -212,7 +212,7 @@ Example:
| attribute |
| attribute |
| attribute |
-
+```
A star schema provides answers to a domain of business questions.
@@ -228,10 +228,10 @@ Every key used to join the fact table with a dimension table should be a surroga
Why?
-* business keys change, often slowly, over time and we need to remember old and new business key values for the same business object.
-* using a surrogate key also allows us to keep track of different nonkey attribute values for the same product production key with several surrogate keys, each for the different package sizes.
-* surrogate keys are often simpler and shorter
-* surrogate keys can be of the same length and format for all keys
+* Business keys change, often slowly, over time and we need to remember old and new business key values for the same business object.
+* Using a surrogate key also allows us to keep track of different non-key attribute values for the same product production key with several surrogate keys, each for the different package sizes.
+* Surrogate keys are often simpler and shorter
+* Surrogate keys can be of the same length and format for all keys
Grain of the fact table
@@ -245,7 +245,7 @@ This intersection of primary keys is called the grain of the fact table.
> Grain: the level of detail in a fact table, determined by the intersection of all the components of the primary key, including all foreign keys and any other primary key elements.
-A common grain would be each business transaction, such as an individual line item or an individual scanned item on a product sales receipt, a personall change order, a line item on a material receipt, a claim against an insurance policy, a boarding pass, or an individual ATM transaction.
+A common grain would be each business transaction, such as an individual line item or an individual scanned item on a product sales receipt, a personal change order, a line item on a material receipt, a claim against an insurance policy, a boarding pass, or an individual ATM transaction.
Clicks on a website is possibly the lowest level of granularity.
@@ -296,7 +296,6 @@ The User Interface
Traditional query and reporting tools include spreadsheets, personal computer databases and report writers and generators.
-
Role of metadata
The first requirement for building a user-friendly interface is a set of metadata that describes the data.
@@ -306,14 +305,13 @@ or some similar term.
Metadata should allow users to answer questions like:
-* what subjects are described in the data mart?
-* what dimensions and facts are included in the data mart?
+* What subjects are described in the data mart?
+* What dimensions and facts are included in the data mart?
* How are the data in the data mart derived from the enterprise data warehouse data?
* How are the data in the EDW derived from operational data?
* What reports and predefined queries are available to view the data?
* What drill down and other data analysis techniques are available?
-* who is responsible for the quality of data in the data marts, and to whome are requests for changes made?
-
+* Who is responsible for the quality of data in the data marts, and to whom are requests for changes made?
SQL OLAP Quering