1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
|
# Unit 8: Data Quality and Database Administration
## Section 1: Data Quality and Integration
Read all section in Chapter 10.
* define the characteristics of data quality;
* identify situations of poor data quality, and measures for improving them;
* describe the following data integration approaches: data consolidation, data federation, and data propagation;
* discuss, in detail, the ETL process for data integration used in the reconciled data layer of data warehousing;
* discuss the data transformation functions.
Data quality:
* identity uniqueness
* accuracy
* consistency
* completeness
* timeliness
* currency
* conformance
* referential integrity
Extract-transform-load (ETL)
1. map source to target and metadata management
2. extract
3. load
4. tranformation
> Data governance: high level organizational groups and processes that oversee data stewardship across the organization.
> It usually guides data quality initiatives, data architecture, data integration and master data management, data warehousing
> and business intelligence, and other data-related matters.
> Data steward: A person assigned the responsibility of ensuring that organizational applications properly support the organization's enterprise goals for data quality.
Data governance program needs:
* sponsorship from both senior management and business units
* a data steward manager to support, train and coordinate the data stewards
* data stewards for different business units, data subjects, source systems or combinations of these elements.
* a governance committee, headed by one person,
but composed of data steward managers,
executives and senior vice presidents,
IT leadership, and other business leaders,
to set strategic goals, coordinate activities,
and provide guidelines and standards for all data management activities.
## Section 2: Data Administration and Database Administration
Read all sections of Chapter 11: "Data and Database Administration."
When you have completed this section, you should be able to
* discuss the need for organizations to have both data administrator(s) and database administrator(s);
* list and discuss the roles of data administrators and database administrators;
* describe the threats addressed by data security;
* discuss the characteristics of most data security plans;
* discuss the Sarbanes-Oxley Act (SOX) for the control and security of databases and applications;
* list and discuss the backup and recovery facilities provided by database management systems;
* discuss the types of database failures and the process for disaster recovery;
* outline different mechanisms used for concurrency control;
* list and discuss the core functions supported by repository engines.
> Database administration: a technical function that is responsible for physical database design with technical issues,
such as security enforcement, database performance and backup and recovery.
* analyze and design database
* select DBMS and related tools
* installing and upgrading the DBMS
* tuning database performance
* improving database query processing performance
* managing data security, privacy and integrity
* performing data backup and recovery
> Open source DBMS: Free DBMS source code software that provides the core functionality of an SQL-compliant DBMS.
> Database security: Protection of database data against accidental or intentional loss, destruction, or misuse.
Threats:
* accidental loss: human error, software/hardware caused breaches
* theft and fraud:
* loss of privacy or confidentiality:
* loss of data integrity:
* loss of availability:
> Authorization rules: controls incorporated in a data management system that restrict access to data and also restrict the actions that people may take when they access data.
Sarbanes-Oxley (SOX)
1. IT change management
2. Logical access to data
3. IT operations
> Database recovery: mechanisms for restoring a database quickly and accurately after loss or damage.
Basic recovery facilities:
1. Backup facilities: which provide periodic backup copies of portions of or the entire database.
2. Journalizing facilities: which maintain an audit trail of transactions and database changes.
3. A checkpoint facility: by which the DBMS periodically suspends all processing and synchronizes its files and journals to establish a recovery point.
4. A recovery manager: which allows the DBMS to restore the database to a correct condition and restart processing transactions.
> Backup facility: A DBMS COPY utility that produces a backup copy of an entire database or a subset of a database.
> Journalizing facility: An audit trail of transactions and database changes.
In the event of a failure, a consistent database state can be reestablished using the info
in the journals with the most recent backup.
1. transaction log: contains a record of the essential data for each transaction that is processed.
2. database change log: contains before and after images of records that have been modified by transactions.
> Transaction: A discrete unit of work that must be completely processed or not processed at all within a computer system.
> Checkpoint facility: A facility by which a DBMS periodically refuses to accept any new transaction. The system is in a quiet state, and the database and transaction logs are synchronized.
> Recovery Manager: A module of a DBMS that restores the database to a correct condition when a failure occurs and then resumes processing user questions.
Recover and restart:
1. Disk Mirroring: Primary + replicas. Promote replica to primary if primary has outage.
2. Restore/Rerun: reprocess day's transactions against the backup copy of the database or portion of the database.
ACID Transactions
1. Atomic: process all of the transaction or nothing.
2. Consistent: any db constraints that were true before the transaction must be true after the transaction.
3. Isolated: changes to db are not revealed to users until transaction is committed.
4. Durable: changes are permanent.
> Transaction boundaries: The logical beginning and end of a transaction.
Backward Recovery (rollback)
The back-out, or undo, of unwanted changes to a database. Before images of the records that have
been changed are applied to the database, and the database is returned to an earlier state.
Rollback is used to reverse the changes made by transaction that have been aborted, or terminated abnormally.
Forward Recovery
A technique that starts with an earlier copy of a database then plays the `after images` of
good transactions.
1. Time consuming logic of reprocessing each transaction does not have to be repeated.
2. Only the most recent after images need to be applied.
Types of failure:
1. Aborted transactions: a transaction in progress that terminates abnormally.
2. Incorrect data: db is updated with incorrect, but valid, data.
Responses to failure:
1. Aborted transaction:
* rollback
* roll forward/return transactions to state just prior to abort
2. Incorrect data
* Rollback
* Reprocess transaction without inaccurate data updates
* Compensating transactions
3. System failure
* switch to duplicate database
* rollback
* restart from checkpoint
4. Database destruction
* Promote replica to primary
* roll forward
* reprocess transactions
|