intermission

author: mo khan <mo.khan@gmail.com> 2020-04-27 09:40:48 -0600
committer: mo khan <mo.khan@gmail.com> 2020-04-27 09:40:48 -0600
commit: e36945574226abe0040d9cf1dfaa9d608f51a665 (patch)
tree: 3a470821d447aca64749e4ae4e7f44d4992bd764
parent: 122b95e1855bc2cddfaefe4397fdaae500a13f7a (diff)
2 files changed, 67 insertions, 0 deletions
diff --git a/README.md b/README.md
index b970b59..139d744 100644
--- a/README.md
+++ b/README.md
@@ -262,3 +262,70 @@ To maintain the order of an index, deletes, updates of data rows means
 having to re-order the index. Insert, update and delete statements will
 increase cost to maintain index. Indexes will negatively impact performance
 for insert, update and deletes.
+
+Who is responsible for indexes?
+
+* do not believe this: `dba has to figure out performance bottleneck and figure out correct indexes.`
+* poor dba doesn't understand the data in the table.
+  * doesn't know what the data means.
+  * doesn't know what has to be fast.
+  * in some cases adding an index afterwards cannot improve a bad query.
+* during development make sure to choose good indexes.
+
+What are indexes are useful for?
+
+Library visual
+
+* table is a library of books
+  * each row is a book
+  * shelves are blocks
+* library catalogue: ordered list of books (index)
+
+* index can be used for a '<' condition.
+
+```sql
+# EXPLAIN SELECT * FROM test WHERE id < 42;
+                                QUERY PLAN
+--------------------------------------------------------------------------
+ Index Scan using test_id_idx on test  (cost=0.43..9.15 rows=41 width=14)
+   Index Cond: (id < 42)
+(2 rows)
+
+Time: 0.630 ms
+```
+
+```sql
+# EXPLAIN SELECT * FROM test WHERE id > 4000000000;
+                               QUERY PLAN
+-------------------------------------------------------------------------
+ Index Scan using test_id_idx on test  (cost=0.43..4.45 rows=1 width=14)
+   Index Cond: (id > '4000000000'::bigint)
+(2 rows)
+
+Time: 0.571 ms
+```
+
+* b-tree indexes can be read in both directions. index wasn't used. why?
+* pg can use the index but chooses not to use the index.
+
+![index-scan](./index-scan.png)
+
+Back and forth between index and table. This is fine when a few rows to scan.
+When a large amount of rows is large. access pattern between heap and index is
+rando I/O. Not as good as sequential I/O. Exceeding a large # of rows means
+it's more efficient to just do a sequential scan rather than go back and forth between
+index and datafiles.
+
+```sql
+# EXPLAIN SELECT min(id) FROM test;
+                                              QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Result  (cost=0.47..0.48 rows=1 width=8)
+   InitPlan 1 (returns $0)
+     ->  Limit  (cost=0.43..0.47 rows=1 width=8)
+           ->  Index Only Scan using test_id_idx on test  (cost=0.43..213123.91 rows=6291456 width=8)
+                 Index Cond: (id IS NOT NULL)
+(5 rows)
+
+Time: 0.766 ms
+```
diff --git a/index-scan.png b/index-scan.png
new file mode 100644
index 0000000..b317cc6
--- /dev/null
+++ b/index-scan.png
author	mo khan <mo.khan@gmail.com>	2020-04-27 09:40:48 -0600
committer	mo khan <mo.khan@gmail.com>	2020-04-27 09:40:48 -0600
commit	e36945574226abe0040d9cf1dfaa9d608f51a665 (patch)
tree	3a470821d447aca64749e4ae4e7f44d4992bd764
parent	122b95e1855bc2cddfaefe4397fdaae500a13f7a (diff)