Monday, February 6, 2012

PostgreSQL - MVCC revisited

I had previously read about this topic but I needed to grasp it a little bit more, so I made this mental map:



And also deviced a very simple exercise based on this document and this one :


To me, the most important piece of information is this:

Basic idea: tuple is visible if xmin is valid and xmax is not."Valid" means "either committed or the current transaction".

postgres=# CREATE TABLE test(id int, time timestamp);
CREATE TABLE
postgres=# BEGIN;
BEGIN
postgres=# select txid_current();
 txid_current 
--------------
        13756
(1 row)

postgres=# INSERT INTO test values(generate_series(1,100),current_timestamp);
INSERT 0 100
postgres=# SELECT *,xmin,xmax FROM test LIMIT 5;
 id |           time            | xmin  | xmax 
----+---------------------------+-------+------
  1 | 2012-02-06 21:45:00.11556 | 13756 |    0
  2 | 2012-02-06 21:45:00.11556 | 13756 |    0
  3 | 2012-02-06 21:45:00.11556 | 13756 |    0
  4 | 2012-02-06 21:45:00.11556 | 13756 |    0
  5 | 2012-02-06 21:45:00.11556 | 13756 |    0
(5 rows)

postgres=# COMMIT;
COMMIT


Now let's generate some dead tuples:

postgres=# BEGIN;
BEGIN
postgres=# SELECT txid_current();
 txid_current 
--------------
        13758
(1 row)

postgres=# UPDATE test SET time = current_timestamp WHERE id < 6;
UPDATE 100
postgres=# SELECT *,xmin,xmax FROM test LIMIT 5;
 id |            time            | xmin  | xmax 
----+----------------------------+-------+------
  1 | 2012-02-06 21:48:03.560034 | 13758 |    0
  2 | 2012-02-06 21:48:03.560034 | 13758 |    0
  3 | 2012-02-06 21:48:03.560034 | 13758 |    0
  4 | 2012-02-06 21:48:03.560034 | 13758 |    0
  5 | 2012-02-06 21:48:03.560034 | 13758 |    0
(5 rows)

postgres=# DELETE FROM test WHERE id < 6;
DELETE 5
postgres=# SELECT *,xmin,xmax FROM test LIMIT 5;
 id |            time            | xmin  | xmax 
----+----------------------------+-------+------
  6 | 2012-02-06 21:48:03.560034 | 13758 |    0
  7 | 2012-02-06 21:48:03.560034 | 13758 |    0
  8 | 2012-02-06 21:48:03.560034 | 13758 |    0
  9 | 2012-02-06 21:48:03.560034 | 13758 |    0
 10 | 2012-02-06 21:48:03.560034 | 13758 |    0
(5 rows)

postgres=# ROLLBACK;
ROLLBACK

postgres=# SELECT *,xmin,xmax FROM test LIMIT 5;
 id |           time            | xmin  | xmax  
----+---------------------------+-------+-------
  1 | 2012-02-06 21:45:00.11556 | 13756 | 13758
  2 | 2012-02-06 21:45:00.11556 | 13756 | 13758
  3 | 2012-02-06 21:45:00.11556 | 13756 | 13758
  4 | 2012-02-06 21:45:00.11556 | 13756 | 13758
  5 | 2012-02-06 21:45:00.11556 | 13756 | 13758
(5 rows)
How do we get a glance at how many dead tuples in a table?
postgres=# select * from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]-----+------------------------------
relid             | 16969
schemaname        | public
relname           | test
seq_scan          | 7
seq_tup_read      | 320
idx_scan          | 
idx_tup_fetch     | 
n_tup_ins         | 100
n_tup_upd         | 105
n_tup_del         | 5
n_tup_hot_upd     | 105
n_live_tup        | 100
n_dead_tup        | 5
last_vacuum       | 
last_autovacuum   | 2012-02-06 21:51:33.926522-06
last_analyze      | 
last_autoanalyze  | 2012-02-06 21:46:33.783941-06
vacuum_count      | 0
autovacuum_count  | 1
analyze_count     | 0
autoanalyze_count | 1

n_dead_tup = number of dead tuples And now, lets VACUUM the table.
postgres=# VACUUM VERBOSE test;
INFO:  vacuuming "public.test"
INFO:  "test": found 5 removable, 100 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 100 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
VACUUM


I still feel that I need to dive more into the subject. One thing that would be great is to peep into "deleted" row contents =-).

No comments:

Post a Comment