Monday, January 30, 2012

Python - My first plpython function

I needed a function able to remove the whitespace inside a string, in order to rename tables and columns to a more standard form. There are probably faster / easier ways to do this, but for now this will suffice.

postgres=# CREATE OR REPLACE FUNCTION remove_blanks (t text)
RETURNS text
AS $$
import string
result = ""
for char in t:
  if not(char in string.whitespace):
    result = result + char
return result
$$ LANGUAGE plpython2u;

postgres=# \df
                                List of functions
 Schema |         Name          | Result data type | Argument data types |  Type  
--------+-----------------------+------------------+---------------------+--------
 public | remove_blanks         | text             | t text              | normal
(1 row)

postgres=# select remove_blanks('This string contains spaces');
       remove_blanks        
----------------------------
 Thisstringcontainsspaces
(1 row)
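By the way, the function body is plain Python, so the same logic can be tried outside the database first; a minimal sketch:

```python
import string

def remove_blanks(t):
    # Drop every whitespace character (space, tab, newline, ...) from t
    return "".join(ch for ch in t if ch not in string.whitespace)

print(remove_blanks('This string contains spaces'))  # Thisstringcontainsspaces
```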

Sunday, January 29, 2012

Python - Generating primes

I always write a prime-generating program when I'm learning a new language, as if it were my "hello world". I'm using Python 3.2.

import math

def cribe(size):
    nums = [0]*size
    for i in range(2, int(math.sqrt(size)) + 1):
        if nums[i] == 0:            # i is still marked as prime
            for j in range(2, size):
                if i * j > size - 1:
                    break
                if nums[i*j] == 0:
                    nums[i*j] = 1   # mark the multiple as composite
    for i in range(2, size):
        if nums[i] == 0:
            print(i)

limit = input('Up to what number? ')
cribe(int(limit) + 1)
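For comparison, a tighter sketch of the same sieve (my own variant, which starts the inner loop at i*i, since smaller multiples have already been marked):

```python
def primes_up_to(limit):
    # Sieve of Eratosthenes: mark multiples of each surviving number
    is_composite = [False] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not is_composite[i]:
            primes.append(i)
            for j in range(i * i, limit + 1, i):
                is_composite[j] = True
    return primes

print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```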

My IDE for the moment is Aptana Studio; I haven't found anything better.

Flot - Attractive Javascript plotting for jQuery - MIT License

I just found out about this open source chart plotting library. This is a rather static example, but a very wide array of functionality is available, and many kinds of interaction with the charts are possible. I intend to use it in my personal projects for data visualization.

I tailored and customized this one, but here are some other examples:

Friday, January 27, 2012

PostgreSQL - Forbid access to a database temporarily.

I found out about this feature, thanks to someone who asked how to do this in the postgresql-es-ayuda list.

Disable connections to a PostgreSQL database temporarily

Today I had to disable access to some PostgreSQL databases temporarily. This can easily be done using pgAdmin: under database properties, set "Connection Limit" to 0 from the default of -1. The corresponding SQL statement is something like:

ALTER DATABASE mydb CONNECTION LIMIT 0;

When you are ready to allow connections, change this back to -1.


PostgreSQL - Renaming tables and columns with mixed case names

I just imported a MySQL database into PostgreSQL; the thing is that tables and columns are named using both upper and lower case characters, and we don't like that naming scheme.

1. I have this query that lists almost all objects in a particular schema (I stored it as a view named "tables").

view "tables"
SELECT n.nspname as Schema,
  c.relname as Name,
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as Type,
  pg_catalog.obj_description(c.oid, 'pg_class') as Description
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid);

Now, I generate the needed SQL to rename the tables:

SELECT 'ALTER TABLE ' || $$"$$ || name ||$$"$$||' RENAME TO '|| lower(name)||' ; ' FROM (SELECT name FROM tables WHERE type = 'table') AS t1;
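The same generation step can be sketched in Python as well (the table names here are made up for illustration):

```python
def rename_statements(table_names):
    # One ALTER TABLE per mixed-case name; the old name is double-quoted
    # so PostgreSQL matches its exact casing
    return ['ALTER TABLE "%s" RENAME TO %s ;' % (name, name.lower())
            for name in table_names if name != name.lower()]

for stmt in rename_statements(['Actividad', 'UserLog', 'plain']):
    print(stmt)
```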

So far so good. But then, how do you rename table columns?

Well, this is my base query (attributes) for that:

SELECT a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod),
  (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid) for 128)
   FROM pg_catalog.pg_attrdef d
   WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
  a.attnotnull, a.attnum,
  (SELECT c.collname FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
   WHERE c.oid = a.attcollation AND t.oid = a.atttypid AND a.attcollation <> t.typcollation) AS attcollation
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '33954' AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;

The part that needs to be parametrized is : WHERE a.attrelid = ?

I might also use this one:

SELECT
    a.attname as "Column",
    pg_catalog.format_type(a.atttypid, a.atttypmod) as "Datatype"
FROM
    pg_catalog.pg_attribute a
WHERE
    a.attnum > 0
    AND NOT a.attisdropped
    AND a.attrelid = (
        SELECT c.oid
        FROM pg_catalog.pg_class c
            LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relname ~ '^(actividad)$'
            AND pg_catalog.pg_table_is_visible(c.oid)
    );

How to obtain the table's oid ?

SELECT 'mytable'::regclass::oid;

For instance, obtaining the oid of the table "activity" in my database:

SELECT attrelid FROM pg_attribute WHERE attrelid = 'actividad'::regclass limit 1;
-[ RECORD 1 ]---
attrelid | 33954

Which is the number that my "attributes" query embeds. I'll finish this post when I get home.
Putting together all that info, I came to this query:
CREATE VIEW columns AS SELECT n.nspname as Schema,
  c.relname as Name,
  a.attname as Col 
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     LEFT JOIN pg_catalog.pg_attribute a ON a.attrelid = c.oid
WHERE n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)
  AND c.relkind = 'r'
  AND a.attnum > 0;

All that remains now is switching those table columns to lower case:

select 'ALTER TABLE ' ||schema||'.'||name||' RENAME COLUMN '||$$"$$||col||$$"$$||' TO '||$$"$$||lower(col)||$$"$$||' ;' FROM columns;

I double-quoted the column names, since another issue is names with blank spaces between words, for which I'll probably need a function, gosh!
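Putting both fixes together (lower-casing plus removing blanks) is easy enough; a sketch, with a hypothetical column name:

```python
import string

def normalize(identifier):
    # Lower-case the identifier and strip any embedded whitespace
    return "".join(ch for ch in identifier.lower()
                   if ch not in string.whitespace)

print(normalize('Fecha De Inicio'))  # fechadeinicio
```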

Django and python - First steps

I've always wanted to learn python, and I've always wanted to learn to use a web framework. I'm giving Django a try.

I'm developing my first app (tutorial 1), but they suggest that you get comfortable with the API before going through the second one.

I'd never have thought that developing a database-driven web application could be so easy. I'm starting to like it very much (despite those "expected an indented block" messages =-) ).

Thursday, January 26, 2012

MySQL - mini-migration to PostgreSQL

Today I had to perform a "nano-migration" from mysql (from a mysql dump) to postgresql, but gee I needed a refresher.

These links were very useful.

To install mysql (on opensuse).

To import the sql dump to mysql.

Enterprisedb offers a migration wizard for this kind of boring and burdensome chores:

Except that it didn't automatically convert the MySQL datetime type, I'm really satisfied with the results.

There are also lots of scripts and other tools available.

Other interesting references:

Tuesday, January 24, 2012

Linux - OpenSUSE startup script with systemd

I've been struggling with my opensuse box to execute some scripts as root when the system starts, and I just found the answer to why it wasn't working.

I have found out that in openSUSE 12.1, some startup scripts do not run due to the use of systemd. systemd really speeds up system startup by running the many startup processes in parallel, and avoids the use of shell scripts altogether. That is good for speed, but not good if you need to run something in openSUSE as root when the system is restarted. (1)

1. We need to create the text file after-local.service (shown below) in the folder /lib/systemd/system as root (Alt-F2):


kdesu kwrite /lib/systemd/system/after-local.service

This is what you put into this file and save it:


#  This file is part of systemd.
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.

[Unit]
Description=/etc/init.d/after.local Compatibility
ConditionPathExists=/etc/init.d/after.local

[Service]
Type=oneshot
ExecStart=/etc/init.d/after.local
TimeoutSec=0
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target


2. Next, we need to add the after-local.service to systemd. Open up a terminal session and run the following command:


sudo systemctl enable /lib/systemd/system/after-local.service

3. Finally, you need to edit the file /etc/init.d/after.local as root and add in the stuff you want to run as root when you start up openSUSE.

Monday, January 23, 2012

Postgresql - Studying the MVCC model

I intend to understand PostgreSQL thoroughly. I found a great presentation here:
PostgreSQL Through Pictures.
It has a section dealing with this particular issue.

I also like this resource which depicts a practical example.

I find it interesting to follow the examples along with the explanations; I never learn more than when doing. But for now, I'm a bit tired and it's quite late.

Friday, January 20, 2012

Postgresql - Installing Postgresql 9.1 on Amazon Linux

First of all, there is such a thing as a Linux from Amazon:

Amazon Linux AMI

The Amazon Linux AMI is a supported and maintained Linux image provided by Amazon Web Services for use on Amazon Elastic Compute Cloud (Amazon EC2). It is designed to provide a stable, secure, and high performance execution environment for applications running on Amazon EC2. It also includes packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. Amazon Web Services provides ongoing security and maintenance updates to all instances running the Amazon Linux AMI. The Amazon Linux AMI is provided at no additional charge to Amazon EC2 users.

It comes bundled with PostgreSQL 8.4, which is good enough, but gee, I like bleeding-edge stuff, and PostgreSQL has improved a lot since =-)

I followed this tutorial, made by some altruistic guy who also shares his discoveries (what would the internet be without people like that?)

And everything went smoothly, so smoothly that I'm smiling =-).

I'm reproducing the important parts here as a backup.

Update the yum repositories

I want to install the latest stable PostgreSQL from the PGDG yum repository. We could just download the rpm and manually install from the file, but that inevitably results in some dependency issues. I prefer to configure an alternate yum repository for a particular keyword, so we need to update the configuration for the Amazon repositories (be sure to update both "main" and "updates", and do not forget the asterisk).

sudo vim /etc/yum.repos.d/amzn-main.repo
[At the bottom of the "[amzn-main]" section, after "enabled=1", add "exclude=postgresql*"]
sudo vim /etc/yum.repos.d/amzn-updates.repo
[Add the same exclude to the bottom of the "[amzn-updates]" section]

Download the repository/key installation rpm:

sudo rpm -ivh pgdg-redhat-9.0-2.noarch.rpm

Since this rpm is generated for RHEL 6, we need to make a minor change to the resulting /etc/yum.repos.d/pgdg-90-redhat.repo file: update the URLs, replacing the $releasever variable with '6'. The Amazon Linux $releasever value is a date, instead of the primary version number that this repository configuration expects.

sudo vim /etc/yum.repos.d/pgdg-90-redhat.repo

The updated base url values should look like this (update two of them):

# baseurl=$releasever-$basearch

All we have done is tell yum to use the amzn repositories for everything except packages matching "postgresql*", and after that, install a new repository configuration for the PGDG repository.

webProgramming - Getting info about movies

I have a friend who, among other things, sells movies. I decided to give him a hand fetching the relevant info about them so he can publish his own movie catalog.
With the TMDb API, it couldn't be easier. You need to request an access key first (it's free) for personal use or to develop an open source application.

I'm using this implementation for php
and an example by Dan Bettles

Fetching the info is as simple as this: 

 * modified a little bit by René Romero

$title = basename(__FILE__, '.php') . ' Demo';

$oResults = false;

if (isset($_GET['sendRequest'])) {
    require_once dirname(dirname(__DIR__)) . '/include/boot.php';
    $oResults = simpletmdb\Tmdb::create('here you put your access key')

The template (whose PHP tags the blog engine stripped) prints the $title, then echoes each field of the result: id, name, tagline, imdb_id, certification, language, released, homepage, url, runtime, trailer, the cover poster image, the overview, the genre categories, studios, countries, and the cast filtered by job (Director, Producer, Original Music Composer, Author, Actor). When there are no results, it prints "Movie not found".

TODO: Batch processing, and remove trailing commas.

Saturday, January 14, 2012

PostgreSQL. fsync and speed

I was startled to read about data corruption in PostgreSQL on the pgsql-general list; I had never heard of such a thing. It struck me in Oracle as well: the things that get corrupted once in a while are the database blocks in the database buffer, and there's nothing to do about it, but never mind.
It turns out that the person having this issue was running his server with the fsync option turned off.

What does fsync do?

The fsync() function can be used by an application to indicate that all data for the open file description named by fildes is to be transferred to the storage device associated with the file described by fildes in an implementation-dependent manner. The fsync() function does not return until the system has completed that action or until an error is detected.

So changes to a file are made in a memory buffer, and then you close the file or use fsync to "commit" the changes to disk. If you disable it, you might gain some boost in database speed, but at the risk of losing data should a system crash or error occur.
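The same call is available to any application; a minimal Python sketch of forcing a write to stable storage (using a throwaway temporary file):

```python
import os
import tempfile

# Write some data and force it to disk before closing the descriptor
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'important data\n')
    os.fsync(fd)   # does not return until the OS reports the data is on disk
finally:
    os.close(fd)

with open(path, 'rb') as f:
    print(f.read())  # b'important data\n'
os.remove(path)
```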

As a side note, I stumbled upon this stack overflow question:
where never_had_a_name asks why on earth MongoDB doesn't use fsync on each write, and the most important answer is:

This is going against ACID, more specifically against the D, which stands for durability:
Durability [guarantees] that once the user has been notified of a transaction's success the transaction will not be lost, the transaction's data changes will survive system failure, and that all integrity constraints have been satisfied, so the DBMS won't need to reverse the transaction.

ACID properties mostly apply to traditional RDBMS systems. NoSQL systems, which includes MongoDB, give up on one or more of the ACID properties in order to achieve better scalability. In MongoDB's case durability has been sacrificed for better performance when handling large amounts of updates.

Should the necessity arise, here is one good piece of advice from Craig Ringer in the pgsql-general discussion list:
Since you were running with fsync off, you must have had good backups or replication configured, because the documentation warns you that running with fsync=off will probably destroy your data and is only for expert users.

In case you don't have a current replica or backup: first, before you do ANYTHING ELSE follow the instructions here:

Now - **AFTER** following the instructions above - start the database in single user recovery mode. See:

where the "--single" command line argument for the "postgres" executable is documented.

Once you have a single user mode backend running you can REINDEX then shut the backend down, start back up in multi-user mode, and see if you have any luck.

IMHO I would never sacrifice data integrity for performance, but you never know when a client will come to you with a problem like this.

PostgreSQL - export to CSV or any other character delimited files

Just a small tip.

Exporting output to CSV format.

pgbenchdb=# \copy (select * from pgbench_history limit 30) TO 'test.csv' CSV

pgbenchdb=# \! cat test.csv
98,3,256655,-2623,2012-01-11 17:28:15.200909,
43,2,94378,2717,2012-01-11 17:28:15.200909,
46,1,414961,1716,2012-01-11 17:28:15.201082,
45,8,246477,3341,2012-01-11 17:28:15.200927,
84,7,743181,-1348,2012-01-11 17:28:15.201073,
20,1,713970,4518,2012-01-11 17:28:15.20119,
76,8,778718,-2588,2012-01-11 17:28:15.200988,
9,2,683187,-1675,2012-01-11 17:28:15.201074,
71,3,793922,-748,2012-01-11 17:28:15.261927,
93,6,484349,1816,2012-01-11 17:28:15.261776,
2,4,47256,2383,2012-01-11 17:28:15.261615,
64,2,567480,-4884,2012-01-11 17:28:15.201201,
75,8,716626,-293,2012-01-11 17:28:15.2013,
82,9,74376,-4112,2012-01-11 17:28:15.284261,
74,5,13936,-2220,2012-01-11 17:28:15.284299,
70,3,365642,-4313,2012-01-11 17:28:15.284375,
81,4,82626,2974,2012-01-11 17:28:15.285328,
27,2,379620,3392,2012-01-11 17:28:15.261669,
87,8,561117,-153,2012-01-11 17:28:15.284997,
85,9,953844,-4951,2012-01-11 17:28:15.296038,
38,5,447869,-3591,2012-01-11 17:28:15.307329,
73,4,130721,4496,2012-01-11 17:28:15.307251,
1,7,200428,-4388,2012-01-11 17:28:15.307193,
72,2,907918,-4323,2012-01-11 17:28:15.216612,
30,1,76931,4226,2012-01-11 17:28:15.318388,
27,7,87543,3849,2012-01-11 17:28:15.307277,
5,9,963441,-2585,2012-01-11 17:28:15.307221,
19,4,584590,-2414,2012-01-11 17:28:15.331634,
78,5,543756,476,2012-01-11 17:28:15.333069,
99,10,867181,-2597,2012-01-11 17:28:15.332944,

About \copy:

\copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

Or use any other delimiter instead of the comma; '|' for instance, which I regard as more reliable, since commas are sometimes used inside text fields.

pgbenchdb=# COPY (select * from pgbench_history limit 30) TO '/var/lib/pgsql/test.csv' WITH DELIMITER '|';
-- with this variant of the copy command only absolute paths are allowed

And the \copy variant:

pgbenchdb=# \copy (select * from pgbench_history limit 30) TO 'test.csv' WITH DELIMITER '|';
pgbenchdb=# \! cat test.csv
98|3|256655|-2623|2012-01-11 17:28:15.200909|\N
43|2|94378|2717|2012-01-11 17:28:15.200909|\N
46|1|414961|1716|2012-01-11 17:28:15.201082|\N
45|8|246477|3341|2012-01-11 17:28:15.200927|\N
84|7|743181|-1348|2012-01-11 17:28:15.201073|\N
20|1|713970|4518|2012-01-11 17:28:15.20119|\N
76|8|778718|-2588|2012-01-11 17:28:15.200988|\N
9|2|683187|-1675|2012-01-11 17:28:15.201074|\N
71|3|793922|-748|2012-01-11 17:28:15.261927|\N
93|6|484349|1816|2012-01-11 17:28:15.261776|\N
2|4|47256|2383|2012-01-11 17:28:15.261615|\N
64|2|567480|-4884|2012-01-11 17:28:15.201201|\N
75|8|716626|-293|2012-01-11 17:28:15.2013|\N
82|9|74376|-4112|2012-01-11 17:28:15.284261|\N
74|5|13936|-2220|2012-01-11 17:28:15.284299|\N
70|3|365642|-4313|2012-01-11 17:28:15.284375|\N
81|4|82626|2974|2012-01-11 17:28:15.285328|\N
27|2|379620|3392|2012-01-11 17:28:15.261669|\N
87|8|561117|-153|2012-01-11 17:28:15.284997|\N
85|9|953844|-4951|2012-01-11 17:28:15.296038|\N
38|5|447869|-3591|2012-01-11 17:28:15.307329|\N
73|4|130721|4496|2012-01-11 17:28:15.307251|\N
1|7|200428|-4388|2012-01-11 17:28:15.307193|\N
72|2|907918|-4323|2012-01-11 17:28:15.216612|\N
30|1|76931|4226|2012-01-11 17:28:15.318388|\N
27|7|87543|3849|2012-01-11 17:28:15.307277|\N
5|9|963441|-2585|2012-01-11 17:28:15.307221|\N
19|4|584590|-2414|2012-01-11 17:28:15.331634|\N
78|5|543756|476|2012-01-11 17:28:15.333069|\N
99|10|867181|-2597|2012-01-11 17:28:15.332944|\N
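Reading such a pipe-delimited export back is straightforward; a Python sketch (the sample line is taken from the output above, and COPY's \N null marker is mapped back by hand):

```python
import csv
import io

# One line as produced by COPY ... WITH DELIMITER '|'
data = '98|3|256655|-2623|2012-01-11 17:28:15.200909|\\N\n'

rows = []
for row in csv.reader(io.StringIO(data), delimiter='|'):
    # COPY writes NULL as \N; turn it back into None
    rows.append([None if field == '\\N' else field for field in row])

print(rows[0])  # ['98', '3', '256655', '-2623', '2012-01-11 17:28:15.200909', None]
```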

In order to pass the output to the client, you could fire something like this up:
rene@linux-2561:~> psql -h -U postgres -c "COPY (SELECT * FROM pgbench_history limit 10) TO STDOUT WITH CSV;" pgbenchdb > test.csv
rene@linux-2561:~> cat test.csv
98,3,256655,-2623,2012-01-11 17:28:15.200909,
43,2,94378,2717,2012-01-11 17:28:15.200909,
46,1,414961,1716,2012-01-11 17:28:15.201082,
45,8,246477,3341,2012-01-11 17:28:15.200927,
84,7,743181,-1348,2012-01-11 17:28:15.201073,
20,1,713970,4518,2012-01-11 17:28:15.20119,
76,8,778718,-2588,2012-01-11 17:28:15.200988,
9,2,683187,-1675,2012-01-11 17:28:15.201074,
71,3,793922,-748,2012-01-11 17:28:15.261927,
93,6,484349,1816,2012-01-11 17:28:15.261776,

Now I'm curious about my statement of some text having embedded commas:
rene@linux-2561:~> psql -h -U postgres -c "COPY (SELECT  'this, is, a,nasty, field') TO STDOUT WITH CSV;" pgbenchdb > test.csv
rene@linux-2561:~> cat test.csv 
"this, is, a,nasty, field"

Never mind: it does detect embedded commas and double-quotes the field. Nice!
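Python's csv module applies the same minimal quoting rule, which makes it handy for double-checking such exports:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# A field with embedded commas gets double-quoted automatically
writer.writerow(['this, is, a,nasty, field', 42])
print(buf.getvalue().strip())  # "this, is, a,nasty, field",42
```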

psql's internal "\! COMMAND" is great; I just figured it out.

Friday, January 13, 2012

PostgreSQL - monitoring. User auditing "on steroids"

I'm taking a deep look into the features added in the latest PostgreSQL releases (9.0 and later), and I'm very happy with this one:

Instead of only being able to set configuration variables per database or per user, one can now set them for a certain user in a certain database:

marc=# ALTER ROLE marc IN database marc set log_statement to 'all';

To know who has which variables set-up in which user+database, there is a new psql command:

marc=# \drds
            List of settings
 role | database |     settings      
------+----------+-------------------
 marc | marc     | log_statement=all
(1 row)

For instance, I've enabled log_statement = 'all' for my testing role 'rene' on a database called 'rene', even though the global setting is 'none'.
And this is what the log (set up as a foreign table) looks like when I execute something like a 'create table' statement.

-[ RECORD 1 ]----------+---------------------------------------------------------
log_time               | 2012-01-13 18:47:17.46-06
user_name              | rene
database_name          | rene
process_id             | 8584
connection_from        | [local]
session_id             | 4f10d072.2188
session_line_num       | 2
command_tag            | idle
session_start_time     | 2012-01-13 18:46:42-06
virtual_transaction_id | 2/89
transaction_id         | 0
error_severity         | LOG
sql_state_code         | 00000
message                | statement: create table test(a int primary key);
detail                 | 
hint                   | 
internal_query         | 
internal_query_pos     | 
context                | 
query                  | 
query_pos              | 
location               | 
application_name       | psql

To me, this level of customization of the auditing feature is just beautiful.

This is what the new \drds psql internal command does behind the scenes.

SELECT rolname AS role, datname AS database,pg_catalog.array_to_string(setconfig, E'\n') AS settings
                       FROM pg_db_role_setting AS s
                       LEFT JOIN pg_database ON pg_database.oid = setdatabase
                       LEFT JOIN pg_roles ON pg_roles.oid = setrole
                       WHERE pg_roles.rolname ~ '^(rene)$'
                       ORDER BY role, database;

Obviously I figured that out through the logs =-).

The novelty introduced in order to implement this feature is the pg_db_role_setting catalog table, which internally has this structure:

postgres=# \d pg_catalog.pg_db_role_setting
Table "pg_catalog.pg_db_role_setting"
   Column    |  Type  | Modifiers 
-------------+--------+-----------
 setdatabase | oid    | not null
 setrole     | oid    | not null
 setconfig   | text[] | 
Indexes:
    "pg_db_role_setting_databaseid_rol_index" UNIQUE, btree (setdatabase, setrole), tablespace "pg_global"
Tablespace: "pg_global"


And something as simple as that is what it takes to make someone like me happy =-D.

Wednesday, January 11, 2012

PostgreSQL - PITR. Getting back lost data due to "user error".

There are circumstances in which something bad happens in the database (like dropping a table or deleting important data), but if you've got a base backup and a series of consistent wal segments, there's nothing to worry about.

Case: important data was deleted by a malicious or careless database user. What we have:
  • a base backup created before the data was deleted
  • WAL archiving enabled and working
  • statement logging set with log_statement = 'mod'

Let's say the "malicious" user performs the following operations:

postgres@linux-2561:~> psql -p 5433 pgbenchdb
psql (9.1.1)
Type "help" for help.

pgbenchdb=# select count(*) from pgbench_history;
(1 row)

pgbenchdb=# delete from pgbench_history where extract(second from mtime)::int % 7 = 0;
pgbenchdb=# delete from pgbench_history where extract(second from mtime)::int % 11 = 0;
pgbenchdb=# delete from pgbench_history where extract(second from mtime)::int % 13 = 0;
pgbenchdb=# delete from pgbench_history where extract(second from mtime)::int % 17 = 0;
pgbenchdb=# select count(*) from pgbench_history;
(1 row)

After some time we get a call, telling us that today, some data was deleted from one very important table called pgbench_history.

Problem 1. When was this data deleted?
If we've got the reporting logs set to an appropriate level, this is a rather simple task.

postgres=# select log_time,message from pglog where log_time::date = current_date and message like '%delete%' and message like '%history%' order by log_time asc;
          log_time          |                                        message                                         
 2012-01-11 17:30:26.378-06 | statement: delete from pgbench_history where extract(second from mtime)::int % 7 = 0;
 2012-01-11 17:30:34.854-06 | statement: delete from pgbench_history where extract(second from mtime)::int % 11 = 0;
 2012-01-11 17:30:41.01-06  | statement: delete from pgbench_history where extract(second from mtime)::int % 13 = 0;
 2012-01-11 17:30:45.457-06 | statement: delete from pgbench_history where extract(second from mtime)::int % 17 = 0;
(4 rows)

So, we roll forward (starting from the time that the base backup was taken) up to the time, just before any of those deletions took place, that is, before 2012-01-11 17:30:26.378-06.

To make sure all transactions are present in the archiving directory:
pgbenchdb=# select pg_switch_xlog();
(1 row)

The only thing we have to do is start our base backup with a recovery.conf file that specifies "recovery_target_time" as, for example, '2012-01-11 17:30:26'.

Now, when you're not sure up to what time to replay the WAL files, I think there's no other way than trial and error. In such cases you'd better make a copy of your base backup before replaying the WAL files.

vi recovery.conf
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
recovery_target_time = '2012-01-11 17:30:26'

Once the recovery finishes, querying the pgbench_history table:

postgres@linux-2561:~/pitr2/pg_log> psql -p 5436 pgbenchdb
psql (9.1.1)
Type "help" for help.

pgbenchdb=# select count(*) from pgbench_history;
(1 row)

One thing that would be wonderful is to go back in time again. The thing here is that WAL replay means the changes to the database are reproduced and applied to the data files, so no going back is possible. But nothing stops you from going further ahead ... I think.

Well, you actually can go back, by restoring your copy of the base backup and setting an earlier "recovery_target_time" =-). In these terms, it's obvious that you cannot go further back than the time at which the base backup was created.

Postgresql - monitoring. log_statement (mod)

That option enables logging (like the auditing feature in Oracle) in this way:
mod logs all ddl statements, plus data-modifying statements such as INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE, EXECUTE, and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type.

This is what there is to know about this setting from the pg_settings view:

name       | log_statement
setting    | mod
unit       | 
category   | Reporting and Logging / What to Log
short_desc | Sets the type of statements logged.
extra_desc | 
context    | superuser
vartype    | enum
source     | configuration file
min_val    | 
max_val    | 
enumvals   | {none,ddl,mod,all}
boot_val   | none
reset_val  | mod
sourcefile | /var/lib/pgsql/pitr1/postgresql.conf
sourceline | 400

side note about the superuser context:
These settings can be set from postgresql.conf, or within a session via the SET command; but only superusers can change them via SET. Changes in postgresql.conf will affect existing sessions only if no session-local value has been established with SET.

I'm experimenting with PITR, and I'll need to know up to what point to recover. Ideally just before the deletion of some important data.

Then I run some data-modifying operations to test that they're effectively being logged:

postgres@linux-2561:~/pitr1/pg_log> pgbench -c 4 -j 2 -s 10 -t 10 -p 5433 pgbenchdb
Scale option ignored, using pgbench_branches table count = 10
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 4
number of threads: 2
number of transactions per client: 10
number of transactions actually processed: 40/40
tps = 30.567626 (including connections establishing)
tps = 30.718934 (excluding connections establishing)

And see that they were effectively logged:
postgres@linux-2561:~/pitr1/pg_log> tail -f postgresql-2012-01-11_125210.csv
2012-01-11 13:08:14.660 CST,"postgres","pgbenchdb",10577,"[local]",4f0dde1d.2951,38,"idle in transaction",2012-01-11 13:08:13 CST,2/59,2516,LOG,00000,"statement: UPDATE pgbench_tellers SET tbalance = tbalance + -3973 WHERE tid = 15;",,,,,,,,,""
2012-01-11 13:08:14.660 CST,"postgres","pgbenchdb",10577,"[local]",4f0dde1d.2951,39,"idle in transaction",2012-01-11 13:08:13 CST,2/59,2516,LOG,00000,"statement: UPDATE pgbench_branches SET bbalance = bbalance + -3973 WHERE bid = 3;",,,,,,,,,""
2012-01-11 13:08:14.661 CST,"postgres","pgbenchdb",10577,"[local]",4f0dde1d.2951,40,"idle in transaction",2012-01-11 13:08:13 CST,2/59,2516,LOG,00000,"statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (15, 3, 286453, -3973, CURRENT_TIMESTAMP);",,,,,,,,,""
2012-01-11 13:08:14.683 CST,"postgres","pgbenchdb",10579,"[local]",4f0dde1d.2953,34,"idle in transaction",2012-01-11 13:08:13 CST,4/10,2517,LOG,00000,"statement: UPDATE pgbench_tellers SET tbalance = tbalance + 1104 WHERE tid = 94;",,,,,,,,,""
2012-01-11 13:08:14.683 CST,"postgres","pgbenchdb",10579,"[local]",4f0dde1d.2953,35,"idle in transaction",2012-01-11 13:08:13 CST,4/10,2517,LOG,00000,"statement: UPDATE pgbench_branches SET bbalance = bbalance + 1104 WHERE bid = 10;",,,,,,,,,""
2012-01-11 13:08:14.683 CST,"postgres","pgbenchdb",10579,"[local]",4f0dde1d.2953,36,"idle in transaction",2012-01-11 13:08:13 CST,4/10,2517,LOG,00000,"statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (94, 10, 413230, 1104, CURRENT_TIMESTAMP);",,,,,,,,,""
2012-01-11 13:08:14.708 CST,"postgres","pgbenchdb",10579,"[local]",4f0dde1d.2953,37,"idle in transaction",2012-01-11 13:08:13 CST,4/11,0,LOG,00000,"statement: UPDATE pgbench_accounts SET abalance = abalance + 2487 WHERE aid = 112736;",,,,,,,,,""
2012-01-11 13:08:14.724 CST,"postgres","pgbenchdb",10579,"[local]",4f0dde1d.2953,38,"idle in transaction",2012-01-11 13:08:13 CST,4/11,2518,LOG,00000,"statement: UPDATE pgbench_tellers SET tbalance = tbalance + 2487 WHERE tid = 85;",,,,,,,,,""
2012-01-11 13:08:14.725 CST,"postgres","pgbenchdb",10579,"[local]",4f0dde1d.2953,39,"idle in transaction",2012-01-11 13:08:13 CST,4/11,2518,LOG,00000,"statement: UPDATE pgbench_branches SET bbalance = bbalance + 2487 WHERE bid = 2;",,,,,,,,,""
2012-01-11 13:08:14.725 CST,"postgres","pgbenchdb",10579,"[local]",4f0dde1d.2953,40,"idle in transaction",2012-01-11 13:08:13 CST,4/11,2518,LOG,00000,"statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (85, 2, 112736, 2487, CURRENT_TIMESTAMP);",,,,,,,,,""

In a more structured fashion (setting up the log as a foreign table):

postgres=# select * from pglog where message like '%UPDATE%';

log_time               | 2012-01-11 13:38:34.642-06
user_name              | postgres
database_name          | pgbenchdb
process_id             | 12719
connection_from        | [local]
session_id             | 4f0de53a.31af
session_line_num       | 1
command_tag            | idle in transaction
session_start_time     | 2012-01-11 13:38:34-06
virtual_transaction_id | 5/1216
transaction_id         | 0
error_severity         | LOG
sql_state_code         | 00000
message                | statement: UPDATE pgbench_accounts SET abalance = abalance + 2131 WHERE aid = 423974;
detail                 | 
hint                   | 
internal_query         | 
internal_query_pos     | 
context                | 
query                  | 
query_pos              | 
location               | 
application_name       |

Now there's no room for doubt about who did what.


Now, in a busy system such a granular level of logging will generate huge logs, so configuring log rotation is a must.
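For reference, this is the kind of rotation setup I mean (an illustrative postgresql.conf fragment; tune the values to your own log volume):

```
# postgresql.conf (fragment) -- illustrative rotation settings
log_rotation_age = 1d            # start a new log file daily
log_rotation_size = 100MB        # ...or when the current file reaches this size
log_truncate_on_rotation = on    # reuse file names instead of piling up files
```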

Tuesday, January 10, 2012

Postgresql - the pg_xlog directory

I'll talk about this topic on the light of my previous entry:
The sub-directory pg_xlog inside the database cluster ($PGDATA) keeps transaction logs (or Write Ahead Log - WAL).

I noticed that when archiving is enabled, a couple of WAL segments are kept here, and it's only after a pg_switch_xlog() that they're transferred to the target archiving directory.

In the docs they mention that it's wise to keep this directory on a separate device for performance reasons, by making a symbolic link to a directory outside the data cluster.

One good thing to have in mind: if the archive_command is misconfigured or fails consistently, pg_xlog will grow indefinitely, since segments can't be recycled until they've been archived.
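You can ask the server which WAL segment it is currently writing to with a couple of built-in functions (available in 9.1); a quick sketch:

```sql
-- Current WAL write location and the segment file it maps to:
SELECT pg_current_xlog_location(),
       pg_xlogfile_name(pg_current_xlog_location());

-- Force a switch to a new segment (this triggers archiving of the old one):
SELECT pg_switch_xlog();
```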

Postgresql - Recovery through archiving (PITR). Simple example

Ok, I have a brand new PostgreSQL 9.1 installation and I'll set up a simple recovery scenario.
My (brand new) database cluster path:

Step1. Before starting the server, enable archiving.
a. Define where the WAL logs will be archived.
For these purposes I'll use the directory /var/lib/pgsql/wal_archive

b. Configure the connection and archiving options (I changed these from the default values):

vi postgresql.conf
listen_addresses = '*'          # what IP address(es) to listen on;
port = 5433                     # I have the 5432 busy with another experiment
wal_level = archive             # minimal, archive, or hot_standby
                                # (change requires restart)
archive_mode = on               # allows archiving to be done
                                # (change requires restart)
archive_command = 'cp %p /var/lib/pgsql/wal_archive/%f'
max_wal_senders = 5             # max number of walsender processes
                                # (change requires restart)

The part worth explaining is this:
archive_command = 'cp %p /var/lib/pgsql/wal_archive/%f'
with this command we instruct PostgreSQL to archive each WAL segment it generates (the equivalent of a redo log in Oracle) into the directory /var/lib/pgsql/wal_archive. %p is replaced with the segment's path and %f with the segment's file name; PostgreSQL handles both substitutions behind the scenes.
The parameter max_wal_senders (whose default value is zero) must be set to a positive integer for this to work.
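A quick sanity check that archiving is configured as intended (run from any session):

```sql
SELECT name, setting
  FROM pg_settings
 WHERE name IN ('wal_level', 'archive_mode', 'archive_command', 'max_wal_senders');
```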

c. Configure the authentication settings. (I'll make the base backup with the pg_basebackup tool, so authentication for replication is required.)
vi pg_hba.conf
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all               trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     postgres                                trust
host    replication     postgres            trust
#host    replication     postgres        ::1/128                 trust

Step2. Start up the server, create a test database, make some ddl/dml operations and confirm that wal segments are being sent to the location specified.

Starting the server up:
postgres@linux-2561:~/pitr1> pg_ctl start -D .
postgres@linux-2561:~/pitr1> cd pg_log/
postgres@linux-2561:~/pitr1/pg_log> ls
postgres@linux-2561:~/pitr1/pg_log> tail -f postgresql-2012-01-10_114123.log 
2012-01-10 11:41:23 CST   LOG:  database system was shut down at 2012-01-10 11:10:12 CST
2012-01-10 11:41:23 CST   LOG:  database system is ready to accept connections
2012-01-10 11:41:23 CST   LOG:  autovacuum launcher started
#so it started without error

Creating the database and filling it with dummy data:
postgres@linux-2561:~/pitr1/pg_log> createdb -p 5433 pgbenchdb
postgres@linux-2561:~/pitr1/pg_log> pgbench -i -s 10 -p 5433 pgbenchdb
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
creating tables...
1000000 tuples done.
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_branches_pkey" for table "pgbench_branches"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_accounts_pkey" for table "pgbench_accounts"

Verifying that WAL segments are being archived:
postgres@linux-2561:~/pitr1/pg_log> ls ../../wal_archive/
000000010000000000000001  000000010000000000000004  000000010000000000000007  00000001000000000000000A
000000010000000000000002  000000010000000000000005  000000010000000000000008
000000010000000000000003  000000010000000000000006  000000010000000000000009

Make some more transactions that, among other things, affect the pgbench_history table, which keeps track of the time each insertion was made.
postgres@linux-2561:~/pitr1/pg_log> pgbench -s 10 -c 6 -j 2 -t 100 -p 5433 pgbenchdb 
Scale option ignored, using pgbench_branches table count = 10
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 6
number of threads: 2
number of transactions per client: 100
number of transactions actually processed: 600/600
tps = 187.111393 (including connections establishing)
tps = 187.944077 (excluding connections establishing)

Step3. Make the base backup .
In this case I'll perform the recovery on the same machine.

postgres@linux-2561:~/pitr1> cd 
postgres@linux-2561:~> ls
data_sr  pgbench_transaction.sql  pgpool_scripts_backup  pitr1  repmgr  unclassified  wal_archive
postgres@linux-2561:~> mkdir pitr2
postgres@linux-2561:~> pg_basebackup -p 5433 -D /var/lib/pgsql/pitr2/
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived

A file with backup extension was generated with the following contents:
START WAL LOCATION: 0/E000020 (file 00000001000000000000000E) <- the name of the starting wal file
STOP WAL LOCATION: 0/E000094 (file 00000001000000000000000E)
START TIME: 2012-01-10 13:10:00 CST <- the time at which pg_start_backup was run
LABEL: pg_basebackup base backup <- label given in pg_start_backup (executed automatically by pg_basebackup)
STOP TIME: 2012-01-10 13:10:02 CST
a. Change the backup server's port so it does not conflict with the original one.
postgres@linux-2561:~/pitr2> ls
backup_label  global   pg_hba.conf    pg_log        pg_notify  pg_stat_tmp  pg_tblspc    PG_VERSION  postgresql.conf
base          pg_clog  pg_ident.conf  pg_multixact  pg_serial  pg_subtrans  pg_twophase  pg_xlog     postmaster.opts
postgres@linux-2561:~/pitr2> vi postgresql.conf 
listen_addresses = '*'          # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost', '*' = all
                                        # (change requires restart)
port = 5434 
Step4. Create the recovery.conf file on the backup cluster. The backup server will read this file on startup and will "know" that it has to perform recovery (read WAL segments from the specified location and apply them to the data files), starting from the first WAL segment generated during the backup.
vi /var/lib/pgsql/pitr2/recovery.conf
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p' #this is the only mandatory setting.
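A slightly fuller recovery.conf sketch, in case you want point-in-time recovery instead of replaying everything (the target settings are optional and the timestamp is illustrative):

```
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'  # the only mandatory setting
#recovery_target_time = '2012-01-10 13:50:00 CST'        # stop recovery at this moment
#recovery_target_inclusive = true                        # include transactions committed exactly at the target
```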
Before starting up the server, I'd like to generate more WAL segments on the master.
postgres@linux-2561:~/pitr2> pgbench -s 10 -c 6 -j 2 -t 100 -p 5433 pgbenchdb 
Scale option ignored, using pgbench_branches table count = 10
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 6
number of threads: 2
number of transactions per client: 100
number of transactions actually processed: 600/600
tps = 191.475816 (including connections establishing)
tps = 192.120436 (excluding connections establishing)
Take note of the last transaction timestamp.
pgbenchdb=# SELECT max(mtime) from pgbench_history;
 2012-01-10 13:54:47.366648
(1 row)
Now, there's the possibility that the last transactions on the primary aren't yet stored in an archived WAL segment, so we force a log switch. In general, you can force a segment switch manually with pg_switch_xlog() whenever you want to ensure that a just-finished transaction is archived as soon as possible.
postgres@linux-2561:~/pitr2> psql -p 5433 -c "select pg_switch_xlog();"
(1 row)
One thing I need to review is this info in the manual.
you should at least save the contents of the cluster’s pg_xlog subdirectory, as it might contain logs which were not archived before the system went down.
Step5. With everything set up, let's start the backup server. Expected result: an exact copy of the master server, with the last transaction timestamp being 2012-01-10 13:54:47.366648.
postgres@linux-2561:~/pitr2> pg_ctl start -D .
server starting
postgres@linux-2561:~/pitr2> 2012-01-10 21:21:04 GMT   FATAL:  data directory "/var/lib/pgsql/pitr2" has group or world access
2012-01-10 21:21:04 GMT   DETAIL:  Permissions should be u=rwx (0700).
Yikes!!!!! Ok then.
postgres@linux-2561:~> chmod 700 -R pitr2
postgres@linux-2561:~> pg_ctl start -D pitr2/
server starting
postgres@linux-2561:~> tail -f pitr2/pg_log/postgresql-2012-01-10_1
postgresql-2012-01-10_114123.log  postgresql-2012-01-10_130842.log  postgresql-2012-01-10_152456.log
postgres@linux-2561:~> tail -f pitr2/pg_log/postgresql-2012-01-10_152456.log 
2012-01-10 15:24:56 CST   LOG:  could not open file "pg_xlog/000000010000000000000010" (log file 0, segment 16): No such file or directory
2012-01-10 15:24:56 CST   LOG:  redo done at 0/F87914C
2012-01-10 15:24:56 CST   LOG:  last completed transaction was at log time 2012-01-10 13:55:13.965712-06
2012-01-10 15:24:56 CST   LOG:  restored log file "00000001000000000000000F" from archive
cp: cannot stat `/var/lib/pgsql/wal_archive/00000002.history': No such file or directory
2012-01-10 15:24:56 CST   LOG:  selected new timeline ID: 2
cp: cannot stat `/var/lib/pgsql/wal_archive/00000001.history': No such file or directory
2012-01-10 15:24:56 CST   LOG:  archive recovery complete
2012-01-10 15:24:59 CST   LOG:  autovacuum launcher started
2012-01-10 15:24:59 CST   LOG:  database system is ready to accept connections
Regarding the .history files not found, the docs say: You may also see an error message at the start of recovery for a file named something like 00000001.history. This is also normal and does not indicate a problem in simple recovery situations.
Now, let's make some generic consistency tests. Data in the primary server:
postgres@linux-2561:~> psql -p 5433 pgbenchdb
psql (9.1.1)
Type "help" for help.

pgbenchdb=# \dt+
                          List of relations
 Schema |       Name       | Type  |  Owner   |  Size  | Description 
 public | pgbench_accounts | table | postgres | 124 MB | 
 public | pgbench_branches | table | postgres | 40 kB  | 
 public | pgbench_history  | table | postgres | 88 kB  | 
 public | pgbench_tellers  | table | postgres | 40 kB  | 
(4 rows)

pgbenchdb=# select max(mtime) from pgbench_history;
 2012-01-10 13:54:47.366648
(1 row)
Data in the backup server:
postgres@linux-2561:~> psql -p 5434 pgbenchdb
psql (9.1.1)
Type "help" for help.

pgbenchdb=# \dt+
                          List of relations
 Schema |       Name       | Type  |  Owner   |  Size  | Description 
 public | pgbench_accounts | table | postgres | 124 MB | 
 public | pgbench_branches | table | postgres | 40 kB  | 
 public | pgbench_history  | table | postgres | 88 kB  | 
 public | pgbench_tellers  | table | postgres | 40 kB  | 
(4 rows)

pgbenchdb=# select max(mtime) from pgbench_history;
 2012-01-10 13:54:47.366648
(1 row)

Now my wal_archive directory looks like this:
postgres@linux-2561:~/wal_archive> ls
000000010000000000000001  000000010000000000000006  00000001000000000000000B  00000001000000000000000F
000000010000000000000002  000000010000000000000007  00000001000000000000000C  00000002000000000000000F
000000010000000000000003  000000010000000000000008  00000001000000000000000D  00000002.history
000000010000000000000004  000000010000000000000009  00000001000000000000000E
000000010000000000000005  00000001000000000000000A  00000001000000000000000E.00000020.backup
This recovery produced an 00000002.history file that contains this timeline:
1       00000001000000000000000F        no recovery target specified
So 00000001000000000000000F was the last WAL segment applied to the data files in this recovery. After that segment a new series was created, namely 00000002000000000000000F: only the leftmost hexadecimal number (the timeline ID) was incremented to start the new WAL segment series.

PostgreSQL - Full text search. Installing a new german dictionary

It really startled me that PostgreSQL couldn't tell that 'Freundin' belongs to the same word family as 'Freund'. I looked it up, and it turned out that no appropriate German dictionary was installed on my system.

Step1. Install the dictionary.
Main website:

Extract it and place it into this directory or equivalent:

Step2. We'll need to do some encoding conversion:

[root@localhost german_compound]# 
iconv -f ISO8859-1 -t utf8 german.dic > ../german.dict
iconv -f ISO8859-1 -t utf8 german.aff > ../german.affix
iconv -f ISO8859-1 -t utf8 german.stop > ../german.stop

Step3. Follow this configuration track:
I named my configuration 'de'
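For the record, the configuration track boils down to something like this (a sketch: german_ispell is a name I chose, and the dictionary files are the ones converted in Step 2, placed under $SHAREDIR/tsearch_data):

```sql
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,
    DictFile  = german,   -- german.dict
    AffFile   = german,   -- german.affix
    StopWords = german    -- german.stop
);

CREATE TEXT SEARCH CONFIGURATION de ( COPY = pg_catalog.german );

ALTER TEXT SEARCH CONFIGURATION de
    ALTER MAPPING FOR asciiword, word
    WITH german_ispell, german_stem;
```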

Step4. Test the results.

postgres=# SELECT to_tsvector('de','Freundin') @@ to_tsquery('de','Freund');
(1 row)

postgres=# SELECT to_tsvector('de','Studentin') @@ to_tsquery('de','Student');
(1 row)
postgres=# SELECT to_tsvector('de','Freunde') @@ to_tsquery('de','Freund');
(1 row)
postgres=# SELECT to_tsvector('de','freundlich') @@ to_tsquery('de','Freund');
(1 row)

Now I'll add a good Spanish one. So it seems I'll be able to sleep tonight, at least as soon as I play around with PITR a little bit.

Monday, January 9, 2012

Tools - Flowcharting software . yEd

I'm using this app to make some flowcharts and I'm liking it very much. It runs on Windows, Linux and Mac.

And guess what? It's free!

Other alternatives are discussed here.

Postgresql - Full text search. First encounter

I'm amazed by the text searching capabilities embedded in PostgreSQL. It has the 'ability' to tell word variants apart and classify them as belonging to the same family (through normalization).

My default configuration
pgbenchdb=# show default_text_search_config;
(1 row)

Though, I'd better make it explicit in each case.

For instance:

Base word: friend
variants: friends , friendly

--in english
pgbenchdb=# SELECT to_tsvector('english','friends') @@ to_tsquery('english','friend');                                     
(1 row)    
--it matches                                                                                                                   
pgbenchdb=# SELECT to_tsvector('english','friendly') @@ to_tsquery('english','friend');                                       
(1 row)

--in spanish

pgbenchdb=# SELECT to_tsvector('spanish','amigos') @@ to_tsquery('spanish','amigo');
(1 row)

pgbenchdb=# SELECT to_tsvector('spanish','amigable') @@ to_tsquery('spanish','amigo');
(1 row)

--in german

pgbenchdb=# SELECT to_tsvector('german','Freunde') @@ to_tsquery('german','Freund');
(1 row)

pgbenchdb=# SELECT to_tsvector('german','Freundin') @@ to_tsquery('german','Freund');
(1 row)
--it doesn't match!!!???

pgbenchdb=# SELECT to_tsvector('german','freundlich') @@ to_tsquery('german','Freund');
(1 row)
--it doesn't match!!!???

Well, the German configuration needs further development; it ain't an easy language, that's a fact.

Here is a very good insight about different text search options, including this one.

I'll look up the Postgresql extension network, I might find something useful.

Postgresql - the pg_settings view

This view is roughly equivalent to the v$spparameter/v$parameter views in Oracle. Every setting related to the running instance is detailed and documented here. From now on I'll try to relate everything I study about PostgreSQL to its counterpart in Oracle, to reinforce the topics I learned for the OCA certification.


The view pg_settings provides access to run-time parameters of the server. It is essentially an alternative interface to the SHOW and SET commands. It also provides access to some facts about each parameter that are not directly available from SHOW, such as minimum and maximum values.

How many such settings are there?
postgres=# select count(*) from pg_settings;
(1 row)

One thing I regard worth looking up is how to tell which settings have been changed but are not yet applied, since you need to restart the server for them to take effect (those with the postmaster context).

Let's play around a little bit. What are those settings?

postgres=# select name,setting from pg_settings where context = 'postmaster';
              name              |               setting                
 allow_system_table_mods        | off
 archive_mode                   | off
 autovacuum_freeze_max_age      | 200000000
 autovacuum_max_workers         | 3
 bonjour                        | off
 bonjour_name                   | 
 config_file                    | /var/lib/pgsql/data1/postgresql.conf
 data_directory                 | /var/lib/pgsql/data1
 external_pid_file              | 
 hba_file                       | /var/lib/pgsql/data1/pg_hba.conf
 hot_standby                    | off
 ident_file                     | /var/lib/pgsql/data1/pg_ident.conf
 listen_addresses               | localhost
 logging_collector              | on
 max_connections                | 100
 max_files_per_process          | 1000
 max_locks_per_transaction      | 64
 max_pred_locks_per_transaction | 64
 max_prepared_transactions      | 0
 max_wal_senders                | 0
 port                           | 5432
 shared_buffers                 | 3072
 shared_preload_libraries       | 
 silent_mode                    | off
 ssl                            | off
 ssl_ciphers                    | ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
 superuser_reserved_connections | 3
 track_activity_query_size      | 1024
 unix_socket_directory          | 
 unix_socket_group              | 
 unix_socket_permissions        | 0777
 wal_buffers                    | 96
 wal_level                      | minimal
(33 rows)

Let's say I want to restrict the max_connections setting to half its default value.

postgres=# update pg_catalog.pg_settings set setting = '50' where name = 'max_connections';
ERROR:  parameter "max_connections" cannot be changed without restarting the server

Aha! So it's much simpler than that: it won't let you do it through this view at all! Well, on these grounds, my question does not apply to PostgreSQL.

So I guess this implementation is analogous to starting an Oracle instance with a pfile, so there's no need for two views as Oracle has: one for current settings and one for those written in the spfile that might not be in effect just yet.
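A related trick I find handy: pg_settings also records each parameter's compiled-in default (boot_val), so you can list everything that differs from it:

```sql
SELECT name, setting, boot_val, source
  FROM pg_settings
 WHERE setting IS DISTINCT FROM boot_val
 ORDER BY name;
```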

Sunday, January 8, 2012

Postgresql - Foreign Tables. twitter_fdw. advanced twitter searches

Case scenario:
Postgresql 9.1 on Fedora 15

Objective: find, among the latest 100 tweets about #postgresql, those that contain the word 'query'.

My latest experience with this topic:

Prerequisite: Postgresql 9.1 with contrib modules and postgresql9.1-server-devel (or whatever is called in your distro).
Step1. Install the twitter_fdw wrapper:

Step2. Create the extension in the database you'll use for this purpose.
postgres=# CREATE EXTENSION twitter_fdw;

Step3. Fire the query away. (The CREATE EXTENSION statement creates not only the FDW handler functions but also the foreign data wrapper, the foreign server, a user mapping and the twitter table) <- isn't that great?

postgres=# SELECT from_user AS user , text::varchar(144) AS message FROM twitter WHERE q = '#postgresql' and text::varchar(144) ilike '%query%';

The result:
     user     |                                    message
--------------+---------------------------------------------------------------------------------
 postgresqlbr | Parallel query processing may finally be coming to #PostgreSQL - Parallel query processing m...
 chenghlee    | Parallel query processing may finally be coming to #PostgreSQL
(2 rows)

I didn't like a column named 'text', so I renamed it to 'message', but then the thing stopped working (I should have known better: the mapping! duh), so I switched the change back. I'll define a custom foreign server, user mapping, and foreign table later, just for practice.
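For when I get to it, the manual setup would look roughly like this. A sketch only: the handler function name and the column list are assumptions on my part; check what CREATE EXTENSION actually created (e.g. with \dx+ twitter_fdw) before relying on any of these names.

```sql
CREATE FOREIGN DATA WRAPPER my_twitter_wrapper
    HANDLER twitter_fdw_handler;              -- assumed handler name

CREATE SERVER my_twitter_server
    FOREIGN DATA WRAPPER my_twitter_wrapper;

CREATE USER MAPPING FOR PUBLIC SERVER my_twitter_server;

CREATE FOREIGN TABLE my_tweets (
    from_user text,
    text      text,
    q         text                            -- illustrative column list
) SERVER my_twitter_server;
```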

Saturday, January 7, 2012

Postgresql - Foreign tables. file_fdw wrapper. querying the log file

The foreign table functionality has been around for quite some time in the SQL standard. Foreign tables derive from SQL/MED (Management of External Data).

SQL/MED is Management of External Data, a part of the SQL standard that deals with how a database management system can integrate data stored outside the database. There are two components in SQL/MED:

Foreign Table
a transparent access method for external data

Another reference:

In 2003, a new extension called SQL/MED ("SQL Management of External Data") was added to the SQL standard. It is a standardized way of handling access to remote objects in SQL databases. In 2011, PostgreSQL 9.1 was released with a great support of this standard.

In a nutshell, you can now use various Foreign Data Wrappers (FDW) to connect a PostgreSQL Server to remote data stores.

In this post I'll be using the file_fdw extension:

What does it do?
The file_fdw module provides the foreign-data wrapper file_fdw, which can be used to access data files in the server's file system. Data files must be in a format that can be read by COPY FROM; see COPY for details.

Case scenario. Setup a foreign table to read the server log directly within the database manager.
Postgresql version 9.1.1 on Debian squeeze
Step 0. Install if you haven't done it, the contrib module
root@debian1:~# apt-get install postgresql-contrib-9.1

Step 1. Reconfigure your postgresql.conf file so that logs are generated in CSV format, choose an appropriate log rotation policy, and reload the configuration.

I thought of these parameters (note that a static name for the generated log file is required, for obvious reasons: the foreign table will point at a fixed file path).

# - Where to Log -

log_destination = 'csvlog'              # Valid values are combinations of
                                        # stderr, csvlog, syslog, and eventlog,
                                        # depending on platform.  csvlog
                                        # requires logging_collector to be on.

# This is used when logging to stderr:
logging_collector = on                  # Enable capturing of stderr and csvlog
                                        # into log files. Required to be on for
                                        # csvlogs.
                                        # (change requires restart)

# These are only used if logging_collector is on:
log_directory = 'pg_log'                # directory where log files are written,
                                        # can be absolute or relative to PGDATA
log_filename = 'postgresql_log' # log file name pattern,
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'        # log file name pattern,
                                        # can include strftime() escapes
#log_file_mode = 0600                   # creation mode for log files,
                                        # begin with 0 to use octal notation
log_truncate_on_rotation = on         # If on, an existing log file with the
                                        # same name as the new log file will be
                                        # truncated rather than appended to.
                                        # But such truncation only occurs on
                                        # time-driven rotation, not on restarts
                                        # or size-driven rotation.  Default is
                                        # off, meaning append to existing files
                                        # in all cases.
log_rotation_age = 5d                   # Automatic rotation of logfiles will
                                        # happen after that time.  0 disables.

Step 2. Read through and follow along the example depicted in this link (the only change needed is the path to your logfile):
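In case that link rots, here's the gist of the example from the file_fdw documentation, adapted to the settings above. The filename is the one thing you must adapt to your own cluster; with log_filename = 'postgresql_log', the CSV log gets a .csv suffix, and the path below is just my guess for this Debian box.

```sql
CREATE EXTENSION file_fdw;

CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE pglog (
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text
) SERVER pglog
OPTIONS ( filename '/var/lib/postgresql/data_sr/pg_log/postgresql_log.csv',  -- adjust to your PGDATA
          format 'csv' );
```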

Step 3. Query the log in whatever fashion you like.
For example:
postgres@debian1:~/data_sr$ psql -p 5432 pgbenchdb
psql (9.1.1)
Type "help" for help.
pgbenchdb=# select log_time, error_severity, message from pglog order by log_time desc;

log_time          | error_severity |                                                         message                                                         
 2012-01-07 20:00:43.223-06 | ERROR          | column "logtime" does not exist
 2012-01-07 19:58:57.02-06  | WARNING        | transaction log file "000000020000000000000041" could not be archived: too many failures
 2012-01-07 19:58:57.02-06  | LOG            | archive command failed with exit code 1
 2012-01-07 19:58:56.009-06 | LOG            | archive command failed with exit code 1
 2012-01-07 19:58:54.998-06 | LOG            | archive command failed with exit code 1
 2012-01-07 19:57:54.93-06  | LOG            | archive command failed with exit code 1
 2012-01-07 19:57:54.93-06  | WARNING        | transaction log file "000000020000000000000041" could not be archived: too many failures
 2012-01-07 19:57:53.92-06  | LOG            | archive command failed with exit code 1
 2012-01-07 19:57:52.91-06  | LOG            | archive command failed with exit code 1
 2012-01-07 19:56:52.837-06 | WARNING        | transaction log file "000000020000000000000041" could not be archived: too many failures

About that warning: the transaction log file couldn't be archived because the target machine I was sending it to was down. =-)
Publishing from my fortress of solitude, they gave me back my keys. Yay!!!!

Friday, January 6, 2012

Postgresql - Streaming replication: synchronous replication

One cool new feature in PostgreSQL 9.1 is the ability to set up synchronous streaming replication. What does that mean? A transaction is reported as committed only once its WAL records have been written locally and confirmed as received by the standby servers that replicate synchronously. This sounds like extra overhead on DML operations, so it's advisable to have an excellent communication path between the master and the synchronous standby servers. The main advantage of this setup is that you can largely guarantee the reliability of the data in case of failure: if a transaction was given committed status, it is instantly available on all the synchronous standbys as well, so you'll have virtually no data loss in case of contingencies.

Here are the steps to configure this kind of streaming replication:

The important part being:

To switch Asynchronous replication to Synchronous Replication,use following steps:

1. Change following parameter in postgresql.conf on Primary Server:
synchronous_standby_names = 'sync_replication'

2. Reload the above changes on primary using following command:
pg_ctl -D [primary data directory] reload

3. Change following parameter in recovery.conf file on Standby:
primary_conninfo = 'host=localhost port=5432 application_name=sync_replication'
4. Restart the standby using following command:
pg_ctl -D [standby directory] restart -m fast
Then, the difference is that:
  • you specify the synchronous_standby_names in the master's configuration, for the standby servers that will replicate in a synchronous fashion.
  • reload the master's configuration
  • add the application_name option to the primary_conninfo string in the recovery.conf file in each synchronous standby.
  • restart the standby server.
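Once the standby is connected, you can confirm from the master that it's recognized as synchronous (sync_state should read 'sync'):

```sql
SELECT application_name, state, sync_state
  FROM pg_stat_replication;
```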
Testing case:
+ Start a big transaction on the master server and kill one synchronous slave in the middle of it.
Expected result:
+ The transaction will  be rolled back. No more transactions will be allowed 'till the synchronous standby server becomes available and re-synchronizes.

Actual result:
+ The transaction is stuck!!!
This is what the master says about it:

in the log:
2012-01-06 17:09:58 CST [unknown] postgres LOG:  could not send data to client: Connection reset by peer

current activity in pg_stat_activity:
-[ RECORD 2 ]----+--------------------------------
datid            | 24576
datname          | pgbenchdb
procpid          | 13296
usesysid         | 10
usename          | postgres
application_name | psql
client_addr      | 
client_hostname  | 
client_port      | -1
backend_start    | 2012-01-06 17:09:50.726341-06
xact_start       | 2012-01-06 17:09:50.728063-06
query_start      | 2012-01-06 17:10:01.36678-06
waiting          | f
current_query    | commit;

the replication state in pg_stat_replication:

pgbenchdb=# select * from pg_stat_replication;
(No rows)

I was kind of expecting that.

There's no remedy for me other than to kill the transaction manually (there must be some sort of replication timeout parameter somewhere!).
But before that, am I able to execute dml operations through another session?
Yes!!!! Oops! (but this happens after I cancel it)

If I cancel manually the hung up transaction I get the following message:

pgbenchdb=# insert into pgbench_history (tid) values ( generate_series(1,100));
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
INSERT 0 100

postgres@debian1:~$ psql -p 5432 pgbenchdb < test_sync_replication.sql
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
INSERT 0 100000
^CCancel request sent
WARNING:  canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
Well, canceling a request is not something applications are usually programmed to do. But the issue that worries me is that the session won't return an error message: it freezes, waiting for the synchronous standby to reply! I need to dive into the docs to figure out what the hell is going on. Despite the above results, I think it's important to point out that committed status was not reported to the client for either transaction.

Postgresql - psql passing variables to a sql script

The need to parametrize the scripts we pass to psql may arise, maybe just for testing purposes. For instance, this is the script that pgbench uses for each transaction that simulated clients execute:


    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);


I called this script pgbench_transaction.sql.
As you can see, each variable has a colon as prefix. You can execute it manually by invoking psql like this:

postgres@linux-2561:~> psql -p 9999 -v tid=1 -v bid=2 -v aid=3 -v delta=4  pgbenchdb < pgbench_transaction.sql
So you prefix each variable=value pair with a -v flag. And the output:
(1 row)

An alternative way to do this "parameter passing" is to include \set statements inside the script. Obviously this approach is less flexible,
        \set tid 1
        \set bid 2
        \set aid 3
        \set delta 4

    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
but useful nonetheless. There's an issue with using quotes: they must be escaped. For instance:
        \set myvariable '\'somestring\''
        SELECT * FROM FOO WHERE BAR = :myvariable
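To illustrate what psql's variable substitution effectively does, here is a tiny Python sketch of my own (not psql's actual implementation, and it ignores quoting rules):

```python
import re

def substitute(sql, variables):
    """Replace :name placeholders the way psql's -v flags do (illustration only)."""
    return re.sub(r':(\w+)', lambda m: str(variables[m.group(1)]), sql)

query = "UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;"
print(substitute(query, {'delta': 4, 'aid': 3}))
# UPDATE pgbench_accounts SET abalance = abalance + 4 WHERE aid = 3;
```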

Postgresql - pgpool : num_init_children

Yesterday I was hitting my head against a wall trying to figure out why pgbench got stuck during a simple test involving 50 concurrent clients. It turns out I was overlooking the "num_init_children" parameter in the pgpool configuration.

  • pgpool-II accepts up to num_init_children concurrent sessions. If the number of concurrent sessions reaches num_init_children, new sessions will be queued.
  • pgbench creates its concurrent connections before starting the actual transactions. So if the number of concurrent clients specified by "-c" exceeds num_init_children, pgbench will get stuck because it will wait forever for pgpool to accept connections.
  • On the other hand, PostgreSQL does not accept more concurrent sessions than max_connections.
  • If you want to test pgpool-II's connection queuing, you can use psql instead of pgbench.
Reference. Question number 10.

I guess I need to follow what's usually advised in these cases: RTFM!
The default value for that parameter is 32; setting it up to 51:

transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 50
number of threads: 2
number of transactions per client: 10
number of transactions actually processed: 500/500
tps = 33.339290 (including connections establishing)
tps = 33.685822 (excluding connections establishing)

Not that bad, but it can improve, that's for sure.
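For reference, the relevant pgpool.conf fragment would look something like this (a sketch; only the num_init_children value comes from the test above):

```
# pgpool.conf: allow up to 51 concurrent client sessions
# (requires a pgpool restart to take effect)
num_init_children = 51
```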

Postgresql - Spotting differences between schemas

How do you tell the difference between database schemas in an automatic fashion?

One approach is to take a SQL dump of both schemas and run the Unix diff command on them.

I'll use a toy database called switch_books

First, make the "before" backup:
-bash-4.2$ pg_dump switch_books > switch_books_before.sql
-bash-4.2$ psql switch_books

Then I create a dummy table:
switch_books=# create table fake_table (id integer primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "fake_table_pkey" for table "fake_table"
switch_books=# \q

And take a new snapshot (the "after" backup):
-bash-4.2$ pg_dump switch_books > switch_books_after.sql

# generate the diff file
-bash-4.2$ diff switch_books_before.sql switch_books_after.sql > switch_books_diff.sql

-bash-4.2$ cat switch_books_diff.sql | less

And reviewing the results:
> -- Name: fake_table; Type: TABLE; Schema: public; Owner: postgres; Tablespace:
> --
> CREATE TABLE fake_table (
>     id integer NOT NULL
> );
> ALTER TABLE public.fake_table OWNER TO postgres;
> --
> -- Data for Name: fake_table; Type: TABLE DATA; Schema: public; Owner: postgres
> --
> COPY fake_table (id) FROM stdin;
> \.
> --
> -- Name: fake_table_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace:
> --
> ALTER TABLE ONLY fake_table
>     ADD CONSTRAINT fake_table_pkey PRIMARY KEY (id);

RAF mentioned this utility today in a conversation.

There are other tools that go one step further by generating a SQL script that makes the differences go away.
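The same before/after idea can be scripted without shelling out to diff; here is a minimal Python sketch using the standard difflib module, with inline sample "dumps" standing in for the pg_dump output:

```python
import difflib

# Two tiny sample "dumps" standing in for the pg_dump output (illustration only)
before = "CREATE TABLE books (\n    id integer NOT NULL\n);\n"
after = before + "CREATE TABLE fake_table (\n    id integer NOT NULL\n);\n"

# Produce a unified diff of the two schema dumps, line by line
diff = difflib.unified_diff(before.splitlines(), after.splitlines(),
                            fromfile='switch_books_before.sql',
                            tofile='switch_books_after.sql', lineterm='')
print('\n'.join(diff))
```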

Thursday, January 5, 2012

Postgresql - Maintenance. Cleaning up the pg_attribute table

What is the pg_attribute table?
The catalog pg_attribute stores information about table columns. There will be exactly one pg_attribute row for every column in every table in the database. (There will also be attribute entries for indexes, and indeed all objects that have pg_class entries.) The term attribute is equivalent to column and is used for historical reasons.
Sometimes it can become "bloated", that is, it accumulates unnecessary information (due to invalid database objects), and some cleaning up is in order for performance reasons. I found a reference about the issue. It's as simple as this:
VACUUM FULL pg_catalog.pg_attribute;
REINDEX TABLE pg_catalog.pg_attribute;
Damn, life would be so fucking beautiful if we all shared what we know.
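To check whether the cleanup actually reclaimed space, you can compare the catalog's size before and after with the standard PostgreSQL size functions:

```sql
-- On-disk size of pg_attribute, including its indexes and TOAST data
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
```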

Tuesday, January 3, 2012

Life . Small time Oracle Certification

I really hate conformism and self-deception. I invested about two months in an Oracle certification course, and I don't think it was really worth it. The thing I loathe the most is that they made us memorize what is called a "brain dump" in order to have the right to take the certification tests, so each one of us had a very easy time passing them. I kind of like Oracle and study on my own when I can't sleep (like right now), but it doesn't feel right... they took all the joy, all the sense of achievement, away from me. From a business standpoint I realize that this way they generate returning, happy (and conformist) customers, and hence more profit, but it's just not right!
So, I say: screw you, small timers! Shame on you!

Postgresql - Security : Enabling server-side SSL support

In the last post, I made a client connect through a SSH tunnel in order to avoid sending plain text messages to the backend. Now I'm testing the enabling of SSL support on the postgresql server.

I'm following this walkthrough:

And I've already executed the first part successfully.
+ SSL support seems to be available in the binary packages of some distros (so you don't have to compile PostgreSQL from source to get this feature; just enable SSL in postgresql.conf with ssl = on in the corresponding step)
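For reference, on servers of this era the server-side setup boils down to the following (a sketch; server.crt and server.key are the default file names PostgreSQL expects in the data directory):

```
# postgresql.conf
ssl = on

# The server looks for these files in the data directory:
#   server.crt  (server certificate)
#   server.key  (private key; must not be world-readable)
```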

This is how a successful SSL connection to the server looks (in Spanish =-( ):

Now, how does this communication channel look to a sniffer (Wireshark, for instance)?

Well, following the TCP stream, you only get a bunch of unreadable rubbish.

Which is great! It would be awesome to break this encryption somehow... nah, just kidding.

Monday, January 2, 2012

Linux - Which process is bound to a particular port

I found this neat script to use whenever you're in doubt about which process is bound to a particular port.

#!/bin/sh
# See Linux Server Hacks #54
# $Id:,v 1.5 2007/03/28 06:50:09 oracle Exp $

# port to inspect, taken from the first command line argument
port=${1}

# get the process info (pid/name). restricted to tcp only
procinfo=$( netstat --numeric-ports -nlp 2> /dev/null | \
            grep ^tcp | grep -w ${port} | tail -n 1 | awk '{print $7}' )

case "${procinfo}" in
  "")
    # nothing matched in the netstat output
    echo "No process listening on port ${port}"
    ;;
  "-")
    # netstat prints "-" when the user lacks permission to see the owner
    echo "Process is running on ${port}, but current user does not have rights to see process information."
    ;;
  *)
    echo "${procinfo} is running on port ${port}"
    ps uwep ${procinfo%/*}
    ;;
esac

It parses netstat info.


wwwrun@linux-2561:~> ./  9999
9591/pgpool is running on port 9999
wwwrun    9591  0.0  0.1  10000  4128 ?        S    18:55   0:00 /usr/local/bin/pgpool -f /usr/local/etc/pgpool.conf -F /usr/local/e

Mmm, alternatively you can grep the /etc/services file (I owe that one to RAF). For instance:

postgres@debian1:~$ cat /etc/services | grep 5432
postgresql      5432/tcp        postgres        # PostgreSQL Database
postgresql      5432/udp        postgres

But I like the script mentioned above better, as it gives you more useful info.
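As a quick cross-check, when all you need is to know whether anything is listening on a port (not which process), a small Python probe works too (my own sketch, not part of the script above):

```python
import socket

def port_in_use(port, host='127.0.0.1'):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success, an errno value otherwise
        return s.connect_ex((host, port)) == 0

print(port_in_use(5432))
```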

Microsoft - Licensing. Multiplexing and Client Access Licenses - Afterthoughts

I'm looking for a way to work around the multiplexing issue. These are my ideas so far:

Unfortunately, there isn't much information or user feedback out there describing the experiences and criteria (or workarounds) companies have adopted regarding this matter.
I think my role as devil's advocate ends here =-) . I suppose it's time to consult a technology auditor.

Postgresql - Security : SSH Tunneling your PostgreSQL communications

In my previous entry, I highlighted the importance of having encrypted communications between the clients and the backend server. The first approach I'll attack is to use the SSH tunneling facility as depicted in the official docs; I regard it as the most straightforward option (or at least the fastest to implement).

Testing scenario
+ one postgresql client running on a windows machine (IP
+ one postgresql backend server running on fedora linux (IP

+ A ssh server running where the postgresql backend server lives.
+ A Linux account with which you can connect to the PostgreSQL server host, properly configured in the pg_hba.conf file (if necessary).
+ Putty for windows
+ Having understood what the on-line postgresql documentation says on this matter.

This is the tutorial I'm using for the putty configuration:

- Note that I'm connecting through the port 22 (default port for ssh)

- 63333 is the port that will listen for connections on the client host, and it's the port I'll specify in the psql connection.

- Make the ssh connection to the host where the postgresql server is running (this is the tunnel).

- Using the ssh tunnel : Connect to the server using psql -h localhost -p 63333
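On a Linux or macOS client, the same tunnel PuTTY builds can be created with plain ssh (a sketch; the user and host names are placeholders, while 63333 and 5432 are the ports from this example):

```
# Forward local port 63333 to port 5432 on the database host, over ssh (port 22)
ssh -L 63333:localhost:5432 someuser@dbhost

# In another terminal, connect through the tunnel:
psql -h localhost -p 63333 dbname
```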

+ Encryption is not enforced: clients are still able to connect to the server without the tunnel.
+ I think that relying on the clients to set up the tunnel might not be a good idea.

Sunday, January 1, 2012

Postgresql - Security : Sniffing PostgreSQL communications

This is the kind of info that wireshark can give you about any particular unencrypted network message using the pgsql protocol:

Field name Type Description Versions
pgsql.authtype Signed 32-bit integer Authentication type 1.0.0 to 1.6.4
pgsql.code String Code 1.0.0 to 1.6.4
pgsql.col.index Unsigned 32-bit integer Column index 1.0.0 to 1.6.4 String Column name 1.0.0 to 1.6.4
pgsql.col.typemod Signed 32-bit integer Type modifier 1.0.0 to 1.6.4
pgsql.condition String Condition 1.0.0 to 1.6.4
pgsql.copydata Byte array Copy data 1.0.0 to 1.6.4
pgsql.detail String Detail 1.0.0 to 1.6.4
pgsql.error String Error 1.0.0 to 1.6.4
pgsql.file String File 1.0.0 to 1.6.4
pgsql.format Unsigned 16-bit integer Format 1.0.0 to 1.6.4
pgsql.frontend Boolean Frontend 1.0.0 to 1.6.4
pgsql.hint String Hint 1.0.0 to 1.6.4
pgsql.key Unsigned 32-bit integer Key 1.0.0 to 1.6.4
pgsql.length Unsigned 32-bit integer Length 1.0.0 to 1.6.4
pgsql.line String Line 1.0.0 to 1.6.4
pgsql.message String Message 1.0.0 to 1.6.4
pgsql.oid Unsigned 32-bit integer OID 1.0.0 to 1.6.4
pgsql.oid.table Unsigned 32-bit integer Table OID 1.0.0 to 1.6.4
pgsql.oid.type Unsigned 32-bit integer Type OID 1.0.0 to 1.6.4
pgsql.parameter_name String Parameter name 1.0.0 to 1.6.4
pgsql.parameter_value String Parameter value 1.0.0 to 1.6.4
pgsql.password String Password 1.0.0 to 1.6.4 Unsigned 32-bit integer PID 1.0.0 to 1.6.4
pgsql.portal String Portal 1.0.0 to 1.6.4
pgsql.position String Position 1.0.0 to 1.6.4
pgsql.query String Query 1.0.0 to 1.6.4
pgsql.routine String Routine 1.0.0 to 1.6.4
pgsql.salt Byte array Salt value 1.0.0 to 1.6.4
pgsql.severity String Severity 1.0.0 to 1.6.4
pgsql.statement String Statement 1.0.0 to 1.6.4
pgsql.status Unsigned 8-bit integer Status 1.0.0 to 1.6.4
pgsql.tag String Tag 1.0.0 to 1.6.4
pgsql.text String Text 1.0.0 to 1.6.4
pgsql.type String Type 1.0.0 to 1.6.4 Byte array Data 1.0.0 to 1.6.4
pgsql.val.length Signed 32-bit integer Column length 1.0.0 to 1.6.4
pgsql.where String Context 1.0.0 to 1.6.4

This is, for example, one very innocent-looking capture of executing the psql internal command \l (list databases):

Extended descriptions:

+ Well, not everything about sniffing communications is bad; I guess you can troubleshoot a variety of problems with its help.

But I'm a lazy kind of guy, in the sense that I'd rather capture automatically any PostgreSQL-related communication and have it saved to a txt file. How do you accomplish such a thing?

One option is using tshark:
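Something along these lines (a sketch; the interface name and output file are my assumptions):

```
# Capture PostgreSQL traffic on port 5432 and save the decoded output to a text file
tshark -i eth0 -f "tcp port 5432" -V > pgsql_capture.txt
```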

+ Added another tool for troubleshooting postgresql issues.
+ Of course it must be documented somewhere on the web, but using this technique you get a glance at what the psql internal commands do.
+ Only very lame postgresql administrators don't encrypt postgresql messages for sensitive communications.