Thursday, March 29, 2012

Curso Business Intelligence - Material

Hola muchachos, aquí estaré colocando el material necesario para nuestro curso de modo que lo descarguen y lo lleven a la respectiva sesión. Guarden un bookmark a este post para una referencia rápida. El password en todos los casos es el nombre propio de la coordinadora de la carrera de Sistemas Inteligentes seguido del password de sus máquinas virtuales (todo en minúsculas).

#########################################################################################################
Sesión del 31 de Marzo 2012.

Diapositivas

https://rapidshare.com/files/1964967587/diap_0331.zip

Script SQL base de datos dw_imdb:
https://rapidshare.com/files/493712796/dw_imdb.sql.tar.gz

Diagrama entidad relación de la base de datos dw_imdb
https://rapidshare.com/files/3472566836/entidad_relacion_dw_imdb.pdf

No olviden llevar su diagrama entidad relación de la base de datos WCM.

#########################################################################################################

Sesion del 14 de Abril 2012

Perdón por la demora, hemos andado muy atareados.

Transformaciones
https://rapidshare.com/files/1028060750/transformaciones.tar.gz

#########################################################################################################

Friday, March 23, 2012

PostgreSQL - Getting the size of an index

I just stumbled upon this gentle piece of info:
http://postgresql.cc/postgres-index-size


How to Get a Postgres Table’s Indexes’ Size

Run the following SQL command, substituting the name of your table, to get the size of that Postgres table’s index data:

postgres=# select pg_size_pretty(pg_total_relation_size('test_table') - pg_relation_size('test_table'));
 pg_size_pretty
----------------
 24 kB
(1 row)

The function pg_total_relation_size() gets the size of both the row data and indexes for the named table. We subtract the size of the tables row data (pg_relation_size()) to get the size of the table’s indexes. The pg_size_pretty() function then puts the size in readable units (bytes, kilobytes, megabytes or gigabytes).


As I intent to explain tomorrow the importance of surrogate keys this tip will be very useful. =-)

Wednesday, March 14, 2012

Protovis playground - Disregard this please

I'm just playing around with the data visualization library called "protovis". There's a new kid in town called d3.js but I aim to develop my own visualizations with a product that uses protovis. It would be more proper to setup a web page independently but I'm too lazy for that, and I would like to show my intermediate results whilst playing with it, so, please disregard this post at this moment... =-).






-->

PostgreSQL - Generating fake data with Common Table Expressions

I'm filling an orders table with sample data for a sql course. I have two tables: employee (the one who placed the order) and distributor, the task to generate 1,000,000 orders between 2005 and 2012.

WITH RECURSIVE source (counter) AS ( SELECT 1 UNION ALL SELECT counter +1 FROM source where counter < 1000000)
SELECT 
(random() * (select max(employee_id) - min(employee_id) from wcm.employee))::int + (select min(employee_id) from wcm.employee) employee_id,
(random() * (select max(distributor_id) - min(distributor_id) from wcm.distributor))::int + (select min(distributor_id) from wcm.distributor) distributor_id,
( '2005-01-01'::date + ((now()::date - '2005-01-01'::date)::int * random())::int ) order_date
FROM source;


I love CTE's !!!! Don't you?

PostgreSQL - Foreign key checking. Permission denied for schema even as superuser

I was getting the following error:

wcm=> select version();
                                                      version                                                      
-------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
(1 Zeile)

wcm=# delete from employee;
ERROR:  permission denied to the schema wcm
ZEILE 1: SELECT 1 FROM ONLY "wcm"."inventory" x WHERE $1 OPERATOR(pg_...
                            ^
ANFRAGE:  SELECT 1 FROM ONLY "wcm"."inventory" x WHERE $1 OPERATOR(pg_catalog.=) "employee_id" FOR SHARE OF x

The explanation goes as follows:

permission-denied-for-schema-even-as-superuser

That's a foreign key checking query.FK checks are done as the owner of
the target table
, not as the user who did the original query. So your
problem is that the owner of the table lacks permissions
to access the other table (or more specifically, the schema it's in).

Reference:
http://postgresql.1045698.n5.nabble.com/permission-denied-for-schema-even-as-superuser-td4373036.html

Thanks Tom Lane wherever you are, may the force be with you!!

Monday, March 5, 2012

Latex - Beamer + minted + Pygments

The minted package allows you to highlight source code according to its language type by using pygments.
http://tug.ctan.org/tex-archive/macros/latex/contrib/minted/

rene@linux-2561:~> sudo zypper in python-Pygments
rene@linux-2561:~/bin> unzip minted.zip 
Archive:  minted.zip
   creating: minted/
  inflating: minted/minted.dtx       
  inflating: minted/minted.ins       
  inflating: minted/minted.pdf       
  inflating: minted/README           
  inflating: minted/Makefile 
rene@linux-2561:~/bin/minted> python --version
Python 2.7.2
rene@linux-2561:~/bin/minted> make
tex minted.ins
This is TeX, Version 3.1415926 (TeX Live 2011)
(./minted.ins (/usr/share/texmf/tex/latex/base/docstrip.tex
Utility: `docstrip' 2.5d <2005/07/29>
English documentation    <1999/03/31>

**********************************************************
* This program converts documented macro-files into fast *
* loadable files by stripping off (nearly) all comments! *
**********************************************************

********************************************************
* No Configuration file found, using default settings. *
********************************************************

)

Generating file(s) minted.sty 

Processing file minted.dtx (package) -> minted.sty
Lines  processed: 1296
Comments removed: 994
Comments  passed: 0
Codelines passed: 296

 )
No pages of output.
Transcript written on minted.log

rene@linux-2561:~/bin/minted> sudo cp minted.sty /usr/share/texmf/tex/latex/minted/

rene@linux-2561:~> sudo texhash
texhash: Updating /etc/texmf/ls-R... 
texhash: Updating /var/lib/texmf/main/ls-R... 
texhash: Updating /usr/share/texmf/../../../var/lib/texmf/dist/ls-R... 
texhash: Updating /var/cache/texmf/fonts/ls-R... 
texhash: Updating /var/lib/texmf/ls-R... 
texhash: Done.

This is what I've got so far:



Source
\documentclass[slidestop,compress,mathserif]{beamer}
\usepackage{minted}
\usetheme{Antibes}
\usecolortheme{lily}
\title{PostgreSQL}
\author{@darwinian}
\begin{document}
\begin{frame}
\titlepage
\end{frame}
\begin{frame}[fragile]
\begin{itemize}
\item Generating factorials
\end{itemize}
\begin{minted}{sql}
WITH RECURSIVE source (counter,product) AS
 ( SELECT 1,1 UNION ALL
     SELECT counter +1 , product * (counter +1)
         FROM source WHERE counter < 10)
SELECT counter, product FROM source;
\end{minted}
\end{frame}
\end{document}
Other references: http://en.wikibooks.org/wiki/LaTeX/Packages/Installing_Extra_Packages

Sunday, March 4, 2012

Latex.- The powerdot class for presentations

I started working with the powerdot document class. I find it extremely easy to use and powerful.
http://ctan.org/tex-archive/macros/latex/contrib/powerdot

This is my test presentation.


Wich I achieved with the following code

\documentclass{powerdot}
\begin{document}
\title{PostgreSQL replication concepts}
\author{Rene Romero Benavides}
\date{March 4,2012}
\begin{slide}{PostgreSQL}
\begin{itemize}
\item Here is the first line on my slide
\pause
\item And now here's the next
\end{itemize}
\end{slide}
\end{document}

References.
www.bakoma-tex.com/doc/latex/powerdot/powerdot.pdf
http://csgsa.cs.clemson.edu/seminar/tools06/docbook/html/ch02s04.html#id2396476
http://www.miwie.org/presentations/html/powerdot.html
doctorh.umwblogs.org/files/2010/12/courseC.pdf
http://www.dmi.me.uk/blog/2010/11/08/creating-a-presentation-with-latex-and-powerdot/


Mmmm... second guessing, I'll try beamer too.

Friday, March 2, 2012

PostgreSQL - Forcing index scans

I'm following a presentation by Bruce Momjiam titled "Explainng the Postgres Query Optimizer", and I found an exercise where you can force the query optimizer to use index scans (RAF not long ago told me about this too). I don't quite understand the value of doing this as if a value is present in a table several times it could be more costly to carry on an index scan versus a sequential scan, but ... you never now

SET enable_seqscan = false;
SET enable_bitmapscan  = false;

WITH letter (letter, count) AS ( SELECT letter, count(*) from sample GROUP by 1) SELECT letter AS "1", count,(SELECT * FROM lookup_letter(letter) AS "12" limit 1) AS lookup_letter FROM letter ORDER BY 2 DESC

So now it uses index scan each time regardless.

1 | count |                              lookup_letter                               
---+-------+--------------------------------------------------------------------------
 p |   199 | Index Scan using i_sample on sample  (cost=0.00..39.05 rows=199 width=2)
 s |     9 | Index Scan using i_sample on sample  (cost=0.00..21.91 rows=9 width=2)
 c |     8 | Index Scan using i_sample on sample  (cost=0.00..19.64 rows=8 width=2)
 r |     7 | Index Scan using i_sample on sample  (cost=0.00..19.63 rows=7 width=2)
 t |     5 | Index Scan using i_sample on sample  (cost=0.00..15.09 rows=5 width=2)
 d |     4 | Index Scan using i_sample on sample  (cost=0.00..15.07 rows=4 width=2)
 v |     4 | Index Scan using i_sample on sample  (cost=0.00..15.07 rows=4 width=2)
 f |     4 | Index Scan using i_sample on sample  (cost=0.00..15.07 rows=4 width=2)
 _ |     3 | Index Scan using i_sample on sample  (cost=0.00..12.80 rows=3 width=2)
 a |     3 | Index Scan using i_sample on sample  (cost=0.00..12.80 rows=3 width=2)
 u |     3 | Index Scan using i_sample on sample  (cost=0.00..12.80 rows=3 width=2)
 e |     2 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
 i |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
 k |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)

Where in cases of index scanning a value that repeats to a high degree it's related cost is higher too, comparing it to:

RESET ALL;
WITH letter (letter, count) AS ( SELECT letter, count(*) from sample GROUP by 1) SELECT letter AS "1", count,(SELECT * FROM lookup_letter(letter) AS "12" limit 1) AS lookup_letter FROM letter ORDER BY 2 DESC;

 1 | count |                             lookup_letter                             
---+-------+-----------------------------------------------------------------------
 p |   199 | Seq Scan on sample  (cost=0.00..13.16 rows=199 width=2)
 s |     9 | Seq Scan on sample  (cost=0.00..13.16 rows=9 width=2)
 c |     8 | Seq Scan on sample  (cost=0.00..13.16 rows=8 width=2)
 r |     7 | Seq Scan on sample  (cost=0.00..13.16 rows=7 width=2)
 t |     5 | Bitmap Heap Scan on sample  (cost=4.29..12.76 rows=5 width=2)
 v |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
 f |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
 d |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
 a |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
 u |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
 _ |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
 e |     2 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
 i |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
 k |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
(14 filas)

Thursday, March 1, 2012

PostgreSQL - Common Table Expressions

I just stumbled upon this presentation by Bruce Momjian:
http://momjian.us/main/writings/pgsql/cte.pdf
I had heard of these beauties before but never really used them ... they're astonishing!

I'm just going to replicate the given examples :

test=# with source as ( select 1) SELECT * FROM source;
 ?column? 
----------
        1
(1 row)

test=# WITH source AS ( SELECT 1 as col1) SELECT * FROM source;
 col1 
------
    1
(1 row)

test=# WITH source (col1) AS ( SELECT 1 ) SELECT * FROM source;
 col1 
------
    1
(1 row)

test=# WITH source (col2) AS ( SELECT 1 AS col1) SELECT col2 AS col3 FROM source;
 col3 
------
    1
(1 row)

test=# WITH source AS ( SELECT 1,2 ) SELECT * FROM source;
 ?column? | ?column? 
----------+----------
        1 |        2
(1 row)

-- here it starts to get interesting

test=# WITH source AS ( SELECT 1,2), source2 AS ( SELECT 3,4) select * from source UNION ALL SELECT * FROM source2;
 ?column? | ?column? 
----------+----------
        1 |        2
        3 |        4
(2 rows)

test=# WITH source AS ( select lanname, rolname FROM pg_language JOIN pg_roles ON lanowner = pg_roles.oid ORDER BY lanname) SELECT * FROM source UNION ALL SELECT MIN(lanname), NULL FROM source;
  lanname   | rolname  
------------+----------
 c          | postgres
 internal   | postgres
 plpgsql    | postgres
 plpython2u | postgres
 plpythonu  | postgres
 sql        | postgres
 c          | 
(7 rows)

test=# WITH class (oid, relname) AS ( SELECT oid, relname FROM pg_class WHERE relkind = 'r' ) SELECT c.relname, pa.attname FROM pg_attribute pa JOIN  class c ON c.oid = pa.attrelid ORDER BY 1,2 LIMIT 5;
  relname   | attname 
------------+---------
 child_test | cmax
 child_test | cmin
 child_test | col1
 child_test | col2
 child_test | ctid
(5 rows)


--and this is where I give up as I'm extreeeeeemly tired


By the way: cmin,cmax, and ctid are system columns:

cmin

The command identifier (starting at zero) within the inserting transaction.
cmax

The command identifier within the deleting transaction, or zero.
ctid

The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.

http://www.postgresql.org/docs/current/static/ddl-system-columns.html

Another set of statements that I find fascinating:
test=# set statement_timeout = '1s';
SET
test=# WITH RECURSIVE source AS ( select 1 UNION ALL SELECT 1 FROM source ) SELECT * FROM source;
ERROR:  canceling statement due to statement timeout

test=# RESET statement_timeout;
RESET


And this one is absolutely epic:
test=# WITH RECURSIVE source (counter) AS ( SELECT 1 UNION ALL SELECT counter + 1 FROM source WHERE counter < 10) SELECT * FROM source;
 counter 
---------
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
(10 rows)

test=# WITH RECURSIVE source (counter,product) AS ( SELECT 1,1 UNION ALL SELECT counter +1 , product * (counter +1) FROM source WHERE counter < 10) SELECT counter, product FROM source;
 counter | product 
---------+---------
       1 |       1
       2 |       2
       3 |       6
       4 |      24
       5 |     120
       6 |     720
       7 |    5040
       8 |   40320
       9 |  362880
      10 | 3628800
(10 rows)

test=# WITH RECURSIVE source (counter,product) AS ( SELECT 1,1 UNION ALL SELECT counter +1 , product * (counter +1) FROM source WHERE counter < 12) SELECT counter, product FROM source WHERE counter = 12;
 counter |  product  
---------+-----------
      12 | 479001600
(1 row)