Thursday, May 10, 2012

Python.-Closures.Functions that build other functions

I'm studying Dive Into Python 3 and I'm pretty much in love with it.
http://getpython3.com/diveintopython3/

From Chapter 6. Closures and Generators.
Guess what? Functions are just other kind of objects in Python!

The technique of using the values of outside parameters within a dynamic function is called closures
import re

def build_match_and_apply_functions(pattern,search,replace):
    '''
    builds dynamically a tuple containing two functions (matches_rule,apply_rule) for each tuple (pattern,search,replace) received
    the constants patter,search and replaced get substituted by the actual parameters passed to the function accordingly
    '''
    def matches_rule(word):
        return re.search(pattern,word)
    def apply_rule(word):
        return re.sub(search,replace,word)
    return (matches_rule,apply_rule)

patterns = \
'''
for each of these patterns two functions will be built a match_rule  and an apply_rule functions
'''
(
    ('[sxz]$','$','es'),
    ('[^aeioudgkprt]h$','$','es'),
    ('(qu|[^aeiou])y$','y$','ies'),
    ('$','$','s')
)


'''
this is called a list comprehension in Python terms
in this case we have as a result, a list containing tuples of match_rule and apply_rule functions
'''
rules = [build_match_and_apply_functions(pattern,search,replace)
         for (pattern, search,replace) in patterns]

print(rules)

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)
        
nouns = ["sufix","trash","lady","cat","math","boy","day",'pita','vacancy']
for noun in nouns:
    print(plural(noun))

I'm still in the process of assimilation, and this is pretty cool stuff.

Wednesday, April 25, 2012

Python - An insignificant script to grade tests

I'm just backing up this script to track my progress with python, it ain't a great thing, it just grades a bunch of test of some students.

f = open('/home/rene/Dropbox/respuestas.txt','r')
o = open('/home/rene/Dropbox/hoja_resultados.txt','w')
r = open('/home/rene/Dropbox/cals_uaem.txt','r')
respuestas = f.readline().split()
secciones = f.readline().split()    
temas = f.readline().split(',')
f.close()
reactivos = [""]*29
counter = 0
for i in r:
    reactivos[counter] = i.split()
    counter = counter + 1
r.close()
temas_estudiar = []
preguntas_por_seccion = {'BI':9,'ETL':6,'SQL':4,'BI-Documentos':3}
aciertos = {'BI':0,'ETL':0,'SQL':0,'BI-Documentos':0}
errores = {'BI':0,'ETL':0,'SQL':0,'BI-Documentos':0}
ptc_aciertos = {'BI':0.0,'ETL':0.0,'SQL':0.0,'BI-Documentos':0.0}
for i in range(0,29):
    for j in range(0,22):
        if (j!=5):
            if (respuestas[j] == reactivos[i][j+1]):
                aciertos[secciones[j]] = aciertos[secciones[j]] + 1
            else:
                errores[secciones[j]] = errores[secciones[j]] + 1
                temas_estudiar.append(temas[j])
        else:
            aciertos[secciones[j]] = aciertos[secciones[j]] + 1       
    num_aciertos = aciertos['BI']+aciertos['SQL']+aciertos['ETL']+aciertos['BI-Documentos']
    
    o.write("\nNum.examen: ")
    o.write(reactivos[i][0]+'\n') 
    o.write("score: "+str(num_aciertos)+" de 22\n")  
    o.write("Aciertos\n")
    o.write("BI ETL SQL BI-Documentos\n") 
    o.write(str(aciertos['BI'])+" "+str(aciertos['ETL'])+" "+str(aciertos['SQL'])+" "+str(aciertos['BI-Documentos'])+'\n')
    o.write("Errores\n")
    o.write("BI ETL SQL BI-Documentos\n") 
    o.write(str(errores['BI'])+" "+str(errores['ETL'])+" "+str(errores['SQL'])+" "+str(errores['BI-Documentos'])+'\n')
    o.write("Promedios\n")
    o.write("BI ETL SQL BI-Documentos\n") 
    o.write("%.1f"%(float(aciertos['BI'])/float(preguntas_por_seccion['BI']))+" "+"%.1f"%(float(aciertos['ETL'])/float(preguntas_por_seccion['ETL']))+" "+"%.2f"%(float(aciertos['SQL'])/float(preguntas_por_seccion['SQL']))+' '+"%.2f"%(float(aciertos['BI-Documentos'])/float(preguntas_por_seccion['BI-Documentos']))+'\n')
    o.write("Promedio final: "+"%.2f"%(float(num_aciertos)/22.0)+"\n")
    aciertos['BI'] = 0
    aciertos['ETL'] = 0
    aciertos['SQL'] = 0
    aciertos['BI-Documentos'] = 0
    errores['BI'] = 0
    errores['ETL'] = 0
    errores['SQL'] = 0
    errores['BI-Documentos'] = 0
    o.write("Temas a estudiar: \n")
    for i in temas_estudiar:
        o.write(i+",")
    o.write('\n')
    del temas_estudiar[:]
o.close()

Things I learned:
The write function doesn't put a new line character at the end.
http://docs.python.org/release/1.5.1p1/tut/range.html
The range function is not inclusive of its second parameter.
To get a better understanding of formatting operations.
http://docs.python.org/release/2.4.4/lib/typesseq-strings.html
To use the "del" statement to clear a list, i.e. del myList[:]
To use the split function.
To make lists of lists such as:
mlist3 = [[0]*3]*3
http://www.daniweb.com/software-development/python/threads/58916/making-a-list-of-lists
To use dictionaries

Saturday, April 21, 2012

Weave Maker. Rapid Web Application Development

I'm giving Weave Maker a try, I love coding, but coding algorithms ... the logic behind applications, I don't fancy myself as a GUI developer nor I think I would be good at it... anyway:

http://www.wavemaker.com/product/

Here's a screenshot of my hello weavey world web application that only took 2 minutes to build:


Did I mention that it has Apache license?


Monday, April 16, 2012

Saturday, April 14, 2012

Pentaho -Kettle. java options. xulrunner dependency

I stumbled upon an old error I had forgotten. Many applications refer to the xulrunner library in order to work. As its location varies from system to system, in some cases it's necessary to set it "by hand" . This is done through java options, for instance, within your java options you could include something like this:


-Dorg.eclipse.swt.browser.XULRunnerPath=/usr/lib/xulrunner-2/xulrunner

I was having this issue with Pentaho PDI 4.2 (kettle):


/home/avr/Downloads/data-integration
INFO  14-04 00:36:53,711 - Spoon - Logging goes to file:///tmp/spoon_d782dc5d-85f3-11e1-b128-1b67dca12e64.log
WARN  14-04 00:36:55,703 - could not parse [vertical] as Align value
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x01ced76c, pid=5084, tid=3077724992
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) Client VM (21.0-b17 mixed mode, sharing linux-x86 )
# Problematic frame:
# C  [libxul.so+0xd5076c]  NS_InvokeByIndex_P+0x5d90
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/avr/Downloads/data-integration/hs_err_pid5084.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#
./spoon.sh: line 144:  5084 Aborted                 (core dumped) "$_PENTAHO_JAVA" $OPT $STARTUP -lib $LIBPATH "${1+$@}"

Which vanished once the aforementioned option was set.

I dealt with this issue a year ago, but gosh I couldn't remember it up until 3 hours of trial and error. Fuck, lack of sleep shuts down my long term memory!

Thursday, March 29, 2012

Curso Business Intelligence - Material

Hola muchachos, aquí estaré colocando el material necesario para nuestro curso de modo que lo descarguen y lo lleven a la respectiva sesión. Guarden un bookmark a este post para una referencia rápida. El password en todos los casos es el nombre propio de la coordinadora de la carrera de Sistemas Inteligentes seguido del password de sus máquinas virtuales (todo en minúsculas).

#########################################################################################################
Sesión del 31 de Marzo 2012.

Diapositivas

https://rapidshare.com/files/1964967587/diap_0331.zip

Script SQL base de datos dw_imdb:
https://rapidshare.com/files/493712796/dw_imdb.sql.tar.gz

Diagrama entidad relación de la base de datos dw_imdb
https://rapidshare.com/files/3472566836/entidad_relacion_dw_imdb.pdf

No olviden llevar su diagrama entidad relación de la base de datos WCM.

#########################################################################################################

Sesion del 14 de Abril 2012

Perdón por la demora, hemos andado muy atareados.

Transformaciones
https://rapidshare.com/files/1028060750/transformaciones.tar.gz

#########################################################################################################

Friday, March 23, 2012

PostgreSQL - Getting the size of an index

I just stumbled upon this gentle piece of info:
http://postgresql.cc/postgres-index-size


How to Get a Postgres Table’s Indexes’ Size

Run the following SQL command, substituting the name of your table, to get the size of that Postgres table’s index data:

postgres=# select pg_size_pretty(pg_total_relation_size('test_table') - pg_relation_size('test_table'));
 pg_size_pretty
----------------
 24 kB
(1 row)

The function pg_total_relation_size() gets the size of both the row data and indexes for the named table. We subtract the size of the tables row data (pg_relation_size()) to get the size of the table’s indexes. The pg_size_pretty() function then puts the size in readable units (bytes, kilobytes, megabytes or gigabytes).


As I intent to explain tomorrow the importance of surrogate keys this tip will be very useful. =-)

Wednesday, March 14, 2012

Protovis playground - Disregard this please

I'm just playing around with the data visualization library called "protovis". There's a new kid in town called d3.js but I aim to develop my own visualizations with a product that uses protovis. It would be more proper to setup a web page independently but I'm too lazy for that, and I would like to show my intermediate results whilst playing with it, so, please disregard this post at this moment... =-).






-->

PostgreSQL - Generating fake data with Common Table Expressions

I'm filling an orders table with sample data for a sql course. I have two tables: employee (the one who placed the order) and distributor, the task to generate 1,000,000 orders between 2005 and 2012.

WITH RECURSIVE source (counter) AS ( SELECT 1 UNION ALL SELECT counter +1 FROM source where counter < 1000000)
SELECT 
(random() * (select max(employee_id) - min(employee_id) from wcm.employee))::int + (select min(employee_id) from wcm.employee) employee_id,
(random() * (select max(distributor_id) - min(distributor_id) from wcm.distributor))::int + (select min(distributor_id) from wcm.distributor) distributor_id,
( '2005-01-01'::date + ((now()::date - '2005-01-01'::date)::int * random())::int ) order_date
FROM source;


I love CTE's !!!! Don't you?

PostgreSQL - Foreign key checking. Permission denied for schema even as superuser

I was getting the following error:

wcm=> select version();
                                                      version                                                      
-------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
(1 Zeile)

wcm=# delete from employee;
ERROR:  permission denied to the schema wcm
ZEILE 1: SELECT 1 FROM ONLY "wcm"."inventory" x WHERE $1 OPERATOR(pg_...
                            ^
ANFRAGE:  SELECT 1 FROM ONLY "wcm"."inventory" x WHERE $1 OPERATOR(pg_catalog.=) "employee_id" FOR SHARE OF x

The explanation goes as follows:

permission-denied-for-schema-even-as-superuser

That's a foreign key checking query.FK checks are done as the owner of
the target table
, not as the user who did the original query. So your
problem is that the owner of the table lacks permissions
to access the other table (or more specifically, the schema it's in).

Reference:
http://postgresql.1045698.n5.nabble.com/permission-denied-for-schema-even-as-superuser-td4373036.html

Thanks Tom Lane wherever you are, may the force be with you!!

Monday, March 5, 2012

Latex - Beamer + minted + Pygments

The minted package allows you to highlight source code according to its language type by using pygments.
http://tug.ctan.org/tex-archive/macros/latex/contrib/minted/

rene@linux-2561:~> sudo zypper in python-Pygments
rene@linux-2561:~/bin> unzip minted.zip 
Archive:  minted.zip
   creating: minted/
  inflating: minted/minted.dtx       
  inflating: minted/minted.ins       
  inflating: minted/minted.pdf       
  inflating: minted/README           
  inflating: minted/Makefile 
rene@linux-2561:~/bin/minted> python --version
Python 2.7.2
rene@linux-2561:~/bin/minted> make
tex minted.ins
This is TeX, Version 3.1415926 (TeX Live 2011)
(./minted.ins (/usr/share/texmf/tex/latex/base/docstrip.tex
Utility: `docstrip' 2.5d <2005/07/29>
English documentation    <1999/03/31>

**********************************************************
* This program converts documented macro-files into fast *
* loadable files by stripping off (nearly) all comments! *
**********************************************************

********************************************************
* No Configuration file found, using default settings. *
********************************************************

)

Generating file(s) minted.sty 

Processing file minted.dtx (package) -> minted.sty
Lines  processed: 1296
Comments removed: 994
Comments  passed: 0
Codelines passed: 296

 )
No pages of output.
Transcript written on minted.log

rene@linux-2561:~/bin/minted> sudo cp minted.sty /usr/share/texmf/tex/latex/minted/

rene@linux-2561:~> sudo texhash
texhash: Updating /etc/texmf/ls-R... 
texhash: Updating /var/lib/texmf/main/ls-R... 
texhash: Updating /usr/share/texmf/../../../var/lib/texmf/dist/ls-R... 
texhash: Updating /var/cache/texmf/fonts/ls-R... 
texhash: Updating /var/lib/texmf/ls-R... 
texhash: Done.

This is what I've got so far:



Source
\documentclass[slidestop,compress,mathserif]{beamer}
\usepackage{minted}
\usetheme{Antibes}
\usecolortheme{lily}
\title{PostgreSQL}
\author{@darwinian}
\begin{document}
\begin{frame}
\titlepage
\end{frame}
\begin{frame}[fragile]
\begin{itemize}
\item Generating factorials
\end{itemize}
\begin{minted}{sql}
WITH RECURSIVE source (counter,product) AS
 ( SELECT 1,1 UNION ALL
     SELECT counter +1 , product * (counter +1)
         FROM source WHERE counter < 10)
SELECT counter, product FROM source;
\end{minted}
\end{frame}
\end{document}
Other references: http://en.wikibooks.org/wiki/LaTeX/Packages/Installing_Extra_Packages

Sunday, March 4, 2012

Latex.- The powerdot class for presentations

I started working with the powerdot document class. I find it extremely easy to use and powerful.
http://ctan.org/tex-archive/macros/latex/contrib/powerdot

This is my test presentation.


Wich I achieved with the following code

\documentclass{powerdot}
\begin{document}
\title{PostgreSQL replication concepts}
\author{Rene Romero Benavides}
\date{March 4,2012}
\begin{slide}{PostgreSQL}
\begin{itemize}
\item Here is the first line on my slide
\pause
\item And now here's the next
\end{itemize}
\end{slide}
\end{document}

References.
www.bakoma-tex.com/doc/latex/powerdot/powerdot.pdf
http://csgsa.cs.clemson.edu/seminar/tools06/docbook/html/ch02s04.html#id2396476
http://www.miwie.org/presentations/html/powerdot.html
doctorh.umwblogs.org/files/2010/12/courseC.pdf
http://www.dmi.me.uk/blog/2010/11/08/creating-a-presentation-with-latex-and-powerdot/


Mmmm... second guessing, I'll try beamer too.

Friday, March 2, 2012

PostgreSQL - Forcing index scans

I'm following a presentation by Bruce Momjiam titled "Explainng the Postgres Query Optimizer", and I found an exercise where you can force the query optimizer to use index scans (RAF not long ago told me about this too). I don't quite understand the value of doing this as if a value is present in a table several times it could be more costly to carry on an index scan versus a sequential scan, but ... you never now

SET enable_seqscan = false;
SET enable_bitmapscan  = false;

WITH letter (letter, count) AS ( SELECT letter, count(*) from sample GROUP by 1) SELECT letter AS "1", count,(SELECT * FROM lookup_letter(letter) AS "12" limit 1) AS lookup_letter FROM letter ORDER BY 2 DESC

So now it uses index scan each time regardless.

1 | count |                              lookup_letter                               
---+-------+--------------------------------------------------------------------------
 p |   199 | Index Scan using i_sample on sample  (cost=0.00..39.05 rows=199 width=2)
 s |     9 | Index Scan using i_sample on sample  (cost=0.00..21.91 rows=9 width=2)
 c |     8 | Index Scan using i_sample on sample  (cost=0.00..19.64 rows=8 width=2)
 r |     7 | Index Scan using i_sample on sample  (cost=0.00..19.63 rows=7 width=2)
 t |     5 | Index Scan using i_sample on sample  (cost=0.00..15.09 rows=5 width=2)
 d |     4 | Index Scan using i_sample on sample  (cost=0.00..15.07 rows=4 width=2)
 v |     4 | Index Scan using i_sample on sample  (cost=0.00..15.07 rows=4 width=2)
 f |     4 | Index Scan using i_sample on sample  (cost=0.00..15.07 rows=4 width=2)
 _ |     3 | Index Scan using i_sample on sample  (cost=0.00..12.80 rows=3 width=2)
 a |     3 | Index Scan using i_sample on sample  (cost=0.00..12.80 rows=3 width=2)
 u |     3 | Index Scan using i_sample on sample  (cost=0.00..12.80 rows=3 width=2)
 e |     2 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
 i |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
 k |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)

Where in cases of index scanning a value that repeats to a high degree it's related cost is higher too, comparing it to:

RESET ALL;
WITH letter (letter, count) AS ( SELECT letter, count(*) from sample GROUP by 1) SELECT letter AS "1", count,(SELECT * FROM lookup_letter(letter) AS "12" limit 1) AS lookup_letter FROM letter ORDER BY 2 DESC;

 1 | count |                             lookup_letter                             
---+-------+-----------------------------------------------------------------------
 p |   199 | Seq Scan on sample  (cost=0.00..13.16 rows=199 width=2)
 s |     9 | Seq Scan on sample  (cost=0.00..13.16 rows=9 width=2)
 c |     8 | Seq Scan on sample  (cost=0.00..13.16 rows=8 width=2)
 r |     7 | Seq Scan on sample  (cost=0.00..13.16 rows=7 width=2)
 t |     5 | Bitmap Heap Scan on sample  (cost=4.29..12.76 rows=5 width=2)
 v |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
 f |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
 d |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
 a |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
 u |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
 _ |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
 e |     2 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
 i |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
 k |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
(14 filas)

Thursday, March 1, 2012

PostgreSQL - Common Table Expressions

I just stumbled upon this presentation by Bruce Momjian:
http://momjian.us/main/writings/pgsql/cte.pdf
I had heard of these beauties before but never really used them ... they're astonishing!

I'm just going to replicate the given examples :

test=# with source as ( select 1) SELECT * FROM source;
 ?column? 
----------
        1
(1 row)

test=# WITH source AS ( SELECT 1 as col1) SELECT * FROM source;
 col1 
------
    1
(1 row)

test=# WITH source (col1) AS ( SELECT 1 ) SELECT * FROM source;
 col1 
------
    1
(1 row)

test=# WITH source (col2) AS ( SELECT 1 AS col1) SELECT col2 AS col3 FROM source;
 col3 
------
    1
(1 row)

test=# WITH source AS ( SELECT 1,2 ) SELECT * FROM source;
 ?column? | ?column? 
----------+----------
        1 |        2
(1 row)

-- here it starts to get interesting

test=# WITH source AS ( SELECT 1,2), source2 AS ( SELECT 3,4) select * from source UNION ALL SELECT * FROM source2;
 ?column? | ?column? 
----------+----------
        1 |        2
        3 |        4
(2 rows)

test=# WITH source AS ( select lanname, rolname FROM pg_language JOIN pg_roles ON lanowner = pg_roles.oid ORDER BY lanname) SELECT * FROM source UNION ALL SELECT MIN(lanname), NULL FROM source;
  lanname   | rolname  
------------+----------
 c          | postgres
 internal   | postgres
 plpgsql    | postgres
 plpython2u | postgres
 plpythonu  | postgres
 sql        | postgres
 c          | 
(7 rows)

test=# WITH class (oid, relname) AS ( SELECT oid, relname FROM pg_class WHERE relkind = 'r' ) SELECT c.relname, pa.attname FROM pg_attribute pa JOIN  class c ON c.oid = pa.attrelid ORDER BY 1,2 LIMIT 5;
  relname   | attname 
------------+---------
 child_test | cmax
 child_test | cmin
 child_test | col1
 child_test | col2
 child_test | ctid
(5 rows)


--and this is where I give up as I'm extreeeeeemly tired


By the way: cmin,cmax, and ctid are system columns:

cmin

The command identifier (starting at zero) within the inserting transaction.
cmax

The command identifier within the deleting transaction, or zero.
ctid

The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.

http://www.postgresql.org/docs/current/static/ddl-system-columns.html

Another set of statements that I find fascinating:
test=# set statement_timeout = '1s';
SET
test=# WITH RECURSIVE source AS ( select 1 UNION ALL SELECT 1 FROM source ) SELECT * FROM source;
ERROR:  canceling statement due to statement timeout

test=# RESET statement_timeout;
RESET


And this one is absolutely epic:
test=# WITH RECURSIVE source (counter) AS ( SELECT 1 UNION ALL SELECT counter + 1 FROM source WHERE counter < 10) SELECT * FROM source;
 counter 
---------
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
(10 rows)

test=# WITH RECURSIVE source (counter,product) AS ( SELECT 1,1 UNION ALL SELECT counter +1 , product * (counter +1) FROM source WHERE counter < 10) SELECT counter, product FROM source;
 counter | product 
---------+---------
       1 |       1
       2 |       2
       3 |       6
       4 |      24
       5 |     120
       6 |     720
       7 |    5040
       8 |   40320
       9 |  362880
      10 | 3628800
(10 rows)

test=# WITH RECURSIVE source (counter,product) AS ( SELECT 1,1 UNION ALL SELECT counter +1 , product * (counter +1) FROM source WHERE counter < 12) SELECT counter, product FROM source WHERE counter = 12;
 counter |  product  
---------+-----------
      12 | 479001600
(1 row)


Tuesday, February 28, 2012

Data Visualization . Colorbrewer. Vanity coloring your maps

I love this app that helps you choose just the right color combinations for your layers in maps.

http://colorbrewer2.org/

This is how I used it on my first map:


Cool, isn't it? Yay!
P.S. By the way,darker means more violent...