Tuesday, January 10, 2012

PostgreSQL - Full text search. Installing a new German dictionary

It really startled me that PostgreSQL could not tell that 'Freundin' belongs to the same word family as 'Freund'. I looked it up, and it turned out that no appropriate German dictionary was installed on my system.

Step1. Install the dictionary.
Main website:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
File:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz

Extract it and place it into this directory, or the equivalent on your system (the tsearch_data directory under the path printed by pg_config --sharedir):
/usr/pgsql-9.1/share/tsearch_data

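Step2. We'll need to do some encoding conversion. PostgreSQL reads its dictionary files as UTF-8 and expects the extensions .dict, .affix and .stop, so I convert the files from ISO8859-1 and rename them in one go: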

[root@localhost german_compound]# iconv -f ISO8859-1 -t utf8 german.dic > ../german.dict
[root@localhost german_compound]# iconv -f ISO8859-1 -t utf8 german.aff > ../german.affix
[root@localhost german_compound]# iconv -f ISO8859-1 -t utf8 german.stop > ../german.stop

Reference:
http://archives.postgresql.org/pgsql-general/2009-02/msg00121.php


Step3. Follow this configuration guide:

http://domas.monkus.lt/full-text-search-postgresql
I named my configuration 'de'.
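
For reference, here is a minimal sketch of what such a configuration might look like, assuming the three converted files from Step2 sit in tsearch_data as german.dict, german.affix and german.stop. The dictionary name german_ispell is a hypothetical name picked for illustration, and the linked guide may do things differently:

```sql
-- Register the ispell dictionary. DictFile, AffFile and StopWords take the
-- base file name only; PostgreSQL appends .dict, .affix and .stop itself.
-- "german_ispell" is an assumed name, not something from the linked guide.
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,
    DictFile  = german,
    AffFile   = german,
    StopWords = german
);

-- Start from the built-in German configuration so the parser and the
-- mappings for numbers, URLs, e-mail addresses etc. are inherited.
CREATE TEXT SEARCH CONFIGURATION de (COPY = pg_catalog.german);

-- For word-like tokens, try the ispell dictionary first and fall back to
-- the built-in Snowball stemmer for words the ispell dictionary does not know.
ALTER TEXT SEARCH CONFIGURATION de
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH german_ispell, german_stem;
```

Text search dictionaries and configurations are per-database objects, so this would have to be run in every database where the 'de' configuration is needed.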

Step4. Test the results.

postgres=# SELECT to_tsvector('de','Freundin') @@ to_tsquery('de','Freund');
 ?column? 
----------
 t
(1 row)

postgres=# SELECT to_tsvector('de','Studentin') @@ to_tsquery('de','Student');
 ?column? 
----------
 t
(1 row)

postgres=# SELECT to_tsvector('de','Freunde') @@ to_tsquery('de','Freund');
 ?column? 
----------
 t
(1 row)

postgres=# SELECT to_tsvector('de','freundlich') @@ to_tsquery('de','Freund');
 ?column? 
----------
 t
(1 row)
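
If one of these matches comes back 'f', it helps to see which dictionary handled the word and which lexemes it produced. A small sketch using the built-in ts_debug function against the 'de' configuration from Step3 (the output depends on the installed dictionary files, so none is shown here):

```sql
-- Show, for each token, the dictionary that recognized it and the
-- lexemes it was reduced to under the 'de' configuration.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('de', 'Freundin');

-- Compare the two sides of the match directly: the @@ operator only
-- returns true if the query lexeme appears in the tsvector.
SELECT to_tsvector('de', 'Freundin') AS document,
       to_tsquery('de', 'Freund')    AS query;
```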


Now I'll add a good Spanish one. So it seems I'll be able to sleep tonight, at least once I've played around with PITR a little bit.
