Monday, January 9, 2012

PostgreSQL - Full text search. First encounter

I'm amazed by the text search capabilities built into PostgreSQL. It can recognize variants of a word and treat them as members of the same family, because it normalizes each word into a common lexeme before comparing.

My default configuration
pgbenchdb=# show default_text_search_config;
 default_text_search_config 
----------------------------
 pg_catalog.english
(1 row)

Still, I'd better pass the configuration explicitly in each query.
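If you do want a different default instead of passing it every time, the setting can be changed for the current session (or persistently for a database with ALTER DATABASE <name> SET default_text_search_config = ...). A minimal sketch, assuming a session where Spanish is the configuration you want; the one-argument forms of to_tsvector and to_tsquery then pick up the session setting:

```sql
-- Change the default text search configuration for this session only
SET default_text_search_config = 'pg_catalog.spanish';
SHOW default_text_search_config;

-- The one-argument forms now fall back to the session default (spanish)
SELECT to_tsvector('amigos') @@ to_tsquery('amigo');

-- Go back to the server/database default
RESET default_text_search_config;
```

Relying on the session default is convenient in psql, but an explicit configuration argument keeps a query's behaviour independent of whoever runs it.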


For instance:

Base word: friend
Variants: friends, friendly

-- in English
pgbenchdb=# SELECT to_tsvector('english','friends') @@ to_tsquery('english','friend');
 ?column?
----------
 t
(1 row)
-- it matches

pgbenchdb=# SELECT to_tsvector('english','friendly') @@ to_tsquery('english','friend');
 ?column?
----------
 t
(1 row)
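The boolean result hides what the normalization actually produced. A small sketch for looking at the lexemes themselves, assuming the built-in english configuration, whose stemming dictionary is english_stem (a Snowball stemmer):

```sql
-- Show the normalized lexemes stored for each variant
SELECT to_tsvector('english', 'friend friends friendly');

-- The query side goes through the same normalization
SELECT to_tsquery('english', 'friendly');

-- Ask the stemming dictionary directly, one word at a time
SELECT ts_lexize('english_stem', 'friends') AS friends,
       ts_lexize('english_stem', 'friendly') AS friendly;
```

If both words come back as the same lexeme, the @@ match above follows directly from that.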


-- in Spanish

pgbenchdb=# SELECT to_tsvector('spanish','amigos') @@ to_tsquery('spanish','amigo');
 ?column? 
----------
 t
(1 row)

pgbenchdb=# SELECT to_tsvector('spanish','amigable') @@ to_tsquery('spanish','amigo');
 ?column? 
----------
 t
(1 row)

-- in German

pgbenchdb=# SELECT to_tsvector('german','Freunde') @@ to_tsquery('german','Freund');
 ?column? 
----------
 t
(1 row)

pgbenchdb=# SELECT to_tsvector('german','Freundin') @@ to_tsquery('german','Freund');
 ?column? 
----------
 f
(1 row)
--it doesn't match!!!???

pgbenchdb=# SELECT to_tsvector('german','freundlich') @@ to_tsquery('german','Freund');
 ?column? 
----------
 f
(1 row)
--it doesn't match!!!???


Well, the German configuration needs further development: its stemmer evidently does not reduce 'Freundin' or 'freundlich' to the same lexeme as 'Freund'. It ain't an easy language, that's a fact.
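Before blaming the configuration, it helps to see which dictionary handled each token and what lexeme came out. A sketch using ts_debug with the built-in german configuration, plus a prefix query (the :* suffix in to_tsquery) as a possible workaround. This is an assumption on my part, not a fix for the stemmer: 'Freund:*' matches any lexeme starting with "freund", so it can also catch words you did not intend.

```sql
-- Which dictionary handled each token, and what lexeme came out?
SELECT token, dictionaries, lexemes
FROM ts_debug('german', 'Freund Freunde Freundin freundlich');

-- Possible workaround: prefix matching on the query side
SELECT to_tsvector('german', 'Freundin') @@ to_tsquery('german', 'Freund:*');
SELECT to_tsvector('german', 'freundlich') @@ to_tsquery('german', 'Freund:*');
```

Prefix matching trades precision for recall, so it is worth checking against real data before relying on it.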

Here is a very good overview of different full-text search options, including this one:
http://www.slideshare.net/billkarwin/full-text-search-in-postgresql

I'll look through the PostgreSQL Extension Network (PGXN); I might find something useful there.
