Saturday, February 18, 2012

PostgreSQL - fuzzystrmatch: distance between strings

I think this feature could be useful for data cleansing or, more generally, for any task that involves comparing strings.

http://www.postgresql.org/docs/8.3/static/fuzzystrmatch.html

F.9.2. Levenshtein
This function calculates the Levenshtein distance between two strings:
   levenshtein(text source, text target) returns int
Both source and target can be any non-null string, with a maximum of 255 characters.

test=# Create extension fuzzystrmatch;
CREATE EXTENSION

test=# select levenshtein('john smith','john schmit');
 levenshtein 
-------------
           3
(1 row)
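
To see where that 3 comes from, here is a minimal Python sketch of the classic dynamic-programming way to compute the Levenshtein distance. It is only an illustration of the idea, not the code PostgreSQL actually runs, and the function name simply mirrors the SQL one.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] holds the distance between source[:i] and target[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete the first i characters of source
    for j in range(cols):
        dist[0][j] = j  # insert the first j characters of target
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[-1][-1]


print(levenshtein("john smith", "john schmit"))  # 3
```

One way to get from 'john smith' to 'john schmit' in three edits: insert 'c', insert 'h', and delete the trailing 'h'.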


Calculating the degree of similarity between two words sounds easy... but I'll try that after a long nap. One thing that would be awesome is to somehow implement an efficient auto-complete feature using PostgreSQL...
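
For the similarity part, one common approach (an assumption on my side, there are other reasonable definitions) is to divide the distance by the length of the longer string and subtract the result from 1. The Python sketch below does that; `similarity` is a made-up helper name, and `levenshtein` is a compact stand-in for the SQL function.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance, keeping only two rows of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score: 1.0 for identical strings, 0.0 when every
    position has to change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(round(similarity("john smith", "john schmit"), 3))  # 0.727
```

In SQL the same idea could probably be written as 1 - levenshtein(a, b)::float / greatest(length(a), length(b)), but I haven't run that variant here.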
