Sunday, April 28, 2013

Sunday project. My desperate dictionary "attack"

I was stuck on one word in this game:
https://play.google.com/store/apps/details?id=de.lotum.whatsinthefoto.es
You look at 4 pictures and guess the word that connects them. I couldn't get any further, so I decided to use brute force.

Objective: Generate every permutation of a given size (5) from the given letters (12) and match each one against a dictionary.
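
To get a feel for the size of the search space, here is a minimal Python sketch of the same idea outside the database. The function names and the three-word sample dictionary are made up for illustration; the real run uses the full word list loaded into PostgreSQL.

```python
import itertools

def candidate_words(letters, size):
    """Return the distinct strings formed by ordering `size` of the given letters."""
    # A set collapses the duplicates caused by repeated letters
    # (the rack used in this post contains three 'n's).
    return {"".join(p) for p in itertools.permutations(letters.lower(), size)}

def match_words(letters, size, dictionary):
    """Return the sorted candidates that also appear in `dictionary` (a set of words)."""
    return sorted(candidate_words(letters, size) & dictionary)

# Hypothetical three-word dictionary, only to show the call.
sample_dictionary = {"curso", "junto", "perro"}

# 12 * 11 * 10 * 9 * 8 raw orderings before duplicates are removed.
raw_count = sum(1 for _ in itertools.permutations("srtkjncnmonu", 5))
print(raw_count)  # 95040
print(match_words("srtkjncnmonu", 5, sample_dictionary))  # ['curso', 'junto']
```

So the brute force means up to 95040 lookups, fewer once the repeated orderings are dropped.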


darwin@evolution:~/spwords> wget http://www.insidepro.com/dictionaries/Spanish.rar
darwin@evolution:~/spwords> unrar x Spanish.rar


darwin@evolution:~/spwords> file Spanish.dic
Spanish.dic: ISO-8859 text, with CRLF line terminators

The database uses a Unicode (UTF-8) encoding, but the file is ISO-8859 with CRLF line endings, so let's convert it first.

darwin@evolution:~/spwords> iconv -f ISO-8859-1 -t UTF-8 Spanish.dic > Spanish.dic.unicode
darwin@evolution:~/spwords> dos2unix Spanish.dic.unicode
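
For reference, the two commands above do roughly what this small Python sketch does: decode the bytes as ISO-8859-1, turn CRLF into LF, and re-encode as UTF-8. The function name and the two sample words are only illustrative, not part of the original workflow.

```python
def latin1_crlf_to_utf8_lf(raw):
    """Decode ISO-8859-1 bytes, normalise CRLF to LF, and re-encode as UTF-8."""
    text = raw.decode("iso-8859-1")       # the job iconv -f ISO-8859-1 does
    text = text.replace("\r\n", "\n")     # the job dos2unix does
    return text.encode("utf-8")           # the job iconv -t UTF-8 does

# Two sample words with accents and CRLF endings, as they would sit in the .dic file.
sample = "ca\u00f1\u00f3n\r\nmu\u00f1\u00f3n\r\n".encode("iso-8859-1")
converted = latin1_crlf_to_utf8_lf(sample)
print(converted)
```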


I'm using my favorite RDBMS (PostgreSQL) for the word matching.

postgres=# CREATE TABLE words(id serial primary key, word varchar, word_unnacented varchar);
NOTICE:  CREATE TABLE will create implicit sequence "words_id_seq" for serial column "words.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "words_pkey" for table "words"
CREATE TABLE
postgres=# \copy words(word) FROM 'Spanish.dic.unicode'
postgres=# select count(*) from words;
 count
--------
 413527


postgres=# CREATE EXTENSION unaccent;
CREATE EXTENSION
postgres=# UPDATE words set word_unnacented = unaccent(word);
UPDATE 413527
postgres=# CREATE INDEX words_word_idx ON words(word_unnacented varchar_pattern_ops);
CREATE INDEX
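
The game shows letters without accents, which is why the lookups go against the unaccented column. As a rough illustration of what that column holds, here is a Python approximation. Note that PostgreSQL's unaccent extension works from its own rules file, so this Unicode-decomposition sketch (with a made-up function name) is an approximation and not the same implementation.

```python
import unicodedata

def strip_accents(word):
    """Approximate unaccent(): decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("moscú"))  # moscu
print(strip_accents("muñón"))  # munon
```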

postgres=# create extension plpython2u;

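CREATE OR REPLACE FUNCTION match_word(letters varchar,len int) RETURNS TABLE (match varchar)
AS $$
import itertools
# Prepare the lookup once and pass each candidate as a parameter instead of
# interpolating it into the SQL string.
plan = plpy.prepare("SELECT word_unnacented FROM words WHERE word_unnacented = $1", ["varchar"])
result = []
# A set skips the duplicate candidates produced by repeated letters.
for candidate in set(itertools.permutations(letters.lower(), len)):
        rs = plpy.execute(plan, [''.join(candidate)])
        result += [r['word_unnacented'] for r in rs]
return result
$$ LANGUAGE plpython2u;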

postgres=# select distinct match_word('srtkjncnmonu',5) order by 1;
match_word 
------------
 comun
 cornu
 crujo
 cruos
 cujon
 cunos
 curso
 curto
 cutos
 junco
 junos
 junto
 jurco
 juros
 justo
 moscu
 munon
 muros
 murto
...

etc.

It took 10 seconds to yield the results (Intel Pentium Dual Core).
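
Most of that time goes into issuing one lookup per permutation. A different technique, not the one used in this post, is to scan the dictionary once and keep the words whose letter counts fit the available letters. A minimal sketch, assuming the words are already unaccented and held in memory (the function names and the sample list are hypothetical):

```python
from collections import Counter

def can_spell(word, letters):
    """True if `word` can be built from `letters`, using each letter at most once."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means the rack covers every letter the word needs.
    return not (Counter(word) - Counter(letters))

def scan_dictionary(words, letters, size):
    """Return the sorted, distinct words of length `size` spellable from `letters`."""
    rack = letters.lower()
    return sorted({w for w in words if len(w) == size and can_spell(w, rack)})

# Hypothetical sample list standing in for the full dictionary.
sample_words = ["curso", "junto", "perro", "cursos", "munon"]
print(scan_dictionary(sample_words, "srtkjncnmonu", 5))  # ['curso', 'junto', 'munon']
```

This does one pass over the word list instead of one query per ordering, at the cost of not using the index.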

It turns out that the correct word, which I found through this "attack", was "curso", which in my opinion had nothing to do with the pictures. Now I can continue playing =-) .