Why does Doofinder return different results when I search for "estanterías" and "estanterias"? Isn't Doofinder supposed to clean up those "special characters" in searches?
Let's look at what happens when someone makes a spelling mistake in a search.
There are some filters that are applied to the words when processing the data. We’ll consider two of them:
- Stemming: the process of reducing a word to its root.
- Character cleaning: accented and special characters are replaced with plain ones, e.g. e instead of è, n instead of ñ, a instead of á…
During indexing, character cleaning is done after stemming. So, for instance, the Spanish word estantería is recognized as a dictionary word and the stemming process takes the root estant; when character cleaning then takes place, no special character is left to replace.
Imagine someone mistypes this word and writes estanteria, which is wrong and doesn’t belong to the Spanish dictionary. Stemming only removes the final a (a feminine ending in Spanish), which leaves estanteri as the root.
The first root (estant) will match more words than the second (estanteri). For instance, the first root will match every item containing the word estante, while the second won’t.
Stemming is done before character cleaning because, if it were done afterwards, a lot of words wouldn’t be recognized as Spanish words. In the example above, estantería would be cleaned to estanteria and stemming wouldn’t recognize it.
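The effect of the ordering can be reproduced with a small Python sketch. This is only an illustration, not Doofinder's actual implementation: `toy_stem`, its `SUFFIXES` list, and the other function names are hypothetical, and a real Spanish stemmer uses far richer rules. Character cleaning is approximated here by stripping Unicode combining marks.

```python
import unicodedata

# Hypothetical suffix rules, longest first. A real Spanish stemmer is far
# more elaborate; these are just enough to reproduce the example above.
SUFFIXES = ("erías", "ería", "as", "es", "a", "e", "o")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = unicodedata.normalize("NFC", word)
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def clean_chars(word):
    """Replace accented characters with plain ones (á -> a, ñ -> n)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem_then_clean(word):
    """The order described above: stemming first, then character cleaning."""
    return clean_chars(toy_stem(word.lower()))


def clean_then_stem(word):
    """The reverse order, shown only for comparison."""
    return toy_stem(clean_chars(word.lower()))


print(stem_then_clean("estantería"))  # estant
print(stem_then_clean("estanteria"))  # estanteri
print(stem_then_clean("estante"))     # estant
print(clean_then_stem("estantería"))  # estanteri
```

With stemming first, the correctly spelled word shares a root with estante, while the misspelled one does not. With the reverse order, even the correctly spelled word loses its accent before the stemmer sees it and ends up with the narrower root.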
To fix cases where people commonly mistype a word, synonyms can be used.
Although synonyms won’t behave exactly the same way (because there are some fields where synonyms are not applied), they will improve the results for those misspellings.
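As a rough illustration of the idea, the Python sketch below maps common misspellings to the intended word before the query is processed. The `SYNONYMS` table and the `expand_query` function are hypothetical and only show the concept; they do not describe how Doofinder stores or applies synonyms.

```python
# Hypothetical synonym table: common misspelling -> intended word.
SYNONYMS = {
    "estanteria": "estantería",
    "estanterias": "estanterías",
}


def expand_query(query):
    """Replace known misspellings with the intended word, term by term."""
    terms = query.lower().split()
    return " ".join(SYNONYMS.get(term, term) for term in terms)


print(expand_query("Estanteria blanca"))  # estantería blanca
```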