What linguistic features does modern search need?

Do you like how Google fixes your typos, understands what you mean and shows you interesting information? Let’s talk about the linguistic aspects of search, separately from relevance or smart ordering of products based on machine learning and data analysis. What linguistic features do you need in your store search?

Spellchecking and error tolerance

Yes, we all make typos and little spelling errors, especially on a mobile phone, where you have to tap tiny letters on a small screen with a finger. But what can be more annoying than a search that doesn’t fix them and doesn’t suggest the right things? (A little spoiler: in some languages there is one even more annoying thing, and we’ll cover it later.) Search needs to understand and handle this. On the other hand, the store itself may have typos in its texts, and we’d like to find those items anyway.
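One common way to tolerate typos is to compare the query with known words by edit distance. The sketch below is only an illustration, not a description of any particular product: the vocabulary, the function names and the threshold of two edits are assumptions. It uses the optimal string alignment distance, which counts a swap of two neighbouring letters as a single error.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and swaps of two adjacent letters each cost 1."""
    prev2 = None                      # row i-2 of the dynamic programming table
    prev = list(range(len(b) + 1))    # row i-1 (initially row 0)
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            best = min(prev[j] + 1,          # deletion
                       cur[j - 1] + 1,       # insertion
                       prev[j - 1] + cost)   # substitution or match
            # Adjacent transposition, e.g. "ia" typed instead of "ai".
            if prev2 is not None and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                best = min(best, prev2[j - 2] + 1)
            cur.append(best)
        prev2, prev = prev, cur
    return prev[-1]


def suggest(query: str, vocabulary, max_distance: int = 2):
    """Return vocabulary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(query.lower(), w), w) for w in vocabulary)
    return [w for d, w in scored if d <= max_distance]


# Hypothetical store vocabulary, for illustration only.
VOCABULARY = ["armchair", "table", "sofa", "lamp"]
print(suggest("armchiar", VOCABULARY))
```

Applying the same check when indexing helps with the opposite case, where the typo is in the store's own texts rather than in the query.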

Shop-specific dictionaries

Spell check is a useful feature, but it also needs to properly handle known words that are specific to the store’s domain. You should look professional and suggest only meaningful corrections to your visitors.
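To see why a shop-specific dictionary matters, here is a small sketch built on Python's standard `difflib.get_close_matches`. The word lists, the `correct` function and the 0.75 similarity cutoff are assumptions made up for this example. Without the shop terms, a valid product name gets "corrected" into an ordinary word; with them, it is left alone and even helps fix real typos.

```python
from difflib import get_close_matches

# Hypothetical word lists, for illustration only.
GENERAL_WORDS = ["think", "pad", "laptop", "screen"]
SHOP_TERMS = ["thinkpad", "chromebook"]


def correct(word, general_words, shop_terms=()):
    """Return the word unchanged if it is known, else the closest known word."""
    word = word.lower()
    vocabulary = list(general_words) + list(shop_terms)
    if word in vocabulary:
        return word                      # known word: never "fix" it
    matches = get_close_matches(word, vocabulary, n=1, cutoff=0.75)
    return matches[0] if matches else word


# Only the general dictionary: the product name is treated as a typo.
print(correct("thinkpad", GENERAL_WORDS))
# With shop terms: the name is kept, and a real typo maps to it.
print(correct("thinkpad", GENERAL_WORDS, SHOP_TERMS))
print(correct("thinkpd", GENERAL_WORDS, SHOP_TERMS))
```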

Natural language

We’re not machines that search only for “a leather armchair”. We would like to search for “comfortable armchair for balcony”, and search should understand us. It should find armchairs, go through the product descriptions, find which of them are comfortable and, from the product parameters, work out which ones suit a balcony.

Inflections

In some languages words change a lot through suffixes and prefixes, according to grammatical categories such as tense, case, gender and aspect. A really smart search has to handle these variations both in the searchable content and in the user query. For this, at Kea Labs we use various techniques, such as algorithmic stemming and dictionary-based text processing. We tune them for each specific language, which makes search less dependent on word forms.
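As a rough illustration of the two ideas mentioned here, the toy example below combines a tiny dictionary of irregular forms with naive suffix stripping for English. It is not the Kea Labs implementation; the word list, the suffixes and the `normalize` name are assumptions, and a real stemmer has far more rules per language.

```python
# Dictionary part: irregular forms that no simple rule can handle.
IRREGULAR = {"shelves": "shelf", "mice": "mouse", "knives": "knife"}

# Algorithmic part: a deliberately tiny list of English suffixes.
SUFFIXES = ("ing", "ed", "s")


def normalize(word: str, min_stem: int = 3) -> str:
    """Map a word form to a shared base form (toy English rules)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least min_stem letters so short words like "bus" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# The same function is applied to indexed text and to the query,
# so different forms of a word end up under one key.
print(normalize("Armchairs"), normalize("painted"), normalize("shelves"))
```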

Compound words

In many European languages, such as German, Dutch, Norwegian and Finnish, words are concatenated into long compound words. This is incredibly difficult for search, as it not only has to understand the constituents, but also needs to find content whether visitors search for the concatenated word or for a phrase of separate words.
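A simple way to picture this is dictionary-based decompounding: try to cut the long word into parts that are all known words. The sketch below assumes a tiny German lexicon and hypothetical function names, and it ignores linking letters between parts, which real German compounds often have.

```python
# Hypothetical lexicon of known simple words, for illustration only.
LEXICON = {"leder", "sessel", "garten", "stuhl"}


def split_compound(word, lexicon, min_part=3):
    """Split a word into known parts; return [] if no full split exists."""
    word = word.lower()
    if word in lexicon:
        return [word]
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_part)
            if rest:
                return [head] + rest
    return []


def tokens(query, lexicon):
    """Tokenize a query so compounds and separate words look the same."""
    result = []
    for token in query.lower().split():
        result.extend(split_compound(token, lexicon) or [token])
    return result


# Both spellings produce the same tokens, so they find the same content.
print(tokens("Ledersessel", LEXICON))
print(tokens("Leder Sessel", LEXICON))
```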

Diaeresis/diacritic symbols

When I said nothing can be more annoying than a search that doesn’t handle typos, I was wrong. The most obnoxious is a search that requires exact matching of diacritics, in letters like ä, č or õ. If you, as a tourist, have ever tried to find a flight to, let’s say, Tromsø and didn’t find any options, then you know how little thought some searches give to people who don’t even have such letters on their keyboards. For many European languages this feature is essential.
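A usual approach is to fold diacritics away in both the indexed text and the query, so that "Tromso" and "Tromsø" meet under the same key. The sketch below uses Python's standard `unicodedata` module. Note that some letters, such as ø or ł, do not decompose into a base letter plus an accent, so they need an explicit table; the table here is a small assumed sample, not a complete list.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# This is a small sample, not a complete table.
EXTRA_FOLDS = str.maketrans({"ø": "o", "Ø": "O", "ł": "l", "Ł": "L",
                             "đ": "d", "Đ": "D"})


def fold_diacritics(text: str) -> str:
    """Remove accents so that 'ä', 'č', 'õ' match 'a', 'c', 'o'."""
    # NFD splits a letter like 'ä' into 'a' plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.translate(EXTRA_FOLDS)


print(fold_diacritics("Tromsø"))
print(fold_diacritics("äčõ"))
```

Folding should stay language-aware in practice: in some languages a letter with a diacritic is a different letter, so an exact match is usually ranked above a folded one.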

Different keyboard layouts

Another tricky part is the variety of keyboard layouts. In some countries, like Germany or the Czech Republic, the letters Z and Y are swapped. A visitor shouldn’t have to care which layout they are typing on, whether it’s a classic QWERTY with Z in the bottom-left corner or a German one with Y in that place. And what about Cyrillic layouts, like in Russia? Search should be proactive and try the query in the other layout if it didn’t get good enough results for the original query.
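The fallback described above can be sketched as a key-by-key translation between layouts. The example covers the Y/Z swap and the standard Russian layout for lowercase letters only. The `search` argument is a hypothetical callable that returns a list of results, and the `min_results` threshold is an assumption about what "good enough" means.

```python
# Keys in the same physical positions on the US and Russian layouts.
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
JCUKEN = "йцукенгшщзхъфывапролджэячсмитьбю"

LAT_TO_CYR = str.maketrans(QWERTY, JCUKEN)
CYR_TO_LAT = str.maketrans(JCUKEN, QWERTY)
SWAP_YZ = str.maketrans("yzYZ", "zyZY")   # QWERTY <-> QWERTZ


def layout_variants(query):
    """Return the query as it would read if typed on another layout."""
    variants = [query.translate(SWAP_YZ),
                query.translate(LAT_TO_CYR),
                query.translate(CYR_TO_LAT)]
    return [v for v in variants if v != query]


def search_with_fallback(query, search, min_results=1):
    """Retry on other layouts when the original query finds too little."""
    original = search(query)
    if len(original) >= min_results:
        return original
    for variant in layout_variants(query):
        results = search(variant)
        if len(results) >= min_results:
            return results
    return original


# Toy catalogue standing in for a real search backend.
CATALOG = {"стол": ["Desk"], "anzug": ["Suit"]}
print(search_with_fallback("cnjk", lambda q: CATALOG.get(q, [])))
```

Here "cnjk" is what comes out when someone types "стол" without switching the keyboard to Russian.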

Synonyms

Search will be more efficient if it understands synonyms. Ideally, it should handle not only the standard dictionaries for a language, but also custom synonyms. In Kea Labs search you can configure synonyms and abbreviations specific to your store’s domain.
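One simple way to use synonyms is query expansion: every query word is replaced by the whole group of words that mean the same thing. The groups and function names below are made up for illustration and do not reflect the actual Kea Labs configuration format.

```python
# Hypothetical synonym groups, including one abbreviation.
SYNONYM_GROUPS = [
    {"sofa", "couch", "settee"},
    {"tv", "television"},
]


def build_lookup(groups):
    """Map every term to the full group it belongs to."""
    lookup = {}
    for group in groups:
        for term in group:
            lookup[term] = set(group)
    return lookup


def expand_query(query, lookup):
    """Return, for each query word, the set of acceptable alternatives."""
    return [lookup.get(token, {token}) for token in query.lower().split()]


LOOKUP = build_lookup(SYNONYM_GROUPS)
# A product matches if it contains one word from every set.
print(expand_query("TV couch", LOOKUP))
```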

Thanks to the efforts of Google and other search companies, people’s expectations of search have grown significantly. Search in your store should match them and understand even the complex queries of your visitors. This is essential, especially for complicated languages. The linguistic module of Kea Labs Search was designed for advanced linguistic processing and makes it possible to find the desired items even for complicated and irregular terms.