Yesterday I went to a specialty retail store to pick up a few things. This store has a loyalty program with some pretty cool rewards. I don’t shop there very often, but my friend Elizabeth does. I asked if I could put the points I earned for my purchase on her loyalty card. They were happy to do so and offered to look up her loyalty number in the computer. The clerk searched by her full name and birthday, but she didn’t show up. After a minute or so it dawned on me that she might be listed under “Liz.” Sure enough, there she was. That’s the thing about humans: we often call the same thing by many different names. Elizabeth could have been Lizzy, Lizzie, Beth, Betty, Liza, and so on. I was sharp enough to know that Elizabeth might be Liz, but the software wasn’t. If your media monitoring solution works as literally as this point-of-sale system does, you could be missing relevant media mentions of your clients or brand.
Information Retrieval Challenges
The accuracy of your media monitoring solution depends on its ability to search millions of documents and return all, and only, relevant results. Most solutions rely on something called Boolean search. Named for George Boole, the mathematician who laid out the underlying logic in 1847 (yes, 1847), Boolean search lets users combine keywords with operators such as AND, OR and NOT to narrow or broaden the results. For example, a Boolean search could be "taxi service" AND "San Francisco". This would limit the search results to only those documents containing both phrases.
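To make the AND example concrete, here is a minimal Python sketch of a literal Boolean search. The three documents and the helper names (`contains`, `boolean_and`) are invented for illustration; a real monitoring product would use a search index rather than scanning text like this.

```python
# Toy document collection; the texts are made up for illustration.
documents = {
    1: "A new taxi service launched in San Francisco this week.",
    2: "San Francisco restaurants are booming.",
    3: "The taxi service in Oakland raised its fares.",
}


def contains(text, phrase):
    """Case-insensitive literal phrase match."""
    return phrase.lower() in text.lower()


def boolean_and(docs, *phrases):
    """Return the IDs of documents that contain every phrase."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(contains(text, phrase) for phrase in phrases)
    ]


# "taxi service" AND "San Francisco"
matches = boolean_and(documents, "taxi service", "San Francisco")
print(matches)  # [1]
```

Only the first document survives, because it is the only one that contains both phrases exactly as typed.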
Although Boolean is useful in many contexts, when it comes to media monitoring, there are some significant challenges:
- Synonyms: As illustrated by the Elizabeth/Liz situation, many different words can refer to the same thing or concept. For example, synonyms of the word “plain” include basic, unadorned, pure, bare, simple, ordinary, and more. Manually writing a search that covers every synonym of every keyword is rarely practical.
- Polysemy: Some words have more than one meaning. The word “smart” might mean intelligent or well dressed. It might also refer to what happens to your arm after a flu shot. A Boolean search cannot tell which of the documents containing the word "smart" use it in the sense you intend.
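Both problems show up in a few lines of Python. This is a rough sketch with made-up documents; the `ALIASES` table is a hand-written assumption built from the nicknames listed earlier, which is exactly the kind of manual upkeep that does not scale.

```python
import re

# Made-up documents for illustration.
documents = {
    1: "Elizabeth Smith opened a new bakery downtown.",
    2: "Liz Smith opened a second bakery location.",
    3: "My arm will smart after the flu shot.",
    4: "She is a smart investor.",
}

# Hand-maintained nickname list (an assumption for this sketch).
ALIASES = {
    "elizabeth": ["Elizabeth", "Liz", "Lizzy", "Lizzie", "Beth", "Betty", "Liza"],
}


def literal_search(docs, keyword):
    """Return IDs of documents containing the keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [doc_id for doc_id, text in docs.items() if pattern.search(text)]


def expanded_search(docs, keyword):
    """Search for the keyword plus any hand-listed aliases."""
    terms = ALIASES.get(keyword.lower(), [keyword])
    found = set()
    for term in terms:
        found.update(literal_search(docs, term))
    return sorted(found)


# Synonyms: the literal search misses the "Liz" document.
print(literal_search(documents, "Elizabeth"))   # [1]
print(expanded_search(documents, "Elizabeth"))  # [1, 2]

# Polysemy: both senses of "smart" come back, wanted or not.
print(literal_search(documents, "smart"))       # [3, 4]
```

The alias table fixes the first problem only for names someone remembered to list, and nothing in a keyword match can separate the two senses of "smart".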
Natural Language Processing
Natural language processing (NLP) is the ability of software to understand human language the way people actually speak and write it. Modern NLP typically relies on machine learning, a type of artificial intelligence that finds patterns in data and uses them to improve a program's own understanding. In our loyalty program example, the software would learn from the experience that Elizabeth and Liz are the same person. The next time a clerk searched for Elizabeth, the Liz result would be returned. Solutions that use natural language processing become smarter at making relevant connections over time and eventually speak the same language as the user. In terms of media monitoring, this means fewer false negatives (relevant mentions that are missed) and fewer false positives (irrelevant results that are returned), leaving you with accurate, relevant results.
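False negatives and false positives can be counted with two standard retrieval measures, recall and precision. The sketch below shows the arithmetic only; the document IDs and both result sets are hypothetical numbers chosen for illustration, not measurements of any product.

```python
def precision_recall(returned, relevant):
    """Precision: share of returned documents that are relevant.
    Recall: share of relevant documents that were returned."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: documents 1-4 truly mention the client.
relevant = {1, 2, 3, 4}

# Hypothetical literal keyword search: misses 3 and 4, returns 7 and 8 in error.
literal_results = {1, 2, 7, 8}

# Hypothetical language-aware search: finds all four, one stray result.
language_aware_results = {1, 2, 3, 4, 7}

print(precision_recall(literal_results, relevant))         # (0.5, 0.5)
print(precision_recall(language_aware_results, relevant))  # (0.8, 1.0)
```

A missed mention lowers recall and a stray result lowers precision, so tracking both on a sample of your own coverage is one way to compare solutions.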
When evaluating media monitoring solutions, look for those that leverage natural language processing. Although many vendors try to obscure the point, keep in mind that natural language input and natural language processing are not the same: a search box that accepts a query typed in plain English does not necessarily understand the language in the documents it searches. The best PR solutions add sophisticated big data analysis to natural language search to give users clarity and insight.