Finding the right place for semantic search

Semantic search is a fantastic technology, if used correctly.  I am not talking about the users of semantic search technology; I am talking about the technology vendors that make it part of a system.

I was inspired to write this post after reading the article “Why Do So Many ATS Vendors Offer Poor Search Capability” by Glen Cathey (The Boolean Black Belt).  The article made me think about search engines (Google, Yahoo, etc.) and how semantic search is being used with them.

What is semantic search?  To put it simply: semantic search can take, as input, a word like “Java” and offer up other related terms like “J2EE” or “Beans” (both are related to Java).  This allows the user to type in a few terms but match many, many terms.

The matching terms live in an “expert system” that is continually expanded over time.  Many fancy names are given to these systems, based on how they are built, but basically they are sets of rules.
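To make the “sets of rules” idea concrete, here is a minimal sketch in Python.  The rule table and the function name are my own illustration, not any vendor's actual engine; a real system would hold far more rules and would build them from data.

```python
# Hypothetical rule table for illustration: each key maps to terms
# that the "expert system" treats as related to it.
RELATED_TERMS = {
    "java": ["j2ee", "beans"],
}


def expand(term, rules=RELATED_TERMS):
    """Return the term itself plus any related terms found in the rule table."""
    key = term.lower()
    return [key] + rules.get(key, [])


print(expand("Java"))   # the input term followed by its related terms
print(expand("COBOL"))  # no rule for this term, so only the term itself comes back
```

The point of the sketch is only that the user types one term and the rules turn it into several.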

Semantic search is not AI (artificial intelligence).  If you hear that, it probably started in a marketing department somewhere.

Companies that have built semantic search engines, while they have not created AI, have spent a tremendous amount of time and resources to build these sets of rules.  The better engines can build rules on the fly from a new set of data, like resumes.  This is very cool stuff.

Overall, I like semantic search.  It has great potential, but it also has great weaknesses if used incorrectly.  If built into the engine itself, semantic search can be very powerful, because the semantic processing is done on the search engine side, without any limitations or constraints.  However, if bolted onto a search engine, it can do more harm than good.

Here is what I mean.  I’ll try to keep my logic simple.

1. The Google search engine has a limit on how many terms can be submitted to it.

2. Semantic search, by its nature, creates permutations upon given terms. For example:

“Senior VP of Sales”  can be “SVP Sales” or “Senior Vice President of Sales”

Translated into a Boolean expression, that becomes:

“senior vp of sales” OR “SVP sales” OR “senior vice president of sales”

3.  After creating permutations upon several concepts, you are out of search terms.
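The three steps above can be sketched in a few lines of Python.  The limit of 32 and the way terms are counted (one per word) are assumptions for illustration only; the real figure and counting rule depend on the search engine.  The function names are hypothetical.

```python
# Assumed term limit, for illustration only; check the engine you are using.
TERM_LIMIT = 32


def or_group(variants):
    """Join phrase variants into one quoted Boolean OR group."""
    return " OR ".join(f'"{v}"' for v in variants)


def count_words(variants):
    """Count the individual words across all variants of one concept."""
    return sum(len(v.split()) for v in variants)


def build_query(concepts, limit=TERM_LIMIT):
    """AND together one OR group per concept; fail if the word budget is exceeded."""
    total = sum(count_words(variants) for variants in concepts)
    if total > limit:
        raise ValueError(f"{total} words exceeds the limit of {limit}")
    return " AND ".join(f"({or_group(variants)})" for variants in concepts)


title = ["senior vp of sales", "SVP sales", "senior vice president of sales"]

print(or_group(title))     # the Boolean expression for one concept
print(count_words(title))  # words consumed by that single concept
```

Under these assumptions one job title already uses 11 words, so two more concepts expanded the same way would push the query past the limit and `build_query` would raise an error.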

I’m a big believer in laws (maybe not speed-limit laws), more the “laws of the universe” type of stuff.  I like to understand and deconstruct the rules and see whether each one stands alone or whether I need to recheck my premises.  In this spirit, just before the first SourceCon conference, I developed the Seven Laws of Internet Research.  I felt there was too much emphasis on memorizing search strings and the latest search engines or sites, and not enough fundamental thought leadership on how to think about searching the Internet.

The first two laws are:

1. The Law of Permutation
2. The Law of Completeness

The Law of Permutation simply states that when searching the Internet, as it is not a homogeneous source of data, you must describe what you are looking for in the language of the many vs. the language of the one.  (YES, this is what Semantic search is doing).

The Law of Completeness states that you must strive for completeness of search engine results in order to get a superior outcome.

Big Question:  What happens if semantic search is applied before you reach completeness of results?

Answer:  Missing data. Competitors eat your lunch.  If you are a salesperson, it means missed sales leads; if you are a recruiter, it means missed resumes or passive candidates.

Does this mean that I am anti-semantic search?  No way.  I think it has great potential.

Here are my take-aways:

-Semantic search should be inside the search engine for optimal results.

-Semantic search bolted onto a standard search engine is severely limited.

-Semantic search will cause data to be missed if applied before reaching completeness of possible results.

-When combining a standard search engine and semantic search, it is best to apply the semantic processing AFTER completeness of data has been reached.  In reality, this would not be semantic search, but semantic filtering.
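
Here is a rough Python sketch of what I mean by semantic filtering.  It assumes the complete result set has already been collected with plain, broad searches; the sample results and the function name are made up for illustration, and simple substring matching stands in for whatever a real semantic engine would do.

```python
def semantic_filter(documents, variants):
    """Keep documents that mention any variant of the concept (case-insensitive)."""
    lowered = [v.lower() for v in variants]
    return [doc for doc in documents if any(v in doc.lower() for v in lowered)]


# Hypothetical results, gathered first without any semantic expansion.
results = [
    "Jane Doe, Senior Vice President of Sales",
    "John Roe, SVP Sales, Acme",
    "Sam Poe, Software Engineer",
]

variants = ["senior vp of sales", "SVP sales", "senior vice president of sales"]

matches = semantic_filter(results, variants)
print(matches)  # only the documents that mention one of the variants
```

Because the variants are applied locally, after retrieval, they never count against the search engine's term limit.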