Archive for the “Data Mining” Category

Semantic search is a fantastic technology, if used correctly.  I am not talking about users of semantic search technology; I am talking about the technology vendors that make it part of a system.

I was inspired to write this blog after reading the article “Why Do So Many ATS Vendors Offer Poor Search Capability” by Glen Cathey (The Boolean Black Belt).  The article made me think about search engines (Google, Yahoo, etc.) and how semantic search is being used with them.

What is semantic search?  To put it simply: semantic search can take, as input, a word like “Java” and offer up other related terms like “J2EE” or “Beans” (both are related to Java).  This allows the user to type in a few terms but match many, many terms.

The matching terms are built into an “expert system” that is continually built over time.  Many fancy names are given to these systems, based on how they are built, but basically they are sets of rules.
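
As a rough illustration of the “sets of rules” idea, here is a minimal Python sketch.  The rule table and the function name expand_term are hypothetical, invented for this example; a real engine would hold far more rules and, in the better cases, build them from data.

```python
# Hypothetical rule table: each key maps to terms that have been judged related.
# The "java" entry mirrors the example in the text; a real system holds many more.
RELATED_TERMS = {
    "java": ["J2EE", "Beans"],
}


def expand_term(term):
    """Return the term itself followed by any related terms from the rule table."""
    related = RELATED_TERMS.get(term.lower(), [])
    return [term] + related


print(expand_term("Java"))    # ['Java', 'J2EE', 'Beans']
print(expand_term("Python"))  # ['Python'] (no rule, so nothing is added)
```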

Semantic search is not AI (artificial intelligence).  If you hear that, it probably started in a marketing department somewhere.

Companies that have built semantic search engines, while they have not created AI, have spent a tremendous amount of time and resources to build these sets of rules.  The better engines can build rules on the fly from a new set of data, like resumes.  This is very cool stuff.

Overall, I like semantic search.  It has great potential; however, it has great weaknesses if used incorrectly.  If built into the engine itself, semantic search can be very powerful, because the semantic processing is done on the search engine side, without any limitations or constraints.  However, if bolted onto a search engine, it can do more harm than good.

Here is what I mean.  I’ll try to keep my logic simple.

1. The Google search engine has a limit in how many terms can be submitted to it.

2. Semantic search, by its nature, creates permutations upon given terms. For example:

“Senior VP of Sales”  can be “SVP Sales” or “Senior Vice President of Sales”

Translated into a Boolean expression, you get:

“senior vp of sales” OR “SVP sales” OR “senior vice president of sales”

3.  After creating permutations upon several concepts, you are out of search terms.
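
The three steps above can be sketched in Python.  The limit of 32 terms is an assumption chosen for illustration (check the actual limit of whichever engine you use), the helper names are hypothetical, and counting every word inside a quoted phrase as one term is also an assumption.

```python
# Assumed per-query term limit; the real figure depends on the search engine.
MAX_TERMS = 32


def to_boolean(variants):
    """Join the variants of one concept into a quoted OR expression."""
    return " OR ".join(f'"{v}"' for v in variants)


def count_terms(variants):
    """Count every word in every variant (assumes each word uses one term)."""
    return sum(len(v.split()) for v in variants)


title = ["senior vp of sales", "SVP sales", "senior vice president of sales"]
print(to_boolean(title))
# "senior vp of sales" OR "SVP sales" OR "senior vice president of sales"

used = count_terms(title)
print(f"{used} of {MAX_TERMS} terms used by one concept")
# 11 of 32 terms used by one concept
```

Under these assumptions, three concepts of similar size already exceed the budget, which is the point of step 3.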

I’m a big believer in laws (maybe not speed-limit laws, but more the “laws of the universe” type stuff).  I like to understand and deconstruct the rules and see whether each one stands alone or whether I need to recheck my premises.  In this spirit, just before the first SourceCon conference, I developed the Seven Laws of Internet Research.  I felt there was too much emphasis on memorizing search strings and the latest search engines or sites, and not enough fundamental thought leadership on how to think about searching the Internet.

The first two laws are:

1. The Law of Permutation
2. The Law of Completeness

The Law of Permutation simply states that when searching the Internet, as it is not a homogeneous source of data, you must describe what you are looking for in the language of the many vs. the language of the one.  (YES, this is what semantic search is doing.)

The Law of Completeness states that you must strive for completeness of search engine results in order to get a superior outcome.

Big Question:  What happens if semantic search is applied before you reach completeness of results?

Answer:  Missing data. Competitors eat your lunch.  If you are a salesperson, it means missed sales leads; if you are a recruiter, it means missed resumes or passive candidates.

Does this mean that I am anti-semantic search?  No way.  I think it has great potential.

Here are my take-aways:

-Semantic search should be inside the search engine for optimal results.

-Semantic search bolted onto a standard search engine is severely limited.

-Semantic search will cause data to be missed if applied before reaching completeness of possible results.

-When combining a standard search engine and semantic search, it is best to apply the semantic processing AFTER completeness of data has been reached.  In reality, this would not be semantic search, but semantic filtering.
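
To make the last take-away concrete, here is a minimal sketch of semantic filtering in Python.  It assumes the complete result set has already been gathered by ordinary searches; the sample documents, the variant list, and the function name semantic_filter are all made up for illustration.

```python
def semantic_filter(documents, variants):
    """Keep documents that mention any variant of the concept (case-insensitive)."""
    lowered = [v.lower() for v in variants]
    return [doc for doc in documents if any(v in doc.lower() for v in lowered)]


# Pretend this is the complete result set collected before any semantic step.
results = [
    "Jane Doe, Senior Vice President of Sales at Acme",
    "John Roe, SVP Sales, Example Corp",
    "Sam Poe, Software Engineer",
]
variants = ["senior vp of sales", "SVP sales", "senior vice president of sales"]

# Prints the first two documents; the third mentions no variant.
for doc in semantic_filter(results, variants):
    print(doc)
```

Because the filter runs locally on data you already hold, the variant list can be as long as you like; no engine term limit applies.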

Having worked with many databases as well as having extensive experience in searching the Internet, I thought I’d share some thoughts on the differences between the two.

When I observe people searching the Internet, there is a common mistake I see them making.  Most people search the Internet like they are searching a database. Don’t get me wrong, the Internet does include databases. Thomas Register, Spoke and Zoominfo are examples of different types of databases. Via different methods, information is added to these data sources and some sort of query mechanism is provided to the subscribers. Can you use the Zoom query on Thomas Register and vice versa? No, these are proprietary systems that have search methods specialized to the content inside them. Each of these databases is limited and incomplete, but stored in a homogeneous fashion.

The Internet, in its entirety, is not homogeneous.
Read the rest of this entry »

“Data normalization” is a phrase that leaves a blank stare on most people’s faces.  Here is a secret:  it is really simple.

Here is the inside scoop:  Technology people have a secret club, complete with handshake and everything.  It’s a club that we don’t want outsiders in.  So we create these long phrases that make people’s eyes glaze over.  Why?  Because if everyone understood what we do, then we wouldn’t make the big bucks.   Being a recovering technologist, I’m on a continual journey to lose my geek speak. So get ready, here is the skinny on Data Normalization.
Read the rest of this entry »

Most CRM implementations fail.  This is a fact.  Look it up.

In my years in the industry, I’ve worked with many vendors on the consulting side to help reduce the possibility of CRM failure.  While there is a whole host of reasons that failure occurs, I have a unique perspective on one of those reasons:  the nature of contact information.

The nature of contact information is fairly finite (i.e., Company, URL, Name, Title, Email, Phone, Social Network membership, etc.). In addition, the concept of contact information is a simple one to grasp. It is so simple, in fact, that it often gets overlooked.

One of the most important concepts in business is “be brilliant at the basics”. If you are brilliant at your basics, many more complex processes will fall naturally into place. So how are you treating contact information?

The mishandling of contact information can lead to dire consequences across your company.

Take the following workflow as an example:
Read the rest of this entry »

There has been a recent rise of the term “Boolean blackbelt”.  While I am not familiar, specifically, with all the people claiming to be a Boolean black belt, I wanted to add some perspective.

“Boolean Blackbelt” may be a great marketing phrase, but it is the equivalent of saying “basic math blackbelt” or “kindergarten green beret”.

Why?  Because Boolean is extremely simple.  It is the basis for logic, which includes the logical operators AND, OR, NOT, and XOR (exclusive OR).
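
To show just how small Boolean logic is, the short Python sketch below prints the complete truth table for all four operators.  The function name truth_table is made up for this example, and XOR is written as “not equal”, which is equivalent for true/false values.

```python
from itertools import product


def truth_table():
    """Return one row per input pair: (a, b, a AND b, a OR b, a XOR b, NOT a)."""
    return [
        (a, b, a and b, a or b, a != b, not a)
        for a, b in product([False, True], repeat=2)
    ]


print("a      b      AND    OR     XOR    NOT a")
for row in truth_table():
    print(" ".join(f"{str(value):<6}" for value in row))
```

Four rows cover every possible case, which is the whole of the subject.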

I do know some people out there that would qualify as “search string black belts”, and what they do is combine an understanding of Boolean logic with knowledge of the myriad Internet sources.  They have vast knowledge of the process of search.  The magic they bring is not the Boolean, it is the business process and understanding.  What they do is tweak out the special commands allowed by the search engines, like Google and MS Live.  Shally Steckerl and Glenn Gutmacher from Job Machine are examples of masters; they are “beyond boolean”.

So please, hold the angry emails.  I am not knocking the “black belts” out there; I think they are actually selling their skills short.  My goal is to add clarity:  being good at Boolean may take a few minutes.  Being good at the complex search commands available in the search engines may take a few days to learn.  To know HOW to apply the search fundamentals, you must understand and live with all the sources of information that are available:  search engines, blogs, social networks, etc.  To know all that requires total immersion.

What is beyond boolean?  Remember, Boolean equates to: AND, OR, NOT, XOR.  Some popular search engine commands include NEAR, site: and inurl:.

(By the way, I just checked the domain:  beyondBoolean.com is available!).  Who is going to be beyondboolean?

What is beyond boolean?  What is missing?  Here are some commands that we have developed at Broadlook.  We call it, what else, “broadlean”.  The next major versions of Broadlook product engines will be supporting it (some already do).  I would like to hear from the searchologists (I prefer that term):  what else would you like to see in the broadlean specification?

WARNING!!!!! – The rest of this blog is for the tech geeks out there.  If you don’t know your way around a search engine, STOP HERE!

Here are some BROADLEAN commands:

SS:   Same sentence as
SP:   Same paragraph as
SC:   Same concept as
PERSON:
CORP:
DATE:
EVENT:

SS, SP, SC are operators.  PERSON, CORP, DATE, EVENT are what we call entity operators.

Example usage:

“Project manager” SS PERSON -  find all pages that mention the term “project manager” in the same sentence as a person.  The value here is that you don’t have to know the person’s name to succeed with your search.
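
The Python below is a toy sketch of what a same-sentence operator combined with a PERSON entity could do.  It is not how Broadlook implements broadlean:  the sentence splitter is a naive regular expression, the “person” detector is a crude stand-in (two adjacent capitalized words), and the function name is hypothetical.  A real engine would use proper entity recognition.

```python
import re

# Crude stand-in for a PERSON entity: two adjacent capitalized words.
PERSON_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def same_sentence_as_person(text, term):
    """Return (sentence, person) pairs where term and a person-like name co-occur."""
    hits = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if term.lower() in sentence.lower():
            for match in PERSON_PATTERN.finditer(sentence):
                hits.append((sentence, match.group()))
    return hits


page = (
    "Our project manager is Jane Doe. "
    "John Roe leads engineering. "
    "Contact the project manager for details."
)
for sentence, person in same_sentence_as_person(page, "project manager"):
    print(person, "<-", sentence)
# Jane Doe <- Our project manager is Jane Doe.
```

Note that the search never names Jane Doe; the name falls out of the sentence that contains the term.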

Google, MS Live, Yahoo are all toys when it comes to weeding through results.  I am excited to see the technology that comes out over the next few years in terms of targeting results.  What fun!
