Category Archives: social networking

My vision of big data, 11 prerequisites

Late yesterday I learned from Twitter that Whitney Houston died.

99% of the Whitney Houston Tweets were exactly, “RIP Whitney Houston.” Okay… but how did she move you? Do you remember a special dance with that one girl while she was singing? Was her voice so beautiful that it made you tear up? It was for me. Originality is there, but buried on Twitter. I would have enjoyed others’ insights on Whitney, to feel camaraderie in a shared loss. If they existed on Twitter, they were obscured behind all the drone “RIP Whitney Houston” tweets. So instead I played some Whitney songs and told my children who the woman with the beautiful voice was.

Twitter is big data.

“Big data” is making the news. The concept has crept from the back pages of technical publications into the mainstream. It’s a new topic, so the reporters have commandeered it. It’s becoming popular, and that’s too bad. Media feeding frenzies perpetuate the peripheral definition; articles get copied over and over again, and people stop thinking.

With its IPO in the news, Facebook has become the poster child for big data. So what is it? What is big data? Simply put, it is massive amounts of information about millions, and eventually billions, of people. Big data is making the news because of fear – fear of the possibilities of abuse. It sells newspapers and generates clicks and page views, which means we will be hearing a lot about big data. Scare people and make money.

Facebook is big data.

Google is changing its privacy policy. Another media feeding frenzy. If you have a Gmail account or use Google+, music, shopping, etc., all of those privacy policies are melding into one. I like the idea and, I have to admit, I don’t understand the problems people are having. If you use 5 or 10 different Google services, are you really going to read that many different user agreements? I don’t know anyone who actually does. I would prefer to have one policy that covers them all. Google gives these services away; if you don’t like that one, single policy, stop using the service. People are more likely to be informed about Google’s policies if there is a single policy. It’s a good thing. Stop the bitching.

Google is big data.

Another bit in the news. The Seattle Times reports that a top porn site, Brazzers, was hacked. According to the article and other news coverage, usernames, passwords and real names were stolen. The data is making its way across the Internet on file-sharing sites.

Internet user databases are big data.

In my vision of the world, big data is in its infancy. Don’t freak out for at least 10 years.

Why now? Why is big data coming into the mainstream now? It has been around for many years. Large data providers like Experian, Acxiom, and D&B have been collecting data for a long time. What is different now? To answer “why now,” you must understand the continuum of getting at big data.

11 Big Data Prerequisites

  1. The data must be there – this is the most exciting tipping point. As the CEO of a data-mining software company, I’m still dumbfounded when users expect to get information off the web…that is not there. It must actually exist.
  2. You must be able to flag it – you can’t store everything and must make choices.  What is important? When does it happen?  Example:  News release with subject: Nanotechnology
  3. You must be able to find it – in the absence of a real-time data stream, you must be able to search through data to find a “flag” of what you are looking for.
  4. You must be able to parse it – this is the analysis of relevant grammatical constituents: identifying the parts you need from within the surrounding noise. Example: parsing out the name of an inventor from within an article on nanotechnology
  5. You must be able to extract it – Not the same as parsing. What if the data is in a PDF file or HTML web page? In many cases, extraction is about access. Is the data I am looking for spread across 5 sub-links of a single web page? Extraction as it relates to the Internet also encompasses web crawling.
  6. You must be able to process it – This takes CPU cycles. Bigger problems need bigger computers.
  7. You must normalize it – If you have multiple pieces of data on “The Container Company”, “Container Company, The”, “The Container Co”, etc., how do you merge that data? You must normalize like entities to a standard “canonical form.” Without it, we’ve got the Data Tower of Babel.
  8. You must be able to store it – Big data takes up disk space.
  9. You must be able to index it – If you ever want to find it after you store it, the data needs to be indexed.  This also means more disk space.
  10. You must be able to analyze it – big data needs big (or many distributed) CPUs to crunch the numbers and bring order to the chaos.
  11. There must be a payoff – Putting together big data is expensive. Without an end goal in mind, the cost of collecting it is hard to justify. Google & Facebook collect, process, index & store data for profit.
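Prerequisites 7 and 9 are the easiest to show in code. Here is a minimal sketch, assuming a tiny hand-built abbreviation table, of normalizing the “Container Company” variants above to one canonical form and then building an inverted index over the merged records. The names `ABBREVIATIONS`, `canonical_name` and `merge_and_index` are hypothetical, not taken from any product.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real system would need a much larger one.
ABBREVIATIONS = {"co": "company", "corp": "corporation", "inc": "incorporated"}

def canonical_name(raw: str) -> str:
    """Reduce variants like 'Container Company, The' to one canonical key."""
    text = re.sub(r"[^\w\s]", " ", raw.lower())      # strip commas, periods, etc.
    words = [w for w in text.split() if w != "the"]  # 'The' may lead or trail
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def merge_and_index(records):
    """Merge (raw_name, fact) pairs under canonical names, then build an
    inverted index from each word to the canonical names containing it."""
    merged = defaultdict(list)
    for raw_name, fact in records:
        merged[canonical_name(raw_name)].append(fact)
    index = defaultdict(set)
    for name in merged:
        for word in name.split():
            index[word].add(name)
    return dict(merged), dict(index)

records = [
    ("The Container Company", "fact from source A"),
    ("Container Company, The", "fact from source B"),
    ("The Container Co", "fact from source C"),
]
merged, index = merge_and_index(records)
print(merged)              # {'container company': ['fact from source A', 'fact from source B', 'fact from source C']}
print(index["container"])  # {'container company'}
```

Rule-based normalization like this breaks down quickly on real data, which is exactly why prerequisite 7 is hard; the point is only that all three spellings have to land on one key before the data can be merged or indexed.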

So what is my vision of “big data”? What is being talked about in the media is very short-sighted. I think I know where big data is going, and I’m basing my vision on my prerequisites.

Big Data Thoughts

1: Information is growing beyond the ability of any single source to store and index everything. Therefore, big data can never be “all data.” Facebook and Google cannot store everything, so choices must be made. Google already does this, indexing only what it deems relevant.

2: The amount of data about people on Facebook is paltry…compared to what is possible. Yes, in aggregate, it is the largest set of minimal data. Think for a second about your day. What would it take to record your entire life in HD, from 7 different angles? This future data stream would include everything you heard, read, and generally interacted with.

3: Mass personal data recording is on the horizon. The first phase is already starting. The only limit is reasonable storage. The term for it is “LifeLogging.” There are wearable devices that take a picture every 30 seconds. High-quality LifeLogging technology will be critical in the future. One picture every 30 seconds is 1/900th of video (30 frames per second). If the LifeLogging device is just the conduit rather than the storage medium, the lifelog could be stored on your home PC. With H.264 video compression, 5.5 hours of 1080p video can be stored on a 32GB thumb drive. That means a single 1TB (terabyte) drive can hold 176 hours of high-definition video (7.3 days of video). It would be expensive today to buy roughly 50 1TB drives (about one a week) to store a year of your life. It seems crazy… right? Not when you are a historian. In 1992, the average hard drive was around 1GB – 1000 times less than today.
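The storage arithmetic in point 3 is easy to check. This is a small sketch that takes the 5.5 hours per 32GB figure as given and assumes binary units (1TB = 1024GB). The function names are only illustrative.

```python
import math

HOURS_PER_32GB = 5.5  # from the text: 5.5 h of 1080p H.264 fits on 32GB
GB_PER_TB = 1024      # binary units; using 1000 gives about 172 h instead

def hours_per_drive(drive_tb: float = 1.0) -> float:
    """Hours of video one drive holds at the bitrate implied above."""
    return drive_tb * GB_PER_TB / 32 * HOURS_PER_32GB

def drives_per_year(drive_tb: float = 1.0, hours_per_day: float = 24.0) -> int:
    """Whole drives needed to record a year, rounding up."""
    return math.ceil(365 * hours_per_day / hours_per_drive(drive_tb))

print(hours_per_drive())                  # 176.0
print(round(hours_per_drive() / 24, 2))   # 7.33 (days per drive)
print(drives_per_year())                  # 50
print(drives_per_year(hours_per_day=16))  # 34
```

Recording around the clock, a year comes to just under 50 drives. The 16-hour line is an added assumption (recording only waking hours), shown to illustrate how sensitive the total is to how much of the day is logged.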

Some ideas to reduce the storage size of LifeLogging:

-Go vector. If you have an avatar created of you, a vectorized version of you could be stored. This type of compression does not exist yet, but it will. LifeLogging in bitmap video is like a tape deck. Vectorizing video with the lifelogger as the center of the story would cut storage by a factor of 1000; it is like the hard drive compared to tape storage. Data stored this way could also be accessed very quickly. Bottom line: with the right *Software* real LifeLogging could be done today. I should save this for another in-depth blog. I’ve spent many nights thinking about how it all could be done. I’ve got to stop watching sci-fi like The Lawnmower Man before bed.

4: Assume that we are in the 2020s. Based on Moore’s Law and several similar trends, a LifeLogging device worn around your neck will be able to record your life in HD. It will probably cost about as much as a premium iPad. At that price, LifeLogging becomes ubiquitous.

5: What did I eat today? What about over the past week, month, or year? Just because that information is recorded as video (me munching an apple) does not mean it can be analyzed and recognized as Donato-eats-apple. Where did you buy that apple? Can the date you ate it be cross-referenced with the date you bought it at the grocery store?
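Once recognition is solved, the cross-referencing question in point 5 turns into a simple join. The sketch below assumes a recognizer has already turned the video into an (item, date) “ate” event and that a separate purchase log exists. `Purchase` and `source_of` are hypothetical names, and matching each eating event to the most recent earlier purchase is an assumed heuristic, not a known method.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Purchase:
    item: str
    store: str
    day: date

def source_of(item: str, eaten_on: date, purchases: List[Purchase]) -> Optional[Purchase]:
    """Return the most recent purchase of `item` on or before the day it was eaten."""
    candidates = [p for p in purchases if p.item == item and p.day <= eaten_on]
    return max(candidates, key=lambda p: p.day, default=None)

purchases = [
    Purchase("apple", "Store A", date(2012, 2, 1)),
    Purchase("apple", "Store B", date(2012, 2, 8)),
    Purchase("milk", "Store A", date(2012, 2, 9)),
]
print(source_of("apple", date(2012, 2, 10), purchases).store)  # Store B
print(source_of("apple", date(2012, 1, 1), purchases))         # None
```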

New industries

Software that analyzes and makes inferences from LifeStreaming will be a multi-billion dollar industry (Donato ate apple, Donato started car, Donato got phone call, Donato was watching the movie Contact). I would expect each major type of world interaction to be handled by a different app or algorithm.

Software that compiles inferences, builds statistics and performs what-ifs on mass LifeStream data will be a multi-billion dollar industry. (23% of people who ate apples 4x per month, where the apples came from Chile and were most likely treated with chemical X, developed cancer by age 55.) These are the types of discoveries we will be able to make that today are made only by happy accident. (I made up that example…but do eat organic apples.)
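At its core, the “statistics and what-ifs” layer filters a population into a cohort and compares outcome rates. This is a sketch with toy, made-up records. Like the example above, they are not real data, and `Person`, `rate` and `compare` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Person:
    apples_per_month: int
    apple_origin: str
    outcome_by_55: bool  # the health outcome being studied

def rate(group: List[Person]) -> Optional[float]:
    """Fraction of a group with the outcome (None for an empty group)."""
    return sum(p.outcome_by_55 for p in group) / len(group) if group else None

def compare(people: List[Person], in_cohort: Callable[[Person], bool]) -> Tuple:
    """Outcome rate inside the cohort versus everyone else."""
    cohort = [p for p in people if in_cohort(p)]
    others = [p for p in people if not in_cohort(p)]
    return rate(cohort), rate(others)

# Toy records standing in for millions of LifeStreams.
people = [
    Person(4, "Chile", True), Person(6, "Chile", False),
    Person(5, "Chile", False), Person(8, "Chile", False),
    Person(2, "Chile", True), Person(5, "Peru", False),
]
heavy_chilean = lambda p: p.apples_per_month >= 4 and p.apple_origin == "Chile"
print(compare(people, heavy_chilean))  # (0.25, 0.5)
```

Returning a comparison group next to the cohort rate is deliberate. A figure like “23%” only becomes a trend when it differs from the rate for everyone else, and even then it shows correlation, not cause.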

Example: compiling a list of the junk (postal) mail that I throw out without opening. That is good data. Which one did I open?

Software that manages the rights, payments, connectivity and privacy between life streams will be a multi-billion dollar industry. What if that apple from Chile was treated with some really nasty pesticide, like a carcinogen? Could the supplier of that apple to the store be tracked? Do you want to know this? What if your wife bought it… and it is not part of your personal data stream? Do you and your wife have a LifeStream sharing agreement?
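A LifeStream sharing agreement could start out as simple as a consent table checked before any event crosses from one stream to another. This is a minimal sketch under that assumption. `SharingAgreement`, `visible_events` and the event categories are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SharingAgreement:
    owner: str    # whose LifeStream the events come from
    grantee: str  # who may see them
    categories: set = field(default_factory=set)

def visible_events(viewer: str, events: List[dict], agreements: List[SharingAgreement]) -> List[dict]:
    """Return the viewer's own events plus others' events in a shared category."""
    allowed = {(a.owner, a.grantee, c) for a in agreements for c in a.categories}
    return [e for e in events
            if e["owner"] == viewer
            or (e["owner"], viewer, e["category"]) in allowed]

events = [
    {"owner": "wife", "category": "groceries", "what": "bought apples"},
    {"owner": "wife", "category": "location", "what": "at the gym"},
    {"owner": "donato", "category": "food", "what": "ate apple"},
]
agreements = [SharingAgreement("wife", "donato", {"groceries"})]
print([e["what"] for e in visible_events("donato", events, agreements)])
# ['bought apples', 'ate apple']
```

Agreements here are one-directional and per category. Your wife can share her grocery purchases without sharing her location, and nothing flows back to her unless a second agreement says so.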

One person eating one apple does not a trend make. Multiply that by 50 million people over 5 years. This is not science fiction. This is simply faster computers, more memory, and analysis software. It’s a lot of apples. Would I share my eating habits anonymously and have them cross-referenced with my health? Maybe.

I expect that companies will pop up, each with a different set of analysis technology for different niches. It will probably evolve into an AppStore model. One company looks at how you interact with media, what you watch, listen to, theaters attended. Another knows what you eat. You can choose which feeds to share with the greater LifeStream and take part in a greater community.

By the way, none of this LifeStreaming will be on Facebook, or Google+. No one would trust them. In addition, it would be prohibitively expensive to centrally transmit, store and analyze it. Hmmm, maybe Facebook could be the trend builder? It is well positioned for it. Can you imagine it?

Donato ate an Apple
Donato threw core in garbage
Donato did not recycle V8 can
Donato is driving 15 miles over the speed limit

This is the first time in a few years that I have thought of a way for Facebook to survive long term. In this Facebook, you would never log in to look at what people are doing; you would log in to see the latest trends and how they affect you.

I just hope it does not make it to Twitter and get retweeted by the “RIP Whitney Houston” drones. Once analysis agents can understand (and broadcast) our individual actions, Twitter has no reason to exist.

Big Data equals big money.

If it is possible, and someone can profit, it will be collected.

 

The decline of Apps and the rise of Agents and Clewds

Last week, while presenting a live webinar, “The Near and Far Future of Recruiting,” I had an epiphany. I was talking about the eventual decline (or morphing) of Facebook. The theory is this: in 10 years, mobile computing power will be server-capable. Add in violations of trust and general mistrust of social networks. The result is peer-to-peer social networking. No Facebook needed. Everything sits on your mobile device. More private, more secure, total user control and no ads. Facebook may lead the way, but that will be hard, as it would cannibalize its own ad-driven revenue model.

This was last year’s epiphany.

What led to the new epiphany was my pontificating on CRM systems. This was a recruiter-centric talk about the future of recruiting. Many recruiter CRMs have connections to LinkedIn profiles. Every one of these that I have seen has been implemented incorrectly, through no fault of the vendors. Ideally, the data inside the profile would be mashed up with current CRM data. Instead, LinkedIn requires use of its API, which brings back a canned LinkedIn profile. This is what I call “social linkage.”

The optimal situation would be a pair of “social agents.” While a company may have 1,000 prospects in its CRM, it may only contact 50 in a given day. One social agent would automatically refresh the entire CRM on a longer cycle, such as once per quarter. Another, just-in-time social agent would update the CRM just before the outreach process. Why is this important? LinkedIn is not a definitive data source; nothing is. What happens when you combine Facebook, Google+, Jigsaw (now Data.com), Foursquare, Twitter and whatever social network Microsoft comes up with? Are you going to clutter your Salesforce or Microsoft Dynamics interface with 6-8 little snippets, many with redundant information? This gets ugly fast. The optimal implementation is to have a social agent retrieve LinkedIn, Data.com, Google+, Facebook and Twitter information, then mash it up, score it and apply analytics to present it in a way that optimally fits your selling model.
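The two-agent design above can be sketched as a refresh policy plus a merge step. This is a sketch only. The one-day just-in-time window, scoring by source agreement, and the names `needs_refresh` and `mash_up` are assumptions for illustration, and no real LinkedIn or Data.com API is called. The snippets stand in for whatever each source returns.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

QUARTERLY = timedelta(days=91)    # background agent: roughly once per quarter
JUST_IN_TIME = timedelta(days=1)  # hypothetical freshness window before outreach

@dataclass
class Contact:
    name: str
    last_refreshed: datetime

def needs_refresh(contact: Contact, now: datetime, outreach_today: bool) -> bool:
    """The background agent refreshes stale records; the just-in-time agent
    refreshes anyone about to be contacted unless the data is very fresh."""
    age = now - contact.last_refreshed
    return age > (JUST_IN_TIME if outreach_today else QUARTERLY)

def mash_up(snippets: dict) -> dict:
    """Merge per-source snippets into one record. Each field keeps the value
    most sources agree on, scored by how many sources reported it."""
    merged = {}
    fields = sorted({f for snippet in snippets.values() for f in snippet})
    for f in fields:
        votes = Counter(s[f] for s in snippets.values() if f in s)
        value, score = votes.most_common(1)[0]
        merged[f] = (value, score)
    return merged

snippets = {
    "linkedin": {"title": "VP Sales", "company": "Acme"},
    "data.com": {"title": "VP Sales", "phone": "555-0100"},
    "twitter": {"title": "Sales VP"},
}
print(mash_up(snippets))
# {'company': ('Acme', 1), 'phone': ('555-0100', 1), 'title': ('VP Sales', 2)}
```

The CRM then shows one merged record with a confidence score per field, instead of 6-8 snippets that repeat each other.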


LinkedIn is not a social network, Facebook will morph

After about 2 years of talking about this topic, I thought it best to collect some solid data before writing an official blog post about it.

LinkedIn is not a social network.

A thing is defined by its major attribute. While LinkedIn has aspects of a social network, it is actually a social database.

Hey Donato…But they say they are a social network!

In the early days they were. As the network grew, savvy users realized they needed to grow their networks as large as possible to extend their reach. In polls during live webinars over the last year, I’ve asked groups of 200-600 people how they use LinkedIn. Here are the answer choices and the results.

1.  I get as many connections as possible and figure out how to contact people directly.

2. I use LinkedIn as it was meant to be used: connecting with people through a series of connections.

3.  I don’t use LinkedIn.

69% of people chose option 1. Last year, it was only 50%. The trend is growing and…

LinkedIn is a social database.


Line 1300: What are the rights of an incoming caller?

What rights does an incoming caller have?  To be more specific, an incoming solicitor calling a place of business?

At home, we have the do-not-call list.  This could never be put into effect for business, nor do I think anyone sane would see it as a good idea.  Business would halt.

The general consensus I have gathered is that callers to your home have no rights. Hanging up on them after a simple “no thank you” is acceptable; it is the status quo. I find this fascinating. When I polled people about callers to a business, the treatment was different. People who receive calls at work report that they will listen for 1-2 minutes before exiting a call they don’t want. Some reasons for the difference at home and at the office:

At Home

  1. Home is sacred, people feel invaded and justified to not give up their home time
  2. It’s usually at the end of the day, evening, people want to relax
  3. Non-equivalence. You are home, the caller is at work

At the Office

  1. Professionalism.  The Golden Rule.
  2. Equivalence.  You are both in a work environment
  3. You may be calling them tomorrow
  4. You really may be interested in their service

In essence, this is a philosophical question. What is your corporate belief system? What is your personal belief system? For me, today was back-to-back scheduled meetings, and three solicitors got past my gatekeeper. Rare. It inspired this blog and reminded me of one of my beliefs:


“I came here to say that I do not recognize anyone’s right to one minute of my life. Nor to any part of my energy. Nor to any achievement of mine. No matter who makes the claim, how large their number or how great their need.”

Ayn Rand, The Fountainhead.

Of the three in-bound calls, one lied to my gatekeeper to get to me.  This is plain stupid.  Alienate the person who manages my schedule.  The other two reached me while everyone else was at lunch.  Not one of the three had a coherent message.  How much of my time did they get? Less than five seconds.  Did I hang up on them?  No. There is another option!

About three months ago, in talking with our administrative staff, I came up with the idea for line 1300.

If you end up on line 1300, you get a recording that sounds something like this:

“Hello. You have reached line 1300 at Broadlook Technologies because you were either unclear or perhaps rude in your outreach. This is your chance to get it right. At the sound of the tone, please leave a clear, articulate message detailing how your product or service is right for Broadlook. We listen to this voicemail box once per week. If we are interested, we will contact you. Thank you.”

What does line 1300 do? It empowers the people who support me. They do not have to take crap from rude callers. It gives your staff an immediate way out of a call with a monotonous, unclear, script-reading telemarketer. In addition, it passes the litmus test of professionalism. We DO listen to 1300 once per week.

Line 1300 is NOT about being mean. It is fair. Personally, I give salespeople a step-by-step guide on how to sell to me. If they don’t follow it, line 1300.

Try adding a line 1300. Your staff will love you for it.

Grandfathering bandwidth and the flexible appliance

There is a new commodity in the high tech world.

Unlimited bandwidth.



Ask any iPad user who got one in the early days: unlimited bandwidth is no longer available on the iPad. I am one of the lucky users. With a combination of my travel schedule and high-bandwidth applications like Netflix and Broadlook’s Profiler, I regularly top 12-15 gigabytes per month in data transfer. Today’s data plans cover 2GB, which means I am using 6-8 times the bandwidth that new iPad users get.

I am a bandwidth hog.   I am one of the 2% of people that use the majority of the bandwidth and I’ve got a message for AT&T…I’m keeping my plan…forever.

Why blog about this?  It is a warning for the uninformed.

Guess what? Very soon you will be a bandwidth hog. AT&T, Verizon and the other carriers understand this. It is the nature of technology. More and more applications, business logic and media rest in the cloud. Now Apple and Google each want to offer streaming music services. No longer will you have your iTunes on your desktop, laptop or iPad. Nope. They want all your music in the cloud. Why? Apple gets a piece of the service fee that you pay AT&T for your iPhone or iPad. Bandwidth is the new electricity.

This is reminiscent of 2002–2008, when every idiot said that you must make your software offering SaaS (Software as a Service). SaaS is mostly good for service providers since it gives them recurring revenue, but it is not always the best solution. Don’t get me wrong, I am a huge believer in SaaS, but it is not a panacea.

Now they (the same smart zealots who want your $$) are saying that they want all your stuff in the cloud. Why? Simple: if you store everything (backups, music, CRM, etc.) in the cloud, then you need bandwidth to access it.

Whose cloud?

At the recent Oracle OpenWorld conference, Larry Ellison, CEO of Oracle, talked about the cloud NOT being a single set of servers but a flexible appliance. Thank you, Larry! He gets it. Most don’t.

The Flexible Appliance

What is it? My iPhone is a flexible appliance. In a recent talk at the MRI Worldwide conference (The Near and Far Future of Recruiting), I demonstrated on stage the advent of the mobile web server. My laptop connected to a website that was hosted on my iPhone, and one person in the front row said “that’s cool!” out loud. Not the response I was hoping for, but it sank in for enough people who had time to think about it. It inspired some great conversations about the future of recruiting.

I used an iPhone app called ServersMan that makes your iPhone a web server.  Being able to run a web server on a mobile phone has huge implications.
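This sketch is not ServersMan, whose internals I don’t know. It shows the same idea with Python’s standard `http.server` module on any device that can run Python: a tiny web server that publishes your status to whoever connects. `STATUS`, `StatusHandler` and `make_server` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload: the "status update" the device publishes itself.
STATUS = {"name": "Donato", "status": "Presenting at a conference"}

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        """Answer every GET request with the current status as JSON."""
        body = json.dumps(STATUS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet

def make_server(port: int = 8080) -> HTTPServer:
    """Listen on all interfaces so other devices on the network can connect."""
    return HTTPServer(("0.0.0.0", port), StatusHandler)
```

Calling `make_server().serve_forever()` on the device and then browsing to its address on port 8080 from a laptop reproduces the stage demo in miniature.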

If you want to test the vision of a technical leader ask them this question:

“When mobile devices (iPhones, iPads) can act as functional web servers, what does that mean for the technology landscape?”

They should be stunned, they should be wondering, they should be smiling. If they aren’t, they lack vision. The advent of the true flexible appliance will bring:

-Massive bandwidth usage.  Via your mobile flexible appliance/personal web server, you will be connected to everything

-Downfall of Facebook.  News to Zuck.  The future social networks will be controlled from the pocket.

-Movement from “their” cloud to “my” cloud.

When I propose the above to tech folks, they remind me that some sort of middleware is needed for one mobile web server to find and connect to another. This already exists; it is called dynamic DNS, and there are a bunch of companies that offer it. With dynamic DNS, my iPhone web server could very quickly connect to 200 of my friends and update my status on their mobile devices. No cloud, no Facebook needed. The only limitations are bandwidth and mobile processing speed.
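Here is a toy, in-memory model of the flow. Each phone registers its current IP under a fixed hostname with a dynamic DNS service whenever its network changes, and friends resolve that hostname to push a status directly to the phone. Everything here is a hypothetical stand-in (`DynamicDNS`, `Network`, `Phone`, the `.ddns.example` hostnames). Real dynamic DNS updates actual DNS records, and in practice carrier NAT and firewalls can block inbound connections to phones, which this model ignores.

```python
class DynamicDNS:
    """In-memory stand-in for a dynamic DNS service: hostname -> current IP."""
    def __init__(self):
        self.records = {}

    def update(self, hostname, ip):
        self.records[hostname] = ip

    def resolve(self, hostname):
        return self.records.get(hostname)

class Network:
    """Stand-in for the Internet: which device answers at which IP right now."""
    def __init__(self):
        self.hosts = {}

class Phone:
    def __init__(self, hostname, dns, net):
        self.hostname, self.dns, self.net = hostname, dns, net
        self.ip = None
        self.inbox = []

    def join(self, ip):
        """New cell tower or Wi-Fi network: take the new IP and tell the DNS service."""
        if self.ip is not None:
            self.net.hosts.pop(self.ip, None)
        self.ip = ip
        self.net.hosts[ip] = self
        self.dns.update(self.hostname, ip)

    def post_status(self, text, friends):
        """Push a status straight to each reachable friend; no central server stores it."""
        delivered = 0
        for name in friends:
            peer = self.net.hosts.get(self.dns.resolve(name))
            if peer is not None:
                peer.inbox.append((self.hostname, text))
                delivered += 1
        return delivered

dns, net = DynamicDNS(), Network()
me = Phone("donato.ddns.example", dns, net)
grandma = Phone("grandma.ddns.example", dns, net)
me.join("10.0.0.1")
grandma.join("10.0.0.2")
grandma.join("10.0.0.7")  # grandma's phone moves to another network
print(me.post_status("New pictures of the kids",
                     ["grandma.ddns.example", "offline.ddns.example"]))  # 1
print(grandma.inbox)  # [('donato.ddns.example', 'New pictures of the kids')]
```

The only shared infrastructure is the name-to-address lookup. The status itself goes phone to phone, which is the “no cloud, no Facebook” point.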

The above scenario will happen once people realize they don’t want Facebook storing everything about them. Due to the nature of the beast, Facebook will continue to violate the privacy of its users. Eventually it will go away. Don’t get me wrong, I like Facebook. It gives me a way to connect with grandma and show pictures of the kids. Facebook may change and become the king of the middle: the middleware that ties everything consumer together. But do you trust them? I don’t.

It’s all about the middleware.

As I look at SaaS (Software as a Service) and then PaaS (Platform as a Service) combined with the advent of the flexible appliance, I realize that my previous thinking was limited. In the mobile future, the mobile device is the cloud: the flexible appliance. For consumer apps like Facebook, people will eventually prefer to keep their personal data in a place they control. However, for business applications like CRM and ATS (Applicant Tracking Systems), I see a new class of business: Middleware as a Service (MaaS).

Middleware as a Service will balance the load between the cloud and the flexible appliance. Unlike today’s limited browser-based applications, MaaS systems will balance the rich interface and local power of the flexible appliance with the security, flexible business logic and data storage of the cloud. It will be interesting to watch it evolve.

With all this stuff in the works… if you get an unlimited bandwidth package, read the contract and, if you can, never give it up. Providers will offer unlimited bandwidth as a promotion and then, like AT&T/Apple, try to get you to relinquish it by downgrading from $30 per month to $25 per month.

Did I mention that once you get it, never give it up?