Data Scientists are GO

Was Brains of the Thunderbirds a “data scientist” as well as a genius engineer? If not, which member of the Tracy family designed the worldwide data capture system for monitoring disasters & crimes, and the analytics system that selected the mission for the next episode from that feed?

International Rescue might have started out as a small family business with ‘Father’ orchestrating operations using instinct & raw intelligence, but it would soon have developed into a considerable enterprise with R&D, product development, branding and a network of contacts across the globe.

Stupid fantasy question? Perhaps, but International Rescue is not so different from many of the growing enterprises across the globe. I think many businesses struggle to make the transition from small business to corporate enterprise, and the move to more formal decision-making processes is a key challenge. Typically, directors are still relying on experience and instinct and, if they have come to grips with delegation, on a core of able lieutenants who keep operations and logistics finely tuned.

When it comes to ‘business intelligence’, they are likely to be struggling to shift from traditional reporting deployments (providing data on what happened) to fact-based decision support solutions that drive real-time KPI monitoring. Forget analytics-driven insight. It’s tough enough responding to market conditions, and I bet many financial controllers are still chasing that dream of daily profitability KPIs.

And it’s not just a problem for new, growing companies. Older, established enterprises with legacy computer systems face similar challenges. Many production capture systems are still data silos that impede decision making because of the legacy host. Let’s not forget that many features of relational & other computation theory have only recently become commercially viable thanks to advances in solid state memory.

Therefore, I think ‘Big data’ is so far down the road for most companies that it’s got that ‘oasis’ heat shimmer thing going on. The hype currently being generated creates a lot of FOMO (fear of missing out). This makes these companies a prime target for ‘marketing’ coyotes & foxes, and they are in plague proportions. Don’t get me wrong. Big data is real; it’s just not the first step on the journey to analytics-driven decision making.

Companies such as Google, Facebook and Twitter (to name a few) are inherently ‘big data’ concerns. It doesn’t follow that every other company would benefit from ‘getting on board’ with this revolution. However, I firmly believe that all businesses would benefit from improved decision support solutions. It’s a matter of choosing something appropriate that can scale as you grow.

So if you have been “tuning in” to this new IT drama series on the web, called “BIG data” and are getting a bit tired of it, you will be pleased to know that there is now a spin off series called “the data scientist”.

So, in addition to our daily staple diet of yet another marketing-led big data offering, we will very soon see a flood of recruitment agents touting a stable of “data scientists” on their books, and a swathe of professionals changing their job titles on LinkedIn.

But hang on a minute, didn’t Howard Dresner recently tell us that the capabilities of a data scientist are extremely rare? Those capabilities apparently amount to “someone who provides the capability to increase the value of an organization’s information assets”.

Too right he did. However, if the Twitter comments of some fellow SQL professionals who have recently been recruiting for skilled T-SQL, DBA & ETL talent are indicative, then there will be plenty of job seekers out there prepared to bluff their way into this role. Many agents will struggle to filter them out while the IT industry itself comes to grips with this new analytical direction.

Anyway, Howard hosts a weekly BI Twitter session and I might start joining in. Please check out his original article, which I am quoting, to get the full story.

A few weeks ago, The Register (theregister.co.uk) reported on predictions of a global skills shortage, citing a McKinsey Global Institute report. The author, Mark Whitehorn, suggests a data scientist would excel at analysing data, particularly large amounts of data that do not fit easily into tabular structures, so-called “big data”. Sounds quite reasonable to me. I am a long-time reader of the Register and recommend it highly to anyone in the tech industry. They give ‘Fleet Street’ a run for their money on inventive headlines.

In another report, Bruno Aziza says he has found that the most successful data scientists balance brains and brawn: they can manipulate databases like it’s nothing, work well with coworkers and present findings to the executive board. “Think of a data scientist more like the business analyst-plus,” says Aziza.

In the same report, Jeff Hammerbacher, who previously led the data science team at Facebook, recounted that on any given day his team would use Python, R and Hadoop, and then have to relay the findings to a non-technical team. The more you know, the better prepared you’ll be to solve the day’s problems.

I do wonder what Jeff meant by “non-technical”. Given the specialisation that goes on in IT these days, that could mean the front line data analysts who support the business report writers. Or it could be the analyst in Marketing who is planning next season’s promotions. Perhaps it’s director-level strategic planning sessions. Either way, it highlights that converting data into information is still a classical task, requiring directed effort that links senior decision makers all the way through to operational users. There is no straightforward hierarchy in this process, as data and information flows are more intertwined now than ever.

I have only recently subscribed to the Smart Data Collective and follow them on Twitter (@SmartDataCo). There are some very clever, well-informed contributors on this site, as well as a curated selection of high-quality, vendor-sponsored whitepapers.

A recent Gartner research note has injected the equivalent of a healthy dose of “warp drive plasma” into the “big data” mix. The headline quote for big data (but by no means the only interesting one) is “By 2015, big data demand will reach 4.4 million jobs globally, but only one-third of those jobs will be filled.” Another significant trend was that CMOs (chief marketing officers) will begin to take over the budgets that CIOs are used to controlling. Now that would be interesting.

So, clearly there are interesting times ahead. Gartner think Marketing will take control of a huge chunk of what was traditionally the IT budget. Cloud-based PaaS, SaaS & IaaS are making it even easier for business units to bypass IT department backlogs and unrealistic timelines. There are some very innovative cloud solutions out there, and cloud certainly seems to be the way of future IT, but it’s not necessarily right for your business, or for that specific requirement, now or ever. Gartner also think there won’t be enough talent to meet demand.

I advocate loads of caution and pragmatism when getting out your cheque book for a third party service or solution, and also if you decide to embark upon this in house. If you wish to better understand the roots of ‘big data’ and get a handle on what it’s really about, then I would advise starting with the book “Analytics at Work: Smarter Decisions, Better Results”.

In the opening paragraphs, this line struck a chord with me: “Rigorous analytical testing can establish whether your intervention [ … a customer communication, product discount, sales promotion, product placement, loyalty scheme … ] is causing desired changes in your business, or whether it’s simply the result of random statistical fluctuations”.
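To make that distinction concrete, here is a minimal sketch of one common technique, a two-sided permutation test on the difference in means, written in Python. It is my illustrative choice, not the book's prescribed method, and the function name `permutation_test` and the sample figures below are hypothetical. The idea: if the intervention (say, a sales promotion) had no effect, the “promotion” and “no promotion” labels are interchangeable, so we shuffle them many times and count how often chance alone produces a difference at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    Hypothetical helper for illustration. Returns (observed_difference,
    p_value), where p_value estimates how often randomly relabelled groups
    show a difference at least as large as the observed one, assuming the
    intervention had no effect.
    """
    rng = random.Random(seed)  # fixed seed so results are repeatable
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        # Small tolerance guards against floating point rounding noise.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical weekly basket values (made-up illustrative numbers).
    promo = [52.1, 48.3, 55.0, 50.2, 53.7, 49.9, 54.4, 51.8]
    no_promo = [47.5, 49.0, 46.2, 50.1, 45.8, 48.7, 47.9, 46.6]
    diff, p = permutation_test(promo, no_promo)
    print(f"difference in means: {diff:.2f}, p-value: {p:.4f}")
```

A small p-value suggests the difference is unlikely to be random fluctuation alone; a large one means the data cannot tell the promotion apart from noise. In practice the comparison groups also need to be chosen carefully (ideally at random), which is exactly the kind of discipline the book argues for.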

If you are part of a company facing the challenges I have referred to, then this book is an excellent starting point. I think it’s an especially useful tool for understanding & communicating core issues and concepts to those within an organisation who are not familiar with analytics. However, it’s primarily presented as a methodology, and this is where I found it most useful. There is no shortage of excellent tech manuals on how to get the most out of database engines and how to build efficient ETL pipelines, which, coupled with experience (whether personal or shared within a team), can deliver top quality decision support solutions.

However, a well-conceived and robust analytics strategy can focus a business on what is really important right now, and ensure that tactical endeavours have the greatest chance of success. The right answers to the wrong questions, delivered in near real time, don’t help. I think this is why many of us IT techies have an affinity with the answer “42”: we cynically appreciate the importance of asking the right question. Formulating the right questions is a business-wide task requiring strong leadership, delegation and empowerment. This is also why I strongly believe that the steering of BI / decision support solutions must be lifted out of IT, as IT types aren’t traditionally the best communicators and generally don’t have a business-wide perspective or remit.

An alternative to being drawn into the “BIG data” hype is to start a business intelligence unit, with a reporting line and budget separate from, or in tandem with, the IT department (you can’t do it without them).

I would begin with a trinity: one finance domain expert, one marketing domain expert (or a supply chain expert, or an analyst representing your core business activity) and a good data analyst from IT (or, failing that, an Excel expert from within the business). With a remit from, and reporting to, the relevant departmental directors (or the board), define some useful, tangible questions & outcomes with a short timeframe (weeks) and get started. Treat it like an old fashioned science project: predict the outcomes and learn from the results, expected or otherwise.

But don’t mistake this for a purely technical exercise. Creating an independent BI unit will also potentially shift the control of information within the business and, as Francis Bacon said, “Knowledge is power”. The role of those steering the BI unit is crucial, and it’s never easy to implement change within an organisation. Another challenge for those in charge of a business is accepting the reality of solid analysis, especially when it flies in the face of current collective opinion.

Seek out expertise in-house if you have it. Otherwise, hire an experienced BI consultant or data architect who has a track record and can demonstrate competency. Hiring a full time ‘data scientist’, or even a rockstar ‘data analyst’, and setting them loose from the outset is effectively getting your tools out before you have a design or plan. Also, if this is new ground for your company, you may not have the expertise to recognise the right skills or to know when they are needed.

This article arrived as I was working on the final draft and is very timely. With the title “The End of the Expert: Why No One in Marketing Knows What They’re Doing”, it’s quite controversial, but makes some excellent points. This change of landscape could happen in any area of your business. Don’t leave it too late to change direction. Robust BI solutions can provide very early warnings, and analytics will help your company work out where to head next.

The latest debate is that Hadoop and other NoSQL solutions will bring about the death of the RDBMS and the EDW. OODBMS was going to kill the RDBMS ten years ago. It didn’t happen and it won’t happen this time.

Purposeful analytics solutions are shaping up to be very powerful tools, but I think right now, a CIO would be crazy to embark upon a program of shifting the corporate data asset out of proven RDBMS technology platforms, underpinned by an experienced and populous talent base, into the brave new world overseen by this new breed of data scientist. It needs time to mature and even then I am not so sure it is a one size fits all solution. EDW methodology is as old as relational theory and is still relevant.

What also concerns me is that government & retail marketing seem to be leading this near-hysterical big data push. This combination is worrying, and I am concerned about the standards and ethics behind some of the data mining initiatives being conceived and implemented. This ‘BIG data’ push is coming at a time when existing data protection instruments are coming apart at the seams, because they were conceived so long ago that they are no longer fit for purpose.

Finally, I recall reading an article in one of the computer magazines over a decade ago, which advised being wary of “silver bullet” solutions that promise the world, usually to be delivered in a guaranteed time frame and solving all your problems (usually sold by someone in a teflon coated suit). BE WARY. There are plenty of foxes & coyotes out there. I think the recent bad news for HP highlights that this is still very relevant today.

FAB Virgil
