Monday, July 14, 2014

Big Data doom mongers need to look outside of the marketing department

In every change there are hype machines that overplay and sages who call doom.  Into the Big Data arena steps David Searls to proclaim, in an article over at ZDNet, that Big Data is a myth and simply hype that is set to burst.
But big data, he said, is nothing more than the myth that collecting vast amounts of data can help companies know customers better than those customers even know themselves.
The bogeymen in this story are IBM and the consultants who have hyped it all up.  Then another sage jumps in:
Dr Matthew Landauer, co-founder of OpenAustralia, is equally sceptical about big data. "All it allows you to do is optimise your current business," he told ZDNet. "It's never going to tell you that you're doing business wrong or need another model."
The article then moves on to a complaint about privacy and security (which I'm not disputing), but its key point is that Big Data is a bubble.  I really have to disagree: with that definition of Big Data, with the claim that it can't drive innovation, and with the idea that it's a bubble.

Firstly, I don't agree that marketing and customer data is what Big Data is about.  The vast majority of my conversations on Big Data have nothing to do with an explosion of customer information (e.g. social media) but are instead about machine data, trading data, weather data and other massive data sets that historically companies couldn't cost-effectively run analytics on.

Social media and customer information is just one part of the challenge, and it's probably the most fluffy-bunny part and the most liable to be a bubble, but to infer from that that Big Data is just hype is like assuming all swans are white because you have seen a single white swan.

Secondly, there is the assertion that it cannot tell you that you are doing things incorrectly.  I'm not sure what sort of machine learning and other data science work Matthew Landauer has done, but I am surprised that someone with a PhD in Physics from Cambridge hasn't seen examples from his own field of just how much insight becoming data driven can deliver, insight that causes disciplines, companies and industries to make dramatic changes and adopt new approaches (the LHC at CERN, for instance, is quite clearly a Big Data application).  Finance is littered with examples where smarter algorithms identified that things were being done wrong and that new approaches would make more money.  By analysing and simulating you can absolutely find new ways that work significantly better.

Thirdly, I disagree about who started the Big Data hype.  IBM were far from being the leaders; that job goes to two industries.  Firstly the internet giants, Amazon, Google, eBay, Yahoo etc., who created entire new business models based on information, and secondly the engineering companies who saw new business models based on information.  Sure, 'marketing/social media' has come to be the default story used by the lazy, but that is far from saying it is actually the story.

Big Data marketing might pop; that doesn't mean that Big Data is hype.  Saying so would be like claiming that the failure of Association Football to become the dominant sport in North America means that Association Football is failing.  Big Data is already delivering benefits, in engineering in particular, and the challenges associated with the Internet of Things are not going to result in a reduction in information anytime soon.  Claiming that it's all just hype doesn't help move the state of the art forwards, and it certainly doesn't serve all those use cases which really are Big Data challenges.

But then the role of the doom-mongering sage has never been to be fair and balanced, but instead to take a specific example and declare 'We're all doomed' or 'the end is nigh'.  Where would the book deals be in 'Big Data has many specific use cases but some vendors are using it to hype sales of their technology in places where it doesn't really add value', or, to give it a book title, 'Salesmen: not always looking out for your best interests'?

Friday, June 27, 2014

Open Source as religion - when the Bazaar becomes a Cathedral

The seminal book on Open Source development, "The Cathedral and the Bazaar", talks eloquently about the difference between commercial software development and open source development.  In the past few years, however, there has been another shift, a shift where companies are actively releasing their technology into Open Source as a competitive differentiator: a claim of 'we are open' because the source code is open.

The selling point then is the number of 'committers' (developers) that the company has on the open source project, the argument being that they can get your bugs fixed more quickly because they have the inside track.

The competition between vendors using exactly the same open source distribution then becomes a question of who is 'purest' to the vision and who has the most bodies contributing to it.  If an external company takes that source and releases their own version, they are not simply frowned upon but are actively prevented from contributing, as this would dilute the corporate messaging of the commercial companies who first established, or who mainly contribute today to, that open source program.

This isn't an entirely new thing; we used to see it quite a bit with some of the Java pieces, and some would argue it's related to what Linus does with Linux.  There is, however, a very big difference.  In those previous cases it was normally a single individual who made the original release, and it was that individual who then exercised that control.  In Linus's case, he isn't the commercial arm behind any of it.

It's natural for this to have happened in the Open Source community as it's become a commercial competitive weapon, but it does mean that Open Source is ceasing to be that historical bazaar and is instead, in many cases, simply a different cathedral into which rigid company approaches are applied.  It's extremely hard for companies that have locked down millions in VC funding to allow their core market message of 'we own the code' to be diluted as their Open Source project becomes popular, as this would reduce their differentiation and thus their market multiple as they look to IPO.

Open Source remains a strong approach, and one that gives companies a level of security if a vendor ever goes bust, in that the code is still available.  But it's quite clear to me that the VC funding that has flooded into the space has really destroyed the previous ad-hoc bazaar approach and instead simply re-created the Cathedral approach, but with an Open Source release management system.

Tuesday, May 27, 2014

MDM isn't about data quality, it's about collaboration

I'm going to state a sacrilegious position for a moment: the quality of data isn't a primary goal in Master Data Management.

Now, before anyone responds with the perfectly correct 'Garbage In, Garbage Out' objection, let me explain.  Data Quality is certainly something that MDM can help with, but it's not actually the primary aim of MDM.

MDM is about enabling collaboration, collaboration is about the cross-reference

Why do you do an MDM project?  The answer is to join information between multiple systems and multiple parts of the organisation.  It's so the customer in SFDC is the same as the customer in SAP and in Oracle Financials, and when that customer hits the website you know who they are.  It's so the sales person can see all the invoices, orders and other elements related to their customer.  It's so you can see how a product goes through the various parts of the R&D and supply chain processes and track it all the way.

If everything was in one big system with a single database then you wouldn't really need MDM; you'd just need data quality to make sure the single record was a good one.  You need MDM because you are attempting to join across systems and business units.  So the real value from MDM is that cross-reference which tells you who the customer is and where all the information about them lives in the various systems... even if you never clean any of it.
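To make that cross-reference concrete, here is a minimal sketch in Python.  All the system names and record identifiers are hypothetical; real MDM products implement this as a registry or hub, but the core structure is just this mapping.

```python
# Minimal sketch of an MDM cross-reference: one master ID per real-world
# customer, mapped to that customer's local ID in each source system.
# System names and record IDs are hypothetical.
xref = {
    "MDM-0001": {"sfdc": "003B0000012X", "sap": "0000104511", "oracle_fin": "CUST-88213"},
    "MDM-0002": {"sfdc": "003B0000099Y", "sap": "0000104987", "oracle_fin": "CUST-88414"},
}

def local_ids(master_id):
    """Return every system's identifier for a given master customer."""
    return xref.get(master_id, {})

def master_for(system, local_id):
    """Reverse lookup: which master customer owns a given local record?"""
    for master_id, ids in xref.items():
        if ids.get(system) == local_id:
            return master_id
    return None

# The sales view: find everything about one customer across systems,
# without ever cleaning the underlying records.
print(local_ids("MDM-0001"))            # every system's ID for that customer
print(master_for("sap", "0000104511"))  # MDM-0001
```

Note that the cross-reference delivers the join even if every attribute in the source records stays exactly as dodgy as it was.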

So this is how you sell MDM to the business: not on data quality, which is a secondary benefit, but as something that will enable the business to collaborate better and function more effectively.

Sometimes Quality doesn't count

The reality is that total quality isn't always what the business wants; they know some data is dodgy, so the question is how dodgy it is, and knowing that when you use it to make decisions.  Lots of social media is amazingly poor quality, but taken in volume, trends can be seen.  What makes it more valuable, though, is when you can enable that cross-reference between the high-quality and the lower-quality data, so you can see the trends of your customers and products, not just trends in noise.

Focus on collaboration, focus on the cross reference, quality will follow

So, having said that Data Quality isn't the primary focus, it is actually how you enable that pesky cross-reference; you just apply it only to the information that matters, the core information required for the cross-reference.  Thus you get a higher-quality core identification of the customer, and everyone understands why they are doing it: the quality enables the cross-reference, which enables the collaboration.
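As a minimal sketch of what 'quality only on the core' can look like, here is a toy match key built from three illustrative attributes (name, email, postcode).  The attribute choice and cleansing rules are hypothetical, and real matching engines are far more sophisticated, but the principle is the same: spend the cleansing effort on the identifying core only.

```python
import re

def match_key(name, email, postcode):
    """Build a deterministic match key from only the core identifying
    attributes -- the one place up-front data quality effort is spent."""
    name_tokens = sorted(re.findall(r"[a-z]+", name.lower()))  # order-insensitive
    return "|".join(["".join(name_tokens),
                     email.strip().lower(),
                     postcode.replace(" ", "").upper()])

# Two raw records from different systems resolve to the same key,
# even though neither source record has been cleaned in place.
a = match_key("Smith, John ", "J.Smith@Example.com", "sw1a 1aa")
b = match_key("John Smith",   "j.smith@example.com", "SW1A 1AA")
print(a == b)  # True
```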

If the business doesn't care about quality, why do you?
Now, once you have that quality core, a minimum set of attributes required to uniquely identify the customer, you will often want to expand that quality to more attributes.  But stop and think: have the business asked for it?  That is quite an important point.  You might think it's an absolute disaster that a given attribute isn't used in a standard way, but it could be that no-one in the business gives a stuff, so tell them about the issue but let them decide whether they want to spend the money making it better.  If they don't, document that they don't, so if they come back you can say 'great, so let's re-prioritise it', which is much better than 'oh, so I spent money doing something that doesn't matter'.

The more you federate the more collaboration matters
The reason that MDM matters is that more and more business is about collaboration, both internal and external.  This means that the business value of MDM has shifted from being about the data quality in reports to being an integral part of how a business works.  Data Quality isn't irrelevant in this world, but it has turned from being the goal of MDM into a tool that helps enable the primary goal, which is collaboration.  As the need to digitally collaborate with partners and customers increases, so the business value of that MDM cross-reference increases, both in operations and as the bit that helps you link up all of those big data sources to create a global view.

MDM is the Rosetta Stone that enables people to collaborate, so focus on collaboration not quality. 

Thursday, May 22, 2014

Lipstick on the iceberg - why the local view matters for IT evolution

There is a massive amount of IT hype that is focused on what people see: it's about the agile delivery of interfaces, about reporting, visualisation and interaction models.  If you could weight hype, it would be quite clear that 95% of it is about this area.  It's why we need development teams working hand-in-hand with the business; it's why animations and visualisation are massively important.

But here is the thing.  SAP, IBM and Oracle have huge businesses built around the opposite of that: large transactional elements, things that sit at the back end and actually run the business.  Is procurement something that needs a fancy UI?  I've written before about why procurement is meant to be hated, so no, that isn't an area where the hype matters.

What about running a power grid? Controlling an aeroplane?  Traffic management?  Sure, these things have some level of user interaction, and often it's important that it's slick and effective.  But what percentage of the effort is about the user interface?  Less than 5%.  The statistics out there show that over 80% of spend is on legacy, and even the new spend is mainly on transactional elements.

This is where taking a Business SOA view can help, it starts putting boundaries and value around those legacy areas to help you build new more dynamic solutions.  But here is a bit of the dirty secret.

The business doesn't care that it's a mess behind the scenes... if you make it look pretty

It's a fact that people in IT regularly appear shocked at.  But again this is about the SOA Christmas: the business users care about what they interact with, about their view for their purposes.  They don't care if it's a mess for IT as long as you can deliver that view.

So in other words the hype has got it right: by putting lipstick on the iceberg, and by hyping the lipstick, you are able to justify the wrapping and evolution of everything else.  Applying SOA approaches to data is part of the way to enable that evolution and start delivering the local view.

The business doesn't care about the iceberg... as long as you make it look pretty for them. 

How to select a Hadoop distro - stop thinking about Hadoop

Sqoop, Flume, Pig, ZooKeeper.  Do these mean anything to you?  If they do, then the odds are you are looking at Hadoop.  The thing is that while that was cool a few years ago, it really is time to face the fact that HDFS is a commodity, MapReduce is interesting but not feasible for most users, and the real question is how we turn all that raw data in HDFS into something we can actually use.

That means three things

  1. Performant and ANSI-compliant SQL matters - if you aren't able to run traditional reporting packages then you are making people change for no reason, and if you don't have an alternative then you aren't offering an answer
  2. Predictive analytics, statistical, machine learning and whatever else they want - this is the stuff that will actually be new to most people
  3. Reacting in real-time - and I mean FAST, not BI fast but ACTUALLY fast
The last one is about how you ingest data and then perform real-time analytics which can incorporate forecasting information from Hadoop into real-time feedback that is integrated back into source systems.
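As a minimal sketch of that real-time point, here is a toy event scorer in Python.  The field names and threshold are all hypothetical: the idea is simply that a batch job on Hadoop periodically exports model parameters, and the real-time path applies them to each incoming event and feeds decisions back to the source system.

```python
# Toy real-time scorer: applies batch-computed (e.g. Hadoop-derived) model
# parameters to a stream of events. All names and values are hypothetical.

def score(event, model):
    """Flag events whose risk exceeds the batch-computed threshold."""
    return event["risk"] > model["churn_threshold"]

def handle_stream(events, model):
    for event in events:
        if score(event, model):
            # In a real system this would call back into the source system
            # (CRM, billing, etc.); here we just print the decision.
            print(f"ALERT customer={event['customer_id']} risk={event['risk']}")

# Example usage with in-memory stand-ins for the stream and the model;
# in practice the model would be reloaded as each Hadoop batch completes.
model = {"churn_threshold": 0.72}
handle_stream(
    [{"customer_id": "C-17", "risk": 0.81},
     {"customer_id": "C-18", "risk": 0.40}],
    model,
)
```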

So Hadoop and HDFS are actually the least important parts of your future: critical, but not where the differentiation is.  I've seen people spend ages looking at the innards rather than just getting on and actually solving problems.  Do you care what your mobile phone network looks like internally?  Do you care what the wiring back to the power station looks like?  HDFS is that for data: it's the critical substrate, something that needs to be there.  But where you should concentrate your efforts is on how it supports the business use cases above.

How does it support ANSI-compliant SQL?  How does it support your standard reporting packages?  How will you add new types of analytics, and does it support the advanced analytics tools your business already successfully uses?  How does it enable real-time analytics and integration?
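To make the first of those concrete, here is a minimal sketch using Spark SQL, one of several SQL-on-Hadoop engines.  The HDFS path and schema are hypothetical; the point is that the query itself is plain ANSI SQL, exactly what an existing reporting package would emit.

```python
from pyspark.sql import SparkSession

# Standard SQL over files sitting in HDFS -- the reporting tool (or the
# analyst) shouldn't have to care that the storage layer is Hadoop.
spark = (SparkSession.builder
         .appName("sql-on-hadoop-sketch")
         .getOrCreate())

# Hypothetical path and schema: orders landed in HDFS as Parquet files.
orders = spark.read.parquet("hdfs:///data/orders")
orders.createOrReplaceTempView("orders")

monthly = spark.sql("""
    SELECT customer_id,
           date_trunc('month', order_date) AS month,
           SUM(amount)                     AS revenue
    FROM   orders
    GROUP  BY customer_id, date_trunc('month', order_date)
""")
monthly.show()
```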

Then of course it's about how it works within your enterprise: how does it work with data management tools, and how does its monitoring fit in with your existing tools?  Basically, is it a grown-up or a child in the information sand-pit?

Now this means it's not really about the Hadoop or HDFS piece itself; it's about the ecosystem of technologies into which it needs to integrate.  Otherwise it's going to be just another silo of technologies that doesn't work well with everything else and ultimately doesn't deliver the value you need.

Thursday, April 24, 2014

Data Lakes will replace EDWs - a prediction

Over the last few years there has been a trend of increased spending on BI, and that trend isn't going away.  The analyst predictions, however, have understandably been based on the mentality that the choice was between a traditional EDW/DW model or Hadoop.  With the new 'Business Data Lake' type of hybrid approach it's pretty clear that the shift is underway for all vendors to have a hybrid approach rather than a simple choice between Hadoop or a Data Warehouse.  So taking the average of a few analysts' figures we get a graph that looks like this
In other words, 12 months ago there was no real prediction of hybrid architectures. Now, however, we see SAP talking about hybrid, IBM about DB2 and Hadoop, and Teradata doing the same.  What this means is that we'll see a switch between traditional approaches and hybrid Data Lake centric architectures that will start now and accelerate rapidly.
My prediction therefore is that these hybrid Data Lake architectures will rapidly become the 'new normal' in enterprise computing.  There will still be more people taking traditional approaches this year and next, but the choice for people looking at this is whether they want to get on the old bus or the new bus.  This for me is analogous to what we saw with proprietary EAI against Java-based EAI around the turn of the century: people who chose the old school found themselves in a very bad place once the switch had happened.

What I'm also predicting is that we will see a drop rather than a gain in 'pure' Hadoop projects, as people look to incorporate Hadoop as a core part of an architecture rather than as standalone HDFS silos.