
What is Data Mining?

In this week's topic I will explore what data mining is, its different meanings, how the term is used, etc. I will give you my interpretation of what it is and how other descriptions of data mining can be categorised.

Every article you read and every presentation you hear gives a slightly different description, or rather hints at a description, of how data mining is used in a product or application. By hinting at what data mining is, the people behind them try to claim that they are using it, as it gives their products, applications and services a higher degree of sophistication compared to others. There is also the idea that it is one of those trendy terms that is thrown out without people really knowing what it is about.

Data Mining Definition

One of the most commonly cited definitions of data mining is "the non-trivial extraction of previously unknown and potentially useful information from data", by Usama Fayyad et al. (Chief Data Officer, Yahoo Inc) in their landmark paper back in 1996.

Based on this definition, data mining does not involve basic analytics, decision making based on defined rules, being able to identify events based on current data, etc. Yet these types of scenarios are typically talked about as being data mining. If we go back to the definition by Fayyad above, "non-trivial" means that we cannot simply write some code or queries that pull data out of our database to answer some simple questions. Another important part of the definition is "potentially useful information", which tells us that sometimes, and maybe in a lot of cases, data mining does not give us anything useful. It can give us useful information only if we have a good understanding of the data, the business rules of the data, the meta-data, how the rules and the data relate to each other, etc. All of this requires extensive experience of working with the data. Who is best at doing this? Database designers and developers. People with a statistics background (typically what you see in data mining roles) have to go and learn all about the data, the business rules, the meta-data, etc. This can be a huge waste of time and resources, as the database people are generally ignored.

Some examples

I was at an IT conference last week (I was co-author of a paper on Opinion Data Mining). One of the keynote talks was given by a technical lead in IBM (one of two thousand in the company). He gave some good examples of how Business Intelligence (BI) could be used to manage the energy needs of a new city being built out in the Middle East. He also gave another example of how BI is being used in and around Galway city and its coastline. There were several mentions of data mining during his talk, but I don't think any of his examples reflected what data mining is. Yes, he did give examples of how you can intelligently use your data. For example, if an object is spotted out in Galway Bay then you can predict where this object will come to shore. But data mining is not the technique that is used in this case. Instead it is a rules-based type of system that takes into account a number of factors, like the size of the object, its current position, currents, wind direction, etc. Using these rules (and not data mining) they can identify the landing position and let all the necessary bodies know (like the coast guard, Galway County Council, environmental control, etc.).
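To make the distinction concrete, here is a minimal sketch of what a rules-based prediction of this kind might look like. It is not the system described in the talk. The drift rule (water current plus a fixed fraction of the wind), the `leeway` factor, the flat x/y coordinates and all names are assumptions made up for illustration. The point is that the rule is written down in advance by someone who understands the domain, and nothing is discovered from the data.

```python
from dataclasses import dataclass


@dataclass
class Drift:
    east_kmh: float   # eastward speed component, km/h
    north_kmh: float  # northward speed component, km/h


def predict_position(start, current, wind, leeway, hours):
    """Apply one fixed, hand-written rule (a hypothetical one):
    drift velocity = water current + leeway * wind, held constant.

    start is an (x, y) position in km on a flat local grid.
    """
    x, y = start
    vx = current.east_kmh + leeway * wind.east_kmh
    vy = current.north_kmh + leeway * wind.north_kmh
    return (x + vx * hours, y + vy * hours)


# Illustrative inputs only; the 3% leeway factor is an assumption.
position = predict_position(
    start=(0.0, 0.0),
    current=Drift(east_kmh=1.0, north_kmh=0.0),
    wind=Drift(east_kmh=10.0, north_kmh=20.0),
    leeway=0.03,
    hours=2.0,
)
```

However many factors such a system takes into account, it remains a calculation over known rules, which is why it does not fit the "non-trivial extraction of previously unknown" information in the definition above.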

Generally, data mining can be used when you have a mature BI environment in your organisation that includes not just transactional and business reporting, but also data warehousing, data analytics, prediction systems (based on rules), etc. Data mining allows you to explore for and identify patterns in your data (and you really need lots of data). Going back to the definition of data mining, a lot of the results from a data mining project may not be of any value. What you are looking for are the nuggets of gold that exist in the data, and it may take some time to find these, if they exist at all.

One of the aims of this week's posting was to explore what data mining really is. At this point I haven't really talked much about what it is, but what I hope you have gotten so far is that the term data mining is overused in the IT world and can be seen as one of those trendy words that organisations like to use (and use incorrectly). Data mining is used as an umbrella term that covers any handling of your data that involves a bit of processing, applying some rules and some analytics.
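As a very small illustration of what exploring for patterns can mean, the sketch below counts which pairs of items occur together in a set of records and keeps only the pairs that appear often enough. Pair counting with a minimum support threshold is just one simple pattern-discovery technique, picked here for brevity. The baskets, the function name `frequent_pairs` and the threshold are all made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose share of transactions is >= min_support."""
    counts = Counter()
    for items in transactions:
        # Count each unordered pair at most once per transaction.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: n / total
        for pair, n in counts.items()
        if n / total >= min_support
    }


# Made-up baskets, purely for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]
patterns = frequent_pairs(baskets, min_support=0.5)
```

Nobody told the code which pairs to look for, which is the difference from a predefined rule. On real data most of what comes back will be obvious or useless, and it takes someone who knows the data and its business rules to spot a nugget among the results.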

Over the coming weeks we will explore what Data Mining really is and what the different stages of a Data Mining project are.

The next posting will be about CRISP-DM, which is an industry-neutral, product-neutral data mining life cycle.

Make a difference to your bottom line with BI

In the first two chapters the authors develop their core argument. BI projects are only successful when they have a positive impact on the bottom line of an organisation. BI, so the central and simple theme of the authors goes, needs to give an organisation competitive advantage by either increasing its profits or decreasing its costs.

They continue to say that BI must not be implemented in an unstructured manner, but has to be at the core of the business and its processes. Therefore, it needs to be aligned with the overall business strategy. One of the fundamental mistakes an organisation can make is to take an ad hoc approach to BI.

Indeed, it is my own experience that the full potential of BI is only unlocked by few organisations. Often BI is just used to produce a report here and there but is not embedded in the core business processes: reporting is disjointed, without an overall strategy, and most of the time report results are not followed up by action.

This stands in contrast to an organisation that uses BI strategically, e.g. to identify valuable customers that are given preferential treatment or special conditions, as opposed to less profitable customers.

BI opportunity analysis, according to the authors, stands at the beginning of each BI project. It requires intimate knowledge of the industry that the organisation operates in (competitors, industry trends, etc.), an in-depth understanding of the organisation's business processes and business drivers, and a thorough understanding of how to align BI and Data Mining techniques with the BI objectives. "For any given company in any given industry, we should systematically evaluate its industry, strategy, and business design as a means of identifying potential BI opportunities". Unfortunately, this is a rare combination of skills.

In the chapters that follow (chapters 3 to 6), the authors continue to develop their iterative, full lifecycle methodology, the BI Pathway method. It is split into three phases. The architecture phase includes the development of the BI portfolio, the BI readiness assessment, and business re-engineering models (How is information currently used and how will it be used in the future? How will BI influence and transform business processes?). The implementation phase more or less follows traditional, more technically focused implementation methods (Kimball, Inmon, etc.). During the operational phase the implementation is fine-tuned and continuously improved.

In chapter 7 the authors give very useful practical examples of how BI can be aligned with business processes. This is a good starting point for getting ideas of how to embed BI in the everyday business processes of an organisation.

Chapter 8 offers a good overview of the mistakes that are typically made in a BI project.

In my opinion this is one of the few books that actually offers fresh insights. Coming mainly from a technical background, I found this book an eye opener. Even though it was always clear to me that BI projects need to be driven by business processes, I have to admit that I did not understand the full extent of this until I had read this book. What I also liked were the numerous case studies and practical examples that are given, which are so often lacking in other BI books. The only criticism I have is that more of this hands-on stuff would have been even better. What I also found quite useful is the executive summary at the end of each chapter. All in all, a highly recommended book for both the technical and business BI practitioner, the novice and the expert.

Listen to this Podcast where co-author Nancy Williams discusses her thoughts with Claudia Imhoff et al