Every business from startups to larger established businesses needs to track details about their customers, products, sales, purchases, social media, website logs, to name a few. By extracting, manipulating and analyzing this data you can determine key metrics to help you understand more about your customers and grow your business. In part 1 we talked about how to kick start your data warehousing with BI360 and now we need to know what to do with all this data.
Basically you can use your Data Warehouse for financial statement reporting and analysis, dashboards and data mining. I previously went through how to use BI360’s One Stop Reporting here (link to Part 5 – BI Series Nov 2013) so I will be talking more about data mining.
Data mining can be defined as the process of analyzing large quantities of data through automatic or semi-automatic tasks to extract previously unknown interesting patterns. Over the years, storage has grown at a faster rate than computing power. As a result, companies have stored enormous amounts of data and have become data-rich but knowledge poor. The main purpose of data mining is to increase the value of these large datasets by extracting knowledge from them and making the data useful to the business.
In general, there are two types of data mining – predictive and descriptive. Predictive data mining will use variables or fields from a dataset to predict unknown or future values. The goal is to produce a model from a defined dataset that is used to perform classification, estimation, and other data mining tasks. On the other hand, descriptive data mining will focus on finding patterns that describe the data and can be interpreted by data analysts or business users. The goal is to gain understanding of the data by uncovering new relationships, patterns, and important information regarding the dataset.
Data mining can be valuable in virtually any industry and business scenario. The following are just a few examples of how data mining can improve decision making and give businesses a better return on investment with their Business Intelligence (BI) projects:
Recommendation Generation and Retail Floor Optimization:
What products/services should a company offer to its customers? By heat mapping customer traffic patterns it can help place offerings. Casinos use this to help optimize the game theme selection, denomination selection, placement and removal
E-Commerce Websites – analyze purchasing behaviors of customer population and recommend products based on items in a customer’s cart.
Analyze items that don’t fit a certain pattern. Credit card, insurance companies and business with POS systems use this method to detect fraud.
Which customers are most likely to switch to a competitor? When will they leave and can it be acted on before they do? Based on findings, companies can improve relationships and offer those customers discounts or loyalty incentives.
How many customers or sales per geographic location.
Determine the risk of a loan or mortgage based on the customer profile and amount they are asking
Companies can make decisions based on cost and risk of the loan by using historical customer data
How well do companies know their customers? What is their Life Time Value (LTV)?
Determines behavioral and descriptive profiles for their customers in order to target marketing campaigns. The behavioral and descriptive profiles high LTV customers and be used and applied against first time customers’ behavioral and descriptive profiles to capture their loyalty.
Estimate how much sales and/or inventory for each week, month, and quarter of a specific year
Data Mining Tasks
Determining the correct task and algorithm to apply to your dataset is a crucial step to achieving an accurate and useful data mining model. In some cases, it will be quite obvious which task will be the most accurate to use depending on the nature of your data. More often than not, you will need to explore and combine multiple tasks before arriving at a single solution. The following section describes the seven basic data mining and some additional resources and discussion on each:
Classification: Can be used to identify loan applicants as low, medium, or high credit risks based on attributes such as income, credit score, employment history, home ownership, and amount of debt. In this case the target would be credit score, and the other attributes would be the predictors.
Clustering: By using attributes such as Income and Age, three clusters could be created (could be more than 3 depending on data) – 1. Younger population with low income, 2. Middle Age with higher income, and 3. Older Customers with lower income.
Association: (also called Market Basket Analysis) Perhaps most famously, Amazon uses these types of analyses to suggest additional items you may be interested in purchasing, based upon other items frequently bought together.
Sequence Analysis: Can be used to analyze the sequence of web clicks on a website. The results are probabilities of the next click in the sequence, i.e. if a user clicks ‘News’, there is a 20% chance the next click will be ‘Sports’ and a 30% chance it will be ‘Weather’.
Social Media Analysis
Deviation Analysis: Used to find the rare cases that do not match their behavior to the ‘norm’; most commonly used in fraud detection – finds the transactions that don’t match the spending habits of the customer.
Regression: Can be used to predict a value of a house based on location, number of rooms, land size, and crime rates
Forecasting: Based on the sales from last year by month, how many cases of soda will the northeast store sell in January? The output would be the estimated number of cases of soda for January.
For a quick how to on using SSAS Data Mining check out my previous post here (