Use Power BI and R to Quickly Identify Business Insights

In statistics, the correlation coefficient, or Pearson’s correlation, is a measure of the strength and direction of the association between two continuous variables. The value of the correlation, r, is always between +1 and –1. To interpret its value, see which of the following values your correlation r is closest to:

  • Exactly –1. A perfect downhill (negative) linear relationship
  • –0.70. A strong downhill (negative) linear relationship
  • –0.50. A moderate downhill (negative) relationship
  • –0.30. A weak downhill (negative) linear relationship
  • 0. No linear relationship
  • +0.30. A weak uphill (positive) linear relationship
  • +0.50. A moderate uphill (positive) relationship
  • +0.70. A strong uphill (positive) linear relationship
  • Exactly +1. A perfect uphill (positive) linear relationship

Don’t make the mistake of thinking that a correlation of –1 is a bad thing, indicating no relationship. Just the opposite is true! A correlation of –1 means the data are lined up in a perfect straight line, the strongest negative linear relationship you can get. The “–” (minus) sign just happens to indicate a negative relationship, a downhill line. Most statisticians like to see correlations beyond at least +0.5 or –0.5 before drawing any conclusions. Additional information on Pearson’s correlation can be found here.

Putting it All to Use

This is all great knowledge, but how can we apply it in a business environment? When I worked for a casino, we kept a daily record of Coin In (gross sales), High Temp, Low Temp and Fuel Price in an Excel spreadsheet and tried to find correlations between these data points. Try to find the correlation within this data.

Excel Casino data
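If you want to sanity-check the visual outside of Power BI, the same Pearson calculation can be written directly in T-SQL. The sketch below assumes the daily records have been loaded into a hypothetical dbo.CasinoDaily table (RecordDate, CoinIn, HighTemp, LowTemp, FuelPrice); the table and column names are illustrative only and are not part of the original workbook.

    -- Minimal sketch: Pearson's r between Coin In and Low Temp for November,
    -- assuming a hypothetical dbo.CasinoDaily table of daily records.
    SELECT (n * SumXY - SumX * SumY)
           / NULLIF(SQRT(n * SumXX - SumX * SumX) * SQRT(n * SumYY - SumY * SumY), 0) AS PearsonR
    FROM (
        SELECT CAST(COUNT(*) AS float)                              AS n,
               SUM(CAST(CoinIn  AS float))                          AS SumX,
               SUM(CAST(LowTemp AS float))                          AS SumY,
               SUM(CAST(CoinIn  AS float) * CAST(LowTemp AS float)) AS SumXY,
               SUM(CAST(CoinIn  AS float) * CAST(CoinIn  AS float)) AS SumXX,
               SUM(CAST(LowTemp AS float) * CAST(LowTemp AS float)) AS SumYY
        FROM dbo.CasinoDaily
        WHERE MONTH(RecordDate) = 11        -- mirrors the "Nov" slicer selection
    ) AS s;

A value near +1 or –1 from a query like this should line up with a large, dark circle in the R Correlation Plot visual described below.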

With Power BI and R, we can find these associations much faster, with slicers for interactivity to drill into the data points. I first downloaded the R Correlation Plot custom visual located here and then imported the Excel spreadsheet from above into Power BI. I was then able to easily identify the correlations in a matter of minutes.

Positive correlations are identified with increasingly dark blue circles, and negative correlations with red; the stronger the correlation, the bigger and darker the circle. Selecting “Nov” from the calendar slicer, you will notice that the intersection between “Low Temp” and “Coin In” has a positive correlation (blue circle), so the lower the temperature, the less coin in you will realize for that month.

Power BI and R Correlation Plot

Select “Jul” from the calendar slicer and notice that the correlation is different.

In this case, the correlation is zero between temperature and coin in.

Power BI and R Correlation Plot image 2


Hope this helps you quickly identify business insights in your organization.

Supercharge Your Dynamics GP Dashboards & Reports with Power BI

In my last blog post, I walked you through creating a financial dashboard using Power BI with Dynamics GP data. Since then, I have created additional analyses and measures.


If you look closely at the bottom left, you will see two additional reports for Sales Summary and Customer Analysis.

Sales Summary

The data used from Dynamics GP for this report comes from the Sales Order Processing (SOP), Inventory (IV) and Customer (RM) tables. The Sales Summary contains the same report filters (Year, Month and Company) as the Financial Summary report. The top left visual shows total sales by product class, ranked in descending order. The bottom left visual shows total sales by customer class/sales channel. The two visuals at the top middle and top right show total sales and transaction counts based on the filters selected, with a seven-day moving average behind them. The bottom right visual shows a comparison between cumulative sales and cumulative costs.
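The report itself is built from the imported GP tables with DAX measures, but for readers who want to see where the numbers come from, here is a rough T-SQL sketch of the “total sales by product class” figure. It uses the standard Dynamics GP table names (SOP30200/SOP30300 history tables, IV00101 item master, IV40400 item class setup) as assumptions; verify them against your own installation.

    -- Illustrative sketch only: historical invoice sales by item class, ranked descending.
    SELECT  ic.ITMCLSDC        AS ProductClass,
            SUM(l.XTNDPRCE)    AS TotalSales
    FROM    SOP30300 l                                    -- SOP line history
    JOIN    SOP30200 h  ON h.SOPTYPE  = l.SOPTYPE
                       AND h.SOPNUMBE = l.SOPNUMBE        -- SOP header history
    JOIN    IV00101  i  ON i.ITEMNMBR = l.ITEMNMBR        -- item master (item class code)
    JOIN    IV40400  ic ON ic.ITMCLSCD = i.ITMCLSCD       -- item class setup (description)
    WHERE   h.SOPTYPE = 3                                 -- invoices only
    GROUP BY ic.ITMCLSDC
    ORDER BY TotalSales DESC;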


Customer Analysis

The Customer Analysis report allows you to dig into Dynamics GP sales data by Customer, Year, Month and Company. By selecting a customer on the left, you get to see three years of comparative sales, sales detail, last sale date and amount, total sales, sales last year, total profit and percent sales growth for that customer.
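Again, the report calculates these figures with DAX measures over the imported tables, but the arithmetic behind “total sales, sales last year, last sale date and percent sales growth” looks roughly like the sketch below. The GP table and column names (SOP30200, CUSTNMBR, DOCDATE, DOCAMNT) are the standard ones and are assumptions here; returns and voids are ignored for brevity.

    -- Illustrative sketch only: per-customer sales summary with year-over-year growth.
    SELECT  h.CUSTNMBR                                                              AS CustomerNumber,
            MAX(h.DOCDATE)                                                          AS LastSaleDate,
            SUM(h.DOCAMNT)                                                          AS TotalSales,
            SUM(CASE WHEN YEAR(h.DOCDATE) = YEAR(GETDATE())     THEN h.DOCAMNT END) AS SalesThisYear,
            SUM(CASE WHEN YEAR(h.DOCDATE) = YEAR(GETDATE()) - 1 THEN h.DOCAMNT END) AS SalesLastYear,
            (SUM(CASE WHEN YEAR(h.DOCDATE) = YEAR(GETDATE())     THEN h.DOCAMNT END)
           - SUM(CASE WHEN YEAR(h.DOCDATE) = YEAR(GETDATE()) - 1 THEN h.DOCAMNT END))
           / NULLIF(SUM(CASE WHEN YEAR(h.DOCDATE) = YEAR(GETDATE()) - 1 THEN h.DOCAMNT END), 0) AS SalesGrowth
    FROM    SOP30200 h              -- sales transaction history (header)
    WHERE   h.SOPTYPE = 3           -- invoices only
    GROUP BY h.CUSTNMBR;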


Future Dashboard developments

In the coming months, I plan to build additional Power BI reports on my Dynamics GP data. These reports will include:

  • Vendor Analysis
  • Inventory Analysis
  • Field Service – Contract and Service Call Analysis

Stay tuned for the future developments of the reports and an overview of their functionality.

WOW! A Dynamics GP multi-company Financial Dashboard – Part Deux

Back in October of 2015, I posted a blog about how to modify Jared Hall’s awesome Excel Financial Dashboard, which you can download here. The problem with his solution is that most Dynamics GP environments are multi-company setups, making these dashboards not very functional without navigating and opening multiple Excel spreadsheets to view each company’s dashboard. So, I took it upon myself to modify the one provided by Jared Hall to work in a multi-company Dynamics GP setup, but that was so 2015!

This is 2017, so let’s create our Financial Dashboard in Power BI. The best part about using Power BI over Excel is that we can make it available through the Power BI service and via the iOS and Android apps, making it a much better option for mobility and availability.

The Parts to the Solution

For my solution, I chose to use the following tools:

  • A Small DataMart
  • Power BI Designer
  • Power BI Service

The DataMart

I chose to implement a DataMart to simplify importing the tables into Power BI Designer. I created the following tables to implement my Financial Dashboard:

  • DimAccount
  • DimHeader – to summarize the report layout
  • DimDate
  • DimCompany
  • FactFinance – General Ledger data from the GL20000 and GL30000 tables of each company
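How you load FactFinance is up to your own ETL, but the sketch below shows one way a single company might be loaded. The DataMart column names (Company_Key, Account_Key, Date_Key, Amount) and the company database name (TWO, the GP sample company) are assumptions for illustration; GL20000 and GL30000 are the standard GP open-year and history tables.

    -- Minimal load sketch for one company; repeat (or loop) per company database.
    INSERT INTO dbo.FactFinance (Company_Key, Account_Key, Date_Key, Amount)
    SELECT  co.Company_Key,
            da.Account_Key,
            CONVERT(int, CONVERT(char(8), gl.TRXDATE, 112)),   -- yyyymmdd date key
            gl.DEBITAMT - gl.CRDTAMNT
    FROM (
            SELECT ACTINDX, TRXDATE, DEBITAMT, CRDTAMNT FROM TWO..GL20000   -- open year
            UNION ALL
            SELECT ACTINDX, TRXDATE, DEBITAMT, CRDTAMNT FROM TWO..GL30000   -- history
         ) AS gl
    JOIN dbo.DimAccount da ON da.ACTINDX   = gl.ACTINDX
    JOIN dbo.DimCompany co ON co.CompanyDB = 'TWO';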

Dashboard Design

I imported the above tables into Power BI Designer rather than using DirectQuery mode. Using the Import option allows for the full functionality of DAX and, more importantly, the Time Intelligence functions of DAX. Your table relationships should look like the screenshot below after you import the data.


The DAX Measures

The Profit & Loss Statement layout is handled by several DAX measures, the DimHeader table and the Sign and Report Sign columns within the Account table. I created the following DAX measures in the order listed below:

  1. Header Order
  2. HeaderCalcType
  3. Dollars
  4. Dollars with Sign
  5. Dollars with Report Sign
  6. Running Dollars with Sign
  7. Current Period
  8. Cumulative Sales (Selected)

The code for the DAX measures can be downloaded here from the Finance Dashboard template.

Visualize This 

An easy-to-use, modern, multi-company, cloud-based and mobile Finance Dashboard that shows your critical data so you can spot trends, share insights, and make smart decisions.



I’m loving the Power BI Service and all the modern cloud-based analytics that it can bring to your dashboards and reports.

Power BI Visuals in On Premise SQL Server Reporting Services

Yesterday, January 17, 2017, Microsoft released the Power BI reports in SQL Server Reporting Services (SSRS) technical preview. You can read all about it here. I downloaded the technical preview and installed it on my system and deployed a Power BI Visual to my SSRS web site. Here are the steps to get it installed on your system.

· Download the SSRS Technical Preview and Power BI Designer for the preview


· The Technical Preview has the following system requirements:

    • Your own VM or server (not one in production use)
    • Windows Server 2012 or later (or Windows 8 or later)
    • .NET Framework 4.5.2 or later
    • SQL Server Database Engine (2008 or later), to store the report server database
    • SQL Server Analysis Services (2012 SP1 CU4 or later), to store your data models

· I first installed the SQLServerReportingServices.exe on my system and accepted the licensing terms and agreement and then selected the “Install” button.


· After the installation completed I needed to configure the technical preview of SSRS by selecting the “Configure Report Server” button.


· I provided my SQL Server name to connect the Reporting Services Configuration Manager.


**NOTE – While the Configuration Manager let me configure the SSRS technical preview web site with my original VM’s computer name of ‘BARRYCROWEL38F0’, I could not publish my Power BI report with that computer name. I shortened the name to 7 characters and removed the numbers from the name to get the Power BI Designer Technical Preview to function with the SSRS web site.

· Design a Power BI visual with the Power BI Designer Technical Preview. In my visual, I connected to a SQL Server Analysis Services (SSAS) database. To publish your report to SSRS, select the ‘Save As’ menu option.

Power BI Technical Preview SSRS

· Follow the screen prompts and supply the report server name.

Power BI Technical Preview SSRS

· Then enter the report name.

Power BI technical preview SSRS

The final product is a Power BI visual with slicer functionality in an on premise SSRS web site.

**NOTE – the current SSRS technical preview does not work with custom visuals or R visuals.

Power BI On Premise


I’m loving the first on premise technical preview and can’t wait to see what’s next.

Dynamic Security – a look at the final product


Recently I posted a blog titled “How to implement Analysis Services Dynamic Security based on Microsoft CRM user access rights”. That post talked about how to implement the back-end structure needed to put it in place but didn’t show the result of all that hard work.

The dynamic row level security (RLS) that we set up inside Analysis Services will work within reports and dashboards built in Reporting Services, Power BI and Excel. Power BI has other ways of deploying RLS when not using an Analysis Services connection. You can learn more about that here:



The End Result

We built our dashboard in Power BI Designer and published it to our end users. One of the end users, “Tom”, has access to 558 CRM customer accounts within the US. When he accesses the dashboard, he can view the results that he has privileges to see. (See Figure 1 below)

Power BI dashboard image 1

Figure 1 – Tom’s access view

If “Tom” tries to access an area that he doesn’t have privileges to, say Canada, the dashboard allows him to select the filter but changes the chart results to show no data. (See Figure 2 below)

Power BI dashboard image 2

Figure 2 – Tom’s view with Canada selected

Additionally, the chart(s) change based on the end user that’s accessing the dashboard. In Figure 3 below, I show an administrative user accessing the dashboard and the result shown to them when selecting the “Canada” filter.

Power BI Dashboard image 3

Figure 3 – Administrative user view of Canadian Sales


I hope this helps you with your future Analysis Services development and Dashboard projects.

How to implement Analysis Services Dynamic Security based on Microsoft CRM user access rights


In Analysis Services implementations, there is always a demand for implementing security. In this post, I’ll describe a dynamic, or attribute-based, security solution in Analysis Services that I built for a customer based on Microsoft CRM salesperson account access rights. The data-level, or row-level, access is based on CRM security access rights; a SQL script and SSIS package deploy each user’s Active Directory account and customer numbers to a custom CRM table. By leveraging the CRM security already established, we reduce the administrative work needed to make additional security changes in Analysis Services whenever there is a CRM security change.

The Setup

Here are the pieces that we need to deploy to put the security in place:

1. Microsoft CRM table script

2. DataMart Acct_Access table script

3. DataMart User table script

4. Add Dynamic Security role to Cube and define MDX to filter Active Directory user access

5. Create SSIS package to populate and update Acct_Access and User tables in DataMart

1. Microsoft CRM table script







CREATE TABLE [dbo].[Acct_Access](
    [AccountNumber] [varchar](200) NULL,
    [AccountID] [varchar](200) NULL,
    [DomainUser] [varchar](200) NULL
)

2. DataMart Acct_Access Table Script





CREATE TABLE [dbo].[Acct_Access](
    [Access_Key] [bigint] IDENTITY(1,1) NOT NULL,
    [AccountNumber] [varchar](200) NULL,
    [AccountID] [varchar](200) NULL,
    [DomainUser] [varchar](200) NULL,
    [Customer_Key] [bigint] NULL,
    [User_Key] [bigint] NULL,
    [Company_Key] [bigint] NULL,
    PRIMARY KEY CLUSTERED ([Access_Key] ASC)
)

3. DataMart User Table Script





CREATE TABLE [dbo].[User](
    [User_Key] [bigint] IDENTITY(1,1) NOT NULL,
    [DomainUser] [varchar](200) NULL,
    PRIMARY KEY CLUSTERED ([User_Key] ASC)
)

4. Add Dynamic Security role to Cube and define MDX script to filter Active Directory user access

Once this DataMart and CRM table structure is in place, your next step is to bring it into your Analysis Services solution. Here are the steps to follow inside BIDS/SSDT to leverage these tables:

· Add the new tables (User and Acct_Access) to your Data Source View.

· Create a new dimension based off the User table. Hide all of the columns by changing AttributeHierarchyVisible to False; you don’t want users to be able to browse this dimension. Then process the dimension.

· Create a new Measure Group in the cube based off the Acct_Access table and name it Security. Delete all of the measures it creates except for the automatically generated Count measure. Hide the single remaining measure by selecting it and changing its Visible property to False. This will hide the entire Measure Group from your users, but you can still use it when writing MDX.

· Ensure these two have a Regular relationship set up on the Dimension Usage tab of the cube designer, as seen in the screen shot below.


· In the Solution Explorer, right-click on Roles and select New Role. Name the role Dynamic Security and give it Read access on the General, Data Sources and Cubes tabs. On the Membership tab, add your CRM groups or user accounts to this role.

· On the Dimension Data tab, select the Customer dimension from the dropdown list. Then select the Customer Number attribute from the dropdown list.


· Inside the Dimension Data tab, select the Advanced tab. In the Allowed member set area, enter this MDX:


// "Security" is the hidden measure group created above.
EXISTS(
    {[Customer].[Customer Number].Members},
    STRTOSET("[Customer].[Domain User].&[" + username() + "]"),
    "Security"
)




5. Create SSIS package to populate and update Acct_Access and User tables in DataMart

With the table structure and the Cube solution complete, we need to create the SSIS package to populate and update the Acct_Access table in our CRM database and then bring the updated data into the DataMart. What we need:

a. First, we create a SQL Script Task and add the script below to delete and repopulate the table with the current CRM customer access data. Change the login and domain group names in the script (‘Contoso\bcrowell’ and ‘Contoso\Domain Users’) to fit your environment.

execute as login = 'Contoso\bcrowell'
delete Acct_Access

declare @DomainName varchar(200)

-- Make sure you execute the step to create the domain user table with the correct permissions.
execute as login = 'Contoso\bcrowell'
if exists (select * from sysobjects where name = 'DomainUser')
    drop table DomainUser
select DomainName into DomainUser from SystemUser
grant select on DomainUser to [Contoso\Domain Users]

declare SEL_CUR cursor for select DomainName from SystemUser where IsDisabled = 0
open SEL_CUR
fetch next from SEL_CUR into @DomainName
while @@FETCH_STATUS = 0
begin
    -- Impersonate each enabled CRM user so FilteredAccount returns only the accounts
    -- that user can see, capture those rows, then revert before the next iteration.
    execute as login = @DomainName
    insert into Acct_Access (AccountNumber, AccountID, DomainUser)
        select AccountNumber, AccountID, @DomainName from FilteredAccount
    revert
    fetch next from SEL_CUR into @DomainName
end
close SEL_CUR
deallocate SEL_CUR
drop table DomainUser


b. Add an additional Data Flow task to take the CRM Acct_Access data and populate/update the DataMart’s Acct_Access and User tables.


I’m not the first to deploy some of the techniques used above. I started my learning by reviewing the two blog posts below and modified their approach to work with the security rights already set up in Microsoft CRM.



I hope this helps you with your future Analysis Services and Dashboard development projects.

Design Best Practices to Build Killer Dashboards

When buying real estate, it’s all about location, location, location. Like real estate, dashboards and data visualizations are all about one thing: data, data, data. With that in mind, here are some of my data visualization best practice design tips. Make your data visualization:

  • Relative
  • Interactive
  • Simple

Make it Relative

Make sure that the data underlying your dashboard is up to date and accurate, and that your selected KPIs, metrics and charts tell your intended story or speak to your current business challenges.

Data can be from this quarter, this week, this hour, whatever the right timeline is for your business. Stale data can lead you to think you’re making fact-based decisions when the data is no longer representative of, or relevant to, your current situation.

Make it Interactive

Your dashboard or data visualization has to be relevant to its intended audience. Dashboards need to start with an audience in mind. Who is the consumer of the dashboard? What are their information needs? What do they already know? What are their experiences and prejudices? As we design the dashboard, understanding the consumers of the dashboard will help us craft a product that they love to use.

A complicating factor is that most dashboards have multiple audiences. In fact, delivering the same dashboard across an entire organization has the potential benefit of getting everyone on the same page. However, a diverse audience is hard to serve well. Therefore, try to prioritize the audiences so conflicts can be more easily handled.

After figuring out your intended audience, you need to choose the right KPIs and metrics and make the dashboard visually engaging, so that you’ve got everyone looking at the same page. But once they’re on the same page, viewers will have unique questions about what they see. Create your dashboard so that individual viewers can interact with it to get the answers they seek.

Keep it Simple

This one is critical. No clutter! Resist the temptation to make your dashboard too flashy or over-designed, with gauge-like graphics and widgets. A dashboard or data visualization can follow the Fibonacci Spiral principle; it is a pretty cool thing to apply to your dashboards. Use it wisely, and when it works, it works really well. Not all visualizations have to fit the Fibonacci Spiral, though. For more on the Fibonacci Spiral principle, please see these links:



For simple and well-thought-out dashboard designs, choose the right chart to tell your story. So how do you choose the right chart? A few years ago, Dr. Andrew Abela published a good diagram that helps you decide which charts are a better fit for the data and problem at hand (please click on the image below to see it in full size).


Dr. Abela also published interesting thoughts about visualization taxonomies and recommended this three-year-old book by Dan Roam, who also published this Visual Thinking “Codex”. Juice Analytics converted Dr. Abela’s diagram into an online Chart Chooser application; I suggest reviewing it when building your dashboard and selecting a chart to visualize your data.


Final Thoughts

Hope this helps you with your future dashboard and data visualization designs.

Real Time Data Warehouse ETL

Part 2 – Implementing Service Broker as a Real Time ETL Tool – The Code

To implement my Service Broker ETL API, I started with the basic code/framework provided by Eitan Blumin in this blog post. He provides a good basic framework for the setup of Service Broker and an explanation of the installation scripts. To learn more about the basic framework and the scripts to implement Service Broker, please refer to his post.

In this post, I’m going to provide his scripts for the basic installation, with some modifications to work with the Dynamics GP database and Solver’s BI360 data warehouse, plus my modified stored procedure, usp_AT_iGLETL, which provides the logic of the ETL process from Dynamics GP’s general ledger table, GL20000, to Solver’s BI360 staging table.

General Workflow

Eitan’s general Service Broker workflow looks like this:

1. The user performs an insert or update operation on a table. In my case, with the modifications in place, the posting of a General Ledger batch within Dynamics GP is that operation, inserting rows into the GL20000 table.

2. An AFTER INSERT trigger on the table is fired. This trigger compiles the contents of the INSERTED and DELETED tables into XML parameters, creates a Service Broker message and sends it to a queue. The original transaction immediately returns. The modifications to the AFTER INSERT trigger pass the usp_AT_iGLETL stored procedure into the SB_AT_Fire_Trigger stored procedure.

3. The Service Broker service fires up and processes the messages in the queue independently of the original transaction. It opens a transaction that pulls the message out of the queue, executes the relevant stored procedure against the XML data previously taken from the INSERTED and DELETED tables, and applies the relevant logic within the usp_AT_iGLETL stored procedure.
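To make steps 2 and 3 concrete, here is a minimal sketch of the kind of AFTER INSERT trigger being described. This is not the framework’s actual code (the framework wraps this plumbing in its SB_AT_Fire_Trigger procedure), and the queue, service, contract and message type names are assumptions made for the sketch.

    -- Illustrative sketch only: capture INSERTED/DELETED as XML and queue the message.
    CREATE TRIGGER dbo.trg_GL20000_ETL ON dbo.GL20000
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Package the INSERTED and DELETED contents into a single XML document.
        DECLARE @msg xml =
            (SELECT
                (SELECT * FROM inserted FOR XML PATH('row'), ROOT('inserted'), TYPE),
                (SELECT * FROM deleted  FOR XML PATH('row'), ROOT('deleted'),  TYPE)
             FOR XML PATH('message'), TYPE);

        -- Send the message and return immediately; the queue's activation procedure
        -- does the ETL work later. (A production framework reuses and eventually
        -- ends conversations rather than opening a new one per statement.)
        DECLARE @h uniqueidentifier;
        BEGIN DIALOG CONVERSATION @h
            FROM SERVICE [//ETL/TriggerService]
            TO SERVICE   '//ETL/ProcessingService'
            ON CONTRACT  [//ETL/Contract]
            WITH ENCRYPTION = OFF;

        SEND ON CONVERSATION @h MESSAGE TYPE [//ETL/TriggerMessage] (@msg);
    END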

Service Broker Framework Installation and Usage

The installation scripts provided below implement a generic Service Broker framework that you can use for any table without any changes to the installed objects. The framework works like this:

1. First, run the “Step1_SB_AT_Installation” script on the Dynamics GP company database where you want the asynchronous triggers.

2. Run the “Step2_GP_GLTable_Trigger” script on the Dynamics GP company database to create the AFTER INSERT trigger on the GL20000 table.

3. Run the “Step3_GP_ETL_Logging” script to create the basic framework for logging the Service Broker ETL process and error logs.

4. Run the “Step4_BI360_f_Trans_GL_Staging_Table” script to create the staging table in the BI360DW database.

5. Run the “Step5_ETL_StoredProcedure” script on the Dynamics GP company database to create the stored procedure, usp_AT_iGLETL, with the ETL logic.
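For orientation, the ETL stored procedure in step 5 has roughly the shape sketched below. This is not the actual usp_AT_iGLETL or the actual BI360 staging schema; the parameter names, the staging table name and its column list are assumptions, and it is assumed here that the framework’s queue-activation logic splits the queued message back into INSERTED/DELETED fragments before calling the procedure.

    -- Illustrative sketch only: shred the XML captured by the trigger and load staging.
    CREATE PROCEDURE dbo.usp_AT_iGLETL
        @InsertedXml xml,
        @DeletedXml  xml       -- unused for an insert-only load; kept for symmetry
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO BI360DW.dbo.f_Trans_GL_Staging
                (JRNENTRY, ACTINDX, TRXDATE, DEBITAMT, CRDTAMNT)
        SELECT  r.value('(JRNENTRY)[1]', 'int'),
                r.value('(ACTINDX)[1]',  'int'),
                r.value('(TRXDATE)[1]',  'datetime'),
                r.value('(DEBITAMT)[1]', 'numeric(19,5)'),
                r.value('(CRDTAMNT)[1]', 'numeric(19,5)')
        FROM    @InsertedXml.nodes('/inserted/row') AS x(r);
    END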

Download Code

You should now have a “near real time ETL” process in place to handle the loading of data from Dynamics GP to the staging table within Solver’s BI360DW data warehouse database. You could then use the same logic and scripts above to handle the loading of the data from the staging table to the f_Trans_GL table within the BI360DW database. In my tests of the implementation outlined above, I was able to post a 350-line batch within Dynamics GP and load the staging table within five seconds.


Hope this helps you with the basic understanding and installation of Service Broker as an ETL tool for loading your data warehouse.

Real Time Data Warehouse ETL

Part 1 – Implementing Service Broker as a Real Time ETL Tool

The cheapest and easiest way to solve the real-time ETL problem is to not even attempt it in the first place, but we live in the real world and businesses are saying that:

· Viewing yesterday’s data is no longer sufficient.

· They need a faster reaction time to respond to threats and opportunities.

To meet client demand for real time data, I needed to look into modifying my extract, transform and load (ETL) process. One of the most difficult parts of building any data warehouse is the ETL of data from the source system, and performing ETL in real time introduces additional challenges. Almost all ETL tools and systems, whether based on off-the-shelf products or custom-coded, operate in a batch mode. They assume that the data becomes available as some sort of extract file on a certain schedule, usually hourly, nightly, weekly, or monthly. The system then transforms and cleanses the data and loads it into the data warehouse. When loading data continuously in real time, there can’t be any system downtime, and the heaviest periods of data warehouse usage may very well coincide with the peak periods of incoming data. The requirements for continuous updates with no warehouse downtime are generally inconsistent with traditional ETL tools and systems.

One solution is to settle for near real time updates by increasing the frequency of the data loads. Weekly loads can be replaced by daily loads, for example, or daily loads can be changed to hourly or more frequent loads. This is generally a much easier and cheaper proposition than trying to update in real time; for one thing, it doesn’t require getting a whole new ETL tool set and changing your perspective from a query/pull ETL process to an event-driven architecture.

I work in the Microsoft BI stack, so I develop my ETL processes using SQL Server Integration Services (SSIS), but to meet increasing client demand for real time data integration into their data warehouses I needed to learn a technology that was new to me. Enter SQL Server Service Broker. So what is Service Broker?

· A core service of SQL Server since SQL 2005.

· Used internally within SQL Server (Database Mail, etc.).

· Mechanism for the guaranteed asynchronous delivery and processing of messages between databases (event driven process – changes to a database table).
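To give those bullets some shape, here is a minimal sketch of the objects a Service Broker solution is built from. Every name below is illustrative (the framework I used in Part 2 creates its own objects), and the activation procedure is an assumed placeholder.

    -- Enable Service Broker and create the basic messaging objects (names are illustrative).
    ALTER DATABASE MyCompanyDB SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;

    CREATE MESSAGE TYPE [//ETL/TriggerMessage] VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT     [//ETL/Contract] ([//ETL/TriggerMessage] SENT BY INITIATOR);

    CREATE QUEUE dbo.ETLInitiatorQueue;
    CREATE QUEUE dbo.ETLTargetQueue;

    CREATE SERVICE [//ETL/TriggerService]    ON QUEUE dbo.ETLInitiatorQueue;
    CREATE SERVICE [//ETL/ProcessingService] ON QUEUE dbo.ETLTargetQueue ([//ETL/Contract]);

    -- Attach an activation procedure so queued messages are processed automatically.
    ALTER QUEUE dbo.ETLTargetQueue
        WITH ACTIVATION (STATUS = ON,
                         PROCEDURE_NAME = dbo.usp_ProcessETLQueue,   -- assumed name
                         MAX_QUEUE_READERS = 1,
                         EXECUTE AS OWNER);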

I don’t plan on replacing the ETL/SSIS batch process; more likely I will just be augmenting it with Service Broker. My new ETL process will now look something like Figure 1.


Pros & Cons of using Service Broker as an ETL tool



Pros:

  • Flexible
  • Included in all editions (Express or greater)
  • Fast & efficient delivery
  • Reliable delivery
  • Secure (Windows authentication or certificate-based encryption)

Cons:

  • Relatively high implementation cost (custom development)
  • Can require some tuning and manipulation to get it to perform well
  • Lacks good GUI tools & wizards
  • Lacks good automated monitoring tools

Service Broker Learning Resources

Unlike searching Google when you need help with an SSIS package error or fix, or just to learn something new, the resources for learning Service Broker are limited. I have compiled some of the best that I have found:

– This is a blog by a great expert in the Service Broker area. It contains an impressive volume of quite interesting publications covering Service Broker, and the author keeps it updated.

– Denny Cherry’s blog: you will find a “Back to Basics” series of articles there. These publications describe all of the basic entities and commands (such as SEND and RECEIVE) in Service Broker.

– Marcin Policht’s journal contains several publications covering Service Broker (parts of a larger series about SQL Server 2005 Express Edition):

1) Part 26 – Introduction to Service Broker:

2) Part 27 – Implementing Basic Service Broker Objects:

3) Part 28 – Implementing Service Broker Conversation:

In Part 2 of this series on implementing Service Broker as a real time ETL tool, we will look at the code that I used to integrate Dynamics GP General Ledger data changes into BI360 Data Warehouse.

How to Analyze and Report on your Data

Part 2 of 2 in Data Warehousing series

Every business, from startups to large established companies, needs to track details about its customers, products, sales, purchases, social media and website logs, to name a few. By extracting, manipulating and analyzing this data, you can determine key metrics that help you understand more about your customers and grow your business. In Part 1 we talked about how to kick start your data warehousing with BI360, and now we need to know what to do with all this data.

Basically, you can use your data warehouse for financial statement reporting and analysis, dashboards and data mining. I previously went through how to use BI360’s One Stop Reporting here, so I will be talking more about data mining.

Data mining can be defined as the process of analyzing large quantities of data through automatic or semi-automatic tasks to extract previously unknown interesting patterns. Over the years, storage has grown at a faster rate than computing power. As a result, companies have stored enormous amounts of data and have become data-rich but knowledge poor. The main purpose of data mining is to increase the value of these large datasets by extracting knowledge from them and making the data useful to the business.[1]

In general, there are two types of data mining – predictive and descriptive. Predictive data mining will use variables or fields from a dataset to predict unknown or future values. The goal is to produce a model from a defined dataset that is used to perform classification, estimation, and other data mining tasks. On the other hand, descriptive data mining will focus on finding patterns that describe the data and can be interpreted by data analysts or business users. The goal is to gain understanding of the data by uncovering new relationships, patterns, and important information regarding the dataset.

Data mining can be valuable in virtually any industry and business scenario. The following are just a few examples of how data mining can improve decision making and give businesses a better return on investment with their Business Intelligence (BI) projects:

  • Recommendation Generation and Retail Floor Optimization:

What products/services should a company offer to its customers? Heat mapping customer traffic patterns can help with the placement of offerings; casinos use this to help optimize game theme selection, denomination selection, placement and removal.

E-commerce websites – analyze the purchasing behaviors of the customer population and recommend products based on items in a customer’s cart.

  • Anomaly Detection:

Analyze items that don’t fit a certain pattern. Credit card companies, insurance companies and businesses with POS systems use this method to detect fraud.

  • Churn Analysis:

Which customers are most likely to switch to a competitor? When will they leave and can it be acted on before they do? Based on findings, companies can improve relationships and offer those customers discounts or loyalty incentives.

  • Market Penetration:

How many customers or sales do you have per geographic location?

  • Risk Management:

Determine the risk of a loan or mortgage based on the customer’s profile and the amount they are asking for.

Companies can make decisions based on the cost and risk of the loan by using historical customer data.

  • Customer Segmentation:

How well do companies know their customers? What is their Lifetime Value (LTV)?

Determine behavioral and descriptive profiles for customers in order to target marketing campaigns. The behavioral and descriptive profiles of high-LTV customers can then be applied against first-time customers’ profiles to capture their loyalty.

  • Forecasting:

Estimate sales and/or inventory levels for each week, month, and quarter of a specific year.

Data Mining Tasks

Determining the correct task and algorithm to apply to your dataset is a crucial step in achieving an accurate and useful data mining model. In some cases it will be quite obvious which task will be the most accurate to use, depending on the nature of your data. More often than not, you will need to explore and combine multiple tasks before arriving at a single solution. The following section describes the basic data mining tasks, with some additional resources and discussion on each:

Classification: Can be used to identify loan applicants as low, medium, or high credit risks based on attributes such as income, credit score, employment history, home ownership, and amount of debt. In this case the target would be the credit risk level, and the other attributes would be the predictors.

Clustering: By using attributes such as Income and Age, three clusters could be created (could be more than 3 depending on data) – 1. Younger population with low income, 2. Middle Age with higher income, and 3. Older Customers with lower income.

Association: (also called Market Basket Analysis) Perhaps most famously, Amazon uses these types of analyses to suggest additional items you may be interested in purchasing, based upon other items frequently bought together.

Sequence Analysis: Can be used to analyze the sequence of web clicks on a website. The results are probabilities of the next click in the sequence, i.e. if a user clicks ‘News’, there is a 20% chance the next click will be ‘Sports’ and a 30% chance it will be ‘Weather’.

Sentiment Analysis

Social Media Analysis

Deviation Analysis: Used to find the rare cases whose behavior does not match the ‘norm’; most commonly used in fraud detection to find the transactions that don’t match the spending habits of the customer.

Regression: Can be used to predict the value of a house based on location, number of rooms, land size, and crime rates.

Forecasting: Based on the sales from last year by month, how many cases of soda will the northeast store sell in January? The output would be the estimated number of cases of soda for January.
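To make one of these tasks a little more concrete, here is a minimal T-SQL sketch of the counting idea behind the Association (market basket) task above. The dbo.OrderLines table and the support threshold are purely illustrative, and a real solution would use a data mining algorithm (for example, SSAS association rules) rather than a hand-written query.

    -- Illustrative sketch: how often do two items appear on the same order?
    SELECT  a.ItemID           AS ItemA,
            b.ItemID           AS ItemB,
            COUNT(*)           AS TimesBoughtTogether
    FROM    dbo.OrderLines a
    JOIN    dbo.OrderLines b ON b.OrderID = a.OrderID
                            AND b.ItemID  > a.ItemID      -- avoid duplicate/self pairs
    GROUP BY a.ItemID, b.ItemID
    HAVING  COUNT(*) >= 10                                -- arbitrary minimum support
    ORDER BY TimesBoughtTogether DESC;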

Next steps

For a quick how-to on using SSAS Data Mining, check out my previous post here.