Real Time Data Warehouse ETL

Part 2 – Implementing Service Broker as a Real Time ETL Tool – The Code

To implement my Service Broker ETL API, I started with the basic code/framework provided by Eitan Blumin in this blog post: http://www.madeiradata.com/service-broker-asynchronous-triggers/. He provides a good basic framework for setting up Service Broker and an explanation of the installation scripts. To learn more about the basic framework and the scripts needed to implement Service Broker, please refer to his post.

In this post, I’m going to provide his scripts for the basic installation with some modifications to work with the Dynamics GP database and Solver’s BI360 data warehouse, plus my modified stored procedure, usp_AT_iGLETL, which provides the logic for the ETL process from Dynamics GP’s general ledger table, GL20000, to Solver’s BI360 staging table.

General Workflow

Eitan’s general Service Broker workflow looks like this:

1. The user performs a DML operation on a table. In my case, with the modifications in place, posting a General Ledger batch within Dynamics GP inserts the posted transactions into the GL20000 table, and that is the operation that fires the trigger.

2. An AFTER INSERT trigger on the table fires. This trigger compiles the contents of the INSERTED and DELETED tables into XML parameters, creates a Service Broker message, and sends it to a queue; the original transaction returns immediately. The modification to the AFTER INSERT trigger is that it passes the name of the usp_AT_iGLETL stored procedure to the SB_AT_Fire_Trigger stored procedure (a sketch of the modified trigger follows this list).

3. The Service Broker service fires up and processes the messages in the queue independently of the original transaction. It opens a transaction that pulls the message out of the queue and executes the relevant stored procedure, which uses the XML data previously taken from the INSERTED and DELETED tables to run the ETL logic implemented in usp_AT_iGLETL.
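To make step 2 concrete, here is a minimal sketch of what the modified AFTER INSERT trigger on GL20000 could look like. The table and procedure names (GL20000, usp_AT_iGLETL, SB_AT_Fire_Trigger) come from the framework described above, but the parameter names of SB_AT_Fire_Trigger are assumptions for illustration only; the actual signature is defined in the downloadable scripts.

-- Hedged sketch only: the real trigger is created by the Step2_GP_GLTable_Trigger script.
-- The SB_AT_Fire_Trigger parameter names below are assumptions, not the exact API.
CREATE TRIGGER trg_GL20000_AT_Insert
ON dbo.GL20000
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Inserted XML, @Deleted XML;

    -- Capture the trigger pseudo-tables as XML so the posting transaction
    -- can return immediately.
    SET @Inserted = (SELECT * FROM inserted FOR XML PATH('row'), ROOT('inserted'), TYPE);
    SET @Deleted  = (SELECT * FROM deleted  FOR XML PATH('row'), ROOT('deleted'), TYPE);

    -- Hand the XML payload and the name of the ETL procedure to the generic
    -- framework procedure, which wraps them in a Service Broker message and
    -- sends it to the queue.
    EXEC dbo.SB_AT_Fire_Trigger
         @ProcedureName = N'usp_AT_iGLETL',
         @InsertedXML   = @Inserted,
         @DeletedXML    = @Deleted;
END
GO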

Service Broker Framework Installation and Usage

The installation scripts provided below implement a generic Service Broker framework that you can use for any table without changing the installed objects. The framework works like this:

1. First, run the “Step1_SB_AT_Installation” script on the Dynamics GP company database where you want the asynchronous triggers.

2. Run the “Step2_GP_GLTable_Trigger” script on the Dynamics GP company database to create the AFTER INSERT trigger on the GL20000 table.

3. Run the “Step3_GP_ETL_Logging” script to create the basic framework for logging the Service Broker ETL process and error logs.

4. Run the “Step4_BI360_f_Trans_GL_Staging_Table” script to create the staging table in the BI360DW database.

5. Run the “Step5_ETL_StoredProcedure” script on the Dynamics GP company database to create the stored procedure, usp_AT_iGLETL, which contains the ETL logic (a simplified sketch of this procedure appears below).
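To give a feel for the ETL logic created in step 5, below is a heavily simplified sketch of what usp_AT_iGLETL does: shred the INSERTED XML back into rows and land them in the BI360 staging table. The GL20000 column names, the staging table name, and the logging table name shown here are illustrative assumptions; the full column mapping, update/delete handling, and logging live in the downloadable script.

-- Simplified sketch of the ETL stored procedure; names other than
-- usp_AT_iGLETL are assumptions for illustration.
CREATE PROCEDURE dbo.usp_AT_iGLETL
    @InsertedXML XML,
    @DeletedXML  XML
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        -- Shred the XML captured by the trigger back into rows and insert
        -- them into the BI360 staging table.
        INSERT INTO BI360DW.dbo.f_Trans_GL_Staging
               (JournalEntry, AccountIndex, TransactionDate, Debit, Credit)
        SELECT r.value('(JRNENTRY)[1]',  'int'),
               r.value('(ACTINDX)[1]',   'int'),
               r.value('(TRXDATE)[1]',   'datetime'),
               r.value('(DEBITAMT)[1]',  'numeric(19,5)'),
               r.value('(CRDTAMNT)[1]',  'numeric(19,5)')
        FROM @InsertedXML.nodes('/inserted/row') AS x(r);
    END TRY
    BEGIN CATCH
        -- The Step3_GP_ETL_Logging script creates the logging objects;
        -- the table name here is an assumption.
        INSERT INTO dbo.SB_AT_ETL_ErrorLog (ErrorDate, ErrorMessage)
        VALUES (GETDATE(), ERROR_MESSAGE());
    END CATCH
END
GO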

Download Code

You should now have a “Near Real Time ETL” process in place to handle the loading of data from Dynamics GP to the staging table within Solver’s BI360DW data warehouse database. You could then use the same logic and scripts above to handle the loading of the data from the staging table to the f_Trans_GL table within the BI360DW database. In my tests of the implementation outlined above, I was able to post a 350-line batch within Dynamics GP and see it loaded into the staging table within five seconds.
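As a rough idea of that second hop, here is a minimal sketch of the staging-to-fact load, assuming the staging table carries a surrogate key (StagingID) and a LoadedFlag column; all table and column names here are assumptions for illustration.

-- Hedged sketch: copy unprocessed staging rows into f_Trans_GL, then mark
-- them as loaded. StagingID and LoadedFlag are assumed columns.
BEGIN TRANSACTION;

DECLARE @Batch TABLE (StagingID INT PRIMARY KEY);

-- Capture which staging rows this run will process.
INSERT INTO @Batch (StagingID)
SELECT StagingID
FROM BI360DW.dbo.f_Trans_GL_Staging
WHERE LoadedFlag = 0;

INSERT INTO BI360DW.dbo.f_Trans_GL
       (JournalEntry, AccountIndex, TransactionDate, Debit, Credit)
SELECT s.JournalEntry, s.AccountIndex, s.TransactionDate, s.Debit, s.Credit
FROM BI360DW.dbo.f_Trans_GL_Staging AS s
JOIN @Batch AS b ON b.StagingID = s.StagingID;

-- Flag the rows just copied so they are not loaded twice.
UPDATE s
SET s.LoadedFlag = 1
FROM BI360DW.dbo.f_Trans_GL_Staging AS s
JOIN @Batch AS b ON b.StagingID = s.StagingID;

COMMIT TRANSACTION;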

Conclusion

Hopefully this helps you with the basic understanding and installation of Service Broker as an ETL tool for loading your data warehouse.

Real Time Data Warehouse ETL

Part 1 – Implementing Service Broker as a Real Time ETL Tool

The cheapest and easiest way to solve the real-time ETL problem is to not even attempt it in the first place, but we live in the real world and businesses are saying that:

· Viewing yesterday’s data is no longer sufficient.

· They need a faster reaction time to respond to threats and opportunities.

To meet client demand for real time data, I needed to look into modifying my extract, transform and load (ETL) process. One of the most difficult parts of building any data warehouse is the ETL of data from the source system, and performing ETL in real time introduces additional challenges. Almost all ETL tools and systems, whether based on off-the-shelf products or custom-coded, operate in batch mode: they assume that the data becomes available as some sort of extract file on a certain schedule, usually hourly, nightly, weekly, or monthly, and the system then transforms and cleanses the data and loads it into the data warehouse. When loading data continuously in real time, there can’t be any system downtime, and the heaviest periods of data warehouse usage may very well coincide with the peak periods of incoming data. The requirement for continuous updates with no warehouse downtime is generally inconsistent with traditional ETL tools and systems.

One solution is to settle for near-real-time updates by increasing the frequency of the data loads. Weekly loads can be replaced by daily loads, for example, or daily loads can be changed to hourly or more frequent ones. This is generally a much easier and cheaper proposition than trying to update in real time. For one thing, it doesn’t require getting a whole new ETL tool set and changing your perspective from a query/pull ETL process to an event-driven architecture.

I work in the Microsoft BI stack, so I develop my ETL processes using SQL Server Integration Services (SSIS), but to meet increasing client demand for real time data integration into their data warehouse I needed to learn a technology that was new to me. Enter SQL Server Service Broker. So what is Service Broker?

· A core service of SQL Server since SQL 2005.

· Used internally within SQL Server (Database Mail, etc.).

· Mechanism for the guaranteed asynchronous delivery and processing of messages between databases (an event-driven process triggered by changes to a database table); a minimal example follows this list.
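For readers who have never used it, the sketch below shows the minimal set of Service Broker objects and a single SEND/RECEIVE round trip. All object names are made up for illustration; the framework covered in Part 2 of this series creates its own objects.

-- Minimal Service Broker illustration; object names are made up.
-- The database must have Service Broker enabled (ALTER DATABASE ... SET ENABLE_BROKER).
CREATE MESSAGE TYPE DemoMessage VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT DemoContract (DemoMessage SENT BY INITIATOR);
CREATE QUEUE DemoQueue;
CREATE SERVICE DemoService ON QUEUE DemoQueue (DemoContract);
GO

-- Send a message: guaranteed, asynchronous, and transactional.
DECLARE @dialog UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @dialog
    FROM SERVICE DemoService
    TO SERVICE 'DemoService'
    ON CONTRACT DemoContract
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @dialog MESSAGE TYPE DemoMessage (N'<hello/>');
GO

-- Pull the message off the queue (this is what an activation procedure does).
DECLARE @handle UNIQUEIDENTIFIER, @body VARBINARY(MAX);
WAITFOR (
    RECEIVE TOP (1) @handle = conversation_handle,
                    @body   = message_body
    FROM DemoQueue
), TIMEOUT 5000;
SELECT CAST(@body AS XML) AS ReceivedMessage;
GO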

I’m not counting on replacing the ETL/SSIS batch process; more likely I’ll just augment it with Service Broker. My new ETL process will now look something like Figure 1.

[Figure 1: The ETL process augmented with Service Broker]

Pros & Cons of using Service Broker as an ETL tool

Pros

  • Flexible
  • Included in all editions (Express or greater)
  • Fast & efficient delivery
  • Reliable delivery
  • Secure (Windows authentication or certificate-based encryption)

Cons

  • Relatively high implementation cost (custom development)
  • Can require some tuning and manipulation to get it to perform well
  • Lacks good GUI tools & wizards
  • Lacks good automated monitoring tools

Service Broker Learning Resources

Unlike SSIS, where a quick Google search turns up help for almost any package error, fix, or new technique, the resources for learning Service Broker are limited. I have compiled some of the best that I have found.

http://rusanu.com/blog/ – Remus Rusanu’s blog. He is one of the leading experts in the Service Broker area, and the blog contains an impressive volume of interesting publications covering Service Broker that he keeps up to date.

http://www.madeiradata.com/service-broker-asynchronous-triggers/ – Eitan Blumin’s post on asynchronous triggers, which forms the basis for Part 2 of this series.

http://itknowledgeexchange.techtarget.com/sql-server/tag/service-broker/ – Denny Cherry’s blog, where you will find a “Back to Basics” series of articles describing the basic Service Broker objects and commands (such as SEND and RECEIVE).

http://www.databasejournal.com/article.php/1503191/ – Marcin Policht’s journal contains several publications covering Service Broker (part of a larger series about SQL Server 2005 Express Edition):

1) Part 26 – Introduction to Service Broker: http://www.databasejournal.com/features/mssql/article.php/3757581/

2) Part 27 – Implementing Basic Service Broker Objects: http://www.databasejournal.com/features/mssql/article.php/3759311/

3) Part 28 – Implementing Service Broker Conversation: http://www.databasejournal.com/features/mssql/article.php/3762261/

In Part 2 of this series on implementing Service Broker as a real time ETL tool, we will look at the code that I used to integrate Dynamics GP General Ledger data changes into BI360 Data Warehouse.

How to Analyze and Report on your Data

Part 2 of 2 in Data Warehousing series

Every business, from startups to larger established companies, needs to track details about its customers, products, sales, purchases, social media, and website logs, to name a few. By extracting, manipulating and analyzing this data you can determine key metrics to help you understand more about your customers and grow your business. In Part 1 we talked about how to kick-start your data warehousing with BI360, and now we need to know what to do with all this data.

Basically, you can use your Data Warehouse for financial statement reporting and analysis, dashboards and data mining. I previously went through how to use BI360’s One Stop Reporting here, so I will be talking more about data mining.

Data mining can be defined as the process of analyzing large quantities of data through automatic or semi-automatic tasks to extract previously unknown interesting patterns. Over the years, storage has grown at a faster rate than computing power. As a result, companies have stored enormous amounts of data and have become data-rich but knowledge poor. The main purpose of data mining is to increase the value of these large datasets by extracting knowledge from them and making the data useful to the business.[1]

In general, there are two types of data mining – predictive and descriptive. Predictive data mining will use variables or fields from a dataset to predict unknown or future values. The goal is to produce a model from a defined dataset that is used to perform classification, estimation, and other data mining tasks. On the other hand, descriptive data mining will focus on finding patterns that describe the data and can be interpreted by data analysts or business users. The goal is to gain understanding of the data by uncovering new relationships, patterns, and important information regarding the dataset.

Data mining can be valuable in virtually any industry and business scenario. The following are just a few examples of how data mining can improve decision making and give businesses a better return on investment with their Business Intelligence (BI) projects:

  • Recommendation Generation and Retail Floor Optimization:

What products/services should a company offer to its customers? Heat mapping customer traffic patterns can help with the placement of offerings; casinos use this to help optimize game theme selection, denomination selection, placement and removal.

E-Commerce Websites – analyze purchasing behaviors of customer population and recommend products based on items in a customer’s cart

  • Anomaly Detection:

Analyze items that don’t fit a certain pattern. Credit card companies, insurance companies and businesses with POS systems use this method to detect fraud.

  • Churn Analysis:

Which customers are most likely to switch to a competitor? When will they leave and can it be acted on before they do? Based on findings, companies can improve relationships and offer those customers discounts or loyalty incentives.

  • Market Penetration:

How many customers or sales does the company have per geographic location?

  • Risk Management:

Determine the risk of a loan or mortgage based on the customer profile and the amount being requested.

Companies can make decisions based on cost and risk of the loan by using historical customer data

  • Customer Segmentation:

How well do companies know their customers? What is their Lifetime Value (LTV)?

Determine behavioral and descriptive profiles for customers in order to target marketing campaigns. The behavioral and descriptive profiles of high-LTV customers can then be applied against first-time customers’ profiles to capture their loyalty.

  • Forecasting:

Estimate sales and/or inventory for each week, month, and quarter of a specific year.

Data Mining Tasks

Determining the correct task and algorithm to apply to your dataset is a crucial step toward achieving an accurate and useful data mining model. In some cases, it will be quite obvious which task will be the most accurate to use given the nature of your data. More often than not, you will need to explore and combine multiple tasks before arriving at a single solution. The following section describes the basic data mining tasks, with some additional resources and discussion on each:

Classification: Can be used to identify loan applicants as low, medium, or high credit risks based on attributes such as income, credit score, employment history, home ownership, and amount of debt. In this case the target would be the credit risk category, and the other attributes would be the predictors.

Clustering: By using attributes such as Income and Age, three clusters could be created (could be more than 3 depending on data) – 1. Younger population with low income, 2. Middle Age with higher income, and 3. Older Customers with lower income.

Association: (also called Market Basket Analysis) Perhaps most famously, Amazon uses these types of analyses to suggest additional items you may be interested in purchasing, based upon other items frequently bought together.

http://www.casinoenterprisemanagement.com/articles/may-2009/turning-market-basket-analysis-action

http://blogs.office.com/2014/10/08/6-cool-ways-use-custom-maps/

Sequence Analysis: Can be used to analyze the sequence of web clicks on a website. The results are probabilities of the next click in the sequence, i.e. if a user clicks ‘News’, there is a 20% chance the next click will be ‘Sports’ and a 30% chance it will be ‘Weather’.

Sentiment Analysis

http://blogs.msdn.com/b/microsoft_business_intelligence1/archive/2012/09/19/the-social-media-summer-games-sentiment-tracking-in-power-view-it-s-a-slam-dunk.aspx

https://cwebbbi.wordpress.com/2014/10/19/sentiment-analysis-in-excel-with-azure-machine-learning-and-power-query/

Social Media Analysis

http://www.digital-warriors.com/social-media-analytics-power-business/

Deviation Analysis: Used to find the rare cases whose behavior does not match the ‘norm’; most commonly used in fraud detection to find the transactions that don’t match the spending habits of the customer.

http://www.journalofaccountancy.com/Issues/2003/Aug/TurnExcelIntoAFinancialSleuth

Regression: Can be used to predict the value of a house based on location, number of rooms, land size, and crime rates.

Forecasting: Based on the sales from last year by month, how many cases of soda will the northeast store sell in January? The output would be the estimated number of cases of soda for January.

Next steps

For a quick how to on using SSAS Data Mining check out my previous post here.


[1] http://en.wikipedia.org/wiki/Data_mining

Kick-Start Your Data Warehousing

Many of the projects that I’ve been working on over the last 12 months have involved integrating data into a data warehouse. Specifically, companies are looking to import more data from other systems to do various types of analysis. Data integration is nothing new; companies have been doing it for a long time, but usually with a very manual process built around vast Microsoft Excel spreadsheets.

What Is a Data Warehouse?

A data warehouse is a database designed to enable reporting and data analysis activities. Data warehouses store current and historical data from one or more disparate sources. Some of these sources may include:

  • CRM data
  • ERP data
  • Supply Chain Management System data
  • Human Resource Management System data
  • Google Analytics data
  • Membership System data
  • Loyalty program data
  • Point of Sale/Hotel Management System data
  • Other Legacy Application System data

Once loaded into the data warehouse, the data is used for creating trending reports, daily operational reports, dashboards, SQL Server Analysis Services data mining models, or other internal/external reports. Some of the benefits of a data warehouse include:

  • Consolidate data from multiple sources into a single database so a single query engine can be used to integrate and present a single view of the data
  • Mitigate the problem of database isolation level lock contention in transaction processing systems caused by attempts to run large, long running, analysis queries in transaction processing databases
  • Maintain data history, even if the source transaction systems do not
  • Integrate data from multiple source systems, enabling a central view across the enterprise
  • Improve data quality by providing consistent codes and descriptions and by flagging and fixing bad data.
  • Present the organization’s information consistently
  • Provide a consistent data model of interest regardless of the data’s source
  • Restructure the data so that it is optimized for human consumption (instead of optimized for machine processing)
  • Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems

My Favorite Application to Kick-start Data Warehousing

One of my favorite tools to help with all of this data warehousing is Solver’s BI360 Data Warehouse and Microsoft Excel One Stop Reporting add-in. Solver’s Data Warehouse comes preconfigured with seven ERP modules: General Ledger, Payables, Accounts Receivable, Sales, Human Resources/Payroll, Capital and Projects. Additionally, with BI360’s Data Warehouse Manager you have the ability to configure additional custom Dimension and Fact tables to load the additional integration data points discussed above. This all means that your implementation can be completed in days or weeks instead of months or years, and at a fraction of the cost and risk of a traditional “home-grown” data warehouse project. To help with developing reports for your end users, Solver has created several ERP and CRM report templates. These include:

ERP

  • AP Aging by Vendor
  • AP by Vendor
  • AR Aging
  • AR Detail Report
  • Balance Sheet
  • Cash Flow
  • Company Performance Dashboard
  • Inventory by Location
  • Multiple Profit and Loss Statements
  • Trial Balance

CRM

  • Campaign Efficiency
  • Lead Activity
  • Opportunity Detail by Sales Person
  • Opportunity Performance by Sales Person
  • Opportunity Summary by Sales Person
  • Top 50 Opportunities
  • Campaign ROI
  • CRM Activity report

One of the other great things about BI360 is that reports are built with its One Stop Reporting Microsoft Excel add-in, so once you get the data into Microsoft Excel you have all of its formatting, Power BI, and data mining capabilities.

Next steps

So I’ve got the disparate data sources loaded into my data warehouse; now what? How do I use it and analyze it? We’ll look at some of these data gathering and analysis techniques in my next post.