Real Time Data Warehouse ETL

Part 2 – Implementing Service Broker as a Real Time ETL Tool – The Code

To implement my Service Broker ETL API, I started with the basic code/framework provided by Eitan Blumin in this blog post: http://www.madeiradata.com/service-broker-asynchronous-triggers/. He provides a good basic framework for setting up Service Broker and an explanation of the installation scripts. To learn more about the basic framework and the scripts that implement Service Broker, please refer to his post.

In this post, I’m going to provide his scripts for the basic installation, with some modifications to work with the Dynamics GP database and Solver’s BI360 data warehouse, plus my modified stored procedure, usp_AT_iGLETL, which provides the logic for the ETL process from Dynamics GP’s general ledger table, GL20000, to Solver’s BI360 staging table.

General Workflow

Eitan’s general Service Broker workflow looks like this:

1. The user performs a DML operation on a table. In my case, with the modifications in place, posting a General Ledger batch within Dynamics GP inserts rows into the GL20000 table.

2. An AFTER INSERT trigger on the table fires. The trigger compiles the contents of the INSERTED and DELETED tables into XML parameters, creates a Service Broker message, and sends it to a queue. The original transaction returns immediately. The modifications to the AFTER INSERT trigger pass the name of the usp_AT_iGLETL stored procedure into the SB_AT_Fire_Trigger stored procedure.

3. The Service Broker service fires up and processes the messages in the queue independently of the original transaction. It opens a transaction, pulls the message out of the queue, and executes the relevant stored procedure, which uses the XML data previously taken from the INSERTED and DELETED tables to implement the ETL logic within usp_AT_iGLETL.
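As a rough sketch of step 2, the trigger body looks something like the following. This is illustrative only: SB_AT_Fire_Trigger and usp_AT_iGLETL are the framework's object names, but the parameter names here are my assumptions, not the framework's exact API.

```sql
-- Illustrative sketch: an AFTER INSERT trigger that captures the INSERTED
-- and DELETED rows as XML and hands them to the generic Service Broker
-- framework. Parameter names are assumptions, not the exact API.
CREATE TRIGGER trg_GL20000_AfterInsert ON dbo.GL20000
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @inserted XML =
        (SELECT * FROM inserted FOR XML PATH('row'), ROOT('inserted'), TYPE);
    DECLARE @deleted XML =
        (SELECT * FROM deleted FOR XML PATH('row'), ROOT('deleted'), TYPE);

    -- Queue the message and return immediately; the ETL logic in
    -- usp_AT_iGLETL runs asynchronously via Service Broker activation.
    EXEC dbo.SB_AT_Fire_Trigger
         @SchemaName         = N'dbo',
         @TableName          = N'GL20000',
         @LogicProcedureName = N'usp_AT_iGLETL',
         @Inserted           = @inserted,
         @Deleted            = @deleted;
END
```

Because the trigger only serializes the rows and sends a message, the posting transaction in Dynamics GP is not blocked by the downstream ETL work.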

Service Broker Framework Installation and Usage

The installation scripts provided below implement a generic Service Broker framework that you can use with any table without changing the installed objects. The framework works like this:

1. First, run the “Step1_SB_AT_Installation” script on the Dynamics GP company database where you want the asynchronous triggers.

2. Run the “Step2_GP_GLTable_Trigger” script on the Dynamics GP company database to create the AFTER INSERT trigger on the GL20000 table.

3. Run the “Step3_GP_ETL_Logging” script to create the basic framework for logging the Service Broker ETL process and error logs.

4. Run the “Step4_BI360_f_Trans_GL_Staging_Table” script to create the staging table in the BI360DW database.

5. Run the “Step5_ETL_StoredProcedure” script on the Dynamics GP company database to create the stored procedure, usp_AT_iGLETL, with the ETL logic.
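To give a sense of what the Step 1 installation creates, the generic framework boils down to a handful of Service Broker objects: a message type, a contract, a queue with an activation procedure, and a service. The sketch below uses placeholder names; the actual object names and options in Eitan's script will differ.

```sql
-- Sketch of the kinds of objects a generic Service Broker installation
-- script creates. Names and options here are placeholders.
CREATE MESSAGE TYPE [SB_AT_Message] VALIDATION = WELL_FORMED_XML;

CREATE CONTRACT [SB_AT_Contract] ([SB_AT_Message] SENT BY INITIATOR);

CREATE QUEUE dbo.SB_AT_Queue
    WITH STATUS = ON,
    ACTIVATION (
        PROCEDURE_NAME    = dbo.SB_AT_Router,  -- dispatches to usp_AT_iGLETL
        MAX_QUEUE_READERS = 1,
        EXECUTE AS OWNER);

CREATE SERVICE [SB_AT_Service]
    ON QUEUE dbo.SB_AT_Queue ([SB_AT_Contract]);
```

The activation procedure attached to the queue is what makes the processing asynchronous: SQL Server launches it automatically whenever messages arrive.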

Download Code

You should now have a “Near Real Time ETL” process in place to handle the loading of data from Dynamics GP to the staging table within Solver’s BI360DW data warehouse database. You could then use the same logic and scripts above to handle the loading of data from the staging table to the f_Trans_GL table within the BI360DW database. In my tests of the implementation outlined above, I was able to post a 350-line batch within Dynamics GP and load the staging table within five seconds.
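For the staging-to-fact load, one simple option is a set-based MERGE. The sketch below is hypothetical: the column names are illustrative placeholders, not Solver's actual BI360 schema.

```sql
-- Hypothetical sketch of a staging-to-fact load. Column names are
-- illustrative, not Solver's actual BI360DW schema.
MERGE BI360DW.dbo.f_Trans_GL AS tgt
USING BI360DW.dbo.f_Trans_GL_Staging AS src
    ON  tgt.JournalID = src.JournalID
    AND tgt.LineSeq   = src.LineSeq
WHEN MATCHED THEN
    UPDATE SET tgt.Amount   = src.Amount,
               tgt.PostDate = src.PostDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (JournalID, LineSeq, Amount, PostDate)
    VALUES (src.JournalID, src.LineSeq, src.Amount, src.PostDate);
```

Whether you run this from a second Service Broker activation procedure or a scheduled job depends on how fresh the fact table needs to be.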

Conclusion

Hopefully this helps you with the basic understanding and installation of Service Broker as an ETL tool for loading your data warehouse.


Real Time Data Warehouse ETL

Part 1 – Implementing Service Broker as a Real Time ETL Tool

The cheapest and easiest way to solve the real-time ETL problem is not to attempt it in the first place, but we live in the real world, and businesses are saying that:

· Viewing yesterday’s data is no longer sufficient.

· They need a faster reaction time to respond to threats and opportunities.

To meet client demand for real-time data, I needed to look into modifying my extract, transform, and load (ETL) process. One of the most difficult parts of building any data warehouse is the ETL of data from the source system, and performing it in real time introduces additional challenges. Almost all ETL tools and systems, whether based on off-the-shelf products or custom-coded, operate in a batch mode. They assume that the data becomes available as some sort of extract file on a certain schedule, usually hourly, nightly, weekly, or monthly. The system then transforms and cleanses the data and loads it into the data warehouse. When loading data continuously in real time, there can’t be any system downtime, and the heaviest periods of data warehouse usage may very well coincide with the peak periods of incoming data. The requirement for continuous updates with no warehouse downtime is generally inconsistent with traditional ETL tools and systems.

One solution is to settle for near-real-time updates by increasing the frequency of the data loads. Weekly loads can be replaced by daily loads, for example, or daily loads can be changed to hourly or more frequent ones. This is generally a much easier and cheaper proposition than trying to update in real time. For one thing, it doesn’t require getting a whole new ETL tool set and changing your perspective from a query/pull ETL process to an event-driven architecture.

I work in the Microsoft BI stack, so I develop my ETL processes using SQL Server Integration Services (SSIS), but to meet increasing client demand for real-time data integration into their data warehouses I needed to learn a technology that was new to me. Enter SQL Server Service Broker. So what is Service Broker?

· A core service of SQL Server since SQL 2005.

· Used internally within SQL Server (Database Mail, etc.).

· A mechanism for the guaranteed, asynchronous delivery and processing of messages between databases (an event-driven process triggered by changes to a database table).
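The bullets above can be illustrated with a minimal conversation. The sketch below uses placeholder service, contract, and message-type names; it just shows the SEND/RECEIVE round trip that underpins the guaranteed, asynchronous delivery.

```sql
-- Minimal illustration of a Service Broker conversation.
-- All object names here are placeholders for demonstration.
DECLARE @h UNIQUEIDENTIFIER;

-- Open a dialog and send one XML message; the sender does not wait
-- for the receiver to process it.
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE [//Demo/Initiator]
    TO SERVICE   '//Demo/Target'
    ON CONTRACT  [//Demo/Contract]
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @h
    MESSAGE TYPE [//Demo/Message] (N'<event table="GL20000"/>');

-- Elsewhere (typically inside a queue activation procedure),
-- the target pulls the message off its queue:
WAITFOR (
    RECEIVE TOP (1) message_body, conversation_handle
    FROM dbo.TargetQueue
), TIMEOUT 5000;
```

Because the message sits durably in the queue until it is received, delivery survives restarts, which is what makes the pattern reliable enough for ETL.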

I didn’t count on replacing the batch ETL/SSIS process; more likely I would just augment it with Service Broker. My new ETL process now looks something like Figure 1.

[Figure 1: The batch SSIS ETL process augmented with Service Broker]

Pros & Cons of using Service Broker as an ETL tool

Pros

  • Flexible
  • Included in all editions (Express or greater)
  • Fast & efficient delivery
  • Reliable delivery
  • Secure (Windows authentication or certificate-based encryption)

Cons

  • Relatively high implementation cost (custom development)
  • Can require some tuning and manipulation to get it to perform well
  • Lacks good GUI tools & wizards
  • Lacks good automated monitoring tools

Service Broker Learning Resources

Unlike searching Google when you need help with an SSIS package error or fix, or just want to learn something new, the resources for learning Service Broker are limited. I have compiled some of the best that I have found.

http://rusanu.com/blog/ – Remus Rusanu’s blog; he is a leading expert on Service Broker. The blog contains an impressive volume of interesting publications covering Service Broker, and the author keeps it up to date.

http://www.madeiradata.com/service-broker-asynchronous-triggers/

http://itknowledgeexchange.techtarget.com/sql-server/tag/service-broker/ – Denny Cherry’s blog: there you will find a series of articles called “Back to Basics.” These publications describe all the basic entities and commands (such as SEND and RECEIVE) in Service Broker.

http://www.databasejournal.com/article.php/1503191/ – Marcin Policht’s journal contains several publications covering Service Broker (part of a larger series about SQL Server 2005 Express Edition):

1) Part 26 – Introduction to Service Broker: http://www.databasejournal.com/features/mssql/article.php/3757581/

2) Part 27 – Implementing Basic Service Broker Objects: http://www.databasejournal.com/features/mssql/article.php/3759311/

3) Part 28 – Implementing Service Broker Conversation: http://www.databasejournal.com/features/mssql/article.php/3762261/

In Part 2 of this series on implementing Service Broker as a real-time ETL tool, we will look at the code that I used to integrate Dynamics GP General Ledger data changes into the BI360 data warehouse.