Real Time Data Warehouse ETL

Part 1 – Implementing Service Broker as a Real Time ETL Tool

The cheapest and easiest way to solve the real-time ETL problem is to not even attempt it in the first place, but we live in the real world and businesses are saying that:

· Viewing yesterday’s data is longer sufficient.

· They need a faster reaction time to respond to threats and opportunities.

To meet client demand for real time data, I needed to look into modifying my extracting, transforming and loading (ETL) process. One of the most difficult parts of building any data warehouse is the process ETL of data from the source system. Performing ETL of data in real-time introduces additional challenges. Almost all ETL tools and systems, whether based on off-the-shelf products or custom-coded, operate in a batch mode. They assume that the data becomes available as some sort of extract file on a certain schedule, usually hourly, nightly, weekly, or monthly. Then the system transforms and cleanses the data and loads it into the data warehouse. When loading data continuously in real-time, there can’t be any system downtime. The heaviest periods in terms of data warehouse usage may very well coincide with the peak periods of incoming data. The requirements for continuous updates with no warehouse downtime are generally inconsistent with traditional ETL tools and systems.

One solution is to settle for near-real time updates by increasing the frequency of the data loads. Weekly loads can be replaced by daily loads, for example, or daily loads can be changed to hourly or more frequently. This is generally a much easier and cheaper proposition than trying to update in real-time. For one thing it doesn’t require getting a whole new ETL tool set and changing your prospective from a query/pull ETL process to an event driven architecture.

I work in the Microsoft BI Stack so I develop my ETL processes using SQL Server Integration Services (SSIS), but to meet increasing client demand for real time data integration into their Data Warehouse I needed to learn a new technology to me. Enter SQL Service Broker. So what is Service Broker?

· A core service of SQL Server since SQL 2005.

· Used internally within SQL Server (Database Mail, etc.).

· Mechanism for the guaranteed asynchronous delivery and processing of messages between databases (event driven process – changes to a database table).

I didn’t count on replacing the ETL/SSIS batch process; more likely just augmenting it with Service Broker. My new ETL process will now look something like figure 1.

clip_image002

Pros & Cons of using Service Broker as an ETL tool

Pros

Cons

  • Flexible
  • Included in all editions (Express or greater)
  • Fast & efficient delivery
  • Reliable delivery
  • Secure (Windows authentication or certificate-based encryption)
  • Relatively high implementation cost (custom development)
  • Can require some tuning and manipulation to get it to perform well
  • Lacks good GUI tools & wizards
  • Lacks good automated monitoring tools

Service Broker Learning Resources

Unlike searching Google when you need help with an SSIS package error, fix or just to learn something new the resources for learning Service Broker are limited. I have complied some of the best that I have found.

http://rusanu.com/blog/ – this is a blog of the great expert in the Service Broker area. Blog contains an impressive volume of quite interesting publications covering Service Broker. And author cares about updating.

http://www.madeiradata.com/service-broker-asynchronous-triggers/

http://itknowledgeexchange.techtarget.com/sql-server/tag/service-broker/ – Denny Cherry’s blog: you will be able to find a series of articles “Back to Basics” there. These publications describe all basic entities and commands (such as SEND, RECIVE) in the Service Broker.

http://www.databasejournal.com/article.php/1503191/ – Marcin Policht’s journal contains several publications covering Service Broker (some parts of big series about the SQL Server 2005 Express Edition):

1) Part 26 – Introduction to Service Broker: http://www.databasejournal.com/features/mssql/article.php/3757581/

2) Part 27 – Implementing Basic Service Broker Objects: http://www.databasejournal.com/features/mssql/article.php/3759311/

3) Part 28 – Implementing Service Broker Conversation: http://www.databasejournal.com/features/mssql/article.php/3762261/

In Part 2 of this series on implementing Service Broker as a real time ETL tool, we will look at the code that I used to integrate Dynamics GP General Ledger data changes into BI360 Data Warehouse.