Apr 2, 2013

The OLAP Compiler in BW-on-HANA

via Content in SCN

This blog is about a functionality that I consider as one of the crown jewel of BW-on-HANA. The approach has evolved over many years; early discussions started around the time when the BW accelerator (BWA) got initiated (around 2003, project Euclid) and were ignited by the fact that BWA (and its sequel HANA) provided a layer for processing multiple and sequenced calculations close to the data layer. This has not been possible before as we did not have control over the data processing layer of the standard RDBMS underneath the classic BW. The latter is restrained by SQL as the standard API. This blog – as a side effect – will show in what way SQL limits analytic processing, especially fast analytic processing. Also, it will become apparent that BW's OLAP has converted into a kind of sophisticated compiler for HANA's calculation engine. That combination will be hard to beat when you go beyond demo, "hello world style", single table group-by and other simple analytic examples. 

An Example
Let's look at an example which looks as if it was of "hello world style" but which quickly reveals some challenges. In figure 1, a standard OLAP query result is displayed. 

It shows the quantities of sold items per product and country. In addition, the number of distinct customers who bought those products can be seen. Finally, the quantity relative to the overall number of sold products in a country are presented as percentages.

Figure 1: Example of a result of an OLAP query

Some Challenges in the Example
Now when you carefully look at the example of figure 1 then you see some of the challenges: 
The numbers of distinct customers do not sum up. There are 5 distinct customers buying pencils and 3 buying paper, both in Germany (DE), but only 6 – and not 5+3=8 – buying products in DE. There must be 2 customers that have bought both, pencils and paper. In processing terms this means that the subtotal (e.g. by country) cannot be calculated out of the preceeding aggregation level (e.g. by country and product) but needs to be calculated from the lowest granularity (i.e. by country, product, customer). The calculated key figure quantity per country refers to the key figure quantity and sets the latter's values in relation to its subtotals. This means that the quantity key figure and its subtotals has to be calculated prior to calculating key figure quantity per country. This means there is an order of processing imposed by mathematics.

Figure 2: Challenges in the example

What you can do with SQL
In order to calculate the result of figure 1, classic BW (or SQL-based OLAP tools in general) would issue a SQL statement that retrieves a rows similar to the one shown in figure 3. That row constitutes the base set of data from which the result of figure 1 can be calculated. It is not the final result yet but from that data it is possible to calculate the final result as indicated in figure 1. Don't get fooled by the small amount of data shown in figure 3, As you can see, it is necessary to get the details on the customers in order to calculate the distinct customers per group-by level. In real world scenarios – just imagine retailers, mobile phone or utility companies – the customer dimension can be huge, i.e. holding millions of entries. Consequently and caused by real-world combination, the result of the SQL query in figure 3 is likely to be huge in such cases. That "sub-result" needs to be further processed, traditionally in an application server, e.g. BW's ABAP server or the Web-intelligence server. This implies that huge amounts of data have to be transported from the DB server to such an application server or a client tool. 

Figure 3: Rowset retrieved by a SQL query to calculate result of figure 1

By the way: BWA 7.0 accelerated exactly this part of OLAP query processing, i.e. the basic data query. Subsequent processing on top has still been executed in the application server. This is not a big issue as long as the OLAP query is dominated by the SQL processing. However, it comes short - as in this example - when the result of the basic SQL query gets huge and requires significant data transport from the DB to the application server and then significant data traversals to calculate the final result. 

The "OLAP Calculation Graph"
Now based on the result shown in figure 3 there is a natural sequence of how to calculate the various formulas (behind the calculated key figures) and the various group aggregations (i.e. the subtotals and totals). 
Many of those subsequent calculations can be considered as SQL queries on top of figure 3's result. Figure 4 shows the resulting dependency graph: LQ is the label for the query of figure 3; L1, L2, ..., L6 are "queries" or calculations on top. BW's OLAP compiler basically derives that graph and sends it down to HANA's optimizer (using a proprietary interface), HANA optimizes and processes that graph and sends back the result. Please beware that the result is not a rowset, at least not in the normal sense. It is heterogeneous sets of rows that are returned and that need to go into the appropriate result line in figure 1. In short: the compiler creates a graph to be sent to HANA and then there is a postprocessing step that receives the result and converts it to the desired result set of the OLAP query (i.e. a cellset as in fig. 1). 

Figure 4: Graph derived for processing the final result (as in fig. 1) from the data in fig. 3

Concluding Remarks
Author think there is a few fundamental things that become apparent even by looking at the simple example discussed in this blog: Even though individual processing steps can be expressed via SQL, it is in the end a well defined sequence of processing steps that yield the result.

Accelerating the individual steps helps but falls short. For example, an OLAP client tool can derive an OLAP graph like the one in fig. 4. One alternative is that it issues for each node in that graph a SQL or SQL-like query. To avoid the data transport, it can materialize intermediate results. However, this constitutes an overhead. As a second alternative (the one frequently met in practice), it is possible to issue only the basic query - labeled "LQ" in fig. 4 - and transport a potentially huge result set over the network to the client in order to calculate the rest on the client level. This is the traditional approach which obviously also suffers from the transportation overhead.

BW-on-HANA resolves those issues by providing a powerful option to define an OLAP query - i.e. the BEx query - this is a precondition to allow all of that in the first place, sending down the entire processing graph to HANA and allowing HANA to optimize and pipeline the individual processing steps, and
having the capability to assemble the partial results of the processing steps into the final (OLAP) result.

Credits
Let me emphasize that I'm the mere republisher of this blog. The real credits go to the real author and bright minds behind this approach who sit in BW's OLAP and data manager teams.This blog is cross-published here. You can follow the author on Twitter via @tfxz.