Using filters in DAX

In this post I will show some examples of different ways to use filters in DAX. If you write inefficient filters you will almost certainly run into performance issues. We will also see the difference between using SUMMARIZE() and ROW() in queries that return only one row.

Let's pretend one of your customers wants to know the sum of sales for the product “AWC Logo Cap”, as well as the minimum sale.

The benchmarking is done by running the different queries in DAX Studio multiple times, clearing the cache between executions.
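If you want to clear the cache manually, outside DAX Studio, you can execute an XMLA ClearCache command in an XMLA window in SSMS. A minimal sketch; the DatabaseID below is an assumption, replace it with the ID of your own deployed database:

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Object>
        <DatabaseID>AdventureWorksDW2012Model</DatabaseID>
    </Object>
</ClearCache>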

Sample query 1.

DEFINE
    MEASURE 'FactInternetSales'[MyFilteredNumber] =
        CALCULATE (
            SUM ( FactInternetSales[UnitPrice] ),
            FILTER ( DimProduct, [EnglishProductName] = "AWC Logo Cap" )
        )
    MEASURE 'FactInternetSales'[MyFilteredNumberMin] =
        CALCULATE (
            MIN ( FactInternetSales[UnitPrice] ),
            FILTER ( DimProduct, [EnglishProductName] = "AWC Logo Cap" )
        )
EVALUATE
 (
    FILTER (
        SUMMARIZE (
            FactInternetSales,
            DimProduct[EnglishProductName],
            "Filtered result", 'FactInternetSales'[MyFilteredNumber],
            "Filtered result min", 'FactInternetSales'[MyFilteredNumberMin]
        ),
        'FactInternetSales'[MyFilteredNumber] > 0
    )
)

This query produces the correct answer. But it does not perform very well.

It took 32 ms.

Then I will change my filters so they no longer filter the entire table, only the column we are interested in.
This is done by using VALUES(..) on the column in the filter expression.

Sample query 2.

DEFINE
    MEASURE 'FactInternetSales'[MyFilteredNumber] =
        CALCULATE (
            SUM ( FactInternetSales[UnitPrice] ),
            FILTER (
                VALUES ( DimProduct[EnglishProductName] ),
                [EnglishProductName] = "AWC Logo Cap"
            )
        )
    MEASURE 'FactInternetSales'[MyFilteredNumberMin] =
        CALCULATE (
            MIN ( FactInternetSales[UnitPrice] ),
            FILTER (
                VALUES ( DimProduct[EnglishProductName] ),
                [EnglishProductName] = "AWC Logo Cap"
            )
        )
EVALUATE
 (
    FILTER (
        SUMMARIZE (
            FactInternetSales,
            DimProduct[EnglishProductName],
            "Filtered result", 'FactInternetSales'[MyFilteredNumber],
            "Filtered result min", 'FactInternetSales'[MyFilteredNumberMin]
        ),
        'FactInternetSales'[MyFilteredNumber] > 0
    )
)

The query now took 25 ms to execute.

What if I move the filter out of the measures and into the outer FILTER() that wraps SUMMARIZE()?

Sample query 3.

DEFINE
    MEASURE 'FactInternetSales'[MyFilteredNumber] =
        CALCULATE ( SUM ( FactInternetSales[UnitPrice] ) )
    MEASURE 'FactInternetSales'[MyFilteredNumberMin] =
        CALCULATE ( MIN ( FactInternetSales[UnitPrice] ) )
EVALUATE
 (
    FILTER (
        SUMMARIZE (
            FactInternetSales,
            DimProduct[EnglishProductName],
            "Filtered result", 'FactInternetSales'[MyFilteredNumber],
            "Filtered result min", 'FactInternetSales'[MyFilteredNumberMin]
        ),
        'FactInternetSales'[MyFilteredNumber] > 0
            && [EnglishProductName] = "AWC Logo Cap"
    )
)

The query went down to 13 ms.

What if I re-write my query to use ROW() instead?

Sample query 4.

EVALUATE
 (
    ROW (
        "Filtered result", CALCULATE (
            SUM ( FactInternetSales[UnitPrice] ),
            FILTER ( DimProduct,  [EnglishProductName] = "AWC Logo Cap" ) 
        ),
        "Count", CALCULATE (
            MIN ( FactInternetSales[UnitPrice] ),
            FILTER ( DimProduct, [EnglishProductName] = "AWC Logo Cap" ) 
        )
    )
 )

Then my query actually went down to 5 ms.

One last modification: insert VALUES(..) so the filter is applied only on the necessary column.

Sample query 5.

EVALUATE
 (
    ROW (
        "Filtered result", CALCULATE (
            SUM ( FactInternetSales[UnitPrice] ),
            FILTER (
                VALUES ( DimProduct[EnglishProductName] ),
                [EnglishProductName] = "AWC Logo Cap"
            )
        ),
        "Count", CALCULATE (
            MIN ( FactInternetSales[UnitPrice] ),
            FILTER (
                VALUES ( DimProduct[EnglishProductName] ),
                [EnglishProductName] = "AWC Logo Cap"
            )
        )
    )
)

And now we went down to about 4 ms.

The server timings for “Sample query 1” look like this

server-timing-query-1

And the server timings for “Sample query 5” look like this

server-timing-query-2

Indented report parameters in SSRS (written in MDX)

In this post I will show you how to make nicely indented report parameters in SSRS. This can come in handy when you want to display multiple levels in the same parameter list.

Let's start by writing a simple MDX query to list years and week numbers.

WITH MEMBER [Measures].[RSValue]
AS
[Date].[Production].CurrentMember.UniqueName

MEMBER [Measures].[RSCaption]
AS
[Date].[Production].CurrentMember.NAME

SELECT {
	[Measures].[RSValue],
	[Measures].[RSCaption]
} ON 0,
DESCENDANTS(
	[Date].[Production].[Year], 
	[Date].[Production].[Week], 
	SELF_AND_BEFORE
) ON 1
FROM [Adventure Works Internet Sales Model]

Running this query will produce a result like this in SSMS

Result 1

To add this result to a parameter I first have to create a dataset; let's call it “MyDate”. Paste the MDX into it, then assign the “Available Values” as in the screenshot below.

Result 2

During report preview the report parameter will look like this

Result 3

Not too nice and readable, I guess. I really need to add some kind of indenting here, so I start making some changes to my MDX, like this

WITH MEMBER [Measures].[RSValue]
AS
[Date].[Production].CurrentMember.UniqueName

MEMBER [Measures].[Indent]
AS
[Date].[Production].CurrentMember.LEVEL_NUMBER - 1

MEMBER [Measures].[RSCaption]
AS
SPACE([Measures].[Indent] * 5) + " " + [Date].[Production].CurrentMember.NAME

SELECT {
	[Measures].[RSValue],
	[Measures].[Indent],
	[Measures].[RSCaption]
} ON 0,
DESCENDANTS(
	[Date].[Production].[Year], 
	[Date].[Production].[Week], 
	SELF_AND_BEFORE
) ON 1
FROM [Adventure Works Internet Sales Model]

It’s not easy to see the effect of this indenting if you run the query in SSMS. But if you replace the query in the dataset we called “MyDate”, you will see a much nicer list.

Result 4

Make sure you trim the parameter value when setting it against the dataset you want to filter, as sketched below.

Result 5
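The trim itself is just a small expression on the parameter mapping. A minimal sketch, assuming a single-value report parameter named MyDate (the parameter name is an assumption, use your own):

=Trim(Parameters!MyDate.Value)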

Voila!

See you in Seattle?

On Saturday I travelled from Norway to Seattle with one of my friends, a flight that took over 11 hours. It is good to have a few days up front to shop and so on. The hotel we are staying at (the Hyatt Olive) even has a fitness center and a swimming pool, so I might even get in shape during this week.

I am really looking forward to SQL PASS, which starts on Tuesday. Good speakers and really interesting sessions.

Maybe I will see you there?

Kind regards,
Sindre

How to make your model smaller

One of the biggest mistakes out there is to add columns with unique values, such as order numbers and order line numbers, to a model unless you have a very good reason to do so. This makes your models larger than necessary, and the SSAS engine will not aggregate your data as well as it should.

Let's make a sample. We will use AdventureWorksDW2012, as we always do.

First I will add a couple of columns to my fact table

-- add a likely unique varchar column and a guaranteed unique int column
alter table [dbo].[FactInternetSales]
add MostLikelyUniqueVarchar varchar(10),
UniqueIntValue int IDENTITY(1,1)
GO

-- fill the varchar column with the first 10 characters of a GUID
update [dbo].[FactInternetSales]
set MostLikelyUniqueVarchar = SUBSTRING(CONVERT(varchar(255), NEWID()), 1, 10)

Then I will create a tabular model with the following tables

– DimDate
– FactInternetSales
– DimProduct

And it will look something like this

Tables

I mark DimDate as date table and then deploy my model to the server.

The model is now about 6 MB.

Size 1

Then I replace my FactInternetSales table with the query below, which leaves out the two columns I just added but still keeps SalesOrderNumber and SalesOrderLineNumber.

select [ProductKey]
      ,[OrderDateKey]
      ,[DueDateKey]
      ,[ShipDateKey]
      ,[CustomerKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[SalesTerritoryKey]
      ,[SalesOrderNumber]
      ,[SalesOrderLineNumber]
      ,[RevisionNumber]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[UnitPriceDiscountPct]
      ,[DiscountAmount]
      ,[ProductStandardCost]
      ,[TotalProductCost]
      ,[SalesAmount]
      ,[TaxAmt]
      ,[Freight]
      ,[CarrierTrackingNumber]
      ,[CustomerPONumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM [dbo].[FactInternetSales]

The model is then down to about 4 MB.

Size 2

But what if I remove all the unique columns, such as CarrierTrackingNumber, CustomerPONumber, SalesOrderNumber and SalesOrderLineNumber?

SELECT [ProductKey]
      ,[OrderDateKey]
      ,[DueDateKey]
      ,[ShipDateKey]
      ,[CustomerKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[SalesTerritoryKey]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[UnitPriceDiscountPct]
      ,[DiscountAmount]
      ,[ProductStandardCost]
      ,[TotalProductCost]
      ,[SalesAmount]
      ,[TaxAmt]
      ,[Freight]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM [dbo].[FactInternetSales]

Now the model is reduced to about 3.5 MB. Just over half the size we began with 🙂

Size 3

So the conclusion is: do not add columns with unique values to your model. Your fact tables should ideally contain only foreign keys and measures.

Optimize your datatypes for SSAS

Not everyone is aware that not all data types are supported in SSAS cubes and models. If you use a data type that is not supported, SSAS will cast it to a supported data type when the cube is processed.

This will increase your cube processing time, and you will use more CPU than necessary.

Microsoft has provided a list of the supported data types here.

To avoid these kinds of problems there are several things you can do.

  • Use views as the source for your cubes, and cast the values in these views. SQL Server is better at casting values than SSAS, and it is likely that your SQL Server has more power than your SSAS server. See the sketch after this list.
  • Make sure your DWH only uses supported SSAS data types. But that's not too likely, is it?
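As an illustration, such a view could look something like the sketch below. The casts shown are only examples; which columns actually need casting, and to which target types, depends on your own schema and on Microsoft's list:

-- cast unsupported source types on the SQL Server side,
-- so SSAS only ever sees supported data types
create view [dbo].[vFactInternetSales]
as
select
       [ProductKey]
      ,[OrderDateKey]
      ,cast([SalesAmount] as money) as [SalesAmount]        -- e.g. from decimal(38, 6)
      ,cast([OrderDate] as datetime) as [OrderDate]         -- e.g. from datetime2
      ,cast([CarrierTrackingNumber] as varchar(25)) as [CarrierTrackingNumber]  -- e.g. from nvarchar(max)
from [dbo].[FactInternetSales]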

Hardware specification for SSAS

Top things you should consider when buying new hardware for your SSAS server.

CPU
When you query an SSAS multidimensional cube, your query might get split between the storage engine (raw data) and the formula engine (complex calculations). The formula engine is single threaded, so it is important to choose a CPU with a high clock frequency; 3.4 GHz will be much better than 2.1 GHz.

The CPU should also have as much cache as possible.

Disk
I prefer to always have DAS (Direct Attached Storage) rather than using a SAN. At least 2 fast SSD disks in a proper RAID setup should be perfect. Or you could consider flash memory like Fusion-io (SanDisk). SSAS multidimensional uses the Windows file cache quite a lot.

I am not using RAID with failover, because if the server crashes I can create a new virtual server in no time and then deploy the cubes to it.

Memory
You should choose memory that is as fast as possible, and make sure you have enough of it 🙂 SSAS Tabular reads the entire model into memory, so this will improve your query performance quite a lot.

NUMA
I will not talk too much about NUMA since it's quite complicated. SSAS multidimensional should perform quite well on NUMA hardware, but SSAS Tabular might have problems on a server with NUMA. I have read forum posts where a query ran faster on a Surface Pro than on a high-end server, and the problem might have been NUMA. One workaround is to install a virtual server on top of the physical server and bind it to one of the sockets.

At one of our customers we ended up buying this server (August 2015).

DL380 gen9
2 x Intel Xeon E5-2643v3, 3.4GHz, 6-core, 20MB L3 cache
120GB Memory 2133MHz
2 x 800GB SSD drive
10Gb Ethernet

Please note that this server specification might not suit your needs.

You should always identify bottlenecks before you decide on a new server specification. And then scale the server for your needs.

Marco Russo has written a nice article on “Optimize Hardware Settings for Analysis Services Tabular”

Can Tabular models be optimized by sorting the fact tables?

In multidimensional you can create faster cubes by sorting your facts before processing. This makes your models more compressed (smaller) and your queries run faster.

Today I wanted to test the same in Tabular, so I created 2 models with one fact table and a few dimensions. The only difference between the models is the fact table: in the first model I used the original table (FactInternetSales), and in the other model I used a view against FactInternetSales with an ORDER BY on all columns.
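The sorted view was along these lines (a sketch with only a few of the columns shown; the view name is made up here). Note that SQL Server only accepts ORDER BY in a view together with TOP, and even then it does not guarantee that rows are returned in that order, which may be relevant for this test:

create view [dbo].[vFactInternetSalesSorted]
as
select top (100) percent
       [ProductKey]
      ,[OrderDateKey]
      ,[CustomerKey]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[SalesAmount]
from [dbo].[FactInternetSales]
order by [ProductKey], [OrderDateKey], [CustomerKey], [OrderQuantity], [UnitPrice], [SalesAmount]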

These are the tables I used

model

Then I processed and deployed the models.

When checking the size of the folders that contain the fact data, I was kind of surprised. It looked to be the opposite! The folder for the regular table was actually smaller than the one for the view. So the table seemed to be more compressed than the view.
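Instead of comparing folder sizes on disk, you can also inspect the compressed sizes through a DMV. Run this in an MDX query window against the deployed database; the USED_SIZE column shows how much space each column segment uses:

select * from $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS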

What do you think? Is this caused by the fact that one of the facts is a table and the other one is a view? Or is the fact table already sorted?