Power BI #6 — The Backend
The backend of Power BI. That is the topic of today’s article. In other words, are going to have a very high-level look at the architecture of Power BI followed by a glance at the engines (yes there are two).
Why do you need to know about the backend?
Up to this point, we have spoken about writing DAX. Hence, the next goal for us now is to write efficient DAX. A precursor to writing efficient DAX queries is to understand how these engines work.
What are the data query models in Power BI?
- Direct query: In simple words, this is when the DAX queries are run at the data source. Or, the content of the table is read from the data source at query time, and it is not stored in memory during data refresh.
- Import query: Here, the data queries are run solely in memory. Also called in-memory, or VertiPaq. The content of the table is stored by the VertiPaq engine, copying and restructuring the data from the data source during data refresh.
You need to know about these models because they significantly impact the capabilities and performance of your reporting.
How does the architecture of Power BI look like?
- DAX query: In this first step the measure or calculated column that we have created.
- Tabular Model: Think of the tabular model as the container containing our two engines. This tabular model is also used with the MDX language which is used with excel.
- Formula Engine (FE): This engine processes the request, generating, and executing a query plan.
- Storage Engine (SE): Retrieves data out of the Tabular model to answer the requests made by the Formula Engine. The Storage Engine has two implementations:
- VertiPaq hosts a copy of the data in memory that is refreshed periodically from the data source.
- DirectQuery forwards queries directly to the original data source for every request. DirectQuery does not create an additional copy of the data.
Where does the data flow?
- DAX - FE: The formula engine is a high-level execution unit. Basically, it's powerful enough to understand and solve complex DAX queries. It receives a DAX query then processes it to create a query execution plan as below.
- FE - SE: When the query execution plan (formula engine) needs to access the underlying tables(highlighted lines above) it forwards the processed request to the storage engine. This request can be a simple one trying to access the raw data table or it can be complex as involving aggregations and joins. The SE then retrieves the data.
- SE — Data Source: Here, depending on the model being used in the above section the SE either uses the Vertipaq engine, which uses an in-memory cache to retrieve and store your data. Or, directly uses the data source using the direct query.
What are the best rules to follow for writing DAX?
Here are some resources that clearly lay out the rules of thumb to follow while writing your DAX.
- https://powerpivotpro.com/2019/03/dax-optimizations-write-it-like-the-dax-calls-it/.
- https://maqsoftware.com/expertise/powerbi/dax-best-practices
- https://www.sqlbi.com/articles/optimizing-dax-expressions-involving-multiple-measures/
Additional Note
From my experience, while DAX is a very powerful language, we need to maintain a balance between data querying and data manipulation. There is a fine line between the two and DAX is powerful because of the complex engines but at the same time is expensive. I highly recommend that the data be preprocessed using power query, Python, R, or any other data manipulation language. This helps us utilize the maximum potential of DAX.
Anyways, the DAX studio is a great tool to analyze the working of the above two engines in action. Here is a link to a video. This is a great starting point for DAX studio used in the optimization of DAX.
Finally, a huge thanks to everyone who has followed this series, it was wonderful to be on this journey with you. I have learned so much and gained so much value from you by doing this. After following this series, you should be well equipped to take a dip into Power BI and start generating fast and factual insights from your data.
The next series coming soon…
References: