The hidden cost of dataflows Gen 2 and the on-prem gateway
One of the common comments about Dataflows Gen 2 is that the CU cost can be much higher than a similar Gen 1 dataflow. While they share a name, they work quite differently under the hood, and different technologies tend to have different costs. However, there is one key difference between the two in how they are billed.
In dataflows Gen 2 the pricing is relatively simple, with the Microsoft documentation stating you are charged per second of the dataflow runtime (I’m oversimplifying here; staging and fast copy can complicate this a bit). In dataflows Gen 1 the billing is even simpler, as it is just the compute time taken by the refresh.
While these two may seem very similar, as they are both based on the time the dataflow takes, the key difference is that dataflows Gen 1 just charges you for the compute time, whereas Gen 2 charges you for the full runtime of the refresh!
This especially matters when using the on-prem gateway, because when a gateway is used the data processing and transformation take place on the gateway, not in the Power BI/Fabric service. This means that if your Power Query takes 5 minutes to run on your gateway, in Gen 1 you would just be charged a very minimal amount of CU for the final data load. However, with a Gen 2 dataflow you will be charged for the full 5 minutes. This is despite the fact that the compute running your data transformations is the machine hosting your gateway, not Fabric.
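The difference can be written as a rough billing model. This is only a sketch under assumptions, not Microsoft's published formula: the symbols $t_{\text{gw}}$ (time spent evaluating the query on the gateway), $t_{\text{svc}}$ (time spent in the service on the final load), and the per-second rates $r_1$ and $r_2$ are hypothetical names introduced here for illustration.

$$
\text{CU}_{\text{Gen 1}} \approx r_1 \, t_{\text{svc}},
\qquad
\text{CU}_{\text{Gen 2}} \approx r_2 \, \left(t_{\text{gw}} + t_{\text{svc}}\right)
$$

For the 5 minute example, $t_{\text{gw}} = 300$ seconds, so under this model the Gen 2 cost is at least $300 \, r_2$ however small $t_{\text{svc}}$ is, while the Gen 1 cost does not depend on $t_{\text{gw}}$ at all.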
This gets especially bad with slow data sources, such as an overloaded database, a slow network, or poorly written Power Query transformations. To demonstrate this, I have some Power Query code that loads a CSV saved locally on my laptop after a 5 minute wait. As the file is only available on my laptop, I have the gateway installed to allow Fabric to load from the file.
let
    // Wrap the file read in a function so it can be invoked after a delay
    Binary_load = () => File.Contents("C:\data.csv"),
    // Wait 300 seconds (5 minutes) before reading the file, to simulate a slow source
    Binary_data = Function.InvokeAfter(Binary_load, #duration(0,0,0,300)),
    Source = Csv.Document(Binary_data, [Delimiter = ",", Columns = 2, QuoteStyle = QuoteStyle.None]),
    #"Promoted headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    #"Changed column type" = Table.TransformColumnTypes(#"Promoted headers", {{"Column1", type text}, {"Column2", Int64.Type}}, "en")
in
    #"Changed column type"
I then ran this code in both a Gen 1 and a Gen 2 dataflow, one after the other, to avoid any resource contention issues on the gateway. After these had completed I checked the Capacity Metrics app, and we can see that while the durations for Gen 1 and Gen 2 are very similar, the CU cost for Gen 2 is far higher.

| Dataflow | Duration (s) | CU Cost |
|---|---|---|
| Gen 1 | 318 | 2 |
| Gen 2 | 335 | 4,944 |
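
To put those numbers in perspective, dividing each CU figure by its duration gives an effective rate per second of runtime. This assumes the durations are in seconds, which is consistent with the 5 minute wait in the query.

$$
\frac{4944}{335} \approx 14.8 \ \text{(Gen 2)},
\qquad
\frac{2}{318} \approx 0.006 \ \text{(Gen 1)},
\qquad
\frac{4944}{2} = 2472
$$

So for near-identical durations, the Gen 2 run cost roughly 2,500 times more CU than the Gen 1 run. It is also worth noting that $4944 = 16 \times 309$. If Gen 2 standard compute is billed at 16 CU per second (an assumption here, so check the current pricing documentation), that would correspond to about 309 billed seconds, which is almost all of the 335 second run and includes the 300 second wait on the gateway.
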
Conclusion
So if you are considering migrating a Gen 1 dataflow that uses an on-prem gateway to Gen 2, it is vital that you test thoroughly and make sure your dataflows are optimised to minimise the runtime, or look at more CU cost-effective artifacts such as pipelines or notebooks.