A simple ETL task in Power Automate

Microsoft Power Automate (previously called “Flow”) has a nice drag-and-drop interface and some coding capabilities that allow tasks to be automated. It seems to be free with most Office 365 business licences, though with varying levels of functionality for different licensing models. Tasks can handle Office 365 files, emails and other things, and can be scheduled, run manually, or triggered by a variety of events such as emails or files arriving. Sounds great, but how does it stack up when doing a simple Extract Transform Load (ETL) task based on data in Excel? In this post I detail how building this flow went, and the surprising limitations and foibles I discovered in the product.

I’m using Power Automate to do a simple Excel-based ETL task to provide clients with access to information about invoices they’ve been sent. The master invoice list has invoice information for all clients, but I want to have separate files for key clients so that I can share the files with them (read-only) to help them keep up to date with their payments. Each client file must only have invoices that match their client name, and the result should be ordered by invoice status, with conditional formatting to highlight different status values. With a scheduled refresh the files can be updated at any required frequency.

Using files in this way is probably not the best way to achieve information sharing with clients like this, but it’s what my customer wanted, and it’s a great exercise in learning Power Automate, especially its limitations!

As part of the exercise, and reflecting the actual situation I was faced with for my customer, I had the following constraints. If these were under my control, I would eliminate the constraints rather than trying to work around them, but they provided some interesting challenges:

  • The files need to be in a folder under the user’s personal space on OneDrive for Business, not in a shared library, but the flow was to be developed in a folder under my personal space. This provides a good test for deploying a flow from one user environment to another.
  • The master file has no “key” value (in my example for this blog Invoice # could be used as a key, but I’m putting that aside since that wasn’t the case with my real customer scenario).

The PROs of using Power Automate to do this task:

  • It’s a cloud-based/serverless solution so doesn’t require any in-house platform to run on
  • It has access to OneDrive/Office 365 files using your credentials
  • It has logging and diagnostics for all runs
  • It’s “free” with most Office 365 business licences. There are different licensing models, and, confusingly, some features are “premium”, but I’ve used only the basic features here, which don’t require any extra licence payment.

The CONs of using Power Automate to do this task:

  • There are some profound functional limitations in Power Automate that make this seemingly simple task very frustrating to build. These issues took a lot of time to figure out since I’m new to Power Automate, even though I’ve used several other sophisticated ETL tools. Granted, Power Automate isn’t promoted as an ETL tool, but even allowing for that, the ETL-style functions that it does offer have some surprising limitations that I’ll go into detail about along the way.
  • The ability to “share” or transfer a flow has some annoyances too. Although there are multiple methods available to share/deploy a flow from one user to another, these methods are poorly documented in terms of how things like user credentials are affected. The method I used – export/import using a ZIPped flow file – works, but all of the references to folders and files need to be reset when the user imports the flow. This is mostly a consequence of using personal spaces on OneDrive for Business, but annoying nevertheless, since the developer and the end-user are using exactly the same folder structure and file names. And if the flow accesses the same file multiple times, the update must be made in every place: there’s no way to set up a single file reference and use it in multiple steps.

The data

Here’s the master invoice file “Client Invoices.xlsx” and a table called Invoices.

Limitation: Input/output tasks for Excel only work with tables in Excel. Not such a bad thing, I guess. For the purposes of this article, I’m assuming you know about tables. If not, check them out here.

Limitation: A flow cannot read a macro-enabled spreadsheet. The file must be XLSX not XLSM.

I’ve copied that file to make the first client file “Client 1 Invoices.xlsx”, again with a table called Invoices (this doesn’t need to match the master file table).

I’ve added a Key column and put values 1, 2, 3 etc in it. This is to allow the file to be cleared for each refresh: the refresh deletes all rows in the target table, one at a time, then adds all rows from the master table that match this client. Each delete-row operation must specify a key value that identifies the row being deleted. Further below I examine ways of generating these key values when the new rows are added to the table.

Limitation: There is no “delete all rows in table” function. To clear a table you must “list rows present in a table”, then run an “Apply to each” loop that contains a “delete a row from a table” action.
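Since there’s no single delete-all action, the flow has to emulate one. Here’s a minimal Python sketch of that pattern (purely illustrative – the function and field names are mine, not a real Power Automate API): list every row, then issue one delete per row, each identified by its Key value.

```python
# Illustrative sketch of the only way to clear an Excel table in a flow:
# list every row, then delete the rows one at a time by their Key value.

def clear_table(rows):
    """Simulate 'List rows present in a table' + 'Apply to each' deletes."""
    listed = [row["Key"] for row in rows]   # keys returned by the list step
    deleted = []
    for key in listed:                      # one 'Delete a row' action per key
        deleted.append(key)
    remaining = [row for row in rows if row["Key"] not in deleted]
    return deleted, remaining

deleted, remaining = clear_table([{"Key": 1}, {"Key": 2}, {"Key": 3}])
```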

I’ve also added conditional formatting for the Status column values I want to highlight. This formatting works when the table is refreshed (i.e. emptied and refilled) as long as not all rows are deleted. If all of the rows are deleted first, then new rows added, all of the new rows take on the format of the heading, which is different to the behaviour of the spreadsheet if the rows are deleted and added manually.

To work around this, I’ve added a dummy row under the headings and exclude it from the delete, so that the client spreadsheet retains its conditional formatting.

Limitation: If all rows in a table are deleted, any rows that are added to the table take on the formatting of the table headings and any conditional formatting that was on the rows is lost.

I should also point out that conditional formatting functionality differs between the online version of Excel and the desktop version. The online version has very limited functionality, even severely restricting the colours you can use to highlight cells. I’ve found it best to set up conditional formatting in the desktop version, then upload the initial file, and use the desktop version for any future changes. Note also that while fill colours work, patterns (e.g. cross-hatch) do NOT work when viewing online!

I copied the Client 1 Invoices file for clients 2 and 3.

Building the flow

Flows can be triggered in various ways and I started with a scheduled flow. Later I adjusted the schedule to be every minute, which was nice for testing, but a bit extreme for real life. As I’ll show later, if you have the target spreadsheet open when the flow runs, you SEE the rows disappear as they’re deleted, and re-appear when they’re added. This is probably NOT what the client should see, so some thought needs to be put into the final scheduled run times.

I set the schedule to start a year ahead so there’s no chance it will run while being developed. I can manually trigger a test run at any time.

Here’s how the schedule looks when it’s edited …

The next step is to list, then delete, all rows in the Client 1 file, except for the row with “.” as the key value so we don’t lose conditional formatting as noted above.

I’ve renamed the step here from its default “List rows present in a table”, so that it’s clear in later steps, when referencing values, which step they’re coming from. Note that it’s best to do renames like this before you have any other steps that reference this step. Changing the name of a step will break other steps until you change their reference to the new name. This is pretty annoying when it happens.

I’ve also used the “Filter Query” option to make sure I don’t get the row with key value “.”, to avoid losing my conditional formatting as noted above. This filter query is an OData filter. My field name “Key” conveniently has no embedded spaces; if you want to filter or sort the table rows using an OData clause here, it’s best to avoid having spaces in your column headings – flow can’t handle them (there are various suggestions for how to handle them, like using _x0020_ in place of a space, but I couldn’t get any of these to work). It also can’t handle more than one expression in the Filter Query, or more than one column in the Order By!

Limitation: The OData query filter and order-by can each refer to only a single field, and cannot refer to field names with embedded spaces. i.e. while OData syntax is used, full OData syntax isn’t supported.
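To make the limitation concrete, here’s the shape of a Filter Query and Order By that the connector will accept, sketched as Python strings (the column names Client and Status are assumptions about my tables; anything fancier is rejected):

```python
# Accepted: one comparison, on one space-free column name.
client = "Client 1"
filter_query = f"Client eq '{client}'"   # single expression only
order_by = "Status desc"                 # single column only

# Rejected by the connector in my testing (shown for contrast):
# "Client eq 'Client 1' and Status eq 'Overdue'"   -> more than one expression
# "Invoice Date eq '2020-01-01'"                   -> embedded space in the name
# "Status desc, Client asc"                        -> more than one Order By column
```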

The Filter Query OData option being used here is quite helpful, despite the terrible limitations I’ve found. The alternative is to list ALL rows and put them through a “Filter Array” step, which works fine, but annoyingly, the output of Filter Array is a thing called “Body”, and this can’t easily be referenced using point and click as the fields of the Excel table can – you have to use some sort of expression to get each field out of “Body”. So I stuck with the OData query, especially in a later step where I want to make sure that I’m only retrieving rows for the client I’m processing, hence minimising the risk that clients will see each other’s data.
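For comparison, the “Filter Array” alternative is essentially doing this (a Python sketch of what the step does to the list output; the Client field name is an assumption about the master table):

```python
# 'List rows present in a table' returns an array of row objects ("value");
# 'Filter Array' keeps the rows matching a condition, producing "Body".
value = [
    {"Client": "Client 1", "Status": "Paid"},
    {"Client": "Client 2", "Status": "Overdue"},
    {"Client": "Client 1", "Status": "Overdue"},
]
body = [row for row in value if row["Client"] == "Client 1"]  # the "Body" output
```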

It’s also interesting that when assigning the file to be processed, you can see files in your own space and in shared libraries, but NOT in another user’s space, even if that user has shared files/folders with you.

Limitation: Flows can’t access files or folders from other users’ personal space, even if they have shared those resources with you.

Next, I use “Apply to each” (again, suitably renamed) to delete the rows from the client table based on their key value.

A note about key values.

When deleting a row from an Excel table, flow needs a key value. As noted above, part of this exercise is to deal with this when there is no key, so I needed to manufacture a key value each time a row was inserted. At first I tried a simple “=ROW()” formula in Excel, but I found this regularly caused the delete to fail: deleting a row with a lower Key value (say Key=3) meant all the higher values in the column “shuffled down”, so the maximum (say Key=5) became one lower (Key=4). The subsequent attempt to delete the now non-existent maximum value (5) then fails. This happened even if I sorted the list by key value descending. It seems the order of rows in the “Apply to each – Delete” step is random, regardless of any ordering in the prior step.

Another option is to use a formula in Excel that joins all of the other column values together to make a key. This works but is very unsightly, and you can’t just hide the column, or the flow won’t see it!

Limitation: A flow can’t operate on hidden table columns. It’s as if they don’t exist.

Yet another, slightly better, option is to use an inbuilt field ‘ItemInternalId’ that flow adds to each row. This can be accessed with the following expression:

item()['ItemInternalId']

This expression yields a GUID-like value that works well as a unique key. It is, however, still unsightly and might confuse clients when they see it.

There’s also a function called ‘iterationIndexes’ which looked promising, but I couldn’t get it to work.

In the end I opted to set a variable and increment it on each add-row step (see below), giving numbers 1,2,3 etc. Here’s the initialize variable step:

Next, we list all the rows in the master table for the client we’re interested in:

Again, I use a Filter Query to limit the results to only this client, and an Order By clause so that rows are added in order of Status (descending). It appears that rows being added don’t suffer from the same random ordering that rows being deleted do.

After listing the rows for this client, we add them, one at a time, to the target file, incrementing the Key value (by 1, surprisingly!) each time.
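Put together, the variable-based key generation amounts to this (a Python sketch of the flow’s steps, not actual flow code; the Status values are invented):

```python
# "Initialize variable" step: the key counter starts at 0.
key = 0

master_rows = [{"Status": "Overdue"}, {"Status": "Paid"}]  # rows listed for this client
added = []
for row in master_rows:
    key += 1                           # "Increment variable" step (by 1)
    added.append({**row, "Key": key})  # "Add a row into a table" step
```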

After this the flow should run.

And all going well it will have success all the way through.

If an error occurs at any step, you can just click on that step to see what the problem is. Even in successful steps you can dive in to see the details of the data that flowed through.

As I mentioned above, if you watch the target sheet while the flow is running, you’ll actually see the rows being deleted and re-added like this:

Replicating the flow for other clients

I suppose it’s possible to have a single flow that refreshes multiple client files, but my preference is simply to copy this flow as many times as there are client files required. Of course, this wouldn’t be practical with hundreds or thousands of clients, but that wasn’t my situation. Care must be taken to re-select the different client file in EVERY step where the client file is referenced, and to change the client filter when listing rows in the master file.

Deploying the flow to another user

This is where it gets even more interesting and where the documentation isn’t very helpful.

Flows can be exported in ZIP format, and a ZIPped flow can be imported by another user.

You can also export a flow in JSON format. I initially assumed that this would allow me to edit the flow as text and re-import it, say for replicating the flow for other clients, or to work in the space of the other user. BUT this isn’t the case! The JSON export cannot be imported, and is apparently meant for a different product, Logic Apps.

There’s also a recently added “Send a copy” button that’s supposed to do the same thing as exporting and importing a ZIPped flow:

But when I clicked this button it asked me if I wanted to start a free trial of premium features:

I searched but couldn’t find any information about which features require premium licensing. Trial and error, I guess? The “Learn more” button only took me to a screen where I could choose between a per-user plan and a per-flow plan. My guess is that both of these enable premium features, but that’s only a guess.

Now, exporting the flow as a ZIP is interesting. You have to nominate up front whether the user receiving the flow will be updating a prior version, or use “Create as new” for installing a first version. In my case it’s obvious, but if you’re using this method to distribute the flow to many users, it may not be so clear which users already have the flow, and what would happen if they tried to import a “Create as new” flow when they already have a prior version.

There seem to be other distribution options, like templates and solutions, but I haven’t explored those much. The “solutions” option says that you must have a data connection!

Next, importing the flow:

When importing you need to click the connections marked as “Select during import”. In this case it’s the connection to the personal area of OneDrive. This needs to be set by the user importing the flow to their own space/username.

After the flow has been imported, it has to be edited to set ALL of the file connections, which are effectively lost because they contain some kind of hex-valued reference to the original location.

It would be nice if the flow just had name references and worked without editing if the importing user had the same folder and file structure, but it doesn’t work that way.

Limitation: When deploying a flow to another user, for files in a user’s personal space on OneDrive, even if that user has the same folder and file structure, all file references have to be re-assigned.

Conclusion

There certainly are benefits to using Power Automate – like cloud/serverless operation and (mostly) point-and-click design. But for a simple process like this there are so many hoops to leap through that it’s almost not worth it. Perhaps I’m missing the point and trying to force a process that the system just isn’t designed to cater for. But if there are functions for listing rows in a table, deleting rows and adding rows, I think it’s reasonable to expect that a simple process like this would be simple to build, and I’m surprised at how tricky it has been.

Have you had any similar (or better?) experiences with flows? Have any of the details above helped your situation? Please let me know via the comments below.