Power Platform Tidbits 10: How to split address into separate columns in Power Query
Power Platform Tidbits 10: How to split address into separate columns in Power Query
Sometimes address values are stored in a single text column and you'd need to split it into separate columns. For example when migrating data into Power Platform in a data flow, or when building a Power BI dataflow, or simply in Excel. In this post am going to show you how you can do this in a reliable way. I am going to make some assumptions first, but you can always fine-tune the formula for other address patterns.
Assumption
The addresses I'm going to work with look like the followings.
- François Vervloetstraat 100, 1080 Uccle
- Koning Albertstraat 14, Menen
The syntax we are looking for is:
first-line, [postcode] city-nameIn other words, the first line address is always present. The postcode is optional. the city name is always present. It might be that there is no address at all.
Solution
#"Split address" = Table.SplitColumn(#"Previous step", "address", Splitter.SplitTextByDelimiter(","), {"Line 1", "Line 2"}),
#"Split by character transition" = Table.SplitColumn(#"Split address", "Line 2", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Postcode", "City"}),
#"Replaced nulls" = Table.ReplaceValue(#"Split by character transition", null, each [Postcode], Replacer.ReplaceValue, {"City"}),
#"Replaced postcodes" = Table.ReplaceValue(#"Replaced nulls", each [Postcode], each if [Postcode] = [City] then null else [Postcode] , Replacer.ReplaceValue, {"Postcode"}),
#"Changed types" = Table.TransformColumnTypes(#"Replaced nulls", {{"Line 1", type text}, {"Postcode", type Int16.Type}, {"City", type text}})Let's go over the solution, line by line.
- The first line assumes that there is a
Previous stepand anaddresscolumn. It splits theaddresscolumn into two columns,Line 1andLine 2by looking for a comma character. - The
Line 2column is split into two other columnsPostcodeandCitybased on character transition from digits ({"0".."9"}) to non-digits. If the postcode was not optional, we could stop here, but we need two more steps to address this issue. - This step sets the
Postcodeto theCitywheneverPostcodeis null. After this step, whenever the postcode is not present in the address, bothPostcodeandCitycolumn will contain the city name. - This step sets the
Postcodeto null whenever thePostcodeandCitycolumns have the same value as the result of previous step. - The last step is just setting the type of columns, because after applying
Table.SplitColumntransition, the new columns might not have proper types.
More about splitting by character transition
The most interesting step of the solution is the one that splits columns based on character transition. Let's break it down. This step is a mix of two different table functions. If we look at the syntax of Table.SplitColumn function, it looks like this:
Table.SplitColumn(table as table, sourceColumn as text, splitter as function, optional columnNamesOrNumber as any, optional default as any, optional extraColumns as any) as table
The first parameter is the table, the second one is the name of the column that will split, the third one is the function that splits the value. This function will be given the value of the column and should in turn, return a list of values. It could be as simple as (value) => {"value 1", "value"}, but we use a native function that can split values for us, and that is Splitter.SplitTextByCharacterTransition. This function has the following syntax:
Splitter.SplitTextByCharacterTransition(before as anynonnull, after as anynonnull) as function
This function splits text into a list of texts according to a transition from one kind of character to another. The before and after parameters can either be a list of characters, or a function that takes a character and returns true/false. As before we are passing {"0".."9"}, but for after we are passing a function to detect any characters other than {"0".."9"} this makes the detection more inclusive. In other words (c) => not List.Contains({"0".."9"}, c).