Welcome PowerShell User! This recipe is just one of the hundreds of useful resources contained in the PowerShell Cookbook.

5.15 Convert Text Streams to Objects

Problem

You have raw, unstructured text, and want to parse it into PowerShell objects.

Solution

Use the -Delimiter parameter of the ConvertFrom-String cmdlet to parse data in simple column formats. PowerShell automatically generates property names if you don’t specify them, and automatically converts the strings into more appropriate data types if possible:

PS > $delimiter = "[ ]+(?=\d|Services|Console)"
PS > $output = tasklist.exe | Select-Object -Skip 3 | ConvertFrom-String -Delimiter $delimiter
PS > $output | Where-Object P2 -lt 1000 | Format-Table

P1                   P2 P3       P4 P5
--                   -- --       -- --
System Idle Process   0 Services  0 8 K
System                4 Services  0 2,072 K
Secure System        72 Services  0 39,256 K
Registry            132 Services  0 99,088 K
smss.exe            524 Services  0 1,076 K
(...)

You can also use the -Delimiter parameter to parse entire strings. Any text matched by your capture groups will be present as the second property and beyond, which you can name as you like:

PS > $expression = 'FirstName=(.*);LastName=(.*)'
PS > $parsed = "FirstName=Lee;LastName=Holmes" |
    ConvertFrom-String -Delimiter $expression -PropertyNames Ignored,FName,LName
PS > $parsed.FName
Lee
PS > $parsed.LName
Holmes
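
To see what the capture groups will hold before handing the expression to ConvertFrom-String, the same pattern can be tried with the -match operator. This is only a quick check of the regular expression, not a description of how the cmdlet works internally. The variable names $text, $first, and $last are chosen for illustration.

```powershell
$expression = 'FirstName=(.*);LastName=(.*)'
$text = 'FirstName=Lee;LastName=Holmes'

if ($text -match $expression) {
    # $Matches[0] holds the whole match; the numbered entries hold the capture groups.
    $first = $Matches[1]
    $last  = $Matches[2]
    "$first $last"
}
```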

Use the -TemplateContent parameter to parse data automatically based on the tagging that you’ve added to example text in the template:

$template = @"
{FName*:Lee} {LName:Holmes}
{FName*:John} {LName:Smith}
"@

"Lee Holmes","Adam Smith","Some Body","Another Person" |
    ConvertFrom-String -TemplateContent $template
FName   LName
-----   -----
Lee     Holmes
Adam    Smith
Some    Body
Another Person

Discussion

One of the strongest features of PowerShell is its object-based pipeline. You don’t waste your energy creating, destroying, and recreating the object representation of your data. In other shells, you lose the full-fidelity representation of data when the pipeline converts it to pure text. You can regain some of it through excessive text parsing, but not all of it.

However, you still often have to interact with low-fidelity input that originates from outside PowerShell. Text-based data files and legacy programs are two examples.

PowerShell offers great support for all three of the text-parsing staples you might know from other shells:

Sed

Replaces text. For that functionality, PowerShell offers the -replace operator and the Convert-String cmdlet.

Grep

Searches text. For that functionality, PowerShell offers the Select-String cmdlet, among others.

The third traditional text-parsing tool, Awk, lets you chop a line of text into more intuitive groupings. For this, PowerShell offers the incredibly powerful ConvertFrom-String cmdlet.
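
As a rough side-by-side sketch of these three roles, the snippet below uses the -replace operator and Select-String as named above. For the Awk-style step it uses the -split operator as a simpler stand-in, chosen here only because it runs on every PowerShell edition; it is not a replacement for ConvertFrom-String. The sample strings and variable names are made up for illustration.

```powershell
# Sed-style: replace text with the -replace operator (regular expression on the left).
$replaced = 'FName: Lee' -replace '^FName', 'FirstName'

# Grep-style: keep only the lines that match a pattern.
$lines = 'FName: Lee', 'LName: Holmes', 'FName: Adam'
$hits = $lines | Select-String -Pattern '^FName'

# Awk-style (stand-in): chop a line into fields on runs of whitespace.
$fields = 'Lee Holmes Author' -split '\s+'

$replaced
$hits | ForEach-Object { $_.Line }
$fields[1]
```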

In its simplest form, you can use the ConvertFrom-String cmdlet to parse column-oriented output based on a delimiter that you provide. The delimiter defaults to runs of whitespace, but you can also provide strings of your choosing or much more detailed regular expressions. PowerShell will also convert the text into more appropriate data types (such as integers and dates), if possible.
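
A minimal sketch of delimiter-based parsing with a custom delimiter and named properties. The input line and the property names Name, ProcessId, and Session are invented for this example. ConvertFrom-String ships with Windows PowerShell 5.x and is not included in PowerShell 6 and later, so the sketch checks for the cmdlet first; the else branch is an assumed manual fallback, not part of the recipe.

```powershell
$line = 'smss.exe,524,Services'

if (Get-Command ConvertFrom-String -ErrorAction SilentlyContinue) {
    # The delimiter is a regular expression; a lone comma matches itself.
    $row = $line | ConvertFrom-String -Delimiter ',' -PropertyNames Name, ProcessId, Session
}
else {
    # Assumed fallback when the cmdlet is unavailable: split and cast by hand.
    $name, $id, $session = $line -split ','
    $row = [pscustomobject]@{ Name = $name; ProcessId = [int]$id; Session = $session }
}

# A numeric property can be compared as a number rather than as text.
$row.ProcessId -lt 1000
```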

For more complicated needs, the ConvertFrom-String cmdlet supports example-driven parsing. As with the Convert-String cmdlet, this is about as close to magic as you’ll ever experience in a shell. Rather than forcing you to write complicated parsers by hand, the ConvertFrom-String cmdlet automatically learns how to extract data based on how you’ve tagged data in your example template.

Let’s consider trying to parse an address book:

Record
------

FName: Lee
LName: Holmes

Record
------

FName: Adam
LName: Smith

Record
------

FName: Some
LName: Body

Last updated: 05/09/2021

To have ConvertFrom-String parse it, we need to give it a template. A good way to think about templates is to imagine taking some sample output, highlighting regions of the sample output with a mouse, and then naming those regions.

In a template, the left curly brace { represents the start of your selection, and the right curly brace } represents the end of your selection. To name your selection, you provide a property name and a colon right after the opening brace. So, PowerShell Rocks becomes {FName:PowerShell} {LName:Rocks}.

Let’s start creating a template. In a new file, start with this as an example, and save it as addressbook.template.txt (the name is up to you):

{Record:Record
------

FName: Some
LName: Body}

Last updated: {LastUpdated:05/09/2021}
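
To follow along without creating the files by hand, something like the following sketch could write both files to the current directory. It assumes the sample address book lives in addressbook.txt, the name used by the commands in this recipe, and uses single-quoted here-strings so that nothing inside them is expanded.

```powershell
# Sample input: the address book shown earlier in this recipe.
@'
Record
------

FName: Lee
LName: Holmes

Record
------

FName: Adam
LName: Smith

Record
------

FName: Some
LName: Body

Last updated: 05/09/2021
'@ | Set-Content -Path addressbook.txt

# First draft of the template: one tagged record and the tagged footer.
@'
{Record:Record
------

FName: Some
LName: Body}

Last updated: {LastUpdated:05/09/2021}
'@ | Set-Content -Path addressbook.template.txt
```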

When we run ConvertFrom-String on this input and template, we get:

PS > $book = Get-Content addressbook.txt |
    ConvertFrom-String -TemplateFile addressbook.template.txt
PS > $book.LastUpdated
05/09/2021

PS > $book.Record

Record
------

FName: Lee
LName: Holmes

There were several records, though. To tell ConvertFrom-String that the input contains multiple instances of a pattern, use an asterisk after the property name:

{Record*:Record
------

FName: Some
LName: Body}

Last updated: {LastUpdated:05/09/2021}

If we run this, we see that ConvertFrom-String hasn’t quite figured out the record format. So let’s give it another example:

{Record*:Record
------

FName: Some
LName: Body}

{Record*:Record
------

FName: Adam
LName: Smith}

Last updated: {LastUpdated:05/09/2021}

And now, ConvertFrom-String understands records and a footer:

PS > (Get-Content addressbook.txt |
    ConvertFrom-String -TemplateFile addressbook.template.txt)

Record
------
Record...
Record...
Record...

PS > (Get-Content addressbook.txt |
    ConvertFrom-String -TemplateFile addressbook.template.txt).LastUpdated

05/09/2021

To tell ConvertFrom-String about the inner structure of a record, we simply tag it and name it as well. Update the first record in your template:

(...)
FName: {FName:Some}
LName: {LName:Body}}
(...)

And now ConvertFrom-String fully understands our database format.

PS > (Get-Content addressbook.txt |
    ConvertFrom-String -TemplateFile addressbook.template.txt)

Record
------
{@{FName=Lee; LName=Holmes}}
{@{FName=Adam; LName=Smith}}
{@{FName=Some; LName=Body}}

PS > (Get-Content addressbook.txt |
    ConvertFrom-String -TemplateFile addressbook.template.txt).Record[0].FName
Lee

As our final magic trick, let’s tell PowerShell that LastUpdated is a [DateTime]. Update your template to include:

(...)
Last updated: {[DateTime] LastUpdated:05/09/2021}
(...)

This gives an amazing result:

PS > (Get-Content addressbook.txt |
    ConvertFrom-String -TemplateFile addressbook.template.txt).LastUpdated

Sunday, May 9, 2021 12:00:00 AM
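
ConvertFrom-String ships with Windows PowerShell 5.x and is not included in PowerShell 6 and later. Where it is unavailable, the same address book can be turned into objects by hand. The sketch below swaps in a different technique, plain regular expressions with named groups, and is not what the cmdlet does internally. The patterns and variable names are assumptions made for this example, and the date format MM/dd/yyyy is inferred from the May 9 result shown above.

```powershell
# Sample input mirroring the address book shown above.
$text = @'
Record
------

FName: Lee
LName: Holmes

Record
------

FName: Adam
LName: Smith

Record
------

FName: Some
LName: Body

Last updated: 05/09/2021
'@

# One regex match per record; the named groups play the role of the template tags.
$recordPattern = 'FName:\s*(?<FName>\S+)\s+LName:\s*(?<LName>\S+)'
$records = foreach ($m in [regex]::Matches($text, $recordPattern)) {
    [pscustomobject]@{
        FName = $m.Groups['FName'].Value
        LName = $m.Groups['LName'].Value
    }
}

# Parse the footer with an explicit format so the result does not depend on the current culture.
$lastUpdated = $null
if ($text -match 'Last updated:\s*(?<Date>\d{2}/\d{2}/\d{4})') {
    $lastUpdated = [datetime]::ParseExact(
        $Matches['Date'], 'MM/dd/yyyy',
        [System.Globalization.CultureInfo]::InvariantCulture)
}

$book = [pscustomobject]@{
    Record      = @($records)
    LastUpdated = $lastUpdated
}

$book.Record[0].FName
$book.LastUpdated
```

Unlike the template approach, this version has to be edited by hand whenever the record layout changes.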

See Also

Recipe 1.2, “Run Programs, Scripts, and Existing Tools”

Recipe 5.14, “Convert a String Between One Format and Another”