Understanding Default Encoding in PowerShell and How to Change It

This post discusses the encoding PowerShell uses when data is passed from one cmdlet to another, written to a file, or piped to another application. It is a feature few people think about until they write a module that integrates PowerShell with other software.

Passing output between PowerShell cmdlets

Strings inside PowerShell are instances of .NET's System.String class, which stores text as 16-bit Unicode (UTF-16). When you pipe output from one cmdlet to another, it is passed as .NET objects, so strings stay UTF-16 in memory and no conversion takes place. An encoding only comes into play when text is written out. In Windows PowerShell, Out-File writes UTF-16 LE (called "Unicode" by its -Encoding parameter) by default, and the same goes for the redirection operators > and >>.
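To make the 16-bit representation concrete, the following sketch encodes a one-character string with .NET's built-in encodings and prints the raw bytes. The variable names are only illustrative; the encoding classes are the standard ones from System.Text.

```powershell
# Encode the same one-character string as UTF-16 LE and as UTF-8.
$text  = 'A'
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($text)   # "Unicode" is .NET's name for UTF-16 LE
$utf8  = [System.Text.Encoding]::UTF8.GetBytes($text)

# The preamble is the byte-order mark written at the start of a UTF-16 LE file.
$bom   = [System.Text.Encoding]::Unicode.GetPreamble()

# Format each byte array as space-separated hex.
$utf16Hex = ($utf16 | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex  = ($utf8  | ForEach-Object { $_.ToString('X2') }) -join ' '
$bomHex   = ($bom   | ForEach-Object { $_.ToString('X2') }) -join ' '

"UTF-16 LE : $utf16Hex"   # 41 00
"UTF-8     : $utf8Hex"    # 41
"BOM       : $bomHex"     # FF FE
```

A plain ASCII letter takes two bytes in UTF-16 LE and one in UTF-8, which is why the same text produces files of different sizes depending on the encoding chosen.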

The redirection operators > and >> take no encoding argument of their own. In Windows PowerShell they create UTF-16 LE files with a BOM (byte-order mark) by default, and before version 5.1 there is no way to change that. Starting with version 5.1 they call Out-File, so they honor a default encoding set for that cmdlet. PowerShell 6 and later default to UTF-8 without a BOM instead.

In PowerShell v3 or higher, you can use $PSDefaultParameterValues to set the default encoding for all cmdlets and advanced functions that accept an -Encoding parameter:

$PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }

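If you place this in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default. Note that this assignment replaces any defaults already stored in $PSDefaultParameterValues, and that in Windows PowerShell the value utf8 means UTF-8 with a BOM.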
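A narrower variant of the setting above scopes the default to Out-File only and adds it to the existing table instead of replacing it. The sketch below applies it temporarily, writes a file without an explicit -Encoding argument, and dumps the resulting bytes so you can see what was written. The scratch file name encoding-demo.txt is a hypothetical choice for this demonstration.

```powershell
# Hypothetical scratch file used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo.txt'

# Remember any existing default so it can be put back afterwards.
$key        = 'Out-File:Encoding'
$hadDefault = $PSDefaultParameterValues.ContainsKey($key)
$oldDefault = $PSDefaultParameterValues[$key]

try {
    # Add one entry to the table rather than overwriting the whole table.
    $PSDefaultParameterValues[$key] = 'utf8'

    # No -Encoding argument here: the default above is applied.
    'hello' | Out-File -FilePath $path

    # Show the bytes that ended up on disk as space-separated hex.
    $bytes = [System.IO.File]::ReadAllBytes($path)
    ($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
}
finally {
    # Restore the previous state and clean up the scratch file.
    if ($hadDefault) { $PSDefaultParameterValues[$key] = $oldDefault }
    else             { $PSDefaultParameterValues.Remove($key) }
    Remove-Item -LiteralPath $path -ErrorAction SilentlyContinue
}
```

In Windows PowerShell the dump should begin with EF BB BF, the UTF-8 BOM, followed by the bytes of the text. Newer PowerShell versions treat utf8 as BOM-less, so the exact output depends on the version you run.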

This file encoding is unrelated to the $OutputEncoding variable, which is discussed next.

Passing output from PowerShell to Native Applications

When we pipe output from PowerShell cmdlets into native applications, the encoding of the data sent is controlled by the $OutputEncoding preference variable, which is set to ASCII by default in Windows PowerShell. PowerShell creates this variable automatically, and you can inspect its value by typing its name at the prompt:

PS D:\PowerShell-master> $OutputEncoding

IsSingleByte : True
BodyName : us-ascii
EncodingName : US-ASCII
HeaderName : us-ascii
WebName : us-ascii
WindowsCodePage : 1252
IsBrowserDisplay : False
IsBrowserSave : False
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.EncoderReplacementFallback
DecoderFallback : System.Text.DecoderReplacementFallback
IsReadOnly : True
CodePage : 20127

The default is ASCII because many native applications do not handle Unicode correctly. The drawback is that every character outside the ASCII range is replaced on the way out, so a program on the right side of the pipeline may not receive the text you sent. This becomes especially important if your software supports multiple languages. For example, let's create a text file with some Chinese characters in it.

PS C:\> ${c:\test.txt}="中文"

Try to use findstr to find one of the Chinese characters, and it will not find anything:

PS C:\> Get-Content c:\test.txt | findstr /c:中

But the same search works when run from cmd.exe, where findstr reads the file directly:

PS C:\> cmd /c "findstr /c:中 test.txt"

中文
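The failed search in the pipeline can be explained by looking at what an ASCII encoder does with these two characters. The sketch below is not what PowerShell runs internally; it only applies the same .NET encoding class to the same text. The characters are built from their code points so the snippet does not depend on how the script file itself is saved.

```powershell
# Build the two-character test string from code points U+4E2D and U+6587.
$text = "$([char]0x4E2D)$([char]0x6587)"

# Encode with ASCII (the Windows PowerShell default for $OutputEncoding) and with UTF-8.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$utf8Bytes  = [System.Text.Encoding]::UTF8.GetBytes($text)

# Format each byte array as space-separated hex.
$asciiHex = ($asciiBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex  = ($utf8Bytes  | ForEach-Object { $_.ToString('X2') }) -join ' '

"ASCII : $asciiHex"   # 3F 3F
"UTF-8 : $utf8Hex"    # E4 B8 AD E6 96 87

# Decoding the ASCII bytes shows what the receiving program would see.
$received = [System.Text.Encoding]::ASCII.GetString($asciiBytes)
"Received text: $received"   # ??
```

Both characters are replaced by 3F, the question mark, so the receiving program never sees the original text.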

You can change the value of this variable to control the encoding sent to native applications:

PS D:\PowerShell-master> $OutputEncoding = [System.Text.Encoding]::Unicode
PS D:\PowerShell-master> $OutputEncoding

BodyName : utf-16
EncodingName : Unicode
HeaderName : utf-16
WebName : utf-16
WindowsCodePage : 1200
IsBrowserDisplay : False
IsBrowserSave : True
IsMailNewsDisplay : False
IsMailNewsSave : False
IsSingleByte : False
EncoderFallback : System.Text.EncoderReplacementFallback
DecoderFallback : System.Text.DecoderReplacementFallback
IsReadOnly : True
CodePage : 1200
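If you only need a different encoding for one call, you can change the variable temporarily and restore it afterwards. The following is a minimal sketch under two assumptions: that the receiving program expects UTF-8 without a BOM, and that the native command you want to run goes where the comment indicates. Which encoding a given program actually expects is something you have to check for that program.

```powershell
# Remember the current setting so it can be restored.
$previousEncoding = $OutputEncoding

try {
    # UTF-8 without a byte-order mark ($false = do not emit a preamble).
    $OutputEncoding = New-Object System.Text.UTF8Encoding $false

    # Pipe to the native application here, for example:
    #   Get-Content C:\test.txt | some-native-tool     (hypothetical command)
    $activeName = $OutputEncoding.WebName
    "Encoding while piping: $activeName"
}
finally {
    # Put the original encoding back, even if the pipeline failed.
    $OutputEncoding = $previousEncoding
}

"Encoding afterwards: $($OutputEncoding.WebName)"
```

The try/finally block keeps the change from leaking into the rest of the session, which matters when the code runs inside a module that other scripts call.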
