This post discusses the output encoding used when data is passed from one PowerShell cmdlet to another or to other applications. It is a rarely understood feature unless you are writing a module that integrates PowerShell with other software.
Passing output between PowerShell cmdlets
Strings inside PowerShell are instances of .NET’s System.String class, which stores text as 16-bit Unicode (UTF-16) code units. When you pipe output from one cmdlet to another, it is passed as .NET objects, so strings stay UTF-16 and no conversion takes place. Out-File is itself a PowerShell cmdlet, and in Windows PowerShell it writes that text to the file as UTF-16 by default. The same goes for the redirection operators > and >>.
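Both points can be checked from the prompt. The following is a minimal sketch, not a definitive test: the file name utf16-demo.txt is made up for illustration, and -Encoding Unicode is passed explicitly so the result does not depend on the default of the PowerShell version in use.

```powershell
# Build the sample text from code points so the script's own file encoding does not matter.
$sample = [string][char]0x4E2D + [char]0x6587                          # the two characters 中文

# Inside PowerShell the text is a .NET string made of 16-bit code units.
$typeName   = $sample.GetType().FullName                               # System.String
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($sample)        # 4 bytes for 2 characters

# Write the string with UTF-16 LE and look at the first two bytes of the file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf16-demo.txt'   # hypothetical file name
$sample | Out-File -FilePath $path -Encoding Unicode
$fileBytes = [System.IO.File]::ReadAllBytes($path)
$bom = '{0:X2} {1:X2}' -f $fileBytes[0], $fileBytes[1]                 # FF FE, the UTF-16 LE BOM

Remove-Item -Path $path
```

The byte pair FF FE at the start of the file is the byte-order mark that identifies the file as UTF-16 LE.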
In Windows PowerShell, the redirection operators > and >> create UTF-16 LE files with a BOM (byte-order mark) by default. Before version 5.1 there is no way to change their encoding; from 5.1 onward they call Out-File internally and pick up any default set for its -Encoding parameter.
In PowerShell v3 or higher, you can use $PSDefaultParameterValues to change the default encoding of all cmdlets and advanced functions that accept an -Encoding parameter:
$PSDefaultParameterValues['*:Encoding'] = 'utf8'  # adds one entry; assigning a new @{ } table would discard existing defaults
If you place this in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default. Note that in Windows PowerShell the 'utf8' value produces UTF-8 with a BOM.
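If a wildcard default feels too broad, the same mechanism can be limited to a single cmdlet. The sketch below assumes a throwaway file name (utf8-demo.txt) and sets the default for Out-File only. Whether a BOM is written depends on the PowerShell version, so the code records that instead of assuming it.

```powershell
# Set a default for Out-File only, keeping any other defaults already in the table.
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

$sample = [string][char]0x4E2D + [char]0x6587                         # the two characters 中文
$path   = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-demo.txt' # hypothetical file name

# No -Encoding argument here: the default set above is applied.
$sample | Out-File -FilePath $path

# Check for the UTF-8 BOM (EF BB BF) at the start of the file.
$bytes  = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

# Decode as UTF-8 to confirm the round trip; ReadAllText skips a leading BOM.
$roundTrip = [System.IO.File]::ReadAllText($path, [System.Text.Encoding]::UTF8).Trim()

# Clean up the demo file and the default.
Remove-Item -Path $path
$PSDefaultParameterValues.Remove('Out-File:Encoding')
```

After this runs, $roundTrip holds the original two characters, and $hasBom tells you which flavour of UTF-8 your PowerShell version wrote.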
This file encoding is unrelated to the $OutputEncoding variable, which is discussed next.
Passing output from PowerShell to Native Application
When we pipe output data from PowerShell cmdlets into native applications, the encoding of that data is controlled by the $OutputEncoding variable, which in Windows PowerShell is set to ASCII by default. $OutputEncoding is a preference variable that PowerShell creates for you, and you can inspect its value by typing its name at the PowerShell prompt:
PS D:\PowerShell-master> $OutputEncoding

IsSingleByte      : True
BodyName          : us-ascii
EncodingName      : US-ASCII
HeaderName        : us-ascii
WebName           : us-ascii
WindowsCodePage   : 1252
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 20127
The default is ASCII because many native applications do not handle Unicode correctly. The drawback is that any character outside the ASCII range is replaced with a question mark on the way out, so the program on the right side of the pipeline may not receive the data you sent. This becomes especially important if your software supports multiple languages. For example, let’s create a text file with some Chinese characters in it.
PS C:\> ${c:\test.txt}="中文"
Try to use findstr to find one of the Chinese characters, and it will not find anything:
PS C:\> Get-Content c:\test.txt | findstr /c:中
But the same search works in cmd.exe:
PS C:\> cmd /c "findstr /c:中 test.txt"
中文
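The failure in the PowerShell pipeline can be reproduced without findstr by encoding the same text by hand. This is only an illustration of what an ASCII encoder does to these characters, using the .NET encoding classes directly; it does not call any native program.

```powershell
$sample = [string][char]0x4E2D + [char]0x6587                        # the two characters 中文

# Encode the way an ASCII $OutputEncoding would before handing text to a native program.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($sample)        # 63, 63
$asciiText  = [System.Text.Encoding]::ASCII.GetString($asciiBytes)   # ??

# A Unicode-capable encoding keeps the characters intact.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($sample)          # 6 bytes
$utf8Text  = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)      # same as $sample
```

Each character that ASCII cannot represent becomes byte 63, a question mark, so the native program is searching text that no longer contains the character you asked for.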
You can change the value of this variable to select a different encoding:
PS D:\PowerShell-master> $OutputEncoding = [System.Text.Encoding]::Unicode
PS D:\PowerShell-master> $OutputEncoding

BodyName          : utf-16
EncodingName      : Unicode
HeaderName        : utf-16
WebName           : utf-16
WindowsCodePage   : 1200
IsBrowserDisplay  : False
IsBrowserSave     : True
IsMailNewsDisplay : False
IsMailNewsSave    : False
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 1200
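If you only need a different encoding for one call, it may be safer to change the variable temporarily and put the old value back afterwards. The sketch below is one way to do that; UTF-8 is chosen only as an example, and whether a particular native program understands it is a separate question that depends on that program.

```powershell
# Remember the current setting so it can be restored.
$previousEncoding = $OutputEncoding
try {
    # UTF-8 is an assumption here; pick whatever the receiving program expects.
    $OutputEncoding = [System.Text.Encoding]::UTF8
    $activeName = $OutputEncoding.WebName                # utf-8

    # A pipeline into a native program would go here, for example:
    # Get-Content c:\test.txt | findstr /c:...
}
finally {
    # Put the original value back even if the pipeline throws.
    $OutputEncoding = $previousEncoding
}
```

Using try/finally keeps the rest of the session on its original setting, which matters when the change lives in a shared script or module.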