Commit c86ce62

docs: document the new encoding options on ProcessExecuter.run_with_capture
1 parent 75c3d92 commit c86ce62

2 files changed

+153
-18
lines changed

README.md

Lines changed: 83 additions & 6 deletions
@@ -26,13 +26,16 @@ then click the "Documentation" link.
2626
- Compatible with MRI 3.1+, TruffleRuby 24+, and JRuby 9.4+
2727
- Works on Mac, Linux, and Windows platforms
2828

29-
## Table of Contents
29+
## Table of contents
3030

3131
- [Requirements](#requirements)
32-
- [Table of Contents](#table-of-contents)
32+
- [Table of contents](#table-of-contents)
3333
- [Usage](#usage)
34-
- [Key Methods](#key-methods)
34+
- [Key methods](#key-methods)
3535
- [ProcessExecuter::MonitoredPipe](#processexecutermonitoredpipe)
36+
- [Encoding](#encoding)
37+
- [Encoding summary](#encoding-summary)
38+
- [Encoding details](#encoding-details)
3639
- [Breaking Changes](#breaking-changes)
3740
- [2.x](#2x)
3841
- [`ProcessExecuter.spawn`](#processexecuterspawn)
@@ -61,7 +64,7 @@ then click the "Documentation" link.
6164
[Full YARD documentation](https://rubydoc.info/gems/process_executer/) for this gem
6265
is hosted on RubyGems.org. Read below for an overview and several examples.
6366

64-
### Key Methods
67+
### Key methods
6568

6669
ℹ️ See [the ProcessExecuter module
6770
documentation](https://rubydoc.info/gems/process_executer/ProcessExecuter) for
@@ -104,7 +107,7 @@ This class's initializer accepts any compatible redirection destination supporte
104107
In addition to the standard redirection destinations, `MonitoredPipe` also
105108
supports these additional types of destinations:
106109

107-
- **Arbitrary Writers**
110+
- **Arbitrary writers**
108111

109112
You can redirect subprocess output to any Ruby object that implements the
110113
`#write` method. This is particularly useful for:
@@ -114,14 +117,88 @@ supports these additional types of destinations:
114117
- processing with a streaming parser to parse and process command output as the
115118
command runs
116119

117-
- **Multiple Destinations**
120+
- **Multiple destinations**
118121

119122
MonitoredPipe supports duplicating (or "teeing") output to multiple
120123
destinations simultaneously. This is achieved by providing an array in the
121124
format `[:tee, destination1, destination2, ...]`, where each `destination` can
122125
be any value that `MonitoredPipe` itself supports (including another tee or
123126
MonitoredPipe).
124127

128+
### Encoding
129+
130+
#### Encoding summary
131+
132+
The gem's core (`MonitoredPipe`) passes through raw bytes from the subprocess without
133+
attempting to interpret or transcode them. `ProcessExecuter.run_with_capture` allows
134+
text encodings to be specified for the captured stdout and stderr (defaulting to
135+
`UTF-8`). For these outputs, the raw bytes are interpreted as being in that specified
136+
encoding. The original byte sequence is preserved and the resulting captured string
137+
is tagged with the target encoding. No transcoding between different text encodings
138+
(e.g., `Latin-1` to `UTF-8`) is performed.
139+
140+
#### Encoding details
141+
142+
`ProcessExecuter::MonitoredPipe` is encoding agnostic. Bytes pass through this class
143+
from the subprocess's output to the destination object as a stream of unaltered
144+
bytes. No transcoding is applied. Strings written to the destination are tagged for
145+
the ASCII-8BIT (aka BINARY) encoding.
146+
147+
`ProcessExecuter` methods `.spawn_with_timeout`, `.run`, and `.run_with_capture` are
148+
also encoding agnostic, with one exception: the user can specify the assumed
149+
encoding for strings returned from `ResultWithCapture#stdout` and
150+
`ResultWithCapture#stderr`.
151+
152+
As a convenience, the captured output is assumed to be UTF-8 by default:
153+
154+
```ruby
155+
result = ProcessExecuter.run_with_capture('pwd')
156+
result.stdout #=> "/Users/James/projects/process_executer\n"
157+
result.stdout.encoding #=> #<Encoding:UTF-8>
158+
```
159+
160+
You can change the assumed encoding for the captured stdout and stderr via options
161+
passed to `.run_with_capture`:
162+
163+
```ruby
164+
# Set the assumed encoding for both stdout and stderr
165+
result = ProcessExecuter.run_with_capture('pwd', encoding: Encoding::BINARY)
166+
result.stdout #=> "/Users/James/projects/process_executer\n"
167+
result.stdout.encoding #=> #<Encoding:BINARY (ASCII-8BIT)>
168+
169+
# You can set the assumed encoding separately for stdout and stderr
170+
# Encoding may be different for each
171+
result = ProcessExecuter.run_with_capture('pwd', stdout_encoding: 'BINARY', stderr_encoding: 'UTF-8')
172+
result.stdout.encoding #=> #<Encoding:BINARY (ASCII-8BIT)>
173+
result.stderr.encoding #=> #<Encoding:UTF-8>
174+
```
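
The precedence between `encoding`, `stdout_encoding`, and `stderr_encoding` can be sketched in plain Ruby. This is only an illustration of the documented rule (a non-nil stream-specific option wins over `encoding`, which defaults to UTF-8); the helper name `resolve_capture_encodings` is hypothetical and is not part of the gem:

```ruby
# Hypothetical helper (NOT part of process_executer): shows how the three
# options are documented to combine. A non-nil stream-specific option
# overrides the shared `encoding` option.
def resolve_capture_encodings(encoding: Encoding::UTF_8, stdout_encoding: nil, stderr_encoding: nil)
  # Accept either an Encoding object or its String name
  to_encoding = ->(value) { value.is_a?(Encoding) ? value : Encoding.find(value) }
  {
    stdout: to_encoding.call(stdout_encoding || encoding),
    stderr: to_encoding.call(stderr_encoding || encoding)
  }
end

resolved = resolve_capture_encodings(encoding: 'ISO-8859-1', stderr_encoding: Encoding::UTF_8)
resolved[:stdout] == Encoding::ISO_8859_1 #=> true (falls back to `encoding`)
resolved[:stderr] == Encoding::UTF_8      #=> true (stream-specific option wins)
```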
175+
176+
It is possible that the bytes captured are not valid in the given encoding. The user
177+
should call `#valid_encoding?` on the captured string to check.
178+
179+
```ruby
180+
File.binwrite('output.txt', "\xFF\xFE") # the UTF-16LE byte order mark is not valid UTF-8
181+
result = ProcessExecuter.run_with_capture('cat output.txt')
182+
result.stdout #=> "\xFF\xFE"
183+
result.stdout.encoding #=> #<Encoding:UTF-8>
184+
result.stdout.valid_encoding? #=> false
185+
```
186+
187+
Encoding options accept any encoding objects returned by `Encoding.list` or their
188+
String equivalent given by `#to_s`:
189+
190+
```ruby
191+
Encoding::UTF_8.to_s #=> 'UTF-8'
192+
```
193+
194+
Changing the assumed encoding DOES NOT cause transcoding. It simply interprets the
195+
bytes captured as the given encoding.
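
The difference between re-tagging and transcoding can be seen with core Ruby `String` methods alone. The sketch below does not call the gem; it assumes that "assumed encoding" behaves like `String#force_encoding`, which matches the description above but is not confirmed against the implementation:

```ruby
# Plain-Ruby illustration; no subprocess is involved.
raw = "caf\xE9".b # the bytes of "café" in ISO-8859-1, tagged BINARY

# Re-tagging: the bytes stay the same, only the encoding label changes
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
as_utf8.bytes == raw.bytes #=> true
as_utf8.valid_encoding?    #=> false (a lone \xE9 is not valid UTF-8)

as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
as_latin1.valid_encoding?  #=> true

# Transcoding: the bytes themselves are rewritten for the new encoding
transcoded = as_latin1.encode(Encoding::UTF_8)
transcoded.bytes           #=> [99, 97, 102, 195, 169]
```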
196+
197+
These encoding options ONLY affect the internally captured stdout and stderr for
198+
`ProcessExecuter.run_with_capture`. If you give an `out:` or `err:` option, these
199+
will result in BINARY encoded strings and you will need to handle setting the right
200+
encoding or transcoding after collecting the output.
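
One way to handle BINARY strings collected from your own `out:` or `err:` destination is to tag them and fall back to scrubbing when the bytes are invalid. This is a minimal sketch using only core Ruby; the helper name `decode_output` is hypothetical, and the binary string stands in for whatever your destination actually collected:

```ruby
# Hypothetical helper (NOT part of process_executer): tags collected bytes
# with the encoding you expect and replaces any invalid byte sequences.
def decode_output(bytes, encoding = Encoding::UTF_8)
  text = bytes.dup.force_encoding(encoding) # re-tag a copy; no transcoding
  return text if text.valid_encoding?

  text.scrub # replace invalid sequences with the default replacement character
end

collected = "na\xC3\xAFve\n".b # stand-in for bytes collected by a destination
decoded = decode_output(collected)
decoded.encoding        #=> #<Encoding:UTF-8>
decoded.valid_encoding? #=> true

damaged = decode_output("ok\xFF".b)
damaged.valid_encoding? #=> true (the invalid byte was replaced)
```

Whether to scrub, raise, or transcode on invalid input depends on the consumer; scrubbing is used here only because it keeps the example short.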
201+
125202
## Breaking Changes
126203

127204
### 2.x

lib/process_executer.rb

Lines changed: 70 additions & 12 deletions
@@ -258,32 +258,69 @@ def self.run(*command, **options_hash)
258258
#
259259
# Accepts all [Process.spawn execution
260260
# options](https://docs.ruby-lang.org/en/3.4/Process.html#module-Process-label-Execution+Options),
261-
# the additional options defined by {spawn_with_timeout} and {run}, and the additional
262-
# option `merge_output`:
263-
#
264-
# * `merge_output: true` merges stdout and stderr into a single capture buffer
265-
# (default is false)
261+
# the additional options defined by {spawn_with_timeout} and {run}, and the
262+
# additional options `merge_output`, `encoding`, `stdout_encoding`, and
263+
# `stderr_encoding`:
264+
#
265+
# * `merge_output: <Boolean>` if true merges stdout and stderr into a single
266+
# capture buffer (default is false)
267+
# * `encoding: <Encoding>` sets the encoding for both stdout and stderr captures
268+
# (default is `Encoding::UTF_8`)
269+
# * `stdout_encoding: <Encoding>` sets the encoding for the stdout capture and, if
270+
# not nil, overrides the `encoding` option for stdout (default is nil)
271+
# * `stderr_encoding: <Encoding>` sets the encoding for the stderr capture and, if
272+
# not nil, overrides the `encoding` option for stderr (default is nil)
266273
#
267274
# The captured output is accessed in the returned object's `#stdout` and `#stderr`
268275
# methods. Merged output (if the `merge_output: true` option is given) is accessed
269276
# in the `#stdout` method.
270277
#
271-
# stdout and stderr redirection options may be given by the user. User-supplied
272-
# redirections will receive the output in addition to the internal capture.
278+
# stdout and stderr redirection destinations may be given by the user (e.g. `out:
279+
# <destination>` or `err: <destination>`). These redirections will receive the
280+
# output in addition to the internal capture.
281+
#
282+
# Unless told otherwise, the internally captured output is assumed to be in UTF-8
283+
# encoding. This assumption can be changed with the `encoding`,
284+
# `stdout_encoding`, or `stderr_encoding` options. These options accept any
285+
# encoding objects returned by `Encoding.list` or their String equivalent given by
286+
# `#to_s`.
287+
#
288+
# The bytes captured are not transcoded. They are interpreted as being in the
289+
# specified encoding. The user will have to check the validity of the
290+
# encoding by calling `#valid_encoding?` on the captured output (e.g.,
291+
# `result.stdout.valid_encoding?`).
273292
#
274293
# A `ProcessExecuter::ArgumentError` will be raised if both an options object and
275294
# an options_hash are given.
276295
#
277296
# @example capture stdout and stderr
278-
# result = ProcessExecuter.run_with_capture('echo HELLO; echo ERROR >&2')
279-
# result.stdout #=> "HELLO\n"
280-
# result.stderr #=> "ERROR\n"
297+
# result =
298+
# ProcessExecuter.run_with_capture('echo HELLO; echo ERROR >&2')
299+
# result.stdout #=> "HELLO\n"
# result.stderr #=> "ERROR\n"
281300
#
282301
# @example merge stdout and stderr
283302
# result = ProcessExecuter.run_with_capture('echo HELLO; echo ERROR >&2', merge_output: true)
284303
# # order of output is not guaranteed
285-
# result.stdout #=> "HELLO\nERROR\n"
286-
# result.stderr #=> ""
304+
# result.stdout #=> "HELLO\nERROR\n"
# result.stderr #=> ""
305+
#
306+
# @example default encoding
307+
# result = ProcessExecuter.run_with_capture('echo HELLO')
308+
# result.stdout #=> "HELLO\n"
309+
# result.stdout.encoding #=> #<Encoding:UTF-8>
310+
# result.stdout.valid_encoding? #=> true
311+
#
312+
# @example custom encoding
313+
# result = ProcessExecuter.run_with_capture('echo HELLO', encoding: Encoding::ISO_8859_1)
314+
# result.stdout #=> "HELLO\n"
315+
# result.stdout.encoding #=> #<Encoding:ISO-8859-1>
316+
# result.stdout.valid_encoding? #=> true
317+
#
318+
# @example default encoding with invalid bytes
319+
# File.binwrite('output.txt', "\xFF\xFE") # the UTF-16LE byte order mark is not valid UTF-8
320+
# result = ProcessExecuter.run_with_capture('cat output.txt')
321+
# result.stdout #=> "\xFF\xFE"
322+
# result.stdout.encoding #=> #<Encoding:UTF-8>
323+
# result.stdout.valid_encoding? #=> false
287324
#
288325
# @overload run_with_capture(*command, **options_hash)
289326
#
@@ -301,6 +338,24 @@ def self.run(*command, **options_hash)
301338
# @option options_hash [Boolean] :merge_output if true, stdout and stderr will be
302339
# merged into a single capture buffer
303340
#
341+
# @option options_hash [Encoding, String] :encoding the encoding to assume for
342+
# the internal stdout and stderr captures
343+
#
344+
# The default is `Encoding::UTF_8`. This option is overridden by the `stdout_encoding`
345+
# and `stderr_encoding` options if they are given and not nil.
346+
#
347+
# @option options_hash [Encoding, String, nil] :stdout_encoding the encoding to
348+
# assume for the internal stdout capture
349+
#
350+
# The default is nil, which means the `encoding` option is used. If this option is
351+
# not nil, it is used instead of the `encoding` option.
352+
#
353+
# @option options_hash [Encoding, String, nil] :stderr_encoding the encoding to
354+
# assume for the internal stderr capture
355+
#
356+
# The default is nil, which means the `encoding` option is used. If this option
357+
# is not nil, it is used instead of the `encoding` option.
358+
#
304359
# @overload run_with_capture(*command, options)
305360
#
306361
# @param command [Array<String>] see [Process module, Argument `command_line` or
@@ -337,6 +392,9 @@ def self.run(*command, **options_hash)
337392
#
338393
# @return [ProcessExecuter::ResultWithCapture]
339394
#
395+
# Where `#stdout` and `#stderr` are strings whose encoding is determined by the
396+
# `:encoding`, `:stdout_encoding`, or `:stderr_encoding` options.
397+
#
340398
# @api public
341399
#
342400
def self.run_with_capture(*command, **options_hash)
