Improve buffer abstraction's encoding handling in JRuby dumper #760

Open
@headius

Description

In jruby/jruby#8682 we discovered that the use of IOOutputStream in GeneratorState.generate (for wrapping an IO-like object) is affected by jruby/jruby#6588: the byte[]-only OutputStream methods handle encodings poorly.

Specifying no encoding for IOOutputStream defaults to ASCII-8BIT, which breaks if the target IO has a multibyte-character (MBC) external encoding and any written bytes fall outside the 7-bit ASCII range.
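To illustrate the failure mode, here is a minimal plain-Ruby sketch. It does not call IOOutputStream; it only shows how a string tagged ASCII-8BIT behaves when something downstream tries to transcode it to UTF-8. The variable names are made up for the example.

```ruby
# Plain-Ruby illustration only; IOOutputStream is not involved here.
utf8   = "caf\u00E9"   # "café" as valid UTF-8 (the last character is two bytes)
binary = utf8.b        # same bytes, but tagged ASCII-8BIT (BINARY)

# The bytes are identical; only the encoding tag differs.
same_bytes = binary.bytes == utf8.bytes

# Transcoding a binary-tagged string with high bytes to UTF-8 is undefined,
# because ASCII-8BIT assigns no character meaning to bytes >= 0x80.
error = begin
  binary.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Pure 7-bit content converts without trouble, which is why the problem
# only shows up once non-ASCII characters appear in the output.
ascii_ok = "cafe".b.encode(Encoding::UTF_8) == "cafe"
```

After this runs, `same_bytes` and `ascii_ok` are both true, and `error` holds the `Encoding::UndefinedConversionError` raised by the conversion.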

Specifying UTF-8 as the encoding should work, but is affected by jruby/jruby#8686: the write path fails to no-op when the provided encoding and the target IO's external encoding match, and subsequently errors in the character-transcoding subsystem.
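The expected no-op can be sketched in Ruby as follows. `bytes_for` is a hypothetical helper written for this issue, not JRuby's implementation; it only shows the order of checks one would expect before any converter is invoked.

```ruby
# Hypothetical helper (not a JRuby API): decide whether a chunk needs
# transcoding before it is written to a target with a known encoding.
def bytes_for(str, target)
  # No target encoding, or source and target already match: nothing to do.
  return str if target.nil? || str.encoding == target

  # 7-bit content in an ASCII-compatible target has identical bytes,
  # so retagging a copy is enough and no converter is needed.
  if str.ascii_only? && target.ascii_compatible?
    return str.dup.force_encoding(target)
  end

  # Only genuinely different encodings reach the transcoder.
  str.encode(target)
end
```

With this ordering, passing a UTF-8 string with a UTF-8 target returns the same object untouched, so the transcoding subsystem is never entered for the matching case.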

To work around these issues, I have pushed #759 to force slow-path logic in IOOutputStream (dynamic "write" calls with String objects) whenever the target object is an IO with an external encoding. However, we should restore the fast write logic by doing the following:

  • Fix the fast write logic downstream from IOOutputStream.write so that it accurately handles all incoming encodings (see jruby/jruby#8687, "Handle encoding checks as in strTranscode").
  • Detect fixed versions of JRuby and switch to fast-write logic.
  • Implement a more robust IO-like wrapper that can handle mixed-encoding input, either in JRuby or in json.
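
As a starting point for the last item, here is a rough Ruby sketch of what a mixed-encoding wrapper might look like. `EncodingAwareWriter` and its `assume:` option are hypothetical names invented for this sketch; a real version would likely live in Java next to IOOutputStream, and it would need to decide how to treat invalid byte sequences, which this sketch does not validate.

```ruby
# Hypothetical wrapper (not part of json or JRuby): normalizes every chunk
# to the target's external encoding before delegating to its #write.
class EncodingAwareWriter
  # io:     any object responding to #write (and optionally #external_encoding)
  # assume: encoding to assume for chunks tagged ASCII-8BIT (an assumption
  #         of this sketch; raw bytes carry no encoding of their own)
  def initialize(io, assume: Encoding::UTF_8)
    @io = io
    @target = io.respond_to?(:external_encoding) ? io.external_encoding : nil
    @assume = assume
  end

  def write(chunk)
    @io.write(normalize(chunk))
  end

  private

  def normalize(chunk)
    # No target encoding, or already matching: pass through unchanged.
    return chunk if @target.nil? || chunk.encoding == @target

    if chunk.encoding == Encoding::BINARY
      # Retag a copy of the raw bytes with the assumed encoding.
      chunk = chunk.dup.force_encoding(@assume)
      return chunk if @assume == @target
    end

    # Everything else is transcoded to the target encoding.
    chunk.encode(@target)
  end
end
```

Wrapping a UTF-8 target and writing the same text as UTF-8, as binary-tagged bytes, and as ISO-8859-1 should hand the target three UTF-8 strings with identical content.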
