adopt save-and-restore pattern for libxml2 error handlers everywhere #2172

@flavorjones

See #2168 and #2169 for details, but the short version is that we should be more rigorous about saving-and-restoring error handlers and error handler metadata around libxml2 calls, in case any are being made recursively within Nokogiri.

For example, these lines in Nokogiri::HTML::Document::EncodingReader are calling HTML::SAX::PushParser to parse a chunk from the IO read callback of a regular document parse:

handler = SAXHandler.new
parser = Nokogiri::HTML::SAX::PushParser.new(handler)
parser << chunk rescue Nokogiri::SyntaxError

To allow users to do similarly complex things, we should always save-and-restore the error callbacks (which are the only global state I can think of that we regularly manipulate).

We're doing this in the HTML::SAX::PushParser class to cover ourselves in the aforementioned case:

Nokogiri_structured_error_func_save_and_set(&handler_state, NULL, NULL);
status = htmlParseChunk(ctx, chunk, size, Qtrue == _last_chunk ? 1 : 0);
Nokogiri_structured_error_func_restore(&handler_state);

typedef struct _libxmlStructuredErrorHandlerState {
  void *user_data;
  xmlStructuredErrorFunc handler;
} libxmlStructuredErrorHandlerState;

void init_xml_syntax_error();
void Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state);
void Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state,
                                                 void *user_data,
                                                 xmlStructuredErrorFunc handler);
void Nokogiri_structured_error_func_restore(libxmlStructuredErrorHandlerState *handler_state);

void
Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state)
{
  /* this method is tightly coupled to the implementation of xmlSetStructuredErrorFunc */
  handler_state->user_data = xmlStructuredErrorContext;
  handler_state->handler = xmlStructuredError;
}

void
Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state,
                                            void *user_data,
                                            xmlStructuredErrorFunc handler)
{
  Nokogiri_structured_error_func_save(handler_state);
  xmlSetStructuredErrorFunc(user_data, handler);
}

void
Nokogiri_structured_error_func_restore(libxmlStructuredErrorHandlerState *handler_state)
{
  xmlSetStructuredErrorFunc(handler_state->user_data, handler_state->handler);
}
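
To illustrate why the restore step matters when one parse re-enters another, here is a minimal sketch that does not link against libxml2. Every `demo_*` name is a hypothetical stand-in invented for this example: the two static globals play the role of `xmlStructuredError` and `xmlStructuredErrorContext`, and `demo_set_error_func` plays the role of `xmlSetStructuredErrorFunc`. It is not the Nokogiri code, only the same pattern under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libxml2's global error handler and its context. */
typedef void (*demo_error_func)(void *user_data, const char *message);

static void *demo_error_context = NULL;
static demo_error_func demo_error_handler = NULL;

/* Stand-in for xmlSetStructuredErrorFunc: overwrites the global state. */
static void demo_set_error_func(void *user_data, demo_error_func handler)
{
  demo_error_context = user_data;
  demo_error_handler = handler;
}

/* Deliver an error to whatever handler is currently installed, if any. */
static void demo_report_error(const char *message)
{
  if (demo_error_handler != NULL) {
    demo_error_handler(demo_error_context, message);
  }
}

typedef struct {
  void *user_data;
  demo_error_func handler;
} demo_handler_state;

/* Remember the current handler and context, then install new ones. */
static void demo_save_and_set(demo_handler_state *state, void *user_data,
                              demo_error_func handler)
{
  assert(state != NULL);
  state->user_data = demo_error_context;
  state->handler = demo_error_handler;
  demo_set_error_func(user_data, handler);
}

/* Put back whatever was installed before demo_save_and_set was called. */
static void demo_restore(const demo_handler_state *state)
{
  assert(state != NULL);
  demo_set_error_func(state->user_data, state->handler);
}

/* Handler that counts the errors delivered to it. */
static void demo_count_errors(void *user_data, const char *message)
{
  (void)message;
  *(int *)user_data += 1;
}

/* Inner "parse" that silences errors while it runs. */
static void demo_inner_parse(void)
{
  demo_handler_state state;
  demo_save_and_set(&state, NULL, NULL);
  demo_report_error("inner error"); /* dropped: no handler is installed */
  demo_restore(&state);
}

/* Outer "parse" that re-enters the inner one; returns the errors it saw. */
static int demo_outer_parse(void)
{
  int error_count = 0;
  demo_handler_state state;
  demo_save_and_set(&state, &error_count, demo_count_errors);
  demo_report_error("outer error before");
  demo_inner_parse();
  demo_report_error("outer error after"); /* reaches the outer handler again */
  demo_restore(&state);
  return error_count;
}
```

In this sketch `demo_outer_parse()` counts both of its own errors and none of the inner one. If `demo_inner_parse` set the handler to `NULL` without restoring it, the second outer error would be silently lost, which is the failure mode this issue is about.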

This issue is opened to make sure we remember to do this everywhere.

It's somewhat related to the wrapping we need to do around any libxml2 callbacks that re-enter the Ruby interpreter, and to how we handle the resulting exceptions, all of which is detailed at #1610.
