Description
See #2168 and #2169 for details, but the short version is that we should be more rigorous about saving-and-restoring error handlers and error handler metadata around libxml2 calls, in case any are being made recursively within Nokogiri.
For example, these lines in Nokogiri::HTML::Document::EncodingReader are calling HTML::SAX::PushParser to parse a chunk from the IO read callback of a regular document parse:
nokogiri/lib/nokogiri/html/document.rb
Lines 275 to 277 in 7be6f04
handler = SAXHandler.new
parser = Nokogiri::HTML::SAX::PushParser.new(handler)
parser << chunk rescue Nokogiri::SyntaxError
To allow users to do similarly complex things, we should always save-and-restore the error callbacks (which are the only global state I can think of that we regularly manipulate).
We're doing this in the HTML::SAX::PushParser class to cover ourselves in the aforementioned case:
nokogiri/ext/nokogiri/html_sax_push_parser.c
Lines 24 to 28 in 7be6f04
Nokogiri_structured_error_func_save_and_set(&handler_state, NULL, NULL);
status = htmlParseChunk(ctx, chunk, size, Qtrue == _last_chunk ? 1 : 0);
Nokogiri_structured_error_func_restore(&handler_state);
nokogiri/ext/nokogiri/xml_syntax_error.h
Lines 6 to 17 in 7be6f04
typedef struct _libxmlStructuredErrorHandlerState {
  void *user_data;
  xmlStructuredErrorFunc handler;
} libxmlStructuredErrorHandlerState ;

void init_xml_syntax_error();

void Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state);
void Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state,
                                                 void *user_data,
                                                 xmlStructuredErrorFunc handler);
void Nokogiri_structured_error_func_restore(libxmlStructuredErrorHandlerState *handler_state);
nokogiri/ext/nokogiri/xml_syntax_error.c
Lines 3 to 24 in 7be6f04
void
Nokogiri_structured_error_func_save(libxmlStructuredErrorHandlerState *handler_state)
{
  /* this method is tightly coupled to the implementation of xmlSetStructuredErrorFunc */
  handler_state->user_data = xmlStructuredErrorContext;
  handler_state->handler = xmlStructuredError;
}

void
Nokogiri_structured_error_func_save_and_set(libxmlStructuredErrorHandlerState *handler_state,
                                            void *user_data,
                                            xmlStructuredErrorFunc handler)
{
  Nokogiri_structured_error_func_save(handler_state);
  xmlSetStructuredErrorFunc(user_data, handler);
}

void
Nokogiri_structured_error_func_restore(libxmlStructuredErrorHandlerState *handler_state)
{
  xmlSetStructuredErrorFunc(handler_state->user_data, handler_state->handler);
}
This issue is opened to make sure we remember to do this everywhere.
It's somewhat related to the wrapping we need to do around any libxml2 callbacks that re-enter the Ruby interpreter, and to how we handle exceptions raised from them, all of which is detailed in #1610.