-
-
Notifications
You must be signed in to change notification settings - Fork 918
Description
NodeSet today
Over the years, NodeSet
has slowly approached being API-compatible with Enumerable
or Array
. This is good, and it validates the mental model of libxml2's xmlNodeSet
as an augmented ordered set, especially given that the underlying implementation is literally a C array:
typedef struct _xmlNodeSet xmlNodeSet;
typedef xmlNodeSet *xmlNodeSetPtr;
struct _xmlNodeSet {
int nodeNr; /* number of nodes in the set */
int nodeMax; /* size of the array as allocated */
xmlNodePtr *nodeTab; /* array of nodes in no particular order */
/* @@ with_ns to check whether namespace nodes should be looked at @@ */
};
However, we find ourselves at an interesting point, where NodeSet
is not completely an Enumerable
or Array
, and there are open issues pointing this out:
Further, NodeSet
has baggage, namely the associated Document
object which makes simple operations harder:
- NodeSet#dup needs deep/recursive option · Issue #924 · sparklemotion/nokogiri
- NodeSet.dup does not deep copy · Issue #1678 · sparklemotion/nokogiri
or even causes bugs:
Finally, the NodeSet
class is bigger and more complex than necessary (in both CRuby and JRuby), and so is a bit of a maintenance burden at this point.
NodeSet Tomorrow
As mentioned in #1952, it would be simpler if NodeSet
was a subclass of Array
, which would free us from using libxml2's xmlNodeSet
and unify the JRuby and CRuby implementations
The memory model could be updated so that it was independent of any Document
, thereby bringing it into alignment with the memory model of all the standard Ruby collection classes.
The Enumerable
API would be perfectly conformed to.
The API would be extended with Searchable
to support current API usage.
The API could also implement Document
decorators at creation time by optionally inheriting them from an existing NodeSet
or the creating Document
. Decorators are a rarely-used and ill-documented feature which I suspect is buggy and would be improved by moving to a simpler implementation.
DocumentFragment tomorrow
Finally, this opens the door to a long-time roadmap item, which is to re-implement DocumentFragment
on top of NodeSet
, thereby avoiding use of libxml2's underlying conventions (and further unifying the JRuby and CRuby implementations). This would further be a simplifying change and would potentially allow us to fix the quirks with how XPath searches work in fragments differently than in Document
s and NodeSet
s.
Risks
Primarily, the risks are:
- GC implementation correctness
- Potentially, unexpected memory usage - enabled by the fix to segfault in node_set.rb #1952
The first risk exists because we'd be making an invasive change to the current codebase which has been tested thoroughly by many applications over many years. This can be mitigated by continuing to run valgrind
in the CI suite, and potentially extending coverage to use ASan
. We may want to consider implementing a new class entirely to allow applications the ability to "flip back to the previous implementation" at runtime if any surprising problems occur (i.e., by setting an environment variable or global constant before Nokogiri is loaded).
The second risk exists because a NodeSet
may now contain nodes from many documents, and the highly-connected DOM graph may then mean that many unused objects would be prevented from being GCed. This perhaps shouldn't be surprising to anyone who's thought deeply about directed graphs.