-
Notifications
You must be signed in to change notification settings - Fork 357
Extract DTV info from __tls_get_addr, add to LibcInfo #929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| } | ||
|
|
||
| // extractDTVInfoX86 analyzes __tls_get_addr to find the DTV offset from FS base | ||
| func extractDTVInfoX86(code []byte) (DTVInfo, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual new functionality.
| // extractDTVInfo extracts the introspection data for the DTV to access TLS vars | ||
| func extractDTVInfo(ef *pfelf.File) (*DTVInfo, error) { | ||
| var info DTVInfo | ||
| _, code, err := ef.SymbolData("__tls_get_addr", 2048) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the test coredumps don't actually have this symbol, so it is necessary to not error out if the symbol isn't present. Hence why i added the logging package, and we now just log the error if the symbol is missing.
We return an empty DTVInfo struct, it is up to users of DTV info to check that it is valid before using it. This can easily be done by verifying that "EntryWidth" is not 0.
In the cases where we DO have the symbol, but fail to extract info from it, we legitimately error out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this is defined in the ld-linux-x86-64.so.2 in (some versions of) glibc. So it means that the libc information may need to be collected from two DSOs in case of glibc.
You should add this to the regexp pattern in IsPotentialTSDDSO. Perhaps rename that to IsLibcDSO?
This also means that ProcessManager.assignLibcInfo should be updated to merge the information from these two different DSOs. Probably add a helper libc.MergeLibcInfo or add a struct method for LibcInfo.Merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the ability to check for equality and merge, and the values are "accumulated" when we call assignLibcInfo.
Also added unit tests for the associated LibcInfo.IsEqual and LibcInfo.Merge, and to verify the accumulation behaviour in assignLibcInfo.
1abf5f4 to
91a89bc
Compare
91a89bc to
5cb9b33
Compare
fabled
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Some comments added. And I see that unit tests are missing for the new TLS extraction code.
If not too much work, could the mechanical conversion from TSD -> libc on the hook/struct be done a separate PR first? And then add the new functionality along with the missing tests in a next PR? I think the mechanical work should be trivial to get reviewed and merged first.
libc/libc_aarch64.go
Outdated
| switch inst.Op { | ||
|
|
||
| case aa.MRS: | ||
| foundThreadPtr = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The register to which the value is loaded is not tracked? Also there can be other MRS variants, should also check the actual full opcode like done in extractTSDInfoARM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed to bring it more in-line with extractTSDInfoARM, also fixed a test that turned out to be extracting the wrong value which this caught.
Great comments, will address them.
Actually it is there, just the renaming of files causes it to be hidden by default. Unless you are referring to additional unit tests you wanted?
I'll do this first, that should make this PR a lot easier to review once it is rebased on that. Git gets really confused about the additions happening as well as the renames. |
Done in #952 |
5cb9b33 to
23da4c0
Compare
|
I'll wait for #956 and rebase this again before reopening for review |
23da4c0 to
f2bede8
Compare
f2bede8 to
973a3ce
Compare
|
FYI this is next on my to-do list, not ready for another review yet still have comments to address |
The DTV variables read from extracted __tls_get_addr can be used to read TLS variables. If this symbol is missing, empty introspection data is returned.
99e5108 to
a437cf3
Compare
a437cf3 to
470f89b
Compare
- Move DTVInfo struct definition into C - Make DTVInfo struct members mirror TSDInfo names - Rename IsPotentialTSDDSO
470f89b to
acf2e8f
Compare
|
|
||
| // This is not expected to actually happen, but we want to be clear | ||
| // that values already present, if not empty, are kept and not reset | ||
| "non-empty values are ignored if already set": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird case we don't expect to ever see, where we somehow got TSD or DTV info from multiple libraries. In this case, we make it clear that whatever value we saw first stays. We could update this so that we take the last value we found, but I just wanted to make the behaviour clear with the test so that how this "conflict" is handled is defined.
330ef06 to
272c54d
Compare
c4d3476 to
d69f28b
Compare
|
I believe all comments have been addressed, should be ready for another look @fabled |
| if info.libcInfo.IsEqual(newLibcInfo) { | ||
| return | ||
| } else { | ||
| info.libcInfo.Merge(newLibcInfo) | ||
| newLibcInfo = *info.libcInfo | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the semantics on UpdateLibcInfo to potentially be called multiple time for the interpreters. Earlier it was guaranteed to be called exactly once, and I think some interpreters may rely on the old behavior. Especially the fact that the first update might actually not give the information they need depending on which glibc DSO was seen first.
I was wondering if now that we have SynchronizeProcess and we process mappings as a whole instead of one-by-one, we could just do the libc matching first. But there could be corner cases where the process starts as single threaded and is seen as such, but it later loads a DSO what is multi threaded and pulls in libpthread. So perhaps it will not work so easily.
In any case, python and perl interpreters rely on getting the TSDInfo on the callback, but now the code might fire without it first, and then later with it. Probably a check for valid TSDInfo should be added in these two locatons? And possibly then check their procInfoInserted and not to reinstert the map entry if its inserted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if now that we have SynchronizeProcess and we process mappings as a whole instead of one-by-one, we could just do the libc matching first. But there could be corner cases where the process starts as single threaded and is seen as such, but it later loads a DSO what is multi threaded and pulls in libpthread. So perhaps it will not work so easily.
I'll check if it is possible to do it there, from my read-through of the code i wasn't sure what the guarantees were on when this would be called since things are processed on a "per-library" basis. I agree it would be good to synchronize it before informing the interpreters, and just call them once.
If i can't get that working, i'll try your other suggestion to at least make the other interpreters safe to have this called multiple times 👍
| func extractDTVInfoARM(code []byte) (DTVInfo, error) { | ||
| // Track register states similar to extractTSDInfoARM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. we really should implement the arm.NewInterpreter similar to the amd.NewInterpreter that creates similar expressions to match to avoid all this state tracking duplication. but i suppose that's outside the scope of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes i found myself wishing for that several times while working on this, but yeah as it stands this is the status quo for arm. The amd api is quite nice. Something for future work - i can cut an issue for that?
| // Non-empty values from other override values in the receiver. | ||
| func (l *LibcInfo) Merge(other LibcInfo) { | ||
| // If other has TSDInfo and this instance does not, take it | ||
| if other.TSDInfo != (TSDInfo{}) && l.TSDInfo == (TSDInfo{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just unconditionally assign if l is initial state? same for the next hunk.
| if other.TSDInfo != (TSDInfo{}) && l.TSDInfo == (TSDInfo{}) { | |
| if l.TSDInfo == (TSDInfo{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should be fine, i'll see about updating the test to assert this behaviour too. I was iffy on how we want to handle this, I think that "accept the new value if we are uninitialized" as you are proposing is a good behaviour.
What
Refactor's
tpbaseto belibcpackage, as it provides additional info about libc.In particular, this adds DTV introspection information and bundles this with the TSD info into a new
LibcInfo.Addresses #883 for all cases except static TLS.
Why
We need the DTV information to look up TLS variables when TLS descriptors are not available.
Refactoring to a generic "libc" approach is something @fabled suggested.
How
This adds disassembler code for extracting the DTV information from
__tls_get_addrif it is present in the libc.It should be supported for musl and glibc, on both x86_64 and aarch64, and test cases from different libc versions and architectures are added to validate that this is the case.
The
tpbasepackage is renamed tolibc, and related functions are renamed to be more generic, but otherwise the functionality is unchanged.Nothing actually uses the new DTV data yet, another PR will follow up to add that for Ruby which reads the execution context from a TLS variable.