The mystery of the remnant symbol #346

Open
@agriyakhetarpal

Description

Title

The mystery of the remnant symbol: software, society, and our digital heritage

Describe your Talk

This talk is about my experience contributing to pydistcheck, an open-source linter that finds portability issues in Python distributions (sdists, wheels, and conda archives). The tool was created following its author's SciPy 2022 talk, "Does that CSV Belong on PyPI? Probably Not", which covered the very problems that inspired it. Specifically, I opened a pull request, jameslamb/pydistcheck#310, that addressed an issue with Apple, Inc.'s strip system utility when it is used while building Python packages with compiled extensions on macOS.

The strip program on macOS has an interesting quirk: it adds a specific weak symbol, radr://5614542 (a reference to Radar, Apple's legacy bug-tracking system), to Mach-O files, and this symbol is then detected as a debug symbol. As an artifact of the process, it appears even in properly stripped binaries, whether strip is run manually or during the linking step of a build system. At the time of writing, third-party linkers for macOS, such as zld or the later alternative lld, do not exhibit this behaviour. However, it is still known to occur with ld-prime, the newer variant of Apple's static linker, whose source code has not yet been released.
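To make the consequence for a linter concrete, here is a minimal Python sketch of how a checker might avoid flagging an otherwise stripped binary solely because of this symbol. It is not pydistcheck's actual implementation: the helper name has_real_debug_symbols is hypothetical, and it assumes an earlier step (for example, parsing nm -a output) has already collected the names of the symbols classified as debug symbols.

```python
from __future__ import annotations

from collections.abc import Iterable

# The placeholder symbol that Apple's strip can leave behind in Mach-O files.
REMNANT_SYMBOL = "radr://5614542"


def has_real_debug_symbols(debug_symbol_names: Iterable[str]) -> bool:
    """Return True if any debug symbol other than the strip remnant remains.

    Hypothetical helper: assumes the caller has already extracted the names
    of symbols classified as debug symbols (e.g. from `nm -a` output).
    """
    return any(name != REMNANT_SYMBOL for name in debug_symbol_names)


# A stripped binary whose only "debug symbol" is the remnant is not flagged.
print(has_real_debug_symbols(["radr://5614542"]))
# A binary with genuine leftover debug information is still flagged.
print(has_real_debug_symbols(["radr://5614542", "some_debug_symbol"]))
```

Treating the remnant as an explicit allowlist entry, rather than loosening the debug-symbol check as a whole, keeps the linter strict about genuine leftovers.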

As part of my investigation into this issue, I discovered that several pages on https://opensource.apple.com/ have been wiped into oblivion, especially the pages hosting the original strip source code. Web archives, including the Internet Archive (https://archive.org/) as well as https://archive.pw and https://archive.is, were of hardly any help, as Apple was able to have pages taken down from there as well. I struggled to find even a single working link to Apple's cctools source, but I eventually found one that, surprisingly, had not been taken down.

With the original strip.c source code in hand, I now understand what the Radar bug refers to: the old classic linker misbehaves if an "indirect symbol" appears at index 0 of the symbol table. Indirect symbols are like forwarding addresses: they point to other symbols and are used for things such as functions in shared libraries. This was worked around by inserting the dummy symbol at index 0 whenever there was a risk of an indirect symbol landing there. In my talk, I plan to explain this behaviour briefly, using relevant snippets from the source code together with my understanding of the issue and the insights I have gained. Although this symbol has proven harmless, it continues to surface sporadically in internet culture; here is a search result from Twitter, for example: https://x.com/search?q=radr://5614542&src=typed_query&f=top.
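The workaround described above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not a translation of strip.c: the Symbol class is a hypothetical stand-in for a real Mach-O symbol-table entry, and the rule "prepend a dummy if index 0 would hold an indirect symbol" is only a paraphrase of the behaviour described here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Name of the dummy placeholder symbol used by the workaround.
DUMMY_NAME = "radr://5614542"


@dataclass(frozen=True)
class Symbol:
    """Simplified symbol-table entry (hypothetical model, not Mach-O's nlist)."""

    name: str
    is_indirect: bool = False


def protect_index_zero(symbols: list[Symbol]) -> list[Symbol]:
    """Return a symbol table in which index 0 never holds an indirect symbol.

    If the entry that would land at index 0 is indirect, a harmless dummy
    symbol is prepended, so the indirect symbol shifts to index 1.
    """
    if symbols and symbols[0].is_indirect:
        return [Symbol(DUMMY_NAME), *symbols]
    return list(symbols)


table = [Symbol("_forwarded", is_indirect=True), Symbol("_main")]
print([s.name for s in protect_index_zero(table)])
```

The dummy entry costs one slot in the table but never changes the meaning of any existing symbol, which is why its presence is harmless even though tools notice it.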

Today, we're facing a crisis in knowledge dissemination. This bug led me to think about how the documentation needed to understand fundamental computing infrastructure, whether for macOS or for other open-source code whose later versions are now proprietary, has simply... disappeared from public view. A programmer should not have to act as a digital archaeologist, digging into bugs whose history spans two decades or more in code we rely on every day, however intriguing that line of work may be. While this instance is the doing of a big tech oligarch like Apple, Inc., I believe that technical gatekeeping of this kind echoes a larger pattern in the 2025 political landscape, which I want to share with the audience, as described below.

In 2025, POTUS Donald Trump and the far-right GOP have had America in shambles through their fascist agenda. Over 8,000 web pages and approximately 3,000 datasets have been removed or modified across federal agencies. NASA had to comply and subsequently undertook a comprehensive removal of DEIA-related content, including interviews with Black and female NASA employees and LGBTQ-related content. While this has not primarily involved the removal of scientific data, it illustrates how institutional knowledge and open data built up over half a century or more can vanish within hours of administrative orders, with hardly any time for recourse.

I'd then like to pivot briefly to two case studies on preserving data and code:

Finally, I plan to describe how the audience can ensure that their code is archived and preserved: don't rely solely on GitHub, but mirror to multiple platforms, use Software Heritage to archive (necessary) code, and think about the broader themes of digital preservation and social justice. The main takeaways are that every technical problem has social dimensions and that preservation is a political issue: those who control technical knowledge shape who gets to innovate. I hope the audience will find these steps for developers and citizens to preserve and democratise technical knowledge pragmatic and useful.
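As a rough illustration of the first two steps, the Python sketch below builds a git push --mirror command for each of several remotes and forms a Software Heritage "Save Code Now" request URL for a git origin. The remote URLs are hypothetical placeholders, and the endpoint path reflects our reading of the Software Heritage Web API; check its documentation before relying on it. The functions only construct commands and URLs; they do not run or send anything.

```python
from __future__ import annotations

# Hypothetical mirror remotes; replace these with your own repositories.
MIRRORS = {
    "github": "git@github.com:example/project.git",
    "codeberg": "git@codeberg.org:example/project.git",
    "gitlab": "git@gitlab.com:example/project.git",
}

# Assumed Software Heritage "Save Code Now" endpoint for git origins (POST).
SWH_SAVE_ENDPOINT = (
    "https://archive.softwareheritage.org/api/1/origin/save/git/url/{origin}/"
)


def mirror_commands(mirrors: dict[str, str]) -> list[list[str]]:
    """Build one `git push --mirror <url>` invocation per remote URL."""
    return [["git", "push", "--mirror", url] for url in mirrors.values()]


def swh_save_request_url(origin_url: str) -> str:
    """Return the URL to POST to in order to request archival of a git origin."""
    return SWH_SAVE_ENDPOINT.format(origin=origin_url)


for cmd in mirror_commands(MIRRORS):
    print(" ".join(cmd))
print(swh_save_request_url("https://codeberg.org/example/project"))
```

Each command could be executed with subprocess.run(cmd, check=True) from inside a local clone, for example in a scheduled CI job, so that no single hosting platform holds the only copy.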

Pre-requisites & reading material

  • Some familiarity with distributing Python code to PyPI and/or other package indices (not required, but helpful);
  • An understanding of how to compile code into binaries and run them: what (Unix) object files are, what symbols are, and how to use a compiler and linker; and
  • A general interest in the politics and preservation of code and data.

I do not expect the general audience to be familiar with these details, but a general overview would be beneficial. Given that the June edition is also aimed at a Linux-centric audience, following a collaboration with ILUG-D, I expect attendees from the ILUG-D community and others acquainted with Linux/Unix concepts to follow along with relative ease. Since the talk is thirty minutes long, I plan to cover these topics at an introductory level rather than diving into deeper explanations.

Resources

For more information on radr://5614542

The political state of the U.S.A. in 2025

The Apollo 11 story

Bitbucket and Mercurial, Software Heritage, digital preservation, and more

Time required for the talk

Twenty-five minutes; five minutes for questions

Link to slides/demos

No response

About you

I am a software engineer at Quansight, where I work on open-source scientific software in the Scientific Python and PyData ecosystems. My interests include Python packaging, compilers and toolchains, documentation and technical writing, and numerical software, among other areas. I spend my time working on Pyodide, JupyterLite, and various other open-source scientific software projects, and I enjoy the ways in which they all interoperate.

Bluesky: @agriyakhetarp.al
Mastodon: fosstodon.org/@agriyakhetarpal
Twitter: @agriyakhetarpal
LinkedIn: linkedin.com/in/agriyakhetarpal
Email address: agriyakhetarpal [at] outlook [dot] com

I previously spoke at a PyDelhi meetup in July 2024. Here's my submission from that time: #285

Availability

21/06/2025

Any comments

N/A

Metadata

Labels

  • accepted: Congratulations, your talk has been accepted!
  • proposal: Wish to present at PyDelhi? This label is added automatically when choosing the "Talk Proposal" option.
  • scheduled: This talk/workshop is scheduled for the next meetup, either in the same month or the coming one.
