
@wynnw
Contributor

@wynnw wynnw commented Mar 22, 2024

We've found that the uwsgi build with Python 3.11 produces a lot of segfaults when generating tracebacks via the tracebacker thread. This is a new implementation of the tracebacker for Python 3.11 that has been much more stable. It avoids sys._current_frames() and instead uses the Python thread state object uwsgi already has. It also moves more of the hard work into Python code, which simplifies that code and makes it safer. Hopefully it's helpful for others.

xrmx added a commit that referenced this pull request Apr 6, 2024
Cherry pick a couple of commits from #2621
@xrmx
Collaborator

xrmx commented Apr 6, 2024

I cherry-picked the two commits in the middle; taking a look at the others will require a bit more time.

…ri events

- The existing tracebacker periodically segfaults on Python 3.11.
  This implementation pushes more of the logic into Python code,
  leveraging updates to the tracebacker module, so that the C code just
  has to iterate over the list of strings returned. Note that this does
  change the output format of the tracebacker to match the standard
  Python format of the traceback.StackSummary.format() method, so anything
  parsing that output will need to update its logic to match.
- Also, this stops the use of sys._current_frames. This method is known to
  behave in strange ways when called from multithreaded code, and crashes a
  lot when doing harakiri tracebacks. This new implementation avoids calling
  sys._current_frames, and instead uses the PyThreadState that we already
  have for the main uwsgi worker thread. We get that and pass it into our
  Python function, which builds the stack trace string for us. This avoids
  the segfaults and is much more stable than before. It only gets the
  traceback for the main thread, i.e. the uwsgi worker thread, which
  is all we wanted anyway.
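A minimal sketch of what the Python side of this could look like. It assumes the C code hands over the worker thread's current frame object, for example one obtained from its PyThreadState via `PyThreadState_GetFrame()` (part of the C API since Python 3.9); the commit does not say which call it uses. `format_worker_stack` is a hypothetical name, not the function added in the PR. The sketch shows how `traceback.extract_stack()` followed by `StackSummary.format()` produces the list of strings the C code would iterate over. Here `inspect.currentframe()` stands in for the frame the C side would supply.

```python
import inspect
import traceback
from types import FrameType
from typing import List, Optional


def format_worker_stack(frame: Optional[FrameType], limit: Optional[int] = None) -> List[str]:
    """Return the stack ending at `frame` as a list of formatted strings.

    Hypothetical helper: in the PR the frame comes from the worker's
    PyThreadState on the C side; here the caller passes it in directly.
    """
    if frame is None:
        return []
    # extract_stack walks frame.f_back up to the outermost frame and returns
    # a StackSummary ordered oldest call first, newest (innermost) call last.
    summary = traceback.extract_stack(frame, limit=limit)
    # StackSummary.format() returns one string per frame in the familiar
    # '  File "...", line N, in name' traceback layout.
    return summary.format()


def inner() -> List[str]:
    # Stand-in for the C side: use the current thread's own frame.
    return format_worker_stack(inspect.currentframe())


def outer() -> List[str]:
    return inner()


if __name__ == "__main__":
    for entry in outer():
        print(entry, end="")
```

Each string covers one frame in the layout Python uses for ordinary tracebacks, which is why tools parsing the old tracebacker output would need updating.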
…d master graceful reload

- Instead of doing a blocking waitpid call, use WNOHANG calls to avoid
  breaking the existing child-process signal handling, which was part of the
  master fork/exit cleanup that allowed FIFO-based zero-downtime reloads
  to happen. This also lets the fix carry on as before, since this code
  still blocks until all the workers have exited.
- Make the whitespace for the function consistent
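A rough Python analogue of the non-blocking wait described above (POSIX only). `reap_workers` is a hypothetical helper, not uwsgi's C code. It polls each worker pid with `os.waitpid(pid, os.WNOHANG)`, which returns `(0, 0)` while the child is still running, and sleeps briefly between rounds. It therefore never sits inside a blocking `waitpid()` call, yet it only returns once every worker has exited.

```python
import os
import time
from typing import Set


def reap_workers(pids: Set[int], poll_interval: float = 0.05) -> None:
    """Wait until every pid in `pids` has exited, without a blocking waitpid().

    Hypothetical helper for illustration; uwsgi's real logic is in C.
    """
    remaining = set(pids)  # copy so the caller's set is left untouched
    while remaining:
        for pid in list(remaining):
            try:
                # WNOHANG: return (0, 0) immediately if the child is still running.
                reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
            except ChildProcessError:
                # Already reaped elsewhere (e.g. by a SIGCHLD handler).
                remaining.discard(pid)
                continue
            if reaped_pid == pid:
                remaining.discard(pid)
        if remaining:
            time.sleep(poll_interval)


if __name__ == "__main__":
    children = set()
    for delay in (0.1, 0.2):
        pid = os.fork()
        if pid == 0:
            time.sleep(delay)  # simulated worker shutting down
            os._exit(0)
        children.add(pid)
    reap_workers(children)
    print("all workers exited")
```

Polling specific pids rather than calling `waitpid(-1, ...)` is a choice made for this sketch, so it does not reap children that other code expects to collect; the commit does not say which form uwsgi uses.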