Description
In my Synapse fork, I was seeing consistent flakiness when running Complement on GitHub Actions.
These tests were failing:
❌ TestFederationKeyUploadQuery (580ms)
❌ TestKnockingInMSC3787Room (570ms)
❌ TestRestrictedRoomsRemoteJoinFailOverInMSC3787Room (6.73s)
❌ TestToDeviceMessagesOverFederation (7.73s)
❌ TestToDeviceMessagesOverFederation/interrupted_connectivity (6.14s)
❌ TestToDeviceMessagesOverFederation/stopped_server (20ms)
After adding debug logging, I saw:
"""
synapse_main | 2024-12-04 15:52:18,608 - synapse.federation.federation_base - 303 - ERROR - _process_incoming_pdus_in_room_inner-4-$Vp3-8StRMJno4kS-21Sr_LSK47Gk9HS4CIBdOjrsKbk - Invalid canonical JSON: {'auth_events': ['$3TeeAwpC6Edh_I_orHhXdCQLkztifWjjmSd78dU4qS0', '$7hX4UVoc-RUi5_agACZsqfotFvoqWkdwE0DMmiRQDL8', '$-ylJZyK-Hyxn_WOewXKLVD1XnVdAC_p2lxDLqKiG2pM'], 'content': {'bad_val': 1.1, 'body': 'Message 1'}, 'depth': 6, 'hashes': {'sha256': '+PxMZ1aox2NRluRwq0ctXEKXZ2NMsJG0yyKIBl1EQzg'}, 'origin': 'host.docker.internal:38621', 'origin_server_ts': 1733327538532, 'prev_events': ['$PO2EDaOjQOTSYMB0cN588VLhNn3JXUK2tHnLAWeKYG4'], 'room_id': '!0-1WnNO2FKvuc021cNJ3:host.docker.internal:38621', 'sender': '@charlie:host.docker.internal:38621', 'signatures': {'host.docker.internal:38621': {'ed25519:complement_aeff3b6780deb126c603cb94fcaefc9f922ad031cbab161c6f32014bac2354d1': '2Eidd/749jgA9rtwlAg54OENuddfKqY3P9YYBkvcZl5wx43ZhfvdkV9E4tLPwhbKVokwE5mOrs1l8flvjDsHDA'}}, 'type': 'm.room.message'} 400: Bad JSON value: 1.1
"""
This happened while fetching prev_events.
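For context, the `Bad JSON value: 1.1` in the log is consistent with the stricter canonical JSON rules that room version 6 introduced: floats are not allowed, and integers must fit in the range [-(2^53)+1, (2^53)-1]. Below is a minimal Python sketch of that kind of check. It is not Synapse's actual implementation; `check_canonical_json` and `BadCanonicalJSON` are hypothetical names made up for illustration.

```python
from typing import Any

# Integer bounds from the Matrix canonical JSON rules.
MAX_INT = 2**53 - 1
MIN_INT = -(2**53) + 1


class BadCanonicalJSON(ValueError):
    """Hypothetical error type for this sketch (not a Synapse class)."""


def check_canonical_json(value: Any) -> None:
    """Recursively reject values that strict canonical JSON does not allow."""
    # bool is a subclass of int in Python, so handle it before the int check.
    if value is None or isinstance(value, (bool, str)):
        return
    if isinstance(value, int):
        if not MIN_INT <= value <= MAX_INT:
            raise BadCanonicalJSON(f"Bad JSON value: {value!r}")
        return
    if isinstance(value, float):
        # Floats are rejected outright, even ones like 1.0.
        raise BadCanonicalJSON(f"Bad JSON value: {value!r}")
    if isinstance(value, dict):
        for item in value.values():
            check_canonical_json(item)
        return
    if isinstance(value, (list, tuple)):
        for item in value:
            check_canonical_json(item)
        return
    raise BadCanonicalJSON(f"Unsupported type: {type(value).__name__}")


# The event content from the log above trips the float check.
try:
    check_canonical_json({"bad_val": 1.1, "body": "Message 1"})
except BadCanonicalJSON as exc:
    print(exc)
```

Under these assumptions, any event carrying `'bad_val': 1.1` would be rejected wherever it is validated, regardless of which test originally created it.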
I noticed that the only place in Complement where bad_val
comes from is the test TestOutboundFederationIgnoresMissingEventWithBadJSONForRoomVersion6.
After disabling that one test, the rest of my tests started passing.
So there seem to be two questions: 1) Should Synapse be failing when it cannot deserialize prev_events? 2) Why is this one test polluting others in the database (and, relatedly, how do we isolate it)?