
[fix] [common] Fix RawMessageImpl.getProperties() failing when the message metadata contains the same key with different values #23927

Conversation

@horizonzy (Member) commented Feb 5, 2025

Main Issue: #23925

My fork PR: horizonzy#20

Motivation

If the message metadata contains the same key more than once with different values, RawMessageImpl.getProperties() fails. Instead, the later value should override the earlier one.
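A minimal Java sketch of the failure mode named in the title and of the "later value wins" rule. It assumes, without confirmation from this page, that the repeated property entries were collected with Collectors.toMap; the KeyValue record and the DuplicatePropertiesDemo class are hypothetical stand-ins, not Pulsar types (records need Java 16+).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicatePropertiesDemo {

    // Hypothetical stand-in for a protobuf KeyValue entry from
    // MessageMetadata.properties; the real Pulsar type is not used here.
    record KeyValue(String key, String value) {}

    // Collectors.toMap without a merge function throws IllegalStateException
    // as soon as the same key appears twice.
    static Map<String, String> toMapStrict(List<KeyValue> props) {
        return props.stream()
                .collect(Collectors.toMap(KeyValue::key, KeyValue::value));
    }

    // With a merge function that keeps its second argument, an entry that
    // appears later in the repeated field overrides an earlier one.
    static Map<String, String> toMapLastWins(List<KeyValue> props) {
        return props.stream()
                .collect(Collectors.toMap(KeyValue::key, KeyValue::value,
                        (oldValue, newValue) -> newValue));
    }

    public static void main(String[] args) {
        List<KeyValue> props = List.of(
                new KeyValue("k", "v1"),
                new KeyValue("other", "x"),
                new KeyValue("k", "v2"));
        try {
            toMapStrict(props);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getClass().getSimpleName());
        }
        System.out.println("last wins: k=" + toMapLastWins(props).get("k"));
    }
}
```

Running main prints "strict: IllegalStateException" and then "last wins: k=v2". Because List.stream() is sequential and ordered, the merge function sees values in wire order, so (oldValue, newValue) -> newValue keeps the last one.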

Modifications

Verifying this change

  • Make sure that the change passes the CI checks.


Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions bot added the doc-not-needed (Your PR changes do not impact docs) label Feb 5, 2025
@dao-jun added the type/bug (The PR fixed a bug or issue reported a bug) and ready-to-test labels Feb 5, 2025
@dao-jun closed this Feb 5, 2025
@dao-jun reopened this Feb 5, 2025
@horizonzy closed this Mar 11, 2025
@horizonzy reopened this Mar 11, 2025
@codecov-commenter commented Mar 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.84%. Comparing base (bbc6224) to head (ffbb581).
Report is 954 commits behind head on master.

Additional details and impacted files


@@             Coverage Diff              @@
##             master   #23927      +/-   ##
============================================
- Coverage     73.57%   72.84%   -0.74%     
+ Complexity    32624    31818     -806     
============================================
  Files          1877     1863      -14     
  Lines        139502   157866   +18364     
  Branches      15299    19829    +4530     
============================================
+ Hits         102638   114994   +12356     
- Misses        28908    33548    +4640     
- Partials       7956     9324    +1368     
Flag Coverage Δ
inttests 29.10% <0.00%> (+4.52%) ⬆️
systests 25.16% <0.00%> (+0.84%) ⬆️
unittests 72.18% <100.00%> (-0.66%) ⬇️


Files with missing lines Coverage Δ
...g/apache/pulsar/common/api/raw/RawMessageImpl.java 50.64% <100.00%> (+0.64%) ⬆️

... and 1058 files with indirect coverage changes


@mattisonchao previously approved these changes Mar 11, 2025

@mattisonchao (Member) commented Mar 11, 2025

Some questions:

  1. Why do the message properties allow duplicated keys?
  2. We drop one of the duplicated keys while reading the messages. Doesn't that mean some properties are lost for the user?

@mattisonchao dismissed their stale review March 11, 2025 08:09

Need an answer to these questions first.

@BewareMyPower (Contributor) left a comment

+1 to @mattisonchao's comment. We should fix the case where duplicated keys are included in the same message.

@BewareMyPower (Contributor) commented Mar 11, 2025

If a message has duplicated keys in its properties, I think we should allow the downstream to determine how to process such duplicated keys rather than retain a random value for such keys. For example, we can support throwing an exception that contains a list of <duplicated-key, values...>.
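A sketch of this alternative, under the assumption that "throwing an exception" means a dedicated exception type that carries every duplicated key together with all of its values in wire order. DuplicatePropertyKeysException, StrictProperties, toMapOrThrow and the KeyValue record are hypothetical names; Pulsar does not provide them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StrictProperties {

    // Hypothetical stand-in for a protobuf KeyValue entry.
    record KeyValue(String key, String value) {}

    // Hypothetical exception type listing <duplicated-key, values...>.
    static class DuplicatePropertyKeysException extends RuntimeException {
        private final Map<String, List<String>> duplicates;

        DuplicatePropertyKeysException(Map<String, List<String>> duplicates) {
            super("Duplicated property keys: " + duplicates);
            this.duplicates = duplicates;
        }

        Map<String, List<String>> getDuplicates() {
            return duplicates;
        }
    }

    // Groups all values per key in wire order, then throws if any key
    // occurred more than once, reporting every value seen for it.
    static Map<String, String> toMapOrThrow(List<KeyValue> props) {
        Map<String, List<String>> all = new LinkedHashMap<>();
        for (KeyValue kv : props) {
            all.computeIfAbsent(kv.key(), k -> new ArrayList<>()).add(kv.value());
        }
        Map<String, List<String>> duplicates = new LinkedHashMap<>();
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : all.entrySet()) {
            if (e.getValue().size() > 1) {
                duplicates.put(e.getKey(), e.getValue());
            } else {
                result.put(e.getKey(), e.getValue().get(0));
            }
        }
        if (!duplicates.isEmpty()) {
            throw new DuplicatePropertyKeysException(duplicates);
        }
        return result;
    }

    public static void main(String[] args) {
        try {
            toMapOrThrow(List.of(new KeyValue("k", "v1"), new KeyValue("k", "v2")));
        } catch (DuplicatePropertyKeysException e) {
            System.out.println(e.getDuplicates()); // {k=[v1, v2]}
        }
    }
}
```

The trade-off is that every reader must now handle the exception, whereas the last-wins rule adopted in this PR keeps getProperties() total.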

@horizonzy (Member Author)

> If a message has duplicated keys in its properties, I think we should allow the downstream to determine how to process such duplicated keys rather than retain a random value for such keys. For example, we can support throwing an exception that contains a list of <duplicated-key, values...>.

I think not. We define the MessageMetadata properties as follows.

message MessageMetadata {
    required string producer_name   = 1;
    required uint64 sequence_id     = 2;
    required uint64 publish_time    = 3;
    repeated KeyValue properties    = 4;
}

We use a proto repeated field to model the properties as a map. Since properties is a repeated field, the order of its entries is preserved, so which value comes later is well defined.

When converting the entries to a map, the newer key/value pair should always override the older one. Users don't care whether the message has duplicated keys; they just need the newest key/value pair.

@horizonzy (Member Author)

@mattisonchao here

@BewareMyPower (Contributor)

That makes sense to me. I approved it because it solves the issue when duplicated keys have already been written to the properties. Preventing duplicated keys in MessageMetadata on the producer side can be done in a separate PR.
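For that producer-side follow-up, one possible shape is a fail-fast check that runs before the entries are written into MessageMetadata.properties. This is a hypothetical sketch: ProducerPropertyCheck, requireUniqueKeys and the KeyValue record are illustrative names, and where such a check would live in Pulsar's producer path is not specified in this discussion.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProducerPropertyCheck {

    // Hypothetical stand-in for a KeyValue entry about to be written
    // into MessageMetadata.properties.
    record KeyValue(String key, String value) {}

    // Rejects the entries before serialization if any key repeats.
    // Set.add returns false when the element is already present.
    static void requireUniqueKeys(List<KeyValue> props) {
        Set<String> seen = new HashSet<>();
        for (KeyValue kv : props) {
            if (!seen.add(kv.key())) {
                throw new IllegalArgumentException(
                        "Duplicated property key: " + kv.key());
            }
        }
    }

    public static void main(String[] args) {
        requireUniqueKeys(List.of(new KeyValue("a", "1"), new KeyValue("b", "2")));
        try {
            requireUniqueKeys(List.of(new KeyValue("a", "1"), new KeyValue("a", "2")));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicated property key: a
        }
    }
}
```

Rejecting is only one option; a producer could instead collapse duplicates with the same last-wins rule the reader applies, so old readers and new writers agree on the result.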

@mattisonchao (Member)

Well, I can approve it since the MessageImpl side already has similar logic (#6390).

Hmm, it seems this was caused by proto2 not supporting the map type, and we didn't restrict duplicate keys on the protocol side.

@mattisonchao merged commit 47f3223 into apache:master Mar 11, 2025
145 of 155 checks passed
@Technoboy- added this to the 4.1.0 milestone Mar 12, 2025
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test type/bug The PR fixed a bug or issue reported a bug
7 participants