-
-
Notifications
You must be signed in to change notification settings - Fork 76
rspamd: add native integration with file header support #426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add native Rspamd integration using rspamd-client crate v0.3.0. Features: - File header optimization for local scanning (avoids message body transfer) - ZSTD compression enabled by default - HTTPCrypt encryption support - Automatic metadata extraction (IP, HELO, sender, recipient) - Flexible action handling (reject, tag, quarantine) - Proxy and TLS/mTLS support New components: - mod-rspamd: Rust module with Lua bindings - policy-extras/rspamd.lua: high-level Lua helper with setup() function - examples/rspamd-integration.lua: comprehensive usage examples The file header optimization is particularly beneficial when Rspamd runs on the same host as KumoMTA, avoiding transmission of message bodies over the socket for significant performance improvement.
wez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this!
I have quite a few comments/suggestions on how to integrate this better, and some are because of changes that only got pushed this week, likely while you weren't looking!
For testing, I'd suggest looking at the redis and amqp integrations.
For amqp, we leverage https://docs.rs/testcontainers-modules/0.13.0/testcontainers_modules/rabbitmq/struct.RabbitMq.html to run a server in a docker container:
kumomta/crates/integration-tests/src/rabbit.rs
Lines 97 to 101 in f582d60
| #[tokio::test] | |
| async fn test_amqprs_rabbit() -> anyhow::Result<()> { | |
| if std::env::var("KUMOD_TESTCONTAINERS").unwrap_or_else(|_| String::new()) != "1" { | |
| return Ok(()); | |
| } |
For redis, we launch an isolated copy of the redis server ourselves (no need for a container), on a port that we allocate dynamically:
kumomta/crates/mod-redis/src/test.rs
Lines 31 to 49 in f582d60
| pub async fn spawn(extra_config: &str) -> anyhow::Result<Self> { | |
| let mut errors = vec![]; | |
| for _ in 0..2 { | |
| let port = allocate_port(); | |
| match timeout( | |
| Duration::from_secs(5), | |
| Self::spawn_with_port(port, extra_config), | |
| ) | |
| .await? | |
| { | |
| Ok(me) => return Ok(me), | |
| Err(err) => { | |
| errors.push(format!("{err:#}")); | |
| } | |
| } | |
| } | |
| anyhow::bail!("failed to spawn redis-server: {}", errors.join(". ")); | |
| } |
What I think we should have for rspamd testing is similar to those: if we find rspamd installed in a reasonable path, then we can spawn it with an isolated configuration from a unit test, then run with a policy file that will trigger rspamd testing appropriately. Look at:
kumomta/crates/integration-tests/src/test/queue_ndr.rs
Lines 13 to 17 in f582d60
| let mut daemon = DaemonWithMaildirOptions::new() | |
| .policy_file("ndr.lua") | |
| .start() | |
| .await | |
| .context("DaemonWithMaildir::start")?; |
for an example of an integration test that uses a custom policy. That particular one loads in https://github.com/KumoCorp/kumomta/blob/main/crates/integration-tests/ndr.lua
Add support for Rspamd milter actions including: - Add headers (X-Spam-*, custom headers, etc.) - Remove headers by name - Subject line rewriting New functions: - apply_milter_actions(): Process milter add_headers and remove_headers - Updated apply_action() to automatically apply milter modifications The milter actions are applied automatically when using rspamd:setup() or rspamd.apply_action(). Users can also call apply_milter_actions() independently for fine-grained control. Note: Body rewriting is not currently supported by rspamd-client crate. Header modifications (including Subject rewriting) are fully functional. Added comprehensive example (Example 5) demonstrating milter usage.
Upgrade to rspamd-client 0.4.0 which adds support for message body
rewriting via the body_block feature.
Changes in mod-rspamd:
- Add body_block field to EnvelopeDataLua
- Pass body_block flag to rspamd-client when requested
- Expose rewritten_body in scan results
Changes in Lua helper:
- Add body_block option to extract_metadata()
- New function apply_body_rewrite() to replace message body
- Update apply_action() to handle body rewriting automatically
- Body rewrite is applied before milter actions
- Update setup() to support body_block configuration option
Features:
- When body_block is enabled, Rspamd returns the full rewritten message
- Automatic application via rspamd:setup({body_block = true})
- Manual control via rspamd.apply_body_rewrite(msg, result)
- Body rewriting includes all modifications (headers + body)
Examples added:
- Example 6: Manual body rewriting with inspection
- Example 7: Automatic body rewriting with setup()
- Updated Example 9: Direct API usage with body_block
The body rewriting feature is useful for Rspamd modules that modify
message content beyond headers, such as:
- Stripping dangerous attachments
- Adding email disclaimers
- Modifying MIME structure
- Content sanitization
- Update rspamd-client dependency to 0.4.1 - Replace set_first_named_header with prepend_header throughout - Fix Subject header handling to use remove_all_named_headers + prepend_header - Refactor EnvelopeDataLua to use #[derive(FromLua)] macro - Add "macros" feature to mlua dependency - Change additional_headers to Option<HashMap<String, String>> - Remove manual from_table() implementation - Add KeySource support for TLS parameters - Add data-loader dependency - Replace tls_cert_path/tls_key_path/tls_ca_path with KeySource types - Support File variant with helpful errors for Data/Vault variants
Add kumo.rspamd.scan_message() function that provides a streamlined interface for Rspamd integration: - Add message crate dependency for direct Message object access - Refactor RspamdClientConfig to use make_client_config() method - Add default action configuration flags: - add_headers: Add X-Spam-* headers (default: true) - reject_spam: Reject spam messages (default: false) - reject_soft: Use 4xx instead of 5xx rejection (default: false) - prefix_subject: Prefix subject with ***SPAM*** (default: false) - Automatically extract envelope metadata from Message object: - sender, recipients, IP, user, HELO, hostname - Apply default actions based on configuration - Call kumo.reject via Lua runtime for rejection handling - Add examples showing simplified API usage with encryption This simplifies Rspamd integration by eliminating the need for manual metadata extraction and boilerplate action handling.
Add comprehensive integration tests for Rspamd integration: - test_rspamd_scan_message: Basic scan functionality - Starts Rspamd container using testcontainers - Configures KumoMTA with scan_message API - Sends test message and verifies scan occurs - Validates message delivery after scan - test_rspamd_reject_spam: Rejection behavior - Tests reject_spam configuration flag - Verifies normal messages are accepted - Confirms spam rejection when enabled - test_rspamd_headers: Header addition - Validates X-Spam-* headers are added - Checks X-Spam-Flag, X-Spam-Score, X-Spam-Action - Verifies headers appear in processed messages Tests follow the same pattern as rabbit.rs and use the KUMOD_TESTCONTAINERS environment variable for opt-in execution. Uses GenericImage with rspamd/rspamd:latest Docker image.
- Add health check to verify Rspamd is ready before running tests - Define required spool directories in test configurations - Increase container startup timeout to 5 seconds - Poll Rspamd /ping endpoint to ensure service availability This eliminates test failures due to timing issues where tests attempted to scan messages before Rspamd was fully initialized.
- Change base_url from String to url::Url for validation - Change sensitive fields to KeySource for secure storage: - encryption_key: Option<KeySource> - password: Option<KeySource> - proxy_username: Option<KeySource> - proxy_password: Option<KeySource> - Add helper to extract strings from KeySource (File/Data/Vault) - Update example to demonstrate KeySource usage - Add url crate dependency This provides stronger compile-time validation of URLs and better security through flexible credential storage options.
Changed from smtp_server_message_received to smtp_server_data event for Rspamd scanning to provide: - Efficient per-batch scanning (scans once per SMTP transaction) - Avoids duplicate scans when messages have multiple recipients - Better SMTP protocol alignment (reject before final acceptance) Updated: - policy-extras/rspamd.lua: Use smtp_server_data in documentation - examples/rspamd-integration.lua: All 10 examples now use smtp_server_data - integration tests: Updated to use smtp_server_data - extract_metadata(): Uses recipient_list() to handle batch recipients - Added comments explaining per-batch scanning benefits The smtp_server_data event fires during DATA phase before message batching, allowing single scan per SMTP transaction regardless of recipient count. Metadata is stored and available for later events.
Added documentation and examples for scanning once in smtp_server_data but applying per-recipient spam thresholds in smtp_server_message_received. This pattern provides: - Efficient scanning (once per SMTP transaction) - Per-recipient policy decisions (VIP users, different domains, etc.) - Ability to reject before final message acceptance - Access to recipient-specific settings Changes: - policy-extras/rspamd.lua: Added "Per-Recipient Spam Thresholds" section with example code showing the two-stage pattern - examples/rspamd-integration.lua: Added Example 11 demonstrating per-recipient thresholds with VIP/staff/default tiers - integration-tests/src/rspamd.rs: Added test_rspamd_per_recipient_threshold test case validating the pattern works correctly
Added comprehensive config validation modeled after queue.lua: - Define valid Rspamd actions and module actions - Raise errors (not warnings) for unknown actions - Validate action parameter during setup() - Validate custom_handlers in validate_config hook - Validate quarantine settings - Validate base_url format Changes: - Added VALID_RSPAMD_ACTIONS and VALID_MODULE_ACTIONS tables - Added table_keys() helper function - Modified apply_action() to error on unknown actions - Added validation in setup() for action parameter - Added validate_config hook for deferred validation - Store configuration in mod.CONFIGURED for validation - Prevent double-configuration with error check This integrates with KumoMTA's --validate pipeline and provides clear error messages for configuration issues.
|
Ok, it seems to be more or less functional now I hope. |
wez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good, thank you!
A few more comments below!
- Fixed critical bug in smtp_server.rs where headers added in smtp_server_data hook were lost when trace headers were enabled. The code was using original message data instead of modified base_message data when batching messages. - Refactored Rspamd integration to use action-based logic instead of score-based: * 'no action' -> deliver normally (ham) * 'add header' -> add spam headers and deliver * 'rewrite subject' -> optionally rewrite subject and deliver * 'soft reject' -> temporary failure (451) for greylisting * 'reject' -> permanent failure (550) when reject_spam is enabled - Added support for Rspamd's smtp_message field for custom rejection messages - Removed reject_soft config option (doesn't make sense for spam) - Optimized message data handling to minimize copying - Added comprehensive integration tests with RspamdContainer helper
|
@wez it seems there might be an issue with headers rewriting in the Rust. Or maybe I don't understand the original logic... --- a/crates/kumod/src/smtp_server.rs
+++ b/crates/kumod/src/smtp_server.rs
@@ -2619,9 +2619,12 @@ impl SmtpServerSession {
)
};
- let mut body = Vec::with_capacity(data.len() + received.len());
+ // Use base_message.get_data() instead of original data to preserve
+ // any modifications made in smtp_server_data hook (e.g., Rspamd headers)
+ let base_data = base_message.get_data();
+ let mut body = Vec::with_capacity(base_data.len() + received.len());
body.extend_from_slice(received.as_bytes());
- body.extend_from_slice(&data);
+ body.extend_from_slice(&base_data);
Arc::new(body.into_boxed_slice())
} else {
base_message.get_data()This change looks logical to me but please tell me if I'm wrong. |
The recent message batching changes led to this bug; we were using the unchanged data payload instead of the data that may have been altered by lua callouts. refs: #426 (comment)
Oh, good catch! I pushed a fix for this in 42b65b2 |
wez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just submitting a couple of comments for now; I can see that you're active so I'll review again a little later!
- Update rspamd-client to 0.5.0 which accepts AsRef<[u8]> - Add MessageDataWrapper for zero-copy Arc<Box<[u8]>> passing - Eliminate message body allocation and copy in scan_async call - Use match guards for cleaner conditional logic in action handling - Fix error handling to use format string interpolation
|
Any updates? |
wez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this; sorry for the delay in reviewing since the last time, we've had a lot going on!
Most of my comments are rather nit-picky around the tests, but the core looks good.
There's a bit of tidying up to do to match the current rust-centric implementation as compared to the original more lua-centric approach.
There's something that I think we should do to enable "connection pooling" (or rather, client-object-caching), but I think that we can do that in a follow-up PR: I'd like to get this one buttoned up so that the caching can be in its own smaller PR.
- Simplify code: Use unwrap_or_default() instead of unwrap_or_else
- Simplify policy file references in integration tests
- Use .domain field for cleaner domain matching instead of regex
- Optimize symbols collection: Use Vec<&str> instead of Vec<String>
- Use named interpolation {subject} in format strings
- Fix async/await issues: Add await to msg.data(), prepend_header(), etc.
- Properly handle all Result types with map_err
- Update test_rspamd_reject_spam to use GTUBE pattern for actual spam testing
- Refactor tests to use daemon.extract_maildir_messages() and daemon.wait_for_maildir_count()
- Update rspamd.lua to relay to sink (consistent with other tests)
- Remove assets/policy-extras/rspamd.lua (functionality moved to Rust)
- Update examples/rspamd-integration.lua to use Rust-centric API
- Update mod-rspamd README to reflect Rust-native implementation
- Fix clippy warnings and format code
Add native Rspamd integration using rspamd-client crate v0.4.0.
Features:
New components:
The file header optimization could be particularly beneficial when Rspamd runs on the same host as KumoMTA, avoiding transmission of message bodies over the socket for significant performance improvement.