rspamd: add native integration with file header support #426

vstakhov · 2025-10-09T19:17:57Z

Add native Rspamd integration using rspamd-client crate v0.4.0.

Features:

File header optimization for local scanning (avoids message body transfer)
ZSTD compression enabled by default
HTTPCrypt encryption support
Automatic metadata extraction (IP, HELO, sender, recipient)
Flexible action handling (reject, tag, quarantine)

New components:

mod-rspamd: Rust module with Lua bindings
policy-extras/rspamd.lua: high-level Lua helper with setup() function
examples/rspamd-integration.lua: comprehensive usage examples

The file header optimization could be particularly beneficial when Rspamd runs on the same host as KumoMTA, avoiding transmission of message bodies over the socket for significant performance improvement.

Add native Rspamd integration using rspamd-client crate v0.3.0. Features: - File header optimization for local scanning (avoids message body transfer) - ZSTD compression enabled by default - HTTPCrypt encryption support - Automatic metadata extraction (IP, HELO, sender, recipient) - Flexible action handling (reject, tag, quarantine) - Proxy and TLS/mTLS support New components: - mod-rspamd: Rust module with Lua bindings - policy-extras/rspamd.lua: high-level Lua helper with setup() function - examples/rspamd-integration.lua: comprehensive usage examples The file header optimization is particularly beneficial when Rspamd runs on the same host as KumoMTA, avoiding transmission of message bodies over the socket for significant performance improvement.

wez

Thank you for this!
I have quite a few comments/suggestions on how to integrate this better, and some are because of changes that only got pushed this week, likely while you weren't looking!

For testing, I'd suggest looking at the redis and amqp integrations.

For amqp, we leverage https://docs.rs/testcontainers-modules/0.13.0/testcontainers_modules/rabbitmq/struct.RabbitMq.html to run a server in a docker container:

kumomta/crates/integration-tests/src/rabbit.rs

Lines 97 to 101 in f582d60

    
           #[tokio::test] 
        
           async fn test_amqprs_rabbit() -> anyhow::Result<()> { 
        
               if std::env::var("KUMOD_TESTCONTAINERS").unwrap_or_else(|_| String::new()) != "1" { 
        
                   return Ok(()); 
        
               }

For redis, we launch an isolated copy of the redis server ourselves (no need for a container), on a port that we allocate dynamically:

kumomta/crates/mod-redis/src/test.rs

Lines 31 to 49 in f582d60

    
           pub async fn spawn(extra_config: &str) -> anyhow::Result<Self> { 
        
               let mut errors = vec![]; 
        
               for _ in 0..2 { 
        
                   let port = allocate_port(); 
        
                   match timeout( 
        
                       Duration::from_secs(5), 
        
                       Self::spawn_with_port(port, extra_config), 
        
                   ) 
        
                   .await? 
        
                   { 
        
                       Ok(me) => return Ok(me), 
        
                       Err(err) => { 
        
                           errors.push(format!("{err:#}")); 
        
                       } 
        
                   } 
        
               } 
        
               anyhow::bail!("failed to spawn redis-server: {}", errors.join(". ")); 
        
           }

What I think we should have for rspamd testing is similar to those: if we find rspamd installed in a reasonable path, then we can spawn it with an isolated configuration from a unit test, then run with a policy file that will trigger rspamd testing appropriately. Look at:

kumomta/crates/integration-tests/src/test/queue_ndr.rs

Lines 13 to 17 in f582d60

    
           let mut daemon = DaemonWithMaildirOptions::new() 
        
               .policy_file("ndr.lua") 
        
               .start() 
        
               .await 
        
               .context("DaemonWithMaildir::start")?;

for an example of an integration test that uses a custom policy. That particular one loads in https://github.com/KumoCorp/kumomta/blob/main/crates/integration-tests/ndr.lua

assets/policy-extras/rspamd.lua

crates/mod-rspamd/src/lib.rs

assets/policy-extras/rspamd.lua

crates/mod-rspamd/src/lib.rs

Add support for Rspamd milter actions including: - Add headers (X-Spam-*, custom headers, etc.) - Remove headers by name - Subject line rewriting New functions: - apply_milter_actions(): Process milter add_headers and remove_headers - Updated apply_action() to automatically apply milter modifications The milter actions are applied automatically when using rspamd:setup() or rspamd.apply_action(). Users can also call apply_milter_actions() independently for fine-grained control. Note: Body rewriting is not currently supported by rspamd-client crate. Header modifications (including Subject rewriting) are fully functional. Added comprehensive example (Example 5) demonstrating milter usage.

Upgrade to rspamd-client 0.4.0 which adds support for message body rewriting via the body_block feature. Changes in mod-rspamd: - Add body_block field to EnvelopeDataLua - Pass body_block flag to rspamd-client when requested - Expose rewritten_body in scan results Changes in Lua helper: - Add body_block option to extract_metadata() - New function apply_body_rewrite() to replace message body - Update apply_action() to handle body rewriting automatically - Body rewrite is applied before milter actions - Update setup() to support body_block configuration option Features: - When body_block is enabled, Rspamd returns the full rewritten message - Automatic application via rspamd:setup({body_block = true}) - Manual control via rspamd.apply_body_rewrite(msg, result) - Body rewriting includes all modifications (headers + body) Examples added: - Example 6: Manual body rewriting with inspection - Example 7: Automatic body rewriting with setup() - Updated Example 9: Direct API usage with body_block The body rewriting feature is useful for Rspamd modules that modify message content beyond headers, such as: - Stripping dangerous attachments - Adding email disclaimers - Modifying MIME structure - Content sanitization

- Update rspamd-client dependency to 0.4.1 - Replace set_first_named_header with prepend_header throughout - Fix Subject header handling to use remove_all_named_headers + prepend_header - Refactor EnvelopeDataLua to use #[derive(FromLua)] macro - Add "macros" feature to mlua dependency - Change additional_headers to Option<HashMap<String, String>> - Remove manual from_table() implementation - Add KeySource support for TLS parameters - Add data-loader dependency - Replace tls_cert_path/tls_key_path/tls_ca_path with KeySource types - Support File variant with helpful errors for Data/Vault variants

Add kumo.rspamd.scan_message() function that provides a streamlined interface for Rspamd integration: - Add message crate dependency for direct Message object access - Refactor RspamdClientConfig to use make_client_config() method - Add default action configuration flags: - add_headers: Add X-Spam-* headers (default: true) - reject_spam: Reject spam messages (default: false) - reject_soft: Use 4xx instead of 5xx rejection (default: false) - prefix_subject: Prefix subject with ***SPAM*** (default: false) - Automatically extract envelope metadata from Message object: - sender, recipients, IP, user, HELO, hostname - Apply default actions based on configuration - Call kumo.reject via Lua runtime for rejection handling - Add examples showing simplified API usage with encryption This simplifies Rspamd integration by eliminating the need for manual metadata extraction and boilerplate action handling.

Add comprehensive integration tests for Rspamd integration: - test_rspamd_scan_message: Basic scan functionality - Starts Rspamd container using testcontainers - Configures KumoMTA with scan_message API - Sends test message and verifies scan occurs - Validates message delivery after scan - test_rspamd_reject_spam: Rejection behavior - Tests reject_spam configuration flag - Verifies normal messages are accepted - Confirms spam rejection when enabled - test_rspamd_headers: Header addition - Validates X-Spam-* headers are added - Checks X-Spam-Flag, X-Spam-Score, X-Spam-Action - Verifies headers appear in processed messages Tests follow the same pattern as rabbit.rs and use the KUMOD_TESTCONTAINERS environment variable for opt-in execution. Uses GenericImage with rspamd/rspamd:latest Docker image.

- Add health check to verify Rspamd is ready before running tests - Define required spool directories in test configurations - Increase container startup timeout to 5 seconds - Poll Rspamd /ping endpoint to ensure service availability This eliminates test failures due to timing issues where tests attempted to scan messages before Rspamd was fully initialized.

- Change base_url from String to url::Url for validation - Change sensitive fields to KeySource for secure storage: - encryption_key: Option<KeySource> - password: Option<KeySource> - proxy_username: Option<KeySource> - proxy_password: Option<KeySource> - Add helper to extract strings from KeySource (File/Data/Vault) - Update example to demonstrate KeySource usage - Add url crate dependency This provides stronger compile-time validation of URLs and better security through flexible credential storage options.

Changed from smtp_server_message_received to smtp_server_data event for Rspamd scanning to provide: - Efficient per-batch scanning (scans once per SMTP transaction) - Avoids duplicate scans when messages have multiple recipients - Better SMTP protocol alignment (reject before final acceptance) Updated: - policy-extras/rspamd.lua: Use smtp_server_data in documentation - examples/rspamd-integration.lua: All 10 examples now use smtp_server_data - integration tests: Updated to use smtp_server_data - extract_metadata(): Uses recipient_list() to handle batch recipients - Added comments explaining per-batch scanning benefits The smtp_server_data event fires during DATA phase before message batching, allowing single scan per SMTP transaction regardless of recipient count. Metadata is stored and available for later events.

Added documentation and examples for scanning once in smtp_server_data but applying per-recipient spam thresholds in smtp_server_message_received. This pattern provides: - Efficient scanning (once per SMTP transaction) - Per-recipient policy decisions (VIP users, different domains, etc.) - Ability to reject before final message acceptance - Access to recipient-specific settings Changes: - policy-extras/rspamd.lua: Added "Per-Recipient Spam Thresholds" section with example code showing the two-stage pattern - examples/rspamd-integration.lua: Added Example 11 demonstrating per-recipient thresholds with VIP/staff/default tiers - integration-tests/src/rspamd.rs: Added test_rspamd_per_recipient_threshold test case validating the pattern works correctly

Added comprehensive config validation modeled after queue.lua: - Define valid Rspamd actions and module actions - Raise errors (not warnings) for unknown actions - Validate action parameter during setup() - Validate custom_handlers in validate_config hook - Validate quarantine settings - Validate base_url format Changes: - Added VALID_RSPAMD_ACTIONS and VALID_MODULE_ACTIONS tables - Added table_keys() helper function - Modified apply_action() to error on unknown actions - Added validation in setup() for action parameter - Added validate_config hook for deferred validation - Store configuration in mod.CONFIGURED for validation - Prevent double-configuration with error check This integrates with KumoMTA's --validate pipeline and provides clear error messages for configuration issues.

vstakhov · 2025-10-12T14:05:56Z

Ok, it seems to be more or less functional now I hope.

wez

This is looking really good, thank you!
A few more comments below!

crates/integration-tests/src/rspamd.rs

crates/mod-rspamd/src/lib.rs

- Fixed critical bug in smtp_server.rs where headers added in smtp_server_data hook were lost when trace headers were enabled. The code was using original message data instead of modified base_message data when batching messages. - Refactored Rspamd integration to use action-based logic instead of score-based: * 'no action' -> deliver normally (ham) * 'add header' -> add spam headers and deliver * 'rewrite subject' -> optionally rewrite subject and deliver * 'soft reject' -> temporary failure (451) for greylisting * 'reject' -> permanent failure (550) when reject_spam is enabled - Added support for Rspamd's smtp_message field for custom rejection messages - Removed reject_soft config option (doesn't make sense for spam) - Optimized message data handling to minimize copying - Added comprehensive integration tests with RspamdContainer helper

vstakhov · 2025-10-14T09:26:54Z

@wez it seems there might be an issue with headers rewriting in the Rust. Or maybe I don't understand the original logic...

--- a/crates/kumod/src/smtp_server.rs
+++ b/crates/kumod/src/smtp_server.rs
@@ -2619,9 +2619,12 @@ impl SmtpServerSession {
                     )
                 };

-                let mut body = Vec::with_capacity(data.len() + received.len());
+                // Use base_message.get_data() instead of original data to preserve
+                // any modifications made in smtp_server_data hook (e.g., Rspamd headers)
+                let base_data = base_message.get_data();
+                let mut body = Vec::with_capacity(base_data.len() + received.len());
                 body.extend_from_slice(received.as_bytes());
-                body.extend_from_slice(&data);
+                body.extend_from_slice(&base_data);
                 Arc::new(body.into_boxed_slice())
             } else {
                 base_message.get_data()

This change looks logical to me but please tell me if I'm wrong.

The recent message batching changes led to this bug; we were using the unchanged data payload instead of the data that may have been altered by lua callouts. refs: #426 (comment)

wez · 2025-10-15T06:09:44Z

@wez it seems there might be an issue with headers rewriting in the Rust. Or maybe I don't understand the original logic...

--- a/crates/kumod/src/smtp_server.rs
+++ b/crates/kumod/src/smtp_server.rs
@@ -2619,9 +2619,12 @@ impl SmtpServerSession {
                     )
                 };

-                let mut body = Vec::with_capacity(data.len() + received.len());
+                // Use base_message.get_data() instead of original data to preserve
+                // any modifications made in smtp_server_data hook (e.g., Rspamd headers)
+                let base_data = base_message.get_data();
+                let mut body = Vec::with_capacity(base_data.len() + received.len());
                 body.extend_from_slice(received.as_bytes());
-                body.extend_from_slice(&data);
+                body.extend_from_slice(&base_data);
                 Arc::new(body.into_boxed_slice())
             } else {
                 base_message.get_data()

This change looks logical to me but please tell me if I'm wrong.

Oh, good catch! I pushed a fix for this in 42b65b2
Thanks!

wez

Just submitting a couple of comments for now; I can see that you're active so I'll review again a little later!

crates/mod-rspamd/src/lib.rs

- Update rspamd-client to 0.5.0 which accepts AsRef<[u8]> - Add MessageDataWrapper for zero-copy Arc<Box<[u8]>> passing - Eliminate message body allocation and copy in scan_async call - Use match guards for cleaner conditional logic in action handling - Fix error handling to use format string interpolation

vstakhov · 2025-10-29T13:57:18Z

Any updates?

wez

Thanks for this; sorry for the delay in reviewing since the last time, we've had a lot going on!

Most of my comments are rather nit-picky around the tests, but the core looks good.
There's a bit of tidying up to do to match the current rust-centric implementation as compared to the original more lua-centric approach.

There's something that I think we should do to enable "connection pooling" (or rather, client-object-caching), but I think that we can do that in a follow-up PR: I'd like to get this one buttoned up so that the caching can be in its own smaller PR.

crates/integration-tests/src/rspamd.rs

crates/integration-tests/rspamd.lua

crates/mod-rspamd/src/lib.rs

assets/policy-extras/rspamd.lua

examples/rspamd-integration.lua

crates/mod-rspamd/README.md

- Simplify code: Use unwrap_or_default() instead of unwrap_or_else - Simplify policy file references in integration tests - Use .domain field for cleaner domain matching instead of regex - Optimize symbols collection: Use Vec<&str> instead of Vec<String> - Use named interpolation {subject} in format strings - Fix async/await issues: Add await to msg.data(), prepend_header(), etc. - Properly handle all Result types with map_err - Update test_rspamd_reject_spam to use GTUBE pattern for actual spam testing - Refactor tests to use daemon.extract_maildir_messages() and daemon.wait_for_maildir_count() - Update rspamd.lua to relay to sink (consistent with other tests) - Remove assets/policy-extras/rspamd.lua (functionality moved to Rust) - Update examples/rspamd-integration.lua to use Rust-centric API - Update mod-rspamd README to reflect Rust-native implementation - Fix clippy warnings and format code

MHillyer requested a review from wez October 9, 2025 19:50

wez reviewed Oct 10, 2025

View reviewed changes

vstakhov added 11 commits October 10, 2025 08:52

Merge branch 'main' into main

13f95ba

vstakhov marked this pull request as ready for review October 12, 2025 14:05

vstakhov requested a review from wez October 12, 2025 14:28

wez reviewed Oct 13, 2025

View reviewed changes

vstakhov requested a review from wez October 14, 2025 09:27

Merge branch 'main' into main

2e09852

wez added a commit that referenced this pull request Oct 15, 2025

smtp_server: fixup data content

42b65b2

The recent message batching changes led to this bug; we were using the unchanged data payload instead of the data that may have been altered by lua callouts. refs: #426 (comment)

vstakhov added 2 commits October 15, 2025 07:33

Merge remote-tracking branch 'origin/main'

89a15f4

Revert smtp_server.rs changes after merge with origin/main

065cb46

wez reviewed Oct 15, 2025

View reviewed changes

crates/mod-rspamd/src/lib.rs Outdated Show resolved Hide resolved

crates/mod-rspamd/src/lib.rs Outdated Show resolved Hide resolved

crates/mod-rspamd/src/lib.rs Show resolved Hide resolved

crates/mod-rspamd/src/lib.rs Outdated Show resolved Hide resolved

vstakhov requested a review from wez October 15, 2025 09:34

wez reviewed Nov 10, 2025

View reviewed changes

vstakhov added 2 commits November 11, 2025 11:11

Merge branch 'main' into main

7d284aa

wez mentioned this pull request Nov 14, 2025

Integrate NATS client #449

Open

	#[tokio::test]
	async fn test_amqprs_rabbit() -> anyhow::Result<()> {
	if std::env::var("KUMOD_TESTCONTAINERS").unwrap_or_else(\|_\| String::new()) != "1" {
	return Ok(());
	}

	pub async fn spawn(extra_config: &str) -> anyhow::Result<Self> {
	let mut errors = vec![];

	for _ in 0..2 {
	let port = allocate_port();
	match timeout(
	Duration::from_secs(5),
	Self::spawn_with_port(port, extra_config),
	)
	.await?
	{
	Ok(me) => return Ok(me),
	Err(err) => {
	errors.push(format!("{err:#}"));
	}
	}
	}
	anyhow::bail!("failed to spawn redis-server: {}", errors.join(". "));
	}

	let mut daemon = DaemonWithMaildirOptions::new()
	.policy_file("ndr.lua")
	.start()
	.await
	.context("DaemonWithMaildir::start")?;

Uh oh!

rspamd: add native integration with file header support #426

Are you sure you want to change the base?

rspamd: add native integration with file header support #426

Uh oh!

Conversation

vstakhov commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wez left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vstakhov commented Oct 12, 2025

Uh oh!

wez left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vstakhov commented Oct 14, 2025

Uh oh!

wez commented Oct 15, 2025

Uh oh!

wez left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vstakhov commented Oct 29, 2025

Uh oh!

wez left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vstakhov commented Oct 9, 2025 •

edited

Loading