Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smb hashmap/v9 #12036

Closed
wants to merge 11 commits into from
Closed

Smb hashmap/v9 #12036

wants to merge 11 commits into from

Conversation

victorjulien
Copy link
Member

LruCache use for all hashmaps in smb state. This will bound each of them.

https://redmine.openinfosecfoundation.org/issues/5672

Replaces #12028.

Don't tag the session as gap'd when the GAP is in a precise location:

1. in "skip" data, where the GAP just fits the skip data

2. in file data, where we pass the GAP on to the file

This reduces load of GAP post-processing that is unnecessary in these
case.
Use `lru` crate. Rename to reflect this.

Add `app-layer.protocols.smb.max-guid-cache-size` to control the max
size of the LRU cache.

Ticket: OISF#5672.
Rename to read_offset_cache.

Add `app-layer.protocols.smb.max-read-offset-cache-size` option to
control the limit.

Ticket: OISF#5672.
Turn the map mapping the smb session key to smb tree into a lru cache,
limited to 1024 by default.

Add `app-layer.protocols.smb.max-tree-cache-size` option to control the
limit.

Ticket: OISF#5672.
Reimplement the ssnguid2vec_map HashMap as a LruCache.

Since this is a DCERPC record cache, name it as such.

Default size is 128. Can be controlled by
`app-layer.protocols.smb.max-dcerpc-frag-cache-size`.

Ticket: OISF#5672.
Generic ssn2vec_map was a HashMap used for mapping session key to
different types of vector data:
- GUID
- filename
- share name

Turn this into a bounded LruCache. Rename to ssn2vec_cache.

Size of the cache is 512 by default, and can be configured using:

`app-layer.protocols.smb.max-session-cache-size`

Ticket: OISF#5672.
Copy link

codecov bot commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 86.90096% with 41 lines in your changes missing coverage. Please review.

Project coverage is 83.40%. Comparing base (89aa525) to head (2684860).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12036      +/-   ##
==========================================
- Coverage   83.42%   83.40%   -0.02%     
==========================================
  Files         910      910              
  Lines      257642   257671      +29     
==========================================
- Hits       214934   214922      -12     
- Misses      42708    42749      +41     
Flag Coverage Δ
fuzzcorpus 61.55% <86.58%> (-0.02%) ⬇️
livemode 19.41% <3.19%> (+<0.01%) ⬆️
pcap 44.40% <73.48%> (-0.10%) ⬇️
suricata-verify 62.72% <74.44%> (-0.05%) ⬇️
unittests 59.36% <15.01%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa
Copy link

Information: QA ran without warnings.

Pipeline 23185

SCLogError!("Invalid max-guid-cache-size value");
}
}
let retval = conf_get("app-layer.protocols.smb.max-read-offset-cache-size");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some get_as_usize could be useful here :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be done as a future optimization.

@@ -80,6 +81,16 @@ pub static mut SMB_CFG_MAX_READ_QUEUE_CNT: u32 = 64;
pub static mut SMB_CFG_MAX_WRITE_SIZE: u32 = 16777216;
pub static mut SMB_CFG_MAX_WRITE_QUEUE_SIZE: u32 = 67108864;
pub static mut SMB_CFG_MAX_WRITE_QUEUE_CNT: u32 = 64;
/// max size of the per state guid2name cache
pub static mut SMB_CFG_MAX_GUID_CACHE_SIZE: usize = 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note, but it would be good to have something like lazy_static to say, yes this is a global variable, but once initialization is done, it is a constant and it can be read safely...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be done as a future optimization.

@@ -28,13 +28,14 @@
use std;
use std::str;
use std::ffi::{self, CString};

use std::collections::HashMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great overall

pub struct SMBState<> {
pub state_data: AppLayerStateData,

/// map ssn/tree/msgid to vec (guid/name/share)
pub ssn2vec_map: HashMap<SMBCommonHdr, Vec<u8>>,
pub ssn2vec_cache: LruCache<SMBCommonHdr, Vec<u8>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next question after this work :
So, this work bounds the number of elements by flow (great), but is the size of each element bounded ? Can this Vec and other ones like guid2name_cache: LruCache<Vec<u8>, Vec<u8>> have arbitrary big vectors ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question for a follow up ticket.

@catenacyber
Copy link
Contributor

This looks great overall.
I did not look yet into the details of whether we are removing elements from the caches every time we should. (wondering about gaps, and when only one side of the traffic is seen)
How did you test this ?

Comment on lines +1694 to +1717

::

smb:
max-guid-cache-size: 1024
max-rec-offset-cache-size: 128
max-tree-cache-size: 512
max-dcerpc-frag-cache-size: 128
max-session-cache-size: 512

The `max-guid-cache-size` setting controls the size of the hash that maps the GUID to
filenames. These are added through CREATE commands and removed by CLOSE commands.

`max-rec-offset-cache-size` controls the size of the hash that maps the READ offset
from READ commands to the READ responses.

The `max-tree-cache-size` option contols the size of the SMB session to SMB tree hash.

`max-dcerpc-frag-cache-size` controls the size of the hash that tracks partial DCERPC
over SMB records. These are buffered in this hash to only parse the DCERPC record when
it is fully reassembled.

The `max-session-cache-size` setting controls the size of a generic hash table that maps
SMB session to filenames, GUIDs and share names.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any additional comments here on the suitability of the defaults would be great.. Conservative? Good for average use? etc. Why they were chosen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly random. I don't have a good env to test this stuff with. Want to make some ways to observe the sizes.

@victorjulien
Copy link
Member Author

This looks great overall. I did not look yet into the details of whether we are removing elements from the caches every time we should. (wondering about gaps, and when only one side of the traffic is seen) How did you test this ?

Large pcaps with a debug print to see the sizes, and it stays in bounds.

@victorjulien victorjulien mentioned this pull request Nov 5, 2024
@victorjulien
Copy link
Member Author

Replaced by #12087

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants