Conversation
Force-pushed from 7e99dc8 to 946d047 (compare)
Force-pushed from 946d047 to 9d228a4 (compare)
tcp dport 8551 accept comment "Consensus Client Engine API"
tcp dport 30303 accept comment "Execution Client P2P"
tcp dport { 8545, 8546 } accept comment "Execution Client RPC"
tcp dport 27017 accept comment "Searcher Input Channel"
What's listening on this one?

Based on Angela's doc, it's the searcher's input channel.
ct state established,related accept

# Production mode rules
tcp dport 8551 accept comment "Consensus Client Engine API"
Hm, should we be accepting requests to the CL API apart from p2p? I think Lighthouse listens on 9000 by default; that should probably be listened for and accepted too.

Yeah, you're right, but I was following the table in Angela's doc here. I guess this is specific to the deployment of the searcher stuff within podman. We need to sync with Angela for more context on that doc.

Spoke with the searcher; they prefer to run the CL on their machine too. So let's remove 8551, open 9000 on our trusted CL node outside, and forward it to 9000 on the searcher's machine.
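A rough sketch of what that forwarding could look like in nftables; the searcher machine's address (10.0.0.2 here) and the rule placement are assumptions, not the actual conf:

```
# illustrative only: forward inbound CL P2P to the searcher's machine
table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
        tcp dport 9000 dnat to 10.0.0.2:9000 comment "CL P2P to searcher"
    }
}
```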
# Production mode rules
tcp dport 30303 accept comment "Execution Client P2P"
tcp dport { 8545, 8546 } accept comment "Execution Client RPC"
Hm, should we accept EL RPC from anyone external? I wonder if there's a good way for us to restrict RPC to only between containers.

I'm not sure about that, but I guess princess is running an EL node within podman too; at least that was my impression from Angela's doc. However, this is just a draft and not the final nftables confs. I'd ask @bakhtin to check them out and provide me with the conf we usually require on production instances instead.

We wanted to open 8545 for the hackathon so we wouldn't have to wait for their EL node inside to sync, but I'm removing this from the spec now.
I guess one caveat is that this only works if the untrusted user themselves cannot listen on those ports. That can be the policy, but it should be up front and obvious that it's a requirement.

I'm guessing the script enabling and disabling maintenance mode is WIP?

I wonder if we can give the user containers explicit networks: https://github.com/containers/podman/blob/main/docs/tutorials/basic_networking.md

One interesting thing is putting containers in a single pod (see "Communicate between two rootless containers in a pod"); that way they can communicate over localhost, and we would not have to open any of the ports on the host machine.

Yes, the maintenance/production mode switching script is the next step after finalizing the nftables firewalling. I guess we need both the nftables firewalling for root/user and the explicit network configuration for the user's containers.

Interesting, we could look into that approach, but I'm not sure what the hard requirements for the princess use case are. Wouldn't their LH/EL nodes within the pod containers require external access?
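For reference, the single-pod approach from the linked tutorial could look roughly like this; the pod name, published port, and image names are placeholders:

```shell
# create a pod; only ports published here are reachable from outside
podman pod create --name searcher-pod -p 30303:30303

# both containers share the pod's network namespace, so they can talk
# to each other over localhost without opening any ports on the host
podman run -d --pod searcher-pod --name el registry.example/el-client
podman run -d --pod searcher-pod --name cl registry.example/cl-client
```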
Force-pushed from c329e38 to 7243a78 (compare)
Force-pushed from 7243a78 to 41a607d (compare)
if (!wait_for_namespace()) {
    fprintf(stderr, "Network namespace not available, running in default namespace\n");
    execl("/bin/sh", "sh", NULL);
    perror("execl");
    return 1;
We should probably just crash if the namespace is not available and not allow any ssh login, as a drastic security measure.
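A sketch of the fail-closed variant; `wait_for_namespace` is stubbed out here to simulate the failure case, and the function name `login_gate` is a placeholder:

```c
#include <stdio.h>

/* Hypothetical stand-in: the real wait_for_namespace() polls for the
 * searcher network namespace; here it simulates the failure case. */
static int wait_for_namespace(void) { return 0; }

/* Fail closed: if the namespace never appears, refuse the login
 * entirely instead of falling back to a shell in the default
 * namespace, where live data could be reachable. */
static int login_gate(void) {
    if (!wait_for_namespace()) {
        fprintf(stderr, "network namespace not available, refusing login\n");
        return 1; /* caller exits; no execl("/bin/sh", ...) fallback */
    }
    return 0;
}
```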
// Temporarily elevate privileges
if (setuid(0) != 0) {
    perror("setuid");
    return 1;
}
Unfortunately, despite setting the SUID bit on the binary so it executes with root privileges, it didn't work: entering the namespace requires root privileges and was returning "operation not permitted". So we need this hack here, where we escalate privileges just to enter the namespace and then drop back to the searcher user in line 80.

I'd add dropping privileges as an atexit handler, just in case.

There should be some logic that puts the machine in automatic maintenance mode if the nftables firewall fails to be set up correctly. Ideally, maintenance mode shouldn't require firewall rules to work, so if the firewall setup fails for some reason, we're still in a situation where no live data can be leaked.
    return 0;
}

int enter_namespace() {
What are the restrictions of this namespace? Is it only network-wise?

This is all described in detail in the searcher-net-init script below.
One of the main headaches I have with this approach is entering the namespace. Could you try setting up a wrapper shell that calls ns-enter?

Yeah, I simplified this yesterday through the dropbear configuration, which now enters a similar wrapper shell for the transition.
define BUILDER_ENDPOINT = 0.0.0.0 # Replace with actual builder endpoint IP

table inet filter {
    chain input {
Should we not accept by default, and drop everything coming from & going to BUILDER_ENDPOINT?

I thought of this too, but I think in maintenance mode the searcher would need this to sync their stuff when deploying the podman setup for their EL/CL. Wouldn't they need to send to and receive from peer connections? That's why I only opened those and dropped everything else, including the state diff stream, which is sensitive and not necessary in the maintenance stage.
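A sketch of that maintenance-mode stance, illustrative only (the ports follow the tables above; everything else, including the state diff stream, falls through to the drop policy):

```
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        tcp dport 30303 accept comment "Execution Client P2P"
        udp dport 30303 accept comment "Execution Client discovery"
        # no builder endpoint rules here: state diff stream stays closed
    }
}
```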
ct state established,related accept

# Allow only builder endpoint connections to these ports
ip saddr $BUILDER_ENDPOINT tcp dport 9000 accept comment "FB-operated Consensus Client P2P"
I guess "Builder-operated..." would be more accurate.

Yeah, I took the naming from the table in the document, but that can be changed for sure.
type filter hook output priority 0; policy drop

# Allow established/related connections
ct state established,related accept
Hm, if a searcher establishes a connection to their own server during maintenance, wouldn't this rule let them continue sending stuff? Or are you maybe dropping the established connections on the transition to production?

I'm currently dropping/killing everything the searcher establishes during the transition, but that's also why I want a way to test this somehow. It's kind of difficult to check which connections to keep established and which to drop. Also, we need to make sure that an undesired established connection doesn't re-establish itself after being terminated during the transition. That shouldn't be possible, though, because the transition nftables configuration should drop anything not explicitly allowed.
ip daddr $BUILDER_ENDPOINT tcp dport 8645 accept comment "Builder RPC"

# Allow fluent-bit to access log endpoint
ip daddr $SEARCHER_LOG_HOST tcp dport $SEARCHER_LOG_PORT accept comment "Logs Collector"
Should this be only from the root network ns?

Yes, only root should be able to access this and intercept what is sent/received over that address:port.
ip daddr $SEARCHER_LOG_HOST tcp dport $SEARCHER_LOG_PORT accept comment "Logs Collector"

# Allow traffic to searcher namespace
oifname "veth0" accept comment "Searcher network interface"
Isn't veth0 in the root ns? I think the rule is correct but the comment is not:

# Searcher Namespace       Root Namespace        External
# [veth1 10.0.0.2] <---> [veth0 10.0.0.1] ---> [eth0]

IIUC, this opens that port to the outside, so it's more like veth0 -> eth0 -> external world. Internally, veth1 is connected to veth0, and anything from the searcher trying to reach that address will be intercepted on veth0 and handled by our delayed fluent-bit.
flush ruleset

table inet filter {
    chain input {
I'd guess we should allow EL and CL.

Yes, we probably should keep those EL/CL connections. I believe I set this up as a strict transition configuration and then re-allow them. I have no strong opinion on this, so we can re-add the EL/CL rules here, without the state diff of course.
// Wait for processes to terminate
sleep(2);

// Force kill remaining processes
Hm, why are you killing processes? Wouldn't dropping connections be enough?

I tried dropping, but the already-established connections were still intact; after several tries with different approaches, killing was what did it. I'm definitely open to suggestions on how, and which, connections to drop, though.
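One alternative to killing processes, assuming conntrack-tools is available, is to flush the connection tracking table after loading the strict transition rules; once the old flows no longer match established,related, the default-drop policy keeps them from re-establishing:

```
# sketch: load the strict transition rule set, then drop tracked flows
nft -f /etc/nftables-transition.conf
conntrack -F    # flush all conntrack entries; existing flows lose state
```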
long pid;

// Elevate privileges
if (setuid(0) != 0 || setgid(0) != 0) {
I'd register reverting the privileges as an atexit handler before elevating them:
https://en.cppreference.com/w/c/program/atexit

Interesting, yeah. We can abstract the privilege drop into a separate function and wrap it with atexit.
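A minimal sketch of that, assuming the controller runs as a SUID-root binary; the function names are placeholders:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Revert to the real (unprivileged) uid/gid of the invoking user.
 * Registered with atexit() *before* elevating, so the drop happens on
 * every exit path, including early error returns. */
static void drop_privileges(void) {
    if (setgid(getgid()) != 0 || setuid(getuid()) != 0)
        _exit(EXIT_FAILURE); /* refuse to keep running (or exiting) as root */
}

static int elevate(void) {
    if (atexit(drop_privileges) != 0) {
        fprintf(stderr, "atexit registration failed\n");
        return -1;
    }
    /* needs the SUID-root bit on the binary to succeed */
    if (setuid(0) != 0) {
        perror("setuid");
        return -1;
    }
    return 0;
}
```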
// Temporarily elevate privileges
if (setuid(0) != 0) {
    perror("setuid");
    return 1;
}
I'd add dropping privileges as an atexit handler here as well, just in case.
    return 1;
}

if (strcmp(argv[1], "transition-to-maintenance") == 0) {
General comment: strcmp is a buffer overrun waiting to happen. I'd strongly recommend strnlen_s + strncmp for all the comparisons (or C++, but it's a bit too late for that).

I can refactor it all to C++ or a different language like Go if necessary. First I just wanted to quickly have something running that does the logic we need in a compiled binary. I avoided Rust for the sake of delivery speed, but we can also consider it if needed.
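A bounded-comparison sketch; note that glibc does not ship the Annex K `strnlen_s`, so POSIX `strnlen` is used here instead, and the helper name `is_cmd` is an assumption:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Bounded command comparison: reads at most cmd_len + 1 bytes of
 * `arg`, and rejects both prefixes and longer strings. */
static int is_cmd(const char *arg, const char *cmd, size_t cmd_len) {
    return strnlen(arg, cmd_len + 1) == cmd_len &&
           strncmp(arg, cmd, cmd_len) == 0;
}
```

Call sites would then look like `is_cmd(argv[1], "transition-to-maintenance", strlen("transition-to-maintenance"))`.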
}

chain output {
    type filter hook output priority 0; policy drop
I'd guess established output is alright?

During the transition, you need to make sure that the killed undesired connections do not re-establish themselves. So anything not explicitly allowed should, by default, not be able to establish during the transition.
# Configure veth1 inside the namespace
log "Configuring veth1 in namespace $NAMESPACE..."
if ! ip netns exec "$NAMESPACE" ip addr add 10.0.0.2/24 dev veth1; then
Shouldn't this be 10.0.0.2/32 or 10.0.2.0/24? Similar comment for the 10.0.0.1.

Could you elaborate more on why?
}

// Apply final rules
snprintf(cmd, sizeof(cmd), "/usr/sbin/nft -f /etc/nftables-%s.conf", to_mode);
I'd avoid reusing the buffer here.

I agree. In this case the size is hardcoded, but I see that we don't clear the buffer, which indeed might be a problem. I'll refactor it later on.
    return ret;
}

static int execute_transition_script(const char *from_mode, const char *to_mode) {
I'd guess the function would be simpler if you hardcoded the transitions as one function each.

I had them as one function each, but it became annoying: each time I had to fix an issue/bug, I had to handle it in each one separately due to the duplication. So here I kept the common code. I can separate them again if we find that better or more readable.
uid_t real_gid = getgid();

log_message(LOG_INFO, "Applying transition rules...");
if (setuid(0) != 0) {
I don't think you drop the privileges here.

Good catch. This only elevates; down below it should drop them again.
}

static int read_current_mode(char *mode, size_t size) {
    if (acquire_lock() != 0) {
I'd probably lock the whole execution; I don't think concurrent access is handled.

So you mean not at the function level, but directly in main: lock it and only unlock it when finished?
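Right: a sketch of locking the whole run with `flock(2)`, taken once at the top of main; the lock file path and function name are placeholders. The kernel releases the lock when the process exits, so no explicit unlock is needed on any path:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Take one exclusive lock for the whole run. Every code path between
 * acquisition and process exit is serialized, not just individual
 * functions like read_current_mode(). */
static int acquire_run_lock(const char *path) {
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        perror("flock"); /* another instance is already running */
        close(fd);
        return -1;
    }
    return fd; /* keep the fd open until exit; kernel releases the lock */
}
```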
export PS1='searcher@tdx:\w\$ '

# Check current mode and handle transition if needed
mode=$(mode-controller status | awk '{print $NF}')
An alternative would be to not read the mode here, and instead call mode-controller with no arguments and let it figure out the mode. I think /usr/bin/ns-enter can be called regardless of mode, which is super convenient.

I had it all in mode-controller originally, but there were issues with the user not being set correctly. Also, the execution status check was not working correctly when ssh-ing into the machine while in production mode. That's why we need this check in the login shell, so it runs before handing off to the mode-controller.