Commit 5a508c1

maltesander and siegfriedweber authored Apr 3, 2025
Fix: nifi reduce image size (#1027)
* reduce size
* adapt changelog
* add check for ownership / permission
* set permissions
* use check permissions script
* newline
* fixes
* fixes
* fixes
* added purpose and usage
* fix
* typo
* improve comments
* linter
* linter 2
* Apply suggestions from code review

Co-authored-by: Siegfried Weber <mail@siegfriedweber.net>

* consolidate bash commands
* fix if else
* move check-permissions-ownership.sh to stackable-base image /bin
* adapted changelog
* Apply suggestions from code review

Co-authored-by: Siegfried Weber <mail@siegfriedweber.net>

---------

Co-authored-by: Siegfried Weber <mail@siegfriedweber.net>
1 parent c654ba8 commit 5a508c1

5 files changed, +167 -63 lines changed


‎CHANGELOG.md

+4
@@ -7,16 +7,20 @@ All notable changes to this project will be documented in this file.
 ### Added
 
 - spark-connect-client: A new image for Spark connect tests and demos ([#1034])
+- nifi: check for correct permissions and ownerships in /stackable folder via
+`check-permissions-ownership.sh` provided in stackable-base image ([#1027]).
 
 ### Changed
 
 - spark-k8s: Include spark-connect jars. Replace OpenJDK with Temurin JDK. Cleanup. ([#1034])
 
 ### Fixed
 
+- nifi: reduce docker image size by removing the recursive chown/chmods in the final image ([#1027]).
 - spark-k8s: reduce docker image size by removing the recursive chown/chmods in the final image ([#1042]).
 - Add `--locked` flag to `cargo install` commands for reproducible builds ([#1044]).
 
+[#1027]: https://github.com/stackabletech/docker-images/pull/1027
 [#1034]: https://github.com/stackabletech/docker-images/pull/1034
 [#1042]: https://github.com/stackabletech/docker-images/pull/1042
 [#1044]: https://github.com/stackabletech/docker-images/pull/1044

‎nifi/Dockerfile

+81-50
@@ -7,59 +7,78 @@ ARG PRODUCT
 ARG MAVEN_VERSION="3.9.8"
 ARG STACKABLE_USER_UID
 
-RUN microdnf update && \
-microdnf clean all && \
-rm -rf /var/cache/yum
+RUN <<EOF
+microdnf update
+microdnf clean all
+rm -rf /var/cache/yum
+EOF
 
 # NOTE: From NiFi 2.0.0 upwards Apache Maven 3.9.6+ is required. As of 2024-07-04 the java-devel image
 # ships 3.6.3. This will update maven accordingly depending on the version. The error is due to the maven-enforer-plugin.
 #
 # [ERROR] Rule 2: org.apache.maven.enforcer.rules.version.RequireMavenVersion failed with message:
 # [ERROR] Detected Maven Version: 3.6.3 is not in the allowed range [3.9.6,).
 #
-WORKDIR /tmp
-RUN if [[ "${PRODUCT}" != 1.* ]] ; then \
-curl "https://repo.stackable.tech/repository/packages/maven/apache-maven-${MAVEN_VERSION}-bin.tar.gz" | tar -xzC . && \
-ln -sf /tmp/apache-maven-${MAVEN_VERSION}/bin/mvn /usr/bin/mvn ; \
-fi
+RUN <<EOF
+if [[ "${PRODUCT}" != 1.* ]] ; then
+cd /tmp
+curl "https://repo.stackable.tech/repository/packages/maven/apache-maven-${MAVEN_VERSION}-bin.tar.gz" | tar -xzC .
+ln -sf /tmp/apache-maven-${MAVEN_VERSION}/bin/mvn /usr/bin/mvn
+fi
+EOF
 
 USER ${STACKABLE_USER_UID}
 WORKDIR /stackable
 
 COPY --chown=${STACKABLE_USER_UID}:0 nifi/stackable/patches /stackable/patches
 
-RUN curl 'https://repo.stackable.tech/repository/m2/tech/stackable/nifi/stackable-bcrypt/1.0-SNAPSHOT/stackable-bcrypt-1.0-20240508.153334-1-jar-with-dependencies.jar' \
-# This used to be located in /bin/stackable-bcrypt.jar. We create a softlink for /bin/stackable-bcrypt.jar in the main container for backwards compatibility.
--o /stackable/stackable-bcrypt.jar && \
-# Get the source release from nexus
-curl "https://repo.stackable.tech/repository/packages/nifi/nifi-${PRODUCT}-source-release.zip" -o "/stackable/nifi-${PRODUCT}-source-release.zip" && \
-unzip "nifi-${PRODUCT}-source-release.zip" && \
-# Clean up downloaded source after unzipping
-rm -rf "nifi-${PRODUCT}-source-release.zip" && \
-# The NiFi "binary" ends up in a folder named "nifi-${PRODUCT}" which should be copied to /stackable
-# from /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} (see later steps)
-# Therefore we add the suffix "-src" to be able to copy the binary and remove the unzipped sources afterwards.
-mv nifi-${PRODUCT} nifi-${PRODUCT}-src && \
-# Apply patches
-chmod +x patches/apply_patches.sh && \
-patches/apply_patches.sh ${PRODUCT} && \
-# Build NiFi
-cd /stackable/nifi-${PRODUCT}-src/ && \
-# NOTE: Since NiFi 2.0.0 PutIceberg Processor and services were removed, so including the `include-iceberg` profile does nothing.
-# Additionally some modules were moved to optional build profiles, so we need to add `include-hadoop` to get `nifi-parquet-nar` for example.
-if [[ "${PRODUCT}" != 1.* ]] ; then \
-mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-hadoop,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp ; \
-else \
-mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-iceberg,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp ; \
-fi && \
-# Copy the binaries to the /stackable folder
-mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} /stackable/nifi-${PRODUCT} && \
-# Copy the SBOM as well
-mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/bom.json /stackable/nifi-${PRODUCT}/nifi-${PRODUCT}.cdx.json && \
-# Remove the unzipped sources
-rm -rf /stackable/nifi-${PRODUCT}-src && \
-# Remove generated docs in binary
-rm -rf /stackable/nifi-${PRODUCT}/docs
+RUN <<EOF
+# This used to be located in /bin/stackable-bcrypt.jar. We create a softlink for /bin/stackable-bcrypt.jar in the main container for backwards compatibility.
+curl 'https://repo.stackable.tech/repository/m2/tech/stackable/nifi/stackable-bcrypt/1.0-SNAPSHOT/stackable-bcrypt-1.0-20240508.153334-1-jar-with-dependencies.jar' \
+-o /stackable/stackable-bcrypt.jar
+
+# Get the source release from nexus
+curl "https://repo.stackable.tech/repository/packages/nifi/nifi-${PRODUCT}-source-release.zip" -o "/stackable/nifi-${PRODUCT}-source-release.zip"
+unzip "nifi-${PRODUCT}-source-release.zip"
+
+# Clean up downloaded source after unzipping
+rm -rf "nifi-${PRODUCT}-source-release.zip"
+
+# The NiFi "binary" ends up in a folder named "nifi-${PRODUCT}" which should be copied to /stackable
+# from /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} (see later steps)
+# Therefore we add the suffix "-src" to be able to copy the binary and remove the unzipped sources afterwards.
+mv nifi-${PRODUCT} nifi-${PRODUCT}-src
+
+# Apply patches
+chmod +x patches/apply_patches.sh
+patches/apply_patches.sh ${PRODUCT}
+
+# Build NiFi
+cd /stackable/nifi-${PRODUCT}-src/
+
+# NOTE: Since NiFi 2.0.0 PutIceberg Processor and services were removed, so including the `include-iceberg` profile does nothing.
+# Additionally some modules were moved to optional build profiles, so we need to add `include-hadoop` to get `nifi-parquet-nar` for example.
+if [[ "${PRODUCT}" != 1.* ]] ; then
+mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-hadoop,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp
+else
+mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-iceberg,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp
+fi
+
+# Copy the binaries to the /stackable folder
+mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} /stackable/nifi-${PRODUCT}
+
+# Copy the SBOM as well
+mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/bom.json /stackable/nifi-${PRODUCT}/nifi-${PRODUCT}.cdx.json
+
+# Remove the unzipped sources
+rm -rf /stackable/nifi-${PRODUCT}-src
+
+# Remove generated docs in binary
+rm -rf /stackable/nifi-${PRODUCT}/docs
+
+# Set correct permissions
+chmod -R g=u /stackable
+EOF
 
 FROM stackable/image/java-base AS final
 
@@ -83,8 +102,6 @@ COPY --chown=${STACKABLE_USER_UID}:0 nifi/licenses /licenses
 COPY --chown=${STACKABLE_USER_UID}:0 nifi/python /stackable/python
 
 RUN <<EOF
-ln -s /stackable/nifi-${PRODUCT} /stackable/nifi
-
 microdnf update
 
 # python-pip: Required to install Python packages
@@ -96,24 +113,38 @@ microdnf clean all
 rm -rf /var/cache/yum
 
 # The nipyapi is required until NiFi 2.0.x for the ReportingTaskJob
+# This can be removed once the 1.x.x line is removed
 pip install --no-cache-dir \
 nipyapi==0.19.1
 
 # For backwards compatibility we create a softlink in /bin where the jar used to be as long as we are root
 # This can be removed once older versions / operators using this are no longer supported
 ln -s /stackable/stackable-bcrypt.jar /bin/stackable-bcrypt.jar
 
-# All files and folders owned by root group to support running as arbitrary users.
-# This is best practice as all container users will belong to the root group (0).
-chown -R ${STACKABLE_USER_UID}:0 /stackable
-chmod -R g=u /stackable
+ln -s /stackable/nifi-${PRODUCT} /stackable/nifi
+
+# fix missing permissions / ownership
+chown --no-dereference ${STACKABLE_USER_UID}:0 /stackable/nifi
+chmod --recursive g=u /stackable/python
+chmod --recursive g=u /stackable/bin
+chmod g=u /stackable/nifi-${PRODUCT}
+EOF
+
+# ----------------------------------------
+# Checks
+# This section is to run final checks to ensure the created final images
+# adhere to several minimal requirements like:
+# - check file permissions and ownerships
+# ----------------------------------------
+
+# Check that permissions and ownership in /stackable are set correctly
+# This will fail and stop the build if any mismatches are found.
+RUN <<EOF
+/bin/check-permissions-ownership.sh /stackable ${STACKABLE_USER_UID} 0
 EOF
 
 # ----------------------------------------
-# Attention: We are changing the group of all files in /stackable directly above
-# If you do any file based actions (copying / creating etc.) below this comment you
-# absolutely need to make sure that the correct permissions are applied!
-# chown ${STACKABLE_USER_UID}:0
+# Attention: Do not perform any file based actions (copying/creating etc.) below this comment because the permissions would not be checked.
 # ----------------------------------------
 
 USER ${STACKABLE_USER_UID}
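
The `chmod g=u` calls in this Dockerfile copy the owner's permission bits onto the group, which is what lets an arbitrary UID in group 0 use the files. A minimal sketch of that behaviour on a throwaway directory is below; the function name `demo_group_equals_user` and the temporary path are made up for illustration and are not part of the commit, and the sketch assumes GNU coreutils (`stat -c`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the commit): creates a file with mode 640,
# applies `chmod g=u`, and prints the octal mode before and after.
demo_group_equals_user() {
    local dir
    dir=$(mktemp -d)
    touch "$dir/file"
    chmod 640 "$dir/file"
    stat -c '%a' "$dir/file"   # prints 640 (rw-r-----)
    chmod g=u "$dir/file"      # group bits become a copy of the owner bits
    stat -c '%a' "$dir/file"   # prints 660 (rw-rw----)
    rm -rf "$dir"
}

demo_group_equals_user
```

Without `--recursive` (or `-R`), as in `chmod g=u /stackable/nifi-${PRODUCT}`, only the named path itself is changed, not its contents.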

shared/checks/check-permissions-ownership.sh

+59
@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Purpose
+#
+# Checks that permissions and ownership in the provided directory are set according to:
+#
+# chown -R ${STACKABLE_USER_UID}:0 /stackable
+# chmod -R g=u /stackable
+#
+# Will error out and print directories / files that do not match the required permissions or ownership.
+#
+# Usage
+#
+# ./check-permissions-ownership.sh <directory> <uid> <gid>
+# ./check-permissions-ownership.sh /stackable ${STACKABLE_USER_UID} 0
+#
+
+if [[ $# -ne 3 ]]; then
+echo "Wrong number of parameters supplied. Usage:"
+echo "$0 <directory> <uid> <gid>"
+echo "$0 /stackable 1000 0"
+exit 1
+fi
+
+DIRECTORY=$1
+EXPECTED_UID=$2
+EXPECTED_GID=$3
+
+error_flag=0
+
+# Check ownership
+while IFS= read -r -d '' file; do
+uid=$(stat -c "%u" "$file")
+gid=$(stat -c "%g" "$file")
+
+if [[ "$uid" -ne "$EXPECTED_UID" || "$gid" -ne "$EXPECTED_GID" ]]; then
+echo "Ownership mismatch: $file (Expected: $EXPECTED_UID:$EXPECTED_GID, Found: $uid:$gid)"
+error_flag=1
+fi
+done < <(find "$DIRECTORY" -print0)
+
+# Check permissions
+while IFS= read -r -d '' file; do
+perms=$(stat -c "%A" "$file")
+owner_perms="${perms:1:3}"
+group_perms="${perms:4:3}"
+
+if [[ "$owner_perms" != "$group_perms" ]]; then
+echo "Permission mismatch: $file (Owner: $owner_perms, Group: $group_perms)"
+error_flag=1
+fi
+done < <(find "$DIRECTORY" -print0)
+
+if [[ $error_flag -ne 0 ]]; then
+echo "Permission and Ownership checks failed for $DIRECTORY!"
+exit 1
+fi
+
+echo "Permission and Ownership checks succeeded for $DIRECTORY!"
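
The permission check in the script compares two three-character slices of the `stat -c "%A"` string (for example `-rw-r-----`): characters 1 to 3 are the owner bits and characters 4 to 6 are the group bits. The sketch below isolates that comparison so it can be tried on scratch files; `owner_group_match` and the file names are hypothetical and not part of the commit, and GNU `stat` is assumed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the script's comparison:
# returns 0 when the owner and group permission triplets are identical.
owner_group_match() {
    local perms
    perms=$(stat -c '%A' "$1")               # e.g. "-rw-r-----"
    [[ "${perms:1:3}" == "${perms:4:3}" ]]   # owner slice vs group slice
}

tmp=$(mktemp -d)
touch "$tmp/ok" "$tmp/bad"
chmod 660 "$tmp/ok"    # rw-rw---- : owner and group agree
chmod 640 "$tmp/bad"   # rw-r----- : group lacks the write bit

for f in "$tmp/ok" "$tmp/bad"; do
    if owner_group_match "$f"; then
        echo "match: ${f##*/}"
    else
        echo "mismatch: ${f##*/}"
    fi
done
rm -rf "$tmp"
```

Run as-is, this prints `match: ok` followed by `mismatch: bad`.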

‎stackable-base/Dockerfile

+4
@@ -204,6 +204,10 @@ COPY --from=config-utils --chown=${STACKABLE_USER_UID}:0 /config-utils/config-ut
 # Debug tool that logs generic system information.
 COPY --from=containerdebug --chown=${STACKABLE_USER_UID}:0 /containerdebug/target/release/containerdebug /stackable/containerdebug
 
+# **check-permissions-ownership.sh**
+# Bash script to check proper permissions and ownership requirements in the final Stackable images
+COPY --chown=${STACKABLE_USER_UID}:0 shared/checks/check-permissions-ownership.sh /bin/check-permissions-ownership.sh
+
 ENV PATH="${PATH}:/stackable"
 
 # These labels have mostly been superceded by the OpenContainer spec annotations below but it doesn't hurt to include them

‎vector/Dockerfile

+19-13
@@ -14,16 +14,22 @@ ARG STACKABLE_USER_UID
 # This happens by writing a "shutdown file" in a shared volume
 # See https://github.com/stackabletech/airflow-operator/blob/23.4.1/rust/operator-binary/src/airflow_db_controller.rs#L269 for an example
 # The Vector container waits for this file to appear and this waiting happens using `inotifywait` which comes from the `inotify-tools` package
-RUN ARCH="${TARGETARCH/amd64/x86_64}" ARCH="${ARCH/arm64/aarch64}" && \
-rpm --install \
-"https://repo.stackable.tech/repository/packages/vector/vector-${PRODUCT}-${RPM_RELEASE}.${ARCH}.rpm" \
-"https://repo.stackable.tech/repository/packages/inotify-tools/inotify-tools-${INOTIFY_TOOLS}.${ARCH}.rpm" && \
-mkdir /licenses && \
-cp /usr/share/licenses/vector-${PRODUCT}/LICENSE /licenses/VECTOR_LICENSE && \
-# Create the directory /stackable/vector/var.
-# This directory is set by operator-rs in the parameter `data_dir`
-# of the Vector configuration. The directory is used for persisting
-# Vector state, such as on-disk buffers, file checkpoints, and more.
-# Vector needs write permissions.
-mkdir --parents /stackable/vector/var && \
-chown --recursive ${STACKABLE_USER_UID}:0 /stackable/
+RUN <<EOF
+ARCH="${TARGETARCH/amd64/x86_64}"
+ARCH="${ARCH/arm64/aarch64}"
+rpm --install \
+"https://repo.stackable.tech/repository/packages/vector/vector-${PRODUCT}-${RPM_RELEASE}.${ARCH}.rpm" \
+"https://repo.stackable.tech/repository/packages/inotify-tools/inotify-tools-${INOTIFY_TOOLS}.${ARCH}.rpm"
+mkdir /licenses
+cp /usr/share/licenses/vector-${PRODUCT}/LICENSE /licenses/VECTOR_LICENSE
+
+# Create the directory /stackable/vector/var.
+# This directory is set by operator-rs in the parameter `data_dir`
+# of the Vector configuration. The directory is used for persisting
+# Vector state, such as on-disk buffers, file checkpoints, and more.
+# Vector needs write permissions.
+mkdir --parents /stackable/vector/var
+chown --recursive ${STACKABLE_USER_UID}:0 /stackable/
+# Set correct permissions
+chmod -R g=u /stackable
+EOF
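
The two `ARCH=` lines above translate Docker's `TARGETARCH` values into the architecture names used in the RPM file names, via bash pattern substitution (`${var/pattern/replacement}`). A small sketch of that mapping follows; the `rpm_arch` function is a hypothetical wrapper for illustration and does not appear in the commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: applies the same two substitutions as the Dockerfile
# to map a TARGETARCH value to the RPM architecture name.
rpm_arch() {
    local arch="${1/amd64/x86_64}"   # amd64 -> x86_64, anything else unchanged
    arch="${arch/arm64/aarch64}"     # arm64 -> aarch64, anything else unchanged
    echo "$arch"
}

rpm_arch amd64   # prints x86_64
rpm_arch arm64   # prints aarch64
```

This form of substitution is a bash feature rather than POSIX `sh`, so the `RUN` step depends on bash being the shell that executes it.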
