Create a single binary distribution bundle #1589

Draft
wants to merge 12 commits into base: main

Conversation

flyrain
Contributor

@flyrain flyrain commented May 14, 2025

What’s included in the PR:

  1. Introduced a new module that combines both Admin Tools and Server. Please refer to my original email for the motivation behind this change.
  2. Removed the now-redundant run-scripts module.
  3. Consolidated the README to reflect the unified binary distribution.
  4. Standardized the binary distribution package naming to polaris-{version}.tgz and polaris-{version}.zip, following common conventions used by other projects (e.g., spark-3.5.5-bin-hadoop3.tgz).

TODOs:

  • Consolidate LICENSE and NOTICE files from both Admin Tools and Server.
  • Remove the distribution tasks in each of the original modules.
  • Consolidate the shared libs between Admin Tools and Server.

The PR is technically ready, but I plan to wait until the 0.10 release is finalized to avoid disrupting the release process.

Yufei Gu added 10 commits May 14, 2025 12:05
This script combines the Polaris admin tool and server distributions into a single package: Maintains separate admin and server components, provides a unified run script to launch either component, preserves all necessary dependencies and configurations, and simplifies deployment by having both components in one distribution.
… run script - Remove repository configuration from plugins, restore run.sh script, fix build issues
…rojects.main.properties - Remove run-script dependency from admin and server modules - Fix code formatting in build files
@jbonofre
Member

@flyrain thanks for this draft PR! Just a note: we should keep the changes proposed in #1588 and #1568. I will double-check here.

@flyrain
Contributor Author

flyrain commented May 14, 2025

> @flyrain thanks for this draft PR! Just a note: we should keep the changes proposed in #1588 and #1568. I will double-check here.

Yes, I added the disclaimer in the new commit.

├── README.md
├── admin/ # Admin tool files
├── server/ # Server files
└── run.sh
Contributor

Spark was mentioned as an example, but IIRC, Spark's distribution has a bin directory for runnable files... Why not put run.sh into bin in Polaris too?

Contributor Author

I think one of the reasons Spark has a dedicated bin directory is the number of items under its root directory, as listed below. Our distribution is different, though: we don't have many top-level items. I'm OK either way.

drwxr-xr-x@   3 ygu  staff    96B Aug  6  2024 yarn/
drwxr-xr-x@  31 ygu  staff   992B Aug  6  2024 sbin/
-rw-r--r--@   1 ygu  staff   166B Aug  6  2024 RELEASE
-rw-r--r--@   1 ygu  staff   4.5K Aug  6  2024 README.md
drwxr-xr-x@   3 ygu  staff    96B Aug  6  2024 R/
drwxr-xr-x@  19 ygu  staff   608B Aug  6  2024 python/
-rw-r--r--@   1 ygu  staff    56K Aug  6  2024 NOTICE
drwxr-xr-x@  58 ygu  staff   1.8K Aug  6  2024 licenses/
-rw-r--r--@   1 ygu  staff    22K Aug  6  2024 LICENSE
drwxr-xr-x@   4 ygu  staff   128B Aug  6  2024 kubernetes/
drwxr-xr-x@   4 ygu  staff   128B Aug  6  2024 examples/
drwxr-xr-x@   6 ygu  staff   192B Aug  6  2024 data/
drwxr-xr-x@   8 ygu  staff   256B Aug  6  2024 conf/
drwxr-xr-x@  30 ygu  staff   960B Aug  6  2024 bin/
drwxr-xr-x@ 255 ygu  staff   8.0K Oct 14  2024 jars/


script_dir="$(dirname "$0")"
# Default to server if no component specified
COMPONENT=${1:-server}
Contributor

I'm not sure the default is useful. It can only take effect when the server is run without any arguments. 🤔

Contributor Author

Does the server take any parameters? Even if it does, I think the no-arg run is still the majority case, and then it's easiest to just run run.sh.
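To make the trade-off in this thread concrete, here is a minimal sketch of how a `${1:-server}` default behaves. The `pick_component` function and `--hypothetical-flag` are illustrative assumptions, not code or options from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring COMPONENT=${1:-server} from run.sh.
pick_component() {
  component="${1:-server}"      # the default applies only when $1 is unset or empty
  if [ "$#" -gt 0 ]; then
    shift                       # drop the component name when one was given
  fi
  echo "component=${component} args=$*"
}

pick_component                       # no arguments: falls back to server
pick_component admin bootstrap       # explicit component, remaining args forwarded
pick_component --hypothetical-flag   # a flag without a component is taken as the component
```

The last call illustrates the concern raised above: once any argument is passed, the first one is always read as the component name, so the default only helps for a bare `run.sh`.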

Contributor

Why not have bin/polaris-server and bin/polaris-admin (note: no .sh suffix), similar to Spark's bin/spark-shell, bin/spark-sql, etc.?

Contributor Author

I think it's a good idea to have two separate bash scripts, but I'd prefer to do that once we merge the libs between the admin tool and the server (the third TODO item).

Contributor

This PR refactors run scripts to support a combined dist archive anyway. I think it makes sense to put separate admin/server scripts in this PR.
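As a rough sketch of what the separate-script suggestion could look like, the functions below show how a wrapper placed in `bin/` might locate the distribution root and build its launch command. Everything here is an assumption for illustration: the function names, the `/opt/polaris` path, and the `<component>/quarkus-run.jar` location are not taken from this PR.

```shell
#!/usr/bin/env bash
# Sketch of the logic a hypothetical bin/polaris-server or bin/polaris-admin could share.

# Resolve the distribution root as the parent of the directory holding the script.
resolve_home() {
  script_path="$1"
  bin_dir="$(cd "$(dirname "$script_path")" && pwd)"
  dirname "$bin_dir"
}

# Build the launch command for one component.
# Assumption: each component ships a runnable jar at <home>/<component>/quarkus-run.jar.
launch_command() {
  home="$1"
  component="$2"
  shift 2
  echo "java -jar ${home}/${component}/quarkus-run.jar $*"
}

# Dry run: print the command instead of executing it.
launch_command /opt/polaris admin bootstrap
```

A real wrapper would call `resolve_home "$0"` and `exec` the Java command with `"$@"` rather than echoing it; printing keeps the sketch safe to run anywhere.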


description = "Apache Polaris Binary Distribution"

val adminProject = project(":polaris-quarkus-admin")
Member

As we make this change, I would suggest renaming polaris-quarkus-admin and polaris-quarkus-server to polaris-admin and polaris-server, and maybe also renaming the quarkus folder to dist or similar.

Member

@snazy left a comment

This change uses a bunch of highly discouraged Gradle practices:

  • It accesses other Gradle projects and their tasks directly, which is bad practice and highly discouraged.
  • This leads to task-dependency problems and, in turn, build and Gradle caching issues.

Please take a look at Gradle artifacts and configurations, and at defining portable artifact dependencies, including the proper task dependencies these mechanisms imply.

@@ -0,0 +1,69 @@
# Licensed to the Apache Software Foundation (ASF) under one
Contributor

Nit: this file is placed under quarkus/distribution/distribution – is that intentional? I'd advocate for moving it one level up (it would become the README for the distribution module, but I think that's a good thing).

Contributor Author

That's intentional; it follows the same structure as the admin and server modules.

@@ -1,4 +1,4 @@
-#!/usr/bin/env bash
+#!/bin/bash
Contributor

This is less portable than the original shebang.

Contributor

Good catch (I missed that)
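For context on the portability point: `#!/usr/bin/env bash` looks bash up on `PATH`, while `#!/bin/bash` only works where bash lives at exactly that path (FreeBSD, for example, installs it under /usr/local/bin). The small diagnostic below is an illustrative sketch, not part of this PR, and its variable names are arbitrary; it prints what each shebang style would find on the current machine.

```shell
#!/usr/bin/env bash
# Report where bash would be found by each shebang style on this machine.

# What `#!/usr/bin/env bash` would resolve through PATH (empty if bash is not on PATH).
env_bash="$(command -v bash || true)"

# What `#!/bin/bash` relies on: an executable at this exact location.
if [ -x /bin/bash ]; then
  fixed_bash="/bin/bash"
else
  fixed_bash="(missing)"
fi

echo "env-style shebang resolves to: ${env_bash:-not found on PATH}"
echo "hard-coded shebang points at:  ${fixed_bash}"
```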

@adutra
Contributor

adutra commented May 15, 2025

@flyrain just to confirm, we are not planning to remove the individual distributions for server and tool, are we?

@flyrain
Contributor Author

flyrain commented May 15, 2025

> @flyrain just to confirm, we are not planning to remove the individual distributions for server and tool, are we?

Good question. We will remove the individual distributions once the single binary distribution is ready. Their jar files will still be published individually, though.

@adutra
Contributor

adutra commented May 15, 2025

> > @flyrain just to confirm, we are not planning to remove the individual distributions for server and tool, are we?
>
> Good question. We will remove the individual distributions once the single binary distribution is ready. Their jar files will still be published individually, though.

Are we sure about this? I assume that most people downloading the distribution would be interested in the server, not the tool. Imposing the combined distribution would force them to download a bunch of jars they don't need. And the opposite is true as well: if I only need the tool, why would I need to download 200 MB of server jars?

@dimas-b
Contributor

dimas-b commented May 16, 2025

With real (not in-memory) persistence the admin tool is pretty much always required for proper bootstrapping, I guess. So it makes sense to bundle it with the server.

As to whether to keep a separate downloadable archive just for the admin tool, I suppose it may be useful only if the server runs in docker. However, in that case, it might be preferable to have a separate docker image for the admin tool and have only the combined archive-based distribution. WDYT?

@adutra
Contributor

adutra commented May 16, 2025

> it might be preferable to have a separate docker image for the admin tool and have only the combined archive-based distribution. WDYT?

If that's the plan I can live with that 👍

5 participants