HIVE-20189: Separate metastore client code into its own module #5924

Merged
merged 4 commits into from
Jul 15, 2025

deniskuzZ
Member

@deniskuzZ deniskuzZ commented Jul 3, 2025

What changes were proposed in this pull request?

Move the metastore client classes into their own module.

Why are the changes needed?

Improve the structure of the standalone-metastore classes.

Does this PR introduce any user-facing change?

No

How was this patch tested?

jenkins

<dependencies>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-standalone-metastore-common</artifactId>
Member

The new module depends on metastore-common, so it doesn't reduce the size or the dependencies of metastore-common. I'm not sure that's OK. How will users benefit from the new module?

Member Author

@deniskuzZ deniskuzZ Jul 3, 2025

It's expected for a client module to depend on a common module. The client module delivers core client functionality and can be enhanced later with features like caching.
Offering a 'client' JAR aligns with common conventions and is expected to be user-friendly.
ATM we have 2 distinct cache wrappers in different Hive modules, just because there was no structure:

  • org.apache.hadoop.hive.metastore.HiveClientCache
  • org.apache.iceberg.hive.CachedClientPool

Member Author

@deniskuzZ deniskuzZ Jul 3, 2025

Another point I don't understand: why do the ql and beeline modules depend on metastore-server? I think we should drop the dependency on the server and move the required classes into metastore-common or metastore-client.

Contributor

I basically agree that we can potentially improve the structure in the future. We will have obvious guidelines about how to organize files.

  • metastore-client: Client-specific files, which Server doesn't need
  • metastore-server: Server-specific files, which Client doesn't need
  • metastore-common: Common modules

> Another point I don't understand: why do the ql and beeline modules depend on metastore-server? I think we should drop the dependency on the server and move the required classes into metastore-common or metastore-client.

I guess ql requires metastore-server to use an embedded HMS.

Member Author

Isn't the embedded HMS only used in tests?

Contributor

No, it isn't. It can be used when metastore.thrift.uris is empty. For example, our Hive Docker image (not the HMS Docker image) can set up a HiveServer2 without a separate HMS. It probably uses the embedded mode, which runs HMS-equivalent threads inside HS2.

/**
 * Check if metastore is being used in embedded mode.
 * This utility function exists so that the logic for determining the mode is same
 * in HiveConf and HiveMetaStoreClient
 * @param msUri - metastore server uri
 * @return true if the metastore is embedded
 */
public static boolean isEmbeddedMetaStore(String msUri) {
  return (msUri == null) || msUri.trim().isEmpty();
}
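
To make the check above concrete, here is a minimal, self-contained sketch that copies the quoted one-line logic into a throwaway class and runs it against a few inputs. The class name `EmbeddedModeDemo` and the example thrift URI are assumptions made for illustration; they are not taken from the Hive code base.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: a stand-alone copy of the check quoted above. */
class EmbeddedModeDemo {

  // Same logic as the quoted method: a null, empty, or
  // whitespace-only URI is treated as embedded mode.
  static boolean isEmbeddedMetaStore(String msUri) {
    return (msUri == null) || msUri.trim().isEmpty();
  }

  public static void main(String[] args) {
    // The last value is a made-up remote metastore URI.
    List<String> candidates =
        Arrays.asList(null, "", "   ", "thrift://metastore.example.com:9083");
    for (String uri : candidates) {
      System.out.println("[" + uri + "] embedded=" + isEmbeddedMetaStore(uri));
    }
  }
}
```

Only the last candidate is a non-blank URI, so it is the only one the check treats as a remote metastore.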

Contributor

Alternatively, we can say it is for testing purposes and separate the classes from hive-exec.

Contributor

@okumin okumin left a comment

+1

Configuration conf, String userName) throws Exception {
Token<org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier> t;
try (JobClient jcl = new JobClient(new JobConf(conf, HCatOutputFormat.class))) {
t = jcl.getDelegationToken(new Text(userName));
Contributor

We could return it directly instead of assigning it to a local variable first.

Member Author

changed
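
The suggestion in this thread relies on the fact that a `return` inside a try-with-resources block still closes the resource. The sketch below shows both shapes side by side. `TokenSource` is a hypothetical stand-in for the real `JobClient`, and the method names are invented; the code quoted in the review is truncated, so this is an assumption about its overall shape, not the actual change.

```java
/** Hypothetical stand-in for a closeable client such as JobClient. */
class TokenSource implements AutoCloseable {
  boolean closed = false;

  String getDelegationToken(String renewer) {
    return "token-for-" + renewer;
  }

  @Override
  public void close() {
    closed = true;
  }
}

class DirectReturnDemo {

  // Before: assign inside the try block, return after it.
  static String viaLocal(TokenSource source, String userName) {
    String t;
    try (TokenSource s = source) {
      t = s.getDelegationToken(userName);
    }
    return t;
  }

  // After: return from inside the try block; close() still runs
  // before the method hands the value back to the caller.
  static String direct(TokenSource source, String userName) {
    try (TokenSource s = source) {
      return s.getDelegationToken(userName);
    }
  }

  public static void main(String[] args) {
    TokenSource source = new TokenSource();
    String token = direct(source, "alice");
    System.out.println(token + " closed=" + source.closed);
  }
}
```

Both variants return the same value and close the resource; the direct form just drops the extra local variable.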

Contributor

@ngsg ngsg left a comment

LGTM, +1

import org.apache.hadoop.hive.metastore.TableType;
import org.apache.hadoop.hive.metastore.Warehouse;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
Contributor

nit: I think the original import is sorted correctly, so we might not need this change.

Member Author

reverted

@deniskuzZ deniskuzZ merged commit ea27ed7 into apache:master Jul 15, 2025
4 checks passed
5 participants