@RatulDawar

Related Issue : #27276

@cla-bot

cla-bot bot commented Nov 11, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the iceberg Iceberg connector label Nov 11, 2025
@RatulDawar RatulDawar changed the title feature: Make filesystem cache table aware [WIP] feature: Make filesystem cache table aware Nov 11, 2025

@RotRotAl RotRotAl left a comment

Hi, please fix your code (5 tests have failed) and submit the signed CLA, as the bot requested.

}

@Config("fs.cache.include-tables")
@ConfigDescription("List of tables to include in file system cache (schema.table format, supports wildcards like schema.* or *)")

The wording is a bit off,

Suggested change
@ConfigDescription("List of tables to include in file system cache (schema.table format, supports wildcards like schema.* or *)")
@ConfigDescription("List of tables to include in file system cache,
use * to cache listings for all tables in all schemas


/**
* Predicate to determine if a table should be cached based on configured include list.
* Supports wildcards: schema.table, schema.*, or *

use * to cache listings for all tables in all schemas

@RotRotAl

@RatulDawar Have you tried adding this feature to the Hive connector as well? It would be very helpful.

@ratuldawar11

> Hi, please fix your code (5 tests have failed) and submit the signed CLA, as the bot requested.

Yes, I already submitted the CLA 3-4 days ago. I am still working on this PR; it's a WIP.

@ratuldawar11

> @RatulDawar Have you tried adding this feature to the Hive connector as well? It would be very helpful.

Sure, I will add that in the same PR too.

@RotRotAl

Hi, is there anything new?

@RatulDawar
Author

> Hi, is there anything new?

Hey, I will resolve the comments in a few hours. I was waiting for the CLA to be signed, and it just got signed today.

@cla-bot cla-bot bot added the cla-signed label Nov 26, 2025
@RatulDawar RatulDawar force-pushed the feature/explicit-table-caching branch from 1ee85e3 to 6bb2852 Compare November 26, 2025 18:07
@github-actions github-actions bot added docs ui Web UI hudi Hudi connector delta-lake Delta Lake connector hive Hive connector bigquery BigQuery connector elasticsearch Elasticsearch connector google-sheets Google Sheets connector kafka Kafka connector memory Memory connector opensearch OpenSearch connector redis Redis connector redshift Redshift connector sqlserver SQLServer connector lakehouse labels Nov 26, 2025
@RatulDawar RatulDawar force-pushed the feature/explicit-table-caching branch from 6bb2852 to 5482baf Compare November 26, 2025 18:11
@RatulDawar RatulDawar force-pushed the feature/explicit-table-caching branch from 5482baf to feda670 Compare November 26, 2025 18:39
Comment on lines 44 to 45
if (includeTables.isEmpty()) {
this.predicate = _ -> true;
Member

remove.

return predicate.test(table);
}

private static Predicate<SchemaTableName> matches(List<String> tables)
Member

matches -> buildPredicate

Comment on lines +61 to +62
.map(prefix -> (Predicate<SchemaTableName>) prefix::matches)
.reduce(Predicate::or)
Member

Looks fancy, but it would be better to just collect parsed table names on a field
and implement predicate with explicit for loop over them.

also, if any entry is *, then other elements still should be parsed (validated), but don't need to be remembered
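
The suggestion above (parse the entries into a field, match with an explicit loop, and validate every entry even when `*` is present) could look roughly like the following. This is only a sketch under assumptions: `TableName` is a hypothetical stand-in for Trino's `SchemaTableName`, `TableIncludeList` is an illustrative name rather than the class in this PR, names are matched exactly, and schema or table names containing dots are rejected for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Trino's SchemaTableName, so the sketch is self-contained.
record TableName(String schema, String table) {}

// Illustrative name; not the class from the PR.
final class TableIncludeList
{
    // A parsed entry; a null table means "schema.*".
    private record Entry(String schema, String table) {}

    private final boolean matchAll;
    private final List<Entry> entries;

    TableIncludeList(List<String> includeTables)
    {
        boolean sawWildcard = false;
        List<Entry> parsed = new ArrayList<>();
        for (String value : includeTables) {
            String trimmed = value.trim();
            if (trimmed.equals("*")) {
                sawWildcard = true;
                continue;
            }
            // Every other entry is still validated, even if "*" is present.
            // Assumption: exactly one dot, with non-empty text on both sides.
            int dot = trimmed.indexOf('.');
            if (dot <= 0 || dot == trimmed.length() - 1 || trimmed.indexOf('.', dot + 1) >= 0) {
                throw new IllegalArgumentException("Invalid table entry: " + value);
            }
            String schema = trimmed.substring(0, dot);
            String table = trimmed.substring(dot + 1);
            parsed.add(new Entry(schema, table.equals("*") ? null : table));
        }
        this.matchAll = sawWildcard;
        // With "*" present, the validated entries do not need to be remembered.
        this.entries = sawWildcard ? List.of() : List.copyOf(parsed);
    }

    boolean test(TableName name)
    {
        if (matchAll) {
            return true;
        }
        // Explicit loop instead of a reduced chain of predicates.
        for (Entry entry : entries) {
            boolean schemaMatches = entry.schema().equals(name.schema());
            boolean tableMatches = entry.table() == null || entry.table().equals(name.table());
            if (schemaMatches && tableMatches) {
                return true;
            }
        }
        return false;
    }
}
```

With `List.of("sales.*", "hr.people")` this accepts any table in `sales` and only `people` in `hr`; a list such as `List.of("*", "bad")` still fails validation because `bad` is not in `schema.table` form.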

Comment on lines +28 to +36
default TrinoFileSystem create(ConnectorSession session, boolean cachingEnabled)
{
    return create(session);
}

default TrinoFileSystem create(ConnectorIdentity identity, boolean cachingEnabled)
{
    return create(identity);
}
Member

I generally like the idea that the FS calling code can knowingly bypass the caching layer. Today this is handled by the connector's CacheKeyProvider having knowledge of other components' internals, e.g.

if (path.endsWith(".trinoSchema") || path.contains("/.trinoPermissions/")) {
    // Needed to avoid caching files from FileHiveMetastore on coordinator during tests
    return Optional.empty();
}

I am not sure these new functions are the best way to do it.
At least for FileMetastore, I would try to simply inject the uncached underlying FS via Guice, or maybe make it separately configurable for FileMetastore (e.g. read-only use cases on static data sets are OK to cache).

cc @electrum @raunaqmorarka
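
To illustrate the alternative described in this comment, here is a minimal sketch in which the component that must not see cached data is simply handed the undecorated file system. All types are hypothetical, heavily simplified stand-ins (none of them is Trino's `TrinoFileSystem` or the real file metastore), and plain constructor wiring stands in for whatever Guice binding would be used in practice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified file system abstraction.
interface SimpleFileSystem
{
    String read(String path);
}

// Caching decorator: remembers the first value read for each path.
final class CachingFileSystem
        implements SimpleFileSystem
{
    private final SimpleFileSystem delegate;
    private final Map<String, String> cache = new HashMap<>();

    CachingFileSystem(SimpleFileSystem delegate)
    {
        this.delegate = delegate;
    }

    @Override
    public String read(String path)
    {
        return cache.computeIfAbsent(path, delegate::read);
    }
}

// Hypothetical metastore-like component. It does not know or care whether
// caching exists; whoever wires it decides which file system it receives.
final class FileBackedMetastore
{
    private final SimpleFileSystem fileSystem;

    FileBackedMetastore(SimpleFileSystem fileSystem)
    {
        this.fileSystem = fileSystem;
    }

    String loadSchema(String schemaName)
    {
        return fileSystem.read(schemaName + "/.trinoSchema");
    }
}
```

Wiring `new FileBackedMetastore(underlying)` always observes the current file contents, while `new FileBackedMetastore(new CachingFileSystem(underlying))` may return stale data after a file changes. The point of the design is that no path-based special case is needed in the cache key logic; a configuration flag could choose between the two wirings.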

Member

If we want to go in this direction with new methods in TFSF, please open a separate PR adding those methods.
We can add usage of these methods in FileMetastore (controlled by configuration). This will be a simpler prep change and will ensure we have agreement on the API.

Author

@findepi I think these APIs make the bypass more robust. Both the Guice binding and the APIs with a caching flag seem like better approaches than the current cache key provider, which relies on knowledge of internal file paths.
I would prefer the API changes, since they are more flexible for possible future requirements. Open to discussion on this!

Comment on lines 77 to 92
if (vendedCredentialsEnabled &&
        fileIoProperties.containsKey(VENDED_S3_ACCESS_KEY) &&
        fileIoProperties.containsKey(VENDED_S3_SECRET_KEY) &&
        fileIoProperties.containsKey(VENDED_S3_SESSION_TOKEN)) {
    // Do not include original credentials as they should not be used in vended mode
    ConnectorIdentity identityWithExtraCredentials = ConnectorIdentity.forUser(identity.getUser())
            .withGroups(identity.getGroups())
            .withPrincipal(identity.getPrincipal())
            .withEnabledSystemRoles(identity.getEnabledSystemRoles())
            .withConnectorRole(identity.getConnectorRole())
            .withExtraCredentials(ImmutableMap.<String, String>builder()
                    .put(EXTRA_CREDENTIALS_ACCESS_KEY_PROPERTY, fileIoProperties.get(VENDED_S3_ACCESS_KEY))
                    .put(EXTRA_CREDENTIALS_SECRET_KEY_PROPERTY, fileIoProperties.get(VENDED_S3_SECRET_KEY))
                    .put(EXTRA_CREDENTIALS_SESSION_TOKEN_PROPERTY, fileIoProperties.get(VENDED_S3_SESSION_TOKEN))
                    .buildOrThrow())
            .build();
Member

That looks like a copy of the method above. Avoid copying (duplicating) this logic.

* @param cachingEnabled whether file system caching should be enabled
* @return a TrinoFileSystem instance
*/
default TrinoFileSystem create(ConnectorSession session, Map<String, String> fileIoProperties, boolean cachingEnabled)
Member

the problem is that cachingEnabled = true does not imply there will be any caching
it depends what the backing TrinoFileSystemFactory is. i.e. caching needs to be allowed on the FS layer as well
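
The concern above can be shown with a small sketch. The types below are hypothetical simplifications (not the real `TrinoFileSystemFactory` API): the default overload mirrors the shape of the one under review, so a factory that does not override it silently ignores `cachingEnabled = true`.

```java
// Hypothetical, simplified stand-in for a file system; it only reports
// whether reads would go through a cache.
interface FileSystemHandle
{
    boolean isCaching();
}

// Hypothetical, simplified factory; not the real TrinoFileSystemFactory.
interface FileSystemFactory
{
    FileSystemHandle create(String user);

    // Same shape as the default overload under review: the flag is dropped.
    default FileSystemHandle create(String user, boolean cachingEnabled)
    {
        return create(user);
    }
}

// A factory with no caching layer: it inherits the default overload,
// so asking for caching has no effect.
final class PlainFactory
        implements FileSystemFactory
{
    @Override
    public FileSystemHandle create(String user)
    {
        return () -> false;
    }
}

// A caching factory has to override the overload to honor the flag.
final class CacheAwareFactory
        implements FileSystemFactory
{
    private final FileSystemFactory delegate;

    CacheAwareFactory(FileSystemFactory delegate)
    {
        this.delegate = delegate;
    }

    @Override
    public FileSystemHandle create(String user)
    {
        return create(user, true);
    }

    @Override
    public FileSystemHandle create(String user, boolean cachingEnabled)
    {
        FileSystemHandle plain = delegate.create(user);
        return cachingEnabled ? () -> true : plain;
    }
}
```

Calling `new PlainFactory().create("alice", true)` returns a handle that does not cache, even though the caller asked for caching. In this sketch the flag can only disable caching where a caching layer exists; it cannot enable it, which is why its name may mislead callers.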

Author

Seems fair. I think that makes it virtually impossible to use APIs with caching flags.
Should we go with Guice dependencies? That seems preferable to handling this in the CacheKeyProvider.

private final Predicate<SchemaTableName> predicate;

@Inject
public TableCachingPredicate(FileSystemConfig config)
Member

This looks like part of trino-filesystem-manager caching layer, but it's used only in Iceberg.

For FS caching it would be more natural to cache based on locations, but i can see how user-unfriendly this might be and that cache control by table names is reasonable.

This currently is a connector-specific cache control and if implemented as such, it should be configured as such. see delta.fs.cache.disable-transaction-log-caching for example of pre-existing connector-specific cache control.

Author

@findepi I think we can extend it to all importers of trino-filesystem-manager, as this doesn't require any connector-specific handling. Making this connector-specific seems to add more complexity than implementing it in a connector-agnostic way.

Author

Let me confirm if this is possible or not

- Change default value from empty list to ImmutableList.of("*") to maintain backward compatibility
- Add @NotEmpty validation to reject empty list configuration
- Add test to verify empty list validation fails
- Add jakarta.validation-api and io.airlift:testing dependencies to pom.xml
@github-actions

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions bot added the stale label Dec 19, 2025
@github-actions

github-actions bot commented Jan 9, 2026

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Jan 9, 2026
@RoeyoOgen

@RatulDawar any update?

@RoeyoOgen RoeyoOgen reopened this Jan 11, 2026
@github-actions github-actions bot removed the stale label Jan 12, 2026
@RatulDawar
Author

> @RatulDawar any update?

I have addressed all the comments. I still need to start a discussion here on how the APIs should be shaped; I will add my pointers and do that this week, and then resume active work on this PR.

Labels

bigquery BigQuery connector cla-signed delta-lake Delta Lake connector docs elasticsearch Elasticsearch connector google-sheets Google Sheets connector hive Hive connector hudi Hudi connector iceberg Iceberg connector kafka Kafka connector lakehouse memory Memory connector opensearch OpenSearch connector redis Redis connector redshift Redshift connector sqlserver SQLServer connector ui Web UI

5 participants