Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSANDRA-20460: Expand TableWalk tests to include collections and add support for += and -= for these types #3995

Closed
wants to merge 39 commits into from

Conversation

dcapwell
Copy link
Contributor

No description provided.

@dcapwell dcapwell requested a review from maedhroz March 19, 2025 21:14
return AbstractTypeGenerators.builder()
.withTypeKinds(Generators.fromGen(TYPE_KIND_DISTRIBUTION.next(rs)))
.withPrimitives(Generators.fromGen(PRIMITIVE_DISTRIBUTION.next(rs)))
.withUserTypeFields(AbstractTypeGenerators.UserTypeFieldsGen.simpleNames())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not testing UDT serialization so simple names makes it so much easier to read...

Comment on lines +107 to +108
.withTypeKinds(Generators.fromGen(TYPE_KIND_DISTRIBUTION.next(rs)))
.withPrimitives(Generators.fromGen(PRIMITIVE_DISTRIBUTION.next(rs)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we used uniform random so each test had the same weights. Now we select a new distribution for each test run (some tests will only be primitives, others will be mostly maps, etc.)

.withTypeKinds(Generators.fromGen(TYPE_KIND_DISTRIBUTION.next(rs)))
.withPrimitives(Generators.fromGen(PRIMITIVE_DISTRIBUTION.next(rs)))
.withUserTypeFields(AbstractTypeGenerators.UserTypeFieldsGen.simpleNames())
.withMaxDepth(1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM we are not testing nested access (will need to eventually) so lowered as the cql literal output was so large it got hard to read (UDT of map of UDT of map of UDT....)

@@ -206,7 +229,7 @@ protected List<CreateIndexDDL.Indexer> supportedIndexers()
builder.value(pk, key.bufferAt(pks.indexOf(pk)));


List<Symbol> searchableColumns = state.nonPartitionColumns;
List<Symbol> searchableColumns = state.searchableNonPartitionColumns;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multi cell collections/udt can not be searched, so just using the non-partition columns wasn't enough

@@ -363,7 +386,8 @@ protected TableMetadata defineTable(RandomSource rs, String ks)
.withCompression())
.withKeyspaceName(ks).withTableName("tbl")
.withSimpleColumnNames()
.withDefaultTypeGen(supportedTypes())
.withDefaultTypeGen(supportedTypes(rs))
.withPrimaryColumnTypeGen(supportedPrimaryColumnTypes(rs))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when primary keys can be collections there can be issues with having too many bytes and having our mutations be rejected. Keeping it primitive also improves readability for debug output...

Would be good to have a test that overrides this though, just to improve coverage...

@@ -424,7 +448,8 @@ public State(RandomSource rs, Cluster cluster)
.withoutTransaction()
.withoutTtl()
.withoutTimestamp()
.withPartitions(SourceDSL.arbitrary().pick(uniquePartitions));
.withPartitions(SourceDSL.arbitrary().pick(uniquePartitions))
.withColumnExpressions(e -> e.withOperators(Generators.fromGen(BOOLEAN_DISTRIBUTION.next(rs))));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each column will have a different distribution for when operators are allowed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required for CASSANDRA-20449 repro at least 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't required; we hit the issue before this change. This change is to better cover different edge cases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need different assignment operators to repro (+= v vs. = {v}) but not necessarily different distributions of operators, which is what's introduced here - is that right?

@@ -493,6 +503,45 @@ private static void validateAnyOrder(ImmutableUniqueList<Symbol> columns, Set<Ro
if (actual.isEmpty()) sb.append("No rows returned");
else sb.append("Missing rows:\n").append(table(columns, missing));
}
if (!unexpected.isEmpty() && unexpected.size() == missing.size())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is extra human readable validation logic. null vs <empty> has been super hard to see from the above output, and the use case i was finding is that we had the correct rows returned, but 1 or more columns had a difference... so if this is the case we try to show which columns are off... this was super useful for detecting consistency issues with null and <empty>

{
// good chance a column differs
StringBuilder finalSb = sb;
Runnable runOnce = new Runnable()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think Alex created a reusable function for this in accord... would need to find it

@@ -731,7 +780,7 @@ private static boolean matches(ByteBuffer value, List<? extends Expression> cond
for (Expression e : conditions)
{
ByteBuffer expected = eval(e);
if (expected.equals(value))
if (expected != null && expected.equals(value))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eval is now Nullable but this code path shouldn't really ever see a null... maybe if you did something like [1] - [1] then we could get a null here... mostly added to be correct and defensive (and intellij yelled at me)

@@ -141,7 +141,7 @@ else if (((isFrozen(type) && !type.isVector()) || StorageAttachedIndex.SUPPORTED
public EnumSet<QueryType> supportedQueries(AbstractType<?> type)
{
type = type.unwrap();
if (IndexTermType.isEqOnlyType(type))
if (IndexTermType.isEqOnlyType(type) || type.isCollection() || type.isUDT() || type.isTuple())
Copy link
Contributor Author

@dcapwell dcapwell Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maedhroz should we push this into IndexTermType.isEqOnlyType?


public class ExpressionEvaluator
{
public static Optional<Object> tryEval(Expression e)
@Nullable
public static Object eval(Expression e)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning Optional was just causing confusion, and all code paths did .get() anyways.

I also needed the ability to return null rather than empty which i couldn't do with Optional

{
Object lhs = eval(e.left);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the model eval method will put the "current" value as a ByteBuffer into the operation, this then means that the types might not match! to solve this we just convert w/e we see a BB.

accum.removeAll((List<Object>) rhs);
return accum.isEmpty() ? null : accum;
}
if (lhs instanceof Map)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code path is currently untested, only part of this patch that wasn't validated

@@ -770,12 +784,12 @@ private void generateRemaining(RandomnessSource rnd,
}
}
}
if (kind == Mutation.Kind.UPDATE && isTransaction)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collections can do things outside of transactions, so need to push this into supportsOperators

typeFilter = other.typeFilter;
udtName = other.udtName;
multiCellGen = other.multiCellGen;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was missing so i took the fields in object order and recreated this constructor just in case there was more; looks like it was just this one though.

for (int i = 0; i < size; i++)
{
T value;
while (!set.add(value = gen.generate(rnd))) {}
Copy link
Contributor Author

@dcapwell dcapwell Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't as safe as uniqueBestEffort from accord (if gen is a boolean gen and size is 3, this will loop forever)... once accord merges we can talk about unifying all this stuff...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime, want to set an iteration limit with a specific error? Just to make it a bit easier to figure out what's happening if a test is timing out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now no. There are better impl in Gens and when i migrate off of quick theories we can just use the better versions; so no reason to handle here...

{
if (!(e instanceof AssignmentOperator)) return eval(e);
// multi cell collections have the property that they do update even if the current value is null
boolean isFancy = col.type().isCollection() && col.type().isMultiCell();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about the isFancy name choosing :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i felt it was fancy!

return EnumSet.of(Kind.ADD, Kind.SUBTRACT);
if (type instanceof MapType)
{
// map supports subtract, but not map - map; only map - set!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's... an interesting finding. Should we have a Jira to add proper support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wouldn't add a JIRA unless someone wants that feature. map -= acts on the key level, which can be done without a transaction (CAS or Accord), where as remove(key, value) would need to know the current value, which isn't behavior we like to support outside of transactions.

@@ -523,6 +561,8 @@ private List<Symbol> multiColumnQueryColumns()
List<Symbol> allowedColumns = searchableColumns;
if (hasMultiNodeMultiColumnAllowFilteringWithLocalWritesIssue())
allowedColumns = nonPkIndexedColumns;
if (IGNORED_ISSUES.contains(KnownIssue.SAI_AND_VECTOR_COLUMNS) && !indexes.isEmpty())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to get clean CI. Something is up with multi column queries that touch both SAI and vector type. It doesn't look like the vector must be indexed to get incorrect results, so doesn't feel like we are selecting a bad index here

for (int i = 0; i < size; i++)
{
T value;
while (!set.add(value = gen.generate(rnd))) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime, want to set an iteration limit with a specific error? Just to make it a bit easier to figure out what's happening if a test is timing out.

@@ -424,7 +448,8 @@ public State(RandomSource rs, Cluster cluster)
.withoutTransaction()
.withoutTtl()
.withoutTimestamp()
.withPartitions(SourceDSL.arbitrary().pick(uniquePartitions));
.withPartitions(SourceDSL.arbitrary().pick(uniquePartitions))
.withColumnExpressions(e -> e.withOperators(Generators.fromGen(BOOLEAN_DISTRIBUTION.next(rs))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required for CASSANDRA-20449 repro at least 😄

@dcapwell
Copy link
Contributor Author

merged

@dcapwell dcapwell closed this Mar 21, 2025
@dcapwell dcapwell deleted the CASSANDRA-20460 branch March 21, 2025 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants