Skip to content

Conversation

@jphui
Copy link
Contributor

@jphui jphui commented May 13, 2025

Summary

BUG=META-21845

This PR adds WRITE pathway functionality for Aspect soft-deletion, which is a necessary step in order to support Search Backfill -- https://docs.google.com/document/d/1_Hok1RaawgBN1ESGe3D0Qcesa6sBtbHuKjMLhWNgbtM/edit?tab=t.0#bookmark=id.wcf1pnap4gv.

There are a few important Design Choices that have been made here, most should have associated notes in the code...

SoftDeletedAspect model extension

This is discussed in the document here.

As of 06/06/25, the change will be made as such:

record SoftDeletedAspect {

  /**
   * Flag to indicate if the aspect is soft deleted in GMA
   *
   * Currently, there are no cases where the aspect is soft-deleted but this flag is NOT set to
   * true: the very existence of this Aspect type indicates soft-deletion. This is to say that
   * any aspect that is not soft-deleted will simply not exist -- cannot be cast -- as this type.
   */
  gma_deleted: boolean

  /**
   * The deleted content (oldValue) of the aspect.
   * Marked as optional for backwards compatibility with existing Deleted Aspects.
   */
+  deletedContent: optional string

  /**
   * The canonical class name of the aspect.
   * Marked as optional for backwards compatibility with existing Deleted Aspects.
   */
+  canonicalName: optional string

  /**
   * Audit timestamp to track when this aspect was deleted.
   * Marked as optional for backwards compatibility with existing Deleted Aspects.
   */
+  deletedTimestamp: optional string

  /**
   * Audit information to track who deleted this aspect.
   * Marked as optional for backwards compatibility with existing Deleted Aspects.
   */
+  deletedBy: optional string
}

Note that we leave the new fields as optional because:

  1. backwards compatible support for already-deleted Aspects -- want to be able to serialize it into this new model correctly
  2. we only need this field for new schema deletions since historical records are kept in the old schema -- however, the Old Schema still writes using the same SoftDeletedAspect model, so unless we want to rewire a lot of code (*), we need this to be set as an optional field
  3. At least semantically, it is possible to "delete an already-null / deleted aspect". This can be seen in unit test failures when attempting to code anything otherwise. In such a case, there is no OldValue to reference. Note that for Search-related correctness, this is -- in theory -- OK: no OldValue means that in the old state, there were no search fields indexed and thus, during a backfill, no need to delete anything.

New Schema ONLY (not even Dual Schema)

As mentioned in point (2) above, the writes done to metadata_aspect itself will continue to be oldValue-less:

In addition, in Dual Schema and "changelog enabled" (Cold Archive is NOT set up) cases, we elect to NOT write oldValue even in the New Schema table. This is because doing so would involve significant further refactoring with few use cases.

  • service-gms has been Cold-Archived
  • ThirdPartyIntegration is the only search result with Dual Schema and is being retired.

Note that this will NOT affect the intended use cases of this added feature: backfilling deleted aspects in the Search Index, of which we only support for Assets on the New Schema :)

(*) Quick Screenshot showing that there would be major refactoring to support a 1:1 change on Old Schema:

Screenshot 2025-06-05 at 11 55 34 PM

See Testing Done section to show proof of this operation

TODO: Nullable OldValue

There is 1 notable TODO that is left here as a residual of this.

   * <p>TODO: Make oldValue @Nonnull once the Old Schema is completed deprecated OR if old schema write pathways are
   * refactored to be able to process (retrieve and then write) the old value.

Public DAO API changes

1. BaseLocalDAO -- none
2. EbeanLocalDAO -- none, only a protected (helper) method addWithOldValue() is added
3. IEbeanLocalAccess
+ addWithOldValue() -- public helper is needed since the DAO is only aware of the public interface and NEEDS to be able to call functionality that can take advantage of an OldValue for writing

isSoftDeletedAspect() changes

I've adjusted both isSoftDeletedAspect() methods in EBeanDDAOUtils to evaluate based on "ability to serialize as a SoftDeletedAspect". This reasoning is documented in the code and the unit tests make clear this behavior:

 / *
   * Currently, there are no cases where the aspect is soft-deleted but this flag is NOT set to
   * true: the very existence of this Aspect type indicates soft-deletion. This is to say that
   * any aspect that is not soft-deleted will simply not exist -- cannot be cast -- as this type.
   * /

Note that this uses a call to ValidationUtils which has a strong schema validation method and requires all non-optional fields to be set (which is good).

Testing Done

unit testing is updated :)

Note that because existing Deleted Aspect (read) tests do not fail, I conclude that there are no regressions.

DB Snapshots showing that the code works (since we're not adding READ support that recognizes these changes yet)

New Schema example showing the old value being recorded

Screenshot 2025-06-06 at 12 28 21 AM

Dual Schema example showing the old value being recorded

Screenshot 2025-06-06 at 12 24 26 AM

Old Schema example showing that the legacy "gma_deleted only" pathway is used

Screenshot 2025-06-05 at 11 50 56 PM

Checklist

@jphui jphui changed the title Add write support for Soft Delete Aspect (feat) Add write support for Soft Delete Aspect May 13, 2025
@codecov-commenter
Copy link

codecov-commenter commented May 13, 2025

Codecov Report

Attention: Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.

Project coverage is 65.92%. Comparing base (b94ec80) to head (aa36443).

Files with missing lines Patch % Lines
.../java/com/linkedin/metadata/dao/EbeanLocalDAO.java 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #534      +/-   ##
============================================
+ Coverage     65.79%   65.92%   +0.13%     
- Complexity     1595     1601       +6     
============================================
  Files           139      139              
  Lines          6230     6245      +15     
  Branches        721      722       +1     
============================================
+ Hits           4099     4117      +18     
+ Misses         1810     1808       -2     
+ Partials        321      320       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@ZihanLi58 ZihanLi58 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it only contains the write api change, does read support this new format? If not, I believe we need to add those in the same PR, otherwise services depends on this version will be break when reading the deleted aspect


@Override
@Transactional
public <ASPECT extends RecordTemplate> int addWithOldValue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to have this new method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the Public DAO Api changes:

+ addWithOldValue() -- public helper is needed since the DAO is only aware of the public interface and NEEDS to be able to call functionality that can take advantage of an OldValue for writing

Otherwise, there will be an additional get() that's performed "below" this level in order to access this data.

return addWithOptimisticLocking(urn, null, newValue, aspectClass, auditStamp, oldTimestamp, ingestionTrackingContext, isTestMode);
}

private <ASPECT extends RecordTemplate> int addWithOptimisticLocking(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also change this name to be updateWithOptimisticLocking. The current name makes me feel even it's an add, you will update the db with new value and old Value.

* @param trackingContext tracking context
* @param isTestMode test mode
*/
protected <ASPECT extends RecordTemplate> void insertWithOldValue(@Nonnull URN urn, @Nullable RecordTemplate oldValue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an we just create another insert method with different parameters, feel like different name does not make too much sense here

private static boolean isSoftDeletedAspect(@Nonnull String metadata) {
try {
SoftDeletedAspect aspect = RecordUtils.toRecordTemplate(SoftDeletedAspect.class, sqlRow.getString(columnName));
return aspect.hasGma_deleted() && aspect.isGma_deleted();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this change? old check does not work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another word, I'm asking why not use isGma_deleted to check instead of checkin against schema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is explained the isSoftDeletedAspect() changes section:

 / *
   * Currently, there are no cases where the aspect is soft-deleted but this flag is NOT set to
   * true: the very existence of this Aspect type indicates soft-deletion. This is to say that
   * any aspect that is not soft-deleted will simply not exist -- cannot be cast -- as this type.
   * /

and also in the nearby documentation:

   * <p>This validation approach is necessary because many usages of checking soft-deletion are followed by
   * a deserialization call to {@link RecordUtils#toRecordTemplate(Class, String)}, which will fail if
   * we try to deserialize a SoftDeletedAspect -- to another Aspect Type -- with the flag set to "false".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see your point from the explanation, but some confusion may arise later on why we are relying on the schema validation to check the state of aspect.
adding a comment, something like "If this is does not fir the model of a deleted aspect, it is not not deleted" might help. just a suggestion..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both for the input, I will include my validation steps as well but still ultimately keep the original check. Note a few things:

  1. the fact of the matter is still that in the case where this boolean is set to "false", there will be some DAO errors that will result from this due to the inability to parse the returned object...
  2. ... however, this is OK because right now there are no cases that set it to "false"

@jphui
Copy link
Contributor Author

jphui commented Jun 6, 2025

@ZihanLi58

I see it only contains the write api change, does read support this new format? If not, I believe we need to add those in the same PR, otherwise services depends on this version will be break when reading the deleted aspect

Read "supports" the format in the sense that it will recognize the deleted aspects correctly but will not be able to read any of the new data yet. Existing unit tests are proof / example of this.

Effectively, this means no one can consume this new data yet. I did this on purpose to:

  1. ensure a safe change
  2. break up the PR

private static boolean isSoftDeletedAspect(@Nonnull String metadata) {
try {
SoftDeletedAspect aspect = RecordUtils.toRecordTemplate(SoftDeletedAspect.class, sqlRow.getString(columnName));
return aspect.hasGma_deleted() && aspect.isGma_deleted();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see your point from the explanation, but some confusion may arise later on why we are relying on the schema validation to check the state of aspect.
adding a comment, something like "If this is does not fir the model of a deleted aspect, it is not not deleted" might help. just a suggestion..

if (oldValue == null) {
// Shouldn't happen: shouldn't run delete() on an already-null/deleted aspect, or if calling to soft-delete, should
// pass in the old value.
// Since Old Schema logic that calls this method does NOT have access to the old value (without significant refactoring)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be more helpful to throw an exception if it is NEW_SCHEMA or DUAL_SCHEMA and oldValue=null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would basically result in DAO failures for cases on the Old or Dual schemas, I think a warning log here should be sufficient for now.

log.warn(String.format(
"OldValue should not be null when NewValue is null (soft-deletion). Urn: <%s> and aspect: <%s>", urn, aspectClass));
}
return sqlUpdate.setParameter("metadata", RecordUtils.toJsonString(
Copy link
Contributor

@kotkar-pallavi kotkar-pallavi Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious if we have verified with deleting more than 1 aspect in unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the testDeleteManyWithRelationshipRemoval() test has this:

...
    // soft delete the AspectFooBar and AspectFooBaz aspects
    Collection<EntityAspectUnion> deletedAspects =
        fooDao.deleteMany(fooUrn, new HashSet<>(Arrays.asList(AspectFooBar.class, AspectFooBaz.class)), _dummyAuditStamp);
...

@Nonnull
public static <ASPECT extends RecordTemplate> SoftDeletedAspect createSoftDeletedAspect(
@Nonnull Class<ASPECT> aspectClass, @Nullable ASPECT oldValue, @Nonnull Timestamp deletedTimestamp, @Nonnull String deletedBy) {
SoftDeletedAspect sda = new SoftDeletedAspect().setGma_deleted(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit][nit] can we rename this variable to something like softDeletedAspect

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants