Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"plone.app.contenttypes.migration.migration.migrate_folders" introduces catalog inconsistencies #556

Open
d-maurer opened this issue Jun 15, 2020 · 6 comments

Comments

@d-maurer
Copy link

Affected version: 2.1.7

In the following transacript, p represents a Plone portal. A folder hierarchy test_migration, f1, f2, d is created and the catalog is patched to limit the search to objects below test_migration. Before, the call of migrate_folders, the catalog is consistent; after the call, the catalog is inconsistent.

>>> from Products.ATContentTypes.content.folder import ATFolder
>>> from Products.ATContentTypes.content.document import ATDocument
>>> from plone.app.contenttypes.migration.migration import migrate_folders
>>> 
>>> def add(dest, id, type):
...   obj = type(id)
...   oid = dest._setObject(obj.getId(), obj)
...   return dest._getOb(oid)
... 
>>> c = p.portal_catalog
>>> top = add(p, "test_migration", ATFolder)
>>> f1 = add(top, "f1", ATFolder)
>>> f2 = add(f1, "f2", ATFolder)
>>> d = add(f2, "d", ATDocument)
>>> 
>>> orig_search = c.__call__.__func__
>>> 
>>> def patched_search(self, q):
...   q["path"] = "/".join(top.getPhysicalPath())
...   return orig_search(self, q)
... 
>>> c.__class__.__call__ = patched_search
>>> 
>>> c({})
[<Products.ZCatalog.Catalog.mybrains object at 0x7fc415d98230>, <Products.ZCatalog.Catalog.mybrains object at 0x7fc417422550>, <Products.ZCatalog.Catalog.mybrains object at 0x7fc417422280>, <Products.ZCatalog.Catalog.mybrains object at 0x7fc4174228c0>]
>>> 
>>> migrate_folders(p)
>>> 
>>> c({})
ERROR:Zope.ZCatalog:uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /partner/bp3/test_migration_MIGRATION_. 
ERROR:Zope.ZCatalog:uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /partner/bp3/test_migration/f1/f2_MIGRATION_. 
ERROR:Zope.ZCatalog:uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /partner/bp3/test_migration/f1_MIGRATION_. 
ERROR:Products.ZCatalog:A different document with value '106ed81ac36144f18fa77af52d71d844' already exists in the index.
ERROR:Products.ZCatalog:A different document with value '106ed81ac36144f18fa77af52d71d844' already exists in the index.
[]

The reason is a bad interaction with the Products.CMFCore.indexing.IndexQueue. This queue registers indexing requests for objects, more precisely acquisition wrappers for objects. During migrate_folders, migrated folders have their children removed and are renamed; the children are then added to the migrated folder. This creates indexing requests for the decendents in the queue. If during the migration a descendent is migrated, its renaming modifies those indexing requests as an unwanted side effect.

@d-maurer
Copy link
Author

The initial code became victim of the same effect that causes the catalog inconsistencies in the first place: the migration modifies the result of getPhysicalPath. Below the code demonstrating the issue is fixed.

>>> from Products.ATContentTypes.content.folder import ATFolder
>>> from Products.ATContentTypes.content.document import ATDocument
>>> from plone.app.contenttypes.migration.migration import migrate_folders
>>> 
>>> def add(dest, id, type):
...   obj = type(id)
...   oid = dest._setObject(obj.getId(), obj)
...   return dest._getOb(oid)
... 
>>> c = p.portal_catalog
>>> top = add(p, "test_migration", ATFolder)
>>> f1 = add(top, "f1", ATFolder)
>>> f2 = add(f1, "f2", ATFolder)
>>> d = add(f2, "d", ATDocument)
>>> 
>>> top_path = "/".join(top.getPhysicalPath())
>>> 
>>> orig_search = c.__call__.__func__
>>> 
>>> def patched_search(self, q={}):
...   q = q.copy()
...   q["path"] = top_path
...   return orig_search(self, q)
... 
>>> c.__class__.__call__ = patched_search
>>> 
>>> c({})
[<Products.ZCatalog.Catalog.mybrains object at 0x7f5deee3abe0>, <Products.ZCatalog.Catalog.mybrains object at 0x7f5deee3a960>, <Products.ZCatalog.Catalog.mybrains object at 0x7f5deee3f050>, <Products.ZCatalog.Catalog.mybrains object at 0x7f5deee3f190>]
>>> 
>>> migrate_folders(p)
>>> 
>>> for x in c({}): path = x.getPath(); print(path[path.index("test_migration"):])
... 
ERROR:Zope.ZCatalog:uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /partner/bp3/test_migration_MIGRATION_. 
ERROR:Zope.ZCatalog:uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /partner/bp3/test_migration/f1/f2_MIGRATION_. 
ERROR:Zope.ZCatalog:uncatalogObject unsuccessfully attempted to uncatalog an object with a uid of /partner/bp3/test_migration/f1_MIGRATION_. 
ERROR:Products.ZCatalog:A different document with value '0b5477edf6f04fb5991bcbb99461e00a' already exists in the index.
ERROR:Products.ZCatalog:A different document with value '0b5477edf6f04fb5991bcbb99461e00a' already exists in the index.
test_migration
test_migration/f1
test_migration/f1/f2
test_migration/f1/f2/d
test_migration/f1_MIGRATION_/f2_MIGRATION_/d
test_migration/f1_MIGRATION_/f2_MIGRATION_
test_migration/f1/f2_MIGRATION_/d

@d-maurer
Copy link
Author

The following patches solve the problem (at least for the fairly simple example case):

from Products.contentmigration.basemigrator.migrator import ItemMigrationMixin, FolderMigrationMixin

# patch `FolderMigrationMixin`
def migrate_children(self):
  """Copy childish objects from the old folder to the new one
  """
  for id, obj in self.old.objectItems(): # this should keep order
    __traceback_info__ = __traceback_info__ = ('migrate_children',
                  self.old, self.orig_id, 'Migrating subobject %s' % id)
    # do not reindex - if a descendent needs reindexing
    #  it calls for it by itself
    self.new._setObject(id, obj, set_owner=0, suppress_events=True)

FolderMigrationMixin.migrate_children = migrate_children
FolderMigrationMixin.beforeChange_storeSubojects = lambda self: None

# patch `ItemMigrationMixin`
def renameOld(self):
  # do not unindex: the object will be replaced which will call a reindex
  self.parent._delObject(self.orig_id, suppress_events=True)

ItemMigrationMixin.renameOld = renameOld
ItemMigrationMixin.remove = lambda self: None # already in `renameOld`

@d-maurer
Copy link
Author

The inconsistencies (at least the strange ones with the catalog indexes inconsistent with the catalog metadata) are caused by Products.CMFCore.CMFCatalogAware.CatalogAware.reindexObjectSecurity -- again in interaction with the catalog's IndexQueue. They arise as follows.

When an object o is newly created, its o.reindexObjectSecurity is called. This determines the descendants of o via a catalog search and calls for every descendant d d.reindexObject to reindex the indexes related to object security. The descendants may or may not yet be migrated. The problem occurs for not yet migrated ones. Their reindexing populates the IndexQueue with indexing requests for not migrated objects. When those objects are later migrated, a reindexing request is put into the IndexQueue. Index requests for non migrated and migrated objects can have the same path; then they affect the same catalog data. When the IndexQueue is optimized, collapsable requests are NOT identified via the path (likely a CMFCore bug) but via an object hash. This prevents that those indexing requests are folded into a single request even though they affect the same catalog data. Nevertheless, everything would be fine if the indexing requests were honored in the same order as they were posted. However, the optimization introduces a random order which may cause unmigrated object indexation to follow the migrated object indexation -- and this introduces the inconsistencies.

I have the strong feeling that this is primarily a bug in Products.CMFCore.indexing.IndexQueue. As all indexing requests with the same path modify the same catalog data, this path (and nothing else) should be used to determine when two indexing requests should be merged. In addition, the actual indexing should use the real object - not one remembered at the time of the indexing request (but alternatively, it could use the latest remembered object).

I have worked around the problem in my real work migration via:

from Products.CMFCore.CMFCatalogAware import CatalogAware

CatalogAware.reindexObjectSecurity = lambda *args, **kw: None

This disables reindexObjectSecurity. As we are migrating, nothing should change with respect to security. Thus, if the migration truthfully maintains security for each migrated object, no special things should be necessary.

@dest81
Copy link

dest81 commented Sep 18, 2020

The following patches solve the problem (at least for the fairly simple example case):

from Products.contentmigration.basemigrator.migrator import ItemMigrationMixin, FolderMigrationMixin

# patch `FolderMigrationMixin`
def migrate_children(self):
  """Copy childish objects from the old folder to the new one
  """
  for id, obj in self.old.objectItems(): # this should keep order
    __traceback_info__ = __traceback_info__ = ('migrate_children',
                  self.old, self.orig_id, 'Migrating subobject %s' % id)
    # do not reindex - if a descendent needs reindexing
    #  it calls for it by itself
    self.new._setObject(id, obj, set_owner=0, suppress_events=True)

FolderMigrationMixin.migrate_children = migrate_children
FolderMigrationMixin.beforeChange_storeSubojects = lambda self: None

# patch `ItemMigrationMixin`
def renameOld(self):
  # do not unindex: the object will be replaced which will call a reindex
  self.parent._delObject(self.orig_id, suppress_events=True)

ItemMigrationMixin.renameOld = renameOld
ItemMigrationMixin.remove = lambda self: None # already in `renameOld`

Good trick :)

No uncatalog errors but after migration I run this code to check objects

def check_objects(portal_types=[]):
    catalog = api.portal.get_tool('portal_catalog')
    for content_brain in catalog.unrestrictedSearchResults(
            portal_type=portal_types):
        obj = content_brain.getObject()
        if not IDexterityContent.providedBy(obj):
            log.error("%s is not dexterity" % obj)

and got many Archetypes objects

Deep debugging shows weird thing in the createNew function. For example I am migrating two ATFolder objects:

/plone/ParentFolder
/plone/ParentFolder/ChildFolder

If I migrate parent object first and then migrate child object self.parent for this child object self.parent is still Archetypes.

def createNew(self):
        """Create the new object
        """
        IDexterityContent.providedBy(self.parent) # returns False
        _createObjectByType(self.dst_portal_type, self.parent, self.new_id)
        self.new = getattr(aq_inner(self.parent).aq_explicit, self.new_id)

It seems it doesn't take newly created parent dexterity folder and takes old from memory. To fix this I have patched BaseMigrator.init method:
self.parent = portal.unrestrictedTraverse(aq_parent(self.old).getPhysicalPath())

Not sure I understand fully what's going on but your trick together with my seems to be working for me.

@d-maurer
Copy link
Author

d-maurer commented Sep 18, 2020 via email

@dest81
Copy link

dest81 commented Sep 22, 2020

@d-maurer Thanks for your tips,
I had used this trick before

from Products.CMFCore.CMFCatalogAware import CatalogAware

CatalogAware.reindexObjectSecurity = lambda *args, **kw: None

and it didn't solve the issue. It made migration twice faster and thanks for that :) but I still have to patch BaseMigrator.init

Anyway glad to hear that it wasn't just my issue and I am moving in correct direction.
Thank you again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants