
SourceManager unloads transient lazy pointer on persistence #264

Open
@mcserep

In SourceManager, the persistFiles() method unloads the FileContent of the cached File entities, for (as yet unverified) memory-management reasons:

void SourceManager::persistFiles()
{
  std::lock_guard<std::mutex> guard(_createFileMutex);

  _transaction([&, this]() {
    for (const auto& p : _files)
    {
      if (_persistedFiles.find(p.second->id) == _persistedFiles.end())
        _persistedFiles.insert(p.second->id);
      else
        continue;

      try
      {
        // Directories don't have content.
        if (p.second->content &&
            _persistedContents.find(p.second->content.object_id()) ==
            _persistedContents.end())
        {
          p.second->content.load();
          _db->persist(*p.second->content);
          _persistedContents.insert(p.second->content.object_id());
        }

        _db->persist(*p.second);

        // TODO: The memory consumption should be checked to see if not
        // unloading the lazy shared pointer keeps the file content in memory.
        // If so then this line should be uncommented. The reason for not
        // unloading is that some parsers may want to read the file contents and
        // if this can be done through the File object then the file is not
        // needed to be read from disk.
        p.second->content.unload();
      }
      catch (const odb::object_already_persistent&)
      {
      }
    }
  });
}

Based on that comment, it was assumed that the file's content could be reloaded later if desired. However, File::content is an odb::lazy_shared_ptr<FileContent>, and when a file is parsed for the first time, its content is read from disk by SourceManager::getCreateFileEntry(), so the pointer is transient (not persistent). According to the ODB manual, the unload() method "for transient objects is equivalent to reset()", so reloading the pointer later with load() is not possible.
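To make the difference between transient and persistent lazy pointers concrete, here is a toy C++ model. FileContent, ToyDatabase and ToyLazyPtr are hypothetical stand-ins written only for this illustration. ToyLazyPtr is not odb::lazy_shared_ptr: it mimics just the behaviour described above, under the simplifying assumption that a pointer is persistent exactly when it knows a database and an object id.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the FileContent entity and the database.
struct FileContent { std::string content; };
using ToyDatabase = std::map<unsigned long long, FileContent>;

// Toy lazy pointer, NOT odb::lazy_shared_ptr. Assumed rules:
//  - built from an in-memory object only (db == nullptr): transient;
//  - built from a database and an object id: persistent.
struct ToyLazyPtr {
  std::shared_ptr<FileContent> obj;  // the loaded object, if any
  ToyDatabase* db = nullptr;         // non-null => persistent
  unsigned long long id = 0;         // meaningful only when persistent

  bool persistent() const { return db != nullptr; }

  // (Re)loads the object. Only a persistent pointer knows where to load from.
  std::shared_ptr<FileContent> load() {
    if (!obj && db) {
      auto it = db->find(id);
      if (it != db->end()) obj = std::make_shared<FileContent>(it->second);
    }
    return obj;
  }

  // Drops the in-memory object. A persistent pointer keeps (db, id), so
  // load() can bring the object back; a transient pointer keeps nothing,
  // which makes unload() equivalent to reset().
  void unload() { obj.reset(); }
};
```

In this model, unloading the pointer created from disk loses the content for good, while the pointer created from the database can be reloaded.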

The following annotated excerpt walks through the problem step by step:

for (const auto& p : _files)
{
  // not relevant code parts omitted ...

  try
  {
    if (p.second->content &&
        _persistedContents.find(p.second->content.object_id()) ==
        _persistedContents.end())
    {
      // This loads the lazy_ptr. Note that this lazy_ptr instance is transient
      // if it was read from the disk and not loaded from the database.
      p.second->content.load();
      // This persists the FileContent record in the database,
      // but the lazy pointer is still transient, since persist() does not
      // mark it as persistent. (persist() receives the FileContent object
      // behind the lazy pointer, not the lazy pointer itself.)
      _db->persist(*p.second->content);
    }

    _db->persist(*p.second);

    // This unloads the lazy_ptr, which is equal to reset() for transient pointers.
    p.second->content.unload();
  }
  catch (const odb::object_already_persistent&)
  {
  }
}

This results in bugs, e.g. in the following scenario:

  1. Run searchparser
    1. It loads all files into the SourceManager from disk as transient objects.
    2. It persists the objects. The lazy pointers to the FileContent entities are still transient.
    3. It unloads the transient lazy pointers, which is an irreversible action.
  2. Run metricsparser
    1. It loads the files from the SourceManager.
    2. The lazy pointers to the FileContent entities are now invalid and cannot be loaded.
    3. Metrics are skipped for most files, because the parser assumes the files have no content.
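The scenario can be reproduced with a toy model of the lazy pointer (definitions repeated so the block stands alone). persistFiles() and filesWithContent() are hypothetical simplifications of the searchparser and metricsparser steps, and FileContent records are keyed by the file id purely for simplicity; none of this is CodeCompass or ODB code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct FileContent { std::string content; };
using ToyDatabase = std::map<unsigned long long, FileContent>;

// Toy lazy pointer (hypothetical, not ODB): persistent iff db != nullptr.
struct ToyLazyPtr {
  std::shared_ptr<FileContent> obj;
  ToyDatabase* db = nullptr;
  unsigned long long id = 0;

  std::shared_ptr<FileContent> load() {
    if (!obj && db) {
      auto it = db->find(id);
      if (it != db->end()) obj = std::make_shared<FileContent>(it->second);
    }
    return obj;
  }
  void unload() { obj.reset(); }  // for a transient pointer: same as reset()
};

// Hypothetical File entity as cached by the source manager.
struct File {
  unsigned long long id;
  ToyLazyPtr content;
};

// Step 1 (searchparser): contents read from disk are persisted, then unloaded.
void persistFiles(std::vector<File>& cache, ToyDatabase& db) {
  for (File& f : cache) {
    auto obj = f.content.load();  // transient pointer: object already in memory
    if (!obj) continue;           // directories have no content
    db[f.id] = *obj;              // the record is stored in the database...
    // ...but the pointer still has no (db, id), i.e. it stays transient.
    f.content.unload();           // irreversible for a transient pointer
  }
}

// Step 2 (metricsparser): count cached files whose content can still be loaded.
std::size_t filesWithContent(std::vector<File>& cache) {
  std::size_t n = 0;
  for (File& f : cache)
    if (f.content.load()) ++n;
  return n;
}
```

After persistFiles() the database holds every record, yet no cached File can reach its content any more, which mirrors the skipped files in step 2.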

As a solution, one of the following could be done:

  • do not unload these transient lazy pointers upon persistence; or
  • replace them with unloaded but persistent lazy pointers; or
  • add functionality to SourceManager::getCreateFileEntry() to handle these invalidated lazy pointers when required.
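Under the same kind of toy model (all names hypothetical, not ODB or CodeCompass API), the second and third options could look roughly like the sketch below. persistFile() swaps the transient pointer for an unloaded, persistent one after persisting, and contentOf() stands in for a repair step in getCreateFileEntry() that rebinds an invalidated pointer to the database. In real ODB, the second option would presumably mean constructing an odb::lazy_shared_ptr from the database and the object id; the exact constructor should be checked in the Lazy Pointers chapter of the ODB manual.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct FileContent { std::string content; };
using ToyDatabase = std::map<unsigned long long, FileContent>;

// Toy lazy pointer (hypothetical, not ODB): persistent iff db != nullptr.
struct ToyLazyPtr {
  std::shared_ptr<FileContent> obj;
  ToyDatabase* db = nullptr;
  unsigned long long id = 0;

  std::shared_ptr<FileContent> load() {
    if (!obj && db) {
      auto it = db->find(id);
      if (it != db->end()) obj = std::make_shared<FileContent>(it->second);
    }
    return obj;
  }
  void unload() { obj.reset(); }
};

// Hypothetical File entity; content records are keyed by file id for simplicity.
struct File {
  unsigned long long id;
  ToyLazyPtr content;
};

// Option 2: after persisting, replace the transient pointer with an unloaded
// but persistent one, so memory is freed and the content stays reachable.
void persistFile(File& f, ToyDatabase& db) {
  auto obj = f.content.load();
  if (!obj) return;  // e.g. a directory: nothing to persist
  db[f.id] = *obj;
  f.content = ToyLazyPtr{nullptr, &db, f.id};
}

// Option 3: repair an invalidated pointer on demand by rebinding it to the
// database when the record exists there.
std::shared_ptr<FileContent> contentOf(File& f, ToyDatabase& db) {
  if (auto obj = f.content.load()) return obj;
  if (db.count(f.id)) f.content = ToyLazyPtr{nullptr, &db, f.id};
  return f.content.load();
}
```

The first option needs no code at all (drop the unload() call), but it keeps every file's content in memory for the lifetime of the SourceManager, which is exactly the memory concern raised in the original TODO comment.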

Open for discussion!

Metadata

Assignees: No one assigned
Labels: Kind: Bug ⚠️, Target: Database (Issues related to the database schema of the core or a plugin, or database handling in general.)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
