Description
In SourceManager
the persistFiles()
method unloads the FileContent
for the cached File
entities for (unverified) memory management considerations:
void SourceManager::persistFiles()
{
std::lock_guard<std::mutex> guard(_createFileMutex);
_transaction([&, this]() {
for (const auto& p : _files)
{
if (_persistedFiles.find(p.second->id) == _persistedFiles.end())
_persistedFiles.insert(p.second->id);
else
continue;
try
{
// Directories don't have content.
if (p.second->content &&
_persistedContents.find(p.second->content.object_id()) ==
_persistedContents.end())
{
p.second->content.load();
_db->persist(*p.second->content);
_persistedContents.insert(p.second->content.object_id());
}
_db->persist(*p.second);
// TODO: The memory consumption should be checked to see if not
// unloading the lazy shared pointer keeps the file content in memory.
// If so then this line should be uncommented. The reason for not
// unloading is that some parsers may want to read the file contents and
// if this can be done through the File object then the file is not
// needed to be read from disk.
p.second->content.unload();
}
catch (const odb::object_already_persistent&)
{
}
}
});
}
Based on the given comment it was assumed that the file's content could be reloaded later if desired. However File::content
is an odb::lazy_shared_ptr<FileContent>
and in case we parse it for the first time it is loaded from the disk by the SourceManager::getCreateFileEntry()
and the pointer is transient (not persisted). According to the ODB manual the unload()
method "for transient objects is equivalent to reset()
" and therefore reloading the pointer later with load()
is not possible.
Follow through the explanation among the code lines:
for (const auto& p : _files)
{
// not relevant code parts omitted ...
try
{
// This loads the lazy_ptr. Note that this lazy_ptr instance is transient
// if it was read from the disk and not loaded from the database.
p.second->content.load();
// This persists the FileContent record in the database,
// but the lazy pointer is still transient, since the persist() method
// does not marks is persisted. (For the persist() method the
// shared_ptr behind lazy_ptr was passed in the first place).
_db->persist(*p.second->content);
}
_db->persist(*p.second);
// This unloads the lazy_ptr, which is equal to reset() for transient pointers
p.second->content.unload();
}
}
This results to bugs, e.g. with the following scenario:
- Run searchparser
- Loads all files into SourceManager from disk as transient objects.
- Persist the objects. The lazy pointer for
FileContent
entities are still transient. - Unloads the transient lazy pointers, which is an irreversible action.
- Run metricsparser
- Loads files from SourceManager.
- The lazy pointers for the
FileContent
entities are now invalid and cannot be loaded. - Metrics is skipped for most files, because it assumes the files have no content.
As a solution these transient lazy pointers:
- should not be unloaded upon persistence; or
- should be replaced with an unloaded, but persisted lazy pointer; or
- functionality should be added into
SourceManager::getCreateFileEntry()
to handle these invalidated lazy pointers when required.
Open for discussion!