Matt Williams edited this page Jul 1, 2016 · 3 revisions

Introduction

A proposal for splitting the duties of GangaObject into two classes:

  1. one which is a standard in-memory object with a _data-backed store (GangaObject)
  2. one which is stored in a registry and stores all its data in the registry (RegistryObject)

RegistryObject

In all of Ganga, only a very small number of object types are ever stored in registries (ignoring the box registry for now): Job, Task, and ShareRef, and of those only Job is ever lazy-loaded. Nonetheless, every GangaObject needs to worry about both lazy-loading (_index_cache etc.) and registry membership. This could be simplified by creating a subclass of GangaObject that knows how to deal with these things, so that GangaObject itself (particularly its descriptor getters and setters) could be reduced to standard _dict access.

GangaObjects would never be lazy-loaded and would always have all their data stored in _data, as is the case for most objects at the moment. A RegistryObject would be a very thin wrapper which would only store an id and a _registry attribute. Any call to, for example, j.status would be redirected to something like j._registry.get_attribute(j.id, 'status'). It would then be the responsibility of the registry to decide how to implement get_attribute(). For example, JobRegistry would try to get the information from the cache in preference to loading the object fully, whereas PrepRegistry would always get the session lock and load the XML from disk.
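The redirection described above might look roughly like the following sketch. It is not Ganga code: CacheFirstRegistry, AlwaysLoadRegistry, the full_loads counter and the plain dictionaries standing in for the index cache and the on-disk XML are all hypothetical, and the real session locking is omitted. Only the wrapper's id and _registry attributes and the get_attribute(id, name) call come from the proposal.

```python
class CacheFirstRegistry:
    """Hypothetical JobRegistry-like policy: prefer the index cache."""

    def __init__(self, index_cache, on_disk):
        self._index_cache = index_cache  # id -> partial attribute dict
        self._on_disk = on_disk          # id -> full attribute dict (stand-in for XML)
        self.full_loads = 0              # counts simulated full loads

    def get_attribute(self, obj_id, name):
        cached = self._index_cache.get(obj_id, {})
        if name in cached:
            return cached[name]
        # Not in the cache: fall back to a (simulated) full load.
        self.full_loads += 1
        return self._on_disk[obj_id][name]


class AlwaysLoadRegistry:
    """Hypothetical PrepRegistry-like policy: always read the full record."""

    def __init__(self, on_disk):
        self._on_disk = on_disk
        self.full_loads = 0

    def get_attribute(self, obj_id, name):
        # A real registry would take the session lock and parse XML here.
        self.full_loads += 1
        return self._on_disk[obj_id][name]


class RegistryObject:
    """Thin wrapper holding only an id and a reference to its registry."""

    def __init__(self, obj_id, registry):
        self.id = obj_id
        self._registry = registry

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so id and _registry
        # are served from the instance as usual.
        if name.startswith('_'):
            raise AttributeError(name)
        return self._registry.get_attribute(self.id, name)


disk = {7: {'status': 'completed', 'name': 'analysis'}}
job_registry = CacheFirstRegistry({7: {'status': 'completed'}}, disk)
j = RegistryObject(7, job_registry)
status = j.status  # answered from the cache, no full load
name = j.name      # not cached, so the registry loads the full record
```

The wrapper itself never needs to know which policy is in use; swapping CacheFirstRegistry for AlwaysLoadRegistry changes the loading behaviour without touching RegistryObject.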

The first step in this process would be creating a subclass of GangaObject as simple as:

class RegistryObject(GangaObject):
    pass

and making Job derive from it.

Registry changes

The registry will now be responsible for storing all the data about its items directly. So instead of having Registry._objects be a list of GangaObjects, it will be a table of data, indexed by job id.
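As a rough illustration of that layout, the table could be as simple as a dictionary of dictionaries. This is a sketch under assumptions: TableRegistry, add() and set_attribute() are hypothetical names, and a real registry would also handle persistence and locking.

```python
class TableRegistry:
    """Hypothetical registry whose _objects is a table keyed by job id."""

    def __init__(self):
        self._objects = {}  # job id -> {attribute name -> value}
        self._next_id = 0

    def add(self, **attributes):
        # Store a new row and hand back the id that refers to it.
        obj_id = self._next_id
        self._next_id += 1
        self._objects[obj_id] = dict(attributes)
        return obj_id

    def get_attribute(self, obj_id, name):
        return self._objects[obj_id][name]

    def set_attribute(self, obj_id, name, value):
        self._objects[obj_id][name] = value


registry = TableRegistry()
job_id = registry.add(status='new', name='demo')
registry.set_attribute(job_id, 'status', 'submitted')
```

Because callers only ever hold an id, the registry is free to change how a row is stored or when it is loaded.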

We then overload RegistryObject's Descriptor (via some simple metaclass magic) so that __get__ and __set__ go through the registry (e.g. Registry.get_attribute()) instead of accessing _data. get_attribute() will query the data table and return the appropriate information. The registry can at this point decide whether to return info from the index cache or fully load the object from disk and give that data instead.
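One way the descriptor overloading could be sketched is shown below. It uses Python 3 metaclass syntax and is not Ganga's actual Descriptor or metaclass: RegistryDescriptor, RegistryObjectMeta, the _schema tuple, DictRegistry and set_attribute() are all hypothetical names chosen for illustration.

```python
class RegistryDescriptor:
    """Hypothetical descriptor that forwards attribute access to the registry."""

    def __init__(self, name):
        self._name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        return obj._registry.get_attribute(obj.id, self._name)

    def __set__(self, obj, value):
        obj._registry.set_attribute(obj.id, self._name, value)


class RegistryObjectMeta(type):
    """Installs a RegistryDescriptor for every name listed in _schema."""

    def __new__(mcs, clsname, bases, namespace):
        for attr in namespace.get('_schema', ()):
            namespace[attr] = RegistryDescriptor(attr)
        return super().__new__(mcs, clsname, bases, namespace)


class RegistryObject(metaclass=RegistryObjectMeta):
    _schema = ()

    def __init__(self, obj_id, registry):
        self.id = obj_id
        self._registry = registry


class Job(RegistryObject):
    _schema = ('status', 'name')  # assumed attribute names


class DictRegistry:
    """Minimal stand-in registry backed by a table indexed by id."""

    def __init__(self, table):
        self._table = table

    def get_attribute(self, obj_id, name):
        return self._table[obj_id][name]

    def set_attribute(self, obj_id, name, value):
        self._table[obj_id][name] = value


registry = DictRegistry({3: {'status': 'new', 'name': 'demo'}})
j = Job(3, registry)
j.status = 'submitted'  # written to the registry table, not to the instance
```

Since the descriptors define both __get__ and __set__, they take precedence over the instance dictionary, so the Job instance never holds a private copy of status or name.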

Open questions

  1. The box registry: Since we've reduced the types that can be stored in registries, the box will need to be rewritten somewhat. It needs some thought but should not be allowed to prevent progress in other, more important areas.
  2. Subjobs: To first order this will continue to work as it does now, but it opens up possibilities in the future for harmonising the job registry and SubJobXMLList.