Inlining fields in a future-proof way #8

@jbothma

We often want to inline values of complex type to aid indexing/matching and presentation without traversing the graph to fetch those entities.

e.g.

  • Address into address
  • Identification into
  • Sanction (program) potentially for programCode
  • Categorisation potentially, e.g. the Czech statistical categorisation

A concern with this is that new identifier types that we'd like to support keep appearing. With the current approach, each new type requires a schema change.

It would be nice to consider options to support inlining identifiers in a way that scales. This could potentially also provide a strategy we apply when we inline other kinds of things.

Some things to consider:

  • It's nice to be able to apply stronger validation to specific types with well-defined formats as we currently do with specific identifier props
  • It might be good to have a way to version these, to support migrating from a scheme if we need to
  • Some types are only applicable to some schemata, e.g. Organization:giiNumber but Company:bikCode. That's easy to express as properties.
  • We have varying degrees of certainty about the type of a value, e.g. we might only know that it is a bank account number, or we might also know that it's Russian. A source might express it as a generic identification value with no further information, or even indicate its type incorrectly.
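The first and last considerations could fit together roughly as follows: a minimal sketch in which known types get a strict format check and everything else passes through unchecked. The registry name `VALIDATORS`, the function `check`, and the nine-digit pattern for `ru_bik` are all assumptions made for illustration, not part of any existing schema or library.

```python
import re

# Hypothetical registry mapping a type name to a format pattern.
# The pattern for "ru_bik" is an assumed format used only for illustration.
VALIDATORS = {
    "ru_bik": re.compile(r"\d{9}"),
}


def check(type_name: str, value: str) -> str:
    """Return 'valid', 'invalid', or 'unchecked' for a typed value."""
    pattern = VALIDATORS.get(type_name)
    if pattern is None:
        # No format is known for this type, so the value is kept as-is.
        return "unchecked"
    return "valid" if pattern.fullmatch(value) else "invalid"
```

With this shape, supporting a new identifier type means adding a registry entry rather than changing the schema, and a value whose type is uncertain can still be stored under a generic type.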

A couple of ideas we've floated:

  1. Simply index nested versions
  2. Add some kind of scheme or type information to inlined values
    • e.g. 1:ru_bik:123456789 where
      • 1 is the version
      • ru_bik is the type
      • 123456789 is the value
    • To aid matching, more generic types could be defined for related but distinct types, with a library that fills down to progressively more generic types. For example, ru_bik could also be placed under bik and bank_account, either before publication or just before indexing
    • Types might indicate how strong and how specific they are, which might be used in scoring matches
    • Maybe categorisation could be expressed as a particularly weak form of an identifier
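Idea (2) could be sketched as below: parsing the `version:type:value` form and filling down to more generic types. The names `Scoped`, `parse`, `encode`, `fill_down` and the `PARENTS` table are hypothetical, and the chain ru_bik → bik → bank_account is only one assumed reading of the example above.

```python
from typing import List, NamedTuple


class Scoped(NamedTuple):
    version: int
    kind: str
    value: str


# Hypothetical hierarchy: each type points at its next more generic type.
PARENTS = {"ru_bik": "bik", "bik": "bank_account"}


def parse(raw: str) -> Scoped:
    """Split 'version:type:value'; colons inside the value are preserved."""
    version, kind, value = raw.split(":", 2)
    return Scoped(int(version), kind, value)


def encode(scoped: Scoped) -> str:
    """Serialise back to the 'version:type:value' form."""
    return f"{scoped.version}:{scoped.kind}:{scoped.value}"


def fill_down(scoped: Scoped) -> List[Scoped]:
    """Return the value under its own type and every more generic type."""
    result = [scoped]
    kind = scoped.kind
    while kind in PARENTS:
        kind = PARENTS[kind]
        result.append(scoped._replace(kind=kind))
    return result
```

For `1:ru_bik:123456789`, `fill_down` yields the same value under `ru_bik`, `bik` and `bank_account`, so a record typed only as `bank_account` elsewhere can still match it at index time.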

If we go in the direction of (2), we might want to apply a similar approach to inlining other kinds of fields, e.g. programCode on the entity might be scoped from the start.
