Resolution plan for divergent types issues #5

@pudo

So these are all the properties identified by @tillprochaska's script which have different types in different schemata and thus create a mess when trying to generate an ES schema from FtM entities. Here are some proposals for how to address each:

author:

  • Assessment:author - entity
  • Document:author - string

Resolution: Unsure. @tillprochaska, can you determine OCCRP's exposure to Assessment in general? I never quite figured out what that's supposed to mean. We could rename Assessment:author to Assessment:authorEntity, but I wonder if we can explore abolishing Assessment entirely, or at least clarify its semantics?

organization:

  • Position:organization - entity
  • Post:organization - string
  • Membership:organization - entity
  • Directorship:organization - entity

Resolution: Post is already meant to be removed in FtM 4.0, which resolves the conflict.

number:

  • UserAccount:number - phone
  • Identification:number - identifier

Resolution: propose moving UserAccount:number to UserAccount:phone. Can we get a sense of how many such values are set, @tillprochaska, and possibly also @catileptic/@simonwoerpel?
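A rough sketch of how such a count could be taken from an entity dump. It assumes entities are plain dicts with `schema` and `properties` keys (property name mapped to a list of values); `count_property_values` is a hypothetical helper, not part of FtM, and it does not account for schema inheritance.

```python
def count_property_values(entities, schema, prop):
    """Return (entities that set prop, total values) for one schema.

    Assumes each entity is a dict with "schema" and "properties" keys.
    Descendant schemata are NOT included; only exact schema matches count.
    """
    entity_count = 0
    value_count = 0
    for entity in entities:
        if entity.get("schema") != schema:
            continue
        values = entity.get("properties", {}).get(prop, [])
        if values:
            entity_count += 1
            value_count += len(values)
    return entity_count, value_count


# Made-up sample data, for illustration only.
sample = [
    {"id": "a", "schema": "UserAccount", "properties": {"number": ["+49301234567"]}},
    {"id": "b", "schema": "UserAccount", "properties": {"username": ["jane"]}},
    {"id": "c", "schema": "Identification", "properties": {"number": ["X123"]}},
]

print(count_property_values(sample, "UserAccount", "number"))
```

For a real index, an aggregation on the ES side would probably be cheaper than streaming entities, but the numbers should answer the same question.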

authority:

  • Sanction:authority - string
  • CallForTenders:authority - entity
  • Contract:authority - entity
  • Identification:authority - string

Resolution: complete mess. I don't know. We use both Sanction and Identification extensively and would need to post a change announcement to our customers if we move either of them.

duration:

  • Video:duration - number
  • Call:duration - number
  • Audio:duration - number
  • Sanction:duration - string

Resolution: shift Sanction:duration to Sanction:term - or Risk:something?

area:

  • RealEstate:area - number
  • License:area - string

Resolution: License is likely much less used. We could rename License:area to scope? Or to location with type: address? @tillprochaska, can you get an index count here, too?

subject:

  • Message:subject - string
  • Email:subject - string
  • UnknownLink:subject - entity

Resolution: nasty. subject is a key part of UnknownLink, so I'm actually wondering if we want to replace the whole schema (e.g. with a new OtherLink) and make the transition schema -> schema, rather than breaking UnknownLink really hard.
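To make the schema -> schema idea concrete, here is a minimal sketch of what a per-entity transition could look like. It assumes entities are plain dicts with `schema` and `properties` keys; `migrate_entity` is a hypothetical helper, and the target property name `linkSubject` is made up for illustration, since no replacement name has been decided.

```python
def migrate_entity(entity, old_schema, new_schema, renames):
    """Return a copy of the entity moved to new_schema with renamed properties.

    Entities of any other schema are returned unchanged.
    `renames` maps old property names to new ones; unlisted names are kept.
    """
    if entity.get("schema") != old_schema:
        return entity
    props = {}
    for name, values in entity.get("properties", {}).items():
        target = renames.get(name, name)
        # Merge values if two old properties end up on the same new name.
        props.setdefault(target, []).extend(values)
    return {**entity, "schema": new_schema, "properties": props}


# Made-up sample link, for illustration only.
link = {
    "id": "l1",
    "schema": "UnknownLink",
    "properties": {"subject": ["p1"], "object": ["p2"]},
}

# "linkSubject" is a placeholder name, not a proposal.
migrated = migrate_entity(link, "UnknownLink", "OtherLink", {"subject": "linkSubject"})
print(migrated)
```

The entity ID is left untouched here, so references to the link from other entities would keep working; whether that holds for a real migration is an open question.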

sender:

  • Email:sender - string
  • Message:sender - entity
  • EconomicActivity:sender - entity

Resolution: really nasty. This is emitted for basically every email by the ingestors, so moving Email:sender to Email:senderName is a massive installed base problem.
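For anyone wanting to re-check the list after changes, the kind of test that surfaces these conflicts can be sketched as below. This is not @tillprochaska's script, just an illustration: it assumes the model has been flattened into a plain dict of schema name -> property name -> type name, and `find_divergent_properties` is a hypothetical helper.

```python
from collections import defaultdict


def find_divergent_properties(schemata):
    """Return properties whose type differs between schemata.

    `schemata` maps schema name -> {property name -> type name}.
    The result maps property name -> {schema name -> type name},
    restricted to properties that carry more than one distinct type.
    """
    types_by_name = defaultdict(dict)
    for schema, props in schemata.items():
        for name, type_name in props.items():
            types_by_name[name][schema] = type_name
    return {
        name: by_schema
        for name, by_schema in types_by_name.items()
        if len(set(by_schema.values())) > 1
    }


# A small excerpt of the cases listed above, typed in by hand.
schemata = {
    "RealEstate": {"area": "number"},
    "License": {"area": "string"},
    "Video": {"duration": "number"},
    "Audio": {"duration": "number"},
}

divergent = find_divergent_properties(schemata)
print(divergent)
```

Only `area` is reported for this excerpt, since `duration` is a number in both schemata included here.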
