Delay proposals of values with no likelihoods in the current subproblem #9

Open
@alex-lew

Consider the program

class NameWithNickname begin
  true_name ~ string_prior(1, 30) preferring all_first_names
  nickname ~ string_prior(1, 30) preferring all_first_names
end
class Person begin
  fname ~ NameWithNickname
  lname ~ string_prior(1, 30) preferring all_last_names
end
class Record begin
  person ~ Person
  name ~ uniform([person.fname.true_name, person.fname.nickname])
end

The problem here is that when PClean first processes a record, the observed name is (by design) assumed to be either the person’s true first name or their nickname, but PClean will still try to initialize both latent fields. Suppose the first record shows a person’s full first name, and PClean correctly infers that it is the true name. If a later record shows the same person’s nickname, you won’t be able to assign that record to the same “person” object, because the object will already have some other nickname, generated from the prior.

If we could delay the proposal of the "other" latent until we have evidence for it, we could circumvent this issue and do accurate inference in models like this.
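
To make the failure mode concrete, here is a small toy sketch in Python. It is not PClean code and does not reflect PClean's internals: `FirstName`, `sample_from_prior`, `process_first_record`, and `try_assign` are all hypothetical names invented for illustration, and the "prior draw" is a fixed placeholder string rather than a real sample. The sketch assumes the first record's name has already been correctly identified as the true name, and only contrasts eager initialization of the nickname with leaving it unproposed.

```python
from dataclasses import dataclass
from typing import Optional


def sample_from_prior(field: str) -> str:
    # Hypothetical stand-in for a draw from the string prior.
    # A real draw would almost never equal the nickname seen later.
    return f"<prior draw for {field}>"


@dataclass
class FirstName:
    # None means "this latent has not been proposed yet".
    true_name: Optional[str] = None
    nickname: Optional[str] = None


def process_first_record(observed: str, delay: bool) -> FirstName:
    # Assumption: inference correctly decides the observation is the true name.
    fname = FirstName(true_name=observed)
    if not delay:
        # Eager behavior: fill in the unobserved latent from the prior.
        fname.nickname = sample_from_prior("nickname")
    return fname


def try_assign(fname: FirstName, observed: str) -> bool:
    # A record can reuse this object if its name matches an existing field...
    if observed in (fname.true_name, fname.nickname):
        return True
    # ...or if a still-unproposed field can be bound to the new evidence.
    if fname.nickname is None:
        fname.nickname = observed
        return True
    if fname.true_name is None:
        fname.true_name = observed
        return True
    return False


eager = process_first_record("Alexander", delay=False)
delayed = process_first_record("Alexander", delay=True)

eager_ok = try_assign(eager, "Alex")      # nickname already taken by the prior draw
delayed_ok = try_assign(delayed, "Alex")  # nickname still free, so it binds to "Alex"
print(eager_ok, delayed_ok)
```

In this toy version the eager object rejects the second record while the delayed one accepts it and records "Alex" as the nickname. A real implementation would weigh the assignment by the prior and likelihood rather than accept any value for a free field.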

This is also very relevant for data integration across multiple sources, where different sources may report different attributes.
