Skip to content

CharacterizationService seems to interact badly with versioning #398

@dmolesUC

Description

@dmolesUC

(Where we've actually observed the problem is in over in Hyrax's ValkyrieCharacterizationService, and I've filed samvera/hyrax#7079 there, but it looks like the Valkyrie behavior is taken straight from the Hydra::Works version, so if it's going to change in one place, it should probably change in both.)

Characterization can return either a single value for each property, or an array of values. In the case of an array, we want to store the whole array, except in certain cases (currently hard-coded to be MIME type, height, and width, which have special handling). The way this is accomplished is that the store_metadata method treats each value as an array (possibly of length 1), and iterates over the array of values, calling append_property_value on each.

    def store_metadata(terms)
      terms.each_pair do |term, value|
        property = property_for(term)
        next if property.nil?
        # Array-ify the value to avoid a conditional here
        Array(value).each { |v| append_property_value(property, v) }
      end
    end

Each call to append_property_value retrieves the current value or values from the object, and appends the new one.

    def append_property_value(property, value)
      # We don't want multiple mime_types; this overwrites each time to accept last value
      value = object.send(property) + [value] unless property == :mime_type
      # We don't want multiple heights / widths, pick the max
      value = value.map(&:to_i).max.to_s if property == :height || property == :width
      object.send("#{property}=", value)
    end

This works fine if we start with an empty metadata object, but if we start with one that has values from a previous version, we end up retaining the old values and appending to them, when we should be replacing them instead.

I would propose that instead of repeatedly getting and setting the values from/on FileMetadata, we ditch append_property_value, filter the values up front and just make one #{property}= call:

    def store_metadata(terms)
      terms.each do |term, value|
        next unless (property = property_for(term))

        # could also extract this into a `filter_value` method
        # and put it where `append_property_value` is now
        if property == :mime_type
          # keep only the last mime type
          value = Array(value).last
        elsif [:height, :width].include?(property)
          # keep only the max height or width
          value = Array(value).map(&:to_i).max.to_s
        end

        object.send("#{property}=", value)
      end
    end

Does this seem worthwhile?

Or is there a use case for retaining / appending to the values from old versions that I'm not seeing?

Or is there a difference in the hydra-works / Fedora behavior that makes this not a problem on this side of the fence?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions