Filters

Paul Götze edited this page Feb 3, 2016 · 2 revisions

Filters are used to preprocess datasets.

Filters fall into two categories, which are reflected in the namespaces:

  • supervised – The filter requires a class attribute to be set
  • unsupervised – The filter does not require a class attribute to be set

In each category there are two sub-categories:

  • attribute-based – Attributes (columns) are processed
  • instance-based – Instances (rows) are processed

Thus, Filter classes are organized in the following four namespaces:

Weka::Filters::Supervised::Attribute
Weka::Filters::Supervised::Instance

Weka::Filters::Unsupervised::Attribute
Weka::Filters::Unsupervised::Instance
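
The naming scheme can be sketched in plain Ruby: two choices (supervised or not, attribute-based or instance-based) pick one of the four namespaces. The `filter_namespace` helper below is hypothetical and not part of the gem; it only illustrates how the names are composed.

```ruby
# Hypothetical helper for illustration only; it is NOT part of the weka gem.
# It composes a namespace name from the two category choices described above.
def filter_namespace(supervised:, target:)
  category     = supervised ? 'Supervised' : 'Unsupervised'
  sub_category = { attribute: 'Attribute', instance: 'Instance' }.fetch(target)

  "Weka::Filters::#{category}::#{sub_category}"
end

puts filter_namespace(supervised: false, target: :attribute)
# prints "Weka::Filters::Unsupervised::Attribute"
```

Passing a `target` other than `:attribute` or `:instance` raises a `KeyError`, which mirrors the fact that only these two sub-categories exist.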

Filtering Instances

Filters can be used directly to filter Instances:

# create filter
filter = Weka::Filters::Unsupervised::Attribute::Normalize.new

# filter instances
filtered_data = filter.filter(instances)

You can also apply a Filter to an Instances object:

# create filter
filter = Weka::Filters::Unsupervised::Attribute::Normalize.new

# apply filter on instances
filtered_data = instances.apply_filter(filter)

With this approach, it is possible to chain multiple filters on a dataset:

# create filters
include Weka::Filters::Unsupervised::Attribute

normalize  = Normalize.new
discretize = Discretize.new

# apply a filter chain on instances
filtered_data = instances.apply_filter(normalize).apply_filter(discretize)

# or even shorter
filtered_data = instances.apply_filters(normalize, discretize)
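
Conceptually, a filter chain feeds the output of each filter into the next one, in the order given. The sketch below shows that idea with plain Ruby arrays and two stand-in classes (`AddOne` and `Double`, both invented here); it does not use the gem and is not its actual implementation.

```ruby
# Stand-in "filters" for illustration only. Real filters come from the
# Weka::Filters namespaces and operate on Instances, not on arrays.
class AddOne
  def filter(data)
    data.map { |value| value + 1 }
  end
end

class Double
  def filter(data)
    data.map { |value| value * 2 }
  end
end

# Hypothetical helper: apply each filter to the result of the previous one.
def apply_in_order(data, *filters)
  filters.reduce(data) { |result, filter| filter.filter(result) }
end

chained = apply_in_order([1, 2, 3], AddOne.new, Double.new)
p chained
# prints [4, 6, 8]
```

Order matters: swapping the two stand-ins yields `[3, 5, 7]` instead, so list filters in the order you want them applied.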

Setting Filter options

Every Filter has several options. To print a description of all options of a filter, call .options on its class:

puts Weka::Filters::Unsupervised::Attribute::Normalize.options
# -S <num> The scaling factor for the output range.
#   (default: 1.0)
# -T <num>  The translation of the output range.
#   (default: 0.0)
# -unset-class-temporarily  Unsets the class index temporarily before the filter is
#   applied to the data.
#   (default: no)

To get the default options of a Filter, call .default_options on its class:

Weka::Filters::Unsupervised::Attribute::Normalize.default_options
# => '-S 1.0 -T 0.0'
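
To make the two Normalize options more concrete, here is a minimal plain-Ruby sketch of min-max normalization with a scale (-S) and a translation (-T). It assumes the usual formula (value - min) / (max - min) * scale + translation; the `normalize_column` helper is hypothetical, works on a plain array rather than on Instances, and its handling of constant columns is a choice made for this sketch.

```ruby
# Hypothetical helper for illustration only; it is NOT part of the weka gem.
# scale corresponds to -S, translation corresponds to -T.
def normalize_column(values, scale: 1.0, translation: 0.0)
  min, max = values.minmax

  # Assumption of this sketch: a constant column maps to 0.0 everywhere
  # to avoid dividing by zero.
  return values.map { 0.0 } if min == max

  range = (max - min).to_f
  values.map { |value| (value - min) / range * scale + translation }
end

p normalize_column([2, 4, 6])
# prints [0.0, 0.5, 1.0]

p normalize_column([2, 4, 6], scale: 0.5)
# prints [0.0, 0.25, 0.5]
```

With the defaults (-S 1.0 -T 0.0) the values land in the range [0, 1]; a scale of 0.5 shrinks that range to [0, 0.5].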

Options can be set while building a Filter:

filter = Weka::Filters::Unsupervised::Attribute::Normalize.build do
  use_options '-S 0.5'
end

They can also be set or changed after the Filter has been created:

filter = Weka::Filters::Unsupervised::Attribute::Normalize.new
filter.use_options('-S 0.5')
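
When option values are computed at runtime, it can help to assemble the option string from a hash before passing it to use_options. The `option_string` helper below is a hypothetical convenience written for this page, not something the gem provides.

```ruby
# Hypothetical helper for illustration only; it is NOT part of the weka gem.
# Builds a Weka-style option string such as '-S 0.5 -T 0.0' from a hash.
# A value of true emits the bare flag; false or nil omits the option.
def option_string(options)
  options.flat_map do |flag, value|
    case value
    when true       then ["-#{flag}"]
    when false, nil then []
    else                 ["-#{flag}", value.to_s]
    end
  end.join(' ')
end

puts option_string(S: 0.5, T: 0.0)
# prints "-S 0.5 -T 0.0"
```

The resulting string can then be handed to use_options in the same way as the literal '-S 0.5' above.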