Filters
Paul Götze edited this page Feb 3, 2016 · 2 revisions
Filters are used to preprocess datasets.
There are two categories of filters, which are also reflected in the namespaces:
- supervised – The filter requires a class attribute to be set
- unsupervised – A class attribute is not required to be present
In each category there are two sub-categories:
- attribute-based – Attributes (columns) are processed
- instance-based – Instances (rows) are processed
Thus, Filter classes are organized in the following four namespaces:
Weka::Filters::Supervised::Attribute
Weka::Filters::Supervised::Instance
Weka::Filters::Unsupervised::Attribute
Weka::Filters::Unsupervised::Instance
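The four namespaces are simply every combination of category and sub-category. A minimal plain-Ruby sketch of that composition follows; it only builds the namespace names as strings and does not load the weka gem. The variable names are illustrative.

```ruby
# Plain-Ruby illustration: builds the namespace names, nothing more.
categories     = %w[Supervised Unsupervised]
sub_categories = %w[Attribute Instance]

# Combine each category with each sub-category.
namespaces = categories.product(sub_categories).map do |category, sub_category|
  "Weka::Filters::#{category}::#{sub_category}"
end

puts namespaces
```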
Filters can be used directly to filter Instances:
require 'weka'
# create filter (instances is assumed to be an already loaded Instances object)
filter = Weka::Filters::Unsupervised::Attribute::Normalize.new
# filter instances
filtered_data = filter.filter(instances)
You can also apply a Filter to an Instances object:
# create filter
filter = Weka::Filters::Unsupervised::Attribute::Normalize.new
# apply filter on instances
filtered_data = instances.apply_filter(filter)
With this approach, it is possible to chain multiple filters on a dataset:
# create filters
include Weka::Filters::Unsupervised::Attribute
normalize = Normalize.new
discretize = Discretize.new
# apply a filter chain on instances
filtered_data = instances.apply_filter(normalize).apply_filter(discretize)
# or even shorter
filtered_data = instances.apply_filters(normalize, discretize)
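The two forms above should give the same result as long as the filters are applied in the order given. The following sketch illustrates that idea with hypothetical stand-in classes (`SketchFilter` and `SketchInstances` are not part of the weka gem, and this is not the gem's implementation). It also shows that the order of the filters matters.

```ruby
# Stand-in classes for illustration only; these are NOT the weka gem's classes.
class SketchFilter
  def initialize(&block)
    @block = block
  end

  # Returns a new array of rows and leaves the input untouched.
  def filter(rows)
    rows.map { |row| @block.call(row) }
  end
end

class SketchInstances
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def apply_filter(filter)
    SketchInstances.new(filter.filter(rows))
  end

  # Folds the filters over the data from left to right.
  def apply_filters(*filters)
    filters.reduce(self) { |data, filter| data.apply_filter(filter) }
  end
end

add_one = SketchFilter.new { |row| row.map { |value| value + 1 } }
double  = SketchFilter.new { |row| row.map { |value| value * 2 } }
data    = SketchInstances.new([[1, 2]])

chained  = data.apply_filter(add_one).apply_filter(double)
folded   = data.apply_filters(add_one, double)
reversed = data.apply_filters(double, add_one)
```

Here `chained` and `folded` hold the same rows, while `reversed` differs because the doubling happens before the increment.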
Every Filter has several options. You can print a description of all options of a filter:
puts Weka::Filters::Unsupervised::Attribute::Normalize.options
# -S <num> The scaling factor for the output range.
# (default: 1.0)
# -T <num> The translation of the output range.
# (default: 0.0)
# -unset-class-temporarily Unsets the class index temporarily before the filter is
# applied to the data.
# (default: no)
To get the default option set of a Filter, you can run .default_options:
Weka::Filters::Unsupervised::Attribute::Normalize.default_options
# => '-S 1.0 -T 0.0'
Options can be set while building a Filter:
filter = Weka::Filters::Unsupervised::Attribute::Normalize.build do
use_options '-S 0.5'
end
Options can also be set or changed after the Filter has been created:
filter = Weka::Filters::Unsupervised::Attribute::Normalize.new
filter.use_options('-S 0.5')
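To get a feel for what an option such as '-S 0.5' changes, here is a rough plain-Ruby sketch of min-max normalization for a single numeric column, with a scaling factor and a translation corresponding to -S and -T. The method `normalize_column` and its keyword arguments are hypothetical names for illustration; the sketch does not call Weka, and its handling of a constant column is a simplification.

```ruby
# Illustrative only: min-max normalization of one numeric column.
# scale corresponds to -S, translation to -T (defaults as listed above).
def normalize_column(values, scale: 1.0, translation: 0.0)
  min, max = values.minmax
  range = max - min

  values.map do |value|
    # A constant column has no range; this sketch simply returns 0.0 for it.
    next 0.0 if range.zero?

    (value - min) / range.to_f * scale + translation
  end
end

column = [2.0, 4.0, 6.0]

default_result = normalize_column(column)              # => [0.0, 0.5, 1.0]
scaled_result  = normalize_column(column, scale: 0.5)  # => [0.0, 0.25, 0.5]
```

With the defaults the values land in the range 0.0 to 1.0; with a scaling factor of 0.5 the same values are squeezed into 0.0 to 0.5.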