Merge Data: extend documentation #3958

Merged
janezd merged 1 commit into biolab:master from ajdapretnar:mergedata-docs on Aug 2, 2019

@ajdapretnar
Contributor

Issue

Documents changes introduced in #3919.

Description of changes

Documents merging on multiple attributes. Also extends the documentation of the merging types.

Includes
  • Code changes
  • Tests
  • Documentation

@ajdapretnar ajdapretnar requested a review from janezd August 1, 2019 14:22
@codecov

codecov bot commented Aug 1, 2019

Codecov Report

Merging #3958 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3958      +/-   ##
==========================================
- Coverage   85.13%   85.13%   -0.01%     
==========================================
  Files         378      378              
  Lines       67117    67117              
==========================================
- Hits        57138    57137       -1     
- Misses       9979     9980       +1
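The figures in the report are consistent with coverage being hits divided by tracked lines: one extra miss out of 67117 lines moves coverage by about 0.0015 percentage points, which is why the summary says "<.01%". The -0.01% in the table is presumably Codecov rounding the change for display; that reading is an assumption, not something stated in the report. A quick check, where `coverage_pct` is a hypothetical helper:

```python
# Codecov-style line coverage: hits divided by tracked lines (figures from the report above).
def coverage_pct(hits: int, lines: int) -> float:
    """Percentage of tracked lines executed by the tests."""
    return 100 * hits / lines

LINES = 67117  # unchanged between master and #3958
before = coverage_pct(57138, LINES)
after = coverage_pct(57137, LINES)

# Misses are the tracked lines that were not hit.
assert LINES - 57138 == 9979 and LINES - 57137 == 9980

print(f"{before:.2f}% -> {after:.2f}% (change {after - before:+.4f} points)")
# prints: 85.13% -> 85.13% (change -0.0015 points)
```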

Contributor

@janezd janezd left a comment

Thanks. I have a few small comments. This widget is indeed difficult to describe, but I guess we've done a good job now.

The **Merge Data** widget is used to horizontally merge two datasets, based on the values of selected attributes (columns). In the input, two datasets are required, data and extra data. The widget allows selection of one or more attributes from each domain, which will be used to perform the merging. The widget produces one output. It corresponds to the instances from the input data to which attributes (columns) from input extra data are appended.

Merging is done by values of selected (merging) attributes. First, the value of the merging attribute from Data is taken and instances from Extra Data are searched for matching values. If more than a single instance from Extra Data was to be found, the attribute is removed from available merging attributes.
Merging is done by values of selected (merging) attributes. First, the value of the merging attribute from Data is taken and instances from Extra Data are searched for matching values. If the selected attribute does not contain unique values (in other words, the attribute has duplicate values), the attribute is removed from the available merging attributes.
Contributor

This is no longer true because now it suffices that a combination of attributes yields unique values. Now, it's an error if the chosen combination of values is not unique.

I'd also say "matching" instead of "merging".

Actually, I'd remove this paragraph and change the previous one.
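The rule described above (the combination of selected attributes must yield unique values, and a non-unique combination is an error) can be illustrated with a small sketch. It assumes tables are plain lists of dicts; `duplicate_keys` and the toy columns are hypothetical and are not the widget's own code.

```python
from collections import Counter

def duplicate_keys(rows, attrs):
    """Return the value combinations of `attrs` that occur in more than one row.

    `rows` is a list of dicts (one dict per instance); this is a toy stand-in
    for a data table, not Orange's Table class.
    """
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return [key for key, n in counts.items() if n > 1]

extra = [
    {"patient": "A", "visit": 1, "weight": 70},
    {"patient": "A", "visit": 2, "weight": 68},
    {"patient": "B", "visit": 1, "weight": 81},
]

print(duplicate_keys(extra, ["patient"]))           # [('A',)]: not unique, an error
print(duplicate_keys(extra, ["patient", "visit"]))  # []: unique combination, OK
```

A single attribute such as `patient` would be rejected, while the pair `patient` and `visit` is accepted because each combination occurs only once.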

- Data: dataset with features added from extra data

The **Merge Data** widget is used to horizontally merge two datasets, based on values of selected attributes. In the input, two datasets are required, data and extra data. The widget allows selection of an attribute from each domain, which will be used to perform the merging. The widget produces one output. It corresponds to instances from the input data to which attributes from input extra data are appended.
The **Merge Data** widget is used to horizontally merge two datasets, based on the values of selected attributes (columns). In the input, two datasets are required, data and extra data. The widget allows selection of one or more attributes from each domain, which will be used to perform the merging. The widget produces one output. It corresponds to the instances from the input data to which attributes (columns) from input extra data are appended.
Contributor

The widget allows selection of one or more attributes from each domain, which will be used to perform the merging.

Maybe: Rows from the two data sets are matched by the values of pairs of attributes, chosen by the user.
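Matching "by the values of pairs of attributes" can be sketched as building a lookup on the Extra Data side and querying it with the paired Data values. The attribute names may differ between the two tables, which is why the user chooses pairs. `match_rows`, the list-of-dicts tables and the uniqueness error are assumptions for illustration, not Orange's API.

```python
def match_rows(data, extra, pairs):
    """Map each Data row index to the index of its matching Extra Data row (or None).

    `pairs` lists (data_attribute, extra_attribute) tuples chosen by the user;
    a row matches when all paired values are equal.
    """
    data_attrs = [d for d, _ in pairs]
    extra_attrs = [e for _, e in pairs]
    index = {}
    for j, row in enumerate(extra):
        key = tuple(row[a] for a in extra_attrs)
        if key in index:
            raise ValueError(f"combination {key} is not unique in Extra Data")
        index[key] = j
    return {i: index.get(tuple(row[a] for a in data_attrs))
            for i, row in enumerate(data)}

data = [{"name": "Ana", "year": 2018}, {"name": "Bor", "year": 2019}]
extra = [{"person": "Bor", "yr": 2019, "score": 7}]
print(match_rows(data, extra, [("name", "person"), ("year", "yr")]))
# {0: None, 1: 0}
```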

![](images/MergeData-stamped.png)
Merge Data can also merge on more than one attribute. Click the plus icon to add another attribute to merge on. The combination of values of the selected attributes has to be unique for each row.
Contributor

attribute -> attribute pair? (appears twice)

1. Information on main data.
2. Information on data to append.
3. Merging type:
- **Append columns from Extra Data** outputs all instances from Data appended by matching instances from Extra Data. When no match is found, unknown values are appended.
Contributor

Perhaps: Output all rows from the Data, augmented by columns from matching rows in the Extra data. Rows without matches are retained, though the data in extra columns is missing.

Appending sounds more like vertical merging (to me).

"Unknown values" sounds (to me) as if behaviour is undefined.
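The suggested wording describes what is usually called a left join. A minimal sketch, assuming list-of-dicts tables and using `None` as a stand-in for Orange's missing values; `append_columns` is a hypothetical name, and the sketch assumes the paired Extra Data values are unique.

```python
def append_columns(data, extra, pairs):
    """Keep every Data row and add the non-key columns of its matching Extra Data row.

    Unmatched rows get None in the added columns (a stand-in for missing values).
    """
    data_attrs = [d for d, _ in pairs]
    extra_attrs = [e for _, e in pairs]
    added = [c for c in (extra[0] if extra else {}) if c not in extra_attrs]
    index = {tuple(r[a] for a in extra_attrs): r for r in extra}
    out = []
    for row in data:
        match = index.get(tuple(row[a] for a in data_attrs))
        out.append({**row, **{c: (match[c] if match is not None else None) for c in added}})
    return out

data = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
extra = [{"code": 2, "city": "Kranj"}]
print(append_columns(data, extra, [("id", "code")]))
# [{'id': 1, 'age': 30, 'city': None}, {'id': 2, 'age': 41, 'city': 'Kranj'}]
```

Both Data rows survive; the row with `id` 1 has no match, so its `city` is missing rather than undefined.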

- **Find matching pairs of rows** outputs only matching instances.
Contributor

Perhaps: Find matching pairs of rows: similar to above, except that Data rows without matches are removed from the output.
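"Find matching pairs of rows" corresponds to an inner join. The sketch below reuses the same toy representation (lists of dicts); `matching_pairs` is a hypothetical name and not the widget's implementation.

```python
def matching_pairs(data, extra, pairs):
    """Like the left join, but Data rows without a match are dropped."""
    data_attrs = [d for d, _ in pairs]
    extra_attrs = [e for _, e in pairs]
    index = {tuple(r[a] for a in extra_attrs): r for r in extra}
    out = []
    for row in data:
        match = index.get(tuple(row[a] for a in data_attrs))
        if match is not None:
            # Copy only the Extra Data columns that are not merging attributes.
            out.append({**row, **{c: v for c, v in match.items() if c not in extra_attrs}})
    return out

data = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
extra = [{"code": 2, "city": "Kranj"}]
print(matching_pairs(data, extra, [("id", "code")]))
# [{'id': 2, 'age': 41, 'city': 'Kranj'}]
```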

- **Concatenate tables** outputs all instances from both inputs, even those for which no match is found. In that case, unknown values are assigned.
Contributor

  • Concatenate tables treats both data sources symmetrically. The output is similar to that of the first option, except that non-matched rows from Extra data are appended at the end.
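The behaviour described here is an outer join: first the left-join output, then the Extra Data rows that matched nothing. A sketch under the same toy assumptions (lists of dicts, `None` for missing values, unique paired values in Extra Data). Copying the key values into the Data merging columns of the extra-only rows is an assumption made so that those rows stay identifiable; `concatenate_tables` is a hypothetical name.

```python
def concatenate_tables(data, extra, pairs):
    """Outer join: all Data rows first, then the Extra Data rows that matched nothing."""
    data_attrs = [d for d, _ in pairs]
    extra_attrs = [e for _, e in pairs]
    data_cols = list(data[0]) if data else []
    added = [c for c in (extra[0] if extra else {}) if c not in extra_attrs]
    index = {tuple(r[a] for a in extra_attrs): r for r in extra}
    used = set()
    out = []
    for row in data:
        key = tuple(row[a] for a in data_attrs)
        match = index.get(key)
        if match is not None:
            used.add(key)
        out.append({**row, **{c: (match[c] if match is not None else None) for c in added}})
    for key, match in index.items():
        if key not in used:
            # Extra-only row: Data columns are missing, except the merging
            # attributes, which take the key values (an assumption of this sketch).
            new_row = {c: None for c in data_cols}
            new_row.update(zip(data_attrs, key))
            new_row.update({c: match[c] for c in added})
            out.append(new_row)
    return out

data = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
extra = [{"code": 2, "city": "Kranj"}, {"code": 3, "city": "Bled"}]
for row in concatenate_tables(data, extra, [("id", "code")]):
    print(row)
```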

4. List of comparable attributes from Data input.
Contributor

Skip "comparable".


##### Concatenate tables (outer join)

The rows present in both the Data and the Extra Data will be present on the output. Where rows cannot be matched, missing values will appear.
Contributor

Perhaps: All rows from both ...

@ajdapretnar
Contributor Author

Fixed.

@janezd janezd merged commit ea80339 into biolab:master Aug 2, 2019

2 participants