Merge Data: extend documentation #3958

Merged

janezd merged 1 commit into biolab:master from ajdapretnar:mergedata-docs

Aug 2, 2019

Contributor

ajdapretnar commented Aug 1, 2019

Issue

Documents changes introduced in #3919.

Description of changes

Documents merging by multiple rows. Also extends documentation for merging types.

Includes

Code changes
Tests
Documentation

ajdapretnar requested a review from janezd

August 1, 2019 14:22

codecov bot commented Aug 1, 2019

Codecov Report

Merging #3958 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3958      +/-   ##
==========================================
- Coverage   85.13%   85.13%   -0.01%     
==========================================
  Files         378      378              
  Lines       67117    67117              
==========================================
- Hits        57138    57137       -1     
- Misses       9979     9980       +1

janezd reviewed

Contributor

janezd left a comment

Thanks. I have a few small comments. This widget is indeed difficult to describe, but I guess we've done a good job now.

doc/visual-programming/source/widgets/data/mergedata.md Outdated

    
              The **Merge Data** widget is used to horizontally merge two datasets, based on the values of selected attributes (columns). In the input, two datasets are required, data and extra data. The widget allows selection of one or more attributes from each domain, which will be used to perform the merging. The widget produces one output. It corresponds to the instances from the input data to which attributes (columns) from input extra data are appended.

              Merging is done by values of selected (merging) attributes. First, the value of the merging attribute from Data is taken and instances from Extra Data are searched for matching values. If more than a single instance from Extra Data was to be found, the attribute is removed from available merging attributes.

              Merging is done by values of selected (merging) attributes. First, the value of the merging attribute from Data is taken and instances from Extra Data are searched for matching values. If the selected attribute does not contain unique values (in other words, the attribute has duplicate values), the attribute is removed from the available merging attributes.

Contributor

janezd Aug 1, 2019

This is no longer true because now it suffices that a combination of attributes yields unique values. Now, it's an error if the chosen combination of values is not unique.

I'd also say "matching" instead of merging.

Actually, I'd remove this paragraph and change the previous one.
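
To illustrate the point about combinations of attributes, here is a minimal sketch in plain Python. It is not the widget's code: the helper name `duplicated_keys` and the sample rows are hypothetical, and rows are modelled as plain dicts. It shows how a single attribute can contain duplicates while a combination of attributes is still unique.

```python
from collections import Counter


def duplicated_keys(rows, key_attrs):
    """Return the key tuples that occur more than once in rows.

    Hypothetical helper for illustration only; rows are plain dicts.
    """
    counts = Counter(tuple(row[a] for a in key_attrs) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Made-up sample data.
rows = [
    {"name": "Ana", "year": 2018},
    {"name": "Ana", "year": 2019},
    {"name": "Bor", "year": 2018},
]

# "name" alone has a duplicate, but the (name, year) combination is unique.
single = duplicated_keys(rows, ["name"])
combined = duplicated_keys(rows, ["name", "year"])
```

In this sketch a non-empty result for the chosen combination is the case that would be reported as an error.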

doc/visual-programming/source/widgets/data/mergedata.md Outdated

    
              - Data: dataset with features added from extra data

              The **Merge Data** widget is used to horizontally merge two datasets, based on values of selected attributes. In the input, two datasets are required, data and extra data. The widget allows selection of an attribute from each domain, which will be used to perform the merging. The widget produces one output. It corresponds to instances from the input data to which attributes from input extra data are appended.

              The **Merge Data** widget is used to horizontally merge two datasets, based on the values of selected attributes (columns). In the input, two datasets are required, data and extra data. The widget allows selection of one or more attributes from each domain, which will be used to perform the merging. The widget produces one output. It corresponds to the instances from the input data to which attributes (columns) from input extra data are appended.

Contributor

janezd Aug 1, 2019

The widget allows selection of one or more attributes from each domain, which will be used to perform the merging.

Maybe: Rows from the two data sets are matched by the values of pairs of attributes, chosen by the user.

doc/visual-programming/source/widgets/data/mergedata.md Outdated

    
              Merging is done by values of selected (merging) attributes. First, the value of the merging attribute from Data is taken and instances from Extra Data are searched for matching values. If the selected attribute does not contain unique values (in other words, the attribute has duplicate values), the attribute is removed from the available merging attributes.

              ![](images/MergeData-stamped.png)

              Merge Data can merge also on more than one attribute. Click on the plus icon to add the attribute to merge on. The final result have to be unique combinations for each individual row.

Contributor

janezd Aug 1, 2019

attribute -> attribute pair? (appears twice)

doc/visual-programming/source/widgets/data/mergedata.md Outdated

+. Information on main data.
+. Information on data to append.
+. Merging type:
+                 - **Append columns from Extra Data** outputs all instances from Data appended by matching instances from Extra Data. When no match is found,unknown values are appended.

Contributor

janezd Aug 1, 2019

Perhaps: Output all rows from the Data, augmented by columns from matching rows in the Extra data. Rows without matches are retained, though the data in extra columns is missing.

Appending sounds more like vertical merging (to me).

"Unknown values" sounds (to me) as if behaviour is undefined.

doc/visual-programming/source/widgets/data/mergedata.md Outdated

+. Information on data to append.
+. Merging type:
+                 - **Append columns from Extra Data** outputs all instances from Data appended by matching instances from Extra Data. When no match is found,unknown values are appended.
+                 - **Find matching pairs of rows** outputs only matching instances.

Contributor

janezd Aug 1, 2019

Perhaps: Find matching pairs of rows: similar to above, except that Data rows without matches are removed from the output.

doc/visual-programming/source/widgets/data/mergedata.md Outdated

+. Merging type:
+                 - **Append columns from Extra Data** outputs all instances from Data appended by matching instances from Extra Data. When no match is found,unknown values are appended.
+                 - **Find matching pairs of rows** outputs only matching instances.
+                 - **Concatenate tables** outputs all instances from both inputs, even though the match may not be found. In that case unknown values are assigned.

Contributor

janezd Aug 1, 2019

Concatenate tables treats both data sources symmetrically. The output is similar to that of the first option, except that non-matched values from Extra data are appended at the end.
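
The three merging types discussed in these comments can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the widget's implementation: `merge_rows`, its `how` values, and the sample tables are hypothetical. Rows are dicts, both inputs are non-empty, the two tables use distinct column names, the key combination is unique in the extra data, and `None` stands in for a missing value.

```python
def merge_rows(data, extra, pairs, how="left"):
    """Merge two lists of dict rows on pairs of (data_attr, extra_attr).

    Hypothetical helper for illustration only.
    how="left":  keep all data rows; unmatched ones get missing extra columns
    how="inner": keep only data rows that have a match
    how="outer": as "left", then add unmatched extra rows at the end
    """
    data_cols = list(data[0])
    extra_cols = list(extra[0])
    # Index extra rows by key; keys are assumed unique in extra.
    index = {tuple(row[e] for _, e in pairs): row for row in extra}
    missing_extra = dict.fromkeys(extra_cols)  # all values None
    out, used = [], set()
    for row in data:
        key = tuple(row[d] for d, _ in pairs)
        match = index.get(key)
        if match is not None:
            used.add(key)
            out.append({**row, **match})
        elif how != "inner":
            out.append({**row, **missing_extra})
    if how == "outer":
        missing_data = dict.fromkeys(data_cols)
        for key, row in index.items():
            if key not in used:
                out.append({**missing_data, **row})
    return out


# Made-up sample tables; "id" in data is matched with "code" in extra.
data = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
extra = [{"code": 2, "y": "B"}, {"code": 3, "y": "C"}]
pairs = [("id", "code")]

left = merge_rows(data, extra, pairs, "left")
inner = merge_rows(data, extra, pairs, "inner")
outer = merge_rows(data, extra, pairs, "outer")
```

With this sample, `left` keeps both data rows, `inner` keeps only the row with `id` 2, and `outer` is the same as `left` followed by the unmatched extra row with `code` 3.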

doc/visual-programming/source/widgets/data/mergedata.md Outdated

+                 - **Append columns from Extra Data** outputs all instances from Data appended by matching instances from Extra Data. When no match is found,unknown values are appended.
+                 - **Find matching pairs of rows** outputs only matching instances.
+                 - **Concatenate tables** outputs all instances from both inputs, even though the match may not be found. In that case unknown values are assigned.
+. List of comparable attributes from Data input.

Contributor

janezd Aug 1, 2019

Skip "comparable".

doc/visual-programming/source/widgets/data/mergedata.md Outdated


		#####Concatenate tables (outer join)

		The rows present in both the Data and the Extra Data will be present on the output. Where rows cannot be matched, missing values will appear.

Contributor

janezd Aug 1, 2019

Perhaps: All rows from both ...


          Merge Data docs

08f87e2

ajdapretnar force-pushed the mergedata-docs branch from 1e971a6 to 08f87e2 Compare

August 2, 2019 06:53

Contributor Author

ajdapretnar commented Aug 2, 2019

Fixed.

janezd merged commit ea80339 into biolab:master
