Mechenich, M.F., Žliobaitė, I. Eco-ISEA3H, a machine learning ready spatial database for ecometric and species distribution modeling. Sci Data 10, 77 (2023). https://doi.org/10.1038/s41597-023-01966-x
This paper has details of various sampling strategies employed for indexing raster data.
Categorical
Centroid: record the categorical value occurring at each cell centroid. Nulls are carried over.
Fraction: record the proportion of each cell's area covered by each categorical value. There would be a fraction attribute for each class for each cell. (A sparse data structure could help manage this.)
Mode: as it says on the tin, but with a null value recorded where the fraction attributes sum to less than 0.2 of the cell's area. (I think this is probably wrong; it leads to data loss for cells on the edge of nodata areas. Perhaps there should be a switch for whether null is a valid modal value, or the threshold (e.g. 0.2) should be exposed as a parameter.)
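To make the Fraction and Mode cases concrete, here is a minimal sketch in plain Python. It is not the paper's implementation and not this tool's API: the function names `class_fractions` and `modal_class`, the `(value, area)` input format, and the `min_coverage` parameter are all hypothetical, chosen to illustrate the sparse fraction structure and the configurable threshold suggested above.

```python
from collections import defaultdict


def class_fractions(samples):
    """Return {class: fraction of cell area} for one cell.

    `samples` is a sequence of (value, area) pairs for the pixels (or pixel
    parts) inside the cell; a value of None marks nodata (hypothetical format).
    Only classes actually present are stored, so the result is sparse.
    """
    samples = list(samples)
    total = sum(area for _, area in samples)
    if total <= 0:
        return {}
    fractions = defaultdict(float)
    for value, area in samples:
        if value is not None:  # nodata counts toward the total area only
            fractions[value] += area / total
    return dict(fractions)


def modal_class(fractions, min_coverage=0.2):
    """Return the class with the largest fraction, or None.

    None is returned when valid data cover less than `min_coverage` of the
    cell. Setting min_coverage=0.0 disables the null rule, except for cells
    with no valid data at all.
    """
    if not fractions or sum(fractions.values()) < min_coverage:
        return None
    return max(fractions, key=fractions.get)
```

For example, a cell that is 10% forest and 90% nodata yields `None` under the default threshold, but `"forest"` with `min_coverage=0.0`, which is the edge-of-nodata behaviour discussed above.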
Continuous
Centroid: as above.
Mean: area-weighted arithmetic mean. The authors are careful to perform the conversion in each dataset's native coordinate reference system. For data in an authalic (equal-area) coordinate reference system, every pixel covers the same area, so the area-weighted mean reduces to the simple mean. For data in WGS84, they calculate the area of each pixel and use it as the weight when calculating the mean. See #14 (VRT warping method causing spatial inconsistencies) for related discussion of how we currently handle reprojection; that approach may need revision.
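A rough sketch of the area-weighted mean for a regular WGS84 grid follows. It rests on assumptions not taken from the paper: the Earth is treated as a sphere (so a pixel's area is proportional to the difference of the sines of its bounding latitudes), all pixels have the same longitudinal width, and each pixel lies wholly inside the cell. The function names are hypothetical.

```python
import math


def pixel_area_weight(lat_top_deg, lat_bottom_deg):
    """Relative area of a lat/lon pixel on a sphere (assumption: spherical Earth).

    For a fixed longitudinal width, area is proportional to
    sin(lat_top) - sin(lat_bottom); constant factors cancel in the mean.
    """
    return math.sin(math.radians(lat_top_deg)) - math.sin(math.radians(lat_bottom_deg))


def area_weighted_mean(values, weights):
    """Weighted arithmetic mean, skipping None (nodata) values.

    Returns None if no valid value carries positive weight.
    """
    numerator = 0.0
    denominator = 0.0
    for value, weight in zip(values, weights):
        if value is None:
            continue
        numerator += value * weight
        denominator += weight
    return numerator / denominator if denominator > 0 else None
```

With equal weights, as in an equal-area grid, this reduces to the simple mean. On a WGS84 grid, a one-degree pixel near the equator receives roughly twice the weight of one at 60 degrees latitude.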
This issue should be closed when this tool is capable of reproducing all of these cases.