Change Neighborhood Size Calc in Find Orientations #714

darrencpagan · 2024-09-16T12:40:27Z

darrencpagan
Sep 16, 2024

The logic that determines the clustering neighborhood size in find-orientations tends to overestimate the necessary size, which leads to missed grains, particularly in textured materials.

Reilly Knox and I suggest changing line 693 of findorientations.py to:

int(np.floor(0.75*(1-compl_thresh)*min(seed_refl_per_grain)))

psavery · 2024-09-17T17:29:14Z

psavery
Sep 17, 2024
Maintainer

For reference, here is the line.

@donald-e-boyce, can you take a look at this?


donald-e-boyce · 2024-09-23T15:24:31Z

donald-e-boyce
Sep 23, 2024
Maintainer

In the create_clustering_parameters function, the parameters
min_samples and mean_rpg are returned. The min_samples
parameter is the minimum size for a cluster when using the
dbcluster package for clustering. The mean_rpg is the mean
number of reflections per grain (more below).

If min_samples is too large, then it may miss some valid grains.
If it is too small, then dbscan is less efficient, and some of the
clusters may not correspond to real grains.

The create_clustering_parameters function works by creating 100 random
orientations and creating synthetic grains with those orientations, centroid
at the origin, and zero strain. Then it computes the reflections for
each seed HKL, tracking the number of reflections for each grain and
seed. The mean_rpg value is just the mean of the number of reflections
on each grain. The min_samples value is computed from the minimum number
of reflections of the 100 grains and the completeness threshold, using the
formula:

    min_samples = max(
        int(np.floor(0.5*compl_thresh*min(seed_refl_per_grain))),
        2
    )

where compl_thresh is the minimum completeness for a grain.
The intent here is that from the random orientations, you expect almost all
of the grains to have at least compl_thresh*min(seed_refl_per_grain)
reflections, and cutting that in half gives a generous allowance.
The minimum number of samples is always at least 2.
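
As a minimal sketch of the arithmetic just described, the snippet below applies the quoted formula to a short list of reflection counts. The function name `current_min_samples` and the counts are made up for illustration; they are not taken from findorientations.py, and the synthetic-grain simulation itself is not reproduced here.

```python
import numpy as np


def current_min_samples(seed_refl_per_grain, compl_thresh):
    """Existing heuristic as quoted above: half of the completeness-scaled
    minimum reflection count, never less than 2. (Hypothetical wrapper.)"""
    return max(
        int(np.floor(0.5 * compl_thresh * min(seed_refl_per_grain))),
        2,
    )


# Hypothetical reflection counts for a handful of synthetic grains.
counts = [24, 22, 26, 20, 24]

print(current_min_samples(counts, 0.8))   # 0.5 * 0.8 * 20 = 8.0, floor is 8
print(current_min_samples(counts, 0.15))  # 0.5 * 0.15 * 20 = 1.5, floor is 1, raised to 2
```

Only the smallest count (20 here) enters the result, so a single unlucky grain among the random orientations sets the value.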

@darrencpagan suggests the formula:

    int(np.floor(0.75*(1-compl_thresh)*min(seed_refl_per_grain)))

to replace the existing

    int(np.floor(0.5*compl_thresh*min(seed_refl_per_grain)))

One thing immediately apparent is that the proposed formula decreases with
increasing completeness threshold. In fact, for a completeness threshold of 1,
it gives a value of 0, corresponding to a min_samples value of 2 (the minimum).
And for a completeness threshold of 0, it gives the maximum value.
This seems to be the opposite of what you want. With a high completeness
threshold, you can use a higher cluster size because all the grains will
have lots of reflections; with a low completeness threshold, you have
to make it much smaller because many grains will have a small proportion
of reflections.
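
To make the trend concrete, here is a small sketch that evaluates both expressions over a range of completeness thresholds. It assumes the proposed expression would sit inside the same max(..., 2) wrapper as the existing one, and it uses a made-up minimum reflection count of 20; the function names are hypothetical.

```python
import numpy as np


def current_formula(min_refl, compl_thresh):
    # Existing expression, wrapped in the floor of 2.
    return max(int(np.floor(0.5 * compl_thresh * min_refl)), 2)


def proposed_formula(min_refl, compl_thresh):
    # Proposed expression; the max(..., 2) wrapper is an assumption here.
    return max(int(np.floor(0.75 * (1 - compl_thresh) * min_refl)), 2)


min_refl = 20  # hypothetical min(seed_refl_per_grain)
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

current = [current_formula(min_refl, t) for t in thresholds]
proposed = [proposed_formula(min_refl, t) for t in thresholds]

for t, c, p in zip(thresholds, current, proposed):
    print(f"compl_thresh={t:4.2f}  current={c:2d}  proposed={p:2d}")
# current  -> 2, 2, 5, 7, 10   (rises with the threshold)
# proposed -> 15, 11, 7, 3, 2  (falls with the threshold)
```

With these numbers the two expressions agree near a threshold of 0.5 and diverge in opposite directions on either side of it.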

Here are some other suggestions:

- set the min_samples based on the mean and standard deviation of
  the samples; knowing the standard deviation, you can set the cutoff
  value to correspond to an expected percentage of grains being found,
  e.g. to the 99% or 99.9% value, based on the
  random grain statistics.
- run more than 100 random grains; make that an option to the
  create_clustering_parameters function.
- in fact, add ngrains and pvalue arguments to the
  create_clustering_parameters function.
- maybe add an option to set min_samples directly to a fixed value, and make that
  available in the config file.
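
One way the statistics-based suggestion might look is sketched below. It is only an illustration under stated assumptions: the function name `min_samples_from_stats` is hypothetical, the reflection counts are treated as roughly normal, the cutoff is scaled by compl_thresh without the 0.5 factor, and the demo counts are random stand-ins rather than simulated grains. None of this is existing hexrd code.

```python
import numpy as np
from statistics import NormalDist


def min_samples_from_stats(refl_per_grain, compl_thresh, pvalue=0.99, floor=2):
    """Pick a cluster size that about `pvalue` of random grains should reach.

    Assumption: counts are roughly normal, so the lower cutoff is
    mean + z * std, with z the (negative) normal quantile at 1 - pvalue.
    """
    counts = np.asarray(refl_per_grain, dtype=float)
    mean = counts.mean()
    std = counts.std(ddof=1)  # sample standard deviation
    z = NormalDist().inv_cdf(1.0 - pvalue)
    cutoff = mean + z * std
    # Scaling by compl_thresh is an assumption, not part of the suggestion.
    return max(int(np.floor(compl_thresh * cutoff)), floor)


# Stand-in data: ngrains random counts, NOT real simulated reflections.
ngrains = 1000
rng = np.random.default_rng(0)
demo_counts = rng.integers(18, 27, size=ngrains)

for pvalue in (0.99, 0.999):
    print(pvalue, min_samples_from_stats(demo_counts, 0.5, pvalue=pvalue))
```

A stricter pvalue pushes the cutoff further into the lower tail, so min_samples can only stay the same or shrink. If the counts are far from normal, an empirical quantile such as np.quantile(counts, 1 - pvalue) could replace the mean and standard deviation.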

In any case, I think we ought to experiment with these values a little
and see what happens. I'll work on that this week, starting
with adding arguments to the create_clustering_parameters function.


joelvbernier · 2024-09-23T16:21:33Z

joelvbernier
Sep 23, 2024
Maintainer

I think Don’s suggestion of a statistics-based approach is the way to go. This heuristic was originally developed and tested with synthetic data, but I think it definitely merits a parameter study both with and without strong texture.
