Conversation

@jmr
Member

@jmr jmr commented Nov 24, 2025

Use the uv coordinates to prune the vertex-edge distance computations needed.

When cell A is above B, we don't need to compare the top vertices/edge of A with the bottom vertices/edge of B.

Note, however, that when A is both above and left of B, we cannot just compute the lower-right/upper-left vertex distance due to the projection.

It is possible that more comparisons can be omitted; I'm not sure.

Error is within 1e-15 radians of previous results.

GetDistance shows a 2x speedup for same-face cells.

@jmr
Member Author

jmr commented Nov 24, 2025

@ericveach Could you take a look at this and see if it is close to what you had in mind with your TODO here?

s2geometry/src/s2/s2cell.cc

Lines 512 to 514 in ed62eee

// TODO(ericv): This could be optimized to be at least 5x faster by pruning
// the set of possible closest vertex/edge pairs using the faces and (u,v)
// ranges of both cells.

@ericveach

ericveach commented Nov 27, 2025 via email

@ericveach

ericveach commented Nov 27, 2025

Somehow the formatting seems to get screwed up when I paste from a terminal window. Here's another attempt.

  if (face_ == target.face_) {
    // Find the index "ai" of the edge of A that is furthest away from the       
    // opposite edge of B in (u,v)-space.                                        
    int ai = -1;
    double max_dist = 0;
    auto checkEdge = [&ai, &max_dist](double dist, int a_edge) {
      if (dist > max_dist) {
        max_dist = dist;
        ai = a_edge;
      }
    };
    checkEdge(uv_[0][0] - target.uv_[0][1], kLeftEdge);
    checkEdge(target.uv_[0][0] - uv_[0][1], kRightEdge);
    checkEdge(uv_[1][0] - target.uv_[1][1], kTopEdge);
    checkEdge(target.uv_[1][0] - uv_[1][1], kBottomEdge);
    if (ai < 0) {
      // A and B intersect (including edge and vertex intersections).            
      return S1ChordAngle::Zero();
    }
    // Otherwise the minimum distance always occurs between an endpoint of the   
    // edge "ai" and the opposite edge (ai ^ 2) of B, or symmetrically, an       
    // endpoint of the opposite edge of B and the edge "ai".                     
    int bi = ai ^ 2;
    S2::UpdateMinDistance(va[ai], vb[bi], vb[bi + 1], &min_dist);
    S2::UpdateMinDistance(va[ai + 1], vb[bi], vb[bi + 1], &min_dist);
    S2::UpdateMinDistance(vb[bi], va[ai], va[ai + 1], &min_dist);
    S2::UpdateMinDistance(vb[bi + 1], va[ai], va[ai + 1], &min_dist);
    return min_dist;
  }

@jmr
Member Author

jmr commented Nov 28, 2025

Thanks, Eric! #479 (comment) works for me after I swapped kTopEdge and kBottomEdge and used (_ + 1) & 3 instead of _ + 1. I tried something similar after I noticed the test failures were far in one direction and not the other, but I must have made a mistake when I tried it.
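For readers following along, here is a minimal standalone sketch of the edge selection with both of those fixes applied. It is not the merged code: `UVRect`, `FindFurthestEdgeSketch`, `OppositeEdge`, and `NextVertex` are hypothetical names, and the edge numbering (bottom=0, right=1, top=2, left=3, with edge k running from vertex k to vertex (k + 1) & 3) is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for a cell's (u,v) bounds; not an S2 type.
struct UVRect {
  double u_lo, u_hi, v_lo, v_hi;
};

// Assumed edge numbering (CCW): edge k runs from vertex k to vertex (k + 1) & 3.
enum Edge { kBottomEdge = 0, kRightEdge = 1, kTopEdge = 2, kLeftEdge = 3 };

// Returns the edge of `a` with the largest positive (u,v) gap to the
// opposite edge of `b`, or -1 if the rectangles touch or overlap.
inline int FindFurthestEdgeSketch(const UVRect& a, const UVRect& b) {
  assert(a.u_lo <= a.u_hi && a.v_lo <= a.v_hi);
  assert(b.u_lo <= b.u_hi && b.v_lo <= b.v_hi);
  int ai = -1;
  double max_gap = 0;
  auto check_edge = [&ai, &max_gap](double gap, int a_edge) {
    if (gap > max_gap) {
      max_gap = gap;
      ai = a_edge;
    }
  };
  check_edge(a.u_lo - b.u_hi, kLeftEdge);    // B lies to the left of A.
  check_edge(b.u_lo - a.u_hi, kRightEdge);   // B lies to the right of A.
  check_edge(a.v_lo - b.v_hi, kBottomEdge);  // B lies below A.
  check_edge(b.v_lo - a.v_hi, kTopEdge);     // B lies above A.
  return ai;
}

// The edge of B that faces edge `ai` of A.
inline int OppositeEdge(int ai) { return ai ^ 2; }

// The second endpoint of edge `k`; wraps so that edge 3 ends at vertex 0.
inline int NextVertex(int k) { return (k + 1) & 3; }
```

The wrap in `NextVertex` matters only for the left edge under this numbering, which would fit test failures that show up in one direction and not the others.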

@jmr
Member Author

jmr commented Nov 28, 2025

I have incorporated your suggestion, with a separate FindFurthestEdge function since this will soon be used in S2Cell::IsDistanceLess, too.

@ericveach

Although I can't build the actual library right now I did try pasting some version of the code I suggested into compiler explorer, and when built under Clang the FindFurthestEdge portion is branchless. Given this and the fact that only 4 edges are tested, how much faster is it now?

Unfortunately GCC doesn't seem willing to generate branchless code, but there's not much we can do about that.

@jmr
Member Author

jmr commented Nov 29, 2025

> Although I can't build the actual library right now I did try pasting some version of the code I suggested into compiler explorer, and when built under Clang the FindFurthestEdge portion is branchless. Given this and the fact that only 4 edges are tested, how much faster is it now?

UpdateMinDistance is a 3.8x speedup, and GetVertex is 1.5x. Based on operation count, 8x and 2x would be ideal.

Before it was 80% UpdateMinDistance and 15% GetVertex; after, it's 60% and 30%.

This gives an overall 3x speedup. We can take the win and continue to optimize. The largest chunk of the remaining time is in vector Normalize and Norm2.
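As a rough consistency check on these figures, the sketch below scales each profiled component by its own speedup. It assumes the remaining 5% of the time is unchanged, which is not stated above; `ProfileAfter` and `ProjectProfile` are hypothetical names.

```cpp
#include <cassert>

struct ProfileAfter {
  double total;         // New run time as a fraction of the old run time.
  double update_share;  // UpdateMinDistance share of the new run time.
  double vertex_share;  // GetVertex share of the new run time.
};

// Scales each profiled component by its own speedup and leaves the rest
// unchanged (an assumption; the remainder is not broken down here).
inline ProfileAfter ProjectProfile(double update_before, double update_speedup,
                                   double vertex_before, double vertex_speedup) {
  assert(update_speedup > 0 && vertex_speedup > 0);
  const double rest = 1.0 - update_before - vertex_before;
  const double update_after = update_before / update_speedup;
  const double vertex_after = vertex_before / vertex_speedup;
  const double total = update_after + vertex_after + rest;
  return {total, update_after / total, vertex_after / total};
}
```

`ProjectProfile(0.80, 3.8, 0.15, 1.5)` gives a total of about 0.36 of the old time (roughly 2.8x overall), with shares of about 58% and 28%, in line with the rounded 3x and 60%/30% figures.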

> Unfortunately GCC doesn't seem willing to generate branchless code, but there's not much we can do about that.

I can get gcc to generate branchless code by giving it a branch probability somewhere between 0.21 and 0.33.

https://godbolt.org/z/rhqGcrqE9

It would be interesting to see the actual branch probability and what FDO did.
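One way to pass such a probability to GCC is the `__builtin_expect_with_probability` builtin. Whether the linked Compiler Explorer example uses this builtin is an assumption, and whether the result is actually branchless depends on compiler version and flags. `SKETCH_WITH_PROBABILITY` and `IndexOfLargestPositiveGap` are hypothetical names, and 0.25 is just an illustrative value inside the range mentioned above.

```cpp
#include <cassert>

// Use the builtin where the compiler reports it; otherwise fall back to a
// plain condition so the sketch still compiles.
#if defined(__has_builtin)
#if __has_builtin(__builtin_expect_with_probability)
#define SKETCH_WITH_PROBABILITY(cond, p) \
  __builtin_expect_with_probability(static_cast<long>(cond), 1, (p))
#endif
#endif
#ifndef SKETCH_WITH_PROBABILITY
#define SKETCH_WITH_PROBABILITY(cond, p) (cond)
#endif

// Returns the index of the largest strictly positive gap, or -1 if none.
inline int IndexOfLargestPositiveGap(const double* gaps, int n) {
  assert(gaps != nullptr && n >= 0);
  int best = -1;
  double max_gap = 0;
  for (int i = 0; i < n; ++i) {
    // Hint that this update is taken about a quarter of the time.
    if (SKETCH_WITH_PROBABILITY(gaps[i] > max_gap, 0.25)) {
      max_gap = gaps[i];
      best = i;
    }
  }
  return best;
}
```

The hint only changes code generation, not results, so the function returns the same index with or without it.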

@jmr
Member Author

jmr commented Nov 29, 2025

It's also worth mentioning that only 1/5 as many instructions were executed and IPC decreased from 3.2 to 1.8 (less unnecessary extra work to be done in parallel).
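These two numbers can be cross-checked against the overall speedup using cycles = instructions / IPC. The helper below is a hypothetical sketch of that arithmetic, not something taken from the benchmark tooling.

```cpp
#include <cassert>

// Speedup in cycles implied by a change in instruction count and IPC,
// using cycles = instructions / IPC. `instruction_ratio` is the new
// instruction count divided by the old one.
inline double CycleSpeedup(double instruction_ratio, double ipc_before,
                           double ipc_after) {
  assert(instruction_ratio > 0 && ipc_before > 0 && ipc_after > 0);
  const double cycles_before = 1.0 / ipc_before;
  const double cycles_after = instruction_ratio / ipc_after;
  return cycles_before / cycles_after;
}
```

`CycleSpeedup(0.2, 3.2, 1.8)` comes out at about 2.8, which agrees with the roughly 3x speedup reported earlier.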

Working on releasing benchmarks now.

@ericveach

Thanks for the comprehensive analysis! It looks like the performance analysis tools available have improved a lot in the past few years.

Just to clarify, are the results you mentioned for random cells, or for cells on the same face, or for cells that are small relative to their separation distance? I suspect that the last case may be the most important in practice, e.g. where the separation distance may be up to a few hundred or thousands of km but the cell sizes are, say, only 1% or 10% of that distance. This is the situation you would often have when measuring distances between coverings of real-world geometry, for example.

@ericveach

Maybe your existing change is big enough, but if you wanted to try adding the vertex-only case, here is a stab at it:

  if (face_ == target.face_) {
    // In certain cases the distance between two cells is attained between a     
    // pair of vertices.  This makes the distance very cheap to compute and so   
    // it's worth detecting the easy cases where this happens.  Recall that      
    // all cells except at level 0 are slightly diamond-shaped, i.e. one         
    // diagonal is slightly longer than the other and the corresponding cell     
    // corners are either right-angled or acute.  Define a diagonal to have      
    // positive slope if u and v both increase along it, and negative slope
    // otherwise.  Furthermore define two cells to be separated along a          
    // positive diagonal if one has strictly larger u- and v-values than the     
    // other, and along a negative diagonal if one cell has strictly smaller     
    // u-values and strictly larger v-values or vice versa.  Then if two cells   
    // A and B are separated along a positive diagonal and the long diagonals    
    // of A and B both have positive slope, then the minimum distance occurs     
    // between two vertices; namely, the closer endpoints of their long          
    // diagonals.  The same is true if the two cells are separated along a       
    // negative diagonal and both long diagonals have negative slope.  One of    
    // these two situations is expected to occur almost 50% of the time when     
    // the cells are on the same face and are small relative to their            
    // separation distance.                                                      
    //                                                                           
    // For the purpose of determining whether cells are separated along a        
    // diagonal, as described above, we assign constants to the four edges of    
    // A as follows: L=1, T=3, R=3, B=9.  "sep_dirs" is the sum of these         
    // values for the edges of A that separate A from B.  This yields the        
    // following possible sums: TL=4, TR=6, BL=10, BR=12, L=1, T=3, R=3, B=9,    
    // none=0.  This lets us test the following conditions cheaply, given that   
    // we test them in the following order:                                      
    //                                                                           
    //  sep_dirs == 0 : the two cells intersect                                  
    //  !(sep_dirs & 1) : cells are separated along a diagonal (TL, TR, BL, BR)  
    //  (sep_dirs & 2) : cells are separated along a positive diagonal (TR, BL)
    //  (sep_dirs < 8) : the top edge of A separates A from B (TL, TR)

    int sep_dirs = 0;
    R2Rect a_uv = GetBoundUV();
    R2Rect b_uv = target.GetBoundUV();
    if (a_uv[0][0] > b_uv[0][1]) sep_dirs += 1;  // left side of A               
    if (a_uv[0][1] < b_uv[0][0]) sep_dirs += 3;  // right                        
    if (a_uv[1][0] > b_uv[1][1]) sep_dirs += 9;  // bottom                       
    if (a_uv[1][1] < b_uv[1][0]) sep_dirs += 3;  // top                          
    if (sep_dirs == 0) {
      // A and B intersect (this includes edge and vertex intersections).        
      return S1ChordAngle::Zero();
    }
    // Otherwise if the two cells are separated along a diagonal, check if the   
    // cells also have their long diagonals in that direction.                   
    bool separated = !(sep_dirs & 1);  // TL, TR, BL, BR                         
    if (separated) {
      bool sep_positive = (sep_dirs & 2) != 0;  // TR or BL                      
      bool a_positive = (a_uv[0][0] < 0) != (a_uv[1][0] < 0);
      bool b_positive = (b_uv[0][0] < 0) != (b_uv[1][0] < 0);
      // The use of & rather than && below encourages branchless compilation.    
      if ((a_positive == sep_positive) & (b_positive == sep_positive)) {
        // Compute the vertex of A that is closest to B, without branches.       
        // Vertices are numbered as follows: BL=0, BR=1, TR=2, TL=3.             
        // a_positive implies that the closest vertex to B is TR or BL.          
        // a_top_edge implies that the closest vertex to B is TR or TL.          
        bool a_top_edge = sep_dirs < 8;  // TR or TL                             
        int i = (a_positive ? 0 : 1) + (a_top_edge ? 2 : 0);
        return S1ChordAngle(GetVertex(i), target.GetVertex(i ^ 2));
      }
    }
    // Otherwise carry on with the existing code, except that the (ai < 0) case is no longer needed.
  }

Again, no guarantees that this will work or even compile. But at least in the case of cells that are small relative to their separation distance, and where the separation distance is small relative to the Earth's radius, it should yield a substantial speedup.
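To make the `sep_dirs` encoding easier to check by hand, here is a small standalone sketch of just the classification step. `UVBounds` and the helper names are hypothetical and stand in for the cells' (u,v) rectangles; the weights and bit tests are the ones from the snippet above.

```cpp
#include <cassert>

// Hypothetical stand-in for a cell's (u,v) bounds; not an S2 type.
struct UVBounds {
  double u_lo, u_hi, v_lo, v_hi;
};

// Sums the weights (L=1, T=3, R=3, B=9) of the edges of A that separate A
// from B. Possible results: 0, 1, 3, 9 and TL=4, TR=6, BL=10, BR=12.
inline int SeparatingDirs(const UVBounds& a, const UVBounds& b) {
  assert(a.u_lo <= a.u_hi && a.v_lo <= a.v_hi);
  assert(b.u_lo <= b.u_hi && b.v_lo <= b.v_hi);
  int sep_dirs = 0;
  if (a.u_lo > b.u_hi) sep_dirs += 1;  // left edge of A
  if (a.u_hi < b.u_lo) sep_dirs += 3;  // right
  if (a.v_lo > b.v_hi) sep_dirs += 9;  // bottom
  if (a.v_hi < b.v_lo) sep_dirs += 3;  // top
  return sep_dirs;
}

// Every single-edge sum is odd and every two-edge sum is even.
inline bool SeparatedAlongDiagonal(int sep_dirs) {
  return sep_dirs != 0 && (sep_dirs & 1) == 0;
}

// Among the diagonal sums, bit 1 is set only for TR=6 and BL=10.
inline bool SeparatedAlongPositiveDiagonal(int sep_dirs) {
  return SeparatedAlongDiagonal(sep_dirs) && (sep_dirs & 2) != 0;
}

// Among the diagonal sums, only TL=4 and TR=6 are below 8.
inline bool TopEdgeSeparates(int sep_dirs) {
  return SeparatedAlongDiagonal(sep_dirs) && sep_dirs < 8;
}
```

For example, with A = [0,1] x [0,1] and B = [2,3] x [2,3], the right and top edges of A separate the cells, so `SeparatingDirs` returns 6 and all three predicates are true.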

@jmr
Member Author

jmr commented Dec 1, 2025

> It looks like the performance analysis tools available have improved a lot in the past few years.

Definitely. I used:

  1. --benchmark_perf_counters=CYCLES,INSTRUCTIONS: https://github.com/google/benchmark/blob/main/docs/perf_counters.md
  2. Staring at flame graphs in pprof. I'm not sure of the state of the internal vs external versions, but the external one does support flame graphs: google/pprof@8b542ba
  3. benchstat to compare benchmark output: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

> Just to clarify, are the results you mentioned for random cells, or for cells on the same face, or for cells that are small relative to their separation distance?

Random same-face cells.

// Copyright 2025 Google LLC.
// SPDX-License-Identifier: Apache-2.0
static void BM_GetDistanceToCellSameFace(benchmark::State& state) {
  const string seed_str = StrCat("GET_DISTANCE_TO_CELL_SAME_FACE",
                                 absl::GetFlag(FLAGS_s2_random_seed));
  std::seed_seq seed(seed_str.begin(), seed_str.end());
  std::mt19937_64 bitgen(seed);
  std::vector<S2Cell> cells;
  cells.reserve(kBatchSize);
  for (int i = 0; i < kBatchSize; ++i) {
    // Make a cell id and move it to face 0.
    S2CellId cellid = s2random::CellId(bitgen);
    cells.emplace_back(
        S2CellId::FromFacePosLevel(0, cellid.pos(), cellid.level()));
  }

  int i = 0;
  while (state.KeepRunningBatch(kBatchSize)) {
    const S2Cell& cell1 = cells[i];
    for (const S2Cell& cell2 : cells) {
      S1ChordAngle distance = cell1.GetDistance(cell2);
      benchmark::DoNotOptimize(distance);
    }
    if (++i == kBatchSize) i = 0;
  }
}
BENCHMARK(BM_GetDistanceToCellSameFace);

A more realistic version of this could be done. There are also larger-scale benchmarks.

Re #479 (comment). Thanks for that. I will try it, but definitely separately.

@jmr
Member Author

jmr commented Dec 1, 2025

When I naively try #479 (comment), it's about 5% faster on BM_GetDistanceToCellSameFace. Some of the other benchmarks show similar speedups; others don't. I haven't run all the benchmarks yet, and won't have time to look at this in detail for a while.

@ericveach

Thanks for trying that and also for the profiling info. Your benchmark looks pretty good; the only downside is that I think it probably significantly overweights large cells, since s2random::CellId() chooses the cell level randomly between 0 and 30. So for example, ~10% of the cells will be 2500km across or larger, and ~19% of random pairs will involve a cell at least this big. This means the test is significantly weighted towards pairs that overlap or that are not well separated relative to their size. If you wanted to test the well-separated case specifically, cell levels [10..30] might be an appropriate choice.
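The percentages above can be reproduced with a short calculation. The sketch assumes that levels are drawn uniformly from 0 to 30 and that "2500km across or larger" corresponds to levels 0 through 2; both are assumptions made for illustration, and the function names are hypothetical.

```cpp
#include <cassert>

// Fraction of cells whose level is <= max_large_level when levels are drawn
// uniformly from [0, max_level].
inline double LargeCellFraction(int max_large_level, int max_level) {
  assert(0 <= max_large_level && max_large_level <= max_level);
  return static_cast<double>(max_large_level + 1) / (max_level + 1);
}

// Probability that at least one cell of an independent random pair is large.
inline double LargePairFraction(double large_cell_fraction) {
  assert(large_cell_fraction >= 0 && large_cell_fraction <= 1);
  const double small_fraction = 1.0 - large_cell_fraction;
  return 1.0 - small_fraction * small_fraction;
}
```

`LargeCellFraction(2, 30)` is 3/31, or about 9.7%, and `LargePairFraction` of that value is about 18.4%, close to the rounded ~10% and ~19% figures.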

* Add S2Cell::IsDistanceLess implementation along the lines of
  GetDistance
* Rename "HighErrorExample" test to "HighDifferenceExample"
* Use UpdateMinInteriorDistance when endpoints have already been
  checked
* Reword FindFurthestEdge comment
* Rename GetDistanceToCellBruteForce args

@ericveach ericveach left a comment


Looks good!

@ericveach

ericveach commented Dec 8, 2025 via email

@jmr jmr merged commit dc9df71 into google:master Dec 8, 2025
11 checks passed
@jmr jmr deleted the cell-cell-dist branch December 8, 2025 07:31
@jmr
Member Author

jmr commented Dec 8, 2025

Eric, thank you for your time and careful attention.
