⚡️ Speed up function calculate_text_metrics by 109% in PR #11114 (feat/langchain-1.0)
#11353
⚡️ This pull request contains optimizations for PR #11114
If you approve this dependent PR, these changes will be merged into the original PR branch feat/langchain-1.0.
📄 109% (1.09x) speedup for calculate_text_metrics in src/backend/base/langflow/api/v1/knowledge_bases.py
⏱️ Runtime: 84.7 milliseconds → 40.5 milliseconds (best of 91 runs)
📝 Explanation and details
The optimized code achieves a 109% speedup (from 84.7ms to 40.5ms) by eliminating redundant per-column operations and using more efficient pandas string methods.
Key Optimizations
1. Batch Column Processing via stack()
2. Regex-based Word Counting
   Replaces str.split().str.len(), which creates Python lists for every cell, then counts list lengths (145ms in profiler - the slowest operation), with str.count(r'\S+'), which counts non-whitespace sequences directly without materializing lists.
3. Early Exit for Empty Column Lists
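The three optimizations above can be sketched as follows. This is an illustrative sketch only, not the code from this PR: the function names text_metrics_loop and text_metrics_stacked, their signatures, and the (words, characters) return shape are assumptions made for the example, and the actual calculate_text_metrics may differ.

```python
import pandas as pd


def text_metrics_loop(df: pd.DataFrame, text_columns: list[str]) -> tuple[int, int]:
    """Hypothetical baseline: repeat the string operations once per column."""
    total_words = 0
    total_chars = 0
    for col in text_columns:
        if col not in df.columns:
            continue
        text = df[col].astype(str)
        total_chars += int(text.str.len().sum())
        # Builds a Python list of tokens for every cell, then measures each list.
        total_words += int(text.str.split().str.len().sum())
    return total_words, total_chars


def text_metrics_stacked(df: pd.DataFrame, text_columns: list[str]) -> tuple[int, int]:
    """Hypothetical optimized variant: stack once, count words with a regex."""
    cols = [c for c in text_columns if c in df.columns]
    # Early exit: nothing to measure, so skip all pandas work.
    if not cols:
        return 0, 0
    # stack() flattens the selected columns into a single Series, so each
    # string operation runs once instead of once per column.
    stacked = df[cols].astype(str).stack()
    total_chars = int(stacked.str.len().sum())
    # Counts runs of non-whitespace characters without building token lists.
    total_words = int(stacked.str.count(r"\S+").sum())
    return total_words, total_chars


df = pd.DataFrame(
    {
        "a": ["hello world", ""],
        "b": ["  one  two three ", "héllo"],
    }
)
print(text_metrics_loop(df, ["a", "b"]))
print(text_metrics_stacked(df, ["a", "b"]))
```

Both functions should return the same totals for the same input; the stacked variant simply issues fewer pandas calls as the number of columns grows. The sample frame deliberately contains no missing values, since how those are stringified is a detail of the real implementation that this sketch does not try to reproduce.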
Performance Characteristics
The optimization excels when:
Test results confirm this: the large-scale tests (500 rows, 100 columns) benefit most, while simple single-column cases see modest gains due to the overhead of stacking being comparable to the single-iteration loop.
Correctness Note
Both implementations handle edge cases identically (None → "None", empty strings, Unicode), as confirmed by the comprehensive test suite passing.
✅ Correctness verification report:
To edit these changes git checkout codeflash/optimize-pr11114-2026-01-19T14.59.04 and push.