Ahh yes, you've hit a painfully under-discussed issue in RAG pipelines, especially with larger files; we logged this exact failure mode in our open-source diagnosis map.
We got tired of this same babysitting loop, so we built auto-resume logic plus async chunk recovery into our toolchain. 👉 https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md If you're interested, happy to show how we hook it up: zero restart, just skip forward and retry only the failed chunk.
-
I’m running into frequent LLM timeouts when LightRAG processes large documents for entities and relations (using Ollama/API). When this happens, the whole file processing just stops, and nothing else gets indexed for that document.
Right now, I have to manually restart the process for that file if it bombs out with a timeout. The LLM cache helps a lot here—it lets the system skip over chunks that were already processed, so there’s no extra LLM overhead for those. But the main annoyance is that I can’t just hit a “retry” or “resume” button in the UI. Instead, I have to delete the file and upload it again to restart processing, which is obviously messy and inefficient.
Feature Suggestions
• Automatic retry/resume on error: It would be great if LightRAG could auto-retry the processing job if it fails with a timeout (or similar transient error), instead of just quitting. Right now, I have to babysit it and manually restart.
• Manual resume button in UI: There should be a button in the web UI that lets me resume processing right where it left off for a file, without deleting/reuploading. The status should be preserved.
• Clearer error reporting: Better logging and status in the UI when things fail — show which chunk caused the timeout, the error details, and so on.
Basically, I need a way—either automatic or manual—for LightRAG to pick up where it left off after a timeout, without extra manual work or reuploading files. The LLM cache already takes care of skipping processed chunks, so this should be doable. It’s a pretty common workflow for batch jobs in other systems.
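The resume behavior described above (skip chunks already in the LLM cache, retry transient timeouts with backoff, record only the chunks that still failed so a later pass can pick them up) can be sketched in plain Python. Note this is a minimal illustration, not LightRAG's actual code: `extract` is a hypothetical callable that sends one chunk to the LLM, and `cache` is any dict-like store keyed by chunk index.

```python
import time

def process_document(chunks, extract, cache, max_retries=3, backoff=5.0):
    """Process chunks sequentially, skipping cached ones and retrying
    transient failures (e.g. LLM timeouts) instead of aborting the file.

    `extract` and `cache` are hypothetical stand-ins for the LLM call
    and the per-chunk result cache.
    """
    failed = []
    for i, chunk in enumerate(chunks):
        if i in cache:                    # already processed: no LLM call needed
            continue
        for attempt in range(1, max_retries + 1):
            try:
                cache[i] = extract(chunk)
                break
            except TimeoutError:
                if attempt == max_retries:
                    failed.append(i)      # record and move on; resumable later
                else:
                    time.sleep(backoff * attempt)  # simple linear backoff
    return failed                          # empty list means the file completed
```

Calling this again on the same file after a failure only re-sends the chunks in `failed`, since everything else is served from the cache — which is exactly the "resume without reupload" behavior requested.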