@jerinphilip Based on our discussion on Slack and upon some further thought, I'd propose this.
- Replace members std::vector<std::string> input and std::string translation with std::vector<SentencePieceText> and SentencePieceText, respectively.
- Move tokenization from QueuedInput::next() to the PlainTextTranslation constructor, where sentence splitting takes place.
- Indeed replace QueuedInput and BatchGenerator with a different mechanism.
(Background comment: I'm hesitant to touch the Batch class, as it's fairly baked into Marian, so we need some mechanism of mapping sentence-level translations to the respective requests. Currently the TranslationService has a map that maps from unique sentence ids to the respective Jobs and promises.)
Regarding the third point, here's what I'd propose for starters:
- for the time being, we keep the mechanism where the TranslationService stores a shared_ptr to each translation job in a map from the job's id (identical to the sentenceId in the respective batch) to the shared_ptr and a promise.
- for further processing, we pass around weak_ptr. This lets us determine at any point whether a job is still valid.
- we add a member function cancelJob(uint64_t job_id) to TranslationService that removes the entry in the map mentioned above.
- PlainTextTranslation keeps track of the ids of the jobs it owns. Cancellation of a PlainTextTranslation then cancels all the jobs it owns.
- The TranslationService (via a custom batcher class, to make code maintenance easier) keeps an array of deques, one per sentence length, and a heap in which the heads of the deques are prioritized so that the oldest comes first. When a translation worker requests a batch, we take the oldest sentence and fill up the batch with sentences of the same or a similar length. Each time we pop a sentence/Job from its deque, we update the heap. Jobs that aren't valid any more (weak_ptr::lock() returns an empty shared_ptr) can be skipped at batch creation time.
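
To make the proposal above concrete, here is a rough, self-contained sketch of the job map, cancelJob, and the length-bucketed batcher. It is only an illustration under assumptions: Job, LengthBucketBatcher, TranslationServiceSketch, submit, nextBatch, and the spread parameter are hypothetical names, the promise stored alongside each job is omitted, and the heap is a std::priority_queue with lazy removal of stale entries (one possible way to "update the heap", not necessarily the one we'd end up with).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <memory>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sentence-level translation job.
struct Job {
  std::uint64_t id;       // identical to the sentence id in the batch
  std::size_t numTokens;  // sentence length after tokenization
};

// Hypothetical batcher: one FIFO deque per sentence length, plus a min-heap
// over the deque heads keyed by arrival order (oldest first).
class LengthBucketBatcher {
 public:
  explicit LengthBucketBatcher(std::size_t maxLength) : buckets_(maxLength + 1) {}

  void enqueue(const std::shared_ptr<Job>& job) {
    // Sentences longer than maxLength share the last bucket.
    std::size_t len = std::min(job->numTokens, buckets_.size() - 1);
    buckets_[len].push_back({nextSeq_++, job});  // held as weak_ptr only
    if (buckets_[len].size() == 1) pushHead(len);
  }

  // Start at the oldest live job, then fill with jobs of the same length
  // and of lengths at most `spread` away from it.
  std::vector<std::shared_ptr<Job>> nextBatch(std::size_t maxSentences,
                                              std::size_t spread = 1) {
    std::vector<std::shared_ptr<Job>> batch;
    std::size_t center = 0;
    if (!oldestLiveBucket(center)) return batch;
    for (std::size_t d = 0; d <= spread && batch.size() < maxSentences; ++d) {
      if (center >= d) drain(center - d, maxSentences, batch);
      if (d > 0 && center + d < buckets_.size()) drain(center + d, maxSentences, batch);
    }
    return batch;
  }

 private:
  struct Entry { std::uint64_t seq; std::weak_ptr<Job> job; };
  using Head = std::pair<std::uint64_t, std::size_t>;  // (seq, length)

  void pushHead(std::size_t len) { heads_.push({buckets_[len].front().seq, len}); }

  // Discards stale heap entries and cancelled jobs until the heap top is a
  // current deque head whose job is still alive.
  bool oldestLiveBucket(std::size_t& len) {
    while (!heads_.empty()) {
      auto [seq, l] = heads_.top();
      auto& b = buckets_[l];
      if (b.empty() || b.front().seq != seq) { heads_.pop(); continue; }
      if (!b.front().job.expired()) { len = l; return true; }
      heads_.pop();
      b.pop_front();
      if (!b.empty()) pushHead(l);
    }
    return false;
  }

  void drain(std::size_t len, std::size_t maxSentences,
             std::vector<std::shared_ptr<Job>>& batch) {
    auto& b = buckets_[len];
    bool popped = false;
    while (!b.empty() && batch.size() < maxSentences) {
      // lock() yields an empty shared_ptr for cancelled jobs; skip those.
      if (auto job = b.front().job.lock()) batch.push_back(std::move(job));
      b.pop_front();
      popped = true;
    }
    if (popped && !b.empty()) pushHead(len);  // old head entry goes stale
  }

  std::uint64_t nextSeq_ = 0;
  std::vector<std::deque<Entry>> buckets_;
  std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heads_;
};

// Hypothetical service: owns the only strong reference to each job.
class TranslationServiceSketch {
 public:
  explicit TranslationServiceSketch(std::size_t maxLength) : batcher_(maxLength) {}

  std::uint64_t submit(std::size_t numTokens) {
    auto job = std::make_shared<Job>(Job{nextId_, numTokens});
    jobs_.emplace(nextId_, job);
    batcher_.enqueue(job);
    return nextId_++;
  }

  // Erasing the map entry drops the strong reference, so every weak_ptr expires.
  void cancelJob(std::uint64_t jobId) { jobs_.erase(jobId); }

  std::vector<std::shared_ptr<Job>> nextBatch(std::size_t maxSentences) {
    return batcher_.nextBatch(maxSentences);
  }

 private:
  std::uint64_t nextId_ = 0;
  std::unordered_map<std::uint64_t, std::shared_ptr<Job>> jobs_;
  LengthBucketBatcher batcher_;
};
```

In this sketch a PlainTextTranslation would simply remember the ids returned by submit and call cancelJob on each of them when it is cancelled; the batcher never needs to be told, since it notices expired jobs when it builds the next batch.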