batchAnnotateFiles failing silently (and taking php thread with it) #7288

Closed
@James-THEA

Description

Environment details

  • OS: Amazon Linux 2023
  • PHP version: 8.2.15
  • Package name and version: v1.9.0

Steps to reproduce

  1. Use this file:
    faraone2005 (1).pdf

  2. Request pages 1-10
    a. Two batches of 5 pages. It works if I do only 1-9.

More context:
I have a setup that parses PDFs using the Google Cloud Vision API. It has worked for the past several months, so as far as I can tell this is a new issue. No error is thrown; the PHP thread just dies.

Moreover, the issue doesn't occur in all my environments. Locally, everything works great (PHP version 8.2.4). On an Amazon Beanstalk server it works as well (same versions as listed above). The issue exists on both new and old servers that we have spun up. That suggests a possible workaround: find the discrepancy between the servers and update whatever differs. However, I still think this should be filed as a bug.
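
One way to look for that discrepancy is to dump the same set of runtime details on a working and a failing server and diff the output. The sketch below is only an illustration: `environmentFingerprint` is a hypothetical helper, and the extensions it checks (`grpc`, `protobuf`, `curl`) are a guess at what might differ, not something confirmed to be involved.

```php
<?php
// Hypothetical helper: collects runtime details worth diffing between
// a working and a failing server. The extension list is an assumption.
function environmentFingerprint(array $extensions = ['grpc', 'protobuf', 'curl']): array
{
    $info = [
        'php_version'        => PHP_VERSION,
        'sapi'               => PHP_SAPI,
        'memory_limit'       => ini_get('memory_limit'),
        'max_execution_time' => ini_get('max_execution_time'),
    ];
    foreach ($extensions as $name) {
        // phpversion() returns false when the extension is not loaded.
        $version = phpversion($name);
        $info['ext_' . $name] = $version === false ? 'not loaded' : $version;
    }
    ksort($info); // stable key order makes the output easy to diff
    return $info;
}

echo json_encode(environmentFingerprint(), JSON_PRETTY_PRINT), PHP_EOL;
```

Running this under the same SAPI that serves the failing request (not only from the CLI) matters, since the CLI and web SAPIs can load different ini files.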

I have added memory usage logging, and nothing looks out of the ordinary (>100MB). Usage spikes on the first batchAnnotateFiles request and the thread dies on the second, so it may well spike again there; I strongly suspect a memory limit is the problem.
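
To test the memory-limit suspicion, a shutdown handler can record the last PHP error together with peak memory and the configured limit. This is a minimal sketch using only built-in PHP functions; `describeShutdown` is a hypothetical name, and the handler would need to be registered before the call that dies.

```php
<?php
// Hypothetical diagnostic helper: formats the last PHP error (if any)
// alongside peak memory usage and the configured memory_limit.
function describeShutdown(?array $lastError, int $peakBytes, string $memoryLimit): string
{
    $usage = sprintf('peak=%.1fMB limit=%s', $peakBytes / 1048576, $memoryLimit);
    if ($lastError === null) {
        return 'shutdown without a recorded PHP error; ' . $usage;
    }
    return sprintf(
        'last error [%d] %s in %s:%d; %s',
        $lastError['type'],
        $lastError['message'],
        $lastError['file'],
        $lastError['line'],
        $usage
    );
}

// Runs on normal exit and after fatal errors such as "Allowed memory size exhausted".
register_shutdown_function(static function (): void {
    error_log(describeShutdown(
        error_get_last(),
        memory_get_peak_usage(true),
        (string) ini_get('memory_limit')
    ));
});
```

If the process is killed by a signal (for example a segfault inside a native extension, or the kernel's OOM killer), shutdown functions do not run. In that case the absence of this log line is itself a hint that the failure is happening below the PHP level.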

I found this bug report: https://www.googlecloudcommunity.com/gc/AI-ML/Vision-AI-OCR-Internal-server-error-Failed-to-process-features/m-p/735441

It looks almost identical to my issue, but it is for Vision AI, so the fix is not applicable.

Code example

A little edited for brevity, but I can confirm it still has the problem.

private function myFunction($filePath, int $startingPage, int $lastPage): FileUploadResponse {
        $pdfContent = \Storage::get($filePath);
        $inputConfig = (new InputConfig())
            ->setMimeType('application/pdf')
            ->setContent($pdfContent);
        $feature = (new Feature())->setType(Type::DOCUMENT_TEXT_DETECTION);

        $totalPages = range($startingPage + 1, $lastPage + 1);
        $pageChunks = array_chunk($totalPages, 5);
        $overallText = '';
        $maxLength = self::MAX_UPLOAD_TEXT_LENGTH;        
        
        for ($chunk = 0; $chunk < count($pageChunks); $chunk++) {
            try {
                $imageAnnotator = new ImageAnnotatorClient(['credentials' => 'redacted']);
                $pages = $pageChunks[$chunk];
                $annotateFileRequest = (new AnnotateFileRequest())
                    ->setInputConfig($inputConfig)
                    ->setFeatures([$feature])
                    ->setPages($pages);
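                try {
                    $response = $imageAnnotator->batchAnnotateFiles([$annotateFileRequest]); // request dies here
                } catch (\Throwable $e) {
                    // Catch \Throwable so that \Error subclasses are logged too;
                    // json_encode() on an exception object yields "{}", so log the message.
                    Logger(get_class($e) . ': ' . $e->getMessage());
                    continue; // $response is undefined for this chunk, so skip it
                }
                $responses = $response->getResponses()[0]->getResponses();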

                for ($x = 0; $x < min(count($pages), count($responses)); $x++) {
                    $pageResponse = $responses[$x];
                    if ($pageResponse->hasError()) {
                        continue;
                    }
                    if ($pageResponse->getFullTextAnnotation() !== null) {
                        $overallText .= $pageResponse->getFullTextAnnotation()->getText();
                    }
                }
            } finally {
                $imageAnnotator->close();
                gc_collect_cycles();
            }
        }
        return new FileUploadResponse(text: $overallText);
    }
