Session.request() got an unexpected keyword argument 'output_content_format' #39678

Open
bb-at-ss opened this issue Feb 12, 2025 · 8 comments
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Document Intelligence issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

bb-at-ss commented Feb 12, 2025

  • Package Name: azure-ai-documentintelligence python sdk
  • Package Version: 1.0.0
  • Operating System: Windows 11
  • Python Version: 3.12

Describe the bug
I've been using Azure Doc Intel in production for a couple of months, and now we want to add some features, ideally extracting text directly into markdown. However, when we read the pdf in as bytes and then use the Azure Doc Intel client to read the pdf, extract the text, and return it as formatted markdown, we get the following error: Session.request() got an unexpected keyword argument 'output_content_format'.

I've searched the code base high and low, tried the enum, tried several variations of how to input the pdf file bytes, etc., but the error persists. If I comment the argument out and run the code on our test set of pdfs that have previously been through the pipeline, it works as expected.

To Reproduce

client = DocumentAnalysisClient(
    endpoint=os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["DOCUMENTINTELLIGENCE_API_KEY"])
)
with open(pdf_path, "rb") as f:
    file_bytes = f.read()

poller = client.begin_analyze_document(
    document=AnalyzeDocumentRequest(bytes_source=file_bytes),
    model_id="prebuilt-layout",
    output_content_format='markdown'
)


Expected behavior
Extract the text from the pdf in a markdown format.


@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 12, 2025
@pvaneck
Member

pvaneck commented Feb 12, 2025

Hey, @bb-at-ss. Can you confirm that you are actually using version 1.0.0 of azure-ai-documentintelligence? I see that your code snippet uses DocumentAnalysisClient, which indicates that this is likely azure-ai-formrecognizer (which azure-ai-documentintelligence supersedes). I think only azure-ai-documentintelligence supports the output_content_format keyword argument.

For azure-ai-documentintelligence, check out the migration guide and a sample here.
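Editorial note: the error names Session.request() rather than the client method, which is consistent with an unrecognized keyword being passed down through `**kwargs` until the HTTP session rejects it. The toy model below illustrates that pattern only. Every class in it (`FakeSession`, `FakeTransport`, `OlderClient`, `NewerClient`) is hypothetical and stands in for the real layers; it is not the Azure SDK's actual code.

```python
class FakeSession:
    """Stand-in for an HTTP session whose request() accepts a fixed set of keywords."""

    def request(self, method, url, *, headers=None, data=None, timeout=None):
        return f"{method} {url}"


class FakeTransport:
    """Stand-in for a transport layer that forwards leftover kwargs to the session."""

    def __init__(self):
        self.session = FakeSession()

    def send(self, method, url, **kwargs):
        # Anything the higher layers did not consume ends up here.
        return self.session.request(method, url, **kwargs)


class OlderClient:
    """Hypothetical client that does not know about output_content_format."""

    def __init__(self):
        self._transport = FakeTransport()

    def begin_analyze(self, model_id, **kwargs):
        # No keyword is consumed here, so all of them travel down to the transport.
        return self._transport.send("POST", f"/models/{model_id}:analyze", **kwargs)


class NewerClient(OlderClient):
    """Hypothetical client that consumes output_content_format itself."""

    def begin_analyze(self, model_id, *, output_content_format=None, **kwargs):
        url = f"/models/{model_id}:analyze"
        if output_content_format is not None:
            url += f"?outputContentFormat={output_content_format}"
        return self._transport.send("POST", url, **kwargs)


def try_call(client):
    """Call begin_analyze with the keyword and report either the result or the error."""
    try:
        return client.begin_analyze("prebuilt-layout", output_content_format="markdown")
    except TypeError as exc:
        return f"TypeError: {exc}"
```

In this sketch `try_call(OlderClient())` reports a TypeError raised by the session's request() about the unexpected keyword, while `try_call(NewerClient())` succeeds because the client handles the keyword before anything reaches the session.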

@pvaneck pvaneck added needs-author-feedback Workflow: More information is needed from author to address the issue. Document Intelligence labels Feb 12, 2025
@github-actions github-actions bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Feb 12, 2025

Hi @bb-at-ss. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@bb-at-ss
Author

Yes. 100% I'm using 1.0.0 because I made sure to check and update the module before submitting the issue as part of basic troubleshooting. Here it is again from VS Code this morning.

pip show azure-ai-documentintelligence           

azure-ai-documentintelligence           1.0.0

# more modules:

azure-ai-formrecognizer                 3.3.3
azure-ai-inference                      1.0.0b7
azure-common                            1.1.28
azure-core                              1.32.0
azure-cosmos                            4.9.0
azure-identity                          1.19.0

I tried both strings and the Enum there. Many examples have either, but I continue to encounter this bug. Right now I'm working around it no problem, but I am here to answer more questions if that helps.
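Editorial note: because both azure-ai-documentintelligence and azure-ai-formrecognizer appear in the list above, the installed version alone does not show which package a given client class comes from. A minimal standard-library sketch for checking both things follows; the helper names `installed_version` and `defining_module` are made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def defining_module(obj):
    """Return the name of the module in which a class or function was defined."""
    return obj.__module__
```

Calling `defining_module(DocumentAnalysisClient)` in the failing script should show whether the class lives under a formrecognizer or a documentintelligence module, and `installed_version("azure-ai-formrecognizer")` confirms whether the older package is present alongside the newer one.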

@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Feb 12, 2025
@kristapratico kristapratico added the Client This issue points to a problem in the data-plane of the library. label Feb 12, 2025
@kristapratico
Member

@bb-at-ss can you share a full example that reproduces this (including imports)? To echo what @pvaneck mentioned, the azure-ai-documentintelligence library has no DocumentAnalysisClient (which is shown in your example). This should be DocumentIntelligenceClient.
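Editorial note: a hedged sketch of what the snippet from the report might look like against azure-ai-documentintelligence 1.0.0, with imports included. It assumes that release exposes `DocumentIntelligenceClient`, `AnalyzeDocumentRequest`, and the `DocumentContentFormat` enum, and that `begin_analyze_document` takes the model ID and request body positionally; check these against the migration guide before relying on them. The function name `analyze_pdf_as_markdown` is hypothetical, and the environment variable names are the ones from the original report.

```python
import os


def analyze_pdf_as_markdown(pdf_path):
    """Analyze a PDF with prebuilt-layout and return its content as markdown."""
    # Imported inside the function so this sketch can be loaded without the SDK installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import (
        AnalyzeDocumentRequest,
        DocumentContentFormat,
    )
    from azure.core.credentials import AzureKeyCredential

    client = DocumentIntelligenceClient(
        endpoint=os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["DOCUMENTINTELLIGENCE_API_KEY"]),
    )
    with open(pdf_path, "rb") as f:
        file_bytes = f.read()

    # Assumed 1.0.0 call shape: model ID and body positional, format as a keyword.
    poller = client.begin_analyze_document(
        "prebuilt-layout",
        AnalyzeDocumentRequest(bytes_source=file_bytes),
        output_content_format=DocumentContentFormat.MARKDOWN,
    )
    return poller.result().content
```

The differences from the reported snippet are the client class and its import path, plus the positional arguments; the `output_content_format` keyword itself is unchanged.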

@kristapratico kristapratico added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Feb 12, 2025

Hi @bb-at-ss. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@github-actions github-actions bot removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Feb 12, 2025
@bb-at-ss
Author

This is probably on my end then. I will have to look into this, but this can probably be closed.

@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Feb 12, 2025
@kristapratico kristapratico added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Feb 12, 2025
@github-actions github-actions bot removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Feb 12, 2025

Hi @bb-at-ss. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@kristapratico
Member

@bb-at-ss let us know if you still run into errors, happy to help.
