Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Authorisation for dataprep api's using tokenvalidator middleware #868

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jaswanth8888
Copy link

Description

  • Added token validator middleware in core utils which validates jwt token using provided identity provider
  • Added keycloak class which validates, verifies,decodes a given JWT token with the realm url provided
  • made changes dataprep-redis to use token validator
  • Ingest,deleting a documnet would need admin role, get documents can be done both admin and users
  • if JWT_AUTH(default false) is set to false Verification wouldn't happen and service works as is

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

pyjwt

Tests

Describe the tests that you ran to verify your changes.

Copy link

codecov bot commented Nov 10, 2024

Codecov Report

Attention: Patch coverage is 24.32432% with 56 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
comps/cores/mega/keycloak.py 30.00% 28 Missing ⚠️
comps/cores/mega/utils.py 17.64% 28 Missing ⚠️
Files with missing lines Coverage Δ
comps/cores/mega/keycloak.py 30.00% <30.00%> (ø)
comps/cores/mega/utils.py 21.64% <17.64%> (-0.86%) ⬇️

@chensuyue chensuyue added this to the v1.1 milestone Nov 13, 2024
Copy link
Collaborator

@mkbhanda mkbhanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we end up in a denial of service type attack with too many documents uploaded and running out of space. May need to address this space aspect to trigger a warning to admin or at the very least add a comment to that effect here in case someone wants to adopt this sample, then the things they need to watch out for. On document delete, do we remove all vectors associated with the given document?

Copy link
Collaborator

@Ruoyu-y Ruoyu-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the PR, i guess the purpose is to allow only 'ADMIN_ROLE' user to upload and delete files and allow both 'USER_ROLE' and 'ADMIN_ROLE' user to get the existing files. Is that correct?
And from my understanding, authentication and authorization is not part of the logic of data preparation(comments are welcome), so it might not be that proper to make this part inside the component itself. Otherwise, any other component needs authentication and authorization have to add this part in there code. As now most of the pipelines involve a gateway, it is better to trigger authentication and authorization there. Comments are welcome.

@amberjain1
Copy link
Collaborator

amberjain1 commented Nov 13, 2024

Could we end up in a denial of service type attack with too many documents uploaded and running out of space. May need to address this space aspect to trigger a warning to admin or at the very least add a comment to that effect here in case someone wants to adopt this sample, then the things they need to watch out for. On document delete, do we remove all vectors associated with the given document?

@mkbhanda,
This is a valid comment for data-prep service itself. Though this PR is not intending to touch this part or addressing this issue.
DDoS related issue shall be handled as a separate PR.

@amberjain1 amberjain1 closed this Nov 13, 2024
@amberjain1 amberjain1 reopened this Nov 13, 2024
@mkbhanda
Copy link
Collaborator

From the PR, i guess the purpose is to allow only 'ADMIN_ROLE' user to upload and delete files and allow both 'USER_ROLE' and 'ADMIN_ROLE' user to get the existing files. Is that correct? And from my understanding, authentication and authorization is not part of the logic of data preparation(comments are welcome), so it might not be that proper to make this part inside the component itself. Otherwise, any other component needs authentication and authorization have to add this part in there code. As now most of the pipelines involve a gateway, it is better to trigger authentication and authorization there. Comments are welcome.

@Ruoyu-y I agree with your summarization on which roles can do what. @amberjain1 and @jaswanth8888 would you please comment on her suggestion on where authentication and authorization should be handled -- as in the gateway

@jaswanth8888
Copy link
Author

Could we end up in a denial of service type attack with too many documents uploaded and running out of space. May need to address this space aspect to trigger a warning to admin or at the very least add a comment to that effect here in case someone wants to adopt this sample, then the things they need to watch out for. On document delete, do we remove all vectors associated with the given document?

On document delete we are deleting the vectors even before. so we didn't touch that part.

@jaswanth8888
Copy link
Author

From the PR, i guess the purpose is to allow only 'ADMIN_ROLE' user to upload and delete files and allow both 'USER_ROLE' and 'ADMIN_ROLE' user to get the existing files. Is that correct? And from my understanding, authentication and authorization is not part of the logic of data preparation(comments are welcome), so it might not be that proper to make this part inside the component itself. Otherwise, any other component needs authentication and authorization have to add this part in there code. As now most of the pipelines involve a gateway, it is better to trigger authentication and authorization there. Comments are welcome.

@Ruoyu-y I agree with your summarization on which roles can do what. @amberjain1 and @jaswanth8888 would you please comment on her suggestion on where authentication and authorization should be handled -- as in the gateway

@Ruoyu-y Different Api's would need different Role authorization. so, we imported the authorization middleware and used it in data prep. if any other components to implement this they can import and use it

@Ruoyu-y
Copy link
Collaborator

Ruoyu-y commented Nov 14, 2024

From the PR, i guess the purpose is to allow only 'ADMIN_ROLE' user to upload and delete files and allow both 'USER_ROLE' and 'ADMIN_ROLE' user to get the existing files. Is that correct? And from my understanding, authentication and authorization is not part of the logic of data preparation(comments are welcome), so it might not be that proper to make this part inside the component itself. Otherwise, any other component needs authentication and authorization have to add this part in there code. As now most of the pipelines involve a gateway, it is better to trigger authentication and authorization there. Comments are welcome.

@Ruoyu-y I agree with your summarization on which roles can do what. @amberjain1 and @jaswanth8888 would you please comment on her suggestion on where authentication and authorization should be handled -- as in the gateway

@Ruoyu-y Different Api's would need different Role authorization. so, we imported the authorization middleware and used it in data prep. if any other components to implement this they can import and use it

I understand that different APIs require different role authorization. But what i mean is that this authentication and authorization step shall not be part of the component or for certain API. Otherwise, the piece of code should duplicate for multiple components whatever needs an authentication

From the PR, i guess the purpose is to allow only 'ADMIN_ROLE' user to upload and delete files and allow both 'USER_ROLE' and 'ADMIN_ROLE' user to get the existing files. Is that correct? And from my understanding, authentication and authorization is not part of the logic of data preparation(comments are welcome), so it might not be that proper to make this part inside the component itself. Otherwise, any other component needs authentication and authorization have to add this part in there code. As now most of the pipelines involve a gateway, it is better to trigger authentication and authorization there. Comments are welcome.

@Ruoyu-y I agree with your summarization on which roles can do what. @amberjain1 and @jaswanth8888 would you please comment on her suggestion on where authentication and authorization should be handled -- as in the gateway

@Ruoyu-y Different Api's would need different Role authorization. so, we imported the authorization middleware and used it in data prep. if any other components to implement this they can import and use it

I understand that different API requires different Role authorization. But should the authentication and authorization logic reside within component or the API itself(handled by the API call itself)? Shall the gateway be the component that manage all these stuff?
Like if i would like to allow user with role A to be able to access API B and API C. Instead of setting rules in both API B and API C, is it better to set in a central component(gateway) with a single rule?

@chensuyue
Copy link
Collaborator

The ChatQnA test on gaudi failed https://github.com/opea-project/GenAIComps/actions/runs/11820755586/job/32959963824?pr=868, please check.

@chensuyue
Copy link
Collaborator

We have some cross test when the PR update core part.

@chensuyue chensuyue removed this from the v1.1 milestone Nov 18, 2024
@chensuyue
Copy link
Collaborator

Release the v1.1 milestone after sync with @jaswanth8888.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants