feat: add nvidia ingest component #6333

Open
wants to merge 18 commits into
base: main
from


jordanrfrazier
Collaborator

@jordanrfrazier jordanrfrazier commented Feb 13, 2025

Adds the nv-ingest component. Adds nv-ingest-client as an optional dependency, as it introduces several large packages:

(langflow) ~/Documents/langflow/langflow (nvidia-components-ingest ✗) uv sync
Resolved 556 packages in 189ms
   Built langflow @ file:///Users/jordan.frazier/Documents/Langflow/langflow
Prepared 1 package in 306ms
Uninstalled 1 package in 1ms
Installed 17 packages in 85ms
 + argon2-cffi==23.1.0
 + argon2-cffi-bindings==21.2.0
 + azure-core==1.32.0
 + azure-storage-blob==12.24.1
 + dirtyjson==1.0.8
 + isodate==0.7.2
 ~ langflow==1.1.5 (from file:///Users/jordan.frazier/Documents/Langflow/langflow)
 + llama-index-core==0.10.68.post1
 + llama-index-embeddings-nvidia==0.1.5
 + milvus-model==0.2.12
 + minio==7.2.15
 + nv-ingest-client==2025.2.7.dev0
 + python-pptx==0.6.23
 + safetensors==0.5.2
 + scipy==1.15.1
 + transformers==4.46.3
 + xlsxwriter==3.2.2
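
For illustration only (not code from this PR): when a package is declared as an optional dependency, the component that needs it usually guards the import so that users without the extra get an actionable error instead of a bare traceback. The sketch below assumes a hypothetical helper `require_optional` and a hypothetical extra name `nv-ingest`; the import name `nv_ingest_client` is likewise an assumption about how the `nv-ingest-client` distribution is imported.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional module, or raise an ImportError that names the extra.

    `extra` is the (hypothetical) optional-dependency group that provides it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        msg = (
            f"'{module_name}' is not installed. "
            f"Install the optional extra, e.g. `uv sync --extra {extra}`."
        )
        raise ImportError(msg) from exc


# Intended use inside the component, deferred until the component actually runs
# (assumed names, see the note above):
#     nv_ingest_client = require_optional("nv_ingest_client", "nv-ingest")
```

Deferring the import to call time keeps the base install free of the heavy packages listed above while still letting the component appear in the UI.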

@jordanrfrazier
Collaborator Author

cc. @jeffreyscarpenter

@jordanrfrazier
Collaborator Author

jordanrfrazier commented Feb 13, 2025

Open Items:

  • Investigate why adding manual deps is necessary
    • Requirements were not defined in these earlier versions of nv-ingest
  • UI/UX
  • Testing

@jordanrfrazier jordanrfrazier changed the title from "(DO NOT MERGE) feat: add nvidia ingest component" to "feat: add nvidia ingest component" Feb 13, 2025
@jordanrfrazier jordanrfrazier marked this pull request as ready for review February 13, 2025 21:59
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 13, 2025
@jordanrfrazier jordanrfrazier requested review from ogabrielluiz and edwinjosechittilappilly and removed request for ogabrielluiz February 13, 2025 21:59
@dosubot dosubot bot added the enhancement New feature or request label Feb 13, 2025
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 13, 2025
Collaborator

@edwinjosechittilappilly edwinjosechittilappilly left a comment

Noticed: transformers is being installed as part of the dependencies.

@ogabrielluiz Looking forward to your views on it.

]

outputs = [
Output(display_name="Data", name="data", method="load_file"),
Collaborator

Since the component returns a list[Data], I would suggest adding two outputs: one for Data and the other for DataFrame.

ref. URL.py
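
For illustration only (not code from this PR): the two-output suggestion could look roughly like the sketch below. `Data`, `DataFrame`, and `IngestComponent` here are simplified, hypothetical stand-ins so the example runs on its own; they are not langflow's real classes, and the `as_dataframe` method name is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Data:
    """Simplified stand-in for a langflow Data record."""
    data: dict[str, Any] = field(default_factory=dict)


@dataclass
class DataFrame:
    """Simplified stand-in for a tabular output: one dict per row."""
    rows: list[dict[str, Any]]

    @property
    def columns(self) -> list[str]:
        # Union of keys across all rows, in first-seen order.
        seen: dict[str, None] = {}
        for row in self.rows:
            for key in row:
                seen.setdefault(key, None)
        return list(seen)


class IngestComponent:
    """Hypothetical component exposing one result through two outputs."""

    # Mirrors the shape of the `outputs` list in the diff, as plain dicts.
    outputs = [
        {"display_name": "Data", "name": "data", "method": "load_file"},
        {"display_name": "DataFrame", "name": "dataframe", "method": "as_dataframe"},
    ]

    def load_file(self) -> list[Data]:
        # Placeholder for the real extraction call.
        return [
            Data({"text": "page one", "type": "text"}),
            Data({"text": "page two", "type": "text"}),
        ]

    def as_dataframe(self) -> DataFrame:
        # Reuse load_file so both outputs always describe the same records.
        return DataFrame([item.data for item in self.load_file()])
```

Deriving the second output from the first keeps the two from drifting apart, at the cost of one more handle on the component.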

Collaborator Author

@jordanrfrazier jordanrfrazier Feb 14, 2025

Do we think that introducing this secondary DataFrame output in the ingest components clutters the UI?

Collaborator

The idea is that, in most cases, anything that outputs list[Data] should also offer a DataFrame output.

Contributor

@jordanrfrazier Yes, but it is a migration strategy. Soon we will have more components with DataFrame input, and we can start removing the list[Data] outputs.


@simonfraserduncan simonfraserduncan Feb 14, 2025

I agree with @jordanrfrazier

What's the most common use case here? Just because we have the capability doesn't mean we have to include it. If most users primarily work with Data, adding a secondary DataFrame output clutters the UI and introduces [more] decision fatigue.

Is there a strong use case where exposing both by default improves usability?

Collaborator Author

@ogabrielluiz Isn't this more of an internal implementation detail, in that case? i.e. if the langflow.Dataframe can be handled in the same way as a list[Data] (and by extension, Data, I'm assuming), shouldn't we be able to just do a single swap of output/input types with a conversion method in the langflow.Dataframe class to Data if still required by the component?

Side note: Is there a strong reason for the name change to Dataframe? Does that unconsciously couple us to pandas too much, in the scenario we decide to change the backing data type in the future?
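
For illustration only (not code from this PR): the "single swap with a conversion method" idea could be sketched as a wrapper that converts to and from a list of records. `Data` and `Table` below are hypothetical stand-ins, not langflow classes; the neutral name `Table` is used only to show that callers need not know what backs it.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Data:
    """Simplified stand-in for a langflow Data record."""
    data: dict[str, Any]


class Table:
    """Hypothetical tabular wrapper whose backing store is an internal detail."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        # Copy rows so later mutation of the inputs does not leak in.
        self._rows = [dict(row) for row in rows]

    @classmethod
    def from_data_list(cls, items: list[Data]) -> "Table":
        return cls([item.data for item in items])

    def to_data_list(self) -> list[Data]:
        # Conversion back for components that still expect list[Data].
        return [Data(dict(row)) for row in self._rows]
```

With a round trip like this, a component could change its declared output type once, and consumers that still need list[Data] would call `to_data_list()` at their boundary.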

Contributor

Could we consider inclusion of DataFrame as a future improvement?

@ogabrielluiz
Contributor

> Noticed: transformers is being installed as part of the dependencies.
>
> @ogabrielluiz Looking forward to your views on it.

Yeah, that's not good. It didn't install PyTorch, though, which is the heaviest one. Maybe it isn't that bad, but I'd try to avoid it if possible. Checking the extras could help.

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 14, 2025
@jordanrfrazier
Collaborator Author

[Two screenshots attached: "Screenshot 2025-02-13 at 6 08 09 PM" and "Screenshot 2025-02-13 at 6 04 16 PM"]

@jordanrfrazier
Collaborator Author

> > Noticed: transformers is being installed as part of the dependencies.
> > @ogabrielluiz Looking forward to your views on it.
>
> Yeah, that's not good. It didn't install PyTorch, though, which is the heaviest one. Maybe it isn't that bad, but I'd try to avoid it if possible. Checking the extras could help.

I moved it to an optional dependency for now.

@jordanrfrazier jordanrfrazier force-pushed the nvidia-components-ingest branch from d6a54ca to 7065d5b Compare February 14, 2025 02:17
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 14, 2025
@edwinjosechittilappilly
Collaborator

@jordanrfrazier, can you resolve the conflicts?
After that, I can start testing.

@jordanrfrazier jordanrfrazier force-pushed the nvidia-components-ingest branch from ee63f60 to 1179f2d Compare February 18, 2025 16:24
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 18, 2025

codspeed-hq bot commented Feb 18, 2025

CodSpeed Performance Report

Merging #6333 will degrade performances by 10.32%

Comparing nvidia-components-ingest (26649e8) with main (25ac555)

Summary

❌ 1 regressions
✅ 13 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| test_build_flow_invalid_job_id | 8.2 ms | 9.1 ms | -10.32% |

@jordanrfrazier jordanrfrazier force-pushed the nvidia-components-ingest branch from 26649e8 to 5560b0c Compare February 19, 2025 05:31
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 19, 2025
Labels
enhancement New feature or request size:L This PR changes 100-499 lines, ignoring generated files.
5 participants