Use Arrow URI FileSystem backed instance to retrieve remote files #7709

Merged
merged 17 commits into from
Jun 14, 2021


jdye64
Contributor

@jdye64 jdye64 commented Mar 24, 2021

Arrow offers an API that lets users provide a URI for target files. This PR uses that API and adds a new arrow_io_source constructor that accepts the URI from the user, creates the appropriate FileSystem instance, and configures it for access to that file.

This closes: #7475
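
To illustrate the idea of going from a URI to a filesystem backend, here is a minimal standalone sketch. It does not call Arrow and is not the code in this PR: in Arrow's C++ API the resolution is handled by `arrow::fs::FileSystemFromUri`. The helper names `split_uri` and `backend_for` are hypothetical, and the scheme-to-class mapping is a simplified assumption that only returns the name of the Arrow filesystem class one would expect for each scheme.

```cpp
#include <string>

// Hypothetical helper (not part of cuDF or Arrow): splits "scheme://rest"
// into its scheme and the remainder. Returns false when there is no "://"
// separator or the scheme is empty.
inline bool split_uri(std::string const& uri, std::string& scheme, std::string& path)
{
  auto const pos = uri.find("://");
  if (pos == std::string::npos || pos == 0) { return false; }
  scheme = uri.substr(0, pos);
  path   = uri.substr(pos + 3);
  return true;
}

// Hypothetical helper: maps a URI scheme to the name of the Arrow filesystem
// class expected to serve it. A real implementation would construct the
// filesystem object instead of returning its name.
inline std::string backend_for(std::string const& uri)
{
  std::string scheme;
  std::string path;
  if (!split_uri(uri, scheme, path)) { return "unsupported"; }
  if (scheme == "s3") { return "S3FileSystem"; }
  if (scheme == "hdfs") { return "HadoopFileSystem"; }
  if (scheme == "file") { return "LocalFileSystem"; }
  return "unsupported";
}
```

With this sketch, `backend_for("s3://bucket/data.parquet")` selects the S3 backend, while an input without a `scheme://` prefix is reported as unsupported rather than guessed.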

@jdye64 jdye64 requested a review from a team as a code owner March 24, 2021 19:30
@jdye64 jdye64 requested review from cwharris and vuule March 24, 2021 19:30
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 24, 2021
@jdye64
Contributor Author

jdye64 commented Mar 24, 2021

Could someone please label this PR as [WIP]? I do not have the permissions to do so.

@kkraus14
Collaborator

> Could someone please label this PR as [WIP]? I do not have the permissions to do so.

Can you convert the PR to a draft using the link on the right-hand side?

@jdye64 jdye64 marked this pull request as draft March 24, 2021 19:33
@ayushdg
Member

ayushdg commented Mar 24, 2021

In terms of configuration options for s3 and hdfs, we currently support the following options: hdfs, s3.
Any ideas on how to expose the config options based on the filesystem class?

@vuule vuule added cuIO cuIO issue non-breaking Non-breaking change feature request New feature or request labels Mar 25, 2021
@codecov

codecov bot commented Mar 25, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@90e29d9). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 5ca415f differs from pull request most recent head 5535475. Consider uploading reports for the commit 5535475 to get more accurate results

@@               Coverage Diff               @@
##             branch-21.08    #7709   +/-   ##
===============================================
  Coverage                ?   82.91%           
===============================================
  Files                   ?      110           
  Lines                   ?    18094           
  Branches                ?        0           
===============================================
  Hits                    ?    15002           
  Misses                  ?     3092           
  Partials                ?        0           


@github-actions github-actions bot added CMake CMake build issue conda Java Affects Java cuDF API. labels Apr 8, 2021
@kkraus14 kkraus14 changed the base branch from branch-0.19 to branch-0.20 April 9, 2021 00:15
@github-actions github-actions bot removed conda Java Affects Java cuDF API. labels Apr 22, 2021
@karthikeyann karthikeyann added the 2 - In Progress Currently a work in progress label May 17, 2021
@vuule
Contributor

vuule commented May 19, 2021

@jdye64 what is the status of this PR? Should it still target 21.06?

@github-actions github-actions bot removed Java Affects Java cuDF API. conda labels Jun 1, 2021
@jdye64
Contributor Author

jdye64 commented Jun 1, 2021

rerun tests

@randerzander
Contributor

rerun tests

Member

@ayushdg ayushdg left a comment


Thanks for this @jdye64! Couple of comments regarding tests.

cpp/tests/io/arrow_io_source_test.cpp (outdated, resolved)
Contributor

@hyperbolic2346 hyperbolic2346 left a comment


Looks good overall.

cpp/include/cudf/io/datasource.hpp (outdated, resolved)
@vuule
Contributor

vuule commented Jun 14, 2021

@gpucibot merge

@jdye64
Contributor Author

jdye64 commented Jun 14, 2021

rerun tests

@jdye64
Contributor Author

jdye64 commented Jun 14, 2021

rerun tests

Labels
2 - In Progress Currently a work in progress CMake CMake build issue cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Development

Successfully merging this pull request may close these issues.

[FEA/Proposal] Use Arrow backed filesystem objects for reading remote files
9 participants