Update extending-operators.md #15832

Open
wants to merge 15 commits into
base: main

Adez017
Contributor

@Adez017 Adez017 commented Apr 23, 2025

Which issue does this PR close?

Rationale for this change

Updated the extending-operators.md file.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 23, 2025
@Adez017 Adez017 marked this pull request as draft April 23, 2025 17:04
@Adez017
Contributor Author

Adez017 commented Apr 23, 2025

Hi @xudong963, I want to ask: do we have to rewrite this part of the code, https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24, afterwards, or just migrate the whole code into extending-operators?
Also, please review the changes I made and share your feedback.

@xudong963
Member

I think after migrating them, we don't need to retain the code

@Adez017
Contributor Author

Adez017 commented Apr 24, 2025

Does that mean I need to migrate all the code from user_defined_plan.rs into extending-operators as well?

@xudong963
Member

Yes, except tests.

@Adez017
Contributor Author

Adez017 commented Apr 24, 2025

that means above it :

@Adez017 Adez017 marked this pull request as ready for review April 25, 2025 04:55
@Adez017
Contributor Author

Adez017 commented Apr 25, 2025

Hi @xudong963, I think it is ready. Please take a look.

@xudong963
Member

xudong963 commented Apr 25, 2025

You can refer to the doc: https://datafusion.apache.org/library-user-guide/custom-table-providers.html.

It should contain the real code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L458-L916 to describe the process of defining an extending operator, not only an SQL example.

@Adez017
Contributor Author

Adez017 commented Apr 25, 2025

Hey @xudong963, check it out now.
Also, could you help with the failing check in the workflow?

@xudong963
Member

You can rebase with main

@xudong963
Member

Would anyone happen to know how to preview the HTML format for the PR changes?

@Adez017
Contributor Author

Adez017 commented Apr 27, 2025

Does this solve the issue?

@xudong963
Member

You can open the failed CI and see what's wrong:

error[E0599]: no method named `unwrap` found for struct `CoalescePartitionsExec` in the current scope
   --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:210:14
    |
208 |           let coalesce = CoalescePartitionsExec::new(child)
    |  ________________________-
209 | |             .with_fetch(plan.fetch())
210 | |             .unwrap();
    | |             -^^^^^^ method not found in `CoalescePartitionsExec`
    | |_____________|
    |

warning: unused import: `ExecutionPlan`
  --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:37:32
   |
37 | use datafusion_physical_plan::{ExecutionPlan, ExecutionPlanProperties};
   |                                ^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

The error is fixed in main, so rebasing your branch with main will fix the error

@Adez017
Contributor Author

Adez017 commented May 1, 2025

Thanks for your help, @xudong963, but I think it didn't fix the failing workflow.

@xudong963
Member

Sorry, I am on vacation. You can try to fix it using the error hints in CI.

@Adez017
Contributor Author

Adez017 commented May 2, 2025

I think we need @alamb's help now. Could you help?

@Adez017
Contributor Author

Adez017 commented May 3, 2025

@alamb, please take a look.

@Adez017
Contributor Author

Adez017 commented May 3, 2025

Could anyone please help here?

@xudong963
Member

Could you please refer to the error hints in CI? For example:

error[E0433]: failed to resolve: use of undeclared type `Statistics`
  --> datafusion/core/src/lib.rs:1365:12
   |
98 |         Ok(Statistics::new_unknown(&self.schema()))
   |            ^^^^^^^^^^ use of undeclared type `Statistics`
   |
help: consider importing one of these items
   |
2  + use datafusion::physical_plan::Statistics;
   |
2  + use datafusion_common::Statistics;
   |
2  + use datafusion_physical_plan::Statistics;
   |
2  + use parquet::file::statistics::Statistics;

It's saying you need to import Statistics for the example in the doc.

@Adez017
Contributor Author

Adez017 commented May 4, 2025

Thanks, mate, but I think it shows for the lib.rs file, and we didn't make any changes there. Does this mean that we need to change over there? I am pretty confused. And one more thing: When we are using the code as an example, do we need to check the imports?

@xudong963
Member

Thanks, mate, but I think it shows for the lib.rs file, and we didn't make any changes there. Does this mean that we need to change over there?

If you look at lib.rs, there is a macro that points to the doc file:

#[cfg(doctest)]
doc_comment::doctest!(
    "../../../docs/source/library-user-guide/extending-operators.md",
    library_user_guide_extending_operators
);

And one more thing: When we are using the code as an example, do we need to check the imports?

Yes, there is an example: https://github.com/apache/datafusion/blob/main/datafusion/catalog/src/table.rs#L203-L214

You can add `# ` at the beginning of the import lines to hide them in the rendered doc.
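
For illustration, here is a minimal sketch of that `# ` convention as rustdoc treats it. It assumes a standalone function that uses only the standard library; `word_counts` is a hypothetical name and is not part of DataFusion. The hidden line is still compiled and run by `cargo test --doc`, so the example stays self-contained even though the reader does not see the import.

```rust
use std::collections::HashMap;

/// Counts how often each whitespace-separated word occurs in `text`.
/// (`word_counts` is a hypothetical helper used only for this sketch.)
///
/// In the example below, the line starting with `# ` is part of the
/// doctest (it is compiled and executed), but rustdoc hides it in the
/// rendered documentation.
///
/// ```
/// # use std::collections::HashMap;
/// let mut counts: HashMap<&str, usize> = HashMap::new();
/// for word in "a b a".split_whitespace() {
///     *counts.entry(word).or_insert(0) += 1;
/// }
/// assert_eq!(counts["a"], 2);
/// ```
pub fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Insert 0 on first sight of a word, then increment its count.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

Without the hidden `use` line, the doctest would fail to compile with an undeclared-type error of the same kind as the `Statistics` one shown earlier in this thread.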

@github-actions github-actions bot added the core Core DataFusion crate label May 5, 2025
@github-actions github-actions bot removed the core Core DataFusion crate label May 5, 2025
@Adez017
Contributor Author

Adez017 commented May 5, 2025

Hey @xudong963, I think I am missing something: I added the imports, but the check keeps failing. Could you please help out?

@alamb
Contributor

alamb commented May 5, 2025

Here is a writeup that can probably help:

// The following commands test the examples from the user guide as part of
// `cargo test --doc`
//
// # Adding new tests:
//
// Simply add code like this to your .md file and ensure your md file is
// included in the lists below.
//
// ```rust
// <code here will be tested>
// ```
//
// Note that sometimes it helps to author the doctest as a standalone program
// first, and then copy it into the user guide.
//
// # Debugging Test Failures
//
// Unfortunately, the line numbers reported by doctest do not correspond to the
// line numbers of in the .md files. Thus, if a doctest fails, use the name of
// the test to find the relevant file in the list below, and then find the
// example in that file to fix.
//
// For example, if `user_guide_expressions(line 123)` fails,
// go to `docs/source/user-guide/expressions.md` to find the relevant problem.
//

In this case I think you will need to make each example self-contained (so add `use` statements, etc.). You can test them locally with the command `cargo test --doc -p datafusion`.

@Adez017
Contributor Author

Adez017 commented May 6, 2025

Hi @alamb, thanks for the help. I tried doing so and eventually understood the problem, but couldn't find a complete solution.
I was wondering whether the doc example depends on the other examples. Also, could you help finish it up?

Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move code in user_defined_plan.rs to the extending-operators doc
3 participants