feat: Adjust how aliases are formatted #4750
This proposes to adjust how aliases are formatted. Currently we include a space around `=`, which means we get the quite reasonable:

```elm
sum_gross_cost = sum gross_cost
```

...but also the quite confusing:

```elm
join side:left manager = employees e.reports_to == manager.employee_id
```

There, `manager` & `employees` appear to be the two _least_ tightly bound items, but in fact those are the ones that are bound.

So this proposes to change these to:

- Remove spaces around `=` in aliases
- Add parentheses if the rvalue contains multiple items

So now we get:

```elm
join side:left manager=employees e.reports_to == manager.employee_id
```

...while the standard case here becomes arguably a bit worse, but still quite reasonable:

```elm
sum_gross_cost = (sum gross_cost)
```

What do folks think?
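As a rough illustration of the two proposed rules, here is a minimal Python sketch. It is not the PRQL formatter: `format_alias` is a hypothetical helper, and it assumes the right-hand side has already been split into a list of items. It applies both bullet points literally, so the multi-item case comes out without spaces around `=`.

```python
def format_alias(name: str, rvalue_items: list[str]) -> str:
    """Format an alias following the two proposed rules (hypothetical helper).

    - no spaces around `=`
    - parentheses around the right-hand side if it has multiple items
    """
    expr = " ".join(rvalue_items)
    if len(rvalue_items) > 1:
        # Multiple items: wrap them so the binding is visually unambiguous.
        expr = f"({expr})"
    return f"{name}={expr}"


# A relation alias with a single item stays bare.
relation_alias = format_alias("manager", ["employees"])
# A column alias whose right-hand side is a function call gets parentheses.
column_alias = format_alias("sum_gross_cost", ["sum", "gross_cost"])

print(relation_alias)
print(column_alias)
```

The single-item case is left unparenthesized so that the common `manager=employees` form stays compact.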
Is this just for PRQL code samples in the tests and docs? Presumably both versions will still compile fine?
I was quite surprised that the following actually compiles, because I somehow always thought that the parentheses around the join conditions were mandatory:
join side:left manager=employees e.reports_to == manager.employee_id
I had never really thought about this before, and now that I have, it's opened a bit of a Pandora's box for me and I have quite a few comments/questions. I'll put those in separate comments to make it easier to downvote/upvote them separately.
First thought
There's a difference between column aliases and relation aliases; should they be treated differently?
column aliases:
derive sum_gross_cost = sum gross_cost
relation references:
from worker=employees
join side:left manager=employees worker.reports_to == manager.employee_id
I don't actually have an opinion on this because I think it's deeper than that. See next comment.
Second thought
Column aliases
Relation aliases
from worker=employees
join side:left manager=employees worker.reports_to == manager.employee_id
Coming back to point 1., I believe this is a carryover from SQL and should be removed from PRQL because a) it is not a fundamental part of Relational Algebra, and b) it breaks the local nature of PRQL transforms and introduces global dependencies between different pipeline steps. I've argued this before, but I'm more convinced and resolved on this now.
I think this would easily become clear if we did a simple implementation of PRQL on a backend of lists of tuples (without any SQL in between). My conclusions from my investigations into Monads and Relational Algebra can best be summed up by saying that I believe PRQL should be a DSL for the List monad. By restricting ourselves to SQL backends we miss out on a lot of potential application areas of PRQL and risk perpetuating mistakes from SQL (like the global scope of relation aliases).
I'm not sure how this fits into the resolver rework that's currently going on, but my suspicion is that it would probably greatly simplify the resolver as well. I think the only thing that stands in the way of this is
tl;dr I think relation aliases are a mistake and should be removed from PRQL!
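To make the "lists of tuples" idea concrete, here is a small Python sketch of a left self-join over an in-memory list. It is only an illustration and not how PRQL is implemented: the `Employee` type, the sample rows, and the `left_join` helper are all assumptions made up for this example. The names `worker` and `manager` exist only as parameters of the join condition.

```python
from typing import Callable, NamedTuple, Optional


class Employee(NamedTuple):
    employee_id: int
    name: str
    reports_to: Optional[int]


# Hypothetical sample data, invented for this sketch.
employees = [
    Employee(1, "Ada", None),
    Employee(2, "Bob", 1),
    Employee(3, "Cy", 1),
]


def left_join(left: list, right: list, on: Callable) -> list:
    """Left join two lists; unmatched left rows are paired with None."""
    result = []
    for left_row in left:
        matches = [right_row for right_row in right if on(left_row, right_row)]
        if matches:
            result.extend((left_row, right_row) for right_row in matches)
        else:
            result.append((left_row, None))
    return result


# `worker` and `manager` are local names bound by the lambda only.
pairs = left_join(
    employees,
    employees,
    lambda worker, manager: worker.reports_to == manager.employee_id,
)
rows = [(w.name, m.name if m else None) for w, m in pairs]
print(rows)
```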
tl;dr @max-sixty I'm happy with how the codegen examples in this PR look and am in favour of this change.
I too feel that the disadvantages of complicating the syntax outweigh the advantages of being able to set aliases.
The trouble about removing relation aliases in
Extending your previous example slightly:
from worker=employees
join side:left manager=employees worker.reports_to == manager.employee_id
select { manager.employee_id, manager.name, worker_name=worker.name }
How else would you phrase the
@kgutwin You are quite right, but I have a solution for that, at least semantically (the syntax would still need refinement). Unfortunately I don't have time to write it up right now, but I'll try to do it as soon as I can. (There might actually already be a write-up somewhere on here.) Should I put it in an "issue" or a "discussion"?
That is a big statement to make! I do feel the same way, and the resolver rework does in fact take a few steps in this direction. I've opened an issue about relational aliases: #4751
Let's move the discussion about relational aliases to the new issue and talk about the new code style here. Am I correct that this change applies to "official PRQL code style" only, and not language itself? So no changes to the parser? Regarding the new code style:
We do use For example:
Correct!
This is feasible. It's a bit more complicated but not intractable. So to extend your example:
from x=employees
derive rounded=(round 2 gross_cost),
aggregate {
  sum_gross_cost = sum gross_cost,
  avg_gross_cost = average gross_cost,
  gross_cost_again = gross_cost,
}
derive y=sum_gross_cost
What do folks think?
That looks good to me.
(FYI this became a bit harder than I thought, so pausing on this in favor of trying the new |