Draft: ✨ add a first working draft of auto_aggregate selection #33
base: master
Hi @Xasin, very good addition! So happy to see it! Would you mind adding an example of this in action? Adding some instructions to the README would also help with testing, review, and future users.
On it!
Preliminary docs entry added, along with a link from the README to the docs file.
utils/aggregate_selector.sql
Outdated
aggregate_choices JSONB,
groupby_clause TEXT, filter_query TEXT,
hypertable_schema TEXT DEFAULT 'public',
time_column TEXT DEFAULT 'time')
On second thought, would it make sense to add a debug flag that does a RAISE NOTICE with the actual query that is run?
That way the user could just grab it, run EXPLAIN ANALYZE on it, and see what is going on. The function ends up hiding that.
Yes, I've noticed that it's rather tricky to figure out what is happening inside a function...
I suppose I could restructure the code a bit to first build the query as a string and then optionally raise that string as a notice if a debug flag is set. It's not too much of a change and could definitely help...
Thanks!
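A minimal sketch of the pattern discussed here: build the query as a string, optionally print it, then execute it. The function name `demo_debuggable_count` and its parameters are hypothetical and only illustrate the idea; this is not the code from the PR.

```sql
-- Hypothetical helper: counts rows in a table via dynamic SQL and can
-- print the generated statement when the debug flag is set.
CREATE OR REPLACE FUNCTION demo_debuggable_count(
    target_table TEXT,
    debug        BOOLEAN DEFAULT FALSE)
RETURNS BIGINT
LANGUAGE plpgsql AS $$
DECLARE
    query_text TEXT;
    row_count  BIGINT;
BEGIN
    -- Build the statement first so it can be inspected before it runs.
    -- %I quotes the table name as an identifier.
    query_text := format('SELECT count(*) FROM %I', target_table);

    IF debug THEN
        -- The printed text can be copied and run with EXPLAIN ANALYZE.
        RAISE NOTICE 'Generated query: %', query_text;
    END IF;

    EXECUTE query_text INTO row_count;
    RETURN row_count;
END
$$;

-- Usage, assuming a table named ticks exists:
SELECT demo_debuggable_count('ticks', debug => TRUE);
```

Keeping the flag off by default means existing callers see no extra output.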
If you have a moment, try the new pushed change.
I hope it works right
utils/auto_downsample.sql
Outdated
RAISE NOTICE 'Generated query output:'
RAISE NOTICE query_construct
It seems there's a bug when trying to load these lines:
jonatasdp@MacBook-Pro-3 ~/c/t/t/utils (Xasin/master)> psql playground -f auto_downsample.sql
Expanded display is used automatically.
Border style is 2.
Line style is unicode.
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
psql:auto_downsample.sql:215: ERROR: syntax error at or near "RAISE"
LINE 47: RAISE NOTICE query_construct
^
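For reference, PL/pgSQL's `RAISE` expects a string literal as its format, with `%` placeholders for any values, and each statement must end in a semicolon. A corrected form of the two lines above might look like the following sketch; the `DO` block and the sample value of `query_construct` are only there to make it runnable on its own.

```sql
DO $$
DECLARE
    -- Stand-in value; in the function this holds the generated query.
    query_construct TEXT := 'SELECT 1';
BEGIN
    -- One statement: literal format string, % replaced by the variable.
    RAISE NOTICE 'Generated query output: %', query_construct;
END
$$;
```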
I was trying to test it with the following hypertable:
\d ticks
Table "toolkit_experimental.ticks"
┌────────┬──────────────────────────┬───────────┬──────────┬─────────┐
│ Column │           Type           │ Collation │ Nullable │ Default │
├────────┼──────────────────────────┼───────────┼──────────┼─────────┤
│ time   │ timestamp with time zone │           │ not null │         │
│ symbol │ text                     │           │          │         │
│ price  │ numeric                  │           │          │         │
│ volume │ double precision         │           │          │         │
└────────┴──────────────────────────┴───────────┴──────────┴─────────┘
Indexes:
"ticks_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON ticks FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Number of child tables: 1 (Use \d+ to list them.)
Now, creating the auto downsample:
SELECT *
FROM auto_downsample(
'ticks',
INTERVAL '10m',
$aggregate_options$
[
{"with_columns": ["price"], "aggregate":"avg(price) AS value"}
]
$aggregate_options$,
'symbol',
$where_clause$
WHERE time BETWEEN NOW()-INTERVAL'30d' AND NOW()-INTERVAL'10d'
$where_clause$)
AS (time TIMESTAMP, symbol text, value DOUBLE PRECISION);
I got the following error:
Using parameter set <NULL>
ERROR: No aggregator given!
HINT: Supply a "aggregate" field in the JSON aggregate object
CONTEXT: PL/pgSQL function auto_downsample(text,interval,jsonb,text,text,text,text,boolean) line 21 at RAISE
Why doesn't this work with a single aggregate? Maybe I'm missing something, or my context is not suitable for this 🤔
Co-authored-by: Jônatas Davi Paganini <[email protected]> Signed-off-by: Xasin <[email protected]>
@jonatas I think the problem is that you are using a schema other than "public". The schema name can be passed in as a parameter; I am assuming "public" is the default schema, but perhaps there is a better way, e.g. using the currently "active" schema, if such a thing exists? What caught me off guard is that it raises the wrong error. I thought I had a guard in place that specifically detects when it can't find a fitting table and gives an appropriate error message, but it seems that it didn't work.
It seems to work fine on a hypertable with no continuous aggregates for me.
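On the question of an "active" schema: PostgreSQL exposes the first schema in the effective search path through `current_schema()`. The sketch below shows how a parameter could default to it instead of a hard-coded 'public'. The function `demo_resolve_schema` is hypothetical and is not part of this PR.

```sql
-- The schema that unqualified names resolve to first.
SELECT current_schema();

-- Hypothetical function whose schema parameter defaults to the
-- caller's current schema rather than a fixed 'public'.
CREATE OR REPLACE FUNCTION demo_resolve_schema(
    hypertable_schema TEXT DEFAULT current_schema())
RETURNS TEXT
LANGUAGE sql AS $$
    SELECT hypertable_schema;
$$;

-- Returns the current schema when no argument is given ...
SELECT demo_resolve_schema();

-- ... and the explicit value when one is passed.
SELECT demo_resolve_schema('toolkit_experimental');
```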
Thanks for the fix @Xasin. Just trying it again:
CREATE TABLE ticks
( time TIMESTAMP NOT NULL,
symbol varchar,
price decimal,
volume int);
SELECT create_hypertable('ticks', 'time');
And then:
SELECT *
FROM auto_downsample(
'ticks',
INTERVAL '10m',
$aggregate_options$
[
{"with_columns": ["price"], "aggregate":"avg(price) AS value"}
]
$aggregate_options$,
'symbol',
$where_clause$
WHERE time BETWEEN NOW()-INTERVAL'30d' AND NOW()-INTERVAL'10d'
$where_clause$)
AS (time TIMESTAMP, symbol text, value DOUBLE PRECISION);
Now, after reloading, I have the following error:
@jonatas very strange, I am getting an error as well, but it's quite different. When I run your commands 1:1 in my psql prompt (on a PostgreSQL 13.11 server), the error message is the following:
But after fixing those by replacing the |
I'm on PostgreSQL 14.7. Maybe we could include a complete, runnable example in the examples, so people can try to replicate it and then adapt it to their needs. Can you share the changes you made, just to make sure I'm following all the replacements in the
@jonatas the edited "AS" clause has to have the exact same types as the return types of the query, so it looks like: I also confirmed that I have no other typedef of the function that might be hiding the issue, so I'm sure I only have the I'll update a fully working example in a moment. |
Thanks for all your help here! very good to have this helper around 🚀 🙇
Tested manually ✅
I'm really happy to help! I don't think there's anything else from me left to change, so unless there's something else that's needed I think this can be merged in.
Amazing! Thanks again! Adding Chris to have a second 👀 and if @d-sooter has any feedback let us know before we merge it.
So I played around a bit with the function and I love it, and I will be using it :) One issue I had is that I join my hypertable/cagg on a metadata table, and since I don't know which table will be selected, I can't really perform my join. For my situation I just added a fixed alias to the table so that it can be referenced in a join if needed, but I'm not sure whether it would make more sense to allow an alias to be passed in for the table or to have a fixed alias.
@nimbit-software hm... That is an interesting problem. Would it help if I aliased the selected table to a common name inside the internal query, so that it's consistently referenceable for JOINs? For now, can you try something for me? EDIT: Maybe that's what you already did.
So basically this is what I did: I set a static alias "timeseries" and then I can perform my join. I can imagine cases where people would want to pass in their own alias, but then we have another input parameter and it starts getting a bit full.
We could check the table name and, if there is a space in it, split it up and set the alias.
@d-sooter that doesn't sound too bad. It doesn't actually matter how many parameters the function has. PL/pgSQL functions support named parameters, so individual parameters can be set by name in the call. I don't feel comfortable with the RegExp solution; it can get messy and is harder to understand than an explicit parameter like that.
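To illustrate the point about named parameters: PostgreSQL lets a caller pass arguments by name with `=>`, so optional parameters with defaults can be skipped or set individually. The function `demo_named_args` and its parameters (including `table_alias`) are hypothetical and only demonstrate the calling convention.

```sql
-- Hypothetical function with one required and two optional parameters.
CREATE OR REPLACE FUNCTION demo_named_args(
    base_table  TEXT,
    table_alias TEXT    DEFAULT 'timeseries',
    debug       BOOLEAN DEFAULT FALSE)
RETURNS TEXT
LANGUAGE sql AS $$
    SELECT format('FROM %I AS %I (debug=%s)', base_table, table_alias, debug);
$$;

-- All defaults.
SELECT demo_named_args('ticks');

-- Skip table_alias and set only debug, by name.
SELECT demo_named_args('ticks', debug => TRUE);

-- Set only the alias; positional arguments must come before named ones.
SELECT demo_named_args('ticks', table_alias => 'ts');
```

With this calling style, adding one more optional parameter does not affect existing calls.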
I had one last idea, which I have already implemented in my system and which works well: gapfill. I added a parameter to the function
and then just updated the query
Since the WHERE clause is passed in, it is easy to use the interpolation or locf functions as well. I tested it and it works like a charm.
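For readers unfamiliar with gapfilling, here is a standalone sketch of what such a query can look like in plain TimescaleDB, using the `ticks` table from earlier in this thread. It is not the change described above (that code is not shown here); it only assumes TimescaleDB's `time_bucket_gapfill`, `locf`, and `interpolate` functions. The cast to `double precision` is there because `price` is `numeric` in that table.

```sql
SELECT
    -- Emits one row per 10-minute bucket, including empty ones.
    -- The time range is inferred from the WHERE clause below.
    time_bucket_gapfill(INTERVAL '10 minutes', time) AS bucket,
    symbol,
    avg(price::double precision)              AS value,         -- NULL in empty buckets
    locf(avg(price::double precision))        AS value_locf,    -- last observation carried forward
    interpolate(avg(price::double precision)) AS value_interp   -- linear interpolation
FROM ticks
WHERE time >= NOW() - INTERVAL '30 days'
  AND time <  NOW() - INTERVAL '10 days'
GROUP BY bucket, symbol
ORDER BY bucket, symbol;
```

Gapfilling needs a bounded time range, which is why having the WHERE clause available matters.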
This PR adds one new file that creates functions for automatic downsampling of data from TimescaleDB hypertables and continuous aggregates.
Functionality of the downsampling method is documented in the utils/aggregate_selector.sql file.
There are no dependencies on other systems, and no internal state of TimescaleDB is modified, so there should be little conflict with pre-existing systems.