Evaluate removing the "create distributed function" section from the quick start guide #1033

ozgune · 2022-03-11T23:26:09Z

Why are we implementing it? (sales eng)

What are the typical use cases?

Communication goals (e.g. detailed howto vs orientation)

Our Quick Start guide is an opportunity to introduce simple concepts to our users.

https://docs.citusdata.com/en/v10.2/get_started/tutorial_multi_tenant.html

In the multi-tenant quick start guide, we introduce the following concept. I feel that the notion of additional roundtrips, creating a new UDF, and then declaring the use of the UDF as a distributed function goes beyond a quick start.

Could we evaluate removing the following section from our Quick Start Guide?

I'm asking because I haven't used create_distributed_function() in this way before. Although I'm not a power user, I also feel that this goes beyond what's needed to get started on Citus.

"Each statement in a transaction causes roundtrips between the coordinator and workers in multi-node Citus. For multi-tenant workloads, it’s more efficient to run transactions in distributed functions. The efficiency gains become more apparent for larger transactions, but we can use the small transaction above as an example.

First create a function that does the deletions:

CREATE OR REPLACE FUNCTION
  delete_campaign(company_id int, campaign_id int)
RETURNS void LANGUAGE plpgsql AS $fn$
BEGIN
  DELETE FROM campaigns
   WHERE id = $2 AND campaigns.company_id = $1;
  DELETE FROM ads
   WHERE ads.campaign_id = $2 AND ads.company_id = $1;
END;
$fn$;

Next use create_distributed_function to instruct Citus to run the function directly on workers rather than on the coordinator (except on a single-node Citus installation, which runs everything on the coordinator). It will run the function on whatever worker holds the shards for tables ads and campaigns corresponding to the value company_id.

SELECT create_distributed_function(
  'delete_campaign(int, int)', 'company_id',
  colocate_with := 'campaigns'
);

-- you can run the function as usual
SELECT delete_campaign(5, 46);"

Good locations for content in docs structure

How does this work? (devs)

Example sql

Corner cases, gotchas

Are there relevant blog posts or outside documentation about the concept/feature?

Link to relevant commits and regression tests if applicable

onderkalaci · 2022-03-21T17:30:02Z

Related to #1024.

"Distributed functions" is an advanced topic, so it makes sense not to have it on the quick start.

Users typically create a distributed function and expect the function to speed up, by analogy with creating a distributed table. In reality, the schema and functions must be set up properly to benefit from distributed functions, so users end up confused by the concept.

In fact, Marco thinks we could rename create_distributed_function to something more explicit like delegate_procedure_to_nodes or such.
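
As a rough illustration of what "properly set up" means here: as far as I understand it, the function call is only delegated to a worker when the tables it touches are distributed on the same column as the function's distribution argument and are co-located with it. The sketch below is not part of the quick start; it assumes a Citus 10.x cluster where the tutorial's campaigns and ads tables and the delete_campaign function already exist, and it uses the citus_tables view to check co-location.

```sql
-- Assumption: campaigns, ads, and delete_campaign() from the tutorial exist.
-- Distribute the tables the function touches on the tenant column,
-- and keep ads co-located with campaigns.
SELECT create_distributed_table('campaigns', 'company_id');
SELECT create_distributed_table('ads', 'company_id',
                                colocate_with := 'campaigns');

-- Tables that share a colocation_id keep matching shards on the same node.
SELECT table_name, distribution_column, colocation_id
  FROM citus_tables;

-- Only with that layout in place is it worth delegating the function.
SELECT create_distributed_function(
  'delete_campaign(int, int)', 'company_id',
  colocate_with := 'campaigns'
);
```

If the tables were distributed on a different column, or not co-located, the same create_distributed_function call would not give the single-roundtrip behavior the tutorial describes, which is likely the source of the confusion.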

jonels-msft · 2022-03-21T17:47:34Z

> "Distributed functions" is an advanced topic, so it makes sense not to have it on the quick start.

IIRC there was a push to advertise that feature when it was released, but I agree that it's a distraction in an early tutorial.

> In fact, Marco thinks we could rename create_distributed_function to something more explicit like delegate_procedure_to_nodes or such.

Sounds like a good idea. "Distributed function" suggests a false analogy with "distributed table."

ozgune changed the title ~~Evaluate "creating a distributed function" in the multi-tenant quick start guide~~ Evaluate removing the "create distributed function" section from the quick start guide Mar 11, 2022
