Skip to content

Commit

Permalink
Merge branch 'main' into create_user_2pc
Browse files Browse the repository at this point in the history
  • Loading branch information
gurkanindibay authored Feb 26, 2024
2 parents f7ba210 + f424268 commit 3a9c2d4
Show file tree
Hide file tree
Showing 19 changed files with 1,134 additions and 28 deletions.
22 changes: 21 additions & 1 deletion .devcontainer/.vscode/launch.json
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,25 @@
}
],
},
]
{
"name": "Open core file",
"type": "cppdbg",
"request": "launch",
"program": "/home/citus/.pgenv/pgsql/bin/postgres",
"coreDumpPath": "${input:corefile}",
"cwd": "${workspaceFolder}",
"MIMode": "gdb",
}
],
"inputs": [
{
"id": "corefile",
"type": "command",
"command": "extension.commandvariable.file.pickFile",
"args": {
"dialogTitle": "Select core file",
"include": "**/core*",
},
},
],
}
1 change: 1 addition & 0 deletions .devcontainer/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,7 @@ RUN sudo apt update \
lsof \
man \
net-tools \
psmisc \
pspg \
tree \
vim \
Expand Down
7 changes: 6 additions & 1 deletion .devcontainer/devcontainer.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,11 @@
"image": "ghcr.io/citusdata/citus-devcontainer:main",
"runArgs": [
"--cap-add=SYS_PTRACE",
"--ulimit=core=-1",
],
"forwardPorts": [
9700
],
"forwardPorts": [9700],
"customizations": {
"vscode": {
"extensions": [
Expand All @@ -14,6 +17,7 @@
"github.vscode-pull-request-github",
"ms-vscode.cpptools-extension-pack",
"ms-vsliveshare.vsliveshare",
"rioj7.command-variable",
],
"settings": {
"files.exclude": {
Expand All @@ -30,3 +34,4 @@
"updateContentCommand": "./configure",
"postCreateCommand": "make -C .devcontainer/",
}

43 changes: 43 additions & 0 deletions DEVCONTAINER.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# Devcontainer

## Coredumps
When postgres/citus crashes, there is the option to create a coredump. This is useful for debugging the issue. Coredumps are enabled in the devcontainer by default. However, not all environments are configured correctly out of the box. The most important configuration that is not standardized is the `core_pattern`. The configuration can be verified from the container, however, you cannot change this setting from inside the container as the filesystem containing this setting is in read only mode while inside the container.

To verify if corefiles are written run the following command in a terminal. This shows the filename pattern with which the corefile will be written.
```bash
cat /proc/sys/kernel/core_pattern
```

This should be configured with a relative path or simply a simple filename, such as `core`. When your environment shows an absolute path you will need to change this setting. How to change this setting depends highly on the underlying system as the setting needs to be changed on the kernel of the host running the container.

You can put any pattern in `/proc/sys/kernel/core_pattern` as you see fit. eg. You can add the PID to the core pattern in one of two ways;
- You either include `%p` in the core_pattern. This gets substituted with the PID of the crashing process.
- Alternatively you could set `/proc/sys/kernel/core_uses_pid` to `1` in the same way as you set `core_pattern`. This will append the PID to the corefile if `%p` is not explicitly contained in the core_pattern.

When a coredump is written you can use the debug/launch configuration `Open core file` which is preconfigured in the devcontainer. This will open a fileprompt that lists all coredumps that are found in your workspace. When you want to debug coredumps from `citus_dev` that are run in your `/data` directory, you can add the data directory to your workspace. In the command pallet of vscode you can run `>Workspace: Add Folder to Workspace...` and select the `/data` directory. This will allow you to open the coredumps from the `/data` directory in the `Open core file` debug configuration.

### Windows (docker desktop)
When running in docker desktop on windows you will most likely need to change this setting. The linux guest in WSL2 that runs your container is the `docker-desktop` environment. The easiest way to get onto the host, where you can change this setting, is to open a powershell window and verify you have the docker-desktop environment listed.

```powershell
wsl --list
```

Among others this should list both `docker-desktop` and `docker-desktop-data`. You can then open a shell in the `docker-desktop` environment.

```powershell
wsl -d docker-desktop
```

Inside this shell you can verify that you have the right environment by running

```bash
cat /proc/sys/kernel/core_pattern
```

This should show the same configuration as the one you see inside the devcontainer. You can then change the setting by running the following command.
This will change the setting for the current session. If you want to make the change permanent you will need to add this to a startup script.

```bash
echo "core" > /proc/sys/kernel/core_pattern
```
129 changes: 120 additions & 9 deletions src/backend/distributed/commands/database.c
Original file line number Diff line number Diff line change
Expand Up @@ -40,15 +40,38 @@
#include "distributed/deparse_shard_query.h"
#include "distributed/deparser.h"
#include "distributed/listutils.h"
#include "distributed/local_executor.h"
#include "distributed/metadata/distobject.h"
#include "distributed/metadata_sync.h"
#include "distributed/metadata_utility.h"
#include "distributed/multi_executor.h"
#include "distributed/relation_access_tracking.h"
#include "distributed/serialize_distributed_ddls.h"
#include "distributed/shard_cleaner.h"
#include "distributed/worker_protocol.h"
#include "distributed/worker_transaction.h"


/*
* Used to save original name of the database before it is replaced with a
* temporary name for failure handling purposes in PreprocessCreateDatabaseStmt().
*/
static char *CreateDatabaseCommandOriginalDbName = NULL;


/*
* The format string used when creating a temporary databases for failure
* handling purposes.
*
* The fields are as follows to ensure using a unique name for each temporary
* database:
* - operationId: The operation id returned by RegisterOperationNeedingCleanup().
* - groupId: The group id of the worker node where CREATE DATABASE command
* is issued from.
*/
#define TEMP_DATABASE_NAME_FMT "citus_temp_database_%lu_%d"


/*
* DatabaseCollationInfo is used to store collation related information of a database.
*/
Expand Down Expand Up @@ -286,8 +309,9 @@ PreprocessAlterDatabaseStmt(Node *node, const char *queryString,
* NontransactionalNodeDDLTask to run the command on the workers outside
* the transaction block.
*/

return NontransactionalNodeDDLTaskList(NON_COORDINATOR_NODES, commands);
bool warnForPartialFailure = true;
return NontransactionalNodeDDLTaskList(NON_COORDINATOR_NODES, commands,
warnForPartialFailure);
}
else
{
Expand Down Expand Up @@ -453,7 +477,12 @@ PreprocessAlterDatabaseSetStmt(Node *node, const char *queryString,
*
* In this stage, we perform validations that we want to ensure before delegating to
* previous utility hooks because it might not be convenient to throw an error in an
* implicit transaction that creates a database.
* implicit transaction that creates a database. Also in this stage, we save the original
* database name and replace dbname field with a temporary name for failure handling
* purposes. We let Postgres create the database with the temporary name, insert a cleanup
* record for the temporary database name on all nodes and let PostprocessCreateDatabaseStmt()
* to return the distributed DDL job that both creates the database with the temporary name
* and then renames it back to its original name.
*
* We also serialize database commands globally by acquiring a Citus specific advisory
* lock based on OCLASS_DATABASE on the first primary worker node.
Expand All @@ -467,22 +496,56 @@ PreprocessCreateDatabaseStmt(Node *node, const char *queryString,
return NIL;
}

EnsurePropagationToCoordinator();
EnsureCoordinatorIsInMetadata();

CreatedbStmt *stmt = castNode(CreatedbStmt, node);
EnsureSupportedCreateDatabaseCommand(stmt);

SerializeDistributedDDLsOnObjectClass(OCLASS_DATABASE);

OperationId operationId = RegisterOperationNeedingCleanup();

char *tempDatabaseName = psprintf(TEMP_DATABASE_NAME_FMT,
operationId, GetLocalGroupId());

List *remoteNodes = TargetWorkerSetNodeList(ALL_SHARD_NODES, RowShareLock);
WorkerNode *remoteNode = NULL;
foreach_ptr(remoteNode, remoteNodes)
{
InsertCleanupRecordOutsideTransaction(
CLEANUP_OBJECT_DATABASE,
pstrdup(quote_identifier(tempDatabaseName)),
remoteNode->groupId,
CLEANUP_ON_FAILURE
);
}

CreateDatabaseCommandOriginalDbName = stmt->dbname;
stmt->dbname = tempDatabaseName;

/*
* Delete cleanup records in the same transaction so that if the current
* transactions fails for some reason, then the cleanup records won't be
* deleted. In the happy path, we will delete the cleanup records without
* deferring them to the background worker.
*/
FinalizeOperationNeedingCleanupOnSuccess("create database");

return NIL;
}


/*
* PostprocessCreateDatabaseStmt is executed after the statement is applied to the local
* postgres instance. In this stage we prepare the commands that need to be run on
* all workers to create the database.
* postgres instance.
*
* In this stage, we first rename the temporary database back to its original name for
* local node and then return a list of distributed DDL jobs to create the database with
* the temporary name and then to rename it back to its original name. That way, if CREATE
* DATABASE fails on any of the nodes, the temporary database will be cleaned up by the
* cleanup records that we inserted in PreprocessCreateDatabaseStmt() and in case of a
* failure, we won't leak any databases called as the name that user intended to use for
* the database.
*/
List *
PostprocessCreateDatabaseStmt(Node *node, const char *queryString)
Expand Down Expand Up @@ -515,9 +578,55 @@ PostprocessCreateDatabaseStmt(Node *node, const char *queryString)
* block, we need to use NontransactionalNodeDDLTaskList() to send the CREATE
* DATABASE statement to the workers.
*/
bool warnForPartialFailure = false;
List *createDatabaseDDLJobList =
NontransactionalNodeDDLTaskList(REMOTE_NODES, createDatabaseCommands);
return createDatabaseDDLJobList;
NontransactionalNodeDDLTaskList(REMOTE_NODES, createDatabaseCommands,
warnForPartialFailure);

CreatedbStmt *stmt = castNode(CreatedbStmt, node);

char *renameDatabaseCommand =
psprintf("ALTER DATABASE %s RENAME TO %s",
quote_identifier(stmt->dbname),
quote_identifier(CreateDatabaseCommandOriginalDbName));

List *renameDatabaseCommands = list_make3(DISABLE_DDL_PROPAGATION,
renameDatabaseCommand,
ENABLE_DDL_PROPAGATION);

/*
* We use NodeDDLTaskList() to send the RENAME DATABASE statement to the
* workers because we want to execute it in a coordinated transaction.
*/
List *renameDatabaseDDLJobList =
NodeDDLTaskList(REMOTE_NODES, renameDatabaseCommands);

/*
* Temporarily disable citus.enable_ddl_propagation before issuing
* rename command locally because we don't want to execute it on remote
* nodes yet. We will execute it on remote nodes by returning it as a
* distributed DDL job.
*
* The reason why we don't want to execute it on remote nodes yet is that
* the database is not created on remote nodes yet.
*/
int saveNestLevel = NewGUCNestLevel();
set_config_option("citus.enable_ddl_propagation", "off",
(superuser() ? PGC_SUSET : PGC_USERSET), PGC_S_SESSION,
GUC_ACTION_LOCAL, true, 0, false);

ExecuteUtilityCommand(renameDatabaseCommand);

AtEOXact_GUC(true, saveNestLevel);

/*
* Restore the original database name because MarkObjectDistributed()
* resolves oid of the object based on the database name and is called
* after executing the distributed DDL job that renames temporary database.
*/
stmt->dbname = CreateDatabaseCommandOriginalDbName;

return list_concat(createDatabaseDDLJobList, renameDatabaseDDLJobList);
}


Expand Down Expand Up @@ -571,8 +680,10 @@ PreprocessDropDatabaseStmt(Node *node, const char *queryString,
* use NontransactionalNodeDDLTaskList() to send the DROP DATABASE statement
* to the workers.
*/
bool warnForPartialFailure = true;
List *dropDatabaseDDLJobList =
NontransactionalNodeDDLTaskList(REMOTE_NODES, dropDatabaseCommands);
NontransactionalNodeDDLTaskList(REMOTE_NODES, dropDatabaseCommands,
warnForPartialFailure);
return dropDatabaseDDLJobList;
}

Expand Down
3 changes: 3 additions & 0 deletions src/backend/distributed/commands/index.c
Original file line number Diff line number Diff line change
Expand Up @@ -493,6 +493,7 @@ GenerateCreateIndexDDLJob(IndexStmt *createIndexStatement, const char *createInd
ddlJob->startNewTransaction = createIndexStatement->concurrent;
ddlJob->metadataSyncCommand = createIndexCommand;
ddlJob->taskList = CreateIndexTaskList(createIndexStatement);
ddlJob->warnForPartialFailure = true;

return ddlJob;
}
Expand Down Expand Up @@ -652,6 +653,7 @@ PreprocessReindexStmt(Node *node, const char *reindexCommand,
"concurrently");
ddlJob->metadataSyncCommand = reindexCommand;
ddlJob->taskList = CreateReindexTaskList(relationId, reindexStatement);
ddlJob->warnForPartialFailure = true;

ddlJobs = list_make1(ddlJob);
}
Expand Down Expand Up @@ -780,6 +782,7 @@ PreprocessDropIndexStmt(Node *node, const char *dropIndexCommand,
ddlJob->metadataSyncCommand = dropIndexCommand;
ddlJob->taskList = DropIndexTaskList(distributedRelationId, distributedIndexId,
dropIndexStatement);
ddlJob->warnForPartialFailure = true;

ddlJobs = list_make1(ddlJob);
}
Expand Down
13 changes: 9 additions & 4 deletions src/backend/distributed/commands/utility_hook.c
Original file line number Diff line number Diff line change
Expand Up @@ -1402,7 +1402,7 @@ ExecuteDistributedDDLJob(DDLJob *ddlJob)
errhint("Use DROP INDEX CONCURRENTLY IF EXISTS to remove the "
"invalid index, then retry the original command.")));
}
else
else if (ddlJob->warnForPartialFailure)
{
ereport(WARNING,
(errmsg(
Expand All @@ -1411,9 +1411,9 @@ ExecuteDistributedDDLJob(DDLJob *ddlJob)
"state.\nIf the problematic command is a CREATE operation, "
"consider using the 'IF EXISTS' syntax to drop the object,"
"\nif applicable, and then re-attempt the original command.")));

PG_RE_THROW();
}

PG_RE_THROW();
}
PG_END_TRY();
}
Expand Down Expand Up @@ -1629,9 +1629,12 @@ DDLTaskList(Oid relationId, const char *commandString)
* NontransactionalNodeDDLTaskList builds a list of tasks to execute a DDL command on a
* given target set of nodes with cannotBeExecutedInTransaction is set to make sure
* that task list is executed outside a transaction block.
*
* Also sets warnForPartialFailure for the returned DDLJobs.
*/
List *
NontransactionalNodeDDLTaskList(TargetWorkerSet targets, List *commands)
NontransactionalNodeDDLTaskList(TargetWorkerSet targets, List *commands,
bool warnForPartialFailure)
{
List *ddlJobs = NodeDDLTaskList(targets, commands);
DDLJob *ddlJob = NULL;
Expand All @@ -1642,6 +1645,8 @@ NontransactionalNodeDDLTaskList(TargetWorkerSet targets, List *commands)
{
task->cannotBeExecutedInTransaction = true;
}

ddlJob->warnForPartialFailure = warnForPartialFailure;
}
return ddlJobs;
}
Expand Down
Loading

0 comments on commit 3a9c2d4

Please sign in to comment.