feat(go/adbc): add IngestStream helper for one-call ingestion and add TestIngestStream #3150

Open
wants to merge 9 commits into
base: main

@Mandukhai-Alimaa Mandukhai-Alimaa commented Jul 14, 2025

Solves #3142

Changes:

  1. New freestanding function: func IngestStream(ctx context.Context, cnxn Connection, reader array.RecordReader, opts map[string]string) (int64, error)
  2. TestIngestStream added to drivermgr_test
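As a rough illustration of the boilerplate the helper wraps, the sketch below runs the five steps (NewStatement, SetOption, Bind, Execute, Close) against small stand-in types. The `statement` and `connection` interfaces, `fakeStmt`, `fakeConn`, and `ingest` are hypothetical simplifications invented for this example; the real ADBC statement takes a `context.Context` and binds an Arrow record reader, which is left out here. Only the order of the calls is taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// statement is a hypothetical, trimmed-down stand-in for an ADBC statement.
type statement interface {
	SetOption(key, value string) error
	BindStream(rows []int) error
	ExecuteUpdate() (int64, error)
	Close() error
}

// connection is a hypothetical stand-in for an ADBC connection.
type connection interface {
	NewStatement() (statement, error)
}

// fakeStmt records the calls it receives so the order can be inspected.
type fakeStmt struct {
	calls []string
	rows  []int
}

func (s *fakeStmt) SetOption(key, value string) error {
	s.calls = append(s.calls, "SetOption")
	return nil
}

func (s *fakeStmt) BindStream(rows []int) error {
	s.calls = append(s.calls, "BindStream")
	s.rows = rows
	return nil
}

func (s *fakeStmt) ExecuteUpdate() (int64, error) {
	s.calls = append(s.calls, "ExecuteUpdate")
	return int64(len(s.rows)), nil
}

func (s *fakeStmt) Close() error {
	s.calls = append(s.calls, "Close")
	return nil
}

type fakeConn struct{ stmt *fakeStmt }

func (c *fakeConn) NewStatement() (statement, error) { return c.stmt, nil }

// ingest wraps the five steps into one call and returns the row count.
func ingest(cnxn connection, rows []int, opts map[string]string) (count int64, err error) {
	stmt, err := cnxn.NewStatement()
	if err != nil {
		return -1, fmt.Errorf("error during ingestion: NewStatement: %w", err)
	}
	// Close runs last; its error is joined onto whatever the body returned.
	defer func() { err = errors.Join(err, stmt.Close()) }()

	for k, v := range opts {
		if err = stmt.SetOption(k, v); err != nil {
			return -1, fmt.Errorf("error during ingestion: SetOption(%s): %w", k, err)
		}
	}
	if err = stmt.BindStream(rows); err != nil {
		return -1, fmt.Errorf("error during ingestion: BindStream: %w", err)
	}
	if count, err = stmt.ExecuteUpdate(); err != nil {
		return -1, fmt.Errorf("error during ingestion: ExecuteUpdate: %w", err)
	}
	return count, nil
}

func main() {
	stmt := &fakeStmt{}
	n, err := ingest(&fakeConn{stmt: stmt}, []int{1, 2, 3}, map[string]string{"target_table": "ingest_test"})
	fmt.Println(n, err, stmt.calls)
}
```

With a single option in the map, the fake records SetOption, BindStream, ExecuteUpdate, and Close in that order.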

Member

@lidavidm lidavidm left a comment

Would this make more sense as a freestanding function? It can be implemented entirely in terms of existing APIs anyway, and that way we don't have to make a breaking change to the core API.

@zeroshade
Member

I was thinking the same as what @lidavidm suggested, a free function would be better in this case to avoid a breaking change to the interface

@lidavidm lidavidm changed the title from "add IngestStream helper for one-call ingestion and add TestIngestStream" to "feat(go/adbc): add IngestStream helper for one-call ingestion and add TestIngestStream" Jul 14, 2025
MANDY Alimaa added 2 commits July 13, 2025 22:39
Wrap the ingestion steps into a single, freestanding `adbc.IngestStream` function for easier Arrow data ingestion. Add `TestIngestStream` in drivermgr_test.go to verify end-to-end functionality.

Closes apache#3142
go/adbc/adbc.go Outdated
// the five-step boilerplate of NewStatement, SetOption, Bind,
// Execute, and Close.
//
// This is not part of the ADBC API specification.
Member

Comment on lines 633 to 646
// 1) Create the target table
st, err := dm.conn.NewStatement()
dm.Require().NoError(err)
defer validation.CheckedClose(dm.T(), st)

dm.NoError(st.SetSqlQuery(`
CREATE TABLE IF NOT EXISTS ingest_test (
col1 INTEGER,
col2 TEXT
)
`))
n, err := st.ExecuteUpdate(dm.ctx)
dm.NoError(err)
dm.Equal(int64(0), n, "CREATE TABLE should report 0 rows affected")
Member

Ingest can create the table itself, so this step isn't strictly necessary.

defer b.Release()

// first batch: 3 rows
b.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3}, nil)
Member

ArrayFromJSON might make this easier

go/adbc/adbc.go Outdated
// Execute, and Close.
//
// This is not part of the ADBC API specification.
func IngestStream(ctx context.Context, cnxn Connection, reader array.RecordReader, opts map[string]string) (int64, error) {
Member

Maybe we should accept things like target table directly as parameters? Or have an explicit parameters struct? (We can keep the map for any other options but I think the idea ought to be that we make the common parameters into formal parameters instead of requiring the option)

go/adbc/adbc.go Outdated
	if err != nil {
		return -1, fmt.Errorf("IngestStream: NewStatement: %w", err)
	}
	defer stmt.Close()
Member

Close is fallible, so you have to handle the error. You need something like this (with a named error return value):

defer func() {
	err = errors.Join(err, stmt.Close())
}()
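
To see why the named return value matters, here is a small self-contained sketch. The `resource` type, the `errWork` and `errClose` sentinels, and the function names are hypothetical and exist only for this example; they are not ADBC APIs.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errWork  = errors.New("work failed")
	errClose = errors.New("close failed")
)

// resource is a hypothetical stand-in for a statement whose Close can fail.
type resource struct{}

func (resource) Close() error { return errClose }

// dropsCloseError defers Close directly, so its error is silently discarded.
func dropsCloseError(fail bool) error {
	r := resource{}
	defer r.Close()
	if fail {
		return errWork
	}
	return nil
}

// keepsCloseError uses a named result so the deferred function can join
// the Close error onto whatever the body returned.
func keepsCloseError(fail bool) (err error) {
	r := resource{}
	defer func() {
		err = errors.Join(err, r.Close())
	}()
	if fail {
		return errWork
	}
	return nil
}

// has reports whether err wraps target.
func has(err, target error) bool { return errors.Is(err, target) }

func main() {
	fmt.Println(dropsCloseError(false))                 // <nil>: the Close error is lost
	fmt.Println(has(keepsCloseError(false), errClose)) // true
	err := keepsCloseError(true)
	fmt.Println(has(err, errWork), has(err, errClose)) // true true
}
```

`errors.Join` ignores nil arguments and returns nil when all of them are nil, so the deferred function is safe on the success path as well.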

MANDY Alimaa added 3 commits July 14, 2025 13:59
Wrap the ingestion steps into a single, freestanding `adbc.IngestStream` function for easier Arrow data ingestion. Add `TestIngestStream` in drivermgr_test to verify end-to-end functionality.

Closes apache#3142
go/adbc/ext.go Outdated
	// 1) Create a new statement
	stmt, err := cnxn.NewStatement()
	if err != nil {
		return -1, fmt.Errorf("IngestStream: NewStatement: %w", err)
Member

The convention for Go error strings is that they do not start with a capital letter. Perhaps something more like:

fmt.Errorf("error during ingestion: NewStatement: %w", err)

go/adbc/ext.go Outdated
	}

	// Set required options
	if err := stmt.SetOption(OptionKeyIngestTargetTable, opt.TargetTable); err != nil {
Member

Remove the : so this is = rather than :=. You're shadowing the err from before, so the errors.Join isn't going to work this way.

go/adbc/ext.go Outdated
	}

	// 4) Execute the update
	count, err := stmt.ExecuteUpdate(ctx)
Member

Same comment as before: you're shadowing the err var here. That's usually not a problem, but it will be an issue with the defer from the top.
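
The difference between `:=` and `=` that these shadowing comments point at can be reproduced in isolation. The sketch below is a hypothetical minimal case, not the PR's code: `step`, `shadowed`, and `assigned` are names made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

var errStep = errors.New("step failed")

// step is a hypothetical operation that always fails.
func step() error { return errStep }

// shadowed uses := inside the if statement, which declares a new err that
// exists only within that statement. The named result is never assigned.
func shadowed() (err error) {
	if err := step(); err != nil {
		fmt.Println("inner err:", err)
	}
	return err // the outer err, which is still nil
}

// assigned uses = so the named result itself receives the error.
func assigned() (err error) {
	if err = step(); err != nil {
		fmt.Println("outer err:", err)
	}
	return err
}

func main() {
	fmt.Println(shadowed() == nil) // true: the failure was lost
	fmt.Println(assigned() == nil) // false: the failure is returned
}
```

One caveat worth keeping in mind: a `return` with explicit values assigns the named results before deferred functions run, so an explicit `return -1, fmt.Errorf(...)` from inside a shadowing `if` is still visible to a deferred `errors.Join`. Using `=` consistently avoids having to reason about which `err` is in scope.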

go/adbc/ext.go Outdated
Comment on lines 87 to 90
type IngestStreamOption struct {
	TargetTable string            // required
	IngestMode  string            // required, e.g. adbc.OptionValueIngestModeCreateAppend, or OptionValueIngestModeReplace
	Extra       map[string]string // any other stmt.SetOption(...) args
Member

since these are required, should we make them explicit parameters instead of behind this struct? i.e.

func IngestStream(ctx context.Context, cnxn Connection, reader array.RecordReader, target, mode string, extra map[string]string) (int64, error)

what do you think? @lidavidm thoughts?

Member

I could go either way, but the struct may make it easier to add new parameters in the future? That said I don't mind table/mode being just formal parameters and having struct fields for things like temporary/catalog/schema, and keeping Extra for any driver-specific options
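
One way to picture the split discussed here (required values as formal parameters, optional ones in a struct) is the sketch below. The field set, the `settings` helper, and the option keys are all assumptions made for illustration; the keys are placeholders rather than the ADBC constants, and the shape in the PR may differ.

```go
package main

import "fmt"

// IngestStreamOptions holds the optional settings. The field set is an
// assumption based on the options named in the review thread.
type IngestStreamOptions struct {
	Catalog   string
	DBSchema  string
	Temporary bool
	Extra     map[string]string // driver-specific options
}

// settings flattens the required parameters and the optional fields into
// the key/value pairs a helper would pass to SetOption. Keys are placeholders.
func settings(targetTable, ingestMode string, opt IngestStreamOptions) map[string]string {
	out := map[string]string{
		"target_table": targetTable,
		"mode":         ingestMode,
	}
	if opt.Catalog != "" {
		out["catalog"] = opt.Catalog
	}
	if opt.DBSchema != "" {
		out["db_schema"] = opt.DBSchema
	}
	if opt.Temporary {
		out["temporary"] = "true"
	}
	for k, v := range opt.Extra {
		out[k] = v
	}
	return out
}

func main() {
	// Zero-value options: only the two required parameters are set.
	fmt.Println(len(settings("ingest_test", "create_append", IngestStreamOptions{})))
	// Setting an optional field does not change the function signature.
	fmt.Println(len(settings("ingest_test", "replace", IngestStreamOptions{Temporary: true})))
}
```

Callers that need nothing beyond table and mode pass the zero value of the struct, and new optional fields can be added later without breaking existing call sites.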

Wrap the ingestion steps into a single, freestanding `adbc.IngestStream` function for easier Arrow data ingestion. Add `TestIngestStream` in drivermgr_test to verify end-to-end functionality.

Closes apache#3142
@Mandukhai-Alimaa Mandukhai-Alimaa marked this pull request as ready for review July 15, 2025 16:47
@github-actions github-actions bot added this to the ADBC Libraries 20 milestone Jul 15, 2025
Member

@zeroshade zeroshade left a comment

Overall this looks really good, just a couple comments.

go/adbc/ext.go Outdated

// IngestStreamOption bundles the IngestStream options.
// Driver specific options can go into Extra.
type IngestStreamOption struct {
Member

I would pluralize this to be IngestStreamOptions personally.

Member

Alternately, we could potentially do something like:

type IngestStreamOption func(adbc.Statement) error

func ingestSetOption(name, value string) IngestStreamOption {
    return func(st adbc.Statement) error {
        return st.SetOption(name, value)
    }
}

func WithIngestCatalog(cat string) IngestStreamOption {
    return ingestSetOption(adbc.OptionValueIngestTargetCatalog, cat)
}

func WithIngestTemp() IngestStreamOption {
    return ingestSetOption(adbc.OptionValueIngestTemporary, adbc.OptionValueEnabled)
}

...

func IngestStream(ctx context.Context, cnxn Connection, reader array.RecordReader, targetTable, ingestMode string, opts ...IngestStreamOption) (count int64, err error) {
    ...

    for _, o := range opts {
        if err = o(stmt); err != nil {
            err = fmt.Errorf("error during ingestion: %w", err)
            return
        }
    }

    ....
}

This would not only cover the options we provide but also make it fairly easy for users to define their own options, while keeping the API extensible.

I'm not saying we definitely should do this, just putting it out as a suggestion. Thoughts?
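
For readers who want to try the functional-options idea from this comment without pulling in ADBC, here is a self-contained sketch. `Statement` is a hypothetical one-method stand-in for the ADBC interface, `mapStmt` and `applyOptions` are invented for the example, and the option keys are placeholders rather than the ADBC constants.

```go
package main

import "fmt"

// Statement is a hypothetical stand-in exposing only SetOption.
type Statement interface {
	SetOption(key, value string) error
}

// IngestStreamOption configures a statement before ingestion runs.
type IngestStreamOption func(Statement) error

// Placeholder option keys for this sketch, not the ADBC constants.
const (
	keyCatalog   = "example.ingest.target_catalog"
	keyTemporary = "example.ingest.temporary"
)

func ingestSetOption(name, value string) IngestStreamOption {
	return func(st Statement) error {
		return st.SetOption(name, value)
	}
}

func WithIngestCatalog(cat string) IngestStreamOption {
	return ingestSetOption(keyCatalog, cat)
}

func WithIngestTemp() IngestStreamOption {
	return ingestSetOption(keyTemporary, "true")
}

// mapStmt records options in a map and rejects empty values so that the
// error path can be exercised.
type mapStmt map[string]string

func (m mapStmt) SetOption(key, value string) error {
	if value == "" {
		return fmt.Errorf("empty value for option %q", key)
	}
	m[key] = value
	return nil
}

// applyOptions runs each option in order and stops at the first failure.
func applyOptions(st Statement, opts ...IngestStreamOption) error {
	for _, o := range opts {
		if err := o(st); err != nil {
			return fmt.Errorf("error during ingestion: %w", err)
		}
	}
	return nil
}

func main() {
	st := mapStmt{}
	err := applyOptions(st, WithIngestCatalog("main"), WithIngestTemp())
	fmt.Println(err, len(st))

	// A caller-defined option plugs in without changes to the helper.
	custom := ingestSetOption("example.driver.batch_size", "1024")
	fmt.Println(applyOptions(st, custom), st["example.driver.batch_size"])
}
```

The trade-off against a plain options struct is mostly stylistic: options as functions compose well and let callers add their own, while a struct keeps every setting visible in one place.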

Member

I think it's fine as-is in terms of extensibility

go/adbc/ext.go Outdated

// Set required options
	if err = stmt.SetOption(OptionKeyIngestTargetTable, targetTable); err != nil {
		return 0, fmt.Errorf("error during ingestion: SetOption(target_table=%s): %w", targetTable, err)
Member

nit: why do we go from returning -1 to 0 as the sentinel value? (Not that it matters either way, but I think we usually use -1)

Author

Fixed it. I've switched the sentinel back to -1. My pair programmer (AI assistant) must have accidentally used 0, and I overlooked it. Thanks for catching that!

MANDY Alimaa added 2 commits July 17, 2025 19:25
Wrap the ingestion steps into a single, freestanding `adbc.IngestStream` function for easier Arrow data ingestion. Add `TestIngestStream` in drivermgr_test to verify end-to-end functionality.

Closes apache#3142