pyoso autocomplete #3249

Open
ryscheng opened this issue Mar 12, 2025 · 2 comments

Comments

@ryscheng
Member

What is it?

@Jabolol mind brain-dumping what you had thought about in Denver?

@ryscheng ryscheng added this to the [c] pyoso fetches data milestone Mar 12, 2025

linear bot commented Mar 12, 2025

@github-project-automation github-project-automation bot moved this to Backlog in OSO Mar 12, 2025
@Jabolol
Contributor

Jabolol commented Mar 13, 2025

Carl and I went through some different API designs when we were at the Gitcoin house in Denver, and we came to the following conclusions:

The best outcome is a fully typed pyoso package, although that comes with some limitations. IMO, the best API would look like this:

from pyoso import Client

client = Client(API_KEY)

# example:
#
# +-----------+-----------+
# | column_0  | column_1  |
# +-----------+-----------+
# | value     | data_1    |
# | value     | data_2    |
# | value     | data_3    |
# +-----------+-----------+
#

# create a virtual df, lazily evaluated
vdf = client.table("example").select(
    "column_0", # autocomplete based on "example" columns
    "column_1", # autocomplete ...
).where(lambda row: row["column_0"] == "value") # automatically typed to str

# modify columns with the vdf, do df logic with it

df = vdf.dispatch() # this builds the SQL and executes the query

print(df.head())

But this carries a big implementation risk, since it is extremely complex. We would need to handle SQL keywords and give them semantic meaning, which is a really big task even if we only support a small subset of them.
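To give a feel for what "giving keywords semantic meaning" involves, here is a minimal sketch of the lazy builder covering only select and a single equality filter. Everything in it (VirtualFrame, Row, Column, Expr, to_sql) is a hypothetical name for illustration, not existing pyoso API, and it only handles string literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A fragment of SQL produced by comparing a column to a value."""
    sql: str


class Column:
    """Hypothetical column proxy; overloads == to emit SQL instead of a bool."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> Expr:  # type: ignore[override]
        # Only string literals are handled in this sketch; quotes are doubled.
        literal = "'" + str(other).replace("'", "''") + "'"
        return Expr(f"{self.name} = {literal}")


class Row:
    """Proxy passed to the where() lambda; indexing yields Column objects."""

    def __getitem__(self, name: str) -> Column:
        return Column(name)


@dataclass(frozen=True)
class VirtualFrame:
    """Hypothetical lazy frame: records the query, executes nothing."""
    table: str
    columns: tuple[str, ...] = ()
    filters: tuple[str, ...] = ()

    def select(self, *columns: str) -> "VirtualFrame":
        return VirtualFrame(self.table, columns, self.filters)

    def where(self, predicate) -> "VirtualFrame":
        # Calling the lambda with a proxy row records the comparison as SQL.
        expr = predicate(Row())
        return VirtualFrame(self.table, self.columns, self.filters + (expr.sql,))

    def to_sql(self) -> str:
        cols = ", ".join(self.columns) or "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql


vdf = (
    VirtualFrame("example")
    .select("column_0", "column_1")
    .where(lambda row: row["column_0"] == "value")
)
print(vdf.to_sql())
```

Even this toy version needs a proxy object and an operator overload for one comparison. Joins, aggregates, ordering, and non-string literals would each need their own handling, which is where the complexity comes from.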

An intermediate approach, which is still helpful, would be as follows:

from pyoso import Client

client = Client(API_KEY)

ctx, handle = client.table("example") # ctx has all the columns with type information

vdf = handle.query(
    f"""
        select {ctx.rows.column_0} from {ctx.table}
    """
)

# modify columns with the vdf, do df logic with it

df = vdf.dispatch() # this builds the SQL and executes the query

print(df.head())

This implementation is simpler. It provides typed access to all the columns, and possibly extra metadata, through the context (ctx) object. The issue is that string interpolation is not a silver bullet: there's no semantic check, and things can get messy real quick. But it is a viable option, and it would save people from having to check the schema on BigQuery.
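As a rough sketch of the runtime side of this, the ctx object could be little more than a namespace built from the table schema. The make_context helper, the types attribute, and the type names below are assumptions made for illustration; static autocomplete would additionally need type information for the editor.

```python
from types import SimpleNamespace


def make_context(table: str, schema: dict[str, str]) -> SimpleNamespace:
    """Build a hypothetical ctx object from a {column_name: sql_type} schema."""
    # Each attribute on ctx.rows resolves to the column name as a string,
    # so a typo raises AttributeError instead of producing a bad query.
    rows = SimpleNamespace(**{name: name for name in schema})
    return SimpleNamespace(table=table, rows=rows, types=dict(schema))


# Assumed schema for the "example" table used above.
ctx = make_context("example", {"column_0": "VARCHAR", "column_1": "VARCHAR"})

query = f"select {ctx.rows.column_0} from {ctx.table}"
print(query)
```

A misspelled column such as ctx.rows.colum_0 fails immediately with AttributeError, but the SQL around the interpolated names is still unchecked.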

These two approaches have something in common: how do we actually add the typing support? In Python, there are stub files (.pyi), which use a subset of Python syntax to give type information to the type checker. An example hello.pyi file could be:

def add(a: int, b: int) -> int: ...
def greet(name: str) -> str: ...

This is really powerful: by writing a small amount of Python-like code, we can add typing support.

I would divide the library into two parts: the types and the core functionality. The core functionality would be private (not exposed to the user, except for the query method, for example), stable, and in charge of all the virtual df handling, streams, authentication, etc. The typing part would possibly be generated by another module, or even written in another programming language. It would fetch the schemas from our databases and build the .pyi files, which would be bundled with our library. One caveat, though, is that we'd need to re-publish our library every time the schemas change.
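A minimal sketch of the generator step, assuming the schema has already been fetched as a {column_name: sql_type} dict. The render_stub function, the ExampleColumns class name, and the SQL-to-Python type map are all hypothetical; the real mapping depends on which database the schemas come from.

```python
# Assumed mapping from SQL type names to Python annotations.
TYPE_MAP = {"VARCHAR": "str", "BIGINT": "int", "DOUBLE": "float", "BOOLEAN": "bool"}


def render_stub(table: str, schema: dict[str, str]) -> str:
    """Render the text of a .pyi class describing one table's columns."""
    class_name = "".join(part.capitalize() for part in table.split("_")) + "Columns"
    lines = [f"class {class_name}:"]
    for column, sql_type in schema.items():
        # Unknown SQL types fall back to object rather than guessing.
        py_type = TYPE_MAP.get(sql_type.upper(), "object")
        lines.append(f"    {column}: {py_type}")
    if not schema:
        lines.append("    ...")
    return "\n".join(lines) + "\n"


stub = render_stub("example", {"column_0": "VARCHAR", "column_1": "BIGINT"})
print(stub)
```

Writing the returned text to a .pyi file at build time is what would tie each release to the schemas as they were when it was published.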

In conclusion, in the long run I would like option A to be the one we end up building, but it is a big investment. We are a small team and it's a big project. Option B seems more feasible in the short to medium term, and would give our library acceptable typing.
