Constant recompilation in dev mode - schema_version and repo_factory. python #9035

Open
ArthurHSUflow opened this issue Dec 11, 2024 · 0 comments
ArthurHSUflow commented Dec 11, 2024

Hi!
I have an issue in dev mode where schema compilation happens constantly.
It only happens with cube.py. The same setup with cube.js works fine (I switched to Python to get more flexibility).
With a lot of cubes, this makes the Playground unresponsive. From the logs I can see that the schema version seems to get a different hash suffix every time, which triggers the recompile.
Slack thread: https://cube-js.slack.com/archives/C04NYBJP7RQ/p1733337668458919

Please find below a minimal setup that reproduces the issue.
There is a single static model defined in the model repository. A refresh key is defined but returns a constant value.
The database is a local Postgres instance.
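
For readers trying to reason about the suffix: the version strings in the logs have the shape `fixed_schema_version_<32 hex chars>`, which is consistent with a hash of the model files being appended to the value returned by `schema_version`. The sketch below is only an illustration under that assumption, which this issue does not confirm. The helper `version_with_suffix`, the MD5-of-JSON scheme, and the file dict keys are all hypothetical. It shows that such a suffix would stay stable for identical input and change as soon as the same files are serialized in a different order.

```python
import hashlib
import json


def version_with_suffix(base: str, files: list[dict]) -> str:
    """Hypothetical: append an MD5 of the serialized file list to the base version."""
    digest = hashlib.md5(json.dumps(files).encode("utf-8")).hexdigest()
    return f"{base}_{digest}"


# Hypothetical file entries; the key names are illustrative only.
files = [
    {"fileName": "a.yml", "content": "cubes: []"},
    {"fileName": "b.yml", "content": "cubes: []"},
]

first = version_with_suffix("fixed_schema_version", files)
again = version_with_suffix("fixed_schema_version", files)
reordered = version_with_suffix("fixed_schema_version", list(reversed(files)))

print(first == again)      # True: identical input gives an identical suffix
print(first == reordered)  # False: same files, different order, different suffix
```

If the real suffix is computed in a similar way, comparing the exact repository payload between two calls would be a reasonable next debugging step.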

cube.py

from cube import config, file_repository

# Define users with their credentials and security context
USERS = {
    "arthur": {
        "password": "password1",
        "securityContext": {
            "user": "arthur",
            "apiType": "sql_api",
            "typologies": ["startup"],
        },
    }
}


@config('repository_factory')
def repository_factory(ctx: dict) -> list[dict]:
    repo = file_repository('model')
    # Sort by a stable attribute that all cubes have, e.g. 'name'
    repo = sorted(repo, key=lambda x: x.get('name', ''))
    return repo

@config('schema_version')
def schema_version(ctx):
    return "fixed_schema_version"



@config('check_sql_auth')
def check_sql_auth(req: dict, user_name: str, password: str) -> dict:
    """
    Authenticate SQL API users and return their security context.
    """
    user_data = USERS.get(user_name)
    if user_data:
        if user_data["password"] == password:
            security_context = user_data["securityContext"]
            print(f"User authenticated: {user_name}")
            return {"password": password, "securityContext": security_context}

    print(f"Authentication failed for user: {user_name}")
    raise Exception("Access denied")


@config('context_to_app_id')
def context_to_app_id(ctx: dict) -> str:
    access_type = ctx.get('securityContext', {}).get('access_type', None)
    app_id = "bi_api" if access_type == "sql_api" else "rest_api"
    print(f"Generated appId: {app_id} for access_type: {access_type}")
    return app_id

@config('scheduled_refresh_contexts')
def scheduled_refresh_contexts():
    return [
        {
            "securityContext": {
                "user": "defaultUser",
                "access_type": "sql_api",  # Instead of apiType, use access_type to match your code
                "typologies": []
            }
        }
    ]
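
A side observation on the configuration above, not necessarily related to the recompilation: `context_to_app_id` reads `access_type`, while the security context stored in `USERS` uses the key `apiType`. The sketch below copies the lookup into a plain function (`app_id_for` is a hypothetical name, and no Cube import is needed) and runs it against the three security contexts that appear in this issue. It assumes the context passed to `context_to_app_id` carries the security context unchanged.

```python
def app_id_for(ctx: dict) -> str:
    """Plain-Python copy of the lookup done in context_to_app_id."""
    access_type = ctx.get("securityContext", {}).get("access_type")
    return "bi_api" if access_type == "sql_api" else "rest_api"


# Security context returned by check_sql_auth for user "arthur".
sql_user_ctx = {
    "securityContext": {
        "user": "arthur",
        "apiType": "sql_api",
        "typologies": ["startup"],
    }
}

# Security context returned by scheduled_refresh_contexts.
refresh_ctx = {
    "securityContext": {
        "user": "defaultUser",
        "access_type": "sql_api",
        "typologies": [],
    }
}

# Playground request as seen in the logs (only iat/exp are present).
playground_ctx = {"securityContext": {"iat": 1733909855, "exp": 1733996255}}

for label, ctx in [
    ("sql user", sql_user_ctx),
    ("refresh worker", refresh_ctx),
    ("playground", playground_ctx),
]:
    print(label, "->", app_id_for(ctx))
```

Under that assumption, the SQL API user resolves to `rest_api` because its context has no `access_type` key, and only the scheduled refresh context resolves to `bi_api`.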

docker-compose

version: '2.2'

services:
  cube_api:
    restart: always
    image: cubejs/cube:latest
    env_file:
      - .env
    volumes:
      - ./model:/cube/conf/model
      - ./cube.py:/cube/conf/cube.py
    ports:
      - "4000:4000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - cube_network

  cube_refresh_worker:
    restart: always
    image: cubejs/cube:latest
    env_file:
      - .env
    volumes:
      - ./model:/cube/conf/model
      - ./cube.py:/cube/conf/cube.py
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - cube_network

  cubestore_router:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBEJS_DB_SSL=false
    volumes:
      - .cubestore:/cube/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - cube_network


  cubestore_worker_1:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:10001
      - CUBESTORE_WORKER_PORT=10001
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
      - CUBEJS_DB_SSL=false
    volumes:
      - .cubestore:/cube/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      - cubestore_router
    networks:
      - cube_network

networks:
  cube_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16

Two sets of logs.
The first one, which I think is linked to the cache refresh. You can see schema compilations back to back:

cube_refresh_worker_1  | 2024-12-11 09:40:05,853 TRACE [cubejs_native::python::runtime] New task
cube_refresh_worker_1  | Recompiling schema: undefined 
cube_refresh_worker_1  | {
cube_refresh_worker_1  |   "version": "fixed_schema_version_2e4917d42cf9d31f5d4bfcc1d79f1033"
cube_refresh_worker_1  | }
cube_refresh_worker_1  | 2024-12-11 09:40:05,856 TRACE [cubejs_native::python::runtime] New task
cube_refresh_worker_1  | Compiling schema completed: undefined (44ms)
cube_refresh_worker_1  | {
cube_refresh_worker_1  |   "version": "fixed_schema_version_27d71b3eea11de3095c682c5b78de311"
cube_refresh_worker_1  | }
cube_refresh_worker_1  | 2024-12-11 09:40:05,897 TRACE [cubejs_native::python::runtime] New task
cube_api_1             | Compiling schema completed: undefined (70ms)
cube_api_1             | {
cube_api_1             |   "version": "fixed_schema_version_2e4917d42cf9d31f5d4bfcc1d79f1033"
cube_api_1             | }
cube_api_1             | 2024-12-11 09:40:05,918 TRACE [cubejs_native::python::runtime] New task
cube_refresh_worker_1  | Compiling schema completed: undefined (88ms)
cube_refresh_worker_1  | {
cube_refresh_worker_1  |   "version": "fixed_schema_version_2e4917d42cf9d31f5d4bfcc1d79f1033"
cube_refresh_worker_1  | }
cube_refresh_worker_1  | 2024-12-11 09:40:05,945 TRACE [cubejs_native::python::runtime] New task
cube_api_1             | Compiling schema completed: undefined (103ms)
cube_api_1             | {
cube_api_1             |   "version": "fixed_schema_version_27d71b3eea11de3095c682c5b78de311"
cube_api_1             | }
cube_api_1             | 2024-12-11 09:40:05,958 TRACE [cubejs_native::python::runtime] New task
cube_refresh_worker_1  | Query started: scheduler-799e82c0-c2e8-4d33-a9a0-0752359b081e 
cube_refresh_worker_1  | {}
cube_api_1             | Recompiling schema: undefined 
cube_api_1             | {
cube_api_1             |   "version": "fixed_schema_version_2e4917d42cf9d31f5d4bfcc1d79f1033"

The second one, captured when making a query from the Playground. Again, the schema is compiled:

cube_api_1             |   "securityContext": {
cube_api_1             |     "iat": 1733909855,
cube_api_1             |     "exp": 1733996255
cube_api_1             |   }
cube_api_1             | }
cube_api_1             | 2024-12-11 09:43:03,603 TRACE [cubejs_native::python::runtime] New task
cube_api_1             | Generated appId: rest_api for access_type: None
cube_api_1             | 2024-12-11 09:43:03,603 TRACE [cubejs_native::python::runtime] New task
cube_api_1             | 2024-12-11 09:43:03,603 TRACE [cubejs_native::python::runtime] New task
cube_api_1             | Recompiling schema: undefined 
cube_api_1             | {
cube_api_1             |   "version": "fixed_schema_version_27d71b3eea11de3095c682c5b78de311"
cube_api_1             | }
cube_api_1             | 2024-12-11 09:43:03,604 TRACE [cubejs_native::python::runtime] New task
cube_api_1             | 2024-12-11 09:43:03,604 TRACE [cubejs_native::python::runtime] New task
cube_api_1             | Compiling schema completed: undefined (30ms)
cube_api_1             | {
cube_api_1             |   "version": "fixed_schema_version_27d71b3eea11de3095c682c5b78de311"
cube_api_1             | }
cube_api_1             | Query Rewrite completed: f80c9060-7bc8-4c8d-b13d-0e7710354224-span-1 (33ms)
cube_api_1             | --
cube_api_1             | {
cube_api_1             |   "limit": 5000,
cube_api_1             |   "dimensions": [
cube_api_1             |     "innovation_provider_denorm_sh.typology_name",
cube_api_1             |     "innovation_provider_denorm_sh.name",
cube_api_1             |     "innovation_provider_denorm_sh.has_description"
cube_api_1             |   ]
cube_api_1             | }
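
To make the pattern in these logs easier to see, a small script can count which version string follows each compile event per service. This is a minimal sketch: `summarize` and the regular expressions are hypothetical helpers, the event names are copied from the log lines above, and the sample is a short excerpt of the first log (trailing spaces trimmed).

```python
import re
from collections import Counter

# "<service> | <rest of line>" as printed by docker-compose.
LINE_RE = re.compile(r'^(\S+)\s+\|\s?(.*)$')
# The '"version": "..."' payload printed after each compile event.
VERSION_RE = re.compile(r'"version":\s*"([^"]+)"')
EVENTS = ("Recompiling schema", "Compiling schema completed")


def summarize(log_text: str) -> Counter:
    """Count (service, event, version) triples in docker-compose log output."""
    counts = Counter()
    pending = {}  # service -> last compile event still waiting for its version
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue
        service, rest = match.group(1), match.group(2).strip()
        if rest.startswith(EVENTS):
            pending[service] = rest.split(":", 1)[0]
            continue
        version = VERSION_RE.search(rest)
        if version and service in pending:
            counts[(service, pending.pop(service), version.group(1))] += 1
    return counts


sample = """\
cube_refresh_worker_1  | Recompiling schema: undefined
cube_refresh_worker_1  | {
cube_refresh_worker_1  |   "version": "fixed_schema_version_2e4917d42cf9d31f5d4bfcc1d79f1033"
cube_refresh_worker_1  | }
cube_refresh_worker_1  | Compiling schema completed: undefined (44ms)
cube_refresh_worker_1  | {
cube_refresh_worker_1  |   "version": "fixed_schema_version_27d71b3eea11de3095c682c5b78de311"
cube_refresh_worker_1  | }
"""

summary = summarize(sample)
distinct_versions = {version for (_, _, version) in summary}
print(len(distinct_versions))  # 2
```

Running it over the full logs would show whether the suffix keeps changing or only alternates between a small set of values. In the excerpts above, only two suffixes appear.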

I would expect the schema to always be the same.
There should be at most two schema compilations, one for each of the two possible app_id values.
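
That expectation can be written down as a toy model. The sketch below assumes that each app_id keeps only its most recently compiled schema and recompiles whenever the incoming version string differs. That is an assumption for illustration, not confirmed behavior, and `count_recompiles` and the version labels are hypothetical.

```python
def count_recompiles(requests) -> int:
    """Count compilations when each app_id keeps only its latest compiled version."""
    current = {}  # app_id -> version of the schema compiled last
    recompiles = 0
    for app_id, version in requests:
        if current.get(app_id) != version:
            current[app_id] = version
            recompiles += 1
    return recompiles


# Stable version: one compilation per app_id, then nothing.
stable = [("rest_api", "v_a"), ("bi_api", "v_a")] * 5
# Version alternating between two suffixes for the same app_id.
flapping = [("rest_api", "v_a"), ("rest_api", "v_b")] * 5

print(count_recompiles(stable))    # 2
print(count_recompiles(flapping))  # 10
```

In this model a stable version gives exactly the two compilations expected, while a version that alternates between two values recompiles on every request, which resembles the logs above.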

@igorlukanin igorlukanin self-assigned this Dec 16, 2024
@igorlukanin igorlukanin added the question The issue is a question. Please use Stack Overflow for questions. label Jan 8, 2025