@Mirza-Samad-Ahmed-Baig commented Jun 28, 2025

This commit introduces several refactorings across run.c, test.c, and
train.py to enhance code modularity, readability, and maintainability.
The primary goal was to reduce function complexity and nesting depth by
extracting distinct logical blocks into dedicated helper functions.

Key changes include:

run.c:

  • Extracted multihead_attention function: The complex multihead
    attention logic within the forward function has been moved into a
    new, self-contained multihead_attention function. This significantly
    reduces the nesting level and improves the clarity of the main
    forward loop (see the first sketch after this list).
  • Extracted process_utf8_bytes function: The intricate UTF-8 byte
    processing within the encode function was isolated into
    process_utf8_bytes. This simplifies the encode function and makes
    the byte-level operations easier to follow (see the second sketch
    after this list).
  • Extracted render_chat_prompt function: The logic responsible for
    formatting chat prompts in the chat function has been moved to a new
    render_chat_prompt helper, making the chat function's flow clearer.
  • Extracted get_chat_prompts function: The logic for acquiring
    system and user prompts within the chat function has been
    encapsulated in get_chat_prompts, further streamlining the chat
    function.
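For context, here is a minimal sketch of the extracted shape; the signature, buffer layout, and parameter names are illustrative assumptions, not the exact code in this PR. It computes one query position of multihead attention against a cached key/value history:

```c
#include <math.h>

/* Hypothetical signature; the real helper takes whatever state the
 * forward pass threads through. Assumes row t of each cache holds
 * n_heads * head_size floats, and att has room for n_heads * (pos + 1). */
void multihead_attention(float *out, const float *q,
                         const float *key_cache, const float *value_cache,
                         float *att, int n_heads, int head_size, int pos) {
    int dim = n_heads * head_size;
    for (int h = 0; h < n_heads; h++) {
        const float *qh = q + h * head_size;
        float *ah = att + h * (pos + 1);
        /* scaled dot-product scores against every cached key */
        for (int t = 0; t <= pos; t++) {
            const float *k = key_cache + t * dim + h * head_size;
            float score = 0.0f;
            for (int i = 0; i < head_size; i++) score += qh[i] * k[i];
            ah[t] = score / sqrtf((float)head_size);
        }
        /* softmax over timesteps 0..pos */
        float max = ah[0];
        for (int t = 1; t <= pos; t++) if (ah[t] > max) max = ah[t];
        float sum = 0.0f;
        for (int t = 0; t <= pos; t++) { ah[t] = expf(ah[t] - max); sum += ah[t]; }
        for (int t = 0; t <= pos; t++) ah[t] /= sum;
        /* attention-weighted sum of cached values */
        float *oh = out + h * head_size;
        for (int i = 0; i < head_size; i++) oh[i] = 0.0f;
        for (int t = 0; t <= pos; t++) {
            const float *v = value_cache + t * dim + h * head_size;
            for (int i = 0; i < head_size; i++) oh[i] += ah[t] * v[i];
        }
    }
}
```

Pulling this out means the per-head loops nest inside a small helper rather than deep inside the main forward pass.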
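Likewise, a hedged sketch of the UTF-8 handling the second bullet refers to; the function name and contract here are assumptions. The core trick encode needs is grouping continuation bytes (10xxxxxx) with their lead byte so a whole code point can be looked up in the vocabulary at once:

```c
#include <stddef.h>

/* Illustrative helper: length in bytes of the UTF-8 code point starting
 * at s, found by counting continuation bytes (top two bits 10). Capped
 * at 4, the longest legal UTF-8 sequence. */
static size_t utf8_codepoint_len(const char *s) {
    size_t len = 1;
    while (len < 4 && s[len] != '\0' && (s[len] & 0xC0) == 0x80) len++;
    return len;
}
```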

test.c:

  • Extracted individual test_prompt_encoding functions: The
    monolithic test_prompt_encodings function was broken down into
    smaller, more focused functions (e.g., test_prompt_encoding_0,
    test_prompt_encoding_1, etc.). This improves test readability and
    makes it easier to identify and debug specific test failures (the
    pattern is sketched below).
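A sketch of the resulting pattern; the prompt, the expected ids, and the exact encode call shape are placeholders, and it assumes test.c builds against run.c for the Tokenizer type and encode function:

```c
#include <assert.h>
#include <string.h>

/* Shared comparison so each focused test stays a few lines long. */
static void assert_tokens_eq(const int *actual, int n_actual,
                             const int *expected, int n_expected) {
    assert(n_actual == n_expected);
    assert(memcmp(actual, expected, (size_t)n_actual * sizeof(int)) == 0);
}

/* One prompt per function: a failure now names the exact case. */
static void test_prompt_encoding_0(Tokenizer *t) {
    int tokens[512], n_tokens;
    encode(t, "example prompt", 1, 0, tokens, &n_tokens); /* hypothetical call */
    int expected[] = { 1, 2, 3 }; /* placeholder ids, not a real vector */
    assert_tokens_eq(tokens, n_tokens, expected,
                     (int)(sizeof(expected) / sizeof(expected[0])));
}
```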

train.py:

  • Extracted setup_ddp function: The distributed data parallel (DDP)
    setup logic has been moved into a dedicated setup_ddp function,
    centralizing configuration and improving the main script's clarity
    (see the first sketch after this list).
  • Extracted initialize_model function: The model initialization
    logic, including loading from scratch or resuming from a checkpoint,
    is now handled by initialize_model.
  • Extracted setup_optimizer function: The optimizer setup, including
    GradScaler initialization and loading optimizer state from
    checkpoints, has been moved to setup_optimizer.
  • Extracted save_checkpoint function: The logic for saving training
    checkpoints has been encapsulated in a save_checkpoint function,
    promoting reusability and cleaner code within the training loop
    (see the second sketch after this list).
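A minimal sketch of what setup_ddp might look like, assuming a torchrun-style launch (RANK, LOCAL_RANK, and WORLD_SIZE in the environment); the return values and argument names are illustrative, not the PR's exact interface:

```python
import os
import torch
import torch.distributed as dist

def setup_ddp(backend="nccl"):
    """Detect a torchrun launch and initialize the process group."""
    is_ddp = int(os.environ.get("RANK", -1)) != -1  # torchrun exports RANK
    if is_ddp:
        dist.init_process_group(backend=backend)
        rank = int(os.environ["RANK"])
        local_rank = int(os.environ["LOCAL_RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        device = f"cuda:{local_rank}"
        torch.cuda.set_device(local_rank)  # pin this process to its GPU
    else:
        rank, local_rank, world_size, device = 0, 0, 1, "cuda"
    return is_ddp, rank, local_rank, world_size, device
```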
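And a hedged sketch of save_checkpoint; the checkpoint keys follow the common nanoGPT-style layout, which is an assumption about this PR rather than a reading of its diff:

```python
import os
import torch

def save_checkpoint(out_dir, raw_model, optimizer, model_args,
                    iter_num, best_val_loss, config):
    """Bundle everything needed to resume training into one file."""
    checkpoint = {
        "model": raw_model.state_dict(),   # the unwrapped (non-DDP) module
        "optimizer": optimizer.state_dict(),
        "model_args": model_args,
        "iter_num": iter_num,
        "best_val_loss": best_val_loss,
        "config": config,
    }
    os.makedirs(out_dir, exist_ok=True)
    torch.save(checkpoint, os.path.join(out_dir, "ckpt.pt"))
```

Keeping the whole dict in one place is what lets initialize_model and setup_optimizer resume from the same file.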

These changes collectively make the codebase more organized,
readable, and maintainable, and easier to understand, debug,
and extend.
