Copilot AI commented Nov 5, 2025

Created a single, consolidated development guide to replace scattered documentation and provide contributors with a complete reference for working on Sparkling Water.

What's Added

DEVELOPMENT.md (1,126 lines) covering:

  • Architecture: ASCII diagrams showing H2OContext bridge layer, internal/external backend modes, and module responsibilities across Scala/Python/R APIs
  • Repository layout: Annotated directory tree explaining core (H2OContext, converters), ml (algorithm wrappers), py/r (language bindings), scoring, examples
  • Development setup: Prerequisites (Java 8/11, Spark, Gradle), environment variables (SPARK_HOME, MASTER), build commands, shell usage
  • Testing: ScalaTest/pytest frameworks, unit/integration/benchmark commands, test structure conventions
  • Code style: Scalafmt config (.scalafmt.conf), naming conventions, common patterns for data conversion and context initialization
  • Workflows: Git branching strategy, commit conventions, PR process, CI checks (Jenkins pipelines)
  • Debugging & troubleshooting: IDE setup, JVM options, common errors (port conflicts, memory, version mismatches), platform-specific notes
  • Extension patterns: Step-by-step guides for adding H2O algorithm wrappers, REST endpoints, configuration options, new modules
  • Security & performance: Authentication setup, secrets handling, memory management for Spark/H2O integration, quality gates

Key Sections

The guide provides text-based architecture diagrams showing data flow:

User Application (Scala/Python/R)
         ↓
    H2OContext (Core Bridge)
    • Data Conversion (Spark ↔ H2O)
    • Backend Management
         ↓
   Spark RDD/DataFrame ↔ H2O Frame

And practical developer workflows like:

# Make changes in core
./gradlew :sparkling-water-core:test
./gradlew :sparkling-water-core:integTest
./bin/run-example.sh AirlinesWithWeatherDemo
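
The workflow above could be wrapped in a small helper so the three steps run in order and stop at the first failure. This is only a sketch: the function names `run_step` and `run_core_checks` and the `DRY_RUN` variable are hypothetical and not part of the repository, and it assumes the commands are run from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: runs the core checks
# from the workflow above in order and stops at the first failure.

# Print a command, then run it unless DRY_RUN=1 is set.
run_step() {
  echo "==> $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# The three commands are copied verbatim from the workflow above.
# The && chain means a failing step skips the remaining ones.
run_core_checks() {
  run_step ./gradlew :sparkling-water-core:test &&
    run_step ./gradlew :sparkling-water-core:integTest &&
    run_step ./bin/run-example.sh AirlinesWithWeatherDemo
}
```

After sourcing the snippet, `DRY_RUN=1 run_core_checks` prints the three commands without executing them, and `run_core_checks` runs them for real.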

The guide consolidates information from the RST files under doc/src/site/sphinx/devel/ and adds architecture context, troubleshooting, and extension patterns that were not previously documented in a single location.

Original prompt

Create a single, well-structured DEVELOPMENT.md file that explains everything a contributor needs to know to work effectively on this project, including an architecture overview. The result should be written in clear, concise Markdown and tailored to THIS repository’s codebase and configuration files.
Produce a DEVELOPMENT.md that includes clear sections like this (adapt the titles if needed):

  1. Overview

    • Brief summary of what the project is and its purpose.
    • Primary technologies and high-level responsibilities (frontend / backend / services / libraries).
    • Link to README and any other key docs (CONTRIBUTING, CODE_OF_CONDUCT, etc.) if they exist.
  2. Architecture Overview

    • High-level description of the system architecture: main components, how they interact, and boundaries (e.g. frontend, backend, database, external APIs, queues, background jobs).
    • Describe the main modules / services / packages and what they are responsible for.
    • Mention where the core domain logic lives and where integration/infra logic lives.
    • Provide at least one text-based “diagram” (ASCII or Markdown list/tree) that shows:
      • Major components and their relationships.
      • Request/response or event flows (e.g. frontend → API → DB).
    • Call out any external systems (APIs, third-party integrations, cloud services).
    • Explain how configuration and secrets are handled (e.g. .env files, environment variables, config files).
  3. Repository Layout

    • Briefly describe the folder structure and what each major directory is for.
    • Highlight where to find:
      • Application entry points.
      • Core domain code.
      • Shared utilities/components.
      • Tests.
      • Scripts / tooling / infra / deployment files.
    • Use a code block tree view (e.g. tree style) for clarity, but only down to a useful depth.
  4. Getting Started for Development

    • Prerequisites (languages, runtimes, package managers, Docker, databases, etc.).
    • Setup instructions:
      • How to clone the repo.
      • How to install dependencies.
      • How to configure environment variables.
      • How to set up any required local services (DB, queues, external emulators).
    • Commands to:
      • Run the app locally (frontend, backend, or both).
      • Run dev servers with hot reload (if applicable).
    • Any first-time migration or bootstrap steps (e.g. seeding databases, running migrations).
  5. Running & Writing Tests

    • Which test frameworks are used and where tests live.
    • Commands to run:
      • Unit tests.
      • Integration tests.
      • End-to-end tests (if applicable).
    • How to run tests for a single package/module (if monorepo).
    • Any testing conventions (file naming, test data, fixtures).
    • How to run tests in CI locally (if possible via scripts or Docker).
  6. Code Style & Conventions

    • Languages and versions used (TypeScript/JS, Python, Go, etc.).
    • Linters, formatters, and static analysis tools (e.g. ESLint, Prettier, flake8, mypy).
    • Commands to run formatting and linting.
    • Key style or architectural conventions specific to this project:
      • Naming conventions (for files, classes, components, etc.).
      • Preferred patterns (e.g. hooks, services, repositories, dependency injection).
      • Any “do’s and don’ts” worth calling out.
  7. Branching, Workflow & Commit Practices

    • Git branching strategy (e.g. trunk-based, git-flow, feature branches).
    • How to create feature branches and open pull requests.
    • Commit message conventions (e.g. Conventional Commits).
    • Expected review process (who reviews, required approvals, checks that must pass).
    • Any labels or PR templates worth mentioning.
  8. Local Development Tips

    • Common dev workflows (e.g. “typical frontend workflow”, “working on backend APIs”, “updating DB schema”).
    • How to debug:
      • Recommended VS Code or IDE configurations, if inferable.
      • Common breakpoints or logging patterns.
    • Troubleshooting guide:
      • Common errors during setup or dev and how to fix them.
      • Known platform-specific issues (Mac/Windows/Linux).
  9. Database & Data Model (if applicable)

    • Brief description of the main data storage (SQL/NoSQL, ORM, migrations).
    • Where schema/migration files live and how to:
      • Create new migrations.
      • Apply migrations locally.
    • Any important data-related constraints or patterns.
  10. Security, Performance & Quality Considerations

    • Any security-sensitive parts of the code (auth, authZ, secrets).
    • Guidelines for handling secrets, tokens, and sensitive data.
    • Performance considerations: known bottlenecks, caching, rate limits, etc.
    • Any quality gates enforced in CI (coverage thresholds, required checks).
  11. CI/CD & Deployment (Contributor Perspective)

    • Briefly describe the CI pipeline (which provider is used, what checks run).
    • How PRs are validated (tests, linters, builds).
    • Any p...

Copilot AI changed the title from "[WIP] Add DEVELOPMENT.md for contributor guidelines" to "Add comprehensive DEVELOPMENT.md for contributors" Nov 5, 2025
Copilot AI requested a review from mmalohlava November 5, 2025 03:58