KasarLabs
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 47 additions & 0 deletions
diff --git a/API_DOCUMENTATION.md b/API_DOCUMENTATION.md
Lines changed: 3 additions & 1 deletion
diff --git a/packages/agents/src/types/index.ts b/packages/agents/src/types/index.ts
Lines changed: 1 addition & 0 deletions
diff --git a/packages/ingester/README.md b/packages/ingester/README.md
Lines changed: 3 additions & 0 deletions
diff --git a/packages/ingester/__tests__/IngesterFactory.test.ts b/packages/ingester/__tests__/IngesterFactory.test.ts
Lines changed: 1 addition & 0 deletions
diff --git a/packages/ingester/src/IngesterFactory.ts b/packages/ingester/src/IngesterFactory.ts
Lines changed: 8 additions & 9 deletions
diff --git a/packages/ingester/src/ingesters/StarknetJSIngester.ts b/packages/ingester/src/ingesters/StarknetJSIngester.ts
Lines changed: 160 additions & 0 deletions
diff --git a/python/MAINTAINER_GUIDE.md b/python/MAINTAINER_GUIDE.md
Lines changed: 1 addition & 1 deletion
@@ -0,0 +1,47 @@
+# Agents Working Protocol
+
+This file documents conventions and checklists for making changes that affect the Cairo Coder agent system. It applies to the entire repository.
+
+## Adding a Documentation Source
+
+When adding a new documentation source (e.g., a new docs site or SDK), complete all of the following steps:
+
+1. TypeScript ingestion (packages/ingester)
+
+   - Create an ingester class extending `BaseIngester` or `MarkdownIngester` under `packages/ingester/src/ingesters/`.
+   - Register it in `packages/ingester/src/IngesterFactory.ts`.
+   - Ensure chunks carry correct metadata: `uniqueId`, `contentHash`, `sourceLink`, and `source`.
+   - Run `pnpm generate-embeddings` (or `generate-embeddings:yes`) to populate/update the vector store.
+
+2. Agents (TS)
+
+   - Add the new enum value to `packages/agents/src/types/index.ts` under `DocumentSource`.
+   - Verify that the Postgres vector store accepts the new `source` and filters on it (`packages/agents/src/db/postgresVectorStore.ts`).
+
+3. Retrieval Pipeline (Python)
+
+   - Add the new enum value to `python/src/cairo_coder/core/types.py` under `DocumentSource`.
+   - Ensure filtering by `metadata->>'source'` works with the new value in `python/src/cairo_coder/dspy/document_retriever.py`.
+   - Update the query processor resource descriptions in `python/src/cairo_coder/dspy/query_processor.py` (`RESOURCE_DESCRIPTIONS`).
+
+4. Optimized Program Files (Python) — required
+
+   - If the query processor or retrieval prompts are optimized via compiled DSPy programs, you must also update the optimized program artifacts so they reflect the new resource.
+   - Specifically, review and update: `python/optimizers/results/optimized_retrieval_program.json` (and any other relevant optimized files, e.g., `optimized_rag.json`, `optimized_mcp_program.json`).
+   - Regenerate these artifacts if your change affects prompt instructions, available resource lists, or selection logic.
+
+5. API and Docs
+
+   - Ensure the new source appears where appropriate (e.g., `/v1/agents` output and documentation tables):
+     - `API_DOCUMENTATION.md`
+     - `packages/ingester/README.md`
+     - Any user-facing lists of supported sources
+
+6. Quick Sanity Check
+   - Ingest a small subset (or run a dry-run) and verify: rows exist in the vector DB with the new `source`, links open correctly, and retrieval can filter by the new source.
+
+## Notes
+
+- Keep changes minimal and consistent with existing style.
+- Do not commit credentials or large artifacts; optimized program JSONs are small and versioned.
+- If you add new files that define agent behavior, document them here.
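The metadata requirement in step 1 of the checklist lends itself to an automated check. Below is a minimal sketch, assuming a simplified `ChunkMetadata` shape that carries only the four fields the checklist names. `findMissingMetadata` and the sample data are hypothetical and not part of the repository.

```typescript
// Hypothetical, simplified metadata shape: only the four fields named in the checklist.
type ChunkMetadata = {
  uniqueId?: string;
  contentHash?: string;
  sourceLink?: string;
  source?: string;
};

const REQUIRED_FIELDS = ['uniqueId', 'contentHash', 'sourceLink', 'source'] as const;

// Returns one message per chunk that has a missing or empty required field.
function findMissingMetadata(chunks: ChunkMetadata[]): string[] {
  const problems: string[] = [];
  chunks.forEach((chunk, index) => {
    const missing = REQUIRED_FIELDS.filter((field) => !chunk[field]);
    if (missing.length > 0) {
      problems.push(`chunk ${index}: missing ${missing.join(', ')}`);
    }
  });
  return problems;
}

// Made-up sample: the second chunk lacks a sourceLink.
const sample: ChunkMetadata[] = [
  { uniqueId: 'intro-0', contentHash: 'abc', sourceLink: 'https://example.com/intro', source: 'starknet_js' },
  { uniqueId: 'intro-1', contentHash: 'def', source: 'starknet_js' },
];
const problems = findMissingMetadata(sample);
console.log(problems);
```

A check like this could run against a small ingested subset as part of the quick sanity check in step 6, before any rows are written to the vector store.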
@@ -57,7 +57,8 @@ Lists every agent registered in Cairo Coder.
       "cairo_by_example",
       "openzeppelin_docs",
       "corelib_docs",
-      "scarb_docs"
+      "scarb_docs",
+      "starknet_js"
     ]
   },
   {
@@ -80,6 +81,7 @@ Lists every agent registered in Cairo Coder.
 | `openzeppelin_docs` | OpenZeppelin Cairo contracts documentation |
 | `corelib_docs`      | Cairo core library docs                    |
 | `scarb_docs`        | Scarb package manager documentation        |
+| `starknet_js`       | StarknetJS guides and SDK documentation    |
 
 ## Chat Completions
 
 
@@ -74,6 +74,7 @@ export enum DocumentSource {
   OPENZEPPELIN_DOCS = 'openzeppelin_docs',
   CORELIB_DOCS = 'corelib_docs',
   SCARB_DOCS = 'scarb_docs',
+  STARKNET_JS = 'starknet_js',
 }
 
 export type BookChunk = {
 
@@ -29,6 +29,9 @@ The ingester currently supports the following documentation sources:
 3. **Starknet Foundry** (`starknet_foundry`): Documentation for the Starknet Foundry testing framework
 4. **Cairo By Example** (`cairo_by_example`): Examples of Cairo programming
 5. **OpenZeppelin Docs** (`openzeppelin_docs`): OpenZeppelin documentation for Starknet
+6. **Core Library Docs** (`corelib_docs`): Cairo core library documentation
+7. **Scarb Docs** (`scarb_docs`): Scarb package manager documentation
+8. **StarknetJS Guides** (`starknet_js`): StarknetJS guides and tutorials
 
 ## Architecture
 
 
@@ -85,6 +85,7 @@ describe('IngesterFactory', () => {
         'openzeppelin_docs',
         'corelib_docs',
         'scarb_docs',
+        'starknet_js',
       ]);
     });
   });
 
@@ -58,6 +58,12 @@ export class IngesterFactory {
         const { ScarbDocsIngester } = require('./ingesters/ScarbDocsIngester');
         return new ScarbDocsIngester();
 
+      case 'starknet_js':
+        const {
+          StarknetJSIngester,
+        } = require('./ingesters/StarknetJSIngester');
+        return new StarknetJSIngester();
+
       default:
         throw new Error(`Unsupported source: ${source}`);
     }
@@ -69,14 +75,7 @@ export class IngesterFactory {
    * @returns Array of available document sources
    */
   public static getAvailableSources(): DocumentSource[] {
-    return [
-      DocumentSource.CAIRO_BOOK,
-      DocumentSource.STARKNET_DOCS,
-      DocumentSource.STARKNET_FOUNDRY,
-      DocumentSource.CAIRO_BY_EXAMPLE,
-      DocumentSource.OPENZEPPELIN_DOCS,
-      DocumentSource.CORELIB_DOCS,
-      DocumentSource.SCARB_DOCS,
-    ];
+    const sources = Object.values(DocumentSource);
+    return sources;
   }
 }
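The switch to `Object.values(DocumentSource)` relies on how TypeScript compiles string enums: the emitted object has no reverse mappings, so its values are exactly the enum members in declaration order. A small sketch using a trimmed local copy of the enum (the real one has more members):

```typescript
// Trimmed local copy of the enum, for illustration only.
enum DocumentSource {
  CORELIB_DOCS = 'corelib_docs',
  SCARB_DOCS = 'scarb_docs',
  STARKNET_JS = 'starknet_js',
}

// Mirrors the new getAvailableSources body from the hunk above.
function getAvailableSources(): DocumentSource[] {
  return Object.values(DocumentSource);
}

const available = getAvailableSources();
console.log(available);
```

This only holds while every member is a string. A numeric member would add its name to the compiled object as a reverse mapping, and `Object.values` would then return that name alongside the real values.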
@@ -0,0 +1,160 @@
+import * as path from 'path';
+import { exec as execCallback } from 'child_process';
+import { promisify } from 'util';
+import * as fs from 'fs/promises';
+import { BookConfig, BookPageDto, ParsedSection } from '../utils/types';
+import { MarkdownIngester } from './MarkdownIngester';
+import { DocumentSource, logger } from '@cairo-coder/agents';
+import { Document } from '@langchain/core/documents';
+import { BookChunk } from '@cairo-coder/agents/types/index';
+import { calculateHash } from '../utils/contentUtils';
+
+export class StarknetJSIngester extends MarkdownIngester {
+  private static readonly SKIPPED_DIRECTORIES = ['pictures', 'doc_scripts'];
+
+  constructor() {
+    const config: BookConfig = {
+      repoOwner: 'starknet-io',
+      repoName: 'starknet.js',
+      fileExtension: '.md',
+      chunkSize: 4096,
+      chunkOverlap: 512,
+    };
+
+    super(config, DocumentSource.STARKNET_JS);
+  }
+
+  protected getExtractDir(): string {
+    return path.join(__dirname, '..', '..', 'temp', 'starknet-js-guides');
+  }
+
+  protected async downloadAndExtractDocs(): Promise<BookPageDto[]> {
+    const extractDir = this.getExtractDir();
+    const repoUrl = `https://github.com/${this.config.repoOwner}/${this.config.repoName}.git`;
+    const exec = promisify(execCallback);
+
+    try {
+      // Clone the repository
+      // TODO: Consider sparse clone optimization for efficiency:
+      // git clone --depth 1 --filter=blob:none --sparse ${repoUrl} ${extractDir}
+      // cd ${extractDir} && git sparse-checkout set www/docs/guides
+      logger.info(`Cloning repository from ${repoUrl}...`);
+      await exec(`git clone ${repoUrl} ${extractDir}`);
+      logger.info('Repository cloned successfully');
+
+      // Navigate to the guides directory
+      const docsDir = path.join(extractDir, 'www', 'docs', 'guides');
+
+      // Process markdown files from the guides directory
+      const pages: BookPageDto[] = [];
+      await this.processDirectory(docsDir, docsDir, pages);
+
+      logger.info(
+        `Processed ${pages.length} markdown files from StarknetJS guides`,
+      );
+      return pages;
+    } catch (error) {
+      logger.error('Error downloading StarknetJS guides:', error);
+      throw new Error('Failed to download and extract StarknetJS guides');
+    }
+  }
+
+  private async processDirectory(
+    dir: string,
+    baseDir: string,
+    pages: BookPageDto[],
+  ): Promise<void> {
+    const entries = await fs.readdir(dir, { withFileTypes: true });
+
+    for (const entry of entries) {
+      const fullPath = path.join(dir, entry.name);
+
+      if (entry.isDirectory()) {
+        // Skip configured directories
+        if (StarknetJSIngester.SKIPPED_DIRECTORIES.includes(entry.name)) {
+          logger.debug(`Skipping directory: ${entry.name}`);
+          continue;
+        }
+        // Recursively process subdirectories
+        await this.processDirectory(fullPath, baseDir, pages);
+      } else if (entry.isFile() && entry.name.endsWith('.md')) {
+        // Read the markdown file
+        const content = await fs.readFile(fullPath, 'utf-8');
+
+        // Create relative path without extension for the name
+        const relativePath = path.relative(baseDir, fullPath);
+        const name = relativePath.replace(/\.md$/, '');
+
+        pages.push({
+          name,
+          content,
+        });
+
+        logger.debug(`Processed file: ${name}`);
+      }
+    }
+  }
+
+  protected parsePage(
+    content: string,
+    split: boolean = false,
+  ): ParsedSection[] {
+    // Strip frontmatter before parsing
+    const strippedContent = this.stripFrontmatter(content);
+    return super.parsePage(strippedContent, split);
+  }
+
+  public stripFrontmatter(content: string): string {
+    // Remove YAML frontmatter if present (delimited by --- at start and end)
+    const frontmatterRegex = /^---\n[\s\S]*?\n---\n?/;
+    return content.replace(frontmatterRegex, '').trimStart();
+  }
+
+  /**
+   * Create chunks from a single page, each with a source link to the page on GitHub.
+   * This overrides the default to attach a meaningful URL for UI display.
+   */
+  protected createChunkFromPage(
+    page_name: string,
+    page_content: string,
+  ): Document<BookChunk>[] {
+    const baseUrl =
+      'https://github.com/starknet-io/starknet.js/blob/main/www/docs/guides';
+    const pageUrl = `${baseUrl}/${page_name}.md`;
+
+    const localChunks: Document<BookChunk>[] = [];
+    const sanitizedContent = this.sanitizeCodeBlocks(
+      this.stripFrontmatter(page_content),
+    );
+
+    const sections = this.parsePage(sanitizedContent, true);
+
+    sections.forEach((section: ParsedSection, index: number) => {
+      // Reuse hashing and metadata shape from parent implementation by constructing Document directly
+      // Importantly, attach a stable page-level sourceLink for the UI
+      const content = section.content;
+      const title = section.title;
+      const uniqueId = `${page_name}-${index}`;
+
+      // Hash the section content with the shared `calculateHash` util, for parity with other ingesters
+      const contentHash = calculateHash(content);
+
+      localChunks.push(
+        new Document<BookChunk>({
+          pageContent: content,
+          metadata: {
+            name: page_name,
+            title,
+            chunkNumber: index,
+            contentHash,
+            uniqueId,
+            sourceLink: pageUrl,
+            source: this.source,
+          },
+        }),
+      );
+    });
+
+    return localChunks;
+  }
+}
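To make the frontmatter handling concrete, here is a standalone sketch that copies the regex from `stripFrontmatter` above into a free function. The sample pages are made up for illustration.

```typescript
// Standalone copy of the stripFrontmatter logic from the diff above.
function stripFrontmatter(content: string): string {
  const frontmatterRegex = /^---\n[\s\S]*?\n---\n?/;
  return content.replace(frontmatterRegex, '').trimStart();
}

// Made-up sample pages.
const withFrontmatter = '---\nsidebar_position: 2\n---\n\n# Intro\nBody';
const withoutFrontmatter = '# Intro\nBody';
const crlfFrontmatter = '---\r\nsidebar_position: 2\r\n---\r\n# Intro';

// Frontmatter and the blank line after it are removed.
const stripped = stripFrontmatter(withFrontmatter);
// Pages without frontmatter pass through unchanged.
const untouched = stripFrontmatter(withoutFrontmatter);
// The regex only matches LF line endings, so CRLF frontmatter is left in place.
const crlfResult = stripFrontmatter(crlfFrontmatter);
console.log({ stripped, untouched, crlfResult });
```

If the guides could ever contain CRLF line endings, using `\r?\n` in the pattern would be one way to cover them. Whether that case occurs in the upstream repository is not something this diff establishes.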
@@ -72,7 +72,7 @@ graph TD
 Cairo Coder's goal is to democratize Cairo development by providing an intelligent code generation service that:
 
 - Understands natural language queries (e.g., "Create an ERC20 token with minting").
-- Retrieves relevant documentation from sources like Cairo Book, Starknet Docs, Scarb, OpenZeppelin.
+- Retrieves relevant documentation from sources like Cairo Book, Starknet Docs, Scarb, OpenZeppelin, and StarknetJS.
 - Generates compilable Cairo code with explanations, following best practices.
 - Supports specialized agents (e.g., for Scarb config, Starknet deployment).
 - Is optimizable to improve accuracy over time using datasets like Starklings exercises.