Commit 882a79d

sestinj, Patrick-Erichsen, devbyjonah, zcf0508, and TenDRILLL authored
Dev (#1662)
* docs: add docs and schema for "OS" provider (#1536)
* ignore .env
* ✨ use and cache imports for autocomplete (#1456)
* ✨ use and cache imports for autocomplete
* fix tsc
* add voyage rerank-1
* import Handlebars
* feat: update onboarding w/ embeddings model (#1570)
* chore(gui): remove unused pages
* feat: add embeddings step
* feat: update styles
* feat: copy button updates
* fix: correct pull command for embed model
* fix: remove commented code
* fix: remove commented code
* feat: simplify copy btn props
* chore: rename onboarding selection event
* feat: add provider config
* fix: undo msg name
* remove dead code
* fix: invalid mode check
* fix: remove testing logic
* fix: fullscreen gui retains context when hidden, fixed fullscreen focusing (#1582)
* small UI tweaks
* media query
* feat: add best experience onboarding
* small fixes
* feat: add free trial card to onboarding (#1600)
* feat: add free trial card to onboarding
* add import
* chore: add telemetry for full screen toggle (#1618)
* rerank-lite-1
* basic tests for VS Code extension
* chore: onboarding metrics (#1626)
* fix: pageview tracking
* feat: add onboarding telemetry
* create single `onboardingStatus` type
* improved var naming
* remove console logs
* fix double adding of context providers
* fix cross-platform build validation
* Update troubleshooting.md (#1637)
* add back skip onboarding button
* fix free trial embeddings error
* Nate/indexing fixes (#1642)
* fix pausing of indexing
* don't send empty array to openai embeddings
* catch embeddings errors without stopping entire indexing process
* update version
* changelog
* Update troubleshooting.md (#1646)
* chore: reduce vscode extension bundle size (#1647)
* feat: make disabled state a tooltip (#1653)
* add content-type header to ollama /api/show req
* support legacy OpenAI formatted servers
* Tests for indexing + follow all .gitignore syntax (#1661)
* cleaner indexing progress updates messages
* chunking tests
* first round of testing for walkDir in .ts
* few more tests
* swap fs with ide
* clean up dead code
* replace traverseDirectory
* fix listFolders
* smoother indexing updates for chunking
* ide pathSetp
* absolute paths test
* fix path sep error with abs paths on windows
* clean up tests
* feat: Client Certificate Options Support (#1658)
* feat: support client certificate authentication options
* docs: support client certificate authentication options
* chore: update package.json
* docs: move clientCertificate to it's own example
* update config_schema.json with client cert options
* Add support for the HuggingFace Text Embeddings Inference server (#1657) Co-authored-by: Rob Leidle <[email protected]>
* update package.json version

---------

Co-authored-by: Patrick Erichsen <[email protected]>
Co-authored-by: Jonah Wagner <[email protected]>
Co-authored-by: 华丽 <[email protected]>
Co-authored-by: Ten <[email protected]>
Co-authored-by: Rob Leidle <[email protected]>
Co-authored-by: Rob Leidle <[email protected]>
1 parent d69b830 commit 882a79d

48 files changed, with 1094 additions and 244 deletions.

.prompts/test.prompt renamed to .prompts/jest.prompt

Lines changed: 2 additions & 1 deletion
@@ -16,4 +16,5 @@ Write unit tests for the above selected code, following each of these instructio
 - The tests should be complete and sophisticated
 - Give the tests just as chat output, don't edit any file
 - Don't explain how to set up `jest`
-- Write a single code block, making sure to label with the language being used (e.g. "```typscript")
+- Write a single code block, making sure to label with the language being used (e.g. "```typscript")
+- Do not under any circumstances mock any functions or modules

.vscode/launch.json

Lines changed: 11 additions & 5 deletions
@@ -22,15 +22,21 @@
         "CONTINUE_GLOBAL_DIR": "${workspaceFolder}/binary/.continue"
       }
     },
+
     {
+      "name": "Debug Jest Tests",
       "type": "node",
       "request": "launch",
-      "name": "Jest All",
-      "program": "${workspaceFolder}/core/node_modules/.bin/jest",
-      "args": ["--runInBand"],
+      "runtimeArgs": [
+        "--inspect-brk",
+        "${workspaceRoot}/core/node_modules/.bin/jest",
+        "${fileBasenameNoExtension}",
+        "--runInBand",
+        "--config",
+        "${workspaceRoot}/core/jest.config.js"
+      ],
       "console": "integratedTerminal",
-      "internalConsoleOptions": "neverOpen",
-      "disableOptimisticBPs": true
+      "internalConsoleOptions": "neverOpen"
     },
     {
       "type": "chrome",

core/config/promptFile.ts

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,7 @@ import Handlebars from "handlebars";
 import path from "path";
 import * as YAML from "yaml";
 import type { IDE, SlashCommand } from "..";
+import { walkDir } from "../indexing/walkDir";
 import { stripImages } from "../llm/countTokens.js";
 import { renderTemplatedString } from "../llm/llms/index.js";
 import { getBasename } from "../util/index.js";
@@ -18,7 +19,7 @@ export async function getPromptFiles(
     return [];
   }
 
-  const paths = await ide.listWorkspaceContents(dir, false);
+  const paths = await walkDir(dir, ide, { ignoreFiles: [] });
   const results = paths.map(async (path) => {
     const content = await ide.readFile(path);
     return { path, content };
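
The hunk above replaces `ide.listWorkspaceContents` with `walkDir(dir, ide, { ignoreFiles: [] })`. The body of `walkDir` is not among the visible hunks, so the following is only a minimal sketch of the general idea: a walk over a directory-listing callback that skips ignored entries. `DirLister`, `walkDirSketch`, and `ignoreNames` are hypothetical names, the lister is synchronous and in-memory for brevity, and matching is by exact name only, whereas the commit message says the real code follows full .gitignore syntax.

```typescript
// Hypothetical stand-in for the IDE's directory listing: returns
// [name, isDirectory] pairs for a single directory.
type DirLister = (dir: string) => Array<[string, boolean]>;

interface WalkSketchOptions {
  // Exact entry names to skip. Assumption: the real walkDir applies
  // .gitignore rules instead of plain name matching.
  ignoreNames: string[];
}

function walkDirSketch(
  root: string,
  listDir: DirLister,
  options: WalkSketchOptions = { ignoreNames: [".git", "node_modules"] },
  sep = "/",
): string[] {
  const files: string[] = [];
  const stack: string[] = [root];
  while (stack.length > 0) {
    const dir = stack.pop()!;
    for (const [name, isDirectory] of listDir(dir)) {
      if (options.ignoreNames.includes(name)) {
        continue; // skips ignored files and whole ignored subtrees
      }
      const full = `${dir}${sep}${name}`;
      if (isDirectory) {
        stack.push(full);
      } else {
        files.push(full);
      }
    }
  }
  return files;
}

// In-memory tree used only for illustration.
const tree: Record<string, Array<[string, boolean]>> = {
  "/repo": [["src", true], ["node_modules", true], ["README.md", false]],
  "/repo/src": [["index.ts", false]],
  "/repo/node_modules": [["dep.js", false]],
};
const lister: DirLister = (dir) => tree[dir] ?? [];

const withDefaults = walkDirSketch("/repo", lister).sort();
const withoutIgnores = walkDirSketch("/repo", lister, { ignoreNames: [] }).sort();
```

Passing an empty ignore list, as the prompt-file loader does with `ignoreFiles: []`, returns every file under the root in this sketch.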

core/config/types.ts

Lines changed: 1 addition & 1 deletion
@@ -385,7 +385,6 @@ declare global {
       stackDepth: number,
     ): Promise<string[]>;
     getAvailableThreads(): Promise<Thread[]>;
-    listWorkspaceContents(directory?: string, useGitIgnore?: boolean): Promise<string[]>;
     listFolders(): Promise<string[]>;
     getWorkspaceDirs(): Promise<string[]>;
     getWorkspaceConfigs(): Promise<ContinueRcJson[]>;
@@ -639,6 +638,7 @@ declare global {
   }
 
   export type EmbeddingsProviderName =
+    | "huggingface-tei"
     | "transformers.js"
     | "ollama"
     | "openai"

core/context/providers/FileContextProvider.ts

Lines changed: 9 additions & 4 deletions
@@ -5,7 +5,12 @@ import {
   ContextSubmenuItem,
   LoadSubmenuItemsArgs,
 } from "../../index.js";
-import { getBasename, groupByLastNPathParts, getUniqueFilePath } from "../../util/index.js";
+import { walkDir } from "../../indexing/walkDir.js";
+import {
+  getBasename,
+  getUniqueFilePath,
+  groupByLastNPathParts,
+} from "../../util/index.js";
 import { BaseContextProvider } from "../index.js";
 
 const MAX_SUBMENU_ITEMS = 10_000;
@@ -40,12 +45,12 @@ class FileContextProvider extends BaseContextProvider {
     const workspaceDirs = await args.ide.getWorkspaceDirs();
     const results = await Promise.all(
       workspaceDirs.map((dir) => {
-        return args.ide.listWorkspaceContents(dir);
+        return walkDir(dir, args.ide);
       }),
     );
-    const files = results.flat().slice(-MAX_SUBMENU_ITEMS);
+    const files = results.flat().slice(-MAX_SUBMENU_ITEMS);
     const fileGroups = groupByLastNPathParts(files, 2);
-
+
     return files.map((file) => {
       return {
         id: file,

core/context/providers/FileTreeContextProvider.ts

Lines changed: 2 additions & 1 deletion
@@ -3,6 +3,7 @@ import {
   ContextProviderDescription,
   ContextProviderExtras,
 } from "../../index.js";
+import { walkDir } from "../../indexing/walkDir.js";
 import { splitPath } from "../../util/index.js";
 import { BaseContextProvider } from "../index.js";
 
@@ -43,7 +44,7 @@ class FileTreeContextProvider extends BaseContextProvider {
     const trees = [];
 
     for (const workspaceDir of workspaceDirs) {
-      const contents = await extras.ide.listWorkspaceContents(workspaceDir);
+      const contents = await walkDir(workspaceDir, extras.ide);
 
       const subDirTree: Directory = {
         name: splitPath(workspaceDir).pop() ?? "",

core/index.d.ts

Lines changed: 9 additions & 4 deletions
@@ -434,10 +434,6 @@ export interface IDE {
     stackDepth: number,
   ): Promise<string[]>;
   getAvailableThreads(): Promise<Thread[]>;
-  listWorkspaceContents(
-    directory?: string,
-    useGitIgnore?: boolean,
-  ): Promise<string[]>;
   listFolders(): Promise<string[]>;
   getWorkspaceDirs(): Promise<string[]>;
   getWorkspaceConfigs(): Promise<ContinueRcJson[]>;
@@ -482,6 +478,7 @@ export interface IDE {
 
   // Callbacks
   onDidChangeActiveTextEditor(callback: (filepath: string) => void): void;
+  pathSep(): Promise<string>;
 }
 
 // Slash Commands
@@ -667,6 +664,13 @@ export interface RequestOptions {
   headers?: { [key: string]: string };
   extraBodyProperties?: { [key: string]: any };
   noProxy?: string[];
+  clientCertificate?: ClientCertificateOptions;
+}
+
+export interface ClientCertificateOptions {
+  cert: string;
+  key: string;
+  passphrase?: string;
 }
 
 export interface StepWithParams {
@@ -722,6 +726,7 @@ export interface ModelDescription {
 }
 
 export type EmbeddingsProviderName =
+  | "huggingface-tei"
   | "transformers.js"
   | "ollama"
   | "openai"
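
The `ClientCertificateOptions` interface added in this file carries `cert`, `key`, and an optional `passphrase`. A rough sketch of how such options might be turned into TLS settings follows. It assumes that `cert` and `key` are file paths (the diff only declares them as strings), and `toTlsOptions`, `TlsOptionsSketch`, and the injected `readFile` callback are hypothetical names that do not appear in the commit.

```typescript
// Mirrors the interface added in core/index.d.ts.
interface ClientCertificateOptions {
  cert: string;
  key: string;
  passphrase?: string;
}

// Subset of TLS settings; the field names match the cert/key/passphrase
// options accepted by Node's tls and https modules.
interface TlsOptionsSketch {
  cert: string;
  key: string;
  passphrase?: string;
}

// Assumption: cert and key are file paths, so they are read through an
// injected reader, which keeps the sketch free of file-system access.
function toTlsOptions(
  options: ClientCertificateOptions,
  readFile: (path: string) => string,
): TlsOptionsSketch {
  const tls: TlsOptionsSketch = {
    cert: readFile(options.cert),
    key: readFile(options.key),
  };
  if (options.passphrase !== undefined) {
    tls.passphrase = options.passphrase;
  }
  return tls;
}

// Fake reader for illustration only.
const fakeFiles: Record<string, string> = {
  "/certs/client.pem": "CERT-PEM",
  "/certs/client.key": "KEY-PEM",
};
const tlsOptions = toTlsOptions(
  { cert: "/certs/client.pem", key: "/certs/client.key" },
  (path) => fakeFiles[path],
);
```

In a Node environment the resulting object could be passed to `new https.Agent(...)`, with the reader swapped for something like `fs.readFileSync(path, "utf8")`.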

core/indexing/LanceDbIndex.ts

Lines changed: 3 additions & 3 deletions
@@ -322,7 +322,7 @@ export class LanceDbIndex implements CodebaseIndex {
       accumulatedProgress += 1 / results.addTag.length / 3;
       yield {
         progress: accumulatedProgress,
-        desc: `Indexing ${path}`,
+        desc: `Indexing ${getBasename(path)}`,
         status: "indexing",
       };
     }
@@ -337,7 +337,7 @@ export class LanceDbIndex implements CodebaseIndex {
       accumulatedProgress += 1 / toDel.length / 3;
       yield {
         progress: accumulatedProgress,
-        desc: `Stashing ${path}`,
+        desc: `Stashing ${getBasename(path)}`,
         status: "indexing",
       };
     }
@@ -354,7 +354,7 @@ export class LanceDbIndex implements CodebaseIndex {
       accumulatedProgress += 1 / results.del.length / 3;
       yield {
         progress: accumulatedProgress,
-        desc: `Removing ${path}`,
+        desc: `Removing ${getBasename(path)}`,
         status: "indexing",
       };
     }
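
The three hunks above shorten the progress descriptions from full paths to file names. The real `getBasename` lives in the util module and is not shown in this diff, so the following is a hypothetical stand-in (`basenameSketch`) that illustrates the effect on both POSIX and Windows style paths.

```typescript
// Hypothetical helper; the real getBasename may behave differently.
function basenameSketch(filepath: string): string {
  // Split on either separator and drop empty segments (e.g. trailing slash).
  const parts = filepath.split(/[\\/]/).filter((part) => part.length > 0);
  return parts.length > 0 ? parts[parts.length - 1] : "";
}

// Builds a description in the same shape as the yielded `desc` field.
function progressDesc(action: string, filepath: string): string {
  return `${action} ${basenameSketch(filepath)}`;
}

const unixDesc = progressDesc("Indexing", "/home/user/project/src/index.ts");
const windowsDesc = progressDesc("Removing", "C:\\Users\\user\\project\\src\\index.ts");
```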

core/indexing/chunk/ChunkCodebaseIndex.ts

Lines changed: 24 additions & 1 deletion
@@ -94,6 +94,9 @@ export class ChunkCodebaseIndex implements CodebaseIndex {
       }
     }
 
+    const progressReservedForTagging = 0.3;
+    let accumulatedProgress = 0;
+
     // Compute chunks for new files
     const contents = await Promise.all(
       results.compute.map(({ path }) => this.readFile(path)),
@@ -111,8 +114,10 @@ export class ChunkCodebaseIndex implements CodebaseIndex {
         handleChunk(chunk);
       }
 
+      accumulatedProgress =
+        (i / results.compute.length) * (1 - progressReservedForTagging);
       yield {
-        progress: i / results.compute.length,
+        progress: accumulatedProgress,
         desc: `Chunking ${getBasename(item.path)}`,
         status: "indexing",
       };
@@ -134,6 +139,12 @@ export class ChunkCodebaseIndex implements CodebaseIndex {
       }
 
       markComplete([item], IndexResultType.AddTag);
+      accumulatedProgress += 1 / results.addTag.length / 4;
+      yield {
+        progress: accumulatedProgress,
+        desc: `Chunking ${getBasename(item.path)}`,
+        status: "indexing",
+      };
     }
 
     // Remove tag
@@ -150,6 +161,12 @@ export class ChunkCodebaseIndex implements CodebaseIndex {
         [tagString, item.cacheKey, item.path],
       );
       markComplete([item], IndexResultType.RemoveTag);
+      accumulatedProgress += 1 / results.removeTag.length / 4;
+      yield {
+        progress: accumulatedProgress,
+        desc: `Removing ${getBasename(item.path)}`,
+        status: "indexing",
+      };
     }
 
     // Delete
@@ -164,6 +181,12 @@ export class ChunkCodebaseIndex implements CodebaseIndex {
       ]);
 
       markComplete([item], IndexResultType.Delete);
+      accumulatedProgress += 1 / results.del.length / 4;
+      yield {
+        progress: accumulatedProgress,
+        desc: `Removing ${getBasename(item.path)}`,
+        status: "indexing",
+      };
     }
   }
 }
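
The progress arithmetic in these hunks can be checked in isolation. The sketch below reuses the constant `0.3` and the two formulas from the diff; the function names `chunkingProgress` and `perItemIncrement` are hypothetical and exist only for this illustration.

```typescript
const progressReservedForTagging = 0.3; // value taken from the diff

// Progress reported while chunking item i out of `total` items:
// the chunking phase is scaled into the first 1 - 0.3 = 0.7 of the bar.
function chunkingProgress(i: number, total: number): number {
  return (i / total) * (1 - progressReservedForTagging);
}

// Increment applied once per item in the add-tag, remove-tag, and delete loops.
function perItemIncrement(count: number): number {
  return 1 / count / 4;
}

const halfway = chunkingProgress(5, 10); // 0.5 * 0.7
const afterChunking = chunkingProgress(10, 10); // upper bound of the chunking phase

// One complete later loop over 8 items adds 8 * (1 / 8 / 4) = 0.25 in total.
let accumulated = afterChunking;
for (let n = 0; n < 8; n++) {
  accumulated += perItemIncrement(8);
}
```

Each later loop that has items contributes 0.25 in total, regardless of how many items it holds. If all three loops have items, their combined 0.75 is more than the 0.3 reserved, so a consumer of these updates may want to clamp the value to 1; whether that happens in practice depends on code outside the visible hunks.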

core/indexing/chunk/basic.ts

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@ export function* basicChunker(
   contents: string,
   maxChunkSize: number,
 ): Generator<ChunkWithoutID> {
+  if (contents.trim().length === 0) {
+    return;
+  }
+
   let chunkContent = "";
   let chunkTokens = 0;
   let startLine = 0;
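
The guard added here makes the generator finish without yielding anything when the input is empty or whitespace-only. A self-contained sketch of that behavior follows. Only the guard and the three variable names come from the diff; the rest of the loop is an assumption about how a line-based chunker could work, and `countTokensSketch` (whitespace-separated words) is a hypothetical stand-in for the real token counter.

```typescript
interface ChunkSketch {
  content: string;
  startLine: number;
  endLine: number;
}

// Assumption: whitespace-separated words stand in for real tokens.
const countTokensSketch = (text: string): number =>
  text.split(/\s+/).filter((word) => word.length > 0).length;

function* basicChunkerSketch(
  contents: string,
  maxChunkSize: number,
): Generator<ChunkSketch> {
  // Same guard as in the diff: whitespace-only input yields no chunks.
  if (contents.trim().length === 0) {
    return;
  }

  let chunkContent = "";
  let chunkTokens = 0;
  let startLine = 0;
  const lines = contents.split("\n");
  for (let currLine = 0; currLine < lines.length; currLine++) {
    const line = lines[currLine];
    const lineTokens = countTokensSketch(line);
    // Close the current chunk when this line would exceed the budget.
    if (chunkTokens + lineTokens > maxChunkSize && chunkContent.length > 0) {
      yield { content: chunkContent, startLine, endLine: currLine - 1 };
      chunkContent = "";
      chunkTokens = 0;
      startLine = currLine;
    }
    chunkContent += `${line}\n`;
    chunkTokens += lineTokens;
  }
  // Emit whatever is left as the final chunk.
  yield { content: chunkContent, startLine, endLine: lines.length - 1 };
}

const emptyChunks = Array.from(basicChunkerSketch("  \n\t\n", 10));
const twoChunks = Array.from(basicChunkerSketch("a b c\nd e f\ng", 4));
```

Without the guard, the whitespace-only input in this sketch would produce a single chunk containing nothing but blank lines.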

core/indexing/chunk/code.ts

Lines changed: 4 additions & 1 deletion
@@ -57,7 +57,10 @@ function collapseChildren(
   }
   code = code.slice(node.startIndex);
   let removedChild = false;
-  while (countTokens(code) > maxChunkSize && collapsedChildren.length > 0) {
+  while (
+    countTokens(code.trim()) > maxChunkSize &&
+    collapsedChildren.length > 0
+  ) {
     removedChild = true;
     // Remove children starting at the end - TODO: Add multiple chunks so no children are missing
     const childCode = collapsedChildren.pop()!;

core/indexing/embeddings/HuggingFaceTEIEmbeddingsProvider.ts

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+import fetch, { Response } from "node-fetch";
+import { EmbedOptions, FetchFunction } from "../..";
+import { withExponentialBackoff } from "../../util/withExponentialBackoff";
+import BaseEmbeddingsProvider from "./BaseEmbeddingsProvider";
+
+class HuggingFaceTEIEmbeddingsProvider extends BaseEmbeddingsProvider {
+  private maxBatchSize = 32;
+
+  static defaultOptions: Partial<EmbedOptions> | undefined = {
+    apiBase: "http://localhost:8080",
+    model: "tei",
+  };
+
+  constructor(options: EmbedOptions, fetch: FetchFunction) {
+    super(options, fetch);
+    // without this extra slash the last portion of the path will be dropped from the URL when using the node.js URL constructor
+    if (!this.options.apiBase?.endsWith("/")) {
+      this.options.apiBase += "/";
+    }
+    this.doInfoRequest().then(response => {
+      this.options.model = response.model_id;
+      this.maxBatchSize = response.max_client_batch_size;
+    });
+  }
+
+  async embed(chunks: string[]) {
+    const promises = [];
+    for (let i = 0; i < chunks.length; i += this.maxBatchSize) {
+      promises.push(this.doEmbedRequest(chunks.slice(i, i + this.maxBatchSize)));
+    }
+    const results = await Promise.all(promises);
+    return results.flat();
+  }
+
+  async doEmbedRequest(batch: string[]): Promise<number[][]> {
+    const resp = await withExponentialBackoff<Response>(() =>
+      this.fetch(new URL("embed", this.options.apiBase), {
+        method: "POST",
+        body: JSON.stringify({
+          inputs: batch
+        }),
+        headers: {
+          "Content-Type": "application/json",
+        }
+      }),
+    );
+    if (!resp.ok) {
+      const text = await resp.text();
+      const embedError = JSON.parse(text) as TEIEmbedErrorResponse;
+      if (!embedError.error_type || !embedError.error) {
+        throw new Error(text);
+      }
+      throw new TEIEmbedError(embedError);
+    }
+    return (await resp.json()) as number[][];
+  }
+
+  async doInfoRequest(): Promise<TEIInfoResponse> {
+    const resp = await withExponentialBackoff<Response>(() =>
+      this.fetch(new URL("info", this.options.apiBase), {
+        method: "GET",
+      }),
+    );
+    if (!resp.ok) {
+      throw new Error(await resp.text());
+    }
+    return (await resp.json()) as TEIInfoResponse;
+  }
+}
+
+class TEIEmbedError extends Error {
+  constructor(teiResponse: TEIEmbedErrorResponse) {
+    super(JSON.stringify(teiResponse));
+  }
+}
+
+type TEIEmbedErrorResponse = {
+  error: string
+  error_type: string
+}
+
+type TEIInfoResponse = {
+  model_id: string;
+  model_sha: string;
+  model_dtype: string;
+  model_type: {
+    embedding: {
+      pooling: string;
+    }
+  };
+  max_concurrent_requests: number;
+  max_input_length: number;
+  max_batch_tokens: number;
+  max_batch_requests: number;
+  max_client_batch_size: number;
+  auto_truncate: boolean;
+  tokenization_workers: number;
+  version: string;
+  sha: string;
+  docker_label: string;
+};
+
+export default HuggingFaceTEIEmbeddingsProvider;
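
Two details of the new provider are easy to verify on their own: the trailing-slash comment in the constructor and the batching loop in `embed`. The sketch below is a standalone illustration under stated assumptions, not the provider's code. `withTrailingSlash` and `toBatches` are hypothetical helper names, and the `/v1` path prefix is an invented example base URL chosen to make the URL behavior visible.

```typescript
// Ensure the base ends with "/" so a relative path is appended to it
// rather than replacing its last path segment.
function withTrailingSlash(apiBase: string): string {
  return apiBase.endsWith("/") ? apiBase : `${apiBase}/`;
}

// Split items into consecutive batches of at most maxBatchSize,
// using the same slice pattern as the embed() loop in the diff.
function toBatches<T>(items: T[], maxBatchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += maxBatchSize) {
    batches.push(items.slice(i, i + maxBatchSize));
  }
  return batches;
}

// Hypothetical base URL with a path prefix.
const base = "http://localhost:8080/v1";
const dropped = new URL("embed", base).href; // "v1" is replaced by "embed"
const kept = new URL("embed", withTrailingSlash(base)).href; // "embed" is appended after "v1/"
const batches = toBatches(["a", "b", "c", "d", "e"], 2);
```

With the default `apiBase` of `http://localhost:8080` there is no path segment to lose, so the slash only matters when the server sits behind a path prefix.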

core/indexing/embeddings/index.ts

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,7 @@ import { EmbeddingsProviderName } from "../../index.js";
 import BaseEmbeddingsProvider from "./BaseEmbeddingsProvider.js";
 import CohereEmbeddingsProvider from "./CohereEmbeddingsProvider.js";
 import FreeTrialEmbeddingsProvider from "./FreeTrialEmbeddingsProvider.js";
+import HuggingFaceTEIEmbeddingsProvider from "./HuggingFaceTEIEmbeddingsProvider.js";
 import OllamaEmbeddingsProvider from "./OllamaEmbeddingsProvider.js";
 import OpenAIEmbeddingsProvider from "./OpenAIEmbeddingsProvider.js";
 import TransformersJsEmbeddingsProvider from "./TransformersJsEmbeddingsProvider.js";
@@ -22,5 +23,7 @@ export const allEmbeddingsProviders: Record<
   cohere: CohereEmbeddingsProvider,
   // eslint-disable-next-line @typescript-eslint/naming-convention
   "free-trial": FreeTrialEmbeddingsProvider,
+  // eslint-disable-next-line @typescript-eslint/naming-convention
+  "huggingface-tei": HuggingFaceTEIEmbeddingsProvider,
   gemini: GeminiEmbeddingsProvider,
 };
