
@libsql/client Memory leak when doing batch inserts to a file #333

@kevinfiol

Description

System info:

  • OS: Fedora Linux 42
  • CPU: 12th Gen Intel® Core™ i5-12400F (12)
  • Node: 24.11.1
  • @libsql/client: 0.15.15

Steps to reproduce:

  1. Create the following table:
create table if not exists user_indexed (
  id integer primary key autoincrement,
  first_name text not null,
  last_name text not null,
  email text not null,
  birthday text,
  is_pro integer not null default 0,
  deleted_at text,
  created_at text default CURRENT_TIMESTAMP,
  updated_at text
) strict;
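For completeness, a minimal sketch of applying this schema through the same client (assuming the same file:db.sqlite database the seed script below uses):

import { createClient } from '@libsql/client';

const db = createClient({ url: 'file:db.sqlite' });

// apply the schema above before running the seed script
await db.execute(`
  create table if not exists user_indexed (
    id integer primary key autoincrement,
    first_name text not null,
    last_name text not null,
    email text not null,
    birthday text,
    is_pro integer not null default 0,
    deleted_at text,
    created_at text default CURRENT_TIMESTAMP,
    updated_at text
  ) strict;
`);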
  2. Run the following seed script (@faker-js/faker is used to generate random data):
import { memoryUsage } from 'node:process';
import { createClient } from '@libsql/client';
import { faker } from '@faker-js/faker';

const MEMORY_LIMIT = 2_147_483_648; // 2GB
const USER_COUNT = 500_000;
const BATCH_SIZE = 20_000;

const db = createClient({
  url: 'file:db.sqlite'
});

// set pragmas
await db.execute('pragma journal_mode = wal;');

// generate user data
const users = [];
for (let i = 0; i < USER_COUNT; i++) {
  const firstName = faker.person.firstName();
  const lastName = faker.person.lastName();
  const email = faker.internet.email({ firstName, lastName }).toLowerCase();
  const birthday = faker.date.birthdate().toISOString();
  const is_pro = faker.number.int({ min: 0, max: 1 });

  const today = (new Date()).toISOString();
  const created_at = faker.date.between({ from: '2010-01-01', to: today }).toISOString();
  const updated_at = faker.date.between({ from: created_at, to: today }).toISOString();

  const is_deleted = faker.datatype.boolean();
  const deleted_at = is_deleted ? faker.date.between({ from: updated_at, to: today }).toISOString() : null;

  users.push({
    firstName,
    lastName,
    email,
    birthday,
    is_pro,
    deleted_at,
    created_at,
    updated_at
  });
}

// create insert statements
const stmts = [];
for (const user of users) {
  stmts.push({
    sql: `
      insert into user_indexed (
        first_name,
        last_name,
        email,
        birthday,
        is_pro,
        deleted_at,
        created_at,
        updated_at
      ) values (
        :firstName,
        :lastName,
        :email,
        :birthday,
        :is_pro,
        :deleted_at,
        :created_at,
        :updated_at
      )
    `,
    args: user
  });
}

try {
  console.time('seeding');

  let count = 0;
  while (stmts.length > 0) {
    const memUsageInBytes = memoryUsage.rss();
    console.log(`memory usage: ${formatBytes(memUsageInBytes)}`);
    if (memUsageInBytes >= MEMORY_LIMIT) {
      console.error('exceeded memory limit. exiting');
      process.exit(1);
    }

    const batch = stmts.splice(0, BATCH_SIZE);
    await db.batch(batch);

    count += BATCH_SIZE;
    console.log(`seeded ${count} rows...`);
  }

  console.log('DB has been seeded');
  console.timeEnd('seeding');
} catch (e) {
  throw e;
}

function formatBytes(bytes) {
  const kb = 1024;
  const mb = kb * 1024;
  const gb = mb * 1024;

  if (bytes < kb) {
    return bytes + ' B';
  } else if (bytes < mb) {
    return (bytes / kb).toFixed(2) + ' KB';
  } else if (bytes < gb) {
    return (bytes / mb).toFixed(2) + ' MB';
  } else {
    return (bytes / gb).toFixed(2) + ' GB';
  }
}
  3. Track the memory usage. From my results, it seems to balloon with every batch of insertions:
$ node scripts/seed-leak.js 
memory usage: 372.71 MB
seeded 20000 rows...
memory usage: 483.02 MB
seeded 40000 rows...
memory usage: 623.32 MB
seeded 60000 rows...
memory usage: 734.32 MB
seeded 80000 rows...
memory usage: 852.62 MB
seeded 100000 rows...
memory usage: 963.37 MB
seeded 120000 rows...
memory usage: 1.06 GB
seeded 140000 rows...
memory usage: 1.18 GB
seeded 160000 rows...
memory usage: 1.28 GB
seeded 180000 rows...
memory usage: 1.40 GB
seeded 200000 rows...
memory usage: 1.51 GB
seeded 220000 rows...
memory usage: 1.63 GB
seeded 240000 rows...
memory usage: 1.75 GB
seeded 260000 rows...
memory usage: 1.85 GB
seeded 280000 rows...
memory usage: 1.97 GB
seeded 300000 rows...
memory usage: 2.08 GB
exceeded memory limit. exiting

Some observations that may be important:

  • I did not experience the memory leak when replacing @libsql/client with better-sqlite3.
  • I did not experience the memory leak when replacing @libsql/client with libsql, which has a better-sqlite3-compatible API (however, it was considerably slower than better-sqlite3).
  • In both cases, I wrapped inserts in transactions.
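For reference, a minimal sketch of what the better-sqlite3 comparison can look like, with each batch of inserts wrapped in a single transaction (the `users` array and BATCH_SIZE are assumed to be built the same way as in the seed script above; the libsql package accepts essentially the same calls):

import Database from 'better-sqlite3';

const db = new Database('db.sqlite');
db.pragma('journal_mode = WAL');

const insert = db.prepare(`
  insert into user_indexed (
    first_name, last_name, email, birthday,
    is_pro, deleted_at, created_at, updated_at
  ) values (
    @firstName, @lastName, @email, @birthday,
    @is_pro, @deleted_at, @created_at, @updated_at
  )
`);

// wrap one batch of inserts in a single transaction
const insertBatch = db.transaction((rows) => {
  for (const row of rows) insert.run(row);
});

// `users` and BATCH_SIZE as defined in the seed script above
while (users.length > 0) {
  insertBatch(users.splice(0, BATCH_SIZE));
}

Here db.transaction turns each batch into one implicit BEGIN/COMMIT, which is the approach better-sqlite3 recommends for bulk inserts.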
