
Performance Bottleneck in nodejs-polars When Creating Multiple Expr Objects #265

@chenstarx

Have you tried the latest version of polars?

  • [yes]

What version of polars are you using?

0.0.15

What operating system are you using polars on?

MacOS 14.4.1 (M3 Pro)

What Node.js version are you using?

v20.12.1

Describe your bug.

I have encountered a significant performance issue when using the nodejs-polars library: creating many Expr objects takes considerably longer than it does in the Python version of polars.

What are the steps to reproduce the behavior?

To illustrate the issue, I conducted a performance test by generating one million Expr objects in both nodejs-polars and Python polars. The following code snippets demonstrate the test setup:

Python Code

import polars as pl
import time

start_time = time.time()

for _ in range(1000000):
    _ = pl.col("A") > pl.lit(1)

end_time = time.time()
print(f"Time taken in Python: {end_time - start_time} seconds")

Node.js Code

const pl = require('nodejs-polars');
console.time('Expr Creation');

for (let i = 0; i < 1000000; i++) {
  const expr = pl.col("A").gt(pl.lit(1));
}

console.timeEnd('Expr Creation');
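Because the full loop takes on the order of 1,000 seconds in Node.js, it can be quicker to time a smaller sample and report the average cost per call. The helper below is a minimal sketch, not part of nodejs-polars: microsPerCall is a hypothetical name, and it relies on the global performance.now() that Node.js provides from version 16 onward (so it is available on v20.12.1).

```javascript
// Hypothetical helper: average wall-clock cost of one call to `fn`, in microseconds.
// Uses the global `performance` object available in Node.js 16+.
function microsPerCall(fn, iterations = 10_000, warmup = 1_000) {
  // Warm-up runs give the JIT a chance to compile `fn` before timing starts.
  for (let i = 0; i < warmup; i++) fn();

  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;

  // Convert total milliseconds to microseconds per call.
  return (elapsedMs * 1000) / iterations;
}

// Example with a trivial stand-in function (no polars required).
const cost = microsPerCall(() => ({ op: "gt" }));
console.log(`${cost.toFixed(3)} µs per call`);
```

With nodejs-polars installed, passing () => pl.col("A").gt(pl.lit(1)) as fn measures the same expression as the loop above, but over 10,000 iterations instead of 1,000,000.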

What is the actual behavior?

Python polars: Approximately 7 seconds to create 1,000,000 Expr objects.
Node.js polars: Approximately 1,000 seconds to create the same number of Expr objects.

  • Each iteration of the for loop took approximately 1ms
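The two totals above can be turned into per-iteration costs with plain arithmetic on the reported figures (microsPerIteration is a hypothetical helper). Node.js comes out roughly 143 times slower per iteration, which matches the ~1 ms per iteration observation.

```javascript
// Convert a total run time (seconds) into microseconds per iteration.
function microsPerIteration(totalSeconds, iterations) {
  return (totalSeconds * 1e6) / iterations;
}

const N = 1_000_000;
const pythonUs = microsPerIteration(7, N);  // 7 µs per iteration
const nodeUs = microsPerIteration(1000, N); // 1000 µs = 1 ms per iteration
const slowdown = nodeUs / pythonUs;         // 1000 / 7, roughly 143x

console.log({ pythonUs, nodeUs, slowdown: slowdown.toFixed(1) });
```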

Impact

This performance discrepancy presents a significant bottleneck when performing operations that require frequent creation of Expr objects in nodejs-polars. It substantially limits the library's usability for large-scale data processing tasks in a Node.js environment.

What is the expected behavior?

The performance of creating Expr objects in nodejs-polars should be closer to, or ideally match, the performance in the Python version of polars.

Possible Reason

The issue might be caused by _Expr, which creates a new Expr object every time it is called. Each call takes about 0.5 ms on my laptop, which adds up to considerable time when it is executed a million times. Moreover, on my test dataset the actual computation over millions of rows is very fast; most of the time is spent creating the Expr objects.
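Assuming _Expr is a factory function that builds a new object literal on every call (the suggestion below to rewrite it as a class points that way, but it is an assumption here), each call also allocates a fresh function object for every method on that literal. The sketch below illustrates that allocation pattern with hypothetical stand-ins: makeExprFactory and ExprClass are not nodejs-polars code, and the real _Expr wraps a native binding and defines far more methods. It shows the pattern only; it does not prove this is where the 0.5 ms goes.

```javascript
// Hypothetical factory: every call evaluates a new object literal, so every
// method (gt, lt, alias) is a brand-new function object for each instance.
function makeExprFactory(node) {
  return {
    node,
    gt(other) { return makeExprFactory({ op: "gt", left: node, right: other.node }); },
    lt(other) { return makeExprFactory({ op: "lt", left: node, right: other.node }); },
    alias(name) { return makeExprFactory({ op: "alias", input: node, name }); },
  };
}

// Hypothetical class: methods are defined once on ExprClass.prototype and
// shared; each instance only stores its own `node`.
class ExprClass {
  constructor(node) { this.node = node; }
  gt(other) { return new ExprClass({ op: "gt", left: this.node, right: other.node }); }
  lt(other) { return new ExprClass({ op: "lt", left: this.node, right: other.node }); }
  alias(name) { return new ExprClass({ op: "alias", input: this.node, name }); }
}

const f1 = makeExprFactory({ op: "col", name: "A" });
const f2 = makeExprFactory({ op: "col", name: "B" });
const c1 = new ExprClass({ op: "col", name: "A" });
const c2 = new ExprClass({ op: "col", name: "B" });

console.log(f1.gt === f2.gt); // false: each factory call created its own method
console.log(c1.gt === c2.gt); // true: the method is shared via the prototype
```

The factory pays for one new closure per method per call, so its cost grows with the number of methods; the class version pays that cost once, when the class is defined.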

There might be two ways to solve this problem:

  1. Rewriting _Expr as a class and returning this from each expression method, which avoids the costly creation of a complex new JavaScript object on every call.
  2. Making Expr a mutable object whose attributes are updated in place, rather than creating a new object for each expression.
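The two suggestions above can be sketched together as a class whose methods update the instance in place and return this. MutableExpr, col and lit here are hypothetical stand-ins, not the nodejs-polars API, and the node objects only mimic an expression tree.

```javascript
// Hypothetical mutable expression: chained methods rewrite `this.node`
// and return `this`, so they allocate no new wrapper objects.
class MutableExpr {
  constructor(node) {
    this.node = node;
  }
  gt(other) {
    this.node = { op: "gt", left: this.node, right: other.node };
    return this;
  }
  alias(name) {
    this.node = { op: "alias", input: this.node, name };
    return this;
  }
}

// Hypothetical entry points; each one still creates a single wrapper.
const col = (name) => new MutableExpr({ op: "col", name });
const lit = (value) => new MutableExpr({ op: "lit", value });

const chained = col("A").gt(lit(1)).alias("a_gt_1");
console.log(chained.node.op); // "alias"

// Caveat: reusing an expression now changes it for every holder.
const base = col("A");
const isBig = base.gt(lit(1));
console.log(base === isBig); // true: `base` itself was rewritten
console.log(base.node.op);   // "gt", no longer a plain column reference
```

One trade-off to weigh: with in-place mutation, an expression that is reused (such as base above) is silently changed for every holder, whereas Python polars expressions behave as immutable values. An alternative that keeps immutability is a class whose prototype methods return new instances, so each instance holds only its data and the methods are not reallocated per object.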

Thanks for reading this issue. I hope my suggestions are helpful for the nodejs-polars library.
