change calls to glob.glob to new dataKrash function #274

DoctorPresto · 2025-08-04T01:38:25Z

dataKrash is a convenience function built on os.scandir. It produces the same results as glob.glob, but it is roughly 25x faster (0.24 s vs 5.93 s average) and about 3x more memory efficient (0.06 MB vs 0.16 MB peak). Rather than recursively searching the entire directory once per pattern, it indexes the tree once and builds a dict that maps each requested file type to a list of matching files.
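To make the single-pass idea concrete, here is a minimal sketch of an os.scandir indexer. `index_by_extension` is a hypothetical stand-in for dataKrash. Its name, signature and return shape are assumptions, not the PR's actual code. It visits each directory exactly once and tests every filename against all requested suffixes. It skips dot-prefixed names and lets one file land in several buckets, which mirrors how separate `glob.glob` calls would behave. Symlinked directories are not followed, which is an assumption of this sketch.

```python
import os


def index_by_extension(root, extensions):
    """Walk `root` once and bucket file paths by the requested suffixes.

    Hypothetical stand-in for dataKrash: name, signature and return shape
    are illustrative, not the PR's actual code.
    """
    found = {ext: [] for ext in extensions}
    stack = [root]  # explicit stack instead of recursion: one scandir per directory
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                # glob's "*" and "**" skip dot-prefixed names by default; do the same.
                if entry.name.startswith("."):
                    continue
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                    continue
                # Like one glob per pattern, a file can land in several buckets:
                # "x.anims.glb" matches both ".glb" and ".anims.glb".
                for ext in found:
                    if entry.name.endswith(ext):
                        found[ext].append(entry.path)
    for paths in found.values():
        paths.sort()  # deterministic order; glob makes no ordering promise either
    return found
```

Each list in the returned dict can then be consumed the same way the separate `glob.glob` results were, for example `index[".ent.json"]`.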

Results from benchmarking on a small folder of sectors, compared with the current glob.glob implementation:

Benchmarking: dataKrash(x5)
  Run 1: 0.2427s, Peak Memory: 0.06 MB
  Run 2: 0.2404s, Peak Memory: 0.06 MB
  Run 3: 0.2372s, Peak Memory: 0.06 MB
  Run 4: 0.2423s, Peak Memory: 0.06 MB
  Run 5: 0.2374s, Peak Memory: 0.06 MB

📊 Average Time: 0.2400s
📦 Average Peak Memory: 0.06 MB
  .glb: 177 files
  .mesh.json: 0 files
  .app.json: 27 files
  .anims.json: 0 files
  .ent.json: 89 files
  .anims.glb: 0 files
  .streamingsector.json: 8 files
  .rig.json: 0 files
  .phys.json: 0 files

Benchmarking: glob.glob (x5)
  Run 1: 6.0052s, Peak Memory: 0.17 MB
  Run 2: 5.9445s, Peak Memory: 0.16 MB
  Run 3: 5.8777s, Peak Memory: 0.16 MB
  Run 4: 5.8439s, Peak Memory: 0.16 MB
  Run 5: 5.9775s, Peak Memory: 0.16 MB

📊 Average Time: 5.9298s
📦 Average Peak Memory: 0.16 MB
  .glb: 177 files
  .mesh.json: 0 files
  .app.json: 27 files
  .anims.json: 0 files
  .ent.json: 89 files
  .anims.glb: 0 files
  .streamingsector.json: 8 files
  .rig.json: 0 files
  .phys.json: 0 files
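The thread doesn't include the benchmark harness. The "Peak Memory" figures suggest something like `tracemalloc`, but that is an assumption. A minimal sketch that prints output in the same shape might look like the following. `tracemalloc` only tracks Python-level allocations, and running it alongside the timer adds overhead to both approaches.

```python
import time
import tracemalloc


def benchmark(label, fn, *args, runs=5):
    """Time `fn(*args)` `runs` times and report the peak traced allocation per run."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    times, peaks = [], []
    print(f"Benchmarking: {label} (x{runs})")
    for i in range(1, runs + 1):
        tracemalloc.start()
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        times.append(elapsed)
        peaks.append(peak / (1024 * 1024))
        print(f"  Run {i}: {elapsed:.4f}s, Peak Memory: {peaks[-1]:.2f} MB")
    avg_time = sum(times) / runs
    avg_peak = sum(peaks) / runs
    print(f"Average Time: {avg_time:.4f}s")
    print(f"Average Peak Memory: {avg_peak:.2f} MB")
    return result, avg_time, avg_peak
```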


Simarilius-uk · 2025-08-04T16:29:19Z

I don't understand how glob sucks so badly on your machine. I'm testing it on a project here and it's parsing >4x as many files in 1.1 sec. Your code's still faster, but not by the margin you're seeing.
Is this safe against the file-path issues I was using escaped paths for?

DoctorPresto · 2025-08-04T20:43:47Z

> I don't understand how glob sucks so badly on your machine. I'm testing it on a project here and it's parsing >4x as many files in 1.1 sec. Your code's still faster, but not by the margin you're seeing.

Yeah, I cherry-picked the benchmark folder a bit. Still, glob's performance hit comes from re-scanning the whole tree for every pattern. This version walks the directory once and filters per file, so it scales much better on larger structures, where glob falls down.
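For comparison, the per-pattern approach being replaced probably looks something like the sketch below. It is a hypothetical reconstruction, since the plugin's actual glob calls aren't shown in this thread. Each `glob.glob(..., recursive=True)` call traverses the full tree again. With the nine suffixes in the benchmark, that means nine full traversals, versus one for a single-pass index.

```python
import glob
import os


def index_with_glob(root, extensions):
    """One recursive glob per extension: each call re-walks the whole tree.

    Hypothetical reconstruction of the per-pattern approach, for comparison only.
    """
    found = {}
    for ext in extensions:
        # Escape the root so brackets etc. in folder names aren't read as wildcards.
        pattern = os.path.join(glob.escape(root), "**", "*" + ext)
        found[ext] = sorted(glob.glob(pattern, recursive=True))
    return found
```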

> Is this safe against the file-path issues I was using escaped paths for?

I tried to make sure it covers everything we were relying on: proper normalization for cross-platform and Unicode safety (unicodedata.normalize), non-ASCII names, emojis in paths (for some reason), filenames with spaces, etc. The only edge case that might still need caution is escaped paths, but as long as we're passing them in as raw strings (r""), we should be good.
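A small sketch of the two path concerns discussed here, using a hypothetical folder named `sector[1]`. With glob, `[1]` in a path is a character class, so an unescaped pattern silently looks for `sector1` instead. That is what `glob.escape` guards against. Raw strings only change how backslashes are read in Python source; they don't stop glob from treating `[ ] * ?` as wildcards. `os.scandir` takes a literal path, so no escaping is involved. The `same_name` helper is also hypothetical. It shows NFC normalization making a precomposed "é" equal to "e" plus a combining accent.

```python
import glob
import os
import tempfile
import unicodedata


def same_name(a, b):
    """Compare names after NFC normalization (precomposed vs decomposed accents)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def demo():
    """Count .json matches in a folder literally named 'sector[1]' three ways."""
    with tempfile.TemporaryDirectory() as root:
        sector_dir = os.path.join(root, "sector[1]")
        os.mkdir(sector_dir)
        open(os.path.join(sector_dir, "a.streamingsector.json"), "w").close()

        # Unescaped: "[1]" is a character class, so glob looks under "sector1".
        raw = glob.glob(os.path.join(root, "sector[1]", "*.json"))
        # Escaped: the brackets are matched literally.
        escaped = glob.glob(
            os.path.join(glob.escape(root), glob.escape("sector[1]"), "*.json")
        )
        # os.scandir opens the literal path; no pattern syntax at all.
        with os.scandir(sector_dir) as it:
            scanned = [e.name for e in it if e.name.endswith(".json")]
        return len(raw), len(escaped), len(scanned)
```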

Simarilius-uk · 2025-08-05T10:50:31Z

Can you check it against the issue that #147 showed?

DoctorPresto added 2 commits August 3, 2025 21:29

use datakrash in ent and sectors

76138bb


automatically set the fps to 30 when importing an animation

f927872

DoctorPresto requested a review from Simarilius-uk August 4, 2025 01:38

DoctorPresto added the enhancement New feature or request label Aug 4, 2025

DoctorPresto added 2 commits August 4, 2025 00:15

fix typo and actually put a path

afc8082

don't accidentally erase all of the plugin

332964c

Merge branch 'main' into filehandlers

78de3aa

2 participants

change calls to glob.glob to new dataKrash function #274

Are you sure you want to change the base?

change calls to glob.glob to new dataKrash function #274

Uh oh!

Conversation

DoctorPresto commented Aug 4, 2025

Uh oh!

Simarilius-uk commented Aug 4, 2025

Uh oh!

DoctorPresto commented Aug 4, 2025

Uh oh!

Simarilius-uk commented Aug 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants