Fixed Mojo + benchmark script #1


Open · wants to merge 6 commits into main

Conversation

uberkael

@uberkael uberkael commented Apr 29, 2024

This is incorrect: the benchmarks only measure Python startup overhead and Mojo compilation overhead; the test programs themselves run for far less time than either.

I'm not an expert, but out of curiosity I updated the Mojo code and compiled it before running the hyperfine call, using a shell script, bench.sh.

As you can see, the times are all roughly the same, which means these LeetCode examples are too small to benchmark meaningfully, even though compiled Mojo is about 9x faster in this case.

❯ bench.sh
Benchmarking problem-1
Benchmark 1: python problem-1.py
  Time (mean ± σ):      71.0 ms ±   3.9 ms    [User: 39.1 ms, System: 32.3 ms]
  Range (min … max):    62.7 ms …  82.3 ms    45 runs
Benchmark 2: problem-1
  Time (mean ± σ):       9.3 ms ±   5.2 ms    [User: 2.2 ms, System: 8.4 ms]
  Range (min … max):     5.9 ms …  53.1 ms    417 runs
Summary
  problem-1 ran
    7.63 ± 4.28 times faster than python problem-1.py
Benchmarking problem-2
Benchmark 1: python problem-2.py
  Time (mean ± σ):      70.8 ms ±   3.4 ms    [User: 40.0 ms, System: 31.4 ms]
  Range (min … max):    63.5 ms …  78.1 ms    43 runs
Benchmark 2: problem-2
  Time (mean ± σ):       8.1 ms ±   1.8 ms    [User: 2.0 ms, System: 7.9 ms]
  Range (min … max):     6.0 ms …  18.2 ms    350 runs
Summary
  problem-2 ran
    8.76 ± 2.00 times faster than python problem-2.py
Benchmarking problem-3
Benchmark 1: python problem-3.py
  Time (mean ± σ):      70.8 ms ±   4.6 ms    [User: 37.4 ms, System: 34.2 ms]
  Range (min … max):    61.9 ms …  83.7 ms    45 runs
Benchmark 2: problem-3
  Time (mean ± σ):       8.5 ms ±   2.1 ms    [User: 2.3 ms, System: 7.8 ms]
  Range (min … max):     6.1 ms …  20.4 ms    429 runs
Summary
  problem-3 ran
    8.33 ± 2.16 times faster than python problem-3.py
Benchmarking problem-4
Benchmark 1: python problem-4.py
  Time (mean ± σ):      73.9 ms ±   5.7 ms    [User: 41.9 ms, System: 32.5 ms]
  Range (min … max):    64.7 ms …  87.4 ms    44 runs
Benchmark 2: problem-4
  Time (mean ± σ):       8.0 ms ±   1.7 ms    [User: 2.1 ms, System: 7.7 ms]
  Range (min … max):     6.1 ms …  19.0 ms    420 runs
Summary
  problem-4 ran
    9.24 ± 2.10 times faster than python problem-4.py
Benchmarking problem-5
Benchmark 1: python problem-5.py
  Time (mean ± σ):      72.0 ms ±   3.7 ms    [User: 38.5 ms, System: 33.9 ms]
  Range (min … max):    64.1 ms …  81.2 ms    41 runs
Benchmark 2: problem-5
  Time (mean ± σ):       8.6 ms ±   2.1 ms    [User: 2.1 ms, System: 8.0 ms]
  Range (min … max):     6.0 ms …  19.7 ms    370 runs
Summary
  problem-5 ran
    8.37 ± 2.13 times faster than python problem-5.py

@uberkael uberkael force-pushed the main branch 3 times, most recently from b8491b9 to fef4d22 Compare April 29, 2024 21:52
@SaadBazaz
Member

Hey @uberkael ,
Thank you for the pull request. I'm reviewing it, and would love to improve the benchmarks for the community.

@SaadBazaz SaadBazaz requested review from SohaibBazaz and SaadBazaz May 5, 2024 14:54
@SohaibBazaz
Collaborator

Hello @uberkael, thanks for the review!
As a beginner, based on my reading, it appears that Mojo must be compiled before it can be executed. So the issue I ran into, measuring only the compilation overhead, might be due to the way I conducted the benchmarking, correct?

@uberkael
Author

uberkael commented May 6, 2024

Yes.
But those examples also run in under a millisecond, so the measured work is almost invisible next to the overhead.
Try executing them multiple times; I will update the PR.
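One way to do that is to repeat the call in-process so the measured work dominates startup and compilation cost. A minimal sketch, assuming 2024-era Mojo where time.now() returns a nanosecond counter; lengthOfLastWord here is a trivial stand-in, not the PR's code:

```mojo
# Hedged sketch: time many calls in a single process so process
# startup and compilation overhead are excluded from the measurement.
from time import now  # assumed: now() returns nanoseconds

fn lengthOfLastWord(s: String) -> Int:
    # trivial stand-in for a real solution
    return len(s)

fn main():
    var start = now()
    for _ in range(100000):
        _ = lengthOfLastWord("Hello World")
    var elapsed = now() - start
    print("avg ns per call:", elapsed // 100000)
```

Dividing by the iteration count gives a per-call figure that hyperfine's whole-process timing cannot resolve for sub-millisecond programs.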

while i < len(s):
    if i < len(s) - 1 and py.abs(ord(s[i]) - ord(s[i + 1])) == 32:
Collaborator

I could see why importing from Python was not necessary in my other code, but here this should work just fine, no?

@@ -1,22 +1,14 @@
from python import Python

def lengthOfLastWord(enterword: String):
Collaborator

@SohaibBazaz SohaibBazaz May 7, 2024


I'm trying to understand why you changed the defining keyword here from "def" to "fn". It worked fine with "def"; was there any reason for that?

Author


fn in Mojo allows arguments and return values to be explicitly typed, which enables the compiler to optimize more effectively.
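A minimal illustration of the difference (not the PR's code; behavior as of 2024-era Mojo):

```mojo
# def: arguments without annotations are dynamic, Python-style objects,
# resolved at runtime.
def add_dynamic(a, b):
    return a + b

# fn: arguments must be declared with types, so the compiler can check
# and optimize the call statically.
fn add_typed(a: Int, b: Int) -> Int:
    return a + b
```

With fn, the compiler knows the operand types at compile time and can emit a direct integer add instead of a dynamic dispatch.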

@ayghri

ayghri commented Jun 2, 2024

Thanks for the PR. I found it strange that Mojo was ~20x slower; I was just googling benchmarks and stumbled upon this. I've never used Mojo, and these results almost swayed me away from trying it.

These examples are definitely misleading because of the compilation overhead and the lack of optimizations.

@uberkael
Author

uberkael commented Sep 3, 2024

I updated the code and removed all print() statements.
It appears that Mojo's print function is still not well optimized and makes the code quite slow.
modular/max#975 (comment)

The results:

Summary
  problem-1.bin ran
    4.08 ± 0.25 times faster than mojo problem-1.mojo
    9.01 ± 0.58 times faster than python problem-1.py

  problem-2.bin ran
    4.72 ± 0.47 times faster than mojo problem-2.mojo
   10.30 ± 0.97 times faster than python problem-2.py

  problem-3.bin ran
    2.22 ± 0.08 times faster than python problem-3.py
    3.27 ± 0.10 times faster than mojo problem-3.mojo

  problem-4.bin ran
    3.53 ± 0.26 times faster than mojo problem-4.mojo
    4.97 ± 0.35 times faster than python problem-4.py

  problem-5.bin ran
    3.95 ± 0.46 times faster than mojo problem-5.mojo
    7.20 ± 0.64 times faster than python problem-5.py

[chart comparing the benchmark results above]

@ayghri @SaadBazaz @SohaibBazaz

@ayghri

ayghri commented Sep 9, 2024

Print statements are always going to slow down a program because of the I/O involved. The alternative is a buffered writer, but it seems Mojo doesn't have one yet.

@SaadBazaz
Member

We can exclude print statements from the raw benchmark and find a way to run them only before/after the timed section. @ayghri @uberkael
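A sketch of that idea in Mojo, with all I/O kept out of the timed loop; lengthOfLastWord is a trivial stand-in, not the PR's implementation:

```mojo
fn lengthOfLastWord(s: String) -> Int:
    # trivial stand-in for a real solution
    return len(s)

fn main():
    var total = 0
    # timed region: no print calls inside the loop
    for _ in range(100000):
        total += lengthOfLastWord("Hello World")
    # single print after the benchmark; consuming the sum also keeps
    # the calls from being optimized away entirely
    print("checksum:", total)
```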



fn main():
    for _ in range(100000):
        _ = lengthOfLastWord("Hello World")

You will need to write keep(lengthOfLastWord("Hello World")), otherwise lengthOfLastWord gets DCE'd (dead-code eliminated). For example:

from benchmark import keep

fn main():
    for _ in range(100000):
        keep(lengthOfLastWord("Hello World"))
