Mac M3 Any Model crashing #613

Open
@andrew-morris-rgs

I am able to run python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct --num_blocks 2 --max_disk_space=50G for a while, but it always eventually exits with an AssertionError: Span served by this server is not present in the DHT.

System info:

Apple M3
16 GB
pyenv python v3.12
zsh

Installation method: pipx install --python=${HOME}/.pyenv/versions/3.12.2/bin/python petals

Full stdout, including the other warnings and errors:

${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at runtime.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
  warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at crypto.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
  warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at p2pd.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
  warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at averaging.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
  warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at dht.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
  warnings.warn(
${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/google/protobuf/runtime_version.py:112: UserWarning: Protobuf gencode version 5.27.2 is older than the runtime version 5.28.0 at auth.proto. Please avoid checked-in Protobuf gencode that can be obsolete.
  warnings.warn(
Sep 11 17:16:48.795 [INFO] Running Petals 2.3.0.dev2
Sep 11 17:16:49.602 [INFO] Make sure you follow the Llama terms of use: https://llama.meta.com/llama3/license, https://llama.meta.com/llama2/license
Sep 11 17:16:49.602 [INFO] Using DHT prefix: Meta-Llama-3-1-405B-Instruct-hf
Sep 11 17:17:09.101 [INFO] This server is accessible via relays
Sep 11 17:17:13.146 [INFO] Connecting to the public swarm
Sep 11 17:17:13.147 [INFO] Running a server on <REDACTED>
Sep 11 17:17:13.164 [WARN] [petals.server.server.__init__:178] Type bfloat16 is not supported on MPS, using float16 instead
Sep 11 17:17:13.164 [INFO] Model weights are loaded in float16 format
Sep 11 17:17:13.165 [INFO] Attention cache for all blocks will consume up to 0.12 GiB
Sep 11 17:17:13.165 [INFO] Loading throughput info
Sep 11 17:17:13.166 [INFO] Reporting throughput: 13.5 tokens/sec for 2 blocks
Sep 11 17:17:17.462 [INFO] Announced that blocks [0, 1] are joining
Sep 11 17:17:28.173 [INFO] Loaded meta-llama/Meta-Llama-3.1-405B-Instruct block 0
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Sep 11 17:17:52.718 [INFO] Loaded meta-llama/Meta-Llama-3.1-405B-Instruct block 1
Sep 11 17:18:06.367 [INFO] Detected a NAT or a firewall, connecting to libp2p relays. This takes a few minutes
Sep 11 17:36:45.452 [WARN] [petals.server.reachability.validate_reachability:40] Skipping reachability check because health.petals.dev is down: ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='health.petals.dev', port=443): Read timed out. (read timeout=10)"))
Sep 11 17:36:45.707 [INFO] Started
Sep 11 17:44:13.840 [INFO] Announced that blocks ['Meta-Llama-3-1-405B-Instruct-hf.0', 'Meta-Llama-3-1-405B-Instruct-hf.1'] are offline
Sep 11 17:44:13.948 [INFO] Shutting down
Sep 11 17:44:13.952 [INFO] Module container shut down successfully
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/cli/run_server.py", line 235, in <module>
    main()
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/cli/run_server.py", line 227, in main
    server.run()
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/server/server.py", line 378, in run
    if self._should_choose_other_blocks():
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/server/server.py", line 418, in _should_choose_other_blocks
    return block_selection.should_choose_other_blocks(self.dht.peer_id, module_infos, self.balance_quality)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "${HOME}/.pyenv/versions/3.12.2/lib/python3.12/site-packages/petals/server/block_selection.py", line 51, in should_choose_other_blocks
    assert local_peer_id in spans, "Span served by this server is not present in the DHT"
AssertionError: Span served by this server is not present in the DHT
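
For anyone reading the traceback, here is a minimal sketch of what the failing check appears to test, judging only from the assertion line above: the server looks up the spans announced in the DHT and expects its own peer ID to be among them. The helper names (`spans_from_dht`, `check_local_span`) and the record format are hypothetical stand-ins for illustration, not Petals APIs.

```python
def spans_from_dht(records):
    """Group announced block indices by the peer that announced them.

    `records` is a hypothetical list of (block_index, peer_id) pairs
    standing in for whatever the DHT lookup returns.
    """
    spans = {}
    for block_index, peer_id in records:
        spans.setdefault(peer_id, []).append(block_index)
    return spans


def check_local_span(local_peer_id, spans):
    """Return the local server's blocks, failing like the traceback if absent."""
    # Same condition and message as the assertion in the traceback.
    assert local_peer_id in spans, "Span served by this server is not present in the DHT"
    return spans[local_peer_id]


# Only "peer-A" is visible in the (simulated) DHT records.
visible = spans_from_dht([(0, "peer-A"), (1, "peer-A")])

print(check_local_span("peer-A", visible))

try:
    # A server whose announcement is missing or expired hits the assertion.
    check_local_span("peer-B", visible)
except AssertionError as err:
    print(f"AssertionError: {err}")
```

In this sketch the second call fails because "peer-B" never appears in the records, which matches the situation in the log: the blocks were announced at 17:17:17, but by the time the rebalancing check ran, this server's span was no longer visible.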
