Skip to content

Latest commit

 

History

History
157 lines (99 loc) · 7.54 KB

CG-03-05.md

File metadata and controls

157 lines (99 loc) · 7.54 KB

WebAssembly logo

Agenda for the March 5 video call of WebAssembly's Community Group

  • Where: zoom.us
  • When: March 5, 5pm-6pm UTC (March 5, 9am-10am Pacific Time)
  • Location: link on calendar invite
  • Contact:

Registration

None required if you've attended before. Email Ben Smith to sign up if it's your first time. The meeting is open to CG members only.

Logistics

The meeting will be on a zoom.us video conference. Installation is required, see the calendar invite.

Agenda items

  1. Opening, welcome and roll call
    1. Opening of the meeting
    2. Introduction of attendees
  2. Find volunteers for note taking (acting chair to volunteer)
  3. Adoption of the agenda
  4. Proposals and discussions
    1. Review of action items from prior meeting.
    2. Discuss lowering memcpy and memmove to memory.copy and memset to memory.fill in tools.
    3. POLL: Move bulk memory proposal to phase 3.
  5. Closure

Agenda items for future meetings

None

Schedule constraints

None

Meeting Notes

Opening, welcome and roll call

Opening of the meeting

Introduction of attendees

  • Adam Klein
  • Alex Crichton
  • Ben Smith
  • Conrad Watt
  • Daniel Ehrenberg
  • Deepti Gandluri
  • Derek Schuff
  • Francis McCabe
  • Lars Hansen
  • Limin Zhu
  • Luke Imhoff
  • Luke Wagner
  • Mitch Moore
  • Paul Dworzanski
  • Paul Schoenfelder
  • Peter Jensen
  • Richard Winterton
  • Sam Clegg
  • Sergey Rubanov
  • Sven Sauleau
  • TatWai Chong
  • Thomas Lively
  • Ulrik Sorber

Find volunteers for note taking (acting chair to volunteer)

Ben Smith

Adoption of the agenda

Lars Second

Proposals and discussions

Review of action items from prior meeting.

Discuss lowering memcpy and memmove to memory.copy and memset to memory.fill in tools

TL: Wanted to get feedback from engine implementers. Implementing bulk memory ops. Has a chance to lower memcopy, memmove and memset. One option, always lower to these instructions. Other option: memcopy, memmove, memset are compiled and included as normal functions. The tools in LLVM have the ability to optimize calls to these functions, that have constant, statically known sizes to loads and stores. Memmove 16 bytes, could become two pairs of i64 load+store. Using these load store pairs could be faster. But it is slightly larger on code size. Falling back to calling wasm functions of memmove and memset means these compiled functions need to be in module. Trading off potential performance issue, against code size over the wire. I want to lower memmove and memcopy to the bulk instructions.

LH: In firefox these are implemented as calls into the runtime.

TL: Is there an unreasonable amount of overhead in runtime?

LW: I think it’s better for people to use loads/stores, 32-bit, 64-bit, when size is constant to use lowered to inline stores. So there isn’t much of a code size tradeoff.

TL: If it can be implemented w/ a single load/store pair, then I do it that way, otherwise bulk memory.

LH: The general feeling would be that these are for larger data movements, and it might be better to use a different stragtegy for smaller movements.

TL: v8 team said they prefer bounds checking behavior of bulk memory, for small constants.

AK: The other side of this is that we would do the lowering in the compiler, not go to runtime for small constants. Use them a compression scheme on the wire and go back to load/stores later.

TL: Yes, that’s what I thought.

JG: In the presence of baseline compilers, will they do that too?

LW: I agree, if hot code is having many small memmoves/memcpys then it’s better to lower to loads/stores.

LH: If you have an unaligned pointer and constant size for copy, you don’t know if there’s an overlap. Semantics are that you get all bytes until OOB w/ bulk memory ops, load/store is different. There are a bunch of optimizations that may be limited if we try to do it inline.

TL: Sounds like there are reasons to be cautious. In that case, might be good to find performance/code size tradeoff. We’ll take some measurements and try to find best policy here.

JG: I would vote that we have two policies, optimize for size and optimize for speed. In -oz we may always want memcopy, in -os not sure.

TL: wasm is unique, even when optimizing for speed we care more about code size than other targets.

JG: We may need to do something special for -O3, throughput vs. start up time. Whether baseline or top-tier jit matters depends on payload as well.

LI: Are the number of loads/stores influenced by the host cpu, how deep the pipeline is, etc. it makes it hard for wasm to target because it can’t tell what the host will do.

TL: Yes, it makes a difference, flying blind here, trying to come up with a one-size-fits-all solution. If anyone has an idea about what we should optimize for, then we’ll come up with something kind of good.

JG: That trade-off is why for top-tier we’ll want to rely on engine to do best optimization here. For LLVM, we need to understand the space more, need to collect the data across browsers.

LI: Do you see the speed setting being actually used. Last meeting, everyone said they’d optimize for size, (producers section), so who will use this?

TL: Different case, those bytes didn’t matter for user.

RW: We have to be careful we’re not too browser focused, different requirements for node.js.

JG: Some applications are not sensitive to size -- one where it’s acceptable that you wait a while, load time is high because users are willing to wait for it. So we can optimize for speed against size.

TL: We will take those measurements and do some science, come up w/ something reasonable.

LW: Would there be a GH issue, here’s the recommended way for toolchains to compile. To discuss that resolution.

LI: Does x86 behave differently than arm here?

TL: Not sure, depends on engines too.

FM: If the majority of memcopy is generated due to assignment, then original requirements don’t need to be honored.

DS: middle end of LLVM already does this.

POLL: Move bulk memory proposal to phase 3

BS: [discussing moving bulk memory to phase 3] In the previous meeting we didn’t move to phase 3 because we didn’t have a testsuite, since then Lars and I have been working on this. Lars provided a large number of tests, and I wrote the reference interpreter implementation. As of a recent PR, we are in sync. The V8 implementation is also almost in sync.

Poll: Move bulk memory to phase 3

SF F N A SA
15 5 0 0 0

Closure