Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WHATWG Streams implementation #544

Open
1 of 3 tasks
jackkleeman opened this issue Aug 15, 2024 · 18 comments
Open
1 of 3 tasks

WHATWG Streams implementation #544

jackkleeman opened this issue Aug 15, 2024 · 18 comments

Comments

@jackkleeman
Copy link
Contributor

jackkleeman commented Aug 15, 2024

I'm fairly interested in contributing an implementation of the web stream API. As I see it there are two options:

  1. Use an all JS implementation, taking inspiration from the reference implementation from the working group, using some native APIs where necessary (as I understand from https://github.com/creatorrr/web-streams-polyfill, this will be necessary)
  2. Write a Rust implementation of the streams spec

I would probably lean towards doing the latter, either as a separate crate if it can be abstracted well or simply within this repo. Are there any opinions on how this should be done? Thanks!


@richarddavison
Copy link
Contributor

Hi @jackkleeman. Thanks for considering contributing. #2 would almost be a requirement due to performance since there is no JIT in quick JS JavaScript implementation would be very slow. There is already a native rust implementation of a simple and incomplete streams API used by child_process module. The tricky part is that the streams API is very large, complex and contains a lot of cases so a native implementation would probably have to take some shortcuts and be not strictly compliant with Node.

@jackkleeman
Copy link
Contributor Author

ok, i understand that we need a native implementation, and we have some initial work that we can extend, and that we should probably try and do this iteratively adding different parts of the api. will try and kick something off in the coming weeks!

@jackkleeman
Copy link
Contributor Author

i have started work on this. indeed the spec is absolutely huge.... however, i think taking shortcuts will just make things harder tbh than just faithfully transferring the spec function by function (which seems to be what firefox devs did, for example). so thats what i am doing over in https://github.com/jackkleeman/llrt/tree/streams

@richarddavison
Copy link
Contributor

@jackkleeman I took a peek in your repo and you've seen to have had amazing progress in such short time. Fantastic! Anything we can do to help?

@jackkleeman
Copy link
Contributor Author

@richarddavison thanks! all good for now. im currently working on getting the readable streams web platform tests all working so we can maybe get that into a PR and i can move on to writeable

@jackkleeman
Copy link
Contributor Author

Latest update: the readable streams tests all pass, and most of the byte stream tests too. I am entering a very busy time at work so will probably take a pause for a few weeks.

@richarddavison
Copy link
Contributor

Latest update: the readable streams tests all pass, and most of the byte stream tests too. I am entering a very busy time at work so will probably take a pause for a few weeks.

Fantastic! Take your time, I've been silently keeping an eye on your fork and you've done amazing work! Hats off 🎩

@jackkleeman
Copy link
Contributor Author

if you are keeping an eye on it, any feedback about the horrible OwnedBorrowMut (for stream, controller and reader) mess would be appreciated. You don't realise how much the rust borrow checker does for you until you start having to grapple with its rules being enforced at runtime....
Generally my methodology has been to create the owned borrow values at the entrypoint function, and pass them down. Furthermore, you need to make sure that the borrows are released whenever you call out to user code (as they may call back in to your code), so that means basically passing them everywhere, and passing them as owned borrows, not references, because you can't release the borrow without ownership. That means you also need to pass them in to functions, consume them, and then return reborrowed values back out again, in many cases. Which leads to this incredible infectious signature: fn (stream, controller, reader, ...) -> Result<(stream, controller, reader)>. And of course, controller can generally be one of two values, and reader may be None or another two values. So its a lot of combinations!

Maybe this is unavoidable. I have considered having some sort of struct to contain all three, but due to the combinations of different readers and controllers, I don't know if this will really make things any simpler. And more generally, there is the fact that refactoring into more sensible things (eg, remove the cyclical references stream -> controller -> stream) will depart from the spec. Even now, my understanding of the end to end flow of (eg) a byte is poor simply because of the complexity, and I rely on being extremely close to the spec for things to work.

Anyway, if you have an opinion on this, id love to hear it!

@jackkleeman
Copy link
Contributor Author

Sidenote while its in my brain - quickjs has no arraybuffer transfer function. I have tried to think of a way to hack it on the rquickjs side, but I don't think its possible as we don't control the arraybuffer constructor and so its tricky to change the fact that it will be dropped by quickjs when the detach method is called.

quickjs-ng have implemented this function. I don't think it would be hard to contribute to quickjs. For the time being we can get around it by just copying the bytes, but this may lead to some tests that don't pass, if they have ways to detect for this. Plus, it will lead to a fair few copies, because the spec really likes transferring array buffers.

@richarddavison
Copy link
Contributor

Anyway, if you have an opinion on this, id love to hear it!

Not at the moment. I would need to have a closer look at everything you've built to see how it works.

quickjs has no arraybuffer transfer function

No it does not (as its an ES2024 feature) but it allows you access the raw underlying byte slice (which causes the holding ArrayBuffer to detach them). Take a look here:

let raw = array_buffer

@jackkleeman
Copy link
Contributor Author

jackkleeman commented Sep 27, 2024

I don't think it detaches, unless I'm mistaken? You can get a slice to the bytes, but it still owns it, and when GC'd it will free the slice. It's not possible to cause a detach without triggering this free, as far as I can see. Hence you have to copy it to safely transfer it

@richarddavison
Copy link
Contributor

I don't think it detaches, unless I'm mistaken? You can get a slice to the bytes, but it still owns it, and when GC'd it will free the slice. It's not possible to cause a detach without triggering this free, as far as I can see. Hence you have to copy it to safely transfer it

No you are right, it doesn't automatically. We can detach it by calling detach() but that would also cause it to be dropped/freed.

@richarddavison
Copy link
Contributor

@jackkleeman you've probably noticed but we have now merged QuickJS-NG so transfer method should now be available 🎉

@jackkleeman
Copy link
Contributor Author

sweet! ive started on writable stream, which seems to be a lot less moving parts than readable, so hopefully i am 80% of the way done. still quite busy at work but I find moments to make progress :)

@jackkleeman
Copy link
Contributor Author

update - writable streams all passing. i rebased and transfer works also
still left to do is piping and transform streams. however, i suspect we have enough to replace the existing llrt_stream, if thats something we want to do. for the time being im working under llrt_stream_next

@jackkleeman
Copy link
Contributor Author

currently we are at 10k additions just for the rust changes... open question how we can move towards getting this merged

@richarddavison
Copy link
Contributor

This is fantastic news 🎉 I'm really excited about this development. I recon the first step should be replacing llrt_stream. I'm happy to start with a the current implementation, even if it's not fully optimized for all cases, and gradually refine it over time. Following the principle: make it work, make it right, then make it fast!

I anticipate that we'll need to make some adjustments to optimize for a stream that avoids interactions with the JavaScript layer. For example, piping to an fs.writeStream should allow bytes to flow directly to the target

@jackkleeman
Copy link
Contributor Author

jackkleeman commented Dec 28, 2024

ok, i will look to do a PR that replaces the existing llrt_stream. the stream spec has a ton of special casing (its 80% of the complexity, honestly) for bytes, including a mode where you can provide your own reusable buffer for byte transport, so its possible it will work out the box for the fs use cases. but lets see!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants