Skip to content

Commit 83cf05e

Browse files
authored
Merge pull request #5 from martindurant/for-demos
Add logging for FS instantiation, better README and Allow-Private-Network
2 parents 38cab63 + 51f2046 commit 83cf05e

File tree

7 files changed

+225
-42
lines changed

7 files changed

+225
-42
lines changed

README.md

Lines changed: 17 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -14,13 +14,13 @@ Install the two sub-packages:
1414
- fsspec-proxy, a fastAPI-based server which reads/writes to configured storage
1515
locations
1616
- pyscript-fsspec-client, a filesystem implementation that connects to the proxy,
17-
allowing even pyscript to access bytes in remote stores
17+
allowing even pyscript to access bytes in remote stores.
1818

1919
Now run:
2020
```bash
2121
$ fsspec-proxy dev
2222
```
23-
to start the (unsecured) proxy server, with port 8000. Further arguments
23+
This starts the (unsecured) proxy server, with port 8000. Further arguments
2424
will be passed to fastAPI to configure, for example, the port and address
2525
to listen on.
2626

@@ -33,6 +33,15 @@ server can be reconfigured via an API call.
3333
or auth. It can be regarded as a prototype to base production-level
3434
implementations on.
3535

36+
Options
37+
-------
38+
39+
- `run` (default) runs the server in production mode
40+
- `dev` run the server in development mode
41+
- `private` adds Access-Control-Allow-Private-Network header to allow some
42+
requests in some CORS situations. If you are seeing CORS issues, adding
43+
this might help.
44+
3645
Demo
3746
----
3847

@@ -43,27 +52,21 @@ The server will show incoming byte range requests, and you can also track them
4352
in the browser's debug console. The end result should be a table view of the
4453
contents of the target parquet file.
4554

46-
Installation with Optional Dependencies (fsspec-proxy)
47-
-----------------------------------------------------
48-
49-
The following steps apply only to the `fsspec-proxy` package. The package has
50-
several optional dependency groups:
51-
52-
- `s3`: Required for S3 access (needed for the "Conda Stats" example)
53-
- `anaconda`: Required for Anaconda Cloud access
54-
- `all`: All optional dependencies
55+
By default, the server attempts to instantiate S3 and anaconda filesystems,
56+
but will skip these with a message if the dependencies are not available. The
57+
demo uses the S3 backend, so you will need S3 support (below).
5558

5659
S3 Support
57-
~~~~~~~~~~
60+
----------
5861

5962
To use S3 functionality (including the "Conda Stats" example):
6063

6164
```bash
62-
pip install .[s3]
65+
pip install "./fsspec-proxy[s3]"
6366
```
6467

6568
Anaconda Cloud Support
66-
~~~~~~~~~~~~~~~~~~~~~~
69+
----------------------
6770

6871
To use Anaconda Cloud functionality, you'll need to install dependencies from
6972
the Anaconda Cloud index. You can do this in two ways:
@@ -86,20 +89,3 @@ the Anaconda Cloud index. You can do this in two ways:
8689
```bash
8790
pip install .[anaconda] --extra-index-url https://pypi.anaconda.org/anaconda-cloud/simple
8891
```
89-
90-
All Optional Dependencies
91-
~~~~~~~~~~~~~~~~~~~~~~~~
92-
93-
To install all optional dependencies:
94-
95-
```bash
96-
# With pip config
97-
pip install .[all]
98-
99-
# Or directly with extra index
100-
pip install .[all] --extra-index-url https://pypi.anaconda.org/anaconda-cloud/simple
101-
```
102-
103-
This will ensure that all required packages for `fsspec-proxy`, including those
104-
only available on Anaconda Cloud, are installed.
105-
>>>>>>> 4408c85bdd76295a43f0d7a20041a646e46b3f25

config.yaml

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
sources:
2+
- name: inmemory
3+
path: memory://mytests
4+
- name: local
5+
path: file:///Users
6+
readonly: true
7+
- name: "Conda Stats"
8+
path: "s3://anaconda-package-data/conda/hourly/"
9+
kwargs:
10+
anon: True
11+
- name: "MyAnaconda"
12+
path: "anaconda://my/"
13+
allow_reload: true
14+

fsspec-proxy/fsspec_proxy/__main__.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
#!/usr/bin/env python
22

3+
import os
34
import re
45
import sys
56

@@ -11,7 +12,9 @@
1112

1213
def run_main():
1314
mode = "dev" if "dev" in sys.argv else "run"
14-
argv = [_ for _ in sys.argv if _ not in {"dev", "run"}]
15+
# TODO: this should be unified with the config
16+
os.environ["FS_PROXY_PRIVATE"] = str("private" in sys.argv)
17+
argv = [_ for _ in sys.argv if _ not in {"dev", "run", "private"}]
1518
sys.argv = [
1619
re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0]),
1720
mode,

fsspec-proxy/fsspec_proxy/bytes_server.py

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
from contextlib import asynccontextmanager
22
import io
33
import fastapi
4-
from fastapi.middleware.cors import CORSMiddleware
4+
from fsspec_proxy.cors import CORSMiddleware
55
from starlette.responses import StreamingResponse
66

77
from fsspec_proxy import file_manager
@@ -62,7 +62,7 @@ async def delete_file(key, path, response: fastapi.Response):
6262
if fs_info.get("readonly"):
6363
raise fastapi.HTTPException(status_code=403, detail="Not Allowed")
6464
try:
65-
out = await fs_info["instance"]._rm_file(path)
65+
await fs_info["instance"]._rm_file(path)
6666
except FileNotFoundError:
6767
raise fastapi.HTTPException(status_code=404, detail="Item not found")
6868
except PermissionError:
@@ -93,7 +93,6 @@ async def put_bytes(key, path, request: fastapi.Request, response: fastapi.Respo
9393
raise fastapi.HTTPException(status_code=403, detail="Not Allowed")
9494
path = f"{fs_info['path'].rstrip('/')}/{path.lstrip('/')}"
9595
data = await request.body()
96-
print("####", data)
9796
try:
9897
await fs_info["instance"]._pipe_file(path, data)
9998
except FileNotFoundError:

fsspec-proxy/fsspec_proxy/cors.py

Lines changed: 178 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,178 @@
1+
"""Copy of fastapi.middleware.cors, with """
2+
3+
from __future__ import annotations
4+
5+
import functools
6+
import os
7+
import re
8+
import typing
9+
10+
from starlette.datastructures import Headers, MutableHeaders
11+
from starlette.responses import PlainTextResponse, Response
12+
from starlette.types import ASGIApp, Message, Receive, Scope, Send
13+
14+
PRIVATE = os.getenv("FS_PROXY_PRIVATE", False) == "True"
15+
ALL_METHODS = ("DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT")
16+
SAFELISTED_HEADERS = {"Accept", "Accept-Language", "Content-Language", "Content-Type"}
17+
18+
19+
class CORSMiddleware:
20+
def __init__(
21+
self,
22+
app: ASGIApp,
23+
allow_origins: typing.Sequence[str] = (),
24+
allow_methods: typing.Sequence[str] = ("GET",),
25+
allow_headers: typing.Sequence[str] = (),
26+
allow_credentials: bool = False,
27+
allow_origin_regex: str | None = None,
28+
expose_headers: typing.Sequence[str] = (),
29+
max_age: int = 600,
30+
) -> None:
31+
if "*" in allow_methods:
32+
allow_methods = ALL_METHODS
33+
34+
compiled_allow_origin_regex = None
35+
if allow_origin_regex is not None:
36+
compiled_allow_origin_regex = re.compile(allow_origin_regex)
37+
38+
allow_all_origins = "*" in allow_origins
39+
allow_all_headers = "*" in allow_headers
40+
preflight_explicit_allow_origin = not allow_all_origins or allow_credentials
41+
42+
simple_headers = {}
43+
if allow_all_origins:
44+
simple_headers["Access-Control-Allow-Origin"] = "*"
45+
if allow_credentials:
46+
simple_headers["Access-Control-Allow-Credentials"] = "true"
47+
if expose_headers:
48+
simple_headers["Access-Control-Expose-Headers"] = ", ".join(expose_headers)
49+
50+
preflight_headers = {}
51+
if preflight_explicit_allow_origin:
52+
# The origin value will be set in preflight_response() if it is allowed.
53+
preflight_headers["Vary"] = "Origin"
54+
else:
55+
preflight_headers["Access-Control-Allow-Origin"] = "*"
56+
preflight_headers.update(
57+
{
58+
"Access-Control-Allow-Methods": ", ".join(allow_methods),
59+
"Access-Control-Max-Age": str(max_age),
60+
}
61+
)
62+
allow_headers = sorted(SAFELISTED_HEADERS | set(allow_headers))
63+
if allow_headers and not allow_all_headers:
64+
preflight_headers["Access-Control-Allow-Headers"] = ", ".join(allow_headers)
65+
if allow_credentials:
66+
preflight_headers["Access-Control-Allow-Credentials"] = "true"
67+
68+
self.app = app
69+
self.allow_origins = allow_origins
70+
self.allow_methods = allow_methods
71+
self.allow_headers = [h.lower() for h in allow_headers]
72+
self.allow_all_origins = allow_all_origins
73+
self.allow_all_headers = allow_all_headers
74+
self.preflight_explicit_allow_origin = preflight_explicit_allow_origin
75+
self.allow_origin_regex = compiled_allow_origin_regex
76+
self.simple_headers = simple_headers
77+
self.preflight_headers = preflight_headers
78+
79+
async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
80+
if scope["type"] != "http": # pragma: no cover
81+
await self.app(scope, receive, send)
82+
return
83+
84+
method = scope["method"]
85+
headers = Headers(scope=scope)
86+
origin = headers.get("origin")
87+
88+
if origin is None:
89+
await self.app(scope, receive, send)
90+
return
91+
92+
if method == "OPTIONS" and "access-control-request-method" in headers:
93+
response = self.preflight_response(request_headers=headers)
94+
await response(scope, receive, send)
95+
return
96+
97+
await self.simple_response(scope, receive, send, request_headers=headers)
98+
99+
def is_allowed_origin(self, origin: str) -> bool:
100+
if self.allow_all_origins:
101+
return True
102+
103+
if self.allow_origin_regex is not None and self.allow_origin_regex.fullmatch(origin):
104+
return True
105+
106+
return origin in self.allow_origins
107+
108+
def preflight_response(self, request_headers: Headers) -> Response:
109+
requested_origin = request_headers["origin"]
110+
requested_method = request_headers["access-control-request-method"]
111+
requested_headers = request_headers.get("access-control-request-headers")
112+
113+
headers = dict(self.preflight_headers)
114+
failures = []
115+
116+
if self.is_allowed_origin(origin=requested_origin):
117+
if self.preflight_explicit_allow_origin:
118+
# The "else" case is already accounted for in self.preflight_headers
119+
# and the value would be "*".
120+
headers["Access-Control-Allow-Origin"] = requested_origin
121+
else:
122+
failures.append("origin")
123+
124+
if requested_method not in self.allow_methods:
125+
failures.append("method")
126+
127+
# If we allow all headers, then we have to mirror back any requested
128+
# headers in the response.
129+
if self.allow_all_headers and requested_headers is not None:
130+
headers["Access-Control-Allow-Headers"] = requested_headers
131+
elif requested_headers is not None:
132+
for header in [h.lower() for h in requested_headers.split(",")]:
133+
if header.strip() not in self.allow_headers:
134+
failures.append("headers")
135+
break
136+
137+
# We don't strictly need to use 400 responses here, since its up to
138+
# the browser to enforce the CORS policy, but its more informative
139+
# if we do.
140+
if failures:
141+
failure_text = "Disallowed CORS " + ", ".join(failures)
142+
return PlainTextResponse(failure_text, status_code=400, headers=headers)
143+
144+
if PRIVATE:
145+
headers["Access-Control-Allow-Private-Network"] = "true"
146+
return PlainTextResponse("OK", status_code=200, headers=headers)
147+
148+
async def simple_response(self, scope: Scope, receive: Receive, send: Send, request_headers: Headers) -> None:
149+
send = functools.partial(self.send, send=send, request_headers=request_headers)
150+
await self.app(scope, receive, send)
151+
152+
async def send(self, message: Message, send: Send, request_headers: Headers) -> None:
153+
if message["type"] != "http.response.start":
154+
await send(message)
155+
return
156+
157+
message.setdefault("headers", [])
158+
headers = MutableHeaders(scope=message)
159+
headers.update(self.simple_headers)
160+
origin = request_headers["Origin"]
161+
has_cookie = "cookie" in request_headers
162+
163+
# If request includes any cookie headers, then we must respond
164+
# with the specific origin instead of '*'.
165+
if self.allow_all_origins and has_cookie:
166+
self.allow_explicit_origin(headers, origin)
167+
168+
# If we only allow specific origins, then we have to mirror back
169+
# the Origin header in the response.
170+
elif not self.allow_all_origins and self.is_allowed_origin(origin=origin):
171+
self.allow_explicit_origin(headers, origin)
172+
173+
await send(message)
174+
175+
@staticmethod
176+
def allow_explicit_origin(headers: MutableHeaders, origin: str) -> None:
177+
headers["Access-Control-Allow-Origin"] = origin
178+
headers.add_vary_header("Origin")

fsspec-proxy/fsspec_proxy/file_manager.py

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,14 @@
1-
from fsspec.implementations.asyn_wrapper import AsyncFileSystemWrapper
2-
import fsspec.utils
31
import io
2+
import logging
43
import os
54
import yaml
6-
import logging
5+
6+
from fsspec.implementations.asyn_wrapper import AsyncFileSystemWrapper
7+
import fsspec.utils
78

89
logger = logging.getLogger("fsspec_proxy")
10+
# TODO: this config is copied as config.yaml; de-dup and move other options
11+
# into the config
912
default_config = b"""sources:
1013
- name: inmemory
1114
path: memory://mytests
@@ -62,7 +65,7 @@ def initialize_filesystems(self):
6265
fs, url2 = fsspec.url_to_fs(fs_path, **kwargs)
6366
except Exception:
6467
# or we could still list show their names but not the contents
65-
logger.error("Instantiating filesystem failed")
68+
logger.exception("Instantiating filesystem %s failed, skipping", key)
6669
continue
6770
if not fs.async_impl:
6871
fs = AsyncFileSystemWrapper(fs)

fsspec-proxy/pyproject.toml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -29,16 +29,16 @@ dependencies = [
2929
]
3030
dynamic = ["version", "urls", "keywords"]
3131

32-
[dependency-groups]
32+
[project.optional-dependencies]
3333
s3 = [
3434
"s3fs"
3535
]
3636
anaconda = [
3737
"anaconda-cloud-storage"
3838
]
3939
all = [
40-
{include-group = "s3"},
41-
{include-group = "anaconda"}
40+
"s3fs",
41+
"anaconda-cloud-storage"
4242
]
4343

4444
[tool.hatch.build.hooks.version]

0 commit comments

Comments
 (0)