Description
It looks like https://recodatasets.z20.web.core.windows.net is no longer serving assets for this project, which is mentioned in #2251. In addition to the images, this also affects model hyper-parameter files like dkn_MINDsmall.yaml, which is used in the example notebook for training and evaluating DKN (examples/02_model_content_based_filtering/dkn_deep_dive.ipynb).
A similar issue appears to affect example notebooks for NRMS, NAML, NPA, LSTUR, LightGCN, xDeepFM, and possibly others.
In which platform does it happen?
Running example notebooks in VS Code locally from a cloned copy of the repo
How do we replicate the issue?
- Create a local Python environment following the instructions from the README
- Open dkn_deep_dive.ipynb and run all the cells
- The first issue that crops up is downloading the MIND dataset from recodatasets.z20.web.core.windows.net
- Supply a local copy of MIND and adjust the paths in the notebook
- Continue to the Create Hyperparameters section and run the first cell
- Execution fails with:
---------------------------------------------------------------------------
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/urllib3/util/retry.py:519, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
518 reason = error or ResponseError(cause)
--> 519 raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
521 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)
MaxRetryError: HTTPSConnectionPool(host='recodatasets.z20.web.core.windows.net', port=443): Max retries exceeded with url: /deeprec/deeprec/dkn/dkn_MINDsmall.yaml (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7d8615c28b10>: Failed to resolve 'recodatasets.z20.web.core.windows.net' ([Errno -2] Name or service not known)"))
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
Cell In[25], line 1
----> 1 yaml_file = maybe_download(url="https://recodatasets.z20.web.core.windows.net/deeprec/deeprec/dkn/dkn_MINDsmall.yaml",
2 work_directory=data_path)
3 hparams = prepare_hparams(yaml_file,
4 news_feature_file=news_feature_file,
5 user_history_file=user_history_file,
(...) 9 history_size=history_size,
10 batch_size=batch_size)
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/retrying.py:55, in retry.<locals>.wrap.<locals>.wrapped_f(*args, **kw)
53 @wraps(f)
54 def wrapped_f(*args, **kw):
---> 55 return Retrying(*dargs, **dkw).call(f, *args, **kw)
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/retrying.py:289, in Retrying.call(self, fn, *args, **kwargs)
286 if self.stop(attempt_number, delay_since_first_attempt_ms):
287 if not self._wrap_exception and attempt.has_exception:
288 # get() on an attempt with an exception should cause it to be raised, but raise just in case
--> 289 raise attempt.get()
290 else:
291 raise RetryError(attempt)
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/retrying.py:326, in Attempt.get(self, wrap_exception)
324 else:
325 exc_type, exc, tb = self.value
--> 326 raise exc.with_traceback(tb)
327 else:
328 return self.value
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/retrying.py:273, in Retrying.call(self, fn, *args, **kwargs)
270 self._before_attempts(attempt_number)
272 try:
--> 273 attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
274 except Exception:
275 tb = sys.exc_info()
File ~/Projects/microsoft/recommenders/recommenders/datasets/download_utils.py:36, in maybe_download(url, filename, work_directory, expected_bytes)
34 filepath = os.path.join(work_directory, filename)
35 if not os.path.exists(filepath):
---> 36 r = requests.get(url, stream=True)
37 if r.status_code == 200:
38 log.info(f"Downloading {url}")
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/requests/api.py:73, in get(url, params, **kwargs)
62 def get(url, params=None, **kwargs):
63 r"""Sends a GET request.
64
65 :param url: URL for the new :class:`Request` object.
(...) 70 :rtype: requests.Response
71 """
---> 73 return request("get", url, params=params, **kwargs)
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/requests/api.py:59, in request(method, url, **kwargs)
55 # By using the 'with' statement we are sure the session is closed, thus we
56 # avoid leaving sockets open which can trigger a ResourceWarning in some
57 # cases, and look like a memory leak in others.
58 with sessions.Session() as session:
---> 59 return session.request(method=method, url=url, **kwargs)
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/requests/sessions.py:589, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
584 send_kwargs = {
585 "timeout": timeout,
586 "allow_redirects": allow_redirects,
587 }
588 send_kwargs.update(settings)
--> 589 resp = self.send(prep, **send_kwargs)
591 return resp
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/requests/sessions.py:703, in Session.send(self, request, **kwargs)
700 start = preferred_clock()
702 # Send the request
--> 703 r = adapter.send(request, **kwargs)
705 # Total elapsed time of the request (approximately)
706 elapsed = preferred_clock() - start
File ~/Projects/microsoft/recommenders/.venv/lib/python3.11/site-packages/requests/adapters.py:677, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
673 if isinstance(e.reason, _SSLError):
674 # This branch is for urllib3 v1.22 and later.
675 raise SSLError(e, request=request)
--> 677 raise ConnectionError(e, request=request)
679 except ClosedPoolError as e:
680 raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='recodatasets.z20.web.core.windows.net', port=443): Max retries exceeded with url: /deeprec/deeprec/dkn/dkn_MINDsmall.yaml (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7d8615c28b10>: Failed to resolve 'recodatasets.z20.web.core.windows.net' ([Errno -2] Name or service not known)"))
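The root cause in the traceback above is a name-resolution failure ([Errno -2]), not an HTTP error, so it can be checked without running the notebook. The following is a minimal sketch using only the standard library; the helper name host_resolves and its resolver parameter are hypothetical and exist only for illustration.

```python
import socket

# Host taken from the traceback above.
HOST = "recodatasets.z20.web.core.windows.net"


def host_resolves(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address, else False.

    `resolver` is injectable so the check can be exercised offline;
    by default it is the standard library's socket.getaddrinfo.
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # "[Errno -2] Name or service not known" is a socket.gaierror;
        # the traceback shows it surfaced as urllib3's NameResolutionError.
        return False
```

Calling host_resolves(HOST) on the affected machine should return False for as long as the DNS name stays unavailable. A True result would point to a different problem, such as a missing file on a reachable server.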
Expected behavior (i.e. solution)
Example notebooks should be runnable with a local copy of the MIND dataset, with non-dataset, non-image resources like hyper-parameter files available for download from somewhere other than recodatasets (e.g. the resources repo).
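One way to picture this behavior is a "local copy first, then a list of mirrors" lookup. The traceback suggests that maybe_download only issues a request when the file is not already in work_directory, so a local-first lookup would fit the existing pattern. The sketch below is not the library's implementation: find_or_fetch, candidate_urls, and the fetch parameter are hypothetical names, and any mirror URL passed in is an assumption, since no replacement host has been announced in this issue.

```python
import os
import urllib.request


def find_or_fetch(filename, local_dir, candidate_urls=(),
                  fetch=urllib.request.urlretrieve):
    """Return a path to `filename`, preferring a copy already in `local_dir`.

    If no local copy exists, try each URL in `candidate_urls` in order and
    keep the first download that succeeds. `fetch(url, target)` must write
    the resource to `target` and raise OSError on failure.
    """
    target = os.path.join(local_dir, filename)
    if os.path.exists(target):
        return target  # local copy wins; no network access needed

    os.makedirs(local_dir, exist_ok=True)
    errors = []
    for url in candidate_urls:
        try:
            fetch(url, target)
            return target
        except OSError as exc:  # urllib.error.URLError subclasses OSError
            errors.append(f"{url}: {exc}")
            if os.path.exists(target):
                os.remove(target)  # drop a partial download before retrying

    raise FileNotFoundError(
        f"{filename} not found in {local_dir} and no candidate URL worked: {errors}"
    )
```

With a helper like this, a notebook could be pointed at a manually supplied dkn_MINDsmall.yaml today and later pick up an alternative host simply by adding it to candidate_urls.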
Willingness to contribute
- Yes, I can contribute for this issue independently.
- Yes, I can contribute for this issue with guidance from Recommenders community.
- No, I cannot contribute at this time.
Other Comments
This can't be fixed by outside contributors who don't have access to the relevant files.