Download failure under heavy load #974
In a close read of the code it looks like it is only retrying the initial connection https://github.com/planetlabs/planet-client-python/blob/79d9a3cb952fcd4f75e5e935ee067455580c779d/planet/http.py#L411C38-L411C38. If read timeout errors happen at planet-client-python/planet/clients/orders.py Line 259 in 79d9a3c
Calling out this very insightful quote as input into the Python API docs effort #994:
script to create many orders for testing, run with
import asyncio

import planet


async def create(count=1):
    item_ids = ['20230719_071823_96_2479']

    # One order request per index, all for the same item.
    requests = [
        planet.order_request.build_request(
            name=str(i),
            products=[
                planet.order_request.product(item_ids=item_ids,
                                             product_bundle='analytic_udm2',
                                             item_type='PSScene')],
        )
        for i in range(count)]

    async with planet.Session() as s:
        client = s.client('orders')
        # Create all orders concurrently; the first failure propagates.
        orders = await asyncio.gather(*[
            _create_order(client, request)
            for request in requests
        ])

    for o in orders:
        print(o['id'])


async def _create_order(client, order_detail):
    with planet.reporting.StateBar(state='creating') as reporter:
        order = await client.create_order(order_detail)
        reporter.update(state='created', order_id=order['id'])
    return order


asyncio.run(create(count=100))

Interestingly, one out of three runs of this script got the following error:
Speaking to the above error, what is needed is
import asyncio

import planet


async def create(count=1):
    item_ids = ['20230719_071823_96_2479']

    requests = [
        planet.order_request.build_request(
            name=str(i),
            products=[
                planet.order_request.product(item_ids=item_ids,
                                             product_bundle='analytic_udm2',
                                             item_type='PSScene')],
        )
        for i in range(count)]

    async with planet.Session() as s:
        client = s.client('orders')
        # return_exceptions=True keeps one failed order from aborting the rest.
        orders = await asyncio.gather(*[
            _create_order(client, request)
            for request in requests
        ], return_exceptions=True)


async def _create_order(client, order_detail):
    with planet.reporting.StateBar(state='creating') as reporter:
        order = await client.create_order(order_detail)
        reporter.update(state='created', order_id=order['id'])
        print(order['id'])


asyncio.run(create(count=100))
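To see the effect of return_exceptions=True in isolation, here is a minimal sketch that needs no Planet account. fake_create_order is a hypothetical stand-in for client.create_order and is not part of the SDK; it fails on every third call so that both outcomes show up in the results.

```python
import asyncio


async def fake_create_order(i):
    """Hypothetical stand-in for client.create_order(); every third call fails."""
    await asyncio.sleep(0)
    if i % 3 == 0:
        raise RuntimeError(f'order {i} failed')
    return {'id': f'order-{i}'}


async def create_all(count):
    # With return_exceptions=True, a failure is returned in its slot
    # instead of propagating out of gather() and hiding the other results.
    results = await asyncio.gather(
        *[fake_create_order(i) for i in range(count)],
        return_exceptions=True)
    created = [r['id'] for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return created, failed


created, failed = asyncio.run(create_all(6))
print('created:', created)
print('failed:', len(failed))
```

Without return_exceptions=True, the first RuntimeError would be raised from gather() and the IDs of the orders that did succeed would be lost to the caller.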
script to download many orders that were created already and recorded in
import asyncio

import planet


async def download(count=1, directory_path='downloads'):
    async with planet.Session() as s:
        client = s.client('orders')

        with open('oids.txt', 'r') as f:
            order_ids = f.readlines()
        oids = [order_id.strip() for order_id in order_ids[:count]]

        results = await asyncio.gather(*[
            _download_order(client, oid, directory_path)
            for oid in oids], return_exceptions=True)

    for oid, result in zip(oids, results):
        # BaseException rather than Exception, so that a returned
        # asyncio.CancelledError is also reported as a failure.
        if isinstance(result, BaseException):
            print(f'Failed download: {oid}')
            print(result)
        else:
            print(f'Successful download: {oid}')


async def _download_order(client, order_id, directory):
    with planet.reporting.StateBar(state='waiting') as reporter:
        await client.wait(order_id, callback=reporter.update_state, max_attempts=0, delay=7)
        await client.download_order(order_id, directory, progress_bar=True, overwrite=True)


asyncio.run(download(count=100))
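The download script reads order IDs from oids.txt, one per line. A minimal sketch of that hand-off between the creation script and the download script might look like the following. save_order_ids and load_order_ids are hypothetical helper names, and unlike the script above this version skips blank lines before applying the count.

```python
import os
import tempfile


def save_order_ids(order_ids, path):
    """Write one order ID per line."""
    with open(path, 'w') as f:
        for oid in order_ids:
            f.write(f'{oid}\n')


def load_order_ids(path, count=None):
    """Read order IDs back, skipping blank lines; optionally keep the first `count`."""
    with open(path) as f:
        oids = [line.strip() for line in f if line.strip()]
    return oids if count is None else oids[:count]


# Round trip through a temporary file so the example leaves nothing behind
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), 'oids.txt')
save_order_ids(['id-a', 'id-b', 'id-c'], path)
print(load_order_ids(path, count=2))
```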
This works now. I refactored the code a little bit for my purpose and it looks like this.

# Imports assumed by the snippet (not shown in the original comment).
import asyncio
import os
from datetime import datetime

import pandas as pd
import planet
from planet import Auth


def activate_and_download_orders(api_key, input_file):
    auth = Auth.from_key(api_key)

    def activate_order_wrapper(input_file):
        list_of_order_ids = []

        async def create(df):
            list_of_requests = []
            for index, row in df.iterrows():
                temp_date = datetime.strptime(row['fulfilled_date'], "%Y-%m-%d").strftime("%Y%m%d")
                name = f"{temp_date}_SKYSAT_{row['order_name']}"
                # item_id holds a stringified list of lists; flatten it.
                item_ids = eval(row['item_id'])
                item_ids = [item for sublist in item_ids for item in sublist]
                list_of_requests.append(planet.order_request.build_request(
                    name=name,
                    products=[  # see if the delivery function of order_request can be used here to directly download the zip file
                        planet.order_request.product(item_ids=item_ids,
                                                     product_bundle='pansharpened_udm2',
                                                     item_type='SkySatCollect')],
                    delivery=planet.order_request.delivery(
                        archive_type='zip',
                        single_archive=True,
                        archive_filename=f'{name}.zip')))

            async with planet.Session(auth=auth) as s:
                client = s.client('orders')
                orders = await asyncio.gather(*[
                    _create_order(client, request)
                    for request in list_of_requests
                ], return_exceptions=True)

        async def _create_order(client, order_detail):
            with planet.reporting.StateBar(state='creating') as reporter:
                order = await client.create_order(order_detail)
                reporter.update(state='created', order_id=order['id'])
                list_of_order_ids.append(order['id'])

        df = pd.read_csv(input_file)
        asyncio.run(create(df))
        return list_of_order_ids

    list_of_orders_to_be_downloaded = activate_order_wrapper(input_file)

    def download_order_wrapper(list_of_orders_to_be_downloaded):
        directory = "./orders"
        # Create the directory if it does not exist yet.
        os.makedirs(directory, exist_ok=True)

        async def download(list_of_orders_to_be_downloaded, directory_path):
            async with planet.Session(auth=auth) as s:
                client = s.client('orders')
                oids = list(list_of_orders_to_be_downloaded)
                results = await asyncio.gather(*[
                    _download_order(client, oid, directory_path)
                    for oid in oids], return_exceptions=True)

            for oid, result in zip(oids, results):
                if isinstance(result, BaseException):
                    print(f'Failed download: {oid}')
                    print(result)
                else:
                    print(f'Successful download: {oid}')

        async def _download_order(client, order_id, directory):
            with planet.reporting.StateBar(state='waiting') as reporter:
                await client.wait(order_id, callback=reporter.update_state, max_attempts=0, delay=7)
                await client.download_order(order_id, directory, progress_bar=True, overwrite=True)

        asyncio.run(download(list_of_orders_to_be_downloaded, directory))

    download_order_wrapper(list_of_orders_to_be_downloaded)

    # for UNIX systems
    # write the same for Windows Anaconda Prompt
    os.system('mv ./orders/*/*.zip ./orders/')
    # os.system(move /Y .\orders\*\*.zip .\orders\)

This can be even better if the user types two commands: one for activation and one for downloading. I am gonna ask them if they will agree to it.
But it still failed for two of the orders, and I am unsure why it's happening. I just got a
Yeah, we still need to add retry to the download. I'm working on that. These scripts are mostly designed to home in on and trigger the error, which they are doing spectacularly =) And the idea is that they won't trigger the error when retry is added. Stay tuned!
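Until retry lands in the client, one possible stopgap is to wrap each download call in a small retry helper on the caller's side. The sketch below is not the SDK's implementation: retry_async is a hypothetical helper, and flaky_download is a stand-in that simulates two read timeouts before succeeding. In real use, retry_on would presumably be narrowed to the specific errors seen in the tracebacks (for example httpx.ReadTimeout) rather than every Exception.

```python
import asyncio


async def retry_async(func, *args, attempts=3, base_delay=0.01,
                      retry_on=(Exception,)):
    """Await func(*args), retrying on `retry_on` with exponential backoff.

    The last error is re-raised once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await func(*args)
        except retry_on:
            if attempt == attempts:
                raise
            # Delay doubles after each failure: base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


calls = {'n': 0}


async def flaky_download(order_id):
    """Hypothetical stand-in for a download that fails twice, then succeeds."""
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('simulated read timeout')
    return f'{order_id}.zip'


result = asyncio.run(retry_async(flaky_download, 'order-1'))
print(result, 'after', calls['n'], 'calls')
```

In the download script, the same helper could wrap the client.download_order call for each order ID. Whether a retried download resumes or restarts the file depends on the client, so this is only a starting point.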
Under high download concurrency, httpcore and httpx errors propagate up from the StreamingBody instance at https://github.com/planetlabs/planet-client-python/blob/main/planet/clients/orders.py#L259. These errors do not manifest at lower concurrency. Streaming responses is a strategy used to keep the memory footprint of programs manageable while downloading multiple large (up to ~100 MB) TIFFs concurrently.

Possible lead: the same kind of asyncio.exceptions.CancelledError is mentioned at agronholm/anyio#534, which was closed with the conclusion that callers have to expect read timeouts and work around them.

Possible workaround: separate order creation from order download. Order creation is more reliable and, when it does fail, fails differently. It is probably less complicated to retry order downloads if they are de-interleaved from order creation. This project has tended to document order creation and download as tasks that are done together, but that may not be a best practice for large batches of orders.
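Since the errors do not manifest at lower concurrency, another caller-side option is to cap the number of downloads in flight. Here is a minimal sketch using asyncio.Semaphore. gather_limited is a hypothetical helper, not part of the SDK, and fake_download stands in for the real download call so the example runs offline. The limit of 5 is arbitrary.

```python
import asyncio


async def gather_limited(coro_funcs, limit):
    """Run zero-argument coroutine functions with at most `limit` in flight.

    Results (or exceptions) come back in input order.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(func):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await func()

    return await asyncio.gather(*[run(f) for f in coro_funcs],
                                return_exceptions=True)


async def main():
    active = 0
    peak = 0

    async def fake_download(i):
        """Hypothetical stand-in for a download; tracks peak concurrency."""
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return i

    # Bind i as a default argument so each lambda keeps its own value.
    funcs = [lambda i=i: fake_download(i) for i in range(20)]
    results = await gather_limited(funcs, limit=5)
    return results, peak


results, peak = asyncio.run(main())
print('peak concurrency:', peak)
```

All 20 tasks are still scheduled at once, but only a bounded number hold an open stream at any moment, which keeps the memory benefit of streaming while reducing load on the connection pool.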
Traceback 1:
Traceback 2:
cc @aayushmalik