I have prepared data at a remote MinIO path and use bulk_import to load it. I use the list_import_jobs method to get the current jobs. It works well when I don't pass the collection_name parameter; however, when I set collection_name, it doesn't return the correct result.
Expected Behavior
When I use list_import_jobs with collection_name, it should return the correct result.
Steps/Code To Reproduce behavior
from minio import Minio
from pymilvus.bulk_writer import list_import_jobs

from db.milvus_fixed import bulk_import

bucket_name = "a-bucket"


def create_minio_client() -> Minio:
    return Minio(
        endpoint='127.0.0.1:9000',
        access_key='minioadmin',
        secret_key='minioadmin',
        secure=False,
    )


if __name__ == '__main__':
    minio_client = create_minio_client()
    minio_path = "/data/da137d38-4ff7-4f5d-b2d4-8debaa3dba18"
    db_name = "local_test"
    collection_name = "test"
    objects = minio_client.list_objects(
        bucket_name=bucket_name,
        prefix=minio_path,
        recursive=True,
    )
    paths = [obj.object_name for obj in objects]

    # Import into the custom database.
    response = bulk_import(
        url="http://localhost:19530",
        collection_name=collection_name,
        files=[[path] for path in paths],
        db_name=db_name,
    )
    job_id = response.json()["data"]["jobId"]
    print("insert to custom db job_id is {}".format(job_id))

    # Import the same files into the default database.
    response = bulk_import(
        url="http://localhost:19530",
        collection_name=collection_name,
        files=[[path] for path in paths],
        db_name="default",
    )
    job_id = response.json()["data"]["jobId"]
    print("insert to default db job_id is {}".format(job_id))

    # List jobs without, then with, the collection_name filter.
    response = list_import_jobs(url="http://127.0.0.1:19530")
    data = (response.json())["data"]
    print("list jobs: {}".format(data))
    response = list_import_jobs(url="http://127.0.0.1:19530", collection_name=collection_name)
    data = (response.json())["data"]
    print("list jobs with collection name: {}".format(data))
And this is the output:
insert to custom db job_id is 454799264148166193
insert to default db job_id is 454799264148166197
list jobs: {'records': [{'collectionName': 'test', 'jobId': '454799264148158255', 'progress': 100, 'state': 'Completed'}, {'collectionName': 'test', 'jobId': '454799264148166193', 'progress': 0, 'state': 'Pending'}, {'collectionName': 'test', 'jobId': '454799264148166197', 'progress': 0, 'state': 'Pending'}]}
list jobs with collection name: {'records': [{'collectionName': 'test', 'jobId': '454799264148166197', 'progress': 0, 'state': 'Pending'}]}
You can see that the job submitted to the non-default database (local_test) is missing from the result when I pass collection_name.
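To make the discrepancy explicit, here is a small sketch that compares the two responses printed above. The helper name `missing_job_ids` is made up for illustration; the records are copied verbatim from the output. Since every record has collectionName 'test', the filtered call would be expected to return all three.

```python
# Records copied from the two responses printed above.
all_jobs = [
    {"collectionName": "test", "jobId": "454799264148158255", "progress": 100, "state": "Completed"},
    {"collectionName": "test", "jobId": "454799264148166193", "progress": 0, "state": "Pending"},
    {"collectionName": "test", "jobId": "454799264148166197", "progress": 0, "state": "Pending"},
]
filtered_jobs = [
    {"collectionName": "test", "jobId": "454799264148166197", "progress": 0, "state": "Pending"},
]


def missing_job_ids(unfiltered, filtered, collection_name):
    """Return IDs of jobs that match collection_name in the unfiltered
    list but are absent from the filtered list (hypothetical helper)."""
    expected = {r["jobId"] for r in unfiltered if r["collectionName"] == collection_name}
    returned = {r["jobId"] for r in filtered}
    return sorted(expected - returned)


print(missing_job_ids(all_jobs, filtered_jobs, "test"))
# ['454799264148158255', '454799264148166193']
```

The second ID is the job submitted with db_name="local_test"; the first is the already completed job that also appears only in the unfiltered list.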
The bulk_import function is modified to support a db_name parameter, as PR #2446 does.
from typing import List, Optional

import requests

# _post_request and _handle_response are helpers defined elsewhere in this module.


## bulkinsert RESTful api wrapper
def bulk_import(
    url: str,
    collection_name: str,
    db_name: str = "default",
    files: Optional[List[List[str]]] = None,
    object_url: str = "",
    cluster_id: str = "",
    api_key: str = "",
    access_key: str = "",
    secret_key: str = "",
    **kwargs,
) -> requests.Response:
    """call bulkinsert restful interface to import files

    Args:
        url (str): url of the server
        collection_name (str): name of the target collection
        db_name (str): name of database
        partition_name (str): name of the target partition
        files (list of list of str): The files that contain the data to import.
            A sub-list contains a single JSON or Parquet file, or a set of Numpy files.
        object_url (str): The URL of the object to import.
            This URL should be accessible to the S3-compatible
            object storage service, such as AWS S3, GCS, Azure blob storage.
        cluster_id (str): id of a milvus instance(for cloud)
        api_key (str): API key to authenticate your requests.
        access_key (str): access key to access the object storage
        secret_key (str): secret key to access the object storage

    Returns:
        response of the restful interface
    """
    request_url = url + "/v2/vectordb/jobs/import/create"
    partition_name = kwargs.pop("partition_name", "")
    params = {
        "collectionName": collection_name,
        "partitionName": partition_name,
        "files": files,
        "objectUrl": object_url,
        "clusterId": cluster_id,
        "accessKey": access_key,
        "secretKey": secret_key,
        "dbName": db_name,
    }
    resp = _post_request(url=request_url, api_key=api_key, params=params, **kwargs)
    _handle_response(request_url, resp.json())
    return resp
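For comparison, a listing call could pass the database name the same way. The following is only a sketch under stated assumptions, not the pymilvus implementation: the function names `build_list_import_params` and `list_import_jobs_in_db` are hypothetical, the `/v2/vectordb/jobs/import/list` path and the bearer-token header are assumed by analogy with the create endpoint above, and nothing in this report confirms that the server honors a `dbName` key in list requests.

```python
import json
import urllib.request


def build_list_import_params(collection_name="", db_name="default", cluster_id=""):
    """Build the JSON body for a list-import-jobs request (hypothetical helper)."""
    return {
        "collectionName": collection_name,
        "clusterId": cluster_id,
        # Assumption: same key that the modified bulk_import sends.
        "dbName": db_name,
    }


def list_import_jobs_in_db(url, collection_name="", db_name="default", api_key=""):
    """Hypothetical variant of list_import_jobs that also sends a database name."""
    # Assumption: the list endpoint sits next to the create endpoint.
    request_url = url + "/v2/vectordb/jobs/import/list"
    body = json.dumps(build_list_import_params(collection_name, db_name)).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: bearer-token authentication.
        headers["Authorization"] = "Bearer " + api_key
    req = urllib.request.Request(request_url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With such a variant, calling `list_import_jobs_in_db("http://127.0.0.1:19530", collection_name="test", db_name="local_test")` would show whether the filter works once the database is named explicitly.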
Environment details
Hardware/Software conditions
OS: Windows
CPU: 13th Gen Intel(R) Core(TM) i7-1365U
Method of installation: docker-compose, standalone
inside docker-compose.yaml
Anything else?
No response