Skip to content

ve2: add support for create/destroy hw context#560

Merged
mamin506 merged 14 commits intoamd:mainfrom
rajkumar-xilinx:28may_hwctx
Jun 12, 2025
Merged

ve2: add support for create/destroy hw context#560
mamin506 merged 14 commits intoamd:mainfrom
rajkumar-xilinx:28may_hwctx

Conversation

@rajkumar-xilinx
Copy link
Contributor

-Support hw context create/destroy ioctls
-Add new ve2 resourse solver drivers to manage aie columns
resources among hw contexts.
-Add new ve2 management driver for cert firmware

-Support hw context create/destroy ioctls
-Add new ve2 resourse solver drivers to manage aie columns
 resources among hw contexts.
-Add new ve2 management driver for cert firmware

Signed-off-by: Raj Kumar Rampelli <[email protected]>
@rajkumar-xilinx rajkumar-xilinx marked this pull request as draft May 27, 2025 19:39
@EugeneHiew
Copy link

Can one of the admins verify this patch?

@rajkumar-xilinx
Copy link
Contributor Author

Internal review in progress, so marked as draft.

@mamin506 mamin506 requested a review from chvamshi-xilinx May 28, 2025 16:46
@@ -0,0 +1,219 @@
/* SPDX-License-Identifier: GPL-2.0 */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Handshake region is got updated in CERT. Could you please check once and update accordingly

xrs_cfg.ddev = &xdna->ddev;
xrs_cfg.total_col = XRS_MAX_COL;

if (xdna->dev_handle)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xdna->dev_handle
This one checks twice.

Suggested change
if (xdna->dev_handle)
if (xdna->dev_handle) {
xdna->dev_handle->xrs_hdl = xrsm_init(&xrs_cfg);
if (!xdna->dev_handle->xrs_hdl) {
XDNA_ERR(xdna, "Initialize resolver failed");
drm_dev_put(&xdna->ddev);
return -EINVAL;
}
}


total_col = xrs_get_total_cols(xdna->dev_handle->xrs_hdl);
if (total_col < 0)
return -EINVAL;
Copy link
Contributor

@saifuddin-xilinx saifuddin-xilinx Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before return we should free xrs_req memory also unlock the lock

ret = cert_setup_partition(aie_dev, col, start_col, num_col, hsa_addr);
if (ret < 0) {
XDNA_ERR(xdna, "cert_setup_partition() err %d for col %d", ret, start_col);
return ret;
Copy link
Contributor

@saifuddin-xilinx saifuddin-xilinx Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are returning from here please teardown and release the aie partition

Comment on lines 226 to 231
for (u32 col = start_col; col < start_col + num_col; col++) {
rel_col = col - start_col;
ret = cert_clear_partition(xdna, priv->aie_dev, rel_col);
if (ret < 0)
XDNA_ERR(xdna, "cert_clear_partition() err %d for col %d", ret, rel_col);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the loop.

Suggested change
for (u32 col = start_col; col < start_col + num_col; col++) {
rel_col = col - start_col;
ret = cert_clear_partition(xdna, priv->aie_dev, rel_col);
if (ret < 0)
XDNA_ERR(xdna, "cert_clear_partition() err %d for col %d", ret, rel_col);
}
for (u32 col = 0; col < num_col; col++) {
ret = cert_clear_partition(xdna, priv->aie_dev, col);
if (ret < 0)
XDNA_ERR(xdna, "cert_clear_partition() err %d for col %d", ret, col);
}

aie_dev = aie_partition_request(&request);
if (IS_ERR(aie_dev)) {
XDNA_ERR(xdna, "aie parition request failed, error %ld", PTR_ERR(aie_dev));
return -ENODEV;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please release the resource if failed case xrs_release_resource
Same subsequently

@mamin506
Copy link
Contributor

mamin506 commented Jun 4, 2025

ok to test

@mamin506 mamin506 marked this pull request as ready for review June 4, 2025 17:17
Signed-off-by: Raj Kumar Rampelli <[email protected]>
Comment on lines 103 to 105
if (!xdna->dev_info->ops->ctx_init)
return -EOPNOTSUPP;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why adding this check? I don't think we need this check. I don't see any device that doesn't support this ctx_init() callback.
Please let me know what the justification for this is. Otherwise, please remove.

#include <drm/drm_managed.h>

#include "amdxdna_of_drv.h"
#include "ve2_res_solver.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NEVER include aie2_, aie4_ and ve2_ header in amdxdna_ files. Please remove this line and make necessary changes.

u16 minor;
}
version;
//Queue capacity, must be a power of two.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use WARN_ON() instead of comment. Delete this comment pls.

Comment on lines 51 to 59
union {
struct {
u16 type: 8;
u16 barrier: 1;
u16 acquire_fence_scope: 2;
u16 release_fence_scope: 2;
};
u16 header;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not suggesting coding style. Ref aie2_msg_priv.h

If this triggers many codes change, we can merge this for open source. But this needs to be refine when upstreaming.

u32 size; //5c
}
dbg_buf;
volatile struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put a comment to explain why 'volatile' is needed for this but not for above fields?

Comment on lines 175 to 216
// number of checks whether there are jobs ready
u32 c_job_readiness_checked;
// number of opcode run
u32 c_opcode;
u32 c_job_launched;
u32 c_job_finished;
// number of hsa pkt handled
u32 c_hsa_pkt;
// number of pages loaded
u32 c_page;
// number of hsa doorbell ring
u32 c_doorbell;
// number of uc memory(PM) scrub
u32 c_uc_scrub;
// number of tct requested
u32 c_tct_requested;
// number of tct received
u32 c_tct_received;
// run out of wait handle UC_DMA_WRITE_DES opcode
u16 c_preemption_ucdma;
// run out of wait handle UC_DMA_WRITE_DES_SYNC opcode
u16 c_preemption_ucdma_sync;
// POLL_32 opcode retry times
u16 c_preemption_poll;
// MASK_POLL_32 opcode retry times
u16 c_preemption_mask_poll;
// run out of physical barrier REMOTE_BARRIER opcode
u16 c_preemption_remote_barrier;
// actor entry overflow or run out of wait handle WAIT_TCTS opcode
u16 c_preemption_wait_tct;
// block UC_DMA_WRITE_DES opcode
u16 c_block_ucdma;
// block UC_DMA_WRITE_DES_SYNC opcode
u16 c_block_ucdma_sync;
// block local_barrier opcode
u16 c_block_local_barrier;
// block REMOTE_BARRIER opcode
u16 c_block_remote_barrier;
// block WAIT_TCTS opcode
u16 c_block_wait_tct;
// number of slow actor entry lookup
u16 c_actor_hash_conflict;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of comment, use readable field name is prefer.

Comment on lines 116 to 117
free_host_queue:
ve2_free_hsa_queue(xdna, &hwctx->priv->hwctx_hsa_queue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are host_queue and hsa_queu the same? Even they are the same, please use consistent terminology to avoid confusion.

Comment on lines 40 to 47
static void ve2_irq_handler(u32 partition_id, void *cb_arg)
{
struct amdxdna_ctx *hwctx = (struct amdxdna_ctx *)cb_arg;
struct amdxdna_dev *xdna = hwctx->client->xdna;

XDNA_DBG(xdna, "Created partition for start_col %d, num_col %d with partition_id %d\n",
hwctx->start_col, hwctx->num_col, partition_id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does nothing but only print a debug message. Is this expected?

This function doesn't look like a Linux IRQ handler. If my understanding is correct, please rename this to something else. This is more like a callback function.

Raj Kumar Rampelli added 3 commits June 10, 2025 10:34
@rajkumar-xilinx rajkumar-xilinx requested a review from mamin506 June 10, 2025 06:53
Signed-off-by: Raj Kumar Rampelli <[email protected]>
@mamin506
Copy link
Contributor

The checkpatch step failed in the CI. Please fix that.

Copy link
Contributor

@mamin506 mamin506 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good to me. Just some minor changes and suggestion. Please take a look. The upper_32_bits() macro is a must have change.

Comment on lines 35 to 37
#define ADDR64_HIGH(x) (((x) >> 32) & 0xFFFFFFFF)
#define ADDR64_LOW(x) ((x) & 0xFFFFFFFF)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linux has predefined macro for the same purpose. Please use those macros,
See https://elixir.bootlin.com/linux/v6.15/source/include/linux/wordpart.h#L14

Comment on lines 163 to 164
/* save aie_dev into priv */
priv->aie_dev = aie_dev;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary comment. :D

Comment on lines 90 to 92
if (!xrs)
return -EINVAL;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xrt_hdl is assigned when probe the device. It is OK to remove this check.
Unless xrs can be dynamically load/unload, and maybe the caller of this function doesn't aware of it. You can check such a fundamental struct. :)

@mamin506 mamin506 self-requested a review June 11, 2025 22:35
@mamin506
Copy link
Contributor

retest this please

@mamin506 mamin506 merged commit 84455db into amd:main Jun 12, 2025
1 check passed
@rajkumar-xilinx rajkumar-xilinx deleted the 28may_hwctx branch June 13, 2025 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants