arm_compute v18.03
Jenkins committed Mar 2, 2018
1 parent 06ea048 commit e40997b
Showing 5,024 changed files with 22,978 additions and 21,544 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
18 changes: 18 additions & 0 deletions .github/issue_template.md
@@ -0,0 +1,18 @@
<!--
Please fill the fields below in order to help us diagnose the issue. If you have a
general question or a problem with the scripts, you can ignore these fields.
-->

**Output of 'strings libarm_compute.so | grep arm_compute_version':**

**Platform:**

**Operating System:**


<!--
Please describe the issue (error, expected behaviour etc) and steps to reproduce it. If possible,
share the shortest code necessary that reproduces the issue.
-->

**Problem description:**
14 changes: 11 additions & 3 deletions README.md
@@ -1,9 +1,14 @@

:warning: **Deprecation notice: QS8 and QS16 data types will be removed in the next release** (As far as we know nobody uses these data types, if you do or think they are useful please open an Issue or send us an email):warning:
:warning: **Deprecation notice: QS8 and QS16 data types will be removed in the 18.05 release** (As far as we know nobody uses these data types, if you do or think they are useful please open an Issue or send us an email):warning:

Please report issues here: https://github.com/ARM-software/ComputeLibrary/issues
**Make sure you are using the latest version of the library before opening an issue. Thanks**

News:

We're hiring: [Senior Machine Learning C++ Software Engineer](https://careers.peopleclick.com/careerscp/client_arm/external/jobDetails.do?functionName=getJobDetail&jobPostId=36246&localeCode=en-us)
Come talk to us: [Gian Marco will be presenting his work at the EVS](https://www.embedded-vision.com/summit/even-faster-cnns-exploring-new-class-winograd-algorithms)

Related projects:

- [Caffe on Compute Library](https://github.com/OAID/Caffe-HRT)
@@ -12,6 +17,7 @@ Related projects:

Documentation available here:

- [v18.03](https://arm-software.github.io/ComputeLibrary/v18.03/)
- [v18.02](https://arm-software.github.io/ComputeLibrary/v18.02/)
- [v18.01](https://arm-software.github.io/ComputeLibrary/v18.01/)
- [v17.12](https://arm-software.github.io/ComputeLibrary/v17.12/)
@@ -24,8 +30,10 @@ Documentation available here:

Binaries available here:

- [v18.02-linux](https://github.com/ARM-software/ComputeLibrary/releases/download/v18.01/arm_compute-v18.02-bin-linux.tar.gz)
- [v18.02-android](https://github.com/ARM-software/ComputeLibrary/releases/download/v18.01/arm_compute-v18.02-bin-android.tar.gz)
- [v18.03-linux](https://github.com/ARM-software/ComputeLibrary/releases/download/v18.03/arm_compute-v18.03-bin-linux.tar.gz)
- [v18.03-android](https://github.com/ARM-software/ComputeLibrary/releases/download/v18.03/arm_compute-v18.03-bin-android.tar.gz)
- [v18.02-linux](https://github.com/ARM-software/ComputeLibrary/releases/download/v18.02/arm_compute-v18.02-bin-linux.tar.gz)
- [v18.02-android](https://github.com/ARM-software/ComputeLibrary/releases/download/v18.02/arm_compute-v18.02-bin-android.tar.gz)
- [v18.01](https://github.com/ARM-software/ComputeLibrary/releases/download/v18.01/arm_compute-v18.01-bin.tar.gz)
- [v17.12](https://github.com/ARM-software/ComputeLibrary/releases/download/v17.12/arm_compute-v17.12-bin.tar.gz)
- [v17.10](https://github.com/ARM-software/ComputeLibrary/releases/download/v17.10/arm_compute-v17.10-bin.tar.gz)
4 changes: 2 additions & 2 deletions SConscript
@@ -24,8 +24,8 @@ import os.path
import re
import subprocess

VERSION = "v18.02"
SONAME_VERSION="9.0.0"
VERSION = "v18.03"
SONAME_VERSION="10.0.0"

Import('env')
Import('vars')
2 changes: 1 addition & 1 deletion arm_compute/core/Dimensions.h
@@ -50,7 +50,7 @@ class Dimensions
*/
template <typename... Ts>
explicit Dimensions(Ts... dims)
: _id{ { dims... } }, _num_dimensions{ sizeof...(dims) }
: _id{ { static_cast<T>(dims)... } }, _num_dimensions{ sizeof...(dims) }
{
}
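
One plausible reading of this change: inside a braced initializer, passing an `int` argument where `T` is an unsigned type is a narrowing conversion, which list-initialization rejects (compilers report it as an error or a warning). Casting each pack element to `T` makes the conversion explicit. The sketch below reproduces only the constructor pattern; `DimsSketch` and `MaxDims` are hypothetical names, not the library's class.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-capacity dimensions container.
// Only the variadic constructor pattern from the diff is reproduced here.
template <typename T, std::size_t MaxDims = 6>
class DimsSketch
{
public:
    // Each argument is cast to T explicitly, so mixed argument types
    // (int, unsigned, long, ...) do not trigger a narrowing diagnostic.
    template <typename... Ts>
    explicit DimsSketch(Ts... dims)
        : _id{ { static_cast<T>(dims)... } }, _num_dimensions{ sizeof...(dims) }
    {
    }

    T operator[](std::size_t i) const
    {
        return _id[i];
    }

    std::size_t num_dimensions() const
    {
        return _num_dimensions;
    }

private:
    std::array<T, MaxDims> _id;          // unused trailing slots are zero-initialized
    std::size_t            _num_dimensions;
};
```

With this sketch, `DimsSketch<std::size_t> d(3, 4U, 5L);` compiles even though the three arguments have different integer types.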

4 changes: 2 additions & 2 deletions arm_compute/core/NEON/kernels/convolution/winograd/gemm.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -65,11 +65,11 @@ inline void BlockedGemm(
const int c_row_stride
) {
// Array access methods
const auto A = [a, M, K, a_row_stride] (const int i, const int j) -> TIn {
const auto A = [a, a_row_stride] (const int i, const int j) -> TIn {
return a[i*a_row_stride + j];
};

const auto B = [b, K, N, b_row_stride] (const int i, const int j) -> TIn {
const auto B = [b, b_row_stride] (const int i, const int j) -> TIn {
return b[i*b_row_stride + j];
};
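
The accessor lambdas only read the base pointer and the row stride, so the matrix extents do not need to be captured; some compilers warn about captures that are never used. A minimal sketch of the trimmed-down accessor, wrapped in a hypothetical helper `element_at` (not part of the library) so it can be exercised on its own:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: reads element (row, col) of a row-major matrix
// whose consecutive rows are a_row_stride elements apart.
template <typename TIn>
TIn element_at(const TIn *a, const int a_row_stride, const int row, const int col)
{
    // Capture only what the body uses: the pointer and the stride.
    const auto A = [a, a_row_stride](const int i, const int j) -> TIn {
        return a[i * a_row_stride + j];
    };
    return A(row, col);
}
```

The stride may be larger than the logical row width, which is why it is passed separately from the matrix dimensions.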

20 changes: 11 additions & 9 deletions arm_compute/core/utils/logging/Macros.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2017 ARM Limited.
* Copyright (c) 2017-2018 ARM Limited.
*
* SPDX-License-Identifier: MIT
*
@@ -50,14 +50,16 @@
} \
} while(false)

#define ARM_COMPUTE_LOG_STREAM(logger_name, log_level, stream) \
do \
{ \
auto __logger = arm_compute::logging::LoggerRegistry::get().logger(logger_name); \
if(__logger != nullptr) \
{ \
__logger->log(log_level, static_cast<std::ostringstream &>(std::ostringstream() << stream).str()); \
} \
#define ARM_COMPUTE_LOG_STREAM(logger_name, log_level, stream) \
do \
{ \
auto __logger = arm_compute::logging::LoggerRegistry::get().logger(logger_name); \
if(__logger != nullptr) \
{ \
std::ostringstream s; \
s << stream; \
__logger->log(log_level, s.str()); \
} \
} while(false)

#else /* ARM_COMPUTE_LOGGING_ENABLED */
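
The rewritten macro streams into a named local `std::ostringstream` instead of a temporary that is cast back from the `std::ostream &` returned by `operator<<`. The sketch below shows the same pattern in isolation, assuming a hypothetical `SketchLogger` sink and macro name `SKETCH_LOG_STREAM`; neither is part of the library.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sink standing in for a real logger; it just stores messages.
struct SketchLogger
{
    std::vector<std::string> lines;

    void log(const std::string &msg)
    {
        lines.push_back(msg);
    }
};

// The `stream` argument is pasted after `sketch_ss <<`, so callers can chain
// several insertions, e.g. SKETCH_LOG_STREAM(p, "x=" << 42).
// The do/while(false) wrapper makes the macro behave like a single statement.
#define SKETCH_LOG_STREAM(logger_ptr, stream)   \
    do                                          \
    {                                           \
        if((logger_ptr) != nullptr)             \
        {                                       \
            std::ostringstream sketch_ss;       \
            sketch_ss << stream;                \
            (logger_ptr)->log(sketch_ss.str()); \
        }                                       \
    } while(false)
```

Because the stream is an lvalue of its concrete type, no downcast is needed to call `str()`, and nothing is formatted when the logger pointer is null.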
82 changes: 47 additions & 35 deletions arm_compute/runtime/CL/CLTuner.h
@@ -37,26 +37,43 @@ class ICLKernel;
class CLTuner : public ICLTuner
{
public:
/** Constructor */
CLTuner();
/** Constructor
*
* @param[in] tune_new_kernels Find the optimal local workgroup size for kernels which are not present in the table ?
*
*/
CLTuner(bool tune_new_kernels = true);

/** Destructor */
~CLTuner() = default;

/** Setter for tune_new_kernels option
*
* @param[in] tune_new_kernels Find the optimal local workgroup size for kernels which are not present in the table ?
*/
void set_tune_new_kernels(bool tune_new_kernels);
/** Tune kernels that are not in the LWS table
*
* @return True if tuning of new kernels is enabled.
*/
bool tune_new_kernels() const;
/** Manually add a LWS for a kernel
*
* @param[in] kernel_id Unique identifiant of the kernel
* @param[in] optimal_lws Optimal local workgroup size to use for the given kernel
*/
void add_lws_to_table(const std::string &kernel_id, cl::NDRange optimal_lws);
/** Import LWS table
*
* @param[in] lws_table The unordered_map container to import
*/
void import_lws_table(const std::unordered_map<std::string, cl::NDRange> &lws_table);

/** Export LWS table
/** Give read access to the LWS table
*
* return The lws table as unordered_map container
*/
const std::unordered_map<std::string, cl::NDRange> &export_lws_table();

// Inherited methods overridden:
void tune_kernel(ICLKernel &kernel) override;
const std::unordered_map<std::string, cl::NDRange> &lws_table() const;

/** Set the OpenCL kernel event
*
@@ -66,7 +83,28 @@ class CLTuner : public ICLTuner
*/
void set_cl_kernel_event(cl_event kernel_event);

std::function<decltype(clEnqueueNDRangeKernel)> real_function;
std::function<decltype(clEnqueueNDRangeKernel)> real_clEnqueueNDRangeKernel;

/** Load the LWS table from file
*
* @param[in] filename Load the LWS table from this file.(Must exist)
*/
void load_from_file(const std::string &filename);

/** Save the content of the LWS table to file
*
* @param[in] filename Save the LWS table to this file. (Content will be overwritten)
*/
void save_to_file(const std::string &filename) const;

// Inherited methods overridden:
void tune_kernel(ICLKernel &kernel) override;

/** Is the kernel_event set ?
*
* @return true if the kernel_event is set.
*/
bool kernel_event_is_set() const;

private:
/** Find optimal LWS using brute-force approach
@@ -81,33 +119,7 @@ class CLTuner : public ICLTuner
cl::CommandQueue _queue;
cl::CommandQueue _queue_profiler;
cl::Event _kernel_event;
};

/* Function to be used to intercept kernel enqueues and store their OpenCL Event */
class Interceptor
{
public:
explicit Interceptor(CLTuner &tuner);

/** clEnqueueNDRangeKernel interface
*
* @param[in] command_queue A valid command-queue. The kernel will be queued for execution on the device associated with command_queue.
* @param[in] kernel A valid kernel object. The OpenCL context associated with kernel and command_queue must be the same.
* @param[in] work_dim The number of dimensions used to specify the global work-items and work-items in the work-group. work_dim must be greater than zero and less than or equal to CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS.
* @param[in] gwo Global-Workgroup-Offset. It can be used to specify an array of work_dim unsigned values that describe the offset used to calculate the global ID of a work-item. If global_work_offset is NULL, the global IDs start at offset (0, 0, ... 0).
* @param[in] gws Global-Workgroup-Size. Points to an array of work_dim unsigned values that describe the number of global work-items in work_dim dimensions that will execute the kernel function.
* @param[in] lws Local-Workgroup-Size. Points to an array of work_dim unsigned values that describe the number of work-items that make up a work-group
* @param[in] num_events_in_wait_list Number of events in the waiting list
* @param[in] event_wait_list Event waiting list
* @param[in] event OpenCL kernel event
*
* @return the OpenCL status
*/
cl_int operator()(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *gwo, const size_t *gws, const size_t *lws, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event);

private:
CLTuner &_tuner;
bool _tune_new_kernels;
};
}
#endif /*__ARM_COMPUTE_CLTUNER_H__ */
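
To illustrate how the new table-related methods fit together, here is a self-contained sketch that mirrors the method names declared in the header above but does not use OpenCL. Several things are assumptions made for illustration only: the class name `LwsTableSketch`, the `Lws` alias standing in for `cl::NDRange`, the kernel id used in the usage note, and above all the on-disk format (one `kernel_id;x;y;z` record per line), which the header does not specify.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

using Lws = std::array<std::size_t, 3>; // stand-in for cl::NDRange (assumption)

class LwsTableSketch
{
public:
    explicit LwsTableSketch(bool tune_new_kernels = true)
        : _tune_new_kernels(tune_new_kernels)
    {
    }

    void set_tune_new_kernels(bool tune_new_kernels) { _tune_new_kernels = tune_new_kernels; }
    bool tune_new_kernels() const { return _tune_new_kernels; }

    void add_lws_to_table(const std::string &kernel_id, Lws optimal_lws)
    {
        _table[kernel_id] = optimal_lws;
    }

    const std::unordered_map<std::string, Lws> &lws_table() const { return _table; }

    // Overwrites the file; the "id;x;y;z" record layout is an assumed format.
    void save_to_file(const std::string &filename) const
    {
        std::ofstream out(filename, std::ios::trunc);
        if(!out)
        {
            throw std::runtime_error("cannot open " + filename + " for writing");
        }
        for(const auto &entry : _table)
        {
            out << entry.first << ';' << entry.second[0] << ';'
                << entry.second[1] << ';' << entry.second[2] << '\n';
        }
    }

    // The file must exist; malformed lines are skipped.
    void load_from_file(const std::string &filename)
    {
        std::ifstream in(filename);
        if(!in)
        {
            throw std::runtime_error("cannot open " + filename + " for reading");
        }
        std::string line;
        while(std::getline(in, line))
        {
            std::istringstream ss(line);
            std::string        id, x, y, z;
            if(std::getline(ss, id, ';') && std::getline(ss, x, ';')
               && std::getline(ss, y, ';') && std::getline(ss, z))
            {
                _table[id] = Lws{ { static_cast<std::size_t>(std::stoul(x)),
                                    static_cast<std::size_t>(std::stoul(y)),
                                    static_cast<std::size_t>(std::stoul(z)) } };
            }
        }
    }

private:
    std::unordered_map<std::string, Lws> _table{};
    bool                                 _tune_new_kernels;
};
```

A typical flow under these assumptions: tune once with `tune_new_kernels` enabled, call `save_to_file`, then on later runs construct the table with `tune_new_kernels = false` and call `load_from_file` so that only known kernels get a tuned local workgroup size.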

1 comment on commit e40997b

@psyhtest

This latest ArmCL release 18.03 can now be built and tested on an OpenCL platform using Collective Knowledge as follows:

$ sudo pip install ck
$ ck pull repo:ck-math
$ ck install package:lib-armcl-opencl-18.03
$ ck compile program:acl-sgemm-opencl-example
$ ck run program:acl-sgemm-opencl-example
...
{
  "GFLOPS": "19.434593", 
  "GFLOPS_1": "12.255410", 
  "GFLOPS_AVG": "12.255410", 
  "GFLOPS_MAX": "19.434593", 
  "K": "1024", 
  "M": "1024", 
  "N": "1024", 
  "NUM_REPETITIONS": "5", 
  "STATUS": "0", 
  "TIME_AVG": "0.175227", 
  "TIME_MIN": "0.110498", 
  "TIME_REPETITION0": "0.196454", 
  "TIME_REPETITION1": "0.182109", 
  "TIME_REPETITION2": "0.165389", 
  "TIME_REPETITION3": "0.110498", 
  "TIME_REPETITION4": "0.221687", 
  "k": "1024", 
  "m": "1024", 
  "ms_1": "175.227400", 
  "n": "1024", 
  "post_processed": "yes"
}

(The example was run on an Odroid-XU3 board.)
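
The reported figures are consistent with the usual SGEMM convention of counting 2·M·N·K floating-point operations, with `GFLOPS_MAX` derived from `TIME_MIN` and `GFLOPS_AVG` from `TIME_AVG`. That convention is an assumption about the benchmark program, not something stated in the comment; the function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// GFLOPS for an M x N x K matrix multiply, assuming 2*M*N*K operations
// (one multiply and one add per inner-product term).
double sgemm_gflops(double m, double n, double k, double seconds)
{
    return 2.0 * m * n * k / seconds / 1e9;
}

// Arithmetic mean of the per-repetition timings, in seconds.
double mean_seconds(const std::vector<double> &times)
{
    double sum = 0.0;
    for(const double t : times)
    {
        sum += t;
    }
    return sum / static_cast<double>(times.size());
}
```

Plugging in the numbers from the output: the five repetition times average to about 0.175227 s, `sgemm_gflops(1024, 1024, 1024, 0.110498)` gives about 19.43, and using the average time gives about 12.26.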
