Bolt
1.3
C++ template library with support for OpenCL
Classes | |
class | bolt::amp::control |
struct | bolt::amp::control::debug |
Functions | |
bolt::amp::control::control (Concurrency::accelerator accel=getDefault().getAccelerator(), e_UseHostMode useHost=getDefault().getUseHost(), unsigned debug=getDefault().getDebug()) | |
bolt::amp::control::control (const control &ref) | |
void | bolt::amp::control::setAccelerator (::Concurrency::accelerator accel) |
Set the AMP Accelerator for Bolt algorithms to use. | |
void | bolt::amp::control::setUseHost (e_UseHostMode useHost) |
void | bolt::amp::control::setForceRunMode (e_RunMode forceRunMode) |
void | bolt::amp::control::setDebug (unsigned debug) |
void | bolt::amp::control::setWGPerComputeUnit (int wgPerComputeUnit) |
void | bolt::amp::control::setWaitMode (e_WaitMode waitMode) |
void | bolt::amp::control::setUnroll (int unroll) |
Concurrency::accelerator & | bolt::amp::control::getAccelerator () |
const Concurrency::accelerator & | bolt::amp::control::getAccelerator () const |
e_UseHostMode | bolt::amp::control::getUseHost () const |
e_RunMode | bolt::amp::control::getForceRunMode () const |
e_RunMode | bolt::amp::control::getDefaultPathToRun () const |
unsigned | bolt::amp::control::getDebug () const |
int const | bolt::amp::control::getWGPerComputeUnit () const |
e_WaitMode | bolt::amp::control::getWaitMode () const |
int | bolt::amp::control::getUnroll () const |
static control & | bolt::amp::control::getDefault () |
static control & bolt::amp::control::getDefault () [inline, static]
Return the default control structure. This is used for Bolt API calls when the user does not explicitly specify a control structure. Newly created control structures also copy the default structure for their initial values. Note that changes to the default control structure are not automatically copied to already-created control structures. Typically, the default control structure is modified as part of application initialization; control structures created afterwards then pick up the modified defaults.
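The copy-at-construction behavior described above can be illustrated with a small stand-in class. This is a minimal sketch, not Bolt code: `ControlModel` and `defaultControlDemo` are hypothetical names, and only a single setting (`unroll`) is modeled so the example compiles without the Bolt headers or C++ AMP.

```cpp
#include <cassert>

// Hypothetical stand-in for bolt::amp::control. It models only the
// "copy the defaults at construction" behavior, with a single setting.
class ControlModel {
public:
    // A new control takes its initial value from the current default.
    ControlModel() : unroll_(getDefault().unroll_) {}

    // Process-wide default control, created on first use.
    static ControlModel& getDefault() {
        static ControlModel defaults(1);  // 1 is an illustrative initial value
        return defaults;
    }

    void setUnroll(int unroll) { unroll_ = unroll; }
    int getUnroll() const { return unroll_; }

private:
    // Used only to build the default instance without recursing.
    explicit ControlModel(int unroll) : unroll_(unroll) {}
    int unroll_;
};

// Walks through the lifecycle described in the text.
inline void defaultControlDemo() {
    ControlModel::getDefault().setUnroll(4);  // application initialization
    ControlModel early;                       // picks up unroll == 4
    ControlModel::getDefault().setUnroll(8);  // later change to the default
    ControlModel late;                        // picks up unroll == 8
    assert(early.getUnroll() == 4);           // existing control is unchanged
    assert(late.getUnroll() == 8);
}
```

In the model, `early` keeps the value it copied at construction, which mirrors the note that changes to the default are not propagated to already-created control structures.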
void bolt::amp::control::setDebug (unsigned debug) [inline]
Enable debug messages to be printed to stdout as the algorithm is compiled, run, and tuned. See the bolt::amp::control::debug struct for the list of values. Multiple debug options can be combined with the + sign. Combine them in a single call rather than making separate calls to setDebug(); each call replaces the debug level rather than merging with the existing setting.
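The difference between combining options in one call and making separate calls can be sketched with a stand-in class. Everything here is illustrative: `DebugControlModel`, the `debug_model` namespace, and the flag values are hypothetical, and the sketch assumes the options are distinct bit flags (which is why + acts as a merge). The real values live in bolt::amp::control::debug.

```cpp
#include <cassert>

// Hypothetical flag values, assumed to be distinct bits.
namespace debug_model {
const unsigned None = 0x0;
const unsigned Compile = 0x1;
const unsigned ShowCode = 0x2;
}

// Hypothetical stand-in for the debug setting of a control structure.
class DebugControlModel {
public:
    // Replaces the stored level; it does not merge with the old one.
    void setDebug(unsigned debug) { debug_ = debug; }
    unsigned getDebug() const { return debug_; }

private:
    unsigned debug_ = debug_model::None;
};

// Both options are passed in a single call, so both stay active.
inline unsigned combinedInOneCall() {
    DebugControlModel ctl;
    ctl.setDebug(debug_model::Compile + debug_model::ShowCode);
    return ctl.getDebug();
}

// The second call overwrites the first, so only ShowCode stays active.
inline unsigned separateCalls() {
    DebugControlModel ctl;
    ctl.setDebug(debug_model::Compile);
    ctl.setDebug(debug_model::ShowCode);
    return ctl.getDebug();
}

inline void debugDemo() {
    assert(combinedInOneCall() == (debug_model::Compile | debug_model::ShowCode));
    assert(separateCalls() == debug_model::ShowCode);
}
```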
void bolt::amp::control::setForceRunMode (e_RunMode forceRunMode) [inline]
Force the Bolt command to run on the specified device. The default is "Automatic," in which case the Bolt runtime selects the device. Forcing the mode to SerialCpu can be useful for debugging the algorithm. Forcing the mode can also be useful for performance comparisons, or for direct control over the run location (for example, when the algorithm is known to be best suited to the GPU).
void bolt::amp::control::setUnroll (int unroll) [inline]
unroll assignment
void bolt::amp::control::setUseHost (e_UseHostMode useHost) [inline]
If enabled, Bolt can use the host CPU to run parts of the algorithm. If false, Bolt runs the entire algorithm using the device specified by the accelerator. This can be appropriate on a discrete GPU, where the input data is located on the device memory.
void bolt::amp::control::setWaitMode (e_WaitMode waitMode) [inline]
Set the method used to detect completion at the end of a Bolt routine.
void bolt::amp::control::setWGPerComputeUnit (int wgPerComputeUnit) [inline]
Set the number of work-groups per compute unit that will be used for reduction-style operations (reduce, transform_reduce). Higher numbers can hide latency by improving occupancy, but they increase the amount of data that has to be reduced in the final, less efficient step. Experimentation may be required to find the optimal point for a given algorithm and device; typically, values of 8-12 deliver good results.
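The trade-off can be made concrete with a little arithmetic. The sketch below is an assumption about the launch shape, not a description of Bolt's internals: it supposes that each work-group leaves exactly one partial result for the final step, and `partialResultCount` is a hypothetical helper.

```cpp
#include <cassert>

// Assumption: one partial result per work-group, so the final step has
// computeUnits * wgPerComputeUnit values left to reduce.
inline int partialResultCount(int computeUnits, int wgPerComputeUnit) {
    return computeUnits * wgPerComputeUnit;
}

inline void wgDemo() {
    // On a hypothetical device with 32 compute units:
    assert(partialResultCount(32, 8) == 256);   // lower end of the typical range
    assert(partialResultCount(32, 12) == 384);  // upper end: more final-step work
}
```

Under that assumption, moving from 8 to 12 work-groups per compute unit raises the final-step input by half, which is the cost to weigh against the improved occupancy.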