Bolt  1.3
C++ template library with support for OpenCL
AMP-control

Classes

class  bolt::amp::control
 
struct  bolt::amp::control::debug
 

Enumerations

enum  e_UseHostMode { NoUseHost, UseHost }
 
enum  e_RunMode { Automatic, SerialCpu, MultiCoreCpu, Gpu }
 
enum  e_AutoTuneMode { NoAutoTune =0x0, AutoTuneDevice =0x1, AutoTuneWorkShape =0x2, AutoTuneAll =0x3 }
 
enum  e_WaitMode { BalancedWait, NiceWait, BusyWait, ClFinish }
 

Functions

 bolt::amp::control::control (Concurrency::accelerator accel=getDefault().getAccelerator(), e_UseHostMode useHost=getDefault().getUseHost(), unsigned debug=getDefault().getDebug())
 
 bolt::amp::control::control (const control &ref)
 
void bolt::amp::control::setAccelerator (::Concurrency::accelerator accel)
 Set the AMP Accelerator for Bolt algorithms to use.
 
void bolt::amp::control::setUseHost (e_UseHostMode useHost)
 
void bolt::amp::control::setForceRunMode (e_RunMode forceRunMode)
 
void bolt::amp::control::setDebug (unsigned debug)
 
void bolt::amp::control::setWGPerComputeUnit (int wgPerComputeUnit)
 
void bolt::amp::control::setWaitMode (e_WaitMode waitMode)
 
void bolt::amp::control::setUnroll (int unroll)
 
Concurrency::accelerator & bolt::amp::control::getAccelerator ()
 
const Concurrency::accelerator & bolt::amp::control::getAccelerator () const
 
e_UseHostMode bolt::amp::control::getUseHost () const
 
e_RunMode bolt::amp::control::getForceRunMode () const
 
e_RunMode bolt::amp::control::getDefaultPathToRun () const
 
unsigned bolt::amp::control::getDebug () const
 
int const bolt::amp::control::getWGPerComputeUnit () const
 
e_WaitMode bolt::amp::control::getWaitMode () const
 
int bolt::amp::control::getUnroll () const
 
static control & bolt::amp::control::getDefault ()
 

Variables

static const unsigned bolt::amp::control::debug::None =0
 
static const unsigned bolt::amp::control::debug::Compile = 0x1
 
static const unsigned bolt::amp::control::debug::ShowCode = 0x2
 
static const unsigned bolt::amp::control::debug::SaveCompilerTemps = 0x4
 
static const unsigned bolt::amp::control::debug::DebugKernelRun = 0x8
 
static const unsigned bolt::amp::control::debug::AutoTune = 0x10
 

Detailed Description

Function Documentation

static control& bolt::amp::control::getDefault ( )
inline static

Return the default control structure. Bolt API calls use this structure when the user does not explicitly specify a control structure. Newly created control structures also copy the default structure for their initial values. Changes to the default control structure are not automatically copied to control structures that already exist. Typically, the default control structure is modified during application initialization, and control structures created afterwards pick up the modified defaults. Some examples:

bolt::amp::control myControl = bolt::amp::control::getDefault(); // copy the existing default control
bolt::amp::control myControl2; // same effect: the constructor also copies values from the default control
// Modify a setting in the default control
bolt::amp::control::getDefault().setDebug(bolt::amp::control::debug::Compile);
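The copy-on-construction behaviour can be illustrated without Bolt or C++ AMP. The sketch below uses a hypothetical `ControlModel` class (not part of Bolt) that keeps only a debug level. It assumes, as described above, that the default constructor copies the current default and that later changes to the default do not reach objects that already exist.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for bolt::amp::control. It models only the
// copy-from-default behaviour described above; it is NOT Bolt's API.
class ControlModel {
public:
    // A default-constructed object starts as a copy of the current default.
    ControlModel() : debug_(getDefault().debug_) {}

    // Process-wide default instance, created on first use.
    static ControlModel& getDefault() {
        static ControlModel def(0u);  // assumed initial level: 0 (debug::None)
        return def;
    }

    void setDebug(unsigned d) { debug_ = d; }
    unsigned getDebug() const { return debug_; }

private:
    // Used only to build the default instance itself.
    explicit ControlModel(unsigned d) : debug_(d) {}
    unsigned debug_;
};

// Creates one object before and one after changing the default and
// returns their debug levels as {early, late}.
std::pair<unsigned, unsigned> demoDefaultPropagation() {
    ControlModel::getDefault().setDebug(0u);   // start from a known default
    ControlModel early;                        // copies the old default (0)
    ControlModel::getDefault().setDebug(0x1);  // modify the default
    ControlModel late;                         // copies the new default (0x1)
    return {early.getDebug(), late.getDebug()};
}
```

The pair returned is {0, 1}: `early` keeps the value it copied at construction, while `late` sees the modified default.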
void bolt::amp::control::setDebug ( unsigned  debug)
inline

Enable debug messages to be printed to stdout as the algorithm is compiled, run, and tuned. See the bolt::amp::control::debug struct for a list of values. Multiple debug options can be combined with the + sign, as in the following example. Use this technique rather than separate calls to setDebug(); each call replaces the debug level rather than merging with the existing setting.

// Combine two debug options with the '+' sign.
myControl.setDebug(bolt::amp::control::debug::Compile + bolt::amp::control::debug::SaveCompilerTemps);
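Because the debug values listed under Variables are distinct powers of two (0x1, 0x2, 0x4, ...), adding different flags gives the same result as a bitwise OR, and each option can be tested with a bitwise AND. The sketch below mirrors those constants in a hypothetical `DebugFlags` namespace instead of including Bolt, and uses a hypothetical `DebugSetting` struct to show why a second setter call replaces, rather than merges with, the first.

```cpp
#include <cassert>

// Hypothetical mirror of the bolt::amp::control::debug constants;
// the values are copied from the Variables section above.
namespace DebugFlags {
const unsigned None              = 0x0;
const unsigned Compile           = 0x1;
const unsigned ShowCode          = 0x2;
const unsigned SaveCompilerTemps = 0x4;
const unsigned DebugKernelRun    = 0x8;
const unsigned AutoTune          = 0x10;
}

// Returns true if every bit of `flag` is set in `level`.
bool hasFlag(unsigned level, unsigned flag) {
    return (level & flag) == flag;
}

// Models a setter such as setDebug(): the new value overwrites the old one.
struct DebugSetting {
    unsigned level = DebugFlags::None;
    void setDebug(unsigned d) { level = d; }
};
```

One caveat of `+`: listing the same flag twice changes the meaning (Compile + Compile equals 0x2, which is ShowCode), whereas `Compile | Compile` is still Compile.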
void bolt::amp::control::setForceRunMode ( e_RunMode  forceRunMode)
inline

Force the Bolt command to run on the specified device. The default is Automatic, in which case the Bolt runtime selects the device. Forcing the mode to SerialCpu can be useful for debugging the algorithm. Forcing the mode can also help with performance comparisons, or give direct control over where the algorithm runs (for example, when the algorithm is known to be best suited for the GPU).
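The relationship between a forced mode and the runtime's own choice can be sketched as a small selection function. It assumes, based only on the function names in the listing, that getForceRunMode() returns the user's setting and getDefaultPathToRun() returns the path the runtime would pick on its own; the `RunMode` enum and `selectPath` function are hypothetical stand-ins, not Bolt code.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of bolt::amp::control::e_RunMode.
enum class RunMode { Automatic, SerialCpu, MultiCoreCpu, Gpu };

// `forced` plays the role of getForceRunMode(); `defaultPath` plays the
// role of getDefaultPathToRun(). Automatic defers to the runtime's choice;
// any other value overrides it.
RunMode selectPath(RunMode forced, RunMode defaultPath) {
    return forced == RunMode::Automatic ? defaultPath : forced;
}

// Human-readable name, useful for logging which path was taken.
std::string toString(RunMode m) {
    switch (m) {
        case RunMode::Automatic:    return "Automatic";
        case RunMode::SerialCpu:    return "SerialCpu";
        case RunMode::MultiCoreCpu: return "MultiCoreCpu";
        case RunMode::Gpu:          return "Gpu";
    }
    return "Unknown";
}
```

For instance, `selectPath(RunMode::SerialCpu, RunMode::Gpu)` yields SerialCpu, which matches the debugging use case described above.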

void bolt::amp::control::setUnroll ( int  unroll)
inline

Set the unroll factor; the current value is returned by getUnroll().

void bolt::amp::control::setUseHost ( e_UseHostMode  useHost)
inline

If set to UseHost, Bolt can use the host CPU to run parts of the algorithm. If set to NoUseHost, Bolt runs the entire algorithm on the device specified by the accelerator. NoUseHost can be appropriate on a discrete GPU when the input data already resides in device memory.

void bolt::amp::control::setWaitMode ( e_WaitMode  waitMode)
inline

Set the method used to detect completion at the end of a Bolt routine.
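The wait modes trade latency against CPU usage, which is the usual spin-versus-block choice. The sketch below shows both techniques in plain C++11 on a hypothetical `Completion` flag. It assumes, without confirmation from this page, that BusyWait corresponds roughly to spinning and NiceWait to a blocking OS-level wait; BalancedWait and ClFinish are not modeled, and this is not Bolt's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical completion signal shared by a worker and the host thread.
struct Completion {
    std::atomic<bool> done{false};
    std::mutex m;
    std::condition_variable cv;

    // Called by the worker when its job finishes.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m);
            done.store(true);
        }
        cv.notify_all();
    }

    // Spinning: lowest wake-up latency, but occupies a CPU core while waiting.
    void busyWait() {
        while (!done.load()) {
            // spin until the worker sets the flag
        }
    }

    // Blocking: the thread sleeps until notified, freeing the core.
    void blockingWait() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return done.load(); });
    }
};

// Runs a trivial stand-in "kernel" on another thread, then waits for it
// using the chosen method.
int runAndWait(bool spin) {
    Completion c;
    int result = 0;
    std::thread worker([&] {
        result = 42;  // stand-in for device work
        c.signal();
    });
    if (spin) c.busyWait(); else c.blockingWait();
    worker.join();
    return result;
}
```

Spinning suits short routines where wake-up latency dominates; blocking suits long routines where keeping a host core busy would waste CPU time.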

void bolt::amp::control::setWGPerComputeUnit ( int  wgPerComputeUnit)
inline

Set the number of work-groups per compute unit used for reduction-style operations (reduce, transform_reduce). Higher numbers can hide latency by improving occupancy, but they also increase the amount of data that must be reduced in the final, less efficient step. Experimentation may be required to find the optimal value for a given algorithm and device; typically, 8 to 12 delivers good results.
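The trade-off can be seen in a CPU-only sketch of a two-stage reduction. The input is split among `computeUnits * wgPerComputeUnit` groups, each group produces one partial sum, and a final serial step adds the partials. The function `twoStageReduce` and the compute-unit counts are hypothetical; Bolt's device kernels are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Total plus the number of partial sums the final, serial step combined.
struct ReduceResult {
    long long sum;
    std::size_t partials;
};

// Hypothetical two-stage reduction. Stage 1 mimics independent work-groups,
// each summing one contiguous chunk; stage 2 is the serial final reduction.
ReduceResult twoStageReduce(const std::vector<int>& data,
                            int computeUnits, int wgPerComputeUnit) {
    std::size_t groups =
        static_cast<std::size_t>(computeUnits) * static_cast<std::size_t>(wgPerComputeUnit);
    if (groups == 0) groups = 1;
    std::size_t chunk = (data.size() + groups - 1) / groups;  // ceiling division

    // Stage 1: one partial sum per non-empty group.
    std::vector<long long> partial;
    for (std::size_t g = 0; g < groups; ++g) {
        std::size_t begin = g * chunk;
        if (begin >= data.size()) break;
        std::size_t end = std::min(begin + chunk, data.size());
        partial.push_back(std::accumulate(
            data.begin() + static_cast<std::ptrdiff_t>(begin),
            data.begin() + static_cast<std::ptrdiff_t>(end), 0LL));
    }

    // Stage 2: this step grows with the number of groups.
    long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
    return {total, partial.size()};
}
```

With 1000 elements and 4 compute units, raising wgPerComputeUnit from 1 to 8 increases the partial sums from 4 to 32. On a GPU the extra groups improve occupancy, while the larger partial array is what the last, less parallel step has to absorb.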