Classes
class	bolt::amp::control

struct	bolt::amp::control::debug

Enumerations
enum	e_UseHostMode { NoUseHost, UseHost }

enum	e_RunMode { Automatic, SerialCpu, MultiCoreCpu, Gpu }

enum	e_AutoTuneMode { NoAutoTune =0x0, AutoTuneDevice =0x1, AutoTuneWorkShape =0x2, AutoTuneAll =0x3 }

enum	e_WaitMode { BalancedWait, NiceWait, BusyWait, ClFinish }

Functions
	bolt::amp::control::control (Concurrency::accelerator accel=getDefault().getAccelerator(), e_UseHostMode useHost=getDefault().getUseHost(), unsigned debug=getDefault().getDebug())

	bolt::amp::control::control (const control &ref)

void	bolt::amp::control::setAccelerator (::Concurrency::accelerator accel)
	Set the AMP Accelerator for Bolt algorithms to use.

void	bolt::amp::control::setUseHost (e_UseHostMode useHost)

void	bolt::amp::control::setForceRunMode (e_RunMode forceRunMode)

void	bolt::amp::control::setDebug (unsigned debug)

void	bolt::amp::control::setWGPerComputeUnit (int wgPerComputeUnit)

void	bolt::amp::control::setWaitMode (e_WaitMode waitMode)

void	bolt::amp::control::setUnroll (int unroll)

Concurrency::accelerator &	bolt::amp::control::getAccelerator ()

const Concurrency::accelerator &	bolt::amp::control::getAccelerator () const

e_UseHostMode	bolt::amp::control::getUseHost () const

e_RunMode	bolt::amp::control::getForceRunMode () const

e_RunMode	bolt::amp::control::getDefaultPathToRun () const

unsigned	bolt::amp::control::getDebug () const

int const	bolt::amp::control::getWGPerComputeUnit () const

e_WaitMode	bolt::amp::control::getWaitMode () const

int	bolt::amp::control::getUnroll () const

static control &	bolt::amp::control::getDefault ()

Variables
static const unsigned	bolt::amp::control::debug::None =0

static const unsigned	bolt::amp::control::debug::Compile = 0x1

static const unsigned	bolt::amp::control::debug::ShowCode = 0x2

static const unsigned	bolt::amp::control::debug::SaveCompilerTemps = 0x4

static const unsigned	bolt::amp::control::debug::DebugKernelRun = 0x8

static const unsigned	bolt::amp::control::debug::AutoTune = 0x10

Detailed Description

Function Documentation

static control& bolt::amp::control::getDefault ( )

inline static

Return the default control structure. This is used for Bolt API calls when the user does not explicitly specify a control structure. Newly created control structures also copy the default structure for their initial values. Note that changes to the default control structure are not automatically copied to already-created control structures. Typically, the default control structure is modified as part of application initialization; control structures created afterwards then pick up the modified defaults. Some examples:

bolt::amp::control myControl = bolt::amp::control::getDefault();  // copy the existing default control
bolt::amp::control myControl2;  // same as the previous line: the constructor also copies values from the default control
// Modify a setting in the default control; controls created after this call inherit it
bolt::amp::control::getDefault().setDebug(bolt::amp::control::debug::Compile);
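The copy-on-construction behavior can be modeled in a few lines of standard C++. `MiniControl` below is a hypothetical stand-in, not Bolt's class: it keeps only two settings, and its built-in initial values are placeholders rather than Bolt's real defaults. It shows that a control created before the default is modified keeps its old values, while one created afterwards picks up the change.

```cpp
#include <cassert>

// Hypothetical stand-in for bolt::amp::control (not the Bolt class).
// Models only one rule: new objects start as a copy of the shared default.
class MiniControl {
public:
    enum e_WaitMode { BalancedWait, NiceWait, BusyWait, ClFinish };

    // Default construction copies the *current* default settings.
    MiniControl() : waitMode_(getDefault().waitMode_), debug_(getDefault().debug_) {}

    void setWaitMode(e_WaitMode m) { waitMode_ = m; }
    e_WaitMode getWaitMode() const { return waitMode_; }
    void setDebug(unsigned d) { debug_ = d; }
    unsigned getDebug() const { return debug_; }

    // Single shared default, created on first use with placeholder values.
    static MiniControl& getDefault() {
        static MiniControl defaultControl{BuiltIn{}};
        return defaultControl;
    }

private:
    struct BuiltIn {};  // tag type: selects the constructor used only for the default
    explicit MiniControl(BuiltIn) : waitMode_(BalancedWait), debug_(0) {}

    e_WaitMode waitMode_;
    unsigned debug_;
};

// Mirrors the example above: later copies see the change, earlier ones do not.
inline void defaultControlDemo() {
    MiniControl before;                       // copied while the debug level is 0
    MiniControl::getDefault().setDebug(0x1);  // modify the default
    MiniControl after;                        // copied after the change
    assert(before.getDebug() == 0u);          // not retroactively updated
    assert(after.getDebug() == 0x1u);
}
```

The private tag constructor exists so that building the default object does not itself try to copy the (not yet constructed) default.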

void bolt::amp::control::setDebug ( unsigned debug )

inline

Enable debug messages to be printed to stdout as the algorithm is compiled, run, and tuned. See the bolt::amp::control::debug struct for a list of values. Multiple debug options can be combined with the + sign, as in the following example. Use this technique rather than separate calls to setDebug(); each call replaces the debug level rather than merging with the existing setting.

bolt::amp::control myControl;
// Show example of combining two debug options with the '+' sign.
myControl.setDebug(bolt::amp::control::debug::Compile + bolt::amp::control::debug::SaveCompilerTemps);
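Because each constant in bolt::amp::control::debug is a distinct power of two, adding two different options gives the same value as a bitwise OR. The sketch below copies those values into a local namespace (it does not include Bolt) to check this at compile time, shows why `|` is the safer operator if a flag is accidentally repeated, and models the replace-not-merge behavior of setDebug() with a hypothetical `DebugSetting` struct.

```cpp
#include <cassert>

// Values copied from the bolt::amp::control::debug constants listed above.
namespace debug_flags {
constexpr unsigned None = 0;
constexpr unsigned Compile = 0x1;
constexpr unsigned ShowCode = 0x2;
constexpr unsigned SaveCompilerTemps = 0x4;
constexpr unsigned DebugKernelRun = 0x8;
constexpr unsigned AutoTune = 0x10;
}  // namespace debug_flags

// Distinct bits: '+' and '|' produce the same combined level.
static_assert((debug_flags::Compile + debug_flags::SaveCompilerTemps) ==
              (debug_flags::Compile | debug_flags::SaveCompilerTemps),
              "distinct bits combine identically");

// A repeated flag: '|' is idempotent, '+' carries into the next bit (ShowCode).
static_assert((debug_flags::Compile | debug_flags::Compile) == debug_flags::Compile, "");
static_assert((debug_flags::Compile + debug_flags::Compile) == debug_flags::ShowCode, "");

// True if 'flag' is enabled in the combined debug level.
constexpr bool hasFlag(unsigned level, unsigned flag) { return (level & flag) != 0; }

// Hypothetical model of setDebug(): each call overwrites the previous level.
struct DebugSetting {
    unsigned level = debug_flags::None;
    void setDebug(unsigned d) { level = d; }  // replace, never merge
};

inline void debugDemo() {
    DebugSetting s;
    s.setDebug(debug_flags::Compile);
    s.setDebug(debug_flags::SaveCompilerTemps);  // Compile is lost here
    assert(!hasFlag(s.level, debug_flags::Compile));
    s.setDebug(debug_flags::Compile | debug_flags::SaveCompilerTemps);  // one call, both options
    assert(hasFlag(s.level, debug_flags::Compile));
    assert(hasFlag(s.level, debug_flags::SaveCompilerTemps));
}
```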

void bolt::amp::control::setForceRunMode ( e_RunMode forceRunMode )

inline

Force the Bolt command to run on the specified device. The default is Automatic, in which case the Bolt runtime selects the device. Forcing the mode to SerialCpu can be useful for debugging the algorithm. Forcing the mode can also be useful for performance comparisons, or for direct control over the run location (for example, when the algorithm is known to be best suited to the GPU).
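The precedence rule described above (a forced mode wins; Automatic lets the runtime decide) can be written as a small function. `resolveRunMode` is a hypothetical helper, not part of Bolt, and treating the runtime's own choice as the value reported by getDefaultPathToRun() is an assumption: this page lists that accessor's signature but does not document it.

```cpp
#include <cassert>

// Local copy of the e_RunMode enumerators listed above.
enum e_RunMode { Automatic, SerialCpu, MultiCoreCpu, Gpu };

// Hypothetical helper (not a Bolt API): decides where an algorithm runs.
// 'forced' plays the role of getForceRunMode(); 'runtimeChoice' is the path
// the runtime would pick on its own (assumed to match getDefaultPathToRun()).
e_RunMode resolveRunMode(e_RunMode forced, e_RunMode runtimeChoice) {
    assert(runtimeChoice != Automatic);  // the runtime must name a concrete path
    return forced == Automatic ? runtimeChoice : forced;
}
```

For a performance comparison, the same algorithm would be run once per forced mode (for example SerialCpu, then Gpu) and the timings compared.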

void bolt::amp::control::setUnroll ( int unroll )

inline

Assign the unroll value stored in the control structure; the current value is returned by getUnroll().

void bolt::amp::control::setUseHost ( e_UseHostMode useHost )

inline

If set to UseHost, Bolt can use the host CPU to run parts of the algorithm. If set to NoUseHost, Bolt runs the entire algorithm on the accelerator specified by setAccelerator(). This can be appropriate on a discrete GPU, where the input data is located in device memory.

void bolt::amp::control::setWaitMode ( e_WaitMode waitMode )

inline

Set the method used to detect completion at the end of a Bolt routine: one of the e_WaitMode values BalancedWait, NiceWait, BusyWait, or ClFinish.

void bolt::amp::control::setWGPerComputeUnit ( int wgPerComputeUnit )

inline

Set the number of work-groups per compute unit used for reduction-style operations (reduce, transform_reduce). Higher numbers can hide latency by improving occupancy, but they increase the amount of data that must be reduced in the final, less efficient step. Experimentation may be required to find the optimal point for a given algorithm and device; typically 8 to 12 delivers good results.
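The trade-off can be made concrete with a worked example: on a device with 8 compute units and wgPerComputeUnit = 8, the first stage produces 8 × 8 = 64 partial results that the final step must combine; raising the setting to 12 increases that to 96. The sketch below is a host-only, serial model of this two-stage pattern in standard C++ (`twoStageSum` and its parameters are hypothetical, not Bolt's reduce kernel): it splits the input into computeUnits × wgPerComputeUnit chunks, sums each chunk, then sums the partials.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical serial model of a two-stage reduction (not Bolt's implementation).
// Stage 1: one partial sum per "work-group" (computeUnits * wgPerComputeUnit groups).
// Stage 2: combine the partial sums (the "final, less efficient step").
long long twoStageSum(const std::vector<int>& data, int computeUnits, int wgPerComputeUnit,
                      std::size_t* partialCount = nullptr) {
    assert(computeUnits > 0 && wgPerComputeUnit > 0);
    const std::size_t numGroups =
        static_cast<std::size_t>(computeUnits) * static_cast<std::size_t>(wgPerComputeUnit);
    // Ceiling division so every element is assigned to some group.
    const std::size_t chunk = (data.size() + numGroups - 1) / numGroups;

    std::vector<long long> partials(numGroups, 0);
    for (std::size_t g = 0; g < numGroups; ++g) {
        const std::size_t begin = std::min(g * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        partials[g] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                      data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
    }
    if (partialCount) *partialCount = partials.size();  // size of the final step's input
    return std::accumulate(partials.begin(), partials.end(), 0LL);
}
```

The result does not depend on the setting; only the split between per-group work and the final combining step changes, which is why the best value has to be found by measurement on real hardware.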
