Added by Enrico Degregori about 2 years ago
During the development/porting cycle, the CPU and GPU versions should produce bit-identical results.
The CPU version needs FMA (fused multiply-add) disabled: -Mnofma
The GPU version needs FMA disabled and must use the CPU algorithms for mathematical operations: -Mnofma -gpu=math_uniform,cuda11.7
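
As a rough illustration of why FMA has to be disabled on both sides, the sketch below compares a*b + c computed with two roundings (multiply, then add) against a single rounding, which is what a fused multiply-add does. The input values are made up for the demonstration and are not taken from the model; the fused result is emulated with exact rational arithmetic rather than with a hardware FMA instruction.

```python
from fractions import Fraction

# Illustrative inputs (an assumption, not from the original note), chosen so
# that the exact product a*b lies halfway between two representable doubles.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

# Without FMA: the product is rounded to a double, then the sum is rounded again.
unfused = a * b + c

# With FMA: a*b + c is evaluated exactly and rounded only once.
# Emulated here with exact rationals; float() rounds to the nearest double.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print("unfused:", unfused.hex())
print("fused:  ", fused.hex())
print("bit-identical:", unfused.hex() == fused.hex())
```

The two results differ in this case, so a build that contracts multiply-add pairs into FMA cannot be expected to match one that does not, even when the source code is the same.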