Recently I have begun to port my compression algorithm to OpenCL. Despite being C99 friendly, there are some gotchas.

1) Implicit conversion between float and double is not present. E.g.

    double *vecd;
    float4 vecf;
    vecf.x = vecd[0];         /* NOOO! */
    vecf.x = (float)vecd[0];  /* ? */

2) Address space qualifiers have to follow a pointer everywhere. Casting them away, as in (const float *) modelview, does not fly, so helper functions need __global in their signatures too:

    float4 _mesa_gluProjectf(float4 v,
                             const __global float *modelview,
                             const __global float *projection,
                             const __global int *viewport);

    __kernel void projection_test(const __global float *modelview,
                                  const __global float *projection,
                                  const __global int *viewport,
                                  const __global float *vert)
    {
        uint gid = get_global_id(0);

        /* One vertex per work-item: x, y, z from the buffer, w fixed at 1. */
        float4 v;
        v.x = vert[4*gid+0];
        v.y = vert[4*gid+1];
        v.z = vert[4*gid+2];
        v.w = 1.0f;

        float4 wv = _mesa_gluProjectf(v, modelview, projection, viewport);
        ...
    }

3) Byte addressable memory needs to be explicitly enabled!

    #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
- enables the application to write 1-byte data types into global memory. This is very important for manipulating strings or monochromatic pictures efficiently.

Just a quick test of performance for a single projection operation on different 3D models using my MacBook Pro (C2Duo, Nvidia 320M):

    Verts     GPU Loop - Unoptimized   CPU Loop - gluProject
    12        0.136173512              1.81e-05
    1760      0.079442842              0.000334405
    2144      0.082229331              0.00050072
    8728      0.104535711              0.0015727
    22998     0.097971984              0.004996595
    32328     0.091570921              0.006260859
    543652    0.414667467              0.109198138
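For reference, the arithmetic behind a gluProject-style call is small: transform the vertex by the modelview matrix, then by the projection matrix, divide by w, and map the result into the viewport. The sketch below is not the code that was timed above. It is a minimal plain C version under a few assumptions: the matrices are column-major as in OpenGL, and the names project_vertex and mat4_mul_vec4 are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: multiply a column-major 4x4 matrix (OpenGL layout)
 * by a 4-component vector. */
static void mat4_mul_vec4(const float m[16], const float in[4], float out[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = m[0 * 4 + row] * in[0]
                 + m[1 * 4 + row] * in[1]
                 + m[2 * 4 + row] * in[2]
                 + m[3 * 4 + row] * in[3];
    }
}

/* Hypothetical gluProject-style function (not the Mesa implementation).
 * Maps object coordinates to window coordinates.
 * Returns 1 on success, 0 if the clip-space w component is zero. */
static int project_vertex(const float obj[3],
                          const float modelview[16],
                          const float projection[16],
                          const int viewport[4],
                          float win[3])
{
    assert(obj != NULL && win != NULL);

    float in[4] = { obj[0], obj[1], obj[2], 1.0f };
    float eye[4];
    float clip[4];

    mat4_mul_vec4(modelview, in, eye);     /* object space -> eye space */
    mat4_mul_vec4(projection, eye, clip);  /* eye space -> clip space   */

    if (clip[3] == 0.0f) {
        return 0;
    }

    /* Perspective divide: normalized device coordinates in [-1, 1]. */
    float ndc_x = clip[0] / clip[3];
    float ndc_y = clip[1] / clip[3];
    float ndc_z = clip[2] / clip[3];

    /* Map [-1, 1] to the viewport rectangle and to a [0, 1] depth value. */
    win[0] = (float)viewport[0] + (ndc_x * 0.5f + 0.5f) * (float)viewport[2];
    win[1] = (float)viewport[1] + (ndc_y * 0.5f + 0.5f) * (float)viewport[3];
    win[2] = ndc_z * 0.5f + 0.5f;
    return 1;
}
```

With identity matrices and a viewport of {0, 0, 640, 480}, the origin lands in the middle of the window at (320, 240) with depth 0.5. A CPU loop would call this once per vertex, while the kernel does the same work once per work-item.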
OpenCL works best when you make each kernel work hard. As the port of this algorithm goes on, the load per kernel will increase and the results should start to show. Even now, though, OpenCL takes this work off the CPU.

I've only just obtained another copy of the AMD APP for OpenCL development, as they had posted a corrupt file (219 MB != 7.4 MB) on their website. I would like to see if there is any improvement on this simple routine on the bigger machine running Windows.

I've really found the CMSoft website to be most useful. It covers a few of the things that you would otherwise only find out through trial and painful error.
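Coming back to the point about making each kernel work hard, the timings from the quick test already hint at it. The sketch below divides the GPU figure by the CPU figure for each model. The numbers are copied from the test results in this post; the names timing_row, gpu_cpu_ratio and print_ratios are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the quick test: vertex count and the two timings as reported. */
typedef struct {
    unsigned long verts;
    double gpu;  /* "GPU Loop - Unoptimized" */
    double cpu;  /* "CPU Loop - gluProject"  */
} timing_row;

static const timing_row TIMINGS[] = {
    {     12UL, 0.136173512, 1.81e-05    },
    {   1760UL, 0.079442842, 0.000334405 },
    {   2144UL, 0.082229331, 0.00050072  },
    {   8728UL, 0.104535711, 0.0015727   },
    {  22998UL, 0.097971984, 0.004996595 },
    {  32328UL, 0.091570921, 0.006260859 },
    { 543652UL, 0.414667467, 0.109198138 },
};

#define N_TIMINGS (sizeof TIMINGS / sizeof TIMINGS[0])

/* How many times slower the GPU loop was than the CPU loop for one row. */
static double gpu_cpu_ratio(size_t row)
{
    assert(row < N_TIMINGS);
    return TIMINGS[row].gpu / TIMINGS[row].cpu;
}

/* Print the ratio for every model, smallest vertex count first. */
static void print_ratios(void)
{
    for (size_t i = 0; i < N_TIMINGS; ++i) {
        printf("%8lu verts: GPU/CPU = %.1f\n",
               TIMINGS[i].verts, gpu_cpu_ratio(i));
    }
}
```

The ratio falls from several thousand at 12 vertices to a little under four at 543652 vertices. The GPU figure barely moves until the largest model, which is what you would expect if it is dominated by a roughly fixed cost per run rather than by the projection itself.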