How can I pass a struct to a kernel in JCuda?


I have looked at http://www.javacodegeeks.com/2011/10/gpgpu-with-jcuda-good-bad-and-ugly.html, which says that one must modify the kernel to take only single-dimensional arrays. I refuse to believe that it is impossible to create a struct and copy it to device memory in JCuda.

I would imagine that the usual implementation is to create a case class (Scala terminology) that extends some native API, which can then be turned into a struct that can safely be passed to the kernel. Unfortunately, I haven't found anything about this on Google, hence the question.

(I am the author of JCuda here, spelled "JCuda", please.)

As mentioned in the forum post linked in the comment: it is not impossible to use structs in CUDA kernels and to fill them from the JCuda side. It is just very complicated, and rarely beneficial.

For the reasons why it is rarely beneficial to use structs at all in GPU programming, I have to refer you to the results you'll find when you search for the differences between "Array of Structures" versus "Structure of Arrays".

Usually, the latter is preferred for GPU computations, due to the improved memory coalescing, but this is beyond what I can profoundly summarize in this answer. Here, I will only summarize why using structs in GPU computing is a bit difficult in general, and particularly difficult in JCuda/Java.
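To make the coalescing argument a bit more concrete, here is a minimal sketch that only does address arithmetic. It assumes a hypothetical struct of four floats (16 bytes) and a group of 32 threads where thread i reads the "x" field of element i. It ignores caches and the actual transaction granularity of the hardware, so the numbers are an illustration of the access pattern, not a performance model.

```java
/**
 * Compares the byte offsets that consecutive GPU threads would read when each
 * thread i loads the "x" field of element i, once for an array of structures
 * (AoS) and once for a structure of arrays (SoA).
 */
public class AosVsSoa {

    // Hypothetical layout: one struct of four floats (x, y, z, w), 16 bytes
    static final int FLOAT_SIZE = 4;
    static final int FIELDS_PER_STRUCT = 4;
    static final int STRUCT_SIZE = FIELDS_PER_STRUCT * FLOAT_SIZE;

    /** AoS: element i starts at i * STRUCT_SIZE, and x is its first field */
    static int aosOffsetOfX(int i) {
        return i * STRUCT_SIZE;
    }

    /** SoA: all x values are stored contiguously in their own array */
    static int soaOffsetOfX(int i) {
        return i * FLOAT_SIZE;
    }

    /** Number of bytes spanned by the x loads of threads 0..threads-1 (AoS) */
    static int spanAos(int threads) {
        return aosOffsetOfX(threads - 1) + FLOAT_SIZE;
    }

    /** Number of bytes spanned by the x loads of threads 0..threads-1 (SoA) */
    static int spanSoa(int threads) {
        return soaOffsetOfX(threads - 1) + FLOAT_SIZE;
    }

    public static void main(String[] args) {
        int threads = 32;
        int useful = threads * FLOAT_SIZE;
        System.out.println("AoS: " + spanAos(threads) + " bytes spanned for " + useful + " useful bytes");
        System.out.println("SoA: " + spanSoa(threads) + " bytes spanned for " + useful + " useful bytes");
    }
}
```

With SoA, the 32 loads cover one contiguous 128-byte range; with AoS, the same 128 useful bytes are spread over 500 bytes, interleaved with y, z and w values that this access does not need.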


In plain C, structs are (theoretically!) simple, regarding their memory layout. Imagine a structure like

    struct Vertex {
        short a;
        float x;
        float y;
        float z;
        short b;
    };

Now you can create an array of these structs:

    struct Vertex* vertices = (struct Vertex*)malloc(n * sizeof(struct Vertex));

These structs are guaranteed to be laid out as one contiguous memory block:

                |   vertices[0]      ||   vertices[1]      |
                |                    ||                    |
    vertices -> [ a|  x |  y |  z | b][ a|  x |  y |  z | b]....

Since the CUDA kernel and the C code are compiled with the same compiler, there is not much room for misunderstandings. The host side says "Here is some memory, interpret this as Vertex objects", and the kernel will receive the same memory and work with it.

Still, even in plain C, there is in practice a potential for unexpected problems. Compilers will often introduce paddings into these structs, to achieve certain alignments. The example structure might thus in fact have a layout like this:

    struct Vertex {
        short a;         // 2 bytes
        char padding_0;  // padding byte
        char padding_1;  // padding byte
        float x;         // 4 bytes
        float y;         // 4 bytes
        float z;         // 4 bytes
        short b;         // 2 bytes
        char padding_2;  // padding byte
        char padding_3;  // padding byte
    };

Something like this may be done in order to make sure that the structures are aligned to 32-bit (4-byte) word boundaries. Moreover, there are certain pragmas and compiler directives that may influence this alignment. CUDA additionally prefers certain memory alignments, and therefore these directives are used heavily in the CUDA headers.

In short: when you define a struct in C, and print the sizeof(YourStruct) (or the actual layout of the struct) to the console, you'll have a hard time predicting what it will actually print. Expect some surprises.
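One can at least estimate a layout under the common "natural alignment" assumption: each scalar field starts at a multiple of its own size, and the total size is rounded up to a multiple of the largest field alignment. The following is a minimal sketch of that rule only. It is an assumption, not what every compiler does (for example, `double` members are 4-byte aligned by default with GCC on 32-bit x86 Linux), and pragmas or `__align__` directives change it again, so checking `sizeof` and `offsetof` with the actual compiler remains the only reliable answer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Estimates a C struct layout under the "natural alignment" assumption:
 * each field starts at a multiple of its own size, and the total size is
 * rounded up to a multiple of the largest field alignment.
 */
public class StructLayoutEstimate {

    /** Rounds value up to the next multiple of alignment */
    static int alignUp(int value, int alignment) {
        return (value + alignment - 1) / alignment * alignment;
    }

    /** Returns the offsets of all fields, followed by the total struct size */
    static List<Integer> layout(int... fieldSizes) {
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        int maxAlignment = 1;
        for (int size : fieldSizes) {
            offset = alignUp(offset, size); // padding before the field
            result.add(offset);
            offset += size;
            maxAlignment = Math.max(maxAlignment, size);
        }
        result.add(alignUp(offset, maxAlignment)); // trailing padding
        return result;
    }

    public static void main(String[] args) {
        // struct Vertex { short a; float x; float y; float z; short b; };
        System.out.println(layout(2, 4, 4, 4, 2)); // prints [0, 4, 8, 12, 16, 20]
    }
}
```

Under this assumption, the Vertex struct from above occupies 20 bytes with its fields at offsets 0, 4, 8, 12 and 16, which matches the padded version, while simply adding up the field sizes would suggest 16 bytes.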


In JCuda/Java, the world is different. There simply are no structs. When you create a Java class like

    class Vertex {
        short a;
        float x;
        float y;
        float z;
        short b;
    }

and then create an array of these

    Vertex[] vertices = new Vertex[2];
    vertices[0] = new Vertex();
    vertices[1] = new Vertex();

then these Vertex objects may be arbitrarily scattered in memory. You don't even know how large one Vertex object is, and you will hardly be able to find it out. Thus, trying to create an array of structures in JCuda and pass it to a CUDA kernel simply does not make sense.
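What can be passed from Java are primitive arrays. A common workaround, which is exactly the "structure of arrays" layout mentioned above, is to copy each field into its own array. The following is a minimal sketch of that conversion for the Vertex class from above; the class name `VertexArrays` is hypothetical, and the kernel would then have to take one pointer per field (for example `float* x`) instead of a `Vertex*`.

```java
/**
 * Copies the fields of an array of Java Vertex objects into one primitive
 * array per field (a "structure of arrays").
 */
public class VertexArrays {

    // The Java class from above
    static class Vertex {
        short a;
        float x;
        float y;
        float z;
        short b;
    }

    final short[] a;
    final float[] x;
    final float[] y;
    final float[] z;
    final short[] b;

    VertexArrays(Vertex[] vertices) {
        int n = vertices.length;
        a = new short[n];
        x = new float[n];
        y = new float[n];
        z = new float[n];
        b = new short[n];
        for (int i = 0; i < n; i++) {
            a[i] = vertices[i].a;
            x[i] = vertices[i].x;
            y[i] = vertices[i].y;
            z[i] = vertices[i].z;
            b[i] = vertices[i].b;
        }
    }

    public static void main(String[] args) {
        Vertex[] vertices = new Vertex[2];
        vertices[0] = new Vertex();
        vertices[1] = new Vertex();
        vertices[1].x = 1.5f;
        VertexArrays soa = new VertexArrays(vertices);
        System.out.println(soa.x[1]); // prints 1.5
    }
}
```

Each of these arrays can then be copied on its own, e.g. with `Pointer.to(soa.x)` as the source of a `cudaMemcpy` of `n * Sizeof.FLOAT` bytes, without any assumptions about struct layout.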


However, as mentioned above: it is still possible, in some form. If you know the memory layout that your structures will have in the CUDA kernel, then you can create a memory block that is "compatible" with this structure layout, and fill it from the Java side. For the struct Vertex mentioned above, this could roughly (involving some pseudocode) look like this:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import jcuda.Pointer;
    import static jcuda.runtime.JCuda.cudaMemcpy;
    import static jcuda.runtime.cudaMemcpyKind.cudaMemcpyHostToDevice;

    // 1 short + 3 floats + 1 short, assuming no paddings
    // (the layout chosen by the CUDA compiler may differ, see above)
    int sizeOfVertex = 2 + 4 + 4 + 4 + 2;

    // Allocate data for 2 vertices, in the native byte order
    ByteBuffer data = ByteBuffer.allocateDirect(sizeOfVertex * 2)
        .order(ByteOrder.nativeOrder());

    // Set vertices[0].a, vertices[0].x and vertices[0].y (absolute byte offsets)
    data.putShort(0, a0);
    data.putFloat(2, x0);
    data.putFloat(6, y0);

    // Set vertices[1].a, vertices[1].x and vertices[1].y
    data.putShort(sizeOfVertex + 0, a1);
    data.putFloat(sizeOfVertex + 2, x1);
    data.putFloat(sizeOfVertex + 6, y1);

    // Copy the vertex data to the device
    cudaMemcpy(deviceData, Pointer.to(data), sizeOfVertex * 2, cudaMemcpyHostToDevice);

It boils down to keeping the memory in a ByteBuffer, and manually accessing the memory regions that correspond to the desired fields of the desired structs.
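The manual accesses can at least be collected in one place. The following is a minimal sketch of a hypothetical `VertexBuffer` class that writes and reads Vertex "structs" in a direct ByteBuffer. It assumes the padded layout shown earlier (fields at byte offsets 0, 4, 8, 12, 16 and a struct size of 20 bytes); these constants are an assumption and must match whatever layout the CUDA compiler actually uses for the kernel.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Writes and reads Vertex "structs" in a direct ByteBuffer, assuming the
 * padded layout: a, x, y, z, b at byte offsets 0, 4, 8, 12, 16, size 20.
 */
public class VertexBuffer {

    static final int OFFSET_A = 0;
    static final int OFFSET_X = 4;
    static final int OFFSET_Y = 8;
    static final int OFFSET_Z = 12;
    static final int OFFSET_B = 16;
    static final int SIZE = 20;

    final ByteBuffer data;

    VertexBuffer(int count) {
        // Native byte order; CUDA devices and the usual x86-64 hosts are little-endian
        data = ByteBuffer.allocateDirect(count * SIZE).order(ByteOrder.nativeOrder());
    }

    /** Sets all fields of the vertex with index i, using absolute puts */
    void set(int i, short a, float x, float y, float z, short b) {
        int base = i * SIZE;
        data.putShort(base + OFFSET_A, a);
        data.putFloat(base + OFFSET_X, x);
        data.putFloat(base + OFFSET_Y, y);
        data.putFloat(base + OFFSET_Z, z);
        data.putShort(base + OFFSET_B, b);
    }

    float getX(int i) {
        return data.getFloat(i * SIZE + OFFSET_X);
    }

    short getB(int i) {
        return data.getShort(i * SIZE + OFFSET_B);
    }

    public static void main(String[] args) {
        VertexBuffer vertices = new VertexBuffer(2);
        vertices.set(1, (short) 7, 1.0f, 2.0f, 3.0f, (short) 9);
        System.out.println(vertices.getX(1) + " " + vertices.getB(1)); // prints 1.0 9
    }
}
```

Because only absolute puts are used, the buffer position stays at 0, so `Pointer.to(vertices.data)` can be passed directly as the source of a `cudaMemcpy` of `2 * VertexBuffer.SIZE` bytes. The same getters work on a buffer that was filled by copying results back from the device.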

However, a word of warning: you have to consider the possibility that this is not portable among several CUDA-C compiler versions or platforms. When you compile your kernel (that contains the struct definition) once on a 32-bit Linux machine and once on a 64-bit Windows machine, the structure layout might be different (and your Java code would have to be aware of this).

(Note: one could define an interface to simplify these accesses. And for JOCL, I tried to create utility classes that feel a bit more like C structs and automate the copying process to some extent. But in any case, it will be inconvenient (and will not achieve good performance) compared to plain C.)
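As an illustration of what such an interface could look like, here is a minimal sketch. The interface `StructCodec`, the helper `pack` and the two-float `Point` struct are hypothetical and not part of JCuda or JOCL; the idea is only that each struct type describes its size and field offsets once, and a generic helper packs a whole Java array into a direct ByteBuffer in array-of-structures layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * A hypothetical interface that hides the byte offsets of one struct type,
 * plus a generic helper that packs an array of Java objects into a direct
 * ByteBuffer (array-of-structures layout).
 */
public class StructPacking {

    interface StructCodec<T> {
        int size(); // bytes per struct, including any padding
        void write(ByteBuffer buffer, int offset, T value);
        T read(ByteBuffer buffer, int offset);
    }

    static <T> ByteBuffer pack(StructCodec<T> codec, T[] values) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(values.length * codec.size())
            .order(ByteOrder.nativeOrder());
        for (int i = 0; i < values.length; i++) {
            codec.write(buffer, i * codec.size(), values[i]);
        }
        return buffer;
    }

    // Example: struct Point { float x; float y; }; assumed to be 8 bytes, no padding
    static final class Point {
        final float x, y;
        Point(float x, float y) { this.x = x; this.y = y; }
    }

    static final StructCodec<Point> POINT = new StructCodec<Point>() {
        public int size() { return 8; }
        public void write(ByteBuffer b, int offset, Point p) {
            b.putFloat(offset, p.x);
            b.putFloat(offset + 4, p.y);
        }
        public Point read(ByteBuffer b, int offset) {
            return new Point(b.getFloat(offset), b.getFloat(offset + 4));
        }
    };

    public static void main(String[] args) {
        Point[] points = { new Point(1f, 2f), new Point(3f, 4f) };
        ByteBuffer packed = pack(POINT, points);
        Point second = POINT.read(packed, POINT.size());
        System.out.println(second.x + ", " + second.y); // prints 3.0, 4.0
    }
}
```

This keeps the offsets in one place per struct type, but it does not remove the underlying problem: every element is still copied field by field on the Java side, and the offsets still have to match the kernel's layout.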

