multithreading - How many threads/work-items are used?


I am trying to understand the architecture of the GPU and how to estimate the latency of one arithmetic statement without compiling or running it.

I suppose the following code will use only one thread/work-item, although I specify local size = 32. Is that correct?

    int k = 0;
    for (; k < 32000; k++) {
        a = c * (b + d);  /* left-hand side was lost in the source; `a` is assumed */
    }

If I run this programme using the double precision unit (DPU), and there is only 1 DPU per SM on an NVIDIA Tesla GPU, what is the size of a warp? Is it still 32 threads (1 thread uses the DPU, plus 31 threads use the SPs)?

One more question: according to the GPU architecture, there are no threads on a real GPU. Is the thread a virtual concept for programmers?

I am trying to understand the architecture of the GPU and how to estimate the latency of one arithmetic statement without compiling or running it.

I do not believe this is publicly specified anywhere, and it varies between vendors and models. Modern discrete GPUs from AMD and NVIDIA typically have pipelines of around 20 stages.

I suppose the following code will use only one thread/work-item, although I specify local size = 32. Is that correct?

If you specify an NDRange of 32 work-items, then irrespective of the local size, you get 32 work-items. You haven't shown how you launch the kernel, so the question here is unclear.
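To make the distinction between global and local size concrete, here is a minimal sketch in plain C (not OpenCL host code). The struct `launch_shape` and the function `describe_launch` are hypothetical names invented for this illustration, and the sketch assumes the OpenCL 1.x rule that the global size must be a multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the shape of a 1-D NDRange launch. */
typedef struct {
    size_t work_items;   /* total work-items executed (equals the global size) */
    size_t work_groups;  /* number of work-groups of local_size items each */
} launch_shape;

/* The local size only partitions the work-items into groups;
 * it never changes how many work-items run. */
launch_shape describe_launch(size_t global_size, size_t local_size)
{
    /* Assumption: OpenCL 1.x style, global size divisible by local size. */
    assert(local_size > 0 && global_size % local_size == 0);
    launch_shape s = { global_size, global_size / local_size };
    return s;
}
```

With a global size of 32 and a local size of 32, this reports 32 work-items in 1 work-group, so the loop body would run in all 32 work-items. To run it in a single work-item, the global size itself would have to be 1.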

If I run this programme using the double precision unit (DPU), and there is only 1 DPU per SM on an NVIDIA Tesla GPU, what is the size of a warp?

The size of a warp does not depend on the type of instruction you execute. Warps are a physical concept: a warp is akin to a SIMD vector whose lanes are the individual threads. You cannot change its size. On NVIDIA hardware, it is 32.
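Because the warp size is fixed, a work-group is always rounded up to whole warps. The following is a small sketch of that arithmetic in plain C; `WARP_SIZE`, `warps_for_group` and `idle_lanes` are hypothetical names for this illustration, with the value 32 taken from the NVIDIA figure above.

```c
#include <assert.h>
#include <stddef.h>

#define WARP_SIZE 32u  /* fixed on NVIDIA hardware, per the answer above */

/* Number of warps needed to hold local_size threads (rounded up). */
size_t warps_for_group(size_t local_size)
{
    assert(local_size > 0);
    return (local_size + WARP_SIZE - 1) / WARP_SIZE;
}

/* Lanes that exist in those warps but have no thread assigned to them. */
size_t idle_lanes(size_t local_size)
{
    return warps_for_group(local_size) * WARP_SIZE - local_size;
}
```

For example, a work-group of a single thread still occupies one full warp, with 31 lanes left idle.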

This has nothing to do with SPUs and DPUs. The number of SPUs and DPUs only constrains how many single precision and double precision instructions can be issued/retired at every cycle (the exact constraints vary between hardware; on some hardware it may not be possible to issue both types of instructions in the same cycle).

Assuming a fictitious SM with 32 SPUs and 1 DPU, it means you can issue 32 single precision instructions and 1 double precision instruction at every cycle.

If 32 threads need to execute a single precision instruction, it will be issued in a single cycle. If they need to execute a double precision instruction, it will be issued over 32 cycles. And if we assume the SM can do both in parallel, it can issue 1 double precision instruction and 31 single precision instructions in a single cycle, too.
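The cycle counts above follow from a simple ceiling division, sketched here in plain C. This is a toy model of the fictitious SM only, not a description of any real scheduler; `issue_cycles` and `mixed_issue_cycles` are hypothetical names, and the mixed case assumes the two unit types can issue in the same cycle.

```c
#include <assert.h>

/* Cycles needed to issue one instruction for `threads` threads
 * when `units` execution units of the right type are available. */
unsigned issue_cycles(unsigned threads, unsigned units)
{
    assert(units > 0);
    return (threads + units - 1) / units;  /* ceiling division */
}

/* Assumption: SPUs and DPUs can issue in parallel, so the slower
 * of the two groups determines the total. */
unsigned mixed_issue_cycles(unsigned sp_threads, unsigned spus,
                            unsigned dp_threads, unsigned dpus)
{
    unsigned sp = issue_cycles(sp_threads, spus);
    unsigned dp = issue_cycles(dp_threads, dpus);
    return sp > dp ? sp : dp;
}
```

Under this model, `issue_cycles(32, 32)` gives 1 cycle for single precision, `issue_cycles(32, 1)` gives 32 cycles for double precision, and `mixed_issue_cycles(31, 32, 1, 1)` gives 1 cycle for the mixed case.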

Is the thread a virtual concept for programmers?

Yes, the term "thread" in CUDA parlance is unrelated to its usual meaning; it is more akin to a "SIMD lane". Note that OpenCL does not use the term thread, but work-item. The underlying execution mechanism is unspecified and need not map to any hardware concept.

