Thursday, July 29, 2010

[From Jeffrey] cuda-hydro

So the code works now...



The resolution for this simulation is 732x732 (in case you wonder why such a strange number: 732 = 3*(256-12)). Approximately 10000 time steps were taken, which took about 2 hours on k3. The total duration in simulation time is approximately 2 crossing times.
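The numbers above can be checked with a quick back-of-the-envelope sketch. The variable names are made up for illustration, and the inputs are the approximate figures quoted in the post, so the results are rough estimates only. Reading the 12 in 256-12 as ghost or boundary cells is an assumption; the post does not say.

```python
# Rough arithmetic from the figures quoted above (all inputs approximate).
BLOCK_SIZE = 256      # cells handled by one block along each axis
LOST_CELLS = 12       # the 12 in 256-12; presumably ghost/boundary cells (assumption)
TILES_PER_AXIS = 3

resolution = TILES_PER_AXIS * (BLOCK_SIZE - LOST_CELLS)

steps = 10_000            # approximate number of time steps
wall_seconds = 2 * 3600   # approximate run time on k3

seconds_per_step = wall_seconds / steps
cell_updates_per_second = resolution**2 * steps / wall_seconds

print(resolution)                      # 732
print(seconds_per_step)                # 0.72
print(round(cell_updates_per_second))  # 744200
```

So each step takes roughly 0.7 seconds, or on the order of 7e5 cell updates per second, with only one block doing the work.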

This is definitely not the fastest it can get. Computations are currently done on only one block of size 256, so the simulation box is 3x3=9 times too large for the block. I should be able to use 9 blocks in parallel for this simulation, which would in principle give me another factor of 9 in speed. The only reason this has not been done yet is that sharing memory between blocks is not trivial, and I didn't want to add too much complication before I got it to work. Now that I understand CUDA a little better, I already have a plan for the improvement, and it shouldn't be too difficult.
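To make the 3x3 layout concrete, here is a minimal sketch of how the box could be cut into tiles along one axis. It is only an illustration, not the actual plan: the function `tile_ranges` and the constant names are hypothetical, and it assumes the 12 cells lost per block are ghost cells split evenly, 6 on each side.

```python
# Sketch of the decomposition along one axis; the same ranges apply to the other axis.
# Assumption: the 12 cells lost per block are ghost cells, split 6 + 6 between the two sides.
BLOCK_SIZE = 256
GHOST = 6
INTERIOR = BLOCK_SIZE - 2 * GHOST   # 244 cells actually updated per block
TILES_PER_AXIS = 3


def tile_ranges(tile):
    """Return (interior, loaded) half-open cell ranges for one tile along an axis."""
    start = tile * INTERIOR
    stop = start + INTERIOR
    # The loaded range sticks out by GHOST cells on each side; indices outside
    # the box [0, 732) would have to be filled by the boundary condition.
    return (start, stop), (start - GHOST, stop + GHOST)


for tile in range(TILES_PER_AXIS):
    interior, loaded = tile_ranges(tile)
    print(tile, interior, loaded)
```

Under these assumptions, neighbouring tiles load 12 of the same cells (6 owned by each side). Those are the values that would have to be passed between blocks every step, and since one block cannot read another block's shared memory, that exchange would have to go through global memory.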

2 comments:

  1. Great!!! So you mean you use 1 CUDA block? With how many threads?

    Pawel

  2. One block with 256 threads. I couldn't get it to 512 because then the amount of shared memory required would exceed 16 KB. I guess if I push a little I can stuff ~300 threads in there.

    Jeffrey

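The thread counts in the last comment put rough bounds on the shared memory used per thread. The sketch below assumes usage grows linearly with the number of threads and ignores any fixed overhead, which may not hold for the real kernel. The function `max_threads` and the 54-byte figure are hypothetical, chosen only to show what a limit near 300 threads would correspond to.

```python
# Bounds on shared memory per thread implied by the comment above.
# Assumption: usage grows linearly with thread count, with no fixed overhead.
SHARED_BYTES = 16 * 1024   # the 16 KB shared memory limit per block

fits = 256        # thread count that works
too_many = 512    # thread count that exceeds the limit

max_bytes_per_thread = SHARED_BYTES / fits      # at most 64 bytes per thread
min_bytes_per_thread = SHARED_BYTES / too_many  # more than 32 bytes per thread


def max_threads(bytes_per_thread):
    """Largest thread count whose shared memory fits in one block."""
    return SHARED_BYTES // bytes_per_thread


print(max_bytes_per_thread, min_bytes_per_thread)  # 64.0 32.0
print(max_threads(54))  # 303 (54 bytes per thread is a made-up example value)
```

In other words, under this simple model the kernel uses somewhere between 32 and 64 bytes of shared memory per thread, and a footprint in the mid-50s would land at about 300 threads.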