Robert, James and Pawel spent a lot of time on Monday working on
(i) making InfiniBand work concurrently with Ethernet, and installing all the Open MPI
tests. A lot of success with that, but the tests aren't finished yet, and a lot of software setup and configuration still needs to be done. It looks like we'll get our 20 Gbps!
(ii) simulation of particles.
We decided to skip tau-effects and concentrate on benchmarking our naive, barely optimized code. After conquering the 512-particle limit
of the old code, we can now run up to 30 million particles. Around 10pm we even started doing some "galaxy" simulations by adding external forces. We assumed a non-rotating, weakly barred force.
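For readers who want to play with the idea, here is a minimal sketch of a particle moving under an external, non-rotating, weakly barred force. The exact form of our force isn't given in this post, so the sketch assumes a logarithmic potential with axis ratio Q slightly below 1, a common textbook choice. All names and parameter values are hypothetical, and this is plain Python for a single test particle integrated with a kick-drift-kick leapfrog, not our GPU code.

```python
import math

# Hypothetical parameters, chosen for illustration only (not from our code).
V0 = 1.0   # asymptotic circular speed
RC = 0.1   # core radius, keeps the force finite at the centre
Q = 0.9    # axis ratio; Q < 1 gives a weak, non-rotating bar-like distortion


def potential(x, y):
    """Assumed logarithmic potential, slightly flattened along y."""
    return 0.5 * V0**2 * math.log(RC**2 + x**2 + (y / Q)**2)


def external_accel(x, y):
    """External acceleration, i.e. minus the gradient of the potential."""
    s = RC**2 + x**2 + (y / Q)**2
    return -V0**2 * x / s, -V0**2 * y / (Q**2 * s)


def leapfrog_step(state, dt):
    """One kick-drift-kick leapfrog step for a single test particle."""
    x, y, vx, vy = state
    ax, ay = external_accel(x, y)
    vx += 0.5 * dt * ax   # first half kick
    vy += 0.5 * dt * ay
    x += dt * vx          # full drift
    y += dt * vy
    ax, ay = external_accel(x, y)
    vx += 0.5 * dt * ax   # second half kick
    vy += 0.5 * dt * ay
    return x, y, vx, vy


def energy(state):
    """Specific energy (kinetic plus potential) of the test particle."""
    x, y, vx, vy = state
    return 0.5 * (vx**2 + vy**2) + potential(x, y)


def integrate(state, dt, n_steps):
    """Advance the particle by n_steps leapfrog steps."""
    for _ in range(n_steps):
        state = leapfrog_step(state, dt)
    return state
```

Since the potential does not rotate or change in time, the energy of the particle should stay nearly constant, which makes a handy sanity check: compare `energy(state)` before and after calling `integrate(state, 1e-3, 10000)`.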
We benchmarked the compute part of the code at roughly 100 GFLOP/s.
This is approximately what we'd expect from a problem limited by global memory bandwidth:
the card can stream 120 GB/s from its GDDR5 memory to the GPU, which is 30 Gfloat/s at 4 bytes per float.
If 3.3 FLOP are done on every float read from RAM, then we are compute/bandwidth balanced (100 GFLOP/s divided by 30 Gfloat/s is about 3.3). Our leapfrog integration is probably limited by bandwidth, although the GPU heats up to 93 °C at times, and the fan becomes a hair-dryer.
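The balance estimate above is easy to redo for other cards. A minimal sketch of the arithmetic, assuming single-precision (4-byte) floats; the function names are made up for illustration.

```python
BYTES_PER_FLOAT = 4  # single precision (assumed)


def floats_per_second(bandwidth_bytes_per_s):
    """How many floats per second the memory system can deliver."""
    return bandwidth_bytes_per_s / BYTES_PER_FLOAT


def balance_flop_per_float(compute_flop_per_s, bandwidth_bytes_per_s):
    """FLOP per loaded float at which compute and bandwidth are balanced."""
    return compute_flop_per_s / floats_per_second(bandwidth_bytes_per_s)


bandwidth = 120e9  # bytes/s, the streaming rate quoted above
measured = 100e9   # FLOP/s, our benchmark of the compute part

stream_rate = floats_per_second(bandwidth)
balance = balance_flop_per_float(measured, bandwidth)
print(f"{stream_rate / 1e9:.0f} Gfloat/s, {balance:.1f} FLOP per float")
# prints "30 Gfloat/s, 3.3 FLOP per float"
```

A kernel that does fewer FLOP per loaded float than this ratio spends its time waiting on memory; one that does more is compute-bound.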