Saturday, June 26, 2010

[from Josh]

I have sent an email with my code in it. It is written in C/C++ and CUDA C. It is almost working. What it really needs is someone who knows more about CUDA to take a look at it. Pawel knows more than any of us on the subject I'm sure. Anyway I wont be able to work on it at my camp so I'm leaving it to you. Theres 3 days and I've completed 99% of the code. All that really needs to be done is adjust the way some things are written to avoid some memory bugs that wont allow it to run properly (I get through about 5 time steps then there's an error). I will post what the problems are and the layout of my code to make things easier for you.

1. in the particle.h define all the constants and structs (CUDA C only deals with structs not classes).
2. in 3ddisplay.h is the visualization which i have debugged completely.
3. in movPos.h is the calling of the kernels and memcpy from device to host.
4. in is the main function and setup resides in runSim function.
5. in are the kernels

Things that need to be checked over: 3,4,5
I think the problem may stem from how I have defined the tau_set. I compressed a 3D array into a 1D array. since the array was of dimensions theta_s x phi_s x r_s the 1D array goes from [0..theta_s * phi_s * r_s-1] where each vertical slice of the shell starts at the bottom at (theta=0,phi=-pi/2,r=0). The easiest thing to do here would be for someone who understands CUDA better to implement a 3D array and simplify things greatly.

There is need in the second kernel to place an individual particle in the corresponding tau cell and I had calculated the index to be (int(theta/dtheta)%theta_s)*phi_s*r_s + (int(phi/dphi)%phi_s)*r_s + (int(r/dr_%r_s). based on theta,phi,r of particle, and tau cell spacing dtheta,dphi,dr and total number of pieces in each demension theta_s,phi_s,r_s.

However this did not work since obviously a negative number is achievable. This caused an error. Also it would be good if someone could think of a cheap way of reducing theta and phi into a lower angle once it exceeds 2pi or pi/2 respectively.

If you implement a 3d array then you must modify the kernels so that every loop in movPos calls the first kernel which zeros all tau grid. Then the second kernel must be called (after __syncthreads()) which counts the number of particles in each grid and saves it in the respective tau cell of your array. Then the final kernel may be called again after synchronizing which integrates the particles by calculating tau and using the beta equation.

Everything is well commented and if worst comes to worst and you just want to dispaly CUDA then you can try to remove all aspects of tau and simply show an N-body simulation. That would mean that the only array is a 1D particle array. And I already know that my integration code works from my previous version.

Also to compile my code you must specify in linux "-lm -lglut -lglu -lcudart -lcutil" You can most likely just use a make file from another project that has openGL enabled.

I'm very sorry I have wont be able to help out the next few days. I wanted to be at the presentation very much and had planned on it until the date changed (after I had already made plans with my family).

I wish you all good luck. I will check my email in the next few hours to before I leave to see if any of you respond.

I will also send contact info so that you can reach me and ask questions about my code if you choose to try and fix it. I think only someone good and CUDA can fix the issue however I think you all should try.



  1. [from James] A couple of preliminary comments based on your post, before I look at the code:

    In my own program, I frequently get array index errors because tau causes a blow-out effect and sooner or later (depending on how strong the effect is) one of my particles goes outside the limits of the tau grid.

    Implementing the grid as a 1D array should work fine, since C stores multi-dimensional arrays as 1D arrays (in row-major order).

    Instead of x%y you could try using fmod(x,y), or just a simple "if x > y then x-=y". I have no idea which method is faster (though I assume the if statement is slowest).

  2. This comment has been removed by the author.

  3. [from James]
    The easiest way I can think of getting around the negative tau index issue is to simply add pi/2 to the particle's phi value during the index calculation (effectively pretending, for the purpose of the tau grid, that 0<phi<pi).

    For keeping the particles' positions between zero and 2*pi, I think the easiest way is to use fmodf(theta,2*pi) and fmodf(phi,pi) when calculating the particle's new position.

  4. [from Josh]
    I didn't know there was a float mod function. Very useful. Also for my tau grid I limit my tau grid to just 3 times as wide as initial particle radius. Originally I had made the tau grid expand (by reallocating memory) but found that that could lead to HUGE memory problems if a particle was blown out to like 5000*R. Any way limiting the tau array is a good strategy since as long as the majority of the particles are within the bounds then any tiny amount of particle outside will feel "max" tau effect due to all the dt's up until the tau bound then everything outside just feels that tau. if its just a few particles outside then the difference in accuracy is negligible.

    PS just got home from my camp. Relieved to hear we have more time.