1. in the particle.h define all the constants and structs (CUDA C only deals with structs not classes).
2. in 3ddisplay.h is the visualization which i have debugged completely.
3. in movPos.h is the calling of the kernels and memcpy from device to host.
4. in CUDA_pSim.cu is the main function and setup resides in runSim function.
5. in CUDA_pSim_kernel.cu are the kernels
Things that need to be checked over: 3,4,5
I think the problem may stem from how I have defined the tau_set. I compressed a 3D array into a 1D array. since the array was of dimensions theta_s x phi_s x r_s the 1D array goes from [0..theta_s * phi_s * r_s-1] where each vertical slice of the shell starts at the bottom at (theta=0,phi=-pi/2,r=0). The easiest thing to do here would be for someone who understands CUDA better to implement a 3D array and simplify things greatly.
There is need in the second kernel to place an individual particle in the corresponding tau cell and I had calculated the index to be (int(theta/dtheta)%theta_s)*phi_s*r_s + (int(phi/dphi)%phi_s)*r_s + (int(r/dr_%r_s). based on theta,phi,r of particle, and tau cell spacing dtheta,dphi,dr and total number of pieces in each demension theta_s,phi_s,r_s.
However this did not work since obviously a negative number is achievable. This caused an error. Also it would be good if someone could think of a cheap way of reducing theta and phi into a lower angle once it exceeds 2pi or pi/2 respectively.
If you implement a 3d array then you must modify the kernels so that every loop in movPos calls the first kernel which zeros all tau grid. Then the second kernel must be called (after __syncthreads()) which counts the number of particles in each grid and saves it in the respective tau cell of your array. Then the final kernel may be called again after synchronizing which integrates the particles by calculating tau and using the beta equation.
Everything is well commented and if worst comes to worst and you just want to dispaly CUDA then you can try to remove all aspects of tau and simply show an N-body simulation. That would mean that the only array is a 1D particle array. And I already know that my integration code works from my previous version.
Also to compile my code you must specify in linux "-lm -lglut -lglu -lcudart -lcutil" You can most likely just use a make file from another project that has openGL enabled.
I'm very sorry I have wont be able to help out the next few days. I wanted to be at the presentation very much and had planned on it until the date changed (after I had already made plans with my family).
I wish you all good luck. I will check my email in the next few hours to before I leave to see if any of you respond.
I will also send contact info so that you can reach me and ask questions about my code if you choose to try and fix it. I think only someone good and CUDA can fix the issue however I think you all should try.