This is going to be a very short update.
I improved the CUDA code in nearly every way I know, so now is the moment of truth. How much faster is it?
This is on k3, comparing the speed of my C code to my CUDA C code under exactly the same conditions.
This is slightly slower than I had hoped. I'm probably still doing something wrong.
Another disappointment is that the maximum resolution I can get is still just 732 by 732. If I push it to 1k by 1k, I get a segmentation fault, which suggests a memory shortage. That is confusing, since the program should not need that much memory.
The next thing I plan to do is run it on k4/k5. That sounds simple, but unfortunately k4 and k5 don't have mathgl installed yet, so I guess I need to install it first.