Monday, May 31, 2010

[from Pawel] how to compute tau? Party on Sat.

I'm talking about the optical thickness calculation for a bunch of particles.
Suppose the disk is 2D, otherwise you'll have to generalize. Then each particle must deposit
its dtau (small contribution to tau).

we talked a lot about this with James and he's your guru on that (right, James?). ask him any questions. you first declare a 2-D array (in r and phi) that will hold dtau(r,phi)

we established that it is much better to use the so-called "scatter" approach than the "gather" approach. in the first, you traverse the list of N particles and donate their dtau's to the
appropriate grid locations (grid indices are not stored but computed each time you need them for a given particle; that saves time and space in RAM). in the second approach, you'd traverse the
dtau grid and search for the particles inside each grid cell (that's a lot of searching: ~N per cell!)

we need exp(-tau) in the end. we can get it by integrating: tau(r,phi) = integral_0^r dtau(r,phi)
= the sum, along r from the innermost cell out to the particle's cell, of dtau(r,phi).

we've talked about a nice trick to avoid computing exponential functions later on.
suppose you have dtau(r) in an array and want exp(-tau). then you can get a very good approximation to f = exp(-tau) from
f = 1 + integral_0^tau (-f) dtau
why? because df = -f dtau is the differential equation whose solution is f = exp(-tau).
you need to start with the initial value f=1 at the inner radius and integrate outward.

what is the dtau contributed by each particle? it's dA/(4 pi r^2), where dA is its geometric cross section.
in general, we don't want to count particles; all we want is to arrive at tau(r,phi), which
in the initial axisymmetric state has tau(r = outer radius) = tau_infinity = 10 (or 3, or whatever you input). therefore do the integration using dtau = C/r^2 with C=1 initially and, after getting
tau(outer) or exp(-tau(outer)) with the above algorithm, recalculate C so that all
later calls to the tau-computing procedure give the right exp(-tau_infinity) at the outer edge.
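the whole pipeline (scatter the dtau's, integrate along r, then calibrate the constant) can be sketched in a few lines of numpy. this is just an illustrative toy: all the grid sizes and names below are placeholders, not anyone's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
Nr, Nphi, Npart = 1000, 360, 100_000
r_in, r_out, tau_inf = 0.1, 3.0, 10.0

# initial ring of particles between r=1 and r=2
r = rng.uniform(1.0, 2.0, Npart)
phi = rng.uniform(0.0, 2.0*np.pi, Npart)

def dtau_map(C):
    # "scatter": each particle deposits C/r^2 into its (r, phi) cell;
    # the cell indices are computed on the fly, never stored
    ir = ((r - r_in) / (r_out - r_in) * Nr).astype(int)
    ip = (phi / (2.0*np.pi) * Nphi).astype(int)
    grid = np.zeros((Nr, Nphi))
    np.add.at(grid, (ir, ip), C / r**2)
    return grid

# integrate dtau radially (innermost cell outward) to get tau(r, phi)
tau = np.cumsum(dtau_map(1.0), axis=0)

# calibrate: rescale C so the phi-averaged tau at the outer edge
# equals the requested tau_infinity
C = tau_inf / tau[-1, :].mean()
tau = np.cumsum(dtau_map(C), axis=0)
```

after the second pass, the phi-averaged tau at the outer edge comes out equal to tau_inf by construction, and exp(-tau) (or the multiplicative approximation to it) can be read off cell by cell.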

* * *

We will meet on Friday downtown, meanwhile there seems to be enough work and enough computers already doing or starting to do cuda at UTSC to drop in there the rest of the week.

On Sat, as I mentioned, you are invited to a pool-side party (swimming encouraged,
bring your swimsuit). bring a friend if you'd like.
I'll make some bbq, but if you want fancy salads maybe we can make it a potluck (= byo party). hopefully the weather will cooperate so we can go flying for 20 min with each of you out of Buttonville airport (north of 404 & 407).

please let me know how many are coming and whether you have a car (it would be good to have
more than just my car for the buttonville trip).

finally, the coordinates (the time is still to be confirmed; we'll see whether the Wx is ok for the whole day or just part of it):

37 Fairhill Cres., Toronto, 2pm, Sat 26 May. I can give you a 1-2 km ride from Underhill and Lawrence to my place; send me the details of how you're coming so we can set a time to meet
at the gas station there. the bus that runs along Lawrence is the 54.

Thursday, May 27, 2010

[from Josh] OpenGL code

I propose that if one wishes to post code, they post it between "pre" tags so that the formatting is preserved and others can test the code out.
(look at how I did this under HTML view to see what a pre tag is.)

The way I have built my code, everything that is needed to run the simulation (such as "math.h", "freeglut.h", etc.) is "#include"ed in one header file. Within that header file I also define the global variables that will be used for the simulation, such as number of particles, current runtime, dt, solar mass, etc. I also define a class called particle and a dynamic array of particle objects.

I then include this first header file in another header file called "forces.h". In "forces.h" I define all the different forces that I want to take effect in my simulation (currently just gravitational, but it's very easy to add other forces). I then include the forces header in another header called "movePos.h", where all the numerical integration takes place: it calls functions from "forces.h" and updates the particle positions. I then include "movePos.h" in my graphics header, "3Ddisplay.h", which I have posted below. Here, within my render function, I call functions in "movePos.h" which update the particle object array defined in the very first header file. I then graph the positions of each of the particles, adding visualization techniques, and also display data.

Up until now everything has been contained in separate header files that are easily updated, debugged, and managed. You should know that header files release their contents to any file that includes them (i.e. a file that includes a header inherits everything within it). I then have a file (not a header) called "main.cpp". This file includes "3Ddisplay.h" and thus inherits all the other headers' functions and global variables. In "main.cpp" I define a single function called "main" that interactively asks the user what type of simulation they would like to perform and what the parameters of the simulation are, and then calls the displaySim function in "3Ddisplay.h", which displays the simulation as requested. The program is terminated when the user presses "q" in the display window.

Also my simulation utilizes openMP and will run in parallel across all available CPU cores.

I tested my program using "extern" variables instead of inherited header variables, but the complexity rose considerably and made debugging a nightmare. I also tested various pointer/array object declaration techniques to see which one was the fastest, and I found that my current method of memory allocation is the fastest. This is similar to the way we would do things with CUDA too, since CUDA works fastest with "pinned" memory (memory that is allocated once and remains static for the remainder of the program).

If you would like to use the below code you will have to either a) modify it completely so that it works with your code format, or b) try and follow my format.

In general you really only need to do two things:
- change the displaySim function to "main" and add #include "freeglut.h" at the top of your program
- in the render function, replace movePos3D with your own code that updates particle positions. (Note: where it says pPart[i], that is my particle object; replace it with whatever your particle is, or the corresponding x,y,z,vx,vy,vz of your code.)

Note: you will need to have freeglut installed on your system and locatable by your compiler. See my previous post on the topic for those using windows.

#include "movePos.h"

/*openGL vars*/
static int left_click = GLUT_UP;
static int right_click = GLUT_UP;
static int xold;
static int yold;
static int width;
static int height;
static double wh;   /* these must be floating point, or the divisions below truncate */
static double hw;
static int vidtype = 'n';
static int leavetrail = 0;
static int paused = 0;
static double rotate_x = 30;
static double rotate_y = 15;
static double alpha = 0;
static double beta = 0;
static double scale = 100;
static double trans = -0.1*scale;

/*openGL functions*/
void ReshapeFunc(int new_width, int new_height){
    width  = new_width;
    height = new_height;

    hw = height / (double) width;
    wh = width  / (double) height;

    gluPerspective(20 * sqrt(double(1 + hw * hw)), wh, 8, 12);
}

void MotionFunc(int x, int y){
    if (GLUT_DOWN == left_click){
        rotate_y = rotate_y + (y - yold) / 5.f;
        rotate_x = rotate_x + (x - xold) / 5.f;
        if (rotate_y > 90) rotate_y = 90;
        if (rotate_y < -90) rotate_y = -90;
        glutPostRedisplay();
    }
    if (GLUT_DOWN == right_click){
        beta  = beta  + (y - yold) / 2.f;
        alpha = alpha + (x - xold) / 2.f;
        glutPostRedisplay();
    }
    xold = x;
    yold = y;
}

void keyCB(unsigned char key, int x, int y){
    if( key == 'q' ){ free(pPart); exit(0); }
    if( key == '+' ){ scale *= 2.0/3.0; }
    if( key == '-' ){ scale *= 1.5; }
    if( key == 'r' ){ vidtype = 'r'; }
    if( key == 'v' ){ vidtype = 'v'; }
    if( key == 'n' ){ vidtype = 'n'; }
    if( key == 't' ){ leavetrail = abs(leavetrail - 1); }
    if( key == 'p' ){ paused = abs(paused - 1); }
}

void renderBitmapCharacher(float x, float y, float z, void *font, char *string){
    char *c;
    glRasterPos3f(x,y,z);
    for (c=string; *c != '\0'; c++){
        glutBitmapCharacter(font, *c);
    }
}

void Render(int face){
    double D;
    int i;

    /* Axes */
    glBegin(GL_LINES);
    glColor3d(1, 0, 0);
    glVertex3d(-0.7*scale, -0.7*scale, -0.7*scale);
    glVertex3d( 0.7*scale, -0.7*scale, -0.7*scale);
    glColor3d(0, 1, 0);
    glVertex3d(-0.7*scale, -0.7*scale, -0.7*scale);
    glVertex3d(-0.7*scale,  0.7*scale, -0.7*scale);
    glColor3d(0, 0, 1);
    glVertex3d(-0.7*scale, -0.7*scale, -0.7*scale);
    glVertex3d(-0.7*scale, -0.7*scale,  0.7*scale);
    glEnd();

    glRotated(beta, 1, 0, 0);
    glRotated(alpha, 0, 1, 0);

    glColor3d(1, 1, 1);
    glBegin(GL_POINTS);
    if (face==0) movePos3D();
    KE = 0.0;
    #pragma omp parallel for
    for (i=0;i /* ... everything from here to "width)" below was eaten by the
        blog's HTML (a "<" swallows text up to the next ">"): the loop bound,
        the particle-plotting body, glEnd(), and the start of the
        "if (height > width)" test are missing ... */ width)
        glOrtho(-scale, scale, -scale * hw, scale * hw, -scale, scale);
    else /* presumably */
        glOrtho(-scale * wh, scale * wh, -scale, scale, -scale, scale);

    /* Perspective view */
    glViewport(0, 0, width / 2, height / 2);

    /* Display data (the sprintf calls filling the first four strings
       did not survive the paste) */
    char Npart_str[33];
    renderBitmapCharacher(-scale, scale - 0*scale*0.06, 0, (void *)GLUT_BITMAP_8_BY_13, Npart_str);
    char G_str[33];
    renderBitmapCharacher(-scale, scale - 1*scale*0.06, 0, (void *)GLUT_BITMAP_8_BY_13, G_str);
    char t_str[33];
    renderBitmapCharacher(-scale, scale - 2*scale*0.06, 0, (void *)GLUT_BITMAP_8_BY_13, t_str);
    char h_str[33];
    renderBitmapCharacher(-scale, scale - 3*scale*0.06, 0, (void *)GLUT_BITMAP_8_BY_13, h_str);
    char scale_str[33];
    sprintf(scale_str, "axis scale=%2.4f", scale*2);
    renderBitmapCharacher(-scale, scale - 4*scale*0.06, 0, (void *)GLUT_BITMAP_8_BY_13, scale_str);
    char K_str[33];
    sprintf(K_str, "Kinetic Energy=%2.4f", KE);
    renderBitmapCharacher(-scale, scale - 5*scale*0.06, 0, (void *)GLUT_BITMAP_8_BY_13, K_str);

    glRotated(rotate_y, 1, 0, 0);
    glRotated(rotate_x, 0, 1, 0);

    /* Right view */
    glViewport(0, height / 2 + 1, width / 2 + 1, height / 2);
    glRotated(90, 0, -1, 0);

    /* Face view */
    glViewport(width / 2 + 1, height / 2 + 1, width / 2, height / 2);

    /* Top view */
    glViewport(width / 2 + 1, 0, width / 2, height / 2);
    glRotated(90, 1, 0, 0);

    /* Projection matrix is restored */
    /* (the drawing calls for these extra viewports were also lost) */
}

/* Run upon click */
void MouseFunc(int button, int state, int x, int y){
    if (GLUT_LEFT_BUTTON == button)
        left_click = state;
    if (GLUT_RIGHT_BUTTON == button)
        right_click = state;
    xold = x;
    yold = y;
}

/* Timed refresh to control refresh rate */
void TimerdisplayCB(int value){
    if (paused == 0) glutPostRedisplay();
}

void displaySim(int *argcp, char **argv){
    /* Creation of the window */
    glutInit(argcp, argv);
    glutCreateWindow("Particle Simulation");

    /* OpenGL settings */
    glClearColor(0, 0, 0, 0);

    /* Declaration of the callbacks
       (the glut*Func registration lines were lost in the paste) */

    /* Main display (glutMainLoop() presumably went here) */
}

Wednesday, May 26, 2010

[from Robert] cudak4 & cudak5 up and running

Both machines are running Fedora 12, along with (finally!) the newest 256.25 beta drivers from NVIDIA. The CUDA compiler is also running, and I have sent everyone an email on how to access c5 (for now). Try it out!

[from Anthony] some CUDA examples

There's a website I found which has a handful of CUDA codes, such as adding matrices and solving Laplace's equation.

I do find it useful for CUDA beginners like myself, since they explain what is happening at each step of their code. They also compare codes written for the CPU and GPU in terms of how they are written (in C++, btw) and how fast the CPU and GPU execute the program.

For instance, in the Laplace 2D solver they discuss that the smaller the grid (the smaller the NxN array) and the fewer the iterations, the more likely the CPU wins in terms of speed, because the GPU spends relatively more time on setup. Here are some of my results with different N and iterations...

n=10 iterations=1000 time_gpu=6586.000 time_cpu=1601.000

n=10 iterations=100 time_gpu=615.000 time_cpu=150.000

n=60 iterations=100 time_gpu=617.000 time_cpu=5980.000

I won't be posting their codes - they're pretty lengthy. They also have a plot where they compare the CPU/GPU times of executing the program.

By the way, the devices I have in my macbook pro are...

Device 0: "GeForce 9600M GT"

CUDA Driver Version: 3.0

CUDA Runtime Version: 3.0

CUDA Capability Major revision number: 1

CUDA Capability Minor revision number: 1

Total amount of global memory: 268107776 bytes

Number of multiprocessors: 4

Number of cores: 32

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 8192

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 2147483647 bytes

Texture alignment: 256 bytes

Clock rate: 1.25 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: Yes

Integrated: No

Support host page-locked memory mapping: No

Compute mode: Default (multiple host threads can use this device simultaneously)

Device 1: "GeForce 9400M"

CUDA Driver Version: 3.0

CUDA Runtime Version: 3.0

CUDA Capability Major revision number: 1

CUDA Capability Minor revision number: 1

Total amount of global memory: 266010624 bytes

Number of multiprocessors: 2

Number of cores: 16

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 8192

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 2147483647 bytes

Texture alignment: 256 bytes

Clock rate: 1.10 GHz

Concurrent copy and execution: No

Run time limit on kernels: Yes

Integrated: Yes

Support host page-locked memory mapping: Yes

Compute mode: Default (multiple host threads can use this device simultaneously)

It's not as great in comparison to what we have in our new machines, where our group installed the new Fermi cards (I think three for each node or something?). So if I were to carry out some massive, intense computations and graphics, it would be better to use our new machines, since they also have a better cooling system.

Just now I tried the code in Dr. Dobbs part 1 CUDA. It had no problems compiling with the nvcc command. It also suggested to add printf to track what is happening in the program. I will try that and see what kind of results I get.

For our CPU programs, most of us have written our N*(2-body) simulations, and I have done that in both cartesian and polar coords. I'm still working on the visuals, which I'm attempting with OpenGL. Josh, you should post your openGL classes, codes, whatnot - it will save *all* of us a lot of time.

I already have mathGL in place, but apparently it uses kuickshow as its image viewer and I'm having difficulties getting kuickshow on my mac, so right now it can't output any image with mathGL. But it has no problems compiling, so I'm pretty sure I installed it correctly.

As of now I am (as are most of us...) modifying our codes to include radiation pressure as a source of physical interactions between particles, and I'm still figuring out how to do this with our codes.


Monday, May 24, 2010

[from Josh] x,y,z n-body simulation

EDIT: Here is the link to a better video with better resolution.


This is my 2d or 3d n-body simulation with openMP running parallel for loops (it can run several thousand particles fine on my CPU). I made the simulation in C/C++ and the visuals with openGL. From my interface you can orbit the system using your left mouse button, rotate the axes with the right mouse button, and zoom in and out with the +/- keys. You can also change the colour of each particle. Pressing 'r' is supposed to give a colour based on particle position (malfunctioning ATM), 'v' for velocity, 'n' for a static distribution (to help track particles). In the video it starts off with 'n' and then goes to 'v'.

Using lower G values causes the particles to fly away too quickly; by zooming out you can see just how far out the orbits can get with a value of 100.

The particles appear to behave properly and a continued simulation results in sub systems forming and then orbiting each other. It is quite neat to see actually and if I remember I will run one for a few hours with 1000+ particles and take a picture to show you what happens.

I identified a point of possible error for everyone else's simulations. It depends on how you set your code up, but in mine I had to be careful, when calculating the forces, not to use the updated position coords until every particle had been updated.
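The point can be illustrated with a toy sketch (python here, hypothetical names, not Josh's actual code): compute every particle's acceleration from a frozen snapshot of the positions, and only then update any coordinates.

```python
def accels(pos):
    # pairwise 1/d^2 attractions, all read from the same frozen snapshot
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            d3 = (dx*dx + dy*dy) ** 1.5
            acc[i][0] += dx / d3
            acc[i][1] += dy / d3
    return acc

def step(pos, vel, dt):
    acc = accels(pos)          # every force uses the OLD positions...
    for i in range(len(pos)):  # ...and only now do the coordinates move
        vel[i][0] += acc[i][0]*dt; vel[i][1] += acc[i][1]*dt
        pos[i][0] += vel[i][0]*dt; pos[i][1] += vel[i][1]*dt
```

if the position update were fused into the inner force loop, particle j would feel particle i's new position while i still felt j's old one, and (among other things) momentum conservation would quietly break.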

I have started porting it over to CUDA C. It shouldn't be too hard.


Sunday, May 23, 2010

[from Pawel] our new machines.

The most important things now are to figure out how to install a compatible operating system and cuda driver on cudak5, which is giving us some problems right now, and how to build cudak4 (I'm getting more & more impatient waiting for asus to respond within the warranty process).

Those of you who want to help, let's meet on Tuesday at UTSC (after 10:30).
Maybe we'll try out the newest beta cuda driver, 256.25.

Everybody else, how are the exercises going? mathGL installed and working?

May is the time for cpu-based computing, June is for gpu-based computing.
I'm looking forward to seeing your cpu-based, working, programs and their descriptions here, in May.

You need to be ready to do cuda on cudak3 initially, later on cudak5 and hopefully cudak4 and cudak1 (I'm repairing it; it will not be on 24/7, but it's a great development platform, with cuda setup similar to cudak3).
To start with, you should reproduce some Dr. Dobbs exercises. Please report successes asap, I imagine by the end of this week.

Reminder - on June 23 we must be ready with our cuda programs, and a brief description of them will be placed in this blog.

I'd love to see a demo of real-time compressible 2d hydrodynamics (it's difficult, but we'll try with Jeffrey), and more than one demo of tau>1 disk calculations, 2d or 3d, done with cuda, using 100 million particles (my guess).

so this is the plan. remember, cuda takes time to learn. help each other, team up if you want, ask me questions.

Friday, May 21, 2010

[from James] polar N-body random generation

For completeness' sake, here's the code I wrote for randomly generating a symmetric disk of 1000 particles in polar co-ordinates with pretty constant number density. I tried a few different cartesian-to-polar conversion formulas, but the others always generated an asymmetric disk; this method always generates a θ between 0 and π/2, then moves from quadrant to quadrant, placing a point in each as it goes.

from math import sqrt, atan, pi
from random import random

R, T = [], []  # radii and angles (the original snippet omitted the setup lines)
i = 0
while i < 1000:
    j = 0
    while j < 4:
        x, y = random(), random()  # candidate point in the unit quadrant
        r = sqrt(x*x + y*y)
        if r <= 1.0:
            R.append(r)
            T.append(j*pi*0.5 + atan(y/x))
            j = j+1
            i = i+1

Thursday, May 20, 2010

[from James] polar N-body self-gravity

Here's the physics I've worked out for particle-particle gravitational interactions in polar co-ordinates:
We're calculating the total forces acting on a particle located at (R,θ).
Another particle located at (r,φ) will be a distance d from the first particle.

d^2 = R^2 + r^2 - 2*r*R*cos(ψ)

where ψ is angle between the position vectors for the two particles.
The gravitational force between these two particles acts along d; we can split this into the parts acting along θ and R respectively:

F(d) d̂ = F(d)sinβ θ̂ + F(d)cosβ R̂

where the hatted symbols are unit vectors and their plain counterparts are the corresponding magnitudes (since I don't know how to write those little vector arrow superscripts in html), and β is the angle between R̂ and d̂,

β = arccos[(d^2 + R^2 - r^2)/(2dR)]
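A quick numeric sanity check of the distance and β formulas against an explicit vector calculation (throwaway python, arbitrary test values; note that the law of cosines puts a minus sign on the cross terms):

```python
from math import sqrt, sin, cos

R, r, psi = 2.0, 1.5, 1.0            # arbitrary test configuration
p1 = (R, 0.0)                        # first particle on the x-axis
p2 = (r*cos(psi), r*sin(psi))        # second particle at angle psi from it

dx, dy = p2[0] - p1[0], p2[1] - p1[1]
d_vec = sqrt(dx*dx + dy*dy)                      # distance, done directly
d_law = sqrt(R*R + r*r - 2.0*r*R*cos(psi))       # distance via law of cosines

# beta via the law of cosines ...
cos_beta = (d_law**2 + R*R - r*r) / (2.0*d_law*R)
# ... versus the angle read straight off the separation vector
# (they agree up to a sign, i.e. up to whether d-hat is taken to
# point toward or away from the star)
cos_beta_vec = dx / d_vec
```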

Pretty straightforward so far, right?

This force along d produces acceleration along both R and θ:

F(d) d̂ = d^2(θ θ̂)/dt^2 + d^2(R R̂)/dt^2

Ok, now working out the derivatives for those accelerations is kinda long-ish, fortunately B&T did it already for the radial acceleration in chapter 3.1. Deriving the angular acceleration by the same method and adding the two gives the total acceleration as

d^2(θ θ̂)/dt^2 + d^2(R R̂)/dt^2
= [(d^2θ/dt^2)*(R+1) + (dθ/dt)*(2(dR/dt) - θ*(dθ/dt))] θ̂
+ [(d^2R/dt^2) + θ*(d^2θ/dt^2) - (dθ/dt)^2*(R+2)] R̂

Now combining the above with our force equation near the top, separating the R and θ parts into two equations, and re-arranging, we finally get

d^2θ/dt^2 = [F(d)sinβ - 2(dR/dt)(dθ/dt) - θ(dθ/dt)^2] / (R+1)
d^2R/dt^2 = F(d)cosβ + θ(d^2θ/dt^2) + (dθ/dt)^2*(R+2)

I've double-checked my calculations and obtained the same result, but the presence of d2θ/dt2 in the d2R/dt2 equation doesn't really make sense to me. Anyone have any ideas about that?

Of course the above equations only tell us the particle's acceleration due to the gravitational attraction of one other particle in the disk; we still need to add the (similar) forces from all the other particles, plus of course the central mass. Once we do, we'll have a self-gravitating disk.

[from James] polar 1-body

The polar 2-body (or really 1-body orbiting the central mass) has been working for a few days now; I won't post the whole code, which is long due to the RKF45 integrator I've been using, but here is the relevant physical modelling part of the code (in python):

# 2-body problem in polar co-ordinates. Assume that G*M = 1 and m = 1.
# "t" is the time, "kin" is a list/array of current orbital values passed
# along from the integrator function.
import numpy as np
import random as rand
from math import sqrt, atan, cos

def g(t,kin):
    GM = 1.0
    r = kin[0]       # radial position
    theta = kin[1]   # angular position
    v = kin[2]       # radial velocity
    omega = kin[3]   # angular velocity
    E = kin[4]       # total energy
    L = kin[5]       # angular momentum
    dr = v
    dtheta = omega
    dVr = r*omega*omega - GM/(r*r)
    domega = -2.0*omega*v/r # careful not to confuse omega and domega when reading below
    dE = GM*v/(r*r) + v*dVr + r*v*omega*omega + r*r*omega*domega
    dL = 2*r*omega*dr + r*r*domega

    # send computed d/dt values back to the integrator
    return np.array((dr,dtheta,dVr,domega,dE,dL))

# Initial conditions. Position in (r, theta) and eccentricity will be
# randomly generated. V (radial velocity) will always initially be zero, ie.
# the particle will always begin at apocenter or pericenter. Other values
# are then calculated from r, theta, and eccentricity.
GM = 1.0
X,Y,Eccent = rand.random(),rand.random(),rand.random()
R = sqrt(X*X + Y*Y)
Theta = atan(Y/X) # radians
V = 0.0
Axis = (R*(1.0 + Eccent*cos(Theta)))/(1.0-Eccent*Eccent) # semi-major axis
Omega = (sqrt(GM*Axis*(1.0-Eccent*Eccent)))/(R*R) # rad/time
Energy = -0.5*GM/Axis
AngMomentum = R*R*Omega
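Since the RKF45 integrator itself isn't posted, here is a minimal fixed-step RK4 driver in the same spirit (a deliberate simplification: it drops the E and L bookkeeping and uses a circular test orbit instead of the random initial conditions above):

```python
import numpy as np

GM = 1.0

def g(t, kin):
    # d/dt of (r, theta, radial velocity, angular velocity), as in the post
    r, theta, v, omega = kin
    return np.array((v, omega, r*omega*omega - GM/(r*r), -2.0*omega*v/r))

def rk4_step(t, kin, h):
    # one classical 4th-order Runge-Kutta step of size h
    k1 = g(t, kin)
    k2 = g(t + 0.5*h, kin + 0.5*h*k1)
    k3 = g(t + 0.5*h, kin + 0.5*h*k2)
    k4 = g(t + h, kin + h*k3)
    return kin + (h/6.0)*(k1 + 2.0*k2 + 2.0*k3 + k4)

# circular orbit at r=1: omega=1 makes gravity balance the centrifugal
# term exactly, so r should stay put while theta advances at unit rate
kin = np.array((1.0, 0.0, 0.0, 1.0))
h = 0.01
for n in range(628):          # roughly one orbital period (2*pi)
    kin = rk4_step(n*h, kin, h)
```

a circular orbit is a handy test case because any drift in r or in the radial velocity is pure integrator error.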

I'll send the whole file out via e-mail. If you have python you can run it as an executable, in which case it should display a polar plot of the particle's orbit, and also print out a table of data displaying time, r, theta, E and L for each timestep taken.

I know most of you are working in C/C++, but the actual formulas should work just as well regardless of language.

As for the N-body: the random generator of constant number density particle disk is working fine, and I think I have the polar-coordinate physics for the particle-particle gravitational interactions worked out, but I'm running into a wall on the programming side of things (which makes it hard to test the physics!). I hope to have it working before the weekend.

[from Pawel] about mathGL


I hope you're working on your mathGL installation and visualization of your
many-particle disks. Yesterday, not everybody showed up so I'll repeat:
/home/pawel/mathgl-1.10 is a directory readable to you; it contains
mathgl-1.10.eng.pdf = the manual for mathGL.
mathgl-1.10.tgz is in my /home/pawel/Download directory -
copy it, untar it like so
> tar xvf mathgl-1.10.tgz
and change into the directory that this command creates
> cd mathgl-1.10
it's a small package that you better set up in your directories, reading the installation part of the manual.

In my directory /home/pawel/mathgl-1.10/ I have a sample program in C++ called
sample.cpp, also one in Fortran90 called sample.f90, and so on. You can copy it if you like;
it has only one type of plot uncommented, but I tried many of them! Almost all types of graphs and multi-D
plots work perfectly.

One warning: before you install mathGL, you need to have a library gsl-1.13 installed in your system.
I have the tarball in /home/pawel/Download/gsl-1.13.tar.gz and I installed it for everybody to use on cudak3. If you're working on a different system, install it there, and if it's Windows, find another zip file for it. It should be described in the mathGL installation instructions, I'd think.

Those demo plots are fun to make! We'll see how you can use them for disks!

[from Pawel] Our tasks, part 1

Hi guys, I propose to do those things:
1. precede your post title with [from Author] bit, so we know who's talking.
2. if a post is a continuation of something previously discussed by you, give it a number like
[from Anthony] Particle motion, part 3. that will really help to find our way around here.
3. If you update your earlier post, which you should do if you wish, note that like (updated 01-01-2011)
at the end of the post title.

* * *

now, to the first point I want to talk about: simulations. I would like all of you to have minimal experience in particle dynamics before we move on, and I have some evidence that Jeffrey (experienced in this), Anthony, and perhaps Josh (though I'd like to see his results posted here and shown to us in real-time openGL one day) are already close to being able to move to the next level. I know Robert was busy with hardware/system software, as was Josh, so I can understand. Still, this is not so complicated and I expect that you'll post. James, are you going to post your 1-body simulation here soon?
If any of you want to team up with others, please do so, dividing the tasks among yourselves so that each of you knows exactly everything the other(s) have done, and report together.

on level 2, we will improve and build out your simulations to include physical interactions such as radiation pressure with shadow casting. Any ideas on how to do this? Obviously, we need to synchronize the calculations for all the particles so that the forces they exert are computed at the same times. It is possible to
multi-time the calculations: have, let's say, most particles update their forces at intervals dt,
and the rest, which don't need it that often, update their forces only every 2*dt.
But it's not necessary, and it would complicate our life in this instance.

You have to devise a scheme for the optical thickness computation at each particle's location.
I think you need to use a fixed Eulerian polar grid of points, enough of them so that you have at least as many cells as particles. Each particle will contribute a certain amount dtau to tau, and that will have to vary as 1/r^2. Let's see why. dtau by definition is dH/H, the relative increase/decrease of the flux of radiation coming from the star. The radiative transfer equation describing the change of flux in a parallel beam of radiation reads
dH = -H d(tau)
where -dH is the absorbed/scattered flux.
However, our beam of radiation is spreading since we have a spherical symmetry, so it's better to
multiply both sides of this equation by 4pi r^2, and call L the total flux or total luminosity of the radiation *going out radially* (not scattered or absorbed yet). Then
dL = -L d(tau)
and now we have the comfortable result that if 0% of radiation is scattered/absorbed in a layer (optical thickness dtau=0), then the total luminosity at all radii is constant and equal L=L_* (stellar luminosity).

from this, dtau due to one particle is simply the fraction of particle cross section pi*s^2 to the
sphere of radius r (r=distance from the star, s = radius of the dust particle),
dtau = pi s^2/ (4 pi r^2) = (1/4) (s/r)^2

a good way to handle the calculation is to have a procedure for evaluating a map of dtau as a function of the (r, theta) coordinates. I would take 360 cells in theta and 1000 in radius, between r=0.1 (or so) and r=3 or so.
I assume the initial ring of particles will be chosen between r=1 and r=2. If there are a billion instead of a million particles, I'd take 3600 cells in angle and 10000 in radius, for a total of 0.36e8 cells. So ok, there will be on average >1 particle per cell; that's fine, as long as it's not >>1.

After that, you integrate along radial paths,
tau(r,theta) = integral_0^r dtau(r,theta),
which can be thought of as adding consecutive dtau values to a running sum called tau, going from
inner to outer radii, separately for every theta. integral^r means an integral with upper limit r.

On the first use of this routine, you set A=1 and dtau = A/r^2, neglecting all the precise constants that would be needed. That is done because, no matter what s^2 we take and how accurate with constants we are,
we simply don't know, and don't want to deal with, the number of particles of dust.
We prefer to specify global properties of the disk: as we traverse the disk in its midplane, the total tau,
or tau(infinity), should accumulate to a given number, which is our input parameter for the simulation.
So in the first pass we divide the wanted final tau_infinity by the **average** (over theta) tau(r=r_outer),
and store that ratio as the optical-thickness normalization per particle, A. Then I'd recalculate
the tau map with the updated A.

I'd recommend trying the following choices of radial optical depth of a disk:
tau_infinity = 0, 2, 4, 12
* * *
when the table of dtau is ready and the integrated tau(r,theta) is ready, you can then compute the table of
beta = F_rad/F_grav. that parameter, as you know, depends on the position only via tau:
beta = beta0 * exp(-tau)

since, however, exp(-tau) happens to be the solution of the differential equation
df = -f d(tau)
integrated cell by cell (the cell index jumps by dn=1 from one cell to the next), you might as well
save the computer some time on the evaluation of the math function exp() like so:
set the initial value of the function f, which is your numerical approximation to exp(-tau), to 1 at the inner radius r0. Moving out from one cell to the next, you multiply the previous value of f by (1 - dtau) in the current cell, store the result, and move on to the next cell out.
Why? well, exp(-tau) = exp(-integral^r dtau) = the product, over all cells, of the factors exp(-dtau), because
the exp of a sum is the product of the exp's.
and that is very nearly equal to the product of the (1-dtau) factors, because exp(-dtau) = 1 - dtau/1! + dtau^2/2! - dtau^3/3! + ..., and our dtau << 1 (the optical thickness of one cell is << 1).

[from Josh] Next Step

I think the next step we should take in our coding endeavors would be to have people start writing header files with specific tasks. For instance, we need a section of code, preferably written in a header file, that visualizes our simulations with openGL or mathGL. I think openGL is much faster, but for static graphs mathGL is the better option.

So... I think the next step is to agree upon a format that our codes should follow (so that we can write sections of code separately and know that they will mesh together properly when run together). This also makes updating and debugging sections of code much, much easier. We should discuss this at the next meeting (maybe Tuesday?).

[from Josh] New Office

Anthony, Rob, and I (Josh) checked out the new office. It's room AB 121 (far cubicle). As it turns out, the office is very cramped and also houses two other students who said they will be there often. It will only be big enough for one of us to work there at a time. I'll be bringing a desktop to put there, since I do not have a laptop and would like to be able to work on windows when down there. As of now Anthony and I each have a key. We paid a $20 deposit for each key; after seeing the office, I really don't think it was worth it to get two keys. Anyway, now I have a place to put a computer. The only problem is I don't have a monitor. Does anyone have a monitor I can use? Also, I will be gone from Friday to Monday at my cottage, where I hope to catch up on some R & R and maybe some reading for school and the research.

Wednesday, May 19, 2010

[from Anthony] Remarks, continued

Remember my comment about the 'epicyclic' output from my 2-body program with dt=0.5? Some of us sat down, discussed this puzzling result, and tried to track down the source of the phenomenon. We examined the numerical output from my program and noticed that the radius does indeed fluctuate, but probably not enough for the particle to actually be in epicyclic motion. As some of my colleagues suggested, it may be an illusion. Moreover, Prof. Artymowicz suggested that my for loops should stop at a given time instead of at a specific step number, so that we can track what's happening at each period. Following his advice, I made some visuals after a certain number of loops...

This is at around 1 loop.

Around 3 loops.

Around 7 loops.

Around 13 loops.

...and so on, until you get the result from my previous post. I believe in that post the particle went around 40 loops, since I assigned approximately n = 300 steps. So it does indeed look like an illusion: the particle is not actually in epicyclic motion.


Tuesday, May 18, 2010

Anthony Reporting...

After spending a day and a half learning C++, I managed to write my own programs to solve the 2-body and N×(2-body) problems numerically. I actually had some programming experience with Python prior to this summer project, so learning C++ was not too difficult.

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <fstream>

using namespace std;

int main ()
{
    // output file (the filename was lost when pasting into the blog;
    // "orbit.dat" is a stand-in)
    ofstream dataFile("orbit.dat");

    double r, ax, ay;
    double x = 1;     // initial conditions
    double y = 0;
    double vx = 0;
    double vy = 1;
    double dt = 0.5;
    double t = 0;

    for (int step = 0; t < 60; step++)    // integrator
    {
        r  = sqrt(x*x + y*y);
        ax = -x/(r*r*r);
        ay = -y/(r*r*r);
        vx += ax * dt;
        vy += ay * dt;
        x  += vx * dt;
        y  += vy * dt;
        t  += dt;
        dataFile << x << '\t' << y << '\n';    // writes data
    }
    return 0;
}


So this is my code for solving the 2-body problem numerically in C++ (I hope you can all read it...). Note that the orbit of the particle is purely circular (e = 0) and in Cartesian coordinates. In my program you can see that I output a data file so that I can plot the x-y points using gnuplot. Currently I am only plotting the numerical results, not doing any fancy visualization *yet*. I still need to learn OpenGL! Perhaps Josh can give us all a lecture on OpenGL? ;-)
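For anyone reproducing the plot, the gnuplot side is just a couple of commands (the data filename here is made up, since the one in the code above was lost in posting):

```gnuplot
set size square                         # keep a circular orbit looking circular
plot 'orbit.dat' using 1:2 with lines   # column 1 = x, column 2 = y
```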

So using gnuplot, this is my result...

Looks good so far. Moreover, I let the particle move approximately 3 loops, and the time step was 0.1 s (instead of the 0.02 s I was assigned). The reason I used 0.1 s is that I want you all to see the slight deviation in the radius in my numerical results. The integrator I *think* I used in my program is the leapfrog method (or Euler), and you can see that it is quite stable at this time step. When I set the time step to 0.5 s...

This looks interesting, because it appears that the particle is undergoing epicyclic motion (is it really?). In any case, you can see that there is a larger deviation in the radius at this time step. I should also post the energy and angular momentum data to see how well they are conserved.
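On the integrator question: updating the velocity first and then moving the position with the *new* velocity, as the code in this post does, is the semi-implicit (symplectic) Euler method rather than true leapfrog; its energy error stays bounded instead of growing, which is plausibly why the orbit holds together even at large time steps. A quick sketch to measure the drift (the function name is invented):

```cpp
#include <cmath>

// One numerical run of the circular orbit (GM = 1, r = 1) using the
// velocity-first update from the post (semi-implicit / symplectic Euler);
// returns |E(tmax) - E(0)|, the absolute energy drift.
double energyDrift(double dt, double tmax)
{
    double x = 1, y = 0, vx = 0, vy = 1;
    double E0 = 0.5*(vx*vx + vy*vy) - 1.0/std::sqrt(x*x + y*y);
    for (double t = 0; t < tmax; t += dt) {
        double r3 = std::pow(x*x + y*y, 1.5);
        vx += -x/r3 * dt;
        vy += -y/r3 * dt;
        x  += vx * dt;       // position uses the *updated* velocity
        y  += vy * dt;
    }
    double E = 0.5*(vx*vx + vy*vy) - 1.0/std::sqrt(x*x + y*y);
    return std::fabs(E - E0);
}
```

At small dt the drift stays tiny and bounded; at dt = 0.5 the same scheme gives a bounded but visibly wobbling orbit, which may well be the 'epicyclic'-looking pattern above.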

The next logical step is to generate N >> 1000 particles and have each of them undergo the same motion; hence, the N×(2-body) problem. Having just learned about arrays in C++ and the pseudo-random number generator, I was able to apply them to this problem.

#include <iostream>
#include <cmath>
#include <cstdlib>
#include <fstream>

using namespace std;

int main ()
{
    // output file (filename lost in the blog paste; "nbody.dat" is a stand-in)
    ofstream dataFile("nbody.dat");

    const int m = 4;      // number of particles
    const int n = 801;    // number of stored time steps (40/0.05 + 1)
    static double x[n][m], y[n][m], vx[n][m], vy[n][m],
                  ax[n][m], ay[n][m], r[n][m];
    double dt = 0.05;

    for (int i = 0; i < m; i++)    // filling each variable array
    {
        // (the original initialization was eaten by the blog's formatting;
        // here each particle starts on a circular orbit of random radius)
        double R = 1.0 + 0.5*rand()/(double)RAND_MAX;
        x[0][i]  = R;    y[0][i]  = 0;
        vx[0][i] = 0;    vy[0][i] = 1.0/sqrt(R);    // circular speed

        double t = 0;
        for (int j = 1; t < 40; j++)
        {
            r[j-1][i]  = sqrt(x[j-1][i]*x[j-1][i] + y[j-1][i]*y[j-1][i]);
            ax[j-1][i] = -x[j-1][i]/(r[j-1][i]*r[j-1][i]*r[j-1][i]);
            ay[j-1][i] = -y[j-1][i]/(r[j-1][i]*r[j-1][i]*r[j-1][i]);

            vx[j][i] = vx[j-1][i] + ax[j-1][i]*dt;
            vy[j][i] = vy[j-1][i] + ay[j-1][i]*dt;
            x[j][i]  = x[j-1][i]  + vx[j-1][i]*dt;
            y[j][i]  = y[j-1][i]  + vy[j-1][i]*dt;

            dataFile << x[j][i] << '\t' << y[j][i] << '\n';
            // Need to find another way to write the data properly

            t += dt;
        }
    }
    return 0;
}


Looks messy and long, doesn't it? Remember, I'm still a novice in C++! Initially I computed 4 particles, and here's the output...

Still looks good. Actually, there were some problems in the code prior to this result, where particles located farther than 1 unit would fly off their orbits. Here's a sample of what I'm referring to...

It does look kind of funny but interesting at the same time. Notice that each particle seems to have a different center for its orbit, which puzzled me for a while. I consulted Prof. Artymowicz, and he suggested it might be the initial conditions, so I went and checked, and indeed it was! My code for the N×(2-body) program was based on the 2-body program I wrote earlier, and I had forgotten to change the initial conditions. Silly mistake!

So then I tried to compute 100 particles to see how it looks, but apparently there is some bug in Xcode (or in my code). Oh yes, I should mention that I am using Xcode as my IDE to compile my programs, since I'm mainly using a MacBook Pro, which has two NVIDIA cards suitable for CUDA (though perhaps not for really massive computations). I believe this bug has to do with the memory storage of the arrays, so I may need to look at pointers in C++ or something. Maybe one of my colleagues could improve my program to make it more efficient. This is where I start to appreciate the idea of CUDA more, since it would allow us to do fast, efficient computations for modeling in astrophysics.
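On the memory issue: large fixed-size 2-D arrays declared inside main() live on the stack, and something like double x[801][100] (times seven variables) can overflow it. One fix that avoids raw pointers entirely is std::vector, which allocates on the heap; a sketch with a made-up helper name:

```cpp
#include <vector>

// makeGrid is a hypothetical helper: builds a steps-by-np table on the
// heap, zero-initialized, so even 801 x 100 per variable is no problem.
std::vector<std::vector<double>> makeGrid(int steps, int np)
{
    return std::vector<std::vector<double>>(steps,
                                            std::vector<double>(np, 0.0));
}
```

Each of x, y, vx, and so on would then be declared as auto x = makeGrid(801, 100); and indexed exactly as before, x[j][i].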

So this is where I currently am with the programming, and now I'm learning OpenGL on my own so that I can do some 'real' animated, visual work with my program. Hopefully it won't take too long to learn.



Monday, May 17, 2010

How to install openGL to get graphics

You will want something called "freeglut"; this tutorial is for Windows:
- Download freeglut 2.6.0 and glut 3.7 packages.
- Extract the files to a temp folder.
- If you are using Visual Studio then locate each of the below files in the extracts and,
  • Put freeglut.h in: ‘C:\Program Files\Microsoft Visual Studio 9.0\VC\include\GL\’ (note: you'll have to create the GL folder)
  • Put freeglut_ext.h in: ‘C:\Program Files\Microsoft Visual Studio 9.0\VC\include\GL\’
  • Put freeglut_std.h in: ‘C:\Program Files\Microsoft Visual Studio 9.0\VC\include\GL\’
  • Put freeglut.lib in: ‘C:\Program Files\Microsoft Visual Studio 9.0\VC\lib\’
  • Put freeglut.dll in: ‘C:\WINDOWS\system32\’ (SysWOW64 folder for vista or win7 64 bit users)
  • Put glut32.dll in: ‘C:\WINDOWS\system32\’ (SysWOW64 folder for vista or win7 64 bit users)
  • Put glut32.lib in: ‘C:\Program Files\Microsoft Visual Studio 9.0\VC\lib\’
  • Put glut.h in: ‘C:\Program Files\Microsoft Visual Studio 9.0\VC\include\GL\’
- Also in Visual Studio you will have to go to project properties, Linker > Input > Additional Dependencies, and add "opengl32.lib glu32.lib glut32.lib" without quotes. You will also need to add the lib and include directories you just set up. Let me know if you have difficulty.

The easiest way if you don't use visual studio is to just put the .h and .lib files in the same folder as your source code and put the .dlls in the same folder as compiled .exe file. Then at the top of your code just have #include "freeglut.h".
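Once the files are in place, a minimal freeglut program is a good smoke test: it just opens a window and draws a single point (window title and sizes here are arbitrary).

```cpp
#include <GL/glut.h>   // or "freeglut.h" if you copied it next to the source

void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT);
    glBegin(GL_POINTS);
    glVertex2f(0.5f, 0.0f);        // one "particle" at (0.5, 0)
    glEnd();
    glutSwapBuffers();
}

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
    glutInitWindowSize(600, 600);
    glutCreateWindow("particles");
    glutDisplayFunc(display);
    glutMainLoop();                // hands control to GLUT's event loop
    return 0;
}
```

If this compiles, links, and shows a black window with a dot, the libraries are installed correctly.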

Here's some stuff you can make. Posted above is a frame from my n-body x,y,z system simulation. Each quadrant of the graphic shows a different angle.


Friday, May 14, 2010

Our Intents

We would like to build several supercomputers, link them with InfiniBand, and run several simulations, such as protoplanetary disk formation with optical depth tau > 1 (leading to dust instabilities). To do this, we will code our simulations in CUDA and run them in parallel across many threads.

We would also like to build a supercomputer at the University of Toronto Scarborough Campus out of GPU clusters, with roughly half the performance of "SciNet", the supercomputer at the St. George Campus, at a fraction of the energy cost.

The Summer 2010 research team consists of Pawel Artymowicz, Jeffrey Fung, Joshua Albert, Vince Vetro, Anthony Leung, James Joyce, and Robert Labancz. Each person has a different role on the team.

In the picture are Robert, Vince, Josh, and Pawel, having just finished assembling the first supercomputer.