## Friday, August 24, 2012

### F#: Running Monte Carlo Simulation on GPU

Multi-threading on CPU is a bit old fashioned. Let's run Monte Carlo Simulation with style on GPU. Here I'm using the Accelerator API from Microsoft Research. Tomas Petricek has a nice and concise introduction about the API. In fact, this post mainly relies on the introduction. If you never use the API, it's a good idea to take a look at the article to pick up some basic concepts on how the API works.

Here are some points I like to mention about the sample code below:
• Currently Accelerator does not support random number generation. Since I want to time only the execution done on GPU, I put the code for normal random number generation outside the main pricing function (i.e., European_Option_Price), as what I did in my previous posts on Monte Carlo. Inside the the pricing function, I try to put every computation on GPU.
• As I still use the Box Muller transform, to avoid the issue of generating a normal random number of infinity, I apply the simple transform (i.e., Y = 1-U[0,1)) mentioned in the blog post. Well, if you want to try, you can choose not to apply the transform and set the seed to 388, and then see what your GPU will return. I tried that for fun.
• Why I set grid_size = 3163 is because the number of simulations will be the square of grid_size, which is slightly over the simulation size (10 millions) I used in my previous posts on Monte Carlo. Easier for me to compare.
• Obviously, using the API reduces F#'s expressiveness, but with some custom syntactic sugars, like overloaded operators, I believe that it can be mitigated. However, I don't bother to do that at this moment.
After running the code, the output is as follows:

option_price = 10.117122
CPU time = 234ms
Absolute time = 433ms

Wow, in terms of absolute time, this GPU version is almost 4 times faster than my previous champion, the Array version. Though we fix the seed to 1, the reason why the option price given by the GPU version is different from the price by the Array version is mainly because of the simple transform I mentioned above. If you don't apply the transform, you will see both versions produce "almost" the same option price (some very minor difference is still there, due to slightly different number of simulations).

My laptop is 4 years old and equiped with only a lousy Intel 965 Express chipset, because I seldom play PC games and hence I used to look at mainly CPU speed and memory when purchasing a laptop. Now, given this impressive performance result, next time when I buy a new laptop, I will definitely choose one with a high-end NVIDIA graphics chipset.

Why NVIDIA? because I like to try the following:
• Actually the latest release of Accelerator contains a CUDA target.
• NVIDIA claims their cuRAND library can generate random numbers at light speed.

////////////////////////////////////////////////////////////
open System
open System.Diagnostics
open Microsoft.ParallelArrays

type PA = Microsoft.ParallelArrays.ParallelArrays
type FPA = Microsoft.ParallelArrays.FloatParallelArray

let grid_size = 3163
let shape = [| grid_size; grid_size |]
let dxTarget = new DX9Target()

let rnd = new Random(1)
let next_double_pair () =
(1.0 - rnd.NextDouble(), 1.0 - rnd.NextDouble())
//let rnd = new Random(388)
//let next_double_pair () = (rnd.NextDouble(), rnd.NextDouble())
let box_muller_transform (u1,u2) =
sqrt(-2.0*log(u1)) * cos(2.0*Math.PI*u2)
let generator _ _ =
next_double_pair() |> box_muller_transform |> (float32)
let random_normals = Array2D.init grid_size grid_size generator
let European_Option_Price K S0 r vol T =
let FPA_zero = new FPA(float32 0.0, shape)
let FPA_N = new FPA(float32 (grid_size * grid_size), [|1|])
let FPA_discount = new FPA(float32 (exp(-r*T)), [|1|])
let FPA_K = new FPA(float32 K,shape)
let FPA_S0 = new FPA(float32 S0,shape)
let FPA_mean =
new FPA(float32 ((r - 0.5*(vol**2.0))*T), shape)
let FPA_sigma = new FPA(float32 (vol*sqrt(T)), shape)
let FPA_exp = new FPA(float32 (exp 1.0), shape)
let FPA_normals = new FPA(random_normals)
let FPA_S_T =
FPA_S0 *
PA.Pow(FPA_exp,
(FPA_mean + FPA_sigma * FPA_normals))
let FPA_payouts = PA.Max(FPA_S_T - FPA_K, FPA_zero)
let FPA_price = FPA_discount * PA.Sum(FPA_payouts) / FPA_N
let result = dxTarget.ToArray1D(FPA_price)
(float) result.[0]

let S0 = 50.0
let K = 40.0
let r = 0.01
let vol = 0.2
let T = 0.25
let price() =
let price = European_Option_Price K S0 r vol T
printfn "option_price = %f" price

let time f =
let proc = Process.GetCurrentProcess()
let cpu_time_stamp = proc.TotalProcessorTime
let timer = new Stopwatch()
timer.Start()
try
f()
finally
let cpu_time = (proc.TotalProcessorTime -
cpu_time_stamp).TotalMilliseconds
printfn "CPU time = %dms" (int64 cpu_time)
printfn "Absolute time = %dms" timer.ElapsedMilliseconds

time price