- (Computer Science Solution 1) Given that nowadays almost every machine (or even a mobile phone) is a multicore and there is no dependency between simulation runs, let's run them in parallel.
- (Computer Science Solution 2) Profile the code, identify the bottleneck, figure out asymptotic time complexity, and then redesign the steps in the algorithm accordingly.
- (Stochastic Process Solution) Apply variance reduction techniques so that I can achieve the same level of simulation accuracy with less runs.
- (Numerical Solution?) pre-compute as much as possible, like (r - 0.5*(vol**2.0))*T or 2.0*Math.PI...etc, so that I don't have to repeatedly compute the same constants.
- (hmm... Finance Solution perhaps) Simply use money to resolve this issue, i.e., buy another machine with faster processors.
The results are as follows:
version
|
option price
|
CPU time (ms)
|
absolute time (ms)
|
List
|
10.118465
|
5,335
|
5,467
|
Array
|
10.118465
|
1,606
|
1,639
|
Sequence
|
10.118465
|
7,612
|
7,631
|
All the 3 option prices are the same because they use the same seed, 1, to generate random numbers. As you can see, the performance improvement is obviously worth the effort of "Find and Replace". Now the new "Array" version can finish the same computations within less than 1/3 of the time needed by the original "List" version.
The fact that Array is the fastest and Sequence is the slowest is not surprising to me. And I don't bother to find out why Sequence took so much time relatively at this moment, because the way I implemented the code doesn't really need its main strengths: lazy evaluation and ability to model infinite series. However, what I found intriguing is why Array is so fast relative to List.
What I learnt from some F# introductory books is that Array is efficient for random access, which is also where List tends to perform horribly given its singly-linked nature. Though, in our Monte Carlo computations, we walk through a list or an array in a sequential manner only. As such, I don't think it's the efficiency of random access to cause the performance difference I'm looking at.
After running a few simple tests on List and Array using F# Interactive as described below, it seems to me that in our case what really makes the difference is creation of a data structure rather than access to individual elements in the data structure. As you can see from below, it took more than 2 seconds to initialize a list of 10,000,000 floats, while the same action with an array took less than 0.1 seconds. Therefore, in terms of creation or allocation, arrays perform much faster than lists. In terms of iterating over the 10,000,000 floats, although the result below shows that the list was even slightly faster than the array, the difference is immaterial as far as our case is concerned.
Please note that by this post I am not saying that arrays are always superior than lists. At least we should remember that lists have execellent constant time complexity in stack-style operations, which I don't use in my Monte Carlo code.
Based on the finding that creation of a new list can be costly, I will be updating the code of the "List" version accordingly in next post.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//// Here are the simple tests on Array and List using F# Interactive
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Firstly, I turn on the timing function:
> #time;;
--> Timing now on
Secondly, I create a list of 10,000,000 floats and sum up using the fold function:
> let X = List.init 10000000 (fun i -> float i);;
Real: 00:00:02.118, CPU: 00:00:02.246, GC gen0: 38, gen1: 20, gen2: 2
val X : float list =
[0.0; 1.0; 2.0; 3.0; 4.0; 5.0; 6.0; 7.0; 8.0; 9.0; 10.0; 11.0; 12.0; 13.0;
14.0; 15.0; 16.0; 17.0; 18.0; 19.0; 20.0; 21.0; 22.0; 23.0; 24.0; 25.0;
26.0; 27.0; 28.0; 29.0; 30.0; 31.0; 32.0; 33.0; 34.0; 35.0; 36.0; 37.0;
38.0; 39.0; 40.0; 41.0; 42.0; 43.0; 44.0; 45.0; 46.0; 47.0; 48.0; 49.0;
50.0; 51.0; 52.0; 53.0; 54.0; 55.0; 56.0; 57.0; 58.0; 59.0; 60.0; 61.0;
62.0; 63.0; 64.0; 65.0; 66.0; 67.0; 68.0; 69.0; 70.0; 71.0; 72.0; 73.0;
74.0; 75.0; 76.0; 77.0; 78.0; 79.0; 80.0; 81.0; 82.0; 83.0; 84.0; 85.0;
86.0; 87.0; 88.0; 89.0; 90.0; 91.0; 92.0; 93.0; 94.0; 95.0; 96.0; 97.0;
98.0; 99.0; ...]
> let sum_X = List.fold (fun s x -> s + x) 0.0 X;;
Real: 00:00:00.062, CPU: 00:00:00.062, GC gen0: 0, gen1: 0, gen2: 0
val sum_X : float = 4.9999995e+13
Finally, I do the same things with Array:
> let Y = Array.init 10000000 (fun i -> float i);;
Real: 00:00:00.076, CPU: 00:00:00.078, GC gen0: 0, gen1: 0, gen2: 0
val Y : float [] =
[|0.0; 1.0; 2.0; 3.0; 4.0; 5.0; 6.0; 7.0; 8.0; 9.0; 10.0; 11.0; 12.0; 13.0;
14.0; 15.0; 16.0; 17.0; 18.0; 19.0; 20.0; 21.0; 22.0; 23.0; 24.0; 25.0;
26.0; 27.0; 28.0; 29.0; 30.0; 31.0; 32.0; 33.0; 34.0; 35.0; 36.0; 37.0;
38.0; 39.0; 40.0; 41.0; 42.0; 43.0; 44.0; 45.0; 46.0; 47.0; 48.0; 49.0;
50.0; 51.0; 52.0; 53.0; 54.0; 55.0; 56.0; 57.0; 58.0; 59.0; 60.0; 61.0;
62.0; 63.0; 64.0; 65.0; 66.0; 67.0; 68.0; 69.0; 70.0; 71.0; 72.0; 73.0;
74.0; 75.0; 76.0; 77.0; 78.0; 79.0; 80.0; 81.0; 82.0; 83.0; 84.0; 85.0;
86.0; 87.0; 88.0; 89.0; 90.0; 91.0; 92.0; 93.0; 94.0; 95.0; 96.0; 97.0;
98.0; 99.0; ...|]
> let sum_Y = Array.fold (fun s x -> s + x) 0.0 Y;;
Real: 00:00:00.107, CPU: 00:00:00.109, GC gen0: 0, gen1: 0, gen2: 0
val sum_Y : float = 4.9999995e+13
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//// The Array version of Monte Calro Simulation
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
open System
open System.Diagnostics
let N = 10000000
let rnd = new Random(1)
let randomPairs = Array.init N (fun i -> (rnd.NextDouble(), rnd.NextDouble()))
let box_muller_transform (u1,u2) = sqrt(-2.0*log(u1)) * cos(2.0*Math.PI*u2)
let randomNormals = randomPairs |> Array.map box_muller_transform
let call_payout K S = max (S-K) 0.0
let European_Option_Price (payout:float->float) S0 r vol T =
let S_T x = S0 * exp((r - 0.5*(vol**2.0))*T + vol*sqrt(T)*x)
randomNormals |> Array.map S_T |> Array.map payout |> Array.average
|> (*) (exp(-r*T))
let S0 = 50.0
let K = 40.0
let r = 0.01
let vol = 0.2
let T = 0.25
let price() =
let price = European_Option_Price (call_payout K) S0 r vol T
printfn "option_price = %f" price
let time f =
let proc = Process.GetCurrentProcess()
let cpu_time_stamp = proc.TotalProcessorTime
let timer = new Stopwatch()
timer.Start()
try
f()
finally
let cpu_time = (proc.TotalProcessorTime-cpu_time_stamp).TotalMilliseconds
printfn "CPU time = %dms" (int64 cpu_time)
printfn "Absolute time = %dms" timer.ElapsedMilliseconds
time price
No mystery there! I've been systematically investigating F# data structure performance. See my latest article in the series
ReplyDeleteThanks for the info. I will definitely take a look.
Delete