Friday, September 7, 2012

F#: K-S test on final prices of GBM paths

In the previous post, I mentioned that I chose System.Random, because it’s faster than any other random number generators available in Math.NET Numerics. And I also said that I have a minor concern about System.Random, because someone found a problem in the implementation of System.Random and raised this issue to the .NET team in early 2011. And the team admits it’s a problem. However, this issue is still there with .NET Framework 4.5. And it seems to me that the .NET team is not planning to take any immediate action.

As I know little about the algorithm of System.Random, I’m not sure if it’s worth the efforts to pick up the details and quantify the impact. As such, I decided to take an easy way out, i.e., close my eyes to this issue  do Kolmogrov-Smirnov test (KS test) on only the final prices distribution. If the simulated distribution is statistically indistinguishable from the true, theoretic distribution in terms of shape, I will be fine with System.Random.  At the end of the day, I think what really matters to me is whether System.Random can ultimately give me the correct shape of the final prices distribution I want.

Basically the F# code is pretty much the same as in the previous post except the following:

• I add a run_stat_test function to run the statistical test and generate a Q-Q plot in Matlab.

• I change the Monte Carlo parameter N to 1 so that I simulate only one data point, i.e., final price, for each simulated path.

// Monte Carlo parameters
let T = 1.0             // time horizon in years
let M = 100000          // number of simulations
let N = 1               // number of time intervals

You can see that I take the logarithms of all final prices and test them against a normal distribution with mean = (log S0)+(r-0.5*sigma*sigma)*T and standard deviation = sigma*sqrt(T) using KS test. And the output from the test is as follows:

h = false, p = 0.7243765247

And here is the Q-Q plot:

p-value is about 0.72, which means it’s quite likely that the simulated prices were indeed sampled from the normal distribution, and h=false means that we can’t reject the null hypothesis. Moreover, the Q-Q plot does not show any obvious outliers off the diagonal line. Therefore, it’s ok for me to continue to use System.Random.

If any expert happens to know about the issue of System.Random and is willing to shed some light on it, it will be appreciated.

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

open MathNet.Numerics.Distributions
open MathNet.Numerics.Statistics

// Monte Carlo parameters
let T = 1.0             // time horizon in years
let M = 100000          // number of simulations
let N = 1               // number of time intervals
let p = 0.99            // confidence level

// underlying dynamics
let S0 = 50.0         // spot price
let sigma = 0.2       // annualized vol
let r = 0.01          // risk-free rate

// risk factor dW
let rnd = new System.Random(1)
let get_dW dt =
let dW = Normal.WithMeanVariance(0.0, dt)
dW.RandomSource <- rnd
(fun () -> dW.Sample())

// GBM path generation
let generate_GBM_paths S0 r sigma T M N =
let dt = T / (float N)
let drift = (r - 0.5 * (sigma**2.0)) * dt
let dW = get_dW dt
Array.init M (fun _ ->
Array.init N (fun _ -> drift + sigma * dW()) |>
Array.scan (+) 0.0 |>
Array.map (fun x -> S0 * exp(x)) )

let run_stat_test (prices:float array) =
let m = (log S0)+(r-0.5*sigma*sigma)*T
let s = sigma*sqrt(T)
let log_prices = Array.map log prices
let matlab = new MLApp.MLAppClass()
matlab.PutWorkspaceData("log_prices", "base", log_prices)
matlab.PutWorkspaceData("m", "base", m)
matlab.PutWorkspaceData("s", "base", s)
matlab.Execute("pd = ProbDistUnivParam('normal',[m s])") |> ignore
matlab.Execute("[h,p] = kstest(log_prices,pd,0.05,0)") |> ignore
matlab.Execute("qqplot(log_prices,pd)") |> ignore
let h = matlab.GetVariable("h","base")
let p = matlab.GetVariable("p","base")
(h,p)

// main
let main () =
let final_prices = generate_GBM_paths S0 r sigma T M N |>
Array.map (fun p -> p.[p.Length - 1])
let forward = final_prices.Mean()
let std_dev = final_prices.StandardDeviation()
let std_error = std_dev / sqrt(float M)
let std_normal = Normal.WithMeanVariance(0.0,1.0)
let radius = std_normal.InverseCumulativeDistribution(0.5*(1.0+p))
* std_error
let conf_int = (forward - radius, forward + radius)
let h,p = run_stat_test final_prices
printfn "h = %A, p = %A" h p
(forward, std_dev, std_error, conf_int)

// run simulation
main()

// calculate theoretical mean and deviation
let true_forward = S0 * exp(r*T)
let v = sigma*sigma*T
let true_std_dev = sqrt((exp(v)-1.0)*exp(2.0*(log(S0)+r*T-0.5*v*T)+v))

Sunday, September 2, 2012

F#: Simulate entire GBM path

In all the Monte Carlo simulation codes I have done in previous posts, I simulate only the underlying prices at expiry, because European Calls and Puts are dependent on only the final price at expiry and we knew that the final prices follow a lognormal distribution, for which there's a nice analytic expression and hence we can easily simulate. However, if we move further out of the safe zone of European options, it's quite often that we need to simulate entire GBM (Geometric Brownian Motion) paths. Before we really get into dealing with path-dependent options, let's first take a look at my implementation for generating GBM paths based on the Euler method.

In this post, I started using a open-source numerical library, Math.NET Numerics. Don Syme wrote a blog on how to install and use it. Basically I use it in my code for two purposes:

• using the Normal class to draw normal random numbers. Internally the Normal class also uses the Box Muller transform to generate samples, but it uses the polar form of the transform, which is faster than the basic form I had been using in previous posts. Another good thing about the polar form is that it does not suffer the infinity issue I mentioned in a previous post.
• using some handy statistical functions, like mean, standard deviation, and inverse CDF, because I like to compute standard error and confidence interval.
Before we look into the code, here are the parameters I use:

// Monte Carlo parameters
let T = 1.0             // time horizon in years
let M = 100000          // number of simulations
let N = 252             // number of time intervals
let p = 0.99            // confidence level

// underlying dynamics
let S0 = 50.0         // spot price
let sigma = 0.2       // annualized vol
let r = 0.01          // risk-free rate

Below is the code snippet to construct a function which generates Brownian Motion increments. You can see that I use System.Random as the random source, because it's a few times faster than all other uniform random sources provided by Math.NET Numerics. (I actually have a minor concern about System.Random. I might talk about it later in another post.)

// risk factor dW
let rnd = new System.Random(1)
let get_dW dt =
let dW = Normal.WithMeanVariance(0.0, dt)
dW.RandomSource <- rnd
(fun () -> dW.Sample())

The below function, generate_GBM_paths, is the one that actually simulate all the entire paths. Here I use an array of arrays to keep all the paths simulated. Storing every complete simulated path is memory consuming, but it's convenient for me to do post-simulation analysis. I think this code snippet is a good example to show that F#'s built-in array processing functions are really expressive and handy, compared to other major general-purpose languages.

// GBM path generation based on the Euler method
let generate_GBM_paths S0 r sigma T M N =
let dt = T / (float N)
let drift = (r - 0.5 * (sigma**2.0)) * dt
let dW = get_dW dt
Array.init M (fun _ ->
Array.init N (fun _ -> drift + sigma * dW()) |>
Array.scan (+) 0.0 |>
Array.map (fun x -> S0 * exp(x)) )

As you can see from the main function of the code, after generation of GBM paths, I extracted final prices out of all the paths and passed them to:

• the statistical functions provided by Math.NET Numerics to compute
• mean, i.e., the forward price of the underlying
• standard deviation of the simulated final price distribution
• standard error of the final price simulation
• confidence interval
• Matlab to plot KDE (Kernel Density Estimation) to see if the simulated final price distribution looks like a lognormal distribution. I recently just learnt KDE from the book, "Data Analysis with Open Source Tools," written by Philipp K. Janert. In case you haven't heard about KDE, it's something like an advanced version of histograms. In a KDE, the graph will be smoothed by a density function and you don't need to find out what bin size you should use. If you don't like to run this KDE part, you could simply comment it out.

Finally, here are the results from the main function:

Real: 00:00:07.541, CPU: 00:00:06.817, GC gen0: 201, gen1: 135, gen2: 2
val it : float * float * float * (float * float) =
(50.47657684, 10.21848436, 0.03231368481, (50.3933423, 50.55981138))

The entries in the above tuple are mean, standard deviation, standard error, and confidence interval. I also calculated theoretical mean and standard deviation as follows:

val true_forward : float = 50.50250835
val true_std_dev : float = 10.20235347

And below is the KDE plot:

///////////////////////////////////////////////////////////////////////////////////////////////////////

open MathNet.Numerics.Distributions
open MathNet.Numerics.Statistics

// Monte Carlo parameters
let T = 1.0             // time horizon in years
let M = 100000          // number of simulations
let N = 252             // number of time intervals
let p = 0.99            // confidence level

// underlying dynamics
let S0 = 50.0         // spot price
let sigma = 0.2       // annualized vol
let r = 0.01          // risk-free rate

// risk factor dW
let rnd = new System.Random(1)
let get_dW dt =
let dW = Normal.WithMeanVariance(0.0, dt)
dW.RandomSource <- rnd
(fun () -> dW.Sample())

// GBM path generation
let generate_GBM_paths S0 r sigma T M N =
let dt = T / (float N)
let drift = (r - 0.5 * (sigma**2.0)) * dt
let dW = get_dW dt
Array.init M (fun _ ->
Array.init N (fun _ -> drift + sigma * dW()) |>
Array.scan (+) 0.0 |>
Array.map (fun x -> S0 * exp(x)) )

// plot the simulated distribution using KDE in Matlab
let plot_KDE (prices:float array) =
let matlab = new MLApp.MLAppClass()
matlab.PutWorkspaceData("x", "base", prices)
matlab.Execute("[f,xi] = ksdensity(x);") |> ignore
matlab.Execute("plot(xi,f);") |> ignore
()

// main
let main () =
let final_prices = generate_GBM_paths S0 r sigma T M N |>
Array.map (fun p -> p.[p.Length - 1])
let forward = final_prices.Mean()
let std_dev = final_prices.StandardDeviation()
let std_error = std_dev / sqrt(float M)
let std_normal = Normal.WithMeanVariance(0.0,1.0)
let radius = std_normal.InverseCumulativeDistribution(0.5*(1.0+p))
* std_error
let conf_int = (forward - radius, forward + radius)
plot_KDE final_prices
(forward, std_dev, std_error, conf_int)

// run simulation
main()

// calculate theoretical mean and deviation
let true_forward = S0 * exp(r*T)
let v = sigma*sigma*T
let true_std_dev = sqrt((exp(v)-1.0)*exp(2.0*(log(S0)+r*T-0.5*v*T)+v))

Friday, August 24, 2012

F#: Running Monte Carlo Simulation on GPU

Multi-threading on CPU is a bit old fashioned. Let's run Monte Carlo Simulation with style on GPU. Here I'm using the Accelerator API from Microsoft Research. Tomas Petricek has a nice and concise introduction about the API. In fact, this post mainly relies on the introduction. If you never use the API, it's a good idea to take a look at the article to pick up some basic concepts on how the API works.

Here are some points I like to mention about the sample code below:
• Currently Accelerator does not support random number generation. Since I want to time only the execution done on GPU, I put the code for normal random number generation outside the main pricing function (i.e., European_Option_Price), as what I did in my previous posts on Monte Carlo. Inside the the pricing function, I try to put every computation on GPU.
• As I still use the Box Muller transform, to avoid the issue of generating a normal random number of infinity, I apply the simple transform (i.e., Y = 1-U[0,1)) mentioned in the blog post. Well, if you want to try, you can choose not to apply the transform and set the seed to 388, and then see what your GPU will return. I tried that for fun.
• Why I set grid_size = 3163 is because the number of simulations will be the square of grid_size, which is slightly over the simulation size (10 millions) I used in my previous posts on Monte Carlo. Easier for me to compare.
• Obviously, using the API reduces F#'s expressiveness, but with some custom syntactic sugars, like overloaded operators, I believe that it can be mitigated. However, I don't bother to do that at this moment.
After running the code, the output is as follows:

option_price = 10.117122
CPU time = 234ms
Absolute time = 433ms

Wow, in terms of absolute time, this GPU version is almost 4 times faster than my previous champion, the Array version. Though we fix the seed to 1, the reason why the option price given by the GPU version is different from the price by the Array version is mainly because of the simple transform I mentioned above. If you don't apply the transform, you will see both versions produce "almost" the same option price (some very minor difference is still there, due to slightly different number of simulations).

My laptop is 4 years old and equiped with only a lousy Intel 965 Express chipset, because I seldom play PC games and hence I used to look at mainly CPU speed and memory when purchasing a laptop. Now, given this impressive performance result, next time when I buy a new laptop, I will definitely choose one with a high-end NVIDIA graphics chipset.

Why NVIDIA? because I like to try the following:
• Actually the latest release of Accelerator contains a CUDA target.
• NVIDIA claims their cuRAND library can generate random numbers at light speed.

////////////////////////////////////////////////////////////
open System
open System.Diagnostics
open Microsoft.ParallelArrays

type PA = Microsoft.ParallelArrays.ParallelArrays
type FPA = Microsoft.ParallelArrays.FloatParallelArray

let grid_size = 3163
let shape = [| grid_size; grid_size |]
let dxTarget = new DX9Target()

let rnd = new Random(1)
let next_double_pair () =
(1.0 - rnd.NextDouble(), 1.0 - rnd.NextDouble())
//let rnd = new Random(388)
//let next_double_pair () = (rnd.NextDouble(), rnd.NextDouble())
let box_muller_transform (u1,u2) =
sqrt(-2.0*log(u1)) * cos(2.0*Math.PI*u2)
let generator _ _ =
next_double_pair() |> box_muller_transform |> (float32)
let random_normals = Array2D.init grid_size grid_size generator
let European_Option_Price K S0 r vol T =
let FPA_zero = new FPA(float32 0.0, shape)
let FPA_N = new FPA(float32 (grid_size * grid_size), [|1|])
let FPA_discount = new FPA(float32 (exp(-r*T)), [|1|])
let FPA_K = new FPA(float32 K,shape)
let FPA_S0 = new FPA(float32 S0,shape)
let FPA_mean =
new FPA(float32 ((r - 0.5*(vol**2.0))*T), shape)
let FPA_sigma = new FPA(float32 (vol*sqrt(T)), shape)
let FPA_exp = new FPA(float32 (exp 1.0), shape)
let FPA_normals = new FPA(random_normals)
let FPA_S_T =
FPA_S0 *
PA.Pow(FPA_exp,
(FPA_mean + FPA_sigma * FPA_normals))
let FPA_payouts = PA.Max(FPA_S_T - FPA_K, FPA_zero)
let FPA_price = FPA_discount * PA.Sum(FPA_payouts) / FPA_N
let result = dxTarget.ToArray1D(FPA_price)
(float) result.[0]

let S0 = 50.0
let K = 40.0
let r = 0.01
let vol = 0.2
let T = 0.25
let price() =
let price = European_Option_Price K S0 r vol T
printfn "option_price = %f" price

let time f =
let proc = Process.GetCurrentProcess()
let cpu_time_stamp = proc.TotalProcessorTime
let timer = new Stopwatch()
timer.Start()
try
f()
finally
let cpu_time = (proc.TotalProcessorTime -
cpu_time_stamp).TotalMilliseconds
printfn "CPU time = %dms" (int64 cpu_time)
printfn "Absolute time = %dms" timer.ElapsedMilliseconds

time price

Monday, August 20, 2012

F#: Infinity and Infinitesimal-ness

These days many programming languages have the notion of infinity defined explicitly with their float type. F# (or .Net actually) is no exception. In contrast to infinity, Double.Epsilon is the smallest positive float in F#, i.e., anything between Double.Epsilon and 0.0 is viewed as 0.0 in F#. Let's use F# Interactive to do some simple empirical studies on both infinity and Double.Epsilon in F#.

First of all, F# has a built-in float value of infinity, which is equivalent to System.Double.PositiveInfinity in .Net.

> infinity;;
val it : float = infinity
> System.Double.PositiveInfinity;;
val it : float = infinity

Well, negative infinity is also there.

> -infinity;;
val it : float = -infinity
> System.Double.NegativeInfinity;;
val it : float = -infinity

Then let's see how small Double.Epsilon is:

> System.Double.Epsilon;;
val it : float = 4.940656458e-324

Not sure how you feel about this, but it looks "scary" small to me. By the way, we also have Double.MaxValue and Double.MinValue.

> System.Double.MaxValue;;
val it : float = 1.797693135e+308
> infinity > System.Double.MaxValue;;
val it : bool = true
> System.Double.MinValue;;
val it : float = -1.797693135e+308
> System.Double.MinValue > -infinity;;
val it : bool = true

In financial mathematics, log(x) and exp(x) are used excessively, so let's take a look at how these 2 functions operate with infinity and Double.Epsilon.

> exp(infinity);;
val it : float = infinity
> log(infinity);;
val it : float = infinity

> log(0.0);;
val it : float = -infinity

We know that log(x) goes to negative infinity when x goes to 0.0. What if we evaluate log(x) at x = Double.Epsilon? Can F# properly handle this seemingly extreme case?

> log(System.Double.Epsilon);;
val it : float = -744.4400719
> exp(log(System.Double.Epsilon));;
val it : float = 4.940656458e-324
> exp(log(System.Double.Epsilon)) = System.Double.Epsilon;;
val it : bool = true

OK, log(Double.Epsilon) is only around -744.44, not too bad relative to the infinitesimal-ness of Double.Epsilon, and if you substitute this value into exp(x) you basically recover Double.Epsilon as shown above. If you run similar tests in Matlab, Matlab will return you pretty much the same results.

The above example actually implies a few things about the resolution of log(x) and exp(x) in F#:
• Other than negative infinity, you won't see that log(x) could ever return any numerical value lower than -744.4400719, i.e., log(Double.Epsilon)
• For any x smaller than -744.4400719, i.e., log(Double.Epsilon), exp(x) is evaluated as 0.0.
Besides exp(x) and log(x), you might also try how well F# handles sin(x) and cos(x) when x is extremely big or small. You might see something unexpected.

Thursday, August 16, 2012

F#: How could the Monte Calro pricer return option_price = infinity?

If you leave the random seed blank in the Monte Carlo code and let System.Random pick up a time-dependent seed for you (or you deliberately set the seed to 388), you might see a funny pricing result like this:

option_price = Infinity
CPU time = 2636ms
Absolute time = 2435ms

Why you see this funny "infinity" price is because:
• System.Random.NextDouble() could generate 0.0, though the probability is extremely low. As such, it's possible to generate a tuple of (u1,u2) where u1 = 0.0 and (u2 < 0.25 or u2 > 0.75). This kind of (u1,u2) drives the Box Muller transform to infinity. Then, not surprisingly, we can see that a single sample of infinite payout screws up the Law of Large Numbers, as the integrability condition is not satisfied.
• The notion of infinity is defined in F# (or actually via Double.PositiveInfinity and Double.NegativeInfinity in .Net). F# knows how to handle log(0.0) and exp(infinity) so that F# elegantly tells you "Sir, the option is literally priceless." rather than returns you a garbage float number. Nevertheless, elegance without correctness means nothing, as infinity is a wrong answer to our option pricing problem .
Well, there's nothing wrong for System.Random to generate a value of 0.0, because this is simply how it's designed. The NextDouble() method is meant to approximate U[0,1), but the possibility of getting a 0.0 value is not desirable here. One workaround is to take a simple transform: Y = 1.0 - U[0,1). Then Y follows U(0,1]. So the code snippet we use to draw random numbers can be modified as follows:

let N = 10000000
let rnd = new Random(388)
let next_double () = 1.0 - rnd.NextDouble()
let randomPairs = Array.init N (fun i -> (next_double(), next_double()))
let box_muller_transform (u1,u2) = sqrt(-2.0*log(u1)) * cos(2.0*Math.PI*u2)
let randomNormals = randomPairs |> Array.map box_muller_transform

You might argue that mathematically P(U=0) is zero in either U(0,1) or U[0,1) or U(0,1] or U[0,1]. When it comes to implementation of a pseudo-random number generator, at least in the case of System.Random, U[0,1) usually implies P(U=0)>0.

Perhaps you like to know how I find out the seed of 388. For sure, I don't pick up this seed out of thin air. Here is the F# code I use to search for a seed that leads to infinity in our case:

///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
open System
open System.Diagnostics

let search_for_0 seed_min seed_max draw_max =
let found u1 u2 = if (u1 = 0.0 && (u2 < 0.25 || u2 > 0.75)) then true else false

let rec seed_loop seed =
let rnd = new Random(seed)

let rec draw_loop draw =
let (u1,u2) = (rnd.NextDouble(),rnd.NextDouble())
if found u1 u2 || draw = draw_max then (u1,u2, draw)
else draw_loop (draw + 1)

let (u1,u2, draw) = draw_loop 1
if found u1 u2 || seed = seed_max then (u1,u2, draw, seed)
else seed_loop (seed + 1)

let (u1,u2, draw, seed) = seed_loop seed_min
printfn "seed_min = %d, seed_max = %d, draw_max = %d" seed_min seed_max draw_max
if found u1 u2 then

printfn "found u1 = 0.0 and u2 = %f at draw %d of seed %d" u2 draw seed
else printfn "couldn't find 0.0 within the range specified."

let time () =
let proc = Process.GetCurrentProcess()
let cpu_time_stamp = proc.TotalProcessorTime
let timer = new Stopwatch()
timer.Start()
try
search_for_0 1 1000 10000000
finally
let cpu_time = (proc.TotalProcessorTime-cpu_time_stamp).TotalMilliseconds
printfn "CPU time = %dms" (int64 cpu_time)
printfn "Absolute time = %dms" timer.ElapsedMilliseconds

time()

Wednesday, August 15, 2012

F#: Monte Carlo Simulation using Tail Recursion

Previously we used lists or arrays to implement Monte Carlo simulation, where in the beginning of the code we draw a fixed number of random numbers and keep all of them in memory throughout the execution of the code. Needless to say it's very memory-consuming. Indeed, if I increase the number of simulation runs to 100 millions, the code (either the List or Array version) throws  System.OutOfMemoryException on my laptop.

We in fact don't really need to keep all the random numbers in memory, if the option price is the only thing we want to calculate, which is the case in the simple option pricing problem we are handling. At the end of the day, what really matters and hence we have to keep track of is the sum of all simulated option payouts rather than individual simulated payouts. In other words, every time when we generate a random number to simulate an option payout, there's no harm to throw away the random number as long as we have added the simulated payout to a partial sum value we keep track of.

That's where tail recursion can help. We want to use tail recursion with the accumulator pattern to achieve constant space complexity. In the code below,  the sum parameter of the loop function keeps track of the partial sum and plays the role of accumulator.

Well, here is the performance results for 10 million simulation runs on my laptop:
 version option price CPU time (ms) absolute time (ms) List 10.118465 5,335 5,467 Array 10.118465 1,606 1,639 Tail Recursion 10.118465 2,730 2,732

The performance of the tail-recursive version is quite OK relative to List and Array. Anyhow, the following results show the true power of tail recursion when it comes to 100 million runs.

 version option price CPU time (ms) absolute time (ms) List out of memory out of memory out of memory Array out of memory out of memory out of memory Tail Recursion 10.118305 26,800 26,898

Perhaps memory consumption is a more important factor to monitor than CPU time or absolute time. If I manually bring up Windows Task Manager, I can see that the memory use of the tail-recursive code remains steadily at the level of 1.4MB while the memory use of either the List or Array version keeps growing up to 1.8GB until it eventually throws out the out-of-memory exception and crashes.

///////////////////////////////////////////////////////////////////////////////
open System
open System.Diagnostics

let N = 10000000
let rnd = new Random(1)

let box_muller_transform (u1,u2) = sqrt(-2.0*log(u1)) * cos(2.0*Math.PI*u2)

let call_payout K S = max (S-K) 0.0

let European_Option_Price (payout:float->float) S0 r vol T =
let S_T x = S0 * exp((r - 0.5*(vol**2.0))*T + vol*sqrt(T)*x)
let rec loop n sum =
if n = N then sum
else let new_sample = (rnd.NextDouble(),rnd.NextDouble())
|> box_muller_transform|> S_T |> payout
loop (n+1) (sum+new_sample)
(loop 0 0.0) / (float N) |> (*) (exp(-r*T))

let S0 = 50.0
let K = 40.0
let r = 0.01
let vol = 0.2
let T = 0.25
let price() =
let price = European_Option_Price (call_payout K) S0 r vol T
printfn "option_price = %f" price

let time f =
let proc = Process.GetCurrentProcess()
let cpu_time_stamp = proc.TotalProcessorTime
let timer = new Stopwatch()
timer.Start()
try
f()
finally
let cpu_time = (proc.TotalProcessorTime-cpu_time_stamp).TotalMilliseconds
printfn "CPU time = %dms" (int64 cpu_time)
printfn "Absolute time = %dms" timer.ElapsedMilliseconds

time price

Tuesday, August 14, 2012

F#: speed up Monte Carlo Simulation by reducing creation of a new list

In the previous post, I learnt that creation of a new list could be much more costly than that of a new array. As such, I try to identify where in my original "List" version I can reduce creation of a new list.

Here is the original European_Option_Price function in the code:

let European_Option_Price (payout:float->float) S0 r vol T =
let S_T x = S0 * exp((r - 0.5*(vol**2.0))*T + vol*sqrt(T)*x)
randomNormals |> List.map S_T |> List.map payout |> List.average
|> (*) (exp(-r*T))

I modify the third line by using an anonymous function to wrap up the two function calls to S_T and payout:

let European_Option_Price (payout:float->float) S0 r vol T =
let S_T x = S0 * exp((r - 0.5*(vol**2.0))*T + vol*sqrt(T)*x)
randomNormals |> List.map (fun x -> S_T x |> payout) |> List.average
|> (*) (exp(-r*T))

The performance results are as follows:

 version option price CPU time (ms) absolute time (ms) List (original) 10.118465 5,335 5,467 List (modified) 10.118465 3,993 4,035

Obviously, the improvement is significant (>25%), although it's not as big as that of "Find and Replace". Nevertheless, if you like to stick with List in Monte Carlo simulation, perhaps you can consider creating less lists in your F# functions.

By adding the anonymous function, we actually introduce some overhead because now we call S_T and payout indirectly and hence have more states to maintain on the call stack. If you like to avoid the overhead, you could choose to move the payout call to the end of the S_T function rather than using an anonymous function.

let European_Option_Price (payout:float->float) S0 r vol T =
let S_T x = S0 * exp((r - 0.5*(vol**2.0))*T + vol*sqrt(T)*x) |> payout
randomNormals |> List.map S_T |> List.average
|> (*) (exp(-r*T))

You might see a bit improvment by saving the overhead incurred by the use of an anonymous function. However, I also tried that, but the improvement seems to be immaterial in this case so I don't bother to adopt it at this moment.

Monday, August 13, 2012

F#: use "Find and Replace" to speed up Monte Carlo Simulation

In the previous post, I implemented a Monte Carlo pricer using List in F#. If I would like to improve the performance of the pricer, what could I do? One may suggest the following solutions:

• (Computer Science Solution 1) Given that nowadays almost every machine (or even a mobile phone) is a multicore and there is no dependency between simulation runs, let's run them in parallel.
• (Computer Science Solution 2)  Profile the code, identify the bottleneck, figure out asymptotic time complexity, and then redesign the steps in the algorithm accordingly.
• (Stochastic Process Solution) Apply variance reduction techniques so that I can achieve the same level of simulation accuracy with less runs.
• (Numerical Solution?) pre-compute as much as possible, like (r - 0.5*(vol**2.0))*T or 2.0*Math.PI...etc, so that I don't have to repeatedly compute the same constants.
• (hmm... Finance Solution perhaps) Simply use money to resolve this issue, i.e., buy another machine with faster processors.
Well, all solutions above sound promising, but I actually took another even simpler approach. I just use the "Find and Replace" function in the Visual Studio code editor to replace all the "List" occurrences in my code with "Array" (and I tried "Seq" also). And then measure the performances of the 3 different versions: the original "List" version and the two new versions: "Array" and "Seq".

The results are as follows:
 version option price CPU time (ms) absolute time (ms) List 10.118465 5,335 5,467 Array 10.118465 1,606 1,639 Sequence 10.118465 7,612 7,631
( Please note that optimization is turned on while compiling the 3 versions. The code of the Array version is attached at the bottome of this post.)

All the 3 option prices are the same because they use the same seed, 1, to generate random numbers. As you can see, the performance improvement is obviously worth the effort of "Find and Replace". Now the new "Array" version can finish the same computations within less than 1/3 of the time needed by the original "List" version.

The fact that Array is the fastest and Sequence is the slowest is not surprising to me. And I don't bother to find out why Sequence took so much time relatively at this moment, because the way I implemented the code doesn't really need its main strengths: lazy evaluation and ability to model infinite series. However, what I found intriguing is why Array is so fast relative to List.

What I learnt from some F# introductory books is that Array is efficient for random access, which is also where List tends to perform horribly given its singly-linked nature. Though, in our Monte Carlo computations, we walk through a list or an array in a sequential manner only. As such, I don't think it's the efficiency of random access to cause the performance difference I'm looking at.

After running a few simple tests on List and Array using F# Interactive as described below, it seems to me that in our case what really makes the difference is creation of a data structure rather than access to individual elements in the data structure. As you can see from below, it took more than 2 seconds to initialize a list of 10,000,000 floats, while the same action with an array took less than 0.1 seconds. Therefore, in terms of creation or allocation, arrays perform much faster than lists. In terms of iterating over the 10,000,000 floats, although the result below shows that the list was even slightly faster than the array, the difference is immaterial as far as our case is concerned.

Please note that by this post I am not saying that arrays are always superior than lists. At least we should remember that lists have execellent constant time complexity in stack-style operations, which I don't use in my Monte Carlo code.

Based on the finding that creation of a new list can be costly, I will be updating the code of the "List" version accordingly in next post.

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//// Here are the simple tests on Array and List using F# Interactive
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Firstly, I turn on the timing function:
> #time;;
--> Timing now on

Secondly, I create a list of 10,000,000 floats and sum up using the fold function:
> let X = List.init 10000000 (fun i -> float i);;
Real: 00:00:02.118, CPU: 00:00:02.246, GC gen0: 38, gen1: 20, gen2: 2

val X : float list =
[0.0; 1.0; 2.0; 3.0; 4.0; 5.0; 6.0; 7.0; 8.0; 9.0; 10.0; 11.0; 12.0; 13.0;
14.0; 15.0; 16.0; 17.0; 18.0; 19.0; 20.0; 21.0; 22.0; 23.0; 24.0; 25.0;
26.0; 27.0; 28.0; 29.0; 30.0; 31.0; 32.0; 33.0; 34.0; 35.0; 36.0; 37.0;
38.0; 39.0; 40.0; 41.0; 42.0; 43.0; 44.0; 45.0; 46.0; 47.0; 48.0; 49.0;
50.0; 51.0; 52.0; 53.0; 54.0; 55.0; 56.0; 57.0; 58.0; 59.0; 60.0; 61.0;
62.0; 63.0; 64.0; 65.0; 66.0; 67.0; 68.0; 69.0; 70.0; 71.0; 72.0; 73.0;
74.0; 75.0; 76.0; 77.0; 78.0; 79.0; 80.0; 81.0; 82.0; 83.0; 84.0; 85.0;
86.0; 87.0; 88.0; 89.0; 90.0; 91.0; 92.0; 93.0; 94.0; 95.0; 96.0; 97.0;
98.0; 99.0; ...]

> let sum_X = List.fold (fun s x -> s + x) 0.0 X;;
Real: 00:00:00.062, CPU: 00:00:00.062, GC gen0: 0, gen1: 0, gen2: 0

val sum_X : float = 4.9999995e+13

Finally, I do the same things with Array:
> let Y = Array.init 10000000 (fun i -> float i);;
Real: 00:00:00.076, CPU: 00:00:00.078, GC gen0: 0, gen1: 0, gen2: 0

val Y : float [] =
[|0.0; 1.0; 2.0; 3.0; 4.0; 5.0; 6.0; 7.0; 8.0; 9.0; 10.0; 11.0; 12.0; 13.0;
14.0; 15.0; 16.0; 17.0; 18.0; 19.0; 20.0; 21.0; 22.0; 23.0; 24.0; 25.0;
26.0; 27.0; 28.0; 29.0; 30.0; 31.0; 32.0; 33.0; 34.0; 35.0; 36.0; 37.0;
38.0; 39.0; 40.0; 41.0; 42.0; 43.0; 44.0; 45.0; 46.0; 47.0; 48.0; 49.0;
50.0; 51.0; 52.0; 53.0; 54.0; 55.0; 56.0; 57.0; 58.0; 59.0; 60.0; 61.0;
62.0; 63.0; 64.0; 65.0; 66.0; 67.0; 68.0; 69.0; 70.0; 71.0; 72.0; 73.0;
74.0; 75.0; 76.0; 77.0; 78.0; 79.0; 80.0; 81.0; 82.0; 83.0; 84.0; 85.0;
86.0; 87.0; 88.0; 89.0; 90.0; 91.0; 92.0; 93.0; 94.0; 95.0; 96.0; 97.0;
98.0; 99.0; ...|]

> let sum_Y = Array.fold (fun s x -> s + x) 0.0 Y;;
Real: 00:00:00.107, CPU: 00:00:00.109, GC gen0: 0, gen1: 0, gen2: 0

val sum_Y : float = 4.9999995e+13

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//// The Array version of Monte Calro Simulation
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
open System
open System.Diagnostics

let N = 10000000
let rnd = new Random(1)
let randomPairs = Array.init N (fun i -> (rnd.NextDouble(), rnd.NextDouble()))
let box_muller_transform (u1,u2) = sqrt(-2.0*log(u1)) * cos(2.0*Math.PI*u2)
let randomNormals = randomPairs |> Array.map box_muller_transform

let call_payout K S = max (S-K) 0.0

let European_Option_Price (payout:float->float) S0 r vol T =
let S_T x = S0 * exp((r - 0.5*(vol**2.0))*T + vol*sqrt(T)*x)
randomNormals |> Array.map S_T |> Array.map payout |> Array.average
|> (*) (exp(-r*T))

let S0 = 50.0
let K = 40.0
let r = 0.01
let vol = 0.2
let T = 0.25
let price() =
let price = European_Option_Price (call_payout K) S0 r vol T
printfn "option_price = %f" price

let time f =
let proc = Process.GetCurrentProcess()
let cpu_time_stamp = proc.TotalProcessorTime
let timer = new Stopwatch()
timer.Start()
try
f()
finally
let cpu_time = (proc.TotalProcessorTime-cpu_time_stamp).TotalMilliseconds
printfn "CPU time = %dms" (int64 cpu_time)
printfn "Absolute time = %dms" timer.ElapsedMilliseconds

time price

Thursday, August 9, 2012

F#: a Monte Carlo Pricer for European Calls and Puts

Here we implement Monte Carlo simulation for pricing European Calls and Puts in a Black-Scholes world. Although we already have closed-form solutions, I think it's a good, simple but non-trivial example to show how Monte Carlo simulation can be nicely and concisely done in F#, thanks to the following functional programming techniques:

• List processing
• Higher-order function
• Function currying
• Function pipelining

There are a few things to note about the code example:

• Normal random numbers are generated using Box-Muller transform.
• The number of simulation runs is 10,000,000.
• The time function is used to measure CPU time and absolute time.
• The seed to initiate System.Random is deliberately fixed as 1 so that I can get the same series of normal random numbers every time I run the code. If you don't want this, please remember to remove the seed.
• The underlying dynamics and European call option details are as follows:
• S0 (spot price) = 50.0
• vol (annualized volatility) = 20%
• risk free rate = 1%
• strike = 40.0
• expiration (in years) = 0.25
• As a sanity check, the closed-form  theoretical price based on the parameters above is 10.11842.
After running the code on my 4-year-old laptop, I got the following output :

option_price = 10.118465
CPU time = 5709ms
Absolute time = 7115ms

It took about 7 seconds to run 10 million simulations. And the simulated price agrees with the theoretical price up to 4 decimal places.

Finally, here is the code:
///////////////////////////////////////////////////////////////////////////////

open System
open System.Diagnostics

let N = 10000000
let rnd = new Random(1)
let randomPairs = List.init N (fun i -> (rnd.NextDouble(), rnd.NextDouble()))
let box_muller_transform (u1,u2) = sqrt(-2.0*log(u1)) * cos(2.0*Math.PI*u2)
let randomNormals = randomPairs |> List.map box_muller_transform

let call_payout K S = max (S-K) 0.0

let European_Option_Price (payout:float->float) S0 r vol T =
let S_T x = S0 * exp((r - 0.5*(vol**2.0))*T + vol*sqrt(T)*x)
randomNormals |> List.map S_T |> List.map payout |> List.average
|> (*) (exp(-r*T))

let S0 = 50.0
let K = 40.0
let r = 0.01
let vol = 0.2
let T = 0.25
let price() =
let price = European_Option_Price (call_payout K) S0 r vol T
printfn "option_price = %f" price

let time f =
let proc = Process.GetCurrentProcess()
let cpu_time_stamp = proc.TotalProcessorTime
let timer = new Stopwatch()
timer.Start()
try
f()
finally
let cpu_time = (proc.TotalProcessorTime-cpu_time_stamp).TotalMilliseconds
printfn "CPU time = %dms" (int64 cpu_time)
printfn "Absolute time = %dms" timer.ElapsedMilliseconds

time price