library(fmin)
#> Loading required package: numDeriv
library(ggplot2)
This package lets you minimize functions with multivariate inputs. It uses a quasi-Newton algorithm described in Nocedal and Wright (2006). It is similar to the popular nlm()
function in R
, but is implemented fully in R
. There is also a full C++
implementation included so that you can optimize functions within C++
directly: this may be useful for those who implement their functions in Rcpp
.
This vignette shows the simple capabilities of the package.
Let’s use Rosenbrock’s banana function in 2D as an example. The banana function with \(2\) input variables \((x_1, x_2)\) has the form
\[f(\textbf{x}) = (1 - x_1)^2 + 100(x_2 - x_1^2)^2\]
It is known that this function has a minimum at \((1, 1)\).
Let’s define the function and plot it.
# define function
banana <- function(x) {
return((1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)
}
# plot it
gr1 <- seq(-2, 2, 0.01)
gr2 <- seq(-1, 3, 0.01)
gr <- expand.grid(gr1, gr2)
colnames(gr) <- c("x1", "x2")
gr$f <- apply(gr, 1, FUN = banana)
ggplot(gr) +
geom_contour_filled(aes(x = x1, y = x2, z = f),
breaks = c(0, 50, 100, 500, 1000, 2500, 3000)) +
scale_color_viridis_c()
We can see it is a difficult function to find the global minimum of because of the long valley that looks like a banana.
Let’s use fmin
from this package to optimise the function. To do that we must specify a point to start at. I’ll pretend I don’t know \((1,1)\) is the best point and from the plot guess that \((0, 0)\) could be the minimum.
start <- c(0, 0)
opt <- fmin(banana, start)
You can now access the optimisation results:
opt
#> $estimate
#> [1] 1.00000000001 1.00000000001
#>
#> $value
#> [1] 4.61647239799e-23
#>
#> $g
#> [1] 4.95690873220e-11 -1.80296929914e-11
#>
#> $H
#> [,1] [,2]
#> [1,] 802.000000011 -400.000000003
#> [2,] -400.000000003 200.000000000
#>
#> $conv
#> [1] TRUE
#>
#> $niter
#> [1] 24
The estimate
is the input variables identified as the where the minimum of the function occurs. The value of the function at the minimum is given by value
. Other function information given is g
(the gradient at the minimum) and H
(the Hessian at the minimum). You’d expect the gradient to be very small at the minimum (as it is a stationary point). An important output is conv
which is TRUE
if the algorithm is likely to have converged to a stationary point, when this is FALSE
you need to investigate why the algortihm has failed. Finally, niter
is the number of iterations of the algorithm used.
For comparison, we see that in this case fmin
outperforms nlm
(marginally):
nlm_opt <- nlm(banana, start)
nlm_opt
#> $minimum
#> [1] 4.0237264531e-12
#>
#> $estimate
#> [1] 0.999997994158 0.999995990123
#>
#> $gradient
#> [1] -7.32827668359e-07 3.60568730429e-07
#>
#> $code
#> [1] 1
#>
#> $iterations
#> [1] 26
See examples for a R script with more examples of tricky functions to optimize using fmin
.
If you want to see some output as fmin
works, then use verbose = TRUE
:
opt <- fmin(banana, start, verbose = TRUE)
#> 1 0.8000 | 0.2000 0.0000
#> 2 0.6344 | 0.2035 0.0416
#> 3 0.4781 | 0.3713 0.1091
#> 4 0.4037 | 0.3703 0.1286
#> 5 0.2779 | 0.4998 0.2332
#> 6 0.2061 | 0.5800 0.3192
#> 7 0.1351 | 0.6476 0.4298
#> 8 0.0905 | 0.6999 0.4919
#> 9 0.0736 | 0.7305 0.5306
#> 10 0.0621 | 0.7681 0.5809
#> 11 0.0540 | 0.7927 0.6179
#> 12 0.0326 | 0.8442 0.7036
#> 13 0.0149 | 0.8803 0.7725
#> 14 0.0084 | 0.9413 0.8791
#> 15 0.0027 | 0.9486 0.8991
#> 16 0.0006 | 0.9787 0.9566
#> 17 0.0001 | 0.9938 0.9869
#> 18 0.0000 | 0.9977 0.9955
#> 19 0.0000 | 1.0000 1.0010
#> 20 0.0000 | 1.0000 1.0000
#> 21 0.0000 | 1.0000 1.0000
#> 22 0.0000 | 1.0000 1.0000
#> 23 0.0000 | 1.0000 1.0000
#> 24 0.0000 | 1.0000 1.0000
I have already said that conv
reports whether or not convergence is likely to have occured with the fmin
algorithm. Nothing in life is certain and you might want to have a closer look. Or if conv
is FALSE
you might want to see why.
One way to do this is to plot how the gradient and function values changed as the algorithm unfolded. To do this, you need to save the key information as the algortihm runs.
opt <- fmin(banana, start, save = TRUE)
Then you can use an function within this package to plot some useful graphs:
check_fmin(opt)
Maybe you will spot something suspicious or concerning in these graphs that can help optimize the function or pick better starting values.
The fmin
function has a number of arguments that I haven’t used yet. Let me describe why you might want to use them.
gobj
allows you to specify a gradient function if you know it for your problem. Otherwise, the gradient is approximated numerically using numDeriv
.
funit
The algorithm cannot work brilliantly for any scale of function value, e.g. functions that go into the thousands or millions might not work very well due to computer arithmetic being weird. It is then better practise to scale your function to vary mostly between \(-1\) and \(1\). For the banana case, I case see the function goes up to at least \(1000\), so I might select \(funit = 1000\).
units
Same reason for using funit
but for the input variables. If the input variables have widely different scales (one is thousandths and the other in millions) then you can specify a scaling so that they all vary evenly between \(-1\) and \(1\). This is a vector with a scaling unit for each input variable.
tol
A tolerance is used to determine when convergence has occurred. You could relax this if you want to see how marginal the decision is whether convergence has occurred or not.
stepmax
Every step of the algorithm is somewhere between a full Newton step (as determined by Newton’s algorithm) and something shorter or longer. A stepmax
of \(1\) limits steps to be at most a Newton step or shorter. If you make stepmax
\(> 1\) then the algorithm will consider steps longer than a Newton update and stepmax
\(< 1\) means it will only consider shorter steps.
maxsubsteps
Before taking a step, the algrotihm tries lots of substeps to try to find the best step to make, you can limit how many substeps it tries.
Sometimes you might have a function written in C++
(perhaps it receives information from R
using Rcpp
). You might want to optimise the function in C++
without using an optimiser in R
. This can reduce some overhead caused by Rcpp
passing information between R
and C++
during the optimisation. I will assume the reader is somewhat familiar with C++
.
To do this, you need to define your C++
function in a particular way. Let’s do it for the banana function.
#include <RcppEigen.h>
#include <iostream>
#include <fmin.h>
// [[Rcpp::depends(RcppEigen)]]
using namespace Rcpp;
// Define a class with an () operator
class myFun {
public:
myFun() {}
double operator()(const Eigen::VectorXd& x) const;
};
// Define function I want to minimise
double myFun::operator()(const Eigen::VectorXd& x) const {
double f = 0;
f += (1 - x[0]) * (1 - x[0]);
f += 100 * (x[1] - x[0] * x[0]);
return f;
}
The key idea is to create a C++
class where the parenthesis is an overloaded operator. Someone unfamilar with these C++
concepts, can just replace the part the function is defined above with their own function. For those familar with classes, recall that it is possbile to store auxiliary data as data members of the class.
You can also copy the function below that calls the C++
version of fmin
on the function defined in myFun::operator()
:
// [[Rcpp::export]]
List doF(Eigen::VectorXd x,
int maxit = 1000,
double tol = 1e-10,
double stepmax = 1,
int maxsubsteps = 10,
bool verbose = false,
int digits = 4) {
// create an instance of my function
myFun F;
// create instance of the fmin engine
Fmin<myFun> fmin(F, x, maxit, tol, stepmax, maxsubsteps, verbose, digits);
// run fmin on my function to find minimum
fmin.Run();
// package up results to return to R
List res = List::create(Named("par") = fmin.Par(),
Named("value") = F(fmin.Par()),
Named("g") = fmin.ComputeG(fmin.Par()),
Named("H") = fmin.ComputeH(fmin.Par()),
Named("conv") = fmin.Conv(),
Named("niter") = fmin.Iter());
return res;
}
Suppose I put all the above C++
code into a file called banana.cpp
. I can now compile this code in R
and optimise my function.
# Need RcppEigen and Rcpp to compile this code
library(RcppEigen)
library(Rcpp)
# compile function
sourceCpp("banana.cpp")
Now I can run the optimisation in C++
and look at the results in R
:
# Run optimization
res <- doF(c(0, 0))
# Ouputs
res
#> $par
#> [1] 1 1
#>
#> $value
#> [1] 1.83486088326e-26
#>
#> $g
#> [1] -4.71934006269e-12 2.29816061630e-12
#>
#> $H
#> [,1] [,2]
#> [1,] 801.999999990 -399.999999999
#> [2,] -399.999999999 200.000000001
#>
#> $conv
#> [1] TRUE
#>
#> $niter
#> [1] 24