library(fmin)
#> Loading required package: numDeriv
library(ggplot2)

This package lets you minimize functions with multivariate inputs. It uses a quasi-Newton algorithm described in Nocedal and Wright (2006). It is similar to the popular nlm() function in R, but is implemented fully in R. There is also a full C++ implementation included so that you can optimize functions within C++ directly: this may be useful for those who implement their functions in Rcpp.

This vignette shows the simple capabilities of the package.

Basic Usage

Let’s use Rosenbrock’s banana function in 2D as an example. The banana function with \(2\) input variables \((x_1, x_2)\) has the form

\[f(\textbf{x}) = (1 - x_1)^2 + 100(x_2 - x_1^2)^2\]

It is known that this function has a minimum at \((1, 1)\).

Let’s define the function and plot it.

# define function
banana <- function(x) { 
  return((1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)  
}

# plot it 
gr1 <- seq(-2, 2, 0.01)
gr2 <- seq(-1, 3, 0.01)
gr <- expand.grid(gr1, gr2)
colnames(gr) <- c("x1", "x2")
gr$f <- apply(gr, 1, FUN = banana)
ggplot(gr) + 
  geom_contour_filled(aes(x = x1, y = x2, z = f),
                      breaks = c(0, 50, 100, 500, 1000, 2500, 3000)) + 
  scale_color_viridis_c()

We can see it is a difficult function to find the global minimum of because of the long valley that looks like a banana.

Let’s use fmin from this package to optimise the function. To do that we must specify a point to start at. I’ll pretend I don’t know \((1,1)\) is the best point and from the plot guess that \((0, 0)\) could be the minimum.

start <- c(0, 0)
opt <- fmin(banana, start)

You can now access the optimisation results:

opt
#> $estimate
#> [1] 1.00000000001 1.00000000001
#> 
#> $value
#> [1] 4.61647239799e-23
#> 
#> $g
#> [1]  4.95690873220e-11 -1.80296929914e-11
#> 
#> $H
#>                [,1]           [,2]
#> [1,]  802.000000011 -400.000000003
#> [2,] -400.000000003  200.000000000
#> 
#> $conv
#> [1] TRUE
#> 
#> $niter
#> [1] 24

The estimate is the input variables identified as the where the minimum of the function occurs. The value of the function at the minimum is given by value. Other function information given is g (the gradient at the minimum) and H (the Hessian at the minimum). You’d expect the gradient to be very small at the minimum (as it is a stationary point). An important output is conv which is TRUE if the algorithm is likely to have converged to a stationary point, when this is FALSE you need to investigate why the algortihm has failed. Finally, niter is the number of iterations of the algorithm used.

For comparison, we see that in this case fmin outperforms nlm (marginally):

nlm_opt <- nlm(banana, start)
nlm_opt
#> $minimum
#> [1] 4.0237264531e-12
#> 
#> $estimate
#> [1] 0.999997994158 0.999995990123
#> 
#> $gradient
#> [1] -7.32827668359e-07  3.60568730429e-07
#> 
#> $code
#> [1] 1
#> 
#> $iterations
#> [1] 26

See examples for a R script with more examples of tricky functions to optimize using fmin.

If you want to see some output as fmin works, then use verbose = TRUE:

opt <- fmin(banana, start, verbose = TRUE)
#> 1       0.8000  |  0.2000 0.0000      
#> 2       0.6344  |  0.2035 0.0416      
#> 3       0.4781  |  0.3713 0.1091      
#> 4       0.4037  |  0.3703 0.1286      
#> 5       0.2779  |  0.4998 0.2332      
#> 6       0.2061  |  0.5800 0.3192      
#> 7       0.1351  |  0.6476 0.4298      
#> 8       0.0905  |  0.6999 0.4919      
#> 9       0.0736  |  0.7305 0.5306      
#> 10      0.0621  |  0.7681 0.5809      
#> 11      0.0540  |  0.7927 0.6179      
#> 12      0.0326  |  0.8442 0.7036      
#> 13      0.0149  |  0.8803 0.7725      
#> 14      0.0084  |  0.9413 0.8791      
#> 15      0.0027  |  0.9486 0.8991      
#> 16      0.0006  |  0.9787 0.9566      
#> 17      0.0001  |  0.9938 0.9869      
#> 18      0.0000  |  0.9977 0.9955      
#> 19      0.0000  |  1.0000 1.0010      
#> 20      0.0000  |  1.0000 1.0000      
#> 21      0.0000  |  1.0000 1.0000      
#> 22      0.0000  |  1.0000 1.0000      
#> 23      0.0000  |  1.0000 1.0000      
#> 24      0.0000  |  1.0000 1.0000

Checking convergence

I have already said that conv reports whether or not convergence is likely to have occured with the fmin algorithm. Nothing in life is certain and you might want to have a closer look. Or if conv is FALSE you might want to see why.

One way to do this is to plot how the gradient and function values changed as the algorithm unfolded. To do this, you need to save the key information as the algortihm runs.

opt <- fmin(banana, start, save = TRUE)

Then you can use an function within this package to plot some useful graphs:

check_fmin(opt)

Maybe you will spot something suspicious or concerning in these graphs that can help optimize the function or pick better starting values.

More advanced uses

The fmin function has a number of arguments that I haven’t used yet. Let me describe why you might want to use them.

gobj allows you to specify a gradient function if you know it for your problem. Otherwise, the gradient is approximated numerically using numDeriv.
funit The algorithm cannot work brilliantly for any scale of function value, e.g. functions that go into the thousands or millions might not work very well due to computer arithmetic being weird. It is then better practise to scale your function to vary mostly between \(-1\) and \(1\). For the banana case, I case see the function goes up to at least \(1000\), so I might select \(funit = 1000\).
units Same reason for using funit but for the input variables. If the input variables have widely different scales (one is thousandths and the other in millions) then you can specify a scaling so that they all vary evenly between \(-1\) and \(1\). This is a vector with a scaling unit for each input variable.
tol A tolerance is used to determine when convergence has occurred. You could relax this if you want to see how marginal the decision is whether convergence has occurred or not.
stepmax Every step of the algorithm is somewhere between a full Newton step (as determined by Newton’s algorithm) and something shorter or longer. A stepmax of \(1\) limits steps to be at most a Newton step or shorter. If you make stepmax \(> 1\) then the algorithm will consider steps longer than a Newton update and stepmax \(< 1\) means it will only consider shorter steps.
maxsubsteps Before taking a step, the algrotihm tries lots of substeps to try to find the best step to make, you can limit how many substeps it tries.

Using fmin in C++

Sometimes you might have a function written in C++ (perhaps it receives information from R using Rcpp). You might want to optimise the function in C++ without using an optimiser in R. This can reduce some overhead caused by Rcpp passing information between R and C++ during the optimisation. I will assume the reader is somewhat familiar with C++.

To do this, you need to define your C++ function in a particular way. Let’s do it for the banana function.

#include <RcppEigen.h>
#include <iostream>
#include <fmin.h>
// [[Rcpp::depends(RcppEigen)]]

using namespace Rcpp;

// Define a class with an () operator 
class myFun {
public:
  myFun() {}
  double operator()(const Eigen::VectorXd& x) const;
};

// Define function I want to minimise 
double myFun::operator()(const Eigen::VectorXd& x) const {
  double f = 0; 
  f += (1 - x[0]) * (1 - x[0]); 
  f += 100 * (x[1] - x[0] * x[0]); 
  return f;
}

The key idea is to create a C++ class where the parenthesis is an overloaded operator. Someone unfamilar with these C++ concepts, can just replace the part the function is defined above with their own function. For those familar with classes, recall that it is possbile to store auxiliary data as data members of the class.

You can also copy the function below that calls the C++ version of fmin on the function defined in myFun::operator():

// [[Rcpp::export]]
List doF(Eigen::VectorXd x,
         int maxit = 1000,
         double tol = 1e-10,
         double stepmax = 1, 
         int maxsubsteps = 10,
         bool verbose = false,
         int digits = 4) {
         
  // create an instance of my function 
  myFun F;
  // create instance of the fmin engine 
  Fmin<myFun> fmin(F, x, maxit, tol, stepmax, maxsubsteps, verbose, digits);
  // run fmin on my function to find minimum 
  fmin.Run();
  // package up results to return to R
  List res = List::create(Named("par") = fmin.Par(),
                          Named("value") = F(fmin.Par()),
                          Named("g") = fmin.ComputeG(fmin.Par()),
                          Named("H") = fmin.ComputeH(fmin.Par()),
                          Named("conv") = fmin.Conv(),
                          Named("niter") = fmin.Iter());
  return res;
}

Suppose I put all the above C++ code into a file called banana.cpp. I can now compile this code in R and optimise my function.

# Need RcppEigen and Rcpp to compile this code 
library(RcppEigen)
library(Rcpp)
# compile function
sourceCpp("banana.cpp")

Now I can run the optimisation in C++ and look at the results in R:

# Run optimization
res <- doF(c(0, 0))

# Ouputs
res
#> $par
#> [1] 1 1
#> 
#> $value
#> [1] 1.83486088326e-26
#> 
#> $g
#> [1] -4.71934006269e-12  2.29816061630e-12
#> 
#> $H
#>                [,1]           [,2]
#> [1,]  801.999999990 -399.999999999
#> [2,] -399.999999999  200.000000001
#> 
#> $conv
#> [1] TRUE
#> 
#> $niter
#> [1] 24

References

Nocedal, Jorge, and Stephen Wright. 2006. Numerical Optimization. Springer Science & Business Media.

fmin: R and C++ implementation of Quasi-Newton multivariate optimiser

Richard Glennie

2021-08-20

Basic Usage

Checking convergence

More advanced uses

Using fmin in C++

References