feat: initial commit - Phase 1 & 2 core features

hiderfong
2026-04-22 17:07:33 +08:00
commit 1773bda06b
25005 changed files with 6252106 additions and 0 deletions
@@ -0,0 +1,76 @@
From the website for the L-BFGS-B code (at
http://www.ece.northwestern.edu/~nocedal/lbfgsb.html):
"""
L-BFGS-B is a limited-memory quasi-Newton code for bound-constrained
optimization, i.e. for problems where the only constraints are of the
form l<= x <= u.
"""
This is a Python wrapper (using F2PY) written by David M. Cooke
<cookedm@physics.mcmaster.ca> and released as version 0.9 on April 9, 2004.
The wrapper was slightly modified by Joonas Paalasmaa for the 3.0 version
in March 2012.
License of L-BFGS-B (Fortran code)
==================================
The version included here (in lbfgsb.f) is 3.0 (released April 25, 2011). It was
written by Ciyou Zhu, Richard Byrd, and Jorge Nocedal <nocedal@ece.nwu.edu>. It
carries the following condition for use:
"""
This software is freely available, but we expect that all publications
describing work using this software, or all commercial products using it,
quote at least one of the references given below. This software is released
under the BSD License.
References
* R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound
Constrained Optimization, (1995), SIAM Journal on Scientific and
Statistical Computing, 16, 5, pp. 1190-1208.
* C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B,
FORTRAN routines for large scale bound constrained optimization (1997),
ACM Transactions on Mathematical Software, 23, 4, pp. 550 - 560.
* J.L. Morales and J. Nocedal. L-BFGS-B: Remark on Algorithm 778: L-BFGS-B,
FORTRAN routines for large scale bound constrained optimization (2011),
ACM Transactions on Mathematical Software, 38, 1.
"""
The Python wrapper
==================
This code uses F2PY (http://cens.ioc.ee/projects/f2py2e/) to generate
the wrapper around the Fortran code.
The Python code and wrapper are copyrighted 2004 by David M. Cooke
<cookedm@physics.mcmaster.ca>.
Example usage
=============
An example of the usage is given at the bottom of the lbfgsb.py file.
Run it with 'python lbfgsb.py'.
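As a complement to the bundled example, the sketch below shows a
bound-constrained problem of the form l <= x <= u solved through
`scipy.optimize.minimize` with ``method="L-BFGS-B"``. It is only an
illustration: the objective, gradient, and bounds are made up for this note,
and it assumes a SciPy installation rather than the standalone lbfgsb.py
wrapper described above.

```python
import numpy as np
from scipy.optimize import minimize


def objective(x):
    # Hypothetical quadratic bowl; its unconstrained minimum is at (2, -1).
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2


def gradient(x):
    # Analytic gradient of the objective above.
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)])


# Box constraints l <= x <= u, one (lower, upper) pair per variable.
bounds = [(0.0, 1.0), (0.0, 1.0)]

result = minimize(objective, x0=np.array([0.5, 0.5]), jac=gradient,
                  method="L-BFGS-B", bounds=bounds)
print(result.x, result.fun)
```

Because the unconstrained minimum lies outside the box, the solver should stop
on the boundary, at the corner of the box closest to (2, -1).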
License for the Python wrapper
==============================
Copyright (c) 2004 David M. Cooke <cookedm@physics.mcmaster.ca>
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,451 @@
"""
=====================================================
Optimization and root finding (:mod:`scipy.optimize`)
=====================================================
.. currentmodule:: scipy.optimize
.. toctree::
:hidden:
optimize.cython_optimize
SciPy ``optimize`` provides functions for minimizing (or maximizing)
objective functions, possibly subject to constraints. It includes
solvers for nonlinear problems (with support for both local and global
optimization algorithms), linear programming, constrained
and nonlinear least-squares, root finding, and curve fitting.
Common functions and objects, shared across different solvers, are:
.. autosummary::
:toctree: generated/
show_options - Show specific options of optimization solvers.
OptimizeResult - The optimization result returned by some optimizers.
OptimizeWarning - The optimization encountered problems.
Optimization
============
Scalar functions optimization
-----------------------------
.. autosummary::
:toctree: generated/
minimize_scalar - Interface for minimizers of univariate functions
The `minimize_scalar` function supports the following methods:
.. toctree::
optimize.minimize_scalar-brent
optimize.minimize_scalar-bounded
optimize.minimize_scalar-golden
Local (multivariate) optimization
---------------------------------
.. autosummary::
:toctree: generated/
minimize - Interface for minimizers of multivariate functions.
The `minimize` function supports the following methods:
.. toctree::
optimize.minimize-neldermead
optimize.minimize-powell
optimize.minimize-cg
optimize.minimize-bfgs
optimize.minimize-newtoncg
optimize.minimize-lbfgsb
optimize.minimize-tnc
optimize.minimize-cobyla
optimize.minimize-slsqp
optimize.minimize-trustconstr
optimize.minimize-dogleg
optimize.minimize-trustncg
optimize.minimize-trustkrylov
optimize.minimize-trustexact
Constraints are passed to the `minimize` function as a single object or
as a list of objects from the following classes:
.. autosummary::
:toctree: generated/
NonlinearConstraint - Class defining general nonlinear constraints.
LinearConstraint - Class defining general linear constraints.
Simple bound constraints are handled separately and there is a special class
for them:
.. autosummary::
:toctree: generated/
Bounds - Bound constraints.
Quasi-Newton strategies implementing the `HessianUpdateStrategy`
interface can be used to approximate the Hessian in the `minimize`
function (available only for the 'trust-constr' method). Available
quasi-Newton methods implementing this interface are:
.. autosummary::
:toctree: generated/
BFGS - Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessian update strategy.
SR1 - Symmetric-rank-1 Hessian update strategy.
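A minimal sketch of how these pieces can fit together: `Bounds`, a
`LinearConstraint`, and a `BFGS` Hessian approximation passed to `minimize`
with the 'trust-constr' method. The starting point, bounds, and constraint
coefficients are illustrative assumptions, not values taken from this module.

```python
import numpy as np
from scipy.optimize import (BFGS, Bounds, LinearConstraint, minimize, rosen,
                            rosen_der)

x0 = np.array([0.5, 0.0])

# Simple bounds: 0 <= x[0] <= 1 and -0.5 <= x[1] <= 2 (illustrative values).
bounds = Bounds([0.0, -0.5], [1.0, 2.0])

# One linear inequality: x[0] + 2 * x[1] <= 1 (illustrative values).
linear = LinearConstraint([[1.0, 2.0]], -np.inf, 1.0)

# hess=BFGS() asks 'trust-constr' to build a quasi-Newton Hessian
# approximation instead of requiring an exact Hessian.
result = minimize(rosen, x0, method="trust-constr", jac=rosen_der,
                  hess=BFGS(), constraints=[linear], bounds=bounds)
print(result.x, result.fun)
```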
.. _global_optimization:
Global optimization
-------------------
.. autosummary::
:toctree: generated/
basinhopping - Basinhopping stochastic optimizer.
brute - Brute force searching optimizer.
differential_evolution - Stochastic optimizer using differential evolution.
shgo - Simplicial homology global optimizer.
dual_annealing - Dual annealing stochastic optimizer.
direct - DIRECT (Dividing Rectangles) optimizer.
Least-squares and curve fitting
===============================
Nonlinear least-squares
-----------------------
.. autosummary::
:toctree: generated/
least_squares - Solve a nonlinear least-squares problem with bounds on the variables.
Linear least-squares
--------------------
.. autosummary::
:toctree: generated/
nnls - Linear least-squares problem with non-negativity constraint.
lsq_linear - Linear least-squares problem with bound constraints.
isotonic_regression - Least squares problem of isotonic regression via PAVA.
Curve fitting
-------------
.. autosummary::
:toctree: generated/
curve_fit -- Fit curve to a set of points.
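As an illustration of `curve_fit`, the sketch below fits a two-parameter
exponential decay to synthetic data. The model, the true parameter values,
and the noise level are assumptions chosen for this example.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(t, amplitude, rate):
    # Hypothetical model: exponential decay.
    return amplitude * np.exp(-rate * t)


# Synthetic data generated from known parameters plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 50)
y = model(t, 2.5, 1.3) + 0.01 * rng.standard_normal(t.size)

# popt holds the fitted parameters, pcov their estimated covariance.
popt, pcov = curve_fit(model, t, y, p0=[1.0, 1.0])
print(popt)
```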
Root finding
============
Scalar functions
----------------
.. autosummary::
:toctree: generated/
root_scalar - Unified interface for nonlinear solvers of scalar functions.
brentq - quadratic interpolation Brent method.
brenth - Brent method, modified by Harris with hyperbolic extrapolation.
ridder - Ridder's method.
bisect - Bisection method.
newton - Newton's method (also Secant and Halley's methods).
toms748 - Alefeld, Potra & Shi Algorithm 748.
RootResults - The root finding result returned by some root finders.
The `root_scalar` function supports the following methods:
.. toctree::
optimize.root_scalar-brentq
optimize.root_scalar-brenth
optimize.root_scalar-bisect
optimize.root_scalar-ridder
optimize.root_scalar-newton
optimize.root_scalar-toms748
optimize.root_scalar-secant
optimize.root_scalar-halley
The table below lists situations and appropriate methods, along with
*asymptotic* convergence rates per iteration (and per function evaluation)
for successful convergence to a simple root(*).
Bisection is the slowest of them all, adding one bit of accuracy for each
function evaluation, but is guaranteed to converge.
The other bracketing methods all (eventually) increase the number of accurate
bits by about 50% for every function evaluation.
The derivative-based methods, all built on `newton`, can converge quite quickly
if the initial value is close to the root. They can also be applied to
functions defined on (a subset of) the complex plane.
+-------------+----------+----------+-----------+-------------+-------------+----------------+
| Domain of f | Bracket? |    Derivatives?      | Solvers     |        Convergence           |
+             +          +----------+-----------+             +-------------+----------------+
|             |          | `fprime` | `fprime2` |             | Guaranteed? |  Rate(s)(*)    |
+=============+==========+==========+===========+=============+=============+================+
| `R`         | Yes      | N/A      | N/A       | - bisection | - Yes       | - 1 "Linear"   |
|             |          |          |           | - brentq    | - Yes       | - >=1, <= 1.62 |
|             |          |          |           | - brenth    | - Yes       | - >=1, <= 1.62 |
|             |          |          |           | - ridder    | - Yes       | - 2.0 (1.41)   |
|             |          |          |           | - toms748   | - Yes       | - 2.7 (1.65)   |
+-------------+----------+----------+-----------+-------------+-------------+----------------+
| `R` or `C`  | No       | No       | No        | secant      | No          | 1.62 (1.62)    |
+-------------+----------+----------+-----------+-------------+-------------+----------------+
| `R` or `C`  | No       | Yes      | No        | newton      | No          | 2.00 (1.41)    |
+-------------+----------+----------+-----------+-------------+-------------+----------------+
| `R` or `C`  | No       | Yes      | Yes       | halley      | No          | 3.00 (1.44)    |
+-------------+----------+----------+-----------+-------------+-------------+----------------+
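To make the distinction between bracketing and derivative-based solvers
concrete, the sketch below solves the same scalar equation with 'brentq'
(needs a sign-changing bracket) and with 'newton' (needs a starting point and
`fprime`). The cubic used here is an arbitrary example.

```python
from scipy.optimize import root_scalar


def f(x):
    # Example function with a single real root near x = 2.0946.
    return x**3 - 2.0 * x - 5.0


def fprime(x):
    return 3.0 * x**2 - 2.0


# Bracketing method: f(2) < 0 < f(3), so convergence is guaranteed.
bracketed = root_scalar(f, bracket=[2.0, 3.0], method="brentq")

# Derivative-based method: fast near the root, but no guarantee in general.
newton = root_scalar(f, x0=2.0, fprime=fprime, method="newton")

print(bracketed.root, bracketed.function_calls)
print(newton.root, newton.function_calls)
```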
.. seealso::
`scipy.optimize.cython_optimize` -- Typed Cython versions of root finding functions
Fixed point finding:
.. autosummary::
:toctree: generated/
fixed_point - Single-variable fixed-point solver.
Multidimensional
----------------
.. autosummary::
:toctree: generated/
root - Unified interface for nonlinear solvers of multivariate functions.
The `root` function supports the following methods:
.. toctree::
optimize.root-hybr
optimize.root-lm
optimize.root-broyden1
optimize.root-broyden2
optimize.root-anderson
optimize.root-linearmixing
optimize.root-diagbroyden
optimize.root-excitingmixing
optimize.root-krylov
optimize.root-dfsane
Linear programming / MILP
=========================
.. autosummary::
:toctree: generated/
milp -- Mixed integer linear programming.
linprog -- Unified interface for minimizers of linear programming problems.
The `linprog` function supports the following methods:
.. toctree::
optimize.linprog-simplex
optimize.linprog-interior-point
optimize.linprog-revised_simplex
optimize.linprog-highs-ipm
optimize.linprog-highs-ds
optimize.linprog-highs
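A small worked example of `linprog` with the 'highs' method. The cost vector
and constraints are made up for illustration; `linprog` minimizes, so a
maximization problem is written with a negated objective.

```python
from scipy.optimize import linprog

# Maximize x + 2*y, i.e. minimize -x - 2*y, subject to
#   x +   y <= 4
#   x + 3*y <= 6
#   x, y >= 0
c = [-1.0, -2.0]
A_ub = [[1.0, 1.0],
        [1.0, 3.0]]
b_ub = [4.0, 6.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, res.fun)
```

The optimum sits at the vertex where both inequality constraints are active,
(x, y) = (3, 1), with objective value -5.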
The simplex, interior-point, and revised simplex methods support callback
functions, such as:
.. autosummary::
:toctree: generated/
linprog_verbose_callback -- Sample callback function for linprog (simplex).
Assignment problems
===================
.. autosummary::
:toctree: generated/
linear_sum_assignment -- Solves the linear-sum assignment problem.
quadratic_assignment -- Solves the quadratic assignment problem.
The `quadratic_assignment` function supports the following methods:
.. toctree::
optimize.qap-faq
optimize.qap-2opt
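For the linear-sum assignment problem, a minimal sketch with a hypothetical
3x3 cost matrix (rows are workers, columns are jobs):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j] is the cost of assigning worker i to job j (made-up numbers).
cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])

# row_ind[k] is matched with col_ind[k]; the total cost is minimized.
row_ind, col_ind = linear_sum_assignment(cost)
total_cost = cost[row_ind, col_ind].sum()
print(col_ind, total_cost)
```

Enumerating all six permutations by hand confirms that the cheapest
assignment here costs 5.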
Utilities
=========
Finite-difference approximation
-------------------------------
.. autosummary::
:toctree: generated/
approx_fprime - Approximate the gradient of a scalar function.
check_grad - Check the supplied derivative using finite differences.
Line search
-----------
.. autosummary::
:toctree: generated/
bracket - Bracket a minimum, given two starting points.
line_search - Return a step that satisfies the strong Wolfe conditions.
Hessian approximation
---------------------
.. autosummary::
:toctree: generated/
LbfgsInvHessProduct - Linear operator for L-BFGS approximate inverse Hessian.
HessianUpdateStrategy - Interface for implementing Hessian update strategies
Benchmark problems
------------------
.. autosummary::
:toctree: generated/
rosen - The Rosenbrock function.
rosen_der - The derivative of the Rosenbrock function.
rosen_hess - The Hessian matrix of the Rosenbrock function.
rosen_hess_prod - Product of the Rosenbrock Hessian with a vector.
Legacy functions
================
The functions below are not recommended for use in new scripts;
all of these methods are accessible via the newer, more consistent
interfaces listed above.
Optimization
------------
General-purpose multivariate methods:
.. autosummary::
:toctree: generated/
fmin - Nelder-Mead Simplex algorithm.
fmin_powell - Powell's (modified) conjugate direction method.
fmin_cg - Non-linear (Polak-Ribiere) conjugate gradient algorithm.
fmin_bfgs - Quasi-Newton method (Broyden-Fletcher-Goldfarb-Shanno).
fmin_ncg - Line-search Newton Conjugate Gradient.
Constrained multivariate methods:
.. autosummary::
:toctree: generated/
fmin_l_bfgs_b - Zhu, Byrd, and Nocedal's constrained optimizer.
fmin_tnc - Truncated Newton code.
fmin_cobyla - Constrained optimization by linear approximation.
fmin_slsqp - Minimization using sequential least-squares programming.
Univariate (scalar) minimization methods:
.. autosummary::
:toctree: generated/
fminbound - Bounded minimization of a scalar function.
brent - 1-D function minimization using Brent method.
golden - 1-D function minimization using Golden Section method.
Least-squares
-------------
.. autosummary::
:toctree: generated/
leastsq - Minimize the sum of squares of M equations in N unknowns.
Root finding
------------
General nonlinear solvers:
.. autosummary::
:toctree: generated/
fsolve - Non-linear multivariable equation solver.
broyden1 - Broyden's first method.
broyden2 - Broyden's second method.
NoConvergence - Exception raised when nonlinear solver does not converge.
Large-scale nonlinear solvers:
.. autosummary::
:toctree: generated/
newton_krylov
anderson
BroydenFirst
InverseJacobian
KrylovJacobian
Simple iteration solvers:
.. autosummary::
:toctree: generated/
excitingmixing
linearmixing
diagbroyden
""" # noqa: E501
from ._optimize import *
from ._minimize import *
from ._root import *
from ._root_scalar import *
from ._minpack_py import *
from ._zeros_py import *
from ._lbfgsb_py import fmin_l_bfgs_b, LbfgsInvHessProduct
from ._tnc import fmin_tnc
from ._cobyla_py import fmin_cobyla
from ._nonlin import *
from ._slsqp_py import fmin_slsqp
from ._nnls import nnls
from ._basinhopping import basinhopping
from ._linprog import linprog, linprog_verbose_callback
from ._lsap import linear_sum_assignment
from ._differentialevolution import differential_evolution
from ._lsq import least_squares, lsq_linear
from ._isotonic import isotonic_regression
from ._constraints import (NonlinearConstraint,
                           LinearConstraint,
                           Bounds)
from ._hessian_update_strategy import HessianUpdateStrategy, BFGS, SR1
from ._shgo import shgo
from ._dual_annealing import dual_annealing
from ._qap import quadratic_assignment
from ._direct_py import direct
from ._milp import milp
# Deprecated namespaces, to be removed in v2.0.0
from . import (
    cobyla, lbfgsb, linesearch, minpack, minpack2, moduleTNC, nonlin, optimize,
    slsqp, tnc, zeros
)
__all__ = [s for s in dir() if not s.startswith('_')]
from scipy._lib._testutils import PytestTester
test = PytestTester(__name__)
del PytestTester
@@ -0,0 +1,753 @@
"""
basinhopping: The basinhopping global optimization algorithm
"""
import numpy as np
import math
import inspect
import scipy.optimize
from scipy._lib._util import check_random_state
__all__ = ['basinhopping']
_params = (inspect.Parameter('res_new', kind=inspect.Parameter.KEYWORD_ONLY),
inspect.Parameter('res_old', kind=inspect.Parameter.KEYWORD_ONLY))
_new_accept_test_signature = inspect.Signature(parameters=_params)
class Storage:
    """
    Class used to store the lowest energy structure
    """
    def __init__(self, minres):
        self._add(minres)

    def _add(self, minres):
        self.minres = minres
        self.minres.x = np.copy(minres.x)

    def update(self, minres):
        if minres.success and (minres.fun < self.minres.fun
                               or not self.minres.success):
            self._add(minres)
            return True
        else:
            return False

    def get_lowest(self):
        return self.minres
class BasinHoppingRunner:
"""This class implements the core of the basinhopping algorithm.
x0 : ndarray
The starting coordinates.
minimizer : callable
The local minimizer, with signature ``result = minimizer(x)``.
The return value is an `optimize.OptimizeResult` object.
step_taking : callable
This function displaces the coordinates randomly. Signature should
be ``x_new = step_taking(x)``. Note that `x` may be modified in-place.
accept_tests : list of callables
Each test is passed the kwargs `f_new`, `x_new`, `f_old` and
`x_old`. These tests will be used to judge whether or not to accept
the step. The acceptable return values are True, False, or ``"force
accept"``. If any of the tests return False then the step is rejected.
If ``"force accept"``, then this will override any other tests in
order to accept the step. This can be used, for example, to forcefully
escape from a local minimum that ``basinhopping`` is trapped in.
disp : bool, optional
Display status messages.
"""
def __init__(self, x0, minimizer, step_taking, accept_tests, disp=False):
self.x = np.copy(x0)
self.minimizer = minimizer
self.step_taking = step_taking
self.accept_tests = accept_tests
self.disp = disp
self.nstep = 0
# initialize return object
self.res = scipy.optimize.OptimizeResult()
self.res.minimization_failures = 0
# do initial minimization
minres = minimizer(self.x)
if not minres.success:
self.res.minimization_failures += 1
if self.disp:
print("warning: basinhopping: local minimization failure")
self.x = np.copy(minres.x)
self.energy = minres.fun
self.incumbent_minres = minres # best minimize result found so far
if self.disp:
print("basinhopping step %d: f %g" % (self.nstep, self.energy))
# initialize storage class
self.storage = Storage(minres)
if hasattr(minres, "nfev"):
self.res.nfev = minres.nfev
if hasattr(minres, "njev"):
self.res.njev = minres.njev
if hasattr(minres, "nhev"):
self.res.nhev = minres.nhev
def _monte_carlo_step(self):
"""Do one Monte Carlo iteration
Randomly displace the coordinates, minimize, and decide whether
or not to accept the new coordinates.
"""
# Take a random step. Make a copy of x because the step_taking
# algorithm might change x in place
x_after_step = np.copy(self.x)
x_after_step = self.step_taking(x_after_step)
# do a local minimization
minres = self.minimizer(x_after_step)
x_after_quench = minres.x
energy_after_quench = minres.fun
if not minres.success:
self.res.minimization_failures += 1
if self.disp:
print("warning: basinhopping: local minimization failure")
if hasattr(minres, "nfev"):
self.res.nfev += minres.nfev
if hasattr(minres, "njev"):
self.res.njev += minres.njev
if hasattr(minres, "nhev"):
self.res.nhev += minres.nhev
# accept the move based on self.accept_tests. If any test is False,
# then reject the step. If any test returns the special string
# 'force accept', then accept the step regardless. This can be used
# to forcefully escape from a local minimum if normal basin hopping
# steps are not sufficient.
accept = True
for test in self.accept_tests:
if inspect.signature(test) == _new_accept_test_signature:
testres = test(res_new=minres, res_old=self.incumbent_minres)
else:
testres = test(f_new=energy_after_quench, x_new=x_after_quench,
f_old=self.energy, x_old=self.x)
if testres == 'force accept':
accept = True
break
elif testres is None:
raise ValueError("accept_tests must return True, False, or "
"'force accept'")
elif not testres:
accept = False
# Report the result of the acceptance test to the take step class.
# This is for adaptive step taking
if hasattr(self.step_taking, "report"):
self.step_taking.report(accept, f_new=energy_after_quench,
x_new=x_after_quench, f_old=self.energy,
x_old=self.x)
return accept, minres
def one_cycle(self):
"""Do one cycle of the basinhopping algorithm
"""
self.nstep += 1
new_global_min = False
accept, minres = self._monte_carlo_step()
if accept:
self.energy = minres.fun
self.x = np.copy(minres.x)
self.incumbent_minres = minres # best minimize result found so far
new_global_min = self.storage.update(minres)
# print some information
if self.disp:
self.print_report(minres.fun, accept)
if new_global_min:
print("found new global minimum on step %d with function"
" value %g" % (self.nstep, self.energy))
# save some variables as BasinHoppingRunner attributes
self.xtrial = minres.x
self.energy_trial = minres.fun
self.accept = accept
return new_global_min
def print_report(self, energy_trial, accept):
"""print a status update"""
minres = self.storage.get_lowest()
print("basinhopping step %d: f %g trial_f %g accepted %d "
" lowest_f %g" % (self.nstep, self.energy, energy_trial,
accept, minres.fun))
class AdaptiveStepsize:
"""
Class to implement adaptive stepsize.
This class wraps the step taking class and modifies the stepsize to
ensure the true acceptance rate is as close as possible to the target.
Parameters
----------
takestep : callable
The step taking routine. Must contain modifiable attribute
takestep.stepsize
accept_rate : float, optional
The target step acceptance rate
interval : int, optional
Interval for how often to update the stepsize
factor : float, optional
The step size is multiplied or divided by this factor upon each
update.
verbose : bool, optional
Print information about each update
"""
def __init__(self, takestep, accept_rate=0.5, interval=50, factor=0.9,
verbose=True):
self.takestep = takestep
self.target_accept_rate = accept_rate
self.interval = interval
self.factor = factor
self.verbose = verbose
self.nstep = 0
self.nstep_tot = 0
self.naccept = 0
def __call__(self, x):
return self.take_step(x)
def _adjust_step_size(self):
old_stepsize = self.takestep.stepsize
accept_rate = float(self.naccept) / self.nstep
if accept_rate > self.target_accept_rate:
# We're accepting too many steps. This generally means we're
# trapped in a basin. Take bigger steps.
self.takestep.stepsize /= self.factor
else:
# We're not accepting enough steps. Take smaller steps.
self.takestep.stepsize *= self.factor
if self.verbose:
print("adaptive stepsize: acceptance rate {:f} target {:f} new "
"stepsize {:g} old stepsize {:g}".format(accept_rate,
self.target_accept_rate, self.takestep.stepsize,
old_stepsize))
def take_step(self, x):
self.nstep += 1
self.nstep_tot += 1
if self.nstep % self.interval == 0:
self._adjust_step_size()
return self.takestep(x)
def report(self, accept, **kwargs):
"called by basinhopping to report the result of the step"
if accept:
self.naccept += 1
class RandomDisplacement:
    """Add a random displacement of maximum size `stepsize` to each coordinate.

    Calling this updates `x` in-place.

    Parameters
    ----------
    stepsize : float, optional
        Maximum stepsize in any dimension
    random_gen : {None, int, `numpy.random.Generator`,
                  `numpy.random.RandomState`}, optional

        If `random_gen` is None (or `np.random`), the
        `numpy.random.RandomState` singleton is used.
        If `random_gen` is an int, a new ``RandomState`` instance is used,
        seeded with `random_gen`.
        If `random_gen` is already a ``Generator`` or ``RandomState``
        instance then that instance is used.

    """

    def __init__(self, stepsize=0.5, random_gen=None):
        self.stepsize = stepsize
        self.random_gen = check_random_state(random_gen)

    def __call__(self, x):
        x += self.random_gen.uniform(-self.stepsize, self.stepsize,
                                     np.shape(x))
        return x
class MinimizerWrapper:
    """
    wrap a minimizer function as a minimizer class
    """
    def __init__(self, minimizer, func=None, **kwargs):
        self.minimizer = minimizer
        self.func = func
        self.kwargs = kwargs

    def __call__(self, x0):
        if self.func is None:
            return self.minimizer(x0, **self.kwargs)
        else:
            return self.minimizer(self.func, x0, **self.kwargs)
class Metropolis:
    """Metropolis acceptance criterion.

    Parameters
    ----------
    T : float
        The "temperature" parameter for the accept or reject criterion.
    random_gen : {None, int, `numpy.random.Generator`,
                  `numpy.random.RandomState`}, optional
        Random number generator used for the acceptance test.

        If `random_gen` is None (or `np.random`), the
        `numpy.random.RandomState` singleton is used.
        If `random_gen` is an int, a new ``RandomState`` instance is used,
        seeded with `random_gen`.
        If `random_gen` is already a ``Generator`` or ``RandomState``
        instance then that instance is used.

    """

    def __init__(self, T, random_gen=None):
        # Avoid ZeroDivisionError since "MBH can be regarded as a special case
        # of the BH framework with the Metropolis criterion, where temperature
        # T = 0." (Reject all steps that increase energy.)
        self.beta = 1.0 / T if T != 0 else float('inf')
        self.random_gen = check_random_state(random_gen)

    def accept_reject(self, res_new, res_old):
        """
        Assuming the local search underlying res_new was successful:
        If new energy is lower than old, it will always be accepted.
        If new is higher than old, there is a chance it will be accepted,
        less likely for larger differences.
        """
        with np.errstate(invalid='ignore'):
            # The energy values being fed to Metropolis are 1-length arrays, and if
            # they are equal, their difference is 0, which gets multiplied by beta,
            # which is inf, and array([0]) * float('inf') causes
            #
            # RuntimeWarning: invalid value encountered in multiply
            #
            # Ignore this warning so when the algorithm is on a flat plane, it always
            # accepts the step, to try to move off the plane.
            prod = -(res_new.fun - res_old.fun) * self.beta
            w = math.exp(min(0, prod))

        rand = self.random_gen.uniform()
        return w >= rand and (res_new.success or not res_old.success)

    def __call__(self, *, res_new, res_old):
        """
        res_new and res_old are mandatory keyword arguments
        """
        return bool(self.accept_reject(res_new, res_old))
def basinhopping(func, x0, niter=100, T=1.0, stepsize=0.5,
minimizer_kwargs=None, take_step=None, accept_test=None,
callback=None, interval=50, disp=False, niter_success=None,
seed=None, *, target_accept_rate=0.5, stepwise_factor=0.9):
"""Find the global minimum of a function using the basin-hopping algorithm.
Basin-hopping is a two-phase method that combines a global stepping
algorithm with local minimization at each step. Designed to mimic
the natural process of energy minimization of clusters of atoms, it works
well for similar problems with "funnel-like, but rugged" energy landscapes
[5]_.
As the step-taking, step acceptance, and minimization methods are all
customizable, this function can also be used to implement other two-phase
methods.
Parameters
----------
func : callable ``f(x, *args)``
Function to be optimized. ``args`` can be passed as an optional item
in the dict `minimizer_kwargs`
x0 : array_like
Initial guess.
niter : integer, optional
The number of basin-hopping iterations. There will be a total of
``niter + 1`` runs of the local minimizer.
T : float, optional
The "temperature" parameter for the acceptance or rejection criterion.
Higher "temperatures" mean that larger jumps in function value will be
accepted. For best results `T` should be comparable to the
separation (in function value) between local minima.
stepsize : float, optional
Maximum step size for use in the random displacement.
minimizer_kwargs : dict, optional
Extra keyword arguments to be passed to the local minimizer
`scipy.optimize.minimize`. Some important options could be:
method : str
The minimization method (e.g. ``"L-BFGS-B"``)
args : tuple
Extra arguments passed to the objective function (`func`) and
its derivatives (Jacobian, Hessian).
take_step : callable ``take_step(x)``, optional
Replace the default step-taking routine with this routine. The default
step-taking routine is a random displacement of the coordinates, but
other step-taking algorithms may be better for some systems.
`take_step` can optionally have the attribute ``take_step.stepsize``.
If this attribute exists, then `basinhopping` will adjust
``take_step.stepsize`` in order to try to optimize the global minimum
search.
accept_test : callable, ``accept_test(f_new=f_new, x_new=x_new, f_old=f_old, x_old=x_old)``, optional
Define a test which will be used to judge whether to accept the
step. This will be used in addition to the Metropolis test based on
"temperature" `T`. The acceptable return values are True,
False, or ``"force accept"``. If any of the tests return False
then the step is rejected. If a test returns ``"force accept"``, this
overrides any other tests and the step is accepted. This can be used,
for example, to forcefully escape from a local minimum that
`basinhopping` is trapped in.
callback : callable, ``callback(x, f, accept)``, optional
A callback function which will be called for all minima found. ``x``
and ``f`` are the coordinates and function value of the trial minimum,
and ``accept`` is whether that minimum was accepted. This can
be used, for example, to save the lowest N minima found. Also,
`callback` can be used to specify a user defined stop criterion by
optionally returning True to stop the `basinhopping` routine.
interval : integer, optional
Number of iterations between updates of the `stepsize`.
disp : bool, optional
Set to True to print status messages
niter_success : integer, optional
Stop the run if the global minimum candidate remains the same for this
number of iterations.
seed : {None, int, `numpy.random.Generator`, `numpy.random.RandomState`}, optional
If `seed` is None (or `np.random`), the `numpy.random.RandomState`
singleton is used.
If `seed` is an int, a new ``RandomState`` instance is used,
seeded with `seed`.
If `seed` is already a ``Generator`` or ``RandomState`` instance then
that instance is used.
Specify `seed` for repeatable minimizations. The random numbers
generated with this seed only affect the default Metropolis
`accept_test` and the default `take_step`. If you supply your own
`take_step` and `accept_test`, and these functions use random
number generation, then those functions are responsible for the state
of their random number generator.
target_accept_rate : float, optional
The target acceptance rate that is used to adjust the `stepsize`.
If the current acceptance rate is greater than the target,
then the `stepsize` is increased. Otherwise, it is decreased.
Range is (0, 1). Default is 0.5.
.. versionadded:: 1.8.0
stepwise_factor : float, optional
The `stepsize` is multiplied or divided by this stepwise factor upon
each update. Range is (0, 1). Default is 0.9.
.. versionadded:: 1.8.0
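The interaction between `target_accept_rate` and `stepwise_factor` can be
sketched in a few lines. The helper below is hypothetical (it is not part of
the public API) and only mirrors the rule described above: grow the step when
too many steps are accepted, shrink it otherwise.

```python
def adjust_stepsize(stepsize, n_accepted, n_steps,
                    target_accept_rate=0.5, stepwise_factor=0.9):
    # Hypothetical helper illustrating the update rule described above.
    accept_rate = n_accepted / n_steps
    if accept_rate > target_accept_rate:
        # Accepting too often: probably stuck in one basin, take bigger steps.
        return stepsize / stepwise_factor
    # Accepting too rarely: take smaller steps.
    return stepsize * stepwise_factor


grown = adjust_stepsize(0.5, n_accepted=40, n_steps=50)   # rate 0.8 > 0.5
shrunk = adjust_stepsize(0.5, n_accepted=10, n_steps=50)  # rate 0.2 < 0.5
print(grown, shrunk)
```

With the default factor of 0.9, a step size of 0.5 becomes 0.5 / 0.9 in the
first case and 0.5 * 0.9 in the second.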
Returns
-------
res : OptimizeResult
The optimization result represented as a `OptimizeResult` object.
Important attributes are: ``x`` the solution array, ``fun`` the value
of the function at the solution, and ``message`` which describes the
cause of the termination. The ``OptimizeResult`` object returned by the
selected minimizer at the lowest minimum is also contained within this
object and can be accessed through the ``lowest_optimization_result``
attribute. See `OptimizeResult` for a description of other attributes.
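A usage sketch tying the parameters and the return value together. The
one-dimensional objective, the choice of ``"BFGS"`` as local minimizer, and
the callback name are illustrative assumptions; `seed` is used as documented
above so that the run is repeatable.

```python
import numpy as np
from scipy.optimize import basinhopping


def func(x):
    # Rugged 1-D test function: a cosine superimposed on a parabola.
    return np.cos(14.5 * x[0] - 0.3) + (x[0] + 0.2) * x[0]


trial_minima = []


def record_minimum(x, f, accept):
    # Called once per iteration with the trial minimum and its fate.
    trial_minima.append((float(x[0]), float(f), bool(accept)))


result = basinhopping(func, x0=[1.0], niter=200,
                      minimizer_kwargs={"method": "BFGS"},
                      callback=record_minimum, seed=0)

print(result.x, result.fun)
print(result.lowest_optimization_result.message)
print(len(trial_minima), "trial minima recorded")
```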
See Also
--------
minimize :
The local minimization function called once for each basinhopping step.
`minimizer_kwargs` is passed to this routine.
Notes
-----
Basin-hopping is a stochastic algorithm which attempts to find the global
minimum of a smooth scalar function of one or more variables [1]_ [2]_ [3]_
[4]_. The algorithm in its current form was described by David Wales and
Jonathan Doye [2]_ http://www-wales.ch.cam.ac.uk/.
The algorithm is iterative with each cycle composed of the following
features
1) random perturbation of the coordinates
2) local minimization
3) accept or reject the new coordinates based on the minimized function
value
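The three steps above can be sketched as a short loop. This is a simplified
illustration under stated assumptions (fixed step size, default local
minimizer, ``T > 0``, and a hypothetical function name), not the
implementation used by `basinhopping`.

```python
import math

import numpy as np
from scipy.optimize import minimize


def basin_hopping_sketch(func, x0, niter=200, T=1.0, stepsize=0.5,
                         rng_seed=0):
    # Hypothetical helper; assumes T > 0 and a fixed step size.
    rng = np.random.default_rng(rng_seed)
    current = minimize(func, x0)
    best = current
    for _ in range(niter):
        # 1) random perturbation of the coordinates
        trial_x = current.x + rng.uniform(-stepsize, stepsize,
                                          size=current.x.shape)
        # 2) local minimization
        trial = minimize(func, trial_x)
        # 3) accept or reject based on the minimized function value
        delta = trial.fun - current.fun
        if delta <= 0 or rng.uniform() < math.exp(-delta / T):
            current = trial
            if trial.fun < best.fun:
                best = trial
    return best


def func(x):
    return np.cos(14.5 * x[0] - 0.3) + (x[0] + 0.2) * x[0]


first = minimize(func, np.array([1.0]))
best = basin_hopping_sketch(func, np.array([1.0]))
print(first.fun, best.fun)
```

The sketch omits adaptive step sizes, failure bookkeeping, and callbacks; it
only shows why the lowest minimum seen can never be worse than the first
local minimum.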
The acceptance test used here is the Metropolis criterion of standard Monte
Carlo algorithms, although there are many other possibilities [3]_.
This global minimization method has been shown to be extremely efficient
for a wide variety of problems in physics and chemistry. It is
particularly useful when the function has many minima separated by large
barriers. See the `Cambridge Cluster Database
<https://www-wales.ch.cam.ac.uk/CCD.html>`_ for databases of molecular
systems that have been optimized primarily using basin-hopping. This
database includes minimization problems exceeding 300 degrees of freedom.
See the free software program `GMIN <https://www-wales.ch.cam.ac.uk/GMIN>`_
for a Fortran implementation of basin-hopping. This implementation has many
variations of the procedure described above, including more
advanced step-taking algorithms and alternative acceptance criteria.
For stochastic global optimization there is no way to determine if the true
global minimum has actually been found. Instead, as a consistency check,
the algorithm can be run from a number of different random starting points
to ensure the lowest minimum found in each example has converged to the
global minimum. For this reason, `basinhopping` will by default simply
run for the number of iterations `niter` and return the lowest minimum
found. It is left to the user to ensure that this is in fact the global
minimum.
Choosing `stepsize`: This is a crucial parameter in `basinhopping` and
depends on the problem being solved. The step is chosen uniformly in the
region from x0-stepsize to x0+stepsize, in each dimension. Ideally, it
should be comparable to the typical separation (in argument values) between
local minima of the function being optimized. `basinhopping` will, by
default, adjust `stepsize` to find an optimal value, but this may take
many iterations. You will get quicker results if you set a sensible
initial value for ``stepsize``.
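The default displacement described above can be pictured with a short
standard-library sketch. The function name ``random_displacement`` is
hypothetical and chosen only for this illustration; it is assumed that each
coordinate is perturbed independently.

```python
import random


def random_displacement(x, stepsize, rng):
    # Hypothetical helper: move every coordinate by an independent amount
    # drawn uniformly from [-stepsize, stepsize].
    return [xi + rng.uniform(-stepsize, stepsize) for xi in x]


rng = random.Random(0)
x0 = [1.0, -2.0]
x_new = random_displacement(x0, 0.5, rng)
```

Each trial point stays within ``stepsize`` of the previous point in every
dimension, which is why the value should match the typical spacing of the
local minima.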
Choosing `T`: The parameter `T` is the "temperature" used in the
Metropolis criterion. Basinhopping steps are always accepted if
``func(xnew) < func(xold)``. Otherwise, they are accepted with
probability::
exp( -(func(xnew) - func(xold)) / T )
So, for best results, `T` should be comparable to the typical
difference (in function values) between local minima. (The height of
"walls" between local minima is irrelevant.)
If `T` is 0, the algorithm becomes Monotonic Basin-Hopping, in which all
steps that increase energy are rejected.
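The acceptance rule above, including the ``T = 0`` case, can be sketched as
follows. The function ``metropolis_accept`` is a hypothetical stand-alone
helper written for this illustration; it is not the class used internally.

```python
import math
import random


def metropolis_accept(f_new, f_old, T, rng=random):
    # Hypothetical helper illustrating the Metropolis criterion.
    if f_new <= f_old:
        # Downhill steps are always accepted; for equal values the
        # acceptance probability exp(0) is 1.
        return True
    if T == 0:
        # Monotonic basin-hopping: every uphill step is rejected.
        return False
    # Uphill steps are accepted with probability exp(-(f_new - f_old) / T).
    return rng.random() < math.exp(-(f_new - f_old) / T)


downhill = metropolis_accept(0.0, 1.0, T=1.0)
uphill_cold = metropolis_accept(1.0, 0.0, T=0)
```

A larger `T` raises the probability in the last line, so more uphill moves
between basins are accepted.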
.. versionadded:: 0.12.0
References
----------
.. [1] Wales, David J. 2003, Energy Landscapes, Cambridge University Press,
Cambridge, UK.
.. [2] Wales, D J, and Doye J P K, Global Optimization by Basin-Hopping and
the Lowest Energy Structures of Lennard-Jones Clusters Containing up to
110 Atoms. Journal of Physical Chemistry A, 1997, 101, 5111.
.. [3] Li, Z. and Scheraga, H. A., Monte Carlo-minimization approach to the
multiple-minima problem in protein folding, Proc. Natl. Acad. Sci. USA,
1987, 84, 6611.
.. [4] Wales, D. J. and Scheraga, H. A., Global optimization of clusters,
crystals, and biomolecules, Science, 1999, 285, 1368.
.. [5] Olson, B., Hashmi, I., Molloy, K., and Shehu, A., Basin Hopping as
a General and Versatile Optimization Framework for the Characterization
of Biological Macromolecules, Advances in Artificial Intelligence,
Volume 2012 (2012), Article ID 674832, :doi:`10.1155/2012/674832`
Examples
--------
The following example is a 1-D minimization problem, with many
local minima superimposed on a parabola.
>>> import numpy as np
>>> from scipy.optimize import basinhopping
>>> func = lambda x: np.cos(14.5 * x - 0.3) + (x + 0.2) * x
>>> x0 = [1.]
Basinhopping, internally, uses a local minimization algorithm. We will use
the parameter `minimizer_kwargs` to tell basinhopping which algorithm to
use and how to set up that minimizer. This parameter will be passed to
`scipy.optimize.minimize`.
>>> minimizer_kwargs = {"method": "BFGS"}
>>> ret = basinhopping(func, x0, minimizer_kwargs=minimizer_kwargs,
... niter=200)
>>> print("global minimum: x = %.4f, f(x) = %.4f" % (ret.x, ret.fun))
global minimum: x = -0.1951, f(x) = -1.0009
Next consider a 2-D minimization problem. Also, this time, we
will use gradient information to significantly speed up the search.
>>> def func2d(x):
... f = np.cos(14.5 * x[0] - 0.3) + (x[1] + 0.2) * x[1] + (x[0] +
... 0.2) * x[0]
... df = np.zeros(2)
... df[0] = -14.5 * np.sin(14.5 * x[0] - 0.3) + 2. * x[0] + 0.2
... df[1] = 2. * x[1] + 0.2
... return f, df
We'll also use a different local minimization algorithm. Also, we must tell
the minimizer that our function returns both energy and gradient (Jacobian).
>>> minimizer_kwargs = {"method":"L-BFGS-B", "jac":True}
>>> x0 = [1.0, 1.0]
>>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
... niter=200)
>>> print("global minimum: x = [%.4f, %.4f], f(x) = %.4f" % (ret.x[0],
... ret.x[1],
... ret.fun))
global minimum: x = [-0.1951, -0.1000], f(x) = -1.0109
Here is an example using a custom step-taking routine. Imagine you want
the first coordinate to take larger steps than the rest of the coordinates.
This can be implemented like so:
>>> class MyTakeStep:
... def __init__(self, stepsize=0.5):
... self.stepsize = stepsize
... self.rng = np.random.default_rng()
... def __call__(self, x):
... s = self.stepsize
... x[0] += self.rng.uniform(-2.*s, 2.*s)
... x[1:] += self.rng.uniform(-s, s, x[1:].shape)
... return x
Since ``MyTakeStep.stepsize`` exists, basinhopping will adjust the magnitude
of `stepsize` to optimize the search. We'll use the same 2-D function as
before.
>>> mytakestep = MyTakeStep()
>>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
... niter=200, take_step=mytakestep)
>>> print("global minimum: x = [%.4f, %.4f], f(x) = %.4f" % (ret.x[0],
... ret.x[1],
... ret.fun))
global minimum: x = [-0.1951, -0.1000], f(x) = -1.0109
Now, let's do an example using a custom callback function which prints the
value of every minimum found.
>>> def print_fun(x, f, accepted):
... print("at minimum %.4f accepted %d" % (f, int(accepted)))
We'll run it for only 10 basinhopping steps this time.
>>> rng = np.random.default_rng()
>>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
... niter=10, callback=print_fun, seed=rng)
at minimum 0.4159 accepted 1
at minimum -0.4317 accepted 1
at minimum -1.0109 accepted 1
at minimum -0.9073 accepted 1
at minimum -0.4317 accepted 0
at minimum -0.1021 accepted 1
at minimum -0.7425 accepted 1
at minimum -0.9073 accepted 1
at minimum -0.4317 accepted 0
at minimum -0.7425 accepted 1
at minimum -0.9073 accepted 1
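The minimum at -1.0109 is actually the global minimum; in the run shown
here it is found within the first few iterations. Because ``rng`` is not
seeded, the exact sequence of minima differs from run to run.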
""" # numpy/numpydoc#87 # noqa: E501
if target_accept_rate <= 0. or target_accept_rate >= 1.:
raise ValueError('target_accept_rate has to be in range (0, 1)')
if stepwise_factor <= 0. or stepwise_factor >= 1.:
raise ValueError('stepwise_factor has to be in range (0, 1)')
x0 = np.array(x0)
# set up the np.random generator
rng = check_random_state(seed)
# set up minimizer
if minimizer_kwargs is None:
minimizer_kwargs = dict()
wrapped_minimizer = MinimizerWrapper(scipy.optimize.minimize, func,
**minimizer_kwargs)
# set up step-taking algorithm
if take_step is not None:
if not callable(take_step):
raise TypeError("take_step must be callable")
# if take_step.stepsize exists then use AdaptiveStepsize to control
# take_step.stepsize
if hasattr(take_step, "stepsize"):
take_step_wrapped = AdaptiveStepsize(
take_step, interval=interval,
accept_rate=target_accept_rate,
factor=stepwise_factor,
verbose=disp)
else:
take_step_wrapped = take_step
else:
# use default
displace = RandomDisplacement(stepsize=stepsize, random_gen=rng)
take_step_wrapped = AdaptiveStepsize(displace, interval=interval,
accept_rate=target_accept_rate,
factor=stepwise_factor,
verbose=disp)
# set up accept tests
accept_tests = []
if accept_test is not None:
if not callable(accept_test):
raise TypeError("accept_test must be callable")
accept_tests = [accept_test]
# use default
metropolis = Metropolis(T, random_gen=rng)
accept_tests.append(metropolis)
if niter_success is None:
niter_success = niter + 2
bh = BasinHoppingRunner(x0, wrapped_minimizer, take_step_wrapped,
accept_tests, disp=disp)
# The wrapped minimizer is called once during construction of
# BasinHoppingRunner, so run the callback
if callable(callback):
callback(bh.storage.minres.x, bh.storage.minres.fun, True)
# start main iteration loop
count, i = 0, 0
message = ["requested number of basinhopping iterations completed"
" successfully"]
for i in range(niter):
new_global_min = bh.one_cycle()
if callable(callback):
# should we pass a copy of x?
val = callback(bh.xtrial, bh.energy_trial, bh.accept)
if val is not None:
if val:
message = ["callback function requested stop early by "
"returning True"]
break
count += 1
if new_global_min:
count = 0
elif count > niter_success:
message = ["success condition satisfied"]
break
# prepare return object
res = bh.res
res.lowest_optimization_result = bh.storage.get_lowest()
res.x = np.copy(res.lowest_optimization_result.x)
res.fun = res.lowest_optimization_result.fun
res.message = message
res.nit = i + 1
res.success = res.lowest_optimization_result.success
return res
@@ -0,0 +1,663 @@
import numpy as np
import scipy._lib._elementwise_iterative_method as eim
from scipy._lib._util import _RichResult
_ELIMITS = -1 # used in _bracket_root
_ESTOPONESIDE = 2 # used in _bracket_root
def _bracket_root_iv(func, xl0, xr0, xmin, xmax, factor, args, maxiter):
if not callable(func):
raise ValueError('`func` must be callable.')
if not np.iterable(args):
args = (args,)
xl0 = np.asarray(xl0)[()]
if not np.issubdtype(xl0.dtype, np.number) or np.iscomplex(xl0).any():
raise ValueError('`xl0` must be numeric and real.')
xr0 = xl0 + 1 if xr0 is None else xr0
xmin = -np.inf if xmin is None else xmin
xmax = np.inf if xmax is None else xmax
factor = 2. if factor is None else factor
xl0, xr0, xmin, xmax, factor = np.broadcast_arrays(xl0, xr0, xmin, xmax, factor)
if not np.issubdtype(xr0.dtype, np.number) or np.iscomplex(xr0).any():
raise ValueError('`xr0` must be numeric and real.')
if not np.issubdtype(xmin.dtype, np.number) or np.iscomplex(xmin).any():
raise ValueError('`xmin` must be numeric and real.')
if not np.issubdtype(xmax.dtype, np.number) or np.iscomplex(xmax).any():
raise ValueError('`xmax` must be numeric and real.')
if not np.issubdtype(factor.dtype, np.number) or np.iscomplex(factor).any():
raise ValueError('`factor` must be numeric and real.')
if not np.all(factor > 1):
raise ValueError('All elements of `factor` must be greater than 1.')
maxiter = np.asarray(maxiter)
message = '`maxiter` must be a non-negative integer.'
if (not np.issubdtype(maxiter.dtype, np.number) or maxiter.shape != tuple()
or np.iscomplex(maxiter)):
raise ValueError(message)
maxiter_int = int(maxiter[()])
if not maxiter == maxiter_int or maxiter < 0:
raise ValueError(message)
if not np.all((xmin <= xl0) & (xl0 < xr0) & (xr0 <= xmax)):
raise ValueError('`xmin <= xl0 < xr0 <= xmax` must be True (elementwise).')
return func, xl0, xr0, xmin, xmax, factor, args, maxiter
def _bracket_root(func, xl0, xr0=None, *, xmin=None, xmax=None, factor=None,
args=(), maxiter=1000):
"""Bracket the root of a monotonic scalar function of one variable
This function works elementwise when `xl0`, `xr0`, `xmin`, `xmax`, `factor`, and
the elements of `args` are broadcastable arrays.
Parameters
----------
func : callable
The function for which the root is to be bracketed.
The signature must be::
func(x: ndarray, *args) -> ndarray
where each element of ``x`` is a finite real and ``args`` is a tuple,
which may contain an arbitrary number of arrays that are broadcastable
with `x`. ``func`` must be an elementwise function: each element
``func(x)[i]`` must equal ``func(x[i])`` for all indices ``i``.
xl0, xr0: float array_like
Starting guess of bracket, which need not contain a root. If `xr0` is
not provided, ``xr0 = xl0 + 1``. Must be broadcastable with one another.
xmin, xmax : float array_like, optional
Minimum and maximum allowable endpoints of the bracket, inclusive. Must
be broadcastable with `xl0` and `xr0`.
factor : float array_like, default: 2
The factor used to grow the bracket. See notes for details.
args : tuple, optional
Additional positional arguments to be passed to `func`. Must be arrays
broadcastable with `xl0`, `xr0`, `xmin`, and `xmax`. If the callable to be
bracketed requires arguments that are not broadcastable with these
arrays, wrap that callable with `func` such that `func` accepts
only `x` and broadcastable arrays.
maxiter : int, optional
The maximum number of iterations of the algorithm to perform.
Returns
-------
res : _RichResult
An instance of `scipy._lib._util._RichResult` with the following
attributes. The descriptions are written as though the values will be
scalars; however, if `func` returns an array, the outputs will be
arrays of the same shape.
xl, xr : float
The lower and upper ends of the bracket, if the algorithm
terminated successfully.
fl, fr : float
The function value at the lower and upper ends of the bracket.
nfev : int
The number of function evaluations required to find the bracket.
This is distinct from the number of times `func` is *called*
because the function may be evaluated at multiple points in a single
call.
nit : int
The number of iterations of the algorithm that were performed.
status : int
An integer representing the exit status of the algorithm.
- ``0`` : The algorithm produced a valid bracket.
- ``-1`` : The bracket expanded to the allowable limits without finding a bracket.
- ``-2`` : The maximum number of iterations was reached.
- ``-3`` : A non-finite value was encountered.
- ``-4`` : Iteration was terminated by `callback`.
- ``1`` : The algorithm is proceeding normally (in `callback` only).
- ``2`` : A bracket was found in the opposite search direction (in `callback` only).
success : bool
``True`` when the algorithm terminated successfully (status ``0``).
Notes
-----
This function generalizes an algorithm found in pieces throughout
`scipy.stats`. The strategy is to iteratively grow the bracket `(l, r)`
until ``func(l) < 0 < func(r)`` or ``func(r) < 0 < func(l)``. The bracket
grows to the left as follows.
- If `xmin` is not provided, the distance between `xl0` and `l` is iteratively
increased by `factor`.
- If `xmin` is provided, the distance between `xmin` and `l` is iteratively
decreased by `factor`. Note that this also *increases* the bracket size.
Growth of the bracket to the right is analogous.
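The two growth rules can be illustrated with a scalar sketch. The helper
``grow_endpoint`` is hypothetical; it follows the update used for the moving
end of the bracket, where the distance is measured from the fixed end when
there is no limit and from the limit otherwise.

```python
import math


def grow_endpoint(x, fixed, limit, factor=2.0):
    # Hypothetical scalar helper: `x` is the moving end of the bracket,
    # `fixed` is the opposite end, `limit` is `xmin` or `xmax`.
    if math.isinf(limit):
        # No limit: the distance from the fixed end grows by `factor`.
        return fixed + (x - fixed) * factor
    # With a limit: the distance to the limit shrinks by `factor`.
    return limit - (limit - x) / factor


# Leftward search starting from the bracket (0, 1) without a limit.
l1 = grow_endpoint(0.0, 1.0, -math.inf)
l2 = grow_endpoint(l1, 1.0, -math.inf)

# Leftward search from the same bracket with xmin = -4.
m1 = grow_endpoint(0.0, 1.0, -4.0)
m2 = grow_endpoint(m1, 1.0, -4.0)
```

In the limited case the endpoint approaches ``xmin`` in ever smaller steps,
so the bracket still widens on every iteration.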
Growth of the bracket in one direction stops when the endpoint is no longer
finite, the function value at the endpoint is no longer finite, or the
endpoint reaches its limiting value (`xmin` or `xmax`). Iteration terminates
when the bracket stops growing in both directions, the bracket surrounds
the root, or a root is found (accidentally).
If two brackets are found - that is, a bracket is found on both sides in
the same iteration, the smaller of the two is returned.
If roots of the function are found, both `l` and `r` are set to the
leftmost root.
""" # noqa: E501
# Todo:
# - find bracket with sign change in specified direction
# - Add tolerance
# - allow factor < 1?
callback = None # works; I just don't want to test it
temp = _bracket_root_iv(func, xl0, xr0, xmin, xmax, factor, args, maxiter)
func, xl0, xr0, xmin, xmax, factor, args, maxiter = temp
xs = (xl0, xr0)
temp = eim._initialize(func, xs, args)
func, xs, fs, args, shape, dtype = temp # line split for PEP8
# The approach is to treat the left and right searches as though they were
# (almost) totally independent one-sided bracket searches. (The interaction
# is considered when checking for termination and preparing the result
# object.)
# `x` is the "moving" end of the bracket
x = np.concatenate(xs)
f = np.concatenate(fs)
n = len(x) // 2
# `x_last` is the previous location of the moving end of the bracket. If
# the signs of `f` and `f_last` are different, `x` and `x_last` form a
# bracket.
x_last = np.concatenate((x[n:], x[:n]))
f_last = np.concatenate((f[n:], f[:n]))
# `x0` is the "fixed" end of the bracket.
x0 = x_last
# We don't need to retain the corresponding function value, since the
# fixed end of the bracket is only needed to compute the new value of the
# moving end; it is never returned.
xmin = np.broadcast_to(xmin, shape).astype(dtype, copy=False).ravel()
xmax = np.broadcast_to(xmax, shape).astype(dtype, copy=False).ravel()
limit = np.concatenate((xmin, xmax))
factor = np.broadcast_to(factor, shape).astype(dtype, copy=False).ravel()
factor = np.concatenate((factor, factor))
active = np.arange(2*n)
args = [np.concatenate((arg, arg)) for arg in args]
# This is needed due to inner workings of `eim._loop`.
# We're abusing it a tiny bit.
shape = shape + (2,)
# `d` is for "distance".
# For searches without a limit, the distance between the fixed end of the
# bracket `x0` and the moving end `x` will grow by `factor` each iteration.
# For searches with a limit, the distance between the `limit` and moving
# end of the bracket `x` will shrink by `factor` each iteration.
i = np.isinf(limit)
ni = ~i
d = np.zeros_like(x)
d[i] = x[i] - x0[i]
d[ni] = limit[ni] - x[ni]
status = np.full_like(x, eim._EINPROGRESS, dtype=int) # in progress
nit, nfev = 0, 1 # one function evaluation per side performed above
work = _RichResult(x=x, x0=x0, f=f, limit=limit, factor=factor,
active=active, d=d, x_last=x_last, f_last=f_last,
nit=nit, nfev=nfev, status=status, args=args,
xl=None, xr=None, fl=None, fr=None, n=n)
res_work_pairs = [('status', 'status'), ('xl', 'xl'), ('xr', 'xr'),
('nit', 'nit'), ('nfev', 'nfev'), ('fl', 'fl'),
('fr', 'fr'), ('x', 'x'), ('f', 'f'),
('x_last', 'x_last'), ('f_last', 'f_last')]
def pre_func_eval(work):
# Initialize moving end of bracket
x = np.zeros_like(work.x)
# Unlimited brackets grow by `factor` by increasing distance from fixed
# end to moving end.
i = np.isinf(work.limit) # indices of unlimited brackets
work.d[i] *= work.factor[i]
x[i] = work.x0[i] + work.d[i]
# Limited brackets grow by decreasing the distance from the limit to
# the moving end.
ni = ~i # indices of limited brackets
work.d[ni] /= work.factor[ni]
x[ni] = work.limit[ni] - work.d[ni]
return x
def post_func_eval(x, f, work):
# Keep track of the previous location of the moving end so that we can
# return a narrower bracket. (The alternative is to remember the
# original fixed end, but then the bracket would be wider than needed.)
work.x_last = work.x
work.f_last = work.f
work.x = x
work.f = f
def check_termination(work):
stop = np.zeros_like(work.x, dtype=bool)
# Condition 1: a valid bracket (or the root itself) has been found
sf = np.sign(work.f)
sf_last = np.sign(work.f_last)
i = (sf_last == -sf) | (sf_last == 0) | (sf == 0)
work.status[i] = eim._ECONVERGED
stop[i] = True
# Condition 2: the other side's search found a valid bracket.
# (If we just found a bracket with the rightward search, we can stop
# the leftward search, and vice-versa.)
# To do this, we need to set the status of the other side's search;
# this is tricky because `work.status` contains only the *active*
# elements, so we don't immediately know the index of the element we
# need to set - or even if it's still there. (That search may have
# terminated already, e.g. by reaching its `limit`.)
# To facilitate this, `work.active` contains a unique integer index of
# each search. Index `k` (`k < n`) and `k + n` correspond with a
# leftward and rightward search, respectively. Elements are removed
# from `work.active` just as they are removed from `work.status`, so
# we use `work.active` to help find the right location in
# `work.status`.
# Get the integer indices of the elements that can also stop
also_stop = (work.active[i] + work.n) % (2*work.n)
# Check whether they are still active.
# To start, we need to find out where in `work.active` they would
# appear if they are indeed there.
j = np.searchsorted(work.active, also_stop)
# If the location exceeds the length of the `work.active`, they are
# not there.
j = j[j < len(work.active)]
# Check whether they are still there.
j = j[also_stop == work.active[j]]
# Now convert these to boolean indices to use with `work.status`.
i = np.zeros_like(stop)
i[j] = True # boolean indices of elements that can also stop
i = i & ~stop
work.status[i] = _ESTOPONESIDE
stop[i] = True
# Condition 3: moving end of bracket reaches limit
i = (work.x == work.limit) & ~stop
work.status[i] = _ELIMITS
stop[i] = True
# Condition 4: non-finite value encountered
i = ~(np.isfinite(work.x) & np.isfinite(work.f)) & ~stop
work.status[i] = eim._EVALUEERR
stop[i] = True
return stop
def post_termination_check(work):
pass
def customize_result(res, shape):
n = len(res['x']) // 2
# To avoid ambiguity, below we refer to `xl0`, the initial left endpoint
# as `a` and `xr0`, the initial right endpoint, as `b`.
# Because we treat the two one-sided searches as though they were
# independent, what we keep track of in `work` and what we want to
# return in `res` look quite different. Combine the results from the
# two one-sided searches before reporting the results to the user.
# - "a" refers to the leftward search (the moving end started at `a`)
# - "b" refers to the rightward search (the moving end started at `b`)
# - "l" refers to the left end of the bracket (closer to -oo)
# - "r" refers to the right end of the bracket (closer to +oo)
xal = res['x'][:n]
xar = res['x_last'][:n]
xbl = res['x_last'][n:]
xbr = res['x'][n:]
fal = res['f'][:n]
far = res['f_last'][:n]
fbl = res['f_last'][n:]
fbr = res['f'][n:]
# Initialize the brackets and corresponding function values to return
# to the user. Brackets may not be valid (e.g. there is no root,
# there weren't enough iterations, NaN encountered), but we still need
# to return something. One option would be all NaNs, but what I've
# chosen here is the left- and right-most points at which the function
# has been evaluated. This gives the user some information about what
# interval of the real line has been searched and shows that there is
# no sign change between the two ends.
xl = xal.copy()
fl = fal.copy()
xr = xbr.copy()
fr = fbr.copy()
# `status` indicates whether the bracket is valid or not. If so,
# we want to adjust the bracket we return to be the narrowest possible
# given the points at which we evaluated the function.
# For example if bracket "a" is valid and smaller than bracket "b" OR
# if bracket "a" is valid and bracket "b" is not valid, we want to
# return bracket "a" (and vice versa).
sa = res['status'][:n]
sb = res['status'][n:]
da = xar - xal
db = xbr - xbl
i1 = ((da <= db) & (sa == 0)) | ((sa == 0) & (sb != 0))
i2 = ((db <= da) & (sb == 0)) | ((sb == 0) & (sa != 0))
xr[i1] = xar[i1]
fr[i1] = far[i1]
xl[i2] = xbl[i2]
fl[i2] = fbl[i2]
# Finish assembling the result object
res['xl'] = xl
res['xr'] = xr
res['fl'] = fl
res['fr'] = fr
res['nit'] = np.maximum(res['nit'][:n], res['nit'][n:])
res['nfev'] = res['nfev'][:n] + res['nfev'][n:]
# If the status on one side is zero, the status is zero. In any case,
# report the status from one side only.
res['status'] = np.choose(sa == 0, (sb, sa))
res['success'] = (res['status'] == 0)
del res['x']
del res['f']
del res['x_last']
del res['f_last']
return shape[:-1]
return eim._loop(work, callback, shape, maxiter, func, args, dtype,
pre_func_eval, post_func_eval, check_termination,
post_termination_check, customize_result, res_work_pairs)
def _bracket_minimum_iv(func, xm0, xl0, xr0, xmin, xmax, factor, args, maxiter):
if not callable(func):
raise ValueError('`func` must be callable.')
if not np.iterable(args):
args = (args,)
xm0 = np.asarray(xm0)[()]
if not np.issubdtype(xm0.dtype, np.number) or np.iscomplex(xm0).any():
raise ValueError('`xm0` must be numeric and real.')
xmin = -np.inf if xmin is None else xmin
xmax = np.inf if xmax is None else xmax
xl0_not_supplied = False
if xl0 is None:
xl0 = xm0 - 0.5
xl0_not_supplied = True
xr0_not_supplied = False
if xr0 is None:
xr0 = xm0 + 0.5
xr0_not_supplied = True
factor = 2.0 if factor is None else factor
xl0, xm0, xr0, xmin, xmax, factor = np.broadcast_arrays(
xl0, xm0, xr0, xmin, xmax, factor
)
if not np.issubdtype(xl0.dtype, np.number) or np.iscomplex(xl0).any():
raise ValueError('`xl0` must be numeric and real.')
if not np.issubdtype(xr0.dtype, np.number) or np.iscomplex(xr0).any():
raise ValueError('`xr0` must be numeric and real.')
if not np.issubdtype(xmin.dtype, np.number) or np.iscomplex(xmin).any():
raise ValueError('`xmin` must be numeric and real.')
if not np.issubdtype(xmax.dtype, np.number) or np.iscomplex(xmax).any():
raise ValueError('`xmax` must be numeric and real.')
if not np.issubdtype(factor.dtype, np.number) or np.iscomplex(factor).any():
raise ValueError('`factor` must be numeric and real.')
if not np.all(factor > 1):
raise ValueError('All elements of `factor` must be greater than 1.')
# Default choices for xl or xr might have exceeded xmin or xmax. Adjust
# to make sure this doesn't happen. We replace with copies because xl0 and
# xr0 are read-only views produced by broadcast_arrays.
if xl0_not_supplied:
xl0 = xl0.copy()
cond = ~np.isinf(xmin) & (xl0 < xmin)
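# Place the default left endpoint 1/16 of the way from xm0 toward xmin so
# that xmin <= xl0 < xm0 holds.
xl0[cond] = xm0[cond] - (
xm0[cond] - xmin[cond]
) / np.array(16, dtype=xl0.dtype)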
if xr0_not_supplied:
xr0 = xr0.copy()
cond = ~np.isinf(xmax) & (xmax < xr0)
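# Place the default right endpoint 1/16 of the way from xm0 toward xmax so
# that xm0 < xr0 <= xmax holds.
xr0[cond] = xm0[cond] + (
xmax[cond] - xm0[cond]
) / np.array(16, dtype=xr0.dtype)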
maxiter = np.asarray(maxiter)
message = '`maxiter` must be a non-negative integer.'
if (not np.issubdtype(maxiter.dtype, np.number) or maxiter.shape != tuple()
or np.iscomplex(maxiter)):
raise ValueError(message)
maxiter_int = int(maxiter[()])
if not maxiter == maxiter_int or maxiter < 0:
raise ValueError(message)
if not np.all((xmin <= xl0) & (xl0 < xm0) & (xm0 < xr0) & (xr0 <= xmax)):
raise ValueError(
'`xmin <= xl0 < xm0 < xr0 <= xmax` must be True (elementwise).'
)
return func, xm0, xl0, xr0, xmin, xmax, factor, args, maxiter
def _bracket_minimum(func, xm0, *, xl0=None, xr0=None, xmin=None, xmax=None,
factor=None, args=(), maxiter=1000):
"""Bracket the minimum of a unimodal scalar function of one variable
This function works elementwise when `xm0`, `xl0`, `xr0`, `xmin`, `xmax`,
and the elements of `args` are broadcastable arrays.
Parameters
----------
func : callable
The function for which the minimum is to be bracketed.
The signature must be::
func(x: ndarray, *args) -> ndarray
where each element of ``x`` is a finite real and ``args`` is a tuple,
which may contain an arbitrary number of arrays that are broadcastable
with ``x``. `func` must be an elementwise function: each element
``func(x)[i]`` must equal ``func(x[i])`` for all indices `i`.
xm0: float array_like
Starting guess for middle point of bracket.
xl0, xr0: float array_like, optional
Starting guesses for left and right endpoints of the bracket. Must be
broadcastable with one another and with `xm0`.
xmin, xmax : float array_like, optional
Minimum and maximum allowable endpoints of the bracket, inclusive. Must
be broadcastable with `xl0`, `xm0`, and `xr0`.
factor : float array_like, optional
Controls expansion of bracket endpoint in downhill direction. Works
differently in the cases where a limit is set in the downhill direction
with `xmax` or `xmin`. See Notes.
args : tuple, optional
Additional positional arguments to be passed to `func`. Must be arrays
broadcastable with `xl0`, `xm0`, `xr0`, `xmin`, and `xmax`. If the
callable to be bracketed requires arguments that are not broadcastable
with these arrays, wrap that callable with `func` such that `func`
accepts only ``x`` and broadcastable arrays.
maxiter : int, optional
The maximum number of iterations of the algorithm to perform. The number
of function evaluations is three greater than the number of iterations.
Returns
-------
res : _RichResult
An instance of `scipy._lib._util._RichResult` with the following
attributes. The descriptions are written as though the values will be
scalars; however, if `func` returns an array, the outputs will be
arrays of the same shape.
xl, xm, xr : float
The left, middle, and right points of the bracket, if the algorithm
terminated successfully.
fl, fm, fr : float
The function value at the left, middle, and right points of the bracket.
nfev : int
The number of function evaluations required to find the bracket.
nit : int
The number of iterations of the algorithm that were performed.
status : int
An integer representing the exit status of the algorithm.
- ``0`` : The algorithm produced a valid bracket.
- ``-1`` : The bracket expanded to the allowable limits. Assuming
unimodality, this implies the endpoint at the limit is a
minimizer.
- ``-2`` : The maximum number of iterations was reached.
- ``-3`` : A non-finite value was encountered.
success : bool
``True`` when the algorithm terminated successfully (status ``0``).
Notes
-----
Similar to `scipy.optimize.bracket`, this function seeks to find real
points ``xl < xm < xr`` such that ``f(xl) >= f(xm)`` and ``f(xr) >= f(xm)``,
where at least one of the inequalities is strict. Unlike `scipy.optimize.bracket`,
this function can operate in a vectorized manner on array input, so long as
the input arrays are broadcastable with each other. Also unlike
`scipy.optimize.bracket`, users may specify minimum and maximum endpoints
for the desired bracket.
Given an initial trio of points ``xl = xl0``, ``xm = xm0``, ``xr = xr0``,
the algorithm checks if these points already give a valid bracket. If not,
a new endpoint, ``w``, is chosen in the "downhill" direction, ``xm`` becomes the new
opposite endpoint, and either `xl` or `xr` becomes the new middle point,
depending on which direction is downhill. The algorithm repeats from here.
The new endpoint `w` is chosen differently depending on whether or not a
boundary `xmin` or `xmax` has been set in the downhill direction. Without
loss of generality, suppose the downhill direction is to the right, so that
``f(xl) > f(xm) > f(xr)``. If there is no boundary to the right, then `w`
is chosen to be ``xr + factor * (xr - xm)`` where `factor` is controlled by
the user (defaults to 2.0) so that step sizes increase in geometric proportion.
If there is a boundary, `xmax` in this case, then `w` is chosen to be
``xmax - (xmax - xr)/factor``, with steps slowing to a stop at
`xmax`. This cautious approach ensures that a minimum near but distinct from
the boundary isn't missed while also detecting whether or not `xmax` is
a minimizer when `xmax` is reached after a finite number of steps.
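The choice of ``w`` for a rightward (downhill) step can be written as a
small scalar sketch. The function ``next_endpoint`` is hypothetical and only
restates the two formulas given above; it is not the vectorized routine
itself.

```python
import math


def next_endpoint(xm, xr, xmax=math.inf, factor=2.0):
    # Hypothetical scalar helper for a downhill step to the right.
    if math.isinf(xmax):
        # No boundary: step past `xr` by `factor` times the last gap.
        return xr + factor * (xr - xm)
    # Boundary at `xmax`: cover 1 - 1/factor of the remaining distance.
    return xmax - (xmax - xr) / factor


w_free = next_endpoint(0.0, 0.5)
w_bounded = next_endpoint(0.0, 0.5, xmax=1.5)
```

With the default ``factor`` of 2, the unbounded step lands at 1.5 and the
bounded step lands halfway between ``xr`` and ``xmax``.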
""" # noqa: E501
callback = None # works; I just don't want to test it
temp = _bracket_minimum_iv(func, xm0, xl0, xr0, xmin, xmax, factor, args, maxiter)
func, xm0, xl0, xr0, xmin, xmax, factor, args, maxiter = temp
xs = (xl0, xm0, xr0)
func, xs, fs, args, shape, dtype = eim._initialize(func, xs, args)
xl0, xm0, xr0 = xs
fl0, fm0, fr0 = fs
xmin = np.broadcast_to(xmin, shape).astype(dtype, copy=False).ravel()
xmax = np.broadcast_to(xmax, shape).astype(dtype, copy=False).ravel()
# We will modify factor later on so make a copy. np.broadcast_to returns
# a read-only view.
factor = np.broadcast_to(factor, shape).astype(dtype, copy=True).ravel()
# To simplify the logic, swap xl and xr if f(xl) < f(xr). We should always be
# marching downhill in the direction from xl to xr.
comp = fl0 < fr0
xl0[comp], xr0[comp] = xr0[comp], xl0[comp]
fl0[comp], fr0[comp] = fr0[comp], fl0[comp]
# We only need the boundary in the direction we're traveling.
limit = np.where(comp, xmin, xmax)
unlimited = np.isinf(limit)
limited = ~unlimited
step = np.empty_like(xl0)
step[unlimited] = (xr0[unlimited] - xm0[unlimited])
step[limited] = (limit[limited] - xr0[limited])
# Step size is divided by factor for case where there is a limit.
factor[limited] = 1 / factor[limited]
status = np.full_like(xl0, eim._EINPROGRESS, dtype=int)
nit, nfev = 0, 3
work = _RichResult(xl=xl0, xm=xm0, xr=xr0, xr0=xr0, fl=fl0, fm=fm0, fr=fr0,
step=step, limit=limit, limited=limited, factor=factor, nit=nit,
nfev=nfev, status=status, args=args)
res_work_pairs = [('status', 'status'), ('xl', 'xl'), ('xm', 'xm'), ('xr', 'xr'),
('nit', 'nit'), ('nfev', 'nfev'), ('fl', 'fl'), ('fm', 'fm'),
('fr', 'fr')]
def pre_func_eval(work):
work.step *= work.factor
x = np.empty_like(work.xr)
x[~work.limited] = work.xr0[~work.limited] + work.step[~work.limited]
x[work.limited] = work.limit[work.limited] - work.step[work.limited]
# Since the new bracket endpoint is calculated from an offset with the
# limit, it may be the case that the new endpoint equals the old endpoint,
# when the old endpoint is sufficiently close to the limit. We use the
# limit itself as the new endpoint in these cases.
x[work.limited] = np.where(
x[work.limited] == work.xr[work.limited],
work.limit[work.limited],
x[work.limited],
)
return x
def post_func_eval(x, f, work):
work.xl, work.xm, work.xr = work.xm, work.xr, x
work.fl, work.fm, work.fr = work.fm, work.fr, f
def check_termination(work):
# Condition 1: A valid bracket has been found.
stop = (
(work.fl >= work.fm) & (work.fr > work.fm)
| (work.fl > work.fm) & (work.fr >= work.fm)
)
work.status[stop] = eim._ECONVERGED
# Condition 2: Moving end of bracket reaches limit.
i = (work.xr == work.limit) & ~stop
work.status[i] = _ELIMITS
stop[i] = True
# Condition 3: non-finite value encountered
i = ~(np.isfinite(work.xr) & np.isfinite(work.fr)) & ~stop
work.status[i] = eim._EVALUEERR
stop[i] = True
return stop
def post_termination_check(work):
pass
def customize_result(res, shape):
# Reorder entries of xl and xr if they were swapped due to f(xl0) < f(xr0).
comp = res['xl'] > res['xr']
res['xl'][comp], res['xr'][comp] = res['xr'][comp], res['xl'][comp]
res['fl'][comp], res['fr'][comp] = res['fr'][comp], res['fl'][comp]
return shape
return eim._loop(work, callback, shape,
maxiter, func, args, dtype,
pre_func_eval, post_func_eval,
check_termination, post_termination_check,
customize_result, res_work_pairs)
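# Illustrative sketch only (not part of SciPy): a scalar, unbounded version of
# the bracket-expansion loop implemented above. The name
# `_sketch_bracket_minimum_scalar` and its default arguments are assumptions
# made for this example; the `xmin`/`xmax` limits, array broadcasting, and
# status codes handled above are deliberately left out.
def _sketch_bracket_minimum_scalar(f, xm0, xl0, xr0, factor=2.0, maxiter=100):
    xl, xm, xr = xl0, xm0, xr0
    fl, fm, fr = f(xl), f(xm), f(xr)
    # Swap the ends so that we always march downhill from xl toward xr.
    if fl < fr:
        xl, xr = xr, xl
        fl, fr = fr, fl
    x_start = xr          # new points are offsets from the original moving end
    step = xr - xm        # signed initial step
    nit = 0
    # A valid bracket has a middle value no larger than both end values,
    # and strictly smaller than at least one of them.
    while not ((fl >= fm and fr > fm) or (fl > fm and fr >= fm)):
        if nit >= maxiter:
            raise RuntimeError("no bracket found within maxiter iterations")
        step *= factor    # grow the step geometrically
        x = x_start + step
        xl, xm, xr = xm, xr, x
        fl, fm, fr = fm, fr, f(x)
        nit += 1
    # Undo the swap so that xl < xm < xr on return.
    if xl > xr:
        xl, xr = xr, xl
        fl, fr = fr, fl
    return (xl, xm, xr), (fl, fm, fr)
# Assumed usage: `_sketch_bracket_minimum_scalar(lambda x: (x - 10)**2, 0, -1, 1)`
# expands to the right until the three points enclose the minimizer at 10.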
@@ -0,0 +1,524 @@
import numpy as np
from ._zeros_py import _xtol, _rtol, _iter
import scipy._lib._elementwise_iterative_method as eim
from scipy._lib._util import _RichResult
def _chandrupatla(func, a, b, *, args=(), xatol=_xtol, xrtol=_rtol,
fatol=None, frtol=0, maxiter=_iter, callback=None):
"""Find the root of an elementwise function using Chandrupatla's algorithm.
For each element of the output of `func`, `chandrupatla` seeks the scalar
root that makes the element 0. This function allows for `a`, `b`, and the
output of `func` to be of any broadcastable shapes.
Parameters
----------
func : callable
The function whose root is desired. The signature must be::
func(x: ndarray, *args) -> ndarray
where each element of ``x`` is a finite real and ``args`` is a tuple,
which may contain an arbitrary number of components of any type(s).
``func`` must be an elementwise function: each element ``func(x)[i]``
must equal ``func(x[i])`` for all indices ``i``. `_chandrupatla`
seeks an array ``x`` such that ``func(x)`` is an array of zeros.
a, b : array_like
The lower and upper bounds of the root of the function. Must be
broadcastable with one another.
args : tuple, optional
Additional positional arguments to be passed to `func`.
xatol, xrtol, fatol, frtol : float, optional
Absolute and relative tolerances on the root and function value.
See Notes for details.
maxiter : int, optional
The maximum number of iterations of the algorithm to perform.
callback : callable, optional
An optional user-supplied function to be called before the first
iteration and after each iteration.
Called as ``callback(res)``, where ``res`` is a ``_RichResult``
similar to that returned by `_chandrupatla` (but containing the current
iterate's values of all variables). If `callback` raises a
``StopIteration``, the algorithm will terminate immediately and
`_chandrupatla` will return a result.
Returns
-------
res : _RichResult
An instance of `scipy._lib._util._RichResult` with the following
attributes. The descriptions are written as though the values will be
scalars; however, if `func` returns an array, the outputs will be
arrays of the same shape.
x : float
The root of the function, if the algorithm terminated successfully.
nfev : int
The number of times the function was called to find the root.
nit : int
The number of iterations of Chandrupatla's algorithm performed.
status : int
An integer representing the exit status of the algorithm.
``0`` : The algorithm converged to the specified tolerances.
``-1`` : The algorithm encountered an invalid bracket.
``-2`` : The maximum number of iterations was reached.
``-3`` : A non-finite value was encountered.
``-4`` : Iteration was terminated by `callback`.
``1`` : The algorithm is proceeding normally (in `callback` only).
success : bool
``True`` when the algorithm terminated successfully (status ``0``).
fun : float
The value of `func` evaluated at `x`.
xl, xr : float
The lower and upper ends of the bracket.
fl, fr : float
The function value at the lower and upper ends of the bracket.
Notes
-----
Implemented based on Chandrupatla's original paper [1]_.
If ``xl`` and ``xr`` are the left and right ends of the bracket,
``xmin = xl if abs(func(xl)) <= abs(func(xr)) else xr``,
and ``fmin0 = min(abs(func(a)), abs(func(b)))``, then the algorithm is
considered to have converged when
``abs(xr - xl) < xatol + abs(xmin) * xrtol`` or
``abs(func(xmin)) <= fatol + fmin0 * frtol``. This is equivalent to the
termination condition described in [1]_ with ``xrtol = 4e-10``,
``xatol = 1e-5``, and ``fatol = frtol = 0``. The default values are
``xatol = 2e-12``, ``xrtol = 4 * np.finfo(float).eps``, ``frtol = 0``,
and ``fatol`` is the smallest normal number of the ``dtype`` returned
by ``func``.
References
----------
.. [1] Chandrupatla, Tirupathi R.
"A new hybrid quadratic/bisection algorithm for finding the zero of a
nonlinear function without using derivatives".
Advances in Engineering Software, 28(3), 145-149.
https://doi.org/10.1016/s0965-9978(96)00051-8
See Also
--------
brentq, brenth, ridder, bisect, newton
Examples
--------
>>> from scipy import optimize
>>> def f(x, c):
... return x**3 - 2*x - c
>>> c = 5
>>> res = optimize._chandrupatla._chandrupatla(f, 0, 3, args=(c,))
>>> res.x
2.0945514818937463
>>> c = [3, 4, 5]
>>> res = optimize._chandrupatla._chandrupatla(f, 0, 3, args=(c,))
>>> res.x
array([1.8932892 , 2. , 2.09455148])
"""
res = _chandrupatla_iv(func, args, xatol, xrtol,
fatol, frtol, maxiter, callback)
func, args, xatol, xrtol, fatol, frtol, maxiter, callback = res
# Initialization
temp = eim._initialize(func, (a, b), args)
func, xs, fs, args, shape, dtype = temp
x1, x2 = xs
f1, f2 = fs
status = np.full_like(x1, eim._EINPROGRESS, dtype=int) # in progress
nit, nfev = 0, 2 # two function evaluations performed above
xatol = _xtol if xatol is None else xatol
xrtol = _rtol if xrtol is None else xrtol
fatol = np.finfo(dtype).tiny if fatol is None else fatol
frtol = frtol * np.minimum(np.abs(f1), np.abs(f2))
work = _RichResult(x1=x1, f1=f1, x2=x2, f2=f2, x3=None, f3=None, t=0.5,
xatol=xatol, xrtol=xrtol, fatol=fatol, frtol=frtol,
nit=nit, nfev=nfev, status=status)
res_work_pairs = [('status', 'status'), ('x', 'xmin'), ('fun', 'fmin'),
('nit', 'nit'), ('nfev', 'nfev'), ('xl', 'x1'),
('fl', 'f1'), ('xr', 'x2'), ('fr', 'f2')]
def pre_func_eval(work):
# [1] Figure 1 (first box)
x = work.x1 + work.t * (work.x2 - work.x1)
return x
def post_func_eval(x, f, work):
# [1] Figure 1 (first diamond and boxes)
# Note: y/n are reversed in figure; compare to BASIC in appendix
work.x3, work.f3 = work.x2.copy(), work.f2.copy()
j = np.sign(f) == np.sign(work.f1)
nj = ~j
work.x3[j], work.f3[j] = work.x1[j], work.f1[j]
work.x2[nj], work.f2[nj] = work.x1[nj], work.f1[nj]
work.x1, work.f1 = x, f
def check_termination(work):
# [1] Figure 1 (second diamond)
# Check for all terminal conditions and record statuses.
# See [1] Section 4 (first two sentences)
i = np.abs(work.f1) < np.abs(work.f2)
work.xmin = np.choose(i, (work.x2, work.x1))
work.fmin = np.choose(i, (work.f2, work.f1))
stop = np.zeros_like(work.x1, dtype=bool) # termination condition met
# This is the convergence criterion used in bisect. Chandrupatla's
# criterion is equivalent to this except with a factor of 4 on `xrtol`.
work.dx = abs(work.x2 - work.x1)
work.tol = abs(work.xmin) * work.xrtol + work.xatol
i = work.dx < work.tol
# Modify in place to incorporate tolerance on function value. Note that
# `frtol` has been redefined as
# `frtol = frtol * np.minimum(np.abs(f1), np.abs(f2))`, where `f1` and `f2`
# are the function evaluated at the original ends of the bracket.
i |= np.abs(work.fmin) <= work.fatol + work.frtol
work.status[i] = eim._ECONVERGED
stop[i] = True
i = (np.sign(work.f1) == np.sign(work.f2)) & ~stop
work.xmin[i], work.fmin[i], work.status[i] = np.nan, np.nan, eim._ESIGNERR
stop[i] = True
i = ~((np.isfinite(work.x1) & np.isfinite(work.x2)
& np.isfinite(work.f1) & np.isfinite(work.f2)) | stop)
work.xmin[i], work.fmin[i], work.status[i] = np.nan, np.nan, eim._EVALUEERR
stop[i] = True
return stop
def post_termination_check(work):
# [1] Figure 1 (third diamond and boxes / Equation 1)
xi1 = (work.x1 - work.x2) / (work.x3 - work.x2)
phi1 = (work.f1 - work.f2) / (work.f3 - work.f2)
alpha = (work.x3 - work.x1) / (work.x2 - work.x1)
j = ((1 - np.sqrt(1 - xi1)) < phi1) & (phi1 < np.sqrt(xi1))
f1j, f2j, f3j, alphaj = work.f1[j], work.f2[j], work.f3[j], alpha[j]
t = np.full_like(alpha, 0.5)
t[j] = (f1j / (f1j - f2j) * f3j / (f3j - f2j)
- alphaj * f1j / (f3j - f1j) * f2j / (f2j - f3j))
# [1] Figure 1 (last box; see also BASIC in appendix with comment
# "Adjust T Away from the Interval Boundary")
tl = 0.5 * work.tol / work.dx
work.t = np.clip(t, tl, 1 - tl)
def customize_result(res, shape):
xl, xr, fl, fr = res['xl'], res['xr'], res['fl'], res['fr']
i = res['xl'] < res['xr']
res['xl'] = np.choose(i, (xr, xl))
res['xr'] = np.choose(i, (xl, xr))
res['fl'] = np.choose(i, (fr, fl))
res['fr'] = np.choose(i, (fl, fr))
return shape
return eim._loop(work, callback, shape, maxiter, func, args, dtype,
pre_func_eval, post_func_eval, check_termination,
post_termination_check, customize_result, res_work_pairs)
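# Illustrative sketch only (not part of SciPy): a scalar rendering of the
# iteration that `_chandrupatla` performs elementwise above. The name
# `_sketch_chandrupatla_scalar` and its defaults are assumptions made for this
# example. It drops `frtol`, callbacks, and status codes, returns early on an
# exact zero, and adds a range guard on `xi1` that the array code does not need.
import math
import sys
def _sketch_chandrupatla_scalar(f, a, b, xatol=2e-12,
                                xrtol=4 * sys.float_info.epsilon,
                                fatol=sys.float_info.min, maxiter=100):
    x1, x2 = float(a), float(b)
    f1, f2 = f(x1), f(x2)
    if f1 == 0:
        return x1
    if f2 == 0:
        return x2
    if (f1 > 0) == (f2 > 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    t = 0.5  # the first step is plain bisection
    for _ in range(maxiter):
        x = x1 + t * (x2 - x1)
        fx = f(x)
        if fx == 0:
            return x
        # Keep the sign change between (x1, x2); x3 is the discarded point.
        if (fx > 0) == (f1 > 0):
            x3, f3 = x1, f1
        else:
            x3, f3 = x2, f2
            x2, f2 = x1, f1
        x1, f1 = x, fx
        xmin, fmin = (x1, f1) if abs(f1) < abs(f2) else (x2, f2)
        dx = abs(x2 - x1)
        tol = abs(xmin) * xrtol + xatol
        if dx < tol or abs(fmin) <= fatol:
            return xmin
        # Use inverse quadratic interpolation only when it is safe,
        # otherwise fall back to bisection.
        xi1 = (x1 - x2) / (x3 - x2)
        phi1 = (f1 - f2) / (f3 - f2)
        if (0.0 <= xi1 <= 1.0
                and 1 - math.sqrt(1 - xi1) < phi1 < math.sqrt(xi1)):
            alpha = (x3 - x1) / (x2 - x1)
            t = (f1 / (f1 - f2) * f3 / (f3 - f2)
                 - alpha * f1 / (f3 - f1) * f2 / (f2 - f3))
        else:
            t = 0.5
        # Keep the next point away from the ends of the bracket.
        tl = 0.5 * tol / dx
        t = min(max(t, tl), 1 - tl)
    raise RuntimeError("maximum number of iterations reached")
# Assumed usage: `_sketch_chandrupatla_scalar(lambda x: x**3 - 2*x - 5, 0, 3)`
# returns an approximation of the real root near 2.0945514815.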
def _chandrupatla_iv(func, args, xatol, xrtol,
fatol, frtol, maxiter, callback):
# Input validation for `_chandrupatla`
if not callable(func):
raise ValueError('`func` must be callable.')
if not np.iterable(args):
args = (args,)
tols = np.asarray([xatol if xatol is not None else 1,
xrtol if xrtol is not None else 1,
fatol if fatol is not None else 1,
frtol if frtol is not None else 1])
if (not np.issubdtype(tols.dtype, np.number) or np.any(tols < 0)
or np.any(np.isnan(tols)) or tols.shape != (4,)):
raise ValueError('Tolerances must be non-negative scalars.')
maxiter_int = int(maxiter)
if maxiter != maxiter_int or maxiter < 0:
raise ValueError('`maxiter` must be a non-negative integer.')
if callback is not None and not callable(callback):
raise ValueError('`callback` must be callable.')
return func, args, xatol, xrtol, fatol, frtol, maxiter, callback
def _chandrupatla_minimize(func, x1, x2, x3, *, args=(), xatol=None,
xrtol=None, fatol=None, frtol=None, maxiter=100,
callback=None):
"""Find the minimizer of an elementwise function.
For each element of the output of `func`, `_chandrupatla_minimize` seeks
the scalar minimizer that minimizes the element. This function allows for
`x1`, `x2`, `x3`, and the elements of `args` to be arrays of any
broadcastable shapes.
Parameters
----------
func : callable
The function whose minimizer is desired. The signature must be::
func(x: ndarray, *args) -> ndarray
where each element of ``x`` is a finite real and ``args`` is a tuple,
which may contain an arbitrary number of arrays that are broadcastable
with `x`. ``func`` must be an elementwise function: each element
``func(x)[i]`` must equal ``func(x[i])`` for all indices ``i``.
`_chandrupatla_minimize` seeks an array ``x`` such that ``func(x)`` is an array
of minima.
x1, x2, x3 : array_like
The abscissae of a standard scalar minimization bracket. A bracket is
valid if ``x1 < x2 < x3`` and ``func(x1) > func(x2) <= func(x3)``.
Must be broadcastable with one another and `args`.
args : tuple, optional
Additional positional arguments to be passed to `func`. Must be arrays
broadcastable with `x1`, `x2`, and `x3`. If the callable to be
minimized requires arguments that are not broadcastable with `x`,
wrap that callable with `func` such that `func` accepts only `x` and
broadcastable arrays.
xatol, xrtol, fatol, frtol : float, optional
Absolute and relative tolerances on the minimizer and function value.
See Notes for details.
maxiter : int, optional
The maximum number of iterations of the algorithm to perform.
callback : callable, optional
An optional user-supplied function to be called before the first
iteration and after each iteration.
Called as ``callback(res)``, where ``res`` is a ``_RichResult``
similar to that returned by `_chandrupatla_minimize` (but containing
the current iterate's values of all variables). If `callback` raises a
``StopIteration``, the algorithm will terminate immediately and
`_chandrupatla_minimize` will return a result.
Returns
-------
res : _RichResult
An instance of `scipy._lib._util._RichResult` with the following
attributes. (The descriptions are written as though the values will be
scalars; however, if `func` returns an array, the outputs will be
arrays of the same shape.)
success : bool
``True`` when the algorithm terminated successfully (status ``0``).
status : int
An integer representing the exit status of the algorithm.
``0`` : The algorithm converged to the specified tolerances.
``-1`` : The algorithm encountered an invalid bracket.
``-2`` : The maximum number of iterations was reached.
``-3`` : A non-finite value was encountered.
``-4`` : Iteration was terminated by `callback`.
``1`` : The algorithm is proceeding normally (in `callback` only).
x : float
The minimizer of the function, if the algorithm terminated
successfully.
fun : float
The value of `func` evaluated at `x`.
nfev : int
The number of points at which `func` was evaluated.
nit : int
The number of iterations of the algorithm that were performed.
xl, xm, xr : float
The final three-point bracket.
fl, fm, fr : float
The function value at the bracket points.
Notes
-----
Implemented based on Chandrupatla's original paper [1]_.
If ``x1 < x2 < x3`` are the points of the bracket and ``f1 > f2 <= f3``
are the values of ``func`` at those points, then the algorithm is
considered to have converged when ``x3 - x1 <= abs(x2)*xrtol + xatol``
or ``(f1 - 2*f2 + f3)/2 <= abs(f2)*frtol + fatol``. Note that the first of
these differs from the termination conditions described in [1]_. The
default value of `xrtol` is the square root of the precision of the
appropriate dtype, and the default value of `xatol`, `fatol`, and `frtol`
is the smallest normal number of the appropriate dtype.
References
----------
.. [1] Chandrupatla, Tirupathi R. (1998).
"An efficient quadratic fit-sectioning algorithm for minimization
without derivatives".
Computer Methods in Applied Mechanics and Engineering, 152 (1-2),
211-217. https://doi.org/10.1016/S0045-7825(97)00190-4
See Also
--------
golden, brent, bounded
Examples
--------
>>> from scipy.optimize._chandrupatla import _chandrupatla_minimize
>>> def f(x, args=1):
... return (x - args)**2
>>> res = _chandrupatla_minimize(f, -5, 0, 5)
>>> res.x
1.0
>>> c = [1, 1.5, 2]
>>> res = _chandrupatla_minimize(f, -5, 0, 5, args=(c,))
>>> res.x
array([1. , 1.5, 2. ])
"""
res = _chandrupatla_iv(func, args, xatol, xrtol,
fatol, frtol, maxiter, callback)
func, args, xatol, xrtol, fatol, frtol, maxiter, callback = res
# Initialization
xs = (x1, x2, x3)
temp = eim._initialize(func, xs, args)
func, xs, fs, args, shape, dtype = temp # line split for PEP8
x1, x2, x3 = xs
f1, f2, f3 = fs
phi = dtype.type(0.5 + 0.5*5**0.5) # golden ratio
status = np.full_like(x1, eim._EINPROGRESS, dtype=int) # in progress
nit, nfev = 0, 3 # three function evaluations performed above
fatol = np.finfo(dtype).tiny if fatol is None else fatol
frtol = np.finfo(dtype).tiny if frtol is None else frtol
xatol = np.finfo(dtype).tiny if xatol is None else xatol
xrtol = np.sqrt(np.finfo(dtype).eps) if xrtol is None else xrtol
# Ensure that x1 < x2 < x3 initially.
xs, fs = np.vstack((x1, x2, x3)), np.vstack((f1, f2, f3))
i = np.argsort(xs, axis=0)
x1, x2, x3 = np.take_along_axis(xs, i, axis=0)
f1, f2, f3 = np.take_along_axis(fs, i, axis=0)
q0 = x3.copy() # "At the start, q0 is set at x3..." ([1] after (7))
work = _RichResult(x1=x1, f1=f1, x2=x2, f2=f2, x3=x3, f3=f3, phi=phi,
xatol=xatol, xrtol=xrtol, fatol=fatol, frtol=frtol,
nit=nit, nfev=nfev, status=status, q0=q0, args=args)
res_work_pairs = [('status', 'status'),
('x', 'x2'), ('fun', 'f2'),
('nit', 'nit'), ('nfev', 'nfev'),
('xl', 'x1'), ('xm', 'x2'), ('xr', 'x3'),
('fl', 'f1'), ('fm', 'f2'), ('fr', 'f3')]
def pre_func_eval(work):
# `_check_termination` is called first -> `x3 - x2 > x2 - x1`
# But let's calculate a few terms that we'll reuse
x21 = work.x2 - work.x1
x32 = work.x3 - work.x2
# [1] Section 3. "The quadratic minimum point Q1 is calculated using
# the relations developed in the previous section." [1] Section 2 (5/6)
A = x21 * (work.f3 - work.f2)
B = x32 * (work.f1 - work.f2)
C = A / (A + B)
# q1 = C * (work.x1 + work.x2) / 2 + (1 - C) * (work.x2 + work.x3) / 2
q1 = 0.5 * (C*(work.x1 - work.x3) + work.x2 + work.x3) # much faster
# this is an array, so multiplying by 0.5 does not change dtype
# "If Q1 and Q0 are sufficiently close... Q1 is accepted if it is
# sufficiently away from the inside point x2"
i = abs(q1 - work.q0) < 0.5 * abs(x21) # [1] (7)
xi = q1[i]
# Later, after (9), "If the point Q1 is in a +/- xtol neighborhood of
# x2, the new point is chosen in the larger interval at a distance
# tol away from x2."
# See also QBASIC code after "Accept Ql adjust if close to X2".
j = abs(q1[i] - work.x2[i]) <= work.xtol[i]
xi[j] = work.x2[i][j] + np.sign(x32[i][j]) * work.xtol[i][j]
# "If condition (7) is not satisfied, golden sectioning of the larger
# interval is carried out to introduce the new point."
# (For simplicity, we go ahead and calculate it for all points, but we
# change the elements for which the condition was satisfied.)
x = work.x2 + (2 - work.phi) * x32
x[i] = xi
# "We define Q0 as the value of Q1 at the previous iteration."
work.q0 = q1
return x
def post_func_eval(x, f, work):
# Standard logic for updating a three-point bracket based on a new
# point. In QBASIC code, see "IF SGN(X-X2) = SGN(X3-X2) THEN...".
# There is an awful lot of data copying going on here; this would
# probably benefit from code optimization or implementation in Pythran.
i = np.sign(x - work.x2) == np.sign(work.x3 - work.x2)
xi, x1i, x2i, x3i = x[i], work.x1[i], work.x2[i], work.x3[i]
fi, f1i, f2i, f3i = f[i], work.f1[i], work.f2[i], work.f3[i]
j = fi > f2i
x3i[j], f3i[j] = xi[j], fi[j]
j = ~j
x1i[j], f1i[j], x2i[j], f2i[j] = x2i[j], f2i[j], xi[j], fi[j]
ni = ~i
xni, x1ni, x2ni, x3ni = x[ni], work.x1[ni], work.x2[ni], work.x3[ni]
fni, f1ni, f2ni, f3ni = f[ni], work.f1[ni], work.f2[ni], work.f3[ni]
j = fni > f2ni
x1ni[j], f1ni[j] = xni[j], fni[j]
j = ~j
x3ni[j], f3ni[j], x2ni[j], f2ni[j] = x2ni[j], f2ni[j], xni[j], fni[j]
work.x1[i], work.x2[i], work.x3[i] = x1i, x2i, x3i
work.f1[i], work.f2[i], work.f3[i] = f1i, f2i, f3i
work.x1[ni], work.x2[ni], work.x3[ni] = x1ni, x2ni, x3ni
work.f1[ni], work.f2[ni], work.f3[ni] = f1ni, f2ni, f3ni
def check_termination(work):
# Check for all terminal conditions and record statuses.
stop = np.zeros_like(work.x1, dtype=bool) # termination condition met
# Bracket is invalid; stop and don't return minimizer/minimum
i = ((work.f2 > work.f1) | (work.f2 > work.f3))
work.x2[i], work.f2[i] = np.nan, np.nan
stop[i], work.status[i] = True, eim._ESIGNERR
# Non-finite values; stop and don't return minimizer/minimum
finite = np.isfinite(work.x1+work.x2+work.x3+work.f1+work.f2+work.f3)
i = ~(finite | stop)
work.x2[i], work.f2[i] = np.nan, np.nan
stop[i], work.status[i] = True, eim._EVALUEERR
# [1] Section 3 "Points 1 and 3 are interchanged if necessary to make
# the (x2, x3) the larger interval."
# Note: I had used np.choose; this is much faster. This would be a good
# place to save e.g. `work.x3 - work.x2` for reuse, but I tried and
# didn't notice a speed boost, so let's keep it simple.
i = abs(work.x3 - work.x2) < abs(work.x2 - work.x1)
temp = work.x1[i]
work.x1[i] = work.x3[i]
work.x3[i] = temp
temp = work.f1[i]
work.f1[i] = work.f3[i]
work.f3[i] = temp
# [1] Section 3 (bottom of page 212)
# "We set a tolerance value xtol..."
work.xtol = abs(work.x2) * work.xrtol + work.xatol # [1] (8)
# "The convergence based on interval is achieved when..."
# Note: Equality allowed in case of `xtol=0`
i = abs(work.x3 - work.x2) <= 2 * work.xtol # [1] (9)
# "We define ftol using..."
ftol = abs(work.f2) * work.frtol + work.fatol # [1] (10)
# "The convergence based on function values is achieved when..."
# Note 1: modify in place to incorporate tolerance on function value.
# Note 2: factor of 2 is not in the text; see QBASIC start of DO loop
i |= (work.f1 - 2 * work.f2 + work.f3) <= 2*ftol # [1] (11)
i &= ~stop
stop[i], work.status[i] = True, eim._ECONVERGED
return stop
def post_termination_check(work):
pass
def customize_result(res, shape):
xl, xr, fl, fr = res['xl'], res['xr'], res['fl'], res['fr']
i = res['xl'] < res['xr']
res['xl'] = np.choose(i, (xr, xl))
res['xr'] = np.choose(i, (xl, xr))
res['fl'] = np.choose(i, (fr, fl))
res['fr'] = np.choose(i, (fl, fr))
return shape
return eim._loop(work, callback, shape, maxiter, func, args, dtype,
pre_func_eval, post_func_eval, check_termination,
post_termination_check, customize_result, res_work_pairs)
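# Illustrative sketch only (not part of SciPy): the quadratic-fit step used in
# `pre_func_eval` above, written for scalars. The helper name
# `_sketch_quadratic_fit_minimizer` is an assumption made for this example. It
# returns the abscissa of the vertex of the parabola through three points and
# performs none of the safeguards (condition (7), golden sectioning, `xtol`
# adjustment) applied above.
def _sketch_quadratic_fit_minimizer(x1, f1, x2, f2, x3, f3):
    A = (x2 - x1) * (f3 - f2)
    B = (x3 - x2) * (f1 - f2)
    C = A / (A + B)
    # Same expression as `q1` above: a C-weighted average of the midpoints
    # of (x1, x2) and (x2, x3).
    return 0.5 * (C * (x1 - x3) + x2 + x3)
# Assumed usage: for f(x) = (x - 1)**2 sampled at x = -5, 0, 5 the function
# values are 36, 1, 16, and the fitted parabola is f itself, so
# `_sketch_quadratic_fit_minimizer(-5, 36, 0, 1, 5, 16)` is approximately 1.0.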
@@ -0,0 +1,316 @@
"""
Interface to Constrained Optimization By Linear Approximation
Functions
---------
.. autosummary::
:toctree: generated/
fmin_cobyla
"""
import functools
from threading import RLock
import numpy as np
from scipy.optimize import _cobyla as cobyla
from ._optimize import (OptimizeResult, _check_unknown_options,
_prepare_scalar_function)
try:
from itertools import izip
except ImportError:
izip = zip
__all__ = ['fmin_cobyla']
# Workaround as _cobyla.minimize is not threadsafe
# due to an unknown f2py bug and can segfault,
# see gh-9658.
_module_lock = RLock()
def synchronized(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
with _module_lock:
return func(*args, **kwargs)
return wrapper
@synchronized
def fmin_cobyla(func, x0, cons, args=(), consargs=None, rhobeg=1.0,
rhoend=1e-4, maxfun=1000, disp=None, catol=2e-4,
*, callback=None):
"""
Minimize a function using the Constrained Optimization By Linear
Approximation (COBYLA) method. This method wraps a FORTRAN
implementation of the algorithm.
Parameters
----------
func : callable
Function to minimize. In the form func(x, \\*args).
x0 : ndarray
Initial guess.
cons : sequence
Constraint functions; must all be ``>=0`` (a single function
if only 1 constraint). Each function takes the parameters `x`
as its first argument, and it can return either a single number or
an array or list of numbers.
args : tuple, optional
Extra arguments to pass to function.
consargs : tuple, optional
Extra arguments to pass to constraint functions (default of None means
use same extra arguments as those passed to func).
Use ``()`` for no extra arguments.
rhobeg : float, optional
Reasonable initial changes to the variables.
rhoend : float, optional
Final accuracy in the optimization (not precisely guaranteed). This
is a lower bound on the size of the trust region.
disp : {0, 1, 2, 3}, optional
Controls the frequency of output; 0 implies no output.
maxfun : int, optional
Maximum number of function evaluations.
catol : float, optional
Absolute tolerance for constraint violations.
callback : callable, optional
Called after each iteration, as ``callback(x)``, where ``x`` is the
current parameter vector.
Returns
-------
x : ndarray
The argument that minimises `func`.
See also
--------
minimize: Interface to minimization algorithms for multivariate
functions. See the 'COBYLA' `method` in particular.
Notes
-----
This algorithm is based on linear approximations to the objective
function and each constraint. We briefly describe the algorithm.
Suppose the function is being minimized over k variables. At the
jth iteration the algorithm has k+1 points v_1, ..., v_(k+1),
an approximate solution x_j, and a radius RHO_j.
It forms affine (i.e., linear plus a constant) approximations to the objective
function and constraint functions such that their function values
agree with the linear approximation on the k+1 points v_1,.., v_(k+1).
This gives a linear program to solve (where the linear approximations
of the constraint functions are constrained to be non-negative).
However, the linear approximations are likely only good
approximations near the current simplex, so the linear program is
given the further requirement that the solution, which
will become x_(j+1), must be within RHO_j from x_j. RHO_j only
decreases, never increases. The initial RHO_j is rhobeg and the
final RHO_j is rhoend. In this way COBYLA's iterations behave
like a trust region algorithm.
Additionally, the linear program may be inconsistent, or the
approximation may give poor improvement. For details about
how these issues are resolved, as well as how the points v_i are
updated, refer to the source code or the references below.
References
----------
Powell M.J.D. (1994), "A direct search optimization method that models
the objective and constraint functions by linear interpolation.", in
Advances in Optimization and Numerical Analysis, eds. S. Gomez and
J-P Hennart, Kluwer Academic (Dordrecht), pp. 51-67
Powell M.J.D. (1998), "Direct search algorithms for optimization
calculations", Acta Numerica 7, 287-336
Powell M.J.D. (2007), "A view of algorithms for optimization without
derivatives", Cambridge University Technical Report DAMTP 2007/NA03
Examples
--------
Minimize the objective function f(x,y) = x*y subject
to the constraints x**2 + y**2 < 1 and y > 0::
>>> def objective(x):
... return x[0]*x[1]
...
>>> def constr1(x):
... return 1 - (x[0]**2 + x[1]**2)
...
>>> def constr2(x):
... return x[1]
...
>>> from scipy.optimize import fmin_cobyla
>>> fmin_cobyla(objective, [0.0, 0.1], [constr1, constr2], rhoend=1e-7)
array([-0.70710685, 0.70710671])
The exact solution is (-sqrt(2)/2, sqrt(2)/2).
"""
err = "cons must be a sequence of callable functions or a single"\
" callable function."
try:
len(cons)
except TypeError as e:
if callable(cons):
cons = [cons]
else:
raise TypeError(err) from e
else:
for thisfunc in cons:
if not callable(thisfunc):
raise TypeError(err)
if consargs is None:
consargs = args
# build constraints
con = tuple({'type': 'ineq', 'fun': c, 'args': consargs} for c in cons)
# options
opts = {'rhobeg': rhobeg,
'tol': rhoend,
'disp': disp,
'maxiter': maxfun,
'catol': catol,
'callback': callback}
sol = _minimize_cobyla(func, x0, args, constraints=con,
**opts)
if disp and not sol['success']:
print(f"COBYLA failed to find a solution: {sol.message}")
return sol['x']
@synchronized
def _minimize_cobyla(fun, x0, args=(), constraints=(),
rhobeg=1.0, tol=1e-4, maxiter=1000,
disp=False, catol=2e-4, callback=None, bounds=None,
**unknown_options):
"""
Minimize a scalar function of one or more variables using the
Constrained Optimization BY Linear Approximation (COBYLA) algorithm.
Options
-------
rhobeg : float
Reasonable initial changes to the variables.
tol : float
Final accuracy in the optimization (not precisely guaranteed).
This is a lower bound on the size of the trust region.
disp : bool
Set to True to print convergence messages. If False,
no output is printed.
maxiter : int
Maximum number of function evaluations.
catol : float
Tolerance (absolute) for constraint violations
"""
_check_unknown_options(unknown_options)
maxfun = maxiter
rhoend = tol
iprint = int(bool(disp))
# check constraints
if isinstance(constraints, dict):
constraints = (constraints, )
if bounds:
i_lb = np.isfinite(bounds.lb)
if np.any(i_lb):
def lb_constraint(x, *args, **kwargs):
return x[i_lb] - bounds.lb[i_lb]
constraints.append({'type': 'ineq', 'fun': lb_constraint})
i_ub = np.isfinite(bounds.ub)
if np.any(i_ub):
def ub_constraint(x):
return bounds.ub[i_ub] - x[i_ub]
constraints.append({'type': 'ineq', 'fun': ub_constraint})
for ic, con in enumerate(constraints):
# check type
try:
ctype = con['type'].lower()
except KeyError as e:
raise KeyError('Constraint %d has no type defined.' % ic) from e
except TypeError as e:
raise TypeError('Constraints must be defined using a '
'dictionary.') from e
except AttributeError as e:
raise TypeError("Constraint's type must be a string.") from e
else:
if ctype != 'ineq':
raise ValueError("Constraints of type '%s' not handled by "
"COBYLA." % con['type'])
# check function
if 'fun' not in con:
raise KeyError('Constraint %d has no function defined.' % ic)
# check extra arguments
if 'args' not in con:
con['args'] = ()
# m is the total number of constraint values
# it takes into account that some constraints may be vector-valued
cons_lengths = []
for c in constraints:
f = c['fun'](x0, *c['args'])
try:
cons_length = len(f)
except TypeError:
cons_length = 1
cons_lengths.append(cons_length)
m = sum(cons_lengths)
# create the ScalarFunction, cobyla doesn't require derivative function
def _jac(x, *args):
return None
sf = _prepare_scalar_function(fun, x0, args=args, jac=_jac)
def calcfc(x, con):
f = sf.fun(x)
i = 0
for size, c in izip(cons_lengths, constraints):
con[i: i + size] = c['fun'](x, *c['args'])
i += size
return f
def wrapped_callback(x):
if callback is not None:
callback(np.copy(x))
info = np.zeros(4, np.float64)
xopt, info = cobyla.minimize(calcfc, m=m, x=np.copy(x0), rhobeg=rhobeg,
rhoend=rhoend, iprint=iprint, maxfun=maxfun,
dinfo=info, callback=wrapped_callback)
if info[3] > catol:
# Check constraint violation
info[0] = 4
return OptimizeResult(x=xopt,
status=int(info[0]),
success=info[0] == 1,
message={1: 'Optimization terminated successfully.',
2: 'Maximum number of function evaluations '
'has been exceeded.',
3: 'Rounding errors are becoming damaging '
'in COBYLA subroutine.',
4: 'Did not converge to a solution '
'satisfying the constraints. See '
'`maxcv` for magnitude of violation.',
5: 'NaN result encountered.'
}.get(info[0], 'Unknown exit status.'),
nfev=int(info[1]),
fun=info[2],
maxcv=info[3])
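# Illustrative sketch only (not part of SciPy): how scalar- and vector-valued
# inequality constraints can be packed into one flat array, in the spirit of
# `cons_lengths` and `calcfc` above. The helper name
# `_sketch_pack_constraints` is an assumption made for this example; the real
# code writes into a buffer supplied by the Fortran routine rather than
# allocating a new array.
import numpy as np
def _sketch_pack_constraints(x, constraints):
    # Evaluate each constraint once; scalars become length-1 arrays.
    values = [np.atleast_1d(np.asarray(c['fun'](x, *c.get('args', ())),
                                       dtype=float))
              for c in constraints]
    lengths = [v.size for v in values]
    con = np.empty(sum(lengths))  # m = total number of constraint values
    i = 0
    for size, v in zip(lengths, values):
        con[i:i + size] = v
        i += size
    return con, lengths
# Assumed usage: with one scalar constraint and one two-component constraint,
# the packed array has three entries, and every entry should be >= 0 at a
# feasible point.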
@@ -0,0 +1,590 @@
"""Constraints definition for minimize."""
import numpy as np
from ._hessian_update_strategy import BFGS
from ._differentiable_functions import (
VectorFunction, LinearVectorFunction, IdentityVectorFunction)
from ._optimize import OptimizeWarning
from warnings import warn, catch_warnings, simplefilter, filterwarnings
from scipy.sparse import issparse
def _arr_to_scalar(x):
# If x is a numpy array, return x.item(). This will
# fail if the array has more than one element.
return x.item() if isinstance(x, np.ndarray) else x
class NonlinearConstraint:
"""Nonlinear constraint on the variables.
The constraint has the general inequality form::
lb <= fun(x) <= ub
Here the vector of independent variables x is passed as ndarray of shape
(n,) and ``fun`` returns a vector with m components.
It is possible to use equal bounds to represent an equality constraint or
infinite bounds to represent a one-sided constraint.
Parameters
----------
fun : callable
The function defining the constraint.
The signature is ``fun(x) -> array_like, shape (m,)``.
lb, ub : array_like
Lower and upper bounds on the constraint. Each array must have the
shape (m,) or be a scalar, in the latter case a bound will be the same
for all components of the constraint. Use ``np.inf`` with an
appropriate sign to specify a one-sided constraint.
Set components of `lb` and `ub` equal to represent an equality
constraint. Note that you can mix constraints of different types:
interval, one-sided or equality, by setting different components of
`lb` and `ub` as necessary.
jac : {callable, '2-point', '3-point', 'cs'}, optional
Method of computing the Jacobian matrix (an m-by-n matrix,
where element (i, j) is the partial derivative of f[i] with
respect to x[j]). The keywords {'2-point', '3-point',
'cs'} select a finite difference scheme for the numerical estimation.
A callable must have the following signature:
``jac(x) -> {ndarray, sparse matrix}, shape (m, n)``.
Default is '2-point'.
hess : {callable, '2-point', '3-point', 'cs', HessianUpdateStrategy, None}, optional
Method for computing the Hessian matrix. The keywords
{'2-point', '3-point', 'cs'} select a finite difference scheme for
numerical estimation. Alternatively, objects implementing
`HessianUpdateStrategy` interface can be used to approximate the
Hessian. Currently available implementations are:
- `BFGS` (default option)
- `SR1`
A callable must return the Hessian matrix of ``dot(fun, v)`` and
must have the following signature:
``hess(x, v) -> {LinearOperator, sparse matrix, array_like}, shape (n, n)``.
Here ``v`` is ndarray with shape (m,) containing Lagrange multipliers.
keep_feasible : array_like of bool, optional
Whether to keep the constraint components feasible throughout
iterations. A single value sets this property for all components.
Default is False. Has no effect for equality constraints.
finite_diff_rel_step : None or array_like, optional
Relative step size for the finite difference approximation. Default is
None, which will select a reasonable value automatically depending
on the finite difference scheme.
finite_diff_jac_sparsity : {None, array_like, sparse matrix}, optional
Defines the sparsity structure of the Jacobian matrix for finite
difference estimation; its shape must be (m, n). If the Jacobian has
only a few non-zero elements in *each* row, providing the sparsity
structure will greatly speed up the computations. A zero entry means
that a corresponding element in the Jacobian is identically zero.
If provided, forces the use of 'lsmr' trust-region solver.
If None (default) then dense differencing will be used.
Notes
-----
Finite difference schemes {'2-point', '3-point', 'cs'} may be used for
approximating either the Jacobian or the Hessian. We, however, do not allow
their use for approximating both simultaneously. Hence whenever the Jacobian
is estimated via finite-differences, we require the Hessian to be estimated
using one of the quasi-Newton strategies.
The scheme 'cs' is potentially the most accurate, but requires the function
to correctly handle complex inputs and be analytically continuable to the
complex plane. The scheme '3-point' is more accurate than '2-point' but
requires twice as many operations.
Examples
--------
Constrain ``x[0] <= sin(x[1]) + 1.9``
>>> from scipy.optimize import NonlinearConstraint
>>> import numpy as np
>>> con = lambda x: x[0] - np.sin(x[1])
>>> nlc = NonlinearConstraint(con, -np.inf, 1.9)
"""
def __init__(self, fun, lb, ub, jac='2-point', hess=BFGS(),
keep_feasible=False, finite_diff_rel_step=None,
finite_diff_jac_sparsity=None):
self.fun = fun
self.lb = lb
self.ub = ub
self.finite_diff_rel_step = finite_diff_rel_step
self.finite_diff_jac_sparsity = finite_diff_jac_sparsity
self.jac = jac
self.hess = hess
self.keep_feasible = keep_feasible
class LinearConstraint:
"""Linear constraint on the variables.
The constraint has the general inequality form::
lb <= A.dot(x) <= ub
Here the vector of independent variables x is passed as ndarray of shape
(n,) and the matrix A has shape (m, n).
It is possible to use equal bounds to represent an equality constraint or
infinite bounds to represent a one-sided constraint.
Parameters
----------
A : {array_like, sparse matrix}, shape (m, n)
Matrix defining the constraint.
lb, ub : dense array_like, optional
Lower and upper limits on the constraint. Each array must have the
shape (m,) or be a scalar, in the latter case a bound will be the same
for all components of the constraint. Use ``np.inf`` with an
appropriate sign to specify a one-sided constraint.
Set components of `lb` and `ub` equal to represent an equality
constraint. Note that you can mix constraints of different types:
interval, one-sided or equality, by setting different components of
`lb` and `ub` as necessary. Defaults to ``lb = -np.inf``
and ``ub = np.inf`` (no limits).
keep_feasible : dense array_like of bool, optional
Whether to keep the constraint components feasible throughout
iterations. A single value sets this property for all components.
Default is False. Has no effect for equality constraints.
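
Examples
--------
A minimal sketch of how scalar and array limits are broadcast to one entry
per row of `A`, assuming the input validation shown in this file. The
numbers are illustrative only.

>>> import numpy as np
>>> from scipy.optimize import LinearConstraint
>>> con = LinearConstraint([[1, 2], [3, 4]], lb=0, ub=[5, np.inf])
>>> con.A.shape
(2, 2)
>>> con.lb.tolist()
[0.0, 0.0]
>>> con.ub.tolist()
[5.0, inf]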
"""
def _input_validation(self):
if self.A.ndim != 2:
message = "`A` must have exactly two dimensions."
raise ValueError(message)
try:
shape = self.A.shape[0:1]
self.lb = np.broadcast_to(self.lb, shape)
self.ub = np.broadcast_to(self.ub, shape)
self.keep_feasible = np.broadcast_to(self.keep_feasible, shape)
except ValueError:
message = ("`lb`, `ub`, and `keep_feasible` must be broadcastable "
"to shape `A.shape[0:1]`")
raise ValueError(message)
def __init__(self, A, lb=-np.inf, ub=np.inf, keep_feasible=False):
if not issparse(A):
# In some cases, if the constraint is not valid, this emits a
# VisibleDeprecationWarning about ragged nested sequences
# before eventually causing an error. `scipy.optimize.milp` would
# prefer that this just error out immediately so it can handle it
# rather than concerning the user.
with catch_warnings():
simplefilter("error")
self.A = np.atleast_2d(A).astype(np.float64)
else:
self.A = A
if issparse(lb) or issparse(ub):
raise ValueError("Constraint limits must be dense arrays.")
self.lb = np.atleast_1d(lb).astype(np.float64)
self.ub = np.atleast_1d(ub).astype(np.float64)
if issparse(keep_feasible):
raise ValueError("`keep_feasible` must be a dense array.")
self.keep_feasible = np.atleast_1d(keep_feasible).astype(bool)
self._input_validation()
def residual(self, x):
"""
Calculate the residual between the constraint function and the limits.
For a linear constraint of the form::
lb <= A@x <= ub
the lower and upper residuals between ``A@x`` and the limits are values
``sl`` and ``sb`` such that::
lb + sl == A@x == ub - sb
When all elements of ``sl`` and ``sb`` are nonnegative, all elements of
the constraint are satisfied; a negative element in ``sl`` or ``sb``
indicates that the corresponding element of the constraint is not
satisfied.
Parameters
----------
x : array_like
Vector of independent variables
Returns
-------
sl, sb : array-like
The lower and upper residuals
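
Examples
--------
A small illustrative case, with values chosen so the arithmetic is exact.
Here ``A @ x == 0.75`` and the limits are ``0 <= A @ x <= 1``, so both
residuals are nonnegative and the constraint is satisfied.

>>> import numpy as np
>>> from scipy.optimize import LinearConstraint
>>> con = LinearConstraint([[1, 1]], lb=0, ub=1)
>>> sl, sb = con.residual(np.array([0.25, 0.5]))
>>> sl.tolist(), sb.tolist()
([0.75], [0.25])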
"""
return self.A@x - self.lb, self.ub - self.A@x
class Bounds:
"""Bounds constraint on the variables.
The constraint has the general inequality form::
lb <= x <= ub
It is possible to use equal bounds to represent an equality constraint or
infinite bounds to represent a one-sided constraint.
Parameters
----------
lb, ub : dense array_like, optional
Lower and upper bounds on independent variables. `lb`, `ub`, and
`keep_feasible` must be the same shape or broadcastable.
Set components of `lb` and `ub` equal
to fix a variable. Use ``np.inf`` with an appropriate sign to disable
bounds on all or some variables. Note that you can mix constraints of
different types: interval, one-sided or equality, by setting different
components of `lb` and `ub` as necessary. Defaults to ``lb = -np.inf``
and ``ub = np.inf`` (no bounds).
keep_feasible : dense array_like of bool, optional
Whether to keep the constraint components feasible throughout
iterations. Must be broadcastable with `lb` and `ub`.
Default is False. Has no effect for equality constraints.
"""
def _input_validation(self):
try:
res = np.broadcast_arrays(self.lb, self.ub, self.keep_feasible)
self.lb, self.ub, self.keep_feasible = res
except ValueError:
message = "`lb`, `ub`, and `keep_feasible` must be broadcastable."
raise ValueError(message)
def __init__(self, lb=-np.inf, ub=np.inf, keep_feasible=False):
if issparse(lb) or issparse(ub):
raise ValueError("Lower and upper bounds must be dense arrays.")
self.lb = np.atleast_1d(lb)
self.ub = np.atleast_1d(ub)
if issparse(keep_feasible):
raise ValueError("`keep_feasible` must be a dense array.")
self.keep_feasible = np.atleast_1d(keep_feasible).astype(bool)
self._input_validation()
def __repr__(self):
start = f"{type(self).__name__}({self.lb!r}, {self.ub!r}"
if np.any(self.keep_feasible):
end = f", keep_feasible={self.keep_feasible!r})"
else:
end = ")"
return start + end
def residual(self, x):
"""Calculate the residual (slack) between the input and the bounds.
For a bound constraint of the form::
lb <= x <= ub
the lower and upper residuals between `x` and the bounds are values
``sl`` and ``sb`` such that::
lb + sl == x == ub - sb
When all elements of ``sl`` and ``sb`` are nonnegative, all elements of
``x`` lie within the bounds; a negative element in ``sl`` or ``sb``
indicates that the corresponding element of ``x`` is out of bounds.
Parameters
----------
x : array_like
Vector of independent variables
Returns
-------
sl, sb : array-like
The lower and upper residuals
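
Examples
--------
An illustrative sketch with made-up bounds. The second component of ``x``
exceeds its upper bound, which shows up as a negative entry in ``sb``.

>>> import numpy as np
>>> from scipy.optimize import Bounds
>>> b = Bounds([0, 0], [1, 2])
>>> sl, sb = b.residual(np.array([0.5, 3.0]))
>>> sl.tolist(), sb.tolist()
([0.5, 3.0], [0.5, -1.0])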
"""
return x - self.lb, self.ub - x
class PreparedConstraint:
"""Constraint prepared from a user defined constraint.
On creation it will check whether a constraint definition is valid and
the initial point is feasible. If created successfully, it will contain
the attributes listed below.
Parameters
----------
constraint : {NonlinearConstraint, LinearConstraint, Bounds}
Constraint to check and prepare.
x0 : array_like
Initial vector of independent variables.
sparse_jacobian : bool or None, optional
If bool, then the Jacobian of the constraint will be converted
to the corresponding format if necessary. If None (default), such
conversion is not made.
finite_diff_bounds : 2-tuple, optional
Lower and upper bounds on the independent variables for the finite
difference approximation, if applicable. Defaults to no bounds.
Attributes
----------
fun : {VectorFunction, LinearVectorFunction, IdentityVectorFunction}
Function defining the constraint wrapped by one of the convenience
classes.
bounds : 2-tuple
Contains lower and upper bounds for the constraints, lb and ub.
These are converted to ndarray and have a size equal to the number of
the constraints.
keep_feasible : ndarray
Array indicating which components must be kept feasible with a size
equal to the number of the constraints.
"""
def __init__(self, constraint, x0, sparse_jacobian=None,
finite_diff_bounds=(-np.inf, np.inf)):
if isinstance(constraint, NonlinearConstraint):
fun = VectorFunction(constraint.fun, x0,
constraint.jac, constraint.hess,
constraint.finite_diff_rel_step,
constraint.finite_diff_jac_sparsity,
finite_diff_bounds, sparse_jacobian)
elif isinstance(constraint, LinearConstraint):
fun = LinearVectorFunction(constraint.A, x0, sparse_jacobian)
elif isinstance(constraint, Bounds):
fun = IdentityVectorFunction(x0, sparse_jacobian)
else:
raise ValueError("`constraint` of an unknown type is passed.")
m = fun.m
lb = np.asarray(constraint.lb, dtype=float)
ub = np.asarray(constraint.ub, dtype=float)
keep_feasible = np.asarray(constraint.keep_feasible, dtype=bool)
lb = np.broadcast_to(lb, m)
ub = np.broadcast_to(ub, m)
keep_feasible = np.broadcast_to(keep_feasible, m)
if keep_feasible.shape != (m,):
raise ValueError("`keep_feasible` has a wrong shape.")
mask = keep_feasible & (lb != ub)
f0 = fun.f
if np.any(f0[mask] < lb[mask]) or np.any(f0[mask] > ub[mask]):
raise ValueError("`x0` is infeasible with respect to some "
"inequality constraint with `keep_feasible` "
"set to True.")
self.fun = fun
self.bounds = (lb, ub)
self.keep_feasible = keep_feasible
def violation(self, x):
"""How much the constraint is exceeded by.
Parameters
----------
x : array-like
Vector of independent variables
Returns
-------
excess : array-like
How much the constraint is exceeded by, for each of the
constraints specified by `PreparedConstraint.fun`.
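
Examples
--------
A minimal sketch using a `Bounds` constraint, assuming this module is
importable under the private path ``scipy.optimize._constraints``. Each
component of the illustrative point lies one unit outside the unit box,
one above and one below.

>>> import numpy as np
>>> from scipy.optimize import Bounds
>>> from scipy.optimize._constraints import PreparedConstraint
>>> pc = PreparedConstraint(Bounds([0, 0], [1, 1]), np.array([0.5, 0.5]))
>>> pc.violation(np.array([2.0, -1.0])).tolist()
[1.0, 1.0]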
"""
with catch_warnings():
# Ignore the following warning, it's not important when
# figuring out total violation
# UserWarning: delta_grad == 0.0. Check if the approximated
# function is linear
filterwarnings("ignore", "delta_grad", UserWarning)
ev = self.fun.fun(np.asarray(x))
excess_lb = np.maximum(self.bounds[0] - ev, 0)
excess_ub = np.maximum(ev - self.bounds[1], 0)
return excess_lb + excess_ub
def new_bounds_to_old(lb, ub, n):
"""Convert the new bounds representation to the old one.
The new representation is a tuple (lb, ub) and the old one is a list
containing n tuples, the i-th containing the lower and upper bound on the
i-th variable.
If any of the entries in lb/ub are -np.inf/np.inf they are replaced by
None.
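
Examples
--------
An illustrative sketch, assuming this module is importable under the
private path ``scipy.optimize._constraints``.

>>> import numpy as np
>>> from scipy.optimize._constraints import new_bounds_to_old
>>> new_bounds_to_old([-np.inf, 0], [1, np.inf], 2)
[(None, 1.0), (0.0, None)]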
"""
lb = np.broadcast_to(lb, n)
ub = np.broadcast_to(ub, n)
lb = [float(x) if x > -np.inf else None for x in lb]
ub = [float(x) if x < np.inf else None for x in ub]
return list(zip(lb, ub))
def old_bound_to_new(bounds):
"""Convert the old bounds representation to the new one.
The new representation is a tuple (lb, ub) and the old one is a list
containing n tuples, the i-th containing the lower and upper bound on the
i-th variable.
If any of the entries in lb/ub are None they are replaced by
-np.inf/np.inf.
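
Examples
--------
An illustrative sketch, assuming this module is importable under the
private path ``scipy.optimize._constraints``.

>>> from scipy.optimize._constraints import old_bound_to_new
>>> lb, ub = old_bound_to_new([(None, 1), (0, None)])
>>> lb.tolist(), ub.tolist()
([-inf, 0.0], [1.0, inf])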
"""
lb, ub = zip(*bounds)
# Convert occurrences of None to -inf or inf, and replace occurrences of
# any numpy array x with x.item(). Then wrap the results in numpy arrays.
lb = np.array([float(_arr_to_scalar(x)) if x is not None else -np.inf
for x in lb])
ub = np.array([float(_arr_to_scalar(x)) if x is not None else np.inf
for x in ub])
return lb, ub
def strict_bounds(lb, ub, keep_feasible, n_vars):
"""Remove bounds which are not asked to be kept feasible."""
strict_lb = np.resize(lb, n_vars).astype(float)
strict_ub = np.resize(ub, n_vars).astype(float)
keep_feasible = np.resize(keep_feasible, n_vars)
strict_lb[~keep_feasible] = -np.inf
strict_ub[~keep_feasible] = np.inf
return strict_lb, strict_ub
def new_constraint_to_old(con, x0):
"""
Converts new-style constraint objects to old-style constraint dictionaries.
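
Examples
--------
A minimal sketch with an illustrative equality constraint
``x[0] + x[1] == 1``, assuming this module is importable under the private
path ``scipy.optimize._constraints``. The result is a list holding a single
``'eq'`` dictionary whose function returns the constraint residual.

>>> import numpy as np
>>> from scipy.optimize import LinearConstraint
>>> from scipy.optimize._constraints import new_constraint_to_old
>>> x0 = np.array([0.5, 0.5])
>>> old = new_constraint_to_old(LinearConstraint([[1, 1]], 1, 1), x0)
>>> [c["type"] for c in old]
['eq']
>>> old[0]["fun"](x0).tolist()
[0.0]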
"""
if isinstance(con, NonlinearConstraint):
if (con.finite_diff_jac_sparsity is not None or
con.finite_diff_rel_step is not None or
not isinstance(con.hess, BFGS) or # misses user specified BFGS
np.any(con.keep_feasible)):
warn("Constraint options `finite_diff_jac_sparsity`, "
"`finite_diff_rel_step`, `keep_feasible`, and `hess` "
"are ignored by this method.",
OptimizeWarning, stacklevel=3)
fun = con.fun
if callable(con.jac):
jac = con.jac
else:
jac = None
else: # LinearConstraint
if np.any(con.keep_feasible):
warn("Constraint option `keep_feasible` is ignored by this method.",
OptimizeWarning, stacklevel=3)
A = con.A
if issparse(A):
A = A.toarray()
def fun(x):
return np.dot(A, x)
def jac(x):
return A
# FIXME: when bugs in VectorFunction/LinearVectorFunction are worked out,
# use pcon.fun.fun and pcon.fun.jac. Until then, get fun/jac above.
pcon = PreparedConstraint(con, x0)
lb, ub = pcon.bounds
i_eq = lb == ub
i_bound_below = np.logical_xor(lb != -np.inf, i_eq)
i_bound_above = np.logical_xor(ub != np.inf, i_eq)
i_unbounded = np.logical_and(lb == -np.inf, ub == np.inf)
if np.any(i_unbounded):
warn("At least one constraint is unbounded above and below. Such "
"constraints are ignored.",
OptimizeWarning, stacklevel=3)
ceq = []
if np.any(i_eq):
def f_eq(x):
y = np.array(fun(x)).flatten()
return y[i_eq] - lb[i_eq]
ceq = [{"type": "eq", "fun": f_eq}]
if jac is not None:
def j_eq(x):
dy = jac(x)
if issparse(dy):
dy = dy.toarray()
dy = np.atleast_2d(dy)
return dy[i_eq, :]
ceq[0]["jac"] = j_eq
cineq = []
n_bound_below = np.sum(i_bound_below)
n_bound_above = np.sum(i_bound_above)
if n_bound_below + n_bound_above:
def f_ineq(x):
y = np.zeros(n_bound_below + n_bound_above)
y_all = np.array(fun(x)).flatten()
y[:n_bound_below] = y_all[i_bound_below] - lb[i_bound_below]
y[n_bound_below:] = -(y_all[i_bound_above] - ub[i_bound_above])
return y
cineq = [{"type": "ineq", "fun": f_ineq}]
if jac is not None:
def j_ineq(x):
dy = np.zeros((n_bound_below + n_bound_above, len(x0)))
dy_all = jac(x)
if issparse(dy_all):
dy_all = dy_all.toarray()
dy_all = np.atleast_2d(dy_all)
dy[:n_bound_below, :] = dy_all[i_bound_below]
dy[n_bound_below:, :] = -dy_all[i_bound_above]
return dy
cineq[0]["jac"] = j_ineq
old_constraints = ceq + cineq
if len(old_constraints) > 1:
warn("Equality and inequality constraints are specified in the same "
"element of the constraint list. For efficient use with this "
"method, equality and inequality constraints should be specified "
"in separate elements of the constraint list.",
OptimizeWarning, stacklevel=3)
return old_constraints
def old_constraint_to_new(ic, con):
"""
Converts old-style constraint dictionaries to new-style constraint objects.
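
Examples
--------
A minimal sketch with an illustrative ``'ineq'`` dictionary, assuming this
module is importable under the private path ``scipy.optimize._constraints``.
An old-style inequality ``fun(x) >= 0`` becomes ``0 <= fun(x) <= inf``, and
the Jacobian falls back to ``'2-point'`` when none is supplied.

>>> from scipy.optimize._constraints import old_constraint_to_new
>>> con = old_constraint_to_new(0, {"type": "ineq", "fun": lambda x: x[0] - 1})
>>> con.lb, con.ub, con.jac
(0, inf, '2-point')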
"""
# check type
try:
ctype = con['type'].lower()
except KeyError as e:
raise KeyError('Constraint %d has no type defined.' % ic) from e
except TypeError as e:
raise TypeError(
'Constraints must be a sequence of dictionaries.'
) from e
except AttributeError as e:
raise TypeError("Constraint's type must be a string.") from e
else:
if ctype not in ['eq', 'ineq']:
raise ValueError("Unknown constraint type '%s'." % con['type'])
if 'fun' not in con:
raise ValueError('Constraint %d has no function defined.' % ic)
lb = 0
if ctype == 'eq':
ub = 0
else:
ub = np.inf
jac = '2-point'
if 'args' in con:
args = con['args']
def fun(x):
return con["fun"](x, *args)
if 'jac' in con:
def jac(x):
return con["jac"](x, *args)
else:
fun = con['fun']
if 'jac' in con:
jac = con['jac']
return NonlinearConstraint(fun, lb, ub, jac)
@@ -0,0 +1,728 @@
import numpy as np
"""
# 2023 - ported from minpack2.dcsrch, dcstep (Fortran) to Python
c MINPACK-1 Project. June 1983.
c Argonne National Laboratory.
c Jorge J. More' and David J. Thuente.
c
c MINPACK-2 Project. November 1993.
c Argonne National Laboratory and University of Minnesota.
c Brett M. Averick, Richard G. Carter, and Jorge J. More'.
"""
# NOTE this file was linted by black on first commit, and can be kept that way.
class DCSRCH:
"""
Parameters
----------
phi : callable phi(alpha)
Function at point `alpha`
derphi : callable phi'(alpha)
Objective function derivative. Returns a scalar.
ftol : float
A nonnegative tolerance for the sufficient decrease condition.
gtol : float
A nonnegative tolerance for the curvature condition.
xtol : float
A nonnegative relative tolerance for an acceptable step. The
subroutine exits with a warning if the relative difference between
sty and stx is less than xtol.
stpmin : float
A nonnegative lower bound for the step.
stpmax : float
A nonnegative upper bound for the step.
Notes
-----
This subroutine finds a step that satisfies a sufficient
decrease condition and a curvature condition.
Each call of the subroutine updates an interval with
endpoints stx and sty. The interval is initially chosen
so that it contains a minimizer of the modified function
psi(stp) = f(stp) - f(0) - ftol*stp*f'(0).
If psi(stp) <= 0 and f'(stp) >= 0 for some step, then the
interval is chosen so that it contains a minimizer of f.
The algorithm is designed to find a step that satisfies
the sufficient decrease condition
f(stp) <= f(0) + ftol*stp*f'(0),
and the curvature condition
abs(f'(stp)) <= gtol*abs(f'(0)).
If ftol is less than gtol and if, for example, the function
is bounded below, then there is always a step which satisfies
both conditions.
If no step can be found that satisfies both conditions, then
the algorithm stops with a warning. In this case stp only
satisfies the sufficient decrease condition.
A typical invocation of dcsrch has the following outline:
Evaluate the function at stp = 0.0d0; store in f.
Evaluate the gradient at stp = 0.0d0; store in g.
Choose a starting step stp.
task = 'START'
10 continue
call dcsrch(stp,f,g,ftol,gtol,xtol,task,stpmin,stpmax,
isave,dsave)
if (task .eq. 'FG') then
Evaluate the function and the gradient at stp
go to 10
end if
NOTE: The user must not alter work arrays between calls.
The subroutine statement is
subroutine dcsrch(f,g,stp,ftol,gtol,xtol,stpmin,stpmax,
task,isave,dsave)
where
stp is a double precision variable.
On entry stp is the current estimate of a satisfactory
step. On initial entry, a positive initial estimate
must be provided.
On exit stp is the current estimate of a satisfactory step
if task = 'FG'. If task = 'CONV' then stp satisfies
the sufficient decrease and curvature condition.
f is a double precision variable.
On initial entry f is the value of the function at 0.
On subsequent entries f is the value of the
function at stp.
On exit f is the value of the function at stp.
g is a double precision variable.
On initial entry g is the derivative of the function at 0.
On subsequent entries g is the derivative of the
function at stp.
On exit g is the derivative of the function at stp.
ftol is a double precision variable.
On entry ftol specifies a nonnegative tolerance for the
sufficient decrease condition.
On exit ftol is unchanged.
gtol is a double precision variable.
On entry gtol specifies a nonnegative tolerance for the
curvature condition.
On exit gtol is unchanged.
xtol is a double precision variable.
On entry xtol specifies a nonnegative relative tolerance
for an acceptable step. The subroutine exits with a
warning if the relative difference between sty and stx
is less than xtol.
On exit xtol is unchanged.
task is a character variable of length at least 60.
On initial entry task must be set to 'START'.
On exit task indicates the required action:
If task(1:2) = 'FG' then evaluate the function and
derivative at stp and call dcsrch again.
If task(1:4) = 'CONV' then the search is successful.
If task(1:4) = 'WARN' then the subroutine is not able
to satisfy the convergence conditions. The exit value of
stp contains the best point found during the search.
If task(1:5) = 'ERROR' then there is an error in the
input arguments.
On exit with convergence, a warning or an error, the
variable task contains additional information.
stpmin is a double precision variable.
On entry stpmin is a nonnegative lower bound for the step.
On exit stpmin is unchanged.
stpmax is a double precision variable.
On entry stpmax is a nonnegative upper bound for the step.
On exit stpmax is unchanged.
isave is an integer work array of dimension 2.
dsave is a double precision work array of dimension 13.
Subprograms called
MINPACK-2 ... dcstep
MINPACK-1 Project. June 1983.
Argonne National Laboratory.
Jorge J. More' and David J. Thuente.
MINPACK-2 Project. November 1993.
Argonne National Laboratory and University of Minnesota.
Brett M. Averick, Richard G. Carter, and Jorge J. More'.
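
Examples
--------
A minimal sketch on the quadratic ``phi(alpha) = (alpha - 1)**2``, assuming
this module is importable under the private path ``scipy.optimize._dcsrch``.
The tolerance values are illustrative. With ``gtol=0.9`` the trial step
``alpha = 0.5`` already satisfies both the sufficient decrease and the
curvature condition, so the search accepts it rather than moving on to the
exact minimizer at ``alpha = 1``.

>>> from scipy.optimize._dcsrch import DCSRCH
>>> phi = lambda alpha: (alpha - 1.0) ** 2
>>> derphi = lambda alpha: 2.0 * (alpha - 1.0)
>>> search = DCSRCH(phi, derphi, ftol=1e-4, gtol=0.9, xtol=1e-8,
...                 stpmin=0.0, stpmax=10.0)
>>> search(0.5)
(0.5, 0.25, 1.0, b'CONVERGENCE')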
"""
def __init__(self, phi, derphi, ftol, gtol, xtol, stpmin, stpmax):
self.stage = None
self.ginit = None
self.gtest = None
self.gx = None
self.gy = None
self.finit = None
self.fx = None
self.fy = None
self.stx = None
self.sty = None
self.stmin = None
self.stmax = None
self.width = None
self.width1 = None
# leave all assessment of tolerances/limits to the first call of
# this object
self.ftol = ftol
self.gtol = gtol
self.xtol = xtol
self.stpmin = stpmin
self.stpmax = stpmax
self.phi = phi
self.derphi = derphi
def __call__(self, alpha1, phi0=None, derphi0=None, maxiter=100):
"""
Parameters
----------
alpha1 : float
alpha1 is the current estimate of a satisfactory
step. A positive initial estimate must be provided.
phi0 : float
the value of `phi` at 0 (if known).
derphi0 : float
the derivative of `phi` at 0 (if known).
maxiter : int, optional
Maximum number of line search iterations. Default is 100.
Returns
-------
alpha : float
Step size, or None if no suitable step was found.
phi : float
Value of `phi` at the new point `alpha`.
phi0 : float
Value of `phi` at `alpha=0`.
task : bytes
On exit task indicates status information.
If task[:4] == b'CONV' then the search is successful.
If task[:4] == b'WARN' then the subroutine is not able
to satisfy the convergence conditions. The exit value of
stp contains the best point found during the search.
If task[:5] == b'ERROR' then there is an error in the
input arguments.
"""
if phi0 is None:
phi0 = self.phi(0.0)
if derphi0 is None:
derphi0 = self.derphi(0.0)
phi1 = phi0
derphi1 = derphi0
task = b"START"
for i in range(maxiter):
stp, phi1, derphi1, task = self._iterate(
alpha1, phi1, derphi1, task
)
if not np.isfinite(stp):
task = b"WARN"
stp = None
break
if task[:2] == b"FG":
alpha1 = stp
phi1 = self.phi(stp)
derphi1 = self.derphi(stp)
else:
break
else:
# maxiter reached, the line search did not converge
stp = None
task = b"WARNING: dcsrch did not converge within max iterations"
if task[:5] == b"ERROR" or task[:4] == b"WARN":
stp = None # failed
return stp, phi1, phi0, task
def _iterate(self, stp, f, g, task):
"""
Parameters
----------
stp : float
The current estimate of a satisfactory step. On initial entry, a
positive initial estimate must be provided.
f : float
On first call f is the value of the function at 0. On subsequent
entries f should be the value of the function at stp.
g : float
On initial entry g is the derivative of the function at 0. On
subsequent entries g is the derivative of the function at stp.
task : bytes
On initial entry task must be set to 'START'.
On exit with convergence, a warning or an error, the
variable task contains additional information.
Returns
-------
stp, f, g, task: tuple
stp : float
the current estimate of a satisfactory step if task = 'FG'. If
task = 'CONV' then stp satisfies the sufficient decrease and
curvature condition.
f : float
the value of the function at stp.
g : float
the derivative of the function at stp.
task : bytes
On exit task indicates the required action:
If task[:2] == b'FG' then evaluate the function and
derivative at stp and call dcsrch again.
If task[:4] == b'CONV' then the search is successful.
If task[:4] == b'WARN' then the subroutine is not able
to satisfy the convergence conditions. The exit value of
stp contains the best point found during the search.
If task[:5] == b'ERROR' then there is an error in the
input arguments.
"""
p5 = 0.5
p66 = 0.66
xtrapl = 1.1
xtrapu = 4.0
if task[:5] == b"START":
if stp < self.stpmin:
task = b"ERROR: STP .LT. STPMIN"
if stp > self.stpmax:
task = b"ERROR: STP .GT. STPMAX"
if g >= 0:
task = b"ERROR: INITIAL G .GE. ZERO"
if self.ftol < 0:
task = b"ERROR: FTOL .LT. ZERO"
if self.gtol < 0:
task = b"ERROR: GTOL .LT. ZERO"
if self.xtol < 0:
task = b"ERROR: XTOL .LT. ZERO"
if self.stpmin < 0:
task = b"ERROR: STPMIN .LT. ZERO"
if self.stpmax < self.stpmin:
task = b"ERROR: STPMAX .LT. STPMIN"
if task[:5] == b"ERROR":
return stp, f, g, task
# Initialize local variables.
self.brackt = False
self.stage = 1
self.finit = f
self.ginit = g
self.gtest = self.ftol * self.ginit
self.width = self.stpmax - self.stpmin
self.width1 = self.width / p5
# The variables stx, fx, gx contain the values of the step,
# function, and derivative at the best step.
# The variables sty, fy, gy contain the value of the step,
# function, and derivative at sty.
# The variables stp, f, g contain the values of the step,
# function, and derivative at stp.
self.stx = 0.0
self.fx = self.finit
self.gx = self.ginit
self.sty = 0.0
self.fy = self.finit
self.gy = self.ginit
self.stmin = 0
self.stmax = stp + xtrapu * stp
task = b"FG"
return stp, f, g, task
# in the original Fortran this was a location to restore variables
# we don't need to do that because they're attributes.
# If psi(stp) <= 0 and f'(stp) >= 0 for some step, then the
# algorithm enters the second stage.
ftest = self.finit + stp * self.gtest
if self.stage == 1 and f <= ftest and g >= 0:
self.stage = 2
# test for warnings
if self.brackt and (stp <= self.stmin or stp >= self.stmax):
task = b"WARNING: ROUNDING ERRORS PREVENT PROGRESS"
if self.brackt and self.stmax - self.stmin <= self.xtol * self.stmax:
task = b"WARNING: XTOL TEST SATISFIED"
if stp == self.stpmax and f <= ftest and g <= self.gtest:
task = b"WARNING: STP = STPMAX"
if stp == self.stpmin and (f > ftest or g >= self.gtest):
task = b"WARNING: STP = STPMIN"
# test for convergence
if f <= ftest and abs(g) <= self.gtol * -self.ginit:
task = b"CONVERGENCE"
# test for termination
if task[:4] == b"WARN" or task[:4] == b"CONV":
return stp, f, g, task
# A modified function is used to predict the step during the
# first stage if a lower function value has been obtained but
# the decrease is not sufficient.
if self.stage == 1 and f <= self.fx and f > ftest:
# Define the modified function and derivative values.
fm = f - stp * self.gtest
fxm = self.fx - self.stx * self.gtest
fym = self.fy - self.sty * self.gtest
gm = g - self.gtest
gxm = self.gx - self.gtest
gym = self.gy - self.gtest
# Call dcstep to update stx, sty, and to compute the new step.
# dcstep can have several operations which can produce NaN
# e.g. inf/inf. Filter these out.
with np.errstate(invalid="ignore", over="ignore"):
tup = dcstep(
self.stx,
fxm,
gxm,
self.sty,
fym,
gym,
stp,
fm,
gm,
self.brackt,
self.stmin,
self.stmax,
)
self.stx, fxm, gxm, self.sty, fym, gym, stp, self.brackt = tup
# Reset the function and derivative values for f
self.fx = fxm + self.stx * self.gtest
self.fy = fym + self.sty * self.gtest
self.gx = gxm + self.gtest
self.gy = gym + self.gtest
else:
# Call dcstep to update stx, sty, and to compute the new step.
# dcstep can have several operations which can produce NaN
# e.g. inf/inf. Filter these out.
with np.errstate(invalid="ignore", over="ignore"):
tup = dcstep(
self.stx,
self.fx,
self.gx,
self.sty,
self.fy,
self.gy,
stp,
f,
g,
self.brackt,
self.stmin,
self.stmax,
)
(
self.stx,
self.fx,
self.gx,
self.sty,
self.fy,
self.gy,
stp,
self.brackt,
) = tup
# Decide if a bisection step is needed
if self.brackt:
if abs(self.sty - self.stx) >= p66 * self.width1:
stp = self.stx + p5 * (self.sty - self.stx)
self.width1 = self.width
self.width = abs(self.sty - self.stx)
# Set the minimum and maximum steps allowed for stp.
if self.brackt:
self.stmin = min(self.stx, self.sty)
self.stmax = max(self.stx, self.sty)
else:
self.stmin = stp + xtrapl * (stp - self.stx)
self.stmax = stp + xtrapu * (stp - self.stx)
# Force the step to be within the bounds stpmax and stpmin.
stp = np.clip(stp, self.stpmin, self.stpmax)
# If further progress is not possible, let stp be the best
# point obtained during the search.
if (
self.brackt
and (stp <= self.stmin or stp >= self.stmax)
or (
self.brackt
and self.stmax - self.stmin <= self.xtol * self.stmax
)
):
stp = self.stx
# Obtain another function and derivative
task = b"FG"
return stp, f, g, task
def dcstep(stx, fx, dx, sty, fy, dy, stp, fp, dp, brackt, stpmin, stpmax):
"""
Subroutine dcstep
This subroutine computes a safeguarded step for a search
procedure and updates an interval that contains a step that
satisfies a sufficient decrease and a curvature condition.
The parameter stx contains the step with the least function
value. If brackt is set to .true. then a minimizer has
been bracketed in an interval with endpoints stx and sty.
The parameter stp contains the current step.
The subroutine assumes that if brackt is set to .true. then
min(stx,sty) < stp < max(stx,sty),
and that the derivative at stx is negative in the direction
of the step.
The subroutine statement is
subroutine dcstep(stx,fx,dx,sty,fy,dy,stp,fp,dp,brackt,
stpmin,stpmax)
where
stx is a double precision variable.
On entry stx is the best step obtained so far and is an
endpoint of the interval that contains the minimizer.
On exit stx is the updated best step.
fx is a double precision variable.
On entry fx is the function at stx.
On exit fx is the function at stx.
dx is a double precision variable.
On entry dx is the derivative of the function at
stx. The derivative must be negative in the direction of
the step, that is, dx and stp - stx must have opposite
signs.
On exit dx is the derivative of the function at stx.
sty is a double precision variable.
On entry sty is the second endpoint of the interval that
contains the minimizer.
On exit sty is the updated endpoint of the interval that
contains the minimizer.
fy is a double precision variable.
On entry fy is the function at sty.
On exit fy is the function at sty.
dy is a double precision variable.
On entry dy is the derivative of the function at sty.
On exit dy is the derivative of the function at the exit sty.
stp is a double precision variable.
On entry stp is the current step. If brackt is set to .true.
then on input stp must be between stx and sty.
On exit stp is a new trial step.
fp is a double precision variable.
On entry fp is the function at stp.
On exit fp is unchanged.
dp is a double precision variable.
On entry dp is the derivative of the function at stp.
On exit dp is unchanged.
brackt is a logical variable.
On entry brackt specifies if a minimizer has been bracketed.
Initially brackt must be set to .false.
On exit brackt specifies if a minimizer has been bracketed.
When a minimizer is bracketed brackt is set to .true.
stpmin is a double precision variable.
On entry stpmin is a lower bound for the step.
On exit stpmin is unchanged.
stpmax is a double precision variable.
On entry stpmax is an upper bound for the step.
On exit stpmax is unchanged.
MINPACK-1 Project. June 1983
Argonne National Laboratory.
Jorge J. More' and David J. Thuente.
MINPACK-2 Project. November 1993.
Argonne National Laboratory and University of Minnesota.
Brett M. Averick and Jorge J. More'.
"""
sgn_dp = np.sign(dp)
sgn_dx = np.sign(dx)
# sgnd = dp * (dx / abs(dx))
sgnd = sgn_dp * sgn_dx
# First case: A higher function value. The minimum is bracketed.
# If the cubic step is closer to stx than the quadratic step, the
# cubic step is taken, otherwise the average of the cubic and
# quadratic steps is taken.
if fp > fx:
theta = 3.0 * (fx - fp) / (stp - stx) + dx + dp
s = max(abs(theta), abs(dx), abs(dp))
gamma = s * np.sqrt((theta / s) ** 2 - (dx / s) * (dp / s))
if stp < stx:
gamma *= -1
p = (gamma - dx) + theta
q = ((gamma - dx) + gamma) + dp
r = p / q
stpc = stx + r * (stp - stx)
stpq = stx + ((dx / ((fx - fp) / (stp - stx) + dx)) / 2.0) * (stp - stx)
if abs(stpc - stx) <= abs(stpq - stx):
stpf = stpc
else:
stpf = stpc + (stpq - stpc) / 2.0
brackt = True
elif sgnd < 0.0:
# Second case: A lower function value and derivatives of opposite
# sign. The minimum is bracketed. If the cubic step is farther from
# stp than the secant step, the cubic step is taken, otherwise the
# secant step is taken.
theta = 3 * (fx - fp) / (stp - stx) + dx + dp
s = max(abs(theta), abs(dx), abs(dp))
gamma = s * np.sqrt((theta / s) ** 2 - (dx / s) * (dp / s))
if stp > stx:
gamma *= -1
p = (gamma - dp) + theta
q = ((gamma - dp) + gamma) + dx
r = p / q
stpc = stp + r * (stx - stp)
stpq = stp + (dp / (dp - dx)) * (stx - stp)
if abs(stpc - stp) > abs(stpq - stp):
stpf = stpc
else:
stpf = stpq
brackt = True
elif abs(dp) < abs(dx):
# Third case: A lower function value, derivatives of the same sign,
# and the magnitude of the derivative decreases.
# The cubic step is computed only if the cubic tends to infinity
# in the direction of the step or if the minimum of the cubic
# is beyond stp. Otherwise the cubic step is defined to be the
# secant step.
theta = 3 * (fx - fp) / (stp - stx) + dx + dp
s = max(abs(theta), abs(dx), abs(dp))
# The case gamma = 0 only arises if the cubic does not tend
# to infinity in the direction of the step.
gamma = s * np.sqrt(max(0, (theta / s) ** 2 - (dx / s) * (dp / s)))
if stp > stx:
gamma = -gamma
p = (gamma - dp) + theta
q = (gamma + (dx - dp)) + gamma
r = p / q
if r < 0 and gamma != 0:
stpc = stp + r * (stx - stp)
elif stp > stx:
stpc = stpmax
else:
stpc = stpmin
stpq = stp + (dp / (dp - dx)) * (stx - stp)
if brackt:
# A minimizer has been bracketed. If the cubic step is
# closer to stp than the secant step, the cubic step is
# taken, otherwise the secant step is taken.
if abs(stpc - stp) < abs(stpq - stp):
stpf = stpc
else:
stpf = stpq
if stp > stx:
stpf = min(stp + 0.66 * (sty - stp), stpf)
else:
stpf = max(stp + 0.66 * (sty - stp), stpf)
else:
# A minimizer has not been bracketed. If the cubic step is
# farther from stp than the secant step, the cubic step is
# taken, otherwise the secant step is taken.
if abs(stpc - stp) > abs(stpq - stp):
stpf = stpc
else:
stpf = stpq
stpf = np.clip(stpf, stpmin, stpmax)
else:
# Fourth case: A lower function value, derivatives of the same sign,
# and the magnitude of the derivative does not decrease. If the
# minimum is not bracketed, the step is either stpmin or stpmax,
# otherwise the cubic step is taken.
if brackt:
theta = 3.0 * (fp - fy) / (sty - stp) + dy + dp
s = max(abs(theta), abs(dy), abs(dp))
gamma = s * np.sqrt((theta / s) ** 2 - (dy / s) * (dp / s))
if stp > sty:
gamma = -gamma
p = (gamma - dp) + theta
q = ((gamma - dp) + gamma) + dy
r = p / q
stpc = stp + r * (sty - stp)
stpf = stpc
elif stp > stx:
stpf = stpmax
else:
stpf = stpmin
# Update the interval which contains a minimizer.
if fp > fx:
sty = stp
fy = fp
dy = dp
else:
if sgnd < 0:
sty = stx
fy = fx
dy = dx
stx = stp
fx = fp
dx = dp
# Compute the new step.
stp = stpf
return stx, fx, dx, sty, fy, dy, stp, brackt
@@ -0,0 +1,646 @@
import numpy as np
import scipy.sparse as sps
from ._numdiff import approx_derivative, group_columns
from ._hessian_update_strategy import HessianUpdateStrategy
from scipy.sparse.linalg import LinearOperator
from scipy._lib._array_api import atleast_nd, array_namespace
FD_METHODS = ('2-point', '3-point', 'cs')
class ScalarFunction:
"""Scalar function and its derivatives.
This class defines a scalar function F: R^n->R and methods for
computing or approximating its first and second derivatives.
Parameters
----------
fun : callable
evaluates the scalar function. Must be of the form ``fun(x, *args)``,
where ``x`` is the argument in the form of a 1-D array and ``args`` is
a tuple of any additional fixed parameters needed to completely specify
the function. Should return a scalar.
x0 : array-like
Provides an initial set of variables for evaluating fun. Array of real
elements of size (n,), where 'n' is the number of independent
variables.
args : tuple, optional
Any additional fixed parameters needed to completely specify the scalar
function.
grad : {callable, '2-point', '3-point', 'cs'}
Method for computing the gradient vector.
If it is a callable, it should be a function that returns the gradient
vector:
``grad(x, *args) -> array_like, shape (n,)``
where ``x`` is an array with shape (n,) and ``args`` is a tuple with
the fixed parameters.
Alternatively, the keywords {'2-point', '3-point', 'cs'} can be used
to select a finite difference scheme for numerical estimation of the
gradient with a relative step size. These finite difference schemes
obey any specified `bounds`.
hess : {callable, '2-point', '3-point', 'cs', HessianUpdateStrategy}
Method for computing the Hessian matrix. If it is callable, it should
return the Hessian matrix:
``hess(x, *args) -> {LinearOperator, spmatrix, array}, (n, n)``
where x is a (n,) ndarray and `args` is a tuple with the fixed
parameters. Alternatively, the keywords {'2-point', '3-point', 'cs'}
select a finite difference scheme for numerical estimation. Or, objects
implementing `HessianUpdateStrategy` interface can be used to
approximate the Hessian.
Whenever the gradient is estimated via finite-differences, the Hessian
cannot be estimated with options {'2-point', '3-point', 'cs'} and needs
to be estimated using one of the quasi-Newton strategies.
finite_diff_rel_step : None or array_like
Relative step size to use. The absolute step size is computed as
``h = finite_diff_rel_step * sign(x0) * max(1, abs(x0))``, possibly
adjusted to fit into the bounds. For ``method='3-point'`` the sign
of `h` is ignored. If None then `finite_diff_rel_step` is selected
automatically.
finite_diff_bounds : tuple of array_like
Lower and upper bounds on independent variables. Defaults to no bounds,
(-np.inf, np.inf). Each bound must match the size of `x0` or be a
scalar; in the latter case the bound will be the same for all
variables. Use it to limit the range of function evaluation.
epsilon : None or array_like, optional
Absolute step size to use, possibly adjusted to fit into the bounds.
For ``method='3-point'`` the sign of `epsilon` is ignored. By default
relative steps are used; absolute steps are used only if
``epsilon is not None``.
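The relation between the relative and the absolute step size described for
`finite_diff_rel_step` can be sketched as follows. The helper name
``absolute_step`` is hypothetical, and treating ``sign(0)`` as ``+1`` (so that
a zero entry still gets a nonzero step) is an assumption of this sketch, not
something stated above. Adjustment to the bounds is omitted.

```python
import math


def absolute_step(rel_step, x0):
    """h = rel_step * sign(x0) * max(1, abs(x0)), elementwise over a list."""
    steps = []
    for xi in x0:
        sign = 1.0 if xi >= 0 else -1.0  # assumption: sign(0) treated as +1
        steps.append(rel_step * sign * max(1.0, abs(xi)))
    return steps


# Assumed choice: square root of the double-precision machine epsilon.
rel_step = math.sqrt(2.0 ** -52)
h = absolute_step(rel_step, [0.0, -3.0, 1000.0])
```

Entries with magnitude below 1 receive the bare relative step, while larger
entries receive a step proportional to their magnitude.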
Notes
-----
This class implements a memoization logic. There are methods `fun`,
`grad`, `hess` and corresponding attributes `f`, `g` and `H`. The following
things should be considered:
1. Use only public methods `fun`, `grad` and `hess`.
2. After one of the methods is called, the corresponding attribute
will be set. However, a subsequent call with a different argument
of *any* of the methods may overwrite the attribute.
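The memoization behaviour described in these notes can be illustrated with a
small stand-in class. ``MemoizedSquare`` and its objective are hypothetical;
the sketch only mirrors the pattern of cached values guarded by "updated"
flags and does not reproduce this class.

```python
class MemoizedSquare:
    """Hypothetical stand-in caching f(x) = sum(x_i**2) and its gradient."""

    def __init__(self, x0):
        self.x = list(x0)
        self.nfev = 0
        self.ngev = 0
        self.f_updated = False
        self.g_updated = False

    def _update_x(self, x):
        # Store a copy and invalidate every cached quantity.
        self.x = list(x)
        self.f_updated = False
        self.g_updated = False

    def fun(self, x):
        if list(x) != self.x:
            self._update_x(x)
        if not self.f_updated:
            self.nfev += 1
            self.f = sum(xi * xi for xi in self.x)
            self.f_updated = True
        return self.f

    def grad(self, x):
        if list(x) != self.x:
            self._update_x(x)
        if not self.g_updated:
            self.ngev += 1
            self.g = [2.0 * xi for xi in self.x]
            self.g_updated = True
        return self.g


sf = MemoizedSquare([1.0, 2.0])
sf.fun([1.0, 2.0])          # evaluates f
sf.fun([1.0, 2.0])          # same x: the cached value is reused
sf.grad([3.0, 0.0])         # new x: both caches are invalidated
value = sf.fun([3.0, 0.0])  # f has to be evaluated again at the new x
```

After the call to ``grad`` with a different argument, the attribute ``f`` no
longer refers to the point last passed to ``fun``, which is why only the
public methods should be relied upon.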
"""
def __init__(self, fun, x0, args, grad, hess, finite_diff_rel_step,
finite_diff_bounds, epsilon=None):
if not callable(grad) and grad not in FD_METHODS:
raise ValueError(
f"`grad` must be either callable or one of {FD_METHODS}."
)
if not (callable(hess) or hess in FD_METHODS
or isinstance(hess, HessianUpdateStrategy)):
raise ValueError(
f"`hess` must be either callable, HessianUpdateStrategy"
f" or one of {FD_METHODS}."
)
if grad in FD_METHODS and hess in FD_METHODS:
raise ValueError("Whenever the gradient is estimated via "
"finite-differences, we require the Hessian "
"to be estimated using one of the "
"quasi-Newton strategies.")
self.xp = xp = array_namespace(x0)
_x = atleast_nd(x0, ndim=1, xp=xp)
_dtype = xp.float64
if xp.isdtype(_x.dtype, "real floating"):
_dtype = _x.dtype
# promotes to floating
self.x = xp.astype(_x, _dtype)
self.x_dtype = _dtype
self.n = self.x.size
self.nfev = 0
self.ngev = 0
self.nhev = 0
self.f_updated = False
self.g_updated = False
self.H_updated = False
self._lowest_x = None
self._lowest_f = np.inf
finite_diff_options = {}
if grad in FD_METHODS:
finite_diff_options["method"] = grad
finite_diff_options["rel_step"] = finite_diff_rel_step
finite_diff_options["abs_step"] = epsilon
finite_diff_options["bounds"] = finite_diff_bounds
if hess in FD_METHODS:
finite_diff_options["method"] = hess
finite_diff_options["rel_step"] = finite_diff_rel_step
finite_diff_options["abs_step"] = epsilon
finite_diff_options["as_linear_operator"] = True
# Function evaluation
def fun_wrapped(x):
self.nfev += 1
# Send a copy because the user may overwrite it.
# Overwriting results in undefined behaviour because
# fun(self.x) will change self.x, with the two no longer linked.
fx = fun(np.copy(x), *args)
# Make sure the function returns a true scalar
if not np.isscalar(fx):
try:
fx = np.asarray(fx).item()
except (TypeError, ValueError) as e:
raise ValueError(
"The user-provided objective function "
"must return a scalar value."
) from e
if fx < self._lowest_f:
self._lowest_x = x
self._lowest_f = fx
return fx
def update_fun():
self.f = fun_wrapped(self.x)
self._update_fun_impl = update_fun
self._update_fun()
# Gradient evaluation
if callable(grad):
def grad_wrapped(x):
self.ngev += 1
return np.atleast_1d(grad(np.copy(x), *args))
def update_grad():
self.g = grad_wrapped(self.x)
elif grad in FD_METHODS:
def update_grad():
self._update_fun()
self.ngev += 1
self.g = approx_derivative(fun_wrapped, self.x, f0=self.f,
**finite_diff_options)
self._update_grad_impl = update_grad
self._update_grad()
# Hessian Evaluation
if callable(hess):
self.H = hess(np.copy(x0), *args)
self.H_updated = True
self.nhev += 1
if sps.issparse(self.H):
def hess_wrapped(x):
self.nhev += 1
return sps.csr_matrix(hess(np.copy(x), *args))
self.H = sps.csr_matrix(self.H)
elif isinstance(self.H, LinearOperator):
def hess_wrapped(x):
self.nhev += 1
return hess(np.copy(x), *args)
else:
def hess_wrapped(x):
self.nhev += 1
return np.atleast_2d(np.asarray(hess(np.copy(x), *args)))
self.H = np.atleast_2d(np.asarray(self.H))
def update_hess():
self.H = hess_wrapped(self.x)
elif hess in FD_METHODS:
def update_hess():
self._update_grad()
self.H = approx_derivative(grad_wrapped, self.x, f0=self.g,
**finite_diff_options)
return self.H
update_hess()
self.H_updated = True
elif isinstance(hess, HessianUpdateStrategy):
self.H = hess
self.H.initialize(self.n, 'hess')
self.H_updated = True
self.x_prev = None
self.g_prev = None
def update_hess():
self._update_grad()
self.H.update(self.x - self.x_prev, self.g - self.g_prev)
self._update_hess_impl = update_hess
if isinstance(hess, HessianUpdateStrategy):
def update_x(x):
self._update_grad()
self.x_prev = self.x
self.g_prev = self.g
# ensure that self.x is a copy of x. Don't store a reference
# otherwise the memoization doesn't work properly.
_x = atleast_nd(x, ndim=1, xp=self.xp)
self.x = self.xp.astype(_x, self.x_dtype)
self.f_updated = False
self.g_updated = False
self.H_updated = False
self._update_hess()
else:
def update_x(x):
# ensure that self.x is a copy of x. Don't store a reference
# otherwise the memoization doesn't work properly.
_x = atleast_nd(x, ndim=1, xp=self.xp)
self.x = self.xp.astype(_x, self.x_dtype)
self.f_updated = False
self.g_updated = False
self.H_updated = False
self._update_x_impl = update_x
def _update_fun(self):
if not self.f_updated:
self._update_fun_impl()
self.f_updated = True
def _update_grad(self):
if not self.g_updated:
self._update_grad_impl()
self.g_updated = True
def _update_hess(self):
if not self.H_updated:
self._update_hess_impl()
self.H_updated = True
def fun(self, x):
if not np.array_equal(x, self.x):
self._update_x_impl(x)
self._update_fun()
return self.f
def grad(self, x):
if not np.array_equal(x, self.x):
self._update_x_impl(x)
self._update_grad()
return self.g
def hess(self, x):
if not np.array_equal(x, self.x):
self._update_x_impl(x)
self._update_hess()
return self.H
def fun_and_grad(self, x):
if not np.array_equal(x, self.x):
self._update_x_impl(x)
self._update_fun()
self._update_grad()
return self.f, self.g
class VectorFunction:
"""Vector function and its derivatives.
This class defines a vector function F: R^n->R^m and methods for
computing or approximating its first and second derivatives.
Notes
-----
This class implements a memoization logic. There are methods `fun`,
`jac`, `hess` and corresponding attributes `f`, `J` and `H`. The following
things should be considered:
1. Use only public methods `fun`, `jac` and `hess`.
2. After one of the methods is called, the corresponding attribute
will be set. However, a subsequent call with a different argument
of *any* of the methods may overwrite the attribute.
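`hess(x, v)` takes a vector `v` with one weight per component of F. When a
finite-difference scheme is selected for the Hessian, the code in this class
differentiates the product ``J(x).T.dot(v)``, which amounts to approximating
the Hessian of the weighted sum of the components of F. The sketch below
illustrates that idea with forward differences on an assumed two-component
function; the names ``jac``, ``jac_t_dot_v`` and ``fd_hessian`` and the step
size are hypothetical.

```python
def jac(x):
    """Jacobian of the assumed function F(x) = [x0**2, x0 * x1]."""
    return [[2.0 * x[0], 0.0],
            [x[1], x[0]]]


def jac_t_dot_v(x, v):
    """J(x)^T v, i.e. the gradient of sum_i v[i] * F_i(x)."""
    J = jac(x)
    return [sum(J[i][j] * v[i] for i in range(len(v)))
            for j in range(len(x))]


def fd_hessian(x, v, h=1e-6):
    """Forward-difference derivative of J^T v; column j is d(J^T v)/dx_j."""
    g0 = jac_t_dot_v(x, v)
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xh = list(x)
        xh[j] += h
        gj = jac_t_dot_v(xh, v)
        for i in range(n):
            H[i][j] = (gj[i] - g0[i]) / h
    return H


# Exact Hessian of 3*x0**2 + 5*x0*x1 is [[6, 5], [5, 0]].
H = fd_hessian([1.0, 2.0], [3.0, 5.0])
```

Because ``v`` enters the product, changing ``v`` changes the Hessian even when
``x`` is unchanged.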
"""
def __init__(self, fun, x0, jac, hess,
finite_diff_rel_step, finite_diff_jac_sparsity,
finite_diff_bounds, sparse_jacobian):
if not callable(jac) and jac not in FD_METHODS:
raise ValueError(f"`jac` must be either callable or one of {FD_METHODS}.")
if not (callable(hess) or hess in FD_METHODS
or isinstance(hess, HessianUpdateStrategy)):
raise ValueError("`hess` must be either callable, "
f"HessianUpdateStrategy or one of {FD_METHODS}.")
if jac in FD_METHODS and hess in FD_METHODS:
raise ValueError("Whenever the Jacobian is estimated via "
"finite-differences, we require the Hessian to "
"be estimated using one of the quasi-Newton "
"strategies.")
self.xp = xp = array_namespace(x0)
_x = atleast_nd(x0, ndim=1, xp=xp)
_dtype = xp.float64
if xp.isdtype(_x.dtype, "real floating"):
_dtype = _x.dtype
# promotes to floating
self.x = xp.astype(_x, _dtype)
self.x_dtype = _dtype
self.n = self.x.size
self.nfev = 0
self.njev = 0
self.nhev = 0
self.f_updated = False
self.J_updated = False
self.H_updated = False
finite_diff_options = {}
if jac in FD_METHODS:
finite_diff_options["method"] = jac
finite_diff_options["rel_step"] = finite_diff_rel_step
if finite_diff_jac_sparsity is not None:
sparsity_groups = group_columns(finite_diff_jac_sparsity)
finite_diff_options["sparsity"] = (finite_diff_jac_sparsity,
sparsity_groups)
finite_diff_options["bounds"] = finite_diff_bounds
self.x_diff = np.copy(self.x)
if hess in FD_METHODS:
finite_diff_options["method"] = hess
finite_diff_options["rel_step"] = finite_diff_rel_step
finite_diff_options["as_linear_operator"] = True
self.x_diff = np.copy(self.x)
# Function evaluation
def fun_wrapped(x):
self.nfev += 1
return np.atleast_1d(fun(x))
def update_fun():
self.f = fun_wrapped(self.x)
self._update_fun_impl = update_fun
update_fun()
self.v = np.zeros_like(self.f)
self.m = self.v.size
# Jacobian Evaluation
if callable(jac):
self.J = jac(self.x)
self.J_updated = True
self.njev += 1
if (sparse_jacobian or
sparse_jacobian is None and sps.issparse(self.J)):
def jac_wrapped(x):
self.njev += 1
return sps.csr_matrix(jac(x))
self.J = sps.csr_matrix(self.J)
self.sparse_jacobian = True
elif sps.issparse(self.J):
def jac_wrapped(x):
self.njev += 1
return jac(x).toarray()
self.J = self.J.toarray()
self.sparse_jacobian = False
else:
def jac_wrapped(x):
self.njev += 1
return np.atleast_2d(jac(x))
self.J = np.atleast_2d(self.J)
self.sparse_jacobian = False
def update_jac():
self.J = jac_wrapped(self.x)
elif jac in FD_METHODS:
self.J = approx_derivative(fun_wrapped, self.x, f0=self.f,
**finite_diff_options)
self.J_updated = True
if (sparse_jacobian or
sparse_jacobian is None and sps.issparse(self.J)):
def update_jac():
self._update_fun()
self.J = sps.csr_matrix(
approx_derivative(fun_wrapped, self.x, f0=self.f,
**finite_diff_options))
self.J = sps.csr_matrix(self.J)
self.sparse_jacobian = True
elif sps.issparse(self.J):
def update_jac():
self._update_fun()
self.J = approx_derivative(fun_wrapped, self.x, f0=self.f,
**finite_diff_options).toarray()
self.J = self.J.toarray()
self.sparse_jacobian = False
else:
def update_jac():
self._update_fun()
self.J = np.atleast_2d(
approx_derivative(fun_wrapped, self.x, f0=self.f,
**finite_diff_options))
self.J = np.atleast_2d(self.J)
self.sparse_jacobian = False
self._update_jac_impl = update_jac
# Define Hessian
if callable(hess):
self.H = hess(self.x, self.v)
self.H_updated = True
self.nhev += 1
if sps.issparse(self.H):
def hess_wrapped(x, v):
self.nhev += 1
return sps.csr_matrix(hess(x, v))
self.H = sps.csr_matrix(self.H)
elif isinstance(self.H, LinearOperator):
def hess_wrapped(x, v):
self.nhev += 1
return hess(x, v)
else:
def hess_wrapped(x, v):
self.nhev += 1
return np.atleast_2d(np.asarray(hess(x, v)))
self.H = np.atleast_2d(np.asarray(self.H))
def update_hess():
self.H = hess_wrapped(self.x, self.v)
elif hess in FD_METHODS:
def jac_dot_v(x, v):
return jac_wrapped(x).T.dot(v)
def update_hess():
self._update_jac()
self.H = approx_derivative(jac_dot_v, self.x,
f0=self.J.T.dot(self.v),
args=(self.v,),
**finite_diff_options)
update_hess()
self.H_updated = True
elif isinstance(hess, HessianUpdateStrategy):
self.H = hess
self.H.initialize(self.n, 'hess')
self.H_updated = True
self.x_prev = None
self.J_prev = None
def update_hess():
self._update_jac()
# When v is updated before x was updated, then x_prev and
# J_prev are None and we need this check.
if self.x_prev is not None and self.J_prev is not None:
delta_x = self.x - self.x_prev
delta_g = self.J.T.dot(self.v) - self.J_prev.T.dot(self.v)
self.H.update(delta_x, delta_g)
self._update_hess_impl = update_hess
if isinstance(hess, HessianUpdateStrategy):
def update_x(x):
self._update_jac()
self.x_prev = self.x
self.J_prev = self.J
_x = atleast_nd(x, ndim=1, xp=self.xp)
self.x = self.xp.astype(_x, self.x_dtype)
self.f_updated = False
self.J_updated = False
self.H_updated = False
self._update_hess()
else:
def update_x(x):
_x = atleast_nd(x, ndim=1, xp=self.xp)
self.x = self.xp.astype(_x, self.x_dtype)
self.f_updated = False
self.J_updated = False
self.H_updated = False
self._update_x_impl = update_x
def _update_v(self, v):
if not np.array_equal(v, self.v):
self.v = v
self.H_updated = False
def _update_x(self, x):
if not np.array_equal(x, self.x):
self._update_x_impl(x)
def _update_fun(self):
if not self.f_updated:
self._update_fun_impl()
self.f_updated = True
def _update_jac(self):
if not self.J_updated:
self._update_jac_impl()
self.J_updated = True
def _update_hess(self):
if not self.H_updated:
self._update_hess_impl()
self.H_updated = True
def fun(self, x):
self._update_x(x)
self._update_fun()
return self.f
def jac(self, x):
self._update_x(x)
self._update_jac()
return self.J
def hess(self, x, v):
# v should be updated before x.
self._update_v(v)
self._update_x(x)
self._update_hess()
return self.H
class LinearVectorFunction:
"""Linear vector function and its derivatives.
Defines a linear function F = A x, where x is an n-dimensional vector and
A is an m-by-n matrix. The Jacobian is constant and equal to A. The Hessian
is identically zero and is returned as a csr matrix.
"""
def __init__(self, A, x0, sparse_jacobian):
if sparse_jacobian or sparse_jacobian is None and sps.issparse(A):
self.J = sps.csr_matrix(A)
self.sparse_jacobian = True
elif sps.issparse(A):
self.J = A.toarray()
self.sparse_jacobian = False
else:
# np.asarray makes sure A is ndarray and not matrix
self.J = np.atleast_2d(np.asarray(A))
self.sparse_jacobian = False
self.m, self.n = self.J.shape
self.xp = xp = array_namespace(x0)
_x = atleast_nd(x0, ndim=1, xp=xp)
_dtype = xp.float64
if xp.isdtype(_x.dtype, "real floating"):
_dtype = _x.dtype
# promotes to floating
self.x = xp.astype(_x, _dtype)
self.x_dtype = _dtype
self.f = self.J.dot(self.x)
self.f_updated = True
self.v = np.zeros(self.m, dtype=float)
self.H = sps.csr_matrix((self.n, self.n))
def _update_x(self, x):
if not np.array_equal(x, self.x):
_x = atleast_nd(x, ndim=1, xp=self.xp)
self.x = self.xp.astype(_x, self.x_dtype)
self.f_updated = False
def fun(self, x):
self._update_x(x)
if not self.f_updated:
self.f = self.J.dot(x)
self.f_updated = True
return self.f
def jac(self, x):
self._update_x(x)
return self.J
def hess(self, x, v):
self._update_x(x)
self.v = v
return self.H
class IdentityVectorFunction(LinearVectorFunction):
"""Identity vector function and its derivatives.
The Jacobian is the identity matrix, returned as a dense array when
`sparse_jacobian=False` and as a csr matrix otherwise. The Hessian is
identically zero and it is returned as a csr matrix.
"""
def __init__(self, x0, sparse_jacobian):
n = len(x0)
if sparse_jacobian or sparse_jacobian is None:
A = sps.eye(n, format='csr')
sparse_jacobian = True
else:
A = np.eye(n)
sparse_jacobian = False
super().__init__(A, x0, sparse_jacobian)
@@ -0,0 +1,669 @@
# mypy: disable-error-code="attr-defined"
import numpy as np
import scipy._lib._elementwise_iterative_method as eim
from scipy._lib._util import _RichResult
_EERRORINCREASE = -1 # used in _differentiate
def _differentiate_iv(func, x, args, atol, rtol, maxiter, order, initial_step,
step_factor, step_direction, preserve_shape, callback):
# Input validation for `_differentiate`
if not callable(func):
raise ValueError('`func` must be callable.')
# x has more complex IV that is taken care of during initialization
x = np.asarray(x)
dtype = x.dtype if np.issubdtype(x.dtype, np.inexact) else np.float64
if not np.iterable(args):
args = (args,)
if atol is None:
atol = np.finfo(dtype).tiny
if rtol is None:
rtol = np.sqrt(np.finfo(dtype).eps)
message = 'Tolerances and step parameters must be non-negative scalars.'
tols = np.asarray([atol, rtol, initial_step, step_factor])
if (not np.issubdtype(tols.dtype, np.number)
or np.any(tols < 0)
or tols.shape != (4,)):
raise ValueError(message)
initial_step, step_factor = tols[2:].astype(dtype)
maxiter_int = int(maxiter)
if maxiter != maxiter_int or maxiter <= 0:
raise ValueError('`maxiter` must be a positive integer.')
order_int = int(order)
if order_int != order or order <= 0:
raise ValueError('`order` must be a positive integer.')
step_direction = np.sign(step_direction).astype(dtype)
x, step_direction = np.broadcast_arrays(x, step_direction)
x, step_direction = x[()], step_direction[()]
message = '`preserve_shape` must be True or False.'
if preserve_shape not in {True, False}:
raise ValueError(message)
if callback is not None and not callable(callback):
raise ValueError('`callback` must be callable.')
return (func, x, args, atol, rtol, maxiter_int, order_int, initial_step,
step_factor, step_direction, preserve_shape, callback)
def _differentiate(func, x, *, args=(), atol=None, rtol=None, maxiter=10,
order=8, initial_step=0.5, step_factor=2.0,
step_direction=0, preserve_shape=False, callback=None):
"""Evaluate the derivative of an elementwise scalar function numerically.
Parameters
----------
func : callable
The function whose derivative is desired. The signature must be::
func(x: ndarray, *fargs) -> ndarray
where each element of ``x`` is a finite real and ``fargs`` is a tuple,
which may contain an arbitrary number of arrays that are broadcastable
with `x`. ``func`` must be an elementwise function: each element
``func(x)[i]`` must equal ``func(x[i])`` for all indices ``i``.
x : array_like
Abscissae at which to evaluate the derivative.
args : tuple, optional
Additional positional arguments to be passed to `func`. Must be arrays
broadcastable with `x`. If the callable to be differentiated requires
arguments that are not broadcastable with `x`, wrap that callable with
`func`. See Examples.
atol, rtol : float, optional
Absolute and relative tolerances for the stopping condition: iteration
will stop when ``res.error < atol + rtol * abs(res.df)``. The default
`atol` is the smallest normal number of the appropriate dtype, and
the default `rtol` is the square root of the precision of the
appropriate dtype.
order : int, default: 8
The (positive integer) order of the finite difference formula to be
used. Odd integers will be rounded up to the next even integer.
initial_step : float, default: 0.5
The (absolute) initial step size for the finite difference derivative
approximation.
step_factor : float, default: 2.0
The factor by which the step size is *reduced* in each iteration; i.e.
the step size in iteration 1 is ``initial_step/step_factor``. If
``step_factor < 1``, subsequent steps will be greater than the initial
step; this may be useful if steps smaller than some threshold are
undesirable (e.g. due to subtractive cancellation error).
maxiter : int, default: 10
The maximum number of iterations of the algorithm to perform. See
notes.
step_direction : array_like
An array representing the direction of the finite difference steps (for
use when `x` lies near to the boundary of the domain of the function.)
Must be broadcastable with `x` and all `args`.
Where 0 (default), central differences are used; where negative (e.g.
-1), all steps are non-positive; and where positive (e.g. 1), all steps are
non-negative.
preserve_shape : bool, default: False
In the following, "arguments of `func`" refers to the array ``x`` and
any arrays within ``fargs``. Let ``shape`` be the broadcasted shape
of `x` and all elements of `args` (which is conceptually
distinct from ``fargs`` passed into `func`).
- When ``preserve_shape=False`` (default), `func` must accept arguments
of *any* broadcastable shapes.
- When ``preserve_shape=True``, `func` must accept arguments of shape
``shape`` *or* ``shape + (n,)``, where ``n`` is the number of
abscissae at which the function is being evaluated.
In either case, for each scalar element ``xi`` within `x`, the array
returned by `func` must include the scalar ``func(xi)`` at the same index.
Consequently, the shape of the output is always the shape of the input
``x``.
See Examples.
callback : callable, optional
An optional user-supplied function to be called before the first
iteration and after each iteration.
Called as ``callback(res)``, where ``res`` is a ``_RichResult``
similar to that returned by `_differentiate` (but containing the
current iterate's values of all variables). If `callback` raises a
``StopIteration``, the algorithm will terminate immediately and
`_differentiate` will return a result.
Returns
-------
res : _RichResult
An instance of `scipy._lib._util._RichResult` with the following
attributes. (The descriptions are written as though the values will be
scalars; however, if `func` returns an array, the outputs will be
arrays of the same shape.)
success : bool
``True`` when the algorithm terminated successfully (status ``0``).
status : int
An integer representing the exit status of the algorithm.
``0`` : The algorithm converged to the specified tolerances.
``-1`` : The error estimate increased, so iteration was terminated.
``-2`` : The maximum number of iterations was reached.
``-3`` : A non-finite value was encountered.
``-4`` : Iteration was terminated by `callback`.
``1`` : The algorithm is proceeding normally (in `callback` only).
df : float
The derivative of `func` at `x`, if the algorithm terminated
successfully.
error : float
An estimate of the error: the magnitude of the difference between
the current estimate of the derivative and the estimate in the
previous iteration.
nit : int
The number of iterations performed.
nfev : int
The number of points at which `func` was evaluated.
x : float
The value at which the derivative of `func` was evaluated
(after broadcasting with `args` and `step_direction`).
Notes
-----
The implementation was inspired by jacobi [1]_, numdifftools [2]_, and
DERIVEST [3]_, but the implementation follows the theory of Taylor series
more straightforwardly (and arguably naively so).
In the first iteration, the derivative is estimated using a finite
difference formula of order `order` with maximum step size `initial_step`.
Each subsequent iteration, the maximum step size is reduced by
`step_factor`, and the derivative is estimated again until a termination
condition is reached. The error estimate is the magnitude of the difference
between the current derivative approximation and that of the previous
iteration.
The stencils of the finite difference formulae are designed such that
abscissae are "nested": after `func` is evaluated at ``order + 1``
points in the first iteration, `func` is evaluated at only two new points
in each subsequent iteration; ``order - 1`` previously evaluated function
values required by the finite difference formula are reused, and two
function values (evaluations at the points furthest from `x`) are unused.
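The nesting can be illustrated with a small sketch. It assumes a central
stencil whose offsets from `x` are spaced geometrically by `step_factor`,
which is consistent with the counts quoted above but is an assumption of this
example, not a statement about the implementation. The helper name
``central_offsets`` is hypothetical.

```python
def central_offsets(h, step_factor, order):
    """Offsets from x for an assumed geometrically spaced central stencil."""
    n = order // 2
    offsets = {0.0}
    for k in range(n):
        offsets.add(h / step_factor**k)
        offsets.add(-h / step_factor**k)
    return offsets


order, h0, fac = 8, 0.5, 2.0
first = central_offsets(h0, fac, order)         # first iteration
second = central_offsets(h0 / fac, fac, order)  # next iteration, reduced step

new_points = second - first  # abscissae that still need evaluating
reused = second & first      # function values already available
unused = first - second      # outermost points, no longer needed
```

With ``order = 8`` the first stencil has nine points, the second adds two,
reuses seven, and drops the two outermost points.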
Step sizes are absolute. When the step size is small relative to the
magnitude of `x`, precision is lost; for example, if `x` is ``1e20``, the
default initial step size of ``0.5`` cannot be resolved. Accordingly,
consider using larger initial step sizes for large magnitudes of `x`.
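The loss of resolution is easy to reproduce in double precision. In the
sketch below, scaling the step with the magnitude of `x` is an assumed remedy
chosen for illustration; any sufficiently large `initial_step` would serve.

```python
import math

x = 1e20
step = 0.5                # the default initial step
lost = (x + step) == x    # the step is absorbed by rounding
gap = math.ulp(x)         # spacing between adjacent doubles near x

# Assumed remedy for this sketch: scale the step with the magnitude of x.
scaled_step = 0.5 * max(1.0, abs(x))
resolved = (x + scaled_step) != x
```

Here ``gap`` is far larger than the default step, so ``x + step`` and ``x``
are the same floating point number.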
The default tolerances are challenging to satisfy at points where the
true derivative is exactly zero. If the derivative may be exactly zero,
consider specifying an absolute tolerance (e.g. ``atol=1e-16``) to
improve convergence.
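The difficulty follows directly from the stopping condition
``res.error < atol + rtol * abs(res.df)``: when the derivative is zero, only
`atol` remains on the right-hand side. The sketch below evaluates that
condition for assumed values of the error estimate; the helper name
``converged`` is hypothetical.

```python
import sys


def converged(error, df, atol, rtol):
    """Stopping test quoted in the description of `atol` and `rtol`."""
    return error < atol + rtol * abs(df)


tiny = sys.float_info.min             # smallest normal double (default atol)
rtol = sys.float_info.epsilon ** 0.5  # default rtol for double precision

# Assumed values: derivative exactly zero, error estimate at rounding level.
default_ok = converged(1e-20, 0.0, tiny, rtol)
custom_ok = converged(1e-20, 0.0, 1e-16, rtol)
```

With the default `atol` the test fails even for a very small error estimate,
whereas ``atol=1e-16`` lets the iteration terminate.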
References
----------
.. [1] Hans Dembinski (@HDembinski). jacobi.
https://github.com/HDembinski/jacobi
.. [2] Per A. Brodtkorb and John D'Errico. numdifftools.
https://numdifftools.readthedocs.io/en/latest/
.. [3] John D'Errico. DERIVEST: Adaptive Robust Numerical Differentiation.
https://www.mathworks.com/matlabcentral/fileexchange/13490-adaptive-robust-numerical-differentiation
.. [4] Numerical Differentiation. Wikipedia.
https://en.wikipedia.org/wiki/Numerical_differentiation
Examples
--------
Evaluate the derivative of ``np.exp`` at several points ``x``.
>>> import numpy as np
>>> from scipy.optimize._differentiate import _differentiate
>>> f = np.exp
>>> df = np.exp # true derivative
>>> x = np.linspace(1, 2, 5)
>>> res = _differentiate(f, x)
>>> res.df # approximation of the derivative
array([2.71828183, 3.49034296, 4.48168907, 5.75460268, 7.3890561 ])
>>> res.error # estimate of the error
array(
[7.12940817e-12, 9.16688947e-12, 1.17594823e-11, 1.50972568e-11, 1.93942640e-11]
)
>>> abs(res.df - df(x)) # true error
array(
[3.06421555e-14, 3.01980663e-14, 5.06261699e-14, 6.30606678e-14, 8.34887715e-14]
)
Show the convergence of the approximation as the step size is reduced.
Each iteration, the step size is reduced by `step_factor`, so for
sufficiently small initial step, each iteration reduces the error by a
factor of ``1/step_factor**order`` until finite precision arithmetic
inhibits further improvement.
>>> import matplotlib.pyplot as plt
>>> iter = list(range(1, 12))  # maximum iterations
>>> hfac = 2 # step size reduction per iteration
>>> hdir = [-1, 0, 1] # compare left-, central-, and right- steps
>>> order = 4 # order of differentiation formula
>>> x = 1
>>> ref = df(x)
>>> errors = [] # true error
>>> for i in iter:
... res = _differentiate(f, x, maxiter=i, step_factor=hfac,
... step_direction=hdir, order=order,
... atol=0, rtol=0) # prevent early termination
... errors.append(abs(res.df - ref))
>>> errors = np.array(errors)
>>> plt.semilogy(iter, errors[:, 0], label='left differences')
>>> plt.semilogy(iter, errors[:, 1], label='central differences')
>>> plt.semilogy(iter, errors[:, 2], label='right differences')
>>> plt.xlabel('iteration')
>>> plt.ylabel('error')
>>> plt.legend()
>>> plt.show()
>>> (errors[1, 1] / errors[0, 1], 1 / hfac**order)
(0.06215223140159822, 0.0625)
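The same scaling can be checked without the private helper. The sketch below
uses a plain second-order central difference, so halving the step should
reduce the error by a factor close to ``1/2**2``. The function name
``central_diff`` and the step sizes are assumptions chosen for illustration.

```python
import math


def central_diff(f, x, h):
    """Second-order central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x, exact = 1.0, math.exp(1.0)
err_h = abs(central_diff(math.exp, x, 0.1) - exact)
err_half = abs(central_diff(math.exp, x, 0.05) - exact)
ratio = err_half / err_h  # expected to be close to 1 / 2**2
```

The ratio is not exactly 0.25 because higher-order terms of the Taylor series
still contribute at these step sizes.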
The implementation is vectorized over `x`, `step_direction`, and `args`.
The function is evaluated once before the first iteration to perform input
validation and standardization, and once per iteration thereafter.
>>> def f(x, p):
... print('here')
... f.nit += 1
... return x**p
>>> f.nit = 0
>>> def df(x, p):
... return p*x**(p-1)
>>> x = np.arange(1, 5)
>>> p = np.arange(1, 6).reshape((-1, 1))
>>> hdir = np.arange(-1, 2).reshape((-1, 1, 1))
>>> res = _differentiate(f, x, args=(p,), step_direction=hdir, maxiter=1)
>>> np.allclose(res.df, df(x, p))
True
>>> res.df.shape
(3, 5, 4)
>>> f.nit
2
By default, `preserve_shape` is False, and therefore the callable
`f` may be called with arrays of any broadcastable shapes.
For example:
>>> shapes = []
>>> def f(x, c):
... shape = np.broadcast_shapes(x.shape, c.shape)
... shapes.append(shape)
... return np.sin(c*x)
>>>
>>> c = [1, 5, 10, 20]
>>> res = _differentiate(f, 0, args=(c,))
>>> shapes
[(4,), (4, 8), (4, 2), (3, 2), (2, 2), (1, 2)]
To understand where these shapes are coming from - and to better
understand how `_differentiate` computes accurate results - note that
higher values of ``c`` correspond with higher frequency sinusoids.
The higher frequency sinusoids make the function's derivative change
faster, so more function evaluations are required to achieve the target
accuracy:
>>> res.nfev
array([11, 13, 15, 17])
The initial ``shape``, ``(4,)``, corresponds with evaluating the
function at a single abscissa and all four frequencies; this is used
for input validation and to determine the size and dtype of the arrays
that store results. The next shape corresponds with evaluating the
function at an initial grid of abscissae and all four frequencies.
Successive calls to the function evaluate the function at two more
abscissae, increasing the effective order of the approximation by two.
However, in later function evaluations, the function is evaluated at
fewer frequencies because the corresponding derivative has already
converged to the required tolerance. This saves function evaluations to
improve performance, but it requires the function to accept arguments of
any shape.
"Vector-valued" functions are unlikely to satisfy this requirement.
For example, consider
>>> def f(x):
... return [x, np.sin(3*x), x+np.sin(10*x), np.sin(20*x)*(x-1)**2]
This function is not compatible with `_differentiate` as written; for instance,
the shape of the output will not be the same as the shape of ``x``. Such a
function *could* be converted to a compatible form with the introduction of
additional parameters, but this would be inconvenient. In such cases,
a simpler solution would be to use `preserve_shape`.
>>> shapes = []
>>> def f(x):
... shapes.append(x.shape)
... x0, x1, x2, x3 = x
... return [x0, np.sin(3*x1), x2+np.sin(10*x2), np.sin(20*x3)*(x3-1)**2]
>>>
>>> x = np.zeros(4)
>>> res = _differentiate(f, x, preserve_shape=True)
>>> shapes
[(4,), (4, 8), (4, 2), (4, 2), (4, 2), (4, 2)]
Here, the shape of ``x`` is ``(4,)``. With ``preserve_shape=True``, the
function may be called with argument ``x`` of shape ``(4,)`` or ``(4, n)``,
and this is what we observe.
"""
# TODO (followup):
# - investigate behavior at saddle points
# - array initial_step / step_factor?
# - multivariate functions?
res = _differentiate_iv(func, x, args, atol, rtol, maxiter, order, initial_step,
step_factor, step_direction, preserve_shape, callback)
(func, x, args, atol, rtol, maxiter, order,
h0, fac, hdir, preserve_shape, callback) = res
# Initialization
# Since f(x) (no step) is not needed for central differences, it may be
# possible to eliminate this function evaluation. However, it's useful for
# input validation and standardization, and everything else is designed to
# reduce function calls, so let's keep it simple.
temp = eim._initialize(func, (x,), args, preserve_shape=preserve_shape)
func, xs, fs, args, shape, dtype = temp
x, f = xs[0], fs[0]
df = np.full_like(f, np.nan)
# Ideally we'd broadcast the shape of `hdir` in `_elementwise_algo_init`, but
# it's simpler to do it here than to generalize `_elementwise_algo_init` further.
# `hdir` and `x` are already broadcasted in `_differentiate_iv`, so we know
# that `hdir` can be broadcasted to the final shape.
hdir = np.broadcast_to(hdir, shape).flatten()
status = np.full_like(x, eim._EINPROGRESS, dtype=int) # in progress
nit, nfev = 0, 1 # one function evaluation performed above
# Boolean indices of left, central, right, and (all) one-sided steps
il = hdir < 0
ic = hdir == 0
ir = hdir > 0
io = il | ir
# Most of these attributes are reasonably obvious, but:
# - `fs` holds all the function values of all active `x`. The zeroth
# axis corresponds with active points `x`, the first axis corresponds
# with the different steps (in the order described in
# `_differentiate_weights`).
# - `terms` (which could probably use a better name) is half the `order`,
# which is always even.
work = _RichResult(x=x, df=df, fs=f[:, np.newaxis], error=np.nan, h=h0,
df_last=np.nan, error_last=np.nan, h0=h0, fac=fac,
atol=atol, rtol=rtol, nit=nit, nfev=nfev,
status=status, dtype=dtype, terms=(order+1)//2,
hdir=hdir, il=il, ic=ic, ir=ir, io=io)
# This is the correspondence between terms in the `work` object and the
# final result. In this case, the mapping is trivial. Note that `success`
# is prepended automatically.
res_work_pairs = [('status', 'status'), ('df', 'df'), ('error', 'error'),
('nit', 'nit'), ('nfev', 'nfev'), ('x', 'x')]
def pre_func_eval(work):
"""Determine the abscissae at which the function needs to be evaluated.
See `_differentiate_weights` for a description of the stencil (pattern
of the abscissae).
In the first iteration, there is only one stored function value in
`work.fs`, `f(x)`, so we need to evaluate at `order` new points. In
subsequent iterations, we evaluate at two new points. Note that
`work.x` is always flattened into a 1D array after broadcasting with
all `args`, so we add a new axis at the end and evaluate all points
in one call to the function.
For improvement:
- Consider measuring the step size actually taken, since `(x + h) - x`
is not identically equal to `h` with floating point arithmetic.
- Adjust the step size automatically if `x` is too big to resolve the
step.
- We could probably save some work if there are no central difference
steps or no one-sided steps.
"""
n = work.terms # half the order
h = work.h # step size
c = work.fac # step reduction factor
d = c**0.5 # square root of step reduction factor (one-sided stencil)
# Note - no need to be careful about dtypes until we allocate `x_eval`
if work.nit == 0:
hc = h / c**np.arange(n)
hc = np.concatenate((-hc[::-1], hc))
else:
hc = np.asarray([-h, h]) / c**(n-1)
if work.nit == 0:
hr = h / d**np.arange(2*n)
else:
hr = np.asarray([h, h/d]) / c**(n-1)
n_new = 2*n if work.nit == 0 else 2 # number of new abscissae
x_eval = np.zeros((len(work.hdir), n_new), dtype=work.dtype)
il, ic, ir = work.il, work.ic, work.ir
x_eval[ir] = work.x[ir, np.newaxis] + hr
x_eval[ic] = work.x[ic, np.newaxis] + hc
x_eval[il] = work.x[il, np.newaxis] - hr
return x_eval
def post_func_eval(x, f, work):
""" Estimate the derivative and error from the function evaluations
As in `pre_func_eval`: in the first iteration, there is only one stored
function value in `work.fs`, `f(x)`, so we need to add the `order` new
points. In subsequent iterations, we add two new points. The tricky
part is getting the order to match that of the weights, which is
described in `_differentiate_weights`.
For improvement:
- Change the order of the weights (and steps in `pre_func_eval`) to
simplify `work_fc` concatenation and eliminate `fc` concatenation.
- It would be simple to do one-step Richardson extrapolation with `df`
and `df_last` to increase the order of the estimate and/or improve
the error estimate.
- Process the function evaluations in a more numerically favorable
way. For instance, combining the pairs of central difference evals
into a second-order approximation and using Richardson extrapolation
to produce a higher order approximation seemed to retain accuracy up
to very high order.
- Alternatively, we could use `polyfit` like Jacobi. An advantage of
fitting a polynomial to more points than necessary is improved noise
tolerance.
"""
n = work.terms
n_new = n if work.nit == 0 else 1
il, ic, io = work.il, work.ic, work.io
# Central difference
# `work_fc` is *all* the points at which the function has been evaluated
# `fc` is the points we're using *this iteration* to produce the estimate
work_fc = (f[ic, :n_new], work.fs[ic, :], f[ic, -n_new:])
work_fc = np.concatenate(work_fc, axis=-1)
if work.nit == 0:
fc = work_fc
else:
fc = (work_fc[:, :n], work_fc[:, n:n+1], work_fc[:, -n:])
fc = np.concatenate(fc, axis=-1)
# One-sided difference
work_fo = np.concatenate((work.fs[io, :], f[io, :]), axis=-1)
if work.nit == 0:
fo = work_fo
else:
fo = np.concatenate((work_fo[:, 0:1], work_fo[:, -2*n:]), axis=-1)
work.fs = np.zeros((len(ic), work.fs.shape[-1] + 2*n_new))
work.fs[ic] = work_fc
work.fs[io] = work_fo
wc, wo = _differentiate_weights(work, n)
work.df_last = work.df.copy()
work.df[ic] = fc @ wc / work.h
work.df[io] = fo @ wo / work.h
work.df[il] *= -1
work.h /= work.fac
work.error_last = work.error
# Simple error estimate - the difference in derivative estimates between
# this iteration and the last. This is typically conservative because if
# convergence has begun, the true error is much closer to the difference
# between the current estimate and the *next* estimate. However,
# we could use Richardson extrapolation to produce an error estimate that
# is one order higher, and take the difference between that and
# `work.df` (which would just be a constant factor that depends on `fac`).
work.error = abs(work.df - work.df_last)
def check_termination(work):
"""Terminate due to convergence, non-finite values, or error increase"""
stop = np.zeros_like(work.df).astype(bool)
i = work.error < work.atol + work.rtol*abs(work.df)
work.status[i] = eim._ECONVERGED
stop[i] = True
if work.nit > 0:
i = ~((np.isfinite(work.x) & np.isfinite(work.df)) | stop)
work.df[i], work.status[i] = np.nan, eim._EVALUEERR
stop[i] = True
# With infinite precision, there is a step size below which
# all smaller step sizes will reduce the error. But in floating point
# arithmetic, catastrophic cancellation will begin to cause the error
# to increase again. This heuristic tries to avoid step sizes that are
# too small. There may be more theoretically sound approaches for
# detecting a step size that minimizes the total error, but this
# heuristic seems simple and effective.
i = (work.error > work.error_last*10) & ~stop
work.status[i] = _EERRORINCREASE
stop[i] = True
return stop
def post_termination_check(work):
return
def customize_result(res, shape):
return shape
return eim._loop(work, callback, shape, maxiter, func, args, dtype,
pre_func_eval, post_func_eval, check_termination,
post_termination_check, customize_result, res_work_pairs,
preserve_shape)
def _differentiate_weights(work, n):
# This produces the weights of the finite difference formula for a given
# stencil. In experiments, use of a second-order central difference formula
# with Richardson extrapolation was more accurate numerically, but it was
# more complicated, and it would have become even more complicated when
# adding support for one-sided differences. However, now that all the
# function evaluation values are stored, they can be processed in whatever
# way is desired to produce the derivative estimate. We leave alternative
# approaches to future work. To be more self-contained, here is the theory
# for deriving the weights below.
#
# Recall that the Taylor expansion of a univariate, scalar-valued function
# about a point `x` may be expressed as:
# f(x + h) = f(x) + f'(x)*h + f''(x)/2!*h**2 + O(h**3)
# Suppose we evaluate f(x), f(x+h), and f(x-h). We have:
# f(x) = f(x)
# f(x + h) = f(x) + f'(x)*h + f''(x)/2!*h**2 + O(h**3)
# f(x - h) = f(x) - f'(x)*h + f''(x)/2!*h**2 + O(h**3)
# We can solve for weights `wi` such that:
# w1*f(x) = w1*(f(x))
# + w2*f(x + h) = w2*(f(x) + f'(x)*h + f''(x)/2!*h**2) + O(h**3)
# + w3*f(x - h) = w3*(f(x) - f'(x)*h + f''(x)/2!*h**2) + O(h**3)
# = 0 + f'(x)*h + 0 + O(h**3)
# Then
# f'(x) ~ (w1*f(x) + w2*f(x+h) + w3*f(x-h))/h
# is a finite difference derivative approximation with error O(h**2),
# and so it is said to be a "second-order" approximation. Under certain
# conditions (e.g. well-behaved function, `h` sufficiently small), the
# error in the approximation will decrease with h**2; that is, if `h` is
# reduced by a factor of 2, the error is reduced by a factor of 4.
#
# By default, we use eighth-order formulae. Our central-difference formula
# uses abscissae:
# x-h/c**3, x-h/c**2, x-h/c, x-h, x, x+h, x+h/c, x+h/c**2, x+h/c**3
# where `c` is the step factor. (Typically, the step factor is greater than
# one, so the outermost points - as written above - are actually closest to
# `x`.) This "stencil" is chosen so that each iteration, the step can be
# reduced by the factor `c`, and most of the function evaluations can be
# reused with the new step size. For example, in the next iteration, we
# will have:
# x-h/c**4, x-h/c**3, x-h/c**2, x-h/c, x, x+h/c, x+h/c**2, x+h/c**3, x+h/c**4
# We do not reuse `x-h` and `x+h` for the new derivative estimate.
# While this would increase the order of the formula and thus the
# theoretical convergence rate, it is also less stable numerically.
# (As noted above, there are other ways of processing the values that are
# more stable. Thus, even now we store `f(x-h)` and `f(x+h)` in `work.fs`
# to simplify future development of this sort of improvement.)
#
# The (right) one-sided formula is produced similarly using abscissae
# x, x+h, x+h/d, x+h/d**2, ..., x+h/d**6, x+h/d**7
# where `d` is the square root of `c`. (The left one-sided formula simply
# uses -h.) When the step size is reduced by factor `c = d**2`, we have
# abscissae:
# x, x+h/d**2, x+h/d**3, ..., x+h/d**8, x+h/d**9
# `d` is chosen as the square root of `c` so that the rate of the step-size
# reduction is the same per iteration as in the central difference case.
# Note that because the central difference formulas are inherently of even
# order, for simplicity, we use only even-order formulas for one-sided
# differences, too.
# It's possible for the user to specify `fac` in, say, double precision but
# `x` and `args` in single precision. `fac` gets converted to single
# precision, but we should always use double precision for the intermediate
# calculations here to avoid additional error in the weights.
fac = work.fac.astype(np.float64)
# Note that if the user switches back to double precision with
# `x` and `args`, then `fac` will not necessarily equal the (lower
# precision) cached `_differentiate_weights.fac`, and the weights will
# need to be recalculated. This could be fixed, but it's late, and of
# low consequence.
if fac != _differentiate_weights.fac:
_differentiate_weights.central = []
_differentiate_weights.right = []
_differentiate_weights.fac = fac
if len(_differentiate_weights.central) != 2*n + 1:
# Central difference weights. Consider refactoring this; it could
# probably be more compact.
i = np.arange(-n, n + 1)
p = np.abs(i) - 1. # center point has power `p` -1, but sign `s` is 0
s = np.sign(i)
h = s / fac ** p
A = np.vander(h, increasing=True).T
b = np.zeros(2*n + 1)
b[1] = 1
weights = np.linalg.solve(A, b)
# Enforce identities to improve accuracy
weights[n] = 0
for i in range(n):
weights[-i-1] = -weights[i]
# Cache the weights. We only need to calculate them once unless
# the step factor changes.
_differentiate_weights.central = weights
# One-sided difference weights. The left one-sided weights (with
# negative steps) are simply the negative of the right one-sided
# weights, so no need to compute them separately.
i = np.arange(2*n + 1)
p = i - 1.
s = np.sign(i)
h = s / np.sqrt(fac) ** p
A = np.vander(h, increasing=True).T
b = np.zeros(2 * n + 1)
b[1] = 1
weights = np.linalg.solve(A, b)
_differentiate_weights.right = weights
return (_differentiate_weights.central.astype(work.dtype, copy=False),
_differentiate_weights.right.astype(work.dtype, copy=False))
_differentiate_weights.central = []
_differentiate_weights.right = []
_differentiate_weights.fac = None
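# Illustration (not part of the module): a minimal sketch of the weight
# derivation described in the comments of `_differentiate_weights`, applied to
# the simplest stencil f(x), f(x+h), f(x-h). The helper names
# `stencil_weights` and `estimate` are hypothetical and exist only for this
# example. It solves the same kind of Vandermonde system for the weights and
# then checks that halving `h` reduces the error by roughly a factor of 4, as
# expected of a second-order formula.

```python
import numpy as np


def stencil_weights(offsets):
    """Weights w such that sum(w_i * f(x + s_i*h)) / h approximates f'(x).

    Hypothetical helper for illustration; `offsets` holds the step
    multiples s_i of the stencil.
    """
    s = np.asarray(offsets, dtype=np.float64)
    # Row k of A holds s_i**k, the coefficient of f^(k)(x) * h**k / k! in the
    # Taylor expansion of f(x + s_i*h).
    A = np.vander(s, increasing=True).T
    # Require the weighted sum to reproduce f'(x)*h and cancel the other terms.
    b = np.zeros(len(s))
    b[1] = 1.0
    return np.linalg.solve(A, b)


w = stencil_weights([0.0, 1.0, -1.0])  # stencil: f(x), f(x+h), f(x-h)


def estimate(f, x, h):
    # Apply the weights to the function values on the stencil.
    return w @ np.array([f(x), f(x + h), f(x - h)]) / h


# Compare the true error at step h and at step h/2.
err_h = abs(estimate(np.exp, 1.0, 1e-2) - np.exp(1.0))
err_half = abs(estimate(np.exp, 1.0, 5e-3) - np.exp(1.0))
ratio = err_h / err_half
print(w, ratio)
```

# For this stencil the solve recovers the familiar central difference
# (f(x+h) - f(x-h)) / (2*h), i.e. weights 0, 1/2, and -1/2.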
@@ -0,0 +1,278 @@
from __future__ import annotations
from typing import ( # noqa: UP035
Any, Callable, Iterable, TYPE_CHECKING
)
import numpy as np
from scipy.optimize import OptimizeResult
from ._constraints import old_bound_to_new, Bounds
from ._direct import direct as _direct # type: ignore
if TYPE_CHECKING:
import numpy.typing as npt
__all__ = ['direct']
ERROR_MESSAGES = (
"Number of function evaluations done is larger than maxfun={}",
"Number of iterations is larger than maxiter={}",
"u[i] < l[i] for some i",
"maxfun is too large",
"Initialization failed",
"There was an error in the creation of the sample points",
"An error occurred while the function was sampled",
"Maximum number of levels has been reached.",
"Forced stop",
"Invalid arguments",
"Out of memory",
)
SUCCESS_MESSAGES = (
("The best function value found is within a relative error={} "
"of the (known) global optimum f_min"),
("The volume of the hyperrectangle containing the lowest function value "
"found is below vol_tol={}"),
("The side length measure of the hyperrectangle containing the lowest "
"function value found is below len_tol={}"),
)
def direct(
func: Callable[[npt.ArrayLike, tuple[Any]], float],
bounds: Iterable | Bounds,
*,
args: tuple = (),
eps: float = 1e-4,
maxfun: int | None = None,
maxiter: int = 1000,
locally_biased: bool = True,
f_min: float = -np.inf,
f_min_rtol: float = 1e-4,
vol_tol: float = 1e-16,
len_tol: float = 1e-6,
callback: Callable[[npt.ArrayLike], None] | None = None
) -> OptimizeResult:
"""
Finds the global minimum of a function using the
DIRECT algorithm.
Parameters
----------
func : callable
The objective function to be minimized.
``func(x, *args) -> float``
where ``x`` is a 1-D array with shape (n,) and ``args`` is a tuple of
the fixed parameters needed to completely specify the function.
bounds : sequence or `Bounds`
Bounds for variables. There are two ways to specify the bounds:
1. Instance of `Bounds` class.
2. ``(min, max)`` pairs for each element in ``x``.
args : tuple, optional
Any additional fixed parameters needed to
completely specify the objective function.
eps : float, optional
Minimal required difference of the objective function values
between the current best hyperrectangle and the next potentially
optimal hyperrectangle to be divided. In consequence, `eps` serves as a
tradeoff between local and global search: the smaller, the more local
the search becomes. Default is 1e-4.
maxfun : int or None, optional
Approximate upper bound on objective function evaluations.
If `None`, will be automatically set to ``1000 * N`` where ``N``
represents the number of dimensions. Will be capped if necessary to
limit DIRECT's RAM usage to approximately 1 GiB. This will only occur for very
high dimensional problems and excessive `maxfun`. Default is `None`.
maxiter : int, optional
Maximum number of iterations. Default is 1000.
locally_biased : bool, optional
If `True` (default), use the locally biased variant of the
algorithm known as DIRECT_L. If `False`, use the original unbiased
DIRECT algorithm. For hard problems with many local minima,
`False` is recommended.
f_min : float, optional
Function value of the global optimum. Set this value only if the
global optimum is known. Default is ``-np.inf``, so that this
termination criterion is deactivated.
f_min_rtol : float, optional
Terminate the optimization once the relative error between the
current best minimum `f` and the supplied global minimum `f_min`
is smaller than `f_min_rtol`. This parameter is only used if
`f_min` is also set. Must lie between 0 and 1. Default is 1e-4.
vol_tol : float, optional
Terminate the optimization once the volume of the hyperrectangle
containing the lowest function value is smaller than `vol_tol`
of the complete search space. Must lie between 0 and 1.
Default is 1e-16.
len_tol : float, optional
If `locally_biased=True`, terminate the optimization once half of
the normalized maximal side length of the hyperrectangle containing
the lowest function value is smaller than `len_tol`.
If `locally_biased=False`, terminate the optimization once half of
the normalized diagonal of the hyperrectangle containing the lowest
function value is smaller than `len_tol`. Must lie between 0 and 1.
Default is 1e-6.
callback : callable, optional
A callback function with signature ``callback(xk)`` where ``xk``
represents the best solution (point in the search space) found so far.
Returns
-------
res : OptimizeResult
The optimization result represented as an ``OptimizeResult`` object.
Important attributes are: ``x`` the solution array, ``success`` a
Boolean flag indicating if the optimizer exited successfully and
``message`` which describes the cause of the termination. See
`OptimizeResult` for a description of other attributes.
Notes
-----
DIviding RECTangles (DIRECT) is a deterministic global
optimization algorithm capable of minimizing a black box function with
its variables subject to lower and upper bound constraints by sampling
potential solutions in the search space [1]_. The algorithm starts by
normalising the search space to an n-dimensional unit hypercube.
It samples the function at the center of this hypercube and at 2n
(n is the number of variables) more points, 2 in each coordinate
direction. Using these function values, DIRECT then divides the
domain into hyperrectangles, each having exactly one of the sampling
points as its center. In each iteration, DIRECT chooses, using the `eps`
parameter which defaults to 1e-4, some of the existing hyperrectangles
to be further divided. This division process continues until either the
maximum number of iterations or maximum function evaluations allowed
are exceeded, or the hyperrectangle containing the minimal value found
so far becomes small enough. If `f_min` is specified, the optimization
will stop once this function value is reached within a relative tolerance.
The locally biased variant of DIRECT (originally called DIRECT_L) [2]_ is
used by default. It makes the search more locally biased and more
efficient for cases with only a few local minima.
A note about termination criteria: `vol_tol` refers to the volume of the
hyperrectangle containing the lowest function value found so far. This
volume decreases exponentially with increasing dimensionality of the
problem. Therefore `vol_tol` should be decreased to avoid premature
termination of the algorithm for higher dimensions. This does not hold
for `len_tol`: it refers either to half of the maximal side length
(for ``locally_biased=True``) or half of the diagonal of the
hyperrectangle (for ``locally_biased=False``).
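As a rough illustration of this scaling, consider a sketch under a
simplifying assumption (it does not describe DIRECT's internal bookkeeping):
suppose the cell containing the best point is a hypercube in the normalized
search space, and every side has been trisected five times. Its side length
is then ``3**-5`` regardless of dimension, while its volume is ``3**(-5*n)``.
The variable names below are chosen for this example only.

```python
vol_tol = 1e-16   # default volume tolerance of `direct`
len_tol = 1e-6    # default side-length tolerance of `direct`
side = 3.0 ** -5  # assumed normalized side length after five trisections

dims = [2, 6, 7, 10]
volumes = [side ** n for n in dims]
below_vol_tol = [v < vol_tol for v in volumes]

for n, v, hit in zip(dims, volumes, below_vol_tol):
    # The length criterion compares half the side length with `len_tol`.
    print(n, v, hit, side / 2 < len_tol)
```

Under this assumption the same cell falls below the default `vol_tol` once
the problem has seven or more variables, while the length measure is the
same in every dimension and remains far above `len_tol`.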
This code is based on the DIRECT 2.0.4 Fortran code by Gablonsky et al. at
https://ctk.math.ncsu.edu/SOFTWARE/DIRECTv204.tar.gz .
This original version was initially converted via f2c and then cleaned up
and reorganized by Steven G. Johnson, August 2007, for the NLopt project.
The `direct` function wraps the C implementation.
.. versionadded:: 1.9.0
References
----------
.. [1] Jones, D.R., Perttunen, C.D. & Stuckman, B.E. Lipschitzian
optimization without the Lipschitz constant. J Optim Theory Appl
79, 157-181 (1993).
.. [2] Gablonsky, J., Kelley, C. A Locally-Biased form of the DIRECT
Algorithm. Journal of Global Optimization 21, 27-37 (2001).
Examples
--------
The following example is a 2-D problem with four local minima: minimizing
the Styblinski-Tang function
(https://en.wikipedia.org/wiki/Test_functions_for_optimization).
>>> from scipy.optimize import direct, Bounds
>>> def styblinski_tang(pos):
... x, y = pos
... return 0.5 * (x**4 - 16*x**2 + 5*x + y**4 - 16*y**2 + 5*y)
>>> bounds = Bounds([-4., -4.], [4., 4.])
>>> result = direct(styblinski_tang, bounds)
>>> result.x, result.fun, result.nfev
array([-2.90321597, -2.90321597]), -78.3323279095383, 2011
The correct global minimum was found but with a huge number of function
evaluations (2011). Loosening the termination tolerances `vol_tol` and
`len_tol` can be used to stop DIRECT earlier.
>>> result = direct(styblinski_tang, bounds, len_tol=1e-3)
>>> result.x, result.fun, result.nfev
array([-2.9044353, -2.9044353]), -78.33230330754142, 207
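If the value of the global minimum is known in advance, `f_min` and
`f_min_rtol` offer another way to stop early. The following is a minimal
sketch that assumes ``-78.3323`` (rounded from the result reported above) as
the known optimum; the name ``f_known`` is introduced for this example only,
and the number of evaluations actually needed may vary between versions.

```python
from scipy.optimize import direct, Bounds


def styblinski_tang(pos):
    x, y = pos
    return 0.5 * (x**4 - 16*x**2 + 5*x + y**4 - 16*y**2 + 5*y)


bounds = Bounds([-4., -4.], [4., 4.])
f_known = -78.3323  # assumed known global minimum value (rounded)

# Stop as soon as the best value is within a relative error of 1e-4
# of the supplied optimum.
result = direct(styblinski_tang, bounds, f_min=f_known, f_min_rtol=1e-4)
print(result.message)
print(result.x, result.fun, result.nfev)
```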
"""
# convert bounds to new Bounds class if necessary
if not isinstance(bounds, Bounds):
if isinstance(bounds, (list, tuple)):
lb, ub = old_bound_to_new(bounds)
bounds = Bounds(lb, ub)
else:
message = ("bounds must be a sequence or "
"instance of Bounds class")
raise ValueError(message)
lb = np.ascontiguousarray(bounds.lb, dtype=np.float64)
ub = np.ascontiguousarray(bounds.ub, dtype=np.float64)
# validate bounds
# check that lower bounds are smaller than upper bounds
if not np.all(lb < ub):
raise ValueError('Bounds are not consistent min < max')
# check for infs
if (np.any(np.isinf(lb)) or np.any(np.isinf(ub))):
raise ValueError("Bounds must not be inf.")
# validate tolerances
if (vol_tol < 0 or vol_tol > 1):
raise ValueError("vol_tol must be between 0 and 1.")
if (len_tol < 0 or len_tol > 1):
raise ValueError("len_tol must be between 0 and 1.")
if (f_min_rtol < 0 or f_min_rtol > 1):
raise ValueError("f_min_rtol must be between 0 and 1.")
# validate maxfun and maxiter
if maxfun is None:
maxfun = 1000 * lb.shape[0]
if not isinstance(maxfun, int):
raise ValueError("maxfun must be of type int.")
if maxfun < 0:
raise ValueError("maxfun must be >= 0.")
if not isinstance(maxiter, int):
raise ValueError("maxiter must be of type int.")
if maxiter < 0:
raise ValueError("maxiter must be >= 0.")
# validate boolean parameters
if not isinstance(locally_biased, bool):
raise ValueError("locally_biased must be True or False.")
def _func_wrap(x, args=None):
x = np.asarray(x)
if args is None:
f = func(x)
else:
f = func(x, *args)
# always return a float
return np.asarray(f).item()
# TODO: fix disp argument
x, fun, ret_code, nfev, nit = _direct(
_func_wrap,
np.asarray(lb), np.asarray(ub),
args,
False, eps, maxfun, maxiter,
locally_biased,
f_min, f_min_rtol,
vol_tol, len_tol, callback
)
format_val = (maxfun, maxiter, f_min_rtol, vol_tol, len_tol)
if ret_code > 2:
message = SUCCESS_MESSAGES[ret_code - 3].format(
format_val[ret_code - 1])
elif 0 < ret_code <= 2:
message = ERROR_MESSAGES[ret_code - 1].format(format_val[ret_code - 1])
elif 0 > ret_code > -100:
message = ERROR_MESSAGES[abs(ret_code) + 1]
else:
message = ERROR_MESSAGES[ret_code + 99]
return OptimizeResult(x=np.asarray(x), fun=fun, status=ret_code,
success=ret_code > 2, message=message,
nfev=nfev, nit=nit)
@@ -0,0 +1,715 @@
# Dual Annealing implementation.
# Copyright (c) 2018 Sylvain Gubian <sylvain.gubian@pmi.com>,
# Yang Xiang <yang.xiang@pmi.com>
# Author: Sylvain Gubian, Yang Xiang, PMP S.A.
"""
A Dual Annealing global optimization algorithm
"""
import numpy as np
from scipy.optimize import OptimizeResult
from scipy.optimize import minimize, Bounds
from scipy.special import gammaln
from scipy._lib._util import check_random_state
from scipy.optimize._constraints import new_bounds_to_old
__all__ = ['dual_annealing']
class VisitingDistribution:
"""
Class used to generate new coordinates based on the distorted
Cauchy-Lorentz distribution. Depending on the steps within the strategy
chain, the class implements the strategy for generating new location
changes.
Parameters
----------
lb : array_like
A 1-D NumPy ndarray containing lower bounds of the generated
components. Neither NaN nor inf is allowed.
ub : array_like
A 1-D NumPy ndarray containing upper bounds for the generated
components. Neither NaN nor inf is allowed.
visiting_param : float
Parameter for visiting distribution. Default value is 2.62.
Higher values give the visiting distribution a heavier tail, which
makes the algorithm jump to more distant regions.
The value range is (1, 3]. Its value is fixed for the life of the
object.
rand_gen : {`~numpy.random.RandomState`, `~numpy.random.Generator`}
A `~numpy.random.RandomState` or `~numpy.random.Generator` instance
used to draw the random samples.
"""
TAIL_LIMIT = 1.e8
MIN_VISIT_BOUND = 1.e-10
def __init__(self, lb, ub, visiting_param, rand_gen):
# if you wish to make _visiting_param adjustable during the life of
# the object then _factor2, _factor3, _factor5, _d1, _factor6 will
# have to be dynamically calculated in `visit_fn`. They're factored
# out here so they don't need to be recalculated all the time.
self._visiting_param = visiting_param
self.rand_gen = rand_gen
self.lower = lb
self.upper = ub
self.bound_range = ub - lb
# these are invariant numbers unless visiting_param changes
self._factor2 = np.exp((4.0 - self._visiting_param) * np.log(
self._visiting_param - 1.0))
self._factor3 = np.exp((2.0 - self._visiting_param) * np.log(2.0)
/ (self._visiting_param - 1.0))
self._factor4_p = np.sqrt(np.pi) * self._factor2 / (self._factor3 * (
3.0 - self._visiting_param))
self._factor5 = 1.0 / (self._visiting_param - 1.0) - 0.5
self._d1 = 2.0 - self._factor5
self._factor6 = np.pi * (1.0 - self._factor5) / np.sin(
np.pi * (1.0 - self._factor5)) / np.exp(gammaln(self._d1))
def visiting(self, x, step, temperature):
""" Based on the step in the strategy chain, new coordinates are
generated by changing all components at the same time or only
one of them. The new values are computed with the `visit_fn` method.
"""
dim = x.size
if step < dim:
# Changing all coordinates with a new visiting value
visits = self.visit_fn(temperature, dim)
upper_sample, lower_sample = self.rand_gen.uniform(size=2)
visits[visits > self.TAIL_LIMIT] = self.TAIL_LIMIT * upper_sample
visits[visits < -self.TAIL_LIMIT] = -self.TAIL_LIMIT * lower_sample
x_visit = visits + x
a = x_visit - self.lower
b = np.fmod(a, self.bound_range) + self.bound_range
x_visit = np.fmod(b, self.bound_range) + self.lower
x_visit[np.fabs(
x_visit - self.lower) < self.MIN_VISIT_BOUND] += self.MIN_VISIT_BOUND
else:
# Changing only one coordinate at a time based on strategy
# chain step
x_visit = np.copy(x)
visit = self.visit_fn(temperature, 1)[0]
if visit > self.TAIL_LIMIT:
visit = self.TAIL_LIMIT * self.rand_gen.uniform()
elif visit < -self.TAIL_LIMIT:
visit = -self.TAIL_LIMIT * self.rand_gen.uniform()
index = step - dim
x_visit[index] = visit + x[index]
a = x_visit[index] - self.lower[index]
b = np.fmod(a, self.bound_range[index]) + self.bound_range[index]
x_visit[index] = np.fmod(b, self.bound_range[
index]) + self.lower[index]
if np.fabs(x_visit[index] - self.lower[
index]) < self.MIN_VISIT_BOUND:
x_visit[index] += self.MIN_VISIT_BOUND
return x_visit
def visit_fn(self, temperature, dim):
""" Formula Visita from p. 405 of reference [2] """
x, y = self.rand_gen.normal(size=(dim, 2)).T
factor1 = np.exp(np.log(temperature) / (self._visiting_param - 1.0))
factor4 = self._factor4_p * factor1
# sigmax
x *= np.exp(-(self._visiting_param - 1.0) * np.log(
self._factor6 / factor4) / (3.0 - self._visiting_param))
den = np.exp((self._visiting_param - 1.0) * np.log(np.fabs(y)) /
(3.0 - self._visiting_param))
return x / den
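# Illustration (not part of the module): a minimal sketch of the periodic
# wrap-around that `visiting` applies to keep a proposed point inside the
# bounds. The function name `wrap_into_bounds` and the sample values are
# hypothetical. Because `np.fmod` keeps the sign of the dividend, the first
# call can return a negative remainder; adding `bound_range` and taking the
# remainder again maps every coordinate into [lower, upper).

```python
import numpy as np


def wrap_into_bounds(x, lower, upper):
    """Wrap `x` periodically into [lower, upper) using two `np.fmod` calls."""
    bound_range = upper - lower
    a = x - lower
    # Remainder in (-bound_range, bound_range), then shifted to be positive.
    b = np.fmod(a, bound_range) + bound_range
    # Second remainder lands in [0, bound_range); shift back by `lower`.
    return np.fmod(b, bound_range) + lower


lower = np.array([-1.0, 0.0])
upper = np.array([1.0, 10.0])

outside = np.array([3.5, -4.0])  # both coordinates violate the bounds
inside = np.array([0.25, 3.0])   # already feasible

wrapped = wrap_into_bounds(outside, lower, upper)
unchanged = wrap_into_bounds(inside, lower, upper)
print(wrapped, unchanged)
```

# Points that already satisfy the bounds pass through unchanged, so the same
# expression can be applied to every proposal without a feasibility check.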
class EnergyState:
"""
Class used to record the energy state. At any time, it knows the
current coordinates and the best location found so far.
Parameters
----------
lower : array_like
A 1-D NumPy ndarray containing lower bounds for generating the initial
random components in the `reset` method.
upper : array_like
A 1-D NumPy ndarray containing upper bounds for generating the initial
random components in the `reset` method.
Neither NaN nor inf is allowed.
callback : callable, ``callback(x, f, context)``, optional
A callback function which will be called for all minima found.
``x`` and ``f`` are the coordinates and function value of the
latest minimum found, and `context` has value in [0, 1, 2]
"""
# Maximum number of trials for generating a valid starting point
MAX_REINIT_COUNT = 1000
def __init__(self, lower, upper, callback=None):
self.ebest = None
self.current_energy = None
self.current_location = None
self.xbest = None
self.lower = lower
self.upper = upper
self.callback = callback
def reset(self, func_wrapper, rand_gen, x0=None):
"""
Initialize current location in the search domain. If `x0` is not
provided, a random location within the bounds is generated.
"""
if x0 is None:
self.current_location = rand_gen.uniform(self.lower, self.upper,
size=len(self.lower))
else:
self.current_location = np.copy(x0)
init_error = True
reinit_counter = 0
while init_error:
self.current_energy = func_wrapper.fun(self.current_location)
if self.current_energy is None:
raise ValueError('Objective function is returning None')
if (not np.isfinite(self.current_energy) or np.isnan(
self.current_energy)):
if reinit_counter >= EnergyState.MAX_REINIT_COUNT:
init_error = False
message = (
'Stopping algorithm because function '
'create NaN or (+/-) infinity values even with '
'trying new random parameters'
)
raise ValueError(message)
self.current_location = rand_gen.uniform(self.lower,
self.upper,
size=self.lower.size)
reinit_counter += 1
else:
init_error = False
# If first time reset, initialize ebest and xbest
if self.ebest is None and self.xbest is None:
self.ebest = self.current_energy
self.xbest = np.copy(self.current_location)
# Otherwise, we keep them in case of reannealing reset
def update_best(self, e, x, context):
self.ebest = e
self.xbest = np.copy(x)
if self.callback is not None:
val = self.callback(x, e, context)
if val is not None:
if val:
return ('Callback function requested to stop early by '
'returning True')
def update_current(self, e, x):
self.current_energy = e
self.current_location = np.copy(x)
class StrategyChain:
"""
Class that implements within a Markov chain the strategy for location
acceptance and local search decision making.
Parameters
----------
acceptance_param : float
Parameter for acceptance distribution. It is used to control the
probability of acceptance. The lower the acceptance parameter, the
smaller the probability of acceptance. Default value is -5.0 with
a range (-1e4, -5].
visit_dist : VisitingDistribution
Instance of `VisitingDistribution` class.
func_wrapper : ObjectiveFunWrapper
Instance of `ObjectiveFunWrapper` class.
minimizer_wrapper : LocalSearchWrapper
Instance of `LocalSearchWrapper` class.
rand_gen : {None, int, `numpy.random.Generator`,
`numpy.random.RandomState`}, optional
If `seed` is None (or `np.random`), the `numpy.random.RandomState`
singleton is used.
If `seed` is an int, a new ``RandomState`` instance is used,
seeded with `seed`.
If `seed` is already a ``Generator`` or ``RandomState`` instance then
that instance is used.
energy_state : EnergyState
Instance of `EnergyState` class.
"""
def __init__(self, acceptance_param, visit_dist, func_wrapper,
minimizer_wrapper, rand_gen, energy_state):
# Local strategy chain minimum energy and location
self.emin = energy_state.current_energy
self.xmin = np.array(energy_state.current_location)
# Global optimizer state
self.energy_state = energy_state
# Acceptance parameter
self.acceptance_param = acceptance_param
# Visiting distribution instance
self.visit_dist = visit_dist
# Wrapper to objective function
self.func_wrapper = func_wrapper
# Wrapper to the local minimizer
self.minimizer_wrapper = minimizer_wrapper
self.not_improved_idx = 0
self.not_improved_max_idx = 1000
self._rand_gen = rand_gen
self.temperature_step = 0
self.K = 100 * len(energy_state.current_location)
def accept_reject(self, j, e, x_visit):
r = self._rand_gen.uniform()
pqv_temp = 1.0 - ((1.0 - self.acceptance_param) *
(e - self.energy_state.current_energy) / self.temperature_step)
if pqv_temp <= 0.:
pqv = 0.
else:
pqv = np.exp(np.log(pqv_temp) / (
1. - self.acceptance_param))
if r <= pqv:
# We accept the new location and update state
self.energy_state.update_current(e, x_visit)
self.xmin = np.copy(self.energy_state.current_location)
# No improvement for a long time
if self.not_improved_idx >= self.not_improved_max_idx:
if j == 0 or self.energy_state.current_energy < self.emin:
self.emin = self.energy_state.current_energy
self.xmin = np.copy(self.energy_state.current_location)
def run(self, step, temperature):
self.temperature_step = temperature / float(step + 1)
self.not_improved_idx += 1
for j in range(self.energy_state.current_location.size * 2):
if j == 0:
if step == 0:
self.energy_state_improved = True
else:
self.energy_state_improved = False
x_visit = self.visit_dist.visiting(
self.energy_state.current_location, j, temperature)
# Calling the objective function
e = self.func_wrapper.fun(x_visit)
if e < self.energy_state.current_energy:
# We have got a better energy value
self.energy_state.update_current(e, x_visit)
if e < self.energy_state.ebest:
val = self.energy_state.update_best(e, x_visit, 0)
if val is not None:
if val:
return val
self.energy_state_improved = True
self.not_improved_idx = 0
else:
# We have not improved but do we accept the new location?
self.accept_reject(j, e, x_visit)
if self.func_wrapper.nfev >= self.func_wrapper.maxfun:
return ('Maximum number of function call reached '
'during annealing')
# End of StrategyChain loop
def local_search(self):
# Decision making for performing a local search
# based on strategy chain results
# If the energy has improved, or there has been no improvement for too
# long, perform a local search from the best strategy chain location
if self.energy_state_improved:
# Global energy has improved, let's see if LS improves further
e, x = self.minimizer_wrapper.local_search(self.energy_state.xbest,
self.energy_state.ebest)
if e < self.energy_state.ebest:
self.not_improved_idx = 0
val = self.energy_state.update_best(e, x, 1)
if val is not None:
if val:
return val
self.energy_state.update_current(e, x)
if self.func_wrapper.nfev >= self.func_wrapper.maxfun:
return ('Maximum number of function call reached '
'during local search')
# Check probability of a need to perform a LS even if no improvement
do_ls = False
if self.K < 90 * len(self.energy_state.current_location):
pls = np.exp(self.K * (
self.energy_state.ebest - self.energy_state.current_energy) /
self.temperature_step)
if pls >= self._rand_gen.uniform():
do_ls = True
# Global energy not improved, let's see what LS gives
# on the best strategy chain location
if self.not_improved_idx >= self.not_improved_max_idx:
do_ls = True
if do_ls:
e, x = self.minimizer_wrapper.local_search(self.xmin, self.emin)
self.xmin = np.copy(x)
self.emin = e
self.not_improved_idx = 0
self.not_improved_max_idx = self.energy_state.current_location.size
if e < self.energy_state.ebest:
val = self.energy_state.update_best(
self.emin, self.xmin, 2)
if val is not None:
if val:
return val
self.energy_state.update_current(e, x)
if self.func_wrapper.nfev >= self.func_wrapper.maxfun:
return ('Maximum number of function call reached '
'during dual annealing')
class ObjectiveFunWrapper:
def __init__(self, func, maxfun=1e7, *args):
self.func = func
self.args = args
# Number of objective function evaluations
self.nfev = 0
# Number of gradient evaluations, if a gradient is used
self.ngev = 0
# Number of Hessian evaluations, if a Hessian is used
self.nhev = 0
self.maxfun = maxfun
def fun(self, x):
self.nfev += 1
return self.func(x, *self.args)
class LocalSearchWrapper:
"""
Class used to wrap the minimizer used for local search.
The default local minimizer is the SciPy L-BFGS-B minimizer.
"""
LS_MAXITER_RATIO = 6
LS_MAXITER_MIN = 100
LS_MAXITER_MAX = 1000
def __init__(self, search_bounds, func_wrapper, *args, **kwargs):
self.func_wrapper = func_wrapper
self.kwargs = kwargs
self.jac = self.kwargs.get('jac', None)
self.minimizer = minimize
bounds_list = list(zip(*search_bounds))
self.lower = np.array(bounds_list[0])
self.upper = np.array(bounds_list[1])
# If no minimizer specified, use SciPy minimize with 'L-BFGS-B' method
if not self.kwargs:
n = len(self.lower)
ls_max_iter = min(max(n * self.LS_MAXITER_RATIO,
self.LS_MAXITER_MIN),
self.LS_MAXITER_MAX)
self.kwargs['method'] = 'L-BFGS-B'
self.kwargs['options'] = {
'maxiter': ls_max_iter,
}
self.kwargs['bounds'] = list(zip(self.lower, self.upper))
elif callable(self.jac):
def wrapped_jac(x):
return self.jac(x, *args)
self.kwargs['jac'] = wrapped_jac
def local_search(self, x, e):
# Run local search from the given x location where energy value is e
x_tmp = np.copy(x)
mres = self.minimizer(self.func_wrapper.fun, x, **self.kwargs)
if 'njev' in mres:
self.func_wrapper.ngev += mres.njev
if 'nhev' in mres:
self.func_wrapper.nhev += mres.nhev
# Check whether the result is finite and within bounds
is_finite = np.all(np.isfinite(mres.x)) and np.isfinite(mres.fun)
in_bounds = np.all(mres.x >= self.lower) and np.all(
mres.x <= self.upper)
is_valid = is_finite and in_bounds
# Use the new point only if it is valid and gives a better result
if is_valid and mres.fun < e:
return mres.fun, mres.x
else:
return e, x_tmp
def dual_annealing(func, bounds, args=(), maxiter=1000,
minimizer_kwargs=None, initial_temp=5230.,
restart_temp_ratio=2.e-5, visit=2.62, accept=-5.0,
maxfun=1e7, seed=None, no_local_search=False,
callback=None, x0=None):
"""
Find the global minimum of a function using Dual Annealing.
Parameters
----------
func : callable
The objective function to be minimized. Must be in the form
``f(x, *args)``, where ``x`` is the argument in the form of a 1-D array
and ``args`` is a tuple of any additional fixed parameters needed to
completely specify the function.
bounds : sequence or `Bounds`
Bounds for variables. There are two ways to specify the bounds:
1. Instance of `Bounds` class.
2. Sequence of ``(min, max)`` pairs for each element in `x`.
args : tuple, optional
Any additional fixed parameters needed to completely specify the
objective function.
maxiter : int, optional
The maximum number of global search iterations. Default value is 1000.
minimizer_kwargs : dict, optional
Extra keyword arguments to be passed to the local minimizer
(`minimize`). Some important options could be:
``method`` for the minimizer method to use and ``args`` for
objective function additional arguments.
initial_temp : float, optional
The initial temperature. Higher values facilitate a wider
search of the energy landscape, allowing dual_annealing to escape
local minima that it is trapped in. Default value is 5230. Range is
(0.01, 5.e4].
restart_temp_ratio : float, optional
During the annealing process the temperature decreases. When it
reaches ``initial_temp * restart_temp_ratio``, the reannealing process
is triggered. Default value of the ratio is 2e-5. Range is (0, 1).
visit : float, optional
Parameter for visiting distribution. Default value is 2.62. Higher
values give the visiting distribution a heavier tail, which makes
the algorithm jump to more distant regions. The value range is (1, 3].
accept : float, optional
Parameter for acceptance distribution. It is used to control the
probability of acceptance. The lower the acceptance parameter, the
smaller the probability of acceptance. Default value is -5.0 with
a range (-1e4, -5].
maxfun : int, optional
Soft limit for the number of objective function calls. If the
algorithm is in the middle of a local search, this number will be
exceeded and the algorithm will stop just after the local search is
done. Default value is 1e7.
seed : {None, int, `numpy.random.Generator`, `numpy.random.RandomState`}, optional
If `seed` is None (or `np.random`), the `numpy.random.RandomState`
singleton is used.
If `seed` is an int, a new ``RandomState`` instance is used,
seeded with `seed`.
If `seed` is already a ``Generator`` or ``RandomState`` instance then
that instance is used.
Specify `seed` for repeatable minimizations. The random numbers
generated with this seed only affect the visiting distribution function
and new coordinates generation.
no_local_search : bool, optional
If `no_local_search` is set to True, a traditional Generalized
Simulated Annealing will be performed with no local search
strategy applied.
callback : callable, optional
A callback function with signature ``callback(x, f, context)``,
which will be called for all minima found.
``x`` and ``f`` are the coordinates and function value of the
latest minimum found, and ``context`` has value in [0, 1, 2], with the
following meaning:
- 0: minimum detected in the annealing process.
- 1: detection occurred in the local search process.
- 2: detection done in the dual annealing process.
If the callback implementation returns True, the algorithm will stop.
x0 : ndarray, shape(n,), optional
Coordinates of a single N-D starting point.
Returns
-------
res : OptimizeResult
The optimization result represented as an `OptimizeResult` object.
Important attributes are: ``x`` the solution array, ``fun`` the value
of the function at the solution, and ``message`` which describes the
cause of the termination.
See `OptimizeResult` for a description of other attributes.
Notes
-----
This function implements the Dual Annealing optimization. This stochastic
approach derived from [3]_ combines the generalization of CSA (Classical
Simulated Annealing) and FSA (Fast Simulated Annealing) [1]_ [2]_ coupled
to a strategy for applying a local search on accepted locations [4]_.
An alternative implementation of this same algorithm is described in [5]_
and benchmarks are presented in [6]_. This approach introduces an advanced
method to refine the solution found by the generalized annealing
process. This algorithm uses a distorted Cauchy-Lorentz visiting
distribution, with its shape controlled by the parameter :math:`q_{v}`
.. math::
g_{q_{v}}(\\Delta x(t)) \\propto \\frac{ \\
\\left[T_{q_{v}}(t) \\right]^{-\\frac{D}{3-q_{v}}}}{ \\
\\left[{1+(q_{v}-1)\\frac{(\\Delta x(t))^{2}} { \\
\\left[T_{q_{v}}(t)\\right]^{\\frac{2}{3-q_{v}}}}}\\right]^{ \\
\\frac{1}{q_{v}-1}+\\frac{D-1}{2}}}
Where :math:`t` is the artificial time. This visiting distribution is used
to generate a trial jump distance :math:`\\Delta x(t)` of variable
:math:`x(t)` under artificial temperature :math:`T_{q_{v}}(t)`.
From the starting point, after calling the visiting distribution
function, the acceptance probability is computed as follows:
.. math::
p_{q_{a}} = \\min{\\{1,\\left[1-(1-q_{a}) \\beta \\Delta E \\right]^{ \\
\\frac{1}{1-q_{a}}}\\}}
Where :math:`q_{a}` is an acceptance parameter. For :math:`q_{a}<1`, zero
acceptance probability is assigned to the cases where
.. math::
[1-(1-q_{a}) \\beta \\Delta E] < 0
The artificial temperature :math:`T_{q_{v}}(t)` is decreased according to
.. math::
T_{q_{v}}(t) = T_{q_{v}}(1) \\frac{2^{q_{v}-1}-1}{\\left( \\
1 + t\\right)^{q_{v}-1}-1}
Where :math:`q_{v}` is the visiting parameter.
.. versionadded:: 1.2.0
References
----------
.. [1] Tsallis C. Possible generalization of Boltzmann-Gibbs
statistics. Journal of Statistical Physics, 52, 479-487 (1988).
.. [2] Tsallis C, Stariolo DA. Generalized Simulated Annealing.
Physica A, 233, 395-406 (1996).
.. [3] Xiang Y, Sun DY, Fan W, Gong XG. Generalized Simulated
Annealing Algorithm and Its Application to the Thomson Model.
Physics Letters A, 233, 216-220 (1997).
.. [4] Xiang Y, Gong XG. Efficiency of Generalized Simulated
Annealing. Physical Review E, 62, 4473 (2000).
.. [5] Xiang Y, Gubian S, Suomela B, Hoeng J. Generalized
Simulated Annealing for Efficient Global Optimization: the GenSA
Package for R. The R Journal, Volume 5/1 (2013).
.. [6] Mullen, K. Continuous Global Optimization in R. Journal of
Statistical Software, 60(6), 1 - 45, (2014).
:doi:`10.18637/jss.v060.i06`
Examples
--------
The following example is a 10-D problem, with many local minima.
The function involved is called Rastrigin
(https://en.wikipedia.org/wiki/Rastrigin_function)
>>> import numpy as np
>>> from scipy.optimize import dual_annealing
>>> func = lambda x: np.sum(x*x - 10*np.cos(2*np.pi*x)) + 10*np.size(x)
>>> lw = [-5.12] * 10
>>> up = [5.12] * 10
>>> ret = dual_annealing(func, bounds=list(zip(lw, up)))
>>> ret.x
array([-4.26437714e-09, -3.91699361e-09, -1.86149218e-09, -3.97165720e-09,
-6.29151648e-09, -6.53145322e-09, -3.93616815e-09, -6.55623025e-09,
-6.05775280e-09, -5.00668935e-09]) # random
>>> ret.fun
0.000000
"""
if isinstance(bounds, Bounds):
bounds = new_bounds_to_old(bounds.lb, bounds.ub, len(bounds.lb))
if x0 is not None and not len(x0) == len(bounds):
raise ValueError('Bounds size does not match x0')
lu = list(zip(*bounds))
lower = np.array(lu[0])
upper = np.array(lu[1])
# Check that restart temperature ratio is correct
if restart_temp_ratio <= 0. or restart_temp_ratio >= 1.:
raise ValueError('Restart temperature ratio has to be in range (0, 1)')
# Checking bounds are valid
if (np.any(np.isinf(lower)) or np.any(np.isinf(upper)) or np.any(
np.isnan(lower)) or np.any(np.isnan(upper))):
raise ValueError('Some bounds values are inf values or nan values')
# Checking that bounds are consistent
if not np.all(lower < upper):
raise ValueError('Bounds are not consistent min < max')
# Checking that bounds are the same length
if not len(lower) == len(upper):
raise ValueError('Bounds do not have the same dimensions')
# Wrapper for the objective function
func_wrapper = ObjectiveFunWrapper(func, maxfun, *args)
# minimizer_kwargs has to be a dict, not None
minimizer_kwargs = minimizer_kwargs or {}
minimizer_wrapper = LocalSearchWrapper(
bounds, func_wrapper, *args, **minimizer_kwargs)
# Initialization of random Generator for reproducible runs if seed provided
rand_state = check_random_state(seed)
# Initialization of the energy state
energy_state = EnergyState(lower, upper, callback)
energy_state.reset(func_wrapper, rand_state, x0)
# Minimum value of annealing temperature reached to perform
# re-annealing
temperature_restart = initial_temp * restart_temp_ratio
# VisitingDistribution instance
visit_dist = VisitingDistribution(lower, upper, visit, rand_state)
# Strategy chain instance
strategy_chain = StrategyChain(accept, visit_dist, func_wrapper,
minimizer_wrapper, rand_state, energy_state)
need_to_stop = False
iteration = 0
message = []
# OptimizeResult object to be returned
optimize_res = OptimizeResult()
optimize_res.success = True
optimize_res.status = 0
t1 = np.exp((visit - 1) * np.log(2.0)) - 1.0
# Run the search loop
while not need_to_stop:
for i in range(maxiter):
# Compute temperature for this step
s = float(i) + 2.0
t2 = np.exp((visit - 1) * np.log(s)) - 1.0
temperature = initial_temp * t1 / t2
if iteration >= maxiter:
message.append("Maximum number of iteration reached")
need_to_stop = True
break
# Need a re-annealing process?
if temperature < temperature_restart:
energy_state.reset(func_wrapper, rand_state)
break
# starting strategy chain
val = strategy_chain.run(i, temperature)
if val is not None:
message.append(val)
need_to_stop = True
optimize_res.success = False
break
# Possible local search at the end of the strategy chain
if not no_local_search:
val = strategy_chain.local_search()
if val is not None:
message.append(val)
need_to_stop = True
optimize_res.success = False
break
iteration += 1
# Setting the OptimizeResult values
optimize_res.x = energy_state.xbest
optimize_res.fun = energy_state.ebest
optimize_res.nit = iteration
optimize_res.nfev = func_wrapper.nfev
optimize_res.njev = func_wrapper.ngev
optimize_res.nhev = func_wrapper.nhev
optimize_res.message = message
return optimize_res
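As a hedged illustration of the search loop above, the following minimal sketch reproduces the temperature schedule and the acceptance rule in plain Python. The helper names `visiting_temperature` and `acceptance_probability` are hypothetical and are not part of SciPy. The formulas mirror the loop in `dual_annealing` and `StrategyChain.accept_reject`, but the sketch leaves out the visiting distribution, the local search and the reannealing logic.

```python
import math


def visiting_temperature(initial_temp, visit, step):
    """Temperature at a 0-based annealing step (hypothetical helper).

    Mirrors ``temperature = initial_temp * t1 / t2`` from the search loop,
    where ``visit`` is the visiting parameter q_v.
    """
    t1 = math.exp((visit - 1.0) * math.log(2.0)) - 1.0
    t2 = math.exp((visit - 1.0) * math.log(step + 2.0)) - 1.0
    return initial_temp * t1 / t2


def acceptance_probability(accept, delta_e, temperature_step):
    """Probability of accepting a point that is worse by ``delta_e``.

    ``accept`` is the acceptance parameter q_a (negative by default) and
    ``temperature_step`` is ``temperature / (step + 1)``, as in ``run``.
    """
    pqv_temp = 1.0 - (1.0 - accept) * delta_e / temperature_step
    if pqv_temp <= 0.0:
        # Zero acceptance probability when the bracket is not positive
        return 0.0
    return math.exp(math.log(pqv_temp) / (1.0 - accept))


# Show how quickly the schedule cools with the default parameters
for step in (0, 1, 10, 100):
    temperature = visiting_temperature(5230.0, 2.62, step)
    temperature_step = temperature / float(step + 1)
    p = acceptance_probability(-5.0, 1.0, temperature_step)
    print(step, temperature, p)
```

At step 0 the schedule returns the initial temperature, and the temperature then decreases monotonically. Once it falls below `initial_temp * restart_temp_ratio`, the loop above resets the energy state and starts a new annealing pass.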
@@ -0,0 +1,430 @@
"""Hessian update strategies for quasi-Newton optimization methods."""
import numpy as np
from numpy.linalg import norm
from scipy.linalg import get_blas_funcs
from warnings import warn
__all__ = ['HessianUpdateStrategy', 'BFGS', 'SR1']
class HessianUpdateStrategy:
"""Interface for implementing Hessian update strategies.
Many optimization methods make use of Hessian (or inverse Hessian)
approximations, such as the quasi-Newton methods BFGS, SR1, L-BFGS.
Some of these approximations, however, do not actually need to store
the entire matrix or can compute the internal matrix product with a
given vector in a very efficient manner. This class serves as an
abstract interface between the optimization algorithm and the
quasi-Newton update strategies, giving freedom of implementation
to store and update the internal matrix as efficiently as possible.
Different choices of initialization and update procedure will result
in different quasi-Newton strategies.
Four methods should be implemented in derived classes: ``initialize``,
``update``, ``dot`` and ``get_matrix``.
Notes
-----
Any instance of a class that implements this interface,
can be accepted by the method ``minimize`` and used by
the compatible solvers to approximate the Hessian (or
inverse Hessian) used by the optimization algorithms.
"""
def initialize(self, n, approx_type):
"""Initialize internal matrix.
Allocate internal memory for storing and updating
the Hessian or its inverse.
Parameters
----------
n : int
Problem dimension.
approx_type : {'hess', 'inv_hess'}
Selects either the Hessian or the inverse Hessian.
When set to 'hess' the Hessian will be stored and updated.
When set to 'inv_hess' its inverse will be used instead.
"""
raise NotImplementedError("The method ``initialize(n, approx_type)``"
" is not implemented.")
def update(self, delta_x, delta_grad):
"""Update internal matrix.
Update Hessian matrix or its inverse (depending on how 'approx_type'
is defined) using information about the last evaluated points.
Parameters
----------
delta_x : ndarray
The difference between the two points at which the gradient
function has been evaluated: ``delta_x = x2 - x1``.
delta_grad : ndarray
The difference between the gradients:
``delta_grad = grad(x2) - grad(x1)``.
"""
raise NotImplementedError("The method ``update(delta_x, delta_grad)``"
" is not implemented.")
def dot(self, p):
"""Compute the product of the internal matrix with the given vector.
Parameters
----------
p : array_like
1-D array representing a vector.
Returns
-------
Hp : array
1-D array representing the result of multiplying the approximation
matrix by vector p.
"""
raise NotImplementedError("The method ``dot(p)``"
" is not implemented.")
def get_matrix(self):
"""Return current internal matrix.
Returns
-------
H : ndarray, shape (n, n)
Dense matrix containing either the Hessian
or its inverse (depending on how 'approx_type'
is defined).
"""
raise NotImplementedError("The method ``get_matrix()``"
" is not implemented.")
class FullHessianUpdateStrategy(HessianUpdateStrategy):
"""Hessian update strategy with full dimensional internal representation.
"""
_syr = get_blas_funcs('syr', dtype='d') # Symmetric rank 1 update
_syr2 = get_blas_funcs('syr2', dtype='d') # Symmetric rank 2 update
# Symmetric matrix-vector product
_symv = get_blas_funcs('symv', dtype='d')
def __init__(self, init_scale='auto'):
self.init_scale = init_scale
# Until initialize is called we can't really use the class,
# so it makes sense to set everything to None.
self.first_iteration = None
self.approx_type = None
self.B = None
self.H = None
def initialize(self, n, approx_type):
"""Initialize internal matrix.
Allocate internal memory for storing and updating
the Hessian or its inverse.
Parameters
----------
n : int
Problem dimension.
approx_type : {'hess', 'inv_hess'}
Selects either the Hessian or the inverse Hessian.
When set to 'hess' the Hessian will be stored and updated.
When set to 'inv_hess' its inverse will be used instead.
"""
self.first_iteration = True
self.n = n
self.approx_type = approx_type
if approx_type not in ('hess', 'inv_hess'):
raise ValueError("`approx_type` must be 'hess' or 'inv_hess'.")
# Create matrix
if self.approx_type == 'hess':
self.B = np.eye(n, dtype=float)
else:
self.H = np.eye(n, dtype=float)
def _auto_scale(self, delta_x, delta_grad):
# Heuristic to scale matrix at first iteration.
# Described in Nocedal and Wright "Numerical Optimization"
# p.143 formula (6.20).
s_norm2 = np.dot(delta_x, delta_x)
y_norm2 = np.dot(delta_grad, delta_grad)
ys = np.abs(np.dot(delta_grad, delta_x))
if ys == 0.0 or y_norm2 == 0 or s_norm2 == 0:
return 1
if self.approx_type == 'hess':
return y_norm2 / ys
else:
return ys / y_norm2
def _update_implementation(self, delta_x, delta_grad):
raise NotImplementedError("The method ``_update_implementation``"
" is not implemented.")
def update(self, delta_x, delta_grad):
"""Update internal matrix.
Update Hessian matrix or its inverse (depending on how 'approx_type'
is defined) using information about the last evaluated points.
Parameters
----------
delta_x : ndarray
The difference between the two points at which the gradient
function has been evaluated: ``delta_x = x2 - x1``.
delta_grad : ndarray
The difference between the gradients:
``delta_grad = grad(x2) - grad(x1)``.
"""
if np.all(delta_x == 0.0):
return
if np.all(delta_grad == 0.0):
warn('delta_grad == 0.0. Check if the approximated '
'function is linear. If the function is linear '
'better results can be obtained by defining the '
'Hessian as zero instead of using quasi-Newton '
'approximations.',
UserWarning, stacklevel=2)
return
if self.first_iteration:
# Get the user-specified scale
if self.init_scale == "auto":
scale = self._auto_scale(delta_x, delta_grad)
else:
scale = float(self.init_scale)
# Scale initial matrix with ``scale * np.eye(n)``
if self.approx_type == 'hess':
self.B *= scale
else:
self.H *= scale
self.first_iteration = False
self._update_implementation(delta_x, delta_grad)
def dot(self, p):
"""Compute the product of the internal matrix with the given vector.
Parameters
----------
p : array_like
1-D array representing a vector.
Returns
-------
Hp : array
1-D array representing the result of multiplying the approximation
matrix by vector p.
"""
if self.approx_type == 'hess':
return self._symv(1, self.B, p)
else:
return self._symv(1, self.H, p)
def get_matrix(self):
"""Return the current internal matrix.
Returns
-------
M : ndarray, shape (n, n)
Dense matrix containing either the Hessian or its inverse
(depending on how `approx_type` was defined).
"""
if self.approx_type == 'hess':
M = np.copy(self.B)
else:
M = np.copy(self.H)
li = np.tril_indices_from(M, k=-1)
M[li] = M.T[li]
return M
class BFGS(FullHessianUpdateStrategy):
"""Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessian update strategy.
Parameters
----------
exception_strategy : {'skip_update', 'damp_update'}, optional
Define how to proceed when the curvature condition is violated.
Set it to 'skip_update' to just skip the update. Or, alternatively,
set it to 'damp_update' to interpolate between the actual BFGS
result and the unmodified matrix. Both exception strategies
are explained in [1]_, p.536-537.
min_curvature : float
This number, scaled by a normalization factor, defines the
minimum curvature ``dot(delta_grad, delta_x)`` allowed to go
unaffected by the exception strategy. By default it is equal to
1e-8 when ``exception_strategy = 'skip_update'`` and equal
to 0.2 when ``exception_strategy = 'damp_update'``.
init_scale : {float, 'auto'}
Matrix scale at first iteration. At the first
iteration the Hessian matrix or its inverse will be initialized
with ``init_scale*np.eye(n)``, where ``n`` is the problem dimension.
Set it to 'auto' in order to use an automatic heuristic for choosing
the initial scale. The heuristic is described in [1]_, p.143.
By default uses 'auto'.
Notes
-----
The update is based on the description in [1]_, p.140.
References
----------
.. [1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
Second Edition (2006).
"""
def __init__(self, exception_strategy='skip_update', min_curvature=None,
init_scale='auto'):
if exception_strategy == 'skip_update':
if min_curvature is not None:
self.min_curvature = min_curvature
else:
self.min_curvature = 1e-8
elif exception_strategy == 'damp_update':
if min_curvature is not None:
self.min_curvature = min_curvature
else:
self.min_curvature = 0.2
else:
raise ValueError("`exception_strategy` must be 'skip_update' "
"or 'damp_update'.")
super().__init__(init_scale)
self.exception_strategy = exception_strategy
def _update_inverse_hessian(self, ys, Hy, yHy, s):
"""Update the inverse Hessian matrix.
BFGS update using the formula:
``H <- H + ((H*y).T*y + s.T*y)/(s.T*y)^2 * (s*s.T)
- 1/(s.T*y) * ((H*y)*s.T + s*(H*y).T)``
where ``s = delta_x`` and ``y = delta_grad``. This formula is
equivalent to (6.17) in [1]_ written in a more efficient way
for implementation.
References
----------
.. [1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
Second Edition (2006).
"""
self.H = self._syr2(-1.0 / ys, s, Hy, a=self.H)
self.H = self._syr((ys+yHy)/ys**2, s, a=self.H)
def _update_hessian(self, ys, Bs, sBs, y):
"""Update the Hessian matrix.
BFGS update using the formula:
``B <- B - (B*s)*(B*s).T/s.T*(B*s) + y*y.T/s.T*y``
where ``s`` is short for ``delta_x`` and ``y`` is short
for ``delta_grad``. Formula (6.19) in [1]_.
References
----------
.. [1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
Second Edition (2006).
"""
self.B = self._syr(1.0 / ys, y, a=self.B)
self.B = self._syr(-1.0 / sBs, Bs, a=self.B)
def _update_implementation(self, delta_x, delta_grad):
# Auxiliary variables w and z
if self.approx_type == 'hess':
w = delta_x
z = delta_grad
else:
w = delta_grad
z = delta_x
# Do some common operations
wz = np.dot(w, z)
Mw = self.dot(w)
wMw = Mw.dot(w)
# Guarantee that wMw > 0 by reinitializing the matrix.
# While this is always true in exact arithmetic,
# an indefinite matrix may appear due to round-off errors.
if wMw <= 0.0:
scale = self._auto_scale(delta_x, delta_grad)
# Reinitialize matrix
if self.approx_type == 'hess':
self.B = scale * np.eye(self.n, dtype=float)
else:
self.H = scale * np.eye(self.n, dtype=float)
# Do common operations for new matrix
Mw = self.dot(w)
wMw = Mw.dot(w)
# Check if curvature condition is violated
if wz <= self.min_curvature * wMw:
# If the option 'skip_update' is set
# we just skip the update when the condition
# is violated.
if self.exception_strategy == 'skip_update':
return
# If the option 'damp_update' is set we
# interpolate between the actual BFGS
# result and the unmodified matrix.
elif self.exception_strategy == 'damp_update':
update_factor = (1-self.min_curvature) / (1 - wz/wMw)
z = update_factor*z + (1-update_factor)*Mw
wz = np.dot(w, z)
# Update matrix
if self.approx_type == 'hess':
self._update_hessian(wz, Mw, wMw, z)
else:
self._update_inverse_hessian(wz, Mw, wMw, z)
class SR1(FullHessianUpdateStrategy):
"""Symmetric-rank-1 Hessian update strategy.
Parameters
----------
min_denominator : float
This number, scaled by a normalization factor,
defines the minimum denominator magnitude allowed
in the update. When the condition is violated we skip
the update. By default uses ``1e-8``.
init_scale : {float, 'auto'}, optional
Matrix scale at first iteration. At the first
iteration the Hessian matrix or its inverse will be initialized
with ``init_scale*np.eye(n)``, where ``n`` is the problem dimension.
Set it to 'auto' in order to use an automatic heuristic for choosing
the initial scale. The heuristic is described in [1]_, p.143.
By default uses 'auto'.
Notes
-----
The update is based on the description in [1]_, p.144-146.
References
----------
.. [1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
Second Edition (2006).
"""
def __init__(self, min_denominator=1e-8, init_scale='auto'):
self.min_denominator = min_denominator
super().__init__(init_scale)
def _update_implementation(self, delta_x, delta_grad):
# Auxiliary variables w and z
if self.approx_type == 'hess':
w = delta_x
z = delta_grad
else:
w = delta_grad
z = delta_x
# Do some common operations
Mw = self.dot(w)
z_minus_Mw = z - Mw
denominator = np.dot(w, z_minus_Mw)
# If the denominator is too small
# we just skip the update.
if np.abs(denominator) <= self.min_denominator*norm(w)*norm(z_minus_Mw):
return
# Update matrix
if self.approx_type == 'hess':
self.B = self._syr(1/denominator, z_minus_Mw, a=self.B)
else:
self.H = self._syr(1/denominator, z_minus_Mw, a=self.H)
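The inverse-Hessian BFGS formula quoted in `_update_inverse_hessian` can be checked with a small standalone sketch. The version below is an illustration only: it uses plain Python lists instead of the BLAS `syr`/`syr2` routines used above, and the helper names `dot`, `matvec` and `bfgs_inverse_update` are hypothetical. It applies the same 'skip_update' rule, leaving the matrix unchanged when the curvature condition is violated.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Product of a dense matrix (list of rows) with a vector."""
    return [dot(row, v) for row in M]


def bfgs_inverse_update(H, s, y, min_curvature=1e-8):
    """Return the BFGS-updated inverse Hessian approximation.

    ``s`` plays the role of ``delta_x`` and ``y`` of ``delta_grad``.
    The input ``H`` is not modified; a new matrix is returned.
    """
    Hy = matvec(H, y)
    ys = dot(y, s)
    yHy = dot(y, Hy)
    # 'skip_update' strategy: keep H when the curvature is too small
    if ys <= min_curvature * yHy:
        return [row[:] for row in H]
    n = len(s)
    c = (ys + yHy) / ys ** 2
    # H + c * s s^T - (Hy s^T + s Hy^T) / ys
    return [
        [H[i][j] + c * s[i] * s[j] - (Hy[i] * s[j] + s[i] * Hy[j]) / ys
         for j in range(n)]
        for i in range(n)
    ]


H0 = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.5]
y = [2.0, 0.25]
H1 = bfgs_inverse_update(H0, s, y)
# The updated matrix should satisfy the secant equation H1 @ y == s
print(matvec(H1, y), s)
```

The secant check is a convenient sanity test for any quasi-Newton update: after a successful inverse-Hessian update, multiplying the new matrix by `y` should reproduce `s` up to round-off.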
@@ -0,0 +1,106 @@
# cython: language_level=3
from libcpp cimport bool
from libcpp.string cimport string
cdef extern from "HConst.h" nogil:
const int HIGHS_CONST_I_INF "kHighsIInf"
const double HIGHS_CONST_INF "kHighsInf"
const double kHighsTiny
const double kHighsZero
const int kHighsThreadLimit
cdef enum HighsDebugLevel:
HighsDebugLevel_kHighsDebugLevelNone "kHighsDebugLevelNone" = 0
HighsDebugLevel_kHighsDebugLevelCheap "kHighsDebugLevelCheap"
HighsDebugLevel_kHighsDebugLevelCostly "kHighsDebugLevelCostly"
HighsDebugLevel_kHighsDebugLevelExpensive "kHighsDebugLevelExpensive"
HighsDebugLevel_kHighsDebugLevelMin "kHighsDebugLevelMin" = HighsDebugLevel_kHighsDebugLevelNone
HighsDebugLevel_kHighsDebugLevelMax "kHighsDebugLevelMax" = HighsDebugLevel_kHighsDebugLevelExpensive
ctypedef enum HighsModelStatus:
HighsModelStatusNOTSET "HighsModelStatus::kNotset" = 0
HighsModelStatusLOAD_ERROR "HighsModelStatus::kLoadError"
HighsModelStatusMODEL_ERROR "HighsModelStatus::kModelError"
HighsModelStatusPRESOLVE_ERROR "HighsModelStatus::kPresolveError"
HighsModelStatusSOLVE_ERROR "HighsModelStatus::kSolveError"
HighsModelStatusPOSTSOLVE_ERROR "HighsModelStatus::kPostsolveError"
HighsModelStatusMODEL_EMPTY "HighsModelStatus::kModelEmpty"
HighsModelStatusOPTIMAL "HighsModelStatus::kOptimal"
HighsModelStatusINFEASIBLE "HighsModelStatus::kInfeasible"
HighsModelStatus_UNBOUNDED_OR_INFEASIBLE "HighsModelStatus::kUnboundedOrInfeasible"
HighsModelStatusUNBOUNDED "HighsModelStatus::kUnbounded"
HighsModelStatusREACHED_DUAL_OBJECTIVE_VALUE_UPPER_BOUND "HighsModelStatus::kObjectiveBound"
HighsModelStatusREACHED_OBJECTIVE_TARGET "HighsModelStatus::kObjectiveTarget"
HighsModelStatusREACHED_TIME_LIMIT "HighsModelStatus::kTimeLimit"
HighsModelStatusREACHED_ITERATION_LIMIT "HighsModelStatus::kIterationLimit"
HighsModelStatusUNKNOWN "HighsModelStatus::kUnknown"
HighsModelStatusHIGHS_MODEL_STATUS_MIN "HighsModelStatus::kMin" = HighsModelStatusNOTSET
HighsModelStatusHIGHS_MODEL_STATUS_MAX "HighsModelStatus::kMax" = HighsModelStatusUNKNOWN
cdef enum HighsBasisStatus:
HighsBasisStatusLOWER "HighsBasisStatus::kLower" = 0, # (slack) variable is at its lower bound [including fixed variables]
HighsBasisStatusBASIC "HighsBasisStatus::kBasic" # (slack) variable is basic
HighsBasisStatusUPPER "HighsBasisStatus::kUpper" # (slack) variable is at its upper bound
HighsBasisStatusZERO "HighsBasisStatus::kZero" # free variable is non-basic and set to zero
HighsBasisStatusNONBASIC "HighsBasisStatus::kNonbasic" # nonbasic with no specific bound information - useful for users and postsolve
cdef enum SolverOption:
SOLVER_OPTION_SIMPLEX "SolverOption::SOLVER_OPTION_SIMPLEX" = -1
SOLVER_OPTION_CHOOSE "SolverOption::SOLVER_OPTION_CHOOSE"
SOLVER_OPTION_IPM "SolverOption::SOLVER_OPTION_IPM"
cdef enum PrimalDualStatus:
PrimalDualStatusSTATUS_NOT_SET "PrimalDualStatus::STATUS_NOT_SET" = -1
PrimalDualStatusSTATUS_MIN "PrimalDualStatus::STATUS_MIN" = PrimalDualStatusSTATUS_NOT_SET
PrimalDualStatusSTATUS_NO_SOLUTION "PrimalDualStatus::STATUS_NO_SOLUTION"
PrimalDualStatusSTATUS_UNKNOWN "PrimalDualStatus::STATUS_UNKNOWN"
PrimalDualStatusSTATUS_INFEASIBLE_POINT "PrimalDualStatus::STATUS_INFEASIBLE_POINT"
PrimalDualStatusSTATUS_FEASIBLE_POINT "PrimalDualStatus::STATUS_FEASIBLE_POINT"
PrimalDualStatusSTATUS_MAX "PrimalDualStatus::STATUS_MAX" = PrimalDualStatusSTATUS_FEASIBLE_POINT
cdef enum HighsOptionType:
HighsOptionTypeBOOL "HighsOptionType::kBool" = 0
HighsOptionTypeINT "HighsOptionType::kInt"
HighsOptionTypeDOUBLE "HighsOptionType::kDouble"
HighsOptionTypeSTRING "HighsOptionType::kString"
# workaround for lack of enum class support in Cython < 3.x
# cdef enum class ObjSense(int):
# ObjSenseMINIMIZE "ObjSense::kMinimize" = 1
# ObjSenseMAXIMIZE "ObjSense::kMaximize" = -1
cdef cppclass ObjSense:
pass
cdef ObjSense ObjSenseMINIMIZE "ObjSense::kMinimize"
cdef ObjSense ObjSenseMAXIMIZE "ObjSense::kMaximize"
# cdef enum class MatrixFormat(int):
# MatrixFormatkColwise "MatrixFormat::kColwise" = 1
# MatrixFormatkRowwise "MatrixFormat::kRowwise"
# MatrixFormatkRowwisePartitioned "MatrixFormat::kRowwisePartitioned"
cdef cppclass MatrixFormat:
pass
cdef MatrixFormat MatrixFormatkColwise "MatrixFormat::kColwise"
cdef MatrixFormat MatrixFormatkRowwise "MatrixFormat::kRowwise"
cdef MatrixFormat MatrixFormatkRowwisePartitioned "MatrixFormat::kRowwisePartitioned"
# cdef enum class HighsVarType(int):
# kContinuous "HighsVarType::kContinuous"
# kInteger "HighsVarType::kInteger"
# kSemiContinuous "HighsVarType::kSemiContinuous"
# kSemiInteger "HighsVarType::kSemiInteger"
# kImplicitInteger "HighsVarType::kImplicitInteger"
cdef cppclass HighsVarType:
pass
cdef HighsVarType kContinuous "HighsVarType::kContinuous"
cdef HighsVarType kInteger "HighsVarType::kInteger"
cdef HighsVarType kSemiContinuous "HighsVarType::kSemiContinuous"
cdef HighsVarType kSemiInteger "HighsVarType::kSemiInteger"
cdef HighsVarType kImplicitInteger "HighsVarType::kImplicitInteger"
@@ -0,0 +1,56 @@
# cython: language_level=3
from libc.stdio cimport FILE
from libcpp cimport bool
from libcpp.string cimport string
from .HighsStatus cimport HighsStatus
from .HighsOptions cimport HighsOptions
from .HighsInfo cimport HighsInfo
from .HighsLp cimport (
HighsLp,
HighsSolution,
HighsBasis,
ObjSense,
)
from .HConst cimport HighsModelStatus
cdef extern from "Highs.h":
# From HiGHS/src/Highs.h
cdef cppclass Highs:
HighsStatus passHighsOptions(const HighsOptions& options)
HighsStatus passModel(const HighsLp& lp)
HighsStatus run()
HighsStatus setHighsLogfile(FILE* logfile)
HighsStatus setHighsOutput(FILE* output)
HighsStatus writeHighsOptions(const string filename, const bool report_only_non_default_values = true)
# split up for cython below
#const HighsModelStatus& getModelStatus(const bool scaled_model = False) const
const HighsModelStatus & getModelStatus() const
const HighsInfo& getHighsInfo "getInfo" () const
string modelStatusToString(const HighsModelStatus model_status) const
#HighsStatus getHighsInfoValue(const string& info, int& value)
HighsStatus getHighsInfoValue(const string& info, double& value) const
const HighsOptions& getHighsOptions() const
const HighsLp& getLp() const
HighsStatus writeSolution(const string filename, const bool pretty) const
HighsStatus setBasis()
const HighsSolution& getSolution() const
const HighsBasis& getBasis() const
bool changeObjectiveSense(const ObjSense sense)
HighsStatus setHighsOptionValueBool "setOptionValue" (const string & option, const bool value)
HighsStatus setHighsOptionValueInt "setOptionValue" (const string & option, const int value)
HighsStatus setHighsOptionValueStr "setOptionValue" (const string & option, const string & value)
HighsStatus setHighsOptionValueDbl "setOptionValue" (const string & option, const double value)
string primalDualStatusToString(const int primal_dual_status)
void resetGlobalScheduler(bool blocking)
@@ -0,0 +1,20 @@
# cython: language_level=3
cdef extern from "HighsIO.h" nogil:
# workaround for lack of enum class support in Cython < 3.x
# cdef enum class HighsLogType(int):
# kInfo "HighsLogType::kInfo" = 1
# kDetailed "HighsLogType::kDetailed"
# kVerbose "HighsLogType::kVerbose"
# kWarning "HighsLogType::kWarning"
# kError "HighsLogType::kError"
cdef cppclass HighsLogType:
pass
cdef HighsLogType kInfo "HighsLogType::kInfo"
cdef HighsLogType kDetailed "HighsLogType::kDetailed"
cdef HighsLogType kVerbose "HighsLogType::kVerbose"
cdef HighsLogType kWarning "HighsLogType::kWarning"
cdef HighsLogType kError "HighsLogType::kError"
@@ -0,0 +1,22 @@
# cython: language_level=3
cdef extern from "HighsInfo.h" nogil:
# From HiGHS/src/lp_data/HighsInfo.h
cdef cppclass HighsInfo:
# Inherited from HighsInfoStruct:
int mip_node_count
int simplex_iteration_count
int ipm_iteration_count
int crossover_iteration_count
int primal_solution_status
int dual_solution_status
int basis_validity
double objective_function_value
double mip_dual_bound
double mip_gap
int num_primal_infeasibilities
double max_primal_infeasibility
double sum_primal_infeasibilities
int num_dual_infeasibilities
double max_dual_infeasibility
double sum_dual_infeasibilities
@@ -0,0 +1,46 @@
# cython: language_level=3
from libcpp cimport bool
from libcpp.string cimport string
from libcpp.vector cimport vector
from .HConst cimport HighsBasisStatus, ObjSense, HighsVarType
from .HighsSparseMatrix cimport HighsSparseMatrix
cdef extern from "HighsLp.h" nogil:
# From HiGHS/src/lp_data/HighsLp.h
cdef cppclass HighsLp:
int num_col_
int num_row_
vector[double] col_cost_
vector[double] col_lower_
vector[double] col_upper_
vector[double] row_lower_
vector[double] row_upper_
HighsSparseMatrix a_matrix_
ObjSense sense_
double offset_
string model_name_
vector[string] row_names_
vector[string] col_names_
vector[HighsVarType] integrality_
bool isMip() const
cdef cppclass HighsSolution:
vector[double] col_value
vector[double] col_dual
vector[double] row_value
vector[double] row_dual
cdef cppclass HighsBasis:
bool valid_
vector[HighsBasisStatus] col_status
vector[HighsBasisStatus] row_status
@@ -0,0 +1,9 @@
# cython: language_level=3
from .HighsStatus cimport HighsStatus
from .HighsLp cimport HighsLp
from .HighsOptions cimport HighsOptions
cdef extern from "HighsLpUtils.h" nogil:
# From HiGHS/src/lp_data/HighsLpUtils.h
HighsStatus assessLp(HighsLp& lp, const HighsOptions& options)
@@ -0,0 +1,10 @@
# cython: language_level=3
from libcpp.string cimport string
from .HConst cimport HighsModelStatus
cdef extern from "HighsModelUtils.h" nogil:
# From HiGHS/src/lp_data/HighsModelUtils.h
string utilHighsModelStatusToString(const HighsModelStatus model_status)
string utilBasisStatusToString(const int primal_dual_status)
@@ -0,0 +1,110 @@
# cython: language_level=3
from libc.stdio cimport FILE
from libcpp cimport bool
from libcpp.string cimport string
from libcpp.vector cimport vector
from .HConst cimport HighsOptionType
cdef extern from "HighsOptions.h" nogil:
cdef cppclass OptionRecord:
HighsOptionType type
string name
string description
bool advanced
cdef cppclass OptionRecordBool(OptionRecord):
bool* value
bool default_value
cdef cppclass OptionRecordInt(OptionRecord):
int* value
int lower_bound
int default_value
int upper_bound
cdef cppclass OptionRecordDouble(OptionRecord):
double* value
double lower_bound
double default_value
double upper_bound
cdef cppclass OptionRecordString(OptionRecord):
string* value
string default_value
cdef cppclass HighsOptions:
# From HighsOptionsStruct:
# Options read from the command line
string model_file
string presolve
string solver
string parallel
double time_limit
string options_file
# Options read from the file
double infinite_cost
double infinite_bound
double small_matrix_value
double large_matrix_value
double primal_feasibility_tolerance
double dual_feasibility_tolerance
double ipm_optimality_tolerance
double dual_objective_value_upper_bound
int highs_debug_level
int simplex_strategy
int simplex_scale_strategy
int simplex_crash_strategy
int simplex_dual_edge_weight_strategy
int simplex_primal_edge_weight_strategy
int simplex_iteration_limit
int simplex_update_limit
int ipm_iteration_limit
int highs_min_threads
int highs_max_threads
int message_level
string solution_file
bool write_solution_to_file
bool write_solution_pretty
# Advanced options
bool run_crossover
bool mps_parser_type_free
int keep_n_rows
int allowed_simplex_matrix_scale_factor
int allowed_simplex_cost_scale_factor
int simplex_dualise_strategy
int simplex_permute_strategy
int dual_simplex_cleanup_strategy
int simplex_price_strategy
int dual_chuzc_sort_strategy
bool simplex_initial_condition_check
double simplex_initial_condition_tolerance
double dual_steepest_edge_weight_log_error_threshhold
double dual_simplex_cost_perturbation_multiplier
double start_crossover_tolerance
bool less_infeasible_DSE_check
bool less_infeasible_DSE_choose_row
bool use_original_HFactor_logic
# Options for MIP solver
int mip_max_nodes
int mip_report_level
# Switch for MIP solver
bool mip
# Options for HighsPrintMessage and HighsLogMessage
FILE* logfile
FILE* output
int message_level
string solution_file
bool write_solution_to_file
bool write_solution_pretty
vector[OptionRecord*] records
@@ -0,0 +1,9 @@
# cython: language_level=3
from libcpp cimport bool
from .HighsOptions cimport HighsOptions
cdef extern from "HighsRuntimeOptions.h" nogil:
# From HiGHS/src/lp_data/HighsRuntimeOptions.h
bool loadOptions(int argc, char** argv, HighsOptions& options)
@@ -0,0 +1,12 @@
# cython: language_level=3
from libcpp.string cimport string
cdef extern from "HighsStatus.h" nogil:
ctypedef enum HighsStatus:
HighsStatusError "HighsStatus::kError" = -1
HighsStatusOK "HighsStatus::kOk" = 0
HighsStatusWarning "HighsStatus::kWarning" = 1
string highsStatusToString(HighsStatus status)
@@ -0,0 +1,95 @@
# cython: language_level=3
from libcpp cimport bool
cdef extern from "SimplexConst.h" nogil:
cdef enum SimplexAlgorithm:
PRIMAL "SimplexAlgorithm::kPrimal" = 0
DUAL "SimplexAlgorithm::kDual"
cdef enum SimplexStrategy:
SIMPLEX_STRATEGY_MIN "SimplexStrategy::kSimplexStrategyMin" = 0
SIMPLEX_STRATEGY_CHOOSE "SimplexStrategy::kSimplexStrategyChoose" = SIMPLEX_STRATEGY_MIN
SIMPLEX_STRATEGY_DUAL "SimplexStrategy::kSimplexStrategyDual"
SIMPLEX_STRATEGY_DUAL_PLAIN "SimplexStrategy::kSimplexStrategyDualPlain" = SIMPLEX_STRATEGY_DUAL
SIMPLEX_STRATEGY_DUAL_TASKS "SimplexStrategy::kSimplexStrategyDualTasks"
SIMPLEX_STRATEGY_DUAL_MULTI "SimplexStrategy::kSimplexStrategyDualMulti"
SIMPLEX_STRATEGY_PRIMAL "SimplexStrategy::kSimplexStrategyPrimal"
SIMPLEX_STRATEGY_MAX "SimplexStrategy::kSimplexStrategyMax" = SIMPLEX_STRATEGY_PRIMAL
SIMPLEX_STRATEGY_NUM "SimplexStrategy::kSimplexStrategyNum"
cdef enum SimplexCrashStrategy:
SIMPLEX_CRASH_STRATEGY_MIN "SimplexCrashStrategy::kSimplexCrashStrategyMin" = 0
SIMPLEX_CRASH_STRATEGY_OFF "SimplexCrashStrategy::kSimplexCrashStrategyOff" = SIMPLEX_CRASH_STRATEGY_MIN
SIMPLEX_CRASH_STRATEGY_LTSSF_K "SimplexCrashStrategy::kSimplexCrashStrategyLtssfK"
SIMPLEX_CRASH_STRATEGY_LTSSF "SimplexCrashStrategy::kSimplexCrashStrategyLtssf" = SIMPLEX_CRASH_STRATEGY_LTSSF_K
SIMPLEX_CRASH_STRATEGY_BIXBY "SimplexCrashStrategy::kSimplexCrashStrategyBixby"
SIMPLEX_CRASH_STRATEGY_LTSSF_PRI "SimplexCrashStrategy::kSimplexCrashStrategyLtssfPri"
SIMPLEX_CRASH_STRATEGY_LTSF_K "SimplexCrashStrategy::kSimplexCrashStrategyLtsfK"
SIMPLEX_CRASH_STRATEGY_LTSF_PRI "SimplexCrashStrategy::kSimplexCrashStrategyLtsfPri"
SIMPLEX_CRASH_STRATEGY_LTSF "SimplexCrashStrategy::kSimplexCrashStrategyLtsf"
SIMPLEX_CRASH_STRATEGY_BIXBY_NO_NONZERO_COL_COSTS "SimplexCrashStrategy::kSimplexCrashStrategyBixbyNoNonzeroColCosts"
SIMPLEX_CRASH_STRATEGY_BASIC "SimplexCrashStrategy::kSimplexCrashStrategyBasic"
SIMPLEX_CRASH_STRATEGY_TEST_SING "SimplexCrashStrategy::kSimplexCrashStrategyTestSing"
SIMPLEX_CRASH_STRATEGY_MAX "SimplexCrashStrategy::kSimplexCrashStrategyMax" = SIMPLEX_CRASH_STRATEGY_TEST_SING
cdef enum SimplexEdgeWeightStrategy:
SIMPLEX_EDGE_WEIGHT_STRATEGY_MIN "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyMin" = -1
SIMPLEX_EDGE_WEIGHT_STRATEGY_CHOOSE "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyChoose" = SIMPLEX_EDGE_WEIGHT_STRATEGY_MIN
SIMPLEX_EDGE_WEIGHT_STRATEGY_DANTZIG "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyDantzig"
SIMPLEX_EDGE_WEIGHT_STRATEGY_DEVEX "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyDevex"
SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategySteepestEdge"
SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE_UNIT_INITIAL "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategySteepestEdgeUnitInitial"
SIMPLEX_EDGE_WEIGHT_STRATEGY_MAX "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyMax" = SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE_UNIT_INITIAL
cdef enum SimplexPriceStrategy:
SIMPLEX_PRICE_STRATEGY_MIN = 0
SIMPLEX_PRICE_STRATEGY_COL = SIMPLEX_PRICE_STRATEGY_MIN
SIMPLEX_PRICE_STRATEGY_ROW
SIMPLEX_PRICE_STRATEGY_ROW_SWITCH
SIMPLEX_PRICE_STRATEGY_ROW_SWITCH_COL_SWITCH
SIMPLEX_PRICE_STRATEGY_MAX = SIMPLEX_PRICE_STRATEGY_ROW_SWITCH_COL_SWITCH
cdef enum SimplexDualChuzcStrategy:
SIMPLEX_DUAL_CHUZC_STRATEGY_MIN = 0
SIMPLEX_DUAL_CHUZC_STRATEGY_CHOOSE = SIMPLEX_DUAL_CHUZC_STRATEGY_MIN
SIMPLEX_DUAL_CHUZC_STRATEGY_QUAD
SIMPLEX_DUAL_CHUZC_STRATEGY_HEAP
SIMPLEX_DUAL_CHUZC_STRATEGY_BOTH
SIMPLEX_DUAL_CHUZC_STRATEGY_MAX = SIMPLEX_DUAL_CHUZC_STRATEGY_BOTH
cdef enum InvertHint:
INVERT_HINT_NO = 0
INVERT_HINT_UPDATE_LIMIT_REACHED
INVERT_HINT_SYNTHETIC_CLOCK_SAYS_INVERT
INVERT_HINT_POSSIBLY_OPTIMAL
INVERT_HINT_POSSIBLY_PRIMAL_UNBOUNDED
INVERT_HINT_POSSIBLY_DUAL_UNBOUNDED
INVERT_HINT_POSSIBLY_SINGULAR_BASIS
INVERT_HINT_PRIMAL_INFEASIBLE_IN_PRIMAL_SIMPLEX
INVERT_HINT_CHOOSE_COLUMN_FAIL
INVERT_HINT_Count
cdef enum DualEdgeWeightMode:
DANTZIG "DualEdgeWeightMode::DANTZIG" = 0
DEVEX "DualEdgeWeightMode::DEVEX"
STEEPEST_EDGE "DualEdgeWeightMode::STEEPEST_EDGE"
Count "DualEdgeWeightMode::Count"
cdef enum PriceMode:
ROW "PriceMode::ROW" = 0
COL "PriceMode::COL"
const int PARALLEL_THREADS_DEFAULT
const int DUAL_TASKS_MIN_THREADS
const int DUAL_MULTI_MIN_THREADS
const bool invert_if_row_out_negative
const int NONBASIC_FLAG_TRUE
const int NONBASIC_FLAG_FALSE
const int NONBASIC_MOVE_UP
const int NONBASIC_MOVE_DN
const int NONBASIC_MOVE_ZE
@@ -0,0 +1,7 @@
# cython: language_level=3
cdef extern from "highs_c_api.h" nogil:
int Highs_passLp(void* highs, int numcol, int numrow, int numnz,
double* colcost, double* collower, double* colupper,
double* rowlower, double* rowupper,
int* astart, int* aindex, double* avalue)
@@ -0,0 +1,158 @@
from __future__ import annotations
from typing import TYPE_CHECKING
import numpy as np
from ._optimize import OptimizeResult
from ._pava_pybind import pava
if TYPE_CHECKING:
    import numpy.typing as npt
__all__ = ["isotonic_regression"]
def isotonic_regression(
    y: npt.ArrayLike,
    *,
    weights: npt.ArrayLike | None = None,
    increasing: bool = True,
) -> OptimizeResult:
    r"""Nonparametric isotonic regression.
A (not strictly) monotonically increasing array `x` with the same length
as `y` is calculated by the pool adjacent violators algorithm (PAVA), see
[1]_. See the Notes section for more details.
Parameters
----------
y : (N,) array_like
Response variable.
weights : (N,) array_like or None
Case weights.
increasing : bool
If True, fit monotonic increasing, i.e. isotonic, regression.
If False, fit a monotonic decreasing, i.e. antitonic, regression.
Default is True.
Returns
-------
res : OptimizeResult
The optimization result represented as an ``OptimizeResult`` object.
Important attributes are:
- ``x``: The isotonic regression solution, i.e. an increasing (or
decreasing) array of the same length as y, with elements in the
range from min(y) to max(y).
- ``weights``: Array of length B with the sum of case weights for each
of the B blocks (or pools).
- ``blocks``: Array of length B+1 with the indices of the start
positions of each block (or pool). The j-th block is given by
``x[blocks[j]:blocks[j+1]]`` for which all values are the same.
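The relation between ``x`` and ``blocks`` can be illustrated without calling
the solver. The sketch below reuses the input and output values shown in the
Examples section of this docstring; the ``blocks`` list is written out by hand
to match that output, and ``split_blocks`` is a hypothetical helper name.

```python
def split_blocks(x, blocks):
    # Hypothetical helper: slice x into its constant pools.
    return [x[blocks[j]:blocks[j + 1]] for j in range(len(blocks) - 1)]


y = [1.5, 1.0, 4.0, 6.0, 5.7, 5.0, 7.8, 9.0, 7.5, 9.5, 9.0]
x = [1.25, 1.25, 4.0, 5.56666667, 5.56666667, 5.56666667,
     7.8, 8.25, 8.25, 9.25, 9.25]
blocks = [0, 2, 3, 6, 7, 9, 11]  # B + 1 = 7 entries, so B = 6 blocks

pools = split_blocks(x, blocks)
# With unit weights, each pool value is the mean of y over that pool.
pool_means = [sum(y[blocks[j]:blocks[j + 1]]) / (blocks[j + 1] - blocks[j])
              for j in range(len(blocks) - 1)]
```

Every entry of ``pools`` is a run of identical values, and that value agrees
with the corresponding entry of ``pool_means`` up to the rounding of the
printed output.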
Notes
-----
Given data :math:`y` and case weights :math:`w`, the isotonic regression
solves the following optimization problem:
.. math::
\operatorname{argmin}_{x_i} \sum_i w_i (y_i - x_i)^2 \quad
\text{subject to } x_i \leq x_j \text{ whenever } i \leq j \,.
For every input value :math:`y_i`, it generates a value :math:`x_i` such
that :math:`x` is increasing (but not strictly), i.e.
:math:`x_i \leq x_{i+1}`. This is accomplished by the PAVA.
The solution consists of pools or blocks, i.e. neighboring elements of
:math:`x`, e.g. :math:`x_i` and :math:`x_{i+1}`, that all have the same
value.
Most interestingly, the solution stays the same if the squared loss is
replaced by the wide class of Bregman functions which are the unique
class of strictly consistent scoring functions for the mean, see [2]_
and references therein.
The implemented version of PAVA according to [1]_ has a computational
complexity of O(N) with input size N.
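To make the pooling idea concrete, here is a small stack-based PAVA sketch in
pure Python. It is not the compiled routine called by this function and it
does not follow the implementation in [1]_ line by line; the name
``pava_sketch`` and its return layout ``(x, weights, blocks)`` are assumptions
chosen to mirror the attributes of the result object.

```python
def pava_sketch(y, w=None):
    # Illustrative PAVA for an increasing fit; each input element is pushed
    # once and merged at most once, so the work is linear in len(y).
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    means, weights, starts = [], [], []
    for i in range(n):
        # Start a new block that contains only element i.
        means.append(float(y[i]))
        weights.append(float(w[i]))
        starts.append(i)
        # Merge backwards while the last two blocks violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            total = weights[-2] + weights[-1]
            merged = (weights[-2] * means[-2] + weights[-1] * means[-1]) / total
            means[-2:] = [merged]
            weights[-2:] = [total]
            starts.pop()  # the merged block keeps the earlier start index
    blocks = starts + [n]
    x = []
    for j, mean in enumerate(means):
        x.extend([mean] * (blocks[j + 1] - blocks[j]))
    return x, weights, blocks


y = [1.5, 1.0, 4.0, 6.0, 5.7, 5.0, 7.8, 9.0, 7.5, 9.5, 9.0]
x_fit, block_weights, block_starts = pava_sketch(y)
```

A decreasing fit can be obtained with the same sketch by reversing the input
and reversing the result, which is the approach taken in the function body.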
References
----------
.. [1] Busing, F. M. T. A. (2022).
Monotone Regression: A Simple and Fast O(n) PAVA Implementation.
Journal of Statistical Software, Code Snippets, 102(1), 1-25.
:doi:`10.18637/jss.v102.c01`
.. [2] Jordan, A.I., Mühlemann, A. & Ziegel, J.F.
Characterizing the optimal solutions to the isotonic regression
problem for identifiable functionals.
Ann Inst Stat Math 74, 489-514 (2022).
:doi:`10.1007/s10463-021-00808-0`
Examples
--------
This example demonstrates that ``isotonic_regression`` really solves a
constrained optimization problem.
>>> import numpy as np
>>> from scipy.optimize import isotonic_regression, minimize
>>> y = [1.5, 1.0, 4.0, 6.0, 5.7, 5.0, 7.8, 9.0, 7.5, 9.5, 9.0]
>>> def objective(yhat, y):
... return np.sum((yhat - y)**2)
>>> def constraint(yhat, y):
... # This is for a monotonically increasing regression.
... return np.diff(yhat)
>>> result = minimize(objective, x0=y, args=(y,),
... constraints=[{'type': 'ineq',
... 'fun': lambda x: constraint(x, y)}])
>>> result.x
array([1.25 , 1.25 , 4. , 5.56666667, 5.56666667,
5.56666667, 7.8 , 8.25 , 8.25 , 9.25 ,
9.25 ])
>>> result = isotonic_regression(y)
>>> result.x
array([1.25 , 1.25 , 4. , 5.56666667, 5.56666667,
5.56666667, 7.8 , 8.25 , 8.25 , 9.25 ,
9.25 ])
The big advantage of ``isotonic_regression`` compared to calling
``minimize`` is that it is more user friendly, i.e. one does not need to
define objective and constraint functions, and that it is orders of
magnitude faster. On commodity hardware (in 2023), for normally distributed
input y of length 1000, the minimizer takes about 4 seconds, while
``isotonic_regression`` takes about 200 microseconds.
"""
    yarr = np.asarray(y)  # Check yarr.ndim == 1 is implicit (pybind11) in pava.
    if weights is None:
        warr = np.ones_like(yarr)
    else:
        warr = np.asarray(weights)

        if not (yarr.ndim == warr.ndim == 1 and yarr.shape[0] == warr.shape[0]):
            raise ValueError(
                "Input arrays y and w must have one dimension of equal length."
            )
        if np.any(warr <= 0):
            raise ValueError("Weights w must be strictly positive.")

    order = slice(None) if increasing else slice(None, None, -1)
    x = np.array(yarr[order], order="C", dtype=np.float64, copy=True)
    wx = np.array(warr[order], order="C", dtype=np.float64, copy=True)
    n = x.shape[0]
    r = np.full(shape=n + 1, fill_value=-1, dtype=np.intp)
    x, wx, r, b = pava(x, wx, r)
    # Now that we know the number of blocks b, we only keep the relevant part
    # of r and wx.
    # As information: Due to the pava implementation, after the last block
    # index, there might be smaller numbers appended to r, e.g.
    # r = [0, 10, 8, 7] which in the end should be r = [0, 10].
    r = r[:b + 1]
    wx = wx[:b]
    if not increasing:
        x = x[::-1]
        wx = wx[::-1]
        r = r[-1] - r[::-1]
    return OptimizeResult(
        x=x,
        weights=wx,
        blocks=r,
    )
@@ -0,0 +1,543 @@
"""
Functions
---------
.. autosummary::
:toctree: generated/
fmin_l_bfgs_b
"""
## License for the Python wrapper
## ==============================
## Copyright (c) 2004 David M. Cooke <cookedm@physics.mcmaster.ca>
## Permission is hereby granted, free of charge, to any person obtaining a
## copy of this software and associated documentation files (the "Software"),
## to deal in the Software without restriction, including without limitation
## the rights to use, copy, modify, merge, publish, distribute, sublicense,
## and/or sell copies of the Software, and to permit persons to whom the
## Software is furnished to do so, subject to the following conditions:
## The above copyright notice and this permission notice shall be included in
## all copies or substantial portions of the Software.
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
## DEALINGS IN THE SOFTWARE.
## Modifications by Travis Oliphant and Enthought, Inc. for inclusion in SciPy
import numpy as np
from numpy import array, asarray, float64, zeros
from . import _lbfgsb
from ._optimize import (MemoizeJac, OptimizeResult, _call_callback_maybe_halt,
_wrap_callback, _check_unknown_options,
_prepare_scalar_function)
from ._constraints import old_bound_to_new
from scipy.sparse.linalg import LinearOperator
__all__ = ['fmin_l_bfgs_b', 'LbfgsInvHessProduct']
def fmin_l_bfgs_b(func, x0, fprime=None, args=(),
                  approx_grad=0,
                  bounds=None, m=10, factr=1e7, pgtol=1e-5,
                  epsilon=1e-8,
                  iprint=-1, maxfun=15000, maxiter=15000, disp=None,
                  callback=None, maxls=20):
    """
Minimize a function func using the L-BFGS-B algorithm.
Parameters
----------
func : callable f(x,*args)
Function to minimize.
x0 : ndarray
Initial guess.
fprime : callable fprime(x,*args), optional
The gradient of `func`. If None, then `func` returns the function
value and the gradient (``f, g = func(x, *args)``), unless
`approx_grad` is True in which case `func` returns only ``f``.
args : sequence, optional
Arguments to pass to `func` and `fprime`.
approx_grad : bool, optional
Whether to approximate the gradient numerically (in which case
`func` returns only the function value).
bounds : list, optional
``(min, max)`` pairs for each element in ``x``, defining
the bounds on that parameter. Use None or +-inf for one of ``min`` or
``max`` when there is no bound in that direction.
m : int, optional
The maximum number of variable metric corrections
used to define the limited memory matrix. (The limited memory BFGS
method does not store the full hessian but uses this many terms in an
approximation to it.)
factr : float, optional
The iteration stops when
``(f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps``,
where ``eps`` is the machine precision, which is automatically
generated by the code. Typical values for `factr` are: 1e12 for
low accuracy; 1e7 for moderate accuracy; 10.0 for extremely
high accuracy. See Notes for relationship to `ftol`, which is exposed
(instead of `factr`) by the `scipy.optimize.minimize` interface to
L-BFGS-B.
pgtol : float, optional
The iteration will stop when
``max{|proj g_i | i = 1, ..., n} <= pgtol``
where ``proj g_i`` is the i-th component of the projected gradient.
epsilon : float, optional
Step size used when `approx_grad` is True, for numerically
calculating the gradient
iprint : int, optional
Controls the frequency of output. ``iprint < 0`` means no output;
``iprint = 0`` print only one line at the last iteration;
``0 < iprint < 99`` print also f and ``|proj g|`` every iprint iterations;
``iprint = 99`` print details of every iteration except n-vectors;
``iprint = 100`` print also the changes of active set and final x;
``iprint > 100`` print details of every iteration including x and g.
disp : int, optional
If zero, then no output. If a positive number, then this overrides
`iprint` (i.e., `iprint` gets the value of `disp`).
maxfun : int, optional
Maximum number of function evaluations. Note that this function
may violate the limit because of evaluating gradients by numerical
differentiation.
maxiter : int, optional
Maximum number of iterations.
callback : callable, optional
Called after each iteration, as ``callback(xk)``, where ``xk`` is the
current parameter vector.
maxls : int, optional
Maximum number of line search steps (per iteration). Default is 20.
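The two stopping tests described under `factr` and `pgtol` can be sketched in
plain Python. The helper names are hypothetical, and the projected gradient is
assumed here to be ``P(x - g, l, u) - x``, with ``P`` the projection onto the
box defined by `bounds`, as in Byrd, Lu and Nocedal (1995). The Fortran code
may organise the computation differently.

```python
import sys


def factr_test(f_prev, f_next, factr=1e7):
    # Relative reduction of the objective compared with factr * eps.
    eps = sys.float_info.epsilon
    reduction = (f_prev - f_next) / max(abs(f_prev), abs(f_next), 1.0)
    return reduction <= factr * eps


def projected_gradient(x, g, bounds):
    # bounds is a list of (lo, hi) pairs; None means no bound on that side.
    proj = []
    for xi, gi, (lo, hi) in zip(x, g, bounds):
        step = xi - gi
        if lo is not None:
            step = max(step, lo)
        if hi is not None:
            step = min(step, hi)
        proj.append(step - xi)
    return proj


def pgtol_test(x, g, bounds, pgtol=1e-5):
    # Largest absolute component of the projected gradient against pgtol.
    return max(abs(p) for p in projected_gradient(x, g, bounds)) <= pgtol
```

For instance, a point sitting on its lower bound with a positive gradient
component has a zero projected gradient, so ``pgtol_test`` accepts it even
though the plain gradient is not small.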
Returns
-------
x : array_like
Estimated position of the minimum.
f : float
Value of `func` at the minimum.
d : dict
Information dictionary.
* d['warnflag'] is
- 0 if converged,
- 1 if too many function evaluations or too many iterations,
- 2 if stopped for another reason, given in d['task']
* d['grad'] is the gradient at the minimum (should be close to zero)
* d['funcalls'] is the number of function calls made.
* d['nit'] is the number of iterations.
See also
--------
minimize: Interface to minimization algorithms for multivariate
functions. See the 'L-BFGS-B' `method` in particular. Note that the
`ftol` option is made available via that interface, while `factr` is
provided via this interface, where `factr` is the factor multiplying
the default machine floating-point precision to arrive at `ftol`:
``ftol = factr * numpy.finfo(float).eps``.
Notes
-----
License of L-BFGS-B (FORTRAN code):
The version included here (in fortran code) is 3.0
(released April 25, 2011). It was written by Ciyou Zhu, Richard Byrd,
and Jorge Nocedal <nocedal@ece.nwu.edu>. It carries the following
condition for use:
This software is freely available, but we expect that all publications
describing work using this software, or all commercial products using it,
quote at least one of the references given below. This software is released
under the BSD License.
References
----------
* R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound
Constrained Optimization, (1995), SIAM Journal on Scientific and
Statistical Computing, 16, 5, pp. 1190-1208.
* C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B,
FORTRAN routines for large scale bound constrained optimization (1997),
ACM Transactions on Mathematical Software, 23, 4, pp. 550 - 560.
* J.L. Morales and J. Nocedal. L-BFGS-B: Remark on Algorithm 778: L-BFGS-B,
FORTRAN routines for large scale bound constrained optimization (2011),
ACM Transactions on Mathematical Software, 38, 1.
Examples
--------
Solve a linear regression problem via `fmin_l_bfgs_b`. To do this, first we define
an objective function ``f(m, b) = sum((y - y_model)**2)``, where `y` describes the
observations and `y_model` the prediction of the linear model as
``y_model = m*x + b``. The bounds for the parameters, ``m`` and ``b``, are arbitrarily
chosen as ``(0,5)`` and ``(5,10)`` for this example.
>>> import numpy as np
>>> from scipy.optimize import fmin_l_bfgs_b
>>> X = np.arange(0, 10, 1)
>>> M = 2
>>> B = 3
>>> Y = M * X + B
>>> def func(parameters, *args):
... x = args[0]
... y = args[1]
... m, b = parameters
... y_model = m*x + b
... error = sum(np.power((y - y_model), 2))
... return error
>>> initial_values = np.array([0.0, 1.0])
>>> x_opt, f_opt, info = fmin_l_bfgs_b(func, x0=initial_values, args=(X, Y),
... approx_grad=True)
>>> x_opt, f_opt
array([1.99999999, 3.00000006]), 1.7746231151323805e-14 # may vary
The optimized parameters in ``x_opt`` agree with the ground truth parameters
``M = 2`` and ``B = 3``. Next, let us perform a bound-constrained optimization using
the `bounds` parameter.
>>> bounds = [(0, 5), (5, 10)]
>>> x_opt, f_opt, info = fmin_l_bfgs_b(func, x0=initial_values, args=(X, Y),
... approx_grad=True, bounds=bounds)
>>> x_opt, f_opt
array([1.65990508, 5.31649385]), 15.721334516453945 # may vary
"""
    # handle fprime/approx_grad
    if approx_grad:
        fun = func
        jac = None
    elif fprime is None:
        fun = MemoizeJac(func)
        jac = fun.derivative
    else:
        fun = func
        jac = fprime

    # build options
    callback = _wrap_callback(callback)
    opts = {'disp': disp,
            'iprint': iprint,
            'maxcor': m,
            'ftol': factr * np.finfo(float).eps,
            'gtol': pgtol,
            'eps': epsilon,
            'maxfun': maxfun,
            'maxiter': maxiter,
            'callback': callback,
            'maxls': maxls}

    res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
                           **opts)
    d = {'grad': res['jac'],
         'task': res['message'],
         'funcalls': res['nfev'],
         'nit': res['nit'],
         'warnflag': res['status']}
    f = res['fun']
    x = res['x']

    return x, f, d
def _minimize_lbfgsb(fun, x0, args=(), jac=None, bounds=None,
                     disp=None, maxcor=10, ftol=2.2204460492503131e-09,
                     gtol=1e-5, eps=1e-8, maxfun=15000, maxiter=15000,
                     iprint=-1, callback=None, maxls=20,
                     finite_diff_rel_step=None, **unknown_options):
    """
Minimize a scalar function of one or more variables using the L-BFGS-B
algorithm.
Options
-------
disp : None or int
If `disp is None` (the default), then the supplied version of `iprint`
is used. If `disp is not None`, then it overrides the supplied version
of `iprint`: ``disp == 0`` sets ``iprint = -1`` (no output), and any
other value is used as ``iprint``.
maxcor : int
The maximum number of variable metric corrections used to
define the limited memory matrix. (The limited memory BFGS
method does not store the full hessian but uses this many terms
in an approximation to it.)
ftol : float
The iteration stops when ``(f^k -
f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= ftol``.
gtol : float
The iteration will stop when ``max{|proj g_i | i = 1, ..., n}
<= gtol`` where ``proj g_i`` is the i-th component of the
projected gradient.
eps : float or ndarray
If `jac is None` the absolute step size used for numerical
approximation of the jacobian via forward differences.
maxfun : int
Maximum number of function evaluations. Note that this function
may violate the limit because of evaluating gradients by numerical
differentiation.
maxiter : int
Maximum number of iterations.
iprint : int, optional
Controls the frequency of output. ``iprint < 0`` means no output;
``iprint = 0`` print only one line at the last iteration;
``0 < iprint < 99`` print also f and ``|proj g|`` every iprint iterations;
``iprint = 99`` print details of every iteration except n-vectors;
``iprint = 100`` print also the changes of active set and final x;
``iprint > 100`` print details of every iteration including x and g.
maxls : int, optional
Maximum number of line search steps (per iteration). Default is 20.
finite_diff_rel_step : None or array_like, optional
If `jac in ['2-point', '3-point', 'cs']` the relative step size to
use for numerical approximation of the jacobian. The absolute step
size is computed as ``h = rel_step * sign(x) * max(1, abs(x))``,
possibly adjusted to fit into the bounds. For ``jac='3-point'``
the sign of `h` is ignored. If None (default), the step is selected
automatically.
Notes
-----
The option `ftol` is exposed via the `scipy.optimize.minimize` interface,
but calling `scipy.optimize.fmin_l_bfgs_b` directly exposes `factr`. The
relationship between the two is ``ftol = factr * numpy.finfo(float).eps``.
I.e., `factr` multiplies the default machine floating-point precision to
arrive at `ftol`.
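
The relationship above can be sketched with the standard library alone.
This is an illustration only: ``factr_to_ftol`` and ``ftol_to_factr`` are
hypothetical helper names that are not part of SciPy, and
``sys.float_info.epsilon`` is assumed to match ``numpy.finfo(float).eps``
(both are the machine epsilon of an IEEE double).

```python
import sys

# Assumed to equal numpy.finfo(float).eps on IEEE-754 platforms.
EPS = sys.float_info.epsilon


def factr_to_ftol(factr):
    # Hypothetical helper: tolerance as seen by the `minimize` interface.
    return factr * EPS


def ftol_to_factr(ftol):
    # Hypothetical helper: the inverse conversion, as done in the body below.
    return ftol / EPS


# factr = 1e7 corresponds to the default `ftol` in the signature above.
default_ftol = factr_to_ftol(1e7)
print(default_ftol)
print(ftol_to_factr(default_ftol))
```

A larger `factr` therefore means a looser relative-decrease test, and a
smaller one a tighter test.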
"""
_check_unknown_options(unknown_options)
m = maxcor
pgtol = gtol
factr = ftol / np.finfo(float).eps
x0 = asarray(x0).ravel()
n, = x0.shape
# historically old-style bounds were/are expected by lbfgsb.
# That's still the case but we'll deal with new-style from here on,
# it's easier
if bounds is None:
pass
elif len(bounds) != n:
raise ValueError('length of x0 != length of bounds')
else:
bounds = np.array(old_bound_to_new(bounds))
# check bounds
if (bounds[0] > bounds[1]).any():
raise ValueError(
"LBFGSB - one of the lower bounds is greater than an upper bound."
)
# initial vector must lie within the bounds. Otherwise ScalarFunction and
# approx_derivative will cause problems
x0 = np.clip(x0, bounds[0], bounds[1])
if disp is not None:
if disp == 0:
iprint = -1
else:
iprint = disp
# _prepare_scalar_function can use bounds=None to represent no bounds
sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
bounds=bounds,
finite_diff_rel_step=finite_diff_rel_step)
func_and_grad = sf.fun_and_grad
fortran_int = _lbfgsb.types.intvar.dtype
nbd = zeros(n, fortran_int)
low_bnd = zeros(n, float64)
upper_bnd = zeros(n, float64)
bounds_map = {(-np.inf, np.inf): 0,
(1, np.inf): 1,
(1, 1): 2,
(-np.inf, 1): 3}
if bounds is not None:
for i in range(0, n):
l, u = bounds[0, i], bounds[1, i]
if not np.isinf(l):
low_bnd[i] = l
l = 1
if not np.isinf(u):
upper_bnd[i] = u
u = 1
nbd[i] = bounds_map[l, u]
if not maxls > 0:
raise ValueError('maxls must be positive.')
x = array(x0, float64)
f = array(0.0, float64)
g = zeros((n,), float64)
wa = zeros(2*m*n + 5*n + 11*m*m + 8*m, float64)
iwa = zeros(3*n, fortran_int)
task = zeros(1, 'S60')
csave = zeros(1, 'S60')
lsave = zeros(4, fortran_int)
isave = zeros(44, fortran_int)
dsave = zeros(29, float64)
task[:] = 'START'
n_iterations = 0
while 1:
# g may become float32 if a user provides a function that calculates
# the Jacobian in float32 (see gh-18730). The underlying Fortran code
# expects float64, so upcast it
g = g.astype(np.float64)
# x, f, g, wa, iwa, task, csave, lsave, isave, dsave = \
_lbfgsb.setulb(m, x, low_bnd, upper_bnd, nbd, f, g, factr,
pgtol, wa, iwa, task, iprint, csave, lsave,
isave, dsave, maxls)
task_str = task.tobytes()
if task_str.startswith(b'FG'):
# The minimization routine wants f and g at the current x.
# Note that interruptions due to maxfun are postponed
# until the completion of the current minimization iteration.
# Overwrite f and g:
f, g = func_and_grad(x)
elif task_str.startswith(b'NEW_X'):
# new iteration
n_iterations += 1
intermediate_result = OptimizeResult(x=x, fun=f)
if _call_callback_maybe_halt(callback, intermediate_result):
task[:] = 'STOP: CALLBACK REQUESTED HALT'
if n_iterations >= maxiter:
task[:] = 'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'
elif sf.nfev > maxfun:
task[:] = ('STOP: TOTAL NO. of f AND g EVALUATIONS '
'EXCEEDS LIMIT')
else:
break
task_str = task.tobytes().strip(b'\x00').strip()
if task_str.startswith(b'CONV'):
warnflag = 0
elif sf.nfev > maxfun or n_iterations >= maxiter:
warnflag = 1
else:
warnflag = 2
# These two portions of the workspace are described in the mainlb
# subroutine in lbfgsb.f. See line 363.
s = wa[0: m*n].reshape(m, n)
y = wa[m*n: 2*m*n].reshape(m, n)
# See lbfgsb.f line 160 for this portion of the workspace.
# isave(31) = the total number of BFGS updates prior to the current iteration;
n_bfgs_updates = isave[30]
n_corrs = min(n_bfgs_updates, maxcor)
hess_inv = LbfgsInvHessProduct(s[:n_corrs], y[:n_corrs])
task_str = task_str.decode()
return OptimizeResult(fun=f, jac=g, nfev=sf.nfev,
njev=sf.ngev,
nit=n_iterations, status=warnflag, message=task_str,
x=x, success=(warnflag == 0), hess_inv=hess_inv)
class LbfgsInvHessProduct(LinearOperator):
"""Linear operator for the L-BFGS approximate inverse Hessian.
This operator computes the product of a vector with the approximate inverse
of the Hessian of the objective function, using the L-BFGS limited
memory approximation to the inverse Hessian, accumulated during the
optimization.
Objects of this class implement the ``scipy.sparse.linalg.LinearOperator``
interface.
Parameters
----------
sk : array_like, shape=(n_corr, n)
Array of `n_corr` most recent updates to the solution vector.
(See [1]).
yk : array_like, shape=(n_corr, n)
Array of `n_corr` most recent updates to the gradient. (See [1]).
References
----------
.. [1] Nocedal, Jorge. "Updating quasi-Newton matrices with limited
storage." Mathematics of computation 35.151 (1980): 773-782.
"""
def __init__(self, sk, yk):
"""Construct the operator."""
if sk.shape != yk.shape or sk.ndim != 2:
raise ValueError('sk and yk must have matching shape, (n_corrs, n)')
n_corrs, n = sk.shape
super().__init__(dtype=np.float64, shape=(n, n))
self.sk = sk
self.yk = yk
self.n_corrs = n_corrs
self.rho = 1 / np.einsum('ij,ij->i', sk, yk)
def _matvec(self, x):
"""Efficient matrix-vector multiply with the BFGS matrices.
This calculation is described in Section (4) of [1].
Parameters
----------
x : ndarray
An array with shape (n,) or (n,1).
Returns
-------
y : ndarray
The matrix-vector product
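
The two-loop recursion used here can be sketched in plain Python. This is
a minimal illustration rather than the implementation below: ``dot`` and
``two_loop`` are hypothetical names, lists stand in for arrays, the
correction pairs are assumed to be ordered oldest first, and the initial
inverse Hessian is taken to be the identity.

```python
def dot(u, v):
    # Plain-Python inner product, standing in for np.dot.
    return sum(a * b for a, b in zip(u, v))


def two_loop(s_list, y_list, x):
    # Hypothetical helper: apply the limited-memory inverse Hessian to x.
    rho = [1.0 / dot(s, y) for s, y in zip(s_list, y_list)]
    q = list(x)
    alpha = [0.0] * len(s_list)
    # Backward pass: newest correction pair to oldest.
    for i in reversed(range(len(s_list))):
        alpha[i] = rho[i] * dot(s_list[i], q)
        q = [qj - alpha[i] * yj for qj, yj in zip(q, y_list[i])]
    r = q  # identity used as the initial inverse Hessian
    # Forward pass: oldest correction pair to newest.
    for i in range(len(s_list)):
        beta = rho[i] * dot(y_list[i], r)
        r = [rj + sj * (alpha[i] - beta) for rj, sj in zip(r, s_list[i])]
    return r


s_pairs = [[1.0, 0.0], [1.0, 2.0]]
y_pairs = [[2.0, 0.0], [2.0, 1.0]]
# Applying the operator to the newest gradient change should return the
# newest step (the secant condition H y = s).
result = two_loop(s_pairs, y_pairs, y_pairs[-1])
print(result)
```

Checking the secant condition on the newest pair is a cheap sanity test
for any reimplementation of this product.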
"""
s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
q = np.array(x, dtype=self.dtype, copy=True)
if q.ndim == 2 and q.shape[1] == 1:
q = q.reshape(-1)
alpha = np.empty(n_corrs)
for i in range(n_corrs-1, -1, -1):
alpha[i] = rho[i] * np.dot(s[i], q)
q = q - alpha[i]*y[i]
r = q
for i in range(n_corrs):
beta = rho[i] * np.dot(y[i], r)
r = r + s[i] * (alpha[i] - beta)
return r
def todense(self):
"""Return a dense array representation of this operator.
Returns
-------
arr : ndarray, shape=(n, n)
An array with the same shape and containing
the same data represented by this `LinearOperator`.
"""
s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
I = np.eye(*self.shape, dtype=self.dtype)
Hk = I
for i in range(n_corrs):
A1 = I - s[i][:, np.newaxis] * y[i][np.newaxis, :] * rho[i]
A2 = I - y[i][:, np.newaxis] * s[i][np.newaxis, :] * rho[i]
Hk = np.dot(A1, np.dot(Hk, A2)) + (rho[i] * s[i][:, np.newaxis] *
s[i][np.newaxis, :])
return Hk
@@ -0,0 +1,897 @@
"""
Functions
---------
.. autosummary::
:toctree: generated/
line_search_armijo
line_search_wolfe1
line_search_wolfe2
scalar_search_wolfe1
scalar_search_wolfe2
"""
from warnings import warn
from scipy.optimize import _minpack2 as minpack2 # noqa: F401
from ._dcsrch import DCSRCH
import numpy as np
__all__ = ['LineSearchWarning', 'line_search_wolfe1', 'line_search_wolfe2',
'scalar_search_wolfe1', 'scalar_search_wolfe2',
'line_search_armijo']
class LineSearchWarning(RuntimeWarning):
pass
def _check_c1_c2(c1, c2):
if not (0 < c1 < c2 < 1):
raise ValueError("'c1' and 'c2' do not satisfy "
"'0 < c1 < c2 < 1'.")
#------------------------------------------------------------------------------
# Minpack's Wolfe line and scalar searches
#------------------------------------------------------------------------------
def line_search_wolfe1(f, fprime, xk, pk, gfk=None,
old_fval=None, old_old_fval=None,
args=(), c1=1e-4, c2=0.9, amax=50, amin=1e-8,
xtol=1e-14):
"""
Like `scalar_search_wolfe1`, but performs the line search along direction `pk`.
Parameters
----------
f : callable
Function `f(x)`
fprime : callable
Gradient of `f`
xk : array_like
Current point
pk : array_like
Search direction
gfk : array_like, optional
Gradient of `f` at point `xk`
old_fval : float, optional
Value of `f` at point `xk`
old_old_fval : float, optional
Value of `f` at point preceding `xk`
The rest of the parameters are the same as for `scalar_search_wolfe1`.
Returns
-------
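stp, f_count, g_count, fval, old_fval
`stp`, `fval`, and `old_fval` are as in `scalar_search_wolfe1`;
`f_count` and `g_count` are the numbers of function and gradient
evaluations made.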
gval : array
Gradient of `f` at the final point
Notes
-----
Parameters `c1` and `c2` must satisfy ``0 < c1 < c2 < 1``.
"""
if gfk is None:
gfk = fprime(xk, *args)
gval = [gfk]
gc = [0]
fc = [0]
def phi(s):
fc[0] += 1
return f(xk + s*pk, *args)
def derphi(s):
gval[0] = fprime(xk + s*pk, *args)
gc[0] += 1
return np.dot(gval[0], pk)
derphi0 = np.dot(gfk, pk)
stp, fval, old_fval = scalar_search_wolfe1(
phi, derphi, old_fval, old_old_fval, derphi0,
c1=c1, c2=c2, amax=amax, amin=amin, xtol=xtol)
return stp, fc[0], gc[0], fval, old_fval, gval[0]
def scalar_search_wolfe1(phi, derphi, phi0=None, old_phi0=None, derphi0=None,
c1=1e-4, c2=0.9,
amax=50, amin=1e-8, xtol=1e-14):
"""
Scalar function search for alpha that satisfies strong Wolfe conditions
alpha > 0 is assumed to be a descent direction.
Parameters
----------
phi : callable phi(alpha)
Function at point `alpha`
derphi : callable phi'(alpha)
Objective function derivative. Returns a scalar.
phi0 : float, optional
Value of phi at 0
old_phi0 : float, optional
Value of phi at previous point
derphi0 : float, optional
Value of derphi at 0
c1 : float, optional
Parameter for Armijo condition rule.
c2 : float, optional
Parameter for curvature condition rule.
amax, amin : float, optional
Maximum and minimum step size
xtol : float, optional
Relative tolerance for an acceptable step.
Returns
-------
alpha : float
Step size, or None if no suitable step was found
phi : float
Value of `phi` at the new point `alpha`
phi0 : float
Value of `phi` at `alpha=0`
Notes
-----
Uses routine DCSRCH from MINPACK.
Parameters `c1` and `c2` must satisfy ``0 < c1 < c2 < 1`` as described in [1]_.
References
----------
.. [1] Nocedal, J., & Wright, S. J. (2006). Numerical Optimization.
Springer Series in Operations Research and Financial Engineering.
Springer Nature.
"""
_check_c1_c2(c1, c2)
if phi0 is None:
phi0 = phi(0.)
if derphi0 is None:
derphi0 = derphi(0.)
if old_phi0 is not None and derphi0 != 0:
alpha1 = min(1.0, 1.01*2*(phi0 - old_phi0)/derphi0)
if alpha1 < 0:
alpha1 = 1.0
else:
alpha1 = 1.0
maxiter = 100
dcsrch = DCSRCH(phi, derphi, c1, c2, xtol, amin, amax)
stp, phi1, phi0, task = dcsrch(
alpha1, phi0=phi0, derphi0=derphi0, maxiter=maxiter
)
return stp, phi1, phi0
line_search = line_search_wolfe1
#------------------------------------------------------------------------------
# Pure-Python Wolfe line and scalar searches
#------------------------------------------------------------------------------
# Note: `line_search_wolfe2` is the public `scipy.optimize.line_search`
def line_search_wolfe2(f, myfprime, xk, pk, gfk=None, old_fval=None,
old_old_fval=None, args=(), c1=1e-4, c2=0.9, amax=None,
extra_condition=None, maxiter=10):
"""Find alpha that satisfies strong Wolfe conditions.
Parameters
----------
f : callable f(x,*args)
Objective function.
myfprime : callable f'(x,*args)
Objective function gradient.
xk : ndarray
Starting point.
pk : ndarray
Search direction. The search direction must be a descent direction
for the algorithm to converge.
gfk : ndarray, optional
Gradient value for x=xk (xk being the current parameter
estimate). Will be recomputed if omitted.
old_fval : float, optional
Function value for x=xk. Will be recomputed if omitted.
old_old_fval : float, optional
Function value for the point preceding x=xk.
args : tuple, optional
Additional arguments passed to objective function.
c1 : float, optional
Parameter for Armijo condition rule.
c2 : float, optional
Parameter for curvature condition rule.
amax : float, optional
Maximum step size
extra_condition : callable, optional
A callable of the form ``extra_condition(alpha, x, f, g)``
returning a boolean. Arguments are the proposed step ``alpha``
and the corresponding ``x``, ``f`` and ``g`` values. The line search
accepts the value of ``alpha`` only if this
callable returns ``True``. If the callable returns ``False``
for the step length, the algorithm will continue with
new iterates. The callable is only called for iterates
satisfying the strong Wolfe conditions.
maxiter : int, optional
Maximum number of iterations to perform.
Returns
-------
alpha : float or None
Alpha for which ``x_new = xk + alpha * pk``,
or None if the line search algorithm did not converge.
fc : int
Number of function evaluations made.
gc : int
Number of gradient evaluations made.
new_fval : float or None
New function value ``f(x_new)=f(xk+alpha*pk)``,
or None if the line search algorithm did not converge.
old_fval : float
Old function value ``f(xk)``.
new_slope : float or None
The local slope along the search direction at the
new value ``<myfprime(x_new), pk>``,
or None if the line search algorithm did not converge.
Notes
-----
Uses the line search algorithm to enforce strong Wolfe
conditions. See Nocedal and Wright, 'Numerical Optimization',
1999, pp. 59-61.
The search direction `pk` must be a descent direction (e.g.
``-myfprime(xk)``) to find a step length that satisfies the strong Wolfe
conditions. If the search direction is not a descent direction (e.g.
``myfprime(xk)``), then `alpha`, `new_fval`, and `new_slope` will be None.
Examples
--------
>>> import numpy as np
>>> from scipy.optimize import line_search
An objective function and its gradient are defined.
>>> def obj_func(x):
... return (x[0])**2+(x[1])**2
>>> def obj_grad(x):
... return [2*x[0], 2*x[1]]
We can find alpha that satisfies strong Wolfe conditions.
>>> start_point = np.array([1.8, 1.7])
>>> search_gradient = np.array([-1.0, -1.0])
>>> line_search(obj_func, obj_grad, start_point, search_gradient)
(1.0, 2, 1, 1.1300000000000001, 6.13, [1.6, 1.4])
"""
fc = [0]
gc = [0]
gval = [None]
gval_alpha = [None]
def phi(alpha):
fc[0] += 1
return f(xk + alpha * pk, *args)
fprime = myfprime
def derphi(alpha):
gc[0] += 1
gval[0] = fprime(xk + alpha * pk, *args) # store for later use
gval_alpha[0] = alpha
return np.dot(gval[0], pk)
if gfk is None:
gfk = fprime(xk, *args)
derphi0 = np.dot(gfk, pk)
if extra_condition is not None:
# Add the current gradient as argument, to avoid needless
# re-evaluation
def extra_condition2(alpha, phi):
if gval_alpha[0] != alpha:
derphi(alpha)
x = xk + alpha * pk
return extra_condition(alpha, x, phi, gval[0])
else:
extra_condition2 = None
alpha_star, phi_star, old_fval, derphi_star = scalar_search_wolfe2(
phi, derphi, old_fval, old_old_fval, derphi0, c1, c2, amax,
extra_condition2, maxiter=maxiter)
if derphi_star is None:
warn('The line search algorithm did not converge',
LineSearchWarning, stacklevel=2)
else:
# derphi_star is a scalar (the slope along pk). Return instead the
# most recently computed gradient, i.e. the one used to compute that
# slope. This is the gradient at the next step, so the outer loop
# does not need to compute it again.
derphi_star = gval[0]
return alpha_star, fc[0], gc[0], phi_star, old_fval, derphi_star
def scalar_search_wolfe2(phi, derphi, phi0=None,
old_phi0=None, derphi0=None,
c1=1e-4, c2=0.9, amax=None,
extra_condition=None, maxiter=10):
"""Find alpha that satisfies strong Wolfe conditions.
alpha > 0 is assumed to be a descent direction.
Parameters
----------
phi : callable phi(alpha)
Objective scalar function.
derphi : callable phi'(alpha)
Objective function derivative. Returns a scalar.
phi0 : float, optional
Value of phi at 0.
old_phi0 : float, optional
Value of phi at previous point.
derphi0 : float, optional
Value of derphi at 0
c1 : float, optional
Parameter for Armijo condition rule.
c2 : float, optional
Parameter for curvature condition rule.
amax : float, optional
Maximum step size.
extra_condition : callable, optional
A callable of the form ``extra_condition(alpha, phi_value)``
returning a boolean. The line search accepts the value
of ``alpha`` only if this callable returns ``True``.
If the callable returns ``False`` for the step length,
the algorithm will continue with new iterates.
The callable is only called for iterates satisfying
the strong Wolfe conditions.
maxiter : int, optional
Maximum number of iterations to perform.
Returns
-------
alpha_star : float or None
Best alpha, or None if the line search algorithm did not converge.
phi_star : float
phi at alpha_star.
phi0 : float
phi at 0.
derphi_star : float or None
derphi at alpha_star, or None if the line search algorithm
did not converge.
Notes
-----
Uses the line search algorithm to enforce strong Wolfe
conditions. See Nocedal and Wright, 'Numerical Optimization',
1999, pp. 59-61.
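
The two strong Wolfe tests applied by this routine can be written out as a
small stand-alone check. This is a sketch only:
``satisfies_strong_wolfe`` is a hypothetical helper, not part of this
module, and the scalar function is ``f(x) = x0**2 + x1**2`` restricted to
the line through ``(1.8, 1.7)`` with direction ``(-1, -1)``.

```python
def satisfies_strong_wolfe(phi, derphi, alpha, c1=1e-4, c2=0.9):
    # Hypothetical helper: test both strong Wolfe conditions at alpha.
    phi0 = phi(0.0)
    derphi0 = derphi(0.0)
    # Sufficient decrease (Armijo) condition.
    sufficient_decrease = phi(alpha) <= phi0 + c1 * alpha * derphi0
    # Curvature condition on the magnitude of the slope.
    curvature = abs(derphi(alpha)) <= -c2 * derphi0
    return sufficient_decrease and curvature


def phi(alpha):
    # f(xk + alpha * pk) for xk = (1.8, 1.7), pk = (-1, -1)
    return (1.8 - alpha) ** 2 + (1.7 - alpha) ** 2


def derphi(alpha):
    # Derivative of phi with respect to alpha.
    return -2.0 * (1.8 - alpha) - 2.0 * (1.7 - alpha)


for trial in (0.01, 1.0, 3.5):
    print(trial, satisfies_strong_wolfe(phi, derphi, trial))
```

A very short step fails the curvature test and a very long step fails the
sufficient-decrease test, which is why both conditions are needed.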
"""
_check_c1_c2(c1, c2)
if phi0 is None:
phi0 = phi(0.)
if derphi0 is None:
derphi0 = derphi(0.)
alpha0 = 0
if old_phi0 is not None and derphi0 != 0:
alpha1 = min(1.0, 1.01*2*(phi0 - old_phi0)/derphi0)
else:
alpha1 = 1.0
if alpha1 < 0:
alpha1 = 1.0
if amax is not None:
alpha1 = min(alpha1, amax)
phi_a1 = phi(alpha1)
#derphi_a1 = derphi(alpha1) evaluated below
phi_a0 = phi0
derphi_a0 = derphi0
if extra_condition is None:
def extra_condition(alpha, phi):
return True
for i in range(maxiter):
if alpha1 == 0 or (amax is not None and alpha0 > amax):
# alpha1 == 0: This shouldn't happen. Perhaps the increment has
# slipped below machine precision?
alpha_star = None
phi_star = phi0
phi0 = old_phi0
derphi_star = None
if alpha1 == 0:
msg = 'Rounding errors prevent the line search from converging'
else:
msg = "The line search algorithm could not find a solution " + \
"less than or equal to amax: %s" % amax
warn(msg, LineSearchWarning, stacklevel=2)
break
not_first_iteration = i > 0
if (phi_a1 > phi0 + c1 * alpha1 * derphi0) or \
((phi_a1 >= phi_a0) and not_first_iteration):
alpha_star, phi_star, derphi_star = \
_zoom(alpha0, alpha1, phi_a0,
phi_a1, derphi_a0, phi, derphi,
phi0, derphi0, c1, c2, extra_condition)
break
derphi_a1 = derphi(alpha1)
if (abs(derphi_a1) <= -c2*derphi0):
if extra_condition(alpha1, phi_a1):
alpha_star = alpha1
phi_star = phi_a1
derphi_star = derphi_a1
break
if (derphi_a1 >= 0):
alpha_star, phi_star, derphi_star = \
_zoom(alpha1, alpha0, phi_a1,
phi_a0, derphi_a1, phi, derphi,
phi0, derphi0, c1, c2, extra_condition)
break
alpha2 = 2 * alpha1 # increase by factor of two on each iteration
if amax is not None:
alpha2 = min(alpha2, amax)
alpha0 = alpha1
alpha1 = alpha2
phi_a0 = phi_a1
phi_a1 = phi(alpha1)
derphi_a0 = derphi_a1
else:
# stopping test maxiter reached
alpha_star = alpha1
phi_star = phi_a1
derphi_star = None
warn('The line search algorithm did not converge',
LineSearchWarning, stacklevel=2)
return alpha_star, phi_star, phi0, derphi_star
def _cubicmin(a, fa, fpa, b, fb, c, fc):
"""
Finds the minimizer for a cubic polynomial that goes through the
points (a,fa), (b,fb), and (c,fc) with derivative at a of fpa.
If no minimizer can be found, return None.
"""
# f(x) = A *(x-a)^3 + B*(x-a)^2 + C*(x-a) + D
with np.errstate(divide='raise', over='raise', invalid='raise'):
try:
C = fpa
db = b - a
dc = c - a
denom = (db * dc) ** 2 * (db - dc)
d1 = np.empty((2, 2))
d1[0, 0] = dc ** 2
d1[0, 1] = -db ** 2
d1[1, 0] = -dc ** 3
d1[1, 1] = db ** 3
[A, B] = np.dot(d1, np.asarray([fb - fa - C * db,
fc - fa - C * dc]).flatten())
A /= denom
B /= denom
radical = B * B - 3 * A * C
xmin = a + (-B + np.sqrt(radical)) / (3 * A)
except ArithmeticError:
return None
if not np.isfinite(xmin):
return None
return xmin
def _quadmin(a, fa, fpa, b, fb):
"""
Finds the minimizer for a quadratic polynomial that goes through
the points (a,fa), (b,fb) with derivative at a of fpa.
"""
# f(x) = B*(x-a)^2 + C*(x-a) + D
with np.errstate(divide='raise', over='raise', invalid='raise'):
try:
D = fa
C = fpa
db = b - a * 1.0
B = (fb - D - C * db) / (db * db)
xmin = a - C / (2.0 * B)
except ArithmeticError:
return None
if not np.isfinite(xmin):
return None
return xmin
def _zoom(a_lo, a_hi, phi_lo, phi_hi, derphi_lo,
phi, derphi, phi0, derphi0, c1, c2, extra_condition):
"""Zoom stage of approximate linesearch satisfying strong Wolfe conditions.
Part of the optimization algorithm in `scalar_search_wolfe2`.
Notes
-----
Implements Algorithm 3.6 (zoom) in Nocedal and Wright,
'Numerical Optimization', 1999, p. 61.
"""
maxiter = 10
i = 0
delta1 = 0.2 # cubic interpolant check
delta2 = 0.1 # quadratic interpolant check
phi_rec = phi0
a_rec = 0
while True:
# Interpolate to find a trial step length between a_lo and a_hi.
# Use cubic interpolation first; if the result is within
# delta * dalpha of an end point or outside the interval bounded
# by a_lo and a_hi, use quadratic interpolation; if the result is
# still too close, use bisection.
dalpha = a_hi - a_lo
if dalpha < 0:
a, b = a_hi, a_lo
else:
a, b = a_lo, a_hi
# minimizer of cubic interpolant
# (uses phi_lo, derphi_lo, phi_hi, and the most recent value of phi)
#
# If the result is too close to the end points (or out of the
# interval), use quadratic interpolation with phi_lo,
# derphi_lo and phi_hi. If the result is still too close to the
# end points (or out of the interval), use bisection.
if (i > 0):
cchk = delta1 * dalpha
a_j = _cubicmin(a_lo, phi_lo, derphi_lo, a_hi, phi_hi,
a_rec, phi_rec)
if (i == 0) or (a_j is None) or (a_j > b - cchk) or (a_j < a + cchk):
qchk = delta2 * dalpha
a_j = _quadmin(a_lo, phi_lo, derphi_lo, a_hi, phi_hi)
if (a_j is None) or (a_j > b-qchk) or (a_j < a+qchk):
a_j = a_lo + 0.5*dalpha
# Check new value of a_j
phi_aj = phi(a_j)
if (phi_aj > phi0 + c1*a_j*derphi0) or (phi_aj >= phi_lo):
phi_rec = phi_hi
a_rec = a_hi
a_hi = a_j
phi_hi = phi_aj
else:
derphi_aj = derphi(a_j)
if abs(derphi_aj) <= -c2*derphi0 and extra_condition(a_j, phi_aj):
a_star = a_j
val_star = phi_aj
valprime_star = derphi_aj
break
if derphi_aj*(a_hi - a_lo) >= 0:
phi_rec = phi_hi
a_rec = a_hi
a_hi = a_lo
phi_hi = phi_lo
else:
phi_rec = phi_lo
a_rec = a_lo
a_lo = a_j
phi_lo = phi_aj
derphi_lo = derphi_aj
i += 1
if (i > maxiter):
# Failed to find a conforming step size
a_star = None
val_star = None
valprime_star = None
break
return a_star, val_star, valprime_star
#------------------------------------------------------------------------------
# Armijo line and scalar searches
#------------------------------------------------------------------------------
def line_search_armijo(f, xk, pk, gfk, old_fval, args=(), c1=1e-4, alpha0=1):
"""Minimize over alpha, the function ``f(xk+alpha pk)``.
Parameters
----------
f : callable
Function to be minimized.
xk : array_like
Current point.
pk : array_like
Search direction.
gfk : array_like
Gradient of `f` at point `xk`.
old_fval : float
Value of `f` at point `xk`.
args : tuple, optional
Optional arguments.
c1 : float, optional
Value to control stopping criterion.
alpha0 : scalar, optional
Value of `alpha` at start of the optimization.
Returns
-------
alpha
f_count
f_val_at_alpha
Notes
-----
Uses the interpolation algorithm (Armijo backtracking) as suggested by
Nocedal and Wright in 'Numerical Optimization', 1999, pp. 56-57.
"""
xk = np.atleast_1d(xk)
fc = [0]
def phi(alpha1):
fc[0] += 1
return f(xk + alpha1*pk, *args)
if old_fval is None:
phi0 = phi(0.)
else:
phi0 = old_fval # compute f(xk) -- done in past loop
derphi0 = np.dot(gfk, pk)
alpha, phi1 = scalar_search_armijo(phi, phi0, derphi0, c1=c1,
alpha0=alpha0)
return alpha, fc[0], phi1
def line_search_BFGS(f, xk, pk, gfk, old_fval, args=(), c1=1e-4, alpha0=1):
"""
Compatibility wrapper for `line_search_armijo`
"""
r = line_search_armijo(f, xk, pk, gfk, old_fval, args=args, c1=c1,
alpha0=alpha0)
return r[0], r[1], 0, r[2]
def scalar_search_armijo(phi, phi0, derphi0, c1=1e-4, alpha0=1, amin=0):
"""Minimize over alpha, the function ``phi(alpha)``.
Uses the interpolation algorithm (Armijo backtracking) as suggested by
Nocedal and Wright in 'Numerical Optimization', 1999, pp. 56-57.
alpha > 0 is assumed to be a descent direction.
Returns
-------
alpha
phi1
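
The first backtracking step, the minimizer of a quadratic interpolant
through ``phi(0)``, ``phi'(0)`` and ``phi(alpha0)``, can be illustrated on
its own. This is a sketch under stated assumptions: ``quadratic_step`` is
a hypothetical helper, and the test function is itself quadratic so the
interpolant is exact.

```python
def quadratic_step(phi0, derphi0, alpha0, phi_a0):
    # Hypothetical helper: minimizer of the quadratic that matches
    # phi(0), phi'(0) and phi(alpha0).
    return -derphi0 * alpha0 ** 2 / 2.0 / (phi_a0 - phi0 - derphi0 * alpha0)


def phi(alpha):
    # Illustrative scalar function with its minimum at alpha = 0.25.
    return 4.0 * alpha ** 2 - 2.0 * alpha + 1.0


c1 = 1e-4
phi0 = phi(0.0)
derphi0 = -2.0  # phi'(0) for the function above
alpha0 = 1.0

# The full step does not give sufficient decrease, so interpolate.
full_step_ok = phi(alpha0) <= phi0 + c1 * alpha0 * derphi0
alpha1 = quadratic_step(phi0, derphi0, alpha0, phi(alpha0))
interp_step_ok = phi(alpha1) <= phi0 + c1 * alpha1 * derphi0
print(full_step_ok, alpha1, interp_step_ok)
```

For functions that are not quadratic the interpolated step is only an
estimate, which is why the routine falls back to cubic interpolation.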
"""
phi_a0 = phi(alpha0)
if phi_a0 <= phi0 + c1*alpha0*derphi0:
return alpha0, phi_a0
# Otherwise, compute the minimizer of a quadratic interpolant:
alpha1 = -(derphi0) * alpha0**2 / 2.0 / (phi_a0 - phi0 - derphi0 * alpha0)
phi_a1 = phi(alpha1)
if (phi_a1 <= phi0 + c1*alpha1*derphi0):
return alpha1, phi_a1
# Otherwise, loop with cubic interpolation until we find an alpha which
# satisfies the first Wolfe condition (since we are backtracking, we will
# assume that the value of alpha is not too small and satisfies the second
# condition).
while alpha1 > amin: # we are assuming alpha>0 is a descent direction
factor = alpha0**2 * alpha1**2 * (alpha1-alpha0)
a = alpha0**2 * (phi_a1 - phi0 - derphi0*alpha1) - \
alpha1**2 * (phi_a0 - phi0 - derphi0*alpha0)
a = a / factor
b = -alpha0**3 * (phi_a1 - phi0 - derphi0*alpha1) + \
alpha1**3 * (phi_a0 - phi0 - derphi0*alpha0)
b = b / factor
alpha2 = (-b + np.sqrt(abs(b**2 - 3 * a * derphi0))) / (3.0*a)
phi_a2 = phi(alpha2)
if (phi_a2 <= phi0 + c1*alpha2*derphi0):
return alpha2, phi_a2
if (alpha1 - alpha2) > alpha1 / 2.0 or (1 - alpha2/alpha1) < 0.96:
alpha2 = alpha1 / 2.0
alpha0 = alpha1
alpha1 = alpha2
phi_a0 = phi_a1
phi_a1 = phi_a2
# Failed to find a suitable step length
return None, phi_a1
#------------------------------------------------------------------------------
# Non-monotone line search for DF-SANE
#------------------------------------------------------------------------------
def _nonmonotone_line_search_cruz(f, x_k, d, prev_fs, eta,
gamma=1e-4, tau_min=0.1, tau_max=0.5):
"""
Nonmonotone backtracking line search as described in [1]_
Parameters
----------
f : callable
Function returning a tuple ``(f, F)`` where ``f`` is the value
of a merit function and ``F`` the residual.
x_k : ndarray
Initial position.
d : ndarray
Search direction.
prev_fs : list of float
List of previous merit function values. Should have ``len(prev_fs) <= M``
where ``M`` is the nonmonotonicity window parameter.
eta : float
Allowed merit function increase, see [1]_
gamma, tau_min, tau_max : float, optional
Search parameters, see [1]_
Returns
-------
alpha : float
Step length
xp : ndarray
Next position
fp : float
Merit function value at next position
Fp : ndarray
Residual at next position
References
----------
.. [1] "Spectral residual method without gradient information for solving
large-scale nonlinear systems of equations." W. La Cruz,
J.M. Martinez, M. Raydan. Math. Comp. **75**, 1429 (2006).
"""
f_k = prev_fs[-1]
f_bar = max(prev_fs)
alpha_p = 1
alpha_m = 1
alpha = 1
while True:
xp = x_k + alpha_p * d
fp, Fp = f(xp)
if fp <= f_bar + eta - gamma * alpha_p**2 * f_k:
alpha = alpha_p
break
alpha_tp = alpha_p**2 * f_k / (fp + (2*alpha_p - 1)*f_k)
xp = x_k - alpha_m * d
fp, Fp = f(xp)
if fp <= f_bar + eta - gamma * alpha_m**2 * f_k:
alpha = -alpha_m
break
alpha_tm = alpha_m**2 * f_k / (fp + (2*alpha_m - 1)*f_k)
alpha_p = np.clip(alpha_tp, tau_min * alpha_p, tau_max * alpha_p)
alpha_m = np.clip(alpha_tm, tau_min * alpha_m, tau_max * alpha_m)
return alpha, xp, fp, Fp
def _nonmonotone_line_search_cheng(f, x_k, d, f_k, C, Q, eta,
gamma=1e-4, tau_min=0.1, tau_max=0.5,
nu=0.85):
"""
Nonmonotone line search from [1]_
Parameters
----------
f : callable
Function returning a tuple ``(f, F)`` where ``f`` is the value
of a merit function and ``F`` the residual.
x_k : ndarray
Initial position.
d : ndarray
Search direction.
f_k : float
Initial merit function value.
C, Q : float
Control parameters. On the first iteration, give values
Q=1.0, C=f_k
eta : float
Allowed merit function increase, see [1]_
nu, gamma, tau_min, tau_max : float, optional
Search parameters, see [1]_
Returns
-------
alpha : float
Step length
xp : ndarray
Next position
fp : float
Merit function value at next position
Fp : ndarray
Residual at next position
C : float
New value for the control parameter C
Q : float
New value for the control parameter Q
References
----------
.. [1] W. Cheng & D.-H. Li, "A derivative-free nonmonotone line
search and its application to the spectral residual
method", IMA J. Numer. Anal. 29, 814 (2009).
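
The update of the control parameters ``C`` and ``Q`` can be sketched in
isolation. This is an illustration only: ``update_c_q`` is a hypothetical
helper that mirrors the update performed at the end of this function, and
the numbers are made up.

```python
def update_c_q(C, Q, fp, eta, nu=0.85):
    # Hypothetical helper mirroring the C/Q update at the end of the search.
    Q_next = nu * Q + 1
    C_next = (nu * Q * (C + eta) + fp) / Q_next
    return C_next, Q_next


f_k = 4.0
# First iteration: Q = 1.0 and C = f_k, as described under Parameters.
C, Q = update_c_q(f_k, 1.0, fp=2.0, eta=0.0)
print(C, Q)

# With nu = 0 the history is discarded and C is just the new merit value.
print(update_c_q(f_k, 1.0, fp=2.0, eta=0.0, nu=0.0))
```

With ``eta = 0`` the new ``C`` is a weighted average of the old ``C`` and
the new merit value, so it lies between them.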
"""
alpha_p = 1
alpha_m = 1
alpha = 1
while True:
xp = x_k + alpha_p * d
fp, Fp = f(xp)
if fp <= C + eta - gamma * alpha_p**2 * f_k:
alpha = alpha_p
break
alpha_tp = alpha_p**2 * f_k / (fp + (2*alpha_p - 1)*f_k)
xp = x_k - alpha_m * d
fp, Fp = f(xp)
if fp <= C + eta - gamma * alpha_m**2 * f_k:
alpha = -alpha_m
break
alpha_tm = alpha_m**2 * f_k / (fp + (2*alpha_m - 1)*f_k)
alpha_p = np.clip(alpha_tp, tau_min * alpha_p, tau_max * alpha_p)
alpha_m = np.clip(alpha_tm, tau_min * alpha_m, tau_max * alpha_m)
# Update C and Q
Q_next = nu * Q + 1
C = (nu * Q * (C + eta) + fp) / Q_next
Q = Q_next
return alpha, xp, fp, Fp, C, Q
@@ -0,0 +1,714 @@
"""
A top-level linear programming interface.
.. versionadded:: 0.15.0
Functions
---------
.. autosummary::
:toctree: generated/
linprog
linprog_verbose_callback
linprog_terse_callback
"""
import numpy as np
from ._optimize import OptimizeResult, OptimizeWarning
from warnings import warn
from ._linprog_highs import _linprog_highs
from ._linprog_ip import _linprog_ip
from ._linprog_simplex import _linprog_simplex
from ._linprog_rs import _linprog_rs
from ._linprog_doc import (_linprog_highs_doc, _linprog_ip_doc, # noqa: F401
_linprog_rs_doc, _linprog_simplex_doc,
_linprog_highs_ipm_doc, _linprog_highs_ds_doc)
from ._linprog_util import (
_parse_linprog, _presolve, _get_Abc, _LPProblem, _autoscale,
_postsolve, _check_result, _display_summary)
from copy import deepcopy
__all__ = ['linprog', 'linprog_verbose_callback', 'linprog_terse_callback']
__docformat__ = "restructuredtext en"
LINPROG_METHODS = [
'simplex', 'revised simplex', 'interior-point', 'highs', 'highs-ds', 'highs-ipm'
]
def linprog_verbose_callback(res):
"""
A sample callback function demonstrating the linprog callback interface.
This callback produces detailed output to sys.stdout before each iteration
and after the final iteration of the simplex algorithm.
Parameters
----------
res : A `scipy.optimize.OptimizeResult` consisting of the following fields:
x : 1-D array
The independent variable vector which optimizes the linear
programming problem.
fun : float
Value of the objective function.
success : bool
True if the algorithm succeeded in finding an optimal solution.
slack : 1-D array
The values of the slack variables. Each slack variable corresponds
to an inequality constraint. If the slack is zero, then the
corresponding constraint is active.
con : 1-D array
The (nominally zero) residuals of the equality constraints, that is,
``b - A_eq @ x``
phase : int
The phase of the optimization being executed. In phase 1 a basic
feasible solution is sought and the tableau T has an additional row
representing an alternate objective function.
status : int
An integer representing the exit status of the optimization::
0 : Optimization terminated successfully
1 : Iteration limit reached
2 : Problem appears to be infeasible
3 : Problem appears to be unbounded
4 : Serious numerical difficulties encountered
nit : int
The number of iterations performed.
message : str
A string descriptor of the exit status of the optimization.
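
A user-defined callback can follow the same pattern as this sample. The
sketch below is an assumption-laden illustration:
``make_recording_callback`` is a hypothetical helper, and plain
dictionaries stand in for the result object, using only the key access
that this function itself relies on.

```python
def make_recording_callback(history):
    # Hypothetical helper: build a callback that records progress.
    def callback(res):
        # Use only fields documented above, accessed by key.
        history.append((res['nit'], res['phase'], float(res['fun'])))
    return callback


history = []
callback = make_recording_callback(history)

# Stand-ins for the objects a solver would pass on each iteration.
callback({'nit': 0, 'phase': 1, 'fun': 3.5, 'x': [0.0, 0.0]})
callback({'nit': 1, 'phase': 2, 'fun': 1.25, 'x': [1.0, 0.5]})
print(history)
```

Recording into a list instead of printing keeps the callback easy to test
and leaves output formatting to the caller.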
"""
    x = res['x']
    fun = res['fun']
    phase = res['phase']
    status = res['status']
    nit = res['nit']
    message = res['message']
    complete = res['complete']
    saved_printoptions = np.get_printoptions()
    np.set_printoptions(linewidth=500,
                        formatter={'float': lambda x: f"{x: 12.4f}"})
    if status:
        print('--------- Simplex Early Exit -------\n')
        print(f'The simplex method exited early with status {status:d}')
        print(message)
    elif complete:
        print('--------- Simplex Complete --------\n')
        print(f'Iterations required: {nit}')
    else:
        print(f'--------- Iteration {nit:d} ---------\n')
    if nit > 0:
        if phase == 1:
            print('Current Pseudo-Objective Value:')
        else:
            print('Current Objective Value:')
        print('f = ', fun)
        print()
        print('Current Solution Vector:')
        print('x = ', x)
        print()
    np.set_printoptions(**saved_printoptions)
def linprog_terse_callback(res):
"""
A sample callback function demonstrating the linprog callback interface.
This callback produces brief output to sys.stdout before each iteration
and after the final iteration of the simplex algorithm.
Parameters
----------
res : A `scipy.optimize.OptimizeResult` consisting of the following fields:
x : 1-D array
The independent variable vector which optimizes the linear
programming problem.
fun : float
Value of the objective function.
success : bool
True if the algorithm succeeded in finding an optimal solution.
slack : 1-D array
The values of the slack variables. Each slack variable corresponds
to an inequality constraint. If the slack is zero, then the
corresponding constraint is active.
con : 1-D array
The (nominally zero) residuals of the equality constraints, that is,
``b_eq - A_eq @ x``.
phase : int
The phase of the optimization being executed. In phase 1 a basic
feasible solution is sought and the tableau ``T`` has an additional row
representing an alternate objective function.
status : int
An integer representing the exit status of the optimization::
0 : Optimization terminated successfully
1 : Iteration limit reached
2 : Problem appears to be infeasible
3 : Problem appears to be unbounded
4 : Serious numerical difficulties encountered
nit : int
The number of iterations performed.
message : str
A string descriptor of the exit status of the optimization.
"""
    nit = res['nit']
    x = res['x']
    if nit == 0:
        print("Iter: X:")
    print(f"{nit: <5d} ", end="")
    print(x)
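A callback can also record iterates instead of printing them. The sketch below is illustrative only: `make_history_callback` is a hypothetical helper (not part of SciPy), and the callback is exercised with hand-built dictionaries that mimic the ``nit`` and ``x`` fields documented above, not with a real solver run.

```python
def make_history_callback():
    """Return (callback, history); `history` collects (nit, x) pairs.

    Hypothetical helper for illustration; not part of SciPy.
    """
    history = []

    def callback(res):
        # Copy x so that later in-place changes cannot alter the record.
        history.append((res['nit'], list(res['x'])))

    return callback, history


callback, history = make_history_callback()
# Stand-ins for the per-iteration results a solver would pass in.
for fake_res in ({'nit': 0, 'x': [0.0, 0.0]}, {'nit': 1, 'x': [10.0, -3.0]}):
    callback(fake_res)
print(history)
```

Returning the list alongside the closure keeps the recorded state out of module globals, so several solves can each use their own history.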
def linprog(c, A_ub=None, b_ub=None, A_eq=None, b_eq=None,
bounds=(0, None), method='highs', callback=None,
options=None, x0=None, integrality=None):
r"""
Linear programming: minimize a linear objective function subject to linear
equality and inequality constraints.
Linear programming solves problems of the following form:
.. math::
\min_x \ & c^T x \\
\mbox{such that} \ & A_{ub} x \leq b_{ub},\\
& A_{eq} x = b_{eq},\\
& l \leq x \leq u ,
where :math:`x` is a vector of decision variables; :math:`c`,
:math:`b_{ub}`, :math:`b_{eq}`, :math:`l`, and :math:`u` are vectors; and
:math:`A_{ub}` and :math:`A_{eq}` are matrices.
Alternatively, that's:
- minimize ::
c @ x
- such that ::
A_ub @ x <= b_ub
A_eq @ x == b_eq
lb <= x <= ub
Note that by default ``lb = 0`` and ``ub = None``. Other bounds can be
specified with ``bounds``.
Parameters
----------
c : 1-D array
The coefficients of the linear objective function to be minimized.
A_ub : 2-D array, optional
The inequality constraint matrix. Each row of ``A_ub`` specifies the
coefficients of a linear inequality constraint on ``x``.
b_ub : 1-D array, optional
The inequality constraint vector. Each element represents an
upper bound on the corresponding value of ``A_ub @ x``.
A_eq : 2-D array, optional
The equality constraint matrix. Each row of ``A_eq`` specifies the
coefficients of a linear equality constraint on ``x``.
b_eq : 1-D array, optional
The equality constraint vector. Each element of ``A_eq @ x`` must equal
the corresponding element of ``b_eq``.
bounds : sequence, optional
A sequence of ``(min, max)`` pairs for each element in ``x``, defining
the minimum and maximum values of that decision variable.
If a single tuple ``(min, max)`` is provided, then ``min`` and ``max``
will serve as bounds for all decision variables.
Use ``None`` to indicate that there is no bound. For instance, the
default bound ``(0, None)`` means that all decision variables are
non-negative, and the pair ``(None, None)`` means no bounds at all,
i.e. all variables are allowed to be any real number.
method : str, optional
The algorithm used to solve the standard form problem.
:ref:`'highs' <optimize.linprog-highs>` (default),
:ref:`'highs-ds' <optimize.linprog-highs-ds>`,
:ref:`'highs-ipm' <optimize.linprog-highs-ipm>`,
:ref:`'interior-point' <optimize.linprog-interior-point>` (legacy),
:ref:`'revised simplex' <optimize.linprog-revised_simplex>` (legacy),
and
:ref:`'simplex' <optimize.linprog-simplex>` (legacy) are supported.
The legacy methods are deprecated and will be removed in SciPy 1.11.0.
callback : callable, optional
If a callback function is provided, it will be called at least once per
iteration of the algorithm. The callback function must accept a single
`scipy.optimize.OptimizeResult` consisting of the following fields:
x : 1-D array
The current solution vector.
fun : float
The current value of the objective function ``c @ x``.
success : bool
``True`` when the algorithm has completed successfully.
slack : 1-D array
The (nominally positive) values of the slack,
``b_ub - A_ub @ x``.
con : 1-D array
The (nominally zero) residuals of the equality constraints,
``b_eq - A_eq @ x``.
phase : int
The phase of the algorithm being executed.
status : int
An integer representing the status of the algorithm.
``0`` : Optimization proceeding nominally.
``1`` : Iteration limit reached.
``2`` : Problem appears to be infeasible.
``3`` : Problem appears to be unbounded.
``4`` : Numerical difficulties encountered.
nit : int
The current iteration number.
message : str
A string descriptor of the algorithm status.
Callback functions are not currently supported by the HiGHS methods.
options : dict, optional
A dictionary of solver options. All methods accept the following
options:
maxiter : int
Maximum number of iterations to perform.
Default: see method-specific documentation.
disp : bool
Set to ``True`` to print convergence messages.
Default: ``False``.
presolve : bool
Set to ``False`` to disable automatic presolve.
Default: ``True``.
All methods except the HiGHS solvers also accept:
tol : float
A tolerance which determines when a residual is "close enough" to
zero to be considered exactly zero.
autoscale : bool
Set to ``True`` to automatically perform equilibration.
Consider using this option if the numerical values in the
constraints are separated by several orders of magnitude.
Default: ``False``.
rr : bool
Set to ``False`` to disable automatic redundancy removal.
Default: ``True``.
rr_method : string
Method used to identify and remove redundant rows from the
equality constraint matrix after presolve. For problems with
dense input, the available methods for redundancy removal are:
"SVD":
Repeatedly performs singular value decomposition on
the matrix, detecting redundant rows based on nonzeros
in the left singular vectors that correspond with
zero singular values. May be fast when the matrix is
nearly full rank.
"pivot":
Uses the algorithm presented in [5]_ to identify
redundant rows.
"ID":
Uses a randomized interpolative decomposition.
Identifies columns of the matrix transpose not used in
a full-rank interpolative decomposition of the matrix.
None:
Uses "SVD" if the matrix is nearly full rank, that is,
the difference between the matrix rank and the number
of rows is less than five. If not, uses "pivot". The
behavior of this default is subject to change without
prior notice.
Default: None.
For problems with sparse input, this option is ignored, and the
pivot-based algorithm presented in [5]_ is used.
For method-specific options, see
:func:`show_options('linprog') <show_options>`.
x0 : 1-D array, optional
Guess values of the decision variables, which will be refined by
the optimization algorithm. This argument is currently used only by the
'revised simplex' method, and can only be used if `x0` represents a
basic feasible solution.
integrality : 1-D array or int, optional
Indicates the type of integrality constraint on each decision variable.
``0`` : Continuous variable; no integrality constraint.
``1`` : Integer variable; decision variable must be an integer
within `bounds`.
``2`` : Semi-continuous variable; decision variable must be within
`bounds` or take value ``0``.
``3`` : Semi-integer variable; decision variable must be an integer
within `bounds` or take value ``0``.
By default, all variables are continuous.
For mixed integrality constraints, supply an array of shape `c.shape`.
To infer a constraint on each decision variable from shorter inputs,
the argument will be broadcasted to `c.shape` using `np.broadcast_to`.
This argument is currently used only by the ``'highs'`` method and
ignored otherwise.
Returns
-------
res : OptimizeResult
A :class:`scipy.optimize.OptimizeResult` consisting of the fields
below. Note that the return types of the fields may depend on whether
the optimization was successful; it is therefore recommended to check
`OptimizeResult.status` before relying on the other fields:
x : 1-D array
The values of the decision variables that minimize the
objective function while satisfying the constraints.
fun : float
The optimal value of the objective function ``c @ x``.
slack : 1-D array
The (nominally positive) values of the slack variables,
``b_ub - A_ub @ x``.
con : 1-D array
The (nominally zero) residuals of the equality constraints,
``b_eq - A_eq @ x``.
success : bool
``True`` when the algorithm succeeds in finding an optimal
solution.
status : int
An integer representing the exit status of the algorithm.
``0`` : Optimization terminated successfully.
``1`` : Iteration limit reached.
``2`` : Problem appears to be infeasible.
``3`` : Problem appears to be unbounded.
``4`` : Numerical difficulties encountered.
nit : int
The total number of iterations performed in all phases.
message : str
A string descriptor of the exit status of the algorithm.
See Also
--------
show_options : Additional options accepted by the solvers.
Notes
-----
This section describes the available solvers that can be selected by the
'method' parameter.
`'highs-ds'` and
`'highs-ipm'` are interfaces to the
HiGHS simplex and interior-point method solvers [13]_, respectively.
`'highs'` (default) chooses between
the two automatically. These are the fastest linear
programming solvers in SciPy, especially for large, sparse problems;
which of these two is faster is problem-dependent.
The other solvers (`'interior-point'`, `'revised simplex'`, and
`'simplex'`) are legacy methods and will be removed in SciPy 1.11.0.
Method *highs-ds* is a wrapper of the C++ high performance dual
revised simplex implementation (HSOL) [13]_, [14]_. Method *highs-ipm*
is a wrapper of a C++ implementation of an **i**\ nterior-\ **p**\ oint
**m**\ ethod [13]_; it features a crossover routine, so it is as accurate
as a simplex solver. Method *highs* chooses between the two automatically.
For new code involving `linprog`, we recommend explicitly choosing one of
these three method values.
.. versionadded:: 1.6.0
Method *interior-point* uses the primal-dual path following algorithm
as outlined in [4]_. This algorithm supports sparse constraint matrices and
is typically faster than the simplex methods, especially for large, sparse
problems. Note, however, that the solution returned may be slightly less
accurate than those of the simplex methods and will not, in general,
correspond with a vertex of the polytope defined by the constraints.
.. versionadded:: 1.0.0
Method *revised simplex* uses the revised simplex method as described in
[9]_, except that a factorization [11]_ of the basis matrix, rather than
its inverse, is efficiently maintained and used to solve the linear systems
at each iteration of the algorithm.
.. versionadded:: 1.3.0
Method *simplex* uses a traditional, full-tableau implementation of
Dantzig's simplex algorithm [1]_, [2]_ (*not* the
Nelder-Mead simplex). This algorithm is included for backwards
compatibility and educational purposes.
.. versionadded:: 0.15.0
Before applying *interior-point*, *revised simplex*, or *simplex*,
a presolve procedure based on [8]_ attempts
to identify trivial infeasibilities, trivial unboundedness, and potential
problem simplifications. Specifically, it checks for:
- rows of zeros in ``A_eq`` or ``A_ub``, representing trivial constraints;
- columns of zeros in ``A_eq`` *and* ``A_ub``, representing unconstrained
variables;
- column singletons in ``A_eq``, representing fixed variables; and
- column singletons in ``A_ub``, representing simple bounds.
If presolve reveals that the problem is unbounded (e.g., an unconstrained
and unbounded variable has negative cost) or infeasible (e.g., a row of
zeros in ``A_eq`` corresponds with a nonzero in ``b_eq``), the solver
terminates with the appropriate status code. Note that presolve terminates
as soon as any sign of unboundedness is detected; consequently, a problem
may be reported as unbounded when in reality the problem is infeasible
(but infeasibility has not been detected yet). Therefore, if it is
important to know whether the problem is actually infeasible, solve the
problem again with option ``presolve=False``.
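Two of the checks listed above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not SciPy's presolve routine: the helper names are hypothetical, the matrices are lists of lists, and "column singleton in ``A_eq``" is read here as an equality row with exactly one nonzero coefficient, which fixes that variable.

```python
def zero_row_check(A_eq, b_eq, tol=1e-9):
    """Classify all-zero rows of A_eq (hypothetical helper).

    Returns (redundant_rows, infeasible).
    """
    redundant, infeasible = [], False
    for i, (row, rhs) in enumerate(zip(A_eq, b_eq)):
        if all(abs(a) <= tol for a in row):
            if abs(rhs) > tol:
                infeasible = True    # 0 @ x == rhs != 0 can never hold
            else:
                redundant.append(i)  # 0 @ x == 0 always holds
    return redundant, infeasible


def fixed_variables(A_eq, b_eq, tol=1e-9):
    """Return {column: value} for rows with a single nonzero coefficient."""
    fixed = {}
    for row, rhs in zip(A_eq, b_eq):
        nonzero = [(j, a) for j, a in enumerate(row) if abs(a) > tol]
        if len(nonzero) == 1:
            j, a = nonzero[0]
            fixed[j] = rhs / a       # a * x_j == rhs  =>  x_j == rhs / a
    return fixed


A_eq = [[0, 0], [2, 0], [0, 0]]
b_eq = [0, 4, 1]
print(zero_row_check(A_eq, b_eq))   # ([0], True)
print(fixed_variables(A_eq, b_eq))  # {0: 2.0}
```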
If neither infeasibility nor unboundedness is detected in a single pass
of the presolve, bounds are tightened where possible and fixed
variables are removed from the problem. Then, linearly dependent rows
of the ``A_eq`` matrix are removed (unless they represent an
infeasibility) to avoid numerical difficulties in the primary solve
routine. Note that rows that are nearly linearly dependent (within a
prescribed tolerance) may also be removed, which can change the optimal
solution in rare cases. If this is a concern, eliminate redundancy from
your problem formulation and run with option ``rr=False`` or
``presolve=False``.
Several potential improvements can be made here: additional presolve
checks outlined in [8]_ should be implemented, the presolve routine should
be run multiple times (until no further simplifications can be made), and
more of the efficiency improvements from [5]_ should be implemented in the
redundancy removal routines.
After presolve, the problem is transformed to standard form by converting
the (tightened) simple bounds to upper bound constraints, introducing
non-negative slack variables for inequality constraints, and expressing
unbounded variables as the difference between two non-negative variables.
Optionally, the problem is automatically scaled via equilibration [12]_.
The selected algorithm solves the standard form problem, and a
postprocessing routine converts the result to a solution to the original
problem.
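As a rough illustration of the standard-form transformation, the sketch below handles only the simplest case: inequality constraints with all variables free. The helper `to_standard_form` is hypothetical, and bound handling, equality constraints, and scaling are deliberately left out, so this is not how SciPy itself performs the conversion.

```python
def to_standard_form(c, A_ub, b_ub):
    """Rewrite  min c @ x  s.t.  A_ub @ x <= b_ub  (x free)  as
    min c_std @ z  s.t.  A_std @ z == b_ub,  z >= 0,
    where z = [x_plus, x_minus, slack] and x = x_plus - x_minus.
    """
    m = len(A_ub)
    c_std = list(c) + [-cj for cj in c] + [0.0] * m
    A_std = []
    for i, row in enumerate(A_ub):
        slack = [1.0 if k == i else 0.0 for k in range(m)]
        A_std.append(list(row) + [-a for a in row] + slack)
    return c_std, A_std, list(b_ub)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


c_std, A_std, b_std = to_standard_form([-1, 4], [[-3, 1], [1, 2]], [6, 4])
# x = (10, -3) splits into x_plus = (10, 0), x_minus = (0, 3); slacks (39, 0).
z = [10, 0, 0, 3, 39, 0]
print([dot(row, z) for row in A_std], dot(c_std, z))  # [6.0, 4.0] -22.0
```

Every entry of ``z`` is non-negative, the equality rows reproduce ``b_ub``, and the objective value matches that of the original variables.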
References
----------
.. [1] Dantzig, George B., Linear programming and extensions. Rand
Corporation Research Study. Princeton Univ. Press, Princeton, NJ,
1963.
.. [2] Hillier, F.S. and Lieberman, G.J. (1995), "Introduction to
Mathematical Programming", McGraw-Hill, Chapter 4.
.. [3] Bland, Robert G. New finite pivoting rules for the simplex method.
Mathematics of Operations Research (2), 1977: pp. 103-107.
.. [4] Andersen, Erling D., and Knud D. Andersen. "The MOSEK interior point
optimizer for linear programming: an implementation of the
homogeneous algorithm." High performance optimization. Springer US,
2000. 197-232.
.. [5] Andersen, Erling D. "Finding all linearly dependent rows in
large-scale linear programming." Optimization Methods and Software
6.3 (1995): 219-227.
.. [6] Freund, Robert M. "Primal-Dual Interior-Point Methods for Linear
Programming based on Newton's Method." Unpublished Course Notes,
March 2004. Available 2/25/2017 at
https://ocw.mit.edu/courses/sloan-school-of-management/15-084j-nonlinear-programming-spring-2004/lecture-notes/lec14_int_pt_mthd.pdf
.. [7] Fourer, Robert. "Solving Linear Programs by Interior-Point Methods."
Unpublished Course Notes, August 26, 2005. Available 2/25/2017 at
http://www.4er.org/CourseNotes/Book%20B/B-III.pdf
.. [8] Andersen, Erling D., and Knud D. Andersen. "Presolving in linear
programming." Mathematical Programming 71.2 (1995): 221-245.
.. [9] Bertsimas, Dimitris, and J. Tsitsiklis. "Introduction to linear
programming." Athena Scientific 1 (1997): 997.
.. [10] Andersen, Erling D., et al. Implementation of interior point
methods for large scale linear programming. HEC/Universite de
Geneve, 1996.
.. [11] Bartels, Richard H. "A stabilization of the simplex method."
Numerische Mathematik 16.5 (1971): 414-434.
.. [12] Tomlin, J. A. "On scaling linear programming problems."
Mathematical Programming Study 4 (1975): 146-166.
.. [13] Huangfu, Q., Galabova, I., Feldmeier, M., and Hall, J. A. J.
"HiGHS - high performance software for linear optimization."
https://highs.dev/
.. [14] Huangfu, Q. and Hall, J. A. J. "Parallelizing the dual revised
simplex method." Mathematical Programming Computation, 10 (1),
119-142, 2018. DOI: 10.1007/s12532-017-0130-5
Examples
--------
Consider the following problem:
.. math::
\min_{x_0, x_1} \ -x_0 + 4x_1 & \\
\mbox{such that} \ -3x_0 + x_1 & \leq 6,\\
-x_0 - 2x_1 & \geq -4,\\
x_1 & \geq -3.
The problem is not presented in the form accepted by `linprog`. This is
easily remedied by converting the "greater than" inequality
constraint to a "less than" inequality constraint by
multiplying both sides by a factor of :math:`-1`. Note also that the last
constraint is really the simple bound :math:`-3 \leq x_1 \leq \infty`.
Finally, since there are no bounds on :math:`x_0`, we must explicitly
specify the bounds :math:`-\infty \leq x_0 \leq \infty`, as the
default is for variables to be non-negative. After collecting coefficients
into arrays and tuples, the input for this problem is:
>>> from scipy.optimize import linprog
>>> c = [-1, 4]
>>> A = [[-3, 1], [1, 2]]
>>> b = [6, 4]
>>> x0_bounds = (None, None)
>>> x1_bounds = (-3, None)
>>> res = linprog(c, A_ub=A, b_ub=b, bounds=[x0_bounds, x1_bounds])
>>> res.fun
-22.0
>>> res.x
array([10., -3.])
>>> res.message
'Optimization terminated successfully. (HiGHS Status 7: Optimal)'
The marginals (AKA dual values / shadow prices / Lagrange multipliers)
and residuals (slacks) are also available.
>>> res.ineqlin
residual: [ 3.900e+01 0.000e+00]
marginals: [-0.000e+00 -1.000e+00]
For example, because the marginal associated with the second inequality
constraint is -1, we expect the optimal value of the objective function
to decrease by ``eps`` if we add a small amount ``eps`` to the right hand
side of the second inequality constraint:
>>> eps = 0.05
>>> b[1] += eps
>>> linprog(c, A_ub=A, b_ub=b, bounds=[x0_bounds, x1_bounds]).fun
-22.05
Also, because the residual on the first inequality constraint is 39, we
can decrease the right hand side of the first constraint by 39 without
affecting the optimal solution.
>>> b = [6, 4] # reset to original values
>>> b[0] -= 39
>>> linprog(c, A_ub=A, b_ub=b, bounds=[x0_bounds, x1_bounds]).fun
-22.0
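The objective value and residuals reported above can be reproduced by hand from the problem data. The following is a plain-Python check of the arithmetic only; it takes the solution ``x = (10, -3)`` from the example as given and does not call any solver.

```python
c = [-1, 4]
A = [[-3, 1], [1, 2]]
b = [6, 4]
x = [10.0, -3.0]  # solution reported in the example above

# Objective value c @ x.
fun = sum(cj * xj for cj, xj in zip(c, x))
# Residuals b_ub - A_ub @ x, one per inequality constraint.
residual = [bi - sum(a * xj for a, xj in zip(row, x))
            for row, bi in zip(A, b)]
print(fun)       # -22.0
print(residual)  # [39.0, 0.0]
```

The zero residual shows that the second constraint is active at the optimum, while the first has 39 units of slack.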
"""
    meth = method.lower()
    methods = {"highs", "highs-ds", "highs-ipm",
               "simplex", "revised simplex", "interior-point"}
    if meth not in methods:
        raise ValueError(f"Unknown solver '{method}'")
    if x0 is not None and meth != "revised simplex":
        warning_message = "x0 is used only when method is 'revised simplex'. "
        warn(warning_message, OptimizeWarning, stacklevel=2)
    if np.any(integrality) and meth != "highs":
        integrality = None
        warning_message = ("Only `method='highs'` supports integer "
                           "constraints. Ignoring `integrality`.")
        warn(warning_message, OptimizeWarning, stacklevel=2)
    elif np.any(integrality):
        integrality = np.broadcast_to(integrality, np.shape(c))
    lp = _LPProblem(c, A_ub, b_ub, A_eq, b_eq, bounds, x0, integrality)
    lp, solver_options = _parse_linprog(lp, options, meth)
    tol = solver_options.get('tol', 1e-9)
    # Give unmodified problem to HiGHS
    if meth.startswith('highs'):
        if callback is not None:
            raise NotImplementedError("HiGHS solvers do not support the "
                                      "callback interface.")
        highs_solvers = {'highs-ipm': 'ipm', 'highs-ds': 'simplex',
                         'highs': None}
        sol = _linprog_highs(lp, solver=highs_solvers[meth],
                             **solver_options)
        sol['status'], sol['message'] = (
            _check_result(sol['x'], sol['fun'], sol['status'], sol['slack'],
                          sol['con'], lp.bounds, tol, sol['message'],
                          integrality))
        sol['success'] = sol['status'] == 0
        return OptimizeResult(sol)
    warn(f"`method='{meth}'` is deprecated and will be removed in SciPy "
         "1.11.0. Please use one of the HiGHS solvers (e.g. "
         "`method='highs'`) in new code.", DeprecationWarning, stacklevel=2)
    iteration = 0
    complete = False  # will become True if solved in presolve
    undo = []
    # Keep the original arrays to calculate slack/residuals for original
    # problem.
    lp_o = deepcopy(lp)
    # Solve trivial problem, eliminate variables, tighten bounds, etc.
    rr_method = solver_options.pop('rr_method', None)  # need to pop these;
    rr = solver_options.pop('rr', True)  # they're not passed to methods
    c0 = 0  # we might get a constant term in the objective
    if solver_options.pop('presolve', True):
        (lp, c0, x, undo, complete, status, message) = _presolve(lp, rr,
                                                                 rr_method,
                                                                 tol)
    C, b_scale = 1, 1  # for trivial unscaling if autoscale is not used
    postsolve_args = (lp_o._replace(bounds=lp.bounds), undo, C, b_scale)
    if not complete:
        A, b, c, c0, x0 = _get_Abc(lp, c0)
        if solver_options.pop('autoscale', False):
            A, b, c, x0, C, b_scale = _autoscale(A, b, c, x0)
            postsolve_args = postsolve_args[:-2] + (C, b_scale)
        if meth == 'simplex':
            x, status, message, iteration = _linprog_simplex(
                c, c0=c0, A=A, b=b, callback=callback,
                postsolve_args=postsolve_args, **solver_options)
        elif meth == 'interior-point':
            x, status, message, iteration = _linprog_ip(
                c, c0=c0, A=A, b=b, callback=callback,
                postsolve_args=postsolve_args, **solver_options)
        elif meth == 'revised simplex':
            x, status, message, iteration = _linprog_rs(
                c, c0=c0, A=A, b=b, x0=x0, callback=callback,
                postsolve_args=postsolve_args, **solver_options)
    # Eliminate artificial variables, re-introduce presolved variables, etc.
    disp = solver_options.get('disp', False)
    x, fun, slack, con = _postsolve(x, postsolve_args, complete)
    status, message = _check_result(x, fun, status, slack, con, lp_o.bounds,
                                    tol, message, integrality)
    if disp:
        _display_summary(message, status, fun, iteration)
    sol = {
        'x': x,
        'fun': fun,
        'slack': slack,
        'con': con,
        'status': status,
        'message': message,
        'nit': iteration,
        'success': status == 0}
    return OptimizeResult(sol)
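The ``bounds`` convention described in the docstring (a single ``(min, max)`` pair shared by all variables, and ``None`` meaning "no bound") can be sketched as follows. `expand_bounds` is a hypothetical helper written for this illustration; SciPy's own input parsing handles more cases and is not reproduced here.

```python
import math


def expand_bounds(bounds, n):
    """Return n (lower, upper) float pairs from a linprog-style `bounds`."""
    if len(bounds) == 2 and not isinstance(bounds[0], (tuple, list)):
        bounds = [bounds] * n  # one pair shared by every variable
    pairs = []
    for lo, hi in bounds:
        # None means unbounded on that side.
        pairs.append((-math.inf if lo is None else float(lo),
                      math.inf if hi is None else float(hi)))
    return pairs


print(expand_bounds((0, None), 2))                  # default: x >= 0
print(expand_bounds([(None, None), (-3, None)], 2)) # per-variable bounds
```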
File diff suppressed because it is too large
@@ -0,0 +1,440 @@
"""HiGHS Linear Optimization Methods
Interface to HiGHS linear optimization software.
https://highs.dev/
.. versionadded:: 1.5.0
References
----------
.. [1] Q. Huangfu and J.A.J. Hall. "Parallelizing the dual revised simplex
method." Mathematical Programming Computation, 10 (1), 119-142,
2018. DOI: 10.1007/s12532-017-0130-5
"""
import inspect
import numpy as np
from ._optimize import OptimizeWarning, OptimizeResult
from warnings import warn
from ._highs._highs_wrapper import _highs_wrapper
from ._highs._highs_constants import (
CONST_INF,
MESSAGE_LEVEL_NONE,
HIGHS_OBJECTIVE_SENSE_MINIMIZE,
MODEL_STATUS_NOTSET,
MODEL_STATUS_LOAD_ERROR,
MODEL_STATUS_MODEL_ERROR,
MODEL_STATUS_PRESOLVE_ERROR,
MODEL_STATUS_SOLVE_ERROR,
MODEL_STATUS_POSTSOLVE_ERROR,
MODEL_STATUS_MODEL_EMPTY,
MODEL_STATUS_OPTIMAL,
MODEL_STATUS_INFEASIBLE,
MODEL_STATUS_UNBOUNDED_OR_INFEASIBLE,
MODEL_STATUS_UNBOUNDED,
MODEL_STATUS_REACHED_DUAL_OBJECTIVE_VALUE_UPPER_BOUND
as MODEL_STATUS_RDOVUB,
MODEL_STATUS_REACHED_OBJECTIVE_TARGET,
MODEL_STATUS_REACHED_TIME_LIMIT,
MODEL_STATUS_REACHED_ITERATION_LIMIT,
HIGHS_SIMPLEX_STRATEGY_DUAL,
HIGHS_SIMPLEX_CRASH_STRATEGY_OFF,
HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_CHOOSE,
HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_DANTZIG,
HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_DEVEX,
HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE,
)
from scipy.sparse import csc_matrix, vstack, issparse
def _highs_to_scipy_status_message(highs_status, highs_message):
    """Converts HiGHS status number/message to SciPy status number/message"""
    scipy_statuses_messages = {
        None: (4, "HiGHS did not provide a status code. "),
        MODEL_STATUS_NOTSET: (4, ""),
        MODEL_STATUS_LOAD_ERROR: (4, ""),
        MODEL_STATUS_MODEL_ERROR: (2, ""),
        MODEL_STATUS_PRESOLVE_ERROR: (4, ""),
        MODEL_STATUS_SOLVE_ERROR: (4, ""),
        MODEL_STATUS_POSTSOLVE_ERROR: (4, ""),
        MODEL_STATUS_MODEL_EMPTY: (4, ""),
        MODEL_STATUS_RDOVUB: (4, ""),
        MODEL_STATUS_REACHED_OBJECTIVE_TARGET: (4, ""),
        MODEL_STATUS_OPTIMAL: (0, "Optimization terminated successfully. "),
        MODEL_STATUS_REACHED_TIME_LIMIT: (1, "Time limit reached. "),
        MODEL_STATUS_REACHED_ITERATION_LIMIT: (1, "Iteration limit reached. "),
        MODEL_STATUS_INFEASIBLE: (2, "The problem is infeasible. "),
        MODEL_STATUS_UNBOUNDED: (3, "The problem is unbounded. "),
        MODEL_STATUS_UNBOUNDED_OR_INFEASIBLE: (4, "The problem is unbounded "
                                               "or infeasible. ")}
    unrecognized = (4, "The HiGHS status code was not recognized. ")
    scipy_status, scipy_message = (
        scipy_statuses_messages.get(highs_status, unrecognized))
    scipy_message = (f"{scipy_message}"
                     f"(HiGHS Status {highs_status}: {highs_message})")
    return scipy_status, scipy_message
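The mapping above can be tried in isolation with a reduced table. In this sketch the integer codes are placeholders chosen for illustration (only ``7`` for "Optimal" is taken from the message shown in the `linprog` docstring example); the real values come from ``_highs_constants``, and `to_scipy_status` is a hypothetical stand-in for the function above.

```python
# Placeholder status codes for illustration; not imported from HiGHS.
OPTIMAL, INFEASIBLE, UNBOUNDED = 7, 8, 10

SCIPY_STATUS = {
    OPTIMAL: (0, "Optimization terminated successfully. "),
    INFEASIBLE: (2, "The problem is infeasible. "),
    UNBOUNDED: (3, "The problem is unbounded. "),
}
UNRECOGNIZED = (4, "The HiGHS status code was not recognized. ")


def to_scipy_status(highs_status, highs_message):
    # Unknown codes fall back to status 4 instead of raising KeyError.
    status, message = SCIPY_STATUS.get(highs_status, UNRECOGNIZED)
    return status, f"{message}(HiGHS Status {highs_status}: {highs_message})"


print(to_scipy_status(7, "Optimal"))
print(to_scipy_status(-1, "made-up code"))
```

Using ``dict.get`` with a default keeps the translation total: every input yields a valid SciPy status, and the original HiGHS code is preserved in the message for debugging.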
def _replace_inf(x):
    # Replace `np.inf` with CONST_INF
    infs = np.isinf(x)
    with np.errstate(invalid="ignore"):
        x[infs] = np.sign(x[infs])*CONST_INF
    return x
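The same idea, replacing infinities with a large finite constant while keeping the sign, can be shown without NumPy. This is a list-based analogue for illustration only: ``BIG`` is a placeholder for ``CONST_INF`` (whose actual value is defined by HiGHS), and unlike the function above this sketch returns a new list instead of modifying its argument in place.

```python
import math

BIG = 1e30  # placeholder for CONST_INF; assumed value, not taken from HiGHS


def replace_inf(values, big=BIG):
    """Return a copy of `values` with +/-inf replaced by +/-big."""
    return [math.copysign(big, v) if math.isinf(v) else v for v in values]


print(replace_inf([1.5, math.inf, -math.inf, 0.0]))
# [1.5, 1e+30, -1e+30, 0.0]
```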
def _convert_to_highs_enum(option, option_str, choices):
    # If option is in the choices we can look it up; if not, use
    # the default value taken from the function signature and warn.
    try:
        key = option.lower()
    except AttributeError:
        # Non-string options such as None are looked up unchanged.
        key = option
    try:
        return choices[key]
    except KeyError:
        sig = inspect.signature(_linprog_highs)
        default_str = sig.parameters[option_str].default
        warn(f"Option {option_str} is {option}, but only values in "
             f"{set(choices.keys())} are allowed. Using default: "
             f"{default_str}.",
             OptimizeWarning, stacklevel=3)
        return choices[default_str]
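The lookup-with-fallback pattern can be exercised on its own. The sketch below is a simplified stand-in, not the function above: `lookup`, ``CHOICES``, and the integer codes are hypothetical, the default is passed explicitly instead of being read from a function signature, and any unknown key produces a warning.

```python
import warnings

CHOICES = {'dantzig': 0, 'devex': 1, None: None}  # hypothetical enum codes


def lookup(option, choices=CHOICES, default=None):
    try:
        key = option.lower()   # strings are matched case-insensitively
    except AttributeError:
        key = option           # non-strings such as None are used as-is
    try:
        return choices[key]
    except KeyError:
        warnings.warn(f"Option value {option!r} is not allowed; "
                      f"using default {default!r}.", stacklevel=2)
        return choices[default]


print(lookup("DEVEX"))  # 1
print(lookup(None))     # None
```

Separating the key normalization from the dictionary lookup means an invalid non-string value takes the same warn-and-default path as an invalid string.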
def _linprog_highs(lp, solver, time_limit=None, presolve=True,
disp=False, maxiter=None,
dual_feasibility_tolerance=None,
primal_feasibility_tolerance=None,
ipm_optimality_tolerance=None,
simplex_dual_edge_weight_strategy=None,
mip_rel_gap=None,
mip_max_nodes=None,
**unknown_options):
r"""
Solve a linear programming problem using one of the HiGHS solvers.
User-facing documentation is in _linprog_doc.py.
Parameters
----------
lp : _LPProblem
A ``scipy.optimize._linprog_util._LPProblem`` ``namedtuple``.
solver : "ipm" or "simplex" or None
Which HiGHS solver to use. If ``None``, "simplex" will be used.
Options
-------
maxiter : int
The maximum number of iterations to perform in either phase. For
``solver='ipm'``, this does not include the number of crossover
iterations. Default is the largest possible value for an ``int``
on the platform.
disp : bool
Set to ``True`` if indicators of optimization status are to be printed
to the console each iteration; default ``False``.
time_limit : float
The maximum time in seconds allotted to solve the problem; default is
the largest possible value for a ``double`` on the platform.
presolve : bool
Presolve attempts to identify trivial infeasibilities,
identify trivial unboundedness, and simplify the problem before
sending it to the main solver. It is generally recommended
to keep the default setting ``True``; set to ``False`` if presolve is
to be disabled.
dual_feasibility_tolerance : double
Dual feasibility tolerance. Default is 1e-07.
The minimum of this and ``primal_feasibility_tolerance``
is used for the feasibility tolerance when ``solver='ipm'``.
primal_feasibility_tolerance : double
Primal feasibility tolerance. Default is 1e-07.
The minimum of this and ``dual_feasibility_tolerance``
is used for the feasibility tolerance when ``solver='ipm'``.
ipm_optimality_tolerance : double
Optimality tolerance for ``solver='ipm'``. Default is 1e-08.
The value must be at least 1e-12 and smaller than the largest
possible value for a ``double`` on the platform.
simplex_dual_edge_weight_strategy : str (default: None)
Strategy for simplex dual edge weights. The default, ``None``,
automatically selects one of the following.
``'dantzig'`` uses Dantzig's original strategy of choosing the most
negative reduced cost.
``'devex'`` uses the strategy described in [15]_.
``'steepest'`` uses the exact steepest edge strategy as described in
[16]_.
``'steepest-devex'`` begins with the exact steepest edge strategy
until the computation is too costly or inexact and then switches to
the devex method.
Currently, using ``None`` always selects ``'steepest-devex'``, but this
may change as new options become available.
mip_max_nodes : int
The maximum number of nodes allotted to solve the problem; default is
the largest possible value for a ``HighsInt`` on the platform.
Ignored if not using the MIP solver.
mip_rel_gap : double
Termination criterion for the MIP solver: the solver terminates when the
gap between the objective function value and the dual bound, scaled by
the objective function value, is at most ``mip_rel_gap``.
Ignored if not using the MIP solver.
unknown_options : dict
Optional arguments not used by this particular solver. If
``unknown_options`` is non-empty, a warning is issued listing all
unused options.
Returns
-------
sol : dict
A dictionary consisting of the fields:
x : 1D array
The values of the decision variables that minimize the
objective function while satisfying the constraints.
fun : float
The optimal value of the objective function ``c @ x``.
slack : 1D array
The (nominally positive) values of the slack,
``b_ub - A_ub @ x``.
con : 1D array
The (nominally zero) residuals of the equality constraints,
``b_eq - A_eq @ x``.
success : bool
``True`` when the algorithm succeeds in finding an optimal
solution.
status : int
An integer representing the exit status of the algorithm.
``0`` : Optimization terminated successfully.
``1`` : Iteration or time limit reached.
``2`` : Problem appears to be infeasible.
``3`` : Problem appears to be unbounded.
``4`` : The HiGHS solver ran into a problem.
message : str
A string descriptor of the exit status of the algorithm.
nit : int
The total number of iterations performed.
For ``solver='simplex'``, this includes iterations in all
phases. For ``solver='ipm'``, this does not include
crossover iterations.
crossover_nit : int
The number of primal/dual pushes performed during the
crossover routine for ``solver='ipm'``. This is ``0``
for ``solver='simplex'``.
ineqlin : OptimizeResult
Solution and sensitivity information corresponding to the
inequality constraints, `b_ub`. A dictionary consisting of the
fields:
residual : np.ndarray
The (nominally positive) values of the slack variables,
``b_ub - A_ub @ x``. This quantity is also commonly
referred to as "slack".
marginals : np.ndarray
The sensitivity (partial derivative) of the objective
function with respect to the right-hand side of the
inequality constraints, `b_ub`.
eqlin : OptimizeResult
Solution and sensitivity information corresponding to the
equality constraints, `b_eq`. A dictionary consisting of the
fields:
residual : np.ndarray
The (nominally zero) residuals of the equality constraints,
``b_eq - A_eq @ x``.
marginals : np.ndarray
The sensitivity (partial derivative) of the objective
function with respect to the right-hand side of the
equality constraints, `b_eq`.
lower, upper : OptimizeResult
Solution and sensitivity information corresponding to the
lower and upper bounds on decision variables, `bounds`.
residual : np.ndarray
The (nominally positive) values of the quantity
``x - lb`` (lower) or ``ub - x`` (upper).
marginals : np.ndarray
The sensitivity (partial derivative) of the objective
function with respect to the lower and upper
`bounds`.
mip_node_count : int
The number of subproblems or "nodes" solved by the MILP
solver. Only present when `integrality` is not `None`.
mip_dual_bound : float
The MILP solver's final estimate of the lower bound on the
optimal solution. Only present when `integrality` is not
`None`.
mip_gap : float
The difference between the final objective function value
and the final dual bound, scaled by the final objective
function value. Only present when `integrality` is not
`None`.
Notes
-----
The result fields `ineqlin`, `eqlin`, `lower`, and `upper` all contain
`marginals`, or partial derivatives of the objective function with respect
to the right-hand side of each constraint. These partial derivatives are
also referred to as "Lagrange multipliers", "dual values", and
"shadow prices". The sign convention of `marginals` is opposite that
of Lagrange multipliers produced by many nonlinear solvers.
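The partial-derivative reading of `marginals` can be checked with a finite difference. The numbers below are the objective values quoted in the `linprog` docstring example (``-22.0`` before and ``-22.05`` after adding ``eps = 0.05`` to the right-hand side of the second inequality constraint); no solver is called in this sketch.

```python
eps = 0.05
fun_before = -22.0   # objective with b_ub = [6, 4]
fun_after = -22.05   # objective with b_ub = [6, 4 + eps]

# Finite-difference estimate of d(fun)/d(b_ub[1]); it should be close to
# the reported marginal of -1 for the second inequality constraint.
marginal_estimate = (fun_after - fun_before) / eps
print(round(marginal_estimate, 6))
```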
References
----------
.. [15] Harris, Paula MJ. "Pivot selection methods of the Devex LP code."
Mathematical programming 5.1 (1973): 1-28.
.. [16] Goldfarb, Donald, and John Ker Reid. "A practicable steepest-edge
simplex algorithm." Mathematical Programming 12.1 (1977): 361-371.
"""
if unknown_options:
message = (f"Unrecognized options detected: {unknown_options}. "
"These will be passed to HiGHS verbatim.")
warn(message, OptimizeWarning, stacklevel=3)
# Map options to HiGHS enum values
simplex_dual_edge_weight_strategy_enum = _convert_to_highs_enum(
simplex_dual_edge_weight_strategy,
'simplex_dual_edge_weight_strategy',
choices={'dantzig': HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_DANTZIG,
'devex': HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_DEVEX,
'steepest-devex': HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_CHOOSE,
'steepest':
HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE,
None: None})
c, A_ub, b_ub, A_eq, b_eq, bounds, x0, integrality = lp
lb, ub = bounds.T.copy() # separate bounds; the copy makes them C-contiguous
# highs_wrapper solves LHS <= A*x <= RHS, not equality constraints
with np.errstate(invalid="ignore"):
lhs_ub = -np.ones_like(b_ub)*np.inf # LHS of UB constraints is -inf
rhs_ub = b_ub # RHS of UB constraints is b_ub
lhs_eq = b_eq # Equality constraint is inequality
rhs_eq = b_eq # constraint with LHS=RHS
lhs = np.concatenate((lhs_ub, lhs_eq))
rhs = np.concatenate((rhs_ub, rhs_eq))
if issparse(A_ub) or issparse(A_eq):
A = vstack((A_ub, A_eq))
else:
A = np.vstack((A_ub, A_eq))
A = csc_matrix(A)
options = {
'presolve': presolve,
'sense': HIGHS_OBJECTIVE_SENSE_MINIMIZE,
'solver': solver,
'time_limit': time_limit,
'highs_debug_level': MESSAGE_LEVEL_NONE,
'dual_feasibility_tolerance': dual_feasibility_tolerance,
'ipm_optimality_tolerance': ipm_optimality_tolerance,
'log_to_console': disp,
'mip_max_nodes': mip_max_nodes,
'output_flag': disp,
'primal_feasibility_tolerance': primal_feasibility_tolerance,
'simplex_dual_edge_weight_strategy':
simplex_dual_edge_weight_strategy_enum,
'simplex_strategy': HIGHS_SIMPLEX_STRATEGY_DUAL,
'simplex_crash_strategy': HIGHS_SIMPLEX_CRASH_STRATEGY_OFF,
'ipm_iteration_limit': maxiter,
'simplex_iteration_limit': maxiter,
'mip_rel_gap': mip_rel_gap,
}
options.update(unknown_options)
# np.inf doesn't work; use very large constant
rhs = _replace_inf(rhs)
lhs = _replace_inf(lhs)
lb = _replace_inf(lb)
ub = _replace_inf(ub)
if integrality is None or np.sum(integrality) == 0:
integrality = np.empty(0)
else:
integrality = np.array(integrality)
res = _highs_wrapper(c, A.indptr, A.indices, A.data, lhs, rhs,
lb, ub, integrality.astype(np.uint8), options)
# HiGHS represents constraints as lhs/rhs, so
# Ax + s = b => Ax = b - s
# and we need to split up s by A_ub and A_eq
if 'slack' in res:
slack = res['slack']
con = np.array(slack[len(b_ub):])
slack = np.array(slack[:len(b_ub)])
else:
slack, con = None, None
# Lagrange multipliers for equalities/inequalities and upper/lower bounds
if 'lambda' in res:
lamda = res['lambda']
marg_ineqlin = np.array(lamda[:len(b_ub)])
marg_eqlin = np.array(lamda[len(b_ub):])
marg_upper = np.array(res['marg_bnds'][1, :])
marg_lower = np.array(res['marg_bnds'][0, :])
else:
marg_ineqlin, marg_eqlin = None, None
marg_upper, marg_lower = None, None
# this needs to be updated if we start choosing the solver intelligently
# Convert to scipy-style status and message
highs_status = res.get('status', None)
highs_message = res.get('message', None)
status, message = _highs_to_scipy_status_message(highs_status,
highs_message)
x = np.array(res['x']) if 'x' in res else None
sol = {'x': x,
'slack': slack,
'con': con,
'ineqlin': OptimizeResult({
'residual': slack,
'marginals': marg_ineqlin,
}),
'eqlin': OptimizeResult({
'residual': con,
'marginals': marg_eqlin,
}),
'lower': OptimizeResult({
'residual': None if x is None else x - lb,
'marginals': marg_lower,
}),
'upper': OptimizeResult({
'residual': None if x is None else ub - x,
'marginals': marg_upper
}),
'fun': res.get('fun'),
'status': status,
'success': res['status'] == MODEL_STATUS_OPTIMAL,
'message': message,
'nit': res.get('simplex_nit', 0) or res.get('ipm_nit', 0),
'crossover_nit': res.get('crossover_nit'),
}
if np.any(x) and integrality is not None:
sol.update({
'mip_node_count': res.get('mip_node_count', 0),
'mip_dual_bound': res.get('mip_dual_bound', 0.0),
'mip_gap': res.get('mip_gap', 0.0),
})
return sol
@@ -0,0 +1,572 @@
"""Revised simplex method for linear programming
The *revised simplex* method uses the method described in [1]_, except
that a factorization [2]_ of the basis matrix, rather than its inverse,
is efficiently maintained and used to solve the linear systems at each
iteration of the algorithm.
.. versionadded:: 1.3.0
References
----------
.. [1] Bertsimas, Dimitris, and John N. Tsitsiklis. "Introduction to linear
optimization." Athena Scientific, 1997.
.. [2] Bartels, Richard H. "A stabilization of the simplex method."
Numerische Mathematik 16.5 (1971): 414-434.
"""
# Author: Matt Haberland
import numpy as np
from numpy.linalg import LinAlgError
from scipy.linalg import solve
from ._optimize import _check_unknown_options
from ._bglu_dense import LU
from ._bglu_dense import BGLU as BGLU
from ._linprog_util import _postsolve
from ._optimize import OptimizeResult
def _phase_one(A, b, x0, callback, postsolve_args, maxiter, tol, disp,
maxupdate, mast, pivot):
"""
The purpose of phase one is to find an initial basic feasible solution
(BFS) to the original problem.
Generates an auxiliary problem with a trivial BFS and an objective that
minimizes infeasibility of the original problem. Solves the auxiliary
problem using the main simplex routine (phase two). This either yields
a BFS to the original problem or determines that the original problem is
infeasible. If feasible, phase one detects redundant rows in the original
constraint matrix and removes them, then chooses additional indices as
necessary to complete a basis/BFS for the original problem.
"""
m, n = A.shape
status = 0
# generate auxiliary problem to get initial BFS
A, b, c, basis, x, status = _generate_auxiliary_problem(A, b, x0, tol)
if status == 6:
residual = c.dot(x)
iter_k = 0
return x, basis, A, b, residual, status, iter_k
# solve auxiliary problem
phase_one_n = n
iter_k = 0
x, basis, status, iter_k = _phase_two(c, A, x, basis, callback,
postsolve_args,
maxiter, tol, disp,
maxupdate, mast, pivot,
iter_k, phase_one_n)
# check for infeasibility
residual = c.dot(x)
if status == 0 and residual > tol:
status = 2
# drive artificial variables out of basis
# TODO: test redundant row removal better
# TODO: make solve more efficient with BGLU? This could take a while.
keep_rows = np.ones(m, dtype=bool)
for basis_column in basis[basis >= n]:
B = A[:, basis]
try:
basis_finder = np.abs(solve(B, A)) # inefficient
pertinent_row = np.argmax(basis_finder[:, basis_column])
eligible_columns = np.ones(n, dtype=bool)
eligible_columns[basis[basis < n]] = 0
eligible_column_indices = np.where(eligible_columns)[0]
index = np.argmax(basis_finder[:, :n]
[pertinent_row, eligible_columns])
new_basis_column = eligible_column_indices[index]
if basis_finder[pertinent_row, new_basis_column] < tol:
keep_rows[pertinent_row] = False
else:
basis[basis == basis_column] = new_basis_column
except LinAlgError:
status = 4
# form solution to original problem
A = A[keep_rows, :n]
basis = basis[keep_rows]
x = x[:n]
m = A.shape[0]
return x, basis, A, b, residual, status, iter_k
def _get_more_basis_columns(A, basis):
"""
Called when the auxiliary problem terminates with artificial columns in
the basis, which must be removed and replaced with non-artificial
columns. Finds additional columns that do not make the matrix singular.
"""
m, n = A.shape
# options for inclusion are those that aren't already in the basis
a = np.arange(m+n)
bl = np.zeros(len(a), dtype=bool)
bl[basis] = 1
options = a[~bl]
options = options[options < n] # and they have to be non-artificial
# form basis matrix
B = np.zeros((m, m))
B[:, 0:len(basis)] = A[:, basis]
if (basis.size > 0 and
np.linalg.matrix_rank(B[:, :len(basis)]) < len(basis)):
raise Exception("Basis has dependent columns")
rank = 0 # just enter the loop
for i in range(n): # somewhat arbitrary, but we need another way out
# permute the options, and take as many as needed
new_basis = np.random.permutation(options)[:m-len(basis)]
B[:, len(basis):] = A[:, new_basis] # update the basis matrix
rank = np.linalg.matrix_rank(B) # check the rank
if rank == m:
break
return np.concatenate((basis, new_basis))
def _generate_auxiliary_problem(A, b, x0, tol):
"""
Modifies original problem to create an auxiliary problem with a trivial
initial basic feasible solution and an objective that minimizes
infeasibility in the original problem.
Conceptually, this is done by stacking an identity matrix on the right of
the original constraint matrix, adding artificial variables to correspond
with each of these new columns, and generating a cost vector that is all
zeros except for ones corresponding with each of the new variables.
An initial basic feasible solution is trivial: all variables are zero
except for the artificial variables, which are set equal to the
corresponding element of the right hand side `b`.
Running the simplex method on this auxiliary problem drives all of the
artificial variables - and thus the cost - to zero if the original problem
is feasible. The original problem is declared infeasible otherwise.
Much of the complexity below is to improve efficiency by using singleton
columns in the original problem where possible, thus generating artificial
variables only as necessary, and using an initial 'guess' basic feasible
solution.
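The conceptual construction described above (without the singleton-column
and initial-guess refinements) can be sketched with plain Python lists. The
function name ``auxiliary_problem`` is hypothetical, and one artificial
variable is added per row, which the routine below avoids where it can.

```python
def auxiliary_problem(A, b):
    # Flip rows so that every right-hand side is nonnegative.
    A = [[-v for v in row] if bi < 0 else list(row) for row, bi in zip(A, b)]
    b = [abs(bi) for bi in b]
    m, n = len(A), len(A[0])
    # Append an m-by-m identity: one artificial column per constraint.
    A_aux = [row + [1.0 if i == j else 0.0 for j in range(m)]
             for i, row in enumerate(A)]
    # Cost is zero on original variables and one on artificial variables.
    c_aux = [0.0] * n + [1.0] * m
    # Trivial BFS: original variables are zero, artificials equal b.
    x_aux = [0.0] * n + list(b)
    basis = list(range(n, n + m))
    return A_aux, b, c_aux, x_aux, basis


A = [[1.0, 2.0, 1.0], [3.0, -1.0, 0.0]]
b = [4.0, -6.0]
A_aux, b_aux, c_aux, x_aux, basis = auxiliary_problem(A, b)

# The trivial BFS satisfies the auxiliary equality constraints exactly.
residual = [bi - sum(a * x for a, x in zip(row, x_aux))
            for row, bi in zip(A_aux, b_aux)]
infeasibility = sum(ci * xi for ci, xi in zip(c_aux, x_aux))
```

The auxiliary cost starts at the sum of the (sign-corrected) right-hand
sides; phase one tries to drive it to zero.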
"""
status = 0
m, n = A.shape
if x0 is not None:
x = x0
else:
x = np.zeros(n)
r = b - A@x # residual; this must be all zeros for feasibility
A[r < 0] = -A[r < 0] # express problem with RHS positive for trivial BFS
b[r < 0] = -b[r < 0] # to the auxiliary problem
r[r < 0] *= -1
# Rows which we will need to find a trivial way to zero.
# This should just be the rows where there is a nonzero residual.
# But then we would not necessarily have a column singleton in every row.
# This makes it difficult to find an initial basis.
if x0 is None:
nonzero_constraints = np.arange(m)
else:
nonzero_constraints = np.where(r > tol)[0]
# these are (at least some of) the initial basis columns
basis = np.where(np.abs(x) > tol)[0]
if len(nonzero_constraints) == 0 and len(basis) <= m: # already a BFS
c = np.zeros(n)
basis = _get_more_basis_columns(A, basis)
return A, b, c, basis, x, status
elif (len(nonzero_constraints) > m - len(basis) or
np.any(x < 0)): # can't get trivial BFS
c = np.zeros(n)
status = 6
return A, b, c, basis, x, status
# chooses existing columns appropriate for inclusion in initial basis
cols, rows = _select_singleton_columns(A, r)
# find the rows we need to zero that we _can_ zero with column singletons
i_tofix = np.isin(rows, nonzero_constraints)
# these columns can't already be in the basis, though
# we are going to add them to the basis and change the corresponding x val
i_notinbasis = np.logical_not(np.isin(cols, basis))
i_fix_without_aux = np.logical_and(i_tofix, i_notinbasis)
rows = rows[i_fix_without_aux]
cols = cols[i_fix_without_aux]
# indices of the rows we can only zero with auxiliary variable
# these rows will get a one in each auxiliary column
arows = nonzero_constraints[np.logical_not(
np.isin(nonzero_constraints, rows))]
n_aux = len(arows)
acols = n + np.arange(n_aux) # indices of auxiliary columns
basis_ng = np.concatenate((cols, acols)) # basis columns not from guess
basis_ng_rows = np.concatenate((rows, arows)) # rows we need to zero
# add auxiliary singleton columns
A = np.hstack((A, np.zeros((m, n_aux))))
A[arows, acols] = 1
# generate initial BFS
x = np.concatenate((x, np.zeros(n_aux)))
x[basis_ng] = r[basis_ng_rows]/A[basis_ng_rows, basis_ng]
# generate costs to minimize infeasibility
c = np.zeros(n_aux + n)
c[acols] = 1
# basis columns correspond with nonzeros in guess, those with column
# singletons we used to zero remaining constraints, and any additional
# columns to get a full set (m columns)
basis = np.concatenate((basis, basis_ng))
basis = _get_more_basis_columns(A, basis) # add columns as needed
return A, b, c, basis, x, status
def _select_singleton_columns(A, b):
"""
Finds singleton columns for which the singleton entry is of the same sign
as the right-hand side; these columns are eligible for inclusion in an
initial basis. Determines the rows in which the singleton entries are
located. For each of these rows, returns the indices of the one singleton
column and its corresponding row.
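A list-based sketch of the same selection, for illustration only: the name
``singleton_columns`` is made up, and the NumPy implementation below reaches
the same result through array reversal and ``np.unique``.

```python
def singleton_columns(A, b):
    m, n = len(A), len(A[0])
    chosen = {}  # row index -> column index of its singleton
    for j in range(n):
        nonzero_rows = [i for i in range(m) if A[i][j] != 0]
        if len(nonzero_rows) != 1:
            continue  # not a singleton column
        i = nonzero_rows[0]
        # Keep only entries with the same sign as the right-hand side.
        if A[i][j] * b[i] >= 0:
            chosen[i] = j  # a later (rightmost) column overwrites earlier ones
    rows = sorted(chosen)
    cols = [chosen[i] for i in rows]
    return cols, rows


A = [[1.0, 2.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, -1.0]]
b = [5.0, 3.0]
cols, rows = singleton_columns(A, b)
```

Columns 0 and 2 are both singletons in row 0, so the rightmost one (column 2)
is kept; column 3 is rejected because its entry has the opposite sign to
``b[1]``.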
"""
# find indices of all singleton columns and corresponding row indices
column_indices = np.nonzero(np.sum(np.abs(A) != 0, axis=0) == 1)[0]
columns = A[:, column_indices] # array of singleton columns
row_indices = np.zeros(len(column_indices), dtype=int)
nonzero_rows, nonzero_columns = np.nonzero(columns)
row_indices[nonzero_columns] = nonzero_rows # corresponding row indices
# keep only singletons with entries that have same sign as RHS
# this is necessary because all elements of BFS must be non-negative
same_sign = A[row_indices, column_indices]*b[row_indices] >= 0
column_indices = column_indices[same_sign][::-1]
row_indices = row_indices[same_sign][::-1]
# Reversing the order so that steps below select rightmost columns
# for initial basis, which will tend to be slack variables. (If the
# guess corresponds with a basic feasible solution but a constraint
# is not satisfied with the corresponding slack variable zero, the slack
# variable must be basic.)
# for each row, keep rightmost singleton column with an entry in that row
unique_row_indices, first_columns = np.unique(row_indices,
return_index=True)
return column_indices[first_columns], unique_row_indices
def _find_nonzero_rows(A, tol):
"""
Returns logical array indicating the locations of rows with at least
one nonzero element.
"""
return np.any(np.abs(A) > tol, axis=1)
def _select_enter_pivot(c_hat, bl, a, rule="bland", tol=1e-12):
"""
Selects a pivot to enter the basis. Currently Bland's rule - the smallest
index that has a negative reduced cost - is the default.
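The two rules can be compared on a small hand-made example. This is a sketch
with plain lists; ``select_entering`` and its arguments are illustrative
names, with ``reduced_costs[k]`` holding the reduced cost of column
``nonbasic[k]``.

```python
def select_entering(reduced_costs, nonbasic, rule="bland", tol=1e-12):
    if rule.lower() == "mrc":
        # Minimum reduced cost: most negative entry wins.
        k = min(range(len(nonbasic)), key=lambda idx: reduced_costs[idx])
        return nonbasic[k]
    # Bland's rule: smallest column index with a negative reduced cost.
    for col, cost in zip(nonbasic, reduced_costs):
        if cost < -tol:
            return col
    raise ValueError("no negative reduced cost; current basis is optimal")


nonbasic = [1, 3, 4]
reduced_costs = [-0.5, 2.0, -3.0]
bland_choice = select_entering(reduced_costs, nonbasic, rule="bland")
mrc_choice = select_entering(reduced_costs, nonbasic, rule="mrc")
```

Bland's rule picks column 1 even though column 4 has the more negative
reduced cost; that trade of speed for protection against cycling is the
point of the rule.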
"""
if rule.lower() == "mrc": # index with minimum reduced cost
return a[~bl][np.argmin(c_hat)]
else: # smallest index w/ negative reduced cost
return a[~bl][c_hat < -tol][0]
def _display_iter(phase, iteration, slack, con, fun):
"""
Print indicators of optimization status to the console.
"""
header = iteration % 20 == 0
if header:
print("Phase",
"Iteration",
"Minimum Slack ",
"Constraint Residual",
"Objective ")
# :<X.Y left aligns Y digits in X digit spaces
fmt = '{0:<6}{1:<10}{2:<20.13}{3:<20.13}{4:<20.13}'
try:
slack = np.min(slack)
except ValueError:
slack = "NA"
print(fmt.format(phase, iteration, slack, np.linalg.norm(con), fun))
def _display_and_callback(phase_one_n, x, postsolve_args, status,
iteration, disp, callback):
if phase_one_n is not None:
phase = 1
x_postsolve = x[:phase_one_n]
else:
phase = 2
x_postsolve = x
x_o, fun, slack, con = _postsolve(x_postsolve,
postsolve_args)
if callback is not None:
res = OptimizeResult({'x': x_o, 'fun': fun, 'slack': slack,
'con': con, 'nit': iteration,
'phase': phase, 'complete': False,
'status': status, 'message': "",
'success': False})
callback(res)
if disp:
_display_iter(phase, iteration, slack, con, fun)
def _phase_two(c, A, x, b, callback, postsolve_args, maxiter, tol, disp,
maxupdate, mast, pivot, iteration=0, phase_one_n=None):
"""
The heart of the simplex method. Beginning with a basic feasible solution,
moves to adjacent basic feasible solutions of successively lower cost.
Terminates when no adjacent basic feasible solution has lower cost or if
the problem is determined to be unbounded.
This implementation follows the revised simplex method based on LU
decomposition. Rather than maintaining a tableau or an inverse of the
basis matrix, we keep a factorization of the basis matrix that allows
efficient solution of linear systems while avoiding stability issues
associated with inverted matrices.
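The optimality test at each iteration rests on the reduced costs
``c_hat = c - v @ A``, where ``v`` solves ``B.T @ v = c[basis]``. The sketch
below works this out for a problem with two constraints using Cramer's rule;
the helper names are hypothetical, and the real routine obtains ``v`` from
the maintained factorization instead.

```python
def solve_2x2(M, rhs):
    # Cramer's rule for a 2-by-2 system M @ v = rhs.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]


def reduced_costs(c, A, basis):
    # Rows of B_T are the basic columns of A, i.e. B transposed.
    B_T = [[A[i][j] for i in range(2)] for j in basis]
    v = solve_2x2(B_T, [c[j] for j in basis])
    # c_hat = c - v @ A; entries for basic columns come out as zero.
    return [c[j] - sum(v[i] * A[i][j] for i in range(2))
            for j in range(len(c))]


# minimize -x0 - 2*x1 with two slack columns (indices 2 and 3)
c = [-1.0, -2.0, 0.0, 0.0]
A = [[1.0, 1.0, 1.0, 0.0],
     [1.0, 3.0, 0.0, 1.0]]
at_slack_basis = reduced_costs(c, A, [2, 3])
at_optimal_basis = reduced_costs(c, A, [0, 1])
```

With the slack basis, two reduced costs are negative, so the method
continues; with basis ``[0, 1]`` every reduced cost is nonnegative and the
loop would terminate.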
"""
m, n = A.shape
status = 0
a = np.arange(n) # indices of columns of A
ab = np.arange(m) # indices of columns of B
if maxupdate:
# basis matrix factorization object; similar to B = A[:, b]
B = BGLU(A, b, maxupdate, mast)
else:
B = LU(A, b)
for iteration in range(iteration, maxiter):
if disp or callback is not None:
_display_and_callback(phase_one_n, x, postsolve_args, status,
iteration, disp, callback)
bl = np.zeros(len(a), dtype=bool)
bl[b] = 1
xb = x[b] # basic variables
cb = c[b] # basic costs
try:
v = B.solve(cb, transposed=True) # similar to v = solve(B.T, cb)
except LinAlgError:
status = 4
break
# TODO: cythonize?
c_hat = c - v.dot(A) # reduced cost
c_hat = c_hat[~bl]
# Above is much faster than:
# N = A[:, ~bl] # slow!
# c_hat = c[~bl] - v.T.dot(N)
# Can we perform the multiplication only on the nonbasic columns?
if np.all(c_hat >= -tol): # all reduced costs nonnegative -> terminate
break
j = _select_enter_pivot(c_hat, bl, a, rule=pivot, tol=tol)
u = B.solve(A[:, j]) # similar to u = solve(B, A[:, j])
i = u > tol # if none of the u are positive, unbounded
if not np.any(i):
status = 3
break
th = xb[i]/u[i]
l = np.argmin(th) # implicitly selects smallest subscript
th_star = th[l] # step size
x[b] = x[b] - th_star*u # take step
x[j] = th_star
B.update(ab[i][l], j) # modify basis
b = B.b # similar to b[ab[i][l]] = j
else:
# If the end of the for loop is reached (without a break statement),
# then another step has been taken, so the iteration counter should
# increment, info should be displayed, and callback should be called.
iteration += 1
status = 1
if disp or callback is not None:
_display_and_callback(phase_one_n, x, postsolve_args, status,
iteration, disp, callback)
return x, b, status, iteration
def _linprog_rs(c, c0, A, b, x0, callback, postsolve_args,
maxiter=5000, tol=1e-12, disp=False,
maxupdate=10, mast=False, pivot="mrc",
**unknown_options):
"""
Solve the following linear programming problem via a two-phase
revised simplex algorithm.::
minimize: c @ x
subject to: A @ x == b
0 <= x < oo
User-facing documentation is in _linprog_doc.py.
Parameters
----------
c : 1-D array
Coefficients of the linear objective function to be minimized.
c0 : float
Constant term in objective function due to fixed (and eliminated)
variables. (Currently unused.)
A : 2-D array
2-D array which, when matrix-multiplied by ``x``, gives the values of
the equality constraints at ``x``.
b : 1-D array
1-D array of values representing the RHS of each equality constraint
(row) in ``A``.
x0 : 1-D array, optional
Starting values of the independent variables, which will be refined by
the optimization algorithm. For the revised simplex method, these must
correspond with a basic feasible solution.
callback : callable, optional
If a callback function is provided, it will be called within each
iteration of the algorithm. The callback function must accept a single
`scipy.optimize.OptimizeResult` consisting of the following fields:
x : 1-D array
Current solution vector.
fun : float
Current value of the objective function ``c @ x``.
success : bool
True only when an algorithm has completed successfully,
so this is always False as the callback function is called
only while the algorithm is still iterating.
slack : 1-D array
The values of the slack variables. Each slack variable
corresponds to an inequality constraint. If the slack is zero,
the corresponding constraint is active.
con : 1-D array
The (nominally zero) residuals of the equality constraints,
that is, ``b - A_eq @ x``.
phase : int
The phase of the algorithm being executed.
status : int
For revised simplex, this is always 0 because if a different
status is detected, the algorithm terminates.
nit : int
The number of iterations performed.
message : str
A string descriptor of the exit status of the optimization.
postsolve_args : tuple
Data needed by _postsolve to convert the solution to the standard-form
problem into the solution to the original problem.
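A callback of the kind described under ``callback`` might simply record
progress. In this sketch the per-iteration result is imitated with a plain
dict so that it runs without SciPy; `scipy.optimize.OptimizeResult` is a dict
subclass, so key access works the same way. The name ``record_progress`` and
the sample values are made up.

```python
history = []


def record_progress(res):
    # Store (phase, iteration count, objective value) for each call.
    history.append((res["phase"], res["nit"], res["fun"]))


# Stand-in for the solver loop: three fabricated iterations of phase 2.
for nit, fun in enumerate([0.0, -2.5, -4.0]):
    record_progress({"x": None, "fun": fun, "slack": None, "con": None,
                     "nit": nit, "phase": 2, "status": 0,
                     "success": False, "message": ""})
```

Because ``success`` is always False inside the callback, the final outcome
has to be read from the returned result, not from ``history``.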
Options
-------
maxiter : int
The maximum number of iterations to perform in either phase.
tol : float
The tolerance which determines when a solution is "close enough" to
zero in Phase 1 to be considered a basic feasible solution or close
enough to positive to serve as an optimal solution.
disp : bool
Set to ``True`` if indicators of optimization status are to be printed
to the console each iteration.
maxupdate : int
The maximum number of updates performed on the LU factorization.
After this many updates is reached, the basis matrix is factorized
from scratch.
mast : bool
Minimize Amortized Solve Time. If enabled, the average time to solve
a linear system using the basis factorization is measured. Typically,
the average solve time will decrease with each successive solve after
initial factorization, as factorization takes much more time than the
solve operation (and updates). Eventually, however, the updated
factorization becomes sufficiently complex that the average solve time
begins to increase. When this is detected, the basis is refactorized
from scratch. Enable this option to maximize speed at the risk of
nondeterministic behavior. Ignored if ``maxupdate`` is 0.
pivot : "mrc" or "bland"
Pivot rule: Minimum Reduced Cost (default) or Bland's rule. Choose
Bland's rule if iteration limit is reached and cycling is suspected.
unknown_options : dict
Optional arguments not used by this particular solver. If
`unknown_options` is non-empty a warning is issued listing all
unused options.
Returns
-------
x : 1-D array
Solution vector.
status : int
An integer representing the exit status of the optimization::
0 : Optimization terminated successfully
1 : Iteration limit reached
2 : Problem appears to be infeasible
3 : Problem appears to be unbounded
4 : Numerical difficulties encountered
5 : No constraints; turn presolve on
6 : Guess x0 cannot be converted to a basic feasible solution
message : str
A string descriptor of the exit status of the optimization.
iteration : int
The number of iterations taken to solve the problem.
"""
_check_unknown_options(unknown_options)
messages = ["Optimization terminated successfully.",
"Iteration limit reached.",
"The problem appears infeasible, as the phase one auxiliary "
"problem terminated successfully with a residual of {0:.1e}, "
"greater than the tolerance {1} required for the solution to "
"be considered feasible. Consider increasing the tolerance to "
"be greater than {0:.1e}. If this tolerance is unacceptably "
"large, the problem is likely infeasible.",
"The problem is unbounded, as the simplex algorithm found "
"a basic feasible solution from which there is a direction "
"with negative reduced cost in which all decision variables "
"increase.",
"Numerical difficulties encountered; consider trying "
"method='interior-point'.",
"Problems with no constraints are trivially solved; please "
"turn presolve on.",
"The guess x0 cannot be converted to a basic feasible "
"solution. "
]
if A.size == 0: # address test_unbounded_below_no_presolve_corrected
return np.zeros(c.shape), 5, messages[5], 0
x, basis, A, b, residual, status, iteration = (
_phase_one(A, b, x0, callback, postsolve_args,
maxiter, tol, disp, maxupdate, mast, pivot))
if status == 0:
x, basis, status, iteration = _phase_two(c, A, x, basis, callback,
postsolve_args,
maxiter, tol, disp,
maxupdate, mast, pivot,
iteration)
return x, status, messages[status].format(residual, tol), iteration
@@ -0,0 +1,661 @@
"""Simplex method for linear programming
The *simplex* method uses a traditional, full-tableau implementation of
Dantzig's simplex algorithm [1]_, [2]_ (*not* the Nelder-Mead simplex).
This algorithm is included for backwards compatibility and educational
purposes.
.. versionadded:: 0.15.0
Warnings
--------
The simplex method may encounter numerical difficulties when pivot
values are close to the specified tolerance. If this happens, try
removing any redundant constraints, changing the pivot strategy to Bland's
rule, or increasing the tolerance value.
Alternatively, more robust methods may be used. See
:ref:`'interior-point' <optimize.linprog-interior-point>` and
:ref:`'revised simplex' <optimize.linprog-revised_simplex>`.
References
----------
.. [1] Dantzig, George B., Linear programming and extensions. Rand
Corporation Research Study Princeton Univ. Press, Princeton, NJ,
1963
.. [2] Hillier, S.H. and Lieberman, G.J. (1995), "Introduction to
Mathematical Programming", McGraw-Hill, Chapter 4.
"""
import numpy as np
from warnings import warn
from ._optimize import OptimizeResult, OptimizeWarning, _check_unknown_options
from ._linprog_util import _postsolve
def _pivot_col(T, tol=1e-9, bland=False):
"""
Given a linear programming simplex tableau, determine the column
of the variable to enter the basis.
Parameters
----------
T : 2-D array
A 2-D array representing the simplex tableau, T, corresponding to the
linear programming problem. It should have the form:
[[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
[A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
.
.
.
[A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
[c[0], c[1], ..., c[n_total], 0]]
for a Phase 2 problem, or the form:
[[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
[A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
.
.
.
[A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
[c[0], c[1], ..., c[n_total], 0],
[c'[0], c'[1], ..., c'[n_total], 0]]
for a Phase 1 problem (a problem in which a basic feasible solution is
sought prior to minimizing the actual objective). ``T`` is modified in
place by ``_solve_simplex``.
tol : float
Elements in the objective row larger than -tol will not be considered
for pivoting. Nominally this value is zero, but numerical issues
cause a tolerance about zero to be necessary.
bland : bool
If True, use Bland's rule for selection of the column (select the
first column with a negative coefficient in the objective row,
regardless of magnitude).
Returns
-------
status : bool
True if a suitable pivot column was found, otherwise False.
A return of False indicates that the linear programming simplex
algorithm is complete.
col : int
The index of the column of the pivot element.
If status is False, col will be returned as nan.
"""
ma = np.ma.masked_where(T[-1, :-1] >= -tol, T[-1, :-1], copy=False)
if ma.count() == 0:
return False, np.nan
if bland:
# ma.mask is sometimes 0d
return True, np.nonzero(np.logical_not(np.atleast_1d(ma.mask)))[0][0]
return True, np.ma.nonzero(ma == ma.min())[0][0]
def _pivot_row(T, basis, pivcol, phase, tol=1e-9, bland=False):
"""
Given a linear programming simplex tableau, determine the row for the
pivot operation.
Parameters
----------
T : 2-D array
A 2-D array representing the simplex tableau, T, corresponding to the
linear programming problem. It should have the form:
[[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
[A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
.
.
.
[A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
[c[0], c[1], ..., c[n_total], 0]]
for a Phase 2 problem, or the form:
[[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
[A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
.
.
.
[A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
[c[0], c[1], ..., c[n_total], 0],
[c'[0], c'[1], ..., c'[n_total], 0]]
for a Phase 1 problem (a problem in which a basic feasible solution is
sought prior to minimizing the actual objective). ``T`` is modified in
place by ``_solve_simplex``.
basis : array
A list of the current basic variables.
pivcol : int
The index of the pivot column.
phase : int
The phase of the simplex algorithm (1 or 2).
tol : float
Elements in the pivot column smaller than tol will not be considered
for pivoting. Nominally this value is zero, but numerical issues
cause a tolerance about zero to be necessary.
bland : bool
If True, use Bland's rule for selection of the row (if more than one
row can be used, choose the one with the lowest variable index).
Returns
-------
status : bool
True if a suitable pivot row was found, otherwise False. A return
of False indicates that the linear programming problem is unbounded.
row : int
The index of the row of the pivot element. If status is False, row
will be returned as nan.
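The minimum-ratio test and the Bland tie-break can be sketched with plain
lists as follows. This is an illustration, not the masked-array
implementation below; in particular it returns ``None`` where the real
function returns nan.

```python
def pivot_row(T, basis, pivcol, phase, tol=1e-9, bland=False):
    k = 2 if phase == 1 else 1  # number of objective rows at the bottom of T
    # Ratio b[i] / T[i][pivcol] for rows with a positive pivot-column entry.
    ratios = [(row[-1] / row[pivcol], i)
              for i, row in enumerate(T[:len(T) - k]) if row[pivcol] > tol]
    if not ratios:
        return False, None  # no positive entry: the problem is unbounded
    best = min(r for r, _ in ratios)
    ties = [i for r, i in ratios if r == best]
    if bland:
        # Among tied rows, take the one whose basic variable index is lowest.
        return True, min(ties, key=lambda i: basis[i])
    return True, ties[0]


T = [[1.0, 1.0, 1.0, 0.0, 4.0],
     [1.0, 3.0, 0.0, 1.0, 6.0],
     [-1.0, -2.0, 0.0, 0.0, 0.0]]
found, row = pivot_row(T, basis=[2, 3], pivcol=1, phase=2)

T_tie = [[1.0, 0.0, 1.0, 4.0],
         [2.0, 1.0, 0.0, 8.0],
         [-1.0, 0.0, 0.0, 0.0]]
first_tied = pivot_row(T_tie, basis=[2, 1], pivcol=0, phase=2)
bland_tied = pivot_row(T_tie, basis=[2, 1], pivcol=0, phase=2, bland=True)
```

In ``T`` the ratios are 4 and 2, so row 1 leaves. In ``T_tie`` both ratios
equal 4; the default takes the first tied row, while Bland's rule takes the
row whose basic variable has the lower index.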
"""
if phase == 1:
k = 2
else:
k = 1
ma = np.ma.masked_where(T[:-k, pivcol] <= tol, T[:-k, pivcol], copy=False)
if ma.count() == 0:
return False, np.nan
mb = np.ma.masked_where(T[:-k, pivcol] <= tol, T[:-k, -1], copy=False)
q = mb / ma
min_rows = np.ma.nonzero(q == q.min())[0]
if bland:
return True, min_rows[np.argmin(np.take(basis, min_rows))]
return True, min_rows[0]
def _apply_pivot(T, basis, pivrow, pivcol, tol=1e-9):
"""
Pivot the simplex tableau in place on the element given by (pivrow, pivcol).
The entering variable corresponds to the column given by pivcol, forcing
the variable basis[pivrow] to leave the basis.
Parameters
----------
T : 2-D array
A 2-D array representing the simplex tableau, T, corresponding to the
linear programming problem. It should have the form:
[[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
[A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
.
.
.
[A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
[c[0], c[1], ..., c[n_total], 0]]
for a Phase 2 problem, or the form:
[[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
[A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
.
.
.
[A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
[c[0], c[1], ..., c[n_total], 0],
[c'[0], c'[1], ..., c'[n_total], 0]]
for a Phase 1 problem (a problem in which a basic feasible solution is
sought prior to minimizing the actual objective). ``T`` is modified in
place by ``_solve_simplex``.
basis : 1-D array
An array of the indices of the basic variables, such that basis[i]
contains the column corresponding to the basic variable for row i.
Basis is modified in place by _apply_pivot.
pivrow : int
Row index of the pivot.
pivcol : int
Column index of the pivot.
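The row operations involved can be sketched on a small tableau held as a
list of lists. The name ``apply_pivot`` (without the leading underscore) is
used only for this illustration, and the near-tolerance warning issued by the
real function is left out.

```python
def apply_pivot(T, basis, pivrow, pivcol):
    basis[pivrow] = pivcol
    pivval = T[pivrow][pivcol]
    # Scale the pivot row so that the pivot element becomes 1.
    T[pivrow] = [v / pivval for v in T[pivrow]]
    # Eliminate the pivot column from every other row.
    for i in range(len(T)):
        if i != pivrow:
            factor = T[i][pivcol]
            T[i] = [v - factor * p for v, p in zip(T[i], T[pivrow])]


T = [[1.0, 1.0, 1.0, 0.0, 4.0],
     [1.0, 3.0, 0.0, 1.0, 6.0],
     [-1.0, -2.0, 0.0, 0.0, 0.0]]
basis = [2, 3]
apply_pivot(T, basis, pivrow=1, pivcol=1)
pivot_column = [row[1] for row in T]
```

After the pivot, column 1 is a unit column with its 1 in row 1, and
``basis`` records that variable 1 is now basic in that row.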
"""
basis[pivrow] = pivcol
pivval = T[pivrow, pivcol]
T[pivrow] = T[pivrow] / pivval
for irow in range(T.shape[0]):
if irow != pivrow:
T[irow] = T[irow] - T[pivrow] * T[irow, pivcol]
# The selected pivot should never lead to a pivot value less than the tol.
if np.isclose(pivval, tol, atol=0, rtol=1e4):
message = (
f"The pivot operation produces a pivot value of:{pivval: .1e}, "
"which is only slightly greater than the specified "
f"tolerance{tol: .1e}. This may lead to issues regarding the "
"numerical stability of the simplex method. "
"Removing redundant constraints, changing the pivot strategy "
"via Bland's rule or increasing the tolerance may "
"help reduce the issue.")
warn(message, OptimizeWarning, stacklevel=5)
def _solve_simplex(T, n, basis, callback, postsolve_args,
maxiter=1000, tol=1e-9, phase=2, bland=False, nit0=0,
):
"""
Solve a linear programming problem in "standard form" using the Simplex
Method. Linear Programming is intended to solve the following problem form:
Minimize::
c @ x
Subject to::
A @ x == b
x >= 0
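Inequality constraints reach this standard form by adding one slack variable
per row. The sketch below does that and lays the data out as a Phase 2
tableau of the shape described under ``T``. It assumes every right-hand side
is nonnegative so that the slack basis is feasible; ``build_tableau`` is a
made-up helper, not part of this module.

```python
def build_tableau(c, A_ub, b_ub):
    m = len(A_ub)
    # Each constraint row: original coefficients, slack identity, then b.
    rows = [list(row) + [1.0 if i == j else 0.0 for j in range(m)] + [bi]
            for i, (row, bi) in enumerate(zip(A_ub, b_ub))]
    # Objective row: costs, zero cost on slacks, zero in the corner.
    rows.append(list(c) + [0.0] * m + [0.0])
    # The slack variables form the initial basis.
    basis = list(range(len(c), len(c) + m))
    return rows, basis


# minimize -x0 - 2*x1 subject to x0 + x1 <= 4, x0 + 3*x1 <= 6, x >= 0
T, basis = build_tableau(c=[-1.0, -2.0],
                         A_ub=[[1.0, 1.0], [1.0, 3.0]],
                         b_ub=[4.0, 6.0])
```

With two constraints and two original variables, ``T`` has three rows and
five columns, and the initial basis consists of the slack columns 2 and 3.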
Parameters
----------
T : 2-D array
A 2-D array representing the simplex tableau, T, corresponding to the
linear programming problem. It should have the form:
[[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
[A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
.
.
.
[A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
[c[0], c[1], ..., c[n_total], 0]]
for a Phase 2 problem, or the form:
[[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
[A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
.
.
.
[A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
[c[0], c[1], ..., c[n_total], 0],
[c'[0], c'[1], ..., c'[n_total], 0]]
for a Phase 1 problem (a problem in which a basic feasible solution is
sought prior to minimizing the actual objective). ``T`` is modified in
place by ``_solve_simplex``.
n : int
The number of true variables in the problem.
basis : 1-D array
An array of the indices of the basic variables, such that basis[i]
contains the column corresponding to the basic variable for row i.
Basis is modified in place by _solve_simplex.
callback : callable, optional
If a callback function is provided, it will be called within each
iteration of the algorithm. The callback must accept a
`scipy.optimize.OptimizeResult` consisting of the following fields:
x : 1-D array
Current solution vector
fun : float
Current value of the objective function
success : bool
True only when a phase has completed successfully. This
will be False for most iterations.
slack : 1-D array
The values of the slack variables. Each slack variable
corresponds to an inequality constraint. If the slack is zero,
the corresponding constraint is active.
con : 1-D array
The (nominally zero) residuals of the equality constraints,
that is, ``b - A_eq @ x``
phase : int
The phase of the optimization being executed. In phase 1 a basic
feasible solution is sought and the tableau ``T`` has an additional
row representing an alternate objective function.
status : int
An integer representing the exit status of the optimization::
0 : Optimization terminated successfully
1 : Iteration limit reached
2 : Problem appears to be infeasible
3 : Problem appears to be unbounded
4 : Serious numerical difficulties encountered
nit : int
The number of iterations performed.
message : str
A string descriptor of the exit status of the optimization.
postsolve_args : tuple
Data needed by _postsolve to convert the solution to the standard-form
problem into the solution to the original problem.
maxiter : int
The maximum number of iterations to perform before aborting the
optimization.
tol : float
The tolerance which determines when a solution is "close enough" to
zero in Phase 1 to be considered a basic feasible solution or close
enough to positive to serve as an optimal solution.
phase : int
The phase of the optimization being executed. In phase 1 a basic
feasible solution is sought and the tableau ``T`` has an additional
row representing an alternate objective function.
bland : bool
If True, choose pivots using Bland's rule [3]_. In problems which
fail to converge due to cycling, using Bland's rule can provide
convergence at the expense of a less optimal path about the simplex.
nit0 : int
The initial iteration number used to keep an accurate iteration total
in a two-phase problem.
Returns
-------
nit : int
The number of iterations. Used to keep an accurate iteration total
in the two-phase problem.
status : int
An integer representing the exit status of the optimization::
0 : Optimization terminated successfully
1 : Iteration limit reached
2 : Problem appears to be infeasible
3 : Problem appears to be unbounded
4 : Serious numerical difficulties encountered
"""
nit = nit0
status = 0
message = ''
complete = False
if phase == 1:
m = T.shape[1]-2
elif phase == 2:
m = T.shape[1]-1
else:
raise ValueError("Argument 'phase' to _solve_simplex must be 1 or 2")
if phase == 2:
# Check if any artificial variables are still in the basis.
# If yes, check if any coefficients from this row and a column
# corresponding to one of the non-artificial variables is non-zero.
# If found, pivot at this term. If not, start phase 2.
# Do this for all artificial variables in the basis.
# Ref: "An Introduction to Linear Programming and Game Theory"
# by Paul R. Thie, Gerard E. Keough, 3rd Ed,
# Chapter 3.7 Redundant Systems (p. 102)
for pivrow in [row for row in range(basis.size)
if basis[row] > T.shape[1] - 2]:
non_zero_row = [col for col in range(T.shape[1] - 1)
if abs(T[pivrow, col]) > tol]
if len(non_zero_row) > 0:
pivcol = non_zero_row[0]
_apply_pivot(T, basis, pivrow, pivcol, tol)
nit += 1
if len(basis[:m]) == 0:
solution = np.empty(T.shape[1] - 1, dtype=np.float64)
else:
solution = np.empty(max(T.shape[1] - 1, max(basis[:m]) + 1),
dtype=np.float64)
while not complete:
# Find the pivot column
pivcol_found, pivcol = _pivot_col(T, tol, bland)
if not pivcol_found:
pivcol = np.nan
pivrow = np.nan
status = 0
complete = True
else:
# Find the pivot row
pivrow_found, pivrow = _pivot_row(T, basis, pivcol, phase, tol, bland)
if not pivrow_found:
status = 3
complete = True
if callback is not None:
solution[:] = 0
solution[basis[:n]] = T[:n, -1]
x = solution[:m]
x, fun, slack, con = _postsolve(
x, postsolve_args
)
res = OptimizeResult({
'x': x,
'fun': fun,
'slack': slack,
'con': con,
'status': status,
'message': message,
'nit': nit,
'success': status == 0 and complete,
'phase': phase,
'complete': complete,
})
callback(res)
if not complete:
if nit >= maxiter:
# Iteration limit exceeded
status = 1
complete = True
else:
_apply_pivot(T, basis, pivrow, pivcol, tol)
nit += 1
return nit, status
def _linprog_simplex(c, c0, A, b, callback, postsolve_args,
maxiter=1000, tol=1e-9, disp=False, bland=False,
**unknown_options):
"""
Minimize a linear objective function subject to linear equality and
non-negativity constraints using the two phase simplex method.
Linear programming is intended to solve problems of the following form:
Minimize::
c @ x
Subject to::
A @ x == b
x >= 0
User-facing documentation is in _linprog_doc.py.
Parameters
----------
c : 1-D array
Coefficients of the linear objective function to be minimized.
c0 : float
Constant term in objective function due to fixed (and eliminated)
variables. (Purely for display.)
A : 2-D array
2-D array such that ``A @ x`` gives the values of the equality
constraints at ``x``.
b : 1-D array
1-D array of values representing the right hand side of each equality
constraint (row) in ``A``.
callback : callable, optional
If a callback function is provided, it will be called within each
iteration of the algorithm. The callback function must accept a single
`scipy.optimize.OptimizeResult` consisting of the following fields:
x : 1-D array
Current solution vector
fun : float
Current value of the objective function
success : bool
True when an algorithm has completed successfully.
slack : 1-D array
The values of the slack variables. Each slack variable
corresponds to an inequality constraint. If the slack is zero,
the corresponding constraint is active.
con : 1-D array
The (nominally zero) residuals of the equality constraints,
that is, ``b - A_eq @ x``
phase : int
The phase of the algorithm being executed.
status : int
An integer representing the status of the optimization::
0 : Algorithm proceeding nominally
1 : Iteration limit reached
2 : Problem appears to be infeasible
3 : Problem appears to be unbounded
4 : Serious numerical difficulties encountered
nit : int
The number of iterations performed.
message : str
A string descriptor of the exit status of the optimization.
postsolve_args : tuple
Data needed by _postsolve to convert the solution to the standard-form
problem into the solution to the original problem.
Options
-------
maxiter : int
The maximum number of iterations to perform.
disp : bool
If True, print the exit status message to sys.stdout.
tol : float
The tolerance which determines when a solution is "close enough" to
zero in Phase 1 to be considered a basic feasible solution or close
enough to positive to serve as an optimal solution.
bland : bool
If True, use Bland's anti-cycling rule [3]_ to choose pivots to
prevent cycling. If False, choose pivots which should lead to a
converged solution more quickly. The latter method is subject to
cycling (non-convergence) in rare instances.
unknown_options : dict
Optional arguments not used by this particular solver. If
`unknown_options` is non-empty a warning is issued listing all
unused options.
Returns
-------
x : 1-D array
Solution vector.
status : int
An integer representing the exit status of the optimization::
0 : Optimization terminated successfully
1 : Iteration limit reached
2 : Problem appears to be infeasible
3 : Problem appears to be unbounded
4 : Serious numerical difficulties encountered
message : str
A string descriptor of the exit status of the optimization.
iteration : int
The number of iterations taken to solve the problem.
References
----------
.. [1] Dantzig, George B., Linear programming and extensions. Rand
Corporation Research Study Princeton Univ. Press, Princeton, NJ,
1963
.. [2] Hillier, F.S. and Lieberman, G.J. (1995), "Introduction to
Mathematical Programming", McGraw-Hill, Chapter 4.
.. [3] Bland, Robert G. New finite pivoting rules for the simplex method.
Mathematics of Operations Research (2), 1977: pp. 103-107.
Notes
-----
The expected problem formulation differs between the top level ``linprog``
module and the method specific solvers. The method specific solvers expect a
problem in standard form:
Minimize::
c @ x
Subject to::
A @ x == b
x >= 0
Whereas the top level ``linprog`` module expects a problem of form:
Minimize::
c @ x
Subject to::
A_ub @ x <= b_ub
A_eq @ x == b_eq
lb <= x <= ub
where ``lb = 0`` and ``ub = None`` unless set in ``bounds``.
The original problem contains equality, upper-bound and variable constraints
whereas the method specific solver requires equality constraints and
variable non-negativity.
``linprog`` module converts the original problem to standard form by
converting the simple bounds to upper bound constraints, introducing
non-negative slack variables for inequality constraints, and expressing
unbounded variables as the difference between two non-negative variables.
"""
_check_unknown_options(unknown_options)
status = 0
messages = {0: "Optimization terminated successfully.",
1: "Iteration limit reached.",
2: "Optimization failed. Unable to find a feasible"
" starting point.",
3: "Optimization failed. The problem appears to be unbounded.",
4: "Optimization failed. Singular matrix encountered."}
n, m = A.shape
# All constraints must have b >= 0.
is_negative_constraint = np.less(b, 0)
A[is_negative_constraint] *= -1
b[is_negative_constraint] *= -1
# As all constraints are equality constraints the artificial variables
# will also be basic variables.
av = np.arange(n) + m
basis = av.copy()
# Format the phase one tableau by adding artificial variables and stacking
# the constraints, the objective row and pseudo-objective row.
row_constraints = np.hstack((A, np.eye(n), b[:, np.newaxis]))
row_objective = np.hstack((c, np.zeros(n), c0))
row_pseudo_objective = -row_constraints.sum(axis=0)
row_pseudo_objective[av] = 0
T = np.vstack((row_constraints, row_objective, row_pseudo_objective))
nit1, status = _solve_simplex(T, n, basis, callback=callback,
postsolve_args=postsolve_args,
maxiter=maxiter, tol=tol, phase=1,
bland=bland
)
# if pseudo objective is zero, remove the last row from the tableau and
# proceed to phase 2
nit2 = nit1
if abs(T[-1, -1]) < tol:
# Remove the pseudo-objective row from the tableau
T = T[:-1, :]
# Remove the artificial variable columns from the tableau
T = np.delete(T, av, 1)
else:
# Failure to find a feasible starting point
status = 2
messages[status] = (
"Phase 1 of the simplex method failed to find a feasible "
"solution. The pseudo-objective function evaluates to {0:.1e} "
"which exceeds the required tolerance of {1} for a solution to be "
"considered 'close enough' to zero to be a basic solution. "
"Consider increasing the tolerance to be greater than {0:.1e}. "
"If this tolerance is unacceptably large the problem may be "
"infeasible.".format(abs(T[-1, -1]), tol)
)
if status == 0:
# Phase 2
nit2, status = _solve_simplex(T, n, basis, callback=callback,
postsolve_args=postsolve_args,
maxiter=maxiter, tol=tol, phase=2,
bland=bland, nit0=nit1
)
solution = np.zeros(n + m)
solution[basis[:n]] = T[:n, -1]
x = solution[:m]
return x, status, messages[status], int(nit2)
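The Phase 1 tableau layout described in the ``_solve_simplex`` docstring can be reproduced on a tiny problem. The following is a minimal sketch, not part of the committed file: the helper name ``build_phase1_tableau`` and the sample data are assumptions made for illustration, and the body only mirrors the stacking steps of ``_linprog_simplex`` above.

```python
import numpy as np


def build_phase1_tableau(c, c0, A, b):
    """Assemble a Phase 1 tableau (hypothetical helper, for illustration)."""
    A = np.array(A, dtype=float)  # copies, so the caller's data is untouched
    b = np.array(b, dtype=float)
    c = np.asarray(c, dtype=float)
    n_con, n_var = A.shape

    # Flip rows so that every right-hand side is non-negative.
    negative = b < 0
    A[negative] *= -1
    b[negative] *= -1

    # One artificial variable per equality constraint; together they
    # form the initial basis.
    av = np.arange(n_con) + n_var
    basis = av.copy()

    # Constraint rows, objective row, then pseudo-objective row.
    rows = np.hstack((A, np.eye(n_con), b[:, np.newaxis]))
    objective = np.hstack((c, np.zeros(n_con), c0))
    pseudo = -rows.sum(axis=0)
    pseudo[av] = 0
    return np.vstack((rows, objective, pseudo)), basis


# Sample problem (assumed data): minimize x0 + 2*x1
# subject to x0 + x1 == 2, x0 - x1 == 0, x >= 0.
T, basis = build_phase1_tableau(c=[1, 2], c0=0.0,
                                A=[[1, 1], [1, -1]], b=[2, 0])
print(T)
```

The last row is the pseudo-objective: its final entry is minus the sum of the right-hand sides, which Phase 1 tries to drive to zero.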
@@ -0,0 +1,5 @@
"""This module contains least-squares algorithms."""
from .least_squares import least_squares
from .lsq_linear import lsq_linear
__all__ = ['least_squares', 'lsq_linear']
@@ -0,0 +1,183 @@
"""Bounded-variable least-squares algorithm."""
import numpy as np
from numpy.linalg import norm, lstsq
from scipy.optimize import OptimizeResult
from .common import print_header_linear, print_iteration_linear
def compute_kkt_optimality(g, on_bound):
"""Compute the maximum violation of KKT conditions."""
g_kkt = g * on_bound
free_set = on_bound == 0
g_kkt[free_set] = np.abs(g[free_set])
return np.max(g_kkt)
def bvls(A, b, x_lsq, lb, ub, tol, max_iter, verbose, rcond=None):
m, n = A.shape
x = x_lsq.copy()
on_bound = np.zeros(n)
mask = x <= lb
x[mask] = lb[mask]
on_bound[mask] = -1
mask = x >= ub
x[mask] = ub[mask]
on_bound[mask] = 1
free_set = on_bound == 0
active_set = ~free_set
free_set, = np.nonzero(free_set)
r = A.dot(x) - b
cost = 0.5 * np.dot(r, r)
initial_cost = cost
g = A.T.dot(r)
cost_change = None
step_norm = None
iteration = 0
if verbose == 2:
print_header_linear()
# This is the initialization loop. The requirement is that the
# least-squares solution on free variables is feasible before BVLS starts.
# One possible initialization is to set all variables to lower or upper
# bounds, but many iterations may be required from this state later on.
# The implemented ad-hoc procedure should intuitively give a better
# initial state: find the least-squares solution on the current free
# variables; if it is feasible, stop; otherwise, set the violating variables
# to the corresponding bounds and continue on the reduced set of free
# variables.
while free_set.size > 0:
if verbose == 2:
optimality = compute_kkt_optimality(g, on_bound)
print_iteration_linear(iteration, cost, cost_change, step_norm,
optimality)
iteration += 1
x_free_old = x[free_set].copy()
A_free = A[:, free_set]
b_free = b - A.dot(x * active_set)
z = lstsq(A_free, b_free, rcond=rcond)[0]
lbv = z < lb[free_set]
ubv = z > ub[free_set]
v = lbv | ubv
if np.any(lbv):
ind = free_set[lbv]
x[ind] = lb[ind]
active_set[ind] = True
on_bound[ind] = -1
if np.any(ubv):
ind = free_set[ubv]
x[ind] = ub[ind]
active_set[ind] = True
on_bound[ind] = 1
ind = free_set[~v]
x[ind] = z[~v]
r = A.dot(x) - b
cost_new = 0.5 * np.dot(r, r)
cost_change = cost - cost_new
cost = cost_new
g = A.T.dot(r)
step_norm = norm(x[free_set] - x_free_old)
if np.any(v):
free_set = free_set[~v]
else:
break
if max_iter is None:
max_iter = n
max_iter += iteration
termination_status = None
# Main BVLS loop.
optimality = compute_kkt_optimality(g, on_bound)
for iteration in range(iteration, max_iter): # BVLS Loop A
if verbose == 2:
print_iteration_linear(iteration, cost, cost_change,
step_norm, optimality)
if optimality < tol:
termination_status = 1
if termination_status is not None:
break
move_to_free = np.argmax(g * on_bound)
on_bound[move_to_free] = 0
while True: # BVLS Loop B
free_set = on_bound == 0
active_set = ~free_set
free_set, = np.nonzero(free_set)
x_free = x[free_set]
x_free_old = x_free.copy()
lb_free = lb[free_set]
ub_free = ub[free_set]
A_free = A[:, free_set]
b_free = b - A.dot(x * active_set)
z = lstsq(A_free, b_free, rcond=rcond)[0]
lbv, = np.nonzero(z < lb_free)
ubv, = np.nonzero(z > ub_free)
v = np.hstack((lbv, ubv))
if v.size > 0:
alphas = np.hstack((
lb_free[lbv] - x_free[lbv],
ub_free[ubv] - x_free[ubv])) / (z[v] - x_free[v])
i = np.argmin(alphas)
i_free = v[i]
alpha = alphas[i]
x_free *= 1 - alpha
x_free += alpha * z
x[free_set] = x_free
if i < lbv.size:
on_bound[free_set[i_free]] = -1
else:
on_bound[free_set[i_free]] = 1
else:
x_free = z
x[free_set] = x_free
break
step_norm = norm(x_free - x_free_old)
r = A.dot(x) - b
cost_new = 0.5 * np.dot(r, r)
cost_change = cost - cost_new
if cost_change < tol * cost:
termination_status = 2
cost = cost_new
g = A.T.dot(r)
optimality = compute_kkt_optimality(g, on_bound)
if termination_status is None:
termination_status = 0
return OptimizeResult(
x=x, fun=r, cost=cost, optimality=optimality, active_mask=on_bound,
nit=iteration + 1, status=termination_status,
initial_cost=initial_cost)
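How the KKT measure drives the BVLS main loop is easier to see on numbers. The sketch below is an illustration under stated assumptions: ``kkt_violation`` is a hypothetical stand-alone copy of the logic in ``compute_kkt_optimality``, and the 2x2 problem with bounds ``[0, 1]`` is made-up data.

```python
import numpy as np


def kkt_violation(g, on_bound):
    """Largest KKT violation (hypothetical copy of the logic above)."""
    g = np.asarray(g, dtype=float)
    on_bound = np.asarray(on_bound)
    # On a bound, only a gradient pointing back into the box is a violation.
    g_kkt = g * on_bound
    # For free variables any non-zero gradient component is a violation.
    free = on_bound == 0
    g_kkt[free] = np.abs(g[free])
    return np.max(g_kkt)


A = np.eye(2)
b = np.array([2.0, -1.0])

# Start with both variables on the lower bound 0 of the box [0, 1]^2.
x = np.array([0.0, 0.0])
g = A.T @ (A @ x - b)
violation_start = kkt_violation(g, [-1, -1])

# The box-constrained minimizer: x[0] on its upper bound, x[1] on its lower.
x = np.array([1.0, 0.0])
g = A.T @ (A @ x - b)
violation_opt = kkt_violation(g, [1, -1])

print(violation_start, violation_opt)
```

At the starting point the measure is positive, so the variable with the largest ``g * on_bound`` would be released; at the minimizer it is non-positive, which is below any positive ``tol``.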
@@ -0,0 +1,733 @@
"""Functions used by least-squares algorithms."""
from math import copysign
import numpy as np
from numpy.linalg import norm
from scipy.linalg import cho_factor, cho_solve, LinAlgError
from scipy.sparse import issparse
from scipy.sparse.linalg import LinearOperator, aslinearoperator
EPS = np.finfo(float).eps
# Functions related to a trust-region problem.
def intersect_trust_region(x, s, Delta):
"""Find the intersection of a line with the boundary of a trust region.
This function solves the quadratic equation with respect to t
||(x + s*t)||**2 = Delta**2.
Returns
-------
t_neg, t_pos : tuple of float
Negative and positive roots.
Raises
------
ValueError
If `s` is zero or `x` is not within the trust region.
"""
a = np.dot(s, s)
if a == 0:
raise ValueError("`s` is zero.")
b = np.dot(x, s)
c = np.dot(x, x) - Delta**2
if c > 0:
raise ValueError("`x` is not within the trust region.")
d = np.sqrt(b*b - a*c) # Root from one fourth of the discriminant.
# Computations below avoid loss of significance, see "Numerical Recipes".
q = -(b + copysign(d, b))
t1 = q / a
t2 = c / q
if t1 < t2:
return t1, t2
else:
return t2, t1
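A quick numerical check of the root formula used in ``intersect_trust_region``. This is a minimal sketch: ``line_sphere_roots`` is a hypothetical stand-alone restatement of the computation above without the input validation, and the sample point and direction are assumed values.

```python
from math import copysign, sqrt

import numpy as np


def line_sphere_roots(x, s, Delta):
    """Roots t of ||x + s*t||**2 == Delta**2 (hypothetical helper)."""
    a = np.dot(s, s)
    b = np.dot(x, s)
    c = np.dot(x, x) - Delta**2
    d = sqrt(b * b - a * c)  # root of one fourth of the discriminant
    # Adding quantities of equal sign avoids cancellation in q.
    q = -(b + copysign(d, b))
    t1, t2 = q / a, c / q
    return (t1, t2) if t1 < t2 else (t2, t1)


x = np.array([0.5, 0.0])   # a point inside the unit ball
s = np.array([1.0, 0.0])   # direction of the line
t_neg, t_pos = line_sphere_roots(x, s, Delta=1.0)
print(t_neg, t_pos)
```

Both returned parameters put ``x + s*t`` exactly on the boundary, one in each direction along the line.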
def solve_lsq_trust_region(n, m, uf, s, V, Delta, initial_alpha=None,
rtol=0.01, max_iter=10):
"""Solve a trust-region problem arising in least-squares minimization.
This function implements a method described by J. J. More [1]_ and used
in MINPACK, but it relies on a single SVD of the Jacobian instead of a
series of Cholesky decompositions. Before running this function, compute:
``U, s, VT = svd(J, full_matrices=False)``.
Parameters
----------
n : int
Number of variables.
m : int
Number of residuals.
uf : ndarray
Computed as U.T.dot(f).
s : ndarray
Singular values of J.
V : ndarray
Transpose of VT.
Delta : float
Radius of a trust region.
initial_alpha : float, optional
Initial guess for alpha, which might be available from a previous
iteration. If None, determined automatically.
rtol : float, optional
Stopping tolerance for the root-finding procedure. Namely, the
solution ``p`` will satisfy ``abs(norm(p) - Delta) < rtol * Delta``.
max_iter : int, optional
Maximum allowed number of iterations for the root-finding procedure.
Returns
-------
p : ndarray, shape (n,)
Found solution of a trust-region problem.
alpha : float
Positive value such that (J.T*J + alpha*I)*p = -J.T*f.
Sometimes called Levenberg-Marquardt parameter.
n_iter : int
Number of iterations made by root-finding procedure. Zero means
that Gauss-Newton step was selected as the solution.
References
----------
.. [1] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
and Theory," Numerical Analysis, ed. G. A. Watson, Lecture Notes
in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
"""
def phi_and_derivative(alpha, suf, s, Delta):
"""Function whose zero is sought.
It is defined as "norm of regularized (by alpha) least-squares
solution minus `Delta`". Refer to [1]_.
"""
denom = s**2 + alpha
p_norm = norm(suf / denom)
phi = p_norm - Delta
phi_prime = -np.sum(suf ** 2 / denom**3) / p_norm
return phi, phi_prime
suf = s * uf
# Check if J has full rank and try Gauss-Newton step.
if m >= n:
threshold = EPS * m * s[0]
full_rank = s[-1] > threshold
else:
full_rank = False
if full_rank:
p = -V.dot(uf / s)
if norm(p) <= Delta:
return p, 0.0, 0
alpha_upper = norm(suf) / Delta
if full_rank:
phi, phi_prime = phi_and_derivative(0.0, suf, s, Delta)
alpha_lower = -phi / phi_prime
else:
alpha_lower = 0.0
if initial_alpha is None or not full_rank and initial_alpha == 0:
alpha = max(0.001 * alpha_upper, (alpha_lower * alpha_upper)**0.5)
else:
alpha = initial_alpha
for it in range(max_iter):
if alpha < alpha_lower or alpha > alpha_upper:
alpha = max(0.001 * alpha_upper, (alpha_lower * alpha_upper)**0.5)
phi, phi_prime = phi_and_derivative(alpha, suf, s, Delta)
if phi < 0:
alpha_upper = alpha
ratio = phi / phi_prime
alpha_lower = max(alpha_lower, alpha - ratio)
alpha -= (phi + Delta) * ratio / Delta
if np.abs(phi) < rtol * Delta:
break
p = -V.dot(suf / (s**2 + alpha))
# Make the norm of p equal to Delta; p is changed only slightly by this.
# It is done to prevent p from lying outside the trust region (which can
# cause problems later).
p *= Delta / norm(p)
return p, alpha, it + 1
def solve_trust_region_2d(B, g, Delta):
"""Solve a general trust-region problem in 2 dimensions.
The problem is reformulated as a 4th order algebraic equation,
the solution of which is found by numpy.roots.
Parameters
----------
B : ndarray, shape (2, 2)
Symmetric matrix, defines a quadratic term of the function.
g : ndarray, shape (2,)
Defines a linear term of the function.
Delta : float
Radius of a trust region.
Returns
-------
p : ndarray, shape (2,)
Found solution.
newton_step : bool
Whether the returned solution is the Newton step which lies within
the trust region.
"""
try:
R, lower = cho_factor(B)
p = -cho_solve((R, lower), g)
if np.dot(p, p) <= Delta**2:
return p, True
except LinAlgError:
pass
a = B[0, 0] * Delta**2
b = B[0, 1] * Delta**2
c = B[1, 1] * Delta**2
d = g[0] * Delta
f = g[1] * Delta
coeffs = np.array(
[-b + d, 2 * (a - c + f), 6 * b, 2 * (-a + c + f), -b - d])
t = np.roots(coeffs) # Can handle leading zeros.
t = np.real(t[np.isreal(t)])
p = Delta * np.vstack((2 * t / (1 + t**2), (1 - t**2) / (1 + t**2)))
value = 0.5 * np.sum(p * B.dot(p), axis=0) + np.dot(g, p)
i = np.argmin(value)
p = p[:, i]
return p, False
def update_tr_radius(Delta, actual_reduction, predicted_reduction,
step_norm, bound_hit):
"""Update the radius of a trust region based on the cost reduction.
Returns
-------
Delta : float
New radius.
ratio : float
Ratio between actual and predicted reductions.
"""
if predicted_reduction > 0:
ratio = actual_reduction / predicted_reduction
elif predicted_reduction == actual_reduction == 0:
ratio = 1
else:
ratio = 0
if ratio < 0.25:
Delta = 0.25 * step_norm
elif ratio > 0.75 and bound_hit:
Delta *= 2.0
return Delta, ratio
# Construction and minimization of quadratic functions.
def build_quadratic_1d(J, g, s, diag=None, s0=None):
"""Parameterize a multivariate quadratic function along a line.
The resulting univariate quadratic function is given as follows::
f(t) = 0.5 * (s0 + s*t).T * (J.T*J + diag) * (s0 + s*t) +
g.T * (s0 + s*t)
Parameters
----------
J : ndarray, sparse matrix or LinearOperator shape (m, n)
Jacobian matrix, affects the quadratic term.
g : ndarray, shape (n,)
Gradient, defines the linear term.
s : ndarray, shape (n,)
Direction vector of a line.
diag : None or ndarray with shape (n,), optional
Additional diagonal part, affects the quadratic term.
If None, assumed to be 0.
s0 : None or ndarray with shape (n,), optional
Initial point. If None, assumed to be 0.
Returns
-------
a : float
Coefficient for t**2.
b : float
Coefficient for t.
c : float
Free term. Returned only if `s0` is provided.
"""
v = J.dot(s)
a = np.dot(v, v)
if diag is not None:
a += np.dot(s * diag, s)
a *= 0.5
b = np.dot(g, s)
if s0 is not None:
u = J.dot(s0)
b += np.dot(u, v)
c = 0.5 * np.dot(u, u) + np.dot(g, s0)
if diag is not None:
b += np.dot(s0 * diag, s)
c += 0.5 * np.dot(s0 * diag, s0)
return a, b, c
else:
return a, b
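The coefficient formulas in ``build_quadratic_1d`` can be checked against a direct evaluation of the multivariate quadratic. This is a minimal sketch with ``diag`` omitted; the random data, the seed, and the helper ``quadratic`` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((4, 3))   # assumed sample Jacobian
g = rng.standard_normal(3)        # assumed sample gradient
s = rng.standard_normal(3)        # direction of the line
s0 = rng.standard_normal(3)       # initial point on the line


def quadratic(p):
    """Direct evaluation of 0.5 * p.T (J.T J) p + g.T p."""
    return 0.5 * p @ (J.T @ J) @ p + g @ p


# Same coefficient formulas as in the function above (diag is None).
v = J @ s
u = J @ s0
a = 0.5 * (v @ v)
b = g @ s + u @ v
c = 0.5 * (u @ u) + g @ s0

t = 0.7
along_line = a * t**2 + b * t + c
direct = quadratic(s0 + s * t)
print(along_line, direct)
```

The two values agree up to rounding, which confirms that ``a``, ``b`` and ``c`` are the expansion of the quadratic in powers of ``t``.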
def minimize_quadratic_1d(a, b, lb, ub, c=0):
"""Minimize a 1-D quadratic function subject to bounds.
The free term `c` is 0 by default. Bounds must be finite.
Returns
-------
t : float
Minimum point.
y : float
Minimum value.
"""
t = [lb, ub]
if a != 0:
extremum = -0.5 * b / a
if lb < extremum < ub:
t.append(extremum)
t = np.asarray(t)
y = t * (a * t + b) + c
min_index = np.argmin(y)
return t[min_index], y[min_index]
def evaluate_quadratic(J, g, s, diag=None):
"""Compute values of a quadratic function arising in least squares.
The function is 0.5 * s.T * (J.T * J + diag) * s + g.T * s.
Parameters
----------
J : ndarray, sparse matrix or LinearOperator, shape (m, n)
Jacobian matrix, affects the quadratic term.
g : ndarray, shape (n,)
Gradient, defines the linear term.
s : ndarray, shape (k, n) or (n,)
Array containing steps as rows.
diag : ndarray, shape (n,), optional
Additional diagonal part, affects the quadratic term.
If None, assumed to be 0.
Returns
-------
values : ndarray with shape (k,) or float
Values of the function. If `s` was 2-D, then ndarray is
returned, otherwise, float is returned.
"""
if s.ndim == 1:
Js = J.dot(s)
q = np.dot(Js, Js)
if diag is not None:
q += np.dot(s * diag, s)
else:
Js = J.dot(s.T)
q = np.sum(Js**2, axis=0)
if diag is not None:
q += np.sum(diag * s**2, axis=1)
l = np.dot(s, g)
return 0.5 * q + l
# Utility functions to work with bound constraints.
def in_bounds(x, lb, ub):
"""Check if a point lies within bounds."""
return np.all((x >= lb) & (x <= ub))
def step_size_to_bound(x, s, lb, ub):
"""Compute the minimum step size required to reach a bound.
The function computes a positive scalar t, such that x + s * t is on
the bound.
Returns
-------
step : float
Computed step. Non-negative value.
hits : ndarray of int with shape of x
Each element indicates whether a corresponding variable reaches the
bound:
* 0 - the bound was not hit.
* -1 - the lower bound was hit.
* 1 - the upper bound was hit.
"""
non_zero = np.nonzero(s)
s_non_zero = s[non_zero]
steps = np.empty_like(x)
steps.fill(np.inf)
with np.errstate(over='ignore'):
steps[non_zero] = np.maximum((lb - x)[non_zero] / s_non_zero,
(ub - x)[non_zero] / s_non_zero)
min_step = np.min(steps)
return min_step, np.equal(steps, min_step) * np.sign(s).astype(int)
def find_active_constraints(x, lb, ub, rtol=1e-10):
"""Determine which constraints are active at a given point.
The threshold is computed using `rtol` and the absolute value of the
closest bound.
Returns
-------
active : ndarray of int with shape of x
Each component shows whether the corresponding constraint is active:
* 0 - a constraint is not active.
* -1 - a lower bound is active.
* 1 - an upper bound is active.
"""
active = np.zeros_like(x, dtype=int)
if rtol == 0:
active[x <= lb] = -1
active[x >= ub] = 1
return active
lower_dist = x - lb
upper_dist = ub - x
lower_threshold = rtol * np.maximum(1, np.abs(lb))
upper_threshold = rtol * np.maximum(1, np.abs(ub))
lower_active = (np.isfinite(lb) &
(lower_dist <= np.minimum(upper_dist, lower_threshold)))
active[lower_active] = -1
upper_active = (np.isfinite(ub) &
(upper_dist <= np.minimum(lower_dist, upper_threshold)))
active[upper_active] = 1
return active
def make_strictly_feasible(x, lb, ub, rstep=1e-10):
"""Shift a point to the interior of a feasible region.
Each element of the returned vector is at least at a relative distance
`rstep` from the closest bound. If ``rstep=0`` then `np.nextafter` is used.
"""
x_new = x.copy()
active = find_active_constraints(x, lb, ub, rstep)
lower_mask = np.equal(active, -1)
upper_mask = np.equal(active, 1)
if rstep == 0:
x_new[lower_mask] = np.nextafter(lb[lower_mask], ub[lower_mask])
x_new[upper_mask] = np.nextafter(ub[upper_mask], lb[upper_mask])
else:
x_new[lower_mask] = (lb[lower_mask] +
rstep * np.maximum(1, np.abs(lb[lower_mask])))
x_new[upper_mask] = (ub[upper_mask] -
rstep * np.maximum(1, np.abs(ub[upper_mask])))
tight_bounds = (x_new < lb) | (x_new > ub)
x_new[tight_bounds] = 0.5 * (lb[tight_bounds] + ub[tight_bounds])
return x_new
def CL_scaling_vector(x, g, lb, ub):
"""Compute Coleman-Li scaling vector and its derivatives.
Components of a vector v are defined as follows::
| ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf
v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf
| 1, otherwise
According to this definition v[i] >= 0 for all i. It differs from the
definition in paper [1]_ (eq. (2.2)), where the absolute value of v is
used. Both definitions are equivalent down the line.
Derivatives of v with respect to x take the value 1, -1 or 0 depending on
the case.
Returns
-------
v : ndarray with shape of x
Scaling vector.
dv : ndarray with shape of x
Derivatives of v[i] with respect to x[i], diagonal elements of v's
Jacobian.
References
----------
.. [1] M.A. Branch, T.F. Coleman, and Y. Li, "A Subspace, Interior,
and Conjugate Gradient Method for Large-Scale Bound-Constrained
Minimization Problems," SIAM Journal on Scientific Computing,
Vol. 21, Number 1, pp 1-23, 1999.
"""
v = np.ones_like(x)
dv = np.zeros_like(x)
mask = (g < 0) & np.isfinite(ub)
v[mask] = ub[mask] - x[mask]
dv[mask] = -1
mask = (g > 0) & np.isfinite(lb)
v[mask] = x[mask] - lb[mask]
dv[mask] = 1
return v, dv
def reflective_transformation(y, lb, ub):
"""Compute reflective transformation and its gradient."""
if in_bounds(y, lb, ub):
return y, np.ones_like(y)
lb_finite = np.isfinite(lb)
ub_finite = np.isfinite(ub)
x = y.copy()
g_negative = np.zeros_like(y, dtype=bool)
mask = lb_finite & ~ub_finite
x[mask] = np.maximum(y[mask], 2 * lb[mask] - y[mask])
g_negative[mask] = y[mask] < lb[mask]
mask = ~lb_finite & ub_finite
x[mask] = np.minimum(y[mask], 2 * ub[mask] - y[mask])
g_negative[mask] = y[mask] > ub[mask]
mask = lb_finite & ub_finite
d = ub - lb
t = np.remainder(y[mask] - lb[mask], 2 * d[mask])
x[mask] = lb[mask] + np.minimum(t, 2 * d[mask] - t)
g_negative[mask] = t > d[mask]
g = np.ones_like(y)
g[g_negative] = -1
return x, g
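The doubly bounded branch of ``reflective_transformation`` folds a point back into the box like a ball bouncing between two walls. The sketch below covers only that branch: ``reflect_into_box`` is a hypothetical simplified helper that assumes finite scalar bounds with ``lb < ub``, and the sample points are made up.

```python
import numpy as np


def reflect_into_box(y, lb, ub):
    """Simplified reflective transformation for finite lb < ub only."""
    y = np.asarray(y, dtype=float)
    d = ub - lb
    # Position inside one full "there and back" period of length 2 * d.
    t = np.remainder(y - lb, 2 * d)
    x = lb + np.minimum(t, 2 * d - t)
    # On the return half of the period the map runs backwards,
    # so its derivative with respect to y is -1 there.
    g = np.where(t > d, -1.0, 1.0)
    return x, g


x, g = reflect_into_box([1.3, -0.4, 2.5], lb=0.0, ub=1.0)
print(x, g)
```

The point 2.5 is reflected twice (at the upper and then the lower bound), so it lands at 0.5 with derivative +1, while 1.3 and -0.4 are each reflected once and get derivative -1.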
# Functions to display algorithm's progress.
def print_header_nonlinear():
print("{:^15}{:^15}{:^15}{:^15}{:^15}{:^15}"
.format("Iteration", "Total nfev", "Cost", "Cost reduction",
"Step norm", "Optimality"))
def print_iteration_nonlinear(iteration, nfev, cost, cost_reduction,
step_norm, optimality):
if cost_reduction is None:
cost_reduction = " " * 15
else:
cost_reduction = f"{cost_reduction:^15.2e}"
if step_norm is None:
step_norm = " " * 15
else:
step_norm = f"{step_norm:^15.2e}"
print("{:^15}{:^15}{:^15.4e}{}{}{:^15.2e}"
.format(iteration, nfev, cost, cost_reduction,
step_norm, optimality))
def print_header_linear():
print("{:^15}{:^15}{:^15}{:^15}{:^15}"
.format("Iteration", "Cost", "Cost reduction", "Step norm",
"Optimality"))
def print_iteration_linear(iteration, cost, cost_reduction, step_norm,
optimality):
if cost_reduction is None:
cost_reduction = " " * 15
else:
cost_reduction = f"{cost_reduction:^15.2e}"
if step_norm is None:
step_norm = " " * 15
else:
step_norm = f"{step_norm:^15.2e}"
print(f"{iteration:^15}{cost:^15.4e}{cost_reduction}{step_norm}{optimality:^15.2e}")
# Simple helper functions.
def compute_grad(J, f):
"""Compute gradient of the least-squares cost function."""
if isinstance(J, LinearOperator):
return J.rmatvec(f)
else:
return J.T.dot(f)
def compute_jac_scale(J, scale_inv_old=None):
"""Compute variables scale based on the Jacobian matrix."""
if issparse(J):
scale_inv = np.asarray(J.power(2).sum(axis=0)).ravel()**0.5
else:
scale_inv = np.sum(J**2, axis=0)**0.5
if scale_inv_old is None:
scale_inv[scale_inv == 0] = 1
else:
scale_inv = np.maximum(scale_inv, scale_inv_old)
return 1 / scale_inv, scale_inv
def left_multiplied_operator(J, d):
"""Return diag(d) J as LinearOperator."""
J = aslinearoperator(J)
def matvec(x):
return d * J.matvec(x)
def matmat(X):
return d[:, np.newaxis] * J.matmat(X)
def rmatvec(x):
return J.rmatvec(x.ravel() * d)
return LinearOperator(J.shape, matvec=matvec, matmat=matmat,
rmatvec=rmatvec)
def right_multiplied_operator(J, d):
"""Return J diag(d) as LinearOperator."""
J = aslinearoperator(J)
def matvec(x):
return J.matvec(np.ravel(x) * d)
def matmat(X):
return J.matmat(X * d[:, np.newaxis])
def rmatvec(x):
return d * J.rmatvec(x)
return LinearOperator(J.shape, matvec=matvec, matmat=matmat,
rmatvec=rmatvec)
def regularized_lsq_operator(J, diag):
"""Return a matrix arising in regularized least squares as LinearOperator.
The matrix is
[ J ]
[ D ]
where D is diagonal matrix with elements from `diag`.
"""
J = aslinearoperator(J)
m, n = J.shape
def matvec(x):
return np.hstack((J.matvec(x), diag * x))
def rmatvec(x):
x1 = x[:m]
x2 = x[m:]
return J.rmatvec(x1) + diag * x2
return LinearOperator((m + n, n), matvec=matvec, rmatvec=rmatvec)
def right_multiply(J, d, copy=True):
"""Compute J diag(d).
If `copy` is False, `J` is modified in place (unless being LinearOperator).
"""
if copy and not isinstance(J, LinearOperator):
J = J.copy()
if issparse(J):
J.data *= d.take(J.indices, mode='clip') # scikit-learn recipe.
elif isinstance(J, LinearOperator):
J = right_multiplied_operator(J, d)
else:
J *= d
return J
def left_multiply(J, d, copy=True):
"""Compute diag(d) J.
If `copy` is False, `J` is modified in place (unless being LinearOperator).
"""
if copy and not isinstance(J, LinearOperator):
J = J.copy()
if issparse(J):
J.data *= np.repeat(d, np.diff(J.indptr)) # scikit-learn recipe.
elif isinstance(J, LinearOperator):
J = left_multiplied_operator(J, d)
else:
J *= d[:, np.newaxis]
return J
def check_termination(dF, F, dx_norm, x_norm, ratio, ftol, xtol):
"""Check termination condition for nonlinear least squares."""
ftol_satisfied = dF < ftol * F and ratio > 0.25
xtol_satisfied = dx_norm < xtol * (xtol + x_norm)
if ftol_satisfied and xtol_satisfied:
return 4
elif ftol_satisfied:
return 2
elif xtol_satisfied:
return 3
else:
return None
def scale_for_robust_loss_function(J, f, rho):
"""Scale Jacobian and residuals for a robust loss function.
Arrays are modified in place.
"""
J_scale = rho[1] + 2 * rho[2] * f**2
J_scale[J_scale < EPS] = EPS
J_scale **= 0.5
f *= rho[1] / J_scale
return left_multiply(J, J_scale, copy=False), f
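# Why this scaling works can be checked numerically: with the scaled Jacobian
# and residuals, J_scaled.T @ f_scaled reproduces the gradient of the robust
# cost. The sketch below assumes a linear residual model f(x) = A x - b and
# the soft_l1 loss; `A`, `b`, `x` and the helper names are made up.

```python
import numpy as np


def soft_l1_derivatives(z):
    """First and second derivatives of rho(z) = 2 * ((1 + z)**0.5 - 1)."""
    t = 1 + z
    return t**-0.5, -0.5 * t**-1.5


def cost(x, A, b):
    """Robust cost 0.5 * sum(rho(f**2)) for the soft_l1 loss."""
    f = A @ x - b
    return 0.5 * np.sum(2 * (np.sqrt(1 + f**2) - 1))


A = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]])
b = np.array([1.0, -2.0, 3.0])
x = np.array([0.3, -0.7])

f = A @ x - b
rho1, rho2 = soft_l1_derivatives(f**2)

# Same scaling as above, written out of place on copies.
J_scale = np.sqrt(np.maximum(rho1 + 2 * rho2 * f**2, np.finfo(float).eps))
f_scaled = f * rho1 / J_scale
J_scaled = J_scale[:, np.newaxis] * A

grad_scaled = J_scaled.T @ f_scaled

# Central finite differences of the robust cost as an independent check.
h = 1e-6
grad_fd = np.array([
    (cost(x + h * e, A, b) - cost(x - h * e, A, b)) / (2 * h)
    for e in np.eye(2)
])
```

# `grad_scaled` should match both A.T @ (rho1 * f) and the finite-difference
# estimate `grad_fd` up to discretization error.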
@@ -0,0 +1,331 @@
"""
Dogleg algorithm with rectangular trust regions for least-squares minimization.
The description of the algorithm can be found in [Voglis]_. The algorithm does
trust-region iterations, but the shape of trust regions is rectangular as
opposed to conventional elliptical. The intersection of a trust region and
an initial feasible region is again some rectangle. Thus, on each iteration a
bound-constrained quadratic optimization problem is solved.
The quadratic problem is solved by the well-known dogleg approach, where the
function is minimized along a piecewise-linear "dogleg" path [NumOpt]_,
Chapter 4. If the Jacobian is not rank-deficient, then the function is decreasing
along this path, and optimization amounts to simply following this
path as long as a point stays within the bounds. A constrained Cauchy step
(along the anti-gradient) is considered for safety in rank-deficient cases;
in these situations convergence might be slow.
If during the iterations some variable hits the initial bound and the corresponding
component of the anti-gradient points outside the feasible region, then the next
dogleg step won't make any progress. In this state such variables satisfy first-order
optimality conditions, and they are excluded before computing the next dogleg
step.
The Gauss-Newton step can be computed exactly by `numpy.linalg.lstsq` (for dense
Jacobian matrices) or by the iterative procedure `scipy.sparse.linalg.lsmr` (for
dense and sparse matrices, or a Jacobian given as a LinearOperator). The second
option allows solving very large problems (up to a couple of million
residuals on a regular PC), provided the Jacobian matrix is sufficiently
sparse. Note, however, that dogbox is not well suited to problems with a
large number of constraints, because variables are excluded and included on each
iteration (the required number of function evaluations might be high, or the
accuracy of the solution poor). Its large-scale usage is therefore probably limited
to unconstrained problems.
References
----------
.. [Voglis] C. Voglis and I. E. Lagaris, "A Rectangular Trust Region Dogleg
Approach for Unconstrained and Bound Constrained Nonlinear
Optimization", WSEAS International Conference on Applied
Mathematics, Corfu, Greece, 2004.
.. [NumOpt] J. Nocedal and S. J. Wright, "Numerical optimization, 2nd edition".
"""
import numpy as np
from numpy.linalg import lstsq, norm
from scipy.sparse.linalg import LinearOperator, aslinearoperator, lsmr
from scipy.optimize import OptimizeResult
from .common import (
step_size_to_bound, in_bounds, update_tr_radius, evaluate_quadratic,
build_quadratic_1d, minimize_quadratic_1d, compute_grad,
compute_jac_scale, check_termination, scale_for_robust_loss_function,
print_header_nonlinear, print_iteration_nonlinear)
def lsmr_operator(Jop, d, active_set):
"""Compute LinearOperator to use in LSMR by dogbox algorithm.
`active_set` mask is used to exclude active variables from computations
of matrix-vector products.
"""
m, n = Jop.shape
def matvec(x):
x_free = x.ravel().copy()
x_free[active_set] = 0
return Jop.matvec(x_free * d)
def rmatvec(x):
r = d * Jop.rmatvec(x)
r[active_set] = 0
return r
return LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec, dtype=float)
def find_intersection(x, tr_bounds, lb, ub):
"""Find intersection of trust-region bounds and initial bounds.
Returns
-------
lb_total, ub_total : ndarray with shape of x
Lower and upper bounds of the intersection region.
orig_l, orig_u : ndarray of bool with shape of x
True means that an original bound is taken as a corresponding bound
in the intersection region.
tr_l, tr_u : ndarray of bool with shape of x
True means that a trust-region bound is taken as a corresponding bound
in the intersection region.
"""
lb_centered = lb - x
ub_centered = ub - x
lb_total = np.maximum(lb_centered, -tr_bounds)
ub_total = np.minimum(ub_centered, tr_bounds)
orig_l = np.equal(lb_total, lb_centered)
orig_u = np.equal(ub_total, ub_centered)
tr_l = np.equal(lb_total, -tr_bounds)
tr_u = np.equal(ub_total, tr_bounds)
return lb_total, ub_total, orig_l, orig_u, tr_l, tr_u
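# A worked example of the intersection computed above, using the same
# expressions on made-up values. The bounds are first shifted so that the
# current point is the origin, then intersected with the box
# [-tr_bounds, tr_bounds].

```python
import numpy as np

x = np.array([0.0, 0.9, -0.5])
lb = np.array([-1.0, -1.0, -1.0])
ub = np.array([1.0, 1.0, np.inf])
tr_bounds = np.array([0.5, 0.5, 2.0])

lb_centered = lb - x
ub_centered = ub - x
lb_total = np.maximum(lb_centered, -tr_bounds)
ub_total = np.minimum(ub_centered, tr_bounds)

# True where the original bound, not the trust region, is the binding one.
orig_l = np.equal(lb_total, lb_centered)
orig_u = np.equal(ub_total, ub_centered)
```

# Here the second variable is limited from above by its original bound
# (about 0.1 away), while the third is limited from below by its original
# bound; every other side is limited by the trust region.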
def dogleg_step(x, newton_step, g, a, b, tr_bounds, lb, ub):
"""Find dogleg step in a rectangular region.
Returns
-------
step : ndarray, shape (n,)
Computed dogleg step.
bound_hits : ndarray of int, shape (n,)
Each component shows whether a corresponding variable hits the
initial bound after the step is taken:
* 0 - a variable doesn't hit the bound.
* -1 - lower bound is hit.
* 1 - upper bound is hit.
tr_hit : bool
Whether the step hit the boundary of the trust-region.
"""
lb_total, ub_total, orig_l, orig_u, tr_l, tr_u = find_intersection(
x, tr_bounds, lb, ub
)
bound_hits = np.zeros_like(x, dtype=int)
if in_bounds(newton_step, lb_total, ub_total):
return newton_step, bound_hits, False
to_bounds, _ = step_size_to_bound(np.zeros_like(x), -g, lb_total, ub_total)
# The classical dogleg algorithm would check whether the Cauchy step fits into
# the bounds, and just return its constrained version if not. But in a
# rectangular trust region it makes sense to try to improve the constrained
# Cauchy step too. Thus, we don't distinguish these two cases.
cauchy_step = -minimize_quadratic_1d(a, b, 0, to_bounds)[0] * g
step_diff = newton_step - cauchy_step
step_size, hits = step_size_to_bound(cauchy_step, step_diff,
lb_total, ub_total)
bound_hits[(hits < 0) & orig_l] = -1
bound_hits[(hits > 0) & orig_u] = 1
tr_hit = np.any((hits < 0) & tr_l | (hits > 0) & tr_u)
return cauchy_step + step_size * step_diff, bound_hits, tr_hit
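# The constrained Cauchy step used above minimizes the quadratic model along
# the anti-gradient on a bounded interval. The sketch below assumes the model
# 0.5 * ||f + J @ s||**2 restricted to s = -t * g; `minimize_on_interval` is a
# hypothetical stand-in for the helper imported from .common, and the data
# are made up.

```python
import numpy as np

J = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
f = np.array([1.0, -2.0, 0.5])
g = J.T @ f  # gradient of 0.5 * ||f||**2

# Along s = -g the model equals a * t**2 + b * t + const.
Js = J @ (-g)
a = 0.5 * (Js @ Js)
b = g @ (-g)


def minimize_on_interval(a, b, lower, upper):
    """Minimize a * t**2 + b * t over a finite interval [lower, upper]."""
    candidates = [lower, upper]
    if a > 0:
        t = -b / (2 * a)
        if lower < t < upper:
            candidates.append(t)
    return min(candidates, key=lambda t: a * t**2 + b * t)


# Loose bound: the unconstrained minimizer along -g is admissible.
t_loose = minimize_on_interval(a, b, 0.0, 10.0)
# Tight bound: the step is cut at the boundary of the admissible interval.
t_tight = minimize_on_interval(a, b, 0.0, 0.1)

cauchy_loose = -t_loose * g
cauchy_tight = -t_tight * g
```

# In dogleg_step the upper limit of the interval is the distance to the
# boundary of the intersection region along -g (`to_bounds`).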
def dogbox(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
loss_function, tr_solver, tr_options, verbose):
f = f0
f_true = f.copy()
nfev = 1
J = J0
njev = 1
if loss_function is not None:
rho = loss_function(f)
cost = 0.5 * np.sum(rho[0])
J, f = scale_for_robust_loss_function(J, f, rho)
else:
cost = 0.5 * np.dot(f, f)
g = compute_grad(J, f)
jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
if jac_scale:
scale, scale_inv = compute_jac_scale(J)
else:
scale, scale_inv = x_scale, 1 / x_scale
Delta = norm(x0 * scale_inv, ord=np.inf)
if Delta == 0:
Delta = 1.0
on_bound = np.zeros_like(x0, dtype=int)
on_bound[np.equal(x0, lb)] = -1
on_bound[np.equal(x0, ub)] = 1
x = x0
step = np.empty_like(x0)
if max_nfev is None:
max_nfev = x0.size * 100
termination_status = None
iteration = 0
step_norm = None
actual_reduction = None
if verbose == 2:
print_header_nonlinear()
while True:
active_set = on_bound * g < 0
free_set = ~active_set
g_free = g[free_set]
g_full = g.copy()
g[active_set] = 0
g_norm = norm(g, ord=np.inf)
if g_norm < gtol:
termination_status = 1
if verbose == 2:
print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
step_norm, g_norm)
if termination_status is not None or nfev == max_nfev:
break
x_free = x[free_set]
lb_free = lb[free_set]
ub_free = ub[free_set]
scale_free = scale[free_set]
# Compute (Gauss-)Newton and build quadratic model for Cauchy step.
if tr_solver == 'exact':
J_free = J[:, free_set]
newton_step = lstsq(J_free, -f, rcond=-1)[0]
# Coefficients for the quadratic model along the anti-gradient.
a, b = build_quadratic_1d(J_free, g_free, -g_free)
elif tr_solver == 'lsmr':
Jop = aslinearoperator(J)
# We compute the lsmr step in scaled variables and then
# transform back to normal variables. If lsmr gave the exact lsq
# solution, this would be equivalent to not doing any
# transformations, but from experience it's better this way.
# We pass active_set to make computations as if we selected
# the free subset of J columns, but without actually doing any
# slicing, which is expensive for sparse matrices and impossible
# for LinearOperator.
lsmr_op = lsmr_operator(Jop, scale, active_set)
newton_step = -lsmr(lsmr_op, f, **tr_options)[0][free_set]
newton_step *= scale_free
# Components of g for active variables were zeroed, so this call
# is correct and equivalent to using J_free and g_free.
a, b = build_quadratic_1d(Jop, g, -g)
actual_reduction = -1.0
while actual_reduction <= 0 and nfev < max_nfev:
tr_bounds = Delta * scale_free
step_free, on_bound_free, tr_hit = dogleg_step(
x_free, newton_step, g_free, a, b, tr_bounds, lb_free, ub_free)
step.fill(0.0)
step[free_set] = step_free
if tr_solver == 'exact':
predicted_reduction = -evaluate_quadratic(J_free, g_free,
step_free)
elif tr_solver == 'lsmr':
predicted_reduction = -evaluate_quadratic(Jop, g, step)
# gh11403 ensure that solution is fully within bounds.
x_new = np.clip(x + step, lb, ub)
f_new = fun(x_new)
nfev += 1
step_h_norm = norm(step * scale_inv, ord=np.inf)
if not np.all(np.isfinite(f_new)):
Delta = 0.25 * step_h_norm
continue
# Usual trust-region step quality estimation.
if loss_function is not None:
cost_new = loss_function(f_new, cost_only=True)
else:
cost_new = 0.5 * np.dot(f_new, f_new)
actual_reduction = cost - cost_new
Delta, ratio = update_tr_radius(
Delta, actual_reduction, predicted_reduction,
step_h_norm, tr_hit
)
step_norm = norm(step)
termination_status = check_termination(
actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
if termination_status is not None:
break
if actual_reduction > 0:
on_bound[free_set] = on_bound_free
x = x_new
# Set variables exactly at the boundary.
mask = on_bound == -1
x[mask] = lb[mask]
mask = on_bound == 1
x[mask] = ub[mask]
f = f_new
f_true = f.copy()
cost = cost_new
J = jac(x, f)
njev += 1
if loss_function is not None:
rho = loss_function(f)
J, f = scale_for_robust_loss_function(J, f, rho)
g = compute_grad(J, f)
if jac_scale:
scale, scale_inv = compute_jac_scale(J, scale_inv)
else:
step_norm = 0
actual_reduction = 0
iteration += 1
if termination_status is None:
termination_status = 0
return OptimizeResult(
x=x, cost=cost, fun=f_true, jac=J, grad=g_full, optimality=g_norm,
active_mask=on_bound, nfev=nfev, njev=njev, status=termination_status)
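# The active-set test at the top of each dogbox iteration
# (`active_set = on_bound * g < 0`) can be illustrated on made-up values.
# A variable is treated as active when it sits on a bound and the
# anti-gradient -g points out of the feasible region.

```python
import numpy as np

# -1: on the lower bound, 1: on the upper bound, 0: strictly inside.
on_bound = np.array([-1, 1, 0, -1])
g = np.array([2.0, 1.0, -3.0, -1.0])

active_set = on_bound * g < 0
free_set = ~active_set

# Gradient restricted to the free variables, as used for the gtol test.
g_free = g[free_set]
```

# Only the first variable is active: it is on its lower bound and -g pushes
# it further down. The second is on its upper bound but -g points inward,
# and the fourth is on its lower bound with -g pointing inward as well.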
@@ -0,0 +1,967 @@
"""Generic interface for least-squares minimization."""
from warnings import warn
import numpy as np
from numpy.linalg import norm
from scipy.sparse import issparse
from scipy.sparse.linalg import LinearOperator
from scipy.optimize import _minpack, OptimizeResult
from scipy.optimize._numdiff import approx_derivative, group_columns
from scipy.optimize._minimize import Bounds
from .trf import trf
from .dogbox import dogbox
from .common import EPS, in_bounds, make_strictly_feasible
TERMINATION_MESSAGES = {
-1: "Improper input parameters status returned from `leastsq`",
0: "The maximum number of function evaluations is exceeded.",
1: "`gtol` termination condition is satisfied.",
2: "`ftol` termination condition is satisfied.",
3: "`xtol` termination condition is satisfied.",
4: "Both `ftol` and `xtol` termination conditions are satisfied."
}
FROM_MINPACK_TO_COMMON = {
0: -1, # Improper input parameters from MINPACK.
1: 2,
2: 3,
3: 4,
4: 1,
5: 0
# There are 6, 7, 8 for too small tolerance parameters,
# but we guard against it by checking ftol, xtol, gtol beforehand.
}
def call_minpack(fun, x0, jac, ftol, xtol, gtol, max_nfev, x_scale, diff_step):
n = x0.size
if diff_step is None:
epsfcn = EPS
else:
epsfcn = diff_step**2
# Compute MINPACK's `diag`, which is inverse of our `x_scale` and
# ``x_scale='jac'`` corresponds to ``diag=None``.
if isinstance(x_scale, str) and x_scale == 'jac':
diag = None
else:
diag = 1 / x_scale
full_output = True
col_deriv = False
factor = 100.0
if jac is None:
if max_nfev is None:
# n squared to account for Jacobian evaluations.
max_nfev = 100 * n * (n + 1)
x, info, status = _minpack._lmdif(
fun, x0, (), full_output, ftol, xtol, gtol,
max_nfev, epsfcn, factor, diag)
else:
if max_nfev is None:
max_nfev = 100 * n
x, info, status = _minpack._lmder(
fun, jac, x0, (), full_output, col_deriv,
ftol, xtol, gtol, max_nfev, factor, diag)
f = info['fvec']
if callable(jac):
J = jac(x)
else:
J = np.atleast_2d(approx_derivative(fun, x))
cost = 0.5 * np.dot(f, f)
g = J.T.dot(f)
g_norm = norm(g, ord=np.inf)
nfev = info['nfev']
njev = info.get('njev', None)
status = FROM_MINPACK_TO_COMMON[status]
active_mask = np.zeros_like(x0, dtype=int)
return OptimizeResult(
x=x, cost=cost, fun=f, jac=J, grad=g, optimality=g_norm,
active_mask=active_mask, nfev=nfev, njev=njev, status=status)
def prepare_bounds(bounds, n):
lb, ub = (np.asarray(b, dtype=float) for b in bounds)
if lb.ndim == 0:
lb = np.resize(lb, n)
if ub.ndim == 0:
ub = np.resize(ub, n)
return lb, ub
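# A usage sketch of the scalar broadcasting done by prepare_bounds, restated
# under the hypothetical name `broadcast_bounds` so the block runs on its own.
# The bounds are the ones used in the Rosenbrock example of the
# `least_squares` docstring.

```python
import numpy as np


def broadcast_bounds(bounds, n):
    """Convert a (lower, upper) pair to float arrays of length n."""
    lb, ub = (np.asarray(b, dtype=float) for b in bounds)
    if lb.ndim == 0:
        lb = np.resize(lb, n)
    if ub.ndim == 0:
        ub = np.resize(ub, n)
    return lb, ub


# Per-variable lower bounds combined with a single scalar upper bound.
lb, ub = broadcast_bounds(([-np.inf, 1.5], np.inf), 2)
```

# The scalar upper bound is repeated for every variable, while the
# array-valued lower bound is kept as given.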
def check_tolerance(ftol, xtol, gtol, method):
def check(tol, name):
if tol is None:
tol = 0
elif tol < EPS:
warn(f"Setting `{name}` below the machine epsilon ({EPS:.2e}) effectively "
f"disables the corresponding termination condition.",
stacklevel=3)
return tol
ftol = check(ftol, "ftol")
xtol = check(xtol, "xtol")
gtol = check(gtol, "gtol")
if method == "lm" and (ftol < EPS or xtol < EPS or gtol < EPS):
raise ValueError("All tolerances must be higher than machine epsilon "
f"({EPS:.2e}) for method 'lm'.")
elif ftol < EPS and xtol < EPS and gtol < EPS:
raise ValueError("At least one of the tolerances must be higher than "
f"machine epsilon ({EPS:.2e}).")
return ftol, xtol, gtol
def check_x_scale(x_scale, x0):
if isinstance(x_scale, str) and x_scale == 'jac':
return x_scale
try:
x_scale = np.asarray(x_scale, dtype=float)
valid = np.all(np.isfinite(x_scale)) and np.all(x_scale > 0)
except (ValueError, TypeError):
valid = False
if not valid:
raise ValueError("`x_scale` must be 'jac' or array_like with "
"positive numbers.")
if x_scale.ndim == 0:
x_scale = np.resize(x_scale, x0.shape)
if x_scale.shape != x0.shape:
raise ValueError("Inconsistent shapes between `x_scale` and `x0`.")
return x_scale
def check_jac_sparsity(jac_sparsity, m, n):
if jac_sparsity is None:
return None
if not issparse(jac_sparsity):
jac_sparsity = np.atleast_2d(jac_sparsity)
if jac_sparsity.shape != (m, n):
raise ValueError("`jac_sparsity` has wrong shape.")
return jac_sparsity, group_columns(jac_sparsity)
# Loss functions.
def huber(z, rho, cost_only):
mask = z <= 1
rho[0, mask] = z[mask]
rho[0, ~mask] = 2 * z[~mask]**0.5 - 1
if cost_only:
return
rho[1, mask] = 1
rho[1, ~mask] = z[~mask]**-0.5
rho[2, mask] = 0
rho[2, ~mask] = -0.5 * z[~mask]**-1.5
def soft_l1(z, rho, cost_only):
t = 1 + z
rho[0] = 2 * (t**0.5 - 1)
if cost_only:
return
rho[1] = t**-0.5
rho[2] = -0.5 * t**-1.5
def cauchy(z, rho, cost_only):
rho[0] = np.log1p(z)
if cost_only:
return
t = 1 + z
rho[1] = 1 / t
rho[2] = -1 / t**2
def arctan(z, rho, cost_only):
rho[0] = np.arctan(z)
if cost_only:
return
t = 1 + z**2
rho[1] = 1 / t
rho[2] = -2 * z / t**2
IMPLEMENTED_LOSSES = dict(linear=None, huber=huber, soft_l1=soft_l1,
cauchy=cauchy, arctan=arctan)
def construct_loss_function(m, loss, f_scale):
if loss == 'linear':
return None
if not callable(loss):
loss = IMPLEMENTED_LOSSES[loss]
rho = np.empty((3, m))
def loss_function(f, cost_only=False):
z = (f / f_scale) ** 2
loss(z, rho, cost_only=cost_only)
if cost_only:
return 0.5 * f_scale ** 2 * np.sum(rho[0])
rho[0] *= f_scale ** 2
rho[2] /= f_scale ** 2
return rho
else:
def loss_function(f, cost_only=False):
z = (f / f_scale) ** 2
rho = loss(z)
if cost_only:
return 0.5 * f_scale ** 2 * np.sum(rho[0])
rho[0] *= f_scale ** 2
rho[2] /= f_scale ** 2
return rho
return loss_function
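# The f_scale handling above follows rho_(s) = C**2 * rho(s / C**2) with
# s = f**2: the values are multiplied by C**2, the first derivative is
# unchanged and the second derivative is divided by C**2. The sketch below
# checks this for a Cauchy-type callable loss; `cauchy_rows`, `scaled_loss`
# and `scaled_value` are hypothetical names and the inputs are made up.

```python
import numpy as np


def cauchy_rows(z):
    """Callable loss returning rows: rho(z), rho'(z), rho''(z)."""
    t = 1 + z
    return np.vstack((np.log1p(z), 1 / t, -1 / t**2))


def scaled_loss(f, f_scale):
    """Apply the soft margin f_scale in the same way as loss_function."""
    z = (f / f_scale) ** 2
    rho = cauchy_rows(z)
    rho[0] *= f_scale ** 2
    rho[2] /= f_scale ** 2
    return rho


def scaled_value(s, C):
    """Closed form of C**2 * rho(s / C**2) for the Cauchy loss."""
    return C**2 * np.log1p(s / C**2)


f = np.array([0.1, 1.0, 5.0])
C = 2.0
rho = scaled_loss(f, C)

# Central finite difference of the scaled loss with respect to s = f**2.
s = f ** 2
h = 1e-6
d1_fd = (scaled_value(s + h, C) - scaled_value(s - h, C)) / (2 * h)
```

# Row 1 of `rho` should agree with the finite-difference derivative, and
# row 2 with the analytic second derivative -1 / (C**2 * (1 + s / C**2)**2).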
def least_squares(
fun, x0, jac='2-point', bounds=(-np.inf, np.inf), method='trf',
ftol=1e-8, xtol=1e-8, gtol=1e-8, x_scale=1.0, loss='linear',
f_scale=1.0, diff_step=None, tr_solver=None, tr_options={},
jac_sparsity=None, max_nfev=None, verbose=0, args=(), kwargs={}):
"""Solve a nonlinear least-squares problem with bounds on the variables.
Given the residuals f(x) (an m-D real function of n real
variables) and the loss function rho(s) (a scalar function), `least_squares`
finds a local minimum of the cost function F(x)::
minimize F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
subject to lb <= x <= ub
The purpose of the loss function rho(s) is to reduce the influence of
outliers on the solution.
Parameters
----------
fun : callable
Function which computes the vector of residuals, with the signature
``fun(x, *args, **kwargs)``, i.e., the minimization proceeds with
respect to its first argument. The argument ``x`` passed to this
function is an ndarray of shape (n,) (never a scalar, even for n=1).
It must allocate and return a 1-D array_like of shape (m,) or a scalar.
If the argument ``x`` is complex or the function ``fun`` returns
complex residuals, it must be wrapped in a real function of real
arguments, as shown at the end of the Examples section.
x0 : array_like with shape (n,) or float
Initial guess on independent variables. If float, it will be treated
as a 1-D array with one element. When `method` is 'trf', the initial
guess might be slightly adjusted to lie sufficiently within the given
`bounds`.
jac : {'2-point', '3-point', 'cs', callable}, optional
Method of computing the Jacobian matrix (an m-by-n matrix, where
element (i, j) is the partial derivative of f[i] with respect to
x[j]). The keywords select a finite difference scheme for numerical
estimation. The scheme '3-point' is more accurate, but requires
twice as many operations as '2-point' (default). The scheme 'cs'
uses complex steps, and while potentially the most accurate, it is
applicable only when `fun` correctly handles complex inputs and
can be analytically continued to the complex plane. Method 'lm'
always uses the '2-point' scheme. If callable, it is used as
``jac(x, *args, **kwargs)`` and should return a good approximation
(or the exact value) for the Jacobian as an array_like (np.atleast_2d
is applied), a sparse matrix (csr_matrix preferred for performance) or
a `scipy.sparse.linalg.LinearOperator`.
bounds : 2-tuple of array_like or `Bounds`, optional
There are two ways to specify bounds:
1. Instance of `Bounds` class
2. Lower and upper bounds on independent variables. Defaults to no
bounds. Each array must match the size of `x0` or be a scalar,
in the latter case a bound will be the same for all variables.
Use ``np.inf`` with an appropriate sign to disable bounds on all
or some variables.
method : {'trf', 'dogbox', 'lm'}, optional
Algorithm to perform minimization.
* 'trf' : Trust Region Reflective algorithm, particularly suitable
for large sparse problems with bounds. Generally robust method.
* 'dogbox' : dogleg algorithm with rectangular trust regions,
typical use case is small problems with bounds. Not recommended
for problems with rank-deficient Jacobian.
* 'lm' : Levenberg-Marquardt algorithm as implemented in MINPACK.
Doesn't handle bounds and sparse Jacobians. Usually the most
efficient method for small unconstrained problems.
Default is 'trf'. See Notes for more information.
ftol : float or None, optional
Tolerance for termination by the change of the cost function. Default
is 1e-8. The optimization process is stopped when ``dF < ftol * F``,
and there was an adequate agreement between a local quadratic model and
the true model in the last step.
If None and 'method' is not 'lm', the termination by this condition is
disabled. If 'method' is 'lm', this tolerance must be higher than
machine epsilon.
xtol : float or None, optional
Tolerance for termination by the change of the independent variables.
Default is 1e-8. The exact condition depends on the `method` used:
* For 'trf' and 'dogbox' : ``norm(dx) < xtol * (xtol + norm(x))``.
* For 'lm' : ``Delta < xtol * norm(xs)``, where ``Delta`` is
a trust-region radius and ``xs`` is the value of ``x``
scaled according to `x_scale` parameter (see below).
If None and 'method' is not 'lm', the termination by this condition is
disabled. If 'method' is 'lm', this tolerance must be higher than
machine epsilon.
gtol : float or None, optional
Tolerance for termination by the norm of the gradient. Default is 1e-8.
The exact condition depends on the `method` used:
* For 'trf' : ``norm(g_scaled, ord=np.inf) < gtol``, where
``g_scaled`` is the value of the gradient scaled to account for
the presence of the bounds [STIR]_.
* For 'dogbox' : ``norm(g_free, ord=np.inf) < gtol``, where
``g_free`` is the gradient with respect to the variables which
are not in the optimal state on the boundary.
* For 'lm' : the maximum absolute value of the cosine of angles
between columns of the Jacobian and the residual vector is less
than `gtol`, or the residual vector is zero.
If None and 'method' is not 'lm', the termination by this condition is
disabled. If 'method' is 'lm', this tolerance must be higher than
machine epsilon.
x_scale : array_like or 'jac', optional
Characteristic scale of each variable. Setting `x_scale` is equivalent
to reformulating the problem in scaled variables ``xs = x / x_scale``.
An alternative view is that the size of a trust region along jth
dimension is proportional to ``x_scale[j]``. Improved convergence may
be achieved by setting `x_scale` such that a step of a given size
along any of the scaled variables has a similar effect on the cost
function. If set to 'jac', the scale is iteratively updated using the
inverse norms of the columns of the Jacobian matrix (as described in
[JJMore]_).
loss : str or callable, optional
Determines the loss function. The following keyword values are allowed:
* 'linear' (default) : ``rho(z) = z``. Gives a standard
least-squares problem.
* 'soft_l1' : ``rho(z) = 2 * ((1 + z)**0.5 - 1)``. The smooth
approximation of l1 (absolute value) loss. Usually a good
choice for robust least squares.
* 'huber' : ``rho(z) = z if z <= 1 else 2*z**0.5 - 1``. Works
similarly to 'soft_l1'.
* 'cauchy' : ``rho(z) = ln(1 + z)``. Severely weakens the influence of
outliers, but may cause difficulties in the optimization process.
* 'arctan' : ``rho(z) = arctan(z)``. Limits a maximum loss on
a single residual, has properties similar to 'cauchy'.
If callable, it must take a 1-D ndarray ``z=f**2`` and return an
array_like with shape (3, m) where row 0 contains function values,
row 1 contains first derivatives and row 2 contains second
derivatives. Method 'lm' supports only 'linear' loss.
f_scale : float, optional
Value of soft margin between inlier and outlier residuals, default
is 1.0. The loss function is evaluated as follows
``rho_(f**2) = C**2 * rho(f**2 / C**2)``, where ``C`` is `f_scale`,
and ``rho`` is determined by `loss` parameter. This parameter has
no effect with ``loss='linear'``, but for other `loss` values it is
of crucial importance.
max_nfev : None or int, optional
Maximum number of function evaluations before the termination.
If None (default), the value is chosen automatically:
* For 'trf' and 'dogbox' : 100 * n.
* For 'lm' : 100 * n if `jac` is callable and 100 * n * (n + 1)
otherwise (because 'lm' counts function calls in Jacobian
estimation).
diff_step : None or array_like, optional
Determines the relative step size for the finite difference
approximation of the Jacobian. The actual step is computed as
``x * diff_step``. If None (default), then `diff_step` is taken to be
a conventional "optimal" power of machine epsilon for the finite
difference scheme used [NR]_.
tr_solver : {None, 'exact', 'lsmr'}, optional
Method for solving trust-region subproblems, relevant only for 'trf'
and 'dogbox' methods.
* 'exact' is suitable for not very large problems with dense
Jacobian matrices. The computational complexity per iteration is
comparable to a singular value decomposition of the Jacobian
matrix.
* 'lsmr' is suitable for problems with sparse and large Jacobian
matrices. It uses the iterative procedure
`scipy.sparse.linalg.lsmr` for finding a solution of a linear
least-squares problem and only requires matrix-vector product
evaluations.
If None (default), the solver is chosen based on the type of Jacobian
returned on the first iteration.
tr_options : dict, optional
Keyword options passed to trust-region solver.
* ``tr_solver='exact'``: `tr_options` are ignored.
* ``tr_solver='lsmr'``: options for `scipy.sparse.linalg.lsmr`.
Additionally, ``method='trf'`` supports 'regularize' option
(bool, default is True), which adds a regularization term to the
normal equation, which improves convergence if the Jacobian is
rank-deficient [Byrd]_ (eq. 3.4).
jac_sparsity : {None, array_like, sparse matrix}, optional
Defines the sparsity structure of the Jacobian matrix for finite
difference estimation, its shape must be (m, n). If the Jacobian has
only a few non-zero elements in *each* row, providing the sparsity
structure will greatly speed up the computations [Curtis]_. A zero
entry means that a corresponding element in the Jacobian is identically
zero. If provided, forces the use of 'lsmr' trust-region solver.
If None (default), then dense differencing will be used. Has no effect
for 'lm' method.
verbose : {0, 1, 2}, optional
Level of algorithm's verbosity:
* 0 (default) : work silently.
* 1 : display a termination report.
* 2 : display progress during iterations (not supported by 'lm'
method).
args, kwargs : tuple and dict, optional
Additional arguments passed to `fun` and `jac`. Both empty by default.
The calling signature is ``fun(x, *args, **kwargs)`` and the same for
`jac`.
Returns
-------
result : OptimizeResult
`OptimizeResult` with the following fields defined:
x : ndarray, shape (n,)
Solution found.
cost : float
Value of the cost function at the solution.
fun : ndarray, shape (m,)
Vector of residuals at the solution.
jac : ndarray, sparse matrix or LinearOperator, shape (m, n)
Modified Jacobian matrix at the solution, in the sense that J^T J
is a Gauss-Newton approximation of the Hessian of the cost function.
The type is the same as the one used by the algorithm.
grad : ndarray, shape (n,)
Gradient of the cost function at the solution.
optimality : float
First-order optimality measure. In unconstrained problems, it is
always the uniform norm of the gradient. In constrained problems,
it is the quantity which was compared with `gtol` during iterations.
active_mask : ndarray of int, shape (n,)
Each component shows whether a corresponding constraint is active
(that is, whether a variable is at the bound):
* 0 : a constraint is not active.
* -1 : a lower bound is active.
* 1 : an upper bound is active.
Might be somewhat arbitrary for 'trf' method as it generates a
sequence of strictly feasible iterates and `active_mask` is
determined within a tolerance threshold.
nfev : int
Number of function evaluations done. Methods 'trf' and 'dogbox' do
not count function calls for numerical Jacobian approximation, as
opposed to 'lm' method.
njev : int or None
Number of Jacobian evaluations done. If numerical Jacobian
approximation is used in 'lm' method, it is set to None.
status : int
The reason for algorithm termination:
* -1 : improper input parameters status returned from MINPACK.
* 0 : the maximum number of function evaluations is exceeded.
* 1 : `gtol` termination condition is satisfied.
* 2 : `ftol` termination condition is satisfied.
* 3 : `xtol` termination condition is satisfied.
* 4 : Both `ftol` and `xtol` termination conditions are satisfied.
message : str
Verbal description of the termination reason.
success : bool
True if one of the convergence criteria is satisfied (`status` > 0).
See Also
--------
leastsq : A legacy wrapper for the MINPACK implementation of the
Levenberg-Marquardt algorithm.
curve_fit : Least-squares minimization applied to a curve-fitting problem.
Notes
-----
Method 'lm' (Levenberg-Marquardt) calls a wrapper over least-squares
algorithms implemented in MINPACK (lmder, lmdif). It runs the
Levenberg-Marquardt algorithm formulated as a trust-region type algorithm.
The implementation is based on the paper [JJMore]_; it is very robust and
efficient with a lot of smart tricks. It should be your first choice
for unconstrained problems. Note that it doesn't support bounds. Also,
it doesn't work when m < n.
Method 'trf' (Trust Region Reflective) is motivated by the process of
solving a system of equations, which constitute the first-order optimality
condition for a bound-constrained minimization problem as formulated in
[STIR]_. The algorithm iteratively solves trust-region subproblems
augmented by a special diagonal quadratic term and with trust-region shape
determined by the distance from the bounds and the direction of the
gradient. These enhancements help to avoid making steps directly into bounds
and efficiently explore the whole space of variables. To further improve
convergence, the algorithm considers search directions reflected from the
bounds. To obey theoretical requirements, the algorithm keeps iterates
strictly feasible. With dense Jacobians trust-region subproblems are
solved by an exact method very similar to the one described in [JJMore]_
(and implemented in MINPACK). The difference from the MINPACK
implementation is that a singular value decomposition of a Jacobian
matrix is done once per iteration, instead of a QR decomposition and series
of Givens rotation eliminations. For large sparse Jacobians a 2-D subspace
approach of solving trust-region subproblems is used [STIR]_, [Byrd]_.
The subspace is spanned by a scaled gradient and an approximate
Gauss-Newton solution delivered by `scipy.sparse.linalg.lsmr`. When no
constraints are imposed the algorithm is very similar to MINPACK and has
generally comparable performance. The algorithm is quite robust in both
unbounded and bounded problems, so it is chosen as the default algorithm.
Method 'dogbox' operates in a trust-region framework, but considers
rectangular trust regions as opposed to conventional ellipsoids [Voglis]_.
The intersection of a current trust region and initial bounds is again
rectangular, so on each iteration a quadratic minimization problem subject
to bound constraints is solved approximately by Powell's dogleg method
[NumOpt]_. The required Gauss-Newton step can be computed exactly for
dense Jacobians or approximately by `scipy.sparse.linalg.lsmr` for large
sparse Jacobians. The algorithm is likely to exhibit slow convergence when
the rank of the Jacobian is less than the number of variables. The algorithm
often outperforms 'trf' in bounded problems with a small number of
variables.
Robust loss functions are implemented as described in [BA]_. The idea
is to modify a residual vector and a Jacobian matrix on each iteration
such that computed gradient and Gauss-Newton Hessian approximation match
the true gradient and Hessian approximation of the cost function. Then
the algorithm proceeds in a normal way, i.e., robust loss functions are
implemented as a simple wrapper over standard least-squares algorithms.
.. versionadded:: 0.17.0
References
----------
.. [STIR] M. A. Branch, T. F. Coleman, and Y. Li, "A Subspace, Interior,
and Conjugate Gradient Method for Large-Scale Bound-Constrained
Minimization Problems," SIAM Journal on Scientific Computing,
Vol. 21, Number 1, pp 1-23, 1999.
.. [NR] William H. Press et al., "Numerical Recipes. The Art of Scientific
Computing. 3rd edition", Sec. 5.7.
.. [Byrd] R. H. Byrd, R. B. Schnabel and G. A. Shultz, "Approximate
solution of the trust region problem by minimization over
two-dimensional subspaces", Math. Programming, 40, pp. 247-263,
1988.
.. [Curtis] A. Curtis, M. J. D. Powell, and J. Reid, "On the estimation of
sparse Jacobian matrices", Journal of the Institute of
Mathematics and its Applications, 13, pp. 117-120, 1974.
.. [JJMore] J. J. More, "The Levenberg-Marquardt Algorithm: Implementation
and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
.. [Voglis] C. Voglis and I. E. Lagaris, "A Rectangular Trust Region
Dogleg Approach for Unconstrained and Bound Constrained
Nonlinear Optimization", WSEAS International Conference on
Applied Mathematics, Corfu, Greece, 2004.
.. [NumOpt] J. Nocedal and S. J. Wright, "Numerical optimization,
2nd edition", Chapter 4.
.. [BA] B. Triggs et al., "Bundle Adjustment - A Modern Synthesis",
Proceedings of the International Workshop on Vision Algorithms:
Theory and Practice, pp. 298-372, 1999.
Examples
--------
In this example we find a minimum of the Rosenbrock function without bounds
on independent variables.
>>> import numpy as np
>>> def fun_rosenbrock(x):
... return np.array([10 * (x[1] - x[0]**2), (1 - x[0])])
Notice that we only provide the vector of the residuals. The algorithm
constructs the cost function as a sum of squares of the residuals, which
gives the Rosenbrock function. The exact minimum is at ``x = [1.0, 1.0]``.
>>> from scipy.optimize import least_squares
>>> x0_rosenbrock = np.array([2, 2])
>>> res_1 = least_squares(fun_rosenbrock, x0_rosenbrock)
>>> res_1.x
array([ 1., 1.])
>>> res_1.cost
9.8669242910846867e-30
>>> res_1.optimality
8.8928864934219529e-14
We now constrain the variables, in such a way that the previous solution
becomes infeasible. Specifically, we require that ``x[1] >= 1.5``, and
``x[0]`` left unconstrained. To this end, we specify the `bounds` parameter
to `least_squares` in the form ``bounds=([-np.inf, 1.5], np.inf)``.
We also provide the analytic Jacobian:
>>> def jac_rosenbrock(x):
... return np.array([
... [-20 * x[0], 10],
... [-1, 0]])
Putting this all together, we see that the new solution lies on the bound:
>>> res_2 = least_squares(fun_rosenbrock, x0_rosenbrock, jac_rosenbrock,
... bounds=([-np.inf, 1.5], np.inf))
>>> res_2.x
array([ 1.22437075, 1.5 ])
>>> res_2.cost
0.025213093946805685
>>> res_2.optimality
1.5885401433157753e-07
Now we solve a system of equations (i.e., the cost function should be zero
at a minimum) for a Broyden tridiagonal vector-valued function of 100000
variables:
>>> def fun_broyden(x):
... f = (3 - x) * x + 1
... f[1:] -= x[:-1]
... f[:-1] -= 2 * x[1:]
... return f
The corresponding Jacobian matrix is sparse. We tell the algorithm to
estimate it by finite differences and provide the sparsity structure of the
Jacobian to significantly speed up this process.
>>> from scipy.sparse import lil_matrix
>>> def sparsity_broyden(n):
... sparsity = lil_matrix((n, n), dtype=int)
... i = np.arange(n)
... sparsity[i, i] = 1
... i = np.arange(1, n)
... sparsity[i, i - 1] = 1
... i = np.arange(n - 1)
... sparsity[i, i + 1] = 1
... return sparsity
...
>>> n = 100000
>>> x0_broyden = -np.ones(n)
...
>>> res_3 = least_squares(fun_broyden, x0_broyden,
... jac_sparsity=sparsity_broyden(n))
>>> res_3.cost
4.5687069299604613e-23
>>> res_3.optimality
1.1650454296851518e-11
Let's also solve a curve fitting problem using a robust loss function to
take care of outliers in the data. Define the model function as
``y = a + b * exp(c * t)``, where t is a predictor variable, y is an
observation and a, b, c are parameters to estimate.
First, define the function which generates the data with noise and
outliers, define the model parameters, and generate data:
>>> from numpy.random import default_rng
>>> rng = default_rng()
>>> def gen_data(t, a, b, c, noise=0., n_outliers=0, seed=None):
... rng = default_rng(seed)
...
... y = a + b * np.exp(t * c)
...
... error = noise * rng.standard_normal(t.size)
... outliers = rng.integers(0, t.size, n_outliers)
... error[outliers] *= 10
...
... return y + error
...
>>> a = 0.5
>>> b = 2.0
>>> c = -1
>>> t_min = 0
>>> t_max = 10
>>> n_points = 15
...
>>> t_train = np.linspace(t_min, t_max, n_points)
>>> y_train = gen_data(t_train, a, b, c, noise=0.1, n_outliers=3)
Define the function for computing residuals and the initial estimate of the
parameters.
>>> def fun(x, t, y):
... return x[0] + x[1] * np.exp(x[2] * t) - y
...
>>> x0 = np.array([1.0, 1.0, 0.0])
Compute a standard least-squares solution:
>>> res_lsq = least_squares(fun, x0, args=(t_train, y_train))
Now compute two solutions with two different robust loss functions. The
parameter `f_scale` is set to 0.1, meaning that inlier residuals should
not significantly exceed 0.1 (the noise level used).
>>> res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1,
... args=(t_train, y_train))
>>> res_log = least_squares(fun, x0, loss='cauchy', f_scale=0.1,
... args=(t_train, y_train))
And, finally, plot all the curves. We see that by selecting an appropriate
`loss` we can get estimates close to optimal even in the presence of
strong outliers. But keep in mind that generally it is recommended to try
'soft_l1' or 'huber' losses first (if at all necessary) as the other two
options may cause difficulties in the optimization process.
>>> t_test = np.linspace(t_min, t_max, n_points * 10)
>>> y_true = gen_data(t_test, a, b, c)
>>> y_lsq = gen_data(t_test, *res_lsq.x)
>>> y_soft_l1 = gen_data(t_test, *res_soft_l1.x)
>>> y_log = gen_data(t_test, *res_log.x)
...
>>> import matplotlib.pyplot as plt
>>> plt.plot(t_train, y_train, 'o')
>>> plt.plot(t_test, y_true, 'k', linewidth=2, label='true')
>>> plt.plot(t_test, y_lsq, label='linear loss')
>>> plt.plot(t_test, y_soft_l1, label='soft_l1 loss')
>>> plt.plot(t_test, y_log, label='cauchy loss')
>>> plt.xlabel("t")
>>> plt.ylabel("y")
>>> plt.legend()
>>> plt.show()
In the next example, we show how complex-valued residual functions of
complex variables can be optimized with ``least_squares()``. Consider the
following function:
>>> def f(z):
... return z - (0.5 + 0.5j)
We wrap it into a function of real variables that returns real residuals
by simply handling the real and imaginary parts as independent variables:
>>> def f_wrap(x):
... fx = f(x[0] + 1j*x[1])
... return np.array([fx.real, fx.imag])
Thus, instead of the original m-D complex function of n complex
variables we optimize a 2m-D real function of 2n real variables:
>>> from scipy.optimize import least_squares
>>> res_wrapped = least_squares(f_wrap, (0.1, 0.1), bounds=([0, 0], [1, 1]))
>>> z = res_wrapped.x[0] + res_wrapped.x[1]*1j
>>> z
(0.49999999999925893+0.49999999999925893j)
"""
if method not in ['trf', 'dogbox', 'lm']:
raise ValueError("`method` must be 'trf', 'dogbox' or 'lm'.")
if jac not in ['2-point', '3-point', 'cs'] and not callable(jac):
raise ValueError("`jac` must be '2-point', '3-point', 'cs' or "
"callable.")
if tr_solver not in [None, 'exact', 'lsmr']:
raise ValueError("`tr_solver` must be None, 'exact' or 'lsmr'.")
if loss not in IMPLEMENTED_LOSSES and not callable(loss):
raise ValueError("`loss` must be one of {} or a callable."
.format(IMPLEMENTED_LOSSES.keys()))
if method == 'lm' and loss != 'linear':
raise ValueError("method='lm' supports only 'linear' loss function.")
if verbose not in [0, 1, 2]:
raise ValueError("`verbose` must be in [0, 1, 2].")
if max_nfev is not None and max_nfev <= 0:
raise ValueError("`max_nfev` must be None or positive integer.")
if np.iscomplexobj(x0):
raise ValueError("`x0` must be real.")
x0 = np.atleast_1d(x0).astype(float)
if x0.ndim > 1:
raise ValueError("`x0` must have at most 1 dimension.")
if isinstance(bounds, Bounds):
lb, ub = bounds.lb, bounds.ub
bounds = (lb, ub)
else:
if len(bounds) == 2:
lb, ub = prepare_bounds(bounds, x0.shape[0])
else:
raise ValueError("`bounds` must contain 2 elements.")
if method == 'lm' and not np.all((lb == -np.inf) & (ub == np.inf)):
raise ValueError("Method 'lm' doesn't support bounds.")
if lb.shape != x0.shape or ub.shape != x0.shape:
raise ValueError("Inconsistent shapes between bounds and `x0`.")
if np.any(lb >= ub):
raise ValueError("Each lower bound must be strictly less than each "
"upper bound.")
if not in_bounds(x0, lb, ub):
raise ValueError("`x0` is infeasible.")
x_scale = check_x_scale(x_scale, x0)
ftol, xtol, gtol = check_tolerance(ftol, xtol, gtol, method)
if method == 'trf':
x0 = make_strictly_feasible(x0, lb, ub)
def fun_wrapped(x):
return np.atleast_1d(fun(x, *args, **kwargs))
f0 = fun_wrapped(x0)
if f0.ndim != 1:
raise ValueError("`fun` must return at most 1-d array_like. "
f"f0.shape: {f0.shape}")
if not np.all(np.isfinite(f0)):
raise ValueError("Residuals are not finite in the initial point.")
n = x0.size
m = f0.size
if method == 'lm' and m < n:
raise ValueError("Method 'lm' doesn't work when the number of "
"residuals is less than the number of variables.")
loss_function = construct_loss_function(m, loss, f_scale)
if callable(loss):
rho = loss_function(f0)
if rho.shape != (3, m):
raise ValueError("The return value of `loss` callable has wrong "
"shape.")
initial_cost = 0.5 * np.sum(rho[0])
elif loss_function is not None:
initial_cost = loss_function(f0, cost_only=True)
else:
initial_cost = 0.5 * np.dot(f0, f0)
if callable(jac):
J0 = jac(x0, *args, **kwargs)
if issparse(J0):
J0 = J0.tocsr()
def jac_wrapped(x, _=None):
return jac(x, *args, **kwargs).tocsr()
elif isinstance(J0, LinearOperator):
def jac_wrapped(x, _=None):
return jac(x, *args, **kwargs)
else:
J0 = np.atleast_2d(J0)
def jac_wrapped(x, _=None):
return np.atleast_2d(jac(x, *args, **kwargs))
else: # Estimate Jacobian by finite differences.
if method == 'lm':
if jac_sparsity is not None:
raise ValueError("method='lm' does not support "
"`jac_sparsity`.")
if jac != '2-point':
warn(f"jac='{jac}' works equivalently to '2-point' for method='lm'.",
stacklevel=2)
J0 = jac_wrapped = None
else:
if jac_sparsity is not None and tr_solver == 'exact':
raise ValueError("tr_solver='exact' is incompatible "
"with `jac_sparsity`.")
jac_sparsity = check_jac_sparsity(jac_sparsity, m, n)
def jac_wrapped(x, f):
J = approx_derivative(fun, x, rel_step=diff_step, method=jac,
f0=f, bounds=bounds, args=args,
kwargs=kwargs, sparsity=jac_sparsity)
if J.ndim != 2: # J is guaranteed not sparse.
J = np.atleast_2d(J)
return J
J0 = jac_wrapped(x0, f0)
if J0 is not None:
if J0.shape != (m, n):
raise ValueError(
f"The return value of `jac` has wrong shape: expected {(m, n)}, "
f"actual {J0.shape}."
)
if not isinstance(J0, np.ndarray):
if method == 'lm':
raise ValueError("method='lm' works only with dense "
"Jacobian matrices.")
if tr_solver == 'exact':
raise ValueError(
"tr_solver='exact' works only with dense "
"Jacobian matrices.")
jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
if isinstance(J0, LinearOperator) and jac_scale:
raise ValueError("x_scale='jac' can't be used when `jac` "
"returns LinearOperator.")
if tr_solver is None:
if isinstance(J0, np.ndarray):
tr_solver = 'exact'
else:
tr_solver = 'lsmr'
if method == 'lm':
result = call_minpack(fun_wrapped, x0, jac_wrapped, ftol, xtol, gtol,
max_nfev, x_scale, diff_step)
elif method == 'trf':
result = trf(fun_wrapped, jac_wrapped, x0, f0, J0, lb, ub, ftol, xtol,
gtol, max_nfev, x_scale, loss_function, tr_solver,
tr_options.copy(), verbose)
elif method == 'dogbox':
if tr_solver == 'lsmr' and 'regularize' in tr_options:
warn("The keyword 'regularize' in `tr_options` is not relevant "
"for 'dogbox' method.",
stacklevel=2)
tr_options = tr_options.copy()
del tr_options['regularize']
result = dogbox(fun_wrapped, jac_wrapped, x0, f0, J0, lb, ub, ftol,
xtol, gtol, max_nfev, x_scale, loss_function,
tr_solver, tr_options, verbose)
result.message = TERMINATION_MESSAGES[result.status]
result.success = result.status > 0
if verbose >= 1:
print(result.message)
print("Function evaluations {}, initial cost {:.4e}, final cost "
"{:.4e}, first-order optimality {:.2e}."
.format(result.nfev, initial_cost, result.cost,
result.optimality))
return result
@@ -0,0 +1,362 @@
"""Linear least squares with bound constraints on independent variables."""
import numpy as np
from numpy.linalg import norm
from scipy.sparse import issparse, csr_matrix
from scipy.sparse.linalg import LinearOperator, lsmr
from scipy.optimize import OptimizeResult
from scipy.optimize._minimize import Bounds
from .common import in_bounds, compute_grad
from .trf_linear import trf_linear
from .bvls import bvls
def prepare_bounds(bounds, n):
if len(bounds) != 2:
raise ValueError("`bounds` must contain 2 elements.")
lb, ub = (np.asarray(b, dtype=float) for b in bounds)
if lb.ndim == 0:
lb = np.resize(lb, n)
if ub.ndim == 0:
ub = np.resize(ub, n)
return lb, ub
TERMINATION_MESSAGES = {
-1: "The algorithm was not able to make progress on the last iteration.",
0: "The maximum number of iterations is exceeded.",
1: "The first-order optimality measure is less than `tol`.",
2: "The relative change of the cost function is less than `tol`.",
3: "The unconstrained solution is optimal."
}
def lsq_linear(A, b, bounds=(-np.inf, np.inf), method='trf', tol=1e-10,
lsq_solver=None, lsmr_tol=None, max_iter=None,
verbose=0, *, lsmr_maxiter=None,):
r"""Solve a linear least-squares problem with bounds on the variables.
Given a m-by-n design matrix A and a target vector b with m elements,
`lsq_linear` solves the following optimization problem::
minimize 0.5 * ||A x - b||**2
subject to lb <= x <= ub
This optimization problem is convex, hence a found minimum (if iterations
have converged) is guaranteed to be global.
Parameters
----------
A : array_like, sparse matrix or LinearOperator, shape (m, n)
Design matrix. Can be `scipy.sparse.linalg.LinearOperator`.
b : array_like, shape (m,)
Target vector.
bounds : 2-tuple of array_like or `Bounds`, optional
Lower and upper bounds on parameters. Defaults to no bounds.
There are two ways to specify the bounds:
- Instance of `Bounds` class.
- 2-tuple of array_like: Each element of the tuple must be either
an array with the length equal to the number of parameters, or a
scalar (in which case the bound is taken to be the same for all
parameters). Use ``np.inf`` with an appropriate sign to disable
bounds on all or some parameters.
method : 'trf' or 'bvls', optional
Method to perform minimization.
* 'trf' : Trust Region Reflective algorithm adapted for a linear
least-squares problem. This is an interior-point-like method
and the required number of iterations is weakly correlated with
the number of variables.
* 'bvls' : Bounded-variable least-squares algorithm. This is
an active set method, which requires the number of iterations
comparable to the number of variables. Can't be used when `A` is
sparse or LinearOperator.
Default is 'trf'.
tol : float, optional
Tolerance parameter. The algorithm terminates if a relative change
of the cost function is less than `tol` on the last iteration.
Additionally, the first-order optimality measure is considered:
* ``method='trf'`` terminates if the uniform norm of the gradient,
scaled to account for the presence of the bounds, is less than
`tol`.
* ``method='bvls'`` terminates if Karush-Kuhn-Tucker conditions
are satisfied within `tol` tolerance.
lsq_solver : {None, 'exact', 'lsmr'}, optional
Method of solving unbounded least-squares problems throughout
iterations:
* 'exact' : Use dense QR or SVD decomposition approach. Can't be
used when `A` is sparse or LinearOperator.
* 'lsmr' : Use `scipy.sparse.linalg.lsmr` iterative procedure
which requires only matrix-vector product evaluations. Can't
be used with ``method='bvls'``.
If None (default), the solver is chosen based on type of `A`.
lsmr_tol : None, float or 'auto', optional
Tolerance parameters 'atol' and 'btol' for `scipy.sparse.linalg.lsmr`.
If None (default), it is set to ``1e-2 * tol``. If 'auto', the
tolerance will be adjusted based on the optimality of the current
iterate, which can speed up the optimization process, but is not always
reliable.
max_iter : None or int, optional
Maximum number of iterations before termination. If None (default), it
is set to 100 for ``method='trf'`` or to the number of variables for
``method='bvls'`` (not counting iterations for 'bvls' initialization).
verbose : {0, 1, 2}, optional
Level of algorithm's verbosity:
* 0 : work silently (default).
* 1 : display a termination report.
* 2 : display progress during iterations.
lsmr_maxiter : None or int, optional
Maximum number of iterations for the lsmr least squares solver,
if it is used (by setting ``lsq_solver='lsmr'``). If None (default), it
uses lsmr's default of ``min(m, n)`` where ``m`` and ``n`` are the
number of rows and columns of `A`, respectively. Has no effect if
``lsq_solver='exact'``.
Returns
-------
OptimizeResult with the following fields defined:
x : ndarray, shape (n,)
Solution found.
cost : float
Value of the cost function at the solution.
fun : ndarray, shape (m,)
Vector of residuals at the solution.
optimality : float
First-order optimality measure. The exact meaning depends on `method`,
refer to the description of `tol` parameter.
active_mask : ndarray of int, shape (n,)
Each component shows whether a corresponding constraint is active
(that is, whether a variable is at the bound):
* 0 : a constraint is not active.
* -1 : a lower bound is active.
* 1 : an upper bound is active.
Might be somewhat arbitrary for the `trf` method as it generates a
sequence of strictly feasible iterates and active_mask is determined
within a tolerance threshold.
unbounded_sol : tuple
Unbounded least squares solution tuple returned by the least squares
solver (set with `lsq_solver` option). If `lsq_solver` is not set or is
set to ``'exact'``, the tuple contains an ndarray of shape (n,) with
the unbounded solution, an ndarray with the sum of squared residuals,
an int with the rank of `A`, and an ndarray with the singular values
of `A` (see NumPy's ``linalg.lstsq`` for more information). If
`lsq_solver` is set to ``'lsmr'``, the tuple contains an ndarray of
shape (n,) with the unbounded solution, an int with the exit code,
an int with the number of iterations, and five floats with
various norms and the condition number of `A` (see SciPy's
``sparse.linalg.lsmr`` for more information). This output can be
useful for determining the convergence of the least squares solver,
particularly the iterative ``'lsmr'`` solver. The unbounded least
squares problem is to minimize ``0.5 * ||A x - b||**2``.
nit : int
Number of iterations. Zero if the unconstrained solution is optimal.
status : int
Reason for algorithm termination:
* -1 : the algorithm was not able to make progress on the last
iteration.
* 0 : the maximum number of iterations is exceeded.
* 1 : the first-order optimality measure is less than `tol`.
* 2 : the relative change of the cost function is less than `tol`.
* 3 : the unconstrained solution is optimal.
message : str
Verbal description of the termination reason.
success : bool
True if one of the convergence criteria is satisfied (`status` > 0).
See Also
--------
nnls : Linear least squares with non-negativity constraint.
least_squares : Nonlinear least squares with bounds on the variables.
Notes
-----
The algorithm first computes the unconstrained least-squares solution by
`numpy.linalg.lstsq` or `scipy.sparse.linalg.lsmr` depending on
`lsq_solver`. This solution is returned as optimal if it lies within the
bounds.
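As a small illustration of this early exit, the sketch below uses an
identity design matrix so that the unconstrained solution equals ``b`` and
already lies inside the box ``[0, 1]``. The data are made up for the example.

```python
import numpy as np
from scipy.optimize import lsq_linear

A = np.eye(2)
b = np.array([0.5, 0.5])

# The unconstrained solution x = b is feasible, so no iterations of the
# bounded solver are needed.
res = lsq_linear(A, b, bounds=(0, 1))
print(res.status, res.nit, res.x)
```

In this case ``res.status`` is 3 ("the unconstrained solution is optimal")
and ``res.nit`` is 0.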
Method 'trf' runs the adaptation of the algorithm described in [STIR]_ for
a linear least-squares problem. The iterations are essentially the same as
in the nonlinear least-squares algorithm, but as the quadratic function
model is always accurate, we don't need to track or modify the radius of
a trust region. The line search (backtracking) is used as a safety net
when a selected step does not decrease the cost function. For a more
detailed description of the algorithm, see `scipy.optimize.least_squares`.
Method 'bvls' runs a Python implementation of the algorithm described in
[BVLS]_. The algorithm maintains active and free sets of variables, on
each iteration chooses a new variable to move from the active set to the
free set and then solves the unconstrained least-squares problem on free
variables. This algorithm is guaranteed to give an accurate solution
eventually, but may require up to n iterations for a problem with n
variables. Additionally, an ad-hoc initialization procedure is
implemented, that determines which variables to set free or active
initially. It takes some number of iterations before actual BVLS starts,
but can significantly reduce the number of further iterations.
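A tiny sketch of the active-set outcome, assuming an identity design matrix
so that the bounded solution is simply ``b`` clipped to the box. The data are
illustrative only.

```python
import numpy as np
from scipy.optimize import lsq_linear

A = np.eye(3)
b = np.array([2.0, -1.0, 0.5])

# The unconstrained solution [2, -1, 0.5] violates the box [0, 1] in the
# first two components, so those variables end up in the active set.
res = lsq_linear(A, b, bounds=(0, 1), method='bvls')
print(res.x)            # bounded solution
print(res.active_mask)  # 1: upper bound active, -1: lower bound active, 0: free
```

The third variable stays in the free set because its unconstrained value is
strictly inside the bounds.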
References
----------
.. [STIR] M. A. Branch, T. F. Coleman, and Y. Li, "A Subspace, Interior,
and Conjugate Gradient Method for Large-Scale Bound-Constrained
Minimization Problems," SIAM Journal on Scientific Computing,
Vol. 21, Number 1, pp 1-23, 1999.
.. [BVLS] P. B. Stark and R. L. Parker, "Bounded-Variable Least-Squares:
an Algorithm and Applications", Computational Statistics, 10,
129-141, 1995.
Examples
--------
In this example, a problem with a large sparse matrix and bounds on the
variables is solved.
>>> import numpy as np
>>> from scipy.sparse import rand
>>> from scipy.optimize import lsq_linear
>>> rng = np.random.default_rng()
...
>>> m = 20000
>>> n = 10000
...
>>> A = rand(m, n, density=1e-4, random_state=rng)
>>> b = rng.standard_normal(m)
...
>>> lb = rng.standard_normal(n)
>>> ub = lb + 1
...
>>> res = lsq_linear(A, b, bounds=(lb, ub), lsmr_tol='auto', verbose=1)
# may vary
The relative change of the cost function is less than `tol`.
Number of iterations 16, initial cost 1.5039e+04, final cost 1.1112e+04,
first-order optimality 4.66e-08.
"""
if method not in ['trf', 'bvls']:
raise ValueError("`method` must be 'trf' or 'bvls'")
if lsq_solver not in [None, 'exact', 'lsmr']:
raise ValueError("`lsq_solver` must be None, 'exact' or 'lsmr'.")
if verbose not in [0, 1, 2]:
raise ValueError("`verbose` must be in [0, 1, 2].")
if issparse(A):
A = csr_matrix(A)
elif not isinstance(A, LinearOperator):
A = np.atleast_2d(np.asarray(A))
if method == 'bvls':
if lsq_solver == 'lsmr':
raise ValueError("method='bvls' can't be used with "
"lsq_solver='lsmr'")
if not isinstance(A, np.ndarray):
raise ValueError("method='bvls' can't be used with `A` being "
"sparse or LinearOperator.")
if lsq_solver is None:
if isinstance(A, np.ndarray):
lsq_solver = 'exact'
else:
lsq_solver = 'lsmr'
elif lsq_solver == 'exact' and not isinstance(A, np.ndarray):
raise ValueError("`exact` solver can't be used when `A` is "
"sparse or LinearOperator.")
if len(A.shape) != 2: # No ndim for LinearOperator.
raise ValueError("`A` must have at most 2 dimensions.")
if max_iter is not None and max_iter <= 0:
raise ValueError("`max_iter` must be None or positive integer.")
m, n = A.shape
b = np.atleast_1d(b)
if b.ndim != 1:
raise ValueError("`b` must have at most 1 dimension.")
if b.size != m:
raise ValueError("Inconsistent shapes between `A` and `b`.")
if isinstance(bounds, Bounds):
lb = bounds.lb
ub = bounds.ub
else:
lb, ub = prepare_bounds(bounds, n)
if lb.shape != (n,) or ub.shape != (n,):
raise ValueError("Bounds have wrong shape.")
if np.any(lb >= ub):
raise ValueError("Each lower bound must be strictly less than each "
"upper bound.")
if lsmr_maxiter is not None and lsmr_maxiter < 1:
raise ValueError("`lsmr_maxiter` must be None or positive integer.")
if not ((isinstance(lsmr_tol, float) and lsmr_tol > 0) or
lsmr_tol in ('auto', None)):
raise ValueError("`lsmr_tol` must be None, 'auto', or positive float.")
if lsq_solver == 'exact':
unbd_lsq = np.linalg.lstsq(A, b, rcond=-1)
elif lsq_solver == 'lsmr':
first_lsmr_tol = lsmr_tol # tol of first call to lsmr
if lsmr_tol is None or lsmr_tol == 'auto':
first_lsmr_tol = 1e-2 * tol # default if lsmr_tol not defined
unbd_lsq = lsmr(A, b, maxiter=lsmr_maxiter,
atol=first_lsmr_tol, btol=first_lsmr_tol)
x_lsq = unbd_lsq[0] # extract the solution from the least squares solver
if in_bounds(x_lsq, lb, ub):
r = A @ x_lsq - b
cost = 0.5 * np.dot(r, r)
termination_status = 3
termination_message = TERMINATION_MESSAGES[termination_status]
g = compute_grad(A, r)
g_norm = norm(g, ord=np.inf)
if verbose > 0:
print(termination_message)
print(f"Final cost {cost:.4e}, first-order optimality {g_norm:.2e}")
return OptimizeResult(
x=x_lsq, fun=r, cost=cost, optimality=g_norm,
active_mask=np.zeros(n), unbounded_sol=unbd_lsq,
nit=0, status=termination_status,
message=termination_message, success=True)
if method == 'trf':
res = trf_linear(A, b, x_lsq, lb, ub, tol, lsq_solver, lsmr_tol,
max_iter, verbose, lsmr_maxiter=lsmr_maxiter)
elif method == 'bvls':
res = bvls(A, b, x_lsq, lb, ub, tol, max_iter, verbose)
res.unbounded_sol = unbd_lsq
res.message = TERMINATION_MESSAGES[res.status]
res.success = res.status > 0
if verbose > 0:
print(res.message)
print(
f"Number of iterations {res.nit}, initial cost {res.initial_cost:.4e}, "
f"final cost {res.cost:.4e}, first-order optimality {res.optimality:.2e}."
)
del res.initial_cost
return res
@@ -0,0 +1,560 @@
"""Trust Region Reflective algorithm for least-squares optimization.
The algorithm is based on ideas from paper [STIR]_. The main idea is to
account for the presence of the bounds by appropriate scaling of the variables (or,
equivalently, changing a trust-region shape). Let's introduce a vector v:
| ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf
v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf
| 1, otherwise
where g is the gradient of a cost function and lb, ub are the bounds. Its
components are distances to the bounds at which the anti-gradient points (if
this distance is finite). Define a scaling matrix D = diag(v**0.5).
First-order optimality conditions can be stated as
D^2 g(x) = 0.
This means that components of the gradient should be zero for strictly interior
variables, and components must point inside the feasible region for variables
on the bound.
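The definition of v and the optimality measure can be sketched as follows.
This is an illustrative re-implementation under the formulas stated in this
docstring; the function name ``cl_scaling_vector`` and the sample data are
hypothetical. The second return value ``dv`` holds the derivative of each
``v[i]`` with respect to ``x[i]`` (-1, 1 or 0).

```python
import numpy as np


def cl_scaling_vector(x, g, lb, ub):
    """Hypothetical sketch: scaling vector v and the diagonal of its Jacobian."""
    v = np.ones_like(x)
    dv = np.zeros_like(x)

    # The anti-gradient points towards a finite upper bound.
    mask = (g < 0) & np.isfinite(ub)
    v[mask] = ub[mask] - x[mask]
    dv[mask] = -1

    # The anti-gradient points towards a finite lower bound.
    mask = (g > 0) & np.isfinite(lb)
    v[mask] = x[mask] - lb[mask]
    dv[mask] = 1
    return v, dv


x = np.array([0.2, 0.7, 0.5])
g = np.array([-2.0, 3.0, 0.0])
lb = np.array([0.0, 0.0, -np.inf])
ub = np.array([1.0, np.inf, np.inf])

v, dv = cl_scaling_vector(x, g, lb, ub)
# Uniform norm of D^2 g = v * g, which vanishes at a first-order optimal point.
optimality = np.linalg.norm(v * g, ord=np.inf)
```

Note that ``g * dv`` is non-negative by construction, since ``dv`` is -1 only
where ``g < 0`` and 1 only where ``g > 0``.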
Now consider this system of equations as a new optimization problem. If the
point x is strictly interior (not on the bound), then the left-hand side is
differentiable and the Newton step for it satisfies
(D^2 H + diag(g) Jv) p = -D^2 g
where H is the Hessian matrix (or its J^T J approximation in least squares),
Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all
elements of matrix C = diag(g) Jv are non-negative. Introduce the change
of the variables x = D x_h (_h would be "hat" in LaTeX). In the new variables,
we have a Newton step satisfying
B_h p_h = -g_h,
where B_h = D H D + C, g_h = D g. In least squares B_h = J_h^T J_h, where
J_h = J D. Note that J_h and g_h are proper Jacobian and gradient with respect
to "hat" variables. To guarantee global convergence we formulate a
trust-region problem based on the Newton step in the new variables:
0.5 * p_h^T B_h p_h + g_h^T p_h -> min, ||p_h|| <= Delta
In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region
problem is
0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta
Here, the meaning of the matrix D becomes more clear: it alters the shape
of a trust-region, such that large steps towards the bounds are not allowed.
In the implementation, the trust-region problem is solved in "hat" space,
but handling of the bounds is done in the original space (see below and read
the code).
The introduction of the matrix D does not allow the bounds to be ignored: the
algorithm must keep iterates strictly feasible (to satisfy the aforementioned
differentiability). The parameter theta controls the step back from the
boundary (see the code for details).
The algorithm uses another important trick. If the trust-region solution
doesn't fit into the bounds, then a search direction reflected from the first
encountered bound is considered. For motivation and analysis refer to the
[STIR]_ paper (and other papers by the authors). In practice, it doesn't need
much justification: the algorithm simply chooses the best step among
three: a constrained trust-region step, a reflected step and a constrained
Cauchy step (a minimizer along -g_h in "hat" space, or -D^2 g in the original
space).
Another feature is that a trust-region radius control strategy is modified to
account for appearance of the diagonal C matrix (called diag_h in the code).
Note that all described peculiarities are completely gone as we consider
problems without bounds (the algorithm becomes a standard trust-region type
algorithm very similar to ones implemented in MINPACK).
The implementation supports two methods of solving the trust-region problem.
The first, called 'exact', applies SVD on Jacobian and then solves the problem
very accurately using the algorithm described in [JJMore]_. It is not
applicable to large problems. The second, called 'lsmr', uses the 2-D subspace
approach (sometimes called "indefinite dogleg"), where the problem is solved
in a subspace spanned by the gradient and the approximate Gauss-Newton step
found by ``scipy.sparse.linalg.lsmr``. A 2-D trust-region problem is
reformulated as a 4th order algebraic equation and solved very accurately by
``numpy.roots``. The subspace approach makes it possible to solve very large
problems (up to a couple of million residuals on a regular PC), provided the
Jacobian matrix is sufficiently sparse.
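A rough sketch of how such a 2-D subspace could be built, assuming no bounds
and no variable scaling (so D is the identity and C is zero). The random
Jacobian and residuals are illustrative, and the variable names are not taken
from the implementation.

```python
import numpy as np
from scipy.linalg import qr
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(0)
J = rng.standard_normal((6, 3))   # illustrative Jacobian
f = rng.standard_normal(6)        # illustrative residuals

g = J.T @ f                                     # gradient of 0.5 * ||f||**2
p_gn = lsmr(J, -f, atol=1e-12, btol=1e-12)[0]   # approximate Gauss-Newton step

# Orthonormal basis of span{g, p_gn}; any step in the subspace is p = S @ q.
S, _ = qr(np.column_stack([g, p_gn]), mode='economic')

# Reduced 2-D quadratic model: 0.5 * q.T @ B_S @ q + g_S @ q.
JS = J @ S
B_S = JS.T @ JS
g_S = S.T @ g
```

Because the reduced model has only two unknowns, the trust-region constraint
``||q|| <= Delta`` can then be handled by a low-dimensional solver regardless
of the size of ``J``.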
References
----------
.. [STIR] Branch, M.A., T.F. Coleman, and Y. Li, "A Subspace, Interior,
and Conjugate Gradient Method for Large-Scale Bound-Constrained
Minimization Problems," SIAM Journal on Scientific Computing,
Vol. 21, Number 1, pp 1-23, 1999.
.. [JJMore] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
"""
import numpy as np
from numpy.linalg import norm
from scipy.linalg import svd, qr
from scipy.sparse.linalg import lsmr
from scipy.optimize import OptimizeResult
from .common import (
step_size_to_bound, find_active_constraints, in_bounds,
make_strictly_feasible, intersect_trust_region, solve_lsq_trust_region,
solve_trust_region_2d, minimize_quadratic_1d, build_quadratic_1d,
evaluate_quadratic, right_multiplied_operator, regularized_lsq_operator,
CL_scaling_vector, compute_grad, compute_jac_scale, check_termination,
update_tr_radius, scale_for_robust_loss_function, print_header_nonlinear,
print_iteration_nonlinear)
def trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
loss_function, tr_solver, tr_options, verbose):
# For efficiency, it makes sense to run a simplified version of the
# algorithm when no bounds are imposed. We decided to write two
# separate functions. This violates the DRY principle, but keeps each
# individual function as readable as possible.
if np.all(lb == -np.inf) and np.all(ub == np.inf):
return trf_no_bounds(
fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, x_scale,
loss_function, tr_solver, tr_options, verbose)
else:
return trf_bounds(
fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
loss_function, tr_solver, tr_options, verbose)
def select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta):
"""Select the best step according to Trust Region Reflective algorithm."""
if in_bounds(x + p, lb, ub):
p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)
return p, p_h, -p_value
p_stride, hits = step_size_to_bound(x, p, lb, ub)
# Compute the reflected direction.
r_h = np.copy(p_h)
r_h[hits.astype(bool)] *= -1
r = d * r_h
# Restrict trust-region step, such that it hits the bound.
p *= p_stride
p_h *= p_stride
x_on_bound = x + p
# Reflected direction will cross first either feasible region or trust
# region boundary.
_, to_tr = intersect_trust_region(p_h, r_h, Delta)
to_bound, _ = step_size_to_bound(x_on_bound, r, lb, ub)
# Find lower and upper bounds on a step size along the reflected
# direction, considering the strict feasibility requirement. There is no
# single correct way to do that, the chosen approach seems to work best
# on test problems.
r_stride = min(to_bound, to_tr)
if r_stride > 0:
r_stride_l = (1 - theta) * p_stride / r_stride
if r_stride == to_bound:
r_stride_u = theta * to_bound
else:
r_stride_u = to_tr
else:
r_stride_l = 0
r_stride_u = -1
# Check if reflection step is available.
if r_stride_l <= r_stride_u:
a, b, c = build_quadratic_1d(J_h, g_h, r_h, s0=p_h, diag=diag_h)
r_stride, r_value = minimize_quadratic_1d(
a, b, r_stride_l, r_stride_u, c=c)
r_h *= r_stride
r_h += p_h
r = r_h * d
else:
r_value = np.inf
# Now correct p_h to make it strictly interior.
p *= theta
p_h *= theta
p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)
ag_h = -g_h
ag = d * ag_h
to_tr = Delta / norm(ag_h)
to_bound, _ = step_size_to_bound(x, ag, lb, ub)
if to_bound < to_tr:
ag_stride = theta * to_bound
else:
ag_stride = to_tr
a, b = build_quadratic_1d(J_h, g_h, ag_h, diag=diag_h)
ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride)
ag_h *= ag_stride
ag *= ag_stride
if p_value < r_value and p_value < ag_value:
return p, p_h, -p_value
elif r_value < p_value and r_value < ag_value:
return r, r_h, -r_value
else:
return ag, ag_h, -ag_value
def trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev,
x_scale, loss_function, tr_solver, tr_options, verbose):
x = x0.copy()
f = f0
f_true = f.copy()
nfev = 1
J = J0
njev = 1
m, n = J.shape
if loss_function is not None:
rho = loss_function(f)
cost = 0.5 * np.sum(rho[0])
J, f = scale_for_robust_loss_function(J, f, rho)
else:
cost = 0.5 * np.dot(f, f)
g = compute_grad(J, f)
jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
if jac_scale:
scale, scale_inv = compute_jac_scale(J)
else:
scale, scale_inv = x_scale, 1 / x_scale
v, dv = CL_scaling_vector(x, g, lb, ub)
v[dv != 0] *= scale_inv[dv != 0]
Delta = norm(x0 * scale_inv / v**0.5)
if Delta == 0:
Delta = 1.0
g_norm = norm(g * v, ord=np.inf)
f_augmented = np.zeros(m + n)
if tr_solver == 'exact':
J_augmented = np.empty((m + n, n))
elif tr_solver == 'lsmr':
reg_term = 0.0
regularize = tr_options.pop('regularize', True)
if max_nfev is None:
max_nfev = x0.size * 100
alpha = 0.0 # "Levenberg-Marquardt" parameter
termination_status = None
iteration = 0
step_norm = None
actual_reduction = None
if verbose == 2:
print_header_nonlinear()
while True:
v, dv = CL_scaling_vector(x, g, lb, ub)
g_norm = norm(g * v, ord=np.inf)
if g_norm < gtol:
termination_status = 1
if verbose == 2:
print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
step_norm, g_norm)
if termination_status is not None or nfev == max_nfev:
break
# Now compute variables in "hat" space. Here, we also account for
# scaling introduced by the `x_scale` parameter. This part is a bit
# tricky: you have to write down the formulas and see how the
# trust-region problem is formulated when the two types of scaling are
# applied. The idea is that we first apply `x_scale` and then apply the
# Coleman-Li approach in the new variables.
# v is recomputed in the variables after applying `x_scale`; note that
# components which were identically 1 are not affected.
v[dv != 0] *= scale_inv[dv != 0]
# Here, we apply two types of scaling.
d = v**0.5 * scale
# C = diag(g * scale) Jv
diag_h = g * dv * scale
# After all this has been done, we continue normally.
# "hat" gradient.
g_h = d * g
f_augmented[:m] = f
if tr_solver == 'exact':
J_augmented[:m] = J * d
J_h = J_augmented[:m] # Memory view.
J_augmented[m:] = np.diag(diag_h**0.5)
U, s, V = svd(J_augmented, full_matrices=False)
V = V.T
uf = U.T.dot(f_augmented)
elif tr_solver == 'lsmr':
J_h = right_multiplied_operator(J, d)
if regularize:
a, b = build_quadratic_1d(J_h, g_h, -g_h, diag=diag_h)
to_tr = Delta / norm(g_h)
ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
reg_term = -ag_value / Delta**2
lsmr_op = regularized_lsq_operator(J_h, (diag_h + reg_term)**0.5)
gn_h = lsmr(lsmr_op, f_augmented, **tr_options)[0]
S = np.vstack((g_h, gn_h)).T
S, _ = qr(S, mode='economic')
JS = J_h.dot(S) # LinearOperator does dot too.
B_S = np.dot(JS.T, JS) + np.dot(S.T * diag_h, S)
g_S = S.T.dot(g_h)
# theta controls the step-back ratio from the bounds.
theta = max(0.995, 1 - g_norm)
actual_reduction = -1
while actual_reduction <= 0 and nfev < max_nfev:
if tr_solver == 'exact':
p_h, alpha, n_iter = solve_lsq_trust_region(
n, m, uf, s, V, Delta, initial_alpha=alpha)
elif tr_solver == 'lsmr':
p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
p_h = S.dot(p_S)
p = d * p_h # Trust-region solution in the original space.
step, step_h, predicted_reduction = select_step(
x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta)
x_new = make_strictly_feasible(x + step, lb, ub, rstep=0)
f_new = fun(x_new)
nfev += 1
step_h_norm = norm(step_h)
if not np.all(np.isfinite(f_new)):
Delta = 0.25 * step_h_norm
continue
# Usual trust-region step quality estimation.
if loss_function is not None:
cost_new = loss_function(f_new, cost_only=True)
else:
cost_new = 0.5 * np.dot(f_new, f_new)
actual_reduction = cost - cost_new
Delta_new, ratio = update_tr_radius(
Delta, actual_reduction, predicted_reduction,
step_h_norm, step_h_norm > 0.95 * Delta)
step_norm = norm(step)
termination_status = check_termination(
actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
if termination_status is not None:
break
alpha *= Delta / Delta_new
Delta = Delta_new
if actual_reduction > 0:
x = x_new
f = f_new
f_true = f.copy()
cost = cost_new
J = jac(x, f)
njev += 1
if loss_function is not None:
rho = loss_function(f)
J, f = scale_for_robust_loss_function(J, f, rho)
g = compute_grad(J, f)
if jac_scale:
scale, scale_inv = compute_jac_scale(J, scale_inv)
else:
step_norm = 0
actual_reduction = 0
iteration += 1
if termination_status is None:
termination_status = 0
active_mask = find_active_constraints(x, lb, ub, rtol=xtol)
return OptimizeResult(
x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
active_mask=active_mask, nfev=nfev, njev=njev,
status=termination_status)
def trf_no_bounds(fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev,
x_scale, loss_function, tr_solver, tr_options, verbose):
x = x0.copy()
f = f0
f_true = f.copy()
nfev = 1
J = J0
njev = 1
m, n = J.shape
if loss_function is not None:
rho = loss_function(f)
cost = 0.5 * np.sum(rho[0])
J, f = scale_for_robust_loss_function(J, f, rho)
else:
cost = 0.5 * np.dot(f, f)
g = compute_grad(J, f)
jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
if jac_scale:
scale, scale_inv = compute_jac_scale(J)
else:
scale, scale_inv = x_scale, 1 / x_scale
Delta = norm(x0 * scale_inv)
if Delta == 0:
Delta = 1.0
if tr_solver == 'lsmr':
reg_term = 0
damp = tr_options.pop('damp', 0.0)
regularize = tr_options.pop('regularize', True)
if max_nfev is None:
max_nfev = x0.size * 100
alpha = 0.0 # "Levenberg-Marquardt" parameter
termination_status = None
iteration = 0
step_norm = None
actual_reduction = None
if verbose == 2:
print_header_nonlinear()
while True:
g_norm = norm(g, ord=np.inf)
if g_norm < gtol:
termination_status = 1
if verbose == 2:
print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
step_norm, g_norm)
if termination_status is not None or nfev == max_nfev:
break
d = scale
g_h = d * g
if tr_solver == 'exact':
J_h = J * d
U, s, V = svd(J_h, full_matrices=False)
V = V.T
uf = U.T.dot(f)
elif tr_solver == 'lsmr':
J_h = right_multiplied_operator(J, d)
if regularize:
a, b = build_quadratic_1d(J_h, g_h, -g_h)
to_tr = Delta / norm(g_h)
ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
reg_term = -ag_value / Delta**2
damp_full = (damp**2 + reg_term)**0.5
gn_h = lsmr(J_h, f, damp=damp_full, **tr_options)[0]
S = np.vstack((g_h, gn_h)).T
S, _ = qr(S, mode='economic')
JS = J_h.dot(S)
B_S = np.dot(JS.T, JS)
g_S = S.T.dot(g_h)
actual_reduction = -1
while actual_reduction <= 0 and nfev < max_nfev:
if tr_solver == 'exact':
step_h, alpha, n_iter = solve_lsq_trust_region(
n, m, uf, s, V, Delta, initial_alpha=alpha)
elif tr_solver == 'lsmr':
p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
step_h = S.dot(p_S)
predicted_reduction = -evaluate_quadratic(J_h, g_h, step_h)
step = d * step_h
x_new = x + step
f_new = fun(x_new)
nfev += 1
step_h_norm = norm(step_h)
if not np.all(np.isfinite(f_new)):
Delta = 0.25 * step_h_norm
continue
# Usual trust-region step quality estimation.
if loss_function is not None:
cost_new = loss_function(f_new, cost_only=True)
else:
cost_new = 0.5 * np.dot(f_new, f_new)
actual_reduction = cost - cost_new
Delta_new, ratio = update_tr_radius(
Delta, actual_reduction, predicted_reduction,
step_h_norm, step_h_norm > 0.95 * Delta)
step_norm = norm(step)
termination_status = check_termination(
actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
if termination_status is not None:
break
alpha *= Delta / Delta_new
Delta = Delta_new
if actual_reduction > 0:
x = x_new
f = f_new
f_true = f.copy()
cost = cost_new
J = jac(x, f)
njev += 1
if loss_function is not None:
rho = loss_function(f)
J, f = scale_for_robust_loss_function(J, f, rho)
g = compute_grad(J, f)
if jac_scale:
scale, scale_inv = compute_jac_scale(J, scale_inv)
else:
step_norm = 0
actual_reduction = 0
iteration += 1
if termination_status is None:
termination_status = 0
active_mask = np.zeros_like(x)
return OptimizeResult(
x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
active_mask=active_mask, nfev=nfev, njev=njev,
status=termination_status)
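The reflection logic in `select_step` above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: it assumes unit scaling (`d = 1`, so the "hat" space coincides with the original variables) and a strictly feasible `x`, and `step_to_bound` is a hypothetical helper written only for this example.

```python
import numpy as np


def step_to_bound(x, s, lb, ub):
    """Smallest t >= 0 such that x + t * s touches a bound.

    Hypothetical helper for illustration; assumes lb < x < ub.
    Returns the step length and a boolean mask of the components
    that hit a bound at that step length.
    """
    steps = np.full(x.shape, np.inf)
    moving = s != 0
    # For a strictly interior x, exactly one of the two ratios is
    # positive for each moving component, so the maximum selects it.
    steps[moving] = np.maximum((lb - x)[moving] / s[moving],
                               (ub - x)[moving] / s[moving])
    t = steps.min()
    return t, steps == t


x = np.array([0.5, 0.5])
p = np.array([1.0, 0.25])          # trial step that would leave the box
lb, ub = np.zeros(2), np.ones(2)

t, hits = step_to_bound(x, p, lb, ub)
x_on_bound = x + t * p             # truncated step: lands on the bound

r = p.copy()
r[hits] *= -1                      # reflect the components that hit a bound

print(t, hits, x_on_bound, r)
```

In the full algorithm the truncated step, the reflected step, and a scaled anti-gradient step are each scored with the quadratic model, and the one with the lowest model value is returned.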
@@ -0,0 +1,249 @@
"""The adaptation of Trust Region Reflective algorithm for a linear
least-squares problem."""
import numpy as np
from numpy.linalg import norm
from scipy.linalg import qr, solve_triangular
from scipy.sparse.linalg import lsmr
from scipy.optimize import OptimizeResult
from .givens_elimination import givens_elimination
from .common import (
EPS, step_size_to_bound, find_active_constraints, in_bounds,
make_strictly_feasible, build_quadratic_1d, evaluate_quadratic,
minimize_quadratic_1d, CL_scaling_vector, reflective_transformation,
print_header_linear, print_iteration_linear, compute_grad,
regularized_lsq_operator, right_multiplied_operator)
def regularized_lsq_with_qr(m, n, R, QTb, perm, diag, copy_R=True):
"""Solve regularized least squares using information from QR-decomposition.
The initial problem is to solve the following system in a least-squares
sense::
A x = b
D x = 0
where D is a diagonal matrix. The method is based on a QR decomposition
of the form A P = Q R, where P is a column permutation matrix, Q is an
orthogonal matrix and R is an upper triangular matrix.
Parameters
----------
m, n : int
Initial shape of A.
R : ndarray, shape (n, n)
Upper triangular matrix from QR decomposition of A.
QTb : ndarray, shape (n,)
First n components of Q^T b.
perm : ndarray, shape (n,)
Array defining the column permutation of A, such that the i-th column of
P is the perm[i]-th column of the identity matrix.
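diag : ndarray, shape (n,)
Array containing diagonal elements of D.
copy_R : bool, optional
If True (default), the elimination works on a copy of `R`; otherwise
`R` may be modified in place.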
Returns
-------
x : ndarray, shape (n,)
Found least-squares solution.
"""
if copy_R:
R = R.copy()
v = QTb.copy()
givens_elimination(R, v, diag[perm])
abs_diag_R = np.abs(np.diag(R))
threshold = EPS * max(m, n) * np.max(abs_diag_R)
nns, = np.nonzero(abs_diag_R > threshold)
R = R[np.ix_(nns, nns)]
v = v[nns]
x = np.zeros(n)
x[perm[nns]] = solve_triangular(R, v)
return x
def backtracking(A, g, x, p, theta, p_dot_g, lb, ub):
"""Find an appropriate step size using backtracking line search."""
alpha = 1
while True:
x_new, _ = reflective_transformation(x + alpha * p, lb, ub)
step = x_new - x
cost_change = -evaluate_quadratic(A, g, step)
if cost_change > -0.1 * alpha * p_dot_g:
break
alpha *= 0.5
active = find_active_constraints(x_new, lb, ub)
if np.any(active != 0):
x_new, _ = reflective_transformation(x + theta * alpha * p, lb, ub)
x_new = make_strictly_feasible(x_new, lb, ub, rstep=0)
step = x_new - x
cost_change = -evaluate_quadratic(A, g, step)
return x, step, cost_change
def select_step(x, A_h, g_h, c_h, p, p_h, d, lb, ub, theta):
"""Select the best step according to Trust Region Reflective algorithm."""
if in_bounds(x + p, lb, ub):
return p
p_stride, hits = step_size_to_bound(x, p, lb, ub)
r_h = np.copy(p_h)
r_h[hits.astype(bool)] *= -1
r = d * r_h
# Restrict step, such that it hits the bound.
p *= p_stride
p_h *= p_stride
x_on_bound = x + p
# Find the step size along reflected direction.
r_stride_u, _ = step_size_to_bound(x_on_bound, r, lb, ub)
# Stay interior.
r_stride_l = (1 - theta) * r_stride_u
r_stride_u *= theta
if r_stride_u > 0:
a, b, c = build_quadratic_1d(A_h, g_h, r_h, s0=p_h, diag=c_h)
r_stride, r_value = minimize_quadratic_1d(
a, b, r_stride_l, r_stride_u, c=c)
r_h = p_h + r_h * r_stride
r = d * r_h
else:
r_value = np.inf
# Now correct p_h to make it strictly interior.
p_h *= theta
p *= theta
p_value = evaluate_quadratic(A_h, g_h, p_h, diag=c_h)
ag_h = -g_h
ag = d * ag_h
ag_stride_u, _ = step_size_to_bound(x, ag, lb, ub)
ag_stride_u *= theta
a, b = build_quadratic_1d(A_h, g_h, ag_h, diag=c_h)
ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride_u)
ag *= ag_stride
if p_value < r_value and p_value < ag_value:
return p
elif r_value < p_value and r_value < ag_value:
return r
else:
return ag
def trf_linear(A, b, x_lsq, lb, ub, tol, lsq_solver, lsmr_tol,
max_iter, verbose, *, lsmr_maxiter=None):
m, n = A.shape
x, _ = reflective_transformation(x_lsq, lb, ub)
x = make_strictly_feasible(x, lb, ub, rstep=0.1)
if lsq_solver == 'exact':
QT, R, perm = qr(A, mode='economic', pivoting=True)
QT = QT.T
if m < n:
R = np.vstack((R, np.zeros((n - m, n))))
QTr = np.zeros(n)
k = min(m, n)
elif lsq_solver == 'lsmr':
r_aug = np.zeros(m + n)
auto_lsmr_tol = False
if lsmr_tol is None:
lsmr_tol = 1e-2 * tol
elif lsmr_tol == 'auto':
auto_lsmr_tol = True
r = A.dot(x) - b
g = compute_grad(A, r)
cost = 0.5 * np.dot(r, r)
initial_cost = cost
termination_status = None
step_norm = None
cost_change = None
if max_iter is None:
max_iter = 100
if verbose == 2:
print_header_linear()
for iteration in range(max_iter):
v, dv = CL_scaling_vector(x, g, lb, ub)
g_scaled = g * v
g_norm = norm(g_scaled, ord=np.inf)
if g_norm < tol:
termination_status = 1
if verbose == 2:
print_iteration_linear(iteration, cost, cost_change,
step_norm, g_norm)
if termination_status is not None:
break
diag_h = g * dv
diag_root_h = diag_h ** 0.5
d = v ** 0.5
g_h = d * g
A_h = right_multiplied_operator(A, d)
if lsq_solver == 'exact':
QTr[:k] = QT.dot(r)
p_h = -regularized_lsq_with_qr(m, n, R * d[perm], QTr, perm,
diag_root_h, copy_R=False)
elif lsq_solver == 'lsmr':
lsmr_op = regularized_lsq_operator(A_h, diag_root_h)
r_aug[:m] = r
if auto_lsmr_tol:
eta = 1e-2 * min(0.5, g_norm)
lsmr_tol = max(EPS, min(0.1, eta * g_norm))
p_h = -lsmr(lsmr_op, r_aug, maxiter=lsmr_maxiter,
atol=lsmr_tol, btol=lsmr_tol)[0]
p = d * p_h
p_dot_g = np.dot(p, g)
if p_dot_g > 0:
termination_status = -1
theta = 1 - min(0.005, g_norm)
step = select_step(x, A_h, g_h, diag_h, p, p_h, d, lb, ub, theta)
cost_change = -evaluate_quadratic(A, g, step)
# This branch is probably almost never executed. The idea is that `p` is
# a descent direction, so we must find an acceptable cost decrease using
# simple "backtracking"; otherwise the algorithm's logic would break.
if cost_change < 0:
x, step, cost_change = backtracking(
A, g, x, p, theta, p_dot_g, lb, ub)
else:
x = make_strictly_feasible(x + step, lb, ub, rstep=0)
step_norm = norm(step)
r = A.dot(x) - b
g = compute_grad(A, r)
if cost_change < tol * cost:
termination_status = 2
cost = 0.5 * np.dot(r, r)
if termination_status is None:
termination_status = 0
active_mask = find_active_constraints(x, lb, ub, rtol=tol)
return OptimizeResult(
x=x, fun=r, cost=cost, optimality=g_norm, active_mask=active_mask,
nit=iteration + 1, status=termination_status,
initial_cost=initial_cost)
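`trf_linear` is an internal routine; a hedged usage sketch through the public `scipy.optimize.lsq_linear` entry point (assumed here to dispatch to it when `method='trf'`) might look as follows. The small matrix and bounds are made up for illustration. The unconstrained least-squares minimizer of this problem lies outside the box, so the bounds are active at the solution.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Illustrative data: 3 equations, 2 unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([2.0, -1.0, 0.5])

# Box constraints 0 <= x <= 1 on both variables.
lb = np.zeros(2)
ub = np.ones(2)

res = lsq_linear(A, b, bounds=(lb, ub), method='trf')

# The unconstrained minimizer is roughly [1.83, -1.17], so the bounded
# solution is pushed onto the box; checking the KKT conditions by hand
# suggests it should be close to [1, 0].
print(res.x, res.cost, res.status, res.active_mask)
```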
@@ -0,0 +1,392 @@
import warnings
import numpy as np
from scipy.sparse import csc_array, vstack, issparse
from scipy._lib._util import VisibleDeprecationWarning
from ._highs._highs_wrapper import _highs_wrapper # type: ignore[import]
from ._constraints import LinearConstraint, Bounds
from ._optimize import OptimizeResult
from ._linprog_highs import _highs_to_scipy_status_message
def _constraints_to_components(constraints):
"""
Convert sequence of constraints to a single set of components A, b_l, b_u.
`constraints` could be
1. A LinearConstraint
2. A tuple representing a LinearConstraint
3. An invalid object
4. A sequence composed entirely of objects of type 1/2
5. A sequence containing at least one object of type 3
We want to accept 1, 2, and 4 and reject 3 and 5.
"""
message = ("`constraints` (or each element within `constraints`) must be "
"convertible into an instance of "
"`scipy.optimize.LinearConstraint`.")
As = []
b_ls = []
b_us = []
# Accept case 1 by standardizing as case 4
if isinstance(constraints, LinearConstraint):
constraints = [constraints]
else:
# Reject case 3
try:
iter(constraints)
except TypeError as exc:
raise ValueError(message) from exc
# Accept case 2 by standardizing as case 4
if len(constraints) == 3:
# argument could be a single tuple representing a LinearConstraint
try:
constraints = [LinearConstraint(*constraints)]
except (TypeError, ValueError, VisibleDeprecationWarning):
# argument was not a tuple representing a LinearConstraint
pass
# Address cases 4/5
for constraint in constraints:
# if it's not a LinearConstraint or something that represents a
# LinearConstraint at this point, it's invalid
if not isinstance(constraint, LinearConstraint):
try:
constraint = LinearConstraint(*constraint)
except TypeError as exc:
raise ValueError(message) from exc
As.append(csc_array(constraint.A))
b_ls.append(np.atleast_1d(constraint.lb).astype(np.float64))
b_us.append(np.atleast_1d(constraint.ub).astype(np.float64))
if len(As) > 1:
A = vstack(As, format="csc")
b_l = np.concatenate(b_ls)
b_u = np.concatenate(b_us)
else: # avoid unnecessary copying
A = As[0]
b_l = b_ls[0]
b_u = b_us[0]
return A, b_l, b_u
def _milp_iv(c, integrality, bounds, constraints, options):
# objective IV
if issparse(c):
raise ValueError("`c` must be a dense array.")
c = np.atleast_1d(c).astype(np.float64)
if c.ndim != 1 or c.size == 0 or not np.all(np.isfinite(c)):
message = ("`c` must be a one-dimensional array of finite numbers "
"with at least one element.")
raise ValueError(message)
# integrality IV
if issparse(integrality):
raise ValueError("`integrality` must be a dense array.")
message = ("`integrality` must contain integers 0-3 and be broadcastable "
"to `c.shape`.")
if integrality is None:
integrality = 0
try:
integrality = np.broadcast_to(integrality, c.shape).astype(np.uint8)
except ValueError:
raise ValueError(message)
if integrality.min() < 0 or integrality.max() > 3:
raise ValueError(message)
# bounds IV
if bounds is None:
bounds = Bounds(0, np.inf)
elif not isinstance(bounds, Bounds):
message = ("`bounds` must be convertible into an instance of "
"`scipy.optimize.Bounds`.")
try:
bounds = Bounds(*bounds)
except TypeError as exc:
raise ValueError(message) from exc
try:
lb = np.broadcast_to(bounds.lb, c.shape).astype(np.float64)
ub = np.broadcast_to(bounds.ub, c.shape).astype(np.float64)
except (ValueError, TypeError) as exc:
message = ("`bounds.lb` and `bounds.ub` must contain reals and "
"be broadcastable to `c.shape`.")
raise ValueError(message) from exc
# constraints IV
if not constraints:
constraints = [LinearConstraint(np.empty((0, c.size)),
np.empty((0,)), np.empty((0,)))]
try:
A, b_l, b_u = _constraints_to_components(constraints)
except ValueError as exc:
message = ("`constraints` (or each element within `constraints`) must "
"be convertible into an instance of "
"`scipy.optimize.LinearConstraint`.")
raise ValueError(message) from exc
if A.shape != (b_l.size, c.size):
message = "The shape of `A` must be (len(b_l), len(c))."
raise ValueError(message)
indptr, indices, data = A.indptr, A.indices, A.data.astype(np.float64)
# options IV
options = options or {}
supported_options = {'disp', 'presolve', 'time_limit', 'node_limit',
'mip_rel_gap'}
unsupported_options = set(options).difference(supported_options)
if unsupported_options:
message = (f"Unrecognized options detected: {unsupported_options}. "
"These will be passed to HiGHS verbatim.")
warnings.warn(message, RuntimeWarning, stacklevel=3)
options_iv = {'log_to_console': options.pop("disp", False),
'mip_max_nodes': options.pop("node_limit", None)}
options_iv.update(options)
return c, integrality, lb, ub, indptr, indices, data, b_l, b_u, options_iv
def milp(c, *, integrality=None, bounds=None, constraints=None, options=None):
r"""
Mixed-integer linear programming
Solves problems of the following form:
.. math::
\min_x \ & c^T x \\
\mbox{such that} \ & b_l \leq A x \leq b_u,\\
& l \leq x \leq u, \\
& x_i \in \mathbb{Z}, i \in X_i
where :math:`x` is a vector of decision variables;
:math:`c`, :math:`b_l`, :math:`b_u`, :math:`l`, and :math:`u` are vectors;
:math:`A` is a matrix, and :math:`X_i` is the set of indices of
decision variables that must be integral. (In this context, a
variable that can assume only integer values is said to be "integral";
it has an "integrality" constraint.)
Alternatively, that's:
minimize::
c @ x
such that::
b_l <= A @ x <= b_u
l <= x <= u
Specified elements of x must be integers
By default, ``l = 0`` and ``u = np.inf`` unless specified with
``bounds``.
Parameters
----------
c : 1D dense array_like
The coefficients of the linear objective function to be minimized.
`c` is converted to a double precision array before the problem is
solved.
integrality : 1D dense array_like, optional
Indicates the type of integrality constraint on each decision variable.
``0`` : Continuous variable; no integrality constraint.
``1`` : Integer variable; decision variable must be an integer
within `bounds`.
``2`` : Semi-continuous variable; decision variable must be within
`bounds` or take value ``0``.
``3`` : Semi-integer variable; decision variable must be an integer
within `bounds` or take value ``0``.
By default, all variables are continuous. `integrality` is converted
to an array of integers before the problem is solved.
bounds : scipy.optimize.Bounds, optional
Bounds on the decision variables. Lower and upper bounds are converted
to double precision arrays before the problem is solved. The
``keep_feasible`` parameter of the `Bounds` object is ignored. If
not specified, all decision variables are constrained to be
non-negative.
constraints : sequence of scipy.optimize.LinearConstraint, optional
Linear constraints of the optimization problem. Arguments may be
one of the following:
1. A single `LinearConstraint` object
2. A single tuple that can be converted to a `LinearConstraint` object
as ``LinearConstraint(*constraints)``
3. A sequence composed entirely of objects of type 1. and 2.
Before the problem is solved, all values are converted to double
precision, and the matrices of constraint coefficients are converted to
instances of `scipy.sparse.csc_array`. The ``keep_feasible`` parameter
of `LinearConstraint` objects is ignored.
options : dict, optional
A dictionary of solver options. The following keys are recognized.
disp : bool (default: ``False``)
Set to ``True`` if indicators of optimization status are to be
printed to the console during optimization.
node_limit : int, optional
The maximum number of nodes (linear program relaxations) to solve
before stopping. Default is no maximum number of nodes.
presolve : bool (default: ``True``)
Presolve attempts to identify trivial infeasibilities,
identify trivial unboundedness, and simplify the problem before
sending it to the main solver.
time_limit : float, optional
The maximum number of seconds allotted to solve the problem.
Default is no time limit.
mip_rel_gap : float, optional
Termination criterion for MIP solver: solver will terminate when
the gap between the primal objective value and the dual objective
bound, scaled by the primal objective value, is <= mip_rel_gap.
Returns
-------
res : OptimizeResult
An instance of :class:`scipy.optimize.OptimizeResult`. The object
is guaranteed to have the following attributes.
status : int
An integer representing the exit status of the algorithm.
``0`` : Optimal solution found.
``1`` : Iteration or time limit reached.
``2`` : Problem is infeasible.
``3`` : Problem is unbounded.
``4`` : Other; see message for details.
success : bool
``True`` when an optimal solution is found and ``False`` otherwise.
message : str
A string descriptor of the exit status of the algorithm.
The following attributes will also be present, but the values may be
``None``, depending on the solution status.
x : ndarray
The values of the decision variables that minimize the
objective function while satisfying the constraints.
fun : float
The optimal value of the objective function ``c @ x``.
mip_node_count : int
The number of subproblems or "nodes" solved by the MILP solver.
mip_dual_bound : float
The MILP solver's final estimate of the lower bound on the optimal
solution.
mip_gap : float
The difference between the primal objective value and the dual
objective bound, scaled by the primal objective value.
Notes
-----
`milp` is a wrapper of the HiGHS linear optimization software [1]_. The
algorithm is deterministic, and it typically finds the global optimum of
moderately challenging mixed-integer linear programs (when it exists).
References
----------
.. [1] Huangfu, Q., Galabova, I., Feldmeier, M., and Hall, J. A. J.
"HiGHS - high performance software for linear optimization."
https://highs.dev/
.. [2] Huangfu, Q. and Hall, J. A. J. "Parallelizing the dual revised
simplex method." Mathematical Programming Computation, 10 (1),
119-142, 2018. DOI: 10.1007/s12532-017-0130-5
Examples
--------
Consider the problem at
https://en.wikipedia.org/wiki/Integer_programming#Example, which is
expressed as a maximization problem of two variables. Since `milp` requires
that the problem be expressed as a minimization problem, the objective
function coefficients on the decision variables are:
>>> import numpy as np
>>> c = -np.array([0, 1])
Note the negative sign: we maximize the original objective function
by minimizing the negative of the objective function.
We collect the coefficients of the constraints into arrays like:
>>> A = np.array([[-1, 1], [3, 2], [2, 3]])
>>> b_u = np.array([1, 12, 12])
>>> b_l = np.full_like(b_u, -np.inf)
Because there is no lower limit on these constraints, we have defined a
variable ``b_l`` full of values representing negative infinity. This may
be unfamiliar to users of `scipy.optimize.linprog`, which only accepts
"less than" (or "upper bound") inequality constraints of the form
``A_ub @ x <= b_u``. By accepting both ``b_l`` and ``b_u`` of constraints
``b_l <= A_ub @ x <= b_u``, `milp` makes it easy to specify "greater than"
inequality constraints, "less than" inequality constraints, and equality
constraints concisely.
These arrays are collected into a single `LinearConstraint` object like:
>>> from scipy.optimize import LinearConstraint
>>> constraints = LinearConstraint(A, b_l, b_u)
The non-negativity bounds on the decision variables are enforced by
default, so we do not need to provide an argument for `bounds`.
Finally, the problem states that both decision variables must be integers:
>>> integrality = np.ones_like(c)
We solve the problem like:
>>> from scipy.optimize import milp
>>> res = milp(c=c, constraints=constraints, integrality=integrality)
>>> res.x
[1.0, 2.0]
Note that had we solved the relaxed problem (without integrality
constraints):
>>> res = milp(c=c, constraints=constraints) # OR:
>>> # from scipy.optimize import linprog; res = linprog(c, A, b_u)
>>> res.x
[1.8, 2.8]
we would not have obtained the correct solution by rounding to the nearest
integers.
Other examples are given :ref:`in the tutorial <tutorial-optimize_milp>`.
"""
args_iv = _milp_iv(c, integrality, bounds, constraints, options)
c, integrality, lb, ub, indptr, indices, data, b_l, b_u, options = args_iv
highs_res = _highs_wrapper(c, indptr, indices, data, b_l, b_u,
lb, ub, integrality, options)
res = {}
# Convert to scipy-style status and message
highs_status = highs_res.get('status', None)
highs_message = highs_res.get('message', None)
status, message = _highs_to_scipy_status_message(highs_status,
highs_message)
res['status'] = status
res['message'] = message
res['success'] = (status == 0)
x = highs_res.get('x', None)
res['x'] = np.array(x) if x is not None else None
res['fun'] = highs_res.get('fun', None)
res['mip_node_count'] = highs_res.get('mip_node_count', None)
res['mip_dual_bound'] = highs_res.get('mip_dual_bound', None)
res['mip_gap'] = highs_res.get('mip_gap', None)
return OptimizeResult(res)
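As a complement to the docstring example, here is a minimal sketch of calling `milp` with explicit `Bounds`, which turns integer variables into binary ones. The knapsack data (`values`, `weights`, `capacity`) are invented for illustration and chosen so that the optimum is unique.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical 0/1 knapsack: pick items to maximize value within capacity.
values = np.array([10.0, 13.0, 7.0])
weights = np.array([3.0, 4.0, 2.0])
capacity = 6.0

c = -values                                  # milp minimizes, so negate
constraints = LinearConstraint(weights.reshape(1, -1), -np.inf, capacity)
integrality = np.ones_like(c)                # every variable is an integer
bounds = Bounds(0, 1)                        # 0 <= x_i <= 1, i.e. binary

res = milp(c=c, constraints=constraints,
           integrality=integrality, bounds=bounds)

chosen = np.round(res.x).astype(int)         # guard against tiny round-off
print(chosen, -res.fun)
```

Enumerating the feasible subsets by hand shows that items 1 and 2 (weight 6, value 20) beat every other combination, which is what the solver is expected to report.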
File diff suppressed because it is too large Load Diff
@@ -0,0 +1,164 @@
import numpy as np
from scipy.linalg import solve, LinAlgWarning
import warnings
__all__ = ['nnls']
def nnls(A, b, maxiter=None, *, atol=None):
"""
Solve ``argmin_x || Ax - b ||_2`` for ``x>=0``.
This problem, often called NonNegative Least Squares, is a convex
optimization problem with convex constraints. It typically arises when
``x`` models quantities for which only nonnegative values are
attainable, such as weights of ingredients, component costs and so on.
Parameters
----------
A : (m, n) ndarray
Coefficient array
b : (m,) ndarray, float
Right-hand side vector.
maxiter : int, optional
Maximum number of iterations. Default value is ``3 * n``.
atol : float, optional
Tolerance value used in the algorithm to assess closeness to zero in
the projected residual ``A.T @ (A x - b)`` entries. Increasing this
value relaxes the solution constraints. A typical relaxation value can
be selected as ``max(m, n) * np.linalg.norm(A, 1) * np.spacing(1.)``.
This value is not set as the default because the norm operation becomes
expensive for large problems, so it should be used only when necessary.
Returns
-------
x : ndarray
Solution vector.
rnorm : float
The 2-norm of the residual, ``|| Ax-b ||_2``.
See Also
--------
lsq_linear : Linear least squares with bounds on the variables
Notes
-----
The code is based on [2]_ which is an improved version of the classical
algorithm of [1]_. It utilizes an active set method and solves the KKT
(Karush-Kuhn-Tucker) conditions for the non-negative least squares problem.
References
----------
.. [1] : Lawson C., Hanson R.J., "Solving Least Squares Problems", SIAM,
1995, :doi:`10.1137/1.9781611971217`
.. [2] : Bro, Rasmus and de Jong, Sijmen, "A Fast Non-Negativity-
Constrained Least Squares Algorithm", Journal Of Chemometrics, 1997,
:doi:`10.1002/(SICI)1099-128X(199709/10)11:5<393::AID-CEM483>3.0.CO;2-L`
Examples
--------
>>> import numpy as np
>>> from scipy.optimize import nnls
...
>>> A = np.array([[1, 0], [1, 0], [0, 1]])
>>> b = np.array([2, 1, 1])
>>> nnls(A, b)
(array([1.5, 1. ]), 0.7071067811865475)
>>> b = np.array([-1, -1, -1])
>>> nnls(A, b)
(array([0., 0.]), 1.7320508075688772)
"""
A = np.asarray_chkfinite(A)
b = np.asarray_chkfinite(b)
if len(A.shape) != 2:
raise ValueError("Expected a two-dimensional array (matrix)" +
f", but the shape of A is {A.shape}")
if len(b.shape) != 1:
raise ValueError("Expected a one-dimensional array (vector)" +
f", but the shape of b is {b.shape}")
m, n = A.shape
if m != b.shape[0]:
raise ValueError(
"Incompatible dimensions. The first dimension of " +
f"A is {m}, while the shape of b is {(b.shape[0], )}")
x, rnorm, mode = _nnls(A, b, maxiter, tol=atol)
if mode != 1:
raise RuntimeError("Maximum number of iterations reached.")
return x, rnorm
def _nnls(A, b, maxiter=None, tol=None):
"""
This is a single RHS algorithm from ref [2] above. For multiple RHS
support, the algorithm is given in :doi:`10.1002/cem.889`
"""
m, n = A.shape
AtA = A.T @ A
Atb = b @ A # Result is 1D - let NumPy figure it out
if not maxiter:
maxiter = 3*n
if tol is None:
tol = 10 * max(m, n) * np.spacing(1.)
# Initialize vars
x = np.zeros(n, dtype=np.float64)
s = np.zeros(n, dtype=np.float64)
# Inactive constraint switches
P = np.zeros(n, dtype=bool)
# Projected residual
w = Atb.copy().astype(np.float64) # x=0. Skip (-AtA @ x) term
# Overall iteration counter.
# Outer-loop passes are not counted; inner iterations are counted
# cumulatively across outer-loop passes.
iter = 0
while (not P.all()) and (w[~P] > tol).any(): # B
# Get the "most" active coeff index and move to inactive set
k = np.argmax(w * (~P)) # B.2
P[k] = True # B.3
# Iteration solution
s[:] = 0.
# B.4
with warnings.catch_warnings():
warnings.filterwarnings('ignore', message='Ill-conditioned matrix',
category=LinAlgWarning)
s[P] = solve(AtA[np.ix_(P, P)], Atb[P], assume_a='sym', check_finite=False)
# Inner loop
while (iter < maxiter) and (s[P].min() < 0): # C.1
iter += 1
inds = P * (s < 0)
alpha = (x[inds] / (x[inds] - s[inds])).min() # C.2
x *= (1 - alpha)
x += alpha*s
P[x <= tol] = False
with warnings.catch_warnings():
warnings.filterwarnings('ignore', message='Ill-conditioned matrix',
category=LinAlgWarning)
s[P] = solve(AtA[np.ix_(P, P)], Atb[P], assume_a='sym',
check_finite=False)
s[~P] = 0 # C.6
x[:] = s[:]
w[:] = Atb - AtA @ x
if iter == maxiter:
# Ordinarily this branch would return
# return x, np.linalg.norm(A@x - b), -1
# but at the top level a mode of -1 raises an exception, so computing
# the norm would be wasted. Return the dummy value 0. instead.
return x, 0., -1
return x, np.linalg.norm(A@x - b), 1
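The KKT conditions mentioned in the `nnls` notes can be checked numerically on a small problem. The sketch below assumes the public `scipy.optimize.nnls` interface shown above; the data are illustrative, and `w` is computed here by hand rather than returned by the solver.

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b = np.array([2.0, 1.0, -1.0])   # negative last entry forces x[1] to 0

x, rnorm = nnls(A, b)

# Projected residual (negative gradient of 0.5 * ||Ax - b||^2).
w = A.T @ (b - A @ x)

# KKT conditions for non-negative least squares:
#   x >= 0, w <= 0, and x[i] * w[i] == 0 for every i.
print("x =", x)
print("w =", w)
print("complementarity x @ w =", x @ w)
print("residual norm =", rnorm)
```

A component with `x[i] > 0` is in the inactive set `P` and has zero projected residual; a component held at zero has a non-positive projected residual, so increasing it would not reduce the objective.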
File diff suppressed because it is too large Load Diff
@@ -0,0 +1,775 @@
"""Routines for numerical differentiation."""
import functools
import numpy as np
from numpy.linalg import norm
from scipy.sparse.linalg import LinearOperator
from ..sparse import issparse, csc_matrix, csr_matrix, coo_matrix, find
from ._group_columns import group_dense, group_sparse
from scipy._lib._array_api import atleast_nd, array_namespace
def _adjust_scheme_to_bounds(x0, h, num_steps, scheme, lb, ub):
"""Adjust final difference scheme to the presence of bounds.
Parameters
----------
x0 : ndarray, shape (n,)
Point at which we wish to estimate derivative.
h : ndarray, shape (n,)
Desired absolute finite difference steps.
num_steps : int
Number of `h` steps in one direction required to implement finite
difference scheme. For example, 2 means that we need to evaluate
f(x0 + 2 * h) or f(x0 - 2 * h)
scheme : {'1-sided', '2-sided'}
Whether steps in one or both directions are required. In other
words '1-sided' applies to forward and backward schemes, '2-sided'
applies to central schemes.
lb : ndarray, shape (n,)
Lower bounds on independent variables.
ub : ndarray, shape (n,)
Upper bounds on independent variables.
Returns
-------
h_adjusted : ndarray, shape (n,)
Adjusted absolute step sizes. Step size decreases only if a sign flip
or switching to a one-sided scheme doesn't allow taking a full step.
use_one_sided : ndarray of bool, shape (n,)
Whether to switch to one-sided scheme. Informative only for
``scheme='2-sided'``.
"""
if scheme == '1-sided':
use_one_sided = np.ones_like(h, dtype=bool)
elif scheme == '2-sided':
h = np.abs(h)
use_one_sided = np.zeros_like(h, dtype=bool)
else:
raise ValueError("`scheme` must be '1-sided' or '2-sided'.")
if np.all((lb == -np.inf) & (ub == np.inf)):
return h, use_one_sided
h_total = h * num_steps
h_adjusted = h.copy()
lower_dist = x0 - lb
upper_dist = ub - x0
if scheme == '1-sided':
x = x0 + h_total
violated = (x < lb) | (x > ub)
fitting = np.abs(h_total) <= np.maximum(lower_dist, upper_dist)
h_adjusted[violated & fitting] *= -1
forward = (upper_dist >= lower_dist) & ~fitting
h_adjusted[forward] = upper_dist[forward] / num_steps
backward = (upper_dist < lower_dist) & ~fitting
h_adjusted[backward] = -lower_dist[backward] / num_steps
elif scheme == '2-sided':
central = (lower_dist >= h_total) & (upper_dist >= h_total)
forward = (upper_dist >= lower_dist) & ~central
h_adjusted[forward] = np.minimum(
h[forward], 0.5 * upper_dist[forward] / num_steps)
use_one_sided[forward] = True
backward = (upper_dist < lower_dist) & ~central
h_adjusted[backward] = -np.minimum(
h[backward], 0.5 * lower_dist[backward] / num_steps)
use_one_sided[backward] = True
min_dist = np.minimum(upper_dist, lower_dist) / num_steps
adjusted_central = (~central & (np.abs(h_adjusted) <= min_dist))
h_adjusted[adjusted_central] = min_dist[adjusted_central]
use_one_sided[adjusted_central] = False
return h_adjusted, use_one_sided
@functools.lru_cache
def _eps_for_method(x0_dtype, f0_dtype, method):
"""
Calculates relative EPS step to use for a given data type
and numdiff step method.
Progressively smaller steps are used for larger floating point types.
Parameters
----------
x0_dtype : np.dtype
dtype of parameter vector
f0_dtype : np.dtype
dtype of function evaluation
method : {'2-point', '3-point', 'cs'}
Returns
-------
EPS: float
relative step size. May be np.float16, np.float32, np.float64
Notes
-----
The default relative step will be np.float64. However, if x0 or f0 are
smaller floating point types (np.float16, np.float32), then the smallest
floating point type is chosen.
"""
# the default EPS value
EPS = np.finfo(np.float64).eps
x0_is_fp = False
if np.issubdtype(x0_dtype, np.inexact):
# if you're a floating point type then over-ride the default EPS
EPS = np.finfo(x0_dtype).eps
x0_itemsize = np.dtype(x0_dtype).itemsize
x0_is_fp = True
if np.issubdtype(f0_dtype, np.inexact):
f0_itemsize = np.dtype(f0_dtype).itemsize
# choose the smallest itemsize between x0 and f0
if x0_is_fp and f0_itemsize < x0_itemsize:
EPS = np.finfo(f0_dtype).eps
if method in ["2-point", "cs"]:
return EPS**0.5
elif method in ["3-point"]:
return EPS**(1/3)
else:
raise RuntimeError("Unknown step method, should be one of "
"{'2-point', '3-point', 'cs'}")
def _compute_absolute_step(rel_step, x0, f0, method):
"""
Computes an absolute step from a relative step for finite difference
calculation.
Parameters
----------
rel_step: None or array-like
Relative step for the finite difference calculation
x0 : np.ndarray
Parameter vector
f0 : np.ndarray or scalar
method : {'2-point', '3-point', 'cs'}
Returns
-------
h : np.ndarray
The absolute step size for each element of `x0`
Notes
-----
`h` will always be np.float64. However, if `x0` or `f0` are
smaller floating point dtypes (e.g. np.float32), then the absolute
step size will be calculated from the smallest floating point size.
"""
# this is used instead of np.sign(x0) because we need
# sign_x0 to be 1 when x0 == 0.
sign_x0 = (x0 >= 0).astype(float) * 2 - 1
rstep = _eps_for_method(x0.dtype, f0.dtype, method)
if rel_step is None:
abs_step = rstep * sign_x0 * np.maximum(1.0, np.abs(x0))
else:
# User has requested specific relative steps.
# Don't multiply by max(1, abs(x0)) because if x0 < 1 then their
# requested step is not used.
abs_step = rel_step * sign_x0 * np.abs(x0)
# however we don't want an abs_step of 0, which can happen if
# rel_step is 0, or x0 is 0. Instead, substitute a realistic step
dx = ((x0 + abs_step) - x0)
abs_step = np.where(dx == 0,
rstep * sign_x0 * np.maximum(1.0, np.abs(x0)),
abs_step)
return abs_step
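The default step rule can be sketched on its own. This is a simplified stand-in, assuming float64 inputs and no user-supplied `rel_step`. The helper name `default_step` is hypothetical and not part of the module.

```python
import numpy as np

def default_step(x0, method="2-point"):
    """Hypothetical helper mirroring the default (rel_step=None) branch."""
    x0 = np.asarray(x0, dtype=np.float64)
    eps = np.finfo(x0.dtype).eps
    # sqrt(eps) for first-order schemes, cube root for the 3-point scheme.
    rel = eps ** 0.5 if method in ("2-point", "cs") else eps ** (1 / 3)
    # The sign is +1 at x0 == 0, unlike np.sign, so the step is never zero.
    sign = np.where(x0 >= 0, 1.0, -1.0)
    return rel * sign * np.maximum(1.0, np.abs(x0))

h = default_step([0.0, -3.0, 1e6])
print(h)
```

The step scales with `abs(x0)` for large entries and falls back to a fixed size near zero, which keeps `(x0 + h) - x0` distinguishable from zero in both regimes.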
def _prepare_bounds(bounds, x0):
"""
Prepares new-style bounds from a two-tuple specifying the lower and upper
limits for values in x0. If a value is not bound then the lower/upper bound
will be expected to be -np.inf/np.inf.
Examples
--------
>>> _prepare_bounds([(0, 1, 2), (1, 2, np.inf)], [0.5, 1.5, 2.5])
(array([0., 1., 2.]), array([ 1., 2., inf]))
"""
lb, ub = (np.asarray(b, dtype=float) for b in bounds)
if lb.ndim == 0:
lb = np.resize(lb, x0.shape)
if ub.ndim == 0:
ub = np.resize(ub, x0.shape)
return lb, ub
def group_columns(A, order=0):
"""Group columns of a 2-D matrix for sparse finite differencing [1]_.
Two columns are in the same group if in each row at least one of them
has zero. A greedy sequential algorithm is used to construct groups.
Parameters
----------
A : array_like or sparse matrix, shape (m, n)
Matrix of which to group columns.
order : int, iterable of int with shape (n,) or None
Permutation array which defines the order of columns enumeration.
If int or None, a random permutation is used with `order` used as
a random seed. Default is 0, that is use a random permutation but
guarantee repeatability.
Returns
-------
groups : ndarray of int, shape (n,)
Contains values from 0 to n_groups-1, where n_groups is the number
of found groups. Each value ``groups[i]`` is an index of a group to
which the ith column is assigned. The procedure is helpful only if
n_groups is significantly less than n.
References
----------
.. [1] A. Curtis, M. J. D. Powell, and J. Reid, "On the estimation of
sparse Jacobian matrices", Journal of the Institute of Mathematics
and its Applications, 13 (1974), pp. 117-120.
"""
if issparse(A):
A = csc_matrix(A)
else:
A = np.atleast_2d(A)
A = (A != 0).astype(np.int32)
if A.ndim != 2:
raise ValueError("`A` must be 2-dimensional.")
m, n = A.shape
if order is None or np.isscalar(order):
rng = np.random.RandomState(order)
order = rng.permutation(n)
else:
order = np.asarray(order)
if order.shape != (n,):
raise ValueError("`order` has incorrect shape.")
A = A[:, order]
if issparse(A):
groups = group_sparse(m, n, A.indices, A.indptr)
else:
groups = group_dense(m, n, A)
groups[order] = groups.copy()
return groups
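A short sketch of how `group_columns` might be exercised on a tridiagonal sparsity pattern. The import goes through the private `scipy.optimize._numdiff` module, as in the docstring examples in this file, and the pattern itself is made up for illustration. The check confirms the defining property: within one group, no row holds more than one non-zero.

```python
import numpy as np
from scipy.optimize._numdiff import group_columns  # private module, may change

n = 6
# Tridiagonal sparsity pattern: row i depends on x[i-1], x[i], x[i+1].
structure = (np.eye(n, dtype=int)
             + np.eye(n, k=1, dtype=int)
             + np.eye(n, k=-1, dtype=int))

groups = group_columns(structure)
n_groups = groups.max() + 1

# Within a group, no row may contain more than one non-zero.
rows_ok = all(
    structure[:, groups == g].sum(axis=1).max() <= 1
    for g in range(n_groups)
)
print(groups, n_groups, rows_ok)
```

For this pattern any valid grouping needs at least three groups, because columns 0, 1 and 2 pairwise share a row.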
def approx_derivative(fun, x0, method='3-point', rel_step=None, abs_step=None,
f0=None, bounds=(-np.inf, np.inf), sparsity=None,
as_linear_operator=False, args=(), kwargs={}):
"""Compute finite difference approximation of the derivatives of a
vector-valued function.
If a function maps from R^n to R^m, its derivatives form m-by-n matrix
called the Jacobian, where an element (i, j) is a partial derivative of
f[i] with respect to x[j].
Parameters
----------
fun : callable
Function of which to estimate the derivatives. The argument x
passed to this function is ndarray of shape (n,) (never a scalar
even if n=1). It must return 1-D array_like of shape (m,) or a scalar.
x0 : array_like of shape (n,) or float
Point at which to estimate the derivatives. Float will be converted
to a 1-D array.
method : {'3-point', '2-point', 'cs'}, optional
Finite difference method to use:
- '2-point' - use the first order accuracy forward or backward
difference.
- '3-point' - use central difference in interior points and the
second order accuracy forward or backward difference
near the boundary.
- 'cs' - use a complex-step finite difference scheme. This assumes
that the user function is real-valued and can be
analytically continued to the complex plane. Otherwise,
produces bogus results.
rel_step : None or array_like, optional
Relative step size to use. If None (default) the absolute step size is
computed as ``h = rel_step * sign(x0) * max(1, abs(x0))``, with
`rel_step` being selected automatically, see Notes. Otherwise
``h = rel_step * sign(x0) * abs(x0)``. For ``method='3-point'`` the
sign of `h` is ignored. The calculated step size is possibly adjusted
to fit into the bounds.
abs_step : array_like, optional
Absolute step size to use, possibly adjusted to fit into the bounds.
For ``method='3-point'`` the sign of `abs_step` is ignored. By default
relative steps are used, only if ``abs_step is not None`` are absolute
steps used.
f0 : None or array_like, optional
If not None it is assumed to be equal to ``fun(x0)``, in this case
the ``fun(x0)`` is not called. Default is None.
bounds : tuple of array_like, optional
Lower and upper bounds on independent variables. Defaults to no bounds.
Each bound must match the size of `x0` or be a scalar, in the latter
case the bound will be the same for all variables. Use it to limit the
range of function evaluation. Bounds checking is not implemented
when `as_linear_operator` is True.
sparsity : {None, array_like, sparse matrix, 2-tuple}, optional
Defines a sparsity structure of the Jacobian matrix. If the Jacobian
matrix is known to have only a few non-zero elements in each row, then
it's possible to estimate several of its columns by a single function
evaluation [2]_. To perform such economical computations two ingredients
are required:
* structure : array_like or sparse matrix of shape (m, n). A zero
element means that a corresponding element of the Jacobian
is identically zero.
* groups : array_like of shape (n,). A column grouping for a given
sparsity structure, use `group_columns` to obtain it.
A single array or a sparse matrix is interpreted as a sparsity
structure, and groups are computed inside the function. A tuple is
interpreted as (structure, groups). If None (default), a standard
dense differencing will be used.
Note that sparse differencing makes sense only for large Jacobian
matrices where each row contains few non-zero elements.
as_linear_operator : bool, optional
When True the function returns a `scipy.sparse.linalg.LinearOperator`.
Otherwise it returns a dense array or a sparse matrix depending on
`sparsity`. The linear operator provides an efficient way of computing
``J.dot(p)`` for any vector ``p`` of shape (n,), but does not allow
direct access to individual elements of the matrix. By default
`as_linear_operator` is False.
args, kwargs : tuple and dict, optional
Additional arguments passed to `fun`. Both empty by default.
The calling signature is ``fun(x, *args, **kwargs)``.
Returns
-------
J : {ndarray, sparse matrix, LinearOperator}
Finite difference approximation of the Jacobian matrix.
If `as_linear_operator` is True returns a LinearOperator
with shape (m, n). Otherwise it returns a dense array or sparse
matrix depending on how `sparsity` is defined. If `sparsity`
is None then a ndarray with shape (m, n) is returned. If
`sparsity` is not None returns a csr_matrix with shape (m, n).
For sparse matrices and linear operators it is always returned as
a 2-D structure, for ndarrays, if m=1 it is returned
as a 1-D gradient array with shape (n,).
See Also
--------
check_derivative : Check correctness of a function computing derivatives.
Notes
-----
If `rel_step` is not provided, it is assigned as ``EPS**(1/s)``, where EPS is
determined from the smallest floating point dtype of `x0` or `fun(x0)`,
``np.finfo(x0.dtype).eps``, s=2 for '2-point' method and
s=3 for '3-point' method. Such relative step approximately minimizes a sum
of truncation and round-off errors, see [1]_. Relative steps are used by
default. However, absolute steps are used when ``abs_step is not None``.
If any of the absolute or relative steps produces an indistinguishable
difference from the original `x0`, ``(x0 + dx) - x0 == 0``, then an
automatic step size is substituted for that particular entry.
A finite difference scheme for '3-point' method is selected automatically.
The well-known central difference scheme is used for points sufficiently
far from the boundary, and 3-point forward or backward scheme is used for
points near the boundary. Both schemes have the second-order accuracy in
terms of Taylor expansion. Refer to [3]_ for the formulas of 3-point
forward and backward difference schemes.
For dense differencing when m=1 Jacobian is returned with a shape (n,),
on the other hand when n=1 Jacobian is returned with a shape (m, 1).
Our motivation is the following: a) It handles a case of gradient
computation (m=1) in a conventional way. b) It clearly separates these two
different cases. c) In all cases np.atleast_2d can be called to get 2-D
Jacobian with correct dimensions.
References
----------
.. [1] W. H. Press et al. "Numerical Recipes. The Art of Scientific
Computing. 3rd edition", sec. 5.7.
.. [2] A. Curtis, M. J. D. Powell, and J. Reid, "On the estimation of
sparse Jacobian matrices", Journal of the Institute of Mathematics
and its Applications, 13 (1974), pp. 117-120.
.. [3] B. Fornberg, "Generation of Finite Difference Formulas on
Arbitrarily Spaced Grids", Mathematics of Computation 51, 1988.
Examples
--------
>>> import numpy as np
>>> from scipy.optimize._numdiff import approx_derivative
>>>
>>> def f(x, c1, c2):
... return np.array([x[0] * np.sin(c1 * x[1]),
... x[0] * np.cos(c2 * x[1])])
...
>>> x0 = np.array([1.0, 0.5 * np.pi])
>>> approx_derivative(f, x0, args=(1, 2))
array([[ 1., 0.],
[-1., 0.]])
Bounds can be used to limit the region of function evaluation.
In the example below we compute left and right derivative at point 1.0.
>>> def g(x):
... return x**2 if x >= 1 else x
...
>>> x0 = 1.0
>>> approx_derivative(g, x0, bounds=(-np.inf, 1.0))
array([ 1.])
>>> approx_derivative(g, x0, bounds=(1.0, np.inf))
array([ 2.])
"""
if method not in ['2-point', '3-point', 'cs']:
raise ValueError("Unknown method '%s'. " % method)
xp = array_namespace(x0)
_x = atleast_nd(x0, ndim=1, xp=xp)
_dtype = xp.float64
if xp.isdtype(_x.dtype, "real floating"):
_dtype = _x.dtype
# promotes to floating
x0 = xp.astype(_x, _dtype)
if x0.ndim > 1:
raise ValueError("`x0` must have at most 1 dimension.")
lb, ub = _prepare_bounds(bounds, x0)
if lb.shape != x0.shape or ub.shape != x0.shape:
raise ValueError("Inconsistent shapes between bounds and `x0`.")
if as_linear_operator and not (np.all(np.isinf(lb))
and np.all(np.isinf(ub))):
raise ValueError("Bounds not supported when "
"`as_linear_operator` is True.")
def fun_wrapped(x):
# send user function same fp type as x0 (but only if cs is not being
# used)
if xp.isdtype(x.dtype, "real floating"):
x = xp.astype(x, x0.dtype)
f = np.atleast_1d(fun(x, *args, **kwargs))
if f.ndim > 1:
raise RuntimeError("`fun` return value has "
"more than 1 dimension.")
return f
if f0 is None:
f0 = fun_wrapped(x0)
else:
f0 = np.atleast_1d(f0)
if f0.ndim > 1:
raise ValueError("`f0` passed has more than 1 dimension.")
if np.any((x0 < lb) | (x0 > ub)):
raise ValueError("`x0` violates bound constraints.")
if as_linear_operator:
if rel_step is None:
rel_step = _eps_for_method(x0.dtype, f0.dtype, method)
return _linear_operator_difference(fun_wrapped, x0,
f0, rel_step, method)
else:
# by default we use rel_step
if abs_step is None:
h = _compute_absolute_step(rel_step, x0, f0, method)
else:
# user specifies an absolute step
sign_x0 = (x0 >= 0).astype(float) * 2 - 1
h = abs_step
# cannot have a zero step. This might happen if x0 is very large
# or small. In which case fall back to relative step.
dx = ((x0 + h) - x0)
h = np.where(dx == 0,
_eps_for_method(x0.dtype, f0.dtype, method) *
sign_x0 * np.maximum(1.0, np.abs(x0)),
h)
if method == '2-point':
h, use_one_sided = _adjust_scheme_to_bounds(
x0, h, 1, '1-sided', lb, ub)
elif method == '3-point':
h, use_one_sided = _adjust_scheme_to_bounds(
x0, h, 1, '2-sided', lb, ub)
elif method == 'cs':
use_one_sided = False
if sparsity is None:
return _dense_difference(fun_wrapped, x0, f0, h,
use_one_sided, method)
else:
if not issparse(sparsity) and len(sparsity) == 2:
structure, groups = sparsity
else:
structure = sparsity
groups = group_columns(sparsity)
if issparse(structure):
structure = csc_matrix(structure)
else:
structure = np.atleast_2d(structure)
groups = np.atleast_1d(groups)
return _sparse_difference(fun_wrapped, x0, f0, h,
use_one_sided, structure,
groups, method)
def _linear_operator_difference(fun, x0, f0, h, method):
m = f0.size
n = x0.size
if method == '2-point':
def matvec(p):
if np.array_equal(p, np.zeros_like(p)):
return np.zeros(m)
dx = h / norm(p)
x = x0 + dx*p
df = fun(x) - f0
return df / dx
elif method == '3-point':
def matvec(p):
if np.array_equal(p, np.zeros_like(p)):
return np.zeros(m)
dx = 2*h / norm(p)
x1 = x0 - (dx/2)*p
x2 = x0 + (dx/2)*p
f1 = fun(x1)
f2 = fun(x2)
df = f2 - f1
return df / dx
elif method == 'cs':
def matvec(p):
if np.array_equal(p, np.zeros_like(p)):
return np.zeros(m)
dx = h / norm(p)
x = x0 + dx*p*1.j
f1 = fun(x)
df = f1.imag
return df / dx
else:
raise RuntimeError("Never be here.")
return LinearOperator((m, n), matvec)
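The '2-point' branch above can be illustrated with a standalone sketch that builds a Jacobian-vector product from one extra function evaluation. The test function, the point `x0`, and the fixed step `h` are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

def f(x):
    # Hypothetical test function with a known Jacobian.
    return np.array([x[0] ** 2, x[0] * x[1]])

x0 = np.array([1.0, 2.0])
f0 = f(x0)
h = np.sqrt(np.finfo(np.float64).eps)  # assumed relative step

def matvec(p):
    p = np.ravel(p)
    if not np.any(p):
        return np.zeros(f0.size)
    dx = h / np.linalg.norm(p)  # scale the step by the direction length
    return (f(x0 + dx * p) - f0) / dx

J_op = LinearOperator((f0.size, x0.size), matvec=matvec, dtype=np.float64)

p = np.array([3.0, 4.0])
J_exact = np.array([[2.0, 0.0], [2.0, 1.0]])  # analytic Jacobian at x0
approx = J_op.matvec(p)
print(approx, J_exact @ p)
```

Each product costs a single call to `f`, regardless of `n`, which is why the operator form pays off when only `J @ p` is needed and the full matrix is not.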
def _dense_difference(fun, x0, f0, h, use_one_sided, method):
m = f0.size
n = x0.size
J_transposed = np.empty((n, m))
h_vecs = np.diag(h)
for i in range(h.size):
if method == '2-point':
x = x0 + h_vecs[i]
dx = x[i] - x0[i] # Recompute dx as exactly representable number.
df = fun(x) - f0
elif method == '3-point' and use_one_sided[i]:
x1 = x0 + h_vecs[i]
x2 = x0 + 2 * h_vecs[i]
dx = x2[i] - x0[i]
f1 = fun(x1)
f2 = fun(x2)
df = -3.0 * f0 + 4 * f1 - f2
elif method == '3-point' and not use_one_sided[i]:
x1 = x0 - h_vecs[i]
x2 = x0 + h_vecs[i]
dx = x2[i] - x1[i]
f1 = fun(x1)
f2 = fun(x2)
df = f2 - f1
elif method == 'cs':
f1 = fun(x0 + h_vecs[i]*1.j)
df = f1.imag
dx = h_vecs[i, i]
else:
raise RuntimeError("Never be here.")
J_transposed[i] = df / dx
if m == 1:
J_transposed = np.ravel(J_transposed)
return J_transposed.T
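The three difference formulas used in the loop above can be compared on a scalar function with a known derivative. This is an illustrative sketch, using `np.exp` as an assumed test function and the default step sizes `EPS**(1/2)` and `EPS**(1/3)`.

```python
import numpy as np

f = np.exp  # f'(x) = exp(x), so the exact derivative is known
x0 = 1.0
exact = np.exp(x0)
eps = np.finfo(np.float64).eps

h2 = eps ** 0.5      # step suited to first-order formulas
h3 = eps ** (1 / 3)  # step suited to second-order formulas

# '2-point': forward difference.
forward = (f(x0 + h2) - f(x0)) / h2
# '3-point', interior: central difference over a span of 2*h.
central = (f(x0 + h3) - f(x0 - h3)) / (2 * h3)
# '3-point', near a bound: one-sided second-order formula.
one_sided = (-3 * f(x0) + 4 * f(x0 + h3) - f(x0 + 2 * h3)) / (2 * h3)

errors = {name: abs(val - exact) for name, val in
          [("forward", forward), ("central", central), ("one_sided", one_sided)]}
print(errors)
```

Both second-order variants divide by the full span between the outermost points, matching `dx = x2[i] - x1[i]` and `dx = x2[i] - x0[i]` in the code.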
def _sparse_difference(fun, x0, f0, h, use_one_sided,
structure, groups, method):
m = f0.size
n = x0.size
row_indices = []
col_indices = []
fractions = []
n_groups = np.max(groups) + 1
for group in range(n_groups):
# Perturb variables which are in the same group simultaneously.
e = np.equal(group, groups)
h_vec = h * e
if method == '2-point':
x = x0 + h_vec
dx = x - x0
df = fun(x) - f0
# The result is written to columns which correspond to perturbed
# variables.
cols, = np.nonzero(e)
# Find all non-zero elements in selected columns of Jacobian.
i, j, _ = find(structure[:, cols])
# Restore column indices in the full array.
j = cols[j]
elif method == '3-point':
# Here we do conceptually the same but separate one-sided
# and two-sided schemes.
x1 = x0.copy()
x2 = x0.copy()
mask_1 = use_one_sided & e
x1[mask_1] += h_vec[mask_1]
x2[mask_1] += 2 * h_vec[mask_1]
mask_2 = ~use_one_sided & e
x1[mask_2] -= h_vec[mask_2]
x2[mask_2] += h_vec[mask_2]
dx = np.zeros(n)
dx[mask_1] = x2[mask_1] - x0[mask_1]
dx[mask_2] = x2[mask_2] - x1[mask_2]
f1 = fun(x1)
f2 = fun(x2)
cols, = np.nonzero(e)
i, j, _ = find(structure[:, cols])
j = cols[j]
mask = use_one_sided[j]
df = np.empty(m)
rows = i[mask]
df[rows] = -3 * f0[rows] + 4 * f1[rows] - f2[rows]
rows = i[~mask]
df[rows] = f2[rows] - f1[rows]
elif method == 'cs':
f1 = fun(x0 + h_vec*1.j)
df = f1.imag
dx = h_vec
cols, = np.nonzero(e)
i, j, _ = find(structure[:, cols])
j = cols[j]
else:
raise ValueError("Never be here.")
# All that's left is to compute the fraction. We store i, j and
# fractions as separate arrays and later construct coo_matrix.
row_indices.append(i)
col_indices.append(j)
fractions.append(df[i] / dx[j])
row_indices = np.hstack(row_indices)
col_indices = np.hstack(col_indices)
fractions = np.hstack(fractions)
J = coo_matrix((fractions, (row_indices, col_indices)), shape=(m, n))
return csr_matrix(J)
def check_derivative(fun, jac, x0, bounds=(-np.inf, np.inf), args=(),
kwargs={}):
"""Check correctness of a function computing derivatives (Jacobian or
gradient) by comparison with a finite difference approximation.
Parameters
----------
fun : callable
Function of which to estimate the derivatives. The argument x
passed to this function is ndarray of shape (n,) (never a scalar
even if n=1). It must return 1-D array_like of shape (m,) or a scalar.
jac : callable
Function which computes Jacobian matrix of `fun`. It must work with
argument x the same way as `fun`. The return value must be array_like
or sparse matrix with an appropriate shape.
x0 : array_like of shape (n,) or float
Point at which to estimate the derivatives. Float will be converted
to 1-D array.
bounds : 2-tuple of array_like, optional
Lower and upper bounds on independent variables. Defaults to no bounds.
Each bound must match the size of `x0` or be a scalar, in the latter
case the bound will be the same for all variables. Use it to limit the
range of function evaluation.
args, kwargs : tuple and dict, optional
Additional arguments passed to `fun` and `jac`. Both empty by default.
The calling signature is ``fun(x, *args, **kwargs)`` and the same
for `jac`.
Returns
-------
accuracy : float
The maximum among all relative errors for elements with absolute values
higher than 1 and absolute errors for elements with absolute values
less than or equal to 1. If `accuracy` is on the order of 1e-6 or lower,
then it is likely that your `jac` implementation is correct.
See Also
--------
approx_derivative : Compute finite difference approximation of derivative.
Examples
--------
>>> import numpy as np
>>> from scipy.optimize._numdiff import check_derivative
>>>
>>>
>>> def f(x, c1, c2):
... return np.array([x[0] * np.sin(c1 * x[1]),
... x[0] * np.cos(c2 * x[1])])
...
>>> def jac(x, c1, c2):
... return np.array([
... [np.sin(c1 * x[1]), c1 * x[0] * np.cos(c1 * x[1])],
... [np.cos(c2 * x[1]), -c2 * x[0] * np.sin(c2 * x[1])]
... ])
...
>>>
>>> x0 = np.array([1.0, 0.5 * np.pi])
>>> check_derivative(f, jac, x0, args=(1, 2))
2.4492935982947064e-16
"""
J_to_test = jac(x0, *args, **kwargs)
if issparse(J_to_test):
J_diff = approx_derivative(fun, x0, bounds=bounds, sparsity=J_to_test,
args=args, kwargs=kwargs)
J_to_test = csr_matrix(J_to_test)
abs_err = J_to_test - J_diff
i, j, abs_err_data = find(abs_err)
J_diff_data = np.asarray(J_diff[i, j]).ravel()
return np.max(np.abs(abs_err_data) /
np.maximum(1, np.abs(J_diff_data)))
else:
J_diff = approx_derivative(fun, x0, bounds=bounds,
args=args, kwargs=kwargs)
abs_err = np.abs(J_to_test - J_diff)
return np.max(abs_err / np.maximum(1, np.abs(J_diff)))
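A sketch of how `check_derivative` might be used to catch a mistake in a hand-written Jacobian. The functions and the evaluation point are made up for illustration, and the import relies on the private `scipy.optimize._numdiff` module, as in the docstring example.

```python
import numpy as np
from scipy.optimize._numdiff import check_derivative  # private module

def f(x):
    return np.array([x[0] * np.sin(x[1]), x[0] * np.cos(2 * x[1])])

def jac_good(x):
    return np.array([[np.sin(x[1]), x[0] * np.cos(x[1])],
                     [np.cos(2 * x[1]), -2 * x[0] * np.sin(2 * x[1])]])

def jac_bad(x):
    J = jac_good(x)
    J[1, 1] = -J[1, 1]  # deliberate sign error
    return J

x0 = np.array([1.0, 0.3])
err_good = check_derivative(f, jac_good, x0)
err_bad = check_derivative(f, jac_bad, x0)
print(err_good, err_bad)
```

A correct Jacobian should give an accuracy well below 1e-6, while the sign error produces a value of order one.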
File diff suppressed because it is too large
@@ -0,0 +1,731 @@
import numpy as np
import operator
from . import (linear_sum_assignment, OptimizeResult)
from ._optimize import _check_unknown_options
from scipy._lib._util import check_random_state
import itertools
QUADRATIC_ASSIGNMENT_METHODS = ['faq', '2opt']
def quadratic_assignment(A, B, method="faq", options=None):
r"""
Approximates solution to the quadratic assignment problem and
the graph matching problem.
Quadratic assignment solves problems of the following form:
.. math::
\min_P & \ {\ \text{trace}(A^T P B P^T)}\\
\mbox{s.t. } & {P \ \in \ \mathcal{P}}\\
where :math:`\mathcal{P}` is the set of all permutation matrices,
and :math:`A` and :math:`B` are square matrices.
Graph matching tries to *maximize* the same objective function.
This algorithm can be thought of as finding the alignment of the
nodes of two graphs that minimizes the number of induced edge
disagreements, or, in the case of weighted graphs, the sum of squared
edge weight differences.
Note that the quadratic assignment problem is NP-hard. The results given
here are approximations and are not guaranteed to be optimal.
Parameters
----------
A : 2-D array, square
The square matrix :math:`A` in the objective function above.
B : 2-D array, square
The square matrix :math:`B` in the objective function above.
method : str in {'faq', '2opt'} (default: 'faq')
The algorithm used to solve the problem.
:ref:`'faq' <optimize.qap-faq>` (default) and
:ref:`'2opt' <optimize.qap-2opt>` are available.
options : dict, optional
A dictionary of solver options. All solvers support the following:
maximize : bool (default: False)
Maximizes the objective function if ``True``.
partial_match : 2-D array of integers, optional (default: None)
Fixes part of the matching. Also known as a "seed" [2]_.
Each row of `partial_match` specifies a pair of matched nodes:
node ``partial_match[i, 0]`` of `A` is matched to node
``partial_match[i, 1]`` of `B`. The array has shape ``(m, 2)``,
where ``m`` is not greater than the number of nodes, :math:`n`.
rng : {None, int, `numpy.random.Generator`,
`numpy.random.RandomState`}, optional
If `rng` is None (or `np.random`), the `numpy.random.RandomState`
singleton is used.
If `rng` is an int, a new ``RandomState`` instance is used,
seeded with `rng`.
If `rng` is already a ``Generator`` or ``RandomState`` instance then
that instance is used.
For method-specific options, see
:func:`show_options('quadratic_assignment') <show_options>`.
Returns
-------
res : OptimizeResult
`OptimizeResult` containing the following fields.
col_ind : 1-D array
Column indices corresponding to the best permutation found of the
nodes of `B`.
fun : float
The objective value of the solution.
nit : int
The number of iterations performed during optimization.
Notes
-----
The default method :ref:`'faq' <optimize.qap-faq>` uses the Fast
Approximate QAP algorithm [1]_; it typically offers the best combination of
speed and accuracy.
Method :ref:`'2opt' <optimize.qap-2opt>` can be computationally expensive,
but may be a useful alternative, or it can be used to refine the solution
returned by another method.
References
----------
.. [1] J.T. Vogelstein, J.M. Conroy, V. Lyzinski, L.J. Podrazik,
S.G. Kratzer, E.T. Harley, D.E. Fishkind, R.J. Vogelstein, and
C.E. Priebe, "Fast approximate quadratic programming for graph
matching," PLOS one, vol. 10, no. 4, p. e0121002, 2015,
:doi:`10.1371/journal.pone.0121002`
.. [2] D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski,
C. Priebe, "Seeded graph matching", Pattern Recognit. 87 (2019):
203-215, :doi:`10.1016/j.patcog.2018.09.014`
.. [3] "2-opt," Wikipedia.
https://en.wikipedia.org/wiki/2-opt
Examples
--------
>>> import numpy as np
>>> from scipy.optimize import quadratic_assignment
>>> A = np.array([[0, 80, 150, 170], [80, 0, 130, 100],
... [150, 130, 0, 120], [170, 100, 120, 0]])
>>> B = np.array([[0, 5, 2, 7], [0, 0, 3, 8],
... [0, 0, 0, 3], [0, 0, 0, 0]])
>>> res = quadratic_assignment(A, B)
>>> print(res)
fun: 3260
col_ind: [0 3 2 1]
nit: 9
To see the relationship between the returned ``col_ind`` and ``fun``,
use ``col_ind`` to form the best permutation matrix found, then evaluate
the objective function :math:`f(P) = trace(A^T P B P^T )`.
>>> perm = res['col_ind']
>>> P = np.eye(len(A), dtype=int)[perm]
>>> fun = np.trace(A.T @ P @ B @ P.T)
>>> print(fun)
3260
Alternatively, to avoid constructing the permutation matrix explicitly,
directly permute the rows and columns of the distance matrix.
>>> fun = np.trace(A.T @ B[perm][:, perm])
>>> print(fun)
3260
Although not guaranteed in general, ``quadratic_assignment`` happens to
have found the globally optimal solution.
>>> from itertools import permutations
>>> perm_opt, fun_opt = None, np.inf
>>> for perm in permutations([0, 1, 2, 3]):
... perm = np.array(perm)
... fun = np.trace(A.T @ B[perm][:, perm])
... if fun < fun_opt:
... fun_opt, perm_opt = fun, perm
>>> print(np.array_equal(perm_opt, res['col_ind']))
True
Here is an example for which the default method,
:ref:`'faq' <optimize.qap-faq>`, does not find the global optimum.
>>> A = np.array([[0, 5, 8, 6], [5, 0, 5, 1],
... [8, 5, 0, 2], [6, 1, 2, 0]])
>>> B = np.array([[0, 1, 8, 4], [1, 0, 5, 2],
... [8, 5, 0, 5], [4, 2, 5, 0]])
>>> res = quadratic_assignment(A, B)
>>> print(res)
fun: 178
col_ind: [1 0 3 2]
nit: 13
If accuracy is important, consider using :ref:`'2opt' <optimize.qap-2opt>`
to refine the solution.
>>> guess = np.array([np.arange(len(A)), res.col_ind]).T
>>> res = quadratic_assignment(A, B, method="2opt",
... options={'partial_guess': guess})
>>> print(res)
fun: 176
col_ind: [1 2 3 0]
nit: 17
"""
if options is None:
options = {}
method = method.lower()
methods = {"faq": _quadratic_assignment_faq,
"2opt": _quadratic_assignment_2opt}
if method not in methods:
raise ValueError(f"method {method} must be in {methods}.")
res = methods[method](A, B, **options)
return res
def _calc_score(A, B, perm):
# equivalent to objective function but avoids matmul
return np.sum(A * B[perm][:, perm])
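The comment in `_calc_score` claims equivalence with the trace objective. A small sketch checking that claim, reusing the matrices and the permutation from the `quadratic_assignment` docstring example above:

```python
import numpy as np

A = np.array([[0, 80, 150, 170], [80, 0, 130, 100],
              [150, 130, 0, 120], [170, 100, 120, 0]])
B = np.array([[0, 5, 2, 7], [0, 0, 3, 8],
              [0, 0, 0, 3], [0, 0, 0, 0]])
perm = np.array([0, 3, 2, 1])

# Explicit permutation matrix: row i has its one in column perm[i].
P = np.eye(len(A), dtype=int)[perm]
trace_form = np.trace(A.T @ P @ B @ P.T)

# Elementwise form: (P B P^T)[i, j] == B[perm[i], perm[j]].
elementwise_form = np.sum(A * B[perm][:, perm])

print(trace_form, elementwise_form)  # both 3260
```

The elementwise form needs only O(n^2) multiplications, whereas the chained matrix products cost O(n^3), which matters because the score is evaluated repeatedly during the search.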
def _common_input_validation(A, B, partial_match):
A = np.atleast_2d(A)
B = np.atleast_2d(B)
if partial_match is None:
partial_match = np.array([[], []]).T
partial_match = np.atleast_2d(partial_match).astype(int)
msg = None
if A.shape[0] != A.shape[1]:
msg = "`A` must be square"
elif B.shape[0] != B.shape[1]:
msg = "`B` must be square"
elif A.ndim != 2 or B.ndim != 2:
msg = "`A` and `B` must have exactly two dimensions"
elif A.shape != B.shape:
msg = "`A` and `B` matrices must be of equal size"
elif partial_match.shape[0] > A.shape[0]:
msg = "`partial_match` can have only as many seeds as there are nodes"
elif partial_match.shape[1] != 2:
msg = "`partial_match` must have two columns"
elif partial_match.ndim != 2:
msg = "`partial_match` must have exactly two dimensions"
elif (partial_match < 0).any():
msg = "`partial_match` must contain only positive indices"
elif (partial_match >= len(A)).any():
msg = "`partial_match` entries must be less than number of nodes"
elif (not len(set(partial_match[:, 0])) == len(partial_match[:, 0]) or
not len(set(partial_match[:, 1])) == len(partial_match[:, 1])):
msg = "`partial_match` column entries must be unique"
if msg is not None:
raise ValueError(msg)
return A, B, partial_match
def _quadratic_assignment_faq(A, B,
maximize=False, partial_match=None, rng=None,
P0="barycenter", shuffle_input=False, maxiter=30,
tol=0.03, **unknown_options):
r"""Solve the quadratic assignment problem (approximately).
This function solves the Quadratic Assignment Problem (QAP) and the
Graph Matching Problem (GMP) using the Fast Approximate QAP Algorithm
(FAQ) [1]_.
Quadratic assignment solves problems of the following form:
.. math::
\min_P & \ {\ \text{trace}(A^T P B P^T)}\\
\mbox{s.t. } & {P \ \in \ \mathcal{P}}\\
where :math:`\mathcal{P}` is the set of all permutation matrices,
and :math:`A` and :math:`B` are square matrices.
Graph matching tries to *maximize* the same objective function.
This algorithm can be thought of as finding the alignment of the
nodes of two graphs that minimizes the number of induced edge
disagreements, or, in the case of weighted graphs, the sum of squared
edge weight differences.
Note that the quadratic assignment problem is NP-hard. The results given
here are approximations and are not guaranteed to be optimal.
Parameters
----------
A : 2-D array, square
The square matrix :math:`A` in the objective function above.
B : 2-D array, square
The square matrix :math:`B` in the objective function above.
method : str in {'faq', '2opt'} (default: 'faq')
The algorithm used to solve the problem. This is the method-specific
documentation for 'faq'.
:ref:`'2opt' <optimize.qap-2opt>` is also available.
Options
-------
maximize : bool (default: False)
Maximizes the objective function if ``True``.
partial_match : 2-D array of integers, optional (default: None)
Fixes part of the matching. Also known as a "seed" [2]_.
Each row of `partial_match` specifies a pair of matched nodes:
node ``partial_match[i, 0]`` of `A` is matched to node
``partial_match[i, 1]`` of `B`. The array has shape ``(m, 2)``, where
``m`` is not greater than the number of nodes, :math:`n`.
rng : {None, int, `numpy.random.Generator`,
`numpy.random.RandomState`}, optional
If `rng` is None (or `np.random`), the `numpy.random.RandomState`
singleton is used.
If `rng` is an int, a new ``RandomState`` instance is used,
seeded with `rng`.
If `rng` is already a ``Generator`` or ``RandomState`` instance then
that instance is used.
P0 : 2-D array, "barycenter", or "randomized" (default: "barycenter")
Initial position. Must be a doubly-stochastic matrix [3]_.
If the initial position is an array, it must be a doubly stochastic
matrix of size :math:`m' \times m'` where :math:`m' = n - m`.
If ``"barycenter"`` (default), the initial position is the barycenter
of the Birkhoff polytope (the space of doubly stochastic matrices).
This is a :math:`m' \times m'` matrix with all entries equal to
:math:`1 / m'`.
If ``"randomized"`` the initial search position is
:math:`P_0 = (J + K) / 2`, where :math:`J` is the barycenter and
:math:`K` is a random doubly stochastic matrix.
shuffle_input : bool (default: False)
Set to `True` to resolve degenerate gradients randomly. For
non-degenerate gradients this option has no effect.
maxiter : int, positive (default: 30)
Integer specifying the max number of Frank-Wolfe iterations performed.
tol : float (default: 0.03)
Tolerance for termination. Frank-Wolfe iteration terminates when
:math:`\frac{||P_{i}-P_{i+1}||_F}{\sqrt{m'}} \leq tol`,
where :math:`i` is the iteration number.
Returns
-------
res : OptimizeResult
`OptimizeResult` containing the following fields.
col_ind : 1-D array
Column indices corresponding to the best permutation found of the
nodes of `B`.
fun : float
The objective value of the solution.
nit : int
The number of Frank-Wolfe iterations performed.
Notes
-----
The algorithm may be sensitive to the initial permutation matrix (or
search "position") due to the possibility of several local minima
within the feasible region. A barycenter initialization is more likely to
result in a better solution than a single random initialization. However,
calling ``quadratic_assignment`` several times with different random
initializations may result in a better optimum at the cost of longer
total execution time.
Examples
--------
As mentioned above, a barycenter initialization often results in a better
solution than a single random initialization.
>>> import numpy as np
>>> from numpy.random import default_rng
>>> from scipy.optimize import quadratic_assignment
>>> rng = default_rng()
>>> n = 15
>>> A = rng.random((n, n))
>>> B = rng.random((n, n))
>>> res = quadratic_assignment(A, B) # FAQ is default method
>>> print(res.fun)
46.871483385480545 # may vary
>>> options = {"P0": "randomized"} # use randomized initialization
>>> res = quadratic_assignment(A, B, options=options)
>>> print(res.fun)
47.224831071310625 # may vary
However, consider running from several randomized initializations and
keeping the best result.
>>> res = min([quadratic_assignment(A, B, options=options)
... for i in range(30)], key=lambda x: x.fun)
>>> print(res.fun)
46.671852533681516 # may vary
The '2-opt' method can be used to further refine the results.
>>> options = {"partial_guess": np.array([np.arange(n), res.col_ind]).T}
>>> res = quadratic_assignment(A, B, method="2opt", options=options)
>>> print(res.fun)
46.47160735721583 # may vary
References
----------
.. [1] J.T. Vogelstein, J.M. Conroy, V. Lyzinski, L.J. Podrazik,
S.G. Kratzer, E.T. Harley, D.E. Fishkind, R.J. Vogelstein, and
C.E. Priebe, "Fast approximate quadratic programming for graph
matching," PLOS one, vol. 10, no. 4, p. e0121002, 2015,
:doi:`10.1371/journal.pone.0121002`
.. [2] D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski,
C. Priebe, "Seeded graph matching", Pattern Recognit. 87 (2019):
203-215, :doi:`10.1016/j.patcog.2018.09.014`
.. [3] "Doubly stochastic Matrix," Wikipedia.
https://en.wikipedia.org/wiki/Doubly_stochastic_matrix
"""
_check_unknown_options(unknown_options)
maxiter = operator.index(maxiter)
# ValueError check
A, B, partial_match = _common_input_validation(A, B, partial_match)
msg = None
if isinstance(P0, str) and P0 not in {'barycenter', 'randomized'}:
msg = "Invalid 'P0' parameter string"
elif maxiter <= 0:
msg = "'maxiter' must be a positive integer"
elif tol <= 0:
msg = "'tol' must be a positive float"
if msg is not None:
raise ValueError(msg)
rng = check_random_state(rng)
n = len(A) # number of vertices in graphs
n_seeds = len(partial_match) # number of seeds
n_unseed = n - n_seeds
# [1] Algorithm 1 Line 1 - choose initialization
if not isinstance(P0, str):
P0 = np.atleast_2d(P0)
if P0.shape != (n_unseed, n_unseed):
msg = "`P0` matrix must have shape m' x m', where m'=n-m"
elif ((P0 < 0).any() or not np.allclose(np.sum(P0, axis=0), 1)
or not np.allclose(np.sum(P0, axis=1), 1)):
msg = "`P0` matrix must be doubly stochastic"
if msg is not None:
raise ValueError(msg)
elif P0 == 'barycenter':
P0 = np.ones((n_unseed, n_unseed)) / n_unseed
elif P0 == 'randomized':
J = np.ones((n_unseed, n_unseed)) / n_unseed
# generate an m' x m' matrix where each entry is a random number in [0, 1)
# would use rand, but Generators don't have it
# would use random, but old mtrand.RandomStates don't have it
K = _doubly_stochastic(rng.uniform(size=(n_unseed, n_unseed)))
P0 = (J + K) / 2
# check trivial cases
if n == 0 or n_seeds == n:
score = _calc_score(A, B, partial_match[:, 1])
res = {"col_ind": partial_match[:, 1], "fun": score, "nit": 0}
return OptimizeResult(res)
obj_func_scalar = 1
if maximize:
obj_func_scalar = -1
nonseed_B = np.setdiff1d(range(n), partial_match[:, 1])
if shuffle_input:
nonseed_B = rng.permutation(nonseed_B)
nonseed_A = np.setdiff1d(range(n), partial_match[:, 0])
perm_A = np.concatenate([partial_match[:, 0], nonseed_A])
perm_B = np.concatenate([partial_match[:, 1], nonseed_B])
# definitions according to Seeded Graph Matching [2].
A11, A12, A21, A22 = _split_matrix(A[perm_A][:, perm_A], n_seeds)
B11, B12, B21, B22 = _split_matrix(B[perm_B][:, perm_B], n_seeds)
const_sum = A21 @ B21.T + A12.T @ B12
P = P0
# [1] Algorithm 1 Line 2 - loop while stopping criteria not met
for n_iter in range(1, maxiter+1):
# [1] Algorithm 1 Line 3 - compute the gradient of f(P) = -tr(APB^tP^t)
grad_fp = (const_sum + A22 @ P @ B22.T + A22.T @ P @ B22)
# [1] Algorithm 1 Line 4 - get direction Q by solving Eq. 8
_, cols = linear_sum_assignment(grad_fp, maximize=maximize)
Q = np.eye(n_unseed)[cols]
# [1] Algorithm 1 Line 5 - compute the step size
# Noting that e.g. trace(Ax) = trace(A)*x, expand and re-collect
# terms as ax**2 + bx + c. c does not affect location of minimum
# and can be ignored. Also, note that trace(A@B) = (A.T*B).sum();
# apply where possible for efficiency.
R = P - Q
b21 = ((R.T @ A21) * B21).sum()
b12 = ((R.T @ A12.T) * B12.T).sum()
AR22 = A22.T @ R
BR22 = B22 @ R.T
b22a = (AR22 * B22.T[cols]).sum()
b22b = (A22 * BR22[cols]).sum()
a = (AR22.T * BR22).sum()
b = b21 + b12 + b22a + b22b
# critical point of ax^2 + bx + c is at x = -b/(2*a)
# if a * obj_func_scalar > 0, it is a minimum
# if minimum is not in [0, 1], only endpoints need to be considered
if a*obj_func_scalar > 0 and 0 <= -b/(2*a) <= 1:
alpha = -b/(2*a)
else:
alpha = np.argmin([0, (b + a)*obj_func_scalar])
# [1] Algorithm 1 Line 6 - Update P
P_i1 = alpha * P + (1 - alpha) * Q
if np.linalg.norm(P - P_i1) / np.sqrt(n_unseed) < tol:
P = P_i1
break
P = P_i1
# [1] Algorithm 1 Line 7 - end main loop
# [1] Algorithm 1 Line 8 - project onto the set of permutation matrices
_, col = linear_sum_assignment(P, maximize=True)
perm = np.concatenate((np.arange(n_seeds), col + n_seeds))
unshuffled_perm = np.zeros(n, dtype=int)
unshuffled_perm[perm_A] = perm_B[perm]
score = _calc_score(A, B, unshuffled_perm)
res = {"col_ind": unshuffled_perm, "fun": score, "nit": n_iter}
return OptimizeResult(res)
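# Illustrative sketch, not part of the SciPy source: the step-size rule from
# the Frank-Wolfe loop above, written for the minimization case only
# (``maximize=False``). The name ``step_size`` is hypothetical. It minimizes
# the quadratic a*x**2 + b*x over [0, 1], using the vertex when the parabola
# is convex and the vertex lies in the interval, and the better endpoint
# otherwise.

```python
def step_size(a, b):
    """Return the minimizer of a*x**2 + b*x over the interval [0, 1]."""
    if a > 0 and 0 <= -b / (2 * a) <= 1:
        # Convex parabola whose vertex lies inside the interval.
        return -b / (2 * a)
    # Otherwise the minimum is at an endpoint: f(0) = 0 and f(1) = a + b.
    return 1.0 if a + b < 0 else 0.0


alpha_interior = step_size(1.0, -1.0)  # vertex at x = 0.5
alpha_concave = step_size(-1.0, 0.5)   # concave; f(1) = -0.5 < f(0) = 0
alpha_clipped = step_size(1.0, 1.0)    # vertex at x = -0.5, outside [0, 1]
```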
def _split_matrix(X, n):
# definitions according to Seeded Graph Matching [2].
upper, lower = X[:n], X[n:]
return upper[:, :n], upper[:, n:], lower[:, :n], lower[:, n:]
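# Illustrative sketch, not part of the SciPy source: how the four blocks
# returned by a splitter like ``_split_matrix`` relate to the input. The
# function is re-declared here under the hypothetical name ``split_matrix``
# so that the snippet is self-contained.

```python
import numpy as np


def split_matrix(X, n):
    # Split X into blocks [[X11, X12], [X21, X22]], where X11 is n x n.
    upper, lower = X[:n], X[n:]
    return upper[:, :n], upper[:, n:], lower[:, :n], lower[:, n:]


X = np.arange(16).reshape(4, 4)
X11, X12, X21, X22 = split_matrix(X, 1)

# Stitching the blocks back together recovers the original matrix.
reassembled = np.block([[X11, X12], [X21, X22]])
```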
def _doubly_stochastic(P, tol=1e-3):
# Adapted from @btaba implementation
# https://github.com/btaba/sinkhorn_knopp
# of Sinkhorn-Knopp algorithm
# https://projecteuclid.org/euclid.pjm/1102992505
max_iter = 1000
c = 1 / P.sum(axis=0)
r = 1 / (P @ c)
P_eps = P
for it in range(max_iter):
if ((np.abs(P_eps.sum(axis=1) - 1) < tol).all() and
(np.abs(P_eps.sum(axis=0) - 1) < tol).all()):
# All column/row sums ~= 1 within threshold
break
c = 1 / (r @ P)
r = 1 / (P @ c)
P_eps = r[:, None] * P * c
return P_eps
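# Illustrative sketch, not part of the SciPy source: a simplified
# Sinkhorn-Knopp iteration that alternately normalizes rows and columns of a
# positive matrix. It is a plainer variant than the scaling-vector form used
# above; the name ``sinkhorn_knopp`` and the fixed seed are assumptions made
# for the example.

```python
import numpy as np


def sinkhorn_knopp(P, tol=1e-3, max_iter=1000):
    """Scale a positive square matrix until it is nearly doubly stochastic."""
    P = np.asarray(P, dtype=float)
    for _ in range(max_iter):
        P = P / P.sum(axis=1, keepdims=True)  # make every row sum to 1
        P = P / P.sum(axis=0, keepdims=True)  # make every column sum to 1
        # Columns are now exact; stop once the rows are close enough too.
        if np.abs(P.sum(axis=1) - 1).max() < tol:
            break
    return P


rng = np.random.default_rng(0)
K = sinkhorn_knopp(rng.uniform(size=(5, 5)))
```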
def _quadratic_assignment_2opt(A, B, maximize=False, rng=None,
partial_match=None,
partial_guess=None,
**unknown_options):
r"""Solve the quadratic assignment problem (approximately).
This function solves the Quadratic Assignment Problem (QAP) and the
Graph Matching Problem (GMP) using the 2-opt algorithm [1]_.
Quadratic assignment solves problems of the following form:
.. math::
\min_P & \ {\ \text{trace}(A^T P B P^T)}\\
\mbox{s.t. } & {P \ \in \ \mathcal{P}}\\
where :math:`\mathcal{P}` is the set of all permutation matrices,
and :math:`A` and :math:`B` are square matrices.
Graph matching tries to *maximize* the same objective function.
This algorithm can be thought of as finding the alignment of the
nodes of two graphs that minimizes the number of induced edge
disagreements, or, in the case of weighted graphs, the sum of squared
edge weight differences.
Note that the quadratic assignment problem is NP-hard. The results given
here are approximations and are not guaranteed to be optimal.
Parameters
----------
A : 2-D array, square
The square matrix :math:`A` in the objective function above.
B : 2-D array, square
The square matrix :math:`B` in the objective function above.
method : str in {'faq', '2opt'} (default: 'faq')
The algorithm used to solve the problem. This is the method-specific
documentation for '2opt'.
:ref:`'faq' <optimize.qap-faq>` is also available.
Options
-------
maximize : bool (default: False)
Maximizes the objective function if ``True``.
rng : {None, int, `numpy.random.Generator`,
`numpy.random.RandomState`}, optional
If `rng` is None (or `np.random`), the `numpy.random.RandomState`
singleton is used.
If `rng` is an int, a new ``RandomState`` instance is used,
seeded with `rng`.
If `rng` is already a ``Generator`` or ``RandomState`` instance then
that instance is used.
partial_match : 2-D array of integers, optional (default: None)
Fixes part of the matching. Also known as a "seed" [2]_.
Each row of `partial_match` specifies a pair of matched nodes: node
``partial_match[i, 0]`` of `A` is matched to node
``partial_match[i, 1]`` of `B`. The array has shape ``(m, 2)``,
where ``m`` is not greater than the number of nodes, :math:`n`.
.. note::
`partial_match` must be sorted by the first column.
partial_guess : 2-D array of integers, optional (default: None)
A guess for the matching between the two matrices. Unlike
`partial_match`, `partial_guess` does not fix the indices; they are
still free to be optimized.
Each row of `partial_guess` specifies a pair of matched nodes: node
``partial_guess[i, 0]`` of `A` is matched to node
``partial_guess[i, 1]`` of `B`. The array has shape ``(m, 2)``,
where ``m`` is not greater than the number of nodes, :math:`n`.
.. note::
`partial_guess` must be sorted by the first column.
Returns
-------
res : OptimizeResult
`OptimizeResult` containing the following fields.
col_ind : 1-D array
Column indices corresponding to the best permutation found of the
nodes of `B`.
fun : float
The objective value of the solution.
nit : int
The number of iterations performed during optimization.
Notes
-----
This is a greedy algorithm that works similarly to bubble sort: beginning
with an initial permutation, it iteratively swaps pairs of indices to
improve the objective function until no such improvements are possible.
References
----------
.. [1] "2-opt," Wikipedia.
https://en.wikipedia.org/wiki/2-opt
.. [2] D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski,
C. Priebe, "Seeded graph matching", Pattern Recognit. 87 (2019):
203-215, :doi:`10.1016/j.patcog.2018.09.014`
"""
_check_unknown_options(unknown_options)
rng = check_random_state(rng)
A, B, partial_match = _common_input_validation(A, B, partial_match)
N = len(A)
# check trivial cases
if N == 0 or partial_match.shape[0] == N:
score = _calc_score(A, B, partial_match[:, 1])
res = {"col_ind": partial_match[:, 1], "fun": score, "nit": 0}
return OptimizeResult(res)
if partial_guess is None:
partial_guess = np.array([[], []]).T
partial_guess = np.atleast_2d(partial_guess).astype(int)
msg = None
if partial_guess.shape[0] > A.shape[0]:
msg = ("`partial_guess` can have only as "
"many entries as there are nodes")
elif partial_guess.shape[1] != 2:
msg = "`partial_guess` must have two columns"
elif partial_guess.ndim != 2:
msg = "`partial_guess` must have exactly two dimensions"
elif (partial_guess < 0).any():
msg = "`partial_guess` must contain only positive indices"
elif (partial_guess >= len(A)).any():
msg = "`partial_guess` entries must be less than number of nodes"
elif (not len(set(partial_guess[:, 0])) == len(partial_guess[:, 0]) or
not len(set(partial_guess[:, 1])) == len(partial_guess[:, 1])):
msg = "`partial_guess` column entries must be unique"
if msg is not None:
raise ValueError(msg)
fixed_rows = None
if partial_match.size or partial_guess.size:
# use partial_match and partial_guess for initial permutation,
# but randomly permute the rest.
guess_rows = np.zeros(N, dtype=bool)
guess_cols = np.zeros(N, dtype=bool)
fixed_rows = np.zeros(N, dtype=bool)
fixed_cols = np.zeros(N, dtype=bool)
perm = np.zeros(N, dtype=int)
rg, cg = partial_guess.T
guess_rows[rg] = True
guess_cols[cg] = True
perm[guess_rows] = cg
# match overrides guess
rf, cf = partial_match.T
fixed_rows[rf] = True
fixed_cols[cf] = True
perm[fixed_rows] = cf
random_rows = ~fixed_rows & ~guess_rows
random_cols = ~fixed_cols & ~guess_cols
perm[random_rows] = rng.permutation(np.arange(N)[random_cols])
else:
perm = rng.permutation(np.arange(N))
best_score = _calc_score(A, B, perm)
i_free = np.arange(N)
if fixed_rows is not None:
i_free = i_free[~fixed_rows]
better = operator.gt if maximize else operator.lt
n_iter = 0
done = False
while not done:
# equivalent to nested for loops i in range(N), j in range(i, N)
for i, j in itertools.combinations_with_replacement(i_free, 2):
n_iter += 1
perm[i], perm[j] = perm[j], perm[i]
score = _calc_score(A, B, perm)
if better(score, best_score):
best_score = score
break
# faster to swap back than to create a new list every time
perm[i], perm[j] = perm[j], perm[i]
else: # no swaps made
done = True
res = {"col_ind": perm, "fun": best_score, "nit": n_iter}
return OptimizeResult(res)
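# Illustrative sketch, not part of the SciPy source: a stripped-down 2-opt
# search for the minimization case, without seeds or guesses. The names
# ``qap_score`` and ``two_opt`` are hypothetical, and the objective
# ``sum(A * B[perm][:, perm])`` is an assumption about what ``_calc_score``
# computes, since that helper is not shown in this section.

```python
import itertools

import numpy as np


def qap_score(A, B, perm):
    # Assumed objective: trace(A^T P B P^T) for the permutation `perm`.
    return np.sum(A * B[perm][:, perm])


def two_opt(A, B, perm):
    """Swap pairs of entries of `perm` until no single swap lowers the score."""
    perm = np.array(perm)
    best = qap_score(A, B, perm)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(perm)), 2):
            perm[i], perm[j] = perm[j], perm[i]
            score = qap_score(A, B, perm)
            if score < best:
                best = score
                improved = True
                break  # restart the scan from the improved permutation
            perm[i], perm[j] = perm[j], perm[i]  # undo the swap
    return perm, best


rng = np.random.default_rng(0)
n = 6
A = rng.random((n, n))
B = rng.random((n, n))
start = np.arange(n)
perm, best = two_opt(A, B, start)
```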
@@ -0,0 +1,522 @@
"""
Routines for removing redundant (linearly dependent) equations from linear
programming equality constraints.
"""
# Author: Matt Haberland
import numpy as np
from scipy.linalg import svd
from scipy.linalg.interpolative import interp_decomp
import scipy
from scipy.linalg.blas import dtrsm
def _row_count(A):
"""
Counts the number of nonzeros in each row of input array A.
Nonzeros are defined as any element with absolute value greater than
tol = 1e-13. This value should probably be an input to the function.
Parameters
----------
A : 2-D array
An array representing a matrix
Returns
-------
rowcount : 1-D array
Number of nonzeros in each row of A
"""
tol = 1e-13
return np.array((abs(A) > tol).sum(axis=1)).flatten()
def _get_densest(A, eligibleRows):
"""
Returns the index of the densest row of A. Ignores rows that are not
eligible for consideration.
Parameters
----------
A : 2-D array
An array representing a matrix
eligibleRows : 1-D logical array
Values indicate whether the corresponding row of A is eligible
to be considered
Returns
-------
i_densest : int
Index of the densest row in A eligible for consideration
"""
rowCounts = _row_count(A)
return np.argmax(rowCounts * eligibleRows)
def _remove_zero_rows(A, b):
"""
Eliminates trivial equations from system of equations defined by Ax = b
and identifies trivial infeasibilities
Parameters
----------
A : 2-D array
An array representing the left-hand side of a system of equations
b : 1-D array
An array representing the right-hand side of a system of equations
Returns
-------
A : 2-D array
An array representing the left-hand side of a system of equations
b : 1-D array
An array representing the right-hand side of a system of equations
status: int
An integer indicating the status of the removal operation
0: No infeasibility identified
2: Trivially infeasible
message : str
A string descriptor of the exit status of the optimization.
"""
status = 0
message = ""
i_zero = _row_count(A) == 0
A = A[np.logical_not(i_zero), :]
if not np.allclose(b[i_zero], 0):
status = 2
message = "There is a zero row in A_eq with a nonzero corresponding " \
"entry in b_eq. The problem is infeasible."
b = b[np.logical_not(i_zero)]
return A, b, status, message
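# Illustrative sketch, not part of the SciPy source: zero-row elimination on
# a tiny system. A zero row with a zero right-hand side is dropped silently;
# a zero row with a nonzero right-hand side signals infeasibility. The name
# ``drop_zero_rows`` and the boolean return flag are simplifications made
# for the example.

```python
import numpy as np


def drop_zero_rows(A, b, tol=1e-13):
    # A row counts as zero when no entry exceeds `tol` in absolute value.
    zero = (np.abs(A) > tol).sum(axis=1) == 0
    # 0 * x = b_i has no solution unless b_i is (numerically) zero.
    infeasible = not np.allclose(b[zero], 0)
    return A[~zero], b[~zero], infeasible


A = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]])

A_ok, b_ok, bad_ok = drop_zero_rows(A, np.array([1.0, 0.0, 3.0]))
A_bad, b_bad, bad_bad = drop_zero_rows(A, np.array([1.0, 5.0, 3.0]))
```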
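# Update the dense LU factors in place after column ``j`` of the basis is
# replaced by ``v`` (a Bartels-Golub-style update), so the basis does not
# have to be refactorized from scratch at every step.
def bg_update_dense(plu, perm_r, v, j):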
LU, p = plu
vperm = v[perm_r]
u = dtrsm(1, LU, vperm, lower=1, diag=1)
LU[:j+1, j] = u[:j+1]
l = u[j+1:]
piv = LU[j, j]
LU[j+1:, j] += (l/piv)
return LU, p
def _remove_redundancy_pivot_dense(A, rhs, true_rank=None):
"""
Eliminates redundant equations from system of equations defined by Ax = b
and identifies infeasibilities.
Parameters
----------
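A : 2-D array
A matrix representing the left-hand side of a system of equations
rhs : 1-D array
An array representing the right-hand side of a system of equations
true_rank : int, optional
The true rank of A, if known. When provided, the search stops as soon as
all ``m - true_rank`` redundant rows have been found.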
Returns
-------
A : 2-D array
A matrix representing the left-hand side of a system of equations
rhs : 1-D array
An array representing the right-hand side of a system of equations
status: int
An integer indicating the status of the system
0: No infeasibility identified
2: Trivially infeasible
message : str
A string descriptor of the exit status of the optimization.
References
----------
.. [2] Andersen, Erling D. "Finding all linearly dependent rows in
large-scale linear programming." Optimization Methods and Software
6.3 (1995): 219-227.
"""
tolapiv = 1e-8
tolprimal = 1e-8
status = 0
message = ""
inconsistent = ("There is a linear combination of rows of A_eq that "
"results in zero, suggesting a redundant constraint. "
"However the same linear combination of b_eq is "
"nonzero, suggesting that the constraints conflict "
"and the problem is infeasible.")
A, rhs, status, message = _remove_zero_rows(A, rhs)
if status != 0:
return A, rhs, status, message
m, n = A.shape
v = list(range(m)) # Artificial column indices.
b = list(v) # Basis column indices.
# This is better as a list than a set because column order of basis matrix
# needs to be consistent.
d = [] # Indices of dependent rows
perm_r = None
A_orig = A
A = np.zeros((m, m + n), order='F')
np.fill_diagonal(A, 1)
A[:, m:] = A_orig
e = np.zeros(m)
js_candidates = np.arange(m, m+n, dtype=int) # candidate columns for basis
# manual masking was faster than masked array
js_mask = np.ones(js_candidates.shape, dtype=bool)
# Implements basic algorithm from [2]
# Uses some of the suggested improvements (removing zero rows and
# Bartels-Golub update idea).
# Removing column singletons would be easy, but it is not as important
# because the procedure is performed only on the equality constraint
# matrix from the original problem - not on the canonical form matrix,
# which would have many more column singletons due to slack variables
# from the inequality constraints.
# The thoughts on "crashing" the initial basis are only really useful if
# the matrix is sparse.
lu = np.eye(m, order='F'), np.arange(m) # initial LU is trivial
perm_r = lu[1]
for i in v:
e[i] = 1
if i > 0:
e[i-1] = 0
try: # fails for i==0 and any time it gets ill-conditioned
j = b[i-1]
lu = bg_update_dense(lu, perm_r, A[:, j], i-1)
except Exception:
lu = scipy.linalg.lu_factor(A[:, b])
LU, p = lu
perm_r = list(range(m))
for i1, i2 in enumerate(p):
perm_r[i1], perm_r[i2] = perm_r[i2], perm_r[i1]
pi = scipy.linalg.lu_solve(lu, e, trans=1)
js = js_candidates[js_mask]
batch = 50
# This is a tiny bit faster than looping over columns individually,
# like for j in js: if abs(A[:,j].transpose().dot(pi)) > tolapiv:
for j_index in range(0, len(js), batch):
j_indices = js[j_index: min(j_index+batch, len(js))]
c = abs(A[:, j_indices].transpose().dot(pi))
if (c > tolapiv).any():
j = js[j_index + np.argmax(c)] # very independent column
b[i] = j
js_mask[j-m] = False
break
else:
bibar = pi.T.dot(rhs.reshape(-1, 1))
bnorm = np.linalg.norm(rhs)
if abs(bibar)/(1+bnorm) > tolprimal: # inconsistent
status = 2
message = inconsistent
return A_orig, rhs, status, message
else: # dependent
d.append(i)
if true_rank is not None and len(d) == m - true_rank:
break # found all redundancies
keep = set(range(m))
keep = list(keep - set(d))
return A_orig[keep, :], rhs[keep], status, message
def _remove_redundancy_pivot_sparse(A, rhs):
"""
Eliminates redundant equations from system of equations defined by Ax = b
and identifies infeasibilities.
Parameters
----------
A : 2-D sparse matrix
A matrix representing the left-hand side of a system of equations
rhs : 1-D array
An array representing the right-hand side of a system of equations
Returns
-------
A : 2-D sparse matrix
A matrix representing the left-hand side of a system of equations
rhs : 1-D array
An array representing the right-hand side of a system of equations
status: int
An integer indicating the status of the system
0: No infeasibility identified
2: Trivially infeasible
message : str
A string descriptor of the exit status of the optimization.
References
----------
.. [2] Andersen, Erling D. "Finding all linearly dependent rows in
large-scale linear programming." Optimization Methods and Software
6.3 (1995): 219-227.
"""
tolapiv = 1e-8
tolprimal = 1e-8
status = 0
message = ""
inconsistent = ("There is a linear combination of rows of A_eq that "
"results in zero, suggesting a redundant constraint. "
"However the same linear combination of b_eq is "
"nonzero, suggesting that the constraints conflict "
"and the problem is infeasible.")
A, rhs, status, message = _remove_zero_rows(A, rhs)
if status != 0:
return A, rhs, status, message
m, n = A.shape
v = list(range(m)) # Artificial column indices.
b = list(v) # Basis column indices.
# This is better as a list than a set because column order of basis matrix
# needs to be consistent.
k = set(range(m, m+n)) # Structural column indices.
d = [] # Indices of dependent rows
A_orig = A
A = scipy.sparse.hstack((scipy.sparse.eye(m), A)).tocsc()
e = np.zeros(m)
# Implements basic algorithm from [2]
# Uses only one of the suggested improvements (removing zero rows).
# Removing column singletons would be easy, but it is not as important
# because the procedure is performed only on the equality constraint
# matrix from the original problem - not on the canonical form matrix,
# which would have many more column singletons due to slack variables
# from the inequality constraints.
# The thoughts on "crashing" the initial basis sound useful, but the
# description of the procedure seems to assume a lot of familiarity with
# the subject; it is not very explicit. I already went through enough
# trouble getting the basic algorithm working, so I was not interested in
# trying to decipher this, too. (Overall, the paper is fraught with
# mistakes and ambiguities - which is strange, because the rest of
# Andersen's papers are quite good.)
# I tried and tried and tried to improve performance using the
# Bartels-Golub update. It works, but it's only practical if the LU
# factorization can be specialized as described, and that is not possible
# until the SciPy SuperLU interface permits control over column
# permutation - see issue #7700.
for i in v:
B = A[:, b]
e[i] = 1
if i > 0:
e[i-1] = 0
pi = scipy.sparse.linalg.spsolve(B.transpose(), e).reshape(-1, 1)
js = list(k-set(b)) # not efficient, but this is not the time sink...
# Due to overhead, it tends to be faster (for problems tested) to
# compute the full matrix-vector product rather than individual
# vector-vector products (with the chance of terminating as soon
# as any are nonzero). For very large matrices, it might be worth
# it to compute, say, 100 or 1000 at a time and stop when a nonzero
# is found.
c = (np.abs(A[:, js].transpose().dot(pi)) > tolapiv).nonzero()[0]
if len(c) > 0: # independent
j = js[c[0]]
# in a previous commit, the previous line was changed to choose
# index j corresponding with the maximum dot product.
# While this avoided issues with almost
# singular matrices, it slowed the routine in most NETLIB tests.
# I think this is because these columns were denser than the
# first column with nonzero dot product (c[0]).
# It would be nice to have a heuristic that balances sparsity with
# high dot product, but I don't think it's worth the time to
# develop one right now. Bartels-Golub update is a much higher
# priority.
b[i] = j # replace artificial column
else:
bibar = pi.T.dot(rhs.reshape(-1, 1))
bnorm = np.linalg.norm(rhs)
if abs(bibar)/(1 + bnorm) > tolprimal:
status = 2
message = inconsistent
return A_orig, rhs, status, message
else: # dependent
d.append(i)
keep = set(range(m))
keep = list(keep - set(d))
return A_orig[keep, :], rhs[keep], status, message
def _remove_redundancy_svd(A, b):
"""
Eliminates redundant equations from system of equations defined by Ax = b
and identifies infeasibilities.
Parameters
----------
A : 2-D array
An array representing the left-hand side of a system of equations
b : 1-D array
An array representing the right-hand side of a system of equations
Returns
-------
A : 2-D array
An array representing the left-hand side of a system of equations
b : 1-D array
An array representing the right-hand side of a system of equations
status: int
An integer indicating the status of the system
0: No infeasibility identified
2: Trivially infeasible
message : str
A string descriptor of the exit status of the optimization.
References
----------
.. [2] Andersen, Erling D. "Finding all linearly dependent rows in
large-scale linear programming." Optimization Methods and Software
6.3 (1995): 219-227.
"""
A, b, status, message = _remove_zero_rows(A, b)
if status != 0:
return A, b, status, message
U, s, Vh = svd(A)
eps = np.finfo(float).eps
tol = s.max() * max(A.shape) * eps
m, n = A.shape
s_min = s[-1] if m <= n else 0
# this algorithm is faster than that of [2] when the nullspace is small
# but it could probably be improved by randomized algorithms and with
# a sparse implementation.
# it relies on repeated singular value decomposition to find linearly
# dependent rows (as identified by columns of U that correspond with zero
# singular values). Unfortunately, only one row can be removed per
# decomposition (I tried otherwise; doing so can cause problems.)
# It would be nice if we could do truncated SVD like sp.sparse.linalg.svds
# but that function is unreliable at finding singular values near zero.
# Finding max eigenvalue L of A A^T, then largest eigenvalue (and
# associated eigenvector) of -A A^T + L I (I is identity) via power
# iteration would also work in theory, but is only efficient if the
# smallest nonzero eigenvalue of A A^T is close to the largest nonzero
# eigenvalue.
while abs(s_min) < tol:
v = U[:, -1] # TODO: return these so user can eliminate from problem?
# rows need to be represented in significant amount
eligibleRows = np.abs(v) > tol * 10e6
if not np.any(eligibleRows) or np.any(np.abs(v.dot(A)) > tol):
status = 4
message = ("Due to numerical issues, redundant equality "
"constraints could not be removed automatically. "
"Try providing your constraint matrices as sparse "
"matrices to activate sparse presolve, try turning "
"off redundancy removal, or try turning off presolve "
"altogether.")
break
if np.any(np.abs(v.dot(b)) > tol * 100): # factor of 100 to fix 10038 and 10349
status = 2
message = ("There is a linear combination of rows of A_eq that "
"results in zero, suggesting a redundant constraint. "
"However the same linear combination of b_eq is "
"nonzero, suggesting that the constraints conflict "
"and the problem is infeasible.")
break
i_remove = _get_densest(A, eligibleRows)
A = np.delete(A, i_remove, axis=0)
b = np.delete(b, i_remove)
U, s, Vh = svd(A)
m, n = A.shape
s_min = s[-1] if m <= n else 0
return A, b, status, message
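# Illustrative sketch, not part of the SciPy source: how a left singular
# vector exposes a redundant row. In the matrix below the third row is the
# sum of the first two, so the smallest singular value is numerically zero
# and the matching column of U gives a combination of rows that cancels.
# The example matrix is an assumption chosen for clarity.

```python
import numpy as np
from scipy.linalg import svd

# Row 2 equals row 0 + row 1, so the rows are linearly dependent.
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 4.0]])

U, s, Vh = svd(A)

# Same tolerance formula as in the routine above.
tol = s.max() * max(A.shape) * np.finfo(float).eps

# Last column of U pairs with the smallest singular value.
v = U[:, -1]
residual = v @ A  # close to the zero vector: v encodes the dependency
```

# Here ``v`` is proportional to (1, 1, -1), i.e. row0 + row1 - row2 = 0.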
def _remove_redundancy_id(A, rhs, rank=None, randomized=True):
"""Eliminates redundant equations from a system of equations.
Eliminates redundant equations from system of equations defined by Ax = b
and identifies infeasibilities.
Parameters
----------
A : 2-D array
An array representing the left-hand side of a system of equations
rhs : 1-D array
An array representing the right-hand side of a system of equations
rank : int, optional
The rank of A
randomized: bool, optional
True for randomized interpolative decomposition
Returns
-------
A : 2-D array
An array representing the left-hand side of a system of equations
rhs : 1-D array
An array representing the right-hand side of a system of equations
status: int
An integer indicating the status of the system
0: No infeasibility identified
2: Trivially infeasible
message : str
A string descriptor of the exit status of the optimization.
"""
status = 0
message = ""
inconsistent = ("There is a linear combination of rows of A_eq that "
"results in zero, suggesting a redundant constraint. "
"However the same linear combination of b_eq is "
"nonzero, suggesting that the constraints conflict "
"and the problem is infeasible.")
A, rhs, status, message = _remove_zero_rows(A, rhs)
if status != 0:
return A, rhs, status, message
m, n = A.shape
k = rank
if rank is None:
k = np.linalg.matrix_rank(A)
idx, proj = interp_decomp(A.T, k, rand=randomized)
# first k entries in idx are indices of the independent rows
# remaining entries are the indices of the m-k dependent rows
# proj provides the linear combinations of rows of A2 that form the
# remaining m-k (dependent) rows. The same linear combination of entries
# in rhs2 must give the remaining m-k entries. If not, the system is
# inconsistent, and the problem is infeasible.
if not np.allclose(rhs[idx[:k]] @ proj, rhs[idx[k:]]):
status = 2
message = inconsistent
# sort indices because the other redundancy removal routines leave rows
# in original order and tests were written with that in mind
idx = sorted(idx[:k])
A2 = A[idx, :]
rhs2 = rhs[idx]
return A2, rhs2, status, message
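# Illustrative sketch, not part of the SciPy source: using an interpolative
# decomposition of ``A.T`` to separate independent rows from dependent ones
# and to check that the right-hand side obeys the same dependency. The
# matrix, the right-hand side, and the choice ``rand=False`` are assumptions
# made so that the example is small and deterministic.

```python
import numpy as np
from scipy.linalg.interpolative import interp_decomp

# Row 2 equals row 0 + row 1, and rhs follows the same rule (3 = 1 + 2).
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 4.0]])
rhs = np.array([1.0, 2.0, 3.0])

k = np.linalg.matrix_rank(A)  # rank 2 for this matrix
idx, proj = interp_decomp(A.T.copy(), k, rand=False)

independent = idx[:k]  # rows kept
dependent = idx[k:]    # rows expressed through the kept ones

# Dependent rows are reproduced by combining the independent rows.
rows_match = np.allclose(A[independent].T @ proj, A[dependent].T)

# The system is consistent only if rhs satisfies the same combination.
consistent = np.allclose(rhs[independent] @ proj, rhs[dependent])
```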
@@ -0,0 +1,711 @@
"""
Unified interfaces to root finding algorithms.
Functions
---------
- root : find a root of a vector function.
"""
__all__ = ['root']
import numpy as np
from warnings import warn
from ._optimize import MemoizeJac, OptimizeResult, _check_unknown_options
from ._minpack_py import _root_hybr, leastsq
from ._spectral import _root_df_sane
from . import _nonlin as nonlin
ROOT_METHODS = ['hybr', 'lm', 'broyden1', 'broyden2', 'anderson',
'linearmixing', 'diagbroyden', 'excitingmixing', 'krylov',
'df-sane']
def root(fun, x0, args=(), method='hybr', jac=None, tol=None, callback=None,
options=None):
r"""
Find a root of a vector function.
Parameters
----------
fun : callable
A vector function to find a root of.
x0 : ndarray
Initial guess.
args : tuple, optional
Extra arguments passed to the objective function and its Jacobian.
method : str, optional
Type of solver. Should be one of
- 'hybr' :ref:`(see here) <optimize.root-hybr>`
- 'lm' :ref:`(see here) <optimize.root-lm>`
- 'broyden1' :ref:`(see here) <optimize.root-broyden1>`
- 'broyden2' :ref:`(see here) <optimize.root-broyden2>`
- 'anderson' :ref:`(see here) <optimize.root-anderson>`
- 'linearmixing' :ref:`(see here) <optimize.root-linearmixing>`
- 'diagbroyden' :ref:`(see here) <optimize.root-diagbroyden>`
- 'excitingmixing' :ref:`(see here) <optimize.root-excitingmixing>`
- 'krylov' :ref:`(see here) <optimize.root-krylov>`
- 'df-sane' :ref:`(see here) <optimize.root-dfsane>`
jac : bool or callable, optional
If `jac` is a Boolean and is True, `fun` is assumed to return the
value of the Jacobian along with the objective function. If False, the
Jacobian will be estimated numerically.
`jac` can also be a callable returning the Jacobian of `fun`. In
this case, it must accept the same arguments as `fun`.
tol : float, optional
Tolerance for termination. For detailed control, use solver-specific
options.
callback : function, optional
Optional callback function. It is called on every iteration as
``callback(x, f)`` where `x` is the current solution and `f`
the corresponding residual. Supported by all methods except 'hybr'
and 'lm'.
options : dict, optional
A dictionary of solver options. E.g., `xtol` or `maxiter`, see
:obj:`show_options()` for details.
Returns
-------
sol : OptimizeResult
The solution represented as an ``OptimizeResult`` object.
Important attributes are: ``x`` the solution array, ``success`` a
Boolean flag indicating if the algorithm exited successfully and
``message`` which describes the cause of the termination. See
`OptimizeResult` for a description of other attributes.
See also
--------
show_options : Additional options accepted by the solvers
Notes
-----
This section describes the available solvers that can be selected by the
'method' parameter. The default method is *hybr*.
Method *hybr* uses a modification of the Powell hybrid method as
implemented in MINPACK [1]_.
Method *lm* solves the system of nonlinear equations in a least squares
sense using a modification of the Levenberg-Marquardt algorithm as
implemented in MINPACK [1]_.
Method *df-sane* is a derivative-free spectral method. [3]_
Methods *broyden1*, *broyden2*, *anderson*, *linearmixing*,
*diagbroyden*, *excitingmixing*, *krylov* are inexact Newton methods,
with backtracking or full line searches [2]_. Each method corresponds
to a particular Jacobian approximation.
- Method *broyden1* uses Broyden's first Jacobian approximation, also
known as Broyden's good method.
- Method *broyden2* uses Broyden's second Jacobian approximation, also
known as Broyden's bad method.
- Method *anderson* uses (extended) Anderson mixing.
- Method *krylov* uses a Krylov approximation for the inverse Jacobian. It
is suitable for large-scale problems.
- Method *diagbroyden* uses diagonal Broyden Jacobian approximation.
- Method *linearmixing* uses a scalar Jacobian approximation.
- Method *excitingmixing* uses a tuned diagonal Jacobian
approximation.
.. warning::
The algorithms implemented for methods *diagbroyden*,
*linearmixing* and *excitingmixing* may be useful for specific
problems, but whether they will work may depend strongly on the
problem.
.. versionadded:: 0.11.0
References
----------
.. [1] More, Jorge J., Burton S. Garbow, and Kenneth E. Hillstrom.
1980. User Guide for MINPACK-1.
.. [2] C. T. Kelley. 1995. Iterative Methods for Linear and Nonlinear
Equations. Society for Industrial and Applied Mathematics.
<https://archive.siam.org/books/kelley/fr16/>
.. [3] W. La Cruz, J.M. Martinez, M. Raydan. Math. Comp. 75, 1429 (2006).
Examples
--------
The following functions define a system of nonlinear equations and its
Jacobian.
>>> import numpy as np
>>> def fun(x):
... return [x[0] + 0.5 * (x[0] - x[1])**3 - 1.0,
... 0.5 * (x[1] - x[0])**3 + x[1]]
>>> def jac(x):
... return np.array([[1 + 1.5 * (x[0] - x[1])**2,
... -1.5 * (x[0] - x[1])**2],
... [-1.5 * (x[1] - x[0])**2,
... 1 + 1.5 * (x[1] - x[0])**2]])
A solution can be obtained as follows.
>>> from scipy import optimize
>>> sol = optimize.root(fun, [0, 0], jac=jac, method='hybr')
>>> sol.x
array([ 0.8411639, 0.1588361])
**Large problem**
Suppose that we needed to solve the following integrodifferential
equation on the square :math:`[0,1]\times[0,1]`:
.. math::
\nabla^2 P = 10 \left(\int_0^1\int_0^1\cosh(P)\,dx\,dy\right)^2
with :math:`P(x,1) = 1` and :math:`P=0` elsewhere on the boundary of
the square.
The solution can be found using the ``method='krylov'`` solver:
>>> from scipy import optimize
>>> # parameters
>>> nx, ny = 75, 75
>>> hx, hy = 1./(nx-1), 1./(ny-1)
>>> P_left, P_right = 0, 0
>>> P_top, P_bottom = 1, 0
>>> def residual(P):
... d2x = np.zeros_like(P)
... d2y = np.zeros_like(P)
...
... d2x[1:-1] = (P[2:] - 2*P[1:-1] + P[:-2]) / hx/hx
... d2x[0] = (P[1] - 2*P[0] + P_left)/hx/hx
... d2x[-1] = (P_right - 2*P[-1] + P[-2])/hx/hx
...
... d2y[:,1:-1] = (P[:,2:] - 2*P[:,1:-1] + P[:,:-2])/hy/hy
... d2y[:,0] = (P[:,1] - 2*P[:,0] + P_bottom)/hy/hy
... d2y[:,-1] = (P_top - 2*P[:,-1] + P[:,-2])/hy/hy
...
... return d2x + d2y - 10*np.cosh(P).mean()**2
>>> guess = np.zeros((nx, ny), float)
>>> sol = optimize.root(residual, guess, method='krylov')
>>> print('Residual: %g' % abs(residual(sol.x)).max())
Residual: 5.7972e-06 # may vary
>>> import matplotlib.pyplot as plt
>>> x, y = np.mgrid[0:1:(nx*1j), 0:1:(ny*1j)]
>>> plt.pcolormesh(x, y, sol.x, shading='gouraud')
>>> plt.colorbar()
>>> plt.show()
"""
if not isinstance(args, tuple):
args = (args,)
meth = method.lower()
if options is None:
options = {}
if callback is not None and meth in ('hybr', 'lm'):
warn('Method %s does not accept callback.' % method,
RuntimeWarning, stacklevel=2)
# fun also returns the Jacobian
if not callable(jac) and meth in ('hybr', 'lm'):
if bool(jac):
fun = MemoizeJac(fun)
jac = fun.derivative
else:
jac = None
# set default tolerances
if tol is not None:
options = dict(options)
if meth in ('hybr', 'lm'):
options.setdefault('xtol', tol)
elif meth in ('df-sane',):
options.setdefault('ftol', tol)
elif meth in ('broyden1', 'broyden2', 'anderson', 'linearmixing',
'diagbroyden', 'excitingmixing', 'krylov'):
options.setdefault('xtol', tol)
options.setdefault('xatol', np.inf)
options.setdefault('ftol', np.inf)
options.setdefault('fatol', np.inf)
if meth == 'hybr':
sol = _root_hybr(fun, x0, args=args, jac=jac, **options)
elif meth == 'lm':
sol = _root_leastsq(fun, x0, args=args, jac=jac, **options)
elif meth == 'df-sane':
_warn_jac_unused(jac, method)
sol = _root_df_sane(fun, x0, args=args, callback=callback,
**options)
elif meth in ('broyden1', 'broyden2', 'anderson', 'linearmixing',
'diagbroyden', 'excitingmixing', 'krylov'):
_warn_jac_unused(jac, method)
sol = _root_nonlin_solve(fun, x0, args=args, jac=jac,
_method=meth, _callback=callback,
**options)
else:
raise ValueError('Unknown solver %s' % method)
return sol
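# Illustrative sketch only, not part of the module: the `jac=True` path
# handled above, where a single callable returns both the residual and its
# Jacobian. The name `fun_and_jac` is made up here; the system is the one
# from the docstring example of `root`.

```python
import numpy as np
from scipy import optimize


def fun_and_jac(x):
    # Return the residual vector and its Jacobian from one evaluation.
    d = x[0] - x[1]
    f = [x[0] + 0.5 * d**3 - 1.0,
         -0.5 * d**3 + x[1]]
    J = np.array([[1 + 1.5 * d**2, -1.5 * d**2],
                  [-1.5 * d**2, 1 + 1.5 * d**2]])
    return f, J


# jac=True tells `root` that fun_and_jac supplies the Jacobian itself.
sol = optimize.root(fun_and_jac, [0, 0], jac=True, method='hybr')
```

# The result should agree with the docstring example that passes a separate
# `jac` callable, roughly x = [0.8411639, 0.1588361].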
def _warn_jac_unused(jac, method):
if jac is not None:
warn(f'Method {method} does not use the jacobian (jac).',
RuntimeWarning, stacklevel=2)
def _root_leastsq(fun, x0, args=(), jac=None,
col_deriv=0, xtol=1.49012e-08, ftol=1.49012e-08,
gtol=0.0, maxiter=0, eps=0.0, factor=100, diag=None,
**unknown_options):
"""
Solve for least squares with Levenberg-Marquardt.
Options
-------
col_deriv : bool
Non-zero to specify that the Jacobian function computes derivatives
down the columns (faster, because there is no transpose operation).
ftol : float
Relative error desired in the sum of squares.
xtol : float
Relative error desired in the approximate solution.
gtol : float
Orthogonality desired between the function vector and the columns
of the Jacobian.
maxiter : int
The maximum number of calls to the function. If zero, then
100*(N+1) is the maximum where N is the number of elements in x0.
eps : float
A suitable step length for the forward-difference approximation of
the Jacobian (used when `jac` is None). If `eps` is less than the machine
precision, it is assumed that the relative errors in the functions
are of the order of the machine precision.
factor : float
A parameter determining the initial step bound
(``factor * || diag * x||``). Should be in the interval ``(0.1, 100)``.
diag : sequence
N positive entries that serve as scale factors for the variables.
"""
_check_unknown_options(unknown_options)
x, cov_x, info, msg, ier = leastsq(fun, x0, args=args, Dfun=jac,
full_output=True,
col_deriv=col_deriv, xtol=xtol,
ftol=ftol, gtol=gtol,
maxfev=maxiter, epsfcn=eps,
factor=factor, diag=diag)
sol = OptimizeResult(x=x, message=msg, status=ier,
success=ier in (1, 2, 3, 4), cov_x=cov_x,
fun=info.pop('fvec'), method="lm")
sol.update(info)
return sol
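# Illustrative sketch only, not part of the module: calling the 'lm' solver
# through `root` and reading the fields that the wrapper above places on the
# result (`status`, `success`, `fun`, `cov_x`). The system is the one from the
# `root` docstring; the option values are arbitrary choices for this example.

```python
import numpy as np
from scipy import optimize


def fun(x):
    return [x[0] + 0.5 * (x[0] - x[1])**3 - 1.0,
            0.5 * (x[1] - x[0])**3 + x[1]]


def jac(x):
    return np.array([[1 + 1.5 * (x[0] - x[1])**2,
                      -1.5 * (x[0] - x[1])**2],
                     [-1.5 * (x[1] - x[0])**2,
                      1 + 1.5 * (x[1] - x[0])**2]])


# 'maxiter' bounds the number of function calls for this method.
sol = optimize.root(fun, [0, 0], jac=jac, method='lm',
                    options={'xtol': 1e-10, 'maxiter': 200})

max_residual = np.abs(sol.fun).max()   # residual vector at the solution
```

# `sol.status` carries the MINPACK exit code; values 1 to 4 are the ones the
# wrapper treats as success.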
def _root_nonlin_solve(fun, x0, args=(), jac=None,
_callback=None, _method=None,
nit=None, disp=False, maxiter=None,
ftol=None, fatol=None, xtol=None, xatol=None,
tol_norm=None, line_search='armijo', jac_options=None,
**unknown_options):
_check_unknown_options(unknown_options)
f_tol = fatol
f_rtol = ftol
x_tol = xatol
x_rtol = xtol
verbose = disp
if jac_options is None:
jac_options = dict()
jacobian = {'broyden1': nonlin.BroydenFirst,
'broyden2': nonlin.BroydenSecond,
'anderson': nonlin.Anderson,
'linearmixing': nonlin.LinearMixing,
'diagbroyden': nonlin.DiagBroyden,
'excitingmixing': nonlin.ExcitingMixing,
'krylov': nonlin.KrylovJacobian
}[_method]
if args:
if jac is True:
def f(x):
return fun(x, *args)[0]
else:
def f(x):
return fun(x, *args)
else:
f = fun
x, info = nonlin.nonlin_solve(f, x0, jacobian=jacobian(**jac_options),
iter=nit, verbose=verbose,
maxiter=maxiter, f_tol=f_tol,
f_rtol=f_rtol, x_tol=x_tol,
x_rtol=x_rtol, tol_norm=tol_norm,
line_search=line_search,
callback=_callback, full_output=True,
raise_exception=False)
sol = OptimizeResult(x=x, method=_method)
sol.update(info)
return sol
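# Illustrative sketch only, not part of the module: one of the inexact Newton
# methods dispatched above, with a callback that receives the current iterate
# and residual on every iteration. The problem is the one from the 'broyden1'
# options docstring; `history` and `record` are names made up for this
# example.

```python
import numpy as np
from scipy import optimize


def func(x):
    return np.cos(x) + x[::-1] - [1, 2, 3, 4]


history = []


def record(x, f):
    # Store the max-norm of the residual at each iteration.
    history.append(float(np.abs(f).max()))


res = optimize.root(func, [1, 1, 1, 1], method='broyden1', tol=1e-14,
                    callback=record)
check = np.cos(res.x) + res.x[::-1]    # should be close to [1, 2, 3, 4]
```

# Because `tol` is given, it is used as the relative step tolerance and the
# other tolerances are left at infinity unless set explicitly in `options`.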
def _root_broyden1_doc():
"""
Options
-------
nit : int, optional
Number of iterations to make. If omitted (default), make as many
as required to meet tolerances.
disp : bool, optional
Print status to stdout on every iteration.
maxiter : int, optional
Maximum number of iterations to make.
ftol : float, optional
Relative tolerance for the residual. If omitted, not used.
fatol : float, optional
Absolute tolerance (in max-norm) for the residual.
If omitted, default is 6e-6.
xtol : float, optional
Relative minimum step size. If omitted, not used.
xatol : float, optional
Absolute minimum step size, as determined from the Jacobian
approximation. If the step size is smaller than this, optimization
is terminated as successful. If omitted, not used.
tol_norm : function(vector) -> scalar, optional
Norm to use in convergence check. Default is the maximum norm.
line_search : {None, 'armijo' (default), 'wolfe'}, optional
Which type of a line search to use to determine the step size in
the direction given by the Jacobian approximation. Defaults to
'armijo'.
jac_options : dict, optional
Options for the respective Jacobian approximation.
alpha : float, optional
Initial guess for the Jacobian is (-1/alpha).
reduction_method : str or tuple, optional
Method used in ensuring that the rank of the Broyden
matrix stays low. Can either be a string giving the
name of the method, or a tuple of the form ``(method,
param1, param2, ...)`` that gives the name of the
method and values for additional parameters.
Methods available:
- ``restart``
Drop all matrix columns. Has no
extra parameters.
- ``simple``
Drop oldest matrix column. Has no
extra parameters.
- ``svd``
Keep only the most significant SVD
components.
Extra parameters:
- ``to_retain``
Number of SVD components to
retain when rank reduction is done.
Default is ``max_rank - 2``.
max_rank : int, optional
Maximum rank for the Broyden matrix.
Default is infinity (i.e., no rank reduction).
Examples
--------
>>> def func(x):
... return np.cos(x) + x[::-1] - [1, 2, 3, 4]
...
>>> from scipy import optimize
>>> res = optimize.root(func, [1, 1, 1, 1], method='broyden1', tol=1e-14)
>>> x = res.x
>>> x
array([4.04674914, 3.91158389, 2.71791677, 1.61756251])
>>> np.cos(x) + x[::-1]
array([1., 2., 3., 4.])
"""
pass
def _root_broyden2_doc():
"""
Options
-------
nit : int, optional
Number of iterations to make. If omitted (default), make as many
as required to meet tolerances.
disp : bool, optional
Print status to stdout on every iteration.
maxiter : int, optional
Maximum number of iterations to make.
ftol : float, optional
Relative tolerance for the residual. If omitted, not used.
fatol : float, optional
Absolute tolerance (in max-norm) for the residual.
If omitted, default is 6e-6.
xtol : float, optional
Relative minimum step size. If omitted, not used.
xatol : float, optional
Absolute minimum step size, as determined from the Jacobian
approximation. If the step size is smaller than this, optimization
is terminated as successful. If omitted, not used.
tol_norm : function(vector) -> scalar, optional
Norm to use in convergence check. Default is the maximum norm.
line_search : {None, 'armijo' (default), 'wolfe'}, optional
Which type of a line search to use to determine the step size in
the direction given by the Jacobian approximation. Defaults to
'armijo'.
jac_options : dict, optional
Options for the respective Jacobian approximation.
alpha : float, optional
Initial guess for the Jacobian is (-1/alpha).
reduction_method : str or tuple, optional
Method used in ensuring that the rank of the Broyden
matrix stays low. Can either be a string giving the
name of the method, or a tuple of the form ``(method,
param1, param2, ...)`` that gives the name of the
method and values for additional parameters.
Methods available:
- ``restart``
Drop all matrix columns. Has no
extra parameters.
- ``simple``
Drop oldest matrix column. Has no
extra parameters.
- ``svd``
Keep only the most significant SVD
components.
Extra parameters:
- ``to_retain``
Number of SVD components to
retain when rank reduction is done.
Default is ``max_rank - 2``.
max_rank : int, optional
Maximum rank for the Broyden matrix.
Default is infinity (i.e., no rank reduction).
"""
pass
def _root_anderson_doc():
"""
Options
-------
nit : int, optional
Number of iterations to make. If omitted (default), make as many
as required to meet tolerances.
disp : bool, optional
Print status to stdout on every iteration.
maxiter : int, optional
Maximum number of iterations to make.
ftol : float, optional
Relative tolerance for the residual. If omitted, not used.
fatol : float, optional
Absolute tolerance (in max-norm) for the residual.
If omitted, default is 6e-6.
xtol : float, optional
Relative minimum step size. If omitted, not used.
xatol : float, optional
Absolute minimum step size, as determined from the Jacobian
approximation. If the step size is smaller than this, optimization
is terminated as successful. If omitted, not used.
tol_norm : function(vector) -> scalar, optional
Norm to use in convergence check. Default is the maximum norm.
line_search : {None, 'armijo' (default), 'wolfe'}, optional
Which type of a line search to use to determine the step size in
the direction given by the Jacobian approximation. Defaults to
'armijo'.
jac_options : dict, optional
Options for the respective Jacobian approximation.
alpha : float, optional
Initial guess for the Jacobian is (-1/alpha).
M : float, optional
Number of previous vectors to retain. Defaults to 5.
w0 : float, optional
Regularization parameter for numerical stability.
Good values are of the order of 0.01 (small compared to unity).
"""
pass
def _root_linearmixing_doc():
"""
Options
-------
nit : int, optional
Number of iterations to make. If omitted (default), make as many
as required to meet tolerances.
disp : bool, optional
Print status to stdout on every iteration.
maxiter : int, optional
Maximum number of iterations to make.
ftol : float, optional
Relative tolerance for the residual. If omitted, not used.
fatol : float, optional
Absolute tolerance (in max-norm) for the residual.
If omitted, default is 6e-6.
xtol : float, optional
Relative minimum step size. If omitted, not used.
xatol : float, optional
Absolute minimum step size, as determined from the Jacobian
approximation. If the step size is smaller than this, optimization
is terminated as successful. If omitted, not used.
tol_norm : function(vector) -> scalar, optional
Norm to use in convergence check. Default is the maximum norm.
line_search : {None, 'armijo' (default), 'wolfe'}, optional
Which type of a line search to use to determine the step size in
the direction given by the Jacobian approximation. Defaults to
'armijo'.
jac_options : dict, optional
Options for the respective Jacobian approximation.
alpha : float, optional
Initial guess for the Jacobian is (-1/alpha).
"""
pass
def _root_diagbroyden_doc():
"""
Options
-------
nit : int, optional
Number of iterations to make. If omitted (default), make as many
as required to meet tolerances.
disp : bool, optional
Print status to stdout on every iteration.
maxiter : int, optional
Maximum number of iterations to make.
ftol : float, optional
Relative tolerance for the residual. If omitted, not used.
fatol : float, optional
Absolute tolerance (in max-norm) for the residual.
If omitted, default is 6e-6.
xtol : float, optional
Relative minimum step size. If omitted, not used.
xatol : float, optional
Absolute minimum step size, as determined from the Jacobian
approximation. If the step size is smaller than this, optimization
is terminated as successful. If omitted, not used.
tol_norm : function(vector) -> scalar, optional
Norm to use in convergence check. Default is the maximum norm.
line_search : {None, 'armijo' (default), 'wolfe'}, optional
Which type of a line search to use to determine the step size in
the direction given by the Jacobian approximation. Defaults to
'armijo'.
jac_options : dict, optional
Options for the respective Jacobian approximation.
alpha : float, optional
Initial guess for the Jacobian is (-1/alpha).
"""
pass
def _root_excitingmixing_doc():
"""
Options
-------
nit : int, optional
Number of iterations to make. If omitted (default), make as many
as required to meet tolerances.
disp : bool, optional
Print status to stdout on every iteration.
maxiter : int, optional
Maximum number of iterations to make.
ftol : float, optional
Relative tolerance for the residual. If omitted, not used.
fatol : float, optional
Absolute tolerance (in max-norm) for the residual.
If omitted, default is 6e-6.
xtol : float, optional
Relative minimum step size. If omitted, not used.
xatol : float, optional
Absolute minimum step size, as determined from the Jacobian
approximation. If the step size is smaller than this, optimization
is terminated as successful. If omitted, not used.
tol_norm : function(vector) -> scalar, optional
Norm to use in convergence check. Default is the maximum norm.
line_search : {None, 'armijo' (default), 'wolfe'}, optional
Which type of a line search to use to determine the step size in
the direction given by the Jacobian approximation. Defaults to
'armijo'.
jac_options : dict, optional
Options for the respective Jacobian approximation.
alpha : float, optional
Initial Jacobian approximation is (-1/alpha).
alphamax : float, optional
The entries of the diagonal Jacobian are kept in the range
``[alpha, alphamax]``.
"""
pass
def _root_krylov_doc():
"""
Options
-------
nit : int, optional
Number of iterations to make. If omitted (default), make as many
as required to meet tolerances.
disp : bool, optional
Print status to stdout on every iteration.
maxiter : int, optional
Maximum number of iterations to make.
ftol : float, optional
Relative tolerance for the residual. If omitted, not used.
fatol : float, optional
Absolute tolerance (in max-norm) for the residual.
If omitted, default is 6e-6.
xtol : float, optional
Relative minimum step size. If omitted, not used.
xatol : float, optional
Absolute minimum step size, as determined from the Jacobian
approximation. If the step size is smaller than this, optimization
is terminated as successful. If omitted, not used.
tol_norm : function(vector) -> scalar, optional
Norm to use in convergence check. Default is the maximum norm.
line_search : {None, 'armijo' (default), 'wolfe'}, optional
Which type of a line search to use to determine the step size in
the direction given by the Jacobian approximation. Defaults to
'armijo'.
jac_options : dict, optional
Options for the respective Jacobian approximation.
rdiff : float, optional
Relative step size to use in numerical differentiation.
method : str or callable, optional
Krylov method to use to approximate the Jacobian. Can be a string,
or a function implementing the same interface as the iterative
solvers in `scipy.sparse.linalg`. If a string, needs to be one of:
``'lgmres'``, ``'gmres'``, ``'bicgstab'``, ``'cgs'``, ``'minres'``,
``'tfqmr'``.
The default is `scipy.sparse.linalg.lgmres`.
inner_M : LinearOperator or InverseJacobian
Preconditioner for the inner Krylov iteration.
Note that you can also use inverse Jacobians as (adaptive)
preconditioners. For example,
>>> jac = BroydenFirst()
>>> kjac = KrylovJacobian(inner_M=jac.inverse)
If the preconditioner has a method named 'update', it will
be called as ``update(x, f)`` after each nonlinear step,
with ``x`` giving the current point, and ``f`` the current
function value.
inner_tol, inner_maxiter, ...
Parameters to pass on to the "inner" Krylov solver.
See `scipy.sparse.linalg.gmres` for details.
outer_k : int, optional
Size of the subspace kept across LGMRES nonlinear
iterations.
See `scipy.sparse.linalg.lgmres` for details.
"""
pass
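# Illustrative sketch only, not part of the module: passing `jac_options`
# through `options` to choose the inner Krylov solver for method='krylov'.
# The small, well-conditioned test problem and the name `residual` are made up
# for this example.

```python
import numpy as np
from scipy import optimize

n = 50


def residual(x):
    # Diagonally dominant tridiagonal operator plus a mild cubic term.
    out = 3.0 * x + 0.1 * x**3 - 1.0
    out[1:] -= x[:-1]
    out[:-1] -= x[1:]
    return out


# 'method' inside jac_options selects the inner linear solver by name.
sol = optimize.root(residual, np.zeros(n), method='krylov',
                    options={'jac_options': {'method': 'gmres'}})
max_residual = np.abs(residual(sol.x)).max()
```

# With no tolerance given, termination relies on the default absolute
# residual tolerance documented above (6e-6 in the max-norm).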
@@ -0,0 +1,525 @@
"""
Unified interfaces to root finding algorithms for real or complex
scalar functions.
Functions
---------
- root_scalar : find a root of a scalar function.
"""
import numpy as np
from . import _zeros_py as optzeros
from ._numdiff import approx_derivative
__all__ = ['root_scalar']
ROOT_SCALAR_METHODS = ['bisect', 'brentq', 'brenth', 'ridder', 'toms748',
'newton', 'secant', 'halley']
class MemoizeDer:
"""Decorator that caches the value and derivative(s) of a function each
time it is called.
This is a simplistic memoizer that calls and caches a single value
of `f(x, *args)`.
It assumes that `args` does not change between invocations.
It supports the use case of a root-finder where `args` is fixed,
`x` changes, and only rarely, if at all, does x assume the same value
more than once."""
def __init__(self, fun):
self.fun = fun
self.vals = None
self.x = None
self.n_calls = 0
def __call__(self, x, *args):
r"""Calculate f or use cached value if available"""
# Derivative may be requested before the function itself, always check
if self.vals is None or x != self.x:
fg = self.fun(x, *args)
self.x = x
self.n_calls += 1
self.vals = fg[:]
return self.vals[0]
def fprime(self, x, *args):
r"""Calculate f' or use a cached value if available"""
if self.vals is None or x != self.x:
self(x, *args)
return self.vals[1]
def fprime2(self, x, *args):
r"""Calculate f'' or use a cached value if available"""
if self.vals is None or x != self.x:
self(x, *args)
return self.vals[2]
def ncalls(self):
return self.n_calls
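# Illustrative sketch only, not part of the module: a stripped-down,
# hypothetical re-creation of the single-slot cache above, to show why the
# underlying function is evaluated once per distinct point even when the
# value and the derivative are requested separately. `CachedDerivatives` and
# `f_and_df` are names made up for this example.

```python
class CachedDerivatives:
    """Hypothetical single-slot cache modeled on the class above."""

    def __init__(self, fun):
        self.fun = fun
        self.x = None
        self.vals = None
        self.n_calls = 0

    def _refresh(self, x, *args):
        # Re-evaluate only when x differs from the cached point.
        if self.vals is None or x != self.x:
            self.vals = tuple(self.fun(x, *args))
            self.x = x
            self.n_calls += 1

    def __call__(self, x, *args):
        self._refresh(x, *args)
        return self.vals[0]

    def fprime(self, x, *args):
        self._refresh(x, *args)
        return self.vals[1]


def f_and_df(x):
    # Value and first derivative of x**3 - 1.
    return x**3 - 1, 3 * x**2


cached = CachedDerivatives(f_and_df)
value = cached(2.0)          # evaluates f_and_df once
slope = cached.fprime(2.0)   # served from the cache, no new evaluation
other = cached(3.0)          # new point, evaluates again
```

# After these three requests the wrapped function has run twice, which is the
# count that `root_scalar` reports as `function_calls` in the memoized case.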
def root_scalar(f, args=(), method=None, bracket=None,
fprime=None, fprime2=None,
x0=None, x1=None,
xtol=None, rtol=None, maxiter=None,
options=None):
"""
Find a root of a scalar function.
Parameters
----------
f : callable
A function to find a root of.
args : tuple, optional
Extra arguments passed to the objective function and its derivative(s).
method : str, optional
Type of solver. Should be one of
- 'bisect' :ref:`(see here) <optimize.root_scalar-bisect>`
- 'brentq' :ref:`(see here) <optimize.root_scalar-brentq>`
- 'brenth' :ref:`(see here) <optimize.root_scalar-brenth>`
- 'ridder' :ref:`(see here) <optimize.root_scalar-ridder>`
- 'toms748' :ref:`(see here) <optimize.root_scalar-toms748>`
- 'newton' :ref:`(see here) <optimize.root_scalar-newton>`
- 'secant' :ref:`(see here) <optimize.root_scalar-secant>`
- 'halley' :ref:`(see here) <optimize.root_scalar-halley>`
bracket: A sequence of 2 floats, optional
An interval bracketing a root. `f(x, *args)` must have different
signs at the two endpoints.
x0 : float, optional
Initial guess.
x1 : float, optional
A second guess.
fprime : bool or callable, optional
If `fprime` is a boolean and is True, `f` is assumed to return the
value of the objective function and of the derivative.
`fprime` can also be a callable returning the derivative of `f`. In
this case, it must accept the same arguments as `f`.
fprime2 : bool or callable, optional
If `fprime2` is a boolean and is True, `f` is assumed to return the
value of the objective function and of the
first and second derivatives.
`fprime2` can also be a callable returning the second derivative of `f`.
In this case, it must accept the same arguments as `f`.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
options : dict, optional
A dictionary of solver options. E.g., ``k``, see
:obj:`show_options()` for details.
Returns
-------
sol : RootResults
The solution represented as a ``RootResults`` object.
Important attributes are: ``root`` the solution, ``converged`` a
boolean flag indicating if the algorithm exited successfully and
``flag`` which describes the cause of the termination. See
`RootResults` for a description of other attributes.
See also
--------
show_options : Additional options accepted by the solvers
root : Find a root of a vector function.
Notes
-----
This section describes the available solvers that can be selected by the
'method' parameter.
The default is to use the best method available for the situation
presented.
If a bracket is provided, it may use one of the bracketing methods.
If a derivative and an initial value are specified, it may
select one of the derivative-based methods.
If no method is judged applicable, it will raise a ``ValueError``.
Arguments for each method are as follows (x=required, o=optional).
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
| method | f | args | bracket | x0 | x1 | fprime | fprime2 | xtol | rtol | maxiter | options |
+===============================================+===+======+=========+====+====+========+=========+======+======+=========+=========+
| :ref:`bisect <optimize.root_scalar-bisect>` | x | o | x | | | | | o | o | o | o |
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
| :ref:`brentq <optimize.root_scalar-brentq>` | x | o | x | | | | | o | o | o | o |
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
| :ref:`brenth <optimize.root_scalar-brenth>` | x | o | x | | | | | o | o | o | o |
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
| :ref:`ridder <optimize.root_scalar-ridder>` | x | o | x | | | | | o | o | o | o |
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
| :ref:`toms748 <optimize.root_scalar-toms748>` | x | o | x | | | | | o | o | o | o |
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
| :ref:`secant <optimize.root_scalar-secant>` | x | o | | x | o | | | o | o | o | o |
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
| :ref:`newton <optimize.root_scalar-newton>` | x | o | | x | | o | | o | o | o | o |
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
| :ref:`halley <optimize.root_scalar-halley>` | x | o | | x | | x | x | o | o | o | o |
+-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
Examples
--------
Find the root of a simple cubic.
>>> from scipy import optimize
>>> def f(x):
... return (x**3 - 1) # only one real root at x = 1
>>> def fprime(x):
... return 3*x**2
The `brentq` method takes as input a bracket.
>>> sol = optimize.root_scalar(f, bracket=[0, 3], method='brentq')
>>> sol.root, sol.iterations, sol.function_calls
(1.0, 10, 11)
The `newton` method takes as input a single point and uses the
derivative(s).
>>> sol = optimize.root_scalar(f, x0=0.2, fprime=fprime, method='newton')
>>> sol.root, sol.iterations, sol.function_calls
(1.0, 11, 22)
The function can provide the value and derivative(s) in a single call.
>>> def f_p_pp(x):
... return (x**3 - 1), 3*x**2, 6*x
>>> sol = optimize.root_scalar(
... f_p_pp, x0=0.2, fprime=True, method='newton'
... )
>>> sol.root, sol.iterations, sol.function_calls
(1.0, 11, 11)
>>> sol = optimize.root_scalar(
... f_p_pp, x0=0.2, fprime=True, fprime2=True, method='halley'
... )
>>> sol.root, sol.iterations, sol.function_calls
(1.0, 7, 8)
""" # noqa: E501
if not isinstance(args, tuple):
args = (args,)
if options is None:
options = {}
# fun also returns the derivative(s)
is_memoized = False
if fprime2 is not None and not callable(fprime2):
if bool(fprime2):
f = MemoizeDer(f)
is_memoized = True
fprime2 = f.fprime2
fprime = f.fprime
else:
fprime2 = None
if fprime is not None and not callable(fprime):
if bool(fprime):
f = MemoizeDer(f)
is_memoized = True
fprime = f.fprime
else:
fprime = None
# respect solver-specific default tolerances - only pass in if actually set
kwargs = {}
for k in ['xtol', 'rtol', 'maxiter']:
v = locals().get(k)
if v is not None:
kwargs[k] = v
# Set any solver-specific options
if options:
kwargs.update(options)
# Always request full_output from the underlying method as _root_scalar
# always returns a RootResults object
kwargs.update(full_output=True, disp=False)
# Pick a method if not specified.
# Use the "best" method available for the situation.
if not method:
if bracket:
method = 'brentq'
elif x0 is not None:
if fprime:
if fprime2:
method = 'halley'
else:
method = 'newton'
elif x1 is not None:
method = 'secant'
else:
method = 'newton'
if not method:
raise ValueError('Unable to select a solver as neither bracket '
'nor starting point provided.')
meth = method.lower()
map2underlying = {'halley': 'newton', 'secant': 'newton'}
try:
methodc = getattr(optzeros, map2underlying.get(meth, meth))
except AttributeError as e:
raise ValueError('Unknown solver %s' % meth) from e
if meth in ['bisect', 'ridder', 'brentq', 'brenth', 'toms748']:
if not isinstance(bracket, (list, tuple, np.ndarray)):
raise ValueError('Bracket needed for %s' % method)
a, b = bracket[:2]
try:
r, sol = methodc(f, a, b, args=args, **kwargs)
except ValueError as e:
# gh-17622 fixed some bugs in low-level solvers by raising an error
# (rather than returning incorrect results) when the callable
# returns a NaN. It did so by wrapping the callable rather than
# modifying compiled code, so the iteration count is not available.
if hasattr(e, "_x"):
sol = optzeros.RootResults(root=e._x,
iterations=np.nan,
function_calls=e._function_calls,
flag=str(e), method=method)
else:
raise
elif meth in ['secant']:
if x0 is None:
raise ValueError('x0 must not be None for %s' % method)
if 'xtol' in kwargs:
kwargs['tol'] = kwargs.pop('xtol')
r, sol = methodc(f, x0, args=args, fprime=None, fprime2=None,
x1=x1, **kwargs)
elif meth in ['newton']:
if x0 is None:
raise ValueError('x0 must not be None for %s' % method)
if not fprime:
# approximate fprime with finite differences
def fprime(x, *args):
# `root_scalar` doesn't actually seem to support vectorized
# use of `newton`. In that case, `approx_derivative` will
# always get scalar input. Nonetheless, it always returns an
# array, so we extract the element to produce scalar output.
return approx_derivative(f, x, method='2-point', args=args)[0]
if 'xtol' in kwargs:
kwargs['tol'] = kwargs.pop('xtol')
r, sol = methodc(f, x0, args=args, fprime=fprime, fprime2=None,
**kwargs)
elif meth in ['halley']:
if x0 is None:
raise ValueError('x0 must not be None for %s' % method)
if not fprime:
raise ValueError('fprime must be specified for %s' % method)
if not fprime2:
raise ValueError('fprime2 must be specified for %s' % method)
if 'xtol' in kwargs:
kwargs['tol'] = kwargs.pop('xtol')
r, sol = methodc(f, x0, args=args, fprime=fprime, fprime2=fprime2, **kwargs)
else:
raise ValueError('Unknown solver %s' % method)
if is_memoized:
# Replace the function_calls count with the memoized count.
# Avoids double and triple-counting.
n_calls = f.n_calls
sol.function_calls = n_calls
return sol
def _root_scalar_brentq_doc():
r"""
Options
-------
args : tuple, optional
Extra arguments passed to the objective function.
bracket: A sequence of 2 floats, optional
An interval bracketing a root. `f(x, *args)` must have different
signs at the two endpoints.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
options: dict, optional
Specifies any method-specific options not covered above.
"""
pass
def _root_scalar_brenth_doc():
r"""
Options
-------
args : tuple, optional
Extra arguments passed to the objective function.
bracket: A sequence of 2 floats, optional
An interval bracketing a root. `f(x, *args)` must have different
signs at the two endpoints.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
options: dict, optional
Specifies any method-specific options not covered above.
"""
pass
def _root_scalar_toms748_doc():
r"""
Options
-------
args : tuple, optional
Extra arguments passed to the objective function.
bracket: A sequence of 2 floats, optional
An interval bracketing a root. `f(x, *args)` must have different
signs at the two endpoints.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
options: dict, optional
Specifies any method-specific options not covered above.
"""
pass
def _root_scalar_secant_doc():
r"""
Options
-------
args : tuple, optional
Extra arguments passed to the objective function.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
x0 : float, required
Initial guess.
x1 : float, optional
A second guess. If omitted, the solver picks a point close to `x0`.
options: dict, optional
Specifies any method-specific options not covered above.
"""
pass
def _root_scalar_newton_doc():
r"""
Options
-------
args : tuple, optional
Extra arguments passed to the objective function and its derivative.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
x0 : float, required
Initial guess.
fprime : bool or callable, optional
If `fprime` is a boolean and is True, `f` is assumed to return the
value of derivative along with the objective function.
`fprime` can also be a callable returning the derivative of `f`. In
this case, it must accept the same arguments as `f`.
options: dict, optional
Specifies any method-specific options not covered above.
"""
pass
def _root_scalar_halley_doc():
r"""
Options
-------
args : tuple, optional
Extra arguments passed to the objective function and its derivatives.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
x0 : float, required
Initial guess.
fprime : bool or callable, required
If `fprime` is a boolean and is True, `f` is assumed to return the
value of derivative along with the objective function.
`fprime` can also be a callable returning the derivative of `f`. In
this case, it must accept the same arguments as `f`.
fprime2 : bool or callable, required
If `fprime2` is a boolean and is True, `f` is assumed to return the
value of 1st and 2nd derivatives along with the objective function.
`fprime2` can also be a callable returning the 2nd derivative of `f`.
In this case, it must accept the same arguments as `f`.
options: dict, optional
Specifies any method-specific options not covered above.
"""
pass
def _root_scalar_ridder_doc():
r"""
Options
-------
args : tuple, optional
Extra arguments passed to the objective function.
bracket: A sequence of 2 floats, optional
An interval bracketing a root. `f(x, *args)` must have different
signs at the two endpoints.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
options: dict, optional
Specifies any method-specific options not covered above.
"""
pass
def _root_scalar_bisect_doc():
r"""
Options
-------
args : tuple, optional
Extra arguments passed to the objective function.
bracket: A sequence of 2 floats, optional
An interval bracketing a root. `f(x, *args)` must have different
signs at the two endpoints.
xtol : float, optional
Tolerance (absolute) for termination.
rtol : float, optional
Tolerance (relative) for termination.
maxiter : int, optional
Maximum number of iterations.
options: dict, optional
Specifies any method-specific options not covered above.
"""
pass
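The `newton` branch of `root_scalar` above falls back to a forward-difference derivative when `fprime` is not supplied. A minimal pure-Python sketch of that idea follows. The names `forward_difference` and `newton_fd` are hypothetical and used only for illustration, and the step-size rule is an assumption, not necessarily what `approx_derivative` does internally.

```python
import math


def forward_difference(f, x, args=(), rel_step=math.sqrt(2.0 ** -52)):
    """Two-point (forward) estimate of f'(x); illustrative helper only."""
    # Assumed step rule: scale the step with |x|, but never below rel_step.
    h = rel_step * max(1.0, abs(x))
    return (f(x + h, *args) - f(x, *args)) / h


def newton_fd(f, x0, args=(), tol=1e-10, maxiter=50):
    """Newton iteration that estimates the derivative numerically."""
    x = float(x0)
    for _ in range(maxiter):
        slope = forward_difference(f, x, args)
        if slope == 0.0:
            raise ZeroDivisionError("derivative estimate is zero")
        x_new = x - f(x, *args) / slope
        if abs(x_new - x) <= tol:  # absolute step tolerance, in the spirit of `xtol`
            return x_new
        x = x_new
    raise RuntimeError("no convergence within maxiter iterations")


def cubic(x, c):
    """Objective with one extra argument, passed through `args`."""
    return x ** 3 - c


root = newton_fd(cubic, 1.0, args=(2.0,))
```

The extra argument `c` reaches both the objective and the derivative estimate through `args`, which mirrors how the finite-difference closure in the `newton` branch forwards `args` to `f`.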
@@ -0,0 +1,460 @@
import collections
from abc import ABC, abstractmethod
import numpy as np
from scipy._lib._util import MapWrapper
class VertexBase(ABC):
"""
Base class for a vertex.
"""
def __init__(self, x, nn=None, index=None):
"""
Initialise a vertex object.
Parameters
----------
x : tuple or vector
The geometric location (domain).
nn : list, optional
Nearest neighbour list.
index : int, optional
Index of vertex.
"""
self.x = x
self.hash = hash(self.x) # Save precomputed hash
if nn is not None:
self.nn = set(nn) # can use .indexupdate to add a new list
else:
self.nn = set()
self.index = index
def __hash__(self):
return self.hash
def __getattr__(self, item):
if item not in ['x_a']:
raise AttributeError(f"{type(self)} object has no attribute "
f"'{item}'")
if item == 'x_a':
self.x_a = np.array(self.x)
return self.x_a
@abstractmethod
def connect(self, v):
raise NotImplementedError("This method is only implemented with an "
"associated child of the base class.")
@abstractmethod
def disconnect(self, v):
raise NotImplementedError("This method is only implemented with an "
"associated child of the base class.")
def star(self):
"""Returns the star domain ``st(v)`` of the vertex.
Returns
-------
st : set
A set containing this vertex and all the vertices in ``st(v)``
"""
# Copy the neighbour set so that adding ``self`` does not mutate ``self.nn``
self.st = set(self.nn)
self.st.add(self)
return self.st
class VertexScalarField(VertexBase):
"""
Add homology properties of a scalar field f: R^n --> R associated with
the geometry built from the VertexBase class
"""
def __init__(self, x, field=None, nn=None, index=None, field_args=(),
g_cons=None, g_cons_args=()):
"""
Parameters
----------
x : tuple,
vector of vertex coordinates
field : callable, optional
a scalar field f: R^n --> R associated with the geometry
nn : list, optional
list of nearest neighbours
index : int, optional
index of the vertex
field_args : tuple, optional
additional arguments to be passed to field
g_cons : callable, optional
constraints on the vertex
g_cons_args : tuple, optional
additional arguments to be passed to g_cons
"""
super().__init__(x, nn=nn, index=index)
# Note Vertex is only initiated once for all x so only
# evaluated once
# self.feasible = None
# self.f is externally defined by the cache to allow parallel
# processing
# None type that will break arithmetic operations unless defined
# self.f = None
self.check_min = True
self.check_max = True
def connect(self, v):
"""Connects self to another vertex object v.
Parameters
----------
v : VertexBase or VertexScalarField object
"""
if v is not self and v not in self.nn:
self.nn.add(v)
v.nn.add(self)
# Flags for checking homology properties:
self.check_min = True
self.check_max = True
v.check_min = True
v.check_max = True
def disconnect(self, v):
if v in self.nn:
self.nn.remove(v)
v.nn.remove(self)
# Flags for checking homology properties:
self.check_min = True
self.check_max = True
v.check_min = True
v.check_max = True
def minimiser(self):
"""Check whether this vertex is strictly less than all its
neighbours"""
if self.check_min:
self._min = all(self.f < v.f for v in self.nn)
self.check_min = False
return self._min
def maximiser(self):
"""
Check whether this vertex is strictly greater than all its
neighbours.
"""
if self.check_max:
self._max = all(self.f > v.f for v in self.nn)
self.check_max = False
return self._max
class VertexVectorField(VertexBase):
"""
Add homology properties of a vector field f: R^n --> R^m associated with
the geometry built from the VertexBase class.
"""
def __init__(self, x, sfield=None, vfield=None, field_args=(),
vfield_args=(), g_cons=None,
g_cons_args=(), nn=None, index=None):
super().__init__(x, nn=nn, index=index)
raise NotImplementedError("This class is still a work in progress")
class VertexCacheBase:
"""Base class for a vertex cache for a simplicial complex."""
def __init__(self):
self.cache = collections.OrderedDict()
self.nfev = 0 # Number of field (function) evaluations
self.index = -1
def __iter__(self):
for v in self.cache:
yield self.cache[v]
return
def size(self):
"""Returns the size of the vertex cache."""
return self.index + 1
def print_out(self):
headlen = len(f"Vertex cache of size: {len(self.cache)}:")
print('=' * headlen)
print(f"Vertex cache of size: {len(self.cache)}:")
print('=' * headlen)
for v in self.cache:
self.cache[v].print_out()
class VertexCube(VertexBase):
"""Vertex class to be used for a pure simplicial complex with no associated
differential geometry (single level domain that exists in R^n)"""
def __init__(self, x, nn=None, index=None):
super().__init__(x, nn=nn, index=index)
def connect(self, v):
if v is not self and v not in self.nn:
self.nn.add(v)
v.nn.add(self)
def disconnect(self, v):
if v in self.nn:
self.nn.remove(v)
v.nn.remove(self)
class VertexCacheIndex(VertexCacheBase):
def __init__(self):
"""
Class for a vertex cache for a simplicial complex without an associated
field. Useful only for building and visualising a domain complex.
"""
super().__init__()
self.Vertex = VertexCube
def __getitem__(self, x, nn=None):
try:
return self.cache[x]
except KeyError:
self.index += 1
xval = self.Vertex(x, index=self.index)
# logging.info("New generated vertex at x = {}".format(x))
# NOTE: Surprisingly high performance increase if logging
# is commented out
self.cache[x] = xval
return self.cache[x]
class VertexCacheField(VertexCacheBase):
def __init__(self, field=None, field_args=(), g_cons=None, g_cons_args=(),
workers=1):
"""
Class for a vertex cache for a simplicial complex with an associated
field.
Parameters
----------
field : callable
Scalar or vector field callable.
field_args : tuple, optional
Any additional fixed parameters needed to completely specify the
field function
g_cons : sequence of callable, optional
Constraint functions. A vertex at ``x`` is treated as feasible only
if every ``g(x, *args)`` is non-negative.
g_cons_args : tuple, optional
Any additional fixed parameters needed to completely specify the
constraint functions
workers : int, optional
Uses `multiprocessing.Pool <multiprocessing>` to compute the field
functions in parallel.
"""
super().__init__()
self.index = -1
self.Vertex = VertexScalarField
self.field = field
self.field_args = field_args
self.wfield = FieldWrapper(field, field_args) # if workers is not 1
self.g_cons = g_cons
self.g_cons_args = g_cons_args
self.wgcons = ConstraintWrapper(g_cons, g_cons_args)
self.gpool = set() # A set of tuples to process for feasibility
# Field processing objects
self.fpool = set() # A set of tuples to process for scalar function
self.sfc_lock = False # True if self.fpool is non-Empty
self.workers = workers
self._mapwrapper = MapWrapper(workers)
if workers == 1:
self.process_gpool = self.proc_gpool
if g_cons is None:
self.process_fpool = self.proc_fpool_nog
else:
self.process_fpool = self.proc_fpool_g
else:
self.process_gpool = self.pproc_gpool
if g_cons is None:
self.process_fpool = self.pproc_fpool_nog
else:
self.process_fpool = self.pproc_fpool_g
def __getitem__(self, x, nn=None):
try:
return self.cache[x]
except KeyError:
self.index += 1
xval = self.Vertex(x, field=self.field, nn=nn, index=self.index,
field_args=self.field_args,
g_cons=self.g_cons,
g_cons_args=self.g_cons_args)
self.cache[x] = xval # Define in cache
self.gpool.add(xval) # Add to pool for processing feasibility
self.fpool.add(xval) # Add to pool for processing field values
return self.cache[x]
def __getstate__(self):
self_dict = self.__dict__.copy()
self_dict.pop('pool', None) # tolerate a missing 'pool' entry
return self_dict
def process_pools(self):
if self.g_cons is not None:
self.process_gpool()
self.process_fpool()
self.proc_minimisers()
def feasibility_check(self, v):
v.feasible = True
for g, args in zip(self.g_cons, self.g_cons_args):
# constraint may return more than 1 value.
if np.any(g(v.x_a, *args) < 0.0):
v.f = np.inf
v.feasible = False
break
def compute_sfield(self, v):
"""Compute the scalar field values of a vertex object `v`.
Parameters
----------
v : VertexBase or VertexScalarField object
"""
try:
v.f = self.field(v.x_a, *self.field_args)
self.nfev += 1
except AttributeError:
v.f = np.inf
# logging.warning(f"Field function not found at x = {self.x_a}")
if np.isnan(v.f):
v.f = np.inf
def proc_gpool(self):
"""Process all constraints."""
if self.g_cons is not None:
for v in self.gpool:
self.feasibility_check(v)
# Clean the pool
self.gpool = set()
def pproc_gpool(self):
"""Process all constraints in parallel."""
gpool_l = []
for v in self.gpool:
gpool_l.append(v.x_a)
G = self._mapwrapper(self.wgcons.gcons, gpool_l)
for v, g in zip(self.gpool, G):
v.feasible = g # set vertex object attribute v.feasible = g (bool)
def proc_fpool_g(self):
"""Process all field functions with constraints supplied."""
for v in self.fpool:
if v.feasible:
self.compute_sfield(v)
# Clean the pool
self.fpool = set()
def proc_fpool_nog(self):
"""Process all field functions with no constraints supplied."""
for v in self.fpool:
self.compute_sfield(v)
# Clean the pool
self.fpool = set()
def pproc_fpool_g(self):
"""
Process all field functions with constraints supplied in parallel.
"""
self.wfield.func
fpool_l = []
for v in self.fpool:
if v.feasible:
fpool_l.append(v.x_a)
else:
v.f = np.inf
F = self._mapwrapper(self.wfield.func, fpool_l)
for va, f in zip(fpool_l, F):
vt = tuple(va)
self[vt].f = f # set vertex object attribute v.f = f
self.nfev += 1
# Clean the pool
self.fpool = set()
def pproc_fpool_nog(self):
"""
Process all field functions with no constraints supplied in parallel.
"""
self.wfield.func
fpool_l = []
for v in self.fpool:
fpool_l.append(v.x_a)
F = self._mapwrapper(self.wfield.func, fpool_l)
for va, f in zip(fpool_l, F):
vt = tuple(va)
self[vt].f = f # set vertex object attribute v.f = f
self.nfev += 1
# Clean the pool
self.fpool = set()
def proc_minimisers(self):
"""Check for minimisers."""
for v in self:
v.minimiser()
v.maximiser()
class ConstraintWrapper:
"""Object to wrap constraints to pass to `multiprocessing.Pool`."""
def __init__(self, g_cons, g_cons_args):
self.g_cons = g_cons
self.g_cons_args = g_cons_args
def gcons(self, v_x_a):
vfeasible = True
for g, args in zip(self.g_cons, self.g_cons_args):
# constraint may return more than 1 value.
if np.any(g(v_x_a, *args) < 0.0):
vfeasible = False
break
return vfeasible
class FieldWrapper:
"""Object to wrap field to pass to `multiprocessing.Pool`."""
def __init__(self, field, field_args):
self.field = field
self.field_args = field_args
def func(self, v_x_a):
try:
v_f = self.field(v_x_a, *self.field_args)
except Exception:
v_f = np.inf
if np.isnan(v_f):
v_f = np.inf
return v_f
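The caching pattern used by `VertexCacheField.__getitem__` (create a vertex the first time a coordinate tuple is requested, return the stored object afterwards) and the strict-inequality test in `minimiser` can be sketched in a few lines of plain Python. `SketchVertex`, `SketchCache` and `bowl` are hypothetical names for illustration. Unlike the classes above, this sketch evaluates the field eagerly instead of queueing vertices in `fpool`, and it omits constraints and the cached `check_min` flag.

```python
class SketchVertex:
    """Hypothetical stand-in for VertexScalarField (illustration only)."""

    def __init__(self, x, index):
        self.x = x
        self.index = index
        self.nn = set()  # neighbouring vertices
        self.f = None    # field value, filled in by the cache

    def connect(self, other):
        # Connections are symmetric and never link a vertex to itself.
        if other is not self:
            self.nn.add(other)
            other.nn.add(self)

    def minimiser(self):
        # True only if this vertex is strictly lower than every neighbour.
        return all(self.f < v.f for v in self.nn)


class SketchCache:
    """Create each vertex once per coordinate tuple and count evaluations."""

    def __init__(self, field):
        self.field = field
        self.cache = {}
        self.nfev = 0

    def __getitem__(self, x):
        try:
            return self.cache[x]
        except KeyError:
            v = SketchVertex(x, index=len(self.cache))
            # Simplification: evaluate now rather than deferring to a pool.
            v.f = self.field(x)
            self.nfev += 1
            self.cache[x] = v
            return v


def bowl(x):
    """Simple scalar field with its minimum at the origin."""
    return sum(xi ** 2 for xi in x)


cache = SketchCache(bowl)
centre = cache[(0.0, 0.0)]
for corner in [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]:
    centre.connect(cache[corner])
again = cache[(0.0, 0.0)]  # served from the cache, no new evaluation
```

Because lookups are keyed on the coordinate tuple, asking for the same point twice returns the same object and leaves the evaluation counter unchanged.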
@@ -0,0 +1,513 @@
"""
This module implements the Sequential Least Squares Programming optimization
algorithm (SLSQP), originally developed by Dieter Kraft.
See http://www.netlib.org/toms/733
Functions
---------
.. autosummary::
:toctree: generated/
approx_jacobian
fmin_slsqp
"""
__all__ = ['approx_jacobian', 'fmin_slsqp']
import numpy as np
from scipy.optimize._slsqp import slsqp
from numpy import (zeros, array, linalg, append, concatenate, finfo,
sqrt, vstack, isfinite, atleast_1d)
from ._optimize import (OptimizeResult, _check_unknown_options,
_prepare_scalar_function, _clip_x_for_func,
_check_clip_x)
from ._numdiff import approx_derivative
from ._constraints import old_bound_to_new, _arr_to_scalar
from scipy._lib._array_api import atleast_nd, array_namespace
# deprecated imports to be removed in SciPy 1.13.0
from numpy import exp, inf # noqa: F401
__docformat__ = "restructuredtext en"
_epsilon = sqrt(finfo(float).eps)
def approx_jacobian(x, func, epsilon, *args):
"""
Approximate the Jacobian matrix of a callable function.
Parameters
----------
x : array_like
The state vector at which to compute the Jacobian matrix.
func : callable f(x,*args)
The vector-valued function.
epsilon : float
The perturbation used to determine the partial derivatives.
args : sequence
Additional arguments passed to func.
Returns
-------
An array of dimensions ``(lenf, lenx)`` where ``lenf`` is the length
of the outputs of `func`, and ``lenx`` is the number of elements in
`x`.
Notes
-----
The approximation is done using forward differences.
"""
# approx_derivative returns (m, n) == (lenf, lenx)
jac = approx_derivative(func, x, method='2-point', abs_step=epsilon,
args=args)
# if func returns a scalar jac.shape will be (lenx,). Make sure
# it's at least a 2D array.
return np.atleast_2d(jac)
def fmin_slsqp(func, x0, eqcons=(), f_eqcons=None, ieqcons=(), f_ieqcons=None,
bounds=(), fprime=None, fprime_eqcons=None,
fprime_ieqcons=None, args=(), iter=100, acc=1.0E-6,
iprint=1, disp=None, full_output=0, epsilon=_epsilon,
callback=None):
"""
Minimize a function using Sequential Least Squares Programming
Python interface function for the SLSQP Optimization subroutine
originally implemented by Dieter Kraft.
Parameters
----------
func : callable f(x,*args)
Objective function. Must return a scalar.
x0 : 1-D ndarray of float
Initial guess for the independent variable(s).
eqcons : list, optional
A list of functions of length n such that
eqcons[j](x,*args) == 0.0 in a successfully optimized
problem.
f_eqcons : callable f(x,*args), optional
Returns a 1-D array in which each element must equal 0.0 in a
successfully optimized problem. If f_eqcons is specified,
eqcons is ignored.
ieqcons : list, optional
A list of functions of length n such that
ieqcons[j](x,*args) >= 0.0 in a successfully optimized
problem.
f_ieqcons : callable f(x,*args), optional
Returns a 1-D ndarray in which each element must be greater or
equal to 0.0 in a successfully optimized problem. If
f_ieqcons is specified, ieqcons is ignored.
bounds : list, optional
A list of tuples specifying the lower and upper bound
for each independent variable [(xl0, xu0),(xl1, xu1),...]
Infinite values will be interpreted as large floating values.
fprime : callable `f(x,*args)`, optional
A function that evaluates the partial derivatives of func.
fprime_eqcons : callable `f(x,*args)`, optional
A function of the form `f(x, *args)` that returns the m by n
array of equality constraint normals. If not provided,
the normals will be approximated. The array returned by
fprime_eqcons should be sized as ( len(eqcons), len(x0) ).
fprime_ieqcons : callable `f(x,*args)`, optional
A function of the form `f(x, *args)` that returns the m by n
array of inequality constraint normals. If not provided,
the normals will be approximated. The array returned by
fprime_ieqcons should be sized as ( len(ieqcons), len(x0) ).
args : sequence, optional
Additional arguments passed to func and fprime.
iter : int, optional
The maximum number of iterations.
acc : float, optional
Requested accuracy.
iprint : int, optional
The verbosity of fmin_slsqp :
* iprint <= 0 : Silent operation
* iprint == 1 : Print summary upon completion (default)
* iprint >= 2 : Print status of each iterate and summary
disp : int, optional
Overrides the iprint interface (preferred).
full_output : bool, optional
If False, return only the minimizer of func (default).
Otherwise, output final objective function and summary
information.
epsilon : float, optional
The step size for finite-difference derivative estimates.
callback : callable, optional
Called after each iteration, as ``callback(x)``, where ``x`` is the
current parameter vector.
Returns
-------
out : ndarray of float
The final minimizer of func.
fx : ndarray of float, if full_output is true
The final value of the objective function.
its : int, if full_output is true
The number of iterations.
imode : int, if full_output is true
The exit mode from the optimizer (see below).
smode : string, if full_output is true
Message describing the exit mode from the optimizer.
See also
--------
minimize: Interface to minimization algorithms for multivariate
functions. See the 'SLSQP' `method` in particular.
Notes
-----
Exit modes are defined as follows ::
-1 : Gradient evaluation required (g & a)
0 : Optimization terminated successfully
1 : Function evaluation required (f & c)
2 : More equality constraints than independent variables
3 : More than 3*n iterations in LSQ subproblem
4 : Inequality constraints incompatible
5 : Singular matrix E in LSQ subproblem
6 : Singular matrix C in LSQ subproblem
7 : Rank-deficient equality constraint subproblem HFTI
8 : Positive directional derivative for linesearch
9 : Iteration limit reached
Examples
--------
Examples are given :ref:`in the tutorial <tutorial-sqlsp>`.
"""
if disp is not None:
iprint = disp
opts = {'maxiter': iter,
'ftol': acc,
'iprint': iprint,
'disp': iprint != 0,
'eps': epsilon,
'callback': callback}
# Build the constraints as a tuple of dictionaries
cons = ()
# 1. constraints of the 1st kind (eqcons, ieqcons); no Jacobian; take
# the same extra arguments as the objective function.
cons += tuple({'type': 'eq', 'fun': c, 'args': args} for c in eqcons)
cons += tuple({'type': 'ineq', 'fun': c, 'args': args} for c in ieqcons)
# 2. constraints of the 2nd kind (f_eqcons, f_ieqcons) and their Jacobian
# (fprime_eqcons, fprime_ieqcons); also take the same extra arguments
# as the objective function.
if f_eqcons:
cons += ({'type': 'eq', 'fun': f_eqcons, 'jac': fprime_eqcons,
'args': args}, )
if f_ieqcons:
cons += ({'type': 'ineq', 'fun': f_ieqcons, 'jac': fprime_ieqcons,
'args': args}, )
res = _minimize_slsqp(func, x0, args, jac=fprime, bounds=bounds,
constraints=cons, **opts)
if full_output:
return res['x'], res['fun'], res['nit'], res['status'], res['message']
else:
return res['x']
def _minimize_slsqp(func, x0, args=(), jac=None, bounds=None,
constraints=(),
maxiter=100, ftol=1.0E-6, iprint=1, disp=False,
eps=_epsilon, callback=None, finite_diff_rel_step=None,
**unknown_options):
"""
Minimize a scalar function of one or more variables using Sequential
Least Squares Programming (SLSQP).
Options
-------
ftol : float
Precision goal for the value of f in the stopping criterion.
eps : float
Step size used for numerical approximation of the Jacobian.
disp : bool
Set to True to print convergence messages. If False,
`iprint` is ignored and set to 0.
maxiter : int
Maximum number of iterations.
finite_diff_rel_step : None or array_like, optional
If `jac in ['2-point', '3-point', 'cs']` the relative step size to
use for numerical approximation of `jac`. The absolute step
size is computed as ``h = rel_step * sign(x) * max(1, abs(x))``,
possibly adjusted to fit into the bounds. For ``jac='3-point'``
the sign of `h` is ignored. If None (default) then step is selected
automatically.
"""
_check_unknown_options(unknown_options)
iter = maxiter - 1
acc = ftol
epsilon = eps
if not disp:
iprint = 0
# Transform x0 into an array.
xp = array_namespace(x0)
x0 = atleast_nd(x0, ndim=1, xp=xp)
dtype = xp.float64
if xp.isdtype(x0.dtype, "real floating"):
dtype = x0.dtype
x = xp.reshape(xp.astype(x0, dtype), -1)
# SLSQP is sent 'old-style' bounds, 'new-style' bounds are required by
# ScalarFunction
if bounds is None or len(bounds) == 0:
new_bounds = (-np.inf, np.inf)
else:
new_bounds = old_bound_to_new(bounds)
# clip the initial guess to bounds, otherwise ScalarFunction doesn't work
x = np.clip(x, new_bounds[0], new_bounds[1])
# Constraints are triaged per type into a dictionary of tuples
if isinstance(constraints, dict):
constraints = (constraints, )
cons = {'eq': (), 'ineq': ()}
for ic, con in enumerate(constraints):
# check type
try:
ctype = con['type'].lower()
except KeyError as e:
raise KeyError('Constraint %d has no type defined.' % ic) from e
except TypeError as e:
raise TypeError('Constraints must be defined using a '
'dictionary.') from e
except AttributeError as e:
raise TypeError("Constraint's type must be a string.") from e
else:
if ctype not in ['eq', 'ineq']:
raise ValueError("Unknown constraint type '%s'." % con['type'])
# check function
if 'fun' not in con:
raise ValueError('Constraint %d has no function defined.' % ic)
# check Jacobian
cjac = con.get('jac')
if cjac is None:
# approximate Jacobian function. The factory function is needed
# to keep a reference to `fun`, see gh-4240.
def cjac_factory(fun):
def cjac(x, *args):
x = _check_clip_x(x, new_bounds)
if jac in ['2-point', '3-point', 'cs']:
return approx_derivative(fun, x, method=jac, args=args,
rel_step=finite_diff_rel_step,
bounds=new_bounds)
else:
return approx_derivative(fun, x, method='2-point',
abs_step=epsilon, args=args,
bounds=new_bounds)
return cjac
cjac = cjac_factory(con['fun'])
# update constraints' dictionary
cons[ctype] += ({'fun': con['fun'],
'jac': cjac,
'args': con.get('args', ())}, )
exit_modes = {-1: "Gradient evaluation required (g & a)",
0: "Optimization terminated successfully",
1: "Function evaluation required (f & c)",
2: "More equality constraints than independent variables",
3: "More than 3*n iterations in LSQ subproblem",
4: "Inequality constraints incompatible",
5: "Singular matrix E in LSQ subproblem",
6: "Singular matrix C in LSQ subproblem",
7: "Rank-deficient equality constraint subproblem HFTI",
8: "Positive directional derivative for linesearch",
9: "Iteration limit reached"}
# Set the parameters that SLSQP will need
# meq, mieq: number of equality and inequality constraints
meq = sum(map(len, [atleast_1d(c['fun'](x, *c['args']))
for c in cons['eq']]))
mieq = sum(map(len, [atleast_1d(c['fun'](x, *c['args']))
for c in cons['ineq']]))
# m = The total number of constraints
m = meq + mieq
# la = The number of constraints, or 1 if there are no constraints
la = array([1, m]).max()
# n = The number of independent variables
n = len(x)
# Define the workspaces for SLSQP
n1 = n + 1
mineq = m - meq + n1 + n1
len_w = (3*n1+m)*(n1+1)+(n1-meq+1)*(mineq+2) + 2*mineq+(n1+mineq)*(n1-meq) \
+ 2*meq + n1 + ((n+1)*n)//2 + 2*m + 3*n + 3*n1 + 1
len_jw = mineq
w = zeros(len_w)
jw = zeros(len_jw)
# Decompose bounds into xl and xu
if bounds is None or len(bounds) == 0:
xl = np.empty(n, dtype=float)
xu = np.empty(n, dtype=float)
xl.fill(np.nan)
xu.fill(np.nan)
else:
bnds = array([(_arr_to_scalar(l), _arr_to_scalar(u))
for (l, u) in bounds], float)
if bnds.shape[0] != n:
raise IndexError('SLSQP Error: the length of bounds is not '
'compatible with that of x0.')
with np.errstate(invalid='ignore'):
bnderr = bnds[:, 0] > bnds[:, 1]
if bnderr.any():
raise ValueError('SLSQP Error: lb > ub in bounds %s.' %
', '.join(str(b) for b in bnds[bnderr]))
xl, xu = bnds[:, 0], bnds[:, 1]
# Mark infinite bounds with nans; the Fortran code understands this
infbnd = ~isfinite(bnds)
xl[infbnd[:, 0]] = np.nan
xu[infbnd[:, 1]] = np.nan
# ScalarFunction provides function and gradient evaluation
sf = _prepare_scalar_function(func, x, jac=jac, args=args, epsilon=eps,
finite_diff_rel_step=finite_diff_rel_step,
bounds=new_bounds)
# gh11403 SLSQP sometimes exceeds bounds by 1 or 2 ULP, make sure this
# doesn't get sent to the func/grad evaluator.
wrapped_fun = _clip_x_for_func(sf.fun, new_bounds)
wrapped_grad = _clip_x_for_func(sf.grad, new_bounds)
# Initialize the iteration counter and the mode value
mode = array(0, int)
acc = array(acc, float)
majiter = array(iter, int)
majiter_prev = 0
# Initialize internal SLSQP state variables
alpha = array(0, float)
f0 = array(0, float)
gs = array(0, float)
h1 = array(0, float)
h2 = array(0, float)
h3 = array(0, float)
h4 = array(0, float)
t = array(0, float)
t0 = array(0, float)
tol = array(0, float)
iexact = array(0, int)
incons = array(0, int)
ireset = array(0, int)
itermx = array(0, int)
line = array(0, int)
n1 = array(0, int)
n2 = array(0, int)
n3 = array(0, int)
# Print the header if iprint >= 2
if iprint >= 2:
print("%5s %5s %16s %16s" % ("NIT", "FC", "OBJFUN", "GNORM"))
# mode is zero on entry, so call objective, constraints and gradients
# there should be no func evaluations here because it's cached from
# ScalarFunction
fx = wrapped_fun(x)
g = append(wrapped_grad(x), 0.0)
c = _eval_constraint(x, cons)
a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
while 1:
# Call SLSQP
slsqp(m, meq, x, xl, xu, fx, c, g, a, acc, majiter, mode, w, jw,
alpha, f0, gs, h1, h2, h3, h4, t, t0, tol,
iexact, incons, ireset, itermx, line,
n1, n2, n3)
if mode == 1: # objective and constraint evaluation required
fx = wrapped_fun(x)
c = _eval_constraint(x, cons)
if mode == -1: # gradient evaluation required
g = append(wrapped_grad(x), 0.0)
a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
if majiter > majiter_prev:
# call callback if major iteration has incremented
if callback is not None:
callback(np.copy(x))
# Print the status of the current iterate if iprint >= 2
if iprint >= 2:
print("%5i %5i % 16.6E % 16.6E" % (majiter, sf.nfev,
fx, linalg.norm(g)))
# If exit mode is not -1 or 1, slsqp has completed
if abs(mode) != 1:
break
majiter_prev = int(majiter)
# Optimization loop complete. Print status if requested
if iprint >= 1:
print(exit_modes[int(mode)] + " (Exit mode " + str(mode) + ')')
print(" Current function value:", fx)
print(" Iterations:", majiter)
print(" Function evaluations:", sf.nfev)
print(" Gradient evaluations:", sf.ngev)
return OptimizeResult(x=x, fun=fx, jac=g[:-1], nit=int(majiter),
nfev=sf.nfev, njev=sf.ngev, status=int(mode),
message=exit_modes[int(mode)], success=(mode == 0))
def _eval_constraint(x, cons):
# Compute constraints
if cons['eq']:
c_eq = concatenate([atleast_1d(con['fun'](x, *con['args']))
for con in cons['eq']])
else:
c_eq = zeros(0)
if cons['ineq']:
c_ieq = concatenate([atleast_1d(con['fun'](x, *con['args']))
for con in cons['ineq']])
else:
c_ieq = zeros(0)
# Now combine c_eq and c_ieq into a single 1-D array
c = concatenate((c_eq, c_ieq))
return c
def _eval_con_normals(x, cons, la, n, m, meq, mieq):
# Compute the normals of the constraints
if cons['eq']:
a_eq = vstack([con['jac'](x, *con['args'])
for con in cons['eq']])
else: # no equality constraint
a_eq = zeros((meq, n))
if cons['ineq']:
a_ieq = vstack([con['jac'](x, *con['args'])
for con in cons['ineq']])
else: # no inequality constraint
a_ieq = zeros((mieq, n))
# Now combine a_eq and a_ieq into a single matrix `a`
if m == 0: # no constraints
a = zeros((la, n))
else:
a = vstack((a_eq, a_ieq))
a = concatenate((a, zeros([la, 1])), 1)
return a
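The layout produced by `_eval_constraint` and `_eval_con_normals` (equality constraints first, then inequality constraints, with one extra zero column appended to the normals and at least one row even when there are no constraints) can be illustrated with plain lists. This is a sketch under stated assumptions: `stack_constraints`, `stack_normals` and `_as_list` are hypothetical helpers, and nested lists stand in for the NumPy arrays used by the real functions.

```python
def _as_list(value):
    """Treat a scalar as a length-1 list (the role atleast_1d plays above)."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def stack_constraints(x, cons):
    """Concatenate constraint values: all 'eq' entries, then all 'ineq'."""
    values = []
    for kind in ("eq", "ineq"):
        for con in cons[kind]:
            values.extend(_as_list(con["fun"](x, *con["args"])))
    return values


def stack_normals(x, cons, n):
    """Stack constraint Jacobians row-wise and append a zero column."""
    rows = []
    for kind in ("eq", "ineq"):
        for con in cons[kind]:
            jac = _as_list(con["jac"](x, *con["args"]))
            # A scalar constraint may return a flat gradient: make it one row.
            if jac and not isinstance(jac[0], (list, tuple)):
                jac = [jac]
            rows.extend(list(row) for row in jac)
    if not rows:
        # No constraints: keep a single zero row, i.e. la = max(1, m).
        rows = [[0.0] * n]
    return [row + [0.0] for row in rows]


cons = {
    "eq": ({"fun": lambda x: x[0] + x[1] - 1.0,
            "jac": lambda x: [1.0, 1.0], "args": ()},),
    "ineq": ({"fun": lambda x: [x[0], x[1]],
              "jac": lambda x: [[1.0, 0.0], [0.0, 1.0]], "args": ()},),
}
point = [0.25, 0.75]
c = stack_constraints(point, cons)    # one equality, two inequalities
a = stack_normals(point, cons, n=2)   # shape (3, n + 1)
```

Keeping the equality rows ahead of the inequality rows matters because the solver receives only the counts `m` and `meq` and infers which rows are equalities from their position.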
@@ -0,0 +1,260 @@
"""
Spectral Algorithm for Nonlinear Equations
"""
import collections
import numpy as np
from scipy.optimize import OptimizeResult
from scipy.optimize._optimize import _check_unknown_options
from ._linesearch import _nonmonotone_line_search_cruz, _nonmonotone_line_search_cheng
class _NoConvergence(Exception):
pass
def _root_df_sane(func, x0, args=(), ftol=1e-8, fatol=1e-300, maxfev=1000,
fnorm=None, callback=None, disp=False, M=10, eta_strategy=None,
sigma_eps=1e-10, sigma_0=1.0, line_search='cruz', **unknown_options):
r"""
Solve nonlinear equation with the DF-SANE method
Options
-------
ftol : float, optional
Relative norm tolerance.
fatol : float, optional
Absolute norm tolerance.
Algorithm terminates when ``||func(x)|| < fatol + ftol ||func(x_0)||``.
fnorm : callable, optional
Norm to use in the convergence check. If None, 2-norm is used.
maxfev : int, optional
Maximum number of function evaluations.
disp : bool, optional
Whether to print convergence process to stdout.
eta_strategy : callable, optional
Choice of the ``eta_k`` parameter, which gives slack for growth
of ``||F||**2``. Called as ``eta_k = eta_strategy(k, x, F)`` with
`k` the iteration number, `x` the current iterate and `F` the current
residual. Should satisfy ``eta_k > 0`` and ``sum(eta, k=0..inf) < inf``.
Default: ``||F||**2 / (1 + k)**2``.
sigma_eps : float, optional
The spectral coefficient is constrained to ``sigma_eps < sigma < 1/sigma_eps``.
Default: 1e-10
sigma_0 : float, optional
Initial spectral coefficient.
Default: 1.0
M : int, optional
Number of iterates to include in the nonmonotonic line search.
Default: 10
line_search : {'cruz', 'cheng'}
Type of line search to employ. 'cruz' is the original one defined in
[Martinez & Raydan. Math. Comp. 75, 1429 (2006)], 'cheng' is
a modified search defined in [Cheng & Li. IMA J. Numer. Anal. 29, 814 (2009)].
Default: 'cruz'
References
----------
.. [1] "Spectral residual method without gradient information for solving
large-scale nonlinear systems of equations." W. La Cruz,
J.M. Martinez, M. Raydan. Math. Comp. **75**, 1429 (2006).
.. [2] W. La Cruz, Opt. Meth. Software, 29, 24 (2014).
.. [3] W. Cheng, D.-H. Li. IMA J. Numer. Anal. **29**, 814 (2009).
"""
_check_unknown_options(unknown_options)
if line_search not in ('cheng', 'cruz'):
raise ValueError(f"Invalid value {line_search!r} for 'line_search'")
nexp = 2
if eta_strategy is None:
# Different choice from [1], as their eta is not invariant
# vs. scaling of F.
def eta_strategy(k, x, F):
# Obtain squared 2-norm of the initial residual from the outer scope
return f_0 / (1 + k)**2
if fnorm is None:
def fnorm(F):
# Obtain squared 2-norm of the current residual from the outer scope
return f_k**(1.0/nexp)
def fmerit(F):
return np.linalg.norm(F)**nexp
nfev = [0]
f, x_k, x_shape, f_k, F_k, is_complex = _wrap_func(func, x0, fmerit,
nfev, maxfev, args)
k = 0
f_0 = f_k
sigma_k = sigma_0
F_0_norm = fnorm(F_k)
# For the 'cruz' line search
prev_fs = collections.deque([f_k], M)
# For the 'cheng' line search
Q = 1.0
C = f_0
converged = False
message = "too many function evaluations required"
while True:
F_k_norm = fnorm(F_k)
if disp:
print("iter %d: ||F|| = %g, sigma = %g" % (k, F_k_norm, sigma_k))
if callback is not None:
callback(x_k, F_k)
if F_k_norm < ftol * F_0_norm + fatol:
# Converged!
message = "successful convergence"
converged = True
break
# Control spectral parameter, from [2]
if abs(sigma_k) > 1/sigma_eps:
sigma_k = 1/sigma_eps * np.sign(sigma_k)
elif abs(sigma_k) < sigma_eps:
sigma_k = sigma_eps
# Line search direction
d = -sigma_k * F_k
# Nonmonotone line search
eta = eta_strategy(k, x_k, F_k)
try:
if line_search == 'cruz':
alpha, xp, fp, Fp = _nonmonotone_line_search_cruz(f, x_k, d, prev_fs,
eta=eta)
elif line_search == 'cheng':
alpha, xp, fp, Fp, C, Q = _nonmonotone_line_search_cheng(f, x_k, d, f_k,
C, Q, eta=eta)
except _NoConvergence:
break
# Update spectral parameter
s_k = xp - x_k
y_k = Fp - F_k
sigma_k = np.vdot(s_k, s_k) / np.vdot(s_k, y_k)
# Take step
x_k = xp
F_k = Fp
f_k = fp
# Store function value
if line_search == 'cruz':
prev_fs.append(fp)
k += 1
x = _wrap_result(x_k, is_complex, shape=x_shape)
F = _wrap_result(F_k, is_complex)
result = OptimizeResult(x=x, success=converged,
message=message,
fun=F, nfev=nfev[0], nit=k, method="df-sane")
return result
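# Illustrative sketch (not part of the original module): one spectral update
# as performed in the loop above, written with plain lists. The line search
# is replaced by a unit step, and the names `clamp_sigma`, `dot`, and
# `residual` are hypothetical helpers for this example.

```python
import math


def clamp_sigma(sigma, sigma_eps=1e-10):
    """Keep |sigma| inside [sigma_eps, 1/sigma_eps], mirroring the loop above."""
    if abs(sigma) > 1.0 / sigma_eps:
        return math.copysign(1.0 / sigma_eps, sigma)
    if abs(sigma) < sigma_eps:
        return sigma_eps
    return sigma


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual(x):
    # Simple linear test system F(x) = 2 x - 4, with root x = 2.
    return [2.0 * xi - 4.0 for xi in x]


x_k = [0.0, 1.0]
F_k = residual(x_k)
sigma_k = clamp_sigma(1.0)

# Trial point along d = -sigma_k * F_k with a unit step (line search omitted).
x_p = [xi - sigma_k * fi for xi, fi in zip(x_k, F_k)]
F_p = residual(x_p)

# Spectral coefficient: sigma = <s, s> / <s, y>.
s_k = [a - b for a, b in zip(x_p, x_k)]
y_k = [a - b for a, b in zip(F_p, F_k)]
sigma_next = clamp_sigma(dot(s_k, s_k) / dot(s_k, y_k))

# The next step with the updated coefficient.
x_next = [xi - sigma_next * fi for xi, fi in zip(x_p, F_p)]
```

# For this scaled-identity system the updated coefficient equals the inverse
# of the slope, so the second step lands on the root.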
def _wrap_func(func, x0, fmerit, nfev_list, maxfev, args=()):
"""
Wrap a function and an initial value so that (i) complex values
are mapped to reals, (ii) the value of a merit function
fmerit(F) is computed at the same time, and (iii) the evaluation count
is maintained and an exception is raised if it exceeds `maxfev`.
Parameters
----------
func : callable
Function to wrap
x0 : ndarray
Initial value
fmerit : callable
Merit function fmerit(f) for computing merit value from residual.
nfev_list : list
List to store number of evaluations in. Should be [0] in the beginning.
maxfev : int
Maximum number of evaluations before _NoConvergence is raised.
args : tuple
Extra arguments to func
Returns
-------
wrap_func : callable
Wrapped function, to be called as
``f, F = wrap_func(x)``
x0_wrap : ndarray of float
Wrapped initial value; raveled to 1-D and complex
values mapped to reals.
x0_shape : tuple
Shape of the initial value array
f : float
Merit function at F
F : ndarray of float
Residual at x0_wrap
is_complex : bool
Whether complex values were mapped to reals
"""
x0 = np.asarray(x0)
x0_shape = x0.shape
F = np.asarray(func(x0, *args)).ravel()
is_complex = np.iscomplexobj(x0) or np.iscomplexobj(F)
x0 = x0.ravel()
nfev_list[0] = 1
if is_complex:
def wrap_func(x):
if nfev_list[0] >= maxfev:
raise _NoConvergence()
nfev_list[0] += 1
z = _real2complex(x).reshape(x0_shape)
v = np.asarray(func(z, *args)).ravel()
F = _complex2real(v)
f = fmerit(F)
return f, F
x0 = _complex2real(x0)
F = _complex2real(F)
else:
def wrap_func(x):
if nfev_list[0] >= maxfev:
raise _NoConvergence()
nfev_list[0] += 1
x = x.reshape(x0_shape)
F = np.asarray(func(x, *args)).ravel()
f = fmerit(F)
return f, F
return wrap_func, x0, x0_shape, fmerit(F), F, is_complex
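# Illustrative sketch (not part of the original module): the evaluation
# budget pattern used by the wrapper above, reduced to its core. The names
# `BudgetExceeded` and `with_budget` are hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when the evaluation budget is used up."""


def with_budget(func, counter, maxfev):
    """Wrap `func` so each call increments counter[0] up to `maxfev` calls."""
    def wrapped(x):
        if counter[0] >= maxfev:
            raise BudgetExceeded()
        counter[0] += 1
        return func(x)
    return wrapped


counter = [0]  # a one-element list so the closure can mutate the count
square = with_budget(lambda x: x * x, counter, maxfev=2)

values = [square(3), square(4)]
try:
    square(5)
    exhausted = False
except BudgetExceeded:
    exhausted = True
```

# Storing the count in a list lets the caller read it after the wrapped
# function has raised, which is how the solver above reports `nfev`.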
def _wrap_result(result, is_complex, shape=None):
"""
Convert from real to complex and reshape result arrays.
"""
if is_complex:
z = _real2complex(result)
else:
z = result
if shape is not None:
z = z.reshape(shape)
return z
def _real2complex(x):
return np.ascontiguousarray(x, dtype=float).view(np.complex128)
def _complex2real(z):
return np.ascontiguousarray(z, dtype=complex).view(np.float64)
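# Illustrative sketch (not part of the original module): the real/complex
# reinterpretation used by the two helpers above, shown as a round trip on
# a small 1-D array chosen for this example.

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j])

# View each complex number as a (real, imag) pair of float64 values.
as_real = np.ascontiguousarray(z, dtype=complex).view(np.float64)

# View consecutive float64 pairs as complex128 values again.
as_complex = np.ascontiguousarray(as_real, dtype=float).view(np.complex128)
```

# Both directions are views on the same memory layout, so no arithmetic is
# performed and the round trip is exact.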
@@ -0,0 +1,430 @@
# TNC Python interface
# @(#) $Jeannot: tnc.py,v 1.11 2005/01/28 18:27:31 js Exp $
# Copyright (c) 2004-2005, Jean-Sebastien Roy (js@jeannot.org)
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
# CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""
TNC: A Python interface to the TNC non-linear optimizer
TNC is a non-linear optimizer. To use it, you must provide a function to
minimize. The function must take one argument: the list of coordinates where to
evaluate the function; and it must return either a tuple, whose first element is the
value of the function, and whose second element is the gradient of the function
(as a list of values); or None, to abort the minimization.
"""
from scipy.optimize import _moduleTNC as moduleTNC
from ._optimize import (MemoizeJac, OptimizeResult, _check_unknown_options,
_prepare_scalar_function)
from ._constraints import old_bound_to_new
from scipy._lib._array_api import atleast_nd, array_namespace
from numpy import inf, array, zeros
__all__ = ['fmin_tnc']
MSG_NONE = 0 # No messages
MSG_ITER = 1 # One line per iteration
MSG_INFO = 2 # Informational messages
MSG_VERS = 4 # Version info
MSG_EXIT = 8 # Exit reasons
MSG_ALL = MSG_ITER + MSG_INFO + MSG_VERS + MSG_EXIT
MSGS = {
MSG_NONE: "No messages",
MSG_ITER: "One line per iteration",
MSG_INFO: "Informational messages",
MSG_VERS: "Version info",
MSG_EXIT: "Exit reasons",
MSG_ALL: "All messages"
}
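# Illustrative sketch (not part of the original module): the message
# constants above are bit flags, so several can be combined with `|` and
# tested with `&`. The `describe` helper is hypothetical.

```python
MSG_NONE = 0
MSG_ITER = 1
MSG_INFO = 2
MSG_VERS = 4
MSG_EXIT = 8
MSG_ALL = MSG_ITER + MSG_INFO + MSG_VERS + MSG_EXIT

FLAG_NAMES = {
    MSG_ITER: "One line per iteration",
    MSG_INFO: "Informational messages",
    MSG_VERS: "Version info",
    MSG_EXIT: "Exit reasons",
}


def describe(mask):
    """List the descriptions of every flag set in `mask`."""
    return [name for flag, name in FLAG_NAMES.items() if mask & flag]


selected = describe(MSG_ITER | MSG_EXIT)
```

# Because the flags are distinct powers of two, their sum and their bitwise
# OR give the same mask.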
INFEASIBLE = -1 # Infeasible (lower bound > upper bound)
LOCALMINIMUM = 0 # Local minimum reached (|pg| ~= 0)
FCONVERGED = 1 # Converged (|f_n-f_(n-1)| ~= 0)
XCONVERGED = 2 # Converged (|x_n-x_(n-1)| ~= 0)
MAXFUN = 3 # Max. number of function evaluations reached
LSFAIL = 4 # Linear search failed
CONSTANT = 5 # All lower bounds are equal to the upper bounds
NOPROGRESS = 6 # Unable to progress
USERABORT = 7 # User requested end of minimization
RCSTRINGS = {
INFEASIBLE: "Infeasible (lower bound > upper bound)",
LOCALMINIMUM: "Local minimum reached (|pg| ~= 0)",
FCONVERGED: "Converged (|f_n-f_(n-1)| ~= 0)",
XCONVERGED: "Converged (|x_n-x_(n-1)| ~= 0)",
MAXFUN: "Max. number of function evaluations reached",
LSFAIL: "Linear search failed",
CONSTANT: "All lower bounds are equal to the upper bounds",
NOPROGRESS: "Unable to progress",
USERABORT: "User requested end of minimization"
}
# Changes to interface made by Travis Oliphant, Apr. 2004 for inclusion in
# SciPy
def fmin_tnc(func, x0, fprime=None, args=(), approx_grad=0,
bounds=None, epsilon=1e-8, scale=None, offset=None,
messages=MSG_ALL, maxCGit=-1, maxfun=None, eta=-1,
stepmx=0, accuracy=0, fmin=0, ftol=-1, xtol=-1, pgtol=-1,
rescale=-1, disp=None, callback=None):
"""
Minimize a function with variables subject to bounds, using
gradient information in a truncated Newton algorithm. This
method wraps a C implementation of the algorithm.
Parameters
----------
func : callable ``func(x, *args)``
Function to minimize. Must do one of:
1. Return f and g, where f is the value of the function and g its
gradient (a list of floats).
2. Return the function value but supply gradient function
separately as `fprime`.
3. Return the function value and set ``approx_grad=True``.
If the function returns None, the minimization
is aborted.
x0 : array_like
Initial estimate of minimum.
fprime : callable ``fprime(x, *args)``, optional
Gradient of `func`. If None, then either `func` must return the
function value and the gradient (``f,g = func(x, *args)``)
or `approx_grad` must be True.
args : tuple, optional
Arguments to pass to function.
approx_grad : bool, optional
If true, approximate the gradient numerically.
bounds : list, optional
(min, max) pairs for each element in x0, defining the
bounds on that parameter. Use None or +/-inf for one of
min or max when there is no bound in that direction.
epsilon : float, optional
Used if approx_grad is True. The stepsize in a finite
difference approximation for fprime.
scale : array_like, optional
Scaling factors to apply to each variable. If None, the
factors are up-low for interval bounded variables and
1+|x| for the others. Defaults to None.
offset : array_like, optional
Value to subtract from each variable. If None, the
offsets are (up+low)/2 for interval bounded variables
and x for the others.
messages : int, optional
Bit mask used to select the messages displayed during
minimization; values are defined in the MSGS dict. Defaults to
MSG_ALL.
disp : int, optional
Integer interface to messages. 0 = no message, 5 = all messages
maxCGit : int, optional
Maximum number of hessian*vector evaluations per main
iteration. If maxCGit == 0, the direction chosen is
-gradient. If maxCGit < 0, maxCGit is set to
max(1,min(50,n/2)). Defaults to -1.
maxfun : int, optional
Maximum number of function evaluations. If None, maxfun is
set to max(100, 10*len(x0)). Defaults to None. Note that this function
may violate the limit because of evaluating gradients by numerical
differentiation.
eta : float, optional
Severity of the line search. If < 0 or > 1, set to 0.25.
Defaults to -1.
stepmx : float, optional
Maximum step for the line search. May be increased during
call. If too small, it will be set to 10.0. Defaults to 0.
accuracy : float, optional
Relative precision for finite difference calculations. If
<= machine_precision, set to sqrt(machine_precision).
Defaults to 0.
fmin : float, optional
Minimum function value estimate. Defaults to 0.
ftol : float, optional
Precision goal for the value of f in the stopping criterion.
If ftol < 0.0, ftol is set to 0.0. Defaults to -1.
xtol : float, optional
Precision goal for the value of x in the stopping
criterion (after applying x scaling factors). If xtol <
0.0, xtol is set to sqrt(machine_precision). Defaults to
-1.
pgtol : float, optional
Precision goal for the value of the projected gradient in
the stopping criterion (after applying x scaling factors).
If pgtol < 0.0, pgtol is set to 1e-2 * sqrt(accuracy).
Setting it to 0.0 is not recommended. Defaults to -1.
rescale : float, optional
Scaling factor (in log10) used to trigger f value
rescaling. If 0, rescale at each iteration. If a large
value, never rescale. If < 0, rescale is set to 1.3.
callback : callable, optional
Called after each iteration, as callback(xk), where xk is the
current parameter vector.
Returns
-------
x : ndarray
The solution.
nfeval : int
The number of function evaluations.
rc : int
Return code, see below
See also
--------
minimize: Interface to minimization algorithms for multivariate
functions. See the 'TNC' `method` in particular.
Notes
-----
The underlying algorithm is truncated Newton, also called
Newton Conjugate-Gradient. This method differs from
scipy.optimize.fmin_ncg in that
1. it wraps a C implementation of the algorithm
2. it allows each variable to be given an upper and lower bound.
The algorithm incorporates the bound constraints by determining
the descent direction as in an unconstrained truncated Newton,
but never taking a step-size large enough to leave the space
of feasible x's. The algorithm keeps track of a set of
currently active constraints, and ignores them when computing
the minimum allowable step size. (The x's associated with the
active constraint are kept fixed.) If the maximum allowable
step size is zero then a new constraint is added. At the end
of each iteration one of the constraints may be deemed no
longer active and removed. A constraint is considered
no longer active if it is currently active
but the gradient for that variable points inward from the
constraint. The specific constraint removed is the one
associated with the variable of largest index whose
constraint is no longer active.
Return codes are defined as follows::
-1 : Infeasible (lower bound > upper bound)
0 : Local minimum reached (|pg| ~= 0)
1 : Converged (|f_n-f_(n-1)| ~= 0)
2 : Converged (|x_n-x_(n-1)| ~= 0)
3 : Max. number of function evaluations reached
4 : Linear search failed
5 : All lower bounds are equal to the upper bounds
6 : Unable to progress
7 : User requested end of minimization
References
----------
Wright S., Nocedal J. (2006), 'Numerical Optimization'
Nash S.G. (1984), "Newton-Type Minimization Via the Lanczos Method",
SIAM Journal on Numerical Analysis 21, pp. 770-778
"""
# handle fprime/approx_grad
if approx_grad:
fun = func
jac = None
elif fprime is None:
fun = MemoizeJac(func)
jac = fun.derivative
else:
fun = func
jac = fprime
if disp is not None: # disp takes precedence over messages
mesg_num = disp
else:
mesg_num = {0:MSG_NONE, 1:MSG_ITER, 2:MSG_INFO, 3:MSG_VERS,
4:MSG_EXIT, 5:MSG_ALL}.get(messages, MSG_ALL)
# build options
opts = {'eps': epsilon,
'scale': scale,
'offset': offset,
'mesg_num': mesg_num,
'maxCGit': maxCGit,
'maxfun': maxfun,
'eta': eta,
'stepmx': stepmx,
'accuracy': accuracy,
'minfev': fmin,
'ftol': ftol,
'xtol': xtol,
'gtol': pgtol,
'rescale': rescale,
'disp': False}
res = _minimize_tnc(fun, x0, args, jac, bounds, callback=callback, **opts)
return res['x'], res['nfev'], res['status']
def _minimize_tnc(fun, x0, args=(), jac=None, bounds=None,
eps=1e-8, scale=None, offset=None, mesg_num=None,
maxCGit=-1, eta=-1, stepmx=0, accuracy=0,
minfev=0, ftol=-1, xtol=-1, gtol=-1, rescale=-1, disp=False,
callback=None, finite_diff_rel_step=None, maxfun=None,
**unknown_options):
"""
Minimize a scalar function of one or more variables using a truncated
Newton (TNC) algorithm.
Options
-------
eps : float or ndarray
If `jac is None` the absolute step size used for numerical
approximation of the jacobian via forward differences.
scale : list of floats
Scaling factors to apply to each variable. If None, the
factors are up-low for interval bounded variables and
1+|x| for the others. Defaults to None.
offset : float
Value to subtract from each variable. If None, the
offsets are (up+low)/2 for interval bounded variables
and x for the others.
disp : bool
Set to True to print convergence messages.
maxCGit : int
Maximum number of hessian*vector evaluations per main
iteration. If maxCGit == 0, the direction chosen is
-gradient. If maxCGit < 0, maxCGit is set to
max(1,min(50,n/2)). Defaults to -1.
eta : float
Severity of the line search. If < 0 or > 1, set to 0.25.
Defaults to -1.
stepmx : float
Maximum step for the line search. May be increased during
call. If too small, it will be set to 10.0. Defaults to 0.
accuracy : float
Relative precision for finite difference calculations. If
<= machine_precision, set to sqrt(machine_precision).
Defaults to 0.
minfev : float
Minimum function value estimate. Defaults to 0.
ftol : float
Precision goal for the value of f in the stopping criterion.
If ftol < 0.0, ftol is set to 0.0. Defaults to -1.
xtol : float
Precision goal for the value of x in the stopping
criterion (after applying x scaling factors). If xtol <
0.0, xtol is set to sqrt(machine_precision). Defaults to
-1.
gtol : float
Precision goal for the value of the projected gradient in
the stopping criterion (after applying x scaling factors).
If gtol < 0.0, gtol is set to 1e-2 * sqrt(accuracy).
Setting it to 0.0 is not recommended. Defaults to -1.
rescale : float
Scaling factor (in log10) used to trigger f value
rescaling. If 0, rescale at each iteration. If a large
value, never rescale. If < 0, rescale is set to 1.3.
finite_diff_rel_step : None or array_like, optional
If `jac in ['2-point', '3-point', 'cs']` the relative step size to
use for numerical approximation of the jacobian. The absolute step
size is computed as ``h = rel_step * sign(x) * max(1, abs(x))``,
possibly adjusted to fit into the bounds. For ``method='3-point'``
the sign of `h` is ignored. If None (default) then step is selected
automatically.
maxfun : int
Maximum number of function evaluations. If None, `maxfun` is
set to max(100, 10*len(x0)). Defaults to None.
"""
_check_unknown_options(unknown_options)
fmin = minfev
pgtol = gtol
xp = array_namespace(x0)
x0 = atleast_nd(x0, ndim=1, xp=xp)
dtype = xp.float64
if xp.isdtype(x0.dtype, "real floating"):
dtype = x0.dtype
x0 = xp.reshape(xp.astype(x0, dtype), -1)
n = len(x0)
if bounds is None:
bounds = [(None,None)] * n
if len(bounds) != n:
raise ValueError('length of x0 != length of bounds')
new_bounds = old_bound_to_new(bounds)
if mesg_num is not None:
messages = {0:MSG_NONE, 1:MSG_ITER, 2:MSG_INFO, 3:MSG_VERS,
4:MSG_EXIT, 5:MSG_ALL}.get(mesg_num, MSG_ALL)
elif disp:
messages = MSG_ALL
else:
messages = MSG_NONE
sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
finite_diff_rel_step=finite_diff_rel_step,
bounds=new_bounds)
func_and_grad = sf.fun_and_grad
"""
low, up : the bounds (lists of floats)
if low is None, the lower bounds are removed.
if up is None, the upper bounds are removed.
low and up defaults to None
"""
low = zeros(n)
up = zeros(n)
for i in range(n):
if bounds[i] is None:
l, u = -inf, inf
else:
l,u = bounds[i]
if l is None:
low[i] = -inf
else:
low[i] = l
if u is None:
up[i] = inf
else:
up[i] = u
if scale is None:
scale = array([])
if offset is None:
offset = array([])
if maxfun is None:
maxfun = max(100, 10*len(x0))
rc, nf, nit, x, funv, jacv = moduleTNC.tnc_minimize(
func_and_grad, x0, low, up, scale,
offset, messages, maxCGit, maxfun,
eta, stepmx, accuracy, fmin, ftol,
xtol, pgtol, rescale, callback
)
# the TNC documentation states: "On output, x, f and g may be very
# slightly out of sync because of scaling". Therefore re-evaluate
# func_and_grad so they are synced.
funv, jacv = func_and_grad(x)
return OptimizeResult(x=x, fun=funv, jac=jacv, nfev=sf.nfev,
nit=nit, status=rc, message=RCSTRINGS[rc],
success=(-1 < rc < 3))
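# Illustrative sketch (not part of the original module): how (min, max)
# bound pairs with None entries turn into separate lower and upper lists,
# and how the default evaluation limit is chosen. The helper names
# `split_bounds` and `default_maxfun` are hypothetical.

```python
import math


def split_bounds(bounds, n):
    """Return (low, up) lists, replacing None with -inf / +inf."""
    if bounds is None:
        bounds = [(None, None)] * n
    if len(bounds) != n:
        raise ValueError('length of x0 != length of bounds')
    low = [-math.inf if lo is None else float(lo) for lo, _ in bounds]
    up = [math.inf if hi is None else float(hi) for _, hi in bounds]
    return low, up


def default_maxfun(n):
    """Default limit on function evaluations, as documented above."""
    return max(100, 10 * n)


low, up = split_bounds([(0, None), (None, 5), (-1, 1)], n=3)
unbounded = split_bounds(None, n=2)
```

# A one-sided bound therefore only needs None on the open side; the solver
# sees an infinite limit there.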
@@ -0,0 +1,12 @@
from ._trlib import TRLIBQuadraticSubproblem
__all__ = ['TRLIBQuadraticSubproblem', 'get_trlib_quadratic_subproblem']
def get_trlib_quadratic_subproblem(tol_rel_i=-2.0, tol_rel_b=-3.0, disp=False):
def subproblem_factory(x, fun, jac, hess, hessp):
return TRLIBQuadraticSubproblem(x, fun, jac, hess, hessp,
tol_rel_i=tol_rel_i,
tol_rel_b=tol_rel_b,
disp=disp)
return subproblem_factory
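# Illustrative sketch (not part of the original module): the closure-based
# factory pattern used above, with a hypothetical stand-in class
# `RecordingSubproblem` in place of the compiled subproblem class.

```python
class RecordingSubproblem:
    """Hypothetical stand-in that only records what it was built with."""

    def __init__(self, x, fun, jac, hess=None, hessp=None,
                 tol=1e-3, disp=False):
        self.x = x
        self.tol = tol
        self.disp = disp


def make_subproblem_factory(tol=1e-3, disp=False):
    """Bind solver options now; supply the problem data later."""
    def subproblem_factory(x, fun, jac, hess=None, hessp=None):
        return RecordingSubproblem(x, fun, jac, hess, hessp,
                                   tol=tol, disp=disp)
    return subproblem_factory


factory = make_subproblem_factory(tol=0.5)
sub = factory([0.0, 1.0], fun=lambda x: 0.0, jac=lambda x: [0.0, 0.0])
```

# The trust-region driver can then call the factory with the same five
# positional arguments for every subproblem type, without knowing about
# solver-specific options.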
@@ -0,0 +1,304 @@
"""Trust-region optimization."""
import math
import warnings
import numpy as np
import scipy.linalg
from ._optimize import (_check_unknown_options, _status_message,
OptimizeResult, _prepare_scalar_function,
_call_callback_maybe_halt)
from scipy.optimize._hessian_update_strategy import HessianUpdateStrategy
from scipy.optimize._differentiable_functions import FD_METHODS
__all__ = []
def _wrap_function(function, args):
# wraps a minimizer function to count number of evaluations
# and to easily provide an args kwd.
ncalls = [0]
if function is None:
return ncalls, None
def function_wrapper(x, *wrapper_args):
ncalls[0] += 1
# A copy of x is sent to the user function (gh13740)
return function(np.copy(x), *(wrapper_args + args))
return ncalls, function_wrapper
class BaseQuadraticSubproblem:
"""
Base/abstract class defining the quadratic model for trust-region
minimization. Child classes must implement the ``solve`` method.
Values of the objective function, Jacobian and Hessian (if provided) at
the current iterate ``x`` are evaluated on demand and then stored as
attributes ``fun``, ``jac``, ``hess``.
"""
def __init__(self, x, fun, jac, hess=None, hessp=None):
self._x = x
self._f = None
self._g = None
self._h = None
self._g_mag = None
self._cauchy_point = None
self._newton_point = None
self._fun = fun
self._jac = jac
self._hess = hess
self._hessp = hessp
def __call__(self, p):
return self.fun + np.dot(self.jac, p) + 0.5 * np.dot(p, self.hessp(p))
@property
def fun(self):
"""Value of objective function at current iteration."""
if self._f is None:
self._f = self._fun(self._x)
return self._f
@property
def jac(self):
"""Value of Jacobian of objective function at current iteration."""
if self._g is None:
self._g = self._jac(self._x)
return self._g
@property
def hess(self):
"""Value of Hessian of objective function at current iteration."""
if self._h is None:
self._h = self._hess(self._x)
return self._h
def hessp(self, p):
if self._hessp is not None:
return self._hessp(self._x, p)
else:
return np.dot(self.hess, p)
@property
def jac_mag(self):
"""Magnitude of jacobian of objective function at current iteration."""
if self._g_mag is None:
self._g_mag = scipy.linalg.norm(self.jac)
return self._g_mag
def get_boundaries_intersections(self, z, d, trust_radius):
"""
Solve the scalar quadratic equation ``||z + t d|| == trust_radius``.
This is like a line-sphere intersection.
Return the two values of t, sorted from low to high.
"""
a = np.dot(d, d)
b = 2 * np.dot(z, d)
c = np.dot(z, z) - trust_radius**2
sqrt_discriminant = math.sqrt(b*b - 4*a*c)
# The following calculation is mathematically
# equivalent to:
# ta = (-b - sqrt_discriminant) / (2*a)
# tb = (-b + sqrt_discriminant) / (2*a)
# but produces smaller round-off errors.
# See Matrix Computations p.97
# for a better justification.
aux = b + math.copysign(sqrt_discriminant, b)
ta = -aux / (2*a)
tb = -2*c / aux
return sorted([ta, tb])
def solve(self, trust_radius):
raise NotImplementedError('The solve method should be implemented by '
'the child class')
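# Illustrative sketch (not part of the original module): the line-sphere
# intersection solved by `get_boundaries_intersections`, written with plain
# lists so it runs without NumPy. The function name and the sample vectors
# are chosen for this example.

```python
import math


def boundary_intersections(z, d, trust_radius):
    """Solve ||z + t d|| == trust_radius for t; return roots low to high."""
    a = sum(di * di for di in d)
    b = 2.0 * sum(zi * di for zi, di in zip(z, d))
    c = sum(zi * zi for zi in z) - trust_radius ** 2
    sqrt_discriminant = math.sqrt(b * b - 4.0 * a * c)
    # Add terms of equal sign to avoid cancellation, then recover the
    # second root from the product of roots c / a.
    aux = b + math.copysign(sqrt_discriminant, b)
    ta = -aux / (2.0 * a)
    tb = -2.0 * c / aux
    return sorted([ta, tb])


# Start at z = (1, 0) inside a circle of radius 2 and move along d = (1, 0).
t_low, t_high = boundary_intersections([1.0, 0.0], [1.0, 0.0], 2.0)
```

# The positive root is the step length that reaches the boundary in the
# direction of `d`; the negative root reaches it in the opposite direction.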
def _minimize_trust_region(fun, x0, args=(), jac=None, hess=None, hessp=None,
subproblem=None, initial_trust_radius=1.0,
max_trust_radius=1000.0, eta=0.15, gtol=1e-4,
maxiter=None, disp=False, return_all=False,
callback=None, inexact=True, **unknown_options):
"""
Minimization of scalar function of one or more variables using a
trust-region algorithm.
Options for the trust-region algorithm are:
initial_trust_radius : float
Initial trust radius.
max_trust_radius : float
Never propose steps that are longer than this value.
eta : float
Trust region related acceptance stringency for proposed steps.
gtol : float
Gradient norm must be less than `gtol`
before successful termination.
maxiter : int
Maximum number of iterations to perform.
disp : bool
If True, print convergence message.
inexact : bool
Accuracy to solve subproblems. If True, requires fewer nonlinear
iterations, but more vector products. Only effective for method
trust-krylov.
This function is called by the `minimize` function.
It is not supposed to be called directly.
"""
_check_unknown_options(unknown_options)
if jac is None:
raise ValueError('Jacobian is currently required for trust-region '
'methods')
if hess is None and hessp is None:
raise ValueError('Either the Hessian or the Hessian-vector product '
'is currently required for trust-region methods')
if subproblem is None:
raise ValueError('A subproblem solving strategy is required for '
'trust-region methods')
if not (0 <= eta < 0.25):
raise ValueError('invalid acceptance stringency')
if max_trust_radius <= 0:
raise ValueError('the max trust radius must be positive')
if initial_trust_radius <= 0:
raise ValueError('the initial trust radius must be positive')
if initial_trust_radius >= max_trust_radius:
raise ValueError('the initial trust radius must be less than the '
'max trust radius')
# force the initial guess into a nice format
x0 = np.asarray(x0).flatten()
# A ScalarFunction representing the problem. This caches calls to fun, jac,
# hess.
sf = _prepare_scalar_function(fun, x0, jac=jac, hess=hess, args=args)
fun = sf.fun
jac = sf.grad
if callable(hess):
hess = sf.hess
elif callable(hessp):
# this elif statement must come before examining whether hess
# is estimated by FD methods or a HessianUpdateStrategy
pass
elif (hess in FD_METHODS or isinstance(hess, HessianUpdateStrategy)):
# If the Hessian is being estimated by finite differences or a
# Hessian update strategy then ScalarFunction.hess returns a
# LinearOperator or a HessianUpdateStrategy. This enables the
# calculation/creation of a hessp. BUT you only want to do this
# if the user *hasn't* provided a callable(hessp) function.
hess = None
def hessp(x, p, *args):
return sf.hess(x).dot(p)
else:
raise ValueError('Either the Hessian or the Hessian-vector product '
'is currently required for trust-region methods')
# ScalarFunction doesn't represent hessp
nhessp, hessp = _wrap_function(hessp, args)
# limit the number of iterations
if maxiter is None:
maxiter = len(x0)*200
# init the search status
warnflag = 0
# initialize the search
trust_radius = initial_trust_radius
x = x0
if return_all:
allvecs = [x]
m = subproblem(x, fun, jac, hess, hessp)
k = 0
# search for the function min
# do not even start if the gradient is small enough
while m.jac_mag >= gtol:
# Solve the sub-problem.
# This gives us the proposed step relative to the current position
# and it tells us whether the proposed step
# has reached the trust region boundary or not.
try:
p, hits_boundary = m.solve(trust_radius)
except np.linalg.LinAlgError:
warnflag = 3
break
# calculate the predicted value at the proposed point
predicted_value = m(p)
# define the local approximation at the proposed point
x_proposed = x + p
m_proposed = subproblem(x_proposed, fun, jac, hess, hessp)
# evaluate the ratio defined in equation (4.4)
actual_reduction = m.fun - m_proposed.fun
predicted_reduction = m.fun - predicted_value
if predicted_reduction <= 0:
warnflag = 2
break
rho = actual_reduction / predicted_reduction
# update the trust radius according to the actual/predicted ratio
if rho < 0.25:
trust_radius *= 0.25
elif rho > 0.75 and hits_boundary:
trust_radius = min(2*trust_radius, max_trust_radius)
# if the ratio is high enough then accept the proposed step
if rho > eta:
x = x_proposed
m = m_proposed
# append the best guess, call back, increment the iteration count
if return_all:
allvecs.append(np.copy(x))
k += 1
intermediate_result = OptimizeResult(x=x, fun=m.fun)
if _call_callback_maybe_halt(callback, intermediate_result):
break
# check if the gradient is small enough to stop
if m.jac_mag < gtol:
warnflag = 0
break
# check if we have looked at enough iterations
if k >= maxiter:
warnflag = 1
break
# print some stuff if requested
status_messages = (
_status_message['success'],
_status_message['maxiter'],
'A bad approximation caused failure to predict improvement.',
'A linalg error occurred, such as a non-psd Hessian.',
)
if disp:
if warnflag == 0:
print(status_messages[warnflag])
else:
warnings.warn(status_messages[warnflag], RuntimeWarning, stacklevel=3)
print(" Current function value: %f" % m.fun)
print(" Iterations: %d" % k)
print(" Function evaluations: %d" % sf.nfev)
print(" Gradient evaluations: %d" % sf.ngev)
print(" Hessian evaluations: %d" % (sf.nhev + nhessp[0]))
result = OptimizeResult(x=x, success=(warnflag == 0), status=warnflag,
fun=m.fun, jac=m.jac, nfev=sf.nfev, njev=sf.ngev,
nhev=sf.nhev + nhessp[0], nit=k,
message=status_messages[warnflag])
if hess is not None:
result['hess'] = m.hess
if return_all:
result['allvecs'] = allvecs
return result
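# Illustrative sketch (not part of the original module): the trust-radius
# update and step-acceptance rules from the loop above, isolated as two
# small functions. The function names are hypothetical; the thresholds are
# the ones used in the loop.

```python
def update_trust_radius(rho, trust_radius, hits_boundary,
                        max_trust_radius=1000.0):
    """Shrink on poor agreement, grow on good agreement at the boundary."""
    if rho < 0.25:
        return trust_radius * 0.25
    if rho > 0.75 and hits_boundary:
        return min(2 * trust_radius, max_trust_radius)
    return trust_radius


def accept_step(rho, eta=0.15):
    """Accept the proposed step when the reduction ratio exceeds eta."""
    return rho > eta


shrunk = update_trust_radius(0.1, 1.0, hits_boundary=False)
capped = update_trust_radius(0.9, 600.0, hits_boundary=True)
kept = update_trust_radius(0.5, 1.0, hits_boundary=True)
```

# Here `rho` is the ratio of actual to predicted reduction. A step can be
# accepted (rho > eta) while the radius still shrinks (rho < 0.25), because
# the two decisions use different thresholds.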
@@ -0,0 +1,6 @@
"""This module contains the equality constrained SQP solver."""
from .minimize_trustregion_constr import _minimize_trustregion_constr
__all__ = ['_minimize_trustregion_constr']
@@ -0,0 +1,390 @@
import numpy as np
import scipy.sparse as sps
class CanonicalConstraint:
"""Canonical constraint to use with trust-constr algorithm.
It represents the set of constraints of the form::
f_eq(x) = 0
f_ineq(x) <= 0
where ``f_eq`` and ``f_ineq`` are evaluated by a single function, see
below.
The class is supposed to be instantiated by factory methods, which
should prepare the parameters listed below.
Parameters
----------
n_eq, n_ineq : int
Number of equality and inequality constraints respectively.
fun : callable
Function defining the constraints. The signature is
``fun(x) -> c_eq, c_ineq``, where ``c_eq`` is ndarray with `n_eq`
components and ``c_ineq`` is ndarray with `n_ineq` components.
jac : callable
Function to evaluate the Jacobian of the constraint. The signature
is ``jac(x) -> J_eq, J_ineq``, where ``J_eq`` and ``J_ineq`` are
either ndarray or csr_matrix of shapes (n_eq, n) and (n_ineq, n),
respectively.
hess : callable
Function to evaluate the Hessian of the constraints multiplied
by Lagrange multipliers, that is
``dot(f_eq, v_eq) + dot(f_ineq, v_ineq)``. The signature is
``hess(x, v_eq, v_ineq) -> H``, where ``H`` has an implied
shape (n, n) and provides a matrix-vector product operation
``H.dot(p)``.
keep_feasible : ndarray, shape (n_ineq,)
Mask indicating which inequality constraints should be kept feasible.
"""
def __init__(self, n_eq, n_ineq, fun, jac, hess, keep_feasible):
self.n_eq = n_eq
self.n_ineq = n_ineq
self.fun = fun
self.jac = jac
self.hess = hess
self.keep_feasible = keep_feasible
@classmethod
def from_PreparedConstraint(cls, constraint):
"""Create an instance from `PreparedConstraint` object."""
lb, ub = constraint.bounds
cfun = constraint.fun
keep_feasible = constraint.keep_feasible
if np.all(lb == -np.inf) and np.all(ub == np.inf):
return cls.empty(cfun.n)
elif np.all(lb == ub):
return cls._equal_to_canonical(cfun, lb)
elif np.all(lb == -np.inf):
return cls._less_to_canonical(cfun, ub, keep_feasible)
elif np.all(ub == np.inf):
return cls._greater_to_canonical(cfun, lb, keep_feasible)
else:
return cls._interval_to_canonical(cfun, lb, ub, keep_feasible)
@classmethod
def empty(cls, n):
"""Create an "empty" instance.
This "empty" instance is required to allow working with unconstrained
problems as if they have some constraints.
"""
empty_fun = np.empty(0)
empty_jac = np.empty((0, n))
empty_hess = sps.csr_matrix((n, n))
def fun(x):
return empty_fun, empty_fun
def jac(x):
return empty_jac, empty_jac
def hess(x, v_eq, v_ineq):
return empty_hess
return cls(0, 0, fun, jac, hess, np.empty(0, dtype=np.bool_))
@classmethod
def concatenate(cls, canonical_constraints, sparse_jacobian):
"""Concatenate multiple `CanonicalConstraint` into one.
`sparse_jacobian` (bool) determines the Jacobian format of the
concatenated constraint. Note that items in `canonical_constraints`
must have their Jacobians in the same format.
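
As a rough illustration (not part of the implementation), the concatenated
Hessian has to hand each constraint its own slice of the stacked multiplier
vector, in concatenation order. The helper name ``split_multipliers`` and the
sample sizes are assumptions made for this sketch only.

```python
import numpy as np


def split_multipliers(v, sizes):
    # Hypothetical helper: slice a stacked multiplier vector into one
    # chunk per constraint, in the order the constraints were concatenated.
    chunks = []
    start = 0
    for size in sizes:
        chunks.append(v[start:start + size])
        start += size
    return chunks


# Three constraints contributing 2, 0 and 1 equality rows respectively.
v_eq = np.array([1.0, 2.0, 3.0])
chunks = split_multipliers(v_eq, [2, 0, 1])
```

A constraint with no rows of a given kind simply receives an empty slice, so
no special casing is needed.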
"""
def fun(x):
if canonical_constraints:
eq_all, ineq_all = zip(
*[c.fun(x) for c in canonical_constraints])
else:
eq_all, ineq_all = [], []
return np.hstack(eq_all), np.hstack(ineq_all)
if sparse_jacobian:
vstack = sps.vstack
else:
vstack = np.vstack
def jac(x):
if canonical_constraints:
eq_all, ineq_all = zip(
*[c.jac(x) for c in canonical_constraints])
else:
eq_all, ineq_all = [], []
return vstack(eq_all), vstack(ineq_all)
def hess(x, v_eq, v_ineq):
hess_all = []
index_eq = 0
index_ineq = 0
for c in canonical_constraints:
vc_eq = v_eq[index_eq:index_eq + c.n_eq]
vc_ineq = v_ineq[index_ineq:index_ineq + c.n_ineq]
hess_all.append(c.hess(x, vc_eq, vc_ineq))
index_eq += c.n_eq
index_ineq += c.n_ineq
def matvec(p):
result = np.zeros_like(p)
for h in hess_all:
result += h.dot(p)
return result
n = x.shape[0]
return sps.linalg.LinearOperator((n, n), matvec, dtype=float)
n_eq = sum(c.n_eq for c in canonical_constraints)
n_ineq = sum(c.n_ineq for c in canonical_constraints)
keep_feasible = np.hstack([c.keep_feasible for c in
canonical_constraints])
return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
@classmethod
def _equal_to_canonical(cls, cfun, value):
empty_fun = np.empty(0)
n = cfun.n
n_eq = value.shape[0]
n_ineq = 0
keep_feasible = np.empty(0, dtype=bool)
if cfun.sparse_jacobian:
empty_jac = sps.csr_matrix((0, n))
else:
empty_jac = np.empty((0, n))
def fun(x):
return cfun.fun(x) - value, empty_fun
def jac(x):
return cfun.jac(x), empty_jac
def hess(x, v_eq, v_ineq):
return cfun.hess(x, v_eq)
return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
@classmethod
def _less_to_canonical(cls, cfun, ub, keep_feasible):
empty_fun = np.empty(0)
n = cfun.n
if cfun.sparse_jacobian:
empty_jac = sps.csr_matrix((0, n))
else:
empty_jac = np.empty((0, n))
finite_ub = ub < np.inf
n_eq = 0
n_ineq = np.sum(finite_ub)
if np.all(finite_ub):
def fun(x):
return empty_fun, cfun.fun(x) - ub
def jac(x):
return empty_jac, cfun.jac(x)
def hess(x, v_eq, v_ineq):
return cfun.hess(x, v_ineq)
else:
finite_ub = np.nonzero(finite_ub)[0]
keep_feasible = keep_feasible[finite_ub]
ub = ub[finite_ub]
def fun(x):
return empty_fun, cfun.fun(x)[finite_ub] - ub
def jac(x):
return empty_jac, cfun.jac(x)[finite_ub]
def hess(x, v_eq, v_ineq):
v = np.zeros(cfun.m)
v[finite_ub] = v_ineq
return cfun.hess(x, v)
return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
@classmethod
def _greater_to_canonical(cls, cfun, lb, keep_feasible):
empty_fun = np.empty(0)
n = cfun.n
if cfun.sparse_jacobian:
empty_jac = sps.csr_matrix((0, n))
else:
empty_jac = np.empty((0, n))
finite_lb = lb > -np.inf
n_eq = 0
n_ineq = np.sum(finite_lb)
if np.all(finite_lb):
def fun(x):
return empty_fun, lb - cfun.fun(x)
def jac(x):
return empty_jac, -cfun.jac(x)
def hess(x, v_eq, v_ineq):
return cfun.hess(x, -v_ineq)
else:
finite_lb = np.nonzero(finite_lb)[0]
keep_feasible = keep_feasible[finite_lb]
lb = lb[finite_lb]
def fun(x):
return empty_fun, lb - cfun.fun(x)[finite_lb]
def jac(x):
return empty_jac, -cfun.jac(x)[finite_lb]
def hess(x, v_eq, v_ineq):
v = np.zeros(cfun.m)
v[finite_lb] = -v_ineq
return cfun.hess(x, v)
return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
@classmethod
def _interval_to_canonical(cls, cfun, lb, ub, keep_feasible):
lb_inf = lb == -np.inf
ub_inf = ub == np.inf
equal = lb == ub
less = lb_inf & ~ub_inf
greater = ub_inf & ~lb_inf
interval = ~equal & ~lb_inf & ~ub_inf
equal = np.nonzero(equal)[0]
less = np.nonzero(less)[0]
greater = np.nonzero(greater)[0]
interval = np.nonzero(interval)[0]
n_less = less.shape[0]
n_greater = greater.shape[0]
n_interval = interval.shape[0]
n_ineq = n_less + n_greater + 2 * n_interval
n_eq = equal.shape[0]
keep_feasible = np.hstack((keep_feasible[less],
keep_feasible[greater],
keep_feasible[interval],
keep_feasible[interval]))
def fun(x):
f = cfun.fun(x)
eq = f[equal] - lb[equal]
le = f[less] - ub[less]
ge = lb[greater] - f[greater]
il = f[interval] - ub[interval]
ig = lb[interval] - f[interval]
return eq, np.hstack((le, ge, il, ig))
def jac(x):
J = cfun.jac(x)
eq = J[equal]
le = J[less]
ge = -J[greater]
il = J[interval]
ig = -il
if sps.issparse(J):
ineq = sps.vstack((le, ge, il, ig))
else:
ineq = np.vstack((le, ge, il, ig))
return eq, ineq
def hess(x, v_eq, v_ineq):
n_start = 0
v_l = v_ineq[n_start:n_start + n_less]
n_start += n_less
v_g = v_ineq[n_start:n_start + n_greater]
n_start += n_greater
v_il = v_ineq[n_start:n_start + n_interval]
n_start += n_interval
v_ig = v_ineq[n_start:n_start + n_interval]
v = np.zeros_like(lb)
v[equal] = v_eq
v[less] = v_l
v[greater] = -v_g
v[interval] = v_il - v_ig
return cfun.hess(x, v)
return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
def initial_constraints_as_canonical(n, prepared_constraints, sparse_jacobian):
"""Convert initial values of the constraints to the canonical format.
The purpose is to avoid one additional call to the constraints at the initial
point. It takes saved values in `PreparedConstraint`, modifies and
concatenates them to the canonical constraint format.
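
A minimal sketch of the conversion for a single constraint vector, assuming
the canonical form ``c_eq == 0`` and ``c_ineq <= 0``. The function name
``to_canonical`` and the sample values are illustrative and do not appear in
the module.

```python
import numpy as np


def to_canonical(f, lb, ub):
    # Hypothetical helper: map lb <= f <= ub to (c_eq, c_ineq), where
    # feasibility means c_eq == 0 and c_ineq <= 0.
    lb_inf = lb == -np.inf
    ub_inf = ub == np.inf
    equal = lb == ub
    less = lb_inf & ~ub_inf
    greater = ub_inf & ~lb_inf
    interval = ~equal & ~lb_inf & ~ub_inf
    c_eq = f[equal] - lb[equal]
    # Two-sided rows appear twice: once per finite bound.
    c_ineq = np.hstack((f[less] - ub[less],
                        lb[greater] - f[greater],
                        f[interval] - ub[interval],
                        lb[interval] - f[interval]))
    return c_eq, c_ineq


f = np.array([1.0, 2.0, 3.0, 4.0])
lb = np.array([1.0, -np.inf, 0.0, 3.0])
ub = np.array([1.0, 5.0, np.inf, 6.0])
c_eq, c_ineq = to_canonical(f, lb, ub)
```

Here the first row is an equality, the second and third are one-sided, and
the fourth is a two-sided interval that yields two inequality rows.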
"""
c_eq = []
c_ineq = []
J_eq = []
J_ineq = []
for c in prepared_constraints:
f = c.fun.f
J = c.fun.J
lb, ub = c.bounds
if np.all(lb == ub):
c_eq.append(f - lb)
J_eq.append(J)
elif np.all(lb == -np.inf):
finite_ub = ub < np.inf
c_ineq.append(f[finite_ub] - ub[finite_ub])
J_ineq.append(J[finite_ub])
elif np.all(ub == np.inf):
finite_lb = lb > -np.inf
c_ineq.append(lb[finite_lb] - f[finite_lb])
J_ineq.append(-J[finite_lb])
else:
lb_inf = lb == -np.inf
ub_inf = ub == np.inf
equal = lb == ub
less = lb_inf & ~ub_inf
greater = ub_inf & ~lb_inf
interval = ~equal & ~lb_inf & ~ub_inf
c_eq.append(f[equal] - lb[equal])
c_ineq.append(f[less] - ub[less])
c_ineq.append(lb[greater] - f[greater])
c_ineq.append(f[interval] - ub[interval])
c_ineq.append(lb[interval] - f[interval])
J_eq.append(J[equal])
J_ineq.append(J[less])
J_ineq.append(-J[greater])
J_ineq.append(J[interval])
J_ineq.append(-J[interval])
c_eq = np.hstack(c_eq) if c_eq else np.empty(0)
c_ineq = np.hstack(c_ineq) if c_ineq else np.empty(0)
if sparse_jacobian:
vstack = sps.vstack
empty = sps.csr_matrix((0, n))
else:
vstack = np.vstack
empty = np.empty((0, n))
J_eq = vstack(J_eq) if J_eq else empty
J_ineq = vstack(J_ineq) if J_ineq else empty
return c_eq, c_ineq, J_eq, J_ineq
@@ -0,0 +1,217 @@
"""Byrd-Omojokun Trust-Region SQP method."""
from scipy.sparse import eye as speye
from .projections import projections
from .qp_subproblem import modified_dogleg, projected_cg, box_intersections
import numpy as np
from numpy.linalg import norm
__all__ = ['equality_constrained_sqp']
def default_scaling(x):
n, = np.shape(x)
return speye(n)
def equality_constrained_sqp(fun_and_constr, grad_and_jac, lagr_hess,
x0, fun0, grad0, constr0,
jac0, stop_criteria,
state,
initial_penalty,
initial_trust_radius,
factorization_method,
trust_lb=None,
trust_ub=None,
scaling=default_scaling):
"""Solve nonlinear equality-constrained problem using trust-region SQP.
Solve optimization problem:
minimize fun(x)
subject to: constr(x) = 0
using Byrd-Omojokun Trust-Region SQP method described in [1]_. Several
implementation details are based on [2]_ and [3]_, p. 549.
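
A minimal sketch of the step acceptance test, assuming an l2 merit function
``f + penalty * ||constr||``. The helper name ``reduction_ratio`` and the
numbers are hypothetical and only illustrate the ratio of actual to predicted
reduction.

```python
import numpy as np


def reduction_ratio(f, b, f_next, b_next, penalty, predicted_reduction):
    # Merit function at the current point and at the trial point.
    merit = f + penalty * np.linalg.norm(b)
    merit_next = f_next + penalty * np.linalg.norm(b_next)
    # Ratio of the actual reduction to the reduction the model predicted.
    return (merit - merit_next) / predicted_reduction


ratio = reduction_ratio(f=1.0, b=np.array([3.0, 4.0]),
                        f_next=0.5, b_next=np.array([0.0, 0.0]),
                        penalty=2.0, predicted_reduction=21.0)
```

A ratio close to 1 indicates the quadratic model predicted the merit
decrease well; a very small or negative ratio leads to a rejected step and a
smaller trust region.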
References
----------
.. [1] Lalee, Marucha, Jorge Nocedal, and Todd Plantenga. "On the
implementation of an algorithm for large-scale equality
constrained optimization." SIAM Journal on
Optimization 8.3 (1998): 682-706.
.. [2] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal.
"An interior point algorithm for large-scale nonlinear
programming." SIAM Journal on Optimization 9.4 (1999): 877-900.
.. [3] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
Second Edition (2006).
"""
PENALTY_FACTOR = 0.3 # Rho from formula (3.51), reference [2]_, p.891.
LARGE_REDUCTION_RATIO = 0.9
INTERMEDIARY_REDUCTION_RATIO = 0.3
SUFFICIENT_REDUCTION_RATIO = 1e-8 # Eta from reference [2]_, p.892.
TRUST_ENLARGEMENT_FACTOR_L = 7.0
TRUST_ENLARGEMENT_FACTOR_S = 2.0
MAX_TRUST_REDUCTION = 0.5
MIN_TRUST_REDUCTION = 0.1
SOC_THRESHOLD = 0.1
TR_FACTOR = 0.8 # Zeta from formula (3.21), reference [2]_, p.885.
BOX_FACTOR = 0.5
n, = np.shape(x0) # Number of parameters
# Set default lower and upper bounds.
if trust_lb is None:
trust_lb = np.full(n, -np.inf)
if trust_ub is None:
trust_ub = np.full(n, np.inf)
# Initial values
x = np.copy(x0)
trust_radius = initial_trust_radius
penalty = initial_penalty
# Compute Values
f = fun0
c = grad0
b = constr0
A = jac0
S = scaling(x)
# Get projections
Z, LS, Y = projections(A, factorization_method)
# Compute least-squares Lagrange multipliers
v = -LS.dot(c)
# Compute Hessian
H = lagr_hess(x, v)
# Update state parameters
optimality = norm(c + A.T.dot(v), np.inf)
constr_violation = norm(b, np.inf) if len(b) > 0 else 0
cg_info = {'niter': 0, 'stop_cond': 0,
'hits_boundary': False}
last_iteration_failed = False
while not stop_criteria(state, x, last_iteration_failed,
optimality, constr_violation,
trust_radius, penalty, cg_info):
# Normal Step - `dn`
# minimize 1/2*||A dn + b||^2
# subject to:
# ||dn|| <= TR_FACTOR * trust_radius
# BOX_FACTOR * lb <= dn <= BOX_FACTOR * ub.
dn = modified_dogleg(A, Y, b,
TR_FACTOR*trust_radius,
BOX_FACTOR*trust_lb,
BOX_FACTOR*trust_ub)
# Tangential Step - `dt`
# Solve the QP problem:
# minimize 1/2 dt.T H dt + dt.T (H dn + c)
# subject to:
# A dt = 0
# ||dt|| <= sqrt(trust_radius**2 - ||dn||**2)
# lb - dn <= dt <= ub - dn
c_t = H.dot(dn) + c
b_t = np.zeros_like(b)
trust_radius_t = np.sqrt(trust_radius**2 - np.linalg.norm(dn)**2)
lb_t = trust_lb - dn
ub_t = trust_ub - dn
dt, cg_info = projected_cg(H, c_t, Z, Y, b_t,
trust_radius_t,
lb_t, ub_t)
# Compute update (normal + tangential steps).
d = dn + dt
# Compute second order model (constant term f omitted): 1/2 d.T H d + c.T d.
quadratic_model = 1/2*(H.dot(d)).dot(d) + c.T.dot(d)
# Compute linearized constraint: l = A d + b.
linearized_constr = A.dot(d)+b
# Compute new penalty parameter according to formula (3.52),
# reference [2]_, p.891.
vpred = norm(b) - norm(linearized_constr)
# Guarantee `vpred` always positive,
# regardless of roundoff errors.
vpred = max(1e-16, vpred)
previous_penalty = penalty
if quadratic_model > 0:
new_penalty = quadratic_model / ((1-PENALTY_FACTOR)*vpred)
penalty = max(penalty, new_penalty)
# Compute predicted reduction according to formula (3.52),
# reference [2]_, p.891.
predicted_reduction = -quadratic_model + penalty*vpred
# Compute merit function at current point
merit_function = f + penalty*norm(b)
# Evaluate function and constraints at trial point
x_next = x + S.dot(d)
f_next, b_next = fun_and_constr(x_next)
# Compute merit function at trial point
merit_function_next = f_next + penalty*norm(b_next)
# Compute actual reduction according to formula (3.54),
# reference [2]_, p.892.
actual_reduction = merit_function - merit_function_next
# Compute reduction ratio
reduction_ratio = actual_reduction / predicted_reduction
# Second order correction (SOC), reference [2]_, p.892.
if reduction_ratio < SUFFICIENT_REDUCTION_RATIO and \
norm(dn) <= SOC_THRESHOLD * norm(dt):
# Compute second order correction
y = -Y.dot(b_next)
# Make sure increment is inside box constraints
_, t, intersect = box_intersections(d, y, trust_lb, trust_ub)
# Compute tentative point
x_soc = x + S.dot(d + t*y)
f_soc, b_soc = fun_and_constr(x_soc)
# Recompute actual reduction
merit_function_soc = f_soc + penalty*norm(b_soc)
actual_reduction_soc = merit_function - merit_function_soc
# Recompute reduction ratio
reduction_ratio_soc = actual_reduction_soc / predicted_reduction
if intersect and reduction_ratio_soc >= SUFFICIENT_REDUCTION_RATIO:
x_next = x_soc
f_next = f_soc
b_next = b_soc
reduction_ratio = reduction_ratio_soc
# Readjust trust region step, formula (3.55), reference [2]_, p.892.
if reduction_ratio >= LARGE_REDUCTION_RATIO:
trust_radius = max(TRUST_ENLARGEMENT_FACTOR_L * norm(d),
trust_radius)
elif reduction_ratio >= INTERMEDIARY_REDUCTION_RATIO:
trust_radius = max(TRUST_ENLARGEMENT_FACTOR_S * norm(d),
trust_radius)
# Reduce trust region step, according to reference [3]_, p.696.
elif reduction_ratio < SUFFICIENT_REDUCTION_RATIO:
trust_reduction = ((1-SUFFICIENT_REDUCTION_RATIO) /
(1-reduction_ratio))
new_trust_radius = trust_reduction * norm(d)
if new_trust_radius >= MAX_TRUST_REDUCTION * trust_radius:
trust_radius *= MAX_TRUST_REDUCTION
elif new_trust_radius >= MIN_TRUST_REDUCTION * trust_radius:
trust_radius = new_trust_radius
else:
trust_radius *= MIN_TRUST_REDUCTION
# Update iteration
if reduction_ratio >= SUFFICIENT_REDUCTION_RATIO:
x = x_next
f, b = f_next, b_next
c, A = grad_and_jac(x)
S = scaling(x)
# Get projections
Z, LS, Y = projections(A, factorization_method)
# Compute least-squares Lagrange multipliers
v = -LS.dot(c)
# Compute Hessian
H = lagr_hess(x, v)
# Set Flag
last_iteration_failed = False
# Optimality values
optimality = norm(c + A.T.dot(v), np.inf)
constr_violation = norm(b, np.inf) if len(b) > 0 else 0
else:
penalty = previous_penalty
last_iteration_failed = True
return x, state
@@ -0,0 +1,564 @@
import time
import numpy as np
from scipy.sparse.linalg import LinearOperator
from .._differentiable_functions import VectorFunction
from .._constraints import (
NonlinearConstraint, LinearConstraint, PreparedConstraint, Bounds, strict_bounds)
from .._hessian_update_strategy import BFGS
from .._optimize import OptimizeResult
from .._differentiable_functions import ScalarFunction
from .equality_constrained_sqp import equality_constrained_sqp
from .canonical_constraint import (CanonicalConstraint,
initial_constraints_as_canonical)
from .tr_interior_point import tr_interior_point
from .report import BasicReport, SQPReport, IPReport
TERMINATION_MESSAGES = {
0: "The maximum number of function evaluations is exceeded.",
1: "`gtol` termination condition is satisfied.",
2: "`xtol` termination condition is satisfied.",
3: "`callback` function requested termination."
}
class HessianLinearOperator:
"""Build LinearOperator from hessp"""
def __init__(self, hessp, n):
self.hessp = hessp
self.n = n
def __call__(self, x, *args):
def matvec(p):
return self.hessp(x, p, *args)
return LinearOperator((self.n, self.n), matvec=matvec)
class LagrangianHessian:
"""The Hessian of the Lagrangian as LinearOperator.
The Lagrangian is computed as the objective function plus all the
constraints multiplied with some numbers (Lagrange multipliers).
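
A rough sketch of the idea, assuming only that each Hessian exposes a
``dot`` method. ``SumOfHessians`` is a hypothetical stand-in for a linear
operator and is not part of this module.

```python
import numpy as np


class SumOfHessians:
    # Hypothetical stand-in for a linear operator: it only exposes `dot`,
    # applying each Hessian to the vector and adding the results, so the
    # summed matrix is never formed explicitly.
    def __init__(self, *hessians):
        self.hessians = hessians

    def dot(self, p):
        return sum(h.dot(p) for h in self.hessians)


H_objective = np.array([[2.0, 0.0], [0.0, 2.0]])
H_constraints = np.array([[0.0, 1.0], [1.0, 0.0]])
H = SumOfHessians(H_objective, H_constraints)
Hp = H.dot(np.array([1.0, 2.0]))
```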
"""
def __init__(self, n, objective_hess, constraints_hess):
self.n = n
self.objective_hess = objective_hess
self.constraints_hess = constraints_hess
def __call__(self, x, v_eq=np.empty(0), v_ineq=np.empty(0)):
H_objective = self.objective_hess(x)
H_constraints = self.constraints_hess(x, v_eq, v_ineq)
def matvec(p):
return H_objective.dot(p) + H_constraints.dot(p)
return LinearOperator((self.n, self.n), matvec)
def update_state_sqp(state, x, last_iteration_failed, objective, prepared_constraints,
start_time, tr_radius, constr_penalty, cg_info):
state.nit += 1
state.nfev = objective.nfev
state.njev = objective.ngev
state.nhev = objective.nhev
state.constr_nfev = [c.fun.nfev if isinstance(c.fun, VectorFunction) else 0
for c in prepared_constraints]
state.constr_njev = [c.fun.njev if isinstance(c.fun, VectorFunction) else 0
for c in prepared_constraints]
state.constr_nhev = [c.fun.nhev if isinstance(c.fun, VectorFunction) else 0
for c in prepared_constraints]
if not last_iteration_failed:
state.x = x
state.fun = objective.f
state.grad = objective.g
state.v = [c.fun.v for c in prepared_constraints]
state.constr = [c.fun.f for c in prepared_constraints]
state.jac = [c.fun.J for c in prepared_constraints]
# Compute Lagrangian Gradient
state.lagrangian_grad = np.copy(state.grad)
for c in prepared_constraints:
state.lagrangian_grad += c.fun.J.T.dot(c.fun.v)
state.optimality = np.linalg.norm(state.lagrangian_grad, np.inf)
# Compute maximum constraint violation
state.constr_violation = 0
for i in range(len(prepared_constraints)):
lb, ub = prepared_constraints[i].bounds
c = state.constr[i]
state.constr_violation = np.max([state.constr_violation,
np.max(lb - c),
np.max(c - ub)])
state.execution_time = time.time() - start_time
state.tr_radius = tr_radius
state.constr_penalty = constr_penalty
state.cg_niter += cg_info["niter"]
state.cg_stop_cond = cg_info["stop_cond"]
return state
def update_state_ip(state, x, last_iteration_failed, objective,
prepared_constraints, start_time,
tr_radius, constr_penalty, cg_info,
barrier_parameter, barrier_tolerance):
state = update_state_sqp(state, x, last_iteration_failed, objective,
prepared_constraints, start_time, tr_radius,
constr_penalty, cg_info)
state.barrier_parameter = barrier_parameter
state.barrier_tolerance = barrier_tolerance
return state
def _minimize_trustregion_constr(fun, x0, args, grad,
hess, hessp, bounds, constraints,
xtol=1e-8, gtol=1e-8,
barrier_tol=1e-8,
sparse_jacobian=None,
callback=None, maxiter=1000,
verbose=0, finite_diff_rel_step=None,
initial_constr_penalty=1.0, initial_tr_radius=1.0,
initial_barrier_parameter=0.1,
initial_barrier_tolerance=0.1,
factorization_method=None,
disp=False):
"""Minimize a scalar function subject to constraints.
Parameters
----------
gtol : float, optional
Tolerance for termination by the norm of the Lagrangian gradient.
The algorithm will terminate when both the infinity norm (i.e., max
abs value) of the Lagrangian gradient and the constraint violation
are smaller than ``gtol``. Default is 1e-8.
xtol : float, optional
Tolerance for termination by the change of the independent variable.
The algorithm will terminate when ``tr_radius < xtol``, where
``tr_radius`` is the radius of the trust region used in the algorithm.
Default is 1e-8.
barrier_tol : float, optional
Threshold on the barrier parameter for the algorithm termination.
When inequality constraints are present, the algorithm will terminate
only when the barrier parameter is less than `barrier_tol`.
Default is 1e-8.
sparse_jacobian : {bool, None}, optional
Determines how to represent Jacobians of the constraints. If bool,
then Jacobians of all the constraints will be converted to the
corresponding format. If None (default), then Jacobians won't be
converted, but the algorithm can proceed only if they all have the
same format.
initial_tr_radius : float, optional
Initial trust radius. The trust radius gives the maximum distance
between solution points in consecutive iterations. It reflects the
trust the algorithm puts in the local approximation of the optimization
problem. For an accurate local approximation the trust-region should be
large and for an approximation valid only close to the current point it
should be a small one. The trust radius is automatically updated throughout
the optimization process, with ``initial_tr_radius`` being its initial value.
Default is 1 (recommended in [1]_, p. 19).
initial_constr_penalty : float, optional
Initial constraints penalty parameter. The penalty parameter is used for
balancing the requirements of decreasing the objective function
and satisfying the constraints. It is used for defining the merit function:
``merit_function(x) = fun(x) + constr_penalty * constr_norm_l2(x)``,
where ``constr_norm_l2(x)`` is the l2 norm of a vector containing all
the constraints. The merit function is used for accepting or rejecting
trial points and ``constr_penalty`` weights the two conflicting goals
of reducing objective function and constraints. The penalty is automatically
updated throughout the optimization process, with
``initial_constr_penalty`` being its initial value. Default is 1
(recommended in [1]_, p. 19).
initial_barrier_parameter, initial_barrier_tolerance : float, optional
Initial barrier parameter and initial tolerance for the barrier subproblem.
Both are used only when inequality constraints are present. For dealing with
optimization problems ``min_x f(x)`` subject to inequality constraints
``c(x) <= 0`` the algorithm introduces slack variables, solving the problem
``min_(x,s) f(x) - barrier_parameter*sum(ln(s))`` subject to the equality
constraints ``c(x) + s = 0`` instead of the original problem. This subproblem
is solved for decreasing values of ``barrier_parameter`` and with decreasing
tolerances for the termination, starting with ``initial_barrier_parameter``
for the barrier parameter and ``initial_barrier_tolerance`` for the
barrier tolerance. Default is 0.1 for both values (recommended in [1]_, p. 19).
Also note that ``barrier_parameter`` and ``barrier_tolerance`` are updated
with the same prefactor.
factorization_method : string or None, optional
Method to factorize the Jacobian of the constraints. Use None (default)
for the auto selection or one of:
- 'NormalEquation' (requires scikit-sparse)
- 'AugmentedSystem'
- 'QRFactorization'
- 'SVDFactorization'
The methods 'NormalEquation' and 'AugmentedSystem' can be used only
with sparse constraints. The projections required by the algorithm
will be computed using, respectively, the normal equation and the
augmented system approaches explained in [1]_. 'NormalEquation'
computes the Cholesky factorization of ``A A.T`` and 'AugmentedSystem'
performs the LU factorization of an augmented system. They usually
provide similar results. 'AugmentedSystem' is used by default for
sparse matrices.
The methods 'QRFactorization' and 'SVDFactorization' can be used
only with dense constraints. They compute the required projections
using, respectively, QR and SVD factorizations. The 'SVDFactorization'
method can cope with Jacobian matrices with deficient row rank and will
be used whenever other factorization methods fail (which may imply the
conversion of sparse matrices to a dense format when required).
By default, 'QRFactorization' is used for dense matrices.
finite_diff_rel_step : None or array_like, optional
Relative step size for the finite difference approximation.
maxiter : int, optional
Maximum number of algorithm iterations. Default is 1000.
verbose : {0, 1, 2, 3}, optional
Level of algorithm's verbosity:
* 0 (default) : work silently.
* 1 : display a termination report.
* 2 : display progress during iterations.
* 3 : display progress during iterations (more complete report).
disp : bool, optional
If True, then `verbose` will be set to 1 if it was 0. Default is False.
Returns
-------
`OptimizeResult` with the fields documented below. Note the following:
1. All values corresponding to the constraints are ordered as they
were passed to the solver. And values corresponding to `bounds`
constraints are put *after* other constraints.
2. All numbers of function, Jacobian or Hessian evaluations correspond
to numbers of actual Python function calls. It means, for example,
that if a Jacobian is estimated by finite differences, then the
number of Jacobian evaluations will be zero and the number of
function evaluations will be incremented by all calls during the
finite difference estimation.
x : ndarray, shape (n,)
Solution found.
optimality : float
Infinity norm of the Lagrangian gradient at the solution.
constr_violation : float
Maximum constraint violation at the solution.
fun : float
Objective function at the solution.
grad : ndarray, shape (n,)
Gradient of the objective function at the solution.
lagrangian_grad : ndarray, shape (n,)
Gradient of the Lagrangian function at the solution.
nit : int
Total number of iterations.
nfev : integer
Number of the objective function evaluations.
njev : integer
Number of the objective function gradient evaluations.
nhev : integer
Number of the objective function Hessian evaluations.
cg_niter : int
Total number of the conjugate gradient method iterations.
method : {'equality_constrained_sqp', 'tr_interior_point'}
Optimization method used.
constr : list of ndarray
List of constraint values at the solution.
jac : list of {ndarray, sparse matrix}
List of the Jacobian matrices of the constraints at the solution.
v : list of ndarray
List of the Lagrange multipliers for the constraints at the solution.
For an inequality constraint a positive multiplier means that the upper
bound is active, a negative multiplier means that the lower bound is
active and if a multiplier is zero it means the constraint is not
active.
constr_nfev : list of int
Number of constraint evaluations for each of the constraints.
constr_njev : list of int
Number of Jacobian matrix evaluations for each of the constraints.
constr_nhev : list of int
Number of Hessian evaluations for each of the constraints.
tr_radius : float
Radius of the trust region at the last iteration.
constr_penalty : float
Penalty parameter at the last iteration, see `initial_constr_penalty`.
barrier_tolerance : float
Tolerance for the barrier subproblem at the last iteration.
Only for problems with inequality constraints.
barrier_parameter : float
Barrier parameter at the last iteration. Only for problems
with inequality constraints.
execution_time : float
Total execution time.
message : str
Termination message.
status : {0, 1, 2, 3}
Termination status:
* 0 : The maximum number of function evaluations is exceeded.
* 1 : `gtol` termination condition is satisfied.
* 2 : `xtol` termination condition is satisfied.
* 3 : `callback` function requested termination.
cg_stop_cond : int
Reason for CG subproblem termination at the last iteration:
* 0 : CG subproblem not evaluated.
* 1 : Iteration limit was reached.
* 2 : Reached the trust-region boundary.
* 3 : Negative curvature detected.
* 4 : Tolerance was satisfied.
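
The termination statuses listed above can be sketched as a small pure
function for the case without inequality constraints. The name
``termination_status`` is hypothetical; the thresholds mirror the documented
defaults, and ``None`` means "keep iterating".

```python
def termination_status(optimality, constr_violation, tr_radius, nit,
                       gtol=1e-8, xtol=1e-8, maxiter=1000):
    # Status 1: Lagrangian gradient and constraint violation both small.
    if optimality < gtol and constr_violation < gtol:
        return 1
    # Status 2: the trust region has shrunk below `xtol`.
    if tr_radius < xtol:
        return 2
    # Status 0: iteration budget exhausted.
    if nit >= maxiter:
        return 0
    return None


converged = termination_status(1e-9, 1e-10, 1.0, 12)
stalled = termination_status(1e-3, 1e-10, 1e-9, 12)
exhausted = termination_status(1e-3, 1e-3, 1.0, 1000)
running = termination_status(1e-3, 1e-3, 1.0, 12)
```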
References
----------
.. [1] Conn, A. R., Gould, N. I., & Toint, P. L.
Trust region methods. 2000. Siam. pp. 19.
"""
x0 = np.atleast_1d(x0).astype(float)
n_vars = np.size(x0)
if hess is None:
if callable(hessp):
hess = HessianLinearOperator(hessp, n_vars)
else:
hess = BFGS()
if disp and verbose == 0:
verbose = 1
if bounds is not None:
modified_lb = np.nextafter(bounds.lb, -np.inf, where=bounds.lb > -np.inf)
modified_ub = np.nextafter(bounds.ub, np.inf, where=bounds.ub < np.inf)
modified_lb = np.where(np.isfinite(bounds.lb), modified_lb, bounds.lb)
modified_ub = np.where(np.isfinite(bounds.ub), modified_ub, bounds.ub)
bounds = Bounds(modified_lb, modified_ub, keep_feasible=bounds.keep_feasible)
finite_diff_bounds = strict_bounds(bounds.lb, bounds.ub,
bounds.keep_feasible, n_vars)
else:
finite_diff_bounds = (-np.inf, np.inf)
# Define Objective Function
objective = ScalarFunction(fun, x0, args, grad, hess,
finite_diff_rel_step, finite_diff_bounds)
# Put constraints in list format when needed.
if isinstance(constraints, (NonlinearConstraint, LinearConstraint)):
constraints = [constraints]
# Prepare constraints.
prepared_constraints = [
PreparedConstraint(c, x0, sparse_jacobian, finite_diff_bounds)
for c in constraints]
# Check that all constraints are either sparse or dense.
n_sparse = sum(c.fun.sparse_jacobian for c in prepared_constraints)
if 0 < n_sparse < len(prepared_constraints):
raise ValueError("All constraints must have the same kind of the "
"Jacobian --- either all sparse or all dense. "
"You can set the sparsity globally by setting "
"`sparse_jacobian` to either True or False.")
if prepared_constraints:
sparse_jacobian = n_sparse > 0
if bounds is not None:
if sparse_jacobian is None:
sparse_jacobian = True
prepared_constraints.append(PreparedConstraint(bounds, x0,
sparse_jacobian))
# Concatenate initial constraints to the canonical form.
c_eq0, c_ineq0, J_eq0, J_ineq0 = initial_constraints_as_canonical(
n_vars, prepared_constraints, sparse_jacobian)
# Prepare all canonical constraints and concatenate them into one.
canonical_all = [CanonicalConstraint.from_PreparedConstraint(c)
for c in prepared_constraints]
if len(canonical_all) == 0:
canonical = CanonicalConstraint.empty(n_vars)
elif len(canonical_all) == 1:
canonical = canonical_all[0]
else:
canonical = CanonicalConstraint.concatenate(canonical_all,
sparse_jacobian)
# Generate the Hessian of the Lagrangian.
lagrangian_hess = LagrangianHessian(n_vars, objective.hess, canonical.hess)
# Choose appropriate method
if canonical.n_ineq == 0:
method = 'equality_constrained_sqp'
else:
method = 'tr_interior_point'
# Construct OptimizeResult
state = OptimizeResult(
nit=0, nfev=0, njev=0, nhev=0,
cg_niter=0, cg_stop_cond=0,
fun=objective.f, grad=objective.g,
lagrangian_grad=np.copy(objective.g),
constr=[c.fun.f for c in prepared_constraints],
jac=[c.fun.J for c in prepared_constraints],
constr_nfev=[0 for c in prepared_constraints],
constr_njev=[0 for c in prepared_constraints],
constr_nhev=[0 for c in prepared_constraints],
v=[c.fun.v for c in prepared_constraints],
method=method)
# Start counting
start_time = time.time()
# Define stop criteria
if method == 'equality_constrained_sqp':
def stop_criteria(state, x, last_iteration_failed,
optimality, constr_violation,
tr_radius, constr_penalty, cg_info):
state = update_state_sqp(state, x, last_iteration_failed,
objective, prepared_constraints,
start_time, tr_radius, constr_penalty,
cg_info)
if verbose == 2:
BasicReport.print_iteration(state.nit,
state.nfev,
state.cg_niter,
state.fun,
state.tr_radius,
state.optimality,
state.constr_violation)
elif verbose > 2:
SQPReport.print_iteration(state.nit,
state.nfev,
state.cg_niter,
state.fun,
state.tr_radius,
state.optimality,
state.constr_violation,
state.constr_penalty,
state.cg_stop_cond)
state.status = None
state.niter = state.nit # Alias for callback (backward-compatibility)
if callback is not None:
callback_stop = False
try:
callback_stop = callback(state)
except StopIteration:
callback_stop = True
if callback_stop:
state.status = 3
return True
if state.optimality < gtol and state.constr_violation < gtol:
state.status = 1
elif state.tr_radius < xtol:
state.status = 2
elif state.nit >= maxiter:
state.status = 0
return state.status in (0, 1, 2, 3)
elif method == 'tr_interior_point':
def stop_criteria(state, x, last_iteration_failed, tr_radius,
constr_penalty, cg_info, barrier_parameter,
barrier_tolerance):
state = update_state_ip(state, x, last_iteration_failed,
objective, prepared_constraints,
start_time, tr_radius, constr_penalty,
cg_info, barrier_parameter, barrier_tolerance)
if verbose == 2:
BasicReport.print_iteration(state.nit,
state.nfev,
state.cg_niter,
state.fun,
state.tr_radius,
state.optimality,
state.constr_violation)
elif verbose > 2:
IPReport.print_iteration(state.nit,
state.nfev,
state.cg_niter,
state.fun,
state.tr_radius,
state.optimality,
state.constr_violation,
state.constr_penalty,
state.barrier_parameter,
state.cg_stop_cond)
state.status = None
state.niter = state.nit # Alias for callback (backward compatibility)
if callback is not None:
callback_stop = False
try:
callback_stop = callback(state)
except StopIteration:
callback_stop = True
if callback_stop:
state.status = 3
return True
if state.optimality < gtol and state.constr_violation < gtol:
state.status = 1
elif (state.tr_radius < xtol
and state.barrier_parameter < barrier_tol):
state.status = 2
elif state.nit >= maxiter:
state.status = 0
return state.status in (0, 1, 2, 3)
if verbose == 2:
BasicReport.print_header()
elif verbose > 2:
if method == 'equality_constrained_sqp':
SQPReport.print_header()
elif method == 'tr_interior_point':
IPReport.print_header()
# Call the underlying solver to do the optimization
if method == 'equality_constrained_sqp':
def fun_and_constr(x):
f = objective.fun(x)
c_eq, _ = canonical.fun(x)
return f, c_eq
def grad_and_jac(x):
g = objective.grad(x)
J_eq, _ = canonical.jac(x)
return g, J_eq
_, result = equality_constrained_sqp(
fun_and_constr, grad_and_jac, lagrangian_hess,
x0, objective.f, objective.g,
c_eq0, J_eq0,
stop_criteria, state,
initial_constr_penalty, initial_tr_radius,
factorization_method)
elif method == 'tr_interior_point':
_, result = tr_interior_point(
objective.fun, objective.grad, lagrangian_hess,
n_vars, canonical.n_ineq, canonical.n_eq,
canonical.fun, canonical.jac,
x0, objective.f, objective.g,
c_ineq0, J_ineq0, c_eq0, J_eq0,
stop_criteria,
canonical.keep_feasible,
xtol, state, initial_barrier_parameter,
initial_barrier_tolerance,
initial_constr_penalty, initial_tr_radius,
factorization_method)
# Status 3 occurs when the callback function requests termination;
# this is not considered a success.
result.success = result.status in (1, 2)
result.message = TERMINATION_MESSAGES[result.status]
# Alias (for backward compatibility with 1.1.0)
result.niter = result.nit
if verbose == 2:
BasicReport.print_footer()
elif verbose > 2:
if method == 'equality_constrained_sqp':
SQPReport.print_footer()
elif method == 'tr_interior_point':
IPReport.print_footer()
if verbose >= 1:
print(result.message)
print("Number of iterations: {}, function evaluations: {}, "
"CG iterations: {}, optimality: {:.2e}, "
"constraint violation: {:.2e}, execution time: {:4.2} s."
.format(result.nit, result.nfev, result.cg_niter,
result.optimality, result.constr_violation,
result.execution_time))
return result
@@ -0,0 +1,407 @@
"""Basic linear factorizations needed by the solver."""
from scipy.sparse import (bmat, csc_matrix, eye, issparse)
from scipy.sparse.linalg import LinearOperator
import scipy.linalg
import scipy.sparse.linalg
try:
from sksparse.cholmod import cholesky_AAt
sksparse_available = True
except ImportError:
import warnings
sksparse_available = False
import numpy as np
from warnings import warn
__all__ = [
'orthogonality',
'projections',
]
def orthogonality(A, g):
"""Measure orthogonality between a vector and the null space of a matrix.
Compute a measure of orthogonality between the null space
of the (possibly sparse) matrix ``A`` and a given vector ``g``.
The formula is a simplified (and cheaper) version of formula (3.13)
from [1]_.
``orth = norm(A g, ord=2)/(norm(A, ord='fro')*norm(g, ord=2))``.
References
----------
.. [1] Gould, Nicholas IM, Mary E. Hribar, and Jorge Nocedal.
"On the solution of equality constrained quadratic
programming problems arising in optimization."
SIAM Journal on Scientific Computing 23.4 (2001): 1376-1395.
"""
# Compute vector norms
norm_g = np.linalg.norm(g)
# Compute Frobenius norm of the matrix A
if issparse(A):
norm_A = scipy.sparse.linalg.norm(A, ord='fro')
else:
norm_A = np.linalg.norm(A, ord='fro')
# Check if norms are zero
if norm_g == 0 or norm_A == 0:
return 0
norm_A_g = np.linalg.norm(A.dot(g))
# Orthogonality measure
orth = norm_A_g / (norm_A*norm_g)
return orth
def normal_equation_projections(A, m, n, orth_tol, max_refin, tol):
"""Return linear operators for matrix A using ``NormalEquation`` approach.
"""
# Cholesky factorization
factor = cholesky_AAt(A)
# z = x - A.T inv(A A.T) A x
def null_space(x):
v = factor(A.dot(x))
z = x - A.T.dot(v)
# Iterative refinement to reduce roundoff
# errors, as described in [1]_, algorithm 5.1.
k = 0
while orthogonality(A, z) > orth_tol:
if k >= max_refin:
break
# z_next = z - A.T inv(A A.T) A z
v = factor(A.dot(z))
z = z - A.T.dot(v)
k += 1
return z
# z = inv(A A.T) A x
def least_squares(x):
return factor(A.dot(x))
# z = A.T inv(A A.T) x
def row_space(x):
return A.T.dot(factor(x))
return null_space, least_squares, row_space
def augmented_system_projections(A, m, n, orth_tol, max_refin, tol):
"""Return linear operators for matrix A - ``AugmentedSystem``."""
# Form augmented system
K = csc_matrix(bmat([[eye(n), A.T], [A, None]]))
# LU factorization
# TODO: Use a symmetric indefinite factorization
# to solve the system twice as fast (because
# of the symmetry).
try:
solve = scipy.sparse.linalg.factorized(K)
except RuntimeError:
warn("Singular Jacobian matrix. Using dense SVD decomposition to "
"perform the factorizations.",
stacklevel=3)
return svd_factorization_projections(A.toarray(),
m, n, orth_tol,
max_refin, tol)
# z = x - A.T inv(A A.T) A x
# is computed solving the extended system:
# [I A.T] * [ z ] = [x]
# [A O ] [aux] [0]
def null_space(x):
# v = [x]
# [0]
v = np.hstack([x, np.zeros(m)])
# lu_sol = [ z ]
# [aux]
lu_sol = solve(v)
z = lu_sol[:n]
# Iterative refinement to reduce roundoff
# errors, as described in [1]_, algorithm 5.2.
k = 0
while orthogonality(A, z) > orth_tol:
if k >= max_refin:
break
# new_v = [x] - [I A.T] * [ z ]
# [0] [A O ] [aux]
new_v = v - K.dot(lu_sol)
# [I A.T] * [delta z ] = new_v
# [A O ] [delta aux]
lu_update = solve(new_v)
# [ z ] += [delta z ]
# [aux] [delta aux]
lu_sol += lu_update
z = lu_sol[:n]
k += 1
# return z = x - A.T inv(A A.T) A x
return z
# z = inv(A A.T) A x
# is computed solving the extended system:
# [I A.T] * [aux] = [x]
# [A O ] [ z ] [0]
def least_squares(x):
# v = [x]
# [0]
v = np.hstack([x, np.zeros(m)])
# lu_sol = [aux]
# [ z ]
lu_sol = solve(v)
# return z = inv(A A.T) A x
return lu_sol[n:m+n]
# z = A.T inv(A A.T) x
# is computed solving the extended system:
# [I A.T] * [ z ] = [0]
# [A O ] [aux] [x]
def row_space(x):
# v = [0]
# [x]
v = np.hstack([np.zeros(n), x])
# lu_sol = [ z ]
# [aux]
lu_sol = solve(v)
# return z = A.T inv(A A.T) x
return lu_sol[:n]
return null_space, least_squares, row_space
def qr_factorization_projections(A, m, n, orth_tol, max_refin, tol):
"""Return linear operators for matrix A using ``QRFactorization`` approach.
"""
# QRFactorization
Q, R, P = scipy.linalg.qr(A.T, pivoting=True, mode='economic')
if np.linalg.norm(R[-1, :], np.inf) < tol:
warn('Singular Jacobian matrix. Using SVD decomposition to ' +
'perform the factorizations.',
stacklevel=3)
return svd_factorization_projections(A, m, n,
orth_tol,
max_refin,
tol)
# z = x - A.T inv(A A.T) A x
def null_space(x):
# v = P inv(R) Q.T x
aux1 = Q.T.dot(x)
aux2 = scipy.linalg.solve_triangular(R, aux1, lower=False)
v = np.zeros(m)
v[P] = aux2
z = x - A.T.dot(v)
# Iterative refinement to reduce roundoff
# errors, as described in [1]_, algorithm 5.1.
k = 0
while orthogonality(A, z) > orth_tol:
if k >= max_refin:
break
# v = P inv(R) Q.T z
aux1 = Q.T.dot(z)
aux2 = scipy.linalg.solve_triangular(R, aux1, lower=False)
v[P] = aux2
# z_next = z - A.T v
z = z - A.T.dot(v)
k += 1
return z
# z = inv(A A.T) A x
def least_squares(x):
# z = P inv(R) Q.T x
aux1 = Q.T.dot(x)
aux2 = scipy.linalg.solve_triangular(R, aux1, lower=False)
z = np.zeros(m)
z[P] = aux2
return z
# z = A.T inv(A A.T) x
def row_space(x):
# z = Q inv(R.T) P.T x
aux1 = x[P]
aux2 = scipy.linalg.solve_triangular(R, aux1,
lower=False,
trans='T')
z = Q.dot(aux2)
return z
return null_space, least_squares, row_space
def svd_factorization_projections(A, m, n, orth_tol, max_refin, tol):
"""Return linear operators for matrix A using ``SVDFactorization`` approach.
"""
# SVD Factorization
U, s, Vt = scipy.linalg.svd(A, full_matrices=False)
# Remove dimensions associated with very small singular values
U = U[:, s > tol]
Vt = Vt[s > tol, :]
s = s[s > tol]
# z = x - A.T inv(A A.T) A x
def null_space(x):
# v = U 1/s V.T x = inv(A A.T) A x
aux1 = Vt.dot(x)
aux2 = 1/s*aux1
v = U.dot(aux2)
z = x - A.T.dot(v)
# Iterative refinement to reduce roundoff
# errors, as described in [1]_, algorithm 5.1.
k = 0
while orthogonality(A, z) > orth_tol:
if k >= max_refin:
break
# v = U 1/s V.T z = inv(A A.T) A z
aux1 = Vt.dot(z)
aux2 = 1/s*aux1
v = U.dot(aux2)
# z_next = z - A.T v
z = z - A.T.dot(v)
k += 1
return z
# z = inv(A A.T) A x
def least_squares(x):
# z = U 1/s V.T x = inv(A A.T) A x
aux1 = Vt.dot(x)
aux2 = 1/s*aux1
z = U.dot(aux2)
return z
# z = A.T inv(A A.T) x
def row_space(x):
# z = V 1/s U.T x
aux1 = U.T.dot(x)
aux2 = 1/s*aux1
z = Vt.T.dot(aux2)
return z
return null_space, least_squares, row_space
def projections(A, method=None, orth_tol=1e-12, max_refin=3, tol=1e-15):
"""Return three linear operators related with a given matrix A.
Parameters
----------
A : sparse matrix (or ndarray), shape (m, n)
Matrix ``A`` used in the projection.
method : string, optional
Method used to compute the linear
operators. Should be one of:
- 'NormalEquation': The operators
will be computed using the
so-called normal equation approach
explained in [1]_. In order to do
so the Cholesky factorization of
``(A A.T)`` is computed. Exclusive
for sparse matrices.
- 'AugmentedSystem': The operators
will be computed using the
so-called augmented system approach
explained in [1]_. Exclusive
for sparse matrices.
- 'QRFactorization': Compute projections
using QR factorization. Exclusive for
dense matrices.
- 'SVDFactorization': Compute projections
using SVD factorization. Exclusive for
dense matrices.
orth_tol : float, optional
Tolerance for iterative refinements.
max_refin : int, optional
Maximum number of iterative refinements.
tol : float, optional
Tolerance for singular values.
Returns
-------
Z : LinearOperator, shape (n, n)
Null-space operator. For a given vector ``x``,
the null-space operator is equivalent to applying
a projection matrix ``P = I - A.T inv(A A.T) A``
to the vector. It can be shown that this is
equivalent to projecting ``x`` onto the null space
of A.
LS : LinearOperator, shape (m, n)
Least-squares operator. For a given vector ``x``,
the least-squares operator is equivalent to applying a
pseudoinverse matrix ``pinv(A.T) = inv(A A.T) A``
to the vector. It can be shown that this vector
``pinv(A.T) x`` is the least-squares solution to
``A.T y = x``.
Y : LinearOperator, shape (n, m)
Row-space operator. For a given vector ``x``,
the row-space operator is equivalent to applying a
projection matrix ``Q = A.T inv(A A.T)``
to the vector. It can be shown that this
vector ``y = Q x`` is the minimum norm solution
of ``A y = x``.
Notes
-----
Uses iterative refinements described in [1]_
during the computation of ``Z`` in order to
cope with the possibility of large roundoff errors.
References
----------
.. [1] Gould, Nicholas IM, Mary E. Hribar, and Jorge Nocedal.
"On the solution of equality constrained quadratic
programming problems arising in optimization."
SIAM Journal on Scientific Computing 23.4 (2001): 1376-1395.
"""
m, n = np.shape(A)
# The factorization of an empty matrix
# only works for the sparse representation.
if m*n == 0:
A = csc_matrix(A)
# Check Argument
if issparse(A):
if method is None:
method = "AugmentedSystem"
if method not in ("NormalEquation", "AugmentedSystem"):
raise ValueError("Method not allowed for sparse matrix.")
if method == "NormalEquation" and not sksparse_available:
warn("Only accepts 'NormalEquation' option when "
"scikit-sparse is available. Using "
"'AugmentedSystem' option instead.",
ImportWarning, stacklevel=3)
method = 'AugmentedSystem'
else:
if method is None:
method = "QRFactorization"
if method not in ("QRFactorization", "SVDFactorization"):
raise ValueError("Method not allowed for dense array.")
if method == 'NormalEquation':
null_space, least_squares, row_space \
= normal_equation_projections(A, m, n, orth_tol, max_refin, tol)
elif method == 'AugmentedSystem':
null_space, least_squares, row_space \
= augmented_system_projections(A, m, n, orth_tol, max_refin, tol)
elif method == "QRFactorization":
null_space, least_squares, row_space \
= qr_factorization_projections(A, m, n, orth_tol, max_refin, tol)
elif method == "SVDFactorization":
null_space, least_squares, row_space \
= svd_factorization_projections(A, m, n, orth_tol, max_refin, tol)
Z = LinearOperator((n, n), null_space)
LS = LinearOperator((m, n), least_squares)
Y = LinearOperator((n, m), row_space)
return Z, LS, Y
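The three operators documented in ``projections`` can be checked on a small dense example. The sketch below is illustrative only: ``dense_projections`` is a hypothetical helper (not part of this module) that forms the matrices explicitly with NumPy. It assumes ``A`` has full row rank and is only sensible for tiny problems; the module itself works through factorizations instead.

```python
import numpy as np


def dense_projections(A):
    """Hypothetical helper: explicit Z, LS and Y for a small full-row-rank A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    AAt_inv = np.linalg.inv(A @ A.T)
    Z = np.eye(n) - A.T @ AAt_inv @ A   # projector onto the null space of A
    LS = AAt_inv @ A                    # pinv(A.T): least-squares operator
    Y = A.T @ AAt_inv                   # minimum norm solution operator
    return Z, LS, Y


A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
Z, LS, Y = dense_projections(A)

x = np.array([1.0, -1.0, 2.0])
b = np.array([3.0, 4.0])

null_component = Z @ x        # satisfies A @ null_component ~ 0
min_norm_solution = Y @ b     # satisfies A @ min_norm_solution ~ b
ls_solution = LS @ x          # least-squares solution of A.T y = x
```

Under these assumptions ``A @ (Z @ x)`` vanishes up to roundoff and ``A @ (Y @ b)`` reproduces ``b``, which are the properties stated in the docstring.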
@@ -0,0 +1,637 @@
"""Equality-constrained quadratic programming solvers."""
from scipy.sparse import (linalg, bmat, csc_matrix)
from math import copysign
import numpy as np
from numpy.linalg import norm
__all__ = [
'eqp_kktfact',
'sphere_intersections',
'box_intersections',
'box_sphere_intersections',
'inside_box_boundaries',
'modified_dogleg',
'projected_cg'
]
# For comparison with the projected CG
def eqp_kktfact(H, c, A, b):
"""Solve equality-constrained quadratic programming (EQP) problem.
Solve ``min 1/2 x.T H x + x.T c`` subject to ``A x + b = 0``
using direct factorization of the KKT system.
Parameters
----------
H : sparse matrix, shape (n, n)
Hessian matrix of the EQP problem.
c : array_like, shape (n,)
Gradient of the quadratic objective function.
A : sparse matrix
Jacobian matrix of the EQP problem.
b : array_like, shape (m,)
Right-hand side of the constraint equation.
Returns
-------
x : array_like, shape (n,)
Solution of the KKT problem.
lagrange_multipliers : ndarray, shape (m,)
Lagrange multipliers of the KKT problem.
"""
n, = np.shape(c) # Number of parameters
m, = np.shape(b) # Number of constraints
# Karush-Kuhn-Tucker matrix of coefficients.
# Defined as in Nocedal/Wright "Numerical
# Optimization" p.452 in Eq. (16.4).
kkt_matrix = csc_matrix(bmat([[H, A.T], [A, None]]))
# Vector of coefficients.
kkt_vec = np.hstack([-c, -b])
# TODO: Use a symmetric indefinite factorization
# to solve the system twice as fast (because
# of the symmetry).
lu = linalg.splu(kkt_matrix)
kkt_sol = lu.solve(kkt_vec)
x = kkt_sol[:n]
lagrange_multipliers = -kkt_sol[n:n+m]
return x, lagrange_multipliers
def sphere_intersections(z, d, trust_radius,
entire_line=False):
"""Find the intersection between segment (or line) and spherical constraints.
Find the intersection between the segment (or line) defined by the
parametric equation ``x(t) = z + t*d`` and the ball
``||x|| <= trust_radius``.
Parameters
----------
z : array_like, shape (n,)
Initial point.
d : array_like, shape (n,)
Direction.
trust_radius : float
Ball radius.
entire_line : bool, optional
When ``True``, the function returns the intersection between the line
``x(t) = z + t*d`` (``t`` can assume any value) and the ball
``||x|| <= trust_radius``. When ``False``, the function returns the intersection
between the segment ``x(t) = z + t*d``, ``0 <= t <= 1``, and the ball.
Returns
-------
ta, tb : float
The line/segment ``x(t) = z + t*d`` is inside the ball for
``ta <= t <= tb``.
intersect : bool
When ``True``, there is an intersection between the line/segment
and the sphere. On the other hand, when ``False``, there is no
intersection.
"""
# Special case when d=0
if norm(d) == 0:
return 0, 0, False
# Check for inf trust_radius
if np.isinf(trust_radius):
if entire_line:
ta = -np.inf
tb = np.inf
else:
ta = 0
tb = 1
intersect = True
return ta, tb, intersect
a = np.dot(d, d)
b = 2 * np.dot(z, d)
c = np.dot(z, z) - trust_radius**2
discriminant = b*b - 4*a*c
if discriminant < 0:
intersect = False
return 0, 0, intersect
sqrt_discriminant = np.sqrt(discriminant)
# The following calculation is mathematically
# equivalent to:
# ta = (-b - sqrt_discriminant) / (2*a)
# tb = (-b + sqrt_discriminant) / (2*a)
# but produces smaller roundoff errors.
# See Matrix Computations p. 97
# for a better justification.
aux = b + copysign(sqrt_discriminant, b)
ta = -aux / (2*a)
tb = -2*c / aux
ta, tb = sorted([ta, tb])
if entire_line:
intersect = True
else:
# Check whether the intersection happens
# within the vector's length.
if tb < 0 or ta > 1:
intersect = False
ta = 0
tb = 0
else:
intersect = True
# Restrict intersection interval
# between 0 and 1.
ta = max(0, ta)
tb = min(1, tb)
return ta, tb, intersect
def box_intersections(z, d, lb, ub,
entire_line=False):
"""Find the intersection between segment (or line) and box constraints.
Find the intersection between the segment (or line) defined by the
parametric equation ``x(t) = z + t*d`` and the rectangular box
``lb <= x <= ub``.
Parameters
----------
z : array_like, shape (n,)
Initial point.
d : array_like, shape (n,)
Direction.
lb : array_like, shape (n,)
Lower bounds to each one of the components of ``x``. Used
to delimit the rectangular box.
ub : array_like, shape (n, )
Upper bounds to each one of the components of ``x``. Used
to delimit the rectangular box.
entire_line : bool, optional
When ``True``, the function returns the intersection between the line
``x(t) = z + t*d`` (``t`` can assume any value) and the rectangular
box. When ``False``, the function returns the intersection between the segment
``x(t) = z + t*d``, ``0 <= t <= 1``, and the rectangular box.
Returns
-------
ta, tb : float
The line/segment ``x(t) = z + t*d`` is inside the box for
``ta <= t <= tb``.
intersect : bool
When ``True``, there is an intersection between the line (or segment)
and the rectangular box. On the other hand, when ``False``, there is no
intersection.
"""
# Make sure it is a numpy array
z = np.asarray(z)
d = np.asarray(d)
lb = np.asarray(lb)
ub = np.asarray(ub)
# Special case when d=0
if norm(d) == 0:
return 0, 0, False
# Get values for which d==0
zero_d = (d == 0)
# If the boundaries are not satisfied for some coordinate
# for which "d" is zero, there is no box-line intersection.
if (z[zero_d] < lb[zero_d]).any() or (z[zero_d] > ub[zero_d]).any():
intersect = False
return 0, 0, intersect
# Remove values for which d is zero
not_zero_d = np.logical_not(zero_d)
z = z[not_zero_d]
d = d[not_zero_d]
lb = lb[not_zero_d]
ub = ub[not_zero_d]
# Find a series of intervals (t_lb[i], t_ub[i]).
t_lb = (lb-z) / d
t_ub = (ub-z) / d
# Get the intersection of all those intervals.
ta = max(np.minimum(t_lb, t_ub))
tb = min(np.maximum(t_lb, t_ub))
# Check if intersection is feasible
if ta <= tb:
intersect = True
else:
intersect = False
# Check whether the intersection happens within the vector's length.
if not entire_line:
if tb < 0 or ta > 1:
intersect = False
ta = 0
tb = 0
else:
# Restrict intersection interval between 0 and 1.
ta = max(0, ta)
tb = min(1, tb)
return ta, tb, intersect
def box_sphere_intersections(z, d, lb, ub, trust_radius,
entire_line=False,
extra_info=False):
"""Find the intersection between segment (or line) and box/sphere constraints.
Find the intersection between the segment (or line) defined by the
parametric equation ``x(t) = z + t*d``, the rectangular box
``lb <= x <= ub`` and the ball ``||x|| <= trust_radius``.
Parameters
----------
z : array_like, shape (n,)
Initial point.
d : array_like, shape (n,)
Direction.
lb : array_like, shape (n,)
Lower bounds to each one of the components of ``x``. Used
to delimit the rectangular box.
ub : array_like, shape (n, )
Upper bounds to each one of the components of ``x``. Used
to delimit the rectangular box.
trust_radius : float
Ball radius.
entire_line : bool, optional
When ``True``, the function returns the intersection between the line
``x(t) = z + t*d`` (``t`` can assume any value) and the constraints.
When ``False``, the function returns the intersection between the segment
``x(t) = z + t*d``, ``0 <= t <= 1`` and the constraints.
extra_info : bool, optional
When ``True``, the function also returns ``sphere_info`` and ``box_info``.
Returns
-------
ta, tb : float
The line/segment ``x(t) = z + t*d`` is inside the rectangular box and
inside the ball for ``ta <= t <= tb``.
intersect : bool
When ``True``, there is an intersection between the line (or segment)
and both constraints. On the other hand, when ``False``, there is no
intersection.
sphere_info : dict, optional
Dictionary ``{ta, tb, intersect}`` containing the interval ``[ta, tb]``
for which the line intersects the ball, and a boolean value indicating
whether the sphere is intersected by the line.
box_info : dict, optional
Dictionary ``{ta, tb, intersect}`` containing the interval ``[ta, tb]``
for which the line intersects the box, and a boolean value indicating
whether the box is intersected by the line.
"""
ta_b, tb_b, intersect_b = box_intersections(z, d, lb, ub,
entire_line)
ta_s, tb_s, intersect_s = sphere_intersections(z, d,
trust_radius,
entire_line)
ta = np.maximum(ta_b, ta_s)
tb = np.minimum(tb_b, tb_s)
if intersect_b and intersect_s and ta <= tb:
intersect = True
else:
intersect = False
if extra_info:
sphere_info = {'ta': ta_s, 'tb': tb_s, 'intersect': intersect_s}
box_info = {'ta': ta_b, 'tb': tb_b, 'intersect': intersect_b}
return ta, tb, intersect, sphere_info, box_info
else:
return ta, tb, intersect
def inside_box_boundaries(x, lb, ub):
"""Check if lb <= x <= ub."""
return (lb <= x).all() and (x <= ub).all()
def reinforce_box_boundaries(x, lb, ub):
"""Return clipped value of x"""
return np.minimum(np.maximum(x, lb), ub)
def modified_dogleg(A, Y, b, trust_radius, lb, ub):
"""Approximately minimize ``1/2*|| A x + b ||^2`` inside trust-region.
Approximately solve the problem of minimizing ``1/2*|| A x + b ||^2``
subject to ``||x|| < trust_radius`` and ``lb <= x <= ub`` using a modification
of the classical dogleg approach.
Parameters
----------
A : LinearOperator (or sparse matrix or ndarray), shape (m, n)
Matrix ``A`` in the minimization problem. It should have
dimension ``(m, n)`` such that ``m < n``.
Y : LinearOperator (or sparse matrix or ndarray), shape (n, m)
LinearOperator that applies the projection matrix
``Q = A.T inv(A A.T)`` to the vector. The resulting vector
``y = Q x`` is the minimum norm solution of ``A y = x``.
b : array_like, shape (m,)
Vector ``b`` in the minimization problem.
trust_radius : float
Trust radius to be considered. Delimits a sphere boundary
to the problem.
lb : array_like, shape (n,)
Lower bounds to each one of the components of ``x``.
It is expected that ``lb <= 0``, otherwise the algorithm
may fail. If ``lb[i] = -Inf``, the lower
bound for the ith component is just ignored.
ub : array_like, shape (n, )
Upper bounds to each one of the components of ``x``.
It is expected that ``ub >= 0``, otherwise the algorithm
may fail. If ``ub[i] = Inf``, the upper bound for the ith
component is just ignored.
Returns
-------
x : array_like, shape (n,)
Solution to the problem.
Notes
-----
Based on the implementation described on pp. 885-886 of [1]_.
References
----------
.. [1] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal.
"An interior point algorithm for large-scale nonlinear
programming." SIAM Journal on Optimization 9.4 (1999): 877-900.
"""
# Compute minimum norm minimizer of 1/2*|| A x + b ||^2.
newton_point = -Y.dot(b)
# Check for interior point
if inside_box_boundaries(newton_point, lb, ub) \
and norm(newton_point) <= trust_radius:
x = newton_point
return x
# Compute gradient vector ``g = A.T b``
g = A.T.dot(b)
# Compute Cauchy point
# ``cauchy_point = -(g.T g / (g.T A.T A g)) * g``.
A_g = A.dot(g)
cauchy_point = -np.dot(g, g) / np.dot(A_g, A_g) * g
# Origin
origin_point = np.zeros_like(cauchy_point)
# Check the segment between cauchy_point and newton_point
# for a possible solution.
z = cauchy_point
p = newton_point - cauchy_point
_, alpha, intersect = box_sphere_intersections(z, p, lb, ub,
trust_radius)
if intersect:
x1 = z + alpha*p
else:
# Check the segment between the origin and cauchy_point
# for a possible solution.
z = origin_point
p = cauchy_point
_, alpha, _ = box_sphere_intersections(z, p, lb, ub,
trust_radius)
x1 = z + alpha*p
# Check the segment between origin and newton_point
# for a possible solution.
z = origin_point
p = newton_point
_, alpha, _ = box_sphere_intersections(z, p, lb, ub,
trust_radius)
x2 = z + alpha*p
# Return the best solution among x1 and x2.
if norm(A.dot(x1) + b) < norm(A.dot(x2) + b):
return x1
else:
return x2
def projected_cg(H, c, Z, Y, b, trust_radius=np.inf,
lb=None, ub=None, tol=None,
max_iter=None, max_infeasible_iter=None,
return_all=False):
"""Solve EQP problem with projected CG method.
Solve equality-constrained quadratic programming problem
``min 1/2 x.T H x + x.T c`` subject to ``A x + b = 0`` and,
possibly, to trust region constraints ``||x|| < trust_radius``
and box constraints ``lb <= x <= ub``.
Parameters
----------
H : LinearOperator (or sparse matrix or ndarray), shape (n, n)
Operator for computing ``H v``.
c : array_like, shape (n,)
Gradient of the quadratic objective function.
Z : LinearOperator (or sparse matrix or ndarray), shape (n, n)
Operator for projecting ``x`` into the null space of A.
Y : LinearOperator, sparse matrix, ndarray, shape (n, m)
Operator that, for a given vector ``b``, computes the smallest
norm solution of ``A x + b = 0``.
b : array_like, shape (m,)
Right-hand side of the constraint equation.
trust_radius : float, optional
Trust radius to be considered. By default, uses ``trust_radius=inf``,
which means no trust radius at all.
lb : array_like, shape (n,), optional
Lower bounds to each one of the components of ``x``.
If ``lb[i] = -Inf`` the lower bound for the i-th
component is just ignored (default).
ub : array_like, shape (n, ), optional
Upper bounds to each one of the components of ``x``.
If ``ub[i] = Inf`` the upper bound for the i-th
component is just ignored (default).
tol : float, optional
Tolerance used to interrupt the algorithm.
max_iter : int, optional
Maximum number of iterations, capped at ``n-m``.
By default, uses ``max_iter = n-m``.
max_infeasible_iter : int, optional
Maximum infeasible (regarding box constraints) iterations the
algorithm is allowed to take.
By default, uses ``max_infeasible_iter = n-m``.
return_all : bool, optional
When ``True``, return the list of all vectors through the iterations.
Returns
-------
x : array_like, shape (n,)
Solution of the EQP problem.
info : dict
Dictionary containing the following:
- niter : Number of iterations.
- stop_cond : Reason for algorithm termination:
1. Iteration limit was reached;
2. Reached the trust-region boundary;
3. Negative curvature detected;
4. Tolerance was satisfied.
- allvecs : List containing all intermediary vectors (optional).
- hits_boundary : True if the proposed step is on the boundary
of the trust region.
Notes
-----
Implementation of Algorithm 6.2 in [1]_.
In the absence of spherical and box constraints, for sufficient
iterations, the method returns a truly optimal result.
In the presence of those constraints, the value returned is only
an inexpensive approximation of the optimal value.
References
----------
.. [1] Gould, Nicholas IM, Mary E. Hribar, and Jorge Nocedal.
"On the solution of equality constrained quadratic
programming problems arising in optimization."
SIAM Journal on Scientific Computing 23.4 (2001): 1376-1395.
"""
CLOSE_TO_ZERO = 1e-25
n, = np.shape(c) # Number of parameters
m, = np.shape(b) # Number of constraints
# Initial Values
x = Y.dot(-b)
r = Z.dot(H.dot(x) + c)
g = Z.dot(r)
p = -g
# Store ``x`` value
if return_all:
allvecs = [x]
# Values for the first iteration
H_p = H.dot(p)
rt_g = norm(g)**2 # g.T g = r.T Z g = r.T g (ref [1]_ p.1389)
# If ||x|| > trust_radius the problem does not have a solution.
tr_distance = trust_radius - norm(x)
if tr_distance < 0:
raise ValueError("Trust region problem does not have a solution.")
# If ||x|| == trust_radius, then x is the solution
# to the optimization problem, since x is the
# minimum norm solution to A x + b = 0.
elif tr_distance < CLOSE_TO_ZERO:
info = {'niter': 0, 'stop_cond': 2, 'hits_boundary': True}
if return_all:
allvecs.append(x)
info['allvecs'] = allvecs
return x, info
# Set default tolerance
if tol is None:
tol = max(min(0.01 * np.sqrt(rt_g), 0.1 * rt_g), CLOSE_TO_ZERO)
# Set default lower and upper bounds
if lb is None:
lb = np.full(n, -np.inf)
if ub is None:
ub = np.full(n, np.inf)
# Set maximum iterations
if max_iter is None:
max_iter = n-m
max_iter = min(max_iter, n-m)
# Set maximum infeasible iterations
if max_infeasible_iter is None:
max_infeasible_iter = n-m
hits_boundary = False
stop_cond = 1
counter = 0
last_feasible_x = np.zeros_like(x)
k = 0
for i in range(max_iter):
# Stop criteria - Tolerance : r.T g < tol
if rt_g < tol:
stop_cond = 4
break
k += 1
# Compute curvature
pt_H_p = H_p.dot(p)
# Stop criteria - Negative curvature
if pt_H_p <= 0:
if np.isinf(trust_radius):
raise ValueError("Negative curvature not allowed "
"for unrestricted problems.")
else:
# Find intersection with constraints
_, alpha, intersect = box_sphere_intersections(
x, p, lb, ub, trust_radius, entire_line=True)
# Update solution
if intersect:
x = x + alpha*p
# Ensure variables are inside the box constraints.
# This is only necessary because of roundoff errors.
x = reinforce_box_boundaries(x, lb, ub)
# Attribute information
stop_cond = 3
hits_boundary = True
break
# Get next step
alpha = rt_g / pt_H_p
x_next = x + alpha*p
# Stop criteria - Hits boundary
if np.linalg.norm(x_next) >= trust_radius:
# Find intersection with box constraints
_, theta, intersect = box_sphere_intersections(x, alpha*p, lb, ub,
trust_radius)
# Update solution
if intersect:
x = x + theta*alpha*p
# Ensure variables are inside the box constraints.
# This is only necessary because of roundoff errors.
x = reinforce_box_boundaries(x, lb, ub)
# Attribute information
stop_cond = 2
hits_boundary = True
break
# Check if ``x`` is inside the box and start counter if it is not.
if inside_box_boundaries(x_next, lb, ub):
counter = 0
else:
counter += 1
# Whenever outside box constraints keep looking for intersections.
if counter > 0:
_, theta, intersect = box_sphere_intersections(x, alpha*p, lb, ub,
trust_radius)
if intersect:
last_feasible_x = x + theta*alpha*p
# Ensure variables are inside the box constraints.
# This is only necessary because of roundoff errors.
last_feasible_x = reinforce_box_boundaries(last_feasible_x,
lb, ub)
counter = 0
# Stop after too many infeasible (regarding box constraints) iterations.
if counter > max_infeasible_iter:
break
# Store ``x_next`` value
if return_all:
allvecs.append(x_next)
# Update residual
r_next = r + alpha*H_p
# Project residual g+ = Z r+
g_next = Z.dot(r_next)
# Compute conjugate direction step d
rt_g_next = norm(g_next)**2 # g.T g = r.T g (ref [1]_ p.1389)
beta = rt_g_next / rt_g
p = - g_next + beta*p
# Prepare for next iteration
x = x_next
g = g_next
r = g_next
rt_g = norm(g)**2 # g.T g = r.T Z g = r.T g (ref [1]_ p.1389)
H_p = H.dot(p)
if not inside_box_boundaries(x, lb, ub):
x = last_feasible_x
hits_boundary = True
info = {'niter': k, 'stop_cond': stop_cond,
'hits_boundary': hits_boundary}
if return_all:
info['allvecs'] = allvecs
return x, info
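The roundoff-friendly root formula used in ``sphere_intersections`` above can be isolated in a few lines. The following is a minimal sketch, not the module's code: ``line_sphere_roots`` is a hypothetical name, it uses plain Python sequences instead of NumPy arrays, and it adds a guard for the degenerate double root at ``t = 0`` that the original does not include.

```python
from math import copysign, sqrt


def line_sphere_roots(z, d, radius):
    """Hypothetical helper: roots t of ||z + t*d||^2 = radius^2, or None."""
    a = sum(di * di for di in d)
    b = 2 * sum(zi * di for zi, di in zip(z, d))
    c = sum(zi * zi for zi in z) - radius ** 2
    discriminant = b * b - 4 * a * c
    if a == 0 or discriminant < 0:
        return None  # zero direction, or the line misses the sphere
    # Adding quantities of the same sign avoids the cancellation that
    # (-b + sqrt(discriminant)) can suffer when b*b >> 4*a*c.
    aux = b + copysign(sqrt(discriminant), b)
    if aux == 0:
        return 0.0, 0.0  # b == 0 and discriminant == 0: double root at t = 0
    t1 = -aux / (2 * a)
    t2 = -2 * c / aux  # second root from the product of roots, c / a
    return tuple(sorted((t1, t2)))


roots_centered = line_sphere_roots((0.0, 0.0), (1.0, 0.0), 2.0)
roots_shifted = line_sphere_roots((1.0, 0.0), (1.0, 0.0), 2.0)
roots_missing = line_sphere_roots((3.0, 0.0), (0.0, 1.0), 2.0)
```

The second root is obtained from the product of the roots rather than from the textbook formula, which is what keeps both values accurate when ``b`` dominates the discriminant.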
@@ -0,0 +1,51 @@
"""Progress report printers."""
from __future__ import annotations
class ReportBase:
COLUMN_NAMES: list[str] = NotImplemented
COLUMN_WIDTHS: list[int] = NotImplemented
ITERATION_FORMATS: list[str] = NotImplemented
@classmethod
def print_header(cls):
fmt = ("|"
+ "|".join([f"{{:^{x}}}" for x in cls.COLUMN_WIDTHS])
+ "|")
separators = ['-' * x for x in cls.COLUMN_WIDTHS]
print(fmt.format(*cls.COLUMN_NAMES))
print(fmt.format(*separators))
@classmethod
def print_iteration(cls, *args):
iteration_format = [f"{{:{x}}}" for x in cls.ITERATION_FORMATS]
fmt = "|" + "|".join(iteration_format) + "|"
print(fmt.format(*args))
@classmethod
def print_footer(cls):
print()
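The way ``ReportBase`` assembles its row templates can be seen on a two-column example. This is an illustrative sketch only; the variable names and the sample values are assumptions, and the column specifications are a subset of those used by the report classes.

```python
# Two columns taken from the report layout: iteration count and objective.
widths = [7, 13]
names = ["niter", "obj func"]
formats = ["^7", "^+13.4e"]

# Header cells are centered in their column width.
header_fmt = "|" + "|".join(f"{{:^{w}}}" for w in widths) + "|"
# Iteration cells use the per-column format specification.
row_fmt = "|" + "|".join(f"{{:{f}}}" for f in formats) + "|"

header = header_fmt.format(*names)
separator = header_fmt.format(*["-" * w for w in widths])
row = row_fmt.format(3, 1.5)

print(header)
print(separator)
print(row)
```

Doubling the braces in the f-strings yields literal ``{`` and ``}`` characters, so each template is built once from the class attributes and then filled in by ``str.format``.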
class BasicReport(ReportBase):
COLUMN_NAMES = ["niter", "f evals", "CG iter", "obj func", "tr radius",
"opt", "c viol"]
COLUMN_WIDTHS = [7, 7, 7, 13, 10, 10, 10]
ITERATION_FORMATS = ["^7", "^7", "^7", "^+13.4e",
"^10.2e", "^10.2e", "^10.2e"]
class SQPReport(ReportBase):
COLUMN_NAMES = ["niter", "f evals", "CG iter", "obj func", "tr radius",
"opt", "c viol", "penalty", "CG stop"]
COLUMN_WIDTHS = [7, 7, 7, 13, 10, 10, 10, 10, 7]
ITERATION_FORMATS = ["^7", "^7", "^7", "^+13.4e", "^10.2e", "^10.2e",
"^10.2e", "^10.2e", "^7"]
class IPReport(ReportBase):
COLUMN_NAMES = ["niter", "f evals", "CG iter", "obj func", "tr radius",
"opt", "c viol", "penalty", "barrier param", "CG stop"]
COLUMN_WIDTHS = [7, 7, 7, 13, 10, 10, 10, 10, 13, 7]
ITERATION_FORMATS = ["^7", "^7", "^7", "^+13.4e", "^10.2e", "^10.2e",
"^10.2e", "^10.2e", "^13.2e", "^7"]
@@ -0,0 +1,296 @@
import numpy as np
from numpy.testing import assert_array_equal, assert_equal
from scipy.optimize._constraints import (NonlinearConstraint, Bounds,
PreparedConstraint)
from scipy.optimize._trustregion_constr.canonical_constraint \
import CanonicalConstraint, initial_constraints_as_canonical
def create_quadratic_function(n, m, rng):
a = rng.rand(m)
A = rng.rand(m, n)
H = rng.rand(m, n, n)
HT = np.transpose(H, (1, 2, 0))
def fun(x):
return a + A.dot(x) + 0.5 * H.dot(x).dot(x)
def jac(x):
return A + H.dot(x)
def hess(x, v):
return HT.dot(v)
return fun, jac, hess
def test_bounds_cases():
# Test 1: no constraints.
user_constraint = Bounds(-np.inf, np.inf)
x0 = np.array([-1, 2])
prepared_constraint = PreparedConstraint(user_constraint, x0, False)
c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
assert_equal(c.n_eq, 0)
assert_equal(c.n_ineq, 0)
c_eq, c_ineq = c.fun(x0)
assert_array_equal(c_eq, [])
assert_array_equal(c_ineq, [])
J_eq, J_ineq = c.jac(x0)
assert_array_equal(J_eq, np.empty((0, 2)))
assert_array_equal(J_ineq, np.empty((0, 2)))
assert_array_equal(c.keep_feasible, [])
# Test 2: infinite lower bound.
user_constraint = Bounds(-np.inf, [0, np.inf, 1], [False, True, True])
x0 = np.array([-1, -2, -3], dtype=float)
prepared_constraint = PreparedConstraint(user_constraint, x0, False)
c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
assert_equal(c.n_eq, 0)
assert_equal(c.n_ineq, 2)
c_eq, c_ineq = c.fun(x0)
assert_array_equal(c_eq, [])
assert_array_equal(c_ineq, [-1, -4])
J_eq, J_ineq = c.jac(x0)
assert_array_equal(J_eq, np.empty((0, 3)))
assert_array_equal(J_ineq, np.array([[1, 0, 0], [0, 0, 1]]))
assert_array_equal(c.keep_feasible, [False, True])
# Test 3: infinite upper bound.
user_constraint = Bounds([0, 1, -np.inf], np.inf, [True, False, True])
x0 = np.array([1, 2, 3], dtype=float)
prepared_constraint = PreparedConstraint(user_constraint, x0, False)
c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
assert_equal(c.n_eq, 0)
assert_equal(c.n_ineq, 2)
c_eq, c_ineq = c.fun(x0)
assert_array_equal(c_eq, [])
assert_array_equal(c_ineq, [-1, -1])
J_eq, J_ineq = c.jac(x0)
assert_array_equal(J_eq, np.empty((0, 3)))
assert_array_equal(J_ineq, np.array([[-1, 0, 0], [0, -1, 0]]))
assert_array_equal(c.keep_feasible, [True, False])
# Test 4: interval constraint.
user_constraint = Bounds([-1, -np.inf, 2, 3], [1, np.inf, 10, 3],
[False, True, True, True])
x0 = np.array([0, 10, 8, 5])
prepared_constraint = PreparedConstraint(user_constraint, x0, False)
c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
assert_equal(c.n_eq, 1)
assert_equal(c.n_ineq, 4)
c_eq, c_ineq = c.fun(x0)
assert_array_equal(c_eq, [2])
assert_array_equal(c_ineq, [-1, -2, -1, -6])
J_eq, J_ineq = c.jac(x0)
assert_array_equal(J_eq, [[0, 0, 0, 1]])
assert_array_equal(J_ineq, [[1, 0, 0, 0],
[0, 0, 1, 0],
[-1, 0, 0, 0],
[0, 0, -1, 0]])
assert_array_equal(c.keep_feasible, [False, True, False, True])
def test_nonlinear_constraint():
n = 3
m = 5
rng = np.random.RandomState(0)
x0 = rng.rand(n)
fun, jac, hess = create_quadratic_function(n, m, rng)
f = fun(x0)
J = jac(x0)
lb = [-10, 3, -np.inf, -np.inf, -5]
ub = [10, 3, np.inf, 3, np.inf]
user_constraint = NonlinearConstraint(
fun, lb, ub, jac, hess, [True, False, False, True, False])
for sparse_jacobian in [False, True]:
prepared_constraint = PreparedConstraint(user_constraint, x0,
sparse_jacobian)
c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
assert_array_equal(c.n_eq, 1)
assert_array_equal(c.n_ineq, 4)
c_eq, c_ineq = c.fun(x0)
assert_array_equal(c_eq, [f[1] - lb[1]])
assert_array_equal(c_ineq, [f[3] - ub[3], lb[4] - f[4],
f[0] - ub[0], lb[0] - f[0]])
J_eq, J_ineq = c.jac(x0)
if sparse_jacobian:
J_eq = J_eq.toarray()
J_ineq = J_ineq.toarray()
assert_array_equal(J_eq, J[1, None])
assert_array_equal(J_ineq, np.vstack((J[3], -J[4], J[0], -J[0])))
v_eq = rng.rand(c.n_eq)
v_ineq = rng.rand(c.n_ineq)
v = np.zeros(m)
v[1] = v_eq[0]
v[3] = v_ineq[0]
v[4] = -v_ineq[1]
v[0] = v_ineq[2] - v_ineq[3]
assert_array_equal(c.hess(x0, v_eq, v_ineq), hess(x0, v))
assert_array_equal(c.keep_feasible, [True, False, True, True])
def test_concatenation():
rng = np.random.RandomState(0)
n = 4
x0 = rng.rand(n)
f1 = x0
J1 = np.eye(n)
lb1 = [-1, -np.inf, -2, 3]
ub1 = [1, np.inf, np.inf, 3]
bounds = Bounds(lb1, ub1, [False, False, True, False])
fun, jac, hess = create_quadratic_function(n, 5, rng)
f2 = fun(x0)
J2 = jac(x0)
lb2 = [-10, 3, -np.inf, -np.inf, -5]
ub2 = [10, 3, np.inf, 5, np.inf]
nonlinear = NonlinearConstraint(
fun, lb2, ub2, jac, hess, [True, False, False, True, False])
for sparse_jacobian in [False, True]:
bounds_prepared = PreparedConstraint(bounds, x0, sparse_jacobian)
nonlinear_prepared = PreparedConstraint(nonlinear, x0, sparse_jacobian)
c1 = CanonicalConstraint.from_PreparedConstraint(bounds_prepared)
c2 = CanonicalConstraint.from_PreparedConstraint(nonlinear_prepared)
c = CanonicalConstraint.concatenate([c1, c2], sparse_jacobian)
assert_equal(c.n_eq, 2)
assert_equal(c.n_ineq, 7)
c_eq, c_ineq = c.fun(x0)
assert_array_equal(c_eq, [f1[3] - lb1[3], f2[1] - lb2[1]])
assert_array_equal(c_ineq, [lb1[2] - f1[2], f1[0] - ub1[0],
lb1[0] - f1[0], f2[3] - ub2[3],
lb2[4] - f2[4], f2[0] - ub2[0],
lb2[0] - f2[0]])
J_eq, J_ineq = c.jac(x0)
if sparse_jacobian:
J_eq = J_eq.toarray()
J_ineq = J_ineq.toarray()
assert_array_equal(J_eq, np.vstack((J1[3], J2[1])))
assert_array_equal(J_ineq, np.vstack((-J1[2], J1[0], -J1[0], J2[3],
-J2[4], J2[0], -J2[0])))
v_eq = rng.rand(c.n_eq)
v_ineq = rng.rand(c.n_ineq)
v = np.zeros(5)
v[1] = v_eq[1]
v[3] = v_ineq[3]
v[4] = -v_ineq[4]
v[0] = v_ineq[5] - v_ineq[6]
H = c.hess(x0, v_eq, v_ineq).dot(np.eye(n))
assert_array_equal(H, hess(x0, v))
assert_array_equal(c.keep_feasible,
[True, False, False, True, False, True, True])
def test_empty():
x = np.array([1, 2, 3])
c = CanonicalConstraint.empty(3)
assert_equal(c.n_eq, 0)
assert_equal(c.n_ineq, 0)
c_eq, c_ineq = c.fun(x)
assert_array_equal(c_eq, [])
assert_array_equal(c_ineq, [])
J_eq, J_ineq = c.jac(x)
assert_array_equal(J_eq, np.empty((0, 3)))
assert_array_equal(J_ineq, np.empty((0, 3)))
H = c.hess(x, None, None).toarray()
assert_array_equal(H, np.zeros((3, 3)))
def test_initial_constraints_as_canonical():
# rng is only used to generate the coefficients of the quadratic
# function that is used by the nonlinear constraint.
rng = np.random.RandomState(0)
x0 = np.array([0.5, 0.4, 0.3, 0.2])
n = len(x0)
lb1 = [-1, -np.inf, -2, 3]
ub1 = [1, np.inf, np.inf, 3]
bounds = Bounds(lb1, ub1, [False, False, True, False])
fun, jac, hess = create_quadratic_function(n, 5, rng)
lb2 = [-10, 3, -np.inf, -np.inf, -5]
ub2 = [10, 3, np.inf, 5, np.inf]
nonlinear = NonlinearConstraint(
fun, lb2, ub2, jac, hess, [True, False, False, True, False])
for sparse_jacobian in [False, True]:
bounds_prepared = PreparedConstraint(bounds, x0, sparse_jacobian)
nonlinear_prepared = PreparedConstraint(nonlinear, x0, sparse_jacobian)
f1 = bounds_prepared.fun.f
J1 = bounds_prepared.fun.J
f2 = nonlinear_prepared.fun.f
J2 = nonlinear_prepared.fun.J
c_eq, c_ineq, J_eq, J_ineq = initial_constraints_as_canonical(
n, [bounds_prepared, nonlinear_prepared], sparse_jacobian)
assert_array_equal(c_eq, [f1[3] - lb1[3], f2[1] - lb2[1]])
assert_array_equal(c_ineq, [lb1[2] - f1[2], f1[0] - ub1[0],
lb1[0] - f1[0], f2[3] - ub2[3],
lb2[4] - f2[4], f2[0] - ub2[0],
lb2[0] - f2[0]])
if sparse_jacobian:
J1 = J1.toarray()
J2 = J2.toarray()
J_eq = J_eq.toarray()
J_ineq = J_ineq.toarray()
assert_array_equal(J_eq, np.vstack((J1[3], J2[1])))
assert_array_equal(J_ineq, np.vstack((-J1[2], J1[0], -J1[0], J2[3],
-J2[4], J2[0], -J2[0])))
def test_initial_constraints_as_canonical_empty():
n = 3
for sparse_jacobian in [False, True]:
c_eq, c_ineq, J_eq, J_ineq = initial_constraints_as_canonical(
n, [], sparse_jacobian)
assert_array_equal(c_eq, [])
assert_array_equal(c_ineq, [])
if sparse_jacobian:
J_eq = J_eq.toarray()
J_ineq = J_ineq.toarray()
assert_array_equal(J_eq, np.empty((0, n)))
assert_array_equal(J_ineq, np.empty((0, n)))
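The expected values in `test_bounds_cases` and `test_nonlinear_constraint` imply a particular canonical form: equalities are `c_eq = 0`, inequalities are `c_ineq <= 0`, and the inequality rows are ordered as upper-only, lower-only, then the upper and lower halves of two-sided intervals. The sketch below reproduces that ordering for plain Python lists. It is inferred from the test expectations only and is not SciPy's implementation; the function name `to_canonical` is hypothetical.

```python
import math


def to_canonical(values, lb, ub):
    """Split lb <= values <= ub into (c_eq, c_ineq) with c_ineq <= 0.

    Hypothetical helper: the ordering mirrors what the tests above expect.
    """
    c_eq, less, greater, interval_upper, interval_lower = [], [], [], [], []
    for v, lo, hi in zip(values, lb, ub):
        if lo == hi:
            c_eq.append(v - lo)              # equality: v - lo = 0
        elif lo == -math.inf and hi == math.inf:
            continue                         # unconstrained component
        elif lo == -math.inf:
            less.append(v - hi)              # v <= hi
        elif hi == math.inf:
            greater.append(lo - v)           # v >= lo
        else:
            interval_upper.append(v - hi)    # upper half of lo <= v <= hi
            interval_lower.append(lo - v)    # lower half of lo <= v <= hi
    return c_eq, less + greater + interval_upper + interval_lower


# Same data as "Test 4: interval constraint" in test_bounds_cases.
c_eq, c_ineq = to_canonical([0, 10, 8, 5],
                            [-1, -math.inf, 2, 3],
                            [1, math.inf, 10, 3])
print(c_eq, c_ineq)
```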
@@ -0,0 +1,214 @@
import numpy as np
import scipy.linalg
from scipy.sparse import csc_matrix
from scipy.optimize._trustregion_constr.projections \
import projections, orthogonality
from numpy.testing import (TestCase, assert_array_almost_equal,
assert_equal, assert_allclose)
try:
from sksparse.cholmod import cholesky_AAt # noqa: F401
sksparse_available = True
available_sparse_methods = ("NormalEquation", "AugmentedSystem")
except ImportError:
sksparse_available = False
available_sparse_methods = ("AugmentedSystem",)
available_dense_methods = ('QRFactorization', 'SVDFactorization')
class TestProjections(TestCase):
def test_nullspace_and_least_squares_sparse(self):
A_dense = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
[0, 8, 7, 0, 1, 5, 9, 0],
[1, 0, 0, 0, 0, 1, 2, 3]])
At_dense = A_dense.T
A = csc_matrix(A_dense)
test_points = ([1, 2, 3, 4, 5, 6, 7, 8],
[1, 10, 3, 0, 1, 6, 7, 8],
[1.12, 10, 0, 0, 100000, 6, 0.7, 8])
for method in available_sparse_methods:
Z, LS, _ = projections(A, method)
for z in test_points:
# Test if x is in the null_space
x = Z.matvec(z)
assert_array_almost_equal(A.dot(x), 0)
# Test orthogonality
assert_array_almost_equal(orthogonality(A, x), 0)
# Test if x is the least-squares solution
x = LS.matvec(z)
x2 = scipy.linalg.lstsq(At_dense, z)[0]
assert_array_almost_equal(x, x2)
def test_iterative_refinements_sparse(self):
A_dense = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
[0, 8, 7, 0, 1, 5, 9, 0],
[1, 0, 0, 0, 0, 1, 2, 3]])
A = csc_matrix(A_dense)
test_points = ([1, 2, 3, 4, 5, 6, 7, 8],
[1, 10, 3, 0, 1, 6, 7, 8],
[1.12, 10, 0, 0, 100000, 6, 0.7, 8],
[1, 0, 0, 0, 0, 1, 2, 3+1e-10])
for method in available_sparse_methods:
Z, LS, _ = projections(A, method, orth_tol=1e-18, max_refin=100)
for z in test_points:
# Test if x is in the null_space
x = Z.matvec(z)
atol = 1e-13 * abs(x).max()
assert_allclose(A.dot(x), 0, atol=atol)
# Test orthogonality
assert_allclose(orthogonality(A, x), 0, atol=1e-13)
def test_rowspace_sparse(self):
A_dense = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
[0, 8, 7, 0, 1, 5, 9, 0],
[1, 0, 0, 0, 0, 1, 2, 3]])
A = csc_matrix(A_dense)
test_points = ([1, 2, 3],
[1, 10, 3],
[1.12, 10, 0])
for method in available_sparse_methods:
_, _, Y = projections(A, method)
for z in test_points:
# Test if x is solution of A x = z
x = Y.matvec(z)
assert_array_almost_equal(A.dot(x), z)
# Test if x is in the row space of A
A_ext = np.vstack((A_dense, x))
assert_equal(np.linalg.matrix_rank(A_dense),
np.linalg.matrix_rank(A_ext))
def test_nullspace_and_least_squares_dense(self):
A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
[0, 8, 7, 0, 1, 5, 9, 0],
[1, 0, 0, 0, 0, 1, 2, 3]])
At = A.T
test_points = ([1, 2, 3, 4, 5, 6, 7, 8],
[1, 10, 3, 0, 1, 6, 7, 8],
[1.12, 10, 0, 0, 100000, 6, 0.7, 8])
for method in available_dense_methods:
Z, LS, _ = projections(A, method)
for z in test_points:
# Test if x is in the null_space
x = Z.matvec(z)
assert_array_almost_equal(A.dot(x), 0)
# Test orthogonality
assert_array_almost_equal(orthogonality(A, x), 0)
# Test if x is the least-squares solution
x = LS.matvec(z)
x2 = scipy.linalg.lstsq(At, z)[0]
assert_array_almost_equal(x, x2)
def test_compare_dense_and_sparse(self):
D = np.diag(range(1, 101))
A = np.hstack([D, D, D, D])
A_sparse = csc_matrix(A)
np.random.seed(0)
Z, LS, Y = projections(A)
Z_sparse, LS_sparse, Y_sparse = projections(A_sparse)
for k in range(20):
z = np.random.normal(size=(400,))
assert_array_almost_equal(Z.dot(z), Z_sparse.dot(z))
assert_array_almost_equal(LS.dot(z), LS_sparse.dot(z))
x = np.random.normal(size=(100,))
assert_array_almost_equal(Y.dot(x), Y_sparse.dot(x))
def test_compare_dense_and_sparse2(self):
D1 = np.diag([-1.7, 1, 0.5])
D2 = np.diag([1, -0.6, -0.3])
D3 = np.diag([-0.3, -1.5, 2])
A = np.hstack([D1, D2, D3])
A_sparse = csc_matrix(A)
np.random.seed(0)
Z, LS, Y = projections(A)
Z_sparse, LS_sparse, Y_sparse = projections(A_sparse)
for k in range(1):
z = np.random.normal(size=(9,))
assert_array_almost_equal(Z.dot(z), Z_sparse.dot(z))
assert_array_almost_equal(LS.dot(z), LS_sparse.dot(z))
x = np.random.normal(size=(3,))
assert_array_almost_equal(Y.dot(x), Y_sparse.dot(x))
def test_iterative_refinements_dense(self):
A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
[0, 8, 7, 0, 1, 5, 9, 0],
[1, 0, 0, 0, 0, 1, 2, 3]])
test_points = ([1, 2, 3, 4, 5, 6, 7, 8],
[1, 10, 3, 0, 1, 6, 7, 8],
[1, 0, 0, 0, 0, 1, 2, 3+1e-10])
for method in available_dense_methods:
Z, LS, _ = projections(A, method, orth_tol=1e-18, max_refin=10)
for z in test_points:
# Test if x is in the null_space
x = Z.matvec(z)
assert_allclose(A.dot(x), 0, rtol=0, atol=2.5e-14)
# Test orthogonality
assert_allclose(orthogonality(A, x), 0, rtol=0, atol=5e-16)
def test_rowspace_dense(self):
A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
[0, 8, 7, 0, 1, 5, 9, 0],
[1, 0, 0, 0, 0, 1, 2, 3]])
test_points = ([1, 2, 3],
[1, 10, 3],
[1.12, 10, 0])
for method in available_dense_methods:
_, _, Y = projections(A, method)
for z in test_points:
# Test if x is solution of A x = z
x = Y.matvec(z)
assert_array_almost_equal(A.dot(x), z)
# Test if x is in the row space of A
A_ext = np.vstack((A, x))
assert_equal(np.linalg.matrix_rank(A),
np.linalg.matrix_rank(A_ext))
class TestOrthogonality(TestCase):
def test_dense_matrix(self):
A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
[0, 8, 7, 0, 1, 5, 9, 0],
[1, 0, 0, 0, 0, 1, 2, 3]])
test_vectors = ([-1.98931144, -1.56363389,
-0.84115584, 2.2864762,
5.599141, 0.09286976,
1.37040802, -0.28145812],
[697.92794044, -4091.65114008,
-3327.42316335, 836.86906951,
99434.98929065, -1285.37653682,
-4109.21503806, 2935.29289083])
test_expected_orth = (0, 0)
for i in range(len(test_vectors)):
x = test_vectors[i]
orth = test_expected_orth[i]
assert_array_almost_equal(orthogonality(A, x), orth)
def test_sparse_matrix(self):
A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
[0, 8, 7, 0, 1, 5, 9, 0],
[1, 0, 0, 0, 0, 1, 2, 3]])
A = csc_matrix(A)
test_vectors = ([-1.98931144, -1.56363389,
-0.84115584, 2.2864762,
5.599141, 0.09286976,
1.37040802, -0.28145812],
[697.92794044, -4091.65114008,
-3327.42316335, 836.86906951,
99434.98929065, -1285.37653682,
-4109.21503806, 2935.29289083])
test_expected_orth = (0, 0)
for i in range(len(test_vectors)):
x = test_vectors[i]
orth = test_expected_orth[i]
assert_array_almost_equal(orthogonality(A, x), orth)
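The projection tests above check that `Z.matvec(z)` lands in the null space of `A`. As a hedged illustration of what that property means, the sketch below computes the orthogonal projection onto the null space with the textbook normal-equation formula `z - A^T (A A^T)^{-1} A z`, using dense NumPy only. It assumes `A` has full row rank and is not the code path used by `projections`; the name `null_space_projection` is hypothetical.

```python
import numpy as np

# Same matrix as in the tests above.
A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
              [0, 8, 7, 0, 1, 5, 9, 0],
              [1, 0, 0, 0, 0, 1, 2, 3]], dtype=float)


def null_space_projection(A, z):
    """Project z onto {x : A x = 0}, assuming A has full row rank."""
    z = np.asarray(z, dtype=float)
    # Solve (A A^T) w = A z, then remove the row-space component A^T w.
    w = np.linalg.solve(A @ A.T, A @ z)
    return z - A.T @ w


z = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
x = null_space_projection(A, z)
print(np.abs(A @ x).max())  # close to zero, up to rounding error
```

Projecting `x` a second time should leave it unchanged, which is a quick sanity check that the operator is a projection.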
@@ -0,0 +1,645 @@
import numpy as np
from scipy.sparse import csc_matrix
from scipy.optimize._trustregion_constr.qp_subproblem \
import (eqp_kktfact,
projected_cg,
box_intersections,
sphere_intersections,
box_sphere_intersections,
modified_dogleg)
from scipy.optimize._trustregion_constr.projections \
import projections
from numpy.testing import TestCase, assert_array_almost_equal, assert_equal
import pytest
class TestEQPDirectFactorization(TestCase):
# From Example 16.2 Nocedal/Wright "Numerical
# Optimization" p.452.
def test_nocedal_example(self):
H = csc_matrix([[6, 2, 1],
[2, 5, 2],
[1, 2, 4]])
A = csc_matrix([[1, 0, 1],
[0, 1, 1]])
c = np.array([-8, -3, -3])
b = -np.array([3, 0])
x, lagrange_multipliers = eqp_kktfact(H, c, A, b)
assert_array_almost_equal(x, [2, -1, 1])
assert_array_almost_equal(lagrange_multipliers, [3, -2])
class TestSphericalBoundariesIntersections(TestCase):
def test_2d_sphere_constraints(self):
# Interior initial point
ta, tb, intersect = sphere_intersections([0, 0],
[1, 0], 0.5)
assert_array_almost_equal([ta, tb], [0, 0.5])
assert_equal(intersect, True)
# No intersection between line and circle
ta, tb, intersect = sphere_intersections([2, 0],
[0, 1], 1)
assert_equal(intersect, False)
# Outside initial point pointing toward outside the circle
ta, tb, intersect = sphere_intersections([2, 0],
[1, 0], 1)
assert_equal(intersect, False)
# Outside initial point pointing toward inside the circle
ta, tb, intersect = sphere_intersections([2, 0],
[-1, 0], 1.5)
assert_array_almost_equal([ta, tb], [0.5, 1])
assert_equal(intersect, True)
# Initial point on the boundary
ta, tb, intersect = sphere_intersections([2, 0],
[1, 0], 2)
assert_array_almost_equal([ta, tb], [0, 0])
assert_equal(intersect, True)
def test_2d_sphere_constraints_line_intersections(self):
# Interior initial point
ta, tb, intersect = sphere_intersections([0, 0],
[1, 0], 0.5,
entire_line=True)
assert_array_almost_equal([ta, tb], [-0.5, 0.5])
assert_equal(intersect, True)
# No intersection between line and circle
ta, tb, intersect = sphere_intersections([2, 0],
[0, 1], 1,
entire_line=True)
assert_equal(intersect, False)
# Outside initial point pointing toward outside the circle
ta, tb, intersect = sphere_intersections([2, 0],
[1, 0], 1,
entire_line=True)
assert_array_almost_equal([ta, tb], [-3, -1])
assert_equal(intersect, True)
# Outside initial point pointing toward inside the circle
ta, tb, intersect = sphere_intersections([2, 0],
[-1, 0], 1.5,
entire_line=True)
assert_array_almost_equal([ta, tb], [0.5, 3.5])
assert_equal(intersect, True)
# Initial point on the boundary
ta, tb, intersect = sphere_intersections([2, 0],
[1, 0], 2,
entire_line=True)
assert_array_almost_equal([ta, tb], [-4, 0])
assert_equal(intersect, True)
class TestBoxBoundariesIntersections(TestCase):
def test_2d_box_constraints(self):
# Box constraint in the direction of vector d
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[1, 1], [3, 3])
assert_array_almost_equal([ta, tb], [0.5, 1])
assert_equal(intersect, True)
# Negative direction
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[1, -3], [3, -1])
assert_equal(intersect, False)
# Some constraints are absent (set to +/- inf)
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[-np.inf, 1],
[np.inf, np.inf])
assert_array_almost_equal([ta, tb], [0.5, 1])
assert_equal(intersect, True)
# Intersect on the face of the box
ta, tb, intersect = box_intersections([1, 0], [0, 1],
[1, 1], [3, 3])
assert_array_almost_equal([ta, tb], [1, 1])
assert_equal(intersect, True)
# Interior initial point
ta, tb, intersect = box_intersections([0, 0], [4, 4],
[-2, -3], [3, 2])
assert_array_almost_equal([ta, tb], [0, 0.5])
assert_equal(intersect, True)
# No intersection between line and box constraints
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[-3, -3], [-1, -1])
assert_equal(intersect, False)
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[-3, 3], [-1, 1])
assert_equal(intersect, False)
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[-3, -np.inf],
[-1, np.inf])
assert_equal(intersect, False)
ta, tb, intersect = box_intersections([0, 0], [1, 100],
[1, 1], [3, 3])
assert_equal(intersect, False)
ta, tb, intersect = box_intersections([0.99, 0], [0, 2],
[1, 1], [3, 3])
assert_equal(intersect, False)
# Initial point on the boundary
ta, tb, intersect = box_intersections([2, 2], [0, 1],
[-2, -2], [2, 2])
assert_array_almost_equal([ta, tb], [0, 0])
assert_equal(intersect, True)
def test_2d_box_constraints_entire_line(self):
# Box constraint in the direction of vector d
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[1, 1], [3, 3],
entire_line=True)
assert_array_almost_equal([ta, tb], [0.5, 1.5])
assert_equal(intersect, True)
# Negative direction
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[1, -3], [3, -1],
entire_line=True)
assert_array_almost_equal([ta, tb], [-1.5, -0.5])
assert_equal(intersect, True)
# Some constraints are absent (set to +/- inf)
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[-np.inf, 1],
[np.inf, np.inf],
entire_line=True)
assert_array_almost_equal([ta, tb], [0.5, np.inf])
assert_equal(intersect, True)
# Intersect on the face of the box
ta, tb, intersect = box_intersections([1, 0], [0, 1],
[1, 1], [3, 3],
entire_line=True)
assert_array_almost_equal([ta, tb], [1, 3])
assert_equal(intersect, True)
# Interior initial point
ta, tb, intersect = box_intersections([0, 0], [4, 4],
[-2, -3], [3, 2],
entire_line=True)
assert_array_almost_equal([ta, tb], [-0.5, 0.5])
assert_equal(intersect, True)
# No intersection between line and box constraints
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[-3, -3], [-1, -1],
entire_line=True)
assert_equal(intersect, False)
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[-3, 3], [-1, 1],
entire_line=True)
assert_equal(intersect, False)
ta, tb, intersect = box_intersections([2, 0], [0, 2],
[-3, -np.inf],
[-1, np.inf],
entire_line=True)
assert_equal(intersect, False)
ta, tb, intersect = box_intersections([0, 0], [1, 100],
[1, 1], [3, 3],
entire_line=True)
assert_equal(intersect, False)
ta, tb, intersect = box_intersections([0.99, 0], [0, 2],
[1, 1], [3, 3],
entire_line=True)
assert_equal(intersect, False)
# Initial point on the boundary
ta, tb, intersect = box_intersections([2, 2], [0, 1],
[-2, -2], [2, 2],
entire_line=True)
assert_array_almost_equal([ta, tb], [-4, 0])
assert_equal(intersect, True)
def test_3d_box_constraints(self):
# Simple case
ta, tb, intersect = box_intersections([1, 1, 0], [0, 0, 1],
[1, 1, 1], [3, 3, 3])
assert_array_almost_equal([ta, tb], [1, 1])
assert_equal(intersect, True)
# Negative direction
ta, tb, intersect = box_intersections([1, 1, 0], [0, 0, -1],
[1, 1, 1], [3, 3, 3])
assert_equal(intersect, False)
# Interior point
ta, tb, intersect = box_intersections([2, 2, 2], [0, -1, 1],
[1, 1, 1], [3, 3, 3])
assert_array_almost_equal([ta, tb], [0, 1])
assert_equal(intersect, True)
def test_3d_box_constraints_entire_line(self):
# Simple case
ta, tb, intersect = box_intersections([1, 1, 0], [0, 0, 1],
[1, 1, 1], [3, 3, 3],
entire_line=True)
assert_array_almost_equal([ta, tb], [1, 3])
assert_equal(intersect, True)
# Negative direction
ta, tb, intersect = box_intersections([1, 1, 0], [0, 0, -1],
[1, 1, 1], [3, 3, 3],
entire_line=True)
assert_array_almost_equal([ta, tb], [-3, -1])
assert_equal(intersect, True)
# Interior point
ta, tb, intersect = box_intersections([2, 2, 2], [0, -1, 1],
[1, 1, 1], [3, 3, 3],
entire_line=True)
assert_array_almost_equal([ta, tb], [-1, 1])
assert_equal(intersect, True)
class TestBoxSphereBoundariesIntersections(TestCase):
def test_2d_box_constraints(self):
# Both constraints are active
ta, tb, intersect = box_sphere_intersections([1, 1], [-2, 2],
[-1, -2], [1, 2], 2,
entire_line=False)
assert_array_almost_equal([ta, tb], [0, 0.5])
assert_equal(intersect, True)
# None of the constraints are active
ta, tb, intersect = box_sphere_intersections([1, 1], [-1, 1],
[-1, -3], [1, 3], 10,
entire_line=False)
assert_array_almost_equal([ta, tb], [0, 1])
assert_equal(intersect, True)
# Box constraints are active
ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
[-1, -3], [1, 3], 10,
entire_line=False)
assert_array_almost_equal([ta, tb], [0, 0.5])
assert_equal(intersect, True)
# Spherical constraints are active
ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
[-1, -3], [1, 3], 2,
entire_line=False)
assert_array_almost_equal([ta, tb], [0, 0.25])
assert_equal(intersect, True)
# Infeasible problems
ta, tb, intersect = box_sphere_intersections([2, 2], [-4, 4],
[-1, -3], [1, 3], 2,
entire_line=False)
assert_equal(intersect, False)
ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
[2, 4], [2, 4], 2,
entire_line=False)
assert_equal(intersect, False)
def test_2d_box_constraints_entire_line(self):
# Both constraints are active
ta, tb, intersect = box_sphere_intersections([1, 1], [-2, 2],
[-1, -2], [1, 2], 2,
entire_line=True)
assert_array_almost_equal([ta, tb], [0, 0.5])
assert_equal(intersect, True)
# None of the constraints are active
ta, tb, intersect = box_sphere_intersections([1, 1], [-1, 1],
[-1, -3], [1, 3], 10,
entire_line=True)
assert_array_almost_equal([ta, tb], [0, 2])
assert_equal(intersect, True)
# Box constraints are active
ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
[-1, -3], [1, 3], 10,
entire_line=True)
assert_array_almost_equal([ta, tb], [0, 0.5])
assert_equal(intersect, True)
# Spherical constraints are active
ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
[-1, -3], [1, 3], 2,
entire_line=True)
assert_array_almost_equal([ta, tb], [0, 0.25])
assert_equal(intersect, True)
# Infeasible problems
ta, tb, intersect = box_sphere_intersections([2, 2], [-4, 4],
[-1, -3], [1, 3], 2,
entire_line=True)
assert_equal(intersect, False)
ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
[2, 4], [2, 4], 2,
entire_line=True)
assert_equal(intersect, False)
class TestModifiedDogleg(TestCase):
def test_cauchypoint_equalsto_newtonpoint(self):
A = np.array([[1, 8]])
b = np.array([-16])
_, _, Y = projections(A)
newton_point = np.array([0.24615385, 1.96923077])
# Newton point inside boundaries
x = modified_dogleg(A, Y, b, 2, [-np.inf, -np.inf], [np.inf, np.inf])
assert_array_almost_equal(x, newton_point)
# Spherical constraint active
x = modified_dogleg(A, Y, b, 1, [-np.inf, -np.inf], [np.inf, np.inf])
assert_array_almost_equal(x, newton_point/np.linalg.norm(newton_point))
# Box constraints active
x = modified_dogleg(A, Y, b, 2, [-np.inf, -np.inf], [0.1, np.inf])
assert_array_almost_equal(x, (newton_point/newton_point[0]) * 0.1)
def test_3d_example(self):
A = np.array([[1, 8, 1],
[4, 2, 2]])
b = np.array([-16, 2])
Z, LS, Y = projections(A)
newton_point = np.array([-1.37090909, 2.23272727, -0.49090909])
cauchy_point = np.array([0.11165723, 1.73068711, 0.16748585])
origin = np.zeros_like(newton_point)
# newton_point inside boundaries
x = modified_dogleg(A, Y, b, 3, [-np.inf, -np.inf, -np.inf],
[np.inf, np.inf, np.inf])
assert_array_almost_equal(x, newton_point)
# line between cauchy_point and newton_point contains best point
# (spherical constraint is active).
x = modified_dogleg(A, Y, b, 2, [-np.inf, -np.inf, -np.inf],
[np.inf, np.inf, np.inf])
z = cauchy_point
d = newton_point-cauchy_point
t = ((x-z)/(d))
assert_array_almost_equal(t, np.full(3, 0.40807330))
assert_array_almost_equal(np.linalg.norm(x), 2)
# line between cauchy_point and newton_point contains best point
# (box constraint is active).
x = modified_dogleg(A, Y, b, 5, [-1, -np.inf, -np.inf],
[np.inf, np.inf, np.inf])
z = cauchy_point
d = newton_point-cauchy_point
t = ((x-z)/(d))
assert_array_almost_equal(t, np.full(3, 0.7498195))
assert_array_almost_equal(x[0], -1)
# line between origin and cauchy_point contains best point
# (spherical constraint is active).
x = modified_dogleg(A, Y, b, 1, [-np.inf, -np.inf, -np.inf],
[np.inf, np.inf, np.inf])
z = origin
d = cauchy_point
t = ((x-z)/(d))
assert_array_almost_equal(t, np.full(3, 0.573936265))
assert_array_almost_equal(np.linalg.norm(x), 1)
# line between origin and newton_point contains best point
# (box constraint is active).
x = modified_dogleg(A, Y, b, 2, [-np.inf, -np.inf, -np.inf],
[np.inf, 1, np.inf])
z = origin
d = newton_point
t = ((x-z)/(d))
assert_array_almost_equal(t, np.full(3, 0.4478827364))
assert_array_almost_equal(x[1], 1)
class TestProjectCG(TestCase):
# From Example 16.2 Nocedal/Wright "Numerical
# Optimization" p.452.
def test_nocedal_example(self):
H = csc_matrix([[6, 2, 1],
[2, 5, 2],
[1, 2, 4]])
A = csc_matrix([[1, 0, 1],
[0, 1, 1]])
c = np.array([-8, -3, -3])
b = -np.array([3, 0])
Z, _, Y = projections(A)
x, info = projected_cg(H, c, Z, Y, b)
assert_equal(info["stop_cond"], 4)
assert_equal(info["hits_boundary"], False)
assert_array_almost_equal(x, [2, -1, 1])
def test_compare_with_direct_fact(self):
H = csc_matrix([[6, 2, 1, 3],
[2, 5, 2, 4],
[1, 2, 4, 5],
[3, 4, 5, 7]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 1, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
Z, _, Y = projections(A)
x, info = projected_cg(H, c, Z, Y, b, tol=0)
x_kkt, _ = eqp_kktfact(H, c, A, b)
assert_equal(info["stop_cond"], 1)
assert_equal(info["hits_boundary"], False)
assert_array_almost_equal(x, x_kkt)
def test_trust_region_infeasible(self):
H = csc_matrix([[6, 2, 1, 3],
[2, 5, 2, 4],
[1, 2, 4, 5],
[3, 4, 5, 7]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 1, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
trust_radius = 1
Z, _, Y = projections(A)
with pytest.raises(ValueError):
projected_cg(H, c, Z, Y, b, trust_radius=trust_radius)
def test_trust_region_barely_feasible(self):
H = csc_matrix([[6, 2, 1, 3],
[2, 5, 2, 4],
[1, 2, 4, 5],
[3, 4, 5, 7]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 1, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
trust_radius = 2.32379000772445021283
Z, _, Y = projections(A)
x, info = projected_cg(H, c, Z, Y, b,
tol=0,
trust_radius=trust_radius)
assert_equal(info["stop_cond"], 2)
assert_equal(info["hits_boundary"], True)
assert_array_almost_equal(np.linalg.norm(x), trust_radius)
assert_array_almost_equal(x, -Y.dot(b))
def test_hits_boundary(self):
H = csc_matrix([[6, 2, 1, 3],
[2, 5, 2, 4],
[1, 2, 4, 5],
[3, 4, 5, 7]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 1, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
trust_radius = 3
Z, _, Y = projections(A)
x, info = projected_cg(H, c, Z, Y, b,
tol=0,
trust_radius=trust_radius)
assert_equal(info["stop_cond"], 2)
assert_equal(info["hits_boundary"], True)
assert_array_almost_equal(np.linalg.norm(x), trust_radius)
def test_negative_curvature_unconstrained(self):
H = csc_matrix([[1, 2, 1, 3],
[2, 0, 2, 4],
[1, 2, 0, 2],
[3, 4, 2, 0]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 0, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
Z, _, Y = projections(A)
with pytest.raises(ValueError):
projected_cg(H, c, Z, Y, b, tol=0)
def test_negative_curvature(self):
H = csc_matrix([[1, 2, 1, 3],
[2, 0, 2, 4],
[1, 2, 0, 2],
[3, 4, 2, 0]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 0, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
Z, _, Y = projections(A)
trust_radius = 1000
x, info = projected_cg(H, c, Z, Y, b,
tol=0,
trust_radius=trust_radius)
assert_equal(info["stop_cond"], 3)
assert_equal(info["hits_boundary"], True)
assert_array_almost_equal(np.linalg.norm(x), trust_radius)
# The box constraints are inactive at the solution but
# are active during the iterations.
def test_inactive_box_constraints(self):
H = csc_matrix([[6, 2, 1, 3],
[2, 5, 2, 4],
[1, 2, 4, 5],
[3, 4, 5, 7]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 1, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
Z, _, Y = projections(A)
x, info = projected_cg(H, c, Z, Y, b,
tol=0,
lb=[0.5, -np.inf,
-np.inf, -np.inf],
return_all=True)
x_kkt, _ = eqp_kktfact(H, c, A, b)
assert_equal(info["stop_cond"], 1)
assert_equal(info["hits_boundary"], False)
assert_array_almost_equal(x, x_kkt)
# The box constraints are active and the termination is
# by maximum iterations (infeasible iteration).
def test_active_box_constraints_maximum_iterations_reached(self):
H = csc_matrix([[6, 2, 1, 3],
[2, 5, 2, 4],
[1, 2, 4, 5],
[3, 4, 5, 7]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 1, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
Z, _, Y = projections(A)
x, info = projected_cg(H, c, Z, Y, b,
tol=0,
lb=[0.8, -np.inf,
-np.inf, -np.inf],
return_all=True)
assert_equal(info["stop_cond"], 1)
assert_equal(info["hits_boundary"], True)
assert_array_almost_equal(A.dot(x), -b)
assert_array_almost_equal(x[0], 0.8)
# The box constraints are active and the termination is
# because it hits boundary (without infeasible iteration).
def test_active_box_constraints_hits_boundaries(self):
H = csc_matrix([[6, 2, 1, 3],
[2, 5, 2, 4],
[1, 2, 4, 5],
[3, 4, 5, 7]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 1, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
trust_radius = 3
Z, _, Y = projections(A)
x, info = projected_cg(H, c, Z, Y, b,
tol=0,
ub=[np.inf, np.inf, 1.6, np.inf],
trust_radius=trust_radius,
return_all=True)
assert_equal(info["stop_cond"], 2)
assert_equal(info["hits_boundary"], True)
assert_array_almost_equal(x[2], 1.6)
# The box constraints are active and the termination is
# because it hits boundary (infeasible iteration).
def test_active_box_constraints_hits_boundaries_infeasible_iter(self):
H = csc_matrix([[6, 2, 1, 3],
[2, 5, 2, 4],
[1, 2, 4, 5],
[3, 4, 5, 7]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 1, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
trust_radius = 4
Z, _, Y = projections(A)
x, info = projected_cg(H, c, Z, Y, b,
tol=0,
ub=[np.inf, 0.1, np.inf, np.inf],
trust_radius=trust_radius,
return_all=True)
assert_equal(info["stop_cond"], 2)
assert_equal(info["hits_boundary"], True)
assert_array_almost_equal(x[1], 0.1)
# The box constraints are active and the termination is
# due to negative curvature (no infeasible iteration).
def test_active_box_constraints_negative_curvature(self):
H = csc_matrix([[1, 2, 1, 3],
[2, 0, 2, 4],
[1, 2, 0, 2],
[3, 4, 2, 0]])
A = csc_matrix([[1, 0, 1, 0],
[0, 1, 0, 1]])
c = np.array([-2, -3, -3, 1])
b = -np.array([3, 0])
Z, _, Y = projections(A)
trust_radius = 1000
x, info = projected_cg(H, c, Z, Y, b,
tol=0,
ub=[np.inf, np.inf, 100, np.inf],
trust_radius=trust_radius)
assert_equal(info["stop_cond"], 3)
assert_equal(info["hits_boundary"], True)
assert_array_almost_equal(x[2], 100)
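The sphere intersection tests above fix the expected behaviour for a line `z + t*d` and a ball of radius `trust_radius`: with `entire_line=True` both roots of the quadratic are returned, otherwise the interval is clipped to the segment `0 <= t <= 1`. The sketch below is a pure-Python reading of those expectations, not SciPy's implementation. The name `line_sphere_intersections` is hypothetical, and raising on a zero direction is an assumption made for this sketch.

```python
import math


def line_sphere_intersections(z, d, trust_radius, entire_line=False):
    """Return (ta, tb, intersect) for ||z + t*d|| <= trust_radius.

    Hypothetical helper mirroring the test expectations above.
    """
    a = sum(di * di for di in d)
    if a == 0:
        # Assumption for this sketch: a zero direction is rejected.
        raise ValueError("direction d must be nonzero")
    b = 2 * sum(zi * di for zi, di in zip(z, d))
    c = sum(zi * zi for zi in z) - trust_radius ** 2
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return 0.0, 0.0, False          # the line misses the sphere
    sqrt_disc = math.sqrt(discriminant)
    ta = (-b - sqrt_disc) / (2 * a)
    tb = (-b + sqrt_disc) / (2 * a)
    if entire_line:
        return ta, tb, True
    if tb < 0 or ta > 1:
        return 0.0, 0.0, False          # intersection lies off the segment
    return max(0.0, ta), min(1.0, tb), True


# Outside initial point pointing toward the inside of the circle.
ta, tb, intersect = line_sphere_intersections([2, 0], [-1, 0], 1.5)
print(ta, tb, intersect)
```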
@@ -0,0 +1,32 @@
import numpy as np
from scipy.optimize import minimize, Bounds
def test_gh10880():
# checks that verbose reporting works with trust-constr for
# bound-constrained problems
bnds = Bounds(1, 2)
opts = {'maxiter': 1000, 'verbose': 2}
minimize(lambda x: x**2, x0=2., method='trust-constr',
bounds=bnds, options=opts)
opts = {'maxiter': 1000, 'verbose': 3}
minimize(lambda x: x**2, x0=2., method='trust-constr',
bounds=bnds, options=opts)
def test_gh12922():
# checks that verbose reporting works with trust-constr for
# general constraints
def objective(x):
return np.array([(np.sum((x+1)**4))])
cons = {'type': 'ineq', 'fun': lambda x: -x[0]**2}
n = 25
x0 = np.linspace(-5, 5, n)
opts = {'maxiter': 1000, 'verbose': 2}
minimize(objective, x0=x0, method='trust-constr',
constraints=cons, options=opts)
opts = {'maxiter': 1000, 'verbose': 3}
minimize(objective, x0=x0, method='trust-constr',
constraints=cons, options=opts)
@@ -0,0 +1,346 @@
"""Trust-region interior point method.
References
----------
.. [1] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal.
"An interior point algorithm for large-scale nonlinear
programming." SIAM Journal on Optimization 9.4 (1999): 877-900.
.. [2] Byrd, Richard H., Guanghui Liu, and Jorge Nocedal.
"On the local behavior of an interior point method for
nonlinear programming." Numerical analysis 1997 (1997): 37-56.
.. [3] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
Second Edition (2006).
"""
import scipy.sparse as sps
import numpy as np
from .equality_constrained_sqp import equality_constrained_sqp
from scipy.sparse.linalg import LinearOperator
__all__ = ['tr_interior_point']
class BarrierSubproblem:
"""
Barrier optimization problem:
minimize fun(x) - barrier_parameter*sum(log(s))
subject to: constr_eq(x) = 0
constr_ineq(x) + s = 0
"""
def __init__(self, x0, s0, fun, grad, lagr_hess, n_vars, n_ineq, n_eq,
constr, jac, barrier_parameter, tolerance,
enforce_feasibility, global_stop_criteria,
xtol, fun0, grad0, constr_ineq0, jac_ineq0, constr_eq0,
jac_eq0):
# Store parameters
self.n_vars = n_vars
self.x0 = x0
self.s0 = s0
self.fun = fun
self.grad = grad
self.lagr_hess = lagr_hess
self.constr = constr
self.jac = jac
self.barrier_parameter = barrier_parameter
self.tolerance = tolerance
self.n_eq = n_eq
self.n_ineq = n_ineq
self.enforce_feasibility = enforce_feasibility
self.global_stop_criteria = global_stop_criteria
self.xtol = xtol
self.fun0 = self._compute_function(fun0, constr_ineq0, s0)
self.grad0 = self._compute_gradient(grad0)
self.constr0 = self._compute_constr(constr_ineq0, constr_eq0, s0)
self.jac0 = self._compute_jacobian(jac_eq0, jac_ineq0, s0)
self.terminate = False
def update(self, barrier_parameter, tolerance):
self.barrier_parameter = barrier_parameter
self.tolerance = tolerance
def get_slack(self, z):
return z[self.n_vars:self.n_vars+self.n_ineq]
def get_variables(self, z):
return z[:self.n_vars]
def function_and_constraints(self, z):
"""Returns barrier function and constraints at given point.
For z = [x, s], returns barrier function:
function(z) = fun(x) - barrier_parameter*sum(log(s))
and barrier constraints:
constraints(z) = [ constr_eq(x) ]
[ constr_ineq(x) + s ]
"""
# Get variables and slack variables
x = self.get_variables(z)
s = self.get_slack(z)
# Compute function and constraints
f = self.fun(x)
c_eq, c_ineq = self.constr(x)
# Return objective function and constraints
return (self._compute_function(f, c_ineq, s),
self._compute_constr(c_ineq, c_eq, s))
def _compute_function(self, f, c_ineq, s):
# Use technique from Nocedal and Wright book, ref [3]_, p.576,
# to guarantee constraints from `enforce_feasibility`
# stay feasible along iterations.
s[self.enforce_feasibility] = -c_ineq[self.enforce_feasibility]
log_s = [np.log(s_i) if s_i > 0 else -np.inf for s_i in s]
# Compute barrier objective function
return f - self.barrier_parameter*np.sum(log_s)
def _compute_constr(self, c_ineq, c_eq, s):
# Compute barrier constraint
return np.hstack((c_eq,
c_ineq + s))
def scaling(self, z):
"""Returns scaling vector.
Given by:
scaling = [ones(n_vars), s]
"""
s = self.get_slack(z)
diag_elements = np.hstack((np.ones(self.n_vars), s))
# Diagonal matrix
def matvec(vec):
return diag_elements*vec
return LinearOperator((self.n_vars+self.n_ineq,
self.n_vars+self.n_ineq),
matvec)
def gradient_and_jacobian(self, z):
"""Returns scaled gradient and Jacobian.
Return scaled gradient:
gradient = [ grad(x) ]
[ -barrier_parameter*ones(n_ineq) ]
and scaled Jacobian matrix:
jacobian = [ jac_eq(x) 0 ]
[ jac_ineq(x) S ]
Both of them scaled by the previously defined scaling factor.
"""
# Get variables and slack variables
x = self.get_variables(z)
s = self.get_slack(z)
# Compute first derivatives
g = self.grad(x)
J_eq, J_ineq = self.jac(x)
# Return gradient and Jacobian
return (self._compute_gradient(g),
self._compute_jacobian(J_eq, J_ineq, s))
def _compute_gradient(self, g):
return np.hstack((g, -self.barrier_parameter*np.ones(self.n_ineq)))
def _compute_jacobian(self, J_eq, J_ineq, s):
if self.n_ineq == 0:
return J_eq
else:
if sps.issparse(J_eq) or sps.issparse(J_ineq):
# It is expected that J_eq and J_ineq
# are already `csr_matrix` because of
# the way ``BoxConstraint``, ``NonlinearConstraint``
# and ``LinearConstraint`` are defined.
J_eq = sps.csr_matrix(J_eq)
J_ineq = sps.csr_matrix(J_ineq)
return self._assemble_sparse_jacobian(J_eq, J_ineq, s)
else:
S = np.diag(s)
zeros = np.zeros((self.n_eq, self.n_ineq))
# Convert to matrix
if sps.issparse(J_ineq):
J_ineq = J_ineq.toarray()
if sps.issparse(J_eq):
J_eq = J_eq.toarray()
# Concatenate matrices
return np.block([[J_eq, zeros],
[J_ineq, S]])
def _assemble_sparse_jacobian(self, J_eq, J_ineq, s):
"""Assemble sparse Jacobian given its components.
Given ``J_eq``, ``J_ineq`` and ``s`` returns:
jacobian = [ J_eq, 0 ]
[ J_ineq, diag(s) ]
It is equivalent to:
sps.bmat([[ J_eq, None ],
[ J_ineq, diag(s) ]], "csr")
but significantly more efficient for this
given structure.
"""
n_vars, n_ineq, n_eq = self.n_vars, self.n_ineq, self.n_eq
J_aux = sps.vstack([J_eq, J_ineq], "csr")
indptr, indices, data = J_aux.indptr, J_aux.indices, J_aux.data
new_indptr = indptr + np.hstack((np.zeros(n_eq, dtype=int),
np.arange(n_ineq+1, dtype=int)))
size = indices.size+n_ineq
new_indices = np.empty(size)
new_data = np.empty(size)
mask = np.full(size, False, bool)
mask[new_indptr[-n_ineq:]-1] = True
new_indices[mask] = n_vars+np.arange(n_ineq)
new_indices[~mask] = indices
new_data[mask] = s
new_data[~mask] = data
J = sps.csr_matrix((new_data, new_indices, new_indptr),
(n_eq + n_ineq, n_vars + n_ineq))
return J
def lagrangian_hessian_x(self, z, v):
"""Returns Lagrangian Hessian (in relation to `x`) -> Hx"""
x = self.get_variables(z)
# Get lagrange multipliers related to nonlinear equality constraints
v_eq = v[:self.n_eq]
# Get lagrange multipliers related to nonlinear ineq. constraints
v_ineq = v[self.n_eq:self.n_eq+self.n_ineq]
lagr_hess = self.lagr_hess
return lagr_hess(x, v_eq, v_ineq)
def lagrangian_hessian_s(self, z, v):
"""Returns scaled Lagrangian Hessian (in relation to `s`) -> S Hs S"""
s = self.get_slack(z)
# Using the primal formulation:
# S Hs S = diag(s)*diag(barrier_parameter/s**2)*diag(s).
# Reference [1]_ p. 882, formula (3.1)
primal = self.barrier_parameter
# Using the primal-dual formulation
# S Hs S = diag(s)*diag(v/s)*diag(s)
# Reference [1]_ p. 883, formula (3.11)
primal_dual = v[-self.n_ineq:]*s
# Uses the primal-dual formulation for
# positive values of v_ineq, and primal
# formulation for the remaining ones.
return np.where(v[-self.n_ineq:] > 0, primal_dual, primal)
def lagrangian_hessian(self, z, v):
"""Returns scaled Lagrangian Hessian"""
# Compute Hessian in relation to x and s
Hx = self.lagrangian_hessian_x(z, v)
if self.n_ineq > 0:
S_Hs_S = self.lagrangian_hessian_s(z, v)
# The scaled Lagrangian Hessian is:
# [ Hx 0 ]
# [ 0 S Hs S ]
def matvec(vec):
vec_x = self.get_variables(vec)
vec_s = self.get_slack(vec)
if self.n_ineq > 0:
return np.hstack((Hx.dot(vec_x), S_Hs_S*vec_s))
else:
return Hx.dot(vec_x)
return LinearOperator((self.n_vars+self.n_ineq,
self.n_vars+self.n_ineq),
matvec)
def stop_criteria(self, state, z, last_iteration_failed,
optimality, constr_violation,
trust_radius, penalty, cg_info):
"""Stop criteria for the barrier problem.
The criteria proposed here are similar to formula (2.3)
from [1]_, p.879.
"""
x = self.get_variables(z)
if self.global_stop_criteria(state, x,
last_iteration_failed,
trust_radius, penalty,
cg_info,
self.barrier_parameter,
self.tolerance):
self.terminate = True
return True
else:
g_cond = (optimality < self.tolerance and
constr_violation < self.tolerance)
x_cond = trust_radius < self.xtol
return g_cond or x_cond
def tr_interior_point(fun, grad, lagr_hess, n_vars, n_ineq, n_eq,
constr, jac, x0, fun0, grad0,
constr_ineq0, jac_ineq0, constr_eq0,
jac_eq0, stop_criteria,
enforce_feasibility, xtol, state,
initial_barrier_parameter,
initial_tolerance,
initial_penalty,
initial_trust_radius,
factorization_method):
"""Trust-region interior point method.
Solve problem:
minimize fun(x)
subject to: constr_ineq(x) <= 0
constr_eq(x) = 0
using trust-region interior point method described in [1]_.
"""
# BOUNDARY_PARAMETER controls the decrease on the slack
# variables. Represents ``tau`` from [1]_ p.885, formula (3.18).
BOUNDARY_PARAMETER = 0.995
# BARRIER_DECAY_RATIO controls the decay of the barrier parameter
# and of the subproblem tolerance. Represents ``theta`` from [1]_ p.879.
BARRIER_DECAY_RATIO = 0.2
# TRUST_ENLARGEMENT controls the enlargement on trust radius
# after each iteration
TRUST_ENLARGEMENT = 5
# Default enforce_feasibility
if enforce_feasibility is None:
enforce_feasibility = np.zeros(n_ineq, bool)
# Initial Values
barrier_parameter = initial_barrier_parameter
tolerance = initial_tolerance
trust_radius = initial_trust_radius
# Define initial value for the slack variables
s0 = np.maximum(-1.5*constr_ineq0, np.ones(n_ineq))
# Define barrier subproblem
subprob = BarrierSubproblem(
x0, s0, fun, grad, lagr_hess, n_vars, n_ineq, n_eq, constr, jac,
barrier_parameter, tolerance, enforce_feasibility,
stop_criteria, xtol, fun0, grad0, constr_ineq0, jac_ineq0,
constr_eq0, jac_eq0)
# Define initial parameter for the first iteration.
z = np.hstack((x0, s0))
fun0_subprob, constr0_subprob = subprob.fun0, subprob.constr0
grad0_subprob, jac0_subprob = subprob.grad0, subprob.jac0
# Define trust region bounds
trust_lb = np.hstack((np.full(subprob.n_vars, -np.inf),
np.full(subprob.n_ineq, -BOUNDARY_PARAMETER)))
trust_ub = np.full(subprob.n_vars+subprob.n_ineq, np.inf)
# Solves a sequence of barrier problems
while True:
# Solve SQP subproblem
z, state = equality_constrained_sqp(
subprob.function_and_constraints,
subprob.gradient_and_jacobian,
subprob.lagrangian_hessian,
z, fun0_subprob, grad0_subprob,
constr0_subprob, jac0_subprob, subprob.stop_criteria,
state, initial_penalty, trust_radius,
factorization_method, trust_lb, trust_ub, subprob.scaling)
if subprob.terminate:
break
# Update parameters
trust_radius = max(initial_trust_radius,
TRUST_ENLARGEMENT*state.tr_radius)
# TODO: Use more advanced strategies from [2]_
# to update these parameters.
barrier_parameter *= BARRIER_DECAY_RATIO
tolerance *= BARRIER_DECAY_RATIO
# Update Barrier Problem
subprob.update(barrier_parameter, tolerance)
# Compute initial values for next iteration
fun0_subprob, constr0_subprob = subprob.function_and_constraints(z)
grad0_subprob, jac0_subprob = subprob.gradient_and_jacobian(z)
# Get x and s
x = subprob.get_variables(z)
return x, state
@@ -0,0 +1,122 @@
"""Dog-leg trust-region optimization."""
import numpy as np
import scipy.linalg
from ._trustregion import (_minimize_trust_region, BaseQuadraticSubproblem)
__all__ = []
def _minimize_dogleg(fun, x0, args=(), jac=None, hess=None,
**trust_region_options):
"""
Minimization of scalar function of one or more variables using
the dog-leg trust-region algorithm.
Options
-------
initial_trust_radius : float
Initial trust-region radius.
max_trust_radius : float
Maximum value of the trust-region radius. No steps that are longer
than this value will be proposed.
eta : float
Trust region related acceptance stringency for proposed steps.
gtol : float
Gradient norm must be less than `gtol` before successful
termination.
"""
if jac is None:
raise ValueError('Jacobian is required for dogleg minimization')
if not callable(hess):
raise ValueError('Hessian is required for dogleg minimization')
return _minimize_trust_region(fun, x0, args=args, jac=jac, hess=hess,
subproblem=DoglegSubproblem,
**trust_region_options)
class DoglegSubproblem(BaseQuadraticSubproblem):
"""Quadratic subproblem solved by the dogleg method"""
def cauchy_point(self):
"""
The Cauchy point is minimal along the direction of steepest descent.
"""
if self._cauchy_point is None:
g = self.jac
Bg = self.hessp(g)
self._cauchy_point = -(np.dot(g, g) / np.dot(g, Bg)) * g
return self._cauchy_point
def newton_point(self):
"""
The Newton point is a global minimum of the approximate function.
"""
if self._newton_point is None:
g = self.jac
B = self.hess
cho_info = scipy.linalg.cho_factor(B)
self._newton_point = -scipy.linalg.cho_solve(cho_info, g)
return self._newton_point
def solve(self, trust_radius):
"""
Minimize a function using the dog-leg trust-region algorithm.
This algorithm requires function values and first and second derivatives.
It also performs a costly Hessian decomposition for most iterations,
and the Hessian is required to be positive definite.
Parameters
----------
trust_radius : float
We are allowed to wander only this far away from the origin.
Returns
-------
p : ndarray
The proposed step.
hits_boundary : bool
True if the proposed step is on the boundary of the trust region.
Notes
-----
The Hessian is required to be positive definite.
References
----------
.. [1] Jorge Nocedal and Stephen Wright,
Numerical Optimization, second edition,
Springer-Verlag, 2006, page 73.
"""
# Compute the Newton point.
# This is the optimum for the quadratic model function.
# If it is inside the trust radius then return this point.
p_best = self.newton_point()
if scipy.linalg.norm(p_best) < trust_radius:
hits_boundary = False
return p_best, hits_boundary
# Compute the Cauchy point.
# This is the predicted optimum along the direction of steepest descent.
p_u = self.cauchy_point()
# If the Cauchy point is outside the trust region,
# then return the point where the path intersects the boundary.
p_u_norm = scipy.linalg.norm(p_u)
if p_u_norm >= trust_radius:
p_boundary = p_u * (trust_radius / p_u_norm)
hits_boundary = True
return p_boundary, hits_boundary
# Compute the intersection of the trust region boundary
# and the line segment connecting the Cauchy and Newton points.
# This requires solving a quadratic equation.
# ||p_u + t*(p_best - p_u)||**2 == trust_radius**2
# Solve this for positive time t using the quadratic formula.
_, tb = self.get_boundaries_intersections(p_u, p_best - p_u,
trust_radius)
p_boundary = p_u + tb * (p_best - p_u)
hits_boundary = True
return p_boundary, hits_boundary
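# Illustrative sketch (not part of the original module): the three cases
# handled by ``DoglegSubproblem.solve`` above, restated with plain NumPy.
# The name ``dogleg_step_sketch`` is hypothetical, and ``B`` is assumed to
# be symmetric positive definite, as the dogleg method requires.
import numpy as np


def dogleg_step_sketch(g, B, trust_radius):
    """Return (step, hits_boundary) for the model m(p) = g.p + 0.5*p.B.p."""
    g = np.asarray(g, dtype=float)
    B = np.asarray(B, dtype=float)
    # Case 1: the Newton point (unconstrained model minimizer) is inside.
    p_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(p_newton) < trust_radius:
        return p_newton, False
    # Case 2: the Cauchy point (minimizer along -g) is outside, so the
    # steepest-descent step is scaled back to the boundary.
    p_cauchy = -(g @ g) / (g @ B @ g) * g
    cauchy_norm = np.linalg.norm(p_cauchy)
    if cauchy_norm >= trust_radius:
        return p_cauchy * (trust_radius / cauchy_norm), True
    # Case 3: intersect the segment from the Cauchy point to the Newton
    # point with the boundary, i.e. take the positive root t of
    # ||p_cauchy + t*d||**2 == trust_radius**2.
    d = p_newton - p_cauchy
    a = d @ d
    b = 2.0 * (p_cauchy @ d)
    c = p_cauchy @ p_cauchy - trust_radius**2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_cauchy + t * d, True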
@@ -0,0 +1,438 @@
"""Nearly exact trust-region optimization subproblem."""
import numpy as np
from scipy.linalg import (norm, get_lapack_funcs, solve_triangular,
cho_solve)
from ._trustregion import (_minimize_trust_region, BaseQuadraticSubproblem)
__all__ = ['_minimize_trustregion_exact',
'estimate_smallest_singular_value',
'singular_leading_submatrix',
'IterativeSubproblem']
def _minimize_trustregion_exact(fun, x0, args=(), jac=None, hess=None,
**trust_region_options):
"""
Minimization of scalar function of one or more variables using
a nearly exact trust-region algorithm.
Options
-------
initial_trust_radius : float
Initial trust-region radius.
max_trust_radius : float
Maximum value of the trust-region radius. No steps that are longer
than this value will be proposed.
eta : float
Trust region related acceptance stringency for proposed steps.
gtol : float
Gradient norm must be less than ``gtol`` before successful
termination.
"""
if jac is None:
raise ValueError('Jacobian is required for trust region '
'exact minimization.')
if not callable(hess):
raise ValueError('Hessian matrix is required for trust region '
'exact minimization.')
return _minimize_trust_region(fun, x0, args=args, jac=jac, hess=hess,
subproblem=IterativeSubproblem,
**trust_region_options)
def estimate_smallest_singular_value(U):
"""Given upper triangular matrix ``U`` estimate the smallest singular
value and the corresponding right singular vector in O(n**2) operations.
Parameters
----------
U : ndarray
Square upper triangular matrix.
Returns
-------
s_min : float
Estimated smallest singular value of the provided matrix.
z_min : ndarray
Estimated right singular vector.
Notes
-----
The procedure is based on [1]_ and is done in two steps. First, it finds
a vector ``e`` with components selected from {+1, -1} such that the
solution ``w`` from the system ``U.T w = e`` is as large as possible.
Next, it solves ``U v = w``. The smallest singular value is close
to ``norm(w)/norm(v)`` and the right singular vector is close
to ``v/norm(v)``.
The more ill-conditioned the matrix, the better the estimate.
References
----------
.. [1] Cline, A. K., Moler, C. B., Stewart, G. W., Wilkinson, J. H.
An estimate for the condition number of a matrix. 1979.
SIAM Journal on Numerical Analysis, 16(2), 368-375.
"""
U = np.atleast_2d(U)
m, n = U.shape
if m != n:
raise ValueError("A square triangular matrix should be provided.")
# A vector `e` with components selected from {+1, -1}
# is selected so that the solution `w` to the system
# `U.T w = e` is as large as possible. Implementation
# based on algorithm 3.5.1, p. 142, from reference [2]
# adapted for lower triangular matrix.
p = np.zeros(n)
w = np.empty(n)
# Implemented according to: Golub, G. H., Van Loan, C. F. (2013).
# "Matrix computations". Fourth Edition. JHU press. pp. 140-142.
for k in range(n):
wp = (1-p[k]) / U.T[k, k]
wm = (-1-p[k]) / U.T[k, k]
pp = p[k+1:] + U.T[k+1:, k]*wp
pm = p[k+1:] + U.T[k+1:, k]*wm
if abs(wp) + norm(pp, 1) >= abs(wm) + norm(pm, 1):
w[k] = wp
p[k+1:] = pp
else:
w[k] = wm
p[k+1:] = pm
# The system `U v = w` is solved using backward substitution.
v = solve_triangular(U, w)
v_norm = norm(v)
w_norm = norm(w)
# Smallest singular value
s_min = w_norm / v_norm
# Associated vector
z_min = v / v_norm
return s_min, z_min
def gershgorin_bounds(H):
"""
Given a square matrix ``H`` compute upper
and lower bounds for its eigenvalues (Gershgorin bounds).
Defined in ref. [1]_.
References
----------
.. [1] Conn, A. R., Gould, N. I., & Toint, P. L.
Trust region methods. 2000. Siam. pp. 19.
"""
H_diag = np.diag(H)
H_diag_abs = np.abs(H_diag)
H_row_sums = np.sum(np.abs(H), axis=1)
lb = np.min(H_diag + H_diag_abs - H_row_sums)
ub = np.max(H_diag - H_diag_abs + H_row_sums)
return lb, ub
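# Illustrative check (not part of the original module): the Gershgorin
# bounds computed above always enclose the spectrum of a symmetric matrix.
# The helper name ``_check_gershgorin_sketch`` is hypothetical; it restates
# the bounds as "diagonal entry -/+ disc radius" and compares them with the
# eigenvalues from ``np.linalg.eigvalsh``.
import numpy as np


def _check_gershgorin_sketch(H):
    """Return True if every eigenvalue of symmetric H lies in [lb, ub]."""
    H = np.asarray(H, dtype=float)
    diag = np.diag(H)
    # Disc radius: sum of the off-diagonal magnitudes in each row.
    radii = np.sum(np.abs(H), axis=1) - np.abs(diag)
    lb = np.min(diag - radii)
    ub = np.max(diag + radii)
    eigvals = np.linalg.eigvalsh(H)
    return bool(lb <= eigvals.min() and eigvals.max() <= ub)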
def singular_leading_submatrix(A, U, k):
"""
Compute term that makes the leading ``k`` by ``k``
submatrix from ``A`` singular.
Parameters
----------
A : ndarray
Symmetric matrix that is not positive definite.
U : ndarray
Upper triangular matrix resulting from an incomplete
Cholesky decomposition of matrix ``A``.
k : int
Positive integer such that the leading k by k submatrix from
`A` is the first non-positive definite leading submatrix.
Returns
-------
delta : float
Amount that should be added to the element (k, k) of the
leading k by k submatrix of ``A`` to make it singular.
v : ndarray
A vector such that ``v.T B v = 0``, where B is the matrix A after
``delta`` is added to its element (k, k).
"""
# Compute delta
delta = np.sum(U[:k-1, k-1]**2) - A[k-1, k-1]
n = len(A)
# Initialize v
v = np.zeros(n)
v[k-1] = 1
# Compute the remaining values of v by solving a triangular system.
if k != 1:
v[:k-1] = solve_triangular(U[:k-1, :k-1], -U[:k-1, k-1])
return delta, v
class IterativeSubproblem(BaseQuadraticSubproblem):
"""Quadratic subproblem solved by nearly exact iterative method.
Notes
-----
This subproblem solver was based on [1]_, [2]_ and [3]_,
which implement similar algorithms. The algorithm is basically
that of [1]_ but ideas from [2]_ and [3]_ were also used.
References
----------
.. [1] A.R. Conn, N.I. Gould, and P.L. Toint, "Trust region methods",
Siam, pp. 169-200, 2000.
.. [2] J. Nocedal and S. Wright, "Numerical optimization",
Springer Science & Business Media. pp. 83-91, 2006.
.. [3] J.J. More and D.C. Sorensen, "Computing a trust region step",
SIAM Journal on Scientific and Statistical Computing, vol. 4(3),
pp. 553-572, 1983.
"""
# UPDATE_COEFF appears in reference [1]_
# in formula 7.3.14 (p. 190) named as "theta".
# As recommended there, its value is fixed at 0.01.
UPDATE_COEFF = 0.01
EPS = np.finfo(float).eps
def __init__(self, x, fun, jac, hess, hessp=None,
k_easy=0.1, k_hard=0.2):
super().__init__(x, fun, jac, hess)
# When the trust-region shrinks in two consecutive
# calculations (``tr_radius < previous_tr_radius``)
# the lower bound ``lambda_lb`` may be reused,
# facilitating the convergence. To indicate no
# previous value is known at first ``previous_tr_radius``
# is set to -1 and ``lambda_lb`` to None.
self.previous_tr_radius = -1
self.lambda_lb = None
self.niter = 0
# ``k_easy`` and ``k_hard`` are parameters used
# to determine the stop criteria to the iterative
# subproblem solver. Take a look at pp. 194-197
# from reference [1]_ for a more detailed description.
self.k_easy = k_easy
self.k_hard = k_hard
# Get Lapack function for cholesky decomposition.
# The implemented SciPy wrapper does not return
# the incomplete factorization needed by the method.
self.cholesky, = get_lapack_funcs(('potrf',), (self.hess,))
# Get info about Hessian
self.dimension = len(self.hess)
self.hess_gershgorin_lb,\
self.hess_gershgorin_ub = gershgorin_bounds(self.hess)
self.hess_inf = norm(self.hess, np.inf)
self.hess_fro = norm(self.hess, 'fro')
# A constant such that for vectors smaller than that
# backward substitution is not reliable. It was established
# based on Golub, G. H., Van Loan, C. F. (2013).
# "Matrix computations". Fourth Edition. JHU press., p.165.
self.CLOSE_TO_ZERO = self.dimension * self.EPS * self.hess_inf
def _initial_values(self, tr_radius):
"""Given a trust radius, return a good initial guess for
the damping factor, the lower bound and the upper bound.
The values were chosen according to the guidelines in
section 7.3.8 (p. 192) from [1]_.
"""
# Upper bound for the damping factor
lambda_ub = max(0, self.jac_mag/tr_radius + min(-self.hess_gershgorin_lb,
self.hess_fro,
self.hess_inf))
# Lower bound for the damping factor
lambda_lb = max(0, -min(self.hess.diagonal()),
self.jac_mag/tr_radius - min(self.hess_gershgorin_ub,
self.hess_fro,
self.hess_inf))
# Improve bounds with previous info
if tr_radius < self.previous_tr_radius:
lambda_lb = max(self.lambda_lb, lambda_lb)
# Initial guess for the damping factor
if lambda_lb == 0:
lambda_initial = 0
else:
lambda_initial = max(np.sqrt(lambda_lb * lambda_ub),
lambda_lb + self.UPDATE_COEFF*(lambda_ub-lambda_lb))
return lambda_initial, lambda_lb, lambda_ub
def solve(self, tr_radius):
"""Solve quadratic subproblem"""
lambda_current, lambda_lb, lambda_ub = self._initial_values(tr_radius)
n = self.dimension
hits_boundary = True
already_factorized = False
self.niter = 0
while True:
# Compute Cholesky factorization
if already_factorized:
already_factorized = False
else:
H = self.hess+lambda_current*np.eye(n)
U, info = self.cholesky(H, lower=False,
overwrite_a=False,
clean=True)
self.niter += 1
# Check if factorization succeeded
if info == 0 and self.jac_mag > self.CLOSE_TO_ZERO:
# Successful factorization
# Solve `U.T U p = -jac`
p = cho_solve((U, False), -self.jac)
p_norm = norm(p)
# Check for interior convergence
if p_norm <= tr_radius and lambda_current == 0:
hits_boundary = False
break
# Solve `U.T w = p`
w = solve_triangular(U, p, trans='T')
w_norm = norm(w)
# Compute Newton step according to
# formula (4.44) p.87 from ref [2]_.
delta_lambda = (p_norm/w_norm)**2 * (p_norm-tr_radius)/tr_radius
lambda_new = lambda_current + delta_lambda
if p_norm < tr_radius: # Inside boundary
s_min, z_min = estimate_smallest_singular_value(U)
ta, tb = self.get_boundaries_intersections(p, z_min,
tr_radius)
# Choose `step_len` with the smallest magnitude.
# The reason for this choice is explained at
# ref [3]_, p. 6 (Immediately before the formula
# for `tau`).
step_len = min([ta, tb], key=abs)
# Compute the quadratic term (p.T*H*p)
quadratic_term = np.dot(p, np.dot(H, p))
# Check stop criteria
relative_error = ((step_len**2 * s_min**2)
/ (quadratic_term + lambda_current*tr_radius**2))
if relative_error <= self.k_hard:
p += step_len * z_min
break
# Update uncertainty bounds
lambda_ub = lambda_current
lambda_lb = max(lambda_lb, lambda_current - s_min**2)
# Compute Cholesky factorization
H = self.hess + lambda_new*np.eye(n)
c, info = self.cholesky(H, lower=False,
overwrite_a=False,
clean=True)
# Check if the factorization has succeeded
if info == 0: # Successful factorization
# Update damping factor
lambda_current = lambda_new
already_factorized = True
else: # Unsuccessful factorization
# Update uncertainty bounds
lambda_lb = max(lambda_lb, lambda_new)
# Update damping factor
lambda_current = max(
np.sqrt(lambda_lb * lambda_ub),
lambda_lb + self.UPDATE_COEFF*(lambda_ub-lambda_lb)
)
else: # Outside boundary
# Check stop criteria
relative_error = abs(p_norm - tr_radius) / tr_radius
if relative_error <= self.k_easy:
break
# Update uncertainty bounds
lambda_lb = lambda_current
# Update damping factor
lambda_current = lambda_new
elif info == 0 and self.jac_mag <= self.CLOSE_TO_ZERO:
# jac_mag very close to zero
# Check for interior convergence
if lambda_current == 0:
p = np.zeros(n)
hits_boundary = False
break
s_min, z_min = estimate_smallest_singular_value(U)
step_len = tr_radius
# Check stop criteria
if (step_len**2 * s_min**2
<= self.k_hard * lambda_current * tr_radius**2):
p = step_len * z_min
break
# Update uncertainty bounds
lambda_ub = lambda_current
lambda_lb = max(lambda_lb, lambda_current - s_min**2)
# Update damping factor
lambda_current = max(
np.sqrt(lambda_lb * lambda_ub),
lambda_lb + self.UPDATE_COEFF*(lambda_ub-lambda_lb)
)
else: # Unsuccessful factorization
# Compute auxiliary terms
delta, v = singular_leading_submatrix(H, U, info)
v_norm = norm(v)
# Update uncertainty interval
lambda_lb = max(lambda_lb, lambda_current + delta/v_norm**2)
# Update damping factor
lambda_current = max(
np.sqrt(lambda_lb * lambda_ub),
lambda_lb + self.UPDATE_COEFF*(lambda_ub-lambda_lb)
)
self.lambda_lb = lambda_lb
self.lambda_current = lambda_current
self.previous_tr_radius = tr_radius
return p, hits_boundary
@@ -0,0 +1,65 @@
from ._trustregion import (_minimize_trust_region)
from ._trlib import (get_trlib_quadratic_subproblem)
__all__ = ['_minimize_trust_krylov']
def _minimize_trust_krylov(fun, x0, args=(), jac=None, hess=None, hessp=None,
inexact=True, **trust_region_options):
"""
Minimization of a scalar function of one or more variables using
a nearly exact trust-region algorithm that only requires matrix
vector products with the Hessian matrix.
.. versionadded:: 1.0.0
Options
-------
inexact : bool, optional
Accuracy to solve subproblems. If True requires fewer nonlinear
iterations, but more vector products.
"""
if jac is None:
raise ValueError('Jacobian is required for trust region '
'exact minimization.')
if hess is None and hessp is None:
raise ValueError('Either the Hessian or the Hessian-vector product '
'is required for Krylov trust-region minimization')
# tol_rel specifies the termination tolerance relative to the initial
# gradient norm in the Krylov subspace iteration.
# - tol_rel_i specifies the tolerance for interior convergence.
# - tol_rel_b specifies the tolerance for boundary convergence.
# In nonlinear programming applications it is not necessary to solve
# the boundary case as exactly as the interior case.
# - setting tol_rel_i=-2 leads to a forcing sequence in the Krylov
# subspace iteration leading to quadratic convergence if eventually
# the trust region stays inactive.
# - setting tol_rel_b=-3 leads to a forcing sequence in the Krylov
# subspace iteration leading to superlinear convergence as long
# as the iterates hit the trust region boundary.
# For details consult the documentation of trlib_krylov_min
# in _trlib/trlib_krylov.h
#
# Optimality of this choice of parameters among a range of possibilities
# has been tested on the unconstrained subset of the CUTEst library.
if inexact:
return _minimize_trust_region(fun, x0, args=args, jac=jac,
hess=hess, hessp=hessp,
subproblem=get_trlib_quadratic_subproblem(
tol_rel_i=-2.0, tol_rel_b=-3.0,
disp=trust_region_options.get('disp', False)
),
**trust_region_options)
else:
return _minimize_trust_region(fun, x0, args=args, jac=jac,
hess=hess, hessp=hessp,
subproblem=get_trlib_quadratic_subproblem(
tol_rel_i=1e-8, tol_rel_b=1e-6,
disp=trust_region_options.get('disp', False)
),
**trust_region_options)
@@ -0,0 +1,126 @@
"""Newton-CG trust-region optimization."""
import math
import numpy as np
import scipy.linalg
from ._trustregion import (_minimize_trust_region, BaseQuadraticSubproblem)
__all__ = []
def _minimize_trust_ncg(fun, x0, args=(), jac=None, hess=None, hessp=None,
**trust_region_options):
"""
Minimization of scalar function of one or more variables using
the Newton conjugate gradient trust-region algorithm.
Options
-------
initial_trust_radius : float
Initial trust-region radius.
max_trust_radius : float
Maximum value of the trust-region radius. No steps that are longer
than this value will be proposed.
eta : float
Trust region related acceptance stringency for proposed steps.
gtol : float
Gradient norm must be less than `gtol` before successful
termination.
"""
if jac is None:
raise ValueError('Jacobian is required for Newton-CG trust-region '
'minimization')
if hess is None and hessp is None:
raise ValueError('Either the Hessian or the Hessian-vector product '
'is required for Newton-CG trust-region minimization')
return _minimize_trust_region(fun, x0, args=args, jac=jac, hess=hess,
hessp=hessp, subproblem=CGSteihaugSubproblem,
**trust_region_options)
class CGSteihaugSubproblem(BaseQuadraticSubproblem):
"""Quadratic subproblem solved by a conjugate gradient method"""
def solve(self, trust_radius):
"""
Solve the subproblem using a conjugate gradient method.
Parameters
----------
trust_radius : float
We are allowed to wander only this far away from the origin.
Returns
-------
p : ndarray
The proposed step.
hits_boundary : bool
True if the proposed step is on the boundary of the trust region.
Notes
-----
This is algorithm (7.2) of Nocedal and Wright 2nd edition.
Only the function that computes the Hessian-vector product is required.
The Hessian itself is not required, and the Hessian does
not need to be positive semidefinite.
"""
# define the origin
p_origin = np.zeros_like(self.jac)
# define a default tolerance
tolerance = min(0.5, math.sqrt(self.jac_mag)) * self.jac_mag
# Stop the method if the gradient norm is already
# below the tolerance.
if self.jac_mag < tolerance:
hits_boundary = False
return p_origin, hits_boundary
# init the state for the first iteration
z = p_origin
r = self.jac
d = -r
# Search for the min of the approximation of the objective function.
while True:
# do an iteration
Bd = self.hessp(d)
dBd = np.dot(d, Bd)
if dBd <= 0:
# Look at the two boundary points.
# Find both values of t to get the boundary points such that
# ||z + t d|| == trust_radius
# and then choose the one with the predicted min value.
ta, tb = self.get_boundaries_intersections(z, d, trust_radius)
pa = z + ta * d
pb = z + tb * d
if self(pa) < self(pb):
p_boundary = pa
else:
p_boundary = pb
hits_boundary = True
return p_boundary, hits_boundary
r_squared = np.dot(r, r)
alpha = r_squared / dBd
z_next = z + alpha * d
if scipy.linalg.norm(z_next) >= trust_radius:
# Find t >= 0 to get the boundary point such that
# ||z + t d|| == trust_radius
ta, tb = self.get_boundaries_intersections(z, d, trust_radius)
p_boundary = z + tb * d
hits_boundary = True
return p_boundary, hits_boundary
r_next = r + alpha * Bd
r_next_squared = np.dot(r_next, r_next)
if math.sqrt(r_next_squared) < tolerance:
hits_boundary = False
return z_next, hits_boundary
beta_next = r_next_squared / r_squared
d_next = -r_next + beta_next * d
# update the state for the next iteration
z = z_next
r = r_next
d = d_next
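# Usage sketch (not part of the original module): the solver above is
# normally reached through ``scipy.optimize.minimize`` with
# ``method='trust-ncg'``. The helper name ``_example_trust_ncg`` is
# hypothetical. The imports live inside the function so that merely
# importing this module never pulls in ``scipy.optimize`` again.
def _example_trust_ncg():
    """Minimize the Rosenbrock function with the Newton-CG trust region."""
    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

    x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
    # Both the gradient and the Hessian (or a Hessian-vector product via
    # ``hessp``) must be supplied, as enforced by ``_minimize_trust_ncg``.
    return minimize(rosen, x0, method='trust-ncg',
                    jac=rosen_der, hess=rosen_hess,
                    options={'gtol': 1e-8})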
@@ -0,0 +1,968 @@
r"""
Parameters used in test and benchmark methods.
Collections of test cases suitable for testing 1-D root-finders
'original': The original benchmarking functions.
Real-valued functions of real-valued inputs on an interval
with a zero.
f1, .., f3 are continuous and infinitely differentiable
f4 has a left- and right- discontinuity at the root
f5 has a root at 1 replacing a 1st order pole
f6 is randomly positive on one side of the root,
randomly negative on the other.
f4 - f6 are not continuous at the root.
'aps': The test problems in the 1995 paper
TOMS "Algorithm 748: Enclosing Zeros of Continuous Functions"
by Alefeld, Potra and Shi. Real-valued functions of
real-valued inputs on an interval with a zero.
Suitable for methods which start with an enclosing interval, and
derivatives up to 2nd order.
'complex': Some complex-valued functions of complex-valued inputs.
No enclosing bracket is provided.
Suitable for methods which use one or more starting values, and
derivatives up to 2nd order.
The test cases are provided as a list of dictionaries. The dictionary
keys will be a subset of:
["f", "fprime", "fprime2", "args", "bracket", "smoothness",
"a", "b", "x0", "x1", "root", "ID"]
"""
# Sources:
# [1] Alefeld, G. E. and Potra, F. A. and Shi, Yixun,
# "Algorithm 748: Enclosing Zeros of Continuous Functions",
# ACM Trans. Math. Softw. Volume 21, Issue 3 (1995)
# doi = {10.1145/210089.210111},
# [2] Chandrupatla, Tirupathi R. "A new hybrid quadratic/bisection algorithm
# for finding the zero of a nonlinear function without using derivatives."
# Advances in Engineering Software 28.3 (1997): 145-149.
from random import random
import numpy as np
from scipy.optimize import _zeros_py as cc
# "description" refers to the original functions
description = """
f2 is a symmetric parabola, x**2 - 1
f3 is a quartic polynomial with large hump in interval
f4 is step function with a discontinuity at 1
f5 is a hyperbola with vertical asymptote at 1
f6 has random values positive to left of 1, negative to right
Of course, these are not real problems. They just test how the
'good' solvers behave in bad circumstances where bisection is
really the best. A good solver should not be much worse than
bisection in such circumstance, while being faster for smooth
monotone sorts of functions.
"""
def f1(x):
    r"""f1 is a quadratic with roots at 0 and 1"""
    return x * (x - 1.)


def f1_fp(x):
    return 2 * x - 1


def f1_fpp(x):
    return 2


def f2(x):
    r"""f2 is a symmetric parabola, x**2 - 1"""
    return x**2 - 1


def f2_fp(x):
    return 2 * x


def f2_fpp(x):
    return 2


def f3(x):
    r"""A quartic with roots at 0, 1, 2 and 3"""
    return x * (x - 1.) * (x - 2.) * (x - 3.)  # x**4 - 6x**3 + 11x**2 - 6x


def f3_fp(x):
    return 4 * x**3 - 18 * x**2 + 22 * x - 6


def f3_fpp(x):
    return 12 * x**2 - 36 * x + 22


def f4(x):
    r"""Piecewise linear, left- and right- discontinuous at x=1, the root."""
    if x > 1:
        return 1.0 + .1 * x
    if x < 1:
        return -1.0 + .1 * x
    return 0


def f5(x):
    r"""
    Hyperbola with a pole at x=1, but pole replaced with 0. Not continuous at root.
    """
    if x != 1:
        return 1.0 / (1. - x)
    return 0


# f6(x) returns random value. Without memoization, calling twice with the
# same x returns different values, hence a "random value", not a
# "function with random values"
_f6_cache = {}


def f6(x):
    v = _f6_cache.get(x, None)
    if v is None:
        if x > 1:
            v = random()
        elif x < 1:
            v = -random()
        else:
            v = 0
        _f6_cache[x] = v
    return v
# Each Original test case has
# - a function and its two derivatives,
# - additional arguments,
# - a bracket enclosing a root,
# - the order of differentiability (smoothness) on this interval
# - a starting value for methods which don't require a bracket
# - the root (inside the bracket)
# - an Identifier of the test case
_ORIGINAL_TESTS_KEYS = [
"f", "fprime", "fprime2", "args", "bracket", "smoothness", "x0", "root", "ID"
]
_ORIGINAL_TESTS = [
[f1, f1_fp, f1_fpp, (), [0.5, np.sqrt(3)], np.inf, 0.6, 1.0, "original.01.00"],
[f2, f2_fp, f2_fpp, (), [0.5, np.sqrt(3)], np.inf, 0.6, 1.0, "original.02.00"],
[f3, f3_fp, f3_fpp, (), [0.5, np.sqrt(3)], np.inf, 0.6, 1.0, "original.03.00"],
[f4, None, None, (), [0.5, np.sqrt(3)], -1, 0.6, 1.0, "original.04.00"],
[f5, None, None, (), [0.5, np.sqrt(3)], -1, 0.6, 1.0, "original.05.00"],
[f6, None, None, (), [0.5, np.sqrt(3)], -np.inf, 0.6, 1.0, "original.06.00"]
]
_ORIGINAL_TESTS_DICTS = [
dict(zip(_ORIGINAL_TESTS_KEYS, testcase)) for testcase in _ORIGINAL_TESTS
]
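A small sketch of how one of these rows turns into a test-case dictionary and is then handed to a bracketing solver. It assumes a plain bisection routine in place of the SciPy solvers; `bisect`, `parabola` and the ID `"demo.02.00"` are illustrative names, not part of the module.

```python
def bisect(f, a, b, xtol=1e-12, maxiter=200):
    """Plain bisection on a sign-changing bracket [a, b] (illustrative only)."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(maxiter):
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0 or 0.5 * (b - a) < xtol:
            return m
        if fa * fm < 0:
            b, fb = m, fm   # root lies in [a, m]
        else:
            a, fa = m, fm   # root lies in [m, b]
    return 0.5 * (a + b)


def parabola(x):
    """Same shape as f2 above: x**2 - 1, with a root at 1."""
    return x**2 - 1


# Build one test-case dict the same way the module does: zip keys with a row.
keys = ["f", "bracket", "root", "ID"]
row = [parabola, [0.5, 3 ** 0.5], 1.0, "demo.02.00"]
case = dict(zip(keys, row))

a, b = case["bracket"]
found = bisect(case["f"], a, b)
```

Storing each case as a flat row keeps the tables compact, while the `dict(zip(...))` step gives callers named access to the fields.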
# ##################
# "APS" test cases
# Functions and test cases that appear in [1]
def aps01_f(x):
    r"""Straightforward sum of trigonometric function and polynomial"""
    return np.sin(x) - x / 2


def aps01_fp(x):
    return np.cos(x) - 1.0 / 2


def aps01_fpp(x):
    return -np.sin(x)
def aps02_f(x):
    r"""poles at x=n**2, 1st and 2nd derivatives at root are also close to 0"""
    ii = np.arange(1, 21)
    return -2 * np.sum((2 * ii - 5)**2 / (x - ii**2)**3)


def aps02_fp(x):
    ii = np.arange(1, 21)
    return 6 * np.sum((2 * ii - 5)**2 / (x - ii**2)**4)


def aps02_fpp(x):
    # d/dx of 6 * (x - ii**2)**(-4) is -24 * (x - ii**2)**(-5)
    ii = np.arange(1, 21)
    return -24 * np.sum((2 * ii - 5)**2 / (x - ii**2)**5)
def aps03_f(x, a, b):
    r"""Rapidly changing at the root"""
    return a * x * np.exp(b * x)


def aps03_fp(x, a, b):
    return a * (b * x + 1) * np.exp(b * x)


def aps03_fpp(x, a, b):
    return a * (b * (b * x + 1) + b) * np.exp(b * x)


def aps04_f(x, n, a):
    r"""Medium-degree polynomial"""
    return x**n - a


def aps04_fp(x, n, a):
    return n * x**(n - 1)


def aps04_fpp(x, n, a):
    return n * (n - 1) * x**(n - 2)


def aps05_f(x):
    r"""Simple Trigonometric function"""
    return np.sin(x) - 1.0 / 2


def aps05_fp(x):
    return np.cos(x)


def aps05_fpp(x):
    return -np.sin(x)


def aps06_f(x, n):
    r"""Exponential rapidly changing from -1 to 1 at x=0"""
    return 2 * x * np.exp(-n) - 2 * np.exp(-n * x) + 1


def aps06_fp(x, n):
    return 2 * np.exp(-n) + 2 * n * np.exp(-n * x)


def aps06_fpp(x, n):
    return -2 * n * n * np.exp(-n * x)


def aps07_f(x, n):
    r"""Upside down parabola with parametrizable height"""
    return (1 + (1 - n)**2) * x - (1 - n * x)**2


def aps07_fp(x, n):
    return (1 + (1 - n)**2) + 2 * n * (1 - n * x)


def aps07_fpp(x, n):
    return -2 * n * n


def aps08_f(x, n):
    r"""Degree n polynomial"""
    return x * x - (1 - x)**n


def aps08_fp(x, n):
    return 2 * x + n * (1 - x)**(n - 1)


def aps08_fpp(x, n):
    return 2 - n * (n - 1) * (1 - x)**(n - 2)
def aps09_f(x, n):
    r"""Upside down quartic with parametrizable height"""
    return (1 + (1 - n)**4) * x - (1 - n * x)**4


def aps09_fp(x, n):
    return (1 + (1 - n)**4) + 4 * n * (1 - n * x)**3


def aps09_fpp(x, n):
    # d/dx of 4 * n * (1 - n * x)**3 brings down a second factor of n
    return -12 * n * n * (1 - n * x)**2
def aps10_f(x, n):
    r"""Exponential plus a polynomial"""
    return np.exp(-n * x) * (x - 1) + x**n


def aps10_fp(x, n):
    return np.exp(-n * x) * (-n * (x - 1) + 1) + n * x**(n - 1)


def aps10_fpp(x, n):
    # d/dx of (-n * (x - 1) + 1) is -n, not -n * x
    return (np.exp(-n * x) * (-n * (-n * (x - 1) + 1) - n)
            + n * (n - 1) * x**(n - 2))
def aps11_f(x, n):
    r"""Rational function with a zero at x=1/n and a pole at x=0"""
    return (n * x - 1) / ((n - 1) * x)


def aps11_fp(x, n):
    return 1 / (n - 1) / x**2


def aps11_fpp(x, n):
    return -2 / (n - 1) / x**3


def aps12_f(x, n):
    r"""nth root of x, with a zero at x=n"""
    return np.power(x, 1.0 / n) - np.power(n, 1.0 / n)


def aps12_fp(x, n):
    return np.power(x, (1.0 - n) / n) / n


def aps12_fpp(x, n):
    return np.power(x, (1.0 - 2 * n) / n) * (1.0 / n) * (1.0 - n) / n


_MAX_EXPABLE = np.log(np.finfo(float).max)


def aps13_f(x):
    r"""Function with *all* derivatives 0 at the root"""
    if x == 0:
        return 0
    # x2 = 1.0/x**2
    # if x2 > 708:
    #     return 0
    y = 1 / x**2
    if y > _MAX_EXPABLE:
        return 0
    return x / np.exp(y)


def aps13_fp(x):
    if x == 0:
        return 0
    y = 1 / x**2
    if y > _MAX_EXPABLE:
        return 0
    return (1 + 2 / x**2) / np.exp(y)


def aps13_fpp(x):
    if x == 0:
        return 0
    y = 1 / x**2
    if y > _MAX_EXPABLE:
        return 0
    return 2 * (2 - x**2) / x**5 / np.exp(y)
def aps14_f(x, n):
    r"""Constant (-n/20) for x <= 0, trigonometric+linear for x positive"""
    if x <= 0:
        return -n / 20.0
    return n / 20.0 * (x / 1.5 + np.sin(x) - 1)


def aps14_fp(x, n):
    if x <= 0:
        return 0
    return n / 20.0 * (1.0 / 1.5 + np.cos(x))


def aps14_fpp(x, n):
    if x <= 0:
        return 0
    return -n / 20.0 * (np.sin(x))
def aps15_f(x, n):
    r"""Piecewise: exponential on [0, 0.002/(1+n)], constant outside of it"""
    if x < 0:
        return -0.859
    if x > 2 * 1e-3 / (1 + n):
        return np.e - 1.859
    return np.exp((n + 1) * x / 2 * 1000) - 1.859


def aps15_fp(x, n):
    # The function is constant outside the interval, so its derivative is 0
    if not 0 <= x <= 2 * 1e-3 / (1 + n):
        return 0
    return np.exp((n + 1) * x / 2 * 1000) * (n + 1) / 2 * 1000


def aps15_fpp(x, n):
    # The function is constant outside the interval, so its derivative is 0
    if not 0 <= x <= 2 * 1e-3 / (1 + n):
        return 0
    return np.exp((n + 1) * x / 2 * 1000) * (n + 1) / 2 * 1000 * (n + 1) / 2 * 1000
# Each APS test case has
# - a function and its two derivatives,
# - additional arguments,
# - a bracket enclosing a root,
# - the order of differentiability of the function on this interval
# - a starting value for methods which don't require a bracket
# - the root (inside the bracket)
# - an Identifier of the test case
#
# Algorithm 748 is a bracketing algorithm so a bracketing interval was provided
# in [1] for each test case. Newton and Halley methods need a single
# starting point x0, which was chosen to be near the middle of the interval,
# unless that would have made the problem too easy.
_APS_TESTS_KEYS = [
"f", "fprime", "fprime2", "args", "bracket", "smoothness", "x0", "root", "ID"
]
_APS_TESTS = [
[aps01_f, aps01_fp, aps01_fpp, (), [np.pi / 2, np.pi], np.inf,
3, 1.89549426703398094e+00, "aps.01.00"],
[aps02_f, aps02_fp, aps02_fpp, (), [1 + 1e-9, 4 - 1e-9], np.inf,
2, 3.02291534727305677e+00, "aps.02.00"],
[aps02_f, aps02_fp, aps02_fpp, (), [4 + 1e-9, 9 - 1e-9], np.inf,
5, 6.68375356080807848e+00, "aps.02.01"],
[aps02_f, aps02_fp, aps02_fpp, (), [9 + 1e-9, 16 - 1e-9], np.inf,
10, 1.12387016550022114e+01, "aps.02.02"],
[aps02_f, aps02_fp, aps02_fpp, (), [16 + 1e-9, 25 - 1e-9], np.inf,
17, 1.96760000806234103e+01, "aps.02.03"],
[aps02_f, aps02_fp, aps02_fpp, (), [25 + 1e-9, 36 - 1e-9], np.inf,
26, 2.98282273265047557e+01, "aps.02.04"],
[aps02_f, aps02_fp, aps02_fpp, (), [36 + 1e-9, 49 - 1e-9], np.inf,
37, 4.19061161952894139e+01, "aps.02.05"],
[aps02_f, aps02_fp, aps02_fpp, (), [49 + 1e-9, 64 - 1e-9], np.inf,
50, 5.59535958001430913e+01, "aps.02.06"],
[aps02_f, aps02_fp, aps02_fpp, (), [64 + 1e-9, 81 - 1e-9], np.inf,
65, 7.19856655865877997e+01, "aps.02.07"],
[aps02_f, aps02_fp, aps02_fpp, (), [81 + 1e-9, 100 - 1e-9], np.inf,
82, 9.00088685391666701e+01, "aps.02.08"],
[aps02_f, aps02_fp, aps02_fpp, (), [100 + 1e-9, 121 - 1e-9], np.inf,
101, 1.10026532748330197e+02, "aps.02.09"],
[aps03_f, aps03_fp, aps03_fpp, (-40, -1), [-9, 31], np.inf,
-2, 0, "aps.03.00"],
[aps03_f, aps03_fp, aps03_fpp, (-100, -2), [-9, 31], np.inf,
-2, 0, "aps.03.01"],
[aps03_f, aps03_fp, aps03_fpp, (-200, -3), [-9, 31], np.inf,
-2, 0, "aps.03.02"],
[aps04_f, aps04_fp, aps04_fpp, (4, 0.2), [0, 5], np.inf,
2.5, 6.68740304976422006e-01, "aps.04.00"],
[aps04_f, aps04_fp, aps04_fpp, (6, 0.2), [0, 5], np.inf,
2.5, 7.64724491331730039e-01, "aps.04.01"],
[aps04_f, aps04_fp, aps04_fpp, (8, 0.2), [0, 5], np.inf,
2.5, 8.17765433957942545e-01, "aps.04.02"],
[aps04_f, aps04_fp, aps04_fpp, (10, 0.2), [0, 5], np.inf,
2.5, 8.51339922520784609e-01, "aps.04.03"],
[aps04_f, aps04_fp, aps04_fpp, (12, 0.2), [0, 5], np.inf,
2.5, 8.74485272221167897e-01, "aps.04.04"],
[aps04_f, aps04_fp, aps04_fpp, (4, 1), [0, 5], np.inf,
2.5, 1, "aps.04.05"],
[aps04_f, aps04_fp, aps04_fpp, (6, 1), [0, 5], np.inf,
2.5, 1, "aps.04.06"],
[aps04_f, aps04_fp, aps04_fpp, (8, 1), [0, 5], np.inf,
2.5, 1, "aps.04.07"],
[aps04_f, aps04_fp, aps04_fpp, (10, 1), [0, 5], np.inf,
2.5, 1, "aps.04.08"],
[aps04_f, aps04_fp, aps04_fpp, (12, 1), [0, 5], np.inf,
2.5, 1, "aps.04.09"],
[aps04_f, aps04_fp, aps04_fpp, (8, 1), [-0.95, 4.05], np.inf,
1.5, 1, "aps.04.10"],
[aps04_f, aps04_fp, aps04_fpp, (10, 1), [-0.95, 4.05], np.inf,
1.5, 1, "aps.04.11"],
[aps04_f, aps04_fp, aps04_fpp, (12, 1), [-0.95, 4.05], np.inf,
1.5, 1, "aps.04.12"],
[aps04_f, aps04_fp, aps04_fpp, (14, 1), [-0.95, 4.05], np.inf,
1.5, 1, "aps.04.13"],
[aps05_f, aps05_fp, aps05_fpp, (), [0, 1.5], np.inf,
1.3, np.pi / 6, "aps.05.00"],
[aps06_f, aps06_fp, aps06_fpp, (1,), [0, 1], np.inf,
0.5, 4.22477709641236709e-01, "aps.06.00"],
[aps06_f, aps06_fp, aps06_fpp, (2,), [0, 1], np.inf,
0.5, 3.06699410483203705e-01, "aps.06.01"],
[aps06_f, aps06_fp, aps06_fpp, (3,), [0, 1], np.inf,
0.5, 2.23705457654662959e-01, "aps.06.02"],
[aps06_f, aps06_fp, aps06_fpp, (4,), [0, 1], np.inf,
0.5, 1.71719147519508369e-01, "aps.06.03"],
[aps06_f, aps06_fp, aps06_fpp, (5,), [0, 1], np.inf,
0.4, 1.38257155056824066e-01, "aps.06.04"],
[aps06_f, aps06_fp, aps06_fpp, (20,), [0, 1], np.inf,
0.1, 3.46573590208538521e-02, "aps.06.05"],
[aps06_f, aps06_fp, aps06_fpp, (40,), [0, 1], np.inf,
5e-02, 1.73286795139986315e-02, "aps.06.06"],
[aps06_f, aps06_fp, aps06_fpp, (60,), [0, 1], np.inf,
1.0 / 30, 1.15524530093324210e-02, "aps.06.07"],
[aps06_f, aps06_fp, aps06_fpp, (80,), [0, 1], np.inf,
2.5e-02, 8.66433975699931573e-03, "aps.06.08"],
[aps06_f, aps06_fp, aps06_fpp, (100,), [0, 1], np.inf,
2e-02, 6.93147180559945415e-03, "aps.06.09"],
[aps07_f, aps07_fp, aps07_fpp, (5,), [0, 1], np.inf,
0.4, 3.84025518406218985e-02, "aps.07.00"],
[aps07_f, aps07_fp, aps07_fpp, (10,), [0, 1], np.inf,
0.4, 9.90000999800049949e-03, "aps.07.01"],
[aps07_f, aps07_fp, aps07_fpp, (20,), [0, 1], np.inf,
0.4, 2.49375003906201174e-03, "aps.07.02"],
[aps08_f, aps08_fp, aps08_fpp, (2,), [0, 1], np.inf,
0.9, 0.5, "aps.08.00"],
[aps08_f, aps08_fp, aps08_fpp, (5,), [0, 1], np.inf,
0.9, 3.45954815848242059e-01, "aps.08.01"],
[aps08_f, aps08_fp, aps08_fpp, (10,), [0, 1], np.inf,
0.9, 2.45122333753307220e-01, "aps.08.02"],
[aps08_f, aps08_fp, aps08_fpp, (15,), [0, 1], np.inf,
0.9, 1.95547623536565629e-01, "aps.08.03"],
[aps08_f, aps08_fp, aps08_fpp, (20,), [0, 1], np.inf,
0.9, 1.64920957276440960e-01, "aps.08.04"],
[aps09_f, aps09_fp, aps09_fpp, (1,), [0, 1], np.inf,
0.5, 2.75508040999484394e-01, "aps.09.00"],
[aps09_f, aps09_fp, aps09_fpp, (2,), [0, 1], np.inf,
0.5, 1.37754020499742197e-01, "aps.09.01"],
[aps09_f, aps09_fp, aps09_fpp, (4,), [0, 1], np.inf,
0.5, 1.03052837781564422e-02, "aps.09.02"],
[aps09_f, aps09_fp, aps09_fpp, (5,), [0, 1], np.inf,
0.5, 3.61710817890406339e-03, "aps.09.03"],
[aps09_f, aps09_fp, aps09_fpp, (8,), [0, 1], np.inf,
0.5, 4.10872918496395375e-04, "aps.09.04"],
[aps09_f, aps09_fp, aps09_fpp, (15,), [0, 1], np.inf,
0.5, 2.59895758929076292e-05, "aps.09.05"],
[aps09_f, aps09_fp, aps09_fpp, (20,), [0, 1], np.inf,
0.5, 7.66859512218533719e-06, "aps.09.06"],
[aps10_f, aps10_fp, aps10_fpp, (1,), [0, 1], np.inf,
0.9, 4.01058137541547011e-01, "aps.10.00"],
[aps10_f, aps10_fp, aps10_fpp, (5,), [0, 1], np.inf,
0.9, 5.16153518757933583e-01, "aps.10.01"],
[aps10_f, aps10_fp, aps10_fpp, (10,), [0, 1], np.inf,
0.9, 5.39522226908415781e-01, "aps.10.02"],
[aps10_f, aps10_fp, aps10_fpp, (15,), [0, 1], np.inf,
0.9, 5.48182294340655241e-01, "aps.10.03"],
[aps10_f, aps10_fp, aps10_fpp, (20,), [0, 1], np.inf,
0.9, 5.52704666678487833e-01, "aps.10.04"],
[aps11_f, aps11_fp, aps11_fpp, (2,), [0.01, 1], np.inf,
1e-02, 1.0 / 2, "aps.11.00"],
[aps11_f, aps11_fp, aps11_fpp, (5,), [0.01, 1], np.inf,
1e-02, 1.0 / 5, "aps.11.01"],
[aps11_f, aps11_fp, aps11_fpp, (15,), [0.01, 1], np.inf,
1e-02, 1.0 / 15, "aps.11.02"],
[aps11_f, aps11_fp, aps11_fpp, (20,), [0.01, 1], np.inf,
1e-02, 1.0 / 20, "aps.11.03"],
[aps12_f, aps12_fp, aps12_fpp, (2,), [1, 100], np.inf,
1.1, 2, "aps.12.00"],
[aps12_f, aps12_fp, aps12_fpp, (3,), [1, 100], np.inf,
1.1, 3, "aps.12.01"],
[aps12_f, aps12_fp, aps12_fpp, (4,), [1, 100], np.inf,
1.1, 4, "aps.12.02"],
[aps12_f, aps12_fp, aps12_fpp, (5,), [1, 100], np.inf,
1.1, 5, "aps.12.03"],
[aps12_f, aps12_fp, aps12_fpp, (6,), [1, 100], np.inf,
1.1, 6, "aps.12.04"],
[aps12_f, aps12_fp, aps12_fpp, (7,), [1, 100], np.inf,
1.1, 7, "aps.12.05"],
[aps12_f, aps12_fp, aps12_fpp, (9,), [1, 100], np.inf,
1.1, 9, "aps.12.06"],
[aps12_f, aps12_fp, aps12_fpp, (11,), [1, 100], np.inf,
1.1, 11, "aps.12.07"],
[aps12_f, aps12_fp, aps12_fpp, (13,), [1, 100], np.inf,
1.1, 13, "aps.12.08"],
[aps12_f, aps12_fp, aps12_fpp, (15,), [1, 100], np.inf,
1.1, 15, "aps.12.09"],
[aps12_f, aps12_fp, aps12_fpp, (17,), [1, 100], np.inf,
1.1, 17, "aps.12.10"],
[aps12_f, aps12_fp, aps12_fpp, (19,), [1, 100], np.inf,
1.1, 19, "aps.12.11"],
[aps12_f, aps12_fp, aps12_fpp, (21,), [1, 100], np.inf,
1.1, 21, "aps.12.12"],
[aps12_f, aps12_fp, aps12_fpp, (23,), [1, 100], np.inf,
1.1, 23, "aps.12.13"],
[aps12_f, aps12_fp, aps12_fpp, (25,), [1, 100], np.inf,
1.1, 25, "aps.12.14"],
[aps12_f, aps12_fp, aps12_fpp, (27,), [1, 100], np.inf,
1.1, 27, "aps.12.15"],
[aps12_f, aps12_fp, aps12_fpp, (29,), [1, 100], np.inf,
1.1, 29, "aps.12.16"],
[aps12_f, aps12_fp, aps12_fpp, (31,), [1, 100], np.inf,
1.1, 31, "aps.12.17"],
[aps12_f, aps12_fp, aps12_fpp, (33,), [1, 100], np.inf,
1.1, 33, "aps.12.18"],
[aps13_f, aps13_fp, aps13_fpp, (), [-1, 4], np.inf,
1.5, 0, "aps.13.00"],
[aps14_f, aps14_fp, aps14_fpp, (1,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.00"],
[aps14_f, aps14_fp, aps14_fpp, (2,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.01"],
[aps14_f, aps14_fp, aps14_fpp, (3,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.02"],
[aps14_f, aps14_fp, aps14_fpp, (4,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.03"],
[aps14_f, aps14_fp, aps14_fpp, (5,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.04"],
[aps14_f, aps14_fp, aps14_fpp, (6,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.05"],
[aps14_f, aps14_fp, aps14_fpp, (7,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.06"],
[aps14_f, aps14_fp, aps14_fpp, (8,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.07"],
[aps14_f, aps14_fp, aps14_fpp, (9,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.08"],
[aps14_f, aps14_fp, aps14_fpp, (10,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.09"],
[aps14_f, aps14_fp, aps14_fpp, (11,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.10"],
[aps14_f, aps14_fp, aps14_fpp, (12,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.11"],
[aps14_f, aps14_fp, aps14_fpp, (13,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.12"],
[aps14_f, aps14_fp, aps14_fpp, (14,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.13"],
[aps14_f, aps14_fp, aps14_fpp, (15,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.14"],
[aps14_f, aps14_fp, aps14_fpp, (16,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.15"],
[aps14_f, aps14_fp, aps14_fpp, (17,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.16"],
[aps14_f, aps14_fp, aps14_fpp, (18,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.17"],
[aps14_f, aps14_fp, aps14_fpp, (19,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.18"],
[aps14_f, aps14_fp, aps14_fpp, (20,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.19"],
[aps14_f, aps14_fp, aps14_fpp, (21,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.20"],
[aps14_f, aps14_fp, aps14_fpp, (22,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.21"],
[aps14_f, aps14_fp, aps14_fpp, (23,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.22"],
[aps14_f, aps14_fp, aps14_fpp, (24,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.23"],
[aps14_f, aps14_fp, aps14_fpp, (25,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.24"],
[aps14_f, aps14_fp, aps14_fpp, (26,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.25"],
[aps14_f, aps14_fp, aps14_fpp, (27,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.26"],
[aps14_f, aps14_fp, aps14_fpp, (28,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.27"],
[aps14_f, aps14_fp, aps14_fpp, (29,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.28"],
[aps14_f, aps14_fp, aps14_fpp, (30,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.29"],
[aps14_f, aps14_fp, aps14_fpp, (31,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.30"],
[aps14_f, aps14_fp, aps14_fpp, (32,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.31"],
[aps14_f, aps14_fp, aps14_fpp, (33,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.32"],
[aps14_f, aps14_fp, aps14_fpp, (34,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.33"],
[aps14_f, aps14_fp, aps14_fpp, (35,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.34"],
[aps14_f, aps14_fp, aps14_fpp, (36,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.35"],
[aps14_f, aps14_fp, aps14_fpp, (37,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.36"],
[aps14_f, aps14_fp, aps14_fpp, (38,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.37"],
[aps14_f, aps14_fp, aps14_fpp, (39,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.38"],
[aps14_f, aps14_fp, aps14_fpp, (40,), [-1000, np.pi / 2], 0,
1, 6.23806518961612433e-01, "aps.14.39"],
[aps15_f, aps15_fp, aps15_fpp, (20,), [-1000, 1e-4], 0,
-2, 5.90513055942197166e-05, "aps.15.00"],
[aps15_f, aps15_fp, aps15_fpp, (21,), [-1000, 1e-4], 0,
-2, 5.63671553399369967e-05, "aps.15.01"],
[aps15_f, aps15_fp, aps15_fpp, (22,), [-1000, 1e-4], 0,
-2, 5.39164094555919196e-05, "aps.15.02"],
[aps15_f, aps15_fp, aps15_fpp, (23,), [-1000, 1e-4], 0,
-2, 5.16698923949422470e-05, "aps.15.03"],
[aps15_f, aps15_fp, aps15_fpp, (24,), [-1000, 1e-4], 0,
-2, 4.96030966991445609e-05, "aps.15.04"],
[aps15_f, aps15_fp, aps15_fpp, (25,), [-1000, 1e-4], 0,
-2, 4.76952852876389951e-05, "aps.15.05"],
[aps15_f, aps15_fp, aps15_fpp, (26,), [-1000, 1e-4], 0,
-2, 4.59287932399486662e-05, "aps.15.06"],
[aps15_f, aps15_fp, aps15_fpp, (27,), [-1000, 1e-4], 0,
-2, 4.42884791956647841e-05, "aps.15.07"],
[aps15_f, aps15_fp, aps15_fpp, (28,), [-1000, 1e-4], 0,
-2, 4.27612902578832391e-05, "aps.15.08"],
[aps15_f, aps15_fp, aps15_fpp, (29,), [-1000, 1e-4], 0,
-2, 4.13359139159538030e-05, "aps.15.09"],
[aps15_f, aps15_fp, aps15_fpp, (30,), [-1000, 1e-4], 0,
-2, 4.00024973380198076e-05, "aps.15.10"],
[aps15_f, aps15_fp, aps15_fpp, (31,), [-1000, 1e-4], 0,
-2, 3.87524192962066869e-05, "aps.15.11"],
[aps15_f, aps15_fp, aps15_fpp, (32,), [-1000, 1e-4], 0,
-2, 3.75781035599579910e-05, "aps.15.12"],
[aps15_f, aps15_fp, aps15_fpp, (33,), [-1000, 1e-4], 0,
-2, 3.64728652199592355e-05, "aps.15.13"],
[aps15_f, aps15_fp, aps15_fpp, (34,), [-1000, 1e-4], 0,
-2, 3.54307833565318273e-05, "aps.15.14"],
[aps15_f, aps15_fp, aps15_fpp, (35,), [-1000, 1e-4], 0,
-2, 3.44465949299614980e-05, "aps.15.15"],
[aps15_f, aps15_fp, aps15_fpp, (36,), [-1000, 1e-4], 0,
-2, 3.35156058778003705e-05, "aps.15.16"],
[aps15_f, aps15_fp, aps15_fpp, (37,), [-1000, 1e-4], 0,
-2, 3.26336162494372125e-05, "aps.15.17"],
[aps15_f, aps15_fp, aps15_fpp, (38,), [-1000, 1e-4], 0,
-2, 3.17968568584260013e-05, "aps.15.18"],
[aps15_f, aps15_fp, aps15_fpp, (39,), [-1000, 1e-4], 0,
-2, 3.10019354369653455e-05, "aps.15.19"],
[aps15_f, aps15_fp, aps15_fpp, (40,), [-1000, 1e-4], 0,
-2, 3.02457906702100968e-05, "aps.15.20"],
[aps15_f, aps15_fp, aps15_fpp, (100,), [-1000, 1e-4], 0,
-2, 1.22779942324615231e-05, "aps.15.21"],
[aps15_f, aps15_fp, aps15_fpp, (200,), [-1000, 1e-4], 0,
-2, 6.16953939044086617e-06, "aps.15.22"],
[aps15_f, aps15_fp, aps15_fpp, (300,), [-1000, 1e-4], 0,
-2, 4.11985852982928163e-06, "aps.15.23"],
[aps15_f, aps15_fp, aps15_fpp, (400,), [-1000, 1e-4], 0,
-2, 3.09246238772721682e-06, "aps.15.24"],
[aps15_f, aps15_fp, aps15_fpp, (500,), [-1000, 1e-4], 0,
-2, 2.47520442610501789e-06, "aps.15.25"],
[aps15_f, aps15_fp, aps15_fpp, (600,), [-1000, 1e-4], 0,
-2, 2.06335676785127107e-06, "aps.15.26"],
[aps15_f, aps15_fp, aps15_fpp, (700,), [-1000, 1e-4], 0,
-2, 1.76901200781542651e-06, "aps.15.27"],
[aps15_f, aps15_fp, aps15_fpp, (800,), [-1000, 1e-4], 0,
-2, 1.54816156988591016e-06, "aps.15.28"],
[aps15_f, aps15_fp, aps15_fpp, (900,), [-1000, 1e-4], 0,
-2, 1.37633453660223511e-06, "aps.15.29"],
[aps15_f, aps15_fp, aps15_fpp, (1000,), [-1000, 1e-4], 0,
-2, 1.23883857889971403e-06, "aps.15.30"]
]
_APS_TESTS_DICTS = [dict(zip(_APS_TESTS_KEYS, testcase)) for testcase in _APS_TESTS]
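Because every APS case carries hand-written first and second derivatives, a quick finite-difference comparison is a cheap way to catch a slip in one of them. The sketch below restates the `aps06` family with `math.exp` so it runs without NumPy; `central_diff` and the chosen step size `h` are assumptions for illustration, not part of the module.

```python
import math


def f(x, n):
    """aps06-style function: 2*x*exp(-n) - 2*exp(-n*x) + 1."""
    return 2 * x * math.exp(-n) - 2 * math.exp(-n * x) + 1


def fp(x, n):
    """Analytic first derivative of f."""
    return 2 * math.exp(-n) + 2 * n * math.exp(-n * x)


def fpp(x, n):
    """Analytic second derivative of f."""
    return -2 * n * n * math.exp(-n * x)


def central_diff(g, x, h, *args):
    """Second-order central difference approximation of g'(x)."""
    return (g(x + h, *args) - g(x - h, *args)) / (2 * h)


x, n, h = 0.5, 3, 1e-5
err_fp = abs(central_diff(f, x, h, n) - fp(x, n))     # f' against differences of f
err_fpp = abs(central_diff(fp, x, h, n) - fpp(x, n))  # f'' against differences of f'

# The tabulated root for n = 1 should make f vanish to rounding error.
residual = abs(f(4.22477709641236709e-01, 1))
```

With this step size both errors should be far below `1e-6`; a derivative that is off by a factor or a sign would show up as an error of order one.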
# ##################
# "complex" test cases
# A few simple, complex-valued, functions, defined on the complex plane.
def cplx01_f(z, n, a):
    r"""z**n-a: Use to find the nth root of a"""
    return z**n - a


def cplx01_fp(z, n, a):
    return n * z**(n - 1)


def cplx01_fpp(z, n, a):
    return n * (n - 1) * z**(n - 2)


def cplx02_f(z, a):
    r"""e**z - a: Use to find the log of a"""
    return np.exp(z) - a


def cplx02_fp(z, a):
    return np.exp(z)


def cplx02_fpp(z, a):
    return np.exp(z)
# Each "complex" test case has
# - a function and its two derivatives,
# - additional arguments,
# - the order of differentiability of the function
# - two starting values x0 and x1
# - the root
# - an Identifier of the test case
#
# No bracketing interval is provided for these cases. Newton and Halley
# use the single starting point x0; methods that need two starting
# values, such as the secant method, use x0 and x1.
_COMPLEX_TESTS_KEYS = [
"f", "fprime", "fprime2", "args", "smoothness", "x0", "x1", "root", "ID"
]
_COMPLEX_TESTS = [
[cplx01_f, cplx01_fp, cplx01_fpp, (2, -1), np.inf,
(1 + 1j), (0.5 + 0.5j), 1j, "complex.01.00"],
[cplx01_f, cplx01_fp, cplx01_fpp, (3, 1), np.inf,
(-1 + 1j), (-0.5 + 2.0j), (-0.5 + np.sqrt(3) / 2 * 1.0j),
"complex.01.01"],
[cplx01_f, cplx01_fp, cplx01_fpp, (3, -1), np.inf,
1j, (0.5 + 0.5j), (0.5 + np.sqrt(3) / 2 * 1.0j),
"complex.01.02"],
[cplx01_f, cplx01_fp, cplx01_fpp, (3, 8), np.inf,
5, 4, 2, "complex.01.03"],
[cplx02_f, cplx02_fp, cplx02_fpp, (-1,), np.inf,
(1 + 2j), (0.5 + 0.5j), np.pi * 1.0j, "complex.02.00"],
[cplx02_f, cplx02_fp, cplx02_fpp, (1j,), np.inf,
(1 + 2j), (0.5 + 0.5j), np.pi * 0.5j, "complex.02.01"],
]
_COMPLEX_TESTS_DICTS = [
dict(zip(_COMPLEX_TESTS_KEYS, testcase)) for testcase in _COMPLEX_TESTS
]
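The complex cases only supply starting values, so they suit derivative-based iterations. A minimal sketch of a Newton iteration applied to the first case (`z**2 + 1`, starting at `1 + 1j`); `newton`, `power_f` and `power_fp` are illustrative stand-ins, not the SciPy solver.

```python
def newton(f, fprime, z0, args=(), tol=1e-14, maxiter=50):
    """Basic Newton iteration; works unchanged for complex z (illustrative)."""
    z = z0
    for _ in range(maxiter):
        step = f(z, *args) / fprime(z, *args)
        z = z - step
        if abs(step) < tol:
            break
    return z


def power_f(z, n, a):
    """z**n - a, as in cplx01_f."""
    return z**n - a


def power_fp(z, n, a):
    """Derivative of z**n - a with respect to z."""
    return n * z**(n - 1)


# Same data as "complex.01.00": n=2, a=-1, x0 = 1+1j, expected root 1j.
root = newton(power_f, power_fp, 1 + 1j, args=(2, -1))
```

Starting in the upper half-plane, the iteration for `z**2 + 1` settles on `1j` rather than `-1j`, which matches the root recorded for that case.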
def _add_a_b(tests):
    r"""Add "a" and "b" keys to each test from the "bracket" value"""
    for d in tests:
        for k, v in zip(['a', 'b'], d.get('bracket', [])):
            d[k] = v


_add_a_b(_ORIGINAL_TESTS_DICTS)
_add_a_b(_APS_TESTS_DICTS)
_add_a_b(_COMPLEX_TESTS_DICTS)
def get_tests(collection='original', smoothness=None):
    r"""Return the requested collection of test cases, as a list of dicts with subset-specific keys

    Allowed values of collection:
    'original': The original benchmarking functions.
         Real-valued functions of real-valued inputs on an interval with a zero.
         f1, .., f3 are continuous and infinitely differentiable
         f4 has a single discontinuity at the root
         f5 has a root at 1 replacing a 1st order pole
         f6 is randomly positive on one side of the root, randomly negative on the other
    'aps': The test problems in the TOMS "Algorithm 748: Enclosing Zeros of Continuous Functions"
         paper by Alefeld, Potra and Shi. Real-valued functions of
         real-valued inputs on an interval with a zero.
         Suitable for methods which start with an enclosing interval, and
         derivatives up to 2nd order.
    'complex': Some complex-valued functions of complex-valued inputs.
         No enclosing bracket is provided.
         Suitable for methods which use one or more starting values, and
         derivatives up to 2nd order.
    'chandrupatla': The test problems in Chandrupatla's 1997 paper.
         Real-valued functions of real-valued inputs with an enclosing bracket.
         No derivatives are provided.

    The dictionary keys will be a subset of
    ["f", "fprime", "fprime2", "args", "bracket", "a", "b", "smoothness", "x0", "x1", "root", "nfeval", "ID"]
    """  # noqa: E501
    collection = collection or "original"
    subsets = {"aps": _APS_TESTS_DICTS,
               "complex": _COMPLEX_TESTS_DICTS,
               "original": _ORIGINAL_TESTS_DICTS,
               "chandrupatla": _CHANDRUPATLA_TESTS_DICTS}
    tests = subsets.get(collection, [])
    if smoothness is not None:
        tests = [tc for tc in tests if tc['smoothness'] >= smoothness]
    return tests


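The `smoothness` argument acts as a lower bound: only cases whose recorded order of differentiability is at least the requested value are kept. A small sketch of that filter on made-up data; `cases` and `filter_by_smoothness` are hypothetical. Unlike `get_tests`, it treats a missing `"smoothness"` key as minus infinity, an assumption made here so that entries without the key are skipped rather than raising `KeyError`.

```python
import math

# Made-up cases; only the fields needed for filtering are included.
cases = [
    {"ID": "smooth", "smoothness": math.inf},   # infinitely differentiable
    {"ID": "continuous", "smoothness": 0},      # continuous only
    {"ID": "jump", "smoothness": -1},           # discontinuous at the root
    {"ID": "no-key"},                           # no smoothness recorded
]


def filter_by_smoothness(tests, smoothness=None):
    """Keep cases at least as smooth as requested (hypothetical helper)."""
    if smoothness is None:
        return list(tests)
    return [tc for tc in tests
            if tc.get("smoothness", -math.inf) >= smoothness]


ids = [tc["ID"] for tc in filter_by_smoothness(cases, smoothness=0)]
print(ids)  # ['smooth', 'continuous']
```

A solver that needs second derivatives would ask for `smoothness=2`, which in this toy data leaves only the `"smooth"` entry.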
# Backwards compatibility
methods = [cc.bisect, cc.ridder, cc.brenth, cc.brentq]
mstrings = ['cc.bisect', 'cc.ridder', 'cc.brenth', 'cc.brentq']
functions = [f2, f3, f4, f5, f6]
fstrings = ['f2', 'f3', 'f4', 'f5', 'f6']
# ##################
# "Chandrupatla" test cases
# Functions and test cases that appear in [2]
def fun1(x):
    return x**3 - 2*x - 5


fun1.root = 2.0945514815423265  # additional precision using mpmath.findroot


def fun2(x):
    return 1 - 1/x**2


fun2.root = 1


def fun3(x):
    return (x-3)**3


fun3.root = 3


def fun4(x):
    return 6*(x-2)**5


fun4.root = 2


def fun5(x):
    return x**9


fun5.root = 0


def fun6(x):
    return x**19


fun6.root = 0


def fun7(x):
    return 0 if abs(x) < 3.8e-4 else x*np.exp(-x**(-2))


fun7.root = 0


def fun8(x):
    xi = 0.61489
    return -(3062*(1-xi)*np.exp(-x))/(xi + (1-xi)*np.exp(-x)) - 1013 + 1628/x


fun8.root = 1.0375360332870405


def fun9(x):
    return np.exp(x) - 2 - 0.01/x**2 + .000002/x**3


fun9.root = 0.7032048403631358


# Each "chandrupatla" test case has
# - a function,
# - a bracket enclosing the root,
# - the root
# - the number of function evaluations required by Chandrupatla's algorithm
# - an Identifier of the test case
#
# Chandrupatla's is a bracketing algorithm, so a bracketing interval was
# provided in [2] for each test case. No special support for testing with
# secant/Newton/Halley is provided.
_CHANDRUPATLA_TESTS_KEYS = ["f", "bracket", "root", "nfeval", "ID"]
_CHANDRUPATLA_TESTS = [
[fun1, [2, 3], fun1.root, 7],
[fun1, [1, 10], fun1.root, 11],
[fun1, [1, 100], fun1.root, 14],
[fun1, [-1e4, 1e4], fun1.root, 23],
[fun1, [-1e10, 1e10], fun1.root, 43],
[fun2, [0.5, 1.51], fun2.root, 8],
[fun2, [1e-4, 1e4], fun2.root, 22],
[fun2, [1e-6, 1e6], fun2.root, 28],
[fun2, [1e-10, 1e10], fun2.root, 41],
[fun2, [1e-12, 1e12], fun2.root, 48],
[fun3, [0, 5], fun3.root, 21],
[fun3, [-10, 10], fun3.root, 23],
[fun3, [-1e4, 1e4], fun3.root, 36],
[fun3, [-1e6, 1e6], fun3.root, 45],
[fun3, [-1e10, 1e10], fun3.root, 55],
[fun4, [0, 5], fun4.root, 21],
[fun4, [-10, 10], fun4.root, 23],
[fun4, [-1e4, 1e4], fun4.root, 33],
[fun4, [-1e6, 1e6], fun4.root, 43],
[fun4, [-1e10, 1e10], fun4.root, 54],
[fun5, [-1, 4], fun5.root, 21],
[fun5, [-2, 5], fun5.root, 22],
[fun5, [-1, 10], fun5.root, 23],
[fun5, [-5, 50], fun5.root, 25],
[fun5, [-10, 100], fun5.root, 26],
[fun6, [-1., 4.], fun6.root, 21],
[fun6, [-2., 5.], fun6.root, 22],
[fun6, [-1., 10.], fun6.root, 23],
[fun6, [-5., 50.], fun6.root, 25],
[fun6, [-10., 100.], fun6.root, 26],
[fun7, [-1, 4], fun7.root, 8],
[fun7, [-2, 5], fun7.root, 8],
[fun7, [-1, 10], fun7.root, 11],
[fun7, [-5, 50], fun7.root, 18],
[fun7, [-10, 100], fun7.root, 19],
[fun8, [2e-4, 2], fun8.root, 9],
[fun8, [2e-4, 3], fun8.root, 10],
[fun8, [2e-4, 9], fun8.root, 11],
[fun8, [2e-4, 27], fun8.root, 12],
[fun8, [2e-4, 81], fun8.root, 14],
[fun9, [2e-4, 1], fun9.root, 7],
[fun9, [2e-4, 3], fun9.root, 8],
[fun9, [2e-4, 9], fun9.root, 10],
[fun9, [2e-4, 27], fun9.root, 11],
[fun9, [2e-4, 81], fun9.root, 13],
]
_CHANDRUPATLA_TESTS = [test + [f'{test[0].__name__}.{i%5+1}']
for i, test in enumerate(_CHANDRUPATLA_TESTS)]
_CHANDRUPATLA_TESTS_DICTS = [dict(zip(_CHANDRUPATLA_TESTS_KEYS, testcase))
for testcase in _CHANDRUPATLA_TESTS]
_add_a_b(_CHANDRUPATLA_TESTS_DICTS)
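The last three statements append an ID to each row, zip rows into dicts, and copy the bracket endpoints into `"a"` and `"b"`. A compact sketch of the same steps on three made-up rows; `cubic` and the shortened key list are illustrative only.

```python
def cubic(x):
    """Same polynomial as fun1 above: x**3 - 2*x - 5."""
    return x**3 - 2 * x - 5


# Rows hold a function and a bracket; the ID is appended afterwards.
rows = [[cubic, [2, 3]], [cubic, [1, 10]], [cubic, [1, 100]]]

# ID = function name plus a 1-based index that cycles every five rows.
rows = [row + [f"{row[0].__name__}.{i % 5 + 1}"] for i, row in enumerate(rows)]
dicts = [dict(zip(["f", "bracket", "ID"], row)) for row in rows]

# Mirror of _add_a_b: expose the bracket endpoints as separate keys.
for d in dicts:
    for k, v in zip(["a", "b"], d.get("bracket", [])):
        d[k] = v

print([d["ID"] for d in dicts])  # ['cubic.1', 'cubic.2', 'cubic.3']
```

The `i % 5 + 1` suffix relies on each function contributing exactly five consecutive rows to the table, as is the case for `fun1` through `fun9`.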