feat: initial commit - Phase 1 & 2 core features

2026-04-22 17:07:33 +08:00
commit 1773bda06b
25005 changed files with 6252106 additions and 0 deletions
@@ -0,0 +1,76 @@
+From the website for the L-BFGS-B code (at
+http://www.ece.northwestern.edu/~nocedal/lbfgsb.html):
+
+"""
+L-BFGS-B is a limited-memory quasi-Newton code for bound-constrained
+optimization, i.e. for problems where the only constraints are of the
+form l<= x <= u.
+"""
+
+This is a Python wrapper (using F2PY) written by David M. Cooke
+<cookedm@physics.mcmaster.ca> and released as version 0.9 on April 9, 2004.
+The wrapper was slightly modified by Joonas Paalasmaa for the 3.0 version
+in March 2012.
+
+License of L-BFGS-B (Fortran code)
+==================================
+
+The version included here (in lbfgsb.f) is 3.0 (released April 25, 2011). It was
+written by Ciyou Zhu, Richard Byrd, and Jorge Nocedal <nocedal@ece.nwu.edu>. It
+carries the following condition for use:
+
+  """
+  This software is freely available, but we expect that all publications
+  describing work using this software, or all commercial products using it,
+  quote at least one of the references given below. This software is released
+  under the BSD License.
+  
+  References
+    * R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound
+      Constrained Optimization, (1995), SIAM Journal on Scientific and
+      Statistical Computing, 16, 5, pp. 1190-1208.
+    * C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B,
+      FORTRAN routines for large scale bound constrained optimization (1997),
+      ACM Transactions on Mathematical Software, 23, 4, pp. 550 - 560.
+    * J.L. Morales and J. Nocedal. L-BFGS-B: Remark on Algorithm 778: L-BFGS-B,
+      FORTRAN routines for large scale bound constrained optimization (2011),
+      ACM Transactions on Mathematical Software, 38, 1.
+  """
+
+The Python wrapper
+==================
+
+This code uses F2PY (http://cens.ioc.ee/projects/f2py2e/) to generate
+the wrapper around the Fortran code.
+
+The Python code and wrapper are copyrighted 2004 by David M. Cooke
+<cookedm@physics.mcmaster.ca>.
+
+Example usage
+=============
+
+An example of the usage is given at the bottom of the lbfgsb.py file.
+Run it with 'python lbfgsb.py'.
+
+License for the Python wrapper
+==============================
+
+Copyright (c) 2004 David M. Cooke <cookedm@physics.mcmaster.ca>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
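The README above describes L-BFGS-B as a solver for problems whose only constraints are bounds of the form l <= x <= u. The following is a minimal sketch of such a problem. It assumes the solver is reached through the public `scipy.optimize.minimize` interface with `method="L-BFGS-B"`, not through the `lbfgsb.py` script the README mentions. The objective, gradient, and variable names are illustrative and are not part of the commit.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative target that lies outside the feasible box.
TARGET = np.array([2.0, -1.0])


def objective(x):
    """Squared distance from x to TARGET."""
    return float(np.sum((x - TARGET) ** 2))


def gradient(x):
    """Analytic gradient of the objective."""
    return 2.0 * (x - TARGET)


# Box constraints l <= x <= u, one (lower, upper) pair per variable.
bounds = [(0.0, 1.0), (0.0, 1.0)]

result = minimize(objective, x0=np.array([0.5, 0.5]), jac=gradient,
                  method="L-BFGS-B", bounds=bounds)
print(result.x, result.fun)
```

Because the unconstrained minimum lies outside the box, the solver should stop at the nearest point of the box, where both bounds are active.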
@@ -0,0 +1,451 @@
+"""
+=====================================================
+Optimization and root finding (:mod:`scipy.optimize`)
+=====================================================
+
+.. currentmodule:: scipy.optimize
+
+.. toctree::
+   :hidden:
+
+   optimize.cython_optimize
+
+SciPy ``optimize`` provides functions for minimizing (or maximizing)
+objective functions, possibly subject to constraints. It includes
+solvers for nonlinear problems (with support for both local and global
+optimization algorithms), linear programming, constrained
+and nonlinear least-squares, root finding, and curve fitting.
+
+Common functions and objects, shared across different solvers, are:
+
+.. autosummary::
+   :toctree: generated/
+
+   show_options - Show specific options for optimization solvers.
+   OptimizeResult - The optimization result returned by some optimizers.
+   OptimizeWarning - The optimization encountered problems.
+
+
+Optimization
+============
+
+Scalar functions optimization
+-----------------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   minimize_scalar - Interface for minimizers of univariate functions
+
+The `minimize_scalar` function supports the following methods:
+
+.. toctree::
+
+   optimize.minimize_scalar-brent
+   optimize.minimize_scalar-bounded
+   optimize.minimize_scalar-golden
+
+Local (multivariate) optimization
+---------------------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   minimize - Interface for minimizers of multivariate functions.
+
+The `minimize` function supports the following methods:
+
+.. toctree::
+
+   optimize.minimize-neldermead
+   optimize.minimize-powell
+   optimize.minimize-cg
+   optimize.minimize-bfgs
+   optimize.minimize-newtoncg
+   optimize.minimize-lbfgsb
+   optimize.minimize-tnc
+   optimize.minimize-cobyla
+   optimize.minimize-slsqp
+   optimize.minimize-trustconstr
+   optimize.minimize-dogleg
+   optimize.minimize-trustncg
+   optimize.minimize-trustkrylov
+   optimize.minimize-trustexact
+
+Constraints are passed to the `minimize` function as a single object or
+as a list of objects from the following classes:
+
+.. autosummary::
+   :toctree: generated/
+
+   NonlinearConstraint - Class defining general nonlinear constraints.
+   LinearConstraint - Class defining general linear constraints.
+
+Simple bound constraints are handled separately and there is a special class
+for them:
+
+.. autosummary::
+   :toctree: generated/
+
+   Bounds - Bound constraints.
+
+Quasi-Newton strategies implementing the `HessianUpdateStrategy`
+interface can be used to approximate the Hessian in the `minimize`
+function (available only for the 'trust-constr' method). Available
+quasi-Newton methods implementing this interface are:
+
+.. autosummary::
+   :toctree: generated/
+
+   BFGS - Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessian update strategy.
+   SR1 - Symmetric-rank-1 Hessian update strategy.
+
+.. _global_optimization:
+
+Global optimization
+-------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   basinhopping - Basinhopping stochastic optimizer.
+   brute - Brute force searching optimizer.
+   differential_evolution - Stochastic optimizer using differential evolution.
+
+   shgo - Simplicial homology global optimizer.
+   dual_annealing - Dual annealing stochastic optimizer.
+   direct - DIRECT (Dividing Rectangles) optimizer.
+
+Least-squares and curve fitting
+===============================
+
+Nonlinear least-squares
+-----------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   least_squares - Solve a nonlinear least-squares problem with bounds on the variables.
+
+Linear least-squares
+--------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   nnls - Linear least-squares problem with non-negativity constraint.
+   lsq_linear - Linear least-squares problem with bound constraints.
+   isotonic_regression - Least squares problem of isotonic regression via PAVA.
+
+Curve fitting
+-------------
+
+.. autosummary::
+   :toctree: generated/
+
+   curve_fit -- Fit curve to a set of points.
+
+Root finding
+============
+
+Scalar functions
+----------------
+.. autosummary::
+   :toctree: generated/
+
+   root_scalar - Unified interface for nonlinear solvers of scalar functions.
+   brentq - Quadratic interpolation Brent method.
+   brenth - Brent method, modified by Harris with hyperbolic extrapolation.
+   ridder - Ridder's method.
+   bisect - Bisection method.
+   newton - Newton's method (also Secant and Halley's methods).
+   toms748 - Alefeld, Potra & Shi Algorithm 748.
+   RootResults - The root finding result returned by some root finders.
+
+The `root_scalar` function supports the following methods:
+
+.. toctree::
+
+   optimize.root_scalar-brentq
+   optimize.root_scalar-brenth
+   optimize.root_scalar-bisect
+   optimize.root_scalar-ridder
+   optimize.root_scalar-newton
+   optimize.root_scalar-toms748
+   optimize.root_scalar-secant
+   optimize.root_scalar-halley
+
+
+
+The table below lists situations and appropriate methods, along with
+*asymptotic* convergence rates per iteration (and per function evaluation)
+for successful convergence to a simple root(*).
+Bisection is the slowest of them all, adding one bit of accuracy for each
+function evaluation, but is guaranteed to converge.
+The other bracketing methods all (eventually) increase the number of accurate
+bits by about 50% for every function evaluation.
+The derivative-based methods, all built on `newton`, can converge quite quickly
+if the initial value is close to the root.  They can also be applied to
+functions defined on (a subset of) the complex plane.
+
+-------------+----------+----------+-----------+-------------+-------------+----------------+
+| Domain of f | Bracket? |    Derivatives?      | Solvers     |        Convergence           |
+             +          +----------+-----------+             +-------------+----------------+
+|             |          | `fprime` | `fprime2` |             | Guaranteed? |  Rate(s)(*)    |
+=============+==========+==========+===========+=============+=============+================+
+| `R`         | Yes      | N/A      | N/A       | - bisection | - Yes       | - 1 "Linear"   |
+|             |          |          |           | - brentq    | - Yes       | - >=1, <= 1.62 |
+|             |          |          |           | - brenth    | - Yes       | - >=1, <= 1.62 |
+|             |          |          |           | - ridder    | - Yes       | - 2.0 (1.41)   |
+|             |          |          |           | - toms748   | - Yes       | - 2.7 (1.65)   |
+-------------+----------+----------+-----------+-------------+-------------+----------------+
+| `R` or `C`  | No       | No       | No        | secant      | No          | 1.62 (1.62)    |
+-------------+----------+----------+-----------+-------------+-------------+----------------+
+| `R` or `C`  | No       | Yes      | No        | newton      | No          | 2.00 (1.41)    |
+-------------+----------+----------+-----------+-------------+-------------+----------------+
+| `R` or `C`  | No       | Yes      | Yes       | halley      | No          | 3.00 (1.44)    |
+-------------+----------+----------+-----------+-------------+-------------+----------------+
+
+.. seealso::
+
+   `scipy.optimize.cython_optimize` -- Typed Cython versions of root finding functions
+
+Fixed point finding:
+
+.. autosummary::
+   :toctree: generated/
+
+   fixed_point - Single-variable fixed-point solver.
+
+Multidimensional
+----------------
+
+.. autosummary::
+   :toctree: generated/
+
+   root - Unified interface for nonlinear solvers of multivariate functions.
+
+The `root` function supports the following methods:
+
+.. toctree::
+
+   optimize.root-hybr
+   optimize.root-lm
+   optimize.root-broyden1
+   optimize.root-broyden2
+   optimize.root-anderson
+   optimize.root-linearmixing
+   optimize.root-diagbroyden
+   optimize.root-excitingmixing
+   optimize.root-krylov
+   optimize.root-dfsane
+
+Linear programming / MILP
+=========================
+
+.. autosummary::
+   :toctree: generated/
+
+   milp -- Mixed integer linear programming.
+   linprog -- Unified interface for minimizers of linear programming problems.
+
+The `linprog` function supports the following methods:
+
+.. toctree::
+
+   optimize.linprog-simplex
+   optimize.linprog-interior-point
+   optimize.linprog-revised_simplex
+   optimize.linprog-highs-ipm
+   optimize.linprog-highs-ds
+   optimize.linprog-highs
+
+The simplex, interior-point, and revised simplex methods support callback
+functions, such as:
+
+.. autosummary::
+   :toctree: generated/
+
+   linprog_verbose_callback -- Sample callback function for linprog (simplex).
+
+Assignment problems
+===================
+
+.. autosummary::
+   :toctree: generated/
+
+   linear_sum_assignment -- Solves the linear-sum assignment problem.
+   quadratic_assignment -- Solves the quadratic assignment problem.
+
+The `quadratic_assignment` function supports the following methods:
+
+.. toctree::
+
+   optimize.qap-faq
+   optimize.qap-2opt
+
+Utilities
+=========
+
+Finite-difference approximation
+-------------------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   approx_fprime - Approximate the gradient of a scalar function.
+   check_grad - Check the supplied derivative using finite differences.
+
+
+Line search
+-----------
+
+.. autosummary::
+   :toctree: generated/
+
+   bracket - Bracket a minimum, given two starting points.
+   line_search - Return a step that satisfies the strong Wolfe conditions.
+
+Hessian approximation
+---------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   LbfgsInvHessProduct - Linear operator for L-BFGS approximate inverse Hessian.
+   HessianUpdateStrategy - Interface for implementing Hessian update strategies.
+
+Benchmark problems
+------------------
+
+.. autosummary::
+   :toctree: generated/
+
+   rosen - The Rosenbrock function.
+   rosen_der - The derivative of the Rosenbrock function.
+   rosen_hess - The Hessian matrix of the Rosenbrock function.
+   rosen_hess_prod - Product of the Rosenbrock Hessian with a vector.
+
+Legacy functions
+================
+
+The functions below are not recommended for use in new scripts;
+all of these methods are accessible via the newer, more consistent
+interfaces listed above.
+
+Optimization
+------------
+
+General-purpose multivariate methods:
+
+.. autosummary::
+   :toctree: generated/
+
+   fmin - Nelder-Mead Simplex algorithm.
+   fmin_powell - Powell's (modified) conjugate direction method.
+   fmin_cg - Non-linear (Polak-Ribiere) conjugate gradient algorithm.
+   fmin_bfgs - Quasi-Newton method (Broyden-Fletcher-Goldfarb-Shanno).
+   fmin_ncg - Line-search Newton Conjugate Gradient.
+
+Constrained multivariate methods:
+
+.. autosummary::
+   :toctree: generated/
+
+   fmin_l_bfgs_b - Zhu, Byrd, and Nocedal's constrained optimizer.
+   fmin_tnc - Truncated Newton code.
+   fmin_cobyla - Constrained optimization by linear approximation.
+   fmin_slsqp - Minimization using sequential least-squares programming.
+
+Univariate (scalar) minimization methods:
+
+.. autosummary::
+   :toctree: generated/
+
+   fminbound - Bounded minimization of a scalar function.
+   brent - 1-D function minimization using Brent method.
+   golden - 1-D function minimization using Golden Section method.
+
+Least-squares
+-------------
+
+.. autosummary::
+   :toctree: generated/
+
+   leastsq - Minimize the sum of squares of M equations in N unknowns.
+
+Root finding
+------------
+
+General nonlinear solvers:
+
+.. autosummary::
+   :toctree: generated/
+
+   fsolve - Non-linear multivariable equation solver.
+   broyden1 - Broyden's first method.
+   broyden2 - Broyden's second method.
+   NoConvergence -  Exception raised when nonlinear solver does not converge.
+
+Large-scale nonlinear solvers:
+
+.. autosummary::
+   :toctree: generated/
+
+   newton_krylov
+   anderson
+
+   BroydenFirst
+   InverseJacobian
+   KrylovJacobian
+
+Simple iteration solvers:
+
+.. autosummary::
+   :toctree: generated/
+
+   excitingmixing
+   linearmixing
+   diagbroyden
+
+"""  # noqa: E501
+
+from ._optimize import *
+from ._minimize import *
+from ._root import *
+from ._root_scalar import *
+from ._minpack_py import *
+from ._zeros_py import *
+from ._lbfgsb_py import fmin_l_bfgs_b, LbfgsInvHessProduct
+from ._tnc import fmin_tnc
+from ._cobyla_py import fmin_cobyla
+from ._nonlin import *
+from ._slsqp_py import fmin_slsqp
+from ._nnls import nnls
+from ._basinhopping import basinhopping
+from ._linprog import linprog, linprog_verbose_callback
+from ._lsap import linear_sum_assignment
+from ._differentialevolution import differential_evolution
+from ._lsq import least_squares, lsq_linear
+from ._isotonic import isotonic_regression
+from ._constraints import (NonlinearConstraint,
+                           LinearConstraint,
+                           Bounds)
+from ._hessian_update_strategy import HessianUpdateStrategy, BFGS, SR1
+from ._shgo import shgo
+from ._dual_annealing import dual_annealing
+from ._qap import quadratic_assignment
+from ._direct_py import direct
+from ._milp import milp
+
+# Deprecated namespaces, to be removed in v2.0.0
+from . import (
+    cobyla, lbfgsb, linesearch, minpack, minpack2, moduleTNC, nonlin, optimize,
+    slsqp, tnc, zeros
+)
+
+__all__ = [s for s in dir() if not s.startswith('_')]
+
+from scipy._lib._testutils import PytestTester
+test = PytestTester(__name__)
+del PytestTester
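The module docstring above lists the unified interfaces `root_scalar` and `minimize` together with the `rosen` benchmark helpers. The sketch below shows how they might be combined: a bracketing root solve with the `brentq` method, then a BFGS minimization of the Rosenbrock function. The cubic `f`, the bracket, and the starting point are illustrative choices and are not taken from the commit.

```python
from scipy.optimize import minimize, root_scalar, rosen, rosen_der


def f(x):
    """Illustrative cubic with a single real root between 2 and 3."""
    return x**3 - 2.0 * x - 5.0


# Bracketing solver: f(2) < 0 and f(3) > 0, so the bracket contains a root.
sol = root_scalar(f, bracket=[2.0, 3.0], method="brentq")
print(sol.root, sol.converged)

# Local multivariate minimization of the Rosenbrock benchmark function,
# using the analytic gradient exported by the same module.
res = minimize(rosen, x0=[1.3, 0.7, 0.8, 1.9, 1.2], jac=rosen_der,
               method="BFGS")
print(res.x)
```

The Rosenbrock function has its global minimum at the all-ones vector, so `res.x` should be close to 1 in every coordinate.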
@@ -0,0 +1,753 @@
+"""
+basinhopping: The basinhopping global optimization algorithm
+"""
+import numpy as np
+import math
+import inspect
+import scipy.optimize
+from scipy._lib._util import check_random_state
+
+__all__ = ['basinhopping']
+
+
+_params = (inspect.Parameter('res_new', kind=inspect.Parameter.KEYWORD_ONLY),
+           inspect.Parameter('res_old', kind=inspect.Parameter.KEYWORD_ONLY))
+_new_accept_test_signature = inspect.Signature(parameters=_params)
+
+
+class Storage:
+    """
+    Class used to store the lowest energy structure
+    """
+    def __init__(self, minres):
+        self._add(minres)
+
+    def _add(self, minres):
+        self.minres = minres
+        self.minres.x = np.copy(minres.x)
+
+    def update(self, minres):
+        if minres.success and (minres.fun < self.minres.fun
+                               or not self.minres.success):
+            self._add(minres)
+            return True
+        else:
+            return False
+
+    def get_lowest(self):
+        return self.minres
+
+
+class BasinHoppingRunner:
+    """This class implements the core of the basinhopping algorithm.
+
+    x0 : ndarray
+        The starting coordinates.
+    minimizer : callable
+        The local minimizer, with signature ``result = minimizer(x)``.
+        The return value is an `optimize.OptimizeResult` object.
+    step_taking : callable
+        This function displaces the coordinates randomly. Signature should
+        be ``x_new = step_taking(x)``. Note that `x` may be modified in-place.
+    accept_tests : list of callables
+        A test whose signature is exactly ``(*, res_new, res_old)`` is passed
+        the kwargs `res_new` and `res_old`; any other test is passed the
+        kwargs `f_new`, `x_new`, `f_old` and `x_old`. These tests will be
+        used to judge whether or not to accept the step. The acceptable
+        return values are True, False, or ``"force accept"``. If any of the
+        tests return False then the step is rejected. If a test returns
+        ``"force accept"``, then this will override any other tests in
+        order to accept the step. This can be used, for example, to forcefully
+        escape from a local minimum that ``basinhopping`` is trapped in.
+    disp : bool, optional
+        Display status messages.
+
+    """
+    def __init__(self, x0, minimizer, step_taking, accept_tests, disp=False):
+        self.x = np.copy(x0)
+        self.minimizer = minimizer
+        self.step_taking = step_taking
+        self.accept_tests = accept_tests
+        self.disp = disp
+
+        self.nstep = 0
+
+        # initialize return object
+        self.res = scipy.optimize.OptimizeResult()
+        self.res.minimization_failures = 0
+
+        # do initial minimization
+        minres = minimizer(self.x)
+        if not minres.success:
+            self.res.minimization_failures += 1
+            if self.disp:
+                print("warning: basinhopping: local minimization failure")
+        self.x = np.copy(minres.x)
+        self.energy = minres.fun
+        self.incumbent_minres = minres  # most recently accepted result
+        if self.disp:
+            print("basinhopping step %d: f %g" % (self.nstep, self.energy))
+
+        # initialize storage class
+        self.storage = Storage(minres)
+
+        if hasattr(minres, "nfev"):
+            self.res.nfev = minres.nfev
+        if hasattr(minres, "njev"):
+            self.res.njev = minres.njev
+        if hasattr(minres, "nhev"):
+            self.res.nhev = minres.nhev
+
+    def _monte_carlo_step(self):
+        """Do one Monte Carlo iteration
+
+        Randomly displace the coordinates, minimize, and decide whether
+        or not to accept the new coordinates.
+        """
+        # Take a random step.  Make a copy of x because the step_taking
+        # algorithm might change x in place
+        x_after_step = np.copy(self.x)
+        x_after_step = self.step_taking(x_after_step)
+
+        # do a local minimization
+        minres = self.minimizer(x_after_step)
+        x_after_quench = minres.x
+        energy_after_quench = minres.fun
+        if not minres.success:
+            self.res.minimization_failures += 1
+            if self.disp:
+                print("warning: basinhopping: local minimization failure")
+        if hasattr(minres, "nfev"):
+            self.res.nfev += minres.nfev
+        if hasattr(minres, "njev"):
+            self.res.njev += minres.njev
+        if hasattr(minres, "nhev"):
+            self.res.nhev += minres.nhev
+
+        # accept the move based on self.accept_tests. If any test is False,
+        # then reject the step.  If any test returns the special string
+        # 'force accept', then accept the step regardless. This can be used
+        # to forcefully escape from a local minimum if normal basin hopping
+        # steps are not sufficient.
+        accept = True
+        for test in self.accept_tests:
+            if inspect.signature(test) == _new_accept_test_signature:
+                testres = test(res_new=minres, res_old=self.incumbent_minres)
+            else:
+                testres = test(f_new=energy_after_quench, x_new=x_after_quench,
+                               f_old=self.energy, x_old=self.x)
+
+            if testres == 'force accept':
+                accept = True
+                break
+            elif testres is None:
+                raise ValueError("accept_tests must return True, False, or "
+                                 "'force accept'")
+            elif not testres:
+                accept = False
+
+        # Report the result of the acceptance test to the take step class.
+        # This is for adaptive step taking
+        if hasattr(self.step_taking, "report"):
+            self.step_taking.report(accept, f_new=energy_after_quench,
+                                    x_new=x_after_quench, f_old=self.energy,
+                                    x_old=self.x)
+
+        return accept, minres
+
+    def one_cycle(self):
+        """Do one cycle of the basinhopping algorithm
+        """
+        self.nstep += 1
+        new_global_min = False
+
+        accept, minres = self._monte_carlo_step()
+
+        if accept:
+            self.energy = minres.fun
+            self.x = np.copy(minres.x)
+            self.incumbent_minres = minres  # most recently accepted result
+            new_global_min = self.storage.update(minres)
+
+        # print some information
+        if self.disp:
+            self.print_report(minres.fun, accept)
+            if new_global_min:
+                print("found new global minimum on step %d with function"
+                      " value %g" % (self.nstep, self.energy))
+
+        # save some variables as BasinHoppingRunner attributes
+        self.xtrial = minres.x
+        self.energy_trial = minres.fun
+        self.accept = accept
+
+        return new_global_min
+
+    def print_report(self, energy_trial, accept):
+        """print a status update"""
+        minres = self.storage.get_lowest()
+        print("basinhopping step %d: f %g trial_f %g accepted %d "
+              " lowest_f %g" % (self.nstep, self.energy, energy_trial,
+                                accept, minres.fun))
+
+
+class AdaptiveStepsize:
+    """
+    Class to implement adaptive stepsize.
+
+    This class wraps the step taking class and modifies the stepsize to
+    ensure the true acceptance rate is as close as possible to the target.
+
+    Parameters
+    ----------
+    takestep : callable
+        The step taking routine.  Must contain modifiable attribute
+        takestep.stepsize
+    accept_rate : float, optional
+        The target step acceptance rate
+    interval : int, optional
+        Interval for how often to update the stepsize
+    factor : float, optional
+        The step size is multiplied or divided by this factor upon each
+        update.
+    verbose : bool, optional
+        Print information about each update
+
+    """
+    def __init__(self, takestep, accept_rate=0.5, interval=50, factor=0.9,
+                 verbose=True):
+        self.takestep = takestep
+        self.target_accept_rate = accept_rate
+        self.interval = interval
+        self.factor = factor
+        self.verbose = verbose
+
+        self.nstep = 0
+        self.nstep_tot = 0
+        self.naccept = 0
+
+    def __call__(self, x):
+        return self.take_step(x)
+
+    def _adjust_step_size(self):
+        old_stepsize = self.takestep.stepsize
+        accept_rate = float(self.naccept) / self.nstep
+        if accept_rate > self.target_accept_rate:
+            # We're accepting too many steps. This generally means we're
+            # trapped in a basin. Take bigger steps.
+            self.takestep.stepsize /= self.factor
+        else:
+            # We're not accepting enough steps. Take smaller steps.
+            self.takestep.stepsize *= self.factor
+        if self.verbose:
+            print("adaptive stepsize: acceptance rate {:f} target {:f} new "
+                  "stepsize {:g} old stepsize {:g}".format(accept_rate,
+                  self.target_accept_rate, self.takestep.stepsize,
+                  old_stepsize))
+
+    def take_step(self, x):
+        self.nstep += 1
+        self.nstep_tot += 1
+        if self.nstep % self.interval == 0:
+            self._adjust_step_size()
+        return self.takestep(x)
+
+    def report(self, accept, **kwargs):
+        "called by basinhopping to report the result of the step"
+        if accept:
+            self.naccept += 1
+
+
+class RandomDisplacement:
+    """Add a random displacement of maximum size `stepsize` to each coordinate.
+
+    Calling this updates `x` in-place.
+
+    Parameters
+    ----------
+    stepsize : float, optional
+        Maximum stepsize in any dimension
+    random_gen : {None, int, `numpy.random.Generator`,
+                  `numpy.random.RandomState`}, optional
+
+        If `random_gen` is None (or `np.random`), the
+        `numpy.random.RandomState` singleton is used.
+        If `random_gen` is an int, a new ``RandomState`` instance is used,
+        seeded with `random_gen`.
+        If `random_gen` is already a ``Generator`` or ``RandomState``
+        instance then that instance is used.
+
+    """
+
+    def __init__(self, stepsize=0.5, random_gen=None):
+        self.stepsize = stepsize
+        self.random_gen = check_random_state(random_gen)
+
+    def __call__(self, x):
+        x += self.random_gen.uniform(-self.stepsize, self.stepsize,
+                                     np.shape(x))
+        return x
+
+
+class MinimizerWrapper:
+    """
+    wrap a minimizer function as a minimizer class
+    """
+    def __init__(self, minimizer, func=None, **kwargs):
+        self.minimizer = minimizer
+        self.func = func
+        self.kwargs = kwargs
+
+    def __call__(self, x0):
+        if self.func is None:
+            return self.minimizer(x0, **self.kwargs)
+        else:
+            return self.minimizer(self.func, x0, **self.kwargs)
+
+
+class Metropolis:
+    """Metropolis acceptance criterion.
+
+    Parameters
+    ----------
+    T : float
+        The "temperature" parameter for the accept or reject criterion.
+    random_gen : {None, int, `numpy.random.Generator`,
+                  `numpy.random.RandomState`}, optional
+
+        Random number generator used for the acceptance test.
+        If `random_gen` is None (or `np.random`), the
+        `numpy.random.RandomState` singleton is used.
+        If `random_gen` is an int, a new ``RandomState`` instance is used,
+        seeded with `random_gen`.
+        If `random_gen` is already a ``Generator`` or ``RandomState``
+        instance then that instance is used.
+
+    """
+
+    def __init__(self, T, random_gen=None):
+        # Avoid ZeroDivisionError since "MBH can be regarded as a special case
+        # of the BH framework with the Metropolis criterion, where temperature
+        # T = 0." (Reject all steps that increase energy.)
+        self.beta = 1.0 / T if T != 0 else float('inf')
+        self.random_gen = check_random_state(random_gen)
+
+    def accept_reject(self, res_new, res_old):
+        """
+        Assuming the local search underlying res_new was successful:
+        If new energy is lower than old, it will always be accepted.
+        If new is higher than old, there is a chance it will be accepted,
+        less likely for larger differences.
+        """
+        with np.errstate(invalid='ignore'):
+            # The energy values being fed to Metropolis are 1-length arrays, and if
+            # they are equal, their difference is 0, which gets multiplied by beta,
+            # which is inf, and array([0]) * float('inf') causes
+            #
+            # RuntimeWarning: invalid value encountered in multiply
+            #
+            # Ignore this warning so when the algorithm is on a flat plane, it always
+            # accepts the step, to try to move off the plane.
+            prod = -(res_new.fun - res_old.fun) * self.beta
+            w = math.exp(min(0, prod))
+
+        rand = self.random_gen.uniform()
+        return w >= rand and (res_new.success or not res_old.success)
+
+    def __call__(self, *, res_new, res_old):
+        """
+        res_new and res_old are mandatory keyword-only arguments
+        """
+        return bool(self.accept_reject(res_new, res_old))
+
+
+def basinhopping(func, x0, niter=100, T=1.0, stepsize=0.5,
+                 minimizer_kwargs=None, take_step=None, accept_test=None,
+                 callback=None, interval=50, disp=False, niter_success=None,
+                 seed=None, *, target_accept_rate=0.5, stepwise_factor=0.9):
+    """Find the global minimum of a function using the basin-hopping algorithm.
+
+    Basin-hopping is a two-phase method that combines a global stepping
+    algorithm with local minimization at each step. Designed to mimic
+    the natural process of energy minimization of clusters of atoms, it works
+    well for similar problems with "funnel-like, but rugged" energy landscapes
+    [5]_.
+
+    As the step-taking, step acceptance, and minimization methods are all
+    customizable, this function can also be used to implement other two-phase
+    methods.
+
+    Parameters
+    ----------
+    func : callable ``f(x, *args)``
+        Function to be optimized.  ``args`` can be passed as an optional item
+        in the dict `minimizer_kwargs`
+    x0 : array_like
+        Initial guess.
+    niter : integer, optional
+        The number of basin-hopping iterations. There will be a total of
+        ``niter + 1`` runs of the local minimizer.
+    T : float, optional
+        The "temperature" parameter for the acceptance or rejection criterion.
+        Higher "temperatures" mean that larger jumps in function value will be
+        accepted.  For best results `T` should be comparable to the
+        separation (in function value) between local minima.
+    stepsize : float, optional
+        Maximum step size for use in the random displacement.
+    minimizer_kwargs : dict, optional
+        Extra keyword arguments to be passed to the local minimizer
+        `scipy.optimize.minimize`. Some important options are:
+
+            method : str
+                The minimization method (e.g. ``"L-BFGS-B"``)
+            args : tuple
+                Extra arguments passed to the objective function (`func`) and
+                its derivatives (Jacobian, Hessian).
+
+    take_step : callable ``take_step(x)``, optional
+        Replace the default step-taking routine with this routine. The default
+        step-taking routine is a random displacement of the coordinates, but
+        other step-taking algorithms may be better for some systems.
+        `take_step` can optionally have the attribute ``take_step.stepsize``.
+        If this attribute exists, then `basinhopping` will adjust
+        ``take_step.stepsize`` in order to try to optimize the global minimum
+        search.
+    accept_test : callable, ``accept_test(f_new=f_new, x_new=x_new, f_old=fold, x_old=x_old)``, optional
+        Define a test which will be used to judge whether to accept the
+        step. This will be used in addition to the Metropolis test based on
+        "temperature" `T`. The acceptable return values are True,
+        False, or ``"force accept"``. If any of the tests return False
+        then the step is rejected. If a test returns ``"force accept"``,
+        this overrides any other tests in order to accept the step. This can
+        be used, for example, to forcefully escape from a local minimum that
+        `basinhopping` is trapped in.
+    callback : callable, ``callback(x, f, accept)``, optional
+        A callback function which will be called for all minima found. ``x``
+        and ``f`` are the coordinates and function value of the trial minimum,
+        and ``accept`` is whether that minimum was accepted. This can
+        be used, for example, to save the lowest N minima found. Also,
+        `callback` can be used to specify a user defined stop criterion by
+        optionally returning True to stop the `basinhopping` routine.
+    interval : integer, optional
+        Interval for how often to update the `stepsize`.
+    disp : bool, optional
+        Set to True to print status messages.
+    niter_success : integer, optional
+        Stop the run if the global minimum candidate remains the same for this
+        number of iterations.
+    seed : {None, int, `numpy.random.Generator`, `numpy.random.RandomState`}, optional
+
+        If `seed` is None (or `np.random`), the `numpy.random.RandomState`
+        singleton is used.
+        If `seed` is an int, a new ``RandomState`` instance is used,
+        seeded with `seed`.
+        If `seed` is already a ``Generator`` or ``RandomState`` instance then
+        that instance is used.
+        Specify `seed` for repeatable minimizations. The random numbers
+        generated with this seed only affect the default Metropolis
+        `accept_test` and the default `take_step`. If you supply your own
+        `take_step` and `accept_test`, and these functions use random
+        number generation, then those functions are responsible for the state
+        of their random number generator.
+    target_accept_rate : float, optional
+        The target acceptance rate that is used to adjust the `stepsize`.
+        If the current acceptance rate is greater than the target,
+        then the `stepsize` is increased. Otherwise, it is decreased.
+        Range is (0, 1). Default is 0.5.
+
+        .. versionadded:: 1.8.0
+
+    stepwise_factor : float, optional
+        The `stepsize` is multiplied or divided by this stepwise factor upon
+        each update. Range is (0, 1). Default is 0.9.
+
+        .. versionadded:: 1.8.0
+
+    Returns
+    -------
+    res : OptimizeResult
+        The optimization result represented as an `OptimizeResult` object.
+        Important attributes are: ``x`` the solution array, ``fun`` the value
+        of the function at the solution, and ``message`` which describes the
+        cause of the termination. The ``OptimizeResult`` object returned by the
+        selected minimizer at the lowest minimum is also contained within this
+        object and can be accessed through the ``lowest_optimization_result``
+        attribute.  See `OptimizeResult` for a description of other attributes.
+
+    See Also
+    --------
+    minimize :
+        The local minimization function called once for each basinhopping step.
+        `minimizer_kwargs` is passed to this routine.
+
+    Notes
+    -----
+    Basin-hopping is a stochastic algorithm which attempts to find the global
+    minimum of a smooth scalar function of one or more variables [1]_ [2]_ [3]_
+    [4]_. The algorithm in its current form was described by David Wales and
+    Jonathan Doye [2]_ http://www-wales.ch.cam.ac.uk/.
+
+    The algorithm is iterative, with each cycle consisting of the following
+    steps:
+
+    1) random perturbation of the coordinates
+
+    2) local minimization
+
+    3) accept or reject the new coordinates based on the minimized function
+       value
+
+    The acceptance test used here is the Metropolis criterion of standard Monte
+    Carlo algorithms, although there are many other possibilities [3]_.
+
+    This global minimization method has been shown to be extremely efficient
+    for a wide variety of problems in physics and chemistry. It is
+    particularly useful when the function has many minima separated by large
+    barriers. See the `Cambridge Cluster Database
+    <https://www-wales.ch.cam.ac.uk/CCD.html>`_ for databases of molecular
+    systems that have been optimized primarily using basin-hopping. This
+    database includes minimization problems exceeding 300 degrees of freedom.
+
+    See the free software program `GMIN <https://www-wales.ch.cam.ac.uk/GMIN>`_
+    for a Fortran implementation of basin-hopping. This implementation has many
+    variations of the procedure described above, including more
+    advanced step-taking algorithms and alternative acceptance criteria.
+
+    For stochastic global optimization there is no way to determine if the true
+    global minimum has actually been found. Instead, as a consistency check,
+    the algorithm can be run from a number of different random starting points
+    to ensure the lowest minimum found in each example has converged to the
+    global minimum. For this reason, `basinhopping` will by default simply
+    run for the number of iterations `niter` and return the lowest minimum
+    found. It is left to the user to ensure that this is in fact the global
+    minimum.
+
+    Choosing `stepsize`:  This is a crucial parameter in `basinhopping` and
+    depends on the problem being solved. The step is chosen uniformly in the
+    region from x0-stepsize to x0+stepsize, in each dimension. Ideally, it
+    should be comparable to the typical separation (in argument values) between
+    local minima of the function being optimized. `basinhopping` will, by
+    default, adjust `stepsize` to find an optimal value, but this may take
+    many iterations. You will get quicker results if you set a sensible
+    initial value for ``stepsize``.
+
+    Choosing `T`: The parameter `T` is the "temperature" used in the
+    Metropolis criterion. Basinhopping steps are always accepted if
+    ``func(xnew) < func(xold)``. Otherwise, they are accepted with
+    probability::
+
+        exp( -(func(xnew) - func(xold)) / T )
+
+    So, for best results, `T` should be comparable to the typical
+    difference (in function values) between local minima. (The height of
+    "walls" between local minima is irrelevant.)
+
+    If `T` is 0, the algorithm becomes Monotonic Basin-Hopping, in which all
+    steps that increase energy are rejected.
+
+    .. versionadded:: 0.12.0
+
+    References
+    ----------
+    .. [1] Wales, David J. 2003, Energy Landscapes, Cambridge University Press,
+        Cambridge, UK.
+    .. [2] Wales, D J, and Doye J P K, Global Optimization by Basin-Hopping and
+        the Lowest Energy Structures of Lennard-Jones Clusters Containing up to
+        110 Atoms.  Journal of Physical Chemistry A, 1997, 101, 5111.
+    .. [3] Li, Z. and Scheraga, H. A., Monte Carlo-minimization approach to the
+        multiple-minima problem in protein folding, Proc. Natl. Acad. Sci. USA,
+        1987, 84, 6611.
+    .. [4] Wales, D. J. and Scheraga, H. A., Global optimization of clusters,
+        crystals, and biomolecules, Science, 1999, 285, 1368.
+    .. [5] Olson, B., Hashmi, I., Molloy, K., and Shehu, A., Basin Hopping as
+        a General and Versatile Optimization Framework for the Characterization
+        of Biological Macromolecules, Advances in Artificial Intelligence,
+        Volume 2012 (2012), Article ID 674832, :doi:`10.1155/2012/674832`
+
+    Examples
+    --------
+    The following example is a 1-D minimization problem, with many
+    local minima superimposed on a parabola.
+
+    >>> import numpy as np
+    >>> from scipy.optimize import basinhopping
+    >>> func = lambda x: np.cos(14.5 * x - 0.3) + (x + 0.2) * x
+    >>> x0 = [1.]
+
+    Basinhopping, internally, uses a local minimization algorithm. We will use
+    the parameter `minimizer_kwargs` to tell basinhopping which algorithm to
+    use and how to set up that minimizer. This parameter will be passed to
+    `scipy.optimize.minimize`.
+
+    >>> minimizer_kwargs = {"method": "BFGS"}
+    >>> ret = basinhopping(func, x0, minimizer_kwargs=minimizer_kwargs,
+    ...                    niter=200)
+    >>> print("global minimum: x = %.4f, f(x) = %.4f" % (ret.x[0], ret.fun))
+    global minimum: x = -0.1951, f(x) = -1.0009
+
+    Next consider a 2-D minimization problem. Also, this time, we
+    will use gradient information to significantly speed up the search.
+
+    >>> def func2d(x):
+    ...     f = np.cos(14.5 * x[0] - 0.3) + (x[1] + 0.2) * x[1] + (x[0] +
+    ...                                                            0.2) * x[0]
+    ...     df = np.zeros(2)
+    ...     df[0] = -14.5 * np.sin(14.5 * x[0] - 0.3) + 2. * x[0] + 0.2
+    ...     df[1] = 2. * x[1] + 0.2
+    ...     return f, df
+
+    We'll also use a different local minimization algorithm. Also, we must tell
+    the minimizer that our function returns both energy and gradient (Jacobian).
+
+    >>> minimizer_kwargs = {"method":"L-BFGS-B", "jac":True}
+    >>> x0 = [1.0, 1.0]
+    >>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
+    ...                    niter=200)
+    >>> print("global minimum: x = [%.4f, %.4f], f(x) = %.4f" % (ret.x[0],
+    ...                                                           ret.x[1],
+    ...                                                           ret.fun))
+    global minimum: x = [-0.1951, -0.1000], f(x) = -1.0109
+
+    Here is an example using a custom step-taking routine. Imagine you want
+    the first coordinate to take larger steps than the rest of the coordinates.
+    This can be implemented like so:
+
+    >>> class MyTakeStep:
+    ...    def __init__(self, stepsize=0.5):
+    ...        self.stepsize = stepsize
+    ...        self.rng = np.random.default_rng()
+    ...    def __call__(self, x):
+    ...        s = self.stepsize
+    ...        x[0] += self.rng.uniform(-2.*s, 2.*s)
+    ...        x[1:] += self.rng.uniform(-s, s, x[1:].shape)
+    ...        return x
+
+    Since ``MyTakeStep.stepsize`` exists, basinhopping will adjust the
+    magnitude of `stepsize` to optimize the search. We'll use the same 2-D
+    function as before.
+
+    >>> mytakestep = MyTakeStep()
+    >>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
+    ...                    niter=200, take_step=mytakestep)
+    >>> print("global minimum: x = [%.4f, %.4f], f(x) = %.4f" % (ret.x[0],
+    ...                                                           ret.x[1],
+    ...                                                           ret.fun))
+    global minimum: x = [-0.1951, -0.1000], f(x) = -1.0109
+
+    Now, let's do an example using a custom callback function which prints the
+    value of every minimum found.
+
+    >>> def print_fun(x, f, accepted):
+    ...         print("at minimum %.4f accepted %d" % (f, int(accepted)))
+
+    We'll run it for only 10 basinhopping steps this time.
+
+    >>> rng = np.random.default_rng()
+    >>> ret = basinhopping(func2d, x0, minimizer_kwargs=minimizer_kwargs,
+    ...                    niter=10, callback=print_fun, seed=rng)
+    at minimum 0.4159 accepted 1
+    at minimum -0.4317 accepted 1
+    at minimum -1.0109 accepted 1
+    at minimum -0.9073 accepted 1
+    at minimum -0.4317 accepted 0
+    at minimum -0.1021 accepted 1
+    at minimum -0.7425 accepted 1
+    at minimum -0.9073 accepted 1
+    at minimum -0.4317 accepted 0
+    at minimum -0.7425 accepted 1
+    at minimum -0.9073 accepted 1
+
+    The minimum at -1.0109 is actually the global minimum. In the output shown
+    above it is found on the 2nd iteration (the first line reports the initial
+    minimization).
+
+    """ # numpy/numpydoc#87  # noqa: E501
+    if target_accept_rate <= 0. or target_accept_rate >= 1.:
+        raise ValueError('target_accept_rate has to be in range (0, 1)')
+    if stepwise_factor <= 0. or stepwise_factor >= 1.:
+        raise ValueError('stepwise_factor has to be in range (0, 1)')
+
+    x0 = np.array(x0)
+
+    # set up the np.random generator
+    rng = check_random_state(seed)
+
+    # set up minimizer
+    if minimizer_kwargs is None:
+        minimizer_kwargs = dict()
+    wrapped_minimizer = MinimizerWrapper(scipy.optimize.minimize, func,
+                                         **minimizer_kwargs)
+
+    # set up step-taking algorithm
+    if take_step is not None:
+        if not callable(take_step):
+            raise TypeError("take_step must be callable")
+        # if take_step.stepsize exists then use AdaptiveStepsize to control
+        # take_step.stepsize
+        if hasattr(take_step, "stepsize"):
+            take_step_wrapped = AdaptiveStepsize(
+                take_step, interval=interval,
+                accept_rate=target_accept_rate,
+                factor=stepwise_factor,
+                verbose=disp)
+        else:
+            take_step_wrapped = take_step
+    else:
+        # use default
+        displace = RandomDisplacement(stepsize=stepsize, random_gen=rng)
+        take_step_wrapped = AdaptiveStepsize(displace, interval=interval,
+                                             accept_rate=target_accept_rate,
+                                             factor=stepwise_factor,
+                                             verbose=disp)
+
+    # set up accept tests
+    accept_tests = []
+    if accept_test is not None:
+        if not callable(accept_test):
+            raise TypeError("accept_test must be callable")
+        accept_tests = [accept_test]
+
+    # use default
+    metropolis = Metropolis(T, random_gen=rng)
+    accept_tests.append(metropolis)
+
+    if niter_success is None:
+        niter_success = niter + 2
+
+    bh = BasinHoppingRunner(x0, wrapped_minimizer, take_step_wrapped,
+                            accept_tests, disp=disp)
+
+    # The wrapped minimizer is called once during construction of
+    # BasinHoppingRunner, so run the callback
+    if callable(callback):
+        callback(bh.storage.minres.x, bh.storage.minres.fun, True)
+
+    # start main iteration loop
+    count, i = 0, 0
+    message = ["requested number of basinhopping iterations completed"
+               " successfully"]
+    for i in range(niter):
+        new_global_min = bh.one_cycle()
+
+        if callable(callback):
+            # should we pass a copy of x?
+            val = callback(bh.xtrial, bh.energy_trial, bh.accept)
+            if val is not None:
+                if val:
+                    message = ["callback function requested stop early by"
+                               " returning True"]
+                    break
+
+        count += 1
+        if new_global_min:
+            count = 0
+        elif count > niter_success:
+            message = ["success condition satisfied"]
+            break
+
+    # prepare return object
+    res = bh.res
+    res.lowest_optimization_result = bh.storage.get_lowest()
+    res.x = np.copy(res.lowest_optimization_result.x)
+    res.fun = res.lowest_optimization_result.fun
+    res.message = message
+    res.nit = i + 1
+    res.success = res.lowest_optimization_result.success
+    return res
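Note on the Metropolis criterion documented above. As a minimal sketch, not the `Metropolis` class from this commit, the acceptance rule from the "Choosing `T`" paragraph of the `basinhopping` docstring can be written in plain Python. The helper name `metropolis_accept` and its `rng` argument are hypothetical, and accepting ties when `T` is 0 is an assumption made here for illustration.

```python
import math
import random


def metropolis_accept(f_new, f_old, T, rng=random):
    """Hypothetical helper: decide whether to accept a step (illustration only)."""
    if f_new <= f_old:
        # Non-increasing steps are always accepted (probability exp(0) = 1 for ties).
        return True
    if T == 0:
        # Monotonic basin-hopping: every step that increases energy is rejected.
        return False
    # Uphill step: accept with probability exp(-(f_new - f_old) / T).
    return rng.random() < math.exp(-(f_new - f_old) / T)


# Estimate the acceptance rate of an uphill step of height 1 at T = 1.
rng = random.Random(0)
trials = 20_000
accepted = sum(metropolis_accept(1.0, 0.0, T=1.0, rng=rng) for _ in range(trials))
rate = accepted / trials  # should be close to exp(-1), about 0.37
```

Raising `T` pushes the uphill acceptance rate toward 1, which is consistent with the docstring's advice that `T` be comparable to the typical difference in function value between local minima.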
@@ -0,0 +1,663 @@
+import numpy as np
+import scipy._lib._elementwise_iterative_method as eim
+from scipy._lib._util import _RichResult
+
+_ELIMITS = -1  # used in _bracket_root
+_ESTOPONESIDE = 2  # used in _bracket_root
+
+def _bracket_root_iv(func, xl0, xr0, xmin, xmax, factor, args, maxiter):
+
+    if not callable(func):
+        raise ValueError('`func` must be callable.')
+
+    if not np.iterable(args):
+        args = (args,)
+
+    xl0 = np.asarray(xl0)[()]
+    if not np.issubdtype(xl0.dtype, np.number) or np.iscomplex(xl0).any():
+        raise ValueError('`xl0` must be numeric and real.')
+
+    xr0 = xl0 + 1 if xr0 is None else xr0
+    xmin = -np.inf if xmin is None else xmin
+    xmax = np.inf if xmax is None else xmax
+    factor = 2. if factor is None else factor
+    xl0, xr0, xmin, xmax, factor = np.broadcast_arrays(xl0, xr0, xmin, xmax, factor)
+
+    if not np.issubdtype(xr0.dtype, np.number) or np.iscomplex(xr0).any():
+        raise ValueError('`xr0` must be numeric and real.')
+
+    if not np.issubdtype(xmin.dtype, np.number) or np.iscomplex(xmin).any():
+        raise ValueError('`xmin` must be numeric and real.')
+
+    if not np.issubdtype(xmax.dtype, np.number) or np.iscomplex(xmax).any():
+        raise ValueError('`xmax` must be numeric and real.')
+
+    if not np.issubdtype(factor.dtype, np.number) or np.iscomplex(factor).any():
+        raise ValueError('`factor` must be numeric and real.')
+    if not np.all(factor > 1):
+        raise ValueError('All elements of `factor` must be greater than 1.')
+
+    maxiter = np.asarray(maxiter)
+    message = '`maxiter` must be a non-negative integer.'
+    if (not np.issubdtype(maxiter.dtype, np.number) or maxiter.shape != tuple()
+            or np.iscomplex(maxiter)):
+        raise ValueError(message)
+    maxiter_int = int(maxiter[()])
+    if not maxiter == maxiter_int or maxiter < 0:
+        raise ValueError(message)
+
+    if not np.all((xmin <= xl0) & (xl0 < xr0) & (xr0 <= xmax)):
+        raise ValueError('`xmin <= xl0 < xr0 <= xmax` must be True (elementwise).')
+
+    return func, xl0, xr0, xmin, xmax, factor, args, maxiter
+
+
+def _bracket_root(func, xl0, xr0=None, *, xmin=None, xmax=None, factor=None,
+                  args=(), maxiter=1000):
+    """Bracket the root of a monotonic scalar function of one variable
+
+    This function works elementwise when `xl0`, `xr0`, `xmin`, `xmax`, `factor`, and
+    the elements of `args` are broadcastable arrays.
+
+    Parameters
+    ----------
+    func : callable
+        The function for which the root is to be bracketed.
+        The signature must be::
+
+            func(x: ndarray, *args) -> ndarray
+
+        where each element of ``x`` is a finite real and ``args`` is a tuple,
+        which may contain an arbitrary number of arrays that are broadcastable
+        with `x`. ``func`` must be an elementwise function: each element
+        ``func(x)[i]`` must equal ``func(x[i])`` for all indices ``i``.
+    xl0, xr0: float array_like
+        Starting guess of bracket, which need not contain a root. If `xr0` is
+        not provided, ``xr0 = xl0 + 1``. Must be broadcastable with one another.
+    xmin, xmax : float array_like, optional
+        Minimum and maximum allowable endpoints of the bracket, inclusive. Must
+        be broadcastable with `xl0` and `xr0`.
+    factor : float array_like, default: 2
+        The factor used to grow the bracket. See notes for details.
+    args : tuple, optional
+        Additional positional arguments to be passed to `func`.  Must be arrays
+        broadcastable with `xl0`, `xr0`, `xmin`, and `xmax`. If the callable to be
+        bracketed requires arguments that are not broadcastable with these
+        arrays, wrap that callable with `func` such that `func` accepts
+        only `x` and broadcastable arrays.
+    maxiter : int, optional
+        The maximum number of iterations of the algorithm to perform.
+
+    Returns
+    -------
+    res : _RichResult
+        An instance of `scipy._lib._util._RichResult` with the following
+        attributes. The descriptions are written as though the values will be
+        scalars; however, if `func` returns an array, the outputs will be
+        arrays of the same shape.
+
+        xl, xr : float
+            The lower and upper ends of the bracket, if the algorithm
+            terminated successfully.
+        fl, fr : float
+            The function value at the lower and upper ends of the bracket.
+        nfev : int
+            The number of function evaluations required to find the bracket.
+            This is distinct from the number of times `func` is *called*
+            because the function may be evaluated at multiple points in a single
+            call.
+        nit : int
+            The number of iterations of the algorithm that were performed.
+        status : int
+            An integer representing the exit status of the algorithm.
+
+            - ``0`` : The algorithm produced a valid bracket.
+            - ``-1`` : The bracket expanded to the allowable limits without finding a bracket.
+            - ``-2`` : The maximum number of iterations was reached.
+            - ``-3`` : A non-finite value was encountered.
+            - ``-4`` : Iteration was terminated by `callback`.
+            - ``1`` : The algorithm is proceeding normally (in `callback` only).
+            - ``2`` : A bracket was found in the opposite search direction (in `callback` only).
+
+        success : bool
+            ``True`` when the algorithm terminated successfully (status ``0``).
+
+    Notes
+    -----
+    This function generalizes an algorithm found in pieces throughout
+    `scipy.stats`. The strategy is to iteratively grow the bracket `(l, r)`
+    until ``func(l) < 0 < func(r)`` or ``func(r) < 0 < func(l)``. The bracket
+    grows to the left as follows.
+
+    - If `xmin` is not provided, the distance between `xl0` and `l` is iteratively
+      increased by `factor`.
+    - If `xmin` is provided, the distance between `xmin` and `l` is iteratively
+      decreased by `factor`. Note that this also *increases* the bracket size.
+
+    Growth of the bracket to the right is analogous.
+
+    Growth of the bracket in one direction stops when the endpoint is no longer
+    finite, the function value at the endpoint is no longer finite, or the
+    endpoint reaches its limiting value (`xmin` or `xmax`). Iteration terminates
+    when the bracket stops growing in both directions, the bracket surrounds
+    the root, or a root is found (accidentally).
+
+    If two brackets are found (that is, a bracket is found on both sides in
+    the same iteration), the smaller of the two is returned.
+    If roots of the function are found, both `l` and `r` are set to the
+    leftmost root.
+
+    """  # noqa: E501
+    # Todo:
+    # - find bracket with sign change in specified direction
+    # - Add tolerance
+    # - allow factor < 1?
+
+    callback = None  # works; I just don't want to test it
+    temp = _bracket_root_iv(func, xl0, xr0, xmin, xmax, factor, args, maxiter)
+    func, xl0, xr0, xmin, xmax, factor, args, maxiter = temp
+
+    xs = (xl0, xr0)
+    temp = eim._initialize(func, xs, args)
+    func, xs, fs, args, shape, dtype = temp  # line split for PEP8
+
+    # The approach is to treat the left and right searches as though they were
+    # (almost) totally independent one-sided bracket searches. (The interaction
+    # is considered when checking for termination and preparing the result
+    # object.)
+    # `x` is the "moving" end of the bracket
+    x = np.concatenate(xs)
+    f = np.concatenate(fs)
+    n = len(x) // 2
+
+    # `x_last` is the previous location of the moving end of the bracket. If
+    # the signs of `f` and `f_last` are different, `x` and `x_last` form a
+    # bracket.
+    x_last = np.concatenate((x[n:], x[:n]))
+    f_last = np.concatenate((f[n:], f[:n]))
+    # `x0` is the "fixed" end of the bracket.
+    x0 = x_last
+    # We don't need to retain the corresponding function value, since the
+    # fixed end of the bracket is only needed to compute the new value of the
+    # moving end; it is never returned.
+
+    xmin = np.broadcast_to(xmin, shape).astype(dtype, copy=False).ravel()
+    xmax = np.broadcast_to(xmax, shape).astype(dtype, copy=False).ravel()
+    limit = np.concatenate((xmin, xmax))
+
+    factor = np.broadcast_to(factor, shape).astype(dtype, copy=False).ravel()
+    factor = np.concatenate((factor, factor))
+
+    active = np.arange(2*n)
+    args = [np.concatenate((arg, arg)) for arg in args]
+
+    # This is needed due to inner workings of `eim._loop`.
+    # We're abusing it a tiny bit.
+    shape = shape + (2,)
+
+    # `d` is for "distance".
+    # For searches without a limit, the distance between the fixed end of the
+    # bracket `x0` and the moving end `x` will grow by `factor` each iteration.
+    # For searches with a limit, the distance between the `limit` and moving
+    # end of the bracket `x` will shrink by `factor` each iteration.
+    i = np.isinf(limit)
+    ni = ~i
+    d = np.zeros_like(x)
+    d[i] = x[i] - x0[i]
+    d[ni] = limit[ni] - x[ni]
+
+    status = np.full_like(x, eim._EINPROGRESS, dtype=int)  # in progress
+    nit, nfev = 0, 1  # one function evaluation per side performed above
+
+    work = _RichResult(x=x, x0=x0, f=f, limit=limit, factor=factor,
+                       active=active, d=d, x_last=x_last, f_last=f_last,
+                       nit=nit, nfev=nfev, status=status, args=args,
+                       xl=None, xr=None, fl=None, fr=None, n=n)
+    res_work_pairs = [('status', 'status'), ('xl', 'xl'), ('xr', 'xr'),
+                      ('nit', 'nit'), ('nfev', 'nfev'), ('fl', 'fl'),
+                      ('fr', 'fr'), ('x', 'x'), ('f', 'f'),
+                      ('x_last', 'x_last'), ('f_last', 'f_last')]
+
+    def pre_func_eval(work):
+        # Initialize moving end of bracket
+        x = np.zeros_like(work.x)
+
+        # Unlimited brackets grow by `factor` by increasing distance from fixed
+        # end to moving end.
+        i = np.isinf(work.limit)  # indices of unlimited brackets
+        work.d[i] *= work.factor[i]
+        x[i] = work.x0[i] + work.d[i]
+
+        # Limited brackets grow by decreasing the distance from the limit to
+        # the moving end.
+        ni = ~i  # indices of limited brackets
+        work.d[ni] /= work.factor[ni]
+        x[ni] = work.limit[ni] - work.d[ni]
+
+        return x
+
+    def post_func_eval(x, f, work):
+        # Keep track of the previous location of the moving end so that we can
+        # return a narrower bracket. (The alternative is to remember the
+        # original fixed end, but then the bracket would be wider than needed.)
+        work.x_last = work.x
+        work.f_last = work.f
+        work.x = x
+        work.f = f
+
+    def check_termination(work):
+        stop = np.zeros_like(work.x, dtype=bool)
+
+        # Condition 1: a valid bracket (or the root itself) has been found
+        sf = np.sign(work.f)
+        sf_last = np.sign(work.f_last)
+        i = (sf_last == -sf) | (sf_last == 0) | (sf == 0)
+        work.status[i] = eim._ECONVERGED
+        stop[i] = True
+
+        # Condition 2: the other side's search found a valid bracket.
+        # (If we just found a bracket with the rightward search, we can stop
+        #  the leftward search, and vice-versa.)
+        # To do this, we need to set the status of the other side's search;
+        # this is tricky because `work.status` contains only the *active*
+        # elements, so we don't immediately know the index of the element we
+        # need to set - or even if it's still there. (That search may have
+        # terminated already, e.g. by reaching its `limit`.)
+        # To facilitate this, `work.active` contains a unique integer index of
+        # each search. Index `k` (`k < n`) and `k + n` correspond with a
+        # leftward and rightward search, respectively. Elements are removed
+        # from `work.active` just as they are removed from `work.status`, so
+        # we use `work.active` to help find the right location in
+        # `work.status`.
+        # Get the integer indices of the elements that can also stop
+        also_stop = (work.active[i] + work.n) % (2*work.n)
+        # Check whether they are still active.
+        # To start, we need to find out where in `work.active` they would
+        # appear if they are indeed there.
+        j = np.searchsorted(work.active, also_stop)
+        # If the location exceeds the length of the `work.active`, they are
+        # not there.
+        j = j[j < len(work.active)]
+        # Check whether they are still there.
+        j = j[also_stop == work.active[j]]
+        # Now convert these to boolean indices to use with `work.status`.
+        i = np.zeros_like(stop)
+        i[j] = True  # boolean indices of elements that can also stop
+        i = i & ~stop
+        work.status[i] = _ESTOPONESIDE
+        stop[i] = True
+
+        # Condition 3: moving end of bracket reaches limit
+        i = (work.x == work.limit) & ~stop
+        work.status[i] = _ELIMITS
+        stop[i] = True
+
+        # Condition 4: non-finite value encountered
+        i = ~(np.isfinite(work.x) & np.isfinite(work.f)) & ~stop
+        work.status[i] = eim._EVALUEERR
+        stop[i] = True
+
+        return stop
+
+    def post_termination_check(work):
+        pass
+
+    def customize_result(res, shape):
+        n = len(res['x']) // 2
+
+        # To avoid ambiguity, below we refer to `xl0`, the initial left endpoint
+        # as `a` and `xr0`, the initial right endpoint, as `b`.
+        # Because we treat the two one-sided searches as though they were
+        # independent, what we keep track of in `work` and what we want to
+        # return in `res` look quite different. Combine the results from the
+        # two one-sided searches before reporting the results to the user.
+        # - "a" refers to the leftward search (the moving end started at `a`)
+        # - "b" refers to the rightward search (the moving end started at `b`)
+        # - "l" refers to the left end of the bracket (closer to -oo)
+        # - "r" refers to the right end of the bracket (closer to +oo)
+        xal = res['x'][:n]
+        xar = res['x_last'][:n]
+        xbl = res['x_last'][n:]
+        xbr = res['x'][n:]
+
+        fal = res['f'][:n]
+        far = res['f_last'][:n]
+        fbl = res['f_last'][n:]
+        fbr = res['f'][n:]
+
+        # Initialize the brackets and corresponding function values to return
+        # to the user. Brackets may not be valid (e.g. there is no root,
+        # there weren't enough iterations, NaN encountered), but we still need
+        # to return something. One option would be all NaNs, but what I've
+        # chosen here is the left- and right-most points at which the function
+        # has been evaluated. This gives the user some information about what
+        # interval of the real line has been searched and shows that there is
+        # no sign change between the two ends.
+        xl = xal.copy()
+        fl = fal.copy()
+        xr = xbr.copy()
+        fr = fbr.copy()
+
+        # `status` indicates whether the bracket is valid or not. If so,
+        # we want to adjust the bracket we return to be the narrowest possible
+        # given the points at which we evaluated the function.
+        # For example if bracket "a" is valid and smaller than bracket "b" OR
+        # if bracket "a" is valid and bracket "b" is not valid, we want to
+        # return bracket "a" (and vice versa).
+        sa = res['status'][:n]
+        sb = res['status'][n:]
+
+        da = xar - xal
+        db = xbr - xbl
+
+        i1 = ((da <= db) & (sa == 0)) | ((sa == 0) & (sb != 0))
+        i2 = ((db <= da) & (sb == 0)) | ((sb == 0) & (sa != 0))
+
+        xr[i1] = xar[i1]
+        fr[i1] = far[i1]
+        xl[i2] = xbl[i2]
+        fl[i2] = fbl[i2]
+
+        # Finish assembling the result object
+        res['xl'] = xl
+        res['xr'] = xr
+        res['fl'] = fl
+        res['fr'] = fr
+
+        res['nit'] = np.maximum(res['nit'][:n], res['nit'][n:])
+        res['nfev'] = res['nfev'][:n] + res['nfev'][n:]
+        # If the status on one side is zero, the status is zero. In any case,
+        # report the status from one side only.
+        res['status'] = np.choose(sa == 0, (sb, sa))
+        res['success'] = (res['status'] == 0)
+
+        del res['x']
+        del res['f']
+        del res['x_last']
+        del res['f_last']
+
+        return shape[:-1]
+
+    return eim._loop(work, callback, shape, maxiter, func, args, dtype,
+                     pre_func_eval, post_func_eval, check_termination,
+                     post_termination_check, customize_result, res_work_pairs)
+
+
+def _bracket_minimum_iv(func, xm0, xl0, xr0, xmin, xmax, factor, args, maxiter):
+
+    if not callable(func):
+        raise ValueError('`func` must be callable.')
+
+    if not np.iterable(args):
+        args = (args,)
+
+    xm0 = np.asarray(xm0)[()]
+    if not np.issubdtype(xm0.dtype, np.number) or np.iscomplex(xm0).any():
+        raise ValueError('`xm0` must be numeric and real.')
+
+    xmin = -np.inf if xmin is None else xmin
+    xmax = np.inf if xmax is None else xmax
+
+    xl0_not_supplied = False
+    if xl0 is None:
+        xl0 = xm0 - 0.5
+        xl0_not_supplied = True
+
+    xr0_not_supplied = False
+    if xr0 is None:
+        xr0 = xm0 + 0.5
+        xr0_not_supplied = True
+
+    factor = 2.0 if factor is None else factor
+    xl0, xm0, xr0, xmin, xmax, factor = np.broadcast_arrays(
+        xl0, xm0, xr0, xmin, xmax, factor
+    )
+
+    if not np.issubdtype(xl0.dtype, np.number) or np.iscomplex(xl0).any():
+        raise ValueError('`xl0` must be numeric and real.')
+
+    if not np.issubdtype(xr0.dtype, np.number) or np.iscomplex(xr0).any():
+        raise ValueError('`xr0` must be numeric and real.')
+
+    if not np.issubdtype(xmin.dtype, np.number) or np.iscomplex(xmin).any():
+        raise ValueError('`xmin` must be numeric and real.')
+
+    if not np.issubdtype(xmax.dtype, np.number) or np.iscomplex(xmax).any():
+        raise ValueError('`xmax` must be numeric and real.')
+
+    if not np.issubdtype(factor.dtype, np.number) or np.iscomplex(factor).any():
+        raise ValueError('`factor` must be numeric and real.')
+    if not np.all(factor > 1):
+        raise ValueError('All elements of `factor` must be greater than 1.')
+
+    # The default choices for `xl0` or `xr0` might lie outside `[xmin, xmax]`.
+    # In that case, place the endpoint 1/16 of the way from `xm0` toward the
+    # violated limit so that it falls between `xm0` and that limit. We replace
+    # with copies because `xl0` and `xr0` are read-only views produced by
+    # `np.broadcast_arrays`.
+    if xl0_not_supplied:
+        xl0 = xl0.copy()
+        cond = ~np.isinf(xmin) & (xl0 < xmin)
+        xl0[cond] = xm0[cond] - (
+            xm0[cond] - xmin[cond]
+        ) / np.array(16, dtype=xl0.dtype)
+    if xr0_not_supplied:
+        xr0 = xr0.copy()
+        cond = ~np.isinf(xmax) & (xmax < xr0)
+        xr0[cond] = xm0[cond] + (
+            xmax[cond] - xm0[cond]
+        ) / np.array(16, dtype=xr0.dtype)
+
+    maxiter = np.asarray(maxiter)
+    message = '`maxiter` must be a non-negative integer.'
+    if (not np.issubdtype(maxiter.dtype, np.number) or maxiter.shape != tuple()
+            or np.iscomplex(maxiter)):
+        raise ValueError(message)
+    maxiter_int = int(maxiter[()])
+    if not maxiter == maxiter_int or maxiter < 0:
+        raise ValueError(message)
+
+    if not np.all((xmin <= xl0) & (xl0 < xm0) & (xm0 < xr0) & (xr0 <= xmax)):
+        raise ValueError(
+            '`xmin <= xl0 < xm0 < xr0 <= xmax` must be True (elementwise).'
+        )
+
+    return func, xm0, xl0, xr0, xmin, xmax, factor, args, maxiter
+
+
+def _bracket_minimum(func, xm0, *, xl0=None, xr0=None, xmin=None, xmax=None,
+                     factor=None, args=(), maxiter=1000):
+    """Bracket the minimum of a unimodal scalar function of one variable.
+
+    This function works elementwise when `xm0`, `xl0`, `xr0`, `xmin`, `xmax`,
+    and the elements of `args` are broadcastable arrays.
+
+    Parameters
+    ----------
+    func : callable
+        The function for which the minimum is to be bracketed.
+        The signature must be::
+
+            func(x: ndarray, *args) -> ndarray
+
+        where each element of ``x`` is a finite real and ``args`` is a tuple,
+        which may contain an arbitrary number of arrays that are broadcastable
+        with ``x``. `func` must be an elementwise function: each element
+        ``func(x)[i]`` must equal ``func(x[i])`` for all indices `i`.
+    xm0 : float array_like
+        Starting guess for middle point of bracket.
+    xl0, xr0 : float array_like, optional
+        Starting guesses for left and right endpoints of the bracket. Must be
+        broadcastable with one another and with `xm0`.
+    xmin, xmax : float array_like, optional
+        Minimum and maximum allowable endpoints of the bracket, inclusive. Must
+        be broadcastable with `xl0`, `xm0`, and `xr0`.
+    factor : float array_like, optional
+        Controls expansion of bracket endpoint in downhill direction. Works
+        differently in the cases where a limit is set in the downhill direction
+        with `xmax` or `xmin`. See Notes.
+    args : tuple, optional
+        Additional positional arguments to be passed to `func`.  Must be arrays
+        broadcastable with `xl0`, `xm0`, `xr0`, `xmin`, and `xmax`. If the
+        callable to be bracketed requires arguments that are not broadcastable
+        with these arrays, wrap that callable with `func` such that `func`
+        accepts only ``x`` and broadcastable arrays.
+    maxiter : int, optional
+        The maximum number of iterations of the algorithm to perform. The number
+        of function evaluations is three greater than the number of iterations.
+
+    Returns
+    -------
+    res : _RichResult
+        An instance of `scipy._lib._util._RichResult` with the following
+        attributes. The descriptions are written as though the values will be
+        scalars; however, if `func` returns an array, the outputs will be
+        arrays of the same shape.
+
+        xl, xm, xr : float
+            The left, middle, and right points of the bracket, if the algorithm
+            terminated successfully.
+        fl, fm, fr : float
+            The function value at the left, middle, and right points of the bracket.
+        nfev : int
+            The number of function evaluations required to find the bracket.
+        nit : int
+            The number of iterations of the algorithm that were performed.
+        status : int
+            An integer representing the exit status of the algorithm.
+
+            - ``0`` : The algorithm produced a valid bracket.
+            - ``-1`` : The bracket expanded to the allowable limits. Assuming
+                       unimodality, this implies the endpoint at the limit is a
+                       minimizer.
+            - ``-2`` : The maximum number of iterations was reached.
+            - ``-3`` : A non-finite value was encountered.
+
+        success : bool
+            ``True`` when the algorithm terminated successfully (status ``0``).
+
+    Notes
+    -----
+    Similar to `scipy.optimize.bracket`, this function seeks to find real
+    points ``xl < xm < xr`` such that ``f(xl) >= f(xm)`` and ``f(xr) >= f(xm)``,
+    where at least one of the inequalities is strict. Unlike `scipy.optimize.bracket`,
+    this function can operate in a vectorized manner on array input, so long as
+    the input arrays are broadcastable with each other. Also unlike
+    `scipy.optimize.bracket`, users may specify minimum and maximum endpoints
+    for the desired bracket.
+
+    Given an initial trio of points ``xl = xl0``, ``xm = xm0``, ``xr = xr0``,
+    the algorithm checks if these points already give a valid bracket. If not,
+    a new endpoint ``w`` is chosen in the "downhill" direction, ``xm`` becomes
+    the new opposite endpoint, and either ``xl`` or ``xr`` becomes the new middle
+    point, depending on which direction is downhill. The algorithm repeats from
+    here.
+
+    The new endpoint `w` is chosen differently depending on whether or not a
+    boundary `xmin` or `xmax` has been set in the downhill direction. Without
+    loss of generality, suppose the downhill direction is to the right, so that
+    ``f(xl) > f(xm) > f(xr)``. If there is no boundary to the right, then `w`
+    is chosen to be ``xr + factor * (xr - xm)`` where `factor` is controlled by
+    the user (defaults to 2.0) so that step sizes increase in geometric proportion.
+    If there is a boundary, `xmax` in this case, then `w` is chosen to be
+    ``xmax - (xmax - xr)/factor``, with steps slowing to a stop at
+    `xmax`. This cautious approach ensures that a minimum near but distinct from
+    the boundary isn't missed while also detecting whether or not `xmax` is
+    a minimizer when `xmax` is reached after a finite number of steps.
+    """  # noqa: E501
+    callback = None  # works; I just don't want to test it
+
+    temp = _bracket_minimum_iv(func, xm0, xl0, xr0, xmin, xmax, factor, args, maxiter)
+    func, xm0, xl0, xr0, xmin, xmax, factor, args, maxiter = temp
+
+    xs = (xl0, xm0, xr0)
+    func, xs, fs, args, shape, dtype = eim._initialize(func, xs, args)
+
+    xl0, xm0, xr0 = xs
+    fl0, fm0, fr0 = fs
+    xmin = np.broadcast_to(xmin, shape).astype(dtype, copy=False).ravel()
+    xmax = np.broadcast_to(xmax, shape).astype(dtype, copy=False).ravel()
+    # We will modify factor later on so make a copy. np.broadcast_to returns
+    # a read-only view.
+    factor = np.broadcast_to(factor, shape).astype(dtype, copy=True).ravel()
+
+    # To simplify the logic, swap xl and xr if f(xl) < f(xr). We should always be
+    # marching downhill in the direction from xl to xr.
+    comp = fl0 < fr0
+    xl0[comp], xr0[comp] = xr0[comp], xl0[comp]
+    fl0[comp], fr0[comp] = fr0[comp], fl0[comp]
+    # We only need the boundary in the direction we're traveling.
+    limit = np.where(comp, xmin, xmax)
+
+    unlimited = np.isinf(limit)
+    limited = ~unlimited
+    step = np.empty_like(xl0)
+
+    step[unlimited] = (xr0[unlimited] - xm0[unlimited])
+    step[limited] = (limit[limited] - xr0[limited])
+
+    # Step size is divided by factor for case where there is a limit.
+    factor[limited] = 1 / factor[limited]
+
+    status = np.full_like(xl0, eim._EINPROGRESS, dtype=int)
+    nit, nfev = 0, 3
+
+    work = _RichResult(xl=xl0, xm=xm0, xr=xr0, xr0=xr0, fl=fl0, fm=fm0, fr=fr0,
+                       step=step, limit=limit, limited=limited, factor=factor, nit=nit,
+                       nfev=nfev, status=status, args=args)
+
+    res_work_pairs = [('status', 'status'), ('xl', 'xl'), ('xm', 'xm'), ('xr', 'xr'),
+                      ('nit', 'nit'), ('nfev', 'nfev'), ('fl', 'fl'), ('fm', 'fm'),
+                      ('fr', 'fr')]
+
+    def pre_func_eval(work):
+        work.step *= work.factor
+        x = np.empty_like(work.xr)
+        x[~work.limited] = work.xr0[~work.limited] + work.step[~work.limited]
+        x[work.limited] = work.limit[work.limited] - work.step[work.limited]
+        # Since the new bracket endpoint is calculated from an offset with the
+        # limit, it may be the case that the new endpoint equals the old endpoint,
+        # when the old endpoint is sufficiently close to the limit. We use the
+        # limit itself as the new endpoint in these cases.
+        x[work.limited] = np.where(
+            x[work.limited] == work.xr[work.limited],
+            work.limit[work.limited],
+            x[work.limited],
+        )
+        return x
+
+    def post_func_eval(x, f, work):
+        work.xl, work.xm, work.xr = work.xm, work.xr, x
+        work.fl, work.fm, work.fr = work.fm, work.fr, f
+
+    def check_termination(work):
+        # Condition 1: A valid bracket has been found.
+        stop = (
+            (work.fl >= work.fm) & (work.fr > work.fm)
+            | (work.fl > work.fm) & (work.fr >= work.fm)
+        )
+        work.status[stop] = eim._ECONVERGED
+
+        # Condition 2: Moving end of bracket reaches limit.
+        i = (work.xr == work.limit) & ~stop
+        work.status[i] = _ELIMITS
+        stop[i] = True
+
+        # Condition 3: non-finite value encountered
+        i = ~(np.isfinite(work.xr) & np.isfinite(work.fr)) & ~stop
+        work.status[i] = eim._EVALUEERR
+        stop[i] = True
+
+        return stop
+
+    def post_termination_check(work):
+        pass
+
+    def customize_result(res, shape):
+        # Reorder entries of xl and xr if they were swapped due to f(xl0) < f(xr0).
+        comp = res['xl'] > res['xr']
+        res['xl'][comp], res['xr'][comp] = res['xr'][comp], res['xl'][comp]
+        res['fl'][comp], res['fr'][comp] = res['fr'][comp], res['fl'][comp]
+        return shape
+
+    return eim._loop(work, callback, shape,
+                     maxiter, func, args, dtype,
+                     pre_func_eval, post_func_eval,
+                     check_termination, post_termination_check,
+                     customize_result, res_work_pairs)
@@ -0,0 +1,524 @@
+import numpy as np
+from ._zeros_py import _xtol, _rtol, _iter
+import scipy._lib._elementwise_iterative_method as eim
+from scipy._lib._util import _RichResult
+
+def _chandrupatla(func, a, b, *, args=(), xatol=_xtol, xrtol=_rtol,
+                  fatol=None, frtol=0, maxiter=_iter, callback=None):
+    """Find the root of an elementwise function using Chandrupatla's algorithm.
+
+    For each element of the output of `func`, `_chandrupatla` seeks the scalar
+    root that makes the element 0. This function allows for `a`, `b`, and the
+    output of `func` to be of any broadcastable shapes.
+
+    Parameters
+    ----------
+    func : callable
+        The function whose root is desired. The signature must be::
+
+            func(x: ndarray, *args) -> ndarray
+
+        where each element of ``x`` is a finite real and ``args`` is a tuple,
+        which may contain an arbitrary number of components of any type(s).
+        ``func`` must be an elementwise function: each element ``func(x)[i]``
+        must equal ``func(x[i])`` for all indices ``i``. `_chandrupatla`
+        seeks an array ``x`` such that ``func(x)`` is an array of zeros.
+    a, b : array_like
+        The lower and upper bounds of the root of the function. Must be
+        broadcastable with one another.
+    args : tuple, optional
+        Additional positional arguments to be passed to `func`.
+    xatol, xrtol, fatol, frtol : float, optional
+        Absolute and relative tolerances on the root and function value.
+        See Notes for details.
+    maxiter : int, optional
+        The maximum number of iterations of the algorithm to perform.
+    callback : callable, optional
+        An optional user-supplied function to be called before the first
+        iteration and after each iteration.
+        Called as ``callback(res)``, where ``res`` is a ``_RichResult``
+        similar to that returned by `_chandrupatla` (but containing the current
+        iterate's values of all variables). If `callback` raises a
+        ``StopIteration``, the algorithm will terminate immediately and
+        `_chandrupatla` will return a result.
+
+    Returns
+    -------
+    res : _RichResult
+        An instance of `scipy._lib._util._RichResult` with the following
+        attributes. The descriptions are written as though the values will be
+        scalars; however, if `func` returns an array, the outputs will be
+        arrays of the same shape.
+
+        x : float
+            The root of the function, if the algorithm terminated successfully.
+        nfev : int
+            The number of times the function was called to find the root.
+        nit : int
+            The number of iterations of Chandrupatla's algorithm performed.
+        status : int
+            An integer representing the exit status of the algorithm.
+
+            - ``0`` : The algorithm converged to the specified tolerances.
+            - ``-1`` : The algorithm encountered an invalid bracket.
+            - ``-2`` : The maximum number of iterations was reached.
+            - ``-3`` : A non-finite value was encountered.
+            - ``-4`` : Iteration was terminated by `callback`.
+            - ``1`` : The algorithm is proceeding normally (in `callback` only).
+
+        success : bool
+            ``True`` when the algorithm terminated successfully (status ``0``).
+        fun : float
+            The value of `func` evaluated at `x`.
+        xl, xr : float
+            The lower and upper ends of the bracket.
+        fl, fr : float
+            The function value at the lower and upper ends of the bracket.
+
+    Notes
+    -----
+    Implemented based on Chandrupatla's original paper [1]_.
+
+    If ``xl`` and ``xr`` are the left and right ends of the bracket,
+    ``xmin = xl if abs(func(xl)) <= abs(func(xr)) else xr``,
+    and ``fmin0 = min(abs(func(a)), abs(func(b)))``, then the algorithm is
+    considered to have converged when
+    ``abs(xr - xl) < xatol + abs(xmin) * xrtol`` or
+    ``abs(func(xmin)) <= fatol + fmin0 * frtol``. This is equivalent to the
+    termination condition described in [1]_ with ``xrtol = 4e-10``,
+    ``xatol = 1e-5``, and ``fatol = frtol = 0``. The default values are
+    ``xatol = 2e-12``, ``xrtol = 4 * np.finfo(float).eps``, ``frtol = 0``,
+    and ``fatol`` is the smallest normal number of the ``dtype`` returned
+    by ``func``.
+
+    References
+    ----------
+
+    .. [1] Chandrupatla, Tirupathi R.
+        "A new hybrid quadratic/bisection algorithm for finding the zero of a
+        nonlinear function without using derivatives".
+        Advances in Engineering Software, 28(3), 145-149.
+        https://doi.org/10.1016/s0965-9978(96)00051-8
+
+    See Also
+    --------
+    brentq, brenth, ridder, bisect, newton
+
+    Examples
+    --------
+    >>> from scipy import optimize
+    >>> def f(x, c):
+    ...     return x**3 - 2*x - c
+    >>> c = 5
+    >>> res = optimize._chandrupatla._chandrupatla(f, 0, 3, args=(c,))
+    >>> res.x
+    2.0945514818937463
+
+    >>> c = [3, 4, 5]
+    >>> res = optimize._chandrupatla._chandrupatla(f, 0, 3, args=(c,))
+    >>> res.x
+    array([1.8932892 , 2.        , 2.09455148])
+
+    """
+    res = _chandrupatla_iv(func, args, xatol, xrtol,
+                           fatol, frtol, maxiter, callback)
+    func, args, xatol, xrtol, fatol, frtol, maxiter, callback = res
+
+    # Initialization
+    temp = eim._initialize(func, (a, b), args)
+    func, xs, fs, args, shape, dtype = temp
+    x1, x2 = xs
+    f1, f2 = fs
+    status = np.full_like(x1, eim._EINPROGRESS, dtype=int)  # in progress
+    nit, nfev = 0, 2  # two function evaluations performed above
+    xatol = _xtol if xatol is None else xatol
+    xrtol = _rtol if xrtol is None else xrtol
+    fatol = np.finfo(dtype).tiny if fatol is None else fatol
+    frtol = frtol * np.minimum(np.abs(f1), np.abs(f2))
+    work = _RichResult(x1=x1, f1=f1, x2=x2, f2=f2, x3=None, f3=None, t=0.5,
+                       xatol=xatol, xrtol=xrtol, fatol=fatol, frtol=frtol,
+                       nit=nit, nfev=nfev, status=status)
+    res_work_pairs = [('status', 'status'), ('x', 'xmin'), ('fun', 'fmin'),
+                      ('nit', 'nit'), ('nfev', 'nfev'), ('xl', 'x1'),
+                      ('fl', 'f1'), ('xr', 'x2'), ('fr', 'f2')]
+
+    def pre_func_eval(work):
+        # [1] Figure 1 (first box)
+        x = work.x1 + work.t * (work.x2 - work.x1)
+        return x
+
+    def post_func_eval(x, f, work):
+        # [1] Figure 1 (first diamond and boxes)
+        # Note: y/n are reversed in figure; compare to BASIC in appendix
+        work.x3, work.f3 = work.x2.copy(), work.f2.copy()
+        j = np.sign(f) == np.sign(work.f1)
+        nj = ~j
+        work.x3[j], work.f3[j] = work.x1[j], work.f1[j]
+        work.x2[nj], work.f2[nj] = work.x1[nj], work.f1[nj]
+        work.x1, work.f1 = x, f
+
+    def check_termination(work):
+        # [1] Figure 1 (second diamond)
+        # Check for all terminal conditions and record statuses.
+
+        # See [1] Section 4 (first two sentences)
+        i = np.abs(work.f1) < np.abs(work.f2)
+        work.xmin = np.choose(i, (work.x2, work.x1))
+        work.fmin = np.choose(i, (work.f2, work.f1))
+        stop = np.zeros_like(work.x1, dtype=bool)  # termination condition met
+
+        # This is the convergence criterion used in bisect. Chandrupatla's
+        # criterion is equivalent to this except with a factor of 4 on `xrtol`.
+        work.dx = abs(work.x2 - work.x1)
+        work.tol = abs(work.xmin) * work.xrtol + work.xatol
+        i = work.dx < work.tol
+        # Modify in place to incorporate tolerance on function value. Note that
+        # `frtol` has been redefined as
+        # `frtol = frtol * np.minimum(np.abs(f1), np.abs(f2))`, where `f1` and
+        # `f2` are the function evaluated at the original ends of the bracket.
+        i |= np.abs(work.fmin) <= work.fatol + work.frtol
+        work.status[i] = eim._ECONVERGED
+        stop[i] = True
+
+        i = (np.sign(work.f1) == np.sign(work.f2)) & ~stop
+        work.xmin[i], work.fmin[i], work.status[i] = np.nan, np.nan, eim._ESIGNERR
+        stop[i] = True
+
+        i = ~((np.isfinite(work.x1) & np.isfinite(work.x2)
+               & np.isfinite(work.f1) & np.isfinite(work.f2)) | stop)
+        work.xmin[i], work.fmin[i], work.status[i] = np.nan, np.nan, eim._EVALUEERR
+        stop[i] = True
+
+        return stop
+
+    def post_termination_check(work):
+        # [1] Figure 1 (third diamond and boxes / Equation 1)
+        xi1 = (work.x1 - work.x2) / (work.x3 - work.x2)
+        phi1 = (work.f1 - work.f2) / (work.f3 - work.f2)
+        alpha = (work.x3 - work.x1) / (work.x2 - work.x1)
+        j = ((1 - np.sqrt(1 - xi1)) < phi1) & (phi1 < np.sqrt(xi1))
+
+        f1j, f2j, f3j, alphaj = work.f1[j], work.f2[j], work.f3[j], alpha[j]
+        t = np.full_like(alpha, 0.5)
+        t[j] = (f1j / (f1j - f2j) * f3j / (f3j - f2j)
+                - alphaj * f1j / (f3j - f1j) * f2j / (f2j - f3j))
+
+        # [1] Figure 1 (last box; see also BASIC in appendix with comment
+        # "Adjust T Away from the Interval Boundary")
+        tl = 0.5 * work.tol / work.dx
+        work.t = np.clip(t, tl, 1 - tl)
+
+    def customize_result(res, shape):
+        xl, xr, fl, fr = res['xl'], res['xr'], res['fl'], res['fr']
+        i = res['xl'] < res['xr']
+        res['xl'] = np.choose(i, (xr, xl))
+        res['xr'] = np.choose(i, (xl, xr))
+        res['fl'] = np.choose(i, (fr, fl))
+        res['fr'] = np.choose(i, (fl, fr))
+        return shape
+
+    return eim._loop(work, callback, shape, maxiter, func, args, dtype,
+                     pre_func_eval, post_func_eval, check_termination,
+                     post_termination_check, customize_result, res_work_pairs)
+
+
+def _chandrupatla_iv(func, args, xatol, xrtol,
+                     fatol, frtol, maxiter, callback):
+    # Input validation for `_chandrupatla`
+
+    if not callable(func):
+        raise ValueError('`func` must be callable.')
+
+    if not np.iterable(args):
+        args = (args,)
+
+    tols = np.asarray([xatol if xatol is not None else 1,
+                       xrtol if xrtol is not None else 1,
+                       fatol if fatol is not None else 1,
+                       frtol if frtol is not None else 1])
+    if (not np.issubdtype(tols.dtype, np.number) or np.any(tols < 0)
+            or np.any(np.isnan(tols)) or tols.shape != (4,)):
+        raise ValueError('Tolerances must be non-negative scalars.')
+
+    maxiter_int = int(maxiter)
+    if maxiter != maxiter_int or maxiter < 0:
+        raise ValueError('`maxiter` must be a non-negative integer.')
+
+    if callback is not None and not callable(callback):
+        raise ValueError('`callback` must be callable.')
+
+    return func, args, xatol, xrtol, fatol, frtol, maxiter, callback
+
+
+def _chandrupatla_minimize(func, x1, x2, x3, *, args=(), xatol=None,
+                           xrtol=None, fatol=None, frtol=None, maxiter=100,
+                           callback=None):
+    """Find the minimizer of an elementwise function.
+
+    For each element of the output of `func`, `_chandrupatla_minimize` seeks
+    the scalar minimizer that minimizes the element. This function allows for
+    `x1`, `x2`, `x3`, and the elements of `args` to be arrays of any
+    broadcastable shapes.
+
+    Parameters
+    ----------
+    func : callable
+        The function whose minimizer is desired. The signature must be::
+
+            func(x: ndarray, *args) -> ndarray
+
+        where each element of ``x`` is a finite real and ``args`` is a tuple,
+        which may contain an arbitrary number of arrays that are broadcastable
+        with `x`. ``func`` must be an elementwise function: each element
+        ``func(x)[i]`` must equal ``func(x[i])`` for all indices ``i``.
+        `_chandrupatla_minimize` seeks an array ``x`` such that ``func(x)`` is
+        an array of minima.
+    x1, x2, x3 : array_like
+        The abscissae of a standard scalar minimization bracket. A bracket is
+        valid if ``x1 < x2 < x3`` and ``func(x1) > func(x2) <= func(x3)``.
+        Must be broadcastable with one another and `args`.
+    args : tuple, optional
+        Additional positional arguments to be passed to `func`.  Must be arrays
+        broadcastable with `x1`, `x2`, and `x3`. If the callable to be
+        minimized requires arguments that are not broadcastable with `x`,
+        wrap that callable with `func` such that `func` accepts only `x` and
+        broadcastable arrays.
+    xatol, xrtol, fatol, frtol : float, optional
+        Absolute and relative tolerances on the minimizer and function value.
+        See Notes for details.
+    maxiter : int, optional
+        The maximum number of iterations of the algorithm to perform.
+    callback : callable, optional
+        An optional user-supplied function to be called before the first
+        iteration and after each iteration.
+        Called as ``callback(res)``, where ``res`` is a ``_RichResult``
+        similar to that returned by `_chandrupatla_minimize` (but containing
+        the current iterate's values of all variables). If `callback` raises a
+        ``StopIteration``, the algorithm will terminate immediately and
+        `_chandrupatla_minimize` will return a result.
+
+    Returns
+    -------
+    res : _RichResult
+        An instance of `scipy._lib._util._RichResult` with the following
+        attributes. (The descriptions are written as though the values will be
+        scalars; however, if `func` returns an array, the outputs will be
+        arrays of the same shape.)
+
+        success : bool
+            ``True`` when the algorithm terminated successfully (status ``0``).
+        status : int
+            An integer representing the exit status of the algorithm.
+
+            - ``0`` : The algorithm converged to the specified tolerances.
+            - ``-1`` : The algorithm encountered an invalid bracket.
+            - ``-2`` : The maximum number of iterations was reached.
+            - ``-3`` : A non-finite value was encountered.
+            - ``-4`` : Iteration was terminated by `callback`.
+            - ``1`` : The algorithm is proceeding normally (in `callback` only).
+
+        x : float
+            The minimizer of the function, if the algorithm terminated
+            successfully.
+        fun : float
+            The value of `func` evaluated at `x`.
+        nfev : int
+            The number of points at which `func` was evaluated.
+        nit : int
+            The number of iterations of the algorithm that were performed.
+        xl, xm, xr : float
+            The final three-point bracket.
+        fl, fm, fr : float
+            The function value at the bracket points.
+
+    Notes
+    -----
+    Implemented based on Chandrupatla's original paper [1]_.
+
+    If ``x1 < x2 < x3`` are the points of the bracket and ``f1 > f2 <= f3``
+    are the values of ``func`` at those points, then the algorithm is
+    considered to have converged when ``x3 - x1 <= abs(x2)*xrtol + xatol``
+    or ``(f1 - 2*f2 + f3)/2 <= abs(f2)*frtol + fatol``. Note that the first of
+    these differs from the termination conditions described in [1]_. The
+    default value of `xrtol` is the square root of the precision of the
+    appropriate dtype, and ``xatol = fatol = frtol`` is the smallest normal
+    number of the appropriate dtype.
+
+    References
+    ----------
+    .. [1] Chandrupatla, Tirupathi R. (1998).
+        "An efficient quadratic fit-sectioning algorithm for minimization
+        without derivatives".
+        Computer Methods in Applied Mechanics and Engineering, 152 (1-2),
+        211-217. https://doi.org/10.1016/S0045-7825(97)00190-4
+
+    See Also
+    --------
+    golden, brent, bounded
+
+    Examples
+    --------
+    >>> from scipy.optimize._chandrupatla import _chandrupatla_minimize
+    >>> def f(x, args=1):
+    ...     return (x - args)**2
+    >>> res = _chandrupatla_minimize(f, -5, 0, 5)
+    >>> res.x
+    1.0
+    >>> c = [1, 1.5, 2]
+    >>> res = _chandrupatla_minimize(f, -5, 0, 5, args=(c,))
+    >>> res.x
+    array([1. , 1.5, 2. ])
+    """
+    res = _chandrupatla_iv(func, args, xatol, xrtol,
+                           fatol, frtol, maxiter, callback)
+    func, args, xatol, xrtol, fatol, frtol, maxiter, callback = res
+
+    # Initialization
+    xs = (x1, x2, x3)
+    temp = eim._initialize(func, xs, args)
+    func, xs, fs, args, shape, dtype = temp  # line split for PEP8
+    x1, x2, x3 = xs
+    f1, f2, f3 = fs
+    phi = dtype.type(0.5 + 0.5*5**0.5)  # golden ratio
+    status = np.full_like(x1, eim._EINPROGRESS, dtype=int)  # in progress
+    nit, nfev = 0, 3  # three function evaluations performed above
+    fatol = np.finfo(dtype).tiny if fatol is None else fatol
+    frtol = np.finfo(dtype).tiny if frtol is None else frtol
+    xatol = np.finfo(dtype).tiny if xatol is None else xatol
+    xrtol = np.sqrt(np.finfo(dtype).eps) if xrtol is None else xrtol
+
+    # Ensure that x1 < x2 < x3 initially.
+    xs, fs = np.vstack((x1, x2, x3)), np.vstack((f1, f2, f3))
+    i = np.argsort(xs, axis=0)
+    x1, x2, x3 = np.take_along_axis(xs, i, axis=0)
+    f1, f2, f3 = np.take_along_axis(fs, i, axis=0)
+    q0 = x3.copy()  # "At the start, q0 is set at x3..." ([1] after (7))
+
+    work = _RichResult(x1=x1, f1=f1, x2=x2, f2=f2, x3=x3, f3=f3, phi=phi,
+                       xatol=xatol, xrtol=xrtol, fatol=fatol, frtol=frtol,
+                       nit=nit, nfev=nfev, status=status, q0=q0, args=args)
+    res_work_pairs = [('status', 'status'),
+                      ('x', 'x2'), ('fun', 'f2'),
+                      ('nit', 'nit'), ('nfev', 'nfev'),
+                      ('xl', 'x1'), ('xm', 'x2'), ('xr', 'x3'),
+                      ('fl', 'f1'), ('fm', 'f2'), ('fr', 'f3')]
+
+    def pre_func_eval(work):
+        # `_check_termination` is called first -> `x3 - x2 > x2 - x1`
+        # But let's calculate a few terms that we'll reuse
+        x21 = work.x2 - work.x1
+        x32 = work.x3 - work.x2
+
+        # [1] Section 3. "The quadratic minimum point Q1 is calculated using
+        # the relations developed in the previous section." [1] Section 2 (5/6)
+        A = x21 * (work.f3 - work.f2)
+        B = x32 * (work.f1 - work.f2)
+        C = A / (A + B)
+        # q1 = C * (work.x1 + work.x2) / 2 + (1 - C) * (work.x2 + work.x3) / 2
+        q1 = 0.5 * (C*(work.x1 - work.x3) + work.x2 + work.x3)  # much faster
+        # this is an array, so multiplying by 0.5 does not change dtype
+
+        # "If Q1 and Q0 are sufficiently close... Q1 is accepted if it is
+        # sufficiently away from the inside point x2"
+        i = abs(q1 - work.q0) < 0.5 * abs(x21)  # [1] (7)
+        xi = q1[i]
+        # Later, after (9), "If the point Q1 is in a +/- xtol neighborhood of
+        # x2, the new point is chosen in the larger interval at a distance
+        # tol away from x2."
+        # See also QBASIC code after "Accept Ql adjust if close to X2".
+        j = abs(q1[i] - work.x2[i]) <= work.xtol[i]
+        xi[j] = work.x2[i][j] + np.sign(x32[i][j]) * work.xtol[i][j]
+
+        # "If condition (7) is not satisfied, golden sectioning of the larger
+        # interval is carried out to introduce the new point."
+        # (For simplicity, we go ahead and calculate it for all points, but we
+        # change the elements for which the condition was satisfied.)
+        x = work.x2 + (2 - work.phi) * x32
+        x[i] = xi
+
+        # "We define Q0 as the value of Q1 at the previous iteration."
+        work.q0 = q1
+        return x
+
+    def post_func_eval(x, f, work):
+        # Standard logic for updating a three-point bracket based on a new
+        # point. In QBASIC code, see "IF SGN(X-X2) = SGN(X3-X2) THEN...".
+        # There is an awful lot of data copying going on here; this would
+        # probably benefit from code optimization or implementation in Pythran.
+        i = np.sign(x - work.x2) == np.sign(work.x3 - work.x2)
+        xi, x1i, x2i, x3i = x[i], work.x1[i], work.x2[i], work.x3[i]
+        fi, f1i, f2i, f3i = f[i], work.f1[i], work.f2[i], work.f3[i]
+        j = fi > f2i
+        x3i[j], f3i[j] = xi[j], fi[j]
+        j = ~j
+        x1i[j], f1i[j], x2i[j], f2i[j] = x2i[j], f2i[j], xi[j], fi[j]
+
+        ni = ~i
+        xni, x1ni, x2ni, x3ni = x[ni], work.x1[ni], work.x2[ni], work.x3[ni]
+        fni, f1ni, f2ni, f3ni = f[ni], work.f1[ni], work.f2[ni], work.f3[ni]
+        j = fni > f2ni
+        x1ni[j], f1ni[j] = xni[j], fni[j]
+        j = ~j
+        x3ni[j], f3ni[j], x2ni[j], f2ni[j] = x2ni[j], f2ni[j], xni[j], fni[j]
+
+        work.x1[i], work.x2[i], work.x3[i] = x1i, x2i, x3i
+        work.f1[i], work.f2[i], work.f3[i] = f1i, f2i, f3i
+        work.x1[ni], work.x2[ni], work.x3[ni] = x1ni, x2ni, x3ni
+        work.f1[ni], work.f2[ni], work.f3[ni] = f1ni, f2ni, f3ni
+
+    def check_termination(work):
+        # Check for all terminal conditions and record statuses.
+        stop = np.zeros_like(work.x1, dtype=bool)  # termination condition met
+
+        # Bracket is invalid; stop and don't return minimizer/minimum
+        i = ((work.f2 > work.f1) | (work.f2 > work.f3))
+        work.x2[i], work.f2[i] = np.nan, np.nan
+        stop[i], work.status[i] = True, eim._ESIGNERR
+
+        # Non-finite values; stop and don't return minimizer/minimum
+        finite = np.isfinite(work.x1+work.x2+work.x3+work.f1+work.f2+work.f3)
+        i = ~(finite | stop)
+        work.x2[i], work.f2[i] = np.nan, np.nan
+        stop[i], work.status[i] = True, eim._EVALUEERR
+
+        # [1] Section 3 "Points 1 and 3 are interchanged if necessary to make
+        # the (x2, x3) the larger interval."
+        # Note: I had used np.choose; this is much faster. This would be a good
+        # place to save e.g. `work.x3 - work.x2` for reuse, but I tried and
+        # didn't notice a speed boost, so let's keep it simple.
+        i = abs(work.x3 - work.x2) < abs(work.x2 - work.x1)
+        temp = work.x1[i]
+        work.x1[i] = work.x3[i]
+        work.x3[i] = temp
+        temp = work.f1[i]
+        work.f1[i] = work.f3[i]
+        work.f3[i] = temp
+
+        # [1] Section 3 (bottom of page 212)
+        # "We set a tolerance value xtol..."
+        work.xtol = abs(work.x2) * work.xrtol + work.xatol  # [1] (8)
+        # "The convergence based on interval is achieved when..."
+        # Note: Equality allowed in case of `xtol=0`
+        i = abs(work.x3 - work.x2) <= 2 * work.xtol  # [1] (9)
+
+        # "We define ftol using..."
+        ftol = abs(work.f2) * work.frtol + work.fatol  # [1] (10)
+        # "The convergence based on function values is achieved when..."
+        # Note 1: modify in place to incorporate tolerance on function value.
+        # Note 2: factor of 2 is not in the text; see QBASIC start of DO loop
+        i |= (work.f1 - 2 * work.f2 + work.f3) <= 2*ftol  # [1] (11)
+        i &= ~stop
+        stop[i], work.status[i] = True, eim._ECONVERGED
+
+        return stop
+
+    def post_termination_check(work):
+        pass
+
+    def customize_result(res, shape):
+        xl, xr, fl, fr = res['xl'], res['xr'], res['fl'], res['fr']
+        i = res['xl'] < res['xr']
+        res['xl'] = np.choose(i, (xr, xl))
+        res['xr'] = np.choose(i, (xl, xr))
+        res['fl'] = np.choose(i, (fr, fl))
+        res['fr'] = np.choose(i, (fl, fr))
+        return shape
+
+    return eim._loop(work, callback, shape, maxiter, func, args, dtype,
+                     pre_func_eval, post_func_eval, check_termination,
+                     post_termination_check, customize_result, res_work_pairs)
@@ -0,0 +1,316 @@
+"""
+Interface to Constrained Optimization By Linear Approximation
+
+Functions
+---------
+.. autosummary::
+   :toctree: generated/
+
+    fmin_cobyla
+
+"""
+
+import functools
+from threading import RLock
+
+import numpy as np
+from scipy.optimize import _cobyla as cobyla
+from ._optimize import (OptimizeResult, _check_unknown_options,
+    _prepare_scalar_function)
+try:
+    from itertools import izip
+except ImportError:
+    izip = zip
+
+__all__ = ['fmin_cobyla']
+
+# Workaround as _cobyla.minimize is not threadsafe
+# due to an unknown f2py bug and can segfault,
+# see gh-9658.
+_module_lock = RLock()
+def synchronized(func):
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        with _module_lock:
+            return func(*args, **kwargs)
+    return wrapper
+
+@synchronized
+def fmin_cobyla(func, x0, cons, args=(), consargs=None, rhobeg=1.0,
+                rhoend=1e-4, maxfun=1000, disp=None, catol=2e-4,
+                *, callback=None):
+    """
+    Minimize a function using the Constrained Optimization By Linear
+    Approximation (COBYLA) method. This method wraps a FORTRAN
+    implementation of the algorithm.
+
+    Parameters
+    ----------
+    func : callable
+        Function to minimize. In the form func(x, \\*args).
+    x0 : ndarray
+        Initial guess.
+    cons : sequence
+        Constraint functions; must all be ``>=0`` (a single function
+        if only 1 constraint). Each function takes the parameters `x`
+        as its first argument, and it can return either a single number or
+        an array or list of numbers.
+    args : tuple, optional
+        Extra arguments to pass to function.
+    consargs : tuple, optional
+        Extra arguments to pass to constraint functions (default of None means
+        use same extra arguments as those passed to func).
+        Use ``()`` for no extra arguments.
+    rhobeg : float, optional
+        Reasonable initial changes to the variables.
+    rhoend : float, optional
+        Final accuracy in the optimization (not precisely guaranteed). This
+        is a lower bound on the size of the trust region.
+    disp : {0, 1, 2, 3}, optional
+        Controls the frequency of output; 0 implies no output.
+    maxfun : int, optional
+        Maximum number of function evaluations.
+    catol : float, optional
+        Absolute tolerance for constraint violations.
+    callback : callable, optional
+        Called after each iteration, as ``callback(x)``, where ``x`` is the
+        current parameter vector.
+
+    Returns
+    -------
+    x : ndarray
+        The argument that minimises `func`.
+
+    See also
+    --------
+    minimize: Interface to minimization algorithms for multivariate
+        functions. See the 'COBYLA' `method` in particular.
+
+    Notes
+    -----
+    This algorithm is based on linear approximations to the objective
+    function and each constraint. We briefly describe the algorithm.
+
+    Suppose the function is being minimized over k variables. At the
+    jth iteration the algorithm has k+1 points v_1, ..., v_(k+1),
+    an approximate solution x_j, and a radius RHO_j. It forms affine
+    (i.e., linear plus a constant) approximations to the objective
+    function and constraint functions such that their function values
+    agree with the linear approximation on the k+1 points v_1,.., v_(k+1).
+    This gives a linear program to solve (where the linear approximations
+    of the constraint functions are constrained to be non-negative).
+
+    However, the linear approximations are likely only good
+    approximations near the current simplex, so the linear program is
+    given the further requirement that the solution, which
+    will become x_(j+1), must be within RHO_j from x_j. RHO_j only
+    decreases, never increases. The initial RHO_j is rhobeg and the
+    final RHO_j is rhoend. In this way COBYLA's iterations behave
+    like a trust region algorithm.
+
+    Additionally, the linear program may be inconsistent, or the
+    approximation may give poor improvement. For details about
+    how these issues are resolved, as well as how the points v_i are
+    updated, refer to the source code or the references below.
+
+
+    References
+    ----------
+    Powell M.J.D. (1994), "A direct search optimization method that models
+    the objective and constraint functions by linear interpolation.", in
+    Advances in Optimization and Numerical Analysis, eds. S. Gomez and
+    J-P Hennart, Kluwer Academic (Dordrecht), pp. 51-67
+
+    Powell M.J.D. (1998), "Direct search algorithms for optimization
+    calculations", Acta Numerica 7, 287-336
+
+    Powell M.J.D. (2007), "A view of algorithms for optimization without
+    derivatives", Cambridge University Technical Report DAMTP 2007/NA03
+
+
+    Examples
+    --------
+    Minimize the objective function f(x,y) = x*y subject
+    to the constraints x**2 + y**2 < 1 and y > 0::
+
+        >>> def objective(x):
+        ...     return x[0]*x[1]
+        ...
+        >>> def constr1(x):
+        ...     return 1 - (x[0]**2 + x[1]**2)
+        ...
+        >>> def constr2(x):
+        ...     return x[1]
+        ...
+        >>> from scipy.optimize import fmin_cobyla
+        >>> fmin_cobyla(objective, [0.0, 0.1], [constr1, constr2], rhoend=1e-7)
+        array([-0.70710685,  0.70710671])
+
+    The exact solution is (-sqrt(2)/2, sqrt(2)/2).
+
+
+
+    """
+    err = "cons must be a sequence of callable functions or a single"\
+          " callable function."
+    try:
+        len(cons)
+    except TypeError as e:
+        if callable(cons):
+            cons = [cons]
+        else:
+            raise TypeError(err) from e
+    else:
+        for thisfunc in cons:
+            if not callable(thisfunc):
+                raise TypeError(err)
+
+    if consargs is None:
+        consargs = args
+
+    # build constraints
+    con = tuple({'type': 'ineq', 'fun': c, 'args': consargs} for c in cons)
+
+    # options
+    opts = {'rhobeg': rhobeg,
+            'tol': rhoend,
+            'disp': disp,
+            'maxiter': maxfun,
+            'catol': catol,
+            'callback': callback}
+
+    sol = _minimize_cobyla(func, x0, args, constraints=con,
+                           **opts)
+    if disp and not sol['success']:
+        print(f"COBYLA failed to find a solution: {sol.message}")
+    return sol['x']
+
+
+@synchronized
+def _minimize_cobyla(fun, x0, args=(), constraints=(),
+                     rhobeg=1.0, tol=1e-4, maxiter=1000,
+                     disp=False, catol=2e-4, callback=None, bounds=None,
+                     **unknown_options):
+    """
+    Minimize a scalar function of one or more variables using the
+    Constrained Optimization BY Linear Approximation (COBYLA) algorithm.
+
+    Options
+    -------
+    rhobeg : float
+        Reasonable initial changes to the variables.
+    tol : float
+        Final accuracy in the optimization (not precisely guaranteed).
+        This is a lower bound on the size of the trust region.
+    disp : bool
+        Set to True to print convergence messages. If False,
+        `verbosity` is ignored and set to 0.
+    maxiter : int
+        Maximum number of function evaluations.
+    catol : float
+        Tolerance (absolute) for constraint violations.
+
+    """
+    _check_unknown_options(unknown_options)
+    maxfun = maxiter
+    rhoend = tol
+    iprint = int(bool(disp))
+
+    # check constraints
+    if isinstance(constraints, dict):
+        constraints = (constraints, )
+
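+    if bounds:
+        # `constraints` may be a tuple (the default, or a wrapped dict);
+        # copy it to a list so the bound constraints can be appended.
+        constraints = list(constraints)
+        i_lb = np.isfinite(bounds.lb)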
+        if np.any(i_lb):
+            def lb_constraint(x, *args, **kwargs):
+                return x[i_lb] - bounds.lb[i_lb]
+
+            constraints.append({'type': 'ineq', 'fun': lb_constraint})
+
+        i_ub = np.isfinite(bounds.ub)
+        if np.any(i_ub):
+            def ub_constraint(x):
+                return bounds.ub[i_ub] - x[i_ub]
+
+            constraints.append({'type': 'ineq', 'fun': ub_constraint})
+
+    for ic, con in enumerate(constraints):
+        # check type
+        try:
+            ctype = con['type'].lower()
+        except KeyError as e:
+            raise KeyError('Constraint %d has no type defined.' % ic) from e
+        except TypeError as e:
+            raise TypeError('Constraints must be defined using a '
+                            'dictionary.') from e
+        except AttributeError as e:
+            raise TypeError("Constraint's type must be a string.") from e
+        else:
+            if ctype != 'ineq':
+                raise ValueError("Constraints of type '%s' not handled by "
+                                 "COBYLA." % con['type'])
+
+        # check function
+        if 'fun' not in con:
+            raise KeyError('Constraint %d has no function defined.' % ic)
+
+        # check extra arguments
+        if 'args' not in con:
+            con['args'] = ()
+
+    # m is the total number of constraint values
+    # it takes into account that some constraints may be vector-valued
+    cons_lengths = []
+    for c in constraints:
+        f = c['fun'](x0, *c['args'])
+        try:
+            cons_length = len(f)
+        except TypeError:
+            cons_length = 1
+        cons_lengths.append(cons_length)
+    m = sum(cons_lengths)
+
+    # create the ScalarFunction, cobyla doesn't require derivative function
+    def _jac(x, *args):
+        return None
+
+    sf = _prepare_scalar_function(fun, x0, args=args, jac=_jac)
+
+    def calcfc(x, con):
+        f = sf.fun(x)
+        i = 0
+        for size, c in izip(cons_lengths, constraints):
+            con[i: i + size] = c['fun'](x, *c['args'])
+            i += size
+        return f
+
+    def wrapped_callback(x):
+        if callback is not None:
+            callback(np.copy(x))
+
+    info = np.zeros(4, np.float64)
+    xopt, info = cobyla.minimize(calcfc, m=m, x=np.copy(x0), rhobeg=rhobeg,
+                                  rhoend=rhoend, iprint=iprint, maxfun=maxfun,
+                                  dinfo=info, callback=wrapped_callback)
+
+    if info[3] > catol:
+        # Check constraint violation
+        info[0] = 4
+
+    return OptimizeResult(x=xopt,
+                          status=int(info[0]),
+                          success=info[0] == 1,
+                          message={1: 'Optimization terminated successfully.',
+                                   2: 'Maximum number of function evaluations '
+                                      'has been exceeded.',
+                                   3: 'Rounding errors are becoming damaging '
+                                      'in COBYLA subroutine.',
+                                   4: 'Did not converge to a solution '
+                                      'satisfying the constraints. See '
+                                      '`maxcv` for magnitude of violation.',
+                                   5: 'NaN result encountered.'
+                                   }.get(info[0], 'Unknown exit status.'),
+                          nfev=int(info[1]),
+                          fun=info[2],
+                          maxcv=info[3])
@@ -0,0 +1,590 @@
+"""Constraints definition for minimize."""
+import numpy as np
+from ._hessian_update_strategy import BFGS
+from ._differentiable_functions import (
+    VectorFunction, LinearVectorFunction, IdentityVectorFunction)
+from ._optimize import OptimizeWarning
+from warnings import warn, catch_warnings, simplefilter, filterwarnings
+from scipy.sparse import issparse
+
+
+def _arr_to_scalar(x):
+    # If x is a numpy array, return x.item().  This will
+    # fail if the array has more than one element.
+    return x.item() if isinstance(x, np.ndarray) else x
+
+
+class NonlinearConstraint:
+    """Nonlinear constraint on the variables.
+
+    The constraint has the general inequality form::
+
+        lb <= fun(x) <= ub
+
+    Here the vector of independent variables x is passed as ndarray of shape
+    (n,) and ``fun`` returns a vector with m components.
+
+    It is possible to use equal bounds to represent an equality constraint or
+    infinite bounds to represent a one-sided constraint.
+
+    Parameters
+    ----------
+    fun : callable
+        The function defining the constraint.
+        The signature is ``fun(x) -> array_like, shape (m,)``.
+    lb, ub : array_like
+        Lower and upper bounds on the constraint. Each array must have the
+        shape (m,) or be a scalar, in the latter case a bound will be the same
+        for all components of the constraint. Use ``np.inf`` with an
+        appropriate sign to specify a one-sided constraint.
+        Set components of `lb` and `ub` equal to represent an equality
+        constraint. Note that you can mix constraints of different types:
+        interval, one-sided or equality, by setting different components of
+        `lb` and `ub` as necessary.
+    jac : {callable, '2-point', '3-point', 'cs'}, optional
+        Method of computing the Jacobian matrix (an m-by-n matrix,
+        where element (i, j) is the partial derivative of f[i] with
+        respect to x[j]).  The keywords {'2-point', '3-point',
+        'cs'} select a finite difference scheme for the numerical estimation.
+        A callable must have the following signature:
+        ``jac(x) -> {ndarray, sparse matrix}, shape (m, n)``.
+        Default is '2-point'.
+    hess : {callable, '2-point', '3-point', 'cs', HessianUpdateStrategy, None}, optional
+        Method for computing the Hessian matrix. The keywords
+        {'2-point', '3-point', 'cs'} select a finite difference scheme for
+        numerical estimation. Alternatively, objects implementing the
+        `HessianUpdateStrategy` interface can be used to approximate the
+        Hessian. Currently available implementations are:
+
+            - `BFGS` (default option)
+            - `SR1`
+
+        A callable must return the Hessian matrix of ``dot(fun, v)`` and
+        must have the following signature:
+        ``hess(x, v) -> {LinearOperator, sparse matrix, array_like}, shape (n, n)``.
+        Here ``v`` is ndarray with shape (m,) containing Lagrange multipliers.
+    keep_feasible : array_like of bool, optional
+        Whether to keep the constraint components feasible throughout
+        iterations. A single value sets this property for all components.
+        Default is False. Has no effect for equality constraints.
+    finite_diff_rel_step : None or array_like, optional
+        Relative step size for the finite difference approximation. Default is
+        None, which will select a reasonable value automatically depending
+        on the finite difference scheme.
+    finite_diff_jac_sparsity : {None, array_like, sparse matrix}, optional
+        Defines the sparsity structure of the Jacobian matrix for finite
+        difference estimation, its shape must be (m, n). If the Jacobian has
+        only a few non-zero elements in *each* row, providing the sparsity
+        structure will greatly speed up the computations. A zero entry means
+        that a corresponding element in the Jacobian is identically zero.
+        If provided, forces the use of 'lsmr' trust-region solver.
+        If None (default) then dense differencing will be used.
+
+    Notes
+    -----
+    Finite difference schemes {'2-point', '3-point', 'cs'} may be used for
+    approximating either the Jacobian or the Hessian. We, however, do not allow
+    its use for approximating both simultaneously. Hence whenever the Jacobian
+    is estimated via finite-differences, we require the Hessian to be estimated
+    using one of the quasi-Newton strategies.
+
+    The scheme 'cs' is potentially the most accurate, but requires the function
+    to correctly handle complex inputs and be analytically continuable to the
+    complex plane. The scheme '3-point' is more accurate than '2-point' but
+    requires twice as many operations.
+
+    Examples
+    --------
+    Constrain ``x[0] < sin(x[1]) + 1.9``
+
+    >>> from scipy.optimize import NonlinearConstraint
+    >>> import numpy as np
+    >>> con = lambda x: x[0] - np.sin(x[1])
+    >>> nlc = NonlinearConstraint(con, -np.inf, 1.9)
+
+    """
+    def __init__(self, fun, lb, ub, jac='2-point', hess=BFGS(),
+                 keep_feasible=False, finite_diff_rel_step=None,
+                 finite_diff_jac_sparsity=None):
+        self.fun = fun
+        self.lb = lb
+        self.ub = ub
+        self.finite_diff_rel_step = finite_diff_rel_step
+        self.finite_diff_jac_sparsity = finite_diff_jac_sparsity
+        self.jac = jac
+        self.hess = hess
+        self.keep_feasible = keep_feasible
+
+
+class LinearConstraint:
+    """Linear constraint on the variables.
+
+    The constraint has the general inequality form::
+
+        lb <= A.dot(x) <= ub
+
+    Here the vector of independent variables x is passed as ndarray of shape
+    (n,) and the matrix A has shape (m, n).
+
+    It is possible to use equal bounds to represent an equality constraint or
+    infinite bounds to represent a one-sided constraint.
+
+    Parameters
+    ----------
+    A : {array_like, sparse matrix}, shape (m, n)
+        Matrix defining the constraint.
+    lb, ub : dense array_like, optional
+        Lower and upper limits on the constraint. Each array must have the
+        shape (m,) or be a scalar, in the latter case a bound will be the same
+        for all components of the constraint. Use ``np.inf`` with an
+        appropriate sign to specify a one-sided constraint.
+        Set components of `lb` and `ub` equal to represent an equality
+        constraint. Note that you can mix constraints of different types:
+        interval, one-sided or equality, by setting different components of
+        `lb` and `ub` as necessary. Defaults to ``lb = -np.inf``
+        and ``ub = np.inf`` (no limits).
+    keep_feasible : dense array_like of bool, optional
+        Whether to keep the constraint components feasible throughout
+        iterations. A single value sets this property for all components.
+        Default is False. Has no effect for equality constraints.
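+
+    Examples
+    --------
+    A minimal sketch; the matrix and limits below are illustrative values
+    chosen for this example. Constrain ``0 <= x[0] + x[1] <= 1`` and
+    ``x[0] - x[1] == 0``, then inspect the residuals at a feasible point.
+
+    >>> import numpy as np
+    >>> from scipy.optimize import LinearConstraint
+    >>> A = np.array([[1.0, 1.0], [1.0, -1.0]])
+    >>> lc = LinearConstraint(A, lb=[0.0, 0.0], ub=[1.0, 0.0])
+    >>> sl, sb = lc.residual(np.array([0.25, 0.25]))
+    >>> sl.tolist(), sb.tolist()
+    ([0.5, 0.0], [0.5, 0.0])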
+    """
+    def _input_validation(self):
+        if self.A.ndim != 2:
+            message = "`A` must have exactly two dimensions."
+            raise ValueError(message)
+
+        try:
+            shape = self.A.shape[0:1]
+            self.lb = np.broadcast_to(self.lb, shape)
+            self.ub = np.broadcast_to(self.ub, shape)
+            self.keep_feasible = np.broadcast_to(self.keep_feasible, shape)
+        except ValueError:
+            message = ("`lb`, `ub`, and `keep_feasible` must be broadcastable "
+                       "to shape `A.shape[0:1]`")
+            raise ValueError(message)
+
+    def __init__(self, A, lb=-np.inf, ub=np.inf, keep_feasible=False):
+        if not issparse(A):
+            # In some cases, if the constraint is not valid, this emits a
+            # VisibleDeprecationWarning about ragged nested sequences
+            # before eventually causing an error. `scipy.optimize.milp` would
+            # prefer that this just error out immediately so it can handle it
+            # rather than concerning the user.
+            with catch_warnings():
+                simplefilter("error")
+                self.A = np.atleast_2d(A).astype(np.float64)
+        else:
+            self.A = A
+        if issparse(lb) or issparse(ub):
+            raise ValueError("Constraint limits must be dense arrays.")
+        self.lb = np.atleast_1d(lb).astype(np.float64)
+        self.ub = np.atleast_1d(ub).astype(np.float64)
+
+        if issparse(keep_feasible):
+            raise ValueError("`keep_feasible` must be a dense array.")
+        self.keep_feasible = np.atleast_1d(keep_feasible).astype(bool)
+        self._input_validation()
+
+    def residual(self, x):
+        """
+        Calculate the residual between the constraint function and the limits
+
+        For a linear constraint of the form::
+
+            lb <= A@x <= ub
+
+        the lower and upper residuals between ``A@x`` and the limits are values
+        ``sl`` and ``sb`` such that::
+
+            lb + sl == A@x == ub - sb
+
+        When all elements of ``sl`` and ``sb`` are nonnegative, all elements of
+        the constraint are satisfied; a negative element in ``sl`` or ``sb``
+        indicates that the corresponding element of the constraint is not
+        satisfied.
+
+        Parameters
+        ----------
+        x : array_like
+            Vector of independent variables
+
+        Returns
+        -------
+        sl, sb : array-like
+            The lower and upper residuals
+        """
+        return self.A@x - self.lb, self.ub - self.A@x
+
+
+class Bounds:
+    """Bounds constraint on the variables.
+
+    The constraint has the general inequality form::
+
+        lb <= x <= ub
+
+    It is possible to use equal bounds to represent an equality constraint or
+    infinite bounds to represent a one-sided constraint.
+
+    Parameters
+    ----------
+    lb, ub : dense array_like, optional
+        Lower and upper bounds on independent variables. `lb`, `ub`, and
+        `keep_feasible` must be the same shape or broadcastable.
+        Set components of `lb` and `ub` equal
+        to fix a variable. Use ``np.inf`` with an appropriate sign to disable
+        bounds on all or some variables. Note that you can mix constraints of
+        different types: interval, one-sided or equality, by setting different
+        components of `lb` and `ub` as necessary. Defaults to ``lb = -np.inf``
+        and ``ub = np.inf`` (no bounds).
+    keep_feasible : dense array_like of bool, optional
+        Whether to keep the constraint components feasible throughout
+        iterations. Must be broadcastable with `lb` and `ub`.
+        Default is False. Has no effect for equality constraints.
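+
+    Examples
+    --------
+    A minimal sketch with illustrative values: bound ``x[0]`` to ``[0, 1]``
+    and leave ``x[1]`` unbounded above, then inspect the residuals at a
+    point inside the bounds.
+
+    >>> import numpy as np
+    >>> from scipy.optimize import Bounds
+    >>> b = Bounds(lb=[0.0, -1.0], ub=[1.0, np.inf])
+    >>> sl, sb = b.residual(np.array([0.5, 3.0]))
+    >>> sl.tolist(), sb.tolist()
+    ([0.5, 4.0], [0.5, inf])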
+    """
+    def _input_validation(self):
+        try:
+            res = np.broadcast_arrays(self.lb, self.ub, self.keep_feasible)
+            self.lb, self.ub, self.keep_feasible = res
+        except ValueError:
+            message = "`lb`, `ub`, and `keep_feasible` must be broadcastable."
+            raise ValueError(message)
+
+    def __init__(self, lb=-np.inf, ub=np.inf, keep_feasible=False):
+        if issparse(lb) or issparse(ub):
+            raise ValueError("Lower and upper bounds must be dense arrays.")
+        self.lb = np.atleast_1d(lb)
+        self.ub = np.atleast_1d(ub)
+
+        if issparse(keep_feasible):
+            raise ValueError("`keep_feasible` must be a dense array.")
+        self.keep_feasible = np.atleast_1d(keep_feasible).astype(bool)
+        self._input_validation()
+
+    def __repr__(self):
+        start = f"{type(self).__name__}({self.lb!r}, {self.ub!r}"
+        if np.any(self.keep_feasible):
+            end = f", keep_feasible={self.keep_feasible!r})"
+        else:
+            end = ")"
+        return start + end
+
+    def residual(self, x):
+        """Calculate the residual (slack) between the input and the bounds
+
+        For a bound constraint of the form::
+
+            lb <= x <= ub
+
+        the lower and upper residuals between `x` and the bounds are values
+        ``sl`` and ``sb`` such that::
+
+            lb + sl == x == ub - sb
+
+        When all elements of ``sl`` and ``sb`` are nonnegative, all elements of
+        ``x`` lie within the bounds; a negative element in ``sl`` or ``sb``
+        indicates that the corresponding element of ``x`` is out of bounds.
+
+        Parameters
+        ----------
+        x : array_like
+            Vector of independent variables
+
+        Returns
+        -------
+        sl, sb : array-like
+            The lower and upper residuals
+        """
+        return x - self.lb, self.ub - x
+
+
+class PreparedConstraint:
+    """Constraint prepared from a user defined constraint.
+
+    On creation it will check whether a constraint definition is valid and
+    the initial point is feasible. If created successfully, it will contain
+    the attributes listed below.
+
+    Parameters
+    ----------
+    constraint : {NonlinearConstraint, LinearConstraint, Bounds}
+        Constraint to check and prepare.
+    x0 : array_like
+        Initial vector of independent variables.
+    sparse_jacobian : bool or None, optional
+        If bool, then the Jacobian of the constraint will be converted
+        to the corresponding format if necessary. If None (default), such
+        conversion is not made.
+    finite_diff_bounds : 2-tuple, optional
+        Lower and upper bounds on the independent variables for the finite
+        difference approximation, if applicable. Defaults to no bounds.
+
+    Attributes
+    ----------
+    fun : {VectorFunction, LinearVectorFunction, IdentityVectorFunction}
+        Function defining the constraint wrapped by one of the convenience
+        classes.
+    bounds : 2-tuple
+        Contains lower and upper bounds for the constraints --- lb and ub.
+        These are converted to ndarray and have a size equal to the number of
+        the constraints.
+    keep_feasible : ndarray
+         Array indicating which components must be kept feasible with a size
+         equal to the number of the constraints.
+    """
+    def __init__(self, constraint, x0, sparse_jacobian=None,
+                 finite_diff_bounds=(-np.inf, np.inf)):
+        if isinstance(constraint, NonlinearConstraint):
+            fun = VectorFunction(constraint.fun, x0,
+                                 constraint.jac, constraint.hess,
+                                 constraint.finite_diff_rel_step,
+                                 constraint.finite_diff_jac_sparsity,
+                                 finite_diff_bounds, sparse_jacobian)
+        elif isinstance(constraint, LinearConstraint):
+            fun = LinearVectorFunction(constraint.A, x0, sparse_jacobian)
+        elif isinstance(constraint, Bounds):
+            fun = IdentityVectorFunction(x0, sparse_jacobian)
+        else:
+            raise ValueError("`constraint` of an unknown type is passed.")
+
+        m = fun.m
+
+        lb = np.asarray(constraint.lb, dtype=float)
+        ub = np.asarray(constraint.ub, dtype=float)
+        keep_feasible = np.asarray(constraint.keep_feasible, dtype=bool)
+
+        lb = np.broadcast_to(lb, m)
+        ub = np.broadcast_to(ub, m)
+        keep_feasible = np.broadcast_to(keep_feasible, m)
+
+        if keep_feasible.shape != (m,):
+            raise ValueError("`keep_feasible` has a wrong shape.")
+
+        mask = keep_feasible & (lb != ub)
+        f0 = fun.f
+        if np.any(f0[mask] < lb[mask]) or np.any(f0[mask] > ub[mask]):
+            raise ValueError("`x0` is infeasible with respect to some "
+                             "inequality constraint with `keep_feasible` "
+                             "set to True.")
+
+        self.fun = fun
+        self.bounds = (lb, ub)
+        self.keep_feasible = keep_feasible
+
+    def violation(self, x):
+        """How much the constraint is exceeded by.
+
+        Parameters
+        ----------
+        x : array_like
+            Vector of independent variables.
+
+        Returns
+        -------
+        excess : ndarray
+            Amount by which each of the constraints specified by
+            `PreparedConstraint.fun` is exceeded; zero where the
+            constraint is satisfied.
+        """
+        with catch_warnings():
+            # Ignore the following warning, it's not important when
+            # figuring out total violation
+            # UserWarning: delta_grad == 0.0. Check if the approximated
+            # function is linear
+            filterwarnings("ignore", "delta_grad", UserWarning)
+            ev = self.fun.fun(np.asarray(x))
+
+        excess_lb = np.maximum(self.bounds[0] - ev, 0)
+        excess_ub = np.maximum(ev - self.bounds[1], 0)
+
+        return excess_lb + excess_ub
+
+
+def new_bounds_to_old(lb, ub, n):
+    """Convert the new bounds representation to the old one.
+
+    The new representation is a tuple (lb, ub) and the old one is a list
+    of n tuples, the ith containing the lower and upper bounds on the ith
+    variable.
+    Entries of lb equal to -np.inf and entries of ub equal to np.inf are
+    replaced by None.
+    """
+    lb = np.broadcast_to(lb, n)
+    ub = np.broadcast_to(ub, n)
+
+    lb = [float(x) if x > -np.inf else None for x in lb]
+    ub = [float(x) if x < np.inf else None for x in ub]
+
+    return list(zip(lb, ub))
+
+
+def old_bound_to_new(bounds):
+    """Convert the old bounds representation to the new one.
+
+    The new representation is a tuple (lb, ub) and the old one is a list
+    of n tuples, the ith containing the lower and upper bounds on the ith
+    variable.
+    Entries of lb/ub that are None are replaced by -np.inf/np.inf.
+    """
+    lb, ub = zip(*bounds)
+
+    # Convert occurrences of None to -inf or inf, and replace occurrences of
+    # any numpy array x with x.item(). Then wrap the results in numpy arrays.
+    lb = np.array([float(_arr_to_scalar(x)) if x is not None else -np.inf
+                   for x in lb])
+    ub = np.array([float(_arr_to_scalar(x)) if x is not None else np.inf
+                   for x in ub])
+
+    return lb, ub
+
+
+def strict_bounds(lb, ub, keep_feasible, n_vars):
+    """Remove bounds which are not asked to be kept feasible."""
+    strict_lb = np.resize(lb, n_vars).astype(float)
+    strict_ub = np.resize(ub, n_vars).astype(float)
+    keep_feasible = np.resize(keep_feasible, n_vars)
+    strict_lb[~keep_feasible] = -np.inf
+    strict_ub[~keep_feasible] = np.inf
+    return strict_lb, strict_ub
+
+
+def new_constraint_to_old(con, x0):
+    """
+    Converts new-style constraint objects to old-style constraint dictionaries.
+    """
+    if isinstance(con, NonlinearConstraint):
+        if (con.finite_diff_jac_sparsity is not None or
+                con.finite_diff_rel_step is not None or
+                not isinstance(con.hess, BFGS) or  # misses user specified BFGS
+                con.keep_feasible):
+            warn("Constraint options `finite_diff_jac_sparsity`, "
+                 "`finite_diff_rel_step`, `keep_feasible`, and `hess` "
+                 "are ignored by this method.",
+                 OptimizeWarning, stacklevel=3)
+
+        fun = con.fun
+        if callable(con.jac):
+            jac = con.jac
+        else:
+            jac = None
+
+    else:  # LinearConstraint
+        if np.any(con.keep_feasible):
+            warn("Constraint option `keep_feasible` is ignored by this method.",
+                 OptimizeWarning, stacklevel=3)
+
+        A = con.A
+        if issparse(A):
+            A = A.toarray()
+        def fun(x):
+            return np.dot(A, x)
+        def jac(x):
+            return A
+
+    # FIXME: when bugs in VectorFunction/LinearVectorFunction are worked out,
+    # use pcon.fun.fun and pcon.fun.jac. Until then, get fun/jac above.
+    pcon = PreparedConstraint(con, x0)
+    lb, ub = pcon.bounds
+
+    i_eq = lb == ub
+    i_bound_below = np.logical_xor(lb != -np.inf, i_eq)
+    i_bound_above = np.logical_xor(ub != np.inf, i_eq)
+    i_unbounded = np.logical_and(lb == -np.inf, ub == np.inf)
+
+    if np.any(i_unbounded):
+        warn("At least one constraint is unbounded above and below. Such "
+             "constraints are ignored.",
+             OptimizeWarning, stacklevel=3)
+
+    ceq = []
+    if np.any(i_eq):
+        def f_eq(x):
+            y = np.array(fun(x)).flatten()
+            return y[i_eq] - lb[i_eq]
+        ceq = [{"type": "eq", "fun": f_eq}]
+
+        if jac is not None:
+            def j_eq(x):
+                dy = jac(x)
+                if issparse(dy):
+                    dy = dy.toarray()
+                dy = np.atleast_2d(dy)
+                return dy[i_eq, :]
+            ceq[0]["jac"] = j_eq
+
+    cineq = []
+    n_bound_below = np.sum(i_bound_below)
+    n_bound_above = np.sum(i_bound_above)
+    if n_bound_below + n_bound_above:
+        def f_ineq(x):
+            y = np.zeros(n_bound_below + n_bound_above)
+            y_all = np.array(fun(x)).flatten()
+            y[:n_bound_below] = y_all[i_bound_below] - lb[i_bound_below]
+            y[n_bound_below:] = -(y_all[i_bound_above] - ub[i_bound_above])
+            return y
+        cineq = [{"type": "ineq", "fun": f_ineq}]
+
+        if jac is not None:
+            def j_ineq(x):
+                dy = np.zeros((n_bound_below + n_bound_above, len(x0)))
+                dy_all = jac(x)
+                if issparse(dy_all):
+                    dy_all = dy_all.toarray()
+                dy_all = np.atleast_2d(dy_all)
+                dy[:n_bound_below, :] = dy_all[i_bound_below]
+                dy[n_bound_below:, :] = -dy_all[i_bound_above]
+                return dy
+            cineq[0]["jac"] = j_ineq
+
+    old_constraints = ceq + cineq
+
+    if len(old_constraints) > 1:
+        warn("Equality and inequality constraints are specified in the same "
+             "element of the constraint list. For efficient use with this "
+             "method, equality and inequality constraints should be specified "
+             "in separate elements of the constraint list.",
+             OptimizeWarning, stacklevel=3)
+    return old_constraints
+
+
+def old_constraint_to_new(ic, con):
+    """
+    Converts old-style constraint dictionaries to new-style constraint objects.
+    """
+    # check type
+    try:
+        ctype = con['type'].lower()
+    except KeyError as e:
+        raise KeyError('Constraint %d has no type defined.' % ic) from e
+    except TypeError as e:
+        raise TypeError(
+            'Constraints must be a sequence of dictionaries.'
+        ) from e
+    except AttributeError as e:
+        raise TypeError("Constraint's type must be a string.") from e
+    else:
+        if ctype not in ['eq', 'ineq']:
+            raise ValueError("Unknown constraint type '%s'." % con['type'])
+    if 'fun' not in con:
+        raise ValueError('Constraint %d has no function defined.' % ic)
+
+    lb = 0
+    if ctype == 'eq':
+        ub = 0
+    else:
+        ub = np.inf
+
+    jac = '2-point'
+    if 'args' in con:
+        args = con['args']
+        def fun(x):
+            return con["fun"](x, *args)
+        if 'jac' in con:
+            def jac(x):
+                return con["jac"](x, *args)
+    else:
+        fun = con['fun']
+        if 'jac' in con:
+            jac = con['jac']
+
+    return NonlinearConstraint(fun, lb, ub, jac)
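
The bounds conversions and the row classification used by `new_constraint_to_old` above can be illustrated with a small standalone sketch. This is not the code from this commit: it uses plain lists and `math.inf` instead of NumPy arrays, skips broadcasting, and the names `bounds_new_to_old`, `bounds_old_to_new`, and `split_rows` are hypothetical stand-ins chosen for illustration only.

```python
import math


def bounds_new_to_old(lb, ub):
    """Pair per-variable bounds; infinite entries become None (sketch)."""
    old_lb = [float(v) if v > -math.inf else None for v in lb]
    old_ub = [float(v) if v < math.inf else None for v in ub]
    return list(zip(old_lb, old_ub))


def bounds_old_to_new(bounds):
    """Inverse mapping: None becomes -inf (lower) or +inf (upper)."""
    lb, ub = zip(*bounds)
    new_lb = [float(v) if v is not None else -math.inf for v in lb]
    new_ub = [float(v) if v is not None else math.inf for v in ub]
    return new_lb, new_ub


def split_rows(lb, ub):
    """Classify constraint rows as equality, bounded below, bounded above.

    Mirrors the mask logic in the diff: a row with lb == ub is an
    equality and is excluded from both inequality masks via XOR.
    """
    eq = [lo == hi for lo, hi in zip(lb, ub)]
    below = [(lo != -math.inf) != e for lo, e in zip(lb, eq)]
    above = [(hi != math.inf) != e for hi, e in zip(ub, eq)]
    return eq, below, above


lb = [0.0, -math.inf, 1.0, 2.0]
ub = [1.0, 5.0, math.inf, 2.0]

old = bounds_new_to_old(lb, ub)
round_trip = bounds_old_to_new(old)
eq, below, above = split_rows(lb, ub)
```

In this sketch the last row (`lb == ub == 2.0`) is treated as an equality, so it appears in neither inequality mask; the first row is bounded on both sides and therefore contributes to both.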
@@ -0,0 +1,728 @@
+import numpy as np
+
+"""
+# 2023 - ported from minpack2.dcsrch, dcstep (Fortran) to Python
+c     MINPACK-1 Project. June 1983.
+c     Argonne National Laboratory.
+c     Jorge J. More' and David J. Thuente.
+c
+c     MINPACK-2 Project. November 1993.
+c     Argonne National Laboratory and University of Minnesota.
+c     Brett M. Averick, Richard G. Carter, and Jorge J. More'.
+"""
+
+# NOTE this file was linted by black on first commit, and can be kept that way.
+
+
+class DCSRCH:
+    """
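+    Line search that finds a step satisfying a sufficient decrease
+    condition and a curvature condition.
+
+    Parameters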
+    ----------
+    phi : callable phi(alpha)
+        Function at point `alpha`
+    derphi : callable phi'(alpha)
+        Objective function derivative. Returns a scalar.
+    ftol : float
+        A nonnegative tolerance for the sufficient decrease condition.
+    gtol : float
+        A nonnegative tolerance for the curvature condition.
+    xtol : float
+        A nonnegative relative tolerance for an acceptable step. The
+        subroutine exits with a warning if the relative difference between
+        sty and stx is less than xtol.
+    stpmin : float
+        A nonnegative lower bound for the step.
+    stpmax : float
+        A nonnegative upper bound for the step.
+
+    Notes
+    -----
+
+    This subroutine finds a step that satisfies a sufficient
+    decrease condition and a curvature condition.
+
+    Each call of the subroutine updates an interval with
+    endpoints stx and sty. The interval is initially chosen
+    so that it contains a minimizer of the modified function
+
+           psi(stp) = f(stp) - f(0) - ftol*stp*f'(0).
+
+    If psi(stp) <= 0 and f'(stp) >= 0 for some step, then the
+    interval is chosen so that it contains a minimizer of f.
+
+    The algorithm is designed to find a step that satisfies
+    the sufficient decrease condition
+
+           f(stp) <= f(0) + ftol*stp*f'(0),
+
+    and the curvature condition
+
+           abs(f'(stp)) <= gtol*abs(f'(0)).
+
+    If ftol is less than gtol and if, for example, the function
+    is bounded below, then there is always a step which satisfies
+    both conditions.
+
+    If no step can be found that satisfies both conditions, then
+    the algorithm stops with a warning. In this case stp only
+    satisfies the sufficient decrease condition.
+
+    A typical invocation of dcsrch has the following outline:
+
+    Evaluate the function at stp = 0.0d0; store in f.
+    Evaluate the gradient at stp = 0.0d0; store in g.
+    Choose a starting step stp.
+
+    task = 'START'
+    10 continue
+        call dcsrch(stp,f,g,ftol,gtol,xtol,task,stpmin,stpmax,
+                   isave,dsave)
+        if (task .eq. 'FG') then
+           Evaluate the function and the gradient at stp
+           go to 10
+           end if
+
+    NOTE: The user must not alter work arrays between calls.
+
+    The subroutine statement is
+
+        subroutine dcsrch(stp,f,g,ftol,gtol,xtol,task,stpmin,stpmax,
+                         isave,dsave)
+
+    where
+
+    stp is a double precision variable.
+        On entry stp is the current estimate of a satisfactory
+            step. On initial entry, a positive initial estimate
+            must be provided.
+        On exit stp is the current estimate of a satisfactory step
+            if task = 'FG'. If task = 'CONV' then stp satisfies
+            the sufficient decrease and curvature conditions.
+
+    f is a double precision variable.
+        On initial entry f is the value of the function at 0.
+        On subsequent entries f is the value of the
+            function at stp.
+        On exit f is the value of the function at stp.
+
+    g is a double precision variable.
+        On initial entry g is the derivative of the function at 0.
+        On subsequent entries g is the derivative of the
+           function at stp.
+        On exit g is the derivative of the function at stp.
+
+    ftol is a double precision variable.
+        On entry ftol specifies a nonnegative tolerance for the
+           sufficient decrease condition.
+        On exit ftol is unchanged.
+
+    gtol is a double precision variable.
+        On entry gtol specifies a nonnegative tolerance for the
+           curvature condition.
+        On exit gtol is unchanged.
+
+    xtol is a double precision variable.
+        On entry xtol specifies a nonnegative relative tolerance
+          for an acceptable step. The subroutine exits with a
+          warning if the relative difference between sty and stx
+          is less than xtol.
+
+        On exit xtol is unchanged.
+
+    task is a character variable of length at least 60.
+        On initial entry task must be set to 'START'.
+        On exit task indicates the required action:
+
+           If task(1:2) = 'FG' then evaluate the function and
+           derivative at stp and call dcsrch again.
+
+           If task(1:4) = 'CONV' then the search is successful.
+
+           If task(1:4) = 'WARN' then the subroutine is not able
+           to satisfy the convergence conditions. The exit value of
+           stp contains the best point found during the search.
+
+          If task(1:5) = 'ERROR' then there is an error in the
+          input arguments.
+
+        On exit with convergence, a warning or an error, the
+           variable task contains additional information.
+
+    stpmin is a double precision variable.
+        On entry stpmin is a nonnegative lower bound for the step.
+        On exit stpmin is unchanged.
+
+    stpmax is a double precision variable.
+        On entry stpmax is a nonnegative upper bound for the step.
+        On exit stpmax is unchanged.
+
+    isave is an integer work array of dimension 2.
+
+    dsave is a double precision work array of dimension 13.
+
+    Subprograms called
+
+      MINPACK-2 ... dcstep
+
+    MINPACK-1 Project. June 1983.
+    Argonne National Laboratory.
+    Jorge J. More' and David J. Thuente.
+
+    MINPACK-2 Project. November 1993.
+    Argonne National Laboratory and University of Minnesota.
+    Brett M. Averick, Richard G. Carter, and Jorge J. More'.
+    """
+
+    def __init__(self, phi, derphi, ftol, gtol, xtol, stpmin, stpmax):
+        self.stage = None
+        self.ginit = None
+        self.gtest = None
+        self.gx = None
+        self.gy = None
+        self.finit = None
+        self.fx = None
+        self.fy = None
+        self.stx = None
+        self.sty = None
+        self.stmin = None
+        self.stmax = None
+        self.width = None
+        self.width1 = None
+
+        # leave all assessment of tolerances/limits to the first call of
+        # this object
+        self.ftol = ftol
+        self.gtol = gtol
+        self.xtol = xtol
+        self.stpmin = stpmin
+        self.stpmax = stpmax
+
+        self.phi = phi
+        self.derphi = derphi
+
+    def __call__(self, alpha1, phi0=None, derphi0=None, maxiter=100):
+        """
+        Parameters
+        ----------
+        alpha1 : float
+            alpha1 is the current estimate of a satisfactory
+            step. A positive initial estimate must be provided.
+        phi0 : float, optional
+            The value of `phi` at 0 (if known).
+        derphi0 : float, optional
+            The value of `derphi` at 0 (if known).
+        maxiter : int, optional
+            Maximum number of iterations to perform. Default is 100.
+
+        Returns
+        -------
+        alpha : float
+            Step size, or None if no suitable step was found.
+        phi : float
+            Value of `phi` at the new point `alpha`.
+        phi0 : float
+            Value of `phi` at `alpha=0`.
+        task : bytes
+            On exit task indicates status information.
+
+            If task[:4] == b'CONV' then the search is successful.
+
+            If task[:4] == b'WARN' then the subroutine is not able
+            to satisfy the convergence conditions, and the returned
+            `alpha` is None.
+
+            If task[:5] == b'ERROR' then there is an error in the
+            input arguments, and the returned `alpha` is None.
+        """
+        if phi0 is None:
+            phi0 = self.phi(0.0)
+        if derphi0 is None:
+            derphi0 = self.derphi(0.0)
+
+        phi1 = phi0
+        derphi1 = derphi0
+
+        task = b"START"
+        for i in range(maxiter):
+            stp, phi1, derphi1, task = self._iterate(
+                alpha1, phi1, derphi1, task
+            )
+
+            if not np.isfinite(stp):
+                task = b"WARN"
+                stp = None
+                break
+
+            if task[:2] == b"FG":
+                alpha1 = stp
+                phi1 = self.phi(stp)
+                derphi1 = self.derphi(stp)
+            else:
+                break
+        else:
+            # maxiter reached, the line search did not converge
+            stp = None
+            task = b"WARNING: dcsrch did not converge within max iterations"
+
+        if task[:5] == b"ERROR" or task[:4] == b"WARN":
+            stp = None  # failed
+
+        return stp, phi1, phi0, task
+
+    def _iterate(self, stp, f, g, task):
+        """
+        Parameters
+        ----------
+        stp : float
+            The current estimate of a satisfactory step. On initial entry, a
+            positive initial estimate must be provided.
+        f : float
+            On first call f is the value of the function at 0. On subsequent
+            entries f should be the value of the function at stp.
+        g : float
+            On initial entry g is the derivative of the function at 0. On
+            subsequent entries g is the derivative of the function at stp.
+        task : bytes
+            On initial entry task must be set to b'START'. On exit with
+            convergence, a warning or an error, task contains additional
+            information.
+
+        Returns
+        -------
+        stp, f, g, task: tuple
+
+            stp : float
+                The current estimate of a satisfactory step if task is
+                b'FG'. If task starts with b'CONV' then stp satisfies the
+                sufficient decrease and curvature conditions.
+            f : float
+                The value of the function at stp.
+            g : float
+                The derivative of the function at stp.
+            task : bytes
+                On exit task indicates the required action:
+
+                If task[:2] == b'FG' then evaluate the function and
+                derivative at stp and call `_iterate` again.
+
+                If task[:4] == b'CONV' then the search is successful.
+
+                If task[:4] == b'WARN' then the subroutine is not able
+                to satisfy the convergence conditions. The exit value of
+                stp contains the best point found during the search.
+
+                If task[:5] == b'ERROR' then there is an error in the
+                input arguments.
+        """
+        p5 = 0.5
+        p66 = 0.66
+        xtrapl = 1.1
+        xtrapu = 4.0
+
+        if task[:5] == b"START":
+            if stp < self.stpmin:
+                task = b"ERROR: STP .LT. STPMIN"
+            if stp > self.stpmax:
+                task = b"ERROR: STP .GT. STPMAX"
+            if g >= 0:
+                task = b"ERROR: INITIAL G .GE. ZERO"
+            if self.ftol < 0:
+                task = b"ERROR: FTOL .LT. ZERO"
+            if self.gtol < 0:
+                task = b"ERROR: GTOL .LT. ZERO"
+            if self.xtol < 0:
+                task = b"ERROR: XTOL .LT. ZERO"
+            if self.stpmin < 0:
+                task = b"ERROR: STPMIN .LT. ZERO"
+            if self.stpmax < self.stpmin:
+                task = b"ERROR: STPMAX .LT. STPMIN"
+
+            if task[:5] == b"ERROR":
+                return stp, f, g, task
+
+            # Initialize local variables.
+
+            self.brackt = False
+            self.stage = 1
+            self.finit = f
+            self.ginit = g
+            self.gtest = self.ftol * self.ginit
+            self.width = self.stpmax - self.stpmin
+            self.width1 = self.width / p5
+
+            # The variables stx, fx, gx contain the values of the step,
+            # function, and derivative at the best step.
+            # The variables sty, fy, gy contain the values of the step,
+            # function, and derivative at sty.
+            # The variables stp, f, g contain the values of the step,
+            # function, and derivative at stp.
+
+            self.stx = 0.0
+            self.fx = self.finit
+            self.gx = self.ginit
+            self.sty = 0.0
+            self.fy = self.finit
+            self.gy = self.ginit
+            self.stmin = 0
+            self.stmax = stp + xtrapu * stp
+            task = b"FG"
+            return stp, f, g, task
+
+        # In the original Fortran, saved variables were restored at this
+        # point. That is unnecessary here because they are attributes.
+
+        # If psi(stp) <= 0 and f'(stp) >= 0 for some step, then the
+        # algorithm enters the second stage.
+        ftest = self.finit + stp * self.gtest
+
+        if self.stage == 1 and f <= ftest and g >= 0:
+            self.stage = 2
+
+        # test for warnings
+        if self.brackt and (stp <= self.stmin or stp >= self.stmax):
+            task = b"WARNING: ROUNDING ERRORS PREVENT PROGRESS"
+        if self.brackt and self.stmax - self.stmin <= self.xtol * self.stmax:
+            task = b"WARNING: XTOL TEST SATISFIED"
+        if stp == self.stpmax and f <= ftest and g <= self.gtest:
+            task = b"WARNING: STP = STPMAX"
+        if stp == self.stpmin and (f > ftest or g >= self.gtest):
+            task = b"WARNING: STP = STPMIN"
+
+        # test for convergence
+        if f <= ftest and abs(g) <= self.gtol * -self.ginit:
+            task = b"CONVERGENCE"
+
+        # test for termination
+        if task[:4] == b"WARN" or task[:4] == b"CONV":
+            return stp, f, g, task
+
+        # A modified function is used to predict the step during the
+        # first stage if a lower function value has been obtained but
+        # the decrease is not sufficient.
+        if self.stage == 1 and f <= self.fx and f > ftest:
+            # Define the modified function and derivative values.
+            fm = f - stp * self.gtest
+            fxm = self.fx - self.stx * self.gtest
+            fym = self.fy - self.sty * self.gtest
+            gm = g - self.gtest
+            gxm = self.gx - self.gtest
+            gym = self.gy - self.gtest
+
+            # Call dcstep to update stx, sty, and to compute the new step.
+            # Several operations in dcstep can produce NaN or overflow
+            # (e.g. inf/inf); suppress the resulting NumPy warnings.
+            with np.errstate(invalid="ignore", over="ignore"):
+                tup = dcstep(
+                    self.stx,
+                    fxm,
+                    gxm,
+                    self.sty,
+                    fym,
+                    gym,
+                    stp,
+                    fm,
+                    gm,
+                    self.brackt,
+                    self.stmin,
+                    self.stmax,
+                )
+                self.stx, fxm, gxm, self.sty, fym, gym, stp, self.brackt = tup
+
+            # Reset the function and derivative values for f
+            self.fx = fxm + self.stx * self.gtest
+            self.fy = fym + self.sty * self.gtest
+            self.gx = gxm + self.gtest
+            self.gy = gym + self.gtest
+
+        else:
+            # Call dcstep to update stx, sty, and to compute the new step.
+            # Several operations in dcstep can produce NaN or overflow
+            # (e.g. inf/inf); suppress the resulting NumPy warnings.
+
+            with np.errstate(invalid="ignore", over="ignore"):
+                tup = dcstep(
+                    self.stx,
+                    self.fx,
+                    self.gx,
+                    self.sty,
+                    self.fy,
+                    self.gy,
+                    stp,
+                    f,
+                    g,
+                    self.brackt,
+                    self.stmin,
+                    self.stmax,
+                )
+            (
+                self.stx,
+                self.fx,
+                self.gx,
+                self.sty,
+                self.fy,
+                self.gy,
+                stp,
+                self.brackt,
+            ) = tup
+
+        # Decide if a bisection step is needed
+        if self.brackt:
+            if abs(self.sty - self.stx) >= p66 * self.width1:
+                stp = self.stx + p5 * (self.sty - self.stx)
+            self.width1 = self.width
+            self.width = abs(self.sty - self.stx)
+
+        # Set the minimum and maximum steps allowed for stp.
+        if self.brackt:
+            self.stmin = min(self.stx, self.sty)
+            self.stmax = max(self.stx, self.sty)
+        else:
+            self.stmin = stp + xtrapl * (stp - self.stx)
+            self.stmax = stp + xtrapu * (stp - self.stx)
+
+        # Force the step to be within the bounds stpmax and stpmin.
+        stp = np.clip(stp, self.stpmin, self.stpmax)
+
+        # If further progress is not possible, let stp be the best
+        # point obtained during the search.
+        if (
+            self.brackt
+            and (stp <= self.stmin or stp >= self.stmax)
+            or (
+                self.brackt
+                and self.stmax - self.stmin <= self.xtol * self.stmax
+            )
+        ):
+            stp = self.stx
+
+        # Obtain another function and derivative
+        task = b"FG"
+        return stp, f, g, task
+
+
+def dcstep(stx, fx, dx, sty, fy, dy, stp, fp, dp, brackt, stpmin, stpmax):
+    """
+    Subroutine dcstep
+
+    This subroutine computes a safeguarded step for a search
+    procedure and updates an interval that contains a step that
+    satisfies a sufficient decrease and a curvature condition.
+
+    The parameter stx contains the step with the least function
+    value. If brackt is set to .true. then a minimizer has
+    been bracketed in an interval with endpoints stx and sty.
+    The parameter stp contains the current step.
+    The subroutine assumes that if brackt is set to .true. then
+
+        min(stx,sty) < stp < max(stx,sty),
+
+    and that the derivative at stx is negative in the direction
+    of the step.
+
+    The subroutine statement is
+
+      subroutine dcstep(stx,fx,dx,sty,fy,dy,stp,fp,dp,brackt,
+                        stpmin,stpmax)
+
+    where
+
+    stx is a double precision variable.
+        On entry stx is the best step obtained so far and is an
+          endpoint of the interval that contains the minimizer.
+        On exit stx is the updated best step.
+
+    fx is a double precision variable.
+        On entry fx is the function at stx.
+        On exit fx is the function at stx.
+
+    dx is a double precision variable.
+        On entry dx is the derivative of the function at
+          stx. The derivative must be negative in the direction of
+          the step, that is, dx and stp - stx must have opposite
+          signs.
+        On exit dx is the derivative of the function at stx.
+
+    sty is a double precision variable.
+        On entry sty is the second endpoint of the interval that
+          contains the minimizer.
+        On exit sty is the updated endpoint of the interval that
+          contains the minimizer.
+
+    fy is a double precision variable.
+        On entry fy is the function at sty.
+        On exit fy is the function at sty.
+
+    dy is a double precision variable.
+        On entry dy is the derivative of the function at sty.
+        On exit dy is the derivative of the function at the exit sty.
+
+    stp is a double precision variable.
+        On entry stp is the current step. If brackt is set to .true.
+          then on input stp must be between stx and sty.
+        On exit stp is a new trial step.
+
+    fp is a double precision variable.
+        On entry fp is the function at stp.
+        On exit fp is unchanged.
+
+    dp is a double precision variable.
+        On entry dp is the derivative of the function at stp.
+        On exit dp is unchanged.
+
+    brackt is a logical variable.
+        On entry brackt specifies if a minimizer has been bracketed.
+            Initially brackt must be set to .false.
+        On exit brackt specifies if a minimizer has been bracketed.
+            When a minimizer is bracketed brackt is set to .true.
+
+    stpmin is a double precision variable.
+        On entry stpmin is a lower bound for the step.
+        On exit stpmin is unchanged.
+
+    stpmax is a double precision variable.
+        On entry stpmax is an upper bound for the step.
+        On exit stpmax is unchanged.
+
+    MINPACK-1 Project. June 1983.
+    Argonne National Laboratory.
+    Jorge J. More' and David J. Thuente.
+
+    MINPACK-2 Project. November 1993.
+    Argonne National Laboratory and University of Minnesota.
+    Brett M. Averick and Jorge J. More'.
+
+    """
+    sgn_dp = np.sign(dp)
+    sgn_dx = np.sign(dx)
+
+    # sgnd = dp * (dx / abs(dx))
+    sgnd = sgn_dp * sgn_dx
+
+    # First case: A higher function value. The minimum is bracketed.
+    # If the cubic step is closer to stx than the quadratic step, the
+    # cubic step is taken, otherwise the average of the cubic and
+    # quadratic steps is taken.
+    if fp > fx:
+        theta = 3.0 * (fx - fp) / (stp - stx) + dx + dp
+        s = max(abs(theta), abs(dx), abs(dp))
+        gamma = s * np.sqrt((theta / s) ** 2 - (dx / s) * (dp / s))
+        if stp < stx:
+            gamma *= -1
+        p = (gamma - dx) + theta
+        q = ((gamma - dx) + gamma) + dp
+        r = p / q
+        stpc = stx + r * (stp - stx)
+        stpq = stx + ((dx / ((fx - fp) / (stp - stx) + dx)) / 2.0) * (stp - stx)
+        if abs(stpc - stx) <= abs(stpq - stx):
+            stpf = stpc
+        else:
+            stpf = stpc + (stpq - stpc) / 2.0
+        brackt = True
+    elif sgnd < 0.0:
+        # Second case: A lower function value and derivatives of opposite
+        # sign. The minimum is bracketed. If the cubic step is farther from
+        # stp than the secant step, the cubic step is taken, otherwise the
+        # secant step is taken.
+        theta = 3 * (fx - fp) / (stp - stx) + dx + dp
+        s = max(abs(theta), abs(dx), abs(dp))
+        gamma = s * np.sqrt((theta / s) ** 2 - (dx / s) * (dp / s))
+        if stp > stx:
+            gamma *= -1
+        p = (gamma - dp) + theta
+        q = ((gamma - dp) + gamma) + dx
+        r = p / q
+        stpc = stp + r * (stx - stp)
+        stpq = stp + (dp / (dp - dx)) * (stx - stp)
+        if abs(stpc - stp) > abs(stpq - stp):
+            stpf = stpc
+        else:
+            stpf = stpq
+        brackt = True
+    elif abs(dp) < abs(dx):
+        # Third case: A lower function value, derivatives of the same sign,
+        # and the magnitude of the derivative decreases.
+
+        # The cubic step is computed only if the cubic tends to infinity
+        # in the direction of the step or if the minimum of the cubic
+        # is beyond stp. Otherwise the cubic step is defined to be the
+        # secant step.
+        theta = 3 * (fx - fp) / (stp - stx) + dx + dp
+        s = max(abs(theta), abs(dx), abs(dp))
+
+        # The case gamma = 0 only arises if the cubic does not tend
+        # to infinity in the direction of the step.
+        gamma = s * np.sqrt(max(0, (theta / s) ** 2 - (dx / s) * (dp / s)))
+        if stp > stx:
+            gamma = -gamma
+        p = (gamma - dp) + theta
+        q = (gamma + (dx - dp)) + gamma
+        r = p / q
+        if r < 0 and gamma != 0:
+            stpc = stp + r * (stx - stp)
+        elif stp > stx:
+            stpc = stpmax
+        else:
+            stpc = stpmin
+        stpq = stp + (dp / (dp - dx)) * (stx - stp)
+
+        if brackt:
+            # A minimizer has been bracketed. If the cubic step is
+            # closer to stp than the secant step, the cubic step is
+            # taken, otherwise the secant step is taken.
+            if abs(stpc - stp) < abs(stpq - stp):
+                stpf = stpc
+            else:
+                stpf = stpq
+
+            if stp > stx:
+                stpf = min(stp + 0.66 * (sty - stp), stpf)
+            else:
+                stpf = max(stp + 0.66 * (sty - stp), stpf)
+        else:
+            # A minimizer has not been bracketed. If the cubic step is
+            # farther from stp than the secant step, the cubic step is
+            # taken, otherwise the secant step is taken.
+            if abs(stpc - stp) > abs(stpq - stp):
+                stpf = stpc
+            else:
+                stpf = stpq
+            stpf = np.clip(stpf, stpmin, stpmax)
+
+    else:
+        # Fourth case: A lower function value, derivatives of the same sign,
+        # and the magnitude of the derivative does not decrease. If the
+        # minimum is not bracketed, the step is either stpmin or stpmax,
+        # otherwise the cubic step is taken.
+        if brackt:
+            theta = 3.0 * (fp - fy) / (sty - stp) + dy + dp
+            s = max(abs(theta), abs(dy), abs(dp))
+            gamma = s * np.sqrt((theta / s) ** 2 - (dy / s) * (dp / s))
+            if stp > sty:
+                gamma = -gamma
+            p = (gamma - dp) + theta
+            q = ((gamma - dp) + gamma) + dy
+            r = p / q
+            stpc = stp + r * (sty - stp)
+            stpf = stpc
+        elif stp > stx:
+            stpf = stpmax
+        else:
+            stpf = stpmin
+
+    # Update the interval which contains a minimizer.
+    if fp > fx:
+        sty = stp
+        fy = fp
+        dy = dp
+    else:
+        if sgnd < 0:
+            sty = stx
+            fy = fx
+            dy = dx
+        stx = stp
+        fx = fp
+        dx = dp
+
+    # Compute the new step.
+    stp = stpf
+
+    return stx, fx, dx, sty, fy, dy, stp, brackt
@@ -0,0 +1,646 @@
+import numpy as np
+import scipy.sparse as sps
+from ._numdiff import approx_derivative, group_columns
+from ._hessian_update_strategy import HessianUpdateStrategy
+from scipy.sparse.linalg import LinearOperator
+from scipy._lib._array_api import atleast_nd, array_namespace
+
+
+FD_METHODS = ('2-point', '3-point', 'cs')
+
+
+class ScalarFunction:
+    """Scalar function and its derivatives.
+
+    This class defines a scalar function F: R^n->R and methods for
+    computing or approximating its first and second derivatives.
+
+    Parameters
+    ----------
+    fun : callable
+        evaluates the scalar function. Must be of the form ``fun(x, *args)``,
+        where ``x`` is the argument in the form of a 1-D array and ``args`` is
+        a tuple of any additional fixed parameters needed to completely specify
+        the function. Should return a scalar.
+    x0 : array-like
+        Provides an initial set of variables for evaluating fun. Array of real
+        elements of size (n,), where 'n' is the number of independent
+        variables.
+    args : tuple, optional
+        Any additional fixed parameters needed to completely specify the scalar
+        function.
+    grad : {callable, '2-point', '3-point', 'cs'}
+        Method for computing the gradient vector.
+        If it is a callable, it should be a function that returns the gradient
+        vector:
+
+            ``grad(x, *args) -> array_like, shape (n,)``
+
+        where ``x`` is an array with shape (n,) and ``args`` is a tuple with
+        the fixed parameters.
+        Alternatively, the keywords {'2-point', '3-point', 'cs'} can be used
+        to select a finite difference scheme for numerical estimation of the
+        gradient with a relative step size. These finite difference schemes
+        obey any specified `bounds`.
+    hess : {callable, '2-point', '3-point', 'cs', HessianUpdateStrategy}
+        Method for computing the Hessian matrix. If it is callable, it should
+        return the Hessian matrix:
+
+            ``hess(x, *args) -> {LinearOperator, spmatrix, array}, (n, n)``
+
+        where x is a (n,) ndarray and `args` is a tuple with the fixed
+        parameters. Alternatively, the keywords {'2-point', '3-point', 'cs'}
+        select a finite difference scheme for numerical estimation. Or, objects
+        implementing `HessianUpdateStrategy` interface can be used to
+        approximate the Hessian.
+        Whenever the gradient is estimated via finite-differences, the Hessian
+        cannot be estimated with options {'2-point', '3-point', 'cs'} and needs
+        to be estimated using one of the quasi-Newton strategies.
+    finite_diff_rel_step : None or array_like
+        Relative step size to use. The absolute step size is computed as
+        ``h = finite_diff_rel_step * sign(x0) * max(1, abs(x0))``, possibly
+        adjusted to fit into the bounds. For ``method='3-point'`` the sign
+        of `h` is ignored. If None then finite_diff_rel_step is selected
+        automatically.
+    finite_diff_bounds : tuple of array_like
+        Lower and upper bounds on independent variables. Defaults to no bounds,
+        (-np.inf, np.inf). Each bound must match the size of `x0` or be a
+        scalar; in the latter case the bound will be the same for all
+        variables. Use it to limit the range of function evaluation.
+    epsilon : None or array_like, optional
+        Absolute step size to use, possibly adjusted to fit into the bounds.
+        For ``method='3-point'`` the sign of `epsilon` is ignored. By default
+        relative steps are used; absolute steps are used only if
+        ``epsilon is not None``.
+
+    Notes
+    -----
+    This class implements memoization logic. There are methods `fun`,
+    `grad`, `hess` and corresponding attributes `f`, `g` and `H`. The following
+    things should be considered:
+
+        1. Use only public methods `fun`, `grad` and `hess`.
+        2. After one of the methods is called, the corresponding attribute
+           will be set. However, a subsequent call with a different argument
+           of *any* of the methods may overwrite the attribute.
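+
+    Examples
+    --------
+    A minimal sketch of the memoization described in the Notes. It assumes
+    that this class is importable from the private module
+    ``scipy.optimize._differentiable_functions`` (an assumption; private
+    paths may change between releases). The objective ``f`` and the points
+    ``x0`` and ``x1`` are hypothetical and chosen only for illustration.
+
+    >>> import numpy as np
+    >>> from scipy.optimize import BFGS
+    >>> from scipy.optimize._differentiable_functions import ScalarFunction
+    >>> def f(x):
+    ...     return np.sum(x**2)
+    >>> x0 = np.array([1.0, 2.0])
+    >>> sf = ScalarFunction(f, x0, (), '2-point', BFGS(), None,
+    ...                     (-np.inf, np.inf))
+
+    The value at ``x0`` was computed during construction, so asking for it
+    again does not evaluate ``f``:
+
+    >>> nfev = sf.nfev
+    >>> float(sf.fun(x0))
+    5.0
+    >>> sf.nfev == nfev
+    True
+
+    A call with a different argument invalidates the cached values and
+    triggers new evaluations:
+
+    >>> x1 = np.array([0.0, 1.0])
+    >>> float(sf.fun(x1))
+    1.0
+    >>> sf.nfev > nfev
+    True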
+    """
+    def __init__(self, fun, x0, args, grad, hess, finite_diff_rel_step,
+                 finite_diff_bounds, epsilon=None):
+        if not callable(grad) and grad not in FD_METHODS:
+            raise ValueError(
+                f"`grad` must be either callable or one of {FD_METHODS}."
+            )
+
+        if not (callable(hess) or hess in FD_METHODS
+                or isinstance(hess, HessianUpdateStrategy)):
+            raise ValueError(
+                f"`hess` must be either callable, HessianUpdateStrategy"
+                f" or one of {FD_METHODS}."
+            )
+
+        if grad in FD_METHODS and hess in FD_METHODS:
+            raise ValueError("Whenever the gradient is estimated via "
+                             "finite-differences, we require the Hessian "
+                             "to be estimated using one of the "
+                             "quasi-Newton strategies.")
+
+        self.xp = xp = array_namespace(x0)
+        _x = atleast_nd(x0, ndim=1, xp=xp)
+        _dtype = xp.float64
+        if xp.isdtype(_x.dtype, "real floating"):
+            _dtype = _x.dtype
+
+        # promotes to floating
+        self.x = xp.astype(_x, _dtype)
+        self.x_dtype = _dtype
+        self.n = self.x.size
+        self.nfev = 0
+        self.ngev = 0
+        self.nhev = 0
+        self.f_updated = False
+        self.g_updated = False
+        self.H_updated = False
+
+        self._lowest_x = None
+        self._lowest_f = np.inf
+
+        finite_diff_options = {}
+        if grad in FD_METHODS:
+            finite_diff_options["method"] = grad
+            finite_diff_options["rel_step"] = finite_diff_rel_step
+            finite_diff_options["abs_step"] = epsilon
+            finite_diff_options["bounds"] = finite_diff_bounds
+        if hess in FD_METHODS:
+            finite_diff_options["method"] = hess
+            finite_diff_options["rel_step"] = finite_diff_rel_step
+            finite_diff_options["abs_step"] = epsilon
+            finite_diff_options["as_linear_operator"] = True
+
+        # Function evaluation
+        def fun_wrapped(x):
+            self.nfev += 1
+            # Send a copy because the user may overwrite it.
+            # Overwriting results in undefined behaviour because
+            # fun(self.x) will change self.x, with the two no longer linked.
+            fx = fun(np.copy(x), *args)
+            # Make sure the function returns a true scalar
+            if not np.isscalar(fx):
+                try:
+                    fx = np.asarray(fx).item()
+                except (TypeError, ValueError) as e:
+                    raise ValueError(
+                        "The user-provided objective function "
+                        "must return a scalar value."
+                    ) from e
+
+            if fx < self._lowest_f:
+                self._lowest_x = x
+                self._lowest_f = fx
+
+            return fx
+
+        def update_fun():
+            self.f = fun_wrapped(self.x)
+
+        self._update_fun_impl = update_fun
+        self._update_fun()
+
+        # Gradient evaluation
+        if callable(grad):
+            def grad_wrapped(x):
+                self.ngev += 1
+                return np.atleast_1d(grad(np.copy(x), *args))
+
+            def update_grad():
+                self.g = grad_wrapped(self.x)
+
+        elif grad in FD_METHODS:
+            def update_grad():
+                self._update_fun()
+                self.ngev += 1
+                self.g = approx_derivative(fun_wrapped, self.x, f0=self.f,
+                                           **finite_diff_options)
+
+        self._update_grad_impl = update_grad
+        self._update_grad()
+
+        # Hessian Evaluation
+        if callable(hess):
+            self.H = hess(np.copy(x0), *args)
+            self.H_updated = True
+            self.nhev += 1
+
+            if sps.issparse(self.H):
+                def hess_wrapped(x):
+                    self.nhev += 1
+                    return sps.csr_matrix(hess(np.copy(x), *args))
+                self.H = sps.csr_matrix(self.H)
+
+            elif isinstance(self.H, LinearOperator):
+                def hess_wrapped(x):
+                    self.nhev += 1
+                    return hess(np.copy(x), *args)
+
+            else:
+                def hess_wrapped(x):
+                    self.nhev += 1
+                    return np.atleast_2d(np.asarray(hess(np.copy(x), *args)))
+                self.H = np.atleast_2d(np.asarray(self.H))
+
+            def update_hess():
+                self.H = hess_wrapped(self.x)
+
+        elif hess in FD_METHODS:
+            def update_hess():
+                self._update_grad()
+                self.H = approx_derivative(grad_wrapped, self.x, f0=self.g,
+                                           **finite_diff_options)
+                return self.H
+
+            update_hess()
+            self.H_updated = True
+        elif isinstance(hess, HessianUpdateStrategy):
+            self.H = hess
+            self.H.initialize(self.n, 'hess')
+            self.H_updated = True
+            self.x_prev = None
+            self.g_prev = None
+
+            def update_hess():
+                self._update_grad()
+                self.H.update(self.x - self.x_prev, self.g - self.g_prev)
+
+        self._update_hess_impl = update_hess
+
+        if isinstance(hess, HessianUpdateStrategy):
+            def update_x(x):
+                self._update_grad()
+                self.x_prev = self.x
+                self.g_prev = self.g
+                # ensure that self.x is a copy of x. Don't store a reference
+                # otherwise the memoization doesn't work properly.
+
+                _x = atleast_nd(x, ndim=1, xp=self.xp)
+                self.x = self.xp.astype(_x, self.x_dtype)
+                self.f_updated = False
+                self.g_updated = False
+                self.H_updated = False
+                self._update_hess()
+        else:
+            def update_x(x):
+                # ensure that self.x is a copy of x. Don't store a reference
+                # otherwise the memoization doesn't work properly.
+                _x = atleast_nd(x, ndim=1, xp=self.xp)
+                self.x = self.xp.astype(_x, self.x_dtype)
+                self.f_updated = False
+                self.g_updated = False
+                self.H_updated = False
+        self._update_x_impl = update_x
+
+    def _update_fun(self):
+        if not self.f_updated:
+            self._update_fun_impl()
+            self.f_updated = True
+
+    def _update_grad(self):
+        if not self.g_updated:
+            self._update_grad_impl()
+            self.g_updated = True
+
+    def _update_hess(self):
+        if not self.H_updated:
+            self._update_hess_impl()
+            self.H_updated = True
+
+    def fun(self, x):
+        if not np.array_equal(x, self.x):
+            self._update_x_impl(x)
+        self._update_fun()
+        return self.f
+
+    def grad(self, x):
+        if not np.array_equal(x, self.x):
+            self._update_x_impl(x)
+        self._update_grad()
+        return self.g
+
+    def hess(self, x):
+        if not np.array_equal(x, self.x):
+            self._update_x_impl(x)
+        self._update_hess()
+        return self.H
+
+    def fun_and_grad(self, x):
+        if not np.array_equal(x, self.x):
+            self._update_x_impl(x)
+        self._update_fun()
+        self._update_grad()
+        return self.f, self.g
+
+
+class VectorFunction:
+    """Vector function and its derivatives.
+
+    This class defines a vector function F: R^n->R^m and methods for
+    computing or approximating its first and second derivatives.
+
+    Notes
+    -----
+    This class implements memoization logic. There are methods `fun`,
+    `jac`, `hess` and corresponding attributes `f`, `J` and `H`. The following
+    things should be considered:
+
+        1. Use only public methods `fun`, `jac` and `hess`.
+        2. After one of the methods is called, the corresponding attribute
+           will be set. However, a subsequent call with a different argument
+           of *any* of the methods may overwrite the attribute.
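+
+    Examples
+    --------
+    A minimal sketch, assuming that this class is importable from the
+    private module ``scipy.optimize._differentiable_functions`` (an
+    assumption; private paths may change between releases). The function
+    ``F`` and the point ``x0`` are hypothetical. The Jacobian is estimated
+    by 2-point finite differences and the Hessian by a quasi-Newton
+    strategy, as the constructor requires for that combination.
+
+    >>> import numpy as np
+    >>> from scipy.optimize import BFGS
+    >>> from scipy.optimize._differentiable_functions import VectorFunction
+    >>> def F(x):
+    ...     return np.array([x[0]**2, x[0] * x[1]])
+    >>> x0 = np.array([1.0, 2.0])
+    >>> vf = VectorFunction(F, x0, '2-point', BFGS(), None, None,
+    ...                     (-np.inf, np.inf), None)
+    >>> vf.fun(x0).tolist()
+    [1.0, 2.0]
+    >>> J = vf.jac(x0)
+    >>> bool(np.allclose(J, [[2.0, 0.0], [2.0, 1.0]], atol=1e-6))
+    True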
+    """
+    def __init__(self, fun, x0, jac, hess,
+                 finite_diff_rel_step, finite_diff_jac_sparsity,
+                 finite_diff_bounds, sparse_jacobian):
+        if not callable(jac) and jac not in FD_METHODS:
+            raise ValueError(f"`jac` must be either callable or one of {FD_METHODS}.")
+
+        if not (callable(hess) or hess in FD_METHODS
+                or isinstance(hess, HessianUpdateStrategy)):
+            raise ValueError("`hess` must be either callable, "
+                             f"HessianUpdateStrategy or one of {FD_METHODS}.")
+
+        if jac in FD_METHODS and hess in FD_METHODS:
+            raise ValueError("Whenever the Jacobian is estimated via "
+                             "finite-differences, we require the Hessian to "
+                             "be estimated using one of the quasi-Newton "
+                             "strategies.")
+
+        self.xp = xp = array_namespace(x0)
+        _x = atleast_nd(x0, ndim=1, xp=xp)
+        _dtype = xp.float64
+        if xp.isdtype(_x.dtype, "real floating"):
+            _dtype = _x.dtype
+
+        # promotes to floating
+        self.x = xp.astype(_x, _dtype)
+        self.x_dtype = _dtype
+
+        self.n = self.x.size
+        self.nfev = 0
+        self.njev = 0
+        self.nhev = 0
+        self.f_updated = False
+        self.J_updated = False
+        self.H_updated = False
+
+        finite_diff_options = {}
+        if jac in FD_METHODS:
+            finite_diff_options["method"] = jac
+            finite_diff_options["rel_step"] = finite_diff_rel_step
+            if finite_diff_jac_sparsity is not None:
+                sparsity_groups = group_columns(finite_diff_jac_sparsity)
+                finite_diff_options["sparsity"] = (finite_diff_jac_sparsity,
+                                                   sparsity_groups)
+            finite_diff_options["bounds"] = finite_diff_bounds
+            self.x_diff = np.copy(self.x)
+        if hess in FD_METHODS:
+            finite_diff_options["method"] = hess
+            finite_diff_options["rel_step"] = finite_diff_rel_step
+            finite_diff_options["as_linear_operator"] = True
+            self.x_diff = np.copy(self.x)
+
+        # Function evaluation
+        def fun_wrapped(x):
+            self.nfev += 1
+            return np.atleast_1d(fun(x))
+
+        def update_fun():
+            self.f = fun_wrapped(self.x)
+
+        self._update_fun_impl = update_fun
+        update_fun()
+
+        self.v = np.zeros_like(self.f)
+        self.m = self.v.size
+
+        # Jacobian Evaluation
+        if callable(jac):
+            self.J = jac(self.x)
+            self.J_updated = True
+            self.njev += 1
+
+            if (sparse_jacobian or
+                    sparse_jacobian is None and sps.issparse(self.J)):
+                def jac_wrapped(x):
+                    self.njev += 1
+                    return sps.csr_matrix(jac(x))
+                self.J = sps.csr_matrix(self.J)
+                self.sparse_jacobian = True
+
+            elif sps.issparse(self.J):
+                def jac_wrapped(x):
+                    self.njev += 1
+                    return jac(x).toarray()
+                self.J = self.J.toarray()
+                self.sparse_jacobian = False
+
+            else:
+                def jac_wrapped(x):
+                    self.njev += 1
+                    return np.atleast_2d(jac(x))
+                self.J = np.atleast_2d(self.J)
+                self.sparse_jacobian = False
+
+            def update_jac():
+                self.J = jac_wrapped(self.x)
+
+        elif jac in FD_METHODS:
+            self.J = approx_derivative(fun_wrapped, self.x, f0=self.f,
+                                       **finite_diff_options)
+            self.J_updated = True
+
+            if (sparse_jacobian or
+                    sparse_jacobian is None and sps.issparse(self.J)):
+                def update_jac():
+                    self._update_fun()
+                    self.J = sps.csr_matrix(
+                        approx_derivative(fun_wrapped, self.x, f0=self.f,
+                                          **finite_diff_options))
+                self.J = sps.csr_matrix(self.J)
+                self.sparse_jacobian = True
+
+            elif sps.issparse(self.J):
+                def update_jac():
+                    self._update_fun()
+                    self.J = approx_derivative(fun_wrapped, self.x, f0=self.f,
+                                               **finite_diff_options).toarray()
+                self.J = self.J.toarray()
+                self.sparse_jacobian = False
+
+            else:
+                def update_jac():
+                    self._update_fun()
+                    self.J = np.atleast_2d(
+                        approx_derivative(fun_wrapped, self.x, f0=self.f,
+                                          **finite_diff_options))
+                self.J = np.atleast_2d(self.J)
+                self.sparse_jacobian = False
+
+        self._update_jac_impl = update_jac
+
+        # Define Hessian
+        if callable(hess):
+            self.H = hess(self.x, self.v)
+            self.H_updated = True
+            self.nhev += 1
+
+            if sps.issparse(self.H):
+                def hess_wrapped(x, v):
+                    self.nhev += 1
+                    return sps.csr_matrix(hess(x, v))
+                self.H = sps.csr_matrix(self.H)
+
+            elif isinstance(self.H, LinearOperator):
+                def hess_wrapped(x, v):
+                    self.nhev += 1
+                    return hess(x, v)
+
+            else:
+                def hess_wrapped(x, v):
+                    self.nhev += 1
+                    return np.atleast_2d(np.asarray(hess(x, v)))
+                self.H = np.atleast_2d(np.asarray(self.H))
+
+            def update_hess():
+                self.H = hess_wrapped(self.x, self.v)
+        elif hess in FD_METHODS:
+            def jac_dot_v(x, v):
+                return jac_wrapped(x).T.dot(v)
+
+            def update_hess():
+                self._update_jac()
+                self.H = approx_derivative(jac_dot_v, self.x,
+                                           f0=self.J.T.dot(self.v),
+                                           args=(self.v,),
+                                           **finite_diff_options)
+            update_hess()
+            self.H_updated = True
+        elif isinstance(hess, HessianUpdateStrategy):
+            self.H = hess
+            self.H.initialize(self.n, 'hess')
+            self.H_updated = True
+            self.x_prev = None
+            self.J_prev = None
+
+            def update_hess():
+                self._update_jac()
+                # When v is updated before x was updated, then x_prev and
+                # J_prev are None and we need this check.
+                if self.x_prev is not None and self.J_prev is not None:
+                    delta_x = self.x - self.x_prev
+                    delta_g = self.J.T.dot(self.v) - self.J_prev.T.dot(self.v)
+                    self.H.update(delta_x, delta_g)
+
+        self._update_hess_impl = update_hess
+
+        if isinstance(hess, HessianUpdateStrategy):
+            def update_x(x):
+                self._update_jac()
+                self.x_prev = self.x
+                self.J_prev = self.J
+                _x = atleast_nd(x, ndim=1, xp=self.xp)
+                self.x = self.xp.astype(_x, self.x_dtype)
+                self.f_updated = False
+                self.J_updated = False
+                self.H_updated = False
+                self._update_hess()
+        else:
+            def update_x(x):
+                _x = atleast_nd(x, ndim=1, xp=self.xp)
+                self.x = self.xp.astype(_x, self.x_dtype)
+                self.f_updated = False
+                self.J_updated = False
+                self.H_updated = False
+
+        self._update_x_impl = update_x
+
+    def _update_v(self, v):
+        if not np.array_equal(v, self.v):
+            self.v = v
+            self.H_updated = False
+
+    def _update_x(self, x):
+        if not np.array_equal(x, self.x):
+            self._update_x_impl(x)
+
+    def _update_fun(self):
+        if not self.f_updated:
+            self._update_fun_impl()
+            self.f_updated = True
+
+    def _update_jac(self):
+        if not self.J_updated:
+            self._update_jac_impl()
+            self.J_updated = True
+
+    def _update_hess(self):
+        if not self.H_updated:
+            self._update_hess_impl()
+            self.H_updated = True
+
+    def fun(self, x):
+        self._update_x(x)
+        self._update_fun()
+        return self.f
+
+    def jac(self, x):
+        self._update_x(x)
+        self._update_jac()
+        return self.J
+
+    def hess(self, x, v):
+        # v should be updated before x.
+        self._update_v(v)
+        self._update_x(x)
+        self._update_hess()
+        return self.H
+
+
+class LinearVectorFunction:
+    """Linear vector function and its derivatives.
+
+    Defines a linear function F = A x, where x is an n-dimensional vector and
+    A is an m-by-n matrix. The Jacobian is constant and equal to A. The
+    Hessian is identically zero and is returned as a csr matrix.
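+
+    Examples
+    --------
+    A minimal sketch, assuming that this class is importable from the
+    private module ``scipy.optimize._differentiable_functions`` (an
+    assumption; private paths may change between releases). The matrix
+    ``A`` and the point ``x0`` are hypothetical.
+
+    >>> import numpy as np
+    >>> from scipy.optimize._differentiable_functions import (
+    ...     LinearVectorFunction)
+    >>> A = np.array([[1.0, 2.0], [3.0, 4.0]])
+    >>> x0 = np.array([1.0, 1.0])
+    >>> lvf = LinearVectorFunction(A, x0, False)
+    >>> lvf.fun(x0).tolist()
+    [3.0, 7.0]
+    >>> lvf.jac(x0).shape
+    (2, 2)
+    >>> lvf.hess(x0, np.zeros(2)).nnz
+    0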
+    """
+    def __init__(self, A, x0, sparse_jacobian):
+        if sparse_jacobian or sparse_jacobian is None and sps.issparse(A):
+            self.J = sps.csr_matrix(A)
+            self.sparse_jacobian = True
+        elif sps.issparse(A):
+            self.J = A.toarray()
+            self.sparse_jacobian = False
+        else:
+            # np.asarray makes sure A is ndarray and not matrix
+            self.J = np.atleast_2d(np.asarray(A))
+            self.sparse_jacobian = False
+
+        self.m, self.n = self.J.shape
+
+        self.xp = xp = array_namespace(x0)
+        _x = atleast_nd(x0, ndim=1, xp=xp)
+        _dtype = xp.float64
+        if xp.isdtype(_x.dtype, "real floating"):
+            _dtype = _x.dtype
+
+        # promotes to floating
+        self.x = xp.astype(_x, _dtype)
+        self.x_dtype = _dtype
+
+        self.f = self.J.dot(self.x)
+        self.f_updated = True
+
+        self.v = np.zeros(self.m, dtype=float)
+        self.H = sps.csr_matrix((self.n, self.n))
+
+    def _update_x(self, x):
+        if not np.array_equal(x, self.x):
+            _x = atleast_nd(x, ndim=1, xp=self.xp)
+            self.x = self.xp.astype(_x, self.x_dtype)
+            self.f_updated = False
+
+    def fun(self, x):
+        self._update_x(x)
+        if not self.f_updated:
+            self.f = self.J.dot(x)
+            self.f_updated = True
+        return self.f
+
+    def jac(self, x):
+        self._update_x(x)
+        return self.J
+
+    def hess(self, x, v):
+        self._update_x(x)
+        self.v = v
+        return self.H
+
+
+class IdentityVectorFunction(LinearVectorFunction):
+    """Identity vector function and its derivatives.
+
+    The Jacobian is the identity matrix, returned as a dense array when
+    `sparse_jacobian=False` and as a csr matrix otherwise. The Hessian is
+    identically zero and it is returned as a csr matrix.
+    """
+    def __init__(self, x0, sparse_jacobian):
+        n = len(x0)
+        if sparse_jacobian or sparse_jacobian is None:
+            A = sps.eye(n, format='csr')
+            sparse_jacobian = True
+        else:
+            A = np.eye(n)
+            sparse_jacobian = False
+        super().__init__(A, x0, sparse_jacobian)
@@ -0,0 +1,669 @@
+# mypy: disable-error-code="attr-defined"
+import numpy as np
+import scipy._lib._elementwise_iterative_method as eim
+from scipy._lib._util import _RichResult
+
+_EERRORINCREASE = -1  # used in _differentiate
+
+def _differentiate_iv(func, x, args, atol, rtol, maxiter, order, initial_step,
+                      step_factor, step_direction, preserve_shape, callback):
+    # Input validation for `_differentiate`
+
+    if not callable(func):
+        raise ValueError('`func` must be callable.')
+
+    # x has more complex IV that is taken care of during initialization
+    x = np.asarray(x)
+    dtype = x.dtype if np.issubdtype(x.dtype, np.inexact) else np.float64
+
+    if not np.iterable(args):
+        args = (args,)
+
+    if atol is None:
+        atol = np.finfo(dtype).tiny
+
+    if rtol is None:
+        rtol = np.sqrt(np.finfo(dtype).eps)
+
+    message = 'Tolerances and step parameters must be non-negative scalars.'
+    tols = np.asarray([atol, rtol, initial_step, step_factor])
+    if (not np.issubdtype(tols.dtype, np.number)
+            or np.any(tols < 0)
+            or tols.shape != (4,)):
+        raise ValueError(message)
+    initial_step, step_factor = tols[2:].astype(dtype)
+
+    maxiter_int = int(maxiter)
+    if maxiter != maxiter_int or maxiter <= 0:
+        raise ValueError('`maxiter` must be a positive integer.')
+
+    order_int = int(order)
+    if order_int != order or order <= 0:
+        raise ValueError('`order` must be a positive integer.')
+
+    step_direction = np.sign(step_direction).astype(dtype)
+    x, step_direction = np.broadcast_arrays(x, step_direction)
+    x, step_direction = x[()], step_direction[()]
+
+    message = '`preserve_shape` must be True or False.'
+    if preserve_shape not in {True, False}:
+        raise ValueError(message)
+
+    if callback is not None and not callable(callback):
+        raise ValueError('`callback` must be callable.')
+
+    return (func, x, args, atol, rtol, maxiter_int, order_int, initial_step,
+            step_factor, step_direction, preserve_shape, callback)
+
+
+def _differentiate(func, x, *, args=(), atol=None, rtol=None, maxiter=10,
+                   order=8, initial_step=0.5, step_factor=2.0,
+                   step_direction=0, preserve_shape=False, callback=None):
+    """Evaluate the derivative of an elementwise scalar function numerically.
+
+    Parameters
+    ----------
+    func : callable
+        The function whose derivative is desired. The signature must be::
+
+            func(x: ndarray, *fargs) -> ndarray
+
+        where each element of ``x`` is a finite real and ``fargs`` is a tuple,
+        which may contain an arbitrary number of arrays that are broadcastable
+        with `x`. ``func`` must be an elementwise function: each element
+        ``func(x)[i]`` must equal ``func(x[i])`` for all indices ``i``.
+    x : array_like
+        Abscissae at which to evaluate the derivative.
+    args : tuple, optional
+        Additional positional arguments to be passed to `func`. Must be arrays
+        broadcastable with `x`. If the callable to be differentiated requires
+        arguments that are not broadcastable with `x`, wrap that callable with
+        `func`. See Examples.
+    atol, rtol : float, optional
+        Absolute and relative tolerances for the stopping condition: iteration
+        will stop when ``res.error < atol + rtol * abs(res.df)``. The default
+        `atol` is the smallest normal number of the appropriate dtype, and
+        the default `rtol` is the square root of the precision of the
+        appropriate dtype.
+    order : int, default: 8
+        The (positive integer) order of the finite difference formula to be
+        used. Odd integers will be rounded up to the next even integer.
+    initial_step : float, default: 0.5
+        The (absolute) initial step size for the finite difference derivative
+        approximation.
+    step_factor : float, default: 2.0
+        The factor by which the step size is *reduced* in each iteration; i.e.
+        the step size in iteration 1 is ``initial_step/step_factor``. If
+        ``step_factor < 1``, subsequent steps will be greater than the initial
+        step; this may be useful if steps smaller than some threshold are
+        undesirable (e.g. due to subtractive cancellation error).
+    maxiter : int, default: 10
+        The maximum number of iterations of the algorithm to perform. See
+        notes.
+    step_direction : array_like
+        An array representing the direction of the finite difference steps (for
+        use when `x` lies near the boundary of the domain of the function).
+        Must be broadcastable with `x` and all `args`.
+        Where 0 (default), central differences are used; where negative (e.g.
+        -1), steps are non-positive; and where positive (e.g. 1), all steps are
+        non-negative.
+    preserve_shape : bool, default: False
+        In the following, "arguments of `func`" refers to the array ``x`` and
+        any arrays within ``fargs``. Let ``shape`` be the broadcasted shape
+        of `x` and all elements of `args` (which is conceptually
+        distinct from ``fargs`` passed into `func`).
+
+        - When ``preserve_shape=False`` (default), `func` must accept
+          arguments of *any* broadcastable shapes.
+
+        - When ``preserve_shape=True``, `func` must accept arguments of shape
+          ``shape`` *or* ``shape + (n,)``, where ``n`` is the number of
+          abscissae at which the function is being evaluated.
+
+        In either case, for each scalar element ``xi`` within `x`, the array
+        returned by `func` must include the scalar ``func(xi)`` at the same
+        index. Consequently, the shape of the output is always the shape of
+        the input ``x``.
+
+        See Examples.
+    callback : callable, optional
+        An optional user-supplied function to be called before the first
+        iteration and after each iteration.
+        Called as ``callback(res)``, where ``res`` is a ``_RichResult``
+        similar to that returned by `_differentiate` (but containing the
+        current iterate's values of all variables). If `callback` raises a
+        ``StopIteration``, the algorithm will terminate immediately and
+        `_differentiate` will return a result.
+
+    Returns
+    -------
+    res : _RichResult
+        An instance of `scipy._lib._util._RichResult` with the following
+        attributes. (The descriptions are written as though the values will be
+        scalars; however, if `func` returns an array, the outputs will be
+        arrays of the same shape.)
+
+        success : bool
+            ``True`` when the algorithm terminated successfully (status ``0``).
+        status : int
+            An integer representing the exit status of the algorithm.
+            ``0`` : The algorithm converged to the specified tolerances.
+            ``-1`` : The error estimate increased, so iteration was terminated.
+            ``-2`` : The maximum number of iterations was reached.
+            ``-3`` : A non-finite value was encountered.
+            ``-4`` : Iteration was terminated by `callback`.
+            ``1`` : The algorithm is proceeding normally (in `callback` only).
+        df : float
+            The derivative of `func` at `x`, if the algorithm terminated
+            successfully.
+        error : float
+            An estimate of the error: the magnitude of the difference between
+            the current estimate of the derivative and the estimate in the
+            previous iteration.
+        nit : int
+            The number of iterations performed.
+        nfev : int
+            The number of points at which `func` was evaluated.
+        x : float
+            The value at which the derivative of `func` was evaluated
+            (after broadcasting with `args` and `step_direction`).
+
+    Notes
+    -----
+    The implementation was inspired by jacobi [1]_, numdifftools [2]_, and
+    DERIVEST [3]_, but the implementation follows the theory of Taylor series
+    more straightforwardly (and arguably naively so).
+    In the first iteration, the derivative is estimated using a finite
+    difference formula of order `order` with maximum step size `initial_step`.
+    Each subsequent iteration, the maximum step size is reduced by
+    `step_factor`, and the derivative is estimated again until a termination
+    condition is reached. The error estimate is the magnitude of the difference
+    between the current derivative approximation and that of the previous
+    iteration.
+
+    The stencils of the finite difference formulae are designed such that
+    abscissae are "nested": after `func` is evaluated at ``order + 1``
+    points in the first iteration, `func` is evaluated at only two new points
+    in each subsequent iteration; ``order - 1`` previously evaluated function
+    values required by the finite difference formula are reused, and two
+    function values (evaluations at the points furthest from `x`) are unused.
+
+    Step sizes are absolute. When the step size is small relative to the
+    magnitude of `x`, precision is lost; for example, if `x` is ``1e20``, the
+    default initial step size of ``0.5`` cannot be resolved. Accordingly,
+    consider using larger initial step sizes for large magnitudes of `x`.
+
+    The default tolerances are challenging to satisfy at points where the
+    true derivative is exactly zero. If the derivative may be exactly zero,
+    consider specifying an absolute tolerance (e.g. ``atol=1e-16``) to
+    improve convergence.
+
+    References
+    ----------
+    .. [1] Hans Dembinski (@HDembinski). jacobi.
+           https://github.com/HDembinski/jacobi
+    .. [2] Per A. Brodtkorb and John D'Errico. numdifftools.
+           https://numdifftools.readthedocs.io/en/latest/
+    .. [3] John D'Errico. DERIVEST: Adaptive Robust Numerical Differentiation.
+           https://www.mathworks.com/matlabcentral/fileexchange/13490-adaptive-robust-numerical-differentiation
+    .. [4] Numerical Differentiation. Wikipedia.
+           https://en.wikipedia.org/wiki/Numerical_differentiation
+
+    Examples
+    --------
+    Evaluate the derivative of ``np.exp`` at several points ``x``.
+
+    >>> import numpy as np
+    >>> from scipy.optimize._differentiate import _differentiate
+    >>> f = np.exp
+    >>> df = np.exp  # true derivative
+    >>> x = np.linspace(1, 2, 5)
+    >>> res = _differentiate(f, x)
+    >>> res.df  # approximation of the derivative
+    array([2.71828183, 3.49034296, 4.48168907, 5.75460268, 7.3890561 ])
+    >>> res.error  # estimate of the error
+    array(
+        [7.12940817e-12, 9.16688947e-12, 1.17594823e-11, 1.50972568e-11, 1.93942640e-11]
+    )
+    >>> abs(res.df - df(x))  # true error
+    array(
+        [3.06421555e-14, 3.01980663e-14, 5.06261699e-14, 6.30606678e-14, 8.34887715e-14]
+    )
+
+    Show the convergence of the approximation as the step size is reduced.
+    Each iteration, the step size is reduced by `step_factor`, so for
+    sufficiently small initial step, each iteration reduces the error by a
+    factor of ``1/step_factor**order`` until finite precision arithmetic
+    inhibits further improvement.
+
+    >>> import matplotlib.pyplot as plt
+    >>> iter = list(range(1, 12))  # maximum iterations
+    >>> hfac = 2  # step size reduction per iteration
+    >>> hdir = [-1, 0, 1]  # compare left-, central-, and right- steps
+    >>> order = 4  # order of differentiation formula
+    >>> x = 1
+    >>> ref = df(x)
+    >>> errors = []  # true error
+    >>> for i in iter:
+    ...     res = _differentiate(f, x, maxiter=i, step_factor=hfac,
+    ...                          step_direction=hdir, order=order,
+    ...                          atol=0, rtol=0)  # prevent early termination
+    ...     errors.append(abs(res.df - ref))
+    >>> errors = np.array(errors)
+    >>> plt.semilogy(iter, errors[:, 0], label='left differences')
+    >>> plt.semilogy(iter, errors[:, 1], label='central differences')
+    >>> plt.semilogy(iter, errors[:, 2], label='right differences')
+    >>> plt.xlabel('iteration')
+    >>> plt.ylabel('error')
+    >>> plt.legend()
+    >>> plt.show()
+    >>> (errors[1, 1] / errors[0, 1], 1 / hfac**order)
+    (0.06215223140159822, 0.0625)
+
+    The implementation is vectorized over `x`, `step_direction`, and `args`.
+    The function is evaluated once before the first iteration to perform input
+    validation and standardization, and once per iteration thereafter.
+
+    >>> def f(x, p):
+    ...     print('here')
+    ...     f.nit += 1
+    ...     return x**p
+    >>> f.nit = 0
+    >>> def df(x, p):
+    ...     return p*x**(p-1)
+    >>> x = np.arange(1, 5)
+    >>> p = np.arange(1, 6).reshape((-1, 1))
+    >>> hdir = np.arange(-1, 2).reshape((-1, 1, 1))
+    >>> res = _differentiate(f, x, args=(p,), step_direction=hdir, maxiter=1)
+    >>> np.allclose(res.df, df(x, p))
+    True
+    >>> res.df.shape
+    (3, 5, 4)
+    >>> f.nit
+    2
+
+    By default, `preserve_shape` is False, and therefore the callable
+    `f` may be called with arrays of any broadcastable shapes.
+    For example:
+
+    >>> shapes = []
+    >>> def f(x, c):
+    ...    shape = np.broadcast_shapes(x.shape, c.shape)
+    ...    shapes.append(shape)
+    ...    return np.sin(c*x)
+    >>>
+    >>> c = [1, 5, 10, 20]
+    >>> res = _differentiate(f, 0, args=(c,))
+    >>> shapes
+    [(4,), (4, 8), (4, 2), (3, 2), (2, 2), (1, 2)]
+
+    To understand where these shapes are coming from - and to better
+    understand how `_differentiate` computes accurate results - note that
+    higher values of ``c`` correspond with higher frequency sinusoids.
+    The higher frequency sinusoids make the function's derivative change
+    faster, so more function evaluations are required to achieve the target
+    accuracy:
+
+    >>> res.nfev
+    array([11, 13, 15, 17])
+
+    The initial ``shape``, ``(4,)``, corresponds with evaluating the
+    function at a single abscissa and all four frequencies; this is used
+    for input validation and to determine the size and dtype of the arrays
+    that store results. The next shape corresponds with evaluating the
+    function at an initial grid of abscissae and all four frequencies.
+    Successive calls to the function evaluate the function at two more
+    abscissae, increasing the effective order of the approximation by two.
+    However, in later function evaluations, the function is evaluated at
+    fewer frequencies because the corresponding derivative has already
+    converged to the required tolerance. This saves function evaluations to
+    improve performance, but it requires the function to accept arguments of
+    any shape.
+
+    "Vector-valued" functions are unlikely to satisfy this requirement.
+    For example, consider
+
+    >>> def f(x):
+    ...    return [x, np.sin(3*x), x+np.sin(10*x), np.sin(20*x)*(x-1)**2]
+
+    This function is not compatible with `_differentiate` as written; for instance,
+    the shape of the output will not be the same as the shape of ``x``. Such a
+    function *could* be converted to a compatible form with the introduction of
+    additional parameters, but this would be inconvenient. In such cases,
+    a simpler solution would be to use `preserve_shape`.
+
+    >>> shapes = []
+    >>> def f(x):
+    ...     shapes.append(x.shape)
+    ...     x0, x1, x2, x3 = x
+    ...     return [x0, np.sin(3*x1), x2+np.sin(10*x2), np.sin(20*x3)*(x3-1)**2]
+    >>>
+    >>> x = np.zeros(4)
+    >>> res = _differentiate(f, x, preserve_shape=True)
+    >>> shapes
+    [(4,), (4, 8), (4, 2), (4, 2), (4, 2), (4, 2)]
+
+    Here, the shape of ``x`` is ``(4,)``. With ``preserve_shape=True``, the
+    function may be called with argument ``x`` of shape ``(4,)`` or ``(4, n)``,
+    and this is what we observe.
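+
+    As a further sketch, the `callback` mechanism described under Parameters
+    can be used to stop early. Here ``stop_after_two`` is a hypothetical
+    helper; it raises ``StopIteration`` once two iterations have completed,
+    and ``atol=0, rtol=0`` prevents termination by convergence beforehand.
+    According to the status codes listed under Returns, ``res.status`` is
+    then expected to be ``-4``.
+
+    >>> def stop_after_two(res):
+    ...     if res.nit >= 2:
+    ...         raise StopIteration
+    >>> res = _differentiate(np.exp, 1., atol=0, rtol=0,
+    ...                      callback=stop_after_two)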
+
+    """
+    # TODO (followup):
+    #  - investigate behavior at saddle points
+    #  - array initial_step / step_factor?
+    #  - multivariate functions?
+
+    res = _differentiate_iv(func, x, args, atol, rtol, maxiter, order, initial_step,
+                            step_factor, step_direction, preserve_shape, callback)
+    (func, x, args, atol, rtol, maxiter, order,
+     h0, fac, hdir, preserve_shape, callback) = res
+
+    # Initialization
+    # Since f(x) (no step) is not needed for central differences, it may be
+    # possible to eliminate this function evaluation. However, it's useful for
+    # input validation and standardization, and everything else is designed to
+    # reduce function calls, so let's keep it simple.
+    temp = eim._initialize(func, (x,), args, preserve_shape=preserve_shape)
+    func, xs, fs, args, shape, dtype = temp
+    x, f = xs[0], fs[0]
+    df = np.full_like(f, np.nan)
+    # Ideally we'd broadcast the shape of `hdir` in `_elementwise_algo_init`, but
+    # it's simpler to do it here than to generalize `_elementwise_algo_init` further.
+    # `hdir` and `x` are already broadcasted in `_differentiate_iv`, so we know
+    # that `hdir` can be broadcasted to the final shape.
+    hdir = np.broadcast_to(hdir, shape).flatten()
+
+    status = np.full_like(x, eim._EINPROGRESS, dtype=int)  # in progress
+    nit, nfev = 0, 1  # one function evaluation performed above
+    # Boolean indices of left, central, right, and (all) one-sided steps
+    il = hdir < 0
+    ic = hdir == 0
+    ir = hdir > 0
+    io = il | ir
+
+    # Most of these attributes are reasonably obvious, but:
+    # - `fs` holds all the function values of all active `x`. The zeroth
+    #   axis corresponds with active points `x`, the first axis corresponds
+    #   with the different steps (in the order described in
+    #   `_differentiate_weights`).
+    # - `terms` (which could probably use a better name) is half the `order`,
+    #   which is always even.
+    work = _RichResult(x=x, df=df, fs=f[:, np.newaxis], error=np.nan, h=h0,
+                       df_last=np.nan, error_last=np.nan, h0=h0, fac=fac,
+                       atol=atol, rtol=rtol, nit=nit, nfev=nfev,
+                       status=status, dtype=dtype, terms=(order+1)//2,
+                       hdir=hdir, il=il, ic=ic, ir=ir, io=io)
+    # This is the correspondence between terms in the `work` object and the
+    # final result. In this case, the mapping is trivial. Note that `success`
+    # is prepended automatically.
+    res_work_pairs = [('status', 'status'), ('df', 'df'), ('error', 'error'),
+                      ('nit', 'nit'), ('nfev', 'nfev'), ('x', 'x')]
+
+    def pre_func_eval(work):
+        """Determine the abscissae at which the function needs to be evaluated.
+
+        See `_differentiate_weights` for a description of the stencil (pattern
+        of the abscissae).
+
+        In the first iteration, there is only one stored function value in
+        `work.fs`, `f(x)`, so we need to evaluate at `order` new points. In
+        subsequent iterations, we evaluate at two new points. Note that
+        `work.x` is always flattened into a 1D array after broadcasting with
+        all `args`, so we add a new axis at the end and evaluate all points
+        in one call to the function.
+
+        For improvement:
+        - Consider measuring the step size actually taken, since `(x + h) - x`
+          is not identically equal to `h` with floating point arithmetic.
+        - Adjust the step size automatically if `x` is too big to resolve the
+          step.
+        - We could probably save some work if there are no central difference
+          steps or no one-sided steps.
+        """
+        n = work.terms  # half the order
+        h = work.h  # step size
+        c = work.fac  # step reduction factor
+        d = c**0.5  # square root of step reduction factor (one-sided stencil)
+        # Note - no need to be careful about dtypes until we allocate `x_eval`
+
+        if work.nit == 0:
+            hc = h / c**np.arange(n)
+            hc = np.concatenate((-hc[::-1], hc))
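+            # Worked instance (illustrative values, not the defaults): with
+            # h = 0.5, c = 2, and n = 2, `h / c**np.arange(n)` is [0.5, 0.25],
+            # so `hc` becomes [-0.25, -0.5, 0.5, 0.25]; the points nearest
+            # `x` sit at the two ends of the array.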
+        else:
+            hc = np.asarray([-h, h]) / c**(n-1)
+
+        if work.nit == 0:
+            hr = h / d**np.arange(2*n)
+        else:
+            hr = np.asarray([h, h/d]) / c**(n-1)
+
+        n_new = 2*n if work.nit == 0 else 2  # number of new abscissae
+        x_eval = np.zeros((len(work.hdir), n_new), dtype=work.dtype)
+        il, ic, ir = work.il, work.ic, work.ir
+        x_eval[ir] = work.x[ir, np.newaxis] + hr
+        x_eval[ic] = work.x[ic, np.newaxis] + hc
+        x_eval[il] = work.x[il, np.newaxis] - hr
+        return x_eval
+
+    def post_func_eval(x, f, work):
+        """Estimate the derivative and error from the function evaluations.
+
+        As in `pre_func_eval`: in the first iteration, there is only one stored
+        function value in `work.fs`, `f(x)`, so we need to add the `order` new
+        points. In subsequent iterations, we add two new points. The tricky
+        part is getting the order to match that of the weights, which is
+        described in `_differentiate_weights`.
+
+        For improvement:
+        - Change the order of the weights (and steps in `pre_func_eval`) to
+          simplify `work_fc` concatenation and eliminate `fc` concatenation.
+        - It would be simple to do one-step Richardson extrapolation with `df`
+          and `df_last` to increase the order of the estimate and/or improve
+          the error estimate.
+        - Process the function evaluations in a more numerically favorable
+          way. For instance, combining the pairs of central difference evals
+          into a second-order approximation and using Richardson extrapolation
+          to produce a higher order approximation seemed to retain accuracy up
+          to very high order.
+        - Alternatively, we could use `polyfit` like Jacobi. An advantage of
+          fitting polynomial to more points than necessary is improved noise
+          tolerance.
+        """
+        n = work.terms
+        n_new = n if work.nit == 0 else 1
+        il, ic, io = work.il, work.ic, work.io
+
+        # Central difference
+        # `work_fc` is *all* the points at which the function has been evaluated
+        # `fc` is the points we're using *this iteration* to produce the estimate
+        work_fc = (f[ic, :n_new], work.fs[ic, :], f[ic, -n_new:])
+        work_fc = np.concatenate(work_fc, axis=-1)
+        if work.nit == 0:
+            fc = work_fc
+        else:
+            fc = (work_fc[:, :n], work_fc[:, n:n+1], work_fc[:, -n:])
+            fc = np.concatenate(fc, axis=-1)
+
+        # One-sided difference
+        work_fo = np.concatenate((work.fs[io, :], f[io, :]), axis=-1)
+        if work.nit == 0:
+            fo = work_fo
+        else:
+            fo = np.concatenate((work_fo[:, 0:1], work_fo[:, -2*n:]), axis=-1)
+
+        work.fs = np.zeros((len(ic), work.fs.shape[-1] + 2*n_new))
+        work.fs[ic] = work_fc
+        work.fs[io] = work_fo
+
+        wc, wo = _differentiate_weights(work, n)
+        work.df_last = work.df.copy()
+        work.df[ic] = fc @ wc / work.h
+        work.df[io] = fo @ wo / work.h
+        work.df[il] *= -1
+
+        work.h /= work.fac
+        work.error_last = work.error
+        # Simple error estimate - the difference in derivative estimates between
+        # this iteration and the last. This is typically conservative because if
+        # convergence has begun, the true error is much closer to the difference
+        # between the current estimate and the *next* estimate. However,
+        # we could use Richardson extrapolation to produce an error estimate that
+        # is one order higher, and take the difference between that and
+        # `work.df` (which would just be a constant factor that depends on `fac`).
+        work.error = abs(work.df - work.df_last)
+
+    def check_termination(work):
+        """Terminate due to convergence, non-finite values, or error increase"""
+        stop = np.zeros_like(work.df).astype(bool)
+
+        i = work.error < work.atol + work.rtol*abs(work.df)
+        work.status[i] = eim._ECONVERGED
+        stop[i] = True
+
+        if work.nit > 0:
+            i = ~((np.isfinite(work.x) & np.isfinite(work.df)) | stop)
+            work.df[i], work.status[i] = np.nan, eim._EVALUEERR
+            stop[i] = True
+
+        # With infinite precision, there is a step size below which
+        # all smaller step sizes will reduce the error. But in floating point
+        # arithmetic, catastrophic cancellation will begin to cause the error
+        # to increase again. This heuristic tries to avoid step sizes that are
+        # too small. There may be more theoretically sound approaches for
+        # detecting a step size that minimizes the total error, but this
+        # heuristic seems simple and effective.
+        i = (work.error > work.error_last*10) & ~stop
+        work.status[i] = _EERRORINCREASE
+        stop[i] = True
+
+        return stop
+
+    def post_termination_check(work):
+        return
+
+    def customize_result(res, shape):
+        return shape
+
+    return eim._loop(work, callback, shape, maxiter, func, args, dtype,
+                     pre_func_eval, post_func_eval, check_termination,
+                     post_termination_check, customize_result, res_work_pairs,
+                     preserve_shape)
+
+
+def _differentiate_weights(work, n):
+    # This produces the weights of the finite difference formula for a given
+    # stencil. In experiments, use of a second-order central difference formula
+    # with Richardson extrapolation was more accurate numerically, but it was
+    # more complicated, and it would have become even more complicated when
+    # adding support for one-sided differences. However, now that all the
+    # function evaluation values are stored, they can be processed in whatever
+    # way is desired to produce the derivative estimate. We leave alternative
+    # approaches to future work. To be more self-contained, here is the theory
+    # for deriving the weights below.
+    #
+    # Recall that the Taylor expansion of a univariate, scalar-valued function
+    # about a point `x` may be expressed as:
+    #      f(x + h)  =     f(x) + f'(x)*h + f''(x)/2!*h**2  + O(h**3)
+    # Suppose we evaluate f(x), f(x+h), and f(x-h).  We have:
+    #      f(x)      =     f(x)
+    #      f(x + h)  =     f(x) + f'(x)*h + f''(x)/2!*h**2  + O(h**3)
+    #      f(x - h)  =     f(x) - f'(x)*h + f''(x)/2!*h**2  + O(h**3)
+    # We can solve for weights `wi` such that:
+    #   w1*f(x)      = w1*(f(x))
+    # + w2*f(x + h)  = w2*(f(x) + f'(x)*h + f''(x)/2!*h**2) + O(h**3)
+    # + w3*f(x - h)  = w3*(f(x) - f'(x)*h + f''(x)/2!*h**2) + O(h**3)
+    #                =     0    + f'(x)*h + 0               + O(h**3)
+    # Then
+    #     f'(x) ~ (w1*f(x) + w2*f(x+h) + w3*f(x-h))/h
+    # is a finite difference derivative approximation with error O(h**2),
+    # and so it is said to be a "second-order" approximation. Under certain
+    # conditions (e.g. well-behaved function, `h` sufficiently small), the
+    # error in the approximation will decrease with h**2; that is, if `h` is
+    # reduced by a factor of 2, the error is reduced by a factor of 4.
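+    #
+    # As a concrete check of the system above (a sketch using the same
+    # Vandermonde construction as the code below, with the abscissae ordered
+    # x, x+h, x-h and steps expressed in units of `h`):
+    #     A = np.vander(np.array([0., 1., -1.]), increasing=True).T
+    #     b = np.array([0., 1., 0.])
+    #     np.linalg.solve(A, b)  # w1, w2, w3 = 0, 1/2, -1/2
+    # which recovers the familiar central difference
+    #     f'(x) ~ (f(x+h) - f(x-h))/(2*h)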
+    #
+    # By default, we use eighth-order formulae. Our central-difference formula
+    # uses abscissae:
+    #   x-h/c**3, x-h/c**2, x-h/c, x-h, x, x+h, x+h/c, x+h/c**2, x+h/c**3
+    # where `c` is the step factor. (Typically, the step factor is greater than
+    # one, so the outermost points - as written above - are actually closest to
+    # `x`.) This "stencil" is chosen so that each iteration, the step can be
+    # reduced by the factor `c`, and most of the function evaluations can be
+    # reused with the new step size. For example, in the next iteration, we
+    # will have:
+    #   x-h/c**4, x-h/c**3, x-h/c**2, x-h/c, x, x+h/c, x+h/c**2, x+h/c**3, x+h/c**4
+    # We do not reuse `x-h` and `x+h` for the new derivative estimate.
+    # While this would increase the order of the formula and thus the
+    # theoretical convergence rate, it is also less stable numerically.
+    # (As noted above, there are other ways of processing the values that are
+    # more stable. Thus, even now we store `f(x-h)` and `f(x+h)` in `work.fs`
+    # to simplify future development of this sort of improvement.)
+    #
+    # The (right) one-sided formula is produced similarly using abscissae
+    #   x, x+h, x+h/d, x+h/d**2, ..., x+h/d**6, x+h/d**7
+    # where `d` is the square root of `c`. (The left one-sided formula simply
+    # uses -h.) When the step size is reduced by factor `c = d**2`, we have
+    # abscissae:
+    #   x, x+h/d**2, x+h/d**3, ..., x+h/d**8, x+h/d**9
+    # `d` is chosen as the square root of `c` so that the rate of the step-size
+    # reduction is the same per iteration as in the central difference case.
+    # Note that because the central difference formulas are inherently of even
+    # order, for simplicity, we use only even-order formulas for one-sided
+    # differences, too.
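+    #
+    # For illustration only (second order and `c = 4`, so `d = 2`; neither is
+    # the default), the right one-sided stencil is x, x+h, x+h/2. Requiring
+    #     w1 + w2 + w3 = 0,  w2 + w3/2 = 1,  w2 + w3/4 = 0
+    # gives w1, w2, w3 = -3, -1, 4, i.e.
+    #     f'(x) ~ (-3*f(x) - f(x+h) + 4*f(x+h/2))/h
+    # which is the standard three-point forward difference with spacing h/2.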
+
+    # It's possible for the user to specify `fac` in, say, double precision but
+    # `x` and `args` in single precision. `fac` gets converted to single
+    # precision, but we should always use double precision for the intermediate
+    # calculations here to avoid additional error in the weights.
+    fac = work.fac.astype(np.float64)
+
+    # Note that if the user switches back to floating point precision with
+    # `x` and `args`, then `fac` will not necessarily equal the (lower
+    # precision) cached `_differentiate_weights.fac`, and the weights will
+    # need to be recalculated. This could be fixed, but it's late, and of
+    # low consequence.
+    if fac != _differentiate_weights.fac:
+        _differentiate_weights.central = []
+        _differentiate_weights.right = []
+        _differentiate_weights.fac = fac
+
+    if len(_differentiate_weights.central) != 2*n + 1:
+        # Central difference weights. Consider refactoring this; it could
+        # probably be more compact.
+        i = np.arange(-n, n + 1)
+        p = np.abs(i) - 1.  # center point has power `p` -1, but sign `s` is 0
+        s = np.sign(i)
+
+        h = s / fac ** p
+        A = np.vander(h, increasing=True).T
+        b = np.zeros(2*n + 1)
+        b[1] = 1
+        weights = np.linalg.solve(A, b)
+
+        # Enforce identities to improve accuracy
+        weights[n] = 0
+        for i in range(n):
+            weights[-i-1] = -weights[i]
+
+        # Cache the weights. We only need to calculate them once unless
+        # the step factor changes.
+        _differentiate_weights.central = weights
+
+        # One-sided difference weights. The left one-sided weights (with
+        # negative steps) are simply the negative of the right one-sided
+        # weights, so no need to compute them separately.
+        i = np.arange(2*n + 1)
+        p = i - 1.
+        s = np.sign(i)
+
+        h = s / np.sqrt(fac) ** p
+        A = np.vander(h, increasing=True).T
+        b = np.zeros(2 * n + 1)
+        b[1] = 1
+        weights = np.linalg.solve(A, b)
+
+        _differentiate_weights.right = weights
+
+    return (_differentiate_weights.central.astype(work.dtype, copy=False),
+            _differentiate_weights.right.astype(work.dtype, copy=False))
+_differentiate_weights.central = []
+_differentiate_weights.right = []
+_differentiate_weights.fac = None
@@ -0,0 +1,278 @@
+from __future__ import annotations
+from typing import (  # noqa: UP035
+    Any, Callable, Iterable, TYPE_CHECKING
+)
+
+import numpy as np
+from scipy.optimize import OptimizeResult
+from ._constraints import old_bound_to_new, Bounds
+from ._direct import direct as _direct  # type: ignore
+
+if TYPE_CHECKING:
+    import numpy.typing as npt
+
+__all__ = ['direct']
+
+ERROR_MESSAGES = (
+    "Number of function evaluations done is larger than maxfun={}",
+    "Number of iterations is larger than maxiter={}",
+    "u[i] < l[i] for some i",
+    "maxfun is too large",
+    "Initialization failed",
+    "There was an error in the creation of the sample points",
+    "An error occurred while the function was sampled",
+    "Maximum number of levels has been reached.",
+    "Forced stop",
+    "Invalid arguments",
+    "Out of memory",
+)
+
+SUCCESS_MESSAGES = (
+    ("The best function value found is within a relative error={} "
+     "of the (known) global optimum f_min"),
+    ("The volume of the hyperrectangle containing the lowest function value "
+     "found is below vol_tol={}"),
+    ("The side length measure of the hyperrectangle containing the lowest "
+     "function value found is below len_tol={}"),
+)
+
+
+def direct(
+    func: Callable[[npt.ArrayLike, tuple[Any]], float],
+    bounds: Iterable | Bounds,
+    *,
+    args: tuple = (),
+    eps: float = 1e-4,
+    maxfun: int | None = None,
+    maxiter: int = 1000,
+    locally_biased: bool = True,
+    f_min: float = -np.inf,
+    f_min_rtol: float = 1e-4,
+    vol_tol: float = 1e-16,
+    len_tol: float = 1e-6,
+    callback: Callable[[npt.ArrayLike], None] | None = None
+) -> OptimizeResult:
+    """
+    Finds the global minimum of a function using the
+    DIRECT algorithm.
+
+    Parameters
+    ----------
+    func : callable
+        The objective function to be minimized.
+        ``func(x, *args) -> float``
+        where ``x`` is a 1-D array with shape (n,) and ``args`` is a tuple of
+        the fixed parameters needed to completely specify the function.
+    bounds : sequence or `Bounds`
+        Bounds for variables. There are two ways to specify the bounds:
+
+        1. Instance of `Bounds` class.
+        2. ``(min, max)`` pairs for each element in ``x``.
+
+    args : tuple, optional
+        Any additional fixed parameters needed to
+        completely specify the objective function.
+    eps : float, optional
+        Minimal required difference of the objective function values
+        between the current best hyperrectangle and the next potentially
+        optimal hyperrectangle to be divided. In consequence, `eps` serves as a
+        tradeoff between local and global search: the smaller, the more local
+        the search becomes. Default is 1e-4.
+    maxfun : int or None, optional
+        Approximate upper bound on objective function evaluations.
+        If `None`, will be automatically set to ``1000 * N`` where ``N``
+        represents the number of dimensions. Will be capped if necessary to
+        limit DIRECT's RAM usage to approximately 1 GiB. This will only occur
+        for very high dimensional problems and excessive `maxfun`. Default is
+        `None`.
+    maxiter : int, optional
+        Maximum number of iterations. Default is 1000.
+    locally_biased : bool, optional
+        If `True` (default), use the locally biased variant of the
+        algorithm known as DIRECT_L. If `False`, use the original unbiased
+        DIRECT algorithm. For hard problems with many local minima,
+        `False` is recommended.
+    f_min : float, optional
+        Function value of the global optimum. Set this value only if the
+        global optimum is known. Default is ``-np.inf``, so that this
+        termination criterion is deactivated.
+    f_min_rtol : float, optional
+        Terminate the optimization once the relative error between the
+        current best minimum `f` and the supplied global minimum `f_min`
+        is smaller than `f_min_rtol`. This parameter is only used if
+        `f_min` is also set. Must lie between 0 and 1. Default is 1e-4.
+    vol_tol : float, optional
+        Terminate the optimization once the volume of the hyperrectangle
+        containing the lowest function value is smaller than `vol_tol`
+        of the complete search space. Must lie between 0 and 1.
+        Default is 1e-16.
+    len_tol : float, optional
+        If `locally_biased=True`, terminate the optimization once half of
+        the normalized maximal side length of the hyperrectangle containing
+        the lowest function value is smaller than `len_tol`.
+        If `locally_biased=False`, terminate the optimization once half of
+        the normalized diagonal of the hyperrectangle containing the lowest
+        function value is smaller than `len_tol`. Must lie between 0 and 1.
+        Default is 1e-6.
+    callback : callable, optional
+        A callback function with signature ``callback(xk)`` where ``xk``
+        represents the best solution found so far.
+
+    Returns
+    -------
+    res : OptimizeResult
+        The optimization result represented as an ``OptimizeResult`` object.
+        Important attributes are: ``x`` the solution array, ``success`` a
+        Boolean flag indicating if the optimizer exited successfully and
+        ``message`` which describes the cause of the termination. See
+        `OptimizeResult` for a description of other attributes.
+
+    Notes
+    -----
+    DIviding RECTangles (DIRECT) is a deterministic global
+    optimization algorithm capable of minimizing a black box function with
+    its variables subject to lower and upper bound constraints by sampling
+    potential solutions in the search space [1]_. The algorithm starts by
+    normalising the search space to an n-dimensional unit hypercube.
+    It samples the function at the center of this hypercube and at 2n
+    (n is the number of variables) more points, 2 in each coordinate
+    direction. Using these function values, DIRECT then divides the
+    domain into hyperrectangles, each having exactly one of the sampling
+    points as its center. In each iteration, DIRECT chooses, using the `eps`
+    parameter which defaults to 1e-4, some of the existing hyperrectangles
+    to be further divided. This division process continues until either the
+    maximum number of iterations or maximum function evaluations allowed
+    are exceeded, or the hyperrectangle containing the minimal value found
+    so far becomes small enough. If `f_min` is specified, the optimization
+    will stop once this function value is reached within a relative tolerance.
+    The locally biased variant of DIRECT (originally called DIRECT_L) [2]_ is
+    used by default. It makes the search more locally biased and more
+    efficient for cases with only a few local minima.
+
+    A note about termination criteria: `vol_tol` refers to the volume of the
+    hyperrectangle containing the lowest function value found so far. This
+    volume decreases exponentially with increasing dimensionality of the
+    problem. Therefore `vol_tol` should be decreased to avoid premature
+    termination of the algorithm for higher dimensions. This does not hold
+    for `len_tol`: it refers either to half of the maximal side length
+    (for ``locally_biased=True``) or half of the diagonal of the
+    hyperrectangle (for ``locally_biased=False``).
+
+    This code is based on the DIRECT 2.0.4 Fortran code by Gablonsky et al. at
+    https://ctk.math.ncsu.edu/SOFTWARE/DIRECTv204.tar.gz .
+    This original version was initially converted via f2c and then cleaned up
+    and reorganized by Steven G. Johnson, August 2007, for the NLopt project.
+    The `direct` function wraps the C implementation.
+
+    .. versionadded:: 1.9.0
+
+    References
+    ----------
+    .. [1] Jones, D.R., Perttunen, C.D. & Stuckman, B.E. Lipschitzian
+        optimization without the Lipschitz constant. J Optim Theory Appl
+        79, 157-181 (1993).
+    .. [2] Gablonsky, J., Kelley, C. A Locally-Biased form of the DIRECT
+        Algorithm. Journal of Global Optimization 21, 27-37 (2001).
+
+    Examples
+    --------
+    The following example is a 2-D problem with four local minima: minimizing
+    the Styblinski-Tang function
+    (https://en.wikipedia.org/wiki/Test_functions_for_optimization).
+
+    >>> from scipy.optimize import direct, Bounds
+    >>> def styblinski_tang(pos):
+    ...     x, y = pos
+    ...     return 0.5 * (x**4 - 16*x**2 + 5*x + y**4 - 16*y**2 + 5*y)
+    >>> bounds = Bounds([-4., -4.], [4., 4.])
+    >>> result = direct(styblinski_tang, bounds)
+    >>> result.x, result.fun, result.nfev
+    array([-2.90321597, -2.90321597]), -78.3323279095383, 2011
+
+    The correct global minimum was found but with a huge number of function
+    evaluations (2011). Loosening the termination tolerances `vol_tol` and
+    `len_tol` can be used to stop DIRECT earlier.
+
+    >>> result = direct(styblinski_tang, bounds, len_tol=1e-3)
+    >>> result.x, result.fun, result.nfev
+    array([-2.9044353, -2.9044353]), -78.33230330754142, 207
+
+    """
+    # convert bounds to new Bounds class if necessary
+    if not isinstance(bounds, Bounds):
+        if isinstance(bounds, (list, tuple)):
+            lb, ub = old_bound_to_new(bounds)
+            bounds = Bounds(lb, ub)
+        else:
+            message = ("bounds must be a sequence or "
+                       "instance of Bounds class")
+            raise ValueError(message)
+
+    lb = np.ascontiguousarray(bounds.lb, dtype=np.float64)
+    ub = np.ascontiguousarray(bounds.ub, dtype=np.float64)
+
+    # validate bounds
+    # check that lower bounds are smaller than upper bounds
+    if not np.all(lb < ub):
+        raise ValueError('Bounds are not consistent min < max')
+    # check for infs
+    if (np.any(np.isinf(lb)) or np.any(np.isinf(ub))):
+        raise ValueError("Bounds must not be inf.")
+
+    # validate tolerances
+    if (vol_tol < 0 or vol_tol > 1):
+        raise ValueError("vol_tol must be between 0 and 1.")
+    if (len_tol < 0 or len_tol > 1):
+        raise ValueError("len_tol must be between 0 and 1.")
+    if (f_min_rtol < 0 or f_min_rtol > 1):
+        raise ValueError("f_min_rtol must be between 0 and 1.")
+
+    # validate maxfun and maxiter
+    if maxfun is None:
+        maxfun = 1000 * lb.shape[0]
+    if not isinstance(maxfun, int):
+        raise ValueError("maxfun must be of type int.")
+    if maxfun < 0:
+        raise ValueError("maxfun must be >= 0.")
+    if not isinstance(maxiter, int):
+        raise ValueError("maxiter must be of type int.")
+    if maxiter < 0:
+        raise ValueError("maxiter must be >= 0.")
+
+    # validate boolean parameters
+    if not isinstance(locally_biased, bool):
+        raise ValueError("locally_biased must be True or False.")
+
+    def _func_wrap(x, args=None):
+        x = np.asarray(x)
+        if args is None:
+            f = func(x)
+        else:
+            f = func(x, *args)
+        # always return a float
+        return np.asarray(f).item()
+
+    # TODO: fix disp argument
+    x, fun, ret_code, nfev, nit = _direct(
+        _func_wrap,
+        np.asarray(lb), np.asarray(ub),
+        args,
+        False, eps, maxfun, maxiter,
+        locally_biased,
+        f_min, f_min_rtol,
+        vol_tol, len_tol, callback
+    )
+
+    format_val = (maxfun, maxiter, f_min_rtol, vol_tol, len_tol)
+    if ret_code > 2:
+        message = SUCCESS_MESSAGES[ret_code - 3].format(
+                    format_val[ret_code - 1])
+    elif 0 < ret_code <= 2:
+        message = ERROR_MESSAGES[ret_code - 1].format(format_val[ret_code - 1])
+    elif 0 > ret_code > -100:
+        message = ERROR_MESSAGES[abs(ret_code) + 1]
+    else:
+        message = ERROR_MESSAGES[ret_code + 99]
+
+    return OptimizeResult(x=np.asarray(x), fun=fun, status=ret_code,
+                          success=ret_code > 2, message=message,
+                          nfev=nfev, nit=nit)
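Note on the termination criteria described in the `direct` docstring above. The sketch below is an illustration only: it assumes that every side of the best hyperrectangle has been trisected the same number of times `k`, which real DIRECT runs do not guarantee, and the helper names are hypothetical and not part of SciPy. Under that assumption it shows why the normalized volume checked against `vol_tol` shrinks with the number of dimensions, while half of the maximal side length checked against `len_tol` does not.

```python
def normalized_volume(n_dims, k):
    """Volume of a box inside the unit hypercube whose n_dims sides
    are all 3**-k long (each side trisected k times)."""
    return (3.0 ** -k) ** n_dims


def half_max_side(k):
    """Half of the longest normalized side length after k trisections.
    This value does not depend on the number of dimensions."""
    return 0.5 * 3.0 ** -k


# Same refinement depth, increasing dimensionality.
for n_dims in (2, 10, 30):
    print(n_dims, normalized_volume(n_dims, 5), half_max_side(5))
```

With the same refinement depth, the volume measure falls below the default `vol_tol` of 1e-16 in 30 dimensions but not in 2, whereas the side-length measure is identical in both cases.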
@@ -0,0 +1,715 @@
+# Dual Annealing implementation.
+# Copyright (c) 2018 Sylvain Gubian <sylvain.gubian@pmi.com>,
+# Yang Xiang <yang.xiang@pmi.com>
+# Author: Sylvain Gubian, Yang Xiang, PMP S.A.
+
+"""
+A Dual Annealing global optimization algorithm
+"""
+
+import numpy as np
+from scipy.optimize import OptimizeResult
+from scipy.optimize import minimize, Bounds
+from scipy.special import gammaln
+from scipy._lib._util import check_random_state
+from scipy.optimize._constraints import new_bounds_to_old
+
+__all__ = ['dual_annealing']
+
+
+class VisitingDistribution:
+    """
+    Class used to generate new coordinates based on the distorted
+    Cauchy-Lorentz distribution. Depending on the steps within the strategy
+    chain, the class implements the strategy for generating new location
+    changes.
+
+    Parameters
+    ----------
+    lb : array_like
+        A 1-D NumPy ndarray containing lower bounds of the generated
+        components. Neither NaN nor inf is allowed.
+    ub : array_like
+        A 1-D NumPy ndarray containing upper bounds for the generated
+        components. Neither NaN nor inf is allowed.
+    visiting_param : float
+        Parameter for visiting distribution. Default value is 2.62.
+        Higher values give the visiting distribution a heavier tail, which
+        makes the algorithm jump to more distant regions.
+        The value range is (1, 3]. Its value is fixed for the life of the
+        object.
+    rand_gen : {`~numpy.random.RandomState`, `~numpy.random.Generator`}
+        Random number generator instance used to draw the samples for the
+        visiting distribution.
+
+    """
+    TAIL_LIMIT = 1.e8
+    MIN_VISIT_BOUND = 1.e-10
+
+    def __init__(self, lb, ub, visiting_param, rand_gen):
+        # if you wish to make _visiting_param adjustable during the life of
+        # the object then _factor2, _factor3, _factor5, _d1, _factor6 will
+        # have to be dynamically calculated in `visit_fn`. They're factored
+        # out here so they don't need to be recalculated all the time.
+        self._visiting_param = visiting_param
+        self.rand_gen = rand_gen
+        self.lower = lb
+        self.upper = ub
+        self.bound_range = ub - lb
+
+        # these are invariant numbers unless visiting_param changes
+        self._factor2 = np.exp((4.0 - self._visiting_param) * np.log(
+            self._visiting_param - 1.0))
+        self._factor3 = np.exp((2.0 - self._visiting_param) * np.log(2.0)
+                               / (self._visiting_param - 1.0))
+        self._factor4_p = np.sqrt(np.pi) * self._factor2 / (self._factor3 * (
+            3.0 - self._visiting_param))
+
+        self._factor5 = 1.0 / (self._visiting_param - 1.0) - 0.5
+        self._d1 = 2.0 - self._factor5
+        self._factor6 = np.pi * (1.0 - self._factor5) / np.sin(
+            np.pi * (1.0 - self._factor5)) / np.exp(gammaln(self._d1))
+
+    def visiting(self, x, step, temperature):
+        """ Based on the step in the strategy chain, new coordinates are
+        generated by changing all components at the same time or only
+        one of them. The new values are computed with the visit_fn method.
+        """
+        dim = x.size
+        if step < dim:
+            # Changing all coordinates with a new visiting value
+            visits = self.visit_fn(temperature, dim)
+            upper_sample, lower_sample = self.rand_gen.uniform(size=2)
+            visits[visits > self.TAIL_LIMIT] = self.TAIL_LIMIT * upper_sample
+            visits[visits < -self.TAIL_LIMIT] = -self.TAIL_LIMIT * lower_sample
+            x_visit = visits + x
+            a = x_visit - self.lower
+            b = np.fmod(a, self.bound_range) + self.bound_range
+            x_visit = np.fmod(b, self.bound_range) + self.lower
+            near_lower = np.fabs(x_visit - self.lower) < self.MIN_VISIT_BOUND
+            x_visit[near_lower] += self.MIN_VISIT_BOUND
+        else:
+            # Changing only one coordinate at a time based on strategy
+            # chain step
+            x_visit = np.copy(x)
+            visit = self.visit_fn(temperature, 1)[0]
+            if visit > self.TAIL_LIMIT:
+                visit = self.TAIL_LIMIT * self.rand_gen.uniform()
+            elif visit < -self.TAIL_LIMIT:
+                visit = -self.TAIL_LIMIT * self.rand_gen.uniform()
+            index = step - dim
+            x_visit[index] = visit + x[index]
+            a = x_visit[index] - self.lower[index]
+            b = np.fmod(a, self.bound_range[index]) + self.bound_range[index]
+            x_visit[index] = np.fmod(b, self.bound_range[
+                index]) + self.lower[index]
+            if np.fabs(x_visit[index] - self.lower[
+                    index]) < self.MIN_VISIT_BOUND:
+                x_visit[index] += self.MIN_VISIT_BOUND
+        return x_visit
+
+    def visit_fn(self, temperature, dim):
+        """ Formula Visita from p. 405 of reference [2] """
+        x, y = self.rand_gen.normal(size=(dim, 2)).T
+
+        factor1 = np.exp(np.log(temperature) / (self._visiting_param - 1.0))
+        factor4 = self._factor4_p * factor1
+
+        # sigmax
+        x *= np.exp(-(self._visiting_param - 1.0) * np.log(
+            self._factor6 / factor4) / (3.0 - self._visiting_param))
+
+        den = np.exp((self._visiting_param - 1.0) * np.log(np.fabs(y)) /
+                     (3.0 - self._visiting_param))
+
+        return x / den
+
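Note on the bound handling in `VisitingDistribution.visiting` above. The following is a minimal scalar sketch of the double `fmod` step that folds a trial coordinate back into the search interval. It uses `math.fmod` instead of `np.fmod`, and the function name `wrap_into_bounds` is hypothetical. It leaves out the tail clipping and the small `MIN_VISIT_BOUND` nudge away from the lower bound.

```python
import math


def wrap_into_bounds(value, lower, upper):
    """Fold `value` periodically into the interval [lower, upper)."""
    width = upper - lower
    offset = value - lower
    # The first fmod leaves a remainder in (-width, width); adding width
    # makes it positive, so the second fmod lands in [0, width).
    shifted = math.fmod(offset, width) + width
    return math.fmod(shifted, width) + lower


print(wrap_into_bounds(5.5, 0.0, 2.0))   # overshoots the upper bound
print(wrap_into_bounds(-0.5, 0.0, 2.0))  # undershoots the lower bound
```

The second `fmod` is needed because `fmod` keeps the sign of its first argument, so one pass alone would leave values below `lower` outside the interval.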
+
+class EnergyState:
+    """
+    Class used to record the energy state. At any time, it holds the
+    current coordinates and the best location found so far.
+
+    Parameters
+    ----------
+    lower : array_like
+        A 1-D NumPy ndarray containing lower bounds for generating the
+        initial random components in the `reset` method.
+    upper : array_like
+        A 1-D NumPy ndarray containing upper bounds for generating the
+        initial random components in the `reset` method. Neither NaN nor
+        inf is allowed.
+    callback : callable, ``callback(x, f, context)``, optional
+        A callback function which will be called for all minima found.
+        ``x`` and ``f`` are the coordinates and function value of the
+        latest minimum found, and `context` has value in [0, 1, 2].
+    """
+    # Maximum number of trials for generating a valid starting point
+    MAX_REINIT_COUNT = 1000
+
+    def __init__(self, lower, upper, callback=None):
+        self.ebest = None
+        self.current_energy = None
+        self.current_location = None
+        self.xbest = None
+        self.lower = lower
+        self.upper = upper
+        self.callback = callback
+
+    def reset(self, func_wrapper, rand_gen, x0=None):
+        """
+        Initialize the current location in the search domain. If `x0` is not
+        provided, a random location within the bounds is generated.
+        """
+        if x0 is None:
+            self.current_location = rand_gen.uniform(self.lower, self.upper,
+                                                     size=len(self.lower))
+        else:
+            self.current_location = np.copy(x0)
+        init_error = True
+        reinit_counter = 0
+        while init_error:
+            self.current_energy = func_wrapper.fun(self.current_location)
+            if self.current_energy is None:
+                raise ValueError('Objective function is returning None')
+            if (not np.isfinite(self.current_energy) or np.isnan(
+                    self.current_energy)):
+                if reinit_counter >= EnergyState.MAX_REINIT_COUNT:
+                    init_error = False
+                    message = (
+                        'Stopping algorithm because function '
+                        'create NaN or (+/-) infinity values even with '
+                        'trying new random parameters'
+                    )
+                    raise ValueError(message)
+                self.current_location = rand_gen.uniform(self.lower,
+                                                         self.upper,
+                                                         size=self.lower.size)
+                reinit_counter += 1
+            else:
+                init_error = False
+            # If first time reset, initialize ebest and xbest
+            if self.ebest is None and self.xbest is None:
+                self.ebest = self.current_energy
+                self.xbest = np.copy(self.current_location)
+            # Otherwise, we keep them in case of reannealing reset
+
+    def update_best(self, e, x, context):
+        self.ebest = e
+        self.xbest = np.copy(x)
+        if self.callback is not None:
+            val = self.callback(x, e, context)
+            if val is not None:
+                if val:
+                    return ('Callback function requested to stop early by '
+                           'returning True')
+
+    def update_current(self, e, x):
+        self.current_energy = e
+        self.current_location = np.copy(x)
+
+
+class StrategyChain:
+    """
+    Class that implements within a Markov chain the strategy for location
+    acceptance and local search decision making.
+
+    Parameters
+    ----------
+    acceptance_param : float
+        Parameter for acceptance distribution. It is used to control the
+        probability of acceptance. The lower the acceptance parameter, the
+        smaller the probability of acceptance. Default value is -5.0 with
+        a range (-1e4, -5].
+    visit_dist : VisitingDistribution
+        Instance of `VisitingDistribution` class.
+    func_wrapper : ObjectiveFunWrapper
+        Instance of `ObjectiveFunWrapper` class.
+    minimizer_wrapper: LocalSearchWrapper
+        Instance of `LocalSearchWrapper` class.
+    rand_gen : {`~numpy.random.RandomState`, `~numpy.random.Generator`}
+        Random number generator instance used to draw the uniform samples
+        for the acceptance test and the local search decision.
+    energy_state: EnergyState
+        Instance of `EnergyState` class.
+
+    """
+
+    def __init__(self, acceptance_param, visit_dist, func_wrapper,
+                 minimizer_wrapper, rand_gen, energy_state):
+        # Local strategy chain minimum energy and location
+        self.emin = energy_state.current_energy
+        self.xmin = np.array(energy_state.current_location)
+        # Global optimizer state
+        self.energy_state = energy_state
+        # Acceptance parameter
+        self.acceptance_param = acceptance_param
+        # Visiting distribution instance
+        self.visit_dist = visit_dist
+        # Wrapper to objective function
+        self.func_wrapper = func_wrapper
+        # Wrapper to the local minimizer
+        self.minimizer_wrapper = minimizer_wrapper
+        self.not_improved_idx = 0
+        self.not_improved_max_idx = 1000
+        self._rand_gen = rand_gen
+        self.temperature_step = 0
+        self.K = 100 * len(energy_state.current_location)
+
+    def accept_reject(self, j, e, x_visit):
+        r = self._rand_gen.uniform()
+        pqv_temp = 1.0 - ((1.0 - self.acceptance_param) *
+            (e - self.energy_state.current_energy) / self.temperature_step)
+        if pqv_temp <= 0.:
+            pqv = 0.
+        else:
+            pqv = np.exp(np.log(pqv_temp) / (
+                1. - self.acceptance_param))
+
+        if r <= pqv:
+            # We accept the new location and update state
+            self.energy_state.update_current(e, x_visit)
+            self.xmin = np.copy(self.energy_state.current_location)
+
+        # No improvement for a long time
+        if self.not_improved_idx >= self.not_improved_max_idx:
+            if j == 0 or self.energy_state.current_energy < self.emin:
+                self.emin = self.energy_state.current_energy
+                self.xmin = np.copy(self.energy_state.current_location)
+
+    def run(self, step, temperature):
+        self.temperature_step = temperature / float(step + 1)
+        self.not_improved_idx += 1
+        for j in range(self.energy_state.current_location.size * 2):
+            if j == 0:
+                if step == 0:
+                    self.energy_state_improved = True
+                else:
+                    self.energy_state_improved = False
+            x_visit = self.visit_dist.visiting(
+                self.energy_state.current_location, j, temperature)
+            # Calling the objective function
+            e = self.func_wrapper.fun(x_visit)
+            if e < self.energy_state.current_energy:
+                # We have got a better energy value
+                self.energy_state.update_current(e, x_visit)
+                if e < self.energy_state.ebest:
+                    val = self.energy_state.update_best(e, x_visit, 0)
+                    if val is not None:
+                        if val:
+                            return val
+                    self.energy_state_improved = True
+                    self.not_improved_idx = 0
+            else:
+                # We have not improved but do we accept the new location?
+                self.accept_reject(j, e, x_visit)
+            if self.func_wrapper.nfev >= self.func_wrapper.maxfun:
+                return ('Maximum number of function call reached '
+                        'during annealing')
+        # End of StrategyChain loop
+
+    def local_search(self):
+        # Decide whether to perform a local search based on the strategy
+        # chain results. If the energy has improved, or has not improved
+        # for too long, perform a local search from the best strategy
+        # chain location.
+        if self.energy_state_improved:
+            # Global energy has improved, let's see if LS improves further
+            e, x = self.minimizer_wrapper.local_search(self.energy_state.xbest,
+                                                       self.energy_state.ebest)
+            if e < self.energy_state.ebest:
+                self.not_improved_idx = 0
+                val = self.energy_state.update_best(e, x, 1)
+                if val is not None:
+                    if val:
+                        return val
+                self.energy_state.update_current(e, x)
+            if self.func_wrapper.nfev >= self.func_wrapper.maxfun:
+                return ('Maximum number of function call reached '
+                        'during local search')
+        # Check probability of a need to perform a LS even if no improvement
+        do_ls = False
+        if self.K < 90 * len(self.energy_state.current_location):
+            pls = np.exp(self.K * (
+                self.energy_state.ebest - self.energy_state.current_energy) /
+                self.temperature_step)
+            if pls >= self._rand_gen.uniform():
+                do_ls = True
+        # Global energy not improved, let's see what LS gives
+        # on the best strategy chain location
+        if self.not_improved_idx >= self.not_improved_max_idx:
+            do_ls = True
+        if do_ls:
+            e, x = self.minimizer_wrapper.local_search(self.xmin, self.emin)
+            self.xmin = np.copy(x)
+            self.emin = e
+            self.not_improved_idx = 0
+            self.not_improved_max_idx = self.energy_state.current_location.size
+            if e < self.energy_state.ebest:
+                val = self.energy_state.update_best(
+                    self.emin, self.xmin, 2)
+                if val is not None:
+                    if val:
+                        return val
+                self.energy_state.update_current(e, x)
+            if self.func_wrapper.nfev >= self.func_wrapper.maxfun:
+                return ('Maximum number of function call reached '
+                        'during dual annealing')
+
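Note on `StrategyChain.accept_reject` above. The function below is a standalone sketch of the acceptance probability that method computes, written with the standard `math` module. The name `acceptance_probability` and its argument names are hypothetical. In the code above the method is only reached when the trial energy is not lower than the current one, so `delta_e` is assumed to be non-negative, and `temperature_step` stands for `temperature / (step + 1)`.

```python
import math


def acceptance_probability(delta_e, temperature_step, accept=-5.0):
    """Probability of accepting a move that raises the energy by delta_e."""
    base = 1.0 - (1.0 - accept) * delta_e / temperature_step
    if base <= 0.0:
        # Energy increase too large for this temperature: never accepted.
        return 0.0
    return math.exp(math.log(base) / (1.0 - accept))


print(acceptance_probability(0.1, 1.0))   # small uphill move, low temperature
print(acceptance_probability(0.1, 10.0))  # same move, higher temperature
```

A trial point is then accepted when a uniform random draw in [0, 1) is less than or equal to this probability.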
+
+class ObjectiveFunWrapper:
+
+    def __init__(self, func, maxfun=1e7, *args):
+        self.func = func
+        self.args = args
+        # Number of objective function evaluations
+        self.nfev = 0
+        # Number of gradient evaluations, if used
+        self.ngev = 0
+        # Number of Hessian evaluations, if used
+        self.nhev = 0
+        self.maxfun = maxfun
+
+    def fun(self, x):
+        self.nfev += 1
+        return self.func(x, *self.args)
+
+
+class LocalSearchWrapper:
+    """
+    Class used to wrap the minimizer used for local search.
+    The default local minimizer is SciPy's `minimize` with L-BFGS-B.
+    """
+
+    LS_MAXITER_RATIO = 6
+    LS_MAXITER_MIN = 100
+    LS_MAXITER_MAX = 1000
+
+    def __init__(self, search_bounds, func_wrapper, *args, **kwargs):
+        self.func_wrapper = func_wrapper
+        self.kwargs = kwargs
+        self.jac = self.kwargs.get('jac', None)
+        self.minimizer = minimize
+        bounds_list = list(zip(*search_bounds))
+        self.lower = np.array(bounds_list[0])
+        self.upper = np.array(bounds_list[1])
+
+        # If no minimizer specified, use SciPy minimize with 'L-BFGS-B' method
+        if not self.kwargs:
+            n = len(self.lower)
+            ls_max_iter = min(max(n * self.LS_MAXITER_RATIO,
+                                  self.LS_MAXITER_MIN),
+                              self.LS_MAXITER_MAX)
+            self.kwargs['method'] = 'L-BFGS-B'
+            self.kwargs['options'] = {
+                'maxiter': ls_max_iter,
+            }
+            self.kwargs['bounds'] = list(zip(self.lower, self.upper))
+        elif callable(self.jac):
+            def wrapped_jac(x):
+                return self.jac(x, *args)
+            self.kwargs['jac'] = wrapped_jac
+
+    def local_search(self, x, e):
+        # Run local search from the given x location where energy value is e
+        x_tmp = np.copy(x)
+        mres = self.minimizer(self.func_wrapper.fun, x, **self.kwargs)
+        if 'njev' in mres:
+            self.func_wrapper.ngev += mres.njev
+        if 'nhev' in mres:
+            self.func_wrapper.nhev += mres.nhev
+        # Check that the result is finite and within the bounds
+        is_finite = np.all(np.isfinite(mres.x)) and np.isfinite(mres.fun)
+        in_bounds = np.all(mres.x >= self.lower) and np.all(
+            mres.x <= self.upper)
+        is_valid = is_finite and in_bounds
+
+        # Use the new point only if it is valid and gives a better result
+        if is_valid and mres.fun < e:
+            return mres.fun, mres.x
+        else:
+            return e, x_tmp
+
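Note on the default iteration budget in `LocalSearchWrapper.__init__` above. This small sketch restates the clamp applied when no `minimizer_kwargs` are given. The helper name `default_ls_maxiter` is hypothetical, and its default arguments mirror the class constants `LS_MAXITER_RATIO`, `LS_MAXITER_MIN`, and `LS_MAXITER_MAX`.

```python
def default_ls_maxiter(n_dims, ratio=6, lower=100, upper=1000):
    """Clamp ratio * n_dims to the interval [lower, upper]."""
    return min(max(n_dims * ratio, lower), upper)


for n_dims in (10, 50, 500):
    print(n_dims, default_ls_maxiter(n_dims))
```

Problems with fewer than 17 variables therefore get the minimum of 100 iterations, and the cap of 1000 is reached from 167 variables upward.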
+
+def dual_annealing(func, bounds, args=(), maxiter=1000,
+                   minimizer_kwargs=None, initial_temp=5230.,
+                   restart_temp_ratio=2.e-5, visit=2.62, accept=-5.0,
+                   maxfun=1e7, seed=None, no_local_search=False,
+                   callback=None, x0=None):
+    """
+    Find the global minimum of a function using Dual Annealing.
+
+    Parameters
+    ----------
+    func : callable
+        The objective function to be minimized. Must be in the form
+        ``f(x, *args)``, where ``x`` is the argument in the form of a 1-D array
+        and ``args`` is a tuple of any additional fixed parameters needed to
+        completely specify the function.
+    bounds : sequence or `Bounds`
+        Bounds for variables. There are two ways to specify the bounds:
+
+        1. Instance of `Bounds` class.
+        2. Sequence of ``(min, max)`` pairs for each element in `x`.
+
+    args : tuple, optional
+        Any additional fixed parameters needed to completely specify the
+        objective function.
+    maxiter : int, optional
+        The maximum number of global search iterations. Default value is 1000.
+    minimizer_kwargs : dict, optional
+        Extra keyword arguments to be passed to the local minimizer
+        (`minimize`). Some important options could be:
+        ``method`` for the minimizer method to use and ``args`` for
+        objective function additional arguments.
+    initial_temp : float, optional
+        The initial temperature. Use higher values to facilitate a wider
+        search of the energy landscape, allowing dual_annealing to escape
+        local minima that it is trapped in. Default value is 5230. Range is
+        (0.01, 5.e4].
+    restart_temp_ratio : float, optional
+        During the annealing process, the temperature decreases. When it
+        reaches ``initial_temp * restart_temp_ratio``, the reannealing process
+        is triggered. Default value of the ratio is 2e-5. Range is (0, 1).
+    visit : float, optional
+        Parameter for visiting distribution. Default value is 2.62. Higher
+        values give the visiting distribution a heavier tail, which makes
+        the algorithm jump to more distant regions. The value range is (1, 3].
+    accept : float, optional
+        Parameter for acceptance distribution. It is used to control the
+        probability of acceptance. The lower the acceptance parameter, the
+        smaller the probability of acceptance. Default value is -5.0 with
+        a range (-1e4, -5].
+    maxfun : int, optional
+        Soft limit for the number of objective function calls. If the
+        algorithm is in the middle of a local search, this number will be
+        exceeded and the algorithm will stop just after the local search is
+        done. Default value is 1e7.
+    seed : {None, int, `numpy.random.Generator`, `numpy.random.RandomState`}, optional
+        If `seed` is None (or `np.random`), the `numpy.random.RandomState`
+        singleton is used.
+        If `seed` is an int, a new ``RandomState`` instance is used,
+        seeded with `seed`.
+        If `seed` is already a ``Generator`` or ``RandomState`` instance then
+        that instance is used.
+        Specify `seed` for repeatable minimizations. The random numbers
+        generated with this seed only affect the visiting distribution function
+        and new coordinates generation.
+    no_local_search : bool, optional
+        If `no_local_search` is set to True, a traditional Generalized
+        Simulated Annealing will be performed with no local search
+        strategy applied.
+    callback : callable, optional
+        A callback function with signature ``callback(x, f, context)``,
+        which will be called for all minima found.
+        ``x`` and ``f`` are the coordinates and function value of the
+        latest minimum found, and ``context`` has value in [0, 1, 2], with the
+        following meaning:
+
+            - 0: minimum detected in the annealing process.
+            - 1: detection occurred in the local search process.
+            - 2: detection done in the dual annealing process.
+
+        If the callback implementation returns True, the algorithm will stop.
+    x0 : ndarray, shape(n,), optional
+        Coordinates of a single N-D starting point.
+
+    Returns
+    -------
+    res : OptimizeResult
+        The optimization result represented as an `OptimizeResult` object.
+        Important attributes are: ``x`` the solution array, ``fun`` the value
+        of the function at the solution, and ``message`` which describes the
+        cause of the termination.
+        See `OptimizeResult` for a description of other attributes.
+
+    Notes
+    -----
+    This function implements the Dual Annealing optimization. This stochastic
+    approach derived from [3]_ combines the generalization of CSA (Classical
+    Simulated Annealing) and FSA (Fast Simulated Annealing) [1]_ [2]_ coupled
+    to a strategy for applying a local search on accepted locations [4]_.
+    An alternative implementation of this same algorithm is described in [5]_
+    and benchmarks are presented in [6]_. This approach introduces an advanced
+    method to refine the solution found by the generalized annealing
+    process. This algorithm uses a distorted Cauchy-Lorentz visiting
+    distribution, with its shape controlled by the parameter :math:`q_{v}`
+
+    .. math::
+
+        g_{q_{v}}(\\Delta x(t)) \\propto \\frac{ \\
+        \\left[T_{q_{v}}(t) \\right]^{-\\frac{D}{3-q_{v}}}}{ \\
+        \\left[{1+(q_{v}-1)\\frac{(\\Delta x(t))^{2}} { \\
+        \\left[T_{q_{v}}(t)\\right]^{\\frac{2}{3-q_{v}}}}}\\right]^{ \\
+        \\frac{1}{q_{v}-1}+\\frac{D-1}{2}}}
+
+    Where :math:`t` is the artificial time. This visiting distribution is used
+    to generate a trial jump distance :math:`\\Delta x(t)` of variable
+    :math:`x(t)` under artificial temperature :math:`T_{q_{v}}(t)`.
+
+    From the starting point, after calling the visiting distribution
+    function, the acceptance probability is computed as follows:
+
+    .. math::
+
+        p_{q_{a}} = \\min{\\{1,\\left[1-(1-q_{a}) \\beta \\Delta E \\right]^{ \\
+        \\frac{1}{1-q_{a}}}\\}}
+
+    Where :math:`q_{a}` is an acceptance parameter. For :math:`q_{a}<1`, zero
+    acceptance probability is assigned to the cases where
+
+    .. math::
+
+        [1-(1-q_{a}) \\beta \\Delta E] < 0
+
+    The artificial temperature :math:`T_{q_{v}}(t)` is decreased according to
+
+    .. math::
+
+        T_{q_{v}}(t) = T_{q_{v}}(1) \\frac{2^{q_{v}-1}-1}{\\left( \\
+        1 + t\\right)^{q_{v}-1}-1}
+
+    Where :math:`q_{v}` is the visiting parameter.
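Note on the cooling schedule given in the docstring above. The function below is a direct transcription of that formula into plain Python, offered as a sketch for checking values by hand rather than as the code path used by `dual_annealing`. The name `visiting_temperature` is hypothetical, and the default arguments are taken from the `initial_temp` and `visit` defaults in the signature.

```python
def visiting_temperature(t, initial_temp=5230.0, visit=2.62):
    """Artificial temperature T_qv(t) for artificial time t >= 1.

    At t = 1 the schedule returns initial_temp, i.e. T_qv(1).
    """
    numerator = 2.0 ** (visit - 1.0) - 1.0
    denominator = (1.0 + t) ** (visit - 1.0) - 1.0
    return initial_temp * numerator / denominator


for t in (1, 10, 100, 1000):
    print(t, visiting_temperature(t))
```

The temperature falls monotonically with `t`. According to the `restart_temp_ratio` description, reannealing is triggered once it reaches `initial_temp * restart_temp_ratio`.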
+
+    .. versionadded:: 1.2.0
+
+    References
+    ----------
+    .. [1] Tsallis C. Possible generalization of Boltzmann-Gibbs
+        statistics. Journal of Statistical Physics, 52, 479-487 (1988).
+    .. [2] Tsallis C, Stariolo DA. Generalized Simulated Annealing.
+        Physica A, 233, 395-406 (1996).
+    .. [3] Xiang Y, Sun DY, Fan W, Gong XG. Generalized Simulated
+        Annealing Algorithm and Its Application to the Thomson Model.
+        Physics Letters A, 233, 216-220 (1997).
+    .. [4] Xiang Y, Gong XG. Efficiency of Generalized Simulated
+        Annealing. Physical Review E, 62, 4473 (2000).
+    .. [5] Xiang Y, Gubian S, Suomela B, Hoeng J. Generalized
+        Simulated Annealing for Efficient Global Optimization: the GenSA
+        Package for R. The R Journal, Volume 5/1 (2013).
+    .. [6] Mullen, K. Continuous Global Optimization in R. Journal of
+        Statistical Software, 60(6), 1 - 45, (2014).
+        :doi:`10.18637/jss.v060.i06`
+
+    Examples
+    --------
+    The following example is a 10-D problem, with many local minima.
+    The function involved is called Rastrigin
+    (https://en.wikipedia.org/wiki/Rastrigin_function)
+
+    >>> import numpy as np
+    >>> from scipy.optimize import dual_annealing
+    >>> func = lambda x: np.sum(x*x - 10*np.cos(2*np.pi*x)) + 10*np.size(x)
+    >>> lw = [-5.12] * 10
+    >>> up = [5.12] * 10
+    >>> ret = dual_annealing(func, bounds=list(zip(lw, up)))
+    >>> ret.x
+    array([-4.26437714e-09, -3.91699361e-09, -1.86149218e-09, -3.97165720e-09,
+           -6.29151648e-09, -6.53145322e-09, -3.93616815e-09, -6.55623025e-09,
+           -6.05775280e-09, -5.00668935e-09]) # random
+    >>> ret.fun
+    0.000000
+
+    """
+
+    if isinstance(bounds, Bounds):
+        bounds = new_bounds_to_old(bounds.lb, bounds.ub, len(bounds.lb))
+
+    if x0 is not None and len(x0) != len(bounds):
+        raise ValueError('Bounds size does not match x0')
+
+    lu = list(zip(*bounds))
+    lower = np.array(lu[0])
+    upper = np.array(lu[1])
+    # Check that restart temperature ratio is correct
+    if restart_temp_ratio <= 0. or restart_temp_ratio >= 1.:
+        raise ValueError('Restart temperature ratio has to be in range (0, 1)')
+    # Checking bounds are valid
+    if (np.any(np.isinf(lower)) or np.any(np.isinf(upper)) or np.any(
+            np.isnan(lower)) or np.any(np.isnan(upper))):
+        raise ValueError('Some bounds values are inf values or nan values')
+    # Checking that bounds are consistent
+    if not np.all(lower < upper):
+        raise ValueError('Bounds are not consistent min < max')
+    # Checking that bounds are the same length
+    if len(lower) != len(upper):
+        raise ValueError('Bounds do not have the same dimensions')
+
+    # Wrapper for the objective function
+    func_wrapper = ObjectiveFunWrapper(func, maxfun, *args)
+
+    # minimizer_kwargs has to be a dict, not None
+    minimizer_kwargs = minimizer_kwargs or {}
+
+    minimizer_wrapper = LocalSearchWrapper(
+        bounds, func_wrapper, *args, **minimizer_kwargs)
+
+    # Initialization of random Generator for reproducible runs if seed provided
+    rand_state = check_random_state(seed)
+    # Initialization of the energy state
+    energy_state = EnergyState(lower, upper, callback)
+    energy_state.reset(func_wrapper, rand_state, x0)
+    # Minimum value of annealing temperature reached to perform
+    # re-annealing
+    temperature_restart = initial_temp * restart_temp_ratio
+    # VisitingDistribution instance
+    visit_dist = VisitingDistribution(lower, upper, visit, rand_state)
+    # Strategy chain instance
+    strategy_chain = StrategyChain(accept, visit_dist, func_wrapper,
+                                   minimizer_wrapper, rand_state, energy_state)
+    need_to_stop = False
+    iteration = 0
+    message = []
+    # OptimizeResult object to be returned
+    optimize_res = OptimizeResult()
+    optimize_res.success = True
+    optimize_res.status = 0
+
+    t1 = np.exp((visit - 1) * np.log(2.0)) - 1.0
+    # Run the search loop
+    while not need_to_stop:
+        for i in range(maxiter):
+            # Compute temperature for this step
+            s = float(i) + 2.0
+            t2 = np.exp((visit - 1) * np.log(s)) - 1.0
+            temperature = initial_temp * t1 / t2
+            if iteration >= maxiter:
+                message.append("Maximum number of iteration reached")
+                need_to_stop = True
+                break
+            # Need a re-annealing process?
+            if temperature < temperature_restart:
+                energy_state.reset(func_wrapper, rand_state)
+                break
+            # starting strategy chain
+            val = strategy_chain.run(i, temperature)
+            if val is not None:
+                message.append(val)
+                need_to_stop = True
+                optimize_res.success = False
+                break
+            # Possible local search at the end of the strategy chain
+            if not no_local_search:
+                val = strategy_chain.local_search()
+                if val is not None:
+                    message.append(val)
+                    need_to_stop = True
+                    optimize_res.success = False
+                    break
+            iteration += 1
+
+    # Setting the OptimizeResult values
+    optimize_res.x = energy_state.xbest
+    optimize_res.fun = energy_state.ebest
+    optimize_res.nit = iteration
+    optimize_res.nfev = func_wrapper.nfev
+    optimize_res.njev = func_wrapper.ngev
+    optimize_res.nhev = func_wrapper.nhev
+    optimize_res.message = message
+    return optimize_res
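The cooling schedule and the acceptance rule described in the docstring above can be sketched in isolation. This is a minimal illustration, not the module's own helper: the function names `visiting_temperature` and `acceptance_probability` are hypothetical, and the values 5230.0, 2.62 and -5.0 used below are illustrative choices for `initial_temp`, `visit` and `accept` that do not appear in this excerpt. The temperature formula mirrors the `t1`/`t2` computation in the search loop; the acceptance formula follows the docstring equation, with `beta` passed in explicitly.

```python
import math


def visiting_temperature(step, initial_temp, visit):
    """Temperature at zero-based `step`, mirroring the t1/t2 loop code."""
    t1 = math.exp((visit - 1.0) * math.log(2.0)) - 1.0
    t2 = math.exp((visit - 1.0) * math.log(step + 2.0)) - 1.0
    return initial_temp * t1 / t2


def acceptance_probability(delta_e, beta, accept):
    """min(1, [1 - (1 - q_a) * beta * dE] ** (1 / (1 - q_a))), q_a != 1."""
    base = 1.0 - (1.0 - accept) * beta * delta_e
    # A non-positive bracket is assigned zero acceptance probability.
    if base <= 0.0:
        return 0.0
    return min(1.0, base ** (1.0 / (1.0 - accept)))


# Illustrative (assumed) parameter values.
temps = [visiting_temperature(i, 5230.0, 2.62) for i in range(5)]
p_downhill = acceptance_probability(-1.0, 1.0, -5.0)
p_small_uphill = acceptance_probability(0.1, 1.0, -5.0)
p_large_uphill = acceptance_probability(1.0, 1.0, -5.0)
```

At step 0 the schedule returns the initial temperature, and it decreases monotonically afterwards. Downhill moves are always accepted, while uphill moves are accepted with a probability that drops to zero once the bracketed term becomes non-positive.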
@@ -0,0 +1,430 @@
+"""Hessian update strategies for quasi-Newton optimization methods."""
+import numpy as np
+from numpy.linalg import norm
+from scipy.linalg import get_blas_funcs
+from warnings import warn
+
+
+__all__ = ['HessianUpdateStrategy', 'BFGS', 'SR1']
+
+
+class HessianUpdateStrategy:
+    """Interface for implementing Hessian update strategies.
+
+    Many optimization methods make use of Hessian (or inverse Hessian)
+    approximations, such as the quasi-Newton methods BFGS, SR1, L-BFGS.
+    Some of these approximations, however, do not actually need to store
+    the entire matrix or can compute the internal matrix product with a
+    given vector in a very efficient manner. This class serves as an
+    abstract interface between the optimization algorithm and the
+    quasi-Newton update strategies, giving freedom of implementation
+    to store and update the internal matrix as efficiently as possible.
+    Different choices of initialization and update procedure will result
+    in different quasi-Newton strategies.
+
+    Four methods should be implemented in derived classes: ``initialize``,
+    ``update``, ``dot`` and ``get_matrix``.
+
+    Notes
+    -----
+    Any instance of a class that implements this interface
+    can be accepted by the method ``minimize`` and used by
+    the compatible solvers to approximate the Hessian (or
+    inverse Hessian) used by the optimization algorithms.
+    """
+
+    def initialize(self, n, approx_type):
+        """Initialize internal matrix.
+
+        Allocate internal memory for storing and updating
+        the Hessian or its inverse.
+
+        Parameters
+        ----------
+        n : int
+            Problem dimension.
+        approx_type : {'hess', 'inv_hess'}
+            Selects either the Hessian or the inverse Hessian.
+            When set to 'hess' the Hessian will be stored and updated.
+            When set to 'inv_hess' its inverse will be used instead.
+        """
+        raise NotImplementedError("The method ``initialize(n, approx_type)``"
+                                  " is not implemented.")
+
+    def update(self, delta_x, delta_grad):
+        """Update internal matrix.
+
+        Update Hessian matrix or its inverse (depending on how 'approx_type'
+        is defined) using information about the last evaluated points.
+
+        Parameters
+        ----------
+        delta_x : ndarray
+            The difference between the two points at which the gradient
+            function has been evaluated: ``delta_x = x2 - x1``.
+        delta_grad : ndarray
+            The difference between the gradients:
+            ``delta_grad = grad(x2) - grad(x1)``.
+        """
+        raise NotImplementedError("The method ``update(delta_x, delta_grad)``"
+                                  " is not implemented.")
+
+    def dot(self, p):
+        """Compute the product of the internal matrix with the given vector.
+
+        Parameters
+        ----------
+        p : array_like
+            1-D array representing a vector.
+
+        Returns
+        -------
+        Hp : array
+            1-D array representing the result of multiplying the
+            approximation matrix by vector p.
+        """
+        raise NotImplementedError("The method ``dot(p)``"
+                                  " is not implemented.")
+
+    def get_matrix(self):
+        """Return current internal matrix.
+
+        Returns
+        -------
+        H : ndarray, shape (n, n)
+            Dense matrix containing either the Hessian
+            or its inverse (depending on how 'approx_type'
+            is defined).
+        """
+        raise NotImplementedError("The method ``get_matrix()``"
+                                  " is not implemented.")
+
+
+class FullHessianUpdateStrategy(HessianUpdateStrategy):
+    """Hessian update strategy with full dimensional internal representation.
+    """
+    _syr = get_blas_funcs('syr', dtype='d')  # Symmetric rank 1 update
+    _syr2 = get_blas_funcs('syr2', dtype='d')  # Symmetric rank 2 update
+    # Symmetric matrix-vector product
+    _symv = get_blas_funcs('symv', dtype='d')
+
+    def __init__(self, init_scale='auto'):
+        self.init_scale = init_scale
+        # Until initialize is called we can't really use the class,
+        # so it makes sense to set everything to None.
+        self.first_iteration = None
+        self.approx_type = None
+        self.B = None
+        self.H = None
+
+    def initialize(self, n, approx_type):
+        """Initialize internal matrix.
+
+        Allocate internal memory for storing and updating
+        the Hessian or its inverse.
+
+        Parameters
+        ----------
+        n : int
+            Problem dimension.
+        approx_type : {'hess', 'inv_hess'}
+            Selects either the Hessian or the inverse Hessian.
+            When set to 'hess' the Hessian will be stored and updated.
+            When set to 'inv_hess' its inverse will be used instead.
+        """
+        self.first_iteration = True
+        self.n = n
+        self.approx_type = approx_type
+        if approx_type not in ('hess', 'inv_hess'):
+            raise ValueError("`approx_type` must be 'hess' or 'inv_hess'.")
+        # Create matrix
+        if self.approx_type == 'hess':
+            self.B = np.eye(n, dtype=float)
+        else:
+            self.H = np.eye(n, dtype=float)
+
+    def _auto_scale(self, delta_x, delta_grad):
+        # Heuristic to scale matrix at first iteration.
+        # Described in Nocedal and Wright "Numerical Optimization"
+        # p.143 formula (6.20).
+        s_norm2 = np.dot(delta_x, delta_x)
+        y_norm2 = np.dot(delta_grad, delta_grad)
+        ys = np.abs(np.dot(delta_grad, delta_x))
+        if ys == 0.0 or y_norm2 == 0 or s_norm2 == 0:
+            return 1
+        if self.approx_type == 'hess':
+            return y_norm2 / ys
+        else:
+            return ys / y_norm2
+
+    def _update_implementation(self, delta_x, delta_grad):
+        raise NotImplementedError("The method ``_update_implementation``"
+                                  " is not implemented.")
+
+    def update(self, delta_x, delta_grad):
+        """Update internal matrix.
+
+        Update Hessian matrix or its inverse (depending on how 'approx_type'
+        is defined) using information about the last evaluated points.
+
+        Parameters
+        ----------
+        delta_x : ndarray
+            The difference between the two points at which the gradient
+            function has been evaluated: ``delta_x = x2 - x1``.
+        delta_grad : ndarray
+            The difference between the gradients:
+            ``delta_grad = grad(x2) - grad(x1)``.
+        """
+        if np.all(delta_x == 0.0):
+            return
+        if np.all(delta_grad == 0.0):
+            warn('delta_grad == 0.0. Check if the approximated '
+                 'function is linear. If the function is linear '
+                 'better results can be obtained by defining the '
+                 'Hessian as zero instead of using quasi-Newton '
+                 'approximations.',
+                 UserWarning, stacklevel=2)
+            return
+        if self.first_iteration:
+            # Get user-specified scale
+            if self.init_scale == "auto":
+                scale = self._auto_scale(delta_x, delta_grad)
+            else:
+                scale = float(self.init_scale)
+            # Scale initial matrix with ``scale * np.eye(n)``
+            if self.approx_type == 'hess':
+                self.B *= scale
+            else:
+                self.H *= scale
+            self.first_iteration = False
+        self._update_implementation(delta_x, delta_grad)
+
+    def dot(self, p):
+        """Compute the product of the internal matrix with the given vector.
+
+        Parameters
+        ----------
+        p : array_like
+            1-D array representing a vector.
+
+        Returns
+        -------
+        Hp : array
+            1-D array representing the result of multiplying the
+            approximation matrix by vector p.
+        """
+        if self.approx_type == 'hess':
+            return self._symv(1, self.B, p)
+        else:
+            return self._symv(1, self.H, p)
+
+    def get_matrix(self):
+        """Return the current internal matrix.
+
+        Returns
+        -------
+        M : ndarray, shape (n, n)
+            Dense matrix containing either the Hessian or its inverse
+            (depending on how `approx_type` was defined).
+        """
+        if self.approx_type == 'hess':
+            M = np.copy(self.B)
+        else:
+            M = np.copy(self.H)
+        li = np.tril_indices_from(M, k=-1)
+        M[li] = M.T[li]
+        return M
+
+
+class BFGS(FullHessianUpdateStrategy):
+    """Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessian update strategy.
+
+    Parameters
+    ----------
+    exception_strategy : {'skip_update', 'damp_update'}, optional
+        Define how to proceed when the curvature condition is violated.
+        Set it to 'skip_update' to just skip the update. Or, alternatively,
+        set it to 'damp_update' to interpolate between the actual BFGS
+        result and the unmodified matrix. Both exception strategies
+        are explained in [1]_, p.536-537.
+    min_curvature : float
+        This number, scaled by a normalization factor, defines the
+        minimum curvature ``dot(delta_grad, delta_x)`` allowed to go
+        unaffected by the exception strategy. By default it is equal to
+        1e-8 when ``exception_strategy = 'skip_update'`` and equal
+        to 0.2 when ``exception_strategy = 'damp_update'``.
+    init_scale : {float, 'auto'}
+        Matrix scale at first iteration. At the first
+        iteration the Hessian matrix or its inverse will be initialized
+        with ``init_scale*np.eye(n)``, where ``n`` is the problem dimension.
+        Set it to 'auto' in order to use an automatic heuristic for choosing
+        the initial scale. The heuristic is described in [1]_, p.143.
+        By default uses 'auto'.
+
+    Notes
+    -----
+    The update is based on the description in [1]_, p.140.
+
+    References
+    ----------
+    .. [1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
+           Second Edition (2006).
+    """
+
+    def __init__(self, exception_strategy='skip_update', min_curvature=None,
+                 init_scale='auto'):
+        if exception_strategy == 'skip_update':
+            if min_curvature is not None:
+                self.min_curvature = min_curvature
+            else:
+                self.min_curvature = 1e-8
+        elif exception_strategy == 'damp_update':
+            if min_curvature is not None:
+                self.min_curvature = min_curvature
+            else:
+                self.min_curvature = 0.2
+        else:
+            raise ValueError("`exception_strategy` must be 'skip_update' "
+                             "or 'damp_update'.")
+
+        super().__init__(init_scale)
+        self.exception_strategy = exception_strategy
+
+    def _update_inverse_hessian(self, ys, Hy, yHy, s):
+        """Update the inverse Hessian matrix.
+
+        BFGS update using the formula:
+
+            ``H <- H + ((H*y).T*y + s.T*y)/(s.T*y)^2 * (s*s.T)
+                     - 1/(s.T*y) * ((H*y)*s.T + s*(H*y).T)``
+
+        where ``s = delta_x`` and ``y = delta_grad``. This formula is
+        equivalent to (6.17) in [1]_ written in a more efficient way
+        for implementation.
+
+        References
+        ----------
+        .. [1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
+               Second Edition (2006).
+        """
+        self.H = self._syr2(-1.0 / ys, s, Hy, a=self.H)
+        self.H = self._syr((ys+yHy)/ys**2, s, a=self.H)
+
+    def _update_hessian(self, ys, Bs, sBs, y):
+        """Update the Hessian matrix.
+
+        BFGS update using the formula:
+
+            ``B <- B - (B*s)*(B*s).T/(s.T*(B*s)) + y*y.T/(s.T*y)``
+
+        where ``s`` is short for ``delta_x`` and ``y`` is short
+        for ``delta_grad``. Formula (6.19) in [1]_.
+
+        References
+        ----------
+        .. [1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
+               Second Edition (2006).
+        """
+        self.B = self._syr(1.0 / ys, y, a=self.B)
+        self.B = self._syr(-1.0 / sBs, Bs, a=self.B)
+
+    def _update_implementation(self, delta_x, delta_grad):
+        # Auxiliary variables w and z
+        if self.approx_type == 'hess':
+            w = delta_x
+            z = delta_grad
+        else:
+            w = delta_grad
+            z = delta_x
+        # Do some common operations
+        wz = np.dot(w, z)
+        Mw = self.dot(w)
+        wMw = Mw.dot(w)
+        # Guarantee that wMw > 0 by reinitializing matrix.
+        # While this is always true in exact arithmetic,
+        # an indefinite matrix may appear due to roundoff errors.
+        if wMw <= 0.0:
+            scale = self._auto_scale(delta_x, delta_grad)
+            # Reinitialize matrix
+            if self.approx_type == 'hess':
+                self.B = scale * np.eye(self.n, dtype=float)
+            else:
+                self.H = scale * np.eye(self.n, dtype=float)
+            # Do common operations for new matrix
+            Mw = self.dot(w)
+            wMw = Mw.dot(w)
+        # Check if curvature condition is violated
+        if wz <= self.min_curvature * wMw:
+            # If the option 'skip_update' is set
+            # we just skip the update when the condition
+            # is violated.
+            if self.exception_strategy == 'skip_update':
+                return
+            # If the option 'damp_update' is set we
+            # interpolate between the actual BFGS
+            # result and the unmodified matrix.
+            elif self.exception_strategy == 'damp_update':
+                update_factor = (1-self.min_curvature) / (1 - wz/wMw)
+                z = update_factor*z + (1-update_factor)*Mw
+                wz = np.dot(w, z)
+        # Update matrix
+        if self.approx_type == 'hess':
+            self._update_hessian(wz, Mw, wMw, z)
+        else:
+            self._update_inverse_hessian(wz, Mw, wMw, z)
+
+
+class SR1(FullHessianUpdateStrategy):
+    """Symmetric-rank-1 Hessian update strategy.
+
+    Parameters
+    ----------
+    min_denominator : float
+        This number, scaled by a normalization factor,
+        defines the minimum denominator magnitude allowed
+        in the update. When the condition is violated we skip
+        the update. By default uses ``1e-8``.
+    init_scale : {float, 'auto'}, optional
+        Matrix scale at first iteration. At the first
+        iteration the Hessian matrix or its inverse will be initialized
+        with ``init_scale*np.eye(n)``, where ``n`` is the problem dimension.
+        Set it to 'auto' in order to use an automatic heuristic for choosing
+        the initial scale. The heuristic is described in [1]_, p.143.
+        By default uses 'auto'.
+
+    Notes
+    -----
+    The update is based on the description in [1]_, p.144-146.
+
+    References
+    ----------
+    .. [1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
+           Second Edition (2006).
+    """
+
+    def __init__(self, min_denominator=1e-8, init_scale='auto'):
+        self.min_denominator = min_denominator
+        super().__init__(init_scale)
+
+    def _update_implementation(self, delta_x, delta_grad):
+        # Auxiliary variables w and z
+        if self.approx_type == 'hess':
+            w = delta_x
+            z = delta_grad
+        else:
+            w = delta_grad
+            z = delta_x
+        # Do some common operations
+        Mw = self.dot(w)
+        z_minus_Mw = z - Mw
+        denominator = np.dot(w, z_minus_Mw)
+        # If the denominator is too small
+        # we just skip the update.
+        if np.abs(denominator) <= self.min_denominator*norm(w)*norm(z_minus_Mw):
+            return
+        # Update matrix
+        if self.approx_type == 'hess':
+            self.B = self._syr(1/denominator, z_minus_Mw, a=self.B)
+        else:
+            self.H = self._syr(1/denominator, z_minus_Mw, a=self.H)
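The update strategies above can also be driven by hand, which makes the secant condition easy to check. The sketch below assumes an installed SciPy that exposes `BFGS` and `SR1` from `scipy.optimize` with the `initialize`, `update` and `get_matrix` methods shown in this file. The quadratic test problem (matrix `A`, points `x1` and `x2`) is hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import BFGS, SR1

# Hypothetical test problem: f(x) = 0.5 * x.T @ A @ x, so grad(x) = A @ x.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

x1 = np.array([0.0, 0.0])
x2 = np.array([1.0, -1.0])
delta_x = x2 - x1
delta_grad = A @ x2 - A @ x1

results = {}
for strategy in (BFGS(), SR1()):
    # Allocate a 2x2 Hessian approximation (identity until the first update).
    strategy.initialize(2, 'hess')
    # One quasi-Newton update from the step and the gradient change.
    strategy.update(delta_x, delta_grad)
    # get_matrix() returns a dense, symmetrized copy of the internal matrix.
    results[type(strategy).__name__] = strategy.get_matrix()

# After an update that is not skipped, both strategies should satisfy
# the secant equation B @ delta_x == delta_grad up to rounding.
secant_ok = {name: np.allclose(B @ delta_x, delta_grad)
             for name, B in results.items()}
```

For this problem the curvature `dot(delta_grad, delta_x)` is positive and the SR1 denominator is well away from zero, so neither update is skipped. With other inputs the exception strategy or the `min_denominator` test may leave the matrix unchanged, in which case the secant equation need not hold.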
@@ -0,0 +1,106 @@
+# cython: language_level=3
+
+from libcpp cimport bool
+from libcpp.string cimport string
+
+cdef extern from "HConst.h" nogil:
+
+    const int HIGHS_CONST_I_INF "kHighsIInf"
+    const double HIGHS_CONST_INF "kHighsInf"
+    const double kHighsTiny
+    const double kHighsZero
+    const int kHighsThreadLimit
+
+    cdef enum HighsDebugLevel:
+      HighsDebugLevel_kHighsDebugLevelNone "kHighsDebugLevelNone" = 0
+      HighsDebugLevel_kHighsDebugLevelCheap "kHighsDebugLevelCheap"
+      HighsDebugLevel_kHighsDebugLevelCostly "kHighsDebugLevelCostly"
+      HighsDebugLevel_kHighsDebugLevelExpensive "kHighsDebugLevelExpensive"
+      HighsDebugLevel_kHighsDebugLevelMin "kHighsDebugLevelMin" = HighsDebugLevel_kHighsDebugLevelNone
+      HighsDebugLevel_kHighsDebugLevelMax "kHighsDebugLevelMax" = HighsDebugLevel_kHighsDebugLevelExpensive
+
+    ctypedef enum HighsModelStatus:
+        HighsModelStatusNOTSET "HighsModelStatus::kNotset" = 0
+        HighsModelStatusLOAD_ERROR "HighsModelStatus::kLoadError"
+        HighsModelStatusMODEL_ERROR "HighsModelStatus::kModelError"
+        HighsModelStatusPRESOLVE_ERROR "HighsModelStatus::kPresolveError"
+        HighsModelStatusSOLVE_ERROR "HighsModelStatus::kSolveError"
+        HighsModelStatusPOSTSOLVE_ERROR "HighsModelStatus::kPostsolveError"
+        HighsModelStatusMODEL_EMPTY "HighsModelStatus::kModelEmpty"
+        HighsModelStatusOPTIMAL "HighsModelStatus::kOptimal"
+        HighsModelStatusINFEASIBLE "HighsModelStatus::kInfeasible"
+        HighsModelStatus_UNBOUNDED_OR_INFEASIBLE "HighsModelStatus::kUnboundedOrInfeasible"
+        HighsModelStatusUNBOUNDED "HighsModelStatus::kUnbounded"
+        HighsModelStatusREACHED_DUAL_OBJECTIVE_VALUE_UPPER_BOUND "HighsModelStatus::kObjectiveBound"
+        HighsModelStatusREACHED_OBJECTIVE_TARGET "HighsModelStatus::kObjectiveTarget"
+        HighsModelStatusREACHED_TIME_LIMIT "HighsModelStatus::kTimeLimit"
+        HighsModelStatusREACHED_ITERATION_LIMIT "HighsModelStatus::kIterationLimit"
+        HighsModelStatusUNKNOWN "HighsModelStatus::kUnknown"
+        HighsModelStatusHIGHS_MODEL_STATUS_MIN "HighsModelStatus::kMin" = HighsModelStatusNOTSET
+        HighsModelStatusHIGHS_MODEL_STATUS_MAX "HighsModelStatus::kMax" = HighsModelStatusUNKNOWN
+
+    cdef enum HighsBasisStatus:
+        HighsBasisStatusLOWER "HighsBasisStatus::kLower" = 0, # (slack) variable is at its lower bound [including fixed variables]
+        HighsBasisStatusBASIC "HighsBasisStatus::kBasic" # (slack) variable is basic
+        HighsBasisStatusUPPER "HighsBasisStatus::kUpper" # (slack) variable is at its upper bound
+        HighsBasisStatusZERO "HighsBasisStatus::kZero" # free variable is non-basic and set to zero
+        HighsBasisStatusNONBASIC "HighsBasisStatus::kNonbasic" # nonbasic with no specific bound information - useful for users and postsolve
+
+    cdef enum SolverOption:
+        SOLVER_OPTION_SIMPLEX "SolverOption::SOLVER_OPTION_SIMPLEX" = -1
+        SOLVER_OPTION_CHOOSE "SolverOption::SOLVER_OPTION_CHOOSE"
+        SOLVER_OPTION_IPM "SolverOption::SOLVER_OPTION_IPM"
+
+    cdef enum PrimalDualStatus:
+        PrimalDualStatusSTATUS_NOT_SET "PrimalDualStatus::STATUS_NOT_SET" = -1
+        PrimalDualStatusSTATUS_MIN "PrimalDualStatus::STATUS_MIN" = PrimalDualStatusSTATUS_NOT_SET
+        PrimalDualStatusSTATUS_NO_SOLUTION "PrimalDualStatus::STATUS_NO_SOLUTION"
+        PrimalDualStatusSTATUS_UNKNOWN "PrimalDualStatus::STATUS_UNKNOWN"
+        PrimalDualStatusSTATUS_INFEASIBLE_POINT "PrimalDualStatus::STATUS_INFEASIBLE_POINT"
+        PrimalDualStatusSTATUS_FEASIBLE_POINT "PrimalDualStatus::STATUS_FEASIBLE_POINT"
+        PrimalDualStatusSTATUS_MAX "PrimalDualStatus::STATUS_MAX" = PrimalDualStatusSTATUS_FEASIBLE_POINT
+
+    cdef enum HighsOptionType:
+        HighsOptionTypeBOOL "HighsOptionType::kBool" = 0
+        HighsOptionTypeINT "HighsOptionType::kInt"
+        HighsOptionTypeDOUBLE "HighsOptionType::kDouble"
+        HighsOptionTypeSTRING "HighsOptionType::kString"
+
+    # workaround for lack of enum class support in Cython < 3.x
+    # cdef enum class ObjSense(int):
+    #     ObjSenseMINIMIZE "ObjSense::kMinimize" = 1
+    #     ObjSenseMAXIMIZE "ObjSense::kMaximize" = -1
+
+    cdef cppclass ObjSense:
+        pass
+
+    cdef ObjSense ObjSenseMINIMIZE "ObjSense::kMinimize"
+    cdef ObjSense ObjSenseMAXIMIZE "ObjSense::kMaximize"
+
+    # cdef enum class MatrixFormat(int):
+    #     MatrixFormatkColwise "MatrixFormat::kColwise" = 1
+    #     MatrixFormatkRowwise "MatrixFormat::kRowwise"
+    #     MatrixFormatkRowwisePartitioned "MatrixFormat::kRowwisePartitioned"
+
+    cdef cppclass MatrixFormat:
+        pass
+
+    cdef MatrixFormat MatrixFormatkColwise "MatrixFormat::kColwise"
+    cdef MatrixFormat MatrixFormatkRowwise "MatrixFormat::kRowwise"
+    cdef MatrixFormat MatrixFormatkRowwisePartitioned "MatrixFormat::kRowwisePartitioned"
+
+    # cdef enum class HighsVarType(int):
+    #     kContinuous "HighsVarType::kContinuous"
+    #     kInteger "HighsVarType::kInteger"
+    #     kSemiContinuous "HighsVarType::kSemiContinuous"
+    #     kSemiInteger "HighsVarType::kSemiInteger"
+    #     kImplicitInteger "HighsVarType::kImplicitInteger"
+
+    cdef cppclass HighsVarType:
+        pass
+
+    cdef HighsVarType kContinuous "HighsVarType::kContinuous"
+    cdef HighsVarType kInteger "HighsVarType::kInteger"
+    cdef HighsVarType kSemiContinuous "HighsVarType::kSemiContinuous"
+    cdef HighsVarType kSemiInteger "HighsVarType::kSemiInteger"
+    cdef HighsVarType kImplicitInteger "HighsVarType::kImplicitInteger"
@@ -0,0 +1,56 @@
+# cython: language_level=3
+
+from libc.stdio cimport FILE
+
+from libcpp cimport bool
+from libcpp.string cimport string
+
+from .HighsStatus cimport HighsStatus
+from .HighsOptions cimport HighsOptions
+from .HighsInfo cimport HighsInfo
+from .HighsLp cimport (
+    HighsLp,
+    HighsSolution,
+    HighsBasis,
+    ObjSense,
+)
+from .HConst cimport HighsModelStatus
+
+cdef extern from "Highs.h":
+    # From HiGHS/src/Highs.h
+    cdef cppclass Highs:
+        HighsStatus passHighsOptions(const HighsOptions& options)
+        HighsStatus passModel(const HighsLp& lp)
+        HighsStatus run()
+        HighsStatus setHighsLogfile(FILE* logfile)
+        HighsStatus setHighsOutput(FILE* output)
+        HighsStatus writeHighsOptions(const string filename, const bool report_only_non_default_values = true)
+
+        # split up for cython below
+        #const HighsModelStatus& getModelStatus(const bool scaled_model = False) const
+        const HighsModelStatus & getModelStatus() const
+
+        const HighsInfo& getHighsInfo "getInfo" () const
+        string modelStatusToString(const HighsModelStatus model_status) const
+        #HighsStatus getHighsInfoValue(const string& info, int& value)
+        HighsStatus getHighsInfoValue(const string& info, double& value) const
+        const HighsOptions& getHighsOptions() const
+
+        const HighsLp& getLp() const
+
+        HighsStatus writeSolution(const string filename, const bool pretty) const
+
+        HighsStatus setBasis()
+        const HighsSolution& getSolution() const
+        const HighsBasis& getBasis() const
+
+        bool changeObjectiveSense(const ObjSense sense)
+
+        HighsStatus setHighsOptionValueBool "setOptionValue" (const string & option, const bool value)
+        HighsStatus setHighsOptionValueInt "setOptionValue" (const string & option, const int value)
+        HighsStatus setHighsOptionValueStr "setOptionValue" (const string & option, const string & value)
+        HighsStatus setHighsOptionValueDbl "setOptionValue" (const string & option, const double value)
+
+        string primalDualStatusToString(const int primal_dual_status)
+
+        void resetGlobalScheduler(bool blocking)
@@ -0,0 +1,20 @@
+# cython: language_level=3
+
+
+cdef extern from "HighsIO.h" nogil:
+    # workaround for lack of enum class support in Cython < 3.x
+    # cdef enum class HighsLogType(int):
+    #     kInfo "HighsLogType::kInfo" = 1
+    #     kDetailed "HighsLogType::kDetailed"
+    #     kVerbose "HighsLogType::kVerbose"
+    #     kWarning "HighsLogType::kWarning"
+    #     kError "HighsLogType::kError"
+
+    cdef cppclass HighsLogType:
+        pass
+
+    cdef HighsLogType kInfo "HighsLogType::kInfo"
+    cdef HighsLogType kDetailed "HighsLogType::kDetailed"
+    cdef HighsLogType kVerbose "HighsLogType::kVerbose"
+    cdef HighsLogType kWarning "HighsLogType::kWarning"
+    cdef HighsLogType kError "HighsLogType::kError"
@@ -0,0 +1,22 @@
+# cython: language_level=3
+
+cdef extern from "HighsInfo.h" nogil:
+    # From HiGHS/src/lp_data/HighsInfo.h
+    cdef cppclass HighsInfo:
+        # Inherited from HighsInfoStruct:
+        int mip_node_count
+        int simplex_iteration_count
+        int ipm_iteration_count
+        int crossover_iteration_count
+        int primal_solution_status
+        int dual_solution_status
+        int basis_validity
+        double objective_function_value
+        double mip_dual_bound
+        double mip_gap
+        int num_primal_infeasibilities
+        double max_primal_infeasibility
+        double sum_primal_infeasibilities
+        int num_dual_infeasibilities
+        double max_dual_infeasibility
+        double sum_dual_infeasibilities
@@ -0,0 +1,46 @@
+# cython: language_level=3
+
+from libcpp cimport bool
+from libcpp.string cimport string
+from libcpp.vector cimport vector
+
+from .HConst cimport HighsBasisStatus, ObjSense, HighsVarType
+from .HighsSparseMatrix cimport HighsSparseMatrix
+
+
+cdef extern from "HighsLp.h" nogil:
+    # From HiGHS/src/lp_data/HighsLp.h
+    cdef cppclass HighsLp:
+        int num_col_
+        int num_row_
+
+        vector[double] col_cost_
+        vector[double] col_lower_
+        vector[double] col_upper_
+        vector[double] row_lower_
+        vector[double] row_upper_
+
+        HighsSparseMatrix a_matrix_
+
+        ObjSense sense_
+        double offset_
+
+        string model_name_
+
+        vector[string] row_names_
+        vector[string] col_names_
+
+        vector[HighsVarType] integrality_
+
+        bool isMip() const
+
+    cdef cppclass HighsSolution:
+        vector[double] col_value
+        vector[double] col_dual
+        vector[double] row_value
+        vector[double] row_dual
+
+    cdef cppclass HighsBasis:
+        bool valid_
+        vector[HighsBasisStatus] col_status
+        vector[HighsBasisStatus] row_status
@@ -0,0 +1,9 @@
+# cython: language_level=3
+
+from .HighsStatus cimport HighsStatus
+from .HighsLp cimport HighsLp
+from .HighsOptions cimport HighsOptions
+
+cdef extern from "HighsLpUtils.h" nogil:
+    # From HiGHS/src/lp_data/HighsLpUtils.h
+    HighsStatus assessLp(HighsLp& lp, const HighsOptions& options)
@@ -0,0 +1,10 @@
+# cython: language_level=3
+
+from libcpp.string cimport string
+
+from .HConst cimport HighsModelStatus
+
+cdef extern from "HighsModelUtils.h" nogil:
+    # From HiGHS/src/lp_data/HighsModelUtils.h
+    string utilHighsModelStatusToString(const HighsModelStatus model_status)
+    string utilBasisStatusToString(const int primal_dual_status)
@@ -0,0 +1,110 @@
+# cython: language_level=3
+
+from libc.stdio cimport FILE
+
+from libcpp cimport bool
+from libcpp.string cimport string
+from libcpp.vector cimport vector
+
+from .HConst cimport HighsOptionType
+
+cdef extern from "HighsOptions.h" nogil:
+
+    cdef cppclass OptionRecord:
+        HighsOptionType type
+        string name
+        string description
+        bool advanced
+
+    cdef cppclass OptionRecordBool(OptionRecord):
+        bool* value
+        bool default_value
+
+    cdef cppclass OptionRecordInt(OptionRecord):
+        int* value
+        int lower_bound
+        int default_value
+        int upper_bound
+
+    cdef cppclass OptionRecordDouble(OptionRecord):
+        double* value
+        double lower_bound
+        double default_value
+        double upper_bound
+
+    cdef cppclass OptionRecordString(OptionRecord):
+        string* value
+        string default_value
+
+    cdef cppclass HighsOptions:
+        # From HighsOptionsStruct:
+
+        # Options read from the command line
+        string model_file
+        string presolve
+        string solver
+        string parallel
+        double time_limit
+        string options_file
+
+        # Options read from the file
+        double infinite_cost
+        double infinite_bound
+        double small_matrix_value
+        double large_matrix_value
+        double primal_feasibility_tolerance
+        double dual_feasibility_tolerance
+        double ipm_optimality_tolerance
+        double dual_objective_value_upper_bound
+        int highs_debug_level
+        int simplex_strategy
+        int simplex_scale_strategy
+        int simplex_crash_strategy
+        int simplex_dual_edge_weight_strategy
+        int simplex_primal_edge_weight_strategy
+        int simplex_iteration_limit
+        int simplex_update_limit
+        int ipm_iteration_limit
+        int highs_min_threads
+        int highs_max_threads
+        int message_level
+        string solution_file
+        bool write_solution_to_file
+        bool write_solution_pretty
+
+        # Advanced options
+        bool run_crossover
+        bool mps_parser_type_free
+        int keep_n_rows
+        int allowed_simplex_matrix_scale_factor
+        int allowed_simplex_cost_scale_factor
+        int simplex_dualise_strategy
+        int simplex_permute_strategy
+        int dual_simplex_cleanup_strategy
+        int simplex_price_strategy
+        int dual_chuzc_sort_strategy
+        bool simplex_initial_condition_check
+        double simplex_initial_condition_tolerance
+        double dual_steepest_edge_weight_log_error_threshhold
+        double dual_simplex_cost_perturbation_multiplier
+        double start_crossover_tolerance
+        bool less_infeasible_DSE_check
+        bool less_infeasible_DSE_choose_row
+        bool use_original_HFactor_logic
+
+        # Options for MIP solver
+        int mip_max_nodes
+        int mip_report_level
+
+        # Switch for MIP solver
+        bool mip
+
+        # Options for HighsPrintMessage and HighsLogMessage
+        FILE* logfile
+        FILE* output
+        int message_level
+        string solution_file
+        bool write_solution_to_file
+        bool write_solution_pretty
+
+        vector[OptionRecord*] records
@@ -0,0 +1,9 @@
+# cython: language_level=3
+
+from libcpp cimport bool
+
+from .HighsOptions cimport HighsOptions
+
+cdef extern from "HighsRuntimeOptions.h" nogil:
+    # From HiGHS/src/lp_data/HighsRuntimeOptions.h
+    bool loadOptions(int argc, char** argv, HighsOptions& options)
@@ -0,0 +1,12 @@
+# cython: language_level=3
+
+from libcpp.string cimport string
+
+cdef extern from "HighsStatus.h" nogil:
+    ctypedef enum HighsStatus:
+        HighsStatusError "HighsStatus::kError" = -1
+        HighsStatusOK "HighsStatus::kOk" = 0
+        HighsStatusWarning "HighsStatus::kWarning" = 1
+
+
+    string highsStatusToString(HighsStatus status)
@@ -0,0 +1,95 @@
+# cython: language_level=3
+
+from libcpp cimport bool
+
+cdef extern from "SimplexConst.h" nogil:
+
+    cdef enum SimplexAlgorithm:
+        PRIMAL "SimplexAlgorithm::kPrimal" = 0
+        DUAL "SimplexAlgorithm::kDual"
+
+    cdef enum SimplexStrategy:
+        SIMPLEX_STRATEGY_MIN "SimplexStrategy::kSimplexStrategyMin" = 0
+        SIMPLEX_STRATEGY_CHOOSE "SimplexStrategy::kSimplexStrategyChoose" = SIMPLEX_STRATEGY_MIN
+        SIMPLEX_STRATEGY_DUAL "SimplexStrategy::kSimplexStrategyDual"
+        SIMPLEX_STRATEGY_DUAL_PLAIN "SimplexStrategy::kSimplexStrategyDualPlain" = SIMPLEX_STRATEGY_DUAL
+        SIMPLEX_STRATEGY_DUAL_TASKS "SimplexStrategy::kSimplexStrategyDualTasks"
+        SIMPLEX_STRATEGY_DUAL_MULTI "SimplexStrategy::kSimplexStrategyDualMulti"
+        SIMPLEX_STRATEGY_PRIMAL "SimplexStrategy::kSimplexStrategyPrimal"
+        SIMPLEX_STRATEGY_MAX "SimplexStrategy::kSimplexStrategyMax" = SIMPLEX_STRATEGY_PRIMAL
+        SIMPLEX_STRATEGY_NUM "SimplexStrategy::kSimplexStrategyNum"
+
+    cdef enum SimplexCrashStrategy:
+        SIMPLEX_CRASH_STRATEGY_MIN "SimplexCrashStrategy::kSimplexCrashStrategyMin" = 0
+        SIMPLEX_CRASH_STRATEGY_OFF "SimplexCrashStrategy::kSimplexCrashStrategyOff" = SIMPLEX_CRASH_STRATEGY_MIN
+        SIMPLEX_CRASH_STRATEGY_LTSSF_K "SimplexCrashStrategy::kSimplexCrashStrategyLtssfK"
+        SIMPLEX_CRASH_STRATEGY_LTSSF "SimplexCrashStrategy::kSimplexCrashStrategyLtssf" = SIMPLEX_CRASH_STRATEGY_LTSSF_K
+        SIMPLEX_CRASH_STRATEGY_BIXBY "SimplexCrashStrategy::kSimplexCrashStrategyBixby"
+        SIMPLEX_CRASH_STRATEGY_LTSSF_PRI "SimplexCrashStrategy::kSimplexCrashStrategyLtssfPri"
+        SIMPLEX_CRASH_STRATEGY_LTSF_K "SimplexCrashStrategy::kSimplexCrashStrategyLtsfK"
+        SIMPLEX_CRASH_STRATEGY_LTSF_PRI "SimplexCrashStrategy::kSimplexCrashStrategyLtsfPri"
+        SIMPLEX_CRASH_STRATEGY_LTSF "SimplexCrashStrategy::kSimplexCrashStrategyLtsf"
+        SIMPLEX_CRASH_STRATEGY_BIXBY_NO_NONZERO_COL_COSTS "SimplexCrashStrategy::kSimplexCrashStrategyBixbyNoNonzeroColCosts"
+        SIMPLEX_CRASH_STRATEGY_BASIC "SimplexCrashStrategy::kSimplexCrashStrategyBasic"
+        SIMPLEX_CRASH_STRATEGY_TEST_SING "SimplexCrashStrategy::kSimplexCrashStrategyTestSing"
+        SIMPLEX_CRASH_STRATEGY_MAX "SimplexCrashStrategy::kSimplexCrashStrategyMax" = SIMPLEX_CRASH_STRATEGY_TEST_SING
+
+    cdef enum SimplexEdgeWeightStrategy:
+        SIMPLEX_EDGE_WEIGHT_STRATEGY_MIN "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyMin" = -1
+        SIMPLEX_EDGE_WEIGHT_STRATEGY_CHOOSE "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyChoose" = SIMPLEX_EDGE_WEIGHT_STRATEGY_MIN
+        SIMPLEX_EDGE_WEIGHT_STRATEGY_DANTZIG "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyDantzig"
+        SIMPLEX_EDGE_WEIGHT_STRATEGY_DEVEX "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyDevex"
+        SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategySteepestEdge"
+        SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE_UNIT_INITIAL "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategySteepestEdgeUnitInitial"
+        SIMPLEX_EDGE_WEIGHT_STRATEGY_MAX "SimplexEdgeWeightStrategy::kSimplexEdgeWeightStrategyMax" = SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE_UNIT_INITIAL
+
+    cdef enum SimplexPriceStrategy:
+        SIMPLEX_PRICE_STRATEGY_MIN = 0
+        SIMPLEX_PRICE_STRATEGY_COL = SIMPLEX_PRICE_STRATEGY_MIN
+        SIMPLEX_PRICE_STRATEGY_ROW
+        SIMPLEX_PRICE_STRATEGY_ROW_SWITCH
+        SIMPLEX_PRICE_STRATEGY_ROW_SWITCH_COL_SWITCH
+        SIMPLEX_PRICE_STRATEGY_MAX = SIMPLEX_PRICE_STRATEGY_ROW_SWITCH_COL_SWITCH
+
+    cdef enum SimplexDualChuzcStrategy:
+        SIMPLEX_DUAL_CHUZC_STRATEGY_MIN = 0
+        SIMPLEX_DUAL_CHUZC_STRATEGY_CHOOSE = SIMPLEX_DUAL_CHUZC_STRATEGY_MIN
+        SIMPLEX_DUAL_CHUZC_STRATEGY_QUAD
+        SIMPLEX_DUAL_CHUZC_STRATEGY_HEAP
+        SIMPLEX_DUAL_CHUZC_STRATEGY_BOTH
+        SIMPLEX_DUAL_CHUZC_STRATEGY_MAX = SIMPLEX_DUAL_CHUZC_STRATEGY_BOTH
+
+    cdef enum InvertHint:
+        INVERT_HINT_NO = 0
+        INVERT_HINT_UPDATE_LIMIT_REACHED
+        INVERT_HINT_SYNTHETIC_CLOCK_SAYS_INVERT
+        INVERT_HINT_POSSIBLY_OPTIMAL
+        INVERT_HINT_POSSIBLY_PRIMAL_UNBOUNDED
+        INVERT_HINT_POSSIBLY_DUAL_UNBOUNDED
+        INVERT_HINT_POSSIBLY_SINGULAR_BASIS
+        INVERT_HINT_PRIMAL_INFEASIBLE_IN_PRIMAL_SIMPLEX
+        INVERT_HINT_CHOOSE_COLUMN_FAIL
+        INVERT_HINT_Count
+
+    cdef enum DualEdgeWeightMode:
+        DANTZIG "DualEdgeWeightMode::DANTZIG" = 0
+        DEVEX "DualEdgeWeightMode::DEVEX"
+        STEEPEST_EDGE "DualEdgeWeightMode::STEEPEST_EDGE"
+        Count "DualEdgeWeightMode::Count"
+
+    cdef enum PriceMode:
+        ROW "PriceMode::ROW" = 0
+        COL "PriceMode::COL"
+
+    const int PARALLEL_THREADS_DEFAULT
+    const int DUAL_TASKS_MIN_THREADS
+    const int DUAL_MULTI_MIN_THREADS
+
+    const bool invert_if_row_out_negative
+
+    const int NONBASIC_FLAG_TRUE
+    const int NONBASIC_FLAG_FALSE
+
+    const int NONBASIC_MOVE_UP
+    const int NONBASIC_MOVE_DN
+    const int NONBASIC_MOVE_ZE
@@ -0,0 +1,7 @@
+# cython: language_level=3
+
+cdef extern from "highs_c_api.h" nogil:
+    int Highs_passLp(void* highs, int numcol, int numrow, int numnz,
+                     double* colcost, double* collower, double* colupper,
+                     double* rowlower, double* rowupper,
+                     int* astart, int* aindex,  double* avalue)
@@ -0,0 +1,158 @@
+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+import numpy as np
+
+from ._optimize import OptimizeResult
+from ._pava_pybind import pava
+
+if TYPE_CHECKING:
+    import numpy.typing as npt
+
+
+__all__ = ["isotonic_regression"]
+
+
+def isotonic_regression(
+    y: npt.ArrayLike,
+    *,
+    weights: npt.ArrayLike | None = None,
+    increasing: bool = True,
+) -> OptimizeResult:
+    r"""Nonparametric isotonic regression.
+
+    A (not strictly) monotonically increasing array `x` with the same length
+    as `y` is calculated by the pool adjacent violators algorithm (PAVA), see
+    [1]_. See the Notes section for more details.
+
+    Parameters
+    ----------
+    y : (N,) array_like
+        Response variable.
+    weights : (N,) array_like or None
+        Case weights.
+    increasing : bool
+        If True, fit monotonic increasing, i.e. isotonic, regression.
+        If False, fit a monotonic decreasing, i.e. antitonic, regression.
+        Default is True.
+
+    Returns
+    -------
+    res : OptimizeResult
+        The optimization result represented as an ``OptimizeResult`` object.
+        Important attributes are:
+
+        - ``x``: The isotonic regression solution, i.e. an increasing (or
+          decreasing) array of the same length as y, with elements in the
+          range from min(y) to max(y).
+        - ``weights``: Array of length B with the sum of case weights for
+          each of the B blocks (or pools).
+        - ``blocks``: Array of length B+1 with the indices of the start
+          positions of each block (or pool), followed by N. The j-th block is
+          ``x[blocks[j]:blocks[j+1]]``, for which all values are the same.
+
+    Notes
+    -----
+    Given data :math:`y` and case weights :math:`w`, the isotonic regression
+    solves the following optimization problem:
+
+    .. math::
+
+        \operatorname{argmin}_{x_i} \sum_i w_i (y_i - x_i)^2 \quad
+        \text{subject to } x_i \leq x_j \text{ whenever } i \leq j \,.
+
+    For every input value :math:`y_i`, it generates a value :math:`x_i` such
+    that :math:`x` is increasing (but not strictly), i.e.
+    :math:`x_i \leq x_{i+1}`. This is accomplished by the PAVA.
+    The solution consists of pools or blocks, i.e. neighboring elements of
+    :math:`x`, e.g. :math:`x_i` and :math:`x_{i+1}`, that all have the same
+    value.
+
+    Most interestingly, the solution stays the same if the squared loss is
+    replaced by the wide class of Bregman functions which are the unique
+    class of strictly consistent scoring functions for the mean, see [2]_
+    and references therein.
+
+    The implemented version of PAVA according to [1]_ has a computational
+    complexity of O(N) with input size N.
+
+    References
+    ----------
+    .. [1] Busing, F. M. T. A. (2022).
+           Monotone Regression: A Simple and Fast O(n) PAVA Implementation.
+           Journal of Statistical Software, Code Snippets, 102(1), 1-25.
+           :doi:`10.18637/jss.v102.c01`
+    .. [2] Jordan, A.I., Mühlemann, A. & Ziegel, J.F.
+           Characterizing the optimal solutions to the isotonic regression
+           problem for identifiable functionals.
+           Ann Inst Stat Math 74, 489-514 (2022).
+           :doi:`10.1007/s10463-021-00808-0`
+
+    Examples
+    --------
+    This example demonstrates that ``isotonic_regression`` really solves a
+    constrained optimization problem.
+
+    >>> import numpy as np
+    >>> from scipy.optimize import isotonic_regression, minimize
+    >>> y = [1.5, 1.0, 4.0, 6.0, 5.7, 5.0, 7.8, 9.0, 7.5, 9.5, 9.0]
+    >>> def objective(yhat, y):
+    ...     return np.sum((yhat - y)**2)
+    >>> def constraint(yhat, y):
+    ...     # This is for a monotonically increasing regression.
+    ...     return np.diff(yhat)
+    >>> result = minimize(objective, x0=y, args=(y,),
+    ...                   constraints=[{'type': 'ineq',
+    ...                                 'fun': lambda x: constraint(x, y)}])
+    >>> result.x
+    array([1.25      , 1.25      , 4.        , 5.56666667, 5.56666667,
+           5.56666667, 7.8       , 8.25      , 8.25      , 9.25      ,
+           9.25      ])
+    >>> result = isotonic_regression(y)
+    >>> result.x
+    array([1.25      , 1.25      , 4.        , 5.56666667, 5.56666667,
+           5.56666667, 7.8       , 8.25      , 8.25      , 9.25      ,
+           9.25      ])
+
+    The big advantage of ``isotonic_regression`` compared to calling
+    ``minimize`` is that it is more user friendly, i.e. one does not need to
+    define objective and constraint functions, and that it is orders of
+    magnitude faster. On commodity hardware (in 2023), for normally distributed
+    input y of length 1000, the minimizer takes about 4 seconds, while
+    ``isotonic_regression`` takes about 200 microseconds.
+    """
+    yarr = np.asarray(y)  # Check yarr.ndim == 1 is implicit (pybind11) in pava.
+    if weights is None:
+        warr = np.ones_like(yarr)
+    else:
+        warr = np.asarray(weights)
+
+        if not (yarr.ndim == warr.ndim == 1 and yarr.shape[0] == warr.shape[0]):
+            raise ValueError(
+                "Input arrays y and w must have one dimension of equal length."
+            )
+        if np.any(warr <= 0):
+            raise ValueError("Weights w must be strictly positive.")
+
+    order = slice(None) if increasing else slice(None, None, -1)
+    x = np.array(yarr[order], order="C", dtype=np.float64, copy=True)
+    wx = np.array(warr[order], order="C", dtype=np.float64, copy=True)
+    n = x.shape[0]
+    r = np.full(shape=n + 1, fill_value=-1, dtype=np.intp)
+    x, wx, r, b = pava(x, wx, r)
+    # Now that we know the number of blocks b, we only keep the relevant part
+    # of r and wx.
+    # Note: Due to the pava implementation, after the last block
+    # index, there might be smaller numbers appended to r, e.g.
+    # r = [0, 10, 8, 7] which in the end should be r = [0, 10].
+    r = r[:b + 1]
+    wx = wx[:b]
+    if not increasing:
+        x = x[::-1]
+        wx = wx[::-1]
+        r = r[-1] - r[::-1]
+    return OptimizeResult(
+        x=x,
+        weights=wx,
+        blocks=r,
+    )
@@ -0,0 +1,543 @@
+"""
+Functions
+---------
+.. autosummary::
+   :toctree: generated/
+
+    fmin_l_bfgs_b
+
+"""
+
+## License for the Python wrapper
+## ==============================
+
+## Copyright (c) 2004 David M. Cooke <cookedm@physics.mcmaster.ca>
+
+## Permission is hereby granted, free of charge, to any person obtaining a
+## copy of this software and associated documentation files (the "Software"),
+## to deal in the Software without restriction, including without limitation
+## the rights to use, copy, modify, merge, publish, distribute, sublicense,
+## and/or sell copies of the Software, and to permit persons to whom the
+## Software is furnished to do so, subject to the following conditions:
+
+## The above copyright notice and this permission notice shall be included in
+## all copies or substantial portions of the Software.
+
+## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+## DEALINGS IN THE SOFTWARE.
+
+## Modifications by Travis Oliphant and Enthought, Inc. for inclusion in SciPy
+
+import numpy as np
+from numpy import array, asarray, float64, zeros
+from . import _lbfgsb
+from ._optimize import (MemoizeJac, OptimizeResult, _call_callback_maybe_halt,
+                        _wrap_callback, _check_unknown_options,
+                        _prepare_scalar_function)
+from ._constraints import old_bound_to_new
+
+from scipy.sparse.linalg import LinearOperator
+
+__all__ = ['fmin_l_bfgs_b', 'LbfgsInvHessProduct']
+
+
+def fmin_l_bfgs_b(func, x0, fprime=None, args=(),
+                  approx_grad=0,
+                  bounds=None, m=10, factr=1e7, pgtol=1e-5,
+                  epsilon=1e-8,
+                  iprint=-1, maxfun=15000, maxiter=15000, disp=None,
+                  callback=None, maxls=20):
+    """
+    Minimize a function func using the L-BFGS-B algorithm.
+
+    Parameters
+    ----------
+    func : callable f(x,*args)
+        Function to minimize.
+    x0 : ndarray
+        Initial guess.
+    fprime : callable fprime(x,*args), optional
+        The gradient of `func`. If None, then `func` returns the function
+        value and the gradient (``f, g = func(x, *args)``), unless
+        `approx_grad` is True in which case `func` returns only ``f``.
+    args : sequence, optional
+        Arguments to pass to `func` and `fprime`.
+    approx_grad : bool, optional
+        Whether to approximate the gradient numerically (in which case
+        `func` returns only the function value).
+    bounds : list, optional
+        ``(min, max)`` pairs for each element in ``x``, defining
+        the bounds on that parameter. Use None or +-inf for one of ``min`` or
+        ``max`` when there is no bound in that direction.
+    m : int, optional
+        The maximum number of variable metric corrections
+        used to define the limited memory matrix. (The limited memory BFGS
+        method does not store the full hessian but uses this many terms in an
+        approximation to it.)
+    factr : float, optional
+        The iteration stops when
+        ``(f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps``,
+        where ``eps`` is the machine precision, which is automatically
+        generated by the code. Typical values for `factr` are: 1e12 for
+        low accuracy; 1e7 for moderate accuracy; 10.0 for extremely
+        high accuracy. See Notes for relationship to `ftol`, which is exposed
+        (instead of `factr`) by the `scipy.optimize.minimize` interface to
+        L-BFGS-B.
+    pgtol : float, optional
+        The iteration will stop when
+        ``max{|proj g_i | i = 1, ..., n} <= pgtol``
+        where ``proj g_i`` is the i-th component of the projected gradient.
+    epsilon : float, optional
+        Step size used when `approx_grad` is True, for numerically
+        calculating the gradient.
+    iprint : int, optional
+        Controls the frequency of output. ``iprint < 0`` means no output;
+        ``iprint = 0``    print only one line at the last iteration;
+        ``0 < iprint < 99`` print also f and ``|proj g|`` every iprint iterations;
+        ``iprint = 99``   print details of every iteration except n-vectors;
+        ``iprint = 100``  print also the changes of active set and final x;
+        ``iprint > 100``  print details of every iteration including x and g.
+    disp : int, optional
+        If zero, then no output. If a positive number, then this overrides
+        `iprint` (i.e., `iprint` gets the value of `disp`).
+    maxfun : int, optional
+        Maximum number of function evaluations. Note that this function
+        may violate the limit because of evaluating gradients by numerical
+        differentiation.
+    maxiter : int, optional
+        Maximum number of iterations.
+    callback : callable, optional
+        Called after each iteration, as ``callback(xk)``, where ``xk`` is the
+        current parameter vector.
+    maxls : int, optional
+        Maximum number of line search steps (per iteration). Default is 20.
+
+    Returns
+    -------
+    x : array_like
+        Estimated position of the minimum.
+    f : float
+        Value of `func` at the minimum.
+    d : dict
+        Information dictionary.
+
+        * d['warnflag'] is
+
+          - 0 if converged,
+          - 1 if too many function evaluations or too many iterations,
+          - 2 if stopped for another reason, given in d['task']
+
+        * d['grad'] is the gradient at the minimum (close to 0 unless a bound is active)
+        * d['funcalls'] is the number of function calls made.
+        * d['nit'] is the number of iterations.
+
+    See also
+    --------
+    minimize: Interface to minimization algorithms for multivariate
+        functions. See the 'L-BFGS-B' `method` in particular. Note that the
+        `ftol` option is made available via that interface, while `factr` is
+        provided via this interface, where `factr` is the factor multiplying
+        the default machine floating-point precision to arrive at `ftol`:
+        ``ftol = factr * numpy.finfo(float).eps``.
+
+    Notes
+    -----
+    License of L-BFGS-B (FORTRAN code):
+
+    The version included here (in Fortran code) is 3.0
+    (released April 25, 2011). It was written by Ciyou Zhu, Richard Byrd,
+    and Jorge Nocedal <nocedal@ece.nwu.edu>. It carries the following
+    condition for use:
+
+    This software is freely available, but we expect that all publications
+    describing work using this software, or all commercial products using it,
+    quote at least one of the references given below. This software is released
+    under the BSD License.
+
+    References
+    ----------
+    * R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound
+      Constrained Optimization, (1995), SIAM Journal on Scientific and
+      Statistical Computing, 16, 5, pp. 1190-1208.
+    * C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B,
+      FORTRAN routines for large scale bound constrained optimization (1997),
+      ACM Transactions on Mathematical Software, 23, 4, pp. 550 - 560.
+    * J.L. Morales and J. Nocedal. L-BFGS-B: Remark on Algorithm 778: L-BFGS-B,
+      FORTRAN routines for large scale bound constrained optimization (2011),
+      ACM Transactions on Mathematical Software, 38, 1.
+
+    Examples
+    --------
+    Solve a linear regression problem via `fmin_l_bfgs_b`. To do this, first we
+    define an objective function ``f(m, b) = sum((y - y_model)**2)``, where `y`
+    describes the observations and `y_model` the prediction of the linear model
+    as ``y_model = m*x + b``. The bounds for the parameters, ``m`` and ``b``, are
+    arbitrarily chosen as ``(0,5)`` and ``(5,10)`` for this example.
+
+    >>> import numpy as np
+    >>> from scipy.optimize import fmin_l_bfgs_b
+    >>> X = np.arange(0, 10, 1)
+    >>> M = 2
+    >>> B = 3
+    >>> Y = M * X + B
+    >>> def func(parameters, *args):
+    ...     x = args[0]
+    ...     y = args[1]
+    ...     m, b = parameters
+    ...     y_model = m*x + b
+    ...     error = sum(np.power((y - y_model), 2))
+    ...     return error
+
+    >>> initial_values = np.array([0.0, 1.0])
+
+    >>> x_opt, f_opt, info = fmin_l_bfgs_b(func, x0=initial_values, args=(X, Y),
+    ...                                    approx_grad=True)
+    >>> x_opt, f_opt
+    array([1.99999999, 3.00000006]), 1.7746231151323805e-14  # may vary
+
+    The optimized parameters in ``x_opt`` agree with the ground truth parameters
+    ``M`` and ``B``. Next, let us perform a bound-constrained optimization using
+    the `bounds` parameter.
+
+    >>> bounds = [(0, 5), (5, 10)]
+    >>> x_opt, f_opt, info = fmin_l_bfgs_b(func, x0=initial_values, args=(X, Y),
+    ...                                    approx_grad=True, bounds=bounds)
+    >>> x_opt, f_opt
+    array([1.65990508, 5.31649385]), 15.721334516453945  # may vary
+    """
+    # handle fprime/approx_grad
+    if approx_grad:
+        fun = func
+        jac = None
+    elif fprime is None:
+        fun = MemoizeJac(func)
+        jac = fun.derivative
+    else:
+        fun = func
+        jac = fprime
+
+    # build options
+    callback = _wrap_callback(callback)
+    opts = {'disp': disp,
+            'iprint': iprint,
+            'maxcor': m,
+            'ftol': factr * np.finfo(float).eps,
+            'gtol': pgtol,
+            'eps': epsilon,
+            'maxfun': maxfun,
+            'maxiter': maxiter,
+            'callback': callback,
+            'maxls': maxls}
+
+    res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
+                           **opts)
+    d = {'grad': res['jac'],
+         'task': res['message'],
+         'funcalls': res['nfev'],
+         'nit': res['nit'],
+         'warnflag': res['status']}
+    f = res['fun']
+    x = res['x']
+
+    return x, f, d
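
The `fprime is None` branch above expects `func` to return both the function value and the gradient. This is a minimal sketch under that assumption, reusing the docstring's linear-regression data. `func_and_grad` is an illustrative name, and the comparison with `minimize` assumes its L-BFGS-B options are passed through to the same solver with the `ftol = factr * eps` conversion shown in the wrapper.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b, minimize

X = np.arange(0, 10, 1.0)
Y = 2.0 * X + 3.0  # ground truth: m = 2, b = 3


def func_and_grad(params, x, y):
    """Return the squared error and its analytic gradient w.r.t. (m, b)."""
    m, b = params
    resid = m * x + b - y
    f = np.sum(resid**2)
    g = np.array([2.0 * np.sum(resid * x), 2.0 * np.sum(resid)])
    return f, g


x0 = np.array([0.0, 1.0])

# No fprime and approx_grad left at its default: func supplies (f, g).
x_opt, f_opt, info = fmin_l_bfgs_b(func_and_grad, x0=x0, args=(X, Y))

# Same problem through minimize; factr=1e7 corresponds to ftol = 1e7 * eps.
factr = 1e7
res = minimize(func_and_grad, x0, args=(X, Y), jac=True, method="L-BFGS-B",
               options={"ftol": factr * np.finfo(float).eps, "gtol": 1e-5})

print(x_opt, info["warnflag"])
print(res.x)
```

Supplying the gradient analytically avoids the extra function evaluations that `approx_grad=True` spends on finite differences, which is why `maxfun` can be exceeded in the approximated case.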
+
+
+def _minimize_lbfgsb(fun, x0, args=(), jac=None, bounds=None,
+                     disp=None, maxcor=10, ftol=2.2204460492503131e-09,
+                     gtol=1e-5, eps=1e-8, maxfun=15000, maxiter=15000,
+                     iprint=-1, callback=None, maxls=20,
+                     finite_diff_rel_step=None, **unknown_options):
+    """
+    Minimize a scalar function of one or more variables using the L-BFGS-B
+    algorithm.
+
+    Options
+    -------
+    disp : None or int
+        If `disp is None` (the default), the supplied `iprint` is used.
+        Otherwise `disp` overrides `iprint`: ``disp == 0`` gives no output
+        (``iprint = -1``) and any other value is used as `iprint`.
+    maxcor : int
+        The maximum number of variable metric corrections used to
+        define the limited memory matrix. (The limited memory BFGS
+        method does not store the full hessian but uses this many terms
+        in an approximation to it.)
+    ftol : float
+        The iteration stops when ``(f^k -
+        f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= ftol``.
+    gtol : float
+        The iteration will stop when ``max{|proj g_i | i = 1, ..., n}
+        <= gtol`` where ``proj g_i`` is the i-th component of the
+        projected gradient.
+    eps : float or ndarray
+        If `jac is None` the absolute step size used for numerical
+        approximation of the jacobian via forward differences.
+    maxfun : int
+        Maximum number of function evaluations. Note that this function
+        may violate the limit because of evaluating gradients by numerical
+        differentiation.
+    maxiter : int
+        Maximum number of iterations.
+    iprint : int, optional
+        Controls the frequency of output. ``iprint < 0`` means no output;
+        ``iprint = 0``    print only one line at the last iteration;
+        ``0 < iprint < 99`` print also f and ``|proj g|`` every iprint iterations;
+        ``iprint = 99``   print details of every iteration except n-vectors;
+        ``iprint = 100``  print also the changes of active set and final x;
+        ``iprint > 100``  print details of every iteration including x and g.
+    maxls : int, optional
+        Maximum number of line search steps (per iteration). Default is 20.
+    finite_diff_rel_step : None or array_like, optional
+        If `jac in ['2-point', '3-point', 'cs']` the relative step size to
+        use for numerical approximation of the jacobian. The absolute step
+        size is computed as ``h = rel_step * sign(x) * max(1, abs(x))``,
+        possibly adjusted to fit into the bounds. For ``method='3-point'``
+        the sign of `h` is ignored. If None (default) then step is selected
+        automatically.
+
+    Notes
+    -----
+    The option `ftol` is exposed via the `scipy.optimize.minimize` interface,
+    but calling `scipy.optimize.fmin_l_bfgs_b` directly exposes `factr`. The
+    relationship between the two is ``ftol = factr * numpy.finfo(float).eps``.
+    I.e., `factr` multiplies the default machine floating-point precision to
+    arrive at `ftol`.
+
+    """
+    _check_unknown_options(unknown_options)
+    m = maxcor
+    pgtol = gtol
+    factr = ftol / np.finfo(float).eps
+
+    x0 = asarray(x0).ravel()
+    n, = x0.shape
+
+    # Historically, lbfgsb expected old-style bounds, and callers may still
+    # pass them. Convert to new-style bounds here and use those from now on,
+    # since they are easier to work with.
+    if bounds is None:
+        pass
+    elif len(bounds) != n:
+        raise ValueError('length of x0 != length of bounds')
+    else:
+        bounds = np.array(old_bound_to_new(bounds))
+
+        # check bounds
+        if (bounds[0] > bounds[1]).any():
+            raise ValueError(
+                "LBFGSB - one of the lower bounds is greater than an upper bound."
+            )
+
+        # initial vector must lie within the bounds. Otherwise ScalarFunction and
+        # approx_derivative will cause problems
+        x0 = np.clip(x0, bounds[0], bounds[1])
+
+    if disp is not None:
+        if disp == 0:
+            iprint = -1
+        else:
+            iprint = disp
+
+    # _prepare_scalar_function can use bounds=None to represent no bounds
+    sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
+                                  bounds=bounds,
+                                  finite_diff_rel_step=finite_diff_rel_step)
+
+    func_and_grad = sf.fun_and_grad
+
+    fortran_int = _lbfgsb.types.intvar.dtype
+
+    nbd = zeros(n, fortran_int)
+    low_bnd = zeros(n, float64)
+    upper_bnd = zeros(n, float64)
+    bounds_map = {(-np.inf, np.inf): 0,
+                  (1, np.inf): 1,
+                  (1, 1): 2,
+                  (-np.inf, 1): 3}
+
+    if bounds is not None:
+        for i in range(0, n):
+            l, u = bounds[0, i], bounds[1, i]
+            if not np.isinf(l):
+                low_bnd[i] = l
+                l = 1
+            if not np.isinf(u):
+                upper_bnd[i] = u
+                u = 1
+            nbd[i] = bounds_map[l, u]
+
+    if not maxls > 0:
+        raise ValueError('maxls must be positive.')
+
+    x = array(x0, float64)
+    f = array(0.0, float64)
+    g = zeros((n,), float64)
+    wa = zeros(2*m*n + 5*n + 11*m*m + 8*m, float64)
+    iwa = zeros(3*n, fortran_int)
+    task = zeros(1, 'S60')
+    csave = zeros(1, 'S60')
+    lsave = zeros(4, fortran_int)
+    isave = zeros(44, fortran_int)
+    dsave = zeros(29, float64)
+
+    task[:] = 'START'
+
+    n_iterations = 0
+
+    while 1:
+        # g may become float32 if a user provides a function that calculates
+        # the Jacobian in float32 (see gh-18730). The underlying Fortran code
+        # expects float64, so upcast it
+        g = g.astype(np.float64)
+        # x, f, g, wa, iwa, task, csave, lsave, isave, dsave = \
+        _lbfgsb.setulb(m, x, low_bnd, upper_bnd, nbd, f, g, factr,
+                       pgtol, wa, iwa, task, iprint, csave, lsave,
+                       isave, dsave, maxls)
+        task_str = task.tobytes()
+        if task_str.startswith(b'FG'):
+            # The minimization routine wants f and g at the current x.
+            # Note that interruptions due to maxfun are postponed
+            # until the completion of the current minimization iteration.
+            # Overwrite f and g:
+            f, g = func_and_grad(x)
+        elif task_str.startswith(b'NEW_X'):
+            # new iteration
+            n_iterations += 1
+
+            intermediate_result = OptimizeResult(x=x, fun=f)
+            if _call_callback_maybe_halt(callback, intermediate_result):
+                task[:] = 'STOP: CALLBACK REQUESTED HALT'
+            if n_iterations >= maxiter:
+                task[:] = 'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'
+            elif sf.nfev > maxfun:
+                task[:] = ('STOP: TOTAL NO. of f AND g EVALUATIONS '
+                           'EXCEEDS LIMIT')
+        else:
+            break
+
+    task_str = task.tobytes().strip(b'\x00').strip()
+    if task_str.startswith(b'CONV'):
+        warnflag = 0
+    elif sf.nfev > maxfun or n_iterations >= maxiter:
+        warnflag = 1
+    else:
+        warnflag = 2
+
+    # These two portions of the workspace are described in the mainlb
+    # subroutine in lbfgsb.f. See line 363.
+    s = wa[0: m*n].reshape(m, n)
+    y = wa[m*n: 2*m*n].reshape(m, n)
+
+    # See lbfgsb.f line 160 for this portion of the workspace.
+    # isave(31) = the total number of BFGS updates prior to the current iteration;
+    n_bfgs_updates = isave[30]
+
+    n_corrs = min(n_bfgs_updates, maxcor)
+    hess_inv = LbfgsInvHessProduct(s[:n_corrs], y[:n_corrs])
+
+    task_str = task_str.decode()
+    return OptimizeResult(fun=f, jac=g, nfev=sf.nfev,
+                          njev=sf.ngev,
+                          nit=n_iterations, status=warnflag, message=task_str,
+                          x=x, success=(warnflag == 0), hess_inv=hess_inv)
+
+
+class LbfgsInvHessProduct(LinearOperator):
+    """Linear operator for the L-BFGS approximate inverse Hessian.
+
+    This operator computes the product of a vector with the approximate inverse
+    of the Hessian of the objective function, using the L-BFGS limited
+    memory approximation to the inverse Hessian, accumulated during the
+    optimization.
+
+    Objects of this class implement the ``scipy.sparse.linalg.LinearOperator``
+    interface.
+
+    Parameters
+    ----------
+    sk : array_like, shape=(n_corr, n)
+        Array of `n_corr` most recent updates to the solution vector.
+        (See [1]).
+    yk : array_like, shape=(n_corr, n)
+        Array of `n_corr` most recent updates to the gradient. (See [1]).
+
+    References
+    ----------
+    .. [1] Nocedal, Jorge. "Updating quasi-Newton matrices with limited
+       storage." Mathematics of computation 35.151 (1980): 773-782.
+
+    """
+
+    def __init__(self, sk, yk):
+        """Construct the operator."""
+        if sk.shape != yk.shape or sk.ndim != 2:
+            raise ValueError('sk and yk must have matching shape, (n_corrs, n)')
+        n_corrs, n = sk.shape
+
+        super().__init__(dtype=np.float64, shape=(n, n))
+
+        self.sk = sk
+        self.yk = yk
+        self.n_corrs = n_corrs
+        self.rho = 1 / np.einsum('ij,ij->i', sk, yk)
+
+    def _matvec(self, x):
+        """Efficient matrix-vector multiply with the BFGS matrices.
+
+        This calculation is described in Section (4) of [1].
+
+        Parameters
+        ----------
+        x : ndarray
+            An array with shape (n,) or (n,1).
+
+        Returns
+        -------
+        y : ndarray
+            The matrix-vector product
+
+        """
+        s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
+        q = np.array(x, dtype=self.dtype, copy=True)
+        if q.ndim == 2 and q.shape[1] == 1:
+            q = q.reshape(-1)
+
+        alpha = np.empty(n_corrs)
+
+        for i in range(n_corrs-1, -1, -1):
+            alpha[i] = rho[i] * np.dot(s[i], q)
+            q = q - alpha[i]*y[i]
+
+        r = q
+        for i in range(n_corrs):
+            beta = rho[i] * np.dot(y[i], r)
+            r = r + s[i] * (alpha[i] - beta)
+
+        return r
+
+    def todense(self):
+        """Return a dense array representation of this operator.
+
+        Returns
+        -------
+        arr : ndarray, shape=(n, n)
+            An array with the same shape and containing
+            the same data represented by this `LinearOperator`.
+
+        """
+        s, y, n_corrs, rho = self.sk, self.yk, self.n_corrs, self.rho
+        I = np.eye(*self.shape, dtype=self.dtype)
+        Hk = I
+
+        for i in range(n_corrs):
+            A1 = I - s[i][:, np.newaxis] * y[i][np.newaxis, :] * rho[i]
+            A2 = I - y[i][:, np.newaxis] * s[i][np.newaxis, :] * rho[i]
+
+            Hk = np.dot(A1, np.dot(Hk, A2)) + (rho[i] * s[i][:, np.newaxis] *
+                                                        s[i][np.newaxis, :])
+        return Hk
@@ -0,0 +1,897 @@
+"""
+Functions
+---------
+.. autosummary::
+   :toctree: generated/
+
+    line_search_armijo
+    line_search_wolfe1
+    line_search_wolfe2
+    scalar_search_wolfe1
+    scalar_search_wolfe2
+
+"""
+from warnings import warn
+
+from scipy.optimize import _minpack2 as minpack2    # noqa: F401
+from ._dcsrch import DCSRCH
+import numpy as np
+
+__all__ = ['LineSearchWarning', 'line_search_wolfe1', 'line_search_wolfe2',
+           'scalar_search_wolfe1', 'scalar_search_wolfe2',
+           'line_search_armijo']
+
+class LineSearchWarning(RuntimeWarning):
+    pass
+
+
+def _check_c1_c2(c1, c2):
+    if not (0 < c1 < c2 < 1):
+        raise ValueError("'c1' and 'c2' do not satisfy "
+                         "'0 < c1 < c2 < 1'.")
+
+
+#------------------------------------------------------------------------------
+# Minpack's Wolfe line and scalar searches
+#------------------------------------------------------------------------------
+
+def line_search_wolfe1(f, fprime, xk, pk, gfk=None,
+                       old_fval=None, old_old_fval=None,
+                       args=(), c1=1e-4, c2=0.9, amax=50, amin=1e-8,
+                       xtol=1e-14):
+    """
+    As `scalar_search_wolfe1`, but performs the line search along direction `pk`.
+
+    Parameters
+    ----------
+    f : callable
+        Function `f(x)`
+    fprime : callable
+        Gradient of `f`
+    xk : array_like
+        Current point
+    pk : array_like
+        Search direction
+    gfk : array_like, optional
+        Gradient of `f` at point `xk`
+    old_fval : float, optional
+        Value of `f` at point `xk`
+    old_old_fval : float, optional
+        Value of `f` at point preceding `xk`
+
+    The rest of the parameters are the same as for `scalar_search_wolfe1`.
+
+    Returns
+    -------
+    stp, f_count, g_count, fval, old_fval
+        As in `scalar_search_wolfe1`; the counts are calls to `f` and `fprime`
+    gval : array
+        Gradient of `f` at the final point
+
+    Notes
+    -----
+    Parameters `c1` and `c2` must satisfy ``0 < c1 < c2 < 1``.
+
+    """
+    if gfk is None:
+        gfk = fprime(xk, *args)
+
+    gval = [gfk]
+    gc = [0]
+    fc = [0]
+
+    def phi(s):
+        fc[0] += 1
+        return f(xk + s*pk, *args)
+
+    def derphi(s):
+        gval[0] = fprime(xk + s*pk, *args)
+        gc[0] += 1
+        return np.dot(gval[0], pk)
+
+    derphi0 = np.dot(gfk, pk)
+
+    stp, fval, old_fval = scalar_search_wolfe1(
+            phi, derphi, old_fval, old_old_fval, derphi0,
+            c1=c1, c2=c2, amax=amax, amin=amin, xtol=xtol)
+
+    return stp, fc[0], gc[0], fval, old_fval, gval[0]
+
+
+def scalar_search_wolfe1(phi, derphi, phi0=None, old_phi0=None, derphi0=None,
+                         c1=1e-4, c2=0.9,
+                         amax=50, amin=1e-8, xtol=1e-14):
+    """
+    Scalar function search for alpha that satisfies strong Wolfe conditions
+
+    The search assumes descent along alpha > 0, i.e. ``derphi(0) < 0``.
+
+    Parameters
+    ----------
+    phi : callable phi(alpha)
+        Function at point `alpha`
+    derphi : callable phi'(alpha)
+        Objective function derivative. Returns a scalar.
+    phi0 : float, optional
+        Value of phi at 0
+    old_phi0 : float, optional
+        Value of phi at previous point
+    derphi0 : float, optional
+        Value of derphi at 0
+    c1 : float, optional
+        Parameter for Armijo condition rule.
+    c2 : float, optional
+        Parameter for curvature condition rule.
+    amax, amin : float, optional
+        Maximum and minimum step size
+    xtol : float, optional
+        Relative tolerance for an acceptable step.
+
+    Returns
+    -------
+    alpha : float
+        Step size, or None if no suitable step was found
+    phi : float
+        Value of `phi` at the new point `alpha`
+    phi0 : float
+        Value of `phi` at `alpha=0`
+
+    Notes
+    -----
+    Uses routine DCSRCH from MINPACK.
+    
+    Parameters `c1` and `c2` must satisfy ``0 < c1 < c2 < 1`` as described in [1]_.
+
+    References
+    ----------
+    
+    .. [1] Nocedal, J., & Wright, S. J. (2006). Numerical optimization.
+       Springer Series in Operations Research and Financial
+       Engineering.
+       Springer Nature.
+
+    """
+    _check_c1_c2(c1, c2)
+
+    if phi0 is None:
+        phi0 = phi(0.)
+    if derphi0 is None:
+        derphi0 = derphi(0.)
+
+    if old_phi0 is not None and derphi0 != 0:
+        alpha1 = min(1.0, 1.01*2*(phi0 - old_phi0)/derphi0)
+        if alpha1 < 0:
+            alpha1 = 1.0
+    else:
+        alpha1 = 1.0
+
+    maxiter = 100
+
+    dcsrch = DCSRCH(phi, derphi, c1, c2, xtol, amin, amax)
+    stp, phi1, phi0, task = dcsrch(
+        alpha1, phi0=phi0, derphi0=derphi0, maxiter=maxiter
+    )
+
+    return stp, phi1, phi0
+
+
+line_search = line_search_wolfe1
+
+
+#------------------------------------------------------------------------------
+# Pure-Python Wolfe line and scalar searches
+#------------------------------------------------------------------------------
+
+# Note: `line_search_wolfe2` is the public `scipy.optimize.line_search`
+
+def line_search_wolfe2(f, myfprime, xk, pk, gfk=None, old_fval=None,
+                       old_old_fval=None, args=(), c1=1e-4, c2=0.9, amax=None,
+                       extra_condition=None, maxiter=10):
+    """Find alpha that satisfies strong Wolfe conditions.
+
+    Parameters
+    ----------
+    f : callable f(x,*args)
+        Objective function.
+    myfprime : callable f'(x,*args)
+        Objective function gradient.
+    xk : ndarray
+        Starting point.
+    pk : ndarray
+        Search direction. The search direction must be a descent direction
+        for the algorithm to converge.
+    gfk : ndarray, optional
+        Gradient value for x=xk (xk being the current parameter
+        estimate). Will be recomputed if omitted.
+    old_fval : float, optional
+        Function value for x=xk. Will be recomputed if omitted.
+    old_old_fval : float, optional
+        Function value for the point preceding x=xk.
+    args : tuple, optional
+        Additional arguments passed to objective function.
+    c1 : float, optional
+        Parameter for Armijo condition rule.
+    c2 : float, optional
+        Parameter for curvature condition rule.
+    amax : float, optional
+        Maximum step size
+    extra_condition : callable, optional
+        A callable of the form ``extra_condition(alpha, x, f, g)``
+        returning a boolean. Arguments are the proposed step ``alpha``
+        and the corresponding ``x``, ``f`` and ``g`` values. The line search
+        accepts the value of ``alpha`` only if this
+        callable returns ``True``. If the callable returns ``False``
+        for the step length, the algorithm will continue with
+        new iterates. The callable is only called for iterates
+        satisfying the strong Wolfe conditions.
+    maxiter : int, optional
+        Maximum number of iterations to perform.
+
+    Returns
+    -------
+    alpha : float or None
+        Alpha for which ``x_new = xk + alpha * pk``,
+        or None if the line search algorithm did not converge.
+    fc : int
+        Number of function evaluations made.
+    gc : int
+        Number of gradient evaluations made.
+    new_fval : float or None
+        New function value ``f(x_new)=f(xk+alpha*pk)``,
+        or None if the line search algorithm did not converge.
+    old_fval : float
+        Old function value ``f(xk)``.
+    new_slope : array_like or None
+        The gradient ``myfprime(x_new)`` at the new point, whose dot
+        product with `pk` is the local slope along the search direction,
+        or None if the line search algorithm did not converge.
+
+
+    Notes
+    -----
+    Uses the line search algorithm to enforce strong Wolfe
+    conditions. See Nocedal and Wright, 'Numerical Optimization',
+    1999, pp. 59-61.
+
+    The search direction `pk` must be a descent direction (e.g.
+    ``-myfprime(xk)``) to find a step length that satisfies the strong Wolfe
+    conditions. If the search direction is not a descent direction (e.g.
+    ``myfprime(xk)``), then `alpha`, `new_fval`, and `new_slope` will be None.
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from scipy.optimize import line_search
+
+    An objective function and its gradient are defined.
+
+    >>> def obj_func(x):
+    ...     return (x[0])**2+(x[1])**2
+    >>> def obj_grad(x):
+    ...     return [2*x[0], 2*x[1]]
+
+    We can find alpha that satisfies strong Wolfe conditions.
+
+    >>> start_point = np.array([1.8, 1.7])
+    >>> search_gradient = np.array([-1.0, -1.0])
+    >>> line_search(obj_func, obj_grad, start_point, search_gradient)
+    (1.0, 2, 1, 1.1300000000000001, 6.13, [1.6, 1.4])
+
+    """
+    fc = [0]
+    gc = [0]
+    gval = [None]
+    gval_alpha = [None]
+
+    def phi(alpha):
+        fc[0] += 1
+        return f(xk + alpha * pk, *args)
+
+    fprime = myfprime
+
+    def derphi(alpha):
+        gc[0] += 1
+        gval[0] = fprime(xk + alpha * pk, *args)  # store for later use
+        gval_alpha[0] = alpha
+        return np.dot(gval[0], pk)
+
+    if gfk is None:
+        gfk = fprime(xk, *args)
+    derphi0 = np.dot(gfk, pk)
+
+    if extra_condition is not None:
+        # Add the current gradient as argument, to avoid needless
+        # re-evaluation
+        def extra_condition2(alpha, phi):
+            if gval_alpha[0] != alpha:
+                derphi(alpha)
+            x = xk + alpha * pk
+            return extra_condition(alpha, x, phi, gval[0])
+    else:
+        extra_condition2 = None
+
+    alpha_star, phi_star, old_fval, derphi_star = scalar_search_wolfe2(
+            phi, derphi, old_fval, old_old_fval, derphi0, c1, c2, amax,
+            extra_condition2, maxiter=maxiter)
+
+    if derphi_star is None:
+        warn('The line search algorithm did not converge',
+             LineSearchWarning, stacklevel=2)
+    else:
+        # derphi_star is a scalar (the directional derivative). Return
+        # instead the gradient most recently evaluated while computing it:
+        # this is the gradient at the new point, so the outer loop does
+        # not need to compute it again.
+        derphi_star = gval[0]
+
+    return alpha_star, fc[0], gc[0], phi_star, old_fval, derphi_star
+
+
+def scalar_search_wolfe2(phi, derphi, phi0=None,
+                         old_phi0=None, derphi0=None,
+                         c1=1e-4, c2=0.9, amax=None,
+                         extra_condition=None, maxiter=10):
+    """Find alpha that satisfies strong Wolfe conditions.
+
+    The search assumes descent along alpha > 0, i.e. ``derphi(0) < 0``.
+
+    Parameters
+    ----------
+    phi : callable phi(alpha)
+        Objective scalar function.
+    derphi : callable phi'(alpha)
+        Objective function derivative. Returns a scalar.
+    phi0 : float, optional
+        Value of phi at 0.
+    old_phi0 : float, optional
+        Value of phi at previous point.
+    derphi0 : float, optional
+        Value of derphi at 0
+    c1 : float, optional
+        Parameter for Armijo condition rule.
+    c2 : float, optional
+        Parameter for curvature condition rule.
+    amax : float, optional
+        Maximum step size.
+    extra_condition : callable, optional
+        A callable of the form ``extra_condition(alpha, phi_value)``
+        returning a boolean. The line search accepts the value
+        of ``alpha`` only if this callable returns ``True``.
+        If the callable returns ``False`` for the step length,
+        the algorithm will continue with new iterates.
+        The callable is only called for iterates satisfying
+        the strong Wolfe conditions.
+    maxiter : int, optional
+        Maximum number of iterations to perform.
+
+    Returns
+    -------
+    alpha_star : float or None
+        Best alpha, or None if the line search algorithm did not converge.
+    phi_star : float
+        phi at alpha_star.
+    phi0 : float
+        phi at 0.
+    derphi_star : float or None
+        derphi at alpha_star, or None if the line search algorithm
+        did not converge.
+
+    Notes
+    -----
+    Uses the line search algorithm to enforce strong Wolfe
+    conditions. See Nocedal and Wright, 'Numerical Optimization',
+    1999, pp. 59-61.
+
+    """
+    _check_c1_c2(c1, c2)
+
+    if phi0 is None:
+        phi0 = phi(0.)
+
+    if derphi0 is None:
+        derphi0 = derphi(0.)
+
+    alpha0 = 0
+    if old_phi0 is not None and derphi0 != 0:
+        alpha1 = min(1.0, 1.01*2*(phi0 - old_phi0)/derphi0)
+    else:
+        alpha1 = 1.0
+
+    if alpha1 < 0:
+        alpha1 = 1.0
+
+    if amax is not None:
+        alpha1 = min(alpha1, amax)
+
+    phi_a1 = phi(alpha1)
+    # derphi_a1 = derphi(alpha1) is evaluated below
+
+    phi_a0 = phi0
+    derphi_a0 = derphi0
+
+    if extra_condition is None:
+        def extra_condition(alpha, phi):
+            return True
+
+    for i in range(maxiter):
+        if alpha1 == 0 or (amax is not None and alpha0 > amax):
+            # alpha1 == 0: This shouldn't happen. Perhaps the increment has
+            # slipped below machine precision?
+            alpha_star = None
+            phi_star = phi0
+            phi0 = old_phi0
+            derphi_star = None
+
+            if alpha1 == 0:
+                msg = 'Rounding errors prevent the line search from converging'
+            else:
+                msg = "The line search algorithm could not find a solution " + \
+                      "less than or equal to amax: %s" % amax
+
+            warn(msg, LineSearchWarning, stacklevel=2)
+            break
+
+        not_first_iteration = i > 0
+        if (phi_a1 > phi0 + c1 * alpha1 * derphi0) or \
+           ((phi_a1 >= phi_a0) and not_first_iteration):
+            alpha_star, phi_star, derphi_star = \
+                        _zoom(alpha0, alpha1, phi_a0,
+                              phi_a1, derphi_a0, phi, derphi,
+                              phi0, derphi0, c1, c2, extra_condition)
+            break
+
+        derphi_a1 = derphi(alpha1)
+        if (abs(derphi_a1) <= -c2*derphi0):
+            if extra_condition(alpha1, phi_a1):
+                alpha_star = alpha1
+                phi_star = phi_a1
+                derphi_star = derphi_a1
+                break
+
+        if (derphi_a1 >= 0):
+            alpha_star, phi_star, derphi_star = \
+                        _zoom(alpha1, alpha0, phi_a1,
+                              phi_a0, derphi_a1, phi, derphi,
+                              phi0, derphi0, c1, c2, extra_condition)
+            break
+
+        alpha2 = 2 * alpha1  # increase by factor of two on each iteration
+        if amax is not None:
+            alpha2 = min(alpha2, amax)
+        alpha0 = alpha1
+        alpha1 = alpha2
+        phi_a0 = phi_a1
+        phi_a1 = phi(alpha1)
+        derphi_a0 = derphi_a1
+
+    else:
+        # stopping test maxiter reached
+        alpha_star = alpha1
+        phi_star = phi_a1
+        derphi_star = None
+        warn('The line search algorithm did not converge',
+             LineSearchWarning, stacklevel=2)
+
+    return alpha_star, phi_star, phi0, derphi_star
+
+
+def _cubicmin(a, fa, fpa, b, fb, c, fc):
+    """
+    Finds the minimizer for a cubic polynomial that goes through the
+    points (a,fa), (b,fb), and (c,fc) with derivative at a of fpa.
+
+    If no minimizer can be found, return None.
+
+    """
+    # f(x) = A *(x-a)^3 + B*(x-a)^2 + C*(x-a) + D
+
+    with np.errstate(divide='raise', over='raise', invalid='raise'):
+        try:
+            C = fpa
+            db = b - a
+            dc = c - a
+            denom = (db * dc) ** 2 * (db - dc)
+            d1 = np.empty((2, 2))
+            d1[0, 0] = dc ** 2
+            d1[0, 1] = -db ** 2
+            d1[1, 0] = -dc ** 3
+            d1[1, 1] = db ** 3
+            [A, B] = np.dot(d1, np.asarray([fb - fa - C * db,
+                                            fc - fa - C * dc]).flatten())
+            A /= denom
+            B /= denom
+            radical = B * B - 3 * A * C
+            xmin = a + (-B + np.sqrt(radical)) / (3 * A)
+        except ArithmeticError:
+            return None
+    if not np.isfinite(xmin):
+        return None
+    return xmin
+
+
+def _quadmin(a, fa, fpa, b, fb):
+    """
+    Finds the minimizer for a quadratic polynomial that goes through
+    the points (a,fa), (b,fb) with derivative at a of fpa.
+
+    """
+    # f(x) = B*(x-a)^2 + C*(x-a) + D
+    with np.errstate(divide='raise', over='raise', invalid='raise'):
+        try:
+            D = fa
+            C = fpa
+            db = b - a * 1.0
+            B = (fb - D - C * db) / (db * db)
+            xmin = a - C / (2.0 * B)
+        except ArithmeticError:
+            return None
+    if not np.isfinite(xmin):
+        return None
+    return xmin
+
+
+def _zoom(a_lo, a_hi, phi_lo, phi_hi, derphi_lo,
+          phi, derphi, phi0, derphi0, c1, c2, extra_condition):
+    """Zoom stage of approximate linesearch satisfying strong Wolfe conditions.
+
+    Part of the optimization algorithm in `scalar_search_wolfe2`.
+
+    Notes
+    -----
+    Implements Algorithm 3.6 (zoom) in Nocedal and Wright,
+    'Numerical Optimization', 1999, p. 61.
+
+    """
+
+    maxiter = 10
+    i = 0
+    delta1 = 0.2  # cubic interpolant check
+    delta2 = 0.1  # quadratic interpolant check
+    phi_rec = phi0
+    a_rec = 0
+    while True:
+        # Interpolate to find a trial step length between a_lo and a_hi.
+        # The interpolation scheme is chosen as follows: use cubic
+        # interpolation; if the result is within delta * dalpha of an
+        # end point, or outside the interval bounded by a_lo and a_hi,
+        # use quadratic interpolation; if the result is still too
+        # close, use bisection.
+
+        dalpha = a_hi - a_lo
+        if dalpha < 0:
+            a, b = a_hi, a_lo
+        else:
+            a, b = a_lo, a_hi
+
+        # minimizer of cubic interpolant
+        # (uses phi_lo, derphi_lo, phi_hi, and the most recent value of phi)
+        #
+        # If the result is too close to the end points (or out of the
+        # interval), use quadratic interpolation with phi_lo,
+        # derphi_lo and phi_hi. If the result is still too close to the
+        # end points (or out of the interval), use bisection.
+
+        if (i > 0):
+            cchk = delta1 * dalpha
+            a_j = _cubicmin(a_lo, phi_lo, derphi_lo, a_hi, phi_hi,
+                            a_rec, phi_rec)
+        if (i == 0) or (a_j is None) or (a_j > b - cchk) or (a_j < a + cchk):
+            qchk = delta2 * dalpha
+            a_j = _quadmin(a_lo, phi_lo, derphi_lo, a_hi, phi_hi)
+            if (a_j is None) or (a_j > b-qchk) or (a_j < a+qchk):
+                a_j = a_lo + 0.5*dalpha
+
+        # Check new value of a_j
+
+        phi_aj = phi(a_j)
+        if (phi_aj > phi0 + c1*a_j*derphi0) or (phi_aj >= phi_lo):
+            phi_rec = phi_hi
+            a_rec = a_hi
+            a_hi = a_j
+            phi_hi = phi_aj
+        else:
+            derphi_aj = derphi(a_j)
+            if abs(derphi_aj) <= -c2*derphi0 and extra_condition(a_j, phi_aj):
+                a_star = a_j
+                val_star = phi_aj
+                valprime_star = derphi_aj
+                break
+            if derphi_aj*(a_hi - a_lo) >= 0:
+                phi_rec = phi_hi
+                a_rec = a_hi
+                a_hi = a_lo
+                phi_hi = phi_lo
+            else:
+                phi_rec = phi_lo
+                a_rec = a_lo
+            a_lo = a_j
+            phi_lo = phi_aj
+            derphi_lo = derphi_aj
+        i += 1
+        if (i > maxiter):
+            # Failed to find a conforming step size
+            a_star = None
+            val_star = None
+            valprime_star = None
+            break
+    return a_star, val_star, valprime_star
+
+
+#------------------------------------------------------------------------------
+# Armijo line and scalar searches
+#------------------------------------------------------------------------------
+
+def line_search_armijo(f, xk, pk, gfk, old_fval, args=(), c1=1e-4, alpha0=1):
+    """Minimize over alpha, the function ``f(xk+alpha pk)``.
+
+    Parameters
+    ----------
+    f : callable
+        Function to be minimized.
+    xk : array_like
+        Current point.
+    pk : array_like
+        Search direction.
+    gfk : array_like
+        Gradient of `f` at point `xk`.
+    old_fval : float
+        Value of `f` at point `xk`.
+    args : tuple, optional
+        Additional arguments passed to `f`.
+    c1 : float, optional
+        Value to control stopping criterion.
+    alpha0 : scalar, optional
+        Value of `alpha` at start of the optimization.
+
+    Returns
+    -------
+    alpha
+    f_count
+    f_val_at_alpha
+
+    Notes
+    -----
+    Uses the interpolation algorithm (Armijo backtracking) as suggested by
+    Nocedal and Wright in 'Numerical Optimization', 1999, pp. 56-57.
+
+    """
+    xk = np.atleast_1d(xk)
+    fc = [0]
+
+    def phi(alpha1):
+        fc[0] += 1
+        return f(xk + alpha1*pk, *args)
+
+    if old_fval is None:
+        phi0 = phi(0.)
+    else:
+        phi0 = old_fval  # f(xk) was already computed by the caller
+
+    derphi0 = np.dot(gfk, pk)
+    alpha, phi1 = scalar_search_armijo(phi, phi0, derphi0, c1=c1,
+                                       alpha0=alpha0)
+    return alpha, fc[0], phi1
+
+
+def line_search_BFGS(f, xk, pk, gfk, old_fval, args=(), c1=1e-4, alpha0=1):
+    """
+    Compatibility wrapper for `line_search_armijo`
+    """
+    r = line_search_armijo(f, xk, pk, gfk, old_fval, args=args, c1=c1,
+                           alpha0=alpha0)
+    return r[0], r[1], 0, r[2]
+
+
+def scalar_search_armijo(phi, phi0, derphi0, c1=1e-4, alpha0=1, amin=0):
+    """Minimize over alpha, the function ``phi(alpha)``.
+
+    Uses the interpolation algorithm (Armijo backtracking) as suggested by
+    Nocedal and Wright in 'Numerical Optimization', 1999, pp. 56-57.
+
+    The search assumes descent along alpha > 0, i.e. ``derphi0 < 0``.
+
+    Returns
+    -------
+    alpha
+    phi1
+
+    """
+    phi_a0 = phi(alpha0)
+    if phi_a0 <= phi0 + c1*alpha0*derphi0:
+        return alpha0, phi_a0
+
+    # Otherwise, compute the minimizer of a quadratic interpolant:
+
+    alpha1 = -(derphi0) * alpha0**2 / 2.0 / (phi_a0 - phi0 - derphi0 * alpha0)
+    phi_a1 = phi(alpha1)
+
+    if (phi_a1 <= phi0 + c1*alpha1*derphi0):
+        return alpha1, phi_a1
+
+    # Otherwise, loop with cubic interpolation until we find an alpha which
+    # satisfies the first Wolfe condition (since we are backtracking, we will
+    # assume that the value of alpha is not too small and satisfies the second
+    # condition).
+
+    while alpha1 > amin:       # we are assuming alpha>0 is a descent direction
+        factor = alpha0**2 * alpha1**2 * (alpha1-alpha0)
+        a = alpha0**2 * (phi_a1 - phi0 - derphi0*alpha1) - \
+            alpha1**2 * (phi_a0 - phi0 - derphi0*alpha0)
+        a = a / factor
+        b = -alpha0**3 * (phi_a1 - phi0 - derphi0*alpha1) + \
+            alpha1**3 * (phi_a0 - phi0 - derphi0*alpha0)
+        b = b / factor
+
+        alpha2 = (-b + np.sqrt(abs(b**2 - 3 * a * derphi0))) / (3.0*a)
+        phi_a2 = phi(alpha2)
+
+        if (phi_a2 <= phi0 + c1*alpha2*derphi0):
+            return alpha2, phi_a2
+
+        if (alpha1 - alpha2) > alpha1 / 2.0 or (1 - alpha2/alpha1) < 0.96:
+            alpha2 = alpha1 / 2.0
+
+        alpha0 = alpha1
+        alpha1 = alpha2
+        phi_a0 = phi_a1
+        phi_a1 = phi_a2
+
+    # Failed to find a suitable step length
+    return None, phi_a1
+
+
+#------------------------------------------------------------------------------
+# Non-monotone line search for DF-SANE
+#------------------------------------------------------------------------------
+
+def _nonmonotone_line_search_cruz(f, x_k, d, prev_fs, eta,
+                                  gamma=1e-4, tau_min=0.1, tau_max=0.5):
+    """
+    Nonmonotone backtracking line search as described in [1]_
+
+    Parameters
+    ----------
+    f : callable
+        Function returning a tuple ``(f, F)`` where ``f`` is the value
+        of a merit function and ``F`` the residual.
+    x_k : ndarray
+        Initial position.
+    d : ndarray
+        Search direction.
+    prev_fs : list of float
+        List of previous merit function values. Should have ``len(prev_fs) <= M``
+        where ``M`` is the nonmonotonicity window parameter.
+    eta : float
+        Allowed merit function increase, see [1]_
+    gamma, tau_min, tau_max : float, optional
+        Search parameters, see [1]_
+
+    Returns
+    -------
+    alpha : float
+        Step length
+    xp : ndarray
+        Next position
+    fp : float
+        Merit function value at next position
+    Fp : ndarray
+        Residual at next position
+
+    References
+    ----------
+    .. [1] "Spectral residual method without gradient information for solving
+       large-scale nonlinear systems of equations." W. La Cruz,
+       J.M. Martinez, M. Raydan. Math. Comp. **75**, 1429 (2006).
+
+    """
+    f_k = prev_fs[-1]
+    f_bar = max(prev_fs)
+
+    alpha_p = 1
+    alpha_m = 1
+    alpha = 1
+
+    while True:
+        xp = x_k + alpha_p * d
+        fp, Fp = f(xp)
+
+        if fp <= f_bar + eta - gamma * alpha_p**2 * f_k:
+            alpha = alpha_p
+            break
+
+        alpha_tp = alpha_p**2 * f_k / (fp + (2*alpha_p - 1)*f_k)
+
+        xp = x_k - alpha_m * d
+        fp, Fp = f(xp)
+
+        if fp <= f_bar + eta - gamma * alpha_m**2 * f_k:
+            alpha = -alpha_m
+            break
+
+        alpha_tm = alpha_m**2 * f_k / (fp + (2*alpha_m - 1)*f_k)
+
+        alpha_p = np.clip(alpha_tp, tau_min * alpha_p, tau_max * alpha_p)
+        alpha_m = np.clip(alpha_tm, tau_min * alpha_m, tau_max * alpha_m)
+
+    return alpha, xp, fp, Fp
+
+
+def _nonmonotone_line_search_cheng(f, x_k, d, f_k, C, Q, eta,
+                                   gamma=1e-4, tau_min=0.1, tau_max=0.5,
+                                   nu=0.85):
+    """
+    Nonmonotone line search from [1]_
+
+    Parameters
+    ----------
+    f : callable
+        Function returning a tuple ``(f, F)`` where ``f`` is the value
+        of a merit function and ``F`` the residual.
+    x_k : ndarray
+        Initial position.
+    d : ndarray
+        Search direction.
+    f_k : float
+        Initial merit function value.
+    C, Q : float
+        Control parameters. On the first iteration, give values
+        Q=1.0, C=f_k
+    eta : float
+        Allowed merit function increase, see [1]_
+    nu, gamma, tau_min, tau_max : float, optional
+        Search parameters, see [1]_
+
+    Returns
+    -------
+    alpha : float
+        Step length
+    xp : ndarray
+        Next position
+    fp : float
+        Merit function value at next position
+    Fp : ndarray
+        Residual at next position
+    C : float
+        New value for the control parameter C
+    Q : float
+        New value for the control parameter Q
+
+    References
+    ----------
+    .. [1] W. Cheng & D.-H. Li, ''A derivative-free nonmonotone line
+           search and its application to the spectral residual
+           method'', IMA J. Numer. Anal. 29, 814 (2009).
+
+    """
+    alpha_p = 1
+    alpha_m = 1
+    alpha = 1
+
+    while True:
+        xp = x_k + alpha_p * d
+        fp, Fp = f(xp)
+
+        if fp <= C + eta - gamma * alpha_p**2 * f_k:
+            alpha = alpha_p
+            break
+
+        alpha_tp = alpha_p**2 * f_k / (fp + (2*alpha_p - 1)*f_k)
+
+        xp = x_k - alpha_m * d
+        fp, Fp = f(xp)
+
+        if fp <= C + eta - gamma * alpha_m**2 * f_k:
+            alpha = -alpha_m
+            break
+
+        alpha_tm = alpha_m**2 * f_k / (fp + (2*alpha_m - 1)*f_k)
+
+        alpha_p = np.clip(alpha_tp, tau_min * alpha_p, tau_max * alpha_p)
+        alpha_m = np.clip(alpha_tm, tau_min * alpha_m, tau_max * alpha_m)
+
+    # Update C and Q
+    Q_next = nu * Q + 1
+    C = (nu * Q * (C + eta) + fp) / Q_next
+    Q = Q_next
+
+    return alpha, xp, fp, Fp, C, Q
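
As a rough illustration of the Cheng-Li search above, here is a minimal scalar sketch in plain Python. It is not the SciPy implementation: the names `clip`, `cheng_search_1d`, and `merit` and the toy problem are assumptions made for illustration. It follows the same acceptance test and the same `C`, `Q` update as `_nonmonotone_line_search_cheng`, with `np.clip` replaced by `min`/`max`.

```python
def clip(value, lo, hi):
    """Scalar stand-in for np.clip."""
    return min(max(value, lo), hi)


def cheng_search_1d(f, x_k, d, f_k, C, Q, eta,
                    gamma=1e-4, tau_min=0.1, tau_max=0.5, nu=0.85):
    """Hypothetical scalar version of the search above (illustration only)."""
    alpha_p = alpha_m = 1.0
    while True:
        # Try a step along +d.
        xp = x_k + alpha_p * d
        fp, Fp = f(xp)
        if fp <= C + eta - gamma * alpha_p**2 * f_k:
            alpha = alpha_p
            break
        alpha_tp = alpha_p**2 * f_k / (fp + (2*alpha_p - 1)*f_k)

        # Try a step along -d.
        xp = x_k - alpha_m * d
        fp, Fp = f(xp)
        if fp <= C + eta - gamma * alpha_m**2 * f_k:
            alpha = -alpha_m
            break
        alpha_tm = alpha_m**2 * f_k / (fp + (2*alpha_m - 1)*f_k)

        # Shrink both trial steps, safeguarded to [tau_min, tau_max] * step.
        alpha_p = clip(alpha_tp, tau_min * alpha_p, tau_max * alpha_p)
        alpha_m = clip(alpha_tm, tau_min * alpha_m, tau_max * alpha_m)

    # C becomes a weighted average of the previous reference value and fp.
    Q_next = nu * Q + 1
    C = (nu * Q * (C + eta) + fp) / Q_next
    return alpha, xp, fp, Fp, C, Q_next


def merit(x):
    """Toy problem: residual F(x) = x, merit f(x) = x**2."""
    return x * x, x


x_k = 1.0
f_k, _ = merit(x_k)
# First iteration: Q = 1.0 and C = f_k, as the docstring recommends.
alpha, xp, fp, Fp, C, Q = cheng_search_1d(merit, x_k, -4.0, f_k,
                                          C=f_k, Q=1.0, eta=0.0)
print(alpha, xp, fp, C, Q)
```

With the deliberately oversized direction `d = -4.0`, both unit steps are rejected and the search accepts a shortened step on the second pass. The returned `C` and `Q` would be passed back in on the next outer iteration.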
@@ -0,0 +1,714 @@
+"""
+A top-level linear programming interface.
+
+.. versionadded:: 0.15.0
+
+Functions
+---------
+.. autosummary::
+   :toctree: generated/
+
+    linprog
+    linprog_verbose_callback
+    linprog_terse_callback
+
+"""
+
+import numpy as np
+
+from ._optimize import OptimizeResult, OptimizeWarning
+from warnings import warn
+from ._linprog_highs import _linprog_highs
+from ._linprog_ip import _linprog_ip
+from ._linprog_simplex import _linprog_simplex
+from ._linprog_rs import _linprog_rs
+from ._linprog_doc import (_linprog_highs_doc, _linprog_ip_doc,  # noqa: F401
+                           _linprog_rs_doc, _linprog_simplex_doc,
+                           _linprog_highs_ipm_doc, _linprog_highs_ds_doc)
+from ._linprog_util import (
+    _parse_linprog, _presolve, _get_Abc, _LPProblem, _autoscale,
+    _postsolve, _check_result, _display_summary)
+from copy import deepcopy
+
+__all__ = ['linprog', 'linprog_verbose_callback', 'linprog_terse_callback']
+
+__docformat__ = "restructuredtext en"
+
+LINPROG_METHODS = [
+    'simplex', 'revised simplex', 'interior-point', 'highs', 'highs-ds', 'highs-ipm'
+]
+
+
+def linprog_verbose_callback(res):
+    """
+    A sample callback function demonstrating the linprog callback interface.
+    This callback produces detailed output to sys.stdout before each iteration
+    and after the final iteration of the simplex algorithm.
+
+    Parameters
+    ----------
+    res : A `scipy.optimize.OptimizeResult` consisting of the following fields:
+
+        x : 1-D array
+            The independent variable vector which optimizes the linear
+            programming problem.
+        fun : float
+            Value of the objective function.
+        success : bool
+            True if the algorithm succeeded in finding an optimal solution.
+        slack : 1-D array
+            The values of the slack variables. Each slack variable corresponds
+            to an inequality constraint. If the slack is zero, then the
+            corresponding constraint is active.
+        con : 1-D array
+            The (nominally zero) residuals of the equality constraints, that is,
+            ``b_eq - A_eq @ x``.
+        phase : int
+            The phase of the optimization being executed. In phase 1 a basic
+            feasible solution is sought and the tableau T has an additional
+            row representing an alternate objective function.
+        status : int
+            An integer representing the exit status of the optimization::
+
+                 0 : Optimization terminated successfully
+                 1 : Iteration limit reached
+                 2 : Problem appears to be infeasible
+                 3 : Problem appears to be unbounded
+                 4 : Serious numerical difficulties encountered
+
+        nit : int
+            The number of iterations performed.
+        message : str
+            A string descriptor of the exit status of the optimization.
+    """
+    x = res['x']
+    fun = res['fun']
+    phase = res['phase']
+    status = res['status']
+    nit = res['nit']
+    message = res['message']
+    complete = res['complete']
+
+    saved_printoptions = np.get_printoptions()
+    np.set_printoptions(linewidth=500,
+                        formatter={'float': lambda x: f"{x: 12.4f}"})
+    if status:
+        print('--------- Simplex Early Exit -------\n')
+        print(f'The simplex method exited early with status {status:d}')
+        print(message)
+    elif complete:
+        print('--------- Simplex Complete --------\n')
+        print(f'Iterations required: {nit}')
+    else:
+        print(f'--------- Iteration {nit:d}  ---------\n')
+
+    if nit > 0:
+        if phase == 1:
+            print('Current Pseudo-Objective Value:')
+        else:
+            print('Current Objective Value:')
+        print('f = ', fun)
+        print()
+        print('Current Solution Vector:')
+        print('x = ', x)
+        print()
+
+    np.set_printoptions(**saved_printoptions)
+
+
+def linprog_terse_callback(res):
+    """
+    A sample callback function demonstrating the linprog callback interface.
+    This callback produces brief output to sys.stdout before each iteration
+    and after the final iteration of the simplex algorithm.
+
+    Parameters
+    ----------
+    res : A `scipy.optimize.OptimizeResult` consisting of the following fields:
+
+        x : 1-D array
+            The independent variable vector which optimizes the linear
+            programming problem.
+        fun : float
+            Value of the objective function.
+        success : bool
+            True if the algorithm succeeded in finding an optimal solution.
+        slack : 1-D array
+            The values of the slack variables. Each slack variable corresponds
+            to an inequality constraint. If the slack is zero, then the
+            corresponding constraint is active.
+        con : 1-D array
+            The (nominally zero) residuals of the equality constraints, that is,
+            ``b_eq - A_eq @ x``.
+        phase : int
+            The phase of the optimization being executed. In phase 1 a basic
+            feasible solution is sought and the tableau T has an additional
+            row representing an alternate objective function.
+        status : int
+            An integer representing the exit status of the optimization::
+
+                 0 : Optimization terminated successfully
+                 1 : Iteration limit reached
+                 2 : Problem appears to be infeasible
+                 3 : Problem appears to be unbounded
+                 4 : Serious numerical difficulties encountered
+
+        nit : int
+            The number of iterations performed.
+        message : str
+            A string descriptor of the exit status of the optimization.
+    """
+    nit = res['nit']
+    x = res['x']
+
+    if nit == 0:
+        print("Iter:   X:")
+    print(f"{nit: <5d}   ", end="")
+    print(x)
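
The two sample callbacks above only print. A callback that records the iterates instead might look like the sketch below. The helper name `make_recording_callback` is hypothetical, and plain dictionaries stand in for the `OptimizeResult` a solver would pass. It relies only on the `nit`, `fun`, and `x` fields documented above.

```python
def make_recording_callback():
    """Return a callback and the list it appends to (hypothetical helper)."""
    history = []

    def callback(res):
        # Copy the fields of interest so stored values stay independent
        # of whatever object the caller passes next.
        history.append({
            'nit': res['nit'],
            'fun': res['fun'],
            'x': list(res['x']),
        })

    return callback, history


# Stand-ins for the result objects a solver would pass on each iteration.
callback, history = make_recording_callback()
callback({'nit': 0, 'fun': 0.0, 'x': [0.0, 0.0]})
callback({'nit': 1, 'fun': -4.0, 'x': [4.0, 0.0]})
print(history[-1])
```

According to the `linprog` docstring, callbacks are not supported by the HiGHS methods, so such a function would only be usable with the legacy solvers.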
+
+
+def linprog(c, A_ub=None, b_ub=None, A_eq=None, b_eq=None,
+            bounds=(0, None), method='highs', callback=None,
+            options=None, x0=None, integrality=None):
+    r"""
+    Linear programming: minimize a linear objective function subject to linear
+    equality and inequality constraints.
+
+    Linear programming solves problems of the following form:
+
+    .. math::
+
+        \min_x \ & c^T x \\
+        \mbox{such that} \ & A_{ub} x \leq b_{ub},\\
+        & A_{eq} x = b_{eq},\\
+        & l \leq x \leq u ,
+
+    where :math:`x` is a vector of decision variables; :math:`c`,
+    :math:`b_{ub}`, :math:`b_{eq}`, :math:`l`, and :math:`u` are vectors; and
+    :math:`A_{ub}` and :math:`A_{eq}` are matrices.
+
+    Alternatively, that's:
+
+        - minimize ::
+
+            c @ x
+
+        - such that ::
+
+            A_ub @ x <= b_ub
+            A_eq @ x == b_eq
+            lb <= x <= ub
+
+    Note that by default ``lb = 0`` and ``ub = None``. Other bounds can be
+    specified with ``bounds``.
+
+    Parameters
+    ----------
+    c : 1-D array
+        The coefficients of the linear objective function to be minimized.
+    A_ub : 2-D array, optional
+        The inequality constraint matrix. Each row of ``A_ub`` specifies the
+        coefficients of a linear inequality constraint on ``x``.
+    b_ub : 1-D array, optional
+        The inequality constraint vector. Each element represents an
+        upper bound on the corresponding value of ``A_ub @ x``.
+    A_eq : 2-D array, optional
+        The equality constraint matrix. Each row of ``A_eq`` specifies the
+        coefficients of a linear equality constraint on ``x``.
+    b_eq : 1-D array, optional
+        The equality constraint vector. Each element of ``A_eq @ x`` must equal
+        the corresponding element of ``b_eq``.
+    bounds : sequence, optional
+        A sequence of ``(min, max)`` pairs for each element in ``x``, defining
+        the minimum and maximum values of that decision variable.
+        If a single tuple ``(min, max)`` is provided, then ``min`` and ``max``
+        will serve as bounds for all decision variables.
+        Use ``None`` to indicate that there is no bound. For instance, the
+        default bound ``(0, None)`` means that all decision variables are
+        non-negative, and the pair ``(None, None)`` means no bounds at all,
+        i.e. all variables are allowed to be any real.
+    method : str, optional
+        The algorithm used to solve the standard form problem.
+        :ref:`'highs' <optimize.linprog-highs>` (default),
+        :ref:`'highs-ds' <optimize.linprog-highs-ds>`,
+        :ref:`'highs-ipm' <optimize.linprog-highs-ipm>`,
+        :ref:`'interior-point' <optimize.linprog-interior-point>` (legacy),
+        :ref:`'revised simplex' <optimize.linprog-revised_simplex>` (legacy),
+        and
+        :ref:`'simplex' <optimize.linprog-simplex>` (legacy) are supported.
+        The legacy methods are deprecated and will be removed in SciPy 1.11.0.
+    callback : callable, optional
+        If a callback function is provided, it will be called at least once per
+        iteration of the algorithm. The callback function must accept a single
+        `scipy.optimize.OptimizeResult` consisting of the following fields:
+
+        x : 1-D array
+            The current solution vector.
+        fun : float
+            The current value of the objective function ``c @ x``.
+        success : bool
+            ``True`` when the algorithm has completed successfully.
+        slack : 1-D array
+            The (nominally positive) values of the slack,
+            ``b_ub - A_ub @ x``.
+        con : 1-D array
+            The (nominally zero) residuals of the equality constraints,
+            ``b_eq - A_eq @ x``.
+        phase : int
+            The phase of the algorithm being executed.
+        status : int
+            An integer representing the status of the algorithm.
+
+            ``0`` : Optimization proceeding nominally.
+
+            ``1`` : Iteration limit reached.
+
+            ``2`` : Problem appears to be infeasible.
+
+            ``3`` : Problem appears to be unbounded.
+
+            ``4`` : Numerical difficulties encountered.
+
+        nit : int
+            The current iteration number.
+        message : str
+            A string descriptor of the algorithm status.
+
+        Callback functions are not currently supported by the HiGHS methods.
+
+    options : dict, optional
+        A dictionary of solver options. All methods accept the following
+        options:
+
+        maxiter : int
+            Maximum number of iterations to perform.
+            Default: see method-specific documentation.
+        disp : bool
+            Set to ``True`` to print convergence messages.
+            Default: ``False``.
+        presolve : bool
+            Set to ``False`` to disable automatic presolve.
+            Default: ``True``.
+
+        All methods except the HiGHS solvers also accept:
+
+        tol : float
+            A tolerance which determines when a residual is "close enough" to
+            zero to be considered exactly zero.
+        autoscale : bool
+            Set to ``True`` to automatically perform equilibration.
+            Consider using this option if the numerical values in the
+            constraints are separated by several orders of magnitude.
+            Default: ``False``.
+        rr : bool
+            Set to ``False`` to disable automatic redundancy removal.
+            Default: ``True``.
+        rr_method : string
+            Method used to identify and remove redundant rows from the
+            equality constraint matrix after presolve. For problems with
+            dense input, the available methods for redundancy removal are:
+
+            "SVD":
+                Repeatedly performs singular value decomposition on
+                the matrix, detecting redundant rows based on nonzeros
+                in the left singular vectors that correspond with
+                zero singular values. May be fast when the matrix is
+                nearly full rank.
+            "pivot":
+                Uses the algorithm presented in [5]_ to identify
+                redundant rows.
+            "ID":
+                Uses a randomized interpolative decomposition.
+                Identifies columns of the matrix transpose not used in
+                a full-rank interpolative decomposition of the matrix.
+            None:
+                Uses "SVD" if the matrix is nearly full rank, that is,
+                the difference between the matrix rank and the number
+                of rows is less than five. If not, uses "pivot". The
+                behavior of this default is subject to change without
+                prior notice.
+
+            Default: None.
+            For problems with sparse input, this option is ignored, and the
+            pivot-based algorithm presented in [5]_ is used.
+
+        For method-specific options, see
+        :func:`show_options('linprog') <show_options>`.
+
+    x0 : 1-D array, optional
+        Guess values of the decision variables, which will be refined by
+        the optimization algorithm. This argument is currently used only by the
+        'revised simplex' method, and can only be used if `x0` represents a
+        basic feasible solution.
+
+    integrality : 1-D array or int, optional
+        Indicates the type of integrality constraint on each decision variable.
+
+        ``0`` : Continuous variable; no integrality constraint.
+
+        ``1`` : Integer variable; decision variable must be an integer
+        within `bounds`.
+
+        ``2`` : Semi-continuous variable; decision variable must be within
+        `bounds` or take value ``0``.
+
+        ``3`` : Semi-integer variable; decision variable must be an integer
+        within `bounds` or take value ``0``.
+
+        By default, all variables are continuous.
+
+        For mixed integrality constraints, supply an array of shape `c.shape`.
+        To infer a constraint on each decision variable from shorter inputs,
+        the argument will be broadcasted to `c.shape` using `np.broadcast_to`.
+
+        This argument is currently used only by the ``'highs'`` method and
+        ignored otherwise.
+
+    Returns
+    -------
+    res : OptimizeResult
+        A :class:`scipy.optimize.OptimizeResult` consisting of the fields
+        below. Note that the return types of the fields may depend on whether
+        the optimization was successful, therefore it is recommended to check
+        `OptimizeResult.status` before relying on the other fields:
+
+        x : 1-D array
+            The values of the decision variables that minimize the
+            objective function while satisfying the constraints.
+        fun : float
+            The optimal value of the objective function ``c @ x``.
+        slack : 1-D array
+            The (nominally positive) values of the slack variables,
+            ``b_ub - A_ub @ x``.
+        con : 1-D array
+            The (nominally zero) residuals of the equality constraints,
+            ``b_eq - A_eq @ x``.
+        success : bool
+            ``True`` when the algorithm succeeds in finding an optimal
+            solution.
+        status : int
+            An integer representing the exit status of the algorithm.
+
+            ``0`` : Optimization terminated successfully.
+
+            ``1`` : Iteration limit reached.
+
+            ``2`` : Problem appears to be infeasible.
+
+            ``3`` : Problem appears to be unbounded.
+
+            ``4`` : Numerical difficulties encountered.
+
+        nit : int
+            The total number of iterations performed in all phases.
+        message : str
+            A string descriptor of the exit status of the algorithm.
+
+    See Also
+    --------
+    show_options : Additional options accepted by the solvers.
+
+    Notes
+    -----
+    This section describes the available solvers that can be selected by the
+    'method' parameter.
+
+    `'highs-ds'` and
+    `'highs-ipm'` are interfaces to the
+    HiGHS simplex and interior-point method solvers [13]_, respectively.
+    `'highs'` (default) chooses between
+    the two automatically. These are the fastest linear
+    programming solvers in SciPy, especially for large, sparse problems;
+    which of these two is faster is problem-dependent.
+    The other solvers (`'interior-point'`, `'revised simplex'`, and
+    `'simplex'`) are legacy methods and will be removed in SciPy 1.11.0.
+
+    Method *highs-ds* is a wrapper of the C++ high performance dual
+    revised simplex implementation (HSOL) [13]_, [14]_. Method *highs-ipm*
+    is a wrapper of a C++ implementation of an **i**\ nterior-\ **p**\ oint
+    **m**\ ethod [13]_; it features a crossover routine, so it is as accurate
+    as a simplex solver. Method *highs* chooses between the two automatically.
+    For new code involving `linprog`, we recommend explicitly choosing one of
+    these three method values.
+
+    .. versionadded:: 1.6.0
+
+    Method *interior-point* uses the primal-dual path following algorithm
+    as outlined in [4]_. This algorithm supports sparse constraint matrices and
+    is typically faster than the simplex methods, especially for large, sparse
+    problems. Note, however, that the solution returned may be slightly less
+    accurate than those of the simplex methods and will not, in general,
+    correspond with a vertex of the polytope defined by the constraints.
+
+    .. versionadded:: 1.0.0
+
+    Method *revised simplex* uses the revised simplex method as described in
+    [9]_, except that a factorization [11]_ of the basis matrix, rather than
+    its inverse, is efficiently maintained and used to solve the linear systems
+    at each iteration of the algorithm.
+
+    .. versionadded:: 1.3.0
+
+    Method *simplex* uses a traditional, full-tableau implementation of
+    Dantzig's simplex algorithm [1]_, [2]_ (*not* the
+    Nelder-Mead simplex). This algorithm is included for backwards
+    compatibility and educational purposes.
+
+    .. versionadded:: 0.15.0
+
+    Before applying *interior-point*, *revised simplex*, or *simplex*,
+    a presolve procedure based on [8]_ attempts
+    to identify trivial infeasibilities, trivial unboundedness, and potential
+    problem simplifications. Specifically, it checks for:
+
+    - rows of zeros in ``A_eq`` or ``A_ub``, representing trivial constraints;
+    - columns of zeros in ``A_eq`` *and* ``A_ub``, representing unconstrained
+      variables;
+    - column singletons in ``A_eq``, representing fixed variables; and
+    - column singletons in ``A_ub``, representing simple bounds.
+
+    If presolve reveals that the problem is unbounded (e.g. an unconstrained
+    and unbounded variable has negative cost) or infeasible (e.g., a row of
+    zeros in ``A_eq`` corresponds with a nonzero in ``b_eq``), the solver
+    terminates with the appropriate status code. Note that presolve terminates
+    as soon as any sign of unboundedness is detected; consequently, a problem
+    may be reported as unbounded when in reality the problem is infeasible
+    (but infeasibility has not been detected yet). Therefore, if it is
+    important to know whether the problem is actually infeasible, solve the
+    problem again with option ``presolve=False``.
+
+    If neither infeasibility nor unboundedness is detected in a single pass
+    of the presolve, bounds are tightened where possible and fixed
+    variables are removed from the problem. Then, linearly dependent rows
+    of the ``A_eq`` matrix are removed (unless they represent an
+    infeasibility) to avoid numerical difficulties in the primary solve
+    routine. Note that rows that are nearly linearly dependent (within a
+    prescribed tolerance) may also be removed, which can change the optimal
+    solution in rare cases. If this is a concern, eliminate redundancy from
+    your problem formulation and run with option ``rr=False`` or
+    ``presolve=False``.
+
+    Several potential improvements can be made here: additional presolve
+    checks outlined in [8]_ should be implemented, the presolve routine should
+    be run multiple times (until no further simplifications can be made), and
+    more of the efficiency improvements from [5]_ should be implemented in the
+    redundancy removal routines.
+
+    After presolve, the problem is transformed to standard form by converting
+    the (tightened) simple bounds to upper bound constraints, introducing
+    non-negative slack variables for inequality constraints, and expressing
+    unbounded variables as the difference between two non-negative variables.
+    Optionally, the problem is automatically scaled via equilibration [12]_.
+    The selected algorithm solves the standard form problem, and a
+    postprocessing routine converts the result to a solution to the original
+    problem.
+
+    References
+    ----------
+    .. [1] Dantzig, George B., Linear programming and extensions. Rand
+           Corporation Research Study Princeton Univ. Press, Princeton, NJ,
+           1963
+    .. [2] Hillier, S.H. and Lieberman, G.J. (1995), "Introduction to
+           Mathematical Programming", McGraw-Hill, Chapter 4.
+    .. [3] Bland, Robert G. New finite pivoting rules for the simplex method.
+           Mathematics of Operations Research (2), 1977: pp. 103-107.
+    .. [4] Andersen, Erling D., and Knud D. Andersen. "The MOSEK interior point
+           optimizer for linear programming: an implementation of the
+           homogeneous algorithm." High performance optimization. Springer US,
+           2000. 197-232.
+    .. [5] Andersen, Erling D. "Finding all linearly dependent rows in
+           large-scale linear programming." Optimization Methods and Software
+           6.3 (1995): 219-227.
+    .. [6] Freund, Robert M. "Primal-Dual Interior-Point Methods for Linear
+           Programming based on Newton's Method." Unpublished Course Notes,
+           March 2004. Available 2/25/2017 at
+           https://ocw.mit.edu/courses/sloan-school-of-management/15-084j-nonlinear-programming-spring-2004/lecture-notes/lec14_int_pt_mthd.pdf
+    .. [7] Fourer, Robert. "Solving Linear Programs by Interior-Point Methods."
+           Unpublished Course Notes, August 26, 2005. Available 2/25/2017 at
+           http://www.4er.org/CourseNotes/Book%20B/B-III.pdf
+    .. [8] Andersen, Erling D., and Knud D. Andersen. "Presolving in linear
+           programming." Mathematical Programming 71.2 (1995): 221-245.
+    .. [9] Bertsimas, Dimitris, and J. Tsitsiklis. "Introduction to linear
+           programming." Athena Scientific 1 (1997): 997.
+    .. [10] Andersen, Erling D., et al. Implementation of interior point
+            methods for large scale linear programming. HEC/Universite de
+            Geneve, 1996.
+    .. [11] Bartels, Richard H. "A stabilization of the simplex method."
+            Numerische Mathematik 16.5 (1971): 414-434.
+    .. [12] Tomlin, J. A. "On scaling linear programming problems."
+            Mathematical Programming Study 4 (1975): 146-166.
+    .. [13] Huangfu, Q., Galabova, I., Feldmeier, M., and Hall, J. A. J.
+            "HiGHS - high performance software for linear optimization."
+            https://highs.dev/
+    .. [14] Huangfu, Q. and Hall, J. A. J. "Parallelizing the dual revised
+            simplex method." Mathematical Programming Computation, 10 (1),
+            119-142, 2018. DOI: 10.1007/s12532-017-0130-5
+
+    Examples
+    --------
+    Consider the following problem:
+
+    .. math::
+
+        \min_{x_0, x_1} \ -x_0 + 4x_1 & \\
+        \mbox{such that} \ -3x_0 + x_1 & \leq 6,\\
+        -x_0 - 2x_1 & \geq -4,\\
+        x_1 & \geq -3.
+
+    The problem is not presented in the form accepted by `linprog`. This is
+    easily remedied by converting the "greater than" inequality
+    constraint to a "less than" inequality constraint by
+    multiplying both sides by a factor of :math:`-1`. Note also that the last
+    constraint is really the simple bound :math:`-3 \leq x_1 \leq \infty`.
+    Finally, since there are no bounds on :math:`x_0`, we must explicitly
+    specify the bounds :math:`-\infty \leq x_0 \leq \infty`, as the
+    default is for variables to be non-negative. After collecting coefficients
+    into arrays and tuples, the input for this problem is:
+
+    >>> from scipy.optimize import linprog
+    >>> c = [-1, 4]
+    >>> A = [[-3, 1], [1, 2]]
+    >>> b = [6, 4]
+    >>> x0_bounds = (None, None)
+    >>> x1_bounds = (-3, None)
+    >>> res = linprog(c, A_ub=A, b_ub=b, bounds=[x0_bounds, x1_bounds])
+    >>> res.fun
+    -22.0
+    >>> res.x
+    array([10., -3.])
+    >>> res.message
+    'Optimization terminated successfully. (HiGHS Status 7: Optimal)'
+
+    The marginals (AKA dual values / shadow prices / Lagrange multipliers)
+    and residuals (slacks) are also available.
+
+    >>> res.ineqlin
+      residual: [ 3.900e+01  0.000e+00]
+     marginals: [-0.000e+00 -1.000e+00]
+
+    For example, because the marginal associated with the second inequality
+    constraint is -1, we expect the optimal value of the objective function
+    to decrease by ``eps`` if we add a small amount ``eps`` to the right hand
+    side of the second inequality constraint:
+
+    >>> eps = 0.05
+    >>> b[1] += eps
+    >>> linprog(c, A_ub=A, b_ub=b, bounds=[x0_bounds, x1_bounds]).fun
+    -22.05
+
+    Also, because the residual on the first inequality constraint is 39, we
+    can decrease the right hand side of the first constraint by 39 without
+    affecting the optimal solution.
+
+    >>> b = [6, 4]  # reset to original values
+    >>> b[0] -= 39
+    >>> linprog(c, A_ub=A, b_ub=b, bounds=[x0_bounds, x1_bounds]).fun
+    -22.0
+
+    """
+
+    meth = method.lower()
+    methods = {"highs", "highs-ds", "highs-ipm",
+               "simplex", "revised simplex", "interior-point"}
+
+    if meth not in methods:
+        raise ValueError(f"Unknown solver '{method}'")
+
+    if x0 is not None and meth != "revised simplex":
+        warning_message = "x0 is used only when method is 'revised simplex'. "
+        warn(warning_message, OptimizeWarning, stacklevel=2)
+
+    if np.any(integrality) and meth != "highs":
+        integrality = None
+        warning_message = ("Only `method='highs'` supports integer "
+                           "constraints. Ignoring `integrality`.")
+        warn(warning_message, OptimizeWarning, stacklevel=2)
+    elif np.any(integrality):
+        integrality = np.broadcast_to(integrality, np.shape(c))
+
+    lp = _LPProblem(c, A_ub, b_ub, A_eq, b_eq, bounds, x0, integrality)
+    lp, solver_options = _parse_linprog(lp, options, meth)
+    tol = solver_options.get('tol', 1e-9)
+
+    # Give unmodified problem to HiGHS
+    if meth.startswith('highs'):
+        if callback is not None:
+            raise NotImplementedError("HiGHS solvers do not support the "
+                                      "callback interface.")
+        highs_solvers = {'highs-ipm': 'ipm', 'highs-ds': 'simplex',
+                         'highs': None}
+
+        sol = _linprog_highs(lp, solver=highs_solvers[meth],
+                             **solver_options)
+        sol['status'], sol['message'] = (
+            _check_result(sol['x'], sol['fun'], sol['status'], sol['slack'],
+                          sol['con'], lp.bounds, tol, sol['message'],
+                          integrality))
+        sol['success'] = sol['status'] == 0
+        return OptimizeResult(sol)
+
+    warn(f"`method='{meth}'` is deprecated and will be removed in SciPy "
+         "1.11.0. Please use one of the HiGHS solvers (e.g. "
+         "`method='highs'`) in new code.", DeprecationWarning, stacklevel=2)
+
+    iteration = 0
+    complete = False  # will become True if solved in presolve
+    undo = []
+
+    # Keep the original arrays to calculate slack/residuals for original
+    # problem.
+    lp_o = deepcopy(lp)
+
+    # Solve trivial problem, eliminate variables, tighten bounds, etc.
+    rr_method = solver_options.pop('rr_method', None)  # need to pop these;
+    rr = solver_options.pop('rr', True)  # they're not passed to methods
+    c0 = 0  # we might get a constant term in the objective
+    if solver_options.pop('presolve', True):
+        (lp, c0, x, undo, complete, status, message) = _presolve(lp, rr,
+                                                                 rr_method,
+                                                                 tol)
+
+    C, b_scale = 1, 1  # for trivial unscaling if autoscale is not used
+    postsolve_args = (lp_o._replace(bounds=lp.bounds), undo, C, b_scale)
+
+    if not complete:
+        A, b, c, c0, x0 = _get_Abc(lp, c0)
+        if solver_options.pop('autoscale', False):
+            A, b, c, x0, C, b_scale = _autoscale(A, b, c, x0)
+            postsolve_args = postsolve_args[:-2] + (C, b_scale)
+
+        if meth == 'simplex':
+            x, status, message, iteration = _linprog_simplex(
+                c, c0=c0, A=A, b=b, callback=callback,
+                postsolve_args=postsolve_args, **solver_options)
+        elif meth == 'interior-point':
+            x, status, message, iteration = _linprog_ip(
+                c, c0=c0, A=A, b=b, callback=callback,
+                postsolve_args=postsolve_args, **solver_options)
+        elif meth == 'revised simplex':
+            x, status, message, iteration = _linprog_rs(
+                c, c0=c0, A=A, b=b, x0=x0, callback=callback,
+                postsolve_args=postsolve_args, **solver_options)
+
+    # Eliminate artificial variables, re-introduce presolved variables, etc.
+    disp = solver_options.get('disp', False)
+
+    x, fun, slack, con = _postsolve(x, postsolve_args, complete)
+
+    status, message = _check_result(x, fun, status, slack, con, lp_o.bounds,
+                                    tol, message, integrality)
+
+    if disp:
+        _display_summary(message, status, fun, iteration)
+
+    sol = {
+        'x': x,
+        'fun': fun,
+        'slack': slack,
+        'con': con,
+        'status': status,
+        'message': message,
+        'nit': iteration,
+        'success': status == 0}
+
+    return OptimizeResult(sol)
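Reviewer note on the function tail above: `postsolve_args` is first built with the trivial scale factors `C, b_scale = 1, 1`, and its last two entries are swapped out only when `autoscale` is requested. The sketch below isolates that tuple update. The placeholder objects and the scale values are made up for illustration and are not taken from SciPy.

```python
# Hypothetical placeholders standing in for the problem tuple and undo list.
lp_placeholder = "lp"
undo_placeholder = []

# Trivial factors: unscaling with these is a no-op.
C, b_scale = 1, 1
postsolve_args = (lp_placeholder, undo_placeholder, C, b_scale)

# Pretend an autoscaling step returned these factors (made-up values).
C, b_scale = [0.5, 2.0], 4.0

# Keep everything except the last two entries, then append the new factors.
postsolve_args = postsolve_args[:-2] + (C, b_scale)

print(postsolve_args)
```

Because tuples are immutable, the slice-and-concatenate form builds a new tuple rather than mutating the one already passed elsewhere.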
@@ -0,0 +1,440 @@
+"""HiGHS Linear Optimization Methods
+
+Interface to HiGHS linear optimization software.
+https://highs.dev/
+
+.. versionadded:: 1.5.0
+
+References
+----------
+.. [1] Q. Huangfu and J.A.J. Hall. "Parallelizing the dual revised simplex
+           method." Mathematical Programming Computation, 10 (1), 119-142,
+           2018. DOI: 10.1007/s12532-017-0130-5
+
+"""
+
+import inspect
+import numpy as np
+from ._optimize import OptimizeWarning, OptimizeResult
+from warnings import warn
+from ._highs._highs_wrapper import _highs_wrapper
+from ._highs._highs_constants import (
+    CONST_INF,
+    MESSAGE_LEVEL_NONE,
+    HIGHS_OBJECTIVE_SENSE_MINIMIZE,
+
+    MODEL_STATUS_NOTSET,
+    MODEL_STATUS_LOAD_ERROR,
+    MODEL_STATUS_MODEL_ERROR,
+    MODEL_STATUS_PRESOLVE_ERROR,
+    MODEL_STATUS_SOLVE_ERROR,
+    MODEL_STATUS_POSTSOLVE_ERROR,
+    MODEL_STATUS_MODEL_EMPTY,
+    MODEL_STATUS_OPTIMAL,
+    MODEL_STATUS_INFEASIBLE,
+    MODEL_STATUS_UNBOUNDED_OR_INFEASIBLE,
+    MODEL_STATUS_UNBOUNDED,
+    MODEL_STATUS_REACHED_DUAL_OBJECTIVE_VALUE_UPPER_BOUND
+    as MODEL_STATUS_RDOVUB,
+    MODEL_STATUS_REACHED_OBJECTIVE_TARGET,
+    MODEL_STATUS_REACHED_TIME_LIMIT,
+    MODEL_STATUS_REACHED_ITERATION_LIMIT,
+
+    HIGHS_SIMPLEX_STRATEGY_DUAL,
+
+    HIGHS_SIMPLEX_CRASH_STRATEGY_OFF,
+
+    HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_CHOOSE,
+    HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_DANTZIG,
+    HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_DEVEX,
+    HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE,
+)
+from scipy.sparse import csc_matrix, vstack, issparse
+
+
+def _highs_to_scipy_status_message(highs_status, highs_message):
+    """Converts HiGHS status number/message to SciPy status number/message"""
+
+    scipy_statuses_messages = {
+        None: (4, "HiGHS did not provide a status code. "),
+        MODEL_STATUS_NOTSET: (4, ""),
+        MODEL_STATUS_LOAD_ERROR: (4, ""),
+        MODEL_STATUS_MODEL_ERROR: (2, ""),
+        MODEL_STATUS_PRESOLVE_ERROR: (4, ""),
+        MODEL_STATUS_SOLVE_ERROR: (4, ""),
+        MODEL_STATUS_POSTSOLVE_ERROR: (4, ""),
+        MODEL_STATUS_MODEL_EMPTY: (4, ""),
+        MODEL_STATUS_RDOVUB: (4, ""),
+        MODEL_STATUS_REACHED_OBJECTIVE_TARGET: (4, ""),
+        MODEL_STATUS_OPTIMAL: (0, "Optimization terminated successfully. "),
+        MODEL_STATUS_REACHED_TIME_LIMIT: (1, "Time limit reached. "),
+        MODEL_STATUS_REACHED_ITERATION_LIMIT: (1, "Iteration limit reached. "),
+        MODEL_STATUS_INFEASIBLE: (2, "The problem is infeasible. "),
+        MODEL_STATUS_UNBOUNDED: (3, "The problem is unbounded. "),
+        MODEL_STATUS_UNBOUNDED_OR_INFEASIBLE: (4, "The problem is unbounded "
+                                               "or infeasible. ")}
+    unrecognized = (4, "The HiGHS status code was not recognized. ")
+    scipy_status, scipy_message = (
+        scipy_statuses_messages.get(highs_status, unrecognized))
+    scipy_message = (f"{scipy_message}"
+                     f"(HiGHS Status {highs_status}: {highs_message})")
+    return scipy_status, scipy_message
+
+
+def _replace_inf(x):
+    # Replace `np.inf` with CONST_INF
+    infs = np.isinf(x)
+    with np.errstate(invalid="ignore"):
+        x[infs] = np.sign(x[infs])*CONST_INF
+    return x
+
+
+def _convert_to_highs_enum(option, option_str, choices):
+    # If option is in the choices we can look it up, if not use
+    # the default value taken from function signature and warn:
+    try:
+        return choices[option.lower()]
+    except AttributeError:
+        return choices[option]
+    except KeyError:
+        sig = inspect.signature(_linprog_highs)
+        default_str = sig.parameters[option_str].default
+        warn(f"Option {option_str} is {option}, but only values in "
+             f"{set(choices.keys())} are allowed. Using default: "
+             f"{default_str}.",
+             OptimizeWarning, stacklevel=3)
+        return choices[default_str]
+
+
+def _linprog_highs(lp, solver, time_limit=None, presolve=True,
+                   disp=False, maxiter=None,
+                   dual_feasibility_tolerance=None,
+                   primal_feasibility_tolerance=None,
+                   ipm_optimality_tolerance=None,
+                   simplex_dual_edge_weight_strategy=None,
+                   mip_rel_gap=None,
+                   mip_max_nodes=None,
+                   **unknown_options):
+    r"""
+    Solve a linear programming problem using one of the HiGHS
+    solvers.
+
+    User-facing documentation is in _linprog_doc.py.
+
+    Parameters
+    ----------
+    lp :  _LPProblem
+        A ``scipy.optimize._linprog_util._LPProblem`` ``namedtuple``.
+    solver : "ipm" or "simplex" or None
+        Which HiGHS solver to use.  If ``None``, "simplex" will be used.
+
+    Options
+    -------
+    maxiter : int
+        The maximum number of iterations to perform in either phase. For
+        ``solver='ipm'``, this does not include the number of crossover
+        iterations.  Default is the largest possible value for an ``int``
+        on the platform.
+    disp : bool
+        Set to ``True`` if indicators of optimization status are to be printed
+        to the console each iteration; default ``False``.
+    time_limit : float
+        The maximum time in seconds allotted to solve the problem; default is
+        the largest possible value for a ``double`` on the platform.
+    presolve : bool
+        Presolve attempts to identify trivial infeasibilities,
+        identify trivial unboundedness, and simplify the problem before
+        sending it to the main solver. It is generally recommended
+        to keep the default setting ``True``; set to ``False`` if presolve is
+        to be disabled.
+    dual_feasibility_tolerance : double
+        Dual feasibility tolerance.  Default is 1e-07.
+        The minimum of this and ``primal_feasibility_tolerance``
+        is used for the feasibility tolerance when ``solver='ipm'``.
+    primal_feasibility_tolerance : double
+        Primal feasibility tolerance.  Default is 1e-07.
+        The minimum of this and ``dual_feasibility_tolerance``
+        is used for the feasibility tolerance when ``solver='ipm'``.
+    ipm_optimality_tolerance : double
+        Optimality tolerance for ``solver='ipm'``.  Default is 1e-08.
+        The value must be at least 1e-12 and smaller than the largest
+        possible value for a ``double`` on the platform.
+    simplex_dual_edge_weight_strategy : str (default: None)
+        Strategy for simplex dual edge weights. The default, ``None``,
+        automatically selects one of the following.
+
+        ``'dantzig'`` uses Dantzig's original strategy of choosing the most
+        negative reduced cost.
+
+        ``'devex'`` uses the strategy described in [15]_.
+
+        ``'steepest'`` uses the exact steepest edge strategy as described in
+        [16]_.
+
+        ``'steepest-devex'`` begins with the exact steepest edge strategy
+        until the computation is too costly or inexact and then switches to
+        the devex method.
+
+        Currently, using ``None`` always selects ``'steepest-devex'``, but this
+        may change as new options become available.
+
+    mip_max_nodes : int
+        The maximum number of nodes allotted to solve the problem; default is
+        the largest possible value for a ``HighsInt`` on the platform.
+        Ignored if not using the MIP solver.
+    unknown_options : dict
+        Optional arguments not recognized by this wrapper. If
+        ``unknown_options`` is non-empty, a warning is issued and the
+        options are passed to HiGHS verbatim.
+
+    Returns
+    -------
+    sol : dict
+        A dictionary consisting of the fields:
+
+            x : 1D array
+                The values of the decision variables that minimize the
+                objective function while satisfying the constraints.
+            fun : float
+                The optimal value of the objective function ``c @ x``.
+            slack : 1D array
+                The (nominally positive) values of the slack,
+                ``b_ub - A_ub @ x``.
+            con : 1D array
+                The (nominally zero) residuals of the equality constraints,
+                ``b_eq - A_eq @ x``.
+            success : bool
+                ``True`` when the algorithm succeeds in finding an optimal
+                solution.
+            status : int
+                An integer representing the exit status of the algorithm.
+
+                ``0`` : Optimization terminated successfully.
+
+                ``1`` : Iteration or time limit reached.
+
+                ``2`` : Problem appears to be infeasible.
+
+                ``3`` : Problem appears to be unbounded.
+
+                ``4`` : The HiGHS solver ran into a problem.
+
+            message : str
+                A string descriptor of the exit status of the algorithm.
+            nit : int
+                The total number of iterations performed.
+                For ``solver='simplex'``, this includes iterations in all
+                phases. For ``solver='ipm'``, this does not include
+                crossover iterations.
+            crossover_nit : int
+                The number of primal/dual pushes performed during the
+                crossover routine for ``solver='ipm'``.  This is ``0``
+                for ``solver='simplex'``.
+            ineqlin : OptimizeResult
+                Solution and sensitivity information corresponding to the
+                inequality constraints, `b_ub`. A dictionary consisting of the
+                fields:
+
+                residual : np.ndarray
+                    The (nominally positive) values of the slack variables,
+                    ``b_ub - A_ub @ x``.  This quantity is also commonly
+                    referred to as "slack".
+
+                marginals : np.ndarray
+                    The sensitivity (partial derivative) of the objective
+                    function with respect to the right-hand side of the
+                    inequality constraints, `b_ub`.
+
+            eqlin : OptimizeResult
+                Solution and sensitivity information corresponding to the
+                equality constraints, `b_eq`.  A dictionary consisting of the
+                fields:
+
+                residual : np.ndarray
+                    The (nominally zero) residuals of the equality constraints,
+                    ``b_eq - A_eq @ x``.
+
+                marginals : np.ndarray
+                    The sensitivity (partial derivative) of the objective
+                    function with respect to the right-hand side of the
+                    equality constraints, `b_eq`.
+
+            lower, upper : OptimizeResult
+                Solution and sensitivity information corresponding to the
+                lower and upper bounds on decision variables, `bounds`.
+
+                residual : np.ndarray
+                    The (nominally positive) values of the quantity
+                    ``x - lb`` (lower) or ``ub - x`` (upper).
+
+                marginals : np.ndarray
+                    The sensitivity (partial derivative) of the objective
+                    function with respect to the lower and upper
+                    `bounds`.
+
+            mip_node_count : int
+                The number of subproblems or "nodes" solved by the MILP
+                solver. Only present when `integrality` is not `None`.
+
+            mip_dual_bound : float
+                The MILP solver's final estimate of the lower bound on the
+                optimal solution. Only present when `integrality` is not
+                `None`.
+
+            mip_gap : float
+                The difference between the final objective function value
+                and the final dual bound, scaled by the final objective
+                function value. Only present when `integrality` is not
+                `None`.
+
+    Notes
+    -----
+    The result fields `ineqlin`, `eqlin`, `lower`, and `upper` all contain
+    `marginals`, or partial derivatives of the objective function with respect
+    to the right-hand side of each constraint. These partial derivatives are
+    also referred to as "Lagrange multipliers", "dual values", and
+    "shadow prices". The sign convention of `marginals` is opposite that
+    of Lagrange multipliers produced by many nonlinear solvers.
+
+    References
+    ----------
+    .. [15] Harris, Paula MJ. "Pivot selection methods of the Devex LP code."
+            Mathematical programming 5.1 (1973): 1-28.
+    .. [16] Goldfarb, Donald, and John Ker Reid. "A practicable steepest-edge
+            simplex algorithm." Mathematical Programming 12.1 (1977): 361-371.
+    """
+    if unknown_options:
+        message = (f"Unrecognized options detected: {unknown_options}. "
+                   "These will be passed to HiGHS verbatim.")
+        warn(message, OptimizeWarning, stacklevel=3)
+
+    # Map options to HiGHS enum values
+    simplex_dual_edge_weight_strategy_enum = _convert_to_highs_enum(
+        simplex_dual_edge_weight_strategy,
+        'simplex_dual_edge_weight_strategy',
+        choices={'dantzig': HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_DANTZIG,
+                 'devex': HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_DEVEX,
+                 'steepest-devex': HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_CHOOSE,
+                 'steepest':
+                 HIGHS_SIMPLEX_EDGE_WEIGHT_STRATEGY_STEEPEST_EDGE,
+                 None: None})
+
+    c, A_ub, b_ub, A_eq, b_eq, bounds, x0, integrality = lp
+
+    lb, ub = bounds.T.copy()  # separate bounds; copy makes them C-contiguous
+    # highs_wrapper solves LHS <= A*x <= RHS, not equality constraints
+    with np.errstate(invalid="ignore"):
+        lhs_ub = -np.ones_like(b_ub)*np.inf  # LHS of UB constraints is -inf
+    rhs_ub = b_ub  # RHS of UB constraints is b_ub
+    lhs_eq = b_eq  # Equality constraint is inequality
+    rhs_eq = b_eq  # constraint with LHS=RHS
+    lhs = np.concatenate((lhs_ub, lhs_eq))
+    rhs = np.concatenate((rhs_ub, rhs_eq))
+
+    if issparse(A_ub) or issparse(A_eq):
+        A = vstack((A_ub, A_eq))
+    else:
+        A = np.vstack((A_ub, A_eq))
+    A = csc_matrix(A)
+
+    options = {
+        'presolve': presolve,
+        'sense': HIGHS_OBJECTIVE_SENSE_MINIMIZE,
+        'solver': solver,
+        'time_limit': time_limit,
+        'highs_debug_level': MESSAGE_LEVEL_NONE,
+        'dual_feasibility_tolerance': dual_feasibility_tolerance,
+        'ipm_optimality_tolerance': ipm_optimality_tolerance,
+        'log_to_console': disp,
+        'mip_max_nodes': mip_max_nodes,
+        'output_flag': disp,
+        'primal_feasibility_tolerance': primal_feasibility_tolerance,
+        'simplex_dual_edge_weight_strategy':
+            simplex_dual_edge_weight_strategy_enum,
+        'simplex_strategy': HIGHS_SIMPLEX_STRATEGY_DUAL,
+        'simplex_crash_strategy': HIGHS_SIMPLEX_CRASH_STRATEGY_OFF,
+        'ipm_iteration_limit': maxiter,
+        'simplex_iteration_limit': maxiter,
+        'mip_rel_gap': mip_rel_gap,
+    }
+    options.update(unknown_options)
+
+    # np.inf doesn't work; use very large constant
+    rhs = _replace_inf(rhs)
+    lhs = _replace_inf(lhs)
+    lb = _replace_inf(lb)
+    ub = _replace_inf(ub)
+
+    if integrality is None or np.sum(integrality) == 0:
+        integrality = np.empty(0)
+    else:
+        integrality = np.array(integrality)
+
+    res = _highs_wrapper(c, A.indptr, A.indices, A.data, lhs, rhs,
+                         lb, ub, integrality.astype(np.uint8), options)
+
+    # HiGHS represents constraints as lhs/rhs, so
+    # Ax + s = b => Ax = b - s
+    # and we need to split up s by A_ub and A_eq
+    if 'slack' in res:
+        slack = res['slack']
+        con = np.array(slack[len(b_ub):])
+        slack = np.array(slack[:len(b_ub)])
+    else:
+        slack, con = None, None
+
+    # Lagrange multipliers for equalities/inequalities and upper/lower bounds
+    if 'lambda' in res:
+        lamda = res['lambda']
+        marg_ineqlin = np.array(lamda[:len(b_ub)])
+        marg_eqlin = np.array(lamda[len(b_ub):])
+        marg_upper = np.array(res['marg_bnds'][1, :])
+        marg_lower = np.array(res['marg_bnds'][0, :])
+    else:
+        marg_ineqlin, marg_eqlin = None, None
+        marg_upper, marg_lower = None, None
+
+    # this needs to be updated if we start choosing the solver intelligently
+
+    # Convert to scipy-style status and message
+    highs_status = res.get('status', None)
+    highs_message = res.get('message', None)
+    status, message = _highs_to_scipy_status_message(highs_status,
+                                                     highs_message)
+
+    x = np.array(res['x']) if 'x' in res else None
+    sol = {'x': x,
+           'slack': slack,
+           'con': con,
+           'ineqlin': OptimizeResult({
+               'residual': slack,
+               'marginals': marg_ineqlin,
+           }),
+           'eqlin': OptimizeResult({
+               'residual': con,
+               'marginals': marg_eqlin,
+           }),
+           'lower': OptimizeResult({
+               'residual': None if x is None else x - lb,
+               'marginals': marg_lower,
+           }),
+           'upper': OptimizeResult({
+               'residual': None if x is None else ub - x,
+               'marginals': marg_upper
+           }),
+           'fun': res.get('fun'),
+           'status': status,
+           'success': highs_status == MODEL_STATUS_OPTIMAL,
+           'message': message,
+           'nit': res.get('simplex_nit', 0) or res.get('ipm_nit', 0),
+           'crossover_nit': res.get('crossover_nit'),
+           }
+
+    if np.any(x) and integrality is not None:
+        sol.update({
+            'mip_node_count': res.get('mip_node_count', 0),
+            'mip_dual_bound': res.get('mip_dual_bound', 0.0),
+            'mip_gap': res.get('mip_gap', 0.0),
+        })
+
+    return sol
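Reviewer note on the wrapper above: HiGHS takes every constraint as a ranged row `lhs <= A @ x <= rhs`. Inequality rows get `lhs = -inf`, equality rows get `lhs == rhs == b_eq`, and infinities are replaced by a large finite constant before the call. The per-row slack that comes back is then split at `len(b_ub)`. A minimal pure-Python sketch of that bookkeeping follows. The helper names and the value of `BIG` are assumptions for illustration; the actual `CONST_INF` value is not shown in this diff.

```python
import math

BIG = 1e30  # assumed stand-in for CONST_INF; the real value is not shown here


def to_ranged_rows(b_ub, b_eq, big=BIG):
    """Build lhs/rhs lists describing lhs <= A @ x <= rhs (hypothetical helper)."""
    lhs = [-math.inf] * len(b_ub) + list(b_eq)  # inequality rows are open below
    rhs = list(b_ub) + list(b_eq)               # equality rows have lhs == rhs

    def clip(v):
        # Replace +/-inf with a signed large finite constant.
        return math.copysign(big, v) if math.isinf(v) else v

    return [clip(v) for v in lhs], [clip(v) for v in rhs]


def split_slack(row_slack, n_ub):
    """Split per-row slack into inequality slack and equality residual."""
    return row_slack[:n_ub], row_slack[n_ub:]


lhs, rhs = to_ranged_rows(b_ub=[4.0, math.inf], b_eq=[1.0])
slack, con = split_slack([0.5, 2.0, 0.0], n_ub=2)
print(lhs, rhs, slack, con)
```

The same split index is what lets the wrapper slice the multipliers into `ineqlin` and `eqlin` marginals, since rows are stacked in the order inequalities first, equalities second.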
@@ -0,0 +1,572 @@
+"""Revised simplex method for linear programming
+
+The *revised simplex* method uses the method described in [1]_, except
+that a factorization [2]_ of the basis matrix, rather than its inverse,
+is efficiently maintained and used to solve the linear systems at each
+iteration of the algorithm.
+
+.. versionadded:: 1.3.0
+
+References
+----------
+.. [1] Bertsimas, Dimitris, and J. Tsitsiklis. "Introduction to linear
+           optimization." Athena Scientific 1 (1997): 997.
+.. [2] Bartels, Richard H. "A stabilization of the simplex method."
+            Numerische Mathematik 16.5 (1971): 414-434.
+
+"""
+# Author: Matt Haberland
+
+import numpy as np
+from numpy.linalg import LinAlgError
+
+from scipy.linalg import solve
+from ._optimize import _check_unknown_options
+from ._bglu_dense import LU
+from ._bglu_dense import BGLU as BGLU
+from ._linprog_util import _postsolve
+from ._optimize import OptimizeResult
+
+
+def _phase_one(A, b, x0, callback, postsolve_args, maxiter, tol, disp,
+               maxupdate, mast, pivot):
+    """
+    The purpose of phase one is to find an initial basic feasible solution
+    (BFS) to the original problem.
+
+    Generates an auxiliary problem with a trivial BFS and an objective that
+    minimizes infeasibility of the original problem. Solves the auxiliary
+    problem using the main simplex routine (phase two). This either yields
+    a BFS to the original problem or determines that the original problem is
+    infeasible. If feasible, phase one detects redundant rows in the original
+    constraint matrix and removes them, then chooses additional indices as
+    necessary to complete a basis/BFS for the original problem.
+    """
+
+    m, n = A.shape
+    status = 0
+
+    # generate auxiliary problem to get initial BFS
+    A, b, c, basis, x, status = _generate_auxiliary_problem(A, b, x0, tol)
+
+    if status == 6:
+        residual = c.dot(x)
+        iter_k = 0
+        return x, basis, A, b, residual, status, iter_k
+
+    # solve auxiliary problem
+    phase_one_n = n
+    iter_k = 0
+    x, basis, status, iter_k = _phase_two(c, A, x, basis, callback,
+                                          postsolve_args,
+                                          maxiter, tol, disp,
+                                          maxupdate, mast, pivot,
+                                          iter_k, phase_one_n)
+
+    # check for infeasibility
+    residual = c.dot(x)
+    if status == 0 and residual > tol:
+        status = 2
+
+    # drive artificial variables out of basis
+    # TODO: test redundant row removal better
+    # TODO: make solve more efficient with BGLU? This could take a while.
+    keep_rows = np.ones(m, dtype=bool)
+    for basis_column in basis[basis >= n]:
+        B = A[:, basis]
+        try:
+            basis_finder = np.abs(solve(B, A))  # inefficient
+            pertinent_row = np.argmax(basis_finder[:, basis_column])
+            eligible_columns = np.ones(n, dtype=bool)
+            eligible_columns[basis[basis < n]] = 0
+            eligible_column_indices = np.where(eligible_columns)[0]
+            index = np.argmax(basis_finder[:, :n]
+                              [pertinent_row, eligible_columns])
+            new_basis_column = eligible_column_indices[index]
+            if basis_finder[pertinent_row, new_basis_column] < tol:
+                keep_rows[pertinent_row] = False
+            else:
+                basis[basis == basis_column] = new_basis_column
+        except LinAlgError:
+            status = 4
+
+    # form solution to original problem
+    A = A[keep_rows, :n]
+    basis = basis[keep_rows]
+    x = x[:n]
+    m = A.shape[0]
+    return x, basis, A, b, residual, status, iter_k
+
+
+def _get_more_basis_columns(A, basis):
+    """
+    Called when the auxiliary problem terminates with artificial columns in
+    the basis, which must be removed and replaced with non-artificial
+    columns. Finds additional columns that do not make the matrix singular.
+    """
+    m, n = A.shape
+
+    # options for inclusion are those that aren't already in the basis
+    a = np.arange(m+n)
+    bl = np.zeros(len(a), dtype=bool)
+    bl[basis] = 1
+    options = a[~bl]
+    options = options[options < n]  # and they have to be non-artificial
+
+    # form basis matrix
+    B = np.zeros((m, m))
+    B[:, 0:len(basis)] = A[:, basis]
+
+    if (basis.size > 0 and
+            np.linalg.matrix_rank(B[:, :len(basis)]) < len(basis)):
+        raise Exception("Basis has dependent columns")
+
+    rank = 0  # just enter the loop
+    for i in range(n):  # somewhat arbitrary, but we need another way out
+        # permute the options, and take as many as needed
+        new_basis = np.random.permutation(options)[:m-len(basis)]
+        B[:, len(basis):] = A[:, new_basis]  # update the basis matrix
+        rank = np.linalg.matrix_rank(B)      # check the rank
+        if rank == m:
+            break
+
+    return np.concatenate((basis, new_basis))
+
+
+def _generate_auxiliary_problem(A, b, x0, tol):
+    """
+    Modifies original problem to create an auxiliary problem with a trivial
+    initial basic feasible solution and an objective that minimizes
+    infeasibility in the original problem.
+
+    Conceptually, this is done by stacking an identity matrix on the right of
+    the original constraint matrix, adding artificial variables to correspond
+    with each of these new columns, and generating a cost vector that is all
+    zeros except for ones corresponding with each of the new variables.
+
+    An initial basic feasible solution is trivial: all variables are zero
+    except for the artificial variables, which are set equal to the
+    corresponding element of the right hand side `b`.
+
+    Running the simplex method on this auxiliary problem drives all of the
+    artificial variables - and thus the cost - to zero if the original problem
+    is feasible. The original problem is declared infeasible otherwise.
+
+    Much of the complexity below is to improve efficiency by using singleton
+    columns in the original problem where possible, thus generating artificial
+    variables only as necessary, and using an initial 'guess' basic feasible
+    solution.
+    """
+    status = 0
+    m, n = A.shape
+
+    if x0 is not None:
+        x = x0
+    else:
+        x = np.zeros(n)
+
+    r = b - A@x  # residual; this must be all zeros for feasibility
+
+    A[r < 0] = -A[r < 0]  # express problem with RHS positive for trivial BFS
+    b[r < 0] = -b[r < 0]  # to the auxiliary problem
+    r[r < 0] *= -1
+
+    # Rows which we will need to find a trivial way to zero.
+    # This should just be the rows where there is a nonzero residual.
+    # But then we would not necessarily have a column singleton in every row.
+    # This makes it difficult to find an initial basis.
+    if x0 is None:
+        nonzero_constraints = np.arange(m)
+    else:
+        nonzero_constraints = np.where(r > tol)[0]
+
+    # these are (at least some of) the initial basis columns
+    basis = np.where(np.abs(x) > tol)[0]
+
+    if len(nonzero_constraints) == 0 and len(basis) <= m:  # already a BFS
+        c = np.zeros(n)
+        basis = _get_more_basis_columns(A, basis)
+        return A, b, c, basis, x, status
+    elif (len(nonzero_constraints) > m - len(basis) or
+          np.any(x < 0)):  # can't get trivial BFS
+        c = np.zeros(n)
+        status = 6
+        return A, b, c, basis, x, status
+
+    # chooses existing columns appropriate for inclusion in initial basis
+    cols, rows = _select_singleton_columns(A, r)
+
+    # find the rows we need to zero that we _can_ zero with column singletons
+    i_tofix = np.isin(rows, nonzero_constraints)
+    # these columns can't already be in the basis, though
+    # we are going to add them to the basis and change the corresponding x val
+    i_notinbasis = np.logical_not(np.isin(cols, basis))
+    i_fix_without_aux = np.logical_and(i_tofix, i_notinbasis)
+    rows = rows[i_fix_without_aux]
+    cols = cols[i_fix_without_aux]
+
+    # indices of the rows we can only zero with auxiliary variable
+    # these rows will get a one in each auxiliary column
+    arows = nonzero_constraints[np.logical_not(
+                                np.isin(nonzero_constraints, rows))]
+    n_aux = len(arows)
+    acols = n + np.arange(n_aux)          # indices of auxiliary columns
+
+    basis_ng = np.concatenate((cols, acols))   # basis columns not from guess
+    basis_ng_rows = np.concatenate((rows, arows))  # rows we need to zero
+
+    # add auxiliary singleton columns
+    A = np.hstack((A, np.zeros((m, n_aux))))
+    A[arows, acols] = 1
+
+    # generate initial BFS
+    x = np.concatenate((x, np.zeros(n_aux)))
+    x[basis_ng] = r[basis_ng_rows]/A[basis_ng_rows, basis_ng]
+
+    # generate costs to minimize infeasibility
+    c = np.zeros(n_aux + n)
+    c[acols] = 1
+
+    # basis columns correspond with nonzeros in guess, those with column
+    # singletons we used to zero remaining constraints, and any additional
+    # columns to get a full set (m columns)
+    basis = np.concatenate((basis, basis_ng))
+    basis = _get_more_basis_columns(A, basis)  # add columns as needed
+
+    return A, b, c, basis, x, status
+
+
+def _select_singleton_columns(A, b):
+    """
+    Finds singleton columns for which the singleton entry is of the same sign
+    as the right-hand side; these columns are eligible for inclusion in an
+    initial basis. Determines the rows in which the singleton entries are
+    located. For each of these rows, returns the indices of the one singleton
+    column and its corresponding row.
+    """
+    # find indices of all singleton columns and corresponding row indices
+    column_indices = np.nonzero(np.sum(np.abs(A) != 0, axis=0) == 1)[0]
+    columns = A[:, column_indices]          # array of singleton columns
+    row_indices = np.zeros(len(column_indices), dtype=int)
+    nonzero_rows, nonzero_columns = np.nonzero(columns)
+    row_indices[nonzero_columns] = nonzero_rows   # corresponding row indices
+
+    # keep only singletons with entries that have same sign as RHS
+    # this is necessary because all elements of BFS must be non-negative
+    same_sign = A[row_indices, column_indices]*b[row_indices] >= 0
+    column_indices = column_indices[same_sign][::-1]
+    row_indices = row_indices[same_sign][::-1]
+    # Reversing the order so that steps below select rightmost columns
+    # for initial basis, which will tend to be slack variables. (If the
+    # guess corresponds with a basic feasible solution but a constraint
+    # is not satisfied with the corresponding slack variable zero, the slack
+    # variable must be basic.)
+
+    # for each row, keep rightmost singleton column with an entry in that row
+    unique_row_indices, first_columns = np.unique(row_indices,
+                                                  return_index=True)
+    return column_indices[first_columns], unique_row_indices
+
+
+def _find_nonzero_rows(A, tol):
+    """
+    Returns logical array indicating the locations of rows with at least
+    one nonzero element.
+    """
+    return np.any(np.abs(A) > tol, axis=1)
+
+
+def _select_enter_pivot(c_hat, bl, a, rule="bland", tol=1e-12):
+    """
+    Selects a pivot to enter the basis. Currently Bland's rule - the smallest
+    index that has a negative reduced cost - is the default.
+    """
+    if rule.lower() == "mrc":  # index with minimum reduced cost
+        return a[~bl][np.argmin(c_hat)]
+    else:  # smallest index w/ negative reduced cost
+        return a[~bl][c_hat < -tol][0]
+
+
+def _display_iter(phase, iteration, slack, con, fun):
+    """
+    Print indicators of optimization status to the console.
+    """
+    header = iteration % 20 == 0
+
+    if header:
+        print("Phase",
+              "Iteration",
+              "Minimum Slack      ",
+              "Constraint Residual",
+              "Objective          ")
+
+    # :<X.Y left aligns Y digits in X digit spaces
+    fmt = '{0:<6}{1:<10}{2:<20.13}{3:<20.13}{4:<20.13}'
+    try:
+        slack = np.min(slack)
+    except ValueError:
+        slack = "NA"
+    print(fmt.format(phase, iteration, slack, np.linalg.norm(con), fun))
+
+
+def _display_and_callback(phase_one_n, x, postsolve_args, status,
+                          iteration, disp, callback):
+    if phase_one_n is not None:
+        phase = 1
+        x_postsolve = x[:phase_one_n]
+    else:
+        phase = 2
+        x_postsolve = x
+    x_o, fun, slack, con = _postsolve(x_postsolve,
+                                      postsolve_args)
+
+    if callback is not None:
+        res = OptimizeResult({'x': x_o, 'fun': fun, 'slack': slack,
+                              'con': con, 'nit': iteration,
+                              'phase': phase, 'complete': False,
+                              'status': status, 'message': "",
+                              'success': False})
+        callback(res)
+    if disp:
+        _display_iter(phase, iteration, slack, con, fun)
+
+
+def _phase_two(c, A, x, b, callback, postsolve_args, maxiter, tol, disp,
+               maxupdate, mast, pivot, iteration=0, phase_one_n=None):
+    """
+    The heart of the simplex method. Beginning with a basic feasible solution,
+    moves to adjacent basic feasible solutions with successively lower
+    cost. Terminates when no adjacent basic feasible solution has lower
+    cost or if the problem is determined to be unbounded.
+
+    This implementation follows the revised simplex method based on LU
+    decomposition. Rather than maintaining a tableau or an inverse of the
+    basis matrix, we keep a factorization of the basis matrix that allows
+    efficient solution of linear systems while avoiding stability issues
+    associated with inverted matrices.
+    """
+    m, n = A.shape
+    status = 0
+    a = np.arange(n)                    # indices of columns of A
+    ab = np.arange(m)                   # indices of columns of B
+    if maxupdate:
+        # basis matrix factorization object; similar to B = A[:, b]
+        B = BGLU(A, b, maxupdate, mast)
+    else:
+        B = LU(A, b)
+
+    for iteration in range(iteration, maxiter):
+
+        if disp or callback is not None:
+            _display_and_callback(phase_one_n, x, postsolve_args, status,
+                                  iteration, disp, callback)
+
+        bl = np.zeros(len(a), dtype=bool)
+        bl[b] = 1
+
+        xb = x[b]       # basic variables
+        cb = c[b]       # basic costs
+
+        try:
+            v = B.solve(cb, transposed=True)    # similar to v = solve(B.T, cb)
+        except LinAlgError:
+            status = 4
+            break
+
+        # TODO: cythonize?
+        c_hat = c - v.dot(A)    # reduced cost
+        c_hat = c_hat[~bl]
+        # Above is much faster than:
+        # N = A[:, ~bl]                 # slow!
+        # c_hat = c[~bl] - v.T.dot(N)
+        # Can we perform the multiplication only on the nonbasic columns?
+
+        if np.all(c_hat >= -tol):  # all reduced costs >= -tol -> terminate
+            break
+
+        j = _select_enter_pivot(c_hat, bl, a, rule=pivot, tol=tol)
+        u = B.solve(A[:, j])        # similar to u = solve(B, A[:, j])
+
+        i = u > tol                 # if none of the u are positive, unbounded
+        if not np.any(i):
+            status = 3
+            break
+
+        th = xb[i]/u[i]
+        l = np.argmin(th)           # implicitly selects smallest subscript
+        th_star = th[l]             # step size
+
+        x[b] = x[b] - th_star*u     # take step
+        x[j] = th_star
+        B.update(ab[i][l], j)       # modify basis
+        b = B.b                     # similar to b[ab[i][l]] = j
+
+    else:
+        # If the end of the for loop is reached (without a break statement),
+        # then another step has been taken, so the iteration counter should
+        # increment, info should be displayed, and callback should be called.
+        iteration += 1
+        status = 1
+        if disp or callback is not None:
+            _display_and_callback(phase_one_n, x, postsolve_args, status,
+                                  iteration, disp, callback)
+
+    return x, b, status, iteration
+
+
+def _linprog_rs(c, c0, A, b, x0, callback, postsolve_args,
+                maxiter=5000, tol=1e-12, disp=False,
+                maxupdate=10, mast=False, pivot="mrc",
+                **unknown_options):
+    """
+    Solve the following linear programming problem via a two-phase
+    revised simplex algorithm.::
+
+        minimize:     c @ x
+
+        subject to:  A @ x == b
+                     0 <= x < oo
+
+    User-facing documentation is in _linprog_doc.py.
+
+    Parameters
+    ----------
+    c : 1-D array
+        Coefficients of the linear objective function to be minimized.
+    c0 : float
+        Constant term in objective function due to fixed (and eliminated)
+        variables. (Currently unused.)
+    A : 2-D array
+        2-D array which, when matrix-multiplied by ``x``, gives the values of
+        the equality constraints at ``x``.
+    b : 1-D array
+        1-D array of values representing the RHS of each equality constraint
+        (row) in ``A``.
+    x0 : 1-D array, optional
+        Starting values of the independent variables, which will be refined by
+        the optimization algorithm. For the revised simplex method, these must
+        correspond with a basic feasible solution.
+    callback : callable, optional
+        If a callback function is provided, it will be called within each
+        iteration of the algorithm. The callback function must accept a single
+        `scipy.optimize.OptimizeResult` consisting of the following fields:
+
+            x : 1-D array
+                Current solution vector.
+            fun : float
+                Current value of the objective function ``c @ x``.
+            success : bool
+                True only when an algorithm has completed successfully,
+                so this is always False as the callback function is called
+                only while the algorithm is still iterating.
+            slack : 1-D array
+                The values of the slack variables. Each slack variable
+                corresponds to an inequality constraint. If the slack is zero,
+                the corresponding constraint is active.
+            con : 1-D array
+                The (nominally zero) residuals of the equality constraints,
+                that is, ``b - A_eq @ x``.
+            phase : int
+                The phase of the algorithm being executed.
+            status : int
+                For revised simplex, this is always 0 because if a different
+                status is detected, the algorithm terminates.
+            nit : int
+                The number of iterations performed.
+            message : str
+                A string descriptor of the exit status of the optimization.
+    postsolve_args : tuple
+        Data needed by _postsolve to convert the solution to the standard-form
+        problem into the solution to the original problem.
+
+    Options
+    -------
+    maxiter : int
+        The maximum number of iterations to perform in either phase.
+    tol : float
+        The tolerance which determines when a solution is "close enough" to
+        zero in Phase 1 to be considered a basic feasible solution or close
+        enough to positive to serve as an optimal solution.
+    disp : bool
+        Set to ``True`` if indicators of optimization status are to be printed
+        to the console each iteration.
+    maxupdate : int
+        The maximum number of updates performed on the LU factorization.
+        After this many updates is reached, the basis matrix is factorized
+        from scratch.
+    mast : bool
+        Minimize Amortized Solve Time. If enabled, the average time to solve
+        a linear system using the basis factorization is measured. Typically,
+        the average solve time will decrease with each successive solve after
+        initial factorization, as factorization takes much more time than the
+        solve operation (and updates). Eventually, however, the updated
+        factorization becomes sufficiently complex that the average solve time
+        begins to increase. When this is detected, the basis is refactorized
+        from scratch. Enable this option to maximize speed at the risk of
+        nondeterministic behavior. Ignored if ``maxupdate`` is 0.
+    pivot : "mrc" or "bland"
+        Pivot rule: Minimum Reduced Cost (default) or Bland's rule. Choose
+        Bland's rule if iteration limit is reached and cycling is suspected.
+    unknown_options : dict
+        Optional arguments not used by this particular solver. If
+        `unknown_options` is non-empty a warning is issued listing all
+        unused options.
+
+    Returns
+    -------
+    x : 1-D array
+        Solution vector.
+    status : int
+        An integer representing the exit status of the optimization::
+
+         0 : Optimization terminated successfully
+         1 : Iteration limit reached
+         2 : Problem appears to be infeasible
+         3 : Problem appears to be unbounded
+         4 : Numerical difficulties encountered
+         5 : No constraints; turn presolve on
+         6 : Guess x0 cannot be converted to a basic feasible solution
+
+    message : str
+        A string descriptor of the exit status of the optimization.
+    iteration : int
+        The number of iterations taken to solve the problem.
+    """
+
+    _check_unknown_options(unknown_options)
+
+    messages = ["Optimization terminated successfully.",
+                "Iteration limit reached.",
+                "The problem appears infeasible, as the phase one auxiliary "
+                "problem terminated successfully with a residual of {0:.1e}, "
+                "greater than the tolerance {1} required for the solution to "
+                "be considered feasible. Consider increasing the tolerance to "
+                "be greater than {0:.1e}. If this tolerance is unacceptably "
+                "large, the problem is likely infeasible.",
+                "The problem is unbounded, as the simplex algorithm found "
+                "a basic feasible solution from which there is a direction "
+                "with negative reduced cost in which all decision variables "
+                "increase.",
+                "Numerical difficulties encountered; consider trying "
+                "method='interior-point'.",
+                "Problems with no constraints are trivially solved; please "
+                "turn presolve on.",
+                "The guess x0 cannot be converted to a basic feasible "
+                "solution. "
+                ]
+
+    if A.size == 0:  # address test_unbounded_below_no_presolve_corrected
+        return np.zeros(c.shape), 5, messages[5], 0
+
+    x, basis, A, b, residual, status, iteration = (
+        _phase_one(A, b, x0, callback, postsolve_args,
+                   maxiter, tol, disp, maxupdate, mast, pivot))
+
+    if status == 0:
+        x, basis, status, iteration = _phase_two(c, A, x, basis, callback,
+                                                 postsolve_args,
+                                                 maxiter, tol, disp,
+                                                 maxupdate, mast, pivot,
+                                                 iteration)
+
+    return x, status, messages[status].format(residual, tol), iteration
@@ -0,0 +1,661 @@
+"""Simplex method for linear programming
+
+The *simplex* method uses a traditional, full-tableau implementation of
+Dantzig's simplex algorithm [1]_, [2]_ (*not* the Nelder-Mead simplex).
+This algorithm is included for backwards compatibility and educational
+purposes.
+
+    .. versionadded:: 0.15.0
+
+Warnings
+--------
+
+The simplex method may encounter numerical difficulties when pivot
+values are close to the specified tolerance. If this occurs, try
+removing any redundant constraints, changing the pivot strategy to
+Bland's rule, or increasing the tolerance value.
+
+Alternatively, more robust methods may be used. See
+:ref:`'interior-point' <optimize.linprog-interior-point>` and
+:ref:`'revised simplex' <optimize.linprog-revised_simplex>`.
+
+References
+----------
+.. [1] Dantzig, George B., Linear programming and extensions. Rand
+       Corporation Research Study Princeton Univ. Press, Princeton, NJ,
+       1963
+.. [2] Hillier, S.H. and Lieberman, G.J. (1995), "Introduction to
+       Mathematical Programming", McGraw-Hill, Chapter 4.
+"""
+
+import numpy as np
+from warnings import warn
+from ._optimize import OptimizeResult, OptimizeWarning, _check_unknown_options
+from ._linprog_util import _postsolve
+
+
+def _pivot_col(T, tol=1e-9, bland=False):
+    """
+    Given a linear programming simplex tableau, determine the column
+    of the variable to enter the basis.
+
+    Parameters
+    ----------
+    T : 2-D array
+        A 2-D array representing the simplex tableau, T, corresponding to the
+        linear programming problem. It should have the form:
+
+        [[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
+         [A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
+         .
+         .
+         .
+         [A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
+         [c[0],   c[1], ...,   c[n_total],    0]]
+
+        for a Phase 2 problem, or the form:
+
+        [[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
+         [A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
+         .
+         .
+         .
+         [A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
+         [c[0],   c[1], ...,   c[n_total],   0],
+         [c'[0],  c'[1], ...,  c'[n_total],  0]]
+
+         for a Phase 1 problem (a problem in which a basic feasible solution is
+         sought prior to minimizing the actual objective). ``T`` is modified in
+         place by ``_solve_simplex``.
+    tol : float
+        Elements in the objective row larger than -tol will not be considered
+        for pivoting. Nominally this value is zero, but numerical issues
+        cause a tolerance about zero to be necessary.
+    bland : bool
+        If True, use Bland's rule for selection of the column (select the
+        first column with a negative coefficient in the objective row,
+        regardless of magnitude).
+
+    Returns
+    -------
+    status : bool
+        True if a suitable pivot column was found, otherwise False.
+        A return of False indicates that the linear programming simplex
+        algorithm is complete.
+    col : int
+        The index of the column of the pivot element.
+        If status is False, col will be returned as nan.
+    """
+    ma = np.ma.masked_where(T[-1, :-1] >= -tol, T[-1, :-1], copy=False)
+    if ma.count() == 0:
+        return False, np.nan
+    if bland:
+        # ma.mask is sometimes 0d
+        return True, np.nonzero(np.logical_not(np.atleast_1d(ma.mask)))[0][0]
+    return True, np.ma.nonzero(ma == ma.min())[0][0]
+
+
+def _pivot_row(T, basis, pivcol, phase, tol=1e-9, bland=False):
+    """
+    Given a linear programming simplex tableau, determine the row for the
+    pivot operation.
+
+    Parameters
+    ----------
+    T : 2-D array
+        A 2-D array representing the simplex tableau, T, corresponding to the
+        linear programming problem. It should have the form:
+
+        [[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
+         [A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
+         .
+         .
+         .
+         [A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
+         [c[0],   c[1], ...,   c[n_total],    0]]
+
+        for a Phase 2 problem, or the form:
+
+        [[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
+         [A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
+         .
+         .
+         .
+         [A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
+         [c[0],   c[1], ...,   c[n_total],   0],
+         [c'[0],  c'[1], ...,  c'[n_total],  0]]
+
+         for a Phase 1 problem (a problem in which a basic feasible solution is
+         sought prior to minimizing the actual objective). ``T`` is modified in
+         place by ``_solve_simplex``.
+    basis : array
+        A list of the current basic variables.
+    pivcol : int
+        The index of the pivot column.
+    phase : int
+        The phase of the simplex algorithm (1 or 2).
+    tol : float
+        Elements in the pivot column smaller than tol will not be considered
+        for pivoting. Nominally this value is zero, but numerical issues
+        cause a tolerance about zero to be necessary.
+    bland : bool
+        If True, use Bland's rule for selection of the row (if more than one
+        row can be used, choose the one with the lowest variable index).
+
+    Returns
+    -------
+    status : bool
+        True if a suitable pivot row was found, otherwise False. A return
+        of False indicates that the linear programming problem is unbounded.
+    row : int
+        The index of the row of the pivot element. If status is False, row
+        will be returned as nan.
+    """
+    if phase == 1:
+        k = 2
+    else:
+        k = 1
+    ma = np.ma.masked_where(T[:-k, pivcol] <= tol, T[:-k, pivcol], copy=False)
+    if ma.count() == 0:
+        return False, np.nan
+    mb = np.ma.masked_where(T[:-k, pivcol] <= tol, T[:-k, -1], copy=False)
+    q = mb / ma
+    min_rows = np.ma.nonzero(q == q.min())[0]
+    if bland:
+        return True, min_rows[np.argmin(np.take(basis, min_rows))]
+    return True, min_rows[0]
+
+
+def _apply_pivot(T, basis, pivrow, pivcol, tol=1e-9):
+    """
+    Pivot the simplex tableau in place on the element given by
+    (pivrow, pivcol). The entering variable corresponds to the column given
+    by pivcol, forcing the variable basis[pivrow] to leave the basis.
+
+    Parameters
+    ----------
+    T : 2-D array
+        A 2-D array representing the simplex tableau, T, corresponding to the
+        linear programming problem. It should have the form:
+
+        [[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
+         [A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
+         .
+         .
+         .
+         [A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
+         [c[0],   c[1], ...,   c[n_total],    0]]
+
+        for a Phase 2 problem, or the form:
+
+        [[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
+         [A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
+         .
+         .
+         .
+         [A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
+         [c[0],   c[1], ...,   c[n_total],   0],
+         [c'[0],  c'[1], ...,  c'[n_total],  0]]
+
+         for a Phase 1 problem (a problem in which a basic feasible solution is
+         sought prior to minimizing the actual objective). ``T`` is modified in
+         place by ``_solve_simplex``.
+    basis : 1-D array
+        An array of the indices of the basic variables, such that basis[i]
+        contains the column corresponding to the basic variable for row i.
+        Basis is modified in place by _apply_pivot.
+    pivrow : int
+        Row index of the pivot.
+    pivcol : int
+        Column index of the pivot.
+    """
+    basis[pivrow] = pivcol
+    pivval = T[pivrow, pivcol]
+    T[pivrow] = T[pivrow] / pivval
+    for irow in range(T.shape[0]):
+        if irow != pivrow:
+            T[irow] = T[irow] - T[pivrow] * T[irow, pivcol]
+
+    # The selected pivot should never lead to a pivot value less than the tol.
+    if np.isclose(pivval, tol, atol=0, rtol=1e4):
+        message = (
+            f"The pivot operation produces a pivot value of:{pivval: .1e}, "
+            "which is only slightly greater than the specified "
+            f"tolerance{tol: .1e}. This may lead to issues regarding the "
+            "numerical stability of the simplex method. "
+            "Removing redundant constraints, changing the pivot strategy "
+            "via Bland's rule or increasing the tolerance may "
+            "help reduce the issue.")
+        warn(message, OptimizeWarning, stacklevel=5)
+
+
+def _solve_simplex(T, n, basis, callback, postsolve_args,
+                   maxiter=1000, tol=1e-9, phase=2, bland=False, nit0=0,
+                   ):
+    """
+    Solve a linear programming problem in "standard form" using the Simplex
+    Method. Linear Programming is intended to solve the following problem form:
+
+    Minimize::
+
+        c @ x
+
+    Subject to::
+
+        A @ x == b
+            x >= 0
+
+    Parameters
+    ----------
+    T : 2-D array
+        A 2-D array representing the simplex tableau, T, corresponding to the
+        linear programming problem. It should have the form:
+
+        [[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
+         [A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
+         .
+         .
+         .
+         [A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
+         [c[0],   c[1], ...,   c[n_total],    0]]
+
+        for a Phase 2 problem, or the form:
+
+        [[A[0, 0], A[0, 1], ..., A[0, n_total], b[0]],
+         [A[1, 0], A[1, 1], ..., A[1, n_total], b[1]],
+         .
+         .
+         .
+         [A[m, 0], A[m, 1], ..., A[m, n_total], b[m]],
+         [c[0],   c[1], ...,   c[n_total],   0],
+         [c'[0],  c'[1], ...,  c'[n_total],  0]]
+
+         for a Phase 1 problem (a problem in which a basic feasible solution is
+         sought prior to minimizing the actual objective). ``T`` is modified in
+         place by ``_solve_simplex``.
+    n : int
+        The number of true variables in the problem.
+    basis : 1-D array
+        An array of the indices of the basic variables, such that basis[i]
+        contains the column corresponding to the basic variable for row i.
+        Basis is modified in place by _solve_simplex.
+    callback : callable, optional
+        If a callback function is provided, it will be called within each
+        iteration of the algorithm. The callback must accept a
+        `scipy.optimize.OptimizeResult` consisting of the following fields:
+
+            x : 1-D array
+                Current solution vector
+            fun : float
+                Current value of the objective function
+            success : bool
+                True only when a phase has completed successfully. This
+                will be False for most iterations.
+            slack : 1-D array
+                The values of the slack variables. Each slack variable
+                corresponds to an inequality constraint. If the slack is zero,
+                the corresponding constraint is active.
+            con : 1-D array
+                The (nominally zero) residuals of the equality constraints,
+                that is, ``b - A_eq @ x``
+            phase : int
+                The phase of the optimization being executed. In phase 1, a
+                basic feasible solution is sought and the tableau ``T`` has an
+                additional row representing an alternate objective function.
+            status : int
+                An integer representing the exit status of the optimization::
+
+                     0 : Optimization terminated successfully
+                     1 : Iteration limit reached
+                     2 : Problem appears to be infeasible
+                     3 : Problem appears to be unbounded
+                     4 : Serious numerical difficulties encountered
+
+            nit : int
+                The number of iterations performed.
+            message : str
+                A string descriptor of the exit status of the optimization.
+    postsolve_args : tuple
+        Data needed by _postsolve to convert the solution to the standard-form
+        problem into the solution to the original problem.
+    maxiter : int
+        The maximum number of iterations to perform before aborting the
+        optimization.
+    tol : float
+        The tolerance which determines when a solution is "close enough" to
+        zero in Phase 1 to be considered a basic feasible solution or close
+        enough to positive to serve as an optimal solution.
+    phase : int
+        The phase of the optimization being executed. In phase 1, a basic
+        feasible solution is sought and the tableau ``T`` has an additional
+        row representing an alternate objective function.
+    bland : bool
+        If True, choose pivots using Bland's rule [3]_. In problems which
+        fail to converge due to cycling, using Bland's rule can provide
+        convergence at the expense of a less optimal path about the simplex.
+    nit0 : int
+        The initial iteration number used to keep an accurate iteration total
+        in a two-phase problem.
+
+    Returns
+    -------
+    nit : int
+        The number of iterations. Used to keep an accurate iteration total
+        in the two-phase problem.
+    status : int
+        An integer representing the exit status of the optimization::
+
+         0 : Optimization terminated successfully
+         1 : Iteration limit reached
+         2 : Problem appears to be infeasible
+         3 : Problem appears to be unbounded
+         4 : Serious numerical difficulties encountered
+
+    """
+    nit = nit0
+    status = 0
+    message = ''
+    complete = False
+
+    if phase == 1:
+        m = T.shape[1]-2
+    elif phase == 2:
+        m = T.shape[1]-1
+    else:
+        raise ValueError("Argument 'phase' to _solve_simplex must be 1 or 2")
+
+    if phase == 2:
+        # Check if any artificial variables are still in the basis.
+        # If yes, check if any coefficient from this row and a column
+        # corresponding to one of the non-artificial variables is non-zero.
+        # If found, pivot at this term. If not, start phase 2.
+        # Do this for all artificial variables in the basis.
+        # Ref: "An Introduction to Linear Programming and Game Theory"
+        # by Paul R. Thie, Gerard E. Keough, 3rd Ed,
+        # Chapter 3.7 Redundant Systems (p. 102)
+        for pivrow in [row for row in range(basis.size)
+                       if basis[row] > T.shape[1] - 2]:
+            non_zero_row = [col for col in range(T.shape[1] - 1)
+                            if abs(T[pivrow, col]) > tol]
+            if len(non_zero_row) > 0:
+                pivcol = non_zero_row[0]
+                _apply_pivot(T, basis, pivrow, pivcol, tol)
+                nit += 1
+
+    if len(basis[:m]) == 0:
+        solution = np.empty(T.shape[1] - 1, dtype=np.float64)
+    else:
+        solution = np.empty(max(T.shape[1] - 1, max(basis[:m]) + 1),
+                            dtype=np.float64)
+
+    while not complete:
+        # Find the pivot column
+        pivcol_found, pivcol = _pivot_col(T, tol, bland)
+        if not pivcol_found:
+            pivcol = np.nan
+            pivrow = np.nan
+            status = 0
+            complete = True
+        else:
+            # Find the pivot row
+            pivrow_found, pivrow = _pivot_row(T, basis, pivcol, phase, tol, bland)
+            if not pivrow_found:
+                status = 3
+                complete = True
+
+        if callback is not None:
+            solution[:] = 0
+            solution[basis[:n]] = T[:n, -1]
+            x = solution[:m]
+            x, fun, slack, con = _postsolve(
+                x, postsolve_args
+            )
+            res = OptimizeResult({
+                'x': x,
+                'fun': fun,
+                'slack': slack,
+                'con': con,
+                'status': status,
+                'message': message,
+                'nit': nit,
+                'success': status == 0 and complete,
+                'phase': phase,
+                'complete': complete,
+                })
+            callback(res)
+
+        if not complete:
+            if nit >= maxiter:
+                # Iteration limit exceeded
+                status = 1
+                complete = True
+            else:
+                _apply_pivot(T, basis, pivrow, pivcol, tol)
+                nit += 1
+    return nit, status
+
+
+def _linprog_simplex(c, c0, A, b, callback, postsolve_args,
+                     maxiter=1000, tol=1e-9, disp=False, bland=False,
+                     **unknown_options):
+    """
+    Minimize a linear objective function subject to linear equality and
+    non-negativity constraints using the two phase simplex method.
+    Linear programming is intended to solve problems of the following form:
+
+    Minimize::
+
+        c @ x
+
+    Subject to::
+
+        A @ x == b
+            x >= 0
+
+    User-facing documentation is in _linprog_doc.py.
+
+    Parameters
+    ----------
+    c : 1-D array
+        Coefficients of the linear objective function to be minimized.
+    c0 : float
+        Constant term in objective function due to fixed (and eliminated)
+        variables. (Purely for display.)
+    A : 2-D array
+        2-D array such that ``A @ x`` gives the values of the equality
+        constraints at ``x``.
+    b : 1-D array
+        1-D array of values representing the right hand side of each equality
+        constraint (row) in ``A``.
+    callback : callable, optional
+        If a callback function is provided, it will be called within each
+        iteration of the algorithm. The callback function must accept a single
+        `scipy.optimize.OptimizeResult` consisting of the following fields:
+
+            x : 1-D array
+                Current solution vector
+            fun : float
+                Current value of the objective function
+            success : bool
+                True when an algorithm has completed successfully.
+            slack : 1-D array
+                The values of the slack variables. Each slack variable
+                corresponds to an inequality constraint. If the slack is zero,
+                the corresponding constraint is active.
+            con : 1-D array
+                The (nominally zero) residuals of the equality constraints,
+                that is, ``b - A_eq @ x``
+            phase : int
+                The phase of the algorithm being executed.
+            status : int
+                An integer representing the status of the optimization::
+
+                     0 : Algorithm proceeding nominally
+                     1 : Iteration limit reached
+                     2 : Problem appears to be infeasible
+                     3 : Problem appears to be unbounded
+                     4 : Serious numerical difficulties encountered
+            nit : int
+                The number of iterations performed.
+            message : str
+                A string descriptor of the exit status of the optimization.
+    postsolve_args : tuple
+        Data needed by _postsolve to convert the solution to the standard-form
+        problem into the solution to the original problem.
+
+    Options
+    -------
+    maxiter : int
+        The maximum number of iterations to perform.
+    disp : bool
+        If True, print exit status message to sys.stdout.
+    tol : float
+        The tolerance which determines when a solution is "close enough" to
+        zero in Phase 1 to be considered a basic feasible solution or close
+        enough to positive to serve as an optimal solution.
+    bland : bool
+        If True, use Bland's anti-cycling rule [3]_ to choose pivots to
+        prevent cycling. If False, choose pivots which should lead to a
+        converged solution more quickly. The latter method is subject to
+        cycling (non-convergence) in rare instances.
+    unknown_options : dict
+        Optional arguments not used by this particular solver. If
+        `unknown_options` is non-empty a warning is issued listing all
+        unused options.
+
+    Returns
+    -------
+    x : 1-D array
+        Solution vector.
+    status : int
+        An integer representing the exit status of the optimization::
+
+         0 : Optimization terminated successfully
+         1 : Iteration limit reached
+         2 : Problem appears to be infeasible
+         3 : Problem appears to be unbounded
+         4 : Serious numerical difficulties encountered
+
+    message : str
+        A string descriptor of the exit status of the optimization.
+    iteration : int
+        The number of iterations taken to solve the problem.
+
+    References
+    ----------
+    .. [1] Dantzig, George B., Linear programming and extensions. Rand
+           Corporation Research Study Princeton Univ. Press, Princeton, NJ,
+           1963
+    .. [2] Hillier, S.H. and Lieberman, G.J. (1995), "Introduction to
+           Mathematical Programming", McGraw-Hill, Chapter 4.
+    .. [3] Bland, Robert G. New finite pivoting rules for the simplex method.
+           Mathematics of Operations Research (2), 1977: pp. 103-107.
+
+
+    Notes
+    -----
+    The expected problem formulation differs between the top level ``linprog``
+    module and the method specific solvers. The method specific solvers expect a
+    problem in standard form:
+
+    Minimize::
+
+        c @ x
+
+    Subject to::
+
+        A @ x == b
+            x >= 0
+
+    Whereas the top level ``linprog`` module expects a problem of form:
+
+    Minimize::
+
+        c @ x
+
+    Subject to::
+
+        A_ub @ x <= b_ub
+        A_eq @ x == b_eq
+         lb <= x <= ub
+
+    where ``lb = 0`` and ``ub = None`` unless set in ``bounds``.
+
+    The original problem contains equality, upper-bound and variable constraints
+    whereas the method specific solver requires equality constraints and
+    variable non-negativity.
+
+    The ``linprog`` module converts the original problem to standard form by
+    converting the simple bounds to upper bound constraints, introducing
+    non-negative slack variables for inequality constraints, and expressing
+    unbounded variables as the difference between two non-negative variables.
+    """
+    _check_unknown_options(unknown_options)
+
+    status = 0
+    messages = {0: "Optimization terminated successfully.",
+                1: "Iteration limit reached.",
+                2: "Optimization failed. Unable to find a feasible"
+                   " starting point.",
+                3: "Optimization failed. The problem appears to be unbounded.",
+                4: "Optimization failed. Singular matrix encountered."}
+
+    n, m = A.shape
+
+    # All constraints must have b >= 0.
+    is_negative_constraint = np.less(b, 0)
+    A[is_negative_constraint] *= -1
+    b[is_negative_constraint] *= -1
+
+    # As all constraints are equality constraints the artificial variables
+    # will also be basic variables.
+    av = np.arange(n) + m
+    basis = av.copy()
+
+    # Format the phase one tableau by adding artificial variables and stacking
+    # the constraints, the objective row and pseudo-objective row.
+    row_constraints = np.hstack((A, np.eye(n), b[:, np.newaxis]))
+    row_objective = np.hstack((c, np.zeros(n), c0))
+    row_pseudo_objective = -row_constraints.sum(axis=0)
+    row_pseudo_objective[av] = 0
+    T = np.vstack((row_constraints, row_objective, row_pseudo_objective))
+
+    nit1, status = _solve_simplex(T, n, basis, callback=callback,
+                                  postsolve_args=postsolve_args,
+                                  maxiter=maxiter, tol=tol, phase=1,
+                                  bland=bland
+                                  )
+    # if pseudo objective is zero, remove the last row from the tableau and
+    # proceed to phase 2
+    nit2 = nit1
+    if abs(T[-1, -1]) < tol:
+        # Remove the pseudo-objective row from the tableau
+        T = T[:-1, :]
+        # Remove the artificial variable columns from the tableau
+        T = np.delete(T, av, 1)
+    else:
+        # Failure to find a feasible starting point
+        status = 2
+        messages[status] = (
+            "Phase 1 of the simplex method failed to find a feasible "
+            "solution. The pseudo-objective function evaluates to {0:.1e} "
+            "which exceeds the required tolerance of {1} for a solution to be "
+            "considered 'close enough' to zero to be a basic solution. "
+            "Consider increasing the tolerance to be greater than {0:.1e}. "
+            "If this tolerance is unacceptably large the problem may be "
+            "infeasible.".format(abs(T[-1, -1]), tol)
+        )
+
+    if status == 0:
+        # Phase 2
+        nit2, status = _solve_simplex(T, n, basis, callback=callback,
+                                      postsolve_args=postsolve_args,
+                                      maxiter=maxiter, tol=tol, phase=2,
+                                      bland=bland, nit0=nit1
+                                      )
+
+    solution = np.zeros(n + m)
+    solution[basis[:n]] = T[:n, -1]
+    x = solution[:m]
+
+    return x, status, messages[status], int(nit2)
@@ -0,0 +1,5 @@
+"""This module contains least-squares algorithms."""
+from .least_squares import least_squares
+from .lsq_linear import lsq_linear
+
+__all__ = ['least_squares', 'lsq_linear']
@@ -0,0 +1,183 @@
+"""Bounded-variable least-squares algorithm."""
+import numpy as np
+from numpy.linalg import norm, lstsq
+from scipy.optimize import OptimizeResult
+
+from .common import print_header_linear, print_iteration_linear
+
+
+def compute_kkt_optimality(g, on_bound):
+    """Compute the maximum violation of KKT conditions."""
+    g_kkt = g * on_bound
+    free_set = on_bound == 0
+    g_kkt[free_set] = np.abs(g[free_set])
+    return np.max(g_kkt)
+
+
+def bvls(A, b, x_lsq, lb, ub, tol, max_iter, verbose, rcond=None):
+    m, n = A.shape
+
+    x = x_lsq.copy()
+    on_bound = np.zeros(n)
+
+    mask = x <= lb
+    x[mask] = lb[mask]
+    on_bound[mask] = -1
+
+    mask = x >= ub
+    x[mask] = ub[mask]
+    on_bound[mask] = 1
+
+    free_set = on_bound == 0
+    active_set = ~free_set
+    free_set, = np.nonzero(free_set)
+
+    r = A.dot(x) - b
+    cost = 0.5 * np.dot(r, r)
+    initial_cost = cost
+    g = A.T.dot(r)
+
+    cost_change = None
+    step_norm = None
+    iteration = 0
+
+    if verbose == 2:
+        print_header_linear()
+
+    # This is the initialization loop. The requirement is that the
+    # least-squares solution on free variables is feasible before BVLS starts.
+    # One possible initialization is to set all variables to lower or upper
+    # bounds, but many iterations may be required from this state later on.
+    # The implemented ad-hoc procedure should intuitively give a better
+    # initial state: find the least-squares solution on current free variables,
+    # if it is feasible then stop, otherwise, set violating variables to
+    # corresponding bounds and continue on the reduced set of free variables.
+
+    while free_set.size > 0:
+        if verbose == 2:
+            optimality = compute_kkt_optimality(g, on_bound)
+            print_iteration_linear(iteration, cost, cost_change, step_norm,
+                                   optimality)
+
+        iteration += 1
+        x_free_old = x[free_set].copy()
+
+        A_free = A[:, free_set]
+        b_free = b - A.dot(x * active_set)
+        z = lstsq(A_free, b_free, rcond=rcond)[0]
+
+        lbv = z < lb[free_set]
+        ubv = z > ub[free_set]
+        v = lbv | ubv
+
+        if np.any(lbv):
+            ind = free_set[lbv]
+            x[ind] = lb[ind]
+            active_set[ind] = True
+            on_bound[ind] = -1
+
+        if np.any(ubv):
+            ind = free_set[ubv]
+            x[ind] = ub[ind]
+            active_set[ind] = True
+            on_bound[ind] = 1
+
+        ind = free_set[~v]
+        x[ind] = z[~v]
+
+        r = A.dot(x) - b
+        cost_new = 0.5 * np.dot(r, r)
+        cost_change = cost - cost_new
+        cost = cost_new
+        g = A.T.dot(r)
+        step_norm = norm(x[free_set] - x_free_old)
+
+        if np.any(v):
+            free_set = free_set[~v]
+        else:
+            break
+
+    if max_iter is None:
+        max_iter = n
+    max_iter += iteration
+
+    termination_status = None
+
+    # Main BVLS loop.
+
+    optimality = compute_kkt_optimality(g, on_bound)
+    for iteration in range(iteration, max_iter):  # BVLS Loop A
+        if verbose == 2:
+            print_iteration_linear(iteration, cost, cost_change,
+                                   step_norm, optimality)
+
+        if optimality < tol:
+            termination_status = 1
+
+        if termination_status is not None:
+            break
+
+        move_to_free = np.argmax(g * on_bound)
+        on_bound[move_to_free] = 0
+
+        while True:   # BVLS Loop B
+
+            free_set = on_bound == 0
+            active_set = ~free_set
+            free_set, = np.nonzero(free_set)
+
+            x_free = x[free_set]
+            x_free_old = x_free.copy()
+            lb_free = lb[free_set]
+            ub_free = ub[free_set]
+
+            A_free = A[:, free_set]
+            b_free = b - A.dot(x * active_set)
+            z = lstsq(A_free, b_free, rcond=rcond)[0]
+
+            lbv, = np.nonzero(z < lb_free)
+            ubv, = np.nonzero(z > ub_free)
+            v = np.hstack((lbv, ubv))
+
+            if v.size > 0:
+                alphas = np.hstack((
+                    lb_free[lbv] - x_free[lbv],
+                    ub_free[ubv] - x_free[ubv])) / (z[v] - x_free[v])
+
+                i = np.argmin(alphas)
+                i_free = v[i]
+                alpha = alphas[i]
+
+                x_free *= 1 - alpha
+                x_free += alpha * z
+                x[free_set] = x_free
+
+                if i < lbv.size:
+                    on_bound[free_set[i_free]] = -1
+                else:
+                    on_bound[free_set[i_free]] = 1
+            else:
+                x_free = z
+                x[free_set] = x_free
+                break
+
+        step_norm = norm(x_free - x_free_old)
+
+        r = A.dot(x) - b
+        cost_new = 0.5 * np.dot(r, r)
+        cost_change = cost - cost_new
+
+        if cost_change < tol * cost:
+            termination_status = 2
+        cost = cost_new
+
+        g = A.T.dot(r)
+        optimality = compute_kkt_optimality(g, on_bound)
+
+    if termination_status is None:
+        termination_status = 0
+
+    return OptimizeResult(
+        x=x, fun=r, cost=cost, optimality=optimality, active_mask=on_bound,
+        nit=iteration + 1, status=termination_status,
+        initial_cost=initial_cost)
@@ -0,0 +1,733 @@
+"""Functions used by least-squares algorithms."""
+from math import copysign
+
+import numpy as np
+from numpy.linalg import norm
+
+from scipy.linalg import cho_factor, cho_solve, LinAlgError
+from scipy.sparse import issparse
+from scipy.sparse.linalg import LinearOperator, aslinearoperator
+
+
+EPS = np.finfo(float).eps
+
+
+# Functions related to a trust-region problem.
+
+
+def intersect_trust_region(x, s, Delta):
+    """Find the intersection of a line with the boundary of a trust region.
+
+    This function solves the quadratic equation with respect to t
+    ||(x + s*t)||**2 = Delta**2.
+
+    Returns
+    -------
+    t_neg, t_pos : tuple of float
+        Negative and positive roots.
+
+    Raises
+    ------
+    ValueError
+        If `s` is zero or `x` is not within the trust region.
+    """
+    a = np.dot(s, s)
+    if a == 0:
+        raise ValueError("`s` is zero.")
+
+    b = np.dot(x, s)
+
+    c = np.dot(x, x) - Delta**2
+    if c > 0:
+        raise ValueError("`x` is not within the trust region.")
+
+    d = np.sqrt(b*b - a*c)  # Root from one fourth of the discriminant.
+
+    # Computations below avoid loss of significance, see "Numerical Recipes".
+    q = -(b + copysign(d, b))
+    t1 = q / a
+    t2 = c / q
+
+    if t1 < t2:
+        return t1, t2
+    else:
+        return t2, t1
+
+
+def solve_lsq_trust_region(n, m, uf, s, V, Delta, initial_alpha=None,
+                           rtol=0.01, max_iter=10):
+    """Solve a trust-region problem arising in least-squares minimization.
+
+    This function implements a method described by J. J. More [1]_ and used
+    in MINPACK, but it relies on a single SVD of the Jacobian instead of a
+    series of Cholesky decompositions. Before running this function, compute:
+    ``U, s, VT = svd(J, full_matrices=False)``.
+
+    Parameters
+    ----------
+    n : int
+        Number of variables.
+    m : int
+        Number of residuals.
+    uf : ndarray
+        Computed as U.T.dot(f).
+    s : ndarray
+        Singular values of J.
+    V : ndarray
+        Transpose of VT.
+    Delta : float
+        Radius of a trust region.
+    initial_alpha : float, optional
+        Initial guess for alpha, which might be available from a previous
+        iteration. If None, determined automatically.
+    rtol : float, optional
+        Stopping tolerance for the root-finding procedure. Namely, the
+        solution ``p`` will satisfy ``abs(norm(p) - Delta) < rtol * Delta``.
+    max_iter : int, optional
+        Maximum allowed number of iterations for the root-finding procedure.
+
+    Returns
+    -------
+    p : ndarray, shape (n,)
+        Found solution of a trust-region problem.
+    alpha : float
+        Positive value such that (J.T*J + alpha*I)*p = -J.T*f.
+        Sometimes called Levenberg-Marquardt parameter.
+    n_iter : int
+        Number of iterations made by the root-finding procedure. Zero means
+        that the Gauss-Newton step was selected as the solution.
+
+    References
+    ----------
+    .. [1] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
+           and Theory," Numerical Analysis, ed. G. A. Watson, Lecture Notes
+           in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
+    """
+    def phi_and_derivative(alpha, suf, s, Delta):
+        """Function of which to find zero.
+
+        It is defined as "norm of regularized (by alpha) least-squares
+        solution minus `Delta`". Refer to [1]_.
+        """
+        denom = s**2 + alpha
+        p_norm = norm(suf / denom)
+        phi = p_norm - Delta
+        phi_prime = -np.sum(suf ** 2 / denom**3) / p_norm
+        return phi, phi_prime
+
+    suf = s * uf
+
+    # Check if J has full rank and try Gauss-Newton step.
+    if m >= n:
+        threshold = EPS * m * s[0]
+        full_rank = s[-1] > threshold
+    else:
+        full_rank = False
+
+    if full_rank:
+        p = -V.dot(uf / s)
+        if norm(p) <= Delta:
+            return p, 0.0, 0
+
+    alpha_upper = norm(suf) / Delta
+
+    if full_rank:
+        phi, phi_prime = phi_and_derivative(0.0, suf, s, Delta)
+        alpha_lower = -phi / phi_prime
+    else:
+        alpha_lower = 0.0
+
+    if initial_alpha is None or (not full_rank and initial_alpha == 0):
+        alpha = max(0.001 * alpha_upper, (alpha_lower * alpha_upper)**0.5)
+    else:
+        alpha = initial_alpha
+
+    for it in range(max_iter):
+        if alpha < alpha_lower or alpha > alpha_upper:
+            alpha = max(0.001 * alpha_upper, (alpha_lower * alpha_upper)**0.5)
+
+        phi, phi_prime = phi_and_derivative(alpha, suf, s, Delta)
+
+        if phi < 0:
+            alpha_upper = alpha
+
+        ratio = phi / phi_prime
+        alpha_lower = max(alpha_lower, alpha - ratio)
+        alpha -= (phi + Delta) * ratio / Delta
+
+        if np.abs(phi) < rtol * Delta:
+            break
+
+    p = -V.dot(suf / (s**2 + alpha))
+
+    # Make the norm of p equal to Delta; p is changed only slightly by
+    # this. It is done to prevent p from lying outside the trust region (which
+    # can cause problems later).
+    p *= Delta / norm(p)
+
+    return p, alpha, it + 1
+
+
+def solve_trust_region_2d(B, g, Delta):
+    """Solve a general trust-region problem in 2 dimensions.
+
+    The problem is reformulated as a 4th order algebraic equation,
+    the solution of which is found by numpy.roots.
+
+    Parameters
+    ----------
+    B : ndarray, shape (2, 2)
+        Symmetric matrix, defines a quadratic term of the function.
+    g : ndarray, shape (2,)
+        Defines a linear term of the function.
+    Delta : float
+        Radius of a trust region.
+
+    Returns
+    -------
+    p : ndarray, shape (2,)
+        Found solution.
+    newton_step : bool
+        Whether the returned solution is the Newton step which lies within
+        the trust region.
+    """
+    try:
+        R, lower = cho_factor(B)
+        p = -cho_solve((R, lower), g)
+        if np.dot(p, p) <= Delta**2:
+            return p, True
+    except LinAlgError:
+        pass
+
+    a = B[0, 0] * Delta**2
+    b = B[0, 1] * Delta**2
+    c = B[1, 1] * Delta**2
+
+    d = g[0] * Delta
+    f = g[1] * Delta
+
+    coeffs = np.array(
+        [-b + d, 2 * (a - c + f), 6 * b, 2 * (-a + c + f), -b - d])
+    t = np.roots(coeffs)  # Can handle leading zeros.
+    t = np.real(t[np.isreal(t)])
+
+    p = Delta * np.vstack((2 * t / (1 + t**2), (1 - t**2) / (1 + t**2)))
+    value = 0.5 * np.sum(p * B.dot(p), axis=0) + np.dot(g, p)
+    i = np.argmin(value)
+    p = p[:, i]
+
+    return p, False
+
+
+def update_tr_radius(Delta, actual_reduction, predicted_reduction,
+                     step_norm, bound_hit):
+    """Update the radius of a trust region based on the cost reduction.
+
+    Returns
+    -------
+    Delta : float
+        New radius.
+    ratio : float
+        Ratio between actual and predicted reductions.
+    """
+    if predicted_reduction > 0:
+        ratio = actual_reduction / predicted_reduction
+    elif predicted_reduction == actual_reduction == 0:
+        ratio = 1
+    else:
+        ratio = 0
+
+    if ratio < 0.25:
+        Delta = 0.25 * step_norm
+    elif ratio > 0.75 and bound_hit:
+        Delta *= 2.0
+
+    return Delta, ratio
+
+
+# Construction and minimization of quadratic functions.
+
+
+def build_quadratic_1d(J, g, s, diag=None, s0=None):
+    """Parameterize a multivariate quadratic function along a line.
+
+    The resulting univariate quadratic function is given as follows::
+
+        f(t) = 0.5 * (s0 + s*t).T * (J.T*J + diag) * (s0 + s*t) +
+               g.T * (s0 + s*t)
+
+    Parameters
+    ----------
+    J : ndarray, sparse matrix or LinearOperator shape (m, n)
+        Jacobian matrix, affects the quadratic term.
+    g : ndarray, shape (n,)
+        Gradient, defines the linear term.
+    s : ndarray, shape (n,)
+        Direction vector of a line.
+    diag : None or ndarray with shape (n,), optional
+        Additional diagonal part, affects the quadratic term.
+        If None, assumed to be 0.
+    s0 : None or ndarray with shape (n,), optional
+        Initial point. If None, assumed to be 0.
+
+    Returns
+    -------
+    a : float
+        Coefficient for t**2.
+    b : float
+        Coefficient for t.
+    c : float
+        Free term. Returned only if `s0` is provided.
+    """
+    v = J.dot(s)
+    a = np.dot(v, v)
+    if diag is not None:
+        a += np.dot(s * diag, s)
+    a *= 0.5
+
+    b = np.dot(g, s)
+
+    if s0 is not None:
+        u = J.dot(s0)
+        b += np.dot(u, v)
+        c = 0.5 * np.dot(u, u) + np.dot(g, s0)
+        if diag is not None:
+            b += np.dot(s0 * diag, s)
+            c += 0.5 * np.dot(s0 * diag, s0)
+        return a, b, c
+    else:
+        return a, b
+
+
+def minimize_quadratic_1d(a, b, lb, ub, c=0):
+    """Minimize a 1-D quadratic function subject to bounds.
+
+    The free term `c` is 0 by default. Bounds must be finite.
+
+    Returns
+    -------
+    t : float
+        Minimum point.
+    y : float
+        Minimum value.
+    """
+    t = [lb, ub]
+    if a != 0:
+        extremum = -0.5 * b / a
+        if lb < extremum < ub:
+            t.append(extremum)
+    t = np.asarray(t)
+    y = t * (a * t + b) + c
+    min_index = np.argmin(y)
+    return t[min_index], y[min_index]
+
+
+def evaluate_quadratic(J, g, s, diag=None):
+    """Compute values of a quadratic function arising in least squares.
+
+    The function is 0.5 * s.T * (J.T * J + diag) * s + g.T * s.
+
+    Parameters
+    ----------
+    J : ndarray, sparse matrix or LinearOperator, shape (m, n)
+        Jacobian matrix, affects the quadratic term.
+    g : ndarray, shape (n,)
+        Gradient, defines the linear term.
+    s : ndarray, shape (k, n) or (n,)
+        Array containing steps as rows.
+    diag : ndarray, shape (n,), optional
+        Additional diagonal part, affects the quadratic term.
+        If None, assumed to be 0.
+
+    Returns
+    -------
+    values : ndarray with shape (k,) or float
+        Values of the function. If `s` was 2-D, then ndarray is
+        returned, otherwise, float is returned.
+    """
+    if s.ndim == 1:
+        Js = J.dot(s)
+        q = np.dot(Js, Js)
+        if diag is not None:
+            q += np.dot(s * diag, s)
+    else:
+        Js = J.dot(s.T)
+        q = np.sum(Js**2, axis=0)
+        if diag is not None:
+            q += np.sum(diag * s**2, axis=1)
+
+    l = np.dot(s, g)
+
+    return 0.5 * q + l
+
+
+# Utility functions to work with bound constraints.
+
+
+def in_bounds(x, lb, ub):
+    """Check if a point lies within bounds."""
+    return np.all((x >= lb) & (x <= ub))
+
+
+def step_size_to_bound(x, s, lb, ub):
+    """Compute the minimum step size required to reach a bound.
+
+    The function computes a positive scalar t, such that x + s * t is on
+    the bound.
+
+    Returns
+    -------
+    step : float
+        Computed step. Non-negative value.
+    hits : ndarray of int with shape of x
+        Each element indicates whether a corresponding variable reaches the
+        bound:
+
+             *  0 - the bound was not hit.
+             * -1 - the lower bound was hit.
+             *  1 - the upper bound was hit.
+    """
+    non_zero = np.nonzero(s)
+    s_non_zero = s[non_zero]
+    steps = np.empty_like(x)
+    steps.fill(np.inf)
+    with np.errstate(over='ignore'):
+        steps[non_zero] = np.maximum((lb - x)[non_zero] / s_non_zero,
+                                     (ub - x)[non_zero] / s_non_zero)
+    min_step = np.min(steps)
+    return min_step, np.equal(steps, min_step) * np.sign(s).astype(int)
+
+
+def find_active_constraints(x, lb, ub, rtol=1e-10):
+    """Determine which constraints are active at a given point.
+
+    The threshold is computed using `rtol` and the absolute value of the
+    closest bound.
+
+    Returns
+    -------
+    active : ndarray of int with shape of x
+        Each component shows whether the corresponding constraint is active:
+
+             *  0 - a constraint is not active.
+             * -1 - a lower bound is active.
+             *  1 - an upper bound is active.
+    """
+    active = np.zeros_like(x, dtype=int)
+
+    if rtol == 0:
+        active[x <= lb] = -1
+        active[x >= ub] = 1
+        return active
+
+    lower_dist = x - lb
+    upper_dist = ub - x
+
+    lower_threshold = rtol * np.maximum(1, np.abs(lb))
+    upper_threshold = rtol * np.maximum(1, np.abs(ub))
+
+    lower_active = (np.isfinite(lb) &
+                    (lower_dist <= np.minimum(upper_dist, lower_threshold)))
+    active[lower_active] = -1
+
+    upper_active = (np.isfinite(ub) &
+                    (upper_dist <= np.minimum(lower_dist, upper_threshold)))
+    active[upper_active] = 1
+
+    return active
+
+
+def make_strictly_feasible(x, lb, ub, rstep=1e-10):
+    """Shift a point to the interior of a feasible region.
+
+    Each element of the returned vector is at least at a relative distance
+    `rstep` from the closest bound. If ``rstep=0`` then `np.nextafter` is used.
+    """
+    x_new = x.copy()
+
+    active = find_active_constraints(x, lb, ub, rstep)
+    lower_mask = np.equal(active, -1)
+    upper_mask = np.equal(active, 1)
+
+    if rstep == 0:
+        x_new[lower_mask] = np.nextafter(lb[lower_mask], ub[lower_mask])
+        x_new[upper_mask] = np.nextafter(ub[upper_mask], lb[upper_mask])
+    else:
+        x_new[lower_mask] = (lb[lower_mask] +
+                             rstep * np.maximum(1, np.abs(lb[lower_mask])))
+        x_new[upper_mask] = (ub[upper_mask] -
+                             rstep * np.maximum(1, np.abs(ub[upper_mask])))
+
+    tight_bounds = (x_new < lb) | (x_new > ub)
+    x_new[tight_bounds] = 0.5 * (lb[tight_bounds] + ub[tight_bounds])
+
+    return x_new
+
+
+def CL_scaling_vector(x, g, lb, ub):
+    """Compute Coleman-Li scaling vector and its derivatives.
+
+    Components of a vector v are defined as follows::
+
+               | ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf
+        v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf
+               | 1,           otherwise
+
+    According to this definition v[i] >= 0 for all i. It differs from the
+    definition in paper [1]_ (eq. (2.2)), where the absolute value of v is
+    used. Both definitions are equivalent down the line.
+    Derivatives of v with respect to x take value 1, -1 or 0 depending on the
+    case.
+
+    Returns
+    -------
+    v : ndarray with shape of x
+        Scaling vector.
+    dv : ndarray with shape of x
+        Derivatives of v[i] with respect to x[i], diagonal elements of v's
+        Jacobian.
+
+    References
+    ----------
+    .. [1] M.A. Branch, T.F. Coleman, and Y. Li, "A Subspace, Interior,
+           and Conjugate Gradient Method for Large-Scale Bound-Constrained
+           Minimization Problems," SIAM Journal on Scientific Computing,
+           Vol. 21, Number 1, pp 1-23, 1999.
+    """
+    v = np.ones_like(x)
+    dv = np.zeros_like(x)
+
+    mask = (g < 0) & np.isfinite(ub)
+    v[mask] = ub[mask] - x[mask]
+    dv[mask] = -1
+
+    mask = (g > 0) & np.isfinite(lb)
+    v[mask] = x[mask] - lb[mask]
+    dv[mask] = 1
+
+    return v, dv
+
+
+def reflective_transformation(y, lb, ub):
+    """Compute reflective transformation and its gradient."""
+    if in_bounds(y, lb, ub):
+        return y, np.ones_like(y)
+
+    lb_finite = np.isfinite(lb)
+    ub_finite = np.isfinite(ub)
+
+    x = y.copy()
+    g_negative = np.zeros_like(y, dtype=bool)
+
+    mask = lb_finite & ~ub_finite
+    x[mask] = np.maximum(y[mask], 2 * lb[mask] - y[mask])
+    g_negative[mask] = y[mask] < lb[mask]
+
+    mask = ~lb_finite & ub_finite
+    x[mask] = np.minimum(y[mask], 2 * ub[mask] - y[mask])
+    g_negative[mask] = y[mask] > ub[mask]
+
+    mask = lb_finite & ub_finite
+    d = ub - lb
+    t = np.remainder(y[mask] - lb[mask], 2 * d[mask])
+    x[mask] = lb[mask] + np.minimum(t, 2 * d[mask] - t)
+    g_negative[mask] = t > d[mask]
+
+    g = np.ones_like(y)
+    g[g_negative] = -1
+
+    return x, g
+
+
+# Functions to display algorithm's progress.
+
+
+def print_header_nonlinear():
+    print("{:^15}{:^15}{:^15}{:^15}{:^15}{:^15}"
+          .format("Iteration", "Total nfev", "Cost", "Cost reduction",
+                  "Step norm", "Optimality"))
+
+
+def print_iteration_nonlinear(iteration, nfev, cost, cost_reduction,
+                              step_norm, optimality):
+    if cost_reduction is None:
+        cost_reduction = " " * 15
+    else:
+        cost_reduction = f"{cost_reduction:^15.2e}"
+
+    if step_norm is None:
+        step_norm = " " * 15
+    else:
+        step_norm = f"{step_norm:^15.2e}"
+
+    print("{:^15}{:^15}{:^15.4e}{}{}{:^15.2e}"
+          .format(iteration, nfev, cost, cost_reduction,
+                  step_norm, optimality))
+
+
+def print_header_linear():
+    print("{:^15}{:^15}{:^15}{:^15}{:^15}"
+          .format("Iteration", "Cost", "Cost reduction", "Step norm",
+                  "Optimality"))
+
+
+def print_iteration_linear(iteration, cost, cost_reduction, step_norm,
+                           optimality):
+    if cost_reduction is None:
+        cost_reduction = " " * 15
+    else:
+        cost_reduction = f"{cost_reduction:^15.2e}"
+
+    if step_norm is None:
+        step_norm = " " * 15
+    else:
+        step_norm = f"{step_norm:^15.2e}"
+
+    print(f"{iteration:^15}{cost:^15.4e}{cost_reduction}{step_norm}{optimality:^15.2e}")
+
+
+# Simple helper functions.
+
+
+def compute_grad(J, f):
+    """Compute gradient of the least-squares cost function."""
+    if isinstance(J, LinearOperator):
+        return J.rmatvec(f)
+    else:
+        return J.T.dot(f)
+
+
+def compute_jac_scale(J, scale_inv_old=None):
+    """Compute variables scale based on the Jacobian matrix."""
+    if issparse(J):
+        scale_inv = np.asarray(J.power(2).sum(axis=0)).ravel()**0.5
+    else:
+        scale_inv = np.sum(J**2, axis=0)**0.5
+
+    if scale_inv_old is None:
+        scale_inv[scale_inv == 0] = 1
+    else:
+        scale_inv = np.maximum(scale_inv, scale_inv_old)
+
+    return 1 / scale_inv, scale_inv
+
+
+def left_multiplied_operator(J, d):
+    """Return diag(d) J as LinearOperator."""
+    J = aslinearoperator(J)
+
+    def matvec(x):
+        return d * J.matvec(x)
+
+    def matmat(X):
+        return d[:, np.newaxis] * J.matmat(X)
+
+    def rmatvec(x):
+        return J.rmatvec(x.ravel() * d)
+
+    return LinearOperator(J.shape, matvec=matvec, matmat=matmat,
+                          rmatvec=rmatvec)
+
+
+def right_multiplied_operator(J, d):
+    """Return J diag(d) as LinearOperator."""
+    J = aslinearoperator(J)
+
+    def matvec(x):
+        return J.matvec(np.ravel(x) * d)
+
+    def matmat(X):
+        return J.matmat(X * d[:, np.newaxis])
+
+    def rmatvec(x):
+        return d * J.rmatvec(x)
+
+    return LinearOperator(J.shape, matvec=matvec, matmat=matmat,
+                          rmatvec=rmatvec)
+
+
+def regularized_lsq_operator(J, diag):
+    """Return a matrix arising in regularized least squares as LinearOperator.
+
+    The matrix is
+        [ J ]
+        [ D ]
+    where D is diagonal matrix with elements from `diag`.
+    """
+    J = aslinearoperator(J)
+    m, n = J.shape
+
+    def matvec(x):
+        return np.hstack((J.matvec(x), diag * x))
+
+    def rmatvec(x):
+        x1 = x[:m]
+        x2 = x[m:]
+        return J.rmatvec(x1) + diag * x2
+
+    return LinearOperator((m + n, n), matvec=matvec, rmatvec=rmatvec)
+
+
+def right_multiply(J, d, copy=True):
+    """Compute J diag(d).
+
+    If `copy` is False, `J` is modified in place (unless it is a LinearOperator).
+    """
+    if copy and not isinstance(J, LinearOperator):
+        J = J.copy()
+
+    if issparse(J):
+        J.data *= d.take(J.indices, mode='clip')  # scikit-learn recipe.
+    elif isinstance(J, LinearOperator):
+        J = right_multiplied_operator(J, d)
+    else:
+        J *= d
+
+    return J
+
+
+def left_multiply(J, d, copy=True):
+    """Compute diag(d) J.
+
+    If `copy` is False, `J` is modified in place (unless it is a LinearOperator).
+    """
+    if copy and not isinstance(J, LinearOperator):
+        J = J.copy()
+
+    if issparse(J):
+        J.data *= np.repeat(d, np.diff(J.indptr))  # scikit-learn recipe.
+    elif isinstance(J, LinearOperator):
+        J = left_multiplied_operator(J, d)
+    else:
+        J *= d[:, np.newaxis]
+
+    return J
+
+
+def check_termination(dF, F, dx_norm, x_norm, ratio, ftol, xtol):
+    """Check termination condition for nonlinear least squares."""
+    ftol_satisfied = dF < ftol * F and ratio > 0.25
+    xtol_satisfied = dx_norm < xtol * (xtol + x_norm)
+
+    if ftol_satisfied and xtol_satisfied:
+        return 4
+    elif ftol_satisfied:
+        return 2
+    elif xtol_satisfied:
+        return 3
+    else:
+        return None
+
+
+def scale_for_robust_loss_function(J, f, rho):
+    """Scale Jacobian and residuals for a robust loss function.
+
+    Arrays are modified in place.
+    """
+    J_scale = rho[1] + 2 * rho[2] * f**2
+    J_scale[J_scale < EPS] = EPS
+    J_scale **= 0.5
+
+    f *= rho[1] / J_scale
+
+    return left_multiply(J, J_scale, copy=False), f
@@ -0,0 +1,331 @@
+"""
+Dogleg algorithm with rectangular trust regions for least-squares minimization.
+
+The description of the algorithm can be found in [Voglis]_. The algorithm does
+trust-region iterations, but the shape of trust regions is rectangular as
+opposed to conventional elliptical. The intersection of a trust region and
+an initial feasible region is again some rectangle. Thus, on each iteration a
+bound-constrained quadratic optimization problem is solved.
+
+A quadratic problem is solved by the well-known dogleg approach, where the
+function is minimized along a piecewise-linear "dogleg" path [NumOpt]_,
+Chapter 4. If the Jacobian is not rank-deficient then the function is
+decreasing along this path, and optimization amounts to simply following
+this path as long as a point stays within the bounds. A constrained Cauchy
+step (along the anti-gradient) is considered for safety in rank-deficient
+cases; in these situations the convergence might be slow.
+
+If during iterations some variable hits the initial bound and the component
+of the anti-gradient points outside the feasible region, then the next dogleg
+step won't make any progress. In this state such variables satisfy
+first-order optimality conditions and they are excluded before computing the
+next dogleg step.
+
+The Gauss-Newton step can be computed exactly by `numpy.linalg.lstsq` (for
+dense Jacobian matrices) or by the iterative procedure
+`scipy.sparse.linalg.lsmr` (for dense and sparse matrices, or a Jacobian
+given as a LinearOperator). The second option makes it possible to solve very
+large problems (up to a couple of million residuals on a regular PC),
+provided the Jacobian matrix is sufficiently sparse. But note that dogbox is
+not very good at solving problems with a large number of constraints, because
+of variable exclusion-inclusion on each iteration (the required number of
+function evaluations might be high or the accuracy of a solution poor), thus
+its large-scale usage is probably limited to unconstrained problems.
+
+References
+----------
+.. [Voglis] C. Voglis and I. E. Lagaris, "A Rectangular Trust Region Dogleg
+            Approach for Unconstrained and Bound Constrained Nonlinear
+            Optimization", WSEAS International Conference on Applied
+            Mathematics, Corfu, Greece, 2004.
+.. [NumOpt] J. Nocedal and S. J. Wright, "Numerical optimization, 2nd edition".
+"""
+import numpy as np
+from numpy.linalg import lstsq, norm
+
+from scipy.sparse.linalg import LinearOperator, aslinearoperator, lsmr
+from scipy.optimize import OptimizeResult
+
+from .common import (
+    step_size_to_bound, in_bounds, update_tr_radius, evaluate_quadratic,
+    build_quadratic_1d, minimize_quadratic_1d, compute_grad,
+    compute_jac_scale, check_termination, scale_for_robust_loss_function,
+    print_header_nonlinear, print_iteration_nonlinear)
+
+
+def lsmr_operator(Jop, d, active_set):
+    """Compute LinearOperator to use in LSMR by dogbox algorithm.
+
+    The `active_set` mask is used to exclude active variables from
+    computations of matrix-vector products.
+    """
+    m, n = Jop.shape
+
+    def matvec(x):
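+        # Zero out the active components so that they do not contribute to
+        # the product, mirroring what `rmatvec` does below.
+        x_free = x.ravel().copy()
+        x_free[active_set] = 0
+        return Jop.matvec(x_free * d)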
+
+    def rmatvec(x):
+        r = d * Jop.rmatvec(x)
+        r[active_set] = 0
+        return r
+
+    return LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec, dtype=float)
+
+
+def find_intersection(x, tr_bounds, lb, ub):
+    """Find intersection of trust-region bounds and initial bounds.
+
+    Returns
+    -------
+    lb_total, ub_total : ndarray with shape of x
+        Lower and upper bounds of the intersection region.
+    orig_l, orig_u : ndarray of bool with shape of x
+        True means that an original bound is taken as a corresponding bound
+        in the intersection region.
+    tr_l, tr_u : ndarray of bool with shape of x
+        True means that a trust-region bound is taken as a corresponding bound
+        in the intersection region.
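+
+    Examples
+    --------
+    A minimal illustrative sketch; the input values are made up for this
+    example. Note that the returned bounds are expressed relative to `x`.
+    Here the first variable is limited by the original bounds and the second
+    one by the trust region:
+
+    >>> import numpy as np
+    >>> x = np.array([0.5, 2.0])
+    >>> tr_bounds = np.array([1.0, 1.0])
+    >>> lb = np.array([0.0, 0.0])
+    >>> ub = np.array([1.0, 10.0])
+    >>> lb_total, ub_total, orig_l, orig_u, tr_l, tr_u = find_intersection(
+    ...     x, tr_bounds, lb, ub)
+    >>> lb_total.tolist(), ub_total.tolist()
+    ([-0.5, -1.0], [0.5, 1.0])
+    >>> orig_l.tolist(), tr_l.tolist()
+    ([True, False], [False, True])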
+    """
+    lb_centered = lb - x
+    ub_centered = ub - x
+
+    lb_total = np.maximum(lb_centered, -tr_bounds)
+    ub_total = np.minimum(ub_centered, tr_bounds)
+
+    orig_l = np.equal(lb_total, lb_centered)
+    orig_u = np.equal(ub_total, ub_centered)
+
+    tr_l = np.equal(lb_total, -tr_bounds)
+    tr_u = np.equal(ub_total, tr_bounds)
+
+    return lb_total, ub_total, orig_l, orig_u, tr_l, tr_u
+
+
+def dogleg_step(x, newton_step, g, a, b, tr_bounds, lb, ub):
+    """Find dogleg step in a rectangular region.
+
+    Returns
+    -------
+    step : ndarray, shape (n,)
+        Computed dogleg step.
+    bound_hits : ndarray of int, shape (n,)
+        Each component shows whether a corresponding variable hits the
+        initial bound after the step is taken:
+            *  0 - a variable doesn't hit the bound.
+            * -1 - lower bound is hit.
+            *  1 - upper bound is hit.
+    tr_hit : bool
+        Whether the step hit the boundary of the trust-region.
+    """
+    lb_total, ub_total, orig_l, orig_u, tr_l, tr_u = find_intersection(
+        x, tr_bounds, lb, ub
+    )
+    bound_hits = np.zeros_like(x, dtype=int)
+
+    if in_bounds(newton_step, lb_total, ub_total):
+        return newton_step, bound_hits, False
+
+    to_bounds, _ = step_size_to_bound(np.zeros_like(x), -g, lb_total, ub_total)
+
+    # The classical dogleg algorithm would check if the Cauchy step fits into
+    # the bounds, and just return its constrained version if not. But in a
+    # rectangular trust region it makes sense to try to improve the constrained
+    # Cauchy step too. Thus, we don't distinguish these two cases.
+
+    cauchy_step = -minimize_quadratic_1d(a, b, 0, to_bounds)[0] * g
+
+    step_diff = newton_step - cauchy_step
+    step_size, hits = step_size_to_bound(cauchy_step, step_diff,
+                                         lb_total, ub_total)
+    bound_hits[(hits < 0) & orig_l] = -1
+    bound_hits[(hits > 0) & orig_u] = 1
+    tr_hit = np.any((hits < 0) & tr_l | (hits > 0) & tr_u)
+
+    return cauchy_step + step_size * step_diff, bound_hits, tr_hit
+
+
+def dogbox(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
+           loss_function, tr_solver, tr_options, verbose):
+    f = f0
+    f_true = f.copy()
+    nfev = 1
+
+    J = J0
+    njev = 1
+
+    if loss_function is not None:
+        rho = loss_function(f)
+        cost = 0.5 * np.sum(rho[0])
+        J, f = scale_for_robust_loss_function(J, f, rho)
+    else:
+        cost = 0.5 * np.dot(f, f)
+
+    g = compute_grad(J, f)
+
+    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
+    if jac_scale:
+        scale, scale_inv = compute_jac_scale(J)
+    else:
+        scale, scale_inv = x_scale, 1 / x_scale
+
+    Delta = norm(x0 * scale_inv, ord=np.inf)
+    if Delta == 0:
+        Delta = 1.0
+
+    on_bound = np.zeros_like(x0, dtype=int)
+    on_bound[np.equal(x0, lb)] = -1
+    on_bound[np.equal(x0, ub)] = 1
+
+    x = x0
+    step = np.empty_like(x0)
+
+    if max_nfev is None:
+        max_nfev = x0.size * 100
+
+    termination_status = None
+    iteration = 0
+    step_norm = None
+    actual_reduction = None
+
+    if verbose == 2:
+        print_header_nonlinear()
+
+    while True:
+        active_set = on_bound * g < 0
+        free_set = ~active_set
+
+        g_free = g[free_set]
+        g_full = g.copy()
+        g[active_set] = 0
+
+        g_norm = norm(g, ord=np.inf)
+        if g_norm < gtol:
+            termination_status = 1
+
+        if verbose == 2:
+            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
+                                      step_norm, g_norm)
+
+        if termination_status is not None or nfev == max_nfev:
+            break
+
+        x_free = x[free_set]
+        lb_free = lb[free_set]
+        ub_free = ub[free_set]
+        scale_free = scale[free_set]
+
+        # Compute (Gauss-)Newton and build quadratic model for Cauchy step.
+        if tr_solver == 'exact':
+            J_free = J[:, free_set]
+            newton_step = lstsq(J_free, -f, rcond=-1)[0]
+
+            # Coefficients for the quadratic model along the anti-gradient.
+            a, b = build_quadratic_1d(J_free, g_free, -g_free)
+        elif tr_solver == 'lsmr':
+            Jop = aslinearoperator(J)
+
+            # We compute the lsmr step in scaled variables and then
+            # transform back to normal variables. If lsmr gave the exact lsq
+            # solution, this would be equivalent to not doing any
+            # transformations, but from experience it's better this way.
+
+            # We pass active_set to make computations as if we selected
+            # the free subset of J columns, but without actually doing any
+            # slicing, which is expensive for sparse matrices and impossible
+            # for LinearOperator.
+
+            lsmr_op = lsmr_operator(Jop, scale, active_set)
+            newton_step = -lsmr(lsmr_op, f, **tr_options)[0][free_set]
+            newton_step *= scale_free
+
+            # Components of g for active variables were zeroed, so this call
+            # is correct and equivalent to using J_free and g_free.
+            a, b = build_quadratic_1d(Jop, g, -g)
+
+        actual_reduction = -1.0
+        while actual_reduction <= 0 and nfev < max_nfev:
+            tr_bounds = Delta * scale_free
+
+            step_free, on_bound_free, tr_hit = dogleg_step(
+                x_free, newton_step, g_free, a, b, tr_bounds, lb_free, ub_free)
+
+            step.fill(0.0)
+            step[free_set] = step_free
+
+            if tr_solver == 'exact':
+                predicted_reduction = -evaluate_quadratic(J_free, g_free,
+                                                          step_free)
+            elif tr_solver == 'lsmr':
+                predicted_reduction = -evaluate_quadratic(Jop, g, step)
+
+            # gh11403 ensure that solution is fully within bounds.
+            x_new = np.clip(x + step, lb, ub)
+
+            f_new = fun(x_new)
+            nfev += 1
+
+            step_h_norm = norm(step * scale_inv, ord=np.inf)
+
+            if not np.all(np.isfinite(f_new)):
+                Delta = 0.25 * step_h_norm
+                continue
+
+            # Usual trust-region step quality estimation.
+            if loss_function is not None:
+                cost_new = loss_function(f_new, cost_only=True)
+            else:
+                cost_new = 0.5 * np.dot(f_new, f_new)
+            actual_reduction = cost - cost_new
+
+            Delta, ratio = update_tr_radius(
+                Delta, actual_reduction, predicted_reduction,
+                step_h_norm, tr_hit
+            )
+
+            step_norm = norm(step)
+            termination_status = check_termination(
+                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
+
+            if termination_status is not None:
+                break
+
+        if actual_reduction > 0:
+            on_bound[free_set] = on_bound_free
+
+            x = x_new
+            # Set variables exactly at the boundary.
+            mask = on_bound == -1
+            x[mask] = lb[mask]
+            mask = on_bound == 1
+            x[mask] = ub[mask]
+
+            f = f_new
+            f_true = f.copy()
+
+            cost = cost_new
+
+            J = jac(x, f)
+            njev += 1
+
+            if loss_function is not None:
+                rho = loss_function(f)
+                J, f = scale_for_robust_loss_function(J, f, rho)
+
+            g = compute_grad(J, f)
+
+            if jac_scale:
+                scale, scale_inv = compute_jac_scale(J, scale_inv)
+        else:
+            step_norm = 0
+            actual_reduction = 0
+
+        iteration += 1
+
+    if termination_status is None:
+        termination_status = 0
+
+    return OptimizeResult(
+        x=x, cost=cost, fun=f_true, jac=J, grad=g_full, optimality=g_norm,
+        active_mask=on_bound, nfev=nfev, njev=njev, status=termination_status)
@@ -0,0 +1,967 @@
+"""Generic interface for least-squares minimization."""
+from warnings import warn
+
+import numpy as np
+from numpy.linalg import norm
+
+from scipy.sparse import issparse
+from scipy.sparse.linalg import LinearOperator
+from scipy.optimize import _minpack, OptimizeResult
+from scipy.optimize._numdiff import approx_derivative, group_columns
+from scipy.optimize._minimize import Bounds
+
+from .trf import trf
+from .dogbox import dogbox
+from .common import EPS, in_bounds, make_strictly_feasible
+
+
+TERMINATION_MESSAGES = {
+    -1: "Improper input parameters status returned from `leastsq`",
+    0: "The maximum number of function evaluations is exceeded.",
+    1: "`gtol` termination condition is satisfied.",
+    2: "`ftol` termination condition is satisfied.",
+    3: "`xtol` termination condition is satisfied.",
+    4: "Both `ftol` and `xtol` termination conditions are satisfied."
+}
+
+
+FROM_MINPACK_TO_COMMON = {
+    0: -1,  # Improper input parameters from MINPACK.
+    1: 2,
+    2: 3,
+    3: 4,
+    4: 1,
+    5: 0
+    # There are 6, 7, 8 for too small tolerance parameters,
+    # but we guard against it by checking ftol, xtol, gtol beforehand.
+}
+
+
+def call_minpack(fun, x0, jac, ftol, xtol, gtol, max_nfev, x_scale, diff_step):
+    n = x0.size
+
+    if diff_step is None:
+        epsfcn = EPS
+    else:
+        epsfcn = diff_step**2
+
+    # Compute MINPACK's `diag`, which is the inverse of our `x_scale`;
+    # ``x_scale='jac'`` corresponds to ``diag=None``.
+    if isinstance(x_scale, str) and x_scale == 'jac':
+        diag = None
+    else:
+        diag = 1 / x_scale
+
+    full_output = True
+    col_deriv = False
+    factor = 100.0
+
+    if jac is None:
+        if max_nfev is None:
+            # The extra factor (n + 1) accounts for function calls made during
+            # finite-difference Jacobian estimation.
+            max_nfev = 100 * n * (n + 1)
+        x, info, status = _minpack._lmdif(
+            fun, x0, (), full_output, ftol, xtol, gtol,
+            max_nfev, epsfcn, factor, diag)
+    else:
+        if max_nfev is None:
+            max_nfev = 100 * n
+        x, info, status = _minpack._lmder(
+            fun, jac, x0, (), full_output, col_deriv,
+            ftol, xtol, gtol, max_nfev, factor, diag)
+
+    f = info['fvec']
+
+    if callable(jac):
+        J = jac(x)
+    else:
+        J = np.atleast_2d(approx_derivative(fun, x))
+
+    cost = 0.5 * np.dot(f, f)
+    g = J.T.dot(f)
+    g_norm = norm(g, ord=np.inf)
+
+    nfev = info['nfev']
+    njev = info.get('njev', None)
+
+    status = FROM_MINPACK_TO_COMMON[status]
+    active_mask = np.zeros_like(x0, dtype=int)
+
+    return OptimizeResult(
+        x=x, cost=cost, fun=f, jac=J, grad=g, optimality=g_norm,
+        active_mask=active_mask, nfev=nfev, njev=njev, status=status)
+
+
+def prepare_bounds(bounds, n):
+    lb, ub = (np.asarray(b, dtype=float) for b in bounds)
+    if lb.ndim == 0:
+        lb = np.resize(lb, n)
+
+    if ub.ndim == 0:
+        ub = np.resize(ub, n)
+
+    return lb, ub
+
+
+def check_tolerance(ftol, xtol, gtol, method):
+    def check(tol, name):
+        if tol is None:
+            tol = 0
+        elif tol < EPS:
+            warn(f"Setting `{name}` below the machine epsilon ({EPS:.2e}) effectively "
+                 f"disables the corresponding termination condition.",
+                 stacklevel=3)
+        return tol
+
+    ftol = check(ftol, "ftol")
+    xtol = check(xtol, "xtol")
+    gtol = check(gtol, "gtol")
+
+    if method == "lm" and (ftol < EPS or xtol < EPS or gtol < EPS):
+        raise ValueError("All tolerances must be higher than machine epsilon "
+                         f"({EPS:.2e}) for method 'lm'.")
+    elif ftol < EPS and xtol < EPS and gtol < EPS:
+        raise ValueError("At least one of the tolerances must be higher than "
+                         f"machine epsilon ({EPS:.2e}).")
+
+    return ftol, xtol, gtol
+
+
+def check_x_scale(x_scale, x0):
+    if isinstance(x_scale, str) and x_scale == 'jac':
+        return x_scale
+
+    try:
+        x_scale = np.asarray(x_scale, dtype=float)
+        valid = np.all(np.isfinite(x_scale)) and np.all(x_scale > 0)
+    except (ValueError, TypeError):
+        valid = False
+
+    if not valid:
+        raise ValueError("`x_scale` must be 'jac' or array_like with "
+                         "positive numbers.")
+
+    if x_scale.ndim == 0:
+        x_scale = np.resize(x_scale, x0.shape)
+
+    if x_scale.shape != x0.shape:
+        raise ValueError("Inconsistent shapes between `x_scale` and `x0`.")
+
+    return x_scale
+
+
+def check_jac_sparsity(jac_sparsity, m, n):
+    if jac_sparsity is None:
+        return None
+
+    if not issparse(jac_sparsity):
+        jac_sparsity = np.atleast_2d(jac_sparsity)
+
+    if jac_sparsity.shape != (m, n):
+        raise ValueError("`jac_sparsity` has wrong shape.")
+
+    return jac_sparsity, group_columns(jac_sparsity)
+
+
+# Loss functions.
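+#
+# Each built-in loss fills a preallocated array `rho` of shape (3, m) in place:
+# row 0 holds rho(z), row 1 its first derivative and row 2 its second
+# derivative; with ``cost_only=True`` only row 0 is filled. A minimal sketch
+# of a direct call (illustrative values, not taken from the test suite):
+#
+#     z = np.array([0.25, 4.0])
+#     rho = np.empty((3, z.size))
+#     huber(z, rho, cost_only=False)
+#     # rho[0] -> [0.25, 3.0]     (z if z <= 1, else 2 * z**0.5 - 1)
+#     # rho[1] -> [1.0, 0.5]      (1 if z <= 1, else z**-0.5)
+#     # rho[2] -> [0.0, -0.0625]  (0 if z <= 1, else -0.5 * z**-1.5)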
+
+
+def huber(z, rho, cost_only):
+    mask = z <= 1
+    rho[0, mask] = z[mask]
+    rho[0, ~mask] = 2 * z[~mask]**0.5 - 1
+    if cost_only:
+        return
+    rho[1, mask] = 1
+    rho[1, ~mask] = z[~mask]**-0.5
+    rho[2, mask] = 0
+    rho[2, ~mask] = -0.5 * z[~mask]**-1.5
+
+
+def soft_l1(z, rho, cost_only):
+    t = 1 + z
+    rho[0] = 2 * (t**0.5 - 1)
+    if cost_only:
+        return
+    rho[1] = t**-0.5
+    rho[2] = -0.5 * t**-1.5
+
+
+def cauchy(z, rho, cost_only):
+    rho[0] = np.log1p(z)
+    if cost_only:
+        return
+    t = 1 + z
+    rho[1] = 1 / t
+    rho[2] = -1 / t**2
+
+
+def arctan(z, rho, cost_only):
+    rho[0] = np.arctan(z)
+    if cost_only:
+        return
+    t = 1 + z**2
+    rho[1] = 1 / t
+    rho[2] = -2 * z / t**2
+
+
+IMPLEMENTED_LOSSES = dict(linear=None, huber=huber, soft_l1=soft_l1,
+                          cauchy=cauchy, arctan=arctan)
+
+
+def construct_loss_function(m, loss, f_scale):
+    if loss == 'linear':
+        return None
+
+    if not callable(loss):
+        loss = IMPLEMENTED_LOSSES[loss]
+        rho = np.empty((3, m))
+
+        def loss_function(f, cost_only=False):
+            z = (f / f_scale) ** 2
+            loss(z, rho, cost_only=cost_only)
+            if cost_only:
+                return 0.5 * f_scale ** 2 * np.sum(rho[0])
+            rho[0] *= f_scale ** 2
+            rho[2] /= f_scale ** 2
+            return rho
+    else:
+        def loss_function(f, cost_only=False):
+            z = (f / f_scale) ** 2
+            rho = loss(z)
+            if cost_only:
+                return 0.5 * f_scale ** 2 * np.sum(rho[0])
+            rho[0] *= f_scale ** 2
+            rho[2] /= f_scale ** 2
+            return rho
+
+    return loss_function
+
+
+def least_squares(
+        fun, x0, jac='2-point', bounds=(-np.inf, np.inf), method='trf',
+        ftol=1e-8, xtol=1e-8, gtol=1e-8, x_scale=1.0, loss='linear',
+        f_scale=1.0, diff_step=None, tr_solver=None, tr_options={},
+        jac_sparsity=None, max_nfev=None, verbose=0, args=(), kwargs={}):
+    """Solve a nonlinear least-squares problem with bounds on the variables.
+
+    Given the residuals f(x) (an m-D real function of n real
+    variables) and the loss function rho(s) (a scalar function), `least_squares`
+    finds a local minimum of the cost function F(x)::
+
+        minimize F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
+        subject to lb <= x <= ub
+
+    The purpose of the loss function rho(s) is to reduce the influence of
+    outliers on the solution.
+
+    Parameters
+    ----------
+    fun : callable
+        Function which computes the vector of residuals, with the signature
+        ``fun(x, *args, **kwargs)``, i.e., the minimization proceeds with
+        respect to its first argument. The argument ``x`` passed to this
+        function is an ndarray of shape (n,) (never a scalar, even for n=1).
+        It must allocate and return a 1-D array_like of shape (m,) or a scalar.
+        If the argument ``x`` is complex or the function ``fun`` returns
+        complex residuals, it must be wrapped in a real function of real
+        arguments, as shown at the end of the Examples section.
+    x0 : array_like with shape (n,) or float
+        Initial guess on independent variables. If float, it will be treated
+        as a 1-D array with one element. When `method` is 'trf', the initial
+        guess might be slightly adjusted to lie sufficiently within the given
+        `bounds`.
+    jac : {'2-point', '3-point', 'cs', callable}, optional
+        Method of computing the Jacobian matrix (an m-by-n matrix, where
+        element (i, j) is the partial derivative of f[i] with respect to
+        x[j]). The keywords select a finite difference scheme for numerical
+        estimation. The scheme '3-point' is more accurate, but requires
+        twice as many operations as '2-point' (default). The scheme 'cs'
+        uses complex steps, and while potentially the most accurate, it is
+        applicable only when `fun` correctly handles complex inputs and
+        can be analytically continued to the complex plane. Method 'lm'
+        always uses the '2-point' scheme. If callable, it is used as
+        ``jac(x, *args, **kwargs)`` and should return a good approximation
+        (or the exact value) for the Jacobian as an array_like (np.atleast_2d
+        is applied), a sparse matrix (csr_matrix preferred for performance) or
+        a `scipy.sparse.linalg.LinearOperator`.
+    bounds : 2-tuple of array_like or `Bounds`, optional
+        There are two ways to specify bounds:
+
+            1. Instance of `Bounds` class
+            2. Lower and upper bounds on independent variables. Defaults to no
+               bounds. Each array must match the size of `x0` or be a scalar;
+               in the latter case a bound will be the same for all variables.
+               Use ``np.inf`` with an appropriate sign to disable bounds on all
+               or some variables.
+    method : {'trf', 'dogbox', 'lm'}, optional
+        Algorithm to perform minimization.
+
+            * 'trf' : Trust Region Reflective algorithm, particularly suitable
+              for large sparse problems with bounds. Generally robust method.
+            * 'dogbox' : dogleg algorithm with rectangular trust regions,
+              typical use case is small problems with bounds. Not recommended
+              for problems with rank-deficient Jacobian.
+            * 'lm' : Levenberg-Marquardt algorithm as implemented in MINPACK.
+              Doesn't handle bounds and sparse Jacobians. Usually the most
+              efficient method for small unconstrained problems.
+
+        Default is 'trf'. See Notes for more information.
+    ftol : float or None, optional
+        Tolerance for termination by the change of the cost function. Default
+        is 1e-8. The optimization process is stopped when ``dF < ftol * F``,
+        and there was an adequate agreement between a local quadratic model and
+        the true model in the last step.
+
+        If None and 'method' is not 'lm', the termination by this condition is
+        disabled. If 'method' is 'lm', this tolerance must be higher than
+        machine epsilon.
+    xtol : float or None, optional
+        Tolerance for termination by the change of the independent variables.
+        Default is 1e-8. The exact condition depends on the `method` used:
+
+            * For 'trf' and 'dogbox' : ``norm(dx) < xtol * (xtol + norm(x))``.
+            * For 'lm' : ``Delta < xtol * norm(xs)``, where ``Delta`` is
+              a trust-region radius and ``xs`` is the value of ``x``
+              scaled according to `x_scale` parameter (see below).
+
+        If None and 'method' is not 'lm', the termination by this condition is
+        disabled. If 'method' is 'lm', this tolerance must be higher than
+        machine epsilon.
+    gtol : float or None, optional
+        Tolerance for termination by the norm of the gradient. Default is 1e-8.
+        The exact condition depends on the `method` used:
+
+            * For 'trf' : ``norm(g_scaled, ord=np.inf) < gtol``, where
+              ``g_scaled`` is the value of the gradient scaled to account for
+              the presence of the bounds [STIR]_.
+            * For 'dogbox' : ``norm(g_free, ord=np.inf) < gtol``, where
+              ``g_free`` is the gradient with respect to the variables which
+              are not in the optimal state on the boundary.
+            * For 'lm' : the maximum absolute value of the cosine of angles
+              between columns of the Jacobian and the residual vector is less
+              than `gtol`, or the residual vector is zero.
+
+        If None and 'method' is not 'lm', the termination by this condition is
+        disabled. If 'method' is 'lm', this tolerance must be higher than
+        machine epsilon.
+    x_scale : array_like or 'jac', optional
+        Characteristic scale of each variable. Setting `x_scale` is equivalent
+        to reformulating the problem in scaled variables ``xs = x / x_scale``.
+        An alternative view is that the size of a trust region along jth
+        dimension is proportional to ``x_scale[j]``. Improved convergence may
+        be achieved by setting `x_scale` such that a step of a given size
+        along any of the scaled variables has a similar effect on the cost
+        function. If set to 'jac', the scale is iteratively updated using the
+        inverse norms of the columns of the Jacobian matrix (as described in
+        [JJMore]_).
+    loss : str or callable, optional
+        Determines the loss function. The following keyword values are allowed:
+
+            * 'linear' (default) : ``rho(z) = z``. Gives a standard
+              least-squares problem.
+            * 'soft_l1' : ``rho(z) = 2 * ((1 + z)**0.5 - 1)``. The smooth
+              approximation of l1 (absolute value) loss. Usually a good
+              choice for robust least squares.
+            * 'huber' : ``rho(z) = z if z <= 1 else 2*z**0.5 - 1``. Works
+              similarly to 'soft_l1'.
+            * 'cauchy' : ``rho(z) = ln(1 + z)``. Severely weakens the influence
+              of outliers, but may cause difficulties in the optimization
+              process.
+            * 'arctan' : ``rho(z) = arctan(z)``. Limits a maximum loss on
+              a single residual, has properties similar to 'cauchy'.
+
+        If callable, it must take a 1-D ndarray ``z=f**2`` and return an
+        array_like with shape (3, m) where row 0 contains function values,
+        row 1 contains first derivatives and row 2 contains second
+        derivatives. Method 'lm' supports only 'linear' loss.
+    f_scale : float, optional
+        Value of soft margin between inlier and outlier residuals, default
+        is 1.0. The loss function is evaluated as follows
+        ``rho_(f**2) = C**2 * rho(f**2 / C**2)``, where ``C`` is `f_scale`,
+        and ``rho`` is determined by `loss` parameter. This parameter has
+        no effect with ``loss='linear'``, but for other `loss` values it is
+        of crucial importance.
+    max_nfev : None or int, optional
+        Maximum number of function evaluations before the termination.
+        If None (default), the value is chosen automatically:
+
+            * For 'trf' and 'dogbox' : 100 * n.
+            * For 'lm' :  100 * n if `jac` is callable and 100 * n * (n + 1)
+              otherwise (because 'lm' counts function calls in Jacobian
+              estimation).
+
+    diff_step : None or array_like, optional
+        Determines the relative step size for the finite difference
+        approximation of the Jacobian. The actual step is computed as
+        ``x * diff_step``. If None (default), then `diff_step` is taken to be
+        a conventional "optimal" power of machine epsilon for the finite
+        difference scheme used [NR]_.
+    tr_solver : {None, 'exact', 'lsmr'}, optional
+        Method for solving trust-region subproblems, relevant only for 'trf'
+        and 'dogbox' methods.
+
+            * 'exact' is suitable for not very large problems with dense
+              Jacobian matrices. The computational complexity per iteration is
+              comparable to a singular value decomposition of the Jacobian
+              matrix.
+            * 'lsmr' is suitable for problems with sparse and large Jacobian
+              matrices. It uses the iterative procedure
+              `scipy.sparse.linalg.lsmr` for finding a solution of a linear
+              least-squares problem and only requires matrix-vector product
+              evaluations.
+
+        If None (default), the solver is chosen based on the type of Jacobian
+        returned on the first iteration.
+    tr_options : dict, optional
+        Keyword options passed to trust-region solver.
+
+            * ``tr_solver='exact'``: `tr_options` are ignored.
+            * ``tr_solver='lsmr'``: options for `scipy.sparse.linalg.lsmr`.
+              Additionally, ``method='trf'`` supports the 'regularize' option
+              (bool, default is True), which adds a regularization term to the
+              normal equation, which improves convergence if the Jacobian is
+              rank-deficient [Byrd]_ (eq. 3.4).
+
+    jac_sparsity : {None, array_like, sparse matrix}, optional
+        Defines the sparsity structure of the Jacobian matrix for finite
+        difference estimation; its shape must be (m, n). If the Jacobian has
+        only a few non-zero elements in *each* row, providing the sparsity
+        structure will greatly speed up the computations [Curtis]_. A zero
+        entry means that a corresponding element in the Jacobian is identically
+        zero. If provided, forces the use of 'lsmr' trust-region solver.
+        If None (default), then dense differencing will be used. Has no effect
+        for 'lm' method.
+    verbose : {0, 1, 2}, optional
+        Level of algorithm's verbosity:
+
+            * 0 (default) : work silently.
+            * 1 : display a termination report.
+            * 2 : display progress during iterations (not supported by 'lm'
+              method).
+
+    args, kwargs : tuple and dict, optional
+        Additional arguments passed to `fun` and `jac`. Both empty by default.
+        The calling signature is ``fun(x, *args, **kwargs)`` and the same for
+        `jac`.
+
+    Returns
+    -------
+    result : OptimizeResult
+        `OptimizeResult` with the following fields defined:
+
+            x : ndarray, shape (n,)
+                Solution found.
+            cost : float
+                Value of the cost function at the solution.
+            fun : ndarray, shape (m,)
+                Vector of residuals at the solution.
+            jac : ndarray, sparse matrix or LinearOperator, shape (m, n)
+                Modified Jacobian matrix at the solution, in the sense that J^T J
+                is a Gauss-Newton approximation of the Hessian of the cost function.
+                The type is the same as the one used by the algorithm.
+            grad : ndarray, shape (n,)
+                Gradient of the cost function at the solution.
+            optimality : float
+                First-order optimality measure. In unconstrained problems, it is
+                always the uniform norm of the gradient. In constrained problems,
+                it is the quantity which was compared with `gtol` during iterations.
+            active_mask : ndarray of int, shape (n,)
+                Each component shows whether a corresponding constraint is active
+                (that is, whether a variable is at the bound):
+
+                    *  0 : a constraint is not active.
+                    * -1 : a lower bound is active.
+                    *  1 : an upper bound is active.
+
+                Might be somewhat arbitrary for 'trf' method as it generates a
+                sequence of strictly feasible iterates and `active_mask` is
+                determined within a tolerance threshold.
+            nfev : int
+                Number of function evaluations done. Methods 'trf' and 'dogbox' do
+                not count function calls for numerical Jacobian approximation, as
+                opposed to 'lm' method.
+            njev : int or None
+                Number of Jacobian evaluations done. If numerical Jacobian
+                approximation is used in 'lm' method, it is set to None.
+            status : int
+                The reason for algorithm termination:
+
+                    * -1 : improper input parameters status returned from MINPACK.
+                    *  0 : the maximum number of function evaluations is exceeded.
+                    *  1 : `gtol` termination condition is satisfied.
+                    *  2 : `ftol` termination condition is satisfied.
+                    *  3 : `xtol` termination condition is satisfied.
+                    *  4 : Both `ftol` and `xtol` termination conditions are satisfied.
+
+            message : str
+                Verbal description of the termination reason.
+            success : bool
+                True if one of the convergence criteria is satisfied (`status` > 0).
+
+    See Also
+    --------
+    leastsq : A legacy wrapper for the MINPACK implementation of the
+              Levenberg-Marquardt algorithm.
+    curve_fit : Least-squares minimization applied to a curve-fitting problem.
+
+    Notes
+    -----
+    Method 'lm' (Levenberg-Marquardt) calls a wrapper over least-squares
+    algorithms implemented in MINPACK (lmder, lmdif). It runs the
+    Levenberg-Marquardt algorithm formulated as a trust-region type algorithm.
+    The implementation is based on the paper [JJMore]_; it is very robust and
+    efficient with a lot of smart tricks. It should be your first choice
+    for unconstrained problems. Note that it doesn't support bounds. Also,
+    it doesn't work when m < n.
+
+    Method 'trf' (Trust Region Reflective) is motivated by the process of
+    solving a system of equations, which constitute the first-order optimality
+    condition for a bound-constrained minimization problem as formulated in
+    [STIR]_. The algorithm iteratively solves trust-region subproblems
+    augmented by a special diagonal quadratic term and with trust-region shape
+    determined by the distance from the bounds and the direction of the
+    gradient. These enhancements help avoid making steps directly into bounds
+    and efficiently explore the whole space of variables. To further improve
+    convergence, the algorithm considers search directions reflected from the
+    bounds. To obey theoretical requirements, the algorithm keeps iterates
+    strictly feasible. With dense Jacobians trust-region subproblems are
+    solved by an exact method very similar to the one described in [JJMore]_
+    (and implemented in MINPACK). The difference from the MINPACK
+    implementation is that a singular value decomposition of a Jacobian
+    matrix is done once per iteration, instead of a QR decomposition and series
+    of Givens rotation eliminations. For large sparse Jacobians a 2-D subspace
+    approach of solving trust-region subproblems is used [STIR]_, [Byrd]_.
+    The subspace is spanned by a scaled gradient and an approximate
+    Gauss-Newton solution delivered by `scipy.sparse.linalg.lsmr`. When no
+    constraints are imposed the algorithm is very similar to MINPACK and has
+    generally comparable performance. The algorithm is quite robust in
+    unbounded and bounded problems, thus it is chosen as a default algorithm.
+
+    Method 'dogbox' operates in a trust-region framework, but considers
+    rectangular trust regions as opposed to conventional ellipsoids [Voglis]_.
+    The intersection of a current trust region and initial bounds is again
+    rectangular, so on each iteration a quadratic minimization problem subject
+    to bound constraints is solved approximately by Powell's dogleg method
+    [NumOpt]_. The required Gauss-Newton step can be computed exactly for
+    dense Jacobians or approximately by `scipy.sparse.linalg.lsmr` for large
+    sparse Jacobians. The algorithm is likely to exhibit slow convergence when
+    the rank of Jacobian is less than the number of variables. The algorithm
+    often outperforms 'trf' in bounded problems with a small number of
+    variables.
+
+    Robust loss functions are implemented as described in [BA]_. The idea
+    is to modify a residual vector and a Jacobian matrix on each iteration
+    such that computed gradient and Gauss-Newton Hessian approximation match
+    the true gradient and Hessian approximation of the cost function. Then
+    the algorithm proceeds in a normal way, i.e., robust loss functions are
+    implemented as a simple wrapper over standard least-squares algorithms.
+
+    .. versionadded:: 0.17.0
+
+    References
+    ----------
+    .. [STIR] M. A. Branch, T. F. Coleman, and Y. Li, "A Subspace, Interior,
+              and Conjugate Gradient Method for Large-Scale Bound-Constrained
+              Minimization Problems," SIAM Journal on Scientific Computing,
+              Vol. 21, Number 1, pp 1-23, 1999.
+    .. [NR] William H. Press et al., "Numerical Recipes. The Art of Scientific
+            Computing. 3rd edition", Sec. 5.7.
+    .. [Byrd] R. H. Byrd, R. B. Schnabel and G. A. Shultz, "Approximate
+              solution of the trust region problem by minimization over
+              two-dimensional subspaces", Math. Programming, 40, pp. 247-263,
+              1988.
+    .. [Curtis] A. Curtis, M. J. D. Powell, and J. Reid, "On the estimation of
+                sparse Jacobian matrices", Journal of the Institute of
+                Mathematics and its Applications, 13, pp. 117-120, 1974.
+    .. [JJMore] J. J. More, "The Levenberg-Marquardt Algorithm: Implementation
+                and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
+                Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
+    .. [Voglis] C. Voglis and I. E. Lagaris, "A Rectangular Trust Region
+                Dogleg Approach for Unconstrained and Bound Constrained
+                Nonlinear Optimization", WSEAS International Conference on
+                Applied Mathematics, Corfu, Greece, 2004.
+    .. [NumOpt] J. Nocedal and S. J. Wright, "Numerical optimization,
+                2nd edition", Chapter 4.
+    .. [BA] B. Triggs et al., "Bundle Adjustment - A Modern Synthesis",
+            Proceedings of the International Workshop on Vision Algorithms:
+            Theory and Practice, pp. 298-372, 1999.
+
+    Examples
+    --------
+    In this example we find a minimum of the Rosenbrock function without bounds
+    on independent variables.
+
+    >>> import numpy as np
+    >>> def fun_rosenbrock(x):
+    ...     return np.array([10 * (x[1] - x[0]**2), (1 - x[0])])
+
+    Notice that we only provide the vector of the residuals. The algorithm
+    constructs the cost function as a sum of squares of the residuals, which
+    gives the Rosenbrock function. The exact minimum is at ``x = [1.0, 1.0]``.
+
+    >>> from scipy.optimize import least_squares
+    >>> x0_rosenbrock = np.array([2, 2])
+    >>> res_1 = least_squares(fun_rosenbrock, x0_rosenbrock)
+    >>> res_1.x
+    array([ 1.,  1.])
+    >>> res_1.cost
+    9.8669242910846867e-30
+    >>> res_1.optimality
+    8.8928864934219529e-14
+
+    We now constrain the variables, in such a way that the previous solution
+    becomes infeasible. Specifically, we require that ``x[1] >= 1.5``, and
+    ``x[0]`` left unconstrained. To this end, we specify the `bounds` parameter
+    to `least_squares` in the form ``bounds=([-np.inf, 1.5], np.inf)``.
+
+    We also provide the analytic Jacobian:
+
+    >>> def jac_rosenbrock(x):
+    ...     return np.array([
+    ...         [-20 * x[0], 10],
+    ...         [-1, 0]])
+
+    Putting this all together, we see that the new solution lies on the bound:
+
+    >>> res_2 = least_squares(fun_rosenbrock, x0_rosenbrock, jac_rosenbrock,
+    ...                       bounds=([-np.inf, 1.5], np.inf))
+    >>> res_2.x
+    array([ 1.22437075,  1.5       ])
+    >>> res_2.cost
+    0.025213093946805685
+    >>> res_2.optimality
+    1.5885401433157753e-07
+
+    Now we solve a system of equations (i.e., the cost function should be zero
+    at a minimum) for a Broyden tridiagonal vector-valued function of 100000
+    variables:
+
+    >>> def fun_broyden(x):
+    ...     f = (3 - x) * x + 1
+    ...     f[1:] -= x[:-1]
+    ...     f[:-1] -= 2 * x[1:]
+    ...     return f
+
+    The corresponding Jacobian matrix is sparse. We tell the algorithm to
+    estimate it by finite differences and provide the sparsity structure of
+    Jacobian to significantly speed up this process.
+
+    >>> from scipy.sparse import lil_matrix
+    >>> def sparsity_broyden(n):
+    ...     sparsity = lil_matrix((n, n), dtype=int)
+    ...     i = np.arange(n)
+    ...     sparsity[i, i] = 1
+    ...     i = np.arange(1, n)
+    ...     sparsity[i, i - 1] = 1
+    ...     i = np.arange(n - 1)
+    ...     sparsity[i, i + 1] = 1
+    ...     return sparsity
+    ...
+    >>> n = 100000
+    >>> x0_broyden = -np.ones(n)
+    ...
+    >>> res_3 = least_squares(fun_broyden, x0_broyden,
+    ...                       jac_sparsity=sparsity_broyden(n))
+    >>> res_3.cost
+    4.5687069299604613e-23
+    >>> res_3.optimality
+    1.1650454296851518e-11
+
+    Let's also solve a curve fitting problem using a robust loss function to
+    take care of outliers in the data. Define the model function as
+    ``y = a + b * exp(c * t)``, where t is a predictor variable, y is an
+    observation and a, b, c are parameters to estimate.
+
+    First, define the function which generates the data with noise and
+    outliers, define the model parameters, and generate data:
+
+    >>> from numpy.random import default_rng
+    >>> rng = default_rng()
+    >>> def gen_data(t, a, b, c, noise=0., n_outliers=0, seed=None):
+    ...     rng = default_rng(seed)
+    ...
+    ...     y = a + b * np.exp(t * c)
+    ...
+    ...     error = noise * rng.standard_normal(t.size)
+    ...     outliers = rng.integers(0, t.size, n_outliers)
+    ...     error[outliers] *= 10
+    ...
+    ...     return y + error
+    ...
+    >>> a = 0.5
+    >>> b = 2.0
+    >>> c = -1
+    >>> t_min = 0
+    >>> t_max = 10
+    >>> n_points = 15
+    ...
+    >>> t_train = np.linspace(t_min, t_max, n_points)
+    >>> y_train = gen_data(t_train, a, b, c, noise=0.1, n_outliers=3)
+
+    Define the function for computing residuals and the initial estimate of
+    the parameters:
+
+    >>> def fun(x, t, y):
+    ...     return x[0] + x[1] * np.exp(x[2] * t) - y
+    ...
+    >>> x0 = np.array([1.0, 1.0, 0.0])
+
+    Compute a standard least-squares solution:
+
+    >>> res_lsq = least_squares(fun, x0, args=(t_train, y_train))
+
+    Now compute two solutions with two different robust loss functions. The
+    parameter `f_scale` is set to 0.1, meaning that inlier residuals should
+    not significantly exceed 0.1 (the noise level used).
+
+    >>> res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1,
+    ...                             args=(t_train, y_train))
+    >>> res_log = least_squares(fun, x0, loss='cauchy', f_scale=0.1,
+    ...                         args=(t_train, y_train))
+
+    And, finally, plot all the curves. We see that by selecting an appropriate
+    `loss` we can get estimates close to optimal even in the presence of
+    strong outliers. But keep in mind that generally it is recommended to try
+    'soft_l1' or 'huber' losses first (if at all necessary) as the other two
+    options may cause difficulties in optimization process.
+
+    >>> t_test = np.linspace(t_min, t_max, n_points * 10)
+    >>> y_true = gen_data(t_test, a, b, c)
+    >>> y_lsq = gen_data(t_test, *res_lsq.x)
+    >>> y_soft_l1 = gen_data(t_test, *res_soft_l1.x)
+    >>> y_log = gen_data(t_test, *res_log.x)
+    ...
+    >>> import matplotlib.pyplot as plt
+    >>> plt.plot(t_train, y_train, 'o')
+    >>> plt.plot(t_test, y_true, 'k', linewidth=2, label='true')
+    >>> plt.plot(t_test, y_lsq, label='linear loss')
+    >>> plt.plot(t_test, y_soft_l1, label='soft_l1 loss')
+    >>> plt.plot(t_test, y_log, label='cauchy loss')
+    >>> plt.xlabel("t")
+    >>> plt.ylabel("y")
+    >>> plt.legend()
+    >>> plt.show()
+
+    In the next example, we show how complex-valued residual functions of
+    complex variables can be optimized with ``least_squares()``. Consider the
+    following function:
+
+    >>> def f(z):
+    ...     return z - (0.5 + 0.5j)
+
+    We wrap it into a function of real variables that returns real residuals
+    by simply handling the real and imaginary parts as independent variables:
+
+    >>> def f_wrap(x):
+    ...     fx = f(x[0] + 1j*x[1])
+    ...     return np.array([fx.real, fx.imag])
+
+    Thus, instead of the original m-D complex function of n complex
+    variables we optimize a 2m-D real function of 2n real variables:
+
+    >>> from scipy.optimize import least_squares
+    >>> res_wrapped = least_squares(f_wrap, (0.1, 0.1), bounds=([0, 0], [1, 1]))
+    >>> z = res_wrapped.x[0] + res_wrapped.x[1]*1j
+    >>> z
+    (0.49999999999925893+0.49999999999925893j)
+
+    """
+    if method not in ['trf', 'dogbox', 'lm']:
+        raise ValueError("`method` must be 'trf', 'dogbox' or 'lm'.")
+
+    if jac not in ['2-point', '3-point', 'cs'] and not callable(jac):
+        raise ValueError("`jac` must be '2-point', '3-point', 'cs' or "
+                         "callable.")
+
+    if tr_solver not in [None, 'exact', 'lsmr']:
+        raise ValueError("`tr_solver` must be None, 'exact' or 'lsmr'.")
+
+    if loss not in IMPLEMENTED_LOSSES and not callable(loss):
+        raise ValueError("`loss` must be one of {} or a callable."
+                         .format(IMPLEMENTED_LOSSES.keys()))
+
+    if method == 'lm' and loss != 'linear':
+        raise ValueError("method='lm' supports only 'linear' loss function.")
+
+    if verbose not in [0, 1, 2]:
+        raise ValueError("`verbose` must be in [0, 1, 2].")
+
+    if max_nfev is not None and max_nfev <= 0:
+        raise ValueError("`max_nfev` must be None or positive integer.")
+
+    if np.iscomplexobj(x0):
+        raise ValueError("`x0` must be real.")
+
+    x0 = np.atleast_1d(x0).astype(float)
+
+    if x0.ndim > 1:
+        raise ValueError("`x0` must have at most 1 dimension.")
+
+    if isinstance(bounds, Bounds):
+        lb, ub = bounds.lb, bounds.ub
+        bounds = (lb, ub)
+    else:
+        if len(bounds) == 2:
+            lb, ub = prepare_bounds(bounds, x0.shape[0])
+        else:
+            raise ValueError("`bounds` must contain 2 elements.")
+
+    if method == 'lm' and not np.all((lb == -np.inf) & (ub == np.inf)):
+        raise ValueError("Method 'lm' doesn't support bounds.")
+
+    if lb.shape != x0.shape or ub.shape != x0.shape:
+        raise ValueError("Inconsistent shapes between bounds and `x0`.")
+
+    if np.any(lb >= ub):
+        raise ValueError("Each lower bound must be strictly less than each "
+                         "upper bound.")
+
+    if not in_bounds(x0, lb, ub):
+        raise ValueError("`x0` is infeasible.")
+
+    x_scale = check_x_scale(x_scale, x0)
+
+    ftol, xtol, gtol = check_tolerance(ftol, xtol, gtol, method)
+
+    if method == 'trf':
+        x0 = make_strictly_feasible(x0, lb, ub)
+
+    def fun_wrapped(x):
+        return np.atleast_1d(fun(x, *args, **kwargs))
+
+    f0 = fun_wrapped(x0)
+
+    if f0.ndim != 1:
+        raise ValueError("`fun` must return at most 1-d array_like. "
+                         f"f0.shape: {f0.shape}")
+
+    if not np.all(np.isfinite(f0)):
+        raise ValueError("Residuals are not finite in the initial point.")
+
+    n = x0.size
+    m = f0.size
+
+    if method == 'lm' and m < n:
+        raise ValueError("Method 'lm' doesn't work when the number of "
+                         "residuals is less than the number of variables.")
+
+    loss_function = construct_loss_function(m, loss, f_scale)
+    if callable(loss):
+        rho = loss_function(f0)
+        if rho.shape != (3, m):
+            raise ValueError("The return value of `loss` callable has wrong "
+                             "shape.")
+        initial_cost = 0.5 * np.sum(rho[0])
+    elif loss_function is not None:
+        initial_cost = loss_function(f0, cost_only=True)
+    else:
+        initial_cost = 0.5 * np.dot(f0, f0)
+
+    if callable(jac):
+        J0 = jac(x0, *args, **kwargs)
+
+        if issparse(J0):
+            J0 = J0.tocsr()
+
+            def jac_wrapped(x, _=None):
+                return jac(x, *args, **kwargs).tocsr()
+
+        elif isinstance(J0, LinearOperator):
+            def jac_wrapped(x, _=None):
+                return jac(x, *args, **kwargs)
+
+        else:
+            J0 = np.atleast_2d(J0)
+
+            def jac_wrapped(x, _=None):
+                return np.atleast_2d(jac(x, *args, **kwargs))
+
+    else:  # Estimate Jacobian by finite differences.
+        if method == 'lm':
+            if jac_sparsity is not None:
+                raise ValueError("method='lm' does not support "
+                                 "`jac_sparsity`.")
+
+            if jac != '2-point':
+                warn(f"jac='{jac}' works equivalently to '2-point' for method='lm'.",
+                     stacklevel=2)
+
+            J0 = jac_wrapped = None
+        else:
+            if jac_sparsity is not None and tr_solver == 'exact':
+                raise ValueError("tr_solver='exact' is incompatible "
+                                 "with `jac_sparsity`.")
+
+            jac_sparsity = check_jac_sparsity(jac_sparsity, m, n)
+
+            def jac_wrapped(x, f):
+                J = approx_derivative(fun, x, rel_step=diff_step, method=jac,
+                                      f0=f, bounds=bounds, args=args,
+                                      kwargs=kwargs, sparsity=jac_sparsity)
+                if J.ndim != 2:  # J is guaranteed not sparse.
+                    J = np.atleast_2d(J)
+
+                return J
+
+            J0 = jac_wrapped(x0, f0)
+
+    if J0 is not None:
+        if J0.shape != (m, n):
+            raise ValueError(
+                f"The return value of `jac` has wrong shape: expected {(m, n)}, "
+                f"actual {J0.shape}."
+            )
+
+        if not isinstance(J0, np.ndarray):
+            if method == 'lm':
+                raise ValueError("method='lm' works only with dense "
+                                 "Jacobian matrices.")
+
+            if tr_solver == 'exact':
+                raise ValueError(
+                    "tr_solver='exact' works only with dense "
+                    "Jacobian matrices.")
+
+        jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
+        if isinstance(J0, LinearOperator) and jac_scale:
+            raise ValueError("x_scale='jac' can't be used when `jac` "
+                             "returns LinearOperator.")
+
+        if tr_solver is None:
+            if isinstance(J0, np.ndarray):
+                tr_solver = 'exact'
+            else:
+                tr_solver = 'lsmr'
+
+    if method == 'lm':
+        result = call_minpack(fun_wrapped, x0, jac_wrapped, ftol, xtol, gtol,
+                              max_nfev, x_scale, diff_step)
+
+    elif method == 'trf':
+        result = trf(fun_wrapped, jac_wrapped, x0, f0, J0, lb, ub, ftol, xtol,
+                     gtol, max_nfev, x_scale, loss_function, tr_solver,
+                     tr_options.copy(), verbose)
+
+    elif method == 'dogbox':
+        if tr_solver == 'lsmr' and 'regularize' in tr_options:
+            warn("The keyword 'regularize' in `tr_options` is not relevant "
+                 "for 'dogbox' method.",
+                 stacklevel=2)
+            tr_options = tr_options.copy()
+            del tr_options['regularize']
+
+        result = dogbox(fun_wrapped, jac_wrapped, x0, f0, J0, lb, ub, ftol,
+                        xtol, gtol, max_nfev, x_scale, loss_function,
+                        tr_solver, tr_options, verbose)
+
+    result.message = TERMINATION_MESSAGES[result.status]
+    result.success = result.status > 0
+
+    if verbose >= 1:
+        print(result.message)
+        print("Function evaluations {}, initial cost {:.4e}, final cost "
+              "{:.4e}, first-order optimality {:.2e}."
+              .format(result.nfev, initial_cost, result.cost,
+                      result.optimality))
+
+    return result
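
The input validation and result fields of `least_squares` shown above can be exercised with a short sketch. It is an illustration only, not part of the patch: it assumes an installed SciPy that exposes `scipy.optimize.least_squares` with this interface, and the residual function `residuals` is a hypothetical stand-in.

```python
import numpy as np
from scipy.optimize import least_squares


def residuals(x):
    # Hypothetical residual vector with an exact zero at x = [1, 2].
    return np.array([x[0] - 1.0, x[1] - 2.0])


# The default method ('trf') accepts bounds; here the unconstrained
# minimum lies inside the box, so the solver should converge to it.
res = least_squares(residuals, x0=[0.5, 0.5],
                    bounds=([0.0, 0.0], [3.0, 3.0]))

# method='lm' does not support bounds, so finite bounds are rejected
# during input validation with a ValueError.
try:
    least_squares(residuals, x0=[0.5, 0.5], method='lm',
                  bounds=([0.0, 0.0], [3.0, 3.0]))
    lm_error = None
except ValueError as exc:
    lm_error = str(exc)
```

After the call, `res.status` holds the termination code, `res.message` the matching text, and `res.success` is true whenever `res.status > 0`.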
@@ -0,0 +1,362 @@
+"""Linear least squares with bound constraints on independent variables."""
+import numpy as np
+from numpy.linalg import norm
+from scipy.sparse import issparse, csr_matrix
+from scipy.sparse.linalg import LinearOperator, lsmr
+from scipy.optimize import OptimizeResult
+from scipy.optimize._minimize import Bounds
+
+from .common import in_bounds, compute_grad
+from .trf_linear import trf_linear
+from .bvls import bvls
+
+
+def prepare_bounds(bounds, n):
+    if len(bounds) != 2:
+        raise ValueError("`bounds` must contain 2 elements.")
+    lb, ub = (np.asarray(b, dtype=float) for b in bounds)
+
+    if lb.ndim == 0:
+        lb = np.resize(lb, n)
+
+    if ub.ndim == 0:
+        ub = np.resize(ub, n)
+
+    return lb, ub
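
The scalar expansion performed by `prepare_bounds` can be sketched in isolation. This is a standalone illustration rather than part of the patch; `broadcast_bounds` is a hypothetical name that mirrors the logic above.

```python
import numpy as np


def broadcast_bounds(bounds, n):
    # Hypothetical stand-in mirroring the scalar-expansion logic above.
    lb, ub = (np.asarray(b, dtype=float) for b in bounds)
    if lb.ndim == 0:
        # A scalar bound is repeated for all n variables.
        lb = np.resize(lb, n)
    if ub.ndim == 0:
        ub = np.resize(ub, n)
    return lb, ub


# Per-variable lower bounds combined with a single scalar upper bound.
lb, ub = broadcast_bounds(([-np.inf, 1.5], np.inf), 2)
```

Only 0-d inputs are expanded; an array of the wrong length passes through unchanged and has to be caught by the caller's shape check.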
+
+
+TERMINATION_MESSAGES = {
+    -1: "The algorithm was not able to make progress on the last iteration.",
+    0: "The maximum number of iterations is exceeded.",
+    1: "The first-order optimality measure is less than `tol`.",
+    2: "The relative change of the cost function is less than `tol`.",
+    3: "The unconstrained solution is optimal."
+}
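
Status code 3 in the table above covers the case where the unconstrained solution already satisfies the bounds. A minimal sketch of that case, assuming an installed SciPy that provides `scipy.optimize.lsq_linear`; the small system `A`, `b` is made up for illustration.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Illustrative overdetermined system whose exact solution is x = [1, 2].
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# Wide bounds: the unconstrained least-squares solution is feasible,
# so it is returned directly without any iterations.
res_free = lsq_linear(A, b, bounds=(-10.0, 10.0))

# Tight upper bound on x[1]: the unconstrained solution is infeasible,
# so the bounded solver has to iterate.
res_tight = lsq_linear(A, b, bounds=([-10.0, -10.0], [10.0, 1.0]))
```

With `x[1]` held at its upper bound of 1, the remaining one-dimensional problem is minimized at `x[0] = 1.5`, which is where `res_tight.x` should land (up to solver tolerance).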
+
+
+def lsq_linear(A, b, bounds=(-np.inf, np.inf), method='trf', tol=1e-10,
+               lsq_solver=None, lsmr_tol=None, max_iter=None,
+               verbose=0, *, lsmr_maxiter=None,):
+    r"""Solve a linear least-squares problem with bounds on the variables.
+
+    Given a m-by-n design matrix A and a target vector b with m elements,
+    `lsq_linear` solves the following optimization problem::
+
+        minimize 0.5 * ||A x - b||**2
+        subject to lb <= x <= ub
+
+    This optimization problem is convex, hence a found minimum (if iterations
+    have converged) is guaranteed to be global.
+
+    Parameters
+    ----------
+    A : array_like, sparse matrix or LinearOperator, shape (m, n)
+        Design matrix. Can be `scipy.sparse.linalg.LinearOperator`.
+    b : array_like, shape (m,)
+        Target vector.
+    bounds : 2-tuple of array_like or `Bounds`, optional
+        Lower and upper bounds on parameters. Defaults to no bounds.
+        There are two ways to specify the bounds:
+
+            - Instance of `Bounds` class.
+
+            - 2-tuple of array_like: Each element of the tuple must be either
+              an array with the length equal to the number of parameters, or a
+              scalar (in which case the bound is taken to be the same for all
+              parameters). Use ``np.inf`` with an appropriate sign to disable
+              bounds on all or some parameters.
+
+    method : 'trf' or 'bvls', optional
+        Method to perform minimization.
+
+            * 'trf' : Trust Region Reflective algorithm adapted for a linear
+              least-squares problem. This is an interior-point-like method
+              and the required number of iterations is weakly correlated with
+              the number of variables.
+            * 'bvls' : Bounded-variable least-squares algorithm. This is
+              an active set method, which requires the number of iterations
+              comparable to the number of variables. Can't be used when `A` is
+              sparse or LinearOperator.
+
+        Default is 'trf'.
+    tol : float, optional
+        Tolerance parameter. The algorithm terminates if a relative change
+        of the cost function is less than `tol` on the last iteration.
+        Additionally, the first-order optimality measure is considered:
+
+            * ``method='trf'`` terminates if the uniform norm of the gradient,
+              scaled to account for the presence of the bounds, is less than
+              `tol`.
+            * ``method='bvls'`` terminates if Karush-Kuhn-Tucker conditions
+              are satisfied within `tol` tolerance.
+
+    lsq_solver : {None, 'exact', 'lsmr'}, optional
+        Method of solving unbounded least-squares problems throughout
+        iterations:
+
+            * 'exact' : Use dense QR or SVD decomposition approach. Can't be
+              used when `A` is sparse or LinearOperator.
+            * 'lsmr' : Use `scipy.sparse.linalg.lsmr` iterative procedure
+              which requires only matrix-vector product evaluations. Can't
+              be used with ``method='bvls'``.
+
+        If None (default), the solver is chosen based on type of `A`.
+    lsmr_tol : None, float or 'auto', optional
+        Tolerance parameters 'atol' and 'btol' for `scipy.sparse.linalg.lsmr`.
+        If None (default), it is set to ``1e-2 * tol``. If 'auto', the
+        tolerance will be adjusted based on the optimality of the current
+        iterate, which can speed up the optimization process, but is not always
+        reliable.
+    max_iter : None or int, optional
+        Maximum number of iterations before termination. If None (default), it
+        is set to 100 for ``method='trf'`` or to the number of variables for
+        ``method='bvls'`` (not counting iterations for 'bvls' initialization).
+    verbose : {0, 1, 2}, optional
+        Level of algorithm's verbosity:
+
+            * 0 : work silently (default).
+            * 1 : display a termination report.
+            * 2 : display progress during iterations.
+    lsmr_maxiter : None or int, optional
+        Maximum number of iterations for the lsmr least squares solver,
+        if it is used (by setting ``lsq_solver='lsmr'``). If None (default), it
+        uses lsmr's default of ``min(m, n)`` where ``m`` and ``n`` are the
+        number of rows and columns of `A`, respectively. Has no effect if
+        ``lsq_solver='exact'``.
+
+    Returns
+    -------
+    OptimizeResult with the following fields defined:
+    x : ndarray, shape (n,)
+        Solution found.
+    cost : float
+        Value of the cost function at the solution.
+    fun : ndarray, shape (m,)
+        Vector of residuals at the solution.
+    optimality : float
+        First-order optimality measure. The exact meaning depends on `method`,
+        refer to the description of `tol` parameter.
+    active_mask : ndarray of int, shape (n,)
+        Each component shows whether a corresponding constraint is active
+        (that is, whether a variable is at the bound):
+
+            *  0 : a constraint is not active.
+            * -1 : a lower bound is active.
+            *  1 : an upper bound is active.
+
+        Might be somewhat arbitrary for the `trf` method as it generates a
+        sequence of strictly feasible iterates and active_mask is determined
+        within a tolerance threshold.
+    unbounded_sol : tuple
+        Unbounded least squares solution tuple returned by the least squares
+        solver (set with `lsq_solver` option). If `lsq_solver` is not set or is
+        set to ``'exact'``, the tuple contains an ndarray of shape (n,) with
+        the unbounded solution, an ndarray with the sum of squared residuals,
+        an int with the rank of `A`, and an ndarray with the singular values
+        of `A` (see NumPy's ``linalg.lstsq`` for more information). If
+        `lsq_solver` is set to ``'lsmr'``, the tuple contains an ndarray of
+        shape (n,) with the unbounded solution, an int with the exit code,
+        an int with the number of iterations, and five floats with
+        various norms and the condition number of `A` (see SciPy's
+        ``sparse.linalg.lsmr`` for more information). This output can be
+        useful for determining the convergence of the least squares solver,
+        particularly the iterative ``'lsmr'`` solver. The unbounded least
+        squares problem is to minimize ``0.5 * ||A x - b||**2``.
+    nit : int
+        Number of iterations. Zero if the unconstrained solution is optimal.
+    status : int
+        Reason for algorithm termination:
+
+            * -1 : the algorithm was not able to make progress on the last
+              iteration.
+            *  0 : the maximum number of iterations is exceeded.
+            *  1 : the first-order optimality measure is less than `tol`.
+            *  2 : the relative change of the cost function is less than `tol`.
+            *  3 : the unconstrained solution is optimal.
+
+    message : str
+        Verbal description of the termination reason.
+    success : bool
+        True if one of the convergence criteria is satisfied (`status` > 0).
+
+    See Also
+    --------
+    nnls : Linear least squares with non-negativity constraint.
+    least_squares : Nonlinear least squares with bounds on the variables.
+
+    Notes
+    -----
+    The algorithm first computes the unconstrained least-squares solution by
+    `numpy.linalg.lstsq` or `scipy.sparse.linalg.lsmr` depending on
+    `lsq_solver`. This solution is returned as optimal if it lies within the
+    bounds.
+
+    Method 'trf' runs the adaptation of the algorithm described in [STIR]_ for
+    a linear least-squares problem. The iterations are essentially the same as
+    in the nonlinear least-squares algorithm, but as the quadratic function
+    model is always accurate, we don't need to track or modify the radius of
+    a trust region. The line search (backtracking) is used as a safety net
+    when a selected step does not decrease the cost function. A more detailed
+    description of the algorithm is given in `scipy.optimize.least_squares`.
+
+    Method 'bvls' runs a Python implementation of the algorithm described in
+    [BVLS]_. The algorithm maintains active and free sets of variables, on
+    each iteration chooses a new variable to move from the active set to the
+    free set and then solves the unconstrained least-squares problem on free
+    variables. This algorithm is guaranteed to give an accurate solution
+    eventually, but may require up to n iterations for a problem with n
+    variables. Additionally, an ad-hoc initialization procedure is
+    implemented, that determines which variables to set free or active
+    initially. It takes some number of iterations before actual BVLS starts,
+    but can significantly reduce the number of further iterations.
+
+    References
+    ----------
+    .. [STIR] M. A. Branch, T. F. Coleman, and Y. Li, "A Subspace, Interior,
+              and Conjugate Gradient Method for Large-Scale Bound-Constrained
+              Minimization Problems," SIAM Journal on Scientific Computing,
+              Vol. 21, Number 1, pp 1-23, 1999.
+    .. [BVLS] P. B. Stark and R. L. Parker, "Bounded-Variable Least-Squares:
+              an Algorithm and Applications", Computational Statistics, 10,
+              129-141, 1995.
+
+    Examples
+    --------
+    In this example, a problem with a large sparse matrix and bounds on the
+    variables is solved.
+
+    >>> import numpy as np
+    >>> from scipy.sparse import rand
+    >>> from scipy.optimize import lsq_linear
+    >>> rng = np.random.default_rng()
+    ...
+    >>> m = 20000
+    >>> n = 10000
+    ...
+    >>> A = rand(m, n, density=1e-4, random_state=rng)
+    >>> b = rng.standard_normal(m)
+    ...
+    >>> lb = rng.standard_normal(n)
+    >>> ub = lb + 1
+    ...
+    >>> res = lsq_linear(A, b, bounds=(lb, ub), lsmr_tol='auto', verbose=1)
+    # may vary
+    The relative change of the cost function is less than `tol`.
+    Number of iterations 16, initial cost 1.5039e+04, final cost 1.1112e+04,
+    first-order optimality 4.66e-08.
+    """
+    if method not in ['trf', 'bvls']:
+        raise ValueError("`method` must be 'trf' or 'bvls'")
+
+    if lsq_solver not in [None, 'exact', 'lsmr']:
+        raise ValueError("`lsq_solver` must be None, 'exact' or 'lsmr'.")
+
+    if verbose not in [0, 1, 2]:
+        raise ValueError("`verbose` must be in [0, 1, 2].")
+
+    if issparse(A):
+        A = csr_matrix(A)
+    elif not isinstance(A, LinearOperator):
+        A = np.atleast_2d(np.asarray(A))
+
+    if method == 'bvls':
+        if lsq_solver == 'lsmr':
+            raise ValueError("method='bvls' can't be used with "
+                             "lsq_solver='lsmr'")
+
+        if not isinstance(A, np.ndarray):
+            raise ValueError("method='bvls' can't be used with `A` being "
+                             "sparse or LinearOperator.")
+
+    if lsq_solver is None:
+        if isinstance(A, np.ndarray):
+            lsq_solver = 'exact'
+        else:
+            lsq_solver = 'lsmr'
+    elif lsq_solver == 'exact' and not isinstance(A, np.ndarray):
+        raise ValueError("`exact` solver can't be used when `A` is "
+                         "sparse or LinearOperator.")
+
+    if len(A.shape) != 2:  # No ndim for LinearOperator.
+        raise ValueError("`A` must have at most 2 dimensions.")
+
+    if max_iter is not None and max_iter <= 0:
+        raise ValueError("`max_iter` must be None or positive integer.")
+
+    m, n = A.shape
+
+    b = np.atleast_1d(b)
+    if b.ndim != 1:
+        raise ValueError("`b` must have at most 1 dimension.")
+
+    if b.size != m:
+        raise ValueError("Inconsistent shapes between `A` and `b`.")
+
+    if isinstance(bounds, Bounds):
+        lb = bounds.lb
+        ub = bounds.ub
+    else:
+        lb, ub = prepare_bounds(bounds, n)
+
+    if lb.shape != (n,) or ub.shape != (n,):
+        raise ValueError("Bounds have wrong shape.")
+
+    if np.any(lb >= ub):
+        raise ValueError("Each lower bound must be strictly less than each "
+                         "upper bound.")
+
+    if lsmr_maxiter is not None and lsmr_maxiter < 1:
+        raise ValueError("`lsmr_maxiter` must be None or positive integer.")
+
+    if not ((isinstance(lsmr_tol, float) and lsmr_tol > 0) or
+            lsmr_tol in ('auto', None)):
+        raise ValueError("`lsmr_tol` must be None, 'auto', or positive float.")
+
+    if lsq_solver == 'exact':
+        unbd_lsq = np.linalg.lstsq(A, b, rcond=-1)
+    elif lsq_solver == 'lsmr':
+        first_lsmr_tol = lsmr_tol  # tol of first call to lsmr
+        if lsmr_tol is None or lsmr_tol == 'auto':
+            first_lsmr_tol = 1e-2 * tol  # default if lsmr_tol not defined
+        unbd_lsq = lsmr(A, b, maxiter=lsmr_maxiter,
+                        atol=first_lsmr_tol, btol=first_lsmr_tol)
+    x_lsq = unbd_lsq[0]  # extract the solution from the least squares solver
+
+    if in_bounds(x_lsq, lb, ub):
+        r = A @ x_lsq - b
+        cost = 0.5 * np.dot(r, r)
+        termination_status = 3
+        termination_message = TERMINATION_MESSAGES[termination_status]
+        g = compute_grad(A, r)
+        g_norm = norm(g, ord=np.inf)
+
+        if verbose > 0:
+            print(termination_message)
+            print(f"Final cost {cost:.4e}, first-order optimality {g_norm:.2e}")
+
+        return OptimizeResult(
+            x=x_lsq, fun=r, cost=cost, optimality=g_norm,
+            active_mask=np.zeros(n), unbounded_sol=unbd_lsq,
+            nit=0, status=termination_status,
+            message=termination_message, success=True)
+
+    if method == 'trf':
+        res = trf_linear(A, b, x_lsq, lb, ub, tol, lsq_solver, lsmr_tol,
+                         max_iter, verbose, lsmr_maxiter=lsmr_maxiter)
+    elif method == 'bvls':
+        res = bvls(A, b, x_lsq, lb, ub, tol, max_iter, verbose)
+
+    res.unbounded_sol = unbd_lsq
+    res.message = TERMINATION_MESSAGES[res.status]
+    res.success = res.status > 0
+
+    if verbose > 0:
+        print(res.message)
+        print(
+            f"Number of iterations {res.nit}, initial cost {res.initial_cost:.4e}, "
+            f"final cost {res.cost:.4e}, first-order optimality {res.optimality:.2e}."
+        )
+
+    del res.initial_cost
+
+    return res
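The validation and dispatch logic above can be exercised with a small dense problem. This is a minimal sketch, assuming the function whose body is shown is exposed publicly as `scipy.optimize.lsq_linear` (the name is not visible in this excerpt). The matrix, right-hand side and bounds are illustrative values only.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Small dense system whose unconstrained least-squares solution is x = [1, 2].
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# Wide bounds: the unconstrained solution is feasible, so the early-return
# branch is taken (status 3, zero iterations).
res_free = lsq_linear(A, b, bounds=(-10, 10))
print(res_free.status, res_free.nit, res_free.x)

# Tight bounds exclude x = [1, 2], so the selected method has to iterate.
res_box = lsq_linear(A, b, bounds=([0.0, 0.0], [0.5, 1.5]), method='trf')
print(res_box.status, res_box.x)

# Incompatible options are rejected by the input validation.
try:
    lsq_linear(A, b, method='bvls', lsq_solver='lsmr')
except ValueError as exc:
    validation_error = str(exc)
    print(validation_error)
```

The first call never reaches `trf_linear` or `bvls`, which is why `nit` is 0 there; the second call returns a point that stays inside the box.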
@@ -0,0 +1,560 @@
+"""Trust Region Reflective algorithm for least-squares optimization.
+
+The algorithm is based on ideas from paper [STIR]_. The main idea is to
+account for the presence of the bounds by appropriate scaling of the variables (or,
+equivalently, changing a trust-region shape). Let's introduce a vector v:
+
+           | ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf
+    v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf
+           | 1,           otherwise
+
+where g is the gradient of a cost function and lb, ub are the bounds. Its
+components are distances to the bounds at which the anti-gradient points (if
+this distance is finite). Define a scaling matrix D = diag(v**0.5).
+First-order optimality conditions can be stated as
+
+    D^2 g(x) = 0.
+
+This means that components of the gradient should be zero for strictly
+interior variables, and components must point inside the feasible region for
+variables on the bound.
+
+Now consider this system of equations as a new optimization problem. If the
+point x is strictly interior (not on the bound), then the left-hand side is
+differentiable and the Newton step for it satisfies
+
+    (D^2 H + diag(g) Jv) p = -D^2 g
+
+where H is the Hessian matrix (or its J^T J approximation in least squares),
+Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all
+elements of matrix C = diag(g) Jv are non-negative. Introduce the change
+of the variables x = D x_h (_h would be "hat" in LaTeX). In the new variables,
+we have a Newton step satisfying
+
+    B_h p_h = -g_h,
+
+where B_h = D H D + C, g_h = D g. In least squares B_h = J_h^T J_h, where
+J_h = J D. Note that J_h and g_h are proper Jacobian and gradient with respect
+to "hat" variables. To guarantee global convergence we formulate a
+trust-region problem based on the Newton step in the new variables:
+
+    0.5 * p_h^T B_h p_h + g_h^T p_h -> min, ||p_h|| <= Delta
+
+In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region
+problem is
+
+    0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta
+
+Here, the meaning of the matrix D becomes more clear: it alters the shape
+of a trust-region, such that large steps towards the bounds are not allowed.
+In the implementation, the trust-region problem is solved in "hat" space,
+but handling of the bounds is done in the original space (see below and read
+the code).
+
+The introduction of the matrix D does not allow us to ignore the bounds: the
+algorithm must keep iterates strictly feasible (to satisfy the aforementioned
+differentiability). The parameter theta controls the step back from the
+boundary (see the code for details).
+
+The algorithm uses another important trick. If the trust-region solution
+doesn't fit into the bounds, then a search direction reflected from the first
+encountered bound is considered. For motivation and analysis refer to the
+[STIR]_ paper (and other papers by the same authors). In practice, it doesn't
+need much justification; the algorithm simply chooses the best step among
+three: a constrained trust-region step, a reflected step and a constrained
+Cauchy step (a minimizer along -g_h in "hat" space, or -D^2 g in the original
+space).
+
+Another feature is that the trust-region radius control strategy is modified
+to account for the appearance of the diagonal C matrix (called diag_h in the
+code).
+
+Note that all of the described peculiarities disappear when we consider
+problems without bounds (the algorithm becomes a standard trust-region type
+algorithm very similar to the ones implemented in MINPACK).
+
+The implementation supports two methods of solving the trust-region problem.
+The first, called 'exact', applies SVD to the Jacobian and then solves the
+problem very accurately using the algorithm described in [JJMore]_. It is not
+applicable to large problems. The second, called 'lsmr', uses the 2-D subspace
+approach (sometimes called "indefinite dogleg"), where the problem is solved
+in a subspace spanned by the gradient and the approximate Gauss-Newton step
+found by ``scipy.sparse.linalg.lsmr``. A 2-D trust-region problem is
+reformulated as a 4th order algebraic equation and solved very accurately by
+``numpy.roots``. The subspace approach allows solving very large problems
+(up to a couple of million residuals on a regular PC), provided the Jacobian
+matrix is sufficiently sparse.
+
+References
+----------
+.. [STIR] Branch, M.A., T.F. Coleman, and Y. Li, "A Subspace, Interior,
+      and Conjugate Gradient Method for Large-Scale Bound-Constrained
+      Minimization Problems," SIAM Journal on Scientific Computing,
+      Vol. 21, Number 1, pp 1-23, 1999.
+.. [JJMore] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
+    and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
+"""
+import numpy as np
+from numpy.linalg import norm
+from scipy.linalg import svd, qr
+from scipy.sparse.linalg import lsmr
+from scipy.optimize import OptimizeResult
+
+from .common import (
+    step_size_to_bound, find_active_constraints, in_bounds,
+    make_strictly_feasible, intersect_trust_region, solve_lsq_trust_region,
+    solve_trust_region_2d, minimize_quadratic_1d, build_quadratic_1d,
+    evaluate_quadratic, right_multiplied_operator, regularized_lsq_operator,
+    CL_scaling_vector, compute_grad, compute_jac_scale, check_termination,
+    update_tr_radius, scale_for_robust_loss_function, print_header_nonlinear,
+    print_iteration_nonlinear)
+
+
+def trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
+        loss_function, tr_solver, tr_options, verbose):
+    # For efficiency, it makes sense to run the simplified version of the
+    # algorithm when no bounds are imposed. We decided to write two separate
+    # functions. This violates the DRY principle, but keeps each function
+    # as readable as possible.
+    if np.all(lb == -np.inf) and np.all(ub == np.inf):
+        return trf_no_bounds(
+            fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, x_scale,
+            loss_function, tr_solver, tr_options, verbose)
+    else:
+        return trf_bounds(
+            fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
+            loss_function, tr_solver, tr_options, verbose)
+
+
+def select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta):
+    """Select the best step according to Trust Region Reflective algorithm."""
+    if in_bounds(x + p, lb, ub):
+        p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)
+        return p, p_h, -p_value
+
+    p_stride, hits = step_size_to_bound(x, p, lb, ub)
+
+    # Compute the reflected direction.
+    r_h = np.copy(p_h)
+    r_h[hits.astype(bool)] *= -1
+    r = d * r_h
+
+    # Restrict trust-region step, such that it hits the bound.
+    p *= p_stride
+    p_h *= p_stride
+    x_on_bound = x + p
+
+    # The reflected direction will first cross either the feasible region
+    # boundary or the trust-region boundary.
+    _, to_tr = intersect_trust_region(p_h, r_h, Delta)
+    to_bound, _ = step_size_to_bound(x_on_bound, r, lb, ub)
+
+    # Find lower and upper bounds on a step size along the reflected
+    # direction, considering the strict feasibility requirement. There is no
+    # single correct way to do that, the chosen approach seems to work best
+    # on test problems.
+    r_stride = min(to_bound, to_tr)
+    if r_stride > 0:
+        r_stride_l = (1 - theta) * p_stride / r_stride
+        if r_stride == to_bound:
+            r_stride_u = theta * to_bound
+        else:
+            r_stride_u = to_tr
+    else:
+        r_stride_l = 0
+        r_stride_u = -1
+
+    # Check if reflection step is available.
+    if r_stride_l <= r_stride_u:
+        a, b, c = build_quadratic_1d(J_h, g_h, r_h, s0=p_h, diag=diag_h)
+        r_stride, r_value = minimize_quadratic_1d(
+            a, b, r_stride_l, r_stride_u, c=c)
+        r_h *= r_stride
+        r_h += p_h
+        r = r_h * d
+    else:
+        r_value = np.inf
+
+    # Now correct p_h to make it strictly interior.
+    p *= theta
+    p_h *= theta
+    p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)
+
+    ag_h = -g_h
+    ag = d * ag_h
+
+    to_tr = Delta / norm(ag_h)
+    to_bound, _ = step_size_to_bound(x, ag, lb, ub)
+    if to_bound < to_tr:
+        ag_stride = theta * to_bound
+    else:
+        ag_stride = to_tr
+
+    a, b = build_quadratic_1d(J_h, g_h, ag_h, diag=diag_h)
+    ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride)
+    ag_h *= ag_stride
+    ag *= ag_stride
+
+    if p_value < r_value and p_value < ag_value:
+        return p, p_h, -p_value
+    elif r_value < p_value and r_value < ag_value:
+        return r, r_h, -r_value
+    else:
+        return ag, ag_h, -ag_value
+
+
+def trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev,
+               x_scale, loss_function, tr_solver, tr_options, verbose):
+    x = x0.copy()
+
+    f = f0
+    f_true = f.copy()
+    nfev = 1
+
+    J = J0
+    njev = 1
+    m, n = J.shape
+
+    if loss_function is not None:
+        rho = loss_function(f)
+        cost = 0.5 * np.sum(rho[0])
+        J, f = scale_for_robust_loss_function(J, f, rho)
+    else:
+        cost = 0.5 * np.dot(f, f)
+
+    g = compute_grad(J, f)
+
+    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
+    if jac_scale:
+        scale, scale_inv = compute_jac_scale(J)
+    else:
+        scale, scale_inv = x_scale, 1 / x_scale
+
+    v, dv = CL_scaling_vector(x, g, lb, ub)
+    v[dv != 0] *= scale_inv[dv != 0]
+    Delta = norm(x0 * scale_inv / v**0.5)
+    if Delta == 0:
+        Delta = 1.0
+
+    g_norm = norm(g * v, ord=np.inf)
+
+    f_augmented = np.zeros(m + n)
+    if tr_solver == 'exact':
+        J_augmented = np.empty((m + n, n))
+    elif tr_solver == 'lsmr':
+        reg_term = 0.0
+        regularize = tr_options.pop('regularize', True)
+
+    if max_nfev is None:
+        max_nfev = x0.size * 100
+
+    alpha = 0.0  # "Levenberg-Marquardt" parameter
+
+    termination_status = None
+    iteration = 0
+    step_norm = None
+    actual_reduction = None
+
+    if verbose == 2:
+        print_header_nonlinear()
+
+    while True:
+        v, dv = CL_scaling_vector(x, g, lb, ub)
+
+        g_norm = norm(g * v, ord=np.inf)
+        if g_norm < gtol:
+            termination_status = 1
+
+        if verbose == 2:
+            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
+                                      step_norm, g_norm)
+
+        if termination_status is not None or nfev == max_nfev:
+            break
+
+        # Now compute variables in "hat" space. Here, we also account for
+        # scaling introduced by `x_scale` parameter. This part is a bit tricky,
+        # you have to write down the formulas and see how the trust-region
+        # problem is formulated when the two types of scaling are applied.
+        # The idea is that first we apply `x_scale` and then apply Coleman-Li
+        # approach in the new variables.
+
+        # v is recomputed in the variables after applying `x_scale`, note that
+        # components which were identically 1 are not affected.
+        v[dv != 0] *= scale_inv[dv != 0]
+
+        # Here, we apply two types of scaling.
+        d = v**0.5 * scale
+
+        # C = diag(g * scale) Jv
+        diag_h = g * dv * scale
+
+        # After all this has been done, we continue normally.
+
+        # "hat" gradient.
+        g_h = d * g
+
+        f_augmented[:m] = f
+        if tr_solver == 'exact':
+            J_augmented[:m] = J * d
+            J_h = J_augmented[:m]  # Memory view.
+            J_augmented[m:] = np.diag(diag_h**0.5)
+            U, s, V = svd(J_augmented, full_matrices=False)
+            V = V.T
+            uf = U.T.dot(f_augmented)
+        elif tr_solver == 'lsmr':
+            J_h = right_multiplied_operator(J, d)
+
+            if regularize:
+                a, b = build_quadratic_1d(J_h, g_h, -g_h, diag=diag_h)
+                to_tr = Delta / norm(g_h)
+                ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
+                reg_term = -ag_value / Delta**2
+
+            lsmr_op = regularized_lsq_operator(J_h, (diag_h + reg_term)**0.5)
+            gn_h = lsmr(lsmr_op, f_augmented, **tr_options)[0]
+            S = np.vstack((g_h, gn_h)).T
+            S, _ = qr(S, mode='economic')
+            JS = J_h.dot(S)  # LinearOperator does dot too.
+            B_S = np.dot(JS.T, JS) + np.dot(S.T * diag_h, S)
+            g_S = S.T.dot(g_h)
+
+        # theta controls the step-back ratio from the bounds.
+        theta = max(0.995, 1 - g_norm)
+
+        actual_reduction = -1
+        while actual_reduction <= 0 and nfev < max_nfev:
+            if tr_solver == 'exact':
+                p_h, alpha, n_iter = solve_lsq_trust_region(
+                    n, m, uf, s, V, Delta, initial_alpha=alpha)
+            elif tr_solver == 'lsmr':
+                p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
+                p_h = S.dot(p_S)
+
+            p = d * p_h  # Trust-region solution in the original space.
+            step, step_h, predicted_reduction = select_step(
+                x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta)
+
+            x_new = make_strictly_feasible(x + step, lb, ub, rstep=0)
+            f_new = fun(x_new)
+            nfev += 1
+
+            step_h_norm = norm(step_h)
+
+            if not np.all(np.isfinite(f_new)):
+                Delta = 0.25 * step_h_norm
+                continue
+
+            # Usual trust-region step quality estimation.
+            if loss_function is not None:
+                cost_new = loss_function(f_new, cost_only=True)
+            else:
+                cost_new = 0.5 * np.dot(f_new, f_new)
+            actual_reduction = cost - cost_new
+            Delta_new, ratio = update_tr_radius(
+                Delta, actual_reduction, predicted_reduction,
+                step_h_norm, step_h_norm > 0.95 * Delta)
+
+            step_norm = norm(step)
+            termination_status = check_termination(
+                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
+            if termination_status is not None:
+                break
+
+            alpha *= Delta / Delta_new
+            Delta = Delta_new
+
+        if actual_reduction > 0:
+            x = x_new
+
+            f = f_new
+            f_true = f.copy()
+
+            cost = cost_new
+
+            J = jac(x, f)
+            njev += 1
+
+            if loss_function is not None:
+                rho = loss_function(f)
+                J, f = scale_for_robust_loss_function(J, f, rho)
+
+            g = compute_grad(J, f)
+
+            if jac_scale:
+                scale, scale_inv = compute_jac_scale(J, scale_inv)
+        else:
+            step_norm = 0
+            actual_reduction = 0
+
+        iteration += 1
+
+    if termination_status is None:
+        termination_status = 0
+
+    active_mask = find_active_constraints(x, lb, ub, rtol=xtol)
+    return OptimizeResult(
+        x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
+        active_mask=active_mask, nfev=nfev, njev=njev,
+        status=termination_status)
+
+
+def trf_no_bounds(fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev,
+                  x_scale, loss_function, tr_solver, tr_options, verbose):
+    x = x0.copy()
+
+    f = f0
+    f_true = f.copy()
+    nfev = 1
+
+    J = J0
+    njev = 1
+    m, n = J.shape
+
+    if loss_function is not None:
+        rho = loss_function(f)
+        cost = 0.5 * np.sum(rho[0])
+        J, f = scale_for_robust_loss_function(J, f, rho)
+    else:
+        cost = 0.5 * np.dot(f, f)
+
+    g = compute_grad(J, f)
+
+    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
+    if jac_scale:
+        scale, scale_inv = compute_jac_scale(J)
+    else:
+        scale, scale_inv = x_scale, 1 / x_scale
+
+    Delta = norm(x0 * scale_inv)
+    if Delta == 0:
+        Delta = 1.0
+
+    if tr_solver == 'lsmr':
+        reg_term = 0
+        damp = tr_options.pop('damp', 0.0)
+        regularize = tr_options.pop('regularize', True)
+
+    if max_nfev is None:
+        max_nfev = x0.size * 100
+
+    alpha = 0.0  # "Levenberg-Marquardt" parameter
+
+    termination_status = None
+    iteration = 0
+    step_norm = None
+    actual_reduction = None
+
+    if verbose == 2:
+        print_header_nonlinear()
+
+    while True:
+        g_norm = norm(g, ord=np.inf)
+        if g_norm < gtol:
+            termination_status = 1
+
+        if verbose == 2:
+            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
+                                      step_norm, g_norm)
+
+        if termination_status is not None or nfev == max_nfev:
+            break
+
+        d = scale
+        g_h = d * g
+
+        if tr_solver == 'exact':
+            J_h = J * d
+            U, s, V = svd(J_h, full_matrices=False)
+            V = V.T
+            uf = U.T.dot(f)
+        elif tr_solver == 'lsmr':
+            J_h = right_multiplied_operator(J, d)
+
+            if regularize:
+                a, b = build_quadratic_1d(J_h, g_h, -g_h)
+                to_tr = Delta / norm(g_h)
+                ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
+                reg_term = -ag_value / Delta**2
+
+            damp_full = (damp**2 + reg_term)**0.5
+            gn_h = lsmr(J_h, f, damp=damp_full, **tr_options)[0]
+            S = np.vstack((g_h, gn_h)).T
+            S, _ = qr(S, mode='economic')
+            JS = J_h.dot(S)
+            B_S = np.dot(JS.T, JS)
+            g_S = S.T.dot(g_h)
+
+        actual_reduction = -1
+        while actual_reduction <= 0 and nfev < max_nfev:
+            if tr_solver == 'exact':
+                step_h, alpha, n_iter = solve_lsq_trust_region(
+                    n, m, uf, s, V, Delta, initial_alpha=alpha)
+            elif tr_solver == 'lsmr':
+                p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
+                step_h = S.dot(p_S)
+
+            predicted_reduction = -evaluate_quadratic(J_h, g_h, step_h)
+            step = d * step_h
+            x_new = x + step
+            f_new = fun(x_new)
+            nfev += 1
+
+            step_h_norm = norm(step_h)
+
+            if not np.all(np.isfinite(f_new)):
+                Delta = 0.25 * step_h_norm
+                continue
+
+            # Usual trust-region step quality estimation.
+            if loss_function is not None:
+                cost_new = loss_function(f_new, cost_only=True)
+            else:
+                cost_new = 0.5 * np.dot(f_new, f_new)
+            actual_reduction = cost - cost_new
+
+            Delta_new, ratio = update_tr_radius(
+                Delta, actual_reduction, predicted_reduction,
+                step_h_norm, step_h_norm > 0.95 * Delta)
+
+            step_norm = norm(step)
+            termination_status = check_termination(
+                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
+            if termination_status is not None:
+                break
+
+            alpha *= Delta / Delta_new
+            Delta = Delta_new
+
+        if actual_reduction > 0:
+            x = x_new
+
+            f = f_new
+            f_true = f.copy()
+
+            cost = cost_new
+
+            J = jac(x, f)
+            njev += 1
+
+            if loss_function is not None:
+                rho = loss_function(f)
+                J, f = scale_for_robust_loss_function(J, f, rho)
+
+            g = compute_grad(J, f)
+
+            if jac_scale:
+                scale, scale_inv = compute_jac_scale(J, scale_inv)
+        else:
+            step_norm = 0
+            actual_reduction = 0
+
+        iteration += 1
+
+    if termination_status is None:
+        termination_status = 0
+
+    active_mask = np.zeros_like(x)
+    return OptimizeResult(
+        x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
+        active_mask=active_mask, nfev=nfev, njev=njev,
+        status=termination_status)
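The scaling vector v from the module docstring above, together with the first-order optimality measure `norm(g * v, ord=np.inf)` used in `trf_bounds`, can be worked through on a tiny example. This is a sketch under stated assumptions: `scaling_vector` is a hypothetical standalone helper written directly from the piecewise definition of v, not the library's own `CL_scaling_vector`, and the numbers are illustrative.

```python
import numpy as np


def scaling_vector(x, g, lb, ub):
    """Hypothetical helper: v and the diagonal of Jv from the docstring."""
    v = np.ones_like(x)
    dv = np.zeros_like(x)

    # The anti-gradient points towards a finite upper bound.
    mask = (g < 0) & np.isfinite(ub)
    v[mask] = ub[mask] - x[mask]
    dv[mask] = -1

    # The anti-gradient points towards a finite lower bound.
    mask = (g > 0) & np.isfinite(lb)
    v[mask] = x[mask] - lb[mask]
    dv[mask] = 1
    return v, dv


x = np.array([0.5, 2.0, 1.0])
g = np.array([-1.0, 3.0, 0.5])
lb = np.array([0.0, 0.0, -np.inf])
ub = np.array([1.0, np.inf, np.inf])

v, dv = scaling_vector(x, g, lb, ub)
d = v**0.5                                      # diagonal of D
c_diag = g * dv                                 # diagonal of C = diag(g) Jv
optimality = np.linalg.norm(g * v, ord=np.inf)  # ||D^2 g||_inf
print(v, dv, c_diag, optimality)
```

The third variable has no finite bound in the direction of the anti-gradient, so its entry of v stays 1 and its entry of C is 0, which matches the remark that the bound-related terms vanish for unbounded problems.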
@@ -0,0 +1,249 @@
+"""The adaptation of Trust Region Reflective algorithm for a linear
+least-squares problem."""
+import numpy as np
+from numpy.linalg import norm
+from scipy.linalg import qr, solve_triangular
+from scipy.sparse.linalg import lsmr
+from scipy.optimize import OptimizeResult
+
+from .givens_elimination import givens_elimination
+from .common import (
+    EPS, step_size_to_bound, find_active_constraints, in_bounds,
+    make_strictly_feasible, build_quadratic_1d, evaluate_quadratic,
+    minimize_quadratic_1d, CL_scaling_vector, reflective_transformation,
+    print_header_linear, print_iteration_linear, compute_grad,
+    regularized_lsq_operator, right_multiplied_operator)
+
+
+def regularized_lsq_with_qr(m, n, R, QTb, perm, diag, copy_R=True):
+    """Solve regularized least squares using information from QR-decomposition.
+
+    The initial problem is to solve the following system in a least-squares
+    sense::
+
+        A x = b
+        D x = 0
+
+    where D is a diagonal matrix. The method is based on QR decomposition
+    of the form A P = Q R, where P is a column permutation matrix, Q is an
+    orthogonal matrix and R is an upper triangular matrix.
+
+    Parameters
+    ----------
+    m, n : int
+        Initial shape of A.
+    R : ndarray, shape (n, n)
+        Upper triangular matrix from QR decomposition of A.
+    QTb : ndarray, shape (n,)
+        First n components of Q^T b.
+    perm : ndarray, shape (n,)
+        Array defining column permutation of A, such that ith column of
+        P is perm[i]-th column of identity matrix.
+    diag : ndarray, shape (n,)
+        Array containing diagonal elements of D.
+
+    Returns
+    -------
+    x : ndarray, shape (n,)
+        Found least-squares solution.
+    """
+    if copy_R:
+        R = R.copy()
+    v = QTb.copy()
+
+    givens_elimination(R, v, diag[perm])
+
+    abs_diag_R = np.abs(np.diag(R))
+    threshold = EPS * max(m, n) * np.max(abs_diag_R)
+    nns, = np.nonzero(abs_diag_R > threshold)
+
+    R = R[np.ix_(nns, nns)]
+    v = v[nns]
+
+    x = np.zeros(n)
+    x[perm[nns]] = solve_triangular(R, v)
+
+    return x
+
+
+def backtracking(A, g, x, p, theta, p_dot_g, lb, ub):
+    """Find an appropriate step size using backtracking line search."""
+    alpha = 1
+    while True:
+        x_new, _ = reflective_transformation(x + alpha * p, lb, ub)
+        step = x_new - x
+        cost_change = -evaluate_quadratic(A, g, step)
+        if cost_change > -0.1 * alpha * p_dot_g:
+            break
+        alpha *= 0.5
+
+    active = find_active_constraints(x_new, lb, ub)
+    if np.any(active != 0):
+        x_new, _ = reflective_transformation(x + theta * alpha * p, lb, ub)
+        x_new = make_strictly_feasible(x_new, lb, ub, rstep=0)
+        step = x_new - x
+        cost_change = -evaluate_quadratic(A, g, step)
+
+    return x_new, step, cost_change
+
+
+def select_step(x, A_h, g_h, c_h, p, p_h, d, lb, ub, theta):
+    """Select the best step according to Trust Region Reflective algorithm."""
+    if in_bounds(x + p, lb, ub):
+        return p
+
+    p_stride, hits = step_size_to_bound(x, p, lb, ub)
+    r_h = np.copy(p_h)
+    r_h[hits.astype(bool)] *= -1
+    r = d * r_h
+
+    # Restrict step, such that it hits the bound.
+    p *= p_stride
+    p_h *= p_stride
+    x_on_bound = x + p
+
+    # Find the step size along reflected direction.
+    r_stride_u, _ = step_size_to_bound(x_on_bound, r, lb, ub)
+
+    # Stay interior.
+    r_stride_l = (1 - theta) * r_stride_u
+    r_stride_u *= theta
+
+    if r_stride_u > 0:
+        a, b, c = build_quadratic_1d(A_h, g_h, r_h, s0=p_h, diag=c_h)
+        r_stride, r_value = minimize_quadratic_1d(
+            a, b, r_stride_l, r_stride_u, c=c)
+        r_h = p_h + r_h * r_stride
+        r = d * r_h
+    else:
+        r_value = np.inf
+
+    # Now correct p_h to make it strictly interior.
+    p_h *= theta
+    p *= theta
+    p_value = evaluate_quadratic(A_h, g_h, p_h, diag=c_h)
+
+    ag_h = -g_h
+    ag = d * ag_h
+    ag_stride_u, _ = step_size_to_bound(x, ag, lb, ub)
+    ag_stride_u *= theta
+    a, b = build_quadratic_1d(A_h, g_h, ag_h, diag=c_h)
+    ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride_u)
+    ag *= ag_stride
+
+    if p_value < r_value and p_value < ag_value:
+        return p
+    elif r_value < p_value and r_value < ag_value:
+        return r
+    else:
+        return ag
+
+
+def trf_linear(A, b, x_lsq, lb, ub, tol, lsq_solver, lsmr_tol,
+               max_iter, verbose, *, lsmr_maxiter=None):
+    m, n = A.shape
+    x, _ = reflective_transformation(x_lsq, lb, ub)
+    x = make_strictly_feasible(x, lb, ub, rstep=0.1)
+
+    if lsq_solver == 'exact':
+        QT, R, perm = qr(A, mode='economic', pivoting=True)
+        QT = QT.T
+
+        if m < n:
+            R = np.vstack((R, np.zeros((n - m, n))))
+
+        QTr = np.zeros(n)
+        k = min(m, n)
+    elif lsq_solver == 'lsmr':
+        r_aug = np.zeros(m + n)
+        auto_lsmr_tol = False
+        if lsmr_tol is None:
+            lsmr_tol = 1e-2 * tol
+        elif lsmr_tol == 'auto':
+            auto_lsmr_tol = True
+
+    r = A.dot(x) - b
+    g = compute_grad(A, r)
+    cost = 0.5 * np.dot(r, r)
+    initial_cost = cost
+
+    termination_status = None
+    step_norm = None
+    cost_change = None
+
+    if max_iter is None:
+        max_iter = 100
+
+    if verbose == 2:
+        print_header_linear()
+
+    for iteration in range(max_iter):
+        v, dv = CL_scaling_vector(x, g, lb, ub)
+        g_scaled = g * v
+        g_norm = norm(g_scaled, ord=np.inf)
+        if g_norm < tol:
+            termination_status = 1
+
+        if verbose == 2:
+            print_iteration_linear(iteration, cost, cost_change,
+                                   step_norm, g_norm)
+
+        if termination_status is not None:
+            break
+
+        diag_h = g * dv
+        diag_root_h = diag_h ** 0.5
+        d = v ** 0.5
+        g_h = d * g
+
+        A_h = right_multiplied_operator(A, d)
+        if lsq_solver == 'exact':
+            QTr[:k] = QT.dot(r)
+            p_h = -regularized_lsq_with_qr(m, n, R * d[perm], QTr, perm,
+                                           diag_root_h, copy_R=False)
+        elif lsq_solver == 'lsmr':
+            lsmr_op = regularized_lsq_operator(A_h, diag_root_h)
+            r_aug[:m] = r
+            if auto_lsmr_tol:
+                eta = 1e-2 * min(0.5, g_norm)
+                lsmr_tol = max(EPS, min(0.1, eta * g_norm))
+            p_h = -lsmr(lsmr_op, r_aug, maxiter=lsmr_maxiter,
+                        atol=lsmr_tol, btol=lsmr_tol)[0]
+
+        p = d * p_h
+
+        p_dot_g = np.dot(p, g)
+        if p_dot_g > 0:
+            termination_status = -1
+
+        theta = 1 - min(0.005, g_norm)
+        step = select_step(x, A_h, g_h, diag_h, p, p_h, d, lb, ub, theta)
+        cost_change = -evaluate_quadratic(A, g, step)
+
+        # Perhaps almost never executed. The idea is that `p` is a descent
+        # direction, thus we must find an acceptable cost decrease using simple
+        # "backtracking", otherwise the algorithm's logic would break.
+        if cost_change < 0:
+            x, step, cost_change = backtracking(
+                A, g, x, p, theta, p_dot_g, lb, ub)
+        else:
+            x = make_strictly_feasible(x + step, lb, ub, rstep=0)
+
+        step_norm = norm(step)
+        r = A.dot(x) - b
+        g = compute_grad(A, r)
+
+        if cost_change < tol * cost:
+            termination_status = 2
+
+        cost = 0.5 * np.dot(r, r)
+
+    if termination_status is None:
+        termination_status = 0
+
+    active_mask = find_active_constraints(x, lb, ub, rtol=tol)
+
+    return OptimizeResult(
+        x=x, fun=r, cost=cost, optimality=g_norm, active_mask=active_mask,
+        nit=iteration + 1, status=termination_status,
+        initial_cost=initial_cost)
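The regularized problem described in the docstring of `regularized_lsq_with_qr` above (solve `A x = b`, `D x = 0` in a least-squares sense) can be checked numerically without the QR and Givens machinery. The sketch below is not the library's method: it stacks the two systems and compares the result with the normal equations `(A^T A + D^2) x = A^T b`. The random data and the diagonal values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
diag = np.array([0.5, 1.0, 2.0])  # diagonal elements of D

# Stack the two systems: [A; D] x = [b; 0], solved in a least-squares sense.
A_aug = np.vstack((A, np.diag(diag)))
b_aug = np.concatenate((b, np.zeros(n)))
x_stacked = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# Equivalent normal equations of the stacked problem.
x_normal = np.linalg.solve(A.T @ A + np.diag(diag**2), A.T @ b)

print(x_stacked, x_normal)
```

Forming the normal equations squares the condition number, which is one reason a QR-based route like the one in the function above is attractive for the 'exact' solver.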
@@ -0,0 +1,392 @@
+import warnings
+import numpy as np
+from scipy.sparse import csc_array, vstack, issparse
+from scipy._lib._util import VisibleDeprecationWarning
+from ._highs._highs_wrapper import _highs_wrapper  # type: ignore[import]
+from ._constraints import LinearConstraint, Bounds
+from ._optimize import OptimizeResult
+from ._linprog_highs import _highs_to_scipy_status_message
+
+
+def _constraints_to_components(constraints):
+    """
+    Convert sequence of constraints to a single set of components A, b_l, b_u.
+
+    `constraints` could be
+
+    1. A LinearConstraint
+    2. A tuple representing a LinearConstraint
+    3. An invalid object
+    4. A sequence composed entirely of objects of type 1/2
+    5. A sequence containing at least one object of type 3
+
+    We want to accept 1, 2, and 4 and reject 3 and 5.
+    """
+    message = ("`constraints` (or each element within `constraints`) must be "
+               "convertible into an instance of "
+               "`scipy.optimize.LinearConstraint`.")
+    As = []
+    b_ls = []
+    b_us = []
+
+    # Accept case 1 by standardizing as case 4
+    if isinstance(constraints, LinearConstraint):
+        constraints = [constraints]
+    else:
+        # Reject case 3
+        try:
+            iter(constraints)
+        except TypeError as exc:
+            raise ValueError(message) from exc
+
+        # Accept case 2 by standardizing as case 4
+        if len(constraints) == 3:
+            # argument could be a single tuple representing a LinearConstraint
+            try:
+                constraints = [LinearConstraint(*constraints)]
+            except (TypeError, ValueError, VisibleDeprecationWarning):
+                # argument was not a tuple representing a LinearConstraint
+                pass
+
+    # Address cases 4/5
+    for constraint in constraints:
+        # if it's not a LinearConstraint or something that represents a
+        # LinearConstraint at this point, it's invalid
+        if not isinstance(constraint, LinearConstraint):
+            try:
+                constraint = LinearConstraint(*constraint)
+            except TypeError as exc:
+                raise ValueError(message) from exc
+        As.append(csc_array(constraint.A))
+        b_ls.append(np.atleast_1d(constraint.lb).astype(np.float64))
+        b_us.append(np.atleast_1d(constraint.ub).astype(np.float64))
+
+    if len(As) > 1:
+        A = vstack(As, format="csc")
+        b_l = np.concatenate(b_ls)
+        b_u = np.concatenate(b_us)
+    else:  # avoid unnecessary copying
+        A = As[0]
+        b_l = b_ls[0]
+        b_u = b_us[0]
+
+    return A, b_l, b_u
+
+
+def _milp_iv(c, integrality, bounds, constraints, options):
+    # objective IV
+    if issparse(c):
+        raise ValueError("`c` must be a dense array.")
+    c = np.atleast_1d(c).astype(np.float64)
+    if c.ndim != 1 or c.size == 0 or not np.all(np.isfinite(c)):
+        message = ("`c` must be a one-dimensional array of finite numbers "
+                   "with at least one element.")
+        raise ValueError(message)
+
+    # integrality IV
+    if issparse(integrality):
+        raise ValueError("`integrality` must be a dense array.")
+    message = ("`integrality` must contain integers 0-3 and be broadcastable "
+               "to `c.shape`.")
+    if integrality is None:
+        integrality = 0
+    try:
+        integrality = np.broadcast_to(integrality, c.shape).astype(np.uint8)
+    except ValueError as exc:
+        raise ValueError(message) from exc
+    if integrality.min() < 0 or integrality.max() > 3:
+        raise ValueError(message)
+
+    # bounds IV
+    if bounds is None:
+        bounds = Bounds(0, np.inf)
+    elif not isinstance(bounds, Bounds):
+        message = ("`bounds` must be convertible into an instance of "
+                   "`scipy.optimize.Bounds`.")
+        try:
+            bounds = Bounds(*bounds)
+        except TypeError as exc:
+            raise ValueError(message) from exc
+
+    try:
+        lb = np.broadcast_to(bounds.lb, c.shape).astype(np.float64)
+        ub = np.broadcast_to(bounds.ub, c.shape).astype(np.float64)
+    except (ValueError, TypeError) as exc:
+        message = ("`bounds.lb` and `bounds.ub` must contain reals and "
+                   "be broadcastable to `c.shape`.")
+        raise ValueError(message) from exc
+
+    # constraints IV
+    if not constraints:
+        constraints = [LinearConstraint(np.empty((0, c.size)),
+                                        np.empty((0,)), np.empty((0,)))]
+    try:
+        A, b_l, b_u = _constraints_to_components(constraints)
+    except ValueError as exc:
+        message = ("`constraints` (or each element within `constraints`) must "
+                   "be convertible into an instance of "
+                   "`scipy.optimize.LinearConstraint`.")
+        raise ValueError(message) from exc
+
+    if A.shape != (b_l.size, c.size):
+        message = "The shape of `A` must be (len(b_l), len(c))."
+        raise ValueError(message)
+    indptr, indices, data = A.indptr, A.indices, A.data.astype(np.float64)
+
+    # options IV
+    options = options or {}
+    supported_options = {'disp', 'presolve', 'time_limit', 'node_limit',
+                         'mip_rel_gap'}
+    unsupported_options = set(options).difference(supported_options)
+    if unsupported_options:
+        message = (f"Unrecognized options detected: {unsupported_options}. "
+                   "These will be passed to HiGHS verbatim.")
+        warnings.warn(message, RuntimeWarning, stacklevel=3)
+    options_iv = {'log_to_console': options.pop("disp", False),
+                  'mip_max_nodes': options.pop("node_limit", None)}
+    options_iv.update(options)
+
+    return c, integrality, lb, ub, indptr, indices, data, b_l, b_u, options_iv
+
+
+def milp(c, *, integrality=None, bounds=None, constraints=None, options=None):
+    r"""
+    Mixed-integer linear programming
+
+    Solves problems of the following form:
+
+    .. math::
+
+        \min_x \ & c^T x \\
+        \mbox{such that} \ & b_l \leq A x \leq b_u,\\
+        & l \leq x \leq u, \\
+        & x_i \in \mathbb{Z}, i \in X_i
+
+    where :math:`x` is a vector of decision variables;
+    :math:`c`, :math:`b_l`, :math:`b_u`, :math:`l`, and :math:`u` are vectors;
+    :math:`A` is a matrix, and :math:`X_i` is the set of indices of
+    decision variables that must be integral. (In this context, a
+    variable that can assume only integer values is said to be "integral";
+    it has an "integrality" constraint.)
+
+    Alternatively, that's:
+
+    minimize::
+
+        c @ x
+
+    such that::
+
+        b_l <= A @ x <= b_u
+        l <= x <= u
+        Specified elements of x must be integers
+
+    By default, ``l = 0`` and ``u = np.inf`` unless specified with
+    ``bounds``.
+
+    Parameters
+    ----------
+    c : 1D dense array_like
+        The coefficients of the linear objective function to be minimized.
+        `c` is converted to a double precision array before the problem is
+        solved.
+    integrality : 1D dense array_like, optional
+        Indicates the type of integrality constraint on each decision variable.
+
+        ``0`` : Continuous variable; no integrality constraint.
+
+        ``1`` : Integer variable; decision variable must be an integer
+        within `bounds`.
+
+        ``2`` : Semi-continuous variable; decision variable must be within
+        `bounds` or take value ``0``.
+
+        ``3`` : Semi-integer variable; decision variable must be an integer
+        within `bounds` or take value ``0``.
+
+        By default, all variables are continuous. `integrality` is converted
+        to an array of integers before the problem is solved.
+
+    bounds : scipy.optimize.Bounds, optional
+        Bounds on the decision variables. Lower and upper bounds are converted
+        to double precision arrays before the problem is solved. The
+        ``keep_feasible`` parameter of the `Bounds` object is ignored. If
+        not specified, all decision variables are constrained to be
+        non-negative.
+    constraints : sequence of scipy.optimize.LinearConstraint, optional
+        Linear constraints of the optimization problem. Arguments may be
+        one of the following:
+
+        1. A single `LinearConstraint` object
+        2. A single tuple that can be converted to a `LinearConstraint` object
+           as ``LinearConstraint(*constraints)``
+        3. A sequence composed entirely of objects of type 1. and 2.
+
+        Before the problem is solved, all values are converted to double
+        precision, and the matrices of constraint coefficients are converted to
+        instances of `scipy.sparse.csc_array`. The ``keep_feasible`` parameter
+        of `LinearConstraint` objects is ignored.
+    options : dict, optional
+        A dictionary of solver options. The following keys are recognized.
+
+        disp : bool (default: ``False``)
+            Set to ``True`` if indicators of optimization status are to be
+            printed to the console during optimization.
+        node_limit : int, optional
+            The maximum number of nodes (linear program relaxations) to solve
+            before stopping. Default is no maximum number of nodes.
+        presolve : bool (default: ``True``)
+            Presolve attempts to identify trivial infeasibilities,
+            identify trivial unboundedness, and simplify the problem before
+            sending it to the main solver.
+        time_limit : float, optional
+            The maximum number of seconds allotted to solve the problem.
+            Default is no time limit.
+        mip_rel_gap : float, optional
+            Termination criterion for MIP solver: solver will terminate when
+            the gap between the primal objective value and the dual objective
+            bound, scaled by the primal objective value, is <= mip_rel_gap.
+
+    Returns
+    -------
+    res : OptimizeResult
+        An instance of :class:`scipy.optimize.OptimizeResult`. The object
+        is guaranteed to have the following attributes.
+
+        status : int
+            An integer representing the exit status of the algorithm.
+
+            ``0`` : Optimal solution found.
+
+            ``1`` : Iteration or time limit reached.
+
+            ``2`` : Problem is infeasible.
+
+            ``3`` : Problem is unbounded.
+
+            ``4`` : Other; see message for details.
+
+        success : bool
+            ``True`` when an optimal solution is found and ``False`` otherwise.
+
+        message : str
+            A string descriptor of the exit status of the algorithm.
+
+        The following attributes will also be present, but the values may be
+        ``None``, depending on the solution status.
+
+        x : ndarray
+            The values of the decision variables that minimize the
+            objective function while satisfying the constraints.
+        fun : float
+            The optimal value of the objective function ``c @ x``.
+        mip_node_count : int
+            The number of subproblems or "nodes" solved by the MILP solver.
+        mip_dual_bound : float
+            The MILP solver's final estimate of the lower bound on the optimal
+            solution.
+        mip_gap : float
+            The difference between the primal objective value and the dual
+            objective bound, scaled by the primal objective value.
+
+    Notes
+    -----
+    `milp` is a wrapper of the HiGHS linear optimization software [1]_. The
+    algorithm is deterministic, and it typically finds the global optimum of
+    moderately challenging mixed-integer linear programs (when it exists).
+
+    References
+    ----------
+    .. [1] Huangfu, Q., Galabova, I., Feldmeier, M., and Hall, J. A. J.
+           "HiGHS - high performance software for linear optimization."
+           https://highs.dev/
+    .. [2] Huangfu, Q. and Hall, J. A. J. "Parallelizing the dual revised
+           simplex method." Mathematical Programming Computation, 10 (1),
+           119-142, 2018. DOI: 10.1007/s12532-017-0130-5
+
+    Examples
+    --------
+    Consider the problem at
+    https://en.wikipedia.org/wiki/Integer_programming#Example, which is
+    expressed as a maximization problem of two variables. Since `milp` requires
+    that the problem be expressed as a minimization problem, the objective
+    function coefficients on the decision variables are:
+
+    >>> import numpy as np
+    >>> c = -np.array([0, 1])
+
+    Note the negative sign: we maximize the original objective function
+    by minimizing the negative of the objective function.
+
+    We collect the coefficients of the constraints into arrays like:
+
+    >>> A = np.array([[-1, 1], [3, 2], [2, 3]])
+    >>> b_u = np.array([1, 12, 12])
+    >>> b_l = np.full_like(b_u, -np.inf, dtype=float)
+
+    Because there is no lower limit on these constraints, we have defined a
+    variable ``b_l`` full of values representing negative infinity. This may
+    be unfamiliar to users of `scipy.optimize.linprog`, which only accepts
+    "less than" (or "upper bound") inequality constraints of the form
+    ``A_ub @ x <= b_u``. By accepting both ``b_l`` and ``b_u`` of constraints
+    ``b_l <= A_ub @ x <= b_u``, `milp` makes it easy to specify "greater than"
+    inequality constraints, "less than" inequality constraints, and equality
+    constraints concisely.
+
+    These arrays are collected into a single `LinearConstraint` object like:
+
+    >>> from scipy.optimize import LinearConstraint
+    >>> constraints = LinearConstraint(A, b_l, b_u)
+
+    The non-negativity bounds on the decision variables are enforced by
+    default, so we do not need to provide an argument for `bounds`.
+
+    Finally, the problem states that both decision variables must be integers:
+
+    >>> integrality = np.ones_like(c)
+
+    We solve the problem like:
+
+    >>> from scipy.optimize import milp
+    >>> res = milp(c=c, constraints=constraints, integrality=integrality)
+    >>> res.x
+    [1.0, 2.0]
+
+    Note that had we solved the relaxed problem (without integrality
+    constraints):
+
+    >>> res = milp(c=c, constraints=constraints)  # OR:
+    >>> # from scipy.optimize import linprog; res = linprog(c, A, b_u)
+    >>> res.x
+    [1.8, 2.8]
+
+    we would not have obtained the correct solution by rounding to the nearest
+    integers.
+
+    Other examples are given :ref:`in the tutorial <tutorial-optimize_milp>`.
+
+    """
+    args_iv = _milp_iv(c, integrality, bounds, constraints, options)
+    c, integrality, lb, ub, indptr, indices, data, b_l, b_u, options = args_iv
+
+    highs_res = _highs_wrapper(c, indptr, indices, data, b_l, b_u,
+                               lb, ub, integrality, options)
+
+    res = {}
+
+    # Convert to scipy-style status and message
+    highs_status = highs_res.get('status', None)
+    highs_message = highs_res.get('message', None)
+    status, message = _highs_to_scipy_status_message(highs_status,
+                                                     highs_message)
+    res['status'] = status
+    res['message'] = message
+    res['success'] = (status == 0)
+    x = highs_res.get('x', None)
+    res['x'] = np.array(x) if x is not None else None
+    res['fun'] = highs_res.get('fun', None)
+    res['mip_node_count'] = highs_res.get('mip_node_count', None)
+    res['mip_dual_bound'] = highs_res.get('mip_dual_bound', None)
+    res['mip_gap'] = highs_res.get('mip_gap', None)
+
+    return OptimizeResult(res)
@@ -0,0 +1,164 @@
+import numpy as np
+from scipy.linalg import solve, LinAlgWarning
+import warnings
+
+__all__ = ['nnls']
+
+
+def nnls(A, b, maxiter=None, *, atol=None):
+    """
+    Solve ``argmin_x || Ax - b ||_2`` for ``x>=0``.
+
+    This problem, often called NonNegative Least Squares (NNLS), is a convex
+    optimization problem with convex constraints. It typically arises when
+    ``x`` models quantities for which only nonnegative values are
+    attainable, such as weights of ingredients or component costs.
+
+    Parameters
+    ----------
+    A : (m, n) ndarray
+        Coefficient array.
+    b : (m,) ndarray, float
+        Right-hand side vector.
+    maxiter : int, optional
+        Maximum number of iterations. Default value is ``3 * n``.
+    atol : float, optional
+        Tolerance value used in the algorithm to assess closeness to zero in
+        the projected residual ``A.T @ (A @ x - b)`` entries. Increasing this
+        value relaxes the solution constraints. A typical relaxation value can
+        be selected as ``max(m, n) * np.linalg.norm(A, 1) * np.spacing(1.)``.
+        It is not the default because the norm computation becomes expensive
+        for large problems; use it only when necessary.
+
+    Returns
+    -------
+    x : ndarray
+        Solution vector.
+    rnorm : float
+        The 2-norm of the residual, ``|| Ax-b ||_2``.
+
+    See Also
+    --------
+    lsq_linear : Linear least squares with bounds on the variables
+
+    Notes
+    -----
+    The code is based on [2]_ which is an improved version of the classical
+    algorithm of [1]_. It utilizes an active set method and solves the KKT
+    (Karush-Kuhn-Tucker) conditions for the non-negative least squares problem.
+
+    References
+    ----------
+    .. [1] Lawson C., Hanson R.J., "Solving Least Squares Problems", SIAM,
+       1995, :doi:`10.1137/1.9781611971217`
+    .. [2] Bro, Rasmus and de Jong, Sijmen, "A Fast Non-Negativity-
+       Constrained Least Squares Algorithm", Journal Of Chemometrics, 1997,
+       :doi:`10.1002/(SICI)1099-128X(199709/10)11:5<393::AID-CEM483>3.0.CO;2-L`
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from scipy.optimize import nnls
+    ...
+    >>> A = np.array([[1, 0], [1, 0], [0, 1]])
+    >>> b = np.array([2, 1, 1])
+    >>> nnls(A, b)
+    (array([1.5, 1. ]), 0.7071067811865475)
+
+    >>> b = np.array([-1, -1, -1])
+    >>> nnls(A, b)
+    (array([0., 0.]), 1.7320508075688772)
+
+    """
+
+    A = np.asarray_chkfinite(A)
+    b = np.asarray_chkfinite(b)
+
+    if len(A.shape) != 2:
+        raise ValueError("Expected a two-dimensional array (matrix)" +
+                         f", but the shape of A is {A.shape}")
+    if len(b.shape) != 1:
+        raise ValueError("Expected a one-dimensional array (vector)" +
+                         f", but the shape of b is {b.shape}")
+
+    m, n = A.shape
+
+    if m != b.shape[0]:
+        raise ValueError(
+                "Incompatible dimensions. The first dimension of " +
+                f"A is {m}, while the shape of b is {(b.shape[0], )}")
+
+    x, rnorm, mode = _nnls(A, b, maxiter, tol=atol)
+    if mode != 1:
+        raise RuntimeError("Maximum number of iterations reached.")
+
+    return x, rnorm
+
+
+def _nnls(A, b, maxiter=None, tol=None):
+    """
+    This is a single RHS algorithm from ref [2] above. For multiple RHS
+    support, the algorithm is given in :doi:`10.1002/cem.889`
+    """
+    m, n = A.shape
+
+    AtA = A.T @ A
+    Atb = b @ A  # Result is 1D - let NumPy figure it out
+
+    if not maxiter:
+        maxiter = 3*n
+    if tol is None:
+        tol = 10 * max(m, n) * np.spacing(1.)
+
+    # Initialize vars
+    x = np.zeros(n, dtype=np.float64)
+    s = np.zeros(n, dtype=np.float64)
+    # Inactive constraint switches
+    P = np.zeros(n, dtype=bool)
+
+    # Projected residual
+    w = Atb.copy().astype(np.float64)  # x=0. Skip (-AtA @ x) term
+
+    # Overall iteration counter
+    # Outer loop is not counted, inner iter is counted across outer spins
+    iter = 0
+
+    while (not P.all()) and (w[~P] > tol).any():  # B
+        # Get the "most" active coeff index and move to inactive set
+        k = np.argmax(w * (~P))  # B.2
+        P[k] = True  # B.3
+
+        # Iteration solution
+        s[:] = 0.
+        # B.4
+        with warnings.catch_warnings():
+            warnings.filterwarnings('ignore', message='Ill-conditioned matrix',
+                                    category=LinAlgWarning)
+            s[P] = solve(AtA[np.ix_(P, P)], Atb[P], assume_a='sym', check_finite=False)
+
+        # Inner loop
+        while (iter < maxiter) and (s[P].min() < 0):  # C.1
+            iter += 1
+            inds = P * (s < 0)
+            alpha = (x[inds] / (x[inds] - s[inds])).min()  # C.2
+            x *= (1 - alpha)
+            x += alpha*s
+            P[x <= tol] = False
+            with warnings.catch_warnings():
+                warnings.filterwarnings('ignore', message='Ill-conditioned matrix',
+                                        category=LinAlgWarning)
+                s[P] = solve(AtA[np.ix_(P, P)], Atb[P], assume_a='sym',
+                             check_finite=False)
+            s[~P] = 0  # C.6
+
+        x[:] = s[:]
+        w[:] = Atb - AtA @ x
+
+        if iter == maxiter:
+            # Ideally this would be
+            #     return x, np.linalg.norm(A@x - b), -1
+            # but the caller raises an exception on -1, so computing the norm
+            # would be wasted work. Return a dummy value of 0. instead.
+            return x, 0., -1
+
+    return x, np.linalg.norm(A@x - b), 1
@@ -0,0 +1,775 @@
+"""Routines for numerical differentiation."""
+import functools
+import numpy as np
+from numpy.linalg import norm
+
+from scipy.sparse.linalg import LinearOperator
+from ..sparse import issparse, csc_matrix, csr_matrix, coo_matrix, find
+from ._group_columns import group_dense, group_sparse
+from scipy._lib._array_api import atleast_nd, array_namespace
+
+
+def _adjust_scheme_to_bounds(x0, h, num_steps, scheme, lb, ub):
+    """Adjust final difference scheme to the presence of bounds.
+
+    Parameters
+    ----------
+    x0 : ndarray, shape (n,)
+        Point at which we wish to estimate derivative.
+    h : ndarray, shape (n,)
+        Desired absolute finite difference steps.
+    num_steps : int
+        Number of `h` steps in one direction required to implement finite
+        difference scheme. For example, 2 means that we need to evaluate
+        f(x0 + 2 * h) or f(x0 - 2 * h)
+    scheme : {'1-sided', '2-sided'}
+        Whether steps in one or both directions are required. In other
+        words '1-sided' applies to forward and backward schemes, '2-sided'
+        applies to center schemes.
+    lb : ndarray, shape (n,)
+        Lower bounds on independent variables.
+    ub : ndarray, shape (n,)
+        Upper bounds on independent variables.
+
+    Returns
+    -------
+    h_adjusted : ndarray, shape (n,)
+        Adjusted absolute step sizes. Step size decreases only if a sign flip
+        or switching to a one-sided scheme doesn't allow taking a full step.
+    use_one_sided : ndarray of bool, shape (n,)
+        Whether to switch to one-sided scheme. Informative only for
+        ``scheme='2-sided'``.
+    """
+    if scheme == '1-sided':
+        use_one_sided = np.ones_like(h, dtype=bool)
+    elif scheme == '2-sided':
+        h = np.abs(h)
+        use_one_sided = np.zeros_like(h, dtype=bool)
+    else:
+        raise ValueError("`scheme` must be '1-sided' or '2-sided'.")
+
+    if np.all((lb == -np.inf) & (ub == np.inf)):
+        return h, use_one_sided
+
+    h_total = h * num_steps
+    h_adjusted = h.copy()
+
+    lower_dist = x0 - lb
+    upper_dist = ub - x0
+
+    if scheme == '1-sided':
+        x = x0 + h_total
+        violated = (x < lb) | (x > ub)
+        fitting = np.abs(h_total) <= np.maximum(lower_dist, upper_dist)
+        h_adjusted[violated & fitting] *= -1
+
+        forward = (upper_dist >= lower_dist) & ~fitting
+        h_adjusted[forward] = upper_dist[forward] / num_steps
+        backward = (upper_dist < lower_dist) & ~fitting
+        h_adjusted[backward] = -lower_dist[backward] / num_steps
+    elif scheme == '2-sided':
+        central = (lower_dist >= h_total) & (upper_dist >= h_total)
+
+        forward = (upper_dist >= lower_dist) & ~central
+        h_adjusted[forward] = np.minimum(
+            h[forward], 0.5 * upper_dist[forward] / num_steps)
+        use_one_sided[forward] = True
+
+        backward = (upper_dist < lower_dist) & ~central
+        h_adjusted[backward] = -np.minimum(
+            h[backward], 0.5 * lower_dist[backward] / num_steps)
+        use_one_sided[backward] = True
+
+        min_dist = np.minimum(upper_dist, lower_dist) / num_steps
+        adjusted_central = (~central & (np.abs(h_adjusted) <= min_dist))
+        h_adjusted[adjusted_central] = min_dist[adjusted_central]
+        use_one_sided[adjusted_central] = False
+
+    return h_adjusted, use_one_sided
+
+
+@functools.lru_cache
+def _eps_for_method(x0_dtype, f0_dtype, method):
+    """
+    Calculates relative EPS step to use for a given data type
+    and numdiff step method.
+
+    Progressively smaller steps are used for larger floating point types.
+
+    Parameters
+    ----------
+    x0_dtype : np.dtype
+        dtype of parameter vector
+
+    f0_dtype : np.dtype
+        dtype of function evaluation
+
+    method : {'2-point', '3-point', 'cs'}
+
+    Returns
+    -------
+    EPS : float
+        relative step size. May be np.float16, np.float32, np.float64
+
+    Notes
+    -----
+    The default relative step will be np.float64. However, if x0 or f0 are
+    smaller floating point types (np.float16, np.float32), then the smallest
+    floating point type is chosen.
+    """
+    # the default EPS value
+    EPS = np.finfo(np.float64).eps
+
+    x0_is_fp = False
+    if np.issubdtype(x0_dtype, np.inexact):
+        # if you're a floating point type then over-ride the default EPS
+        EPS = np.finfo(x0_dtype).eps
+        x0_itemsize = np.dtype(x0_dtype).itemsize
+        x0_is_fp = True
+
+    if np.issubdtype(f0_dtype, np.inexact):
+        f0_itemsize = np.dtype(f0_dtype).itemsize
+        # choose the smallest itemsize between x0 and f0
+        if x0_is_fp and f0_itemsize < x0_itemsize:
+            EPS = np.finfo(f0_dtype).eps
+
+    if method in ["2-point", "cs"]:
+        return EPS**0.5
+    elif method in ["3-point"]:
+        return EPS**(1/3)
+    else:
+        raise RuntimeError("Unknown step method, should be one of "
+                           "{'2-point', '3-point', 'cs'}")
+
+
+def _compute_absolute_step(rel_step, x0, f0, method):
+    """
+    Computes an absolute step from a relative step for finite difference
+    calculation.
+
+    Parameters
+    ----------
+    rel_step : None or array_like
+        Relative step for the finite difference calculation
+    x0 : np.ndarray
+        Parameter vector
+    f0 : np.ndarray or scalar
+    method : {'2-point', '3-point', 'cs'}
+
+    Returns
+    -------
+    h : float
+        The absolute step size
+
+    Notes
+    -----
+    `h` will always be np.float64. However, if `x0` or `f0` are
+    smaller floating point dtypes (e.g. np.float32), then the absolute
+    step size will be calculated from the smallest floating point size.
+    """
+    # this is used instead of np.sign(x0) because we need
+    # sign_x0 to be 1 when x0 == 0.
+    sign_x0 = (x0 >= 0).astype(float) * 2 - 1
+
+    rstep = _eps_for_method(x0.dtype, f0.dtype, method)
+
+    if rel_step is None:
+        abs_step = rstep * sign_x0 * np.maximum(1.0, np.abs(x0))
+    else:
+        # User has requested specific relative steps.
+        # Don't multiply by max(1, abs(x0)) because if abs(x0) < 1 then their
+        # requested step is not used.
+        abs_step = rel_step * sign_x0 * np.abs(x0)
+
+        # however we don't want an abs_step of 0, which can happen if
+        # rel_step is 0, or x0 is 0. Instead, substitute a realistic step
+        dx = ((x0 + abs_step) - x0)
+        abs_step = np.where(dx == 0,
+                            rstep * sign_x0 * np.maximum(1.0, np.abs(x0)),
+                            abs_step)
+
+    return abs_step
+
+
+def _prepare_bounds(bounds, x0):
+    """
+    Prepares new-style bounds from a two-tuple specifying the lower and upper
+    limits for values in x0. If a value is not bounded then the lower/upper
+    bound will be expected to be -np.inf/np.inf.
+
+    Examples
+    --------
+    >>> _prepare_bounds([(0, 1, 2), (1, 2, np.inf)], [0.5, 1.5, 2.5])
+    (array([0., 1., 2.]), array([ 1.,  2., inf]))
+    """
+    lb, ub = (np.asarray(b, dtype=float) for b in bounds)
+    if lb.ndim == 0:
+        lb = np.resize(lb, x0.shape)
+
+    if ub.ndim == 0:
+        ub = np.resize(ub, x0.shape)
+
+    return lb, ub
+
+
+def group_columns(A, order=0):
+    """Group columns of a 2-D matrix for sparse finite differencing [1]_.
+
+    Two columns are in the same group if in each row at least one of them
+    has a zero. A greedy sequential algorithm is used to construct groups.
+
+    Parameters
+    ----------
+    A : array_like or sparse matrix, shape (m, n)
+        Matrix of which to group columns.
+    order : int, iterable of int with shape (n,) or None
+        Permutation array which defines the order of column enumeration.
+        If int or None, a random permutation is used with `order` used as
+        a random seed. Default is 0, that is, use a random permutation but
+        guarantee repeatability.
+
+    Returns
+    -------
+    groups : ndarray of int, shape (n,)
+        Contains values from 0 to n_groups-1, where n_groups is the number
+        of found groups. Each value ``groups[i]`` is an index of a group to
+        which the ith column is assigned. The procedure is helpful only if
+        n_groups is significantly less than n.
+
+    References
+    ----------
+    .. [1] A. Curtis, M. J. D. Powell, and J. Reid, "On the estimation of
+           sparse Jacobian matrices", Journal of the Institute of Mathematics
+           and its Applications, 13 (1974), pp. 117-120.
+    """
+    if issparse(A):
+        A = csc_matrix(A)
+    else:
+        A = np.atleast_2d(A)
+        A = (A != 0).astype(np.int32)
+
+    if A.ndim != 2:
+        raise ValueError("`A` must be 2-dimensional.")
+
+    m, n = A.shape
+
+    if order is None or np.isscalar(order):
+        rng = np.random.RandomState(order)
+        order = rng.permutation(n)
+    else:
+        order = np.asarray(order)
+        if order.shape != (n,):
+            raise ValueError("`order` has incorrect shape.")
+
+    A = A[:, order]
+
+    if issparse(A):
+        groups = group_sparse(m, n, A.indices, A.indptr)
+    else:
+        groups = group_dense(m, n, A)
+
+    groups[order] = groups.copy()
+
+    return groups
+
+
+def approx_derivative(fun, x0, method='3-point', rel_step=None, abs_step=None,
+                      f0=None, bounds=(-np.inf, np.inf), sparsity=None,
+                      as_linear_operator=False, args=(), kwargs={}):
+    """Compute finite difference approximation of the derivatives of a
+    vector-valued function.
+
+    If a function maps from R^n to R^m, its derivatives form an m-by-n matrix
+    called the Jacobian, where an element (i, j) is a partial derivative of
+    f[i] with respect to x[j].
+
+    Parameters
+    ----------
+    fun : callable
+        Function of which to estimate the derivatives. The argument x
+        passed to this function is an ndarray of shape (n,) (never a scalar
+        even if n=1). It must return 1-D array_like of shape (m,) or a scalar.
+    x0 : array_like of shape (n,) or float
+        Point at which to estimate the derivatives. Float will be converted
+        to a 1-D array.
+    method : {'3-point', '2-point', 'cs'}, optional
+        Finite difference method to use:
+            - '2-point' - use the first order accuracy forward or backward
+                          difference.
+            - '3-point' - use central difference in interior points and the
+                          second order accuracy forward or backward difference
+                          near the boundary.
+            - 'cs' - use a complex-step finite difference scheme. This assumes
+                     that the user function is real-valued and can be
+                     analytically continued to the complex plane. Otherwise,
+                     produces bogus results.
+    rel_step : None or array_like, optional
+        Relative step size to use. If None (default) the absolute step size is
+        computed as ``h = rel_step * sign(x0) * max(1, abs(x0))``, with
+        `rel_step` being selected automatically, see Notes. Otherwise
+        ``h = rel_step * sign(x0) * abs(x0)``. For ``method='3-point'`` the
+        sign of `h` is ignored. The calculated step size is possibly adjusted
+        to fit into the bounds.
+    abs_step : array_like, optional
+        Absolute step size to use, possibly adjusted to fit into the bounds.
+        For ``method='3-point'`` the sign of `abs_step` is ignored. By default
+        relative steps are used; absolute steps are used only if
+        ``abs_step is not None``.
+    f0 : None or array_like, optional
+        If not None it is assumed to be equal to ``fun(x0)``, and in this case
+        ``fun(x0)`` is not called. Default is None.
+    bounds : tuple of array_like, optional
+        Lower and upper bounds on independent variables. Defaults to no bounds.
+        Each bound must match the size of `x0` or be a scalar; in the latter
+        case the bound will be the same for all variables. Use it to limit the
+        range of function evaluation. Bounds checking is not implemented
+        when `as_linear_operator` is True.
+    sparsity : {None, array_like, sparse matrix, 2-tuple}, optional
+        Defines a sparsity structure of the Jacobian matrix. If the Jacobian
+        matrix is known to have only a few non-zero elements in each row, then
+        it's possible to estimate several of its columns by a single function
+        evaluation [3]_. To perform such economic computations two ingredients
+        are required:
+
+        * structure : array_like or sparse matrix of shape (m, n). A zero
+          element means that the corresponding element of the Jacobian
+          is identically zero.
+        * groups : array_like of shape (n,). A column grouping for a given
+          sparsity structure; use `group_columns` to obtain it.
+
+        A single array or a sparse matrix is interpreted as a sparsity
+        structure, and groups are computed inside the function. A tuple is
+        interpreted as (structure, groups). If None (default), a standard
+        dense differencing will be used.
+
+        Note that sparse differencing makes sense only for large Jacobian
+        matrices where each row contains few non-zero elements.
+    as_linear_operator : bool, optional
+        When True the function returns a `scipy.sparse.linalg.LinearOperator`.
+        Otherwise it returns a dense array or a sparse matrix depending on
+        `sparsity`. The linear operator provides an efficient way of computing
+        ``J.dot(p)`` for any vector ``p`` of shape (n,), but does not allow
+        direct access to individual elements of the matrix. By default
+        `as_linear_operator` is False.
+    args, kwargs : tuple and dict, optional
+        Additional arguments passed to `fun`. Both empty by default.
+        The calling signature is ``fun(x, *args, **kwargs)``.
+
+    Returns
+    -------
+    J : {ndarray, sparse matrix, LinearOperator}
+        Finite difference approximation of the Jacobian matrix.
+        If `as_linear_operator` is True returns a LinearOperator
+        with shape (m, n). Otherwise it returns a dense array or sparse
+        matrix depending on how `sparsity` is defined. If `sparsity`
+        is None then a ndarray with shape (m, n) is returned. If
+        `sparsity` is not None returns a csr_matrix with shape (m, n).
+        Sparse matrices and linear operators are always returned as
+        2-D structures. For ndarrays, if m=1 the result is returned
+        as a 1-D gradient array with shape (n,).
+
+    See Also
+    --------
+    check_derivative : Check correctness of a function computing derivatives.
+
+    Notes
+    -----
+    If `rel_step` is not provided, it is assigned as ``EPS**(1/s)``, where
+    EPS is determined from the smallest floating point dtype of `x0` or
+    `fun(x0)`, ``np.finfo(x0.dtype).eps``, s=2 for the '2-point' method and
+    s=3 for the '3-point' method. Such a relative step approximately minimizes
+    the sum of truncation and round-off errors, see [1]_. Relative steps are
+    used by default. However, absolute steps are used when
+    ``abs_step is not None``. If any of the absolute or relative steps
+    produces an indistinguishable difference from the original `x0`,
+    ``(x0 + dx) - x0 == 0``, then an automatic step size is substituted for
+    that particular entry.
+
+    A finite difference scheme for the '3-point' method is selected
+    automatically. The well-known central difference scheme is used for points
+    sufficiently far from the boundary, and a 3-point forward or backward
+    scheme is used for points near the boundary. Both schemes have
+    second-order accuracy in terms of Taylor expansion. Refer to [3]_ for the
+    formulas of 3-point forward and backward difference schemes.
+
+    For dense differencing, when m=1 the Jacobian is returned with shape (n,);
+    on the other hand, when n=1 the Jacobian is returned with shape (m, 1).
+    Our motivation is the following: a) It handles the case of gradient
+    computation (m=1) in a conventional way. b) It clearly separates these two
+    different cases. c) In all cases np.atleast_2d can be called to get a 2-D
+    Jacobian with correct dimensions.
+
+    References
+    ----------
+    .. [1] W. H. Press et al. "Numerical Recipes. The Art of Scientific
+           Computing. 3rd edition", sec. 5.7.
+
+    .. [2] A. Curtis, M. J. D. Powell, and J. Reid, "On the estimation of
+           sparse Jacobian matrices", Journal of the Institute of Mathematics
+           and its Applications, 13 (1974), pp. 117-120.
+
+    .. [3] B. Fornberg, "Generation of Finite Difference Formulas on
+           Arbitrarily Spaced Grids", Mathematics of Computation 51, 1988.
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from scipy.optimize._numdiff import approx_derivative
+    >>>
+    >>> def f(x, c1, c2):
+    ...     return np.array([x[0] * np.sin(c1 * x[1]),
+    ...                      x[0] * np.cos(c2 * x[1])])
+    ...
+    >>> x0 = np.array([1.0, 0.5 * np.pi])
+    >>> approx_derivative(f, x0, args=(1, 2))
+    array([[ 1.,  0.],
+           [-1.,  0.]])
+
+    Bounds can be used to limit the region of function evaluation.
+    In the example below we compute left and right derivative at point 1.0.
+
+    >>> def g(x):
+    ...     return x**2 if x >= 1 else x
+    ...
+    >>> x0 = 1.0
+    >>> approx_derivative(g, x0, bounds=(-np.inf, 1.0))
+    array([ 1.])
+    >>> approx_derivative(g, x0, bounds=(1.0, np.inf))
+    array([ 2.])
+    """
+    if method not in ['2-point', '3-point', 'cs']:
+        raise ValueError("Unknown method '%s'. " % method)
+
+    xp = array_namespace(x0)
+    _x = atleast_nd(x0, ndim=1, xp=xp)
+    _dtype = xp.float64
+    if xp.isdtype(_x.dtype, "real floating"):
+        _dtype = _x.dtype
+
+    # promotes to floating
+    x0 = xp.astype(_x, _dtype)
+
+    if x0.ndim > 1:
+        raise ValueError("`x0` must have at most 1 dimension.")
+
+    lb, ub = _prepare_bounds(bounds, x0)
+
+    if lb.shape != x0.shape or ub.shape != x0.shape:
+        raise ValueError("Inconsistent shapes between bounds and `x0`.")
+
+    if as_linear_operator and not (np.all(np.isinf(lb))
+                                   and np.all(np.isinf(ub))):
+        raise ValueError("Bounds not supported when "
+                         "`as_linear_operator` is True.")
+
+    def fun_wrapped(x):
+        # Send the user function the same fp type as x0 (but only if cs is
+        # not being used).
+        if xp.isdtype(x.dtype, "real floating"):
+            x = xp.astype(x, x0.dtype)
+
+        f = np.atleast_1d(fun(x, *args, **kwargs))
+        if f.ndim > 1:
+            raise RuntimeError("`fun` return value has "
+                               "more than 1 dimension.")
+        return f
+
+    if f0 is None:
+        f0 = fun_wrapped(x0)
+    else:
+        f0 = np.atleast_1d(f0)
+        if f0.ndim > 1:
+            raise ValueError("`f0` passed has more than 1 dimension.")
+
+    if np.any((x0 < lb) | (x0 > ub)):
+        raise ValueError("`x0` violates bound constraints.")
+
+    if as_linear_operator:
+        if rel_step is None:
+            rel_step = _eps_for_method(x0.dtype, f0.dtype, method)
+
+        return _linear_operator_difference(fun_wrapped, x0,
+                                           f0, rel_step, method)
+    else:
+        # by default we use rel_step
+        if abs_step is None:
+            h = _compute_absolute_step(rel_step, x0, f0, method)
+        else:
+            # user specifies an absolute step
+            sign_x0 = (x0 >= 0).astype(float) * 2 - 1
+            h = abs_step
+
+            # cannot have a zero step. This might happen if x0 is very large
+            # or small, in which case we fall back to a relative step.
+            dx = ((x0 + h) - x0)
+            h = np.where(dx == 0,
+                         _eps_for_method(x0.dtype, f0.dtype, method) *
+                         sign_x0 * np.maximum(1.0, np.abs(x0)),
+                         h)
+
+        if method == '2-point':
+            h, use_one_sided = _adjust_scheme_to_bounds(
+                x0, h, 1, '1-sided', lb, ub)
+        elif method == '3-point':
+            h, use_one_sided = _adjust_scheme_to_bounds(
+                x0, h, 1, '2-sided', lb, ub)
+        elif method == 'cs':
+            use_one_sided = False
+
+        if sparsity is None:
+            return _dense_difference(fun_wrapped, x0, f0, h,
+                                     use_one_sided, method)
+        else:
+            if not issparse(sparsity) and len(sparsity) == 2:
+                structure, groups = sparsity
+            else:
+                structure = sparsity
+                groups = group_columns(sparsity)
+
+            if issparse(structure):
+                structure = csc_matrix(structure)
+            else:
+                structure = np.atleast_2d(structure)
+
+            groups = np.atleast_1d(groups)
+            return _sparse_difference(fun_wrapped, x0, f0, h,
+                                      use_one_sided, structure,
+                                      groups, method)
+
+
+def _linear_operator_difference(fun, x0, f0, h, method):
+    m = f0.size
+    n = x0.size
+
+    if method == '2-point':
+        def matvec(p):
+            if np.array_equal(p, np.zeros_like(p)):
+                return np.zeros(m)
+            dx = h / norm(p)
+            x = x0 + dx*p
+            df = fun(x) - f0
+            return df / dx
+
+    elif method == '3-point':
+        def matvec(p):
+            if np.array_equal(p, np.zeros_like(p)):
+                return np.zeros(m)
+            dx = 2*h / norm(p)
+            x1 = x0 - (dx/2)*p
+            x2 = x0 + (dx/2)*p
+            f1 = fun(x1)
+            f2 = fun(x2)
+            df = f2 - f1
+            return df / dx
+
+    elif method == 'cs':
+        def matvec(p):
+            if np.array_equal(p, np.zeros_like(p)):
+                return np.zeros(m)
+            dx = h / norm(p)
+            x = x0 + dx*p*1.j
+            f1 = fun(x)
+            df = f1.imag
+            return df / dx
+
+    else:
+        raise RuntimeError("Never be here.")
+
+    return LinearOperator((m, n), matvec)
+
+
+def _dense_difference(fun, x0, f0, h, use_one_sided, method):
+    m = f0.size
+    n = x0.size
+    J_transposed = np.empty((n, m))
+    h_vecs = np.diag(h)
+
+    for i in range(h.size):
+        if method == '2-point':
+            x = x0 + h_vecs[i]
+            dx = x[i] - x0[i]  # Recompute dx as exactly representable number.
+            df = fun(x) - f0
+        elif method == '3-point' and use_one_sided[i]:
+            x1 = x0 + h_vecs[i]
+            x2 = x0 + 2 * h_vecs[i]
+            dx = x2[i] - x0[i]
+            f1 = fun(x1)
+            f2 = fun(x2)
+            df = -3.0 * f0 + 4 * f1 - f2
+        elif method == '3-point' and not use_one_sided[i]:
+            x1 = x0 - h_vecs[i]
+            x2 = x0 + h_vecs[i]
+            dx = x2[i] - x1[i]
+            f1 = fun(x1)
+            f2 = fun(x2)
+            df = f2 - f1
+        elif method == 'cs':
+            f1 = fun(x0 + h_vecs[i]*1.j)
+            df = f1.imag
+            dx = h_vecs[i, i]
+        else:
+            raise RuntimeError("Never be here.")
+
+        J_transposed[i] = df / dx
+
+    if m == 1:
+        J_transposed = np.ravel(J_transposed)
+
+    return J_transposed.T
+
+
+def _sparse_difference(fun, x0, f0, h, use_one_sided,
+                       structure, groups, method):
+    m = f0.size
+    n = x0.size
+    row_indices = []
+    col_indices = []
+    fractions = []
+
+    n_groups = np.max(groups) + 1
+    for group in range(n_groups):
+        # Perturb variables which are in the same group simultaneously.
+        e = np.equal(group, groups)
+        h_vec = h * e
+        if method == '2-point':
+            x = x0 + h_vec
+            dx = x - x0
+            df = fun(x) - f0
+            # The result is written to columns which correspond to perturbed
+            # variables.
+            cols, = np.nonzero(e)
+            # Find all non-zero elements in selected columns of Jacobian.
+            i, j, _ = find(structure[:, cols])
+            # Restore column indices in the full array.
+            j = cols[j]
+        elif method == '3-point':
+            # Here we do conceptually the same but separate one-sided
+            # and two-sided schemes.
+            x1 = x0.copy()
+            x2 = x0.copy()
+
+            mask_1 = use_one_sided & e
+            x1[mask_1] += h_vec[mask_1]
+            x2[mask_1] += 2 * h_vec[mask_1]
+
+            mask_2 = ~use_one_sided & e
+            x1[mask_2] -= h_vec[mask_2]
+            x2[mask_2] += h_vec[mask_2]
+
+            dx = np.zeros(n)
+            dx[mask_1] = x2[mask_1] - x0[mask_1]
+            dx[mask_2] = x2[mask_2] - x1[mask_2]
+
+            f1 = fun(x1)
+            f2 = fun(x2)
+
+            cols, = np.nonzero(e)
+            i, j, _ = find(structure[:, cols])
+            j = cols[j]
+
+            mask = use_one_sided[j]
+            df = np.empty(m)
+
+            rows = i[mask]
+            df[rows] = -3 * f0[rows] + 4 * f1[rows] - f2[rows]
+
+            rows = i[~mask]
+            df[rows] = f2[rows] - f1[rows]
+        elif method == 'cs':
+            f1 = fun(x0 + h_vec*1.j)
+            df = f1.imag
+            dx = h_vec
+            cols, = np.nonzero(e)
+            i, j, _ = find(structure[:, cols])
+            j = cols[j]
+        else:
+            raise ValueError("Never be here.")
+
+        # All that's left is to compute the fraction. We store i, j and
+        # fractions as separate arrays and later construct coo_matrix.
+        row_indices.append(i)
+        col_indices.append(j)
+        fractions.append(df[i] / dx[j])
+
+    row_indices = np.hstack(row_indices)
+    col_indices = np.hstack(col_indices)
+    fractions = np.hstack(fractions)
+    J = coo_matrix((fractions, (row_indices, col_indices)), shape=(m, n))
+    return csr_matrix(J)
+
+
+def check_derivative(fun, jac, x0, bounds=(-np.inf, np.inf), args=(),
+                     kwargs={}):
+    """Check correctness of a function computing derivatives (Jacobian or
+    gradient) by comparison with a finite difference approximation.
+
+    Parameters
+    ----------
+    fun : callable
+        Function of which to estimate the derivatives. The argument x
+        passed to this function is ndarray of shape (n,) (never a scalar
+        even if n=1). It must return 1-D array_like of shape (m,) or a scalar.
+    jac : callable
+        Function which computes Jacobian matrix of `fun`. It must work with
+        argument x the same way as `fun`. The return value must be array_like
+        or sparse matrix with an appropriate shape.
+    x0 : array_like of shape (n,) or float
+        Point at which to estimate the derivatives. Float will be converted
+        to 1-D array.
+    bounds : 2-tuple of array_like, optional
+        Lower and upper bounds on independent variables. Defaults to no bounds.
+        Each bound must match the size of `x0` or be a scalar, in the latter
+        case the bound will be the same for all variables. Use it to limit the
+        range of function evaluation.
+    args, kwargs : tuple and dict, optional
+        Additional arguments passed to `fun` and `jac`. Both empty by default.
+        The calling signature is ``fun(x, *args, **kwargs)`` and the same
+        for `jac`.
+
+    Returns
+    -------
+    accuracy : float
+        The maximum among all relative errors for elements with absolute values
+        higher than 1 and absolute errors for elements with absolute values
+        less than or equal to 1. If `accuracy` is on the order of 1e-6 or
+        lower, then it is likely that your `jac` implementation is correct.
+
+    See Also
+    --------
+    approx_derivative : Compute finite difference approximation of derivative.
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from scipy.optimize._numdiff import check_derivative
+    >>>
+    >>>
+    >>> def f(x, c1, c2):
+    ...     return np.array([x[0] * np.sin(c1 * x[1]),
+    ...                      x[0] * np.cos(c2 * x[1])])
+    ...
+    >>> def jac(x, c1, c2):
+    ...     return np.array([
+    ...         [np.sin(c1 * x[1]),  c1 * x[0] * np.cos(c1 * x[1])],
+    ...         [np.cos(c2 * x[1]), -c2 * x[0] * np.sin(c2 * x[1])]
+    ...     ])
+    ...
+    >>>
+    >>> x0 = np.array([1.0, 0.5 * np.pi])
+    >>> check_derivative(f, jac, x0, args=(1, 2))
+    2.4492935982947064e-16
+    """
+    J_to_test = jac(x0, *args, **kwargs)
+    if issparse(J_to_test):
+        J_diff = approx_derivative(fun, x0, bounds=bounds, sparsity=J_to_test,
+                                   args=args, kwargs=kwargs)
+        J_to_test = csr_matrix(J_to_test)
+        abs_err = J_to_test - J_diff
+        i, j, abs_err_data = find(abs_err)
+        J_diff_data = np.asarray(J_diff[i, j]).ravel()
+        return np.max(np.abs(abs_err_data) /
+                      np.maximum(1, np.abs(J_diff_data)))
+    else:
+        J_diff = approx_derivative(fun, x0, bounds=bounds,
+                                   args=args, kwargs=kwargs)
+        abs_err = np.abs(J_to_test - J_diff)
+        return np.max(abs_err / np.maximum(1, np.abs(J_diff)))
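The step-size rule and the three difference schemes described in the `approx_derivative` docstring above can be illustrated for a scalar function with a short, standard-library-only sketch. This is not the SciPy implementation: `default_step` and `scalar_derivative` are hypothetical helper names, bounds handling is omitted, and using s=2 for the 'cs' method is an assumption of this sketch (the docstring states s only for '2-point' and '3-point').

```python
import sys

EPS = sys.float_info.epsilon  # machine epsilon of a Python float (float64)


def default_step(x0, method):
    """Return h = rel_step * sign(x0) * max(1, abs(x0)), rel_step = EPS**(1/s)."""
    # s=2 for '2-point' and s=3 for '3-point', as in the Notes; using s=2 for
    # 'cs' is an assumption made for this sketch.
    s = 3 if method == "3-point" else 2
    rel_step = EPS ** (1.0 / s)
    sign = 1.0 if x0 >= 0 else -1.0  # zero is treated as positive
    return rel_step * sign * max(1.0, abs(x0))


def scalar_derivative(fun, x0, method="3-point"):
    """Estimate fun'(x0) for a scalar function, ignoring bounds."""
    h = default_step(x0, method)
    if method == "2-point":
        x = x0 + h
        dx = x - x0  # recompute dx as an exactly representable number
        return (fun(x) - fun(x0)) / dx
    if method == "3-point":
        x1, x2 = x0 - h, x0 + h
        return (fun(x2) - fun(x1)) / (x2 - x1)  # central difference
    if method == "cs":
        # Complex step: perturb along the imaginary axis, no subtraction.
        return fun(complex(x0, h)).imag / h
    raise ValueError(f"Unknown method {method!r}.")


def f(x):
    # Works for float and complex input; the exact derivative is 3*x**2 + 2.
    return x * x * x + 2 * x


for method in ("2-point", "3-point", "cs"):
    estimate = scalar_derivative(f, 2.0, method)
    print(f"{method:8s} estimate={estimate:.12f} error={abs(estimate - 14.0):.2e}")
```

Running the loop prints the estimate and its absolute error for each scheme at x0 = 2, where the exact derivative is 14. The complex-step branch involves no subtraction of nearby function values, which is why it avoids the cancellation error that limits the two real-valued schemes.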
@@ -0,0 +1,731 @@
+import numpy as np
+import operator
+from . import (linear_sum_assignment, OptimizeResult)
+from ._optimize import _check_unknown_options
+
+from scipy._lib._util import check_random_state
+import itertools
+
+QUADRATIC_ASSIGNMENT_METHODS = ['faq', '2opt']
+
+def quadratic_assignment(A, B, method="faq", options=None):
+    r"""
+    Approximates solution to the quadratic assignment problem and
+    the graph matching problem.
+
+    Quadratic assignment solves problems of the following form:
+
+    .. math::
+
+        \min_P & \ {\ \text{trace}(A^T P B P^T)}\\
+        \mbox{s.t. } & {P \ \in \ \mathcal{P}}\\
+
+    where :math:`\mathcal{P}` is the set of all permutation matrices,
+    and :math:`A` and :math:`B` are square matrices.
+
+    Graph matching tries to *maximize* the same objective function.
+    This algorithm can be thought of as finding the alignment of the
+    nodes of two graphs that minimizes the number of induced edge
+    disagreements, or, in the case of weighted graphs, the sum of squared
+    edge weight differences.
+
+    Note that the quadratic assignment problem is NP-hard. The results given
+    here are approximations and are not guaranteed to be optimal.
+
+
+    Parameters
+    ----------
+    A : 2-D array, square
+        The square matrix :math:`A` in the objective function above.
+
+    B : 2-D array, square
+        The square matrix :math:`B` in the objective function above.
+
+    method :  str in {'faq', '2opt'} (default: 'faq')
+        The algorithm used to solve the problem.
+        :ref:`'faq' <optimize.qap-faq>` (default) and
+        :ref:`'2opt' <optimize.qap-2opt>` are available.
+
+    options : dict, optional
+        A dictionary of solver options. All solvers support the following:
+
+        maximize : bool (default: False)
+            Maximizes the objective function if ``True``.
+
+        partial_match : 2-D array of integers, optional (default: None)
+            Fixes part of the matching. Also known as a "seed" [2]_.
+
+            Each row of `partial_match` specifies a pair of matched nodes:
+            node ``partial_match[i, 0]`` of `A` is matched to node
+            ``partial_match[i, 1]`` of `B`. The array has shape ``(m, 2)``,
+            where ``m`` is not greater than the number of nodes, :math:`n`.
+
+        rng : {None, int, `numpy.random.Generator`,
+               `numpy.random.RandomState`}, optional
+
+            If `rng` is None (or `np.random`), the `numpy.random.RandomState`
+            singleton is used.
+            If `rng` is an int, a new ``RandomState`` instance is used,
+            seeded with `rng`.
+            If `rng` is already a ``Generator`` or ``RandomState`` instance then
+            that instance is used.
+
+        For method-specific options, see
+        :func:`show_options('quadratic_assignment') <show_options>`.
+
+    Returns
+    -------
+    res : OptimizeResult
+        `OptimizeResult` containing the following fields.
+
+        col_ind : 1-D array
+            Column indices corresponding to the best permutation found of the
+            nodes of `B`.
+        fun : float
+            The objective value of the solution.
+        nit : int
+            The number of iterations performed during optimization.
+
+    Notes
+    -----
+    The default method :ref:`'faq' <optimize.qap-faq>` uses the Fast
+    Approximate QAP algorithm [1]_; it typically offers the best combination of
+    speed and accuracy.
+    Method :ref:`'2opt' <optimize.qap-2opt>` can be computationally expensive,
+    but may be a useful alternative, or it can be used to refine the solution
+    returned by another method.
+
+    References
+    ----------
+    .. [1] J.T. Vogelstein, J.M. Conroy, V. Lyzinski, L.J. Podrazik,
+           S.G. Kratzer, E.T. Harley, D.E. Fishkind, R.J. Vogelstein, and
+           C.E. Priebe, "Fast approximate quadratic programming for graph
+           matching," PLOS one, vol. 10, no. 4, p. e0121002, 2015,
+           :doi:`10.1371/journal.pone.0121002`
+
+    .. [2] D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski,
+           C. Priebe, "Seeded graph matching", Pattern Recognit. 87 (2019):
+           203-215, :doi:`10.1016/j.patcog.2018.09.014`
+
+    .. [3] "2-opt," Wikipedia.
+           https://en.wikipedia.org/wiki/2-opt
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from scipy.optimize import quadratic_assignment
+    >>> A = np.array([[0, 80, 150, 170], [80, 0, 130, 100],
+    ...               [150, 130, 0, 120], [170, 100, 120, 0]])
+    >>> B = np.array([[0, 5, 2, 7], [0, 0, 3, 8],
+    ...               [0, 0, 0, 3], [0, 0, 0, 0]])
+    >>> res = quadratic_assignment(A, B)
+    >>> print(res)
+         fun: 3260
+     col_ind: [0 3 2 1]
+         nit: 9
+
+    To see the relationship between the returned ``col_ind`` and ``fun``,
+    use ``col_ind`` to form the best permutation matrix found, then evaluate
+    the objective function :math:`f(P) = trace(A^T P B P^T )`.
+
+    >>> perm = res['col_ind']
+    >>> P = np.eye(len(A), dtype=int)[perm]
+    >>> fun = np.trace(A.T @ P @ B @ P.T)
+    >>> print(fun)
+    3260
+
+    Alternatively, to avoid constructing the permutation matrix explicitly,
+    directly permute the rows and columns of the distance matrix.
+
+    >>> fun = np.trace(A.T @ B[perm][:, perm])
+    >>> print(fun)
+    3260
+
+    Although not guaranteed in general, ``quadratic_assignment`` happens to
+    have found the globally optimal solution.
+
+    >>> from itertools import permutations
+    >>> perm_opt, fun_opt = None, np.inf
+    >>> for perm in permutations([0, 1, 2, 3]):
+    ...     perm = np.array(perm)
+    ...     fun = np.trace(A.T @ B[perm][:, perm])
+    ...     if fun < fun_opt:
+    ...         fun_opt, perm_opt = fun, perm
+    >>> print(np.array_equal(perm_opt, res['col_ind']))
+    True
+
+    Here is an example for which the default method,
+    :ref:`'faq' <optimize.qap-faq>`, does not find the global optimum.
+
+    >>> A = np.array([[0, 5, 8, 6], [5, 0, 5, 1],
+    ...               [8, 5, 0, 2], [6, 1, 2, 0]])
+    >>> B = np.array([[0, 1, 8, 4], [1, 0, 5, 2],
+    ...               [8, 5, 0, 5], [4, 2, 5, 0]])
+    >>> res = quadratic_assignment(A, B)
+    >>> print(res)
+         fun: 178
+     col_ind: [1 0 3 2]
+         nit: 13
+
+    If accuracy is important, consider using :ref:`'2opt' <optimize.qap-2opt>`
+    to refine the solution.
+
+    >>> guess = np.array([np.arange(len(A)), res.col_ind]).T
+    >>> res = quadratic_assignment(A, B, method="2opt",
+    ...                            options={'partial_guess': guess})
+    >>> print(res)
+         fun: 176
+     col_ind: [1 2 3 0]
+         nit: 17
+
+    """
+
+    if options is None:
+        options = {}
+
+    method = method.lower()
+    methods = {"faq": _quadratic_assignment_faq,
+               "2opt": _quadratic_assignment_2opt}
+    if method not in methods:
+        raise ValueError(f"method {method} must be in {list(methods)}.")
+    res = methods[method](A, B, **options)
+    return res
+
+
+def _calc_score(A, B, perm):
+    # equivalent to objective function but avoids matmul
+    return np.sum(A * B[perm][:, perm])
+
+
+def _common_input_validation(A, B, partial_match):
+    A = np.atleast_2d(A)
+    B = np.atleast_2d(B)
+
+    if partial_match is None:
+        partial_match = np.array([[], []]).T
+    partial_match = np.atleast_2d(partial_match).astype(int)
+
+    msg = None
+    if A.shape[0] != A.shape[1]:
+        msg = "`A` must be square"
+    elif B.shape[0] != B.shape[1]:
+        msg = "`B` must be square"
+    elif A.ndim != 2 or B.ndim != 2:
+        msg = "`A` and `B` must have exactly two dimensions"
+    elif A.shape != B.shape:
+        msg = "`A` and `B` matrices must be of equal size"
+    elif partial_match.shape[0] > A.shape[0]:
+        msg = "`partial_match` can have only as many seeds as there are nodes"
+    elif partial_match.shape[1] != 2:
+        msg = "`partial_match` must have two columns"
+    elif partial_match.ndim != 2:
+        msg = "`partial_match` must have exactly two dimensions"
+    elif (partial_match < 0).any():
+        msg = "`partial_match` must contain only positive indices"
+    elif (partial_match >= len(A)).any():
+        msg = "`partial_match` entries must be less than number of nodes"
+    elif (not len(set(partial_match[:, 0])) == len(partial_match[:, 0]) or
+          not len(set(partial_match[:, 1])) == len(partial_match[:, 1])):
+        msg = "`partial_match` column entries must be unique"
+
+    if msg is not None:
+        raise ValueError(msg)
+
+    return A, B, partial_match
+
+
+def _quadratic_assignment_faq(A, B,
+                              maximize=False, partial_match=None, rng=None,
+                              P0="barycenter", shuffle_input=False, maxiter=30,
+                              tol=0.03, **unknown_options):
+    r"""Solve the quadratic assignment problem (approximately).
+
+    This function solves the Quadratic Assignment Problem (QAP) and the
+    Graph Matching Problem (GMP) using the Fast Approximate QAP Algorithm
+    (FAQ) [1]_.
+
+    Quadratic assignment solves problems of the following form:
+
+    .. math::
+
+        \min_P & \ {\ \text{trace}(A^T P B P^T)}\\
+        \mbox{s.t. } & {P \ \in \ \mathcal{P}}\\
+
+    where :math:`\mathcal{P}` is the set of all permutation matrices,
+    and :math:`A` and :math:`B` are square matrices.
+
+    Graph matching tries to *maximize* the same objective function.
+    This algorithm can be thought of as finding the alignment of the
+    nodes of two graphs that minimizes the number of induced edge
+    disagreements, or, in the case of weighted graphs, the sum of squared
+    edge weight differences.
+
+    Note that the quadratic assignment problem is NP-hard. The results given
+    here are approximations and are not guaranteed to be optimal.
+
+    Parameters
+    ----------
+    A : 2-D array, square
+        The square matrix :math:`A` in the objective function above.
+    B : 2-D array, square
+        The square matrix :math:`B` in the objective function above.
+    method :  str in {'faq', '2opt'} (default: 'faq')
+        The algorithm used to solve the problem. This is the method-specific
+        documentation for 'faq'.
+        :ref:`'2opt' <optimize.qap-2opt>` is also available.
+
+    Options
+    -------
+    maximize : bool (default: False)
+        Maximizes the objective function if ``True``.
+    partial_match : 2-D array of integers, optional (default: None)
+        Fixes part of the matching. Also known as a "seed" [2]_.
+
+        Each row of `partial_match` specifies a pair of matched nodes:
+        node ``partial_match[i, 0]`` of `A` is matched to node
+        ``partial_match[i, 1]`` of `B`. The array has shape ``(m, 2)``, where
+        ``m`` is not greater than the number of nodes, :math:`n`.
+
+    rng : {None, int, `numpy.random.Generator`,
+           `numpy.random.RandomState`}, optional
+
+        If `rng` is None (or `np.random`), the `numpy.random.RandomState`
+        singleton is used.
+        If `rng` is an int, a new ``RandomState`` instance is used,
+        seeded with `rng`.
+        If `rng` is already a ``Generator`` or ``RandomState`` instance then
+        that instance is used.
+    P0 : 2-D array, "barycenter", or "randomized" (default: "barycenter")
+        Initial position. Must be a doubly-stochastic matrix [3]_.
+
+        If the initial position is an array, it must be a doubly stochastic
+        matrix of size :math:`m' \times m'` where :math:`m' = n - m`.
+
+        If ``"barycenter"`` (default), the initial position is the barycenter
+        of the Birkhoff polytope (the space of doubly stochastic matrices).
+        This is a :math:`m' \times m'` matrix with all entries equal to
+        :math:`1 / m'`.
+
+        If ``"randomized"`` the initial search position is
+        :math:`P_0 = (J + K) / 2`, where :math:`J` is the barycenter and
+        :math:`K` is a random doubly stochastic matrix.
+    shuffle_input : bool (default: False)
+        Set to `True` to resolve degenerate gradients randomly. For
+        non-degenerate gradients this option has no effect.
+    maxiter : int, positive (default: 30)
+        Integer specifying the max number of Frank-Wolfe iterations performed.
+    tol : float (default: 0.03)
+        Tolerance for termination. Frank-Wolfe iteration terminates when
+        :math:`\frac{||P_{i}-P_{i+1}||_F}{\sqrt{m'}} \leq tol`,
+        where :math:`i` is the iteration number.
+
+    Returns
+    -------
+    res : OptimizeResult
+        `OptimizeResult` containing the following fields.
+
+        col_ind : 1-D array
+            Column indices corresponding to the best permutation found of the
+            nodes of `B`.
+        fun : float
+            The objective value of the solution.
+        nit : int
+            The number of Frank-Wolfe iterations performed.
+
+    Notes
+    -----
+    The algorithm may be sensitive to the initial permutation matrix (or
+    search "position") due to the possibility of several local minima
+    within the feasible region. A barycenter initialization is more likely to
+    result in a better solution than a single random initialization. However,
+    calling ``quadratic_assignment`` several times with different random
+    initializations may result in a better optimum at the cost of longer
+    total execution time.
+
+    Examples
+    --------
+    As mentioned above, a barycenter initialization often results in a better
+    solution than a single random initialization.
+
+    >>> from numpy.random import default_rng
+    >>> rng = default_rng()
+    >>> n = 15
+    >>> A = rng.random((n, n))
+    >>> B = rng.random((n, n))
+    >>> res = quadratic_assignment(A, B)  # FAQ is default method
+    >>> print(res.fun)
+    46.871483385480545  # may vary
+
+    >>> options = {"P0": "randomized"}  # use randomized initialization
+    >>> res = quadratic_assignment(A, B, options=options)
+    >>> print(res.fun)
+    47.224831071310625 # may vary
+
+    However, consider running from several randomized initializations and
+    keeping the best result.
+
+    >>> res = min([quadratic_assignment(A, B, options=options)
+    ...            for i in range(30)], key=lambda x: x.fun)
+    >>> print(res.fun)
+    46.671852533681516 # may vary
+
+    The '2-opt' method can be used to further refine the results.
+
+    >>> options = {"partial_guess": np.array([np.arange(n), res.col_ind]).T}
+    >>> res = quadratic_assignment(A, B, method="2opt", options=options)
+    >>> print(res.fun)
+    46.47160735721583 # may vary
+
+    References
+    ----------
+    .. [1] J.T. Vogelstein, J.M. Conroy, V. Lyzinski, L.J. Podrazik,
+           S.G. Kratzer, E.T. Harley, D.E. Fishkind, R.J. Vogelstein, and
+           C.E. Priebe, "Fast approximate quadratic programming for graph
+           matching," PLOS one, vol. 10, no. 4, p. e0121002, 2015,
+           :doi:`10.1371/journal.pone.0121002`
+
+    .. [2] D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski,
+           C. Priebe, "Seeded graph matching", Pattern Recognit. 87 (2019):
+           203-215, :doi:`10.1016/j.patcog.2018.09.014`
+
+    .. [3] "Doubly stochastic Matrix," Wikipedia.
+           https://en.wikipedia.org/wiki/Doubly_stochastic_matrix
+
+    """
+
+    _check_unknown_options(unknown_options)
+
+    maxiter = operator.index(maxiter)
+
+    # ValueError check
+    A, B, partial_match = _common_input_validation(A, B, partial_match)
+
+    msg = None
+    if isinstance(P0, str) and P0 not in {'barycenter', 'randomized'}:
+        msg = "Invalid 'P0' parameter string"
+    elif maxiter <= 0:
+        msg = "'maxiter' must be a positive integer"
+    elif tol <= 0:
+        msg = "'tol' must be a positive float"
+    if msg is not None:
+        raise ValueError(msg)
+
+    rng = check_random_state(rng)
+    n = len(A)  # number of vertices in graphs
+    n_seeds = len(partial_match)  # number of seeds
+    n_unseed = n - n_seeds
+
+    # [1] Algorithm 1 Line 1 - choose initialization
+    if not isinstance(P0, str):
+        P0 = np.atleast_2d(P0)
+        if P0.shape != (n_unseed, n_unseed):
+            msg = "`P0` matrix must have shape m' x m', where m'=n-m"
+        elif ((P0 < 0).any() or not np.allclose(np.sum(P0, axis=0), 1)
+              or not np.allclose(np.sum(P0, axis=1), 1)):
+            msg = "`P0` matrix must be doubly stochastic"
+        if msg is not None:
+            raise ValueError(msg)
+    elif P0 == 'barycenter':
+        P0 = np.ones((n_unseed, n_unseed)) / n_unseed
+    elif P0 == 'randomized':
+        J = np.ones((n_unseed, n_unseed)) / n_unseed
+        # generate an n_unseed x n_unseed matrix with entries drawn from [0, 1)
+        # would use rand, but Generators don't have it
+        # would use random, but old mtrand.RandomStates don't have it
+        K = _doubly_stochastic(rng.uniform(size=(n_unseed, n_unseed)))
+        P0 = (J + K) / 2
+
+    # check trivial cases
+    if n == 0 or n_seeds == n:
+        score = _calc_score(A, B, partial_match[:, 1])
+        res = {"col_ind": partial_match[:, 1], "fun": score, "nit": 0}
+        return OptimizeResult(res)
+
+    obj_func_scalar = 1
+    if maximize:
+        obj_func_scalar = -1
+
+    nonseed_B = np.setdiff1d(range(n), partial_match[:, 1])
+    if shuffle_input:
+        nonseed_B = rng.permutation(nonseed_B)
+
+    nonseed_A = np.setdiff1d(range(n), partial_match[:, 0])
+    perm_A = np.concatenate([partial_match[:, 0], nonseed_A])
+    perm_B = np.concatenate([partial_match[:, 1], nonseed_B])
+
+    # definitions according to Seeded Graph Matching [2].
+    A11, A12, A21, A22 = _split_matrix(A[perm_A][:, perm_A], n_seeds)
+    B11, B12, B21, B22 = _split_matrix(B[perm_B][:, perm_B], n_seeds)
+    const_sum = A21 @ B21.T + A12.T @ B12
+
+    P = P0
+    # [1] Algorithm 1 Line 2 - loop while stopping criteria not met
+    for n_iter in range(1, maxiter+1):
+        # [1] Algorithm 1 Line 3 - compute the gradient of f(P) = -tr(APB^tP^t)
+        grad_fp = (const_sum + A22 @ P @ B22.T + A22.T @ P @ B22)
+        # [1] Algorithm 1 Line 4 - get direction Q by solving Eq. 8
+        _, cols = linear_sum_assignment(grad_fp, maximize=maximize)
+        Q = np.eye(n_unseed)[cols]
+
+        # [1] Algorithm 1 Line 5 - compute the step size
+        # Noting that e.g. trace(Ax) = trace(A)*x, expand and re-collect
+        # terms as ax**2 + bx + c. c does not affect location of minimum
+        # and can be ignored. Also, note that trace(A@B) = (A.T*B).sum();
+        # apply where possible for efficiency.
+        R = P - Q
+        b21 = ((R.T @ A21) * B21).sum()
+        b12 = ((R.T @ A12.T) * B12.T).sum()
+        AR22 = A22.T @ R
+        BR22 = B22 @ R.T
+        b22a = (AR22 * B22.T[cols]).sum()
+        b22b = (A22 * BR22[cols]).sum()
+        a = (AR22.T * BR22).sum()
+        b = b21 + b12 + b22a + b22b
+        # critical point of ax^2 + bx + c is at x = -b/(2*a)
+        # if a * obj_func_scalar > 0, it is a minimum
+        # if minimum is not in [0, 1], only endpoints need to be considered
+        if a*obj_func_scalar > 0 and 0 <= -b/(2*a) <= 1:
+            alpha = -b/(2*a)
+        else:
+            alpha = np.argmin([0, (b + a)*obj_func_scalar])
+
+        # [1] Algorithm 1 Line 6 - Update P
+        P_i1 = alpha * P + (1 - alpha) * Q
+        if np.linalg.norm(P - P_i1) / np.sqrt(n_unseed) < tol:
+            P = P_i1
+            break
+        P = P_i1
+    # [1] Algorithm 1 Line 7 - end main loop
+
+    # [1] Algorithm 1 Line 8 - project onto the set of permutation matrices
+    _, col = linear_sum_assignment(P, maximize=True)
+    perm = np.concatenate((np.arange(n_seeds), col + n_seeds))
+
+    unshuffled_perm = np.zeros(n, dtype=int)
+    unshuffled_perm[perm_A] = perm_B[perm]
+
+    score = _calc_score(A, B, unshuffled_perm)
+    res = {"col_ind": unshuffled_perm, "fun": score, "nit": n_iter}
+    return OptimizeResult(res)
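The step-size choice in Algorithm 1 Line 5 amounts to optimizing a quadratic over the interval [0, 1]. The following is a minimal standalone sketch of that rule, not SciPy's code: the name `step_size` and its `minimize` flag are hypothetical, and `a` and `b` stand for the quadratic and linear coefficients computed in the loop above.

```python
def step_size(a, b, minimize=True):
    """Pick alpha in [0, 1] optimizing a*alpha**2 + b*alpha + c.

    Hypothetical helper; the constant term c does not affect the location
    of the optimum and is therefore not needed.
    """
    sign = 1 if minimize else -1
    # Interior critical point, used only when it is the right kind of
    # extremum and lies inside the interval.
    if a * sign > 0 and 0 <= -b / (2 * a) <= 1:
        return -b / (2 * a)
    # Otherwise compare the endpoints: the value at 0 is c and the value
    # at 1 is a + b + c, so only the sign of (a + b) matters.
    return 1.0 if (a + b) * sign < 0 else 0.0


alpha_interior = step_size(1.0, -1.0)           # vertex at 0.5
alpha_endpoint = step_size(1.0, -4.0)           # vertex at 2.0, outside [0, 1]
alpha_concave = step_size(-1.0, 2.0)            # concave: endpoints only
alpha_max = step_size(-1.0, 1.0, minimize=False)
```

On a tie between the endpoints the sketch returns 0, which matches the behavior of taking the first index of the smaller endpoint value.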
+
+
+def _split_matrix(X, n):
+    # definitions according to Seeded Graph Matching [2].
+    upper, lower = X[:n], X[n:]
+    return upper[:, :n], upper[:, n:], lower[:, :n], lower[:, n:]
+
+
+def _doubly_stochastic(P, tol=1e-3):
+    # Adapted from @btaba implementation
+    # https://github.com/btaba/sinkhorn_knopp
+    # of Sinkhorn-Knopp algorithm
+    # https://projecteuclid.org/euclid.pjm/1102992505
+
+    max_iter = 1000
+    c = 1 / P.sum(axis=0)
+    r = 1 / (P @ c)
+    P_eps = P
+
+    for it in range(max_iter):
+        if ((np.abs(P_eps.sum(axis=1) - 1) < tol).all() and
+                (np.abs(P_eps.sum(axis=0) - 1) < tol).all()):
+            # All column/row sums ~= 1 within threshold
+            break
+
+        c = 1 / (r @ P)
+        r = 1 / (P @ c)
+        P_eps = r[:, None] * P * c
+
+    return P_eps
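The alternating row and column scaling performed by `_doubly_stochastic` can be sketched without NumPy. This is an illustrative standalone version, not SciPy's implementation: the name `sinkhorn_knopp` and its parameters are hypothetical, and the input is assumed to be a square matrix with strictly positive entries.

```python
def sinkhorn_knopp(P, tol=1e-3, max_iter=1000):
    """Scale a square, strictly positive matrix toward doubly stochastic form.

    Hypothetical pure-Python helper for illustration only.
    """
    n = len(P)
    r = [1.0] * n  # row scaling factors
    scaled = [list(row) for row in P]
    for _ in range(max_iter):
        # Column factors that make every column of diag(r) P diag(c) sum to 1.
        c = [1.0 / sum(r[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Row factors that make every row of diag(r) P diag(c) sum to 1.
        r = [1.0 / sum(P[i][j] * c[j] for j in range(n)) for i in range(n)]
        scaled = [[r[i] * P[i][j] * c[j] for j in range(n)] for i in range(n)]
        row_sums = [sum(row) for row in scaled]
        col_sums = [sum(scaled[i][j] for i in range(n)) for j in range(n)]
        # Stop once all row and column sums are within tol of 1.
        if all(abs(s - 1) < tol for s in row_sums + col_sums):
            break
    return scaled


M = sinkhorn_knopp([[1.0, 2.0], [3.0, 4.0]])
```

Each pass fixes the rows exactly and leaves a shrinking error in the columns, which is why the stopping test checks both sets of sums.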
+
+
+def _quadratic_assignment_2opt(A, B, maximize=False, rng=None,
+                               partial_match=None,
+                               partial_guess=None,
+                               **unknown_options):
+    r"""Solve the quadratic assignment problem (approximately).
+
+    This function solves the Quadratic Assignment Problem (QAP) and the
+    Graph Matching Problem (GMP) using the 2-opt algorithm [1]_.
+
+    Quadratic assignment solves problems of the following form:
+
+    .. math::
+
+        \min_P & \ {\ \text{trace}(A^T P B P^T)}\\
+        \mbox{s.t. } & {P \ \epsilon \ \mathcal{P}}\\
+
+    where :math:`\mathcal{P}` is the set of all permutation matrices,
+    and :math:`A` and :math:`B` are square matrices.
+
+    Graph matching tries to *maximize* the same objective function.
+    This algorithm can be thought of as finding the alignment of the
+    nodes of two graphs that minimizes the number of induced edge
+    disagreements, or, in the case of weighted graphs, the sum of squared
+    edge weight differences.
+
+    Note that the quadratic assignment problem is NP-hard. The results given
+    here are approximations and are not guaranteed to be optimal.
+
+    Parameters
+    ----------
+    A : 2-D array, square
+        The square matrix :math:`A` in the objective function above.
+    B : 2-D array, square
+        The square matrix :math:`B` in the objective function above.
+    method :  str in {'faq', '2opt'} (default: 'faq')
+        The algorithm used to solve the problem. This is the method-specific
+        documentation for '2opt'.
+        :ref:`'faq' <optimize.qap-faq>` is also available.
+
+    Options
+    -------
+    maximize : bool (default: False)
+        Maximizes the objective function if ``True``.
+    rng : {None, int, `numpy.random.Generator`,
+           `numpy.random.RandomState`}, optional
+
+        If `rng` is None (or `np.random`), the `numpy.random.RandomState`
+        singleton is used.
+        If `rng` is an int, a new ``RandomState`` instance is used,
+        seeded with `rng`.
+        If `rng` is already a ``Generator`` or ``RandomState`` instance then
+        that instance is used.
+    partial_match : 2-D array of integers, optional (default: None)
+        Fixes part of the matching. Also known as a "seed" [2]_.
+
+        Each row of `partial_match` specifies a pair of matched nodes: node
+        ``partial_match[i, 0]`` of `A` is matched to node
+        ``partial_match[i, 1]`` of `B`. The array has shape ``(m, 2)``,
+        where ``m`` is not greater than the number of nodes, :math:`n`.
+
+        .. note::
+             `partial_match` must be sorted by the first column.
+
+    partial_guess : 2-D array of integers, optional (default: None)
+        A guess for the matching between the two matrices. Unlike
+        `partial_match`, `partial_guess` does not fix the indices; they are
+        still free to be optimized.
+
+        Each row of `partial_guess` specifies a pair of matched nodes: node
+        ``partial_guess[i, 0]`` of `A` is matched to node
+        ``partial_guess[i, 1]`` of `B`. The array has shape ``(m, 2)``,
+        where ``m`` is not greater than the number of nodes, :math:`n`.
+
+        .. note::
+             `partial_guess` must be sorted by the first column.
+
+    Returns
+    -------
+    res : OptimizeResult
+        `OptimizeResult` containing the following fields.
+
+        col_ind : 1-D array
+            Column indices corresponding to the best permutation found of the
+            nodes of `B`.
+        fun : float
+            The objective value of the solution.
+        nit : int
+            The number of iterations performed during optimization.
+
+    Notes
+    -----
+    This is a greedy algorithm that works similarly to bubble sort: beginning
+    with an initial permutation, it iteratively swaps pairs of indices to
+    improve the objective function until no such improvements are possible.
+
+    References
+    ----------
+    .. [1] "2-opt," Wikipedia.
+           https://en.wikipedia.org/wiki/2-opt
+
+    .. [2] D. Fishkind, S. Adali, H. Patsolic, L. Meng, D. Singh, V. Lyzinski,
+           C. Priebe, "Seeded graph matching", Pattern Recognit. 87 (2019):
+           203-215, https://doi.org/10.1016/j.patcog.2018.09.014
+
+    """
+    _check_unknown_options(unknown_options)
+    rng = check_random_state(rng)
+    A, B, partial_match = _common_input_validation(A, B, partial_match)
+
+    N = len(A)
+    # check trivial cases
+    if N == 0 or partial_match.shape[0] == N:
+        score = _calc_score(A, B, partial_match[:, 1])
+        res = {"col_ind": partial_match[:, 1], "fun": score, "nit": 0}
+        return OptimizeResult(res)
+
+    if partial_guess is None:
+        partial_guess = np.array([[], []]).T
+    partial_guess = np.atleast_2d(partial_guess).astype(int)
+
+    msg = None
+    if partial_guess.shape[0] > A.shape[0]:
+        msg = ("`partial_guess` can have only as "
+               "many entries as there are nodes")
+    elif partial_guess.shape[1] != 2:
+        msg = "`partial_guess` must have two columns"
+    elif partial_guess.ndim != 2:
+        msg = "`partial_guess` must have exactly two dimensions"
+    elif (partial_guess < 0).any():
+        msg = "`partial_guess` must contain only positive indices"
+    elif (partial_guess >= len(A)).any():
+        msg = "`partial_guess` entries must be less than number of nodes"
+    elif (not len(set(partial_guess[:, 0])) == len(partial_guess[:, 0]) or
+          not len(set(partial_guess[:, 1])) == len(partial_guess[:, 1])):
+        msg = "`partial_guess` column entries must be unique"
+    if msg is not None:
+        raise ValueError(msg)
+
+    fixed_rows = None
+    if partial_match.size or partial_guess.size:
+        # use partial_match and partial_guess for initial permutation,
+        # but randomly permute the rest.
+        guess_rows = np.zeros(N, dtype=bool)
+        guess_cols = np.zeros(N, dtype=bool)
+        fixed_rows = np.zeros(N, dtype=bool)
+        fixed_cols = np.zeros(N, dtype=bool)
+        perm = np.zeros(N, dtype=int)
+
+        rg, cg = partial_guess.T
+        guess_rows[rg] = True
+        guess_cols[cg] = True
+        perm[guess_rows] = cg
+
+        # match overrides guess
+        rf, cf = partial_match.T
+        fixed_rows[rf] = True
+        fixed_cols[cf] = True
+        perm[fixed_rows] = cf
+
+        random_rows = ~fixed_rows & ~guess_rows
+        random_cols = ~fixed_cols & ~guess_cols
+        perm[random_rows] = rng.permutation(np.arange(N)[random_cols])
+    else:
+        perm = rng.permutation(np.arange(N))
+
+    best_score = _calc_score(A, B, perm)
+
+    i_free = np.arange(N)
+    if fixed_rows is not None:
+        i_free = i_free[~fixed_rows]
+
+    better = operator.gt if maximize else operator.lt
+    n_iter = 0
+    done = False
+    while not done:
+        # equivalent to nested for loops i in range(N), j in range(i, N)
+        for i, j in itertools.combinations_with_replacement(i_free, 2):
+            n_iter += 1
+            perm[i], perm[j] = perm[j], perm[i]
+            score = _calc_score(A, B, perm)
+            if better(score, best_score):
+                best_score = score
+                break
+            # faster to swap back than to create a new list every time
+            perm[i], perm[j] = perm[j], perm[i]
+        else:  # no swaps made
+            done = True
+
+    res = {"col_ind": perm, "fun": best_score, "nit": n_iter}
+    return OptimizeResult(res)
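The swap loop of the '2opt' method can be illustrated on a tiny problem in pure Python. This is a sketch under stated assumptions, not SciPy's code: `qap_score` and `two_opt` are hypothetical names, only minimization is handled, and the score is assumed to be trace(A^T P B P^T) evaluated as the sum of `A[i][j] * B[perm[i]][perm[j]]`, a formula that is not shown in this section.

```python
import itertools


def qap_score(A, B, perm):
    # Assumed objective: sum over i, j of A[i][j] * B[perm[i]][perm[j]].
    n = len(A)
    return sum(A[i][j] * B[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def two_opt(A, B, perm):
    """Greedy pairwise-swap descent (minimization) from a starting permutation."""
    perm = list(perm)
    best = qap_score(A, B, perm)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(perm)), 2):
            perm[i], perm[j] = perm[j], perm[i]
            score = qap_score(A, B, perm)
            if score < best:
                best = score
                improved = True
                break  # restart the scan from the improved permutation
            perm[i], perm[j] = perm[j], perm[i]  # undo a non-improving swap
    return perm, best


A = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
B = [[0, 1, 4], [1, 0, 6], [4, 6, 0]]
perm, score = two_opt(A, B, [0, 1, 2])
# Exhaustive search is feasible only for very small n; it gives a lower bound.
brute = min(qap_score(A, B, p) for p in itertools.permutations(range(3)))
```

The result is a local optimum with respect to single swaps, so `score` can be larger than `brute` on harder instances; that is the sense in which the method is approximate.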
@@ -0,0 +1,522 @@
+"""
+Routines for removing redundant (linearly dependent) equations from linear
+programming equality constraints.
+"""
+# Author: Matt Haberland
+
+import numpy as np
+from scipy.linalg import svd
+from scipy.linalg.interpolative import interp_decomp
+import scipy
+from scipy.linalg.blas import dtrsm
+
+
+def _row_count(A):
+    """
+    Counts the number of nonzeros in each row of input array A.
+    Nonzeros are defined as any element with absolute value greater than
+    tol = 1e-13. This value should probably be an input to the function.
+
+    Parameters
+    ----------
+    A : 2-D array
+        An array representing a matrix
+
+    Returns
+    -------
+    rowcount : 1-D array
+        Number of nonzeros in each row of A
+
+    """
+    tol = 1e-13
+    return np.array((abs(A) > tol).sum(axis=1)).flatten()
+
+
+def _get_densest(A, eligibleRows):
+    """
+    Returns the index of the densest row of A. Ignores rows that are not
+    eligible for consideration.
+
+    Parameters
+    ----------
+    A : 2-D array
+        An array representing a matrix
+    eligibleRows : 1-D logical array
+        Values indicate whether the corresponding row of A is eligible
+        to be considered
+
+    Returns
+    -------
+    i_densest : int
+        Index of the densest row in A eligible for consideration
+
+    """
+    rowCounts = _row_count(A)
+    return np.argmax(rowCounts * eligibleRows)
+
+
+def _remove_zero_rows(A, b):
+    """
+    Eliminates trivial equations from system of equations defined by Ax = b
+    and identifies trivial infeasibilities.
+
+    Parameters
+    ----------
+    A : 2-D array
+        An array representing the left-hand side of a system of equations
+    b : 1-D array
+        An array representing the right-hand side of a system of equations
+
+    Returns
+    -------
+    A : 2-D array
+        An array representing the left-hand side of a system of equations
+    b : 1-D array
+        An array representing the right-hand side of a system of equations
+    status: int
+        An integer indicating the status of the removal operation
+        0: No infeasibility identified
+        2: Trivially infeasible
+    message : str
+        A string descriptor of the exit status of the optimization.
+
+    """
+    status = 0
+    message = ""
+    i_zero = _row_count(A) == 0
+    A = A[np.logical_not(i_zero), :]
+    if not np.allclose(b[i_zero], 0):
+        status = 2
+        message = "There is a zero row in A_eq with a nonzero corresponding " \
+                  "entry in b_eq. The problem is infeasible."
+    b = b[np.logical_not(i_zero)]
+    return A, b, status, message
+
+
+def bg_update_dense(plu, perm_r, v, j):
+    LU, p = plu
+
+    vperm = v[perm_r]
+    u = dtrsm(1, LU, vperm, lower=1, diag=1)
+    LU[:j+1, j] = u[:j+1]
+    l = u[j+1:]
+    piv = LU[j, j]
+    LU[j+1:, j] += (l/piv)
+    return LU, p
+
+
+def _remove_redundancy_pivot_dense(A, rhs, true_rank=None):
+    """
+    Eliminates redundant equations from system of equations defined by Ax = b
+    and identifies infeasibilities.
+
+    Parameters
+    ----------
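+    A : 2-D array
+        An array representing the left-hand side of a system of equations
+    rhs : 1-D array
+        An array representing the right-hand side of a system of equations
+    true_rank : int, optional
+        The known rank of A. If provided, the search stops as soon as
+        ``m - true_rank`` dependent rows have been found, where ``m`` is
+        the number of nonzero rows of A.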
+
+    Returns
+    -------
+    A : 2-D array
+        An array representing the left-hand side of a system of equations
+    rhs : 1-D array
+        An array representing the right-hand side of a system of equations
+    status: int
+        An integer indicating the status of the system
+        0: No infeasibility identified
+        2: Trivially infeasible
+    message : str
+        A string descriptor of the exit status of the optimization.
+
+    References
+    ----------
+    .. [2] Andersen, Erling D. "Finding all linearly dependent rows in
+           large-scale linear programming." Optimization Methods and Software
+           6.3 (1995): 219-227.
+
+    """
+    tolapiv = 1e-8
+    tolprimal = 1e-8
+    status = 0
+    message = ""
+    inconsistent = ("There is a linear combination of rows of A_eq that "
+                    "results in zero, suggesting a redundant constraint. "
+                    "However the same linear combination of b_eq is "
+                    "nonzero, suggesting that the constraints conflict "
+                    "and the problem is infeasible.")
+    A, rhs, status, message = _remove_zero_rows(A, rhs)
+
+    if status != 0:
+        return A, rhs, status, message
+
+    m, n = A.shape
+
+    v = list(range(m))      # Artificial column indices.
+    b = list(v)             # Basis column indices.
+    # This is better as a list than a set because column order of basis matrix
+    # needs to be consistent.
+    d = []                  # Indices of dependent rows
+    perm_r = None
+
+    A_orig = A
+    A = np.zeros((m, m + n), order='F')
+    np.fill_diagonal(A, 1)
+    A[:, m:] = A_orig
+    e = np.zeros(m)
+
+    js_candidates = np.arange(m, m+n, dtype=int)  # candidate columns for basis
+    # manual masking was faster than masked array
+    js_mask = np.ones(js_candidates.shape, dtype=bool)
+
+    # Implements basic algorithm from [2]
+    # Uses some of the suggested improvements (removing zero rows and
+    # Bartels-Golub update idea).
+    # Removing column singletons would be easy, but it is not as important
+    # because the procedure is performed only on the equality constraint
+    # matrix from the original problem - not on the canonical form matrix,
+    # which would have many more column singletons due to slack variables
+    # from the inequality constraints.
+    # The thoughts on "crashing" the initial basis are only really useful if
+    # the matrix is sparse.
+
+    lu = np.eye(m, order='F'), np.arange(m)  # initial LU is trivial
+    perm_r = lu[1]
+    for i in v:
+
+        e[i] = 1
+        if i > 0:
+            e[i-1] = 0
+
+        try:  # fails for i==0 and any time it gets ill-conditioned
+            j = b[i-1]
+            lu = bg_update_dense(lu, perm_r, A[:, j], i-1)
+        except Exception:
+            lu = scipy.linalg.lu_factor(A[:, b])
+            LU, p = lu
+            perm_r = list(range(m))
+            for i1, i2 in enumerate(p):
+                perm_r[i1], perm_r[i2] = perm_r[i2], perm_r[i1]
+
+        pi = scipy.linalg.lu_solve(lu, e, trans=1)
+
+        js = js_candidates[js_mask]
+        batch = 50
+
+        # This is a tiny bit faster than looping over columns individually,
+        # like for j in js: if abs(A[:,j].transpose().dot(pi)) > tolapiv:
+        for j_index in range(0, len(js), batch):
+            j_indices = js[j_index: min(j_index+batch, len(js))]
+
+            c = abs(A[:, j_indices].transpose().dot(pi))
+            if (c > tolapiv).any():
+                j = js[j_index + np.argmax(c)]  # very independent column
+                b[i] = j
+                js_mask[j-m] = False
+                break
+        else:
+            bibar = pi.T.dot(rhs.reshape(-1, 1))
+            bnorm = np.linalg.norm(rhs)
+            if abs(bibar)/(1+bnorm) > tolprimal:  # inconsistent
+                status = 2
+                message = inconsistent
+                return A_orig, rhs, status, message
+            else:  # dependent
+                d.append(i)
+                if true_rank is not None and len(d) == m - true_rank:
+                    break   # found all redundancies
+
+    keep = set(range(m))
+    keep = list(keep - set(d))
+    return A_orig[keep, :], rhs[keep], status, message
+
+
+def _remove_redundancy_pivot_sparse(A, rhs):
+    """
+    Eliminates redundant equations from system of equations defined by Ax = b
+    and identifies infeasibilities.
+
+    Parameters
+    ----------
+    A : 2-D sparse matrix
+        A matrix representing the left-hand side of a system of equations
+    rhs : 1-D array
+        An array representing the right-hand side of a system of equations
+
+    Returns
+    -------
+    A : 2-D sparse matrix
+        A matrix representing the left-hand side of a system of equations
+    rhs : 1-D array
+        An array representing the right-hand side of a system of equations
+    status: int
+        An integer indicating the status of the system
+        0: No infeasibility identified
+        2: Trivially infeasible
+    message : str
+        A string descriptor of the exit status of the optimization.
+
+    References
+    ----------
+    .. [2] Andersen, Erling D. "Finding all linearly dependent rows in
+           large-scale linear programming." Optimization Methods and Software
+           6.3 (1995): 219-227.
+
+    """
+
+    tolapiv = 1e-8
+    tolprimal = 1e-8
+    status = 0
+    message = ""
+    inconsistent = ("There is a linear combination of rows of A_eq that "
+                    "results in zero, suggesting a redundant constraint. "
+                    "However the same linear combination of b_eq is "
+                    "nonzero, suggesting that the constraints conflict "
+                    "and the problem is infeasible.")
+    A, rhs, status, message = _remove_zero_rows(A, rhs)
+
+    if status != 0:
+        return A, rhs, status, message
+
+    m, n = A.shape
+
+    v = list(range(m))      # Artificial column indices.
+    b = list(v)             # Basis column indices.
+    # This is better as a list than a set because column order of basis matrix
+    # needs to be consistent.
+    k = set(range(m, m+n))  # Structural column indices.
+    d = []                  # Indices of dependent rows
+
+    A_orig = A
+    A = scipy.sparse.hstack((scipy.sparse.eye(m), A)).tocsc()
+    e = np.zeros(m)
+
+    # Implements basic algorithm from [2]
+    # Uses only one of the suggested improvements (removing zero rows).
+    # Removing column singletons would be easy, but it is not as important
+    # because the procedure is performed only on the equality constraint
+    # matrix from the original problem - not on the canonical form matrix,
+    # which would have many more column singletons due to slack variables
+    # from the inequality constraints.
+    # The thoughts on "crashing" the initial basis sound useful, but the
+    # description of the procedure seems to assume a lot of familiarity with
+    # the subject; it is not very explicit. I already went through enough
+    # trouble getting the basic algorithm working, so I was not interested in
+    # trying to decipher this, too. (Overall, the paper is fraught with
+    # mistakes and ambiguities - which is strange, because the rest of
+    # Andersen's papers are quite good.)
+    # I tried and tried and tried to improve performance using the
+    # Bartels-Golub update. It works, but it's only practical if the LU
+    # factorization can be specialized as described, and that is not possible
+    # until the SciPy SuperLU interface permits control over column
+    # permutation - see issue #7700.
+
+    for i in v:
+        B = A[:, b]
+
+        e[i] = 1
+        if i > 0:
+            e[i-1] = 0
+
+        pi = scipy.sparse.linalg.spsolve(B.transpose(), e).reshape(-1, 1)
+
+        js = list(k-set(b))  # not efficient, but this is not the time sink...
+
+        # Due to overhead, it tends to be faster (for problems tested) to
+        # compute the full matrix-vector product rather than individual
+        # vector-vector products (with the chance of terminating as soon
+        # as any are nonzero). For very large matrices, it might be worth
+        # it to compute, say, 100 or 1000 at a time and stop when a nonzero
+        # is found.
+
+        c = (np.abs(A[:, js].transpose().dot(pi)) > tolapiv).nonzero()[0]
+        if len(c) > 0:  # independent
+            j = js[c[0]]
+            # in a previous commit, the previous line was changed to choose
+            # index j corresponding with the maximum dot product.
+            # While this avoided issues with almost
+            # singular matrices, it slowed the routine in most NETLIB tests.
+            # I think this is because these columns were denser than the
+            # first column with nonzero dot product (c[0]).
+            # It would be nice to have a heuristic that balances sparsity with
+            # high dot product, but I don't think it's worth the time to
+            # develop one right now. Bartels-Golub update is a much higher
+            # priority.
+            b[i] = j  # replace artificial column
+        else:
+            bibar = pi.T.dot(rhs.reshape(-1, 1))
+            bnorm = np.linalg.norm(rhs)
+            if abs(bibar)/(1 + bnorm) > tolprimal:
+                status = 2
+                message = inconsistent
+                return A_orig, rhs, status, message
+            else:  # dependent
+                d.append(i)
+
+    keep = set(range(m))
+    keep = list(keep - set(d))
+    return A_orig[keep, :], rhs[keep], status, message
+
+
+def _remove_redundancy_svd(A, b):
+    """
+    Eliminates redundant equations from system of equations defined by Ax = b
+    and identifies infeasibilities.
+
+    Parameters
+    ----------
+    A : 2-D array
+        An array representing the left-hand side of a system of equations
+    b : 1-D array
+        An array representing the right-hand side of a system of equations
+
+    Returns
+    -------
+    A : 2-D array
+        An array representing the left-hand side of a system of equations
+    b : 1-D array
+        An array representing the right-hand side of a system of equations
+    status: int
+        An integer indicating the status of the system
+        0: No infeasibility identified
+        2: Trivially infeasible
+    message : str
+        A string descriptor of the exit status of the optimization.
+
+    References
+    ----------
+    .. [2] Andersen, Erling D. "Finding all linearly dependent rows in
+           large-scale linear programming." Optimization Methods and Software
+           6.3 (1995): 219-227.
+
+    """
+
+    A, b, status, message = _remove_zero_rows(A, b)
+
+    if status != 0:
+        return A, b, status, message
+
+    U, s, Vh = svd(A)
+    eps = np.finfo(float).eps
+    tol = s.max() * max(A.shape) * eps
+
+    m, n = A.shape
+    s_min = s[-1] if m <= n else 0
+
+    # this algorithm is faster than that of [2] when the nullspace is small
+    # but it could probably be improved by randomized algorithms and with
+    # a sparse implementation.
+    # it relies on repeated singular value decomposition to find linearly
+    # dependent rows (as identified by columns of U that correspond with zero
+    # singular values). Unfortunately, only one row can be removed per
+    # decomposition (I tried otherwise; doing so can cause problems.)
+    # It would be nice if we could do truncated SVD like sp.sparse.linalg.svds
+    # but that function is unreliable at finding singular values near zero.
+    # Finding max eigenvalue L of A A^T, then largest eigenvalue (and
+    # associated eigenvector) of -A A^T + L I (I is identity) via power
+    # iteration would also work in theory, but is only efficient if the
+    # smallest nonzero eigenvalue of A A^T is close to the largest nonzero
+    # eigenvalue.
+
+    while abs(s_min) < tol:
+        v = U[:, -1]  # TODO: return these so user can eliminate from problem?
+        # rows need to be represented in significant amount
+        eligibleRows = np.abs(v) > tol * 10e6
+        if not np.any(eligibleRows) or np.any(np.abs(v.dot(A)) > tol):
+            status = 4
+            message = ("Due to numerical issues, redundant equality "
+                       "constraints could not be removed automatically. "
+                       "Try providing your constraint matrices as sparse "
+                       "matrices to activate sparse presolve, try turning "
+                       "off redundancy removal, or try turning off presolve "
+                       "altogether.")
+            break
+        if np.any(np.abs(v.dot(b)) > tol * 100):  # factor of 100 to fix 10038 and 10349
+            status = 2
+            message = ("There is a linear combination of rows of A_eq that "
+                       "results in zero, suggesting a redundant constraint. "
+                       "However the same linear combination of b_eq is "
+                       "nonzero, suggesting that the constraints conflict "
+                       "and the problem is infeasible.")
+            break
+
+        i_remove = _get_densest(A, eligibleRows)
+        A = np.delete(A, i_remove, axis=0)
+        b = np.delete(b, i_remove)
+        U, s, Vh = svd(A)
+        m, n = A.shape
+        s_min = s[-1] if m <= n else 0
+
+    return A, b, status, message
+
+
+def _remove_redundancy_id(A, rhs, rank=None, randomized=True):
+    """Eliminates redundant equations from a system of equations.
+
+    Eliminates redundant equations from system of equations defined by Ax = b
+    and identifies infeasibilities.
+
+    Parameters
+    ----------
+    A : 2-D array
+        An array representing the left-hand side of a system of equations
+    rhs : 1-D array
+        An array representing the right-hand side of a system of equations
+    rank : int, optional
+        The rank of A
+    randomized: bool, optional
+        True for randomized interpolative decomposition
+
+    Returns
+    -------
+    A : 2-D array
+        An array representing the left-hand side of a system of equations
+    rhs : 1-D array
+        An array representing the right-hand side of a system of equations
+    status: int
+        An integer indicating the status of the system
+        0: No infeasibility identified
+        2: Trivially infeasible
+    message : str
+        A string descriptor of the exit status of the optimization.
+
+    """
+
+    status = 0
+    message = ""
+    inconsistent = ("There is a linear combination of rows of A_eq that "
+                    "results in zero, suggesting a redundant constraint. "
+                    "However the same linear combination of b_eq is "
+                    "nonzero, suggesting that the constraints conflict "
+                    "and the problem is infeasible.")
+
+    A, rhs, status, message = _remove_zero_rows(A, rhs)
+
+    if status != 0:
+        return A, rhs, status, message
+
+    m, n = A.shape
+
+    k = rank
+    if rank is None:
+        k = np.linalg.matrix_rank(A)
+
+    idx, proj = interp_decomp(A.T, k, rand=randomized)
+
+    # first k entries in idx are indices of the independent rows
+    # remaining entries are the indices of the m-k dependent rows
+    # proj provides the linear combinations of the independent rows of A
+    # that form the remaining m-k (dependent) rows. The same linear
+    # combinations of the corresponding entries in rhs must give the
+    # remaining m-k entries. If not, the system is inconsistent, and the
+    # problem is infeasible.
+    if not np.allclose(rhs[idx[:k]] @ proj, rhs[idx[k:]]):
+        status = 2
+        message = inconsistent
+
+    # sort indices because the other redundancy removal routines leave rows
+    # in original order and tests were written with that in mind
+    idx = sorted(idx[:k])
+    A2 = A[idx, :]
+    rhs2 = rhs[idx]
+    return A2, rhs2, status, message
@@ -0,0 +1,711 @@
+"""
+Unified interfaces to root finding algorithms.
+
+Functions
+---------
+- root : find a root of a vector function.
+"""
+__all__ = ['root']
+
+import numpy as np
+
+from warnings import warn
+
+from ._optimize import MemoizeJac, OptimizeResult, _check_unknown_options
+from ._minpack_py import _root_hybr, leastsq
+from ._spectral import _root_df_sane
+from . import _nonlin as nonlin
+
+
+ROOT_METHODS = ['hybr', 'lm', 'broyden1', 'broyden2', 'anderson',
+                'linearmixing', 'diagbroyden', 'excitingmixing', 'krylov',
+                'df-sane']
+
+
+def root(fun, x0, args=(), method='hybr', jac=None, tol=None, callback=None,
+         options=None):
+    r"""
+    Find a root of a vector function.
+
+    Parameters
+    ----------
+    fun : callable
+        A vector function to find a root of.
+    x0 : ndarray
+        Initial guess.
+    args : tuple, optional
+        Extra arguments passed to the objective function and its Jacobian.
+    method : str, optional
+        Type of solver. Should be one of
+
+            - 'hybr'             :ref:`(see here) <optimize.root-hybr>`
+            - 'lm'               :ref:`(see here) <optimize.root-lm>`
+            - 'broyden1'         :ref:`(see here) <optimize.root-broyden1>`
+            - 'broyden2'         :ref:`(see here) <optimize.root-broyden2>`
+            - 'anderson'         :ref:`(see here) <optimize.root-anderson>`
+            - 'linearmixing'     :ref:`(see here) <optimize.root-linearmixing>`
+            - 'diagbroyden'      :ref:`(see here) <optimize.root-diagbroyden>`
+            - 'excitingmixing'   :ref:`(see here) <optimize.root-excitingmixing>`
+            - 'krylov'           :ref:`(see here) <optimize.root-krylov>`
+            - 'df-sane'          :ref:`(see here) <optimize.root-dfsane>`
+
+    jac : bool or callable, optional
+        If `jac` is a Boolean and is True, `fun` is assumed to return the
+        value of the Jacobian along with the objective function. If False,
+        the Jacobian will be estimated numerically.
+        `jac` can also be a callable returning the Jacobian of `fun`. In
+        this case, it must accept the same arguments as `fun`.
+    tol : float, optional
+        Tolerance for termination. For detailed control, use solver-specific
+        options.
+    callback : function, optional
+        Optional callback function. It is called on every iteration as
+        ``callback(x, f)`` where `x` is the current solution and `f`
+        the corresponding residual. Not supported by methods 'hybr' and
+        'lm'.
+    options : dict, optional
+        A dictionary of solver options. E.g., `xtol` or `maxiter`, see
+        :obj:`show_options()` for details.
+
+    Returns
+    -------
+    sol : OptimizeResult
+        The solution represented as an ``OptimizeResult`` object.
+        Important attributes are: ``x`` the solution array, ``success`` a
+        Boolean flag indicating if the algorithm exited successfully and
+        ``message`` which describes the cause of the termination. See
+        `OptimizeResult` for a description of other attributes.
+
+    See also
+    --------
+    show_options : Additional options accepted by the solvers
+
+    Notes
+    -----
+    This section describes the available solvers that can be selected by the
+    'method' parameter. The default method is *hybr*.
+
+    Method *hybr* uses a modification of the Powell hybrid method as
+    implemented in MINPACK [1]_.
+
+    Method *lm* solves the system of nonlinear equations in a least squares
+    sense using a modification of the Levenberg-Marquardt algorithm as
+    implemented in MINPACK [1]_.
+
+    Method *df-sane* is a derivative-free spectral method. [3]_
+
+    Methods *broyden1*, *broyden2*, *anderson*, *linearmixing*,
+    *diagbroyden*, *excitingmixing*, *krylov* are inexact Newton methods,
+    with backtracking or full line searches [2]_. Each method corresponds
+    to a particular Jacobian approximation.
+
+    - Method *broyden1* uses Broyden's first Jacobian approximation; it is
+      known as Broyden's good method.
+    - Method *broyden2* uses Broyden's second Jacobian approximation; it
+      is known as Broyden's bad method.
+    - Method *anderson* uses (extended) Anderson mixing.
+    - Method *krylov* uses a Krylov approximation for the inverse Jacobian.
+      It is suitable for large-scale problems.
+    - Method *diagbroyden* uses diagonal Broyden Jacobian approximation.
+    - Method *linearmixing* uses a scalar Jacobian approximation.
+    - Method *excitingmixing* uses a tuned diagonal Jacobian
+      approximation.
+
+    .. warning::
+
+        The algorithms implemented for methods *diagbroyden*,
+        *linearmixing* and *excitingmixing* may be useful for specific
+        problems, but whether they will work may depend strongly on the
+        problem.
+
+    .. versionadded:: 0.11.0
+
+    References
+    ----------
+    .. [1] More, Jorge J., Burton S. Garbow, and Kenneth E. Hillstrom.
+       1980. User Guide for MINPACK-1.
+    .. [2] C. T. Kelley. 1995. Iterative Methods for Linear and Nonlinear
+       Equations. Society for Industrial and Applied Mathematics.
+       <https://archive.siam.org/books/kelley/fr16/>
+    .. [3] W. La Cruz, J.M. Martinez, M. Raydan. Math. Comp. 75, 1429 (2006).
+
+    Examples
+    --------
+    The following functions define a system of nonlinear equations and its
+    Jacobian.
+
+    >>> import numpy as np
+    >>> def fun(x):
+    ...     return [x[0]  + 0.5 * (x[0] - x[1])**3 - 1.0,
+    ...             0.5 * (x[1] - x[0])**3 + x[1]]
+
+    >>> def jac(x):
+    ...     return np.array([[1 + 1.5 * (x[0] - x[1])**2,
+    ...                       -1.5 * (x[0] - x[1])**2],
+    ...                      [-1.5 * (x[1] - x[0])**2,
+    ...                       1 + 1.5 * (x[1] - x[0])**2]])
+
+    A solution can be obtained as follows.
+
+    >>> from scipy import optimize
+    >>> sol = optimize.root(fun, [0, 0], jac=jac, method='hybr')
+    >>> sol.x
+    array([ 0.8411639,  0.1588361])
+
+    **Large problem**
+
+    Suppose that we needed to solve the following integrodifferential
+    equation on the square :math:`[0,1]\times[0,1]`:
+
+    .. math::
+
+       \nabla^2 P = 10 \left(\int_0^1\int_0^1\cosh(P)\,dx\,dy\right)^2
+
+    with :math:`P(x,1) = 1` and :math:`P=0` elsewhere on the boundary of
+    the square.
+
+    The solution can be found using the ``method='krylov'`` solver:
+
+    >>> from scipy import optimize
+    >>> # parameters
+    >>> nx, ny = 75, 75
+    >>> hx, hy = 1./(nx-1), 1./(ny-1)
+
+    >>> P_left, P_right = 0, 0
+    >>> P_top, P_bottom = 1, 0
+
+    >>> def residual(P):
+    ...    d2x = np.zeros_like(P)
+    ...    d2y = np.zeros_like(P)
+    ...
+    ...    d2x[1:-1] = (P[2:]   - 2*P[1:-1] + P[:-2]) / hx/hx
+    ...    d2x[0]    = (P[1]    - 2*P[0]    + P_left)/hx/hx
+    ...    d2x[-1]   = (P_right - 2*P[-1]   + P[-2])/hx/hx
+    ...
+    ...    d2y[:,1:-1] = (P[:,2:] - 2*P[:,1:-1] + P[:,:-2])/hy/hy
+    ...    d2y[:,0]    = (P[:,1]  - 2*P[:,0]    + P_bottom)/hy/hy
+    ...    d2y[:,-1]   = (P_top   - 2*P[:,-1]   + P[:,-2])/hy/hy
+    ...
+    ...    return d2x + d2y - 10*np.cosh(P).mean()**2
+
+    >>> guess = np.zeros((nx, ny), float)
+    >>> sol = optimize.root(residual, guess, method='krylov')
+    >>> print('Residual: %g' % abs(residual(sol.x)).max())
+    Residual: 5.7972e-06  # may vary
+
+    >>> import matplotlib.pyplot as plt
+    >>> x, y = np.mgrid[0:1:(nx*1j), 0:1:(ny*1j)]
+    >>> plt.pcolormesh(x, y, sol.x, shading='gouraud')
+    >>> plt.colorbar()
+    >>> plt.show()
+
+    """
+    if not isinstance(args, tuple):
+        args = (args,)
+
+    meth = method.lower()
+    if options is None:
+        options = {}
+
+    if callback is not None and meth in ('hybr', 'lm'):
+        warn('Method %s does not accept callback.' % method,
+             RuntimeWarning, stacklevel=2)
+
+    # fun also returns the Jacobian
+    if not callable(jac) and meth in ('hybr', 'lm'):
+        if bool(jac):
+            fun = MemoizeJac(fun)
+            jac = fun.derivative
+        else:
+            jac = None
+
+    # set default tolerances
+    if tol is not None:
+        options = dict(options)
+        if meth in ('hybr', 'lm'):
+            options.setdefault('xtol', tol)
+        elif meth in ('df-sane',):
+            options.setdefault('ftol', tol)
+        elif meth in ('broyden1', 'broyden2', 'anderson', 'linearmixing',
+                      'diagbroyden', 'excitingmixing', 'krylov'):
+            options.setdefault('xtol', tol)
+            options.setdefault('xatol', np.inf)
+            options.setdefault('ftol', np.inf)
+            options.setdefault('fatol', np.inf)
+
+    if meth == 'hybr':
+        sol = _root_hybr(fun, x0, args=args, jac=jac, **options)
+    elif meth == 'lm':
+        sol = _root_leastsq(fun, x0, args=args, jac=jac, **options)
+    elif meth == 'df-sane':
+        _warn_jac_unused(jac, method)
+        sol = _root_df_sane(fun, x0, args=args, callback=callback,
+                            **options)
+    elif meth in ('broyden1', 'broyden2', 'anderson', 'linearmixing',
+                  'diagbroyden', 'excitingmixing', 'krylov'):
+        _warn_jac_unused(jac, method)
+        sol = _root_nonlin_solve(fun, x0, args=args, jac=jac,
+                                 _method=meth, _callback=callback,
+                                 **options)
+    else:
+        raise ValueError('Unknown solver %s' % method)
+
+    return sol
+
+
+def _warn_jac_unused(jac, method):
+    if jac is not None:
+        warn(f'Method {method} does not use the jacobian (jac).',
+             RuntimeWarning, stacklevel=2)
+
+
+def _root_leastsq(fun, x0, args=(), jac=None,
+                  col_deriv=0, xtol=1.49012e-08, ftol=1.49012e-08,
+                  gtol=0.0, maxiter=0, eps=0.0, factor=100, diag=None,
+                  **unknown_options):
+    """
+    Solve for least squares with Levenberg-Marquardt
+
+    Options
+    -------
+    col_deriv : bool
+        Non-zero to specify that the Jacobian function computes derivatives
+        down the columns (faster, because there is no transpose operation).
+    ftol : float
+        Relative error desired in the sum of squares.
+    xtol : float
+        Relative error desired in the approximate solution.
+    gtol : float
+        Orthogonality desired between the function vector and the columns
+        of the Jacobian.
+    maxiter : int
+        The maximum number of calls to the function. If zero, then
+        100*(N+1) is the maximum where N is the number of elements in x0.
+    eps : float
+        A suitable step length for the forward-difference approximation of
+        the Jacobian (used when ``jac=None``). If `eps` is less than the
+        machine precision, it is assumed that the relative errors in the
+        functions are of the order of the machine precision.
+    factor : float
+        A parameter determining the initial step bound
+        (``factor * || diag * x||``). Should be in interval ``(0.1, 100)``.
+    diag : sequence
+        N positive entries that serve as scale factors for the variables.
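+
+    Examples
+    --------
+    A minimal sketch of selecting this solver through `root`; the system
+    below and the ``xtol`` value are illustrative assumptions, not
+    recommended settings.
+
+    >>> from scipy import optimize
+    >>> def fun(x):
+    ...     return [x[0] + 0.5 * (x[0] - x[1])**3 - 1.0,
+    ...             0.5 * (x[1] - x[0])**3 + x[1]]
+    >>> sol = optimize.root(fun, [0.0, 0.0], method='lm',
+    ...                     options={'xtol': 1e-10})
+
+    The approximate root is then available as ``sol.x``, and the MINPACK
+    exit code as ``sol.status``.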
+    """
+
+    _check_unknown_options(unknown_options)
+    x, cov_x, info, msg, ier = leastsq(fun, x0, args=args, Dfun=jac,
+                                       full_output=True,
+                                       col_deriv=col_deriv, xtol=xtol,
+                                       ftol=ftol, gtol=gtol,
+                                       maxfev=maxiter, epsfcn=eps,
+                                       factor=factor, diag=diag)
+    sol = OptimizeResult(x=x, message=msg, status=ier,
+                         success=ier in (1, 2, 3, 4), cov_x=cov_x,
+                         fun=info.pop('fvec'), method="lm")
+    sol.update(info)
+    return sol
+
+
+def _root_nonlin_solve(fun, x0, args=(), jac=None,
+                       _callback=None, _method=None,
+                       nit=None, disp=False, maxiter=None,
+                       ftol=None, fatol=None, xtol=None, xatol=None,
+                       tol_norm=None, line_search='armijo', jac_options=None,
+                       **unknown_options):
+    _check_unknown_options(unknown_options)
+
+    f_tol = fatol
+    f_rtol = ftol
+    x_tol = xatol
+    x_rtol = xtol
+    verbose = disp
+    if jac_options is None:
+        jac_options = dict()
+
+    jacobian = {'broyden1': nonlin.BroydenFirst,
+                'broyden2': nonlin.BroydenSecond,
+                'anderson': nonlin.Anderson,
+                'linearmixing': nonlin.LinearMixing,
+                'diagbroyden': nonlin.DiagBroyden,
+                'excitingmixing': nonlin.ExcitingMixing,
+                'krylov': nonlin.KrylovJacobian
+                }[_method]
+
+    if args:
+        if jac is True:
+            def f(x):
+                return fun(x, *args)[0]
+        else:
+            def f(x):
+                return fun(x, *args)
+    else:
+        f = fun
+
+    x, info = nonlin.nonlin_solve(f, x0, jacobian=jacobian(**jac_options),
+                                  iter=nit, verbose=verbose,
+                                  maxiter=maxiter, f_tol=f_tol,
+                                  f_rtol=f_rtol, x_tol=x_tol,
+                                  x_rtol=x_rtol, tol_norm=tol_norm,
+                                  line_search=line_search,
+                                  callback=_callback, full_output=True,
+                                  raise_exception=False)
+    sol = OptimizeResult(x=x, method=_method)
+    sol.update(info)
+    return sol
+
+def _root_broyden1_doc():
+    """
+    Options
+    -------
+    nit : int, optional
+        Number of iterations to make. If omitted (default), make as many
+        as required to meet tolerances.
+    disp : bool, optional
+        Print status to stdout on every iteration.
+    maxiter : int, optional
+        Maximum number of iterations to make.
+    ftol : float, optional
+        Relative tolerance for the residual. If omitted, not used.
+    fatol : float, optional
+        Absolute tolerance (in max-norm) for the residual.
+        If omitted, default is 6e-6.
+    xtol : float, optional
+        Relative minimum step size. If omitted, not used.
+    xatol : float, optional
+        Absolute minimum step size, as determined from the Jacobian
+        approximation. If the step size is smaller than this, optimization
+        is terminated as successful. If omitted, not used.
+    tol_norm : function(vector) -> scalar, optional
+        Norm to use in convergence check. Default is the maximum norm.
+    line_search : {None, 'armijo' (default), 'wolfe'}, optional
+        Which type of a line search to use to determine the step size in
+        the direction given by the Jacobian approximation. Defaults to
+        'armijo'.
+    jac_options : dict, optional
+        Options for the respective Jacobian approximation.
+
+        alpha : float, optional
+            Initial guess for the Jacobian is (-1/alpha).
+        reduction_method : str or tuple, optional
+            Method used in ensuring that the rank of the Broyden
+            matrix stays low. Can either be a string giving the
+            name of the method, or a tuple of the form ``(method,
+            param1, param2, ...)`` that gives the name of the
+            method and values for additional parameters.
+
+            Methods available:
+
+                - ``restart``
+                    Drop all matrix columns. Has no
+                    extra parameters.
+                - ``simple``
+                    Drop oldest matrix column. Has no
+                    extra parameters.
+                - ``svd``
+                    Keep only the most significant SVD
+                    components.
+
+                    Extra parameters:
+
+                        - ``to_retain``
+                            Number of SVD components to
+                            retain when rank reduction is done.
+                            Default is ``max_rank - 2``.
+        max_rank : int, optional
+            Maximum rank for the Broyden matrix.
+            Default is infinity (i.e., no rank reduction).
+
+    Examples
+    --------
+    >>> def func(x):
+    ...     return np.cos(x) + x[::-1] - [1, 2, 3, 4]
+    ...
+    >>> from scipy import optimize
+    >>> res = optimize.root(func, [1, 1, 1, 1], method='broyden1', tol=1e-14)
+    >>> x = res.x
+    >>> x
+    array([4.04674914, 3.91158389, 2.71791677, 1.61756251])
+    >>> np.cos(x) + x[::-1]
+    array([1., 2., 3., 4.])
+
+    """
+    pass
+
+def _root_broyden2_doc():
+    """
+    Options
+    -------
+    nit : int, optional
+        Number of iterations to make. If omitted (default), make as many
+        as required to meet tolerances.
+    disp : bool, optional
+        Print status to stdout on every iteration.
+    maxiter : int, optional
+        Maximum number of iterations to make.
+    ftol : float, optional
+        Relative tolerance for the residual. If omitted, not used.
+    fatol : float, optional
+        Absolute tolerance (in max-norm) for the residual.
+        If omitted, default is 6e-6.
+    xtol : float, optional
+        Relative minimum step size. If omitted, not used.
+    xatol : float, optional
+        Absolute minimum step size, as determined from the Jacobian
+        approximation. If the step size is smaller than this, optimization
+        is terminated as successful. If omitted, not used.
+    tol_norm : function(vector) -> scalar, optional
+        Norm to use in convergence check. Default is the maximum norm.
+    line_search : {None, 'armijo' (default), 'wolfe'}, optional
+        Which type of a line search to use to determine the step size in
+        the direction given by the Jacobian approximation. Defaults to
+        'armijo'.
+    jac_options : dict, optional
+        Options for the respective Jacobian approximation.
+
+        alpha : float, optional
+            Initial guess for the Jacobian is (-1/alpha).
+        reduction_method : str or tuple, optional
+            Method used in ensuring that the rank of the Broyden
+            matrix stays low. Can either be a string giving the
+            name of the method, or a tuple of the form ``(method,
+            param1, param2, ...)`` that gives the name of the
+            method and values for additional parameters.
+
+            Methods available:
+
+                - ``restart``
+                    Drop all matrix columns. Has no
+                    extra parameters.
+                - ``simple``
+                    Drop oldest matrix column. Has no
+                    extra parameters.
+                - ``svd``
+                    Keep only the most significant SVD
+                    components.
+
+                    Extra parameters:
+
+                        - ``to_retain``
+                            Number of SVD components to
+                            retain when rank reduction is done.
+                            Default is ``max_rank - 2``.
+        max_rank : int, optional
+            Maximum rank for the Broyden matrix.
+            Default is infinity (i.e., no rank reduction).
+    """
+    pass
+
+def _root_anderson_doc():
+    """
+    Options
+    -------
+    nit : int, optional
+        Number of iterations to make. If omitted (default), make as many
+        as required to meet tolerances.
+    disp : bool, optional
+        Print status to stdout on every iteration.
+    maxiter : int, optional
+        Maximum number of iterations to make.
+    ftol : float, optional
+        Relative tolerance for the residual. If omitted, not used.
+    fatol : float, optional
+        Absolute tolerance (in max-norm) for the residual.
+        If omitted, default is 6e-6.
+    xtol : float, optional
+        Relative minimum step size. If omitted, not used.
+    xatol : float, optional
+        Absolute minimum step size, as determined from the Jacobian
+        approximation. If the step size is smaller than this, optimization
+        is terminated as successful. If omitted, not used.
+    tol_norm : function(vector) -> scalar, optional
+        Norm to use in convergence check. Default is the maximum norm.
+    line_search : {None, 'armijo' (default), 'wolfe'}, optional
+        Which type of a line search to use to determine the step size in
+        the direction given by the Jacobian approximation. Defaults to
+        'armijo'.
+    jac_options : dict, optional
+        Options for the respective Jacobian approximation.
+
+        alpha : float, optional
+            Initial guess for the Jacobian is (-1/alpha).
+        M : float, optional
+            Number of previous vectors to retain. Defaults to 5.
+        w0 : float, optional
+            Regularization parameter for numerical stability.
+            Compared to unity, good values are of the order of 0.01.
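+
+    Examples
+    --------
+    A minimal sketch of passing the options above via ``jac_options``; the
+    test function and the chosen values are illustrative assumptions.
+
+    >>> import numpy as np
+    >>> from scipy import optimize
+    >>> def func(x):
+    ...     return np.cos(x) + x[::-1] - [1, 2, 3, 4]
+    >>> res = optimize.root(func, [1, 1, 1, 1], method='anderson',
+    ...                     options={'jac_options': {'M': 5, 'w0': 0.01}})
+
+    ``res.success`` reports whether the tolerances were met.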
+    """
+    pass
+
+def _root_linearmixing_doc():
+    """
+    Options
+    -------
+    nit : int, optional
+        Number of iterations to make. If omitted (default), make as many
+        as required to meet tolerances.
+    disp : bool, optional
+        Print status to stdout on every iteration.
+    maxiter : int, optional
+        Maximum number of iterations to make.
+    ftol : float, optional
+        Relative tolerance for the residual. If omitted, not used.
+    fatol : float, optional
+        Absolute tolerance (in max-norm) for the residual.
+        If omitted, default is 6e-6.
+    xtol : float, optional
+        Relative minimum step size. If omitted, not used.
+    xatol : float, optional
+        Absolute minimum step size, as determined from the Jacobian
+        approximation. If the step size is smaller than this, optimization
+        is terminated as successful. If omitted, not used.
+    tol_norm : function(vector) -> scalar, optional
+        Norm to use in convergence check. Default is the maximum norm.
+    line_search : {None, 'armijo' (default), 'wolfe'}, optional
+        Which type of a line search to use to determine the step size in
+        the direction given by the Jacobian approximation. Defaults to
+        'armijo'.
+    jac_options : dict, optional
+        Options for the respective Jacobian approximation.
+
+        alpha : float, optional
+            Initial guess for the Jacobian is (-1/alpha).
+    """
+    pass
+
+def _root_diagbroyden_doc():
+    """
+    Options
+    -------
+    nit : int, optional
+        Number of iterations to make. If omitted (default), make as many
+        as required to meet tolerances.
+    disp : bool, optional
+        Print status to stdout on every iteration.
+    maxiter : int, optional
+        Maximum number of iterations to make.
+    ftol : float, optional
+        Relative tolerance for the residual. If omitted, not used.
+    fatol : float, optional
+        Absolute tolerance (in max-norm) for the residual.
+        If omitted, default is 6e-6.
+    xtol : float, optional
+        Relative minimum step size. If omitted, not used.
+    xatol : float, optional
+        Absolute minimum step size, as determined from the Jacobian
+        approximation. If the step size is smaller than this, optimization
+        is terminated as successful. If omitted, not used.
+    tol_norm : function(vector) -> scalar, optional
+        Norm to use in convergence check. Default is the maximum norm.
+    line_search : {None, 'armijo' (default), 'wolfe'}, optional
+        Which type of a line search to use to determine the step size in
+        the direction given by the Jacobian approximation. Defaults to
+        'armijo'.
+    jac_options : dict, optional
+        Options for the respective Jacobian approximation.
+
+        alpha : float, optional
+            Initial guess for the Jacobian is (-1/alpha).
+    """
+    pass
+
+def _root_excitingmixing_doc():
+    """
+    Options
+    -------
+    nit : int, optional
+        Number of iterations to make. If omitted (default), make as many
+        as required to meet tolerances.
+    disp : bool, optional
+        Print status to stdout on every iteration.
+    maxiter : int, optional
+        Maximum number of iterations to make.
+    ftol : float, optional
+        Relative tolerance for the residual. If omitted, not used.
+    fatol : float, optional
+        Absolute tolerance (in max-norm) for the residual.
+        If omitted, default is 6e-6.
+    xtol : float, optional
+        Relative minimum step size. If omitted, not used.
+    xatol : float, optional
+        Absolute minimum step size, as determined from the Jacobian
+        approximation. If the step size is smaller than this, optimization
+        is terminated as successful. If omitted, not used.
+    tol_norm : function(vector) -> scalar, optional
+        Norm to use in convergence check. Default is the maximum norm.
+    line_search : {None, 'armijo' (default), 'wolfe'}, optional
+        Which type of a line search to use to determine the step size in
+        the direction given by the Jacobian approximation. Defaults to
+        'armijo'.
+    jac_options : dict, optional
+        Options for the respective Jacobian approximation.
+
+        alpha : float, optional
+            Initial Jacobian approximation is (-1/alpha).
+        alphamax : float, optional
+            The entries of the diagonal Jacobian are kept in the range
+            ``[alpha, alphamax]``.
+    """
+    pass
+
+def _root_krylov_doc():
+    """
+    Options
+    -------
+    nit : int, optional
+        Number of iterations to make. If omitted (default), make as many
+        as required to meet tolerances.
+    disp : bool, optional
+        Print status to stdout on every iteration.
+    maxiter : int, optional
+        Maximum number of iterations to make.
+    ftol : float, optional
+        Relative tolerance for the residual. If omitted, not used.
+    fatol : float, optional
+        Absolute tolerance (in max-norm) for the residual.
+        If omitted, default is 6e-6.
+    xtol : float, optional
+        Relative minimum step size. If omitted, not used.
+    xatol : float, optional
+        Absolute minimum step size, as determined from the Jacobian
+        approximation. If the step size is smaller than this, optimization
+        is terminated as successful. If omitted, not used.
+    tol_norm : function(vector) -> scalar, optional
+        Norm to use in convergence check. Default is the maximum norm.
+    line_search : {None, 'armijo' (default), 'wolfe'}, optional
+        Which type of a line search to use to determine the step size in
+        the direction given by the Jacobian approximation. Defaults to
+        'armijo'.
+    jac_options : dict, optional
+        Options for the respective Jacobian approximation.
+
+        rdiff : float, optional
+            Relative step size to use in numerical differentiation.
+        method : str or callable, optional
+            Krylov method to use to approximate the Jacobian.  Can be a string,
+            or a function implementing the same interface as the iterative
+            solvers in `scipy.sparse.linalg`. If a string, needs to be one of:
+            ``'lgmres'``, ``'gmres'``, ``'bicgstab'``, ``'cgs'``, ``'minres'``,
+            ``'tfqmr'``.
+
+            The default is `scipy.sparse.linalg.lgmres`.
+        inner_M : LinearOperator or InverseJacobian
+            Preconditioner for the inner Krylov iteration.
+            Note that you can also use inverse Jacobians as (adaptive)
+            preconditioners. For example,
+
+            >>> jac = BroydenFirst()
+            >>> kjac = KrylovJacobian(inner_M=jac.inverse)
+
+            If the preconditioner has a method named 'update', it will
+            be called as ``update(x, f)`` after each nonlinear step,
+            with ``x`` giving the current point, and ``f`` the current
+            function value.
+        inner_tol, inner_maxiter, ...
+            Parameters to pass on to the "inner" Krylov solver.
+            See `scipy.sparse.linalg.gmres` for details.
+        outer_k : int, optional
+            Size of the subspace kept across LGMRES nonlinear
+            iterations.
+
+            See `scipy.sparse.linalg.lgmres` for details.
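+
+    Examples
+    --------
+    A minimal sketch of choosing the inner Krylov solver via
+    ``jac_options``; the test function and the choice of ``'gmres'`` are
+    illustrative assumptions.
+
+    >>> import numpy as np
+    >>> from scipy import optimize
+    >>> def func(x):
+    ...     return np.cos(x) + x[::-1] - [1, 2, 3, 4]
+    >>> res = optimize.root(func, [1, 1, 1, 1], method='krylov',
+    ...                     options={'jac_options': {'method': 'gmres'}})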
+    """
+    pass
@@ -0,0 +1,525 @@
+"""
+Unified interfaces to root finding algorithms for real or complex
+scalar functions.
+
+Functions
+---------
+- root_scalar : find a root of a scalar function.
+"""
+import numpy as np
+
+from . import _zeros_py as optzeros
+from ._numdiff import approx_derivative
+
+__all__ = ['root_scalar']
+
+ROOT_SCALAR_METHODS = ['bisect', 'brentq', 'brenth', 'ridder', 'toms748',
+                       'newton', 'secant', 'halley']
+
+
+class MemoizeDer:
+    """Decorator that caches the value and derivative(s) of a function each
+    time it is called.
+
+    This is a simplistic memoizer that calls and caches a single value
+    of `f(x, *args)`.
+    It assumes that `args` does not change between invocations.
+    It supports the use case of a root-finder where `args` is fixed,
+    `x` changes, and only rarely, if at all, does x assume the same value
+    more than once."""
+    def __init__(self, fun):
+        self.fun = fun
+        self.vals = None
+        self.x = None
+        self.n_calls = 0
+
+    def __call__(self, x, *args):
+        r"""Calculate f or use cached value if available"""
+        # Derivative may be requested before the function itself, always check
+        if self.vals is None or x != self.x:
+            fg = self.fun(x, *args)
+            self.x = x
+            self.n_calls += 1
+            self.vals = fg[:]
+        return self.vals[0]
+
+    def fprime(self, x, *args):
+        r"""Calculate f' or use a cached value if available"""
+        if self.vals is None or x != self.x:
+            self(x, *args)
+        return self.vals[1]
+
+    def fprime2(self, x, *args):
+        r"""Calculate f'' or use a cached value if available"""
+        if self.vals is None or x != self.x:
+            self(x, *args)
+        return self.vals[2]
+
+    def ncalls(self):
+        return self.n_calls
+
+
+def root_scalar(f, args=(), method=None, bracket=None,
+                fprime=None, fprime2=None,
+                x0=None, x1=None,
+                xtol=None, rtol=None, maxiter=None,
+                options=None):
+    """
+    Find a root of a scalar function.
+
+    Parameters
+    ----------
+    f : callable
+        A function to find a root of.
+    args : tuple, optional
+        Extra arguments passed to the objective function and its derivative(s).
+    method : str, optional
+        Type of solver.  Should be one of
+
+            - 'bisect'    :ref:`(see here) <optimize.root_scalar-bisect>`
+            - 'brentq'    :ref:`(see here) <optimize.root_scalar-brentq>`
+            - 'brenth'    :ref:`(see here) <optimize.root_scalar-brenth>`
+            - 'ridder'    :ref:`(see here) <optimize.root_scalar-ridder>`
+            - 'toms748'   :ref:`(see here) <optimize.root_scalar-toms748>`
+            - 'newton'    :ref:`(see here) <optimize.root_scalar-newton>`
+            - 'secant'    :ref:`(see here) <optimize.root_scalar-secant>`
+            - 'halley'    :ref:`(see here) <optimize.root_scalar-halley>`
+
+    bracket : sequence of 2 floats, optional
+        An interval bracketing a root.  `f(x, *args)` must have different
+        signs at the two endpoints.
+    x0 : float, optional
+        Initial guess.
+    x1 : float, optional
+        A second guess.
+    fprime : bool or callable, optional
+        If `fprime` is a boolean and is True, `f` is assumed to return the
+        value of the objective function and of the derivative.
+        `fprime` can also be a callable returning the derivative of `f`. In
+        this case, it must accept the same arguments as `f`.
+    fprime2 : bool or callable, optional
+        If `fprime2` is a boolean and is True, `f` is assumed to return the
+        value of the objective function and of the
+        first and second derivatives.
+        `fprime2` can also be a callable returning the second derivative of `f`.
+        In this case, it must accept the same arguments as `f`.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    options : dict, optional
+        A dictionary of solver options. E.g., ``k``, see
+        :obj:`show_options()` for details.
+
+    Returns
+    -------
+    sol : RootResults
+        The solution represented as a ``RootResults`` object.
+        Important attributes are: ``root`` the solution, ``converged`` a
+        boolean flag indicating if the algorithm exited successfully and
+        ``flag`` which describes the cause of the termination. See
+        `RootResults` for a description of other attributes.
+
+    See also
+    --------
+    show_options : Additional options accepted by the solvers
+    root : Find a root of a vector function.
+
+    Notes
+    -----
+    This section describes the available solvers that can be selected by the
+    'method' parameter.
+
+    If `method` is not given, one is chosen from the other arguments.
+    If a bracket is provided, 'brentq' is used. Otherwise, if `x0` is
+    given, 'halley' is used when both `fprime` and `fprime2` are supplied,
+    'newton' when only `fprime` is supplied (or when neither `fprime` nor
+    `x1` is), and 'secant' when `x1` is supplied without `fprime`.
+    If neither a bracket nor `x0` is provided, a `ValueError` is raised.
+
+    Arguments for each method are as follows (x=required, o=optional).
+
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+    |                    method                     | f | args | bracket | x0 | x1 | fprime | fprime2 | xtol | rtol | maxiter | options |
+    +===============================================+===+======+=========+====+====+========+=========+======+======+=========+=========+
+    | :ref:`bisect <optimize.root_scalar-bisect>`   | x |  o   |    x    |    |    |        |         |  o   |  o   |    o    |   o     |
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+    | :ref:`brentq <optimize.root_scalar-brentq>`   | x |  o   |    x    |    |    |        |         |  o   |  o   |    o    |   o     |
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+    | :ref:`brenth <optimize.root_scalar-brenth>`   | x |  o   |    x    |    |    |        |         |  o   |  o   |    o    |   o     |
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+    | :ref:`ridder <optimize.root_scalar-ridder>`   | x |  o   |    x    |    |    |        |         |  o   |  o   |    o    |   o     |
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+    | :ref:`toms748 <optimize.root_scalar-toms748>` | x |  o   |    x    |    |    |        |         |  o   |  o   |    o    |   o     |
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+    | :ref:`secant <optimize.root_scalar-secant>`   | x |  o   |         | x  | o  |        |         |  o   |  o   |    o    |   o     |
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+    | :ref:`newton <optimize.root_scalar-newton>`   | x |  o   |         | x  |    |   o    |         |  o   |  o   |    o    |   o     |
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+    | :ref:`halley <optimize.root_scalar-halley>`   | x |  o   |         | x  |    |   x    |    x    |  o   |  o   |    o    |   o     |
+    +-----------------------------------------------+---+------+---------+----+----+--------+---------+------+------+---------+---------+
+
+    Examples
+    --------
+
+    Find the root of a simple cubic
+
+    >>> from scipy import optimize
+    >>> def f(x):
+    ...     return (x**3 - 1)  # only one real root at x = 1
+
+    >>> def fprime(x):
+    ...     return 3*x**2
+
+    The `brentq` method takes as input a bracket
+
+    >>> sol = optimize.root_scalar(f, bracket=[0, 3], method='brentq')
+    >>> sol.root, sol.iterations, sol.function_calls
+    (1.0, 10, 11)
+
+    The `newton` method takes as input a single point and uses the
+    derivative(s).
+
+    >>> sol = optimize.root_scalar(f, x0=0.2, fprime=fprime, method='newton')
+    >>> sol.root, sol.iterations, sol.function_calls
+    (1.0, 11, 22)
+
+    The function can provide the value and derivative(s) in a single call.
+
+    >>> def f_p_pp(x):
+    ...     return (x**3 - 1), 3*x**2, 6*x
+
+    >>> sol = optimize.root_scalar(
+    ...     f_p_pp, x0=0.2, fprime=True, method='newton'
+    ... )
+    >>> sol.root, sol.iterations, sol.function_calls
+    (1.0, 11, 11)
+
+    >>> sol = optimize.root_scalar(
+    ...     f_p_pp, x0=0.2, fprime=True, fprime2=True, method='halley'
+    ... )
+    >>> sol.root, sol.iterations, sol.function_calls
+    (1.0, 7, 8)
+
+
+    """  # noqa: E501
+    if not isinstance(args, tuple):
+        args = (args,)
+
+    if options is None:
+        options = {}
+
+    # fun also returns the derivative(s)
+    is_memoized = False
+    if fprime2 is not None and not callable(fprime2):
+        if bool(fprime2):
+            f = MemoizeDer(f)
+            is_memoized = True
+            fprime2 = f.fprime2
+            fprime = f.fprime
+        else:
+            fprime2 = None
+    if fprime is not None and not callable(fprime):
+        if bool(fprime):
+            f = MemoizeDer(f)
+            is_memoized = True
+            fprime = f.fprime
+        else:
+            fprime = None
+
+    # respect solver-specific default tolerances - only pass in if actually set
+    kwargs = {}
+    for k in ['xtol', 'rtol', 'maxiter']:
+        v = locals().get(k)
+        if v is not None:
+            kwargs[k] = v
+
+    # Set any solver-specific options
+    if options:
+        kwargs.update(options)
+    # Always request full_output from the underlying method as _root_scalar
+    # always returns a RootResults object
+    kwargs.update(full_output=True, disp=False)
+
+    # Pick a method if not specified.
+    # Use the "best" method available for the situation.
+    if not method:
+        if bracket is not None:
+            method = 'brentq'
+        elif x0 is not None:
+            if fprime:
+                if fprime2:
+                    method = 'halley'
+                else:
+                    method = 'newton'
+            elif x1 is not None:
+                method = 'secant'
+            else:
+                method = 'newton'
+    if not method:
+        raise ValueError('Unable to select a solver as neither bracket '
+                         'nor starting point provided.')
+
+    meth = method.lower()
+    map2underlying = {'halley': 'newton', 'secant': 'newton'}
+
+    try:
+        methodc = getattr(optzeros, map2underlying.get(meth, meth))
+    except AttributeError as e:
+        raise ValueError('Unknown solver %s' % meth) from e
+
+    if meth in ['bisect', 'ridder', 'brentq', 'brenth', 'toms748']:
+        if not isinstance(bracket, (list, tuple, np.ndarray)):
+            raise ValueError('Bracket needed for %s' % method)
+
+        a, b = bracket[:2]
+        try:
+            r, sol = methodc(f, a, b, args=args, **kwargs)
+        except ValueError as e:
+            # gh-17622 fixed some bugs in low-level solvers by raising an error
+            # (rather than returning incorrect results) when the callable
+            # returns a NaN. It did so by wrapping the callable rather than
+            # modifying compiled code, so the iteration count is not available.
+            if hasattr(e, "_x"):
+                sol = optzeros.RootResults(root=e._x,
+                                           iterations=np.nan,
+                                           function_calls=e._function_calls,
+                                           flag=str(e), method=method)
+            else:
+                raise
+
+    elif meth in ['secant']:
+        if x0 is None:
+            raise ValueError('x0 must not be None for %s' % method)
+        if 'xtol' in kwargs:
+            kwargs['tol'] = kwargs.pop('xtol')
+        r, sol = methodc(f, x0, args=args, fprime=None, fprime2=None,
+                         x1=x1, **kwargs)
+    elif meth in ['newton']:
+        if x0 is None:
+            raise ValueError('x0 must not be None for %s' % method)
+        if not fprime:
+            # approximate fprime with finite differences
+
+            def fprime(x, *args):
+                # `root_scalar` doesn't actually seem to support vectorized
+                # use of `newton`. In that case, `approx_derivative` will
+                # always get scalar input. Nonetheless, it always returns an
+                # array, so we extract the element to produce scalar output.
+                return approx_derivative(f, x, method='2-point', args=args)[0]
+
+        if 'xtol' in kwargs:
+            kwargs['tol'] = kwargs.pop('xtol')
+        r, sol = methodc(f, x0, args=args, fprime=fprime, fprime2=None,
+                         **kwargs)
+    elif meth in ['halley']:
+        if x0 is None:
+            raise ValueError('x0 must not be None for %s' % method)
+        if not fprime:
+            raise ValueError('fprime must be specified for %s' % method)
+        if not fprime2:
+            raise ValueError('fprime2 must be specified for %s' % method)
+        if 'xtol' in kwargs:
+            kwargs['tol'] = kwargs.pop('xtol')
+        r, sol = methodc(f, x0, args=args, fprime=fprime, fprime2=fprime2, **kwargs)
+    else:
+        raise ValueError('Unknown solver %s' % method)
+
+    if is_memoized:
+        # Replace the function_calls count with the memoized count.
+        # Avoids double and triple-counting.
+        n_calls = f.n_calls
+        sol.function_calls = n_calls
+
+    return sol
+
+
+def _root_scalar_brentq_doc():
+    r"""
+    Options
+    -------
+    args : tuple, optional
+        Extra arguments passed to the objective function.
+    bracket : sequence of 2 floats, required
+        An interval bracketing a root.  `f(x, *args)` must have different
+        signs at the two endpoints.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    options : dict, optional
+        Specifies any method-specific options not covered above.
+
+    """
+    pass
+
+
+def _root_scalar_brenth_doc():
+    r"""
+    Options
+    -------
+    args : tuple, optional
+        Extra arguments passed to the objective function.
+    bracket : sequence of 2 floats, required
+        An interval bracketing a root.  `f(x, *args)` must have different
+        signs at the two endpoints.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    options : dict, optional
+        Specifies any method-specific options not covered above.
+
+    """
+    pass
+
+def _root_scalar_toms748_doc():
+    r"""
+    Options
+    -------
+    args : tuple, optional
+        Extra arguments passed to the objective function.
+    bracket : sequence of 2 floats, required
+        An interval bracketing a root.  `f(x, *args)` must have different
+        signs at the two endpoints.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    options : dict, optional
+        Specifies any method-specific options not covered above.
+
+    """
+    pass
+
+
+def _root_scalar_secant_doc():
+    r"""
+    Options
+    -------
+    args : tuple, optional
+        Extra arguments passed to the objective function.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    x0 : float, required
+        Initial guess.
+    x1 : float, optional
+        A second guess. If omitted, one is derived by perturbing `x0`.
+    options : dict, optional
+        Specifies any method-specific options not covered above.
+
+    """
+    pass
+
+
+def _root_scalar_newton_doc():
+    r"""
+    Options
+    -------
+    args : tuple, optional
+        Extra arguments passed to the objective function and its derivative.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    x0 : float, required
+        Initial guess.
+    fprime : bool or callable, optional
+        If `fprime` is a boolean and is True, `f` is assumed to return the
+        value of the derivative along with the objective function.
+        `fprime` can also be a callable returning the derivative of `f`. In
+        this case, it must accept the same arguments as `f`.
+    options : dict, optional
+        Specifies any method-specific options not covered above.
+
+    """
+    pass
+
+
+def _root_scalar_halley_doc():
+    r"""
+    Options
+    -------
+    args : tuple, optional
+        Extra arguments passed to the objective function and its derivatives.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    x0 : float, required
+        Initial guess.
+    fprime : bool or callable, required
+        If `fprime` is a boolean and is True, `f` is assumed to return the
+        value of the derivative along with the objective function.
+        `fprime` can also be a callable returning the derivative of `f`. In
+        this case, it must accept the same arguments as `f`.
+    fprime2 : bool or callable, required
+        If `fprime2` is a boolean and is True, `f` is assumed to return the
+        values of the 1st and 2nd derivatives along with the objective
+        function. `fprime2` can also be a callable returning the 2nd
+        derivative of `f`. In this case, it must accept the same arguments as `f`.
+    options : dict, optional
+        Specifies any method-specific options not covered above.
+
+    """
+    pass
+
+
+def _root_scalar_ridder_doc():
+    r"""
+    Options
+    -------
+    args : tuple, optional
+        Extra arguments passed to the objective function.
+    bracket : sequence of 2 floats, required
+        An interval bracketing a root.  `f(x, *args)` must have different
+        signs at the two endpoints.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    options : dict, optional
+        Specifies any method-specific options not covered above.
+
+    """
+    pass
+
+
+def _root_scalar_bisect_doc():
+    r"""
+    Options
+    -------
+    args : tuple, optional
+        Extra arguments passed to the objective function.
+    bracket : sequence of 2 floats, required
+        An interval bracketing a root.  `f(x, *args)` must have different
+        signs at the two endpoints.
+    xtol : float, optional
+        Tolerance (absolute) for termination.
+    rtol : float, optional
+        Tolerance (relative) for termination.
+    maxiter : int, optional
+        Maximum number of iterations.
+    options : dict, optional
+        Specifies any method-specific options not covered above.
+
+    """
+    pass
@@ -0,0 +1,460 @@
+import collections
+from abc import ABC, abstractmethod
+
+import numpy as np
+
+from scipy._lib._util import MapWrapper
+
+
+class VertexBase(ABC):
+    """
+    Base class for a vertex.
+    """
+    def __init__(self, x, nn=None, index=None):
+        """
+        Initialization of a vertex object.
+
+        Parameters
+        ----------
+        x : tuple or vector
+            The geometric location (domain).
+        nn : list, optional
+            Nearest neighbour list.
+        index : int, optional
+            Index of vertex.
+        """
+        self.x = x
+        self.hash = hash(self.x)  # Save precomputed hash
+
+        if nn is not None:
+            self.nn = set(nn)  # can use .indexupdate to add a new list
+        else:
+            self.nn = set()
+
+        self.index = index
+
+    def __hash__(self):
+        return self.hash
+
+    def __getattr__(self, item):
+        if item not in ['x_a']:
+            raise AttributeError(f"{type(self)} object has no attribute "
+                                 f"'{item}'")
+        if item == 'x_a':
+            self.x_a = np.array(self.x)
+            return self.x_a
+
+    @abstractmethod
+    def connect(self, v):
+        raise NotImplementedError("This method is only implemented with an "
+                                  "associated child of the base class.")
+
+    @abstractmethod
+    def disconnect(self, v):
+        raise NotImplementedError("This method is only implemented with an "
+                                  "associated child of the base class.")
+
+    def star(self):
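+        """Return the star domain ``st(v)`` of this vertex ``v``.
+
+        The star consists of the vertex itself together with all of its
+        nearest neighbours. It is built from a copy of ``self.nn``, so
+        the neighbour set itself is left unchanged and never contains
+        the vertex ``v``.
+
+        Returns
+        -------
+        st : set
+            A set containing all the vertices in ``st(v)``.
+        """
+        self.st = set(self.nn)
+        self.st.add(self)
+        return self.st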
+
+
+class VertexScalarField(VertexBase):
+    """
+    Add homology properties of a scalar field f: R^n --> R associated with
+    the geometry built from the VertexBase class
+    """
+
+    def __init__(self, x, field=None, nn=None, index=None, field_args=(),
+                 g_cons=None, g_cons_args=()):
+        """
+        Parameters
+        ----------
+        x : tuple
+            vector of vertex coordinates
+        field : callable, optional
+            a scalar field f: R^n --> R associated with the geometry
+        nn : list, optional
+            list of nearest neighbours
+        index : int, optional
+            index of the vertex
+        field_args : tuple, optional
+            additional arguments to be passed to field
+        g_cons : callable, optional
+            constraints on the vertex
+        g_cons_args : tuple, optional
+            additional arguments to be passed to g_cons
+
+        """
+        super().__init__(x, nn=nn, index=index)
+
+        # Note Vertex is only initiated once for all x so only
+        # evaluated once
+        # self.feasible = None
+
+        # self.f is externally defined by the cache to allow parallel
+        # processing
+        # None type that will break arithmetic operations unless defined
+        # self.f = None
+
+        self.check_min = True
+        self.check_max = True
+
+    def connect(self, v):
+        """Connects self to another vertex object v.
+
+        Parameters
+        ----------
+        v : VertexBase or VertexScalarField object
+        """
+        if v is not self and v not in self.nn:
+            self.nn.add(v)
+            v.nn.add(self)
+
+            # Flags for checking homology properties:
+            self.check_min = True
+            self.check_max = True
+            v.check_min = True
+            v.check_max = True
+
+    def disconnect(self, v):
+        if v in self.nn:
+            self.nn.remove(v)
+            v.nn.remove(self)
+
+            # Flags for checking homology properties:
+            self.check_min = True
+            self.check_max = True
+            v.check_min = True
+            v.check_max = True
+
+    def minimiser(self):
+        """Check whether this vertex is strictly less than all its
+        neighbours."""
+        if self.check_min:
+            self._min = all(self.f < v.f for v in self.nn)
+            self.check_min = False
+
+        return self._min
+
+    def maximiser(self):
+        """
+        Check whether this vertex is strictly greater than all its
+        neighbours.
+        """
+        if self.check_max:
+            self._max = all(self.f > v.f for v in self.nn)
+            self.check_max = False
+
+        return self._max
+
+
+class VertexVectorField(VertexBase):
+    """
+    Add homology properties of a vector field f: R^n --> R^m associated with
+    the geometry built from the VertexBase class.
+    """
+
+    def __init__(self, x, sfield=None, vfield=None, field_args=(),
+                 vfield_args=(), g_cons=None,
+                 g_cons_args=(), nn=None, index=None):
+        super().__init__(x, nn=nn, index=index)
+
+        raise NotImplementedError("This class is still a work in progress")
+
+
+class VertexCacheBase:
+    """Base class for a vertex cache for a simplicial complex."""
+    def __init__(self):
+
+        self.cache = collections.OrderedDict()
+        self.nfev = 0  # Feasible points
+        self.index = -1
+
+    def __iter__(self):
+        for v in self.cache:
+            yield self.cache[v]
+        return
+
+    def size(self):
+        """Returns the size of the vertex cache."""
+        return self.index + 1
+
+    def print_out(self):
+        headlen = len(f"Vertex cache of size: {len(self.cache)}:")
+        print('=' * headlen)
+        print(f"Vertex cache of size: {len(self.cache)}:")
+        print('=' * headlen)
+        for v in self.cache:
+            self.cache[v].print_out()
+
+
+class VertexCube(VertexBase):
+    """Vertex class to be used for a pure simplicial complex with no associated
+    differential geometry (single level domain that exists in R^n)"""
+    def __init__(self, x, nn=None, index=None):
+        super().__init__(x, nn=nn, index=index)
+
+    def connect(self, v):
+        if v is not self and v not in self.nn:
+            self.nn.add(v)
+            v.nn.add(self)
+
+    def disconnect(self, v):
+        if v in self.nn:
+            self.nn.remove(v)
+            v.nn.remove(self)
+
+
+class VertexCacheIndex(VertexCacheBase):
+    def __init__(self):
+        """
+        Class for a vertex cache for a simplicial complex without an associated
+        field. Useful only for building and visualising a domain complex.
+
+        Parameters
+        ----------
+        """
+        super().__init__()
+        self.Vertex = VertexCube
+
+    def __getitem__(self, x, nn=None):
+        try:
+            return self.cache[x]
+        except KeyError:
+            self.index += 1
+            xval = self.Vertex(x, index=self.index)
+            # logging.info("New generated vertex at x = {}".format(x))
+            # NOTE: Surprisingly high performance increase if logging
+            # is commented out
+            self.cache[x] = xval
+            return self.cache[x]
+
+
+class VertexCacheField(VertexCacheBase):
+    def __init__(self, field=None, field_args=(), g_cons=None, g_cons_args=(),
+                 workers=1):
+        """
+        Class for a vertex cache for a simplicial complex with an associated
+        field.
+
+        Parameters
+        ----------
+        field : callable
+            Scalar or vector field callable.
+        field_args : tuple, optional
+            Any additional fixed parameters needed to completely specify the
+            field function
+        g_cons : sequence of callable, optional
+            Constraint functions. A vertex at ``x`` is feasible when
+            ``g(x, *args) >= 0`` holds for every function ``g``.
+        g_cons_args : tuple, optional
+            Any additional fixed parameters needed to completely specify the
+            constraint functions
+        workers : int, optional
+            Uses `multiprocessing.Pool` to compute the field functions in
+            parallel.
+
+        """
+        super().__init__()
+        self.index = -1
+        self.Vertex = VertexScalarField
+        self.field = field
+        self.field_args = field_args
+        self.wfield = FieldWrapper(field, field_args)  # if workers is not 1
+
+        self.g_cons = g_cons
+        self.g_cons_args = g_cons_args
+        self.wgcons = ConstraintWrapper(g_cons, g_cons_args)
+        self.gpool = set()  # A set of tuples to process for feasibility
+
+        # Field processing objects
+        self.fpool = set()  # A set of tuples to process for scalar function
+        self.sfc_lock = False  # True if self.fpool is non-empty
+
+        self.workers = workers
+        self._mapwrapper = MapWrapper(workers)
+
+        if workers == 1:
+            self.process_gpool = self.proc_gpool
+            if g_cons is None:
+                self.process_fpool = self.proc_fpool_nog
+            else:
+                self.process_fpool = self.proc_fpool_g
+        else:
+            self.process_gpool = self.pproc_gpool
+            if g_cons is None:
+                self.process_fpool = self.pproc_fpool_nog
+            else:
+                self.process_fpool = self.pproc_fpool_g
+
+    def __getitem__(self, x, nn=None):
+        try:
+            return self.cache[x]
+        except KeyError:
+            self.index += 1
+            xval = self.Vertex(x, field=self.field, nn=nn, index=self.index,
+                               field_args=self.field_args,
+                               g_cons=self.g_cons,
+                               g_cons_args=self.g_cons_args)
+
+            self.cache[x] = xval  # Define in cache
+            self.gpool.add(xval)  # Add to pool for processing feasibility
+            self.fpool.add(xval)  # Add to pool for processing field values
+            return self.cache[x]
+
+    def __getstate__(self):
+        self_dict = self.__dict__.copy()
+        self_dict.pop('pool', None)  # 'pool' may be absent; avoid KeyError
+        return self_dict
+
+    def process_pools(self):
+        if self.g_cons is not None:
+            self.process_gpool()
+        self.process_fpool()
+        self.proc_minimisers()
+
+    def feasibility_check(self, v):
+        v.feasible = True
+        for g, args in zip(self.g_cons, self.g_cons_args):
+            # constraint may return more than 1 value.
+            if np.any(g(v.x_a, *args) < 0.0):
+                v.f = np.inf
+                v.feasible = False
+                break
+
+    def compute_sfield(self, v):
+        """Compute the scalar field values of a vertex object `v`.
+
+        Parameters
+        ----------
+        v : VertexBase or VertexScalarField object
+        """
+        try:
+            v.f = self.field(v.x_a, *self.field_args)
+            self.nfev += 1
+        except AttributeError:
+            v.f = np.inf
+            # logging.warning(f"Field function not found at x = {self.x_a}")
+        if np.isnan(v.f):
+            v.f = np.inf
+
+    def proc_gpool(self):
+        """Process all constraints."""
+        if self.g_cons is not None:
+            for v in self.gpool:
+                self.feasibility_check(v)
+        # Clean the pool
+        self.gpool = set()
+
+    def pproc_gpool(self):
+        """Process all constraints in parallel."""
+        gpool_l = []
+        for v in self.gpool:
+            gpool_l.append(v.x_a)
+
+        G = self._mapwrapper(self.wgcons.gcons, gpool_l)
+        for v, g in zip(self.gpool, G):
+            v.feasible = g  # set vertex object attribute v.feasible = g (bool)
+
+    def proc_fpool_g(self):
+        """Process all field functions with constraints supplied."""
+        for v in self.fpool:
+            if v.feasible:
+                self.compute_sfield(v)
+        # Clean the pool
+        self.fpool = set()
+
+    def proc_fpool_nog(self):
+        """Process all field functions with no constraints supplied."""
+        for v in self.fpool:
+            self.compute_sfield(v)
+        # Clean the pool
+        self.fpool = set()
+
+    def pproc_fpool_g(self):
+        """
+        Process all field functions with constraints supplied in parallel.
+        """
+        self.wfield.func
+        fpool_l = []
+        for v in self.fpool:
+            if v.feasible:
+                fpool_l.append(v.x_a)
+            else:
+                v.f = np.inf
+        F = self._mapwrapper(self.wfield.func, fpool_l)
+        for va, f in zip(fpool_l, F):
+            vt = tuple(va)
+            self[vt].f = f  # set vertex object attribute v.f = f
+            self.nfev += 1
+        # Clean the pool
+        self.fpool = set()
+
+    def pproc_fpool_nog(self):
+        """
+        Process all field functions with no constraints supplied in parallel.
+        """
+        self.wfield.func
+        fpool_l = []
+        for v in self.fpool:
+            fpool_l.append(v.x_a)
+        F = self._mapwrapper(self.wfield.func, fpool_l)
+        for va, f in zip(fpool_l, F):
+            vt = tuple(va)
+            self[vt].f = f  # set vertex object attribute v.f = f
+            self.nfev += 1
+        # Clean the pool
+        self.fpool = set()
+
+    def proc_minimisers(self):
+        """Check for minimisers."""
+        for v in self:
+            v.minimiser()
+            v.maximiser()
+
+
+class ConstraintWrapper:
+    """Object to wrap constraints to pass to `multiprocessing.Pool`."""
+    def __init__(self, g_cons, g_cons_args):
+        self.g_cons = g_cons
+        self.g_cons_args = g_cons_args
+
+    def gcons(self, v_x_a):
+        vfeasible = True
+        for g, args in zip(self.g_cons, self.g_cons_args):
+            # constraint may return more than 1 value.
+            if np.any(g(v_x_a, *args) < 0.0):
+                vfeasible = False
+                break
+        return vfeasible
+
+
+class FieldWrapper:
+    """Object to wrap field to pass to `multiprocessing.Pool`."""
+    def __init__(self, field, field_args):
+        self.field = field
+        self.field_args = field_args
+
+    def func(self, v_x_a):
+        try:
+            v_f = self.field(v_x_a, *self.field_args)
+        except Exception:
+            v_f = np.inf
+        if np.isnan(v_f):
+            v_f = np.inf
+
+        return v_f
@@ -0,0 +1,513 @@
+"""
+This module implements the Sequential Least Squares Programming optimization
+algorithm (SLSQP), originally developed by Dieter Kraft.
+See http://www.netlib.org/toms/733
+
+Functions
+---------
+.. autosummary::
+   :toctree: generated/
+
+    approx_jacobian
+    fmin_slsqp
+
+"""
+
+__all__ = ['approx_jacobian', 'fmin_slsqp']
+
+import numpy as np
+from scipy.optimize._slsqp import slsqp
+from numpy import (zeros, array, linalg, append, concatenate, finfo,
+                   sqrt, vstack, isfinite, atleast_1d)
+from ._optimize import (OptimizeResult, _check_unknown_options,
+                        _prepare_scalar_function, _clip_x_for_func,
+                        _check_clip_x)
+from ._numdiff import approx_derivative
+from ._constraints import old_bound_to_new, _arr_to_scalar
+from scipy._lib._array_api import atleast_nd, array_namespace
+
+# deprecated imports to be removed in SciPy 1.13.0
+from numpy import exp, inf  # noqa: F401
+
+
+__docformat__ = "restructuredtext en"
+
+_epsilon = sqrt(finfo(float).eps)
+
+
+def approx_jacobian(x, func, epsilon, *args):
+    """
+    Approximate the Jacobian matrix of a callable function.
+
+    Parameters
+    ----------
+    x : array_like
+        The state vector at which to compute the Jacobian matrix.
+    func : callable f(x,*args)
+        The vector-valued function.
+    epsilon : float
+        The perturbation used to determine the partial derivatives.
+    args : sequence
+        Additional arguments passed to func.
+
+    Returns
+    -------
+    An array of dimensions ``(lenf, lenx)`` where ``lenf`` is the length
+    of the outputs of `func`, and ``lenx`` is the number of elements in
+    `x`.
+
+    Notes
+    -----
+    The approximation is done using forward differences.
+
+    """
+    # approx_derivative returns (m, n) == (lenf, lenx)
+    jac = approx_derivative(func, x, method='2-point', abs_step=epsilon,
+                            args=args)
+    # if func returns a scalar jac.shape will be (lenx,). Make sure
+    # it's at least a 2D array.
+    return np.atleast_2d(jac)
+
+
+def fmin_slsqp(func, x0, eqcons=(), f_eqcons=None, ieqcons=(), f_ieqcons=None,
+               bounds=(), fprime=None, fprime_eqcons=None,
+               fprime_ieqcons=None, args=(), iter=100, acc=1.0E-6,
+               iprint=1, disp=None, full_output=0, epsilon=_epsilon,
+               callback=None):
+    """
+    Minimize a function using Sequential Least Squares Programming
+
+    Python interface function for the SLSQP Optimization subroutine
+    originally implemented by Dieter Kraft.
+
+    Parameters
+    ----------
+    func : callable f(x,*args)
+        Objective function.  Must return a scalar.
+    x0 : 1-D ndarray of float
+        Initial guess for the independent variable(s).
+    eqcons : list, optional
+        A list of functions of length n such that
+        eqcons[j](x,*args) == 0.0 in a successfully optimized
+        problem.
+    f_eqcons : callable f(x,*args), optional
+        Returns a 1-D array in which each element must equal 0.0 in a
+        successfully optimized problem. If f_eqcons is specified,
+        eqcons is ignored.
+    ieqcons : list, optional
+        A list of functions of length n such that
+        ieqcons[j](x,*args) >= 0.0 in a successfully optimized
+        problem.
+    f_ieqcons : callable f(x,*args), optional
+        Returns a 1-D ndarray in which each element must be greater or
+        equal to 0.0 in a successfully optimized problem. If
+        f_ieqcons is specified, ieqcons is ignored.
+    bounds : list, optional
+        A list of tuples specifying the lower and upper bound
+        for each independent variable [(xl0, xu0),(xl1, xu1),...]
+        Infinite values will be interpreted as large floating values.
+    fprime : callable `f(x,*args)`, optional
+        A function that evaluates the partial derivatives of func.
+    fprime_eqcons : callable `f(x,*args)`, optional
+        A function of the form `f(x, *args)` that returns the m by n
+        array of equality constraint normals. If not provided,
+        the normals will be approximated. The array returned by
+        fprime_eqcons should be sized as ( len(eqcons), len(x0) ).
+    fprime_ieqcons : callable `f(x,*args)`, optional
+        A function of the form `f(x, *args)` that returns the m by n
+        array of inequality constraint normals. If not provided,
+        the normals will be approximated. The array returned by
+        fprime_ieqcons should be sized as ( len(ieqcons), len(x0) ).
+    args : sequence, optional
+        Additional arguments passed to func and fprime.
+    iter : int, optional
+        The maximum number of iterations.
+    acc : float, optional
+        Requested accuracy.
+    iprint : int, optional
+        The verbosity of fmin_slsqp :
+
+        * iprint <= 0 : Silent operation
+        * iprint == 1 : Print summary upon completion (default)
+        * iprint >= 2 : Print status of each iterate and summary
+    disp : int, optional
+        Overrides the iprint interface (preferred).
+    full_output : bool, optional
+        If False, return only the minimizer of func (default).
+        Otherwise, output final objective function and summary
+        information.
+    epsilon : float, optional
+        The step size for finite-difference derivative estimates.
+    callback : callable, optional
+        Called after each iteration, as ``callback(x)``, where ``x`` is the
+        current parameter vector.
+
+    Returns
+    -------
+    out : ndarray of float
+        The final minimizer of func.
+    fx : ndarray of float, if full_output is true
+        The final value of the objective function.
+    its : int, if full_output is true
+        The number of iterations.
+    imode : int, if full_output is true
+        The exit mode from the optimizer (see below).
+    smode : string, if full_output is true
+        Message describing the exit mode from the optimizer.
+
+    See also
+    --------
+    minimize: Interface to minimization algorithms for multivariate
+        functions. See the 'SLSQP' `method` in particular.
+
+    Notes
+    -----
+    Exit modes are defined as follows ::
+
+        -1 : Gradient evaluation required (g & a)
+         0 : Optimization terminated successfully
+         1 : Function evaluation required (f & c)
+         2 : More equality constraints than independent variables
+         3 : More than 3*n iterations in LSQ subproblem
+         4 : Inequality constraints incompatible
+         5 : Singular matrix E in LSQ subproblem
+         6 : Singular matrix C in LSQ subproblem
+         7 : Rank-deficient equality constraint subproblem HFTI
+         8 : Positive directional derivative for linesearch
+         9 : Iteration limit reached
+
+    Examples
+    --------
+    Examples are given :ref:`in the tutorial <tutorial-sqlsp>`.
+
+    """
+    if disp is not None:
+        iprint = disp
+
+    opts = {'maxiter': iter,
+            'ftol': acc,
+            'iprint': iprint,
+            'disp': iprint != 0,
+            'eps': epsilon,
+            'callback': callback}
+
+    # Build the constraints as a tuple of dictionaries
+    cons = ()
+    # 1. constraints of the 1st kind (eqcons, ieqcons); no Jacobian; take
+    #    the same extra arguments as the objective function.
+    cons += tuple({'type': 'eq', 'fun': c, 'args': args} for c in eqcons)
+    cons += tuple({'type': 'ineq', 'fun': c, 'args': args} for c in ieqcons)
+    # 2. constraints of the 2nd kind (f_eqcons, f_ieqcons) and their Jacobian
+    #    (fprime_eqcons, fprime_ieqcons); also take the same extra arguments
+    #    as the objective function.
+    if f_eqcons:
+        cons += ({'type': 'eq', 'fun': f_eqcons, 'jac': fprime_eqcons,
+                  'args': args}, )
+    if f_ieqcons:
+        cons += ({'type': 'ineq', 'fun': f_ieqcons, 'jac': fprime_ieqcons,
+                  'args': args}, )
+
+    res = _minimize_slsqp(func, x0, args, jac=fprime, bounds=bounds,
+                          constraints=cons, **opts)
+    if full_output:
+        return res['x'], res['fun'], res['nit'], res['status'], res['message']
+    else:
+        return res['x']
+
+
+def _minimize_slsqp(func, x0, args=(), jac=None, bounds=None,
+                    constraints=(),
+                    maxiter=100, ftol=1.0E-6, iprint=1, disp=False,
+                    eps=_epsilon, callback=None, finite_diff_rel_step=None,
+                    **unknown_options):
+    """
+    Minimize a scalar function of one or more variables using Sequential
+    Least Squares Programming (SLSQP).
+
+    Options
+    -------
+    ftol : float
+        Precision goal for the value of f in the stopping criterion.
+    eps : float
+        Step size used for numerical approximation of the Jacobian.
+    disp : bool
+        Set to True to print convergence messages. If False,
+        `iprint` is ignored and set to 0.
+    maxiter : int
+        Maximum number of iterations.
+    finite_diff_rel_step : None or array_like, optional
+        If `jac in ['2-point', '3-point', 'cs']` the relative step size to
+        use for numerical approximation of `jac`. The absolute step
+        size is computed as ``h = rel_step * sign(x) * max(1, abs(x))``,
+        possibly adjusted to fit into the bounds. For ``method='3-point'``
+        the sign of `h` is ignored. If None (default) then step is selected
+        automatically.
+    """
+    _check_unknown_options(unknown_options)
+    iter = maxiter - 1
+    acc = ftol
+    epsilon = eps
+
+    if not disp:
+        iprint = 0
+
+    # Transform x0 into an array.
+    xp = array_namespace(x0)
+    x0 = atleast_nd(x0, ndim=1, xp=xp)
+    dtype = xp.float64
+    if xp.isdtype(x0.dtype, "real floating"):
+        dtype = x0.dtype
+    x = xp.reshape(xp.astype(x0, dtype), -1)
+
+    # SLSQP is sent 'old-style' bounds, 'new-style' bounds are required by
+    # ScalarFunction
+    if bounds is None or len(bounds) == 0:
+        new_bounds = (-np.inf, np.inf)
+    else:
+        new_bounds = old_bound_to_new(bounds)
+
+    # clip the initial guess to bounds, otherwise ScalarFunction doesn't work
+    x = np.clip(x, new_bounds[0], new_bounds[1])
+
+    # Constraints are triaged per type into a dictionary of tuples
+    if isinstance(constraints, dict):
+        constraints = (constraints, )
+
+    cons = {'eq': (), 'ineq': ()}
+    for ic, con in enumerate(constraints):
+        # check type
+        try:
+            ctype = con['type'].lower()
+        except KeyError as e:
+            raise KeyError('Constraint %d has no type defined.' % ic) from e
+        except TypeError as e:
+            raise TypeError('Constraints must be defined using a '
+                            'dictionary.') from e
+        except AttributeError as e:
+            raise TypeError("Constraint's type must be a string.") from e
+        else:
+            if ctype not in ['eq', 'ineq']:
+                raise ValueError("Unknown constraint type '%s'." % con['type'])
+
+        # check function
+        if 'fun' not in con:
+            raise ValueError('Constraint %d has no function defined.' % ic)
+
+        # check Jacobian
+        cjac = con.get('jac')
+        if cjac is None:
+            # approximate Jacobian function. The factory function is needed
+            # to keep a reference to `fun`, see gh-4240.
+            def cjac_factory(fun):
+                def cjac(x, *args):
+                    x = _check_clip_x(x, new_bounds)
+
+                    if jac in ['2-point', '3-point', 'cs']:
+                        return approx_derivative(fun, x, method=jac, args=args,
+                                                 rel_step=finite_diff_rel_step,
+                                                 bounds=new_bounds)
+                    else:
+                        return approx_derivative(fun, x, method='2-point',
+                                                 abs_step=epsilon, args=args,
+                                                 bounds=new_bounds)
+
+                return cjac
+            cjac = cjac_factory(con['fun'])
+
+        # update constraints' dictionary
+        cons[ctype] += ({'fun': con['fun'],
+                         'jac': cjac,
+                         'args': con.get('args', ())}, )
+
+    exit_modes = {-1: "Gradient evaluation required (g & a)",
+                   0: "Optimization terminated successfully",
+                   1: "Function evaluation required (f & c)",
+                   2: "More equality constraints than independent variables",
+                   3: "More than 3*n iterations in LSQ subproblem",
+                   4: "Inequality constraints incompatible",
+                   5: "Singular matrix E in LSQ subproblem",
+                   6: "Singular matrix C in LSQ subproblem",
+                   7: "Rank-deficient equality constraint subproblem HFTI",
+                   8: "Positive directional derivative for linesearch",
+                   9: "Iteration limit reached"}
+
+    # Set the parameters that SLSQP will need
+    # meq, mieq: number of equality and inequality constraints
+    meq = sum(map(len, [atleast_1d(c['fun'](x, *c['args']))
+              for c in cons['eq']]))
+    mieq = sum(map(len, [atleast_1d(c['fun'](x, *c['args']))
+               for c in cons['ineq']]))
+    # m = The total number of constraints
+    m = meq + mieq
+    # la = The number of constraints, or 1 if there are no constraints
+    la = array([1, m]).max()
+    # n = The number of independent variables
+    n = len(x)
+
+    # Define the workspaces for SLSQP
+    n1 = n + 1
+    mineq = m - meq + n1 + n1
+    len_w = (3*n1+m)*(n1+1)+(n1-meq+1)*(mineq+2) + 2*mineq+(n1+mineq)*(n1-meq) \
+            + 2*meq + n1 + ((n+1)*n)//2 + 2*m + 3*n + 3*n1 + 1
+    len_jw = mineq
+    w = zeros(len_w)
+    jw = zeros(len_jw)
+
+    # Decompose bounds into xl and xu
+    if bounds is None or len(bounds) == 0:
+        xl = np.empty(n, dtype=float)
+        xu = np.empty(n, dtype=float)
+        xl.fill(np.nan)
+        xu.fill(np.nan)
+    else:
+        bnds = array([(_arr_to_scalar(l), _arr_to_scalar(u))
+                      for (l, u) in bounds], float)
+        if bnds.shape[0] != n:
+            raise IndexError('SLSQP Error: the length of bounds is not '
+                             'compatible with that of x0.')
+
+        with np.errstate(invalid='ignore'):
+            bnderr = bnds[:, 0] > bnds[:, 1]
+
+        if bnderr.any():
+            raise ValueError('SLSQP Error: lb > ub in bounds %s.' %
+                             ', '.join(str(b) for b in np.nonzero(bnderr)[0]))
+        xl, xu = bnds[:, 0], bnds[:, 1]
+
+        # Mark infinite bounds with nans; the Fortran code understands this
+        infbnd = ~isfinite(bnds)
+        xl[infbnd[:, 0]] = np.nan
+        xu[infbnd[:, 1]] = np.nan
+
+    # ScalarFunction provides function and gradient evaluation
+    sf = _prepare_scalar_function(func, x, jac=jac, args=args, epsilon=eps,
+                                  finite_diff_rel_step=finite_diff_rel_step,
+                                  bounds=new_bounds)
+    # gh11403 SLSQP sometimes exceeds bounds by 1 or 2 ULP, make sure this
+    # doesn't get sent to the func/grad evaluator.
+    wrapped_fun = _clip_x_for_func(sf.fun, new_bounds)
+    wrapped_grad = _clip_x_for_func(sf.grad, new_bounds)
+
+    # Initialize the iteration counter and the mode value
+    mode = array(0, int)
+    acc = array(acc, float)
+    majiter = array(iter, int)
+    majiter_prev = 0
+
+    # Initialize internal SLSQP state variables
+    alpha = array(0, float)
+    f0 = array(0, float)
+    gs = array(0, float)
+    h1 = array(0, float)
+    h2 = array(0, float)
+    h3 = array(0, float)
+    h4 = array(0, float)
+    t = array(0, float)
+    t0 = array(0, float)
+    tol = array(0, float)
+    iexact = array(0, int)
+    incons = array(0, int)
+    ireset = array(0, int)
+    itermx = array(0, int)
+    line = array(0, int)
+    n1 = array(0, int)
+    n2 = array(0, int)
+    n3 = array(0, int)
+
+    # Print the header if iprint >= 2
+    if iprint >= 2:
+        print("%5s %5s %16s %16s" % ("NIT", "FC", "OBJFUN", "GNORM"))
+
+    # mode is zero on entry, so evaluate the objective, constraints and
+    # gradients up front. This costs no extra objective evaluations because
+    # ScalarFunction has already cached them.
+    fx = wrapped_fun(x)
+    g = append(wrapped_grad(x), 0.0)
+    c = _eval_constraint(x, cons)
+    a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
+
+    while 1:
+        # Call SLSQP
+        slsqp(m, meq, x, xl, xu, fx, c, g, a, acc, majiter, mode, w, jw,
+              alpha, f0, gs, h1, h2, h3, h4, t, t0, tol,
+              iexact, incons, ireset, itermx, line,
+              n1, n2, n3)
+
+        if mode == 1:  # objective and constraint evaluation required
+            fx = wrapped_fun(x)
+            c = _eval_constraint(x, cons)
+
+        if mode == -1:  # gradient evaluation required
+            g = append(wrapped_grad(x), 0.0)
+            a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
+
+        if majiter > majiter_prev:
+            # call callback if major iteration has incremented
+            if callback is not None:
+                callback(np.copy(x))
+
+            # Print the status of the current iterate if iprint >= 2
+            if iprint >= 2:
+                print("%5i %5i % 16.6E % 16.6E" % (majiter, sf.nfev,
+                                                   fx, linalg.norm(g)))
+
+        # If exit mode is not -1 or 1, slsqp has completed
+        if abs(mode) != 1:
+            break
+
+        majiter_prev = int(majiter)
+
+    # Optimization loop complete. Print status if requested
+    if iprint >= 1:
+        print(exit_modes[int(mode)] + "    (Exit mode " + str(mode) + ')')
+        print("            Current function value:", fx)
+        print("            Iterations:", majiter)
+        print("            Function evaluations:", sf.nfev)
+        print("            Gradient evaluations:", sf.ngev)
+
+    return OptimizeResult(x=x, fun=fx, jac=g[:-1], nit=int(majiter),
+                          nfev=sf.nfev, njev=sf.ngev, status=int(mode),
+                          message=exit_modes[int(mode)], success=(mode == 0))
+
+
+def _eval_constraint(x, cons):
+    # Compute constraints
+    if cons['eq']:
+        c_eq = concatenate([atleast_1d(con['fun'](x, *con['args']))
+                            for con in cons['eq']])
+    else:
+        c_eq = zeros(0)
+
+    if cons['ineq']:
+        c_ieq = concatenate([atleast_1d(con['fun'](x, *con['args']))
+                             for con in cons['ineq']])
+    else:
+        c_ieq = zeros(0)
+
+    # Now combine c_eq and c_ieq into a single 1-D vector
+    c = concatenate((c_eq, c_ieq))
+    return c
+
+
+def _eval_con_normals(x, cons, la, n, m, meq, mieq):
+    # Compute the normals of the constraints
+    if cons['eq']:
+        a_eq = vstack([con['jac'](x, *con['args'])
+                       for con in cons['eq']])
+    else:  # no equality constraint
+        a_eq = zeros((meq, n))
+
+    if cons['ineq']:
+        a_ieq = vstack([con['jac'](x, *con['args'])
+                        for con in cons['ineq']])
+    else:  # no inequality constraint
+        a_ieq = zeros((mieq, n))
+
+    # Now combine a_eq and a_ieq into a single a matrix
+    if m == 0:  # no constraints
+        a = zeros((la, n))
+    else:
+        a = vstack((a_eq, a_ieq))
+    a = concatenate((a, zeros([la, 1])), 1)
+
+    return a
@@ -0,0 +1,260 @@
+"""
+Spectral Algorithm for Nonlinear Equations
+"""
+import collections
+
+import numpy as np
+from scipy.optimize import OptimizeResult
+from scipy.optimize._optimize import _check_unknown_options
+from ._linesearch import _nonmonotone_line_search_cruz, _nonmonotone_line_search_cheng
+
+class _NoConvergence(Exception):
+    pass
+
+
+def _root_df_sane(func, x0, args=(), ftol=1e-8, fatol=1e-300, maxfev=1000,
+                  fnorm=None, callback=None, disp=False, M=10, eta_strategy=None,
+                  sigma_eps=1e-10, sigma_0=1.0, line_search='cruz', **unknown_options):
+    r"""
+    Solve nonlinear equation with the DF-SANE method
+
+    Options
+    -------
+    ftol : float, optional
+        Relative norm tolerance.
+    fatol : float, optional
+        Absolute norm tolerance.
+        Algorithm terminates when ``||func(x)|| < fatol + ftol ||func(x_0)||``.
+    fnorm : callable, optional
+        Norm to use in the convergence check. If None, 2-norm is used.
+    maxfev : int, optional
+        Maximum number of function evaluations.
+    disp : bool, optional
+        Whether to print convergence process to stdout.
+    eta_strategy : callable, optional
+        Choice of the ``eta_k`` parameter, which gives slack for growth
+        of ``||F||**2``.  Called as ``eta_k = eta_strategy(k, x, F)`` with
+        `k` the iteration number, `x` the current iterate and `F` the current
+        residual. Should satisfy ``eta_k > 0`` and ``sum(eta, k=0..inf) < inf``.
+        Default: ``||F||**2 / (1 + k)**2``.
+    sigma_eps : float, optional
+        The spectral coefficient is constrained to ``sigma_eps < sigma < 1/sigma_eps``.
+        Default: 1e-10
+    sigma_0 : float, optional
+        Initial spectral coefficient.
+        Default: 1.0
+    M : int, optional
+        Number of iterates to include in the nonmonotonic line search.
+        Default: 10
+    line_search : {'cruz', 'cheng'}
+        Type of line search to employ. 'cruz' is the original one defined in
+        [Martinez & Raydan. Math. Comp. 75, 1429 (2006)], 'cheng' is
+        a modified search defined in [Cheng & Li. IMA J. Numer. Anal. 29, 814 (2009)].
+        Default: 'cruz'
+
+    References
+    ----------
+    .. [1] "Spectral residual method without gradient information for solving
+           large-scale nonlinear systems of equations." W. La Cruz,
+           J.M. Martinez, M. Raydan. Math. Comp. **75**, 1429 (2006).
+    .. [2] W. La Cruz, Opt. Meth. Software, 29, 24 (2014).
+    .. [3] W. Cheng, D.-H. Li. IMA J. Numer. Anal. **29**, 814 (2009).
+
+    """
+    _check_unknown_options(unknown_options)
+
+    if line_search not in ('cheng', 'cruz'):
+        raise ValueError(f"Invalid value {line_search!r} for 'line_search'")
+
+    nexp = 2
+
+    if eta_strategy is None:
+        # Different choice from [1], as their eta is not invariant
+        # vs. scaling of F.
+        def eta_strategy(k, x, F):
+            # Obtain squared 2-norm of the initial residual from the outer scope
+            return f_0 / (1 + k)**2
+
+    if fnorm is None:
+        def fnorm(F):
+            # Obtain squared 2-norm of the current residual from the outer scope
+            return f_k**(1.0/nexp)
+
+    def fmerit(F):
+        return np.linalg.norm(F)**nexp
+
+    nfev = [0]
+    f, x_k, x_shape, f_k, F_k, is_complex = _wrap_func(func, x0, fmerit,
+                                                       nfev, maxfev, args)
+
+    k = 0
+    f_0 = f_k
+    sigma_k = sigma_0
+
+    F_0_norm = fnorm(F_k)
+
+    # For the 'cruz' line search
+    prev_fs = collections.deque([f_k], M)
+
+    # For the 'cheng' line search
+    Q = 1.0
+    C = f_0
+
+    converged = False
+    message = "too many function evaluations required"
+
+    while True:
+        F_k_norm = fnorm(F_k)
+
+        if disp:
+            print("iter %d: ||F|| = %g, sigma = %g" % (k, F_k_norm, sigma_k))
+
+        if callback is not None:
+            callback(x_k, F_k)
+
+        if F_k_norm < ftol * F_0_norm + fatol:
+            # Converged!
+            message = "successful convergence"
+            converged = True
+            break
+
+        # Control spectral parameter, from [2]
+        if abs(sigma_k) > 1/sigma_eps:
+            sigma_k = 1/sigma_eps * np.sign(sigma_k)
+        elif abs(sigma_k) < sigma_eps:
+            sigma_k = sigma_eps
+
+        # Line search direction
+        d = -sigma_k * F_k
+
+        # Nonmonotone line search
+        eta = eta_strategy(k, x_k, F_k)
+        try:
+            if line_search == 'cruz':
+                alpha, xp, fp, Fp = _nonmonotone_line_search_cruz(f, x_k, d, prev_fs,
+                                                                  eta=eta)
+            elif line_search == 'cheng':
+                alpha, xp, fp, Fp, C, Q = _nonmonotone_line_search_cheng(f, x_k, d, f_k,
+                                                                         C, Q, eta=eta)
+        except _NoConvergence:
+            break
+
+        # Update spectral parameter
+        s_k = xp - x_k
+        y_k = Fp - F_k
+        sigma_k = np.vdot(s_k, s_k) / np.vdot(s_k, y_k)
+
+        # Take step
+        x_k = xp
+        F_k = Fp
+        f_k = fp
+
+        # Store function value
+        if line_search == 'cruz':
+            prev_fs.append(fp)
+
+        k += 1
+
+    x = _wrap_result(x_k, is_complex, shape=x_shape)
+    F = _wrap_result(F_k, is_complex)
+
+    result = OptimizeResult(x=x, success=converged,
+                            message=message,
+                            fun=F, nfev=nfev[0], nit=k, method="df-sane")
+
+    return result
+
+
+def _wrap_func(func, x0, fmerit, nfev_list, maxfev, args=()):
+    """
+    Wrap a function and an initial value so that (i) complex values
+    are mapped to reals, (ii) the value of the merit function
+    fmerit(F) is computed at the same time, and (iii) an evaluation count
+    is maintained and _NoConvergence is raised once it reaches maxfev.
+
+    Parameters
+    ----------
+    func : callable
+        Function to wrap
+    x0 : ndarray
+        Initial value
+    fmerit : callable
+        Merit function fmerit(f) for computing merit value from residual.
+    nfev_list : list
+        List to store number of evaluations in. Should be [0] in the beginning.
+    maxfev : int
+        Maximum number of evaluations before _NoConvergence is raised.
+    args : tuple
+        Extra arguments to func
+
+    Returns
+    -------
+    wrap_func : callable
+        Wrapped function, to be called as
+        ``f, F = wrap_func(x)``
+    x0_wrap : ndarray of float
+        Wrapped initial value; raveled to 1-D and complex
+        values mapped to reals.
+    x0_shape : tuple
+        Shape of the initial value array
+    f : float
+        Merit function at F
+    F : ndarray of float
+        Residual at x0_wrap
+    is_complex : bool
+        Whether complex values were mapped to reals
+
+    """
+    x0 = np.asarray(x0)
+    x0_shape = x0.shape
+    F = np.asarray(func(x0, *args)).ravel()
+    is_complex = np.iscomplexobj(x0) or np.iscomplexobj(F)
+    x0 = x0.ravel()
+
+    nfev_list[0] = 1
+
+    if is_complex:
+        def wrap_func(x):
+            if nfev_list[0] >= maxfev:
+                raise _NoConvergence()
+            nfev_list[0] += 1
+            z = _real2complex(x).reshape(x0_shape)
+            v = np.asarray(func(z, *args)).ravel()
+            F = _complex2real(v)
+            f = fmerit(F)
+            return f, F
+
+        x0 = _complex2real(x0)
+        F = _complex2real(F)
+    else:
+        def wrap_func(x):
+            if nfev_list[0] >= maxfev:
+                raise _NoConvergence()
+            nfev_list[0] += 1
+            x = x.reshape(x0_shape)
+            F = np.asarray(func(x, *args)).ravel()
+            f = fmerit(F)
+            return f, F
+
+    return wrap_func, x0, x0_shape, fmerit(F), F, is_complex
+
+
+def _wrap_result(result, is_complex, shape=None):
+    """
+    Convert from real to complex and reshape result arrays.
+    """
+    if is_complex:
+        z = _real2complex(result)
+    else:
+        z = result
+    if shape is not None:
+        z = z.reshape(shape)
+    return z
+
+
+def _real2complex(x):
+    return np.ascontiguousarray(x, dtype=float).view(np.complex128)
+
+
+def _complex2real(z):
+    return np.ascontiguousarray(z, dtype=complex).view(np.float64)
@@ -0,0 +1,430 @@
+# TNC Python interface
+# @(#) $Jeannot: tnc.py,v 1.11 2005/01/28 18:27:31 js Exp $
+
+# Copyright (c) 2004-2005, Jean-Sebastien Roy (js@jeannot.org)
+
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the
+# "Software"), to deal in the Software without restriction, including
+# without limitation the rights to use, copy, modify, merge, publish,
+# distribute, sublicense, and/or sell copies of the Software, and to
+# permit persons to whom the Software is furnished to do so, subject to
+# the following conditions:
+
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+# IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+# CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+"""
+TNC: A Python interface to the TNC non-linear optimizer
+
+TNC is a non-linear optimizer. To use it, you must provide a function to
+minimize. The function must take one argument, the list of coordinates at
+which to evaluate the function, and must return either a tuple whose first
+element is the value of the function and whose second element is its
+gradient (as a list of values), or None to abort the minimization.
+"""
+
+from scipy.optimize import _moduleTNC as moduleTNC
+from ._optimize import (MemoizeJac, OptimizeResult, _check_unknown_options,
+                       _prepare_scalar_function)
+from ._constraints import old_bound_to_new
+from scipy._lib._array_api import atleast_nd, array_namespace
+
+from numpy import inf, array, zeros
+
+__all__ = ['fmin_tnc']
+
+
+MSG_NONE = 0  # No messages
+MSG_ITER = 1  # One line per iteration
+MSG_INFO = 2  # Informational messages
+MSG_VERS = 4  # Version info
+MSG_EXIT = 8  # Exit reasons
+MSG_ALL = MSG_ITER + MSG_INFO + MSG_VERS + MSG_EXIT
+
+MSGS = {
+        MSG_NONE: "No messages",
+        MSG_ITER: "One line per iteration",
+        MSG_INFO: "Informational messages",
+        MSG_VERS: "Version info",
+        MSG_EXIT: "Exit reasons",
+        MSG_ALL: "All messages"
+}
+
+INFEASIBLE = -1  # Infeasible (lower bound > upper bound)
+LOCALMINIMUM = 0  # Local minimum reached (|pg| ~= 0)
+FCONVERGED = 1  # Converged (|f_n-f_(n-1)| ~= 0)
+XCONVERGED = 2  # Converged (|x_n-x_(n-1)| ~= 0)
+MAXFUN = 3  # Max. number of function evaluations reached
+LSFAIL = 4  # Linear search failed
+CONSTANT = 5  # All lower bounds are equal to the upper bounds
+NOPROGRESS = 6  # Unable to progress
+USERABORT = 7  # User requested end of minimization
+
+RCSTRINGS = {
+        INFEASIBLE: "Infeasible (lower bound > upper bound)",
+        LOCALMINIMUM: "Local minimum reached (|pg| ~= 0)",
+        FCONVERGED: "Converged (|f_n-f_(n-1)| ~= 0)",
+        XCONVERGED: "Converged (|x_n-x_(n-1)| ~= 0)",
+        MAXFUN: "Max. number of function evaluations reached",
+        LSFAIL: "Linear search failed",
+        CONSTANT: "All lower bounds are equal to the upper bounds",
+        NOPROGRESS: "Unable to progress",
+        USERABORT: "User requested end of minimization"
+}
+
+# Changes to interface made by Travis Oliphant, Apr. 2004 for inclusion in
+#  SciPy
+
+
+def fmin_tnc(func, x0, fprime=None, args=(), approx_grad=0,
+             bounds=None, epsilon=1e-8, scale=None, offset=None,
+             messages=MSG_ALL, maxCGit=-1, maxfun=None, eta=-1,
+             stepmx=0, accuracy=0, fmin=0, ftol=-1, xtol=-1, pgtol=-1,
+             rescale=-1, disp=None, callback=None):
+    """
+    Minimize a function with variables subject to bounds, using
+    gradient information in a truncated Newton algorithm. This
+    method wraps a C implementation of the algorithm.
+
+    Parameters
+    ----------
+    func : callable ``func(x, *args)``
+        Function to minimize.  Must do one of:
+
+        1. Return f and g, where f is the value of the function and g its
+           gradient (a list of floats).
+
+        2. Return the function value but supply gradient function
+           separately as `fprime`.
+
+        3. Return the function value and set ``approx_grad=True``.
+
+        If the function returns None, the minimization
+        is aborted.
+    x0 : array_like
+        Initial estimate of minimum.
+    fprime : callable ``fprime(x, *args)``, optional
+        Gradient of `func`. If None, then either `func` must return the
+        function value and the gradient (``f,g = func(x, *args)``)
+        or `approx_grad` must be True.
+    args : tuple, optional
+        Arguments to pass to function.
+    approx_grad : bool, optional
+        If true, approximate the gradient numerically.
+    bounds : list, optional
+        (min, max) pairs for each element in x0, defining the
+        bounds on that parameter. Use None or +/-inf for one of
+        min or max when there is no bound in that direction.
+    epsilon : float, optional
+        Used if approx_grad is True. The stepsize in a finite
+        difference approximation for fprime.
+    scale : array_like, optional
+        Scaling factors to apply to each variable. If None, the
+        factors are up-low for interval bounded variables and
+        1+|x| for the others. Defaults to None.
+    offset : array_like, optional
+        Value to subtract from each variable. If None, the
+        offsets are (up+low)/2 for interval bounded variables
+        and x for the others.
+    messages : int, optional
+        Bit mask used to select the messages displayed during
+        minimization; values are defined in the MSGS dict. Defaults to
+        MSG_ALL.
+    disp : int, optional
+        Integer interface to messages. 0 = no message, 5 = all messages
+    maxCGit : int, optional
+        Maximum number of hessian*vector evaluations per main
+        iteration. If maxCGit == 0, the direction chosen is
+        -gradient. If maxCGit < 0, maxCGit is set to
+        max(1,min(50,n/2)). Defaults to -1.
+    maxfun : int, optional
+        Maximum number of function evaluations. If None, maxfun is
+        set to max(100, 10*len(x0)). Defaults to None. Note that this function
+        may violate the limit because of evaluating gradients by numerical
+        differentiation.
+    eta : float, optional
+        Severity of the line search. If < 0 or > 1, set to 0.25.
+        Defaults to -1.
+    stepmx : float, optional
+        Maximum step for the line search. May be increased during
+        call. If too small, it will be set to 10.0. Defaults to 0.
+    accuracy : float, optional
+        Relative precision for finite difference calculations. If
+        <= machine_precision, set to sqrt(machine_precision).
+        Defaults to 0.
+    fmin : float, optional
+        Minimum function value estimate. Defaults to 0.
+    ftol : float, optional
+        Precision goal for the value of f in the stopping criterion.
+        If ftol < 0.0, ftol is set to 0.0. Defaults to -1.
+    xtol : float, optional
+        Precision goal for the value of x in the stopping
+        criterion (after applying x scaling factors). If xtol <
+        0.0, xtol is set to sqrt(machine_precision). Defaults to
+        -1.
+    pgtol : float, optional
+        Precision goal for the value of the projected gradient in
+        the stopping criterion (after applying x scaling factors).
+        If pgtol < 0.0, pgtol is set to 1e-2 * sqrt(accuracy).
+        Setting it to 0.0 is not recommended. Defaults to -1.
+    rescale : float, optional
+        Scaling factor (in log10) used to trigger f value
+        rescaling. If 0, rescale at each iteration. If a large
+        value, never rescale. If < 0, rescale is set to 1.3.
+    callback : callable, optional
+        Called after each iteration, as callback(xk), where xk is the
+        current parameter vector.
+
+    Returns
+    -------
+    x : ndarray
+        The solution.
+    nfeval : int
+        The number of function evaluations.
+    rc : int
+        Return code; see the table of return codes in the Notes.
+
+    See also
+    --------
+    minimize: Interface to minimization algorithms for multivariate
+        functions. See the 'TNC' `method` in particular.
+
+    Notes
+    -----
+    The underlying algorithm is truncated Newton, also called
+    Newton Conjugate-Gradient. This method differs from
+    scipy.optimize.fmin_ncg in that
+
+    1. it wraps a C implementation of the algorithm
+    2. it allows each variable to be given an upper and lower bound.
+
+    The algorithm incorporates the bound constraints by determining
+    the descent direction as in an unconstrained truncated Newton,
+    but never taking a step-size large enough to leave the space
+    of feasible x's. The algorithm keeps track of a set of
+    currently active constraints, and ignores them when computing
+    the minimum allowable step size. (The x's associated with the
+    active constraint are kept fixed.) If the maximum allowable
+    step size is zero then a new constraint is added. At the end
+    of each iteration one of the constraints may be deemed no
+    longer active and removed. A constraint is considered
+    no longer active if it is currently active
+    but the gradient for that variable points inward from the
+    constraint. The specific constraint removed is the one
+    associated with the variable of largest index whose
+    constraint is no longer active.
+
+    Return codes are defined as follows::
+
+        -1 : Infeasible (lower bound > upper bound)
+         0 : Local minimum reached (|pg| ~= 0)
+         1 : Converged (|f_n-f_(n-1)| ~= 0)
+         2 : Converged (|x_n-x_(n-1)| ~= 0)
+         3 : Max. number of function evaluations reached
+         4 : Linear search failed
+         5 : All lower bounds are equal to the upper bounds
+         6 : Unable to progress
+         7 : User requested end of minimization
+
+    References
+    ----------
+    Wright S., Nocedal J. (2006), 'Numerical Optimization'
+
+    Nash S.G. (1984), "Newton-Type Minimization Via the Lanczos Method",
+    SIAM Journal on Numerical Analysis 21, pp. 770-778
+
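+    Examples
+    --------
+    A minimal sketch (the quadratic objective and the bounds are
+    illustrative assumptions, not taken from the TNC sources). Here `func`
+    returns both the value and the gradient, so `fprime` is omitted:
+
+    >>> import numpy as np
+    >>> from scipy.optimize import fmin_tnc
+    >>> def f_and_g(x):
+    ...     return np.sum((x - 3.0)**2), 2.0 * (x - 3.0)
+    >>> x, nfeval, rc = fmin_tnc(f_and_g, np.array([0.0, 0.0]),
+    ...                          bounds=[(0, 2), (0, 2)], messages=0)
+
+    The unconstrained minimum at ``(3, 3)`` lies outside the box, so the
+    returned `x` is expected to sit on the upper bounds; `rc` can be
+    looked up in the table of return codes in the Notes.
+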
+    """
+    # handle fprime/approx_grad
+    if approx_grad:
+        fun = func
+        jac = None
+    elif fprime is None:
+        fun = MemoizeJac(func)
+        jac = fun.derivative
+    else:
+        fun = func
+        jac = fprime
+
+    if disp is not None:  # disp takes precedence over messages
+        mesg_num = disp
+    else:
+        mesg_num = {0:MSG_NONE, 1:MSG_ITER, 2:MSG_INFO, 3:MSG_VERS,
+                    4:MSG_EXIT, 5:MSG_ALL}.get(messages, MSG_ALL)
+    # build options
+    opts = {'eps': epsilon,
+            'scale': scale,
+            'offset': offset,
+            'mesg_num': mesg_num,
+            'maxCGit': maxCGit,
+            'maxfun': maxfun,
+            'eta': eta,
+            'stepmx': stepmx,
+            'accuracy': accuracy,
+            'minfev': fmin,
+            'ftol': ftol,
+            'xtol': xtol,
+            'gtol': pgtol,
+            'rescale': rescale,
+            'disp': False}
+
+    res = _minimize_tnc(fun, x0, args, jac, bounds, callback=callback, **opts)
+
+    return res['x'], res['nfev'], res['status']
+
+
+def _minimize_tnc(fun, x0, args=(), jac=None, bounds=None,
+                  eps=1e-8, scale=None, offset=None, mesg_num=None,
+                  maxCGit=-1, eta=-1, stepmx=0, accuracy=0,
+                  minfev=0, ftol=-1, xtol=-1, gtol=-1, rescale=-1, disp=False,
+                  callback=None, finite_diff_rel_step=None, maxfun=None,
+                  **unknown_options):
+    """
+    Minimize a scalar function of one or more variables using a truncated
+    Newton (TNC) algorithm.
+
+    Options
+    -------
+    eps : float or ndarray
+        If `jac is None` the absolute step size used for numerical
+        approximation of the jacobian via forward differences.
+    scale : list of floats
+        Scaling factors to apply to each variable. If None, the
+        factors are up-low for interval bounded variables and
+        1+|x| for the others. Defaults to None.
+    offset : float
+        Value to subtract from each variable. If None, the
+        offsets are (up+low)/2 for interval bounded variables
+        and x for the others.
+    disp : bool
+       Set to True to print convergence messages.
+    maxCGit : int
+        Maximum number of hessian*vector evaluations per main
+        iteration. If maxCGit == 0, the direction chosen is
+        -gradient. If maxCGit < 0, maxCGit is set to
+        max(1,min(50,n/2)). Defaults to -1.
+    eta : float
+        Severity of the line search. If < 0 or > 1, set to 0.25.
+        Defaults to -1.
+    stepmx : float
+        Maximum step for the line search. May be increased during
+        call. If too small, it will be set to 10.0. Defaults to 0.
+    accuracy : float
+        Relative precision for finite difference calculations. If
+        <= machine_precision, set to sqrt(machine_precision).
+        Defaults to 0.
+    minfev : float
+        Minimum function value estimate. Defaults to 0.
+    ftol : float
+        Precision goal for the value of f in the stopping criterion.
+        If ftol < 0.0, ftol is set to 0.0. Defaults to -1.
+    xtol : float
+        Precision goal for the value of x in the stopping
+        criterion (after applying x scaling factors). If xtol <
+        0.0, xtol is set to sqrt(machine_precision). Defaults to
+        -1.
+    gtol : float
+        Precision goal for the value of the projected gradient in
+        the stopping criterion (after applying x scaling factors).
+        If gtol < 0.0, gtol is set to 1e-2 * sqrt(accuracy).
+        Setting it to 0.0 is not recommended. Defaults to -1.
+    rescale : float
+        Scaling factor (in log10) used to trigger f value
+        rescaling.  If 0, rescale at each iteration.  If a large
+        value, never rescale.  If < 0, rescale is set to 1.3.
+    finite_diff_rel_step : None or array_like, optional
+        If `jac in ['2-point', '3-point', 'cs']` the relative step size to
+        use for numerical approximation of the jacobian. The absolute step
+        size is computed as ``h = rel_step * sign(x) * max(1, abs(x))``,
+        possibly adjusted to fit into the bounds. For ``method='3-point'``
+        the sign of `h` is ignored. If None (default) then step is selected
+        automatically.
+    maxfun : int
+        Maximum number of function evaluations. If None, `maxfun` is
+        set to max(100, 10*len(x0)). Defaults to None.
+    """
+    _check_unknown_options(unknown_options)
+    fmin = minfev
+    pgtol = gtol
+
+    xp = array_namespace(x0)
+    x0 = atleast_nd(x0, ndim=1, xp=xp)
+    dtype = xp.float64
+    if xp.isdtype(x0.dtype, "real floating"):
+        dtype = x0.dtype
+    x0 = xp.reshape(xp.astype(x0, dtype), -1)
+
+    n = len(x0)
+
+    if bounds is None:
+        bounds = [(None,None)] * n
+    if len(bounds) != n:
+        raise ValueError('length of x0 != length of bounds')
+    new_bounds = old_bound_to_new(bounds)
+
+    if mesg_num is not None:
+        messages = {0:MSG_NONE, 1:MSG_ITER, 2:MSG_INFO, 3:MSG_VERS,
+                    4:MSG_EXIT, 5:MSG_ALL}.get(mesg_num, MSG_ALL)
+    elif disp:
+        messages = MSG_ALL
+    else:
+        messages = MSG_NONE
+
+    sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
+                                  finite_diff_rel_step=finite_diff_rel_step,
+                                  bounds=new_bounds)
+    func_and_grad = sf.fun_and_grad
+
+    """
+    low, up   : the bounds (lists of floats)
+                if low is None, the lower bounds are removed.
+                if up is None, the upper bounds are removed.
+                low and up defaults to None
+    """
+    low = zeros(n)
+    up = zeros(n)
+    for i in range(n):
+        if bounds[i] is None:
+            low[i], up[i] = -inf, inf
+        else:
+            l,u = bounds[i]
+            if l is None:
+                low[i] = -inf
+            else:
+                low[i] = l
+            if u is None:
+                up[i] = inf
+            else:
+                up[i] = u
+
+    if scale is None:
+        scale = array([])
+
+    if offset is None:
+        offset = array([])
+
+    if maxfun is None:
+        maxfun = max(100, 10*len(x0))
+
+    rc, nf, nit, x, funv, jacv = moduleTNC.tnc_minimize(
+        func_and_grad, x0, low, up, scale,
+        offset, messages, maxCGit, maxfun,
+        eta, stepmx, accuracy, fmin, ftol,
+        xtol, pgtol, rescale, callback
+    )
+    # the TNC documentation states: "On output, x, f and g may be very
+    # slightly out of sync because of scaling". Therefore re-evaluate
+    # func_and_grad so they are synced.
+    funv, jacv = func_and_grad(x)
+
+    return OptimizeResult(x=x, fun=funv, jac=jacv, nfev=sf.nfev,
+                          nit=nit, status=rc, message=RCSTRINGS[rc],
+                          success=(-1 < rc < 3))
@@ -0,0 +1,12 @@
+from ._trlib import TRLIBQuadraticSubproblem
+
+__all__ = ['TRLIBQuadraticSubproblem', 'get_trlib_quadratic_subproblem']
+
+
+def get_trlib_quadratic_subproblem(tol_rel_i=-2.0, tol_rel_b=-3.0, disp=False):
+    def subproblem_factory(x, fun, jac, hess, hessp):
+        return TRLIBQuadraticSubproblem(x, fun, jac, hess, hessp,
+                                        tol_rel_i=tol_rel_i,
+                                        tol_rel_b=tol_rel_b,
+                                        disp=disp)
+    return subproblem_factory
@@ -0,0 +1,304 @@
+"""Trust-region optimization."""
+import math
+import warnings
+
+import numpy as np
+import scipy.linalg
+from ._optimize import (_check_unknown_options, _status_message,
+                        OptimizeResult, _prepare_scalar_function,
+                        _call_callback_maybe_halt)
+from scipy.optimize._hessian_update_strategy import HessianUpdateStrategy
+from scipy.optimize._differentiable_functions import FD_METHODS
+__all__ = []
+
+
+def _wrap_function(function, args):
+    # wraps a minimizer function to count number of evaluations
+    # and to easily provide an args kwd.
+    ncalls = [0]
+    if function is None:
+        return ncalls, None
+
+    def function_wrapper(x, *wrapper_args):
+        ncalls[0] += 1
+        # A copy of x is sent to the user function (gh13740)
+        return function(np.copy(x), *(wrapper_args + args))
+
+    return ncalls, function_wrapper
+
+
+class BaseQuadraticSubproblem:
+    """
+    Base/abstract class defining the quadratic model for trust-region
+    minimization. Child classes must implement the ``solve`` method.
+
+    Values of the objective function, Jacobian and Hessian (if provided) at
+    the current iterate ``x`` are evaluated on demand and then stored as
+    attributes ``fun``, ``jac``, ``hess``.
+    """
+
+    def __init__(self, x, fun, jac, hess=None, hessp=None):
+        self._x = x
+        self._f = None
+        self._g = None
+        self._h = None
+        self._g_mag = None
+        self._cauchy_point = None
+        self._newton_point = None
+        self._fun = fun
+        self._jac = jac
+        self._hess = hess
+        self._hessp = hessp
+
+    def __call__(self, p):
+        return self.fun + np.dot(self.jac, p) + 0.5 * np.dot(p, self.hessp(p))
+
+    @property
+    def fun(self):
+        """Value of objective function at current iteration."""
+        if self._f is None:
+            self._f = self._fun(self._x)
+        return self._f
+
+    @property
+    def jac(self):
+        """Value of Jacobian of objective function at current iteration."""
+        if self._g is None:
+            self._g = self._jac(self._x)
+        return self._g
+
+    @property
+    def hess(self):
+        """Value of Hessian of objective function at current iteration."""
+        if self._h is None:
+            self._h = self._hess(self._x)
+        return self._h
+
+    def hessp(self, p):
+        if self._hessp is not None:
+            return self._hessp(self._x, p)
+        else:
+            return np.dot(self.hess, p)
+
+    @property
+    def jac_mag(self):
+        """Magnitude of jacobian of objective function at current iteration."""
+        if self._g_mag is None:
+            self._g_mag = scipy.linalg.norm(self.jac)
+        return self._g_mag
+
+    def get_boundaries_intersections(self, z, d, trust_radius):
+        """
+        Solve the scalar quadratic equation ``||z + t d|| == trust_radius``.
+        This is like a line-sphere intersection.
+        Return the two values of t, sorted from low to high.
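+
+        For instance (values chosen here purely for illustration), with
+        ``z = [0, 0]``, ``d = [1, 0]`` and ``trust_radius = 2`` the
+        equation reduces to ``|t| == 2``, so the roots are ``-2`` and ``2``.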
+        """
+        a = np.dot(d, d)
+        b = 2 * np.dot(z, d)
+        c = np.dot(z, z) - trust_radius**2
+        sqrt_discriminant = math.sqrt(b*b - 4*a*c)
+
+        # The following calculation is mathematically
+        # equivalent to:
+        # ta = (-b - sqrt_discriminant) / (2*a)
+        # tb = (-b + sqrt_discriminant) / (2*a)
+        # but produces smaller round-off errors.
+        # See Matrix Computations, p. 97,
+        # for a fuller justification.
+        aux = b + math.copysign(sqrt_discriminant, b)
+        ta = -aux / (2*a)
+        tb = -2*c / aux
+        return sorted([ta, tb])
+
+    def solve(self, trust_radius):
+        raise NotImplementedError('The solve method should be implemented by '
+                                  'the child class')
+
+
+def _minimize_trust_region(fun, x0, args=(), jac=None, hess=None, hessp=None,
+                           subproblem=None, initial_trust_radius=1.0,
+                           max_trust_radius=1000.0, eta=0.15, gtol=1e-4,
+                           maxiter=None, disp=False, return_all=False,
+                           callback=None, inexact=True, **unknown_options):
+    """
+    Minimization of scalar function of one or more variables using a
+    trust-region algorithm.
+
+    Options for the trust-region algorithm are:
+        initial_trust_radius : float
+            Initial trust radius.
+        max_trust_radius : float
+            Never propose steps that are longer than this value.
+        eta : float
+            Trust region related acceptance stringency for proposed steps.
+        gtol : float
+            Gradient norm must be less than `gtol`
+            before successful termination.
+        maxiter : int
+            Maximum number of iterations to perform.
+        disp : bool
+            If True, print convergence message.
+        inexact : bool
+            Accuracy to solve subproblems. If True, requires fewer nonlinear
+            iterations but more vector products. Only effective for method
+            trust-krylov.
+
+    This function is called by the `minimize` function.
+    It is not supposed to be called directly.
+    """
+    _check_unknown_options(unknown_options)
+
+    if jac is None:
+        raise ValueError('Jacobian is currently required for trust-region '
+                         'methods')
+    if hess is None and hessp is None:
+        raise ValueError('Either the Hessian or the Hessian-vector product '
+                         'is currently required for trust-region methods')
+    if subproblem is None:
+        raise ValueError('A subproblem solving strategy is required for '
+                         'trust-region methods')
+    if not (0 <= eta < 0.25):
+        raise Exception('invalid acceptance stringency')
+    if max_trust_radius <= 0:
+        raise Exception('the max trust radius must be positive')
+    if initial_trust_radius <= 0:
+        raise ValueError('the initial trust radius must be positive')
+    if initial_trust_radius >= max_trust_radius:
+        raise ValueError('the initial trust radius must be less than the '
+                         'max trust radius')
+
+    # force the initial guess into a nice format
+    x0 = np.asarray(x0).flatten()
+
+    # A ScalarFunction representing the problem. This caches calls to fun, jac,
+    # hess.
+    sf = _prepare_scalar_function(fun, x0, jac=jac, hess=hess, args=args)
+    fun = sf.fun
+    jac = sf.grad
+    if callable(hess):
+        hess = sf.hess
+    elif callable(hessp):
+        # this elif statement must come before examining whether hess
+        # is estimated by FD methods or a HessianUpdateStrategy
+        pass
+    elif (hess in FD_METHODS or isinstance(hess, HessianUpdateStrategy)):
+        # If the Hessian is being estimated by finite differences or a
+        # Hessian update strategy then ScalarFunction.hess returns a
+        # LinearOperator or a HessianUpdateStrategy. This enables the
+        # calculation/creation of a hessp. BUT you only want to do this
+        # if the user *hasn't* provided a callable(hessp) function.
+        hess = None
+
+        def hessp(x, p, *args):
+            return sf.hess(x).dot(p)
+    else:
+        raise ValueError('Either the Hessian or the Hessian-vector product '
+                         'is currently required for trust-region methods')
+
+    # ScalarFunction doesn't represent hessp
+    nhessp, hessp = _wrap_function(hessp, args)
+
+    # limit the number of iterations
+    if maxiter is None:
+        maxiter = len(x0)*200
+
+    # init the search status
+    warnflag = 0
+
+    # initialize the search
+    trust_radius = initial_trust_radius
+    x = x0
+    if return_all:
+        allvecs = [x]
+    m = subproblem(x, fun, jac, hess, hessp)
+    k = 0
+
+    # search for the function min
+    # do not even start if the gradient is small enough
+    while m.jac_mag >= gtol:
+
+        # Solve the sub-problem.
+        # This gives us the proposed step relative to the current position
+        # and it tells us whether the proposed step
+        # has reached the trust region boundary or not.
+        try:
+            p, hits_boundary = m.solve(trust_radius)
+        except np.linalg.LinAlgError:
+            warnflag = 3
+            break
+
+        # calculate the predicted value at the proposed point
+        predicted_value = m(p)
+
+        # define the local approximation at the proposed point
+        x_proposed = x + p
+        m_proposed = subproblem(x_proposed, fun, jac, hess, hessp)
+
+        # evaluate the ratio defined in equation (4.4)
+        actual_reduction = m.fun - m_proposed.fun
+        predicted_reduction = m.fun - predicted_value
+        if predicted_reduction <= 0:
+            warnflag = 2
+            break
+        rho = actual_reduction / predicted_reduction
+
+        # update the trust radius according to the actual/predicted ratio
+        if rho < 0.25:
+            trust_radius *= 0.25
+        elif rho > 0.75 and hits_boundary:
+            trust_radius = min(2*trust_radius, max_trust_radius)
+
+        # if the ratio is high enough then accept the proposed step
+        if rho > eta:
+            x = x_proposed
+            m = m_proposed
+
+        # append the best guess, call back, increment the iteration count
+        if return_all:
+            allvecs.append(np.copy(x))
+        k += 1
+
+        intermediate_result = OptimizeResult(x=x, fun=m.fun)
+        if _call_callback_maybe_halt(callback, intermediate_result):
+            break
+
+        # check if the gradient is small enough to stop
+        if m.jac_mag < gtol:
+            warnflag = 0
+            break
+
+        # check if we have looked at enough iterations
+        if k >= maxiter:
+            warnflag = 1
+            break
+
+    # print some stuff if requested
+    status_messages = (
+            _status_message['success'],
+            _status_message['maxiter'],
+            'A bad approximation caused failure to predict improvement.',
+            'A linalg error occurred, such as a non-psd Hessian.',
+            )
+    if disp:
+        if warnflag == 0:
+            print(status_messages[warnflag])
+        else:
+            warnings.warn(status_messages[warnflag], RuntimeWarning, stacklevel=3)
+        print("         Current function value: %f" % m.fun)
+        print("         Iterations: %d" % k)
+        print("         Function evaluations: %d" % sf.nfev)
+        print("         Gradient evaluations: %d" % sf.ngev)
+        print("         Hessian evaluations: %d" % (sf.nhev + nhessp[0]))
+
+    result = OptimizeResult(x=x, success=(warnflag == 0), status=warnflag,
+                            fun=m.fun, jac=m.jac, nfev=sf.nfev, njev=sf.ngev,
+                            nhev=sf.nhev + nhessp[0], nit=k,
+                            message=status_messages[warnflag])
+
+    if hess is not None:
+        result['hess'] = m.hess
+
+    if return_all:
+        result['allvecs'] = allvecs
+
+    return result
@@ -0,0 +1,6 @@
+"""This module contains the equality constrained SQP solver."""
+
+
+from .minimize_trustregion_constr import _minimize_trustregion_constr
+
+__all__ = ['_minimize_trustregion_constr']
@@ -0,0 +1,390 @@
+import numpy as np
+import scipy.sparse as sps
+
+
+class CanonicalConstraint:
+    """Canonical constraint to use with trust-constr algorithm.
+
+    It represents the set of constraints of the form::
+
+        f_eq(x) = 0
+        f_ineq(x) <= 0
+
+    where ``f_eq`` and ``f_ineq`` are evaluated by a single function, see
+    below.
+
+    The class is supposed to be instantiated by factory methods, which
+    should prepare the parameters listed below.
+
+    Parameters
+    ----------
+    n_eq, n_ineq : int
+        Number of equality and inequality constraints respectively.
+    fun : callable
+        Function defining the constraints. The signature is
+        ``fun(x) -> c_eq, c_ineq``, where ``c_eq`` is ndarray with `n_eq`
+        components and ``c_ineq`` is ndarray with `n_ineq` components.
+    jac : callable
+        Function to evaluate the Jacobian of the constraint. The signature
+        is ``jac(x) -> J_eq, J_ineq``, where ``J_eq`` and ``J_ineq`` are
+        either ndarray or csr_matrix of shapes (n_eq, n) and (n_ineq, n),
+        respectively.
+    hess : callable
+        Function to evaluate the Hessian of the constraints multiplied
+        by Lagrange multipliers, that is
+        ``dot(f_eq, v_eq) + dot(f_ineq, v_ineq)``. The signature is
+        ``hess(x, v_eq, v_ineq) -> H``, where ``H`` has an implied
+        shape (n, n) and provides a matrix-vector product operation
+        ``H.dot(p)``.
+    keep_feasible : ndarray, shape (n_ineq,)
+        Mask indicating which inequality constraints should be kept feasible.
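+
+    Notes
+    -----
+    As read from the conversion helpers in this class, a constraint
+    ``lb <= c(x) <= ub`` with a single kind of bound is brought to
+    canonical form as follows::
+
+        lb == ub     ->  f_eq(x)   = c(x) - lb
+        lb == -inf   ->  f_ineq(x) = c(x) - ub
+        ub == +inf   ->  f_ineq(x) = lb - c(x)
+
+    In the two inequality cases, components whose remaining bound is
+    infinite are dropped, so ``n_ineq`` counts only the finite bounds.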
+    """
+    def __init__(self, n_eq, n_ineq, fun, jac, hess, keep_feasible):
+        self.n_eq = n_eq
+        self.n_ineq = n_ineq
+        self.fun = fun
+        self.jac = jac
+        self.hess = hess
+        self.keep_feasible = keep_feasible
+
+    @classmethod
+    def from_PreparedConstraint(cls, constraint):
+        """Create an instance from a `PreparedConstraint` object."""
+        lb, ub = constraint.bounds
+        cfun = constraint.fun
+        keep_feasible = constraint.keep_feasible
+
+        if np.all(lb == -np.inf) and np.all(ub == np.inf):
+            return cls.empty(cfun.n)
+        elif np.all(lb == ub):
+            return cls._equal_to_canonical(cfun, lb)
+        elif np.all(lb == -np.inf):
+            return cls._less_to_canonical(cfun, ub, keep_feasible)
+        elif np.all(ub == np.inf):
+            return cls._greater_to_canonical(cfun, lb, keep_feasible)
+        else:
+            return cls._interval_to_canonical(cfun, lb, ub, keep_feasible)
+
+    @classmethod
+    def empty(cls, n):
+        """Create an "empty" instance.
+
+        This "empty" instance is required to allow working with unconstrained
+        problems as if they have some constraints.
+        """
+        empty_fun = np.empty(0)
+        empty_jac = np.empty((0, n))
+        empty_hess = sps.csr_matrix((n, n))
+
+        def fun(x):
+            return empty_fun, empty_fun
+
+        def jac(x):
+            return empty_jac, empty_jac
+
+        def hess(x, v_eq, v_ineq):
+            return empty_hess
+
+        return cls(0, 0, fun, jac, hess, np.empty(0, dtype=np.bool_))
+
+    @classmethod
+    def concatenate(cls, canonical_constraints, sparse_jacobian):
+        """Concatenate multiple `CanonicalConstraint` into one.
+
+        `sparse_jacobian` (bool) determines the Jacobian format of the
+        concatenated constraint. Note that items in `canonical_constraints`
+        must have their Jacobians in the same format.
+        """
+        def fun(x):
+            if canonical_constraints:
+                eq_all, ineq_all = zip(
+                        *[c.fun(x) for c in canonical_constraints])
+            else:
+                eq_all, ineq_all = [], []
+
+            return np.hstack(eq_all), np.hstack(ineq_all)
+
+        if sparse_jacobian:
+            vstack = sps.vstack
+        else:
+            vstack = np.vstack
+
+        def jac(x):
+            if canonical_constraints:
+                eq_all, ineq_all = zip(
+                        *[c.jac(x) for c in canonical_constraints])
+            else:
+                eq_all, ineq_all = [], []
+
+            return vstack(eq_all), vstack(ineq_all)
+
+        def hess(x, v_eq, v_ineq):
+            hess_all = []
+            index_eq = 0
+            index_ineq = 0
+            for c in canonical_constraints:
+                vc_eq = v_eq[index_eq:index_eq + c.n_eq]
+                vc_ineq = v_ineq[index_ineq:index_ineq + c.n_ineq]
+                hess_all.append(c.hess(x, vc_eq, vc_ineq))
+                index_eq += c.n_eq
+                index_ineq += c.n_ineq
+
+            def matvec(p):
+                result = np.zeros_like(p)
+                for h in hess_all:
+                    result += h.dot(p)
+                return result
+
+            n = x.shape[0]
+            return sps.linalg.LinearOperator((n, n), matvec, dtype=float)
+
+        n_eq = sum(c.n_eq for c in canonical_constraints)
+        n_ineq = sum(c.n_ineq for c in canonical_constraints)
+        keep_feasible = np.hstack([c.keep_feasible for c in
+                                   canonical_constraints])
+
+        return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
+
+    @classmethod
+    def _equal_to_canonical(cls, cfun, value):
+        empty_fun = np.empty(0)
+        n = cfun.n
+
+        n_eq = value.shape[0]
+        n_ineq = 0
+        keep_feasible = np.empty(0, dtype=bool)
+
+        if cfun.sparse_jacobian:
+            empty_jac = sps.csr_matrix((0, n))
+        else:
+            empty_jac = np.empty((0, n))
+
+        def fun(x):
+            return cfun.fun(x) - value, empty_fun
+
+        def jac(x):
+            return cfun.jac(x), empty_jac
+
+        def hess(x, v_eq, v_ineq):
+            return cfun.hess(x, v_eq)
+
+        return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
+
+    @classmethod
+    def _less_to_canonical(cls, cfun, ub, keep_feasible):
+        empty_fun = np.empty(0)
+        n = cfun.n
+        if cfun.sparse_jacobian:
+            empty_jac = sps.csr_matrix((0, n))
+        else:
+            empty_jac = np.empty((0, n))
+
+        finite_ub = ub < np.inf
+        n_eq = 0
+        n_ineq = np.sum(finite_ub)
+
+        if np.all(finite_ub):
+            def fun(x):
+                return empty_fun, cfun.fun(x) - ub
+
+            def jac(x):
+                return empty_jac, cfun.jac(x)
+
+            def hess(x, v_eq, v_ineq):
+                return cfun.hess(x, v_ineq)
+        else:
+            finite_ub = np.nonzero(finite_ub)[0]
+            keep_feasible = keep_feasible[finite_ub]
+            ub = ub[finite_ub]
+
+            def fun(x):
+                return empty_fun, cfun.fun(x)[finite_ub] - ub
+
+            def jac(x):
+                return empty_jac, cfun.jac(x)[finite_ub]
+
+            def hess(x, v_eq, v_ineq):
+                v = np.zeros(cfun.m)
+                v[finite_ub] = v_ineq
+                return cfun.hess(x, v)
+
+        return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
+
+    @classmethod
+    def _greater_to_canonical(cls, cfun, lb, keep_feasible):
+        empty_fun = np.empty(0)
+        n = cfun.n
+        if cfun.sparse_jacobian:
+            empty_jac = sps.csr_matrix((0, n))
+        else:
+            empty_jac = np.empty((0, n))
+
+        finite_lb = lb > -np.inf
+        n_eq = 0
+        n_ineq = np.sum(finite_lb)
+
+        if np.all(finite_lb):
+            def fun(x):
+                return empty_fun, lb - cfun.fun(x)
+
+            def jac(x):
+                return empty_jac, -cfun.jac(x)
+
+            def hess(x, v_eq, v_ineq):
+                return cfun.hess(x, -v_ineq)
+        else:
+            finite_lb = np.nonzero(finite_lb)[0]
+            keep_feasible = keep_feasible[finite_lb]
+            lb = lb[finite_lb]
+
+            def fun(x):
+                return empty_fun, lb - cfun.fun(x)[finite_lb]
+
+            def jac(x):
+                return empty_jac, -cfun.jac(x)[finite_lb]
+
+            def hess(x, v_eq, v_ineq):
+                v = np.zeros(cfun.m)
+                v[finite_lb] = -v_ineq
+                return cfun.hess(x, v)
+
+        return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
+
+    @classmethod
+    def _interval_to_canonical(cls, cfun, lb, ub, keep_feasible):
+        lb_inf = lb == -np.inf
+        ub_inf = ub == np.inf
+        equal = lb == ub
+        less = lb_inf & ~ub_inf
+        greater = ub_inf & ~lb_inf
+        interval = ~equal & ~lb_inf & ~ub_inf
+
+        equal = np.nonzero(equal)[0]
+        less = np.nonzero(less)[0]
+        greater = np.nonzero(greater)[0]
+        interval = np.nonzero(interval)[0]
+        n_less = less.shape[0]
+        n_greater = greater.shape[0]
+        n_interval = interval.shape[0]
+        n_ineq = n_less + n_greater + 2 * n_interval
+        n_eq = equal.shape[0]
+
+        keep_feasible = np.hstack((keep_feasible[less],
+                                   keep_feasible[greater],
+                                   keep_feasible[interval],
+                                   keep_feasible[interval]))
+
+        def fun(x):
+            f = cfun.fun(x)
+            eq = f[equal] - lb[equal]
+            le = f[less] - ub[less]
+            ge = lb[greater] - f[greater]
+            il = f[interval] - ub[interval]
+            ig = lb[interval] - f[interval]
+            return eq, np.hstack((le, ge, il, ig))
+
+        def jac(x):
+            J = cfun.jac(x)
+            eq = J[equal]
+            le = J[less]
+            ge = -J[greater]
+            il = J[interval]
+            ig = -il
+            if sps.issparse(J):
+                ineq = sps.vstack((le, ge, il, ig))
+            else:
+                ineq = np.vstack((le, ge, il, ig))
+            return eq, ineq
+
+        def hess(x, v_eq, v_ineq):
+            n_start = 0
+            v_l = v_ineq[n_start:n_start + n_less]
+            n_start += n_less
+            v_g = v_ineq[n_start:n_start + n_greater]
+            n_start += n_greater
+            v_il = v_ineq[n_start:n_start + n_interval]
+            n_start += n_interval
+            v_ig = v_ineq[n_start:n_start + n_interval]
+
+            v = np.zeros_like(lb)
+            v[equal] = v_eq
+            v[less] = v_l
+            v[greater] = -v_g
+            v[interval] = v_il - v_ig
+
+            return cfun.hess(x, v)
+
+        return cls(n_eq, n_ineq, fun, jac, hess, keep_feasible)
+
+
+def initial_constraints_as_canonical(n, prepared_constraints, sparse_jacobian):
+    """Convert initial values of the constraints to the canonical format.
+
+    The purpose is to avoid one additional call to the constraints at the
+    initial point. It takes the values saved in `PreparedConstraint`, modifies
+    and concatenates them to the canonical constraint format.
+    """
+    c_eq = []
+    c_ineq = []
+    J_eq = []
+    J_ineq = []
+
+    for c in prepared_constraints:
+        f = c.fun.f
+        J = c.fun.J
+        lb, ub = c.bounds
+        if np.all(lb == ub):
+            c_eq.append(f - lb)
+            J_eq.append(J)
+        elif np.all(lb == -np.inf):
+            finite_ub = ub < np.inf
+            c_ineq.append(f[finite_ub] - ub[finite_ub])
+            J_ineq.append(J[finite_ub])
+        elif np.all(ub == np.inf):
+            finite_lb = lb > -np.inf
+            c_ineq.append(lb[finite_lb] - f[finite_lb])
+            J_ineq.append(-J[finite_lb])
+        else:
+            lb_inf = lb == -np.inf
+            ub_inf = ub == np.inf
+            equal = lb == ub
+            less = lb_inf & ~ub_inf
+            greater = ub_inf & ~lb_inf
+            interval = ~equal & ~lb_inf & ~ub_inf
+
+            c_eq.append(f[equal] - lb[equal])
+            c_ineq.append(f[less] - ub[less])
+            c_ineq.append(lb[greater] - f[greater])
+            c_ineq.append(f[interval] - ub[interval])
+            c_ineq.append(lb[interval] - f[interval])
+
+            J_eq.append(J[equal])
+            J_ineq.append(J[less])
+            J_ineq.append(-J[greater])
+            J_ineq.append(J[interval])
+            J_ineq.append(-J[interval])
+
+    c_eq = np.hstack(c_eq) if c_eq else np.empty(0)
+    c_ineq = np.hstack(c_ineq) if c_ineq else np.empty(0)
+
+    if sparse_jacobian:
+        vstack = sps.vstack
+        empty = sps.csr_matrix((0, n))
+    else:
+        vstack = np.vstack
+        empty = np.empty((0, n))
+
+    J_eq = vstack(J_eq) if J_eq else empty
+    J_ineq = vstack(J_ineq) if J_ineq else empty
+
+    return c_eq, c_ineq, J_eq, J_ineq
@@ -0,0 +1,217 @@
+"""Byrd-Omojokun Trust-Region SQP method."""
+
+from scipy.sparse import eye as speye
+from .projections import projections
+from .qp_subproblem import modified_dogleg, projected_cg, box_intersections
+import numpy as np
+from numpy.linalg import norm
+
+__all__ = ['equality_constrained_sqp']
+
+
+def default_scaling(x):
+    n, = np.shape(x)
+    return speye(n)
+
+
+def equality_constrained_sqp(fun_and_constr, grad_and_jac, lagr_hess,
+                             x0, fun0, grad0, constr0,
+                             jac0, stop_criteria,
+                             state,
+                             initial_penalty,
+                             initial_trust_radius,
+                             factorization_method,
+                             trust_lb=None,
+                             trust_ub=None,
+                             scaling=default_scaling):
+    """Solve nonlinear equality-constrained problem using trust-region SQP.
+
+    Solve optimization problem:
+
+        minimize fun(x)
+        subject to: constr(x) = 0
+
+    using the Byrd-Omojokun Trust-Region SQP method described in [1]_. Several
+    implementation details are based on [2]_ and [3]_, p. 549.
+
+    References
+    ----------
+    .. [1] Lalee, Marucha, Jorge Nocedal, and Todd Plantenga. "On the
+           implementation of an algorithm for large-scale equality
+           constrained optimization." SIAM Journal on
+           Optimization 8.3 (1998): 682-706.
+    .. [2] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal.
+           "An interior point algorithm for large-scale nonlinear
+           programming." SIAM Journal on Optimization 9.4 (1999): 877-900.
+    .. [3] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
+           Second Edition (2006).
+    """
+    PENALTY_FACTOR = 0.3  # Rho from formula (3.51), reference [2]_, p.891.
+    LARGE_REDUCTION_RATIO = 0.9
+    INTERMEDIARY_REDUCTION_RATIO = 0.3
+    SUFFICIENT_REDUCTION_RATIO = 1e-8  # Eta from reference [2]_, p.892.
+    TRUST_ENLARGEMENT_FACTOR_L = 7.0
+    TRUST_ENLARGEMENT_FACTOR_S = 2.0
+    MAX_TRUST_REDUCTION = 0.5
+    MIN_TRUST_REDUCTION = 0.1
+    SOC_THRESHOLD = 0.1
+    TR_FACTOR = 0.8  # Zeta from formula (3.21), reference [2]_, p.885.
+    BOX_FACTOR = 0.5
+
+    n, = np.shape(x0)  # Number of parameters
+
+    # Set default lower and upper bounds.
+    if trust_lb is None:
+        trust_lb = np.full(n, -np.inf)
+    if trust_ub is None:
+        trust_ub = np.full(n, np.inf)
+
+    # Initial values
+    x = np.copy(x0)
+    trust_radius = initial_trust_radius
+    penalty = initial_penalty
+    # Compute Values
+    f = fun0
+    c = grad0
+    b = constr0
+    A = jac0
+    S = scaling(x)
+    # Get projections
+    Z, LS, Y = projections(A, factorization_method)
+    # Compute least-squares Lagrange multipliers
+    v = -LS.dot(c)
+    # Compute Hessian
+    H = lagr_hess(x, v)
+
+    # Update state parameters
+    optimality = norm(c + A.T.dot(v), np.inf)
+    constr_violation = norm(b, np.inf) if len(b) > 0 else 0
+    cg_info = {'niter': 0, 'stop_cond': 0,
+               'hits_boundary': False}
+
+    last_iteration_failed = False
+    while not stop_criteria(state, x, last_iteration_failed,
+                            optimality, constr_violation,
+                            trust_radius, penalty, cg_info):
+        # Normal Step - `dn`
+        # minimize 1/2*||A dn + b||^2
+        # subject to:
+        # ||dn|| <= TR_FACTOR * trust_radius
+        # BOX_FACTOR * lb <= dn <= BOX_FACTOR * ub.
+        dn = modified_dogleg(A, Y, b,
+                             TR_FACTOR*trust_radius,
+                             BOX_FACTOR*trust_lb,
+                             BOX_FACTOR*trust_ub)
+
+        # Tangential Step - `dt`
+        # Solve the QP problem:
+        # minimize 1/2 dt.T H dt + dt.T (H dn + c)
+        # subject to:
+        # A dt = 0
+        # ||dt|| <= sqrt(trust_radius**2 - ||dn||**2)
+        # lb - dn <= dt <= ub - dn
+        c_t = H.dot(dn) + c
+        b_t = np.zeros_like(b)
+        trust_radius_t = np.sqrt(trust_radius**2 - np.linalg.norm(dn)**2)
+        lb_t = trust_lb - dn
+        ub_t = trust_ub - dn
+        dt, cg_info = projected_cg(H, c_t, Z, Y, b_t,
+                                   trust_radius_t,
+                                   lb_t, ub_t)
+
+        # Compute update (normal + tangential steps).
+        d = dn + dt
+
+        # Compute second-order model change: 1/2 d.T H d + c.T d (f omitted).
+        quadratic_model = 1/2*(H.dot(d)).dot(d) + c.T.dot(d)
+        # Compute linearized constraint: l = A d + b.
+        linearized_constr = A.dot(d)+b
+        # Compute new penalty parameter according to formula (3.52),
+        # reference [2]_, p.891.
+        vpred = norm(b) - norm(linearized_constr)
+        # Guarantee `vpred` always positive,
+        # regardless of roundoff errors.
+        vpred = max(1e-16, vpred)
+        previous_penalty = penalty
+        if quadratic_model > 0:
+            new_penalty = quadratic_model / ((1-PENALTY_FACTOR)*vpred)
+            penalty = max(penalty, new_penalty)
+        # Compute predicted reduction according to formula (3.52),
+        # reference [2]_, p.891.
+        predicted_reduction = -quadratic_model + penalty*vpred
+
+        # Compute merit function at current point
+        merit_function = f + penalty*norm(b)
+        # Evaluate function and constraints at trial point
+        x_next = x + S.dot(d)
+        f_next, b_next = fun_and_constr(x_next)
+        # Compute merit function at trial point
+        merit_function_next = f_next + penalty*norm(b_next)
+        # Compute actual reduction according to formula (3.54),
+        # reference [2]_, p.892.
+        actual_reduction = merit_function - merit_function_next
+        # Compute reduction ratio
+        reduction_ratio = actual_reduction / predicted_reduction
+
+        # Second order correction (SOC), reference [2]_, p.892.
+        if reduction_ratio < SUFFICIENT_REDUCTION_RATIO and \
+           norm(dn) <= SOC_THRESHOLD * norm(dt):
+            # Compute second order correction
+            y = -Y.dot(b_next)
+            # Make sure increment is inside box constraints
+            _, t, intersect = box_intersections(d, y, trust_lb, trust_ub)
+            # Compute tentative point
+            x_soc = x + S.dot(d + t*y)
+            f_soc, b_soc = fun_and_constr(x_soc)
+            # Recompute actual reduction
+            merit_function_soc = f_soc + penalty*norm(b_soc)
+            actual_reduction_soc = merit_function - merit_function_soc
+            # Recompute reduction ratio
+            reduction_ratio_soc = actual_reduction_soc / predicted_reduction
+            if intersect and reduction_ratio_soc >= SUFFICIENT_REDUCTION_RATIO:
+                x_next = x_soc
+                f_next = f_soc
+                b_next = b_soc
+                reduction_ratio = reduction_ratio_soc
+
+        # Readjust trust region step, formula (3.55), reference [2]_, p.892.
+        if reduction_ratio >= LARGE_REDUCTION_RATIO:
+            trust_radius = max(TRUST_ENLARGEMENT_FACTOR_L * norm(d),
+                               trust_radius)
+        elif reduction_ratio >= INTERMEDIARY_REDUCTION_RATIO:
+            trust_radius = max(TRUST_ENLARGEMENT_FACTOR_S * norm(d),
+                               trust_radius)
+        # Reduce trust region step, according to reference [1]_, p.696.
+        elif reduction_ratio < SUFFICIENT_REDUCTION_RATIO:
+            trust_reduction = ((1-SUFFICIENT_REDUCTION_RATIO) /
+                               (1-reduction_ratio))
+            new_trust_radius = trust_reduction * norm(d)
+            if new_trust_radius >= MAX_TRUST_REDUCTION * trust_radius:
+                trust_radius *= MAX_TRUST_REDUCTION
+            elif new_trust_radius >= MIN_TRUST_REDUCTION * trust_radius:
+                trust_radius = new_trust_radius
+            else:
+                trust_radius *= MIN_TRUST_REDUCTION
+
+        # Update iteration
+        if reduction_ratio >= SUFFICIENT_REDUCTION_RATIO:
+            x = x_next
+            f, b = f_next, b_next
+            c, A = grad_and_jac(x)
+            S = scaling(x)
+            # Get projections
+            Z, LS, Y = projections(A, factorization_method)
+            # Compute least-squares Lagrange multipliers
+            v = -LS.dot(c)
+            # Compute Hessian
+            H = lagr_hess(x, v)
+            # Set Flag
+            last_iteration_failed = False
+            # Optimality values
+            optimality = norm(c + A.T.dot(v), np.inf)
+            constr_violation = norm(b, np.inf) if len(b) > 0 else 0
+        else:
+            penalty = previous_penalty
+            last_iteration_failed = True
+
+    return x, state
@@ -0,0 +1,564 @@
+import time
+import numpy as np
+from scipy.sparse.linalg import LinearOperator
+from .._differentiable_functions import VectorFunction
+from .._constraints import (
+    NonlinearConstraint, LinearConstraint, PreparedConstraint, Bounds, strict_bounds)
+from .._hessian_update_strategy import BFGS
+from .._optimize import OptimizeResult
+from .._differentiable_functions import ScalarFunction
+from .equality_constrained_sqp import equality_constrained_sqp
+from .canonical_constraint import (CanonicalConstraint,
+                                   initial_constraints_as_canonical)
+from .tr_interior_point import tr_interior_point
+from .report import BasicReport, SQPReport, IPReport
+
+
+TERMINATION_MESSAGES = {
+    0: "The maximum number of function evaluations is exceeded.",
+    1: "`gtol` termination condition is satisfied.",
+    2: "`xtol` termination condition is satisfied.",
+    3: "`callback` function requested termination."
+}
+
+
+class HessianLinearOperator:
+    """Build LinearOperator from hessp"""
+    def __init__(self, hessp, n):
+        self.hessp = hessp
+        self.n = n
+
+    def __call__(self, x, *args):
+        def matvec(p):
+            return self.hessp(x, p, *args)
+
+        return LinearOperator((self.n, self.n), matvec=matvec)
+
+
+class LagrangianHessian:
+    """The Hessian of the Lagrangian as LinearOperator.
+
+    The Lagrangian is computed as the objective function plus all the
+    constraints multiplied with some numbers (Lagrange multipliers).
+    """
+    def __init__(self, n, objective_hess, constraints_hess):
+        self.n = n
+        self.objective_hess = objective_hess
+        self.constraints_hess = constraints_hess
+
+    def __call__(self, x, v_eq=np.empty(0), v_ineq=np.empty(0)):
+        H_objective = self.objective_hess(x)
+        H_constraints = self.constraints_hess(x, v_eq, v_ineq)
+
+        def matvec(p):
+            return H_objective.dot(p) + H_constraints.dot(p)
+
+        return LinearOperator((self.n, self.n), matvec)
+
+
+def update_state_sqp(state, x, last_iteration_failed, objective, prepared_constraints,
+                     start_time, tr_radius, constr_penalty, cg_info):
+    state.nit += 1
+    state.nfev = objective.nfev
+    state.njev = objective.ngev
+    state.nhev = objective.nhev
+    state.constr_nfev = [c.fun.nfev if isinstance(c.fun, VectorFunction) else 0
+                         for c in prepared_constraints]
+    state.constr_njev = [c.fun.njev if isinstance(c.fun, VectorFunction) else 0
+                         for c in prepared_constraints]
+    state.constr_nhev = [c.fun.nhev if isinstance(c.fun, VectorFunction) else 0
+                         for c in prepared_constraints]
+
+    if not last_iteration_failed:
+        state.x = x
+        state.fun = objective.f
+        state.grad = objective.g
+        state.v = [c.fun.v for c in prepared_constraints]
+        state.constr = [c.fun.f for c in prepared_constraints]
+        state.jac = [c.fun.J for c in prepared_constraints]
+        # Compute Lagrangian Gradient
+        state.lagrangian_grad = np.copy(state.grad)
+        for c in prepared_constraints:
+            state.lagrangian_grad += c.fun.J.T.dot(c.fun.v)
+        state.optimality = np.linalg.norm(state.lagrangian_grad, np.inf)
+        # Compute maximum constraint violation
+        state.constr_violation = 0
+        for i in range(len(prepared_constraints)):
+            lb, ub = prepared_constraints[i].bounds
+            c = state.constr[i]
+            state.constr_violation = np.max([state.constr_violation,
+                                             np.max(lb - c),
+                                             np.max(c - ub)])
+
+    state.execution_time = time.time() - start_time
+    state.tr_radius = tr_radius
+    state.constr_penalty = constr_penalty
+    state.cg_niter += cg_info["niter"]
+    state.cg_stop_cond = cg_info["stop_cond"]
+
+    return state
+
+
+def update_state_ip(state, x, last_iteration_failed, objective,
+                    prepared_constraints, start_time,
+                    tr_radius, constr_penalty, cg_info,
+                    barrier_parameter, barrier_tolerance):
+    state = update_state_sqp(state, x, last_iteration_failed, objective,
+                             prepared_constraints, start_time, tr_radius,
+                             constr_penalty, cg_info)
+    state.barrier_parameter = barrier_parameter
+    state.barrier_tolerance = barrier_tolerance
+    return state
+
+
+def _minimize_trustregion_constr(fun, x0, args, grad,
+                                 hess, hessp, bounds, constraints,
+                                 xtol=1e-8, gtol=1e-8,
+                                 barrier_tol=1e-8,
+                                 sparse_jacobian=None,
+                                 callback=None, maxiter=1000,
+                                 verbose=0, finite_diff_rel_step=None,
+                                 initial_constr_penalty=1.0, initial_tr_radius=1.0,
+                                 initial_barrier_parameter=0.1,
+                                 initial_barrier_tolerance=0.1,
+                                 factorization_method=None,
+                                 disp=False):
+    """Minimize a scalar function subject to constraints.
+
+    Parameters
+    ----------
+    gtol : float, optional
+        Tolerance for termination by the norm of the Lagrangian gradient.
+        The algorithm will terminate when both the infinity norm (i.e., max
+        abs value) of the Lagrangian gradient and the constraint violation
+        are smaller than ``gtol``. Default is 1e-8.
+    xtol : float, optional
+        Tolerance for termination by the change of the independent variable.
+        The algorithm will terminate when ``tr_radius < xtol``, where
+        ``tr_radius`` is the radius of the trust region used in the algorithm.
+        Default is 1e-8.
+    barrier_tol : float, optional
+        Threshold on the barrier parameter for the algorithm termination.
+        When inequality constraints are present, the algorithm will terminate
+        only when the barrier parameter is less than `barrier_tol`.
+        Default is 1e-8.
+    sparse_jacobian : {bool, None}, optional
+        Determines how to represent Jacobians of the constraints. If bool,
+        then Jacobians of all the constraints will be converted to the
+        corresponding format. If None (default), then Jacobians won't be
+        converted, but the algorithm can proceed only if they all have the
+        same format.
+    initial_tr_radius : float, optional
+        Initial trust radius. The trust radius gives the maximum distance
+        between solution points in consecutive iterations. It reflects the
+        trust the algorithm puts in the local approximation of the optimization
+        problem. For an accurate local approximation the trust-region should be
+        large and for an approximation valid only close to the current point it
+        should be a small one. The trust radius is automatically updated throughout
+        the optimization process, with ``initial_tr_radius`` being its initial value.
+        Default is 1 (recommended in [1]_, p. 19).
+    initial_constr_penalty : float, optional
+        Initial constraints penalty parameter. The penalty parameter is used for
+        balancing the requirements of decreasing the objective function
+        and satisfying the constraints. It is used for defining the merit function:
+        ``merit_function(x) = fun(x) + constr_penalty * constr_norm_l2(x)``,
+        where ``constr_norm_l2(x)`` is the l2 norm of a vector containing all
+        the constraints. The merit function is used for accepting or rejecting
+        trial points and ``constr_penalty`` weights the two conflicting goals
+        of reducing objective function and constraints. The penalty is automatically
+        updated throughout the optimization process, with
+        ``initial_constr_penalty`` being its initial value. Default is 1
+        (recommended in [1]_, p. 19).
+    initial_barrier_parameter, initial_barrier_tolerance : float, optional
+        Initial barrier parameter and initial tolerance for the barrier subproblem.
+        Both are used only when inequality constraints are present. For dealing with
+        optimization problems ``min_x f(x)`` subject to inequality constraints
+        ``c(x) <= 0`` the algorithm introduces slack variables, solving the problem
+        ``min_(x,s) f(x) - barrier_parameter*sum(ln(s))`` subject to the equality
+        constraints ``c(x) + s = 0`` instead of the original problem. This subproblem
+        is solved for decreasing values of ``barrier_parameter`` and with decreasing
+        tolerances for the termination, starting with ``initial_barrier_parameter``
+        for the barrier parameter and ``initial_barrier_tolerance`` for the
+        barrier tolerance. Default is 0.1 for both values (recommended in [1]_, p. 19).
+        Also note that ``barrier_parameter`` and ``barrier_tolerance`` are updated
+        with the same prefactor.
+    factorization_method : string or None, optional
+        Method to factorize the Jacobian of the constraints. Use None (default)
+        for the auto selection or one of:
+
+            - 'NormalEquation' (requires scikit-sparse)
+            - 'AugmentedSystem'
+            - 'QRFactorization'
+            - 'SVDFactorization'
+
+        The methods 'NormalEquation' and 'AugmentedSystem' can be used only
+        with sparse constraints. The projections required by the algorithm
+        will be computed using, respectively, the normal equation and the
+        augmented system approaches explained in [1]_. 'NormalEquation'
+        computes the Cholesky factorization of ``A A.T`` and 'AugmentedSystem'
+        performs the LU factorization of an augmented system. They usually
+        provide similar results. 'AugmentedSystem' is used by default for
+        sparse matrices.
+
+        The methods 'QRFactorization' and 'SVDFactorization' can be used
+        only with dense constraints. They compute the required projections
+        using, respectively, QR and SVD factorizations. The 'SVDFactorization'
+        method can cope with Jacobian matrices with deficient row rank and will
+        be used whenever other factorization methods fail (which may imply the
+        conversion of sparse matrices to a dense format when required).
+        By default, 'QRFactorization' is used for dense matrices.
+    finite_diff_rel_step : None or array_like, optional
+        Relative step size for the finite difference approximation.
+    maxiter : int, optional
+        Maximum number of algorithm iterations. Default is 1000.
+    verbose : {0, 1, 2, 3}, optional
+        Level of algorithm's verbosity:
+
+            * 0 (default) : work silently.
+            * 1 : display a termination report.
+            * 2 : display progress during iterations.
+            * 3 : display progress during iterations (more complete report).
+
+    disp : bool, optional
+        If True, then `verbose` will be set to 1 if it was 0. Default is False.
+
+    Returns
+    -------
+    `OptimizeResult` with the fields documented below. Note the following:
+
+        1. All values corresponding to the constraints are ordered as they
+           were passed to the solver. Values corresponding to `bounds`
+           constraints are put *after* other constraints.
+        2. All numbers of function, Jacobian or Hessian evaluations correspond
+           to numbers of actual Python function calls. It means, for example,
+           that if a Jacobian is estimated by finite differences, then the
+           number of Jacobian evaluations will be zero and the number of
+           function evaluations will be incremented by all calls during the
+           finite difference estimation.
+
+    x : ndarray, shape (n,)
+        Solution found.
+    optimality : float
+        Infinity norm of the Lagrangian gradient at the solution.
+    constr_violation : float
+        Maximum constraint violation at the solution.
+    fun : float
+        Objective function at the solution.
+    grad : ndarray, shape (n,)
+        Gradient of the objective function at the solution.
+    lagrangian_grad : ndarray, shape (n,)
+        Gradient of the Lagrangian function at the solution.
+    nit : int
+        Total number of iterations.
+    nfev : int
+        Number of the objective function evaluations.
+    njev : int
+        Number of the objective function gradient evaluations.
+    nhev : int
+        Number of the objective function Hessian evaluations.
+    cg_niter : int
+        Total number of the conjugate gradient method iterations.
+    method : {'equality_constrained_sqp', 'tr_interior_point'}
+        Optimization method used.
+    constr : list of ndarray
+        List of constraint values at the solution.
+    jac : list of {ndarray, sparse matrix}
+        List of the Jacobian matrices of the constraints at the solution.
+    v : list of ndarray
+        List of the Lagrange multipliers for the constraints at the solution.
+        For an inequality constraint a positive multiplier means that the upper
+        bound is active, a negative multiplier means that the lower bound is
+        active and if a multiplier is zero it means the constraint is not
+        active.
+    constr_nfev : list of int
+        Number of constraint evaluations for each of the constraints.
+    constr_njev : list of int
+        Number of Jacobian matrix evaluations for each of the constraints.
+    constr_nhev : list of int
+        Number of Hessian evaluations for each of the constraints.
+    tr_radius : float
+        Radius of the trust region at the last iteration.
+    constr_penalty : float
+        Penalty parameter at the last iteration, see `initial_constr_penalty`.
+    barrier_tolerance : float
+        Tolerance for the barrier subproblem at the last iteration.
+        Only for problems with inequality constraints.
+    barrier_parameter : float
+        Barrier parameter at the last iteration. Only for problems
+        with inequality constraints.
+    execution_time : float
+        Total execution time.
+    message : str
+        Termination message.
+    status : {0, 1, 2, 3}
+        Termination status:
+
+            * 0 : The maximum number of function evaluations is exceeded.
+            * 1 : `gtol` termination condition is satisfied.
+            * 2 : `xtol` termination condition is satisfied.
+            * 3 : `callback` function requested termination.
+
+    cg_stop_cond : int
+        Reason for CG subproblem termination at the last iteration:
+
+            * 0 : CG subproblem not evaluated.
+            * 1 : Iteration limit was reached.
+            * 2 : Reached the trust-region boundary.
+            * 3 : Negative curvature detected.
+            * 4 : Tolerance was satisfied.
+
+    References
+    ----------
+    .. [1] Conn, A. R., Gould, N. I., & Toint, P. L.
+           Trust region methods. 2000. SIAM. p. 19.
+    """
+    x0 = np.atleast_1d(x0).astype(float)
+    n_vars = np.size(x0)
+    if hess is None:
+        if callable(hessp):
+            hess = HessianLinearOperator(hessp, n_vars)
+        else:
+            hess = BFGS()
+    if disp and verbose == 0:
+        verbose = 1
+
+    if bounds is not None:
+        modified_lb = np.nextafter(bounds.lb, -np.inf, where=bounds.lb > -np.inf)
+        modified_ub = np.nextafter(bounds.ub, np.inf, where=bounds.ub < np.inf)
+        modified_lb = np.where(np.isfinite(bounds.lb), modified_lb, bounds.lb)
+        modified_ub = np.where(np.isfinite(bounds.ub), modified_ub, bounds.ub)
+        bounds = Bounds(modified_lb, modified_ub, keep_feasible=bounds.keep_feasible)
+        finite_diff_bounds = strict_bounds(bounds.lb, bounds.ub,
+                                           bounds.keep_feasible, n_vars)
+    else:
+        finite_diff_bounds = (-np.inf, np.inf)
+
+    # Define Objective Function
+    objective = ScalarFunction(fun, x0, args, grad, hess,
+                               finite_diff_rel_step, finite_diff_bounds)
+
+    # Put constraints in list format when needed.
+    if isinstance(constraints, (NonlinearConstraint, LinearConstraint)):
+        constraints = [constraints]
+
+    # Prepare constraints.
+    prepared_constraints = [
+        PreparedConstraint(c, x0, sparse_jacobian, finite_diff_bounds)
+        for c in constraints]
+
+    # Check that all constraints are either sparse or dense.
+    n_sparse = sum(c.fun.sparse_jacobian for c in prepared_constraints)
+    if 0 < n_sparse < len(prepared_constraints):
+        raise ValueError("All constraints must have the same kind of the "
+                         "Jacobian --- either all sparse or all dense. "
+                         "You can set the sparsity globally by setting "
+                         "`sparse_jacobian` to either True or False.")
+    if prepared_constraints:
+        sparse_jacobian = n_sparse > 0
+
+    if bounds is not None:
+        if sparse_jacobian is None:
+            sparse_jacobian = True
+        prepared_constraints.append(PreparedConstraint(bounds, x0,
+                                                       sparse_jacobian))
+
+    # Concatenate initial constraints to the canonical form.
+    c_eq0, c_ineq0, J_eq0, J_ineq0 = initial_constraints_as_canonical(
+        n_vars, prepared_constraints, sparse_jacobian)
+
+    # Prepare all canonical constraints and concatenate it into one.
+    canonical_all = [CanonicalConstraint.from_PreparedConstraint(c)
+                     for c in prepared_constraints]
+
+    if len(canonical_all) == 0:
+        canonical = CanonicalConstraint.empty(n_vars)
+    elif len(canonical_all) == 1:
+        canonical = canonical_all[0]
+    else:
+        canonical = CanonicalConstraint.concatenate(canonical_all,
+                                                    sparse_jacobian)
+
+    # Generate the Hessian of the Lagrangian.
+    lagrangian_hess = LagrangianHessian(n_vars, objective.hess, canonical.hess)
+
+    # Choose appropriate method
+    if canonical.n_ineq == 0:
+        method = 'equality_constrained_sqp'
+    else:
+        method = 'tr_interior_point'
+
+    # Construct OptimizeResult
+    state = OptimizeResult(
+        nit=0, nfev=0, njev=0, nhev=0,
+        cg_niter=0, cg_stop_cond=0,
+        fun=objective.f, grad=objective.g,
+        lagrangian_grad=np.copy(objective.g),
+        constr=[c.fun.f for c in prepared_constraints],
+        jac=[c.fun.J for c in prepared_constraints],
+        constr_nfev=[0 for c in prepared_constraints],
+        constr_njev=[0 for c in prepared_constraints],
+        constr_nhev=[0 for c in prepared_constraints],
+        v=[c.fun.v for c in prepared_constraints],
+        method=method)
+
+    # Start counting
+    start_time = time.time()
+
+    # Define stop criteria
+    if method == 'equality_constrained_sqp':
+        def stop_criteria(state, x, last_iteration_failed,
+                          optimality, constr_violation,
+                          tr_radius, constr_penalty, cg_info):
+            state = update_state_sqp(state, x, last_iteration_failed,
+                                     objective, prepared_constraints,
+                                     start_time, tr_radius, constr_penalty,
+                                     cg_info)
+            if verbose == 2:
+                BasicReport.print_iteration(state.nit,
+                                            state.nfev,
+                                            state.cg_niter,
+                                            state.fun,
+                                            state.tr_radius,
+                                            state.optimality,
+                                            state.constr_violation)
+            elif verbose > 2:
+                SQPReport.print_iteration(state.nit,
+                                          state.nfev,
+                                          state.cg_niter,
+                                          state.fun,
+                                          state.tr_radius,
+                                          state.optimality,
+                                          state.constr_violation,
+                                          state.constr_penalty,
+                                          state.cg_stop_cond)
+            state.status = None
+            state.niter = state.nit  # Alias for callback (backward-compatibility)
+            if callback is not None:
+                callback_stop = False
+                try:
+                    callback_stop = callback(state)
+                except StopIteration:
+                    callback_stop = True
+                if callback_stop:
+                    state.status = 3
+                    return True
+            if state.optimality < gtol and state.constr_violation < gtol:
+                state.status = 1
+            elif state.tr_radius < xtol:
+                state.status = 2
+            elif state.nit >= maxiter:
+                state.status = 0
+            return state.status in (0, 1, 2, 3)
+    elif method == 'tr_interior_point':
+        def stop_criteria(state, x, last_iteration_failed, tr_radius,
+                          constr_penalty, cg_info, barrier_parameter,
+                          barrier_tolerance):
+            state = update_state_ip(state, x, last_iteration_failed,
+                                    objective, prepared_constraints,
+                                    start_time, tr_radius, constr_penalty,
+                                    cg_info, barrier_parameter, barrier_tolerance)
+            if verbose == 2:
+                BasicReport.print_iteration(state.nit,
+                                            state.nfev,
+                                            state.cg_niter,
+                                            state.fun,
+                                            state.tr_radius,
+                                            state.optimality,
+                                            state.constr_violation)
+            elif verbose > 2:
+                IPReport.print_iteration(state.nit,
+                                         state.nfev,
+                                         state.cg_niter,
+                                         state.fun,
+                                         state.tr_radius,
+                                         state.optimality,
+                                         state.constr_violation,
+                                         state.constr_penalty,
+                                         state.barrier_parameter,
+                                         state.cg_stop_cond)
+            state.status = None
+            state.niter = state.nit  # Alias for callback (backward compatibility)
+            if callback is not None:
+                callback_stop = False
+                try:
+                    callback_stop = callback(state)
+                except StopIteration:
+                    callback_stop = True
+                if callback_stop:
+                    state.status = 3
+                    return True
+            if state.optimality < gtol and state.constr_violation < gtol:
+                state.status = 1
+            elif (state.tr_radius < xtol
+                  and state.barrier_parameter < barrier_tol):
+                state.status = 2
+            elif state.nit >= maxiter:
+                state.status = 0
+            return state.status in (0, 1, 2, 3)
+
+    if verbose == 2:
+        BasicReport.print_header()
+    elif verbose > 2:
+        if method == 'equality_constrained_sqp':
+            SQPReport.print_header()
+        elif method == 'tr_interior_point':
+            IPReport.print_header()
+
+    # Call the lower-level solver to do the optimization
+    if method == 'equality_constrained_sqp':
+        def fun_and_constr(x):
+            f = objective.fun(x)
+            c_eq, _ = canonical.fun(x)
+            return f, c_eq
+
+        def grad_and_jac(x):
+            g = objective.grad(x)
+            J_eq, _ = canonical.jac(x)
+            return g, J_eq
+
+        _, result = equality_constrained_sqp(
+            fun_and_constr, grad_and_jac, lagrangian_hess,
+            x0, objective.f, objective.g,
+            c_eq0, J_eq0,
+            stop_criteria, state,
+            initial_constr_penalty, initial_tr_radius,
+            factorization_method)
+
+    elif method == 'tr_interior_point':
+        _, result = tr_interior_point(
+            objective.fun, objective.grad, lagrangian_hess,
+            n_vars, canonical.n_ineq, canonical.n_eq,
+            canonical.fun, canonical.jac,
+            x0, objective.f, objective.g,
+            c_ineq0, J_ineq0, c_eq0, J_eq0,
+            stop_criteria,
+            canonical.keep_feasible,
+            xtol, state, initial_barrier_parameter,
+            initial_barrier_tolerance,
+            initial_constr_penalty, initial_tr_radius,
+            factorization_method)
+
+    # Status 3 occurs when the callback function requests termination;
+    # this is not treated as a success.
+    result.success = result.status in (1, 2)
+    result.message = TERMINATION_MESSAGES[result.status]
+
+    # Alias (for backward compatibility with 1.1.0)
+    result.niter = result.nit
+
+    if verbose == 2:
+        BasicReport.print_footer()
+    elif verbose > 2:
+        if method == 'equality_constrained_sqp':
+            SQPReport.print_footer()
+        elif method == 'tr_interior_point':
+            IPReport.print_footer()
+    if verbose >= 1:
+        print(result.message)
+        print("Number of iterations: {}, function evaluations: {}, "
+              "CG iterations: {}, optimality: {:.2e}, "
+              "constraint violation: {:.2e}, execution time: {:4.2} s."
+              .format(result.nit, result.nfev, result.cg_niter,
+                      result.optimality, result.constr_violation,
+                      result.execution_time))
+    return result
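The termination logic in the `equality_constrained_sqp` branch above can be summarized in a few lines. The following is a minimal sketch, not part of the committed file: `termination_status` and `is_success` are hypothetical helper names introduced here for illustration, and the callback path (status 3) and the extra barrier-parameter check of the `tr_interior_point` branch are left out.

```python
def termination_status(optimality, constr_violation, tr_radius,
                       nit, gtol, xtol, maxiter):
    """Hypothetical helper mirroring the status checks of the SQP branch."""
    if optimality < gtol and constr_violation < gtol:
        return 1      # first-order optimality and feasibility reached
    if tr_radius < xtol:
        return 2      # trust region shrank below xtol
    if nit >= maxiter:
        return 0      # iteration budget exhausted
    return None       # no criterion met: keep iterating


def is_success(status):
    # Only statuses 1 and 2 count as success; 0 (maxiter reached) and
    # 3 (callback requested termination) do not.
    return status in (1, 2)


status = termination_status(1e-10, 1e-10, 1.0,
                            nit=12, gtol=1e-8, xtol=1e-8, maxiter=1000)
print(status, is_success(status))
```

The order of the checks matters: an iterate that satisfies the tolerances on the last allowed iteration is reported as status 1 rather than status 0, because the optimality test comes first.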
@@ -0,0 +1,407 @@
+"""Basic linear factorizations needed by the solver."""
+
+from scipy.sparse import (bmat, csc_matrix, eye, issparse)
+from scipy.sparse.linalg import LinearOperator
+import scipy.linalg
+import scipy.sparse.linalg
+try:
+    from sksparse.cholmod import cholesky_AAt
+    sksparse_available = True
+except ImportError:
+    import warnings
+    sksparse_available = False
+import numpy as np
+from warnings import warn
+
+__all__ = [
+    'orthogonality',
+    'projections',
+]
+
+
+def orthogonality(A, g):
+    """Measure orthogonality between a vector and the null space of a matrix.
+
+    Compute a measure of orthogonality between the null space
+    of the (possibly sparse) matrix ``A`` and a given vector ``g``.
+
+    The formula is a simplified (and cheaper) version of formula (3.13)
+    from [1]_.
+    ``orth =  norm(A g, ord=2)/(norm(A, ord='fro')*norm(g, ord=2))``.
+
+    References
+    ----------
+    .. [1] Gould, Nicholas IM, Mary E. Hribar, and Jorge Nocedal.
+           "On the solution of equality constrained quadratic
+            programming problems arising in optimization."
+            SIAM Journal on Scientific Computing 23.4 (2001): 1376-1395.
+    """
+    # Compute vector norms
+    norm_g = np.linalg.norm(g)
+    # Compute Frobenius norm of the matrix A
+    if issparse(A):
+        norm_A = scipy.sparse.linalg.norm(A, ord='fro')
+    else:
+        norm_A = np.linalg.norm(A, ord='fro')
+
+    # Check if norms are zero
+    if norm_g == 0 or norm_A == 0:
+        return 0
+
+    norm_A_g = np.linalg.norm(A.dot(g))
+    # Orthogonality measure
+    orth = norm_A_g / (norm_A*norm_g)
+    return orth
+
+
+def normal_equation_projections(A, m, n, orth_tol, max_refin, tol):
+    """Return linear operators for matrix A using ``NormalEquation`` approach.
+    """
+    # Cholesky factorization
+    factor = cholesky_AAt(A)
+
+    # z = x - A.T inv(A A.T) A x
+    def null_space(x):
+        v = factor(A.dot(x))
+        z = x - A.T.dot(v)
+
+        # Iterative refinement to improve roundoff
+        # errors described in [2]_, algorithm 5.1.
+        k = 0
+        while orthogonality(A, z) > orth_tol:
+            if k >= max_refin:
+                break
+            # z_next = z - A.T inv(A A.T) A z
+            v = factor(A.dot(z))
+            z = z - A.T.dot(v)
+            k += 1
+
+        return z
+
+    # z = inv(A A.T) A x
+    def least_squares(x):
+        return factor(A.dot(x))
+
+    # z = A.T inv(A A.T) x
+    def row_space(x):
+        return A.T.dot(factor(x))
+
+    return null_space, least_squares, row_space
+
+
+def augmented_system_projections(A, m, n, orth_tol, max_refin, tol):
+    """Return linear operators for matrix A - ``AugmentedSystem``."""
+    # Form augmented system
+    K = csc_matrix(bmat([[eye(n), A.T], [A, None]]))
+    # LU factorization
+    # TODO: Use a symmetric indefinite factorization
+    #       to solve the system twice as fast (because
+    #       of the symmetry).
+    try:
+        solve = scipy.sparse.linalg.factorized(K)
+    except RuntimeError:
+        warn("Singular Jacobian matrix. Using dense SVD decomposition to "
+             "perform the factorizations.",
+             stacklevel=3)
+        return svd_factorization_projections(A.toarray(),
+                                             m, n, orth_tol,
+                                             max_refin, tol)
+
+    # z = x - A.T inv(A A.T) A x
+    # is computed solving the extended system:
+    # [I A.T] * [ z ] = [x]
+    # [A  O ]   [aux]   [0]
+    def null_space(x):
+        # v = [x]
+        #     [0]
+        v = np.hstack([x, np.zeros(m)])
+        # lu_sol = [ z ]
+        #          [aux]
+        lu_sol = solve(v)
+        z = lu_sol[:n]
+
+        # Iterative refinement to improve roundoff
+        # errors described in [2]_, algorithm 5.2.
+        k = 0
+        while orthogonality(A, z) > orth_tol:
+            if k >= max_refin:
+                break
+            # new_v = [x] - [I A.T] * [ z ]
+            #         [0]   [A  O ]   [aux]
+            new_v = v - K.dot(lu_sol)
+            # [I A.T] * [delta  z ] = new_v
+            # [A  O ]   [delta aux]
+            lu_update = solve(new_v)
+            #  [ z ] += [delta  z ]
+            #  [aux]    [delta aux]
+            lu_sol += lu_update
+            z = lu_sol[:n]
+            k += 1
+
+        # return z = x - A.T inv(A A.T) A x
+        return z
+
+    # z = inv(A A.T) A x
+    # is computed solving the extended system:
+    # [I A.T] * [aux] = [x]
+    # [A  O ]   [ z ]   [0]
+    def least_squares(x):
+        # v = [x]
+        #     [0]
+        v = np.hstack([x, np.zeros(m)])
+        # lu_sol = [aux]
+        #          [ z ]
+        lu_sol = solve(v)
+        # return z = inv(A A.T) A x
+        return lu_sol[n:m+n]
+
+    # z = A.T inv(A A.T) x
+    # is computed solving the extended system:
+    # [I A.T] * [ z ] = [0]
+    # [A  O ]   [aux]   [x]
+    def row_space(x):
+        # v = [0]
+        #     [x]
+        v = np.hstack([np.zeros(n), x])
+        # lu_sol = [ z ]
+        #          [aux]
+        lu_sol = solve(v)
+        # return z = A.T inv(A A.T) x
+        return lu_sol[:n]
+
+    return null_space, least_squares, row_space
+
+
+def qr_factorization_projections(A, m, n, orth_tol, max_refin, tol):
+    """Return linear operators for matrix A using ``QRFactorization`` approach.
+    """
+    # QRFactorization
+    Q, R, P = scipy.linalg.qr(A.T, pivoting=True, mode='economic')
+
+    if np.linalg.norm(R[-1, :], np.inf) < tol:
+        warn('Singular Jacobian matrix. Using SVD decomposition to ' +
+             'perform the factorizations.',
+             stacklevel=3)
+        return svd_factorization_projections(A, m, n,
+                                             orth_tol,
+                                             max_refin,
+                                             tol)
+
+    # z = x - A.T inv(A A.T) A x
+    def null_space(x):
+        # v = P inv(R) Q.T x
+        aux1 = Q.T.dot(x)
+        aux2 = scipy.linalg.solve_triangular(R, aux1, lower=False)
+        v = np.zeros(m)
+        v[P] = aux2
+        z = x - A.T.dot(v)
+
+        # Iterative refinement to improve roundoff
+        # errors described in [2]_, algorithm 5.1.
+        k = 0
+        while orthogonality(A, z) > orth_tol:
+            if k >= max_refin:
+                break
+            # v = P inv(R) Q.T x
+            aux1 = Q.T.dot(z)
+            aux2 = scipy.linalg.solve_triangular(R, aux1, lower=False)
+            v[P] = aux2
+            # z_next = z - A.T v
+            z = z - A.T.dot(v)
+            k += 1
+
+        return z
+
+    # z = inv(A A.T) A x
+    def least_squares(x):
+        # z = P inv(R) Q.T x
+        aux1 = Q.T.dot(x)
+        aux2 = scipy.linalg.solve_triangular(R, aux1, lower=False)
+        z = np.zeros(m)
+        z[P] = aux2
+        return z
+
+    # z = A.T inv(A A.T) x
+    def row_space(x):
+        # z = Q inv(R.T) P.T x
+        aux1 = x[P]
+        aux2 = scipy.linalg.solve_triangular(R, aux1,
+                                             lower=False,
+                                             trans='T')
+        z = Q.dot(aux2)
+        return z
+
+    return null_space, least_squares, row_space
+
+
+def svd_factorization_projections(A, m, n, orth_tol, max_refin, tol):
+    """Return linear operators for matrix A using ``SVDFactorization`` approach.
+    """
+    # SVD Factorization
+    U, s, Vt = scipy.linalg.svd(A, full_matrices=False)
+
+    # Remove dimensions related with very small singular values
+    U = U[:, s > tol]
+    Vt = Vt[s > tol, :]
+    s = s[s > tol]
+
+    # z = x - A.T inv(A A.T) A x
+    def null_space(x):
+        # v = U 1/s V.T x = inv(A A.T) A x
+        aux1 = Vt.dot(x)
+        aux2 = 1/s*aux1
+        v = U.dot(aux2)
+        z = x - A.T.dot(v)
+
+        # Iterative refinement to improve roundoff
+        # errors described in [2]_, algorithm 5.1.
+        k = 0
+        while orthogonality(A, z) > orth_tol:
+            if k >= max_refin:
+                break
+            # v = U 1/s V.T x = inv(A A.T) A x
+            aux1 = Vt.dot(z)
+            aux2 = 1/s*aux1
+            v = U.dot(aux2)
+            # z_next = z - A.T v
+            z = z - A.T.dot(v)
+            k += 1
+
+        return z
+
+    # z = inv(A A.T) A x
+    def least_squares(x):
+        # z = U 1/s V.T x = inv(A A.T) A x
+        aux1 = Vt.dot(x)
+        aux2 = 1/s*aux1
+        z = U.dot(aux2)
+        return z
+
+    # z = A.T inv(A A.T) x
+    def row_space(x):
+        # z = V 1/s U.T x
+        aux1 = U.T.dot(x)
+        aux2 = 1/s*aux1
+        z = Vt.T.dot(aux2)
+        return z
+
+    return null_space, least_squares, row_space
+
+
+def projections(A, method=None, orth_tol=1e-12, max_refin=3, tol=1e-15):
+    """Return three linear operators related with a given matrix A.
+
+    Parameters
+    ----------
+    A : sparse matrix (or ndarray), shape (m, n)
+        Matrix ``A`` used in the projection.
+    method : string, optional
+        Method used to compute the linear
+        operators. Should be one of:
+
+            - 'NormalEquation': The operators
+               will be computed using the
+               so-called normal equation approach
+               explained in [1]_. In order to do
+               so the Cholesky factorization of
+               ``(A A.T)`` is computed. Exclusive
+               for sparse matrices.
+            - 'AugmentedSystem': The operators
+               will be computed using the
+               so-called augmented system approach
+               explained in [1]_. Exclusive
+               for sparse matrices.
+            - 'QRFactorization': Compute projections
+               using QR factorization. Exclusive for
+               dense matrices.
+            - 'SVDFactorization': Compute projections
+               using SVD factorization. Exclusive for
+               dense matrices.
+
+    orth_tol : float, optional
+        Tolerance for iterative refinements.
+    max_refin : int, optional
+        Maximum number of iterative refinements.
+    tol : float, optional
+        Tolerance for singular values.
+
+    Returns
+    -------
+    Z : LinearOperator, shape (n, n)
+        Null-space operator. For a given vector ``x``,
+        the null-space operator is equivalent to applying
+        the projection matrix ``P = I - A.T inv(A A.T) A``
+        to the vector. It can be shown that this is
+        equivalent to projecting ``x`` onto the null space
+        of A.
+    LS : LinearOperator, shape (m, n)
+        Least-squares operator. For a given vector ``x``,
+        the least-squares operator is equivalent to applying the
+        pseudoinverse matrix ``pinv(A.T) = inv(A A.T) A``
+        to the vector. It can be shown that this vector
+        ``pinv(A.T) x`` is the least-squares solution to
+        ``A.T y = x``.
+    Y : LinearOperator, shape (n, m)
+        Row-space operator. For a given vector ``x``,
+        the row-space operator is equivalent to applying the
+        matrix ``Q = A.T inv(A A.T)``
+        to the vector. It can be shown that this
+        vector ``y = Q x`` is the minimum-norm solution
+        of ``A y = x``.
+
+    Notes
+    -----
+    Uses the iterative refinements described in [1]_
+    during the computation of ``Z`` in order to
+    cope with the possibility of large roundoff errors.
+
+    References
+    ----------
+    .. [1] Gould, Nicholas IM, Mary E. Hribar, and Jorge Nocedal.
+        "On the solution of equality constrained quadratic
+        programming problems arising in optimization."
+        SIAM Journal on Scientific Computing 23.4 (2001): 1376-1395.
+    """
+    m, n = np.shape(A)
+
+    # The factorization of an empty matrix
+    # only works for the sparse representation.
+    if m*n == 0:
+        A = csc_matrix(A)
+
+    # Check Argument
+    if issparse(A):
+        if method is None:
+            method = "AugmentedSystem"
+        if method not in ("NormalEquation", "AugmentedSystem"):
+            raise ValueError("Method not allowed for sparse matrix.")
+        if method == "NormalEquation" and not sksparse_available:
+            warnings.warn("Only accepts 'NormalEquation' option when "
+                          "scikit-sparse is available. Using "
+                          "'AugmentedSystem' option instead.",
+                          ImportWarning, stacklevel=3)
+            method = 'AugmentedSystem'
+    else:
+        if method is None:
+            method = "QRFactorization"
+        if method not in ("QRFactorization", "SVDFactorization"):
+            raise ValueError("Method not allowed for dense array.")
+
+    if method == 'NormalEquation':
+        null_space, least_squares, row_space \
+            = normal_equation_projections(A, m, n, orth_tol, max_refin, tol)
+    elif method == 'AugmentedSystem':
+        null_space, least_squares, row_space \
+            = augmented_system_projections(A, m, n, orth_tol, max_refin, tol)
+    elif method == "QRFactorization":
+        null_space, least_squares, row_space \
+            = qr_factorization_projections(A, m, n, orth_tol, max_refin, tol)
+    elif method == "SVDFactorization":
+        null_space, least_squares, row_space \
+            = svd_factorization_projections(A, m, n, orth_tol, max_refin, tol)
+
+    Z = LinearOperator((n, n), null_space)
+    LS = LinearOperator((m, n), least_squares)
+    Y = LinearOperator((n, m), row_space)
+
+    return Z, LS, Y
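The three operators documented in `projections` can be checked against their explicit dense formulas on a small example. This is a minimal sketch under stated assumptions, not the factorization-based code above: `dense_projections` is a hypothetical helper, it assumes `A` has full row rank so that `A A.T` is invertible, and it forms the inverse explicitly, which is only reasonable for tiny illustrative matrices.

```python
import numpy as np


def dense_projections(A):
    """Hypothetical helper: explicit dense versions of Z, LS and Y.

    Assumes A (shape (m, n)) has full row rank.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    AAt_inv = np.linalg.inv(A @ A.T)
    LS = AAt_inv @ A           # inv(A A.T) A, shape (m, n)
    Y = A.T @ AAt_inv          # A.T inv(A A.T), shape (n, m)
    Z = np.eye(n) - A.T @ LS   # I - A.T inv(A A.T) A, shape (n, n)
    return Z, LS, Y


A = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 1.0, 0.0, 1.0]])
Z, LS, Y = dense_projections(A)

x = np.array([1.0, -2.0, 0.5, 3.0])
b = np.array([1.0, 2.0])

z = Z @ x       # projection of x onto the null space of A
y = Y @ b       # minimum-norm solution of A y = b
lam = LS @ x    # least-squares solution of A.T lam = x

print(np.allclose(A @ z, 0.0))   # z lies in the null space of A
print(np.allclose(A @ y, b))     # y satisfies the constraint
```

The same identities (`A z = 0`, `A y = b`, `Z Z = Z`) are what one would expect the `LinearOperator` objects returned by `projections` to satisfy up to roundoff, which is what the `orthogonality` measure and the iterative refinement loops are there to enforce.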
@@ -0,0 +1,637 @@
+"""Equality-constrained quadratic programming solvers."""
+
+from scipy.sparse import (linalg, bmat, csc_matrix)
+from math import copysign
+import numpy as np
+from numpy.linalg import norm
+
+__all__ = [
+    'eqp_kktfact',
+    'sphere_intersections',
+    'box_intersections',
+    'box_sphere_intersections',
+    'inside_box_boundaries',
+    'modified_dogleg',
+    'projected_cg'
+]
+
+
+# For comparison with the projected CG
+def eqp_kktfact(H, c, A, b):
+    """Solve equality-constrained quadratic programming (EQP) problem.
+
+    Solve ``min 1/2 x.T H x + x.T c`` subject to ``A x + b = 0``
+    using direct factorization of the KKT system.
+
+    Parameters
+    ----------
+    H : sparse matrix, shape (n, n)
+        Hessian matrix of the EQP problem.
+    c : array_like, shape (n,)
+        Gradient of the quadratic objective function.
+    A : sparse matrix
+        Jacobian matrix of the EQP problem.
+    b : array_like, shape (m,)
+        Right-hand side of the constraint equation.
+
+    Returns
+    -------
+    x : array_like, shape (n,)
+        Solution of the KKT problem.
+    lagrange_multipliers : ndarray, shape (m,)
+        Lagrange multipliers of the KKT problem.
+    """
+    n, = np.shape(c)  # Number of parameters
+    m, = np.shape(b)  # Number of constraints
+
+    # Karush-Kuhn-Tucker matrix of coefficients.
+    # Defined as in Nocedal/Wright "Numerical
+    # Optimization" p.452 in Eq. (16.4).
+    kkt_matrix = csc_matrix(bmat([[H, A.T], [A, None]]))
+    # Vector of coefficients.
+    kkt_vec = np.hstack([-c, -b])
+
+    # TODO: Use a symmetric indefinite factorization
+    #       to solve the system twice as fast (because
+    #       of the symmetry).
+    lu = linalg.splu(kkt_matrix)
+    kkt_sol = lu.solve(kkt_vec)
+    x = kkt_sol[:n]
+    lagrange_multipliers = -kkt_sol[n:n+m]
+
+    return x, lagrange_multipliers
+
+
+def sphere_intersections(z, d, trust_radius,
+                         entire_line=False):
+    """Find the intersection between segment (or line) and spherical constraints.
+
+    Find the intersection between the segment (or line) defined by the
+    parametric  equation ``x(t) = z + t*d`` and the ball
+    ``||x|| <= trust_radius``.
+
+    Parameters
+    ----------
+    z : array_like, shape (n,)
+        Initial point.
+    d : array_like, shape (n,)
+        Direction.
+    trust_radius : float
+        Ball radius.
+    entire_line : bool, optional
+        When ``True``, the function returns the intersection between the line
+        ``x(t) = z + t*d`` (``t`` can assume any value) and the ball
+        ``||x|| <= trust_radius``. When ``False``, the function returns the intersection
+        between the segment ``x(t) = z + t*d``, ``0 <= t <= 1``, and the ball.
+
+    Returns
+    -------
+    ta, tb : float
+        The line/segment ``x(t) = z + t*d`` is inside the ball
+        for ``ta <= t <= tb``.
+    intersect : bool
+        When ``True``, there is an intersection between the line/segment
+        and the sphere. On the other hand, when ``False``, there is no
+        intersection.
+    """
+    # Special case when d=0
+    if norm(d) == 0:
+        return 0, 0, False
+    # Check for inf trust_radius
+    if np.isinf(trust_radius):
+        if entire_line:
+            ta = -np.inf
+            tb = np.inf
+        else:
+            ta = 0
+            tb = 1
+        intersect = True
+        return ta, tb, intersect
+
+    a = np.dot(d, d)
+    b = 2 * np.dot(z, d)
+    c = np.dot(z, z) - trust_radius**2
+    discriminant = b*b - 4*a*c
+    if discriminant < 0:
+        intersect = False
+        return 0, 0, intersect
+    sqrt_discriminant = np.sqrt(discriminant)
+
+    # The following calculation is mathematically
+    # equivalent to:
+    # ta = (-b - sqrt_discriminant) / (2*a)
+    # tb = (-b + sqrt_discriminant) / (2*a)
+    # but produces smaller round-off errors.
+    # See Matrix Computations p.97
+    # for a better justification.
+    aux = b + copysign(sqrt_discriminant, b)
+    ta = -aux / (2*a)
+    tb = -2*c / aux
+    ta, tb = sorted([ta, tb])
+
+    if entire_line:
+        intersect = True
+    else:
+        # Check whether the intersection happens
+        # within the segment, i.e. for 0 <= t <= 1.
+        if tb < 0 or ta > 1:
+            intersect = False
+            ta = 0
+            tb = 0
+        else:
+            intersect = True
+            # Restrict intersection interval
+            # between 0 and 1.
+            ta = max(0, ta)
+            tb = min(1, tb)
+
+    return ta, tb, intersect
+
+
+def box_intersections(z, d, lb, ub,
+                      entire_line=False):
+    """Find the intersection between segment (or line) and box constraints.
+
+    Find the intersection between the segment (or line) defined by the
+    parametric  equation ``x(t) = z + t*d`` and the rectangular box
+    ``lb <= x <= ub``.
+
+    Parameters
+    ----------
+    z : array_like, shape (n,)
+        Initial point.
+    d : array_like, shape (n,)
+        Direction.
+    lb : array_like, shape (n,)
+        Lower bounds to each one of the components of ``x``. Used
+        to delimit the rectangular box.
+    ub : array_like, shape (n, )
+        Upper bounds to each one of the components of ``x``. Used
+        to delimit the rectangular box.
+    entire_line : bool, optional
+        When ``True``, the function returns the intersection between the line
+        ``x(t) = z + t*d`` (``t`` can assume any value) and the rectangular
+        box. When ``False``, the function returns the intersection between the segment
+        ``x(t) = z + t*d``, ``0 <= t <= 1``, and the rectangular box.
+
+    Returns
+    -------
+    ta, tb : float
+        The line/segment ``x(t) = z + t*d`` is inside the box
+        for ``ta <= t <= tb``.
+    intersect : bool
+        When ``True``, there is an intersection between the line (or segment)
+        and the rectangular box. On the other hand, when ``False``, there is no
+        intersection.
+    """
+    # Make sure it is a numpy array
+    z = np.asarray(z)
+    d = np.asarray(d)
+    lb = np.asarray(lb)
+    ub = np.asarray(ub)
+    # Special case when d=0
+    if norm(d) == 0:
+        return 0, 0, False
+
+    # Get values for which d==0
+    zero_d = (d == 0)
+    # If the boundaries are not satisfied for some coordinate
+    # for which "d" is zero, there is no box-line intersection.
+    if (z[zero_d] < lb[zero_d]).any() or (z[zero_d] > ub[zero_d]).any():
+        intersect = False
+        return 0, 0, intersect
+    # Remove values for which d is zero
+    not_zero_d = np.logical_not(zero_d)
+    z = z[not_zero_d]
+    d = d[not_zero_d]
+    lb = lb[not_zero_d]
+    ub = ub[not_zero_d]
+
+    # Find a series of intervals (t_lb[i], t_ub[i]).
+    t_lb = (lb-z) / d
+    t_ub = (ub-z) / d
+    # Get the intersection of all those intervals.
+    ta = max(np.minimum(t_lb, t_ub))
+    tb = min(np.maximum(t_lb, t_ub))
+
+    # Check if intersection is feasible
+    if ta <= tb:
+        intersect = True
+    else:
+        intersect = False
+    # Check whether the intersection happens within the segment.
+    if not entire_line:
+        if tb < 0 or ta > 1:
+            intersect = False
+            ta = 0
+            tb = 0
+        else:
+            # Restrict intersection interval between 0 and 1.
+            ta = max(0, ta)
+            tb = min(1, tb)
+
+    return ta, tb, intersect
+
+
+def box_sphere_intersections(z, d, lb, ub, trust_radius,
+                             entire_line=False,
+                             extra_info=False):
+    """Find the intersection between segment (or line) and box/sphere constraints.
+
+    Find the intersection between the segment (or line) defined by the
+    parametric  equation ``x(t) = z + t*d``, the rectangular box
+    ``lb <= x <= ub`` and the ball ``||x|| <= trust_radius``.
+
+    Parameters
+    ----------
+    z : array_like, shape (n,)
+        Initial point.
+    d : array_like, shape (n,)
+        Direction.
+    lb : array_like, shape (n,)
+        Lower bounds to each one of the components of ``x``. Used
+        to delimit the rectangular box.
+    ub : array_like, shape (n, )
+        Upper bounds to each one of the components of ``x``. Used
+        to delimit the rectangular box.
+    trust_radius : float
+        Ball radius.
+    entire_line : bool, optional
+        When ``True``, the function returns the intersection between the line
+        ``x(t) = z + t*d`` (``t`` can assume any value) and the constraints.
+        When ``False``, the function returns the intersection between the segment
+        ``x(t) = z + t*d``, ``0 <= t <= 1`` and the constraints.
+    extra_info : bool, optional
+        When ``True``, the function returns ``intersect_sphere`` and ``intersect_box``.
+
+    Returns
+    -------
+    ta, tb : float
+        The line/segment ``x(t) = z + t*d`` is inside the rectangular box and
+        inside the ball for ``ta <= t <= tb``.
+    intersect : bool
+        When ``True``, there is an intersection between the line (or segment)
+        and both constraints. On the other hand, when ``False``, there is no
+        intersection.
+    sphere_info : dict, optional
+        Dictionary ``{ta, tb, intersect}`` containing the interval ``[ta, tb]``
+        for which the line intersects the ball, and a boolean value indicating
+        whether the sphere is intersected by the line.
+    box_info : dict, optional
+        Dictionary ``{ta, tb, intersect}`` containing the interval ``[ta, tb]``
+        for which the line intersects the box, and a boolean value indicating
+        whether the box is intersected by the line.
+    """
+    ta_b, tb_b, intersect_b = box_intersections(z, d, lb, ub,
+                                                entire_line)
+    ta_s, tb_s, intersect_s = sphere_intersections(z, d,
+                                                   trust_radius,
+                                                   entire_line)
+    ta = np.maximum(ta_b, ta_s)
+    tb = np.minimum(tb_b, tb_s)
+    if intersect_b and intersect_s and ta <= tb:
+        intersect = True
+    else:
+        intersect = False
+
+    if extra_info:
+        sphere_info = {'ta': ta_s, 'tb': tb_s, 'intersect': intersect_s}
+        box_info = {'ta': ta_b, 'tb': tb_b, 'intersect': intersect_b}
+        return ta, tb, intersect, sphere_info, box_info
+    else:
+        return ta, tb, intersect
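
The interval logic documented above can be illustrated with a small stand-alone sketch. It is not the module's code: the name `segment_box_ball_interval` is hypothetical, it uses plain Python lists instead of NumPy arrays, and it only covers the segment case ``0 <= t <= 1``. The idea is that each box coordinate and the ball each restrict ``t`` to an interval, and the result is the overlap of all of them.

```python
import math


def segment_box_ball_interval(z, d, lb, ub, radius):
    """Return (ta, tb, intersect) for x(t) = z + t*d, 0 <= t <= 1.

    Hypothetical, simplified illustration: the segment is inside the box
    ``lb <= x <= ub`` and the ball ``||x|| <= radius`` for ta <= t <= tb.
    """
    ta, tb = 0.0, 1.0

    # Box: every coordinate restricts t to an interval.
    for zi, di, lo, hi in zip(z, d, lb, ub):
        if di == 0:
            # This coordinate never changes; it must already be feasible.
            if not lo <= zi <= hi:
                return ta, tb, False
            continue
        t1 = (lo - zi) / di
        t2 = (hi - zi) / di
        ta = max(ta, min(t1, t2))
        tb = min(tb, max(t1, t2))

    # Ball: solve ||z + t*d||^2 = radius^2, a quadratic in t.
    a = sum(di * di for di in d)
    b = 2 * sum(zi * di for zi, di in zip(z, d))
    c = sum(zi * zi for zi in z) - radius ** 2
    if a == 0:
        # Degenerate direction: the "segment" is the single point z.
        return ta, tb, c <= 0 and ta <= tb
    disc = b * b - 4 * a * c
    if disc < 0:
        return ta, tb, False
    sqrt_disc = math.sqrt(disc)
    ta = max(ta, (-b - sqrt_disc) / (2 * a))
    tb = min(tb, (-b + sqrt_disc) / (2 * a))
    return ta, tb, ta <= tb


# The box cuts the segment at t = 0.5, before the ball does at t = 0.75.
result = segment_box_ball_interval([0.0, 0.0], [2.0, 0.0],
                                   [-1.0, -1.0], [1.0, 1.0], 1.5)
```

Infinite bounds (``-math.inf`` / ``math.inf``) work in this sketch because the corresponding interval endpoints simply become infinite and never tighten ``[ta, tb]``.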
+
+
+def inside_box_boundaries(x, lb, ub):
+    """Check if lb <= x <= ub."""
+    return (lb <= x).all() and (x <= ub).all()
+
+
+def reinforce_box_boundaries(x, lb, ub):
+    """Return ``x`` clipped to the box ``lb <= x <= ub``."""
+    return np.minimum(np.maximum(x, lb), ub)
+
+
+def modified_dogleg(A, Y, b, trust_radius, lb, ub):
+    """Approximately minimize ``1/2*|| A x + b ||^2`` inside trust-region.
+
+    Approximately solve the problem of minimizing ``1/2*|| A x + b ||^2``
+    subject to ``||x|| <= trust_radius`` and ``lb <= x <= ub`` using a
+    modification of the classical dogleg approach.
+
+    Parameters
+    ----------
+    A : LinearOperator (or sparse matrix or ndarray), shape (m, n)
+        Matrix ``A`` in the minimization problem. It should have
+        dimension ``(m, n)`` such that ``m < n``.
+    Y : LinearOperator (or sparse matrix or ndarray), shape (n, m)
+        LinearOperator that applies the projection matrix
+        ``Q = A.T inv(A A.T)`` to the vector. The obtained vector
+        ``y = Q x`` is the minimum norm solution of ``A y = x``.
+    b : array_like, shape (m,)
+        Vector ``b`` in the minimization problem.
+    trust_radius : float
+        Trust radius to be considered. Delimits a sphere boundary
+        to the problem.
+    lb : array_like, shape (n,)
+        Lower bounds to each one of the components of ``x``.
+        It is expected that ``lb <= 0``, otherwise the algorithm
+        may fail. If ``lb[i] = -Inf``, the lower
+        bound for the ith component is just ignored.
+    ub : array_like, shape (n,)
+        Upper bounds to each one of the components of ``x``.
+        It is expected that ``ub >= 0``, otherwise the algorithm
+        may fail. If ``ub[i] = Inf``, the upper bound for the ith
+        component is just ignored.
+
+    Returns
+    -------
+    x : array_like, shape (n,)
+        Solution to the problem.
+
+    Notes
+    -----
+    Based on the implementation described on pp. 885-886 of [1]_.
+
+    References
+    ----------
+    .. [1] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal.
+           "An interior point algorithm for large-scale nonlinear
+           programming." SIAM Journal on Optimization 9.4 (1999): 877-900.
+    """
+    # Compute minimum norm minimizer of 1/2*|| A x + b ||^2.
+    newton_point = -Y.dot(b)
+    # Check for interior point
+    if inside_box_boundaries(newton_point, lb, ub)  \
+       and norm(newton_point) <= trust_radius:
+        x = newton_point
+        return x
+
+    # Compute gradient vector ``g = A.T b``
+    g = A.T.dot(b)
+    # Compute Cauchy point
+    # ``cauchy_point = -(g.T g / (g.T A.T A g)) g``.
+    A_g = A.dot(g)
+    cauchy_point = -np.dot(g, g) / np.dot(A_g, A_g) * g
+    # Origin
+    origin_point = np.zeros_like(cauchy_point)
+
+    # Check the segment between cauchy_point and newton_point
+    # for a possible solution.
+    z = cauchy_point
+    p = newton_point - cauchy_point
+    _, alpha, intersect = box_sphere_intersections(z, p, lb, ub,
+                                                   trust_radius)
+    if intersect:
+        x1 = z + alpha*p
+    else:
+        # Check the segment between the origin and cauchy_point
+        # for a possible solution.
+        z = origin_point
+        p = cauchy_point
+        _, alpha, _ = box_sphere_intersections(z, p, lb, ub,
+                                               trust_radius)
+        x1 = z + alpha*p
+
+    # Check the segment between origin and newton_point
+    # for a possible solution.
+    z = origin_point
+    p = newton_point
+    _, alpha, _ = box_sphere_intersections(z, p, lb, ub,
+                                           trust_radius)
+    x2 = z + alpha*p
+
+    # Return the best solution among x1 and x2.
+    if norm(A.dot(x1) + b) < norm(A.dot(x2) + b):
+        return x1
+    else:
+        return x2
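
As a worked illustration of the Cauchy point used above: along the steepest-descent direction ``-g`` with ``g = A.T b``, the objective ``1/2*||A x + b||^2`` is minimized at step length ``g.T g / (g.T A.T A g)``. The sketch below is a stand-alone check in plain Python (no NumPy), with hypothetical helper names and an arbitrary small ``A`` and ``b`` chosen for the example; it assumes ``g`` is nonzero. At the Cauchy point, the residual ``A x + b`` should be orthogonal to ``A g``.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def cauchy_point(A, b):
    """Minimizer of 1/2*||A x + b||^2 along the direction -A.T b."""
    g = matvec(transpose(A), b)          # gradient at x = 0
    A_g = matvec(A, g)
    step = dot(g, g) / dot(A_g, A_g)     # exact line-search step length
    return [-step * gi for gi in g], A_g


# Illustrative data only (m = 2 constraints, n = 3 variables).
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 2.0]

x_c, A_g = cauchy_point(A, b)
residual = [ri + bi for ri, bi in zip(matvec(A, x_c), b)]
# Optimality along the line: the directional derivative vanishes.
directional_derivative = dot(A_g, residual)
```

Here ``g = [1, 2, 3]`` and ``A g = [4, 5]``, so the step length is ``14/41`` and ``directional_derivative`` is zero up to rounding.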
+
+
+def projected_cg(H, c, Z, Y, b, trust_radius=np.inf,
+                 lb=None, ub=None, tol=None,
+                 max_iter=None, max_infeasible_iter=None,
+                 return_all=False):
+    """Solve EQP problem with projected CG method.
+
+    Solve equality-constrained quadratic programming problem
+    ``min 1/2 x.T H x + x.T c`` subject to ``A x + b = 0`` and,
+    possibly, to trust region constraints ``||x|| < trust_radius``
+    and box constraints ``lb <= x <= ub``.
+
+    Parameters
+    ----------
+    H : LinearOperator (or sparse matrix or ndarray), shape (n, n)
+        Operator for computing ``H v``.
+    c : array_like, shape (n,)
+        Gradient of the quadratic objective function.
+    Z : LinearOperator (or sparse matrix or ndarray), shape (n, n)
+        Operator for projecting ``x`` into the null space of A.
+    Y : LinearOperator (or sparse matrix or ndarray), shape (n, m)
+        Operator that, for a given vector ``b``, computes the smallest
+        norm solution of ``A x + b = 0``.
+    b : array_like, shape (m,)
+        Right-hand side of the constraint equation.
+    trust_radius : float, optional
+        Trust radius to be considered. By default, uses ``trust_radius=inf``,
+        which means no trust radius at all.
+    lb : array_like, shape (n,), optional
+        Lower bounds to each one of the components of ``x``.
+        If ``lb[i] = -Inf`` the lower bound for the i-th
+        component is just ignored (default).
+    ub : array_like, shape (n,), optional
+        Upper bounds to each one of the components of ``x``.
+        If ``ub[i] = Inf`` the upper bound for the i-th
+        component is just ignored (default).
+    tol : float, optional
+        Tolerance used to interrupt the algorithm.
+    max_iter : int, optional
+        Maximum number of algorithm iterations, where ``max_iter <= n-m``.
+        By default, uses ``max_iter = n-m``.
+    max_infeasible_iter : int, optional
+        Maximum infeasible (regarding box constraints) iterations the
+        algorithm is allowed to take.
+        By default, uses ``max_infeasible_iter = n-m``.
+    return_all : bool, optional
+        When ``True``, return the list of all vectors through the iterations.
+
+    Returns
+    -------
+    x : array_like, shape (n,)
+        Solution of the EQP problem.
+    info : dict
+        Dictionary containing the following:
+
+            - niter : Number of iterations.
+            - stop_cond : Reason for algorithm termination:
+                1. Iteration limit was reached;
+                2. Reached the trust-region boundary;
+                3. Negative curvature detected;
+                4. Tolerance was satisfied.
+            - allvecs : List containing all intermediary vectors (optional).
+            - hits_boundary : True if the proposed step is on the boundary
+              of the trust region.
+
+    Notes
+    -----
+    Implementation of Algorithm 6.2 in [1]_.
+
+    In the absence of spherical and box constraints, for sufficient
+    iterations, the method returns a truly optimal result.
+    In the presence of those constraints, the value returned is only
+    an inexpensive approximation of the optimal value.
+
+    References
+    ----------
+    .. [1] Gould, Nicholas IM, Mary E. Hribar, and Jorge Nocedal.
+           "On the solution of equality constrained quadratic
+            programming problems arising in optimization."
+            SIAM Journal on Scientific Computing 23.4 (2001): 1376-1395.
+    """
+    CLOSE_TO_ZERO = 1e-25
+
+    n, = np.shape(c)  # Number of parameters
+    m, = np.shape(b)  # Number of constraints
+
+    # Initial Values
+    x = Y.dot(-b)
+    r = Z.dot(H.dot(x) + c)
+    g = Z.dot(r)
+    p = -g
+
+    # Store ``x`` value
+    if return_all:
+        allvecs = [x]
+    # Values for the first iteration
+    H_p = H.dot(p)
+    rt_g = norm(g)**2  # g.T g = r.T Z g = r.T g (ref [1]_ p.1389)
+
+    # If ||x|| > trust_radius the problem does not have a solution.
+    tr_distance = trust_radius - norm(x)
+    if tr_distance < 0:
+        raise ValueError("Trust region problem does not have a solution.")
+    # If ||x|| == trust_radius, then x is the solution
+    # to the optimization problem, since x is the
+    # minimum norm solution to A x + b = 0.
+    elif tr_distance < CLOSE_TO_ZERO:
+        info = {'niter': 0, 'stop_cond': 2, 'hits_boundary': True}
+        if return_all:
+            allvecs.append(x)
+            info['allvecs'] = allvecs
+        return x, info
+
+    # Set default tolerance
+    if tol is None:
+        tol = max(min(0.01 * np.sqrt(rt_g), 0.1 * rt_g), CLOSE_TO_ZERO)
+    # Set default lower and upper bounds
+    if lb is None:
+        lb = np.full(n, -np.inf)
+    if ub is None:
+        ub = np.full(n, np.inf)
+    # Set maximum iterations
+    if max_iter is None:
+        max_iter = n-m
+    max_iter = min(max_iter, n-m)
+    # Set maximum infeasible iterations
+    if max_infeasible_iter is None:
+        max_infeasible_iter = n-m
+
+    hits_boundary = False
+    stop_cond = 1
+    counter = 0
+    last_feasible_x = np.zeros_like(x)
+    k = 0
+    for i in range(max_iter):
+        # Stop criteria - Tolerance : r.T g < tol
+        if rt_g < tol:
+            stop_cond = 4
+            break
+        k += 1
+        # Compute curvature
+        pt_H_p = H_p.dot(p)
+        # Stop criteria - Negative curvature
+        if pt_H_p <= 0:
+            if np.isinf(trust_radius):
+                raise ValueError("Negative curvature not allowed "
+                                 "for unrestricted problems.")
+            else:
+                # Find intersection with constraints
+                _, alpha, intersect = box_sphere_intersections(
+                    x, p, lb, ub, trust_radius, entire_line=True)
+                # Update solution
+                if intersect:
+                    x = x + alpha*p
+                # Ensure variables are inside box constraints.
+                # This is only necessary because of roundoff errors.
+                x = reinforce_box_boundaries(x, lb, ub)
+                # Attribute information
+                stop_cond = 3
+                hits_boundary = True
+                break
+
+        # Get next step
+        alpha = rt_g / pt_H_p
+        x_next = x + alpha*p
+
+        # Stop criteria - Hits boundary
+        if np.linalg.norm(x_next) >= trust_radius:
+            # Find intersection with box and sphere constraints
+            _, theta, intersect = box_sphere_intersections(x, alpha*p, lb, ub,
+                                                           trust_radius)
+            # Update solution
+            if intersect:
+                x = x + theta*alpha*p
+            # Ensure variables are inside box constraints.
+            # This is only necessary because of roundoff errors.
+            x = reinforce_box_boundaries(x, lb, ub)
+            # Attribute information
+            stop_cond = 2
+            hits_boundary = True
+            break
+
+        # Check if ``x_next`` is inside the box; count it as infeasible if not.
+        if inside_box_boundaries(x_next, lb, ub):
+            counter = 0
+        else:
+            counter += 1
+        # Whenever outside box constraints keep looking for intersections.
+        if counter > 0:
+            _, theta, intersect = box_sphere_intersections(x, alpha*p, lb, ub,
+                                                           trust_radius)
+            if intersect:
+                last_feasible_x = x + theta*alpha*p
+                # Ensure variables are inside box constraints.
+                # This is only necessary because of roundoff errors.
+                last_feasible_x = reinforce_box_boundaries(last_feasible_x,
+                                                           lb, ub)
+                counter = 0
+        # Stop after too many infeasible (regarding box constraints) iterations.
+        if counter > max_infeasible_iter:
+            break
+        # Store ``x_next`` value
+        if return_all:
+            allvecs.append(x_next)
+
+        # Update residual
+        r_next = r + alpha*H_p
+        # Project residual g+ = Z r+
+        g_next = Z.dot(r_next)
+        # Compute conjugate direction step d
+        rt_g_next = norm(g_next)**2  # g.T g = r.T g (ref [1]_ p.1389)
+        beta = rt_g_next / rt_g
+        p = - g_next + beta*p
+        # Prepare for next iteration
+        x = x_next
+        g = g_next
+        r = g_next
+        rt_g = norm(g)**2  # g.T g = r.T Z g = r.T g (ref [1]_ p.1389)
+        H_p = H.dot(p)
+
+    if not inside_box_boundaries(x, lb, ub):
+        x = last_feasible_x
+        hits_boundary = True
+    info = {'niter': k, 'stop_cond': stop_cond,
+            'hits_boundary': hits_boundary}
+    if return_all:
+        info['allvecs'] = allvecs
+    return x, info
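
The docstring note that, without trust-region and box constraints, enough iterations give the exact minimizer can be illustrated with plain conjugate gradients. The sketch below is a deliberately reduced, hypothetical variant (`plain_cg` is not part of the module): it assumes there are no equality constraints, so the projection ``Z`` is the identity and the starting point is zero, and it uses plain Python lists. It keeps the same update formulas for ``alpha``, ``beta`` and ``p`` as the loop above.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def plain_cg(H, c, tol=1e-20):
    """Minimize 1/2 x.T H x + x.T c by CG (no constraints, Z = identity)."""
    n = len(c)
    x = [0.0] * n
    g = list(c)                      # gradient H x + c at x = 0
    p = [-gi for gi in g]
    rt_g = dot(g, g)
    for _ in range(n):               # at most n steps in exact arithmetic
        if rt_g < tol:
            break
        H_p = matvec(H, p)
        pt_H_p = dot(p, H_p)
        if pt_H_p <= 0:
            raise ValueError("Negative curvature not allowed "
                             "for unrestricted problems.")
        alpha = rt_g / pt_H_p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        g = [gi + alpha * hi for gi, hi in zip(g, H_p)]
        rt_g_next = dot(g, g)
        beta = rt_g_next / rt_g
        p = [-gi + beta * pi for gi, pi in zip(g, p)]
        rt_g = rt_g_next
    return x


# Illustrative 2x2 positive definite problem; the minimizer solves H x = -c.
H = [[4.0, 1.0],
     [1.0, 3.0]]
c = [-1.0, -2.0]
x_min = plain_cg(H, c)
```

For this data the exact minimizer is ``[1/11, 7/11]``, which the sketch reaches after two iterations up to rounding.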
@@ -0,0 +1,51 @@
+"""Progress report printers."""
+
+from __future__ import annotations
+
+class ReportBase:
+    COLUMN_NAMES: list[str] = NotImplemented
+    COLUMN_WIDTHS: list[int] = NotImplemented
+    ITERATION_FORMATS: list[str] = NotImplemented
+
+    @classmethod
+    def print_header(cls):
+        fmt = ("|"
+               + "|".join([f"{{:^{x}}}" for x in cls.COLUMN_WIDTHS])
+               + "|")
+        separators = ['-' * x for x in cls.COLUMN_WIDTHS]
+        print(fmt.format(*cls.COLUMN_NAMES))
+        print(fmt.format(*separators))
+
+    @classmethod
+    def print_iteration(cls, *args):
+        iteration_format = [f"{{:{x}}}" for x in cls.ITERATION_FORMATS]
+        fmt = "|" + "|".join(iteration_format) + "|"
+        print(fmt.format(*args))
+
+    @classmethod
+    def print_footer(cls):
+        print()
+
+
+class BasicReport(ReportBase):
+    COLUMN_NAMES = ["niter", "f evals", "CG iter", "obj func", "tr radius",
+                    "opt", "c viol"]
+    COLUMN_WIDTHS = [7, 7, 7, 13, 10, 10, 10]
+    ITERATION_FORMATS = ["^7", "^7", "^7", "^+13.4e",
+                         "^10.2e", "^10.2e", "^10.2e"]
+
+
+class SQPReport(ReportBase):
+    COLUMN_NAMES = ["niter", "f evals", "CG iter", "obj func", "tr radius",
+                    "opt", "c viol", "penalty", "CG stop"]
+    COLUMN_WIDTHS = [7, 7, 7, 13, 10, 10, 10, 10, 7]
+    ITERATION_FORMATS = ["^7", "^7", "^7", "^+13.4e", "^10.2e", "^10.2e",
+                         "^10.2e", "^10.2e", "^7"]
+
+
+class IPReport(ReportBase):
+    COLUMN_NAMES = ["niter", "f evals", "CG iter", "obj func", "tr radius",
+                    "opt", "c viol", "penalty", "barrier param", "CG stop"]
+    COLUMN_WIDTHS = [7, 7, 7, 13, 10, 10, 10, 10, 13, 7]
+    ITERATION_FORMATS = ["^7", "^7", "^7", "^+13.4e", "^10.2e", "^10.2e",
+                         "^10.2e", "^10.2e", "^13.2e", "^7"]
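
A short sketch of how the report classes build their rows. It mirrors the pattern of ``ReportBase`` in a hypothetical two-column class, `MiniReport`, that returns strings instead of printing them so the result can be inspected; the column choice is for illustration only.

```python
class MiniReport:
    """Two-column stand-in following the ReportBase formatting pattern."""

    COLUMN_NAMES = ["niter", "obj func"]
    COLUMN_WIDTHS = [7, 13]
    ITERATION_FORMATS = ["^7", "^+13.4e"]

    @classmethod
    def header_lines(cls):
        # One centered field per column, e.g. "|{:^7}|{:^13}|".
        fmt = "|" + "|".join(f"{{:^{w}}}" for w in cls.COLUMN_WIDTHS) + "|"
        separators = ["-" * w for w in cls.COLUMN_WIDTHS]
        return [fmt.format(*cls.COLUMN_NAMES), fmt.format(*separators)]

    @classmethod
    def iteration_line(cls, *args):
        # Each column uses its own format spec, e.g. "|{:^7}|{:^+13.4e}|".
        fmt = "|" + "|".join(f"{{:{s}}}" for s in cls.ITERATION_FORMATS) + "|"
        return fmt.format(*args)


header, rule = MiniReport.header_lines()
row = MiniReport.iteration_line(3, 1.5)
```

Because every format spec carries the same width as its column, header, separator and iteration rows all have the same length and line up when printed one after another.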
@@ -0,0 +1,296 @@
+import numpy as np
+from numpy.testing import assert_array_equal, assert_equal
+from scipy.optimize._constraints import (NonlinearConstraint, Bounds,
+                                         PreparedConstraint)
+from scipy.optimize._trustregion_constr.canonical_constraint \
+    import CanonicalConstraint, initial_constraints_as_canonical
+
+
+def create_quadratic_function(n, m, rng):
+    a = rng.rand(m)
+    A = rng.rand(m, n)
+    H = rng.rand(m, n, n)
+    HT = np.transpose(H, (1, 2, 0))
+
+    def fun(x):
+        return a + A.dot(x) + 0.5 * H.dot(x).dot(x)
+
+    def jac(x):
+        return A + H.dot(x)
+
+    def hess(x, v):
+        return HT.dot(v)
+
+    return fun, jac, hess
+
+
+def test_bounds_cases():
+    # Test 1: no constraints.
+    user_constraint = Bounds(-np.inf, np.inf)
+    x0 = np.array([-1, 2])
+    prepared_constraint = PreparedConstraint(user_constraint, x0, False)
+    c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
+
+    assert_equal(c.n_eq, 0)
+    assert_equal(c.n_ineq, 0)
+
+    c_eq, c_ineq = c.fun(x0)
+    assert_array_equal(c_eq, [])
+    assert_array_equal(c_ineq, [])
+
+    J_eq, J_ineq = c.jac(x0)
+    assert_array_equal(J_eq, np.empty((0, 2)))
+    assert_array_equal(J_ineq, np.empty((0, 2)))
+
+    assert_array_equal(c.keep_feasible, [])
+
+    # Test 2: infinite lower bound.
+    user_constraint = Bounds(-np.inf, [0, np.inf, 1], [False, True, True])
+    x0 = np.array([-1, -2, -3], dtype=float)
+    prepared_constraint = PreparedConstraint(user_constraint, x0, False)
+    c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
+
+    assert_equal(c.n_eq, 0)
+    assert_equal(c.n_ineq, 2)
+
+    c_eq, c_ineq = c.fun(x0)
+    assert_array_equal(c_eq, [])
+    assert_array_equal(c_ineq, [-1, -4])
+
+    J_eq, J_ineq = c.jac(x0)
+    assert_array_equal(J_eq, np.empty((0, 3)))
+    assert_array_equal(J_ineq, np.array([[1, 0, 0], [0, 0, 1]]))
+
+    assert_array_equal(c.keep_feasible, [False, True])
+
+    # Test 3: infinite upper bound.
+    user_constraint = Bounds([0, 1, -np.inf], np.inf, [True, False, True])
+    x0 = np.array([1, 2, 3], dtype=float)
+    prepared_constraint = PreparedConstraint(user_constraint, x0, False)
+    c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
+
+    assert_equal(c.n_eq, 0)
+    assert_equal(c.n_ineq, 2)
+
+    c_eq, c_ineq = c.fun(x0)
+    assert_array_equal(c_eq, [])
+    assert_array_equal(c_ineq, [-1, -1])
+
+    J_eq, J_ineq = c.jac(x0)
+    assert_array_equal(J_eq, np.empty((0, 3)))
+    assert_array_equal(J_ineq, np.array([[-1, 0, 0], [0, -1, 0]]))
+
+    assert_array_equal(c.keep_feasible, [True, False])
+
+    # Test 4: interval constraint.
+    user_constraint = Bounds([-1, -np.inf, 2, 3], [1, np.inf, 10, 3],
+                             [False, True, True, True])
+    x0 = np.array([0, 10, 8, 5])
+    prepared_constraint = PreparedConstraint(user_constraint, x0, False)
+    c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
+
+    assert_equal(c.n_eq, 1)
+    assert_equal(c.n_ineq, 4)
+
+    c_eq, c_ineq = c.fun(x0)
+    assert_array_equal(c_eq, [2])
+    assert_array_equal(c_ineq, [-1, -2, -1, -6])
+
+    J_eq, J_ineq = c.jac(x0)
+    assert_array_equal(J_eq, [[0, 0, 0, 1]])
+    assert_array_equal(J_ineq, [[1, 0, 0, 0],
+                                [0, 0, 1, 0],
+                                [-1, 0, 0, 0],
+                                [0, 0, -1, 0]])
+
+    assert_array_equal(c.keep_feasible, [False, True, False, True])
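
The expected values in `test_bounds_cases` follow a fixed ordering that can be reproduced by hand. The sketch below is a hypothetical pure-Python helper (`bounds_to_canonical` is not the library API) inferred from the assertions above: components with ``lb == ub`` become equalities ``x - lb = 0``; inequalities are written as ``c_ineq <= 0`` and ordered as upper-bound-only, lower-bound-only, then interval constraints (upper parts first, lower parts second).

```python
import math


def bounds_to_canonical(x, lb, ub):
    """Return (c_eq, c_ineq) for lb <= x <= ub with c_eq = 0, c_ineq <= 0."""
    c_eq, less, greater = [], [], []
    interval_upper, interval_lower = [], []
    for xi, lo, hi in zip(x, lb, ub):
        if lo == hi:
            c_eq.append(xi - lo)            # equality constraint
        elif math.isinf(lo) and math.isinf(hi):
            continue                        # unconstrained component
        elif math.isinf(lo):
            less.append(xi - hi)            # x <= ub
        elif math.isinf(hi):
            greater.append(lo - xi)         # x >= lb
        else:
            interval_upper.append(xi - hi)  # lb <= x <= ub, upper part
            interval_lower.append(lo - xi)  # lb <= x <= ub, lower part
    return c_eq, less + greater + interval_upper + interval_lower


inf = math.inf
# Same data as "Test 4: interval constraint" above.
c_eq, c_ineq = bounds_to_canonical([0, 10, 8, 5],
                                   [-1, -inf, 2, 3],
                                   [1, inf, 10, 3])
```

With the data of Test 4 this gives ``c_eq == [2]`` and ``c_ineq == [-1, -2, -1, -6]``, matching the assertions in the test.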
+
+
+def test_nonlinear_constraint():
+    n = 3
+    m = 5
+    rng = np.random.RandomState(0)
+    x0 = rng.rand(n)
+
+    fun, jac, hess = create_quadratic_function(n, m, rng)
+    f = fun(x0)
+    J = jac(x0)
+
+    lb = [-10, 3, -np.inf, -np.inf, -5]
+    ub = [10, 3, np.inf, 3, np.inf]
+    user_constraint = NonlinearConstraint(
+        fun, lb, ub, jac, hess, [True, False, False, True, False])
+
+    for sparse_jacobian in [False, True]:
+        prepared_constraint = PreparedConstraint(user_constraint, x0,
+                                                 sparse_jacobian)
+        c = CanonicalConstraint.from_PreparedConstraint(prepared_constraint)
+
+        assert_array_equal(c.n_eq, 1)
+        assert_array_equal(c.n_ineq, 4)
+
+        c_eq, c_ineq = c.fun(x0)
+        assert_array_equal(c_eq, [f[1] - lb[1]])
+        assert_array_equal(c_ineq, [f[3] - ub[3], lb[4] - f[4],
+                                    f[0] - ub[0], lb[0] - f[0]])
+
+        J_eq, J_ineq = c.jac(x0)
+        if sparse_jacobian:
+            J_eq = J_eq.toarray()
+            J_ineq = J_ineq.toarray()
+
+        assert_array_equal(J_eq, J[1, None])
+        assert_array_equal(J_ineq, np.vstack((J[3], -J[4], J[0], -J[0])))
+
+        v_eq = rng.rand(c.n_eq)
+        v_ineq = rng.rand(c.n_ineq)
+        v = np.zeros(m)
+        v[1] = v_eq[0]
+        v[3] = v_ineq[0]
+        v[4] = -v_ineq[1]
+        v[0] = v_ineq[2] - v_ineq[3]
+        assert_array_equal(c.hess(x0, v_eq, v_ineq), hess(x0, v))
+
+        assert_array_equal(c.keep_feasible, [True, False, True, True])
+
+
+def test_concatenation():
+    rng = np.random.RandomState(0)
+    n = 4
+    x0 = rng.rand(n)
+
+    f1 = x0
+    J1 = np.eye(n)
+    lb1 = [-1, -np.inf, -2, 3]
+    ub1 = [1, np.inf, np.inf, 3]
+    bounds = Bounds(lb1, ub1, [False, False, True, False])
+
+    fun, jac, hess = create_quadratic_function(n, 5, rng)
+    f2 = fun(x0)
+    J2 = jac(x0)
+    lb2 = [-10, 3, -np.inf, -np.inf, -5]
+    ub2 = [10, 3, np.inf, 5, np.inf]
+    nonlinear = NonlinearConstraint(
+        fun, lb2, ub2, jac, hess, [True, False, False, True, False])
+
+    for sparse_jacobian in [False, True]:
+        bounds_prepared = PreparedConstraint(bounds, x0, sparse_jacobian)
+        nonlinear_prepared = PreparedConstraint(nonlinear, x0, sparse_jacobian)
+
+        c1 = CanonicalConstraint.from_PreparedConstraint(bounds_prepared)
+        c2 = CanonicalConstraint.from_PreparedConstraint(nonlinear_prepared)
+        c = CanonicalConstraint.concatenate([c1, c2], sparse_jacobian)
+
+        assert_equal(c.n_eq, 2)
+        assert_equal(c.n_ineq, 7)
+
+        c_eq, c_ineq = c.fun(x0)
+        assert_array_equal(c_eq, [f1[3] - lb1[3], f2[1] - lb2[1]])
+        assert_array_equal(c_ineq, [lb1[2] - f1[2], f1[0] - ub1[0],
+                                    lb1[0] - f1[0], f2[3] - ub2[3],
+                                    lb2[4] - f2[4], f2[0] - ub2[0],
+                                    lb2[0] - f2[0]])
+
+        J_eq, J_ineq = c.jac(x0)
+        if sparse_jacobian:
+            J_eq = J_eq.toarray()
+            J_ineq = J_ineq.toarray()
+
+        assert_array_equal(J_eq, np.vstack((J1[3], J2[1])))
+        assert_array_equal(J_ineq, np.vstack((-J1[2], J1[0], -J1[0], J2[3],
+                                              -J2[4], J2[0], -J2[0])))
+
+        v_eq = rng.rand(c.n_eq)
+        v_ineq = rng.rand(c.n_ineq)
+        v = np.zeros(5)
+        v[1] = v_eq[1]
+        v[3] = v_ineq[3]
+        v[4] = -v_ineq[4]
+        v[0] = v_ineq[5] - v_ineq[6]
+        H = c.hess(x0, v_eq, v_ineq).dot(np.eye(n))
+        assert_array_equal(H, hess(x0, v))
+
+        assert_array_equal(c.keep_feasible,
+                           [True, False, False, True, False, True, True])
+
+
+def test_empty():
+    x = np.array([1, 2, 3])
+    c = CanonicalConstraint.empty(3)
+    assert_equal(c.n_eq, 0)
+    assert_equal(c.n_ineq, 0)
+
+    c_eq, c_ineq = c.fun(x)
+    assert_array_equal(c_eq, [])
+    assert_array_equal(c_ineq, [])
+
+    J_eq, J_ineq = c.jac(x)
+    assert_array_equal(J_eq, np.empty((0, 3)))
+    assert_array_equal(J_ineq, np.empty((0, 3)))
+
+    H = c.hess(x, None, None).toarray()
+    assert_array_equal(H, np.zeros((3, 3)))
+
+
+def test_initial_constraints_as_canonical():
+    # rng is only used to generate the coefficients of the quadratic
+    # function that is used by the nonlinear constraint.
+    rng = np.random.RandomState(0)
+
+    x0 = np.array([0.5, 0.4, 0.3, 0.2])
+    n = len(x0)
+
+    lb1 = [-1, -np.inf, -2, 3]
+    ub1 = [1, np.inf, np.inf, 3]
+    bounds = Bounds(lb1, ub1, [False, False, True, False])
+
+    fun, jac, hess = create_quadratic_function(n, 5, rng)
+    lb2 = [-10, 3, -np.inf, -np.inf, -5]
+    ub2 = [10, 3, np.inf, 5, np.inf]
+    nonlinear = NonlinearConstraint(
+        fun, lb2, ub2, jac, hess, [True, False, False, True, False])
+
+    for sparse_jacobian in [False, True]:
+        bounds_prepared = PreparedConstraint(bounds, x0, sparse_jacobian)
+        nonlinear_prepared = PreparedConstraint(nonlinear, x0, sparse_jacobian)
+
+        f1 = bounds_prepared.fun.f
+        J1 = bounds_prepared.fun.J
+        f2 = nonlinear_prepared.fun.f
+        J2 = nonlinear_prepared.fun.J
+
+        c_eq, c_ineq, J_eq, J_ineq = initial_constraints_as_canonical(
+            n, [bounds_prepared, nonlinear_prepared], sparse_jacobian)
+
+        assert_array_equal(c_eq, [f1[3] - lb1[3], f2[1] - lb2[1]])
+        assert_array_equal(c_ineq, [lb1[2] - f1[2], f1[0] - ub1[0],
+                                    lb1[0] - f1[0], f2[3] - ub2[3],
+                                    lb2[4] - f2[4], f2[0] - ub2[0],
+                                    lb2[0] - f2[0]])
+
+        if sparse_jacobian:
+            J1 = J1.toarray()
+            J2 = J2.toarray()
+            J_eq = J_eq.toarray()
+            J_ineq = J_ineq.toarray()
+
+        assert_array_equal(J_eq, np.vstack((J1[3], J2[1])))
+        assert_array_equal(J_ineq, np.vstack((-J1[2], J1[0], -J1[0], J2[3],
+                                              -J2[4], J2[0], -J2[0])))
+
+
+def test_initial_constraints_as_canonical_empty():
+    n = 3
+    for sparse_jacobian in [False, True]:
+        c_eq, c_ineq, J_eq, J_ineq = initial_constraints_as_canonical(
+            n, [], sparse_jacobian)
+
+        assert_array_equal(c_eq, [])
+        assert_array_equal(c_ineq, [])
+
+        if sparse_jacobian:
+            J_eq = J_eq.toarray()
+            J_ineq = J_ineq.toarray()
+
+        assert_array_equal(J_eq, np.empty((0, n)))
+        assert_array_equal(J_ineq, np.empty((0, n)))
@@ -0,0 +1,214 @@
+import numpy as np
+import scipy.linalg
+from scipy.sparse import csc_matrix
+from scipy.optimize._trustregion_constr.projections \
+    import projections, orthogonality
+from numpy.testing import (TestCase, assert_array_almost_equal,
+                           assert_equal, assert_allclose)
+
+try:
+    from sksparse.cholmod import cholesky_AAt  # noqa: F401
+    sksparse_available = True
+    available_sparse_methods = ("NormalEquation", "AugmentedSystem")
+except ImportError:
+    sksparse_available = False
+    available_sparse_methods = ("AugmentedSystem",)
+available_dense_methods = ('QRFactorization', 'SVDFactorization')
+
+
+class TestProjections(TestCase):
+
+    def test_nullspace_and_least_squares_sparse(self):
+        A_dense = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
+                            [0, 8, 7, 0, 1, 5, 9, 0],
+                            [1, 0, 0, 0, 0, 1, 2, 3]])
+        At_dense = A_dense.T
+        A = csc_matrix(A_dense)
+        test_points = ([1, 2, 3, 4, 5, 6, 7, 8],
+                       [1, 10, 3, 0, 1, 6, 7, 8],
+                       [1.12, 10, 0, 0, 100000, 6, 0.7, 8])
+
+        for method in available_sparse_methods:
+            Z, LS, _ = projections(A, method)
+            for z in test_points:
+                # Test if x is in the null_space
+                x = Z.matvec(z)
+                assert_array_almost_equal(A.dot(x), 0)
+                # Test orthogonality
+                assert_array_almost_equal(orthogonality(A, x), 0)
+                # Test if x is the least squares solution
+                x = LS.matvec(z)
+                x2 = scipy.linalg.lstsq(At_dense, z)[0]
+                assert_array_almost_equal(x, x2)
+
+    def test_iterative_refinements_sparse(self):
+        A_dense = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
+                            [0, 8, 7, 0, 1, 5, 9, 0],
+                            [1, 0, 0, 0, 0, 1, 2, 3]])
+        A = csc_matrix(A_dense)
+        test_points = ([1, 2, 3, 4, 5, 6, 7, 8],
+                       [1, 10, 3, 0, 1, 6, 7, 8],
+                       [1.12, 10, 0, 0, 100000, 6, 0.7, 8],
+                       [1, 0, 0, 0, 0, 1, 2, 3+1e-10])
+
+        for method in available_sparse_methods:
+            Z, LS, _ = projections(A, method, orth_tol=1e-18, max_refin=100)
+            for z in test_points:
+                # Test if x is in the null_space
+                x = Z.matvec(z)
+                atol = 1e-13 * abs(x).max()
+                assert_allclose(A.dot(x), 0, atol=atol)
+                # Test orthogonality
+                assert_allclose(orthogonality(A, x), 0, atol=1e-13)
+
+    def test_rowspace_sparse(self):
+        A_dense = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
+                            [0, 8, 7, 0, 1, 5, 9, 0],
+                            [1, 0, 0, 0, 0, 1, 2, 3]])
+        A = csc_matrix(A_dense)
+        test_points = ([1, 2, 3],
+                       [1, 10, 3],
+                       [1.12, 10, 0])
+
+        for method in available_sparse_methods:
+            _, _, Y = projections(A, method)
+            for z in test_points:
+                # Test if x is solution of A x = z
+                x = Y.matvec(z)
+                assert_array_almost_equal(A.dot(x), z)
+                # Test if x is in the row space of A
+                A_ext = np.vstack((A_dense, x))
+                assert_equal(np.linalg.matrix_rank(A_dense),
+                             np.linalg.matrix_rank(A_ext))
+
+    def test_nullspace_and_least_squares_dense(self):
+        A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
+                      [0, 8, 7, 0, 1, 5, 9, 0],
+                      [1, 0, 0, 0, 0, 1, 2, 3]])
+        At = A.T
+        test_points = ([1, 2, 3, 4, 5, 6, 7, 8],
+                       [1, 10, 3, 0, 1, 6, 7, 8],
+                       [1.12, 10, 0, 0, 100000, 6, 0.7, 8])
+
+        for method in available_dense_methods:
+            Z, LS, _ = projections(A, method)
+            for z in test_points:
+                # Test if x is in the null_space
+                x = Z.matvec(z)
+                assert_array_almost_equal(A.dot(x), 0)
+                # Test orthogonality
+                assert_array_almost_equal(orthogonality(A, x), 0)
+                # Test if x is the least squares solution
+                x = LS.matvec(z)
+                x2 = scipy.linalg.lstsq(At, z)[0]
+                assert_array_almost_equal(x, x2)
+
+    def test_compare_dense_and_sparse(self):
+        D = np.diag(range(1, 101))
+        A = np.hstack([D, D, D, D])
+        A_sparse = csc_matrix(A)
+        np.random.seed(0)
+
+        Z, LS, Y = projections(A)
+        Z_sparse, LS_sparse, Y_sparse = projections(A_sparse)
+        for k in range(20):
+            z = np.random.normal(size=(400,))
+            assert_array_almost_equal(Z.dot(z), Z_sparse.dot(z))
+            assert_array_almost_equal(LS.dot(z), LS_sparse.dot(z))
+            x = np.random.normal(size=(100,))
+            assert_array_almost_equal(Y.dot(x), Y_sparse.dot(x))
+
+    def test_compare_dense_and_sparse2(self):
+        D1 = np.diag([-1.7, 1, 0.5])
+        D2 = np.diag([1, -0.6, -0.3])
+        D3 = np.diag([-0.3, -1.5, 2])
+        A = np.hstack([D1, D2, D3])
+        A_sparse = csc_matrix(A)
+        np.random.seed(0)
+
+        Z, LS, Y = projections(A)
+        Z_sparse, LS_sparse, Y_sparse = projections(A_sparse)
+        for k in range(1):
+            z = np.random.normal(size=(9,))
+            assert_array_almost_equal(Z.dot(z), Z_sparse.dot(z))
+            assert_array_almost_equal(LS.dot(z), LS_sparse.dot(z))
+            x = np.random.normal(size=(3,))
+            assert_array_almost_equal(Y.dot(x), Y_sparse.dot(x))
+
+    def test_iterative_refinements_dense(self):
+        A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
+                      [0, 8, 7, 0, 1, 5, 9, 0],
+                      [1, 0, 0, 0, 0, 1, 2, 3]])
+        test_points = ([1, 2, 3, 4, 5, 6, 7, 8],
+                       [1, 10, 3, 0, 1, 6, 7, 8],
+                       [1, 0, 0, 0, 0, 1, 2, 3+1e-10])
+
+        for method in available_dense_methods:
+            Z, LS, _ = projections(A, method, orth_tol=1e-18, max_refin=10)
+            for z in test_points:
+                # Test if x is in the null_space
+                x = Z.matvec(z)
+                assert_allclose(A.dot(x), 0, rtol=0, atol=2.5e-14)
+                # Test orthogonality
+                assert_allclose(orthogonality(A, x), 0, rtol=0, atol=5e-16)
+
+    def test_rowspace_dense(self):
+        A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
+                      [0, 8, 7, 0, 1, 5, 9, 0],
+                      [1, 0, 0, 0, 0, 1, 2, 3]])
+        test_points = ([1, 2, 3],
+                       [1, 10, 3],
+                       [1.12, 10, 0])
+
+        for method in available_dense_methods:
+            _, _, Y = projections(A, method)
+            for z in test_points:
+                # Test if x is solution of A x = z
+                x = Y.matvec(z)
+                assert_array_almost_equal(A.dot(x), z)
+                # Test if x is in the row space of A
+                A_ext = np.vstack((A, x))
+                assert_equal(np.linalg.matrix_rank(A),
+                             np.linalg.matrix_rank(A_ext))
+
+
+class TestOrthogonality(TestCase):
+
+    def test_dense_matrix(self):
+        A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
+                      [0, 8, 7, 0, 1, 5, 9, 0],
+                      [1, 0, 0, 0, 0, 1, 2, 3]])
+        test_vectors = ([-1.98931144, -1.56363389,
+                         -0.84115584, 2.2864762,
+                         5.599141, 0.09286976,
+                         1.37040802, -0.28145812],
+                        [697.92794044, -4091.65114008,
+                         -3327.42316335, 836.86906951,
+                         99434.98929065, -1285.37653682,
+                         -4109.21503806, 2935.29289083])
+        test_expected_orth = (0, 0)
+
+        for i in range(len(test_vectors)):
+            x = test_vectors[i]
+            orth = test_expected_orth[i]
+            assert_array_almost_equal(orthogonality(A, x), orth)
+
+    def test_sparse_matrix(self):
+        A = np.array([[1, 2, 3, 4, 0, 5, 0, 7],
+                      [0, 8, 7, 0, 1, 5, 9, 0],
+                      [1, 0, 0, 0, 0, 1, 2, 3]])
+        A = csc_matrix(A)
+        test_vectors = ([-1.98931144, -1.56363389,
+                         -0.84115584, 2.2864762,
+                         5.599141, 0.09286976,
+                         1.37040802, -0.28145812],
+                        [697.92794044, -4091.65114008,
+                         -3327.42316335, 836.86906951,
+                         99434.98929065, -1285.37653682,
+                         -4109.21503806, 2935.29289083])
+        test_expected_orth = (0, 0)
+
+        for i in range(len(test_vectors)):
+            x = test_vectors[i]
+            orth = test_expected_orth[i]
+            assert_array_almost_equal(orthogonality(A, x), orth)
@@ -0,0 +1,645 @@
+import numpy as np
+from scipy.sparse import csc_matrix
+from scipy.optimize._trustregion_constr.qp_subproblem \
+    import (eqp_kktfact,
+            projected_cg,
+            box_intersections,
+            sphere_intersections,
+            box_sphere_intersections,
+            modified_dogleg)
+from scipy.optimize._trustregion_constr.projections \
+    import projections
+from numpy.testing import TestCase, assert_array_almost_equal, assert_equal
+import pytest
+
+
+class TestEQPDirectFactorization(TestCase):
+
+    # From Example 16.2 Nocedal/Wright "Numerical
+    # Optimization" p.452.
+    def test_nocedal_example(self):
+        H = csc_matrix([[6, 2, 1],
+                        [2, 5, 2],
+                        [1, 2, 4]])
+        A = csc_matrix([[1, 0, 1],
+                        [0, 1, 1]])
+        c = np.array([-8, -3, -3])
+        b = -np.array([3, 0])
+        x, lagrange_multipliers = eqp_kktfact(H, c, A, b)
+        assert_array_almost_equal(x, [2, -1, 1])
+        assert_array_almost_equal(lagrange_multipliers, [3, -2])
+
+
+class TestSphericalBoundariesIntersections(TestCase):
+
+    def test_2d_sphere_constraints(self):
+        # Interior initial point
+        ta, tb, intersect = sphere_intersections([0, 0],
+                                                 [1, 0], 0.5)
+        assert_array_almost_equal([ta, tb], [0, 0.5])
+        assert_equal(intersect, True)
+
+        # No intersection between line and circle
+        ta, tb, intersect = sphere_intersections([2, 0],
+                                                 [0, 1], 1)
+        assert_equal(intersect, False)
+
+        # Outside initial point pointing toward outside the circle
+        ta, tb, intersect = sphere_intersections([2, 0],
+                                                 [1, 0], 1)
+        assert_equal(intersect, False)
+
+        # Outside initial point pointing toward inside the circle
+        ta, tb, intersect = sphere_intersections([2, 0],
+                                                 [-1, 0], 1.5)
+        assert_array_almost_equal([ta, tb], [0.5, 1])
+        assert_equal(intersect, True)
+
+        # Initial point on the boundary
+        ta, tb, intersect = sphere_intersections([2, 0],
+                                                 [1, 0], 2)
+        assert_array_almost_equal([ta, tb], [0, 0])
+        assert_equal(intersect, True)
+
+    def test_2d_sphere_constraints_line_intersections(self):
+        # Interior initial point
+        ta, tb, intersect = sphere_intersections([0, 0],
+                                                 [1, 0], 0.5,
+                                                 entire_line=True)
+        assert_array_almost_equal([ta, tb], [-0.5, 0.5])
+        assert_equal(intersect, True)
+
+        # No intersection between line and circle
+        ta, tb, intersect = sphere_intersections([2, 0],
+                                                 [0, 1], 1,
+                                                 entire_line=True)
+        assert_equal(intersect, False)
+
+        # Outside initial point pointing toward outside the circle
+        ta, tb, intersect = sphere_intersections([2, 0],
+                                                 [1, 0], 1,
+                                                 entire_line=True)
+        assert_array_almost_equal([ta, tb], [-3, -1])
+        assert_equal(intersect, True)
+
+        # Outside initial point pointing toward inside the circle
+        ta, tb, intersect = sphere_intersections([2, 0],
+                                                 [-1, 0], 1.5,
+                                                 entire_line=True)
+        assert_array_almost_equal([ta, tb], [0.5, 3.5])
+        assert_equal(intersect, True)
+
+        # Initial point on the boundary
+        ta, tb, intersect = sphere_intersections([2, 0],
+                                                 [1, 0], 2,
+                                                 entire_line=True)
+        assert_array_almost_equal([ta, tb], [-4, 0])
+        assert_equal(intersect, True)
+
+
+class TestBoxBoundariesIntersections(TestCase):
+
+    def test_2d_box_constraints(self):
+        # Box constraint in the direction of vector d
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [1, 1], [3, 3])
+        assert_array_almost_equal([ta, tb], [0.5, 1])
+        assert_equal(intersect, True)
+
+        # Negative direction
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [1, -3], [3, -1])
+        assert_equal(intersect, False)
+
+        # Some constraints are absent (set to +/- inf)
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [-np.inf, 1],
+                                              [np.inf, np.inf])
+        assert_array_almost_equal([ta, tb], [0.5, 1])
+        assert_equal(intersect, True)
+
+        # Intersect on the face of the box
+        ta, tb, intersect = box_intersections([1, 0], [0, 1],
+                                              [1, 1], [3, 3])
+        assert_array_almost_equal([ta, tb], [1, 1])
+        assert_equal(intersect, True)
+
+        # Interior initial point
+        ta, tb, intersect = box_intersections([0, 0], [4, 4],
+                                              [-2, -3], [3, 2])
+        assert_array_almost_equal([ta, tb], [0, 0.5])
+        assert_equal(intersect, True)
+
+        # No intersection between line and box constraints
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [-3, -3], [-1, -1])
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [-3, 3], [-1, 1])
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [-3, -np.inf],
+                                              [-1, np.inf])
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_intersections([0, 0], [1, 100],
+                                              [1, 1], [3, 3])
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_intersections([0.99, 0], [0, 2],
+                                              [1, 1], [3, 3])
+        assert_equal(intersect, False)
+
+        # Initial point on the boundary
+        ta, tb, intersect = box_intersections([2, 2], [0, 1],
+                                              [-2, -2], [2, 2])
+        assert_array_almost_equal([ta, tb], [0, 0])
+        assert_equal(intersect, True)
+
+    def test_2d_box_constraints_entire_line(self):
+        # Box constraint in the direction of vector d
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [1, 1], [3, 3],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [0.5, 1.5])
+        assert_equal(intersect, True)
+
+        # Negative direction
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [1, -3], [3, -1],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [-1.5, -0.5])
+        assert_equal(intersect, True)
+
+        # Some constraints are absent (set to +/- inf)
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [-np.inf, 1],
+                                              [np.inf, np.inf],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [0.5, np.inf])
+        assert_equal(intersect, True)
+
+        # Intersect on the face of the box
+        ta, tb, intersect = box_intersections([1, 0], [0, 1],
+                                              [1, 1], [3, 3],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [1, 3])
+        assert_equal(intersect, True)
+
+        # Interior initial point
+        ta, tb, intersect = box_intersections([0, 0], [4, 4],
+                                              [-2, -3], [3, 2],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [-0.5, 0.5])
+        assert_equal(intersect, True)
+
+        # No intersection between line and box constraints
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [-3, -3], [-1, -1],
+                                              entire_line=True)
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [-3, 3], [-1, 1],
+                                              entire_line=True)
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_intersections([2, 0], [0, 2],
+                                              [-3, -np.inf],
+                                              [-1, np.inf],
+                                              entire_line=True)
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_intersections([0, 0], [1, 100],
+                                              [1, 1], [3, 3],
+                                              entire_line=True)
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_intersections([0.99, 0], [0, 2],
+                                              [1, 1], [3, 3],
+                                              entire_line=True)
+        assert_equal(intersect, False)
+
+        # Initial point on the boundary
+        ta, tb, intersect = box_intersections([2, 2], [0, 1],
+                                              [-2, -2], [2, 2],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [-4, 0])
+        assert_equal(intersect, True)
+
+    def test_3d_box_constraints(self):
+        # Simple case
+        ta, tb, intersect = box_intersections([1, 1, 0], [0, 0, 1],
+                                              [1, 1, 1], [3, 3, 3])
+        assert_array_almost_equal([ta, tb], [1, 1])
+        assert_equal(intersect, True)
+
+        # Negative direction
+        ta, tb, intersect = box_intersections([1, 1, 0], [0, 0, -1],
+                                              [1, 1, 1], [3, 3, 3])
+        assert_equal(intersect, False)
+
+        # Interior point
+        ta, tb, intersect = box_intersections([2, 2, 2], [0, -1, 1],
+                                              [1, 1, 1], [3, 3, 3])
+        assert_array_almost_equal([ta, tb], [0, 1])
+        assert_equal(intersect, True)
+
+    def test_3d_box_constraints_entire_line(self):
+        # Simple case
+        ta, tb, intersect = box_intersections([1, 1, 0], [0, 0, 1],
+                                              [1, 1, 1], [3, 3, 3],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [1, 3])
+        assert_equal(intersect, True)
+
+        # Negative direction
+        ta, tb, intersect = box_intersections([1, 1, 0], [0, 0, -1],
+                                              [1, 1, 1], [3, 3, 3],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [-3, -1])
+        assert_equal(intersect, True)
+
+        # Interior point
+        ta, tb, intersect = box_intersections([2, 2, 2], [0, -1, 1],
+                                              [1, 1, 1], [3, 3, 3],
+                                              entire_line=True)
+        assert_array_almost_equal([ta, tb], [-1, 1])
+        assert_equal(intersect, True)
+
+
+class TestBoxSphereBoundariesIntersections(TestCase):
+
+    def test_2d_box_constraints(self):
+        # Both constraints are active
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-2, 2],
+                                                     [-1, -2], [1, 2], 2,
+                                                     entire_line=False)
+        assert_array_almost_equal([ta, tb], [0, 0.5])
+        assert_equal(intersect, True)
+
+        # None of the constraints are active
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-1, 1],
+                                                     [-1, -3], [1, 3], 10,
+                                                     entire_line=False)
+        assert_array_almost_equal([ta, tb], [0, 1])
+        assert_equal(intersect, True)
+
+        # Box constraints are active
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
+                                                     [-1, -3], [1, 3], 10,
+                                                     entire_line=False)
+        assert_array_almost_equal([ta, tb], [0, 0.5])
+        assert_equal(intersect, True)
+
+        # Spherical constraints are active
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
+                                                     [-1, -3], [1, 3], 2,
+                                                     entire_line=False)
+        assert_array_almost_equal([ta, tb], [0, 0.25])
+        assert_equal(intersect, True)
+
+        # Infeasible problems
+        ta, tb, intersect = box_sphere_intersections([2, 2], [-4, 4],
+                                                     [-1, -3], [1, 3], 2,
+                                                     entire_line=False)
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
+                                                     [2, 4], [2, 4], 2,
+                                                     entire_line=False)
+        assert_equal(intersect, False)
+
+    def test_2d_box_constraints_entire_line(self):
+        # Both constraints are active
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-2, 2],
+                                                     [-1, -2], [1, 2], 2,
+                                                     entire_line=True)
+        assert_array_almost_equal([ta, tb], [0, 0.5])
+        assert_equal(intersect, True)
+
+        # None of the constraints are active
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-1, 1],
+                                                     [-1, -3], [1, 3], 10,
+                                                     entire_line=True)
+        assert_array_almost_equal([ta, tb], [0, 2])
+        assert_equal(intersect, True)
+
+        # Box constraints are active
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
+                                                     [-1, -3], [1, 3], 10,
+                                                     entire_line=True)
+        assert_array_almost_equal([ta, tb], [0, 0.5])
+        assert_equal(intersect, True)
+
+        # Spherical constraints are active
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
+                                                     [-1, -3], [1, 3], 2,
+                                                     entire_line=True)
+        assert_array_almost_equal([ta, tb], [0, 0.25])
+        assert_equal(intersect, True)
+
+        # Infeasible problems
+        ta, tb, intersect = box_sphere_intersections([2, 2], [-4, 4],
+                                                     [-1, -3], [1, 3], 2,
+                                                     entire_line=True)
+        assert_equal(intersect, False)
+        ta, tb, intersect = box_sphere_intersections([1, 1], [-4, 4],
+                                                     [2, 4], [2, 4], 2,
+                                                     entire_line=True)
+        assert_equal(intersect, False)
+
+
+class TestModifiedDogleg(TestCase):
+
+    def test_cauchypoint_equalsto_newtonpoint(self):
+        A = np.array([[1, 8]])
+        b = np.array([-16])
+        _, _, Y = projections(A)
+        newton_point = np.array([0.24615385, 1.96923077])
+
+        # Newton point inside boundaries
+        x = modified_dogleg(A, Y, b, 2, [-np.inf, -np.inf], [np.inf, np.inf])
+        assert_array_almost_equal(x, newton_point)
+
+        # Spherical constraint active
+        x = modified_dogleg(A, Y, b, 1, [-np.inf, -np.inf], [np.inf, np.inf])
+        assert_array_almost_equal(x, newton_point/np.linalg.norm(newton_point))
+
+        # Box constraints active
+        x = modified_dogleg(A, Y, b, 2, [-np.inf, -np.inf], [0.1, np.inf])
+        assert_array_almost_equal(x, (newton_point/newton_point[0]) * 0.1)
+
+    def test_3d_example(self):
+        A = np.array([[1, 8, 1],
+                      [4, 2, 2]])
+        b = np.array([-16, 2])
+        Z, LS, Y = projections(A)
+
+        newton_point = np.array([-1.37090909, 2.23272727, -0.49090909])
+        cauchy_point = np.array([0.11165723, 1.73068711, 0.16748585])
+        origin = np.zeros_like(newton_point)
+
+        # newton_point inside boundaries
+        x = modified_dogleg(A, Y, b, 3, [-np.inf, -np.inf, -np.inf],
+                            [np.inf, np.inf, np.inf])
+        assert_array_almost_equal(x, newton_point)
+
+        # line between cauchy_point and newton_point contains best point
+        # (spherical constraint is active).
+        x = modified_dogleg(A, Y, b, 2, [-np.inf, -np.inf, -np.inf],
+                            [np.inf, np.inf, np.inf])
+        z = cauchy_point
+        d = newton_point-cauchy_point
+        t = ((x-z)/(d))
+        assert_array_almost_equal(t, np.full(3, 0.40807330))
+        assert_array_almost_equal(np.linalg.norm(x), 2)
+
+        # line between cauchy_point and newton_point contains best point
+        # (box constraint is active).
+        x = modified_dogleg(A, Y, b, 5, [-1, -np.inf, -np.inf],
+                            [np.inf, np.inf, np.inf])
+        z = cauchy_point
+        d = newton_point-cauchy_point
+        t = ((x-z)/(d))
+        assert_array_almost_equal(t, np.full(3, 0.7498195))
+        assert_array_almost_equal(x[0], -1)
+
+        # line between origin and cauchy_point contains best point
+        # (spherical constraint is active).
+        x = modified_dogleg(A, Y, b, 1, [-np.inf, -np.inf, -np.inf],
+                            [np.inf, np.inf, np.inf])
+        z = origin
+        d = cauchy_point
+        t = ((x-z)/(d))
+        assert_array_almost_equal(t, np.full(3, 0.573936265))
+        assert_array_almost_equal(np.linalg.norm(x), 1)
+
+        # line between origin and newton_point contains best point
+        # (box constraint is active).
+        x = modified_dogleg(A, Y, b, 2, [-np.inf, -np.inf, -np.inf],
+                            [np.inf, 1, np.inf])
+        z = origin
+        d = newton_point
+        t = ((x-z)/(d))
+        assert_array_almost_equal(t, np.full(3, 0.4478827364))
+        assert_array_almost_equal(x[1], 1)
+
+
+class TestProjectCG(TestCase):
+
+    # From Example 16.2 Nocedal/Wright "Numerical
+    # Optimization" p.452.
+    def test_nocedal_example(self):
+        H = csc_matrix([[6, 2, 1],
+                        [2, 5, 2],
+                        [1, 2, 4]])
+        A = csc_matrix([[1, 0, 1],
+                        [0, 1, 1]])
+        c = np.array([-8, -3, -3])
+        b = -np.array([3, 0])
+        Z, _, Y = projections(A)
+        x, info = projected_cg(H, c, Z, Y, b)
+        assert_equal(info["stop_cond"], 4)
+        assert_equal(info["hits_boundary"], False)
+        assert_array_almost_equal(x, [2, -1, 1])
+
+    def test_compare_with_direct_fact(self):
+        H = csc_matrix([[6, 2, 1, 3],
+                        [2, 5, 2, 4],
+                        [1, 2, 4, 5],
+                        [3, 4, 5, 7]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 1, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        Z, _, Y = projections(A)
+        x, info = projected_cg(H, c, Z, Y, b, tol=0)
+        x_kkt, _ = eqp_kktfact(H, c, A, b)
+        assert_equal(info["stop_cond"], 1)
+        assert_equal(info["hits_boundary"], False)
+        assert_array_almost_equal(x, x_kkt)
+
+    def test_trust_region_infeasible(self):
+        H = csc_matrix([[6, 2, 1, 3],
+                        [2, 5, 2, 4],
+                        [1, 2, 4, 5],
+                        [3, 4, 5, 7]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 1, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        trust_radius = 1
+        Z, _, Y = projections(A)
+        with pytest.raises(ValueError):
+            projected_cg(H, c, Z, Y, b, trust_radius=trust_radius)
+
+    def test_trust_region_barely_feasible(self):
+        H = csc_matrix([[6, 2, 1, 3],
+                        [2, 5, 2, 4],
+                        [1, 2, 4, 5],
+                        [3, 4, 5, 7]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 1, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        trust_radius = 2.32379000772445021283
+        Z, _, Y = projections(A)
+        x, info = projected_cg(H, c, Z, Y, b,
+                               tol=0,
+                               trust_radius=trust_radius)
+        assert_equal(info["stop_cond"], 2)
+        assert_equal(info["hits_boundary"], True)
+        assert_array_almost_equal(np.linalg.norm(x), trust_radius)
+        assert_array_almost_equal(x, -Y.dot(b))
+
+    def test_hits_boundary(self):
+        H = csc_matrix([[6, 2, 1, 3],
+                        [2, 5, 2, 4],
+                        [1, 2, 4, 5],
+                        [3, 4, 5, 7]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 1, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        trust_radius = 3
+        Z, _, Y = projections(A)
+        x, info = projected_cg(H, c, Z, Y, b,
+                               tol=0,
+                               trust_radius=trust_radius)
+        assert_equal(info["stop_cond"], 2)
+        assert_equal(info["hits_boundary"], True)
+        assert_array_almost_equal(np.linalg.norm(x), trust_radius)
+
+    def test_negative_curvature_unconstrained(self):
+        H = csc_matrix([[1, 2, 1, 3],
+                        [2, 0, 2, 4],
+                        [1, 2, 0, 2],
+                        [3, 4, 2, 0]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 0, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        Z, _, Y = projections(A)
+        with pytest.raises(ValueError):
+            projected_cg(H, c, Z, Y, b, tol=0)
+
+    def test_negative_curvature(self):
+        H = csc_matrix([[1, 2, 1, 3],
+                        [2, 0, 2, 4],
+                        [1, 2, 0, 2],
+                        [3, 4, 2, 0]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 0, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        Z, _, Y = projections(A)
+        trust_radius = 1000
+        x, info = projected_cg(H, c, Z, Y, b,
+                               tol=0,
+                               trust_radius=trust_radius)
+        assert_equal(info["stop_cond"], 3)
+        assert_equal(info["hits_boundary"], True)
+        assert_array_almost_equal(np.linalg.norm(x), trust_radius)
+
+    # The box constraints are inactive at the solution but
+    # are active during the iterations.
+    def test_inactive_box_constraints(self):
+        H = csc_matrix([[6, 2, 1, 3],
+                        [2, 5, 2, 4],
+                        [1, 2, 4, 5],
+                        [3, 4, 5, 7]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 1, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        Z, _, Y = projections(A)
+        x, info = projected_cg(H, c, Z, Y, b,
+                               tol=0,
+                               lb=[0.5, -np.inf,
+                                   -np.inf, -np.inf],
+                               return_all=True)
+        x_kkt, _ = eqp_kktfact(H, c, A, b)
+        assert_equal(info["stop_cond"], 1)
+        assert_equal(info["hits_boundary"], False)
+        assert_array_almost_equal(x, x_kkt)
+
+    # The box constraints are active and the termination is
+    # by maximum iterations (infeasible iteration).
+    def test_active_box_constraints_maximum_iterations_reached(self):
+        H = csc_matrix([[6, 2, 1, 3],
+                        [2, 5, 2, 4],
+                        [1, 2, 4, 5],
+                        [3, 4, 5, 7]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 1, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        Z, _, Y = projections(A)
+        x, info = projected_cg(H, c, Z, Y, b,
+                               tol=0,
+                               lb=[0.8, -np.inf,
+                                   -np.inf, -np.inf],
+                               return_all=True)
+        assert_equal(info["stop_cond"], 1)
+        assert_equal(info["hits_boundary"], True)
+        assert_array_almost_equal(A.dot(x), -b)
+        assert_array_almost_equal(x[0], 0.8)
+
+    # The box constraints are active and the termination is
+    # because it hits boundary (without infeasible iteration).
+    def test_active_box_constraints_hits_boundaries(self):
+        H = csc_matrix([[6, 2, 1, 3],
+                        [2, 5, 2, 4],
+                        [1, 2, 4, 5],
+                        [3, 4, 5, 7]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 1, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        trust_radius = 3
+        Z, _, Y = projections(A)
+        x, info = projected_cg(H, c, Z, Y, b,
+                               tol=0,
+                               ub=[np.inf, np.inf, 1.6, np.inf],
+                               trust_radius=trust_radius,
+                               return_all=True)
+        assert_equal(info["stop_cond"], 2)
+        assert_equal(info["hits_boundary"], True)
+        assert_array_almost_equal(x[2], 1.6)
+
+    # The box constraints are active and the termination is
+    # because it hits boundary (infeasible iteration).
+    def test_active_box_constraints_hits_boundaries_infeasible_iter(self):
+        H = csc_matrix([[6, 2, 1, 3],
+                        [2, 5, 2, 4],
+                        [1, 2, 4, 5],
+                        [3, 4, 5, 7]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 1, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        trust_radius = 4
+        Z, _, Y = projections(A)
+        x, info = projected_cg(H, c, Z, Y, b,
+                               tol=0,
+                               ub=[np.inf, 0.1, np.inf, np.inf],
+                               trust_radius=trust_radius,
+                               return_all=True)
+        assert_equal(info["stop_cond"], 2)
+        assert_equal(info["hits_boundary"], True)
+        assert_array_almost_equal(x[1], 0.1)
+
+    # The box constraints are active and the termination is
+    # because it hits boundary (no infeasible iteration).
+    def test_active_box_constraints_negative_curvature(self):
+        H = csc_matrix([[1, 2, 1, 3],
+                        [2, 0, 2, 4],
+                        [1, 2, 0, 2],
+                        [3, 4, 2, 0]])
+        A = csc_matrix([[1, 0, 1, 0],
+                        [0, 1, 0, 1]])
+        c = np.array([-2, -3, -3, 1])
+        b = -np.array([3, 0])
+        Z, _, Y = projections(A)
+        trust_radius = 1000
+        x, info = projected_cg(H, c, Z, Y, b,
+                               tol=0,
+                               ub=[np.inf, np.inf, 100, np.inf],
+                               trust_radius=trust_radius)
+        assert_equal(info["stop_cond"], 3)
+        assert_equal(info["hits_boundary"], True)
+        assert_array_almost_equal(x[2], 100)
@@ -0,0 +1,32 @@
+import numpy as np
+from scipy.optimize import minimize, Bounds
+
+def test_gh10880():
+    # checks that verbose reporting works with trust-constr for
+    # bound-constrained problems
+    bnds = Bounds(1, 2)
+    opts = {'maxiter': 1000, 'verbose': 2}
+    minimize(lambda x: x**2, x0=2., method='trust-constr',
+             bounds=bnds, options=opts)
+
+    opts = {'maxiter': 1000, 'verbose': 3}
+    minimize(lambda x: x**2, x0=2., method='trust-constr',
+             bounds=bnds, options=opts)
+
+def test_gh12922():
+    # checks that verbose reporting works with trust-constr for
+    # general constraints
+    def objective(x):
+        return np.array([(np.sum((x+1)**4))])
+
+    cons = {'type': 'ineq', 'fun': lambda x: -x[0]**2}
+    n = 25
+    x0 = np.linspace(-5, 5, n)
+
+    opts = {'maxiter': 1000, 'verbose': 2}
+    minimize(objective, x0=x0, method='trust-constr',
+                      constraints=cons, options=opts)
+
+    opts = {'maxiter': 1000, 'verbose': 3}
+    minimize(objective, x0=x0, method='trust-constr',
+                      constraints=cons, options=opts)
@@ -0,0 +1,346 @@
+"""Trust-region interior point method.
+
+References
+----------
+.. [1] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal.
+       "An interior point algorithm for large-scale nonlinear
+       programming." SIAM Journal on Optimization 9.4 (1999): 877-900.
+.. [2] Byrd, Richard H., Guanghui Liu, and Jorge Nocedal.
+       "On the local behavior of an interior point method for
+       nonlinear programming." Numerical analysis 1997 (1997): 37-56.
+.. [3] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization"
+       Second Edition (2006).
+"""
+
+import scipy.sparse as sps
+import numpy as np
+from .equality_constrained_sqp import equality_constrained_sqp
+from scipy.sparse.linalg import LinearOperator
+
+__all__ = ['tr_interior_point']
+
+
+class BarrierSubproblem:
+    """
+    Barrier optimization problem:
+        minimize fun(x) - barrier_parameter*sum(log(s))
+        subject to: constr_eq(x)     = 0
+                  constr_ineq(x) + s = 0
+    """
+
+    def __init__(self, x0, s0, fun, grad, lagr_hess, n_vars, n_ineq, n_eq,
+                 constr, jac, barrier_parameter, tolerance,
+                 enforce_feasibility, global_stop_criteria,
+                 xtol, fun0, grad0, constr_ineq0, jac_ineq0, constr_eq0,
+                 jac_eq0):
+        # Store parameters
+        self.n_vars = n_vars
+        self.x0 = x0
+        self.s0 = s0
+        self.fun = fun
+        self.grad = grad
+        self.lagr_hess = lagr_hess
+        self.constr = constr
+        self.jac = jac
+        self.barrier_parameter = barrier_parameter
+        self.tolerance = tolerance
+        self.n_eq = n_eq
+        self.n_ineq = n_ineq
+        self.enforce_feasibility = enforce_feasibility
+        self.global_stop_criteria = global_stop_criteria
+        self.xtol = xtol
+        self.fun0 = self._compute_function(fun0, constr_ineq0, s0)
+        self.grad0 = self._compute_gradient(grad0)
+        self.constr0 = self._compute_constr(constr_ineq0, constr_eq0, s0)
+        self.jac0 = self._compute_jacobian(jac_eq0, jac_ineq0, s0)
+        self.terminate = False
+
+    def update(self, barrier_parameter, tolerance):
+        self.barrier_parameter = barrier_parameter
+        self.tolerance = tolerance
+
+    def get_slack(self, z):
+        return z[self.n_vars:self.n_vars+self.n_ineq]
+
+    def get_variables(self, z):
+        return z[:self.n_vars]
+
+    def function_and_constraints(self, z):
+        """Returns barrier function and constraints at given point.
+
+        For z = [x, s], returns barrier function:
+            function(z) = fun(x) - barrier_parameter*sum(log(s))
+        and barrier constraints:
+            constraints(z) = [   constr_eq(x)     ]
+                             [ constr_ineq(x) + s ]
+
+        """
+        # Get variables and slack variables
+        x = self.get_variables(z)
+        s = self.get_slack(z)
+        # Compute function and constraints
+        f = self.fun(x)
+        c_eq, c_ineq = self.constr(x)
+        # Return objective function and constraints
+        return (self._compute_function(f, c_ineq, s),
+                self._compute_constr(c_ineq, c_eq, s))
+
+    def _compute_function(self, f, c_ineq, s):
+        # Use technique from Nocedal and Wright book, ref [3]_, p.576,
+        # to guarantee constraints from `enforce_feasibility`
+        # stay feasible along iterations.
+        s[self.enforce_feasibility] = -c_ineq[self.enforce_feasibility]
+        log_s = [np.log(s_i) if s_i > 0 else -np.inf for s_i in s]
+        # Compute barrier objective function
+        return f - self.barrier_parameter*np.sum(log_s)
+
+    def _compute_constr(self, c_ineq, c_eq, s):
+        # Compute barrier constraint
+        return np.hstack((c_eq,
+                          c_ineq + s))
+
+    def scaling(self, z):
+        """Returns scaling vector.
+        Given by:
+            scaling = [ones(n_vars), s]
+        """
+        s = self.get_slack(z)
+        diag_elements = np.hstack((np.ones(self.n_vars), s))
+
+        # Diagonal matrix
+        def matvec(vec):
+            return diag_elements*vec
+        return LinearOperator((self.n_vars+self.n_ineq,
+                               self.n_vars+self.n_ineq),
+                              matvec)
+
+    def gradient_and_jacobian(self, z):
+        """Returns scaled gradient and Jacobian.
+
+        Return scaled gradient:
+            gradient = [             grad(x)             ]
+                       [ -barrier_parameter*ones(n_ineq) ]
+        and scaled Jacobian matrix:
+            jacobian = [  jac_eq(x)  0  ]
+                       [ jac_ineq(x) S  ]
+        Both of them scaled by the previously defined scaling factor.
+        """
+        # Get variables and slack variables
+        x = self.get_variables(z)
+        s = self.get_slack(z)
+        # Compute first derivatives
+        g = self.grad(x)
+        J_eq, J_ineq = self.jac(x)
+        # Return gradient and Jacobian
+        return (self._compute_gradient(g),
+                self._compute_jacobian(J_eq, J_ineq, s))
+
+    def _compute_gradient(self, g):
+        return np.hstack((g, -self.barrier_parameter*np.ones(self.n_ineq)))
+
+    def _compute_jacobian(self, J_eq, J_ineq, s):
+        if self.n_ineq == 0:
+            return J_eq
+        else:
+            if sps.issparse(J_eq) or sps.issparse(J_ineq):
+                # It is expected that J_eq and J_ineq
+                # are already `csr_matrix` because of
+                # the way ``BoxConstraint``, ``NonlinearConstraint``
+                # and ``LinearConstraint`` are defined.
+                J_eq = sps.csr_matrix(J_eq)
+                J_ineq = sps.csr_matrix(J_ineq)
+                return self._assemble_sparse_jacobian(J_eq, J_ineq, s)
+            else:
+                S = np.diag(s)
+                zeros = np.zeros((self.n_eq, self.n_ineq))
+                # Convert to matrix
+                if sps.issparse(J_ineq):
+                    J_ineq = J_ineq.toarray()
+                if sps.issparse(J_eq):
+                    J_eq = J_eq.toarray()
+                # Concatenate matrices
+                return np.block([[J_eq, zeros],
+                                 [J_ineq, S]])
+
+    def _assemble_sparse_jacobian(self, J_eq, J_ineq, s):
+        """Assemble sparse Jacobian given its components.
+
+        Given ``J_eq``, ``J_ineq`` and ``s`` returns:
+            jacobian = [ J_eq,     0     ]
+                       [ J_ineq, diag(s) ]
+
+        It is equivalent to:
+            sps.bmat([[ J_eq,   None    ],
+                      [ J_ineq, diag(s) ]], "csr")
+        but significantly more efficient for this
+        given structure.
+        """
+        n_vars, n_ineq, n_eq = self.n_vars, self.n_ineq, self.n_eq
+        J_aux = sps.vstack([J_eq, J_ineq], "csr")
+        indptr, indices, data = J_aux.indptr, J_aux.indices, J_aux.data
+        new_indptr = indptr + np.hstack((np.zeros(n_eq, dtype=int),
+                                         np.arange(n_ineq+1, dtype=int)))
+        size = indices.size+n_ineq
+        new_indices = np.empty(size)
+        new_data = np.empty(size)
+        mask = np.full(size, False, bool)
+        mask[new_indptr[-n_ineq:]-1] = True
+        new_indices[mask] = n_vars+np.arange(n_ineq)
+        new_indices[~mask] = indices
+        new_data[mask] = s
+        new_data[~mask] = data
+        J = sps.csr_matrix((new_data, new_indices, new_indptr),
+                           (n_eq + n_ineq, n_vars + n_ineq))
+        return J
+
+    def lagrangian_hessian_x(self, z, v):
+        """Returns Lagrangian Hessian (in relation to `x`) -> Hx"""
+        x = self.get_variables(z)
+        # Get lagrange multipliers related to nonlinear equality constraints
+        v_eq = v[:self.n_eq]
+        # Get lagrange multipliers related to nonlinear ineq. constraints
+        v_ineq = v[self.n_eq:self.n_eq+self.n_ineq]
+        lagr_hess = self.lagr_hess
+        return lagr_hess(x, v_eq, v_ineq)
+
+    def lagrangian_hessian_s(self, z, v):
+        """Returns scaled Lagrangian Hessian (in relation to `s`) -> S Hs S"""
+        s = self.get_slack(z)
+        # Using the primal formulation:
+        #     S Hs S = diag(s)*diag(barrier_parameter/s**2)*diag(s).
+        # Reference [1]_ p. 882, formula (3.1)
+        primal = self.barrier_parameter
+        # Using the primal-dual formulation
+        #     S Hs S = diag(s)*diag(v/s)*diag(s)
+        # Reference [1]_ p. 883, formula (3.11)
+        primal_dual = v[-self.n_ineq:]*s
+        # Uses the primal-dual formulation for
+        # positive values of v_ineq, and the primal
+        # formulation for the remaining ones.
+        return np.where(v[-self.n_ineq:] > 0, primal_dual, primal)
+
+    def lagrangian_hessian(self, z, v):
+        """Returns scaled Lagrangian Hessian"""
+        # Compute Hessian in relation to x and s
+        Hx = self.lagrangian_hessian_x(z, v)
+        if self.n_ineq > 0:
+            S_Hs_S = self.lagrangian_hessian_s(z, v)
+
+        # The scaled Lagrangian Hessian is:
+        #     [ Hx    0    ]
+        #     [ 0   S Hs S ]
+        def matvec(vec):
+            vec_x = self.get_variables(vec)
+            vec_s = self.get_slack(vec)
+            if self.n_ineq > 0:
+                return np.hstack((Hx.dot(vec_x), S_Hs_S*vec_s))
+            else:
+                return Hx.dot(vec_x)
+        return LinearOperator((self.n_vars+self.n_ineq,
+                               self.n_vars+self.n_ineq),
+                              matvec)
+
+    def stop_criteria(self, state, z, last_iteration_failed,
+                      optimality, constr_violation,
+                      trust_radius, penalty, cg_info):
+        """Stop criteria for the barrier problem.
+        The criteria proposed here are similar to formula (2.3)
+        from [1]_, p.879.
+        """
+        x = self.get_variables(z)
+        if self.global_stop_criteria(state, x,
+                                     last_iteration_failed,
+                                     trust_radius, penalty,
+                                     cg_info,
+                                     self.barrier_parameter,
+                                     self.tolerance):
+            self.terminate = True
+            return True
+        else:
+            g_cond = (optimality < self.tolerance and
+                      constr_violation < self.tolerance)
+            x_cond = trust_radius < self.xtol
+            return g_cond or x_cond
+
+
+def tr_interior_point(fun, grad, lagr_hess, n_vars, n_ineq, n_eq,
+                      constr, jac, x0, fun0, grad0,
+                      constr_ineq0, jac_ineq0, constr_eq0,
+                      jac_eq0, stop_criteria,
+                      enforce_feasibility, xtol, state,
+                      initial_barrier_parameter,
+                      initial_tolerance,
+                      initial_penalty,
+                      initial_trust_radius,
+                      factorization_method):
+    """Trust-region interior point method.
+
+    Solve problem:
+        minimize fun(x)
+        subject to: constr_ineq(x) <= 0
+                    constr_eq(x) = 0
+    using trust-region interior point method described in [1]_.
+    """
+    # BOUNDARY_PARAMETER controls the decrease on the slack
+    # variables. Represents ``tau`` from [1]_ p.885, formula (3.18).
+    BOUNDARY_PARAMETER = 0.995
+    # BARRIER_DECAY_RATIO controls the decay of the barrier parameter
+    # and of the subproblem tolerance. Represents ``theta`` from [1]_ p.879.
+    BARRIER_DECAY_RATIO = 0.2
+    # TRUST_ENLARGEMENT controls the enlargement on trust radius
+    # after each iteration
+    TRUST_ENLARGEMENT = 5
+
+    # Default enforce_feasibility
+    if enforce_feasibility is None:
+        enforce_feasibility = np.zeros(n_ineq, bool)
+    # Initial Values
+    barrier_parameter = initial_barrier_parameter
+    tolerance = initial_tolerance
+    trust_radius = initial_trust_radius
+    # Define initial value for the slack variables
+    s0 = np.maximum(-1.5*constr_ineq0, np.ones(n_ineq))
+    # Define barrier subproblem
+    subprob = BarrierSubproblem(
+        x0, s0, fun, grad, lagr_hess, n_vars, n_ineq, n_eq, constr, jac,
+        barrier_parameter, tolerance, enforce_feasibility,
+        stop_criteria, xtol, fun0, grad0, constr_ineq0, jac_ineq0,
+        constr_eq0, jac_eq0)
+    # Define initial parameter for the first iteration.
+    z = np.hstack((x0, s0))
+    fun0_subprob, constr0_subprob = subprob.fun0, subprob.constr0
+    grad0_subprob, jac0_subprob = subprob.grad0, subprob.jac0
+    # Define trust region bounds
+    trust_lb = np.hstack((np.full(subprob.n_vars, -np.inf),
+                          np.full(subprob.n_ineq, -BOUNDARY_PARAMETER)))
+    trust_ub = np.full(subprob.n_vars+subprob.n_ineq, np.inf)
+
+    # Solves a sequence of barrier problems
+    while True:
+        # Solve SQP subproblem
+        z, state = equality_constrained_sqp(
+            subprob.function_and_constraints,
+            subprob.gradient_and_jacobian,
+            subprob.lagrangian_hessian,
+            z, fun0_subprob, grad0_subprob,
+            constr0_subprob, jac0_subprob, subprob.stop_criteria,
+            state, initial_penalty, trust_radius,
+            factorization_method, trust_lb, trust_ub, subprob.scaling)
+        if subprob.terminate:
+            break
+        # Update parameters
+        trust_radius = max(initial_trust_radius,
+                           TRUST_ENLARGEMENT*state.tr_radius)
+        # TODO: Use more advanced strategies from [2]_
+        # to update these parameters.
+        barrier_parameter *= BARRIER_DECAY_RATIO
+        tolerance *= BARRIER_DECAY_RATIO
+        # Update Barrier Problem
+        subprob.update(barrier_parameter, tolerance)
+        # Compute initial values for next iteration
+        fun0_subprob, constr0_subprob = subprob.function_and_constraints(z)
+        grad0_subprob, jac0_subprob = subprob.gradient_and_jacobian(z)
+
+    # Get x and s
+    x = subprob.get_variables(z)
+    return x, state
@@ -0,0 +1,122 @@
+"""Dog-leg trust-region optimization."""
+import numpy as np
+import scipy.linalg
+from ._trustregion import (_minimize_trust_region, BaseQuadraticSubproblem)
+
+__all__ = []
+
+
+def _minimize_dogleg(fun, x0, args=(), jac=None, hess=None,
+                     **trust_region_options):
+    """
+    Minimization of scalar function of one or more variables using
+    the dog-leg trust-region algorithm.
+
+    Options
+    -------
+    initial_trust_radius : float
+        Initial trust-region radius.
+    max_trust_radius : float
+        Maximum value of the trust-region radius. No steps that are longer
+        than this value will be proposed.
+    eta : float
+        Trust region related acceptance stringency for proposed steps.
+    gtol : float
+        Gradient norm must be less than `gtol` before successful
+        termination.
+
+    """
+    if jac is None:
+        raise ValueError('Jacobian is required for dogleg minimization')
+    if not callable(hess):
+        raise ValueError('Hessian is required for dogleg minimization')
+    return _minimize_trust_region(fun, x0, args=args, jac=jac, hess=hess,
+                                  subproblem=DoglegSubproblem,
+                                  **trust_region_options)
+
+
+class DoglegSubproblem(BaseQuadraticSubproblem):
+    """Quadratic subproblem solved by the dogleg method"""
+
+    def cauchy_point(self):
+        """
+        The Cauchy point is minimal along the direction of steepest descent.
+        """
+        if self._cauchy_point is None:
+            g = self.jac
+            Bg = self.hessp(g)
+            self._cauchy_point = -(np.dot(g, g) / np.dot(g, Bg)) * g
+        return self._cauchy_point
+
+    def newton_point(self):
+        """
+        The Newton point is a global minimum of the approximate function.
+        """
+        if self._newton_point is None:
+            g = self.jac
+            B = self.hess
+            cho_info = scipy.linalg.cho_factor(B)
+            self._newton_point = -scipy.linalg.cho_solve(cho_info, g)
+        return self._newton_point
+
+    def solve(self, trust_radius):
+        """
+        Minimize a function using the dog-leg trust-region algorithm.
+
+        This algorithm requires function values and first and second derivatives.
+        It also performs a costly Hessian decomposition for most iterations,
+        and the Hessian is required to be positive definite.
+
+        Parameters
+        ----------
+        trust_radius : float
+            We are allowed to wander only this far away from the origin.
+
+        Returns
+        -------
+        p : ndarray
+            The proposed step.
+        hits_boundary : bool
+            True if the proposed step is on the boundary of the trust region.
+
+        Notes
+        -----
+        The Hessian is required to be positive definite.
+
+        References
+        ----------
+        .. [1] Jorge Nocedal and Stephen Wright,
+               Numerical Optimization, second edition,
+               Springer-Verlag, 2006, page 73.
+        """
+
+        # Compute the Newton point.
+        # This is the optimum for the quadratic model function.
+        # If it is inside the trust radius then return this point.
+        p_best = self.newton_point()
+        if scipy.linalg.norm(p_best) < trust_radius:
+            hits_boundary = False
+            return p_best, hits_boundary
+
+        # Compute the Cauchy point.
+        # This is the predicted optimum along the direction of steepest descent.
+        p_u = self.cauchy_point()
+
+        # If the Cauchy point is outside the trust region,
+        # then return the point where the path intersects the boundary.
+        p_u_norm = scipy.linalg.norm(p_u)
+        if p_u_norm >= trust_radius:
+            p_boundary = p_u * (trust_radius / p_u_norm)
+            hits_boundary = True
+            return p_boundary, hits_boundary
+
+        # Compute the intersection of the trust region boundary
+        # and the line segment connecting the Cauchy and Newton points.
+        # This requires solving a quadratic equation.
+        # ||p_u + t*(p_best - p_u)||**2 == trust_radius**2
+        # Solve this for positive time t using the quadratic formula.
+        _, tb = self.get_boundaries_intersections(p_u, p_best - p_u,
+                                                  trust_radius)
+        p_boundary = p_u + tb * (p_best - p_u)
+        hits_boundary = True
+        return p_boundary, hits_boundary
@@ -0,0 +1,438 @@
+"""Nearly exact trust-region optimization subproblem."""
+import numpy as np
+from scipy.linalg import (norm, get_lapack_funcs, solve_triangular,
+                          cho_solve)
+from ._trustregion import (_minimize_trust_region, BaseQuadraticSubproblem)
+
+__all__ = ['_minimize_trustregion_exact',
+           'estimate_smallest_singular_value',
+           'singular_leading_submatrix',
+           'IterativeSubproblem']
+
+
+def _minimize_trustregion_exact(fun, x0, args=(), jac=None, hess=None,
+                                **trust_region_options):
+    """
+    Minimization of scalar function of one or more variables using
+    a nearly exact trust-region algorithm.
+
+    Options
+    -------
+    initial_trust_radius : float
+        Initial trust-region radius.
+    max_trust_radius : float
+        Maximum value of the trust-region radius. No steps that are longer
+        than this value will be proposed.
+    eta : float
+        Trust region related acceptance stringency for proposed steps.
+    gtol : float
+        Gradient norm must be less than ``gtol`` before successful
+        termination.
+    """
+
+    if jac is None:
+        raise ValueError('Jacobian is required for trust region '
+                         'exact minimization.')
+    if not callable(hess):
+        raise ValueError('Hessian matrix is required for trust region '
+                         'exact minimization.')
+    return _minimize_trust_region(fun, x0, args=args, jac=jac, hess=hess,
+                                  subproblem=IterativeSubproblem,
+                                  **trust_region_options)
+
+
+def estimate_smallest_singular_value(U):
+    """Given upper triangular matrix ``U`` estimate the smallest singular
+    value and the corresponding right singular vector in O(n**2) operations.
+
+    Parameters
+    ----------
+    U : ndarray
+        Square upper triangular matrix.
+
+    Returns
+    -------
+    s_min : float
+        Estimated smallest singular value of the provided matrix.
+    z_min : ndarray
+        Estimated right singular vector.
+
+    Notes
+    -----
+    The procedure is based on [1]_ and is done in two steps. First, it finds
+    a vector ``e`` with components selected from {+1, -1} such that the
+    solution ``w`` from the system ``U.T w = e`` is as large as possible.
+    Next it solves ``U v = w``. The smallest singular value is close
+    to ``norm(w)/norm(v)`` and the right singular vector is close
+    to ``v/norm(v)``.
+
+    The more ill-conditioned the matrix, the better the estimate.
+
+    References
+    ----------
+    .. [1] Cline, A. K., Moler, C. B., Stewart, G. W., Wilkinson, J. H.
+           An estimate for the condition number of a matrix.  1979.
+           SIAM Journal on Numerical Analysis, 16(2), 368-375.
+    """
+
+    U = np.atleast_2d(U)
+    m, n = U.shape
+
+    if m != n:
+        raise ValueError("A square triangular matrix should be provided.")
+
+    # A vector `e` with components selected from {+1, -1}
+    # is selected so that the solution `w` to the system
+    # `U.T w = e` is as large as possible. Implementation
+    # based on algorithm 3.5.1, p. 142, from Golub and Van Loan
+    # (cited below), adapted for a lower triangular matrix.
+
+    p = np.zeros(n)
+    w = np.empty(n)
+
+    # Implemented according to:  Golub, G. H., Van Loan, C. F. (2013).
+    # "Matrix computations". Fourth Edition. JHU press. pp. 140-142.
+    for k in range(n):
+        wp = (1-p[k]) / U.T[k, k]
+        wm = (-1-p[k]) / U.T[k, k]
+        pp = p[k+1:] + U.T[k+1:, k]*wp
+        pm = p[k+1:] + U.T[k+1:, k]*wm
+
+        if abs(wp) + norm(pp, 1) >= abs(wm) + norm(pm, 1):
+            w[k] = wp
+            p[k+1:] = pp
+        else:
+            w[k] = wm
+            p[k+1:] = pm
+
+    # The system `U v = w` is solved using backward substitution.
+    v = solve_triangular(U, w)
+
+    v_norm = norm(v)
+    w_norm = norm(w)
+
+    # Smallest singular value
+    s_min = w_norm / v_norm
+
+    # Associated vector
+    z_min = v / v_norm
+
+    return s_min, z_min
+
+
+def gershgorin_bounds(H):
+    """
+    Given a square matrix ``H`` compute upper
+    and lower bounds for its eigenvalues (Gershgorin bounds).
+    Defined in ref. [1]_.
+
+    References
+    ----------
+    .. [1] Conn, A. R., Gould, N. I., & Toint, P. L.
+           Trust region methods. 2000. Siam. pp. 19.
+    """
+
+    H_diag = np.diag(H)
+    H_diag_abs = np.abs(H_diag)
+    H_row_sums = np.sum(np.abs(H), axis=1)
+    lb = np.min(H_diag + H_diag_abs - H_row_sums)
+    ub = np.max(H_diag - H_diag_abs + H_row_sums)
+
+    return lb, ub
+
+
+def singular_leading_submatrix(A, U, k):
+    """
+    Compute term that makes the leading ``k`` by ``k``
+    submatrix from ``A`` singular.
+
+    Parameters
+    ----------
+    A : ndarray
+        Symmetric matrix that is not positive definite.
+    U : ndarray
+        Upper triangular matrix resulting from an incomplete
+        Cholesky decomposition of matrix ``A``.
+    k : int
+        Positive integer such that the leading k by k submatrix from
+        `A` is the first non-positive definite leading submatrix.
+
+    Returns
+    -------
+    delta : float
+        Amount that should be added to the element (k, k) of the
+        leading k by k submatrix of ``A`` to make it singular.
+    v : ndarray
+        A vector such that ``v.T B v = 0``. Where B is the matrix A after
+        ``delta`` is added to its element (k, k).
+    """
+
+    # Compute delta
+    delta = np.sum(U[:k-1, k-1]**2) - A[k-1, k-1]
+
+    n = len(A)
+
+    # Initialize v
+    v = np.zeros(n)
+    v[k-1] = 1
+
+    # Compute the remaining values of v by solving a triangular system.
+    if k != 1:
+        v[:k-1] = solve_triangular(U[:k-1, :k-1], -U[:k-1, k-1])
+
+    return delta, v
+
+
+class IterativeSubproblem(BaseQuadraticSubproblem):
+    """Quadratic subproblem solved by nearly exact iterative method.
+
+    Notes
+    -----
+    This subproblem solver was based on [1]_, [2]_ and [3]_,
+    which implement similar algorithms. The algorithm is basically
+    that of [1]_ but ideas from [2]_ and [3]_ were also used.
+
+    References
+    ----------
+    .. [1] A.R. Conn, N.I. Gould, and P.L. Toint, "Trust region methods",
+           Siam, pp. 169-200, 2000.
+    .. [2] J. Nocedal and S. Wright, "Numerical optimization",
+           Springer Science & Business Media. pp. 83-91, 2006.
+    .. [3] J.J. More and D.C. Sorensen, "Computing a trust region step",
+           SIAM Journal on Scientific and Statistical Computing, vol. 4(3),
+           pp. 553-572, 1983.
+    """
+
+    # UPDATE_COEFF appears in reference [1]_
+    # in formula 7.3.14 (p. 190) named as "theta".
+    # As recommended there, its value is fixed at 0.01.
+    UPDATE_COEFF = 0.01
+
+    EPS = np.finfo(float).eps
+
+    def __init__(self, x, fun, jac, hess, hessp=None,
+                 k_easy=0.1, k_hard=0.2):
+
+        super().__init__(x, fun, jac, hess)
+
+        # When the trust-region shrinks in two consecutive
+        # calculations (``tr_radius < previous_tr_radius``)
+        # the lower bound ``lambda_lb`` may be reused,
+        # facilitating the convergence. To indicate no
+        # previous value is known at first ``previous_tr_radius``
+        # is set to -1 and ``lambda_lb`` to None.
+        self.previous_tr_radius = -1
+        self.lambda_lb = None
+
+        self.niter = 0
+
+        # ``k_easy`` and ``k_hard`` are parameters used
+        # to determine the stop criteria for the iterative
+        # subproblem solver. Take a look at pp. 194-197
+        # from reference [1]_ for a more detailed description.
+        self.k_easy = k_easy
+        self.k_hard = k_hard
+
+        # Get LAPACK function for Cholesky decomposition.
+        # The implemented SciPy wrapper does not return
+        # the incomplete factorization needed by the method.
+        self.cholesky, = get_lapack_funcs(('potrf',), (self.hess,))
+
+        # Get info about Hessian
+        self.dimension = len(self.hess)
+        self.hess_gershgorin_lb,\
+            self.hess_gershgorin_ub = gershgorin_bounds(self.hess)
+        self.hess_inf = norm(self.hess, np.inf)
+        self.hess_fro = norm(self.hess, 'fro')
+
+        # A constant such that for vectors smaller than that
+        # backward substitution is not reliable. It was established
+        # based on Golub, G. H., Van Loan, C. F. (2013).
+        # "Matrix computations". Fourth Edition. JHU press., p.165.
+        self.CLOSE_TO_ZERO = self.dimension * self.EPS * self.hess_inf
+
+    def _initial_values(self, tr_radius):
+        """Given a trust radius, return a good initial guess for
+        the damping factor, the lower bound and the upper bound.
+        The values were chosen according to the guidelines in
+        section 7.3.8 (p. 192) from [1]_.
+        """
+
+        # Upper bound for the damping factor
+        lambda_ub = max(0, self.jac_mag/tr_radius + min(-self.hess_gershgorin_lb,
+                                                        self.hess_fro,
+                                                        self.hess_inf))
+
+        # Lower bound for the damping factor
+        lambda_lb = max(0, -min(self.hess.diagonal()),
+                        self.jac_mag/tr_radius - min(self.hess_gershgorin_ub,
+                                                     self.hess_fro,
+                                                     self.hess_inf))
+
+        # Improve bounds with previous info
+        if tr_radius < self.previous_tr_radius:
+            lambda_lb = max(self.lambda_lb, lambda_lb)
+
+        # Initial guess for the damping factor
+        if lambda_lb == 0:
+            lambda_initial = 0
+        else:
+            lambda_initial = max(np.sqrt(lambda_lb * lambda_ub),
+                                 lambda_lb + self.UPDATE_COEFF*(lambda_ub-lambda_lb))
+
+        return lambda_initial, lambda_lb, lambda_ub
+
+    def solve(self, tr_radius):
+        """Solve quadratic subproblem"""
+
+        lambda_current, lambda_lb, lambda_ub = self._initial_values(tr_radius)
+        n = self.dimension
+        hits_boundary = True
+        already_factorized = False
+        self.niter = 0
+
+        while True:
+
+            # Compute Cholesky factorization
+            if already_factorized:
+                already_factorized = False
+            else:
+                H = self.hess+lambda_current*np.eye(n)
+                U, info = self.cholesky(H, lower=False,
+                                        overwrite_a=False,
+                                        clean=True)
+
+            self.niter += 1
+
+            # Check if factorization succeeded
+            if info == 0 and self.jac_mag > self.CLOSE_TO_ZERO:
+                # Successful factorization
+
+                # Solve `U.T U p = -jac`
+                p = cho_solve((U, False), -self.jac)
+
+                p_norm = norm(p)
+
+                # Check for interior convergence
+                if p_norm <= tr_radius and lambda_current == 0:
+                    hits_boundary = False
+                    break
+
+                # Solve `U.T w = p`
+                w = solve_triangular(U, p, trans='T')
+
+                w_norm = norm(w)
+
+                # Compute Newton step according to
+                # formula (4.44) p.87 from ref [2]_.
+                delta_lambda = (p_norm/w_norm)**2 * (p_norm-tr_radius)/tr_radius
+                lambda_new = lambda_current + delta_lambda
+
+                if p_norm < tr_radius:  # Inside boundary
+                    s_min, z_min = estimate_smallest_singular_value(U)
+
+                    ta, tb = self.get_boundaries_intersections(p, z_min,
+                                                               tr_radius)
+
+                    # Choose `step_len` with the smallest magnitude.
+                    # The reason for this choice is explained at
+                    # ref [3]_, p. 6 (Immediately before the formula
+                    # for `tau`).
+                    step_len = min([ta, tb], key=abs)
+
+                    # Compute the quadratic term  (p.T*H*p)
+                    quadratic_term = np.dot(p, np.dot(H, p))
+
+                    # Check stop criteria
+                    relative_error = ((step_len**2 * s_min**2)
+                                      / (quadratic_term + lambda_current*tr_radius**2))
+                    if relative_error <= self.k_hard:
+                        p += step_len * z_min
+                        break
+
+                    # Update uncertainty bounds
+                    lambda_ub = lambda_current
+                    lambda_lb = max(lambda_lb, lambda_current - s_min**2)
+
+                    # Compute Cholesky factorization
+                    H = self.hess + lambda_new*np.eye(n)
+                    c, info = self.cholesky(H, lower=False,
+                                            overwrite_a=False,
+                                            clean=True)
+
+                    # Check if the factorization has succeeded
+                    if info == 0:  # Successful factorization
+                        # Update damping factor
+                        lambda_current = lambda_new
+                        already_factorized = True
+                    else:  # Unsuccessful factorization
+                        # Update uncertainty bounds
+                        lambda_lb = max(lambda_lb, lambda_new)
+
+                        # Update damping factor
+                        lambda_current = max(
+                            np.sqrt(lambda_lb * lambda_ub),
+                            lambda_lb + self.UPDATE_COEFF*(lambda_ub-lambda_lb)
+                        )
+
+                else:  # Outside boundary
+                    # Check stop criteria
+                    relative_error = abs(p_norm - tr_radius) / tr_radius
+                    if relative_error <= self.k_easy:
+                        break
+
+                    # Update uncertainty bounds
+                    lambda_lb = lambda_current
+
+                    # Update damping factor
+                    lambda_current = lambda_new
+
+            elif info == 0 and self.jac_mag <= self.CLOSE_TO_ZERO:
+                # jac_mag very close to zero
+
+                # Check for interior convergence
+                if lambda_current == 0:
+                    p = np.zeros(n)
+                    hits_boundary = False
+                    break
+
+                s_min, z_min = estimate_smallest_singular_value(U)
+                step_len = tr_radius
+
+                # Check stop criteria
+                if (step_len**2 * s_min**2
+                    <= self.k_hard * lambda_current * tr_radius**2):
+                    p = step_len * z_min
+                    break
+
+                # Update uncertainty bounds
+                lambda_ub = lambda_current
+                lambda_lb = max(lambda_lb, lambda_current - s_min**2)
+
+                # Update damping factor
+                lambda_current = max(
+                    np.sqrt(lambda_lb * lambda_ub),
+                    lambda_lb + self.UPDATE_COEFF*(lambda_ub-lambda_lb)
+                )
+
+            else:  # Unsuccessful factorization
+
+                # Compute auxiliary terms
+                delta, v = singular_leading_submatrix(H, U, info)
+                v_norm = norm(v)
+
+                # Update uncertainty interval
+                lambda_lb = max(lambda_lb, lambda_current + delta/v_norm**2)
+
+                # Update damping factor
+                lambda_current = max(
+                    np.sqrt(lambda_lb * lambda_ub),
+                    lambda_lb + self.UPDATE_COEFF*(lambda_ub-lambda_lb)
+                )
+
+        self.lambda_lb = lambda_lb
+        self.lambda_current = lambda_current
+        self.previous_tr_radius = tr_radius
+
+        return p, hits_boundary
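
The damping-factor update used in the loop above (the Newton step on `lambda` taken from formula (4.44) of ref [2]) can be illustrated on a diagonal Hessian, where both triangular solves reduce to elementwise divisions. This is a minimal sketch under stated assumptions, not the solver's implementation: `newton_lambda_diagonal` and its arguments are hypothetical names introduced here, and the sketch assumes `diag(h)` is positive definite and that the unconstrained step is longer than the radius.

```python
import math


def newton_lambda_diagonal(h, g, radius, lam=0.0, iterations=20):
    """Iterate the lambda update for a diagonal Hessian diag(h).

    Hypothetical illustration only. With H + lam*I = diag(h_i + lam), the
    upper Cholesky factor is U = diag(sqrt(h_i + lam)), so the two solves
    used by the update are elementwise divisions.
    """
    for _ in range(iterations):
        shifted = [hi + lam for hi in h]
        # Solve (H + lam*I) p = -g
        p = [-gi / si for gi, si in zip(g, shifted)]
        # Solve U.T w = p
        w = [pi / math.sqrt(si) for pi, si in zip(p, shifted)]
        p_norm = math.sqrt(sum(pi * pi for pi in p))
        w_norm = math.sqrt(sum(wi * wi for wi in w))
        # Same update as in the loop above:
        # delta_lambda = (|p| / |w|)**2 * (|p| - radius) / radius
        lam += (p_norm / w_norm) ** 2 * (p_norm - radius) / radius
    # Step that corresponds to the final damping factor
    p = [-gi / (hi + lam) for gi, hi in zip(g, h)]
    return lam, p


lam, p = newton_lambda_diagonal([1.0, 4.0], [1.0, 2.0], radius=0.5)
step_norm = math.sqrt(sum(pi * pi for pi in p))
```

In this example the unconstrained step has norm greater than `0.5`, so the iteration raises `lam` until the step lands on the trust-region boundary. The uncertainty bounds and the hard-case handling of the real loop are deliberately left out.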
@@ -0,0 +1,65 @@
+from ._trustregion import (_minimize_trust_region)
+from ._trlib import (get_trlib_quadratic_subproblem)
+
+__all__ = ['_minimize_trust_krylov']
+
+def _minimize_trust_krylov(fun, x0, args=(), jac=None, hess=None, hessp=None,
+                           inexact=True, **trust_region_options):
+    """
+    Minimization of a scalar function of one or more variables using
+    a nearly exact trust-region algorithm that only requires
+    matrix-vector products with the Hessian matrix.
+
+    .. versionadded:: 1.0.0
+
+    Options
+    -------
+    inexact : bool, optional
+        Accuracy to solve subproblems. If True requires fewer nonlinear
+        iterations, but more vector products.
+    """
+
+    if jac is None:
+        raise ValueError('Jacobian is required for Krylov trust-region '
+                         'minimization')
+    if hess is None and hessp is None:
+        raise ValueError('Either the Hessian or the Hessian-vector product '
+                         'is required for Krylov trust-region minimization')
+
+    # tol_rel specifies the termination tolerance relative to the initial
+    # gradient norm in the Krylov subspace iteration.
+
+    # - tol_rel_i specifies the tolerance for interior convergence.
+    # - tol_rel_b specifies the tolerance for boundary convergence.
+    #   In nonlinear programming applications it is not necessary to solve
+    #   the boundary case as exactly as the interior case.
+
+    # - setting tol_rel_i=-2 leads to a forcing sequence in the Krylov
+    #   subspace iteration leading to quadratic convergence if eventually
+    #   the trust region stays inactive.
+    # - setting tol_rel_b=-3 leads to a forcing sequence in the Krylov
+    #   subspace iteration leading to superlinear convergence as long
+    #   as the iterates hit the trust region boundary.
+
+    # For details consult the documentation of trlib_krylov_min
+    # in _trlib/trlib_krylov.h
+    #
+    # Optimality of this choice of parameters among a range of possibilities
+    # has been tested on the unconstrained subset of the CUTEst library.
+
+    if inexact:
+        return _minimize_trust_region(fun, x0, args=args, jac=jac,
+                                      hess=hess, hessp=hessp,
+                                      subproblem=get_trlib_quadratic_subproblem(
+                                          tol_rel_i=-2.0, tol_rel_b=-3.0,
+                                          disp=trust_region_options.get('disp', False)
+                                          ),
+                                      **trust_region_options)
+    else:
+        return _minimize_trust_region(fun, x0, args=args, jac=jac,
+                                      hess=hess, hessp=hessp,
+                                      subproblem=get_trlib_quadratic_subproblem(
+                                          tol_rel_i=1e-8, tol_rel_b=1e-6,
+                                          disp=trust_region_options.get('disp', False)
+                                          ),
+                                      **trust_region_options)
@@ -0,0 +1,126 @@
+"""Newton-CG trust-region optimization."""
+import math
+
+import numpy as np
+import scipy.linalg
+from ._trustregion import (_minimize_trust_region, BaseQuadraticSubproblem)
+
+__all__ = []
+
+
+def _minimize_trust_ncg(fun, x0, args=(), jac=None, hess=None, hessp=None,
+                        **trust_region_options):
+    """
+    Minimization of scalar function of one or more variables using
+    the Newton conjugate gradient trust-region algorithm.
+
+    Options
+    -------
+    initial_trust_radius : float
+        Initial trust-region radius.
+    max_trust_radius : float
+        Maximum value of the trust-region radius. No steps that are longer
+        than this value will be proposed.
+    eta : float
+        Trust region related acceptance stringency for proposed steps.
+    gtol : float
+        Gradient norm must be less than `gtol` before successful
+        termination.
+
+    """
+    if jac is None:
+        raise ValueError('Jacobian is required for Newton-CG trust-region '
+                         'minimization')
+    if hess is None and hessp is None:
+        raise ValueError('Either the Hessian or the Hessian-vector product '
+                         'is required for Newton-CG trust-region minimization')
+    return _minimize_trust_region(fun, x0, args=args, jac=jac, hess=hess,
+                                  hessp=hessp, subproblem=CGSteihaugSubproblem,
+                                  **trust_region_options)
+
+
+class CGSteihaugSubproblem(BaseQuadraticSubproblem):
+    """Quadratic subproblem solved by a conjugate gradient method"""
+    def solve(self, trust_radius):
+        """
+        Solve the subproblem using a conjugate gradient method.
+
+        Parameters
+        ----------
+        trust_radius : float
+            We are allowed to wander only this far away from the origin.
+
+        Returns
+        -------
+        p : ndarray
+            The proposed step.
+        hits_boundary : bool
+            True if the proposed step is on the boundary of the trust region.
+
+        Notes
+        -----
+        This is algorithm (7.2) of Nocedal and Wright 2nd edition.
+        Only the function that computes the Hessian-vector product is required.
+        The Hessian itself is not required, and the Hessian does
+        not need to be positive semidefinite.
+        """
+
+        # define the origin
+        p_origin = np.zeros_like(self.jac)
+
+        # define a default tolerance
+        tolerance = min(0.5, math.sqrt(self.jac_mag)) * self.jac_mag
+
+        # Stop the method if the gradient norm is already
+        # below the tolerance.
+        if self.jac_mag < tolerance:
+            hits_boundary = False
+            return p_origin, hits_boundary
+
+        # init the state for the first iteration
+        z = p_origin
+        r = self.jac
+        d = -r
+
+        # Search for the min of the approximation of the objective function.
+        while True:
+
+            # do an iteration
+            Bd = self.hessp(d)
+            dBd = np.dot(d, Bd)
+            if dBd <= 0:
+                # Look at the two boundary points.
+                # Find both values of t to get the boundary points such that
+                # ||z + t d|| == trust_radius
+                # and then choose the one with the predicted min value.
+                ta, tb = self.get_boundaries_intersections(z, d, trust_radius)
+                pa = z + ta * d
+                pb = z + tb * d
+                if self(pa) < self(pb):
+                    p_boundary = pa
+                else:
+                    p_boundary = pb
+                hits_boundary = True
+                return p_boundary, hits_boundary
+            r_squared = np.dot(r, r)
+            alpha = r_squared / dBd
+            z_next = z + alpha * d
+            if scipy.linalg.norm(z_next) >= trust_radius:
+                # Find t >= 0 to get the boundary point such that
+                # ||z + t d|| == trust_radius
+                ta, tb = self.get_boundaries_intersections(z, d, trust_radius)
+                p_boundary = z + tb * d
+                hits_boundary = True
+                return p_boundary, hits_boundary
+            r_next = r + alpha * Bd
+            r_next_squared = np.dot(r_next, r_next)
+            if math.sqrt(r_next_squared) < tolerance:
+                hits_boundary = False
+                return z_next, hits_boundary
+            beta_next = r_next_squared / r_squared
+            d_next = -r_next + beta_next * d
+
+            # update the state for the next iteration
+            z = z_next
+            r = r_next
+            d = d_next
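
Both boundary branches of `solve` above rely on finding the two values of `t` with `||z + t d|| == trust_radius`, which amounts to solving a scalar quadratic in `t`. The following is a minimal stand-alone sketch of that computation, assuming `z` and `d` are plain sequences of floats. `boundary_intersections` is a hypothetical helper written for illustration and is not claimed to match the `get_boundaries_intersections` method line for line.

```python
import math


def boundary_intersections(z, d, trust_radius):
    """Return (ta, tb) with ta <= tb such that ||z + t*d|| == trust_radius.

    Hypothetical helper for illustration. Expanding the squared norm gives
    a*t**2 + b*t + c = 0 with the coefficients computed below.
    """
    a = sum(di * di for di in d)
    b = 2.0 * sum(zi * di for zi, di in zip(z, d))
    c = sum(zi * zi for zi in z) - trust_radius ** 2
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        raise ValueError("the line does not cross the trust-region boundary")
    sqrt_disc = math.sqrt(disc)
    # Add quantities of the same sign to avoid cancellation in -b +/- sqrt.
    aux = b + math.copysign(sqrt_disc, b)
    if aux == 0.0:
        # Double root at t = 0: z lies on the boundary and d is tangent.
        return 0.0, 0.0
    ta = -aux / (2.0 * a)
    tb = -2.0 * c / aux
    return tuple(sorted((ta, tb)))


# Starting inside the region, one root is negative and one is positive.
ta, tb = boundary_intersections([1.0, 0.0], [1.0, 0.0], 3.0)
```

When `z` lies strictly inside the trust region, `c` is negative, so the roots have opposite signs and `tb` is the step length in the direction of `d`. That is why the second boundary branch above uses `tb` only.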
@@ -0,0 +1,968 @@
+r"""
+Parameters used in test and benchmark methods.
+
+Collections of test cases suitable for testing 1-D root-finders
+  'original': The original benchmarking functions.
+     Real-valued functions of real-valued inputs on an interval
+     with a zero.
+     f1, .., f3 are continuous and infinitely differentiable
+     f4 has a left- and right- discontinuity at the root
+     f5 has a root at 1 replacing a 1st order pole
+     f6 is randomly positive on one side of the root,
+     randomly negative on the other.
+     f4 - f6 are not continuous at the root.
+
+  'aps': The test problems in the 1995 paper
+     TOMS "Algorithm 748: Enclosing Zeros of Continuous Functions"
+     by Alefeld, Potra and Shi. Real-valued functions of
+     real-valued inputs on an interval with a zero.
+     Suitable for methods which start with an enclosing interval, and
+     derivatives up to 2nd order.
+
+  'complex': Some complex-valued functions of complex-valued inputs.
+     No enclosing bracket is provided.
+     Suitable for methods which use one or more starting values, and
+     derivatives up to 2nd order.
+
+  The test cases are provided as a list of dictionaries. The dictionary
+  keys will be a subset of:
+  ["f", "fprime", "fprime2", "args", "bracket", "smoothness",
+  "a", "b", "x0", "x1", "root", "ID"]
+"""
+
+# Sources:
+#  [1] Alefeld, G. E. and Potra, F. A. and Shi, Yixun,
+#      "Algorithm 748: Enclosing Zeros of Continuous Functions",
+#      ACM Trans. Math. Softw. 21(3), 1995,
+#      doi:10.1145/210089.210111
+#  [2] Chandrupatla, Tirupathi R. "A new hybrid quadratic/bisection algorithm
+#      for finding the zero of a nonlinear function without using derivatives."
+#      Advances in Engineering Software 28.3 (1997): 145-149.
+
+from random import random
+
+import numpy as np
+
+from scipy.optimize import _zeros_py as cc
+
+# "description" refers to the original functions
+description = """
+f1 is a quadratic with roots at 0 and 1
+f2 is a symmetric parabola, x**2 - 1
+f3 is a quartic polynomial with large hump in interval
+f4 is step function with a discontinuity at 1
+f5 is a hyperbola with vertical asymptote at 1
+f6 has random values positive to left of 1, negative to right
+
+Of course, these are not real problems. They just test how the
+'good' solvers behave in bad circumstances where bisection is
+really the best. A good solver should not be much worse than
+bisection in such circumstance, while being faster for smooth
+monotone sorts of functions.
+"""
+
+
+def f1(x):
+    r"""f1 is a quadratic with roots at 0 and 1"""
+    return x * (x - 1.)
+
+
+def f1_fp(x):
+    return 2 * x - 1
+
+
+def f1_fpp(x):
+    return 2
+
+
+def f2(x):
+    r"""f2 is a symmetric parabola, x**2 - 1"""
+    return x**2 - 1
+
+
+def f2_fp(x):
+    return 2 * x
+
+
+def f2_fpp(x):
+    return 2
+
+
+def f3(x):
+    r"""A quartic with roots at 0, 1, 2 and 3"""
+    return x * (x - 1.) * (x - 2.) * (x - 3.)  # x**4 - 6x**3 + 11x**2 - 6x
+
+
+def f3_fp(x):
+    return 4 * x**3 - 18 * x**2 + 22 * x - 6
+
+
+def f3_fpp(x):
+    return 12 * x**2 - 36 * x + 22
+
+
+def f4(x):
+    r"""Piecewise linear, left- and right- discontinuous at x=1, the root."""
+    if x > 1:
+        return 1.0 + .1 * x
+    if x < 1:
+        return -1.0 + .1 * x
+    return 0
+
+
+def f5(x):
+    r"""
+    Hyperbola with a pole at x=1, but pole replaced with 0. Not continuous at root.
+    """
+    if x != 1:
+        return 1.0 / (1. - x)
+    return 0
+
+
+# f6(x) returns random value. Without memoization, calling twice with the
+# same x returns different values, hence a "random value", not a
+# "function with random values"
+_f6_cache = {}
+def f6(x):
+    v = _f6_cache.get(x, None)
+    if v is None:
+        if x > 1:
+            v = random()
+        elif x < 1:
+            v = -random()
+        else:
+            v = 0
+        _f6_cache[x] = v
+    return v
+
+
+# Each Original test case has
+# - a function and its two derivatives,
+# - additional arguments,
+# - a bracket enclosing a root,
+# - the order of differentiability (smoothness) on this interval
+# - a starting value for methods which don't require a bracket
+# - the root (inside the bracket)
+# - an Identifier of the test case
+
+_ORIGINAL_TESTS_KEYS = [
+    "f", "fprime", "fprime2", "args", "bracket", "smoothness", "x0", "root", "ID"
+]
+_ORIGINAL_TESTS = [
+    [f1, f1_fp, f1_fpp, (), [0.5, np.sqrt(3)], np.inf, 0.6, 1.0, "original.01.00"],
+    [f2, f2_fp, f2_fpp, (), [0.5, np.sqrt(3)], np.inf, 0.6, 1.0, "original.02.00"],
+    [f3, f3_fp, f3_fpp, (), [0.5, np.sqrt(3)], np.inf, 0.6, 1.0, "original.03.00"],
+    [f4, None, None, (), [0.5, np.sqrt(3)], -1, 0.6, 1.0, "original.04.00"],
+    [f5, None, None, (), [0.5, np.sqrt(3)], -1, 0.6, 1.0, "original.05.00"],
+    [f6, None, None, (), [0.5, np.sqrt(3)], -np.inf, 0.6, 1.0, "original.06.00"]
+]
+
+_ORIGINAL_TESTS_DICTS = [
+    dict(zip(_ORIGINAL_TESTS_KEYS, testcase)) for testcase in _ORIGINAL_TESTS
+]
+
+#   ##################
+#   "APS" test cases
+#   Functions and test cases that appear in [1]
+
+
+def aps01_f(x):
+    r"""Straightforward sum of trigonometric function and polynomial"""
+    return np.sin(x) - x / 2
+
+
+def aps01_fp(x):
+    return np.cos(x) - 1.0 / 2
+
+
+def aps01_fpp(x):
+    return -np.sin(x)
+
+
+def aps02_f(x):
+    r"""poles at x=n**2, 1st and 2nd derivatives at root are also close to 0"""
+    ii = np.arange(1, 21)
+    return -2 * np.sum((2 * ii - 5)**2 / (x - ii**2)**3)
+
+
+def aps02_fp(x):
+    ii = np.arange(1, 21)
+    return 6 * np.sum((2 * ii - 5)**2 / (x - ii**2)**4)
+
+
+def aps02_fpp(x):
+    ii = np.arange(1, 21)
+    return -24 * np.sum((2 * ii - 5)**2 / (x - ii**2)**5)
+
+
+def aps03_f(x, a, b):
+    r"""Rapidly changing at the root"""
+    return a * x * np.exp(b * x)
+
+
+def aps03_fp(x, a, b):
+    return a * (b * x + 1) * np.exp(b * x)
+
+
+def aps03_fpp(x, a, b):
+    return a * (b * (b * x + 1) + b) * np.exp(b * x)
+
+
+def aps04_f(x, n, a):
+    r"""Medium-degree polynomial"""
+    return x**n - a
+
+
+def aps04_fp(x, n, a):
+    return n * x**(n - 1)
+
+
+def aps04_fpp(x, n, a):
+    return n * (n - 1) * x**(n - 2)
+
+
+def aps05_f(x):
+    r"""Simple Trigonometric function"""
+    return np.sin(x) - 1.0 / 2
+
+
+def aps05_fp(x):
+    return np.cos(x)
+
+
+def aps05_fpp(x):
+    return -np.sin(x)
+
+
+def aps06_f(x, n):
+    r"""Exponential rapidly changing from -1 to 1 at x=0"""
+    return 2 * x * np.exp(-n) - 2 * np.exp(-n * x) + 1
+
+
+def aps06_fp(x, n):
+    return 2 * np.exp(-n) + 2 * n * np.exp(-n * x)
+
+
+def aps06_fpp(x, n):
+    return -2 * n * n * np.exp(-n * x)
+
+
+def aps07_f(x, n):
+    r"""Upside down parabola with parametrizable height"""
+    return (1 + (1 - n)**2) * x - (1 - n * x)**2
+
+
+def aps07_fp(x, n):
+    return (1 + (1 - n)**2) + 2 * n * (1 - n * x)
+
+
+def aps07_fpp(x, n):
+    return -2 * n * n
+
+
+def aps08_f(x, n):
+    r"""Degree n polynomial"""
+    return x * x - (1 - x)**n
+
+
+def aps08_fp(x, n):
+    return 2 * x + n * (1 - x)**(n - 1)
+
+
+def aps08_fpp(x, n):
+    return 2 - n * (n - 1) * (1 - x)**(n - 2)
+
+
+def aps09_f(x, n):
+    r"""Upside down quartic with parametrizable height"""
+    return (1 + (1 - n)**4) * x - (1 - n * x)**4
+
+
+def aps09_fp(x, n):
+    return (1 + (1 - n)**4) + 4 * n * (1 - n * x)**3
+
+
+def aps09_fpp(x, n):
+    return -12 * n * n * (1 - n * x)**2
+
+
+def aps10_f(x, n):
+    r"""Exponential plus a polynomial"""
+    return np.exp(-n * x) * (x - 1) + x**n
+
+
+def aps10_fp(x, n):
+    return np.exp(-n * x) * (-n * (x - 1) + 1) + n * x**(n - 1)
+
+
+def aps10_fpp(x, n):
+    return (np.exp(-n * x) * (-n * (-n * (x - 1) + 1) - n)
+            + n * (n - 1) * x**(n - 2))
+
+
+def aps11_f(x, n):
+    r"""Rational function with a zero at x=1/n and a pole at x=0"""
+    return (n * x - 1) / ((n - 1) * x)
+
+
+def aps11_fp(x, n):
+    return 1 / (n - 1) / x**2
+
+
+def aps11_fpp(x, n):
+    return -2 / (n - 1) / x**3
+
+
+def aps12_f(x, n):
+    r"""nth root of x, with a zero at x=n"""
+    return np.power(x, 1.0 / n) - np.power(n, 1.0 / n)
+
+
+def aps12_fp(x, n):
+    return np.power(x, (1.0 - n) / n) / n
+
+
+def aps12_fpp(x, n):
+    return np.power(x, (1.0 - 2 * n) / n) * (1.0 / n) * (1.0 - n) / n
+
+
+_MAX_EXPABLE = np.log(np.finfo(float).max)
+
+
+def aps13_f(x):
+    r"""Function with *all* derivatives 0 at the root"""
+    if x == 0:
+        return 0
+    # x2 = 1.0/x**2
+    # if x2 > 708:
+    #     return 0
+    y = 1 / x**2
+    if y > _MAX_EXPABLE:
+        return 0
+    return x / np.exp(y)
+
+
+def aps13_fp(x):
+    if x == 0:
+        return 0
+    y = 1 / x**2
+    if y > _MAX_EXPABLE:
+        return 0
+    return (1 + 2 / x**2) / np.exp(y)
+
+
+def aps13_fpp(x):
+    if x == 0:
+        return 0
+    y = 1 / x**2
+    if y > _MAX_EXPABLE:
+        return 0
+    return 2 * (2 - x**2) / x**5 / np.exp(y)
+
+
+def aps14_f(x, n):
+    r"""Constant -n/20 for x <= 0, trigonometric+linear for x positive"""
+    if x <= 0:
+        return -n / 20.0
+    return n / 20.0 * (x / 1.5 + np.sin(x) - 1)
+
+
+def aps14_fp(x, n):
+    if x <= 0:
+        return 0
+    return n / 20.0 * (1.0 / 1.5 + np.cos(x))
+
+
+def aps14_fpp(x, n):
+    if x <= 0:
+        return 0
+    return -n / 20.0 * (np.sin(x))
+
+
+def aps15_f(x, n):
+    r"""Exponential on [0, 0.002/(1+n)], constant outside of that interval"""
+    if x < 0:
+        return -0.859
+    if x > 2 * 1e-3 / (1 + n):
+        return np.e - 1.859
+    return np.exp((n + 1) * x / 2 * 1000) - 1.859
+
+
+def aps15_fp(x, n):
+    if not 0 <= x <= 2 * 1e-3 / (1 + n):
+        return 0
+    return np.exp((n + 1) * x / 2 * 1000) * (n + 1) / 2 * 1000
+
+
+def aps15_fpp(x, n):
+    if not 0 <= x <= 2 * 1e-3 / (1 + n):
+        return 0
+    return np.exp((n + 1) * x / 2 * 1000) * (n + 1) / 2 * 1000 * (n + 1) / 2 * 1000
+
+
+# Each APS test case has
+# - a function and its two derivatives,
+# - additional arguments,
+# - a bracket enclosing a root,
+# - the order of differentiability of the function on this interval
+# - a starting value for methods which don't require a bracket
+# - the root (inside the bracket)
+# - an Identifier of the test case
+#
+# Algorithm 748 is a bracketing algorithm so a bracketing interval was provided
+# in [1] for each test case. Newton and Halley methods need a single
+# starting point x0, which was chosen to be near the middle of the interval,
+# unless that would have made the problem too easy.
+
+_APS_TESTS_KEYS = [
+    "f", "fprime", "fprime2", "args", "bracket", "smoothness", "x0", "root", "ID"
+]
+_APS_TESTS = [
+    [aps01_f, aps01_fp, aps01_fpp, (), [np.pi / 2, np.pi], np.inf,
+     3, 1.89549426703398094e+00, "aps.01.00"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [1 + 1e-9, 4 - 1e-9], np.inf,
+     2, 3.02291534727305677e+00, "aps.02.00"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [4 + 1e-9, 9 - 1e-9], np.inf,
+     5, 6.68375356080807848e+00, "aps.02.01"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [9 + 1e-9, 16 - 1e-9], np.inf,
+     10, 1.12387016550022114e+01, "aps.02.02"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [16 + 1e-9, 25 - 1e-9], np.inf,
+     17, 1.96760000806234103e+01, "aps.02.03"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [25 + 1e-9, 36 - 1e-9], np.inf,
+     26, 2.98282273265047557e+01, "aps.02.04"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [36 + 1e-9, 49 - 1e-9], np.inf,
+     37, 4.19061161952894139e+01, "aps.02.05"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [49 + 1e-9, 64 - 1e-9], np.inf,
+     50, 5.59535958001430913e+01, "aps.02.06"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [64 + 1e-9, 81 - 1e-9], np.inf,
+     65, 7.19856655865877997e+01, "aps.02.07"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [81 + 1e-9, 100 - 1e-9], np.inf,
+     82, 9.00088685391666701e+01, "aps.02.08"],
+    [aps02_f, aps02_fp, aps02_fpp, (), [100 + 1e-9, 121 - 1e-9], np.inf,
+     101, 1.10026532748330197e+02, "aps.02.09"],
+    [aps03_f, aps03_fp, aps03_fpp, (-40, -1), [-9, 31], np.inf,
+     -2, 0, "aps.03.00"],
+    [aps03_f, aps03_fp, aps03_fpp, (-100, -2), [-9, 31], np.inf,
+     -2, 0, "aps.03.01"],
+    [aps03_f, aps03_fp, aps03_fpp, (-200, -3), [-9, 31], np.inf,
+     -2, 0, "aps.03.02"],
+    [aps04_f, aps04_fp, aps04_fpp, (4, 0.2), [0, 5], np.inf,
+     2.5, 6.68740304976422006e-01, "aps.04.00"],
+    [aps04_f, aps04_fp, aps04_fpp, (6, 0.2), [0, 5], np.inf,
+     2.5, 7.64724491331730039e-01, "aps.04.01"],
+    [aps04_f, aps04_fp, aps04_fpp, (8, 0.2), [0, 5], np.inf,
+     2.5, 8.17765433957942545e-01, "aps.04.02"],
+    [aps04_f, aps04_fp, aps04_fpp, (10, 0.2), [0, 5], np.inf,
+     2.5, 8.51339922520784609e-01, "aps.04.03"],
+    [aps04_f, aps04_fp, aps04_fpp, (12, 0.2), [0, 5], np.inf,
+     2.5, 8.74485272221167897e-01, "aps.04.04"],
+    [aps04_f, aps04_fp, aps04_fpp, (4, 1), [0, 5], np.inf,
+     2.5, 1, "aps.04.05"],
+    [aps04_f, aps04_fp, aps04_fpp, (6, 1), [0, 5], np.inf,
+     2.5, 1, "aps.04.06"],
+    [aps04_f, aps04_fp, aps04_fpp, (8, 1), [0, 5], np.inf,
+     2.5, 1, "aps.04.07"],
+    [aps04_f, aps04_fp, aps04_fpp, (10, 1), [0, 5], np.inf,
+     2.5, 1, "aps.04.08"],
+    [aps04_f, aps04_fp, aps04_fpp, (12, 1), [0, 5], np.inf,
+     2.5, 1, "aps.04.09"],
+    [aps04_f, aps04_fp, aps04_fpp, (8, 1), [-0.95, 4.05], np.inf,
+     1.5, 1, "aps.04.10"],
+    [aps04_f, aps04_fp, aps04_fpp, (10, 1), [-0.95, 4.05], np.inf,
+     1.5, 1, "aps.04.11"],
+    [aps04_f, aps04_fp, aps04_fpp, (12, 1), [-0.95, 4.05], np.inf,
+     1.5, 1, "aps.04.12"],
+    [aps04_f, aps04_fp, aps04_fpp, (14, 1), [-0.95, 4.05], np.inf,
+     1.5, 1, "aps.04.13"],
+    [aps05_f, aps05_fp, aps05_fpp, (), [0, 1.5], np.inf,
+     1.3, np.pi / 6, "aps.05.00"],
+    [aps06_f, aps06_fp, aps06_fpp, (1,), [0, 1], np.inf,
+     0.5, 4.22477709641236709e-01, "aps.06.00"],
+    [aps06_f, aps06_fp, aps06_fpp, (2,), [0, 1], np.inf,
+     0.5, 3.06699410483203705e-01, "aps.06.01"],
+    [aps06_f, aps06_fp, aps06_fpp, (3,), [0, 1], np.inf,
+     0.5, 2.23705457654662959e-01, "aps.06.02"],
+    [aps06_f, aps06_fp, aps06_fpp, (4,), [0, 1], np.inf,
+     0.5, 1.71719147519508369e-01, "aps.06.03"],
+    [aps06_f, aps06_fp, aps06_fpp, (5,), [0, 1], np.inf,
+     0.4, 1.38257155056824066e-01, "aps.06.04"],
+    [aps06_f, aps06_fp, aps06_fpp, (20,), [0, 1], np.inf,
+     0.1, 3.46573590208538521e-02, "aps.06.05"],
+    [aps06_f, aps06_fp, aps06_fpp, (40,), [0, 1], np.inf,
+     5e-02, 1.73286795139986315e-02, "aps.06.06"],
+    [aps06_f, aps06_fp, aps06_fpp, (60,), [0, 1], np.inf,
+     1.0 / 30, 1.15524530093324210e-02, "aps.06.07"],
+    [aps06_f, aps06_fp, aps06_fpp, (80,), [0, 1], np.inf,
+     2.5e-02, 8.66433975699931573e-03, "aps.06.08"],
+    [aps06_f, aps06_fp, aps06_fpp, (100,), [0, 1], np.inf,
+     2e-02, 6.93147180559945415e-03, "aps.06.09"],
+    [aps07_f, aps07_fp, aps07_fpp, (5,), [0, 1], np.inf,
+     0.4, 3.84025518406218985e-02, "aps.07.00"],
+    [aps07_f, aps07_fp, aps07_fpp, (10,), [0, 1], np.inf,
+     0.4, 9.90000999800049949e-03, "aps.07.01"],
+    [aps07_f, aps07_fp, aps07_fpp, (20,), [0, 1], np.inf,
+     0.4, 2.49375003906201174e-03, "aps.07.02"],
+    [aps08_f, aps08_fp, aps08_fpp, (2,), [0, 1], np.inf,
+     0.9, 0.5, "aps.08.00"],
+    [aps08_f, aps08_fp, aps08_fpp, (5,), [0, 1], np.inf,
+     0.9, 3.45954815848242059e-01, "aps.08.01"],
+    [aps08_f, aps08_fp, aps08_fpp, (10,), [0, 1], np.inf,
+     0.9, 2.45122333753307220e-01, "aps.08.02"],
+    [aps08_f, aps08_fp, aps08_fpp, (15,), [0, 1], np.inf,
+     0.9, 1.95547623536565629e-01, "aps.08.03"],
+    [aps08_f, aps08_fp, aps08_fpp, (20,), [0, 1], np.inf,
+     0.9, 1.64920957276440960e-01, "aps.08.04"],
+    [aps09_f, aps09_fp, aps09_fpp, (1,), [0, 1], np.inf,
+     0.5, 2.75508040999484394e-01, "aps.09.00"],
+    [aps09_f, aps09_fp, aps09_fpp, (2,), [0, 1], np.inf,
+     0.5, 1.37754020499742197e-01, "aps.09.01"],
+    [aps09_f, aps09_fp, aps09_fpp, (4,), [0, 1], np.inf,
+     0.5, 1.03052837781564422e-02, "aps.09.02"],
+    [aps09_f, aps09_fp, aps09_fpp, (5,), [0, 1], np.inf,
+     0.5, 3.61710817890406339e-03, "aps.09.03"],
+    [aps09_f, aps09_fp, aps09_fpp, (8,), [0, 1], np.inf,
+     0.5, 4.10872918496395375e-04, "aps.09.04"],
+    [aps09_f, aps09_fp, aps09_fpp, (15,), [0, 1], np.inf,
+     0.5, 2.59895758929076292e-05, "aps.09.05"],
+    [aps09_f, aps09_fp, aps09_fpp, (20,), [0, 1], np.inf,
+     0.5, 7.66859512218533719e-06, "aps.09.06"],
+    [aps10_f, aps10_fp, aps10_fpp, (1,), [0, 1], np.inf,
+     0.9, 4.01058137541547011e-01, "aps.10.00"],
+    [aps10_f, aps10_fp, aps10_fpp, (5,), [0, 1], np.inf,
+     0.9, 5.16153518757933583e-01, "aps.10.01"],
+    [aps10_f, aps10_fp, aps10_fpp, (10,), [0, 1], np.inf,
+     0.9, 5.39522226908415781e-01, "aps.10.02"],
+    [aps10_f, aps10_fp, aps10_fpp, (15,), [0, 1], np.inf,
+     0.9, 5.48182294340655241e-01, "aps.10.03"],
+    [aps10_f, aps10_fp, aps10_fpp, (20,), [0, 1], np.inf,
+     0.9, 5.52704666678487833e-01, "aps.10.04"],
+    [aps11_f, aps11_fp, aps11_fpp, (2,), [0.01, 1], np.inf,
+     1e-02, 1.0 / 2, "aps.11.00"],
+    [aps11_f, aps11_fp, aps11_fpp, (5,), [0.01, 1], np.inf,
+     1e-02, 1.0 / 5, "aps.11.01"],
+    [aps11_f, aps11_fp, aps11_fpp, (15,), [0.01, 1], np.inf,
+     1e-02, 1.0 / 15, "aps.11.02"],
+    [aps11_f, aps11_fp, aps11_fpp, (20,), [0.01, 1], np.inf,
+     1e-02, 1.0 / 20, "aps.11.03"],
+    [aps12_f, aps12_fp, aps12_fpp, (2,), [1, 100], np.inf,
+     1.1, 2, "aps.12.00"],
+    [aps12_f, aps12_fp, aps12_fpp, (3,), [1, 100], np.inf,
+     1.1, 3, "aps.12.01"],
+    [aps12_f, aps12_fp, aps12_fpp, (4,), [1, 100], np.inf,
+     1.1, 4, "aps.12.02"],
+    [aps12_f, aps12_fp, aps12_fpp, (5,), [1, 100], np.inf,
+     1.1, 5, "aps.12.03"],
+    [aps12_f, aps12_fp, aps12_fpp, (6,), [1, 100], np.inf,
+     1.1, 6, "aps.12.04"],
+    [aps12_f, aps12_fp, aps12_fpp, (7,), [1, 100], np.inf,
+     1.1, 7, "aps.12.05"],
+    [aps12_f, aps12_fp, aps12_fpp, (9,), [1, 100], np.inf,
+     1.1, 9, "aps.12.06"],
+    [aps12_f, aps12_fp, aps12_fpp, (11,), [1, 100], np.inf,
+     1.1, 11, "aps.12.07"],
+    [aps12_f, aps12_fp, aps12_fpp, (13,), [1, 100], np.inf,
+     1.1, 13, "aps.12.08"],
+    [aps12_f, aps12_fp, aps12_fpp, (15,), [1, 100], np.inf,
+     1.1, 15, "aps.12.09"],
+    [aps12_f, aps12_fp, aps12_fpp, (17,), [1, 100], np.inf,
+     1.1, 17, "aps.12.10"],
+    [aps12_f, aps12_fp, aps12_fpp, (19,), [1, 100], np.inf,
+     1.1, 19, "aps.12.11"],
+    [aps12_f, aps12_fp, aps12_fpp, (21,), [1, 100], np.inf,
+     1.1, 21, "aps.12.12"],
+    [aps12_f, aps12_fp, aps12_fpp, (23,), [1, 100], np.inf,
+     1.1, 23, "aps.12.13"],
+    [aps12_f, aps12_fp, aps12_fpp, (25,), [1, 100], np.inf,
+     1.1, 25, "aps.12.14"],
+    [aps12_f, aps12_fp, aps12_fpp, (27,), [1, 100], np.inf,
+     1.1, 27, "aps.12.15"],
+    [aps12_f, aps12_fp, aps12_fpp, (29,), [1, 100], np.inf,
+     1.1, 29, "aps.12.16"],
+    [aps12_f, aps12_fp, aps12_fpp, (31,), [1, 100], np.inf,
+     1.1, 31, "aps.12.17"],
+    [aps12_f, aps12_fp, aps12_fpp, (33,), [1, 100], np.inf,
+     1.1, 33, "aps.12.18"],
+    [aps13_f, aps13_fp, aps13_fpp, (), [-1, 4], np.inf,
+     1.5, 0, "aps.13.00"],
+    [aps14_f, aps14_fp, aps14_fpp, (1,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.00"],
+    [aps14_f, aps14_fp, aps14_fpp, (2,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.01"],
+    [aps14_f, aps14_fp, aps14_fpp, (3,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.02"],
+    [aps14_f, aps14_fp, aps14_fpp, (4,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.03"],
+    [aps14_f, aps14_fp, aps14_fpp, (5,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.04"],
+    [aps14_f, aps14_fp, aps14_fpp, (6,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.05"],
+    [aps14_f, aps14_fp, aps14_fpp, (7,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.06"],
+    [aps14_f, aps14_fp, aps14_fpp, (8,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.07"],
+    [aps14_f, aps14_fp, aps14_fpp, (9,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.08"],
+    [aps14_f, aps14_fp, aps14_fpp, (10,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.09"],
+    [aps14_f, aps14_fp, aps14_fpp, (11,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.10"],
+    [aps14_f, aps14_fp, aps14_fpp, (12,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.11"],
+    [aps14_f, aps14_fp, aps14_fpp, (13,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.12"],
+    [aps14_f, aps14_fp, aps14_fpp, (14,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.13"],
+    [aps14_f, aps14_fp, aps14_fpp, (15,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.14"],
+    [aps14_f, aps14_fp, aps14_fpp, (16,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.15"],
+    [aps14_f, aps14_fp, aps14_fpp, (17,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.16"],
+    [aps14_f, aps14_fp, aps14_fpp, (18,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.17"],
+    [aps14_f, aps14_fp, aps14_fpp, (19,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.18"],
+    [aps14_f, aps14_fp, aps14_fpp, (20,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.19"],
+    [aps14_f, aps14_fp, aps14_fpp, (21,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.20"],
+    [aps14_f, aps14_fp, aps14_fpp, (22,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.21"],
+    [aps14_f, aps14_fp, aps14_fpp, (23,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.22"],
+    [aps14_f, aps14_fp, aps14_fpp, (24,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.23"],
+    [aps14_f, aps14_fp, aps14_fpp, (25,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.24"],
+    [aps14_f, aps14_fp, aps14_fpp, (26,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.25"],
+    [aps14_f, aps14_fp, aps14_fpp, (27,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.26"],
+    [aps14_f, aps14_fp, aps14_fpp, (28,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.27"],
+    [aps14_f, aps14_fp, aps14_fpp, (29,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.28"],
+    [aps14_f, aps14_fp, aps14_fpp, (30,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.29"],
+    [aps14_f, aps14_fp, aps14_fpp, (31,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.30"],
+    [aps14_f, aps14_fp, aps14_fpp, (32,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.31"],
+    [aps14_f, aps14_fp, aps14_fpp, (33,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.32"],
+    [aps14_f, aps14_fp, aps14_fpp, (34,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.33"],
+    [aps14_f, aps14_fp, aps14_fpp, (35,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.34"],
+    [aps14_f, aps14_fp, aps14_fpp, (36,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.35"],
+    [aps14_f, aps14_fp, aps14_fpp, (37,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.36"],
+    [aps14_f, aps14_fp, aps14_fpp, (38,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.37"],
+    [aps14_f, aps14_fp, aps14_fpp, (39,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.38"],
+    [aps14_f, aps14_fp, aps14_fpp, (40,), [-1000, np.pi / 2], 0,
+     1, 6.23806518961612433e-01, "aps.14.39"],
+    [aps15_f, aps15_fp, aps15_fpp, (20,), [-1000, 1e-4], 0,
+     -2, 5.90513055942197166e-05, "aps.15.00"],
+    [aps15_f, aps15_fp, aps15_fpp, (21,), [-1000, 1e-4], 0,
+     -2, 5.63671553399369967e-05, "aps.15.01"],
+    [aps15_f, aps15_fp, aps15_fpp, (22,), [-1000, 1e-4], 0,
+     -2, 5.39164094555919196e-05, "aps.15.02"],
+    [aps15_f, aps15_fp, aps15_fpp, (23,), [-1000, 1e-4], 0,
+     -2, 5.16698923949422470e-05, "aps.15.03"],
+    [aps15_f, aps15_fp, aps15_fpp, (24,), [-1000, 1e-4], 0,
+     -2, 4.96030966991445609e-05, "aps.15.04"],
+    [aps15_f, aps15_fp, aps15_fpp, (25,), [-1000, 1e-4], 0,
+     -2, 4.76952852876389951e-05, "aps.15.05"],
+    [aps15_f, aps15_fp, aps15_fpp, (26,), [-1000, 1e-4], 0,
+     -2, 4.59287932399486662e-05, "aps.15.06"],
+    [aps15_f, aps15_fp, aps15_fpp, (27,), [-1000, 1e-4], 0,
+     -2, 4.42884791956647841e-05, "aps.15.07"],
+    [aps15_f, aps15_fp, aps15_fpp, (28,), [-1000, 1e-4], 0,
+     -2, 4.27612902578832391e-05, "aps.15.08"],
+    [aps15_f, aps15_fp, aps15_fpp, (29,), [-1000, 1e-4], 0,
+     -2, 4.13359139159538030e-05, "aps.15.09"],
+    [aps15_f, aps15_fp, aps15_fpp, (30,), [-1000, 1e-4], 0,
+     -2, 4.00024973380198076e-05, "aps.15.10"],
+    [aps15_f, aps15_fp, aps15_fpp, (31,), [-1000, 1e-4], 0,
+     -2, 3.87524192962066869e-05, "aps.15.11"],
+    [aps15_f, aps15_fp, aps15_fpp, (32,), [-1000, 1e-4], 0,
+     -2, 3.75781035599579910e-05, "aps.15.12"],
+    [aps15_f, aps15_fp, aps15_fpp, (33,), [-1000, 1e-4], 0,
+     -2, 3.64728652199592355e-05, "aps.15.13"],
+    [aps15_f, aps15_fp, aps15_fpp, (34,), [-1000, 1e-4], 0,
+     -2, 3.54307833565318273e-05, "aps.15.14"],
+    [aps15_f, aps15_fp, aps15_fpp, (35,), [-1000, 1e-4], 0,
+     -2, 3.44465949299614980e-05, "aps.15.15"],
+    [aps15_f, aps15_fp, aps15_fpp, (36,), [-1000, 1e-4], 0,
+     -2, 3.35156058778003705e-05, "aps.15.16"],
+    [aps15_f, aps15_fp, aps15_fpp, (37,), [-1000, 1e-4], 0,
+     -2, 3.26336162494372125e-05, "aps.15.17"],
+    [aps15_f, aps15_fp, aps15_fpp, (38,), [-1000, 1e-4], 0,
+     -2, 3.17968568584260013e-05, "aps.15.18"],
+    [aps15_f, aps15_fp, aps15_fpp, (39,), [-1000, 1e-4], 0,
+     -2, 3.10019354369653455e-05, "aps.15.19"],
+    [aps15_f, aps15_fp, aps15_fpp, (40,), [-1000, 1e-4], 0,
+     -2, 3.02457906702100968e-05, "aps.15.20"],
+    [aps15_f, aps15_fp, aps15_fpp, (100,), [-1000, 1e-4], 0,
+     -2, 1.22779942324615231e-05, "aps.15.21"],
+    [aps15_f, aps15_fp, aps15_fpp, (200,), [-1000, 1e-4], 0,
+     -2, 6.16953939044086617e-06, "aps.15.22"],
+    [aps15_f, aps15_fp, aps15_fpp, (300,), [-1000, 1e-4], 0,
+     -2, 4.11985852982928163e-06, "aps.15.23"],
+    [aps15_f, aps15_fp, aps15_fpp, (400,), [-1000, 1e-4], 0,
+     -2, 3.09246238772721682e-06, "aps.15.24"],
+    [aps15_f, aps15_fp, aps15_fpp, (500,), [-1000, 1e-4], 0,
+     -2, 2.47520442610501789e-06, "aps.15.25"],
+    [aps15_f, aps15_fp, aps15_fpp, (600,), [-1000, 1e-4], 0,
+     -2, 2.06335676785127107e-06, "aps.15.26"],
+    [aps15_f, aps15_fp, aps15_fpp, (700,), [-1000, 1e-4], 0,
+     -2, 1.76901200781542651e-06, "aps.15.27"],
+    [aps15_f, aps15_fp, aps15_fpp, (800,), [-1000, 1e-4], 0,
+     -2, 1.54816156988591016e-06, "aps.15.28"],
+    [aps15_f, aps15_fp, aps15_fpp, (900,), [-1000, 1e-4], 0,
+     -2, 1.37633453660223511e-06, "aps.15.29"],
+    [aps15_f, aps15_fp, aps15_fpp, (1000,), [-1000, 1e-4], 0,
+     -2, 1.23883857889971403e-06, "aps.15.30"]
+]
+
+_APS_TESTS_DICTS = [dict(zip(_APS_TESTS_KEYS, testcase)) for testcase in _APS_TESTS]
+
+
+#   ##################
+#   "complex" test cases
+#   A few simple complex-valued functions defined on the complex plane.
+
+
+def cplx01_f(z, n, a):
+    r"""z**n-a:  Use to find the nth root of a"""
+    return z**n - a
+
+
+def cplx01_fp(z, n, a):
+    return n * z**(n - 1)
+
+
+def cplx01_fpp(z, n, a):
+    return n * (n - 1) * z**(n - 2)
+
+
+def cplx02_f(z, a):
+    r"""e**z - a: Use to find the log of a"""
+    return np.exp(z) - a
+
+
+def cplx02_fp(z, a):
+    return np.exp(z)
+
+
+def cplx02_fpp(z, a):
+    return np.exp(z)
+
+
+# Each "complex" test case has
+# - a function and its two derivatives,
+# - additional arguments,
+# - the order of differentiability of the function,
+# - two starting values x0 and x1,
+# - the root,
+# - an identifier of the test case.
+#
+# No bracketing interval is provided for these test cases. Newton and Halley
+# need a single starting point x0; methods that take two starting values
+# can use x1 as well.
+
+
+_COMPLEX_TESTS_KEYS = [
+    "f", "fprime", "fprime2", "args", "smoothness", "x0", "x1", "root", "ID"
+]
+_COMPLEX_TESTS = [
+    [cplx01_f, cplx01_fp, cplx01_fpp, (2, -1), np.inf,
+     (1 + 1j), (0.5 + 0.5j), 1j, "complex.01.00"],
+    [cplx01_f, cplx01_fp, cplx01_fpp, (3, 1), np.inf,
+     (-1 + 1j), (-0.5 + 2.0j), (-0.5 + np.sqrt(3) / 2 * 1.0j),
+     "complex.01.01"],
+    [cplx01_f, cplx01_fp, cplx01_fpp, (3, -1), np.inf,
+     1j, (0.5 + 0.5j), (0.5 + np.sqrt(3) / 2 * 1.0j),
+     "complex.01.02"],
+    [cplx01_f, cplx01_fp, cplx01_fpp, (3, 8), np.inf,
+     5, 4, 2, "complex.01.03"],
+    [cplx02_f, cplx02_fp, cplx02_fpp, (-1,), np.inf,
+     (1 + 2j), (0.5 + 0.5j), np.pi * 1.0j, "complex.02.00"],
+    [cplx02_f, cplx02_fp, cplx02_fpp, (1j,), np.inf,
+     (1 + 2j), (0.5 + 0.5j), np.pi * 0.5j, "complex.02.01"],
+]
+
+_COMPLEX_TESTS_DICTS = [
+    dict(zip(_COMPLEX_TESTS_KEYS, testcase)) for testcase in _COMPLEX_TESTS
+]
+
+
+def _add_a_b(tests):
+    r"""Add "a" and "b" keys to each test from the "bracket" value"""
+    for d in tests:
+        for k, v in zip(['a', 'b'], d.get('bracket', [])):
+            d[k] = v
+
+
+_add_a_b(_ORIGINAL_TESTS_DICTS)
+_add_a_b(_APS_TESTS_DICTS)
+_add_a_b(_COMPLEX_TESTS_DICTS)
+
+
+def get_tests(collection='original', smoothness=None):
+    r"""Return the requested collection of test cases, as a list of dicts with subset-specific keys
+
+    Allowed values of collection:
+    'original': The original benchmarking functions.
+         Real-valued functions of real-valued inputs on an interval with a zero.
+         f1, .., f3 are continuous and infinitely differentiable
+         f4 has a single discontinuity at the root
+         f5 has a root at 1 replacing a 1st order pole
+         f6 is randomly positive on one side of the root, randomly negative on the other
+    'aps': The test problems in the TOMS "Algorithm 748: Enclosing Zeros of Continuous Functions"
+         paper by Alefeld, Potra and Shi. Real-valued functions of
+         real-valued inputs on an interval with a zero.
+         Suitable for methods which start with an enclosing interval, and
+         derivatives up to 2nd order.
+    'complex': Some complex-valued functions of complex-valued inputs.
+         No enclosing bracket is provided.
+         Suitable for methods which use one or more starting values, and
+         derivatives up to 2nd order.
+    'chandrupatla': The functions and test cases that appear in [2].
+         Real-valued functions of real-valued inputs on a bracketing interval.
+         Each case also records the number of function evaluations ("nfeval")
+         required by Chandrupatla's algorithm.
+
+    The dictionary keys will be a subset of
+    ["f", "fprime", "fprime2", "args", "bracket", "a", "b", "smoothness", "x0", "x1", "root", "nfeval", "ID"]
+    """  # noqa: E501
+    collection = collection or "original"
+    subsets = {"aps": _APS_TESTS_DICTS,
+               "complex": _COMPLEX_TESTS_DICTS,
+               "original": _ORIGINAL_TESTS_DICTS,
+               "chandrupatla": _CHANDRUPATLA_TESTS_DICTS}
+    tests = subsets.get(collection, [])
+    if smoothness is not None:
+        tests = [tc for tc in tests if tc['smoothness'] >= smoothness]
+    return tests
+
+
+# Backwards compatibility
+methods = [cc.bisect, cc.ridder, cc.brenth, cc.brentq]
+mstrings = ['cc.bisect', 'cc.ridder', 'cc.brenth', 'cc.brentq']
+functions = [f2, f3, f4, f5, f6]
+fstrings = ['f2', 'f3', 'f4', 'f5', 'f6']
+
+#   ##################
+#   "Chandrupatla" test cases
+#   Functions and test cases that appear in [2]
+
+def fun1(x):
+    return x**3 - 2*x - 5
+fun1.root = 2.0945514815423265  # additional precision using mpmath.findroot
+
+
+def fun2(x):
+    return 1 - 1/x**2
+fun2.root = 1
+
+
+def fun3(x):
+    return (x-3)**3
+fun3.root = 3
+
+
+def fun4(x):
+    return 6*(x-2)**5
+fun4.root = 2
+
+
+def fun5(x):
+    return x**9
+fun5.root = 0
+
+
+def fun6(x):
+    return x**19
+fun6.root = 0
+
+
+def fun7(x):
+    return 0 if abs(x) < 3.8e-4 else x*np.exp(-x**(-2))
+fun7.root = 0
+
+
+def fun8(x):
+    xi = 0.61489
+    return -(3062*(1-xi)*np.exp(-x))/(xi + (1-xi)*np.exp(-x)) - 1013 + 1628/x
+fun8.root = 1.0375360332870405
+
+
+def fun9(x):
+    return np.exp(x) - 2 - 0.01/x**2 + .000002/x**3
+fun9.root = 0.7032048403631358
+
+# Each "chandrupatla" test case has
+# - a function,
+# - a bracketing interval,
+# - the root,
+# - the number of function evaluations required by Chandrupatla's algorithm,
+# - an identifier of the test case.
+#
+# Chandrupatla's is a bracketing algorithm, so a bracketing interval was
+# provided in [2] for each test case. No special support for testing with
+# secant/Newton/Halley is provided.
+
+_CHANDRUPATLA_TESTS_KEYS = ["f", "bracket", "root", "nfeval", "ID"]
+_CHANDRUPATLA_TESTS = [
+    [fun1, [2, 3], fun1.root, 7],
+    [fun1, [1, 10], fun1.root, 11],
+    [fun1, [1, 100], fun1.root, 14],
+    [fun1, [-1e4, 1e4], fun1.root, 23],
+    [fun1, [-1e10, 1e10], fun1.root, 43],
+    [fun2, [0.5, 1.51], fun2.root, 8],
+    [fun2, [1e-4, 1e4], fun2.root, 22],
+    [fun2, [1e-6, 1e6], fun2.root, 28],
+    [fun2, [1e-10, 1e10], fun2.root, 41],
+    [fun2, [1e-12, 1e12], fun2.root, 48],
+    [fun3, [0, 5], fun3.root, 21],
+    [fun3, [-10, 10], fun3.root, 23],
+    [fun3, [-1e4, 1e4], fun3.root, 36],
+    [fun3, [-1e6, 1e6], fun3.root, 45],
+    [fun3, [-1e10, 1e10], fun3.root, 55],
+    [fun4, [0, 5], fun4.root, 21],
+    [fun4, [-10, 10], fun4.root, 23],
+    [fun4, [-1e4, 1e4], fun4.root, 33],
+    [fun4, [-1e6, 1e6], fun4.root, 43],
+    [fun4, [-1e10, 1e10], fun4.root, 54],
+    [fun5, [-1, 4], fun5.root, 21],
+    [fun5, [-2, 5], fun5.root, 22],
+    [fun5, [-1, 10], fun5.root, 23],
+    [fun5, [-5, 50], fun5.root, 25],
+    [fun5, [-10, 100], fun5.root, 26],
+    [fun6, [-1., 4.], fun6.root, 21],
+    [fun6, [-2., 5.], fun6.root, 22],
+    [fun6, [-1., 10.], fun6.root, 23],
+    [fun6, [-5., 50.], fun6.root, 25],
+    [fun6, [-10., 100.], fun6.root, 26],
+    [fun7, [-1, 4], fun7.root, 8],
+    [fun7, [-2, 5], fun7.root, 8],
+    [fun7, [-1, 10], fun7.root, 11],
+    [fun7, [-5, 50], fun7.root, 18],
+    [fun7, [-10, 100], fun7.root, 19],
+    [fun8, [2e-4, 2], fun8.root, 9],
+    [fun8, [2e-4, 3], fun8.root, 10],
+    [fun8, [2e-4, 9], fun8.root, 11],
+    [fun8, [2e-4, 27], fun8.root, 12],
+    [fun8, [2e-4, 81], fun8.root, 14],
+    [fun9, [2e-4, 1], fun9.root, 7],
+    [fun9, [2e-4, 3], fun9.root, 8],
+    [fun9, [2e-4, 9], fun9.root, 10],
+    [fun9, [2e-4, 27], fun9.root, 11],
+    [fun9, [2e-4, 81], fun9.root, 13],
+]
+_CHANDRUPATLA_TESTS = [test + [f'{test[0].__name__}.{i%5+1}']
+                       for i, test in enumerate(_CHANDRUPATLA_TESTS)]
+
+_CHANDRUPATLA_TESTS_DICTS = [dict(zip(_CHANDRUPATLA_TESTS_KEYS, testcase))
+                             for testcase in _CHANDRUPATLA_TESTS]
+_add_a_b(_CHANDRUPATLA_TESTS_DICTS)
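
The table-driven layout above (rows zipped with a key list, then `_add_a_b` expanding each bracket into "a" and "b") can be exercised with a small standalone sketch. The names `KEYS`, `ROWS`, `TESTS`, `add_a_b`, and `bisect` below are hypothetical stand-ins and not part of the module. The plain bisection is only an illustrative solver; it is neither Chandrupatla's algorithm nor any SciPy routine.

```python
import math

# Hypothetical miniature of the table layout: one row per test case.
KEYS = ["f", "bracket", "smoothness", "root", "ID"]


def cubic(x):
    """x**3 - 2*x - 5, the same polynomial as fun1 in the module."""
    return x**3 - 2*x - 5


def shifted_cube(x):
    """(x - 3)**3, the same function as fun3 in the module."""
    return (x - 3)**3


ROWS = [
    [cubic, [2, 3], math.inf, 2.0945514815423265, "demo.1"],
    [shifted_cube, [0, 5], math.inf, 3, "demo.2"],
]

# Same dict(zip(...)) idiom the module uses to build its *_TESTS_DICTS lists.
TESTS = [dict(zip(KEYS, row)) for row in ROWS]


def add_a_b(tests):
    """Copy the two bracket endpoints into separate "a" and "b" keys."""
    for d in tests:
        for k, v in zip(["a", "b"], d.get("bracket", [])):
            d[k] = v


def bisect(f, a, b, xtol=1e-12, maxiter=200):
    """Plain bisection on [a, b]; an assumed stand-in solver for this demo."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(maxiter):
        mid = 0.5 * (a + b)
        fm = f(mid)
        # Stop on an exact zero or once the half-width is below xtol.
        if fm == 0 or 0.5 * (b - a) < xtol:
            return mid
        if fa * fm < 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)


add_a_b(TESTS)

# Filter by smoothness the same way get_tests does, then solve each case.
smooth = [tc for tc in TESTS if tc["smoothness"] >= 2]
results = {tc["ID"]: bisect(tc["f"], tc["a"], tc["b"]) for tc in smooth}
errors = {tc["ID"]: abs(results[tc["ID"]] - tc["root"]) for tc in smooth}
```

Keeping the cases as rows of plain data lets a single loop drive any solver over them, and each solver can read whichever keys it needs ("a"/"b" for bracketing methods, "x0"/"x1" for open methods).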