
Hager_Zhang first commit #1344

Draft · wants to merge 1 commit into main

Conversation

alisterde
Member

@MichaelClerx @ben18785
I've got an initial draft of the Hager-Zhang line search algorithm (#1160). It is passing all of the tests I've written and appears to be functional.
However, I want to write more tests to check specific conditions in the logic before I'm confident in the algorithm. I also think I need to add a few more convergence checks, but I'll address that tomorrow.

I would appreciate any initial thoughts you have about it.

MichaelClerx (Member) left a comment

Thanks @alisterde

This looks BIG! I've added some initial comments on coding and style, but I'll need some more explanation before I can review what the code actually does.

@@ -0,0 +1,899 @@
#
# Fixed learning-rate gradient descent.
Member:

Suggested change
# Fixed learning-rate gradient descent.
# Hager-Zhang line search.

from numpy.linalg import norm


class HagerZhang(pints.Optimiser):
Member:

Best to have this consistent with the file name, so either HagerZhangLineSearch or _hager_zhang.py


class HagerZhang(pints.Optimiser):
"""
Gradient-descent method with a fixed learning rate.
Member:

Needs a docstring with references etc.

Collaborator:

Pseudocode too would I think be really useful. For pseudocode in the other methods, we tend not to incorporate ask/tell but rather lay out the algorithm as given in the paper (but made tidy and understandable for us).
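For what it's worth, a rough sketch of the top-level structure as I remember it from Hager and Zhang (from memory only; please verify against the paper before putting anything like this in the docstring):

```
# Sketch of the Hager-Zhang line search (not verified line-by-line):
#
# 1. Pick an initial guess c for the step size.
# 2. Bracket: expand/adjust until an interval [a, b] is found whose
#    endpoints satisfy the opposite slope condition.
# 3. Repeat:
#    a. Take a secant^2 step to propose a smaller interval.
#    b. If the interval did not shrink by at least a factor gamma, bisect.
#    c. Stop as soon as a trial point satisfies the Wolfe or approximate
#       Wolfe conditions.
```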

from numpy.linalg import norm


class HagerZhang(pints.Optimiser):
Member:

Do we need an interface for line search methods? E.g. class HagerZhang(pints.LineSearch) with class LineSearch(pints.Optimiser) ?
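For illustration, the kind of hierarchy being suggested might look roughly like this (a sketch only; the `LineSearch` class and its `set_search_direction` method are hypothetical, not existing PINTS API):

```python
import pints


class LineSearch(pints.Optimiser):
    """Hypothetical shared base class for line search methods."""

    def __init__(self, x0, sigma0=None, boundaries=None):
        super(LineSearch, self).__init__(x0, sigma0, boundaries)
        # Direction along which the one-dimensional search is performed
        self._px = None

    def set_search_direction(self, px):
        """Sets the direction used by the next line search."""
        self._px = px


class HagerZhangLineSearch(LineSearch):
    """Hager-Zhang line search (the actual implementation would live here)."""
    pass
```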

def __init__(self, x0, sigma0=None, boundaries=None):
super(HagerZhang, self).__init__(x0, sigma0, boundaries)

# Set optimiser state
Member:

I take it this method only works for 1d problems. Should there be a check in the constructor for this? Or in a LineSearch class that this one extends?

# the opposite slope condition (see function definition)
self.theta = 0.5

self.__gamma = 0.66
Member:

We're not using __ anywhere in PINTS, just _ to mark this as private


def __initialising(self, k, alpha_k0):
'''
This function is part of the Hager-Zhang line search method [1].
Member:

This line's not really needed!

self._current_f = np.asarray(proposed_f)
self._current_dfdx = np.asarray(proposed_dfdx)

elif self.wolfe_check is True:
Member:

Suggested change
elif self.wolfe_check is True:
elif self.wolfe_check:

"""
Gradient-descent method with a fixed learning rate.
"""

Member:

I think we should add a big long # comment here explaining the structure of this class and how the yield stuff is supposed to work


self.__gamma = 0.66

# range (0, 1) small factor used in initial guess of step size
Member:

There are quite a lot of properties declared down here. Seems this method has a lot of "state"!

  • Are they all strictly necessary?
  • And if so, would there be any sense in bundling them per subtask, e.g. some object for wolfe-related properties, something like that?

Collaborator:

I agree re: bundling. Could we perhaps bundle according to first and second Wolfe and approximate? Or would that not work?
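As a purely illustrative sketch of the kind of bundling discussed above (the class and attribute names are made up; the defaults just echo the ones in this PR):

```python
class WolfeSettings(object):
    """Hypothetical container bundling the Wolfe-condition parameters."""

    def __init__(self, c1=1e-4, c2=0.9):
        self.c1 = c1    # sufficient decrease (Armijo-type) parameter
        self.c2 = c2    # curvature parameter
```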

r, x, s, b, px = self.problem()
opt = pints.OptimisationController(r, x, method=method)
m = opt.optimiser()
_B = np.identity(m._n_parameters)
Member:

Why the underscore?

b = 1.0

def temp(a=a, b=b, m=m):
generator = m._HagerZhang__bisect_or_secant(a=a, b=b)
Member:

Not sure what the best way to test these is. As a rule of thumb you don't test the private API. So we need to figure out if this is an exception or if there's something we're not seeing that would break this down into smaller testable blocks with public methods

@MichaelClerx (Member):

Maybe we can link to this somewhere in the file. I can never find it :D
https://docs.python.org/3.8/reference/expressions.html#yield-expressions
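For anyone following along, here is a self-contained toy example of the yield/send pattern under discussion (this is not the PR's code, just an illustration of how a generator can be driven ask/tell style):

```python
import numpy as np


def line_search_trials(x, px, alphas):
    """Toy generator: yields trial points, is resumed with their scores."""
    best = None
    for a in alphas:
        trial = x + a * px
        score = yield trial                  # suspended until .send(score)
        if best is None or score < best[1]:
            best = (a, score)
    return best                              # delivered via StopIteration.value


def f(v):
    return float(np.sum(v ** 2))


# Drive the generator, roughly as an ask/tell loop would
gen = line_search_trials(np.array([1.0, 1.0]), np.array([-1.0, -1.0]),
                         [0.1, 0.5, 1.0])
trial = next(gen)                            # first "ask"
while True:
    try:
        trial = gen.send(f(trial))           # "tell", then next "ask"
    except StopIteration as stop:
        print('best (alpha, score):', stop.value)
        break
```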

ben18785 (Collaborator) left a comment

Looks really good @alisterde -- it's a beast of a method! My points are mainly surface-level things, as I'm finding the logic tricky to follow. As Michael says, having some description in the docstrings of how the ask/tell flow works would, I think, really help.



# As c1 approaches 0 and c2 approaches 1, the line search
# terminates more quickly.
self._c1 = 1E-4 # Parameter for Armijo condition rule, 0 < c1 < 0.5
Collaborator:

I couldn't find the Armijo rule in the paper? Could you point to where this is from? I'm guessing this is "delta" on page 184 of Hager and Zhang?

Collaborator:

Also, where was the default value taken from?

# As c1 approaches 0 and c2 approaches 1, the line search
# terminates more quickly.
self._c1 = 1E-4 # Parameter for Armijo condition rule, 0 < c1 < 0.5
self._c2 = 0.9 # Parameter for curvature condition rule, c1 < c2 < 1.0
Collaborator:

I'm guessing this is sigma on page 184 of the paper? Of course, we don't need to use the same parameter names as them but it might help in places to cross-check.

Collaborator:

Same question as above for the default value.
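For cross-checking against the paper: c1 and c2 here look like the usual parameters of the (weak) Wolfe conditions, which in Hager and Zhang's notation would correspond to delta and sigma. Stating them explicitly (standard textbook form, not copied from this PR):

```latex
% Weak Wolfe conditions for a step \alpha along search direction p:
f(x + \alpha p) \le f(x) + c_1 \alpha \nabla f(x)^{\top} p    % sufficient decrease (Armijo)
\nabla f(x + \alpha p)^{\top} p \ge c_2 \nabla f(x)^{\top} p  % curvature
```

The defaults c1 = 1e-4 and c2 = 0.9 are the values commonly recommended in the optimisation literature (e.g. Nocedal and Wright), but it would still be good to state in the docstring where this implementation takes them from.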

# range (0, 1), used in the ``self.__update()`` and
# ``self.__initial_bracket()`` when the potential intervals violate
# the opposite slope condition (see function definition)
self.theta = 0.5
Collaborator:

Again, would be good to know where the default value comes from here.



Comment on lines +326 to +335
proposed_grad = np.matmul(np.transpose(proposed_dfdx), self.__px)

wolfe_curvature = (proposed_grad >= self._c2 *
np.matmul(np.transpose(self._current_dfdx),
self.__px))

exact_wolfe_suff_dec = (self._c1 *
np.matmul(np.transpose(self._current_dfdx),
self.__px)
>= proposed_f - self._current_f)
Collaborator:

np.matmul(np.transpose(self._current_dfdx), self.__px) seems to be repeated three times here? I'd precalculate it and then reuse it.
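Something along these lines, perhaps (a sketch of the refactor only, reusing the names from the snippet above):

```python
# Compute the two directional derivatives once and reuse them
proposed_slope = np.matmul(np.transpose(proposed_dfdx), self.__px)
current_slope = np.matmul(np.transpose(self._current_dfdx), self.__px)

wolfe_curvature = proposed_slope >= self._c2 * current_slope
exact_wolfe_suff_dec = (self._c1 * current_slope
                        >= proposed_f - self._current_f)
```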


# Checking if approximate wolfe conditions are meet.
apprx_wolfe_suff_dec = ((2.0 * self._c1 - 1.0) *
np.matmul(np.transpose(self._current_dfdx),
Collaborator:

Another time.

approximate_wolfe = (apprx_wolfe_suff_dec and wolfe_curvature
and apprx_wolfe_applies)

# If wolfe conditions are meet the line search is stopped.
Collaborator:

met

# is only decreased in subsequent boundary manipulation.
return ps_2 * alpha_k0

def __very_close(self, x, y):
Collaborator:

is_very_close? That makes it clear that it returns a Boolean.

suitable interval is found that contains the minimum.
'''

# (steps B0 from [1])
Collaborator:

Perhaps I'm missing it but I can't see "B0" in the paper?
