One useful tweak for faster training of neural networks is to vary (often reduce) the learning rate hyperparameter used by gradient-based optimization algorithms.

Keras provides a callback that can be used to control this hyperparameter over time (number of iterations/epochs). To use this callback, we need to:

  • Define a function that takes an epoch index as input and returns the new learning rate as output.
  • Create an instance of LearningRateScheduler and pass the previously defined function as a parameter.
import tensorflow as tf

def schedule(epoch):
  ...

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule, verbose=True)
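
As a minimal usage sketch (assuming a compiled model and training arrays x_train and y_train already exist; these names are placeholders), the callback is then passed to model.fit like any other Keras callback:

# Pass the callback to model.fit (model, x_train and y_train are assumed to exist)
history = model.fit(x_train, y_train,
                    epochs=30,
                    callbacks=[lr_callback])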

Scheduling functions

There are endless ways to schedule/control the learning rate; this section presents some examples.

Constant Learning Rate

The following scheduling function keeps the learning rate at a constant value regardless of time.

# Define configuration parameters
start_lr = 0.001

# Define the scheduling function
def schedule(epoch):
  return start_lr

Time-based Decay

The following scheduling function gradually decreases the learning rate over time from a starting value. The mathematical formula is \(lr = \frac{lr_0}{1 + k \cdot t}\) where \(lr_0\) is the initial learning rate value, \(k\) is a decay hyperparameter and \(t\) is the epoch/iteration number.

# Define configuration parameters
start_lr = 0.001
decay = 0.1

# Define the scheduling function
def schedule(epoch):
  def lr(epoch, start_lr, decay):
    # Apply the time-based decay formula lr = lr_0 / (1 + k * t)
    return start_lr / (1. + decay * epoch)
  return lr(epoch, start_lr, decay)

Exponential Decay

The following scheduling function exponentially decreases the learning rate over time from a starting point. Mathematically it can be represented as \(lr = lr_0 \cdot e^{-k \cdot t}\) where \(lr_0\) is the initial learning rate value, \(k\) is a decay hyperparameter and \(t\) is the epoch/iteration number.

import math

# Define configuration parameters
start_lr = 0.001
exp_decay = 0.1

# Define the scheduling function
def schedule(epoch):
  def lr(epoch, start_lr, exp_decay):
    return start_lr * math.exp(-exp_decay*epoch)
  return lr(epoch, start_lr, exp_decay)

Constant then Exponential Decay

The following scheduling function keeps the learning rate at its starting value for the first ten epochs and decreases it exponentially after that.

# Define configuration parameters
start_lr = 0.001
rampup_epochs = 10
exp_decay = 0.1

# Define the scheduling function
def schedule(epoch):
  def lr(epoch, start_lr, rampup_epochs, exp_decay):
    if epoch < rampup_epochs:
      return start_lr
    else:
      # Start the exponential decay from the constant value held until rampup_epochs
      return start_lr * math.exp(-exp_decay * (epoch - rampup_epochs))
  return lr(epoch, start_lr, rampup_epochs, exp_decay)

One Cycle Learning Rate

The following scheduling function gradually increases the learning rate from a starting point up to a maximum value over a number of ramp-up epochs. After that it decreases the learning rate exponentially and stabilises it at a minimum value. This scheduling algorithm is also known as One Cycle Learning Rate (source).

# Define configuration parameters
start_lr = 0.0001
min_lr = 0.00001
max_lr = 0.001
rampup_epochs = 10
sustain_epochs = 0
exp_decay = 0.8

# Define the scheduling function
def schedule(epoch):
  def lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay):
    if epoch < rampup_epochs:
      lr = (max_lr - start_lr)/rampup_epochs * epoch + start_lr
    elif epoch < rampup_epochs + sustain_epochs:
      lr = max_lr
    else:
      lr = (max_lr - min_lr) * exp_decay**(epoch-rampup_epochs-sustain_epochs) + min_lr
    return lr
  return lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay)

Visualization

The following chart visualizes the learning rate as it is scheduled by each of the previously defined functions.
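
For example, a chart like this can be reproduced with matplotlib by evaluating a scheduling function over a range of epochs (a rough sketch, not part of the original code):

import matplotlib.pyplot as plt

epochs = range(50)
# Evaluate whichever `schedule` function is currently defined at each epoch
plt.plot(epochs, [schedule(e) for e in epochs])
plt.xlabel("epoch")
plt.ylabel("learning rate")
plt.title("Learning rate schedule")
plt.show()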

Learn more about LearningRateScheduler usage in tf-keras - link