Learning Rate Scheduling with Callbacks
One of the useful tweaks for faster training of neural networks is to vary (often to reduce) the learning rate hyperparameter used by gradient-based optimization algorithms.
Keras provides a callback that can be used to control this hyperparameter over time (number of iterations/epochs). To use this callback, we need to:
- Define a function that takes an epoch index as input and returns the new learning rate as output.
- Create an instance of LearningRateScheduler and pass the previously defined function as a parameter.
import tensorflow as tf

def schedule(epoch):
    ...

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule, verbose=1)
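As a minimal usage sketch (assuming model, x_train and y_train are an already compiled Keras model and its training data, which are not defined in this post), the callback is then passed to fit() through the callbacks argument:
# Hypothetical usage: `model`, `x_train` and `y_train` are assumed to exist
history = model.fit(
    x_train,
    y_train,
    epochs=20,
    callbacks=[lr_callback],  # the scheduler updates the optimizer's learning rate each epoch
)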
Scheduling functions
There are endless ways to schedule/control the learning rate; this section presents some examples.
Constant Learning Rate
The following scheduling function keeps the learning rate at a constant value regardless of time.
# Define configuration parameters
start_lr = 0.001
# Define the scheduling function
def schedule(epoch):
    return start_lr
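With this schedule the callback simply sets the optimizer's learning rate back to 0.001 at the start of every epoch, which is equivalent to training with a fixed learning rate.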
Time-based Decay
The following scheduling function gradually decreases the learning rate over time from a starting value. The mathematical formula is \(lr = \frac{lr_0}{1 + k \cdot t}\), where \(lr_0\) is the initial learning rate value, \(k\) is a decay hyperparameter and \(t\) is the epoch/iteration number.
# Define configuration parameters
start_lr = 0.001
decay = 0.1

# Define the scheduling function
def schedule(epoch):
    def lr(epoch, start_lr, decay):
        # lr = lr_0 / (1 + k * t)
        return start_lr / (1. + decay * epoch)
    return lr(epoch, start_lr, decay)
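For example, with start_lr = 0.001 and decay = 0.1 this yields roughly 0.001 at epoch 0, 0.00091 at epoch 1, 0.00067 at epoch 5 and 0.0005 at epoch 10.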
Exponential Decay
The following scheduling function exponentially decreases the learning rate over time from a starting point. Mathematically it can be represented as \(lr = lr_0 \cdot e^{-k \cdot t}\), where \(lr_0\) is the initial learning rate value, \(k\) is a decay hyperparameter and \(t\) is the epoch/iteration number.
import math

# Define configuration parameters
start_lr = 0.001
exp_decay = 0.1

# Define the scheduling function
def schedule(epoch):
    def lr(epoch, start_lr, exp_decay):
        return start_lr * math.exp(-exp_decay * epoch)
    return lr(epoch, start_lr, exp_decay)
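With start_lr = 0.001 and exp_decay = 0.1, this gives 0.001 at epoch 0, about 0.00037 at epoch 10 and about 0.00014 at epoch 20.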
Constant then Exponential Decay
The following scheduling function keeps the learning rate at the starting value for the first ten epochs (rampup_epochs) and decreases it exponentially after that.
# Define configuration parameters
start_lr = 0.001
rampup_epochs = 10
exp_decay = 0.1

# Define the scheduling function
def schedule(epoch):
    def lr(epoch, start_lr, rampup_epochs, exp_decay):
        if epoch < rampup_epochs:
            return start_lr
        else:
            return start_lr * math.exp(-exp_decay * epoch)
    return lr(epoch, start_lr, rampup_epochs, exp_decay)
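Note that the decay term uses the absolute epoch index, so the learning rate drops from 0.001 to roughly 0.00037 as soon as the decay phase starts at epoch 10; using epoch - rampup_epochs in the exponent instead would make the transition smooth.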
One Cycle Learning Rate
The following scheduling function gradually increases the learning rate from a starting value up to a maximum value over a number of ramp-up epochs. After that it decreases the learning rate exponentially and stabilises it at a minimum value. This scheduling algorithm is also known as One Cycle Learning Rate (source).
# Define configuration parameters
start_lr = 0.0001
min_lr = 0.00001
max_lr = 0.001
rampup_epochs = 10
sustain_epochs = 0
exp_decay = 0.8

# Define the scheduling function
def schedule(epoch):
    def lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay):
        if epoch < rampup_epochs:
            lr = (max_lr - start_lr) / rampup_epochs * epoch + start_lr
        elif epoch < rampup_epochs + sustain_epochs:
            lr = max_lr
        else:
            lr = (max_lr - min_lr) * exp_decay**(epoch - rampup_epochs - sustain_epochs) + min_lr
        return lr
    return lr(epoch, start_lr, min_lr, max_lr, rampup_epochs, sustain_epochs, exp_decay)
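With the parameters above, the learning rate climbs linearly from 0.0001 to 0.001 over the first 10 epochs and then decays geometrically towards the 0.00001 floor (roughly 0.0008 at epoch 11, 0.00064 at epoch 12, and so on).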
Visualization
The following chart visualizes the learning rate as it is scheduled by each of the previously defined functions.
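As a rough sketch of how such a chart could be produced (assuming the scheduling functions defined above are collected in a dictionary named schedules, which is not part of the original snippets), each schedule can be evaluated per epoch and plotted with matplotlib:
import matplotlib.pyplot as plt

# Hypothetical helper: `schedules` maps a label to one of the scheduling
# functions defined above, e.g. {"constant": schedule, ...}
epochs = range(50)
for name, fn in schedules.items():
    plt.plot(list(epochs), [fn(e) for e in epochs], label=name)
plt.xlabel("epoch")
plt.ylabel("learning rate")
plt.legend()
plt.show()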
Learn more about LearningRateScheduler usage in tf-keras - link