Resource Management with Pruning
Similar to quantization, which we saw earlier, pruning is another useful technique that can be leveraged to reduce model size and complexity, resulting in better latency and reduced inference cost.
Note: Quantization and pruning are not mutually exclusive; we can combine both to get additional benefits and unlock further performance improvements.
The TensorFlow Model Optimization Toolkit also provides support for several pruning techniques.
Install the toolkit with pip:
$ pip install tensorflow_model_optimization
Import it as tfmot:
import tensorflow_model_optimization as tfmot
Now we can use the pruning functionality for Keras, which is available under tfmot.sparsity.keras.
We can prune a model during training by wrapping it with prune_low_magnitude, like this:
import tensorflow_model_optimization as tfmot

# Ramp the target sparsity from 50% up to 80% between steps 2000 and 4000.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.5, final_sparsity=0.8, begin_step=2000, end_step=4000)

model = create_model()
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)
...
# The UpdatePruningStep callback keeps the pruning step counter in sync.
model_for_pruning.fit(..., callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
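To make the mechanics concrete, here is a minimal, dependency-free sketch of what low-magnitude pruning with a polynomial schedule does conceptually. This is not the tfmot implementation; the function names and the cubic ramp (power=3, mirroring PolynomialDecay's default) are illustrative assumptions.

def polynomial_sparsity(step, begin_step, end_step,
                        initial_sparsity, final_sparsity, power=3):
    # Target sparsity at a given training step, ramping from the initial
    # to the final value between begin_step and end_step (sketch only).
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** power

def prune_low_magnitude_weights(weights, sparsity):
    # Zero out the smallest-magnitude weights until the target sparsity is hit.
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
target = polynomial_sparsity(step=4000, begin_step=2000, end_step=4000,
                             initial_sparsity=0.5, final_sparsity=0.8)
print(target)                                   # 0.8 at end_step
print(prune_low_magnitude_weights(weights, target))  # [0.9, 0.0, 0.0, 0.0, -0.7, 0.0]

Note how only the largest-magnitude weights survive; the resulting zeros are what compression and sparse inference runtimes can exploit.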
Learn more about pruning in tf.keras with the following resources:
- Optimal Brain Damage (LeCun et al.): http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (arXiv:1803.03635)