Use line_profiler to profile your Python code
If your program is slow, then before anything else it is important to identify the bottleneck, i.e. where most of the overhead is coming from. Locating the overhead goes a long way in determining what to do next to improve performance.
In Python, we can use the line_profiler package to identify where exactly most of the time is spent.
pip install line-profiler
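Note that, besides the programmatic API used below, line_profiler also installs a kernprof command: decorate the functions you want to profile with @profile (kernprof injects this decorator at run time) and run the script through kernprof. A minimal sketch of that alternative workflow, using a hypothetical script.py:
# script.py
@profile                      # injected by kernprof at run time, no import needed
def train():
    total = 0
    for i in range(1_000_000):
        total += i
    return total

train()
Then run it with:
kernprof -l -v script.py      # -l: line-by-line stats, -v: print them when the run ends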
As an example, we will profile sklearn's fit method, which is used to train a linear regression model.
First, let’s create some toy data for training
from sklearn.datasets import make_regression
X, y = make_regression(random_state=13)
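For reference, with scikit-learn's documented defaults make_regression generates a small dense problem of 100 samples and 100 features, which is why the whole fit call in the output below takes only about 20 ms:
print(X.shape, y.shape)  # (100, 100) (100,)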
Second, create the model to train
from sklearn.linear_model import LinearRegression
est = LinearRegression()
Finally, initialize the LineProfiler by wrapping the fit method of the LinearRegression model, then run the profiler and print its stats.
from line_profiler import LineProfiler
lp = LineProfiler(est.fit)
print("Run on a single row")
lp.run("est.fit(X, y)")
lp.print_stats()
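The same LineProfiler instance can also record additional functions reached during the run via add_function. For example, assuming the scikit-learn version profiled here (where fit validates its inputs with check_X_y), we could drill into the validation step as well; a sketch:
from sklearn.utils import check_X_y

lp = LineProfiler(est.fit)
lp.add_function(check_X_y)   # also collect line timings inside the validation helper
lp.run("est.fit(X, y)")
lp.print_stats()             # prints a table for fit and another for check_X_y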
The output of the profiling is very detailed, with timing information and the number of hits for every line of code in the profiled function. In the case of the fit method, it looks like this:
Run on a single row
Timer unit: 1e-06 s
Total time: 0.022127 s
File: /usr/local/lib/python3.6/dist-packages/sklearn/linear_model/_base.py
Function: fit at line 467
Line # Hits Time Per Hit % Time Line Contents
==============================================================
467 def fit(self, X, y, sample_weight=None):
468 """
469 Fit linear model.
470
471 Parameters
472 ----------
473 X : {array-like, sparse matrix} of shape (n_samples, n_features)
474 Training data
475
476 y : array-like of shape (n_samples,) or (n_samples, n_targets)
477 Target values. Will be cast to X's dtype if necessary
478
479 sample_weight : array-like of shape (n_samples,), default=None
480 Individual weights for each sample
481
482 .. versionadded:: 0.17
483 parameter *sample_weight* support to LinearRegression.
484
485 Returns
486 -------
487 self : returns an instance of self.
488 """
489
490 1 10.0 10.0 0.0 n_jobs_ = self.n_jobs
491 1 4.0 4.0 0.0 X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
492 1 1515.0 1515.0 6.8 y_numeric=True, multi_output=True)
493
494 1 4.0 4.0 0.0 if sample_weight is not None:
495 sample_weight = _check_sample_weight(sample_weight, X,
496 dtype=X.dtype)
497
498 1 5.0 5.0 0.0 X, y, X_offset, y_offset, X_scale = self._preprocess_data(
499 1 4.0 4.0 0.0 X, y, fit_intercept=self.fit_intercept, normalize=self.normalize,
500 1 3.0 3.0 0.0 copy=self.copy_X, sample_weight=sample_weight,
501 1 1229.0 1229.0 5.6 return_mean=True)
502
503 1 4.0 4.0 0.0 if sample_weight is not None:
504 # Sample weight can be implemented via a simple rescaling.
505 X, y = _rescale_data(X, y, sample_weight)
506
507 1 7.0 7.0 0.0 if sp.issparse(X):
508 X_offset_scale = X_offset / X_scale
509
510 def matvec(b):
511 return X.dot(b) - b.dot(X_offset_scale)
512
513 def rmatvec(b):
514 return X.T.dot(b) - X_offset_scale * np.sum(b)
515
516 X_centered = sparse.linalg.LinearOperator(shape=X.shape,
517 matvec=matvec,
518 rmatvec=rmatvec)
519
520 if y.ndim < 2:
521 out = sparse_lsqr(X_centered, y)
522 self.coef_ = out[0]
523 self._residues = out[3]
524 else:
525 # sparse_lstsq cannot handle y with shape (M, K)
526 outs = Parallel(n_jobs=n_jobs_)(
527 delayed(sparse_lsqr)(X_centered, y[:, j].ravel())
528 for j in range(y.shape[1]))
529 self.coef_ = np.vstack([out[0] for out in outs])
530 self._residues = np.vstack([out[3] for out in outs])
531 else:
532 self.coef_, self._residues, self.rank_, self.singular_ = \
533 1 19208.0 19208.0 86.8 linalg.lstsq(X, y)
534 1 8.0 8.0 0.0 self.coef_ = self.coef_.T
535
536 1 5.0 5.0 0.0 if y.ndim == 1:
537 1 42.0 42.0 0.2 self.coef_ = np.ravel(self.coef_)
538 1 76.0 76.0 0.3 self._set_intercept(X_offset, y_offset, X_scale)
539 1 3.0 3.0 0.0 return self
With such output it is easy to locate where most of the time was spent. In this case, a considerable amount of time (about 7%) goes into the safety checks that sklearn usually performs, for instance to avoid divisions by zero:
491 1 4.0 4.0 0.0 X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
492 1 1515.0 1515.0 6.8 y_numeric=True, multi_output=True)
Another bottleneck (about 6% of the time) happens in the data preprocessing:
498 1 5.0 5.0 0.0 X, y, X_offset, y_offset, X_scale = self._preprocess_data(
499 1 4.0 4.0 0.0 X, y, fit_intercept=self.fit_intercept, normalize=self.normalize,
500 1 3.0 3.0 0.0 copy=self.copy_X, sample_weight=sample_weight,
501 1 1229.0 1229.0 5.6 return_mean=True)
Finally, we can see that most of the time (about 87%) was spent in the actual training, where sklearn delegates the solve to scipy's linalg.lstsq:
532 self.coef_, self._residues, self.rank_, self.singular_ = \
533 1 19208.0 19208.0 86.8 linalg.lstsq(X, y)
In this toy profiling example, we have identified the overhead inside sklearn's fit, beyond the solve itself, as coming from safety checks and data preprocessing. This means that we can go faster, when we know the data is clean and does not need checks or normalization, by skipping those two steps and calling a least-squares solver such as numpy's lstsq directly.
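As a rough sketch of that idea (not sklearn's implementation; it assumes dense, well-behaved data and only replicates the default fit_intercept=True path), the same model can be obtained by centering the data and calling numpy's lstsq directly:
import numpy as np

# Center features and target ourselves, mirroring what _preprocess_data does
X_offset, y_offset = X.mean(axis=0), y.mean()
coef, _, _, _ = np.linalg.lstsq(X - X_offset, y - y_offset, rcond=None)
intercept = y_offset - X_offset @ coef

# Should match est.coef_ and est.intercept_ up to numerical precision
y_pred = X @ coef + intercept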