Defining benchmarks with decorators
To benchmark your machine learning code in nnbench, define your key metrics in Python functions and apply one of the provided decorators.
The available decorators are:
- @nnbench.benchmark, which runs a single benchmark with the supplied parameters,
- @nnbench.parametrize, which runs several benchmarks with the supplied parameter configurations,
- @nnbench.product, which runs benchmarks for all parameter combinations that arise from the supplied values.
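As a preview, here is a minimal sketch of how the three decorators are applied; the function names and parameter values are placeholders, and full, runnable examples follow below.

import nnbench

@nnbench.benchmark()
def accuracy_single(n_estimators: int, max_depth: int) -> float:
    ...  # one benchmark; parameter values are supplied at run time

@nnbench.parametrize(({"n_estimators": 10}, {"n_estimators": 100}))
def accuracy_parametrized(n_estimators: int, max_depth: int) -> float:
    ...  # one benchmark per supplied configuration

@nnbench.product(n_estimators=[10, 100], max_depth=[2, 5])
def accuracy_grid(n_estimators: int, max_depth: int) -> float:
    ...  # one benchmark per combination (here 2 x 2 = 4)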
First, we introduce a small machine learning example which we will use throughout to motivate the three benchmark decorators.
We recommend splitting the model training, the benchmark definitions, and the benchmark run into separate files. In this guide, these are called training.py, benchmarks.py, and main.py.
Example
Let us consider an example where we want to evaluate a scikit-learn random forest classifier on the Iris dataset.
For this purpose, we define several helper functions in a file called training.py: prepare_data() loads and splits the dataset, train_rf() trains a random forest model with the specified parameters, and accuracy() calculates the accuracy of the supplied model on the given test data.
# training.py
import numpy as np
from sklearn import base, metrics
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def prepare_data() -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    data = load_iris()
    X, y = data.data, data.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    return X_train, X_test, y_train, y_test


def train_rf(X_train: np.ndarray, y_train: np.ndarray, n_estimators: int, max_depth: int, random_state: int = 42) -> RandomForestClassifier:
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=random_state)
    model.fit(X_train, y_train)
    return model


def accuracy(model: base.BaseEstimator, X_test: np.ndarray, y_test: np.ndarray) -> float:
    y_pred = model.predict(X_test)
    return metrics.accuracy_score(y_test, y_pred)
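If you want to sanity-check these helpers before wiring them into benchmarks, a quick standalone run (not part of the guide's files; the chosen hyperparameters are arbitrary) could look like this:

from training import prepare_data, train_rf, accuracy

# train a small forest and print its test accuracy
X_train, X_test, y_train, y_test = prepare_data()
model = train_rf(X_train, y_train, n_estimators=10, max_depth=2)
print(accuracy(model=model, X_test=X_test, y_test=y_test))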
@benchmark for single benchmarks
Now, we define our benchmarks in a new file called benchmarks.py.
We first encapsulate the benchmark logic in a function, benchmark_accuracy(), which prepares the data, trains a classifier, and lastly obtains the accuracy.
To mark this function as a benchmark, we apply the @nnbench.benchmark decorator.
# benchmarks.py
import nnbench

from training import prepare_data, train_rf, accuracy


@nnbench.benchmark()
def benchmark_accuracy(n_estimators: int, max_depth: int, random_state: int) -> float:
    X_train, X_test, y_train, y_test = prepare_data()
    rf = train_rf(X_train=X_train, y_train=y_train, n_estimators=n_estimators,
                  max_depth=max_depth, random_state=random_state)
    acc = accuracy(model=rf, X_test=X_test, y_test=y_test)
    return acc
Warning
This training benchmark is designed as a local, simple, and self-contained example to showcase nnbench. In a real-world scenario, to follow best practices, you may want to separate the data preparation and model training steps from the benchmarking logic and pass the resulting artifacts as parameters to the benchmark. See the user guide for more information.
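As a rough illustration of that idea only (a hypothetical sketch; the parameter names are made up, and actual artifact handling is covered in the user guide), such a benchmark could accept a pre-trained model and the held-out data directly, with those objects supplied through the runner's params argument:

# hypothetical sketch: model and test split are produced elsewhere
import nnbench
import numpy as np
from sklearn.ensemble import RandomForestClassifier

from training import accuracy


@nnbench.benchmark()
def benchmark_pretrained_accuracy(model: RandomForestClassifier, X_test: np.ndarray, y_test: np.ndarray) -> float:
    # model, X_test, and y_test would be passed in via the runner's params
    return accuracy(model=model, X_test=X_test, y_test=y_test)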
Lastly, we set up a benchmark runner in main.py. There, we supply the parameters required by the benchmark function (n_estimators, max_depth, random_state) as a dictionary to the params keyword argument.
# main.py
import nnbench
r = nnbench.BenchmarkRunner()
reporter = nnbench.BenchmarkReporter()
result = r.run("benchmarks.py", params={"n_estimators": 100, "max_depth": 5, "random_state": 42})
reporter.display(result)
When we execute main.py, nnbench prints a table containing the benchmark name and the resulting accuracy value, analogous to the outputs shown below.
@nnbench.parametrize for multiple configuration benchmarks
Sometimes, we are not only interested in the performance of a model for given parameters but want to compare the performance across different configurations.
To achieve this, we can turn our single accuracy benchmark in the benchmarks.py file into a parametrized benchmark.
To do this, replace the decorator with @nnbench.parametrize and supply the parameter combinations of choice as dictionaries in the first argument.
# benchmarks.py
import nnbench

from training import prepare_data, train_rf, accuracy


@nnbench.parametrize(
    (
        {"n_estimators": 10, "max_depth": 2},
        {"n_estimators": 50, "max_depth": 5},
        {"n_estimators": 100, "max_depth": 10},
    )
)
def benchmark_accuracy(n_estimators: int, max_depth: int, random_state: int) -> float:
    X_train, X_test, y_train, y_test = prepare_data()
    rf = train_rf(X_train=X_train, y_train=y_train, n_estimators=n_estimators,
                  max_depth=max_depth, random_state=random_state)
    acc = accuracy(model=rf, X_test=X_test, y_test=y_test)
    return acc
Notice that the parametrization is still incomplete, as we did not supply a random_state argument.
The unfilled arguments are given to BenchmarkRunner.run() via a dictionary passed as the params keyword argument.
# main.py
import nnbench
r = nnbench.BenchmarkRunner()
reporter = nnbench.BenchmarkReporter()
result = r.run("benchmarks.py", params={"random_state": 42})
reporter.display(result)
Executing the parametrized benchmark, we get an output similar to this:
python main.py
name value
------------------------------------------------ --------
benchmark_accuracy_n_estimators=10_max_depth=2 0.955556
benchmark_accuracy_n_estimators=50_max_depth=5 0.866667
benchmark_accuracy_n_estimators=100_max_depth=10 0.911111
@nnbench.product for benchmarks over parameter configuration grids
In case we want to run a benchmark scan over all possible combinations of a set of parameters, we can use the @nnbench.product decorator and supply the different values for each parameter.
# benchmarks.py
import nnbench

from training import prepare_data, train_rf, accuracy


@nnbench.product(n_estimators=[10, 50, 100], max_depth=[2, 5, 10])
def benchmark_accuracy_product(n_estimators: int, max_depth: int, random_state: int) -> float:
    X_train, X_test, y_train, y_test = prepare_data()
    rf = train_rf(X_train=X_train, y_train=y_train, n_estimators=n_estimators,
                  max_depth=max_depth, random_state=random_state)
    acc = accuracy(model=rf, X_test=X_test, y_test=y_test)
    return acc
We still provide the random_state parameter to the runner directly, like we did with the @nnbench.parametrize decorator.
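Concretely, the runner script can stay exactly the same as in the parametrized example above:

# main.py (unchanged from the parametrized example)
import nnbench

r = nnbench.BenchmarkRunner()
reporter = nnbench.BenchmarkReporter()
result = r.run("benchmarks.py", params={"random_state": 42})
reporter.display(result)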
By executing the benchmark, we get results for all nine combinations (3 × 3) of n_estimators and max_depth.
The output looks similar to this:
python main.py
name value
------------------------------------------------ --------
benchmark_accuracy_n_estimators=10_max_depth=2 0.933333
benchmark_accuracy_n_estimators=10_max_depth=5 0.955556
benchmark_accuracy_n_estimators=10_max_depth=10 0.977778
benchmark_accuracy_n_estimators=50_max_depth=2 0.933333
benchmark_accuracy_n_estimators=50_max_depth=5 0.911111
benchmark_accuracy_n_estimators=50_max_depth=10 0.977778
benchmark_accuracy_n_estimators=100_max_depth=2 0.933333
benchmark_accuracy_n_estimators=100_max_depth=5 0.955556
benchmark_accuracy_n_estimators=100_max_depth=10 0.955556