
Model Serving

Compliance Info

Below we map the engineering practice to the articles of the AI Act that benefit from following it.

  • Art. 15 (Accuracy, Robustness and Cybersecurity), in particular:
    • Art. 15(4): model serving makes it possible to run inference on models remotely and to deploy models with redundancy, making them more resilient.

Motivation

Model serving is the process of deploying machine learning models to production environments, where they can be accessed and used by applications or users.

Not every AI system involves a real-time inference component, for example batch processing systems or offline analytics. However, for those that do, model serving is a critical part of the machine learning lifecycle.

Model serving needs to be designed to ensure that the deployed models are accurate, robust, and secure.

Implementation Notes

Containerization is a common practice for model serving, as it allows for the deployment of models in isolated environments, ensuring consistency and reproducibility. It also enables the use of different versions of models and dependencies without conflicts.

The models to be deployed should be obtained from a model registry, in order to preserve their traceability and reproducibility. A model serving system should be able to expose metadata about the model and its provenance, so that this information can be associated with every prediction (see the page on inference logs).
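The idea of attaching registry provenance to every prediction can be sketched as follows. This is a minimal illustration, not a specific framework's API: the metadata fields, model name, and registry URI are hypothetical, and a real deployment would load the model artifact from the registry rather than use a stand-in function.

```python
from dataclasses import dataclass, asdict
from typing import Any, Callable


@dataclass
class ModelMetadata:
    # Illustrative provenance fields; the exact schema depends on the
    # model registry in use (hypothetical, not a standard).
    name: str
    version: str
    registry_uri: str


class ServingWrapper:
    """Wraps a predict function so every response carries model provenance."""

    def __init__(self, predict_fn: Callable, metadata: ModelMetadata):
        self._predict_fn = predict_fn
        self.metadata = metadata

    def predict(self, inputs: Any) -> dict:
        # Attach registry metadata to each prediction so downstream
        # inference logs can trace every output to a model version.
        return {
            "model": asdict(self.metadata),
            "outputs": self._predict_fn(inputs),
        }


# Usage with a stand-in model; a real server would load the artifact
# referenced by registry_uri instead.
meta = ModelMetadata(
    name="churn-classifier",  # hypothetical model name
    version="3",
    registry_uri="models:/churn-classifier/3",
)
server = ServingWrapper(lambda xs: [x * 2 for x in xs], meta)
response = server.predict([1, 2, 3])
```

Keeping the metadata in the response itself, rather than only in server-side logs, means the provenance travels with the prediction to any downstream consumer.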

Inference API Design

Real-time inference APIs provide the interface for applications to interact with the deployed models.

Designing an inference API involves a trade-off between flexibility and usability.

While a bespoke API can be designed for a single model (which might be tied to a specific input data schema), a more generic API can be designed to support multiple models and input data schemas.

One such generic API is the Open Inference Protocol, which is supported by several model serving frameworks. It specifies API endpoints and type definitions for inference requests (with a flexible tensor data schema), model metadata, and model management.
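The flexible data schema of the protocol's inference request can be sketched as follows: each input tensor carries its own name, shape, datatype, and flattened data, so the same endpoint shape works for any model. The tensor name and model here are hypothetical, and the snippet only builds the JSON body; a real client would POST it to a compliant server's `/v2/models/{model_name}/infer` endpoint.

```python
import json


def build_infer_request(tensor_name: str, shape: list, datatype: str, data: list) -> dict:
    """Build an Open Inference Protocol (v2) inference request body.

    Each entry in "inputs" describes one tensor: its name, shape,
    datatype, and data flattened in row-major order.
    """
    return {
        "inputs": [
            {
                "name": tensor_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,
            }
        ]
    }


# A request for a hypothetical model taking one 2x2 FP32 tensor.
body = build_infer_request("input-0", [2, 2], "FP32", [1.0, 2.0, 3.0, 4.0])
payload = json.dumps(body)  # serialized JSON, ready to POST
```

Because the schema is self-describing, a generic serving layer can validate and route requests without any model-specific API code.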

Key Technologies

Legal Disclaimer

The information provided on this website is for informational purposes only and does not constitute legal advice. The tools, practices, and mappings presented here reflect our interpretation of the EU AI Act and are intended to support understanding and implementation of trustworthy AI principles. Following this guidance does not guarantee compliance with the EU AI Act or any other legal or regulatory framework. We are not affiliated with, nor do we endorse, any of the tools listed on this website.