This has pained me in the past because hardly any information was around. TensorFlow Serving can export Prometheus metrics. However, figuring out how to write the monitoring config file took far more effort than the length of the file would suggest:
prometheus_config {
  enable: true,
  path: "/metrics"
}
If you’re running many different TensorFlow Serving servers, you can put this file in a ConfigMap shared by all of them. In that case, part of your deployment manifest could look like:
containers:
- image: tensorflow/serving:1.14.0
  args:
  - --port=8500
  - --rest_api_port=8501
  - --model_config_file=/models/serving.config
  - --monitoring_config_file=/etc/monitoring.config
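For the shared ConfigMap itself, here is a rough sketch of how it might look. The ConfigMap name `tfserving-monitoring` and the volume name `monitoring-config` are placeholders I made up for illustration, and I'm assuming the file content uses the `prometheus_config` wrapper from TensorFlow Serving's monitoring config format:

```yaml
# Hypothetical ConfigMap holding the monitoring config file.
# The name "tfserving-monitoring" is an assumption; use whatever fits your cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tfserving-monitoring
data:
  monitoring.config: |
    prometheus_config {
      enable: true,
      path: "/metrics"
    }
```

Each deployment would then mount that single key at the path passed to `--monitoring_config_file`. Again, this is only a fragment of the pod spec, not a complete manifest:

```yaml
# Pod spec fragment: mount the ConfigMap key as /etc/monitoring.config.
containers:
- image: tensorflow/serving:1.14.0
  volumeMounts:
  - name: monitoring-config          # must match the volume name below
    mountPath: /etc/monitoring.config
    subPath: monitoring.config       # mount just this key as a single file
    readOnly: true
volumes:
- name: monitoring-config
  configMap:
    name: tfserving-monitoring       # the hypothetical ConfigMap above
```

With this in place, the metrics should be served over the REST API port (8501 in the manifest above) under the configured path, so that is where Prometheus would need to scrape. One thing to keep in mind: a file mounted via `subPath` is not refreshed when the ConfigMap changes, so the pods need a restart to pick up edits.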