This repository has been archived on 2026-04-18. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Files
server-config/services/intel-gpu-collector.py
Simon Gardling 0235617627 monitoring: fix intel-gpu-collector crash resilience
Wrap entire read_one_sample() in try/except to handle all failures
(missing binary, permission errors, malformed JSON, timeouts).
Write zero-valued metrics on failure instead of exiting non-zero.
Increase timeout from 5s to 8s for slower GPU initialization.
2026-04-02 17:43:13 -04:00

3.2 KiB