Hi! I have a mixed set of containers (a few, not too many) and bare-metal services (quite a few) and I would like to monitor them.
I am using good old “monit”, which monitors my network interfaces, filesystem status and traditional services (via pid files). It’s not pretty, but it gets the work done. However, I cannot find a way to have it also monitor my containers. Note that I use podman and have a strict one-service, one-user policy (all containers are rootless).
I also run “netdata”, but I find it overwhelming: too much data, too many graphs, just too much for my needs.
I need something that:
- lets me monitor service status
- lets me monitor container status
- lets me restart services or containers (not mandatory, but preferred)
- has a nice web GUI
- has a web GUI that is also mobile friendly (not mandatory, but appreciated)
- can print some history data (not mandatory, but interesting)
- can monitor CPU usage (mandatory)
- can monitor filesystem usage (mandatory)
I don’t care for authentication features, since it will be behind a reverse proxy with HTTPS and proxy authentication already.
I am not looking for a fancy and complex dashboard, but for something I can host on a secondary page that I open if/when I want to check stuff. Also, it would be useful if the tool could be scripted or accessed via an API, so I could write some extractors to print a summary in my own dashboard.
I think Prometheus is a good industry standard. It can do everything you listed except for restarting stuff. It’s got a decent built-in monitoring capability and you can extend it trivially to monitor anything. For example I wrote a 5-liner to monitor ZFS health and another for LVM. I even monitor my routers with it. OpenWrt has an installable node exporter for Prometheus.
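For anyone wondering what such a few-liner can look like, here is a minimal sketch in Python. It is not the commenter's actual script. It assumes you run node exporter with its textfile collector enabled (the `--collector.textfile.directory` flag) and write the output to a `.prom` file in that directory, for example from cron. The metric name `custom_fs_used_ratio` and the function name are made up for illustration.

```python
import shutil


def render_fs_usage(mountpoints, usage_fn=shutil.disk_usage):
    """Return Prometheus text-format lines with the used fraction per mountpoint.

    usage_fn is injectable so the function can be tested without a real disk;
    it must return an object with .total and .used (like shutil.disk_usage).
    """
    lines = [
        "# HELP custom_fs_used_ratio Fraction of filesystem space in use.",
        "# TYPE custom_fs_used_ratio gauge",
    ]
    for mount in mountpoints:
        usage = usage_fn(mount)
        # Avoid dividing by zero on pseudo filesystems that report no size.
        ratio = usage.used / usage.total if usage.total else 0.0
        lines.append(f'custom_fs_used_ratio{{mountpoint="{mount}"}} {ratio:.4f}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect this output to a .prom file in the textfile collector directory.
    print(render_fs_usage(["/"]), end="")
```

The same pattern should work for anything you can turn into a number: print one line per metric and let node exporter pick it up.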
Service restarting is a remote execution capability and generally falls outside of the monitoring domain. You’d be better off implementing that with another process/service manager. If you’re running systemd, that’s one of its primary purposes. You can use it to start/stop/restart containers just like normal processes.
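If you still want a restart button in your own dashboard, a thin wrapper around `systemctl` is probably enough. This is only a sketch: the unit name `myapp.service` is a placeholder, and it assumes the rootless containers are managed as systemd user units, in which case the command has to run as the user that owns the unit.

```python
import subprocess


def restart_command(unit, user_scope=False):
    """Build the systemctl command line that restarts a unit.

    user_scope=True targets the calling user's systemd instance, which is
    where rootless container units would live (an assumption about the setup).
    """
    cmd = ["systemctl"]
    if user_scope:
        cmd.append("--user")
    cmd += ["restart", unit]
    return cmd


def restart(unit, user_scope=False):
    """Run the restart and return systemctl's exit code (0 means success)."""
    return subprocess.run(restart_command(unit, user_scope), check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call restart() when you really mean it.
    print(" ".join(restart_command("myapp.service", user_scope=True)))
```

Passing the command as a list instead of a shell string avoids quoting problems if a unit name ever comes from user input.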
Can you share a guide / tutorial on how to accomplish what OP wants (or just get started with Prometheus)? I was in the same boat as OP and settled for netdata, and eventually gave up on monitoring altogether because it was either overwhelming me with data, too cumbersome to set up or had features behind paid plans.
Anytime you’re asking this, go for the project’s Quick Start / Getting Started doc. In this case here. If you’re on a Debian-based system, Prometheus is already packaged in the repository, so you don’t have to download the latest release. You likely won’t gain anything from the bare binary except the pain of having to set it up as a service with systemd. I followed that doc to set up mine but installed it from apt.
On second thought, if you’re getting it from the repo and it already has a systemd unit defined, it might be more difficult to follow the Getting Started doc. You know what, follow it as-is. Once you have something running and monitoring ad hoc, it’ll be easy to install from apt and put your config in it.
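Once something is running, a quick way to check it (and a starting point for the kind of extractor OP mentioned) is to ask the Prometheus HTTP API for the `up` metric. A minimal sketch, assuming Prometheus listens on its default port 9090 on localhost; the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build the instant-query URL of the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_instant_vector(payload):
    """Map each sample's 'instance' label to its value as a float."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def fetch(base_url, promql, timeout=5):
    """Run the query against a live server and return the parsed result."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))


if __name__ == "__main__":
    # Prints the URL only; call fetch() once your server is up.
    print(query_url("http://localhost:9090", "up"))
```

With a scrape target configured, `fetch("http://localhost:9090", "up")` should return 1.0 for every target Prometheus can reach and 0.0 for those it cannot.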
Give https://github.com/louislam/uptime-kuma a try. I’m planning to do the same for a similar use case. Sensu (sensu.io) is a more sophisticated option, but it requires more infrastructure and there is a bit of a learning curve with it.
While I really like Uptime Kuma, it seems a bit too restricted for OP’s use case. For example, to monitor disk or CPU usage, you would need to write your own scripts. It would be doable, but not very nice.
At least as I understood the question, OP is probably looking for something like Icinga.
Yeah, a better fit, but a bit of trouble to set up… What’s your opinion on Icinga? Never used it myself.
We had it at work, but I never did anything other than receive and resolve alerts. It looked good to me, though, and I liked the system.
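To give an idea of what "write your own scripts" means here, this is a rough sketch of a disk check that reports to an Uptime Kuma push monitor. As far as I know, push monitors expose a URL of the form `/api/push/<token>?status=...&msg=...`, but treat that as an assumption and copy the exact URL from your monitor's settings. The host, the token and the 80% limit are placeholders.

```python
import shutil
import urllib.parse
import urllib.request


def disk_check(path, limit_percent, usage_fn=shutil.disk_usage):
    """Return (ok, message) for the filesystem that contains path."""
    usage = usage_fn(path)
    percent = 100 * usage.used / usage.total if usage.total else 0.0
    return percent < limit_percent, f"{path} {percent:.0f}% used"


def push_url(base_url, token, ok, msg):
    """Build the push URL (format assumed, verify it in the monitor settings)."""
    params = urllib.parse.urlencode({"status": "up" if ok else "down", "msg": msg})
    return f"{base_url.rstrip('/')}/api/push/{token}?{params}"


def report(base_url, token, path="/", limit_percent=80, timeout=5):
    """Run the check and send the result to the push monitor."""
    ok, msg = disk_check(path, limit_percent)
    with urllib.request.urlopen(push_url(base_url, token, ok, msg), timeout=timeout):
        pass
    return ok


if __name__ == "__main__":
    # Shows the local check result only; report() needs a real monitor URL.
    print(disk_check("/", 80))
```

You would run `report(...)` from cron or a systemd timer, once per filesystem, which is exactly the kind of glue that makes this approach less nice than a tool with built-in host metrics.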
Have you tried Cockpit? It has pretty nice Podman integration.
Grafana is pretty annoying to learn and set up, but it does everything you seem to want.
Good ol’ Nagios (or one of its forks).
You can do most if not all of this with CheckMk, but it’s probably overkill.
I use Homarr and enjoy it a lot. It has a nice interface, can monitor not only services but also the server itself, and is quite customizable.
If you go with the dashboard approach, I would suggest Homepage.
I like Homarr because you edit the page with drag and drop. I think with Homepage I would have to edit the config file.
I tried to spin up a Homarr docker container the other day after seeing it on YouTube, but because it’s located in ghcr it just wouldn’t install.
I even added ghcr to my resources in docker using my password and an API key, but still no dice.
I’m missing something obvious, but I’m not sure what, any pointers?
Edit: I’ve just tried again and this time it hasn’t failed with an error message; it’s just hanging in the Portainer stack deployment instead.
Edit 2: I left it hanging and checked while I was out and about (love Tailscale) and it’s working now!
i have a mixed set of containers (a few, not too many) and bare-metal services
Containers run on bare metal. Or are you running them in a vm?
I run containers on bare metal indeed.
I have services running in containers on bare metal and services running without containers, on bare metal.