- cross-posted to:
- linux@lemmy.ml
On Debian 12.1 (6.1.0-11-amd64) running LXD/LXC, setting security.idmap.isolated=true on an unprivileged container
seems to fail to update the owner/group of the container’s files.
Here is an example:
# lxc launch images:debian/12 debian
(...)
# lxc config get debian volatile.idmap.base
296608
# lxc stop debian
Error: The instance is already stopped
# lxc config set debian security.idmap.isolated true
# lxc config get debian security.idmap.isolated
true
# lxc start debian
Now if I list the files on the container volume, they are all owned by the host root
user:
# ls -la /mnt/NVME1/lxd/containers/debian/rootfs/
total 24
drwxr-xr-x 1 root root 154 Sep 5 06:28 .
d--x------ 1 296608 root 78 Sep 5 15:59 ..
lrwxrwxrwx 1 root root 7 Sep 5 06:25 bin -> usr/bin
drwxr-xr-x 1 root root 0 Jul 14 17:00 boot
drwxr-xr-x 1 root root 0 Sep 5 06:28 dev
drwxr-xr-x 1 root root 1570 Sep 5 06:28 etc
I tried multiple versions of LXD/LXC. This happens with 5.0.2 from apt
as well as with 4.0 and 5.17 (latest) from snap.
Interestingly enough, I have another machine on Debian 10 (4.19.0-25-amd64) running an older LXD 4 from snap,
and on that one things work as expected:
# ls -la /mnt/NVME1/lxd/containers/debian/rootfs/
total 0
drwxr-xr-x 1 1065536 1065536 138 Oct 29 2020 .
d--x------ 1 1065536 root 78 Oct 14 2020 ..
drwxr-xr-x 1 1065536 1065536 1328 Jul 24 19:07 bin
drwxr-xr-x 1 1065536 1065536 0 Sep 19 2020 boot
drwxr-xr-x 1 1065536 1065536 0 Oct 14 2020 dev
drwxr-xr-x 1 1065536 1065536 1716 Jul 24 19:08 etc
As you can see, on this system all the files are owned by 1065536:1065536.
Update:
I probed the maps with lxc config show debian
on both machines and saw this:
Machine running Debian 10:
security.idmap.isolated: "true"
(...)
volatile.idmap.base: "1065536"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1065536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1065536,"Nsid":0,"Maprange":65536}]'
Machine running Debian 12:
security.idmap.isolated: "true"
(...)
volatile.idmap.base: "231072"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":231072,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":231072,"Nsid":0,"Maprange":65536}]'
volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":231072,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":231072,"Nsid":0,"Maprange":65536}]'
volatile.last_state.idmap: '[]'
Why is volatile.last_state.idmap left empty ('[]') on the Debian 12 machine?
How can I fix it? Thank you.
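To compare the maps between machines, a small sketch like the one below can pull the host-side base out of a volatile.idmap.* value and print the host ID range it covers. It is only an illustration: the function names idmap_host_bases and idmap_host_range are made up here, and it uses plain grep/cut instead of a JSON parser, so it assumes the compact one-line format shown in the config dumps.

```shell
#!/bin/sh
# List the distinct "Hostid" values found in an LXD idmap JSON string.
# Hypothetical helper; relies on the compact format shown in the post.
idmap_host_bases() {
    printf '%s\n' "$1" | grep -o '"Hostid":[0-9]*' | cut -d: -f2 | sort -un
}

# Print the first and last host ID covered by a base and a range size.
idmap_host_range() {
    base=$1
    size=$2
    echo "$base-$((base + size - 1))"
}

# Value of volatile.idmap.current on the Debian 12 machine (from the post).
current='[{"Isuid":true,"Isgid":false,"Hostid":231072,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":231072,"Nsid":0,"Maprange":65536}]'

idmap_host_bases "$current"      # prints 231072
idmap_host_range 231072 65536    # prints 231072-296607
```

On a live system the string could come from lxc config get debian volatile.idmap.current instead of being pasted in.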
Apparently this is by design: it is a feature of newer kernels. Here is a good explanation by Stéphane Graber, maintainer of LXC:
Prior to VFS idmap being available, we needed to work around file ownership by having LXD manually rewrite the owner of every single file on disk. That’s what you’re showing here on an older kernel.
On newer kernels, this is no longer needed as we can have the kernel keep the permissions on-disk unshifted and just shift in-kernel so the ownership looks correct inside of the container.
What you’re showing above looks like a perfectly working setup on a kernel that does support VFS idmap.
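The difference between the two methods can be sketched with a bit of arithmetic. This is not LXD code, just a rough model under the assumption that a map is one line in the uid_map format "inside-start host-start count"; the helper names to_host_id and on_disk_owner are invented for this example.

```shell
#!/bin/sh
# Translate an ID as seen inside the container to the matching host ID.
# $1 = ID inside the container, $2 = map line "<inside-start> <host-start> <count>".
# Returns 1 and prints nothing if the ID falls outside the mapped range.
to_host_id() {
    id=$1
    set -- $2    # split the map line into $1 $2 $3
    if [ "$id" -ge "$1" ] && [ "$id" -lt $(( $1 + $3 )) ]; then
        echo $(( $2 + $id - $1 ))
    else
        return 1
    fi
}

# Owner that a listing on the host would show for a file, per method.
# "shifted":  old behaviour, LXD rewrote every file to the host ID.
# "idmapped": newer kernels, files stay unshifted on disk and the kernel
#             translates IDs on access through the idmapped mount.
on_disk_owner() {
    case $1 in
        shifted)  to_host_id "$2" "$3" ;;
        idmapped) echo "$2" ;;
        *)        return 2 ;;
    esac
}

on_disk_owner shifted  0 '0 1065536 65536'   # prints 1065536 (Debian 10 listing)
on_disk_owner idmapped 0 '0 231072 65536'    # prints 0 (shows as root on Debian 12)
```

This matches the two ls outputs in the question: the same container root user ends up as 1065536 on disk with the old method and as plain root with VFS idmap.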
I could indeed confirm this on the host machine:
root@vm-debian-12-cli:~# lxc info | grep 'shift\|idmap'
- storage_shifted
idmapped_mounts: "true"
shiftfs: "false"
idmapped_mounts_v2: "true"
And inside containers the root mount point also shows as idmapped (last line):
root@debian:~# cat /proc/self/uid_map
0 231072 65536
root@debian:~# cat /proc/self/gid_map
0 231072 65536
root@debian:~# cat /proc/self/mountinfo
490 460 0:24 /@rootfs/mnt/NVME1/lxd/containers/debian/rootfs / rw,relatime,idmapped shared:251 master:1 - btrfs /dev/sda1 rw,space_cache=v2,user_subvol_rm_allowed,subvolid=259,subvol=/@rootfs/mnt/NVME1/lxd/containers/debian
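If you would rather not eyeball the mountinfo line, the check can be scripted. The sketch below assumes the field layout documented for /proc/[pid]/mountinfo (field 5 is the mount point, field 6 the per-mount options); root_is_idmapped is a name chosen for this example, and the sample line is the one from the container above.

```shell
#!/bin/sh
# Succeeds if the mountinfo text on stdin carries the "idmapped" option
# on the mount whose mount point (field 5) is "/".
root_is_idmapped() {
    awk '$5 == "/" {
             n = split($6, opts, ",")
             for (i = 1; i <= n; i++) if (opts[i] == "idmapped") found = 1
         }
         END { exit(found ? 0 : 1) }'
}

# Sample line copied from the container output in the post.
sample='490 460 0:24 /@rootfs/mnt/NVME1/lxd/containers/debian/rootfs / rw,relatime,idmapped shared:251 master:1 - btrfs /dev/sda1 rw,space_cache=v2,user_subvol_rm_allowed,subvolid=259,subvol=/@rootfs/mnt/NVME1/lxd/containers/debian'

if printf '%s\n' "$sample" | root_is_idmapped; then
    echo "root mount is idmapped"
else
    echo "root mount is not idmapped"
fi
```

Inside a container the same function can read the real file with root_is_idmapped < /proc/self/mountinfo.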
To disable this behaviour, one could do the following:
There is an environment variable that can be passed to LXD by adding an override in its systemd unit. LXD_IDMAPPED_MOUNTS_DISABLE=1
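As a rough sketch of what such an override could look like: the function below writes a systemd drop-in that sets the variable. Only the variable name comes from the quote; the function name, the override.conf file name and the drop-in directory are assumptions, and the right unit depends on how LXD was installed (apt or snap), so the unit name is left as a placeholder.

```shell
#!/bin/sh
# Write a systemd drop-in that sets LXD_IDMAPPED_MOUNTS_DISABLE=1.
# $1 = drop-in directory of the LXD unit (assumption: you know which unit
#      your installation uses; it is deliberately not hard-coded here).
write_lxd_idmap_override() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<'EOF'
[Service]
Environment=LXD_IDMAPPED_MOUNTS_DISABLE=1
EOF
}

# Dry run into a temporary directory so nothing on the system is touched.
tmp=$(mktemp -d)
write_lxd_idmap_override "$tmp/lxd-unit.service.d"
cat "$tmp/lxd-unit.service.d/override.conf"

# On a real host the target would be /etc/systemd/system/<lxd-unit>.service.d,
# followed by "systemctl daemon-reload" and a restart of that unit.
```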
However, according to Mr. Graber, we shouldn’t do that:
Okay, so your system is operating perfectly normally and with the lowest overhead possible right now, nothing to be worried about.
The old pre-start shifting method was very slow and very risky as a crash or failure to shift a particular bit of metadata (ACL, xattr, …) could allow for a security issue with the container. It was also horrible for CoW filesystems as it effectively made it look like every single file in the container had been modified, potentially duplicating GBs of data.
shiftfs (which was an Ubuntu-specific hack) and now the proper VFS idmap shifting, simply have the kernel apply the reverse uidmap/gidmap on any filesystem operation to a mount that’s marked as idmapped. It’s an extremely trivial operation to perform, allows for dynamic changes to the container maps (very useful for isolated), allows for sharing data between containers and properly supports everything that can hold a uid/gid (ioctl, xattr, acl, …) so doing away with the risk of having missed something.