Skip to content

Support recursive read-only bind mounts (kernel >= 5.12) #24229

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
danishprakash opened this issue Oct 10, 2024 · 17 comments · May be fixed by #25680
Open

Support recursive read-only bind mounts (kernel >= 5.12) #24229

danishprakash opened this issue Oct 10, 2024 · 17 comments · May be fixed by #25680
Labels
kind/feature Categorizes issue or PR as related to a new feature. stale-issue

Comments

@danishprakash
Copy link
Contributor

danishprakash commented Oct 10, 2024

With kernel < 5.12, recursive read-only bind mount would set the read-only flag on the top-level mount, not the sub-mounts, as a result, submounts remain rw:

$ mkdir -p mounts
$ sudo mount -t tmpfs tmpfs mounts
$ sudo mkdir -p ./mounts/{foo,bar}
$ sudo mount -t tmpfs tmpfs mounts/foo
$ sudo mount -t tmpfs tmpfs mounts/bar

# only the top-level mount is ro, sub-mounts are rw
$ podman run --rm -it -v ./mounts/:/tmp/mounts:ro alpine cat /proc/self/mountinfo | grep /tmp/mounts
1605 1600 0:114 / /tmp/mounts ro,relatime - tmpfs tmpfs rw,inode64
1606 1605 0:144 / /tmp/mounts/foo rw,relatime - tmpfs tmpfs rw,inode64
1607 1605 0:169 / /tmp/mounts/bar rw,relatime - tmpfs tmpfs rw,inode64

With kernel >= 5.12, it's possible to achieve this with the introduction of mount_setattr[1]. runc and crun supports rro[2] and docker forwards the rro mount option iff the kernel supports it[3].

We can do something similar I believe as part of generateSpec, unless I'm missing something here. Open to implementing this once there's approval.

cc/ @cyphar

podman version
Client:       Podman Engine
Version:      5.2.3
API Version:  5.2.3
Go Version:   go1.23.1
Built:        Tue Sep 24 22:37:25 2024
OS/Arch:      linux/amd64

[1] - torvalds/linux@2a18672
[2] - https://github.com/opencontainers/runc/blob/9112335fb2bb9eb93d1f21c267cd5599d0d199e7/libcontainer/specconv/spec_linux.go#L112
[3] - https://github.com/moby/moby/blob/810c7c1dce5bbf76af4ed5c6bac8c47fead4f6c6/daemon/oci_linux.go#L643-L647

@Luap99
Copy link
Member

Luap99 commented Oct 10, 2024

@giuseppe WDYT?

@giuseppe
Copy link
Member

I think it is a great idea, and crun already supports it.

@danishprakash would you like to work on a patch to add this functionality?

I am not sure if it should be done automatically in the way Docker does, or if it should be an explicit flag like rro. @containers/podman-maintainers opinions?

@mheon
Copy link
Member

mheon commented Oct 10, 2024

It feels like it ought to be the default, but we should also have the ability to turn it off via volume options

@Luap99
Copy link
Member

Luap99 commented Oct 10, 2024

I think the fact that :ro does not apply to child mounts violates POLA and given if one sets ro they likely want the security that this means a container cannot modify the content there so I think it would be a net win if ro is treated as rro on the kernel side.

Of course backwards compatibility is tricky with that behavior. If we need a why out what should we name such a option norro? And then I think we still must support older kernels right so we need some kind of feature checks which kind of suck IMO. I really hate these kernel version checks. I guess we could try a stub mount as well then cache it as file?

@giuseppe Do the oci runtimes crun/runc just hard error if we request rro and the kernel has no support for it?

@giuseppe
Copy link
Member

@giuseppe Do the oci runtimes crun/runc just hard error if we request rro and the kernel has no support for it?

yes, if we request rro and that cannot be honored then the OCI runtimes fail. A fallback to ro would be a security issue, since it loosens the configuration.

I think the fact that :ro does not apply to child mounts violates POLA and given if one sets ro they likely want the security that this means a container cannot modify the content there so I think it would be a net win if ro is treated as rro on the kernel side.

fair. Then we could have a check like containers/storage does for several kernel features and convert ro to rro when supported.

@danishprakash
Copy link
Contributor Author

@danishprakash would you like to work on a patch to add this functionality?

Yes, I can pick this up.


I agree that this should be the default, but given that backwards compatibility is going to be an issue, we can perhaps consider the approach taken by mount(8), they introduced ro=recursive syntax to get out of a similar situation[1]. This might be a way out but the default ro (non-recursive) still remains a security concern imo.

[1] - util-linux/util-linux#1501

@Luap99 Luap99 added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 22, 2024
Copy link

A friendly reminder that this issue had no activity for 30 days.

@giuseppe
Copy link
Member

giuseppe commented Mar 6, 2025

@danishprakash still interested to work on this issue?

@danishprakash
Copy link
Contributor Author

Yes @giuseppe, just got back to it yesterday. Sorry about the long silence on this.

@danishprakash
Copy link
Contributor Author

Do we have a provision for a list of supported feature by the runtime--[crun|runc] features? Of course, there's cmd out, but I feel this might be a useful addition to cache runtime features. One possible way could be to have ContainerRuntime[Config|Features] within SpecGenerator and use it down the stack, but I'm not sure if that's the right place for it.

@giuseppe
Copy link
Member

we could cache the result of $OCI_RUNTIME features in the rundir

@danishprakash
Copy link
Contributor Author

How about fetching and caching the features as part of newConmonOCIRuntime() before returning an OCIRuntime unmarshalled onto opencontainers/runtime-spec/specs-go/features? That's similar to how docker does it and sounds like the right place to plug this into.

@giuseppe
Copy link
Member

yes, that seems like a good place to me

@danishprakash
Copy link
Contributor Author

Storing features in the runtime is fine, but using that while generating the spec isn't as straightforward given that we don't have access to the runtime while we create the spec. Possibilities include:

  • pass in the runtime but that seems wrong
  • request the default runtime again
  • simply figure out the runtime of choice from within specgen and parse it locally to figure out whether the runtime supports a certain feature--rro in this case--or not.

Ideally, I would have loved for the runtime to be available while parsing the spec so as to be able to appropriately generate the spec. Docker does this currently and I believe this would also be extensible in the future if we need to consult with the runtime while we creating the spec. Thoughts?

@giuseppe
Copy link
Member

  • pass in the runtime but that seems wrong

Given the alternatives, I'd do this. Why do you think it is wrong?

@containers/podman-maintainers any other suggestions?

@Luap99
Copy link
Member

Luap99 commented Mar 21, 2025

We don't have a daemon to store a cache and even if we only do it once on first start and write it to the rundir we then would have to always read an unmarshal it (likely not to noticeable but still some overhead). But the IMO worse problem with a cache like this is lifetime. So far we are using this to perform some kernel feature checks which makes sense to have them on tmpfs as a new boot could mean new kernel. However the oci runtime on the other hand might get updated without a reboot. As such we then cache the old features even when a newer runtime with the features might be in use.

Also we support many different runtimes via --runtime so would we not have to cache the features for every single one of them? Doing this properly will get much more complicated.


Kernel 5.12 is quite old now, could be just set this as baseline for our support an always add rro? What oci runtime versions started to support rro? AFAIK it doesn't only depend on the kernel version as the runtimes must set it via mount_setattr()?
People who can update podman should also be able to update its dependencies such as runtime and kernel.


As far as specgen generation goes I think it would be wrong to make decisions based on the runtime.
We support stuff like podman system migrate --new-runtime to update all existing containers to a new runtime. That of course is a bit questionable in itself with how we do that but if the spec changes based on the runtime then we can later not migrate and assume it still works. (I know assuming that it just works today is also be a bit of a stretch)

If we were to add runtime specific stuff to the spec then IMO it should be part of generateSpec() in libpod and never pkg/specgen. @mheon WDYT?

@mheon
Copy link
Member

mheon commented Mar 21, 2025

I agree with @Luap99 - this doesn't sound like a Specgen thing, but a generateSpec() thing. Features that are runtime-dependent should be added at that point in the process, where we are absolutely sure what runtime is in use, what version of the runtime is in use, and what kernel version is in use. We have access to the completed OCI runtime at that point, so if we cache the features in the runtime they are easily accessible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. stale-issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants