Flox hangs during activation

On the newest version of flox at time of writing (1.3.17) on Fedora 41, I run into an issue where flox occasionally hangs during activation. I’m having trouble tracking down the cause so I’ll list some symptoms and facts about my setup and hopefully that will be helpful.

  • I’ve installed flox declaratively through the Nix/Generic setup via a home-manager flake.
  • I’m running Determinate Nix
  • The environment I’m activating can resolve and build, and it activates sometimes. Usually the first try or two is fine, then later attempts that day hang, though not always. For instance, while writing this post I tried adding --mode=run to triage: the first activation with it succeeded, and the second and every later one failed.
  • I only have this issue with one of my flox environments. It defines dev packages for a few languages, some variables, and some services.
  • After the issue appears, I have tried removing various parts of the flox config from the manifest as part of triage, but that has not resolved the issue.

Given flox activate --start-services -vv, this is where it hangs:

+ declare -r _flox_shell=/bin/bash
+ unset FLOX_SHELL
+ case "$_flox_shell" in
++ /nix/store/jzwgipqqh04hawnma4z54g6hnzyxh314-flox-activations-1.3.17/bin/flox-activations start-or-attach --runtime-dir /run/user/1000 --pid 294327 --flox-env /home/userName/project/path/.flox/run/x86_64-linux.projectName.dev --store-path /nix/store/5x1lh4q4ybyv2cjaijdnq5b3swacllly-environment-develop

Interestingly enough, if I run sudo /home/userName/.nix-profile/bin/flox activate the flox environment starts correctly.

However, if I run sudo /home/userName/.nix-profile/bin/flox activate --start-services the flox environment still hangs, but at a different place.

2025-04-08T19:44:45.400380Z DEBUG flox_rust_sdk::models::environment: detected concrete environment type: path
2025-04-08T19:44:45.400561Z DEBUG flox_rust_sdk::providers::services: running process-compose process list cmd=env NO_COLOR=1 PATH=/nix/store/gnf4wjrn5g17gzch44ycvb83ci4s8a6z-process-compose-1.40.1/bin/process-compose /nix/store/gnf4wjrn5g17gzch44ycvb83ci4s8a6z-process-compose-1.40.1/bin/process-compose --unix-socket /run/user/0/flox.0b6c17ad.sock process list --output json

This makes me think one of my services may be causing the issue. The only “unusual” service I added to my manifest recently is an ssh tunnel that I launch on environment activation:

[services.tunnel]
command = "ssh -N -L port:host:port host"
is-daemon = true
shutdown.command = "killall ssh"

Could that be causing an issue?

Is this possibly a permissions issue? What additional steps might I try to further triage or mitigate this issue?

Hi,

thanks for the report, we will have a look at this.
Do you have any more information about the environment, e.g.

  • can you share more of the manifest.toml (if not sensitive)?
  • there may be further logs in .flox/log/watchdog.*
  • are you usually activating environments with multiple users, i.e. root and your own user?
  • have you triaged the issue including removing the tunnel service?

Hello!

I have tried removing the tunnel service. This morning for instance I booted fresh, commented out the tunnel service, ran flox activate --start-services, and experienced the issue.

.flox/log/watchdog output:
2025-04-09T14:32:53.837309Z DEBUG flox_watchdog::logger: still watching, woof woof

I am not usually activating environments with multiple users. Using root yesterday was just a triage step as I was exploring possibilities.

Here is my sanitized manifest:

version = 1
[install]
ant.pkg-path = "ant"
java = { flake = "sanitized-flake-ref" }
nodejs-sanitized.pkg-path = "sanitized"
checkstyle.pkg-path = "checkstyle"
perl.pkg-path = "perl"
gradle.pkg-path = "gradle"

[vars]
SANITIZED_USERSPACE_PATH1_TO_PROJECT=""
SANITIZED_USERSPACE_PATH2_TO_ANOTHER_PROJECT=""

[hook]

[profile]

[services]
[services.tunnel]
command = "ssh -N -L port:host:port host"
is-daemon = true
shutdown.command = "killall ssh"
[services.a-service]
command = "podman run --rm -d --name service sanitized"
is-daemon = true
shutdown.command = "podman kill service"

[options]
systems = [
    "x86_64-linux",
]


A few things that can cause a hang:

  • The flake ref might trigger a build on any machine where the environment is being activated for the first time.
  • The ssh service might be interactively asking for something (a password, a known_hosts confirmation, YubiKey presses, gpg_askpass, etc.). Adding some verbosity to the ssh call would help verify.
  • podman is also known to sometimes ask for input.
  • Lingering services that did not get killed (podman’s engine, or SSH multiplexing/ControlMaster?).

Anything in flox services logs tunnel or flox services logs a-service?

My apologies for the late response.
flox services logs tunnel returns nothing and flox services logs --follow only shows the podman containers successfully starting.

I do think it’s an ssh issue though. I can reproduce the issue, run killall ssh and then the issue is resolved.

Thank you for your help! I’m not sure what to do to best provide this functionality (the ssh tunnel) in the context of Flox. Do you have a recommendation?

I’d recommend looking at something like autossh to keep this tunnel alive. I agree that this seems to be ssh-related, hence my guess that it has to do with an interactive prompt or something SSH_ASKPASS-related.

The sudo comment above also makes me think there may be a perms/auth issue, which would cause prompting to take place.
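
As a rough sketch of how the suggestions in this thread could be applied to the manifest (not a confirmed fix), the tunnel service could tell ssh to fail instead of prompting, and could shut down only its own process. BatchMode and ExitOnForwardFailure are standard OpenSSH options. port:host:port and host are the placeholders from the manifest above, and the pkill pattern is an assumption that has to match the real command line.

```toml
[services.tunnel]
# BatchMode=yes: ssh exits with an error instead of waiting for a password,
# passphrase, or host-key prompt that nobody can answer.
# ExitOnForwardFailure=yes: ssh exits if the local port cannot be bound,
# for example because an earlier tunnel is still holding it.
command = "ssh -o BatchMode=yes -o ExitOnForwardFailure=yes -N -L port:host:port host"
is-daemon = true
# Assumption: stop only this tunnel rather than every ssh process on the machine.
shutdown.command = "pkill -f 'ssh .*-L port:host:port'"
```

With these options, a prompt or a port conflict should surface as an error in flox services logs tunnel rather than as a silent hang, and temporarily adding -v to the ssh command gives more detail. Since ssh -N without -f stays in the foreground, it may also be worth checking the Flox documentation to see whether is-daemon is needed for this service at all.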
