CUDA, glibc, GCC, Poetry, and Descending One Layer of Abstraction

Hey! TLDR, I am having a hard time getting Flox to reconcile a few dependencies and have been going back and forth on it enough that I am wondering if I’m “missing the point” of the whole Flox / Nix ecosystem.

I started using flox maybe 4 or 5 months ago now, and it’s been pretty useful. However, for context, I’m quite “new”–I’ve been programming for <3 years, have really only started shipping stuff to a production environment in the last year, and have only done so as a solo dev without supervision in the last 6 months. Suffice it to say, I’ve never touched nix directly before and don’t really understand how it works.

I’m using a Poetry environment nested inside of Flox, and it’s been working quite well. When it came time to add some GPU acceleration to my project via PyTorch, I had to get CUDA working. After much pain, I can use nvcc to compile a .cu file in Flox, and PyTorch works on my machine in a Poetry environment outside of Flox. However, it’s not working inside Flox.

In the Flox env, I dropped down from torch to pycuda and found that, for some reason, Flox was pointing to the 32-bit CUDA libraries. When I forced it to point to the 64-bit ones, an issue I kept having with GCC and glibc popped back up: Flox seems to keep pointing to some version of glibc that is not actually compatible with CUDA and my machine.
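For anyone hitting the same 32-bit vs. 64-bit confusion, here is a minimal sketch for checking what a given shared library actually is. It relies only on the ELF header layout (byte 4, `EI_CLASS`, is 1 for 32-bit and 2 for 64-bit). The helper name `elf_class` is made up for illustration, and the library path in the usage comment is an assumption about a typical CUDA install, not something confirmed in this thread.

```python
import sys


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)
    # Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4])


if __name__ == "__main__":
    # Example (path is an assumption, adjust to your install):
    #   python elf_class.py /usr/local/cuda-12.1/lib64/libcudart.so
    for lib in sys.argv[1:]:
        print(lib, elf_class(lib))
```

Running it against the library that pycuda ends up loading inside and outside the Flox env should show whether the two really differ.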

My instinct would be to try and pin glibc to a version that works for me (2.35), but a) I don’t know how to do this, as glibc does not have specific numbered versions in Flox, and b) it seems like it would be a band-aid over some underlying issue.

So, I guess I feel like I have to be missing something big here, and maybe it’s obvious–am I supposed to be installing cuda & nvidia drivers in flox, too? If so, do I then define a specific version for every machine I use?

It seems like either that is the case, or I need to understand flox better and perhaps edit whatever in flox is the equivalent of the nix shell?

I don’t know, I’m kind of lost, any direction would be helpful.

poetry run python -c "import pycuda.driver as cuda; cuda.init(); print(cuda.Device.count())"
/nix/store/4bj2kxdm1462fzcc2i2s4dn33g2angcc-bash-5.2p32/bin/bash: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.36' not found (required by /nix/store/4bj2kxdm1462fzcc2i2s4dn33g2angcc-bash-5.2p32/bin/bash)
/nix/store/4bj2kxdm1462fzcc2i2s4dn33g2angcc-bash-5.2p32/bin/bash: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by /nix/store/4bj2kxdm1462fzcc2i2s4dn33g2angcc-bash-5.2p32/bin/bash)
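
A note on reading the error above: the binary complaining is the Nix-built bash, and the libc it loaded is the host one under /usr/lib/x86_64-linux-gnu, which does not provide the `GLIBC_2.36` and `GLIBC_2.38` symbol versions that bash was linked against. The sketch below just pulls those version tags out of the loader output and compares them with a host glibc version you pass in. The function names are illustrative, and the host version (2, 35) is taken from the 2.35 mentioned in the post, not detected.

```python
import re


def required_glibc_versions(loader_output):
    """Return the sorted, de-duplicated GLIBC_x.y versions named in the output."""
    found = re.findall(r"GLIBC_(\d+)\.(\d+)", loader_output)
    return sorted({(int(major), int(minor)) for major, minor in found})


def unmet_versions(loader_output, host_version):
    """Return the required versions that are newer than the host glibc."""
    return [v for v in required_glibc_versions(loader_output) if v > host_version]


if __name__ == "__main__":
    # Shortened copy of the two error lines; host_version is an assumption.
    sample = (
        "libc.so.6: version `GLIBC_2.36' not found\n"
        "libc.so.6: version `GLIBC_2.38' not found\n"
    )
    print(unmet_versions(sample, host_version=(2, 35)))
```

If every required version is newer than the host glibc, the question becomes why a Nix binary is resolving libc from the host at all, rather than which glibc to pin.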

Can you show us your manifest? If you want a particular glibc that you know for sure another package was built against, you would normally put both in the same pkg-group.

My manifest is below.

I’m going to try the pkg-group thing and let you know.

#
# This is a Flox environment manifest.
# Visit flox.dev/docs/concepts/manifest/
# or see flox-edit(1), manifest.toml(5) for more information.
#
version = 1

# List packages you wish to install in your environment inside
# the `[install]` section.
[install]
python3 = { pkg-path = "python3", version = "^3.11" }
poetry = { pkg-path = "poetry" }
postgresql.pkg-path = "postgresql"
zlib.pkg-path = "zlib"
libpqxx.pkg-path = "libpqxx"

# Set environment variables in the `[vars]` section. These variables may not
# reference one another, and are added to the environment without first
# expanding them. They are available for use in the `[profile]` and `[hook]`
# scripts.
[vars]
# message = "Howdy"

# The `hook.on-activate` script is run by the *bash* shell immediately upon
# activating an environment, and will not be invoked if Flox detects that the
# environment has previously been activated. Variables set by the script will
# be inherited by `[profile]` scripts defined below. Note that any stdout
# generated by the script will be redirected to stderr.
[hook]
on-activate = '''
  # Autogenerated by Flox

  # Setup a Python virtual environment

  export POETRY_VIRTUALENVS_PATH="$FLOX_ENV_CACHE/poetry/virtualenvs"

  if [ -z "$(poetry env info --path)" ]; then
    echo "Creating poetry virtual environment in $POETRY_VIRTUALENVS_PATH"
    poetry lock --quiet
  fi

  # Quietly activate venv and install packages in a subshell so
  # that the venv can be freshly activated in the profile section.
  (
    source "$(poetry env info --path)/bin/activate"
    poetry install --quiet
  )

  # End autogenerated by Flox
'''

# Scripts defined in the `[profile]` section are *sourced* by *your shell* and
# inherit environment variables set in the `[vars]` section and by `[hook]` scripts.
# The `profile.common` script is sourced by all shells and special care should be
# taken to ensure compatibility with all shells, after which exactly one of
# `profile.{bash,fish,tcsh,zsh}` is sourced by the corresponding shell.
[profile]
common = '''
  # generated by me
  ulimit -n 10000
  export PATH=/usr/local/cuda-12.1/bin:$PATH  
  export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
  export LD_LIBRARY_PATH=/opt/nvidia/nsight-compute/2023.1.0/host/linux-desktop-glibc_2_11_3-x64:$LD_LIBRARY_PATH

'''
bash = '''
  # Autogenerated by Flox

  echo "Activating poetry virtual environment" >&2
  source "$(poetry env info --path)/bin/activate"

  # End autogenerated by Flox
'''
fish = '''
  # Autogenerated by Flox

  echo "Activating poetry virtual environment" >&2
  source "$(poetry env info --path)/bin/activate.fish"

  # End autogenerated by Flox'''
tcsh = '''
  # Autogenerated by Flox

  echo "Activating poetry virtual environment" >&2
  source "$(poetry env info --path)/bin/activate.csh"

  # End autogenerated by Flox'''
zsh = '''
  # Autogenerated by Flox

  echo "Activating poetry virtual environment" >&2
  source "$(poetry env info --path)/bin/activate"

  # End autogenerated by Flox
'''

# The `[services]` section of the manifest allows you to define services.
# Services defined here use the packages provided by the `[install]` section
# and any variables you've defined in the `[vars]` section or `hook.on-activate` script.
[services]
# postgres.command = "postgres --config-file=pg.conf"

# Additional options can be set in the `[options]` section. Refer to
# manifest.toml(5) for a list of available options.
[options]
systems = ["x86_64-linux"]
# Uncomment to disable CUDA detection.
# cuda-detection = false
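
One thing that may be worth checking with a manifest like this: the `profile.common` script prepends to whatever `LD_LIBRARY_PATH` was already set outside Flox. If that inherited value contains a host library directory, Nix-built binaries can pick up the host libc, which would be consistent with the `GLIBC_2.36' not found` error. This is only a possible cause, not something confirmed in this thread. The sketch below flags such entries; the list of host directories is an assumption about a typical Linux layout.

```python
import os

# Assumed locations of host system libraries; adjust for your distribution.
HOST_LIB_DIRS = ("/lib", "/lib64", "/usr/lib", "/usr/lib64")


def host_entries(ld_library_path):
    """Return the LD_LIBRARY_PATH entries that sit in or under a host lib dir."""
    flagged = []
    for entry in ld_library_path.split(":"):
        if not entry:
            continue
        normalized = entry.rstrip("/")
        for host_dir in HOST_LIB_DIRS:
            if normalized == host_dir or normalized.startswith(host_dir + "/"):
                flagged.append(entry)
                break
    return flagged


if __name__ == "__main__":
    # Run inside the activated environment to inspect the effective value.
    for entry in host_entries(os.environ.get("LD_LIBRARY_PATH", "")):
        print("host library dir on LD_LIBRARY_PATH:", entry)
```

An empty result does not rule out other mechanisms (for example `LD_PRELOAD`), so treat it as a first check only.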

Okay so I tried package groups in a million different configurations; I didn’t find a solution, but I think I have a better understanding of the problem.

Even if I set glibc 2.35 as its own package group and put the rest of my dependencies in a group with a higher version, say 2.39, I can’t build the environment. It appears that Flox / Nix itself requires a glibc higher than 2.35, but 2.35 is precisely what is needed for my machine / GPU configuration to run torch with GPU support.

My understanding is that I’m not supposed to try to install cuda in flox, as it is a system installation? It doesn’t seem trivial to just set a path that makes only cuda use glibc 2.35, or am I wrong there?

Is there a good solution, maybe with flakes, or do I need to use torch outside of flox?

One manifest config I tried:

# Python 3.12 and other non-CUDA dependencies should use glibc 2.39
python3.pkg-path = "python3"
python3.version = "^3.12.4"
python3.pkg-group = "glibc-2.39"

# Poetry, PostgreSQL, libpqxx, and zlib will use glibc 2.39 as well
poetry.pkg-path = "poetry"
poetry.version = "1.8.3"
poetry.pkg-group = "glibc-2.39"

postgresql.pkg-path = "postgresql"
postgresql.version = "16.4"
postgresql.pkg-group = "glibc-2.39"

zlib.pkg-path = "zlib"
zlib.version = "1.3.1"
zlib.pkg-group = "glibc-2.39"

libpqxx.pkg-path = "libpqxx"
libpqxx.version = "7.7.5"
libpqxx.pkg-group = "glibc-2.39"

# Set everything else to use glibc 2.35 (CUDA related components)
glibc.pkg-path = "glibc"
glibc.version = "2.35"
glibc.pkg-group = "glibc-2.35"

Response:

❌ ERROR: resolution failed: constraints for group 'glibc-2.35' are too tight

   Use 'flox edit' to adjust version constraints in the [install] section,
   or isolate dependencies in a new group with '<pkg>.pkg-group = "newgroup"'

I made some modifications, but I don’t have your system to test them on. I logically grouped python, postgres, and glibc separately.

Also, the glibc version number needs to match one of the versions listed by flox show glibc, so I think that might’ve been contributing to the problem.

[install]

# Python 3.12 and Poetry share the "python" group
python3.pkg-path = "python3"
python3.version = "^3.12.4"
python3.pkg-group = "python"

# Poetry resolves alongside Python; PostgreSQL gets its own group below, while zlib and libpqxx keep the "glibc-2.39" group
poetry.pkg-path = "poetry"
poetry.version = "1.8.3"
poetry.pkg-group = "python"

postgresql.pkg-path = "postgresql"
postgresql.version = "16.4"
postgresql.pkg-group = "postgresql"

zlib.pkg-path = "zlib"
zlib.version = "1.3.1"
zlib.pkg-group = "glibc-2.39"

libpqxx.pkg-path = "libpqxx"
libpqxx.version = "7.7.5"
libpqxx.pkg-group = "glibc-2.39"

# Set everything else to use glibc 2.35 (CUDA related components)
glibc.pkg-path = "glibc"
glibc.version = "2.35-224"
glibc.pkg-group = "glibc-2.35"
glibc.systems = ["aarch64-linux", "x86_64-linux"]

Maybe ignore my comment. We probably need to start with the core problem: what OS and CUDA driver are you using?

I’m using Ubuntu 22.04.1 and CUDA 12.1. Again, I’ve gotten CUDA to run both directly and in a Poetry env (without Flox) with PyTorch.

Thanks for the thoroughness of that manifest—I’ll still try it out if you think it might work.