This document introduces tools commonly used for remote editing and development. The general workflow is the following:
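1. sync your code between the local and the remote machine with syncthing (see the section in this document)
2. run it on the remote machine inside tmux (see the section in this document)
3. debug it on the remote machine (see the debug section), and edit it on your local machine

This workflow allows you to: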
Create a .ssh folder in your home if it doesn’t exist and put in it a file called config with the following content:
Host *
ControlMaster auto
ControlPath ~/.ssh/sockets/%r@%h-%p
ControlPersist 60
Match host amadeus.lim.di.unimi.it !exec "[ -e ~/.ssh/sockets/%r@%h-%p ]"
LocalForward 22001 localhost:22000
RemoteForward 22001 localhost:22000
LocalForward 8384 localhost:8384
LocalForward 8097 localhost:8097
LocalForward 6006 localhost:6006
This configuration lets several ssh commands reuse the first connection: once you are connected through ssh to a host, all the following ssh commands to that host go through the same connection. You won’t need to retype your password, the overhead decreases, and firewalls will not block you because of a growing number of connections. The ControlPath option tells ssh which file to use as the socket, and ControlPersist keeps the shared connection open for 60 seconds after the last session is closed.
Moreover, we set ssh to always forward certain ports for the server amadeus.lim.di.unimi.it.
For instance, port 8097 is the one used by visdom for plotting from PyTorch and numpy, port 8384 serves the Syncthing web interface, ports 22001 and 22000 are used by Syncthing for syncing over ssh, and port 6006 is the default port of TensorBoard.
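One detail the configuration above relies on: the directory named in ControlPath must already exist, because ssh does not create it. A minimal sketch, assuming the ~/.ssh/sockets path used in the config (the function name prepare_ssh_sockets is only illustrative):

```shell
# Create the directory used by ControlPath; ssh will not create it by itself.
prepare_ssh_sockets() {
    mkdir -p "$HOME/.ssh/sockets"
    # keep the folders private: the sockets give access to open connections
    chmod 700 "$HOME/.ssh" "$HOME/.ssh/sockets"
}

prepare_ssh_sockets
```

Once a first connection is open, ssh -O check amadeus.lim.di.unimi.it reports whether the shared connection is alive, and ssh -O exit amadeus.lim.di.unimi.it closes it.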
syncthing
syncthing is an open source p2p syncing tool, much more powerful than simple SFTP clients. However, it must be configured on both sides.
1. Install syncthing on both client and host (amadeus has it, but you can still download the pre-compiled binaries in your home if you do not have it)
2. Open localhost:8384 or the displayed address
3. Open Actions > Show ID for the client device ID
4. Connect with ssh user@host -L 8385:localhost:8384
5. Open localhost:8385 from the client browser to access the syncthing host
6. Open Actions > Show ID for the host device ID
7. … .git)
8. … send only
9. … watch for changes is selected
10. Open Actions > Advanced > [synced folder name] > Fs Watcher Delay and set it to 1 (the minimum allowed, 1 second)

Save the configuration and restart.
If one of your machines is behind a strict firewall, it can be useful to connect through an ssh tunnel.
This requires a specific procedure: first connect to the remote machine through ssh with port forwarding, then start syncthing on both machines.
See the .ssh/config file in the previous section. More info in the official guides.
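The order of operations for syncing through the tunnel can be sketched as follows. This is only a sketch under assumptions: the function name sync_over_ssh and the tmux session name sync are made up for illustration, the port forwards are assumed to come from the .ssh/config shown earlier, and syncthing (plus tmux on the host) is assumed to be installed.

```shell
# Open the ssh tunnel first, then start syncthing on both machines.
sync_over_ssh() {
    host="$1"
    # 1. background connection without a remote command: it only keeps
    #    the port forwards declared in ~/.ssh/config alive
    ssh -f -N "$host" || return 1
    # 2. start syncthing on the host inside a detached tmux session,
    #    so that it keeps running after this ssh command returns
    ssh "$host" tmux new-session -d -s sync syncthing || return 1
    # 3. start syncthing locally (it runs in the foreground)
    syncthing
}

# Usage: sync_over_ssh amadeus.lim.di.unimi.it
```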
Once you have synced the code, connect through ssh to the remote host and start tmux to run it. tmux is a nice piece of software for working on a remote host. It lets long experiments (e.g. experiments lasting many days) keep running after you disconnect your machine. I strongly recommend always using tmux. With it, you can also organize the same terminal into multiple tabs and splits. One recurring pattern, for instance, is a terminal with 3 splits: one for the running experiment, the second for monitoring the resources on the remote machine and the third for giving commands to the machine.
To control tmux you need to prepend each command with CTRL-b: the key pressed right after this combination is interpreted as a tmux command and is not written to the terminal.
These steps exemplify the use of tmux:
1. Connect to the host: ssh username@host
2. Run tmux. Now, a new blank terminal is shown. Notice the bottom bar showing some useful information
3. Press CTRL-b and %. Now the window is split into two side-by-side panes.
4. Press CTRL-b and " to split the current pane into a top and a bottom one.
5. With CTRL-b and x you can kill the current split (and all the processes running in it). The same happens with the command exit
6. With CTRL-b and an arrow key you can move to an existing split
7. With CTRL-b and c you can create a new tab
8. With CTRL-b and [ you enter “copy mode”: in this mode you can go up and down in the terminal and you can select and copy its content. Try moving around with the arrows and the PgUp and PgDown keys.
   - CTRL-SPACE to start copying
   - ALT-w or CTRL-w to copy the selection
   - q to exit the copy mode
   - CTRL-b and ] to paste
9. Press CTRL-b and d. The tmux session will be detached
10. Run tmux a to reattach to the previous session, which has kept running in the meantime

Since the CTRL-b key is not that comfortable, I suggest changing it: create a file in your home called .tmux.conf and put the following lines:
unbind-key C-b
set -g prefix `
bind-key ` send-prefix
Now instead of pressing CTRL-b, you can just press the ` character (on an Italian keyboard, you could use the \ character, for instance).
You can also customize your keybindings to make them easier to remember. For instance, I use keybindings very similar to vim. For more info see https://tmuxcheatsheet.com/.
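The three-split layout described earlier can also be created from a script instead of by hand. A minimal sketch, assuming tmux and htop are installed on the host; the function name start_experiment_session and the session name exp are hypothetical:

```shell
# Build a detached tmux session with three splits:
# left: experiment, top right: htop, bottom right: free shell.
start_experiment_session() {
    name="$1"
    tmux new-session -d -s "$name"          # first split: run the experiment here
    tmux split-window -h -t "$name" htop    # second split on the right: resource monitor
    tmux split-window -v -t "$name"         # third split below htop: free shell
}

# Usage: start_experiment_session exp && tmux attach -t exp
```

Because the session is created detached, the same function also works from a non-interactive ssh command; you attach to it only when you want to look at it.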
N.B. If the terminal becomes unresponsive, the ssh connection may have been closed server-side (e.g. because of a connection error). In such a case, press Enter, then ~ and then . to close the connection and return to your local terminal. If it doesn’t work, see the “SOS” section.
It is good practice to continuously monitor the resources on the host in a separate split.
For CPU and RAM, just run the htop command in a separate, always-live split.
For GPU, run watch -n 5 nvidia-smi. The watch command reruns nvidia-smi every 5 seconds.
With these two commands you should be able to understand if:
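Note that in PyTorch you should use the functions torch.cuda.memory_allocated and torch.cuda.max_memory_allocated, since nvidia-smi does not show the amount of GPU memory actually used by tensors: it also counts the memory that the PyTorch caching allocator has reserved but is not currently using (see the PyTorch docs).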
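If you prefer a log over a live view, nvidia-smi can also print selected fields periodically in CSV form. A small sketch, assuming an NVIDIA driver is installed on the host; the function name log_gpu and the default file name gpu_log.csv are made up for illustration:

```shell
# Append GPU utilization and memory usage to a CSV file every N seconds.
log_gpu() {
    interval="${1:-5}"             # seconds between samples, default 5
    outfile="${2:-gpu_log.csv}"    # hypothetical default file name
    nvidia-smi \
        --query-gpu=timestamp,utilization.gpu,memory.used,memory.total \
        --format=csv \
        -l "$interval" >> "$outfile"
}

# Usage: log_gpu 5 run1_gpu.csv   (stop it with CTRL-C)
```

The memory reported here is the one seen by the driver, so the caveat about PyTorch applies to this log as well.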
For debugging code remotely you need to use a debugger. I suggest always debugging your code without parallelism whenever possible.
In Python, just use the default debugger. Since version 3.7 you can simply add the instruction breakpoint() one line before the one where you want to start debugging. You can choose your preferred debugger through the PYTHONBREAKPOINT environment variable (I suggest ipdb). For previous versions use import pdb; pdb.set_trace().
A better debugger is ipdb (import ipdb; ipdb.set_trace()). You will probably need to install it: pip install ipdb. See its commands by pressing h or in its documentation.
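Which debugger breakpoint() opens is controlled by the PYTHONBREAKPOINT environment variable, so you can switch debugger, or disable all breakpoints, without touching the code. A minimal sketch; the wrapper names debug_with_ipdb and run_without_breakpoints and the script name train.py are hypothetical, and ipdb is assumed to be installed:

```shell
# Run a script so that every breakpoint() call opens ipdb instead of pdb.
debug_with_ipdb() {
    PYTHONBREAKPOINT=ipdb.set_trace python3 "$@"
}

# Run the same script ignoring every breakpoint() call,
# e.g. for the final long run inside tmux.
run_without_breakpoints() {
    PYTHONBREAKPOINT=0 python3 "$@"
}

# Usage: debug_with_ipdb train.py
```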
If you really want a graphical debugger, you can try wdb. You will probably need to install it: pip install wdb.
Another useful tool is pysnooper. You can use it in place of printing to stdout or logging. It’s much easier to use and very powerful. I use it for debugging scripts with multiprocessing on different files.
I also recommend using pyenv and poetry to isolate your project from the OS Python packages.
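A typical setup with these two tools could look like the sketch below. The Python version, the numpy dependency and the function name setup_project are placeholders chosen for illustration; pyenv and poetry are assumed to be already installed in your home.

```shell
# Create an isolated environment for the project in the current folder.
setup_project() {
    version="${1:-3.12.4}"                      # placeholder Python version
    pyenv install --skip-existing "$version"    # build that Python in your home
    pyenv local "$version"                      # pin it for this folder (.python-version)
    poetry init --no-interaction                # create pyproject.toml
    poetry env use "$(pyenv which python)"      # make poetry use the pyenv interpreter
    poetry add numpy                            # example dependency (placeholder)
}

# Usage: setup_project 3.12.4 && poetry run python train.py
```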
In Matlab, you can use the default debugger by placing the statement keyboard right after the line where you want to stop. You will then get a prompt where you can use the usual Matlab commands to show variables. However, editing the command line requires emacs keys (as of now I am still not able to change these mappings). Remember these ones:
KEY | ACTION |
---|---|
CTRL-a | move cursor to (at) beginning-of-line |
CTRL-e | move cursor to end-of-line |
CTRL-f | move cursor forward one character |
CTRL-b* | move cursor backward one character |
* if you are using the default tmux keymap, CTRL-b is also the tmux prefix; in this case you’ll have to press CTRL-b twice to send it to Matlab
For managing the debugging itself, you can use dbcont, dbstep, dbquit and all the commands listed in the official docs.
Use the Debugger package: install it with ]add Debugger from the Julia REPL.
Set breakpoints with @bp. Start debugging a function with @enter functionName(args) (will stop at the first instruction) or with @run functionName(args) (will stop at the first breakpoint).
Open the REPL. Run include("filename"); mainFunction() to test mainFunction after having edited it. Alternatively, you can also try Revise to automatically reload changed modules.
It can happen that your program fills the host resources. In that case, you can:
1. try to stop it from the ssh connection by keeping CTRL-C pressed
2. if 1. doesn’t work, create a new terminal or a new split in tmux and run kill PROCESS_NUMBER (you can find the process number with ps x or htop)
3. if neither 1. nor 2. works for you, try killing the tmux split with CTRL-b and x
4. run ssh username@host killall program, where program is the command of the experiment that you were executing (e.g. python, python3, matlab, etc.). This command will try to create a new ssh connection, run the command killall program and then exit immediately.

When you are on a server without root access and you need to install some application, you are in a difficult situation. You can try to install it in your home, but this is not always possible and usually requires the intervention of the admin (e.g. for Homebrew and Nix). Moreover, if you are worried about computational performance, Homebrew and Nix are not the best solutions. The best solution is to use a container.
Containers are environments that are isolated from the rest of the system. Most container technologies, however, still need the admin to install them, even though they are much more likely to be already set up on the server than Homebrew and Nix.
It is difficult to choose the proper technology, though. Almost all the benchmarks around are made by sysadmins, who are mainly interested in startup times and web server performance, not in the computational overhead imposed by the container while running our scientific code.
Some time ago I ran a few small benchmarks for this purpose. The results are shown below:
As you see, the best solution is by far Apptainer, then docker, devbox and podman. Note that docker's total score is so high because of its ability to read and write files asynchronously. However, this is usually not that relevant for scientific code. Leaving that aside, I would suggest using podman over docker.
But the truth is that Apptainer has been built for scientific code and perfectly fits our needs:
So, use it. The workflow is like this:
Points 1 and 2 are executed only once, while 3-5 are executed for each different server you deal with. Points 6-7 are executed every time you want to run your code.
Here (Linux) and here (Windows and Mac) are the instructions to install Apptainer. On Windows, I suggest you use WSL, which is simpler.
Now you need to define an image. Below is an example that I have used in the past; more details can be found here.
You can build an image from a definition file with apptainer build mycontainer.sif mycontainer.def, where mycontainer.def is the name of the definition file (e.g. the one below) and mycontainer.sif is the name of the image you want to create.
# This line means "pick the base container image from
# the docker hub".
Bootstrap: docker
# Bootstrap: localimage
# Whereas here we specify the particular image we are
# interested in using as the base image, in this case
# a basic `fedora` system at version `39`.
# The base image is the operating system configuration
# that you want to customize.
From: fedora:39
# you can also start building from an existing image to modify it
# In this case, comment out this line and the line in %post that were already executed
# From: mycontainer.sif
# environment variables here
%environment
export LANG=en_US.UTF-8 # luatex needs locale set
# A definition file has several sections, see the documentation.
# In the `post` section you can run commands to customize
# your environment
%post
# This is the place where you can
# install additional dependencies.
########### My basic stuffs ###########
# software for development
dnf -y install tmux neovim syncthing fish zoxide fzf ripgrep openssh-clients openssh powerline git hostname fd-find copr-cli procps-ng htop
# python build dependencies
dnf -y install make gcc patch zlib-devel bzip2 bzip2-devel readline-devel sqlite sqlite-devel openssl-devel tk-devel libffi-devel xz-devel libuuid-devel gdbm-libs libnsl2
# python-pip
dnf -y install python3-pip
# lazygit
dnf -y install 'dnf-command(copr)'
dnf copr enable atim/lazygit -y
dnf -y install lazygit
# font
curl -OL https://github.com/ryanoasis/nerd-fonts/releases/latest/download/SourceCodePro.tar.xz
# locale
dnf -y install glibc-locale-source glibc-langpack-en
########### Other more specific stuffs ###########
# luatex and gregotex
dnf -y install texlive-collection-luatex texlive-collection-fontsrecommended texlive-collection-latexrecommended texlive-collection-latexextra texlive-collection-latex texlive-collection-music texlive-collection-mathscience texlive-gregoriotex
dnf -y install latexmk
# tesseract
dnf -y install tesseract tesseract-langpack-eng tesseract-langpack-ita tesseract-langpack-ita_old tesseract-script-latin tesseract-tools tesseract-osd
Now you can run the image. You can do it in two ways:
1. as an instance to which you can connect even after ssh disconnection:
   apptainer instance start mycontainer.sif a_name_for_this_instance
   apptainer shell instance://a_name_for_this_instance
   apptainer instance stop a_name_for_this_instance
2. as a plain shell:
   apptainer shell mycontainer.sif

You could actually define a command to run inside the container in the %runscript section of the definition file. In this case, you can just run ./mycontainer.sif to run it or apptainer instance run mycontainer.sif a_name_for_this_instance to run it in a detached instance.
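Put together, the build and run commands of this section can be wrapped in two small helpers. This is a sketch under assumptions: the function names build_image and enter_instance and the instance name dev are hypothetical, while the image and definition file names are the ones used above.

```shell
# Rebuild the image only when it is missing or older than the definition file.
build_image() {
    def="${1:-mycontainer.def}"
    sif="${2:-mycontainer.sif}"
    if [ ! -e "$sif" ] || [ "$def" -nt "$sif" ]; then
        apptainer build --force "$sif" "$def"
    fi
}

# Start a detached instance if it is not already running, then open a shell in it.
enter_instance() {
    sif="${1:-mycontainer.sif}"
    name="${2:-dev}"
    # "instance list" prints the running instances, one per line
    if ! apptainer instance list | grep -q "^$name[[:space:]]"; then
        apptainer instance start "$sif" "$name"
    fi
    apptainer shell "instance://$name"
}

# Usage: build_image && enter_instance mycontainer.sif dev
# When you are done: apptainer instance stop dev
```

Since the instance survives ssh disconnections, calling enter_instance again after reconnecting brings you back into the same running container.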