📌 For information on the general organisation of projects, see these slides (accessible with EPFL gdrive login).
- GitHub repository
- Connecting to the remote server
- IDE setup for remote development
- Environment setup
- Code and data storage
- Security
- Working on the remote server
- Managing CPU and GPU resources
- Debugging and troubleshooting
- Reproducibility and good practices
- End-of-project checklist
Before starting any coding, you must set up a GitHub repository in consultation with your supervisor in order to:
- confirm the GitHub organisation or account under which the repository should be created;
- define naming conventions consistent with the project;
- assign the appropriate license (see Licenses below).
Working repositories are set to private; they are made public at the end of the project.
- Use lower case.
- Use hyphens to separate tokens.
- If related to a larger project, prefix with that project's name (e.g. `impresso-image-classification`).
- In case of doubt, ask your supervisor.
You are free to structure your repository as you wish, but we advise:
```
project-name/
├── notebooks/          # Working notebooks
├── lib/                # Scripts with CLI interfaces (converted from notebooks)
├── report/             # PDF and LaTeX sources of your report
├── requirements.txt
└── README.md
```
Your README must include:
- Basic information: your name, supervisors' names, academic year.
- About: a brief introduction to the project.
- Research summary: a brief summary of your approaches/implementations and an illustration of your results.
- Installation and Usage: dependencies (platform, libraries), compilation if necessary, and how to run the code.
- License: as decided with your supervisor (see below).
The license is chosen by your supervisor, not by you. Your supervisor will indicate which open license applies (typically AGPL, GPL, LGPL, or MIT). Once confirmed, add the license file via GitHub (add new file → start typing "license" → pre-filled choices will appear) and add the following at the end of your README:
```
project_name - Jean Dupont
Copyright (c) Year Jean Dupont / EPFL
This program is licensed under the terms of the [license].
```
Your lab (DHLAB or LHST) can grant you access to a machine on the IC cluster. Ask your supervisor to obtain access.
- You must be on the EPFL campus or connected via the EPFL VPN. If you have not set it up yet, follow the official EPFL VPN instructions.
`$USER` is your Gaspar username.

```bash
ssh $USER@iccluster0XX.iccluster.epfl.ch
```

Replace `XX` with the machine number provided by your supervisor. The DHLAB node is `iccluster028`.
When connecting for the first time, type yes to accept the fingerprint:
The authenticity of host 'iccluster0XX.iccluster.epfl.ch (XX.XX.XX.XX)' can't be established.
ECDSA key fingerprint is SHA256:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
Using SSH keys avoids typing your password on every connection and is required by some IDE integrations.
```bash
# On your LOCAL machine — generate a key pair if you don't have one
ssh-keygen -t ed25519 -C "your_email@epfl.ch"

# Copy your public key to the remote server
ssh-copy-id $USER@iccluster028.iccluster.epfl.ch
```

You can now connect without a password. Keep your private key secure and never share it.
- 256 GB RAM
- 2 GPUs
- 200 GB disk on `/home` (shared — do not use for data storage)
- Several TB under `/rcp-scratch` (primary storage for all work)
```bash
du -sh ~                             # your /home usage
du -sh /rcp-scratch/students/$USER/  # your scratch usage
df -h /rcp-scratch                   # total scratch space available
```

If the cluster node is heavily loaded or you need more GPU resources, your lab may have access to Run:AI, EPFL's managed GPU scheduling platform. See the RCP portal for documentation and access instructions. Ask your supervisor whether Run:AI is available for your project.
⚠️ The machine is shared by all lab members and students. Before running anything intensive, always check current resource usage (see Section 8).
When working on a cluster, your code can exist in three places: your local machine, the remote server, and GitHub. Before writing any code, decide on a workflow and stick to it.
⚠️ Edit code in one place only. Never edit the same file simultaneously on your local machine and on the server — this leads to conflicts and lost work. GitHub is your single source of truth; commit and push regularly from whichever side you are working on.
Your IDE connects to the remote server over SSH. You edit files as if they were local, but everything runs and is saved on the cluster. You push to GitHub from the server.
This is the simplest setup: one copy of the code, no sync to configure, no risk of local/remote divergence.
- Install the Remote - SSH extension.
- Open the Command Palette (`Ctrl+Shift+P` / `Cmd+Shift+P`) → Remote-SSH: Connect to Host → enter `$USER@iccluster028.iccluster.epfl.ch`.
- Once connected, open your project folder: `/rcp-scratch/students/$USER/your-project/`.
- Set the Python interpreter via the Command Palette → Python: Select Interpreter → point it to your remote conda or virtualenv binary (e.g. `/rcp-scratch/students/$USER/.conda/envs/py311/bin/python`).
- Open the integrated terminal (`` Ctrl+` ``) — it runs directly on the server.
- Go to Settings → Project → Python Interpreter → Add Interpreter → On SSH.
- Enter the host (`iccluster028.iccluster.epfl.ch`) and authenticate with your Gaspar credentials or SSH key.
- Point the interpreter to the Python binary in your remote environment (e.g. `/rcp-scratch/students/$USER/.conda/envs/py311/bin/python`).
📖 PyCharm SSH interpreter documentation
You write code on your local machine. Your IDE automatically uploads every saved change to the server. You push to GitHub from your local machine.
Use this if you need to work offline, or if you prefer your local tooling (linters, plugins, etc.). Be disciplined: always let the IDE sync before running anything on the server, and never edit files on the server directly while in this mode.
PyCharm can mirror your local project to the server on every save:
- Go to Settings → Build, Execution, Deployment → Deployment → Add → choose SFTP.
- Configure the connection to `iccluster028.iccluster.epfl.ch` with your credentials or SSH key.
- Under Mappings, set the remote path to `/rcp-scratch/students/$USER/your-project/`.
- Enable Tools → Deployment → Automatic Upload — PyCharm will sync every saved file immediately.
- Set the remote interpreter as in Option A above.
📖 PyCharm deployment documentation
VS Code does not have built-in auto-sync, but the SFTP extension provides similar functionality. Configure it with the server address and your remote project path.
💡 With either option, the integrated terminal in your IDE opens directly on the remote server, letting you run scripts, manage environments, and monitor jobs without a separate SSH window.
⚠️ Never install environments or store packages in `/home`. The `/home` partition is small and shared. Always use `/rcp-scratch/students/$USER/`.
```bash
mkdir -p /rcp-scratch/students/$USER
```

Step 1 — Configure conda to store environments on scratch space (do this before creating any environment):

```bash
conda config --add envs_dirs /rcp-scratch/students/$USER/.conda/envs
conda config --add pkgs_dirs /rcp-scratch/students/$USER/.conda/pkgs
```

Step 2 — Create a new environment:
```bash
export PYTHONUTF8=1
export LANG=C.UTF-8
conda create -n projectxyz-py311 python=3.11
```

- Activate: `source activate projectxyz-py311`
- Deactivate: `source deactivate`
If you already created a conda environment in /home by mistake, move it to scratch space:
```bash
# 1. Export the existing environment
conda activate your_env_name
conda env export > environment.yml
conda deactivate

# 2. Remove the old environment
conda env remove -n your_env_name

# 3. Make sure conda is configured for scratch (see Step 1 above)

# 4. Recreate the environment in scratch
conda env create -f environment.yml
```

With virtualenv instead of conda:

```bash
virtualenv /rcp-scratch/students/$USER/testenv
source /rcp-scratch/students/$USER/testenv/bin/activate
```

To avoid memory errors during package installation:
```bash
mkdir -p /rcp-scratch/students/$USER/tmp
export TMPDIR=/rcp-scratch/students/$USER/tmp/
pip install torch
```

If you use pipenv, point it at a scratch temp directory as well:

```bash
mkdir -p /rcp-scratch/students/$USER/.pipenv_tmpdir
export TMPDIR="/rcp-scratch/students/$USER/.pipenv_tmpdir"
```

Also add to your ~/.bashrc:

```bash
export PIPENV_VENV_IN_PROJECT=1
```

Large model caches (e.g. from Hugging Face) can quickly fill `/home`. Move them to scratch:
```bash
mkdir -p /rcp-scratch/students/$USER/.cache/
```

Add this line to your ~/.bashrc so it is set automatically on each login:

```bash
export HF_HOME=/rcp-scratch/students/$USER/.cache/huggingface
```

Then reload: `source ~/.bashrc`
💡 `.bashrc` is not always sourced automatically at login. If your environment variables seem missing, run `source ~/.bashrc` manually.
| Content | Location |
|---|---|
| Code / repositories | /rcp-scratch/students/$USER/ |
| Datasets | /rcp-scratch/students/$USER/data/ |
| Model checkpoints | /rcp-scratch/students/$USER/checkpoints/ |
| conda environments | /rcp-scratch/students/$USER/.conda/ |
| Hugging Face cache | /rcp-scratch/students/$USER/.cache/ |
⚠️ Do not store any of the above under `/home`.
With `scp`:

```bash
# Local → remote
scp -r /path/to/local/file.txt $USER@iccluster0XX.iccluster.epfl.ch:/rcp-scratch/students/$USER/

# Remote → local
scp $USER@iccluster0XX.iccluster.epfl.ch:/rcp-scratch/students/$USER/file.txt /path/to/local/
```

With `rsync`:

```bash
# Local → remote
rsync -avh --progress /path/to/local/folder/ $USER@iccluster0XX.iccluster.epfl.ch:/rcp-scratch/students/$USER/folder/

# Remote → local
rsync -avh --progress $USER@iccluster0XX.iccluster.epfl.ch:/rcp-scratch/students/$USER/folder/ /path/to/local/folder/
```
⚠️ Be very careful about source and target paths with `rsync`. Ask your supervisor if unsure.
Common flags:
- `-a`: archive mode (preserves permissions, symlinks, etc.)
- `-v`: verbose
- `-h`: human-readable sizes
- `--progress`: shows transfer progress
- Compress large corpora: `.tar.gz` or `.bz2`.
- Coordinate with your supervisor before duplicating large datasets that may already be on the cluster.
- Clean up unused checkpoints, logs, and intermediate files regularly.
🔴 Never put passwords, API tokens, or any credentials in your code or in a file tracked by Git.
- Store secrets (API keys, tokens, passwords) in environment variables or in a `.env` file — never hardcoded in scripts or notebooks.
- Add `.env` to your `.gitignore` immediately when you create the repository.
- Use a password manager for personal credentials.
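A minimal `.gitignore` for a typical Python project in this setup might look like the following sketch (the exact entries depend on your stack; adjust with your supervisor):

```gitignore
# Secrets — never commit these
.env

# Python artifacts
__pycache__/
*.pyc

# Notebook checkpoints
.ipynb_checkpoints/

# Large artifacts (keep on scratch, not in Git)
checkpoints/
data/
```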
```bash
pip install python-dotenv
```

Create a `.env` file at the root of your project (never committed to Git):
```
HF_TOKEN=your_token_here
OPENAI_API_KEY=your_key_here
```
Load it in your code:
```python
from dotenv import load_dotenv
import os

load_dotenv()
token = os.getenv("HF_TOKEN")
```

If you have accidentally committed a secret, act immediately — do not just delete the file in a new commit, as the secret remains visible in the Git history.
- Revoke/rotate the credential straight away (API key, token, password) — assume it is compromised.
- Notify your supervisor.
- Remove the secret from the Git history using `git filter-repo` or BFG Repo-Cleaner, then force-push.
- If the repository is public, treat the secret as fully compromised regardless of how quickly you acted.
📖 GitHub guide: removing sensitive data
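Building on the dotenv pattern above, it can help to fail fast when a secret is missing instead of silently passing `None` around. A small sketch (the helper name `require_secret` is illustrative, not part of any library; `HF_TOKEN` follows the earlier example):

```python
import os


def require_secret(name: str) -> str:
    """Return the environment variable `name`, raising a clear error if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing secret {name!r}: set it in your environment or .env file"
        )
    return value


# Example usage (after load_dotenv()):
# token = require_secret("HF_TOKEN")
```

This way a missing or misspelled variable fails at startup with an actionable message, rather than as an authentication error deep inside a training run.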
Always run long-running processes inside screen (or tmux) so they survive SSH disconnects.
Main screen commands:
| Command | Effect |
|---|---|
| `screen -S session_name` | Create and enter a new session |
| `Ctrl-A D` | Detach from the current session (it keeps running) |
| `screen -r session_name` | Reattach to a running session |
| `screen -rd session_name` | Force reattach if already attached |
| `screen -ls` | List all active sessions |
| `Ctrl-A K` | Kill the current session |
📖 References: linuxize tutorial · full documentation · man screen
Configure Jupyter for direct remote access by following the official documentation. Key steps:
```bash
jupyter notebook --generate-config
jupyter notebook password
```

Then edit `~/.jupyter/jupyter_notebook_config.py`:

```python
c = get_config()
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.port = XXXX  # use the last 4 digits of your SCIPER to avoid port conflicts
```

Your notebook will then be accessible at http://iccluster028.iccluster.epfl.ch:XXXX.
```bash
cd /rcp-scratch/students/$USER/your-project
screen -S work          # create a screen session
source activate py311   # activate your environment
jupyter notebook
# Open http://iccluster028.iccluster.epfl.ch:XXXX in your browser
# Then detach: Ctrl-A D
```

You can now close your terminal and reconnect later — the notebook keeps running.
🔴 The cluster is a shared machine. Before launching any compute-intensive job, you are required to check current resource usage. Failing to do so may block other users' work.
```bash
# CPU and memory usage by all users
htop

# GPU usage (refreshes every 2 seconds)
nvidia-smi -l 2

# Your own running processes
ps -u $USER
```

- Always run heavy jobs inside `screen` or `tmux` (never in a raw SSH session).
- Use `nice` to lower the scheduling priority of non-urgent jobs, leaving resources available to others: `nice -n 10 python train.py`
- Close idle Jupyter notebooks that are holding GPU memory.
- Do not run multiple GPU-intensive jobs simultaneously without checking with your supervisor.
Do not launch additional intensive jobs. Contact your supervisor to coordinate — they can advise on scheduling, off-hours execution, or whether Run:AI (see Section 2) is a better option for your workload.
```bash
ps -u $USER    # list your processes and their PIDs
kill -9 <PID>  # force-kill a specific process
```

Default precision is FP32. FP16/bfloat16 may not be supported on all GPUs.
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```

Free GPU memory when done:

```python
del model
torch.cuda.empty_cache()
```

If you encounter `CUDA out of memory`: reduce the batch size or sequence length, or use gradient accumulation.
```bash
which python   # verify you are using the right interpreter
which pip
```

If `pip install` fails with a temp-directory error:

```bash
mkdir -p /rcp-scratch/students/$USER/tmp
export TMPDIR=/rcp-scratch/students/$USER/tmp
pip install <package>
```

Common GPU errors:

| Error | Solution |
|---|---|
| `CUDA out of memory` | Reduce batch size or sequence length; use gradient accumulation |
| Driver mismatch | Restart your session; verify with `nvidia-smi` |
| GPU not found | Check `torch.cuda.is_available()`; confirm the device is free with `nvidia-smi` |
Use screen or tmux — your session persists on the server regardless of your connection.
Move conda envs, .cache, and all data to /rcp-scratch/students/$USER/ (see Section 4 and Section 5).
- Version all code with Git and push to your GitHub repository regularly.
- Save environment files:
requirements.txt(pip freeze > requirements.txt) orenvironment.yml(conda env export > environment.yml). - Fix random seeds for all experiments to ensure reproducibility.
- Document experiments in your README or in notebooks (hyperparameters, dataset version, results).
- Clean up unused checkpoints, logs, and intermediate files on the cluster.
- License: apply the license chosen by your supervisor (see Section 1).
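The seed-fixing point above can be sketched with the standard library alone (in a real ML project you would additionally seed NumPy and PyTorch, as the comments indicate; exact determinism on GPU may need further framework-specific flags):

```python
import random

SEED = 42


def set_seed(seed: int = SEED) -> None:
    """Fix the stdlib RNG so runs are repeatable."""
    random.seed(seed)
    # In a typical ML project, if installed, also set:
    #   numpy.random.seed(seed)
    #   torch.manual_seed(seed)


set_seed()
a = [random.random() for _ in range(3)]
set_seed()
b = [random.random() for _ in range(3)]
assert a == b  # same seed, same draws
```

Call `set_seed()` once at the start of every experiment script, and record the seed alongside the hyperparameters you document.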
Before considering your project complete, go through the following steps with your supervisor.
- All code is committed and pushed to the GitHub repository.
- The README is complete and up to date (see Section 1).
- The final report (PDF) is added to the `report/` folder in the repository.
- Dependencies are fully documented (`requirements.txt` or `environment.yml`).
- Experiments are documented (hyperparameters, dataset versions, key results).
- Decide with your supervisor what data should be kept (e.g. final model weights, processed datasets) and what can be deleted.
- Remove unnecessary intermediate files, checkpoints, cached downloads, and temporary files from `/rcp-scratch/students/$USER/`.
- Confirm with your supervisor that no data needs to be archived or transferred elsewhere before your access is removed.
Once the repository is finalised, add your project to the appropriate lab page so it is discoverable by future students and researchers:
- DHLAB: open a PR to add your project to dh-epfl-students/EPFL-DHLAB-student-projects
- LHST: open a PR to add your project to dh-epfl-students/EPFL-LHST-student-projects
Your entry should include your name, project title, academic year, a link to the GitHub repository, and a link to the report. Do this in coordination with your supervisor.