# Setting up your own BEAURIS site

To use BEAURIS, you will need to create a site-specific GitLab repository, following the [provided example](https://gitlab.com/beaur1s/sample).

## Repository organisation

### `beauris.yml`

This is the main BEAURIS configuration file. This is where you define which modules should be launched, and how they should be configured.

### `.gitlab-ci.yml`

This is the GitLab CI configuration file, where the different tasks to run are defined. It currently needs to be written by hand, but it should be generated automatically from `beauris.yml` in a future release.

### `./organisms/`

This directory contains one YAML file per organism. These files can be created or modified by opening merge requests.

YAML files should conform to the schema in [`beauris/validation/template/schema.yaml`](https://gitlab.com/beaur1s/beauris/-/blob/main/beauris/validation/template/schema.yaml). There are no constraints on file names, but we encourage naming them using the pattern `genus_species.yml` or `genus_species_strain.yml`.
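As a sketch of the suggested naming convention, the Python snippet below flags organism files whose names do not match `genus_species.yml` or `genus_species_strain.yml`. The helper names and the exact character classes (lowercase letters, plus an optional lowercase/digit strain suffix) are assumptions for illustration: BEAURIS itself does not enforce any file name, and this check does not replace validation against the schema.

```python
import re
from pathlib import Path

# Assumption: lowercase genus and species, optional strain suffix made of
# lowercase letters/digits. BEAURIS does not enforce any naming pattern.
NAME_PATTERN = re.compile(r"[a-z]+_[a-z]+(?:_[a-z0-9]+)?\.yml")


def follows_naming_convention(filename: str) -> bool:
    """Return True if the file name matches genus_species[_strain].yml."""
    return NAME_PATTERN.fullmatch(Path(filename).name) is not None


def nonconforming_files(organisms_dir: str = "organisms") -> list[str]:
    """List YAML files in organisms_dir whose names break the convention."""
    return sorted(
        p.name
        for p in Path(organisms_dir).glob("*")
        # Only look at YAML files; other files (e.g. a README) are ignored.
        if p.is_file()
        and p.suffix in (".yml", ".yaml")
        and not follows_naming_convention(p.name)
    )
```

For example, `nonconforming_files("organisms")` could be run from the root of your site repository to list files you may want to rename.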

### `./locked/`

This directory contains YAML files derived from the ones in `./organisms/`. They are generated automatically by CI to add internal information (such as paths/URLs to locked data).

No manual change should be made to these files. Ever. Period.

### Merge request labels and title

You can alter the way CI runs by adding labels to a merge request (MR), or by writing specific keywords in the MR title.

Add a `run-everything` label to ignore any previous CI run and rerun all the tasks from scratch.

Add a `run-TASKID` label to ignore any previous CI run and rerun the task named "TASKID".

Add a `disable-everything` label to disable all tasks.

Add a `disable-TASKID` label to avoid running the task named "TASKID".

Write `FU25` or `fu25` at the end of an MR title when you want to reuse the work dir from a previously merged MR (typically to fix problems in the post-merge pipeline). See #28 for details. (FU stands for Follow-Up, or anything else you prefer.)
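The following Python sketch illustrates how these labels and the title suffix could combine to decide which tasks run. It is not BEAURIS's actual implementation: the function `plan_ci`, the task IDs used in the usage example (`blast`, `jbrowse`, `apollo`) and the rule that `disable-*` labels win over `run-*` labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CiPlan:
    tasks: list[str]  # tasks that will actually run, in declaration order
    follow_up: bool   # True when the MR title ends with FU25/fu25


def plan_ci(all_tasks: list[str], labels: set[str], title: str,
            already_done: set[str]) -> CiPlan:
    """Hypothetical decision logic for MR labels and title keywords."""
    # Follow-up MRs reuse the work dir of a previously merged MR.
    follow_up = title.rstrip().endswith(("FU25", "fu25"))

    if "disable-everything" in labels:
        return CiPlan(tasks=[], follow_up=follow_up)

    planned = []
    for task in all_tasks:
        if f"disable-{task}" in labels:
            continue  # assumption: disabling wins over any run-* label
        # run-* labels ignore previous CI runs and force a rerun.
        forced = "run-everything" in labels or f"run-{task}" in labels
        if forced or task not in already_done:
            planned.append(task)
    return CiPlan(tasks=planned, follow_up=follow_up)
```

For instance, with tasks `["blast", "jbrowse", "apollo"]`, all already done, and the label `run-blast`, only `blast` would be rerun.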

## Setting up your own repo

### GitLab CI Runner

You will need a GitLab runner, with the ability to run jobs on a computing cluster.

At GOGEPP we do it like this:

- Docker GitLab runner, running on our Swarm cluster
- With mounted volumes to access raw data, BEAURIS workdir and the runner builds_dir
- With mounted Slurm configuration files to be able to submit jobs
- CI jobs are launched as docker containers from the Docker GitLab runner

Have a look at the [BEAURIS sample repository](https://gitlab.com/beaur1s/sample/-/tree/main/base_services/gitlab_runner) to learn how to set up a GitLab Runner on your infrastructure.

### Storage

BEAURIS needs a storage space to store files generated by the various modules. This storage should be accessible from the DRMAA cluster computing nodes, GitLab Runner, and Docker hosting servers.

A typical BEAURIS storage space (e.g. `/projects/beaurisample/` as used in the rest of the documentation) will be organised like this:

```txt
├── atlaris  # A git clone of your BEAURIS site-specific GitLab repository (similar to the provided example: https://gitlab.com/beaur1s/sample)
├── ci_builds  # Directory used by your GitLab Runner to store CI builds data
├── ci_work  # Directory used by BEAURIS to store data while transforming/preparing data
├── locked  # Directory storing data locked by BEAURIS: don't modify the content by hand
└── services  # Directory used by BEAURIS to store all data for deployed Docker containers (docker-compose.yml, mounted data volumes)
```

You will typically want a dedicated Unix user for your project (e.g. `beauris`) that owns this whole storage space. Don't forget to set correct permissions so that no other user can read or modify its content.

This storage space should also be accessible at the same path inside the GitLab Runner, on the Swarm nodes, and on the computing nodes.
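A minimal sketch of creating this layout with owner-only permissions, assuming it is run as the dedicated user (e.g. `beauris`) so that this user owns the tree, and that every component accesses the storage as that user. The `create_storage` helper and the `0o700` mode are assumptions; loosen the mode (e.g. to `0o750`) if a group legitimately needs read access. `atlaris` is created empty here, and `git clone` can then clone your site repository into it.

```python
import os
from pathlib import Path

# Subdirectories of a typical BEAURIS storage space, as listed above.
SUBDIRS = ["atlaris", "ci_builds", "ci_work", "locked", "services"]


def create_storage(root: str, mode: int = 0o700) -> list[Path]:
    """Create the storage tree; assumption: owner-only access (0o700)."""
    base = Path(root)
    base.mkdir(parents=True, exist_ok=True)
    # chmod explicitly: the mode passed to mkdir is filtered by the umask.
    os.chmod(base, mode)
    created = []
    for name in SUBDIRS:
        subdir = base / name
        subdir.mkdir(exist_ok=True)
        os.chmod(subdir, mode)
        created.append(subdir)
    return created
```

For example, `create_storage("/projects/beaurisample")` would prepare the layout used in the rest of this documentation.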

### Computing

GitLab Runner can be configured to be allowed to submit jobs to a computing cluster.

Currently, we have successfully used the following Docker image to do this: [genouest/docker_slurm_exec](https://github.com/genouest/docker_slurm_exec), which can submit jobs to a Slurm cluster using DRMAA (several Slurm versions are available). Most BEAURIS jobs will be launched using this image. If you are not using Slurm, or are using a different version, you can take inspiration from our image to build your own.

### Swarm

We have chosen to deploy web applications using Docker containers on a Swarm cluster. Swarm is very simple to set up (compared to Kubernetes) and has proven sufficient for our use case so far.

All web Docker containers are deployed on the Swarm cluster using Ansible playbooks, which can be configured in `beauris.yml`.

Have a look at the [sample `beauris.yml`](https://gitlab.com/beaur1s/sample/-/blob/main/beauris.yml) for an example.

Don't forget to define the additional environment variable `ANSIBLE_SSH_KEY` in the GitLab CI/CD settings (not in `beauris.yml`, for security reasons); see below.

On the Swarm controller, the following Python modules are required by the Ansible playbooks:

- jsondiff
- pyyaml

### Traefik/Authelia

HTTP proxying is currently handled by Traefik: BEAURIS automatically sets labels on containers, which Traefik interprets to route HTTP traffic to the corresponding container.
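To illustrate label-based routing, here is a hypothetical Python helper building the kind of Traefik labels a deployed service could carry. The label keys (`traefik.enable`, `traefik.http.routers.<name>.rule`, `.entrypoints`, `.middlewares`, `.service`, and `traefik.http.services.<name>.loadbalancer.server.port`) are standard Traefik labels, but the helper itself, the entrypoint name `websecure`, the middleware reference `authelia@docker`, and applying authentication only to restricted organisms are assumptions; the labels BEAURIS actually generates may differ.

```python
def traefik_labels(name: str, host: str, path_prefix: str, port: int,
                   restricted: bool) -> dict[str, str]:
    """Hypothetical Traefik labels routing host + path prefix to a service."""
    labels = {
        "traefik.enable": "true",
        # Route requests for this host and path prefix to this router.
        f"traefik.http.routers.{name}.rule":
            f"Host(`{host}`) && PathPrefix(`{path_prefix}`)",
        # Assumption: an HTTPS entrypoint named "websecure" exists in Traefik.
        f"traefik.http.routers.{name}.entrypoints": "websecure",
        f"traefik.http.routers.{name}.service": name,
        # Port the application listens on inside the container.
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }
    if restricted:
        # Assumption: an Authelia forward-auth middleware declared as "authelia".
        labels[f"traefik.http.routers.{name}.middlewares"] = "authelia@docker"
    return labels
```

On Swarm, Traefik reads such labels from the service definition (`deploy.labels` in a compose file) rather than from the containers themselves.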

Authentication is currently performed by Authelia (coupled with Traefik), connected to an LDAP server (or to a file-based user database). Other authentication systems should also be possible; see the Traefik documentation for alternative setups.

You will need to deploy your own Traefik/Authelia setup to use it with BEAURIS.

Have a look at the [BEAURIS sample repository](https://gitlab.com/beaur1s/sample/-/tree/main/base_services/traefik) to learn how to set up Traefik/Authelia on your infrastructure.

In our sample repository, Traefik sits behind an Nginx reverse proxy. This is not mandatory: you can also expose Traefik directly to the internet, with correct certificates.

#### Domain names

The web applications that you deploy using BEAURIS will be served at different URLs. These are fully configurable, but we suggest the following domain names:

- https://auth.example.org: the URL where the Authelia authentication portal will be accessible
- https://staging.beauris.example.org/sp/genus_species/: the main url to access an organism deployed with BEAURIS in staging mode
- https://staging.beauris.example.org/sp_priv/genus_species/: the main url to access a restricted organism deployed with BEAURIS in staging mode
- https://beauris.example.org/sp/genus_species/: the main url to access an organism deployed with BEAURIS in production mode
- https://beauris.example.org/sp_priv/genus_species/: the main url to access a restricted organism deployed with BEAURIS in production mode

The URL pattern exposed to users can be configured in `beauris.yml`.
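A hypothetical helper reproducing the suggested URL scheme can be handy, for example to check links or DNS before deploying. `organism_url`, its parameters, and the lowercasing of the organism slug are illustrative assumptions; the actual pattern is whatever you configure in `beauris.yml`.

```python
def organism_url(genus: str, species: str, staging: bool = False,
                 restricted: bool = False,
                 domain: str = "beauris.example.org") -> str:
    """Build an organism URL following the suggested (hypothetical) scheme."""
    # Staging deployments live on a "staging." subdomain.
    host = f"staging.{domain}" if staging else domain
    # Restricted organisms are served under /sp_priv/ instead of /sp/.
    section = "sp_priv" if restricted else "sp"
    slug = f"{genus}_{species}".lower()
    return f"https://{host}/{section}/{slug}/"
```

For example, `organism_url("Genus", "species", staging=True)` gives `https://staging.beauris.example.org/sp/genus_species/`.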

Make sure the DNS records are set up properly before trying to deploy Traefik/Authelia. Please use HTTPS, and make sure your certificates are properly configured.

### Apollo

If you want to load organisms into an Apollo server, you will probably need to deploy it on your infrastructure.

Have a look at the [BEAURIS sample repository](https://gitlab.com/beaur1s/sample/-/tree/main/base_services/apollo) to learn how to set up Apollo on your infrastructure.

BEAURIS will use the REST API with [python-apollo](https://python-apollo.readthedocs.io/en/latest/) to interact with it.

### CI/CD settings

#### Secured environment variables

You need to set up a few environment variables for BEAURIS, and you _must_ define them only in the GitLab CI/CD settings (for security reasons).

Go to `Settings` > `CI/CD` > `Variables` and add them one by one (check `Mask variable` when possible):

```yaml
ANSIBLE_SSH_KEY: xxxxxxxxxxxxxxxxxxxx  # A private ssh key to connect to the Swarm controller, stored as variable (not file). The public key must be in the authorized_keys of the user on the swarm controller

GALAXY_URL: https://usegalaxy.*/  # URL to the Galaxy server that will execute jobs
GALAXY_API_KEY: xxxxxxxxxxxxxxxxxxxx  # API key to connect to the Galaxy server that will execute jobs

APOLLO_PASS_PROD: xxxxxxxxxxxxxxxxxxx  # Password of the production Apollo server (if using Apollo)
APOLLO_PASS_STAGING: xxxxxxxxxxxxxxxxxxx  # Password of the staging Apollo server (if using Apollo)

GITLAB_BOT_TOKEN: xxxxxxxxxxxxxxxxxx  # A project/user access token to the GitLab project ("api" scope, "reporter" role), used for posting comments from CI
GITLAB_RW_TOKEN: xxxxxxxxxxxxxxxxxxx  # A project/user access token to the GitLab project ("api" and "write_repository" scopes, "maintainer" role), used for committing/pushing lock files to the master branch
```
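To catch a missing variable early, a CI job could run a small pre-flight check. This hypothetical Python sketch reports which of the variables listed above are unset or empty, treating the Apollo passwords as optional as noted in their comments. It prints variable names only, never their values. The function names are assumptions, not part of BEAURIS.

```python
import os
from collections.abc import Mapping

REQUIRED = ["ANSIBLE_SSH_KEY", "GALAXY_URL", "GALAXY_API_KEY",
            "GITLAB_BOT_TOKEN", "GITLAB_RW_TOKEN"]
# Only needed when loading organisms into Apollo.
APOLLO = ["APOLLO_PASS_PROD", "APOLLO_PASS_STAGING"]


def missing_variables(env: Mapping[str, str], use_apollo: bool) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    names = REQUIRED + (APOLLO if use_apollo else [])
    return [name for name in names if not env.get(name)]


def check_ci_environment(use_apollo: bool) -> None:
    """Abort the job if a variable is missing (names only, no values)."""
    missing = missing_variables(os.environ, use_apollo)
    if missing:
        raise SystemExit("Missing CI/CD variables: " + ", ".join(missing))
```

For example, a job could call `check_ci_environment(use_apollo=True)` before doing anything else.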

As of March 2024, project access tokens are not available for projects on the GitLab.com Free tier; use a user (personal) access token instead.

#### Other variables in `.gitlab-ci.yml`

Several variables need to be defined at the beginning of `.gitlab-ci.yml`.

Have a look at the [BEAURIS sample repository](https://gitlab.com/beaur1s/sample/-/blob/main/sample.gitlab-ci.yml) for more details.

Optionally, you can redefine some DRMAA variables if the [default values](https://github.com/genouest/docker_slurm_exec/blob/master/Dockerfile#L20) don't work for you:

```yaml
SLURMGID: '992'
SLURMUID: '992'
MUNGEGID: '991'
MUNGEUID: '991'
DRMAA_LIBRARY_PATH: '/etc/slurm/drmaa/lib/libdrmaa.so.1'
```

Optionally, you can also set supplementary user groups that may be needed to access the raw data:

```yaml
OTHER_GID: 130844 # Comma-separated list of supplementary GIDs the user is also a member of
OTHER_RUN_GROUP: 'foo_project' # Comma-separated list of supplementary group names the user is also a member of. The order must match the order used for OTHER_GID.
```
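Because the two lists are matched by position, a mismatch is easy to introduce. The sketch below pairs `OTHER_GID` with `OTHER_RUN_GROUP` and fails if their lengths differ. The function name is hypothetical; it also accepts an integer, since YAML parses a single GID such as `130844` as a number.

```python
from typing import Union


def parse_supplementary_groups(other_gid: Union[str, int],
                               other_run_group: str) -> list[tuple[int, str]]:
    """Pair each supplementary GID with its group name, by position."""
    gids = [g.strip() for g in str(other_gid).split(",") if g.strip()]
    groups = [n.strip() for n in other_run_group.split(",") if n.strip()]
    if len(gids) != len(groups):
        raise ValueError(
            f"OTHER_GID has {len(gids)} entries but "
            f"OTHER_RUN_GROUP has {len(groups)}"
        )
    return [(int(gid), name) for gid, name in zip(gids, groups)]
```

For example, `parse_supplementary_groups(130844, "foo_project")` returns `[(130844, "foo_project")]`.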

### Configuration: `beauris.yml`

BEAURIS needs to be configured in each site's `beauris.yml` file.

Although there is no detailed documentation for this file yet, have a look at the [example](https://gitlab.com/beaur1s/sample/-/blob/main/beauris.yml) to get a good idea of how to use it.

### Data cleanup

The [sample `.gitlab-ci.yml`](https://gitlab.com/beaur1s/sample/-/blob/main/sample.gitlab-ci.yml) contains a scheduled task named `clean_workdir` that cleans up temporary data from the BEAURIS work dir.

By default, it deletes data from merge requests that were merged at least 30 days ago (this delay is configurable).
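The retention rule can be sketched as follows. This is not the code of the `clean_workdir` task: `eligible_for_cleanup` is hypothetical, and treating MRs that were never merged as ineligible is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def eligible_for_cleanup(merged_at: Optional[datetime],
                         retention_days: int = 30,
                         now: Optional[datetime] = None) -> bool:
    """True if an MR was merged at least retention_days ago."""
    if merged_at is None:
        return False  # assumption: open or unmerged MRs are kept
    now = now or datetime.now(timezone.utc)
    return now - merged_at >= timedelta(days=retention_days)
```

Passing an explicit `now` makes the rule easy to test; in a scheduled job it would default to the current UTC time.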

You should set up a scheduled pipeline in your GitLab repo to run this task regularly.

Go to `Build` > `Pipeline schedules`. Click on `New schedule`.

- Description: `auto-clean`
- Interval Pattern: `Custom`, `0 7 1 * *` (07:00 on the 1st day of every month)
- Select target branch or tag: `main` (or `master` for older repositories)
- Activated: checked

Click on `Create pipeline schedule`. Every month the scheduled task will be executed as a normal GitLab pipeline.