Fusion with AWS EKS and S3 object storage
Fusion streamlines the deployment of Nextflow pipeline in a Kubernetes cluster, because it replaces the need to configure and maintain a shared file system in your cluster.
Kubernetes config
You will need to create a namespace and a service account in your Kubernetes cluster to run the job submitted by the pipeline execution.
The following manifest shows the bare minimum configuration.
---
apiVersion: v1
kind: Namespace
metadata:
name: fusion-demo
---
apiVersion: v1
kind: ServiceAccount
metadata:
namespace: fusion-demo
name: fusion-sa
annotations:
eks.amazonaws.com/role-arn: "arn:aws:iam::<YOUR ACCOUNT ID>:role/fusion-demo-role"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: fusion-demo
name: fusion-role
rules:
- apiGroups: [""]
resources: ["pods", "pods/status", "pods/log", "pods/exec"]
verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
namespace: fusion-demo
name: fusion-rolebind
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: fusion-role
subjects:
- kind: ServiceAccount
name: fusion-sa
The AWS IAM role should provide read-write permission to the S3 bucket used as the pipeline work directory. For example:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": ["arn:aws:s3:::<YOUR-BUCKET>"]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:PutObjectTagging",
"s3:DeleteObject"
],
"Resource": ["arn:aws:s3:::<YOUR-BUCKET>/*"],
"Effect": "Allow"
}
]
}
In the above policy replace <YOUR-BUCKET>
with a bucket name of your choice.
Also, make sure that the role defines a trust relationship similar to this:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<YOUR ACCOUNT ID>:oidc-provider/oidc.eks.<YOUR REGION>.amazonaws.com/id/<YOUR CLUSTER ID>"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.eu-west-2.amazonaws.com/id/<YOUR CLUSTER ID>:aud": "sts.amazonaws.com",
"oidc.eks.eu-west-2.amazonaws.com/id/<YOUR CLUSTER ID>:sub": "system:serviceaccount:fusion-demo:fusion-sa"
}
}
}
]
}
Nextflow configuration
The minimal Nextflow configuration looks like the following:
wave.enabled = true
fusion.enabled = true
process.executor = 'k8s'
k8s.context = '<YOUR K8S CLUSTER CONTEXT>'
k8s.namespace = 'fusion-demo'
k8s.serviceAccount = 'fusion-sa'
In the above snippet replace YOUR K8S CLUSTER CONTEXT
with Kubernetes context in your Kubernetes config, and save it
to a file named nextflow.config
into the pipeline launching directory.
Then launch the pipeline execution with the usual run command:
nextflow run <YOUR PIPELINE SCRIPT> -w s3://<YOUR-BUCKET>/work
Replacing YOUR PIPELINE SCRIPT
with the URI of your pipeline Git repository
and YOUR-BUCKET
with a S3 bucket of your choice.
To achieve best performance make sure to setup a SSD volumes as temporary directory. See the section SSD storage for details.