📖 Estimated reading time: 14 min
In the scenario analyzed, the servers in the Red Hat OpenShift cluster consume disk volumes via the Fibre Channel (FC) protocol. Red Hat OpenShift as a platform uses the HPE CSI Operator to create Persistent Volume Claims (PVCs) of the Block type via FC (RWO), and of the File type via NFS (RWX).
I am not affiliated with HP, nor do I have any particular interests in the matter. Nor am I an expert on HP products. My opinion on the aforementioned subject is neutral, and in this material I am merely presenting a technical view of the facts.
In no way is this article intended to manipulate opinions, benefit or denigrate products, or affect decisions regarding the purchase of inputs.
The following software versions were used in this laboratory:
* Red Hat OpenShift 4.14.30
* HPE CSI Operator 2.4.2
* HPE Primera Operating System 4.5.21
HP’s documentation can be found here. However, I warn you that it has inconsistencies and gaps, which, depending on the implementation to be carried out, could cause problems. Sorry if this makes anyone uncomfortable, but it’s the truth to date.
To learn more about the HPE CSI Operator, visit its page.
By the way, the project's Git repository can be found here. Be sure to read the project's issues, as they contain important tips.
The documentation problems are several; what bothered me most was the lack of clear explanations about certain items, as well as the outright errors. I'll cite just one example of an error, so that this part of the material doesn't get too boring.
In the official HP documentation, it says to use TCP port 8080:
However, in another part of the same documentation, it says to use TCP 443:
Curious, isn’t it?
I understand that people involved in a product don’t always have time to update documents or create public KBs (knowledge bases). However, bad documentation is an issue that really gets in the way of anyone working as an architect, consultant or sysadmin. Especially when available time is a critical factor.
The divergence of information mentioned above was easy to resolve: I simply checked how the primera3par-csp-svc service was deployed, which revealed the TCP port actually in use by the Pod.
# oc get services -n hpe-storage
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hpe-csi-operator-controller-manager-metrics-service ClusterIP XX.YY.ZZ.KK <none> 8443/TCP 10d
primera3par-csp-svc ClusterIP XX.YY.ZZ.KK <none> 8080/TCP 10d
But not everything is that simple…
Before moving on to solutions to common problems, let's take a look at the operator. In the version used here, installing the operator prepares the environment and loads a Helm chart, which in turn deploys all the necessary resources.
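Incidentally, a quick way to confirm which version of the operator is installed is to list the ClusterServiceVersions in the namespace; this is standard OLM behavior, nothing HPE-specific:
$ oc get csv -n hpe-storage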
# oc get all -n hpe-storage
NAME READY STATUS RESTARTS AGE
pod/hpe-csi-controller-746d8f6748-6j68d 9/9 Running 3 (4d23h ago) 4d23h
pod/hpe-csi-node-677tk 2/2 Running 0 4d23h
pod/hpe-csi-node-bwklp 2/2 Running 0 4d23h
pod/hpe-csi-node-f2fdb 2/2 Running 9 (36h ago) 4d23h
pod/hpe-csi-node-f7n8m 2/2 Running 0 4d23h
pod/hpe-csi-node-kbc6d 2/2 Running 0 4d23h
pod/hpe-csi-node-l6nx5 2/2 Running 0 4d23h
pod/hpe-csi-node-m6k78 2/2 Running 0 4d23h
pod/hpe-csi-node-n75z2 2/2 Running 0 4d23h
pod/hpe-csi-node-qv8df 2/2 Running 0 4d23h
pod/hpe-csi-node-r6msx 2/2 Running 0 4d23h
pod/hpe-csi-node-stj5w 2/2 Running 0 4d23h
pod/hpe-csi-node-wm5fb 2/2 Running 0 4d23h
pod/hpe-csi-node-zzftk 2/2 Running 0 4d23h
pod/hpe-csi-operator-controller-manager-585f579bb9-mzwnd 2/2 Running 0 10d
pod/primera3par-csp-55d5db7dcf-s76jp 1/1 Running 0 10d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/hpe-csi-operator-controller-manager-metrics-service ClusterIP 172.XX.YY.ZZ <none> 8443/TCP 10d
service/primera3par-csp-svc ClusterIP 172.XX.YY.ZZ <none> 8080/TCP 10d
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/hpe-csi-node 13 13 13 13 13 <none> 10d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hpe-csi-controller 1/1 1 1 10d
deployment.apps/hpe-csi-operator-controller-manager 1/1 1 1 10d
deployment.apps/primera3par-csp 1/1 1 1 10d
NAME DESIRED CURRENT READY AGE
replicaset.apps/hpe-csi-controller-746d8f6748 1 1 1 4d23h
replicaset.apps/hpe-csi-controller-76f8c46d98 0 0 0 10d
replicaset.apps/hpe-csi-operator-controller-manager-585f579bb9 1 1 1 10d
replicaset.apps/primera3par-csp-55d5db7dcf 1 1 1 10d
However, when the NFS service is activated later on, its resources are different and live in a separate namespace.
# oc get all -n hpe-nfs
NAME READY STATUS RESTARTS AGE
pod/hpe-nfs-6ccb97de-wwxlc 1/1 Running 0 5d2h
pod/hpe-nfs-cfa86445-7rdg9 1/1 Running 0 5d3h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/hpe-nfs-6ccb97de-7100-439e ClusterIP XX.YY.ZZ.KK <none> 49000/TCP,2049/TCP,2049/UDP,32803/TCP,32803/UDP,20048/TCP,20048/UDP,111/TCP,111/UDP,662/TCP,662/UDP,875/TCP,875/UDP 5d2h
service/hpe-nfs-cfa86445-46ee-42cf ClusterIP XX.YY.ZZ.KK <none> 49000/TCP,2049/TCP,2049/UDP,32803/TCP,32803/UDP,20048/TCP,20048/UDP,111/TCP,111/UDP,662/TCP,662/UDP,875/TCP,875/UDP 5d3h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hpe-nfs-6ccb97de-7100-439e-8d05-355724b2b844 1/1 1 1 5d2h
deployment.apps/hpe-nfs-cfa86445-46ee-42cf-bf4b-74a3208f3fb8 1/1 1 1 5d3h
NAME DESIRED CURRENT READY AGE
replicaset.apps/hpe-nfs-6ccb97de-7100-439e-8d05-355724b2b844-6cc696b949 1 1 1 5d2h
replicaset.apps/hpe-nfs-cfa86445-46ee-42cf-bf4b-74a3208f3fb8-ccc694958 1 1 1 5d3h
It is important to know which components were created during the installation of the operator, as this is fundamental for analyzing problems.
Now for the part you’ve probably been waiting for. Here I’ll list all the errors I encountered until I was finally able to successfully use HP Primera Storage.
StorageClass:
failed to provision volume with StorageClass "hpe-standard": rpc error: code = Aborted desc = There is already an operation pending for the specified id CreateVolume:pvc-ec2b
Storage Authentication:
level=error msg="Allowing panic to escape" file="csp_manager.go:50" level=info msg="[ REQUEST-ID 100177 ] -- <<<<< createArraySession" file="panic.go:884" http: panic serving X.X.X.X:51022: &{[] 1000 unable to connect to X.X.X.X: dial tcp X.X.X.X:22: connect: connection timed out
Service Account:
level=error msg="Failed to delete node yyyyyyyyyy - hpenodeinfos.storage.hpe.com \"yyyyyyyyyy\" is forbidden: User \"system:serviceaccount:hpe-storage:hpe-csi-node-sa\" cannot delete resource \"hpenodeinfos\" in API group \"storage.hpe.com\" at the cluster scope" file="flavor.go:321"
level=error msg="Error obtaining node info by uuid xceg7621- Get \"https://XX.XX.XX.XX:443/apis/storage.hpe.com/v1/hpenodeinfos\": dial tcp XX.XX.XX.XX:443: i/o timeout\n" file="flavor.go:193"
First of all, it's important to mention something here. Unlike other enterprise NFS solutions I've used, where the storage itself provides a centralized NFS service, that is not how it works here.
NFS volumes provided by HPE CSI are delivered by a standalone Pod running an NFS server container, created specifically to serve the requested RWX volume.
In other words, each RWX volume created will have its own Pod running an NFS service. This Pod, in turn, will have an RWO volume for its exclusive use.
Depending on the workload, this may not be the most scalable and redundant solution possible, but that is how the NFS feature of the HPE CSI driver works with HP Primera.
🟢 TIP: To expand an HPE CSI NFS volume, you will need to expand the RWO volume for the Pod that is providing the NFS service for the namespace, not the NFS volume directly.
🔴 NOTE: At the time of writing, there is a limit of 32 NFS volumes per node of the OpenShift/Kubernetes cluster. It’s important to consider this information, as this detail can have an impact when multiple applications are requesting RWX volumes.
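To make the expansion tip above more concrete, here is the kind of sequence you could use, assuming volume expansion is allowed for the class backing that claim. The claim name and size below are only illustrative; first confirm which RWO claim belongs to the NFS server Pod of your namespace before patching anything.
$ oc get pvc -n hpe-nfs
$ oc -n hpe-nfs patch pvc hpe-nfs-6ccb97de-7100-439e -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'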
Now let’s move on to the common errors regarding NFS volumes:
Internal Server Error:
connection.go:252] GRPC error: rpc error: code = Internal desc = Failed to add ACL to volume pvc-fd1xxx for node &{ yyyyyyyyyy fcef9324-dbcb-186f-709a-ca1834942843 [0xc0005a3840] [0xc0005a3880 0xc0005a3890 0xc0005a38a0 0xc0005a38b0] [0xc0005a38c0 0xc0005a38d0] } via CSP, err: Request failed with status code 500 and errors Error code (Internal Server Error) and message (VLUN creation error: failed to find any ready target port on array XX.XX.XX.XX)
csi_handler.go:234] Error processing "csi-12f7d8c840b4759988": failed to attach: rpc error: code = Internal desc = Failed to add ACL to volume pvc-fd1xxx for node &{ yyyyyyyyyy fd1xxx-2843 [0xc0005a3840] [0xc0005a3880 0xc0005a3890 0xc0005a38a0 0xc0005a38b0] [0xc0005a38c0 0xc0005a38d0] } via CSP, err: Request failed with status code 500 and errors Error code (Internal Server Error) and message (VLUN creation error: failed to find any ready target port on array XX.XX.XX.XX)
FailedAttachVolume:
Warning FailedAttachVolume pod/hpe-nfs-e13575db-zdv4v AttachVolume.Attach failed for volume "pvc-8ebf57-c9f6" : rpc error: code = Internal desc = Failed to add ACL to volume pvc-8ebf57-c9f6 for node &{ yyyyyyyyyy 0bd8957e-9323 [0xc000f26a00] [0xc000f26a40 0xc000f26a50 0xc000f26a60 0xc000f26a70] [0xc000f26a80 0xc000f26a90] } via CSP, err: Request failed with status code 500 and errors Error code (Internal Server Error) and message (VLUN creation error: failed to find any ready target port on array XX.XX.XX.XX)
FailedMount:
Warning FailedMount pod/hpe-nfs-e13575db-zzdln Unable to attach or mount volumes: unmounted volumes=[hpe-nfs-e13575d-d21042f], unattached volumes=[hpe-nfs-e13575d-d21042f], failed to process volumes=[]: timed out waiting for the condition
ProvisioningFailed:
Warning ProvisioningFailed persistentvolumeclaim/my-rwx-nfs failed to provision volume with StorageClass "hpe-nfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
ProvisionStorage:
Warning ProvisionStorage persistentvolumeclaim/my-rwx-nfs gave up waiting for deployment hpe-nfs-e13575d-3d21042f to be available
Warning ProvisionStorage persistentvolumeclaim/my-rwx-nfs rollback of an existing pvc hpe-nfs-e13575d-3d21042f is under progress
Permission denied:
NFS volume mounts, but access denied when trying to write files.
sh-4.4$ echo 1 > /my-first-nfs/ok
sh: /my-first-nfs/ok: Permission denied
😓 As you can see, there were many problems. Does the storage really work? Is it broken?
The first step in solving any problem is to obtain as much information as possible from the available logs.
HP suggests using commands like the ones below to obtain logs.
$ kubectl logs daemonset.apps/hpe-csi-node hpe-csi-driver -n hpe-storage
$ kubectl logs deployment.apps/hpe-csi-controller hpe-csi-driver -n hpe-storage
Although these suggestions are helpful, you will need even more information.
The script below is nothing fancy, nor anything to be proud of. It's just a few commands stacked together to collect, in one go, all the information needed to analyze the problems. Use it as a base and make whatever changes are pertinent to your environment.
#!/bin/bash
echo "--------------------"
echo "| HPE-STORAGE LOGS |"
echo "--------------------"
oc logs -n hpe-storage daemonset.apps/hpe-csi-node --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage deployment.apps/hpe-csi-controller --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage deployment.apps/hpe-csi-operator-controller-manager --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage deployment.apps/nimble-csp --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage deployment.apps/primera3par-csp --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage service/alletra6000-csp-svc --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage service/alletra9000-csp-svc --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage service/alletrastoragemp-csp-svc --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage service/hpe-csi-operator-controller-manager-metrics-service --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage service/nimble-csp-svc --all-containers=true
echo; echo; echo; echo
oc logs -n hpe-storage service/primera3par-csp-svc --all-containers=true
echo; echo; echo; echo
PODs=$(oc get pods -n hpe-storage -o custom-columns=POD:.metadata.name --no-headers)
for X in $PODs
do
oc logs -n hpe-storage pod/$X --all-containers=true
done
echo; echo; echo; echo
RSs=$(oc get replicaset -n hpe-storage -o custom-columns=POD:.metadata.name --no-headers)
for Y in $RSs
do
oc logs -n hpe-storage replicaset.apps/$Y
done
echo; echo; echo; echo
echo "CSI LOGS FROM NODE:"
ssh -i /root/xxxxx/id_rsa_xxxxx core@yyyyy "cat /var/log/hpe-csi-node.log"
echo; echo; echo; echo
echo "----------------"
echo "| HPE-NFS LOGS |"
echo "----------------"
oc get pv | grep hpe-nfs
echo
oc get pvc -n hpe-nfs
echo
oc get pvc -ntesting
echo
PODs=$(oc get pods -n hpe-nfs -o custom-columns=POD:.metadata.name --no-headers)
for X in $PODs
do
oc logs -n hpe-nfs pod/$X --all-containers=true
done
echo; echo; echo; echo
oc get events -n hpe-nfs
echo; echo; echo; echo
oc -n testing get pvc my-rwx-nfs -o yaml
echo; echo; echo; echo
oc get events -n testing
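For reference, this is how such a script would typically be run, redirecting everything to a single file for later analysis (the script name is just an example):
$ chmod +x hpe-collect-logs.sh
$ ./hpe-collect-logs.sh > hpe-logs.txt 2>&1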
By collecting all these logs at once, it became a little clearer how the events correlated, and what processes were taking place in the persistent volume request operations.
I’m no expert on HP products, so maybe there’s a simpler way to debug. This was just the way I found to get the data I needed.
In this section, I will present the YAML templates that I used for a successful integration, pointing out the important items that made it possible to actually use the storage volumes.
At times I had to read the driver's Go code, either to understand whether a specific item was simply poorly documented or to find out how to parameterize certain settings properly. For this reason, my templates may differ a little from those found in the usual channels.
I will also show you some commands that can be used to track down the privileges that a Service Account (SA) needs, and how to adjust the Security Context Constraints (SCC) for this SA.
⚠️ My intention here is not to say that the way I succeeded is the right way or anything along those lines. What I'm presenting here is a solution that worked for my specific case. My recommendation is to always seek official support for any product before implementing solutions found on the internet. In my case, HP's support SLA didn't meet the project's time requirements, so I had to get my hands dirty to find solutions.
The backend configuration refers to how OpenShift will communicate with the HPE Primera storage.
An example of the complete template can be seen below.
# cat hpe-backend-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: hpe-backend
  namespace: hpe-storage
stringData:
  serviceName: primera3par-csp-svc
  servicePort: "8080"
  backend: YOUR_STORAGE_IP
  username: YOUR_STORAGE_LOGIN
  password: YOUR_STORAGE_PASSWORD
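After filling in the placeholders, the Secret can be applied and checked with ordinary commands; nothing here is HPE-specific:
$ oc apply -f hpe-backend-secret.yaml
$ oc get secret hpe-backend -n hpe-storage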
In the driver configuration, leave only the storage model in use enabled (in this case, Primera).
In my scenario, it was also necessary to declare backendType: primera.
An example of the complete template can be seen below.
# cat hpe-csi-driver.yaml
apiVersion: storage.hpe.com/v1
kind: HPECSIDriver
metadata:
  name: hpecsidriver
  namespace: hpe-storage
spec:
  csp:
    affinity: {}
    labels: {}
    nodeSelector: {}
    tolerations: []
  node:
    affinity: {}
    labels: {}
    nodeSelector: {}
    tolerations: []
  disable:
    alletra6000: true       <--- SUPPORT DISABLED
    alletra9000: true       <--- SUPPORT DISABLED
    alletraStorageMP: true  <--- SUPPORT DISABLED
    nimble: true            <--- SUPPORT DISABLED
    primera: false
  iscsi:
    chapPassword: ''
    chapUser: ''
  controller:
    affinity: {}
    labels: {}
    nodeSelector: {}
    tolerations: []
  disableNodeConfiguration: false
  disableNodeConformance: false
  disableNodeGetVolumeStats: false
  imagePullPolicy: IfNotPresent
  kubeletRootDir: /var/lib/kubelet/
  logLevel: warn
  registry: quay.io
  backendType: primera
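A simple way to confirm the change takes effect is to apply the CR and keep an eye on the Pods in the hpe-storage namespace, which the operator should reconcile:
$ oc apply -f hpe-csi-driver.yaml
$ oc get pods -n hpe-storage -w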
The following parameters had to be adjusted:
- cpg: "SSD_r6"
- hostSeesVLUN: "true"
- reclaimPolicy: Delete
- allowVolumeExpansion: true
- allowOverrides: description,accessProtocol
The CPG to use can be consulted directly on the Primera storage array.
$ primera cli% showvv
-Rsvd(MiB)- --(MiB)--
Id Name Prov Compr Dedup Type CopyOf BsId Rd -Detailed_State- Snp Usr VSize
2 .mgmtdata full NA NA base --- 2 RW normal 0 524288 524288
3 .shared.SSD_r6_0 dds NA NA base --- 3 RW normal 0 1024 67108864
(...)
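If your user has access to the Primera CLI, the list of CPGs can also be obtained directly, instead of inferring it from the volume names (assuming the showcpg command is available on your Primera OS version):
$ primera cli% showcpg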
An example of the complete template can be seen below.
# cat hpe-sc-fc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: hpe-standard
provisioner: csi.hpe.com
parameters:
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/controller-expand-secret-name: hpe-backend
  csi.storage.k8s.io/controller-expand-secret-namespace: hpe-storage
  csi.storage.k8s.io/controller-publish-secret-name: hpe-backend
  csi.storage.k8s.io/controller-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-publish-secret-name: hpe-backend
  csi.storage.k8s.io/node-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-stage-secret-name: hpe-backend
  csi.storage.k8s.io/node-stage-secret-namespace: hpe-storage
  csi.storage.k8s.io/provisioner-secret-name: hpe-backend
  csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
  description: "Volume created by the HPE CSI Driver for Kubernetes"
  accessProtocol: fc
  cpg: "SSD_r6"
  hostSeesVLUN: "true"
  allowOverrides: description,accessProtocol
reclaimPolicy: Delete
allowVolumeExpansion: true
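For reference, a minimal block (RWO) claim against this class could look like the example below. The name and size are only illustrative, and the testing namespace is the one used elsewhere in this article:
# cat my-rwo-block.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-rwo-block
  namespace: testing
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hpe-standard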
The NFS StorageClass was one of the most problematic items: as soon as one problem was solved, another appeared. In the end, it was possible to arrive at a workable combination of configuration parameters.
HP recommends using EXT4 formatting for NFS volumes.
With all due respect, the following parameters had to be adjusted:
- csi.storage.k8s.io/fstype: ext4
- cpg: "SSD_r6"
- accessProtocol: fc
- hostSeesVLUN: "true"
- nfsResources: "true"
- fsMode: "775"
- allowOverrides: description,nfsNamespace
- allowMutations: description,nfsNamespace
- reclaimPolicy: Delete
- volumeBindingMode: Immediate
- allowVolumeExpansion: false
An example of the complete template can be seen below.
# cat hpe-sc-nfs.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hpe-nfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "no"
provisioner: csi.hpe.com
parameters:
  csi.storage.k8s.io/controller-expand-secret-name: hpe-backend
  csi.storage.k8s.io/controller-expand-secret-namespace: hpe-storage
  csi.storage.k8s.io/controller-publish-secret-name: hpe-backend
  csi.storage.k8s.io/controller-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-publish-secret-name: hpe-backend
  csi.storage.k8s.io/node-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-stage-secret-name: hpe-backend
  csi.storage.k8s.io/node-stage-secret-namespace: hpe-storage
  csi.storage.k8s.io/provisioner-secret-name: hpe-backend
  csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
  description: "NFS backend volume created by the HPE CSI Driver for Kubernetes"
  csi.storage.k8s.io/fstype: ext4
  cpg: "SSD_r6"
  accessProtocol: fc
  hostSeesVLUN: "true"
  nfsResources: "true"
  fsMode: "775"
  allowOverrides: description,nfsNamespace
  allowMutations: description,nfsNamespace
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: false
If all goes well, NFS RWX volumes can be used for your applications.
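For illustration, a minimal RWX claim against this class could look like the example below. The my-rwx-nfs name matches the events shown earlier in the logs, while the size is only illustrative:
# cat my-rwx-nfs.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-rwx-nfs
  namespace: testing
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: hpe-nfs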
Typically, a deployment of a service also uses a service account.
⛔️ WARNING: If there are no errors like "is forbidden: User" in your logs, DO NOT EXECUTE THE STEPS BELOW.
What privileges are needed?
To find the privileges required for a particular deployment, use the following steps.
⚠️ Note: Ideally, you should create a customized policy that contains exactly the privileges you need in the components where you need them. If this is not possible, follow the examples below.
1) Use scc-subject-review to find the privilege level that the service account used in the deployment needs in order to perform its functions.
$ oc get deployment primera3par-csp -o yaml | oc adm policy scc-subject-review -f -
RESOURCE ALLOWED BY
Deployment/primera3par-csp privileged
2) Check which service account is being used in this deployment:
$ oc describe deployment primera3par-csp | grep Account
Service Account: hpe-csp-sa
3) Assign the necessary privilege to the service account.
$ oc adm policy add-scc-to-user privileged -z hpe-csp-sa
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "hpe-csp-sa"
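One detail worth remembering: SCCs are applied when Pods are admitted, so Pods that are already running do not automatically pick up the new permission. Restarting the affected deployment forces its Pods to be recreated under the updated policy:
$ oc -n hpe-storage rollout restart deployment/primera3par-csp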
If the errors still persist in the logs and the volumes cannot be used, continue investigating other deployments carried out by the HPE CSI operator.
Example:
$ oc get deployment hpe-csi-controller -o yaml | oc adm policy scc-subject-review -f -
RESOURCE ALLOWED BY
Deployment/hpe-csi-controller privileged
$ oc describe deployment hpe-csi-controller | grep Account
Service Account: hpe-csi-controller-sa
$ oc get deployment hpe-csi-operator-controller-manager -o yaml | oc adm policy scc-subject-review -f -
RESOURCE ALLOWED BY
Deployment/hpe-csi-operator-controller-manager anyuid
1) Following the same logic, use scc-subject-review to find the privilege level that the NFS service account needs.
$ oc -n hpe-nfs get deployment.apps/hpe-nfs-9e44f5e6-2c43-4863-95f6-298f30cb1149 -o yaml | oc adm policy scc-subject-review -f -
RESOURCE ALLOWED BY
Deployment/hpe-nfs-9e44f5e6-2c43-4863-95f6-298f30cb1149 privileged
2) Find the service account running the NFS service:
$ oc -n hpe-nfs get deployment.apps/hpe-nfs-9e44f5e6-2c43-4863-95f6-298f30cb1149 -o yaml | grep -i account
serviceAccount: hpe-csi-nfs-sa
serviceAccountName: hpe-csi-nfs-sa
3) Assign the necessary privilege to the service account:
$ oc adm policy add-scc-to-user privileged -z hpe-csi-nfs-sa
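One caveat: the -z shortcut resolves the service account in the current project, and the NFS service account shown above lives in the hpe-nfs namespace. Either switch to that project first, or pass the namespace explicitly:
$ oc adm policy add-scc-to-user privileged -z hpe-csi-nfs-sa -n hpe-nfs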
I hope these tips help anyone going through the same difficulties to solve their problems and avoid delays in their projects. The HP Primera storage is an interesting and flexible product, with a peculiar way of working in some functions. This may feel strange to more conservative sysadmins, but it's nothing that can't be learned and even enjoyed.