AKS Thanos mTLS setup – tls: failed to parse private key

I’m working on Thanos-based monitoring for multiple Prometheus clusters in Azure.

In general, I’ve set it up this way: a management cluster hosts the Thanos server, and multiple Prometheus clusters sit in a separate Azure resource group. Thanos communicates with Prometheus through the Thanos sidecar.

I tried to use Helm as much as possible. For the Thanos sidecar, I used the Prometheus Operator to configure it. For Thanos in the management cluster, I used the official Thanos chart.

The sidecar is exposed to the outside world through an ingress, and it uses mutual TLS to communicate with the Thanos server. I used OpenSSL to create a CA and a certificate for the sub-domain of my Prometheus Thanos sidecar. After setting up the A record, the URL is publicly reachable. However, because of mTLS, a browser (which sends no client certificate) gets the following error:

400 Bad Request
No required SSL certificate was sent
nginx/1.19.1
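
A minimal sketch of the OpenSSL step described above, not the exact commands used here. The file names, the subject names and the domain thanos-sidecar.example.com are placeholders. The SAN entry is an assumption on my side, added because many TLS clients match the host name against the SAN rather than the CN.

```shell
# Placeholder domain; replace with the real sub-domain of the sidecar ingress.
DOMAIN="thanos-sidecar.example.com"

# 1. CA key and self-signed CA certificate.
#    genrsa writes the key unencrypted unless a cipher flag such as -aes256 is given.
openssl genrsa -out ca.key 4096
openssl req -x509 -new -key ca.key -sha256 -days 365 \
  -subj "/CN=example-thanos-ca" -out ca.crt

# 2. Key and certificate signing request for the sidecar sub-domain.
openssl genrsa -out sidecar.key 2048
openssl req -new -key sidecar.key -subj "/CN=${DOMAIN}" -out sidecar.csr

# 3. Extension file carrying the subjectAltName for the certificate.
printf 'subjectAltName=DNS:%s\n' "${DOMAIN}" > sidecar.ext

# 4. Sign the CSR with the CA.
openssl x509 -req -in sidecar.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -sha256 -days 365 -extfile sidecar.ext -out sidecar.crt

# 5. Check that the certificate chains back to the CA.
openssl verify -CAfile ca.crt sidecar.crt
```

The same CA can then sign a client certificate for the Thanos side, which is what the ingress asks for when it answers "No required SSL certificate was sent".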

The next step is to configure the certificates on the Thanos server side. As above, I used the same certs to configure the client and server certificates. However, when I finished installing the Thanos Helm chart, I found the Thanos querier in a CrashLoopBackOff state. Checking the logs, I found the following:

kubectl logs thanos-querier-9f5c7d485-rhlkq -n thanos
level=info ts=2020-12-18T02:45:43.2992997Z caller=main.go:138 msg="Tracing will be disabled"
level=info ts=2020-12-18T02:45:43.3022682Z caller=client.go:54 msg="enabling client to server TLS"
level=info ts=2020-12-18T02:45:43.3025245Z caller=options.go:76 msg="TLS client using provided certificate pool"
level=info ts=2020-12-18T02:45:43.3028162Z caller=options.go:104 msg="TLS client authentication enabled"
level=info ts=2020-12-18T02:45:43.3047268Z caller=options.go:27 protocol=gRPC msg="enabling server side TLS"
level=error ts=2020-12-18T02:45:43.3050427Z caller=main.go:171 err="tls: failed to parse private key\nserver credentials\ngithub.com/thanos-io/thanos/pkg/tls.NewServerConfig\n\t/app/pkg/tls/options.go:39\nmain.runQuery\n\t/app/cmd/thanos/query.go:481\nmain.registerQuery.func1\n\t/app/cmd/thanos/query.go:159\nmain.main\n\t/app/cmd/thanos/main.go:169\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\nsetup gRPC server\nmain.runQuery\n\t/app/cmd/thanos/query.go:483\nmain.registerQuery.func1\n\t/app/cmd/thanos/query.go:159\nmain.main\n\t/app/cmd/thanos/main.go:169\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\npreparing query command failed\nmain.main\n\t/app/cmd/thanos/main.go:171\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"

The actual error is tls: failed to parse private key. After some investigation, I found the cause: my PEM file was created with a password, and Thanos does not have the capability to decrypt a password-protected key at the moment.
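
One way to check whether a key file is password-protected is to look for the encryption marker in the PEM text. This is a rough sketch: key_is_encrypted is a hypothetical helper, and the demo key and its passphrase are throwaway values created only so the snippet runs on its own.

```shell
# Hypothetical helper: succeeds when the PEM file carries an encryption marker.
# The legacy format has a "Proc-Type: 4,ENCRYPTED" header and the PKCS#8
# format starts with "BEGIN ENCRYPTED PRIVATE KEY"; both contain ENCRYPTED.
key_is_encrypted() {
  grep -q 'ENCRYPTED' "$1"
}

# Throwaway demo key protected with an example passphrase.
openssl genrsa -aes256 -passout pass:example-passphrase \
  -out demo-encrypted.key 2048

if key_is_encrypted demo-encrypted.key; then
  echo "demo-encrypted.key is passphrase-protected"
else
  echo "demo-encrypted.key is not encrypted"
fi
```

Running the same check against the key mounted into the querier would have pointed at the problem before reading the stack trace.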

So the quick fix is to re-create the PEM file without a password.
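
If the encrypted key and its password are still at hand, an alternative to regenerating everything is to write out an unencrypted copy of the same key. A minimal sketch, assuming an RSA key; the file names and the passphrase are placeholders, and the first command only creates a stand-in for the original encrypted key.

```shell
# Demo only: an encrypted key standing in for the original one.
openssl genrsa -aes256 -passout pass:example-passphrase \
  -out tls-encrypted.key 2048

# Write an unencrypted copy. openssl rsa does not encrypt its output
# unless a cipher flag (for example -aes256) is passed.
openssl rsa -in tls-encrypted.key -passin pass:example-passphrase -out tls.key

# The private key is now readable by anyone with file access, so restrict it.
chmod 600 tls.key

# Sanity check: the new file should carry no encryption marker.
if grep -q 'ENCRYPTED' tls.key; then
  echo "tls.key is still encrypted" >&2
else
  echo "tls.key is unencrypted"
fi
```

The certificate itself does not change, since it only contains the public key. Only the key file in the secret used by the Thanos chart needs to be replaced.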