AKS Thanos mtls setup – tls: failed to parse private key

I’m working on monitoring multiple Prometheus clusters in Azure with Thanos.

The general setup is this: a management cluster hosts the Thanos server, and multiple Prometheus clusters sit in a separate Azure resource group. Thanos communicates with each Prometheus through the Thanos sidecar.

I tried to use Helm as much as possible. For the Thanos sidecar, I used Prometheus Operator to configure it. For Thanos in the management cluster, I used the official Thanos chart.

The sidecar is exposed to the outside world through an ingress, and it uses mutual TLS (mTLS) to communicate with the Thanos server. I used OpenSSL to create a CA and a certificate for the subdomain of my Prometheus Thanos sidecar. After setting up the A record, the URL is publicly reachable, but because of mTLS a browser gets the following error:

400 Bad Request
No required SSL certificate was sent
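
For reference, creating a CA and a certificate for the sidecar subdomain with OpenSSL can look roughly like the sketch below. These are not the exact commands used in this setup: the hostname thanos-sidecar.example.com, the CA name, the file names and the 365-day validity are all placeholders. The keys here are written without a passphrase (-nodes).

```shell
set -eu

# Work in a throwaway directory; all file names below are placeholders.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical subdomain for the sidecar ingress.
host="thanos-sidecar.example.com"

# 1. CA key and self-signed CA certificate (-nodes: no passphrase on the key).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout ca.key -out ca.crt -subj "/CN=example-ca"

# 2. Key and certificate signing request for the sidecar subdomain.
openssl req -newkey rsa:2048 -nodes \
  -keyout tls.key -out tls.csr -subj "/CN=$host"

# 3. Sign the request with the CA, adding the hostname as a subjectAltName.
printf 'subjectAltName=DNS:%s\n' "$host" > san.ext
openssl x509 -req -in tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -days 365 -extfile san.ext -out tls.crt

# 4. Confirm that the certificate chains back to the CA.
openssl verify -CAfile ca.crt tls.crt
```

The subjectAltName entry is included because current TLS clients generally match the hostname against it rather than against the CN.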

The next step is to configure the certificates on the Thanos server side. As before, I used the certs to configure the client and server certificates. However, after installing the Thanos Helm chart, I realised the Thanos querier was in a CrashLoopBackOff state. Checking the logs, I found the following:

kubectl logs thanos-querier-9f5c7d485-rhlkq -n thanos
level=info ts=2020-12-18T02:45:43.2992997Z caller=main.go:138 msg="Tracing will be disabled"
level=info ts=2020-12-18T02:45:43.3022682Z caller=client.go:54 msg="enabling client to server TLS"
level=info ts=2020-12-18T02:45:43.3025245Z caller=options.go:76 msg="TLS client using provided certificate pool"
level=info ts=2020-12-18T02:45:43.3028162Z caller=options.go:104 msg="TLS client authentication enabled"
level=info ts=2020-12-18T02:45:43.3047268Z caller=options.go:27 protocol=gRPC msg="enabling server side TLS"
level=error ts=2020-12-18T02:45:43.3050427Z caller=main.go:171 err="tls: failed to parse private key\nserver credentials\ngithub.com/thanos-io/thanos/pkg/tls.NewServerConfig\n\t/app/pkg/tls/options.go:39\nmain.runQuery\n\t/app/cmd/thanos/query.go:481\nmain.registerQuery.func1\n\t/app/cmd/thanos/query.go:159\nmain.main\n\t/app/cmd/thanos/main.go:169\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\nsetup gRPC server\nmain.runQuery\n\t/app/cmd/thanos/query.go:483\nmain.registerQuery.func1\n\t/app/cmd/thanos/query.go:159\nmain.main\n\t/app/cmd/thanos/main.go:169\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\npreparing query command failed\nmain.main\n\t/app/cmd/thanos/main.go:171\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"

The actual error is tls: failed to parse private key. After some investigation, I found the cause: my PEM file was created with a password, and Thanos cannot decrypt a password-protected key at the moment.

So the quick fix is to re-create the PEM file without a password.
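
If the original passphrase is still known, the key does not have to be regenerated from scratch: OpenSSL can write an unencrypted copy of it. The sketch below first creates a password-protected key to reproduce the situation, then checks for the ENCRYPTED marker in the PEM header and strips the passphrase. The passphrase and the file names encrypted.key and tls.key are placeholders.

```shell
set -eu

workdir="$(mktemp -d)"
pass="example-passphrase"   # placeholder passphrase, for illustration only

# Reproduce the problem: an RSA key protected with a passphrase.
openssl genrsa -aes256 -passout "pass:$pass" -out "$workdir/encrypted.key" 2048

# A password-protected PEM key carries an ENCRYPTED marker in its header.
if grep -q "ENCRYPTED" "$workdir/encrypted.key"; then
  echo "encrypted.key is password-protected"
fi

# Write an unencrypted copy of the same key.
openssl rsa -in "$workdir/encrypted.key" -passin "pass:$pass" \
  -out "$workdir/tls.key"

# The new file should no longer contain the marker.
if grep -q "ENCRYPTED" "$workdir/tls.key"; then
  echo "tls.key is still password-protected"
else
  echo "tls.key has no passphrase"
fi
```

Because the key itself is unchanged, the existing certificate still matches it; only the key file mounted into the Thanos pods needs to be replaced with the unencrypted copy.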