Using a hardware security module to secure the CyberArk Vault's Server Key

The CyberArk Vault allows the Server key to be stored in a hardware security module (HSM). The Server key is used as a key-encryption key, so an HSM is an appropriate home for it: HSMs provide the highest level of protection available for the Server key.

HSM integration with CyberArk is well documented. CyberArk provides information on loading an existing Server key into an HSM as well as generating a new Server key directly in the HSM. As with anything, though, hands-on experience goes a long way toward demystifying a concept.

In a lab environment, we will generate a new Server key directly on the HSM then perform a re-key of the Vault using that key. The SecurityServer simulator from Utimaco will be used as our HSM.

What are hardware security modules and what do they offer?

Hardware security modules provide a secure location to store cryptographic information (typically keys) as well as a secure place to perform cryptographic operations (such as using the keys to encrypt or decrypt objects). HSMs are commonly physical hardware devices with tamper-resistant protections. They can take the form of a network appliance or a PCI device.

Applications interact with HSMs through different APIs -- the PKCS#11 standard being one of the most common and the one CyberArk uses to talk to HSMs.

As HSMs store cryptographic information and provide the place to perform operations involving that information, applications need a constant connection to the HSM. When the connection is broken, the ability to use the cryptographic information is lost. The same applies to CyberArk: when it cannot connect to the HSM, critical functionality such as retrieving credentials from the Vault does not work!

The Vault and its encryption keys

Before doing any sort of HSM integration, it first makes sense to understand the encryption keys used by the Vault to secure Vault objects. The CyberArk Technical Community has a nice article on all the Vault keys and their purpose. We will focus on the ones used to encrypt Vault objects.

There are three 'tiers' of keys in the Vault encryption hierarchy: Vault keys, safe keys, and object keys. Each object is encrypted by a unique key specific to the object, object keys are encrypted by a safe key, and Vault keys encrypt the safe keys.
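To make the three tiers concrete, here is a minimal sketch in Python. It is only an illustration of the wrapping order: the toy_cipher function is a deliberately simple, insecure stand-in and is not the algorithm the Vault uses, and all names and the sample secret are made up for this example.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (illustration only, NOT secure).

    Applying the function twice with the same key returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# One key per tier (all hypothetical values).
server_key = secrets.token_bytes(32)  # the Vault key that would live on the HSM
safe_key = secrets.token_bytes(32)
object_key = secrets.token_bytes(32)

# Encrypt bottom-up: object with object key, object key with safe key,
# safe key with Server key.
secret = b"example-credential"
stored_object = toy_cipher(object_key, secret)
stored_object_key = toy_cipher(safe_key, object_key)
stored_safe_key = toy_cipher(server_key, safe_key)


def retrieve(key_from_top_tier: bytes) -> bytes:
    """Decrypt top-down, starting from the Server key."""
    safe = toy_cipher(key_from_top_tier, stored_safe_key)
    obj_key = toy_cipher(safe, stored_object_key)
    return toy_cipher(obj_key, stored_object)


print(retrieve(server_key) == secret)               # True
print(retrieve(secrets.token_bytes(32)) == secret)  # False: wrong Server key
```

The point of the sketch is the dependency chain: nothing below the top tier can be read without first unwrapping the safe key with the Server key.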

There are two types of Vault keys when referring to the encryption of Vault objects: the Recovery key (really a key pair, as the Recovery 'key' is asymmetric) and the Server key (symmetric). Each safe key is encrypted separately under both, which results in two encrypted copies of every safe key. The Server key must be accessible to the Vault in order for it to start.

Only the Server key can be stored on an HSM. The private key of the Recovery key pair should be kept securely offline.

The CyberArk Technical Community has an excellent knowledge article regarding hierarchical key management.

HSM and CyberArk integration

To better understand HSM integration with CyberArk, in our CyberArk 12.6 Primary-DR lab environment running on Windows Server 2019, we will:

  1. Perform initial Vault configurations
  2. Generate a new Server key directly on the HSM
  3. Re-encrypt all the Vault data and metadata with the new Server key on the HSM

The Server key will be generated on the HSM from the DR Vault. We will then re-encrypt the DR Vault before re-encrypting the Primary vault.

Note: In the lab environment, the HSM (Utimaco's SecurityServer simulator) is already configured with a slot initialized to store our new Server key. On both the Primary and DR Vaults, the PKCS#11 provider used to communicate with the HSM is configured and functional. See this blog post dedicated to configuring Utimaco's SecurityServer simulator.

Initial Vault configurations

On both Vaults, we need to edit dbparm.ini directly: PKCS11ProviderPath defines the path to our HSM's PKCS#11 provider DLL, and AllowNonStandardFWAddresses allows outbound communication to the HSM -- the latter being optional if your Vaults are version 12.6 and running Windows Server 2019 due to hardening changes.

image.png

Afterwards, we need to define the PIN that the Vault will use to access the slot where the Server key is located, using CAVaultManager.exe SecureSecretFiles /SecretType HSM /Secret 1111

image.png

Opening dbparm.ini we see HSMPinCode defined with the value of our encrypted PIN.

image.png
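Before moving on, it can help to confirm that the parameters from this section are actually present. The following is a minimal sketch, assuming dbparm.ini consists of simple Name=Value lines; the comment markers it skips and every value in SAMPLE are placeholders invented for this example, not real settings.

```python
REQUIRED = ("PKCS11ProviderPath", "HSMPinCode")


def parse_params(text: str) -> dict:
    """Collect Name=Value lines, keyed by lower-cased name.

    Blank lines, [section] headers and lines starting with ';', '*' or '#'
    are skipped (the comment markers are a guess, not a documented rule).
    A parameter may appear more than once, so values are kept in a list.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "[;*#":
            continue
        name, sep, value = line.partition("=")
        if sep:
            params.setdefault(name.strip().lower(), []).append(value.strip())
    return params


def missing_hsm_params(text: str) -> list:
    """Return the required HSM-related parameters that are absent or empty."""
    params = parse_params(text)
    return [name for name in REQUIRED if not any(params.get(name.lower(), []))]


# Placeholder content only: the values below are hypothetical.
SAMPLE = r"""
[MAIN]
PKCS11ProviderPath="C:\hypothetical\path\provider.dll"
AllowNonStandardFWAddresses=placeholder-entry
HSMPinCode=placeholder-encrypted-value
"""

print(missing_hsm_params(SAMPLE))                                # []
print(missing_hsm_params("[MAIN]\nPKCS11ProviderPath=x\n"))      # ['HSMPinCode']
```

AllowNonStandardFWAddresses is left out of REQUIRED because the text above treats it as optional in some environments.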

Now we are in a position to generate the Server key.

Generate the new Server key

We will generate the Server key on the DR Vault. This only needs to be done once as the same Server key will be used for both the Primary and DR Vault. There is no requirement to generate it on one Vault versus the other -- we just do it on the DR Vault as we will re-encrypt the DR Vault with it first.

After stopping the CyberArk Vault Disaster Recovery service, we use CAVaultManager GenerateKeyOnHSM /ServerKey to generate the Server key. CAVaultManager uses the parameters defined in dbparm.ini to communicate with the HSM, so a successful Server key generation also tells you that the Vault will be able to communicate with the HSM.

image.png

We need to note down the KeyID as we will use it later.

Re-encrypt the DR Vault with our new Server key

At this point, it makes sense to ensure the Vault services on the Primary Vault have been stopped. After the re-encryption on the DR Vault, the two Vaults will be using different Server keys, and replication between them will not be possible until both are using the same, new Server key.

With the Master (Recovery) private key at the location defined by RecoveryPrvKey in the dbparm.ini, ChangeServerKeys is used to re-encrypt the Vault data.

We run ChangeServerKeys.exe C:\Keys\DemoOperatorKeys\ C:\Keys\DemoOperatorKeys\VaultEmergency.pass HSM#1. The first argument, C:\Keys\DemoOperatorKeys\, is the location of the keys that will be used to re-encrypt the Vault -- including our existing Master private key. The Server key used, however, will be the one on the HSM.

image.png
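Since the same command is run again later on the other Vault, it can be handy to build it in one place and review it before execution. This is a minimal sketch that only assembles the three arguments shown above; the helper names are hypothetical, and it assumes the HSM#1 reference corresponds to the KeyID noted during key generation. It defaults to a dry run and does not invent any additional switches.

```python
import subprocess


def change_server_keys_cmd(keys_dir: str, emergency_pass: str, key_id: int) -> list:
    """Build the argument list: keys folder, emergency password file, HSM key reference."""
    return ["ChangeServerKeys.exe", keys_dir, emergency_pass, f"HSM#{key_id}"]


def run(cmd: list, dry_run: bool = True) -> int:
    """Print the command in dry-run mode; otherwise execute it and return its exit code."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


cmd = change_server_keys_cmd(
    "C:\\Keys\\DemoOperatorKeys\\",
    "C:\\Keys\\DemoOperatorKeys\\VaultEmergency.pass",
    1,
)
run(cmd)  # dry run: only prints the command line
```

Keeping dry_run as the default means nothing is re-encrypted until the printed command has been checked against the Vault it is about to run on.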

The changing of keys may take a while depending on the specifications of your Vault and the HSM as well as the amount of data in the Vault. As this process creates new safe and object keys based on the new Server key, it could take days on a large Vault.

image.png

Like the output says, now we need to update the ServerKey parameter in the dbparm.ini to refer to the HSM.

image.png

And quickly start the Vault services.

image.png

The final verification is logging into PrivateArk.

image.png

After logging out of PrivateArk and stopping the Vault services on the DR Vault, we are ready to move on to the Primary.

Re-encrypt the Primary Vault with our new Server key

We run the exact same ChangeServerKeys command on the Primary Vault.

image.png

image.png

As on the DR Vault, we change the dbparm.ini on the Primary Vault so that the ServerKey parameter points to HSM#1, start the Vault services, and log in.

image.png
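The ServerKey change made on both Vaults is a one-line edit, sketched below as a text transformation. This assumes dbparm.ini holds a single uncommented ServerKey=... line; the file names in the sample are hypothetical, and the helper works on a string so it can be tried out without touching a real Vault.

```python
def point_server_key_at_hsm(ini_text: str, key_id: int) -> str:
    """Replace the ServerKey line with a reference to the key on the HSM.

    All other lines are returned unchanged. Raises ValueError when no
    ServerKey parameter is found, so a silent no-op cannot happen.
    """
    new_line = f"ServerKey=HSM#{key_id}"
    out = []
    replaced = False
    for line in ini_text.splitlines():
        name = line.partition("=")[0].strip()
        if name.lower() == "serverkey":
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("no ServerKey parameter found")
    return "\n".join(out) + "\n"


# Hypothetical excerpt: the file names are made up for this example.
before = (
    r'RecoveryPrvKey="C:\Keys\DemoOperatorKeys\example-recovery-private.key"' "\n"
    r'ServerKey="C:\Keys\DemoOperatorKeys\example-server.key"' "\n"
)
print(point_server_key_at_hsm(before, 1))
```

Working on a copy of the file and comparing it with the original before replacing it keeps the change easy to review and to roll back.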

At this point, we can start the CyberArk Vault Disaster Recovery service on the DR Vault and observe successful replication.

image.png

Vault behavior when the HSM is unavailable

Storing our Server key on an HSM improves the security posture of our environment but also introduces complexity and another point of failure. We should make sure we have a good understanding of how the Vault behaves when the HSM is unavailable and what we need to do about it.

We should consider the following:

  • Vault behavior when the Vault tries to start and the HSM is unavailable.
  • Vault behavior when the Vault is already running and the HSM is unavailable.
  • What needs to be done on the Vault side when the HSM becomes available.

Starting the Vault with the HSM unavailable

This is a pretty straightforward case. As the Server key is needed to start the Vault services and the Server key lives on the HSM, the HSM being unavailable prevents the Vault services from starting.

image.png

Looking in the log file for the PKCS#11 provider we are using, we see the device cannot be found which could give us a hint where to start our troubleshooting.

image.png
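A quick first troubleshooting step is to check whether the HSM's network endpoint is reachable from the Vault at all. Below is a minimal sketch using a plain TCP connection attempt. The host and port in the usage line are assumptions (3001 on localhost is, to my knowledge, the default for Utimaco's simulator, but verify this for your setup), and a successful connection only shows that the port is open, not that the PKCS#11 provider can log in to the slot.

```python
import socket


def hsm_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Assumed endpoint for a locally running simulator; adjust for your environment.
print(hsm_reachable("127.0.0.1", 3001, timeout=1.0))
```

If this check fails, the problem lies in the network path or the HSM itself rather than in the Vault configuration.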

An already running Vault with the HSM unavailable

If the Vault is already running and the HSM becomes unavailable, the Vault does not become completely unavailable. Some functionality -- user authentication, report generation, user management, changing of safe memberships, etc. -- may still work and give a false sense of security.

What is guaranteed not to work is the retrieval of Vault objects (credentials). When attempting to retrieve a credential while the HSM is unavailable, there will be an error about the Safe key being incorrect.

image.png

Peeking in the trace logs, we can even see the decryption failing:

image.png

Based on our understanding of the Vault's encryption hierarchy, this makes sense. We know each Vault object is encrypted by an individual key, which is then encrypted by a Safe key where the Safe key is then encrypted by the Server key. With the Server key being unavailable, the Safe key cannot be decrypted in order to be used to decrypt the object's key.
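The behavior described here can be modeled in a few lines. This is a toy simulation, not how the Vault is implemented: FakeHsm, ToyVault, the XOR "encryption" and all values are invented for illustration. It only shows why an operation that does not need the Server key keeps working while credential retrieval fails.

```python
from itertools import cycle


def xor(data: bytes, key: bytes) -> bytes:
    """Toy reversible 'encryption' (NOT secure): XOR with a repeating key."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


class HsmUnavailable(Exception):
    pass


class FakeHsm:
    """Holds the Server key; every operation needs a live connection."""

    def __init__(self, server_key: bytes):
        self._server_key = server_key
        self.available = True

    def wrap(self, key: bytes) -> bytes:
        return self._crypt(key)

    def unwrap(self, wrapped: bytes) -> bytes:
        return self._crypt(wrapped)

    def _crypt(self, data: bytes) -> bytes:
        if not self.available:
            raise HsmUnavailable("device not reachable")
        return xor(data, self._server_key)


class ToyVault:
    def __init__(self, hsm: FakeHsm):
        self.hsm = hsm
        self.users = {"auditor": "example-password"}
        safe_key, object_key = b"safe-key", b"object-key"
        self.wrapped_safe_key = hsm.wrap(safe_key)           # Server key tier
        self.wrapped_object_key = xor(object_key, safe_key)  # safe key tier
        self.stored_object = xor(b"example-credential", object_key)

    def authenticate(self, user: str, password: str) -> bool:
        # No key material involved, so the HSM is never contacted.
        return self.users.get(user) == password

    def retrieve(self) -> bytes:
        safe_key = self.hsm.unwrap(self.wrapped_safe_key)  # needs the HSM
        object_key = xor(self.wrapped_object_key, safe_key)
        return xor(self.stored_object, object_key)


hsm = FakeHsm(b"server-key")
vault = ToyVault(hsm)
print(vault.retrieve())                                    # b'example-credential'

hsm.available = False                                      # simulate the outage
print(vault.authenticate("auditor", "example-password"))   # True
try:
    vault.retrieve()
except HsmUnavailable as exc:
    print("retrieval failed:", exc)

hsm.available = True                                       # HSM comes back
print(vault.retrieve())                                    # b'example-credential'
```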

In our current setup, the HSM becoming available again is enough for the Vault to resume normal operations. The first retrieval of a credential takes a while -- the Vault is probably doing whatever it needs to do to recover from an HSM outage -- while subsequent ones are much quicker:

image.png

Recovery of the Vault with ReconnectHSMOnErrorCodes

If the Vault does not resume normal operations automatically after the HSM becomes available again, we can use ReconnectHSMOnErrorCodes to have the Vault reconnect to the HSM.

DBPARM.sample.ini has an example of the values for ReconnectHSMOnErrorCodes (ReconnectHSMOnErrorCodes=48,50,179), but there does not seem to be a complete list of possible HSM error codes. We can, however, see them in the trace logs with debug levels CRYPT(1,2).

image.png

image.png
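One observation that may help when choosing values: the three sample codes line up with PKCS#11 return values written in decimal (48 is 0x30, 50 is 0x32, 179 is 0xB3). Assuming the Vault reports the provider's PKCS#11 return codes, which the sample suggests but which I have not seen confirmed, a small lookup table makes the codes seen in the trace logs easier to read. The sketch below covers only a handful of CKR_* constants from the PKCS#11 specification.

```python
# A few PKCS#11 return values (CKR_*) from the specification, keyed by number.
CKR_NAMES = {
    0x05: "CKR_GENERAL_ERROR",
    0x06: "CKR_FUNCTION_FAILED",
    0x30: "CKR_DEVICE_ERROR",
    0x31: "CKR_DEVICE_MEMORY",
    0x32: "CKR_DEVICE_REMOVED",
    0xB0: "CKR_SESSION_CLOSED",
    0xB3: "CKR_SESSION_HANDLE_INVALID",
    0xE0: "CKR_TOKEN_NOT_PRESENT",
    0x101: "CKR_USER_NOT_LOGGED_IN",
}


def describe_codes(value: str) -> list:
    """Turn a comma-separated list of decimal codes into (code, name) pairs."""
    codes = [int(part) for part in value.split(",") if part.strip()]
    return [(code, CKR_NAMES.get(code, "unknown")) for code in codes]


for code, name in describe_codes("48,50,179"):
    print(f"{code:>4} (0x{code:02X})  {name}")
```

Codes reported as "unknown" are simply missing from this short table and can be looked up in the PKCS#11 header files shipped with the provider.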

Wrapping it up

HSMs provide the best security for our Vaults' Server key by giving it a secure location to reside in and a secure place to perform cryptographic operations with it. Generating a Server key and re-encrypting our Vaults' data is easy with the available documentation from CyberArk.

Once the integration with an HSM is complete, together with an understanding of the Vault's encryption hierarchy, Vault debug levels and trace logs give us important insight into when and how the Vault interacts with it, demystifying both HSMs and HSM integration with CyberArk's Vault.

Are you planning to integrate your CyberArk environment with an HSM, or have you already done so? What HSM vendor are you using? Provide some feedback in the comments.