The project I work on uses X509 certificates with custom extensions to manage content access on the Red Hat CDN. The basic idea is that Candlepin issues X509 certificates with an extension saying what content the certificate is good for. Client systems then use that certificate for TLS client authentication when connecting to the CDN. If the content they are requesting (deduced from the request URL) matches the content available to them in the certificate, then access is granted.
This system works well in practice except for one problem: every time content for a particular product changes, the content data in the X509 extension becomes obsolete. We have to revoke the obsolete certificates and issue new ones. The result is an extremely large certificate revocation list (CRL).
For our cryptography needs, Candlepin uses the venerable Legion of the Bouncy Castle Java library. This library anticipates normal CRL usage so when building a CRL object from an existing file, the entire structure is read into memory at once. This approach doesn’t scale well with the numbers of revoked certificates we are dealing with, so we needed to devise a way to stream the CRL. Moreover, the only thing we really care about for our purposes is the revoked certificate’s serial number.
Streaming the CRL means we need to dissect the ASN1 that describes the CRL one piece at a time. RFC 5280 to the rescue! Looking at the description of the ASN1 for a CRL reveals that before the sequence containing the revocation entries, there will be a thisUpdate
and optionally nextUpdate
field of either type UTCTime or GeneralizedTime. We need to descend in the ASN1 until we get to the thisUpdate
field, look for and discard the optional nextUpdate
field and then walk through the revokedCertificates
sequence reading the serial numbers.
That procedure is not exactly a walk in the park, so in the hope that someone else may find it useful, here is the solution I came up with. Keep in mind that the code does not check the signature on the CRL so this code should not be used for any CRL that you do not trust implicitly.
The end results are pretty dramatic. The benchmarking toolkit I’m using shows an improvement in execution time by an order of magnitude (from around 7 seconds to .7 seconds) and memory usage drops by about 30%. You can see the GC statistics in the graph below.
and the benchmarking results are
Benchmark Mode Cnt Score Error Units CRLBenchmark.inMemory avgt 20 7493.602 ± 941.592 ms/op CRLBenchmark.stream avgt 20 669.084 ± 91.382 ms/op
In writing this, A Layman’s Guide to a Subset of ASN.1, BER, and DER was of invaluable assistance to me as was the Wikipedia page on X.690. I recommend reading them both.
Much of the data in the PKI world is stored in Abstract Syntax Notation One (ASN.1) so a basic understanding is necessary. ASN.1 is a way to describe data by starting from primitive types and building up to more complex types. Do you remember Backus-Naur Form? What about writing XML schemas in XSD? It’s the same concept.
Let’s say we have a Widget. Every Widget has a model name, a serial number, and some inspection information with the name of the inspector and the dates of the inspections. Our Widget then looks like this in ASN.1:
Widget ::= SEQUENCE {
model IA5String,
serialNumber INTEGER,
inspections InspectionInfo
}
InspectionInfo ::= SEQUENCE {
inspectorName IA5String,
inspectionDates SEQUENCE OF DATE
}
Now let’s go over that. A SEQUENCE is one of the four ASN.1 structured types and it’s just an ordered collection of items. Inside that sequence we have an IA5String (International Alphabet 5 — basically ASCII), an INTEGER, and then an item of InspectionInfo type. We continue down and see InspectionInfo is also a SEQUENCE containing the inspector’s name and the inspection dates. The inspection dates are a SEQUENCE OF, another structured type that holds zero or more occurrences of a given type. In this case, the given type is DATE.
There is more to it that I don’t understand, but that is enough for the RFCs to make sense.
Now that we have our widget defined, we’ll use an ASN.1 library to build a data structure and then write it to DER. DER (Distinguished Encoding Rules) is just a way to encode an object in binary in a strict, unambiguous manner. Alternatively, we can define an ASN.1 structure and decode DER into it.
DER is the ASN.1 data encoded in binary, but DER isn’t so great if you need to email a public key to someone for example. The binary is apt to get screwed up in transit. So we just take the DER, encode it in Base64 and add on BEGIN and END markers. If you need something in DER, it’s as simple as striping off the markers, removing all the newlines, and then Base64 decoding. Do note, however, that OpenSSL can add “headers” to PEM files. For instance, if an RSA private key has been encrypted with DES3, you’ll see something like
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,C0F5225DEC6ADA07
before the actual Base64 encoded data. This is weird and from what I can tell non-standard. Here is how BouncyCastle reads PEM files.
The PKCS specifications are just ways to specify the ASN.1 for different PKI needs. There is a good overview here. If you’re ever in doubt, use openssl’s asn1parse command and it will show you all the gory details.
Let’s look at an example. PKCS1 defines the format of public and private RSA keys. OpenSSL outputs the RSA keys it generates in PKCS1 (this example goes for unencrypted keys; encrypted keys are in the non-standard format I mentioned earlier). Now that we know ASN.1, we can decode them. Here is the ASN.1 for RSA private keys in PKCS1:
RSAPrivateKey ::= SEQUENCE {
version Version,
modulus INTEGER, -- n
publicExponent INTEGER, -- e
privateExponent INTEGER, -- d
prime1 INTEGER, -- p
prime2 INTEGER, -- q
exponent1 INTEGER, -- d mod (p-1)
exponent2 INTEGER, -- d mod (q-1)
coefficient INTEGER, -- (inverse of q) mod p
otherPrimeInfos OtherPrimeInfos OPTIONAL
}
Version ::= INTEGER { two-prime(0), multi(1) }
(CONSTRAINED BY {
-- version must be multi if otherPrimeInfos present --
})
OtherPrimeInfos ::= SEQUENCE SIZE(1..MAX) OF OtherPrimeInfo
OtherPrimeInfo ::= SEQUENCE {
prime INTEGER, -- ri
exponent INTEGER, -- di
coefficient INTEGER -- ti
}
That’s a lot of stuff, but the otherPrimeInfos bit is optional and it’s unlikely that we’ll care about the version. Here’s how we would parse this using JSS and turn it into a usable Java object:
private KeyPair readPrivateKey(byte[] der, String password)
throws CertificateException {
try {
SEQUENCE.Template template = SEQUENCE.getTemplate();
// Create a template for the sequence matching the PKCS1 format
for (int i = 0; i < 9; i++) {
template.addElement(INTEGER.getTemplate());
}
SEQUENCE seq = (SEQUENCE) template.decode(new ByteArrayInputStream(der));
//element 0 is the version which we don't need
INTEGER mod = (INTEGER) seq.elementAt(1);
INTEGER pubExp = (INTEGER) seq.elementAt(2);
INTEGER privExp = (INTEGER) seq.elementAt(3);
INTEGER p1 = (INTEGER) seq.elementAt(4);
INTEGER p2 = (INTEGER) seq.elementAt(5);
INTEGER exp1 = (INTEGER) seq.elementAt(6);
INTEGER exp2 = (INTEGER) seq.elementAt(7);
INTEGER crtCoef = (INTEGER) seq.elementAt(8);
rsaKeyFactory = KeyFactory.getInstance("RSA");
RSAPublicKeySpec pubSpec = new RSAPublicKeySpec(mod, pubExp);
RSAPrivateCrtKeySpec privSpec = new RSAPrivateCrtKeySpec(
mod, pubExp, privExp, p1, p2, exp1, exp2, crtCoef);
return new KeyPair(
rsaKeyFactory.generatePublic(pubSpec),
rsaKeyFactory.generatePrivate(privSpec));
}
catch (Exception e) {
throw new CertificateException("Could not read private key", e);
}
}
What does that method do? First it defines a template that matches the PKCS1 ASN.1 (a SEQUENCE of 10 INTEGERs). Then we give the template a byte array and tell it to decode our DER data. Now we have the data in a SEQUENCE and we just pluck out the elements that we care about (in this case all the math stuff). We take the elements and define a KeySpec from them. We feed the KeySpec into a KeyFactory and build the KeyPair.
The PKCS12 format is a little more complicated. It defines a way to store a bunch of X509 certs and key pairs all in one file and then have that file encrypted. Thus, these files are often called “keystores”. If you are dealing with keystores often (either PKCS12 or the Java alternative, JKS), I recommend a tool called Portecle. It’s a GUI that let’s you see everything in a keystore, import or export items, and generate CSRs or import signed certs.
Why do we need a keystore? A freshly installed Fedora machine is only going to have a small list of accepted certificate authorities (CAs). You can either get a certificate that has a chain stretching back to one of these root CAs (like Thawte or Verisign) or you can install an additional CA into the system’s list of acceptable CAs. If you opt for the chain of trust approach, then a keystore is a good way of keeping everything in one package.
(On a side note, if you want to install an additional CA into your Fedora machine, drop it in /etc/pki/ca-trust/source/anchors/ and run the update-ca-trust
command.)
There are a lot of file extensions for PKI objects: pem, der, csr, key, crt, p12, etc. Naming a file “.pem” is pretty pointless as that only tells you the format and not what the object in the file is. In my opinion, it’s best to use extensions that mean something like “.key” for a private key, “.csr” for a certificate signing request, or “.p12” for a PKCS12 keystore.
Of course there is a lot more to learn about PKI, but hopefully this primer will give you a basic foundation.