Crypto Acceleration: Enabling a Path to the Future of Computing

Intel sees a future where everything is encrypted, from your grocery list to your medical records. Today, data is cryptographically protected across layers of the software, network and storage stacks, so multiple cryptographic operations may be performed on every byte of data. These operations are compute-intensive, yet they often support critical business operations where security is paramount. Intel’s approach is to reduce the cost of the cryptographic computations used to encrypt data.

For more than a decade, Intel has led the industry in reducing the compute cost of cryptographic algorithms through innovative new instructions, microarchitectural improvements and novel software optimization techniques. As a result, the industry has responded with a steady increase in the use of strong cryptographic ciphers to better secure data and communications.

“Intel has a rich cryptographic technology pipeline that enables us to constantly be on the frontlines of the latest innovations,” said Wajdi Feghali, Intel Fellow in the Intel Security, Architecture and Engineering Group. “As advancements with quantum computing continue to evolve, Intel is working to implement proper cryptography at the hardware level. Intel’s upcoming Intel Xeon processor architecture, code-named ‘Ice Lake,’ brings several new instructions coupled with algorithmic and software innovations to deliver breakthrough performance for the industry’s most widely deployed cryptographic ciphers.”

Brief History

In 2001, Federal Information Processing Standards (FIPS) Publication 197 specified the Advanced Encryption Standard (AES). Following this process closely, Intel made a bold move to augment its general-purpose processors’ instruction set with a set of fixed-function instructions, Intel® AES New Instructions, that would dramatically reduce the compute cost of AES symmetric encryption. The instructions were first introduced in 2010 with the Intel® Core™ processor family.

Current Perspective

Today, the security of data, communications and systems depends heavily on the suite of FIPS 140-2 approved algorithms. As the industry moves closer to a feasible quantum computer, multiple algorithms are at risk of having their security strength reduced. This will affect both symmetric and asymmetric (public-key) cryptographic algorithms. Symmetric algorithms such as AES can be made resilient to quantum attacks by increasing key sizes, for example from 128 to 256 bits. However, new post-quantum-secure algorithms will likely replace existing asymmetric algorithms such as RSA and ECDSA. The industry may need to transition to new post-quantum cryptography standards and provide acceleration of those schemes to successfully navigate the coming decade.

Intel expects to see a long transition period where existing algorithms continue to be deployed while the new post-quantum algorithms are adopted. Running both generations side by side is expected to impose a high computational burden that should be addressed with additional innovations. To support the next wave of adoption of both symmetric and public-key algorithms, Intel continues to invest in new microarchitectural enhancements and innovative software techniques that, when used together, can help reduce the computations required to deploy strong cryptography in the future.

Upcoming 3rd Generation Intel® Xeon® Scalable Processors

In the upcoming 3rd generation Intel Xeon Scalable processors, code-named “Ice Lake,” Intel improved performance for the most widely used transport layer security (TLS) ciphers. TLS bulk data transfers have two algorithms applied to them: encryption and authentication.

Starting with the instruction set architecture (ISA), Intel introduced several enhancements designed to significantly increase cryptographic performance.

Public-Key Cryptography

Ice Lake adds new ISA support for the “big number” multiplication often found in public-key ciphers: AVX512 Integer Fused Multiply Add (AVX512_IFMA). The instructions multiply eight pairs of 52-bit unsigned integers held in the wide 512-bit (ZMM) registers, take either the low or the high half of each 104-bit product (one instruction for each half), and add it to eight 64-bit accumulators. Combined with software optimization techniques, such as multi-buffer processing, these instructions provide significant performance improvements for RSA and elliptic curve cryptography.
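
The lane-wise multiply-add described above can be modeled in a few lines of Python. This is only a sketch of the arithmetic, not Intel's implementation: the helper names (madd52lo, madd52hi, to_limbs, big_multiply) are hypothetical, and the 416-bit operand size is an assumption chosen so that each operand fits exactly in eight 52-bit limbs.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1
LANES = 8  # eight 64-bit lanes in one 512-bit register


def madd52lo(acc, a, b):
    """Per lane: add the low 52 bits of the 104-bit product to the accumulator."""
    return [(c + (((x & MASK52) * (y & MASK52)) & MASK52)) & MASK64
            for c, x, y in zip(acc, a, b)]


def madd52hi(acc, a, b):
    """Per lane: add the high 52 bits of the 104-bit product to the accumulator."""
    return [(c + (((x & MASK52) * (y & MASK52)) >> 52)) & MASK64
            for c, x, y in zip(acc, a, b)]


def to_limbs(n):
    """Split a non-negative integer below 2**416 into eight 52-bit limbs."""
    return [(n >> (52 * i)) & MASK52 for i in range(LANES)]


def big_multiply(x, y):
    """Schoolbook multiply of two 416-bit numbers using the lane model above."""
    a = to_limbs(x)
    columns = [0] * (2 * LANES)  # unnormalized column sums, each stays below 2**64
    for j, limb in enumerate(to_limbs(y)):
        b = [limb] * LANES  # broadcast one limb of y to every lane
        columns[j:j + LANES] = madd52lo(columns[j:j + LANES], a, b)
        columns[j + 1:j + 1 + LANES] = madd52hi(columns[j + 1:j + 1 + LANES], a, b)
    # Carry propagation is deferred: weight each column by 2**(52*i) and sum.
    return sum(c << (52 * i) for i, c in enumerate(columns))


x = (1 << 415) + 12345
y = (1 << 400) + 6789
assert big_multiply(x, y) == x * y
```

Using 52-bit limbs inside 64-bit lanes leaves 12 spare bits per accumulator, which is why the sketch can add many partial products before any carry handling is needed.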

Symmetric Encryption

Two instruction enhancements increase performance for AES symmetric encryption: vectorized AES (VAES) and vectorized carryless multiply. The AES instructions have been extended to support vector processing of up to four AES blocks (128 bits each) at a time using the wide 512-bit (ZMM) registers and, when properly utilized, will provide a performance benefit to all AES modes of operation. Intel also extends carryless multiplication to vector processing of up to four operations at a time using the wide 512-bit (ZMM) registers, which provides additional performance for Galois hashing and the widely used AES-GCM cipher.
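
For readers unfamiliar with carryless multiplication, the following Python sketch models the operation and its four-lane form. The function names clmul64 and clmul_x4 are hypothetical, and the model is simplified: it takes one 64-bit operand pair per lane, whereas the hardware instruction selects 64-bit halves of each 128-bit lane.

```python
MASK64 = (1 << 64) - 1


def clmul64(a, b):
    """Carryless (GF(2) polynomial) product of two 64-bit values; up to 127 bits."""
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: no carries between bit positions
        a <<= 1
        b >>= 1
    return result


def clmul_x4(a_lanes, b_lanes):
    """Four independent carryless products, one per 128-bit lane of a 512-bit register."""
    assert len(a_lanes) == len(b_lanes) == 4
    return [clmul64(a, b) for a, b in zip(a_lanes, b_lanes)]


# (x + 1) * (x + 1) = x^2 + 1 over GF(2), so 0b11 * 0b11 gives 0b101.
assert clmul64(0b11, 0b11) == 0b101
assert clmul_x4([3, 5, 1, 0], [3, 3, 7, 9]) == [5, 15, 7, 0]
```

Galois hashing in AES-GCM is built from products of this kind followed by a polynomial reduction, which the sketch omits.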

Hashing

Secure Hash Algorithm (SHA) also gets a boost in performance with newly added Intel® SHA Extensions. These instructions provide a significant improvement in SHA-256 performance. Although SHA Extensions have been available on Intel Atom®-based architectures for some time, they are now available on mainstream architectures starting with Ice Lake.

In addition to these enhancements, Intel used fundamental algorithmic and software innovations in Ice Lake.

Function Stitching

In 2010, Intel pioneered a technique to optimize two algorithms that typically run in combination yet sequentially, such as AES-CBC and SHA256, and form them into a single optimized algorithm focused on maximizing processor resources and throughput. The result is a fine-grained interleaving of the instructions from each algorithm so that both algorithms execute simultaneously. This enables processor execution units that would otherwise be idle when executing a single algorithm, due to either data dependencies or instruction latencies, to execute instructions from the other algorithm, and vice versa. More information on Function Stitching can be found in this paper.
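
The restructuring behind function stitching can be illustrated with a conceptual Python sketch. Python cannot show the execution-unit effects described above, so this only demonstrates the code transformation: two separate passes become one loop in which the cipher step for block i sits next to the independent hash step for block i-1. The functions toy_block and toy_hash_step are hypothetical, insecure stand-ins for AES-CBC and SHA256, chosen only to keep the example self-contained.

```python
MASK64 = (1 << 64) - 1


def toy_block(x, key):
    """Toy 64-bit mixing step standing in for a block cipher (NOT secure)."""
    return ((x * 0x9E3779B97F4A7C15) & MASK64) ^ key


def toy_hash_step(h, block):
    """Toy 64-bit hash update standing in for a compression step (NOT secure)."""
    return ((h ^ block) * 0x100000001B3) & MASK64


def encrypt_then_hash(blocks, key, iv):
    """Baseline: run the chained cipher over all blocks, then hash the output."""
    prev, out = iv, []
    for p in blocks:
        prev = toy_block(p ^ prev, key)
        out.append(prev)
    h = 0
    for c in out:
        h = toy_hash_step(h, c)
    return out, h


def stitched(blocks, key, iv):
    """One loop: cipher step for block i interleaved with hash step for block i-1."""
    prev, out, h, pending = iv, [], 0, None
    for p in blocks:
        c = toy_block(p ^ prev, key)       # cipher work for block i
        if pending is not None:
            h = toy_hash_step(h, pending)  # independent hash work for block i-1
        out.append(c)
        prev = pending = c
    if pending is not None:
        h = toy_hash_step(h, pending)      # drain the last ciphertext block
    return out, h


data = [1, 2, 3, 0xDEADBEEF]
assert stitched(data, key=0x1234, iv=7) == encrypt_then_hash(data, key=0x1234, iv=7)
```

Within each loop iteration the two steps do not depend on each other, which is the property that lets a processor overlap them when the same idea is applied at the instruction level.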

Multi-Buffer

Multi-buffer is an innovative and efficient technique for processing multiple independent data buffers in parallel for cryptographic algorithms. Intel had previously implemented this technique for algorithms such as hashing and symmetric encryption. Processing multiple buffers simultaneously can result in significant performance improvements — both for the case where the code can take advantage of single instruction multiple data (AVX/AVX2/AVX512) instructions and even where it cannot. More information on multi-buffer can be found in this paper.
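
The lockstep idea behind multi-buffer processing can be sketched in Python. This is an illustrative model under stated assumptions, not the approach used in Intel's libraries: 64-bit FNV-1a, a non-cryptographic hash, stands in for a real algorithm, and a Python list stands in for the lanes of a SIMD register. The function names are hypothetical.

```python
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a(data):
    """Single-buffer reference: 64-bit FNV-1a over a bytes object."""
    h = FNV_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV_PRIME) & MASK64
    return h


def fnv1a_multi_buffer(buffers):
    """Advance one hash state per buffer in lockstep, one byte per lane per step."""
    states = [FNV_OFFSET] * len(buffers)
    longest = max((len(b) for b in buffers), default=0)
    for pos in range(longest):
        # One "vector step": the same operation is applied to every active lane;
        # lanes whose buffer is already exhausted keep their state unchanged.
        states = [
            ((h ^ buf[pos]) * FNV_PRIME) & MASK64 if pos < len(buf) else h
            for h, buf in zip(states, buffers)
        ]
    return states


bufs = [b"", b"a", b"hello", b"multi-buffer"]
assert fnv1a_multi_buffer(bufs) == [fnv1a(b) for b in bufs]
```

Each buffer's hash still depends serially on its own previous state, but the buffers are independent of one another, so the per-step work across lanes can run in parallel.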

The upcoming Ice Lake platform is designed to be the most protected Intel Xeon platform ever, with built-in acceleration for next-generation security that doesn’t sacrifice performance. The hardware enhancements, along with algorithmic and software innovations, will help lead to breakthrough cryptographic performance across a host of important cryptographic algorithms used throughout the industry. These investments support and help accelerate the transition to new post-quantum cryptography schemes that the industry will navigate in the coming decade.

More details can be found in the latest Intel Software Developer Manual.

Martin G. Dixon
Intel Fellow and Vice President, Intel Security Architecture and Engineering Group

The Small Print:
 

Source code can be found at the following links: https://github.com/intel/intel-ipsec-mb | https://github.com/intel/isa-l | https://github.com/intel/isa-l_crypto | https://github.com/intel/ipp-crypto/tree/ipp-crypto_2020_update3 | https://github.com/intel/QAT_Engine

No product or component can be absolutely secure.

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.