Recursion strategies for fast bitsliced polynomial multiplication over binary fields
Shaindlin, Alex (2024)
Shaindlin, Alex
2024
Master's Programme in Science and Engineering
Informaatioteknologian ja viestinnän tiedekunta - Faculty of Information Technology and Communication Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Hyväksymispäivämäärä
2024-12-01
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:tuni-202411019786
https://urn.fi/URN:NBN:fi:tuni-202411019786
Tiivistelmä
Elliptic curve cryptography requires a fast and secure implementation of the multiplication operation in the underlying field. Since the field elements involved are not large enough for the asymptotically optimal multiplication algorithms to always perform best in practice, implementing fast multiplication on a particular computer architecture requires experimental testing to determine which algorithm (or combination of algorithms) performs best for a given field size. Another approach to speeding up computation in binary fields is bitslicing, where a batch of elements is transposed and then operated on bitwise using SIMD vector instructions. Combining a bitsliced base layer with automated generation and testing of different recursion strategies, we offer a way to generate highly efficient library code for different field sizes and different architectures without manual tuning.
The previously published paper Batch Binary Weierstrass (appendix A) gave an overview of the software tool developed for generating and benchmarking the strategies, which uses a subset of the multiplication algorithms presented by Bernstein [3]. This thesis describes the tool in more detail, as well as presenting the necessary mathematical background, including worked examples of the Karatsuba and Toom-Cook fast multiplication algorithms. It also includes a discussion of how the tool has changed since the original publication, since bugs discovered during the writing of this thesis necessitated a rewrite of the strategy generation algorithm (although the generated code itself for any given strategy remains unchanged).
The best-performing strategies are established to be competitive with or even superior to other existing implementations of binary field multiplication at cryptographically relevant field sizes. The structural properties of the entire collection of generated strategies are explored, expanding our understanding beyond previous work which had only focused on experimental performance measurement. Reducing the range of subproblem sizes permitted to serve as base cases for the recursion is shown to have only a small effect on the size of the search space. It is established that the number of generated strategies stays in a narrow constant range for field sizes up to approximately GF(2^{465}) but begins to increase rapidly after that point, suggesting that it would be fruitful to investigate the possibility of using benchmarking results from smaller field sizes to prune known-inefficient strategies for subproblems when generating strategies for larger field sizes. The cause of incorrect multiplication results produced by the generated code for some strategies is revealed to be a bug in one of the C macros, which only manifests when it is expanded with a particular combination of inputs.
The previously published paper Batch Binary Weierstrass (appendix A) gave an overview of the software tool developed for generating and benchmarking the strategies, which uses a subset of the multiplication algorithms presented by Bernstein [3]. This thesis describes the tool in more detail, as well as presenting the necessary mathematical background, including worked examples of the Karatsuba and Toom-Cook fast multiplication algorithms. It also includes a discussion of how the tool has changed since the original publication, since bugs discovered during the writing of this thesis necessitated a rewrite of the strategy generation algorithm (although the generated code itself for any given strategy remains unchanged).
The best-performing strategies are established to be competitive with or even superior to other existing implementations of binary field multiplication at cryptographically relevant field sizes. The structural properties of the entire collection of generated strategies are explored, expanding our understanding beyond previous work which had only focused on experimental performance measurement. Reducing the range of subproblem sizes permitted to serve as base cases for the recursion is shown to have only a small effect on the size of the search space. It is established that the number of generated strategies stays in a narrow constant range for field sizes up to approximately GF(2^{465}) but begins to increase rapidly after that point, suggesting that it would be fruitful to investigate the possibility of using benchmarking results from smaller field sizes to prune known-inefficient strategies for subproblems when generating strategies for larger field sizes. The cause of incorrect multiplication results produced by the generated code for some strategies is revealed to be a bug in one of the C macros, which only manifests when it is expanded with a particular combination of inputs.
