J. Symb. Comp. 60 (2014) 113–119 (DOI).
arXiv preprint (Sep 2013).
We show how to improve the efficiency of the computation of fast Fourier transforms over Fp where p is a word-sized prime. Our main technique is optimisation of the basic arithmetic, in effect decreasing the total number of reductions modulo p, by making use of a redundant representation for integers modulo p. We give performance results showing a significant improvement over Shoup's NTL library.
Here is the demonstration code referred to in the paper.