You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
1792 lines
61 KiB
1792 lines
61 KiB
diff -Naur libtommath-1.0.old/bn.tex libtommath-1.0/bn.tex
|
|
--- libtommath-1.0.old/bn.tex 2016-02-23 11:24:13.632735129 +0100
|
|
+++ libtommath-1.0/bn.tex 2016-02-23 11:43:13.580826898 +0100
|
|
@@ -258,7 +258,7 @@
|
|
So you may be thinking ``should I use LibTomMath?'' and the answer is a definite maybe. Let me tabulate what I think
|
|
are the pros and cons of LibTomMath by comparing it to the math routines from GnuPG\footnote{GnuPG v1.2.3 versus LibTomMath v0.28}.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|l|c|c|l|}
|
|
@@ -300,7 +300,7 @@
|
|
There are three possible return codes a function may return.
|
|
|
|
\index{MP\_OKAY}\index{MP\_YES}\index{MP\_NO}\index{MP\_VAL}\index{MP\_MEM}
|
|
-\begin{figure}[here!]
|
|
+\begin{figure}[h!]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|l|l|}
|
|
@@ -823,7 +823,7 @@
|
|
for any comparison.
|
|
|
|
\index{MP\_GT} \index{MP\_EQ} \index{MP\_LT}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|}
|
|
\hline \textbf{Result Code} & \textbf{Meaning} \\
|
|
@@ -1290,7 +1290,7 @@
|
|
\end{alltt}
|
|
Where ``XXX'' is one of the following entries from the table \ref{fig:tuning}.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|l|l|}
|
|
@@ -1739,7 +1739,7 @@
|
|
(see fig. \ref{fig:primeopts}) which can be OR'ed together. The callback parameters are used as in
|
|
mp\_prime\_random().
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|r|l|}
|
|
diff -Naur libtommath-1.0.old/tommath.src libtommath-1.0/tommath.src
|
|
--- libtommath-1.0.old/tommath.src 2016-02-03 19:07:27.000000000 +0100
|
|
+++ libtommath-1.0/tommath.src 2016-02-23 11:46:48.428479620 +0100
|
|
@@ -175,7 +175,7 @@
|
|
typical RSA modulus would be at least greater than $10^{309}$. However, modern programming languages such as ISO C \cite{ISOC} and
|
|
Java \cite{JAVA} only provide instrinsic support for integers which are relatively small and single precision.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{tabular}{|r|c|}
|
|
\hline \textbf{Data Type} & \textbf{Range} \\
|
|
@@ -366,7 +366,7 @@
|
|
exercises ranges from one (the easiest) to five (the hardest). The following table sumarizes the
|
|
scoring system used.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|c|l|}
|
|
@@ -573,7 +573,7 @@
|
|
used within LibTomMath.
|
|
|
|
\index{mp\_int}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
%\begin{verbatim}
|
|
@@ -670,7 +670,7 @@
|
|
\textbf{int} data type with one of the following values (fig \ref{fig:errcodes}).
|
|
|
|
\index{MP\_OKAY} \index{MP\_VAL} \index{MP\_MEM}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{|l|l|}
|
|
\hline \textbf{Value} & \textbf{Meaning} \\
|
|
@@ -719,7 +719,7 @@
|
|
structure are set to valid values. The mp\_init algorithm will perform such an action.
|
|
|
|
\index{mp\_init}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_init}. \\
|
|
@@ -793,7 +793,7 @@
|
|
When an mp\_int is no longer required by the application, the memory that has been allocated for its digits must be
|
|
returned to the application's memory pool with the mp\_clear algorithm.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_clear}. \\
|
|
@@ -857,7 +857,7 @@
|
|
is large enough to simply increase the \textbf{used} digit count. However, when the size of the array is too small it
|
|
must be re-sized appropriately to accomodate the result. The mp\_grow algorithm will provide this functionality.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_grow}. \\
|
|
@@ -911,7 +911,7 @@
|
|
of input mp\_ints to a given algorithm. The purpose of algorithm mp\_init\_size is similar to mp\_init except that it
|
|
will allocate \textit{at least} a specified number of digits.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -963,7 +963,7 @@
|
|
The purpose of algorithm mp\_init\_multi is to initialize a variable length array of mp\_int structures in a single
|
|
statement. It is essentially a shortcut to multiple initializations.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_init\_multi}. \\
|
|
@@ -1022,7 +1022,7 @@
|
|
positive number which means that if the \textbf{used} count is decremented to zero, the sign must be set to
|
|
\textbf{MP\_ZPOS}.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_clamp}. \\
|
|
@@ -1090,7 +1090,7 @@
|
|
a copy for the purposes of this text. The copy of the mp\_int will be a separate entity that represents the same
|
|
value as the mp\_int it was copied from. The mp\_copy algorithm provides this functionality.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_copy}. \\
|
|
@@ -1205,7 +1205,7 @@
|
|
useful within functions that need to modify an argument but do not wish to actually modify the original copy. The
|
|
mp\_init\_copy algorithm has been designed to help perform this task.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_init\_copy}. \\
|
|
@@ -1235,7 +1235,7 @@
|
|
Reseting an mp\_int to the default state is a common step in many algorithms. The mp\_zero algorithm will be the algorithm used to
|
|
perform this task.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_zero}. \\
|
|
@@ -1265,7 +1265,7 @@
|
|
With the mp\_int representation of an integer, calculating the absolute value is trivial. The mp\_abs algorithm will compute
|
|
the absolute value of an mp\_int.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_abs}. \\
|
|
@@ -1297,7 +1297,7 @@
|
|
With the mp\_int representation of an integer, calculating the negation is also trivial. The mp\_neg algorithm will compute
|
|
the negative of an mp\_int input.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_neg}. \\
|
|
@@ -1334,7 +1334,7 @@
|
|
\subsection{Setting Small Constants}
|
|
Often a mp\_int must be set to a relatively small value such as $1$ or $2$. For these cases the mp\_set algorithm is useful.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_set}. \\
|
|
@@ -1375,7 +1375,7 @@
|
|
To overcome the limitations of the mp\_set algorithm the mp\_set\_int algorithm is ideal. It accepts a ``long''
|
|
data type as input and will always treat it as a 32-bit integer.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_set\_int}. \\
|
|
@@ -1425,7 +1425,7 @@
|
|
|
|
To facilitate working with the results of the comparison functions three constants are required.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{|r|l|}
|
|
\hline \textbf{Constant} & \textbf{Meaning} \\
|
|
@@ -1438,7 +1438,7 @@
|
|
\caption{Comparison Return Codes}
|
|
\end{figure}
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_cmp\_mag}. \\
|
|
@@ -1479,7 +1479,7 @@
|
|
Comparing with sign considerations is also fairly critical in several routines (\textit{division for example}). Based on an unsigned magnitude
|
|
comparison a trivial signed comparison algorithm can be written.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_cmp}. \\
|
|
@@ -1560,7 +1560,7 @@
|
|
Historically that convention stems from the MPI library where ``s\_'' stood for static functions that were hidden from the developer entirely.
|
|
|
|
\newpage
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{l}
|
|
@@ -1663,7 +1663,7 @@
|
|
For example, the default for LibTomMath is to use a ``unsigned long'' for the mp\_digit ``type'' while $\beta = 2^{28}$. In ISO C an ``unsigned long''
|
|
data type must be able to represent $0 \le x < 2^{32}$ meaning that in this case $\gamma \ge 32$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{l}
|
|
@@ -1754,7 +1754,7 @@
|
|
Recall from section 5.2 that an mp\_int represents an integer with an unsigned mantissa (\textit{the array of digits}) and a \textbf{sign}
|
|
flag. A high level addition is actually performed as a series of eight separate cases which can be optimized down to three unique cases.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_add}. \\
|
|
@@ -1783,7 +1783,7 @@
|
|
either \cite{TAOCPV2} or \cite{HAC} since they both only provide unsigned operations. The algorithm is fairly
|
|
straightforward but restricted since subtraction can only produce positive results.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|}
|
|
@@ -1833,7 +1833,7 @@
|
|
\subsection{High Level Subtraction}
|
|
The high level signed subtraction algorithm is essentially the same as the high level signed addition algorithm.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_sub}. \\
|
|
@@ -1865,7 +1865,7 @@
|
|
\cite{HAC}. Also this algorithm is restricted by algorithm s\_mp\_sub. Chart \ref{fig:SubChart} lists the eight possible inputs and
|
|
the operations required.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|}
|
|
@@ -1911,7 +1911,7 @@
|
|
In a binary system where the radix is a power of two multiplication by two not only arises often in other algorithms it is a fairly efficient
|
|
operation to perform. A single precision logical shift left is sufficient to multiply a single digit by two.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -1967,7 +1967,7 @@
|
|
\subsection{Division by Two}
|
|
A division by two can just as easily be accomplished with a logical shift right as multiplication by two can be with a logical shift left.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2024,7 +2024,7 @@
|
|
degree. In this case $f(x) \cdot x = a_n x^{n+1} + a_{n-1} x^n + ... + a_0 x$. From a scalar basis point of view multiplying by $x$ is equivalent to
|
|
multiplying by the integer $\beta$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2080,7 +2080,7 @@
|
|
|
|
Division by powers of $x$ is easily achieved by shifting the digits right and removing any that will end up to the right of the zero'th digit.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2136,7 +2136,7 @@
|
|
|
|
\subsection{Multiplication by Power of Two}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2199,7 +2199,7 @@
|
|
|
|
\subsection{Division by Power of Two}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2252,7 +2252,7 @@
|
|
The last algorithm in the series of polynomial basis power of two algorithms is calculating the remainder of division by $2^b$. This
|
|
algorithm benefits from the fact that in twos complement arithmetic $a \mbox{ (mod }2^b\mbox{)}$ is the same as $a$ AND $2^b - 1$.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2363,7 +2363,7 @@
|
|
include $\alpha$ which shall represent the number of bits in the type \textbf{mp\_word}. This implies that $2^{\alpha} > 2 \cdot \beta^2$. The
|
|
constant $\delta = 2^{\alpha - 2lg(\beta)}$ will represent the maximal weight of any column in a product (\textit{see ~COMBA~ for more information}).
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2422,7 +2422,7 @@
|
|
For example, consider multiplying $576$ by $241$. That is equivalent to computing $10^0(1)(576) + 10^1(4)(576) + 10^2(2)(576)$ which is best
|
|
visualized in the following table.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|c|l|}
|
|
\hline && & 5 & 7 & 6 & \\
|
|
@@ -2500,7 +2500,7 @@
|
|
Where $\vec x_n$ is the $n'th$ column of the output vector. Consider the following example which computes the vector $\vec x$ for the multiplication
|
|
of $576$ and $241$.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|c|}
|
|
@@ -2521,7 +2521,7 @@
|
|
Now the columns must be fixed by propagating the carry upwards. The resultant vector will have one extra dimension over the input vector which is
|
|
congruent to adding a leading zero digit.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2577,7 +2577,7 @@
|
|
the smaller input may not have more than $256$ digits if the Comba method is to be used. This is quite satisfactory for most applications since
|
|
$256$ digits would allow for numbers in the range of $0 \le x < 2^{7168}$ which, is much larger than most public key cryptographic algorithms require.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2783,7 +2783,7 @@
|
|
of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.}
|
|
making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2888,7 +2888,7 @@
|
|
the algorithm can be faster than a baseline multiplication. However, the greater complexity of this algorithm places the cutoff point
|
|
(\textbf{TOOM\_MUL\_CUTOFF}) where Toom-Cook becomes more efficient much higher than the Karatsuba cutoff point.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2925,7 +2925,7 @@
|
|
\caption{Algorithm mp\_toom\_mul}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2997,7 +2997,7 @@
|
|
Now that algorithms to handle multiplications of every useful dimensions have been developed, a rather simple finishing touch is required. So far all
|
|
of the multiplication algorithms have been unsigned multiplications which leaves only a signed multiplication algorithm to be established.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3051,7 +3051,7 @@
|
|
For any $n$-digit input, there are ${{\left (n^2 + n \right)}\over 2}$ possible unique single precision multiplications required compared to the $n^2$
|
|
required for multiplication. The following diagram gives an example of the operations required.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{ccccc|c}
|
|
&&1&2&3&\\
|
|
@@ -3080,7 +3080,7 @@
|
|
The baseline squaring algorithm is meant to be a catch-all squaring algorithm. It will handle any of the input sizes that the faster routines
|
|
will not handle.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3159,7 +3159,7 @@
|
|
mp\_word arrays. One array will hold the squares and the other array will hold the double products. With both arrays the doubling and
|
|
carry propagation can be moved to a $O(n)$ work level outside the $O(n^2)$ level. In this case, we have an even simpler solution in mind.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3249,7 +3249,7 @@
|
|
The 100 digit halves will not be squared using Karatsuba, but instead using the faster Comba based squaring algorithm. If Karatsuba multiplication
|
|
were used instead, the 100 digit numbers would be squared with a slower Comba based multiplication.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3342,7 +3342,7 @@
|
|
derive their own Toom-Cook squaring algorithm.
|
|
|
|
\subsection{High Level Squaring}
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3560,7 +3560,7 @@
|
|
is considerably faster than the straightforward $3m^2$ method.
|
|
|
|
\subsection{The Barrett Algorithm}
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3634,7 +3634,7 @@
|
|
In order to use algorithm mp\_reduce the value of $\mu$ must be calculated in advance. Ideally this value should be computed once and stored for
|
|
future use so that the Barrett algorithm can be used without delay.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3680,7 +3680,7 @@
|
|
|
|
From these two simple facts the following simple algorithm can be derived.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3706,7 +3706,7 @@
|
|
final result of the Montgomery algorithm. If $k > lg(n)$ and $0 \le x < n^2$ then the final result is limited to
|
|
$0 \le r < \lfloor x/2^k \rfloor + n$. As a result at most a single subtraction is required to get the residue desired.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|l|}
|
|
@@ -3736,7 +3736,7 @@
|
|
and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful.
|
|
Fortunately there exists an alternative representation of the algorithm.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3758,7 +3758,7 @@
|
|
This algorithm is equivalent since $2^tn$ is a multiple of $n$ and the lower $k$ bits of $x$ are zero by step 2. The number of single
|
|
precision shifts has now been reduced from $2k^2$ to $k^2 + k$ which is only a small improvement.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|l|r|}
|
|
@@ -3791,7 +3791,7 @@
|
|
Instead of computing the reduction on a bit-by-bit basis it is actually much faster to compute it on digit-by-digit basis. Consider the
|
|
previous algorithm re-written to compute the Montgomery reduction in this new fashion.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3849,7 +3849,7 @@
|
|
The baseline Montgomery reduction algorithm will produce the residue for any size input. It is designed to be a catch-all algororithm for
|
|
Montgomery reductions.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3933,7 +3933,7 @@
|
|
With this change in place the Montgomery reduction algorithm can be performed with a Comba style multiplication loop which substantially increases
|
|
the speed of the algorithm.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4014,7 +4014,7 @@
|
|
\subsection{Montgomery Setup}
|
|
To calculate the variable $\rho$ a relatively simple algorithm will be required.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4070,7 +4070,7 @@
|
|
The variable $n$ reduces modulo $n - k$ to $k$. By putting $q = \lfloor x/n \rfloor$ and $r = x \mbox{ mod } n$
|
|
into the equation the original congruence is reproduced, thus concluding the proof. The following algorithm is based on this observation.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4183,7 +4183,7 @@
|
|
of $x$ and $q$. The resulting algorithm is very efficient and can lead to substantial improvements over Barrett and Montgomery reduction when modular
|
|
exponentiations are performed.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4249,7 +4249,7 @@
|
|
To setup the restricted Diminished Radix algorithm the value $k = \beta - n_0$ is required. This algorithm is not really complicated but provided for
|
|
completeness.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4271,7 +4271,7 @@
|
|
Another algorithm which will be useful is the ability to detect a restricted Diminished Radix modulus. An integer is said to be
|
|
of restricted Diminished Radix form if all of the digits are equal to $\beta - 1$ except the trailing digit which may be any value.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4304,7 +4304,7 @@
|
|
In general the restricted Diminished Radix reduction algorithm is much faster since it has considerably lower overhead. However, this new
|
|
algorithm is much faster than either Montgomery or Barrett reduction when the moduli are of the appropriate form.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4346,7 +4346,7 @@
|
|
\subsubsection{Unrestricted Setup}
|
|
To setup this reduction algorithm the value of $k = 2^p - n$ is required.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4385,7 +4385,7 @@
|
|
that there will be value of $k$ that when added to the modulus causes a carry in the first digit which propagates all the way to the most
|
|
significant bit. The resulting sum will be a power of two.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4487,7 +4487,7 @@
|
|
While this current method is a considerable speed up there are further improvements to be made. For example, the $a^{2^i}$ term does not need to
|
|
be computed in an auxilary variable. Consider the following equivalent algorithm.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4539,7 +4539,7 @@
|
|
to be used when a small power of an input is required (\textit{e.g. $a^5$}). It is faster than simply multiplying $b - 1$ times for all values of
|
|
$b$ that are greater than three.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4590,7 +4590,7 @@
|
|
computes the same exponentiation. A group of $k$ bits from the exponent is called a \textit{window}. That is it is a small window on only a
|
|
portion of the entire exponent. Consider the following modification to the basic left to right exponentiation algorithm.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4626,7 +4626,7 @@
|
|
approach is to brute force search amongst the values $k = 2, 3, \ldots, 8$ for the lowest result. Table~\ref{fig:OPTK} lists optimal values of $k$
|
|
for various exponent sizes and compares the number of multiplication and squarings required against algorithm~\ref{fig:LTOR}.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|c|c|c|c|c|c|}
|
|
@@ -4655,7 +4655,7 @@
|
|
|
|
Table~\ref{fig:OPTK2} lists optimal values of $k$ for various exponent sizes and compares the work required against algorithm {\ref{fig:KARY}}.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|c|c|c|c|c|c|}
|
|
@@ -4677,7 +4677,7 @@
|
|
\label{fig:OPTK2}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4728,7 +4728,7 @@
|
|
value of $(1/a) \mbox{ mod }c$ is computed using the modular inverse (\textit{see \ref{sec;modinv}}). If no inverse exists the algorithm
|
|
terminates with an error.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4780,7 +4780,7 @@
|
|
|
|
\subsection{Barrett Modular Exponentiation}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4830,7 +4830,7 @@
|
|
\caption{Algorithm s\_mp\_exptmod}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4944,7 +4944,7 @@
|
|
Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is
|
|
equivalent to $m \cdot 2^k$. By this logic when $m = 1$ a quick power of two can be achieved.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4986,7 +4986,7 @@
|
|
will be used. Let $x$ represent the divisor and $y$ represent the dividend. Let $q$ represent the integer quotient $\lfloor y / x \rfloor$ and
|
|
let $r$ represent the remainder $r = y - x \lfloor y / x \rfloor$. The following simple algorithm will be used to start the discussion.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5082,7 +5082,7 @@
|
|
At most the quotient approaches $2\beta$, however, in practice this will not occur since that would imply the previous quotient digit was too small.
|
|
|
|
\subsection{Radix-$\beta$ Division with Remainder}
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5125,7 +5125,7 @@
|
|
\caption{Algorithm mp\_div}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5248,7 +5248,7 @@
|
|
Both addition and subtraction are performed by ``cheating'' and using mp\_set followed by the higher level addition or subtraction
|
|
algorithms. As a result these algorithms are subtantially simpler with a slight cost in performance.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5281,7 +5281,7 @@
|
|
multiplication algorithm. Essentially this algorithm is a modified version of algorithm s\_mp\_mul\_digs where one of the multiplicands
|
|
only has one digit.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5323,7 +5323,7 @@
|
|
Like the single digit multiplication algorithm, single digit division is also a fairly common algorithm used in radix conversion. Since the
|
|
divisor is only a single digit a specialized variant of the division algorithm can be used to compute the quotient.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5388,7 +5388,7 @@
|
|
such as the real numbers. As a result the root found can be above the true root by few and must be manually adjusted. Ideally at the end of the
|
|
algorithm the $n$'th root $b$ of an integer $a$ is desired such that $b^n \le a$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5441,7 +5441,7 @@
|
|
factoring for example, can make use of random values as starting points to find factors of a composite integer. In this case the algorithm presented
|
|
is solely for simulations and not intended for cryptographic use.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5483,7 +5483,7 @@
|
|
such that they are printable. While outputting as base64 may not be too helpful for human operators it does allow communication via non binary
|
|
mediums.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{cc|cc|cc|cc}
|
|
\hline \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} \\
|
|
@@ -5511,7 +5511,7 @@
|
|
\label{fig:ASC}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5551,7 +5551,7 @@
|
|
\subsection{Generating Radix-$n$ Output}
|
|
Generating radix-$n$ output is fairly trivial with a division and remainder algorithm.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5620,7 +5620,7 @@
|
|
The most common approach (cite) is to reduce one input modulo another. That is if $a$ and $b$ are divisible by some integer $k$ and if $qa + r = b$ then
|
|
$r$ is also divisible by $k$. The reduction pattern follows $\left < a , b \right > \rightarrow \left < b, a \mbox{ mod } b \right >$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5646,7 +5646,7 @@
|
|
greatest common divisors. The faster approach is based on the observation that if $k$ divides both $a$ and $b$ it will also divide $a - b$.
|
|
In particular, we would like $a - b$ to decrease in magnitude which implies that $b \ge a$.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5680,7 +5680,7 @@
|
|
However, instead of factoring $b - a$ to find a suitable value of $p$ the powers of $p$ can be removed from $a$ and $b$ that are in common first.
|
|
Then inside the loop whenever $b - a$ is divisible by some power of $p$ it can be safely removed.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5725,7 +5725,7 @@
|
|
The algorithms presented so far cannot handle inputs which are zero or negative. The following algorithm can handle all input cases properly
|
|
and will produce the greatest common divisor.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5812,7 +5812,7 @@
|
|
Linear Feedback Shift Registers (LFSR) tend to use registers with periods which are co-prime (\textit{e.g. the greatest common divisor is one.}).
|
|
Similarly in number theory if a composite $n$ has two prime factors $p$ and $q$ then maximal order of any unit of $\Z/n\Z$ will be $[ p - 1, q - 1] $.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5935,7 +5935,7 @@
|
|
factors of $p$ do not have to be known. Furthermore, if $(a, p) = 1$ then the algorithm will terminate when the recursion requests the
|
|
Jacobi symbol computation of $\left ( {1 \over a'} \right )$ which is simply $1$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6040,7 +6040,7 @@
|
|
equation.
|
|
|
|
\subsection{General Case}
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6150,7 +6150,7 @@
|
|
approximately $80\%$ of all candidate integers. The constant \textbf{PRIME\_SIZE} is equal to the number of primes in the test base. The
|
|
array \_\_prime\_tab is an array of the first \textbf{PRIME\_SIZE} prime numbers.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6196,7 +6196,7 @@
|
|
integers known as Carmichael numbers will be a pseudo-prime to all valid bases. Fortunately such numbers are extremely rare as $n$ grows
|
|
in size.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6228,7 +6228,7 @@
|
|
value must be equal to $-1$. The squarings are stopped as soon as $-1$ is observed. If the value of $1$ is observed first it means that
|
|
some value not congruent to $\pm 1$ when squared equals one which cannot occur if $n$ is prime.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
diff -Naur libtommath-1.0.old/tommath.tex libtommath-1.0/tommath.tex
|
|
--- libtommath-1.0.old/tommath.tex 2016-02-23 11:24:13.648735326 +0100
|
|
+++ libtommath-1.0/tommath.tex 2016-02-23 11:43:13.591827034 +0100
|
|
@@ -175,7 +175,7 @@
|
|
typical RSA modulus would be at least greater than $10^{309}$. However, modern programming languages such as ISO C \cite{ISOC} and
|
|
Java \cite{JAVA} only provide instrinsic support for integers which are relatively small and single precision.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{tabular}{|r|c|}
|
|
\hline \textbf{Data Type} & \textbf{Range} \\
|
|
@@ -366,7 +366,7 @@
|
|
exercises ranges from one (the easiest) to five (the hardest). The following table sumarizes the
|
|
scoring system used.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|c|l|}
|
|
@@ -534,7 +534,7 @@
|
|
for new algorithms. This methodology allows new algorithms to be tested in a complete framework with relative ease.
|
|
|
|
\begin{center}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\includegraphics{pics/design_process.ps}
|
|
\caption{Design Flow of the First Few Original LibTomMath Functions.}
|
|
\label{pic:design_process}
|
|
@@ -579,7 +579,7 @@
|
|
used within LibTomMath.
|
|
|
|
\index{mp\_int}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
%\begin{verbatim}
|
|
@@ -676,7 +676,7 @@
|
|
\textbf{int} data type with one of the following values (fig \ref{fig:errcodes}).
|
|
|
|
\index{MP\_OKAY} \index{MP\_VAL} \index{MP\_MEM}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{|l|l|}
|
|
\hline \textbf{Value} & \textbf{Meaning} \\
|
|
@@ -725,7 +725,7 @@
|
|
structure are set to valid values. The mp\_init algorithm will perform such an action.
|
|
|
|
\index{mp\_init}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_init}. \\
|
|
@@ -831,7 +831,7 @@
|
|
When an mp\_int is no longer required by the application, the memory that has been allocated for its digits must be
|
|
returned to the application's memory pool with the mp\_clear algorithm.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_clear}. \\
|
|
@@ -925,7 +925,7 @@
|
|
is large enough to simply increase the \textbf{used} digit count. However, when the size of the array is too small it
|
|
must be re-sized appropriately to accomodate the result. The mp\_grow algorithm will provide this functionality.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_grow}. \\
|
|
@@ -1022,7 +1022,7 @@
|
|
of input mp\_ints to a given algorithm. The purpose of algorithm mp\_init\_size is similar to mp\_init except that it
|
|
will allocate \textit{at least} a specified number of digits.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -1108,7 +1108,7 @@
|
|
The purpose of algorithm mp\_init\_multi is to initialize a variable length array of mp\_int structures in a single
|
|
statement. It is essentially a shortcut to multiple initializations.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_init\_multi}. \\
|
|
@@ -1212,7 +1212,7 @@
|
|
positive number which means that if the \textbf{used} count is decremented to zero, the sign must be set to
|
|
\textbf{MP\_ZPOS}.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_clamp}. \\
|
|
@@ -1310,7 +1310,7 @@
|
|
a copy for the purposes of this text. The copy of the mp\_int will be a separate entity that represents the same
|
|
value as the mp\_int it was copied from. The mp\_copy algorithm provides this functionality.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_copy}. \\
|
|
@@ -1479,7 +1479,7 @@
|
|
useful within functions that need to modify an argument but do not wish to actually modify the original copy. The
|
|
mp\_init\_copy algorithm has been designed to help perform this task.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_init\_copy}. \\
|
|
@@ -1527,7 +1527,7 @@
|
|
Reseting an mp\_int to the default state is a common step in many algorithms. The mp\_zero algorithm will be the algorithm used to
|
|
perform this task.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_zero}. \\
|
|
@@ -1579,7 +1579,7 @@
|
|
With the mp\_int representation of an integer, calculating the absolute value is trivial. The mp\_abs algorithm will compute
|
|
the absolute value of an mp\_int.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_abs}. \\
|
|
@@ -1640,7 +1640,7 @@
|
|
With the mp\_int representation of an integer, calculating the negation is also trivial. The mp\_neg algorithm will compute
|
|
the negative of an mp\_int input.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_neg}. \\
|
|
@@ -1703,7 +1703,7 @@
|
|
\subsection{Setting Small Constants}
|
|
Often a mp\_int must be set to a relatively small value such as $1$ or $2$. For these cases the mp\_set algorithm is useful.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_set}. \\
|
|
@@ -1759,7 +1759,7 @@
|
|
To overcome the limitations of the mp\_set algorithm the mp\_set\_int algorithm is ideal. It accepts a ``long''
|
|
data type as input and will always treat it as a 32-bit integer.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_set\_int}. \\
|
|
@@ -1843,7 +1843,7 @@
|
|
|
|
To facilitate working with the results of the comparison functions three constants are required.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{|r|l|}
|
|
\hline \textbf{Constant} & \textbf{Meaning} \\
|
|
@@ -1856,7 +1856,7 @@
|
|
\caption{Comparison Return Codes}
|
|
\end{figure}
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_cmp\_mag}. \\
|
|
@@ -1938,7 +1938,7 @@
|
|
Comparing with sign considerations is also fairly critical in several routines (\textit{division for example}). Based on an unsigned magnitude
|
|
comparison a trivial signed comparison algorithm can be written.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_cmp}. \\
|
|
@@ -2047,7 +2047,7 @@
|
|
Historically that convention stems from the MPI library where ``s\_'' stood for static functions that were hidden from the developer entirely.
|
|
|
|
\newpage
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{l}
|
|
@@ -2244,7 +2244,7 @@
|
|
For example, the default for LibTomMath is to use a ``unsigned long'' for the mp\_digit ``type'' while $\beta = 2^{28}$. In ISO C an ``unsigned long''
|
|
data type must be able to represent $0 \le x < 2^{32}$ meaning that in this case $\gamma \ge 32$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{l}
|
|
@@ -2411,7 +2411,7 @@
|
|
Recall from section 5.2 that an mp\_int represents an integer with an unsigned mantissa (\textit{the array of digits}) and a \textbf{sign}
|
|
flag. A high level addition is actually performed as a series of eight separate cases which can be optimized down to three unique cases.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_add}. \\
|
|
@@ -2440,7 +2440,7 @@
|
|
either \cite{TAOCPV2} or \cite{HAC} since they both only provide unsigned operations. The algorithm is fairly
|
|
straightforward but restricted since subtraction can only produce positive results.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|}
|
|
@@ -2529,7 +2529,7 @@
|
|
\subsection{High Level Subtraction}
|
|
The high level signed subtraction algorithm is essentially the same as the high level signed addition algorithm.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
\hline Algorithm \textbf{mp\_sub}. \\
|
|
@@ -2561,7 +2561,7 @@
|
|
\cite{HAC}. Also this algorithm is restricted by algorithm s\_mp\_sub. Chart \ref{fig:SubChart} lists the eight possible inputs and
|
|
the operations required.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|}
|
|
@@ -2651,7 +2651,7 @@
|
|
In a binary system where the radix is a power of two multiplication by two not only arises often in other algorithms it is a fairly efficient
|
|
operation to perform. A single precision logical shift left is sufficient to multiply a single digit by two.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2775,7 +2775,7 @@
|
|
\subsection{Division by Two}
|
|
A division by two can just as easily be accomplished with a logical shift right as multiplication by two can be with a logical shift left.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2886,7 +2886,7 @@
|
|
degree. In this case $f(x) \cdot x = a_n x^{n+1} + a_{n-1} x^n + ... + a_0 x$. From a scalar basis point of view multiplying by $x$ is equivalent to
|
|
multiplying by the integer $\beta$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -2929,7 +2929,7 @@
|
|
|
|
\newpage
|
|
\begin{center}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\includegraphics{pics/sliding_window.ps}
|
|
\caption{Sliding Window Movement}
|
|
\label{pic:sliding_window}
|
|
@@ -3001,7 +3001,7 @@
|
|
|
|
Division by powers of $x$ is easily achieved by shifting the digits right and removing any that will end up to the right of the zero'th digit.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3115,7 +3115,7 @@
|
|
|
|
\subsection{Multiplication by Power of Two}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3249,7 +3249,7 @@
|
|
|
|
\subsection{Division by Power of Two}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3387,7 +3387,7 @@
|
|
The last algorithm in the series of polynomial basis power of two algorithms is calculating the remainder of division by $2^b$. This
|
|
algorithm benefits from the fact that in twos complement arithmetic $a \mbox{ (mod }2^b\mbox{)}$ is the same as $a$ AND $2^b - 1$.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3541,7 +3541,7 @@
|
|
include $\alpha$ which shall represent the number of bits in the type \textbf{mp\_word}. This implies that $2^{\alpha} > 2 \cdot \beta^2$. The
|
|
constant $\delta = 2^{\alpha - 2lg(\beta)}$ will represent the maximal weight of any column in a product (\textit{see sub-section 5.2.2 for more information}).
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3600,7 +3600,7 @@
|
|
For example, consider multiplying $576$ by $241$. That is equivalent to computing $10^0(1)(576) + 10^1(4)(576) + 10^2(2)(576)$ which is best
|
|
visualized in the following table.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|c|l|}
|
|
\hline && & 5 & 7 & 6 & \\
|
|
@@ -3753,7 +3753,7 @@
|
|
Where $\vec x_n$ is the $n'th$ column of the output vector. Consider the following example which computes the vector $\vec x$ for the multiplication
|
|
of $576$ and $241$.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|c|c|c|c|c|}
|
|
@@ -3774,7 +3774,7 @@
|
|
Now the columns must be fixed by propagating the carry upwards. The resultant vector will have one extra dimension over the input vector which is
|
|
congruent to adding a leading zero digit.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -3830,7 +3830,7 @@
|
|
the smaller input may not have more than $256$ digits if the Comba method is to be used. This is quite satisfactory for most applications since
|
|
$256$ digits would allow for numbers in the range of $0 \le x < 2^{7168}$ which, is much larger than most public key cryptographic algorithms require.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4129,7 +4129,7 @@
|
|
of this system of equations has made Karatsuba fairly popular. In fact the cutoff point is often fairly low\footnote{With LibTomMath 0.18 it is 70 and 109 digits for the Intel P4 and AMD Athlon respectively.}
|
|
making it an ideal algorithm to speed up certain public key cryptosystems such as RSA and Diffie-Hellman.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4387,7 +4387,7 @@
|
|
the algorithm can be faster than a baseline multiplication. However, the greater complexity of this algorithm places the cutoff point
|
|
(\textbf{TOOM\_MUL\_CUTOFF}) where Toom-Cook becomes more efficient much higher than the Karatsuba cutoff point.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4424,7 +4424,7 @@
|
|
\caption{Algorithm mp\_toom\_mul}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4768,7 +4768,7 @@
|
|
Now that algorithms to handle multiplications of every useful dimensions have been developed, a rather simple finishing touch is required. So far all
|
|
of the multiplication algorithms have been unsigned multiplications which leaves only a signed multiplication algorithm to be established.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -4875,7 +4875,7 @@
|
|
For any $n$-digit input, there are ${{\left (n^2 + n \right)}\over 2}$ possible unique single precision multiplications required compared to the $n^2$
|
|
required for multiplication. The following diagram gives an example of the operations required.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{ccccc|c}
|
|
&&1&2&3&\\
|
|
@@ -4903,7 +4903,7 @@
|
|
The baseline squaring algorithm is meant to be a catch-all squaring algorithm. It will handle any of the input sizes that the faster routines
|
|
will not handle.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5052,7 +5052,7 @@
|
|
mp\_word arrays. One array will hold the squares and the other array will hold the double products. With both arrays the doubling and
|
|
carry propagation can be moved to a $O(n)$ work level outside the $O(n^2)$ level. In this case, we have an even simpler solution in mind.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5242,7 +5242,7 @@
|
|
The 100 digit halves will not be squared using Karatsuba, but instead using the faster Comba based squaring algorithm. If Karatsuba multiplication
|
|
were used instead, the 100 digit numbers would be squared with a slower Comba based multiplication.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5442,7 +5442,7 @@
|
|
derive their own Toom-Cook squaring algorithm.
|
|
|
|
\subsection{High Level Squaring}
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5705,7 +5705,7 @@
|
|
is considerably faster than the straightforward $3m^2$ method.
|
|
|
|
\subsection{The Barrett Algorithm}
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5865,7 +5865,7 @@
|
|
In order to use algorithm mp\_reduce the value of $\mu$ must be calculated in advance. Ideally this value should be computed once and stored for
|
|
future use so that the Barrett algorithm can be used without delay.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5931,7 +5931,7 @@
|
|
|
|
From these two simple facts the following simple algorithm can be derived.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -5957,7 +5957,7 @@
|
|
final result of the Montgomery algorithm. If $k > lg(n)$ and $0 \le x < n^2$ then the final result is limited to
|
|
$0 \le r < \lfloor x/2^k \rfloor + n$. As a result at most a single subtraction is required to get the residue desired.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|l|}
|
|
@@ -5987,7 +5987,7 @@
|
|
and $k^2$ single precision additions. At this rate the algorithm is most certainly slower than Barrett reduction and not terribly useful.
|
|
Fortunately there exists an alternative representation of the algorithm.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6009,7 +6009,7 @@
|
|
This algorithm is equivalent since $2^tn$ is a multiple of $n$ and the lower $k$ bits of $x$ are zero by step 2. The number of single
|
|
precision shifts has now been reduced from $2k^2$ to $k^2 + k$ which is only a small improvement.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{|c|l|r|}
|
|
@@ -6042,7 +6042,7 @@
|
|
Instead of computing the reduction on a bit-by-bit basis it is actually much faster to compute it on digit-by-digit basis. Consider the
|
|
previous algorithm re-written to compute the Montgomery reduction in this new fashion.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6100,7 +6100,7 @@
|
|
The baseline Montgomery reduction algorithm will produce the residue for any size input. It is designed to be a catch-all algororithm for
|
|
Montgomery reductions.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6287,7 +6287,7 @@
|
|
With this change in place the Montgomery reduction algorithm can be performed with a Comba style multiplication loop which substantially increases
|
|
the speed of the algorithm.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6526,7 +6526,7 @@
|
|
\subsection{Montgomery Setup}
|
|
To calculate the variable $\rho$ a relatively simple algorithm will be required.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6627,7 +6627,7 @@
|
|
The variable $n$ reduces modulo $n - k$ to $k$. By putting $q = \lfloor x/n \rfloor$ and $r = x \mbox{ mod } n$
|
|
into the equation the original congruence is reproduced, thus concluding the proof. The following algorithm is based on this observation.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6740,7 +6740,7 @@
|
|
of $x$ and $q$. The resulting algorithm is very efficient and can lead to substantial improvements over Barrett and Montgomery reduction when modular
|
|
exponentiations are performed.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6888,7 +6888,7 @@
|
|
To setup the restricted Diminished Radix algorithm the value $k = \beta - n_0$ is required. This algorithm is not really complicated but provided for
|
|
completeness.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6928,7 +6928,7 @@
|
|
Another algorithm which will be useful is the ability to detect a restricted Diminished Radix modulus. An integer is said to be
|
|
of restricted Diminished Radix form if all of the digits are equal to $\beta - 1$ except the trailing digit which may be any value.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -6990,7 +6990,7 @@
|
|
In general the restricted Diminished Radix reduction algorithm is much faster since it has considerably lower overhead. However, this new
|
|
algorithm is much faster than either Montgomery or Barrett reduction when the moduli are of the appropriate form.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7081,7 +7081,7 @@
|
|
\subsubsection{Unrestricted Setup}
|
|
To setup this reduction algorithm the value of $k = 2^p - n$ is required.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7153,7 +7153,7 @@
|
|
that there will be value of $k$ that when added to the modulus causes a carry in the first digit which propagates all the way to the most
|
|
significant bit. The resulting sum will be a power of two.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7293,7 +7293,7 @@
|
|
While this current method is a considerable speed up there are further improvements to be made. For example, the $a^{2^i}$ term does not need to
|
|
be computed in an auxilary variable. Consider the following equivalent algorithm.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7345,7 +7345,7 @@
|
|
to be used when a small power of an input is required (\textit{e.g. $a^5$}). It is faster than simply multiplying $b - 1$ times for all values of
|
|
$b$ that are greater than three.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7465,7 +7465,7 @@
|
|
computes the same exponentiation. A group of $k$ bits from the exponent is called a \textit{window}. That is it is a small window on only a
|
|
portion of the entire exponent. Consider the following modification to the basic left to right exponentiation algorithm.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7501,7 +7501,7 @@
|
|
approach is to brute force search amongst the values $k = 2, 3, \ldots, 8$ for the lowest result. Table~\ref{fig:OPTK} lists optimal values of $k$
|
|
for various exponent sizes and compares the number of multiplication and squarings required against algorithm~\ref{fig:LTOR}.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|c|c|c|c|c|c|}
|
|
@@ -7530,7 +7530,7 @@
|
|
|
|
Table~\ref{fig:OPTK2} lists optimal values of $k$ for various exponent sizes and compares the work required against algorithm {\ref{fig:KARY}}.
|
|
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{small}
|
|
\begin{tabular}{|c|c|c|c|c|c|}
|
|
@@ -7552,7 +7552,7 @@
|
|
\label{fig:OPTK2}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7603,7 +7603,7 @@
|
|
value of $(1/a) \mbox{ mod }c$ is computed using the modular inverse (\textit{see \ref{sec;modinv}}). If no inverse exists the algorithm
|
|
terminates with an error.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7754,7 +7754,7 @@
|
|
|
|
\subsection{Barrett Modular Exponentiation}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7804,7 +7804,7 @@
|
|
\caption{Algorithm s\_mp\_exptmod}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -7896,7 +7896,7 @@
|
|
the two cases of $mode = 1$ and $mode = 2$ respectively.
|
|
|
|
\begin{center}
|
|
-\begin{figure}[here]
|
|
+\begin{figure}[h]
|
|
\includegraphics{pics/expt_state.ps}
|
|
\caption{Sliding Window State Diagram}
|
|
\label{pic:expt_state}
|
|
@@ -8163,7 +8163,7 @@
|
|
Calculating $b = 2^a$ can be performed much quicker than with any of the previous algorithms. Recall that a logical shift left $m << k$ is
|
|
equivalent to $m \cdot 2^k$. By this logic when $m = 1$ a quick power of two can be achieved.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -8239,7 +8239,7 @@
|
|
will be used. Let $x$ represent the divisor and $y$ represent the dividend. Let $q$ represent the integer quotient $\lfloor y / x \rfloor$ and
|
|
let $r$ represent the remainder $r = y - x \lfloor y / x \rfloor$. The following simple algorithm will be used to start the discussion.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -8335,7 +8335,7 @@
|
|
At most the quotient approaches $2\beta$, however, in practice this will not occur since that would imply the previous quotient digit was too small.
|
|
|
|
\subsection{Radix-$\beta$ Division with Remainder}
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -8378,7 +8378,7 @@
|
|
\caption{Algorithm mp\_div}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -8782,7 +8782,7 @@
|
|
Both addition and subtraction are performed by ``cheating'' and using mp\_set followed by the higher level addition or subtraction
|
|
algorithms. As a result these algorithms are subtantially simpler with a slight cost in performance.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -8913,7 +8913,7 @@
|
|
multiplication algorithm. Essentially this algorithm is a modified version of algorithm s\_mp\_mul\_digs where one of the multiplicands
|
|
only has one digit.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9020,7 +9020,7 @@
|
|
Like the single digit multiplication algorithm, single digit division is also a fairly common algorithm used in radix conversion. Since the
|
|
divisor is only a single digit a specialized variant of the division algorithm can be used to compute the quotient.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9186,7 +9186,7 @@
|
|
such as the real numbers. As a result the root found can be above the true root by few and must be manually adjusted. Ideally at the end of the
|
|
algorithm the $n$'th root $b$ of an integer $a$ is desired such that $b^n \le a$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9255,7 +9255,7 @@
|
|
factoring for example, can make use of random values as starting points to find factors of a composite integer. In this case the algorithm presented
|
|
is solely for simulations and not intended for cryptographic use.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9339,7 +9339,7 @@
|
|
such that they are printable. While outputting as base64 may not be too helpful for human operators it does allow communication via non binary
|
|
mediums.
|
|
|
|
-\newpage\begin{figure}[here]
|
|
+\newpage\begin{figure}[h]
|
|
\begin{center}
|
|
\begin{tabular}{cc|cc|cc|cc}
|
|
\hline \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} & \textbf{Value} & \textbf{Char} \\
|
|
@@ -9367,7 +9367,7 @@
|
|
\label{fig:ASC}
|
|
\end{figure}
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9478,7 +9478,7 @@
|
|
\subsection{Generating Radix-$n$ Output}
|
|
Generating radix-$n$ output is fairly trivial with a division and remainder algorithm.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9608,7 +9608,7 @@
|
|
The most common approach (cite) is to reduce one input modulo another. That is if $a$ and $b$ are divisible by some integer $k$ and if $qa + r = b$ then
|
|
$r$ is also divisible by $k$. The reduction pattern follows $\left < a , b \right > \rightarrow \left < b, a \mbox{ mod } b \right >$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9634,7 +9634,7 @@
|
|
greatest common divisors. The faster approach is based on the observation that if $k$ divides both $a$ and $b$ it will also divide $a - b$.
|
|
In particular, we would like $a - b$ to decrease in magnitude which implies that $b \ge a$.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9668,7 +9668,7 @@
|
|
However, instead of factoring $b - a$ to find a suitable value of $p$ the powers of $p$ can be removed from $a$ and $b$ that are in common first.
|
|
Then inside the loop whenever $b - a$ is divisible by some power of $p$ it can be safely removed.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9713,7 +9713,7 @@
|
|
The algorithms presented so far cannot handle inputs which are zero or negative. The following algorithm can handle all input cases properly
|
|
and will produce the greatest common divisor.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -9891,7 +9891,7 @@
|
|
Linear Feedback Shift Registers (LFSR) tend to use registers with periods which are co-prime (\textit{e.g. the greatest common divisor is one.}).
|
|
Similarly in number theory if a composite $n$ has two prime factors $p$ and $q$ then maximal order of any unit of $\Z/n\Z$ will be $[ p - 1, q - 1] $.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -10060,7 +10060,7 @@
|
|
factors of $p$ do not have to be known. Furthermore, if $(a, p) = 1$ then the algorithm will terminate when the recursion requests the
|
|
Jacobi symbol computation of $\left ( {1 \over a'} \right )$ which is simply $1$.
|
|
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -10268,7 +10268,7 @@
|
|
equation.
|
|
|
|
\subsection{General Case}
|
|
-\newpage\begin{figure}[!here]
|
|
+\newpage\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -10407,7 +10407,7 @@
|
|
approximately $80\%$ of all candidate integers. The constant \textbf{PRIME\_SIZE} is equal to the number of primes in the test base. The
|
|
array \_\_prime\_tab is an array of the first \textbf{PRIME\_SIZE} prime numbers.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -10536,7 +10536,7 @@
|
|
integers known as Carmichael numbers will be a pseudo-prime to all valid bases. Fortunately such numbers are extremely rare as $n$ grows
|
|
in size.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|
|
@@ -10616,7 +10616,7 @@
|
|
value must be equal to $-1$. The squarings are stopped as soon as $-1$ is observed. If the value of $1$ is observed first it means that
|
|
some value not congruent to $\pm 1$ when squared equals one which cannot occur if $n$ is prime.
|
|
|
|
-\begin{figure}[!here]
|
|
+\begin{figure}[!h]
|
|
\begin{small}
|
|
\begin{center}
|
|
\begin{tabular}{l}
|