Dynamic Partitioned Multiply

This is complicated! It is necessary to compute a full NxN matrix of partial multiplication results, then perform a cascade of adds (long multiplication, in binary) using PartitionedAdd, which will "automatically" break the results down into segments, keeping each partitioned result separate at all times.
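To make the idea concrete, here is a minimal behavioural sketch in plain Python integers, not hardware and not the actual PartitionedAdd code. The names `partition_spans` and `partitioned_mul`, the 8-bit chunk size, the `breaks` list (True meaning a partition boundary between two adjacent chunks) and the choice to truncate each lane's product to the lane's own width are all assumptions made for illustration.

```python
def partition_spans(n_chunks, breaks):
    """Return (start, end) chunk ranges, one per partition.

    breaks[i] is True when there is a partition boundary between
    chunk i and chunk i+1 (hypothetical encoding, for illustration).
    """
    spans = []
    start = 0
    for i in range(n_chunks - 1):
        if breaks[i]:
            spans.append((start, i + 1))
            start = i + 1
    spans.append((start, n_chunks))
    return spans


def partitioned_mul(a, b, breaks, n_chunks=4, chunk_bits=8):
    """Model of a partitioned multiply, truncated to each lane's width."""
    mask = (1 << chunk_bits) - 1
    a_chunks = [(a >> (i * chunk_bits)) & mask for i in range(n_chunks)]
    b_chunks = [(b >> (i * chunk_bits)) & mask for i in range(n_chunks)]

    # Full NxN matrix of partial products, one per pair of chunks.
    partial = [[a_chunks[i] * b_chunks[j] for j in range(n_chunks)]
               for i in range(n_chunks)]

    result = 0
    for start, end in partition_spans(n_chunks, breaks):
        lane_bits = (end - start) * chunk_bits
        acc = 0
        # Long multiplication restricted to this partition: only partial
        # products whose two chunks both lie inside the lane are summed.
        for i in range(start, end):
            for j in range(start, end):
                shift = ((i - start) + (j - start)) * chunk_bits
                acc += partial[i][j] << shift
        # Masking keeps carries from leaking into the next partition.
        acc &= (1 << lane_bits) - 1
        result |= acc << (start * chunk_bits)
    return result
```

With every break set, a 32-bit input behaves as four independent 8-bit multiplies; with none set, it behaves as a single 32-bit multiply. In hardware the per-lane masking would not be a separate step: per the description above, it is the partitioned adder that stops carries at the partition boundaries.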

The Wallace Tree algorithm is presently deployed here; we need to use the (more efficient) Dadda algorithm.
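As a small aid for that change, the sketch below computes the per-stage target heights that a Dadda reduction works towards. It uses the standard Dadda sequence (d_1 = 2, d_{j+1} = floor(1.5 * d_j), giving 2, 3, 4, 6, 9, 13, ...); the function name `dadda_heights` is hypothetical and the code does not reflect how the existing Wallace Tree code is organised.

```python
def dadda_heights(n_rows):
    """Return the target column heights for each Dadda reduction stage.

    Stages are listed in the order they are applied: the first target is
    the largest value in the Dadda sequence that is strictly less than
    n_rows, and the last target is always 2 (two rows left for the
    final adder).
    """
    heights = []
    d = 2
    while d < n_rows:
        heights.append(d)
        d = (3 * d) // 2  # floor(1.5 * d)
    return heights[::-1]
```

For example, eight rows of partial products would be reduced in four stages, to heights 6, 4, 3 and then 2. At each stage Dadda only inserts as many full and half adders as are needed to reach the target height, which is where its saving over a Wallace Tree comes from.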