I sped up the ECC again. On an 885 with an 8-bit window I get ECC-192/224/256 point multiplications in 374K/461K/572K cycles. With a 12-bit window that drops to 297K/369K/452K … yes, that's right, more than 8000 ECC-192 point muls per second in software! Keep in mind the fastest I got DUAL-threaded RSA-1024 was ~4000/sec. This is a single-threaded implementation; in theory two threads would get close to a 2x boost. You have to define MECC_FP to use it…
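
For anyone who wants to turn those cycle counts into operations per second, here is a rough sketch of the arithmetic. The clock rate is an assumption (the post doesn't state it; 2.6 GHz is used purely for illustration), and the helper names `muls_per_sec` and `print_throughput` are made up for this example, not part of any library. The cycle counts are the ones quoted above.

```c
#include <stdio.h>

/* Hypothetical helper: point multiplications per second for a given
 * clock rate (Hz) and cost per multiplication (cycles). */
static double muls_per_sec(double clock_hz, double cycles_per_mul)
{
    return clock_hz / cycles_per_mul;
}

/* Hypothetical helper: print throughput for both window sizes.
 * clock_hz is an ASSUMPTION supplied by the caller; the post does
 * not give the clock speed of the test machine. */
static void print_throughput(double clock_hz)
{
    static const char *curve[3] = { "ECC-192", "ECC-224", "ECC-256" };
    /* Cycle counts quoted in the post. */
    static const double w8[3]  = { 374e3, 461e3, 572e3 };  /* 8-bit window  */
    static const double w12[3] = { 297e3, 369e3, 452e3 };  /* 12-bit window */
    int i;

    for (i = 0; i < 3; i++) {
        printf("%s: %.0f/sec (8-bit), %.0f/sec (12-bit), speedup %.2fx\n",
               curve[i],
               muls_per_sec(clock_hz, w8[i]),
               muls_per_sec(clock_hz, w12[i]),
               w8[i] / w12[i]);
    }
}
```

Calling `print_throughput(2.6e9)` from `main` prints one line per curve. Under that assumed clock, 297K cycles works out to roughly 8,750 ECC-192 point muls per second, which is consistent with the "more than 8000" figure; a different clock scales every number proportionally.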