Victory is mine! I managed to speed up gcm_gf_mult() which ...

Victory is mine! I managed to speed up gcm_gf_mult(), which in turn speeds up LRW and GCM state creation. It took me 5 hours to track down a simple off-by-one bug in the damn multiplier. Everything works now. Wee.
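
For anyone wondering what the multiplier actually computes, here is a minimal sketch of the slow, bit-at-a-time GF(2^128) multiply as the GCM spec describes it. This is not the sped-up gcm_gf_mult() and not the library's code; the name gf128_mul_ref is made up for illustration. Something like this is handy as a reference to check a faster version against, since the loop bounds and bit indices are exactly the sort of place an off-by-one can hide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference routine (assumed name, not the library's gcm_gf_mult()).
 * Computes z = x * y in GF(2^128) using GCM's bit ordering:
 * bit 0 is the most significant bit of byte 0. */
void gf128_mul_ref(const uint8_t x[16], const uint8_t y[16], uint8_t z[16])
{
    uint8_t acc[16] = {0};  /* running product */
    uint8_t v[16];          /* y multiplied by successive powers of the field's "x" */
    int i, j;

    memcpy(v, y, 16);

    for (i = 0; i < 128; i++) {  /* exactly 128 iterations: bits 0..127 */
        /* Test bit i of x, counting from the MSB of byte 0. */
        if (x[i >> 3] & (0x80 >> (i & 7))) {
            for (j = 0; j < 16; j++) {
                acc[j] ^= v[j];
            }
        }

        /* Remember the bit that is about to fall off the right end. */
        int carry = v[15] & 1;

        /* Shift v right by one bit across all 16 bytes. */
        for (j = 15; j > 0; j--) {
            v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
        }
        v[0] >>= 1;

        /* Reduce modulo x^128 + x^7 + x^2 + x + 1 (0xE1 in GCM bit order). */
        if (carry) {
            v[0] ^= 0xE1;
        }
    }

    /* Copy out last so z may alias x or y. */
    memcpy(z, acc, 16);
}
```

A quick sanity check is to multiply by the block 0x80 followed by fifteen zero bytes, which is the field's "1" in this bit order, and confirm the other operand comes back unchanged.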