
Also optimize / 4 and % 4. I assumed the compiler would do this
automatically but the performance bump implies it's not doing that.

            Before | Optimized
No dither:  4.8 ms | 3.5 ms
Dither   :  9.6 ms | 4.2 ms

Before: https://drive.google.com/file/d/0B07DogHRdEHcaXVobi1wZ2wxeUE/view?usp=sharing
After: https://drive.google.com/file/d/0B07DogHRdEHcVS1PN05kaU1odm8/view?usp=sharing

Known issue: The remainder from the last Y pixel will leak into the first
U pixel. Also U and V remainders leak into each other but I don't think it
causes any perceptual difference.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=150255151
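
Illustration of the / 4 and % 4 note above. This is a minimal sketch, not the code from this change: it assumes the optimization replaces the division and modulo with a shift and a mask, and that the operands are signed but never negative. The helper names (DivBy4, ModBy4, MatchesPlainArithmetic) are hypothetical. If the operands are signed, that is one plausible reason the compiler cannot emit a bare shift or mask on its own: C++ division truncates toward zero, so the results differ for negative inputs and extra fix-up instructions are needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from the change.

// Same result as x / 4, but only for non-negative x.
inline int32_t DivBy4(int32_t x) {
  assert(x >= 0);  // Assumption: pixel values and remainders are never negative.
  return x >> 2;
}

// Same result as x % 4, but only for non-negative x.
inline int32_t ModBy4(int32_t x) {
  assert(x >= 0);  // Assumption: same as above.
  return x & 3;
}

// Checks the shift/mask forms against plain arithmetic on [0, limit).
// For negative x they would disagree: -1 / 4 == 0 and -1 % 4 == -1,
// while (-1 & 3) == 3 on two's-complement targets.
inline bool MatchesPlainArithmetic(int32_t limit) {
  for (int32_t x = 0; x < limit; ++x) {
    if (DivBy4(x) != x / 4 || ModBy4(x) != x % 4) return false;
  }
  return true;
}
```

Declaring the operands as unsigned would likely let the compiler make the same substitution itself, since unsigned division by a power of two has no negative case to handle.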