Two things:
Previous experiments showed speculative decoding (SD) acceptance rates of only 23.8% (0.6B draft) and 40.5% (1.7B draft); 70%+ is needed for SD to beat no-SD throughput. Investigate why the acceptance rates are so low.
Report: root cause hypothesis, any config/model changes tried, acceptance rates observed, recommendation (proceed with SD or abandon for this workload).
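For context on why acceptance rate matters so much, here is a rough sketch of the standard idealized speedup estimate for speculative decoding. It assumes each drafted token is accepted independently with probability `alpha`, which may not match how the 23.8% and 40.5% figures were measured. The draft length `k` and the draft-to-target cost ratio `cost_ratio` are placeholders, not values from our runs. The model ignores batching and scheduling overhead, so the real break-even point can sit higher than it predicts; measured throughput should decide.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes k drafted tokens per pass and an independent per-token
    acceptance probability alpha (an idealization).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, cost_ratio: float) -> float:
    """Idealized speedup over plain decoding.

    cost_ratio is the cost of one draft forward pass divided by the cost
    of one target forward pass (hypothetical; measure it for the real models).
    """
    return expected_tokens_per_step(alpha, k) / (k * cost_ratio + 1.0)


if __name__ == "__main__":
    # Acceptance rates from the brief, plus the 70% target.
    # k values and cost_ratio are illustrative assumptions only.
    cost_ratio = 0.1
    for label, alpha in [("0.6B draft", 0.238), ("1.7B draft", 0.405), ("target", 0.70)]:
        for k in (1, 2, 4):
            s = estimated_speedup(alpha, k, cost_ratio)
            print(f"{label:>10}  alpha={alpha:.3f}  k={k}  est. speedup={s:.2f}x")
```

Plugging in the measured `cost_ratio` and the draft lengths actually tried would help show whether the low acceptance rate alone explains the throughput loss or whether serving overhead is also a factor.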
We spun up A100 nodes for fine-tuning experiments (Intent LoRA, SD benchmarking). Verify all of them are shut down.
Be careful not to affect other running Parasail production jobs. Query and report before taking any action; stop idle experiment nodes only.
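The "query and report before acting" step could look something like the dry-run sketch below. It only builds a plan and stops nothing. The `Node` record, the `purpose` label, and the `running_jobs` count are hypothetical stand-ins for whatever the real inventory query returns; they are not an actual Parasail or cloud-provider API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical inventory record; real field names will differ.
    name: str
    gpu_type: str
    running_jobs: int
    labels: dict = field(default_factory=dict)


def plan_shutdown(nodes, experiment_label="experiment"):
    """Split nodes into (to_stop, to_keep) without touching anything.

    A node is a shutdown candidate only if it is explicitly labeled as an
    experiment node AND has no running jobs. Everything else is kept,
    including unlabeled nodes, to stay on the safe side of production.
    """
    to_stop, to_keep = [], []
    for node in nodes:
        is_experiment = node.labels.get("purpose") == experiment_label
        is_idle = node.running_jobs == 0
        if is_experiment and is_idle:
            to_stop.append(node)
        else:
            to_keep.append(node)
    return to_stop, to_keep


if __name__ == "__main__":
    # Made-up inventory purely for illustration.
    inventory = [
        Node("a100-exp-1", "A100", 0, {"purpose": "experiment"}),
        Node("a100-exp-2", "A100", 1, {"purpose": "experiment"}),
        Node("a100-prod-1", "A100", 0, {"purpose": "production"}),
        Node("a100-unknown", "A100", 0),
    ]
    stop, keep = plan_shutdown(inventory)
    print("Would stop:", [n.name for n in stop])
    print("Would keep:", [n.name for n in keep])
```

The plan should be reported and confirmed before any stop command is issued against the nodes in the first list.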
No activity yet.