t-791 - omni


Created 4 days ago · Updated 4 days ago

Description

Goal

This task has two parts:

1. Acceptance rate investigation

Previous experiments measured speculative decoding (SD) acceptance rates of only 23.8% with a 0.6B draft model and 40.5% with a 1.7B draft model, whereas acceptance of 70% or more is needed for SD to beat no-SD throughput. Investigate why:

Are the draft models a poor distributional match for the target?
Is there a quantization mismatch degrading draft quality?
Are speculation parameters (draft length, temperature) misconfigured?
Would a larger or better-matched draft model (e.g., one from the target's model family, with a smaller quant) improve acceptance?
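One way to test the distributional-match and temperature questions directly: under standard speculative sampling (Leviathan et al., 2023), the probability that a single draft token is accepted is the sum over the vocabulary of min(p(x), q(x)), where p is the target's next-token distribution and q is the draft's. Equivalently, it is one minus their total variation distance. The minimal sketch below computes that from per-position logits. It assumes both models share a tokenizer and that logits for the same prefixes can be dumped from each model. `expected_acceptance` and `mean_acceptance` are hypothetical helpers, not part of any serving framework.

```python
import math

def softmax(logits, temperature):
    """Convert raw logits to probabilities at the given temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_acceptance(target_logits, draft_logits, temperature=1.0):
    """Chance that one draft token is accepted by speculative sampling at one position."""
    if temperature == 0:
        # Greedy decoding: the draft token is accepted only if both argmaxes agree.
        t = max(range(len(target_logits)), key=target_logits.__getitem__)
        d = max(range(len(draft_logits)), key=draft_logits.__getitem__)
        return 1.0 if t == d else 0.0
    p = softmax(target_logits, temperature)  # target distribution
    q = softmax(draft_logits, temperature)   # draft distribution
    # Overlap of the two distributions = 1 - total variation distance.
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def mean_acceptance(logit_pairs, temperature=1.0):
    """Average expected acceptance over (target_logits, draft_logits) pairs."""
    values = [expected_acceptance(t, d, temperature) for t, d in logit_pairs]
    return sum(values) / len(values)
```

Running `mean_acceptance` on the same prompts for a full-precision draft and its quantized variant would help separate the quantization hypothesis from a plain model mismatch. Sweeping `temperature` shows how much of the gap is due to sampling settings. Production runtimes may apply extra logit processing (top-p, penalties) that this sketch ignores.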

Report: root-cause hypothesis, any config or model changes tried, acceptance rates observed, and a recommendation (proceed with SD or abandon it for this workload).
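The 70% threshold depends on the draft length and on how expensive the draft is relative to the target, so it is worth re-deriving for the actual configuration before making a recommendation. Below is a minimal sketch of the idealized model from Leviathan et al. (2023). It assumes each draft token is accepted independently with probability alpha, a draft length gamma, and a per-token draft cost `cost_ratio` expressed as a fraction of one target forward pass. Expected tokens per target pass is (1 - alpha^(gamma+1)) / (1 - alpha), and expected speedup is that value divided by (gamma * cost_ratio + 1). The model ignores batching, KV-cache, and kernel overheads, so treat it as a sanity check on the break-even point, not a throughput prediction. The function names are hypothetical. Also check that the runtime's reported "acceptance rate" means per-token acceptance probability before plugging it in as alpha.

```python
def expected_tokens_per_step(alpha, gamma):
    """Expected tokens emitted per target forward pass.

    Equals 1 + alpha + ... + alpha**gamma, the closed form
    (1 - alpha**(gamma + 1)) / (1 - alpha) without the division at alpha == 1.
    """
    return sum(alpha ** i for i in range(gamma + 1))

def expected_speedup(alpha, gamma, cost_ratio):
    """Idealized walltime speedup of SD over plain decoding (> 1 means SD wins)."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1)

def break_even_alpha(gamma, cost_ratio, tol=1e-6):
    """Smallest alpha at which expected_speedup reaches 1.0, found by bisection.

    Returns None if SD cannot break even even at alpha == 1.
    """
    if expected_speedup(1.0, gamma, cost_ratio) < 1.0:
        return None
    lo, hi = 0.0, 1.0
    # expected_speedup increases monotonically with alpha, so bisection is valid.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_speedup(mid, gamma, cost_ratio) >= 1.0:
            hi = mid
        else:
            lo = mid
    return hi
```

For example, calling `break_even_alpha` with the draft length used in the earlier runs and a measured draft/target cost ratio gives the acceptance rate this idealized model requires. Comparing that value with the observed 23.8% and 40.5% shows whether the 70% figure holds for those settings or whether a shorter draft length would lower the bar.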

2. Idle A100 verification

We spun up A100 nodes for fine-tuning and benchmarking experiments (Intent LoRA, SD benchmarking). Verify that all of them have been shut down.

Check Parasail for any running A100 instances associated with these experiments
Do NOT touch nodes running production workloads; only check or stop nodes that were used for experiments
Report: which nodes were found, which were already stopped, which (if any) you stopped

Caution

Be careful not to affect other running Parasail production jobs. Query and report before taking any action; stop idle experiment nodes only.
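The "query and report first" rule can be made mechanical with a dry-run triage over whatever instance list Parasail exposes. The sketch below does not call any Parasail API; the `Instance` fields, the tag values in `EXPERIMENT_TAGS`, and the idle threshold are all hypothetical and would need mapping to what the console or API actually returns. It relies on experiment nodes being labeled somehow. If they are not, identifying them needs manual confirmation, and the safe default is to leave a node alone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical labels marking experiment nodes; replace with the real ones.
EXPERIMENT_TAGS = {"intent-lora", "sd-benchmark"}

@dataclass
class Instance:
    # Hypothetical record shape, filled from the Parasail console or API output.
    name: str
    gpu_type: str                  # e.g. "A100-80GB"
    status: str                    # e.g. "running" or "stopped"
    experiment_tag: Optional[str]  # None when the node carries no experiment label
    gpu_util_pct: float            # recent average GPU utilization

def triage(instances: List[Instance], idle_threshold_pct: float = 1.0) -> Dict[str, List[str]]:
    """Dry run: classify A100 nodes for the report. Nothing is stopped here."""
    report: Dict[str, List[str]] = {
        "already_stopped": [],  # experiment nodes that are already down
        "stop_candidates": [],  # running, tagged as experiment, and idle
        "needs_review": [],     # experiment nodes that are busy or in an odd state
        "not_experiment": [],   # untagged or production A100s: never touch
    }
    for inst in instances:
        if "A100" not in inst.gpu_type:
            continue  # out of scope for this check
        if inst.experiment_tag not in EXPERIMENT_TAGS:
            report["not_experiment"].append(inst.name)
        elif inst.status == "stopped":
            report["already_stopped"].append(inst.name)
        elif inst.status == "running" and inst.gpu_util_pct <= idle_threshold_pct:
            report["stop_candidates"].append(inst.name)
        else:
            report["needs_review"].append(inst.name)
    return report
```

Only after the report has been reviewed would the names in `stop_candidates` be stopped, by whatever mechanism Parasail provides. The sketch deliberately leaves that step out so that running it can never affect a node.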

sd: investigate low acceptance rate + verify idle A100s shut down
