Large: more robust to noisy/non-studio recitations but slower
Daily GPU usage limits apply. CPU usage is unlimited but slower
Shorter values produce more segments. Decrease for reciters who pause only briefly
Speech segments shorter than this are discarded. Increase to filter out false detections
Extra audio kept before/after each segment to avoid clipping speech edges
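
As a rough illustration of how the three segmentation settings above might interact, here is a minimal Python sketch. It assumes the first setting is a minimum silence duration (pauses shorter than it do not split a segment), and that the settings are applied in the order merge, filter, pad. The function `postprocess_segments` and the parameter names `min_silence`, `min_speech`, and `pad` are hypothetical, chosen for this example only; the actual tool may name and order these steps differently.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def postprocess_segments(
    speech: List[Segment],
    audio_duration: float,
    min_silence: float = 0.2,  # hypothetical name: pauses shorter than this do not split
    min_speech: float = 0.1,   # hypothetical name: segments shorter than this are discarded
    pad: float = 0.05,         # hypothetical name: extra audio kept on each side
) -> List[Segment]:
    """Merge, filter, and pad raw speech regions (illustrative only)."""
    # 1. Merge neighbouring regions separated by a pause shorter than min_silence.
    merged: List[List[float]] = []
    for start, end in sorted(speech):
        if merged and start - merged[-1][1] < min_silence:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # 2. Discard regions that are too short to be real speech.
    kept = [(s, e) for s, e in merged if e - s >= min_speech]

    # 3. Pad both edges, clamped to the bounds of the audio.
    return [(max(0.0, s - pad), min(audio_duration, e + pad)) for s, e in kept]


raw = [(0.0, 1.0), (1.1, 2.0), (3.0, 3.05), (4.0, 5.0)]

# With a 0.2 s minimum silence, the 0.1 s pause is bridged and the
# 50 ms blip is discarded, leaving two segments.
print(postprocess_segments(raw, audio_duration=5.02))

# Lowering the minimum silence to 0.05 s keeps the first two regions
# separate, so three segments remain.
print(postprocess_segments(raw, audio_duration=5.02, min_silence=0.05))
```

In this sketch, padding is applied last so that it cannot cause a short false detection to pass the minimum-length filter.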