API Reference
FastSinCos.fast_sincos_u100k — Function
fast_sincos_u100k(d::SIMD.Vec{N,Float32}) -> (sin, cos)
Compute sin and cos simultaneously using an ultra-fast ~100,000 ULP approximation.
Uses a single Cody-Waite range reduction (π/2 split into high + low parts) to [-π/4, π/4], then evaluates 2-coefficient sin and 2-coefficient cos polynomials, swapping based on quadrant.
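The reduction and quadrant swap described above can be sketched for a single scalar Float32. This is a minimal illustration, not the package's implementation: the name sincos_sketch, the way π/2 is split, and the use of Base sin and cos in place of the short polynomials are all assumptions made for the example.

```julia
# Illustrative split of π/2: the high part keeps only the leading mantissa
# bits so that q * PIO2_HI is exact for moderate q; the low part holds the rest.
const PIO2_HI = reinterpret(Float32, reinterpret(UInt32, Float32(π / 2)) & 0xfffff000)
const PIO2_LO = Float32(π / 2 - Float64(PIO2_HI))

# Hypothetical scalar helper; the real functions operate on SIMD.Vec lanes.
function sincos_sketch(d::Float32)
    q = round(Int32, d * Float32(2 / π))  # nearest multiple of π/2 (quadrant index)
    qf = Float32(q)
    r = d - qf * PIO2_HI                  # subtract the high part first
    r = r - qf * PIO2_LO                  # then the low-order correction
    s, c = sin(r), cos(r)                 # stand-in for the short polynomials
    # Rotate by q quadrants: odd q swaps sin and cos, bits of q pick the signs.
    s, c = isodd(q) ? (c, s) : (s, c)
    s = (q & 2) != 0 ? -s : s
    c = ((q + 1) & 2) != 0 ? -c : c
    return s, c
end

s, c = sincos_sketch(100.0f0)
```

After the two subtractions r lies in [-π/4, π/4] up to rounding, which is why polynomials with only a few coefficients can suffice.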
Input range
Cody-Waite range reduction keeps error flat across the full valid range. The hard limit is |d| < (π/2) × 2³¹ ≈ 3.4 × 10⁹ (Int32 overflow in the quadrant index).
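The quoted bound follows from requiring the quadrant index round(d × 2/π) to fit in an Int32. The arithmetic can be checked with Base Julia alone, as in the sketch below. The scalar round used here throws when the index does not fit; how the vectorized code behaves past the limit is not stated in this reference, so treat the exception only as a demonstration of the overflow.

```julia
# The quadrant index must fit in an Int32, which bounds |d|.
limit = (π / 2) * 2.0^31          # ≈ 3.37e9, the hard limit quoted above

# Inside the limit the index converts cleanly.
inside = round(Int32, 3.0e9 * (2 / π))

# Outside the limit the scalar conversion fails with an InexactError.
overflows = try
    round(Int32, 4.0e9 * (2 / π))
    false
catch err
    err isa InexactError
end
```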
Accuracy
~100,000 ULP (~3e-4 max absolute error) across all input ranges. ~27% faster than fast_sincos_u35 due to minimal polynomial terms (2+2 vs 3+5 coefficients).
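To relate a ULP figure to an absolute error, one can compare a Float32 result against a Float64 reference. The sketch below is an assumption-laden illustration: abs_error, ulp_error and taylor_sin2 are hypothetical names, and the 2-coefficient Taylor sine is only a stand-in that is not expected to match the package's tuned coefficients or its error figures.

```julia
# Hypothetical stand-in: 2-coefficient Taylor sine, x - x^3/6.
taylor_sin2(x::Float32) = x * (1f0 - x * x / 6f0)

# Error of a Float32 result against a Float64 reference, in absolute terms
# and in units of the Float32 spacing (ULP) at the reference value.
abs_error(approx::Float32, exact::Float64) = abs(Float64(approx) - exact)
ulp_error(approx::Float32, exact::Float64) =
    abs_error(approx, exact) / Float64(eps(Float32(exact)))

# Sample the reduced interval [-π/4, π/4] and record the worst case.
xs = range(-Float32(π / 4), Float32(π / 4); length = 1001)
max_abs = maximum(abs_error(taylor_sin2(x), sin(Float64(x))) for x in xs)
max_ulp = maximum(ulp_error(taylor_sin2(x), sin(Float64(x))) for x in xs)
```

Because the ULP shrinks along with the reference value, a small absolute error can still correspond to a large ULP count where sin or cos is close to zero.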
FastSinCos.fast_sincos_u3500 — Function
fast_sincos_u3500(d::SIMD.Vec{N,Float32}) -> (sin, cos)
Compute sin and cos simultaneously using a fast ~3500 ULP approximation.
Uses a single Cody-Waite range reduction (π/2 split into high + low parts) to [-π/4, π/4], then evaluates 3-coefficient sin and 3-coefficient cos polynomials, swapping based on quadrant.
Input range
Cody-Waite range reduction keeps error flat at ~3.6e-6 across the full valid range. The hard limit is |d| < (π/2) × 2³¹ ≈ 3.4 × 10⁹ (Int32 overflow in the quadrant index).
Accuracy
~3500 ULP (~3.6e-6 max absolute error) across all input ranges. ~15% faster than fast_sincos_u35 due to fewer polynomial terms (3+3 vs 3+5 coefficients).
FastSinCos.fast_sincos_u35 — Function
fast_sincos_u35(d::SIMD.Vec{N,Float32}) -> (sin, cos)
Compute sin and cos simultaneously using a fast ~35 ULP approximation. Port of SLEEFPirates' sincos_fast for SIMD.jl Vec types.
Uses a single Cody-Waite range reduction (π/2 split into high + low parts) to [-π/4, π/4], then evaluates separate sin (3-coeff) and cos (5-coeff) polynomials, swapping based on quadrant.
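As a rough picture of what "3-coeff" and "5-coeff" mean here, the sketch below evaluates an odd sine polynomial and an even cosine polynomial in powers of r² with Horner's rule. The coefficients are plain Taylor terms chosen as placeholders and the name sincos_poly is hypothetical; the package's actual coefficients are not given in this reference.

```julia
# Taylor coefficients in powers of r^2, used only as placeholders; they are
# not the package's coefficients.
const SIN_COEFFS = (1f0, -1f0 / 6f0, 1f0 / 120f0)                              # 3 terms
const COS_COEFFS = (1f0, -1f0 / 2f0, 1f0 / 24f0, -1f0 / 720f0, 1f0 / 40320f0)  # 5 terms

# sin(r) ≈ r * P(r^2) and cos(r) ≈ Q(r^2); evalpoly applies Horner's rule.
function sincos_poly(r::Float32)
    r2 = r * r
    return r * evalpoly(r2, SIN_COEFFS), evalpoly(r2, COS_COEFFS)
end

s, c = sincos_poly(0.5f0)
```

Sharing r² between the two polynomials is what makes computing sin and cos together cheaper than two separate calls in a scheme like this.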
Input range
The Cody-Waite range reduction preserves precision as inputs grow: the ~35 ULP accuracy holds for |d| ≲ 1e5. The hard limit is |d| < (π/2) × 2³¹ ≈ 3.4 × 10⁹ (Int32 overflow in the quadrant index).
Accuracy
~35 ULP across the valid input range. More accurate than fast_sincos_u3500 (~3500 ULP) but slightly slower due to more polynomial terms (3+5 vs 3+3 coefficients).