

While benchmarking token throughput is useful, real self-hosting viability is often limited by memory bandwidth rather than raw compute, especially for quantized models: in single-stream decoding, every generated token requires streaming essentially all of the model's weights from VRAM. Have you evaluated how different quantization levels affect inference latency on consumer-grade GPUs, and how those measurements compare with the reported tokens-per-second figures?
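
A rough way to see why bandwidth dominates is a roofline-style ceiling. If each decoded token must read every weight once, tokens/s cannot exceed memory bandwidth divided by the model's size in bytes. The sketch below assumes batch size 1, ignores KV-cache traffic, activations, and dequantization overhead, and treats bits per weight as exact (real quantization formats spend some extra bits on scales). The function name, the 7B-parameter model, and the 1000 GB/s bandwidth are illustrative assumptions, not measurements.

```python
def decode_tokens_per_sec_upper_bound(
    n_params: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Hypothetical helper: bandwidth-bound ceiling for batch-size-1 decoding.

    Assumes every generated token streams all weights from VRAM exactly once,
    and ignores KV-cache reads, activations, kernel launch and dequantization cost.
    """
    weight_bytes = n_params * bits_per_weight / 8  # model size in bytes
    return bandwidth_gb_s * 1e9 / weight_bytes     # bytes/s divided by bytes/token


if __name__ == "__main__":
    # Illustrative numbers only: a 7B-parameter model on a GPU with ~1000 GB/s.
    for bits in (16, 8, 4):
        tps = decode_tokens_per_sec_upper_bound(7e9, bits, 1000)
        print(f"{bits:>2}-bit: <= {tps:.0f} tokens/s")
```

Run as a script, this prints ceilings of about 71, 143, and 286 tokens/s for 16-, 8-, and 4-bit weights. For plain single-stream decoding, measured throughput should land below these bounds, so the gap between a reported figure and the ceiling hints at how much overhead the runtime adds.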

The primary technical hurdle with RestlessOS on non-Pixel devices is its reliance on Project Treble, i.e. a generic system image running on top of each OEM's vendor partition. This often results in incomplete SELinux enforcement and missing vendor-specific firmware and driver security patches, both of which GrapheneOS explicitly requires for its hardening. Without a relockable bootloader that enforces verified boot under a custom signing key, plus full hardware-backed key attestation, the supply chain integrity and secure boot guarantees that define GrapheneOS cannot be fully replicated on arbitrary Treble-compatible hardware.
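
Part of this gap is visible in the properties Android reports for verified boot and SELinux. The sketch below is a hypothetical helper (`assess` and `check_device` are illustrative names) that assumes `adb` is on the PATH and a device is connected with USB debugging enabled. It reads `ro.boot.verifiedbootstate` (`green` = locked with the OEM key, `yellow` = locked with a user-set key, `orange` = bootloader unlocked), `ro.boot.flash.locked`, and `getenforce`. These values are self-reported by the running OS, so a compromised system could fake them; that is why GrapheneOS relies on hardware-backed attestation (e.g. its Auditor app) rather than checks like these.

```python
import subprocess


def _adb_shell(*args: str) -> str:
    """Run a command via `adb shell` and return its trimmed stdout."""
    out = subprocess.run(
        ["adb", "shell", *args], capture_output=True, text=True, check=True
    )
    return out.stdout.strip()


def assess(verified_boot_state: str, flash_locked: str, selinux_mode: str) -> list[str]:
    """Return warnings for boot/SELinux properties that unlocked GSI setups often fail."""
    warnings = []
    # "orange" means the bootloader is unlocked, so verified boot is not enforced.
    if verified_boot_state == "orange":
        warnings.append("bootloader unlocked: verified boot not enforced")
    elif verified_boot_state not in ("green", "yellow"):
        warnings.append(f"unexpected verified boot state: {verified_boot_state!r}")
    if flash_locked != "1":
        warnings.append("ro.boot.flash.locked is not 1")
    if selinux_mode != "Enforcing":
        warnings.append(f"SELinux is {selinux_mode!r}, not Enforcing")
    return warnings


def check_device() -> list[str]:
    """Query a connected device over adb and assess its self-reported state."""
    return assess(
        _adb_shell("getprop", "ro.boot.verifiedbootstate"),
        _adb_shell("getprop", "ro.boot.flash.locked"),
        _adb_shell("getenforce"),
    )
```

With a device attached, `print(check_device())` lists any warnings; an empty list only means nothing obvious is wrong, not that the device offers GrapheneOS-level guarantees.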