CoreML sometimes produces garbage output on cached models #16492

@metascroy

🐛 Describe the bug

When a CoreML model is cached to disk, subsequent runs can produce corrupted output. The corruption goes away after manually clearing the model cache, which forces the model to be re-compiled.
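
A one-line sketch of that workaround, assuming the default executorchcoreml cache location used in the repro below:

# Workaround sketch: drop the cached CoreML models so the next run re-compiles them.
rm -rf ~/Library/Caches/executorchcoreml/models/*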

The bug appears to be model-specific: so far I have only observed it with stories110M, and not with Llama1B (same architecture). The corruption manifests differently depending on the compute units:

  • CPU_ONLY: Produces <unk> tokens (completely invalid logits)
  • CPU_AND_NE: Produces nonsensical but consistent text (partially corrupted)

Guess at root cause: a CoreML framework bug in which certain models loaded from the on-disk cache end up with corrupted output buffers.

Repro

Environment

  • macOS (Apple Silicon)
  • ExecuTorch with the CoreML backend
  • Tested on macOS 26.2

Prerequisites

  1. Check out the branch from Add C++ static runner for CoreML #16463, ensure you're in the executorch repo root, and set up ExecuTorch:
cd /path/to/executorch
python install_executorch.py --editable

Step 1: Build the Runner

# Clean and build
rm -rf cmake-out

cmake -S . -B cmake-out \
  -DCMAKE_BUILD_TYPE=Release \
  -DEXECUTORCH_ENABLE_LOGGING=ON \
  -DEXECUTORCH_BUILD_EXTENSION_LLM=ON \
  -DEXECUTORCH_BUILD_EXTENSION_LLM_RUNNER=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -DEXECUTORCH_BUILD_EXTENSION_NAMED_DATA_MAP=ON \
  -DEXECUTORCH_BUILD_COREML=ON \
  -G Ninja

cmake --build cmake-out -j --target run_static_llm_coreml

Step 2: Download Model Artifacts and Export

cd examples/apple/coreml/llama

# Download stories110M model artifacts
curl -Ls https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.pt --output stories110M.pt
curl -Ls https://raw.githubusercontent.com/karpathy/llama2.c/master/tokenizer.model --output tokenizer.model
echo '{"dim": 768, "multiple_of": 32, "n_heads": 12, "n_layers": 12, "norm_eps": 1e-05, "vocab_size": 32000}' > params.json

# Export with CPU_ONLY (triggers severe bug - <unk> tokens)
python export_static_llm_coreml.py --checkpoint stories110M.pt --params params.json --output model_cpu.pte --cpu_only

# Export with CPU_AND_NE (triggers mild bug - nonsensical text)
python export_static_llm_coreml.py --checkpoint stories110M.pt --params params.json --output model_ane.pte

cd ../../../..  # back to repo root

Step 3: Reproduce the CPU_ONLY Bug

# Clear the CoreML cache
rm -rf ~/Library/Caches/executorchcoreml/models/*

# First run - WORKS (compiles and caches the model)
./cmake-out/examples/apple/coreml/llama/runner/run_static_llm_coreml \
  --model examples/apple/coreml/llama/model_cpu.pte \
  --params examples/apple/coreml/llama/params.json \
  --tokenizer examples/apple/coreml/llama/tokenizer.model \
  --prompt "Once upon a time," \
  --max_new_tokens 20

# Second run - FAILS (loads from cache, produces <unk> tokens)
./cmake-out/examples/apple/coreml/llama/runner/run_static_llm_coreml \
  --model examples/apple/coreml/llama/model_cpu.pte \
  --params examples/apple/coreml/llama/params.json \
  --tokenizer examples/apple/coreml/llama/tokenizer.model \
  --prompt "Once upon a time," \
  --max_new_tokens 20
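
Optionally, the two runs above can be wrapped in a small script that captures both generations and compares them. This is a convenience sketch rather than part of the original repro; it assumes the runner prints the generated text to stdout and that the paths match the commands above.

#!/usr/bin/env bash
# Convenience sketch (assumes the generation is written to stdout).
set -euo pipefail

RUNNER=./cmake-out/examples/apple/coreml/llama/runner/run_static_llm_coreml
ARGS=(--model examples/apple/coreml/llama/model_cpu.pte
      --params examples/apple/coreml/llama/params.json
      --tokenizer examples/apple/coreml/llama/tokenizer.model
      --prompt "Once upon a time,"
      --max_new_tokens 20)

# Cache miss: compiles the model and writes it to the cache.
rm -rf ~/Library/Caches/executorchcoreml/models/*
"$RUNNER" "${ARGS[@]}" | tee /tmp/first_run.txt

# Cache hit: loads the model from the cache.
"$RUNNER" "${ARGS[@]}" | tee /tmp/second_run.txt

# On the affected setup the second run degenerates into <unk> tokens.
diff /tmp/first_run.txt /tmp/second_run.txt \
  && echo "cache-hit output matches cache-miss output" \
  || echo "cache-hit output DIFFERS (possible cache corruption)"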

Actual Output (CPU_ONLY)

First Run (cache miss):

Once upon a time, there was a little girl named Lily. She loved to play outside in the sunshine and

Second Run (cache hit):

Once upon a time,<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>

Step 4: Reproduce the CPU_AND_NE Bug

# Clear the CoreML cache
rm -rf ~/Library/Caches/executorchcoreml/models/*

# First run - WORKS
./cmake-out/examples/apple/coreml/llama/runner/run_static_llm_coreml \
  --model examples/apple/coreml/llama/model_ane.pte \
  --params examples/apple/coreml/llama/params.json \
  --tokenizer examples/apple/coreml/llama/tokenizer.model \
  --prompt "Once upon a time," \
  --max_new_tokens 20

# Second run - BUGGY (nonsensical but consistent output)
./cmake-out/examples/apple/coreml/llama/runner/run_static_llm_coreml \
  --model examples/apple/coreml/llama/model_ane.pte \
  --params examples/apple/coreml/llama/params.json \
  --tokenizer examples/apple/coreml/llama/tokenizer.model \
  --prompt "Once upon a time," \
  --max_new_tokens 20

# Third run - Same buggy output (deterministic corruption)
./cmake-out/examples/apple/coreml/llama/runner/run_static_llm_coreml \
  --model examples/apple/coreml/llama/model_ane.pte \
  --params examples/apple/coreml/llama/params.json \
  --tokenizer examples/apple/coreml/llama/tokenizer.model \
  --prompt "Once upon a time," \
  --max_new_tokens 20

Actual Output (CPU_AND_NE)

First Run (cache miss):

Once upon a time, there was a little girl named Lily. She loved to play outside in the sunshine and

Second Run (cache hit):

Once upon a time, heal, heal, hiss name, named Timmy, named Samantha toad

Third Run (cache hit):

Once upon a time, heal, heal, hiss name, named Timmy, named Samantha toad

Note: the corrupted output is deterministic; every subsequent cache-hit run produces exactly the same text.
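
That determinism can be checked with a small loop over repeated cache-hit runs; a sketch, again assuming the paths above and that the runner prints the generation to stdout:

# Determinism-check sketch: after the model has been cached once, every
# subsequent run should print the same (corrupted) text.
RUNNER=./cmake-out/examples/apple/coreml/llama/runner/run_static_llm_coreml
for i in 1 2 3; do
  "$RUNNER" \
    --model examples/apple/coreml/llama/model_ane.pte \
    --params examples/apple/coreml/llama/params.json \
    --tokenizer examples/apple/coreml/llama/tokenizer.model \
    --prompt "Once upon a time," \
    --max_new_tokens 20 > "/tmp/run_$i.txt"
done
diff /tmp/run_1.txt /tmp/run_2.txt && diff /tmp/run_2.txt /tmp/run_3.txt \
  && echo "all cache-hit runs identical"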

Models Tested

| Model       | Compute Units | First Run | Cache Hit Runs      |
|-------------|---------------|-----------|---------------------|
| stories110M | CPU_ONLY      | ✅ Valid  | <unk> tokens        |
| stories110M | CPU_AND_NE    | ✅ Valid  | ⚠️ Nonsensical text |
| llama1b     | CPU_AND_NE    | ✅ Valid  | ✅ Valid            |

Versions

Collecting environment information...
PyTorch version: 2.11.0.dev20251222
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 26.2 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.3.19.1)
CMake version: version 3.31.6
Libc version: N/A

Python version: 3.10.19 (main, Oct 21 2025, 16:37:10) [Clang 20.1.8 ] (64-bit runtime)
Python platform: macOS-26.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] executorch==1.1.0a0+aa58075
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==24.4.26
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] mypy==1.14.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] pytorch_tokenizers==1.0.1
[pip3] torch==2.11.0.dev20251222
[pip3] torchao==0.16.0+git08e5e203f
[pip3] torchaudio==2.10.0.dev20251222
[pip3] torchdata==0.11.0
[pip3] torchsr==1.0.4
[pip3] torchtune==0.6.1
[pip3] torchvision==0.25.0.dev20251222
[conda] executorch 1.1.0a0+aa58075 pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] pytorch-tokenizers 1.0.1 pypi_0 pypi
[conda] torch 2.11.0.dev20251222 pypi_0 pypi
[conda] torchao 0.16.0+git08e5e203f pypi_0 pypi
[conda] torchaudio 2.10.0.dev20251222 pypi_0 pypi
[conda] torchdata 0.11.0 pypi_0 pypi
[conda] torchfix 0.6.0 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.6.1 pypi_0 pypi
[conda] torchvision 0.25.0.dev20251222 pypi_0 pypi
