Inference of the LLaMA language model, implemented in plain C/C++, for use on multiple platforms with optional quantization.
llama.cpp is a plain C/C++ implementation of LLaMA inference with no external dependencies. The main goal is to run the model using 4-bit quantization on a MacBook; supported platforms include macOS, Linux, Windows, and Docker. Inference runs on the CPU with mixed F16/F32 precision, and several LLaMA model sizes are supported. The project is mainly for educational purposes, with new features added mostly through community contributions. The README.md provides instructions for obtaining and verifying the model data, running the model, and contributing guidelines.
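The typical workflow described above (convert the weights, quantize to 4 bits, then run inference) can be sketched roughly as follows. This mirrors the early llama.cpp README; exact script names, paths, and flags have changed across versions, so treat it as an illustration rather than current usage:

```shell
# build the project (plain Makefile, no external dependencies)
make

# convert the original 7B PyTorch weights to ggml FP16 format
# (assumes the weights have been obtained separately and placed in models/7B/)
python3 convert-pth-to-ggml.py models/7B/ 1

# quantize the FP16 model to 4 bits (q4_0)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

# run inference on the quantized model, generating up to 128 tokens
./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 128
```

The 4-bit quantization step is what makes a 7B-parameter model fit comfortably in the RAM of a consumer laptop, at some cost in output quality.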