Installation & Setup

Q1: The build fails with std::chrono errors in log.cpp when compiling llama.cpp. How do I fix it?

A: This is a known issue introduced in recent versions of llama.cpp, caused by incompatibilities between certain C++ standard library versions and the chrono header. Please refer to the relevant discussion in the GitHub repository for a fix.

Common solutions include:

  • Updating your compiler to a newer version
  • Ensuring consistent C++ standard library versions
  • Checking that submodules are properly initialized (see the sketch below)
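
For example, a minimal recovery sequence (a sketch, assuming a git checkout with submodules and a CMake-based build; your build directory layout may differ) is:

Example Fix
# Sync llama.cpp and any other submodules
git submodule update --init --recursive

# Rebuild from a clean build directory
rm -rf build
cmake -B build
cmake --build build --config Release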

Q2: How do I build with Clang in a conda environment on Windows?

A: Before building the project, verify your clang installation and access to Visual Studio tools by running:

Verify Clang Installation
clang -v

If you see an error message such as "clang is not recognized as an internal or external command", it indicates that your command line window is not properly initialized for Visual Studio tools.

If you are using Command Prompt:

Command Prompt Setup
"C:\Program Files\Microsoft Visual Studio\2022\Professional\Common7\Tools\VsDevCmd.bat" \
  -startdir=none -arch=x64 -host_arch=x64

If you are using Windows PowerShell:

PowerShell Setup
Import-Module "C:\Program Files\Microsoft Visual Studio\2022\Professional\Common7\Tools\Microsoft.VisualStudio.DevShell.dll"
Enter-VsDevShell 3f0e31ad -SkipAutomaticLocation -DevCmdArguments "-arch=x64 -host_arch=x64"

Note that the instance ID (3f0e31ad above) is specific to a particular Visual Studio installation; substitute the ID reported on your machine. These steps will initialize your environment and allow you to use the correct Visual Studio tools.

Q3: What are the system requirements for BitNet?

A: BitNet requires:

  • Python 3.9 or higher (Python 3.9 recommended)
  • CMake 3.15 or higher
  • C++ Compiler (Clang recommended, GCC supported on Linux)
  • CUDA-capable GPU (optional but recommended for best performance)
  • Minimum 4GB RAM (8GB+ recommended)

For detailed system requirements, see our Installation Guide.
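
To confirm your toolchain meets these requirements, check the installed versions before building:

Check Toolchain Versions
python --version   # expect 3.9 or higher
cmake --version    # expect 3.15 or higher
clang --version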

Usage

Q4: How do I download and use a model?

A: Models can be downloaded from HuggingFace using the HuggingFace CLI:

Download Model
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
  --local-dir models/BitNet-b1.58-2B-4T

Then set up the environment and run inference. For detailed instructions, see our Getting Started Guide and Usage Guide.
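
As a sketch of the typical flow, assuming the repository's setup_env.py script (only the -q/--quant-type flag is documented in this FAQ; the -md and -n flags are assumptions, so check each script's --help):

Setup and Inference
# Prepare the environment for the downloaded model (i2_s quantization)
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

# Run a single prompt (-n, the number of tokens to generate, is assumed here)
python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "What is 1-bit quantization?" \
  -n 128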

Q5: What quantization types are available?

A: BitNet supports two quantization types:

  • i2_s: ternary weights (-1, 0, +1) stored as 2-bit signed integers (recommended)
  • tl1: table-lookup (TL) kernel variant that repacks ternary weights for LUT-based computation

You can specify the quantization type when setting up the environment using the --quant-type or -q option.
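
For example (a sketch reusing the assumed setup_env.py invocation from above):

Select Quantization Type
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s   # recommended
python setup_env.py -md models/BitNet-b1.58-2B-4T -q tl1    # lookup-table kernels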

Q6: How do I enable conversation mode?

A: Use the -cnv or --conversation flag when running inference:

Conversation Mode
python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "You are a helpful assistant" \
  -cnv

This will enable interactive conversation mode, where the prompt specified by -p is used as the system prompt.

Performance

Q7: How much memory does BitNet use compared to traditional LLMs?

A: BitNet uses approximately 16x less memory than FP16 models due to 1-bit quantization. For example, a 7B parameter model that would require ~14GB in FP16 format requires only ~1GB with BitNet.
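
A back-of-the-envelope check of those figures (weights only; activations and the KV cache add overhead):

Memory Estimate
# 7B parameters at 16 bits per weight vs. 1 bit per weight
python -c "p = 7e9; print('fp16 : %.1f GB' % (p * 2 / 1e9)); print('1-bit: %.3f GB' % (p / 8 / 1e9))"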

Q8: Can I run BitNet on CPU?

A: Yes, BitNet can run on CPU, though GPU acceleration significantly improves performance. You can control the number of threads for CPU inference using the -t or --threads option.
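
For example, to run CPU inference with 8 threads (flags as used elsewhere in this FAQ):

CPU Inference
python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "You are a helpful assistant" \
  -t 8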

Q9: How do I benchmark BitNet performance?

A: Use the e2e_benchmark.py script:

Run Benchmark
python utils/e2e_benchmark.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -n 200 \
  -p 256 \
  -t 4

Here -m is the model path, -n the number of tokens to generate, -p the prompt length in tokens, and -t the thread count. For detailed benchmarking information, see our Benchmark Guide.

Model Formats

Q10: How do I convert a model from .safetensors to GGUF format?

A: Use the conversion utility:

Convert Model
# Download .safetensors model
huggingface-cli download microsoft/bitnet-b1.58-2B-4T-bf16 \
  --local-dir ./models/bitnet-b1.58-2B-4T-bf16

# Convert to GGUF
python ./utils/convert-helper-bitnet.py ./models/bitnet-b1.58-2B-4T-bf16

For more details, see our Usage Guide.

General

Q11: What models are available for BitNet?

A: BitNet supports various model architectures including BitNet-b1.58, Falcon3, and Llama3. For a complete list of available models, see our Models Page.

Q12: Is BitNet free to use?

A: Yes, BitNet is open source and licensed under the MIT License. You can use it freely for both commercial and non-commercial purposes. See the LICENSE file for details.

Q13: How can I contribute to BitNet?

A: We welcome contributions! See our Contributing Guide for information on how to contribute. You can also report bugs or request features on GitHub Issues.

Q14: Where can I get help?

A: You can get help through:

  • GitHub Issues, for bug reports and feature requests
  • The guides linked throughout this FAQ (Installation, Getting Started, Usage, and Benchmark)

Still Have Questions?

If you have questions that aren't answered here, please:

  • Search existing GitHub Issues to see whether your question has already been answered
  • Open a new issue on GitHub if it has not

Related Resources

  • Installation Guide
  • Getting Started Guide
  • Usage Guide
  • Benchmark Guide
  • Models Page
  • Contributing Guide