Getting Started
A quick-start guide to running 1-bit LLM inference with BitNet
Introduction
Welcome to BitNet! This guide will help you get up and running with BitNet in minutes. BitNet is Microsoft's official inference framework for 1-bit Large Language Models, providing fast, lossless inference of 1.58-bit models with a significantly reduced memory footprint compared to full-precision inference.
If you haven't already, check out our About Page to learn more about BitNet and its capabilities. For detailed feature information, visit our Features Page.
Prerequisites
Before installing BitNet, ensure you have:
- Python 3.9 or higher - the examples below use 3.9
- Conda (recommended) or another virtual environment tool such as venv for environment management
- CUDA-capable GPU (optional but recommended for best performance)
- CMake - Required for building from source
- C++ Compiler - clang or GCC (clang recommended)
- Git - For cloning the repository
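Before you start, you can sanity-check that the toolchain is on your PATH. Version requirements vary by platform, so treat this as a quick smoke test rather than a definitive check:

python3 --version
cmake --version
clang --version    # or: gcc --version
git --version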
For detailed system requirements and setup instructions, see our Installation Guide.
Quick Installation
Step 1: Clone the Repository
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
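The --recursive flag pulls in the llama.cpp submodule that BitNet builds on. If you cloned without it, you can fetch the submodules afterwards:

git submodule update --init --recursive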
Step 2: Create Conda Environment
We recommend using conda for environment management:
conda create -n bitnet-cpp python=3.9
conda activate bitnet-cpp
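If you prefer not to use conda, a minimal sketch using Python's built-in venv works as well:

python3 -m venv .venv
source .venv/bin/activate    # on Windows: .venv\Scripts\activate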
Step 3: Install Dependencies
pip install -r requirements.txt
Step 4: Download a Model
Download a pre-quantized model from HuggingFace:
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf \
--local-dir models/BitNet-b1.58-2B-4T
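The huggingface-cli tool ships with the huggingface_hub package; if the command above is not found, install the package first:

pip install -U huggingface_hub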
For a complete list of available models, check our Models Page.
Step 5: Setup Environment
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
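Here, -md points setup_env.py at the downloaded model directory and -q selects the quantization kernel type (i2_s in this quick start). The set of supported kernels depends on your platform, so consult the script's built-in help to see every option:

python setup_env.py -h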
Your First Inference
Now you're ready to run your first inference! Use the following command:
python run_inference.py \
-m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
-p "You are a helpful assistant" \
-cnv
This will start an interactive conversation with the model. For more usage examples, see our Usage Guide.
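For scripted, non-interactive use, you can drop -cnv and cap the number of generated tokens. The sketch below assumes llama.cpp-style flags (-n for token count, -t for threads); confirm them for your version with python run_inference.py -h:

python run_inference.py \
  -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf \
  -p "1-bit quantization reduces each weight to" \
  -n 64 \
  -t 4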
What's Next?
Now that you have BitNet up and running, here's what you can do next:
- Explore Usage Examples: Check out our Usage Guide for more examples and advanced usage
- Try Different Models: Visit our Models Page to see all available models
- Run Benchmarks: Use our Benchmark Tools to measure performance (see the example after this list)
- Read Documentation: Explore our comprehensive Documentation
- Join the Community: Check out our Resources and consider Contributing
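For example, the repository includes an end-to-end benchmark script. The invocation below is a sketch; the script path and flags are assumed from the BitNet repository and may change between releases:

# -n: tokens to generate, -p: prompt length, -t: threads
python utils/e2e_benchmark.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -n 200 -p 256 -t 4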
Common Issues
If you encounter any issues during setup, check our Frequently Asked Questions page for solutions to common problems. The FAQ covers issues like:
- Build errors with llama.cpp
- CUDA compatibility issues
- Model download problems
- Windows-specific setup challenges
Additional Resources
For more detailed information, check out these resources:
- Installation Guide - Detailed installation instructions
- Usage Guide - Comprehensive usage examples
- Documentation - Complete API reference
- GitHub Repository - Source code and issues
- GitHub Issues - Report bugs or ask questions