StarCoder2-15B achieved 72.6% on HumanEval after instruction tuning, surpassing models with significantly larger parameter counts. The BigCode project released three model variants in February 2024, trained on data drawn from The Stack v2, a 67.5-terabyte corpus spanning 619 programming languages. StarCoder2-3B accumulated over 1.42 million downloads within its first year, demonstrating strong adoption across the developer community.
The open-source code generation project grew from a single 15.5-billion-parameter model in the original release to a family of three models ranging from 3 billion to 15 billion parameters. The flagship model was trained on 1,024 NVIDIA H100 GPUs on the NVIDIA Eos Supercomputer, processing over 4 trillion tokens.
StarCoder Key Statistics
- StarCoder2-15B parameters total 15 billion, trained on 4+ trillion tokens across 600+ programming languages as of 2024.
- The Stack v2 dataset expanded to 67.5 terabytes, representing a 10.5x increase from the original 6.4 terabytes in version 1.
- StarCoder2-15B-Instruct scored 72.6% on HumanEval benchmark, exceeding CodeLlama-70B-Instruct despite having 78% fewer parameters.
- Combined downloads across StarCoder2 variants exceeded 1.5 million as of 2024, with the 3B model accounting for over 1.42 million downloads.
- Training data covered 619 programming languages in Stack v2, compared to 358 languages in the original Stack dataset.
StarCoder Model Family and Parameters
The StarCoder family comprises four distinct models, with StarCoder2 introducing three new variants in February 2024. ServiceNow developed the 3 billion parameter model, Hugging Face created the 7 billion version, and NVIDIA built the 15 billion flagship variant.
StarCoder2-15B underwent training on the NVIDIA Eos Supercomputer using 1,024 H100 GPUs through the NVIDIA NeMo Framework. The model architecture employs Grouped Query Attention with a 16,384-token context window and 4,096-token sliding window attention mechanism.
| Model | Parameters | Training Tokens | Languages |
|---|---|---|---|
| StarCoder2-3B | 3 billion | 3.3 trillion | 17 |
| StarCoder2-7B | 7 billion | 3.5+ trillion | 17 |
| StarCoder2-15B | 15 billion | 4+ trillion | 600+ |
| StarCoder (Original) | 15.5 billion | 1 trillion | 80+ |
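These architecture figures can be checked directly against the model configuration published on the Hugging Face Hub. The following is a minimal sketch, assuming a transformers version with StarCoder2 support (4.39 or later) and access to the bigcode/starcoder2-15b checkpoint; the expected values in the comments come from the specifications above.

```python
# Minimal sketch: read StarCoder2-15B hyperparameters without downloading the weights.
# Assumes transformers >= 4.39 and access to bigcode/starcoder2-15b on the Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigcode/starcoder2-15b")

print(config.max_position_embeddings)  # context window; expected 16384
print(config.sliding_window)           # sliding window attention; expected 4096
print(config.num_key_value_heads)      # grouped-query attention KV heads; expected 4
print(config.rope_theta)               # RoPE base period; expected 100000
```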
StarCoder Training Dataset Growth
The Stack v2 dataset reached 67.5 terabytes, a 10.5x expansion over the 6.4 terabytes of Stack v1. The dataset incorporates over 3 billion files sourced from the Software Heritage archive, the largest public collection of software source code; the StarCoder2 training sets are filtered subsets of this corpus.
Dataset composition includes GitHub pull requests, Kaggle notebooks, Jupyter notebooks, and extensive code documentation. The filtering process removed code with restrictive licenses, retaining only material under permissive licensing terms.
| Metric | Stack v1 | Stack v2 | Growth |
|---|---|---|---|
| Total Size | 6.4 TB | 67.5 TB | 10.5x |
| Languages | 358 | 619 | 1.7x |
| Total Files | Not disclosed | 3+ billion | N/A |
| Training Set | Baseline | 4x larger | 4x |
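Because the corpus is distributed as metadata pointing into the Software Heritage archive, it can be inspected without downloading 67.5 terabytes. A minimal sketch follows, assuming access to the gated bigcode/the-stack-v2 dataset on the Hugging Face Hub; the per-language configuration name and the field names are assumptions to verify against the dataset card.

```python
# Sketch: stream a few metadata rows from The Stack v2 rather than downloading
# the full 67.5 TB. Rows carry Software Heritage blob identifiers, not file
# contents; "Dockerfile" as a per-language config and the field names below are
# assumptions to check against the dataset card.
from itertools import islice
from datasets import load_dataset

ds = load_dataset("bigcode/the-stack-v2", "Dockerfile", split="train", streaming=True)

for row in islice(ds, 3):
    print(row["repo_name"], row["blob_id"], row["language"])
```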
StarCoder Benchmark Performance Analysis
The StarCoder2-15B base model recorded 46.3% on HumanEval pass@1, a 58% relative improvement over the original StarCoder’s 29.3%. The instruction-tuned variant reached 72.6%, outperforming substantially larger competing models.
The base model also scores 37.8% on HumanEval+ and 33.8% on DS-1000. On the GSM8K mathematical reasoning benchmark it reaches 65.1% accuracy, indicating reasoning capability beyond basic code generation.
| Model | HumanEval | HumanEval+ | DS-1000 |
|---|---|---|---|
| StarCoder2-15B Base | 46.3% | 37.8% | 33.8% |
| StarCoder2-15B-Instruct | 72.6% | N/A | 40.6% |
| StarCoder Original | 29.3% | N/A | N/A |
| CodeLlama-34B | 48.8% | N/A | N/A |
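For context on how these percentages are produced, HumanEval-style scores use the unbiased pass@k estimator from Chen et al. (2021): generate n candidate solutions per problem, count the c that pass the unit tests, and estimate the probability that at least one of k sampled candidates passes. The sketch below shows that estimator, not BigCode's exact evaluation harness.

```python
# Sketch of the unbiased pass@k estimator used for HumanEval-style scores.
# pass@k = 1 - C(n-c, k) / C(n, k), computed in a numerically stable product form.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem given n samples, of which c are correct."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Illustrative numbers: 200 samples per problem, 93 passing -> pass@1 of 0.465.
print(pass_at_k(n=200, c=93, k=1))
```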
StarCoder vs Competing Code Models
StarCoder2-15B matches or exceeds CodeLlama-34B performance across multiple benchmarks despite operating with 56% fewer parameters. The model shows particular strength in mathematical reasoning and low-resource programming language support.
DeepSeekCoder-33B leads in code completion for high-resource languages, but StarCoder2-15B demonstrates superior performance on languages including D, Julia, Lua, and Perl. The expanded language coverage in Stack v2 directly contributes to this multilingual capability advantage.
| Model | Parameters | Context Window | Training Tokens |
|---|---|---|---|
| StarCoder2-15B | 15 billion | 16,384 tokens | 4+ trillion |
| CodeLlama-34B | 34 billion | 16,384 tokens | 500B-1T |
| DeepSeekCoder-33B | 33 billion | 16,384 tokens | 2 trillion |
StarCoder Download and Adoption Metrics
StarCoder2-3B accumulated over 1.42 million downloads, the highest adoption among the three variants. The 7B model reached 76,200 downloads and the 15B flagship recorded 15,600 downloads as of 2024.
Community engagement metrics show over 4,860 likes for The Stack v2 dataset on Hugging Face. The model ecosystem includes 20+ fine-tuned variants and 18+ quantized versions of StarCoder2-15B, demonstrating active developer customization and deployment.
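Counts of this kind can be pulled programmatically; the sketch below uses the huggingface_hub client, with the caveat that the Hub's downloads field reflects a recent rolling window rather than the cumulative totals quoted above.

```python
# Sketch: query current download and like counts from the Hugging Face Hub.
# Note: the Hub's `downloads` field is a rolling recent count, not an all-time total.
from huggingface_hub import HfApi

api = HfApi()
for repo_id in ("bigcode/starcoder2-3b", "bigcode/starcoder2-7b", "bigcode/starcoder2-15b"):
    info = api.model_info(repo_id)
    print(f"{repo_id}: {info.downloads} downloads, {info.likes} likes")

print("the-stack-v2 likes:", api.dataset_info("bigcode/the-stack-v2").likes)
```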
StarCoder Memory Requirements
Memory footprint varies significantly with precision. StarCoder2-15B requires roughly 60 GB at full FP32 precision, about 32 GB in bfloat16, 16.9 GB with 8-bit quantization, and 9.2 GB with 4-bit quantization. The 3B model runs in about 2 GB in 4-bit mode, enabling deployment on consumer hardware.
| Model | FP32 | 8-bit | 4-bit |
|---|---|---|---|
| StarCoder2-3B | ~12 GB | ~4 GB | ~2 GB |
| StarCoder2-7B | ~29 GB | ~7.7 GB | ~4.2 GB |
| StarCoder2-15B | ~60 GB | ~16.9 GB | ~9.2 GB |
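As a concrete illustration of the 4-bit column, the sketch below loads the 3B model with bitsandbytes quantization through transformers. It assumes a CUDA GPU with the transformers, accelerate, and bitsandbytes packages installed; actual memory use also depends on context length and the KV cache.

```python
# Sketch: load StarCoder2-3B in 4-bit, roughly matching the ~2 GB weight
# footprint in the table above. Assumes a CUDA GPU and the transformers,
# accelerate, and bitsandbytes packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # 4-bit weights, bf16 compute
)

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b",
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```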
StarCoder Technical Architecture Details
The architecture incorporates Rotary Positional Encodings with a base period of 100,000, replacing learned positional embeddings from the original StarCoder. Grouped Query Attention implementation uses 2 key-value heads for the 3B model and 4 heads for both 7B and 15B variants.
Fill-in-the-Middle training objective enables code completion using both preceding and following context. Training utilized bfloat16 precision across all model variants, with pretraining for the 15B model spanning approximately 1 million iterations before early stopping.
| Component | Specification |
|---|---|
| Context Window | 16,384 tokens |
| Sliding Window | 4,096 tokens |
| Training Precision | bfloat16 |
| Positional Encoding | RoPE (base 100,000) |
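The Fill-in-the-Middle objective means prompts can be laid out in a prefix-suffix-middle format, with the model generating the missing span after the final sentinel. A minimal sketch is below; the sentinel token names follow the original StarCoder convention and should be checked against the StarCoder2 tokenizer's special-token list.

```python
# Sketch: a fill-in-the-middle (FIM) prompt in prefix-suffix-middle layout.
# The sentinel names follow the original StarCoder convention and are an
# assumption to verify against the StarCoder2 tokenizer's special tokens.
prefix = "def average(values):\n    "
suffix = "\n    return total / len(values)\n"

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# The model is expected to continue after <fim_middle> with the missing span
# (e.g. "total = sum(values)"), conditioning on code both before and after the gap.
print(fim_prompt)
```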
StarCoder Training Infrastructure
The NVIDIA Eos Supercomputer supplied the compute for the 15B model: 1,024 H100 GPUs configured in DGX H100 systems. The 3B model required 97,120 GPU hours on A100 SXM4 80GB hardware for its complete training run.
The reported carbon intensity of the ServiceNow training infrastructure was 0.386 kgCO2eq/kWh. Each model saw roughly four to five epochs over its training data, deliberately training beyond Chinchilla-optimal token budgets.
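Those two figures support a rough back-of-envelope emissions estimate for the 3B run; the per-GPU power draw below is an assumption (the A100 SXM4 80GB TDP), and the calculation ignores host, cooling, and networking overhead.

```python
# Illustrative back-of-envelope estimate, not a reported figure: combine the
# 97,120 A100 GPU hours with the 0.386 kgCO2eq/kWh grid intensity. The 0.4 kW
# per-GPU draw is an assumed average (A100 SXM4 80GB TDP); PUE is ignored.
GPU_HOURS = 97_120
GPU_POWER_KW = 0.4           # assumption
CARBON_INTENSITY = 0.386     # kgCO2eq per kWh, as reported

energy_kwh = GPU_HOURS * GPU_POWER_KW
emissions_t = energy_kwh * CARBON_INTENSITY / 1000
print(f"~{energy_kwh:,.0f} kWh, ~{emissions_t:.1f} tCO2eq")
```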
FAQ
How many parameters does StarCoder2-15B have?
StarCoder2-15B contains 15 billion parameters and was trained on over 4 trillion tokens from 600+ programming languages using 1,024 NVIDIA H100 GPUs on the Eos Supercomputer.
What is StarCoder’s HumanEval benchmark score?
StarCoder2-15B-Instruct achieved 72.6% on HumanEval pass@1, surpassing CodeLlama-70B-Instruct’s 72.0% despite having less than one-quarter of the parameters. The base model scored 46.3%.
How large is The Stack v2 training dataset?
The Stack v2 dataset totals 67.5 terabytes, representing a 10.5x increase from Stack v1’s 6.4 terabytes. It contains over 3 billion files covering 619 programming languages.
How many downloads has StarCoder2 received?
StarCoder2-3B accumulated over 1.42 million downloads, StarCoder2-7B reached 76,200 downloads, and StarCoder2-15B recorded 15,600 downloads as of 2024. Combined downloads exceed 1.5 million across all variants.
What GPUs were used to train StarCoder2-15B?
StarCoder2-15B trained on 1,024 NVIDIA H100 GPUs using the NVIDIA NeMo Framework on the Eos Supercomputer. Training completed approximately 1 million iterations before early stopping.

