pbhappliedsystems

Qwen2.5-32B-Instruct-4bit-20260527_122210

README

License: apache-2.0

Overview

This checkpoint was quantized using BitsAndBytes and evaluated with standard text similarity metrics.

Model Architecture

Table with columns: Attribute, Value
Attribute	Value
Model class	Qwen2ForCausalLM
Number of parameters	17,161,065,472
Hidden size	5120
Number of layers	64
Attention heads	40
Vocabulary size	152064
Compute dtype	bfloat16

Quantization Configuration

json
{
  "quant_method": "bitsandbytes",
  "_load_in_8bit": false,
  "_load_in_4bit": true,
  "llm_int8_threshold": 6.0,
  "llm_int8_skip_modules": null,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": true,
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_quant_storage": "uint8",
  "load_in_4bit": true,
  "load_in_8bit": false
}

Intended Use

Research and experimentation.
Instruction-following tasks in resource-constrained environments.
Demonstrations of quantized model capabilities.

Limitations

May reproduce biases from the original model.
Quantization may reduce generation diversity and factual accuracy.
Not intended for production without additional evaluation.

Usage

python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pbhappliedsystems/Qwen2.5-32B-Instruct-4bit-20260527_122210")
model = AutoModelForCausalLM.from_pretrained("pbhappliedsystems/Qwen2.5-32B-Instruct-4bit-20260527_122210", device_map="auto")

prompt = "Explain the concept of reinforcement learning."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Generation Settings

This model produces best results when generated with:

temperature: 0.3
top_p: 0.9

Model Files Metadata

Table with columns: Filename, Size (bytes), SHA-256
Filename	Size (bytes)	SHA-256
`model-00001-of-00004.safetensors`	4,933,190,348	`b2a0e8a735e99b3a59bb3139541c444808aff3793a28c314c0f02bf17a00b5f7`
`model-00002-of-00004.safetensors`	4,958,587,236	`fd4b028d13261c8da0e29ed57b95189d666f62f3e8d4ab232c17c4e4e131543a`
`model-00003-of-00004.safetensors`	4,999,136,184	`0446d1c6da46a5daea91bed161fd62f2f48a658d879f58a14b7ab5528eb66935`

Notes

Produced on 2026-05-27T12:33:55.921152.
Quantized automatically using BitsAndBytes.

Intended primarily for research and experimentation.

Citation

Qwen2.5-32B-Instruct

Qwen2.5 Technical Report

License

This model is distributed under the apache-2.0 license, consistent with the original /mnt/d/Development/Libraries/Qwen2.5-32B-Instruct.

Model Card Authors

This quantized model was prepared by PBH Applied Systems.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

pbhappliedsystems

Model Tree

Base

this model

Input Modalities

Text

Output Modalities