KookiesXy/Neo50M API & Inference Endpoint

Model Details

Type: decoder-only causal language model, Llama-compatible architecture
Parameters: approximately 52.6M
Context length target: 16k tokens
Training target: about 15B pretraining tokens plus chat/instruction tuning
Hardware: 8x NVIDIA RTX 5090 cloud GPUs
Tokenizer: TinyLlama/Llama-style 32k tokenizer with a Neo50M chat template

Intended Uses

toy/local assistant experiments
educational training and inference demos
lightweight generation
testing HF, GGUF, ONNX, and distributed training pipelines

Limitations

Neo50M is very small. It is not reliable for factual accuracy, has limited reasoning ability, may hallucinate, and should not be used for safety-critical decisions or high-stakes advice.

Transformers Usage

python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "KookiesXy/Neo50M"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [{"role": "user", "content": "Write a short thank-you note."}]
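# return_dict=True returns input_ids together with the attention_mask.
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(model.device)
# do_sample=True is needed for temperature and top_p to take effect.
out = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))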

GGUF Usage

After downloading a GGUF file:

bash
llama-cli -m neo50m-q4_k_m.gguf -p "User: Write a haiku about GPUs.\nAssistant:"
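
The same invocation can be scripted. The following is a minimal sketch, assuming `llama-cli` is on the PATH and the GGUF file sits in the working directory. The helper names `build_prompt` and `build_command` and the 64-token default are illustrative, not part of the Neo50M release, and the plain `User:`/`Assistant:` layout simply mirrors the command above rather than the Neo50M chat template.

```python
import shlex


def build_prompt(user_text):
    """Format one user turn in the plain 'User: ... / Assistant:' layout."""
    return f"User: {user_text}\nAssistant:"


def build_command(gguf_path, user_text, max_tokens=64):
    """Return the llama-cli argument list for a single prompt.

    -m selects the model file, -p passes the prompt, and -n caps the
    number of generated tokens (the default of 64 is an arbitrary choice).
    """
    return [
        "llama-cli",
        "-m", gguf_path,
        "-p", build_prompt(user_text),
        "-n", str(max_tokens),
    ]


if __name__ == "__main__":
    cmd = build_command("neo50m-q4_k_m.gguf", "Write a haiku about GPUs.")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when the prompt contains quotes or newlines.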

ONNX Usage

The ONNX export is intended for forward-pass validation and integration experiments. Use ONNX Runtime to load onnx/model.onnx and feed integer input_ids plus attention_mask.
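
A forward pass along these lines might look like the sketch below. It assumes the export accepts only `input_ids` and `attention_mask` as int64 tensors and that the first output holds the logits; none of this is confirmed by the upload, so check `session.get_inputs()` against the actual file. The helper names and the placeholder token ids are illustrative.

```python
from pathlib import Path


def make_feeds(token_ids):
    """Build a batch-of-one feed as nested lists (no padding, mask of ones)."""
    return {
        "input_ids": [list(token_ids)],
        "attention_mask": [[1] * len(token_ids)],
    }


def run_forward(model_path, token_ids):
    """Run one forward pass on CPU and return the first output array."""
    # Imported lazily so make_feeds works without these packages installed.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        str(model_path), providers=["CPUExecutionProvider"]
    )
    # Only pass inputs that the exported graph actually declares.
    declared = {inp.name for inp in session.get_inputs()}
    feeds = {
        name: np.asarray(values, dtype=np.int64)  # assumption: int64 inputs
        for name, values in make_feeds(token_ids).items()
        if name in declared
    }
    # Assumption: the first output is the logits tensor.
    return session.run(None, feeds)[0]


if __name__ == "__main__":
    path = Path("onnx/model.onnx")
    if path.exists():
        # Placeholder ids; in practice, take them from the tokenizer.
        logits = run_forward(path, [1, 15043, 3186])
        print(logits.shape)
```

If the graph declares additional inputs, such as position ids or cached key/value tensors, `session.run` will raise an error naming the missing input.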

Dataset Summary

The training pipeline streams a configurable mixture of FineWeb-Edu, Cosmopedia, Wikipedia-like text, TinyStories, and a small permissive code component. SFT uses OpenHermes-style, UltraChat-style, Alpaca-style, and small refusal/helpfulness examples when available. Dataset availability can change; the exact configs are included with the upload.
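
To illustrate what a configurable mixture means in practice, here is a small sketch of weighted sampling across several streams. It is not the pipeline's actual code: the function name `mix_streams`, the toy data, and the weights are all made up for illustration, and the real proportions live in the uploaded configs.

```python
import random


def mix_streams(streams, weights, seed=0):
    """Yield (source_name, example) pairs, choosing a source by weight.

    streams: dict mapping a source name to an iterable of examples.
    weights: dict mapping the same names to positive sampling weights.
    A source is dropped once it is exhausted; sampling continues over
    the remaining sources until all of them are empty.
    """
    rng = random.Random(seed)
    iterators = {name: iter(stream) for name, stream in streams.items()}
    while iterators:
        names = list(iterators)
        picked = rng.choices(names, weights=[weights[n] for n in names])[0]
        try:
            yield picked, next(iterators[picked])
        except StopIteration:
            del iterators[picked]  # exhausted; keep sampling the rest


if __name__ == "__main__":
    # Toy stand-ins for streamed datasets; the weights are hypothetical.
    toy_streams = {
        "web": ["web doc 1", "web doc 2", "web doc 3"],
        "stories": ["story 1", "story 2"],
        "code": ["snippet 1"],
    }
    toy_weights = {"web": 0.6, "stories": 0.3, "code": 0.1}
    for source, example in mix_streams(toy_streams, toy_weights):
        print(source, example)
```

Each source keeps its internal order, so the weights only control how the sources are interleaved.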

Eval Results

Eval artifacts, when present, are uploaded under evals/.
