Nebulixlabs/Nutral-v1-Tiny API & Inference Endpoint

📊 Model Details

Model Name: Nutral v1 Tiny
Developer: Nebulixlabs
Model Type: Causal Language Model
Architecture: Llama (Custom Micro Configuration)
- hidden_size: 128
- intermediate_size: 348
- num_hidden_layers: 4
- num_attention_heads: 4
- num_key_value_heads: 4
- vocab_size: 2048
Parameters: ~1.32 Million
Context Length: 256 Tokens
Formats Provided: Hugging Face PyTorch (.safetensors/.bin) & GGUF
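
The stated parameter count can be reproduced from the architecture values above. The following is a minimal sketch, assuming a standard Llama layout with no bias terms, RMSNorm weights only, and an output head that is not tied to the token embedding; none of these assumptions is stated in this card.

```python
# Rough parameter count for the configuration listed above.
# Assumptions (not confirmed by the model card): no bias terms,
# RMSNorm weights only, and an untied output head (lm_head).
hidden_size = 128
intermediate_size = 348
num_hidden_layers = 4
vocab_size = 2048

embedding = vocab_size * hidden_size          # token embedding matrix
attention = 4 * hidden_size * hidden_size     # q, k, v, o projections (kv heads == heads)
mlp = 3 * hidden_size * intermediate_size     # gate, up, down projections
norms = 2 * hidden_size                       # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms

final_norm = hidden_size                      # RMSNorm before the output head
lm_head = vocab_size * hidden_size            # assumed untied from the embedding

total_params = embedding + num_hidden_layers * per_layer + final_norm + lm_head
print(f"{total_params:,}")  # 1,322,112
```

Under these assumptions the total comes to about 1.32 million, which matches the figure above. With tied embeddings the same arithmetic would give roughly 1.06 million instead.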

🎯 Intended Uses & Capabilities

Because Nutral-v1-Tiny has only ~1.3M parameters and a 2048-token vocabulary, its capabilities are limited to basic next-token prediction on simple English text.

Primary Use Cases:

Edge Device Testing: A dummy/baseline LLM to test deployment pipelines (e.g., llama.cpp) on hardware with extremely low RAM.
Basic Text Generation: Next-word prediction for simple English sentences.
Syntax Recognition: Demonstrating basic grammatical structures learned from educational data.
Educational Purposes: A fast-training baseline to study Llama architecture behavior at a tiny scale.

Out-of-Scope Uses:

Conversational AI or Chatbots.
Logical reasoning, math, or coding tasks.
Factual QA (the model is highly prone to hallucinations due to its size).

🏋️ Training Details

The model was trained from scratch using a fast-extraction data pipeline on the hardware listed below.

Dataset: HuggingFaceFW/fineweb-edu (sample-10BT subset)
Tokens Trained: ~30 million
Hardware: 2x NVIDIA T4 GPUs
Optimizer: AdamW (optim="adamw_torch")
Precision: FP16
Hyperparameters:
- Learning Rate: 6e-4
- Weight Decay: 0.01
- Batch Size: 16 (with Gradient Accumulation steps: 2)
- Max Steps: 3700
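
The token budget follows from these hyperparameters. This is a small worked calculation, assuming that the batch size of 16 is the global batch (not per GPU) and that every training sequence is packed to the full 256-token context; the card does not confirm either point.

```python
# Token budget implied by the hyperparameters above.
# Assumptions: batch size 16 is global, sequences are packed to 256 tokens.
batch_size = 16
grad_accum_steps = 2
context_length = 256
max_steps = 3700

sequences_per_step = batch_size * grad_accum_steps     # 32 sequences per optimizer step
tokens_per_step = sequences_per_step * context_length  # 8,192 tokens per optimizer step
total_tokens = tokens_per_step * max_steps
print(f"{total_tokens:,}")  # 30,310,400
```

The result is consistent with the roughly 30 million tokens reported. If the batch size were instead 16 per GPU across both T4s, the same arithmetic would give about 60.6 million tokens.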

🚀 How to Get Started

You can load the model using the standard transformers library or run the optimized .gguf file using llama.cpp.

1. Using Hugging Face Transformers

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "Nebulixlabs/Nutral-v1-Tiny"

# Load Tokenizer and Model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

# Generate Text
prompt = "The solar system consists of"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=30, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
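
Since the context length is 256 tokens, the prompt and the generated text together should stay within that window. The helper below is a hypothetical sketch (the name `clamp_new_tokens` and the example prompt lengths are illustrative, not part of the model's API) for capping the number of new tokens requested.

```python
def clamp_new_tokens(prompt_len: int, requested: int, context_len: int = 256) -> int:
    """Return how many new tokens still fit in the context window after the prompt."""
    remaining = max(0, context_len - prompt_len)  # never negative, even for over-long prompts
    return min(requested, remaining)

# Illustrative prompt lengths (in tokens), not real tokenizer output.
print(clamp_new_tokens(prompt_len=6, requested=30))    # 30
print(clamp_new_tokens(prompt_len=240, requested=30))  # 16
print(clamp_new_tokens(prompt_len=300, requested=30))  # 0
```

In the Transformers example, the prompt length would be `inputs["input_ids"].shape[1]`, and the returned value could be passed as `max_new_tokens`. A result of 0 suggests the prompt itself should be truncated first.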
