README

License: apache-2.0

📊 Model Details

  • Model Name: Nutral v1 Tiny
  • Developer: Nebulixlabs
  • Model Type: Causal Language Model
  • Architecture: Llama (Custom Micro Configuration)
    • hidden_size: 128
    • intermediate_size: 348
    • num_hidden_layers: 4
    • num_attention_heads: 4
    • num_key_value_heads: 4
    • vocab_size: 2048
  • Parameters: ~1.32 Million
  • Context Length: 256 Tokens
  • Formats Provided: Hugging Face PyTorch (.safetensors/.bin) & GGUF
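The parameter figure can be cross-checked against the configuration listed above. The sketch below is a back-of-the-envelope count, not the authors' tooling. It assumes the standard Llama layer layout (bias-free q/k/v/o projections, a gated MLP with three projections, two RMSNorm weights per layer) and an output head that is not tied to the input embeddings. The card does not state either assumption, and the function name is hypothetical.

```python
def llama_param_count(hidden_size, intermediate_size, num_layers,
                      num_heads, num_kv_heads, vocab_size, tie_embeddings=False):
    """Count weights for a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj and down_proj
    mlp = 3 * hidden_size * intermediate_size
    # input and post-attention RMSNorm weights
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden_size
    # An untied output head duplicates the embedding matrix shape (assumption)
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


# Values taken from the architecture list in this card
untied = llama_param_count(128, 348, 4, 4, 4, 2048)
tied = llama_param_count(128, 348, 4, 4, 4, 2048, tie_embeddings=True)
print(f"untied head: {untied:,} parameters")
print(f"tied head:   {tied:,} parameters")
```

Under these assumptions the untied count comes to 1,322,112, which matches the stated ~1.32 Million. A tied head would give about 1.06 million instead.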

🎯 Intended Uses & Capabilities

Because Nutral-v1-Tiny has only about 1.3M parameters and a restricted 2048-token vocabulary, its capabilities are very basic.

Primary Use Cases:

  • Edge Device Testing: A dummy/baseline LLM to test deployment pipelines (e.g., llama.cpp) on hardware with extremely low RAM.
  • Basic Text Generation: Next-word prediction for simple English sentences.
  • Syntax Recognition: Demonstrating basic grammatical structures learned from educational data.
  • Educational Purposes: A fast-training baseline to study Llama architecture behavior at a tiny scale.
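For the edge-device use case, a rough memory estimate helps when sizing a test target. The sketch below is only an approximation under stated assumptions: it uses the rounded 1.32M parameter figure, 2 bytes per value (FP16), and a head dimension of hidden_size / num_attention_heads = 32. It ignores activations, runtime overhead, and whatever quantization the GGUF file uses, which the card does not specify. The helper names are hypothetical.

```python
MIB = 1024 * 1024


def weight_bytes(num_params, bytes_per_param):
    """Memory needed to hold the weights alone."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_length, bytes_per_value):
    """Keys plus values (hence the factor 2) for one sequence at full context."""
    return 2 * num_layers * num_kv_heads * head_dim * context_length * bytes_per_value


# Figures from this card; FP16 storage (2 bytes per value) is an assumption
fp16_weights = weight_bytes(1_320_000, 2)
kv_cache = kv_cache_bytes(num_layers=4, num_kv_heads=4, head_dim=128 // 4,
                          context_length=256, bytes_per_value=2)

print(f"weights:  {fp16_weights / MIB:.2f} MiB")
print(f"KV cache: {kv_cache / MIB:.2f} MiB")
```

Both figures are a few MiB at most, which is why the model is suited to exercising a deployment pipeline rather than stressing the hardware.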

Out-of-Scope Uses:

  • Conversational AI or Chatbots.
  • Logical reasoning, math, or coding tasks.
  • Factual QA (the model is highly prone to hallucinations due to its size).

🏋️ Training Details

The model was trained from scratch on two NVIDIA T4 GPUs using a fast-extraction data pipeline.

  • Dataset: HuggingFaceFW/fineweb-edu (sample-10BT subset)
  • Tokens Trained: 30 Million tokens
  • Hardware: 2x NVIDIA T4 GPUs
  • Optimizer: AdamW (optim="adamw_torch")
  • Precision: FP16
  • Hyperparameters:
    • Learning Rate: 6e-4
    • Weight Decay: 0.01
    • Batch Size: 16 (with Gradient Accumulation steps: 2)
    • Max Steps: 3700
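The token total can be sanity-checked from the hyperparameters above. This is a minimal sketch under two assumptions the card does not confirm: every training sequence is filled to the 256-token context length, and the batch size of 16 is either the total across both GPUs or a per-device value. The function name is hypothetical.

```python
def tokens_seen(batch_size, grad_accum_steps, seq_len, max_steps, num_devices=1):
    """Tokens processed over a run, assuming every sequence is full length."""
    sequences_per_step = batch_size * grad_accum_steps * num_devices
    return sequences_per_step * seq_len * max_steps


# Reading 1: batch size 16 is the total across both GPUs
global_reading = tokens_seen(16, 2, 256, 3700)
# Reading 2: batch size 16 is per device, on 2 GPUs
per_device_reading = tokens_seen(16, 2, 256, 3700, num_devices=2)

print(f"global batch of 16:     {global_reading:,} tokens")
print(f"per-device batch of 16: {per_device_reading:,} tokens")
```

The first reading gives about 30.3 million tokens, which agrees with the stated 30 Million. The second would give roughly twice that.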

🚀 How to Get Started

You can load the model using the standard transformers library or run the optimized .gguf file using llama.cpp.

1. Using Hugging Face Transformers

python

from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "Nebulixlabs/Nutral-v1-Tiny"

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

# Tokenize the prompt and sample up to 30 new tokens
prompt = "The solar system consists of"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, temperature=0.7, do_sample=True)

# Decode the full sequence (prompt plus continuation) back to text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
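Because the context length is only 256 tokens, the prompt and the generated tokens together should stay within that window. The helper below is a hypothetical sketch, not part of the model's API. It works on a plain list of token ids and keeps the most recent tokens, which is one possible truncation policy among several.

```python
def fit_prompt(input_ids, max_new_tokens, context_length=256):
    """Trim a list of token ids so prompt + generation fits the context window."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Keep the last `budget` tokens; shorter prompts are returned unchanged
    return input_ids[-budget:]


long_prompt = list(range(300))   # stand-in for 300 token ids
trimmed = fit_prompt(long_prompt, max_new_tokens=30)
short = fit_prompt([1, 2, 3], max_new_tokens=30)
print(len(trimmed), len(short))
```

With 30 new tokens requested, the prompt budget is 226 tokens, so a 300-token prompt loses its first 74 tokens while a short prompt passes through untouched.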
