Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Why this exists

Most "trained a model" portfolio entries are fine-tunes of large pretrained checkpoints. This is different: every weight in this model started as a random number. The goal was to walk through the full pretraining loop end-to-end — custom BPE tokenizer, randomly initialized GPT-2 architecture, causal LM objective, training loop, evaluation — on a free Google Colab T4 GPU.

The output isn't competitive with anything. The understanding is the deliverable.

Model details

ArchitectureGPT-2 (decoder-only Transformer)
Parameters~4.22 M
Layers4
Embedding dim256
Attention heads4
Context length128 tokens
Vocab size4,000
TokenizerByte-level BPE, trained from scratch on TinyStories
InitializationRandom (no pretrained weights)
Training objectiveCausal language modeling (next-token prediction)

Training data

roneneldan/TinyStories — short children's stories generated by GPT-3.5/GPT-4, designed specifically so small language models can learn coherent English. A 5,000-story subset was used for this v0 run.

Training setup

HardwareGoogle Colab Free Tier (NVIDIA Tesla T4, 15.6 GB VRAM)
Precisionfp16
OptimizerAdamW (Hugging Face Trainer default)
Learning rate3e-4, cosine schedule, 30 warmup steps
Weight decay0.01
Batch size32
Steps300 (capped via max_steps)
Wall-clock time~14 seconds
Final train loss5.08 (from initial ~9, vocab=4000)

Intended use & limitations

This is a base language model, not an instruction-tuned chat model. Given a short English prompt, it will continue the text in TinyStories style (children's stories with characters like "Lily", "Tom", "Ben", simple plots).

It will NOT:

  • follow instructions
  • answer questions reliably
  • produce text outside the children's-story domain
  • maintain long-range coherence (context window is only 128 tokens)

This is v0 — explicitly an early, undercooked checkpoint. Output is occasionally repetitive and loses thread across sentences. That's expected at this scale.

Quick start

python

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model = AutoModelForCausalLM.from_pretrained("laskar-ks/alcyone-v0")
tokenizer = AutoTokenizer.from_pretrained("laskar-ks/alcyone-v0")
gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(gen("Once upon a time, there was a little",
max_new_tokens=80,
do_sample=True,
temperature=0.8)[0]["generated_text"])

Roadmap

  • v0 — this checkpoint. Proof that the full pretraining loop works end-to-end.
  • v1 — same architecture, longer training (≥5,000 steps on the full TinyStories train split). Expected: noticeably more coherent stories.
  • v0-id — Bahasa Indonesia variant. Custom BPE tokenizer on an Indonesian corpus, same architecture.

About the name

Alcyone (η Tauri) is the brightest star in the Pleiades open star cluster. Part of a star-named model family alongside other projects (Parallax, Altair, Pleiades agents, etc.).

Author

Trained by Laskar as part of an AI engineering portfolio exploring agentic systems, multi-agent architectures, and foundational ML.

Model provider

laskar-ks

Model tree

Base

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today