Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Why this exists
Most "trained a model" portfolio entries are fine-tunes of large pretrained checkpoints. This is different: every weight in this model started as a random number. The goal was to walk through the full pretraining loop end-to-end — custom BPE tokenizer, randomly initialized GPT-2 architecture, causal LM objective, training loop, evaluation — on a free Google Colab T4 GPU.
The output isn't competitive with anything. The understanding is the deliverable.
Model details
| Architecture | GPT-2 (decoder-only Transformer) |
| Parameters | ~4.22 M |
| Layers | 4 |
| Embedding dim | 256 |
| Attention heads | 4 |
| Context length | 128 tokens |
| Vocab size | 4,000 |
| Tokenizer | Byte-level BPE, trained from scratch on TinyStories |
| Initialization | Random (no pretrained weights) |
| Training objective | Causal language modeling (next-token prediction) |
Training data
roneneldan/TinyStories — short children's stories generated by GPT-3.5/GPT-4, designed specifically so small language models can learn coherent English. A 5,000-story subset was used for this v0 run.
Training setup
| Hardware | Google Colab Free Tier (NVIDIA Tesla T4, 15.6 GB VRAM) |
| Precision | fp16 |
| Optimizer | AdamW (Hugging Face Trainer default) |
| Learning rate | 3e-4, cosine schedule, 30 warmup steps |
| Weight decay | 0.01 |
| Batch size | 32 |
| Steps | 300 (capped via max_steps) |
| Wall-clock time | ~14 seconds |
| Final train loss | 5.08 (from initial ~9, vocab=4000) |
Intended use & limitations
This is a base language model, not an instruction-tuned chat model. Given a short English prompt, it will continue the text in TinyStories style (children's stories with characters like "Lily", "Tom", "Ben", simple plots).
It will NOT:
- follow instructions
- answer questions reliably
- produce text outside the children's-story domain
- maintain long-range coherence (context window is only 128 tokens)
This is v0 — explicitly an early, undercooked checkpoint. Output is occasionally repetitive and loses thread across sentences. That's expected at this scale.
Quick start
python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipelinemodel = AutoModelForCausalLM.from_pretrained("laskar-ks/alcyone-v0")tokenizer = AutoTokenizer.from_pretrained("laskar-ks/alcyone-v0")gen = pipeline("text-generation", model=model, tokenizer=tokenizer)print(gen("Once upon a time, there was a little",max_new_tokens=80,do_sample=True,temperature=0.8)[0]["generated_text"])
Roadmap
v0— this checkpoint. Proof that the full pretraining loop works end-to-end.v1— same architecture, longer training (≥5,000 steps on the full TinyStories train split). Expected: noticeably more coherent stories.v0-id— Bahasa Indonesia variant. Custom BPE tokenizer on an Indonesian corpus, same architecture.
About the name
Alcyone (η Tauri) is the brightest star in the Pleiades open star cluster. Part of a star-named model family alongside other projects (Parallax, Altair, Pleiades agents, etc.).
Author
Trained by Laskar as part of an AI engineering portfolio exploring agentic systems, multi-agent architectures, and foundational ML.
Model provider
laskar-ks
Model tree
Base
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information