poolside-laguna-hackathon

laguna-xs2-dense-stage1

Loading

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True allows the repo's custom modeling code (modeling_laguna.py) to run.
# The weights are loaded in bf16 directly onto the GPU.
m = AutoModelForCausalLM.from_pretrained("poolside-laguna-hackathon/laguna-xs2-dense-stage1",
        trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="cuda")
tok = AutoTokenizer.from_pretrained("poolside-laguna-hackathon/laguna-xs2-dense-stage1", trust_remote_code=True)

The dense FFNs (intermediate size 4608) are zero-padded to 8192 so that the stock modeling_laguna.py can load the checkpoint; the padding leaves the outputs numerically identical. As a result, the exported checkpoint reports ≈3.8B parameters, while the true model has ≈3.0B. last.pt (raw Stage-1 FFN weights) is also in this repo. Footprint: ≈6 GB in bf16 vs ≈67 GB for the 33B MoE (≈11× less weight VRAM).
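To see why zero-padding the intermediate dimension can leave the outputs unchanged, here is a minimal pure-Python sketch. It assumes a gated, SwiGLU-style FFN with gate, up, and down matrices; the card does not state the FFN layout, so that structure, the helper names (`gated_ffn`, `pad_ffn`), and the toy sizes (9 padded to 16, standing in for 4608 padded to 8192) are all illustrative assumptions, not the repo's export code.

```python
import math
import random

def silu(x):
    return x / (1.0 + math.exp(-x))

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gated_ffn(x, W_gate, W_up, W_down):
    # Assumed layout: down( silu(gate(x)) * up(x) ).
    g = matvec(W_gate, x)
    u = matvec(W_up, x)
    h = [silu(gi) * ui for gi, ui in zip(g, u)]
    return matvec(W_down, h)

def pad_ffn(W_gate, W_up, W_down, padded):
    # Append zero rows to gate/up and zero columns to down.
    hidden = len(W_gate[0])
    extra = padded - len(W_gate)
    pad_rows = lambda W: W + [[0.0] * hidden for _ in range(extra)]
    return pad_rows(W_gate), pad_rows(W_up), [row + [0.0] * extra for row in W_down]

# Toy sizes (hypothetical): hidden 8, intermediate 9, padded to 16.
rng = random.Random(0)
hidden, inter, padded = 8, 9, 16
rand = lambda rows, cols: [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
W_gate, W_up, W_down = rand(inter, hidden), rand(inter, hidden), rand(hidden, inter)
x = [rng.gauss(0, 1) for _ in range(hidden)]

pg, pu, pd = pad_ffn(W_gate, W_up, W_down, padded)
y_dense = gated_ffn(x, W_gate, W_up, W_down)
y_padded = gated_ffn(x, pg, pu, pd)

# Largest elementwise difference between the two outputs.
max_diff = max(abs(a - b) for a, b in zip(y_dense, y_padded))
print(max_diff)
```

Each padded intermediate unit computes silu(0) * 0 = 0 and is then multiplied by a zero column of the down projection, so it adds nothing to the output. The padded rows and columns still count as parameters, which is consistent with the exported checkpoint reporting more parameters than the true model.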

See the Stage-2 card for the full method, results, and next steps. Code: https://github.com/postscarcity-inc/laguna-xs.2-dense

Model provider: poolside-laguna-hackathon

Model tree: base poolside/Laguna-XS.2; fine-tuned: this model

Modalities: text input, text output
