README
License: apache-2.0
Usage
python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "0labs-in/Sky-7B"

# Load the tokenizer and the model weights in bfloat16, placing layers on the available devices.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Render the conversation with the model's chat template and tokenize it.
messages = [{"role": "user", "content": "hi"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Generate a reply, then decode only the tokens that follow the prompt.
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
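The final line of the usage snippet slices the output before decoding because, for a causal language model, generate() returns the prompt tokens followed by the newly generated ones. The sketch below illustrates that slicing with plain Python lists so it runs without downloading the model. The token ids and the helper name new_tokens are made up for illustration and are not part of the model or the transformers API.

```python
def new_tokens(output_ids, prompt_len):
    """Return only the tokens that were generated after the prompt."""
    return output_ids[prompt_len:]


# Hypothetical token ids standing in for the templated prompt.
prompt_ids = [101, 7592, 102]

# A causal LM's generate() output starts with the prompt, then the new tokens.
output_ids = prompt_ids + [2023, 2003]

# Mirrors out[0][inputs["input_ids"].shape[1]:] in the usage snippet.
completion_ids = new_tokens(output_ids, len(prompt_ids))
print(completion_ids)
```

Without the slice, the decoded string would repeat the user's message and the chat-template markup ahead of the model's reply.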
Model provider
0labs-in
Model tree
Base
allenai/Olmo-3-7B-Think
Fine-tuned
this model
Modalities
Input
Text
Output
Text
Pricing
Dedicated Endpoints
Supported Functionality
Model APIs
Dedicated Endpoints
Container