License: apache-2.0

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "Ayjays132/phillnet-2"

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    model.set_tokenizer(tokenizer)
    model.eval()

Once loaded, the same model object exposes:

    from pathlib import Path

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load from a local checkout of the repository (current directory).
    model_dir = Path(".")

    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        trust_remote_code=True,
        # bfloat16 on GPU; fall back to float32 on CPU.
        torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
        device_map="auto" if torch.cuda.is_available() else None,
    )
    model.set_tokenizer(tokenizer)
    model.eval()

The user-facing model wrapper routes image generation to the packaged diffusion path by default. A plain call uses the cleaner public preset: 512x512, 4 steps, `guidance_scale=0.0`, `quality_strength=1.2`, `latent_refiner_strength=0.0`, and `use_memory=False` for stable image conditioning.
</div>

    # JSON preset with lightweight repair/checking
    payload = model.generate_with_preset(
        "Return valid JSON with keys: model, status, strengths.",
        preset="json",
    )
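
The repair/checking that the `json` preset performs internally is not described here. As a rough caller-side sketch, assuming `payload` comes back as a string, the result can still be verified independently. The helper name `check_json_payload` and the brace-trimming step are illustrative assumptions, not part of the model's API.

```python
import json


def check_json_payload(text, required_keys):
    """Parse model output as JSON and confirm the required keys are present.

    Hypothetical caller-side helper; not the model's internal repair logic.
    """
    # Light repair: keep only the span from the first "{" to the last "}",
    # which drops chatter or code fences wrapped around the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Made-up sample output, used only to exercise the helper.
sample = 'Sure: {"model": "phillnet-2", "status": "ok", "strengths": ["text"]}'
checked = check_json_payload(sample, ["model", "status", "strengths"])
```

Both failure modes raise `ValueError` (`json.JSONDecodeError` is a subclass), so a caller can catch one exception type and re-prompt.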

    # Deliberate mode for harder planning prompts
    answer = model.deliberate_generate(
        "Plan a small web game project with files, milestones, and test checks.",
        preset="planning",
        passes=2,
    )

For best results, ask for one file or one feature at a time, run the generated code, then feed the error or desired refinement back into the model/tooling loop.
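
The run-then-refine advice above can be sketched as a small loop. This is a minimal illustration, not part of the packaged tooling: `generate` is a hypothetical stand-in for whatever call produces code (for example a wrapper around `model.generate_with_preset`), and `run_snippet` and `refine_until_clean` are illustrative names.

```python
import subprocess
import sys


def run_snippet(code, timeout=30):
    """Run generated Python in a fresh interpreter; return "" on success."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode == 0:
        return ""
    return proc.stderr or f"exit code {proc.returncode}"


def refine_until_clean(generate, task, max_rounds=3):
    """Ask for code, run it, and feed any error back into the next prompt."""
    prompt = task
    code = ""
    for _ in range(max_rounds):
        code = generate(prompt)
        error = run_snippet(code)
        if not error:
            return code, True
        # Keep the request narrow: same task plus the concrete failure.
        prompt = f"{task}\n\nThe previous attempt failed with:\n{error}\nFix it."
    return code, False


# Stand-in generator for demonstration: fails once, then returns working code.
attempts = iter(["print(undefined_name)", "print('ok')"])
final_code, ok = refine_until_clean(lambda prompt: next(attempts), "Print ok.")
```

Generated code is untrusted, so in practice the subprocess call would be replaced by a sandbox or container; the plain interpreter call here only keeps the sketch short.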
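
    # `inputs` is assumed to hold the tokenized prompt, e.g. the output of
    # tokenizer(..., return_tensors="pt") moved to the model's device.
    ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        use_cache=True,
    )

    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Example output is saved at `examples/text_identity_output.txt`.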
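
The slice in the `print` call above is there because, as the snippet implies and as is usual for decoder-only models in Transformers, `generate` returns the prompt tokens followed by the new ones. A toy illustration with plain lists (the token values are made up):

```python
# Toy token IDs; real ones come from the tokenizer and model.generate().
prompt_ids = [101, 7592, 2088]          # what was passed in as input_ids
output_ids = prompt_ids + [2023, 2003]  # generate() output: prompt + continuation

# Same idea as ids[0][inputs.input_ids.shape[1]:] in the snippet above.
new_tokens = output_ids[len(prompt_ids):]
print(new_tokens)  # [2023, 2003]
```

Without the slice, the decoded string would start with the prompt text itself.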

    print(result.metadata)

The image route uses the packaged local ImageGen runtime under the hood, resolved relative to the loaded model repository instead of the Transformers dynamic-module cache. The public route uses the same fused path as the base ImageGen project: local Phillnet adapter conditioning plus the packaged SDXL Turbo text encoders/tokenizers, local UNet/VAE weights, and the SDXL Turbo scheduler config. For Turbo-style sampling, `guidance_scale=0.0` is used and negative prompts are not required. The final public preset uses the clean one-step route with the extra image-polish pass disabled to avoid subtle latent texture noise.

| Preset | Call Settings | Use Case |
| --- | --- | --- |
| Default showcase | `steps=1, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, image_quality_polish=False` | Clean public examples and product-style images |
| Experimental polish | Optional multi-step or polish passes | Manual experiments only; not the packaged public preset |
| Stable sizing | `height=512, width=512` | Most reliable route; larger outputs can work but cost more memory |

For sharper images, prefer better prompt structure over heavy guidance: subject, material, lighting, composition, and background. The route already keeps the UNet canvas at a stable internal size and then returns the requested output size.
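
The preset settings and the five-part prompt structure described above can be kept together in a small helper. This is only a sketch: the setting names are copied from the Call Settings listed above, but whether the image route accepts them as keyword arguments in exactly this form is an assumption, and `build_prompt` and `image_settings` are illustrative names.

```python
# Values copied from the "Default showcase" and "Stable sizing" settings above.
DEFAULT_SHOWCASE = {
    "steps": 1,
    "guidance_scale": 0.0,
    "quality_strength": 1.2,
    "latent_refiner_strength": 0.0,
    "use_memory": False,
    "image_quality_polish": False,
    "height": 512,
    "width": 512,
}


def build_prompt(subject, material, lighting, composition, background):
    """Join the five prompt parts in a fixed order, skipping empty ones."""
    parts = [subject, material, lighting, composition, background]
    return ", ".join(part.strip() for part in parts if part and part.strip())


def image_settings(**overrides):
    """Return the preset with overrides applied; reject unknown setting names."""
    unknown = set(overrides) - set(DEFAULT_SHOWCASE)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    return {**DEFAULT_SHOWCASE, **overrides}


prompt = build_prompt(
    subject="ceramic coffee mug",
    material="matte glaze",
    lighting="soft studio light",
    composition="centered, three-quarter view",
    background="plain light gray backdrop",
)
# Larger outputs can work but cost more memory, so override size explicitly.
settings = image_settings(height=768, width=768)
```

Rejecting unknown names catches typos such as `guidance=0.0` before they are silently ignored or passed through.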

    print(video.metadata)

Optional sound is routed through the same package: pass `audio_path="examples/speech_identity.wav"` to mux an existing track, or pass `audio_text="..."` to synthesize narration through `model.synthesize_speech(...)` and attach it to the MP4 when FFmpeg is available.

- Visual route: ImageGen keyframes use the same clean 512x512, one-step route as the asset gallery, then VideoGen interpolates and stabilizes the timeline.
- Audio route: `audio_text="auto"` creates prompt-aware narration from the actual scene and camera motion; `audio_path` muxes an existing track.
- SFX timeline: `sfx_prompt` and inferred ambience cues are stored in metadata for future sound-effect fine-tuning, without claiming speech synthesis is an SFX model.

With `align_audio=True`, VideoGen resolves or synthesizes audio before interpolation, expands `effective_seconds` to match the audio duration, and computes the final frame count from `effective_seconds * fps`. The composer backend also raises `planned_keyframes` from the aligned duration through `keyframe_interval_seconds`, so long narration or input audio gets multiple visual anchors instead of one stretched frame sequence. With `condition_on_audio=True`, input audio is routed through listening/encoding first and its summary conditions the visual storyboard.
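
The `align_audio=True` arithmetic described above can be sketched in a few lines. The names `effective_seconds`, `fps`, `planned_keyframes`, and `keyframe_interval_seconds` come from the text; the function `plan_timeline`, the rounding choices, the "never shrink" rule, and the 2-second default interval are assumptions made for illustration, not the packaged implementation.

```python
import math


def plan_timeline(requested_seconds, fps, audio_seconds=None,
                  keyframe_interval_seconds=2.0):
    """Sketch of audio-aligned timing: duration, frame count, keyframe count."""
    effective_seconds = requested_seconds
    if audio_seconds is not None:
        # Assumption: expand (never shrink) the clip so audio is not cut off.
        effective_seconds = max(requested_seconds, audio_seconds)
    # Frame count follows effective_seconds * fps; rounding is an assumption.
    frame_count = round(effective_seconds * fps)
    # One visual anchor per interval, at least one; ceil is an assumption.
    planned_keyframes = max(
        1, math.ceil(effective_seconds / keyframe_interval_seconds)
    )
    return effective_seconds, frame_count, planned_keyframes


# 3 s requested, 7.5 s of narration, 12 fps: the timeline stretches to the audio.
effective, frames, keyframes = plan_timeline(3.0, 12, audio_seconds=7.5)
print(effective, frames, keyframes)  # 7.5 90 4
```

Under these assumptions, the same 3-second request without audio stays at 36 frames and 2 keyframes, which shows why longer narration ends up with more visual anchors.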
<p>Final example output generated through <code>model.generate_video(...)</code>. The GIF is included for model-card preview; the MP4 includes the aligned AAC audio stream for local playback.</p><img src="examples/video_showcase_ad.gif" alt="Phillnet-2 generated product video GIF" style="border-radius: 15px;"><p>Files: <a href="examples/video_showcase_ad.gif"><code>examples/video_showcase_ad.gif</code></a>, <a href="examples/video_showcase_ad.mp4"><code>examples/video_showcase_ad.mp4</code></a>, and the generated narration stem <a href="examples/video_showcase_ad.wav"><code>examples/video_showcase_ad.wav</code></a>.</p></div>

    speech = model.synthesize_speech(
        "The multimodal route can generate clean image keyframes, compose a short video timeline, synthesize narration, and mux audio into the final MP4.",
        output_path="outputs/speech_multimodal_surface.wav",
    )

    print(speech.metadata)

Audio examples generated through `model.synthesize_speech(...)`:

- `examples/speech_identity.wav`
- `examples/speech_multimodal_surface.wav`

Whisper listening assets are packaged under `Audio/models/Phillnet-2-Whisper-Large-V3-Turbo`.
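
To check the length of a generated clip, for instance before passing it as `audio_path` with `align_audio=True`, the standard-library `wave` module is enough. This sketch assumes the files are standard PCM WAV, which is not confirmed here, and runs on a synthetic silent file so it does not depend on the example assets; `wav_duration_seconds` is an illustrative name.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demo on a synthetic file: 0.5 s of 16-bit mono silence at 16 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 8000)  # 8000 frames / 16000 Hz = 0.5 s

duration = wav_duration_seconds(demo_path)
print(duration)  # 0.5
```

The same function could be pointed at `examples/speech_identity.wav` from a local checkout, provided that file is PCM-encoded.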
<h3>PhillNet-2 Benchmark Visuals</h3><p>Release-readiness benchmark visuals for the experimental AXIOM multimodal GPT-OSS runtime. These charts summarize local smoke validation and bounded <code>lm-evaluation-harness</code> slices.</p>

<p>High-level benchmark panel for quick model-card scanning.</p>

<p>Local release-readiness smoke results. These are not SOTA or official leaderboard claims.</p>

<p>Bounded <code>lm-evaluation-harness</code> slice using primary metrics per task.</p>

<p>Visual status card for text, code, image, video, audio, and multimodal route validation.</p>

<p>Actual one-shot website artifact from <code>test_oneshot_website.py</code>, rendered from <code>examples/oneshot_website_demo.html</code>.</p>
Model provider: ayjays132