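```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Ayjays132/phillnet-2"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,  # Phillnet-2 ships custom modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)
model.set_tokenizer(tokenizer)  # attach the tokenizer to the custom model wrapper
model.eval()
```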
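Once loaded, the same model object exposes the text, image, video, and speech routes described below. To load from a local clone of the repository instead (falling back to float32 on CPU-only machines):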
```python
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = Path(".")  # root of a local clone of the model repository
use_cuda = torch.cuda.is_available()

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if use_cuda else torch.float32,
    device_map="auto" if use_cuda else None,
)
model.set_tokenizer(tokenizer)
model.eval()
```
The user-facing model wrapper routes image generation to the packaged diffusion path by default. A plain call uses the clean public preset: 512x512, 1 step, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, and use_memory=False for stable image conditioning.
```python
# JSON preset with lightweight repair/checking
payload = model.generate_with_preset(
    "Return valid JSON with keys: model, status, strengths.",
    preset="json",
)

# Deliberate mode for harder planning prompts
answer = model.deliberate_generate(
    "Plan a small web game project with files, milestones, and test checks.",
    preset="planning",
    passes=2,
)
```
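The json preset is described only as doing "lightweight repair/checking". As an illustration of what such a check can look like on the caller's side, here is a minimal, hypothetical post-processing sketch in plain Python. It assumes the generated text is available as a string, strips surrounding Markdown code fences, falls back to the outermost {...} span, parses with json.loads, and verifies the keys requested in the prompt. The helper name check_json_payload is illustrative and is not part of the Phillnet-2 API.

```python
import json
import re
from typing import Tuple

# Matches a leading ```/```json fence or a trailing ``` fence.
_FENCE_RE = re.compile(r"^```(?:json)?\s*|\s*```$", re.IGNORECASE)


def check_json_payload(text: str, required_keys: Tuple[str, ...]) -> dict:
    """Parse model output as a JSON object and verify that required keys exist.

    Hypothetical helper. Light "repair" only: strip surrounding code fences
    and, if the whole string does not parse, retry on the outermost {...} span.
    """
    cleaned = _FENCE_RE.sub("", text.strip()).strip()
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model output")
        data = json.loads(cleaned[start : end + 1])
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

For the prompt above, the call would be check_json_payload(text, ("model", "status", "strengths")); json.JSONDecodeError is a subclass of ValueError, so callers can catch a single exception type.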
For best results, ask for one file or one feature at a time, run the generated code, then feed the error or desired refinement back into the model/tooling loop.
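The generate, run, and refine loop recommended above can be automated. The sketch below is a hypothetical harness, not part of the package. generate_code stands in for whatever call produces code (for example, a small wrapper around model.generate). Each candidate runs in a fresh interpreter via subprocess, and any traceback is appended to the next prompt. A stub generator is used so the loop runs without the model.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def run_snippet(code: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Run Python source in a fresh interpreter and return (succeeded, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return proc.returncode == 0, proc.stderr


def refine_loop(
    generate_code: Callable[[str], str], task: str, max_rounds: int = 3
) -> Tuple[str, bool]:
    """Generate code, run it, and feed any error back until it runs or rounds run out."""
    prompt = task
    code = ""
    for _ in range(max_rounds):
        code = generate_code(prompt)
        ok, stderr = run_snippet(code)
        if ok:
            return code, True
        # Send the concrete traceback back together with the original request.
        prompt = f"{task}\n\nThe previous attempt failed with:\n{stderr}\nPlease return a fixed version."
    return code, False


# Stub generator: first attempt fails with a NameError, second attempt works.
_attempts = iter(["print(undefined_name)", "print('fixed')"])
code, ok = refine_loop(lambda prompt: next(_attempts), "Print a message.")
print(ok, code)
```

Because generated code runs as-is, this kind of loop belongs in an isolated environment.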
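```python
# Illustrative prompt; replace with your own text.
inputs = tokenizer("Introduce yourself in one sentence.", return_tensors="pt").to(model.device)

ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,  # greedy decoding for reproducible output
    use_cache=True,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```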
Example output is saved at examples/text_identity_output.txt.
```python
# `result` is the object returned by the image-generation call
print(result.metadata)
```
The image route uses the packaged local ImageGen runtime under the hood, resolved relative to the loaded model repository instead of the Transformers dynamic-module cache. The public route uses the same fused path as the base ImageGen project: local Phillnet adapter conditioning plus the packaged SDXL Turbo text encoders/tokenizers, local UNet/VAE weights, and the SDXL Turbo scheduler config. For Turbo-style sampling, guidance_scale=0.0 is used and negative prompts are not required. The final public preset uses the clean one-step route with the extra image-polish pass disabled to avoid subtle latent texture noise.
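The reason for resolving paths relative to the repository: with trust_remote_code=True, Transformers copies the repository's Python modules into its dynamic-module cache, so `__file__` inside the remote code points into that cache rather than at the downloaded weights. A minimal sketch of repository-relative resolution follows. It assumes the runtime lives in a subfolder named ImageGen (a hypothetical name used for illustration) and that the caller already has the local repository root. For a Hub ID, huggingface_hub.snapshot_download(repo_id) returns such a local path.

```python
from __future__ import annotations

from pathlib import Path


def resolve_runtime_dir(model_root: str | Path, subdir: str = "ImageGen") -> Path:
    """Return <model_root>/<subdir>, failing loudly if it is missing.

    `subdir` is a hypothetical folder name; the point is that the lookup
    starts from the model repository, not from the dynamic-module cache.
    """
    root = Path(model_root).expanduser().resolve()
    runtime_dir = root / subdir
    if not runtime_dir.is_dir():
        raise FileNotFoundError(f"expected packaged runtime at {runtime_dir}")
    return runtime_dir
```

Failing with an explicit path makes a partial download easy to diagnose, instead of silently falling back to a different location.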
| Preset | Call settings | Use case |
|---|---|---|
| Default showcase | steps=1, guidance_scale=0.0, quality_strength=1.2, latent_refiner_strength=0.0, use_memory=False, image_quality_polish=False | Clean public examples and product-style images |
| Experimental polish | Optional multi-step or polish passes | Manual experiments only; not the packaged public preset |
| Stable sizing | height=512, width=512 | Most reliable route; larger outputs can work but cost more memory |
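When scripting batches of calls, the preset table can be mirrored as plain keyword-argument dictionaries. This is a hypothetical convenience sketch. Only the parameter names and values come from the table. The dictionary names and the image_kwargs helper are illustrative. The sketch does not call the model, and this section does not show the image-generation method that would receive these kwargs.

```python
from typing import Any, Dict

# Values copied from the "Default showcase" and "Stable sizing" rows.
DEFAULT_SHOWCASE: Dict[str, Any] = {
    "steps": 1,
    "guidance_scale": 0.0,
    "quality_strength": 1.2,
    "latent_refiner_strength": 0.0,
    "use_memory": False,
    "image_quality_polish": False,
}
STABLE_SIZE: Dict[str, Any] = {"height": 512, "width": 512}


def image_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge the public preset with explicit overrides (overrides win).

    Unknown names raise KeyError so that typos in setting names are caught early.
    """
    kwargs = {**DEFAULT_SHOWCASE, **STABLE_SIZE}
    unknown = set(overrides) - set(kwargs)
    if unknown:
        raise KeyError(f"not a preset setting: {sorted(unknown)}")
    kwargs.update(overrides)
    return kwargs
```

For example, image_kwargs(width=768) keeps the one-step settings while trying a wider canvas, which the table notes costs more memory.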
For sharper images, prefer better prompt structure over heavy guidance: subject, material, lighting, composition, and background. The route already keeps the UNet canvas at a stable internal size and then returns the requested output size.
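A small helper can keep the five-part prompt structure suggested above consistent across a batch. This is an illustrative sketch. The function name and the comma-separated layout are assumptions, not a documented Phillnet-2 prompt format.

```python
def structured_prompt(
    subject: str,
    material: str = "",
    lighting: str = "",
    composition: str = "",
    background: str = "",
) -> str:
    """Join non-empty parts in subject, material, lighting, composition, background order."""
    if not subject.strip():
        raise ValueError("subject is required")
    parts = [subject, material, lighting, composition, background]
    return ", ".join(part.strip() for part in parts if part.strip())


print(structured_prompt(
    "ceramic coffee mug",
    material="matte white glaze",
    lighting="soft studio lighting",
    composition="centered three-quarter view",
    background="plain light gray background",
))
```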
```python
# `video` is the result returned by model.generate_video(...)
print(video.metadata)
```
Optional sound is routed through the same package: pass audio_path="examples/speech_identity.wav" to mux an existing track, or pass audio_text="..." to synthesize narration through model.synthesize_speech(...) and attach it to the MP4 when FFmpeg is available.
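Muxing an existing audio track into an MP4 is a standard FFmpeg operation. The sketch below only builds the command line (it does not run it). It shows one common way to do this step and is not the package's internal command. The video stream is copied without re-encoding, the audio is encoded to AAC (matching the AAC stream mentioned for the showcase MP4), and -shortest stops at the shorter input. The file names are placeholders.

```python
import shutil
from typing import List, Optional


def ffmpeg_mux_command(
    video_path: str, audio_path: str, output_path: str, ffmpeg: Optional[str] = None
) -> List[str]:
    """Build an FFmpeg argv that adds an audio track to an MP4 video."""
    exe = ffmpeg or shutil.which("ffmpeg")
    if exe is None:
        raise FileNotFoundError("FFmpeg not found on PATH")
    return [
        exe, "-y",              # overwrite the output file if it exists
        "-i", video_path,       # input 0: video
        "-i", audio_path,       # input 1: audio
        "-map", "0:v:0",        # take the first video stream from input 0
        "-map", "1:a:0",        # take the first audio stream from input 1
        "-c:v", "copy",         # keep the video stream as-is
        "-c:a", "aac",          # encode audio to AAC for the MP4 container
        "-shortest",            # stop at the end of the shorter input
        output_path,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True). Checking for FFmpeg up front mirrors the "when FFmpeg is available" condition above.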
- **Visual route:** ImageGen keyframes use the same clean 512x512, one-step route as the asset gallery; VideoGen then interpolates and stabilizes the timeline.
- **Audio route:** audio_text="auto" creates prompt-aware narration from the actual scene and camera motion; audio_path muxes an existing track.
- **SFX timeline:** sfx_prompt and inferred ambience cues are stored in metadata for future sound-effect fine-tuning; the speech synthesizer is not presented as a sound-effects model.
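The visual route turns a handful of keyframes into a full frame timeline. This section does not specify how VideoGen interpolates and stabilizes. As a simplified illustration, the sketch below assumes evenly spaced keyframes and plain linear cross-fading. For each output frame, it computes which two keyframes that frame blends and the weight given to the second. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def blend_schedule(num_keyframes: int, num_frames: int) -> List[Tuple[int, int, float]]:
    """For each output frame, return (left_keyframe, right_keyframe, weight_of_right).

    Assumes keyframes are spaced evenly from the first to the last output frame.
    """
    if num_keyframes < 1 or num_frames < 1:
        raise ValueError("need at least one keyframe and one frame")
    if num_keyframes == 1 or num_frames == 1:
        return [(0, 0, 0.0)] * num_frames
    schedule = []
    for frame in range(num_frames):
        # Position of this frame on the keyframe axis, in [0, num_keyframes - 1].
        pos = frame * (num_keyframes - 1) / (num_frames - 1)
        left = min(int(pos), num_keyframes - 2)
        schedule.append((left, left + 1, pos - left))
    return schedule


def lerp(a: Sequence[float], b: Sequence[float], weight: float) -> List[float]:
    """Linear cross-fade between two flattened frames."""
    return [(1 - weight) * x + weight * y for x, y in zip(a, b)]


print(blend_schedule(3, 5))
```

A real video pipeline would typically use motion-aware interpolation instead of cross-fading. The schedule only shows how a few anchors map onto many frames.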
With align_audio=True, VideoGen resolves or synthesizes audio before interpolation, expands effective_seconds to match the audio duration, and computes the final frame count from effective_seconds * fps. The composer backend also raises planned_keyframes from the aligned duration through keyframe_interval_seconds, so long narration or input audio gets multiple visual anchors instead of one stretched frame sequence. With condition_on_audio=True, input audio is routed through listening/encoding first and its summary conditions the visual storyboard.
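The alignment rules in the previous paragraph reduce to a few lines of arithmetic. A minimal sketch follows, based on three assumptions: "expands" means the timeline never shrinks below the requested length, frame counts are rounded up, and one keyframe is planned per started keyframe_interval_seconds window. The parameter names mirror the documented ones, but the function itself is hypothetical, and VideoGen's exact rounding may differ.

```python
import math
from typing import Tuple


def align_timeline(
    requested_seconds: float,
    audio_seconds: float,
    fps: float,
    planned_keyframes: int,
    keyframe_interval_seconds: float,
) -> Tuple[float, int, int]:
    """Return (effective_seconds, frame_count, planned_keyframes) after audio alignment."""
    if fps <= 0 or keyframe_interval_seconds <= 0:
        raise ValueError("fps and keyframe_interval_seconds must be positive")
    # Expand (never shrink) the clip so the audio is not cut off.
    effective_seconds = max(requested_seconds, audio_seconds)
    # Final frame count comes from effective_seconds * fps.
    frame_count = math.ceil(effective_seconds * fps)
    # Raise (never lower) the keyframe plan so long audio gets several visual anchors.
    needed = math.ceil(effective_seconds / keyframe_interval_seconds)
    return effective_seconds, frame_count, max(planned_keyframes, needed)


print(align_timeline(4.0, 9.5, 8, 2, 3.0))
```

Under these assumptions, a 4-second request with 9.5 seconds of narration at 8 fps becomes a 9.5-second clip of 76 frames, and a 3-second keyframe interval raises the plan from 2 to 4 keyframes.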
<p>
Final example output generated through <code>model.generate_video(...)</code>. The GIF is included for model-card preview; the MP4 includes the aligned AAC audio stream for local playback.
</p>
<img src="examples/video_showcase_ad.gif" alt="Phillnet-2 generated product video GIF" style="border-radius: 15px;">
<p>
Files: <a href="examples/video_showcase_ad.gif"><code>examples/video_showcase_ad.gif</code></a>, <a href="examples/video_showcase_ad.mp4"><code>examples/video_showcase_ad.mp4</code></a>, and the generated narration stem <a href="examples/video_showcase_ad.wav"><code>examples/video_showcase_ad.wav</code></a>.
</p>
</div>
```python
speech = model.synthesize_speech(
    "The multimodal route can generate clean image keyframes, compose a short video timeline, synthesize narration, and mux audio into the final MP4.",
    output_path="outputs/speech_multimodal_surface.wav",
)
print(speech.metadata)
```
Audio examples generated through model.synthesize_speech(...):

- examples/speech_identity.wav
- examples/speech_multimodal_surface.wav
Whisper listening assets are packaged under Audio/models/Phillnet-2-Whisper-Large-V3-Turbo.
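The standard-library wave module is enough to check how long a narration stem is, which is the duration align_audio stretches the video to. The sketch assumes the files are uncompressed PCM WAV, since the wave module cannot read compressed formats. The wav_info name is illustrative.

```python
import wave
from typing import Tuple


def wav_info(path: str) -> Tuple[float, int, int]:
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels
```

For example, wav_info("examples/speech_identity.wav") reports the stem's length, which can be compared with the video's effective_seconds in its metadata.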
<h3>PhillNet-2 Benchmark Visuals</h3>
<p>
Release-readiness benchmark visuals for the experimental AXIOM multimodal GPT-OSS runtime. These charts summarize local smoke validation and bounded <code>lm-evaluation-harness</code> slices.
</p>

<p>High-level benchmark panel for quick model-card scanning.</p>

<p>Local release-readiness smoke results. These are not SOTA or official leaderboard claims.</p>

<p>Bounded <code>lm-evaluation-harness</code> slice using primary metrics per task.</p>

<p>Visual status card for text, code, image, video, audio, and multimodal route validation.</p>

<p>Actual one-shot website artifact from <code>test_oneshot_website.py</code>, rendered from <code>examples/oneshot_website_demo.html</code>.</p>