Phillnet-2 API & Inference Endpoint

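```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Ayjays132/phillnet-2"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.set_tokenizer(tokenizer)  # provided by the repo's remote code, not standard Transformers
model.eval()
```

Once loaded, the same model object exposes text generation (`generate`, `generate_with_preset`, `deliberate_generate`), image generation, video generation (`generate_video`), and speech synthesis (`synthesize_speech`).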

To load from a local copy of the repository instead (falling back to float32 on CPU):

```python
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = Path(".")  # directory containing the downloaded model files

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
)
model.set_tokenizer(tokenizer)
model.eval()
```

The user-facing model wrapper routes image generation to the packaged diffusion path by default. A plain call uses the clean public preset: 512x512, 1 step, `guidance_scale=0.0`, `quality_strength=1.2`, `latent_refiner_strength=0.0`, and `use_memory=False` for stable image conditioning.
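A minimal sketch, assuming you want to keep the documented public image preset in one place and override only what you need. The values follow the default showcase row of the image preset table in this card; the helper name `image_kwargs` is hypothetical, and the name of the image-generation method you would pass these keyword arguments to is not given in this section.

```python
# Default public image preset, as listed in this model card.
DEFAULT_IMAGE_PRESET = {
    "height": 512,
    "width": 512,
    "steps": 1,
    "guidance_scale": 0.0,
    "quality_strength": 1.2,
    "latent_refiner_strength": 0.0,
    "use_memory": False,
    "image_quality_polish": False,
}


def image_kwargs(**overrides):
    """Return the public preset with selected keys overridden (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_IMAGE_PRESET)
    if unknown:
        # Catch typos early instead of silently passing unknown kwargs through.
        raise KeyError(f"unknown preset keys: {sorted(unknown)}")
    # Build a new dict so the shared default preset is never mutated.
    return {**DEFAULT_IMAGE_PRESET, **overrides}


# Example: keep the one-step route but request a larger canvas.
kwargs = image_kwargs(height=768, width=768)
print(kwargs)
```

Per the card, sizes above 512x512 can work but cost more memory, so overriding `height`/`width` is the main knob worth touching here.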

</div>

```python
# JSON preset with lightweight repair/checking
payload = model.generate_with_preset(
    "Return valid JSON with keys: model, status, strengths.",
    preset="json",
)
```
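How the `json` preset repairs and checks its output is not documented in this section. As a rough client-side analogue, assuming you have the raw generated text, the sketch below extracts the outermost `{...}` span (which also discards code fences and chatty prefixes), parses it with the standard `json` module, and checks for the requested keys. `check_json_payload` is a hypothetical helper, and the sample reply is made up.

```python
import json


def check_json_payload(text, required_keys):
    """Extract, parse, and key-check a JSON object from raw model text.

    Hypothetical helper: a client-side sketch, not the preset's own logic.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in output")
    # Slicing to the outermost braces drops any surrounding prose or fences.
    obj = json.loads(text[start : end + 1])
    missing = [key for key in required_keys if key not in obj]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return obj


# Made-up model reply with extra text around the JSON object.
raw = 'Sure! {"model": "Phillnet-2", "status": "ok", "strengths": ["json"]} Done.'
parsed = check_json_payload(raw, ["model", "status", "strengths"])
print(parsed)
```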

```python
# Deliberate mode for harder planning prompts
answer = model.deliberate_generate(
    "Plan a small web game project with files, milestones, and test checks.",
    preset="planning",
    passes=2,
)
```

For best results, ask for one file or one feature at a time, run the generated code, then feed the error or desired refinement back into the model/tooling loop.
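The run-and-feed-back workflow recommended above can be scripted. A minimal sketch, assuming `generate_fn` is any callable that maps a prompt string to Python source (for example, a thin wrapper around `model.generate_with_preset` or `model.deliberate_generate`, whose return types are not documented here). `refine_code` and the follow-up prompt wording are hypothetical. The candidate code runs in a separate Python process, which is not a sandbox.

```python
import subprocess
import sys


def refine_code(generate_fn, task, max_rounds=3, timeout=30):
    """Generate code, run it, and feed errors back until it exits cleanly.

    Hypothetical helper; returns (last_source, succeeded).
    subprocess.TimeoutExpired propagates if a candidate hangs past `timeout`.
    """
    prompt = task
    source = ""
    for _ in range(max_rounds):
        source = generate_fn(prompt)
        # Execute the candidate in a fresh interpreter and capture its stderr.
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        if proc.returncode == 0:
            return source, True
        # Feed the failing code and its error back for the next attempt.
        prompt = (
            f"{task}\n\nThis code failed:\n{source}\n\n"
            f"Error:\n{proc.stderr.strip()}\n\nReturn a corrected version."
        )
    return source, False
```

Keeping one task per loop mirrors the card's advice: a single file or feature produces short, specific error messages that are easier for the model to act on.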

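```python
# Plain text generation; the prompt below is only a placeholder.
prompt = "Introduce yourself in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    use_cache=True,
)

# Decode only the newly generated tokens (skip the echoed prompt).
print(tokenizer.decode(ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

Example output is saved at `examples/text_identity_output.txt`.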
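The decode call slices at `inputs.input_ids.shape[1]` because, for decoder-only models, `generate` returns the prompt tokens followed by the continuation. For batched prompts, Transformers recommends left padding (`tokenizer.padding_side = "left"`) so the model does not continue from pad tokens; every row's prompt then ends at the same column and one slice recovers each continuation. The sketch uses plain lists of made-up token ids, and `strip_prompt` is a hypothetical helper.

```python
PAD = 0  # made-up pad token id


def strip_prompt(batch_output, prompt_width):
    """Drop the (left-padded) prompt columns from each generated row."""
    return [row[prompt_width:] for row in batch_output]


# Two prompts of different lengths, left-padded to a common width of 4.
batch_inputs = [
    [PAD, PAD, 15, 16],
    [21, 22, 23, 24],
]
# Shape of a generate() result: the same prompt columns plus 2 new tokens per row.
batch_output = [
    [PAD, PAD, 15, 16, 31, 32],
    [21, 22, 23, 24, 41, 42],
]
new_tokens = strip_prompt(batch_output, prompt_width=len(batch_inputs[0]))
print(new_tokens)  # [[31, 32], [41, 42]]
```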

```python
print(result.metadata)  # `result` is the object returned by the image-generation call
```

The image route uses the packaged local ImageGen runtime under the hood, resolved relative to the loaded model repository instead of the Transformers dynamic-module cache. The public route uses the same fused path as the base ImageGen project: local Phillnet adapter conditioning plus the packaged SDXL Turbo text encoders/tokenizers, local UNet/VAE weights, and the SDXL Turbo scheduler config. For Turbo-style sampling, `guidance_scale=0.0` is used and negative prompts are not required. The final public preset uses the clean one-step route with the extra image-polish pass disabled to avoid subtle latent texture noise.

| Preset | Call settings | Use case |
| --- | --- | --- |
| Default showcase | `steps=1`, `guidance_scale=0.0`, `quality_strength=1.2`, `latent_refiner_strength=0.0`, `use_memory=False`, `image_quality_polish=False` | Clean public examples and product-style images |
| Experimental polish | Optional multi-step or polish passes | Manual experiments only; not the packaged public preset |
| Stable sizing | `height=512`, `width=512` | Most reliable route; larger outputs can work but cost more memory |

For sharper images, prefer better prompt structure over heavy guidance: subject, material, lighting, composition, and background. The route already keeps the UNet canvas at a stable internal size and then returns the requested output size.
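The prompt structure suggested for sharper images (subject, material, lighting, composition, background) can be assembled mechanically. A minimal sketch; the comma-joined ordering is one reasonable convention rather than a documented requirement, and `build_image_prompt` is a hypothetical helper.

```python
def build_image_prompt(subject, material="", lighting="", composition="", background=""):
    """Join the recommended prompt parts in order, skipping empty ones (hypothetical helper)."""
    parts = [subject, material, lighting, composition, background]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_image_prompt(
    subject="a ceramic coffee mug",
    material="matte glaze",
    lighting="soft studio lighting",
    composition="centered product shot",
    background="plain white background",
)
print(prompt)
```

Because the public preset uses `guidance_scale=0.0`, the prompt text is the main lever on image content, which is why spelling out each part explicitly tends to pay off more than tuning guidance.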

```python
print(video.metadata)  # `video` is the object returned by model.generate_video(...)
```

Optional sound is routed through the same package: pass `audio_path="examples/speech_identity.wav"` to mux an existing track, or pass `audio_text="..."` to synthesize narration through `model.synthesize_speech(...)` and attach it to the MP4 when FFmpeg is available.

- Visual route: ImageGen keyframes use the same clean 512x512, one-step route as the asset gallery; VideoGen then interpolates and stabilizes the timeline.
- Audio route: `audio_text="auto"` creates prompt-aware narration from the actual scene and camera motion; `audio_path` muxes an existing track.
- SFX timeline: `sfx_prompt` and inferred ambience cues are stored in the metadata for future sound-effect fine-tuning; the speech synthesizer is not presented as an SFX model.

With `align_audio=True`, VideoGen resolves or synthesizes the audio before interpolation, extends `effective_seconds` to match the audio duration, and computes the final frame count as `effective_seconds * fps`. The composer backend also raises `planned_keyframes` from the aligned duration using `keyframe_interval_seconds`, so long narration or input audio gets multiple visual anchors instead of one stretched frame sequence. With `condition_on_audio=True`, input audio is first routed through listening/encoding, and its summary conditions the visual storyboard.
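The audio-alignment arithmetic described above (`effective_seconds * fps` for frames, with `planned_keyframes` raised via `keyframe_interval_seconds`) can be sketched as follows. Assumptions, not confirmed by the card: `effective_seconds` only grows to cover the audio, the frame count is rounded to the nearest integer, and keyframes use ceiling division. The function name `plan_timeline` and the 3.0 s default interval are hypothetical, not VideoGen's actual values.

```python
import math


def plan_timeline(seconds, fps, audio_seconds=None, planned_keyframes=1,
                  keyframe_interval_seconds=3.0):
    """Sketch of audio-aligned timeline planning (hypothetical helper and defaults)."""
    effective_seconds = seconds
    if audio_seconds is not None:
        # align_audio=True: extend the clip so it covers the whole audio track.
        effective_seconds = max(seconds, audio_seconds)
    frame_count = round(effective_seconds * fps)
    # Raise the keyframe budget so long audio gets several visual anchors.
    needed = math.ceil(effective_seconds / keyframe_interval_seconds)
    keyframes = max(planned_keyframes, needed)
    return effective_seconds, frame_count, keyframes


# A 4 s request with 9.5 s of narration at 8 fps.
aligned = plan_timeline(seconds=4, fps=8, audio_seconds=9.5, planned_keyframes=2)
print(aligned)  # (9.5, 76, 4)
```

Without alignment the same request stays at 4 s and 32 frames; with it, the 9.5 s narration stretches the clip to 76 frames spread across 4 keyframes instead of 2.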

<p>
    Final example output generated through <code>model.generate_video(...)</code>. The GIF is included for model-card preview; the MP4 includes the aligned AAC audio stream for local playback.
  </p>
  <img src="examples/video_showcase_ad.gif" alt="Phillnet-2 generated product video GIF" style="border-radius: 15px;">
  <p>
    Files: <a href="examples/video_showcase_ad.gif"><code>examples/video_showcase_ad.gif</code></a>, <a href="examples/video_showcase_ad.mp4"><code>examples/video_showcase_ad.mp4</code></a>, and the generated narration stem <a href="examples/video_showcase_ad.wav"><code>examples/video_showcase_ad.wav</code></a>.
  </p>
</div>
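The MP4 above carries an AAC audio stream that is muxed in when FFmpeg is available. The card does not show the exact command VideoGen runs; a minimal sketch of an equivalent standalone mux, assuming `ffmpeg` is on `PATH`, copies the video stream unchanged and encodes the narration WAV to AAC. `build_mux_command` is a hypothetical helper, and the `outputs/...` paths are illustrative.

```python
import shlex


def build_mux_command(video_path, audio_path, output_path):
    """Return an ffmpeg argv that muxes a WAV track into an MP4 (hypothetical helper)."""
    return [
        "ffmpeg", "-y",                    # overwrite the output without prompting
        "-i", video_path,                  # input 0: video-only MP4
        "-i", audio_path,                  # input 1: narration WAV
        "-map", "0:v:0", "-map", "1:a:0",  # take video from input 0, audio from input 1
        "-c:v", "copy",                    # keep the encoded video frames as-is
        "-c:a", "aac",                     # encode audio to AAC for MP4 playback
        "-shortest",                       # stop at the end of the shorter stream
        output_path,
    ]


cmd = build_mux_command(
    "outputs/video_only.mp4",
    "outputs/narration.wav",
    "outputs/video_with_audio.mp4",
)
# Run with subprocess.run(cmd, check=True) once the input files exist.
print(shlex.join(cmd))
```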

```python
speech = model.synthesize_speech(
    "The multimodal route can generate clean image keyframes, compose a short video timeline, synthesize narration, and mux audio into the final MP4.",
    output_path="outputs/speech_multimodal_surface.wav",
)
print(speech.metadata)
```

Audio examples generated through `model.synthesize_speech(...)`:

- `examples/speech_identity.wav`
- `examples/speech_multimodal_surface.wav`

Whisper listening assets are packaged under `Audio/models/Phillnet-2-Whisper-Large-V3-Turbo`.
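Because `align_audio=True` sizes the video from the audio duration, it can help to check a narration file's length before generating. A minimal sketch using Python's standard `wave` module, assuming the file is uncompressed PCM WAV (the card does not state the format `synthesize_speech` writes). `wav_duration_seconds` is a hypothetical helper; the self-check writes a synthetic silent file rather than reading the packaged examples.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Self-check on a synthetic file: 16000 silent mono frames at 8 kHz = 2.0 s.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "silence.wav")
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(8000)
        out.writeframes(b"\x00\x00" * 16000)
    demo_seconds = wav_duration_seconds(demo_path)

print(demo_seconds)  # 2.0
```

For a packaged example, the same call would be `wav_duration_seconds("examples/speech_identity.wav")`.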

<h3>PhillNet-2 Benchmark Visuals</h3>
  <p>
    Release-readiness benchmark visuals for the experimental AXIOM multimodal GPT-OSS runtime. These charts summarize local smoke validation and bounded <code>lm-evaluation-harness</code> slices.
  </p>

PhillNet-2 premium benchmark panel with headline scores

<p>High-level benchmark panel for quick model-card scanning.</p>

PhillNet-2 release readiness benchmark scorecard

<p>Local release-readiness smoke results. These are not SOTA or official leaderboard claims.</p>

PhillNet-2 lm-evaluation-harness benchmark slice

<p>Bounded <code>lm-evaluation-harness</code> slice using primary metrics per task.</p>

PhillNet-2 multimodal runtime validation table

<p>Visual status card for text, code, image, video, audio, and multimodal route validation.</p>

PhillNet-2 actual one-shot website artifact

<p>Actual one-shot website artifact from <code>test_oneshot_website.py</code>, rendered from <code>examples/oneshot_website_demo.html</code>.</p>

</div>

</div>
