MultivexAI/Plyx-15M API & Inference Endpoint

Pre-training Data

The model was trained on approximately 600M tokens, drawn from a curated mix of data chosen to build a solid foundation:

fineweb-pro: A heavily filtered and refined version of the FineWeb dataset. This provides a strong base in general-purpose language by removing significant noise and low-quality content.
fineweb-edu: A subset of FineWeb containing educational and instructional content, used to ground the model in well-structured, factual information.
finepdfs: A large collection of documents from PDFs, including professional reports and technical papers. This component introduces the model to more formal language, complex sentence structures, and data-rich formats.

A Note on Size and Performance

To set the right expectations: Plyx-15M is a 15-million-parameter model, which is quite small. Its performance won't be comparable to models with billions of parameters. It's best used for research, highly specific tasks, or as a base for fine-tuning - not as a drop-in replacement for a large, general-purpose model.
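To put the scale in concrete terms, here is a rough back-of-the-envelope sketch in Python. It assumes the rounded figures from this card (15 million parameters, about 600M training tokens) are exact, which they are not, and the helper names are illustrative only. The card does not state the precision the weights are stored in, so both fp32 and fp16 are shown as assumptions.

```python
# Rough scale estimates for Plyx-15M, using the rounded figures from this card.
PARAMS = 15_000_000          # "15-million-parameter model" (rounded)
TRAIN_TOKENS = 600_000_000   # "approximately 600M tokens" (rounded)


def tokens_per_parameter(tokens: int, params: int) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def weight_size_mb(params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (10^6 bytes)."""
    return params * bytes_per_param / 1_000_000


if __name__ == "__main__":
    ratio = tokens_per_parameter(TRAIN_TOKENS, PARAMS)
    print(f"tokens per parameter: {ratio:.0f}")
    # Storage precision is an assumption; the card does not specify it.
    print(f"fp32 weights: ~{weight_size_mb(PARAMS, 4):.0f} MB")
    print(f"fp16 weights: ~{weight_size_mb(PARAMS, 2):.0f} MB")
```

Under these assumptions the model saw roughly 40 training tokens per parameter, and its raw weights occupy tens of megabytes rather than the gigabytes typical of billion-parameter models, which is why it suits research and fine-tuning experiments on modest hardware.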

Limitations

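Like any language model, Plyx-15M has biases and limitations, and users should account for them. Its training data comes from web text and PDFs, so its output can reflect biases present in those sources.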

License

The data used for pre-training (fineweb-pro, fineweb-edu, and finepdfs) is derived from sources made available under the ODC-By 1.0 license. Users must also abide by the CommonCrawl Terms of Use. We do not alter the license of any of the underlying data.
