License: apache-2.0

Model Summary

Can LLMs learn from 1,000 in-context examples?

Introducing MachineLearningLM 🧪📊, a model built by continued pretraining on millions of synthetic tabular ML tasks, enabling robust many-shot in-context learning.

📈 Scales from 8 to 1,024 examples

📈 ~15% improvement on unseen tabular tasks compared to o3-mini / GPT-5-mini / Qwen-2.5-7B-Instruct

🌲 Random-Forest–level numerical modeling robustness

🧠 MMLU score: 75.4%

📄 Read the paper: https://huggingface.co/papers/2509.06806

GitHub: https://github.com/HaoAreYuDong/MachineLearningLM

Evaluation and Validation

We have developed an automated evaluation framework: configure the parameters, then run validation and evaluation. The code is open-sourced in our GitHub repository.

Quick Start

```bash
pip install -r requirements.txt
python ./src/evaluation/model_pred/dl_model_pred.py \
  --input_dir ./demo_input.jsonl \
  --output_dir ./demo_output.jsonl \
  --model_name MachineLearningLM/MachineLearningLM-7B-v1
```

Pipeline

```bash
# Edit evaluate_parameters.sh, then load it
source evaluate_parameters.sh

# Option 1: End-to-End Pipeline
./scripts/evaluate_pipeline.sh

# Option 2: Parallel Processing
./scripts/multi_process/data_prep.sh
./scripts/multi_process/prompt_gen.sh # For deep learning only
./scripts/multi_process/model_pred.sh
./scripts/multi_process/evaluation.sh
./scripts/multi_process/report.sh

# Option 3: Sequential Processing
./scripts/single_process/data_prep.sh
./scripts/single_process/prompt_gen.sh # For deep learning only
./scripts/single_process/model_pred.sh
./scripts/single_process/evaluation.sh
./scripts/single_process/report.sh
```

For more usage details, please visit our GitHub.

Quantized Checkpoints

https://huggingface.co/QuantFactory/MachineLearningLM-7B-v1-GGUF

Tabicl Evaluation

This part of the code needs to run in an environment with the tabicl and openpyxl libraries installed.

The evaluation code for tabicl lives in a separate file, ./src/evaluation/tabicl_evaluate.py. Run ./scripts/tabicl_evaluate.sh to obtain the evaluation results for tabicl.

Use --datasets to specify the datasets to be evaluated, and --sample_sizes to indicate the number of shots.

To evaluate multiple datasets, separate their names with spaces. To evaluate all CSV files in the input folder, pass all as the dataset name.
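As an illustration of the space-separated convention described above, here is a minimal argparse sketch. It is not the code from tabicl_evaluate.py. The flag names --datasets and --sample_sizes come from this README; the defaults, the helper resolve_datasets, the csv_dir argument, and the assumption that --sample_sizes also accepts several space-separated integers are all hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    """Hypothetical parser mirroring the two flags named in this README."""
    parser = argparse.ArgumentParser(description="Illustrative sketch only")
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--datasets", nargs="+", default=["all"])
    # Assumption: several shot counts may be given, each parsed as an int.
    parser.add_argument("--sample_sizes", nargs="+", type=int, default=[8])
    return parser


def resolve_datasets(names, csv_dir):
    """Expand the special value 'all' to every CSV file stem in csv_dir."""
    if list(names) == ["all"]:
        return sorted(path.stem for path in Path(csv_dir).glob("*.csv"))
    return list(names)
```

With this sketch, passing `--datasets iris wine --sample_sizes 8 64` yields the lists `["iris", "wine"]` and `[8, 64]`; the dataset names here are placeholders.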

Prior Data

MachineLearningLM uses the code from tabicl to generate prior data.

Use ./scripts/generate_data.sh to generate the prior data. It generates the corresponding .pt and .csv files, and normalizes the feature values in the CSV files to the range of 0–999, as we did in the paper.
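To make the 0–999 normalization concrete, the sketch below rescales one numeric column. It assumes per-column min-max scaling followed by rounding to integers; the README only states the target range, so the exact method used by generate_data.sh may differ. The function name scale_to_0_999 is hypothetical.

```python
def scale_to_0_999(values):
    """Min-max scale one numeric column to integers in [0, 999].

    Assumption: min-max scaling with rounding; the real pipeline may
    use a different normalization.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to scale, so map to 0.
        return [0 for _ in values]
    return [round((v - lo) / (hi - lo) * 999) for v in values]
```

Integer values in a fixed range keep every feature compact and comparable in scale when rows are serialized as text, which is presumably why a bounded integer range was chosen.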

Parameter Introduction (refer to the comments in the file tabicl\src\tabicl\prior\dataset.py)

Data Scale & Structure

| Parameter | Type | Description |
| --- | --- | --- |
| min_features | int | Minimum number of features per dataset |
| max_features | int | Maximum number of features per dataset |
| max_classes | int | Maximum number of target classes |
| min_seq_len | int | Minimum samples per dataset. Uses max_seq_len if None |
| max_seq_len | int | Maximum samples per dataset (exclusive) |

Batch Configuration

| Parameter | Type | Description |
| --- | --- | --- |
| batch_size | int | Total number of datasets to generate per batch |
| batch_size_per_gp | int | Number of datasets per group (shared characteristics) |
| batch_size_per_subgp | int | Number of datasets per subgroup (similar causal structures). Defaults to batch_size_per_gp if None |
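The relationship between the three batch parameters can be sketched as a nested partition of dataset indices. This is an illustration under assumptions, not tabicl's implementation: the function partition_batch is hypothetical, and it assumes groups and subgroups are contiguous index ranges.

```python
def partition_batch(batch_size, batch_size_per_gp, batch_size_per_subgp=None):
    """Split dataset indices 0..batch_size-1 into groups, then subgroups.

    Hypothetical helper: assumes contiguous ranges. As in the table,
    batch_size_per_subgp falls back to batch_size_per_gp when None.
    """
    if batch_size_per_subgp is None:
        batch_size_per_subgp = batch_size_per_gp
    if min(batch_size, batch_size_per_gp, batch_size_per_subgp) < 1:
        raise ValueError("all sizes must be positive integers")

    groups = []
    for g_start in range(0, batch_size, batch_size_per_gp):
        g_end = min(g_start + batch_size_per_gp, batch_size)
        # Each subgroup is a slice of its parent group.
        subgroups = [
            list(range(s, min(s + batch_size_per_subgp, g_end)))
            for s in range(g_start, g_end, batch_size_per_subgp)
        ]
        groups.append(subgroups)
    return groups
```

For example, `partition_batch(8, 4, 2)` returns `[[[0, 1], [2, 3]], [[4, 5], [6, 7]]]`: two groups of four datasets, each split into two subgroups of two.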

Sequence Length Control

| Parameter | Type | Description |
| --- | --- | --- |
| log_seq_len | bool | Sample sequence length from log-uniform distribution if True |
| seq_len_per_gp | bool | Sample sequence length per group (enables variable-sized datasets) |
| replay_small | bool | Occasionally sample smaller sequences for model robustness |
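The following sketch shows one way log-uniform sequence-length sampling could work, combining log_seq_len with the min_seq_len and max_seq_len bounds from the Data Scale & Structure table. It is an assumption-based illustration, not the tabicl code: the function sample_seq_len and its rng argument are hypothetical, and the upper bound is treated as exclusive as that table states.

```python
import math
import random


def sample_seq_len(min_seq_len, max_seq_len, log_seq_len=True, rng=None):
    """Draw a sequence length in [min_seq_len, max_seq_len).

    Hypothetical helper. If min_seq_len is None, max_seq_len is used
    directly, as the parameter table describes.
    """
    rng = rng or random.Random()
    if min_seq_len is None:
        return max_seq_len
    if log_seq_len:
        # Uniform in log space: short lengths are drawn more often
        # than they would be under a plain uniform distribution.
        u = rng.uniform(math.log(min_seq_len), math.log(max_seq_len))
        n = int(math.exp(u))
    else:
        n = rng.randrange(min_seq_len, max_seq_len)
    # Clamp to guard against floating-point drift at either bound.
    return min(max(n, min_seq_len), max_seq_len - 1)
```

Sampling in log space spreads draws evenly across orders of magnitude, so a range such as 8 to 1,024 produces many short datasets alongside occasional long ones.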

Train-Test Split

| Parameter | Type | Description |
| --- | --- | --- |
| min_train_size | int/float | Start position/ratio for train split (int: absolute, float: fractional) |
| max_train_size | int/float | End position/ratio for train split (int: absolute, float: fractional) |
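The int/float convention for the split parameters can be illustrated as follows. This is a sketch under assumptions rather than the tabicl implementation: the helpers resolve_position and sample_train_size are hypothetical, and it assumes the train size is drawn uniformly between the two resolved bounds while leaving at least one test sample.

```python
import random


def resolve_position(value, seq_len):
    """int -> absolute sample count; float -> fraction of seq_len."""
    if isinstance(value, bool):
        raise TypeError("train size must be int or float, not bool")
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractional train size must be in [0, 1]")
        return int(seq_len * value)
    raise TypeError("train size must be int or float")


def sample_train_size(min_train_size, max_train_size, seq_len, rng=None):
    """Pick a train/test split position between the two bounds.

    Assumption: uniform sampling, with at least one train sample and
    at least one test sample.
    """
    rng = rng or random.Random()
    lo = resolve_position(min_train_size, seq_len)
    hi = resolve_position(max_train_size, seq_len)
    lo = max(1, min(lo, seq_len - 1))
    hi = max(lo, min(hi, seq_len - 1))
    return rng.randint(lo, hi)
```

For a dataset of 1,000 samples, `min_train_size=0.25` and `max_train_size=0.75` resolve to positions 250 and 750, while integer arguments such as `128` are taken as absolute positions.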

Generation Method

| Parameter | Type | Description |
| --- | --- | --- |
| prior_type | str | Prior type: 'mlp_scm', 'tree_scm', or 'mix_scm' (random selection) |
| fixed_hp | dict | Fixed structural configuration parameters |
| sampled_hp | dict | Parameters sampled during generation |

Computation Settings

| Parameter | Type | Description |
| --- | --- | --- |
| n_jobs | int | Number of parallel jobs (-1 = use all processors) |
| num_threads_per_generate | int | Number of threads per generation job |
| device | str | Computation device ('cpu' or 'cuda') |

Train

MachineLearningLM uses the LLaMA-Factory framework for training.

Training Environment Configuration

```bash
cd ./third_party/LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
pip install wandb
```

Use ./scripts/train.sh for training.

Project Structure

```markdown
MachineLearningLM/
├── src/
│   ├── evaluation/
│   │   ├── data_prep/          # Data preprocessing and chunking utilities
│   │   ├── prompt_gen/         # Prompt generation for deep learning models
│   │   ├── model_pred/         # Model inference (ML and DL prediction engines)
│   │   ├── result_proc/        # 5-layer evaluation architecture and metrics processing
│   │   ├── zero_summary/       # Result summarization and report generation
│   │   └── tabicl_evaluate.py
│   └── prior_data/
│       └── pt_to_csv.py
├── scripts/
│   ├── single_process/         # Sequential execution shell scripts
│   ├── multi_process/          # Parallel execution shell scripts (with _mp suffix)
│   ├── evaluate_parameters.sh  # Global parameter configuration
│   ├── evaluate_pipeline.sh    # Automated pipeline
│   ├── generate_data.sh
│   ├── tabicl_evaluate.sh
│   └── train.sh
├── datahub_inputs/
│   ├── data_demo/              # Demo datasets for testing
│   └── data_raw/               # Raw input datasets
├── third_party/
│   ├── tabicl/
│   └── LLaMA-Factory/
├── requirements.txt            # Python dependencies for Evaluation Framework
├── README.md
├── README_zh.md
├── THIRD_PARTY_NOTICES.md
└── LICENSE
```

Model provider: ESHMO-AI

Model tree: Qwen/Qwen2.5-7B-Instruct (base) → this model (fine-tuned)

Modalities: text input, text output