numind

numind

NuMarkdown-8B-Thinking

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more
Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: mit

Results

NuMarkdown-8B-Thinking is outperforming generic non-reasoning models like GPT-4o and specialized OCR models like OCRFlux. It is competitive against large reasoning closed-source models like Gemini 2.5.

Table
RankModelμσμ − 3σ
🥇 1gemini-flash-reasoning26.750.8024.35
🥈 2NuMarkdown-reasoning26.100.7923.72
🥉 3NuMarkdown-reasoning-w/o_grpo25.320.8022.93
4OCRFlux-3B24.630.8022.22
5gpt-4o24.480.8022.08
6gemini-flash-w/o_reasoning24.110.7921.74
7RolmoOCR23.530.8221.07

We plan to realease a markdown arena, similar to llmArena, for complex document-to-markdown tasks to provide a tool to evaluate different solutions.

Win/Draw/Lose-rate against others models (image-only):

Training

  1. SFT: Single epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs.
  2. RL (GRPO): RL phase using a layout-centric reward with difficult image examples.

Example:

markdown

<think>
1. **Analyze the overall structure:** The document consists of two main sections, each containing a header, a set of bullet points, a title, a table, and a footer. The layout is consistent within each section but different between the two.
2. **Handle the first section header and bullet points:**
- The top left corner has "Generalitat de Catalunya Departament d'Educació Institut Gal·lecs". This is a consistent header and should be transcribed as a level 3 heading.
- The top right corner has "Curs 2021-22". This is a distinct header and should be transcribed as a level 3 heading.
- Below the header, there are four bullet points. These are standard list items. I will use Markdown's unordered list syntax (`- `).
3. **Process the first section's main content - Table 1:**
- **Identify table structure:** This is a complex table with a clear header row: "MATERIAL", "TÍTOL", "AUTOR", "EDITORIAL", "ISBN".
- **Determine columns:** There are 5 columns.
- **Handle merged cells:**
- "Llengua Catalana" spans two rows.
- "Llengua Castellana" spans three rows.
- "Anglès" spans two rows.
- **Populate table data:** Transcribe all text, including multi-line content within cells (e.g., "Babulinka Books\nEditorialCol·lecció: Pettes\nJoies per a Grans Lectors.\n2013").
- **Markdown table creation:** Use standard Markdown table syntax, ensuring correct alignment and handling of multi-line content within cells.
4. **Handle the first section's footer:**
- The footer contains "Codí: 04mp02", "Responsable: Coordinador Qualitat", "Versió: 5", "Full d'Informació a l'alumnat i famílies", "Aquest document pot quedar obsolet una vegada imprès", and "Pàgina 1 de 2". This is a simple text block at the bottom. I will transcribe it as a separate paragraph or list, maintaining the original layout if possible, but a simple paragraph is sufficient for a footer.
5. **Handle the second section header and bullet points:**
- The top left corner has "Generalitat de Catalunya Departament d'Educació Institut Gal·lecs". This is a consistent header and should be transcribed as a level 3 heading.
- The top right corner has "Curs 2021-22". This is a distinct header and should be transcribed as a level 3 heading.
- Below the header, there are no bullet points, but there is a table.
6. **Process the second section's main content - Table 2:**
- **Identify table structure:** This is a simpler table with a clear header row: "MATERIAL", "TÍTOL", "AUTOR", "EDITORIAL", "ISBN".
- **Determine columns:** There are 5 columns.
- **Populate table data:** Transcribe all text, including multi-line content within cells (e.g., "Projecte MOTIMATS-ONMAT (llicència digital) Tekman Books\nES GESTIONA A TRAVÉS DE L'AMPA AL SETEMBRE").
- **Markdown table creation:** Use standard Markdown table syntax.
7. **Handle the second section's footer:**
- The footer contains "Codí: 04mp02", "Responsable: Coordinador Qualitat", "Versió: 5", "Full d'Informació a l'alumnat i famílies", "Aquest document pot quedar obsolet una vegada imprès", and "Pàgina 2 de 2". This is a simple text block at the bottom. I will transcribe it as a separate paragraph or list, maintaining the original layout if possible, but a simple paragraph is sufficient for a footer.
8. **Final review:** Check for any missed text, incorrect formatting, or issues with table alignment. Ensure all original content is present and accurately represented in GitHub-Flavored Markdown. The image placeholders are not needed as there are no images in the document.
</think>
<answer>
### Generalitat de Catalunya
### Departament d'Educació
### Institut Gal·lecs
### Curs 2021-22
- Els llibres de color blau indiquen que es manté respecte al curs anterior.
- Els llibres de color groc indiquen que es tracta d'un canvi per a aquest curs.
- Els llibres de color vermell indiquen que no s'han de comprar perquè van a càrrec del centre.
- Matèries sense llibre: Dossiers, llicències i el Kit de tecnologia s'hauran de pagar i de recollir a través de l'AMPA a partir de setembre.
# 1 ESO
| MATERIAL | TÍTOL | AUTOR | EDITORIAL | ISBN |
|---|---|---|---|---|
| Llengua Catalana | Punt Volat | | Castellnou (Didacta +) | 9788417803124 |
| | Duna, Diari d'un estiu. | Muriel Villanueva | Babulinka Books<br>EditorialCol·lecció: Pettes<br>Joies per a Grans Lectors.<br>2013 | 9788494159077 |
| | El nen que xatejava amb Jack Sparrow. | Francesc Puigpelat | Bromera<br>Col·lecció: L'Elefant. 2015 | 9788490264072 |
| Llengua Castellana | Proyecto Asterisco | | Castellnou (Didacta +) | 9788417803186 |
| | Manzanas rojas | Luis Matilla | Ed. Anaya | 978846673989 |
| | Fàbulas de Esopo | Jerry Pinkney | Vicens Vives | 978843671648 |
| Anglès | Think Ahead ESO 1. Student's book.<br>Think Ahead ESO 1. Workbook (cat). | | Burlington Books<br>Burlington Books | 9788925300662<br>9789925300686 |
Codí: 04mp02
Responsable: Coordinador Qualitat
Versió: 5
Full d'Informació a l'alumnat i famílies
Aquest document pot quedar obsolet una vegada imprès
Pàgina 1 de 2
### Generalitat de Catalunya
### Departament d'Educació
### Institut Gal·lecs
### Curs 2021-22
| MATERIAL | TÍTOL | AUTOR | EDITORIAL | ISBN |
|---|---|---|---|---|
| FRANCÈS | Nouvelle Génération A1-A2 | | Santillana | 9788490494745 |
| CIÈNCIES EXPERIMENTALS | Science Bits<br>ES GESTIONA A TRAVÉS DE L'AMPA AL SETEMBRE | | | 9788412213485 (llicència digital) |
| MATEMÀTIQUES | Projecte MOTIMATS-ONMAT (llicència digital) Tekman Books<br>ES GESTIONA A TRAVÉS DE L'AMPA AL SETEMBRE | | | |
| TECNOLOGIA | Tecnologia 1 ESO | TEIDE | | 9788430783175 |
| VISUAL I PLÀSTICA | SENSE LLIBRE-KIT DE MATERIAL | | | |
| CIÈNCIES SOCIALS | SENSE LLIBRE-dossier | | | |
Codí: 04mp02
Responsable: Coordinador Qualitat
Versió: 5
Full d'Informació a l'alumnat i famílies
Aquest document pot quedar obsolet una vegada imprès
Pàgina 2 de 2
</answer>

Quick start:

vLLM:

markdown

vllm serve numind/NuMarkdown-8B-Thinking --trust_remote_code --limit-mm-per-prompt image=1

python

from openai import OpenAI
import base64
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
def encode_image(image_path):
"""
Encode the image file to base64 string
"""
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
base64_image = encode_image("image.png")
data_url = f"data:image/jpeg;base64,{base64_image}"
chat_response = client.chat.completions.create(
model="numind/NuMarkdown-8B-Thinking",
temperature=0.7,
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": data_url},
"min_pixels": 100 * 28 * 28,
"max_pixels": 5000 * 28 * 28,
},
],
},
]
)
result = chat_response.choices[0].message.content
reasoning = result.split("<think>")[1].split("</think>")[0]
answer = result.split("<answer>")[1].split("</answer>")[0]
print(answer)

🤗 Transformers:

python

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
model_id = "numind/NuMarkdown-8B-reasoning"
processor = AutoProcessor.from_pretrained(
model_id,
trust_remote_code=True,
min_pixels=100*28*28, max_pixels=5000*28*28
)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
attn_implementation="flash_attention_2",
device_map="auto",
trust_remote_code=True,
)
img = Image.open("image.png").convert("RGB")
messages = [{
"role": "user",
"content": [
{"type": "image"},
],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_input = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
with torch.no_grad():
model_output = model.generate(**model_input, temperature = 0.7, max_new_tokens=5000)
result = processor.decode(model_output[0])
reasoning = result.split("<think>")[1].split("</think>")[0]
answer = result.split("<answer>")[1].split("</answer>")[0]
print(answer)

Model provider

numind

numind

Model tree

Base

Qwen/Qwen2.5-VL-7B-Instruct

Fine-tuned

this model

Modalities

Input

Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today