Uploading Datasets

This document explains how to upload datasets for fine-tuning. On Friendli, you can upload datasets via the web interface or the SDK.

You can easily upload datasets through the web interface. Files in .jsonl and .parquet formats are supported, and each dataset should be structured as follows:

Conversation

This is the most basic dataset format. The role field can be system, user, or assistant.

{"messages": [{"role": "...", "content": "..."}]}

Alpaca (Beta)

Two types of Alpaca datasets are supported as shown below.
For compatibility with the Conversation format, they are automatically converted according to a template during upload. If you do not want automatic conversion, please convert to the Conversation format before uploading, or use the SDK to upload.

{"instruction": "...", "output": "..."}
{"instruction": "...", "input": "...", "output": "..."}

Multi-Modal (Image)

For multi-modal inputs, the following three formats are supported for compatibility.
Currently, the web interface does not support local path, base64, or PIL.Image objects. For these cases, please use the SDK to upload.

{"messages": [{"role": "...", "content": [{"type": "text", "text": "..."}, {"type": "image", "image": "https://example.com/image.jpg"}]}]}
{"messages": [{"role": "...", "content": [{"type": "text", "text": "..."}, {"type": "image", "image_url": "https://example.com/image.jpg"}]}]}
{"messages": [{"role": "...", "content": [{"type": "text", "text": "..."}, {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}]}]}

How to Upload a Dataset

First, go to the ‘Datasets’ section in the Friendli Suite. Click the ‘New Dataset’ button to start the upload process.
From the dropdown, select ‘Upload a file directly’ option.

Click the File Upload Area in the Dataset file section, or drag and drop the file you want to upload. Then click the ‘Upload’ button to start uploading.

The dataset will be uploaded progressively in the background. Once the upload is complete, you can rename it, add splits, and preview each split.

Next Steps

Now that you have uploaded your dataset, you can proceed to fine-tune your model.