Endpoints are the actual deployments of your models on your specified GPU resource.
When an endpoint shows the TIGHT MEMORY warning, we recommend enabling Online Quantization or increasing the GPU count.
Charges begin while the endpoint is in the INITIALIZING status. Specifically, charges begin after the Initializing GPU phase, during which the endpoint waits to acquire the GPU.
The endpoint then downloads and loads the model onto the GPU, which usually takes less than a minute.
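As a rough illustration of the timing rule above, the sketch below estimates the billable time for a single endpoint run. It assumes that billing starts at the moment the GPU is acquired (the end of the Initializing GPU phase) and, as a further assumption not stated here, that it stops when the endpoint is terminated. The function name and the timestamps are hypothetical and are not part of any official API.

```python
from datetime import datetime, timedelta


def billable_duration(gpu_acquired_at: datetime, terminated_at: datetime) -> timedelta:
    """Return the time between GPU acquisition and termination.

    Assumption: time spent waiting for a GPU (the Initializing GPU phase)
    is not billed, so it is excluded by starting the clock at gpu_acquired_at.
    Assumption: billing ends at terminated_at.
    """
    if terminated_at < gpu_acquired_at:
        raise ValueError("terminated_at must not be earlier than gpu_acquired_at")
    return terminated_at - gpu_acquired_at


# Hypothetical timeline for one endpoint run.
requested_at = datetime(2024, 1, 1, 12, 0, 0)
gpu_acquired_at = requested_at + timedelta(minutes=3)  # waiting for a GPU: not billed
terminated_at = gpu_acquired_at + timedelta(hours=2)   # model loading and serving: billed

print(billable_duration(gpu_acquired_at, terminated_at))  # 2:00:00
```

Under these assumptions, the model download and load step counts toward the billed time, because it happens after the GPU has been acquired.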