# Basic usage of the inference module for image-based inference
In this tutorial, we use the inference module from urban-worm, which supports two frameworks for running MLLMs: Ollama (built on top of llama.cpp) and Llama.cpp. We showcase inference with single and multiple images using InternVL3.
Three types of output schemas will be demonstrated:
- plain text generation
- multiple questions with binary answers
- multiple choice
In [1]:
from urbanworm.inference.llama import InferenceLlamacpp, InferenceOllama
First, let's set up some schemas to define the output formats, along with prompts for the demonstration tasks.
In [2]:
# define the schema for the model output
# this is the default built-in schema for plain text generation
normal_format = {
    "questions": (str, ...),
    "answer": (str, ...),
}

# binary answers
bool_format = {
    "questions": (str, ...),
    "answer": (bool, ...),
}

# multiple choice
from typing import Literal

multiple_choice_format = {
    "questions": (str, ...),
    "answer": (Literal['occupied', 'unoccupied'], ...),
    "explanation": (str, ...),
}

# define the inference tasks and emphasize the output format in the prompts
multi_questions_prompt = '''
Question 1 - Is there any damage on the roof?
Question 2 - Is any window broken or boarded?
Question 3 - Is any door broken, missing, or boarded?
For each question, you have to respond in the following format:
yes (true) / no (false)
'''

multi_choice_prompt = '''
Does the house look occupied?
You have to respond in the following format:
'occupied' / 'unoccupied'
'''
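The `(type, ...)` tuples above follow the field-spec style used by Pydantic's `create_model`, which is presumably how urban-worm builds its structured-output model internally (an assumption about the library's internals, not confirmed here). As a minimal stdlib sketch, this is roughly what schema-driven validation of a model response amounts to:

```python
from typing import Literal, get_args, get_origin

def validate(record: dict, schema: dict) -> dict:
    """Check a model response against a schema of (type, ...) field specs.

    Raises ValueError on a missing field or type mismatch; returns the
    record unchanged when it conforms.
    """
    for field, (expected, _default) in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field!r}")
        value = record[field]
        if get_origin(expected) is Literal:
            # Literal fields must match one of the allowed values exactly
            if value not in get_args(expected):
                raise ValueError(f"{field!r} must be one of {get_args(expected)}")
        elif not isinstance(value, expected):
            raise ValueError(f"{field!r} should be of type {expected.__name__}")
    return record

multiple_choice_format = {
    "questions": (str, ...),
    "answer": (Literal['occupied', 'unoccupied'], ...),
    "explanation": (str, ...),
}

validate({"questions": "Does the house look occupied?",
          "answer": "unoccupied",
          "explanation": "The porch is empty."},
         multiple_choice_format)
```

In the actual library the schema is passed to the constructor or assigned to `.schema`, and validation happens inside the inference call; the sketch only illustrates the role the `(type, ...)` specs play.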
We will be using three street views that capture a single residential property from different angles.
1 One-time inference¶
1.1 Ollama¶
In [3]:
# build the constructor
# all three images passed to the constructor will be used together in a single inference
data = InferenceOllama(llm='hf.co/ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0',
                       image=["./data/img_1.jpg",
                              "./data/img_2.jpg",
                              "./data/img_3.jpg"],
                       schema=normal_format)
# inference
result = data.one_inference(prompt='what is the color of the house?')
result
Out[3]:
| | questions1 | answer1 | data |
|---|---|---|---|
| 0 | What is the color of the house? | The house in each image appears to be light-co... | [./data/img_1.jpg, ./data/img_2.jpg, ./data/im... |
In [4]:
result['answer1'][0]
Out[4]:
"The images depict a two-story house with white siding and multiple windows. The yard appears to be fenced, and there's an assortment of items near the entrance such as trash bins and possibly gardening tools. There is also a sidewalk leading up to the front door."
In [5]:
# an image can also be passed directly to a single inference call
data.schema = bool_format  # replace the output format
result = data.one_inference(prompt=multi_questions_prompt,
                            image="./data/img_1.jpg")
result
Out[5]:
| | questions1 | answer1 | questions2 | answer2 | questions3 | answer3 | data |
|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_1.jpg] |
In [16]:
# multiple choice
data.schema = multiple_choice_format  # replace the output format
result = data.one_inference(prompt=multi_choice_prompt,
                            image="./data/img_1.jpg")
result
Out[16]:
| | questions1 | answer1 | explanation1 | data |
|---|---|---|---|---|
| 0 | Does the house look occupied? | unoccupied | The porch area appears empty and there are no ... | [./data/img_1.jpg] |
1.2 Llama.cpp¶
In [10]:
# build the constructor
data = InferenceLlamacpp(
    # if the model and mmproj are already downloaded,
    # you can directly specify the paths to the model files in the constructor, for example:
    # llm = "model/InternVL3-8B-Instruct-Q8_0.gguf"
    # mp = "model/mmproj-InternVL3-8B-Instruct-Q8_0.gguf"
    # you can also just provide the model's hf repo id and its quant directly:
    llm='ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0',
    image=["./data/img_1.jpg",
           "./data/img_2.jpg",
           "./data/img_3.jpg"],  # all three images in the constructor will be used together for the inference
    # schema=normal_format
)
In [14]:
# inference
result = data.one_inference(prompt='what is the color of the house?')
result
Out[14]:
| | questions1 | answer1 | data |
|---|---|---|---|
| 0 | What is the color of the house? | The house in each image appears to be light-co... | [./data/img_1.jpg, ./data/img_2.jpg, ./data/im... |
In [18]:
# single image inference
data.schema = bool_format
result = data.one_inference(prompt=multi_questions_prompt, image="./data/img_1.jpg")
result
Out[18]:
| | questions1 | answer1 | questions2 | answer2 | questions3 | answer3 | data |
|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_1.jpg] |
In [17]:
# multiple choice
data.schema = multiple_choice_format  # replace the output format
result = data.one_inference(prompt=multi_choice_prompt,
                            image="./data/img_1.jpg")
result
Out[17]:
| | questions1 | answer1 | explanation1 | data |
|---|---|---|---|---|
| 0 | Does the house look occupied? | unoccupied | The porch area appears empty and there are no ... | [./data/img_1.jpg] |
2 Batched inference with multiple-image input¶
To run batched inference with multi-image input, we just need to pack the image paths into a nested list or tuple.
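With many images, the nested list can be generated programmatically. A hypothetical helper (not part of urban-worm) that builds overlapping windows like the pairs used in the cells below:

```python
def sliding_batches(paths: list[str], size: int = 2, step: int = 1) -> list[list[str]]:
    """Group image paths into overlapping windows of `size`, advancing by `step`."""
    return [paths[i:i + size] for i in range(0, len(paths) - size + 1, step)]

imgs = ["./data/img_1.jpg", "./data/img_2.jpg", "./data/img_3.jpg"]
sliding_batches(imgs)
# → [['./data/img_1.jpg', './data/img_2.jpg'], ['./data/img_2.jpg', './data/img_3.jpg']]
```

The result can be assigned directly to `data.imgs`; with `step=size` the windows stop overlapping.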
2.1 Ollama¶
In [4]:
data = InferenceOllama(llm='hf.co/ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0',
                       schema=bool_format)
data.imgs = [
    ["./data/img_1.jpg",
     "./data/img_2.jpg"],
    ["./data/img_2.jpg",
     "./data/img_3.jpg"]
]
# uncomment the code below to do batched single-image inference
# data.imgs = [
#     ["./data/img_1.jpg",
#      "./data/img_2.jpg",
#      "./data/img_3.jpg",]
# ]
data.batch_inference(prompt=multi_questions_prompt)
Processing...: 100%|█████████████████████████| 2/2 [00:23<00:00, 11.56s/it]
Out[4]:
| | questions1 | answer1 | questions2 | answer2 | questions3 | answer3 | data |
|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_1.jpg, ./data/img_2.jpg] |
| 1 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_2.jpg, ./data/img_3.jpg] |
In [5]:
data.results
Out[5]:
{'responses': [[QnA(questions='Is there any damage on the roof?', answer=False),
QnA(questions='Is any window broken or boarded?', answer=False),
QnA(questions='Is any door broken, missing, or boarded?', answer=True)],
[QnA(questions='Is there any damage on the roof?', answer=False),
QnA(questions='Is any window broken or boarded?', answer=False),
QnA(questions='Is any door broken, missing, or boarded?', answer=True)]],
'data': [['./data/img_1.jpg', './data/img_2.jpg'],
['./data/img_2.jpg', './data/img_3.jpg']]}
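The wide-table layout shown earlier (`questions1`, `answer1`, …, `data`) can be reproduced from this dictionary by flattening each batch item's QnA list into numbered columns. A rough sketch of that transformation, using plain dicts in place of the `QnA` objects for illustration:

```python
def flatten_results(results: dict) -> list[dict]:
    """Flatten {'responses': [[qna, ...], ...], 'data': [...]} into one row per batch item."""
    rows = []
    for qnas, paths in zip(results["responses"], results["data"]):
        row = {}
        for i, qna in enumerate(qnas, start=1):
            # numbered columns: questions1/answer1, questions2/answer2, ...
            row[f"questions{i}"] = qna["questions"]
            row[f"answer{i}"] = qna["answer"]
        row["data"] = paths
        rows.append(row)
    return rows
```

Passing such rows to `pandas.DataFrame` would yield a table with the same column layout as `data.df`, which the library builds for you.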
In [6]:
data.df
Out[6]:
| | questions1 | answer1 | questions2 | answer2 | questions3 | answer3 | data |
|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_1.jpg, ./data/img_2.jpg] |
| 1 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_2.jpg, ./data/img_3.jpg] |
2.2 Llama.cpp¶
In [3]:
data = InferenceLlamacpp(llm='ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0', schema=bool_format)
# pack images in a nested list to batch multiple-image inference
data.imgs = [
    ["./data/img_1.jpg",
     "./data/img_2.jpg"],
    ["./data/img_2.jpg",
     "./data/img_3.jpg"]
]
# uncomment the code below to batch single-image inference
# data.imgs = [
#     ["./data/img_1.jpg",
#      "./data/img_2.jpg",
#      "./data/img_3.jpg",]
# ]
data.batch_inference(prompt=multi_questions_prompt)
Processing...: 100%|█████████████████████████| 2/2 [00:16<00:00, 8.16s/it]
Out[3]:
| | questions_1 | answer_1 | questions_2 | answer_2 | questions_3 | answer_3 | data_1 | data_2 |
|---|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | False | ./data/img_1.jpg | ./data/img_2.jpg |
| 1 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | False | ./data/img_2.jpg | ./data/img_3.jpg |
In [4]:
data.df
Out[4]:
| | questions_1 | answer_1 | questions_2 | answer_2 | questions_3 | answer_3 | data_1 | data_2 |
|---|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | False | ./data/img_1.jpg | ./data/img_2.jpg |
| 1 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | False | ./data/img_2.jpg | ./data/img_3.jpg |