# Basic usage of the inference module for image-based inference
In this tutorial, we use the inference module from urban-worm, which supports two frameworks for running MLLMs: Ollama (built on top of llama.cpp) and Llama.cpp. We showcase inference with single and multiple images using InternVL3.
Three types of output schemas will be demonstrated:
- plain text generation
- multiple questions with binary answers
- multiple choice
In [1]:
from urbanworm.inference.llama import InferenceLlamacpp, InferenceOllama
First, let's set up some schemas to define the output formats, along with prompts for the demonstration tasks.
In [2]:
# define the schema for the model output
# this is the default built-in schema for plain text generation
normal_format = {
    "questions": (str, ...),
    "answer": (str, ...),
}

# binary answers
bool_format = {
    "questions": (str, ...),
    "answer": (bool, ...),
}

# multiple choice
from typing import Literal

multiple_choice_format = {
    "questions": (str, ...),
    "answer": (Literal['occupied', 'unoccupied'], ...),
    "explanation": (str, ...),
}

# define the inference tasks and emphasize the output format in the prompts
multi_questions_prompt = '''
Question 1 - Is there any damage on the roof?
Question 2 - Is any window broken or boarded?
Question 3 - Is any door broken, missing, or boarded?
For each question, you have to respond in the following format:
yes (true) / no (false)
'''

multi_choice_prompt = '''
Does the house look occupied?
You have to respond in the following format:
'occupied' / 'unoccupied'
'''
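The `(type, ...)` tuples above follow the field-spec style used by Pydantic's `create_model`, which is presumably how urban-worm builds its structured-output model internally (an assumption about the library's internals, not confirmed here). As a minimal stdlib sketch, this is roughly what schema-driven validation of a model response amounts to:

```python
from typing import Literal, get_args, get_origin

def validate(record: dict, schema: dict) -> dict:
    """Check a model response against a schema of (type, ...) field specs.

    Raises ValueError on a missing field or type mismatch; returns the
    record unchanged when it conforms.
    """
    for field, (expected, _default) in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field!r}")
        value = record[field]
        if get_origin(expected) is Literal:
            # Literal fields must match one of the allowed values exactly
            if value not in get_args(expected):
                raise ValueError(f"{field!r} must be one of {get_args(expected)}")
        elif not isinstance(value, expected):
            raise ValueError(f"{field!r} should be of type {expected.__name__}")
    return record

multiple_choice_format = {
    "questions": (str, ...),
    "answer": (Literal['occupied', 'unoccupied'], ...),
    "explanation": (str, ...),
}

validate({"questions": "Does the house look occupied?",
          "answer": "unoccupied",
          "explanation": "The porch is empty."},
         multiple_choice_format)
```

In the actual library the schema is passed to the constructor or assigned to `.schema`, and validation happens inside the inference call; the sketch only illustrates the role the `(type, ...)` specs play.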
We will be using three street views that capture a single residential property from different angles.
1 One-time inference¶
1.1 Ollama¶
In [3]:
# build the constructor
# all three images passed to the constructor will be used together in a single inference
data = InferenceOllama(llm='hf.co/ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0',
                       image=["./data/img_1.jpg",
                              "./data/img_2.jpg",
                              "./data/img_3.jpg"],
                       schema=normal_format)
# inference
result = data.one_inference(prompt='what is the color of the house?')
result
Out[3]:
| | questions1 | answer1 | data |
|---|---|---|---|
| 0 | What is the color of the house? | The house in each image appears to be light-co... | [./data/img_1.jpg, ./data/img_2.jpg, ./data/im... |
In [4]:
result['answer1'][0]
Out[4]:
"The images depict a two-story house with white siding and multiple windows. The yard appears to be fenced, and there's an assortment of items near the entrance such as trash bins and possibly gardening tools. There is also a sidewalk leading up to the front door."
In [5]:
# an image can also be passed directly to a single inference call
data.schema = bool_format  # replace the output format
result = data.one_inference(prompt=multi_questions_prompt,
                            image="./data/img_1.jpg")
result
Out[5]:
| | questions1 | answer1 | questions2 | answer2 | questions3 | answer3 | data |
|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_1.jpg] |
In [16]:
# multiple choice
data.schema = multiple_choice_format  # replace the output format
result = data.one_inference(prompt=multi_choice_prompt,
                            image="./data/img_1.jpg")
result
Out[16]:
| | questions1 | answer1 | explanation1 | data |
|---|---|---|---|---|
| 0 | Does the house look occupied? | unoccupied | The porch area appears empty and there are no ... | [./data/img_1.jpg] |
1.2 Llama.cpp¶
In [10]:
# build the constructor
data = InferenceLlamacpp(
    # if the model and mmproj are already downloaded,
    # you can directly specify the paths to the model files in the constructor, for example:
    # llm = "model/InternVL3-8B-Instruct-Q8_0.gguf"
    # mp = "model/mmproj-InternVL3-8B-Instruct-Q8_0.gguf"
    # you can also just provide the model's hf repo id and its quant directly:
    llm='ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0',
    image=["./data/img_1.jpg",
           "./data/img_2.jpg",
           "./data/img_3.jpg"],  # all three images in the constructor will be used together for the inference
    # schema=normal_format
)
In [14]:
# inference
result = data.one_inference(prompt='what is the color of the house?')
result
Out[14]:
| | questions1 | answer1 | data |
|---|---|---|---|
| 0 | What is the color of the house? | The house in each image appears to be light-co... | [./data/img_1.jpg, ./data/img_2.jpg, ./data/im... |
In [18]:
# single image inference
data.schema = bool_format
result = data.one_inference(prompt=multi_questions_prompt, image="./data/img_1.jpg")
result
Out[18]:
| | questions1 | answer1 | questions2 | answer2 | questions3 | answer3 | data |
|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_1.jpg] |
In [17]:
# multiple choice
data.schema = multiple_choice_format  # replace the output format
result = data.one_inference(prompt=multi_choice_prompt,
                            image="./data/img_1.jpg")
result
Out[17]:
| | questions1 | answer1 | explanation1 | data |
|---|---|---|---|---|
| 0 | Does the house look occupied? | unoccupied | The porch area appears empty and there are no ... | [./data/img_1.jpg] |
2 Batched inference with multiple-image input¶
To run batched inference with multi-image input, we just need to pack the image paths into a nested list or tuple.
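With many images, the nested list can be generated programmatically. A hypothetical helper (not part of urban-worm) that builds overlapping windows like the pairs used in the cells below:

```python
def sliding_batches(paths: list[str], size: int = 2, step: int = 1) -> list[list[str]]:
    """Group image paths into overlapping windows of `size`, advancing by `step`."""
    return [paths[i:i + size] for i in range(0, len(paths) - size + 1, step)]

imgs = ["./data/img_1.jpg", "./data/img_2.jpg", "./data/img_3.jpg"]
sliding_batches(imgs)
# → [['./data/img_1.jpg', './data/img_2.jpg'], ['./data/img_2.jpg', './data/img_3.jpg']]
```

The result can be assigned directly to `data.imgs`; with `step=size` the windows stop overlapping.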
2.1 Ollama¶
In [4]:
data = InferenceOllama(llm='hf.co/ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0',
                       schema=bool_format)
data.imgs = [
    ["./data/img_1.jpg",
     "./data/img_2.jpg"],
    ["./data/img_2.jpg",
     "./data/img_3.jpg"]
]
# uncomment the code below to do batched single-image inference
# data.imgs = [
#     ["./data/img_1.jpg",
#      "./data/img_2.jpg",
#      "./data/img_3.jpg",]
# ]
data.batch_inference(prompt=multi_questions_prompt)
Processing...: 100%|█████████████████████████| 2/2 [00:23<00:00, 11.56s/it]
Out[4]:
| | questions1 | answer1 | questions2 | answer2 | questions3 | answer3 | data |
|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_1.jpg, ./data/img_2.jpg] |
| 1 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_2.jpg, ./data/img_3.jpg] |
In [5]:
data.results
Out[5]:
{'responses': [[QnA(questions='Is there any damage on the roof?', answer=False),
QnA(questions='Is any window broken or boarded?', answer=False),
QnA(questions='Is any door broken, missing, or boarded?', answer=True)],
[QnA(questions='Is there any damage on the roof?', answer=False),
QnA(questions='Is any window broken or boarded?', answer=False),
QnA(questions='Is any door broken, missing, or boarded?', answer=True)]],
'data': [['./data/img_1.jpg', './data/img_2.jpg'],
['./data/img_2.jpg', './data/img_3.jpg']]}
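The wide-table layout shown earlier (`questions1`, `answer1`, …, `data`) can be reproduced from this dictionary by flattening each batch item's QnA list into numbered columns. A rough sketch of that transformation, using plain dicts in place of the `QnA` objects for illustration:

```python
def flatten_results(results: dict) -> list[dict]:
    """Flatten {'responses': [[qna, ...], ...], 'data': [...]} into one row per batch item."""
    rows = []
    for qnas, paths in zip(results["responses"], results["data"]):
        row = {}
        for i, qna in enumerate(qnas, start=1):
            # numbered columns: questions1/answer1, questions2/answer2, ...
            row[f"questions{i}"] = qna["questions"]
            row[f"answer{i}"] = qna["answer"]
        row["data"] = paths
        rows.append(row)
    return rows
```

Passing such rows to `pandas.DataFrame` would yield a table with the same column layout as `data.df`, which the library builds for you.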
In [6]:
data.df
Out[6]:
| | questions1 | answer1 | questions2 | answer2 | questions3 | answer3 | data |
|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_1.jpg, ./data/img_2.jpg] |
| 1 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | True | [./data/img_2.jpg, ./data/img_3.jpg] |
2.2 Llama.cpp¶
In [3]:
data = InferenceLlamacpp(llm='ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0', schema=bool_format)
# pack images in a nested list to batch multiple-image inference
data.imgs = [
    ["./data/img_1.jpg",
     "./data/img_2.jpg"],
    ["./data/img_2.jpg",
     "./data/img_3.jpg"]
]
# uncomment the code below to batch single-image inference
# data.imgs = [
#     ["./data/img_1.jpg",
#      "./data/img_2.jpg",
#      "./data/img_3.jpg",]
# ]
data.batch_inference(prompt=multi_questions_prompt)
Processing...: 100%|█████████████████████████| 2/2 [00:16<00:00, 8.16s/it]
Out[3]:
| | questions_1 | answer_1 | questions_2 | answer_2 | questions_3 | answer_3 | data_1 | data_2 |
|---|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | False | ./data/img_1.jpg | ./data/img_2.jpg |
| 1 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | False | ./data/img_2.jpg | ./data/img_3.jpg |
In [4]:
data.df
Out[4]:
| | questions_1 | answer_1 | questions_2 | answer_2 | questions_3 | answer_3 | data_1 | data_2 |
|---|---|---|---|---|---|---|---|---|
| 0 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | False | ./data/img_1.jpg | ./data/img_2.jpg |
| 1 | Is there any damage on the roof? | False | Is any window broken or boarded? | False | Is any door broken, missing, or boarded? | False | ./data/img_2.jpg | ./data/img_3.jpg |