ImageLMAction
dendron.actions.image_lm_action.ImageLMActionConfig
dataclass
Configuration for an ImageLMAction.
The options in this object control which Hugging Face model is used, how the node interacts with the blackboard, and which decoding strategy is used. For a refresher on decoding strategies, see this blog post: https://huggingface.co/blog/how-to-generate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name | `str` | The name of the model to use. This should be a valid Hugging Face model name (including the user name). | required |
auto_load | `Optional[bool]` | An optional boolean indicating whether or not to automatically load the model either from disk or the Hugging Face hub. If | `True` |
text_input_key | `Optional[str]` | The blackboard key to use for writing and reading the text prompt that this node will consume. Defaults to "text_in". | `'text_in'` |
image_input_key | `Optional[str]` | The blackboard key to use for writing and reading the image prompt that this node will consume. Defaults to "image_in". | `'image_in'` |
output_key | `Optional[str]` | The blackboard key to use for writing and reading the text generated by this node. Defaults to "out". | `'out'` |
device | `Optional[str]` | The device that should be used with the model. Examples include "cpu", "cuda", and "auto". Defaults to "auto". | `'auto'` |
load_in_8bit | `Optional[bool]` | Optional boolean indicating whether or not to use eight-bit quantization from bitsandbytes. When available, will typically decrease memory usage and increase inference speed. Defaults to `False`. | `False` |
load_in_4bit | `Optional[bool]` | Optional boolean indicating whether or not to use four-bit quantization from bitsandbytes. When available, will typically decrease memory usage and increase inference speed. If you observe degraded performance, try eight-bit quantization instead. Defaults to `False`. | `False` |
max_new_tokens | `Optional[int]` | A limit on the number of new tokens to generate. You will usually want to set this yourself based on your application. Defaults to 16. | `16` |
do_sample | `Optional[bool]` | Optional boolean to control the decoding strategy. If set to `True`, allows use of a non-default generation strategy. Defaults to `False`. | `False` |
top_p | `Optional[float]` | Optional float to control use of nucleus sampling. If the value is strictly between 0 and 1, nucleus sampling is activated. | `1.0` |
torch_dtype | `torch.dtype` | The dtype to use for torch tensors. Defaults to `float16`. | `float16` |
use_flash_attn_2 | `Optional[bool]` | Optional bool controlling whether or not to use Flash Attention 2. Defaults to `False`. | `False` |
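To make the defaults in the table easier to scan, here is a minimal sketch that mirrors the documented fields in a stand-in dataclass. `ImageLMActionConfigSketch` is a hypothetical name used so the example runs without installing the library, the model name is a placeholder, and `torch_dtype` is written as a string only to avoid importing torch. The real class lives at `dendron.actions.image_lm_action.ImageLMActionConfig` and may differ in details not shown on this page.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImageLMActionConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    model_name: str  # required, no default
    auto_load: Optional[bool] = True
    text_input_key: Optional[str] = "text_in"
    image_input_key: Optional[str] = "image_in"
    output_key: Optional[str] = "out"
    device: Optional[str] = "auto"
    load_in_8bit: Optional[bool] = False
    load_in_4bit: Optional[bool] = False
    max_new_tokens: Optional[int] = 16
    do_sample: Optional[bool] = False
    top_p: Optional[float] = 1.0
    torch_dtype: str = "float16"  # assumption: the real field is a torch.dtype
    use_flash_attn_2: Optional[bool] = False


# "some-user/some-vision-model" is a placeholder, not a real model name.
default_cfg = ImageLMActionConfigSketch(model_name="some-user/some-vision-model")

# A configuration that asks for longer outputs, nucleus sampling,
# and four-bit quantization.
sampling_cfg = ImageLMActionConfigSketch(
    model_name="some-user/some-vision-model",
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    load_in_4bit=True,
)

# Collect only the fields that differ from the documented defaults.
defaults = asdict(default_cfg)
changed = {k: v for k, v in asdict(sampling_cfg).items() if defaults[k] != v}
print(changed)
```

Only `model_name` has to be supplied; every other field falls back to the default listed in the table.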
Source code in src/dendron/actions/image_lm_action.py
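The `do_sample` and `top_p` options refer to nucleus sampling, which the linked blog post covers in detail. As a rough illustration of the idea only, the sketch below keeps the smallest set of highest-probability tokens whose cumulative probability reaches `top_p` and samples from that set. The helper names `nucleus` and `sample_top_p` are hypothetical, and this is a generic pure-Python sketch, not the code that Hugging Face or this library actually runs.

```python
import random


def nucleus(probs, top_p):
    """Return the indices of the smallest high-probability set reaching top_p."""
    # Visit tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return kept


def sample_top_p(probs, top_p, rng):
    """Sample one token index from the nucleus."""
    kept = nucleus(probs, top_p)
    # random.Random.choices treats weights as relative, so the kept
    # probabilities do not need to be renormalized by hand.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]
print(nucleus(probs, 0.9))   # the three most likely tokens
print(nucleus(probs, 1.0))   # every token is kept

rng = random.Random(0)
token = sample_top_p(probs, 0.9, rng)
print(token)
```

With `top_p` at its default of 1.0 nothing is filtered out, which matches the table's statement that nucleus sampling is only active for values strictly between 0 and 1.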
dendron.actions.image_lm_action.ImageLMAction
Bases: ActionNode
An action node that uses a vision-language model to generate text based on an image prompt and a text prompt contained in the node's blackboard.
This node is based on the Hugging Face transformers library, and will download the model that you specify by name. This can take a long time and/or use a lot of storage, depending on the model you name.
There are enough configuration options for this type of node that the options have all been placed in a dataclass config object. See the documentation for that object to learn about the many options available to you.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cfg | `ImageLMActionConfig` | The configuration object for this model. | required |
Source code in src/dendron/actions/image_lm_action.py
set_model(new_model)
Set a new model to use for generating text.
set_input_processor(f)
Set the input processor to use during `tick()`s.
An input processor is applied to the prompt image and the prompt text stored in the blackboard, and can be used to preprocess the prompt. The processor function should map an (image, string) pair to an (image, string) pair. During a `tick()`, the output of this function is what is tokenized and sent to the model for generation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
f | `Callable` | The input processor function to use. Should be a callable object that maps (image, string) pairs to (image, string) pairs. | required |
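As an illustration of the kind of function described here, the following sketch wraps the text prompt in a chat-style template and passes the image through unchanged. The function name `add_prompt_template` and the template string are assumptions made for the example; the correct template depends on the model. The exact call signature the library uses (for example, whether the node itself is also passed in) is not shown in this section, so treat the two-argument form as an assumption too.

```python
def add_prompt_template(image, text):
    """Map an (image, text) pair to an (image, text) pair."""
    # Hypothetical chat-style template; check the model card for the
    # format your model actually expects.
    templated = f"USER: <image>\n{text.strip()}\nASSISTANT:"
    # The image is returned untouched; only the text is rewritten.
    return image, templated


# Any object can stand in for the image in this sketch.
fake_image = object()
image_out, text_out = add_prompt_template(fake_image, "  What is in this picture? ")
print(text_out)
```

A function like this would presumably be registered on a node with `set_input_processor(add_prompt_template)`.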
Source code in src/dendron/actions/image_lm_action.py
set_output_processor(f)
Set the output processor to use during `tick()`s.
An output processor is applied to the text generated by the model, before that text is written to the output slot of the blackboard. The function should be a map from `str` to `str`.
A typical example of an output processor would be a function that removes the prompt from the text returned by a model, so that only the newly generated text is written to the blackboard.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
f | `Callable` | The output processor function. Should be a callable object that maps strings to strings. | required |
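The prompt-removal case mentioned above could look roughly like the sketch below. It assumes the decoded text echoes the prompt and that the prompt ends with a known marker (`"ASSISTANT:"` here, which is an assumption tied to a hypothetical chat template). `make_prompt_stripper` is an illustrative helper name, not part of the library.

```python
def make_prompt_stripper(marker="ASSISTANT:"):
    """Build a str -> str function that keeps only the text after the marker."""
    def strip_prompt(generated: str) -> str:
        # rpartition splits on the last occurrence of the marker; when the
        # marker is absent, the middle element is an empty string.
        _, found, tail = generated.rpartition(marker)
        return tail.strip() if found else generated.strip()
    return strip_prompt


strip_prompt = make_prompt_stripper()

decoded = "USER: <image>\nWhat is in this picture?\nASSISTANT: A cat on a sofa."
print(strip_prompt(decoded))
```

Splitting on the last occurrence of the marker keeps the function safe when the user's own prompt happens to contain the marker text.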
Source code in src/dendron/actions/image_lm_action.py
tick()
Execute a tick, consisting of the following steps:
- Retrieve the text prompt and image prompt from the node's blackboard.
- Apply the input processor, if one exists.
- Process the input text and image into ids for the model.
- Generate new tokens based on the processed prompt.
- Decode the model output into a text string.
- Apply the output processor, if one exists.
- Write the output text to the blackboard.
If any of the above fail, the exception text is printed and the node returns a status of `FAILURE`. Otherwise the node returns `SUCCESS`.
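The steps above can be sketched as a small, self-contained class. This is not the library's implementation: `TickSketch`, `Status`, and the `encode`, `generate`, and `decode` callables are hypothetical stand-ins for the node, its status values, and the Hugging Face processor and model calls, and a plain dict plays the role of the blackboard.

```python
from enum import Enum


class Status(Enum):
    """Stand-in for the library's node status values."""
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


class TickSketch:
    def __init__(self, blackboard, encode, generate, decode,
                 text_input_key="text_in", image_input_key="image_in",
                 output_key="out"):
        self.blackboard = blackboard
        self.encode, self.generate, self.decode = encode, generate, decode
        self.text_input_key = text_input_key
        self.image_input_key = image_input_key
        self.output_key = output_key
        self.input_processor = None
        self.output_processor = None

    def tick(self):
        try:
            # 1. Retrieve the prompts from the blackboard.
            text = self.blackboard[self.text_input_key]
            image = self.blackboard[self.image_input_key]
            # 2. Apply the input processor, if one exists.
            if self.input_processor is not None:
                image, text = self.input_processor(image, text)
            # 3-5. Encode, generate, and decode.
            input_ids = self.encode(image, text)
            output_ids = self.generate(input_ids)
            output_text = self.decode(output_ids)
            # 6. Apply the output processor, if one exists.
            if self.output_processor is not None:
                output_text = self.output_processor(output_text)
            # 7. Write the result back to the blackboard.
            self.blackboard[self.output_key] = output_text
            return Status.SUCCESS
        except Exception as exc:
            # Any failure is reported and turned into a FAILURE status.
            print(exc)
            return Status.FAILURE


# Toy stand-ins: characters act as "tokens" and the "model" appends " ok".
encode = lambda image, text: [ord(c) for c in text]
generate = lambda ids: ids + [ord(c) for c in " ok"]
decode = lambda ids: "".join(chr(i) for i in ids)

board = {"text_in": "hi", "image_in": object()}
ok_status = TickSketch(board, encode, generate, decode).tick()

# An empty blackboard triggers a KeyError, which becomes FAILURE.
failed_status = TickSketch({}, encode, generate, decode).tick()
```

Wrapping the whole pipeline in a single `try` block mirrors the documented behavior that a failure at any step yields `FAILURE` rather than raising.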