Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.2.3] - 2026-05-17
Fixed
- Multi-GPU BFloat16 dtype mismatch in vendored `urbanworm/inference/unsloth.py` (`InferenceUnsloth._generate_batch`): with `device_map='auto'` across two GPUs, accelerate splits ViT blocks between devices and moves tensors without re-casting their dtype. The image processor always emits `float32` `pixel_values`, which caused `RuntimeError: expected scalar type BFloat16 but found Float` deep inside the vision encoder. Fixed by adding a new `_apply_dtype_hooks_once()` method that registers a forward pre-hook (`register_forward_pre_hook`) on the top-level vision encoder and on every ViT transformer block (identified by `norm1` + `attn`/`self_attn`) to cast floating-point tensors to the model's compute dtype before each forward pass. Hook registration is deferred until after the model is loaded (lazy load happens inside `batch_inference`, not `__init__`) and is idempotent.
- Dtype-hook flag applied too early: `_dtype_hooks_applied` was set to `True` before the `model is None` guard, so a subsequent call after the model became available would skip hook registration entirely. The flag is now set only after the model reference is confirmed live.
- VRAM budget used total memory instead of free memory: `max_memory` was computed as `total_memory * 0.90`, which could trigger OOM on a GPU already partially occupied by other processes. Changed to `mem_get_info(i)[0] * 0.90` (free VRAM at load time).
- Checkpoint resume re-processed trailing partial batch: `start_idx = (len(done_records) // bs) * bs` rounded down to the nearest batch boundary, causing the last partial batch of a prior run to be re-inferred on every restart. Changed to `start_idx = len(done_records)`, with `done_records` (not `done_records[:start_idx]`) written to the result to match.
- Silent exceptions in the `_run_chunk_with_retry` halving cascade: when a batch failed and `current_bs > 1`, the error was swallowed with no log output, making it impossible to diagnose why inference kept slowing down. Added `logger.warning(...)` before the retry to surface the exception type, message, and new batch size.
- Missing `_model_dtype` attribute initialisation in `__init__`: added `self._model_dtype = None` alongside the other `None`-initialised attributes for clarity and to avoid a potential `AttributeError` on early attribute access.
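The checkpoint-resume fix above comes down to one line of index arithmetic. The sketch below contrasts the two expressions quoted in that entry; the function names `old_start_idx` and `new_start_idx` and the sample data are hypothetical and exist only for illustration.

```python
def old_start_idx(done_records: list, bs: int) -> int:
    """Previous behaviour: round down to the nearest batch boundary."""
    return (len(done_records) // bs) * bs


def new_start_idx(done_records: list, bs: int) -> int:
    """Current behaviour: resume right after the last completed record."""
    return len(done_records)


# Hypothetical prior run: 10 records finished with batch size 4,
# i.e. two full batches plus a partial batch of two.
done = list(range(10))
print(old_start_idx(done, 4))  # 8: records 8 and 9 would be re-inferred
print(new_start_idx(done, 4))  # 10: nothing is repeated
```

With the old expression, any run that stopped inside a batch repeats that partial batch on every restart; the new expression resumes exactly where the saved records end.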
Changed
- `classify_outdoor_unsloth.py`: removed the now-redundant `_patch_infer_for_dtype_safety` runtime monkey-patch and its `_apply_model_dtype_hooks` helper; the dtype fix is applied inside the vendored package. Also removed the unused `import functools` and reordered `packages` in `print_runtime_versions_and_devices()` so `unsloth` appears before `transformers` (matching unsloth's own import-order requirement).
Added
- `patch_urbanworm.py`: surgical patch script that locates the installed `urbanworm` package via `importlib`, uses `ast` to find the exact insertion point inside `InferenceUnsloth._generate_batch`, and injects the `_apply_dtype_hooks_once()` call. Supports `--check` (idempotency test) and `--revert` (restore from `.bak`), and creates a `.bak` backup on first run.
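As a rough illustration of the `ast`-based lookup described above, the following sketch finds the line of the first statement inside a named method. It is not the code of `patch_urbanworm.py`: the helper `find_method_lineno` and the `SAMPLE` source are hypothetical, and how the real script chooses its insertion point is not documented here.

```python
import ast
from typing import Optional


def find_method_lineno(source: str, class_name: str, method_name: str) -> Optional[int]:
    """Return the 1-based line of the first statement in the method body, or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                is_func = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_func and item.name == method_name:
                    return item.body[0].lineno
    return None


# Hypothetical stand-in for the installed module's source text.
SAMPLE = '''\
class InferenceUnsloth:
    def __init__(self):
        self._model = None

    def _generate_batch(self, batch):
        outputs = []
        return outputs
'''

print(find_method_lineno(SAMPLE, "InferenceUnsloth", "_generate_batch"))  # 6
```

A real patcher would read the source from the installed file (for example from the path reported by `importlib.util.find_spec`) and would also need to skip a leading docstring, which this sketch does not handle.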
[0.2.2] - 2026-05-17
Added
- Multi-GPU support for `InferenceUnsloth`: when more than one CUDA GPU is detected, `device_map="auto"` is set automatically and each GPU's VRAM budget is capped at 90 % of its capacity. Override with `max_memory={0: "10GiB", 1: "10GiB"}`. The `device` constructor parameter can still force a specific device map.
- `model_dir` parameter on all three local inference backends. Controls where downloaded model weights are stored:
  - `InferenceUnsloth`: passed as `cache_dir` to `FastVisionModel.from_pretrained` (HuggingFace Hub cache).
  - `InferenceOllama`: sets `OLLAMA_MODELS` around `ollama.pull` (saved and restored so other instances are not affected).
  - `InferenceLlamacpp`: sets `HF_HUB_CACHE` in the `llama-mtmd-cli` subprocess environment (applies when downloading via `-hf`; no effect on local GGUF paths).
- Stability utilities for large-scale `InferenceUnsloth` jobs:
  - `configure_runtime(disable_compile=True)`: sets `UNSLOTH_COMPILE_DISABLE`, `UNSLOTH_DISABLE_FAST_GENERATION`, and `TORCH_COMPILE_DISABLE` before Unsloth/Torch are imported. Called automatically in `__init__` (default `disable_compile=True`). Prevents the `AlignDevicesHook`/Torch-Dynamo recompile crashes that surface on runs of ~10 k+ samples.
  - `clear_compile_cache()`: removes the Unsloth compiled-model cache from system temp dirs; useful when a stale cache causes recompile errors.
  - `task_chunk_size` parameter on `batch_inference`: logical job-partition size independent of `batch_size`; reports progress at the task-chunk level for long runs.
  - `failed_log_path` parameter on `batch_inference`: appends permanently failed sample indices and error messages to a CSV so they can be rerun later.
  - `_log_runtime_versions()`: logs torch, CUDA, GPU, transformers, accelerate, and unsloth versions at `INFO` level on model load.
  - `_classify_error()`: identifies known recoverable patterns (compile/hook conflict, CUDA OOM, dtype mismatch) and emits a human-readable hint.
- Halving retry cascade in `InferenceUnsloth._run_chunk_with_retry`: on failure, the batch size is halved on each retry until it reaches 1; if the single-item retry also fails, stub responses are filled (or the exception is re-raised if `skip_errors=False`).
- MkDocs documentation site: `mkdocs.yml` with Material theme (light-blue/green + purple/green palette), mkdocstrings, mkdocs-jupyter, autorefs, and git-revision-date-localized. Auto-deployed to GitHub Pages on push to `main` via `.github/workflows/docs.yml`. Docs added: `docs/index.md`, `docs/installation.md`, `docs/quickstart.md`, `docs/api/inference.md`, `docs/api/dataset.md`, `docs/api/sources.md`, `docs/changelog.md`.
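The halving retry cascade listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the body of `_run_chunk_with_retry`: the function signature, the `stub` value, the logger name, and the choice to keep the reduced batch size for the rest of the chunk are all hypothetical.

```python
import logging
from typing import Callable, List, Sequence

logger = logging.getLogger("urbanworm.sketch")  # hypothetical logger name


def run_chunk_with_retry(
    items: Sequence[str],
    infer: Callable[[Sequence[str]], List[str]],
    batch_size: int,
    skip_errors: bool = True,
    stub: str = "",
) -> List[str]:
    """Process items in batches, halving the batch size after each failure."""
    results: List[str] = []
    i = 0
    current_bs = max(1, batch_size)
    while i < len(items):
        batch = items[i : i + current_bs]
        try:
            results.extend(infer(batch))
            i += len(batch)
        except Exception as exc:
            if current_bs > 1:
                current_bs = max(1, current_bs // 2)
                logger.warning(
                    "batch failed (%s: %s); retrying with batch size %d",
                    type(exc).__name__, exc, current_bs,
                )
                continue
            if not skip_errors:
                raise
            results.append(stub)  # single item failed: fill a stub and move on
            i += 1
    return results


def flaky_infer(batch: Sequence[str]) -> List[str]:
    """Hypothetical backend that fails on any batch containing 'bad'."""
    if "bad" in batch:
        raise RuntimeError("simulated failure")
    return [s.upper() for s in batch]


print(run_chunk_with_retry(["a", "b", "bad", "c"], flaky_infer, batch_size=4))
# ['A', 'B', '', 'C']
```

Only the failing item ends up as a stub; its neighbours are recovered once the batch is small enough to isolate it.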
Fixed
- BFloat16 / Float32 dtype mismatch in `InferenceUnsloth._generate_batch`: the image processor always emits `pixel_values` as float32, but BF16 models raised `expected scalar type BFloat16 but found Float`. All floating-point input tensors are now cast to the model's compute dtype after the processor call.
- `ModuleNotFoundError: No module named 'ollama'` when importing `InferenceUnsloth` in environments without Ollama: caused by an eager `import ollama` at the top of `llama.py` and an eager import of `llama.py` in `__init__.py`. All four backends (`InferenceOllama`, `InferenceLlamacpp`, `InferenceUnsloth`, `InferenceAPI`) are now lazy via `__getattr__` in `urbanworm/__init__.py`. `llama.py` exposes a `_lazy_ollama()` helper that raises a descriptive `ImportError` only when Ollama is actually used.
- `pydub` `SyntaxWarning` spam on Python 3.12: pydub's own source contains invalid escape sequences. The module-level `from pydub import AudioSegment` import is replaced by a `_load_audio_segment()` helper that suppresses the warning with `warnings.filterwarnings`. All three call sites (`probe_audio_duration`, `clip`, `sound_url_to_temp`) were updated.
- `skip_errors=False` ignored in the `InferenceUnsloth` retry ladder: when the final single-item retry was exhausted, stub responses were always filled regardless of `skip_errors`. The flag is now checked and the exception is re-raised when `skip_errors=False`.
- `clear_compile_cache` silent cwd deletion: when the `TEMP` or `TMP` env vars are unset, `Path("") / "unsloth_compiled_cache"` resolved to a relative path in cwd. Env-derived candidates are now only added when the variable is non-empty.
- `OLLAMA_MODELS` global mutation: `InferenceOllama.one_inference` and `.batch_inference` now save and restore (or remove) `OLLAMA_MODELS` around `ollama.pull` so concurrent instances with different `model_dir` values do not clobber each other.
- Subprocess safety in `InferenceLlamacpp._mtmd`: added null-byte validation on `system_message` and `prompt`, a `None` guard on `llm`, and a comment documenting why list-based invocation is safe without `shlex.quote()`.
- Multi-GPU input preparation: `.to(self._model.device)` raised `AttributeError` when `device_map="auto"` splits the model across GPUs. Fixed by using `next(self._model.parameters()).device` instead.
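The save-and-restore handling of `OLLAMA_MODELS` described above follows a common pattern that can be sketched with a context manager. The helper name `temporary_env` and the example path are hypothetical; the library may implement the same idea with a plain `try`/`finally` rather than this exact shape.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def temporary_env(name: str, value: Optional[str]) -> Iterator[None]:
    """Set an environment variable for the duration of a block, then restore it."""
    missing = object()
    previous = os.environ.get(name, missing)
    if value is not None:
        os.environ[name] = value
    try:
        yield
    finally:
        if previous is missing:
            os.environ.pop(name, None)  # variable did not exist before: remove it
        else:
            os.environ[name] = previous  # variable existed: put the old value back


with temporary_env("OLLAMA_MODELS", "/tmp/models-a"):
    # A real caller would invoke ollama.pull(...) at this point.
    print(os.environ["OLLAMA_MODELS"])
```

Restoring in `finally` means the previous value comes back even when the wrapped call raises, and the "remove" branch covers the case where the variable was never set.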
[0.2.0] - 2026-05-11 (dev2 branch)
Fixed
- Version mismatch across `pyproject.toml`, `urbanworm/__init__.py`, and `CITATION.cff`. `__version__` is now resolved at runtime via `importlib.metadata` so it stays in sync with the installed distribution.
- `GeoTaggedData.__init__` was using chained dict assignment, which aliased `self.svis`, `self.photos`, and `self.audios` to the same underlying dict. They are now independent.
- `_pack` (in `inference.Inference`) dropped the trailing group when consecutive locations stayed equal to the end of the input.
- `InferenceLlamacpp.one_inference` / `batch_inference` shadowed the `temp` (temperature) parameter with a temp-file path inside the URL/base64 loops.
- `closest()` raised `NameError` when a season was provided but didn't match one of the four hard-coded checks.
- `get_sound_from_location` raised `NameError` for the single-clip path (`flattened_slice_list` was only defined in the multi-clip branch).
- `_mtmd` (Ollama, multi-image, `multiImgInput=False`) now passes `img[i]` to per-image inference instead of the full image list.
- `sound_url_to_temp` no longer returns a path to a deleted file on download failure; it cleans up and re-raises.
- Bare `except:` clauses replaced with narrow exception classes; obviously unsafe `timeout=999`/`9999` values reduced to `30`/`60` seconds.
- `download_to_dir` now raises `ValueError` when `to_dir` is missing instead of silently returning, and aligns sentinel paths so list lengths stay consistent on download failures.
- `construct_units` now raises `ValueError`/`TypeError` on bad input instead of printing and silently returning `None`.
- `getSV` honours the `MAPILLARY_API_KEY` env var like the other source helpers.
- `InferenceOllama` `skip_errors=True` now actually suppresses validation errors and returns an empty `Response` instead of re-raising.
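The chained-assignment aliasing mentioned in the `GeoTaggedData.__init__` entry is easy to reproduce in isolation. The two classes below are hypothetical stand-ins that reuse only the attribute names from that entry.

```python
class Aliased:
    def __init__(self) -> None:
        # Chained assignment binds all three names to ONE dict object.
        self.svis = self.photos = self.audios = {}


class Independent:
    def __init__(self) -> None:
        # Separate literals create three distinct dict objects.
        self.svis = {}
        self.photos = {}
        self.audios = {}


a = Aliased()
a.svis["loc"] = "image.jpg"
print(a.photos)  # {'loc': 'image.jpg'}: the write to svis shows up here too

b = Independent()
b.svis["loc"] = "image.jpg"
print(b.photos)  # {}
```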
Changed
- Replaced `requirements.txt` as the source of truth: dependencies and optional extras (`ollama`, `audio`, `llamacpp`, `dev`, `all`) now live in `pyproject.toml` `[project]`. Added missing transitive deps (`mercantile`, `pyproj`, `shapely`).
- `pd.concat`-in-loop replaced with a single concat in `get_svi_from_locations`, `get_photo_from_location`, and `get_sound_from_location` (O(n²) → O(n)).
- `print(...)` calls in library code switched to a module-level `logging.getLogger("urbanworm")`.
- Replaced flake8 config with `[tool.ruff]`. Added `[tool.pytest.ini_options]`.
- CI split into a fast `unit` job (Ubuntu, py3.10/3.11/3.12) gated on every push and PR, plus a self-hosted `integration` job that runs on `main`.
- Internal helpers `_year_range` and `_parse_created` (defined inside `getSound`) consolidated to top-level `solr_year_range` and `parse_iso_created` in `urbanworm.utils.utils`.
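Because library output now goes through the `"urbanworm"` logger rather than `print`, an application has to attach a handler to see it. The sketch below shows one way to do that with the standard `logging` module; `download_step` and the message format are hypothetical, and only the logger name comes from the entry above.

```python
import io
import logging

# Library side: a module-level logger, as named in the entry above.
logger = logging.getLogger("urbanworm")


def download_step(n: int) -> None:
    """Hypothetical library function that used to call print()."""
    logger.info("downloaded %d items", n)


# Application side: opt in to the library's messages.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

download_step(3)
print(stream.getvalue().strip())  # urbanworm INFO downloaded 3 items
```

An in-memory stream is used here only to keep the example self-checking; a script would more typically call `logging.basicConfig(level=logging.INFO)`.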
Added
- `tests/test_utils.py`, `tests/test_format.py`, `tests/test_dataset.py`, `tests/test_inference_helpers.py`: pure-logic unit tests.
- `urbanworm.utils.{geo,io,json_repair,timefilter,face,audio}` submodules re-exporting curated helpers from the catch-all `utils.utils` module.
- `urbanworm.sources.{mapillary,flickr,freesound}` submodules re-exporting `getSV`, `getPhoto`, `getSound` for a more discoverable namespace.
- `InferenceUnsloth`: new VLM backend mirroring `InferenceOllama`'s public surface but running locally via `unsloth.FastVisionModel`. Supports GPU `batch_size` for throughput, lazy import (no torch/unsloth pulled in unless the class is constructed), JSON-repair fallback, and `skip_errors` parity. Default checkpoint: `unsloth/Qwen3-VL-3B-Instruct`. Tested with Qwen3-VL-3B/8B, Gemma-3-4B-IT, Qwen2-VL-2B, Qwen2.5-VL-7B-bnb-4bit. Install with `pip install "urban-worm[unsloth]"`. Tests in `tests/test_unsloth.py` use mocks so they run on any CI without a GPU.
- Aporee audio source: new `getSoundAporee()` and `urbanworm.sources.aporee` module. Filters a Radio Aporee catalog (CSV path or in-memory DataFrame with `url`, `latitude`, `longitude` columns; optional `id`/`identifier`, `name`, `description`, `tags`, `created`, `duration_s`) by spatial proximity using the same semantics as the Freesound path. `getSound()` is now a dispatcher with `source: str = 'freesound'` (default) or `'aporee'`. `GeoTaggedData.get_sound_from_location` accepts matching `source=`, `catalog=`, and `probe_durations=` parameters; existing Freesound callers keep working unchanged. Output schema includes a `preview-hq-mp3` alias of `url` so `download_to_dir` and the slicing pipeline need no changes.
- `probe_audio_duration(url)` in `urbanworm.utils.utils` (re-exported from `urbanworm.utils.audio`). Downloads an mp3 to a tempfile and reads its length via pydub (with mutagen as a fallback). Used by the Aporee path when `slice_duration` is requested but the catalog has no `duration_s` column.
- `enrich_aporee_catalog(catalog, out_path=None, min_duration=None, skip_existing=True, timeout=60)` in `urbanworm.dataset`. One-shot helper that probes every URL in an Aporee catalog, populates `duration_s`, optionally drops rows shorter than `min_duration`, and optionally writes the result back to CSV.
- `fetch_aporee_catalog(bbox, year, hour, season, southern, rows, verify_urls, out_path, enrich_durations, min_duration, timeout, page_size)` in `urbanworm.dataset`. Pulls the geolocated Aporee catalog from Internet Archive's `radio-aporee-maps` collection via the IA Scrape API. Server-side bbox + year filters; client-side hour and hemisphere-aware season filters. Optional `verify_urls=True` looks up the exact mp3 filename per identifier; default is the fast `<id>.mp3` fallback. Output schema is compatible with `getSoundAporee` so a fetched DataFrame can be passed directly.
- `getSoundAporee` now accepts the script-style column aliases `lat`/`lon`/`capture_time` (renamed internally to the canonical `latitude`/`longitude`/`created`).
- 33 unit tests in `tests/test_aporee.py` (filtering, dispatcher, duration probing, enrichment, IA fetcher with mocked HTTP, alias acceptance).
- `fov='auto'` for `getSV` / `get_svi_from_locations`: sizes the perspective field-of-view per image so the building footprint at the query location is just framed (extent + 10% margin, clamped to `[fov_min, fov_max]`). Two new helpers in `urbanworm.utils.utils` (re-exported via `urbanworm.utils.geo`):
  - `auto_fov_from_polygon(camera_lon, camera_lat, polygon, ...)`: computes the angular extent of a `shapely` polygon as seen from the camera. The polygon is taken from each unit's `row.geometry` when `get_svi_from_locations` is called against building footprints loaded by `getBuildings()`.
  - `auto_fov_from_distance(distance_m, building_width_m=15, ...)`: heuristic fallback when no polygon is available (e.g. the user passed a bare coordinate to `getSV`).

  `getSV` accepts `fov: int | float | str` and a new `target_polygon=` parameter; `fov_margin`, `fov_min`, `fov_max`, `building_height` control the auto sizing. Requires `reoriented=True`.
- `fov='auto'` is height-aware. Both `auto_fov_from_polygon` and `auto_fov_from_distance` now take `building_height_m` (default 9 m, ~3 stories) and `aspect_ratio` (`image_width / image_height`) and return the wider of two requirements: the horizontal extent of the footprint, or the horizontal FOV needed so the rendered image's derived vertical FOV (`vFOV = wFOV / aspect`) covers the building's height. Tall, narrow buildings now have their roofs framed instead of cropped. Set `building_height=0` to skip the height term. 15 unit tests in `tests/test_auto_fov.py`.
- GlobalBuildingAtlas (`gba`) building source with per-building height. New `getGBABuildings(bbox, gba_path, ...)` in `urbanworm.utils.building` loads a local GBA file (GPKG / GeoJSON / GeoParquet; anything `geopandas.read_file` understands), filters by bbox + area, and normalizes the height column to `height_m` (recognises `height`, `h`, `bldg_height`, `building_height`, `z` as aliases). `GeoTaggedData.getBuildings` gains `source='gba'` (with a required `gba_path=`) and logs how many of the loaded buildings carry a height value.
- Per-building height for `fov='auto'`. When `self.units` has a `height_m` column, `get_svi_from_locations` now uses each row's actual height instead of the global `building_height` default. Falls back gracefully on NaN / missing values.
- `source='globfp3d'` in `getBuildings()` for the 3D-GloBFP dataset (Che et al., ESSD 2024). Auto-fetches `world_grid.zip` + `data_links.txt` from Zenodo record `15487037`, intersects with the bbox, then downloads matching per-tile shapefiles from Figshare. New public helpers (canonical names, all in `urbanworm.utils.building`): `getGloBFP3DBuildings`, `parse_globfp3d_data_links`, `figshare_article_id`, `load_globfp3d_grid_manifest`, `load_globfp3d_data_links`, `download_globfp3d_tile`, `fetch_globfp3d_for_bbox`. Cached by default under `~/.cache/urbanworm/globfp3d`.
- `source='gba'` in `getBuildings()` is now the real GlobalBuildingAtlas dataset (Zhu et al., ESSD 2025), a separate dataset hosted on HuggingFace + mediaTUM. New helpers: `getGBABuildings`, `load_true_gba_polygon_manifest`, `fetch_true_gba_for_bbox`. Auto-fetches polygon tiles from `zhu-xlab/GBA.LoD1` using `representative/lod1.geojson` as the manifest, and reprojects from EPSG:3857 to EPSG:4326. Cached under `~/.cache/urbanworm/gba`. Per-row heights from GBA.Height (mediaTUM `m1837832`) are NOT yet joined; `include_heights=True` is currently a no-op stub; tracking issue.
- Backwards-compat aliases retained for the previous GBA-prefixed names that actually pointed at the 3D-GloBFP pipeline: `parse_gba_data_links`, `load_gba_grid_manifest`, `load_gba_data_links`, `download_gba_tile`, `fetch_gba_for_bbox`, `_default_gba_cache_dir`, `GBA_ZENODO_RECORD` / `GBA_GRID_URL` / etc. Old code continues to work; new code should use the `globfp3d`-prefixed names for clarity.
- 23 unit tests in `tests/test_gba.py` covering local-file loaders for both datasets, the new dispatcher in `getBuildings()` (validates all four sources), parser helpers, figshare-id extraction, filename matching, and end-to-end fetch with mocked HTTP for both datasets.
- `.env.example` documenting environment variables for API keys.
- `CHANGELOG.md` (this file).
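The height-aware `fov='auto'` sizing described above can be approximated with basic trigonometry. The sketch below is not the library's `auto_fov_from_distance`: the function name `auto_fov_sketch`, the clamp bounds of 30 and 120 degrees, the 4:3 aspect ratio, and the assumption that the view is centred vertically on a facade facing the camera are all illustrative choices. Only the 15 m width, 9 m height, 10% margin, and the `vFOV = wFOV / aspect` relation come from the entries above.

```python
import math


def auto_fov_sketch(
    distance_m: float,
    building_width_m: float = 15.0,
    building_height_m: float = 9.0,
    aspect_ratio: float = 4 / 3,  # assumed image_width / image_height
    fov_margin: float = 0.10,
    fov_min: float = 30.0,        # assumed clamp bounds, not the library defaults
    fov_max: float = 120.0,
) -> float:
    """Return a horizontal FOV in degrees that frames a facade facing the camera."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    # Angle subtended by the facade width.
    width_fov = math.degrees(2 * math.atan(building_width_m / (2 * distance_m)))
    # Vertical angle for the height, converted to a horizontal FOV
    # through vFOV = wFOV / aspect.
    height_fov = 0.0
    if building_height_m > 0:
        v_needed = math.degrees(2 * math.atan(building_height_m / (2 * distance_m)))
        height_fov = v_needed * aspect_ratio
    fov = max(width_fov, height_fov) * (1 + fov_margin)
    return min(max(fov, fov_min), fov_max)


# A tall, narrow building: the height term dominates the width term.
print(auto_fov_sketch(20.0, building_width_m=5.0, building_height_m=30.0))
# With the height term disabled, the result falls back to the lower clamp.
print(auto_fov_sketch(20.0, building_width_m=5.0, building_height_m=0.0))
```

Taking the maximum of the two terms is what keeps tall, narrow buildings from having their roofs cropped, while the clamp stops very near or very distant targets from producing unusable fields of view.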
[0.1.9]
Pre-existing release. See git history.