pHash Contribution
Perceptual hashes (pHash) let you identify a video against the StashBox community index by content rather than by name. Obscura's pHash pipeline intentionally matches Stash's so that the values cluster with the existing community fingerprint database.
Why bit-for-bit compatibility matters
The whole point of a pHash is that two encodings of the same content produce the same, or nearly the same, hash. The community index holds millions of values computed by Stash's pipeline; if Obscura computed even slightly different values, they would not cluster with the existing entries, and identify-by-fingerprint would fail.
So the implementation matches Stash exactly:
- Frame selection (which frames, when)
- ffmpeg seek strategy (-ss before -i)
- Frame scale (width 160)
- Montage layout (5×5 grid, preserve source aspect)
- Hash function (goimagehash.PerceptionHash)
- Hash format (lowercase 16-character hex)
Changing any of these produces values that no longer cluster. The implementation lives in infra/phash/main.go; treat it as a contract, not a starting point.
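To make the hash format concrete: 16 hex characters encode exactly 64 bits, and perceptual hashes are conventionally compared by Hamming distance (the number of differing bits). The sketch below shows both. The function names are illustrative and not taken from infra/phash/main.go, and how a given StashBox endpoint thresholds the distance is not something this page specifies.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
)

// formatHash renders a 64-bit hash as lowercase, zero-padded hex.
// The padding keeps the string at 16 characters even when the
// leading bits are zero.
func formatHash(h uint64) string {
	return fmt.Sprintf("%016x", h)
}

// hammingDistance counts the differing bits between two hex-encoded
// 64-bit hashes. Illustrative helper, not part of Obscura's code.
func hammingDistance(a, b string) (int, error) {
	x, err := strconv.ParseUint(a, 16, 64)
	if err != nil {
		return 0, fmt.Errorf("parse %q: %w", a, err)
	}
	y, err := strconv.ParseUint(b, 16, 64)
	if err != nil {
		return 0, fmt.Errorf("parse %q: %w", b, err)
	}
	return bits.OnesCount64(x ^ y), nil
}

func main() {
	fmt.Println(formatHash(0xabc)) // 0000000000000abc

	d, err := hammingDistance("8f3a2b1c4d5e6f70", "8f3a2b1c4d5e6f71")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // 1
}
```

A value that loses its leading zeros or switches to uppercase is a different string even though it encodes the same bits, which is why the format is part of the contract.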
Generation summary
- Sample 25 frames evenly from 5% through 91.4% of the source duration.
- Use ffmpeg input seek (-ss before -i) and scale each frame to width 160, height computed to preserve aspect ratio.
- Compose a 5×5 montage of the frames in capture order.
- Run goimagehash.PerceptionHash on the montage.
- Store the lowercase 16-character hex string in video_episodes.phash or video_movies.phash.
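The first two steps can be sketched as follows. This is a minimal illustration, not the contents of infra/phash/main.go. The step size of 0.9 × duration / 25 is derived from the stated endpoints (5% plus 24 steps gives 91.4%). The height expression -2 and the output options (BMP over a pipe) are assumptions; only -ss before -i and the width of 160 come from the text above.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	frameCount = 25   // one frame per cell of the 5×5 montage
	startFrac  = 0.05 // first sample at 5% of the duration
	spanFrac   = 0.9  // step = spanFrac/frameCount, so the last sample is at 91.4%
	frameWidth = 160  // output width in pixels
)

// sampleOffsets returns the seek positions, in seconds, for a source
// of the given duration.
func sampleOffsets(duration float64) []float64 {
	step := spanFrac * duration / frameCount
	offsets := make([]float64, frameCount)
	for i := range offsets {
		offsets[i] = startFrac*duration + float64(i)*step
	}
	return offsets
}

// frameArgs builds ffmpeg arguments for grabbing one scaled frame.
// Placing -ss before -i selects input seeking. The height of -2
// (derive from aspect ratio, rounded to an even number) and the
// output options are assumptions, not confirmed by the source.
func frameArgs(input string, offset float64) []string {
	return []string{
		"-ss", strconv.FormatFloat(offset, 'f', 3, 64),
		"-i", input,
		"-frames:v", "1",
		"-vf", fmt.Sprintf("scale=%d:-2", frameWidth),
		"-c:v", "bmp", "-f", "image2pipe", "-",
	}
}

func main() {
	offsets := sampleOffsets(100) // a 100-second source
	fmt.Printf("first=%.1fs last=%.1fs count=%d\n",
		offsets[0], offsets[len(offsets)-1], len(offsets))
	// first=5.0s last=91.4s count=25
	fmt.Println(frameArgs("/path/to/video.mkv", offsets[0]))
}
```

Because every one of these parameters feeds the final hash, check the real values in infra/phash/main.go before relying on any detail marked as an assumption here.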
The helper binary is built from infra/phash/main.go and copied into the unified Docker image as /usr/local/bin/obscura-phash. The worker shells out to it via the OBSCURA_PHASH_BIN env var (defaulting to obscura-phash on PATH).
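The lookup order described above (environment variable first, then PATH) can be sketched like this. It is written in Go for consistency with the helper; the actual worker code is TypeScript, so treat this as an illustration of the rule rather than the worker's implementation. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// helperName returns the configured helper, falling back to the
// default binary name when the environment value is empty.
func helperName(envValue string) string {
	if envValue == "" {
		return "obscura-phash"
	}
	return envValue
}

// resolveHelper returns an executable path for the pHash helper.
// exec.LookPath searches PATH for bare names and checks the file
// directly when the name contains a path separator.
func resolveHelper() (string, error) {
	return exec.LookPath(helperName(os.Getenv("OBSCURA_PHASH_BIN")))
}

func main() {
	path, err := resolveHelper()
	if err != nil {
		// Mirrors the documented behavior: warn and skip the hash.
		fmt.Fprintln(os.Stderr, "pHash helper unavailable, skipping:", err)
		return
	}
	fmt.Println("using helper at", path)
}
```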
When pHash is computed
- On scan, if library_settings.generate_phash is true (default).
- On rebuild preview, when the user clicks Rebuild preview on a video detail page.
- On backfill, via the Backfill pHashes diagnostic in Settings → Generated Storage.
If the helper binary isn't available — which can happen on a dev box without obscura-phash on PATH — the worker logs a warning and skips the hash. The video gets phash = NULL; the rest of the pipeline still works.
Contribution flow
Identify → Accept (StashBox match) → Auto-link → Submit fingerprint
When you accept a StashBox-origin match in the cascade drawer:
- The remote scene link is recorded in stash_ids (entity_type = video_episode|video_movie, stashbox_endpoint_id, stash_id).
- The auto-submit job runs: every fingerprint Obscura has for that video (MD5, OSHASH, PHASH) is submitted to the StashBox endpoint.
- Each submission attempt is logged to fingerprint_submissions with status (success/error) and any error message.
Submitting back closes the loop — your hashes contribute to the community index, helping the next person identify the same content faster.
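The auto-submit step can be pictured as a loop that tries each available fingerprint and records one log entry per attempt. The sketch below is written in Go for consistency with the helper. The types, field names, and the choice to continue after a failure are all assumptions for illustration; the real job lives in the TypeScript worker and the real columns are defined in packages/db/src/schema.ts.

```go
package main

import (
	"errors"
	"fmt"
)

// Fingerprint is a hypothetical shape for one hash of a video.
type Fingerprint struct {
	Algorithm string // "MD5", "OSHASH", or "PHASH"
	Hash      string // empty when the hash was never computed
}

// SubmissionLog loosely mirrors what the text says is logged to
// fingerprint_submissions; the real column names may differ.
type SubmissionLog struct {
	Algorithm string
	Status    string // "success" or "error"
	Error     string
}

// errUnsupported stands in for an endpoint rejecting an algorithm.
var errUnsupported = errors.New("endpoint does not accept this algorithm")

// submitAll attempts every non-empty fingerprint and records the
// outcome. Continuing past a failure is an assumption of this sketch.
func submitAll(fps []Fingerprint, submit func(Fingerprint) error) []SubmissionLog {
	logs := make([]SubmissionLog, 0, len(fps))
	for _, fp := range fps {
		if fp.Hash == "" {
			continue // e.g. phash is NULL because the helper was missing
		}
		entry := SubmissionLog{Algorithm: fp.Algorithm, Status: "success"}
		if err := submit(fp); err != nil {
			entry.Status = "error"
			entry.Error = err.Error()
		}
		logs = append(logs, entry)
	}
	return logs
}

func main() {
	fps := []Fingerprint{
		{Algorithm: "MD5", Hash: "d41d8cd98f00b204e9800998ecf8427e"},
		{Algorithm: "OSHASH", Hash: "0123456789abcdef"},
		{Algorithm: "PHASH", Hash: "8f3a2b1c4d5e6f70"},
	}
	// Stub endpoint that rejects OSHASH, to show an error entry.
	stub := func(fp Fingerprint) error {
		if fp.Algorithm == "OSHASH" {
			return errUnsupported
		}
		return nil
	}
	for _, l := range submitAll(fps, stub) {
		fmt.Printf("%s: %s %s\n", l.Algorithm, l.Status, l.Error)
	}
}
```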
Building the helper locally
If you're iterating on obscura-phash outside Docker:
cd infra/phash
go mod tidy
go build -o obscura-phash .
Put the resulting binary on PATH or point Obscura at it:
export OBSCURA_PHASH_BIN=/path/to/obscura-phash
The helper takes a video file path as its only argument and prints the hash to stdout:
./obscura-phash /path/to/video.mkv
# → 8f3a2b1c4d5e6f70
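A caller can guard against a misbuilt or mismatched helper by checking that stdout really is a lowercase 16-character hex string before storing it. The following is a minimal sketch under that assumption; parseHash and computePHash are hypothetical names, and the only facts taken from the text are the single path argument and the stdout format.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strings"
)

// hashPattern matches exactly 16 lowercase hex characters.
var hashPattern = regexp.MustCompile(`^[0-9a-f]{16}$`)

// parseHash trims the helper's stdout and validates its format.
func parseHash(stdout string) (string, error) {
	h := strings.TrimSpace(stdout)
	if !hashPattern.MatchString(h) {
		return "", fmt.Errorf("unexpected helper output %q", h)
	}
	return h, nil
}

// computePHash runs the helper with the video path as its only
// argument and returns the validated hash.
func computePHash(helper, video string) (string, error) {
	out, err := exec.Command(helper, video).Output()
	if err != nil {
		return "", fmt.Errorf("run %s: %w", helper, err)
	}
	return parseHash(string(out))
}

func main() {
	// Demonstrates validation only; computePHash needs a real binary.
	h, err := parseHash("8f3a2b1c4d5e6f70\n")
	fmt.Println(h, err) // 8f3a2b1c4d5e6f70 <nil>
}
```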
Troubleshooting
"pHash generation skipped" — the binary isn't on PATH or at OBSCURA_PHASH_BIN. The worker logs a warning per file. Fix the path; future scans will compute the hash.
Hashes don't match the community index — the most likely culprit is a different ffmpeg version with different default scaling behavior. The unified Docker image pins a specific ffmpeg version; running a different ffmpeg locally can cause the values to drift. Use the unified image when contributing back.
Slow generation — pHash for a 4K source can take 30+ seconds because the helper has to seek 25 times and scale each frame. The pHash queue is set to concurrency 1 globally to avoid thrashing the disk; you can raise concurrency in Settings → Watched Libraries → Background worker concurrency at the cost of more I/O contention.
Submission failures — check the fingerprint_submissions table for the error string. Common cases:
- Endpoint API key invalid → re-check the endpoint config in Plugins → StashBox.
- Endpoint rate-limited → submissions retry on the next identify pass.
- Endpoint doesn't accept your hash → some endpoints only accept specific algorithms.
Reading the source
If you're going deeper:
- infra/phash/main.go: the helper.
- packages/media-core/src/index.ts: computeFingerprint('video-phash', filePath) glue.
- apps/worker/src/processors/processFingerprint.ts: when phash is computed during a scan.
- packages/stash-import/src/stashbox/client.ts: submission GraphQL.
- packages/db/src/schema.ts: fingerprint_submissions and stash_ids tables.