Matching & Scoring

Engine

apps/api/src/services/matching/engines/deterministicEngine.ts

Uses NormalizationService to pre-process text values.
Computes similarities via similarity() (Damerau-Levenshtein) plus dedicated routines for authors, dates, container titles, volume/issue, and pages.
Aggregates field scores (0–100) according to the configured weights.
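
The body of similarity() is not shown here. As a minimal sketch of the idea, the following computes a Damerau-Levenshtein distance and scales it to a 0–100 similarity. It assumes the optimal string alignment variant (adjacent transpositions only) and assumes normalization by the longer string's length; the names osaDistance and similarityScore are hypothetical and the real engine may differ.

```typescript
// Hypothetical sketch: optimal string alignment (restricted Damerau-Levenshtein) distance.
function osaDistance(a: string, b: string): number {
  const rows = a.length + 1;
  const cols = b.length + 1;
  // d[i][j] = number of edits needed to turn a[0..i) into b[0..j)
  const d: number[][] = [];
  for (let i = 0; i < rows; i++) {
    const row = new Array<number>(cols).fill(0);
    row[0] = i;
    d.push(row);
  }
  for (let j = 0; j < cols; j++) d[0][j] = j;

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost // substitution (or match when cost is 0)
      );
      // Adjacent transposition, e.g. "ab" -> "ba", counts as one edit.
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

// Assumed scaling: 100 means identical, 0 means every character differs.
function similarityScore(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 100; // two empty strings are treated as identical
  return (1 - osaDistance(a, b) / longest) * 100;
}
```

Treating a transposition as a single edit keeps common typing slips (such as "Smtih" for "Smith") from being penalized twice, which is the usual reason to prefer this family of distances over plain Levenshtein.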

Normalization rules are defined in NormalizationService.normalize. Only requested rules run; others are skipped.

Authors (match-author-initials) – match family names (exact or ≥0.9 similarity) and compare initials via tokenizeGiven/isSubsequence.
Dates (match-structured-dates) – parse CSL dates (date-parts, raw, literal) and weight year (0.8), month (0.15), day (0.05).
Volume/Issue (match-volume-issue-numeric) – extract digits and compare the first occurrence.
Pages (match-page-range-overlap) – detect ranges, expand shorthand (“123-8” becomes 123–128), and compute the overlap/union ratio. A single page compared with a range scores 1.0 if the page lies within the range.
Container title (match-container-title-variants) – drop acronym-only parentheses, test variants, and use Damerau-Levenshtein to pick the best score.
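
To make one of the rules above concrete, here is a rough sketch of the page-range comparison. It is not the engine's code: parsePages and pageScore are hypothetical names, the accepted input formats are an assumption, and the result is on a 0–1 scale to match the "1.0" mentioned for the pages rule.

```typescript
interface PageRange {
  start: number;
  end: number;
}

// Parse "123", "123-128", or shorthand "123-8" (expanded to 123-128).
// Returns null for anything else (assumed behaviour).
function parsePages(raw: string): PageRange | null {
  const m = raw.trim().match(/^(\d+)(?:\s*[-–]\s*(\d+))?$/);
  if (!m) return null;
  const start = parseInt(m[1], 10);
  if (m[2] === undefined) return { start, end: start };
  let endText = m[2];
  // Shorthand: a shorter end borrows the leading digits of the start.
  if (endText.length < m[1].length) {
    endText = m[1].slice(0, m[1].length - endText.length) + endText;
  }
  const end = parseInt(endText, 10);
  return end >= start ? { start, end } : null;
}

// Overlap/union ratio of two page ranges, counted in whole pages.
function pageScore(a: string, b: string): number {
  const ra = parsePages(a);
  const rb = parsePages(b);
  if (!ra || !rb) return 0;
  const overlap = Math.max(
    0,
    Math.min(ra.end, rb.end) - Math.max(ra.start, rb.start) + 1
  );
  if (overlap === 0) return 0;
  // A single page that falls inside the other side's range is a full match.
  if (ra.start === ra.end || rb.start === rb.end) return 1;
  const union = ra.end - ra.start + 1 + (rb.end - rb.start + 1) - overlap;
  return overlap / union;
}
```

For instance, pageScore("123-8", "123-128") treats both sides as the same six-page range, while two disjoint ranges score 0.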

overall = sum(fieldScore * weight) / sum(weight)

Example (weights sum to 1): overall = 0.3*100 + 0.25*92 + 0.15*85 + 0.15*70 + 0.1*100 + 0.03*0 + 0.02*50 = 87.25 → 87
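
The weighted average above can be sketched as follows. FieldResult and overallScore are illustrative names rather than the engine's own; the sample entries reuse the field scores and weights from the worked example, in the same order.

```typescript
interface FieldResult {
  score: number; // field score on the 0-100 scale
  weight: number; // configured weight for this field
}

// overall = sum(fieldScore * weight) / sum(weight)
function overallScore(fields: FieldResult[]): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const f of fields) {
    weighted += f.score * f.weight;
    totalWeight += f.weight;
  }
  // Assumption: with no weighted fields there is nothing to score.
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

const exampleFields: FieldResult[] = [
  { score: 100, weight: 0.3 },
  { score: 92, weight: 0.25 },
  { score: 85, weight: 0.15 },
  { score: 70, weight: 0.15 },
  { score: 100, weight: 0.1 },
  { score: 0, weight: 0.03 },
  { score: 50, weight: 0.02 },
];

const overall = overallScore(exampleFields);
const displayed = Math.round(overall); // rounded for display
```

Dividing by the sum of weights means the result stays on the 0–100 scale even when the configured weights do not add up to exactly 1.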

Display thresholds are configured in settings.matching.matchingConfig.displayThresholds and applied by getScoreColor.
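
The shape of displayThresholds is not documented here. Purely as an illustration, the sketch below assumes two cut-offs (high and medium) and three colors; the interface, the function name getScoreColorSketch, and the color names are all assumptions, not the real getScoreColor.

```typescript
// Hypothetical shape; the real displayThresholds schema may differ.
interface DisplayThresholds {
  high: number; // scores at or above this are shown as a strong match
  medium: number; // scores at or above this are shown as a partial match
}

type ScoreColor = "green" | "yellow" | "red";

// Map a 0-100 score to a display color using the configured cut-offs.
function getScoreColorSketch(score: number, t: DisplayThresholds): ScoreColor {
  if (score >= t.high) return "green";
  if (score >= t.medium) return "yellow";
  return "red";
}
```

Keeping the cut-offs in settings rather than in the function lets the display bands change without touching the matching logic.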

settings.matching.matchingConfig.earlyTermination defaults to { enabled: true, threshold: 95 }.
useVerification.performVerificationWithEarlyTermination proceeds as follows:
1. Iterate enabled databases in priority order (settings.search.databases).
2. After each match, check the current score.
3. Stop querying other databases when the score ≥ threshold.
4. If early termination is disabled, complete all searches and re-match to produce the final scores.
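
The steps above can be sketched roughly as follows. This is not the hook's actual code: DatabaseSearch, searchAndMatch, and verifyWithEarlyTermination are hypothetical names, the database list is assumed to be already filtered to enabled entries and sorted by priority, and the re-matching pass from step 4 is left out.

```typescript
interface EarlyTermination {
  enabled: boolean;
  threshold: number; // 0-100, documented default is 95
}

interface DatabaseSearch {
  name: string;
  // Hypothetical: resolves to the best match score (0-100) found in this database.
  searchAndMatch: (reference: string) => Promise<number>;
}

interface VerificationOutcome {
  bestScore: number;
  queried: string[]; // names of the databases that were actually searched
}

async function verifyWithEarlyTermination(
  reference: string,
  databases: DatabaseSearch[],
  config: EarlyTermination
): Promise<VerificationOutcome> {
  let bestScore = 0;
  const queried: string[] = [];
  for (const db of databases) {
    // Search one database at a time so later ones can be skipped.
    const score = await db.searchAndMatch(reference);
    queried.push(db.name);
    if (score > bestScore) bestScore = score;
    // Stop as soon as the best score so far reaches the threshold.
    if (config.enabled && bestScore >= config.threshold) break;
  }
  return { bestScore, queried };
}
```

Querying sequentially rather than in parallel is what makes the saving possible: a high-scoring hit in a high-priority database means the lower-priority ones are never contacted.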

useVerificationProgressStore tracks phases (searching, matching, done, error) per reference.

TODO: Document custom weight presets (strict, balanced, custom) once additional modes ship.
TODO: Provide sample normalization profiles (e.g. “title + DOI only”).