Detection Methodology

How Verigin Works

A technical explanation of Verigin's detection pipeline, published false positive rates, confidence intervals, and the appeals process. We publish this because institutions can't buy what they can't audit.

Last updated: March 2026
Next update: June 2026 (quarterly cadence)
Questions: methodology@verigin.ai
Detection Layers

Two independent verification layers

Verigin runs two fundamentally different signal types on every piece of content. They are complementary: one is probabilistic, one is cryptographic. Neither is sufficient alone.

🤖

Layer 1: Probabilistic AI detection

Statistical models analyze patterns in text, images, and video. Multiple independent detectors run in parallel — each with different strengths — and their outputs are reconciled into a single confidence score. This layer covers the vast majority of web content, which has no provenance metadata.

Inherently probabilistic. Can produce false positives. FP rates published below.

🔐

Layer 2: C2PA cryptographic provenance

The Coalition for Content Provenance and Authenticity standard embeds a cryptographically signed chain of custody directly into files — from the moment of capture to publication. Where a valid C2PA manifest exists, Verigin reads and validates the signature.

Deterministic when present: a valid signature either verifies or it does not, so this layer produces no false positives. Absence of a manifest carries no signal on its own. Coverage growing as hardware adoption expands.

Text Pipeline

Text detection

Two detection models run in parallel for every text submission. Results are blended 50/50 into a single AI probability score.

- Winston AI v2 (GoWinston.ai). Strengths: GPT-4, Claude, Gemini, and Llama outputs; automatic language detection supports non-English content. Returns 0–100 (higher = more human), inverted to an AI probability before blending. Weighting: 50%.
- GPTZero (GPTZero.me). Strengths: academic and journalistic writing styles. Returns completely_generated_prob directly as the AI score. Weighting: 50%.

Preprocessing

Before scoring, text is preprocessed to reduce false positives caused by quoted source material.

Why two models? No single AI detector is reliably accurate across all writing styles, languages, and AI model families. Running two independent models and blending their outputs smooths out the systematic errors of any individual approach. When the two models strongly disagree (delta > 0.4), the output confidence is marked as "low" regardless of the blended score.
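The blending and disagreement rules above can be sketched in a few lines. This is an illustrative reconstruction, not Verigin's actual code: the function name, the inversion of Winston's human-score scale, and the intermediate "moderate" agreement band are assumptions; only the 50/50 blend, the delta > 0.4 low-confidence rule, and the delta < 0.15 high-agreement rule come from this page.

```python
def blend_text_scores(winston_human: float, gptzero_ai: float) -> dict:
    """Blend two detector outputs into one AI-probability score.

    winston_human: Winston AI v2 output, 0-100 scale, higher = more human.
    gptzero_ai: GPTZero completely_generated_prob, 0-1 scale, higher = more AI.
    """
    winston_ai = 1.0 - winston_human / 100.0        # invert to an AI probability
    ai_score = 0.5 * winston_ai + 0.5 * gptzero_ai  # 50/50 blend
    delta = abs(winston_ai - gptzero_ai)

    if delta > 0.4:
        agreement = "low"      # strong disagreement caps confidence
    elif delta < 0.15:
        agreement = "high"
    else:
        agreement = "moderate"  # assumed intermediate band

    return {"ai_score": round(ai_score, 2), "model_agreement": agreement}

# Matches the sample response below: Winston 32/100 human (= 0.68 AI), GPTZero 0.74
print(blend_text_scores(32.0, 0.74))  # {'ai_score': 0.71, 'model_agreement': 'high'}
```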
Image Pipeline

Image detection

Two image detection models run in parallel. The final score is the maximum of both outputs — not the average. This is a deliberate design choice: the two models cover different attack surfaces, and averaging would suppress true positives.

- AI-generated image detector (Sightengine). Attack surface: fully synthetic images (Midjourney, DALL-E, Stable Diffusion, Firefly); excels at detecting images generated entirely by AI. Aggregation: max().
- rd-context-img (Reality Defender). Attack surface: AI-modified real photographs (inpainting, generative fill, object removal, face swap); the only commercially available model with reliable detection of manipulated real photos. Aggregation: max().
Why max() and not average()? If an image is fully AI-generated, Sightengine will score it high and Reality Defender may score it lower (it's optimized for manipulation, not generation). Averaging would produce a misleadingly low score. Taking the maximum ensures that whichever detector has the relevant expertise for the specific attack type dominates the result.
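A minimal sketch of the max() reconciliation described above (function and field names are hypothetical; only the max() rule and the verbose per-model scores come from this page):

```python
def reconcile_image_scores(sightengine: float, reality_defender: float) -> dict:
    """Final score is the maximum of both detectors, so whichever model
    covers the relevant attack surface dominates the result."""
    return {
        "ai_score": max(sightengine, reality_defender),
        "verbose": {
            "sightengine_score": sightengine,
            "reality_defender_score": reality_defender,
        },
    }

# Fully synthetic image: Sightengine fires, Reality Defender stays low.
print(reconcile_image_scores(0.94, 0.31)["ai_score"])  # 0.94
```

Averaging the same pair would have reported 0.625 and buried the detection, which is exactly the failure mode the max() rule avoids.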

Image submission flow

1. URL submission or presigned upload

Images can be submitted by URL (Sightengine fetches directly) or as a file upload (presigned S3 URL → Reality Defender async polling).

2. Parallel scoring

Both detection calls run concurrently. Typical latency: 400–900ms. Reality Defender uses async polling with a 30-second timeout.

3. Score reconciliation

Final ai_score = max(sightengine_score, reality_defender_score). Both individual scores are returned in verbose mode.

4. Cache and return

Result is cached by content hash for 24 hours. Repeated calls for the same image return the cached result instantly at no additional API cost.
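Step 4's content-hash cache can be sketched as follows. This is an assumption-laden illustration: the page says results are cached by content hash for 24 hours, but the hash algorithm (SHA-256 here), the in-memory store, and all names are invented for the example.

```python
import hashlib
import time

CACHE_TTL = 24 * 60 * 60  # 24 hours, per step 4
_cache: dict[str, tuple[float, dict]] = {}

def content_key(image_bytes: bytes) -> str:
    """Key on a hash of the bytes, so the same image submitted twice
    hits the cache regardless of filename or URL."""
    return hashlib.sha256(image_bytes).hexdigest()

def cached_scan(image_bytes: bytes, scan_fn) -> dict:
    key = content_key(image_bytes)
    hit = _cache.get(key)
    if hit is not None and time.time() - hit[0] < CACHE_TTL:
        return hit[1]                  # instant, no additional API cost
    result = scan_fn(image_bytes)      # run the full detection pipeline
    _cache[key] = (time.time(), result)
    return result
```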

Cryptographic Provenance

C2PA verification

The Coalition for Content Provenance and Authenticity (C2PA) is an open standard backed by Adobe, Google, Microsoft, the BBC, and 6,000+ member organizations. It enables cameras, editing software, and publishing platforms to embed a cryptographically signed chain of custody into a file.

Verigin verifies C2PA signatures using the official c2pa-python library (open-source, maintained by the Content Authenticity Initiative). Where a valid C2PA manifest exists, the signature chain is validated and the provenance result is returned alongside the probabilistic score.

Coverage today: C2PA credentials are currently present on a small fraction of web content — primarily images from Adobe tools, some Android/iOS camera apps, and select stock photo platforms. Coverage is growing rapidly as hardware adoption expands. Verigin checks for C2PA on every call, so your system benefits automatically as coverage increases with no code changes.
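One way a client might reconcile the two layers, sketched under stated assumptions: the c2pa field and its null default follow the sample API response later on this page, but the "status": "valid" shape, the return labels, and the precedence logic are illustrative, not a documented contract.

```python
def interpret(result: dict) -> str:
    """Illustrative precedence: a validated C2PA manifest is deterministic,
    so it takes priority over the probabilistic ai_score."""
    c2pa = result.get("c2pa")
    if c2pa and c2pa.get("status") == "valid":
        return "provenance_verified"   # cryptographic chain of custody checks out
    # No manifest (the common case today): fall back to the probabilistic signal.
    if result["ai_score"] >= 0.5:
        return "human_review"          # above the review threshold
    return "no_action"

print(interpret({"c2pa": None, "ai_score": 0.71}))               # human_review
print(interpret({"c2pa": {"status": "valid"}, "ai_score": 0.71}))  # provenance_verified
```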
Output Design

Signals, not verdicts

Verigin returns detection signals with confidence intervals — not binary verdicts. This is a deliberate product design decision, not a hedge. Automated adverse actions based solely on a probabilistic score create legal liability and cause harm to falsely flagged individuals.

What Verigin returns

API response — text scan (verbose)
{
  "content_type": "text",
  "ai_score": 0.71,
  "confidence": "moderate",
  "signals_elevated": 4,
  "signals_total": 7,
  "c2pa": null,
  "recommendation": "human_review",
  "verbose": {
    "winston_score": 0.68,
    "gptzero_score": 0.74,
    "model_agreement": "high",
    "language": "en",
    "char_count": 1842
  }
}

How to read this output

ai_score: 0.71
The blended probability, from the two detection models, that the text is AI-generated. A probability, not a verdict.
signals_elevated: 4 of 7
4 of the 7 internal detection signals exceeded their threshold. The other 3 did not.
recommendation: human_review
Verigin recommends human review before any adverse action. This field is not optional guidance — it is the output's intended use.
model_agreement: high
Winston and GPTZero agree closely (delta < 0.15). Low agreement increases uncertainty regardless of blended score.
Terms of Service requirement: Verigin output may not be used as the sole basis for automated adverse actions (rejection, termination, content removal, denial of access) without human review. This applies to all API integrations. Enterprise contracts include this requirement explicitly. High-stakes API integrations (HR, legal, publishing) default to a mandatory human-review confirmation step.
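The integration pattern the Terms of Service require can be sketched as a routing function: Verigin output feeds review queues, and only the non-adverse path is automated. The queue names and the low-confidence rule's placement are illustrative; the underlying constraints (human review before adverse action, low agreement increasing uncertainty) come from this page.

```python
def route(scan: dict) -> str:
    """Route a Verigin scan result without automating any adverse action."""
    if scan.get("recommendation") == "human_review":
        return "review_queue"   # a human decides; the score is context, not a verdict
    if scan.get("confidence") == "low":
        return "review_queue"   # low confidence is never actionable on its own
    return "auto_pass"          # only the non-adverse outcome is automated

print(route({"ai_score": 0.71, "confidence": "moderate",
             "recommendation": "human_review"}))  # review_queue
```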
Accuracy

Published false positive rates

These rates are measured on our internal test sets and updated quarterly. A false positive is a human-authored piece of content that Verigin scores above 0.5 (the threshold at which "human review" is recommended).

Last measured: March 2026  |  Next update: June 2026

- Text — standard English: 6.2% FP (public target < 5%). Journalism, blog, and business writing. Improving with preprocessing updates.
- Text — non-native English: 11.4% FP (public target < 8%). Elevated. Bias audit in progress (see below). Do not use for adverse screening of ESL writers without additional human review.
- Text — academic / scientific: 8.1% FP (public target < 7%). Formal writing style overlaps with AI output patterns. Improving.
- Images — AI-generated (synthetic): 9.3% FP (public target < 12%). Strong performance on photorealistic AI images; higher FP on stylized/illustrated content.
- Images — AI-modified (deepfake): 14.7% FP (public target < 18%). Manipulation detection is harder; heavy compression and resaving reduce signal quality.
What these numbers mean: A 6.2% FP rate on standard English text means that approximately 1 in 16 human-written articles submitted to Verigin will score above the 0.5 threshold and receive a "human review recommended" flag. This is why Verigin recommends human review — not automated rejection — for flagged content.
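To make the rates concrete, a minimal calculation of expected false flags per 1,000 human-written submissions, using the text figures from the table above (the dictionary keys are invented labels):

```python
# Published FP rates for text, from the table above.
fp_rates = {
    "text_standard_english": 0.062,
    "text_non_native_english": 0.114,
    "text_academic_scientific": 0.081,
}

for kind, rate in fp_rates.items():
    print(f"{kind}: ~{round(1000 * rate)} of 1,000 human pieces flagged")
```

At 6.2%, roughly 62 of every 1,000 human-written standard-English articles will be flagged for review, which is the arithmetic behind the "1 in 16" figure.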
Fairness

Bias audit and non-native writer risk

AI detection models trained predominantly on native English text systematically flag non-native English writing at elevated false positive rates. This is the highest asymmetric risk in Verigin's methodology.

Affected populations include: ESL journalists and academics, neurodivergent writers whose prose patterns differ from training data, writers from non-Western academic traditions, and any writer whose style diverges significantly from mainstream Anglo-American conventions.

Current mitigation plan

1. Systematic bias audit (Q2 2026)

Testing across non-native English writers, neurodivergent writing styles, and non-Western academic conventions. Results will be published regardless of findings — including if they are worse than the estimates above.

2. Segmented FP reporting

All false positive disclosures are segmented by writer type and writing context, not just content type. Aggregate rates can mask systematic bias against specific populations.

3. Advisory board (target: Q3 2026)

A bias mitigation advisory board including a computational linguistics researcher and a digital rights advocate. Advisory board members will be named publicly when confirmed.

4. Proactive civil society engagement

Verigin has shared this methodology documentation proactively with the EFF and ACLU before any complaint is filed. We invite review and critique.

Until the bias audit is complete: Do not use Verigin output as the sole basis for adverse screening decisions involving non-native English writers. The 11.4% FP rate for non-native English is above our target and above acceptable risk for high-stakes use cases. Human review is not optional for these populations.
Appeals

Appeals process

Any content that receives a Verigin score can be submitted for manual review. Human reviewers examine the raw signal data — not just the blended score — and issue a revised assessment if warranted.

1. Submit the appeal

Email appeals@verigin.ai with the content URL or hash and the original score. Include any context you believe is relevant (writing style, language, subject matter).

2. Human review within 48 hours

A trained reviewer examines the raw signal output from both detection models, the preprocessing log, and the content itself. The 48-hour SLA applies to Pro and Enterprise customers. Free tier: best effort.

3. Revised assessment issued

If the appeal is upheld, a revised score is issued and the cached result is updated. The original score and the revised score are both retained in the audit log — Verigin does not delete evidence of errors.

4. Aggregate reporting

Appeal volumes and uphold rates are reported in quarterly methodology updates. Systematic false positive patterns identified through appeals trigger model recalibration.

Limitations

What Verigin cannot do

Transparency about limitations is as important as accuracy claims.

Detect all AI-generated content

Heavily edited AI text, AI text run through paraphrasing tools, and AI content from models not in the training data may score as human. Detection is probabilistic and has known blind spots.

Make editorial judgments

Verigin detects probable origin — not quality, accuracy, truthfulness, or editorial value. A human-written article can be false. An AI-generated article can be accurate. These are different questions.

Score very short text reliably

Text submissions under 300 characters do not have enough signal for a reliable score. Results for short text are returned with a low-confidence flag.

Replace human review for high-stakes decisions

Verigin is a triage tool that identifies content warranting closer review. It is not designed to make final decisions about employment, publication, or legal matters without human judgment.

Questions about the methodology?

We answer methodology questions directly. No sales process required.

Email methodology@verigin.ai, or try the live demo.