new

Release 2026.06 - Bringing Data Observability Into Your Code

new

Release 2026.06 - Bringing Data Observability Into Your Code

new

  • Release 2026.06 - Bringing Data Observability Into Your Code

Data Quality Metrics: A Complete Guide for 2026

|

0

min. Lesezeit

Why Your Data Quality Project Keeps Failing and the 3 Structural Fixes That Actually Work

A dashboard looked fine at 8:00 a.m. By 9:15, finance was questioning the revenue number, operations was chasing a shipment issue, and the data team was trying to work out whether the problem started in ingestion, transformation, or a source system that changed shape overnight.

That's where organizations typically are when they start taking data quality seriously. Not at the moment of theory, but at the moment of failure. A missing address field blocks fulfillment. A duplicate customer record inflates a KPI. A stale table feeds a model that was accurate yesterday and unreliable today.

Data quality metrics are how you stop arguing from instinct and start operating from evidence. They turn “something feels off” into measurable signals, repeatable checks, and incident workflows that people can trust.

Table of Contents

Why Data Quality Metrics Are Non-Negotiable in 2026

Teams usually discover the need for data quality metrics after an avoidable incident. A board report is wrong. A machine learning feature table is stale. A schema change lands without notice and downstream logic keeps running as if nothing happened.

The problem isn't only bad data. The problem is bad data without instrumentation.

According to Monte Carlo's data quality statistics roundup, Gartner predicts that 30% of organizations will fail to achieve successful data-driven initiatives due to poor data quality, while 41% of organizations struggle with managing over 1,000 data sources. At that scale, manual checking stops being a discipline and becomes wishful thinking.

What metrics change in practice

A mature team doesn't ask, “Is this dataset good?” It asks narrower, operational questions:

  • Is the data fresh enough for the decision it supports?

  • Did required fields arrive for the current load?

  • Do records conform to business and format rules?

  • Did a source change shape without a coordinated release?

  • Is the issue local or systemic across domains and pipelines?

Those questions are what turn data quality from governance language into engineering work.

Practical rule: If you can't attach a metric to a failure mode, you don't yet have a monitoring strategy. You have a policy.

This is also why data quality work starts to overlap with operational telemetry. Teams that are already mastering centralized log management usually adapt faster, because they already understand baselines, alert noise, event correlation, and incident ownership. Data metrics need the same operating model.

A good system doesn't chase perfection. It measures what matters, at the right layer, with thresholds tied to business consequences.

The Six Core Data Quality Dimensions Explained

The standard vocabulary still matters. Accuracy, completeness, consistency, timeliness, validity, and uniqueness are the six dimensions most commonly used in practice and align with the ISO/IEC 25012:2008 framing summarized in this survey.

A diagram illustrating the six core data quality dimensions: accuracy, consistency, uniqueness, completeness, timeliness, and validity.

If you want a complementary reference on implementation detail, this guide on data quality dimensions and how to measure them at scale is useful. The important part, though, is to map each dimension to a business question.

Accuracy

Accuracy asks whether data reflects the actual thing it claims to describe.

A customer address can be complete, valid in format, and still wrong. A product price can be present in every downstream table and still not match the approved source. Accuracy failures are expensive because they often look structurally healthy.

Business question: Can we trust this value to represent reality?

Completeness

Completeness measures whether required data is present.

This is the dimension teams hit first because nulls are easy to count and easy to explain. If an order doesn't have a shipping address, the warehouse can't ship it. If a patient record is missing a required attribute, care workflows and audit trails break down quickly.

Business question: Do we have all the fields needed to run the process?

Consistency

Consistency checks whether the same fact stays aligned across systems and transformations.

Many enterprise problems hide in situations like this. A billing platform says a customer is active. The CRM says inactive. The warehouse joins both and exposes whichever value arrived last. None of those records are technically null or invalid, but the business can't act confidently on contradictory states.

Business question: Does the same entity mean the same thing everywhere?

A lot of “analytics confusion” is really a consistency problem wearing a semantic disguise.

Timeliness

Timeliness is about whether data arrives when it's needed for its use.

A daily revenue table that lands at noon may be fine for monthly planning and unacceptable for an 8:30 operations standup. Timeliness is always contextual. Late data becomes bad data when it misses the decision window.

Business question: Was the data available within the time window the process requires?

Validity

Validity checks whether values conform to rules, domains, and formats.

A date column might contain text. A status column might include values outside the approved set. A record may satisfy schema typing and still violate business logic, such as an end date earlier than a start date.

Business question: Does this record comply with the rules we agreed to?

Uniqueness

Uniqueness asks whether records appear once where they should appear once.

Duplicate customers, invoices, or transactions create visible pain fast. Counts inflate, joins multiply rows, and teams end up debating whether the issue is in the source or the model. Uniqueness checks should exist both at technical key level and business entity level, because not all duplicates share the same identifier.

Business question: Are we counting one thing once, or many times?

Here's the operational shortcut I use:

Dimension

Main failure mode

Typical business symptom

Accuracy

Wrong value

Bad decision or incorrect action

Completeness

Missing value

Broken workflow or audit gap

Consistency

Conflicting value

KPI disputes across teams

Timeliness

Late value

Stale dashboards and delayed action

Validity

Rule-breaking value

Process failure or rejected record

Uniqueness

Duplicate value

Inflated counts and noisy joins

How to Calculate Core Data Quality Metrics

Definitions only help if you can turn them into repeatable checks. The most reliable starting point is to calculate a small set of metrics in the warehouse, store the results in a metrics table, and trend them over time.

Start with a metric contract

Before writing SQL, define four things for each metric:

  1. Dataset scope. Which table, partition, or business domain are you measuring?

  2. Logic. What exactly counts as a pass or fail?

  3. Owner. Who triages the alert when the metric moves?

  4. Business impact. What breaks if this metric falls outside tolerance?

Without that contract, teams end up arguing after the alert, not before it.

A simple metrics table often looks like this:

column_name

purpose

metric_date

when the check ran

dataset_name

table or model measured

metric_name

completeness, duplicate_rate, freshness_lag

metric_value

numeric result

threshold_status

pass, warn, fail

dimension

completeness, validity, timeliness, etc.

SQL formulas that teams actually use

Completeness is the cleanest place to start. In the source material from Alation, completeness is defined as the percentage of non-null values in required fields against total row counts, and benchmark data in healthcare and finance shows that datasets with less than 97% completeness generate 40% more regulatory compliance findings in those sectors according to Alation's write-up on data quality metrics.

A basic formula is:

Completeness = (non-null required values / total rows) * 100

Example:

select
  current_date as metric_date,
  'orders' as dataset_name,
  'shipping_address_completeness' as metric_name,
  100.0 * sum(case when shipping_address is not null then 1 else 0 end) / count(*) as metric_value
from analytics.orders;
select
  current_date as metric_date,
  'orders' as dataset_name,
  'shipping_address_completeness' as metric_name,
  100.0 * sum(case when shipping_address is not null then 1 else 0 end) / count(*) as metric_value
from analytics.orders;
select
  current_date as metric_date,
  'orders' as dataset_name,
  'shipping_address_completeness' as metric_name,
  100.0 * sum(case when shipping_address is not null then 1 else 0 end) / count(*) as metric_value
from analytics.orders;

For multiple required columns, don't average blindly. Calculate each field separately and also calculate a record-level completeness score if the workflow depends on all fields being present.

select
  100.0 * sum(
    case
      when customer_id is not null
       and order_date is not null
       and shipping_address is not null
      then 1 else 0
    end
  ) / count(*) as record_completeness
from analytics.orders;
select
  100.0 * sum(
    case
      when customer_id is not null
       and order_date is not null
       and shipping_address is not null
      then 1 else 0
    end
  ) / count(*) as record_completeness
from analytics.orders;
select
  100.0 * sum(
    case
      when customer_id is not null
       and order_date is not null
       and shipping_address is not null
      then 1 else 0
    end
  ) / count(*) as record_completeness
from analytics.orders;

Uniqueness is usually measured as the share of rows that don't violate an expected key.

Uniqueness = 1 - (duplicate rows / total rows)

with keyed as (
  select
    order_id,
    count(*) as row_count
  from analytics.orders
  group by order_id
)
select
  100.0 * sum(case when row_count = 1 then 1 else 0 end) / count(*) as unique_key_rate
from keyed;
with keyed as (
  select
    order_id,
    count(*) as row_count
  from analytics.orders
  group by order_id
)
select
  100.0 * sum(case when row_count = 1 then 1 else 0 end) / count(*) as unique_key_rate
from keyed;
with keyed as (
  select
    order_id,
    count(*) as row_count
  from analytics.orders
  group by order_id
)
select
  100.0 * sum(case when row_count = 1 then 1 else 0 end) / count(*) as unique_key_rate
from keyed;

If the business entity can duplicate under different technical IDs, add fuzzy or rule-based matching later. Start with strict duplicates first.

Validity needs explicit rules. Don't rely on schema types alone.

select
  100.0 * sum(
    case
      when order_status in ('pending','paid','shipped','cancelled')
       and order_total >= 0
       and order_date <= current_date
      then 1 else 0
    end
  ) / count(*) as validity_rate
from analytics.orders;
select
  100.0 * sum(
    case
      when order_status in ('pending','paid','shipped','cancelled')
       and order_total >= 0
       and order_date <= current_date
      then 1 else 0
    end
  ) / count(*) as validity_rate
from analytics.orders;
select
  100.0 * sum(
    case
      when order_status in ('pending','paid','shipped','cancelled')
       and order_total >= 0
       and order_date <= current_date
      then 1 else 0
    end
  ) / count(*) as validity_rate
from analytics.orders;

For healthcare or research models, teams often need domain-specific rule libraries. In that setting, OMOPHub for OMOP data validation is a relevant reference because it focuses on structured validation workflows instead of generic quality advice.

Timeliness is often better expressed as lag than as a percentage.

select
  max(load_timestamp) as latest_load_timestamp,
  current_timestamp - max(load_timestamp) as freshness_lag
from analytics.orders;
select
  max(load_timestamp) as latest_load_timestamp,
  current_timestamp - max(load_timestamp) as freshness_lag
from analytics.orders;
select
  max(load_timestamp) as latest_load_timestamp,
  current_timestamp - max(load_timestamp) as freshness_lag
from analytics.orders;

If you need a validity rule design pattern beyond SQL snippets, this primer on what data validation means in operational systems is a good companion.

Thresholds belong to use cases, not tables

The common mistake is setting one threshold per dimension and applying it everywhere. That fails quickly.

A customer identity table deserves stricter completeness rules than a historical clickstream archive. A risk model feature set may tolerate some late-arriving enrichment data but not schema changes. A finance extract may accept small row-count variance but not a single invalid legal entity code.

Measure the same dimension differently when the cost of failure is different. That isn't inconsistency. It's responsible design.

The warehouse should calculate the metric. The business process should determine the threshold.

From Raw Metrics to Actionable Insights

A metric by itself is just a number in a table. Teams need interpretation logic around it. That means thresholds, anomaly detection, routing, and enough context to decide whether the issue is cosmetic or operational.

A six-step infographic illustrating the data observability process from raw collection to business improvement and optimization.

Thresholds that people will trust

Static thresholds work well when the rule is clear and absolute. Required fields, accepted enumerations, and key uniqueness checks usually fit this model.

Dynamic thresholds help when the metric is naturally variable. Row counts change with seasonality. Freshness patterns vary by schedule. Distribution shifts may be normal on some days and suspicious on others.

A practical split looks like this:

  • Use static thresholds for business rules and compliance-sensitive fields.

  • Use learned baselines for volume, freshness, and behavioral anomalies.

  • Use trend checks when deterioration matters more than one bad run.

Alerting that doesn't train teams to ignore alerts

Alerting fails when every deviation pages someone.

Good data quality alerts include context the first time they fire. The owner should see the dataset, failing metric, recent trend, suspected upstream change, and which dashboards, models, or operational processes depend on that asset. If all the alert says is “completeness dropped,” the responder still has to do basic triage manually.

I've seen a simple rule work well in mature teams:

Alert type

When to use it

Expected response

Hard fail

Rule violation with immediate business impact

Open incident and stop downstream use if needed

Warning

Degradation without clear decision risk yet

Investigate during business hours

Trend watch

Slow deterioration

Add to backlog and review with owner

The fastest way to lose trust in a monitoring system is to send technical alerts without operational context.

Sampling and baselining in messy environments

Not every table deserves the same depth of monitoring. Some are critical, some are intermediate, and some are disposable.

For broad coverage, start with cheap signals. Row counts, freshness, null patterns, schema changes, and duplicate checks surface a large share of real failures. Add record-level validation where business logic is strict. Sample when cost matters, but sample intentionally. For example, validate full loads on gold datasets and profile representative slices on lower-tier models.

AI-based baselining helps most when the environment is too noisy for hand-tuned limits. That's especially useful in organizations with many sources and uneven update patterns. The point isn't to remove human judgment. It's to reserve human attention for deviations that look materially different from normal system behavior.

Operationalizing Data Quality A KPI Dashboard Example

A good dashboard isn't a gallery of charts. It's a working surface for incident response, trend review, and ownership.

Start the page with a health summary that answers three questions immediately: what's failing now, what's getting worse, and what needs an owner today.

Screenshot from https://digna.ai

What the dashboard shows at a glance

The first row usually carries the operational view:

  • Open incidents by severity so on-call engineers know where to start

  • Freshness status by critical dataset to surface late or missing loads

  • Schema change feed for added, removed, or retyped columns

  • Top deteriorating metrics over a recent review window

Then add the analyst view. Trend lines for completeness, duplicate rate, validity failures, and freshness lag help teams distinguish one-off noise from sustained decline. Leaderboards are useful too. A “most unstable tables” view often triggers better prioritization than a long issue log.

One metric deserves special treatment: data downtime. According to Monte Carlo's discussion of data quality metrics, organizations tracking data downtime find that 68% of incidents stem from silent schema drift or freshness violations, and those incidents can cause a 15 to 25% reduction in downstream model accuracy within 48 hours. That's why the dashboard should show not just whether a check failed, but how long the affected dataset has been untrustworthy.

What each team does with it

Data engineers need a technical slice. They care about failed checks, incident duration, lineage, and likely upstream causes.

Analytics engineers and BI developers need a semantic slice. They want to know whether a trusted model still meets business rule expectations and whether dashboards should be annotated, paused, or rebuilt.

Governance and business owners need a risk slice. They want to know which domains repeatedly fail, which controls are weak, and whether issues are resolved within agreed timelines.

A short product walkthrough helps make that operating model concrete:

The best KPI dashboards don't stop at red and green. They show trend, blast radius, and owner. That's what turns monitoring into a system people use in practice during pressure, not just during demos.

Advanced Challenges Unanswered by Standard Metrics

Most guides stop at the six dimensions. In production, that's where the harder questions begin.

A diagram illustrating complex data quality challenges including data drift, contextual relevance, and evolving business requirements.

Micro-errors that hide inside good averages

Aggregate metrics can look healthy while a tiny defect causes disproportionate damage.

A few bad rows in a customer table may route high-value orders incorrectly. A small set of malformed records may break one downstream feature used by a model. Overall completeness, validity, and uniqueness can still appear acceptable because the aggregate score dilutes the impact.

That problem shows up in practitioner discussions too. A data engineering thread on micro-impact errors describes how difficult it is to quantify the impact of flaws that affect only a few rows but still trigger major downstream failures.

The fix isn't another average. It's segmentation and impact-aware checks.

Use patterns like these:

  • Critical-field monitoring for columns whose failure blocks a workflow, even if only a handful of rows are involved

  • Slice-based metrics by region, channel, product line, or customer tier so a local failure doesn't disappear inside a global pass rate

  • Path-sensitive validation that tests records most likely to hit regulated, financial, or model-critical paths

  • Business symptom metrics such as rejected transactions, unjoinable records, or records blocked from fulfillment

A metric should reflect the cost of being wrong, not just the count of wrong rows.

Why a single score is harder than it sounds

Leaders often ask for one number. It's a reasonable request. They want trend, comparison, and prioritization without reading ten charts.

The trouble is weighting.

A late marketing table and a duplicate payment row shouldn't contribute equally to a single score. A completeness issue in a reference archive doesn't carry the same risk as a validity issue in a customer identity domain. Static weights also age badly. Business processes change, model inputs evolve, and what used to be a warning can become a hard stop.

So a useful Data Quality Score has to be contextual. It needs domain-aware weights, business impact multipliers, and periodic recalibration. It should also separate descriptive health from predictive risk. Otherwise, you get a polished number that looks executive-friendly and tells operators very little.

A practical model is to score at three layers:

Layer

What it answers

Metric score

Did this specific check pass, degrade, or fail

Dataset score

Is this asset safe for its intended use

Domain risk score

Which business area carries the greatest operational exposure

That still won't solve every case. But it avoids the worst mistake, which is pretending one universal score can represent every use case equally well.

How Data Observability Platforms Automate Quality Metrics

Manual SQL checks are a strong starting point. They aren't enough once you're dealing with many sources, evolving schemas, and teams that need continuous monitoring instead of weekly review.

That's where data observability platforms earn their place. They automate metric collection, baseline normal behavior, detect anomalies, track freshness, and route issues with enough context for fast triage. They also reduce the maintenance burden that comes from hand-built rule sets scattered across Airflow jobs, dbt tests, warehouse procedures, and BI-layer patches.

The strongest platforms combine multiple modes of control:

  • Rule-based validation for essential business logic

  • Anomaly detection for drift that no one explicitly modeled

  • Timeliness monitoring for late or missing arrivals

  • Schema tracking for structural changes that unexpectedly break downstream assumptions

  • Historical analytics for identifying slow deterioration

If you're evaluating that category, a useful starting point is this explainer on why data observability is crucial for modern data management. One example in the space is digna, which combines in-database metric computation, anomaly detection, timeliness monitoring, record-level validation, and schema tracking in customer-controlled environments.

The practical goal isn't more dashboards. It's fewer surprises, faster root cause analysis, and clearer ownership when data stops being trustworthy.

If your team is trying to move from ad hoc checks to a monitored, operational data quality practice, digna is worth evaluating. It's built for teams that need anomaly detection, validation, timeliness monitoring, and schema tracking without moving production data outside their own environment.

Teilen auf X
Teilen auf X
Auf Facebook teilen
Auf Facebook teilen
Auf LinkedIn teilen
Auf LinkedIn teilen

Lerne das Team hinter der Plattform kennen

Ein in Wien ansässiges Team von KI-, Daten- und Softwareexperten, unterstützt

von akademischer Strenge und Unternehmensexpertise.

Lerne das Team hinter der Plattform kennen

Ein in Wien ansässiges Team von KI-, Daten- und Softwareexperten, unterstützt
von akademischer Strenge und Unternehmensexpertise.

Produkt

Integrationen

Ressourcen

Unternehmen