Data Validation & Quality Assurance Guide
Overview
The Earthquake Catalogue Platform implements a comprehensive data validation and quality assurance system to ensure the integrity, accuracy, and reliability of earthquake data. This guide explains all validation rules, quality metrics, and best practices.
Table of Contents
[Input Validation](#input-validation)
[Data Quality Assessment](#data-quality-assessment)
[Cross-Field Validation](#cross-field-validation)
[Quality Metrics](#quality-metrics)
[Completeness Metrics](#completeness-metrics)
[Anomaly Detection](#anomaly-detection)
[Best Practices](#best-practices)
Input Validation
Required Fields
All earthquake events must include the following required fields:
| Field | Type | Range | Description |
|---|---|---|---|
| time | DateTime | 1000-01-01 to present | Event origin time (supports historical events back to year 1000 CE) |
| latitude | Number | -90 to 90 | Latitude in decimal degrees |
| longitude | Number | -180 to 180 | Longitude in decimal degrees |
| magnitude | Number | -2 to 10 | Event magnitude |
Optional Fields with Validation
| Field | Type | Range | Description |
|---|---|---|---|
| | Number | 0 to 1000 km | Depth below surface |
| | String | Max 10 chars | Magnitude scale (ML, Mw, mb, etc.) |
| | String | Max 255 chars | Geographic region name |
| | String | Max 100 chars | Data source identifier |
| | Number | 0 to 10° | Latitude uncertainty |
| | Number | 0 to 10° | Longitude uncertainty |
| | Number | 0 to 100 km | Depth uncertainty |
| | Number | 0 to 60 s | Time uncertainty |
| | Number | 0 to 5 | Magnitude uncertainty |
| | Number | 0 to 360° | Largest azimuthal gap |
| | Integer | 0 to 1000 | Number of phases used |
| | Integer | 0 to 500 | Number of stations used |
| | Number | 0 to 100 s | RMS residual |
| | Integer | 0 to 500 | Stations used for magnitude |
Validation Rules
Time Validation
Must be a valid ISO 8601 datetime or parseable date string
Cannot be in the future
Must be on or after 1000-01-01 (year 1000 CE, the minimum supported date for historical seismology)
Informational note for pre-1900 events (pre-instrumental era)
Historical Events Support: The system supports historical earthquake catalogues dating back to year 1000 CE. Events before 1900 are flagged with an informational note (not an error or warning) to indicate they are from the pre-instrumental era and may have higher location/magnitude uncertainties. This enables importing historical seismology catalogues that document earthquakes from written records, archaeological evidence, and other historical sources.
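The time rules above can be sketched as a single check. This is a minimal illustration, not the platform's implementation: `checkEventTime`, `TimeIssue`, and the two constants are hypothetical names, the first two messages are taken from the error table in this guide, and the remaining messages are invented for the example.

```typescript
type TimeSeverity = "error" | "info";

interface TimeIssue {
  severity: TimeSeverity;
  message: string;
}

// Earliest supported origin time (1000-01-01 UTC), per the rules above.
const MIN_SUPPORTED_TIME_MS = Date.UTC(1000, 0, 1);
// Events before 1900-01-01 UTC are treated as pre-instrumental.
const INSTRUMENTAL_ERA_START_MS = Date.UTC(1900, 0, 1);

function checkEventTime(value: string, now: Date = new Date()): TimeIssue[] {
  const parsedMs = Date.parse(value);
  if (isNaN(parsedMs)) {
    return [{ severity: "error", message: "Invalid timestamp format" }];
  }
  if (parsedMs > now.getTime()) {
    return [{ severity: "error", message: "Event time is in the future" }];
  }
  if (parsedMs < MIN_SUPPORTED_TIME_MS) {
    // Hypothetical message text; the guide does not list one for this rule.
    return [{ severity: "error", message: "Event time is before year 1000 CE" }];
  }
  if (parsedMs < INSTRUMENTAL_ERA_START_MS) {
    // Informational only: not an error and not a warning.
    return [{ severity: "info", message: "Pre-instrumental era event (before 1900)" }];
  }
  return [];
}
```

Passing `now` explicitly keeps the future-date check deterministic in tests. Note that `Date.parse` is only guaranteed to handle ISO 8601 strings consistently; other date formats may parse differently across runtimes.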
Location Validation
Latitude must be between -90° and 90°
Longitude must be between -180° and 180°
Warning if coordinates are (0, 0) - “Null Island”
Warning if coordinates are very close to (0, 0)
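As an illustration of the location rules, the sketch below checks the coordinate ranges first and only then looks for Null Island. The guide does not say how close to (0, 0) counts as "very close", so `NULL_ISLAND_TOLERANCE_DEG` is an assumed value; `checkCoordinates`, `LocationIssue`, and the message strings are likewise hypothetical.

```typescript
interface LocationIssue {
  severity: "error" | "warning";
  message: string;
}

// Assumed tolerance for "very close to (0, 0)"; the guide gives no figure.
const NULL_ISLAND_TOLERANCE_DEG = 0.01;

function checkCoordinates(lat: number, lon: number): LocationIssue[] {
  const issues: LocationIssue[] = [];
  if (!isFinite(lat) || lat < -90 || lat > 90) {
    issues.push({ severity: "error", message: "Latitude must be between -90 and 90" });
  }
  if (!isFinite(lon) || lon < -180 || lon > 180) {
    issues.push({ severity: "error", message: "Longitude must be between -180 and 180" });
  }
  if (issues.length > 0) {
    return issues; // Out-of-range values make the Null Island check meaningless.
  }
  if (lat === 0 && lon === 0) {
    issues.push({ severity: "warning", message: "Coordinates are (0, 0) (Null Island)" });
  } else if (Math.abs(lat) < NULL_ISLAND_TOLERANCE_DEG && Math.abs(lon) < NULL_ISLAND_TOLERANCE_DEG) {
    issues.push({ severity: "warning", message: "Coordinates are very close to (0, 0)" });
  }
  return issues;
}
```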
Magnitude Validation
Must be between -2 and 10
Warning if magnitude > 9 (extremely rare)
Warning if magnitude < -1 (unusual)
Depth Validation
Must be >= 0 km (cannot be negative)
Must be <= 1000 km (upper validation bound; the deepest recorded earthquakes occur at roughly 700 km)
Warning if depth > 700 km (very deep, rare)
Info if depth = 0 (may indicate missing data)
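The magnitude and depth rules follow the same pattern: a hard range that produces an error, then softer thresholds that produce a warning or an informational note. A minimal sketch, with `checkMagnitude`, `checkDepth`, `RangeIssue`, and the message strings as hypothetical names:

```typescript
type RangeSeverity = "error" | "warning" | "info";

interface RangeIssue {
  field: "magnitude" | "depth";
  severity: RangeSeverity;
  message: string;
}

function checkMagnitude(magnitude: number): RangeIssue[] {
  if (!isFinite(magnitude) || magnitude < -2 || magnitude > 10) {
    return [{ field: "magnitude", severity: "error", message: "Magnitude must be between -2 and 10" }];
  }
  if (magnitude > 9) {
    return [{ field: "magnitude", severity: "warning", message: "Magnitude > 9 is extremely rare" }];
  }
  if (magnitude < -1) {
    return [{ field: "magnitude", severity: "warning", message: "Magnitude < -1 is unusual" }];
  }
  return [];
}

function checkDepth(depthKm: number): RangeIssue[] {
  if (!isFinite(depthKm) || depthKm < 0 || depthKm > 1000) {
    return [{ field: "depth", severity: "error", message: "Depth must be between 0 and 1000 km" }];
  }
  if (depthKm > 700) {
    return [{ field: "depth", severity: "warning", message: "Depth > 700 km is very deep and rare" }];
  }
  if (depthKm === 0) {
    return [{ field: "depth", severity: "info", message: "Depth of 0 may indicate missing data" }];
  }
  return [];
}
```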
Data Quality Assessment
The system calculates three primary quality scores:
1. Completeness Score (0-100%)
Measures the presence of required and optional fields:
100%: All required fields present, most optional fields populated
90-99%: All required fields, some optional fields
70-89%: All required fields, few optional fields
50-69%: Some required fields missing
<50%: Many required fields missing
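Formula: Completeness = (Required × 0.7) + (Optional × 0.3), where Required and Optional are the percentages of required and optional fields present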
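A worked example of the weighting may help. The sketch below assumes that each term is the percentage of fields present in its group; the guide does not spell out how coverage is counted, so `coveragePercent` and `completenessScore` are hypothetical helpers.

```typescript
// Percentage of fields present in a group.
function coveragePercent(present: number, total: number): number {
  // Assumption of this sketch: an empty group counts as fully covered.
  return total === 0 ? 100 : (present / total) * 100;
}

// Required coverage weighted 0.7, optional coverage weighted 0.3.
function completenessScore(
  requiredPresent: number,
  requiredTotal: number,
  optionalPresent: number,
  optionalTotal: number
): number {
  return (
    coveragePercent(requiredPresent, requiredTotal) * 0.7 +
    coveragePercent(optionalPresent, optionalTotal) * 0.3
  );
}
```

With all 4 required fields and 7 of 14 optional fields present, the score is 100 × 0.7 + 50 × 0.3 = 85, which falls in the 70-89% band. With one required field missing and no optional fields, it is 75 × 0.7 = 52.5, which falls in the 50-69% band.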
2. Consistency Score (0-100%)
Checks for internal consistency and logical relationships:
Duplicate timestamps
Suspicious magnitude-depth relationships
Inconsistent quality metrics
Geographic bounds validity
Deductions:
- -10 points: Duplicate timestamps found
- -5 points: Suspicious shallow large-magnitude events
- -5 points: Other consistency issues
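One way to read the deductions is as fixed penalties subtracted from a starting score of 100. The sketch below assumes exactly that, and assumes each penalty applies at most once per catalogue; neither detail is stated in this guide. `ConsistencyFindings` and `consistencyScore` are hypothetical names.

```typescript
interface ConsistencyFindings {
  duplicateTimestamps: boolean;
  suspiciousShallowLargeEvents: boolean;
  otherIssues: boolean;
}

function consistencyScore(findings: ConsistencyFindings): number {
  let score = 100; // Assumed starting point.
  if (findings.duplicateTimestamps) score -= 10;
  if (findings.suspiciousShallowLargeEvents) score -= 5;
  if (findings.otherIssues) score -= 5;
  return Math.max(0, score);
}
```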
3. Accuracy Score (0-100%)
Based on uncertainty values and quality metrics:
High accuracy: Small uncertainties, good station coverage
Medium accuracy: Moderate uncertainties
Low accuracy: Large uncertainties, poor station coverage
Deductions:
- -30 points: >50% of events have high location uncertainty (>10km)
- -20 points: >50% of events have high depth uncertainty
- -10 points: Poor quality metrics overall
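The accuracy deductions depend on the share of events above a threshold. The sketch below makes several assumptions that this guide does not confirm: the score starts at 100, shares are computed only over events that report the value, "high depth uncertainty" means more than 20 km (borrowed from the "poor" threshold under Location Quality), and "poor quality metrics overall" is supplied by the caller as a flag. All names are hypothetical.

```typescript
interface UncertaintySample {
  horizontalUncertaintyKm?: number;
  depthUncertaintyKm?: number;
}

// Fraction of reported values that exceed the threshold (0 when none are reported).
function shareAbove(values: Array<number | undefined>, threshold: number): number {
  const reported = values.filter((v): v is number => v !== undefined);
  if (reported.length === 0) return 0;
  return reported.filter((v) => v > threshold).length / reported.length;
}

function accuracyScore(events: UncertaintySample[], poorQualityMetrics: boolean = false): number {
  let score = 100; // Assumed starting point.
  if (shareAbove(events.map((e) => e.horizontalUncertaintyKm), 10) > 0.5) score -= 30;
  // 20 km is an assumed threshold for "high depth uncertainty".
  if (shareAbove(events.map((e) => e.depthUncertaintyKm), 20) > 0.5) score -= 20;
  if (poorQualityMetrics) score -= 10;
  return score;
}
```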
Overall Quality Grade
| Score | Grade | Label | Description |
|---|---|---|---|
| 95-100 | A+ | Excellent | Publication-quality data |
| 90-94 | A | Excellent | High-quality, reliable data |
| 80-89 | B | Good | Good quality, suitable for most analyses |
| 70-79 | C | Fair | Acceptable quality, some limitations |
| 60-69 | D | Poor | Marginal quality, use with caution |
| <60 | F | Failing | Insufficient quality, not recommended |
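The grade table maps directly onto a chain of threshold comparisons. In this sketch `gradeFor` and `QualityGrade` are hypothetical names, and fractional scores are assigned to the band whose lower bound they reach (so 94.5 is graded A), which is an assumption because the table lists whole numbers only.

```typescript
interface QualityGrade {
  grade: "A+" | "A" | "B" | "C" | "D" | "F";
  label: "Excellent" | "Good" | "Fair" | "Poor" | "Failing";
}

function gradeFor(score: number): QualityGrade {
  if (score >= 95) return { grade: "A+", label: "Excellent" };
  if (score >= 90) return { grade: "A", label: "Excellent" };
  if (score >= 80) return { grade: "B", label: "Good" };
  if (score >= 70) return { grade: "C", label: "Fair" };
  if (score >= 60) return { grade: "D", label: "Poor" };
  return { grade: "F", label: "Failing" };
}
```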
Cross-Field Validation
Magnitude-Depth Relationships
Rule: Very shallow events with large magnitudes are extremely rare
Warning: Depth < 5 km AND Magnitude > 8
Warning: Depth > 300 km AND Magnitude < 3
Warning: Depth > 700 km AND Magnitude < 4
Rationale:
- Shallow large earthquakes are rare (requires special conditions)
- Small deep earthquakes are difficult to detect
- Very deep small earthquakes are almost never detected
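The three magnitude-depth rules can be expressed as independent conditions on a single event. In the sketch below, `magnitudeDepthWarnings` is a hypothetical helper, the first message is taken from the warnings table in this guide, and the other two are invented for the example. An event deeper than 700 km with magnitude below 3 satisfies both deep-event rules and is reported twice.

```typescript
function magnitudeDepthWarnings(depthKm: number, magnitude: number): string[] {
  const warnings: string[] = [];
  if (depthKm < 5 && magnitude > 8) {
    warnings.push("Very shallow large magnitude");
  }
  if (depthKm > 300 && magnitude < 3) {
    warnings.push("Deep small magnitude"); // Hypothetical message text.
  }
  if (depthKm > 700 && magnitude < 4) {
    warnings.push("Very deep small magnitude"); // Hypothetical message text.
  }
  return warnings;
}
```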
Uncertainty-Value Relationships
Rules:
Depth Uncertainty vs Depth
- Warning: Depth uncertainty > 2 × Depth
- Info: Depth uncertainty > Depth
- Indicates poorly constrained depth
Magnitude Uncertainty
- Warning: Magnitude uncertainty > 1.0
- Error: Magnitude uncertainty > |Magnitude|
- Indicates unreliable magnitude
Location Uncertainty Asymmetry
- Warning: Ratio of lat/lon uncertainties > 10:1
- Indicates poor station distribution
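A sketch of these uncertainty checks follows. It reports only the most severe finding per rule group (for example, an error for magnitude uncertainty suppresses the warning) and treats the 10:1 asymmetry as applying in either direction; both choices are assumptions. The field names, `checkUncertainties`, and the message strings are hypothetical.

```typescript
interface UncertaintyFinding {
  level: "error" | "warning" | "info";
  message: string;
}

interface UncertaintyInput {
  magnitude: number;
  depthKm?: number;
  depthUncertaintyKm?: number;
  magnitudeUncertainty?: number;
  latitudeUncertaintyDeg?: number;
  longitudeUncertaintyDeg?: number;
}

function checkUncertainties(e: UncertaintyInput): UncertaintyFinding[] {
  const findings: UncertaintyFinding[] = [];

  if (e.depthKm !== undefined && e.depthUncertaintyKm !== undefined) {
    if (e.depthUncertaintyKm > 2 * e.depthKm) {
      findings.push({ level: "warning", message: "Depth uncertainty exceeds twice the depth" });
    } else if (e.depthUncertaintyKm > e.depthKm) {
      findings.push({ level: "info", message: "Depth uncertainty exceeds the depth" });
    }
  }

  if (e.magnitudeUncertainty !== undefined) {
    if (e.magnitudeUncertainty > Math.abs(e.magnitude)) {
      findings.push({ level: "error", message: "Magnitude uncertainty exceeds |magnitude|" });
    } else if (e.magnitudeUncertainty > 1.0) {
      findings.push({ level: "warning", message: "Magnitude uncertainty exceeds 1.0" });
    }
  }

  const latU = e.latitudeUncertaintyDeg;
  const lonU = e.longitudeUncertaintyDeg;
  // Skip the ratio when either value is missing or zero to avoid dividing by zero.
  if (latU !== undefined && lonU !== undefined && latU > 0 && lonU > 0) {
    if (Math.max(latU, lonU) / Math.min(latU, lonU) > 10) {
      findings.push({ level: "warning", message: "Lat/lon uncertainty ratio exceeds 10:1" });
    }
  }
  return findings;
}
```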
Quality Metrics Consistency
Rules:
Station vs Phase Count
- Error: Station count > Phase count (impossible)
- Info: Phase count < 1.2 × Station count (unusual)
Magnitude Stations
- Warning: Magnitude stations > Location stations
Azimuthal Gap vs Station Count
- Warning: Gap > 180° with >= 10 stations
- Info: Gap < 90° with < 6 stations
RMS Residual (Standard Error)
- Warning: RMS > 5.0 seconds (poor fit)
- Info: RMS < 0.01 seconds (unusually good)
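The quality-metric rules above can be sketched in the same style. Each rule runs only when the fields it needs are present. `MetricsInput`, `checkQualityMetrics`, and most message strings are hypothetical; "Station count > Phase count" is the message listed in the error table of this guide.

```typescript
interface MetricsFinding {
  level: "error" | "warning" | "info";
  message: string;
}

interface MetricsInput {
  stationCount?: number;
  phaseCount?: number;
  magnitudeStationCount?: number;
  azimuthalGapDeg?: number;
  rmsSeconds?: number;
}

function checkQualityMetrics(m: MetricsInput): MetricsFinding[] {
  const findings: MetricsFinding[] = [];
  const stations = m.stationCount;

  if (stations !== undefined && m.phaseCount !== undefined) {
    if (stations > m.phaseCount) {
      findings.push({ level: "error", message: "Station count > Phase count" });
    } else if (m.phaseCount < 1.2 * stations) {
      findings.push({ level: "info", message: "Fewer than 1.2 phases per station" });
    }
  }
  if (stations !== undefined && m.magnitudeStationCount !== undefined && m.magnitudeStationCount > stations) {
    findings.push({ level: "warning", message: "Magnitude stations exceed location stations" });
  }
  if (stations !== undefined && m.azimuthalGapDeg !== undefined) {
    if (m.azimuthalGapDeg > 180 && stations >= 10) {
      findings.push({ level: "warning", message: "Large gap despite 10 or more stations" });
    } else if (m.azimuthalGapDeg < 90 && stations < 6) {
      findings.push({ level: "info", message: "Small gap with fewer than 6 stations" });
    }
  }
  if (m.rmsSeconds !== undefined) {
    if (m.rmsSeconds > 5.0) {
      findings.push({ level: "warning", message: "RMS residual above 5.0 s (poor fit)" });
    } else if (m.rmsSeconds < 0.01) {
      findings.push({ level: "info", message: "RMS residual below 0.01 s (unusually good)" });
    }
  }
  return findings;
}
```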
Quality Metrics
Location Quality
Based on uncertainties and network geometry:
Horizontal Uncertainty: < 1 km excellent, > 10 km poor
Depth Uncertainty: < 5 km excellent, > 20 km poor
Azimuthal Gap: < 120° excellent, > 240° poor
Network Geometry
Station Count: >= 10 excellent, < 6 poor
Phase Count: >= 30 excellent, < 8 poor
Azimuthal Gap: < 120° excellent, > 240° poor
Solution Quality
RMS Residual: < 0.3s excellent, > 1.0s poor
Evaluation Status: final > reviewed > confirmed > preliminary
Magnitude Quality
Magnitude Uncertainty: < 0.1 excellent, > 0.3 poor
Station Count: >= 10 excellent, < 3 poor
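Each metric in this section has an "excellent" threshold and a "poor" threshold, with an unnamed band between them. A minimal sketch of that classification is shown below; the `intermediate` label and both helper names are assumptions made for the example.

```typescript
type MetricRating = "excellent" | "intermediate" | "poor";

// For metrics where lower is better: uncertainties, azimuthal gap, RMS residual.
function rateLowerIsBetter(value: number, excellentBelow: number, poorAbove: number): MetricRating {
  if (value < excellentBelow) return "excellent";
  if (value > poorAbove) return "poor";
  return "intermediate";
}

// For metrics where higher is better: station and phase counts.
function rateHigherIsBetter(value: number, excellentFrom: number, poorBelow: number): MetricRating {
  if (value >= excellentFrom) return "excellent";
  if (value < poorBelow) return "poor";
  return "intermediate";
}
```

For example, `rateLowerIsBetter(0.2, 0.3, 1.0)` rates an RMS residual of 0.2 s as excellent, and `rateHigherIsBetter(7, 10, 6)` rates a station count of 7 as intermediate.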
Completeness Metrics
Required Fields Coverage
Tracks presence of essential fields:
time, latitude, longitude, magnitude
Must be 100% for a valid catalogue
Optional Fields Coverage
Tracks presence of quality-enhancing fields:
Uncertainties (location, depth, magnitude)
Quality metrics (gap, phase count, station count)
Metadata (region, source, magnitude type)
Missing Data Patterns
Identifies systematic gaps:
Fields with >50% missing data
Events with no uncertainty information
Events with no quality metrics
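The first pattern, fields with more than 50% missing data, can be computed with a single pass per field. In this sketch a value counts as missing when it is `undefined` or `null`, which is an assumption; `fieldsMostlyMissing` and the field names in the usage note are hypothetical.

```typescript
function fieldsMostlyMissing(
  records: Array<Record<string, unknown>>,
  fields: string[]
): string[] {
  if (records.length === 0) return [];
  return fields.filter((field) => {
    const missing = records.filter(
      (record) => record[field] === undefined || record[field] === null
    ).length;
    // Strictly more than half of the records lack this field.
    return missing / records.length > 0.5;
  });
}
```

Calling it with three records of which only one carries a depth uncertainty would flag that field, because two of three (about 67%) are missing.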
Anomaly Detection
Extreme Values
Magnitude Anomalies:
- Magnitude > 9 (extremely rare, verify)
- Magnitude < -1 (unusual, verify)
Depth Anomalies:
- Depth > 700 km (very deep, rare but possible)
- Depth = 0 for >10% of events (may indicate missing data)
Temporal Clustering
Duplicate Detection:
- Events within 1 second of each other
- May indicate duplicates or require review
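A common way to find events within 1 second of each other is to sort by origin time and compare neighbours. The sketch below takes that approach; it is not necessarily how the platform does it. It treats a gap of exactly 1 second as a match, which is an assumption, and it assumes every timestamp is parseable. `findNearDuplicates` is a hypothetical name.

```typescript
// Returns pairs of input indices whose origin times are adjacent in time order
// and at most windowMs apart.
function findNearDuplicates(times: string[], windowMs: number = 1000): Array<[number, number]> {
  const ordered = times
    .map((t, index) => ({ index, ms: Date.parse(t) }))
    .sort((a, b) => a.ms - b.ms);

  const pairs: Array<[number, number]> = [];
  let previous: { index: number; ms: number } | undefined;
  for (const current of ordered) {
    if (previous !== undefined && current.ms - previous.ms <= windowMs) {
      pairs.push([previous.index, current.index]);
    }
    previous = current;
  }
  return pairs;
}
```

Only neighbouring events in time order are compared, so a run of three events spaced 0.6 seconds apart yields two pairs rather than three. Each pair is a candidate for review, not proof of duplication, since distinct aftershocks can occur within the same second.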
Geographic Anomalies
Null Island:
- Coordinates at (0°, 0°)
- Almost always a data error
Extreme Bounds:
- Bounds > 180° latitude or 360° longitude
- Bounds < 0.01° (very small area)
Best Practices
Data Preparation
Ensure Required Fields: All events must have time, location, and magnitude
Include Uncertainties: Provide uncertainty estimates when available
Add Quality Metrics: Include azimuthal gap, phase counts, station counts
Specify Magnitude Type: Indicate ML, Mw, mb, etc.
Provide Metadata: Include region, source, evaluation status
Quality Improvement
Location Accuracy:
- Use more seismic stations
- Improve station distribution (reduce azimuthal gap)
- Use better velocity models
- Include both P and S phases
Magnitude Accuracy:
- Use more stations for magnitude calculation
- Apply appropriate magnitude scale
- Include magnitude uncertainty estimates
Depth Accuracy:
- Use depth phases (pP, sP)
- Include nearby stations
- Consider fixing depth if poorly constrained
Data Validation Workflow
Upload Data: Use supported formats (CSV, JSON, QuakeML)
Review Validation Results: Check errors and warnings
Assess Quality Report: Review completeness, consistency, accuracy scores
Check Anomalies: Investigate flagged events
Review Recommendations: Follow suggested improvements
Fix Issues: Correct errors before final import
Re-validate: Ensure all issues resolved
Minimum Quality Standards
For data to be accepted:
Completeness: >= 50%, with all required fields present
No Critical Errors: No validation errors
Overall Score: >= 60/100
Recommended for high-quality analysis:
Completeness: >= 90%
Overall Score: >= 80/100 (Grade B or better)
Uncertainties: Present for >= 50% of events
Quality Metrics: Present for >= 50% of events
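The two sets of thresholds above can be combined into a three-way decision. This is a sketch under the assumption that the relevant figures are already available as percentages and counts; `QualitySummary`, its field names, and `acceptanceTier` are hypothetical.

```typescript
interface QualitySummary {
  completeness: number;          // percent, 0-100
  overallScore: number;          // 0-100
  validationErrorCount: number;  // number of validation errors
  uncertaintyCoverage: number;   // percent of events with uncertainties
  qualityMetricCoverage: number; // percent of events with quality metrics
}

type AcceptanceTier = "rejected" | "accepted" | "recommended";

function acceptanceTier(s: QualitySummary): AcceptanceTier {
  const accepted =
    s.completeness >= 50 && s.validationErrorCount === 0 && s.overallScore >= 60;
  if (!accepted) return "rejected";

  const recommended =
    s.completeness >= 90 &&
    s.overallScore >= 80 &&
    s.uncertaintyCoverage >= 50 &&
    s.qualityMetricCoverage >= 50;
  return recommended ? "recommended" : "accepted";
}
```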
Error Messages Reference
Common Errors
| Error | Severity | Meaning | Solution |
|---|---|---|---|
| “Invalid timestamp format” | Error | Time field not parseable | Use ISO 8601 format |
| “Latitude must be >= -90” | Error | Invalid latitude | Check coordinate system |
| “Magnitude must be <= 10” | Error | Unrealistic magnitude | Verify magnitude value |
| “Station count > Phase count” | Error | Impossible relationship | Check quality metrics |
| “Event time is in the future” | Error | Invalid timestamp | Verify time zone and date |
Common Warnings
| Warning | Meaning | Recommendation |
|---|---|---|
| “High location uncertainty” | Poor location constraint | Add more stations or fix depth |
| “Large azimuthal gap” | Poor station distribution | Use more distant stations |
| “Very shallow large magnitude” | Unusual event | Verify depth and magnitude |
| “Duplicate timestamps” | Possible duplicates | Review events with same time |
| “Missing uncertainty data” | Limited quality assessment | Add uncertainty estimates |
API Reference
Validation Functions
// Validate a single event
validateEarthquakeEvent(data: unknown): {
  success: boolean;
  data?: EarthquakeEvent;
  errors?: ZodError;
}

// Validate multiple events
validateEarthquakeEvents(data: unknown[]): {
  validEvents: EarthquakeEvent[];
  invalidEvents: Array<{ index: number; errors: ZodError }>;
}

// Assess data quality
assessDataQuality(events: any[]): DataQualityReport

// Perform a comprehensive quality check
performQualityCheck(events: any[]): QualityCheckResult

// Cross-field validation
validateEventCrossFields(event: any): CrossFieldValidationResult
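As a usage illustration, the sketch below consumes a result with the shape documented for `validateEarthquakeEvents` and turns it into a one-line summary. The event and error types are left generic so the example runs without the platform's own types or Zod; `BatchValidationResult` and `summarizeBatch` are hypothetical names, not part of the API.

```typescript
// Same shape as the documented return value, with the concrete types abstracted away.
interface BatchValidationResult<TEvent, TErrors> {
  validEvents: TEvent[];
  invalidEvents: Array<{ index: number; errors: TErrors }>;
}

function summarizeBatch<TEvent, TErrors>(
  result: BatchValidationResult<TEvent, TErrors>
): string {
  const total = result.validEvents.length + result.invalidEvents.length;
  const rejected = result.invalidEvents.map((entry) => entry.index);
  if (rejected.length === 0) {
    return "All " + total + " events passed validation";
  }
  return (
    result.validEvents.length + " of " + total +
    " events passed; rejected input indices: " + rejected.join(", ")
  );
}
```

Reporting the `index` of each rejected entry lets a user trace failures back to rows in the uploaded file before re-validating.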
Support
For questions or issues with data validation:
Review this guide
Check validation error messages
Consult the quality report recommendations
Contact the development team
Last Updated: October 31, 2025
Version: 1.0.0