Merging Catalogues

Learn how to merge multiple earthquake catalogues with automated duplicate detection and configurable conflict resolution strategies.

Overview

Catalogue merging allows you to combine earthquake data from multiple sources into a unified, comprehensive catalogue. This is essential for:

Combining regional and national catalogues
Integrating historical and modern data
Comparing independent analyses of the same events
Creating research-ready datasets from multiple sources

Key platform features include:

🆕 Quality-Based Strategy (Recommended): A new merge strategy that scores every duplicate event on a 0–100 point index (station count, azimuthal gap, location error, magnitude uncertainty, magnitude type, review status) and keeps the highest-scoring event. This is a user-selectable option — see Merge Strategies below.
Automated Duplicate Detection: Matches events across catalogues using time, location, and magnitude criteria.
Complete Provenance: Tracks the source of every event in the merged result.
Configurable Thresholds: Adjust matching parameters for different data types.

The platform also applies several underlying algorithm improvements. Some run for every strategy; others are specific to the Average strategy:

Date Line Normalisation (all strategies): Spatial matching near ±180° uses unit-vector averaging to avoid arithmetic errors in the Pacific region.
Validation Gates (all strategies): Rejects physically inconsistent duplicate groups before any strategy is applied (e.g., an M4.0 matched against an M7.0, or a group spanning > 200 km).
Magnitude Hierarchy (Average strategy): Uses the ISC standard (Mw > Ms > mb > ML) when computing averages, preventing saturation errors from mixing incompatible scales. Other strategies keep the winning event’s existing magnitude unchanged.
Depth Uncertainty Selection (Average strategy): Selects the depth with the lowest reported uncertainty rather than a simple mean. Other strategies inherit depth directly from the winning event.

Merge Process Overview

        flowchart TD
    subgraph Inputs ["Input Catalogues"]
        A["Catalogue A"]
        B["Catalogue B"]
        C["Catalogue C"]
    end

    Combine["COMBINE ALL EVENTS"]
    Detect["DETECT DUPLICATES<br/>(time + location + magnitude)"]
    Resolve["RESOLVE CONFLICTS<br/>(apply selected merge strategy)"]
    Result["MERGED CATALOGUE<br/>(unique events with provenance)"]

    A & B & C --> Combine
    Combine --> Detect
    Detect --> Resolve
    Resolve --> Result

Understanding Duplicates

What Makes Events Duplicates?

Two events are considered duplicates if they likely represent the same earthquake recorded in different catalogues. The platform uses three criteria, with default thresholds and association logic based on international standards for global and regional earthquake association (Storchak et al., 2013; Benz et al., 2019):

Criterion	Default Threshold	Rationale
Time	± 60 seconds	Origin times may differ due to analysis methods
Distance	≤ 50 km	Locations vary based on velocity models and data
Magnitude	≤ 0.5	Different scales and stations affect magnitude

All three criteria must be met for events to be considered duplicates.

Why Duplicates Occur

Source catalogues often differ in their:

Seismic networks: Regional vs. global station coverage
Velocity models: Affect calculated locations
Magnitude scales: ML, Mw, and mb produce different values for the same event
Analysis procedures: Automatic vs. manual processing
Update schedules: Preliminary vs. final solutions

Example Duplicate Detection

Catalogue A: 2024-01-15 10:30:45, M4.5, -41.50, 174.20
Catalogue B: 2024-01-15 10:30:47, M4.6, -41.51, 174.21

Time difference:    2 seconds   (≤ 60 s threshold) ✓
Distance:           1.4 km      (≤ 50 km threshold) ✓
Magnitude diff:     0.1         (≤ 0.5 threshold) ✓

Result: These are duplicates (same earthquake)
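
The three-criterion test can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the `CatalogueEvent` shape, the function names, and the use of the haversine formula are assumptions made for the example, and the adaptive threshold scaling described later is not modelled.

```typescript
// Hypothetical event shape; field names are illustrative, not the platform's schema.
interface CatalogueEvent {
  time: string;      // ISO 8601 origin time (UTC)
  latitude: number;  // degrees
  longitude: number; // degrees
  magnitude: number;
}

interface MatchThresholds {
  timeSeconds: number;
  distanceKm: number;
  magnitudeDiff: number;
}

// Defaults taken from the criteria table above.
const DEFAULT_THRESHOLDS: MatchThresholds = { timeSeconds: 60, distanceKm: 50, magnitudeDiff: 0.5 };

// Great-circle distance in km (haversine formula, mean Earth radius 6371 km).
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) * Math.sin(dLat / 2) +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * 6371 * Math.asin(Math.sqrt(a));
}

// All three criteria must hold for two events to count as duplicates.
function isDuplicate(a: CatalogueEvent, b: CatalogueEvent, t: MatchThresholds = DEFAULT_THRESHOLDS): boolean {
  const dtSeconds = Math.abs(Date.parse(a.time) - Date.parse(b.time)) / 1000;
  const distKm = haversineKm(a.latitude, a.longitude, b.latitude, b.longitude);
  const dMag = Math.abs(a.magnitude - b.magnitude);
  return dtSeconds <= t.timeSeconds && distKm <= t.distanceKm && dMag <= t.magnitudeDiff;
}

// The two events from the example above.
const eventA: CatalogueEvent = { time: "2024-01-15T10:30:45Z", latitude: -41.5, longitude: 174.2, magnitude: 4.5 };
const eventB: CatalogueEvent = { time: "2024-01-15T10:30:47Z", latitude: -41.51, longitude: 174.21, magnitude: 4.6 };
const sameQuake = isDuplicate(eventA, eventB);
```

Because the checks are combined with a logical AND, failing any single criterion is enough to keep two events separate.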

Merge Strategies

Choose the strategy that best fits your use case:

Strategy Decision Guide

        flowchart TD
    Start{"Do you want the<br/>best scientific result?"} -- YES --> Quality["Use Quality-Based<br/>(Recommended)"]
    Start -- NO --> Auth{"Do you have one<br/>authoritative source?"}

    Auth -- YES --> Priority["Use Priority-Based"]
    Auth -- NO --> Recent{"Do your duplicate events<br/>have different origin times<br/>and newer = more reliable?"}

    Recent -- YES --> Newest["Use Newest Data"]
    Recent -- NO --> Matter{"Which matters more?"}

    Matter -- "Metadata completeness" --> Complete["Use Most Complete"]
    Matter -- "Statistical accuracy" --> Average["Use Average Values"]

Priority-Based Strategy

How it works:

You designate one catalogue as “primary”
When duplicates are found, keep the primary catalogue’s event
Discard the duplicate from other catalogues

This approach follows the principle of network authority, where local networks are prioritized for regional events as recommended by Bondár & Storchak (2011).

Example:

Primary (GeoNet):   M4.5, depth 25 km, 42 phases
Secondary (USGS):   M4.6, depth 28 km, 15 phases

Result: Keep GeoNet event (M4.5, depth 25 km, 42 phases)

Best for:

Merging regional data with a trusted national catalogue
When one source has consistently better quality
Operational settings where one authority is preferred

Considerations:

Simple and predictable
May discard valid information from secondary sources
Assumes primary source is always correct

        graph LR
    subgraph Inputs ["Duplicate Group"]
        E1["USGS (M4.6)"]
        E2["GeoNet (M4.5)"]
        E3["ISC (M4.5)"]
    end

    subgraph Logic ["Priority Logic"]
        P1["1. GeoNet (Primary)"]
        P2["2. USGS"]
        P3["3. ISC"]
    end

    Result["GeoNet Event Selected"]

    Inputs --> Logic
    Logic --> Result

    style E2 fill:#f9f,stroke:#333,stroke-width:2px
    style Result fill:#f9f,stroke:#333,stroke-width:2px
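
The selection rule in the diagram can be sketched as follows. The `SourcedEvent` shape and `pickByPriority` are hypothetical names used only for this illustration, and ranking unlisted catalogues last is an assumption of the sketch rather than documented platform behaviour.

```typescript
// Hypothetical event shape for illustration only.
interface SourcedEvent {
  catalogue: string;
  magnitude: number;
}

// Keep the event whose catalogue appears earliest in the priority order.
// Catalogues missing from the list rank last (an assumption of this sketch).
function pickByPriority(group: SourcedEvent[], priorityOrder: string[]): SourcedEvent {
  if (group.length === 0) throw new Error("duplicate group is empty");
  const rank = (e: SourcedEvent): number => {
    const i = priorityOrder.indexOf(e.catalogue);
    return i === -1 ? priorityOrder.length : i;
  };
  // Strict comparison keeps the first event on ties.
  return group.reduce((best, e) => (rank(e) < rank(best) ? e : best));
}

// The duplicate group and priority order from the diagram above.
const priorityGroup: SourcedEvent[] = [
  { catalogue: "USGS", magnitude: 4.6 },
  { catalogue: "GeoNet", magnitude: 4.5 },
  { catalogue: "ISC", magnitude: 4.5 },
];
const priorityWinner = pickByPriority(priorityGroup, ["GeoNet", "USGS", "ISC"]);
```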

Average Values Strategy

How it works:

Compute a weighted-average location (lower uncertainty = higher weight)
Select the best magnitude using the ISC hierarchy (Mw > Ms > mb > ML)
Pick the depth with the lowest reported uncertainty
Use the earliest origin time across duplicates
Preserve metadata from the highest-quality source event

Statistical averaging and uncertainty propagation follow Bayesian principles for combining independent seismic observations (Schorlemmer et al., 2024).

Example:

Catalogue A: M4.5, depth 25 km
Catalogue B: M4.6, depth 28 km
Catalogue C: M4.4, depth 24 km

Result: M4.5 (magnitude hierarchy), depth 25 km (lowest uncertainty)

Best for:

Combining multiple independent analyses
Research where statistical robustness matters
When no single source is clearly better

Considerations:

Reduces random errors through averaging
May blur genuine differences
Works best with similar-quality sources

Note

The Average strategy is actually a hybrid approach:

Location: Weighted average using inverse uncertainty (lower uncertainty = higher weight).
Magnitude: Uses the Magnitude Hierarchy (Mw > Ms > mb > ML) rather than a simple mean to avoid saturation errors.
Depth: Selects the depth with the lowest reported uncertainty.

        graph TD
    subgraph Sources ["Input Duplicates"]
        S1["Source A: M4.5, ±2km"]
        S2["Source B: M4.7, ±10km"]
    end

    subgraph Processing ["Hybrid Averaging"]
        Loc["Location: Weighted Mean<br/>(Source A weighted 5x)"]
        Mag["Magnitude: Hierarchy<br/>(Prefers Mw over ML)"]
        Dep["Depth: Best Uncertainty"]
    end

    Result["Merged Hybrid Event"]

    Sources --> Processing
    Processing --> Result
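
The non-location parts of the hybrid approach can be sketched like this. The `DuplicateSolution` shape, `mergeHybrid`, and the sample values (magnitude types, depth uncertainties, origin times) are illustrative assumptions; location averaging is deliberately left out of this sketch.

```typescript
type MagnitudeType = "Mw" | "Ms" | "mb" | "ML";

// Hypothetical duplicate record; field names are illustrative.
interface DuplicateSolution {
  originTime: string; // ISO 8601 (UTC)
  magnitude: number;
  magnitudeType: MagnitudeType;
  depthKm: number;
  depthUncertaintyKm: number;
}

// Hierarchy from the text: Mw > Ms > mb > ML (lower index = preferred).
const MAGNITUDE_ORDER: MagnitudeType[] = ["Mw", "Ms", "mb", "ML"];

function mergeHybrid(group: DuplicateSolution[]) {
  if (group.length === 0) throw new Error("duplicate group is empty");
  // Magnitude: take the value whose type ranks highest in the hierarchy.
  const byMagType = group.reduce((best, s) =>
    MAGNITUDE_ORDER.indexOf(s.magnitudeType) < MAGNITUDE_ORDER.indexOf(best.magnitudeType) ? s : best
  );
  // Depth: take the value with the lowest reported uncertainty.
  const byDepth = group.reduce((best, s) => (s.depthUncertaintyKm < best.depthUncertaintyKm ? s : best));
  // Origin time: take the earliest one.
  const earliest = group.reduce((best, s) => (Date.parse(s.originTime) < Date.parse(best.originTime) ? s : best));
  return {
    originTime: earliest.originTime,
    magnitude: byMagType.magnitude,
    magnitudeType: byMagType.magnitudeType,
    depthKm: byDepth.depthKm,
  };
}

// Illustrative values only; the uncertainties and times are made up for the example.
const hybridResult = mergeHybrid([
  { originTime: "2024-01-15T10:30:47Z", magnitude: 4.7, magnitudeType: "ML", depthKm: 28, depthUncertaintyKm: 10 },
  { originTime: "2024-01-15T10:30:45Z", magnitude: 4.5, magnitudeType: "Mw", depthKm: 25, depthUncertaintyKm: 2 },
]);
```

Each field is chosen independently, so the merged event can combine the magnitude of one source with the depth of another.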

Newest Data Strategy

How it works:

Compare origin times across duplicate events
Keep the event with the latest origin time
Useful when later earthquakes in a sequence have better solutions

Example:

Event A: Last updated 2024-01-15 (automatic solution)
Event B: Last updated 2024-01-20 (reviewed solution)

Result: Keep Event B (more recent, likely reviewed)

Best for:

Incorporating revised/reprocessed data
When recent analysis methods are preferred
Updating catalogues with final solutions

Considerations:

Assumes newer is better
May not work well if timestamps aren’t reliable
Good for refreshing operational catalogues

Most Complete Strategy

How it works:

Count the number of populated fields in each event
Keep the event with the most metadata
Preserves detailed quality information

Example:

Event A: time, lat, lon, depth, magnitude (5 fields)
Event B: time, lat, lon, depth, magnitude, uncertainty,
         phases, stations, azimuthal_gap (9 fields)

Result: Keep Event B (more complete metadata)

Best for:

Preserving detailed quality metrics
Combining sparse and detailed catalogues
Research requiring comprehensive metadata

Considerations:

More fields don’t always mean better data
May prefer verbose but lower-quality data
Good for maximizing available information
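
A minimal sketch of the field-counting rule follows. How the platform decides that a field is "populated" is not specified here, so treating null, undefined, and empty strings as missing is an assumption, as are the type and function names and the sample values for Event B.

```typescript
// A loose record type for the example; keys mirror the fields listed above.
type EventRecord = { [field: string]: string | number | null | undefined };

// Count fields that carry a value (null, undefined and "" count as missing here).
function countPopulated(rec: EventRecord): number {
  return Object.keys(rec).filter((k) => {
    const v = rec[k];
    return v !== null && v !== undefined && v !== "";
  }).length;
}

// Keep the record with the most populated fields; ties keep the earlier record.
function pickMostComplete(group: EventRecord[]): EventRecord {
  if (group.length === 0) throw new Error("duplicate group is empty");
  return group.reduce((best, rec) => (countPopulated(rec) > countPopulated(best) ? rec : best));
}

const sparseRecord: EventRecord = {
  time: "2024-01-15T10:30:45Z", lat: -41.5, lon: 174.2, depth: 25, magnitude: 4.5,
};
// Metadata values below are illustrative only.
const detailedRecord: EventRecord = {
  time: "2024-01-15T10:30:47Z", lat: -41.51, lon: 174.21, depth: 28, magnitude: 4.6,
  uncertainty: 2.0, phases: 42, stations: 18, azimuthal_gap: 95,
};
const mostComplete = pickMostComplete([sparseRecord, detailedRecord]);
```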

Quality-Based Strategy

The platform’s most advanced strategy uses a 100-point composite index to rank events. It evaluates quality across six dimensions, based on international standards for network performance and location accuracy (Bondár, 2004; Bondár & Storchak, 2011; Bormann, 2012):

Station Coverage (25 pts): Logarithmic scale (30+ stations = max points). Quality improvement is non-linear with station count (Bondár, 2004).
Azimuthal Gap (20 pts): Penalizes gaps > 180°; excellent if < 120°; zero points above 270°.
Location Precision (15 pts): Based on Standard Error / RMS residuals (ISC standard: < 0.3s is excellent).
Magnitude Uncertainty (15 pts): Lower uncertainty yields higher scores.
Magnitude Type (15 pts): Preferred order Mw > Ms > mb > ML > Md (Storchak et al., 2013).
Review Status (10 pts): “Reviewed” or “Final” status adds points over “Preliminary” solutions.

        graph TD
    subgraph Group ["Duplicate Candidates"]
        C1["Event 1: 15 stations, Gap 210°"]
        C2["Event 2: 45 stations, Gap 95°"]
    end

    subgraph Scoring ["Quality Engine"]
        S1["Event 1 Score: 45/100"]
        S2["Event 2 Score: 88/100"]
    end

    Winner["Event 2 Selected"]

    Group --> Scoring
    Scoring --> Winner

    style C2 fill:#4CAF50,color:white
    style Winner fill:#4CAF50,color:white
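
To make the point allocation concrete, here is one possible scoring function. Only the six dimensions, their maximum points, and the quoted break points (30 stations, 120° and 270° gap, 0.3 s RMS) come from the description above. The shape of each curve between those points, the 2.0 s RMS floor, the 1.0 magnitude-uncertainty ceiling, and the per-type magnitude points are assumptions, so this sketch will not reproduce the exact scores shown in the diagram.

```typescript
type MagType = "Mw" | "Ms" | "mb" | "ML" | "Md";

// Hypothetical input shape for illustration.
interface QualityInputs {
  stationCount: number;
  azimuthalGapDeg: number;
  rmsSeconds: number;
  magnitudeUncertainty: number;
  magnitudeType: MagType;
  reviewed: boolean;
}

const clamp01 = (x: number): number => Math.max(0, Math.min(1, x));

function qualityScore(q: QualityInputs): number {
  // Station coverage (25): logarithmic, saturating at 30 stations.
  const stations = 25 * clamp01(Math.log(1 + q.stationCount) / Math.log(31));
  // Azimuthal gap (20): full marks up to 120°, zero from 270°, linear in between (assumed).
  const gap = 20 * clamp01((270 - q.azimuthalGapDeg) / (270 - 120));
  // Location precision (15): full marks up to 0.3 s RMS, zero from 2.0 s (assumed floor).
  const rms = 15 * clamp01((2.0 - q.rmsSeconds) / (2.0 - 0.3));
  // Magnitude uncertainty (15): linear from 0 (best) to 1.0 (worst) (assumed).
  const magUnc = 15 * clamp01(1 - q.magnitudeUncertainty);
  // Magnitude type (15): equal steps down the preferred order (assumed).
  const order: MagType[] = ["Mw", "Ms", "mb", "ML", "Md"];
  const magType = 15 * (1 - order.indexOf(q.magnitudeType) / order.length);
  // Review status (10): all-or-nothing (assumed).
  const review = q.reviewed ? 10 : 0;
  return stations + gap + rms + magUnc + magType + review;
}

// The two candidates from the diagram; fields it does not show are set equal for both.
const sharedInputs = { rmsSeconds: 0.5, magnitudeUncertainty: 0.2, magnitudeType: "ML" as MagType, reviewed: false };
const scoreEvent1 = qualityScore({ stationCount: 15, azimuthalGapDeg: 210, ...sharedInputs });
const scoreEvent2 = qualityScore({ stationCount: 45, azimuthalGapDeg: 95, ...sharedInputs });
```

With everything else equal, the better station coverage and smaller azimuthal gap give Event 2 the higher score, which matches the selection shown in the diagram.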

How Metadata is Merged

Beyond the primary fields (time, location, magnitude), the platform performs a Field-Level Union to ensure the merged catalogue is as comprehensive as possible.

Gap Filling: If the selected primary record is missing a field (e.g., azimuthal gap or phase count) but a secondary record has it, the platform automatically fills that gap from the highest-quality secondary source.
Rich Data Preservation: Complex data types like Picks, Arrivals, and Station Magnitudes are preserved through a ranked inheritance system.
Focal Mechanism Selection: The platform automatically selects the best focal mechanism across all duplicate sources based on a hierarchy (GCMT > GeoNet > USGS > ISC) and quality metrics including station polarity count and misfit values.
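
The gap-filling step can be sketched as a simple field-level union. The `Metadata` type and `fillGaps` are hypothetical, and the sketch assumes the secondary records have already been sorted from highest to lowest quality; it does not cover picks, arrivals, or focal mechanisms.

```typescript
type Metadata = { [field: string]: string | number | undefined };

// Fill fields missing from the winning record using secondaries,
// which are assumed to be pre-sorted from highest to lowest quality.
function fillGaps(primary: Metadata, rankedSecondaries: Metadata[]): Metadata {
  const merged: Metadata = { ...primary };
  for (const secondary of rankedSecondaries) {
    for (const key of Object.keys(secondary)) {
      // Never overwrite a value that is already present.
      if (merged[key] === undefined && secondary[key] !== undefined) {
        merged[key] = secondary[key];
      }
    }
  }
  return merged;
}

// Illustrative values only.
const filledRecord = fillGaps(
  { magnitude: 4.5, depthKm: 25 },
  [
    { magnitude: 4.6, azimuthalGap: 95 },
    { azimuthalGap: 140, phaseCount: 15 },
  ]
);
```

The primary record's own values always win; secondaries only contribute fields the primary lacks, and a higher-ranked secondary takes precedence over a lower-ranked one.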

Advanced Quality Control

The platform performs several advanced validation checks during the merge process to prevent “over-matching” or physical inconsistencies:

Group Size Gate: Prevents merging groups larger than 15 events. Groups this large usually indicate that the matching thresholds are too loose.
Spatial Spread Analysis: For groups of 4 or more events, the platform calculates the spatial spread. If it exceeds the magnitude-scaled threshold (100 km for M < 5, 150 km for M 5–6, 200 km for M ≥ 6), the group is rejected and each event is kept as a separate unique event.
Network Mismatch: If the same network reports two different events in the same group, the platform identifies these as likely distinct events (e.g., foreshock/aftershock) and prevents them from being merged.
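
The three gates can be sketched as a single check that returns a rejection reason. Several details are assumptions made for the example: "spatial spread" is taken to be the largest pairwise distance, the magnitude used for the limit is the largest in the group, and a network mismatch rejects the whole group. The names `GroupMember` and `rejectionReason` are hypothetical.

```typescript
// Hypothetical event shape for illustration only.
interface GroupMember {
  network: string;
  latitude: number;
  longitude: number;
  magnitude: number;
}

// Great-circle distance in km (haversine, mean Earth radius 6371 km).
function separationKm(a: GroupMember, b: GroupMember): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.latitude - a.latitude);
  const dLon = toRad(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) * Math.sin(dLat / 2) +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

// Magnitude-scaled spread limits quoted in the list above.
function spreadLimitKm(magnitude: number): number {
  if (magnitude < 5) return 100;
  if (magnitude < 6) return 150;
  return 200;
}

// Returns a rejection reason, or null when the group passes all three gates.
function rejectionReason(group: GroupMember[]): string | null {
  if (group.length > 15) return "group size exceeds 15 events";
  const networks = group.map((m) => m.network);
  if (networks.some((n, i) => networks.indexOf(n) !== i)) {
    return "same network reports more than one event";
  }
  if (group.length >= 4) {
    let spread = 0;
    let largest = -Infinity;
    group.forEach((a, i) => {
      largest = Math.max(largest, a.magnitude);
      group.slice(i + 1).forEach((b) => {
        spread = Math.max(spread, separationKm(a, b));
      });
    });
    if (spread > spreadLimitKm(largest)) return "spatial spread exceeds limit";
  }
  return null;
}
```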

Scientific Accuracy Features

The platform includes several specialized algorithms to ensure seismological rigour:

Latitude-Aware Spatial Indexing: The search grid adjusts its cell dimensions based on latitude to maintain consistent distance thresholds near the poles and the equator.
Date Line Normalisation: Merging events near the International Date Line (±180°) uses Cartesian unit-vector averaging to avoid mathematical errors that occur with simple arithmetic means.
Uncertainty-Weighted Locations: When averaging locations, the platform weights the result by the inverse of the reported horizontal uncertainty (geometric mean of latitude/longitude errors).
Regional Authority Hierarchy: The platform recognizes regional boundaries. For example, it automatically prioritizes GeoNet for events within New Zealand and JMA for events in Japan. This preference is supported by regional quality assessments that show local network superiority for inland and near-shore events (Warren-Smith et al., 2025).
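
The following sketch shows why unit-vector averaging matters near ±180° and how inverse-uncertainty weights enter the calculation. It is an illustration under stated assumptions, not the platform's code: the `LocatedSolution` shape and `weightedMeanLocation` are hypothetical, and each solution is assumed to carry a single positive horizontal uncertainty in km.

```typescript
// Hypothetical input shape; one horizontal uncertainty per solution is assumed.
interface LocatedSolution {
  latitude: number;  // degrees
  longitude: number; // degrees
  horizontalUncertaintyKm: number;
}

function weightedMeanLocation(solutions: LocatedSolution[]): { latitude: number; longitude: number } {
  if (solutions.length === 0) throw new Error("no solutions to average");
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const toDeg = (rad: number): number => (rad * 180) / Math.PI;
  let x = 0;
  let y = 0;
  let z = 0;
  for (const s of solutions) {
    if (!(s.horizontalUncertaintyKm > 0)) throw new Error("uncertainty must be positive");
    const w = 1 / s.horizontalUncertaintyKm; // inverse-uncertainty weight
    const lat = toRad(s.latitude);
    const lon = toRad(s.longitude);
    // Accumulate weighted unit vectors on the sphere.
    x += w * Math.cos(lat) * Math.cos(lon);
    y += w * Math.cos(lat) * Math.sin(lon);
    z += w * Math.sin(lat);
  }
  // Convert the summed vector back to latitude/longitude.
  return {
    latitude: toDeg(Math.atan2(z, Math.sqrt(x * x + y * y))),
    longitude: toDeg(Math.atan2(y, x)),
  };
}

// Two solutions straddling the date line; the ±2 km solution carries five times the weight.
const nearDateLine = weightedMeanLocation([
  { latitude: -16.5, longitude: 179.0, horizontalUncertaintyKm: 2 },
  { latitude: -16.5, longitude: -179.0, horizontalUncertaintyKm: 10 },
]);
```

An arithmetic mean of the two longitudes (179° and −179°) would land near 0°, on the opposite side of the globe. The vector sum instead stays close to 179.3°E, pulled towards the better-constrained solution.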

Merge Process

Step 1: Navigate to Merge Page

Click Merge in the navigation menu or go to /merge.

Step 2: Select Source Catalogues

Select two or more catalogues to merge:

Available Catalogues:
☑ GeoNet - New Zealand 2024      (15,432 events)
☑ USGS - Southwest Pacific       (3,241 events)
☑ Local Network Data             (8,756 events)
☐ Historical Catalogue 1990-2000 (45,123 events)

Selected: 3 catalogues, 27,429 total events

Tip

Start with 2-3 catalogues. For complex merges, consider an iterative approach (merge two first, then add more).

Step 3: Configure Matching Rules

Set thresholds for duplicate detection. These thresholds are adaptively scaled based on event magnitude and depth (Tanaka et al., 2022), as larger earthquakes typically have larger location and timing uncertainties in global reports (Benz et al., 2019):

Time Window

Default: ± 60 seconds

Stricter: ± 30 seconds (fewer false matches)
Looser:   ± 120 seconds (catch more duplicates)

Distance Threshold

Default: 50 km

Stricter: 25 km (regional, well-located events)
Looser:   100 km (global, poorly-located events)

Magnitude Difference

Default: 0.5

Stricter: 0.3 (same magnitude scale)
Looser:   1.0 (different magnitude scales)

Threshold Guidelines:

Scenario	Time	Distance	Magnitude
High-quality regional	± 30s	25 km	0.3
Standard national	± 60s	50 km	0.5
Global catalogues	± 120s	100 km	0.5
Historical data	± 180s	150 km	1.0
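
If you keep merge settings in code for reproducibility, the guideline table maps naturally onto a small typed preset object. The key names, `MatchingRules`, and `buildRules` below are hypothetical and not part of the platform; the sketch also ignores the adaptive magnitude and depth scaling mentioned above.

```typescript
interface MatchingRules {
  timeSeconds: number;   // half-width of the time window (± value)
  distanceKm: number;
  magnitudeDiff: number;
}

type Scenario = "highQualityRegional" | "standardNational" | "globalCatalogues" | "historicalData";

// Values copied from the guideline table; the key names are hypothetical.
const THRESHOLD_PRESETS: Record<Scenario, MatchingRules> = {
  highQualityRegional: { timeSeconds: 30, distanceKm: 25, magnitudeDiff: 0.3 },
  standardNational: { timeSeconds: 60, distanceKm: 50, magnitudeDiff: 0.5 },
  globalCatalogues: { timeSeconds: 120, distanceKm: 100, magnitudeDiff: 0.5 },
  historicalData: { timeSeconds: 180, distanceKm: 150, magnitudeDiff: 1.0 },
};

// Start from a preset and override individual thresholds, rejecting non-positive values.
function buildRules(scenario: Scenario, overrides: Partial<MatchingRules> = {}): MatchingRules {
  const rules: MatchingRules = { ...THRESHOLD_PRESETS[scenario], ...overrides };
  if (rules.timeSeconds <= 0 || rules.distanceKm <= 0 || rules.magnitudeDiff <= 0) {
    throw new Error("thresholds must be positive");
  }
  return rules;
}

// Standard national settings with a tighter distance threshold.
const customRules = buildRules("standardNational", { distanceKm: 25 });
```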

Step 4: Choose Merge Strategy

Select your conflict resolution strategy:

Quality-Based (Recommended) - Scores each duplicate event 0–100 and keeps the highest-scoring one (station count, azimuthal gap, RMS, magnitude uncertainty, magnitude type, review status).
Priority-Based - Select a primary catalogue; its events always win.
Average Values - Computes a weighted-average location, applies magnitude hierarchy, and picks the lowest-uncertainty depth.
Newest Data - Keeps the event with the latest origin time.
Most Complete - Keeps the event with the most populated fields.

Note

Regardless of the strategy chosen, the platform always applies date line normalisation and validation gates. Magnitude hierarchy and depth uncertainty selection are specific to the Average strategy. The strategy controls which event’s core parameters win when duplicates are resolved.

Tip

Use Quality-Based for scientific work — it selects the most reliable origin automatically. Use Priority-Based when you have a single authoritative source (e.g., always prefer GeoNet for New Zealand events).

Note

During the merge process, different magnitude scales (ML, mb, Ms) are automatically converted to a common Moment Magnitude (Mw) scale using the empirical relationships of Scordilis (2006) to ensure comparability across catalogues.
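
For reference, the global Ms and mb relations published by Scordilis (2006) can be written as a small function. The coefficients and validity ranges below are quoted from that paper and should be checked against it before use. The function name and the choice to return `undefined` outside the published ranges are assumptions of this sketch, and ML is left unconverted here because the relation the platform applies to it is not stated in this guide.

```typescript
type SourceScale = "Mw" | "Ms" | "mb" | "ML";

// Returns an Mw estimate, or undefined when no relation in this sketch applies.
function toMomentMagnitude(value: number, scale: SourceScale): number | undefined {
  if (scale === "Mw") return value;
  if (scale === "Ms") {
    if (value >= 3.0 && value <= 6.1) return 0.67 * value + 2.07; // Ms 3.0-6.1
    if (value >= 6.2 && value <= 8.2) return 0.99 * value + 0.08; // Ms 6.2-8.2
    return undefined;
  }
  if (scale === "mb") {
    return value >= 3.5 && value <= 6.2 ? 0.85 * value + 1.03 : undefined; // mb 3.5-6.2
  }
  // ML: not handled in this sketch.
  return undefined;
}

const mwFromMs = toMomentMagnitude(5.0, "Ms"); // about 5.42
const mwFromMb = toMomentMagnitude(5.0, "mb"); // about 5.28
```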

Step 5: Configure Priority (if applicable)

If using Priority-Based strategy, rank your catalogues:

Priority Order:
GeoNet - New Zealand 2024     (highest priority)
Local Network Data
USGS - Southwest Pacific      (lowest priority)

Step 6: Name the Merged Catalogue

Provide a descriptive name:

Good names:
- "NZ Combined Catalogue 2024 (GeoNet + USGS)"
- "Canterbury Region - All Sources 2020-2024"
- "Research Catalogue v2 - Priority Merged"

Avoid:
- "merged"
- "test123"

Step 7: Execute Merge

Click Merge Catalogues to begin processing.

Processing Steps:

Load events from all source catalogues
Build spatial grid index for efficient geographic lookups
Find candidate duplicates within time and distance windows
Apply distance and magnitude criteria
Resolve conflicts using selected strategy
Record provenance for all events
Calculate quality scores for merged events
Generate summary statistics
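
The grid-index step depends on converting a distance threshold into a latitude/longitude search window. The platform's index itself is not shown in this guide; the hypothetical `searchWindowDeg` below only illustrates why the longitude extent has to widen with latitude, assuming a spherical Earth of radius 6371 km.

```typescript
// Kilometres per degree of latitude on a sphere of radius 6371 km.
const KM_PER_DEG_LAT = (6371 * Math.PI) / 180;

// Half-widths (in degrees) of a search window that covers radiusKm around a point.
function searchWindowDeg(latitude: number, radiusKm: number): { dLat: number; dLon: number } {
  const dLat = radiusKm / KM_PER_DEG_LAT;
  const cosLat = Math.cos((latitude * Math.PI) / 180);
  // A degree of longitude shrinks towards the poles, so the window must widen;
  // very close to a pole, fall back to searching all longitudes.
  const dLon = cosLat > 1e-6 ? Math.min(180, dLat / cosLat) : 180;
  return { dLat, dLon };
}

const windowAtEquator = searchWindowDeg(0, 50);
const windowAt60South = searchWindowDeg(-60, 50);
```

At 60° latitude the longitude half-width is twice the equatorial value for the same 50 km threshold, which is the effect a latitude-aware grid has to account for.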

Progress Display:

Merging catalogues...
[████████████████████░░░░░░░░░░░░] 65%

Loaded:     27,429 events from 3 catalogues
Candidates: 1,247 potential duplicate groups
Processing: Group 812 of 1,247

Merge Results

After completion, review the summary:

Merge Complete!
===============

Source Catalogues:     3
Total Input Events:    27,429

Duplicate Analysis:
-------------------
Unique Events:         24,891 (retained)
Duplicate Groups:      1,269
Total Duplicates:      2,538 (resolved)

By Source:
- GeoNet:              15,432 events → 14,210 unique
- USGS:                3,241 events  → 2,891 unique
- Local Network:       8,756 events  → 7,790 unique

Final Catalogue:       24,891 events

Processing Time:       12.5 seconds

Detailed Statistics

View additional merge statistics:

Duplicate size distribution: How many events per duplicate group
Match criteria breakdown: Which criteria matched
Source contribution: Events from each catalogue
Quality impact: How quality scores changed

Source Tracking

Provenance Metadata

Every event in the merged catalogue includes:

source_catalogue_id: Original catalogue identifier
source_event_id: Original event ID
merge_strategy: How conflicts were resolved
duplicate_sources: Other catalogues with matching events
merge_timestamp: When the merge was performed

Viewing Provenance

In the event detail view:

Event: 2024-01-15 10:30:45 M4.5

Source Information:
-------------------
Primary Source:  GeoNet - New Zealand 2024
Original ID:     2024p123456

Also found in:
- USGS - Southwest Pacific (ID: us7000abc1)
- Local Network Data (ID: local-2024-0451)

Merge Strategy:  Priority-Based (GeoNet primary)
Merged On:       2024-01-20 14:35:22 UTC
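
If you process exported provenance in your own scripts, a typed record helps keep the fields straight. The field names below are the ones listed under Provenance Metadata; the TypeScript types, the use of catalogue names as identifiers, and the helper `allSources` are assumptions made for this example.

```typescript
// Field names follow the provenance list above; the types are assumed.
interface MergeProvenance {
  source_catalogue_id: string;
  source_event_id: string;
  merge_strategy: string;
  duplicate_sources: string[]; // assumed to be a list of catalogue identifiers
  merge_timestamp: string;     // assumed ISO 8601 (UTC)
}

// The event from the detail view above, with catalogue names standing in for identifiers.
const exampleProvenance: MergeProvenance = {
  source_catalogue_id: "GeoNet - New Zealand 2024",
  source_event_id: "2024p123456",
  merge_strategy: "Priority-Based",
  duplicate_sources: ["USGS - Southwest Pacific", "Local Network Data"],
  merge_timestamp: "2024-01-20T14:35:22Z",
};

// List every catalogue that reported the event, primary source first.
function allSources(p: MergeProvenance): string[] {
  return [p.source_catalogue_id].concat(p.duplicate_sources);
}

const reportingCatalogues = allSources(exampleProvenance);
```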

Best Practices

Before Merging

Review source catalogues:
- Check time coverage overlap
- Verify geographic coverage
- Compare event counts for same periods

Understand magnitude scales:
- ML (local) vs. Mw (moment) differ systematically
- Consider magnitude conversions before merging

Check data quality:
- Review quality distributions
- Note any known issues

Threshold Selection

Start conservative, then loosen:

Begin with strict thresholds (30s, 25km, 0.3)
Run merge and review matched pairs
If too many missed duplicates, loosen thresholds
If too many false matches, tighten thresholds

Document your choices:

Keep a record of threshold values and reasoning for reproducibility.

Quality Assurance

After merging:

Spot-check matched pairs:
- Review some duplicate groups manually
- Verify they’re truly the same event

Check edge cases:
- Events near threshold boundaries
- Very large or very small events

Compare statistics:
- Event counts by magnitude
- Temporal distribution
- Spatial patterns

Advanced Features

Filtered Merging

Merge subsets of catalogues:

Export filtered events from each catalogue
Upload filtered data as new catalogues
Merge the filtered catalogues

Example: Merge only M4+ events from regional catalogues:

Export GeoNet M≥4 events → "GeoNet_M4plus"
Export USGS M≥4 events → "USGS_M4plus"
Merge these filtered catalogues

Iterative Merging

For complex multi-source merges:

Stage 1: GeoNet + Local Network
         (Priority: GeoNet)
         → "NZ_National_Regional"

Stage 2: NZ_National_Regional + USGS
         (Priority: NZ_National_Regional)
         → "NZ_Comprehensive"

Stage 3: NZ_Comprehensive + Historical
         (Strategy: Newest Data)
         → "NZ_Complete_1900-2024"

Benefits:

Better control over conflict resolution
Easier to troubleshoot issues
Can use different strategies at each stage

Re-merging with Updated Data

When source catalogues are updated:

Delete the old merged catalogue
Re-run merge with same parameters
Quality scores are recalculated automatically

Troubleshooting

Too Many Duplicates Found

Symptoms: High duplicate count, unexpected matches

Solutions:

Tighten time window (try ± 30s)
Reduce distance threshold (try 25 km)
Reduce magnitude threshold (try 0.3)
Review matched pairs for false positives

Too Few Duplicates Found

Symptoms: Expected duplicates not matched

Solutions:

Loosen time window (try ± 120s)
Increase distance threshold (try 100 km)
Increase magnitude threshold (try 1.0)
Check for systematic time or location offsets

Merge Takes Too Long

Symptoms: Processing stalls or times out

Solutions:

Merge fewer catalogues at once
Filter to smaller event sets first
Increase Node.js memory allocation (for example, with the --max-old-space-size flag)
Run during off-peak hours

Unexpected Results

Symptoms: Merged catalogue has incorrect data

Solutions:

Verify priority order is correct
Check that strategy matches your intent
Review source catalogue data quality
Try a different merge strategy

Testing and Validation

The merging algorithms are rigorously tested to ensure data integrity and accurate conflict resolution. The core test suite (located in __tests__/lib/merge.test.ts) covers:

Spatial Indexing & Grid Operations: Validates geographic bounds handling, including complex Date Line crossing scenarios.
Adaptive Threshold Matching: Ensures distance and time thresholds scale appropriately across magnitude and depth ranges.
Merge Strategies: Verifies the correct behavior of the quality, priority, average, newest, and complete merge strategies.
Validation of Event Groups: Ensures anomalous clusters (e.g., highly divergent depths or magnitudes) are correctly flagged and handled.
Magnitude Hierarchy: Validates that standard magnitude scales are correctly prioritized (e.g., Mw over ML).

Next Steps

After merging:

Visualization - View merged catalogue on the map
Quality Assessment - Review quality distributions
Exporting Data - Export for analysis or sharing

References

The merging algorithms and conflict resolution strategies in this platform are based on established seismological literature:

Warren-Smith, E., et al. (2025). A quantitative assessment of GeoNet earthquake location quality in Aotearoa New Zealand. New Zealand Journal of Geology and Geophysics. (Regional network performance and station thresholds).
Bondár, I. (2004). Epicentre Accuracy Based on Seismic Network Criteria. Geophysical Journal International. (Network geometry and station count requirements).
Bormann, P., Ed. (2012). IASPEI New Manual of Seismological Observatory Practice (NMSOP-2). Deutsches GeoForschungsZentrum GFZ. (Standardized quality metrics and reporting).
Bondár, I., & Storchak, D. A. (2011). Improved Location Procedures at the International Seismological Centre. Geophysical Journal International. (Quality scoring and azimuthal gap/RMS thresholds).
Storchak, D. A., et al. (2013). Public Release of the ISC-GEM Global Instrumental Earthquake Catalogue (1900-2009). Seismological Research Letters. (Parameter selection and magnitude hierarchy).
Benz, H. M., et al. (2019). Improving Automated Earthquake Association with NEIC Hydra. Bulletin of the Seismological Society of America. (Graph-theoretic event association and group validation).
Tanaka, M., et al. (2022). Discrimination of Seismic Catalogue Duplicates During Aftershock Sequences Using the Nearest-Neighbour Method. Frontiers in Earth Science. (Adaptive thresholds for dense sequences).
Schorlemmer, D., et al. (2024). A Bayesian Merging of Earthquake Magnitudes from Multiple Networks. Seismological Research Letters. (Principles of magnitude averaging).
Scordilis, E. M. (2006). Empirical Global Relations Converting Ms and mb to Moment Magnitude. Journal of Seismology. (Magnitude conversion formulas).