Merging Catalogues

Learn how to merge multiple earthquake catalogues with automated duplicate detection and configurable conflict resolution strategies.

Overview

Catalogue merging allows you to combine earthquake data from multiple sources into a unified, comprehensive catalogue. This is essential for:

  • Combining regional and national catalogues

  • Integrating historical and modern data

  • Comparing independent analyses of the same events

  • Creating research-ready datasets from multiple sources

Key platform features include:

  • 🆕 Quality-Based Strategy (Recommended): A new merge strategy that scores every duplicate event on a 0–100 point index (station count, azimuthal gap, location error, magnitude uncertainty, magnitude type, review status) and keeps the highest-scoring event. This is a user-selectable option — see Merge Strategies below.

  • Automated Duplicate Detection: Matches events across catalogues using time, location, and magnitude criteria.

  • Complete Provenance: Tracks the source of every event in the merged result.

  • Configurable Thresholds: Adjust matching parameters for different data types.

The platform also applies several algorithmic safeguards. Some run for every strategy; others apply only to the Average strategy:

  • Date Line Normalisation (all strategies): Spatial matching near ±180° uses unit-vector averaging to avoid arithmetic errors in the Pacific region.

  • Validation Gates (all strategies): Rejects physically inconsistent duplicate groups before any strategy is applied (e.g., an M4.0 matched against an M7.0, or a group spanning > 200 km).

  • Magnitude Hierarchy (Average strategy): Uses the ISC standard (Mw > Ms > mb > ML) when computing averages, preventing saturation errors from mixing incompatible scales. Other strategies keep the winning event’s existing magnitude unchanged.

  • Depth Uncertainty Selection (Average strategy): Selects the depth with the lowest reported uncertainty rather than a simple mean. Other strategies inherit depth directly from the winning event.

Merge Process Overview

        flowchart TD
    subgraph Inputs ["Input Catalogues"]
        A["Catalogue A"]
        B["Catalogue B"]
        C["Catalogue C"]
    end

    Combine["COMBINE ALL EVENTS"]
    Detect["DETECT DUPLICATES<br/>(time + location + magnitude)"]
    Resolve["RESOLVE CONFLICTS<br/>(apply selected merge strategy)"]
    Result["MERGED CATALOGUE<br/>(unique events with provenance)"]

    A & B & C --> Combine
    Combine --> Detect
    Detect --> Resolve
    Resolve --> Result
    

Understanding Duplicates

What Makes Events Duplicates?

Two events are considered duplicates if they likely represent the same earthquake recorded in different catalogues. The platform uses three criteria, with default thresholds and association logic based on international standards for global and regional earthquake association (Storchak et al., 2013; Benz et al., 2019):

Criterion    Default Threshold    Rationale
Time         ± 60 seconds         Origin times may differ due to analysis methods
Distance     ≤ 50 km              Locations vary based on velocity models and data
Magnitude    ≤ 0.5                Different scales and stations affect magnitude

All three criteria must be met for events to be considered duplicates.

Why Duplicates Occur

Different catalogues may have different:

  • Seismic networks: Regional vs. global station coverage

  • Velocity models: Affect calculated locations

  • Magnitude scales: ML, Mw, mb produce different values

  • Analysis procedures: Automatic vs. manual processing

  • Update schedules: Preliminary vs. final solutions

Example Duplicate Detection

Catalogue A: 2024-01-15 10:30:45, M4.5, -41.50, 174.20
Catalogue B: 2024-01-15 10:30:47, M4.6, -41.51, 174.21

Time difference:    2 seconds   (< 60s threshold) ✓
Distance:           1.4 km      (< 50km threshold) ✓
Magnitude diff:     0.1         (< 0.5 threshold) ✓

Result: These are duplicates (same earthquake)
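The check above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the platform's implementation: the `QuakeEvent` and `MatchThresholds` shapes and the function names are hypothetical, and the haversine formula is assumed as the distance measure.

```typescript
// Hypothetical event shape; field names are illustrative, not the platform's schema.
interface QuakeEvent {
  time: string; // ISO 8601 origin time (UTC)
  latitude: number; // degrees
  longitude: number; // degrees
  magnitude: number;
}

interface MatchThresholds {
  timeSeconds: number;
  distanceKm: number;
  magnitudeDiff: number;
}

// Default thresholds as documented above.
const DEFAULT_THRESHOLDS: MatchThresholds = {
  timeSeconds: 60,
  distanceKm: 50,
  magnitudeDiff: 0.5,
};

const EARTH_RADIUS_KM = 6371; // mean Earth radius

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance using the haversine formula (an assumed choice).
function haversineKm(a: QuakeEvent, b: QuakeEvent): number {
  const sinHalfLat = Math.sin(toRadians(b.latitude - a.latitude) / 2);
  const sinHalfLon = Math.sin(toRadians(b.longitude - a.longitude) / 2);
  const h =
    sinHalfLat * sinHalfLat +
    Math.cos(toRadians(a.latitude)) *
      Math.cos(toRadians(b.latitude)) *
      sinHalfLon *
      sinHalfLon;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// All three criteria must hold for a pair to count as duplicates.
function isDuplicate(
  a: QuakeEvent,
  b: QuakeEvent,
  thresholds: MatchThresholds = DEFAULT_THRESHOLDS
): boolean {
  const timeDiffSeconds =
    Math.abs(Date.parse(a.time) - Date.parse(b.time)) / 1000;
  const magnitudeDiff = Math.abs(a.magnitude - b.magnitude);
  return (
    timeDiffSeconds <= thresholds.timeSeconds &&
    haversineKm(a, b) <= thresholds.distanceKm &&
    magnitudeDiff <= thresholds.magnitudeDiff
  );
}

// The two events from the example above.
const eventA: QuakeEvent = {
  time: "2024-01-15T10:30:45Z",
  latitude: -41.5,
  longitude: 174.2,
  magnitude: 4.5,
};
const eventB: QuakeEvent = {
  time: "2024-01-15T10:30:47Z",
  latitude: -41.51,
  longitude: 174.21,
  magnitude: 4.6,
};
console.log(isDuplicate(eventA, eventB)); // true
```

Passing a custom `thresholds` object is how stricter or looser matching would be expressed in this sketch.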

Merge Strategies

Choose the strategy that best fits your use case:

Strategy Decision Guide

        flowchart TD
    Start{"Do you want the<br/>best scientific result?"} -- YES --> Quality["Use Quality-Based<br/>(Recommended)"]
    Start -- NO --> Auth{"Do you have one<br/>authoritative source?"}

    Auth -- YES --> Priority["Use Priority-Based"]
    Auth -- NO --> Recent{"Do your duplicate events<br/>have different origin times<br/>and newer = more reliable?"}

    Recent -- YES --> Newest["Use Newest Data"]
    Recent -- NO --> Matter{"Which matters more?"}

    Matter -- "Metadata completeness" --> Complete["Use Most Complete"]
    Matter -- "Statistical accuracy" --> Average["Use Average Values"]
    

Priority-Based Strategy

How it works:

  • You designate one catalogue as “primary”

  • When duplicates are found, keep the primary catalogue’s event

  • Discard the duplicate from other catalogues

This approach follows the principle of network authority, where local networks are prioritized for regional events as recommended by Bondár & Storchak (2011).

Example:

Primary (GeoNet):   M4.5, depth 25 km, 42 phases
Secondary (USGS):   M4.6, depth 28 km, 15 phases

Result: Keep GeoNet event (M4.5, depth 25 km, 42 phases)

Best for:

  • Merging regional data with a trusted national catalogue

  • When one source has consistently better quality

  • Operational settings where one authority is preferred

Considerations:

  • Simple and predictable

  • May discard valid information from secondary sources

  • Assumes primary source is always correct
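As a rough sketch of the selection rule, assuming each event carries the name of its source catalogue and that catalogues missing from the priority list rank last (both are assumptions of this example, as are the type and function names):

```typescript
// Hypothetical shape: only the fields needed for the illustration.
interface CatalogueEvent {
  catalogue: string;
  magnitude: number;
  depthKm: number;
}

// Keep the event whose catalogue appears earliest in the priority list.
function selectByPriority(
  group: CatalogueEvent[],
  priorityOrder: string[]
): CatalogueEvent | undefined {
  const rank = (e: CatalogueEvent): number => {
    const index = priorityOrder.indexOf(e.catalogue);
    // Assumption: catalogues not listed sort after every listed one.
    return index === -1 ? priorityOrder.length : index;
  };
  let best: CatalogueEvent | undefined;
  for (const candidate of group) {
    if (best === undefined || rank(candidate) < rank(best)) {
      best = candidate;
    }
  }
  return best;
}

const duplicateGroup: CatalogueEvent[] = [
  { catalogue: "USGS", magnitude: 4.6, depthKm: 28 },
  { catalogue: "GeoNet", magnitude: 4.5, depthKm: 25 },
];
const kept = selectByPriority(duplicateGroup, ["GeoNet", "USGS"]);
console.log(kept ? kept.catalogue : "none"); // GeoNet
```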

        graph LR
    subgraph Inputs ["Duplicate Group"]
        E1["USGS (M4.6)"]
        E2["GeoNet (M4.5)"]
        E3["ISC (M4.5)"]
    end

    subgraph Logic ["Priority Logic"]
        P1["1. GeoNet (Primary)"]
        P2["2. USGS"]
        P3["3. ISC"]
    end

    Result["GeoNet Event Selected"]

    Inputs --> Logic
    Logic --> Result

    style E2 fill:#f9f,stroke:#333,stroke-width:2px
    style Result fill:#f9f,stroke:#333,stroke-width:2px
    

Average Values Strategy

How it works:

  • Compute a weighted-average location (lower uncertainty = higher weight)

  • Select the best magnitude using the ISC hierarchy (Mw > Ms > mb > ML)

  • Pick the depth with the lowest reported uncertainty

  • Use the earliest origin time across duplicates

  • Preserve metadata from the highest-quality source event

Statistical averaging and uncertainty propagation follow Bayesian principles for combining independent seismic observations (Schorlemmer et al., 2024).

Example:

Catalogue A: M4.5, depth 25 km
Catalogue B: M4.6, depth 28 km
Catalogue C: M4.4, depth 24 km

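Result: M4.5 (magnitude hierarchy), depth 25 km (lowest uncertainty)
        Both values come from Catalogue A here, assuming it reports the
        highest-ranked magnitude type and the smallest depth uncertainty.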
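The magnitude and depth rules can be sketched as two small selection functions. The `Solution` shape and function names are hypothetical, and the magnitude types and depth uncertainties in the sample data are invented for illustration, since the example above does not state them.

```typescript
// Preference order quoted in this section: Mw > Ms > mb > ML.
type MagnitudeType = "Mw" | "Ms" | "mb" | "ML";
const MAGNITUDE_RANK: Record<MagnitudeType, number> = {
  Mw: 0,
  Ms: 1,
  mb: 2,
  ML: 3,
};

// Hypothetical solution shape for illustration.
interface Solution {
  catalogue: string;
  magnitude: number;
  magnitudeType: MagnitudeType;
  depthKm: number;
  depthUncertaintyKm?: number;
}

// Pick the solution whose magnitude type ranks highest (expects a non-empty array).
function preferredMagnitude(solutions: Solution[]): Solution {
  return solutions.reduce((best, s) =>
    MAGNITUDE_RANK[s.magnitudeType] < MAGNITUDE_RANK[best.magnitudeType]
      ? s
      : best
  );
}

// Pick the solution with the smallest depth uncertainty; a missing
// uncertainty is treated as worst-case (an assumption of this sketch).
function bestConstrainedDepth(solutions: Solution[]): Solution {
  const uncertainty = (s: Solution): number =>
    s.depthUncertaintyKm === undefined ? Infinity : s.depthUncertaintyKm;
  return solutions.reduce((best, s) =>
    uncertainty(s) < uncertainty(best) ? s : best
  );
}

// Magnitude types and depth uncertainties below are invented for this example.
const duplicates: Solution[] = [
  { catalogue: "A", magnitude: 4.5, magnitudeType: "Mw", depthKm: 25, depthUncertaintyKm: 2 },
  { catalogue: "B", magnitude: 4.6, magnitudeType: "ML", depthKm: 28, depthUncertaintyKm: 6 },
  { catalogue: "C", magnitude: 4.4, magnitudeType: "mb", depthKm: 24, depthUncertaintyKm: 4 },
];
console.log(preferredMagnitude(duplicates).magnitude); // 4.5
console.log(bestConstrainedDepth(duplicates).depthKm); // 25
```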

Best for:

  • Combining multiple independent analyses

  • Research where statistical robustness matters

  • When no single source is clearly better

Considerations:

  • Reduces random errors through averaging

  • May blur genuine differences

  • Works best with similar-quality sources

Note

The Average strategy is actually a hybrid approach:

  • Location: Weighted average using the inverse of the reported horizontal uncertainty (lower uncertainty = higher weight).

  • Magnitude: Uses the Magnitude Hierarchy (Mw > Ms > mb > ML) rather than a simple mean to avoid saturation errors.

  • Depth: Selects the depth with the lowest reported uncertainty.
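The location step can be illustrated with a short sketch. It assumes each event exposes a single horizontal uncertainty in kilometres, weights each event by the inverse of that uncertainty, and averages Cartesian unit vectors so that longitudes on either side of ±180° combine correctly. The names are hypothetical and the platform's own code may differ.

```typescript
// Hypothetical shape for the illustration.
interface LocatedEvent {
  latitude: number; // degrees
  longitude: number; // degrees
  horizontalUncertaintyKm: number; // must be greater than zero
}

const DEG = Math.PI / 180;

// Weighted mean on the unit sphere: convert to unit vectors, sum with
// weights, then convert the resultant back to latitude and longitude.
function weightedMeanLocation(
  events: LocatedEvent[]
): { latitude: number; longitude: number } {
  let x = 0;
  let y = 0;
  let z = 0;
  for (const e of events) {
    const weight = 1 / e.horizontalUncertaintyKm;
    const lat = e.latitude * DEG;
    const lon = e.longitude * DEG;
    x += weight * Math.cos(lat) * Math.cos(lon);
    y += weight * Math.cos(lat) * Math.sin(lon);
    z += weight * Math.sin(lat);
  }
  return {
    latitude: Math.atan2(z, Math.sqrt(x * x + y * y)) / DEG,
    longitude: Math.atan2(y, x) / DEG,
  };
}

// Two solutions straddling the date line: an arithmetic mean of the
// longitudes would give 0°, on the wrong side of the planet.
const nearDateLine = weightedMeanLocation([
  { latitude: -17.0, longitude: 179.5, horizontalUncertaintyKm: 5 },
  { latitude: -17.0, longitude: -179.5, horizontalUncertaintyKm: 5 },
]);
console.log(nearDateLine.longitude); // close to ±180, not 0
```

With uncertainties of 2 km and 10 km, the first event receives five times the weight of the second under this scheme.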

        graph TD
    subgraph Sources ["Input Duplicates"]
        S1["Source A: M4.5, ±2km"]
        S2["Source B: M4.7, ±10km"]
    end

    subgraph Processing ["Hybrid Averaging"]
        Loc["Location: Weighted Mean<br/>(Source A weighted 5x)"]
        Mag["Magnitude: Hierarchy<br/>(Prefers Mw over ML)"]
        Dep["Depth: Best Uncertainty"]
    end

    Result["Merged Hybrid Event"]

    Sources --> Processing
    Processing --> Result
    

Newest Data Strategy

How it works:

  • Compare origin times across duplicate events

  • Keep the event with the latest origin time

  • Useful when the solution with the later origin time is the more reliable one; note that the comparison uses origin time, not a last-updated timestamp

Example:

Event A: Last updated 2024-01-15 (automatic solution)
Event B: Last updated 2024-01-20 (reviewed solution)

Result: Keep Event B (more recent, likely reviewed)

Best for:

  • Incorporating revised/reprocessed data

  • When recent analysis methods are preferred

  • Updating catalogues with final solutions

Considerations:

  • Assumes newer is better

  • May not work well if timestamps aren’t reliable

  • Good for refreshing operational catalogues

Most Complete Strategy

How it works:

  • Count the number of populated fields in each event

  • Keep the event with the most metadata

  • Preserves detailed quality information

Example:

Event A: time, lat, lon, depth, magnitude (5 fields)
Event B: time, lat, lon, depth, magnitude, uncertainty,
         phases, stations, azimuthal_gap (9 fields)

Result: Keep Event B (more complete metadata)
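A minimal sketch of the field count, assuming that `null`, `undefined` and empty strings count as unpopulated and that ties go to the first event encountered (both are assumptions; the names are hypothetical):

```typescript
// Generic record: any mapping of field name to value.
type EventRecord = { [field: string]: unknown };

// Assumption: null, undefined and "" all count as "not populated".
function isPopulated(value: unknown): boolean {
  return value !== null && value !== undefined && value !== "";
}

function countPopulatedFields(record: EventRecord): number {
  return Object.keys(record).filter((key) => isPopulated(record[key])).length;
}

// Keep the record with the most populated fields; the first one wins ties.
function selectMostComplete(group: EventRecord[]): EventRecord | undefined {
  let best: EventRecord | undefined;
  for (const candidate of group) {
    if (
      best === undefined ||
      countPopulatedFields(candidate) > countPopulatedFields(best)
    ) {
      best = candidate;
    }
  }
  return best;
}

// Values are placeholders; only the number of populated fields matters here.
const sparse: EventRecord = {
  time: "2024-01-15T10:30:45Z", lat: -41.5, lon: 174.2, depth: 25, magnitude: 4.5,
};
const detailed: EventRecord = {
  time: "2024-01-15T10:30:47Z", lat: -41.51, lon: 174.21, depth: 28, magnitude: 4.6,
  uncertainty: 1.2, phases: 42, stations: 18, azimuthal_gap: 95,
};
console.log(countPopulatedFields(sparse)); // 5
console.log(countPopulatedFields(detailed)); // 9
console.log(selectMostComplete([sparse, detailed]) === detailed); // true
```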

Best for:

  • Preserving detailed quality metrics

  • Combining sparse and detailed catalogues

  • Research requiring comprehensive metadata

Considerations:

  • More fields doesn’t always mean better data

  • May prefer verbose but lower-quality data

  • Good for maximizing available information

Quality-Based Strategy

The platform’s most advanced strategy uses a 100-point composite index to rank events. It evaluates quality across six dimensions, based on international standards for network performance and location accuracy (Bondár, 2004; Bondár & Storchak, 2011; Bormann, 2012):

  • Station Coverage (25 pts): Logarithmic scale (30+ stations = max points). Quality improvement is non-linear with station count (Bondár, 2004).

  • Azimuthal Gap (20 pts): Penalizes gaps > 180°; excellent if < 120°; zero points above 270°.

  • Location Precision (15 pts): Based on Standard Error / RMS residuals (ISC standard: < 0.3s is excellent).

  • Magnitude Uncertainty (15 pts): Lower uncertainty yields higher scores.

  • Magnitude Type (15 pts): Preferred order Mw > Ms > mb > ML > Md (Storchak et al., 2013).

  • Review Status (10 pts): “Reviewed” or “Final” status adds points over “Preliminary” solutions.
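To make the point budget concrete, here is an illustrative scoring sketch. Only the maximum points per dimension, the 30-station saturation, the 120° and 270° gap limits, the 0.3 s RMS mark and the magnitude-type order come from the list above. Everything else (linear ramps, the 2.0 s and 0.5 cut-offs, even spacing of magnitude-type points, zero points for missing values) is an assumption, so scores from this sketch will not match the platform's.

```typescript
// Hypothetical inputs; missing values score zero in this sketch.
interface QualityInputs {
  stationCount?: number;
  azimuthalGapDeg?: number;
  rmsSeconds?: number;
  magnitudeUncertainty?: number;
  magnitudeType?: string;
  reviewed?: boolean;
}

function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Preference order from the list above; even point spacing is assumed.
const MAGNITUDE_ORDER = ["Mw", "Ms", "mb", "ML", "Md"];

function qualityScore(q: QualityInputs): number {
  // Station coverage (25 pts): logarithmic, saturating at 30 stations.
  const stations =
    q.stationCount !== undefined && q.stationCount >= 1
      ? 25 * clamp01(Math.log(q.stationCount) / Math.log(30))
      : 0;
  // Azimuthal gap (20 pts): full marks up to 120°, zero from 270° (linear ramp assumed).
  const gap =
    q.azimuthalGapDeg !== undefined
      ? 20 * clamp01((270 - q.azimuthalGapDeg) / 150)
      : 0;
  // Location precision (15 pts): full marks up to 0.3 s RMS, zero at 2.0 s (assumed).
  const rms =
    q.rmsSeconds !== undefined
      ? 15 * clamp01((2.0 - q.rmsSeconds) / 1.7)
      : 0;
  // Magnitude uncertainty (15 pts): zero at 0.5 or more (assumed).
  const magUncertainty =
    q.magnitudeUncertainty !== undefined
      ? 15 * clamp01(1 - q.magnitudeUncertainty / 0.5)
      : 0;
  // Magnitude type (15 pts): Mw 15, Ms 12, mb 9, ML 6, Md 3 (assumed spacing).
  const rank =
    q.magnitudeType !== undefined
      ? MAGNITUDE_ORDER.indexOf(q.magnitudeType)
      : -1;
  const magType =
    rank === -1
      ? 0
      : (15 * (MAGNITUDE_ORDER.length - rank)) / MAGNITUDE_ORDER.length;
  // Review status (10 pts): all or nothing in this sketch.
  const review = q.reviewed === true ? 10 : 0;
  return stations + gap + rms + magUncertainty + magType + review;
}

// Keep the highest-scoring candidate in a duplicate group.
function selectHighestQuality<T extends QualityInputs>(group: T[]): T | undefined {
  let best: T | undefined;
  let bestScore = -1;
  for (const candidate of group) {
    const score = qualityScore(candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

Keeping each dimension as its own term makes it easy to swap in a different curve for one dimension without touching the others.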

        graph TD
    subgraph Group ["Duplicate Candidates"]
        C1["Event 1: 15 stations, Gap 210°"]
        C2["Event 2: 45 stations, Gap 95°"]
    end

    subgraph Scoring ["Quality Engine"]
        S1["Event 1 Score: 45/100"]
        S2["Event 2 Score: 88/100"]
    end

    Winner["Event 2 Selected"]

    Group --> Scoring
    Scoring --> Winner

    style C2 fill:#4CAF50,color:white
    style Winner fill:#4CAF50,color:white
    

How Metadata is Merged

Beyond the primary fields (time, location, magnitude), the platform performs a Field-Level Union to ensure the merged catalogue is as comprehensive as possible.

  1. Gap Filling: If the selected primary record is missing a field (e.g., azimuthal gap or phase count) but a secondary record has it, the platform fills the gap from the highest-quality secondary source.

  2. Rich Data Preservation: Complex data types like Picks, Arrivals, and Station Magnitudes are preserved through a ranked inheritance system.

  3. Focal Mechanism Selection: The platform automatically selects the best focal mechanism across all duplicate sources based on a hierarchy (GCMT > GeoNet > USGS > ISC) and quality metrics including station polarity count and misfit values.
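The gap-filling step can be sketched as follows, assuming the secondary records arrive already sorted from highest to lowest quality and that `null`, `undefined` and empty strings count as missing. The names are hypothetical; ranked inheritance of picks and focal mechanism selection are not covered here.

```typescript
type Fields = { [field: string]: unknown };

// Assumption: null, undefined and "" all count as missing.
function isMissing(value: unknown): boolean {
  return value === null || value === undefined || value === "";
}

// Start from the winning record and fill each missing field from the
// first (highest-ranked) secondary record that has a value for it.
function fillGaps(primary: Fields, rankedSecondaries: Fields[]): Fields {
  const merged: Fields = {};
  for (const key of Object.keys(primary)) {
    merged[key] = primary[key];
  }
  for (const secondary of rankedSecondaries) {
    for (const key of Object.keys(secondary)) {
      if (isMissing(merged[key]) && !isMissing(secondary[key])) {
        merged[key] = secondary[key];
      }
    }
  }
  return merged;
}

const winner: Fields = { magnitude: 4.5, depthKm: 25, azimuthalGap: null };
const filled = fillGaps(winner, [
  { magnitude: 4.6, azimuthalGap: 95 },
  { magnitude: 4.4, azimuthalGap: 140, phaseCount: 15 },
]);
console.log(filled);
// { magnitude: 4.5, depthKm: 25, azimuthalGap: 95, phaseCount: 15 }
```

Values already present on the winning record are never overwritten, which keeps the selected strategy in control of the core parameters.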

Advanced Quality Control

The platform performs several advanced validation checks during the merge process to prevent “over-matching” or physical inconsistencies:

  • Group Size Gate: Prevents merging groups larger than 15 events, which usually indicates a threshold setting that is too loose.

  • Spatial Spread Analysis: For groups of 4 or more events, the platform calculates the spatial spread. If it exceeds the magnitude-scaled threshold (100 km for M < 5, 150 km for 5 ≤ M < 6, 200 km for M ≥ 6), the group is rejected and each event is kept as a separate unique event.

  • Network Mismatch: If the same network reports two different events in the same group, the platform identifies these as likely distinct events (e.g., foreshock/aftershock) and prevents them from being merged.
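The three gates above could be expressed roughly as below. The order of the checks, the use of the largest magnitude in the group to pick the spread limit, and all names are assumptions of this sketch; the spatial spread is taken as a precomputed input.

```typescript
// Hypothetical member shape: reporting network and magnitude only.
interface GroupMember {
  network: string;
  magnitude: number;
}

// Magnitude-scaled spread limit from the list above.
function spreadLimitKm(magnitude: number): number {
  if (magnitude < 5) return 100;
  if (magnitude < 6) return 150;
  return 200;
}

// Returns a rejection reason, or null when the group passes all checks.
function validateGroup(
  members: GroupMember[],
  spatialSpreadKm: number
): string | null {
  if (members.length > 15) return "group-size";

  // Assumption: the largest magnitude in the group selects the limit.
  const largest = members.reduce(
    (max, m) => Math.max(max, m.magnitude),
    -Infinity
  );
  if (members.length >= 4 && spatialSpreadKm > spreadLimitKm(largest)) {
    return "spatial-spread";
  }

  // The same network reporting twice suggests two distinct events.
  const seen: { [network: string]: boolean } = Object.create(null);
  for (const m of members) {
    if (seen[m.network]) return "network-mismatch";
    seen[m.network] = true;
  }
  return null;
}

console.log(
  validateGroup(
    [
      { network: "GeoNet", magnitude: 4.5 },
      { network: "GeoNet", magnitude: 4.1 },
    ],
    3
  )
); // "network-mismatch"
```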

Scientific Accuracy Features

The platform includes several specialized algorithms to ensure seismological rigour:

  • Latitude-Aware Spatial Indexing: The search grid adjusts its cell dimensions based on latitude to maintain consistent distance thresholds near the poles and the equator.

  • Date Line Normalisation: Merging events near the International Date Line (±180°) uses Cartesian unit-vector averaging to avoid the errors that a simple arithmetic mean of longitudes produces.

  • Uncertainty-Weighted Locations: When averaging locations, the platform weights the result by the inverse of the reported horizontal uncertainty (geometric mean of latitude/longitude errors).

  • Regional Authority Hierarchy: The platform recognizes regional boundaries. For example, it automatically prioritizes GeoNet for events within New Zealand and JMA for events in Japan. This preference is supported by regional quality assessments that show local network superiority for inland and near-shore events (Warren-Smith et al., 2025).
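As an illustration of latitude-aware indexing, the sketch below sizes a grid cell so that it spans roughly the same ground distance at any latitude. The constant, the cosine floor near the poles and the function name are assumptions; the platform's actual grid construction is not described here.

```typescript
const KM_PER_DEGREE = 111.32; // approximate length of one degree of latitude

interface CellSize {
  latDeg: number;
  lonDeg: number;
}

// A degree of longitude shrinks with cos(latitude), so the cell must be
// wider in degrees at high latitudes to cover the same distance in km.
function cellSizeForLatitude(latitude: number, thresholdKm: number): CellSize {
  const latDeg = thresholdKm / KM_PER_DEGREE;
  // Floor the cosine so cells do not grow without bound at the poles.
  const cosLat = Math.max(Math.cos((latitude * Math.PI) / 180), 0.01);
  const lonDeg = Math.min(360, latDeg / cosLat);
  return { latDeg, lonDeg };
}

const atEquator = cellSizeForLatitude(0, 50);
const atSixty = cellSizeForLatitude(60, 50);
console.log(atSixty.lonDeg / atEquator.lonDeg); // about 2
```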

Merge Process

Step 1: Navigate to Merge Page

Click Merge in the navigation menu or go to /merge.

Step 2: Select Source Catalogues

Select two or more catalogues to merge:

Available Catalogues:
☑ GeoNet - New Zealand 2024      (15,432 events)
☑ USGS - Southwest Pacific       (3,241 events)
☑ Local Network Data             (8,756 events)
☐ Historical Catalogue 1990-2000 (45,123 events)

Selected: 3 catalogues, 27,429 total events

Tip

Start with 2-3 catalogues. For complex merges, consider an iterative approach (merge two first, then add more).

Step 3: Configure Matching Rules

Set thresholds for duplicate detection. These thresholds are adaptively scaled based on event magnitude and depth (Tanaka et al., 2022), as larger earthquakes typically have larger location and timing uncertainties in global reports (Benz et al., 2019):

Time Window

Default: ± 60 seconds

Stricter: ± 30 seconds (fewer false matches)
Looser:   ± 120 seconds (catch more duplicates)

Distance Threshold

Default: 50 km

Stricter: 25 km (regional, well-located events)
Looser:   100 km (global, poorly-located events)

Magnitude Difference

Default: 0.5

Stricter: 0.3 (same magnitude scale)
Looser:   1.0 (different magnitude scales)

Threshold Guidelines:

Scenario                 Time      Distance    Magnitude
High-quality regional    ± 30s     25 km       0.3
Standard national        ± 60s     50 km       0.5
Global catalogues        ± 120s    100 km      0.5
Historical data          ± 180s    150 km      1.0
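If you keep merge settings in code or configuration for reproducibility, the guideline values above could be recorded as presets. The object shape and key names here are hypothetical and are not a platform configuration format.

```typescript
interface MatchThresholds {
  timeSeconds: number;
  distanceKm: number;
  magnitudeDiff: number;
}

type Scenario =
  | "highQualityRegional"
  | "standardNational"
  | "globalCatalogues"
  | "historicalData";

// Values copied from the threshold guidelines above.
const THRESHOLD_PRESETS: Record<Scenario, MatchThresholds> = {
  highQualityRegional: { timeSeconds: 30, distanceKm: 25, magnitudeDiff: 0.3 },
  standardNational: { timeSeconds: 60, distanceKm: 50, magnitudeDiff: 0.5 },
  globalCatalogues: { timeSeconds: 120, distanceKm: 100, magnitudeDiff: 0.5 },
  historicalData: { timeSeconds: 180, distanceKm: 150, magnitudeDiff: 1.0 },
};

console.log(THRESHOLD_PRESETS.globalCatalogues.distanceKm); // 100
```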

Step 4: Choose Merge Strategy

Select your conflict resolution strategy:

  • Quality-Based (Recommended) - Scores each duplicate event 0–100 and keeps the highest-scoring one (station count, azimuthal gap, RMS, magnitude uncertainty, magnitude type, review status).

  • Priority-Based - Select a primary catalogue; its events always win.

  • Average Values - Computes a weighted-average location, applies magnitude hierarchy, and picks the lowest-uncertainty depth.

  • Newest Data - Keeps the event with the latest origin time.

  • Most Complete - Keeps the event with the most populated fields.

Note

Regardless of the strategy chosen, the platform always applies date line normalisation and validation gates. Magnitude hierarchy and depth uncertainty selection are specific to the Average strategy. The strategy controls which event’s core parameters win when duplicates are resolved.

Tip

Use Quality-Based for scientific work — it selects the most reliable origin automatically. Use Priority-Based when you have a single authoritative source (e.g., always prefer GeoNet for New Zealand events).

Note

During the merge process, different magnitude scales (ML, mb, Ms) are automatically converted to a common Moment Magnitude (Mw) scale using the empirical relationships of Scordilis (2006) to ensure comparability across catalogues.

Step 5: Configure Priority (if applicable)

If using Priority-Based strategy, rank your catalogues:

Priority Order:
1. GeoNet - New Zealand 2024     (highest priority)
2. Local Network Data
3. USGS - Southwest Pacific      (lowest priority)

Step 6: Name the Merged Catalogue

Provide a descriptive name:

Good names:
- "NZ Combined Catalogue 2024 (GeoNet + USGS)"
- "Canterbury Region - All Sources 2020-2024"
- "Research Catalogue v2 - Priority Merged"

Avoid:
- "merged"
- "test123"

Step 7: Execute Merge

Click Merge Catalogues to begin processing.

Processing Steps:

  1. Load events from all source catalogues

  2. Build spatial grid index for efficient geographic lookups

  3. Find candidate duplicates within time and distance windows

  4. Apply distance and magnitude criteria

  5. Resolve conflicts using selected strategy

  6. Record provenance for all events

  7. Calculate quality scores for merged events

  8. Generate summary statistics

Progress Display:

Merging catalogues...
[████████████████████░░░░░░░░░░░░] 65%

Loaded:     27,429 events from 3 catalogues
Candidates: 1,247 potential duplicate groups
Processing: Group 812 of 1,247

Merge Results

After completion, review the summary:

Merge Complete!
===============

Source Catalogues:     3
Total Input Events:    27,429

Duplicate Analysis:
-------------------
Unique Events:         24,891 (retained)
Duplicate Groups:      1,269
Total Duplicates:      2,538 (resolved)

By Source:
- GeoNet:              15,432 events → 14,210 unique
- USGS:                3,241 events  → 2,891 unique
- Local Network:       8,756 events  → 7,790 unique

Final Catalogue:       24,891 events

Processing Time:       12.5 seconds

Detailed Statistics

View additional merge statistics:

  • Duplicate size distribution: How many events per duplicate group

  • Match criteria breakdown: Which criteria matched

  • Source contribution: Events from each catalogue

  • Quality impact: How quality scores changed

Source Tracking

Provenance Metadata

Every event in the merged catalogue includes:

  • source_catalogue_id: Original catalogue identifier

  • source_event_id: Original event ID

  • merge_strategy: How conflicts were resolved

  • duplicate_sources: Other catalogues with matching events

  • merge_timestamp: When the merge was performed
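For readers consuming these fields programmatically, a possible TypeScript typing is sketched below. The field names come from the list above and the strategy names follow the ones used elsewhere in this guide; the value types and the shape of `duplicate_sources` are assumptions.

```typescript
// Field names are from the list above; value types are assumed.
interface MergeProvenance {
  source_catalogue_id: string;
  source_event_id: string;
  merge_strategy: "quality" | "priority" | "average" | "newest" | "complete";
  // Assumed shape: one entry per other catalogue holding a matching event.
  duplicate_sources: Array<{ catalogue: string; eventId: string }>;
  merge_timestamp: string; // ISO 8601, UTC
}

const sampleProvenance: MergeProvenance = {
  source_catalogue_id: "GeoNet - New Zealand 2024",
  source_event_id: "2024p123456",
  merge_strategy: "priority",
  duplicate_sources: [
    { catalogue: "USGS - Southwest Pacific", eventId: "us7000abc1" },
    { catalogue: "Local Network Data", eventId: "local-2024-0451" },
  ],
  merge_timestamp: "2024-01-20T14:35:22Z",
};

console.log(sampleProvenance.duplicate_sources.length); // 2
```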

Viewing Provenance

In the event detail view:

Event: 2024-01-15 10:30:45 M4.5

Source Information:
-------------------
Primary Source:  GeoNet - New Zealand 2024
Original ID:     2024p123456

Also found in:
- USGS - Southwest Pacific (ID: us7000abc1)
- Local Network Data (ID: local-2024-0451)

Merge Strategy:  Priority-Based (GeoNet primary)
Merged On:       2024-01-20 14:35:22 UTC

Best Practices

Before Merging

  1. Review source catalogues:

    • Check time coverage overlap

    • Verify geographic coverage

    • Compare event counts for same periods

  2. Understand magnitude scales:

    • ML (local) vs. Mw (moment) differ systematically

    • Consider magnitude conversions before merging

  3. Check data quality:

    • Review quality distributions

    • Note any known issues

Threshold Selection

Start conservative, then loosen:

  1. Begin with strict thresholds (30s, 25km, 0.3)

  2. Run merge and review matched pairs

  3. If too many missed duplicates, loosen thresholds

  4. If too many false matches, tighten thresholds

Document your choices:

Keep a record of threshold values and reasoning for reproducibility.

Quality Assurance

After merging:

  1. Spot-check matched pairs:

    • Review some duplicate groups manually

    • Verify they’re truly the same event

  2. Check edge cases:

    • Events near threshold boundaries

    • Very large or very small events

  3. Compare statistics:

    • Event counts by magnitude

    • Temporal distribution

    • Spatial patterns

Advanced Features

Filtered Merging

Merge subsets of catalogues:

  1. Export filtered events from each catalogue

  2. Upload filtered data as new catalogues

  3. Merge the filtered catalogues

Example: Merge only M4+ events from regional catalogues:

1. Export GeoNet M≥4 events → "GeoNet_M4plus"
2. Export USGS M≥4 events → "USGS_M4plus"
3. Merge these filtered catalogues

Iterative Merging

For complex multi-source merges:

Stage 1: GeoNet + Local Network
         (Priority: GeoNet)
         → "NZ_National_Regional"

Stage 2: NZ_National_Regional + USGS
         (Priority: NZ_National_Regional)
         → "NZ_Comprehensive"

Stage 3: NZ_Comprehensive + Historical
         (Strategy: Newest Data)
         → "NZ_Complete_1900-2024"

Benefits:

  • Better control over conflict resolution

  • Easier to troubleshoot issues

  • Can use different strategies at each stage

Re-merging with Updated Data

When source catalogues are updated:

  1. Delete the old merged catalogue

  2. Re-run merge with same parameters

  3. Quality scores are recalculated automatically

Troubleshooting

Too Many Duplicates Found

Symptoms: High duplicate count, unexpected matches

Solutions:

  1. Tighten time window (try ± 30s)

  2. Reduce distance threshold (try 25 km)

  3. Reduce magnitude threshold (try 0.3)

  4. Review matched pairs for false positives

Too Few Duplicates Found

Symptoms: Expected duplicates not matched

Solutions:

  1. Loosen time window (try ± 120s)

  2. Increase distance threshold (try 100 km)

  3. Increase magnitude threshold (try 1.0)

  4. Check for systematic time or location offsets

Merge Takes Too Long

Symptoms: Processing stalls or times out

Solutions:

  1. Merge fewer catalogues at once

  2. Filter to smaller event sets first

  3. Increase Node.js memory allocation (for example, by raising the V8 heap limit with the --max-old-space-size option)

  4. Run during off-peak hours

Unexpected Results

Symptoms: Merged catalogue has incorrect data

Solutions:

  1. Verify priority order is correct

  2. Check that strategy matches your intent

  3. Review source catalogue data quality

  4. Try a different merge strategy

Testing and Validation

The merging algorithms are rigorously tested to ensure data integrity and accurate conflict resolution. The core test suite (located in __tests__/lib/merge.test.ts) covers:

  • Spatial Indexing & Grid Operations: Validates geographic bounds handling, including complex Date Line crossing scenarios.

  • Adaptive Threshold Matching: Ensures distance and time thresholds scale appropriately across magnitude and depth ranges.

  • Merge Strategies: Verifies the correct behavior of the quality, priority, average, newest, and complete merge strategies.

  • Validation of Event Groups: Ensures anomalous clusters (e.g., highly divergent depths or magnitudes) are correctly flagged and handled.

  • Magnitude Hierarchy: Validates that standard magnitude scales are correctly prioritized (e.g., Mw over ML).

References

The merging algorithms and conflict resolution strategies in this platform are based on established seismological literature:

  • Warren-Smith, E., et al. (2025). A quantitative assessment of GeoNet earthquake location quality in Aotearoa New Zealand. New Zealand Journal of Geology and Geophysics. (Regional network performance and station thresholds).

  • Bondár, I. (2004). Epicentre Accuracy Based on Seismic Network Criteria. Geophysical Journal International. (Network geometry and station count requirements).

  • Bormann, P., Ed. (2012). IASPEI New Manual of Seismological Observatory Practice (NMSOP-2). Deutsches GeoForschungsZentrum GFZ. (Standardized quality metrics and reporting).

  • Bondár, I., & Storchak, D. A. (2011). Improved Location Procedures at the International Seismological Centre. Geophysical Journal International. (Quality scoring and azimuthal gap/RMS thresholds).

  • Storchak, D. A., et al. (2013). Public Release of the ISC-GEM Global Instrumental Earthquake Catalogue (1900-2009). Seismological Research Letters. (Parameter selection and magnitude hierarchy).

  • Benz, H. M., et al. (2019). Improving Automated Earthquake Association with NEIC Hydra. Bulletin of the Seismological Society of America. (Graph-theoretic event association and group validation).

  • Tanaka, M., et al. (2022). Discrimination of Seismic Catalogue Duplicates During Aftershock Sequences Using the Nearest-Neighbour Method. Frontiers in Earth Science. (Adaptive thresholds for dense sequences).

  • Schorlemmer, D., et al. (2024). A Bayesian Merging of Earthquake Magnitudes from Multiple Networks. Seismological Research Letters. (Principles of magnitude averaging).

  • Scordilis, E. M. (2006). Empirical Global Relations Converting Ms and mb to Moment Magnitude. Journal of Seismology. (Magnitude conversion formulas).