Storage and query engine
All pack data is stored as Parquet files. The query engine is DuckDB with
the httpfs extension, which allows DuckDB to read Parquet
directly from object storage (R2 / S3) without staging a full file locally.
Predicate pushdown applies at the Parquet reader level — row groups that
don't match the query filter are skipped before data is transferred.
For a typical filtered query (location, time range, metric), only the relevant row groups are read. This is why local mode is 2x to 6x faster than hosted for the same query: the Parquet files are on local disk and there is no network round trip for each row group fetch.
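The row-group skipping idea can be sketched in a few lines. This is an illustration of the pruning logic, not DuckDB's implementation: each Parquet row group carries per-column min/max statistics, and the stats below are made up for the example.

```python
# Each Parquet row group carries min/max statistics per column; groups
# whose stats cannot match the filter are skipped before any row data
# is transferred. Hypothetical stats, for illustration only.
row_groups = [
    {"id": 0, "ts_min": "2024-01-01", "ts_max": "2024-03-31"},
    {"id": 1, "ts_min": "2024-04-01", "ts_max": "2024-06-30"},
    {"id": 2, "ts_min": "2024-07-01", "ts_max": "2024-09-30"},
]

def groups_to_read(groups, ts_from, ts_to):
    """Keep only row groups whose [min, max] range overlaps the query window."""
    return [g["id"] for g in groups
            if g["ts_max"] >= ts_from and g["ts_min"] <= ts_to]

print(groups_to_read(row_groups, "2024-05-15", "2024-08-01"))  # [1, 2]
```

Over object storage, each skipped group is a range request that never happens, which is exactly where the local-disk advantage comes from.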
Catalog loading
On startup, the runtime reads a catalog file that maps pack identifiers to their Parquet paths, schema version, coverage metadata, and access tier. The active catalog determines what the runtime can see. Hosted, local, and self-hosted deployments all use the same catalog format — the difference is which catalog is loaded and what paths it points to.
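A catalog entry can be pictured as a small record per pack. The field names below are illustrative, not the runtime's actual schema; the point is that one loaded mapping defines everything the engine can see.

```python
import json
from dataclasses import dataclass

# Sketch of a catalog entry as described above; field names are
# illustrative, not the runtime's actual schema.
@dataclass
class CatalogEntry:
    pack_id: str
    parquet_path: str   # local disk path or object-storage URL
    schema_version: int
    coverage: dict      # e.g. time range, geography
    tier: str           # access tier, e.g. "free" or "paid"

def load_catalog(raw: str) -> dict:
    """Parse a catalog file into a pack_id -> entry mapping."""
    return {e["pack_id"]: CatalogEntry(**e) for e in json.loads(raw)}

catalog = load_catalog(json.dumps([{
    "pack_id": "earthquakes",
    "parquet_path": "s3://packs/earthquakes/part-0.parquet",
    "schema_version": 2,
    "coverage": {"from": "1970-01-01"},
    "tier": "free",
}]))
print(catalog["earthquakes"].parquet_path)
```

Because hosted, local, and self-hosted deployments share this format, switching deployments is a matter of pointing the loader at a different file.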
Discovery endpoints (GET /api/v1/catalog and
GET /api/v1/packs/{pack_id}) read from the loaded catalog and
return structured metadata. No query execution happens at discovery time.
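The discovery handlers can be sketched as pure reads over the loaded catalog. Handler names and response fields here are illustrative; the only claim carried over from the text is that no DuckDB query runs at discovery time.

```python
# Both discovery endpoints read the loaded catalog and return metadata
# only; no query execution happens here. Fields are illustrative.
CATALOG = {
    "earthquakes": {"schema_version": 2, "tier": "free"},
    "currency": {"schema_version": 1, "tier": "paid"},
}

def get_catalog():
    """GET /api/v1/catalog: list every visible pack."""
    return {"packs": sorted(CATALOG)}

def get_pack(pack_id):
    """GET /api/v1/packs/{pack_id}: metadata for one pack, or a 404."""
    entry = CATALOG.get(pack_id)
    return {"status": 404} if entry is None else {"status": 200, **entry}

print(get_catalog())  # {'packs': ['currency', 'earthquakes']}
```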
Query execution pipeline
A query arrives at POST /api/v1/query/dataset with a pack
identifier, filters (location, time range, metrics), and a row limit.
The pipeline:
- Validate the request against the catalog entry for the named pack.
- Check the access tier. Free packs proceed; paid packs return HTTP 402 with a payment challenge before execution.
- Translate the filters to a DuckDB SQL query with predicate pushdown on the relevant Parquet partition.
- Execute against the Parquet path in the catalog (local disk, or object storage via httpfs).
- Apply the row limit and return structured rows.
Requests that exceed the source maximum are rejected before the payment challenge — the engine does not charge for a query it will not execute.
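The pipeline above can be sketched as a planning function. Everything here is illustrative: the cap, the 422 rejection code, and the helper name are assumptions, and the real runtime executes the generated SQL through DuckDB rather than returning it. The sketch shows the ordering that matters: limit check, then tier check, then SQL generation.

```python
SOURCE_MAX_ROWS = 10_000  # hypothetical per-source cap

def plan_query(entry, loc_id, ts_from, ts_to, metrics, limit):
    # Reject over-limit requests before any payment challenge is issued.
    if limit > SOURCE_MAX_ROWS:
        return {"status": 422, "error": "row limit exceeds source maximum"}
    # Paid packs return a payment challenge before execution.
    if entry["tier"] == "paid":
        return {"status": 402, "challenge": "payment required"}
    # Filters become WHERE clauses DuckDB can push down to the Parquet reader.
    sql = (
        f"SELECT {', '.join(metrics)} "
        f"FROM read_parquet('{entry['parquet_path']}') "
        f"WHERE loc_id = ? AND ts BETWEEN ? AND ? "
        f"LIMIT {limit}"
    )
    return {"status": 200, "sql": sql, "params": [loc_id, ts_from, ts_to]}

plan = plan_query({"tier": "free", "parquet_path": "packs/earthquakes.parquet"},
                  "US-CA", "2024-01-01", "2024-02-01", ["magnitude"], 100)
print(plan["sql"])
```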
MCP routing
The MCP server at /mcp is a thin wrapper over the same
discovery and query pipeline. tools/list returns one tool per
pack from the active catalog. Tool calls translate to the same internal
query path as direct HTTP calls. There is no separate data layer behind MCP
— the same Parquet files, the same DuckDB engine, the same access tier
checks.
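The tools/list derivation can be sketched as a projection of the active catalog. The tool-naming convention and fields below are made up; the point from the text is that the tool list is generated from the same catalog that backs the HTTP endpoints, with no separate data layer.

```python
# tools/list derives one tool per pack from the active catalog; a tool
# call funnels into the same internal query path as a direct HTTP call.
# Tool names and fields here are illustrative.
def tools_list(catalog):
    return [
        {"name": f"query_{pack_id}", "description": entry["description"]}
        for pack_id, entry in sorted(catalog.items())
    ]

catalog = {
    "currency": {"description": "Daily FX rates"},
    "earthquakes": {"description": "Global seismic events"},
}
print([t["name"] for t in tools_list(catalog)])  # ['query_currency', 'query_earthquakes']
```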
Pack-specific facades (/mcp/currency,
/mcp/earthquakes, etc.) expose a single-pack tool list. These
exist for registry discoverability; they do not change the runtime
architecture.
Geography layer
Every pack normalizes location to loc_id, a shared geography
key. This key is hierarchical: country, region, and sub-region identifiers
share a prefix scheme so queries can match at any level without joining
separate geometry tables. See the loc_id guide
for the full reference.
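The hierarchical matching property can be sketched with string prefixes. The identifiers below are invented for the example; the real key format is defined in the loc_id guide.

```python
# Sketch of hierarchical prefix matching on loc_id: a query at any
# level matches itself and everything beneath it, with no join against
# a separate geometry table. Identifiers here are made up.
LOCS = ["US", "US-CA", "US-CA-SF", "US-NY", "FR", "FR-IDF"]

def match(loc_ids, query):
    return [l for l in loc_ids if l == query or l.startswith(query + "-")]

print(match(LOCS, "US-CA"))  # ['US-CA', 'US-CA-SF']
print(match(LOCS, "FR"))     # ['FR', 'FR-IDF']
```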
Pack-specific region identifiers (ocean basin codes for tsunamis, XOO for international waters) extend the base scheme rather than replacing it.
Pack release pipeline
A source data sheet specifies the raw source, schema mapping, field normalization, and QA requirements for a pack. A pack builder converts the source to Parquet against that spec, validates the output, and produces a catalog entry. The catalog entry is what the runtime loads — the pack does not exist to the engine until it appears in the catalog at or above the minimum release tier.
Release tiers (core, standard, experimental, private) signal quality and availability. The runtime applies the same query path regardless of tier; tier controls access and discovery visibility, not execution behavior.
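Tier gating at catalog load time can be sketched as a simple filter. The numeric ordering below is an assumption inferred from the listed tier order; the tier names come from the text, and the claim carried over is that tier affects visibility, never execution.

```python
# Sketch of release-tier gating: a pack exists to the engine only if
# its tier meets the minimum release tier. The rank values are an
# assumed ordering, not the runtime's actual configuration.
TIER_RANK = {"core": 3, "standard": 2, "experimental": 1, "private": 0}

def visible_packs(entries, minimum="standard"):
    floor = TIER_RANK[minimum]
    return [e["pack_id"] for e in entries if TIER_RANK[e["tier"]] >= floor]

entries = [
    {"pack_id": "currency", "tier": "core"},
    {"pack_id": "tsunamis", "tier": "experimental"},
]
print(visible_packs(entries))                  # ['currency']
print(visible_packs(entries, "experimental"))  # ['currency', 'tsunamis']
```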