Skip to content

Inspecting the database

This is a reference page for understanding what CHAP stored. It is not a required workshop step.

Two PostgreSQL databases sit behind the stack: chap-core's (models, runs, and the data they used) and DHIS2's (all DHIS2 metadata and data). The APIs are usually the better way in, but raw SQL is handy for digging into a run or answering a question the API does not.

Database Service Host port DB name User / password
chap-core chap-postgres 15433 chap_core chap / chap
DHIS2 dhis2-db 15432 dhis dhis / dhis

Both are bound to 127.0.0.1 (the bundled stack; ports are overridable with CHAP_DB_PORT / DHIS2_DB_PORT).

Connecting

A. Into the container (no host tools needed)

Open a psql prompt inside the database container - works even if you have no SQL client installed:

# chap database
docker compose -f compose.chapkit.yml exec chap-postgres psql -U chap -d chap_core
# DHIS2 database
docker compose -f compose.chapkit.yml exec dhis2-db psql -U dhis -d dhis

If you run CHAP from source, the chap database service is postgres instead (DHIS2 is unchanged):

docker compose -f compose.yml -f compose.chapkit.yml exec postgres psql -U chap -d chap_core

At the prompt, \dt lists tables and \q quits. Run a one-off query without the prompt by adding -c "<sql>".

B. From the host (psql or a GUI)

The databases are published on 127.0.0.1, so a local client can connect directly:

psql -h 127.0.0.1 -p 15433 -U chap chap_core   # chap   (password: chap)
psql -h 127.0.0.1 -p 15432 -U dhis dhis         # DHIS2  (password: dhis)

Or point a GUI such as DBeaver or pgAdmin at host 127.0.0.1 and those ports.

C. pgAdmin (a web GUI)

If you prefer clicking to typing, run pgAdmin - a web UI for Postgres - on the stack's network so it can reach the databases by service name:

docker run -d --name dac-pgadmin --network docker-dhis2-core_default -p 5050:80 \
  -e PGADMIN_DEFAULT_EMAIL=admin@example.com \
  -e PGADMIN_DEFAULT_PASSWORD=admin \
  -e PGADMIN_CONFIG_SERVER_MODE=False \
  dpage/pgadmin4

Open http://localhost:5050, set a master password when prompted, then Add New Server. On the Connection tab use the service name as the host (pgAdmin is on the same Docker network):

Field chap DHIS2
Host name/address chap-postgres dhis2-db
Port 5432 5432
Maintenance database chap_core dhis
Username / Password chap / chap dhis / dhis

pgAdmin: registering the chap database

Once connected, open Tools -> Query Tool to run SQL (below), or expand the tree to Databases -> chap_core -> Schemas -> public -> Tables and right-click a table -> View/Edit Data to browse it without writing any SQL.

pgAdmin: querying the chap database

Stop pgAdmin when you are done with docker rm -f dac-pgadmin.

What's in the chap database

docker compose -f compose.chapkit.yml exec chap-postgres psql -U chap -d chap_core -c '\dt'

The tables that matter most:

  • backtest, backtestforecast - evaluations and their predicted values
  • prediction, predictionsamplesentry - forecasts and their samples
  • dataset, observation - the data a run used
  • configuredmodeldb, modeltemplatedb - the models and templates

A few example queries (add them to the exec … -c "…" form, or run them at a psql prompt):

-- the evaluations you have run
select id, name, model_id from backtest order by id;

-- your forecasts and how far ahead they go
select id, name, model_id, n_periods from prediction order by id;

-- how much data is stored, and how many forecast rows
select count(*) from observation;
select count(*) from backtestforecast;

-- the configured models
select id, name from configuredmodeldb order by id;

Most of this is on the chap API too (/v1/crud/backtests, …), so reach for SQL when you need a join or an aggregate the API does not give you.

What a run writes to the database

Running a model touches a precise set of tables. Measured on the shared configuration (18 provinces × 24 months, 4 features), each operation first snapshots its input data (dataset + observation), then writes its own outputs:

Table Rows added What
dataset +1 the data snapshot this run used
observation +1728 every input value (4 features × 18 org units × 24 months)
backtest +1 the evaluation itself - config and aggregate metrics
backtestforecast +378 the predictions it scored (7 splits × 18 org units × 3 months)
Table Rows added What
dataset +1 its own data snapshot
observation +1728 the input values again
prediction +1 the forecast itself
predictionsamplesentry +54 the forecast values (18 org units × 3 months)

backtestmetric stays empty (the metrics live in backtest.aggregate_metrics). A predictionsetup row is written when you create a prediction setup (the usual path in the app and the API walkthrough - one per backtest); each forecast run from it adds the prediction row above with its prediction_setup_id set. A one-off make-prediction writes the prediction with prediction_setup_id left null.

The affected tables, column by column

dataset - the input snapshot, one row per run:

Column Meaning
id primary key
name the run's name
type evaluation or prediction
period_type e.g. month
first_period / last_period the period range covered
org_units JSON array of org-unit ids
covariates JSON array of the model's feature names
data_sources JSON [{covariate, dataElementId}] - which DHIS2 data element backs each feature
geojson the org-unit polygons (a stringified FeatureCollection)
created timestamp

observation - the input values, many rows per dataset:

Column Meaning
id primary key
dataset_id dataset
org_unit / period where / when
feature_name which feature (disease_cases, population, rainfall, mean_temperature)
value the number (null where the source had no value)

backtest - one row per evaluation:

Column Meaning
id primary key
name the run's name
model_id the configured model's canonical name
model_db_id configuredmodeldb (its numeric id)
model_template_version the template version used
dataset_id dataset (the data used)
org_units JSON array of evaluated org units
split_periods JSON array of the backtest cut points (where train/test was split)
aggregate_metrics JSON of the computed metrics (mae, rmse, crps, coverage_*, …)
created timestamp

backtestforecast - the evaluation's predictions to score, many rows per backtest:

Column Meaning
id primary key
backtest_id backtest
org_unit / period where / when the prediction is for
values JSON of the predicted quantiles/samples
last_train_period the last period the model trained on (the split)
last_seen_period the last period it had data for

prediction - one row per forecast:

Column Meaning
id primary key
name the run's name
model_id / model_db_id the model used
dataset_id dataset
n_periods how many future periods were forecast
org_units JSON array of org units
prediction_setup_id predictionsetup if it came from a saved setup, else null
meta_data JSON metadata
created timestamp

predictionsamplesentry - the forecast values, many rows per prediction:

Column Meaning
id primary key
prediction_id prediction
org_unit / period where / when
values JSON of the predicted samples for that location and future period

The DHIS2 database

Same idea on port 15432. DHIS2's schema is large and the API is almost always the better tool for DHIS2 data - it enforces permissions and returns clean JSON. Use SQL for low-level checks:

docker compose -f compose.chapkit.yml exec dhis2-db psql -U dhis -d dhis \
  -c "select count(*) from organisationunit;"

Read, don't write

These are the live databases. Inspect freely with SELECTs, but do not hand-edit rows - changes bypass the application logic and can corrupt a run or your DHIS2 instance. To start clean instead, recreate the stack: docker compose -f compose.chapkit.yml down -v.

Assignment: look inside chap

  • Open a psql prompt in the chap database and run \dt.
  • List your backtests and count(*) the observations.

What's next

Now that you can see the data behind a run, the next step is keeping it safe - Backup and restore covers dumping and restoring these databases.