The context data ETL database used by the batch-ingest application may run out of memory if its instance is provisioned with too little RAM.

Diagnosis

Occasionally, an Airflow task in the batch-ingest application may fail with the following error:

EOF Detected
psycopg2.OperationalError: SSL SYSCALL error: EOF detected


You may also see:

Database in recovery mode
psycopg2.OperationalError: FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
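When scanning Airflow task logs for this problem, a small helper can flag the two messages shown above. The following is a minimal sketch, not part of the batch-ingest application: the function name and constant are hypothetical, and a match is only a hint, since a dropped connection can produce "EOF detected" for other reasons as well.

```python
# Substrings copied from the two error messages shown above.
MEMORY_FAILURE_MARKERS = (
    "SSL SYSCALL error: EOF detected",
    "the database system is in recovery mode",
)


def looks_like_db_memory_failure(error_text: str) -> bool:
    """Return True if the text contains either of the known error messages.

    Hypothetical helper: a match suggests, but does not prove, that the
    database ran out of memory.
    """
    return any(marker in error_text for marker in MEMORY_FAILURE_MARKERS)


if __name__ == "__main__":
    sample = "psycopg2.OperationalError: FATAL: the database system is in recovery mode"
    print(looks_like_db_memory_failure(sample))
```

If the helper returns True for a failed task, check the memory usage of the database instance before retrying.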

Root cause

These errors occur when the database used by the batch-ingest application to transform UDP data runs out of memory and goes into recovery mode. While it recovers, the database is unreachable from the batch-ingest application, so the task fails.

Solution

The solution has two parts. First, increase the memory available to the database instance. Second, re-run the failed tasks by clearing them.

1. Increase RAM available to the database

You can change the Cloud SQL instance settings in the Google Cloud console. Edit the instance and increase the RAM allocated to it.
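As an alternative to the console, the same change can usually be made with the gcloud CLI. The sketch below only builds the command line so that it can be reviewed before anyone runs it. The instance name and memory size are placeholders, not values from this environment. Whether `--memory` alone is accepted depends on the instance's machine type, and changing the machine type typically restarts the instance.

```python
import shlex


def build_memory_patch_command(instance: str, memory: str) -> list[str]:
    """Build a `gcloud sql instances patch` command that sets instance memory.

    `instance` and `memory` are supplied by the caller; the values used in
    the example below are placeholders.
    """
    return ["gcloud", "sql", "instances", "patch", instance, f"--memory={memory}"]


if __name__ == "__main__":
    # Hypothetical instance name and size, for illustration only.
    command = build_memory_patch_command("example-etl-instance", "16GiB")
    # Print the command for review instead of executing it.
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the restart decision with the operator.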

2. Clear any failed tasks

Once the Cloud SQL instance has been upgraded with enough RAM, clear the failed tasks in Airflow so that they run again.
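Failed tasks can be cleared from the Airflow UI or from the command line. The sketch below assembles an Airflow 2-style `airflow tasks clear` command restricted to failed task instances in a date range. The DAG id and dates are placeholders, and the Airflow version used by batch-ingest is an assumption, so confirm the flags with `airflow tasks clear --help` first.

```python
import shlex


def build_clear_failed_command(dag_id: str, start_date: str, end_date: str) -> list[str]:
    """Build an `airflow tasks clear` command that targets only failed tasks.

    Assumes the Airflow 2-style CLI. `--yes` skips the confirmation prompt.
    """
    return [
        "airflow", "tasks", "clear",
        "--only-failed",
        "--start-date", start_date,
        "--end-date", end_date,
        "--yes",
        dag_id,
    ]


if __name__ == "__main__":
    # Hypothetical DAG id and date range, for illustration only.
    command = build_clear_failed_command("example_batch_ingest_dag", "2024-01-01", "2024-01-02")
    # Print the command for review instead of executing it.
    print(shlex.join(command))
```

Cleared task instances are picked up by the scheduler again, which is what re-runs them.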


