Skip to content

Unstable test - TestTestgresCommon::test_pg_ctl_wait_option[remote] #235

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dmitry-lipetsk opened this issue Apr 5, 2025 · 5 comments
Closed
Assignees

Comments

@dmitry-lipetsk
Copy link
Collaborator

https://app.travis-ci.com/github/postgrespro/testgres/jobs/631955014

I think, we should wait for creation of postmaster.pid and after this begin stopping our node.

______________ TestTestgresCommon.test_pg_ctl_wait_option[remote] ______________
[gw1] linux -- Python 3.12.3 /tmp/testgres_venv/bin/python
self = <testgres.tests.test_testgres_common.TestTestgresCommon object at 0x7efe130f0470>
node_svc = <testgres.tests.helpers.global_data.PostgresNodeService object at 0x7efe13e72780>
    def test_pg_ctl_wait_option(self, node_svc: PostgresNodeService):
        assert isinstance(node_svc, PostgresNodeService)
        C_MAX_ATTEMPTS = 50
    
        node = __class__.helper__get_node(node_svc)
        assert node.status() == NodeStatus.Uninitialized
        node.init()
        assert node.status() == NodeStatus.Stopped
        node.start(wait=False)
        nAttempt = 0
        while True:
            if nAttempt == C_MAX_ATTEMPTS:
                #
                # [2025-03-11]
                #  We have an unexpected problem with this test in CI
                #  Let's get an additional information about this test failure.
                #
                logging.error("Node was not stopped.")
                if not node.os_ops.path_exists(node.pg_log_file):
                    logging.warning("Node log does not exist.")
                else:
                    logging.info("Let's read node log file [{0}]".format(node.pg_log_file))
                    logFileData = node.os_ops.read(node.pg_log_file, binary=False)
                    logging.info("Node log file content:\n{0}".format(logFileData))
    
>               raise Exception("Could not stop node.")
E               Exception: Could not stop node.
C_MAX_ATTEMPTS = 50
__class__  = <class 'testgres.tests.test_testgres_common.TestTestgresCommon'>
logFileData = '2025-04-05 19:38:19.256 GMT [29667] LOG:  starting PostgreSQL 17.4 (Ubuntu 17.4-1.pgdg24.04+2) on x86_64-pc-linux-gnu... FATAL:  could not create any TCP/IP sockets\n2025-04-05 19:38:19.261 GMT [29667] LOG:  database system is shut down\n'
nAttempt   = 50
node       = PostgresNode(name='testgres-2811a483-a174-42ac-ba1d-f27e23be4387', port=57388, base_dir='/tmp/tgsn_RcX1GQ')
node_svc   = <testgres.tests.helpers.global_data.PostgresNodeService object at 0x7efe13e72780>
self       = <testgres.tests.test_testgres_common.TestTestgresCommon object at 0x7efe130f0470>
tests/test_testgres_common.py:801: Exception
------------------------------ Captured log call -------------------------------
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #1.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO     root:test_testgres_common.py:806 Wait 1 second.
INFO     root:test_testgres_common.py:808 
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #2.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO     root:test_testgres_common.py:806 Wait 1 second.
INFO     root:test_testgres_common.py:808 
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #3.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO     root:test_testgres_common.py:806 Wait 1 second.
INFO     root:test_testgres_common.py:808 
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #4.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO     root:test_testgres_common.py:806 Wait 1 second.
INFO     root:test_testgres_common.py:808 
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #5.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
.....
@dmitry-lipetsk
Copy link
Collaborator Author

I understood.

https://app.travis-ci.com/github/postgrespro/testgres/jobs/632045028

  • We start node without wait
  • Node can't start because it has a conflict with another (parallel) instance.
  • We can't help him
---- Error:
pg_ctl: PID file "/tmp/tgsn_iMofAp/data/postmaster.pid" does not exist
Is server running?
ERROR    root:conftest.py:626 Node was not stopped.
INFO     root:test_testgres_common.py:822 Let's read node log file [/tmp/tgsn_iMofAp/logs/postgresql.log]
INFO     root:test_testgres_common.py:824 Node log file content:
2025-04-09 11:31:38.253 GMT [27196] LOG:  starting PostgreSQL 17.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 14.2.0) 14.2.0, 64-bit
2025-04-09 11:31:38.253 GMT [27196] LOG:  could not bind IPv4 address "127.0.0.1": Address in use
2025-04-09 11:31:38.253 GMT [27196] HINT:  Is another postmaster already running on port 54302? If not, wait a few seconds and retry.
2025-04-09 11:31:38.253 GMT [27196] WARNING:  could not create listen socket for "127.0.0.1"
2025-04-09 11:31:38.253 GMT [27196] FATAL:  could not create any TCP/IP sockets
2025-04-09 11:31:38.255 GMT [27196] LOG:  database system is shut down
=========================== short test summary info ============================
FAILED tests/test_testgres_common.py::TestTestgresCommon::test_pg_ctl_wait_option[remote] - Exception: Could not stop node.
======= 1 failed, 265 passed, 4 skipped, 1 xfailed in 638.01s (0:10:38) ========
Command exited with non-zero status 1
real	10m 38.50s
user	12m 27.05s
sys	1m 36.15s
The command "docker run $(bash <(curl -s https://codecov.io/env)) -t tests" exited with 1.

Solution

Port manager should use global (host wide) locks for reserved ports.

@dmitry-lipetsk
Copy link
Collaborator Author

#245

https://app.travis-ci.com/github/postgrespro/testgres/jobs/632597068

ERROR    root:conftest.py:644 Node was not stopped.
INFO     root:test_testgres_common.py:915 Let's read node log file [/tmp/tgsn_EMLgnh/logs/postgresql.log]
INFO     root:test_testgres_common.py:917 Node log file content:
2025-05-04 20:40:53.490 GMT [28164] LOG:  starting PostgreSQL 17.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 14.2.0) 14.2.0, 64-bit
2025-05-04 20:40:53.490 GMT [28164] LOG:  could not bind IPv4 address "127.0.0.1": Address in use
2025-05-04 20:40:53.490 GMT [28164] HINT:  Is another postmaster already running on port 58514? If not, wait a few seconds and retry.
2025-05-04 20:40:53.491 GMT [28164] WARNING:  could not create listen socket for "127.0.0.1"
2025-05-04 20:40:53.491 GMT [28164] FATAL:  could not create any TCP/IP sockets
2025-05-04 20:40:53.495 GMT [28164] LOG:  database system is shut down
=========================== short test summary info ============================
FAILED tests/test_testgres_common.py::TestTestgresCommon::test_pg_ctl_wait_option[remote] - Exception: Could not stop node.

@dmitry-lipetsk
Copy link
Collaborator Author

This problem on the local system (docker, py 3.8, alpine)

Image

@dmitry-lipetsk
Copy link
Collaborator Author

dmitry-lipetsk commented May 7, 2025

Test must analyze postgresql.log and restart if a port conflict is detected.

dmitry-lipetsk added a commit to dmitry-lipetsk/testgres that referenced this issue May 7, 2025
This commit must fix a problem in test_pg_ctl_wait_option when his PostgreSQL instance conflicts with another one.

For this, we added two new things:
 - PostgresNodeLogReader
 - PostgresNodeUtils

PostgresNodeLogReader reads server logs.

PostgresNodeUtils provides an utility to detect a port conflict.

PostgresNode::start also uses these new classes.
dmitry-lipetsk added a commit that referenced this issue May 7, 2025
This patch must fix a problem in test_pg_ctl_wait_option when his PostgreSQL instance conflicts with another one.

For this, we added two new things:
 - PostgresNodeLogReader
 - PostgresNodeUtils

PostgresNodeLogReader reads server logs.

PostgresNodeUtils provides an utility to detect a port conflict.

PostgresNode::start also uses these new classes.
@dmitry-lipetsk
Copy link
Collaborator Author

Was close via #257

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant