Error: write CONNECTION_CLOSED false #43


Closed
mdorda opened this issue Mar 9, 2020 · 22 comments

mdorda commented Mar 9, 2020

First of all, thank you so much for this amazing library! It is the best postgres driver so far.

We have a service with ~60 req/s to the database. The postgres package is left at its defaults; only the connection settings (host, port, username, password, dbName, ...) are set. Everything works like a charm, but after roughly 2 hours a small number of errors start to occur:

Error: write CONNECTION_CLOSED false
    at Object.connection (/node-reverse/node_modules/postgres/lib/types.js:191:5)
    at close (/node-reverse/node_modules/postgres/lib/connection.js:173:18)
    at Socket.onclose (/node-reverse/node_modules/postgres/lib/connection.js:199:5)
    at Object.onceWrapper (events.js:286:20)
    at Socket.emit (events.js:198:13)
    at TCP._handle.close (net.js:607:12)

The number of errors grows with application uptime. It looks like bad pool handling to me. We are testing with the timeout option right now, so I will let you know whether it helps.

mdorda commented Mar 9, 2020

Nope, it did not help.

porsager commented Mar 9, 2020

Thanks @mdorda

It's odd that the connection should close sporadically like that for starters.

The timeout option does nothing more than close idle, unused connections in the pool, so I don't think that would help either if you have ~60 req/s.

Can you share a little about your setup?

mdorda commented Mar 9, 2020

Thanks for the answer. I found another error in our logs which occurs together with the one above:

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:111:27)

Our infrastructure

  • the database is on a separate machine, but in the same cluster
  • we use Kubernetes and Docker
  • we use the pg package for our other projects and faced a very similar issue there, but statement_timeout and connectionTimeoutMillis solved everything
  • from time to time a container can lose its connection to the database, but the pool reconnects automatically (talking about the pg package)

How we use your package:

Every time a container starts, it runs this once:

const sql = postgres({
    host: '...',
    port: 1234,
    database: 'abcd',
    username: 'user',
    password: 'pass',
    timeout: 10,
    types: [
        ... one postgis type definition ...
    ],
    connection: { statement_timeout: 10000 }
})

And for every single user request (~60 req/s), code similar to the one below is called:

await sql`
    select this, that, etc
    from our_table
    where ST_Intersects(${sql.types.point([lat, lon])}, way);
`

That is all. No updates, no inserts. Just one select per user request.

Hope it is enough. Thank you for your help!

porsager commented Mar 9, 2020

Ok, that error makes sense too if connections are being dropped.

Maybe there is a timeout on the other end for how long each TCP connection can stay open, and that's the reason setting a very low timeout value helps. The silly thing is that you then incur the connection start cost very often, which is also a bad solution (although it does mean you don't see any errors).

I think the correct solution is to figure out why connections are dropped like that.

What value for timeout are you using with pg? Note that Postgres.js timeout is in seconds and pg's is in milliseconds.
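
To make the unit difference concrete, a minimal sketch (connection details are placeholders, and idleTimeoutMillis is what I believe is the closest pg-side equivalent):

const postgres = require('postgres')
const pg = require('pg')

// Postgres.js: `timeout` is given in SECONDS; idle pooled connections
// are closed after this long without being used
const sql = postgres({
  host: '...',
  timeout: 10                // 10 seconds, not 10 ms
})

// pg: the comparable pool option is in MILLISECONDS
const pool = new pg.Pool({
  host: '...',
  idleTimeoutMillis: 10000   // also 10 seconds
})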

mdorda commented Mar 9, 2020

We use only statement_timeout: 10000 and connectionTimeoutMillis: 10000 for pg, nothing more. We will check the other end; thank you for the suggestion. But I do not think that is the reason, because we have 3 clusters (Europe, Asia and America):

  • Europe: ~60 req/s, the error occurs after ~2 h
  • Asia: ~30 req/s, the error occurs after ~4 h
  • America: ~25 req/s, the error occurs after ~5 h

A timeout on the other end would result in the same times regardless of region.

porsager commented Mar 9, 2020

Oh, that's interesting. So it actually seems related to the number of requests, not the time..

Do you have any trace of anything related in the PostgreSQL logs? Like, are the dropped connections due to the network, or is PostgreSQL closing them?

You could try adding onnotice: console.log to your options to see if PostgreSQL sends anything.
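
For example (your other options stay the same, this just adds a notice logger):

const sql = postgres({
  host: '...',
  // log any NOTICE / WARNING messages PostgreSQL sends on the connection
  onnotice: console.log
})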

porsager commented Mar 9, 2020

Ok, so I dug a bit deeper, and connectionTimeoutMillis in pg actually sets the PostgreSQL parameter connect_timeout, and isn't related in functionality to my timeout property.

You can try to do

connection: { 
  statement_timeout: 10000,
  connect_timeout: 10000
}

which should then make Postgres.js connections have the same settings as pg.

mdorda commented Mar 10, 2020

It is not possible to set connect_timeout; it fails with an unrecognized configuration parameter "connect_timeout" error. It is strange, because I found the same thing in the pg source code. Anyway, according to the Postgres documentation there are basically two timeout settings: statement_timeout and idle_in_transaction_session_timeout. Both are set to 10000, but it did not help. The log is still the same, even with onnotice: console.log turned on.
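
For reference, the connection block now looks roughly like this (both values in milliseconds, same shape as the config earlier in the thread):

connection: {
  statement_timeout: 10000,                     // cancel any statement running longer than 10 s
  idle_in_transaction_session_timeout: 10000    // end sessions idle inside a transaction for 10 s
}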

In the log there are thousands of cases of this error:

{ Error: read ECONNRESET at TCP.onStreamRead (internal/stream_base_commons.js:111:27) errno: 'ECONNRESET', code: 'ECONNRESET', syscall: 'read' }

And right after that:

{ Error: write CONNECTION_CLOSED false
    at Object.connection (/node-reverse/node_modules/postgres/lib/types.js:191:5)
    at close (/node-reverse/node_modules/postgres/lib/connection.js:173:18)
    at Socket.onclose (/node-reverse/node_modules/postgres/lib/connection.js:199:5)
    at Object.onceWrapper (events.js:286:20)
    at Socket.emit (events.js:198:13)
    at TCP._handle.close (net.js:607:12)

Is there any option to programmatically reconnect all dead connections?

mdorda commented Mar 10, 2020

After a longer observation it seems that it does not depend purely on the number of requests. Busier clusters show the error earlier than others, but sometimes it takes 30 minutes, sometimes 2 hours... So the number of requests just increases the probability of the error.

@porsager

Ah sorry, it's not a connection parameter, it's an option for libpq.

It seems connectionTimeoutMillis in pg is just a timeout in the connection phase before the connection is ready for queries. I don't understand how these options solved your issue in pg.

Is there any option to programmatically reconnect all dead connections?

Connections will automatically reconnect at the next query after the error you got.
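
If you want to paper over the single failed query in the meantime, a plain retry around the call should do; a minimal sketch (retryOnce is a hypothetical helper, not part of the library, and the error check is an assumption based on the messages in your logs):

// Hypothetical helper: retry a query once when the connection was dropped mid-flight.
// The pool hands out a fresh connection on the next call, so a single retry covers
// transient dropped-connection errors.
async function retryOnce (run) {
  try {
    return await run()
  } catch (err) {
    // Assumption: dropped connections surface as CONNECTION_CLOSED / ECONNRESET;
    // adjust the check to whatever your logs actually show.
    const transient = /CONNECTION_CLOSED|ECONNRESET/.test(err.code || err.message || '')
    if (transient)
      return run()
    throw err
  }
}

// usage
const rows = await retryOnce(() => sql`select this, that, etc from our_table`)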

@porsager

Hmm... Do you have this in a test setup, or are you seeing the issue in production? I'd suggest some things to try, but I'd rather try to replicate the issue on my end than have you test in prod :)

mdorda commented Mar 10, 2020

It needs load, so it is a production issue. We just rewrote it to use the pg package, to check whether it helps in this specific case. But I would prefer your package, so I am eager to get some tips :-)

porsager commented Mar 10, 2020

Ok. cool 🙂

I do have some stability improvements for the next major (currently on master), but there are also a few breaking changes. (bigints from postgres are cast correctly to BigInt or string depending on node version support)

While looking into your issue here, I also found an issue with errors not being thrown correctly in some instances, so you could try out what's on master currently.

mdorda commented Mar 10, 2020

Just to let you know, the pg version works without any (non-performance) problems. I will try the master version next week and let you know. Thank you!

porsager commented Apr 6, 2020

Hey @mdorda. Did you get a chance to try out master?

porsager commented Apr 7, 2020

I spent some time tonight implementing the connect_timeout parameter, and at the same time I found an issue that could cause stale connections. It would be really nice if you'd give master a try using e.g. connect_timeout: 10 (the default is currently 30 seconds).
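
That would look roughly like this (assuming the new option sits at the top level of the options rather than inside connection, where the earlier attempt failed):

const sql = postgres({
  host: '...',
  connect_timeout: 10,                       // seconds to wait for the initial connection
  connection: { statement_timeout: 10000 }
})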

mdorda commented Apr 7, 2020

Hi, thank you so much for the info. I am quite busy this week. I will let you know next week. I have everything prepared in a branch, I just have not had time to publish it and watch it in production :-)

porsager commented Apr 8, 2020

Ah that's awesome ;)

Be sure to test things out first, as there are a few breaking changes in master. Mainly, bigint is returned as a string now instead of being unsafely cast to int.
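
In practice that just means converting explicitly wherever a query returns a bigint column, e.g. (the big_count alias is made up):

const [row] = await sql`select count(*) as big_count from our_table`

// On master the bigint arrives as a string (or BigInt on newer Node),
// so convert where your code needs a specific type
const asBigInt = BigInt(row.big_count)   // safe for any size
const asNumber = Number(row.big_count)   // fine if the value fits in 2^53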

mdorda commented Apr 21, 2020

I have not forgotten about it, thanks for your patience :-)

mdorda commented May 4, 2020

It looks like it is working now! Thank you for the fix.

mdorda closed this as completed May 4, 2020

@karlhorky

I'm also encountering something similar with a PostgreSQL service on Render.com, but with a very low number of connections.

I've documented the issue here: https://community.render.com/t/econnreset-with-node-js-and-postgresql/1498

In case this doesn't go anywhere soon, I may create a new issue here too.

@karlhorky

Opened #179 since it didn't go away overnight.

I've tried changing the connect_timeout to 10 to see if this helps.
