
HTTP PUT failing on internal redirect through eos-nginx proxy

Hello all,

I’m testing HTTP PUTs against the eos-nginx proxy, using the internal redirect functionality to hide the redirect from the client and so avoid the obvious issues with WebDAV and other clients that won’t follow redirects. I’ve checked against the EOS source and uboxed, and of course asked Google.

Another set of eyes to point out my error would be appreciated, as I’m totally missing where the fault lies.

I know the eos-nginx instance has a patch for redirecting on PUT, and it works if I use curl with redirect following enabled and talk to the MGM directly.

If I talk through the nginx instance, I get 502 Bad Gateway responses, as shown below. GETs, HEADs, and PROPFINDs work properly, as do writes against the MGM directly using the same method (with -L turned on). The curl output is below:

curl -v -T testfile http://mgmproxy:10000/eos/test/testfile
* About to connect() to mgmproxy port 10000 (#0)
*   Trying 10.42.4.16...
* Connected to mgmproxy (10.42.4.16) port 10000 (#0)
> PUT /eos/test/testfile HTTP/1.1
> User-Agent: curl/7.29.0
> Host: mgmproxy:10000
> Accept: */*
> Content-Length: 352
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 502 Bad Gateway
< Server: nginx/1.12.2
< Date: Wed, 12 Aug 2020 04:32:08 GMT
< Content-Type: text/html
< Content-Length: 173
< Connection: keep-alive
<
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.12.2</center>
</body>
</html>
* Connection #0 to host mgmproxy left intact

Nginx’s internal_redirect.error.log reports:

2020/08/12 05:50:56 [error] 5613#0: *39 upstream prematurely closed connection while reading response header from upstream, client: 10.42.4.16, server: mgmproxy, request: "PUT /eos/test/testfile HTTP/1.1", upstream: "http://10.10.0.174:8001/eos/test/testfile?encURI=....mgm.logid%3Dca72a478-dc5f-11ea-b7cb-b6e839c4465e%26mgm.replicaindex%3D0%26mgm.replicahead%3D0%26mgm.etag%3D%223861444034560:00000000%22%26mgm.id%3D00003831", host: "mgmproxy:10000"

The MGM’s xrdlog.mgm and Http.log report no errors; the file is accepted on the MGM side. The failure occurs only in the nginx-proxied PUT.

The running nginx config is relatively simple, and taken from the eos-nginx template.

user daemon;
worker_processes  1;
error_log         /var/log/nginx/error.log info;
pid               /var/run/nginx.pid;
env OPENSSL_ALLOW_PROXY_CERTS;
events {
    worker_connections  1024;
}
http {
    default_type  application/octet-stream;
    upstream eos_mgm_backend {
       ip_hash;
       server eos-mgm-0.mgm.test.svc.cluster.local:8000;
       keepalive 128;
    }
    log_format    main '$remote_addr - $remote_user [$time_local] "$request" '
                       '$status $body_bytes_sent "$http_referer" '
                       '"$http_user_agent" "$http_x_forwarded_for"';
    access_log         /var/log/nginx/access.log
                       main;
    sendfile               on;
    keepalive_timeout      65s;
    client_max_body_size   1000000m;

    server {
        server_name  mgmproxy;
        listen       10000;
        location / {
            add_header content-type "text/html;charset=UTF-8";
            proxy_pass          http://eos_mgm_backend;
            proxy_set_header   Host               $host;
            proxy_set_header   Auth-Type          "unix";
            proxy_set_header   X-Real-IP          $remote_addr;
            proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
            proxy_set_header   Remote-User        "testuser";
            client_body_buffer_size     128k;
            proxy_connect_timeout       90s;
            proxy_send_timeout          90s;
            proxy_read_timeout          90s;
            proxy_buffer_size           256k;
            proxy_buffers               8 256k;
            proxy_busy_buffers_size     512k;
            proxy_temp_file_write_size  512k;
        }
        location ~* ^/internal_redirect/(.*):(.*?)/(.*) {
            internal;

            resolver 10.43.0.10;
            access_log /var/log/nginx/internal_redirect.access.log main;
            error_log  /var/log/nginx/internal_redirect.error.log;
            set $download_host $1;
            set $download_port $2;
            set $download_uri  $3;
            set $download_url http://$download_host:$download_port/$download_uri?$args;
            proxy_set_header Host          $download_host:$download_port;
            proxy_set_header Authorization '';
            proxy_max_temp_file_size 0;
            proxy_pass $download_url;
        }
    }
}
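As a sanity check on the internal_redirect location, the way its regex splits a rewritten redirect target into host, port, and URI can be reproduced with Python’s re module (Python’s regex semantics match nginx’s PCRE for this pattern; the sample host and port below are just illustrative values taken from the error log):

```python
import re

# Same pattern as the internal_redirect location block above.
pat = re.compile(r'^/internal_redirect/(.*):(.*?)/(.*)')

# Illustrative rewritten URI for a redirect to an FST on port 8001.
m = pat.match('/internal_redirect/10.10.0.174:8001/eos/test/testfile')
host, port, uri = m.groups()
print(host, port, uri)  # 10.10.0.174 8001 eos/test/testfile
```

The greedy first group stops at the last colon before the port, and the lazy second group stops at the first slash, which is what makes `$download_host`, `$download_port`, and `$download_uri` come out correctly.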

Is there something I’m missing? PUTs certainly don’t work with OC sync at present, and they’re not working with WebDAV clients or redirect-less curl PUTs either.

Hi David,

Do you see any errors in the FST logs on the machine pointed to by the redirection URL?

"http://10.10.0.174:8001/eos/test/testfile?encURI=...

Also, could you try issuing this request manually against the FST so we can exclude any issues on that side? I am no expert on nginx configuration, but comparing with what we use I did not spot anything suspicious.
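For example, something along these lines, pasting in the full encURI value from the nginx error log (host and port here are just the ones from that log line):

```shell
# Replay the redirected PUT directly against the FST, bypassing nginx.
# Substitute the complete encURI=... query string from the nginx error log;
# quoting the URL keeps the shell from mangling the encoded characters.
curl -v -T testfile 'http://10.10.0.174:8001/eos/test/testfile?encURI=...'
```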

Cheers,
Elvin

Hi @esindril,

The redirection URL works fine when called directly from curl. I even set nginx into debug mode and examined the output, and as far as I can see, the HTTP request is identical. I stripped out extra headers and so on, since the client was running on the same host as the proxy, so the missing headers shouldn’t matter. My nginx is running natively on the host, so I know it’s not the container stack.

Exploring further, there is one big difference in the FST logs. Using the line below as an example:

200817 03:06:54 time=1597633614.634414 func=open level=INFO logid=b44ff0d0-e036-11ea-aef8-b6e839c4465e unit=fst@za1695pr-0.fst.cta.svc.cluster.local:1095 tid=00007f07693f1700 source=XrdFstOfsFile:119 tident=nobody sec=unix uid=0 gid=0 name=nobody geo="" path=/eos/cta/archive/restic/testbackup/rclone.conf info=&cap.sym=<...>&cap.msg=<...>&mgm.logid=b44ff0d0-e036-11ea-aef8-b6e839c4465e&mgm.replicaindex=0&mgm.replicahead=0&mgm.id=00003860 open_mode=4302

The final part of the log line is relevant. Via the working external redirect: func=open…open_mode=4302

Via the broken internal nginx redirect: func=open…open_mode=0

(You may notice the cluster name, but this particular path is not interacting with CTA, and that functionality is turned off at present.)

Everything else up to this point is identical, barring logids and timestamps. The vid checks match, and the URLs are identical.

I know I’ve seen this somewhere before, but I really can’t remember where. I’ll follow up with the team here at AARNet to see if it jogs any memories, and keep this thread updated, because if I’m coming across this, someone else will at some point.

Hi @davidjericho,

The difference in the open flags means that this is actually an open for read (i.e. a GET) rather than a PUT, so it looks like the gateway is issuing a GET when following the redirect. Are you using the extra patches for nginx that enable all this handling of PUT requests? What we are using on our machines is eos-nginx-1.9.9-5.
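For reference, the open_mode field can be read off directly. A sketch of a decoder, assuming the value is the XrdSfs open flags printed in hex (the base flag values are from XrdSfsInterface.hh; verify against your XRootD version, and note that any bits not mapped here are other SFS_O_* flags):

```python
# Decode the fst log's open_mode field, assuming XrdSfs open flags in hex.
FLAGS = {
    0x0001: "SFS_O_WRONLY",
    0x0002: "SFS_O_RDWR",
    0x0100: "SFS_O_CREAT",
    0x0200: "SFS_O_TRUNC",
}

def decode_open_mode(mode: int) -> str:
    names = [name for bit, name in FLAGS.items() if mode & bit]
    if not mode & 0x3:
        # Read-only is the all-zero access mode, not a separate bit.
        names.insert(0, "SFS_O_RDONLY")
    rest = mode & ~sum(FLAGS)
    if rest:
        names.append(f"+0x{rest:x} (other SFS_O_* bits)")
    return "|".join(names)

print(decode_open_mode(0x4302))  # the working external-redirect PUT
print(decode_open_mode(0x0))     # the broken internal-redirect open
```

The working PUT decodes as a read-write open with create and truncate set, while open_mode=0 is a plain read-only open, which is consistent with the request arriving as a GET.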

Hope it helps,
Elvin


eos-nginx-1.12.2-5.x86_64, installed from the CERN repos using yum install eos-nginx. The discussion within the AARNet team yesterday, after I posted, was that the PUT was possibly being morphed into a GET, and we were going to run it through a MITM proxy to see exactly what was happening. It didn’t matter that the debug log from nginx was reporting a PUT; it very much appears to have been doing something else.

I performed a forced downgrade of eos-nginx to 1.9.9-5, and the curl PUT now behaves exactly as it should. Upgrading to 1.12.2-5 again breaks it; reverting to 1.9.9-5 makes it all work again.
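For anyone else who hits this, the downgrade and pin look roughly like the following (the versionlock step is optional and assumes yum-plugin-versionlock is installed; package names are as in this thread):

```shell
# Roll eos-nginx back to the known-good build from the CERN repo.
yum downgrade -y eos-nginx-1.9.9-5

# Optional: pin the working version so a routine 'yum update' does not
# pull 1.12.2-5 back in (requires yum-plugin-versionlock).
yum versionlock add eos-nginx-1.9.9-5

systemctl restart nginx
```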

I guess that means the eos-nginx-1.12.2-5 in the CERN repo is broken somehow.

Thank you for the help, @esindril.