Hi All,
We are seeing some strange problems and am wondering if anyone can take a look.
We are currently running EOS 4.5.20 and eosd 4.5.20.
Here is a simple php code snippet that opens a file for writing, writes some bytes and closes the file:
$chunk_size = strlen($data);
$pass = false;
$fh = fopen($chunkFile, 'wb');
if ($fh !== false) {
$written = fwrite($fh, $data, $chunk_size);
fclose($fh);
//at this point the file should have written, lets have a look.
if ($chunk_size != $written) {
log('chunk_size!=written');
} else if ($chunk_size != filesize($chunkFile)) {
log('chunk_size!=filesize())');
} else {
$pass = true;
}
} else {
log(‘fopen failed');
}
We see 4 scenarios where this is failing
Scenario 1: fopen() fails
This is reflected in the eosd logs as an error for file open. Wrapping the above code in a loop that retires up to 100 times seems to resolve this but there is no reason it should have failed to begin with.
What can we do on our end to make this better?
Scenario 2: fwrite() fails
In php the fwrite() function returns how many bytes has been written. When a write is successful its return value should be the size of the data being written. Sometimes it fails and is reflected in the eosd logs:
200702 00:59:04 t=1593651544.011208 f=DoWrite l=ERROR tid=00007fab30ffd700 s=CacheEntry:342 error while writing piece off=0, len=10
Wrapping the above code in a loop that retries up to 100 times seems to resolve this. What can we do on our end to make this better?
Scenario 3: filesize() does not match
Sometimes after a successful fwrite() we test the file with php’s filesize() function and the sizes do not match. I have not seen anything in particular in the EOS logs to link to this but our php logs do report many “chunk_size!=filesize()”
Wrapping the above code in a loop that retries up to 100 times seems to sometimes resolve this. What can we do on our end to make this better?
Scenario 4: code passes fine without error but the file in EOS is 0 bytes!
This is probably the worst out of the four as eosd is not returning any errors back to the filesystem back to php. But looking at eosd logs and FST logs I see the following:
From eosd log:
200701 10:35:25 t=1593599725.092130 f=fileOpen l=ERROR tid=00007f801c7fd700 s=XrdIo:233 error= "open failed url=root://*AAwADAA@eoscs.cdn.aarnet.edu.au:1094///#curl#/eos/something/something/something/something/something/something/something/file3?eos.app=fuse&eos.bookingsize=0&eos.encodepath=1&eos.lfn=fxid:65c87622&fst.valid=1593599673, errno=0, errc=306, msg=[FATAL] Redirect limit has been reached"
From eosd log:
200701 10:35:25 t=1593599725.092292 f=MakeOpen l=ERROR tid=00007f801c7fd700 s=LayoutWrapper:131 file root://*AAwADAA@eoscs.cdn.aarnet.edu.au///#curl#/eos/something/something/something/something/something/something/something/file3 is already closed - won't open
From FST log:
217451 FstOfs_open: *AAwADAA.8008:121@crlt-k8 Unable to open - fst validity expired /#curl#/eos/something/something/something/something/something/something/something/file3; invalid argument
What do these error messages actually mean?
What can we do on our end to make this better?
Cheers,
Michael
Michael D’Silva • Cloud Services
AARNet Pty Ltd
m: +61 403191855
e: michael.dsilva@aarnet.edu.au
w: aarnet.edu.au
a: AARNet, Level 1, 62 Lygon St, Carlton South VIC 3053, Australia