We have a situation where we have populated some 700TB of data in a directory before a quota was applied. The quota appears to only be applied to new data in said directory (data written after the quota). Is there a trick to get the whole dir fed into the quota accounting?
Hi John,
You can use the following command to recompute the quota node:
eos ns recompute_quotanode <path>|cid:<decimal_id>|cxid:<hex_id>
recompute the specified quotanode
Cheers,
Elvin
One question regarding this command. We do not use quota, but we do use tree size information, and there is an equivalent command, eos ns recompute_tree_size, which is sometimes needed because that information is not always accurate.
In two cases, when we ran this command on a large hierarchy, it completely blocked the namespace while it was running, and it also seemed to run far too long, even considering the number of folders to be traversed. Note that we are still on version 5.2.32, so this might be fixed in newer versions, although I didn't see anything about it in the release notes.
But I wonder: is it normal that the whole namespace is locked while such a command runs? And could there be an issue where, on a large folder, it ends up in some infinite loop?
In any case, if the lock is unavoidable, maybe a maximum running time could be enforced for the command, so that we do not risk hanging the whole instance when we make a mistake in the path?
Hi Franck,
Currently this is implemented like this:
void
NsCmd::TreeSizeSubcmd(const eos::console::NsProto_TreeSizeProto& tree,
                      eos::console::ReplyProto& reply)
{
  eos::common::RWMutexWriteLock ns_wr_lock(gOFS->eosViewRWMutex);
  …
So it takes a write lock on the whole namespace for the duration of the recomputation, because otherwise the result would be wrong if someone changed something in the meantime. This means you cannot really run it on large trees during production.
One could now refine this function to lock only the individual containers involved, so that only the tree being recomputed is locked for new writes.
There are several ways to add a protection: we could estimate the run-time and then ask a yes/no question, or add a timeout parameter. With an optional timeout, though, you could forget to set it and end up with the same problem.
OK, thank you for confirming that it is a delicate operation in production.
I understand that rewriting the function to refine the lock might be hard work, so introducing a protection could at least be a helpful first step.
Estimating the run-time would also take a long time if the request is large, so maybe not a good option.
In my opinion, a good compromise would be to add a reasonable default timeout that can be overridden when we really know what we are doing. Since the whole namespace is locked and the operation prevents any other access, even a timeout of a few seconds makes sense to me.
Or simply change the default of this option:
--depth : maximum depth for recomputation, default 0 i.e no limit
Set it to 1, so that only the first level is recomputed by default and mistakes are avoided.
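To make the two proposed safeguards concrete, here is a minimal Python sketch of a tree size recomputation with a depth limit and a time budget. It is not the EOS implementation: the Container class, the recompute_tree_size function, the BudgetExceeded exception and the meaning given to max_depth (1 refreshes only the starting container and trusts the cached sizes of its children) are all assumptions made for illustration, and the namespace lock itself is not modelled.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical in-memory stand-in for a namespace container (directory)."""
    name: str
    file_bytes: int = 0                          # bytes of files directly inside
    children: list = field(default_factory=list)
    tree_size: int = 0                           # cached, possibly stale subtree size


class BudgetExceeded(Exception):
    """Raised when the recomputation runs past its time budget."""


def recompute_tree_size(node, max_depth=1, timeout_s=5.0, _deadline=None, _level=0):
    """Refresh node.tree_size, descending at most max_depth levels (0 = no limit).

    Below the depth limit, the cached tree_size of a child is trusted as-is.
    """
    if _deadline is None:
        _deadline = time.monotonic() + timeout_s
    if time.monotonic() > _deadline:
        raise BudgetExceeded(f"time budget exhausted at {node.name}")
    total = node.file_bytes
    for child in node.children:
        if max_depth == 0 or _level + 1 < max_depth:
            total += recompute_tree_size(child, max_depth, timeout_s,
                                         _deadline, _level + 1)
        else:
            total += child.tree_size             # do not descend, use cached value
    node.tree_size = total
    return total


# Small example tree with stale cached sizes.
leaf = Container("c", file_bytes=10, tree_size=0)
mid = Container("b", file_bytes=5, children=[leaf], tree_size=5)
root = Container("a", file_bytes=1, children=[mid], tree_size=0)

shallow = recompute_tree_size(root, max_depth=1)  # trusts mid's stale value
full = recompute_tree_size(root, max_depth=0)     # walks the whole tree
```

With conservative defaults like these, a mistyped path costs at most one level of work or a few seconds, and a full recomputation has to be requested explicitly.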
Hi folks,
I have another question about this topic.
We have two paths (/eos/alicelblaf/ and /eos/alicelblhpcs/) and there was only a quota set for one of them (/eos/alicelblaf/):
[root@alicemgm0.lbl.gov ~]# eos quota ls
┏━> Quota Node: /eos/alicelblaf/
┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│user │used bytes│logi bytes│used files│aval bytes│aval logib│aval files│ filled[%]│vol-status│ino-status│
└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
cern 1.07 PB 1.07 PB 1.69 M 0 B 0 B 0 100.00 % ignored ignored
┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│group │used bytes│logi bytes│used files│aval bytes│aval logib│aval files│ filled[%]│vol-status│ino-status│
└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
900 1.07 PB 1.07 PB 1.69 M 1.35 PB 1.35 PB 0 79.56 % ok ignored
nobody 0 B 0 B 0 0 B 0 B 0 100.00 % ignored ignored
┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│summary │used bytes│logi bytes│used files│aval bytes│aval logib│aval files│ filled[%]│vol-status│ino-status│
└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
All users 1.07 PB 1.07 PB 1.69 M 0 B 0 B 0 100.00 % ignored ignored
All groups 1.07 PB 1.07 PB 1.69 M 1.35 PB 1.35 PB 0 79.56 % ok ignored
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
The majority of data is under the other path (/eos/alicelblhpcs/):
[root@alicemgm0.lbl.gov ~]# eos df
┌──────────────┬────────┬────────┬────────┬───────────────┬────────────┬──────┬───────┬─────────────────┐
│Instance │ Size│ Used│ Files│ Directories│ PCR GB/TB*s│ Use%│ Vol-x│ Path│
└──────────────┴────────┴────────┴────────┴───────────────┴────────────┴──────┴───────┴─────────────────┘
alicelblhpcs 7.37 PiB 5.70 PiB 109.03 M 103.79 k 0.01 77% 1.00 /eos/alicelblhpcs
I’ve set a quota of 7.23PB for /eos/alicelblhpcs:
[root@alicemgm0.lbl.gov ~]# eos quota set -g 900 -v 7230000000000000 -p /eos/alicelblhpcs
updating quota using 7230000000000000 bytes (7230000000000000 raw bytes)
success: updated volume quota for gid=900 for node /eos/alicelblhpcs/
But recomputing the quota fails:
[root@alicemgm0.lbl.gov ~]# eos ns recompute_quotanode /eos/alicelblhpcs
error: errc=52 msg="[ERROR] Operation expired"
And even while it was running, it looked as though the command would have taken a little over a year to complete.
eos quota ls /eos/alicelblhpcs now looks like this:
[root@alicemgm0.lbl.gov ~]# eos quota ls /eos/alicelblhpcs
┏━> Quota Node: /eos/alicelblhpcs/
┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│user │used bytes│logi bytes│used files│aval bytes│aval logib│aval files│ filled[%]│vol-status│ino-status│
└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
cern 6.42 PB 6.42 PB 107.15 M 0 B 0 B 0 100.00 % ignored ignored
root 20.48 KB 20.48 KB 715 0 B 0 B 0 100.00 % ignored ignored
┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│group │used bytes│logi bytes│used files│aval bytes│aval logib│aval files│ filled[%]│vol-status│ino-status│
└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
900 6.42 PB 6.42 PB 107.15 M 7.23 PB 7.23 PB 0 88.80 % ok ignored
nobody 0 B 0 B 0 0 B 0 B 0 100.00 % ignored ignored
root 20.48 KB 20.48 KB 715 0 B 0 B 0 100.00 % ignored ignored
┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│summary │used bytes│logi bytes│used files│aval bytes│aval logib│aval files│ filled[%]│vol-status│ino-status│
└──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
All users 6.42 PB 6.42 PB 107.15 M 0 B 0 B 0 100.00 % ignored ignored
All groups 6.42 PB 6.42 PB 107.15 M 7.23 PB 7.23 PB 0 88.80 % ok ignored
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Is there anything else we could try to apply the new quota? Or are we just SOL with that amount of data?
Thanks,
Torben
Hi Torben,
The “operation expired” error is an artifact of how the XRootD client works: the client timed out, but the command still executed on the MGM server. You can actually see that it accounted for 6.42 PB. You can run the same command with a longer client timeout, for example 10 hours:
XRD_STREAMTIMEOUT=36000 eos ns recompute_quotanode ...
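For anyone scripting this, the following is a minimal Python sketch of the same invocation. The helper names eos_env and recompute_quotanode are hypothetical; only the XRD_STREAMTIMEOUT variable and the eos ns recompute_quotanode command come from this thread. The timeout is given in seconds, so 10 hours is 36000.

```python
import os
import shutil
import subprocess


def eos_env(timeout_hours):
    """Copy the current environment and set XRD_STREAMTIMEOUT in seconds."""
    env = dict(os.environ)
    env["XRD_STREAMTIMEOUT"] = str(int(timeout_hours * 3600))
    return env


def recompute_quotanode(path, timeout_hours=10):
    """Run `eos ns recompute_quotanode <path>` with a longer client timeout."""
    if shutil.which("eos") is None:
        raise RuntimeError("eos client not found on PATH")
    cmd = ["eos", "ns", "recompute_quotanode", path]
    # check=False: the caller inspects returncode instead of getting an exception
    return subprocess.run(cmd, env=eos_env(timeout_hours), check=False)


ten_hours = eos_env(10)["XRD_STREAMTIMEOUT"]
```

Calling recompute_quotanode("/eos/alicelblhpcs") would then behave like the one-line shell form above.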
What makes you think that the quota recomputation would take “over a year to complete”?
By the looks of it, most of the data is already accounted for in the quota node. Also keep in mind that neither the quota accounting nor the tree-size accounting (which is used as input for the df command) is necessarily strongly consistent with what is actually on disk. You can also trigger an ns recompute_tree_size for a more accurate df result.
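As a quick cross-check of the figures posted above, the short Python calculation below converts the 5.70 PiB reported by eos df into decimal petabytes and recomputes the fill percentage from the quota output. It assumes that eos quota ls reports decimal PB, which is consistent with the 7230000000000000-byte limit being displayed as 7.23 PB; the variable names are illustrative only.

```python
PB = 10**15    # decimal petabyte, the unit assumed for `eos quota ls`
PiB = 2**50    # binary pebibyte, the unit shown by `eos df`

df_used_bytes = 5.70 * PiB                    # "Used" column of eos df
quota_used_pb = 6.42                          # "used bytes" for gid 900
quota_limit_bytes = 7_230_000_000_000_000     # value passed to eos quota set

df_used_pb = df_used_bytes / PB
filled_pct = quota_used_pb * PB / quota_limit_bytes * 100

print(f"df used : {df_used_pb:.2f} PB")
print(f"filled  : {filled_pct:.2f} %")
```

Both numbers agree with the quota listing (6.42 PB used, 88.80 % filled), so the difference between the df and quota figures is a matter of units rather than missing data.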
Cheers,
Elvin
Hi Elvin,
Thanks for the explanation.
When I ran the eos ns recompute_quotanode command and then checked with eos quota ls, there was a value that was slowly increasing (maybe it was under used files), which I misinterpreted as the quota still being applied.
Cheers,
Torben