LRU system setting

  1. New command for group default.1:
    No change…
  2. But! I see that the placement of the file in the “plain” directory has changed to group default.0 (I did not do anything with this file after my last report):

I am not sure which part of the log I should send you. The whole file is 141 MB.

Hi Andrey,

You don’t need to send the logs; it’s already clear from the output of the file convert command where the issue comes from. It seems there is a bug in parsing some options in the file convert command. I will take a closer look and fix it.

Therefore, you are left with creating a default.0 group. The initial error you got seems to come from a problem reading from the source of the conversion, and that is why it failed. If you restarted the MGM: the list of conversions is persisted and retried when the MGM restarts, so it looks like the conversion went through.

I will let you know when I have the file convert command fixed.

Thanks,
Elvin

Thanks for the help!

Thanks again for your help.

I have added a second FS to group default.0:


Convert does work, but it shows an error.

Convert:

  1. Before:

    Convert command:

    After:

But LRU still does not work with this group and directory:

Directory settings (NOTE: group 0):


File copy:

The file layout has not changed in the last day (definitely longer than 1 hour).

I do not see automatic conversion happening and I do not see any information about it in the logs. Can you help me with that?

Hello again.

  1. I changed sys.conversion.*
    From
    sys.conversion.*="00100002|gathered:RU::JINR::LITDVL"
    to
    sys.conversion.*="00650002|gathered:RU::JINR::LITDVL"
  2. I restarted LRU and the MGM, and I see these logs:

    But again I do not see any result…

Hi Andrey,

Let’s first sort out the conversion error, and then we can have a look at the LRU. Could you please send me the logs of the MGM and the FSTs involved in the initial conversion?
Namely, run a simple “eos file convert --rewrite” so that we can understand why you get an error. Even with this error, your file does end up properly converted, right?

Thanks,
Elvin

I repeated the “convert” command, and I do not see the error now:


And the result:

But I see only the old error:

Ah… OK, I understand. You can clear the old failed conversions using the following command:
eos convert clear --failed
I will now try out the LRU command and see if I can reproduce your issue.

Thanks,
Elvin

Hi Andrey,

The problem comes from the fact that the LRU code was not properly adapted to the new converter mechanism. We currently have two converter subsystems in place: the old one, based on paths created in /eos/<instance>/proc/conversion/, and the new one, which uses QuarkDB and preserves the file identifiers of the converted files. To disable the new converter and re-enable the old one, which is able to pick up the LRU jobs, you need to define the following environment variable in your /etc/sysconfig/eos_env file:
EOS_FORCE_DISABLE_NEW_CONVERTER=1
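Since it is easy to end up with the variable in the wrong sysconfig file (the thread below shows exactly that confusion), a small check can report which file actually sets it. This is a hypothetical helper, not part of EOS: it only looks at the two paths named in this thread and assumes simple `KEY=VALUE` lines (optionally prefixed with `export` and quoted), ignoring any more complex shell syntax.

```python
import shlex
from pathlib import Path

FLAG = "EOS_FORCE_DISABLE_NEW_CONVERTER"

def read_sysconfig(path):
    """Parse simple shell-style KEY=VALUE lines (hypothetical helper, not part of EOS)."""
    values = {}
    p = Path(path)
    if not p.is_file():
        return values
    for raw in p.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].strip()
        key, sep, value = line.partition("=")
        if not sep:
            continue
        try:
            # shlex removes surrounding quotes and trailing "# comments"
            tokens = shlex.split(value, comments=True)
        except ValueError:  # unbalanced quotes: fall back to the raw text
            tokens = [value.strip()]
        values[key.strip()] = tokens[0] if tokens else ""
    return values

def find_flag(paths=("/etc/sysconfig/eos_env", "/etc/sysconfig/eos")):
    """Return {path: value} for every checked file that sets the converter flag."""
    found = {}
    for path in paths:
        cfg = read_sysconfig(path)
        if FLAG in cfg:
            found[path] = cfg[FLAG]
    return found

if __name__ == "__main__":
    print(find_flag() or f"{FLAG} is not set in any of the checked files")
```

This only inspects the files on disk; as in the thread, the MGM still has to be restarted for a change to take effect.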

I will adapt the LRU to also work with the new converter code, and this will be available in the next release. I will also improve the LRU scheduling: currently, if you change the interval at which it runs, you still have to wait for the old interval to expire, which is usually 7 days! So this is another thing that I will fix.
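The interval problem can be illustrated with a toy scheduler. This is a hypothetical sketch, not the EOS LRU code: it only shows why recomputing the remaining wait from the last run and the *current* interval on every check makes a shortened interval take effect immediately, instead of waiting for a deadline that was fixed using the old interval.

```python
from dataclasses import dataclass

@dataclass
class LruSchedule:
    """Toy model of a periodic task (hypothetical, not the EOS implementation)."""
    last_run: float   # epoch seconds of the previous run
    interval: float   # current interval in seconds; may be changed at any time

    def remaining(self, now: float) -> float:
        # Recomputed from the current interval on every call, so lowering the
        # interval is honoured at the next check rather than after the old deadline.
        return max(0.0, self.last_run + self.interval - now)

    def due(self, now: float) -> bool:
        return self.remaining(now) == 0.0

WEEK = 7 * 24 * 3600
s = LruSchedule(last_run=0.0, interval=WEEK)
print(s.due(now=3600.0))   # False: one hour into a 7-day interval
s.interval = 600.0         # the interval is lowered to 10 minutes
print(s.due(now=3600.0))   # True: 3600 s already exceeds the new 600 s interval
```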

Please try out the proposed workaround and let me know the outcome. Thanks for all the info and the report.

Cheers,
Elvin

I have CentOS 7 and EOS 4.8.22 (2020).
Should I change /etc/sysconfig/eos_env?
I put this line in /etc/sysconfig/eos and restarted the MGM. And I see:


But the file layout has not changed:

I have not seen any files in /eos/jinrdvl/proc/conversion/ before. But I have seen the result of the command
eos convert list
Now the result is empty:

Oops…
Sorry.
I changed /etc/sysconfig/eos as you recommended, and LRU works now:


Note: LayoutId: 00000000
And group: default.3
And I see files:

But I do not see anything in “eos convert list”.

But the main problem is solved)) Thanks a lot!

Hi Andrey,

It’s normal that you don’t see anything in the output of eos convert list as this only works with the new converter - and by putting that env variable you have disabled the new converter and enabled the old one.

I see there is still a problem with the layout of the file. It’s normal for the group to be any group in the default space, in this case default.3. Can you try removing all files from /eos/<instance>/proc/conversion/ and running the LRU again?

Let me know if the layout of the files converted by LRU is what you expect.

Cheers,
Elvin

Hello!
I created a new directory with new LRU rules:


Cleared the conversion directory:
Put the new file:

And restarted MGM.
I see:
LOG:

Result:

NOTE:
1)
I have not specified the LayoutId:
sys.conversion.*="gathered:RU::JINR::LITDVL"
In the result:
LayoutId: 00000000 Redundancy: d1::t0
2)
The file creation time is
Fri Nov 6 17:49:44 MSK 2020
but the layout was checked at
Fri Nov 6 17:50:30 MSK 2020
The file was not older than 10 minutes.
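The two timestamps make the point concrete: the layout was checked 46 seconds after creation, far below a 10-minute threshold, so a 10-minute age rule should not yet have matched the file. A quick check with the standard library (both times are in MSK, so the zone is simply dropped before parsing):

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"  # e.g. "Fri Nov 6 17:49:44 2020", MSK zone removed

created = datetime.strptime("Fri Nov 6 17:49:44 2020", FMT)
checked = datetime.strptime("Fri Nov 6 17:50:30 2020", FMT)

age = (checked - created).total_seconds()
threshold = 10 * 60  # 10 minutes in seconds

print(age, age >= threshold)  # 46.0 False
```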

Hi Andrey,

Note that sys.lru.convert.match has no “greater than” operator for time values. Furthermore, if you want 10 minutes, the correct specification is 10min, so the rule should be written like this (a snippet from my own config):

EOS Console [root://localhost] |/eos/dev/replica/> attr ls .
sys.conversion.*="00100002|gathered:elvin"
sys.forced.atomic="1"
sys.forced.blocksize="4k"
sys.forced.checksum="adler"
sys.forced.layout="replica"
sys.forced.nstripes="2"
sys.forced.space="default"
sys.lru.convert.match="*:10min"
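To make the sys.lru.convert.match syntax less cryptic, here is a sketch of how a value such as `*:10min` can be read: a filename glob, a colon, and a minimum age, with several rules separated by commas. This is not the MGM's parser; the unit suffixes handled here (`s`, `min`, `h`, `d`, `w`) and the comma-separated form are assumptions for illustration, so check the EOS LRU documentation for the exact set your version accepts.

```python
import fnmatch
import re

# Assumed unit suffixes (illustrative subset), converted to seconds.
UNITS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "w": 7 * 86400}
AGE_RE = re.compile(r"^(\d+)(min|s|h|d|w)$")

def parse_rules(value):
    """Turn '*:10min,*.root:1w' into [('*', 600), ('*.root', 604800)]."""
    rules = []
    for item in value.split(","):
        pattern, sep, age = item.strip().rpartition(":")
        m = AGE_RE.match(age)
        if not sep or not pattern or not m:
            # A comparison operator such as '>' in the age part ends up here.
            raise ValueError(f"bad rule {item!r}: expected <glob>:<number><unit>")
        rules.append((pattern, int(m.group(1)) * UNITS[m.group(2)]))
    return rules

def matches(rules, filename, age_seconds):
    """True if some rule's glob matches the name and the file is at least that old."""
    return any(fnmatch.fnmatch(filename, pat) and age_seconds >= min_age
               for pat, min_age in rules)

rules = parse_rules("*:10min")
print(rules)                            # [('*', 600)]
print(matches(rules, "test.root", 46))  # False: the file is only 46 s old
```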

I will improve the documentation page as this is a bit cryptic at the moment.

Hope this helps,
Elvin

Thanks!!

Oh! I can use layout 00100002 as “plain” in the future)
By documentation:
The hex layout ID also contains the checksum and blocksize settings. The best is to create a file with the desired layout and get the hex layout ID using “eos file info <path>”.
I see some variants in “eos file info”.
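Because the hex layout ID packs several settings into one number, decoding it field by field helps compare values such as 00100002 and 00650002 from earlier in the thread. This sketch assumes the bit layout of EOS's `LayoutId` helpers (checksum in bits 0-3, layout type in 4-7, stripe count minus one in 8-15, blocksize code in 16-19, block checksum in 20-23) and assumes layout-type codes 0 = plain and 1 = replica; verify both against the `LayoutId.hh` of your EOS version, and treat “eos file info” as the authoritative answer.

```python
# Assumed field positions (shift, mask), following EOS's LayoutId helpers.
FIELDS = {
    "checksum":       (0, 0xF),
    "layout_type":    (4, 0xF),
    "stripes_minus1": (8, 0xFF),
    "blocksize_code": (16, 0xF),
    "blockchecksum":  (20, 0xF),
}
# Assumed layout-type codes; only the two relevant to this thread are listed.
LAYOUT_NAMES = {0: "plain", 1: "replica"}

def decode_layout(hex_id):
    """Split a hex layout ID such as '00650002' into its bit fields."""
    value = int(hex_id, 16)
    fields = {name: (value >> shift) & mask for name, (shift, mask) in FIELDS.items()}
    fields["stripes"] = fields.pop("stripes_minus1") + 1
    fields["layout"] = LAYOUT_NAMES.get(fields["layout_type"], "other")
    return fields

for lid in ("00100002", "00650002"):
    print(lid, decode_layout(lid))
```

Under these assumptions both IDs describe a single-stripe plain layout with the same file checksum code; they differ only in the blocksize and block-checksum codes.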

Hi Andrey,

We’ve just released EOS 4.8.27 that addresses all the issues discussed in this thread. You should now be able to run the LRU also with the new converter engine. Thanks again for the report!

Cheers,
Elvin

Hello!
I continue testing with LRU and “conversion” on EOS version 4.8.75-1.el7.cern.x86_64.
And I see the first error:

EOS Console [root://dvl-eos.jinr.ru] |/eos/tests/spd/mainpoolfcache/> file convert /eos/tests/spd/mainpoolfcache/0016576 gathered:RU::JINR::LITDVL
info: conversion based layout+stripe arguments
error: unable to push conversion job '000000000000190a:default#00650002' to QuarkDB (errc=0) (Success)
EOS Console [root://dvl-eos.jinr.ru] |/eos/tests/spd/mainpoolfcache/>  
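The job identifier in that error message tells you which file was being converted and to what. Judging from the string itself, it has the form `<file id, 16 hex digits>:<space>#<layout id, 8 hex digits>`; the optional `.<group>` after the space and the `~<placement>` suffix handled below are assumptions about how the converter extends it, so treat this as an illustrative parser rather than EOS's own.

```python
import re
from typing import NamedTuple, Optional

# Assumed job-ID grammar, inferred from '000000000000190a:default#00650002'.
JOB_RE = re.compile(
    r"^(?P<fid>[0-9a-fA-F]{16}):"
    r"(?P<space>[^.#~]+)(?:\.(?P<group>[^#~]+))?"
    r"#(?P<layout>[0-9a-fA-F]{8})"
    r"(?:~(?P<placement>.+))?$"
)

class ConversionJob(NamedTuple):
    fid: int                  # file id as an integer
    space: str
    group: Optional[str]      # None when no '.<group>' is present
    layout: str               # target layout id, lower-case hex
    placement: Optional[str]  # None when no '~<placement>' is present

def parse_job(job_id: str) -> ConversionJob:
    m = JOB_RE.match(job_id)
    if m is None:
        raise ValueError(f"unrecognised conversion job id: {job_id!r}")
    return ConversionJob(int(m["fid"], 16), m["space"], m["group"],
                         m["layout"].lower(), m["placement"])

job = parse_job("000000000000190a:default#00650002")
print(job.fid, hex(job.fid), job.space, job.layout)  # 6410 0x190a default 00650002
```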

Does anyone have any ideas?

Hi Andrey,

Can you have a look at the MGM logs and check for errors coming from the “ConvertDriver” file? If there is nothing, put the instance in debug mode with eos debug debug "*" and issue the command again. Then grep for log lines coming from the ConvertDriver file. Also, can you paste the status of the converter (eos convert status)?
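With an MGM log of 141 MB (as mentioned earlier in the thread), streaming through the file is easier than opening it. This sketch prints only the lines that mention ConvertDriver, optionally restricted to error-level lines. The default path `/var/log/eos/mgm/xrdlog.mgm` and the `level=ERROR` token are assumptions about a typical MGM log; adjust them to your installation. A plain `grep ConvertDriver` does the same job; the Python version just makes the level filter explicit.

```python
import sys
from typing import Iterator

DEFAULT_LOG = "/var/log/eos/mgm/xrdlog.mgm"  # assumed MGM log location

def convert_driver_lines(path: str, errors_only: bool = False) -> Iterator[str]:
    """Yield log lines that mention ConvertDriver, reading the file line by line."""
    with open(path, "r", errors="replace") as log:
        for line in log:
            if "ConvertDriver" not in line:
                continue
            # Assumed level token in the log line; adjust if your format differs.
            if errors_only and "level=ERROR" not in line:
                continue
            yield line.rstrip("\n")

if __name__ == "__main__":
    args = [a for a in sys.argv[1:] if a != "--errors"]
    path = args[0] if args else DEFAULT_LOG
    for entry in convert_driver_lines(path, errors_only="--errors" in sys.argv):
        print(entry)
```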

Thanks,
Elvin

Hi, Elvin!
Sorry for the delay. I found the reason for this error: it is a conflict between the conversion process started by LRU and the one started by hand. But I found some new errors: LRU is not really working. I need to collect logs and information for my next message, though.