Geoscheduling test on /eulake

Dear EOS developers,

While testing geoscheduling in the latest version of EOS we’ve run across the following issue:

We have some directories that are bound to specific geotags.

For instance, we want the first replica of files in the ‘data.pnpionly’ directory to be stored on the PNPI FST:

EOS Console [root://eulake.cern.ch] |/> attr ls  /eos/eulake/tests/rutests/spb/data.pnpionly/
sys.forced.blocksize="4k"
sys.forced.checksum="adler"
sys.forced.group="default.24"
sys.forced.layout="replica"
sys.forced.nstripes="2"
sys.forced.placementpolicy="gathered:RU::PNPI"
sys.forced.space="default"
EOS Console [root://eulake.cern.ch] |/>
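For reference, a directory is normally configured for this kind of placement with `eos attr set`; a minimal sketch of how the attributes shown above would typically be set (the commands are reconstructed here, not taken from the thread):

```shell
# Sketch: setting the directory extended attributes via the EOS CLI.
DIR=/eos/eulake/tests/rutests/spb/data.pnpionly
eos attr set sys.forced.layout="replica" $DIR
eos attr set sys.forced.nstripes="2" $DIR
# gather the first replica on FSTs carrying the RU::PNPI geotag
eos attr set sys.forced.placementpolicy="gathered:RU::PNPI" $DIR
```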

Our PNPI FST with “RU::PNPI” geotag is online:

EOS Console [root://eulake.cern.ch] |/> fs ls 154
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
│host                                 │port│    id│                            path│      schedgroup│             geotag│        boot│  configstatus│ drainstatus│  active│          health│
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 v006.pnpi.nw.ru                       1095    154                   /ceph/eosdata0       default.24            RU::PNPI       booted             rw      nodrain   online        no mdstat

EOS Console [root://eulake.cern.ch] |/>    

The following copy operation is supposed to place the first replica on the PNPI FST, which belongs to the group “default.24”:

[azaroche@alice02 ~]$ file=pp.$HOSTNAME.${RANDOM};xrdcp /etc/passwd root://eulake.cern.ch://eos/eulake/tests/rutests/spb/data.pnpionly/${file}
[2.228kB/2.228kB][100%][==================================================][2.228kB/s]
[azaroche@alice02 ~]$ eos fileinfo /eos/eulake/tests/rutests/spb/data.pnpionly/${file}
  File: '/eos/eulake/tests/rutests/spb/data.pnpionly/pp.alice02.19254'  Flags: 0644
  Size: 2281
Modify: Mon Jun 25 22:39:16 2018 Timestamp: 1529959156.0
Change: Mon Jun 25 22:39:15 2018 Timestamp: 1529959155.836272040
  CUid: 8619 CGid: 2688  Fxid: 0006f0ad Fid: 454829    Pid: 61272   Pxid: 0000ef58
XStype: adler    XS: 6b 0e 07 93     ETAG: 122092230017024:6b0e0793
replica Stripes: 2 Blocksize: 4k LayoutId: 00100112
  #Rep: 2
------------------------------------------------------------------------------------------------------------------------------------------------
│no.│ fs-id│                    host│      schedgroup│            path│      boot│  configstatus│ drainstatus│  active│                  geotag│
------------------------------------------------------------------------------------------------------------------------------------------------
 0      122         dvl-mb01.jinr.ru        default.0      /mnt/data01     booted             rw      nodrain   online                    Dubna
 1       97  p05798818t49625.cern.ch        default.0          /data02     booted             rw      nodrain   online      0513::R::0050::RA65

*******
[azaroche@alice02 ~]$

The file is there, but for some reason its replicas are located on FSTs from a completely different group, “default.0”.

Why is that? Is there anything missing from our configuration? Please advise.
PS: sometimes, after a few retries, the replicas finally get placed in the correct group, but this behaviour is very inconsistent.

Groups are selected round-robin, i.e. each group must have the same layout. If you make one group with geotags A and B and another with geotag C, that does not mean that clients from C will only place into the second group. If you want that behaviour, you have to create a separate space, or create the groups properly. The way to design the groups is that each group contains all geotags; otherwise the algorithm does not work.
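The round-robin effect described here can be illustrated with a small simulation (plain Python, not EOS code; the group/geotag layout below is a hypothetical approximation of the one in this thread):

```python
# Sketch: why round-robin group selection breaks geotag-constrained
# placement when groups are heterogeneous. Layout is hypothetical.
from itertools import cycle

# Heterogeneous groups: only default.24 contains the RU::PNPI geotag.
groups = {
    "default.0":  ["Dubna", "NL-Sara", "0513::R::0050::RA65"],
    "default.24": ["RU::PNPI", "9918::R::0002::WK07"],
}

def place(group_sequence, wanted_geotag, n_files):
    """Pick groups round-robin; a 'gathered' placement only succeeds
    when the selected group actually holds an FST with the wanted geotag."""
    rr = cycle(group_sequence)
    return [wanted_geotag in groups[next(rr)] for _ in range(n_files)]

results = place(["default.0", "default.24"], "RU::PNPI", 4)
print(results)  # [False, True, False, True]
```

With heterogeneous groups, a `gathered` constraint can only be honoured on the rounds where the scheduler happens to land on the one group that contains the wanted geotag.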

Thanks a lot, Andreas!

But I thought about this, and I checked it:

[azaroche@alice02 ~]$ file=pp.$HOSTNAME.${RANDOM};xrdcp /etc/passwd root://eulake.cern.ch://eos/eulake/tests/rutests/spb/data.pnpionly/${file}?eos.group=24
[2.228kB/2.228kB][100%][==================================================][2.228kB/s]
[azaroche@alice02 ~]$ eos fileinfo /eos/eulake/tests/rutests/spb/data.pnpionly/${file}
  File: '/eos/eulake/tests/rutests/spb/data.pnpionly/pp.alice02.22954'  Flags: 0644
  Size: 2281
Modify: Mon Jun 25 22:40:02 2018 Timestamp: 1529959202.0
Change: Mon Jun 25 22:40:02 2018 Timestamp: 1529959202.700904145
  CUid: 8619 CGid: 2688  Fxid: 0006f0b2 Fid: 454834    Pid: 61272   Pxid: 0000ef58
XStype: adler    XS: 6b 0e 07 93     ETAG: 122093572194304:6b0e0793
replica Stripes: 2 Blocksize: 4k LayoutId: 00100112
  #Rep: 2
------------------------------------------------------------------------------------------------------------------------------------------------
│no.│ fs-id│                    host│      schedgroup│            path│      boot│  configstatus│ drainstatus│  active│                  geotag│
------------------------------------------------------------------------------------------------------------------------------------------------
 0      122         dvl-mb01.jinr.ru        default.0      /mnt/data01     booted             rw      nodrain   online                    Dubna
 1      124    fst1.grid.surfsara.nl        default.0     /data/disk01     booted             rw      nodrain   online                  NL-Sara

*******
[azaroche@alice02 ~]$

I selected the group explicitly (eos.group=24), but it does not work: again I see FSTs from default.0.

The only step I see now is to change the group of the PNPI FST to default.0.

But in that situation geoscheduling will only work for default.0; since groups are selected round-robin, geoscheduling will only work from time to time (whenever default.0 happens to be selected).

And another thing: I use the attribute sys.forced.group on my directory, and it does not work either.

Hi Andrey,

I just tried the forced group (eos.group) and it works for me. Note that ‘sys.forced.group’ takes only the index, e.g. ‘24’, as argument! If you have put the full group name there, that might cause the problem, because it can translate to index 0.
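The point about the index can be illustrated with a C-style `atoi` parse (a sketch of the suspected behaviour, not the actual MGM code):

```python
def atoi_like(s: str) -> int:
    """C-style atoi: consume leading digits, return 0 if there are none.
    Sketch of how a value like "default.24" could silently become index 0."""
    digits = ""
    for ch in s.strip():
        if ch.isdigit():
            digits += ch
        else:
            break
    return int(digits) if digits else 0

print(atoi_like("24"))          # 24 -> group default.24
print(atoi_like("default.24"))  # 0  -> group default.0
```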

Nevertheless, in your lake, you should in principle have only homogeneous groups e.g. each group has a disk of each location. If that is not possible, then you have to create separate spaces with different constellations.

Oh! Thanks a lot, Andreas, for spotting my error with “sys.forced.group”!

I changed the attributes:

[azaroche@alice02 ~]$ eos attr ls /eos/eulake/tests/rutests/spb/data.pnpionly/
sys.forced.blocksize="4k"
sys.forced.checksum="adler"
sys.forced.group="24"
sys.forced.layout="replica"

Now group selection works:

[azaroche@alice02 ~]$ file=pp.$HOSTNAME.${RANDOM};xrdcp /etc/passwd root://eulake.cern.ch://eos/eulake/tests/rutests/spb/data.pnpionly/${file}
[2.228kB/2.228kB][100%][==================================================][2.228kB/s]
[azaroche@alice02 ~]$ eos fileinfo /eos/eulake/tests/rutests/spb/data.pnpionly/${file}
  File: '/eos/eulake/tests/rutests/spb/data.pnpionly/pp.alice02.19362'  Flags: 0644
  Size: 2281
Modify: Tue Jun 26 13:02:41 2018 Timestamp: 1530010961.0
Change: Tue Jun 26 13:02:40 2018 Timestamp: 1530010960.518989287
  CUid: 8619 CGid: 2688  Fxid: 000714c9 Fid: 464073    Pid: 61272   Pxid: 0000ef58
XStype: adler    XS: 6b 0e 07 93     ETAG: 124573647372288:6b0e0793
replica Stripes: 2 Blocksize: 4k LayoutId: 00100112
  #Rep: 2
┌───┬──────┬────────────────────────┬────────────────┬────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┐
│no.│ fs-id│                    host│      schedgroup│            path│      boot│  configstatus│ drainstatus│  active│                  geotag│
└───┴──────┴────────────────────────┴────────────────┴────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┘
 0      154          v006.pnpi.nw.ru       default.24   /ceph/eosdata0     booted             rw      nodrain   online                 RU::PNPI
 1       25  p05496644k62259.cern.ch       default.24          /data25     booted             rw      nodrain   online      9918::R::0002::WK07

*******
[azaroche@alice02 ~]$

I checked it several times.

Next step: geoscheduling. If only homogeneous groups are allowed, then with 20 groups every FST host would need 20 pools (one filesystem per group).

I would try to have only one group and the right placement policy, so that two filesystems on the same geotag are not selected.
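On the test directory, that suggestion might look roughly like this (a sketch only; dropping the forced group index is an assumption based on the advice above, not a command taken from the thread):

```shell
DIR=/eos/eulake/tests/rutests/spb/data.pnpionly
# stop pinning a scheduling group; let the scheduler pick any group
eos attr rm sys.forced.group $DIR
# keep gathering the first replica at RU::PNPI via the placement policy
eos attr set sys.forced.placementpolicy="gathered:RU::PNPI" $DIR
```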

Yes, I will go with default.0. I will continue my tests after changing the group of the PNPI FST to default.0.