Dear Expert,
I have found a very strange behaviour when trying to log on as a user whose home directory is on EOS fusex. The password authentication is through internal FreeIPA (Kerberos V5) and a user account has a valid Kerberos keyring as follows:
-bash-4.2$ klist -A
Ticket cache: KEYRING:persistent:556951043:krb_ccache_duXEK6R
Default principal: eos_admin1@SDFARM.KR
Valid starting Expires Service principal
09/22/2020 16:48:41 09/23/2020 16:48:41 krbtgt/SDFARM.KR@SDFARM.KR
However, when it tried to log on to a client, it does not get its home directory with a permission problem like,
❯ ssh eos_admin1@fcc
eos_admin1@fcc's password:
Last login: Tue Sep 22 16:44:54 2020 from
Could not chdir to home directory /eos/gsdc/user/e/eos_admin1: Permission denied
-bash: /eos/gsdc/user/e/eos_admin1/.bash_profile: Operation not permitted
Strangely, once it drops to /
because it cannot change to its home directory, then it can get the home when entering bash
in interactive mode:
-bash-4.2$ pwd
/
-bash-4.2$ bash -i
[eos_admin1@fcc /]$ cd
[eos_admin1@fcc ~]$ pwd
/eos/gsdc/user/e/eos_admin1
Afterwards, it can get to its home outside bash
:
[eos_admin1@fcc ~]$ exit
exit
-bash-4.2$ cd
-bash-4.2$ pwd
/eos/gsdc/user/e/eos_admin1
Furthermore, when it failed to change to its home directory initially after establishing ssh session, one can list the contents of the home, create(touch), copy, and remove a file (while still cd
does not work…)
❯ ssh eos_admin1@fcc.sdfarm.kr -p4280
eos_admin1@fcc.sdfarm.kr's password:
Last login: Tue Sep 22 16:48:42 2020 from 150.183.234.130
Could not chdir to home directory /eos/gsdc/user/e/eos_admin1: Permission denied
-bash: /eos/gsdc/user/e/eos_admin1/.bash_profile: Operation not permitted
-bash-4.2$ ls -al /eos/gsdc/user/e/eos_admin1/
total 16
drwx------. 1 eos_admin1 eos_admins 4096 Sep 23 09:04 .
drwxr-xr-x. 2 root root 4096 Sep 21 16:09 ..
drwxr-x---. 2 eos_admin1 eos_admins 4096 Sep 22 14:16 testdir
-rwxr-xr-x. 1 eos_admin1 eos_admins 0 Sep 22 13:27 testfile
-bash-4.2$ touch /eos/gsdc/user/e/eos_admin1/testfile2
-bash-4.2$ ls -al /eos/gsdc/user/e/eos_admin1/
total 16
drwx------. 2 eos_admin1 eos_admins 4096 Sep 23 09:09 .
drwxr-xr-x. 2 root root 4096 Sep 21 16:09 ..
drwxr-x---. 2 eos_admin1 eos_admins 4096 Sep 22 14:16 testdir
-rwxr-xr-x. 1 eos_admin1 eos_admins 0 Sep 22 13:27 testfile
-rw-r-----. 1 eos_admin1 eos_admins 0 Sep 23 09:09 testfile2
-bash-4.2$ rm /eos/gsdc/user/e/eos_admin1/testfile2
-bash-4.2$ ls -l /eos/gsdc/user/e/eos_admin1/
total 0
drwxr-x---. 2 eos_admin1 eos_admins 4096 Sep 22 14:16 testdir
-rwxr-xr-x. 1 eos_admin1 eos_admins 0 Sep 22 13:27 testfile
-bash-4.2$ rmdir /eos/gsdc/user/e/eos_admin1/testdir
-bash-4.2$ ls -l /eos/gsdc/user/e/eos_admin1/
total 0
-rwxr-xr-x. 1 eos_admin1 eos_admins 0 Sep 22 13:27 testfile
The followings are krb5 configuration in MGM, auth
config in fuse.log, mount
command output and a snippet of fuse.log
on client side
MGM sec.protocol & sec.protbind configuration (/etc/xrd.cf.mgm
):
sh-4.2# cat /etc/xrd.cf.mgm | grep -E 'protocol|protbind' | grep -v '#'
sec.protocol unix
sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab
sec.protocol krb5 host/jbod-mgmt-01.sdfarm.kr@SDFARM.KR
sec.protbind localhost.localdomain sss unix
sec.protbind localhost sss unix
sec.protbind * only krb5 sss unix
auth
part of /etc/eos/fuse.conf
in client:
"auth" : {
"krb5" : 1,
"gsi-first" : 0,
"sss" : 0,
"ssskeytab" : "/etc/eos.keytab",
"shared-mount" : 1,
"environ-deadlock-timeout" : 100,
"forknoexec-heuristic" : 1
},
Fusex
mount on client:
[root@fcc ~]# mount -t fuse eosxd /eos
# fsname=''
# -o allow_other enabled on shared mount
# -o big_writes enabled
# JSON parsing successful
# File descriptor limit: 524288 soft, 524288 hard
# Disabling nagle algorithm (XRD_NODELAY=1)
# Setting MALLOC_CONF=dirty_decay_ms:0
Fusex
client log: (/var/log/eos/fusex/fuse.log
)
200922 16:48:37 t=1600760917.754907 f=HandleResponseWithHosts l=ERROR tid=00007faeea7fc700 s=xrdclproxy:468 state=failed async open returned errmsg=[ERROR] Error response: Permission denied
200922 16:48:37 t=1600760917.754974 f=TryRecovery l=ERROR ino:00000008e0000000 s=data:962 unrecoverable error - code=400 errNo=3010
200922 16:48:37 t=1600760917.755003 f=flush_nolock l=ERROR ino:00000008e0000000 s=data:369 remote open failed - returning 13
200922 16:48:37 t=1600760917.755027 f=flush l=WARN tid=00007faef03ff700 s=eosfuse:4520 invalidating ino=0x8e0000000 after flush error
200922 16:48:37 t=1600760917.755067 f=TryRecovery l=ERROR ino:00000008e0000000 s=data:962 unrecoverable error - code=400 errNo=3010
200922 16:48:37 t=1600760917.755095 f=flush_nolock l=ERROR ino:00000008e0000000 s=data:369 remote open failed - returning 13
200922 16:48:37 t=1600760917.784151 f=TryRecovery l=ERROR ino:00000008e0000000 s=data:962 unrecoverable error - code=400 errNo=3010
200922 16:48:37 t=1600760917.784183 f=flush_nolock l=ERROR ino:00000008e0000000 s=data:369 remote open failed - returning 13
200922 16:48:37 t=1600760917.916266 f=ioflush l=WARN tid=00007faeed7f0700 s=data:2896 OpenAsync failed - trying recovery - ino: 8e0000000 err-code:400
200922 16:48:37 t=1600760917.916293 f=ioflush l=WARN tid=00007faeed7f0700 s=data:2922 giving up OpenAsync request - ino: 8e0000000 err-code:400
200922 16:48:37 t=1600760917.916544 f=ioflush l=CRIT tid=00007faeed7f0700 s=data:2950 ino: 8e0000000 msg=file open failed file-recovery=/var/eos/fusex/cache/jbod-mgmt-01.sdfarm.kr/08E/8E0000000.dc.recover journal-recovery=/var/eos/fusex/journal/jbod-mgmt-01.sdfarm.kr/08E/8E0000000.jc.recover
# -------------------
# - recovery record -
# -------------------
# path := '/eos/gsdc/user/e/eos_admin1/.bash_history'
# fid := 142
# -[ 000 ] 200922 16:48:37 t=1600760917.754963 f=flush_nolock l= ino:00000008e0000000 s=data:364 status='XrdCl::Proxy::FAILED' hint='will TryRecovery'
# -[ 001 ] 200922 16:48:37 t=1600760917.754999 f=flush_nolock l= ino:00000008e0000000 s=data:368 errno='13' hint='failed TryRecovery'
# -[ 002 ] 200922 16:48:37 t=1600760917.755061 f=flush_nolock l= ino:00000008e0000000 s=data:364 status='XrdCl::Proxy::FAILED' hint='will TryRecovery'
# -[ 003 ] 200922 16:48:37 t=1600760917.755091 f=flush_nolock l= ino:00000008e0000000 s=data:368 errno='13' hint='failed TryRecovery'
# -[ 004 ] 200922 16:48:37 t=1600760917.784143 f=flush_nolock l= ino:00000008e0000000 s=data:364 status='XrdCl::Proxy::FAILED' hint='will TryRecovery'
# -[ 005 ] 200922 16:48:37 t=1600760917.784167 f=flush_nolock l= ino:00000008e0000000 s=data:368 errno='13' hint='failed TryRecovery'
The versions of EOS server and client are 4.8.12 and 4.7.7.
It would be very helpful if you point out anything that I missed or made something wrong.
Thank you.
Best regards,
Sang-Un