miramir
(Ivan Kashunin)
July 4, 2022, 7:00am
1
Hi,
We have an EOS cluster backed by QuarkDB. After the MGM was restarted, files disappeared from the /eos/ directory tree.
When we run fileinfo on a file ID, the file is still found:
eos-m01:~ # eos fileinfo fxid:0b19cd0b
File: '/eos/scg/1_Хранилище (создавать всё внутри)/1_ИСХОДНИКИ/3_СПЕЦПРОЕКТЫ/2021/Марафон по школам Дубны 2021/Марафон/исх/13 12 2021 ОИЯИ в лицее Кадышевского/камера Женя/-/C0015.MP4' Flags: 0764 Clock: 16fd7c066eeca3a9
Size: 138418448
Modify: Mon Dec 13 06:50:38 2021 Timestamp: 1639367438.000000000
Change: Wed May 18 15:39:46 2022 Timestamp: 1652877586.331433099
Birth: Wed May 18 15:39:44 2022 Timestamp: 1652877584.537794598
CUid: 0 CGid: 860 Fxid: 0b19cd0b Fid: 186240267 Pid: 5498326 Pxid: 0053e5d6
XStype: adler XS: e9 1d e8 a7 ETAGs: "49993490997706752:e91de8a7"
Layout: replica Stripes: 2 Blocksize: 4k LayoutId: 00100112 Redundancy: d2::t0
#Rep: 2
no.  fs-id  host              schedgroup  path    boot    configstatus  drain    active  geotag
0    197    eos-f049.jinr.ru  default.1   /e/p01  booted  rw            nodrain  online  RU::JINR::LIT
1    285    eos-f071.jinr.ru  default.1   /e/p01  booted  rw            nodrain  online  RU::JINR::LIT
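For reference, the hexadecimal Fxid and Pxid values in the fileinfo output are the same IDs as the decimal Fid and Pid fields. A minimal shell sketch of the conversion (the function name fxid_to_dec is only illustrative, it is not an EOS tool):

```shell
# Convert a hexadecimal fxid/pxid, as printed by "eos fileinfo", to decimal.
# fxid_to_dec is a hypothetical helper name used only for this example.
fxid_to_dec() {
  printf '%d\n' "0x$1"
}

fxid_to_dec 0b19cd0b   # Fxid of the file above
fxid_to_dec 0053e5d6   # Pxid of its parent directory
```

The two calls should print 186240267 and 5498326, which agree with the Fid and Pid fields shown above.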
*******
But the directory is empty.
While trying to restore it, we found that there are two containers with the same name: the old one (cid 5452838, with data) and a new one (cid 5596529, empty):
eos-b02:/home # eos-ns-inspect print --members localhost:7777 --cid 5596529
ID: 5596529
Parent ID: 2
Name: scg
uid: 0, gid: 0
ctime: Wed Jun 1 13:28:02 2022 Timestamp: 1654079282.45653590
mtime: Thu Jan 1 03:00:00 1970 Timestamp: 0.0
stime: Thu Jan 1 03:00:00 1970 Timestamp: 0.0
Tree size: 0
Mode: 16877
Flags: 1
eos-b02:/home # eos-ns-inspect print --members localhost:7777 --cid 5452838
ID: 5452838
Parent ID: 2
Name: scg
uid: 0, gid: 860
ctime: Fri May 27 10:43:36 2022 Timestamp: 1653637416.33695994
mtime: Mon May 30 13:07:01 2022 Timestamp: 1653905221.476521648
stime: Thu Jan 1 03:00:00 1970 Timestamp: 0.0
Tree size: 4161269237640
Mode: 16893
Flags: 1
Extended attributes (11):
sys.eos.btime=1651237100.622317027
sys.owner.auth=*
sys.forced.layout=replica
sys.forced.blocksize=4k
sys.mask=775
user.acl=
sys.forced.nstripes=2
sys.forced.space=default
sys.forced.stripes=16
sys.forced.checksum=adler
sys.acl=g:860:rwxmqc
Full path: /eos/scg/
------------------------------------------------
FileMap:
------------------------------------------------
ContainerMap:
1_Хранилище (создавать всё внутри): 5476312
2_Файлообмен (забрал и удалил): 5498474
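The Mode values in the two eos-ns-inspect printouts are decimal. Assuming they are ordinary POSIX st_mode values (an assumption, the thread does not state it), printing them in octal makes the two scg containers easier to compare. The helper name mode_to_octal is only illustrative:

```shell
# Print a decimal mode value in octal.
# Assumption: the "Mode" field is a standard POSIX st_mode number.
mode_to_octal() {
  printf '%o\n' "$1"
}

mode_to_octal 16877   # new, empty scg container (cid 5596529)
mode_to_octal 16893   # old scg container with data (cid 5452838)
```

This should print 40755 and 40775. Under that assumption the leading 4 marks a directory, and the old container is group-writable (775) while the new one is not (755).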
We tried to remove the new, empty directory with eos-ns-inspect:
eos-ns-inspect drop-empty-cid --members localhost:7777 --cid 5596529
But the old directory did not reappear.
Has anyone encountered such a problem?
miramir
(Ivan Kashunin)
July 5, 2022, 10:08am
2
I would like to add that the lost directory tree contains 38K files, and we urgently need to restore its contents.
Since the cid of /eos/scg/ is still present in QuarkDB, we hope recovery is possible, but we don't know how to do it.
Any help would be greatly appreciated!
apeters
(Andreas Joachim Peters)
July 5, 2022, 11:35am
3
Hi Ivan,
If you run eos-ns-inspect with 'check-orphans', it should print directory 5452838 as orphaned.
There is no tool implemented in eos-ns-inspect to attach an orphan.
One can do this manually using the redis client. I will make a dry run here and then tell you how to do it in a short while …
miramir
(Ivan Kashunin)
July 5, 2022, 12:48pm
4
Hi Andreas,
I ran
eos-ns-inspect check-orphans --members localhost:7777
and received:
file-id=25452838 invalid-parent-id=0 size=0 locations= unlinked-locations=
Thank you. I’ll be waiting.
apeters
(Andreas Joachim Peters)
July 5, 2022, 1:05pm
5
After this command,
did you restart the MGM or run 'eos ns cache drop -d'?
If not, you have to do that!
apeters
(Andreas Joachim Peters)
July 5, 2022, 1:06pm
6
I mean, after you dropped the additional /eos/scg/ directory …
miramir
(Ivan Kashunin)
July 5, 2022, 1:37pm
7
I restarted the MGM and the directory /eos/scg appeared, but it is empty:
EOS Console [root://eos.jinr.ru] |/eos/flnp-admin/> fileinfo /eos/scg
Directory: '/eos/scg' Treesize: 0
Container: 0 Files: 0 Flags: 40755
Modify: Thu Jan 1 03:00:00 1970 Timestamp: 0.000000000
Change: Tue Jul 5 16:24:07 2022 Timestamp: 1657027447.639821534
Sync : Thu Jan 1 03:00:00 1970 Timestamp: 0.000000000
Birth : Thu Jan 1 03:00:00 1970 Timestamp: 0.000000000
CUid: 0 CGid: 0 Fxid: 005c533d Fid: 6050621 Pid: 2 Pxid: 00000002
ETAG: 5c533d:0.000
apeters
(Andreas Joachim Peters)
July 6, 2022, 8:32am
8
OK, sorry for the delay.
What you do now is:
1. Make a backup of QDB
2. eos rmdir /eos/scg/
3. eos-ns-inspect [add connection settings] fix-detached-parent --destination-path /eos/scg --cid 5452838
miramir
(Ivan Kashunin)
July 6, 2022, 11:09am
9
That's all right.
Make a backup - DONE
I removed /eos/scg with the drop-empty-cid command.
Or was it necessary to do it with rmdir /eos/scg/ ?
eos-b02:/bkp # eos-ns-inspect fix-detached-parent --members localhost:7777 --destination-path /eos/scg --cid 5452838
Destination path '/eos/scg' does not exist.
miramir
(Ivan Kashunin)
July 7, 2022, 8:24am
10
miramir:
25452838
I checked everything again and found a mistake.
The ID of the /eos/scg directory does not appear in the check-orphans output.
I mixed up 25452838 and 5452838.
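A mix-up like this is easy to make because 5452838 is a substring of 25452838, so a plain text search for the short ID also hits the long one. A minimal sketch of an exact-ID filter, assuming the check-orphans output consists of space-separated key=value fields as in the single sample line posted earlier (match_exact_id is a hypothetical helper, not part of eos-ns-inspect):

```shell
# Sample line copied from the check-orphans output earlier in this thread.
orphans='file-id=25452838 invalid-parent-id=0 size=0 locations= unlinked-locations='

# Print only lines in which some "...id=<value>" field equals the given ID exactly.
# Assumption: fields are space-separated key=value pairs, as in the sample line.
match_exact_id() {
  awk -v id="$1" '{
    for (i = 1; i <= NF; i++) {
      n = split($i, kv, "=")
      if (n == 2 && kv[1] ~ /id$/ && kv[2] == id) { print; next }
    }
  }'
}

printf '%s\n' "$orphans" | match_exact_id 5452838    # no output: only a substring would match
printf '%s\n' "$orphans" | match_exact_id 25452838   # prints the sample line
```

Comparing whole field values rather than searching for text avoids treating file 25452838 as container 5452838.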
apeters
(Andreas Joachim Peters)
July 7, 2022, 9:12am
11
Can you do the following:
mkdir /eos/tmp/
eos-ns-inspect fix-detached-parent --members localhost:7777 --destination-path /eos/tmp --cid 5452838
And then check what is in /eos/tmp/ …
miramir
(Ivan Kashunin)
July 7, 2022, 11:37am
12
eos-b02:~ # eos-ns-inspect fix-detached-parent --members localhost:7777 --destination-path /eos/tmp --cid 5452838
Finding all parents of Container #5452838...
scg: #5452838 with parent #2
eos: #2 with parent #1
Unable to continue - given container (5452838) looks fine? No changes have been made.
EOS Console [root://localhost] |/eos/tmp/> ls -la
drwxrwxr-+ 1 root root 0 Jul 7 14:33 .
drwxrwxr-+ 1 root root 2645894117260142 Jul 7 14:33 ..
apeters
(Andreas Joachim Peters)
July 7, 2022, 12:22pm
13
Argh … ok, this does not do what we want … I need to give you the REDIS commands then …
apeters
(Andreas Joachim Peters)
July 7, 2022, 12:46pm
15
Run these commands:
eos file info /eos/scg
This should show fid:2.
redis-cli -p 7777 -h localhost HSET 2:map_conts "1_Хранилище (создавать всё внутри)" 5476312
redis-cli -p 7777 -h localhost HSET 2:map_conts "2_Файлообмен (забрал и удалил)" 5498474
If you get 'MOVED …', run the same commands on that host.
Drop the MGM cache:
eos ns cache drop-single-container 2
List /eos/scg:
eos ls -la /eos/scg/
Do you see the two directories?
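Regarding the 'MOVED …' case mentioned above: a minimal sketch of pulling the target host and port out of such a reply, so the same redis-cli command can be repeated there. It assumes the reply follows the Redis cluster convention "MOVED <slot> <host>:<port>"; the reply string and host name below are made up, and moved_target is a hypothetical helper:

```shell
# Extract "host port" from a redirect reply.
# Assumption: the reply has the form "MOVED <slot> <host>:<port>",
# optionally prefixed by redis-cli with "(error) ".
moved_target() {
  case "$1" in
    *MOVED\ *)
      target=${1##* }                                  # last word: host:port
      printf '%s %s\n' "${target%:*}" "${target##*:}"  # split at the last colon
      ;;
    *)
      return 1                                         # not a redirect reply
      ;;
  esac
}

# Made-up reply, for illustration only.
moved_target '(error) MOVED 0 qdb-example.invalid:7777'
```

The printed host and port would then replace the -h and -p arguments of the original redis-cli call.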
apeters
(Andreas Joachim Peters)
July 7, 2022, 12:47pm
16
No wait … that is screwed …
apeters
(Andreas Joachim Peters)
July 7, 2022, 12:50pm
17
No, you have to attach directory 5452838 to /eos (2),
so you just do this redis command:
redis-cli -p 7777 -h localhost HSET 2:map_conts scg 5476312
(If you already ran the two commands from before, undo them with HDEL, giving only the key, not the number at the end:
HDEL 2:map_conts "1_…"
and
HDEL 2:map_conts "2_…"
)
miramir
(Ivan Kashunin)
July 7, 2022, 12:53pm
18
eos-b02:~ # redis-cli -p 7777 -h localhost HSET 2:map_conts scg 5476312
(integer) 1
EOS Console [root://localhost] |/eos/> ls /eos/scg
Unable to stat /eos/scg; No such file or directory (errc=2) (No such file or directory)
miramir
(Ivan Kashunin)
July 7, 2022, 12:54pm
19
No, I ran only:
redis-cli -p 7777 -h localhost HSET 2:map_conts scg 5476312
apeters
(Andreas Joachim Peters)
July 7, 2022, 12:56pm
20
Did you drop the cache for inode 2 before listing?
eos ns cache drop-single-container 2