We have been using WFE to trigger file creation for a long time, and it has worked well. We recently upgraded EOS from 5.2.24 to 5.3.15, and since then WFE doesn't work as expected.
WFE.log shows errors like the following:
250716 16:05:12 INFO WFE:496 workflow="default" fxid=7ce04162
250716 16:05:12 CRIT WFE:514 parsing of f�J\
failed - setting nobody
250716 16:05:20 INFO WFE:381 workflowdir="/eos/dev/proc/workflow/20250716/q/default/" retry=0 when=1752653120 job-time=1752653120
250716 16:05:22 INFO WFE:496 workflow="default" fxid=7ce04162
250716 16:05:22 CRIT WFE:514 parsing of K��\
failed - setting nobody
It seems that WFE cannot parse something. Looking into mgm/WFE.cc, we found:
attr -g eos.fmd /data72/eos/00033263/7ce04162
Attribute "eos.fmd" had a 146 byte value for /data72/eos/00033263/7ce04162:
bA�|�
l%��vh-99]5��vh=99]ES�vhN�UY��;a��;i��;c8391075zc8391075���.�D���364,3136
The output is unreadable, but the vid might be 99, whereas nobody has uid 65534 on Alma 9. Could this be the problem?
OK, so this means the parsing was also failing before, since this attribute does not exist on the file. What other indication do you have in the logs that the WFE does not work as expected? There must be some other clue later in the logs, especially related to the action that is triggered by the workflow. I would recommend tracing this in the main MGM log, /var/log/eos/mgm/xrdlog.mgm, as you get more information there.
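For the tracing, something along these lines might help pull out the entries for one file together with the lines that follow them (in the WFE.log excerpt above, the CRIT line does not carry the fxid itself). This is only a rough sketch: the helper name lines_with_context is made up, and it matches plain substrings, so it assumes nothing about the log format beyond the fxid=... token seen above.

```python
from typing import Iterable, List


def lines_with_context(lines: Iterable[str], needle: str, after: int = 2) -> List[str]:
    """Return every line containing `needle` plus the `after` lines that follow it."""
    selected: List[str] = []
    remaining = 0  # follow-up lines still to keep after the last match
    for line in lines:
        if needle in line:
            selected.append(line.rstrip("\n"))
            remaining = after
        elif remaining > 0:
            selected.append(line.rstrip("\n"))
            remaining -= 1
    return selected


# Possible usage (errors="replace" because the log can contain raw bytes):
#   with open("/var/log/eos/mgm/xrdlog.mgm", errors="replace") as log:
#       for entry in lines_with_context(log, "fxid=7ce04162", after=5):
#           print(entry)
```

Increasing `after` shows more of what the MGM did right after the workflow was picked up, which is where the error from the triggered action would be expected to appear.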
Indeed, one difference might be that user nobody has uid 65534 on Alma9. Did you also update the operating system during the upgrade? Can you check the actual error that the action performed by the workflow gets?
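To compare the hosts quickly, a small sketch like the following shows which uid an account name resolves to on a given machine. It uses only Python's standard pwd module; the helper name lookup_uid is made up for illustration.

```python
import pwd
from typing import Optional


def lookup_uid(name: str) -> Optional[int]:
    """Return the numeric uid for `name`, or None if the account does not exist."""
    try:
        return pwd.getpwnam(name).pw_uid
    except KeyError:
        # getpwnam raises KeyError when the name is not in the password database
        return None


# Possible usage: run on the old and the new host and compare the values.
#   print(lookup_uid("nobody"))
```

If the value differs between the machines, any ownership or mapping that was set up with the old numeric id would be worth re-checking.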
I am interested in any messages that you get in the logs when the workflow fails, presumably with 5.3.18. From what I understand, the same message is displayed with both 5.2.24 and 5.3.18, so this is not the root issue per se. We can downgrade the message to a warning rather than critical, just to avoid any confusion. This still leaves open the question of why your workflow does not work as expected in 5.3.18, but for this you need to investigate the logs further and isolate the root cause.
We haven't made any progress so far. To get our archival workflow working, we have switched to an external PostgreSQL instead of the EOS WFE. By the way, WFE somehow worked for a long time, while the group balance didn't. Maybe it's related to MQ?