Thorny, what controller do you use? Do you know why this is not an issue with other controllers?
I don't play with a controller. I wrote Ashita's controller handling, so I got pretty well acquainted with how the game handles them.
It's not an issue with XInput controllers because the polling API used by XInput is completely different from the one used for DirectInput.
XInput's API allows the program to check the state of the controller, which FFXI does each frame. If a button was not pressed last frame and is pressed this frame, it gets seen as a press and handled that way. The stick positions will always be correct as well, since they get updated every frame.
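To make the "not pressed last frame, pressed this frame" idea concrete, here is a minimal sketch in Python. It is not the game's actual code. The frame data and the `detect_presses` helper are made up for illustration, and each frame is modeled as nothing more than the set of buttons currently held.

```python
def detect_presses(previous_buttons, current_buttons):
    """Return buttons that are held this frame but were not held last frame."""
    return current_buttons - previous_buttons


# Hypothetical per-frame snapshots of which buttons are held down.
frames = [set(), {"A"}, {"A"}, {"A", "B"}, set()]

previous = set()
presses_per_frame = []
for current in frames:
    # Only the transition from released to pressed counts as a press.
    presses_per_frame.append(detect_presses(previous, current))
    previous = current
```

Holding A across several frames registers a single press, and because every frame starts from a fresh snapshot there is nothing that can queue up and lag behind.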
DirectInput's API also allows the program to check the state of the controller, but it additionally allows the program to request the last X changes to the state. FFXI requests 16 changes per frame, but many DirectInput controllers will generate more than 16*30 (480) or even 16*60 (960) changes per second while the sticks are being moved. If your controller is generating 1200 changes per second but FFXI is only reading 960, the queue falls behind by 240 changes every second (about 20% of what the controller generates) until you stop touching the sticks and the rate of changes drops enough for it to catch up. Since FFXI uses the current state of the controller for stick inputs, they are always correct. However, button presses sit in the queue and won't be handled until it catches up.
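The arithmetic above can be checked with a small simulation. This is only a sketch with made-up names (`simulate_backlog`, `drain_seconds`), and it assumes the controller produces changes at a perfectly steady rate, which real hardware will not do.

```python
def simulate_backlog(events_per_second, reads_per_frame, fps, seconds):
    """Return how many changes are still queued after `seconds` of stick movement."""
    backlog = 0.0
    generated_per_frame = events_per_second / fps
    for _ in range(fps * seconds):
        backlog += generated_per_frame           # controller adds new changes
        backlog -= min(backlog, reads_per_frame)  # game reads at most N per frame
    return backlog


def drain_seconds(backlog, reads_per_frame, fps):
    """Time to work through the backlog once the sticks are released."""
    return backlog / (reads_per_frame * fps)


backlog_60 = simulate_backlog(1200, 16, 60, 1)   # 20 in, 16 out each frame
backlog_30 = simulate_backlog(1200, 16, 30, 1)   # 40 in, 16 out each frame
backlog_64 = simulate_backlog(1200, 64, 60, 1)   # reads keep up with the controller
delay_60 = drain_seconds(backlog_60, 16, 60)
```

Under these assumptions, one second of stick movement at 60 FPS leaves 240 changes queued, so a button press at the end of that second waits about a quarter of a second before the game sees it. At 30 FPS the same second leaves 720 changes queued.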
Ashita fixes it by intercepting the request for 16 changes, replacing it with a request for 64 changes, and discarding unused stick inputs (if a stick position is reported 5x, only the latest is sent each frame). DS4Windows and other wrappers fix it by taking your input and converting it to XInput, so FFXI always gets the latest state. Using a 60 FPS cap will also help, but may not totally resolve it like Ashita or DS4Windows would.
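The "discard unused stick inputs" part can be sketched like this. This is not Ashita's actual code. The `(kind, name, value)` event tuples and the `coalesce_stick_events` helper are invented for illustration, and the real logic may differ in its details.

```python
def coalesce_stick_events(events):
    """Keep every button event, but only the newest event for each stick axis."""
    last_index = {}
    for i, (kind, name, _value) in enumerate(events):
        if kind == "axis":
            last_index[name] = i  # remember where the latest report for this axis is

    return [
        event
        for i, event in enumerate(events)
        if event[0] != "axis" or last_index[event[1]] == i
    ]


# Hypothetical batch of changes read in one frame.
batch = [
    ("axis", "x", 10),
    ("axis", "x", 20),
    ("button", "A", 1),
    ("axis", "x", 30),
    ("axis", "y", 5),
    ("button", "A", 0),
]
trimmed = coalesce_stick_events(batch)
```

Here six changes shrink to four: both button events survive in order, and each axis keeps only its newest value. With stale stick reports dropped, the changes left over are much more likely to fit in what the game reads per frame.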
Using a different dinput controller won't fix it unless it has a lower rate of reporting.