fix(hooks): Add padding support to context parallel hooks #12595
Conversation
Thanks for the PR! However, we do not want any of this logic to go into the Qwen transformer.

Hi @yiyixuxu, yes, I would be interested in supporting this change.

Hi @yiyixuxu, just wanted to follow up. After looking at the hook implementation as you suggested, I've updated the PR with a new approach that is fully generic and keeps all logic within the hooks, with no changes to the transformer. The solution now adds padding in the ContextParallelSplitHook and then trims it in the ContextParallelGatherHook, using the module instance to temporarily store the original sequence length. I've also added a new unit test for this logic in test_hooks.py. Thanks, and let me know if you need more changes. I've updated the PR description with the full details. CC @sayakpaul @DN6
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hello @yiyixuxu, I wanted to follow up on this in case you were busy. Thanks.

Hmm, I think we need to wait a bit until #12702 is merged, because it is tackling padding too.

Ok, got it @sayakpaul, thanks for letting me know.
What does this PR do?
This PR now modifies the ContextParallelSplitHook and ContextParallelGatherHook to gracefully handle sequence lengths that are not divisible by the world size.
This PR changes:
- ContextParallelSplitHook: pads the input along the split dimension so the sequence length becomes divisible by the world size, and temporarily stores the original sequence length on the module instance.
- ContextParallelGatherHook: after gathering, trims the output back to the stored original sequence length.
This ensures that the padding is completely transparent to the model and the end-user, preventing crashes without altering the output shape. The fix is now contained entirely within the hooks and requires no changes to the Qwen transformer or any other model.
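For illustration, here is a minimal sketch of the pad-and-trim idea. The helper names `pad_to_divisible` and `trim_to_original` are hypothetical, not the PR's actual API; the real logic lives inside the hook classes:

```python
import torch
import torch.nn.functional as F

def pad_to_divisible(x: torch.Tensor, world_size: int, dim: int = 1):
    """Pad `x` along `dim` so its length is divisible by `world_size`.

    Returns the padded tensor plus the original length, which the
    gather side needs in order to strip the padding back off.
    """
    original_len = x.shape[dim]
    remainder = original_len % world_size
    if remainder == 0:
        return x, original_len
    pad_len = world_size - remainder
    # F.pad takes (left, right) pairs starting from the LAST dim,
    # so build a pad spec that only touches `dim`.
    pad = [0, 0] * (x.ndim - dim - 1) + [0, pad_len]
    return F.pad(x, pad), original_len

def trim_to_original(x: torch.Tensor, original_len: int, dim: int = 1):
    """Slice the gathered tensor back to its pre-padding length."""
    return x.narrow(dim, 0, original_len)
```

In the PR itself, the split hook stashes the original length on the module instance, so the gather hook can retrieve it later without any extra plumbing through the model.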
I have also added a new unit test in tests/hooks/test_hooks.py that directly tests this new padding and trimming logic.
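A rough idea of what such a test can check, reusing the hypothetical helpers sketched above (this is not the PR's actual test code): pad a tensor whose sequence length is not divisible by the world size, then verify the round trip restores the original tensor exactly.

```python
def test_pad_and_trim_round_trip():
    world_size = 4
    x = torch.randn(2, 7, 16)  # sequence length 7 is not divisible by 4
    padded, original_len = pad_to_divisible(x, world_size, dim=1)
    # Padded length must be divisible so the split across ranks is even.
    assert padded.shape[1] % world_size == 0
    trimmed = trim_to_original(padded, original_len, dim=1)
    # The round trip must be lossless: same shape, same values.
    assert trimmed.shape == x.shape
    assert torch.equal(trimmed, x)
```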
Fixes #12568
Before submitting
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul @yiyixuxu @DN6