
merge paddlefleet to paddleformers #4670

Open
risemeup1 wants to merge 14 commits into
PaddlePaddle:develop from
risemeup1:merge_paddlefleet_to_paddleformers

Conversation

@risemeup1

Collaborator

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

PR changes

Description

@CLAassistant

CLAassistant commented Jun 15, 2026


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 5 committers have signed the CLA.

✅ zjjlivein
✅ XieYunshen
✅ risemeup1
❌ root
❌ LLLLLLLLLLLLe


root does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Paddle-CI-Bot

Paddle-CI-Bot commented Jun 15, 2026


PaddleFormers Log Analysis

Run #27564247607 · Attempt 1

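Log analysis report

| Pipeline | Problem label | Suggested fix | Log snippet |
| --- | --- | --- | --- |
| Fleet Model Test (H20 single-card / H20 multi-card / A100) - Install PaddleFormers | Dependency conflict (scikit-learn cannot be resolved) | The PR adds a scikit-learn dependency that the CI environment cannot resolve. In requirements.txt / pyproject.toml, constrain scikit-learn to an available version, or switch to the conditional dependency `scikit-learn; extra == "optional"`. | Error log |
| CI_XPU - run_xpu_cases | Missing dependency (paddlefleet_ops not installed) | The XPU environment lacks paddlefleet_ops, so the top-level `from paddlefleet_ops import is_deep_ep_available` in transformer_layer.py breaks the import chain. Add `uv pip install -e packages/paddlefleet_ops` to the XPU CI install step, or make the import lazy or conditional. | Error log |
| Unittest GPU CI - Test | Missing dependency (build package not installed) | `python -m build --wheel` fails with "No module named build". Add `pip install build` to the CI install step. | Error log |
| Unittest GPU CI - upload-coverage | Runner GLIBC version incompatible | Node.js 24 requires GLIBC ≥ 2.27, which the runner_04 system does not provide. Unrelated to this PR; CI maintainers should upgrade the runner_04 system or pin actions/checkout to a Node.js 20 version. | Error log |
| Model Unittest GPU CI - Test | Missing dependency (paddle not installed) | `import paddle` fails with "No module named 'paddle'". The paddle whl was not installed into the py_3.12 environment; check the install step log to confirm whether the whl was downloaded and installed. | Error log |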

Failed test cases:

# CI_XPU
scripts/xpu_ci/test_ernie_21b_sft.py::test_ernie_21b_sft_training
scripts/xpu_ci/test_ernie_28b_thinking_sft.py::test_ernie_28b_thinking_sft_training

# Model Unittest GPU CI
ERROR scripts/regression/test_models.py  (collection failed - ModuleNotFoundError: No module named 'paddle')

# Fleet Model Test
Integration test (H20, single card) - Install PaddleFormers: exit 1
Integration test (H20, multi-card) - Install PaddleFormers: exit 1
Integration test (A100) - Install PaddleFormers: exit 1

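Root cause analysis:
PR #4670 (merge_paddlefleet_to_paddleformers) merges paddlefleet_ops from a standalone package into the main PaddleFormers repository and also adds a scikit-learn dependency in pyproject.toml/requirements.txt. The uv resolver in the CI environment finds no usable scikit-learn version, so all three Fleet pipelines fail while installing paddleformers. In addition, the top-level import of paddlefleet_ops in transformer_layer.py breaks the entire import chain in the XPU environment, where that package is not installed, and the missing build package causes the wheel build in Unittest GPU CI to fail.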
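
Three of the failures above come down to a module that cannot be imported in the CI environment. A minimal sketch of a pre-flight check that would surface this before the tests start is shown below. The helper name `missing_modules` and the `REQUIRED` list are hypothetical and not part of this PR; the module names are taken from the error messages in the report.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    # find_spec returns None for a top-level module that is not installed,
    # without importing it.
    return [name for name in names if find_spec(name) is None]


# Hypothetical list, based on the modules named in the CI error messages.
REQUIRED = ["paddle", "paddlefleet_ops", "build"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        raise SystemExit("Missing modules: " + ", ".join(missing))
    print("All required modules are importable.")
```

Run as a first CI step, a check like this would report all missing packages at once instead of failing on the first import.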

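Suggested fixes:

  1. scikit-learn dependency: confirm whether scikit-learn is a required runtime dependency. If it is not, remove it or make it optional. If it is, constrain it to a version that the PaddlePaddle PyPI mirror can resolve (for example scikit-learn>=1.3), and verify locally with uv pip install that it resolves before pushing.
  2. paddlefleet_ops in the XPU environment: add a pip install -e packages/paddlefleet_ops --no-build-isolation step to the XPU CI install script (the workflow corresponding to scripts/xpu_ci/), or change the top-level import at transformer_layer.py:33 to a conditional try/except ImportError import, so that the import chain does not break when the package is absent in the XPU environment.
  3. Missing build package: add pip install build to the install_requirements stage of Unittest GPU CI.
  4. Model Unittest paddle install failure: check the paddle whl install log. The missing build package may have aborted the installation early; re-verify after applying fix 3.
  5. upload-coverage GLIBC: unrelated to this PR. CI maintainers should handle the Node.js 24 compatibility issue on runner_04.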
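
The conditional import mentioned in suggestion 2 could look roughly like the sketch below. The module and function names come from the bot report; the fallback behaviour (treating DeepEP as unavailable) and the `HAS_PADDLEFLEET_OPS` flag are assumptions, and the real signature of `is_deep_ep_available` is not shown in this PR.

```python
# Guarded import: environments without paddlefleet_ops (for example the
# XPU CI described above) fall back to a stub instead of failing at import.
try:
    from paddlefleet_ops import is_deep_ep_available

    HAS_PADDLEFLEET_OPS = True
except ImportError:
    HAS_PADDLEFLEET_OPS = False

    def is_deep_ep_available(*args, **kwargs):
        """Assumed fallback: report DeepEP as unavailable without the package."""
        return False
```

Callers can keep using `is_deep_ep_available` unchanged. Whether returning `False` is the right fallback depends on how transformer_layer.py uses the result, which the report does not show.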

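🔍 Accuracy feedback: click the 😊 icon at the bottom of this comment and choose 👍 (accurate) or 👎 (inaccurate); the response is recorded automatically in the CI monitoring system.

🔄 Updated automatically after each re-run.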

@risemeup1 risemeup1 force-pushed the merge_paddlefleet_to_paddleformers branch from 0a77ccd to 25ab366 Compare June 15, 2026 17:10
@risemeup1 risemeup1 force-pushed the merge_paddlefleet_to_paddleformers branch 3 times, most recently from a6aa38f to 253e26f Compare June 16, 2026 11:36
@risemeup1 risemeup1 force-pushed the merge_paddlefleet_to_paddleformers branch 2 times, most recently from 67e3673 to 3ee3fc5 Compare June 16, 2026 12:28
@risemeup1 risemeup1 force-pushed the merge_paddlefleet_to_paddleformers branch from 3ee3fc5 to ac7b330 Compare June 16, 2026 14:01

6 participants