Running the PPO example fails with: value should be one of int, float, str, bool, or torch.Tensor

### Reminder

- [X] I have read the README and searched the existing issues.

### System Info

- `llamafactory` version: 0.8.3.dev0
- Platform: Linux-6.1.85+-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version: 2.3.0+cu121 (GPU)
- Transformers version: 4.41.2
- Datasets version: 2.20.0
- Accelerate version: 0.31.0
- PEFT version: 0.11.1
- TRL version: 0.9.4
- GPU type: NVIDIA A100-SXM4-40GB


### Reproduction

Run on Colab:
!llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml # runs normally
!llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml  # this step raises the error

![image](https://github.com/hiyouga/LLaMA-Factory/assets/16278392/e9aa7760-4fe3-45ec-9e7b-7184a26690a3)


Traceback (most recent call last):
  File "/usr/local/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/content/drive/My Drive/llama-factory-new/LLaMA-Factory/src/llamafactory/cli.py", line 110, in main
    run_exp()
  File "/content/drive/My Drive/llama-factory-new/LLaMA-Factory/src/llamafactory/train/tuner.py", line 54, in run_exp
    run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/content/drive/My Drive/llama-factory-new/LLaMA-Factory/src/llamafactory/train/ppo/workflow.py", line 58, in run_ppo
    ppo_trainer = CustomPPOTrainer(
  File "/content/drive/My Drive/llama-factory-new/LLaMA-Factory/src/llamafactory/train/ppo/trainer.py", line 118, in __init__
    PPOTrainer.__init__(
  File "/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_trainer.py", line 227, in __init__
    self.accelerator.init_trackers(
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 685, in _inner
    return PartialState().on_main_process(function)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2586, in init_trackers
    tracker.store_init_configuration(config)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/tracking.py", line 79, in execute_on_main_process
    return PartialState().on_main_process(function)(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/tracking.py", line 211, in store_init_configuration
    self.writer.add_hparams(values, metric_dict={})
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 341, in add_hparams
    exp, ssi, sei = hparams(hparam_dict, metric_dict, hparam_domain_discrete)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/summary.py", line 316, in hparams
    raise ValueError(
ValueError: value should be one of int, float, str, bool, or torch.Tensor
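
The traceback ends in TensorBoard's `hparams` check, which rejects any config value that is not one of the types named in the error message. Which field of the PPO config triggers it is not visible from the log, so the following is only a minimal sketch of one possible workaround: stringify unsupported values before they reach the tracker. The helper name `sanitize_hparams` and the sample config keys are hypothetical, and `torch.Tensor` is left out of the allowed types to keep the sketch dependency-free.

```python
from typing import Any, Dict

# Types the error message lists as accepted.
# torch.Tensor is also accepted but omitted here to avoid importing torch.
_ALLOWED_TYPES = (int, float, str, bool)


def sanitize_hparams(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` in which every value of an unsupported
    type (list, dict, None, ...) is replaced by its string form."""
    clean: Dict[str, Any] = {}
    for key, value in config.items():
        clean[key] = value if isinstance(value, _ALLOWED_TYPES) else str(value)
    return clean


if __name__ == "__main__":
    # Hypothetical config values, for illustration only.
    example = {
        "learning_rate": 1e-5,
        "target_modules": ["q_proj", "v_proj"],
        "reward_model": None,
    }
    print(sanitize_hparams(example))
```

A dict cleaned this way should pass the type check in `add_hparams`, at the cost of logging lists and `None` as plain strings.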


### Expected behavior

_No response_

### Others

_No response_
