DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning