To address the strong subjectivity, limited accuracy, and poor scalability of evaluating how well students' sports actions conform to the standard, this paper proposes an automatic evaluation algorithm based on computer vision. First, a multi-perspective sports action dataset is constructed and an expert scoring system is designed. Second, keypoint sequences are extracted with an improved pose estimation model, and a multi-scale motion representation is introduced to integrate joint-level, limb-level, and global features. Third, a bias-aware alignment network is proposed to adaptively model spatiotemporal errors. Finally, a multi-task scoring model that fuses a GCN with a Transformer is constructed to jointly perform standardization classification and score regression. Experiments on the self-built dataset show that the method reduces the MAE to 0.318, an improvement of approximately 29.6% over mainstream methods, with a classification accuracy of 91.6% and a correlation coefficient of 0.94 with expert scores. In cross-scenario tests, performance drops by only 2.8%, which is significantly better than the comparison methods. Ablation experiments and statistical tests validate the effectiveness of each module. The results indicate that the method offers clear advantages in accuracy, generalization ability, and interpretability, and can provide technical support for intelligent physical education teaching and automatic evaluation.
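
The regression metrics quoted above (MAE, correlation with expert scores, and relative improvement over a baseline) can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not say which correlation coefficient is used, so Pearson's r is assumed here; the function names are hypothetical; and the toy scores are illustrative only, not data from the paper.

```python
import numpy as np


def mean_absolute_error(pred, target):
    """Mean absolute difference between predicted and expert scores."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target)))


def pearson_correlation(pred, target):
    """Pearson's r (assumed here; the abstract does not name the coefficient)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pred_centered = pred - pred.mean()
    target_centered = target - target.mean()
    denom = np.sqrt(np.sum(pred_centered ** 2) * np.sum(target_centered ** 2))
    return float(np.sum(pred_centered * target_centered) / denom)


def relative_improvement(baseline_error, method_error):
    """Fractional reduction in error relative to a baseline (lower error is better)."""
    return (baseline_error - method_error) / baseline_error


# Illustrative toy scores only; these are NOT results from the paper.
expert_scores = [8.0, 6.5, 9.0, 7.0]
predicted_scores = [7.5, 6.0, 9.5, 7.5]

mae = mean_absolute_error(predicted_scores, expert_scores)
r = pearson_correlation(predicted_scores, expert_scores)
print(f"MAE = {mae:.3f}, Pearson r = {r:.3f}")
```

Under this reading, an "improvement of approximately 29.6%" would correspond to `relative_improvement(baseline_mae, 0.318)` being about 0.296 for the baseline MAE in question.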