<?xml version="1.0"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/static/PubMed.dtd">
<ArticleSet>
  <Article>
    <Journal>
      <PublisherName>Sichuan Knowledgeable Intelligent Sciences</PublisherName>
      <JournalTitle>International Scientific Technical and Economic Research</JournalTitle>
      <Issn>2959-1309</Issn>
      <Volume>4</Volume>
      <Issue>2</Issue>
      <PubDate PubStatus="epublish">
        <Year>2026</Year>
        <Month>04</Month>
        <Day>08</Day>
      </PubDate>
    </Journal>
    <ArticleTitle>Research on Transformer-Based Action Sequence Modeling of Intangible Cultural Heritage Shadow Play Using Attention Mechanisms</ArticleTitle>
    <FirstPage>51</FirstPage>
    <LastPage>77</LastPage>
    <ELocationID EIdType="doi">10.71451/ISTAER2615</ELocationID>
    <Language>eng</Language>
    <AuthorList>
      <Author>
        <FirstName>Yuxiao</FirstName>
        <LastName>Liu</LastName>
        <Affiliation>Art and Design, Beijing City University, Shunyi District, Beijing, China</Affiliation>
        <Identifier Source="ORCID">0009-0003-7951-314X</Identifier>
      </Author>
      <Author>
        <FirstName>Shuolei</FirstName>
        <LastName>Feng</LastName>
        <Affiliation>Department of Information Science, Beijing City University, Shunyi District, Beijing, China</Affiliation>
        <Identifier Source="ORCID">0009-0003-8967-5134</Identifier>
      </Author>
      <Author>
        <FirstName>Mengyu</FirstName>
        <LastName>Liu</LastName>
        <Affiliation>Art and Design, Beijing City University, Shunyi District, Beijing, China</Affiliation>
        <Identifier Source="ORCID">0009-0003-0522-0280</Identifier>
      </Author>
    </AuthorList>
    <History>
      <PubDate PubStatus="received">
        <Year>2026</Year>
        <Month>04</Month>
        <Day>08</Day>
      </PubDate>
    </History>
    <Abstract>
Shadow puppet movements are characterized by long-range spatiotemporal dependencies, pronounced stylization, and complex control and transmission relationships. These characteristics pose two major challenges to digital modeling: capturing long-range dependencies and preserving artistic style. This paper proposes an improved Transformer model incorporating a multi-level attention mechanism for modeling and generating action sequences of intangible cultural heritage shadow play. The model designs three collaborative attention modules: spatial attention introduces skeletal adjacency priors to enhance structural plausibility; temporal attention captures long-range cross-frame dependencies; and style-aware attention adjusts local computations via global feature statistics to preserve genre-specific performance styles. Furthermore, an enhanced architecture that alternately stacks graph convolution and Transformer layers is adopted, and sparse and hierarchical modeling strategies reduce computational complexity from quadratic to approximately linear in sequence length. Experimental results show that the average joint position error of the proposed method in motion prediction tasks is 31.4, which is 11.8 lower than that of the standard Transformer; style loss decreases by 24.6%; and under the extreme condition of 50% missing key points, the error ratio is 1.31, significantly better than the comparison methods. The proposed model provides effective technical support for the digital preservation and intelligent inheritance of intangible cultural heritage.
</Abstract>
  </Article>
</ArticleSet>
