<?xml version="1.0" encoding="utf-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.5">
    <title>Is Invisible XML Ready for College Students?</title>
    <subtitle>Trying ixml and XProc on a Music Analysis Project in an Undergraduate Text Analysis
        Course</subtitle>
    <info>
   <confgroup>
      <conftitle>Balisage: The Markup Conference 2025</conftitle>
      <confdates>August 4-8, 2025</confdates>
   </confgroup>
        <abstract>
            <para>Is Invisible XML ready for teaching university undergraduates? Is it a good idea
                to try this? This paper will attempt to address these questions. University students
                in the Digital Media, Arts, and Technology program at Penn State Behrend are offered
                a course in “Large-Scale Text Analysis”. Going into this course, students have
                experience in encoding text with XML, transforming XML with XSLT, and web
                development with HTML and CSS. In the past, the Text Analysis course has been a
                procedural “Regex-and-Python course”: preparing text corpora by generating simple
                XML from regularly-patterned files using regular expression search-and-replace
                operations, using XQuery to extract the portions of the texts to analyze, and
                producing plain-text inputs for Python, which has dominated students’ experience
                of the pipeline.</para>
            <para>This year’s course tried a different approach. Students were taught ixml grammars
                as a way to prepare XML for analysis and XProc for pipelining. Regular expression
                matching involved working with XSLT, and the entire XML stack was used before
                approaching Python. Students learned how to install software still in alpha stages,
                and they tested how well it worked across platforms. From this exploratory start, one
                student project team found a very practical use-case for applying Invisible XML in a
                project pipeline for analyzing chord chart musical notation. In this paper, we
                discuss the potential we discovered for Invisible XML in music analysis. We also
                share our recommendations for guiding people to prepare processing pipelines that
                incorporate Invisible XML, and we reflect on what aspects of this risky teaching
                experiment were most worthwhile.</para>
        </abstract>
        <author>
            <personname>
                <firstname>Michael</firstname>
                <othername>Roy</othername>
                <surname>Simons</surname>
            </personname>
            <personblurb>
                <para>Michael Simons is a Digital Media, Arts, and Technology (DIGIT) student at
                    Penn State Behrend. After two years of studying Computer Science, he decided to
                    pivot to Dr. Beshero-Bondar’s Digit program as it allows for greater creativity
                    and a more focused path while still learning how to get the most out of today’s
                    innovative technologies. In this program, he’s taken a deep dive into the XML
                    stack where he enjoys using tools like XSLT—and more recently, ixml and XProc—to
                    create rich markup that is both satisfyingly organized and able to be processed
                    in interesting ways. Michael’s main passion is music, which he utilized to
                    develop a <link xlink:href="https://newtfire.github.io/GretaVanZeppelin/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">large-scale
                        text analysis project</link> comparing the lyrics and chord progressions of
                    seemingly similar artists.</para>
            </personblurb>
            <affiliation>
                <jobtitle>Student</jobtitle>
                <jobtitle>Research Assistant / Coding Mentor</jobtitle>
                <orgname>Penn State Erie, The Behrend College</orgname>
            </affiliation>
            <email>mrs7068@psu.edu</email>
        </author>

        <author>
            <personname>
                <firstname>Elisa</firstname>
                <othername>E.</othername>
                <surname>Beshero-Bondar</surname>
            </personname>
            <personblurb>
                <para>Elisa Beshero-Bondar explores and teaches document data modeling with the XML
                    family of languages. She serves on the TEI Technical Council and is the founder
                    and organizer of the <link xlink:href="https://digitalmitford.org" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Digital
                        Mitford project</link> and <link xlink:href="https://digitalmitford.github.io/DigMitCS/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">its usually annual coding
                        school</link>. She experiments with visualizing data from complex document
                    structures like epic poems and with computer-assisted collation of differently
                    encoded editions of <link xlink:href="https://frankensteinvariorum.github.io/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest"><emphasis role="ital">Frankenstein</emphasis></link>. Her ongoing adventures with
                    markup technologies are documented on <link xlink:href="https://newtfire.org" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">her
                        development site at newtfire.org</link>. </para>
            </personblurb>
            <affiliation>
                <jobtitle>Chair</jobtitle>
                <orgname>TEI Technical Council</orgname>
            </affiliation>
            <affiliation>
                <jobtitle>Professor of Digital Humanities</jobtitle>
                <jobtitle>Program Chair of Digital Media, Arts, and Technology</jobtitle>
                <orgname>Penn State Erie, The Behrend College</orgname>
            </affiliation>
            <email>eeb4@psu.edu</email>
        </author>
<legalnotice><para>Copyright © 2025 by the authors.</para></legalnotice>
        <keywordset role="author">
            <keyword>Invisible XML</keyword>
            <keyword>ixml</keyword>
            <keyword>XProc</keyword>
            <keyword>music encoding</keyword>
            <keyword>MEI</keyword>
            <keyword>MusicXML</keyword>
            <keyword>Perl</keyword>
            <keyword>ChordPro</keyword>
            <keyword>regular expressions</keyword>
            <keyword>grammar</keyword>
            <keyword>schema</keyword>
            <keyword>declarative markup</keyword>
            <keyword>declarative methods</keyword>
            <keyword>imperative methods</keyword>
            <keyword>Python</keyword>
            <keyword>XSLT</keyword>
        </keywordset>
    </info>
    <section>
        <title>Context: The Large-Scale Text Analysis Course at Penn State Behrend</title>
        <para>The Large-Scale Text Analysis class (Digit 210) is taught (by me, Elisa
            Beshero-Bondar) in spring semesters in the Digital Media, Arts, and Technology major at
            Penn State Behrend (affectionately known to us as the <quote>DIGIT</quote> program). The
            course is part of a multiple-semester digital humanities core sequence that concentrates
            on text encoding and processing. Students usually come to this class with experience
            in text encoding with XML, transformation with XSLT, and web development with HTML and
            CSS from earlier coursework. Our university semesters are 15 weeks long with three
            50-minute classes per week. The course involves daily homework and a few tests, with an
            emphasis on students’ applying what they learn to develop semester projects in small
            teams.</para>

        <para>Digit 210 is usually understood to be <quote>the Regex-and-Python course</quote>. A
            typical semester would involve orienting students to natural language processing in
            Python and preparing text corpora for analysis. In this context, XML was a helpful (but
            not completely necessary) option that contributed to more precise text curation and
            analysis, and the expectation to prepare XML has provided a good means for students to
            learn regular expression search-and-replace operations in order to generate simply but
            meaningfully structured XML from regular patterns in text corpora. Students would then
            learn to write XQuery to output portions of the texts to analyze using natural language
            processing libraries (<link xlink:href="https://spacy.io/usage/spacy-101" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">spaCy</link>
            or <link xlink:href="https://www.nltk.org/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">NLTK</link>). For an exemplary semester
            project, a team of students <quote>scraped</quote> collections of popular game script
            files. After careful document analysis, they applied regular expression
            search-and-replace operations to create a simple XML structure that helped them to
            isolate (and mark off) spoken conversation between NPCs from passages about
            game items and optional actions. The team prepared XML and wrote XQuery to output the
            texts that described decision-making forks, in order to find out how frequently certain
            characters and items are mentioned at specific locations in the game. In other projects,
            students developed text corpora from many seasons of available TV series like
                <emphasis>The Simpsons</emphasis>. They prepared simple XML to identify dialogue,
            speakers, and non-spoken descriptive passages, then applied XQuery to separate out just
            the portions of the texts they wished to analyze (e.g., just the spoken text) in
            plain-text inputs to provide to Python. In previous semesters, we had student
            teams import the Saxon-C package into their Python scripts so that they could apply XPath
            and XQuery directly as a pipeline process within their Python code. Thus, Python has
            traditionally come first in this course and dominated the class experience of developing
            a pipeline algorithm for analyzing text corpora.</para>
        <para>Perhaps the 2020s are a decade inviting us to resist complacency, particularly in the
            organizing of teaching syllabi. In Spring 2025, motivated greatly by the interesting
            opportunities of Invisible XML and the creative affordances of SVG discussed in previous
            years at Balisage, we changed direction and put XML technologies much more in the foreground.<footnote>
                <para>Several papers in the Proceedings of Balisage: The Markup Conference vol. 29
                    (2024) demonstrated exciting applications of Invisible XML connected to careful
                    document planning and analysis, with promising implications for interface design.
                    See Joseph Michael Courtney and Michael Robert Gryk, <quote>Pulse, Parse, and
                        Ponder: Using Invisible XML to Dissect a Scientific Domain Specific
                        Language</quote>, <link xlink:href="https://doi.org/10.4242/BalisageVol29.Courtney01" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://doi.org/10.4242/BalisageVol29.Courtney01</link>; Mary Holstege,
                        <quote>Invisible Fish: API Experimentation with InvisibleXML</quote>, <link xlink:href="https://doi.org/10.4242/BalisageVol29.Holstege01" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://doi.org/10.4242/BalisageVol29.Holstege01</link>; John Lumley,
                        <quote>Variations on an Invisible Theme: Using iXML to produce XML to
                        produce iXML to produce ...</quote>,
                    <link xlink:href="https://doi.org/10.4242/BalisageVol29.Lumley01" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://doi.org/10.4242/BalisageVol29.Lumley01</link>; Ari Nordström,
                        <quote>Adventures in Mainframes, Text-based Messaging, and iXML</quote>, 
                    <link xlink:href="https://doi.org/10.4242/BalisageVol29.Nordstrom01" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://doi.org/10.4242/BalisageVol29.Nordstrom01</link>; C. M.
                    Sperberg-McQueen, <quote>From Word to XML via iXML: a Word-first XML workflow in
                        the TLRR 2e project</quote>, 
                    <link xlink:href="https://doi.org/10.4242/BalisageVol29.Sperberg-McQueen01" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://doi.org/10.4242/BalisageVol29.Sperberg-McQueen01</link>; Bethan
                    Tovey-Walsh, <quote>When women do algorithms: a semi-generative approach to
                        overlay crochet with iXML and XSLT</quote>, 
                    <link xlink:href="https://doi.org/10.4242/BalisageVol29.Tovey-Walsh01" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://doi.org/10.4242/BalisageVol29.Tovey-Walsh01</link>.</para>
            </footnote> This year, our students began the course by learning to write SVG and think
            about creative ways to visualize data by programmatically scripting SVG with XSLT. The
            SVG data visualization unit previously came at the end of the Python-dominated course,
            as a consequence of preparing the XML and writing XQuery: students were prepared to
            script SVG with XQuery with an emphasis on extracting data and providing an alternative
            to visualization libraries available in the Python ecosystem. In the new experimental
            course, we decided to <emphasis>begin</emphasis> with hand-encoding SVG as a rewarding
            starting point for our digital creative students, and then review what they had learned
            of XSLT in their Text Encoding course. In this version of our course, they reviewed XSLT
            by pulling data from texts they had encoded in order to make their own data
            visualizations. This moved XSLT <quote>front and center</quote>, but it also left us no
            room to cover XQuery, given the time we now had to devote to new processing
            technologies. We introduced the regular expressions unit much as usual, with the same
            goals of preparing XML files from so-called <quote>plain text</quote> using
            search-and-replace operations in the oXygen XML Editor. However, <emphasis>this
                time</emphasis> we found an opportunity to return to XSLT to introduce stages of
            conversion from text to XML, using <code>xsl:analyze-string</code> to refine their
            processing and create new element nodes based on regular expression matches and
            non-matches. This set the stage for the new unit on Invisible XML and XProc.</para>
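        <para>For readers less familiar with <code>xsl:analyze-string</code>, the sketch below
            shows the same match/non-match idea outside XSLT. It is a rough Python analogue, not
            the stylesheet our students wrote: the helper name <code>analyze_string</code>, the
            <code>line</code> and <code>chord</code> element names, and the ChordPro-style sample
            lyric are hypothetical, chosen only for illustration. Each regular expression match
            becomes a new element node, while non-matching text is kept as plain text around
            it.</para>

```python
import re
import xml.etree.ElementTree as ET


def analyze_string(text: str, pattern: str, match_tag: str,
                   group: int = 0, wrapper_tag: str = "line") -> ET.Element:
    """Hypothetical helper: split text into regex matches and non-matches.

    Each match becomes a new child element (like xsl:matching-substring);
    the text between matches stays plain text (like
    xsl:non-matching-substring), stored as element text or tail.
    """
    wrapper = ET.Element(wrapper_tag)
    previous = None  # last element created; its .tail holds the text after it
    position = 0

    def keep_text(chunk: str) -> None:
        # Plain-text runs go before the first child or after the previous one.
        if not chunk:
            return
        if previous is None:
            wrapper.text = (wrapper.text or "") + chunk
        else:
            previous.tail = (previous.tail or "") + chunk

    for match in re.finditer(pattern, text):
        keep_text(text[position:match.start()])      # non-matching substring
        previous = ET.SubElement(wrapper, match_tag)  # matching substring
        previous.text = match.group(group)
        position = match.end()
    keep_text(text[position:])                        # trailing non-match
    return wrapper


# Hypothetical sample: bracketed chord names inline with a lyric line.
line = analyze_string("[G]Take me [D]home", r"\[([^\]]+)\]", "chord", group=1)
print(ET.tostring(line, encoding="unicode"))
```

        <para>On that sample line, the sketch wraps G and D in <code>chord</code> elements and
            keeps <quote>Take me </quote> and <quote>home</quote> as surrounding text, much as an
            XSLT template emits new nodes inside <code>xsl:matching-substring</code> and copies
            text inside <code>xsl:non-matching-substring</code>. In XSLT, the
            <code>regex-group()</code> function plays the role of the <code>group</code> argument
            here.</para>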
        <para>All this activity concentrated on XML processing before the students wrote any Python.
            The new priority on XML not only helped students review and develop XSLT skills
            introduced in the previous semester, but it also gave students an unusual experience
            with writing grammars and seeing the relationships and differences between regular
            expression matching, grammars, and schemas. Putting these XML, XSLT, and ixml
            experiences first, before introducing Python for natural language processing, changed
            the course experience significantly. Even though not all students applied ixml or XProc
            pipelines in their semester projects, the shared experience of encountering these
            technologies changed the way students approached pipeline processing, improved
            their command-line fluency, and introduced them to software still in
            alpha stages. Was Invisible XML ready for experimentation by undergraduates, and was the
            experience worth deferring their attention to Python processing?</para>
        <para>We found that perhaps ixml was <emphasis>just</emphasis> ready enough, but it will
            surely be more ready for student experimentation next year! This semester everything was
            new and different, and here we break down the most challenging and learning-intensive
            experiences for students confronting Invisible XML and XProc for the first time.</para>
        <section>
            <title>Introducing the Battle-Testing Team</title>
            <subtitle>Preparing for a Cross-Platform Educational Experiment</subtitle>
            <para>Hello world! I’m Michael Simons, and I was one of the
                    <quote>battle-testers</quote> for the new ixml and XProc unit in our Digit 210
                class. I was one of three students who signed up to assist Dr. Beshero-Bondar and
                Dr. David J. Birnbaum in trying out some exciting new XML technologies to be taught
                to the rest of our class. Dr. Birnbaum was invited to teach a unit of roughly one
                week on these technologies, but the preparation was long and intense. I was joined
                by Dannika Love, who was taking the course with me, and Caleb King, who had taken
                the course the previous year in its earlier form. Dannika is a student with careful
                attention to detail and strong leadership qualities, but she likely would not have
                gained as much from ixml and XProc on her own if she had not been a behind-the-scenes
                battle-tester. Caleb is also a highly motivated student and leader, and one of his
                main contributions was working through the challenges of installing ixml and XProc
                processors in a Windows environment, since Dr. Beshero-Bondar, Dannika, and I are
                all MacOS users.</para>
            <para>Our most important responsibility as battle-testers was getting ahead of the
                class and working our way through ixml and XProc before our peers, so that we
                could <orderedlist>
                    <listitem>
                        <para>Write instructions for installation and configuration that we felt our
                            peers would be able to easily follow, and</para>
                    </listitem>
                    <listitem>
                        <para>Assist our peers when problems arose.</para>
                    </listitem>
                </orderedlist>Because of the problems we worked through together, we all gained
                more knowledge about ixml and XProc than we otherwise would have, and that is
                something I value greatly about this experience.</para>
            <para>We hope that our battle-testing was helpful to the developers of ixml and XProc as
                we figured out what documentation was needed to provide these technologies to a
                group of undergraduates. While Dr. Birnbaum developed installation instructions, he
                warned us that they were only for MacOS and for those with a purchased Saxon EE
                license for use with XProc processing.<footnote>
                    <para>Dr. Birnbaum’s installation instructions, which he claimed were developed
                        for his primary audience: himself, <quote>Configuring XProc and ixml
                            processors</quote>, <link xlink:href="http://dh.obdurodon.org/ixml-and-xproc-configuration.xhtml" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://dh.obdurodon.org/ixml-and-xproc-configuration.xhtml</link>
                        (2025). Last accessed 2025-07-02.</para>
                </footnote> Knowing that our students would not be able to purchase a Saxon EE
                license and would be using the HE (Home Edition) instead, the battle-testing team
                needed to account for this difference in our instructions.
                As the number of differences between Dr. Birnbaum’s instructions and our experiences
                grew rapidly, including the many differences between MacOS and Windows as discovered
                by Caleb and Dr. Beshero-Bondar, we realized we would need separate sets of
                instructions for each platform as well.<footnote>
                    <para>Digit 210’s installation instructions were initially drafted by Dr.
                        Beshero-Bondar; the battle-testing students contributed to them as they ran
                        into their own issues, and the final documents were proofread and heavily
                        edited by Michael Simons: Version for MacOS: <link xlink:href="https://github.com/newtfire/textAnalysis-Hub/blob/main/Installations/InstallNotes-Mac.md" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://github.com/newtfire/textAnalysis-Hub/blob/main/Installations/InstallNotes-Mac.md</link>
                        (2025). Version for Windows: <link xlink:href="https://github.com/newtfire/textAnalysis-Hub/blob/main/Installations/InstallNotes-Win.md" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://github.com/newtfire/textAnalysis-Hub/blob/main/Installations/InstallNotes-Win.md</link>
                        (2025).</para>
                </footnote> Our instructions also taught the students many things that would be
                useful to them in general as DIGIT majors, including how to install a package
                manager (<link xlink:href="https://brew.sh/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Homebrew</link> for MacOS or <link xlink:href="https://chocolatey.org/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Chocolatey</link> for Windows) for the
                purpose of installing OpenJDK,<footnote>
                    <para>Preliminary set of student-developed instructions for installing OpenJDK
                        via a package manager. Version for MacOS: <link xlink:href="https://github.com/newtfire/textAnalysis-Hub/blob/main/Installations/OpenJDK-mac.md" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://github.com/newtfire/textAnalysis-Hub/blob/main/Installations/OpenJDK-mac.md</link>
                        (2025). Version for Windows: <link xlink:href="https://github.com/newtfire/textAnalysis-Hub/blob/main/Installations/InstallNotes-Win.md" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://github.com/newtfire/textAnalysis-Hub/blob/main/Installations/InstallNotes-Win.md</link>
                        (2025).</para>
                </footnote> creating shell aliases and editing system dot-files,<footnote>
                    <para>Digit 210 assignment: <quote>Create Shell Aliases and System ‘Dot
                            Files.’</quote>
                        <link xlink:href="https://newtfire.org/courses/tutorials/command-line-aliases.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://newtfire.org/courses/tutorials/command-line-aliases.html</link>
                        (2025). Last accessed 2025-07-02.</para>
                </footnote> and <quote>smoke-testing</quote> their installations to ensure proper
                configuration.</para>
            <para>We are grateful to Dr. Birnbaum for joining us over Zoom for consultation sessions
                as we worked on this battle-testing phase, and we learned a lot from his whimsical
                sidenotes and extraordinary yet accessible knowledge. While individual members of
                the class might have been able to successfully prepare their environments, without
                our work ahead of time, things would likely have been very chaotic when Dr. Birnbaum
                arrived to guest-instruct our class. Additionally, this was a condensed experiment,
                as Dr. Birnbaum was allotted just three 50-minute class periods for teaching us
                both ixml and XProc. So, ensuring all students were ready to hit the ground running
                at the time of his guest appearance was crucial. He created a thorough lesson plan
                outlining this fast-tracked learning experience.<footnote>
                    <para>David J. Birnbaum, <quote>Lesson plan: Invisible XML and XProc</quote>
                        (hosted on <link xlink:href="http://dh.obdurodon.org/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Obdurodon.org</link>), <link xlink:href="http://dh.obdurodon.org/ixml-and-xproc-lesson-plan.xhtml" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://dh.obdurodon.org/ixml-and-xproc-lesson-plan.xhtml</link> (2025).
                        Last accessed 2025-07-02.</para>
                </footnote> His lesson plan shared very important introductory readings by Norm
                Tovey-Walsh and Martin Kraetke, serving as an orientation to ixml and XProc respectively.<footnote>
                    <para>Norm Tovey-Walsh, <quote>Writing Invisible XML grammars</quote>, <link xlink:href="https://www.xml.com/articles/2022/03/28/writing-invisible-xml-grammars/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://www.xml.com/articles/2022/03/28/writing-invisible-xml-grammars/</link>
                        (2022). Martin Kraetke, <quote>XProc 3.0 Tutorial</quote>, <link xlink:href="https://xporc.net/xproc-tutorial/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://xporc.net/xproc-tutorial/</link>. Last accessed
                        2025-07-02.</para>
                </footnote> He also introduced us to John Lumley’s ixml workbench as a helpful
                resource to practice using ixml.<footnote>
                    <para>John Lumley, <quote>jωiXML processor</quote> (an online resource for
                        easily processing text with an iXML grammar), <link xlink:href="https://johnlumley.github.io/jwiXML.xhtml" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://johnlumley.github.io/jwiXML.xhtml</link> (2024). Last accessed
                        2025-07-02.</para>
                </footnote></para>
        </section>
    </section>
    <section>
        <title>Invisible XML and the Music Analysis Project</title>
        <para>We hoped initially that students would be inspired to try out these technologies on
            their own projects. Some students did, and along the way we learned some unexpected
            things about their potential application, especially to the field of music encoding.</para>
        <section>
            <title>A Project on Chord Chart Analysis: What ixml Looks Like in a Large-Scale Student
                Project</title>
            <subtitle>The Optimally Motivated Student Experience</subtitle>
            <para>Michael again! Dr. Beshero-Bondar deemed me an <quote>optimally motivated
                    student</quote> because I did the work (relatively on time) and was able to
                grasp ixml and XProc well enough to take them further than most other students
                attempted. The big question for me was: <quote>Are these technologies worth
                    incorporating into my group’s semester project?</quote>
            </para>
            <para>The short-term answer, unfortunately, was <quote>no</quote>, because I held up my
                team while working on a task that I ultimately completed with regular expression
                search-and-replace operations in less than a quarter of the time I had spent on ixml.
                However, the big-picture answer may in fact be <quote>yes</quote>, because here I
                am, reflecting on my learning experience in a professional setting, which is a great
                honor. I’m walking away from the semester with a larger perspective on the potential
                of ixml and a greater hopefulness for completing and expanding the project.</para>

            <section>
                <title>Introducing the GretaVanZeppelin Project</title>
                <mediaobject>
                    <imageobject>
                        <imagedata format="png" fileref="../../../vol30/graphics/Beshero-Bondar01/Beshero-Bondar01-001.png" width="30%"/>
                    </imageobject>
                    <caption>
                        <para>GretaVanZeppelin Project Logo</para>
                    </caption>
                </mediaobject>
                <para>The <link xlink:href="https://newtfire.github.io/GretaVanZeppelin/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">GretaVanZeppelin Project</link> is an analysis of two musical artists, Greta
                    Van Fleet and Led Zeppelin, using collections of chord charts encoded in XML and
                    analyzed with various Python tools. While the project is still in what could be
                    considered its early stages, there were many valuable discoveries and triumphs
                    made along the way, as well as a strong setup for future research possibilities.
                    One of the discoveries was finding more efficient ways to encode and analyze
                    chord charts. The main triumph was being able to implement ixml and XProc on a
                    fairly large scale.</para>
                <para>The project goal for my team was to analyze the chord charts of two artists
                    and determine if there were any meaningful comparisons between them, musically
                    or lyrically. <emphasis role="ital">Spoiler alert: our findings were
                        inconclusive!</emphasis> The data we extracted from the documents resulted
                    in some nice data visualizations, but unfortunately, we didn’t take it far
                    enough to draw any firm conclusions about the two chosen artists. So, for
                    me, the main takeaway from this project was, unexpectedly, the process by which
                    we obtained the data, including, of course, ixml and an XProc pipeline.</para>
                <para>We (the project developers) got a glimpse of the project’s future during
                    its showcase at our Digit program’s end-of-semester presentation day, called
                            <quote><link xlink:href="https://digit-psb.github.io/DIGIT-Works/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">DIGIT
                            Works</link></quote>. We received valuable feedback from peers and
                    industry professionals about incorporating additional data into the project to
                    achieve a more thorough comparison, specifically, being able to analyze the
                    artists’ influences and genre blending, and using our analysis processes to
                    study more than just two artists.</para>
                <para>My main task for this project was the transformation of the raw text into
                    something the rest of the team could process with various Python tools and
                    display on our website. This became my task largely because of my role as a
                    battle-tester of the new markup tools. In turn, my battle-tester status
                    motivated me further to include ixml and XProc in our project because I felt a
                    strong desire to implement what I had learned.</para>
                <para>Before discussing my specific processes and work, I feel it is important to
                    share a brief overview of our preliminary research into music encoding, as well
                    as what further research revealed after the conclusion of the semester
                    project.</para>
            </section>
            <section>
                <title>A Brief Background on Music Encoding</title>
                <subtitle>As well as some things we wished we knew when we started the
                    project</subtitle>
                <para>None of us knew where to begin with the markup, so Dr. Beshero-Bondar
                    suggested that our team research the <link xlink:href="https://music-encoding.org" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Music Encoding Initiative
                        (MEI)</link>. We discovered that <quote>the MEI is a community-driven,
                        open-source effort to define a system for encoding musical documents in a
                        machine-readable structure</quote>,<footnote>
                        <para>Music Encoding Initiative, <quote>An introduction to MEI</quote>,
                                <link xlink:href="https://music-encoding.org/about/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://music-encoding.org/about/</link> (2025). Last accessed
                            2025-07-02.</para>
                    </footnote> but perhaps we should have kept looking for other options for
                    encoding chord charts at that time. In our research, we saw only examples of
                    encoded sheet music, in which every note is represented on musical staffs.
                    Sheet music is not well suited to rock music, which, in my experience, is built
                    on repeated grooves and phrases that are more effectively communicated
                    verbally or by ear. A chord chart is a sufficient guide: it outlines the
                    structure of the song and shows which chords to play under which lyrics. This
                    means that chord charts purposely leave a lot of information about the song
                    implicit, whereas what we saw of the MEI was focused on explicitly encoding
                    every note of a piece of music. That, combined with the limited time we had for
                    the project, steered us away from using the MEI guidelines and schema.</para>
                <!-- Image to compare a piece of sheet music to a chord chart. -->
                <para>Our Digit program has one previous project that used the MEI: the Locke Anthology Project.<footnote>
                        <para>Locke Anthology Project, <link xlink:href="https://newtfire.github.io/locke-anthology2.0/music.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://newtfire.github.io/locke-anthology2.0/music.html</link>
                            (2024). Last accessed 2025-07-02.</para>
                    </footnote> This project conformed to MEI and <link xlink:href="https://tei-c.org/release/doc/tei-p5-doc/en/html/index.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">TEI
                        Guidelines</link> to preserve a digital version of the poetry and music in
                    Alain Locke’s anthology <emphasis role="ital">The New Negro: An
                        Interpretation</emphasis> (1925). The work includes short snippets of music
                    between its stories and poems, and the team used the MEI’s structure to encode
                    those snippets so that they could be converted to MIDI<footnote>
                        <para>For more information on MIDI, see <link xlink:href="https://midi.org/about-midi-part-1overview" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://midi.org/about-midi-part-1overview</link> and <link xlink:href="https://midi.org/specs" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://midi.org/specs</link>.
                            Last accessed 2025-07-02.</para>
                    </footnote> and played back as audio files on the website. This use of the MEI
                    served playback rather than analysis, which was the GretaVanZeppelin Project’s
                    focus. So, while very interesting, the Locke Anthology Project was not a model
                    we could draw on for our own project.</para>
                <para>So, how do we approach a project that uses only chord notation? There are
                    other formats for digitizing music, most notably <link xlink:href="https://www.musicxml.com/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">MusicXML</link>, which we discovered
                    only after the semester ended. This markup language is used and supported in
                    many popular music notation and recording programs, as well as other
                    music-related software.<footnote>
                        <para>MusicXML, <quote>MusicXML Software</quote>, <link xlink:href="https://www.musicxml.com/software/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://www.musicxml.com/software/</link>. For more information on
                            how MusicXML compares to MEI, see <quote><link xlink:href="https://opensheetmusicdisplay.org/blog/blog-music-xml-introduction-comparison/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Music XML Introduction and Comparison</link></quote>
                            (OpenSheetMusicDisplay.org, 2025). Last accessed 2025-07-02.</para>
                    </footnote> After working through its tutorial,<footnote>
                        <para><link xlink:href="https://www.w3.org/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">W3C</link>, <quote>MusicXML 4.0
                                Tutorial: Chord Symbols and Diagrams</quote>, <link xlink:href="https://www.w3.org/2021/06/musicxml40/tutorial/chord-symbols-and-diagrams/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://www.w3.org/2021/06/musicxml40/tutorial/chord-symbols-and-diagrams/</link>
                            (2021). Last accessed 2025-07-02.</para>
                    </footnote> I think we <emphasis role="ital">could</emphasis> have used it, but
                    it would still have involved a steep and time-consuming learning
                    curve.</para>
                <para>We discovered <link xlink:href="https://www.chordpro.org/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">ChordPro</link> only after the project’s development. I actually knew
                    that ChordPro existed as a way to write chord charts, but only in the context of
                    my church’s music planning software.<footnote>
                        <para><emphasis role="ital">Planning Center Services</emphasis>, the
                            industry-standard program for planning church worship services that
                            includes the ability to easily make and format chord charts using
                            ChordPro. See <quote><link xlink:href="https://pcoservices.zendesk.com/hc/en-us/articles/204262594-Preventing-charts-from-shifting-using-Chord-Pro" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Preventing charts from shifting using Chord Pro</link></quote>
                            (updated 2025) to learn more about how they implement ChordPro, or watch
                            someone utilize ChordPro within Planning Center: <link xlink:href="https://www.youtube.com/watch?v=GS4GIw_0LQk" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://www.youtube.com/watch?v=GS4GIw_0LQk</link> (2018). Last
                            accessed 2025-07-02.</para>
                    </footnote> I was unaware of the scope of ChordPro as an actively developed
                    open-source program<footnote>
                        <para>ChordPro on GitHub, <link xlink:href="https://github.com/ChordPro/chordpro" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://github.com/ChordPro/chordpro</link>. Last accessed
                            2025-07-02.</para>
                    </footnote> aimed at the markup of chord charts. This could very well be the
                    future of the GretaVanZeppelin project and could also more deeply incorporate
                    ixml, but I’ll discuss that after I’ve explained what the project already
                    accomplished. <emphasis role="ital">(It sure would have been nice to know about
                        ChordPro back in February 2025!)</emphasis></para>
                <para>All of this to say, we ultimately decided to make our own markup specifically
                    for our analysis.</para>
            </section>
            <section>
                <title>Document Analysis and Our Unique Markup</title>
                <para>I began the transformation of the raw text by sketching out what the process
                    would look like as a pipeline. After establishing a basic XML structure, I knew
                    that I would first need to find the two blank lines that separated each section
                    of a song file (Verse, Chorus, etc.) from the next. Then, I would need to
                    separate chords from lyrics within those sections. Finally, some additional
                    processing would be needed to identify individual chords for counting purposes
                    and to extract either just the chords or just the lyrics for analysis.</para>
                <para>The source of our files was <link xlink:href="https://www.ultimate-guitar.com/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Ultimate Guitar</link>. Because
                    it is a community resource, the quality of its proofreading is uncertain. It
                    became apparent that lines were not divided logically, as they would have been
                    if I or a professional had made the chord charts. Instead of being divided by
                    musical phrases or chord progressions, the lines seemed to be divided simply by
                    what looked best on the page. This limited our ability to analyze actual chord
                    progressions, so we focused instead on chord usage per song and per
                    artist.</para>
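                <para>Counting chord usage per song or per artist needs little more than a tally of
                    chord tokens. The Python sketch below is an illustration, not the project’s
                    actual analysis code: it assumes the chord lines have already been separated
                    from the lyrics, and the helper name <code>chord_usage</code> is hypothetical.
                    The sample input is the chord lines of the <quote>Flower Power</quote> chorus
                    excerpted below.</para>
```python
from collections import Counter


def chord_usage(chord_lines):
    """Tally the chords in a list of chord lines (hypothetical helper).

    Returns (chord, count, share) tuples, most frequent first; chords with
    equal counts keep the order in which they first appear.
    """
    counts = Counter(tok for line in chord_lines for tok in line.split())
    total = sum(counts.values())
    return [(chord, n, round(n / total, 3)) for chord, n in counts.most_common()]


# Chord lines of the "Flower Power" chorus, spacing as in the raw text.
chorus = [
    "A",
    "D",
    "A                   D",
    "                   A",
    "D",
    "        A                             D",
    "                         F    G",
]

print(chord_usage(chorus))
```
                <para>Comparing two artists would then mean running the same tally over every song
                    by each artist and comparing the resulting shares.</para>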
                <para>An example of this inconsistent formatting is shown below in an excerpt from
                    Greta Van Fleet’s song <quote><link xlink:href="https://github.com/newtfire/GretaVanZeppelin/blob/d4fc47decd73de3fc04f3d394efa9563be737f8e/pipeline/phase-1/raw-text/Greta/fromTheFires/song-03.txt" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Flower Power</link></quote>. Each chord is one measure long, so a
                    logical way to divide the lines of lyrics and chords would be to have each line
                    correspond to one measure. The original formatting, however, confusingly makes
                    the song seem more complex than that.</para>
                <figure>
                    <title>Original Sample</title>
<programlisting xml:space="preserve">
[Chorus]
A
Turn tonight, firelight
D
Star shines in her eye
A                   D
 Makes me feel like I’m alive
                   A
She’s outta sight, yeah
D
Aw yeah
        A                             D
She’s alright, she’s alright, she’s alright
                         F    G
She’s outta sight, outta sight</programlisting>
                </figure>
                <figure>
                    <title>Ideally Edited Sample (for the purpose of this example)</title>
<programlisting xml:space="preserve">
[Chorus]
A
Turn tonight, firelight
D
Star shines in her eye
A
 Makes me feel like 
D
I’m alive, she’s outta sight
A
Yeah
D
Aw yeah
           A
She’s al - right, she’s alright
           D
She’s al - right, she’s outta sight
      F         G
Outta sight</programlisting>
                </figure>
                <para>One more important note about the raw text: the chords are placed directly
                    over the word, or syllable of the word (e.g., al - <emphasis role="bital">right</emphasis>), that is being sung when that chord is played. In the
                    above example, the chords were placed properly above the words; the easiest way
                    for me to check this is to listen to the song and pay attention to when the
                    chord changes. In some of the chord charts, I noticed that less care was taken
                    to preserve the placement of the chords. For this reason, as well as the
                    sometimes confusing line divisions, I decided that we would not try to preserve
                    the chords’ placement relative to the lyrics. We later discovered a way to
                    accomplish this (see the section below on ChordPro), but we see little benefit
                    to it in this project beyond displaying the final XML output accurately and
                    attractively. The objective of the project was not to improve the display of
                    the chord charts but to analyze them.</para>
                <para>There is, however, something to be said for noting the duration of the chords
                    in the markup. My preliminary research into the <link xlink:href="https://music-encoding.org/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">MEI</link> showed that it uses
                    a <code>@dur</code> attribute on its <code>&lt;chord&gt;</code> elements to
                    indicate the musical duration of a chord. Such an attribute would let our Python
                    analysis tools analyze chord progressions more completely without our having to
                    preserve the spacing between the chords and make the software understand that
                    each line is, say, one bar. This, however, was beyond the scope of this project:
                    unless an AI tool could accurately listen to music and recognize chord changes,
                    the durations would have to be marked by hand for every chord in every
                    song.</para>
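                <para>To illustrate what a duration attribute would add, the Python sketch below
                    sums how long each chord is held rather than how often it appears. It assumes
                    that the project’s own <code>&lt;chord&gt;</code> elements carried a
                    hypothetical <code>@dur</code> attribute following the MEI convention
                    (<code>1</code> for a whole note, <code>2</code> for a half note,
                    <code>4</code> for a quarter note). Dotted durations and MEI’s other duration
                    values are ignored, the function name <code>duration_totals</code> is
                    hypothetical, and the durations in the example are invented.</para>
```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from fractions import Fraction


def duration_totals(root):
    """Sum how long each chord is held, in whole notes (hypothetical helper).

    Assumes a hypothetical MEI-style @dur on each chord element:
    "1" = whole note, "2" = half, "4" = quarter. Dots and other
    MEI duration values are not handled in this sketch.
    """
    totals = defaultdict(Fraction)  # Fraction() is zero
    for chord in root.iter("chord"):
        totals[chord.text] += Fraction(1, int(chord.get("dur")))
    return dict(totals)


# Invented example: a chord line whose last two chords last half a bar each.
line = ET.Element("chordLine")
for name, dur in [("A", "1"), ("D", "1"), ("F", "2"), ("G", "2")]:
    ET.SubElement(line, "chord", dur=dur).text = name

totals = duration_totals(line)
print(totals)
```
                <para>With durations recorded, F and G would weigh half as much as A and D in an
                    analysis, even though each appears once in this line.</para>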
                <para>All of this raises the question: <quote>Is chord notation declaring itself to
                        be markup in its own right?</quote> A form of music that many musicians find
                    easy to read is harder for a computer to understand, because chord charts
                    purposely leave so much information unmarked. Is it worth encoding chord charts
                    more deliberately to support research and analysis? MEI preserves music well in
                    academia; MusicXML can be converted to MIDI to be played back as music; and
                    ChordPro lets humans write chord charts that place the chords accurately with
                    the words to support performers. Can chord charts support a deeper analysis and
                    comparison of music? Our experience with ixml suggests there is hope for this
                    work!</para>
            </section>
            <section>
                <title>Implementation and Reflection on ixml in the Project</title>
                <para>As previously mentioned, this project began with the MEI in mind, and the
                    ixml reflects that. Below are a song file (as we saw it in the oXygen XML
                    Editor, with space and newline characters made visible), our full ixml grammar,
                    and the XML output produced by processing the song with that grammar:</para>
                <figure>
                    <title>Raw Text Chord Chart for Greta Van Fleet’s <quote>Flower
                        Power</quote></title>
<programlisting xml:space="preserve">
Flower·Power↵
From·the·Fires↵
Greta·Van·Fleet↵
A↵
↵
[Intro]↵
A·D·A·D·A·D·A·D↵
·↵
·↵
[Verse·1]↵
A····················D↵
·She·is·a·lady,·comes·from·all·around↵
A····························D↵
·She's·many·places,·but·she's·homeward·bound↵
············A↵
And·now·she·walks·kinda·funny↵
I·think·she·knows↵
D↵
Day·by·day·by·day↵
Our·love·grows↵
A···················D↵
·She's·a·lantern·in·the·night↵
She's·outta·sight↵
·↵
·↵
[Pre-Chorus]↵
A·······················D↵
·Ma·ma·ma·ma·ma·ma·ma·ma·ma·ma·ma·ma↵
A·······························D↵
Ma·ma·ma·ma·ma·ma·ma·ma·ma·ma·ma·ma↵
Ma·ma↵
Hey↵
·↵
·↵
[Chorus]↵
A↵
Turn·tonight,·firelight↵
D↵
Star·shines·in·her·eye↵
A···················D↵
·Makes·me·feel·like·I'm·alive↵
···················A↵
She's·outta·sight,·yeah↵
D↵
Aw·yeah↵
········A·····························D↵
She's·alright,·she's·alright,·she's·alright↵
·························F····G↵
She's·outta·sight,·outta·sight↵
·↵
·↵
[Bridge]↵
A·D·A·D↵
·↵
·↵
[Verse·2]↵
A···························D↵
·Electric·gold·our·love·with·tender·care↵
A·····················D↵
·Hills·of·satin·grass·and·maidens·fair↵
········A↵
Now·she·rides·through·the·night↵
On·a·silver·storm↵
D↵
Sword·in·hand↵
Our·fate's·torn↵
A···················D↵
·She's·a·sparrow·of·the·dawn↵
Our·love·is·born↵
·↵
·↵
[Pre-Chorus]↵
A·······················D↵
·Ma·ma·ma·ma·ma·ma·ma·ma·ma·ma·ma·ma↵
A·······························D↵
Ma·ma·ma·ma·ma·ma·ma·ma·ma·ma·ma·ma↵
Ma·ma↵
Hey↵
·↵
·↵
[Chorus]↵
A↵
Turn·tonight,·firelight↵
D↵
Star·shines·in·her·eye↵
A···················D↵
·Makes·me·feel·like·I'm·alive↵
···················A↵
She's·outta·sight,·yeah↵
D↵
Aw·yeah↵
········A·····························D↵
She's·alright,·she's·alright,·she's·alright↵
·························F····G↵
She's·outta·sight,·outta·sight↵
·↵
·↵
[Solo]↵
A···D↵
Yeah↵
A↵
Oh·yeah↵
D·A↵
···Oh·yeah↵
Oh·yeah↵
·······D·A·D·A······D↵
Oh·yeah·······papapa↵
A·D↵
···Oh·yeah↵
·↵
·↵
[Verse·3]↵
A···········G···········D↵
·As·the·days·pass·by·my·mind↵
A··············G↵
·Are·the·wrong,·the·right↵
·······D↵
You·are·my·sunshine↵
A················G··········D↵
·And·as·the·night·begins·to·die↵
A··············G···················D↵
·We·are·the·morning·birds·that·sing·against·the·sky↵
·↵
·↵
[Interlude]↵
A·G·D·A·G·D↵
A·G·D·A·G·D↵
A·G·D·A·G·D↵
A·G·D·A·G·D↵
A·G·D·A·G·D↵
A·G·D·A·G·D↵
A·G·D·A·G·D↵
A·G↵</programlisting>
                </figure>
                <figure>
                    <title>GretaVanZeppelin Project’s ixml</title>
<programlisting language="ixml" xml:space="preserve">
mei: music.
music: title, newline, album, newline, artist, newline, key, newline, newline*, section++newline.
title: ~[#d;#a]+.
album: ~[#d;#a]+.
artist: ~[#d;#a]+.
key: ~[#d;#a]+.
section: type, mdiv.
@type: -"[", ~[#22]+, -"]".
mdiv: ~[#22]+.
-newline: (#d?, #a).
-space: " ".</programlisting>
                </figure>
                <figure>
                    <title>ixml Output File for <quote>Flower Power</quote></title>
<programlisting language="xml" xml:space="preserve">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;mei ixml:state="ambiguous" xmlns:ixml="http://invisiblexml.org/NS"&gt;&lt;music&gt;
&lt;title&gt;Flower Power&lt;/title&gt;
&lt;album&gt;From the Fires&lt;/album&gt;
&lt;artist&gt;Greta Van Fleet&lt;/artist&gt;
&lt;key&gt;A&lt;/key&gt;

&lt;section type="Intro"&gt;&lt;mdiv&gt;
A D A D A D A D
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Verse 1"&gt;&lt;mdiv&gt;
A                    D
 She is a lady, comes from all around
A                            D
 She's many places, but she's homeward bound
            A
And now she walks kinda funny
I think she knows
D
Day by day by day
Our love grows
A                   D
 She's a lantern in the night
She's outta sight
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Pre-Chorus"&gt;&lt;mdiv&gt;
A                       D
 Ma ma ma ma ma ma ma ma ma ma ma ma
A                               D
Ma ma ma ma ma ma ma ma ma ma ma ma
Ma ma
Hey
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Chorus"&gt;&lt;mdiv&gt;
A
Turn tonight, firelight
D
Star shines in her eye
A                   D
 Makes me feel like I'm alive
                   A
She's outta sight, yeah
D
Aw yeah
        A                             D
She's alright, she's alright, she's alright
                         F    G
She's outta sight, outta sight
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Bridge"&gt;&lt;mdiv&gt;
A D A D
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Verse 2"&gt;&lt;mdiv&gt;
A                           D
 Electric gold our love with tender care
A                     D
 Hills of satin grass and maidens fair
        A
Now she rides through the night
On a silver storm
D
Sword in hand
Our fate's torn
A                   D
 She's a sparrow of the dawn
Our love is born
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Pre-Chorus"&gt;&lt;mdiv&gt;
A                       D
 Ma ma ma ma ma ma ma ma ma ma ma ma
A                               D
Ma ma ma ma ma ma ma ma ma ma ma ma
Ma ma
Hey
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Chorus"&gt;&lt;mdiv&gt;
A
Turn tonight, firelight
D
Star shines in her eye
A                   D
 Makes me feel like I'm alive
                   A
She's outta sight, yeah
D
Aw yeah
        A                             D
She's alright, she's alright, she's alright
                         F    G
She's outta sight, outta sight
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Solo"&gt;&lt;mdiv&gt;
A   D
Yeah
A
Oh yeah
D A
   Oh yeah
Oh yeah
       D A D A      D
Oh yeah       papapa
A D
   Oh yeah
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Verse 3"&gt;&lt;mdiv&gt;
A           G           D
 As the days pass by my mind
A              G
 Are the wrong, the right
       D
You are my sunshine
A                G          D
 And as the night begins to die
A              G                   D
 We are the morning birds that sing against the sky
 
 &lt;/mdiv&gt;&lt;/section&gt;
&lt;section type="Interlude"&gt;&lt;mdiv&gt;
A G D A G D
A G D A G D
A G D A G D
A G D A G D
A G D A G D
A G D A G D
A G D A G D
A G
&lt;/mdiv&gt;&lt;/section&gt;
&lt;/music&gt;&lt;/mei&gt;</programlisting>
                </figure>
                <para>The way I hear and see the ixml in my head is, <quote>[what’s on the left side
                        of the colon] contains [what’s on the right side of the colon] followed by
                        [anything else, separated by a comma]</quote>.</para>
                <para>Beginning at the first line, the root element <code>&lt;mei&gt;</code> and its
                    child <code>&lt;music&gt;</code> are named that way simply so that the output
                    could be adapted to the MEI guidelines in the future. Of course, we now realize
                    that this may not be all that beneficial. All subsequent elements are contained
                    in <code>&lt;music&gt;</code>, which acts as a secondary root. Four lines of
                    metadata (title, album, artist, and key) identify the song. Our team added the
                    <code>&lt;key&gt;</code> metadata to the original documents so that the chords
                    could eventually be expressed in a number system for more meaningful
                    analysis.<footnote><para>See <quote>Converting Chords to Nashville Numbers</quote> on the
                            GretaVanZeppelin Project’s Methods page for more information on this
                            numbering system: <link xlink:href="https://newtfire.github.io/GretaVanZeppelin/methods.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://newtfire.github.io/GretaVanZeppelin/methods.html</link>
                            (2025).</para>
                    </footnote></para>
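                <para>Once a song’s key is recorded, converting a chord to a Nashville number comes
                    down to measuring how many semitones its root lies above the key’s tonic. The
                    Python sketch below is a simplified illustration, not the method described on
                    the project’s Methods page: it assumes a major key given by its tonic (for
                    example <code>A</code>), spells out-of-key degrees with flats (so G in the key
                    of A becomes b7), and carries the chord quality (m, 7, maj7, and so on) over
                    unchanged. The names <code>split_root</code> and <code>nashville</code> are
                    hypothetical.</para>
```python
import re

# Pitch classes of the natural note names, counting semitones up from C.
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Nashville degree for each semitone distance above the tonic of a major key.
DEGREES = ["1", "b2", "2", "b3", "3", "4", "b5", "5", "b6", "6", "b7", "7"]
# Root letter, optional sharp or flat, then everything else (the chord quality).
ROOT = re.compile(r"([A-G])([#b]?)(.*)")


def split_root(symbol):
    """Split 'F#m7' into its root pitch class (6) and the rest ('m7')."""
    m = ROOT.fullmatch(symbol)
    if m is None:
        raise ValueError(f"not a chord symbol: {symbol!r}")
    letter, accidental, rest = m.groups()
    shift = {"#": 1, "b": -1, "": 0}[accidental]
    return (NATURALS[letter] + shift) % 12, rest


def nashville(chord, key):
    """Convert a chord symbol to a Nashville number, assuming a major key."""
    tonic, _ = split_root(key)
    main, _, bass = chord.partition("/")
    root, quality = split_root(main)
    number = DEGREES[(root - tonic) % 12] + quality
    if bass:  # slash chord: convert the bass note as well
        bass_root, _ = split_root(bass)
        number += "/" + DEGREES[(bass_root - tonic) % 12]
    return number


# The chords of "Flower Power", whose <key> is A.
print([nashville(c, "A") for c in ["A", "D", "F", "G"]])
```
                <para>Expressed this way, songs in different keys can be compared directly, since a
                    I–IV movement reads as <code>1</code> to <code>4</code> whatever the key.</para>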
                <para>Next, each <code>&lt;section&gt;</code> element holds a different section of
                    the song (Verse, Chorus, etc.). These section headings are marked by square
                        brackets <code>[]</code> in the original documents. Although developing the
                    ixml was difficult for a beginner, nothing was too challenging until the first
                    major setback: the grammar did not recognize the ends of sections. There were no
                    square brackets within the lyrics or the chords themselves, only in the section
                    headings, so, logically, a section ends and a new section begins when an opening
                    square bracket <code>[</code> is found. This is a concept we practiced in class
                    using regular expressions in a search-and-replace context. However, it did not
                    translate one-to-one to ixml, because we also had to account for the two
                    (rather, <emphasis role="ital">most of the time</emphasis> two) blank lines
                    between sections. <link xlink:href="http://dh.obdurodon.org/ixml-ambiguity.xhtml" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Dr. Birnbaum’s
                        study of ambiguity</link> in ixml helped us solve this problem of parsing
                    the sections, specifically his notes on the double plus sign
                    (<code>++</code>), which we used to define the separation between
                    sections.<footnote>
                        <para>David J. Birnbaum, <quote>Invisible XML and ambiguity</quote>, <link xlink:href="http://dh.obdurodon.org/ixml-ambiguity.xhtml" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://dh.obdurodon.org/ixml-ambiguity.xhtml</link> (2025). Last
                            accessed 2025-07-02.</para>
                    </footnote></para>
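                <para>The underlying idea, that a section runs from one bracketed heading to the
                    next, can also be sketched in Python with a regular expression. This
                    illustrates the logic only and is not a replacement for the ixml grammar; the
                    function name <code>split_sections</code> is hypothetical. Because each section
                    is cut off at the next heading rather than after a fixed number of blank lines,
                    the sketch tolerates files where the usual two blank lines are missing or extra,
                    and it discards the whitespace-only lines in between.</para>
```python
import re

# A heading is a whole line such as "[Verse 1]" (trailing spaces allowed).
HEADING = re.compile(r"^\[([^\]\n]+)\][ \t]*$", re.MULTILINE)


def split_sections(text):
    """Return (section type, lines) pairs, one per bracketed heading.

    Each section runs from its heading to the next heading (or the end of the
    file), so the number of blank lines between sections does not matter.
    Lines before the first heading (the metadata) are ignored. Blank and
    whitespace-only lines are dropped; other lines keep their leading spaces,
    since those position the chords.
    """
    headings = list(HEADING.finditer(text))
    sections = []
    for here, nxt in zip(headings, headings[1:] + [None]):
        end = len(text) if nxt is None else nxt.start()
        body = text[here.end():end]
        lines = [ln.rstrip() for ln in body.splitlines() if ln.strip()]
        sections.append((here.group(1), lines))
    return sections


# Abridged "Flower Power"; note that no blank line precedes [Chorus] here.
song = (
    "Flower Power\nFrom the Fires\nGreta Van Fleet\nA\n\n"
    "[Intro]\nA D A D A D A D\n \n \n"
    "[Verse 1]\nA                    D\n She is a lady, comes from all around\n"
    "[Chorus]\nA\nTurn tonight, firelight\n"
)
for kind, lines in split_sections(song):
    print(kind, lines)
```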
                <para>We did, however, encounter an ambiguity problem that was beyond our capacity
                    to solve in ixml. This involved differentiating the multi-character chord
                    symbols from lyrics. These multi-character chords appear from time to time in both
                    artists’ songs (as well as throughout Ultimate Guitar), and they provide
                    musicians who read chord charts with valuable information called
                        <quote>extensions</quote> that make the chords more interesting and more
                    accurately representative of how the artist originally played them. Below is a
                    famous example of some of these chords:</para>
                <figure>
                    <title>Snippet from Led Zeppelin’s <quote>Stairway to Heaven</quote></title>
<programlisting xml:space="preserve">
[Verse 1]

There's
  Am         Ammaj9
a lady who's sure  
         Am7         D/F#
All that glitters is gold
          Fmaj7                G  Am
And she's buying a stairway to heaven.</programlisting>
                    <para>This example contains a good variety of chord extensions: a lowercase
                            <quote>m</quote> for minor, <quote>mmaj9</quote> for a minor chord with a
                        major seventh and a ninth, and <quote>D/F#</quote> (a slash chord) to
                        indicate a D chord with F# as its bass note.</para>
                </figure>
                <para>Had every chord been a single letter, the grammar would have been manageable,
                    but the chords were too complex for us to work out how to represent them in an
                    ixml grammar. In addition, the chord charts do not strictly alternate one line of
                    chords with one line of lyrics. As the raw text of Greta Van Fleet’s
                        <quote>Flower Power</quote> above shows, some sections contain only chords,
                    and lyrics may also appear with no chords above them. To solve this problem,
                    Dr. Beshero-Bondar developed a <quote>monstrous Regex line</quote>, which we
                    applied with <code>xsl:analyze-string</code> in a single XSLT template that our
                    XProc pipeline runs after the ixml stage:</para>
<programlisting xml:space="preserve">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0"&gt;
    &lt;xsl:mode on-no-match="shallow-copy"/&gt;
    &lt;xsl:output method="xml" indent="yes"/&gt;
    &lt;xsl:template match="/"&gt;
        &lt;xsl:apply-templates/&gt;
    &lt;/xsl:template&gt;
    
    &lt;xsl:template match="mdiv"&gt;
        &lt;xsl:analyze-string select="." 
             regex="\n(\s*([A-Z][#ba-z/0-9]*) *([A-Z][#ba-z/0-9]*)?)*\n"&gt;
            &lt;xsl:matching-substring&gt;
                &lt;chordLine&gt;
                    &lt;xsl:for-each select="tokenize(., '\s+')"&gt;
                        &lt;xsl:if test="current() ! matches(., '\S')"&gt;
                            &lt;chord&gt;&lt;xsl:value-of select="current()"/&gt;&lt;/chord&gt;
                        &lt;/xsl:if&gt;
                    &lt;/xsl:for-each&gt;
                &lt;/chordLine&gt;
            &lt;/xsl:matching-substring&gt;
            &lt;xsl:non-matching-substring&gt;
                &lt;lyrics&gt;
                    &lt;xsl:value-of select=". ! normalize-space()"/&gt;
                &lt;/lyrics&gt;
            &lt;/xsl:non-matching-substring&gt;
        &lt;/xsl:analyze-string&gt;
    &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</programlisting>
                <para>This worked because a chord line consists of one or more capitalized
                        <quote>constructions</quote> (chords) in a row, each of which can combine
                    letters and numbers. This single XSLT template was all we needed to complete our
                    transformation into full XML, and it represented a successful markup pipeline in
                    our project. I then added some attributes to the chords for further processing.
                    To learn more about this, or about the entire process including the XProc
                    pipeline, see my <link xlink:href="https://newtfire.github.io/GretaVanZeppelin/methods.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest"><quote>Methods</quote> page</link>, which was perhaps our most
                    significant development from this semester’s project.</para>
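                <para>The same chord-line test can also be expressed token by token. The Python
                    sketch below is a hypothetical alternative, not the regular expression the
                    project used: it treats a line as a chord line only when every
                    whitespace-separated token looks like a chord symbol, meaning a root from A to
                    G, an optional sharp or flat, any run of quality and extension pieces (m, maj,
                    sus, add, digits, and so on), and an optional slash bass note. The chord pattern
                    is an assumption, and it would still misread a lyric line consisting only of,
                    say, the word <quote>A</quote>.</para>
```python
import re

# One chord symbol (an assumed pattern, not the project's regex): root A-G,
# optional sharp or flat, any run of quality/extension pieces, optional slash bass.
CHORD = re.compile(r"[A-G][#b]?(?:maj|min|dim|aug|sus|add|m|M|[0-9#b+()])*(?:/[A-G][#b]?)?")


def is_chord_line(line):
    """True when the line has at least one token and every token is a chord symbol."""
    tokens = line.split()
    return bool(tokens) and all(CHORD.fullmatch(t) for t in tokens)


def classify(lines):
    """Label each line of a section as 'chords' or 'lyrics'."""
    return [("chords" if is_chord_line(ln) else "lyrics", ln.strip()) for ln in lines]


# Lines from the "Stairway to Heaven" snippet above.
stairway = [
    "There's",
    "  Am         Ammaj9",
    "a lady who's sure",
    "         Am7         D/F#",
    "All that glitters is gold",
]
for kind, text in classify(stairway):
    print(kind, "|", text)
```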
            </section>
            <section>
                <title>What if We Had Used ChordPro? The Potential of ixml</title>
                <para>That last step of separating lyrics and chord lines with XSLT and regular
                    expressions <emphasis role="ital">may</emphasis> have been unnecessary had we
                    used ChordPro. Created in 1991, ChordPro predates MEI and MusicXML by almost a decade.<footnote>
                        <para><quote>History of Chord Pro</quote>, <link xlink:href="https://www.chordpro.org/chordpro/chordpro-history/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://www.chordpro.org/chordpro/chordpro-history/</link>. The
                            authors find it disappointing that there seems to be no mention of
                            ChordPro in the documentation of either MEI or MusicXML. (See also
                                    <quote><link xlink:href="https://music-encoding.org/about" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">An
                                    introduction to MEI</link></quote> for a history of MEI and
                                    <quote><link xlink:href="https://opensheetmusicdisplay.org/blog/blog-music-xml-introduction-comparison" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">Music XML Introduction and comparison</link></quote> for a
                            history of MusicXML.)</para>
                    </footnote> It began as a chord notation system and has become a fully
                    functional program for creating chord charts, using Perl to produce output.<footnote>
                        <para>A Perl module file in ChordPro’s library for producing HTML chord chart
                            output, <link xlink:href="https://github.com/ChordPro/chordpro/blob/master/lib/ChordPro/Output/HTML.pm" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">https://github.com/ChordPro/chordpro/blob/master/lib/ChordPro/Output/HTML.pm</link>.
                            Last accessed 2025-07-02.</para>
                    </footnote> Unlike conventional chord charts, which place the chords on a
                    separate line above the lyrics, ChordPro places chords inline, in square
                    brackets, directly in front of the word (or even the syllable) they belong to,
                    so that their position is preserved regardless of font or spacing. Because of
                    this inline placement, the chords would remain properly positioned within the
                    lyrics after the text has been converted to markup.</para>
                <figure>
                    <title>Sample of a Song Written in ChordPro</title>
<programlisting xml:space="preserve">
Flower Power
From the Fires
Greta Van Fleet
A

{start_of_intro}
[A] [D][A][D][A][D][A][D]
{end_of_intro}


{start_of_verse}
[A] She is a lady, comes from[D]all around
[A] She's many places, but she's[Dmaj7add9#13]homeward bound
{end_of_verse}</programlisting>
                    <para>We altered this small example slightly from the original text for
                        debugging purposes: it includes a chord with many extensions and marks the
                        end of the verse section.</para>
                </figure>
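                <para>To illustrate why inline placement keeps chords aligned, the following Python
                    sketch (our own illustration, not part of ChordPro’s Perl implementation;
                        <code>render</code> is a hypothetical name) turns one ChordPro line back
                    into the familiar two-line chart, writing each bracketed chord above the column
                    where the following lyric text begins.</para>
<programlisting xml:space="preserve">
import re

# A chord in ChordPro inline notation: "[", the chord name, "]".
INLINE_CHORD = re.compile(r"\[([^\]]+)\]")


def render(chordpro_line: str) -> tuple[str, str]:
    """Turn one ChordPro line into a (chord_row, lyric_row) pair.

    Each chord is written above the column where the lyric text that
    follows it begins. When chords would touch, the later one is pushed
    one space to the right, so alignment is approximate in that case.
    """
    chord_row = ""
    lyric_row = ""
    last = 0  # end of the previous bracketed chord in the source line
    for match in INLINE_CHORD.finditer(chordpro_line):
        lyric_row += chordpro_line[last:match.start()]
        column = len(lyric_row)
        if chord_row:
            # Keep at least one space between consecutive chord names.
            column = max(column, len(chord_row) + 1)
        chord_row = chord_row.ljust(column) + match.group(1)
        last = match.end()
    lyric_row += chordpro_line[last:]
    return chord_row, lyric_row


chords, lyrics = render("[A] She is a lady, comes from[D]all around")
print(chords)
print(lyrics)
</programlisting>
                <para>Because the brackets are simply removed, the lyric row keeps whatever spacing
                    the ChordPro line had around each chord; the chord positions come directly from
                    the inline markup rather than from counting spaces in a separate line.</para>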
                <para>During the week of Balisage 2025, the authors of this paper, together with Dr.
                    Birnbaum, set out to test whether a chord chart written in ChordPro would let
                    ixml handle the entire conversion (that is, marking up the chords separately
                    from the lyrics), going further than I had in the project’s initial stages.
                    Developing this version of the ixml was no faster than developing the XSLT that
                    did the same work (an initial concern about using ixml in the project at all).
                    However, it does, in fact, work! Where the previous ixml could distinguish only
                    metadata and song sections (leaving lines, chords, and lyrics as unmarked
                        <quote>blobs</quote> of text), the new ixml for chord charts in ChordPro
                    format parses lines, chords, and lyrics with no ambiguity.</para>
                <figure>
                    <title><link xlink:href="https://github.com/newtfire/GretaVanZeppelin/blob/main/pipeline/phase-2-ChordPro/InvisibleXML.ixml" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">ixml for Songs in ChordPro</link></title>
<programlisting xml:space="preserve">
xml: metadata, music.
metadata: title, newline, album, newline, artist, newline, key, newline, newline+.
title: ~[#d;#a]+.
album: ~[#d;#a]+.
artist: ~[#d;#a]+.
key: ~[#d;#a]+.

music: section++(newline, newline+), newline?.
section: type, newline, line++newline, newline, outro.
@type: -"{start_of_", ~["}"]+, -"}".
-outro: -"{end_of_", -~["}"]+, -"}".
line: lineContent.
-lineContent: nullableText, (chord++nullableText, nullableText)?.
chord: -"[", ~["]"]+, -"]".
-nullableText: ~["[]{}";#a;#d]*.
-newline: (-#d?, -#a).</programlisting>
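                    <para>Many thanks to Dr. Birnbaum for working out how to make the mixed text
                        content of a ChordPro line unambiguous. The secret was the definition of
                            <code>-lineContent</code> and <code>-nullableText</code>: a possibly
                        empty run of text, optionally followed by
                            <code>(chord++nullableText, nullableText)?</code>. In other words, a
                        line may contain no text at all; if it has chords, each pair of chords is
                        separated by text that may itself be empty, and the line may end with more
                        text. Because text and chords strictly alternate, and text can never
                        contain a bracket, no stretch of text can be split into two adjacent text
                        segments, so each line has only one possible parse.</para>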
                </figure>
                <figure>
                    <title>Output of a Song Written in ChordPro Processed by Our New ixml</title>
<programlisting xml:space="preserve">
&lt;xml&gt;
   &lt;metadata&gt;
      &lt;title&gt;Flower Power&lt;/title&gt;
      &lt;album&gt;From the Fires&lt;/album&gt;
      &lt;artist&gt;Greta Van Fleet&lt;/artist&gt;
      &lt;key&gt;A&lt;/key&gt;
   &lt;/metadata&gt;
   &lt;music&gt;
      &lt;section type='intro'&gt;
         &lt;line&gt;
            &lt;chord&gt;A&lt;/chord&gt; 
            &lt;chord&gt;D&lt;/chord&gt;
            &lt;chord&gt;A&lt;/chord&gt;
            &lt;chord&gt;D&lt;/chord&gt;
            &lt;chord&gt;A&lt;/chord&gt;
            &lt;chord&gt;D&lt;/chord&gt;
            &lt;chord&gt;A&lt;/chord&gt;
            &lt;chord&gt;D&lt;/chord&gt;
         &lt;/line&gt;
      &lt;/section&gt;
      &lt;section type='verse'&gt;
         &lt;line&gt;
            &lt;chord&gt;A&lt;/chord&gt; She is a lady, comes
            &lt;chord&gt;D&lt;/chord&gt;from all around&lt;/line&gt;
         &lt;line&gt;
            &lt;chord&gt;A&lt;/chord&gt; She's many places, but she's
            &lt;chord&gt;Dmaj7add9#13&lt;/chord&gt;homeward bound&lt;/line&gt;
            ...
      &lt;/section&gt;
      ...
   &lt;/music&gt;
&lt;/xml&gt;</programlisting>
                </figure>
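                <para>The mixed content in this output maps naturally onto the model of Python’s
                        <code>xml.etree.ElementTree</code>, in which text before an element’s first
                    child is its <code>text</code> and text after each child is that child’s
                        <code>tail</code>. As a rough, hypothetical analogue of
                        <code>-lineContent</code> (one line only, with no sections or metadata, and
                    a regular expression in place of an ixml processor), the sketch below alternates
                    text runs and <code>chord</code> elements and, like the grammar, rejects text
                    that contains brackets, braces, or line breaks. The names
                        <code>parse_line</code> and <code>text_segment</code> are our own.</para>
<programlisting xml:space="preserve">
import re
import xml.etree.ElementTree as ET

# Mirrors  chord: -"[", ~["]"]+, -"]".  (the brackets are dropped)
CHORD = re.compile(r"\[([^\]]+)\]")
# Characters that  -nullableText: ~["[]{}";#a;#d]*.  excludes.
FORBIDDEN = set("[]{}\r\n")


def text_segment(text: str) -> str:
    """Accept a run of lyric text, rejecting what nullableText forbids."""
    if FORBIDDEN.intersection(text):
        raise ValueError(f"not a valid ChordPro line: {text!r}")
    return text


def parse_line(source: str) -> ET.Element:
    """Build a line element whose mixed content alternates text and chords."""
    line = ET.Element("line")
    previous = None  # the most recent chord element, if any
    last = 0
    for match in CHORD.finditer(source):
        text = text_segment(source[last:match.start()])
        if previous is None:
            line.text = text  # leading text before the first chord
        else:
            previous.tail = text  # text between two chords
        previous = ET.SubElement(line, "chord")
        previous.text = match.group(1)
        last = match.end()
    trailing = text_segment(source[last:])
    if previous is None:
        line.text = trailing
    else:
        previous.tail = trailing
    return line


line = parse_line("[A] She's many places, but she's[Dmaj7add9#13]homeward bound")
print(ET.tostring(line, encoding="unicode"))
</programlisting>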
                <para>With this breakthrough, the next step for the project was to run this new
                    ixml via XProc over the selected texts. The setup was simple: reorganize the
                    project’s GitHub repository into <link xlink:href="https://github.com/newtfire/GretaVanZeppelin/tree/main/pipeline/phase-1" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest"><code>phase-1</code></link> and <link xlink:href="https://github.com/newtfire/GretaVanZeppelin/tree/main/pipeline/phase-2-ChordPro" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest"><code>phase-2-ChordPro</code></link> and start the process again
                    from scratch, beginning with new raw-text files in ChordPro format. This step
                    raised an unforeseen issue: we knew of no good source of chord charts in
                    ChordPro format. The best solution we found was <link xlink:href="https://ultimate.ftes.de/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">this converter webpage</link>, which
                    takes chord charts from Ultimate Guitar and outputs them in ChordPro format.
                    Sounds perfect! However, the converter recognizes only <code>Verse</code> and
                        <code>Chorus</code>, treating all other section names as
                        <quote>comments</quote>, and it also appears to struggle with the inherent
                    ambiguity of chord charts! It works when the input follows a strict pattern:
                    section title, line of chords, line of lyrics, line of chords, line of lyrics,
                    and so on. But it struggles when the chart deviates from this pattern, as
                    nearly all of our charts do. Perhaps, with our newfound logic for deciding
                    whether a line contains chords, lyrics, or both, we could create our own system
                    for accurately converting chord charts to ChordPro format, ready for use with
                    our ixml. For now, though, I simply converted, then manually edited, <link xlink:href="https://github.com/newtfire/GretaVanZeppelin/tree/main/pipeline/phase-2-ChordPro/raw-text/Greta/anthemOfThePeacefulArmy" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">one album</link> (eight songs) to test our new ixml on a
                    larger collection in our pipeline. After learning to add the
                        <code>@serialization</code> attribute to the <code>&lt;p:store&gt;</code>
                    step in the XProc pipeline so that each output XML file would be indented and
                    readable, we successfully ran our new ixml over a collection of resources! This
                    also means that we eliminated one of the XSLT transformations from the original
                    pipeline and more accurately preserved the placement of chords relative to the
                    lyrics, which gives us more data, and more accurate data, to analyze.</para>
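                <para>The column logic such a converter would need can be sketched briefly. The
                    Python below is a hypothetical illustration, not the project’s code and not a
                    description of how the Ultimate Guitar converter works. It assumes the harder
                    step, deciding which lines hold chords and which hold lyrics, has already been
                    done, and it inserts each chord from a chord line into the lyric line beneath
                    it at the column where the chord starts. The name <code>merge</code> is our
                    own.</para>
<programlisting xml:space="preserve">
import re

# Each run of non-space characters on a chord line is treated as one chord.
CHORD_TOKEN = re.compile(r"\S+")


def merge(chord_line: str, lyric_line: str) -> str:
    """Insert each chord from chord_line into lyric_line at its column."""
    # Pad the lyrics so chords placed past the end of the text still fit.
    lyrics = lyric_line.ljust(len(chord_line))
    pieces = []
    last = 0
    for match in CHORD_TOKEN.finditer(chord_line):
        column = match.start()
        pieces.append(lyrics[last:column])
        pieces.append(f"[{match.group()}]")
        last = column
    pieces.append(lyrics[last:])
    return "".join(pieces).rstrip()


chord_line = "A" + " " * 20 + "D"  # D sits above the "f" of "from"
print(merge(chord_line, "She is a lady, comes from all around"))
</programlisting>
                <para>Padding the lyric line first lets chords that hang past the end of the lyrics
                    still be placed; a chord line with no lyrics beneath it simply becomes a row of
                    bracketed chords.</para>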
                <para>Most significantly, it seems unlikely that ChordPro’s creators or users are
                    currently aware of ixml and its ability to read ChordPro and turn it into XML.
                    In our project, ChordPro allows both a more accurate preservation of the chord
                    charts and a more accurate representation of entire chord progressions. I think
                    it would be especially interesting to analyze a case such as the contemporary
                    band Greta Van Fleet directly copying a chord progression from a verse of a Led
                    Zeppelin song.</para>
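                <para>As a hedged sketch of how such a comparison might begin, the Python below
                    collects the <code>chord</code> elements from documents shaped like our ixml
                    output and reports chord n-grams (runs of <emphasis>n</emphasis> consecutive
                    chords) that two songs share. The function names, the default of
                        <emphasis>n</emphasis> = 4, and the toy chord sequences are all assumptions
                    for illustration. Comparing literal chord names also ignores transposition, so
                    a progression copied in a different key would be missed.</para>
<programlisting xml:space="preserve">
import xml.etree.ElementTree as ET
from collections import Counter


def chord_sequence(root: ET.Element) -> list[str]:
    """Chord names in document order, from elements shaped like our output."""
    return [chord.text.strip() for chord in root.iter("chord") if chord.text]


def ngrams(chords: list[str], n: int) -> Counter:
    """Count every run of n consecutive chords."""
    return Counter(tuple(chords[i:i + n]) for i in range(len(chords) - n + 1))


def shared_progressions(a: ET.Element, b: ET.Element, n: int = 4) -> set:
    """Chord n-grams that occur at least once in both songs."""
    return set(ngrams(chord_sequence(a), n)).intersection(
        ngrams(chord_sequence(b), n)
    )


def toy_song(*chords: str) -> ET.Element:
    """Hypothetical stand-in for a parsed song; real input would come from
    ET.parse(path).getroot() on the pipeline's XML output."""
    root = ET.Element("xml")
    music = ET.SubElement(root, "music")
    line = ET.SubElement(ET.SubElement(music, "section"), "line")
    for name in chords:
        ET.SubElement(line, "chord").text = name
    return root


# Made-up chord sequences for illustration only.
song_a = toy_song("A", "D", "A", "D", "E", "A")
song_b = toy_song("G", "A", "D", "A", "D", "G")
print(shared_progressions(song_a, song_b))
</programlisting>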
            </section>
        </section>
    </section>

    <section>
        <title>What Do We Gain From Learning and Teaching Invisible XML?</title>

        <para>Was this worth the effort, and was Invisible XML ready for undergraduates in an
            algorithmic text analysis course? Most student project teams were not motivated to apply
            Invisible XML in their projects, and we did not require them to do so. (We required only
            that students try the technologies in their homework assignments.) For those who were
            motivated (particularly the authors of this paper), drafting new documentation and
            training resources for the class cultivated an interest in the technologies themselves,
            along with an awareness that Invisible XML was not strictly necessary for their
            projects: regular expression matching and XSLT could have done the job more quickly. Yet
            there is satisfaction in writing a successful grammar, and a simplification of the
            documentation as well. Invisible XML <emphasis>declares</emphasis> the patterns of a
            text document to be the defining grammar of XML nodes, and this is both <emphasis>less
                and more</emphasis> than scripting a process as a sequence of regular expression
            search-and-replace operations. It is <emphasis>more</emphasis> in the sheer effort
            needed to make the operations work, <emphasis>more</emphasis> definitive as a grammar
            than a convenient string match, and perhaps <emphasis>more</emphasis> in requiring the
            installation of CoffeePot or Markup Blitz to process it, rather than relying on the
            built-in regular expression search of an editor like oXygen or the <code>re</code>
            library in Python. But it is <emphasis>less</emphasis> in that a grammar expresses its
            expectations of the source documents with precise economy, and (in our experience) in
            requiring fewer lines of code, though each line must be written with exacting care.
            Both approaches require students to specify what needs to become their data structure,
            but the thinking process required by Invisible XML may differ from writing sequential
            regular expression recipes in the way that writing poetry differs from writing prose.
            Poetry, like Invisible XML, attends to more dimensions of expression in every meaningful
            and resonant word and punctuation mark, by contrast with prose and step-by-step
            search-and-replace operations.</para>
        <para>Certainly, for all its expressive power, Invisible XML cannot replace the prosaic
            versatility of regular expressions as a <quote>Swiss-army knife</quote> for many
            different text analysis purposes. And we could have taught our course without it. But
            the experience may have enhanced other kinds of learning in the course. For example, we
            found that students approached Python with a better understanding of algorithmic
            pipelines, more clarity and less trouble with installation processes, and especially
            greater comfort with the command line. Perhaps there was something more: this year,
            students’ appreciation of <emphasis role="ital">declarative methods</emphasis> was
            balanced differently against the <emphasis role="ital">imperative</emphasis>
            programming of Python. Do students understand the formatting and processing of text
            differently after encountering Invisible XML? Because we made Invisible XML a topic for
            homework and demonstration, students encountered the concept of grammars in the context
            of schema validation and regular expression matching: related to both, but
            fundamentally different. Declaring patterns, and writing grammars that work across
            platforms (Windows, Mac, and Linux systems), may have enriched their experience of
            moving between declarative and imperative methods of text handling, and helped them to
            reflect on a certain fluidity of methods.</para>
        <para>Our experience of this course helped to bridge distinct cultures in text analysis that
            are perhaps better experienced together than in isolation. Perhaps we have found a place
            to assert the value of declarative methods in the 2020s, a time when natural language
            processing of sequential strings dominates the development of Large Language Models and
                <quote>artificial intelligence</quote>. The movement from <quote>raw
                text</quote> to identifying structures as nodes, then extracting text from nodes for
            meaningful natural language processing, with findings expressed, visualized, and output
            on websites, involves a round-trip adventure with text and code formats. Perhaps the
            most significant application of Invisible XML is the agency it gives student coders in
            constructing their own pattern recognition as XML. Developing and processing Invisible
            XML in a text analysis course makes for a multi-dimensional experience of texts, in
            which students find value in applying different methods to structured architectures and
            unstructured sequences. Thus we find Invisible XML worthwhile for our digital humanities
            students and look forward to experimenting with it in future iterations of our text
            analysis course.</para>

    </section>

</article>