How to cite this paper

Ogbuji, Uche. “Privately Automating Common, Uncommon, and Surprising Markup Tasks using AI Large Language Models.” Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). https://doi.org/10.4242/BalisageVol28.Ogbuji01.

Balisage: The Markup Conference 2023
July 31 - August 4, 2023

Balisage Paper: Privately Automating Common, Uncommon, and Surprising Markup Tasks using AI Large Language Models

Uche Ogbuji

Independent Consultant

Uche Ogbuji explores emerging technologies and develops systems to integrate them with more traditional ones. He’s been doing so since markup and the web came together back in the late 90s; while people were claiming that no serious business would ever be done in the Python programming language, he was contributing to the language’s initial XML libraries. Most recently he founded Zepheira and The Library.Link Network (now Bibliograph, by EBSCO, post-acquisition), bringing library catalogs to the web for indexing. He is a prolific writer and speaker on tech (and many other) topics, and also a poet, spoken-word performer, and DJ with two award-winning poetry books, Ndewo, Colorado (Colorado Book Award, Westword award) and Ńchéfù Road (Christopher Smart Prize). Born in Calabar, Nigeria, Uche settled near Boulder, Colorado, after much world wandering.

Abstract

Generative AI is everywhere, from DALL-E for image generation to ChatGPT for language tasks. The power of these models boggles the mind, even for people who’ve been involved in AI for ages. Can they help us? It’s been widely reported that they can write rather sophisticated code in languages like Python and JavaScript, but how well do they work with markup? Can they handle XML properly, or can they only treat it as tag soup? Even without any specialized XML training, large language models (LLMs) prove to have some very interesting, and in some cases impressive, capabilities.

By using self-hosted LLMs rather than third-party services such as ChatGPT or Bard, we can exploit those capabilities even for private applications. There are clear limitations, but LLMs can also handle a surprising number of common markup tasks out of the box.
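
To make the self-hosted idea concrete, here is a minimal sketch (not drawn from the paper itself; the llama-cpp-python library, the model file name, the prompt wording, and the validation step are all illustrative assumptions). It asks a locally running model, from Python, to repair a small fragment of tag soup, then checks the result for well-formedness with the standard-library parser, so no markup ever leaves the machine:

# A minimal, hypothetical sketch: prompt a locally hosted LLM to tidy a
# fragment of tag soup into well-formed XML, then verify the result with
# the standard-library parser. The model file path is a placeholder.
import xml.etree.ElementTree as ET
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="models/example-7b.gguf", n_ctx=2048)

fragment = "<p>Hello <b>world</p>"
prompt = (
    "Rewrite the following HTML fragment as well-formed XML, "
    "closing any unclosed elements. Return only the markup.\n\n"
    + fragment + "\n\nXML:"
)

response = llm(prompt, max_tokens=128, temperature=0.0, stop=["\n\n"])
candidate = response["choices"][0]["text"].strip()

try:
    ET.fromstring(candidate)  # the well-formedness check also stays local
    print("Well-formed:", candidate)
except ET.ParseError as err:
    print("Model output was not well-formed XML:", err)

Because both inference and the well-formedness check run locally, the same pattern extends to documents that cannot be sent to a hosted service.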