Balisage Paper: With One Voice: A Modular Approach to Streamlining Character Data for Tokenization

Balisage: The Markup Conference 2019
July 30 - August 2, 2019

The materials listed below were provided by the speaker as supplements to a presentation at Balisage. These materials may include the slides or visuals used in the presentation; supplementary material, such as code samples or a demonstration application; and/or the paper accompanying the presentation (if it has not been provided in XML). These materials have been zipped for easy download and are identified by a brief description of the contents. The materials themselves are untouched, that is, they have not been tested or edited by Balisage: The Markup Conference or by Mulberry Technologies, Inc. As such, they are included on this website AS IS, i.e., as provided by the speaker, with no warranties, express or otherwise, made by Balisage or Mulberry.

Slides and Materials

Apache Software Foundation. Lucene 8.0.0 documentation. Package org.apache.lucene.analysis. https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/analysis/package-summary.html#package.description. Accessed 2019-04-12.

Bauman, Syd. “The Hard Edges of Soft Hyphens.” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2–5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Bauman01.

Burns, Philip R. 2013. “MorphAdorner v2: A Java Library for the Morphological Adornment of English Language Texts.” Northwestern University. https://morphadorner.northwestern.edu/morphadorner/download/morphadorner.pdf. Accessed 2019-07-05.

Davies, Lady Eleanor. 2015. The Benediction, 1651. From the Women Writers Online XML, last modified 2019-02-10 (commit 36259). Published at https://www.wwp.northeastern.edu/texts/davies.benediction.html. (Requires subscription.)

eXist-db Project. Documentation. Whitespace Treatment and Ignored Content. In Full Text Index. http://exist-db.org/exist/apps/doc/lucene.xml#D3.19.62. Accessed 2019-07-04.

Jockers, Matthew L. 2016. Text Quality, Text Variety, and Parsing XML. In Text Analysis with R for Students of Literature. Quantitative Methods in the Humanities and Social Sciences. Springer International.

TEI Consortium. Appendix C Elements. In P5: Guidelines for Electronic Text Encoding and Interchange. Version 3.5.0. https://www.tei-c.org/release/doc/tei-p5-doc/en/html/REF-ELEMENTS.html. Accessed 2019-07-04.

W3C. Extensible Markup Language (XML) 1.0 (Fifth Edition). Section 2.4, Character Data and Markup. https://www.w3.org/TR/REC-xml/#syntax. Accessed 2019-04-12.

W3C. XQuery and XPath Full Text 1.0. https://www.w3.org/TR/xpath-full-text-10/. Accessed 2019-04-12.

XTF Users List. 2012-02-06 – 2012-05-04. Forum thread. Tags that break up words. https://groups.google.com/forum/#!topic/xtf-user/hsvFOTM0b9E. Accessed 2019-07-04.