Publication tool proof of concept

2026-03-25

Current workflow

  • Author writes in Word
  • Creates PDF
  • Opens application, uploads PDF
  • Publishes

Problem statements

  • Full briefings not available as HTML
  • Manual process for creating PDFs
  • Cost of Adobe Acrobat licences
  • Complex, fragile templates

More problems? Add to the chat.

Assumptions (check them)

  • Customers need printable version
  • We will still want to make briefings available in hard copy
  • We still need to update documents
  • Any more?

Proof of concept: scope

  • Word to HTML conversion
  • HTML to PDF conversion
  • Re-editing in basic Word templates

Not considering:

  • Other authoring tools
  • Support model

Word to HTML

  • Chosen option: Mammoth package in Python
  • Widely used; simple; clean HTML; configurable
  • https://github.com/mwilliamson/python-mammoth
  • mammoth document.docx output.html --style-map=custom-style-map
  • ✅ Full briefings available as HTML (with some editing)
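
The same conversion can be driven from Python instead of the command line. What follows is a minimal sketch, not the proof-of-concept code: the Word style names in STYLE_RULES and the helper names build_style_map and docx_to_html are hypothetical and would need to match the real briefing templates. It assumes the mammoth package is installed and uses its convert_to_html function with a style map passed as a string.

```python
from pathlib import Path

# Hypothetical mapping from Word paragraph style names to HTML elements.
# The real template style names will differ.
STYLE_RULES = {
    "Briefing Title": "h1:fresh",
    "Section Heading": "h2:fresh",
    "Pull Quote": "blockquote:fresh",
}


def build_style_map(rules: dict[str, str]) -> str:
    """Render the rules in Mammoth's style-map syntax, one rule per line."""
    return "\n".join(
        f"p[style-name='{name}'] => {target}" for name, target in rules.items()
    )


def docx_to_html(docx_path: str, html_path: str, rules: dict[str, str] = STYLE_RULES) -> list[str]:
    """Convert a .docx file to an HTML fragment and return Mammoth's warnings."""
    import mammoth  # third-party: pip install mammoth

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=build_style_map(rules))

    # Mammoth returns an HTML fragment (no <html> or <head> wrapper).
    Path(html_path).write_text(result.value, encoding="utf-8")
    return [message.message for message in result.messages]
```

Calling docx_to_html("document.docx", "output.html") would write the fragment and return any warnings, for example about Word styles that have no rule in the style map. Those warnings are a reasonable starting list for the "some editing" noted above.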

HTML to PDF

  • CSS can handle print layouts
  • Print function in browser applies styles
  • More functionality available through scripting
  • Options: Prince XML; DocRaptor; WeasyPrint
  • Choice: WeasyPrint — free; footnotes, TOCs, accessible, PDF bookmarks
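
To illustrate how print CSS and WeasyPrint fit together, here is a minimal sketch under stated assumptions. The page size, margins, and the helper names wrap_fragment and html_to_pdf are illustrative choices, not the proof-of-concept settings. It assumes the weasyprint package is installed and that the input is an HTML fragment such as the one Mammoth produces.

```python
from html import escape

# Illustrative print stylesheet: A4 pages, a page number in the footer,
# and PDF bookmarks generated from headings.
PRINT_CSS = """
@page {
    size: A4;
    margin: 25mm 20mm;
    @bottom-center { content: counter(page); }
}
h1 { bookmark-level: 1; }
h2 { bookmark-level: 2; break-after: avoid; }
"""


def wrap_fragment(fragment: str, title: str) -> str:
    """Wrap an HTML fragment in a complete document with an escaped title."""
    return (
        '<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head><body>{fragment}</body></html>"
    )


def html_to_pdf(fragment: str, title: str, pdf_path: str) -> None:
    """Render the fragment to a PDF file using the print stylesheet."""
    from weasyprint import CSS, HTML  # third-party: pip install weasyprint

    document = HTML(string=wrap_fragment(fragment, title))
    document.write_pdf(pdf_path, stylesheets=[CSS(string=PRINT_CSS)])
```

Keeping the stylesheet separate from the content means the same HTML can serve both the web page and the printable version, with only the CSS differing.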

🛑 Need to build application to test!

Demo

What have we demonstrated?

  • Get good HTML from our Word files
  • Get near-equivalent PDFs from the HTML
  • Round-trip: update using Word generated from HTML — no archive of Word files needed

Not done yet

  • Images, charts, tables
  • Implement accessible format
  • PDF tidying

Comparison with XML

Pros:

  • Less business change
  • No new authoring tool (time and £££)
  • Quicker path to benefits
  • Staggered transition

Cons:

  • Limits on structure
  • No modular text
  • Other teams might need XML

Tentative recommendation: HTML → PDF as first step

Any thoughts?