In daily work, colleagues or supervisors unfamiliar with LaTeX may request Word documents for review and collaboration. This project provides a Python script that uses Pandoc and Pandoc-Crossref to automatically convert LaTeX files into Word documents following a specified format. Although there's no perfect method for converting LaTeX to Word, the output generated by this project meets informal review needs. However, around 5% of the content (such as author information) may require manual corrections post-conversion.
- Supports formula conversion
- Automatically numbers and cross-references images, tables, formulas, and citations
- Converts multi-figure LaTeX files
- Outputs Word files in a specified format
- Supports Chinese
Examples are shown below; more results are in tests:
Ensure Pandoc and Pandoc-Crossref are correctly installed (see Install Dependencies). Execute the following command in your terminal:
tex2docx convert --input-texfile <your_texfile> --output-docxfile <your_docxfile>
Replace <...> in the command with the corresponding file path and name.
Ensure you have installed Pandoc, Pandoc-Crossref, and related Python libraries.
Install Pandoc as described in the official documentation. It is recommended to download the latest release from Pandoc Releases.
Install Pandoc-Crossref by following the official documentation. Ensure compatibility between Pandoc and Pandoc-Crossref and configure the path correctly.
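Before running a conversion, it can help to confirm that both executables are actually reachable. The following is a minimal sketch, not part of tex2docx: the helper name `find_missing_tools` is hypothetical, and it assumes the tools are installed under their usual names `pandoc` and `pandoc-crossref` and are expected on `PATH`.

```python
import shutil

# Executable names assumed to be the standard ones for both tools.
REQUIRED_TOOLS = ("pandoc", "pandoc-crossref")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("pandoc and pandoc-crossref were found on PATH.")
```

This only checks that the executables exist. It does not verify that the installed Pandoc and Pandoc-Crossref versions are compatible with each other.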
Install from PyPI:
pip install tex2docx
This tool supports both command-line and script-based usage. Ensure the required dependencies are installed.
Run the following command in your terminal:
tex2docx convert --input-texfile <your_texfile> --output-docxfile <your_docxfile> --reference-docfile <your_reference_docfile> --bibfile <your_bibfile> --cslfile <your_cslfile>
Use tex2docx convert --help to see details on these parameters.
For example, using tests/en:
tex2docx convert --input-texfile ./tests/en/main.tex --output-docxfile ./tests/en/main_cli.docx --reference-docfile ./my_temp.docx --bibfile ./tests/ref.bib --cslfile ./ieee.csl
This generates the Word file main_cli.docx in the tests/en directory.
from tex2docx import LatexToWordConverter

config = {
    'input_texfile': '<your_texfile>',
    'output_docxfile': '<your_docxfile>',
    'reference_docfile': '<your_reference_docfile>',
    'cslfile': '<your_cslfile>',
    'bibfile': '<your_bibfile>',
    'fix_table': True,
    'debug': False,
}

converter = LatexToWordConverter(**config)
converter.convert()
For more examples, refer to tests/test_integration.py.
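When converting several files from a script, the configuration dictionary can be generated rather than written by hand. The sketch below is illustrative only: `build_config` is a hypothetical helper, not part of tex2docx. It reuses the keys from the example above and assumes that omitting an optional key makes the converter fall back to its built-in defaults.

```python
import os


def build_config(texfile, reference_docfile=None, bibfile=None,
                 cslfile=None, fix_table=True, debug=False):
    """Build a keyword dictionary for LatexToWordConverter (hypothetical helper).

    The output path is derived from the input path by swapping the extension
    for .docx. Optional file arguments left as None are omitted, on the
    assumption that the converter then uses its defaults.
    """
    root, _ = os.path.splitext(texfile)
    config = {
        "input_texfile": texfile,
        "output_docxfile": root + ".docx",
        "fix_table": fix_table,
        "debug": debug,
    }
    optional = {
        "reference_docfile": reference_docfile,
        "bibfile": bibfile,
        "cslfile": cslfile,
    }
    # Keep only the options the caller actually supplied.
    config.update({key: value for key, value in optional.items() if value is not None})
    return config
```

With tex2docx installed, the result can be passed on as `LatexToWordConverter(**build_config("tests/en/main.tex", bibfile="tests/ref.bib")).convert()`.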
- Inconsistent Multi-Figure Layout
  The relative positions of sub-figures may differ between LaTeX compilation and Word conversion. This may result from a redefined page size or other page parameters in the LaTeX file. To address this, adjust the MULTIFIG_TEXFILE_TEMPLATE variable. Below is an example for reference:

      import tex2docx

      my_multifig_texfile_template = r"""
      \documentclass[preview,convert,convert={outext=.png,command=\unexpanded{pdftocairo -r 600 -png \infile}}]{standalone}
      \usepackage{graphicx}
      \usepackage{subfig}
      \usepackage{xeCJK}
      \usepackage{geometry}
      \newgeometry{
          top=25.4mm, bottom=33.3mm, left=20mm, right=20mm,
          headsep=10.4mm, headheight=5mm, footskip=7.9mm,
      }
      \graphicspath{{%s}}
      \begin{document}
      \thispagestyle{empty}
      %s
      \end{document}
      """

      config = {
          'input_texfile': 'tests/en/main.tex',
          'output_docxfile': 'tests/en/main.docx',
          'reference_docfile': 'my_temp.docx',
          'cslfile': 'ieee.csl',
          'bibfile': 'tests/ref.bib',
          'multifig_texfile_template': my_multifig_texfile_template,
      }

      converter = tex2docx.LatexToWordConverter(**config)
      converter.convert()

- The Word Output Doesn't Meet Formatting Requirements
  Use Word's style management tools to adjust the styles in my_temp.docx.
This project relies on Pandoc and Pandoc-Crossref to convert LaTeX files to Word documents. The core command used is:
pandoc texfile -o docxfile \
  --lua-filter resolve_equation_labels.lua \
  --filter pandoc-crossref \
  --reference-doc=my_temp.docx \
  --number-sections \
  -M autoEqnLabels \
  -M tableEqns \
  -M reference-section-title=Reference \
  --bibliography=ref.bib \
  --citeproc --csl ieee.csl \
  -t docx+native_numbering
- --lua-filter resolve_equation_labels.lua handles equation numbering and cross-references, inspired by Constantin Ahlmann-Eltze's script.
- --filter pandoc-crossref handles cross-references for other elements.
- --reference-doc=my_temp.docx applies the styles from my_temp.docx to the generated Word file. Two template files are included: TIE-temp.docx (for TIE journal submission, double-column format) and my_temp.docx (single-column, designed for easier annotation).
- --number-sections adds numbering to section headings.
- -M autoEqnLabels numbers equations automatically, and -M tableEqns typesets each equation together with its number in a table.
- -M reference-section-title=Reference adds a section title for references.
- --bibliography=ref.bib generates the bibliography from ref.bib.
- --citeproc --csl ieee.csl formats citations and the bibliography using the IEEE citation style.
- -t docx+native_numbering improves captions for images and tables.
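The same command can be assembled as an argument list and run from Python. This is a minimal sketch of that idea, not the project's actual implementation: the function names `build_pandoc_command` and `run_pandoc` are hypothetical, and every file path is supplied by the caller.

```python
import subprocess


def build_pandoc_command(texfile, docxfile, *, lua_filter, reference_doc,
                         bibfile, cslfile):
    """Return the pandoc invocation described above as a list of arguments."""
    return [
        "pandoc", texfile, "-o", docxfile,
        "--lua-filter", lua_filter,              # equation labels and references
        "--filter", "pandoc-crossref",           # other cross-references
        f"--reference-doc={reference_doc}",      # Word style template
        "--number-sections",
        "-M", "autoEqnLabels",
        "-M", "tableEqns",
        "-M", "reference-section-title=Reference",
        f"--bibliography={bibfile}",
        "--citeproc", "--csl", cslfile,
        "-t", "docx+native_numbering",
    ]


def run_pandoc(command):
    """Run the command and raise CalledProcessError if pandoc exits non-zero."""
    subprocess.run(command, check=True)
```

Passing a list instead of a shell string avoids quoting problems when paths contain spaces. Note that the Lua filter is listed before pandoc-crossref, matching the order in the command above; Pandoc applies filters in the order they are given.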
The conversion for multi-figure LaTeX content may not be perfect. This project extracts multi-figure code from the LaTeX file and uses the convert and pdftocairo tools to compile the figures into a single large PNG file, replacing the original LaTeX image code and updating references to ensure smooth import into Word.
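The extraction step can be pictured with a short sketch. This is an illustration under simplifying assumptions, not the parser used by the project: `find_multifig_blocks` is a hypothetical name, sub-figures are assumed to be written with `\subfloat`, `\subfigure`, or a `subfigure` environment, and figure environments are assumed not to be nested.

```python
import re

# Matches one figure or figure* environment, including its contents.
FIGURE_ENV = re.compile(
    r"\\begin\{figure\*?\}.*?\\end\{figure\*?\}",
    re.DOTALL,
)

# Commands assumed to indicate that a figure contains sub-figures.
SUBFIG_MARKERS = (r"\subfloat", r"\subfigure", r"\begin{subfigure}")


def find_multifig_blocks(tex_source):
    """Return the source of every figure environment that holds sub-figures."""
    blocks = []
    for match in FIGURE_ENV.finditer(tex_source):
        block = match.group(0)
        if any(marker in block for marker in SUBFIG_MARKERS):
            blocks.append(block)
    return blocks
```

Each returned block could then be compiled on its own into a PNG and replaced in the source by a single image. A regular expression like this also matches figures inside commented-out text, so a real implementation would need to strip comments first.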
- Captions for figures and tables in Chinese still start with "Figure" and "Table".
- Author information is not fully converted.
- Major code refactoring: complete modular restructuring of the codebase
  - Split the monolithic tex2docx.py (1139 lines) into 8 specialized modules for better maintainability
  - Introduced clear separation of concerns with dedicated modules for configuration, parsing, conversion, etc.
  - Enhanced type annotations and error handling throughout the codebase
- Improved testing infrastructure:
  - Renamed and reorganized test files for better clarity:
    - test_tex2docx_refactored.py → test_unit.py (unit tests for individual components)
    - test_tex2docx.py → test_integration.py (end-to-end integration tests)
  - Added comprehensive test documentation in tests/README.md
  - Enhanced pytest configuration with proper markers and test discovery
- Critical bug fixes:
  - Fixed LaTeX reference line break issue where \ref{} commands were incorrectly split as \nef{}
  - Resolved CLI import errors with proper module structure
  - Enhanced reference numbering accuracy for tables, figures, and equations
- Developer experience improvements:
  - Better project structure with clear module boundaries
  - Comprehensive documentation updates
  - Cleaner development workflow with organized test suites
  - Preserved all existing functionality while improving code quality
- Added support for \include in LaTeX files. (#3)
- Enhanced the display of figures and tables for better formatting and presentation.
- Fixed conflicts between cm and varwidth in tables.
- Resolved conflicts between subfig and varwidth.
- Add feature and option to fix table (issue #2).
- Fix comments bug (issue #1).
- Improved default value settings, including built-in Word style templates and ieee.csl (used as default values).
- Fixed module import issues, improving stability.
- Enhanced the command-line tool for a more intuitive and efficient user experience.
- Switched to pyproject.toml for dependency management, replacing setup.py.
- Released on PyPI; users can install via pip install tex2docx.
There are two kinds of people: those who use LaTeX and those who don't. The latter often ask the former for a Word version. Hence, the following command:
pandoc input.tex -o output.docx \
--filter pandoc-crossref \
--reference-doc=my_temp.docx \
--number-sections \
-M autoEqnLabels -M tableEqns \
-M reference-section-title=Reference \
--bibliography=my_ref.bib \
--citeproc --csl ieee.csl \
-t docx+native_numbering



