[How to Properly Extract Text from a DOCX File in n8n?] solved
Hey Community,
I'm currently facing an issue when trying to extract text from a DOCX file in my workflow. The "Extract from Text" node returns unreadable binary data instead of plain text.
Context:
  • DOCX files are essentially ZIP archives containing multiple XML files.
  • The actual document text is stored in word/document.xml, which is encoded in UTF-8.
  • Simply using "Extract from XML" on the raw DOCX file doesn't work, because the XML sits inside the compressed archive.
  • Trying to send the extracted data to Supabase Vector Store results in the error: "unsupported Unicode escape sequence 400 Bad Request".
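To make the structure above concrete, here is a minimal sketch in plain Python of what "unzip first, then read word/document.xml" looks like. It is not an n8n node and not a confirmed fix for this workflow. The function name docx_to_text is made up for illustration, and it only reads the main document body (headers, footers, footnotes, and comments live in other XML parts of the archive and are ignored here).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(docx_bytes: bytes) -> str:
    """Return the plain text of a DOCX file, one line per paragraph.

    Hypothetical helper for illustration: it reads only the main body.
    """
    # A DOCX file is a ZIP archive, so open it from memory as one.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text sits in <w:t> elements.
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

With a file on disk this would be called roughly as docx_to_text(open("report.docx", "rb").read()), where report.docx is a placeholder name. Inside n8n, the same idea would mean decompressing the binary first and only then handing word/document.xml to an XML step.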
What I Tried:
  1. Passing the DOCX file through "Extract from Text" → produces garbled characters.
  2. Changing file encoding before extracting → no success.
  3. Sending raw DOCX data to Supabase → results in the Unicode error.
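One guess about attempt 3: raw DOCX bytes read as text will contain NUL characters, and as far as I know PostgreSQL refuses \u0000 in JSON text, which would explain the "unsupported Unicode escape sequence" error. I have not confirmed this against the actual Supabase response. A minimal Python sketch of stripping such characters before inserting, assuming the text is already a normal string (the name strip_control_chars is hypothetical):

```python
import re

# NUL and other C0 control characters, except tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text: str) -> str:
    """Remove control characters that a database may reject.

    Hypothetical helper for illustration; tab, LF, and CR are kept.
    """
    return _CONTROL_CHARS.sub("", text)
```

This only hides the symptom if the input is still the compressed file. Properly extracted text should rarely contain these characters in the first place.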
Question:
How can I properly extract readable text from a DOCX file in n8n? Do I need an additional step to unzip the DOCX file before extracting the XML content? Is there a built-in node or best practice for handling DOCX files correctly?
Any help or guidance would be greatly appreciated! 🚀
Benjamin Schölkmann
AI Automation Society
skool.com/ai-automation-society