Docling doesn't infer label/sublabel hierarchy in tables from formatting such as italics, indentation, or spacing. It treats all rows as being at the same level unless the document structure encodes the hierarchy explicitly (for example, through table row/column structure, heading tags, or list numbering). Formatting such as italics or a single leading space is stored as text styling and doesn't affect how Docling parses table hierarchy in DOCX, HTML, PDF, or XML inputs. There's no built-in option to change this behavior. Users typically either preprocess documents to encode the hierarchy explicitly, or use post-processing scripts to reconstruct it from formatting cues. Some users have had success with external tools like docling-hierarchical-pdf or custom regex scripts for this purpose, but these are not part of core Docling functionality.
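As a rough illustration of the post-processing route, here is a minimal Python sketch. It assumes you have already pulled each row's first-cell text out of Docling's output, together with an italic flag, if your pipeline can recover one (for example, from the source DOCX). That is an assumption, and Docling may also strip the leading space. `rebuild_hierarchy` and `HierRow` are hypothetical names, not part of Docling. The sketch treats any row that is italic or starts with whitespace as a sublabel of the nearest preceding top-level row, and it splits `label : value` text on the first colon.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class HierRow:
    label: str
    value: str
    parent: Optional[str]  # None for top-level rows


def rebuild_hierarchy(rows: Iterable[Tuple[str, bool]]) -> List[HierRow]:
    """Hypothetical helper: attach italic or indented rows to the previous top-level row.

    Each input row is (raw_text, is_italic), where raw_text looks like
    "A : 1234" and may keep its leading whitespace. Both the text format
    and the italic flag are assumptions about what your extraction provides.
    """
    result: List[HierRow] = []
    current_parent: Optional[str] = None
    for raw_text, is_italic in rows:
        # Formatting cues: italic flag or a leading space marks a sublabel.
        is_sub = is_italic or raw_text[:1].isspace()
        # Split "label : value" on the first colon; value is "" if no colon.
        label, _, value = raw_text.strip().partition(":")
        label, value = label.strip(), value.strip()
        if is_sub and current_parent is not None:
            result.append(HierRow(label, value, current_parent))
        else:
            # A top-level row becomes the parent for the sublabels that follow.
            current_parent = label
            result.append(HierRow(label, value, None))
    return result


if __name__ == "__main__":
    # "Label" is a placeholder for the parent row, which the thread doesn't name.
    rows = [("Label", False), (" A : 1234", True), (" B : 24", True), (" C: 45", True)]
    for row in rebuild_hierarchy(rows):
        print(row.parent, "->", row.label, row.value)
```

A sublabel that appears before any top-level row is kept as top-level rather than dropped. If your cues differ (for example, a deeper indent for a third level), the `is_sub` test is the place to adapt.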
@dosu
In a table, there is a label and three sublabels written in italics with a single leading space, like below:
A : 1234
B : 24
C: 45
Docling can't recognize this difference and treats them as if they are at the same hierarchy level. How can I solve this?