PhD Position in Machine Learning (Document Image Analysis)
Large language models (LLMs) have great potential for analyzing, recognizing, and validating scanned documents. However, they focus mainly on the OCR text and do not take into account visual aspects, such as layout and illustrations, that are of fundamental importance for document understanding. The successful candidate will perform basic research and develop novel methods for efficiently integrating visual aspects into LLMs for document understanding. A particular focus will be on obtaining explainable results with respect to both the visual and textual content of the documents.