LayoutLM

Overview

The LayoutLM model was proposed in the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. It’s a simple but effective pretraining method of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding. It obtains state-of-the-art results on several downstream tasks:

form understanding: the FUNSD dataset (a collection of 199 annotated forms comprising more than 30,000 words).
receipt understanding: the SROIE dataset (a collection of 626 receipts for training and 347 receipts for testing).
document image classification: the RVL-CDIP dataset (a collection of 400,000 images belonging to one of 16 classes).

The abstract from the paper is the following:

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pretraining models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words’ visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pretraining. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42).

Tips:

In addition to input_ids, forward() also expects the input bbox, which are the bounding boxes (i.e. 2D-positions) of the input tokens. These can be obtained using an external OCR engine such as Google’s Tesseract (there’s a Python wrapper available). Each bounding box should be in (x0, y0, x1, y1) format, where (x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1, y1) represents the position of the lower right corner. Note that one first needs to normalize the bounding boxes to be on a 0-1000 scale. To normalize, you can use the following function:

Copied

def normalize_bbox(bbox, width, height):
    return [
        int(1000 * (bbox[0] / width)),
        int(1000 * (bbox[1] / height)),
        int(1000 * (bbox[2] / width)),
        int(1000 * (bbox[3] / height)),
    ]

Here, width and height correspond to the width and height of the original document in which the token occurs. Those can be obtained using the Python Image Library (PIL) library for example, as follows:

Copied

from PIL import Image

# Document can be a png, jpg, etc. PDFs must be converted to images.
image = Image.open(name_of_your_document).convert("RGB")

width, height = image.size

Resources

A list of official BOINC AI and community (indicated by 🌎) resources to help you get started with LayoutLM. If you’re interested in submitting a resource to be included here, please feel free to open a Pull Request and we’ll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

Document Question Answering

A blog post on fine-tuning LayoutLM for document-understanding using Keras & BOINC AI Transformers.
A blog post on how to fine-tune LayoutLM for document-understanding using only BOINC AI Transformers.
A notebook on how to fine-tune LayoutLM on the FUNSD dataset with image embeddings.
See also: Document question answering task guide

Text Classification

A notebook on how to fine-tune LayoutLM for sequence classification on the RVL-CDIP dataset.
Text classification task guide

Token Classification

A notebook on how to fine-tune LayoutLM for token classification on the FUNSD dataset.
Token classification task guide

Other resources

Masked language modeling task guide

🚀 Deploy

A blog post on how to Deploy LayoutLM with BOINC AI Inference Endpoints.

hashtagLayoutLM

hashtagOverview

hashtagResources

hashtagLayoutLMConfig

hashtagclass transformers.LayoutLMConfig

hashtagLayoutLMTokenizer

hashtagclass transformers.LayoutLMTokenizer

hashtagLayoutLMTokenizerFast

hashtagclass transformers.LayoutLMTokenizerFast

hashtagLayoutLMModel

hashtagclass transformers.LayoutLMModel

hashtagLayoutLMForMaskedLM

hashtagclass transformers.LayoutLMForMaskedLM

hashtagLayoutLMForSequenceClassification

hashtagclass transformers.LayoutLMForSequenceClassification

hashtagLayoutLMForTokenClassification

hashtagclass transformers.LayoutLMForTokenClassification

hashtagLayoutLMForQuestionAnswering

hashtagclass transformers.LayoutLMForQuestionAnswering

hashtagTFLayoutLMModel

hashtagclass transformers.TFLayoutLMModel

hashtagTFLayoutLMForMaskedLM

hashtagclass transformers.TFLayoutLMForMaskedLM

hashtagTFLayoutLMForSequenceClassification

hashtagclass transformers.TFLayoutLMForSequenceClassification

hashtagTFLayoutLMForTokenClassification

hashtagclass transformers.TFLayoutLMForTokenClassification

hashtagTFLayoutLMForQuestionAnswering

hashtagclass transformers.TFLayoutLMForQuestionAnswering

LayoutLM

Overview

Resources

LayoutLMConfig

class transformers.LayoutLMConfig

LayoutLMTokenizer

class transformers.LayoutLMTokenizer

LayoutLMTokenizerFast

class transformers.LayoutLMTokenizerFast

LayoutLMModel

class transformers.LayoutLMModel

LayoutLMForMaskedLM

class transformers.LayoutLMForMaskedLM

LayoutLMForSequenceClassification

class transformers.LayoutLMForSequenceClassification

LayoutLMForTokenClassification

class transformers.LayoutLMForTokenClassification

LayoutLMForQuestionAnswering

class transformers.LayoutLMForQuestionAnswering

TFLayoutLMModel

class transformers.TFLayoutLMModel

TFLayoutLMForMaskedLM

class transformers.TFLayoutLMForMaskedLM

TFLayoutLMForSequenceClassification

class transformers.TFLayoutLMForSequenceClassification

TFLayoutLMForTokenClassification

class transformers.TFLayoutLMForTokenClassification

TFLayoutLMForQuestionAnswering

class transformers.TFLayoutLMForQuestionAnswering