---
comments: true
---

# Document Image Preprocessing Pipeline Tutorial

## 1. Introduction to Document Image Preprocessing Pipeline

The Document Image Preprocessing Pipeline integrates two key functions: document orientation classification and geometric distortion correction. The document orientation classification module automatically identifies the four possible orientations of a document (0°, 90°, 180°, 270°), ensuring that the document is processed in the correct direction. The text image unwarping model is designed to correct geometric distortions that occur during document photography or scanning, restoring the document's original shape and proportions. This pipeline is suitable for digital document management, preprocessing tasks for OCR, and any scenario requiring improved document image quality. By automating orientation correction and geometric distortion correction, this module significantly enhances the accuracy and efficiency of document processing, providing a more reliable foundation for image analysis. The pipeline also offers flexible service-oriented deployment options, supporting calls from various programming languages on multiple hardware platforms. Additionally, the pipeline supports secondary development, allowing you to fine-tune the models on your own datasets and seamlessly integrate the trained models.

The General Document Image Preprocessing Pipeline includes the following two modules. Each module can be trained and inferred independently and contains multiple models. For detailed information, please click on the corresponding module to view the documentation.

- [Document Image Orientation Classification Module](../module_usage/doc_img_orientation_classification.md) (Optional)
- [Text Image Unwarping Module](../module_usage/text_image_unwarping.md) (Optional)

In this pipeline, you can select the models to use based on the benchmark data provided below.
> The inference time only includes the model inference time and does not include the time for pre- or post-processing.
Document Image Orientation Classification Module (Optional):

| Model | Model Download Links | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| PP-LCNet_x1_0_doc_ori | Inference Model/Training Model | 99.06 | 2.62 / 0.59 | 3.24 / 1.19 | 7 | A document image classification model based on PP-LCNet_x1_0, which includes four categories: 0°, 90°, 180°, and 270°. |

Text Image Unwarping Module (Optional):

| Model | Model Download Link | CER | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | Description |
| --- | --- | --- | --- | --- | --- | --- |
| UVDoc | Inference Model/Training Model | 0.179 | 19.05 / 19.05 | - / 869.82 | 30.3 | A high-precision text image unwarping model. |

Test Environment Description:

| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
| --- | --- | --- | --- |
| Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
| High-Performance Mode | Optimal combination of precision type and acceleration strategy selected in advance | FP32 Precision / 8 Threads | Optimal backend (Paddle/OpenVINO/TRT, etc.) selected in advance |
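
To compare the two modes, it can help to express the benchmark figures above as a speedup factor (normal-mode latency divided by high-performance-mode latency). The following is a minimal sketch using only the numbers quoted above; the helper name `speedup` is illustrative and not part of PaddleOCR.

```python
def speedup(normal_ms: float, high_perf_ms: float) -> float:
    """Return how many times faster high-performance mode is than normal mode."""
    if normal_ms <= 0 or high_perf_ms <= 0:
        raise ValueError("latencies must be positive")
    return normal_ms / high_perf_ms


# (normal mode ms, high-performance mode ms), copied from the benchmark figures above.
benchmarks = {
    "PP-LCNet_x1_0_doc_ori / GPU": (2.62, 0.59),
    "PP-LCNet_x1_0_doc_ori / CPU": (3.24, 1.19),
    "UVDoc / GPU": (19.05, 19.05),
}

for name, (normal_ms, high_perf_ms) in benchmarks.items():
    print(f"{name}: {speedup(normal_ms, high_perf_ms):.2f}x")
```

The UVDoc CPU entry is left out because its normal-mode time is not reported.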

## 2. Quick Start

Before using the General Document Image Preprocessing Pipeline locally, ensure that you have completed the wheel package installation according to the [Installation Guide](../installation.en.md). After installation, you can experience it via the command line or integrate it into Python locally.

Please note: If you encounter issues such as the program becoming unresponsive, unexpected program termination, running out of memory resources, or extremely slow inference during execution, please try adjusting the configuration according to the documentation, such as disabling unnecessary features or using lighter-weight models.

### 2.1 Command Line Experience

You can quickly experience the `doc_preprocessor` pipeline with a single command:

```bash
paddleocr doc_preprocessor -i https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/doc_test_rotated.jpg

# Specify whether to use the document orientation classification model via --use_doc_orientation_classify
paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --use_doc_orientation_classify True

# Specify whether to use the text image unwarping module via --use_doc_unwarping
paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --use_doc_unwarping True

# Specify the use of GPU for model inference via --device
paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --device gpu
```
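
If the command needs to be launched from a script, the argument list can be assembled programmatically. The sketch below only builds the argument list shown above; the helper `build_doc_preprocessor_command` is a hypothetical name, and passing `False` as a flag value is an assumption based on the `True` form used in the examples.

```python
import shlex


def build_doc_preprocessor_command(input_path, **options):
    """Build the argv list for `paddleocr doc_preprocessor`.

    Options whose value is None are omitted so the pipeline default applies.
    """
    argv = ["paddleocr", "doc_preprocessor", "-i", str(input_path)]
    for name, value in options.items():
        if value is None:
            continue
        argv += [f"--{name}", str(value)]
    return argv


argv = build_doc_preprocessor_command(
    "./doc_test_rotated.jpg",
    use_doc_orientation_classify=True,
    use_doc_unwarping=False,  # assumption: the CLI accepts False like True
    device="gpu",
    save_path=None,  # omitted from the command
)
print(shlex.join(argv))
# To execute it (requires paddleocr to be installed):
#   import subprocess; subprocess.run(argv, check=True)
```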

The command line supports additional parameters. Detailed explanations of the command line parameters follow:

| Parameter | Description | Parameter Type | Default Value |
| --- | --- | --- | --- |
| `input` | **Meaning:** The data to be predicted. This parameter is required.<br>**Description:** For example, the local path of an image file or PDF file: `/root/data/img.jpg`; or a URL link, such as the network URL of an image file or PDF file: example; or a local directory, which should contain the images to be predicted, such as the local path: `/root/data/` (currently does not support prediction of PDF files in directories; PDF files need to be specified to a specific file path). | `str` | |
| `save_path` | **Meaning:** Specify the path to save the inference result file.<br>**Description:** If not set, the inference result will not be saved locally. | `str` | |
| `doc_orientation_classify_model_name` | **Meaning:** The name of the document orientation classification model.<br>**Description:** If not set, the pipeline's default model will be used. | `str` | |
| `doc_orientation_classify_model_dir` | **Meaning:** The directory path of the document orientation classification model.<br>**Description:** If not set, the official model will be downloaded. | `str` | |
| `doc_unwarping_model_name` | **Meaning:** The name of the text image unwarping model.<br>**Description:** If not set, the pipeline's default model will be used. | `str` | |
| `doc_unwarping_model_dir` | **Meaning:** The directory path of the text image unwarping model.<br>**Description:** If not set, the official model will be downloaded. | `str` | |
| `use_doc_orientation_classify` | **Meaning:** Whether to load and use the document orientation classification module.<br>**Description:** If not set, the parameter value initialized by the pipeline will be used, which defaults to `True`. | `bool` | |
| `use_doc_unwarping` | **Meaning:** Whether to load and use the text image unwarping module.<br>**Description:** If not set, the parameter value initialized by the pipeline will be used, which defaults to `True`. | `bool` | |
| `device` | **Meaning:** The device used for inference.<br>**Description:** Support for specifying specific card numbers:<br>• CPU: For example, `cpu` indicates using the CPU for inference;<br>• GPU: For example, `gpu:0` indicates using the first GPU for inference;<br>• NPU: For example, `npu:0` indicates using the first NPU for inference;<br>• XPU: For example, `xpu:0` indicates using the first XPU for inference;<br>• MLU: For example, `mlu:0` indicates using the first MLU for inference;<br>• DCU: For example, `dcu:0` indicates using the first DCU for inference;<br>• MetaX GPU: For example, `metax_gpu:0` indicates using the first MetaX GPU for inference;<br>• Iluvatar GPU: For example, `iluvatar_gpu:0` indicates using the first Iluvatar GPU for inference;<br>If not set, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used. | `str` | |
| `enable_hpi` | **Meaning:** Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | **Meaning:** Whether to use the Paddle Inference TensorRT subgraph engine.<br>**Description:** If the model does not support acceleration through TensorRT, setting this flag will not enable acceleration.<br>For Paddle with CUDA version 11.8, the compatible TensorRT version is 8.x (x>=6), and it is recommended to install TensorRT 8.6.1.6. | `bool` | `False` |
| `precision` | **Meaning:** The computational precision, such as `fp32`, `fp16`. | `str` | `fp32` |
| `enable_mkldnn` | **Meaning:** Whether to enable MKL-DNN acceleration for inference.<br>**Description:** If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `mkldnn_cache_capacity` | **Meaning:** MKL-DNN cache capacity. | `int` | `10` |
| `cpu_threads` | **Meaning:** The number of threads used for inference on the CPU. | `int` | `8` |
| `paddlex_config` | **Meaning:** Path to PaddleX pipeline configuration file. | `str` | |
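
The `device` values listed above follow a `type` or `type:card_number` pattern. As an illustration of that string format only, here is a minimal sketch of how such a value could be split and validated before being passed on. The function `parse_device` is hypothetical; PaddleOCR's own parsing may accept forms not covered here.

```python
from typing import Optional, Tuple


def parse_device(device: str) -> Tuple[str, Optional[int]]:
    """Split 'gpu:0' into ('gpu', 0); a bare 'cpu' or 'gpu' yields index None."""
    kind, sep, index = device.strip().partition(":")
    if not kind:
        raise ValueError(f"empty device type in {device!r}")
    if not sep:
        return kind, None
    if not index.isdigit():
        raise ValueError(f"card number must be a non-negative integer: {device!r}")
    return kind, int(index)


for spec in ("cpu", "gpu:0", "npu:0", "metax_gpu:0"):
    print(spec, "->", parse_device(spec))
```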

The running results will be printed to the terminal. The running results of the `doc_preprocessor` pipeline with default configuration are as follows:

```bash
{'res': {'input_path': '/root/.paddlex/predict_input/doc_test_rotated.jpg', 'page_index': None, 'model_settings': {'use_doc_orientation_classify': True, 'use_doc_unwarping': True}, 'angle': 180}}
```

The visualization results are saved under the `save_path`.

### 2.2 Integration via Python Script

The command-line approach is intended for a quick trial and for viewing results. In real projects, the pipeline usually needs to be integrated through code. You can run pipeline inference with just a few lines of code:

```python
from paddleocr import DocPreprocessor

pipeline = DocPreprocessor()
# pipeline = DocPreprocessor(use_doc_orientation_classify=True)  # Specify whether to use the document orientation classification model via use_doc_orientation_classify
# pipeline = DocPreprocessor(use_doc_unwarping=True)  # Specify whether to use the text image unwarping module via use_doc_unwarping
# pipeline = DocPreprocessor(device="gpu")  # Specify whether to use GPU for model inference via device

output = pipeline.predict("./doc_test_rotated.jpg")
for res in output:
    res.print()  # Print the structured output of the prediction
    res.save_to_img("./output/")  # Save the visualization image
    res.save_to_json("./output/")  # Save the result as a JSON file
```

In the above Python script, the following steps are executed:

(1) Instantiate the `doc_preprocessor` pipeline object via `DocPreprocessor()`. The specific parameter descriptions are as follows:

| Parameter | Description | Parameter Type | Default Value |
| --- | --- | --- | --- |
| `doc_orientation_classify_model_name` | **Meaning:** The name of the document orientation classification model.<br>**Description:** If set to `None`, the pipeline's default model will be used. | `str\|None` | `None` |
| `doc_orientation_classify_model_dir` | **Meaning:** The directory path of the document orientation classification model.<br>**Description:** If set to `None`, the official model will be downloaded. | `str\|None` | `None` |
| `doc_unwarping_model_name` | **Meaning:** The name of the text image unwarping model.<br>**Description:** If set to `None`, the pipeline's default model will be used. | `str\|None` | `None` |
| `doc_unwarping_model_dir` | **Meaning:** The directory path of the text image unwarping model.<br>**Description:** If set to `None`, the official model will be downloaded. | `str\|None` | `None` |
| `use_doc_orientation_classify` | **Meaning:** Whether to load and use the document orientation classification module.<br>**Description:** If set to `None`, the parameter value initialized by the pipeline will be used, which defaults to `True`. | `bool\|None` | `None` |
| `use_doc_unwarping` | **Meaning:** Whether to load and use the text image unwarping module.<br>**Description:** If set to `None`, the parameter value initialized by the pipeline will be used, which defaults to `True`. | `bool\|None` | `None` |
| `device` | **Meaning:** The device used for inference.<br>**Description:** Support for specifying specific card numbers:<br>• CPU: For example, `cpu` indicates using the CPU for inference;<br>• GPU: For example, `gpu:0` indicates using the first GPU for inference;<br>• NPU: For example, `npu:0` indicates using the first NPU for inference;<br>• XPU: For example, `xpu:0` indicates using the first XPU for inference;<br>• MLU: For example, `mlu:0` indicates using the first MLU for inference;<br>• DCU: For example, `dcu:0` indicates using the first DCU for inference;<br>• MetaX GPU: For example, `metax_gpu:0` indicates using the first MetaX GPU for inference;<br>• Iluvatar GPU: For example, `iluvatar_gpu:0` indicates using the first Iluvatar GPU for inference;<br>• None: If set to `None`, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used. | `str\|None` | `None` |
| `enable_hpi` | **Meaning:** Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | **Meaning:** Whether to use the Paddle Inference TensorRT subgraph engine.<br>**Description:** If the model does not support acceleration through TensorRT, setting this flag will not enable acceleration.<br>For Paddle with CUDA version 11.8, the compatible TensorRT version is 8.x (x>=6), and it is recommended to install TensorRT 8.6.1.6. | `bool` | `False` |
| `precision` | **Meaning:** The computational precision, such as `fp32`, `fp16`. | `str` | `"fp32"` |
| `enable_mkldnn` | **Meaning:** Whether to enable MKL-DNN acceleration for inference.<br>**Description:** If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `mkldnn_cache_capacity` | **Meaning:** MKL-DNN cache capacity. | `int` | `10` |
| `cpu_threads` | **Meaning:** The number of threads used for inference on the CPU. | `int` | `8` |
| `paddlex_config` | **Meaning:** Path to PaddleX pipeline configuration file. | `str\|None` | `None` |
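
As an illustration of combining several of these constructor parameters, the sketch below prepares a CPU-only configuration that skips unwarping, which is one way to follow the earlier advice about disabling unnecessary features when inference is slow. The helpers `cpu_only_kwargs`, `build_pipeline`, and `main` are hypothetical names; the keyword names come from the parameter descriptions above, and the chosen values are assumptions for the example rather than recommended settings.

```python
def cpu_only_kwargs(cpu_threads: int = 8) -> dict:
    """Constructor arguments for CPU inference, using parameter names documented above."""
    return {
        "device": "cpu",
        "enable_mkldnn": True,
        "cpu_threads": cpu_threads,
        "use_doc_orientation_classify": True,
        "use_doc_unwarping": False,  # assumption: inputs are flat scans, so unwarping is skipped
    }


def build_pipeline(**kwargs):
    """Create the pipeline; paddleocr is imported lazily so this module loads without it."""
    from paddleocr import DocPreprocessor

    return DocPreprocessor(**kwargs)


def main() -> None:
    """Run one prediction; call main() once paddleocr and the sample image are available."""
    pipeline = build_pipeline(**cpu_only_kwargs(cpu_threads=4))
    for res in pipeline.predict("./doc_test_rotated.jpg"):
        res.print()
```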

(2) Call the `predict()` method of the `doc_preprocessor` pipeline object to run inference. This method returns a list of results. The pipeline also provides a `predict_iter()` method, which accepts the same parameters and produces the same results as `predict()`. The difference is that `predict_iter()` returns a generator, so results can be processed one at a time, which suits large datasets or cases where memory use matters. Choose either method according to your needs. The parameters of the `predict()` method are described below:

| Parameter | Description | Parameter Type | Default Value |
| --- | --- | --- | --- |
| `input` | **Meaning:** The data to be predicted, supporting multiple input types. This parameter is required.<br>**Description:**<br>• Python Var: For example, image data represented as `numpy.ndarray`;<br>• str: For example, the local path of an image file or PDF file: `/root/data/img.jpg`; or a URL link, such as the network URL of an image file or PDF file: example; or a local directory, which should contain the images to be predicted, such as the local path: `/root/data/` (currently does not support prediction of PDF files in directories; PDF files need to be specified to a specific file path);<br>• list: The list elements should be of the above types, such as `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var\|str\|list` | |
| `use_doc_orientation_classify` | **Meaning:** Whether to use the document orientation classification module during inference. | `bool\|None` | `None` |
| `use_doc_unwarping` | **Meaning:** Whether to use the text image unwarping module during inference. | `bool\|None` | `None` |
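
The generator-based variant described above can be used to handle results one by one instead of collecting them all first. The following is a minimal sketch, assuming a pipeline object created as in the script above; `process_streaming` is a hypothetical helper name, and the per-call flags shown are example values.

```python
def process_streaming(pipeline, inputs, out_dir="./output/"):
    """Save each result as soon as predict_iter() yields it and return how many were handled.

    `pipeline` is expected to be a DocPreprocessor instance; `inputs` may be a path,
    a directory, or a list of paths, as accepted by predict().
    """
    count = 0
    for res in pipeline.predict_iter(
        inputs,
        use_doc_orientation_classify=True,  # per-call setting, see the parameters above
        use_doc_unwarping=False,
    ):
        res.save_to_json(out_dir)  # each result is written before the next one is computed
        count += 1
    return count
```

Calling `process_streaming(pipeline, ["/root/data/img1.jpg", "/root/data/img2.jpg"])` would then write one JSON file per input without keeping all results in memory at once.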
(3) Process the prediction results. The prediction result for each sample is a corresponding Result object, which supports operations such as printing, saving as an image, and saving as a json file:

| Method | Description | Parameter | Parameter Type | Parameter Description | Default Value |
| --- | --- | --- | --- | --- | --- |
| `print()` | Print the result to the terminal | `format_json` | `bool` | Whether to format the output content using JSON indentation | `True` |
| | | `indent` | `int` | Specify the indentation level to beautify the output JSON data for better readability. Only valid when `format_json` is `True`. | `4` |
| | | `ensure_ascii` | `bool` | Control whether to escape non-ASCII characters to Unicode. When set to `True`, all non-ASCII characters will be escaped; `False` retains the original characters. Only valid when `format_json` is `True`. | `False` |
| `save_to_json()` | Save the result as a JSON file | `save_path` | `str` | The file path for saving. When it is a directory, the saved file name will be consistent with the input file type name. | `None` |
| | | `indent` | `int` | Specify the indentation level to beautify the output JSON data for better readability. Only valid when `format_json` is `True`. | `4` |
| | | `ensure_ascii` | `bool` | Control whether to escape non-ASCII characters to Unicode. When set to `True`, all non-ASCII characters will be escaped; `False` retains the original characters. Only valid when `format_json` is `True`. | `False` |
| `save_to_img()` | Save the result as an image file | `save_path` | `str` | The file path for saving. Supports directory or file paths. | `None` |
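
To see what `indent` and `ensure_ascii` change in practice, the effect can be reproduced with Python's standard `json` module. This sketch assumes the result methods treat these two options the same way as the same-named arguments of `json.dumps`; the sample dictionary is made up for illustration and only borrows field names from the printed output shown earlier.

```python
import json

# Made-up sample with a non-ASCII file name; not actual pipeline output.
result = {"input_path": "./文档.jpg", "angle": 180}

compact = json.dumps(result, ensure_ascii=False)            # single line, no indentation
pretty = json.dumps(result, indent=4, ensure_ascii=False)   # indented, original characters kept
escaped = json.dumps(result, indent=4, ensure_ascii=True)   # indented, non-ASCII escaped as \uXXXX

print(compact)
print(pretty)
print(escaped)
```

Escaping does not change the data: loading the escaped text back with `json.loads` gives the original dictionary.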