---
comments: true
---

# Chart Parsing Module Tutorial

## 1. Overview

Multimodal chart parsing is an OCR technology that automatically converts visual charts (bar charts, line charts, pie charts, and so on) into structured data tables with formatted output. Traditional methods rely on complex pipelines built around chart keypoint detection models, which embed many prior assumptions and tend to lack robustness. The models in this module instead use VLM (Vision-Language Model) techniques and are data-driven, learning robust features from large real-world datasets. Application scenarios include financial analysis, academic research, and business reporting: for instance, extracting growth trend data from financial reports, experimental comparison figures from research papers, or user distribution statistics from market surveys. The goal is to help users move from "viewing charts" to "using data".

## 2. Supported Model List
| Model | Download Link | Model Size (B) | Storage Size (GB) | Score | Description |
|---|---|---|---|---|---|
| PP-Chart2Table | Inference Model | 0.58 | 1.4 | 80.60 | PP-Chart2Table is a multimodal chart parsing model developed by the PaddlePaddle team. It demonstrates exceptional performance on both Chinese and English chart parsing tasks. The team designed a specialized “Shuffled Chart Data Retrieval” training task and adopted a carefully designed token masking strategy, significantly improving performance on chart-to-table conversion. Additionally, the team enhanced the model with a high-quality data synthesis process using seed data, RAG, and LLM persona-driven generation to diversify training data. To handle large amounts of out-of-distribution (OOD) unlabeled data, a two-stage large model distillation process was used to ensure excellent adaptability and generalization to diverse real-world data. In internal Chinese-English use case evaluations, PP-Chart2Table achieved state-of-the-art performance among models of similar size and reached accuracy comparable to 7B-parameter VLMs in key scenarios. |
The result fields are:

- `image`: The path to the input image
- `result`: The model's prediction output

Parameters for instantiating `ChartParsing`:
| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Meaning: Model name. Description: If set to `None`, defaults to `PP-Chart2Table`. | `str \| None` | `None` |
| `model_dir` | Meaning: Model storage path. | `str \| None` | `None` |
| `device` | Meaning: Inference device. Description: Examples: `"cpu"`, `"gpu"`, `"npu"`, `"gpu:0"`. Defaults to GPU 0 if available; otherwise falls back to CPU. | `str \| None` | `None` |
Call the `predict()` method for inference; it returns a list of results. The module also offers a `predict_iter()` method, which takes the same inputs and produces the same outputs but returns a generator, making it better suited to large datasets or memory-sensitive scenarios. Choose whichever fits your needs.
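The difference between the list-returning and generator-returning styles can be illustrated with a small stand-alone sketch. The functions `predict_all`, `predict_lazy`, and `fake_infer` below are hypothetical stand-ins, not the module's implementation. They only show why a generator keeps memory bounded: each batch is processed when the caller asks for the next result.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def predict_lazy(items: Iterable[dict], infer: Callable, batch_size: int = 1) -> Iterator[dict]:
    """Generator style: results are produced on demand, one batch at a time."""
    for batch in batched(items, batch_size):
        for result in infer(batch):
            yield result


def predict_all(items: Iterable[dict], infer: Callable, batch_size: int = 1) -> List[dict]:
    """List style: every result is computed and held in memory before returning."""
    return list(predict_lazy(items, infer, batch_size))


def fake_infer(batch: List[dict]) -> List[dict]:
    # Hypothetical stand-in for a model call: echoes the image path back.
    return [{"image": sample["image"], "result": "<table text>"} for sample in batch]


samples = [{"image": f"chart_{i}.png"} for i in range(5)]
eager = predict_all(samples, fake_infer, batch_size=2)  # all 5 results at once
lazy = predict_lazy(samples, fake_infer, batch_size=2)  # nothing computed yet
first = next(lazy)  # only the first batch has been run at this point
```

With the list style, all five results exist in memory as soon as the call returns. With the generator style, work proceeds only as far as the caller has iterated.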
`predict()` method parameters:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Meaning: Input data (required). Description: Input formats vary by model. | `dict` | N/A |
| `batch_size` | Meaning: Batch size. Description: Any positive integer. | `int` | `1` |
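Putting the instantiation and `predict()` parameters together, a typical call might look like the sketch below. It assumes the class is importable as `from paddleocr import ChartParsing` (the import path is not stated in this section), that the `input` dict uses an `image` key as the result fields suggest, and that `chart.png` is a hypothetical local file. The import is deferred into the function so the snippet can be loaded even where the library is not installed.

```python
from typing import Any, Dict, List, Optional


def build_input(image_path: str) -> Dict[str, str]:
    """Build the `input` dict; the `image` key is an assumption, not confirmed here."""
    return {"image": image_path}


def parse_charts(
    image_paths: List[str],
    device: Optional[str] = None,
    batch_size: int = 1,
) -> List[Any]:
    """Run chart parsing over several images and collect the result objects."""
    # Assumed import path; adjust it to match your installation.
    from paddleocr import ChartParsing

    # model_name=None falls back to PP-Chart2Table;
    # device=None selects GPU 0 if available, otherwise CPU.
    model = ChartParsing(model_name=None, device=device)

    results: List[Any] = []
    for path in image_paths:
        # predict() returns a list of result objects for the given input
        results.extend(model.predict(input=build_input(path), batch_size=batch_size))
    return results


# Usage (requires the library and a real image file):
#   for res in parse_charts(["chart.png"], device="cpu"):
#       res.print()
#       res.save_to_json(save_path="./output/")
```

Passing `device="cpu"` explicitly is useful when reproducing results on a machine without a GPU; leaving it as `None` lets the module pick the device.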
Prediction results are returned as result objects, one per sample, with support for printing and saving to JSON:
| Method | Description | Parameter | Type | Explanation | Default |
|---|---|---|---|---|---|
| `print()` | Print results to terminal | `format_json` | `bool` | Format output using JSON indentation | `True` |
| | | `indent` | `int` | Indentation level for pretty-printed JSON. Only takes effect when `format_json=True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters to Unicode. If `False`, keeps characters as-is. | `False` |
| `save_to_json()` | Save results to JSON file | `save_path` | `str` | File path to save to. If a directory is given, the file is named after the input. | N/A |
| | | `indent` | `int` | Same as in `print()` | `4` |
| | | `ensure_ascii` | `bool` | Same as in `print()` | `False` |

Results also expose the following property:

| Property | Description |
|---|---|
| `json` | Returns the result in JSON format |
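The `indent` and `ensure_ascii` options appear to mirror the identically named arguments of Python's standard `json.dumps`. The sketch below uses a hypothetical result dictionary (the real structure and the text format of `result` may differ) purely to show how these options change the serialized text.

```python
import json

# Hypothetical result payload; the keys mirror the `image` and `result` fields,
# and the pipe-delimited table text is an assumed format for illustration only.
sample = {"image": "chart.png", "result": "年份 | 营收\n2018 | 104.22"}

# Readable output: indented, non-ASCII characters kept as-is
pretty = json.dumps(sample, indent=4, ensure_ascii=False)

# Same layout, but non-ASCII characters become \uXXXX escapes
escaped = json.dumps(sample, indent=4, ensure_ascii=True)

# No indent: the whole object is serialized on a single line
compact = json.dumps(sample, ensure_ascii=False)

print(pretty)
```

Both `pretty` and `escaped` decode back to the same dictionary; `ensure_ascii` only affects how the text is written, which matters mainly when charts contain Chinese labels and the JSON is read by people or by tools that do not handle UTF-8.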