feat: Improve the parsing accuracy of wired tables #3242
base: dev
Pull Request Overview
This PR introduces a new table recognition architecture that improves parsing accuracy for wired tables. The changes implement a table processing pipeline that classifies the table type and corrects image orientation before structure recognition.
Key changes include:
- Addition of a new UNet-based table structure recognition model for wired tables
- Implementation of table classification to distinguish between wired and wireless tables
- Integration of image orientation correction for better table analysis
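The flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the callable names (`correct_orientation`, `classify_table`), the label strings `"wired"` and `"wireless"`, and the order of the steps are all assumptions made for the example.

```python
from typing import Callable, Dict


def parse_table(
    image,
    correct_orientation: Callable,
    classify_table: Callable,
    recognizers: Dict[str, Callable],
):
    """Route a table image to a recognizer based on its predicted type.

    Every callable here is a hypothetical stand-in supplied by the caller.
    """
    upright = correct_orientation(image)   # assumed to run first
    table_type = classify_table(upright)   # assumed labels: "wired" / "wireless"
    recognizer = recognizers.get(table_type)
    if recognizer is None:
        raise ValueError(f"no recognizer registered for table type {table_type!r}")
    return recognizer(upright)


# Usage with trivial stand-ins in place of real models.
result = parse_table(
    "rotated-image",
    correct_orientation=lambda img: img.replace("rotated-", ""),
    classify_table=lambda img: "wired",
    recognizers={
        "wired": lambda img: f"wired:{img}",
        "wireless": lambda img: f"wireless:{img}",
    },
)
```

Passing the recognizers in as a dictionary keeps the routing logic independent of any particular model, which makes it easy to test with stubs.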
Reviewed Changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
pyproject.toml | Adds scikit-image dependency for image processing |
mineru/utils/enum_class.py | Adds model paths for new table recognition components |
mineru/model/table/rec/unet_table/ | New UNet-based table recognition implementation with utilities |
mineru/model/table/cls/paddle_table_cls.py | Table classification model to distinguish wired/wireless tables |
mineru/model/ori_cls/paddle_ori_cls.py | Image orientation classification and correction |
mineru/backend/pipeline/ | Updates pipeline to integrate new table processing workflow |
Comments suppressed due to low confidence (1)
mineru/model/table/rec/unet_table/wired_table_rec_utils.py:122
- The type annotation indicates the method returns `np.ndarray`, but line 80 shows it can raise `ONNXRuntimeError`. The return type should be `Union[np.ndarray, None]` or the exception should be documented in the docstring.
if not isinstance(img, InputType.__args__):
# Edge detection
edges = cv2.Canny(gray, 100, 250, apertureSize=3)
# Hough transform, adapted from https://blog.csdn.net/feilong_csdn/article/details/81586322
lines = cv2.HoughLines(edges, 1, np.pi / 180, 0)
The code assumes `lines[0]` exists on line 406, but `cv2.HoughLines` can return `None` when no lines are detected. Indexing `None` raises a `TypeError`.
Suggested change:
- lines = cv2.HoughLines(edges, 1, np.pi / 180, 0)
+ lines = cv2.HoughLines(edges, 1, np.pi / 180, 0)
+ if lines is None:
+     return img
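As a minimal sketch of the guard discussed here, the helper below takes the value that `cv2.HoughLines` would return and handles the no-detection case. The helper name `first_line` is hypothetical, and OpenCV itself is not imported so the example runs anywhere; plain Python values stand in for the detector output.

```python
from typing import Optional, Sequence


def first_line(lines: Optional[Sequence]) -> Optional[object]:
    """Return the first detected line, or None when nothing was detected.

    `lines` stands in for the result of cv2.HoughLines, which is None
    (rather than an empty array) when no lines are found.
    """
    if lines is None or len(lines) == 0:
        return None
    return lines[0]


# Without a guard, indexing the None result fails with TypeError.
no_lines = None
try:
    no_lines[0]
except TypeError as exc:
    error_name = type(exc).__name__
```

Callers can then decide what a missing line means for them, for example returning the input image unchanged as the suggestion above does.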
return {"img": images}

def infer(self, input):
    result = self.session(input["img"][None, ...])[0][0]
The input is already expanded with `[None, ...]`, but the session call adds another `[None, ...]`, creating a double expansion that will cause shape mismatch errors.
Suggested change:
- result = self.session(input["img"][None, ...])[0][0]
+ result = self.session(input["img"])[0][0]
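The shape effect of a repeated `[None, ...]` can be checked directly with NumPy. This is only an illustration: the 3 x 32 x 32 image shape and the `ensure_batched` helper are assumptions, and the excerpt does not show whether the PR's preprocessing step actually adds the batch axis.

```python
import numpy as np

chw = np.zeros((3, 32, 32), dtype=np.float32)  # hypothetical C x H x W image

once = chw[None, ...]    # adds a batch axis
twice = once[None, ...]  # adds a second, unwanted leading axis


def ensure_batched(img: np.ndarray) -> np.ndarray:
    """Add a batch axis only when the array is still 3-D."""
    if img.ndim == 3:
        return img[None, ...]
    if img.ndim == 4:
        return img
    raise ValueError(f"expected a 3-D or 4-D array, got shape {img.shape}")
```

A helper like this makes the expansion idempotent, so it does not matter whether preprocessing or inference adds the batch axis.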
:param ocr_list: [[[xmin,ymin,xmax,ymax], text]]
:return:
"""
threshold = 10
The hardcoded threshold value of 10 overrides the function parameter `threshold`. This makes the parameter unused and the behavior inconsistent with the function signature.
Suggested change:
- threshold = 10
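To make the effect concrete, here is a small sketch of a parameter being shadowed by a hardcoded assignment. The function names and the row-grouping logic are hypothetical and are not taken from the PR; they only illustrate why the caller's argument stops mattering.

```python
def _group(ys, threshold):
    """Group sorted y-coordinates whose gap to the previous value is <= threshold."""
    rows = []
    for y in sorted(ys):
        if rows and y - rows[-1][-1] <= threshold:
            rows[-1].append(y)
        else:
            rows.append([y])
    return rows


def group_rows_shadowed(ys, threshold=5):
    threshold = 10  # hardcoded reassignment: the caller's value is discarded
    return _group(ys, threshold)


def group_rows(ys, threshold=10):
    # The desired default lives in the signature, so callers can override it.
    return _group(ys, threshold)


ys = [0, 8, 30]
shadowed = group_rows_shadowed(ys, threshold=5)  # behaves as if threshold were 10
respected = group_rows(ys, threshold=5)          # honors the caller's threshold
```

If 10 is the intended value, moving it into the signature as the default keeps the current behavior while making the parameter effective again.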
_boxes, indices = zip(*sorted_boxes_with_idx)
indices = list(indices)
_boxes = [dt_boxes[i] for i in indices]
threshold = 20
Similar to line 305, this hardcoded threshold value of 20 overrides the function parameter, making the parameter ineffective.
Suggested change:
- threshold = 20
@@ -133,7 +134,7 @@ def __call__(self, images_with_extra_info: list) -> list:

  # Get the OCR model
  ocr_model = atom_model_manager.get_atom_model(
-     atom_model_name='ocr',
+     atom_model_name=AtomicModel.OCR,
Inconsistent usage of string literals vs AtomicModel constants. Line 325 uses string literal 'ocr' while this line uses the constant. This should be consistent throughout the file.
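A short sketch of why a shared constant is preferable to repeated literals follows. The `ModelName` class, the registry, and `get_model` are hypothetical stand-ins; the actual definition of `AtomicModel` is not shown in this excerpt.

```python
class ModelName:
    """Hypothetical constants class standing in for the PR's AtomicModel."""
    OCR = "ocr"
    TABLE = "table"


# Hypothetical registry keyed by the constants above.
_registry = {ModelName.OCR: "ocr-model", ModelName.TABLE: "table-model"}


def get_model(name: str) -> str:
    """Look up a model by name; raises KeyError for unknown names."""
    return _registry[name]


via_constant = get_model(ModelName.OCR)
via_literal = get_model("ocr")  # works today, but the spelling is duplicated

# A misspelled literal only fails at lookup time.
try:
    get_model("orc")
except KeyError:
    typo_detected_late = True
```

With the constant, a misspelling such as `ModelName.ORC` fails with `AttributeError` at the point of use and is visible to linters and IDEs, whereas a misspelled literal passes unnoticed until the lookup runs.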
Thanks for your contribution; we appreciate it a lot. Following the instructions below will keep your pull request healthy and help it get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
Checklist
Before PR:
After PR: