Issue 718 - Fix bounding box issue on image redactor #735
Thanks @rakan41!
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
There is a failing e2e test:

image_analyzer_engine = <presidio_image_redactor.image_analyzer_engine.ImageAnalyzerEngine object at 0x7f621a196070>

    def test_given_image_then_text_entities_are_recognized_correctly(image_analyzer_engine):
        # Image with PII entities
        image = get_resource_image("ocr_test.png")
        analyzer_results = image_analyzer_engine.analyze(image)
        assert len(analyzer_results) == 7
        results = __get_expected_ocr_test_image_analysis_results()
        for i in range(7):
>           assert analyzer_results[i] == results[i]
E           assert type: EMAIL_ADDRESS, start: 773, end: 795, score: 1.0 == type: EMAIL_ADDRESS, start: 772, end: 794, score: 1.0

tests/integration/test_image_analyzer_engine_integration.py:12: AssertionError
=========================== short test summary info ============================
FAILED tests/integration/test_image_analyzer_engine_integration.py::test_given_image_then_text_entities_are_recognized_correctly
Hi @rakan41, in addition to unit tests on the analyzer module, there are integration tests which validate the APIs and integrations between modules. These can be found here: https://github.com/microsoft/presidio/tree/main/presidio-image-redactor/tests/integration
Hi @rakan41, can we help with this in any way?
Apologies, it's been a bit crazy lately. Will definitely try to look at this sometime next week!
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Hi @omri374,
Would it be possible to add a test for this?
Hi @omri374, sure, no problem. Could I get some guidance please? Do I merely add a test function in the "test_image_redactor_engine.py" file?
I think a test could be your before and after images (as an integration test). Alternatively, or in addition, a unit test validating the logic could also work. In the current state of the PR there is no test for the code.
Hi @rakan41, maybe this can help?

    @pytest.mark.parametrize(
        "input_image_file, redacted_image_file, fill_color",
        [
            ("ocr_test.png", "ocr_test_redacted_matrix.png", (255, 0, 0)),
            ("img2.png", "img2_redacted.png", (255, 192, 203)),
        ],
    )
    def test_given_image_with_text_and_matrix_fill_then_text_is_colored_out(
        input_image_file, redacted_image_file, fill_color
    ):
        # Image with PII entities
        image = get_resource_image(input_image_file)
        redacted_image = ImageRedactorEngine().redact(image, fill_color)
        expected_result_image = get_resource_image(redacted_image_file)
        # The redacted output must match the stored expected image ...
        assert compare_images(redacted_image, expected_result_image)
        # ... and must differ from the unredacted input
        assert not compare_images(redacted_image, image)

Which extends the test and image resources.
This looks helpful! I'll try to incorporate something like this.
Added an image resource to the testing folder. Also added a unit test which checks that redacting that image matches the pre-redacted image (as seen in the trail above).
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
LGTM!
You have conflicts with your previous change 😄
@SharonHart, I've resolved the conflict. I've included both test functions, as one relates to Issue 718 and the other to Issue 749. Hope it is all good now :)
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Change Description
The image redactor occasionally creates an extra bounding box for the word following the PII text. This occurs when the PII text consists of more than one word and the preceding word is longer than the following word. The cause is that the bounding box position is incremented by the length of the following word rather than the length of the current word.
This PR fixes the issue by ensuring the bounding box position is incremented by the current word's length.
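To make the offset error concrete, here is a minimal sketch of the kind of logic involved. It is not the actual presidio code: the function name `boxed_word_indices`, the `buggy` flag, and the sample sentences are hypothetical, and it assumes the OCR words are joined by single spaces to form the analyzed text.

```python
def boxed_word_indices(words, start, end, buggy=False):
    """Return the indices of the words that would get a bounding box for an
    entity covering characters [start, end) of the space-joined text.

    Hypothetical illustration only; `buggy=True` reproduces the described
    mistake of advancing by the following word's length.
    """
    boxed = []
    pos = 0  # character offset of the word currently being examined
    index = 0
    while index < len(words):
        word = words[index]
        # Does the current word overlap the entity span?
        if max(pos, start) < min(end, pos + len(word)):
            boxed.append(index)
            # Keep boxing following words while the entity extends past
            # the end of the current word.
            while pos + len(word) < end and index + 1 < len(words):
                next_word = words[index + 1]
                # Bug: step by the following word's length.
                # Fix: step by the current word's length.
                step = len(next_word) if buggy else len(word)
                pos += step + 1  # +1 for the separating space
                index += 1
                word = next_word
                boxed.append(index)
            break  # single entity, so stop after handling it
        pos += len(word) + 1
        index += 1
    return boxed


if __name__ == "__main__":
    words = "Jonathan Doe lives here".split()
    # Entity "Jonathan Doe" spans characters [0, 12).
    print(boxed_word_indices(words, 0, 12, buggy=True))   # [0, 1, 2]
    print(boxed_word_indices(words, 0, 12, buggy=False))  # [0, 1]
```

In this sketch the buggy variant also boxes "lives" because the offset falls behind after the long word "Jonathan". With a shorter first word (for example "Al Johnson") both variants agree, which matches the condition described above.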
Unit Test
Original Image
[image]
Redaction before code change
[image]
Redaction after code change
[image]
Issue reference
This PR fixes issue #718
Checklist