Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 718 - Fix bounding box issue on image redactor #735

Merged
merged 8 commits into from
Aug 13, 2021

Conversation

rakan41
Copy link
Contributor

@rakan41 rakan41 commented Jun 17, 2021

Change Description

Image redactor occasionally creates an extra bounding box for the word following the PII text. This occurs when the PII text consists of more than one word and the preceding word is longer than the following word. This is caused by the bounding box location incrementing by the length of the following word, rather than length of the current.

This fixes the above issue by ensuring the bounding box location increments by the current word's length.

Unit Test

Original Image
image
Redaction before code change
image
Redaction after code change
image

Issue reference

This PR fixes issue #718

Checklist

  • [ X] I have reviewed the contribution guidelines
  • [ X] I have signed the CLA
  • [ X] My code includes unit tests
  • [ X] All unit tests and lint checks pass locally
  • [ X] My PR contains documentation updates / additions if required

@omri374
Copy link
Contributor

omri374 commented Jun 17, 2021

Thanks @rakan41!

@navalev
Copy link
Contributor

navalev commented Jun 20, 2021

/azp run

1 similar comment
@omri374
Copy link
Contributor

omri374 commented Jun 20, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@omri374
Copy link
Contributor

omri374 commented Jun 20, 2021

There is a failing e2e test:

image_analyzer_engine = <presidio_image_redactor.image_analyzer_engine.ImageAnalyzerEngine object at 0x7f621a196070>

    def test_given_image_then_text_entities_are_recognized_correctly(image_analyzer_engine):
        # Image with PII entities
        image = get_resource_image("ocr_test.png")
        analyzer_results = image_analyzer_engine.analyze(image)
        assert len(analyzer_results) == 7
        results = __get_expected_ocr_test_image_analysis_results()
        for i in range(7):
>           assert analyzer_results[i] == results[i]
E           assert type: EMAIL_ADDRESS, start: 773, end: 795, score: 1.0 == type: EMAIL_ADDRESS, start: 772, end: 794, score: 1.0

tests/integration/test_image_analyzer_engine_integration.py:12: AssertionError
=========================== short test summary info ============================
FAILED tests/integration/test_image_analyzer_engine_integration.py::test_given_image_then_text_entities_are_recognized_correctly

@omri374
Copy link
Contributor

omri374 commented Jun 29, 2021

Hi @rakan41, in addition to unit tests on the analyzer module, there are integration tests which validate the APIs and integrations between modules. These can be found here: https://github.com/microsoft/presidio/tree/main/presidio-image-redactor/tests/integration
Please reach out here or through [email protected] if you need any help setting up these tests. Thanks!

@omri374
Copy link
Contributor

omri374 commented Jul 14, 2021

Hi @rakan41, can we help with this in any way?

@rakan41
Copy link
Contributor Author

rakan41 commented Jul 14, 2021

Hi @rakan41, can we help with this in any way?

Apologies, it's been a bit crazy lately. Will definitely try to look at this sometime next week!

@omri374
Copy link
Contributor

omri374 commented Jul 27, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rakan41
Copy link
Contributor Author

rakan41 commented Jul 27, 2021

Hi @omri374 ,
It looks like it now passes the AZP checks. Please let me know if there are any actions on my part before approving PR.
Cheers,

@omri374
Copy link
Contributor

omri374 commented Jul 28, 2021

Would it be possible to add a test for this?

@rakan41
Copy link
Contributor Author

rakan41 commented Jul 28, 2021

Would it be possible to add a test for this?

Hi @omri374 , sure no problem. Could I get some guidance please? Do I merely add a test function in the "test_image_redactor_engine.py" file?
If so, would a test based the unit test I made above be sufficient?
Thanks,

@omri374
Copy link
Contributor

omri374 commented Jul 28, 2021

I think a test could be your before and after images (as an integration test). Alternatively/in addition a unit test validating the logic could also work. In the current state of the PR there is no test for the code.

@omri374 omri374 closed this Jul 28, 2021
@omri374 omri374 reopened this Jul 28, 2021
@omri374
Copy link
Contributor

omri374 commented Aug 5, 2021

Hi @rakan41, maybe this can help?

@pytest.mark.parametrize(
    "input_image_file, redacted_image_file ,fill_color",
    [
        ("ocr_test.png", "ocr_test_redacted_matrix.png", (255, 0, 0)),
        ("img2.png", "img2_redacted.png", (255, 192, 203)),
    ],
)
def test_given_image_with_text_and_matrix_fill_then_text_is_colored_out(
    input_image_file, redacted_image_file, fill_color
):
    # Image with PII entities
    image = get_resource_image(input_image_file)

    redacted_image = ImageRedactorEngine().redact(image, fill_color)
    expected_result_image = get_resource_image(redacted_image_file)
    assert compare_images(redacted_image, expected_result_image)
    assert not compare_images(redacted_image, image)

Which extends the test test_given_image_with_text_and_matrix_fill_then_text_is_colored_out to be parameterized and also check for the use case in this PR. The files img2.png and img2_redacted.png are the example files you added to this PR.

@rakan41
Copy link
Contributor Author

rakan41 commented Aug 5, 2021

Hi @rakan41, maybe this can help?

@pytest.mark.parametrize(
    "input_image_file, redacted_image_file ,fill_color",
    [
        ("ocr_test.png", "ocr_test_redacted_matrix.png", (255, 0, 0)),
        ("img2.png", "img2_redacted.png", (255, 192, 203)),
    ],
)
def test_given_image_with_text_and_matrix_fill_then_text_is_colored_out(
    input_image_file, redacted_image_file, fill_color
):
    # Image with PII entities
    image = get_resource_image(input_image_file)

    redacted_image = ImageRedactorEngine().redact(image, fill_color)
    expected_result_image = get_resource_image(redacted_image_file)
    assert compare_images(redacted_image, expected_result_image)
    assert not compare_images(redacted_image, image)

Which extends the test test_given_image_with_text_and_matrix_fill_then_text_is_colored_out to be parameterized and also check for the use case in this PR. The files img2.png and img2_redacted.png are the example files you added to this PR.

This looks helpful! I'll try incorporate something like this.

and image resources.
@rakan41
Copy link
Contributor Author

rakan41 commented Aug 6, 2021

Added image resource to testing folder. Also added unit which checks that redacting that image matches the pre-redacted image (as seen in trail above)

@omri374
Copy link
Contributor

omri374 commented Aug 6, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

omri374
omri374 previously approved these changes Aug 11, 2021
@omri374 omri374 requested a review from navalev August 11, 2021 18:41
@omri374 omri374 requested a review from SharonHart August 11, 2021 18:41
SharonHart
SharonHart previously approved these changes Aug 11, 2021
Copy link
Contributor

@SharonHart SharonHart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
You have conflicts with your previous change 😄

@rakan41 rakan41 dismissed stale reviews from SharonHart and omri374 via a352de5 August 11, 2021 23:33
@rakan41
Copy link
Contributor Author

rakan41 commented Aug 11, 2021

LGTM!
You have conflicts with your previous change 😄

@SharonHart , I've resolved conflict. I've included both test functions, as one relates to Issue 718 and the other to Issue 749. Hope it is all good now :)

@SharonHart
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@omri374 omri374 merged commit b812be8 into microsoft:main Aug 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants