Update conf files location (#1358)
omri374 authored Apr 18, 2024
1 parent 41e0202 commit f29e112
Showing 19 changed files with 60 additions and 62 deletions.
2 changes: 1 addition & 1 deletion docker-compose-transformers.yml
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ services:
       context: ./presidio-analyzer
       args:
         - NAME=presidio-analyzer
-        - NLP_CONF_FILE=conf/transformers.yaml
+        - NLP_CONF_FILE=presidio_analyzer/conf/transformers.yaml
       dockerfile: Dockerfile.transformers
     environment:
       - PORT=5001
5 changes: 2 additions & 3 deletions docs/analyzer/adding_recognizers.md
@@ -83,7 +83,7 @@ print(results)

 For pattern based recognizers, it is possible to change the regex flags, either for
 one recognizer or for all.
-For one recognizer, use the `global_regex_flags` parameter
+For one recognizer, use the `global_regex_flags` parameter
 in the `PatternRecognizer` constructor.
 For all recognizers, use the `global_regex_flags` parameter in the `RecognizerRegistry` constructor:

@@ -98,7 +98,6 @@ engine = AnalyzerEngine(registry=registry)
 engine.analyze(...)
 ```

-
 ### Creating a new `EntityRecognizer` in code

 To create a new recognizer via code:
@@ -219,7 +218,7 @@ Additional examples can be found in the [OpenAPI spec](../api-docs/api-docs.html
 ### Reading pattern recognizers from YAML

 Recognizers can be loaded from a YAML file, which allows users to add recognition logic without writing code.
-An example YAML file can be found [here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/conf/example_recognizers.yaml).
+An example YAML file can be found [here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/example_recognizers.yaml).

 Once the YAML file is created, it can be loaded into the `RecognizerRegistry` instance.

4 changes: 2 additions & 2 deletions docs/analyzer/customizing_nlp_models.md
@@ -50,7 +50,7 @@ Configuration can be done in two ways:
 print(results_english)
 ```

-- **Via configuration**: Set up the models which should be used in the [default `conf` file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/conf/default.yaml).
+- **Via configuration**: Set up the models which should be used in the [default `conf` file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml).

 An example Conf file:

@@ -86,7 +86,7 @@ Configuration can be done in two ways:
 - `low_confidence_score_multiplier`: A multiplier to apply to the score of entities with low confidence.
 - `low_score_entity_names`: A list of entity types to apply the low confidence score multiplier to.

-The [default conf file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/conf/default.yaml) is read during the default initialization of the `AnalyzerEngine`. Alternatively, the path to a custom configuration file can be passed to the `NlpEngineProvider`:
+The [default conf file](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml) is read during the default initialization of the `AnalyzerEngine`. Alternatively, the path to a custom configuration file can be passed to the `NlpEngineProvider`:

 ```python
 from presidio_analyzer import AnalyzerEngine, RecognizerRegistry
4 changes: 2 additions & 2 deletions docs/analyzer/languages.md
@@ -70,8 +70,8 @@ Link to LANGUAGES_CONFIG_FILE=[languages-config.yml](https://github.com/microsof

 When packaging the code into a Docker container, NLP models are automatically installed.
 To define which models should be installed,
-update the [conf/default.yaml](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/conf/default.yaml) file. This file is read during
+update the [conf/default.yaml](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/default.yaml) file. This file is read during
 the `docker build` phase and the models defined in it are installed automatically.

-For `transformers` based models, the configuration [can be found here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/conf/transformers.yaml).
+For `transformers` based models, the configuration [can be found here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/transformers.yaml).
 A docker file supporting transformers models [can be found here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/Dockerfile.transformers).
2 changes: 1 addition & 1 deletion docs/tutorial/08_no_code.md
@@ -7,7 +7,7 @@ No-code pattern recognizers can be helpful in two scenarios:

 Regular expression or deny-list based recognizers can be written in a YAML file, and added to the list of recognizers in Presidio.

-An example YAML file can be found [here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/conf/example_recognizers.yaml).
+An example YAML file can be found [here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/example_recognizers.yaml).

 For more information on the schema, see the `PatternRecognizer` definition on the [API Docs](https://microsoft.github.io/presidio/api-docs/api-docs.html#tag/Analyzer)).

1 change: 0 additions & 1 deletion e2e-tests/tests/test_analyzer.py
@@ -1,5 +1,4 @@
 import pytest
-
 from common.assertions import equal_json_strings
 from common.methods import analyze, analyzer_supported_entities

4 changes: 2 additions & 2 deletions presidio-analyzer/Dockerfile
@@ -1,7 +1,7 @@
 FROM python:3.9-slim

 ARG NAME
-ARG NLP_CONF_FILE=conf/default.yaml
+ARG NLP_CONF_FILE=presidio_analyzer/conf/default.yaml
 ENV PIPENV_VENV_IN_PROJECT=1
 ENV PIP_NO_CACHE_DIR=1
 WORKDIR /usr/bin/${NAME}
@@ -16,4 +16,4 @@ RUN pipenv run python install_nlp_models.py --conf_file ${NLP_CONF_FILE}

 COPY . /usr/bin/${NAME}/
 EXPOSE ${PORT}
-CMD pipenv run python app.py --host 0.0.0.0
+CMD pipenv run python app.py --host 0.0.0.0
4 changes: 2 additions & 2 deletions presidio-analyzer/Dockerfile.transformers
@@ -1,7 +1,7 @@
 FROM python:3.9-slim

 ARG NAME
-ARG NLP_CONF_FILE=conf/transformers.yaml
+ARG NLP_CONF_FILE=presidio_analyzer/conf/transformers.yaml
 ENV PIPENV_VENV_IN_PROJECT=1
 ENV PIP_NO_CACHE_DIR=1
 WORKDIR /usr/bin/${NAME}
@@ -19,4 +19,4 @@ RUN pipenv run python install_nlp_models.py --conf_file ${NLP_CONF_FILE}

 COPY . /usr/bin/${NAME}/
 EXPOSE ${PORT}
-CMD pipenv run python app.py --host 0.0.0.0
+CMD pipenv run python app.py --host 0.0.0.0
2 changes: 1 addition & 1 deletion presidio-analyzer/Dockerfile.windows
@@ -1,6 +1,6 @@
 FROM python:3.9-windowsservercore

-ARG NLP_CONF_FILE=conf/default.yaml
+ARG NLP_CONF_FILE=presidio_analyzer/conf/default.yaml
 ENV PIPENV_VENV_IN_PROJECT=1
 ENV PIP_NO_CACHE_DIR=1
 WORKDIR /app
2 changes: 1 addition & 1 deletion presidio-analyzer/install_nlp_models.py
@@ -102,7 +102,7 @@ def _install_transformers_spacy_models(model_name: Dict[str, str]) -> None:
     parser.add_argument(
         "--conf_file",
         required=False,
-        default="conf/default.yaml",
+        default="presidio_analyzer/conf/default.yaml",
         help="Location of nlp configuration yaml file. Default: conf/default.yaml",
     )
     args = parser.parse_args()
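The new `--conf_file` default in this hunk can be tried in isolation. What follows is a minimal sketch rather than the project's script: `build_parser` is a hypothetical helper name, and the updated help text is an assumption about intent, since the help string in the diff still mentions `conf/default.yaml`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same --conf_file option as the hunk above."""
    parser = argparse.ArgumentParser(description="Install NLP models (sketch).")
    parser.add_argument(
        "--conf_file",
        required=False,
        # New default: the conf directory now sits inside the package folder.
        default="presidio_analyzer/conf/default.yaml",
        # Assumption: the help text is aligned with the new default here.
        help="Location of nlp configuration yaml file. "
        "Default: presidio_analyzer/conf/default.yaml",
    )
    return parser


# No flag given: the new default is used.
args_default = build_parser().parse_args([])
# Explicit flag: any other conf file can still be selected.
args_custom = build_parser().parse_args(
    ["--conf_file", "presidio_analyzer/conf/transformers.yaml"]
)
print(args_default.conf_file)
print(args_custom.conf_file)
```

Passing an explicit value keeps working exactly as before, so only callers that relied on the old default are affected by the move.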
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,44 +1,44 @@
-nlp_engine_name: transformers
-models:
-  -
-    lang_code: en
-    model_name:
-      spacy: en_core_web_sm
-      transformers: StanfordAIMI/stanford-deidentifier-base
-
-ner_model_configuration:
-  labels_to_ignore:
-  - O
-  aggregation_strategy: max # "simple", "first", "average", "max"
-  stride: 16 # If stride >= 0, process long texts in
-             # overlapping windows of the model max
-             # length. The value is the length of the
-             # window overlap in transformer tokenizer
-             # tokens, NOT the length of the stride.
-  alignment_mode: expand # "strict", "contract", "expand"
-  model_to_presidio_entity_mapping:
-    PER: PERSON
-    PERSON: PERSON
-    LOC: LOCATION
-    LOCATION: LOCATION
-    GPE: LOCATION
-    ORG: ORGANIZATION
-    ORGANIZATION: ORGANIZATION
-    NORP: NRP
-    AGE: AGE
-    ID: ID
-    EMAIL: EMAIL
-    PATIENT: PERSON
-    STAFF: PERSON
-    HOSP: ORGANIZATION
-    PATORG: ORGANIZATION
-    DATE: DATE_TIME
-    TIME: DATE_TIME
-    PHONE: PHONE_NUMBER
-    HCW: PERSON
-    HOSPITAL: ORGANIZATION
-    FACILITY: LOCATION
-
-  low_confidence_score_multiplier: 0.4
-  low_score_entity_names:
-  - ID
+nlp_engine_name: transformers
+models:
+  -
+    lang_code: en
+    model_name:
+      spacy: en_core_web_sm
+      transformers: StanfordAIMI/stanford-deidentifier-base
+
+ner_model_configuration:
+  labels_to_ignore:
+  - O
+  aggregation_strategy: max # "simple", "first", "average", "max"
+  stride: 16 # If stride >= 0, process long texts in
+             # overlapping windows of the model max
+             # length. The value is the length of the
+             # window overlap in transformer tokenizer
+             # tokens, NOT the length of the stride.
+  alignment_mode: expand # "strict", "contract", "expand"
+  model_to_presidio_entity_mapping:
+    PER: PERSON
+    PERSON: PERSON
+    LOC: LOCATION
+    LOCATION: LOCATION
+    GPE: LOCATION
+    ORG: ORGANIZATION
+    ORGANIZATION: ORGANIZATION
+    NORP: NRP
+    AGE: AGE
+    ID: ID
+    EMAIL: EMAIL
+    PATIENT: PERSON
+    STAFF: PERSON
+    HOSP: ORGANIZATION
+    PATORG: ORGANIZATION
+    DATE: DATE_TIME
+    TIME: DATE_TIME
+    PHONE: PHONE_NUMBER
+    HCW: PERSON
+    HOSPITAL: ORGANIZATION
+    FACILITY: LOCATION
+
+  low_confidence_score_multiplier: 0.4
+  low_score_entity_names:
+  - ID
@@ -135,4 +135,4 @@ def _get_full_conf_path(
         default_conf_file: Union[Path, str] = "default.yaml"
     ) -> Path:
         """Return a Path to the default conf file."""
-        return Path(Path(__file__).parent.parent.parent, "conf", default_conf_file)
+        return Path(Path(__file__).parent.parent, "conf", default_conf_file)
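Dropping one `.parent` is what moves the lookup from the project folder into the package folder. The sketch below illustrates this with a made-up module location: the diff does not show the file name, so `module_file` and its directory layout are assumptions, and `PurePosixPath` is used only so the printed paths look the same on every platform.

```python
from pathlib import PurePosixPath

# Hypothetical location of the module that defines _get_full_conf_path:
# one directory below the package root. This layout is an assumption.
module_file = PurePosixPath(
    "/repo/presidio-analyzer/presidio_analyzer/nlp_engine/provider.py"
)

# Before the commit: three .parent hops land in the project folder.
old_path = PurePosixPath(module_file.parent.parent.parent, "conf", "default.yaml")

# After the commit: two .parent hops land in the package folder.
new_path = PurePosixPath(module_file.parent.parent, "conf", "default.yaml")

print(old_path)
print(new_path)
```

Under that assumed layout the old expression resolves to `/repo/presidio-analyzer/conf/default.yaml` and the new one to `/repo/presidio-analyzer/presidio_analyzer/conf/default.yaml`, which matches the paths used elsewhere in this commit.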
2 changes: 1 addition & 1 deletion presidio-analyzer/presidio_analyzer/recognizer_registry.py
@@ -287,7 +287,7 @@ def add_recognizers_from_yaml(self, yml_path: Union[str, Path]) -> None:
         Read YAML file and load recognizers into the recognizer registry.

         See example yaml file here:
-        https://github.com/microsoft/presidio/blob/main/presidio-analyzer/conf/example_recognizers.yaml
+        https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/example_recognizers.yaml

         :example:
         >>> yaml_file = "recognizers.yaml"
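Scripts or notes that still point at the old location can be updated with a plain prefix replacement, mirroring the link changes in this commit. This is a hypothetical helper for illustration only: `update_conf_reference` and the two constants are not part of Presidio.

```python
# Old and new locations of the conf directory, relative to the repository root.
OLD_PREFIX = "presidio-analyzer/conf/"
NEW_PREFIX = "presidio-analyzer/presidio_analyzer/conf/"


def update_conf_reference(text: str) -> str:
    """Rewrite references to the old conf directory to the new one.

    The new prefix does not contain the old one as a substring, so applying
    the function twice gives the same result as applying it once.
    """
    return text.replace(OLD_PREFIX, NEW_PREFIX)


old_url = (
    "https://github.com/microsoft/presidio/blob/main/"
    "presidio-analyzer/conf/example_recognizers.yaml"
)
new_url = update_conf_reference(old_url)
print(new_url)
```

Because the function is idempotent, it is safe to run over files in which some links were already updated.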
