This repository has been archived by the owner on Jan 15, 2024. It is now read-only.
(not a bug) question about bert create_pretraining_data.tokenize_lines()
#1592
Open
Description
In the function scripts.pretraining.bert.create_pretraining_data.tokenize_lines()
The code snippet:
    for line in lines:
        if not line:
            break
        line = line.strip()
        # Empty lines are used as document delimiters
        if not line:
            results.append([])
        else:
            #<OMITTED FOR BREVITY...>
    return results
suggests that empty or null lines (e.g. "" or None) break out of the for-loop, so that only the lines processed so far are returned, whereas lines that become empty only after stripping (e.g. " ") are used as document delimiters.
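To make the two paths concrete, here is a minimal sketch that reproduces only the control flow of the snippet. The function name `tokenize_lines_sketch` and the whitespace split in the else-branch are hypothetical stand-ins of mine, since the real tokenization code is omitted above; this is not the actual implementation.

```python
def tokenize_lines_sketch(lines):
    """Mimic the control flow of the snippet in this issue.

    Assumption: the else-branch below is a placeholder (whitespace split);
    the real tokenization code is omitted in the snippet.
    """
    results = []
    for line in lines:
        # Falsy before stripping ("" or None): stop processing entirely
        if not line:
            break
        line = line.strip()
        # Empty only after stripping (" ", "\n"): document delimiter
        if not line:
            results.append([])
        else:
            results.append(line.split())  # hypothetical placeholder
    return results


# A whitespace-only line is kept as a delimiter ([]) and the loop continues
print(tokenize_lines_sketch(["a b\n", "\n", "c d\n"]))  # [['a', 'b'], [], ['c', 'd']]

# A truly empty string ends the loop, so "c d" is never processed
print(tokenize_lines_sketch(["a b\n", "", "c d\n"]))  # [['a', 'b']]
```

Blank lines read by iterating over a text file normally still carry their trailing newline, so I would expect them to take the delimiter path rather than the break path.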
Could someone shed light on what the empty-line check followed by the break from the loop is meant to accomplish? Are empty/null lines also used as delimiters?