Problem: Detect duplicate files and folders across AIPs and pipelines #448

Open
@ross-spencer

Please describe the problem you'd like to be solved.

As someone transferring information into Archivematica, I'd like to find duplicate content across AIPs so that I can understand whether the content has already been stored for preservation and access, or whether there is excess redundancy in the direct copies that I am maintaining.

Describe the solution you'd like to see implemented.

I would like a checksum comparison to be available somewhere in the workflow that will allow me to identify duplicates. I can then make decisions based on the information returned.
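To make the request concrete, here is a minimal sketch of a checksum comparison in Python. It assumes the AIPs (or transfers) are available as extracted directories on a local filesystem, and it uses SHA-256 only for illustration. The function names `sha256_of` and `find_duplicates` are hypothetical and are not part of Archivematica.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Group files under the given directories by checksum.

    Hypothetical helper: returns {checksum: [paths]} containing only
    the checksums that were seen for more than one file.
    """
    by_checksum = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                by_checksum[sha256_of(path)].append(str(path))
    return {c: paths for c, paths in by_checksum.items() if len(paths) > 1}
```

Calling `find_duplicates(["/path/to/aip_a", "/path/to/aip_b"])` (example paths only) would return the groups of byte-identical files, which is the kind of information I would then base decisions on.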

Describe alternatives you've considered.

I can detect duplicates before transfer using tools that generate checksums, but it is difficult to maintain state over long periods of time. If I have many AIPs already stored, there is no easy way for me to know whether content identical to what I am transferring is already in storage.
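The "maintaining state" part of this alternative could look something like the sketch below: a small persistent checksum index that is consulted before each transfer. This is an illustration only, using Python's standard `sqlite3` module. The table layout, the `aip_id` values, and the function names `open_index`, `record_file`, and `already_stored` are all assumptions, not anything Archivematica provides.

```python
import sqlite3


def open_index(db_path):
    """Open (or create) a checksum index stored in an SQLite file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "checksum TEXT NOT NULL, aip_id TEXT NOT NULL, path TEXT NOT NULL, "
        "PRIMARY KEY (aip_id, path))"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_checksum ON files (checksum)")
    return conn


def record_file(conn, checksum, aip_id, path):
    """Remember that a file with this checksum is stored in the given AIP."""
    conn.execute(
        "INSERT OR REPLACE INTO files (checksum, aip_id, path) VALUES (?, ?, ?)",
        (checksum, aip_id, path),
    )
    conn.commit()


def already_stored(conn, checksum):
    """Return (aip_id, path) pairs for stored files with this checksum."""
    rows = conn.execute(
        "SELECT aip_id, path FROM files WHERE checksum = ? ORDER BY aip_id, path",
        (checksum,),
    )
    return rows.fetchall()
```

Keeping an index like this up to date by hand for every stored AIP is exactly the burden described above, which is why I would prefer the comparison to be part of the workflow.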


For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

  • All PRs related to this issue are properly linked 👍
  • All PRs related to this issue have been merged 👍
  • Test plan for this issue has been implemented and passed 👍
  • Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍

Labels: IISH (International Institute of Social History)
