[feature] More efficient handling of sparse files

fuse-archive reads archive content with `archive_read_data()`. The gaps (holes) of a sparse file are filled with null bytes, so fuse-archive never learns where they are.
It seems that operations on big sparse files could be made more efficient.

```
% tar xvf test/data/sparse.tar
sparse
% time cp sparse sparse.copy
cp sparse sparse.copy  0.00s user 0.00s system 64% cpu 0.002 total
% time cp sparse sparse.copy
cp sparse sparse.copy  0.00s user 0.00s system 63% cpu 0.001 total
% du sparse.copy
4	sparse.copy

% out/fuse-archive test/data/sparse.tar mnt
fuse-archive: Created mount point 'mnt'
% time cp mnt/sparse sparse.copy
cp mnt/sparse sparse.copy  0.00s user 0.36s system 47% cpu 0.735 total
% time cp mnt/sparse sparse.copy
cp mnt/sparse sparse.copy  0.00s user 0.44s system 52% cpu 0.839 total
% du sparse.copy
1048576	sparse.copy
```

For the file inside the mounted archive, the first invocation is, for some reason, faster than the following ones.
For the plain file on disk, the second invocation is slightly faster, probably thanks to the kernel cache.
With `fuse-archive -o kernel_cache`, the second invocation on the mounted file is faster as well:
```
% out/fuse-archive -o kernel_cache test/data/sparse.tar mnt
fuse-archive: Created mount point 'mnt'
% time cp mnt/sparse sparse.copy
cp mnt/sparse sparse.copy  0.00s user 0.34s system 46% cpu 0.738 total
% time cp mnt/sparse sparse.copy
cp mnt/sparse sparse.copy  0.00s user 0.17s system 34% cpu 0.491 total
```
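
The `du` figures in the first transcript (4 for the copy of the plain file, 1048576 for the copy made from the mount) show the gap between apparent size and allocated size. As a rough, hedged sketch, the same check can be scripted with `os.stat`. The helper names `disk_usage` and `looks_sparse` are made up for illustration, and the 512-byte unit for `st_blocks` is the Linux convention, assumed here.

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    return st.st_size, st.st_blocks * 512


def looks_sparse(path):
    """Heuristic only: fewer bytes allocated than the file's apparent size.

    Filesystems with compression or inline data can also trigger this.
    """
    apparent, allocated = disk_usage(path)
    return allocated < apparent
```

Calling `disk_usage("sparse.copy")` after each `cp` would give both numbers at once, which makes it easy to see whether a copy kept the holes.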


Using `archive_read_data_block()` directly would bring some benefits, such as:

- support for `SEEK_HOLE` and `SEEK_DATA` (through `FUSE_LSEEK`)
- more efficient reads by tools that support sparseness (coreutils, databases, VMs, etc.)
- possibly more efficient sequential reads of big sparse files in general (probably not: the zeros would have to be put in memory by fuse-archive instead of libarchive, but they would be there either way)
- an `st_blocks` value that means something useful
- for some tools, sparse output files as well, which reduces disk usage and stays closer to the original file in the archive
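
To make the first and fourth points concrete, here is a minimal sketch of the bookkeeping involved. It is not fuse-archive code, and it is written in Python only for brevity. It assumes each data block arrives as an `(offset, length)` pair in increasing offset order, which is the kind of information `archive_read_data_block()` reports and `archive_read_data()` hides. All function names are hypothetical.

```python
import errno
import os


def extents_from_blocks(blocks):
    """Merge (offset, length) pairs into non-overlapping (start, end) extents.

    Assumption: blocks are given in increasing offset order.
    """
    extents = []
    for offset, length in blocks:
        if length == 0:
            continue  # an empty block carries no data
        end = offset + length
        if extents and offset <= extents[-1][1]:
            # Touches or overlaps the previous extent: extend it.
            extents[-1] = (extents[-1][0], max(extents[-1][1], end))
        else:
            extents.append((offset, end))
    return extents


def _enxio():
    return OSError(errno.ENXIO, os.strerror(errno.ENXIO))


def seek_data(extents, size, pos):
    """SEEK_DATA: smallest offset >= pos that holds data, else ENXIO."""
    if pos < 0 or pos >= size:
        raise _enxio()
    for start, end in extents:
        if pos < end:
            return max(pos, start)
    raise _enxio()  # only a trailing hole remains


def seek_hole(extents, size, pos):
    """SEEK_HOLE: smallest offset >= pos inside a hole.

    The end of the file counts as an implicit hole.
    """
    if pos < 0 or pos >= size:
        raise _enxio()
    for start, end in extents:
        if pos < start:
            return pos  # already in a hole
        if pos < end:
            pos = end  # inside data: the next hole can start at its end
    return min(pos, size)


def allocated_blocks(extents, unit=512):
    """One plausible st_blocks value: data bytes in 512-byte units, rounded up per extent."""
    return sum((end - start + unit - 1) // unit for start, end in extents)
```

With extents `[(0, 8192), (1048576, 1049088)]` in a 2 MiB file, `seek_hole(..., 0)` yields 8192 and `seek_data(..., 8192)` yields 1048576, so a sparse-aware `cp` could skip the gap instead of reading zeros.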

[feature] More efficient handling of sparse files #41
