It has some interesting standard capabilities that I'm using, and it's used in Duplicati too (sorry, macOS).
Below is random_file2.py, though I'm still not sure what tool to wrap around it. The focus is on producing change (potentially Duplicati-style) with efficiency, predictability, and flexibility: it can make a random file or randomly change an existing one, probably either within the current size or extending it, depending on how the options are set.
```python
#!/usr/bin/env python3
# Usage: random_file2.py path [--block-size=N] [--change-size=N] [--percent=N] [--create=N]
# Needs Python 3.9+ for random.randbytes().
import argparse
import os
import random
import sys

parser = argparse.ArgumentParser()
parser.add_argument('path')
parser.add_argument('--block-size', type=int, default=102400, help='size of each block of this file')
parser.add_argument('--change-size', type=int, default=2, help='size of change at start of a block')
parser.add_argument('--percent', type=int, default=10, help='percent of blocks to get changed')
parser.add_argument('--create', type=int, help='create file with this many blocks', dest='blocks')
args = parser.parse_args()
if not 0 <= args.percent <= 100:
    parser.error('--percent must be between 0 and 100')

# O_BINARY only exists on Windows, where it stops the C runtime's text-mode
# newline translation; elsewhere it is not defined and not needed.
binary = getattr(os, 'O_BINARY', 0)

if args.blocks is not None:
    # New file: set it to an exact multiple of the block size (zero-filled).
    size = args.block_size * args.blocks
    fd = os.open(args.path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | binary)
    os.ftruncate(fd, size)
    blocks = args.blocks
elif os.path.exists(args.path):
    # Existing file: only whole blocks are candidates; a partial tail is skipped.
    fd = os.open(args.path, os.O_WRONLY | binary)
    size = os.path.getsize(args.path)
    blocks = size // args.block_size
else:
    print('Please supply path of an existing file, or use --create to create a new file.', file=sys.stderr)
    sys.exit(1)

# Pick distinct blocks at random and overwrite the first change_size bytes of each.
changes = blocks * args.percent // 100
for index in random.sample(range(blocks), k=changes):
    os.lseek(fd, index * args.block_size, os.SEEK_SET)
    change = random.randbytes(args.change_size)
    os.write(fd, change)
os.close(fd)
```
Efficiency comes from changing only a small amount at the start of each chosen "block" (the script's --block-size, which need not match Duplicati's block size).
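To put rough numbers on that, here is a small sketch mirroring the script's own arithmetic: `blocks * percent // 100` blocks are picked and `change_size` bytes are written into each. The helper name `churn_summary` is hypothetical, and the third value assumes `--block-size` equals Duplicati's block size and `--change-size` is no larger than `--block-size`, so every picked block is exactly one block Duplicati sees as new.

```python
def churn_summary(blocks: int, percent: int, block_size: int, change_size: int) -> tuple[int, int, int]:
    """Hypothetical helper mirroring random_file2.py's arithmetic.

    Returns (changed_blocks, bytes_written, bytes_in_changed_blocks).
    The last value assumes the script's block size equals Duplicati's
    and change_size <= block_size.
    """
    changed = blocks * percent // 100   # same floor division as the script
    written = changed * change_size     # bytes the script actually writes
    volume = changed * block_size       # block data that now differs
    return changed, written, volume


# A 1000-block file (about 100 MB at the 102400 default) with default settings.
print(churn_summary(1000, 10, 102400, 2))  # (100, 200, 10240000)
```

So with the defaults, about 200 bytes of writes turn roughly 10 MB of that file into new block data.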
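A change as small as 1 byte makes a block "different", although changing just 1 byte will eventually run out of unique blocks, since a block whose only change is its first byte has just 256 possible versions.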
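The "run out of unique blocks" point is the birthday problem. If only the first `change_size` bytes of a block ever change, that block can take at most `256 ** change_size` distinct contents. A sketch, assuming each change is an independent uniform draw (as `random.randbytes` gives) and ignoring the block's original bytes, of the chance that `n` changes to the same block produce at least one repeat (a block Duplicati may already have stored):

```python
import math


def repeat_probability(change_size: int, n: int) -> float:
    """Chance that n random changes to one block repeat an earlier version.

    Assumes only the first change_size bytes ever change and each change is
    a uniform, independent draw, so the block has 256**change_size variants.
    """
    variants = 256 ** change_size
    if n > variants:
        return 1.0  # pigeonhole: more changes than possible variants
    # Birthday problem: probability that all n draws are distinct.
    all_distinct = math.prod((variants - i) / variants for i in range(n))
    return 1.0 - all_distinct


for size in (1, 2):
    print(size, repeat_probability(size, 20), repeat_probability(size, 300))
```

A repeat becomes likely after roughly the square root of the variant count, so a couple dozen changes per block at 1 byte versus a few hundred at the 2-byte default.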
Predictability comes from being able to line changes up with Duplicati's blocks, so you know roughly how many blocks a run will change, instead of trying to predict where random bytes will land.
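When `--block-size` differs from Duplicati's block size, which Duplicati blocks a run touches can still be worked out. A sketch, assuming Duplicati splits a file into fixed-size blocks aligned to offset 0 (the hypothetical `duplicati_block_size` parameter stands in for whatever block size the backup uses): each change covers offsets `index * block_size` through `index * block_size + change_size - 1`, and those map to Duplicati blocks by integer division.

```python
def touched_duplicati_blocks(indices, block_size, change_size, duplicati_block_size):
    """Return the set of Duplicati block numbers covered by the script's writes.

    indices: block indices chosen by random.sample in random_file2.py.
    Assumes Duplicati uses fixed-size blocks aligned to offset 0.
    """
    touched = set()
    for index in indices:
        start = index * block_size              # where the script seeks to
        end = start + change_size - 1           # last byte it overwrites
        first = start // duplicati_block_size
        last = end // duplicati_block_size
        touched.update(range(first, last + 1))  # a write can straddle a boundary
    return touched


# 100 KiB script blocks against a hypothetical 1 MiB Duplicati block size:
print(touched_duplicati_blocks([0, 3, 10, 11], 102400, 2, 1048576))  # {0, 1}
```

With equal sizes (and `change_size` no larger than `block_size`), each picked index maps to exactly one Duplicati block; with a smaller script block size, several changes can collapse into one Duplicati block, as in the example.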
Flexibility lets you make a completely random-content file of arbitrary size if you want one for some reason (--create with --percent=100 and --change-size equal to --block-size), and I "think" you can probably also extend an existing file (though only so far, and I haven't tested that).
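On the extension question, the script's own logic suggests the limit. In the existing-file branch only whole blocks are candidates, so the furthest write starts at `(blocks - 1) * block_size` and writes `change_size` bytes; a write that runs past end of file grows the file. A sketch (hypothetical helper `max_size_after_run`) of the largest size one run could leave, assuming at least one block is changed and the last whole block happens to be picked:

```python
def max_size_after_run(size: int, block_size: int, change_size: int) -> int:
    """Largest file size one run of random_file2.py could leave (existing file).

    Mirrors the script: only whole blocks are candidates, and a write of
    change_size bytes at the start of the last whole block can run past EOF.
    Assumes that last block is picked; otherwise the size does not change.
    """
    blocks = size // block_size
    if blocks == 0:
        return size  # nothing is eligible, so the file is untouched
    last_write_end = (blocks - 1) * block_size + change_size
    return max(size, last_write_end)


print(max_size_after_run(250, 100, 2))    # 250: change fits inside the file
print(max_size_after_run(250, 100, 300))  # 400: write spills past EOF
```

So growth only happens when `--change-size` is larger than the space from the last whole block's start to end of file, and each run can add at most a bit under one `--change-size`. Note that `--create` truncates, so it does not extend an existing file.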
An ambitious workload simulator would also add and delete files, with this script making changes inside them.
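A rough sketch of that kind of simulator, purely illustrative: every name here (`change_blocks`, `churn_directory`, `add`, `delete`, `new_blocks`) is hypothetical, not an existing tool. Each call deletes a few files, adds a few new random ones, and then changes every remaining file in place using the same block-sampling idea as random_file2.py. It assumes a flat directory containing only regular files.

```python
import os
import random


def change_blocks(path, block_size=102400, change_size=2, percent=10):
    """Overwrite the first change_size bytes of a random sample of whole blocks."""
    blocks = os.path.getsize(path) // block_size
    with open(path, 'r+b') as f:  # binary read/write, no truncation
        for index in random.sample(range(blocks), k=blocks * percent // 100):
            f.seek(index * block_size)
            f.write(random.randbytes(change_size))


def churn_directory(directory, add=2, delete=1, new_blocks=10, block_size=102400):
    """One hypothetical workload step: delete files, add files, modify the rest.

    Assumes directory is flat and holds only regular files.
    """
    existing = sorted(os.listdir(directory))
    for name in random.sample(existing, k=min(delete, len(existing))):
        os.remove(os.path.join(directory, name))
    for _ in range(add):
        # Random 32-bit name; a collision would just overwrite that file.
        name = f'file_{random.getrandbits(32):08x}.bin'
        with open(os.path.join(directory, name), 'wb') as f:
            f.write(random.randbytes(new_blocks * block_size))
    for name in os.listdir(directory):
        change_blocks(os.path.join(directory, name), block_size)
```

Running one `churn_directory` step before each backup would give a repeatable mix of new, deleted, and modified files.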
I've got a very old, very idle PC I could run tests on, if anybody can figure out how to turn this into a workload.