Utility to select smaller of two file versions?
Philip Herlihy
2024-08-04 22:00:24 UTC
I'm looking for a utility to select the smaller of two versions of a PDF
document.

I'm helping a friend compress PDFs of music scores to load onto a tablet to
play from. We've found that Corel's PDF Fusion can batch process compression,
usually taking about 40% off the file size. But some files come out bigger,
occasionally much bigger! We'd like to use the original if it's smaller than
the processed version.

I'd thought Robocopy could likely do this, but it can't. (It will only set a
min or max file size to be copied/moved/merged.) I've looked at Xcopy, and I
can't think of a command-line (or other) utility that will do this. CoPilot
will write you a PowerShell script to do it, but not (still) having learned
PowerShell I'd be wary of trusting a generated script.

Can anyone think of a utility that will do this? I'd get PDF Fusion to dump
processed files in a separate folder, then we'd want to compare sizes with the
folder with the originals - unless there's a better way?
--
Phil, London
Andy Burns
2024-08-05 11:38:52 UTC
Post by Philip Herlihy
I'm looking for a utility to select the smaller of two versions of a PDF
document.
You could do it using the %~z modifier to get the file size into a
variable in a CMD batch file.

You could use wsh/jscript/vbscript/filesystemobject but that's going
away, so nowadays I'd probably use powershell.
Philip Herlihy
2024-08-05 12:41:44 UTC
Post by Andy Burns
Post by Philip Herlihy
I'm looking for a utility to select the smaller of two versions of a PDF
document.
You could do it using the %~z modifier to get the file size into a
variable in a CMD batch file.
You could use wsh/jscript/vbscript/filesystemobject but that's going
away, so nowadays I'd probably use powershell.
Clearly, for scripting PowerShell is the way to go. I've just never yet found
time to learn it. I was something of a guru in Unix shell-scripting in my day,
but there's always something pressing getting in the way of learning things I
don't often have a need for these days!
--
Phil, London
Andy Burns
2024-08-05 13:16:52 UTC
Post by Philip Herlihy
Clearly, for scripting PowerShell is the way to go. I've just never yet found
time to learn it.
I can't say that using CoPilot as a buddy-programmer interests me much,
but let it have a go at writing you a script. I suspect that telling it
how the corresponding old and new PDF files will be named might be a
challenge?
Philip Herlihy
2024-08-07 11:35:05 UTC
In article <***@news.eternal-september.org>, Philip
Herlihy wrote...
Post by Philip Herlihy
Clearly, for scripting PowerShell is the way to go. I've just never yet found
time to learn it. I was something of a guru in Unix shell-scripting in my day,
but there's always something pressing getting in the way of learning things I
don't often have a need for these days!
Thanks for all suggestions. Scripting seems to be the way forward. I wouldn't
want to go that way if there was already a command-line utility which could do
this, but that doesn't seem to be an option.
--
Phil, London
GB
2024-08-05 14:57:21 UTC
Post by Philip Herlihy
I'm looking for a utility to select the smaller of two versions of a PDF
document.
ChatGPT came up with this, on the assumption that the filenames come in
pairs, filename_1.pdf and filename_2.pdf.


import os
import shutil

def get_file_size(filepath):
    """Returns the size of the file at filepath."""
    return os.path.getsize(filepath)

def copy_smaller_files(source_dir, dest_dir):
    """Copies the smaller file of each pair from source_dir to dest_dir."""
    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)

    # Get a list of files in the source directory
    files = os.listdir(source_dir)

    # Group the files by their base name without the _1.pdf or _2.pdf suffix
    file_pairs = {}
    for file in files:
        if file.endswith('.pdf'):
            base_name = file.rsplit('_', 1)[0]
            if base_name not in file_pairs:
                file_pairs[base_name] = []
            file_pairs[base_name].append(file)

    # Iterate over the file pairs and copy the smaller file to the
    # destination directory
    for base_name, pair in file_pairs.items():
        if len(pair) == 2:
            file_1 = os.path.join(source_dir, pair[0])
            file_2 = os.path.join(source_dir, pair[1])
            if get_file_size(file_1) <= get_file_size(file_2):
                smaller_file = file_1
            else:
                smaller_file = file_2
            shutil.copy(smaller_file,
                        os.path.join(dest_dir, os.path.basename(smaller_file)))
            print(f"Copied {os.path.basename(smaller_file)} to {dest_dir}")
        else:
            print(f"Warning: Pair for base name {base_name} is "
                  "incomplete or malformed.")

# Example usage
source_directory = r'C:\path\to\source_directory'
destination_directory = r'C:\path\to\destination_directory'
copy_smaller_files(source_directory, destination_directory)
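
The original question describes a slightly different layout: originals in one
folder and the processed copies in a separate folder, rather than _1/_2 pairs
side by side. A minimal sketch of that variant follows. It assumes each
processed file keeps exactly the same filename as its original (the thread
doesn't say how PDF Fusion names its output, so that is a guess). The function
name keep_smaller and the example paths are made up for illustration. It also
treats a zero-byte processed file as a failed conversion and keeps the
original, which is a design choice rather than anything from the thread.

```python
import shutil
from pathlib import Path


def keep_smaller(originals_dir, processed_dir, output_dir):
    """Copy the smaller version of each PDF into output_dir.

    Files are matched by identical filename in the two input folders
    (an assumption). Returns a dict mapping each filename to
    "original" or "processed", whichever was copied.
    """
    originals = Path(originals_dir)
    processed = Path(processed_dir)
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)

    chosen = {}
    for original in sorted(originals.iterdir()):
        # Skip sub-folders and anything that is not a PDF.
        if not original.is_file() or original.suffix.lower() != ".pdf":
            continue

        candidate = processed / original.name
        # Use the processed copy only if it exists, is not empty,
        # and is strictly smaller than the original.
        if (candidate.is_file()
                and 0 < candidate.stat().st_size < original.stat().st_size):
            winner, label = candidate, "processed"
        else:
            winner, label = original, "original"

        # copy2 also preserves timestamps; neither input folder is modified.
        shutil.copy2(winner, output / original.name)
        chosen[original.name] = label
    return chosen


# Example usage (hypothetical paths):
# result = keep_smaller(r'C:\scores\originals',
#                       r'C:\scores\processed',
#                       r'C:\scores\for_tablet')
# for name, which in result.items():
#     print(f"{name}: kept {which}")
```

Because it only copies into a third folder, it can be run, checked by eye, and
re-run without risking either set of source files.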
David
2024-08-05 19:46:28 UTC
Post by Philip Herlihy
I'm looking for a utility to select the smaller of two versions of a PDF
document.
Never a Perl Monk around when you need them. ;-)

Cheers



Dave R
--
AMD FX-6300 in GA-990X-Gaming SLI-CF running Windows 10 x64