DataForge

A simple tool for automating routine work with datasets. You can set a time delay so that a command is executed automatically.

If you don't want a command to run in a cycle, just omit the "-r" argument and it will be executed only once.
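
The difference between a one-off run and a repeated run can be pictured as a small loop. The following is only an illustrative sketch, not DataForge's actual code: `run_task`, `task`, `repeat`, `delay_seconds`, `max_runs` and `sleep` are all hypothetical names, and `max_runs` and `sleep` exist only so the sketch can be tried without waiting forever.

```python
import time
from typing import Callable, Optional


def run_task(
    task: Callable[[], None],
    repeat: bool = False,
    delay_seconds: float = 60.0,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `task` once, or again and again with a pause between runs.

    Hypothetical helper: `repeat` plays the role of the "-r" argument,
    `delay_seconds` the role of the time delay. Returns the number of runs.
    """
    runs = 0
    while True:
        task()
        runs += 1
        if not repeat:
            break  # no "-r": execute once and stop
        if max_runs is not None and runs >= max_runs:
            break  # safety limit used only by this sketch
        sleep(delay_seconds)  # wait before the next cycle
    return runs
```

Calling `run_task(job)` runs `job` a single time, while `run_task(job, repeat=True, delay_seconds=5)` keeps running it every five seconds until the process is stopped.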

Available commands

  • move - move files from the source directory to the target directory
  • slice - slice video files from the source directory into images in the target directory. You can also set the "--remove" or "-rm" flag to delete the source video file after slicing
  • delete - delete files that match the given patterns from the source directory
  • dedup - find duplicate images in the source directory that match a pattern. An image counts as a duplicate if the Hamming distance between its hash and the hash of the image it is compared with is lower than the threshold value. The threshold is given as a percentage and must be in the range [0, 100]. Pay attention to the core_size parameter: a lower value makes fine details in the image less important, and a higher value makes them more important when images are compared. Only the dHash comparison method is implemented for now.
  • clean-annotations - find annotation files in a directory that don't have corresponding files
  • convert-annotations - convert annotations from the source format to the destination format
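
The dedup comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code DataForge runs: it assumes the image has already been converted to grayscale and resized to core_size rows by core_size + 1 columns (with any image library), and it assumes the percentage threshold is compared against the share of differing hash bits. The function names `dhash_bits`, `hamming_distance` and `is_duplicate` are hypothetical.

```python
from typing import List, Sequence


def dhash_bits(gray: Sequence[Sequence[int]]) -> List[int]:
    """Difference hash of a small grayscale grid.

    Each bit is 1 when a pixel is brighter than its right-hand neighbour.
    A grid of core_size rows by core_size + 1 columns gives core_size ** 2 bits.
    """
    return [
        1 if row[x] > row[x + 1] else 0
        for row in gray
        for x in range(len(row) - 1)
    ]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_duplicate(a: Sequence[int], b: Sequence[int], threshold_percent: float) -> bool:
    """Assumption: the threshold is the allowed share of differing bits, in percent."""
    if not 0 <= threshold_percent <= 100:
        raise ValueError("threshold must be in the range [0, 100]")
    distance_percent = 100.0 * hamming_distance(a, b) / len(a)
    return distance_percent < threshold_percent


# Tiny grids standing in for resized images (core_size = 2 here).
first = [[10, 20, 30], [30, 20, 10]]
second = [[12, 22, 29], [31, 19, 11]]   # same brightness pattern as `first`
third = [[30, 20, 10], [30, 20, 10]]    # top row reversed

print(is_duplicate(dhash_bits(first), dhash_bits(second), 10))  # True
print(is_duplicate(dhash_bits(first), dhash_bits(third), 10))   # False
```

A larger grid produces more bits, so small local differences between two images have more chances to flip a bit, which matches the note above that a higher core_size makes details matter more.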

To see a command's syntax and arguments, use:

python data_forge.py <command> -h

How to use:

Clone the git repository:

git clone https://github.com/SeregaCodit/AutoFileManager.git

Go to the project directory:

cd path_to_project

Create a virtual environment and activate it:

python -m venv .venv
source .venv/bin/activate

Install the requirements:

pip install -r requirements.txt

Read the --help output to learn more about the available commands and arguments.

To list the available commands:

python data_forge.py --help

To see a command's usage and available arguments:

python data_forge.py {command} --help

What else?

To make it easier to run DataForge with multiple tasks, you can create an .sh file, or modify start_all_tasks.sh, with a list of your commands, and then run all of them with one simple command:

bash path_to_file/start_all_tasks.sh

To stop all running commands, use:

pkill -f data_forge.py