Skip to content

Image Comparer

An orchestrator for comparing images using different hashing algorithms.

This class acts as a manager that selects a specific hashing strategy (like dHash) based on the project settings. It coordinates the process of generating hashes and identifying duplicate files.

Attributes:

Name Type Description
settings AppSettings

Global configuration object containing hashing parameters and paths.

method_mapping Dict

A map that links algorithm names to their corresponding classes.

method BaseHasher

An instance of the selected hashing algorithm.

logger Logger

Logger instance for tracking comparison tasks.

Source code in tools/comparer/img_comparer/img_comparer.py
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
class ImageComparer:
    """
    An orchestrator for comparing images using different hashing algorithms.

    This class acts as a manager that selects a specific hashing strategy
    (like dHash) based on the project settings. It coordinates the process
    of generating hashes and identifying duplicate files.

    Attributes:
        settings (AppSettings): Global configuration object containing
            hashing parameters and paths.
        method_mapping (Dict): A map that links algorithm names to their
            corresponding classes.
        method (BaseHasher): An instance of the selected hashing algorithm.
        logger (logging.Logger): Logger instance for tracking comparison tasks.
    """
    def __init__(self, settings: AppSettings):
        """
        Initializes the ImageComparer with settings and sets up the algorithm.

        Args:
            settings (AppSettings): Global configuration object that includes
                default and user-defined parameters.
        """
        super().__init__()
        self.settings = settings

        self.method_mapping = {
            Constants.dhash: DHash,
        }

        self.method = self.method_mapping[self.settings.method](
            settings=self.settings,
        )

        self.logger = LoggerConfigurator.setup(
            name=self.__class__.__name__,
            log_path=Path(self.settings.log_path) / f"{self.__class__.__name__}.log" if self.settings.log_path else None,
            log_level=self.settings.log_level
        )


    def compare(self, file_paths: Tuple[Path]) -> List[Path]:
        """
        Compares files using the each-with-each principle to find duplicates.

        The process consists of two steps:
        1. Building a hash map for all provided files.
        2. Analyzing the hash map to find matches that satisfy the
           Hamming distance threshold.

        Args:
            file_paths (Tuple[Path]): A collection of paths to the image files
                to be compared.

        Returns:
            List[Path]: A list of file paths that are identified as duplicates.
        """
        hash_map = self.method.get_hashmap(file_paths)
        matches = self.method.find_duplicates(hash_map)
        return matches

__init__(settings)

Initializes the ImageComparer with settings and sets up the algorithm.

Parameters:

Name Type Description Default
settings AppSettings

Global configuration object that includes default and user-defined parameters.

required
Source code in tools/comparer/img_comparer/img_comparer.py
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
def __init__(self, settings: AppSettings):
    """
    Initializes the ImageComparer with settings and sets up the algorithm.

    Args:
        settings (AppSettings): Global configuration object that includes
            default and user-defined parameters.
    """
    super().__init__()
    self.settings = settings

    self.method_mapping = {
        Constants.dhash: DHash,
    }

    self.method = self.method_mapping[self.settings.method](
        settings=self.settings,
    )

    self.logger = LoggerConfigurator.setup(
        name=self.__class__.__name__,
        log_path=Path(self.settings.log_path) / f"{self.__class__.__name__}.log" if self.settings.log_path else None,
        log_level=self.settings.log_level
    )

compare(file_paths)

Compares files using the each-with-each principle to find duplicates.

The process consists of two steps: 1. Building a hash map for all provided files. 2. Analyzing the hash map to find matches that satisfy the Hamming distance threshold.

Parameters:

Name Type Description Default
file_paths Tuple[Path]

A collection of paths to the image files to be compared.

required

Returns:

Type Description
List[Path]

List[Path]: A list of file paths that are identified as duplicates.

Source code in tools/comparer/img_comparer/img_comparer.py
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
def compare(self, file_paths: Tuple[Path]) -> List[Path]:
    """
    Compares files using the each-with-each principle to find duplicates.

    The process consists of two steps:
    1. Building a hash map for all provided files.
    2. Analyzing the hash map to find matches that satisfy the
       Hamming distance threshold.

    Args:
        file_paths (Tuple[Path]): A collection of paths to the image files
            to be compared.

    Returns:
        List[Path]: A list of file paths that are identified as duplicates.
    """
    hash_map = self.method.get_hashmap(file_paths)
    matches = self.method.find_duplicates(hash_map)
    return matches