Outlier Detector
Service for statistical anomaly detection in datasets.
This class provides methods to identify outliers using the 3-sigma rule (standard deviation). Features can be processed globally or per class, so that a value typical of its own class is not flagged merely because it is unusual across the whole dataset.
Source code in services/outlier_detector.py
mark_outliers(df, columns)
staticmethod
Identifies and marks outliers for specified columns in a DataFrame.
The method calculates the interquartile range (IQR = q3 - q1) for each column. Values outside the range [q1 - 1.5 * iqr, q3 + 1.5 * iqr] are marked as 1. Columns whose names start with 'object_' are analyzed separately within each class group; all other columns are analyzed globally.
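The rule described above can be sketched as follows. This is a minimal illustration, not the code in services/outlier_detector.py: the helper names `iqr_flags` and `mark_outliers_sketch` are hypothetical, and so is the `class_col="class"` parameter, since the source does not say which column holds the class labels. The source also truncates the names of the output columns, so the sketch returns the flags in a separate DataFrame keyed by the input column names instead of guessing them.

```python
import pandas as pd


def iqr_flags(values: pd.Series) -> pd.Series:
    """Return 1 where a value lies outside [q1 - 1.5*iqr, q3 + 1.5*iqr], else 0."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    outside = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    return outside.astype(int)


def mark_outliers_sketch(df: pd.DataFrame, columns: list, class_col: str = "class") -> pd.DataFrame:
    """Compute IQR flags per column (class_col is an assumed, hypothetical name)."""
    flags = pd.DataFrame(index=df.index)
    for col in columns:
        if col.startswith("object_"):
            # 'object_' columns: quartiles are computed inside each class group.
            flags[col] = df.groupby(class_col)[col].transform(iqr_flags).astype(int)
        else:
            # All other columns: quartiles are computed over the whole column.
            flags[col] = iqr_flags(df[col])
    return flags


# Small made-up dataset with two classes, for illustration only.
df = pd.DataFrame({
    "class": ["a"] * 5 + ["b"] * 5,
    "object_area": [1, 2, 3, 4, 100, 100, 101, 102, 103, 104],
    "width": [10, 11, 12, 13, 14, 10, 11, 12, 13, 500],
})
flags = mark_outliers_sketch(df, ["object_area", "width"])
```

In this example the value 100 in class "a" is flagged only because the quartiles are computed per class; applied to the whole `object_area` column, the same rule flags nothing, which is the case per-class analysis is meant to catch.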
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The input feature matrix. | required |
| columns | List[str] | List of numeric column names to analyze. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: The original DataFrame with additional binary columns: 'outlier_ |