Interactive desktop tool for cleaning and transforming datasets. Built with Python, PyQt5, and pandas.
- File Support: Load and export CSV and Excel files (.csv, .xlsx, .xls)
- Column Profiling: View detailed statistics for each column, including:
  - Data type
  - Count of values
  - Missing values and percentage
  - Unique values
  - Statistics for numeric columns (mean, std, min, max)
- Missing Value Handling: Multiple strategies for dealing with missing data:
  - Drop rows with missing values
  - Fill with mean, median, or mode
  - Forward fill or backward fill
  - Fill with custom constant value
- Data Transformations:
  - Normalize columns to 0-1 range
  - Standardize columns (Z-score normalization)
  - Label encode categorical columns
  - Drop duplicate rows
  - Reset dataframe index
- Undo/Redo Support: Full history tracking with undo/redo functionality
- Data Preview: Interactive table view with alternating row colors and missing value highlighting
- Export Pipeline: Save cleaned data to CSV or Excel format
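The column profile listed above can be approximated in a few lines of pandas. This is a minimal sketch, not the application's actual code: the function name `profile_column` and the dictionary keys are assumptions chosen for illustration.

```python
import pandas as pd


def profile_column(series: pd.Series) -> dict:
    """Summarize one column (hypothetical helper, not part of the project)."""
    total = len(series)
    missing = int(series.isna().sum())
    profile = {
        "dtype": str(series.dtype),
        "count": int(series.count()),      # non-missing values only
        "missing": missing,
        "missing_pct": round(100.0 * missing / total, 2) if total else 0.0,
        "unique": int(series.nunique()),   # NaN is excluded by default
    }
    # Mean, std, min and max are only reported for numeric columns.
    if pd.api.types.is_numeric_dtype(series):
        profile.update(
            mean=float(series.mean()),
            std=float(series.std()),       # pandas default: sample std (ddof=1)
            min=float(series.min()),
            max=float(series.max()),
        )
    return profile


df = pd.DataFrame({"age": [30, None, 50], "dept": ["IT", "HR", None]})
profiles = {name: profile_column(df[name]) for name in df.columns}
```

Here `profiles["age"]` carries the numeric statistics, while `profiles["dept"]` only has the type, count, missing, and unique fields.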
- Python 3.7 or higher
- pip package manager
- Clone the repository:
git clone https://github.com/BaseMax/qt-data-cleaner.git
cd qt-data-cleaner
- Install dependencies:
pip install -r requirements.txt
Run the application:
python main.py
Or make it executable and run directly:
chmod +x main.py
./main.py
- Open a Dataset:
  - Click "File" → "Open..." or use Ctrl+O
  - Select a CSV or Excel file
  - The data will be displayed in the table view
- View Column Profile:
  - The right panel shows detailed statistics for each column
  - Missing values are highlighted in red in the table
- Handle Missing Values:
  - Click "Data" → "Handle Missing Values..." or use the toolbar button
  - Select a fill method (mean, median, mode, etc.)
  - Choose which columns to apply the operation to
  - Click OK to apply
- Transform Data:
  - Click "Data" → "Transform..." or use the toolbar button
  - Select a transformation (normalize, standardize, encode, etc.)
  - Choose columns if applicable
  - Click OK to apply
- Undo/Redo:
  - Use "Edit" → "Undo" (Ctrl+Z) to revert changes
  - Use "Edit" → "Redo" (Ctrl+Shift+Z) to reapply changes
- Export Data:
  - Click "File" → "Export..." or use Ctrl+S
  - Choose output format (CSV or Excel)
  - Save the cleaned dataset
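For readers who want to reproduce the "Handle Missing Values" step outside the GUI, the strategies it offers map onto standard pandas calls. The sketch below is an illustration under assumptions: `handle_missing` and its method names are hypothetical and may not match what the dialog does internally.

```python
import pandas as pd


def handle_missing(df, columns, method, constant=None):
    """Return a copy of df with missing values in `columns` handled.

    Hypothetical helper; the method names are chosen for illustration.
    """
    out = df.copy()  # leave the caller's dataframe untouched
    if method == "drop":
        # Drop every row that has a missing value in any selected column.
        return out.dropna(subset=columns)
    if method == "constant" and constant is None:
        raise ValueError("a constant fill value is required")
    for col in columns:
        if method == "mean":
            out[col] = out[col].fillna(out[col].mean())
        elif method == "median":
            out[col] = out[col].fillna(out[col].median())
        elif method == "mode":
            modes = out[col].mode()
            if not modes.empty:  # an all-missing column has no mode
                out[col] = out[col].fillna(modes.iloc[0])
        elif method == "ffill":
            out[col] = out[col].ffill()  # carry the last valid value forward
        elif method == "bfill":
            out[col] = out[col].bfill()  # pull the next valid value backward
        elif method == "constant":
            out[col] = out[col].fillna(constant)
        else:
            raise ValueError(f"unknown method: {method}")
    return out


df = pd.DataFrame({"salary": [100.0, None, 300.0], "dept": ["IT", None, "IT"]})
filled = handle_missing(df, ["salary"], "mean")
dropped = handle_missing(df, ["salary", "dept"], "drop")
```

Mean and median only apply to numeric columns; for a text column such as `dept`, mode, forward/backward fill, or a constant are the usable options.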
A sample dataset (sample_data.csv) is included with the repository for testing purposes. It contains employee data with some missing values.
- Ctrl+O: Open file
- Ctrl+S: Export file
- Ctrl+Z: Undo
- Ctrl+Shift+Z: Redo
- F5: Refresh profile
- Ctrl+Q: Quit application
The application is structured into several components:
- main.py: Application entry point
- main_window.py: Main GUI window and user interface
- data_model.py: Data management with undo/redo support
- transformers.py: Data transformation utilities
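One common way to provide undo/redo in a data model is to keep two stacks of snapshots. The sketch below shows that general technique in plain Python; the `History` class is hypothetical and is not taken from data_model.py, which may track history differently (for example, with pandas DataFrame copies).

```python
import copy


class History:
    """Snapshot-based undo/redo (hypothetical; not the project's class)."""

    def __init__(self, initial):
        self._current = copy.deepcopy(initial)
        self._undo = []  # states that came before the current one
        self._redo = []  # states that were undone and can be reapplied

    @property
    def current(self):
        return self._current

    def apply(self, operation):
        """Run `operation` on a copy of the state and record the old state."""
        self._undo.append(self._current)
        self._current = operation(copy.deepcopy(self._current))
        self._redo.clear()  # a new edit invalidates the redo branch

    def undo(self):
        if not self._undo:
            return False
        self._redo.append(self._current)
        self._current = self._undo.pop()
        return True

    def redo(self):
        if not self._redo:
            return False
        self._undo.append(self._current)
        self._current = self._redo.pop()
        return True


history = History([3, None, 1])
history.apply(lambda rows: [r for r in rows if r is not None])  # drop missing
history.apply(sorted)                                           # then sort
history.undo()                                                  # back to unsorted
```

Storing full snapshots is simple but memory-hungry for large datasets; capping the stack length is a typical mitigation.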
- PyQt5: GUI framework
- pandas: Data manipulation and analysis
- numpy: Numerical computing
- openpyxl: Excel file support
- scikit-learn: Data preprocessing and transformations
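The dependency list names scikit-learn for preprocessing, where `MinMaxScaler`, `StandardScaler`, and `LabelEncoder` cover the listed transformations. The same arithmetic can be sketched with pandas alone, which makes the formulas explicit. The function names below are assumptions for illustration, not the contents of transformers.py.

```python
import pandas as pd


def normalize(series):
    """Min-max scale a numeric column to the 0-1 range."""
    low, high = series.min(), series.max()
    if high == low:
        # A constant column has no range; map it to zeros instead of dividing by 0.
        return (series - low).astype(float)
    return (series - low) / (high - low)


def standardize(series):
    """Z-score: subtract the mean and divide by the standard deviation."""
    std = series.std(ddof=0)  # population std, the convention StandardScaler uses
    if std == 0:
        return (series - series.mean()).astype(float)
    return (series - series.mean()) / std


def label_encode(series):
    """Replace each category with an integer code, assigned in sorted order."""
    codes, _ = pd.factorize(series, sort=True)  # missing values get code -1
    return pd.Series(codes, index=series.index, name=series.name)


def drop_duplicates_and_reset(df):
    """Drop duplicate rows, then renumber the index from 0."""
    return df.drop_duplicates().reset_index(drop=True)


df = pd.DataFrame({"age": [20, 30, 40, 40], "dept": ["IT", "HR", "IT", "IT"]})
clean = drop_duplicates_and_reset(df)
scaled = normalize(clean["age"])
zscores = standardize(clean["age"])
encoded = label_encode(clean["dept"])
```

With this input, the duplicate fourth row is removed, ages are scaled to 0, 0.5, and 1, and `dept` is encoded with HR as 0 and IT as 1.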
MIT License - see LICENSE file for details.
Max Base
Contributions are welcome! Please feel free to submit a Pull Request.
