Qt Data Cleaner

Interactive desktop tool for cleaning and transforming datasets. Built with Python, PyQt5, and pandas.

Features

  • File Support: Load and export CSV and Excel files (.csv, .xlsx, .xls)
  • Column Profiling: View detailed statistics for each column including:
    • Data type
    • Count of values
    • Missing values and percentage
    • Unique values
    • Statistics for numeric columns (mean, std, min, max)
  • Missing Value Handling: Multiple strategies for dealing with missing data:
    • Drop rows with missing values
    • Fill with mean, median, or mode
    • Forward fill or backward fill
    • Fill with custom constant value
  • Data Transformations:
    • Normalize columns to 0-1 range
    • Standardize columns (Z-score normalization)
    • Label encode categorical columns
    • Drop duplicate rows
    • Reset dataframe index
  • Undo/Redo Support: Every operation is recorded in a history, so changes can be undone and redone
  • Data Preview: Interactive table view with alternating row colors and missing value highlighting
  • Export Pipeline: Save cleaned data to CSV or Excel format
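
As a rough illustration of the column profiling described above, the statistics can be computed with plain pandas. This is a minimal sketch, not the project's actual code: the function names `profile_column` and `profile_frame` and the small example frame are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def profile_column(series: pd.Series) -> dict:
    """Collect the per-column statistics from the feature list (hypothetical helper)."""
    total = len(series)
    missing = int(series.isna().sum())
    profile = {
        "dtype": str(series.dtype),
        "count": int(series.count()),      # number of non-missing values
        "missing": missing,
        "missing_pct": (missing / total * 100) if total else 0.0,
        "unique": int(series.nunique()),   # missing values are not counted
    }
    # mean/std/min/max are only reported for numeric columns
    if is_numeric_dtype(series):
        profile.update(
            mean=float(series.mean()),
            std=float(series.std()),
            min=float(series.min()),
            max=float(series.max()),
        )
    return profile


def profile_frame(df: pd.DataFrame) -> dict:
    """Profile every column of a DataFrame."""
    return {name: profile_column(df[name]) for name in df.columns}


# Made-up data, only to exercise the helpers
df = pd.DataFrame({"age": [30.0, None, 50.0], "city": ["Oslo", "Rome", None]})
profile = profile_frame(df)
```

Here `profile["age"]` carries the numeric statistics, while `profile["city"]` only has the type, count, missing and unique entries.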

Installation

Requirements

  • Python 3.7 or higher
  • pip package manager

Setup

  1. Clone the repository:
git clone https://github.com/BaseMax/qt-data-cleaner.git
cd qt-data-cleaner
  2. Install dependencies:
pip install -r requirements.txt

Usage

Running the Application

python main.py

Alternatively, on Linux or macOS, make the script executable and run it directly:

chmod +x main.py
./main.py

Quick Start Guide

  1. Open a Dataset:

    • Click "File" → "Open..." or use Ctrl+O
    • Select a CSV or Excel file
    • The data will be displayed in the table view
  2. View Column Profile:

    • The right panel shows detailed statistics for each column
    • Missing values are highlighted in red in the table
  3. Handle Missing Values:

    • Click "Data" → "Handle Missing Values..." or use the toolbar button
    • Select a fill method (mean, median, mode, etc.)
    • Choose which columns to apply the operation to
    • Click OK to apply
  4. Transform Data:

    • Click "Data" → "Transform..." or use the toolbar button
    • Select a transformation (normalize, standardize, encode, etc.)
    • Choose columns if applicable
    • Click OK to apply
  5. Undo/Redo:

    • Use "Edit" → "Undo" (Ctrl+Z) to revert changes
    • Use "Edit" → "Redo" (Ctrl+Shift+Z) to reapply changes
  6. Export Data:

    • Click "File" → "Export..." or use Ctrl+S
    • Choose output format (CSV or Excel)
    • Save the cleaned dataset
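
For readers who want to reproduce the missing-value step outside the GUI, the fill methods from the guide map onto standard pandas calls. The sketch below is an assumption about how such a step could look, not the application's implementation: `fill_missing`, its method names and the example columns are all hypothetical.

```python
import pandas as pd

# Hypothetical method names, mirroring the strategies listed in this README
METHODS = {"drop", "mean", "median", "mode", "ffill", "bfill", "constant"}


def fill_missing(df, method, columns=None, value=None):
    """Return a copy of df with missing values handled in the chosen columns."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if method == "constant" and value is None:
        raise ValueError("the 'constant' method needs a fill value")
    columns = list(df.columns) if columns is None else list(columns)
    out = df.copy()  # never modify the caller's frame
    if method == "drop":
        # Drop rows that have a missing value in any selected column
        return out.dropna(subset=columns)
    for col in columns:
        s = out[col]
        if method == "mean":        # numeric columns only
            out[col] = s.fillna(s.mean())
        elif method == "median":    # numeric columns only
            out[col] = s.fillna(s.median())
        elif method == "mode":
            modes = s.mode()
            if not modes.empty:     # an all-missing column has no mode
                out[col] = s.fillna(modes.iloc[0])
        elif method == "ffill":
            out[col] = s.ffill()
        elif method == "bfill":
            out[col] = s.bfill()
        else:                       # "constant"
            out[col] = s.fillna(value)
    return out


# Made-up data, only to exercise the helper
df = pd.DataFrame({"salary": [100.0, None, 300.0], "dept": ["IT", None, "IT"]})
cleaned = fill_missing(df, "mean", columns=["salary"])
cleaned = fill_missing(cleaned, "mode", columns=["dept"])
csv_text = cleaned.to_csv(index=False)  # CSV export as a string
```

Writing an Excel file instead would use `DataFrame.to_excel`, which relies on an engine such as openpyxl being installed.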

Sample Data

A sample dataset (sample_data.csv) is included with the repository for testing purposes. It contains employee data with some missing values.

Keyboard Shortcuts

  • Ctrl+O: Open file
  • Ctrl+S: Export file
  • Ctrl+Z: Undo
  • Ctrl+Shift+Z: Redo
  • F5: Refresh profile
  • Ctrl+Q: Quit application

Architecture

The application is structured into several components:

  • main.py: Application entry point
  • main_window.py: Main GUI window and user interface
  • data_model.py: Data management with undo/redo support
  • transformers.py: Data transformation utilities
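
One common way to give a data model undo/redo support is to keep two stacks of snapshots. The sketch below shows that idea using only the standard library. It is an assumption about the general technique, not a description of what data_model.py does, and the `History` class and its `max_depth` parameter are hypothetical.

```python
import copy


class History:
    """Snapshot-based undo/redo for any state that can be deep-copied."""

    def __init__(self, initial, max_depth=50):
        self._current = copy.deepcopy(initial)
        self._undo = []              # older states, most recent last
        self._redo = []              # states that were undone
        self._max_depth = max_depth  # cap on stored snapshots

    @property
    def current(self):
        return self._current

    def apply(self, operation):
        """Run operation(state) -> new_state and remember the previous state."""
        new_state = operation(copy.deepcopy(self._current))
        self._undo.append(self._current)
        if len(self._undo) > self._max_depth:
            self._undo.pop(0)        # forget the oldest snapshot
        self._redo.clear()           # a new change invalidates the redo stack
        self._current = new_state

    def undo(self):
        """Step back one change; return False if there is nothing to undo."""
        if not self._undo:
            return False
        self._redo.append(self._current)
        self._current = self._undo.pop()
        return True

    def redo(self):
        """Reapply the last undone change; return False if there is none."""
        if not self._redo:
            return False
        self._undo.append(self._current)
        self._current = self._redo.pop()
        return True


# A plain list stands in for the dataset here
history = History([3, None, 5])
history.apply(lambda rows: [r for r in rows if r is not None])  # drop missing
history.apply(lambda rows: [r * 2 for r in rows])               # transform
doubled = history.current
history.undo()
after_one_undo = history.current
history.undo()
after_two_undos = history.current
history.redo()
after_redo = history.current
```

The same class works with a pandas DataFrame as the state, since DataFrames can be deep-copied. Full snapshots are simple but memory-hungry for large datasets, which is why the sketch caps the stack depth.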

Dependencies

  • PyQt5: GUI framework
  • pandas: Data manipulation and analysis
  • numpy: Numerical computing
  • openpyxl: Excel file support
  • scikit-learn: Data preprocessing and transformations
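
To make the preprocessing role of these libraries concrete, the column transformations from the feature list can be sketched with pandas alone. Whether the project implements them this way or through scikit-learn classes such as `MinMaxScaler`, `StandardScaler` and `LabelEncoder` is not stated in this README, so the helper names and the choice of population standard deviation below are assumptions.

```python
import pandas as pd


def normalize(s: pd.Series) -> pd.Series:
    """Min-max scale a numeric column to the 0-1 range."""
    lo, hi = s.min(), s.max()
    if hi == lo:
        # Constant column: map every non-missing value to 0.0
        return (s - lo).astype(float)
    return (s - lo) / (hi - lo)


def standardize(s: pd.Series) -> pd.Series:
    """Z-score a numeric column (population std, ddof=0, is an assumption)."""
    mean, std = s.mean(), s.std(ddof=0)
    if std == 0:
        return (s - mean).astype(float)
    return (s - mean) / std


def label_encode(s: pd.Series):
    """Map categories to integer codes in sorted order; missing values get -1."""
    codes, classes = pd.factorize(s, sort=True)
    return pd.Series(codes, index=s.index, name=s.name), list(classes)


# Made-up data, only to exercise the helpers
df = pd.DataFrame({"score": [10.0, 20.0, 30.0, 30.0], "grade": ["b", "a", "c", "c"]})
deduped = df.drop_duplicates().reset_index(drop=True)  # drop duplicates, reset index
scaled = normalize(deduped["score"])
z = standardize(deduped["score"])
codes, classes = label_encode(deduped["grade"])
```

After these calls `scaled` runs from 0.0 to 1.0, `z` has zero mean and unit variance, and `codes` indexes into the sorted `classes` list.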

License

MIT License - see LICENSE file for details.

Author

Max Base

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
