Market basket analysis with Python. Generate synthetic retail transactions, mine frequent itemsets using Apriori or FP‑Growth, derive association rules, and produce clean outputs and charts.
- Synthetic transactions generator with realistic co‑purchases (e.g., Laptop → Mouse/Keyboard/Bag)
- Frequent itemsets via Apriori or FP‑Growth (mlxtend)
- Association rules with support, confidence, lift, leverage, conviction
- Reproducible artifacts: CSVs for item supports, frequent itemsets, association rules
- Visual: Top‑N items by support (Matplotlib)
- All outputs saved under `outputs/`
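To illustrate how a generator can bias baskets toward realistic co-purchases, here is a minimal sketch. It is not the code in `data/generate_transactions.py`: the catalog, the attachment probabilities, and the function name `generate_baskets` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical catalog and attachment probabilities (assumptions for this
# sketch, not values taken from the repository).
BASE_ITEMS = ["Laptop", "Monitor", "Headphones", "Webcam", "Desk Lamp"]
ATTACHMENTS = {"Laptop": {"Mouse": 0.6, "Keyboard": 0.5, "Laptop Bag": 0.4}}


def generate_baskets(n, seed=42):
    """Return n baskets (sorted lists of item names), reproducible per seed."""
    rng = random.Random(seed)  # local RNG so the seed fully determines output
    baskets = []
    for _ in range(n):
        # Start from one or two randomly chosen base items.
        basket = set(rng.sample(BASE_ITEMS, k=rng.randint(1, 2)))
        # When a trigger item is present, add each attachment independently
        # with its own probability.
        for trigger, extras in ATTACHMENTS.items():
            if trigger in basket:
                for item, prob in extras.items():
                    if rng.random() < prob:
                        basket.add(item)
        baskets.append(sorted(basket))
    return baskets


if __name__ == "__main__":
    for basket in generate_baskets(5):
        print(basket)
```

Because accessories are only ever added alongside a Laptop in this sketch, rules such as Laptop → Mouse show up with high lift, which is the kind of interpretable pattern the real generator aims for.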
market-basket-analysis/
├─ README.md
├─ LICENSE
├─ requirements.txt
├─ data/
│ └─ generate_transactions.py
├─ src/
│ ├─ market_basket.py
│ └─ utils.py
└─ outputs/
   └─ figures & reports (auto-created)
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install -r requirements.txt
python data/generate_transactions.py --start 2024-01-01 --end 2024-12-31 --customers 800 --avg_per_day 120 --seed 42 --out data/transactions.csv
Apriori:
python src/market_basket.py --input data/transactions.csv --outdir outputs --algo apriori --min_support 0.02 --metric lift --min_threshold 1.1 --top_n 12
FP‑Growth:
python src/market_basket.py --input data/transactions.csv --outdir outputs --algo fpgrowth --min_support 0.02 --metric lift --min_threshold 1.1 --top_n 12
Outputs
- outputs/item_support.csv
- outputs/frequent_itemsets_<algo>.csv
- outputs/association_rules_<algo>.csv
- outputs/fig_top_items.png
- outputs/summary.json
Top rule (by lift)
| Antecedents | Consequents | Support | Confidence | Lift | Conviction |
|---|---|---|---|---|---|
| Keyboard, Laptop Bag, USB-C Hub | Laptop, Mouse | 0.024 | 0.684 | 5.095 | 2.743 |
Interpretation: baskets containing Keyboard + Laptop Bag + USB‑C Hub are about 5.1× as likely to also include Laptop + Mouse as a basket chosen at random.
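As a sanity check, the figures in this table can be cross-checked against each other using the standard metric definitions. The sketch below uses only the rounded values shown above, so the results are approximate; the variable names are illustrative.

```python
# Values copied from the table above (rounded to three decimals).
support_xy = 0.024   # P(X and Y in the same basket)
confidence = 0.684   # P(Y | X)
lift = 5.095         # P(Y | X) / P(Y)

# lift = confidence / P(Y), so the baseline rate of the consequent is:
p_consequent = confidence / lift

# confidence = support / P(X), so the rate of the antecedent is:
p_antecedent = support_xy / confidence

# conviction = (1 - P(Y)) / (1 - confidence)
conviction = (1 - p_consequent) / (1 - confidence)

# leverage = P(X and Y) - P(X) * P(Y)
leverage = support_xy - p_antecedent * p_consequent

print(f"P(consequent) ~ {p_consequent:.3f}")
print(f"P(antecedent) ~ {p_antecedent:.3f}")
print(f"conviction    ~ {conviction:.3f}")
print(f"leverage      ~ {leverage:.4f}")
```

The recomputed conviction comes out near 2.74, which agrees with the 2.743 in the table once rounding of the inputs is taken into account.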
Top‑N item supports (chart saved to outputs/fig_top_items.png).
Top rule (by lift)
| Antecedents | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|
| Laptop, Mouse | Keyboard | 0.036 | 0.58 | 2.41 |
Interpretation: if a basket contains Laptop + Mouse, it is about 2.4× as likely to also contain Keyboard as a basket chosen at random.
Top‑N item supports (chart saved to outputs/fig_top_items.png).
- Frequent Itemset Mining: Identify groups of products that co‑occur in transactions. Two algorithms are provided:
  - Apriori: generates candidate itemsets level‑wise using the downward‑closure property (every subset of a frequent itemset is itself frequent), pruning infrequent candidates early.
  - FP‑Growth: compresses the dataset into an FP‑tree to mine itemsets without explicit candidate generation. Efficient on dense data.
- Association Rules: For each frequent itemset, derive rules `X → Y` (with `X ∩ Y = ∅`) and compute:
  - Support: `P(X ∪ Y)`, the fraction of baskets containing every item in both `X` and `Y`
  - Confidence: `P(Y|X)`
  - Lift: `P(Y|X) / P(Y)` (>1 indicates positive association)
  - Leverage: `P(X ∪ Y) - P(X) P(Y)`, and Conviction: `(1 - P(Y)) / (1 - P(Y|X))`, for additional signal.
- The synthetic generator biases realistic attachments (e.g., Laptop → Mouse/Keyboard/Bag) so mined rules are interpretable.
- Tune `--min_support` and `--min_threshold` for sparser vs. denser rule sets.
- For large datasets, FP‑Growth is typically faster than Apriori.
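The ideas above can be sketched in plain Python on a handful of toy baskets. This is a teaching sketch, not the repository's implementation (which relies on mlxtend): the baskets and the helper names `support`, `apriori_sketch`, and `rule_metrics` are assumptions made for illustration, and the brute-force support counting would not scale to real data.

```python
from itertools import combinations

# Toy baskets, invented for this example.
baskets = [
    {"Laptop", "Mouse", "Keyboard"},
    {"Laptop", "Mouse"},
    {"Laptop", "Keyboard"},
    {"Mouse", "Keyboard"},
    {"Laptop", "Mouse", "Keyboard", "Laptop Bag"},
    {"Monitor"},
    {"Monitor", "Keyboard"},
    {"Laptop", "Mouse", "Laptop Bag"},
]


def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)


def apriori_sketch(min_support):
    """Level-wise search: frequent k-itemsets are built from (k-1)-itemsets."""
    items = sorted({i for b in baskets for i in b})
    level = [frozenset([i]) for i in items if support([i]) >= min_support]
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure): drop any candidate with an
        # infrequent (k-1)-subset before counting its support.
        level = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        ]
    return frequent


def rule_metrics(X, Y):
    """Metrics for the rule X -> Y, using the definitions listed above."""
    s_xy = support(set(X) | set(Y))
    s_x, s_y = support(X), support(Y)
    conf = s_xy / s_x
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": conf / s_y,
        "leverage": s_xy - s_x * s_y,
        "conviction": float("inf") if conf == 1 else (1 - s_y) / (1 - conf),
    }


if __name__ == "__main__":
    for itemset, s in sorted(apriori_sketch(0.25).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), round(s, 3))
    print(rule_metrics({"Laptop"}, {"Mouse"}))
```

Raising `min_support` in `apriori_sketch` shrinks the set of frequent itemsets, which mirrors the effect of the `--min_support` flag on the number of rules produced.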