Coles vs Woolworths grocery price comparison
  • Python 64.8%
  • HTML 34.4%
  • Dockerfile 0.4%
  • JavaScript 0.4%
slothitude a3433f7913 Fix Coles scraper: regex extraction, retry logic, session persistence
Replace HTMLParser (silently fails on 626KB responses) with regex for
__NEXT_DATA__ extraction. Add 3-attempt retry with backoff in the
scraper. Use persistent requests.Session in the proxy with periodic
cookie clearing to reduce WAF blocks. Pass COLES_PROXY env var
through docker-compose for Oracle deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-04 23:15:25 +10:00
scrapers Fix Coles scraper: regex extraction, retry logic, session persistence 2026-06-04 23:15:25 +10:00
static TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
templates TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
.dockerignore TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
.gitignore Add .gitignore, remove cached data/pycache from tracking 2026-06-04 13:10:06 +10:00
app.py TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
coles_proxy.py Fix Coles scraper: regex extraction, retry logic, session persistence 2026-06-04 23:15:25 +10:00
config.py TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
docker-compose.yml Fix Coles scraper: regex extraction, retry logic, session persistence 2026-06-04 23:15:25 +10:00
Dockerfile TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
models.py TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
README.md Add README with architecture, API docs, and quick start 2026-06-04 16:12:24 +10:00
requirements.txt TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
scheduler.py TrolleySnipe v0.1 — Coles vs Woolworths price comparison 2026-06-04 12:39:56 +10:00
test_coles.py Rewrite Coles proxy to Flask+requests, add test script 2026-06-04 16:10:19 +10:00

TrolleySnipe

Coles vs Woolworths grocery price comparison. Find out who's cheaper, track specials, and never overpay on groceries again.

Features

  • Dual-store search — queries Coles and Woolworths simultaneously
  • Product matching — barcode matching + fuzzy name matching across stores
  • Price comparison — shows which store is cheaper and by how much
  • Price history — 30-day price tracking per product
  • Deals page — products currently on special at either store
  • Scheduled refresh — automatic daily price updates + weekly full scrape
  • Unit pricing — compare per-100g/ml prices for fair comparisons
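Unit pricing can be illustrated with a short sketch. The function name `price_per_100` and the supported unit list are assumptions made for this example, not names taken from the project's code.

```python
# Conversion factors to the base unit used for comparison
# (grams for weight, millilitres for volume).
_TO_BASE = {"g": 1.0, "kg": 1000.0, "ml": 1.0, "l": 1000.0}


def price_per_100(price: float, size: float, unit: str) -> float:
    """Return the price per 100 g or per 100 ml for a pack of the given size.

    Hypothetical helper: `unit` is one of g, kg, ml, l (case-insensitive).
    """
    unit = unit.lower()
    if unit not in _TO_BASE:
        raise ValueError(f"unsupported unit: {unit!r}")
    base_amount = size * _TO_BASE[unit]
    if base_amount <= 0:
        raise ValueError("size must be positive")
    return round(price / base_amount * 100, 2)
```

Comparing `price_per_100(5.00, 500, "g")` with `price_per_100(8.00, 1, "kg")` puts a 500 g pack and a 1 kg pack on the same footing.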

Architecture

                    ┌─────────────┐
  User ──────────►  │  Flask App   │  (Oracle Docker, port 5010)
                    │  app.py      │
                    └──┬──────┬───┘
                       │      │
              ┌────────┘      └────────┐
              ▼                       ▼
    ┌──────────────────┐    ┌──────────────────┐
    │ Woolworths API   │    │ Coles Proxy       │
    │ (JSON API)       │    │ (Lappy, port 8099)│
    │ scrapers/ww.py   │    │ coles_proxy.py    │
    └──────────────────┘    └────────┬─────────┘
                                     │
                                     ▼
                            ┌──────────────────┐
                            │ Coles Website     │
                            │ (HTML, WAF)       │
                            └──────────────────┘

The Coles website sits behind the Incapsula WAF, which blocks requests from server IPs, so a proxy relay runs on a residential connection (Lappy) and fetches Coles search pages on behalf of the Flask app.
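The latest commit note describes pulling the `__NEXT_DATA__` JSON out of the Coles HTML with a regex and retrying up to three times with backoff. A minimal sketch of that idea follows. The function names, the exponential delay schedule, and the injected `fetch` callable are assumptions for illustration; the real scraper may differ.

```python
import json
import re
import time

# Next.js pages embed their page state as JSON inside this script tag.
_NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> dict:
    """Return the parsed __NEXT_DATA__ payload, or raise ValueError."""
    match = _NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("__NEXT_DATA__ script tag not found")
    # json.JSONDecodeError is a subclass of ValueError.
    return json.loads(match.group(1))


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() (hypothetical: returns page HTML) until parsing succeeds.

    Waits base_delay, then 2 * base_delay, ... between failed attempts.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return extract_next_data(fetch())
        except (ValueError, OSError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"failed after {attempts} attempts") from last_error
```

Treating a missing script tag as a retryable error is a design choice here: a WAF block page usually arrives as valid HTML without the expected payload.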

Quick Start

Local dev

pip install -r requirements.txt
python app.py  # runs on port 5000

Docker

docker-compose up -d  # runs on port 5010

Coles Proxy (required for Coles scraping)

# Run on Lappy with Python 3.11 (has requests installed)
'C:/Program Files/Python311/python.exe' coles_proxy.py
# Listens on port 8099
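
The commit note also mentions a persistent requests.Session in the proxy with periodic cookie clearing. One way this could look is sketched below. The class name, the count-based trigger, and the default interval of 20 requests are assumptions; the proxy may clear cookies on a timer or on some other condition.

```python
class CookieClearingSession:
    """Reuse one HTTP session, clearing its cookies every `clear_every` requests.

    `session` is any object with `.get(url, **kwargs)` and `.cookies.clear()`,
    for example a requests.Session instance.
    """

    def __init__(self, session, clear_every=20):
        self.session = session
        self.clear_every = clear_every
        self.count = 0  # requests sent so far

    def get(self, url, **kwargs):
        # Drop accumulated cookies before every clear_every-th request
        # (but not before the very first one).
        if self.count and self.count % self.clear_every == 0:
            self.session.cookies.clear()
        self.count += 1
        return self.session.get(url, **kwargs)
```

With requests installed, `CookieClearingSession(requests.Session())` keeps the connection pool and headers between calls while discarding cookies at the chosen interval.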

API

Endpoint                         Description
GET /api/search?q=milk           Search both stores, return matched products
GET /api/product/<id>            Single product with price history
GET /api/compare/<id>            Side-by-side store comparison
GET /api/deals                   Products currently on special
GET /api/categories              List product categories
GET /api/cheapest?store=coles    Products cheaper at specified store
GET /api/history/<id>?days=30    Price history for a product
GET /api/health                  Health check
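
A small client for these endpoints might look like the sketch below. It assumes the app is reachable at http://localhost:5010 (the Docker port) and that the endpoints return JSON; the response fields are not documented here, so the helper returns the decoded payload as-is. `build_url` and `get_json` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5010"  # assumption: Docker deployment port


def build_url(path, base_url=BASE_URL, **params):
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return url


def get_json(path, **params):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urlopen(build_url(path, **params), timeout=10) as resp:
        return json.load(resp)
```

For example, `get_json("/api/search", q="milk")` requests the search endpoint, and `get_json("/api/history/42", days=30)` requests the price history for a product with the made-up id 42.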

Project Structure

app.py              # Flask web app — routes, search, comparison
config.py           # URLs, rate limits, user agents, scheduler config
models.py           # SQLite — products, prices, search cache
scheduler.py        # APScheduler — daily/weekly price refresh
coles_proxy.py      # Coles WAF bypass proxy (runs on Lappy)
scrapers/
  coles.py          # Coles HTML parser
  woolworths.py     # Woolworths JSON API client
  __init__.py       # Product matching (barcode + fuzzy name)
templates/          # Jinja2 HTML templates
static/             # CSS, JS, images
data/               # SQLite DB (gitignored)
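
To make the SQLite price store more concrete, here is a sketch of a price table and a query for a rolling history window. The table name, columns, and function name are assumptions for this example and are not taken from models.py.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: one row per product, store, and observation date.
SCHEMA = """
CREATE TABLE prices (
    product_id INTEGER NOT NULL,
    store      TEXT    NOT NULL,
    price      REAL    NOT NULL,
    seen_on    TEXT    NOT NULL   -- ISO date, e.g. 2026-06-04
);
"""


def price_history(conn, product_id, days=30, today=None):
    """Return (seen_on, store, price) rows from the last `days` days."""
    today = today or date.today()
    cutoff = (today - timedelta(days=days)).isoformat()
    rows = conn.execute(
        "SELECT seen_on, store, price FROM prices "
        "WHERE product_id = ? AND seen_on >= ? "
        "ORDER BY seen_on, store",
        (product_id, cutoff),
    )
    return rows.fetchall()
```

Storing dates as ISO strings keeps the comparison in the WHERE clause a plain string comparison, which sorts the same way as the dates themselves.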

Price Matching

Products are matched across stores using two strategies:

  1. Barcode (exact match) — primary, high confidence
  2. Name similarity (threshold 0.65) — fallback for unmatched products

Results are sorted by savings percentage in descending order, so the best deals appear first.
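
The two-pass matching described above can be sketched as follows. This is an illustration only: it uses `difflib.SequenceMatcher` as the similarity measure, represents products as dicts with `name`, `price`, and optional `barcode` keys, and defines savings relative to the dearer price. All of these are assumptions; only the 0.65 threshold comes from this README.

```python
from difflib import SequenceMatcher

NAME_THRESHOLD = 0.65  # threshold stated in this README


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def savings_pct(price_a, price_b):
    """Percentage saved by buying at the cheaper store (assumed definition)."""
    dearer = max(price_a, price_b)
    if dearer <= 0:
        return 0.0
    return (dearer - min(price_a, price_b)) / dearer * 100


def match_products(coles, woolworths):
    """Pair products across stores; return (coles, woolworths, how) tuples."""
    remaining = list(woolworths)
    pairs, leftovers = [], []
    # Pass 1: exact barcode match.
    for c in coles:
        hit = next(
            (w for w in remaining
             if c.get("barcode") and w.get("barcode") == c["barcode"]),
            None,
        )
        if hit is None:
            leftovers.append(c)
        else:
            remaining.remove(hit)
            pairs.append((c, hit, "barcode"))
    # Pass 2: best name match at or above the threshold for what is left.
    for c in leftovers:
        best = max(
            remaining,
            key=lambda w: name_similarity(c["name"], w["name"]),
            default=None,
        )
        if best is not None and name_similarity(c["name"], best["name"]) >= NAME_THRESHOLD:
            remaining.remove(best)
            pairs.append((c, best, "name"))
    # Largest savings first.
    pairs.sort(key=lambda p: savings_pct(p[0]["price"], p[1]["price"]), reverse=True)
    return pairs
```

Running the barcode pass first means a fuzzy name match can never steal a product that already has an exact counterpart.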

Scheduler

  • Daily at 2 AM — refreshes prices for 60+ grocery staples
  • Monday — full category scrape, up to 60 results per query
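
The timing of the two jobs can be illustrated with plain datetime arithmetic. This sketch does not use APScheduler (which the project relies on); it only shows when each job would next fire. The function names are hypothetical, and the 2 AM hour for the Monday job is an assumption, since only the daily job's time is stated.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour=2):
    """Next occurrence of hour:00 strictly after `now`."""
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    return run if run > now else run + timedelta(days=1)


def next_weekly_run(now, weekday=0, hour=2):
    """Next occurrence of the given weekday (0 = Monday) at hour:00.

    The default hour is an assumption made for this example.
    """
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    run += timedelta(days=(weekday - now.weekday()) % 7)
    return run if run > now else run + timedelta(days=7)
```

With APScheduler 3.x, the equivalent would typically be cron triggers, for example `add_job(func, "cron", hour=2)` for the daily refresh and `add_job(func, "cron", day_of_week="mon", hour=2)` for the weekly scrape.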

Requirements

  • Python 3.11+ (on Lappy, the Coles proxy runs under the Python 3.11 install, which has requests available)
  • Flask, requests, APScheduler
  • Docker (for production deployment)

License

MIT