TL;DR: We wrote a Google Apps Script that crawls our entire Gmail history, extracts every contactable email address (From/To/Cc/Bcc), dedupes + validates, tags with context, and writes it all into a Google Sheet for lead gen. It runs in batches with resume checkpoints, supports filters (include/exclude domains/labels), and drops a clean, CRM-ready table.
Why we built it
Over the years, our team has emailed thousands of peopleโprospects, partners, event signups, trial users, you name it. That history is goldโฆ but itโs buried across threads, labels, and accounts. We wanted a one-time harvest and an ongoing trickle that turns inbox activity into a structured lead sourceโwithout buying yet another tool.
What the script does
- Scans Gmail (your mailbox) using label queries (e.g., in:anywhere -category:promotions), in paged batches to respect quotas.
- Extracts emails from From/To/Cc/Bcc headers (per message, not just top thread).
- Optionally scrapes email-looking strings in bodies/signatures (off by default).
- Normalizes (lowercase/trim), validates (simple RFC regex), and dedupes by address.
- Tags each address with lightweight context:
- First seen / last seen date
- Direction (inbound, outbound, mutual)
- Source labels (e.g., lead@site, newsletter, events/2024)
- Thread count & message count involving that address
- First seen / last seen date
- Filters to exclude internal domains, role accounts (e.g., no-reply@), or any blacklist.
- Writes to Google Sheets with a tidy schema and a timestamped โsync runโ sheet for audits.
- Supports dry-run (log only), resume (continues where it left off), and scheduled syncs (time-driven trigger).
The column layout we use
| column | example |
| [email protected] | |
| first_seen | 2021-03-12T09:44:10Z |
| last_seen | 2025-09-18T16:27:03Z |
| direction | inbound / outbound / mutual |
| threads | 14 |
| messages | 53 |
| labels | sales, demo-request |
| source_accounts | [email protected] (who on our side) |
| notes | free-text tags/flags |
| run_id | sync batch id |
We keep a separate โrunsโ tab with start/end time, batch sizes, and error counts for observability.
Safety, quality, and compliance
- Opt-out sources: We skip no-reply@, common mailers, and anything matching your denylist.
- Internal hygiene: Exclude your own domains; optionally whitelist only external domains.
- GDPR/CAN-SPAM: This builds a first-party contact graph, not a cold list. Use responsibly:
- Email only those whoโve opted in or where you have legitimate interest.
- Maintain an unsubscribe and honor it (store it in the Sheet/denylist).
- Email only those whoโve opted in or where you have legitimate interest.
- Rate limits: Processes in small batches (e.g., 100 threads/run), with exponential backoff on 429/5xx.
- Resume checkpoints: Stores lastThreadIndex and runId via PropertiesService, so restarts are safe.
Config knobs youโll change first
- Search query: e.g., in:anywhere -category:promotions -label:spam.
- Date window: one-time โall historyโ vs. rolling โlast 30 daysโ.
- Include/Exclude lists: domains, addresses, labels.
- Body scraping: off by default; turn on if you want signature harvesting.
- Destination Sheet: spreadsheet ID + sheet names for leads and runs.
- Dry-run & batch size: start dry; tune batch size to keep executions <6 minutes.
Our workflow
- Dry-run with conservative filters (-category:promotions, exclude internal).
- Live sync across a few batches; spot-check top rows.
- Set a time-driven trigger (e.g., nightly) for incremental updates.
- Add a couple of Sheets formulas (company from domain, simplified role detection).
- Connect the Sheet to our CRM import or marketing tool.
What made it robust
- Per-message header parsing: finds addresses even when the threadโs โfirstโ message didnโt include them.
- De-dup with merge: on re-encounter, we update last_seen and counters.
- Direction inference: compares our โfromโ identities vs โto/ccโ to classify.
- Label echoing: preserves useful context like demo, trial, events/2024.
- Quiet logs + CSV-style run sheet: easy to debug and audit.
Alternate uses (beyond lead gen)
- Recruiting CRM: build a candidate/contact Sheet from hiring inboxes.
- Vendor catalog: inventory partner/supplier contacts youโve dealt with.
- Customer success watchlist: list of all stakeholders per account domain.
- Security sweep: detect unexpected personal domains that shouldnโt be in sensitive threads.
- Community ops: harvest speaker/attendee contacts from event inbox labels.
- Churn rescue: surface domains that went quiet (no messages since X days).
- Billing hygiene: find role accounts (e.g., ap@, billing@) across customers.
Get the script (free)
Want this Gmail โ Sheets harvester? Email Stacy Cook and heโll send the script and a 2-minute setup guide (paste into Apps Script, set your query & Sheet ID, run in dry-run first).
FAQ
Does it include BCC recipients?
If you sent the message, Gmail keeps your BCCs for your copyโyes. Otherwise, BCC of others isnโt visible.
Will it pull newsletters?
Only if your query includes Promotions/Forums. We default to excluding them.
Multiple mailboxes?
Use delegation or run the script per account and append to the same Sheet with a source_account column.
Will this hit quotas?
We batch reads, backoff on limits, and checkpoint. For huge mailboxes, let it run nightly increments.
If you want a tailored version (e.g., body signature parsing, CRM push, merge with website signups), tell Stacy Cook what you needโweโre happy to share and tweak. ๐
a. Want a version that pushes directly to HubSpot/Close and tags contacts with first_seen/last_seen?
b. Prefer a one-click UI (menu + dialog) where you paste the search query and pick the Sheet?

