From Inbox to Insights: How We Turned Our Entire Gmail History into a Lead List in Google Sheets

TL;DR: We wrote a Google Apps Script that crawls our entire Gmail history, extracts every contactable email address (From/To/Cc/Bcc), dedupes and validates them, tags each with context, and writes it all into a Google Sheet for lead gen. It runs in batches with resume checkpoints, supports include/exclude filters for domains and labels, and produces a clean, CRM-ready table.


Why we built it

Over the years, our team has emailed thousands of people—prospects, partners, event signups, trial users, you name it. That history is gold… but it’s buried across threads, labels, and accounts. We wanted a one-time harvest and an ongoing trickle that turns inbox activity into a structured lead source—without buying yet another tool.


What the script does

  • Scans your Gmail mailbox using search queries (e.g., in:anywhere -category:promotions), in paged batches to respect quotas.
  • Extracts emails from From/To/Cc/Bcc headers (per message, not just top thread).
  • Optionally scrapes email-looking strings in bodies/signatures (off by default).
  • Normalizes (lowercase/trim), validates (a simple regex that approximates RFC address syntax), and dedupes by address.
  • Tags each address with lightweight context:
    • First seen / last seen date
    • Direction (inbound, outbound, mutual)
    • Source labels (e.g., lead@site, newsletter, events/2024)
    • Thread count & message count involving that address
  • Filters to exclude internal domains, role accounts (e.g., no-reply@), or anything on your denylist.
  • Writes to Google Sheets with a tidy schema and a timestamped “sync run” sheet for audits.
  • Supports dry-run (log only), resume (continues where it left off), and scheduled syncs (time-driven trigger).
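
The extract, normalize, validate, and dedupe steps above can be sketched in a few small functions. This is a minimal illustration, not the script itself: the function names and the regex are our assumptions, and the pattern is deliberately loose rather than a full RFC validator.

```javascript
// Hypothetical helpers; names are illustrative, not taken from the original script.

// Deliberately simple check: one "@", no spaces or separators, and a dotted
// domain ending in letters. It approximates address syntax only.
const EMAIL_PATTERN = /^[^\s@<>(),;:"]+@[^\s@<>(),;:"]+\.[a-z]{2,}$/;

function normalizeAddress(raw) {
  return raw.trim().toLowerCase();
}

function isValidAddress(address) {
  return EMAIL_PATTERN.test(address);
}

// Pull every address-looking token out of a raw header value such as
// 'Alex Doe <Alex@Customer.com>, "Sam" <sam@partner.io>'.
function extractAddresses(headerValue) {
  if (!headerValue) return [];
  const matches = headerValue.match(/[^\s<>(),;:"]+@[^\s<>(),;:"]+/g) || [];
  return matches.map(normalizeAddress).filter(isValidAddress);
}

// Collect the unique addresses across the From/To/Cc/Bcc values of one message.
function addressesForMessage(headers) {
  const seen = new Set();
  for (const field of ['from', 'to', 'cc', 'bcc']) {
    for (const address of extractAddresses(headers[field])) {
      seen.add(address);
    }
  }
  return Array.from(seen);
}
```

In Apps Script, the four header values would typically come from a GmailMessage via getFrom(), getTo(), getCc(), and getBcc(); passing them in as a plain object keeps the parsing testable outside Gmail.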

The column layout we use

column            example
email             alex@customer.com
first_seen        2021-03-12T09:44:10Z
last_seen         2025-09-18T16:27:03Z
direction         inbound / outbound / mutual
threads           14
messages          53
labels            sales, demo-request
source_accounts   alex@ourco.com (who on our side)
notes             free-text tags/flags
run_id            sync batch id

We keep a separate “runs” tab with start/end time, batch sizes, and error counts for observability.
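
To make the schema concrete, here is a small sketch of turning one in-memory contact record into a Sheet row in that column order. The record shape (firstSeen, sourceAccounts, and so on) and the function name are assumptions for illustration only.

```javascript
// Column order mirrors the layout above; the record shape is hypothetical.
const LEAD_COLUMNS = [
  'email', 'first_seen', 'last_seen', 'direction', 'threads',
  'messages', 'labels', 'source_accounts', 'notes', 'run_id'
];

// Format a Date as ISO 8601 in UTC without milliseconds, e.g. 2021-03-12T09:44:10Z.
function toIsoSeconds(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

// Convert one contact record into an array of cell values, one per column.
function recordToRow(record) {
  return [
    record.email,
    toIsoSeconds(record.firstSeen),
    toIsoSeconds(record.lastSeen),
    record.direction,
    record.threads,
    record.messages,
    record.labels.join(', '),
    record.sourceAccounts.join(', '),
    record.notes || '',
    record.runId
  ];
}
```

An array of such rows can be written in one call with Range.setValues(), which is usually far cheaper than appending row by row.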


Safety, quality, and compliance

  • Opt-out sources: We skip no-reply@, common mailers, and anything matching your denylist.
  • Internal hygiene: Exclude your own domains; optionally allowlist specific external domains.
  • GDPR/CAN-SPAM: This builds a first-party contact graph, not a cold list. Use responsibly:
    • Email only those who’ve opted in or where you have legitimate interest.
    • Maintain an unsubscribe list and honor it (store it in the Sheet/denylist).
  • Rate limits: Processes in small batches (e.g., 100 threads/run), with exponential backoff on 429/5xx.
  • Resume checkpoints: Stores lastThreadIndex and runId via PropertiesService, so restarts are safe.
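
The batching, backoff, and checkpoint ideas above fit together roughly like this. It is a sketch under assumptions: the services are passed in as parameters so it runs outside Apps Script, the function names are ours, and it retries on any thrown error rather than inspecting 429/5xx status codes.

```javascript
// Retry a call with exponential backoff. `sleep` is injected; in Apps Script
// it would be Utilities.sleep. Delays grow 1s, 2s, 4s, ... by default.
function withBackoff(fn, sleep, maxAttempts = 5, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // give up after the last attempt
      sleep(baseDelayMs * Math.pow(2, attempt));
    }
  }
}

// Process one batch of threads and persist where the next run should start.
// Assumed dependencies:
//   props  - has getProperty/setProperty (e.g. PropertiesService.getScriptProperties())
//   search - (query, start, max) => array of threads (e.g. GmailApp.search)
function runBatch({ props, search, sleep, handleThread, query, batchSize }) {
  const start = Number(props.getProperty('lastThreadIndex') || 0);
  const threads = withBackoff(() => search(query, start, batchSize), sleep);
  threads.forEach(handleThread);
  // The checkpoint is saved only after the whole batch is handled, so a crash
  // mid-batch repeats that batch instead of skipping it.
  props.setProperty('lastThreadIndex', String(start + threads.length));
  return { start, processed: threads.length, done: threads.length < batchSize };
}
```

Because a failed batch is replayed from its start index, the per-address merge needs to tolerate seeing the same message twice. An index-based checkpoint can also drift if new mail arrives between runs, so a date-bounded query may be a safer fit for incremental syncs.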

Config knobs you’ll change first

  • Search query: e.g., in:anywhere -category:promotions -label:spam.
  • Date window: one-time “all history” vs. rolling “last 30 days”.
  • Include/Exclude lists: domains, addresses, labels.
  • Body scraping: off by default; turn on if you want signature harvesting.
  • Destination Sheet: spreadsheet ID + sheet names for leads and runs.
  • Dry-run & batch size: start in dry-run mode; tune the batch size so each execution stays under 6 minutes.
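
As an illustration, those knobs could live in one config object, with a small helper that assembles the Gmail search string. Every key name and placeholder value here is an assumption, not the script's actual config.

```javascript
// Hypothetical config; key names and values are placeholders.
const CONFIG = {
  baseQuery: 'in:anywhere -category:promotions -label:spam',
  rollingDays: 30,            // set to null for a one-time "all history" run
  excludeLabels: [],          // label names to skip, e.g. ['newsletter']
  excludeDomains: ['ourco.com'],
  scrapeBodies: false,        // body/signature scraping stays off by default
  spreadsheetId: 'YOUR_SPREADSHEET_ID',
  leadsSheetName: 'leads',
  runsSheetName: 'runs',
  dryRun: true,
  batchSize: 100
};

// Build the search string from the config. newer_than:Nd is Gmail's
// relative-date operator, used here for the rolling window.
function buildQuery(config) {
  const parts = [config.baseQuery];
  if (config.rollingDays) parts.push('newer_than:' + config.rollingDays + 'd');
  for (const label of config.excludeLabels) parts.push('-label:' + label);
  return parts.join(' ');
}
```

With the defaults above, buildQuery(CONFIG) yields in:anywhere -category:promotions -label:spam newer_than:30d. Domain exclusions are left out of the query on purpose in this sketch; filtering them after extraction keeps the search string short.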

Our workflow

  1. Dry-run with conservative filters (-category:promotions, exclude internal).
  2. Live sync across a few batches; spot-check top rows.
  3. Set a time-driven trigger (e.g., nightly) for incremental updates.
  4. Add a couple of Sheets formulas (company from domain, simplified role detection).
  5. Connect the Sheet to our CRM import or marketing tool.
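
For step 4, the same enrichment logic can also be written as small JavaScript helpers instead of cell formulas. The sketch below is intentionally naive: the role list is a made-up starting point, and the company guess simply takes the first domain label, so it will misfire on subdomains such as mail.customer.co.uk.

```javascript
// Hypothetical starter list of role-style local parts; extend it to taste.
const ROLE_LOCAL_PARTS = [
  'no-reply', 'noreply', 'info', 'support', 'billing', 'ap', 'sales', 'admin'
];

// True when the part before "@" is a known role name.
function isRoleAccount(email) {
  const local = email.split('@')[0].toLowerCase();
  return ROLE_LOCAL_PARTS.includes(local);
}

// Naive company guess: first label of the domain, capitalized.
// 'alex@customer.com' becomes 'Customer'; inputs without "@" return ''.
function companyFromDomain(email) {
  const domain = (email.split('@')[1] || '').toLowerCase();
  const name = domain.split('.')[0];
  return name ? name.charAt(0).toUpperCase() + name.slice(1) : '';
}
```

In a script bound to the spreadsheet, functions like these can be called from cells as custom functions, which keeps the logic in one place if the role list grows.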

What made it robust

  • Per-message header parsing: finds addresses even when the thread’s “first” message didn’t include them.
  • De-dup with merge: on re-encounter, we update last_seen and counters.
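  • Direction inference: checks whether each message was sent from one of our own identities or received from the contact, then classifies the address as inbound, outbound, or mutual.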
  • Label echoing: preserves useful context like demo, trial, events/2024.
  • Quiet logs + CSV-style run sheet: easy to debug and audit.
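
Direction inference and de-dup with merge can be sketched together, since both act on one "sighting" of an address at a time. The record fields, function names, and the Map-based index are assumptions for illustration.

```javascript
// Classify a single message from our point of view: if one of our own
// identities sent it, contacts on it are outbound; otherwise inbound.
function messageDirection(fromAddress, ourAddresses) {
  return ourAddresses.includes(fromAddress) ? 'outbound' : 'inbound';
}

// Merge one sighting of an address into the index (a Map keyed by address).
// A sighting is { email, date, direction, threadId, labels }.
function mergeSighting(index, sighting) {
  const { email, date, direction, threadId, labels } = sighting;
  let rec = index.get(email);
  if (!rec) {
    rec = {
      email, firstSeen: date, lastSeen: date, direction,
      threadIds: new Set(), messages: 0, labels: new Set()
    };
    index.set(email, rec);
  }
  if (date < rec.firstSeen) rec.firstSeen = date;
  if (date > rec.lastSeen) rec.lastSeen = date;
  // Seeing both directions for the same address makes it mutual, and it stays so.
  if (rec.direction !== direction) rec.direction = 'mutual';
  rec.threadIds.add(threadId); // the Set keeps the thread count distinct
  rec.messages += 1;
  labels.forEach((label) => rec.labels.add(label));
  return rec;
}
```

The thread count for the Sheet is then rec.threadIds.size. Note that this simple counter would double-count a message that is replayed after a failed batch, so a real run might track message IDs as well.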

Alternate uses (beyond lead gen)

  • Recruiting CRM: build a candidate/contact Sheet from hiring inboxes.
  • Vendor catalog: inventory partner/supplier contacts you’ve dealt with.
  • Customer success watchlist: list of all stakeholders per account domain.
  • Security sweep: detect unexpected personal domains that shouldn’t be in sensitive threads.
  • Community ops: harvest speaker/attendee contacts from event inbox labels.
  • Churn rescue: surface domains that went quiet (no messages since X days).
  • Billing hygiene: find role accounts (e.g., ap@, billing@) across customers.

Get the script (free)

Want this Gmail → Sheets harvester? Email Stacy Cook and he’ll send the script and a 2-minute setup guide (paste into Apps Script, set your query & Sheet ID, run in dry-run first).


FAQ

Does it include BCC recipients?
Yes, for messages you sent: Gmail keeps the Bcc recipients on your own copy. Bcc recipients on messages other people sent are not visible to you.

Will it pull newsletters?
Only if your query includes them. Our default query excludes the Promotions category; add -category:forums if you also want to skip Forums.

Multiple mailboxes?
Use delegation, or run the script per account and append to the same Sheet, using the source_accounts column to record which mailbox each row came from.

Will this hit quotas?
We batch reads, back off when we hit limits, and checkpoint progress. For huge mailboxes, let it run in nightly increments.


If you want a tailored version (e.g., body signature parsing, CRM push, merge with website signups), tell Stacy Cook what you need—we’re happy to share and tweak. 🚀

a. Want a version that pushes directly to HubSpot/Close and tags contacts with first_seen/last_seen?
b. Prefer a one-click UI (menu + dialog) where you paste the search query and pick the Sheet?

Published by
stacy
