/01 Why sitemaps matter

An XML sitemap is a structured file that lists the URLs on your site you want search engines to know about — optionally with metadata like when each page last changed. It isn't a ranking factor in itself; it's a discovery mechanism. It tells crawlers like Googlebot, "here is the canonical inventory of pages worth your time."

Search engines don't need a sitemap to crawl a small, well-linked site — they'll follow links and find everything. Sitemaps earn their keep on the sites where link-following alone falls short:

  • Large sites where deep pages are many clicks from the homepage and risk being missed or crawled late.
  • Frequently updated sites where you want new and changed URLs discovered quickly, signalled via <lastmod>.
  • Sites with weak internal linking — new pages, orphaned content, or sections with few inbound links.
  • Sites with rich/media content where a sitemap is the cleanest way to surface URLs.

Google is explicit that a sitemap improves crawl discovery and indexing efficiency for exactly these cases. For the full picture, read Google's official guidance: Sitemaps overview · Google Search Central ↗.
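To make the file format concrete, here is a minimal sketch of building a sitemap with only the Python standard library. It is not taken from Sitemaps_Automation; the function name `build_urlset` is hypothetical, and the only protocol details assumed are the sitemaps.org namespace and the `<url>`, `<loc>`, and optional `<lastmod>` elements.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a sitemap from (url, lastmod_or_None) pairs and return UTF-8 bytes.

    Hypothetical helper for illustration; not the project's own code.
    """
    # Serialise the sitemap namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in entries:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:  # <lastmod> is optional metadata
            ET.SubElement(node, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_urlset([
        ("https://www.example.com/new-page", "2026-05-27"),
        ("https://www.example.com/another-page", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

Each `<url>` entry carries one `<loc>`; the `<lastmod>` child appears only when a date is supplied.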

Why automate it

Hand-maintaining a sitemap means it drifts the moment your URL inventory changes — stale entries, missing new pages, forgotten resubmissions. Automating generation and submission turns a recurring manual chore into a pipeline you run on a schedule. That's the entire point of the tool below.

/02 What this project does

Sitemaps_Automation is a small, dependency-light Python toolkit that takes a list of URLs and turns it into a complete, Google-compliant set of XML sitemaps — then (optionally) submits that set to Google Search Console for you. It's organised as a single workflow script, sitemap_workflow.py, exposing three commands:

  • update — read your URL list, build the XML sitemaps and a sitemap index, and write a run log.
  • validate — sanity-check a generated sitemap index without touching the network.
  • submit — push the generated sitemaps to your live Search Console property via the API.

The problem it solves: most teams either edit sitemaps by hand (slow, error-prone) or bolt sitemap output onto their CMS and still resubmit manually. This decouples it — feed it a CSV or text file of URLs from any source and it produces clean output you can diff, validate, and submit reproducibly.

What you get out of a run:

  • Google-compliant XML sitemap files, split as needed.
  • A sitemap_index.xml that points at them — the single URL you submit to Search Console.
  • A SITEMAP_LOG.md that's appended with a summary on every run, so you have an audit trail.

Lightweight by design

Per the README, pandas and requests cover the generation steps (1–4). The Google client libraries are only pulled in for the Search Console submit step — so if you only want to generate sitemaps, you can skip the heavier auth dependencies entirely.
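The "split as needed" plus index output described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the helper names and the `sitemap-N.xml` file naming are hypothetical, and the split here is a plain size-based chunk at the sitemaps.org limit of 50,000 URLs per file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls):
    """Build a sitemap index that points at each child sitemap URL."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        node = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
    parts = chunk(urls)
    # Hypothetical naming scheme for the child files.
    children = [
        f"https://www.example.com/sitemap-{n}.xml"
        for n in range(1, len(parts) + 1)
    ]
    print(build_index(children).decode("utf-8"))
```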

/03 Setup & prerequisites

You'll need Python and Git installed locally. Clone the repository and install its dependencies:

bash · terminal
# 1. Clone the repository
git clone https://github.com/kiranbabuthatha/Sitemaps_Automation.git
cd Sitemaps_Automation

# 2. (Recommended) create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate          # macOS / Linux
# .venv\Scripts\activate           # Windows (PowerShell)

# 3. Install dependencies
pip install -r requirements.txt

That installs everything you need: pandas and requests for building sitemaps, plus the google-* client libraries used later for the Search Console submission step.

Verify before you copy-paste

The README does not pin a specific Python version. A modern Python 3 (3.9 or newer) is a safe assumption, but confirm against your own environment; if the repo adds a .python-version file or a documented minimum, that takes precedence.

/04 Getting your credentials (client_secrets.json)

The generation steps need no credentials. Submitting to Search Console does — you authenticate as yourself (OAuth) or as a service account. Here's the OAuth path, which produces the client_secrets.json the tool expects.

Step 1 — Create a Google Cloud project

Go to the Google Cloud Console ↗, open the project picker, and create a new project (or reuse an existing one). Give it a name you'll recognise, e.g. sitemaps-automation.

Step 2 — Enable the Search Console API

In the same project, open APIs & Services → Library, search for Google Search Console API, and click Enable. This is what lets the script talk to Search Console programmatically.

Step 3 — Configure the OAuth consent screen

Under APIs & Services → OAuth consent screen, choose External (unless you're on Google Workspace and prefer Internal), fill in the app name and your support email, and add your Google account as a Test user. You don't need to publish or verify the app for personal use — test-user access is enough to authorise the script.

Step 4 — Create and download the OAuth client

Go to APIs & Services → Credentials → Create credentials → OAuth client ID. Choose Desktop app as the application type, create it, then click Download JSON. Rename the downloaded file to client_secrets.json and place it in the project root.

Google's official walkthrough for authorising Search Console API requests is here: Authorizing requests · Search Console API ↗.
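Before running the OAuth flow, it can help to confirm that the downloaded file is the right kind of client. The sketch below assumes the usual shape of a Desktop app download, which keeps its fields under a top-level `installed` key (a Web application client uses `web` instead). The function name `check_client_secrets` is hypothetical and not part of the tool.

```python
import json


def check_client_secrets(text):
    """Return the client_id if `text` looks like a Desktop app OAuth client file.

    Raises ValueError with a short explanation otherwise.
    """
    data = json.loads(text)
    if not isinstance(data, dict) or "installed" not in data:
        raise ValueError("expected a top-level 'installed' key (Desktop app client)")
    missing = [
        key for key in ("client_id", "client_secret")
        if not data["installed"].get(key)
    ]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return data["installed"]["client_id"]


if __name__ == "__main__":
    with open("client_secrets.json", encoding="utf-8") as handle:
        print("OAuth client looks usable:", check_client_secrets(handle.read()))
```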

Security — never commit your credentials

client_secrets.json, service_account.json, and the gsc_token.json generated during the OAuth flow are secrets. Anyone with them can act against your Search Console property. Never commit them to a public repo. The project already git-ignores them; make sure your .gitignore contains:

gitignore · .gitignore
# Google API credentials — keep these out of version control
client_secrets.json
service_account.json
gsc_token.json

If you ever do commit a secret by accident, treat it as compromised: revoke the client in the Cloud Console and generate a fresh one.
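A small pre-commit style check for the ignore rules above might look like this. It is a deliberately simple sketch: it looks for the three file names as literal lines in .gitignore and does not evaluate glob patterns, so `git check-ignore` remains the authoritative test. The function name is hypothetical.

```python
REQUIRED = ("client_secrets.json", "service_account.json", "gsc_token.json")


def missing_ignore_entries(gitignore_text, required=REQUIRED):
    """Return the required file names that have no literal entry in .gitignore."""
    patterns = set()
    for line in gitignore_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blanks and comments
            patterns.add(line.lstrip("/"))     # treat "/name" and "name" alike
    return [name for name in required if name not in patterns]


if __name__ == "__main__":
    with open(".gitignore", encoding="utf-8") as handle:
        missing = missing_ignore_entries(handle.read())
    if missing:
        raise SystemExit(f"Not ignored: {', '.join(missing)}")
    print("All credential files are listed in .gitignore")
```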

/05 Running the project

Generation is the update command. It reads your URL list and writes the sitemaps to an output directory (./sitemaps by default, override with --out).

Input format

Point --input at either a CSV or a plain text file:

  • CSV — a url column is required; an optional lastmod column adds last-modified dates.
  • TXT — one URL per line; lines starting with # are ignored.

csv · sample_urls.csv
url,lastmod
https://www.example.com/new-page,2026-05-27
https://www.example.com/another-page,2026-05-30
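The two input formats can be read with a few lines of standard-library Python. This sketch is not the tool's loader (the README points to pandas for that); it only mirrors the rules stated above: a required `url` column and optional `lastmod` column for CSV, and one URL per line with `#` comments skipped for TXT. The name `load_urls` is hypothetical.

```python
import csv
import io


def load_urls(text, fmt):
    """Parse input text into a list of (url, lastmod_or_None) tuples."""
    if fmt == "csv":
        reader = csv.DictReader(io.StringIO(text))
        if "url" not in (reader.fieldnames or []):
            raise ValueError("CSV input needs a 'url' column")
        rows = []
        for row in reader:
            url = (row.get("url") or "").strip()
            if url:
                # An absent or empty lastmod cell becomes None.
                lastmod = (row.get("lastmod") or "").strip() or None
                rows.append((url, lastmod))
        return rows
    if fmt == "txt":
        lines = (line.strip() for line in text.splitlines())
        return [(line, None) for line in lines if line and not line.startswith("#")]
    raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    sample = "url,lastmod\nhttps://www.example.com/new-page,2026-05-27\n"
    print(load_urls(sample, "csv"))
```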

Generate the sitemaps

bash · terminal
python sitemap_workflow.py update \
    --site https://www.example.com \
    --input examples/sample_urls.csv \
    --base-url https://www.example.com \
    --group-by path_depth --depth 1 \
    --with-lastmod \
    --out ./sitemaps

Here --group-by path_depth --depth 1 splits URLs into separate sitemaps by the first path segment, and --with-lastmod includes the dates from your input. When it finishes, ./sitemaps contains the individual sitemap files plus sitemap_index.xml, and SITEMAP_LOG.md gets a fresh run summary.
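Grouping by path depth can be pictured with the sketch below. It shows one plausible reading of "split by the first path segment" and may differ from how the tool handles edge cases; the function names and the `root` bucket for the homepage are assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_key(url, depth=1):
    """Return the first `depth` path segments of a URL, joined by '/'."""
    segments = [part for part in urlsplit(url).path.split("/") if part]
    # URLs with no path segments (the homepage) go into a "root" bucket.
    return "/".join(segments[:depth]) or "root"


def group_by_path_depth(urls, depth=1):
    """Bucket URLs by their leading path segments, keeping input order."""
    groups = defaultdict(list)
    for url in urls:
        groups[group_key(url, depth)].append(url)
    return dict(groups)


if __name__ == "__main__":
    urls = [
        "https://www.example.com/blog/first-post",
        "https://www.example.com/blog/second-post",
        "https://www.example.com/shop/widget",
        "https://www.example.com/",
    ]
    for key, members in group_by_path_depth(urls).items():
        print(key, len(members))
```

With a depth of 1, every top-level page forms its own group in this sketch, so a real implementation would likely need a rule for merging small groups.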

Validate the output

Before submitting anything, sanity-check the index. This is offline — no network, no credentials:

bash · terminal
python sitemap_workflow.py validate ./sitemaps/sitemap_index.xml

Test without network access

The repo ships an offline test you can run to validate behaviour end-to-end before wiring up any live property:

python test_sitemap.py
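For a sense of what an offline sanity check on a sitemap index can cover, here is a minimal sketch. The specific checks are assumptions chosen for illustration (well-formed XML, the expected root element, an absolute `<loc>` per entry) and are not a description of what the validate command actually verifies.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_index(xml_bytes):
    """Return a list of problems found in a sitemap index (empty if none)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{NS}sitemapindex":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    entries = root.findall(f"{NS}sitemap")
    if not entries:
        problems.append("index lists no sitemaps")
    for position, entry in enumerate(entries, start=1):
        loc = (entry.findtext(f"{NS}loc") or "").strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"entry {position}: missing or non-absolute <loc>")
    return problems


if __name__ == "__main__":
    with open("./sitemaps/sitemap_index.xml", "rb") as handle:
        issues = check_index(handle.read())
    print("OK" if not issues else "\n".join(issues))
```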

/06 Submitting to Google Search Console

Once you have a validated sitemap_index.xml, there are two ways to get it in front of Google.

Option A — Manually, via the Search Console UI

The no-code route. First host the sitemap files on your domain (e.g. so https://www.example.com/sitemap_index.xml resolves). Then:

  • Open Google Search Console ↗ and select your property.
  • Go to Indexing → Sitemaps in the left-hand menu.
  • Under Add a new sitemap, enter sitemap_index.xml and click Submit.

Google's help article walks through this and explains the status reporting: Build and submit a sitemap · Search Console Help ↗.

Option B — Automatically, via the API

This is where the credentials from section /04 pay off. The submit command pushes your sitemaps to the live property programmatically:

bash · terminal
python sitemap_workflow.py submit \
    --site https://www.example.com/ \
    --base-url https://www.example.com \
    --out ./sitemaps \
    --credentials client_secrets.json --auth-mode oauth

The first run opens a browser to complete the OAuth consent and writes a reusable gsc_token.json so subsequent runs don't re-prompt. The submit step targets your live property and is confirmation-gated by design — it asks before it acts.
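For readers curious what an API submission involves, the sketch below shows the general shape using Google's Python client libraries. It is an assumption about how such a step can be written, not the project's code: it uses `InstalledAppFlow` from google-auth-oauthlib for the browser consent and the Search Console API's `sitemaps().submit` method, and it leaves out the token caching that produces gsc_token.json. The helper names are hypothetical.

```python
# Read/write scope for Search Console (needed to submit sitemaps).
SCOPES = ["https://www.googleapis.com/auth/webmasters"]


def build_service(client_secrets_path="client_secrets.json"):
    """Run the OAuth browser flow and return a Search Console API client."""
    # Imported lazily so the rest of the sketch works without the Google libraries.
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    flow = InstalledAppFlow.from_client_secrets_file(client_secrets_path, scopes=SCOPES)
    creds = flow.run_local_server(port=0)  # opens a browser for consent
    return build("searchconsole", "v1", credentials=creds)


def index_url(base_url, filename="sitemap_index.xml"):
    """Join the public base URL and the index file name."""
    return f"{base_url.rstrip('/')}/{filename}"


def submit_sitemap(service, site_url, sitemap_url):
    """Submit one sitemap URL for a Search Console property."""
    service.sitemaps().submit(siteUrl=site_url, feedpath=sitemap_url).execute()
    return sitemap_url


if __name__ == "__main__":
    service = build_service()
    submitted = submit_sitemap(
        service,
        "https://www.example.com/",
        index_url("https://www.example.com"),
    )
    print("Submitted", submitted)
```

Passing the service object into `submit_sitemap` keeps the network call easy to replace with a stub when testing.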

Running it unattended

For scheduled / CI automation where no human is present to confirm, pass --yes to skip the confirmation prompt. Combined with a service-account credential (--auth-mode set accordingly), the whole generate → validate → submit chain can run on a cron schedule.
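A confirmation gate with a --yes override is a common CLI pattern, and can be sketched as below. This illustrates the pattern only; the prompt wording and function names are hypothetical, and the tool's own gate may behave differently.

```python
import argparse


def parse(argv):
    """Parse a minimal command line that only knows the --yes flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--yes", action="store_true",
                        help="skip the confirmation prompt")
    return parser.parse_args(argv)


def confirm(prompt, assume_yes=False, ask=input):
    """Return True if the action should proceed.

    `ask` is injectable so the gate can be exercised without a terminal.
    """
    if assume_yes:
        return True
    return ask(f"{prompt} [y/N] ").strip().lower() in ("y", "yes")


if __name__ == "__main__":
    import sys

    args = parse(sys.argv[1:])
    if confirm("Submit sitemaps to the live property?", assume_yes=args.yes):
        print("Submitting...")
    else:
        print("Aborted.")
```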

/07 Wrap-up

That's the full loop: a URL list goes in, and Google-compliant sitemaps come out, validated and submitted to Search Console — manually through the UI when you want a quick one-off, or automatically through the API when you want it to run on its own. You've gone from hand-editing sitemap.xml to a reproducible pipeline with an audit log.

From here, the obvious next step is to wire the update → validate → submit chain into a scheduled job, fed by whatever produces your URL inventory: your CMS, a crawl, or a database export.
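One way to wire that chain into a scheduled job is a small wrapper script that cron or a CI runner calls. The sketch below only reuses flags shown earlier in this article. It assumes that each command exits with a non-zero status on failure and that OAuth mode has already been authorised once; the wrapper's function names and the input path are hypothetical.

```python
import subprocess
import sys


def build_chain(site, base_url, input_path, out_dir="./sitemaps",
                credentials="client_secrets.json"):
    """Return the three commands of the update, validate, submit chain."""
    script = [sys.executable, "sitemap_workflow.py"]
    index = f"{out_dir.rstrip('/')}/sitemap_index.xml"
    return [
        script + ["update", "--site", site, "--input", input_path,
                  "--base-url", base_url, "--out", out_dir],
        script + ["validate", index],
        script + ["submit", "--site", site, "--base-url", base_url,
                  "--out", out_dir, "--credentials", credentials,
                  "--auth-mode", "oauth", "--yes"],
    ]


def run_chain(commands):
    """Run commands in order; check=True raises and stops at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_chain(build_chain(
        site="https://www.example.com/",
        base_url="https://www.example.com",
        input_path="examples/sample_urls.csv",
    ))
```

Stopping at the first failure means a sitemap set that fails validation is never submitted.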

Grab the code and automate it yourself

Sitemaps_Automation is open source. Clone it, run it against your own URL list, and star the repo if it saves you time.

View on GitHub

Building something in the same space — sitemap pipelines, URL curation, or SEO automation at scale? I do this for a living. See the Python SEO automation services or get in touch.