/01 Why sitemaps matter
An XML sitemap is a structured file that lists the URLs on your site you want search engines to know about — optionally with metadata like when each page last changed. It isn't a ranking factor in itself; it's a discovery mechanism. It tells crawlers like Googlebot, "here is the canonical inventory of pages worth your time."
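To make the format concrete, here is a minimal sketch of what such a file contains, built with Python's standard library. It is an illustration of the sitemap protocol, not the code Sitemaps_Automation uses; the function name `build_urlset` and the example URLs are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a <urlset> from (url, lastmod-or-None) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
        if lastmod:  # <lastmod> is optional metadata
            ET.SubElement(url_el, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, for illustration only.
xml_text = build_urlset([
    ("https://www.example.com/new-page", "2026-05-27"),
    ("https://www.example.com/another-page", None),
])
print(xml_text)
```

The string returned here has no XML declaration; a file written to disk would normally start with one.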
Search engines don't need a sitemap to crawl a small, well-linked site — they'll follow links and find everything. Sitemaps earn their keep on the sites where link-following alone falls short:
- Large sites where deep pages are many clicks from the homepage and risk being missed or crawled late.
- Frequently updated sites where you want new and changed URLs discovered quickly, signalled via <lastmod>.
- Sites with weak internal linking: new pages, orphaned content, or sections with few inbound links.
- Sites with rich/media content where a sitemap is the cleanest way to surface URLs.
Google is explicit that a sitemap improves crawl discovery and indexing efficiency for exactly these cases. For the full picture, read Google's official guidance: Sitemaps overview · Google Search Central ↗.
Hand-maintaining a sitemap means it drifts the moment your URL inventory changes — stale entries, missing new pages, forgotten resubmissions. Automating generation and submission turns a recurring manual chore into a pipeline you run on a schedule. That's the entire point of the tool below.
/02 What this project does
Sitemaps_Automation is a small, dependency-light Python toolkit that takes
a list of URLs and turns it into a complete, Google-compliant set of XML sitemaps — then
(optionally) submits that set to Google Search Console for you. It's organised as a single
workflow script, sitemap_workflow.py, exposing three commands:
- update: read your URL list, build the XML sitemaps and a sitemap index, and write a run log.
- validate: sanity-check a generated sitemap index without touching the network.
- submit: push the generated sitemaps to your live Search Console property via the API.
The problem it solves: most teams either edit sitemaps by hand (slow, error-prone) or bolt sitemap output onto their CMS and still resubmit manually. This decouples it — feed it a CSV or text file of URLs from any source and it produces clean output you can diff, validate, and submit reproducibly.
What you get out of a run:
- Google-compliant XML sitemap files, split as needed.
- A sitemap_index.xml that points at them: the single URL you submit to Search Console.
- A SITEMAP_LOG.md that's appended with a summary on every run, so you have an audit trail.
Per the README, pandas and requests cover the
generation steps (1–4). The Google client libraries are only pulled in for the Search
Console submit step — so if you only want to generate sitemaps,
you can skip the heavier auth dependencies entirely.
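For a feel of how the output pieces relate, the sketch below splits a URL list into chunks and builds an index that points at one file per chunk. The sitemap protocol caps a single sitemap at 50,000 URLs, which is one reason output gets split. This is a standard-library illustration, not the project's implementation; the helper names and the file names `sitemap_blog.xml` and `sitemap_products.xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # limit set by the sitemap protocol


def chunked(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(base_url, filenames):
    """Build a <sitemapindex> whose <loc> entries point at the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in filenames:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = base_url.rstrip("/") + "/" + name
    return ET.tostring(index, encoding="unicode")


# Hypothetical file names, for illustration only.
index_xml = build_sitemap_index(
    "https://www.example.com",
    ["sitemap_blog.xml", "sitemap_products.xml"],
)
print(index_xml)
```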
/03 Setup & prerequisites
You'll need Python and Git installed locally. Clone the repository and install its dependencies:
# 1. Clone the repository
git clone https://github.com/kiranbabuthatha/Sitemaps_Automation.git
cd Sitemaps_Automation
# 2. (Recommended) create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # macOS / Linux
# .venv\Scripts\activate # Windows (PowerShell)
# 3. Install dependencies
pip install -r requirements.txt
That installs everything you need: pandas and requests for
building sitemaps, plus the google-* client libraries used later for the
Search Console submission step.
The README does not pin a specific Python version. A modern Python 3
(3.9 or newer) is a safe assumption, but confirm against your own environment — and if
the repo adds a .python-version or a documented minimum, that takes
precedence.
/04 Getting your credentials (client_secrets.json)
The generation steps need no credentials. Submitting to Search Console
does — you authenticate as yourself (OAuth) or as a service account. Here's the OAuth path,
which produces the client_secrets.json the tool expects.
Step 1 — Create a Google Cloud project
Go to the
Google Cloud Console ↗,
open the project picker, and create a new project (or reuse an existing one). Give it a name
you'll recognise, e.g. sitemaps-automation.
Step 2 — Enable the Search Console API
In the same project, open APIs & Services → Library, search for Google Search Console API, and click Enable. This is what lets the script talk to Search Console programmatically.
Step 3 — Configure the OAuth consent screen
Under APIs & Services → OAuth consent screen, choose External (unless you're on Google Workspace and prefer Internal), fill in the app name and your support email, and add your Google account as a Test user. You don't need to publish or verify the app for personal use — test-user access is enough to authorise the script.
Step 4 — Create and download the OAuth client
Go to APIs & Services → Credentials → Create credentials → OAuth client ID.
Choose Desktop app as the application type, create it, then click
Download JSON. Rename the downloaded file to
client_secrets.json and place it in the project root.
Google's official walkthrough for authorising Search Console API requests is here: Authorizing requests · Search Console API ↗.
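If you want a quick check that the downloaded file is the right kind, the sketch below inspects its shape. OAuth client files created as a Desktop app keep their settings under a top-level "installed" key (Web application clients use "web" instead). The function name `check_client_secrets` is made up for this example, and the tool itself may validate the file differently.

```python
import json
from pathlib import Path


def check_client_secrets(path="client_secrets.json"):
    """Return a list of problems with an OAuth client file (empty list: looks fine)."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} not found"]
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict) or "installed" not in data:
        return ['missing top-level "installed" key (was the client created as a Desktop app?)']
    required = ("client_id", "client_secret", "auth_uri", "token_uri")
    return [f'missing "installed.{key}"' for key in required if key not in data["installed"]]


if __name__ == "__main__":
    for problem in check_client_secrets():
        print("problem:", problem)
```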
client_secrets.json, service_account.json, and the
gsc_token.json generated during the OAuth flow are
secrets. Anyone with them can act against your Search Console property.
Never commit them to a public repo. The project already git-ignores
them; make sure your .gitignore contains:
# Google API credentials — keep these out of version control
client_secrets.json
service_account.json
gsc_token.json
If you ever do commit a secret by accident, treat it as compromised: revoke the client in the Cloud Console and generate a fresh one.
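A small guard like the following can run before a commit or in CI to confirm the three file names are listed. It is a rough sketch: it only looks for exact entries and does not evaluate glob patterns, so `git check-ignore <file>` remains the authoritative test. The function name is an assumption for this example.

```python
from pathlib import Path

# The three credential files named above.
SECRET_FILES = ("client_secrets.json", "service_account.json", "gsc_token.json")


def missing_from_gitignore(gitignore_path=".gitignore"):
    """Return the secret file names that have no exact entry in .gitignore."""
    path = Path(gitignore_path)
    lines = path.read_text(encoding="utf-8").splitlines() if path.is_file() else []
    # Keep non-empty, non-comment lines; exact matches only, no glob evaluation.
    patterns = {line.strip() for line in lines
                if line.strip() and not line.strip().startswith("#")}
    return [name for name in SECRET_FILES
            if name not in patterns and "/" + name not in patterns]


if __name__ == "__main__":
    missing = missing_from_gitignore()
    print("not ignored:", ", ".join(missing) if missing else "none")
```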
/05 Running the project
Generation is the update command. It reads your URL list and writes the
sitemaps to an output directory (./sitemaps by default, override with
--out).
Input format
Point --input at either a CSV or a plain text file:
- CSV: a url column is required; an optional lastmod column adds last-modified dates.
- TXT: one URL per line; lines starting with # are ignored.
url,lastmod
https://www.example.com/new-page,2026-05-27
https://www.example.com/another-page,2026-05-30
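The two input rules are simple enough to sketch in a few lines. This version uses the standard-library `csv` module rather than pandas, so it is a stand-in for the idea, not the project's own reader; `read_urls` is a hypothetical name.

```python
import csv
from pathlib import Path


def read_urls(path):
    """Return (url, lastmod-or-None) pairs from a .csv or .txt URL list."""
    path = Path(path)
    entries = []
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            if "url" not in (reader.fieldnames or []):
                raise ValueError("CSV input needs a 'url' column")
            for row in reader:
                url = (row.get("url") or "").strip()
                lastmod = (row.get("lastmod") or "").strip() or None
                if url:  # skip rows with an empty url cell
                    entries.append((url, lastmod))
        return entries
    # Plain text: one URL per line, '#' lines and blank lines are skipped.
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append((line, None))
    return entries
```

Returning the same pair shape for both formats keeps any later step (grouping, writing XML) independent of where the URLs came from.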
Generate the sitemaps
python sitemap_workflow.py update \
--site https://www.example.com \
--input examples/sample_urls.csv \
--base-url https://www.example.com \
--group-by path_depth --depth 1 \
--with-lastmod \
--out ./sitemaps
Here --group-by path_depth --depth 1 splits URLs into separate sitemaps by the
first path segment, and --with-lastmod includes the dates from your input.
When it finishes, ./sitemaps contains the individual sitemap files plus
sitemap_index.xml, and SITEMAP_LOG.md gets a fresh run summary.
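Grouping by path depth can be pictured as follows. The sketch buckets URLs by their first `depth` path segments; how the tool names its groups, and what it does with the homepage, is not documented here, so the "root" label and the function name are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_path_depth(urls, depth=1):
    """Bucket URLs by their first `depth` path segments."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # "root" is a made-up label for URLs with no path segments (the homepage).
        key = "/".join(segments[:depth]) or "root"
        groups[key].append(url)
    return dict(groups)


groups = group_by_path_depth([
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/post-2",
    "https://www.example.com/shop/item-9",
    "https://www.example.com/",
])
for name, members in groups.items():
    print(name, len(members))
```

With depth 1, each distinct first segment becomes one group, which in turn would map to one sitemap file.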
Validate the output
Before submitting anything, sanity-check the index. This is offline — no network, no credentials:
python sitemap_workflow.py validate ./sitemaps/sitemap_index.xml
The repo ships an offline test you can run to validate behaviour end-to-end before wiring up any live property:
python test_sitemap.py
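As a rough idea of what an offline check can cover, the sketch below parses an index, confirms each referenced sitemap exists next to it, and counts its URLs. It is not the logic behind the validate command, whose exact checks aren't described here. It also assumes each child sitemap is stored in the same directory as the index under the file name that ends its URL; `check_index` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}
MAX_URLS = 50000  # per-file limit from the sitemap protocol


def check_index(index_path):
    """Return a list of problems found in a sitemap index and its local child files."""
    index_path = Path(index_path)
    try:
        root = ET.parse(index_path).getroot()
    except (OSError, ET.ParseError) as exc:
        return [f"cannot read {index_path.name}: {exc}"]
    problems = []
    if root.tag != "{%s}sitemapindex" % SITEMAP_NS:
        problems.append(f"unexpected root element: {root.tag}")
    locs = [el.text.strip() for el in root.findall("sm:sitemap/sm:loc", NS) if el.text]
    if not locs:
        problems.append("index lists no sitemaps")
    for loc in locs:
        # Assumption: child files sit beside the index, named after the URL's last segment.
        child = index_path.parent / Path(urlsplit(loc).path).name
        if not child.is_file():
            problems.append(f"missing local file for {loc}")
            continue
        try:
            count = len(ET.parse(child).getroot().findall("sm:url", NS))
        except ET.ParseError as exc:
            problems.append(f"{child.name} is not well-formed XML: {exc}")
            continue
        if count == 0:
            problems.append(f"{child.name} contains no <url> entries")
        elif count > MAX_URLS:
            problems.append(f"{child.name} has {count} URLs (limit is {MAX_URLS})")
    return problems
```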
/06 Submitting to Google Search Console
Once you have a validated sitemap_index.xml, there are two ways to get it in
front of Google.
Option A — Manually, via the Search Console UI
The no-code route. First host the sitemap files on your domain (e.g. so
https://www.example.com/sitemap_index.xml resolves). Then:
- Open Google Search Console ↗ and select your property.
- Go to Indexing → Sitemaps in the left-hand menu.
- Under Add a new sitemap, enter sitemap_index.xml and click Submit.
Google's help article walks through this and explains the status reporting: Build and submit a sitemap · Search Console Help ↗.
Option B — Automatically, via the API
This is where the credentials from section /04 pay off. The submit command pushes
your sitemaps to the live property programmatically:
python sitemap_workflow.py submit \
--site https://www.example.com/ \
--base-url https://www.example.com \
--out ./sitemaps \
--credentials client_secrets.json --auth-mode oauth
The first run opens a browser to complete the OAuth consent and writes a reusable
gsc_token.json so subsequent runs don't re-prompt. The submit step targets your
live property and is confirmation-gated by design — it asks before it acts.
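Under the hood, a submission through the Search Console API comes down to one call, `sitemaps().submit(siteUrl=..., feedpath=...)`, on a service object from `google-api-python-client`. The sketch below shows that call in isolation. It is not taken from sitemap_workflow.py, and the function names and the confirmation helper are assumptions meant to mirror the "ask before acting" behaviour described above.

```python
def make_service(credentials):
    """Build a Search Console API client (needs google-api-python-client installed)."""
    from googleapiclient.discovery import build
    return build("searchconsole", "v1", credentials=credentials)


def confirmed(prompt, assume_yes=False, ask=input):
    """Return True if the user agrees, or if confirmation is skipped with assume_yes."""
    if assume_yes:
        return True
    return ask(f"{prompt} [y/N] ").strip().lower() in ("y", "yes")


def submit_index(service, site_url, base_url, assume_yes=False, ask=input):
    """Submit <base_url>/sitemap_index.xml to the property; return the URL or None."""
    feedpath = base_url.rstrip("/") + "/sitemap_index.xml"
    if not confirmed(f"Submit {feedpath} to {site_url}?", assume_yes, ask):
        return None
    service.sitemaps().submit(siteUrl=site_url, feedpath=feedpath).execute()
    return feedpath
```

The `feedpath` must be the public URL of the sitemap, so the files need to be hosted on your domain before the call is made.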
For scheduled / CI automation where no human is present to confirm, pass
--yes to skip the confirmation prompt. Combined with a service-account
credential (--auth-mode set accordingly), the whole generate → validate →
submit chain can run on a cron schedule.
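One way to chain the three commands for a scheduler is a thin Python wrapper that stops at the first failure. The flags are the ones shown in this post; the wrapper itself, its function names, and the assumption that the script exits non-zero on failure are mine. The `auth_mode` value for a service account is left to the caller because it isn't spelled out here.

```python
import subprocess
import sys


def pipeline_commands(site, base_url, input_path, out_dir, credentials, auth_mode):
    """Return the update, validate and submit command lines, in order."""
    script = [sys.executable, "sitemap_workflow.py"]
    return [
        script + ["update", "--site", site, "--input", input_path,
                  "--base-url", base_url, "--with-lastmod", "--out", out_dir],
        script + ["validate", f"{out_dir}/sitemap_index.xml"],
        # --yes skips the confirmation prompt for unattended runs.
        script + ["submit", "--site", site, "--base-url", base_url, "--out", out_dir,
                  "--credentials", credentials, "--auth-mode", auth_mode, "--yes"],
    ]


def run_pipeline(commands):
    """Run each command in turn; check=True raises CalledProcessError on a non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A cron entry would then call a small script that builds the commands with your own site, input file, and credentials and passes them to `run_pipeline`.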
/07 Wrap-up
That's the full loop: a URL list goes in, and Google-compliant sitemaps come out, validated
and submitted to Search Console — manually through the UI when you want a quick one-off, or
automatically through the API when you want it to run on its own. You've gone from
hand-editing sitemap.xml to a reproducible pipeline with an audit log.
From here, the obvious next step is to wire the update → validate →
submit chain into a scheduled job, fed by whatever produces your URL inventory —
your CMS, a crawl, or a database export.
Grab the code and automate it yourself
Sitemaps_Automation is open source. Clone it, run it against your own URL list, and star the repo if it saves you time.
View on GitHub ↗
Building something in the same space (sitemap pipelines, URL curation, or SEO automation at scale)? I do this for a living. See the Python SEO automation services or get in touch.