If your entire site or large sections are missing from Google's index, you need a systematic recovery plan. This checklist moves from manual review through crawl requests, penalty checks, and site structure audits to get you back in the index.
When you type 'site:yourdomain.com' into Google and see zero results, panic sets in. But the fix is rarely a mystery. Most indexing failures fall into three buckets: Google can't reach your pages, Google chooses not to index them, or Google indexed them briefly and removed them. Each requires a different response.
A common situation we see is a site with 2,000 product pages returning only 47 indexed URLs. The owner assumes a penalty. The reality is often simpler: a misconfigured noindex tag on the template, a disallowed path in robots.txt, or a server that times out on 30% of crawl requests. Rule out each of these possibilities before concluding that you are facing a manual action or a content quality problem.
Start with the checklist below and work top to bottom. Do not skip the site structure audit: many recoveries stall because people fix the crawl but ignore the thin content signal.
| Symptom | Root Cause | Quick Fix | Failure Mode / Risk |
|---|---|---|---|
| Zero indexed pages (site:domain returns 0 results) | robots.txt blocks everything, or the server returns 5xx errors | Check robots.txt for 'Disallow: /' or overly broad wildcards. Verify server responses with the URL Inspection tool in Search Console. | Unblocking everything at once can trigger a burst of crawling. If your server slows down or returns 5xx errors under that load, Google will lower its crawl rate. |
| Only homepage indexed (internal pages missing) | Noindex tag on page templates, or internal links that only exist in JavaScript | Crawl the site with Screaming Frog and filter for noindex in meta robots. Check for rel='canonical' tags that point to the wrong URL. | Fixing the tag alone doesn't guarantee a re-crawl. Submit an updated sitemap and request indexing in Search Console. |
| Pages drop after 2-3 weeks (indexed, then removed) | Thin or duplicate content caught by Panda-style quality filters | Consolidate near-duplicate pages with 301 redirects or canonical tags. Add unique text blocks to category pages. | Google may treat thin pages as soft 404s. If you re-add them without improving the content, they will likely be removed again within days. |
| Site indexed but key sections missing (blog or product categories absent) | JavaScript rendering failure: Google cannot see the links or content | Use the URL Inspection tool's 'View Tested Page' screenshot and rendered HTML to see what Google actually renders. | If you rely on client-side rendering, Google may not see internal links. Server-side rendering or pre-rendering is often required. |
| New pages never appear (fresh content ignored) | Limited crawl budget, or orphan pages with no internal links | Add new pages to the XML sitemap within 24 hours. Link to them from a high-authority page on your domain. | Google tends to crawl pages close to the homepage more often than deeply nested ones. If your crawl budget is under 50 pages/day on a 5,000-page site, new pages may wait weeks. |
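The crawl-budget row comes down to simple arithmetic. The sketch below assumes a constant daily crawl rate and assumes every crawl hits a different URL. In practice Google recrawls popular pages repeatedly, so the real time to reach every page is longer and this figure is a lower bound. `days_for_full_crawl` is a hypothetical helper name.

```python
import math


def days_for_full_crawl(total_pages: int, pages_per_day: float) -> int:
    """Minimum days to crawl every page once at a constant daily crawl rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    # Round up: a partial day still counts as a day of waiting.
    return math.ceil(total_pages / pages_per_day)


# The table's example: a 5,000-page site crawled at 50 pages/day.
print(days_for_full_crawl(5_000, 50))  # 100
```

At roughly 100 days for a single pass, a new page that is neither in the sitemap nor linked prominently can easily wait weeks before Google reaches it.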
1. Run 'site:yourdomain.com' in Google. Count indexed pages and check for sudden drops.
2. Look for 'Disallow: /' or wildcards blocking CSS/JS. Check the file with the robots.txt report in Search Console.
3. Crawl with Screaming Frog. Filter for 'noindex' in meta robots tags or X-Robots-Tag HTTP headers.
4. Use URL Inspection to view the rendered HTML. Confirm Google sees your content and links.
5. Correct the issues, submit an updated sitemap, and use 'Request Indexing' for critical pages.
6. Watch Search Console crawl stats for 5xx errors, blocked resources, and crawl rate changes.
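A crawler runs the noindex check from the steps above at scale, but spot-checking a single page is easy to script. The sketch below uses only Python's standard library. It assumes you have already fetched the page's HTML and response headers; fetching is left out. It does not see noindex tags that JavaScript injects after rendering. `has_noindex` is a hypothetical helper, not part of any SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lower-cases tag and attribute names; values may be None.
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(values.get("content", ""))


def _tokens(directive: str) -> list:
    """Split 'googlebot: noindex, nofollow' into ['noindex', 'nofollow']."""
    return [part.split(":")[-1].strip().lower() for part in directive.split(",")]


def has_noindex(html: str, headers: dict) -> bool:
    """True if a robots meta tag or an X-Robots-Tag header blocks indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # HTTP header names are case-insensitive.
    header_values = [v for k, v in headers.items() if k.lower() == "x-robots-tag"]
    # 'none' is shorthand for 'noindex, nofollow'.
    return any({"noindex", "none"} & set(_tokens(d)) for d in parser.directives + header_values)


page = '<head><meta name="robots" content="noindex, follow"></head>'
print(has_noindex(page, {}))  # True
print(has_noindex("<p>ok</p>", {"X-Robots-Tag": "googlebot: noindex"}))  # True
print(has_noindex("<p>ok</p>", {}))  # False
```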
The situation: A client with a 2,500-page e-commerce site had only 72 pages indexed. The SEO team had spent three weeks chasing a phantom penalty.
Step 1: We ran a crawl with Screaming Frog SEO Spider (30 threads, 5-second timeout, respecting robots.txt). We exported the 'All URLs' report and filtered it by status code and meta robots directive.
Step 2: We found that 1,840 of the 2,500 pages (73.6%) carried a `<meta name="robots" content="noindex, follow">` tag. A legacy plugin, installed three months earlier, injected the tag on every product page with a stock level below 5. The client had kept adding new products, but each one was set to 'low stock' by default, so the noindex tag fired on all of them.
Step 3: We removed the plugin and ran a bulk removal of the noindex tag via a database query. We then submitted the XML sitemap (2,476 URLs) through Google Search Console and used the 'Request Indexing' feature on the top 200 category and product pages.
Result: Within 11 days, indexed pages went from 72 to 2,103. The remaining 373 were either out of stock or had duplicate content issues that we handled separately.
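The Step 1 filtering can be reproduced on any crawl export saved as CSV. Grouping the noindex URLs by top-level folder is a quick way to spot a template-wide problem like the plugin in Step 2. In this case study the same calculation gives 1,840 / 2,500 = 73.6%. The sketch below is a minimal version. The column names 'Address' and 'Meta Robots 1' are assumptions about the export format, so adjust them to match your file. The sample rows are made up.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse

# Assumed column names; check them against the header row of your export.
URL_COLUMN = "Address"
ROBOTS_COLUMN = "Meta Robots 1"


def summarise_noindex(csv_text: str):
    """Return (noindex_count, total_rows, noindex_count_by_top_level_folder)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_folder = Counter()
    for row in rows:
        if "noindex" in (row.get(ROBOTS_COLUMN) or "").lower():
            path = urlparse(row[URL_COLUMN]).path.strip("/")
            folder = "/" + path.split("/")[0] if path else "/"
            by_folder[folder] += 1
    return sum(by_folder.values()), len(rows), by_folder


sample = """Address,Status Code,Meta Robots 1
https://shop.example.com/products/a,200,"noindex, follow"
https://shop.example.com/products/b,200,"noindex, follow"
https://shop.example.com/category/widgets,200,
https://shop.example.com/,200,
"""
noindex, total, by_folder = summarise_noindex(sample)
print(f"{noindex}/{total} pages ({100 * noindex / total:.1f}%) carry noindex")
print(by_folder.most_common())
```

When one folder dominates the counts, the cause is usually a shared template or plugin rather than page-by-page mistakes.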
Blocked CSS and JavaScript. This is the most common edge case we see: a robots.txt file blocks the wp-content/ or dist/ folders. Google then renders a page with no styles and no interactive elements, may judge it low quality, and drop it. Fix: unblock CSS and JS, then resubmit the sitemap.
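To see which assets a robots.txt file blocks, you can test a handful of asset URLs against it. The sketch below uses Python's built-in `urllib.robotparser` with a robots.txt matching the wp-content/ and dist/ example above. The URLs are hypothetical. Python's parser only does prefix matching, without Google's `*` and `$` wildcards, and it applies the first matching rule rather than Google's most-specific rule. Treat its answer as a first pass and confirm it in Search Console.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/
Disallow: /dist/
"""

# Hypothetical URLs a typical page needs: a stylesheet, a script bundle, and the page itself.
URLS = [
    "https://www.example.com/wp-content/themes/shop/style.css",
    "https://www.example.com/dist/app.js",
    "https://www.example.com/products/blue-widget",
]


def blocked_urls(robots_txt: str, urls: list, agent: str = "Googlebot") -> list:
    """Return the URLs that robots_txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


for url in blocked_urls(ROBOTS_TXT, URLS):
    print("BLOCKED:", url)
```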
Wrong filters in Search Console. The Page indexing report (formerly Index Coverage) can mislead you if you only look at errors. You may miss 'Not indexed' reasons such as 'Crawled - currently not indexed' (a sign of content quality issues) or 'Discovered - currently not indexed' (a sign of crawl budget issues). Always look at the full breakdown.
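One way to get the full breakdown is to export the not-indexed URLs with their reasons and count them by the kind of fix each reason suggests. The sketch below assumes you have already turned the export into (url, reason) pairs. The reason labels and the bucket mapping are assumptions: check the labels against your own export and extend the mapping with any others you see.

```python
from collections import Counter

# Assumed reason labels from the Page indexing report, mapped to the kind of fix
# each usually points to. Unknown reasons fall into "other".
REASON_BUCKETS = {
    "Crawled - currently not indexed": "content quality",
    "Discovered - currently not indexed": "crawl budget",
    "Excluded by 'noindex' tag": "directive",
    "Blocked by robots.txt": "directive",
    "Server error (5xx)": "server health",
}


def breakdown(rows):
    """Count (url, reason) pairs by reason and by the kind of fix they suggest."""
    by_reason = Counter(reason for _url, reason in rows)
    by_bucket = Counter(REASON_BUCKETS.get(reason, "other") for _url, reason in rows)
    return by_reason, by_bucket


rows = [
    ("https://www.example.com/a", "Crawled - currently not indexed"),
    ("https://www.example.com/b", "Crawled - currently not indexed"),
    ("https://www.example.com/c", "Discovered - currently not indexed"),
    ("https://www.example.com/d", "Duplicate without user-selected canonical"),
]
by_reason, by_bucket = breakdown(rows)
print(by_bucket.most_common())
```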
Empty sitemaps. A sitemap can return zero URLs because of a database error or a caching glitch. Google may then treat it as a dead feed and stop using it to discover your pages. We once debugged a site where the sitemap generator hit a 128MB memory limit and silently returned an empty index. The fix: raise the memory limit and regenerate the sitemap.
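An empty or oversized sitemap is easy to catch before Google does. The sketch below is a minimal pre-submission check that uses only the standard library. It flags XML that is not well-formed, a sitemap with zero `<loc>` entries, and files over the protocol's limits of 50,000 URLs and 50MB uncompressed. It does not fetch the file or check for an HTTP 200 response, and `check_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, uncompressed


def check_sitemap(xml_bytes: bytes) -> list:
    """Return a list of problems found in a sitemap file (an empty list means none found)."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes, over the 50MB limit")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    # Works for <urlset> files and sitemap index files, since both use <loc>.
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text and el.text.strip()]
    if not locs:
        problems.append("sitemap contains zero URLs")
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs, over the 50,000 limit")
    return problems


empty = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
print(check_sitemap(empty))  # ['sitemap contains zero URLs']
```

Running a check like this right after the generator finishes would have caught the silent empty-index failure described above.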
Duplicate content across subdomains. If www.domain.com and blog.domain.com have overlapping content, Google may choose to index only one and exclude the other entirely. Consolidation via canonical tags or 301s is required.
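Before consolidating, it helps to confirm which www and blog pages actually overlap. Google does not publish how it detects near-duplicates or what threshold it uses. The sketch below swaps in a common, transparent proxy instead: Jaccard similarity over 5-word shingles. The shingle size, the 0.8 cut-off and the sample text are all arbitrary assumptions for illustration.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Set of overlapping k-word sequences from lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Share of shingles two texts have in common (1.0 = identical, 0.0 = none)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


www_copy = "Our guide to choosing running shoes covers fit, cushioning and pronation."
blog_copy = "Our guide to choosing running shoes covers fit, cushioning and pronation today."
score = jaccard(www_copy, blog_copy)
# 0.8 is an arbitrary example threshold, not a published Google value.
print(f"similarity {score:.2f}:", "likely duplicates" if score >= 0.8 else "distinct")
```

Pairs above your chosen threshold are candidates for a canonical tag or a 301 to the preferred version.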
Sometimes the technical setup is perfect (robots.txt clean, sitemap submitted, no noindex tags) and pages still don't make it into the index. That's when you hit algorithmic filters like the thin content signal or the site reputation penalty.
Google's documentation on special tags is authoritative, but it doesn't cover every edge case. For example, a page with 200 words of unique content and 800 words of boilerplate footer text may be treated as thin. We have seen sites recover by implementing drip-feed indexing strategies to manage link velocity and avoid triggering spam filters when large volumes of new pages are published.
If you have removed technical blocks and pages still don't index, audit your content depth. Google does not publish a minimum word count, but as a rule of thumb, aim for at least 300-500 words of substantive, unique text that matches the search intent. Category pages with only product thumbnails and prices are often excluded.
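Comparing a page against the 300-500 word guideline only makes sense once boilerplate is removed, as the 200-words-plus-800-words-of-footer example shows. The sketch below assumes each page has already been split into text blocks (paragraphs, footer, and so on) by some upstream step. It treats any block that appears on two or more pages as boilerplate; that cut-off is an assumption. The word count is a rough proxy, not something Google measures this way.

```python
from collections import Counter


def unique_word_counts(pages: dict, min_pages: int = 2) -> dict:
    """Words per page after dropping text blocks that appear on min_pages or more pages."""
    # Count how many pages each block appears on (each block counted once per page).
    pages_per_block = Counter()
    for blocks in pages.values():
        pages_per_block.update({block.strip() for block in blocks})
    counts = {}
    for url, blocks in pages.items():
        unique_blocks = [b for b in blocks if pages_per_block[b.strip()] < min_pages]
        counts[url] = sum(len(b.split()) for b in unique_blocks)
    return counts


FOOTER = "Free shipping on orders over $50. Contact us any time."
pages = {
    "/wallet": ["Hand-stitched leather wallet with six card slots.", FOOTER],
    "/tote": ["Canvas tote bag, machine washable.", FOOTER],
}
for url, words in unique_word_counts(pages).items():
    # 300 is the low end of the rule-of-thumb range discussed above.
    flag = "thin" if words < 300 else "ok"
    print(url, words, flag)
```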
- Run 'site:yourdomain.com' and record the number of indexed pages.
- Open robots.txt and verify it does not block CSS, JS, or image files. Check it with the robots.txt report in Search Console.
- Crawl the site and check every page template for noindex meta tags.
- Review Search Console's Page indexing report for 'Not indexed' reasons, especially 'Crawled - currently not indexed' and 'Discovered - currently not indexed'.
- Test a representative page with the URL Inspection tool. Confirm Google sees the rendered content and internal links.
- Validate your XML sitemap: it should contain only canonical URLs, stay under 50MB uncompressed and 50,000 URLs, and return HTTP 200.
- Check for manual actions in Google Search Console under Security & Manual Actions.
- If content is thin, consolidate pages. Add at least 300 words of unique, useful text per page.
Submitting a sitemap is not a guarantee. Google must still crawl and evaluate each URL. Common reasons: robots.txt blocks resources, pages have noindex tags, server returns 5xx errors, or content is too thin. Use Search Console's URL Inspection to see exactly why each page is excluded.
Fixes typically take 3-7 days to show up in the index, but larger sites (over 10,000 pages) may take 2-4 weeks because Google must recrawl and reprocess each URL. Use 'Request Indexing' in Search Console to speed up critical pages, but expect organic discovery of the rest to take longer.
'Crawled - currently not indexed' means Google crawled the page but chose not to include it in the index. The most common cause is thin or low-quality content. Other causes include duplicate content, poor user experience, or an orphan page with no internal links pointing to it. Improve the content and internal linking, then request indexing again.
A manual action can deindex a site, but it is rare for an entire site to be removed that way. More often, manual actions target specific sections (e.g., thin affiliate pages). Check Security & Manual Actions in Search Console. If a manual action exists, follow the remediation steps and submit a reconsideration request.
Test a page with the URL Inspection tool and view the 'Tested Page' screenshot. If content or links are missing, switch to server-side rendering or pre-rendering. Ensure your JavaScript does not rely on user interaction to load critical content. Google can render JS but has limits on resources and time.
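A quick way to see which internal links depend on JavaScript is to diff the links in the server's raw HTML against the rendered HTML shown by the URL Inspection tool. The sketch below assumes you have saved both versions as strings. Links that appear only after rendering rely on Google's rendering succeeding. Links that need user interaction will appear in neither version, so also compare against the navigation you expect. The helper names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)


def links(html: str) -> set:
    collector = LinkCollector()
    collector.feed(html)
    return collector.hrefs


def js_only_links(raw_html: str, rendered_html: str) -> set:
    """Links present in the rendered HTML but absent from the server's raw HTML."""
    return links(rendered_html) - links(raw_html)


raw = '<nav><a href="/">Home</a></nav><div id="menu"></div>'
rendered = '<nav><a href="/">Home</a></nav><div id="menu"><a href="/category/shoes">Shoes</a></div>'
print(js_only_links(raw, rendered))  # {'/category/shoes'}
```

If important category links show up only in the rendered set, server-side rendering or pre-rendering is the safer fix.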
The most common robots.txt mistakes are disallowing the entire site with 'Disallow: /', blocking CSS and JS files in folders like /wp-content/ or /assets/, and using wildcard patterns that accidentally match important URLs. Also check the file for syntax errors with the robots.txt report in Search Console, which replaced the older robots.txt Tester; many people skip this step.
Backlinks cannot override a noindex tag. The noindex tag is a strong directive: Google will not index a page that carries it, regardless of backlinks. However, Google may still crawl the page to discover links on it. If you want the page indexed, remove the noindex tag and request indexing.
Cross-reference your indexing drop with known update dates (e.g., Google's Search Status Dashboard). If the drop correlates with an update like the Helpful Content Update, focus on content quality. If the drop is gradual, it's more likely a technical issue. There is no API to detect algorithm penalties, only correlation.
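The sudden-versus-gradual distinction can be made explicit with a daily series of indexed-page counts. The sketch below calls a drop 'sudden' when a single day accounts for at least half of the total decline. That threshold is an arbitrary assumption. It then lists any update windows that contain the worst day. You supply the update windows yourself; the name and dates in the example are placeholders, not real update dates. This only measures correlation, as noted above.

```python
from datetime import date


def largest_daily_drop(series):
    """Return (day, size) of the biggest day-over-day fall in indexed pages."""
    worst_day, worst = None, 0
    for (_, before), (day, after) in zip(series, series[1:]):
        if before - after > worst:
            worst_day, worst = day, before - after
    return worst_day, worst


def classify_drop(series, updates, sudden_share=0.5):
    """Label a drop 'sudden' or 'gradual' and list update windows covering its worst day.

    series: (date, indexed_pages) pairs sorted by date.
    updates: (name, start_date, end_date) tuples you supply yourself.
    """
    total_drop = series[0][1] - series[-1][1]
    if total_drop <= 0:
        return "no net drop", []
    day, size = largest_daily_drop(series)
    kind = "sudden" if size / total_drop >= sudden_share else "gradual"
    overlapping = [name for name, start, end in updates if start <= day <= end]
    return kind, overlapping


series = [
    (date(2024, 3, 1), 2000),
    (date(2024, 3, 2), 1990),
    (date(2024, 3, 3), 1200),
    (date(2024, 3, 4), 1180),
]
# Placeholder window, not a real Google update.
updates = [("hypothetical core update", date(2024, 3, 2), date(2024, 3, 10))]
print(classify_drop(series, updates))
```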
'Discovered - currently not indexed' means Google found the URL in a sitemap or link but has not crawled it yet, often because of crawl budget limits. 'Crawled - currently not indexed' means Google crawled the page but chose not to index it, typically because of content quality. They need different fixes: more crawl budget for the former, better content for the latter.
Duplicate content rarely deindexes an entire site, but large-scale duplication can lead to significant deindexing. Google may index only one version of near-duplicate pages and exclude the rest. Use canonical tags to point to the preferred version. If the duplication is across domains, implement 301 redirects or cross-domain canonicals.