Channel: SharePoint Strategery

Reasons to do a full crawl (*and it’s not for schema changes)


To avoid burying the lede, I'll start by "boldly" saying...
*You don't need to regularly schedule a full crawl for SharePoint Search

I've been at this long enough to realize that someone likely has an exception to this - but I truly cannot think of a reason to schedule full crawls (especially for SharePoint content). In fact, you generally do not need to run them at all outside of the few/specific reasons provided here in TechNet:

Reasons to do a full crawl
https://technet.microsoft.com/en-us/library/4356bad9-de1d-4e81-b049-17248b4a86c1#Plan_full_crawl

Most of these reasons are straightforward (e.g. the SSA or content source is new). However, the following reason in particular is commonly misunderstood (and drives the most common objection I hear when recommending customers avoid scheduling a full crawl):

  • A Search service application administrator or site collection administrator added or changed a managed property. A full crawl of all affected content sources is required for the new or changed managed property to take effect.

I feel this reason could be more accurately stated as:

  • A Search service application administrator or site collection administrator added or changed a managed property.
    • If this added/changed managed property impacts a very broad or global scope (such as the Title or Author managed property), a full crawl of the entire content source(s) may be required
    • However, if the added/changed managed property only impacts a smaller subset of content (e.g. a site collection or list/library), then only that subset needs to be fully recrawled
      • In SharePoint 2013/2016 and SharePoint Online, this can be achieved with the “Reindex this site” or “Reindex this list/library” functionality as described here

In other words, the schema defines how an item gets stamped out when submitted to the Search index. Keep in mind that when an item gets written to the index, it will only contain the managed properties (MPs) that have a value. If an MP is blank for an item, that item in the index will not contain any reference to this MP.
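To make the "stamping" idea concrete, here is a minimal toy model in Python. It is not SharePoint code and does not call any SharePoint API; the function `stamp_item`, the managed property `ProjectCode`, and the sample documents are all hypothetical names invented for illustration. It only shows the behavior described above: a managed property with no value is left out of the indexed item.

```python
# Toy model only: this is NOT SharePoint code or a SharePoint API.
# "schema" maps a managed property name to the crawled property it reads from.
# All names below (stamp_item, ProjectCode, the sample documents) are hypothetical.

def stamp_item(crawled_values, schema):
    """Build the indexed form of an item from its crawled property values.

    Managed properties that have no value for this item are omitted entirely.
    """
    indexed = {}
    for managed_prop, crawled_prop in schema.items():
        value = crawled_values.get(crawled_prop)
        if value not in (None, ""):
            indexed[managed_prop] = value
    return indexed


schema = {"Title": "ows_Title", "ProjectCode": "ows_ProjectCode"}

# An item from the one library that has the ProjectCode column...
doc_in_library = {"ows_Title": "Budget", "ows_ProjectCode": "P-17"}
# ...and an item from anywhere else, which has no such column.
doc_elsewhere = {"ows_Title": "Minutes"}

print(stamp_item(doc_in_library, schema))
print(stamp_item(doc_elsewhere, schema))
```

In this sketch, recrawling `doc_elsewhere` after adding the `ProjectCode` mapping produces exactly the same indexed item as before, which is why recrawling it buys nothing.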

Let's say, for example, that a managed property's mapping was updated to point to a crawled property for a column that only exists in a single library. If you started a full crawl after making this schema change, only the items in this library would be affected by it (*all other items, which have no value for this MP, would also get crawled but would see no change). In this case, the full crawl would generate a lot of load (i.e. crawling everything) just to update the relatively few items from that library. The same outcome could be achieved (after making the schema change) with MUCH less impact by flagging the impacted library to be reindexed and then starting an incremental crawl.
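As a rough back-of-the-envelope illustration of that difference in load, here is a toy Python model. It is not SharePoint code; the item counts, the container names, and both functions are made up for this sketch, and it assumes (as described above) that an incremental crawl picks up changed items plus everything in a container flagged for reindexing.

```python
# Toy model only: item counts are invented and this is NOT SharePoint code.
# Function and container names are hypothetical.

def items_processed_full(content_source):
    """A full crawl processes every item in the content source."""
    return sum(len(items) for items in content_source.values())


def items_processed_incremental(content_source, reindex_flagged, changed):
    """An incremental crawl processes changed items plus every item in
    containers that were flagged for reindexing."""
    picked = set(changed)
    for container in reindex_flagged:
        picked.update(content_source[container])
    return len(picked)


content_source = {
    "LibraryWithNewColumn": [f"lib-doc-{i}" for i in range(200)],
    "EverythingElse": [f"other-doc-{i}" for i in range(50_000)],
}

full = items_processed_full(content_source)
targeted = items_processed_incremental(
    content_source, reindex_flagged=["LibraryWithNewColumn"], changed=[]
)
print(f"full crawl: {full} items, reindex + incremental: {targeted} items")
```

With these made-up numbers, both approaches refresh the same 200 items that can actually change, but the full crawl also touches 50,000 items that end up identical in the index.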

I hope this helps...

----

(*Hopefully this is self-obvious, but I assume that incremental and/or continuous crawls have been scheduled)

