Data Caching Build Components |
The BuildAssembler step of the build process typically takes the most time and consumes the most memory. Slow build times in this step are typically the result of the Resolve Reference Links component when using MSDN style SDK links. In such cases, the component has to contact the MSDN web service to resolve the reference link to an online content ID. In addition, it has to load a large amount of information about the underlying .NET framework. The Copy From Index component instances that load reflection index data and XML comments index data can also consume a large amount of memory. In earlier versions of the Sandcastle tools, the data set for the entire framework was loaded which slowed down component initialization and wasted memory.
Earlier versions of the Sandcastle Help File Builder mitigated these issues somewhat by providing versions of the build components that cached the processed data to disk and reloaded it on subsequent builds. While this helped speed up initialization and improved build times by caching the MSDN content IDs, it did not help reduce memory usage. Building multiple help file formats also increased memory usage significantly because multiple instances of the build components used for each format did not share common data between them.
Starting with version 2.7.3.0 of the Sandcastle tools deployed with version 1.9.7.0 of the Sandcastle Help File Builder, the Resolve Reference Links and Copy From Index build components have been reworked to improve their performance and support data caching by default:
The components have been updated to support filtering by namespace when loading reflection data. This helps to reduce the amount of framework information that needs to be loaded by as much as 75%.
The reflection data and XML comments files are loaded in parallel. Combined with the reduced set of files, this can improve component initialization time by as much as 50% or more on multi-core machines.
The common data is shared amongst all instances of the components. As such, building multiple help file formats in one build does not consume that much more memory than building a single help file format.
The Resolve Reference Links component contains built-in support for saving resolved MSDN content IDs to a cache file. This information can be reloaded on subsequent builds thus saving the time required to look up the IDs. On machines with a slow internet connection, this can reduce the build time by 70% or more.
The API for the two build components was also revised to allow for derived build components that implement alternate caching mechanisms for MSDN content IDs, reflection target data, reflection index data, and XML comments index data. The help file builder contains alternate implementations of the components that allow caching the framework data to ESENT or SQL Server databases. These further reduce the memory requirements but can add to the build time in larger projects.
For most projects, the default component configurations defined in the Sandcastle Help File Builder's template files should prove to be sufficient. They will filter the reflection data by namespace and enable MSDN content ID caching by default.
Note |
---|
The MSDN URL cache is cumulative so you must perform at least one build with each project so as to ensure that all URLs for the given project are in the cache. To rebuild the cache file, simply delete it and rebuild your project files. See the Special Folder Locations topic for the location of the cache files. |
Caching the MSDN content IDs provides the largest performance increase. This will be most evident in control libraries in which the classes are derived from Windows Forms, WPF, or web controls which have a large number of inherited members thus causing a spike in the number of web service lookups that occur. Non-UI projects with base classes that are simpler result in fewer lookups and benefit less.
After the initial build, using cached MSDN content IDs is almost equivalent to setting the *SdkLinkType property to None / Index / Id in regards to build time but adds the benefit of keeping the actual online links in place. That said, the larger the project the less benefit you may see. For example, if your project contains 10,000+ topics, it is already going to run pretty long and the few minutes saved by the cached IDs and faster initialization time may end up making little difference in the end.
For very large projects or for machines that do not have enough memory, the alternate ESENT or SQL Server implementations can be used. In extreme cases, the project data can also be cached to provide the maximum amount of memory to the build process. To use the ESENT or SQL Server versions of the components, select the Components property category in the project properties window. Each version has a matched set of three components with an "(ESENT Cache)" or "(SQL Cache)" suffix in its name:
Comments Index Data
Reflection Index Data
Resolve Reference Links
Add the three matched components noted above to your project. The configuration dialog will appear for each as you add it to let you adjust its settings if necessary. If using the build components in your own build scripts or with other build tools, see the configuration template files included with the help file builder for an example of how to add the components to your configuration file.