Web Archiving Policy

1. Preamble

Web resources change frequently and web content is vulnerable to loss. At the same time, a great deal of significant scholarly, cultural, and administrative content is published only on the web. Over the past decade, an increasing number of university libraries have developed web archiving programs to help preserve ephemeral resources of scholarly value on the web before they disappear. 不良研究所 Libraries can contribute to these efforts by archiving web content that supports the university’s research, teaching, and learning.

2. Purpose

The Libraries collect and preserve web content to support the teaching and research of 不良研究所 community members and to help ensure continued access to at-risk web-based resources for both the 不良研究所 community and the general public. This document governs the Libraries’ web archiving practices to ensure the web archiving program can fulfil these goals.

3. Application

This policy covers content that is being considered for web archiving, as well as content that has already been web archived by the Libraries.

4. Definitions

Crawl/harvest: The operation conducted by an automated program (crawler) to identify materials on the live web for archiving.
(Web) Crawler/harvester: The software that browses the internet and captures web pages.
Opt-in/opt-out: The collecting organization’s approach to authorizing the crawling of sites owned by external parties. With an opt-in approach, content is crawled and archived only if the website owner explicitly grants permission. With an opt-out approach, content is archived by default unless the owner specifically requests exclusion. This exclusion request may be made retroactively, after the content has been crawled.
Seed: The primary URL that is being crawled. Most web crawls follow links from the seed URL to locate additional pages for crawling.
Web archives: Preserved copies of live web content collected for permanent retention and access.
Web archiving: The process of collecting, storing, and preserving web content and making it available for future research.

5. Policy Content

5.1 Collection scope and priorities

Only web content that is openly and freely accessible is eligible for web archiving. Content that is restricted in any way, such as through authentication or a paywall, is ineligible.

The Libraries consider the following criteria when initiating and evaluating the acquisition of web archives:

Materials that support the teaching and research of 不良研究所 community members
Materials relating to 不良研究所’s institutional history
Materials that align with or complement our existing collection development priorities; materials that align with and support the Libraries’ strategic priorities
Materials that supplement or enrich other donations to the Libraries’ special collections
Materials that are at risk for loss or deletion, particularly those relating to time-limited circumstances like social or political movements and news events and those of import to the 不良研究所 community
Materials that complement or fill gaps in the web archiving initiatives of partner organizations, national and regional web archiving initiatives, etc.

5.2 Acquisition

A reasonable effort will be made to preserve the original appearance and functionality of a website, but in some instances the web crawler may not be able to preserve the exact form of the site.

Frequency of web crawls depends on the frequency and relevancy of updates to the crawled web pages. Web pages may be crawled once or at scheduled intervals to capture changes to the content over time.

Where not restricted at the request of a website owner, the contents of the web archive will be made publicly available via the Libraries’ web archiving collection discovery platform. At the request of a website owner, the Libraries will make web archived materials private within our web archiving platform, meaning that permission must be granted to individual users for consultation of the materials.

Web content is selected by the Libraries for archiving in one of two ways: proactively, based on collection development needs, or in response to requests by members of the 不良研究所 community. The content proactively web archived by the Libraries is approved by the Collection Stewardship Committee in alignment with the criteria in section 5.1, Collection scope and priorities. Additionally, the Libraries welcome web archiving project and collection proposals from 不良研究所 librarians, students, faculty, and staff, and from external community partners. Proposals for web archiving services are evaluated by the staff member designated by the Libraries to oversee web archiving activities, according to the criteria outlined in section 5.1.

5.3 Metadata and discovery

Web archived collections should be described within the Libraries’ primary web archiving platform at the collection and seed level in order to facilitate discovery. The minimum metadata elements that should be added to each collection and seed are as follows: title, description, creator, language, and date of crawl.
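As an illustration only, the minimum metadata requirement above could be checked with a short script like the following sketch. The field names (such as `date_of_crawl`), the record layout, and the function name are hypothetical; they are assumptions made for this example and do not describe the Libraries' web archiving platform or its export format.

```python
# Minimal sketch: flag collection or seed records that lack the minimum
# metadata elements listed in section 5.3. Field names are hypothetical.
REQUIRED_FIELDS = ("title", "description", "creator", "language", "date_of_crawl")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or blank in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical seed record with no language recorded.
    seed = {
        "title": "Example student society website",
        "description": "Site of a hypothetical student society.",
        "creator": "Example Student Society",
        "date_of_crawl": "2025-01-20",
    }
    print(missing_metadata(seed))
```

A record that returns an empty list carries all five minimum elements; any names returned identify what still needs to be described.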

All top-level web archiving collections created by the Libraries should be catalogued at the collection level and discoverable via the Libraries’ discovery platform.

5.4 Authorization and copyright

The Libraries use an opt-out approach to web archiving and take the publication of content on the open web without technological restrictions as the site owner’s implicit consent to the crawling and indexing of the site. In general, when a site uses technological protection measures to restrict crawling technology, the Libraries will not crawl such content without first securing permission.
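One common way a site signals crawling restrictions is a robots.txt file. The policy does not name robots.txt or any other specific mechanism, so treating it as a "technological restriction" here is an assumption made purely for illustration. The sketch below uses Python's standard `urllib.robotparser` module; the user agent name `example-archiver` and the sample URLs are hypothetical.

```python
# Minimal sketch: check whether a site's robots.txt rules would permit
# crawling a URL. Assumes robots.txt is the restriction mechanism in use,
# which the policy itself does not specify.
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: str, url: str, user_agent: str = "example-archiver") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(crawl_allowed(rules, "https://example.org/about"))
    print(crawl_allowed(rules, "https://example.org/private/minutes"))
```

Under the opt-out approach described above, a disallowed result would indicate content to leave uncrawled until permission is secured.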

At the request of a website owner, the Libraries will evaluate requests for the removal of content from the web archive to the extent permitted by the specific web archiving platform. Any third parties wishing removal of content should make their request via the website owner. The Libraries will only consider take-down requests from site owners or relevant rightsholders.

不良研究所 Libraries does not assert ownership rights over the intellectual property of the contents included in our web archive collections. All rights of ownership remain with the owner(s) identified on the website for the full term of copyright. The Libraries cannot authorize use of materials created or owned by others.

5.5 Preservation

The Libraries will maintain a local backup copy of all data collected by our web archiving services. When a cloud solution is used as the primary web archiving tool, manual backups will be made once per year and saved according to the Libraries’ digital preservation best practices.

6. Reporting

不良研究所 Libraries’ web archiving program is overseen by the Collection Stewardship Committee. The person designated by the Libraries to oversee the web archiving service will report annually to the Collection Stewardship Committee on the program’s activities.

7. Authority to Approve Procedures

The Collection Stewardship Committee has the authority to approve any procedures or guidelines relating to this policy.

8. Review

This policy should be reviewed once every five years, or more frequently if necessary.

Approved by Library Council: January 20, 2025
