From 17b50e63b76b61ce0b7dd850a5c01622b8c837eb Mon Sep 17 00:00:00 2001
From: Jacob Coffee
Date: Thu, 22 Jan 2026 10:52:51 -0600
Subject: [PATCH 1/3] draft pep

---
 peps/pep-0786.rst | 400 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 400 insertions(+)
 create mode 100644 peps/pep-0786.rst

diff --git a/peps/pep-0786.rst b/peps/pep-0786.rst
new file mode 100644
index 00000000000..30a6d08ac94
--- /dev/null
+++ b/peps/pep-0786.rst
@@ -0,0 +1,400 @@
+PEP: 786
+Title: Standard Mechanism for Opting Out of Package Index Data Collection
+Author:
+Sponsor:
+PEP-Delegate:
+Discussions-To: Pending
+Status: Draft
+Type: Standards Track
+Topic: Packaging
+Created: 22-Jan-2026
+Post-History: Pending
+
+
+Abstract
+========
+
+Package indexes are starting to share download data with third parties. Users
+have no way to say "don't track me." This PEP fixes that.
+
+It defines ``PIP_NO_ANALYTICS``, an environment variable that package tools
+like pip, uv, hatch, pdm, and poetry SHOULD check. When set, tools send a
+``Prefer: no-analytics`` header to indexes. Indexes SHOULD then exclude that
+request from any data they share externally.
+
+
+Motivation
+==========
+
+PyPI logs every download. IP address, user agent, timestamp, what you
+installed. This data has always existed for abuse prevention and download
+counts. Nobody worried much about it.
+
+That's changing. There's real interest in sharing this data with analytics
+companies who can enrich it. They take your IP, figure out what company you
+work for, and tell package maintainers "Company X downloaded your library
+400 times last month." Maintainers get insight into who uses their work.
+Analytics companies get a product to sell.
+
+The problem: users running ``pip install requests`` have no idea this is
+happening. No notice. No consent mechanism. No way to decline.
+
+This creates real problems for organizations. A bank's CI server downloads
+Python packages all day. Those download patterns reveal their tech stack.
+Maybe they don't want that information shared with third parties they've
+never heard of. Maybe their security policy prohibits it. Maybe GDPR or
+CCPA requires them to know where employee data goes.
+
+Right now, if you want to opt out, you can't. Or rather, you'd have to
+configure each tool separately, assuming the tool even offers an option,
+which most don't.
+
+This PEP creates a single standard. Set one environment variable. Every tool
+that follows this spec will respect it. Every index that follows this spec
+will honor it.
+
+
+Rationale
+=========
+
+Why Opt Out Instead of Opt In
+-----------------------------
+
+The obvious question: why not make analytics opt in? Require explicit consent
+before sharing data?
+
+Three reasons. First, it would break existing analytics infrastructure that
+indexes and maintainers rely on. Download counts, geographic distribution,
+abuse detection. Second, most users genuinely don't care about aggregate
+stats. The ones who do care are sophisticated enough to set an env var.
+Third, opt out matches how this works elsewhere. Browser Do Not Track works
+the same way.
+
+That said, indexes should document what they collect and who they share it
+with. Users can't make informed choices without information.
+
+
+Why an Environment Variable
+---------------------------
+
+An environment variable works because you set it once and forget it. It
+applies to pip, uv, hatch, poetry, pdm, whatever tool you use tomorrow.
+It works in your shell, in Docker containers, in CI pipelines.
+
+Python already does this. ``PYTHONDONTWRITEBYTECODE`` stops .pyc files.
+``PIP_NO_CACHE_DIR`` disables pip's cache. Same pattern.
+
+
+Why an HTTP Header
+------------------
+
+The tool has to tell the index "this user opted out." An HTTP header does
+that without changing the Simple Repository API. It works for HTML responses
+and JSON responses. The index just checks the header and filters accordingly.
+
+
+Specification
+=============
+
+
+Environment Variable
+--------------------
+
+Package installation tools SHOULD recognize ``PIP_NO_ANALYTICS``. When set
+to any value (``1``, ``true``, ``yes``, anything nonempty), the tool SHOULD
+signal the opt out to package indexes.
+
+These variables also work:
+
+- ``PY_NO_ANALYTICS``
+- ``PYTHON_NO_ANALYTICS``
+
+If any of them is set, treat it as an opt out.
+
+Example:
+
+.. code-block:: bash
+
+    export PIP_NO_ANALYTICS=1
+    pip install requests
+
+Or for one command:
+
+.. code-block:: bash
+
+    PIP_NO_ANALYTICS=1 pip install requests
+
+
+Configuration File Option
+-------------------------
+
+Tools that use config files SHOULD support ``no-analytics`` in their format.
+
+For pip.conf:
+
+.. code-block:: ini
+
+    [global]
+    no-analytics = true
+
+For pyproject.toml:
+
+.. code-block:: toml
+
+    [tool.pip]
+    no-analytics = true
+
+    [tool.uv]
+    no-analytics = true
+
+Environment variables override config files.
+
+
+HTTP Header
+-----------
+
+When opt out is active, tools SHOULD send this header:
+
+.. code-block:: text
+
+    Prefer: no-analytics
+
+This uses RFC 7240's Prefer header. It's an existing standard for expressing
+client preferences.
+
+Indexes that support this SHOULD:
+
+1. Exclude requests with this header from data shared with third parties.
+2. Process the request normally. Return the package files.
+3. Keep collecting data they need for abuse prevention and operations.
+
+Indexes MAY still count these requests in aggregate stats they keep internal.
+
+
+User Agent Suffix
+-----------------
+
+As a backup, tools MAY also append to their User Agent:
+
+.. code-block:: text
+
+    pip/24.0 (no-analytics)
+
+Some proxies strip headers. This provides a fallback. The Prefer header is
+the primary signal.
+
+
+Command Line Flag
+-----------------
+
+Tools MAY offer a flag:
+
+.. code-block:: bash
+
+    pip install --no-analytics requests
+
+Or to override a configured opt out:
+
+.. code-block:: bash
+
+    pip install --analytics requests
+
+Command line beats environment variable beats config file.
+
+
+What Indexes Should Do
+----------------------
+
+Indexes SHOULD publish their data practices. What they collect. How long
+they keep it. Who they share it with. How they handle the opt out header.
+
+When an index gets a request with ``Prefer: no-analytics``, it SHOULD:
+
+1. Return the package normally.
+2. Leave this request out of data sent to third party analytics.
+3. Still use the request for abuse prevention, rate limiting, operational
+   monitoring, and legal compliance like OFAC screening.
+
+Indexes SHOULD NOT:
+
+- Slow down or degrade service for opted out requests.
+- Require login or API keys to honor the opt out.
+- Tell third parties which requests opted out.
+
+
+Backwards Compatibility
+=======================
+
+This is purely additive. Tools that don't implement it keep working. Indexes
+that don't recognize the header ignore it. Users who don't set the variable
+see no change.
+
+The only edge case: a proxy that strips Prefer headers. The User Agent
+fallback handles this.
+
+
+Adoption and Backporting
+========================
+
+This standard only works if users can actually use it. That means tools need
+to ship support soon, not in some future release that lands in two years.
+
+Tool maintainers SHOULD backport this to all currently supported release
+branches. For pip specifically, this should go into whatever versions still
+receive updates. The implementation is small (check an env var, add a header)
+and carries no risk of breaking existing behavior.
+
+Backporting further than supported versions is encouraged where feasible.
+Many users and organizations run older pip versions. The more versions that
+support this, the faster users get protection.
+
+Index operators like PyPI should implement header detection before or
+alongside any third party data sharing arrangements. Users can't opt out
+of something if the index doesn't check for the signal.
+
+
+Security Implications
+=====================
+
+The opt out header doesn't grant access to anything. It just says "don't
+share my data." Indexes should not make security decisions based on it.
+
+Opting out doesn't bypass abuse prevention. Indexes can and should keep
+logging what they need to stop attacks.
+
+One wrinkle: the header itself is a signal. Someone watching traffic knows
+who opted out. This is the same tradeoff as Do Not Track in browsers. You
+have to send something to express the preference.
+
+
+Privacy Considerations
+======================
+
+A few things to know:
+
+Opt out is forward looking. It doesn't delete data already collected.
+
+Indexes aren't required to honor this. This PEP says SHOULD, not MUST.
+If you need guarantees, verify what your index actually does.
+
+This only covers package downloads. Uploading packages, web browsing on
+pypi.org, account activity? Separate concerns. Out of scope here.
+
+Yes, opting out is itself a data point. An index could theoretically use
+opt out rates as a signal. The benefit of having the option outweighs this.
+
+
+How to Teach This
+=================
+
+For Users
+---------
+
+Add this to your shell profile:
+
+.. code-block:: bash
+
+    export PIP_NO_ANALYTICS=1
+
+Done. Every pip install, uv sync, hatch build, whatever. They all respect it.
+
+For CI/CD, set it in your pipeline config or container environment.
+
+
+For Tool Authors
+----------------
+
+Check the environment variables. Check your config. If either says opt out,
+add the header to your HTTP requests.
+
+.. code-block:: python
+
+    import os
+
+    def wants_analytics_opt_out():
+        for var in ('PIP_NO_ANALYTICS', 'PY_NO_ANALYTICS', 'PYTHON_NO_ANALYTICS'):
+            if os.environ.get(var):
+                return True
+        return False
+
+    def build_headers():
+        headers = {'User-Agent': 'mytool/1.0'}
+        if wants_analytics_opt_out():
+            headers['Prefer'] = 'no-analytics'
+        return headers
+
+
+For Index Operators
+-------------------
+
+Document your data practices. Check incoming requests for the Prefer header.
+When you see ``no-analytics``, filter that request out before sending
+anything to third party analytics providers.
+
+Keep serving packages normally. Keep your abuse detection working.
+
+
+Reference Implementation
+========================
+
+Reference implementations will be provided for pip (the tool changes) and
+Warehouse (the PyPI index changes) showing environment variable detection,
+config parsing, header handling, and analytics filtering.
+
+
+Rejected Ideas
+==============
+
+**Opt in instead of opt out.** Would break existing analytics. Most users
+don't care enough to opt in. Opt out matches industry practice.
+
+**Per package opt out.** Too complex. If you need that level of control,
+use network level filtering.
+
+**Cryptographic proof of compliance.** No practical way to verify an index
+actually honored your preference. Unenforceable.
+
+**Machine readable privacy policies.** Good idea, different PEP.
+
+**Custom header like X-PyPI-No-Analytics.** RFC 7240's Prefer header already
+exists for exactly this purpose. Use the standard.
+
+
+Open Issues
+===========
+
+**Variable naming.** ``PIP_NO_ANALYTICS`` is specific but familiar.
+``PYTHON_NO_ANALYTICS`` is broader. Current spec accepts both. Is that the
+right call?
+
+**What counts as analytics.** This targets third party data sharing. Internal
+stats, abuse prevention, that stays. Is the line clear enough?
+
+**Verification.** Users can't easily verify indexes honor this. Should we
+require transparency reports? Audit mechanisms? That might be overreach for
+this PEP.
+
+**Authenticated requests.** Should API token requests be treated differently?
+Current spec says no. All requests are equal.
+
+
+Acknowledgements
+================
+
+Thanks to the Python Packaging Authority for packaging standards work.
+
+Thanks to the PSF for taking user privacy seriously when evaluating
+analytics partnerships.
+
+
+Footnotes
+=========
+
+.. _RFC 7240: https://datatracker.ietf.org/doc/html/rfc7240
+
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.

From 3ecb90a78f23c3966de7e8535e30cffcc9f171ae Mon Sep 17 00:00:00 2001
From: Jacob Coffee
Date: Thu, 22 Jan 2026 10:56:13 -0600
Subject: [PATCH 2/3] PEP 823: Rename from 786 (number was taken) and revise draft

Rewrote for clarity and added backporting guidance.

Co-Authored-By: Claude Opus 4.5
---
 peps/{pep-0786.rst => pep-0823.rst} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename peps/{pep-0786.rst => pep-0823.rst} (99%)

diff --git a/peps/pep-0786.rst b/peps/pep-0823.rst
similarity index 99%
rename from peps/pep-0786.rst
rename to peps/pep-0823.rst
index 30a6d08ac94..d324facea13 100644
--- a/peps/pep-0786.rst
+++ b/peps/pep-0823.rst
@@ -1,4 +1,4 @@
-PEP: 786
+PEP: 823
 Title: Standard Mechanism for Opting Out of Package Index Data Collection
 Author:
 Sponsor:

From 0995caa6aa0372fe024ad4ec07c35cd4f7f819ec Mon Sep 17 00:00:00 2001
From: Jacob Coffee
Date: Thu, 22 Jan 2026 11:00:31 -0600
Subject: [PATCH 3/3] fil in data

---
 peps/pep-0823.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/peps/pep-0823.rst b/peps/pep-0823.rst
index d324facea13..488ce110248 100644
--- a/peps/pep-0823.rst
+++ b/peps/pep-0823.rst
@@ -1,8 +1,8 @@
 PEP: 823
 Title: Standard Mechanism for Opting Out of Package Index Data Collection
-Author:
-Sponsor:
-PEP-Delegate:
+Author: Jacob Coffee
+Sponsor: TBD
+PEP-Delegate: TBD
 Discussions-To: Pending
 Status: Draft
 Type: Standards Track
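
The draft states a precedence rule ("Command line beats environment variable beats config file") and asks indexes to check for the ``no-analytics`` preference, but its own example only covers the environment variables. Below is a minimal sketch of both pieces, not part of the patch. The function names ``resolve_opt_out`` and ``prefers_no_analytics``, the three-state ``cli_flag`` argument, and the simplified header parsing are assumptions made for illustration; they are not defined by the PEP or by any existing tool.

```python
import os

# Variable names taken from the draft's Environment Variable section.
OPT_OUT_VARS = ("PIP_NO_ANALYTICS", "PY_NO_ANALYTICS", "PYTHON_NO_ANALYTICS")


def resolve_opt_out(cli_flag=None, environ=None, config_value=False):
    """Tool side: decide whether to send ``Prefer: no-analytics``.

    cli_flag is a hypothetical three-state value: True for --no-analytics,
    False for --analytics, None when neither flag was given.
    """
    if cli_flag is not None:
        # Command line beats everything else.
        return cli_flag
    env = os.environ if environ is None else environ
    if any(env.get(var) for var in OPT_OUT_VARS):
        # Any nonempty value counts as an opt out.
        return True
    # Config file is consulted last.
    return bool(config_value)


def prefers_no_analytics(prefer_header):
    """Index side: report whether a Prefer header value carries no-analytics.

    Simplified parsing: splits on commas and ignores parameters. A quoted
    value containing a comma would need a real RFC 7240 parser.
    """
    if not prefer_header:
        return False
    for preference in prefer_header.split(","):
        token = preference.split(";", 1)[0].split("=", 1)[0].strip().lower()
        if token == "no-analytics":
            return True
    return False
```

Passing ``environ`` explicitly keeps the resolution logic testable without touching the process environment. Because the environment variables can only express an opt out, an explicit ``--analytics`` flag is the only way this sketch lets a user override one.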