@stats-dev

Closes #2841

Rationale for this change

This PR adds explicit AWS profile support for both the Glue catalog client and
fsspec-based S3 FileIO.

GlueCatalog already supports profile configuration, but fsspec-based S3
operations did not propagate profile selection to the underlying
S3FileSystem or async AWS session. As a result, users had to rely on environment
variables or the default AWS profile, which made it difficult to work with
multiple AWS configurations in parallel.

This change introduces two configuration properties:

  • client.profile-name: a unified AWS profile for the catalog client and FileIO
  • s3.profile-name: an AWS profile specifically for S3 FileIO

For S3 FileIO, profile resolution follows this precedence (highest first):

  1. s3.profile-name
  2. client.profile-name

This ensures consistent and explicit credential selection across catalog and
FileIO layers when using the fsspec backend.
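
The precedence above can be sketched as a simple first-match lookup. This is only an illustration: `resolve_s3_profile` is a hypothetical helper, not a function from this PR, and it assumes that an unset or empty value should fall through to the next property.

```python
from typing import Dict, Optional

S3_PROFILE_NAME = "s3.profile-name"
AWS_PROFILE_NAME = "client.profile-name"


def resolve_s3_profile(properties: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: return the first configured profile, S3-specific first."""
    for key in (S3_PROFILE_NAME, AWS_PROFILE_NAME):
        value = properties.get(key)
        if value:  # assumption: empty or missing values fall through
            return value
    return None


both = {"s3.profile-name": "s3-only", "client.profile-name": "shared"}
print(resolve_s3_profile(both))  # s3-only
print(resolve_s3_profile({"client.profile-name": "shared"}))  # shared
print(resolve_s3_profile({}))  # None
```

When neither property is set, the sketch returns None, which corresponds to leaving profile selection to the default AWS credential chain.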

Are these changes tested?

Yes. New unit tests were added to validate the profile propagation behavior.

  • Glue Catalog

    • Verifies that boto3.Session(profile_name=...) is created when initializing
      GlueCatalog with client.profile-name.
  • S3 FileIO (fsspec)

    • Verifies that client.profile-name or s3.profile-name results in the
      creation of an async AWS session with the correct profile, which is then
      passed to S3FileSystem.

The tests were run locally with:

pytest tests/catalog/test_glue_profile.py tests/io/test_fsspec_profile.py

Output:

==================== test session starts =====================
platform darwin -- Python 3.12.4, pytest-9.0.2, pluggy-1.6.0
rootdir: ${ROOTDIR}/iceberg-python
configfile: pyproject.toml
plugins: anyio-4.2.0, lazy-fixture-0.6.3, requests-mock-1.12.1
collected 3 items                                            
tests/catalog/test_glue_profile.py .                   [ 33%]
tests/io/test_fsspec_profile.py ..                     [100%]
===================== 3 passed in 1.02s ======================

Are there any user-facing changes?

Yes, this adds new configuration properties that users can set:

  • client.profile-name: Sets the AWS profile for both the catalog client and FileIO (unified configuration).
  • s3.profile-name: Sets the AWS profile specifically for S3 FileIO.

Example Usage:

catalog = GlueCatalog(
    "my_catalog",
    **{
        "client.profile-name": "my-aws-profile",
        # ... other config
    }
)

@stats-dev marked this pull request as draft January 25, 2026 09:03
@stats-dev marked this pull request as ready for review January 25, 2026 09:09
@stats-dev marked this pull request as draft January 25, 2026 09:10
@stats-dev marked this pull request as ready for review January 25, 2026 10:37
@stats-dev marked this pull request as draft January 25, 2026 11:03
@stats-dev force-pushed the aws-profile-support branch from 79e50a1 to be62e94 on January 25, 2026 11:12
@stats-dev marked this pull request as ready for review January 25, 2026 11:25
Contributor

@kevinjqliu left a comment

Thanks for the PR!
Left a comment about passing profile name.


  session = boto3.Session(
-     profile_name=properties.get(GLUE_PROFILE_NAME),
+     profile_name=properties.get(GLUE_PROFILE_NAME, properties.get(AWS_PROFILE_NAME)),
Contributor

Suggested change
- profile_name=properties.get(GLUE_PROFILE_NAME, properties.get(AWS_PROFILE_NAME)),
+ profile_name=get_first_property_value(properties, GLUE_PROFILE_NAME, AWS_PROFILE_NAME),

we have a helper function for this 😄

Author

Thanks!
I'll switch this to use get_first_property_value to keep the behavior consistent with the rest of the codebase.
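
For context, a nested `.get()` and a first-value helper can disagree when a key is present but unset. The sketch below uses `first_property_value`, a hypothetical stand-in written for this example; it is not the codebase's `get_first_property_value`, whose exact behavior may differ.

```python
from typing import Any, Dict, Optional

GLUE_PROFILE_NAME = "glue.profile-name"
AWS_PROFILE_NAME = "client.profile-name"


def first_property_value(properties: Dict[str, Any], *names: str) -> Optional[Any]:
    """Hypothetical stand-in: return the first truthy value among the given keys."""
    for name in names:
        value = properties.get(name)
        if value:
            return value
    return None


# A key that is present but set to None defeats the nested .get() fallback,
# because dict.get only uses its default when the key is missing.
props = {GLUE_PROFILE_NAME: None, AWS_PROFILE_NAME: "shared"}

nested = props.get(GLUE_PROFILE_NAME, props.get(AWS_PROFILE_NAME))
helper = first_property_value(props, GLUE_PROFILE_NAME, AWS_PROFILE_NAME)

print(nested)  # None
print(helper)  # shared
```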

if profile_name := get_first_property_value(properties, S3_PROFILE_NAME, AWS_PROFILE_NAME):
    from aiobotocore.session import AioSession

    session = AioSession(profile=profile_name)
Contributor

Passing in the AioSession here will override the internal session object.
From the docs:
https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem
"""
session (aiobotocore AioSession object to be used for all connections.) – This session will be used inplace of creating a new session inside S3FileSystem. For example: aiobotocore.session.AioSession(profile='test_user')
"""

I think we can pass the profile name as a kwarg to S3FileSystem instead.
The kwarg will be passed into the internal AioSession object:
https://github.com/fsspec/s3fs/blob/56402cd2565c5fa2aa84020c716560b3db27e8cd/s3fs/core.py#L563-L565

WDYT?
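
A rough sketch of what that suggestion could look like, for discussion only. `build_s3fs_kwargs` is a hypothetical helper, and the forwarding of the `profile` keyword to the internal AioSession is an assumption taken from the linked s3fs source, not something verified here. The sketch only builds the keyword arguments, so it runs without s3fs installed.

```python
from typing import Any, Dict


def build_s3fs_kwargs(properties: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical builder for the keyword arguments handed to S3FileSystem."""
    kwargs: Dict[str, Any] = {"anon": False, "client_kwargs": {}, "config_kwargs": {}}
    profile = properties.get("s3.profile-name") or properties.get("client.profile-name")
    if profile:
        # Assumption (from the linked s3fs source): extra keyword arguments are
        # forwarded to the AioSession that S3FileSystem creates internally, so
        # no explicit session= argument is needed.
        kwargs["profile"] = profile
    return kwargs


kwargs = build_s3fs_kwargs({"client.profile-name": "my-aws-profile"})
print(kwargs["profile"])  # my-aws-profile
# The call site would then be roughly: s3fs.S3FileSystem(**kwargs)
```

Compared with constructing an AioSession up front, this keeps session creation inside S3FileSystem and avoids importing aiobotocore directly.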

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds AWS profile support for the Glue catalog client and fsspec-based S3 FileIO, addressing issue #2841. Users can now explicitly configure AWS profiles through client.profile-name (unified) and s3.profile-name (S3-specific) properties, with s3.profile-name taking precedence over client.profile-name for S3 operations. Similarly, glue.profile-name takes precedence over client.profile-name for Glue catalog operations.

Changes:

  • Added AWS_PROFILE_NAME and S3_PROFILE_NAME configuration constants
  • Extended GlueCatalog to support fallback from glue.profile-name to client.profile-name
  • Implemented profile support in S3FileSystem by creating AioSession with the configured profile
  • Added comprehensive unit tests validating profile propagation

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File | Description
pyiceberg/io/__init__.py | Defines new constants AWS_PROFILE_NAME and S3_PROFILE_NAME for configuration properties
pyiceberg/catalog/glue.py | Updates GlueCatalog to fall back to client.profile-name when glue.profile-name is not set
pyiceberg/io/fsspec.py | Implements profile support by creating AioSession with the configured profile for S3FileSystem
tests/catalog/test_glue_profile.py | Adds test verifying GlueCatalog uses client.profile-name when provided
tests/io/test_fsspec_profile.py | Adds tests verifying S3FileIO uses both s3.profile-name and client.profile-name
tests/io/test_fsspec.py | Updates existing tests to include session=None parameter in S3FileSystem assertions

AWS_SESSION_TOKEN = "client.session-token"
AWS_ROLE_ARN = "client.role-arn"
AWS_ROLE_SESSION_NAME = "client.role-session-name"
S3_PROFILE_NAME = "s3.profile-name"
Copilot AI Jan 25, 2026

The documentation for Unified AWS Credentials (mkdocs/docs/configuration.md lines 808-831) should be updated to include the newly added client.profile-name property. The table starting at line 823 is missing an entry for client.profile-name to document that it sets the AWS profile for both the catalog and S3 FileIO.

Suggested change
- S3_PROFILE_NAME = "s3.profile-name"
+ S3_PROFILE_NAME = AWS_PROFILE_NAME

Copilot uses AI. Check for mistakes.
AWS_SESSION_TOKEN = "client.session-token"
AWS_ROLE_ARN = "client.role-arn"
AWS_ROLE_SESSION_NAME = "client.role-session-name"
S3_PROFILE_NAME = "s3.profile-name"
Copilot AI Jan 25, 2026

The S3 FileIO configuration documentation (mkdocs/docs/configuration.md lines 112-131) should be updated to include the newly added s3.profile-name property. The table should include an entry for s3.profile-name to document that it sets the AWS profile specifically for S3 FileIO operations.

Comment on lines +61 to +86
def test_fsspec_s3_session_properties_with_client_profile() -> None:
    session_properties: Properties = {
        "client.profile-name": "test-profile",
        "s3.endpoint": "http://localhost:9000",
        **UNIFIED_AWS_SESSION_PROPERTIES,
    }

    with mock.patch("s3fs.S3FileSystem") as mock_s3fs, mock.patch("aiobotocore.session.AioSession") as mock_aio_session:
        s3_fileio = FsspecFileIO(properties=session_properties)
        filename = str(uuid.uuid4())

        s3_fileio.new_input(location=f"s3://warehouse/{filename}")

        mock_aio_session.assert_called_with(profile="test-profile")
        mock_s3fs.assert_called_with(
            anon=False,
            client_kwargs={
                "endpoint_url": "http://localhost:9000",
                "aws_access_key_id": "client.access-key-id",
                "aws_secret_access_key": "client.secret-access-key",
                "region_name": "client.region",
                "aws_session_token": "client.session-token",
            },
            config_kwargs={},
            session=mock_aio_session(),
        )
Copilot AI Jan 25, 2026

Missing test case for precedence when both s3.profile-name and client.profile-name are provided. According to the PR description, s3.profile-name should take precedence over client.profile-name. A test should verify that when both properties are set, s3.profile-name is used.
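
One possible shape for such a precedence test, sketched against a stand-in rather than the real FileIO. `open_s3_filesystem` and `fs_factory` are hypothetical names used only for this illustration; an actual test would exercise FsspecFileIO with patched s3fs, as the existing tests in this PR do.

```python
from unittest import mock


def open_s3_filesystem(properties, fs_factory):
    """Hypothetical stand-in for the code under test; not FsspecFileIO itself."""
    profile = properties.get("s3.profile-name") or properties.get("client.profile-name")
    return fs_factory(profile=profile)


def test_s3_profile_takes_precedence_over_client_profile() -> None:
    properties = {
        "s3.profile-name": "s3-profile",
        "client.profile-name": "client-profile",
    }
    fs_factory = mock.Mock()

    open_s3_filesystem(properties, fs_factory)

    # Only the S3-specific profile should reach the filesystem factory.
    fs_factory.assert_called_once_with(profile="s3-profile")


test_s3_profile_takes_precedence_over_client_profile()
```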

Comment on lines +33 to +51
@mock_aws
def test_passing_client_profile_name_properties_to_glue() -> None:
    session_properties: Properties = {
        "client.profile-name": "profile_name",
        **UNIFIED_AWS_SESSION_PROPERTIES,
    }

    with mock.patch("boto3.Session") as mock_session:
        test_catalog = GlueCatalog("glue", **session_properties)

        mock_session.assert_called_with(
            aws_access_key_id="client.access-key-id",
            aws_secret_access_key="client.secret-access-key",
            aws_session_token="client.session-token",
            region_name="client.region",
            profile_name="profile_name",
            botocore_session=None,
        )
        assert test_catalog.glue is mock_session().client()
Copilot AI Jan 25, 2026

Missing test case for precedence when both glue.profile-name and client.profile-name are provided. According to the implementation and pattern established in the codebase, glue.profile-name should take precedence over client.profile-name. A test should verify that when both properties are set, glue.profile-name is used.

Comment on lines +25 to +30
UNIFIED_AWS_SESSION_PROPERTIES = {
    "client.access-key-id": "client.access-key-id",
    "client.secret-access-key": "client.secret-access-key",
    "client.region": "client.region",
    "client.session-token": "client.session-token",
}
Copilot AI Jan 25, 2026

The constant UNIFIED_AWS_SESSION_PROPERTIES is being redefined locally instead of importing it from tests.conftest, which deviates from the established pattern in the codebase. Multiple test files (e.g., tests/catalog/test_glue.py, tests/catalog/test_dynamodb.py, tests/io/test_pyarrow.py) import this constant from tests.conftest. Consider importing it from tests.conftest to maintain consistency and avoid duplication.

Comment on lines +25 to +30
UNIFIED_AWS_SESSION_PROPERTIES = {
    "client.access-key-id": "client.access-key-id",
    "client.secret-access-key": "client.secret-access-key",
    "client.region": "client.region",
    "client.session-token": "client.session-token",
}
Copilot AI Jan 25, 2026

The constant UNIFIED_AWS_SESSION_PROPERTIES is being redefined locally instead of importing it from tests.conftest, which deviates from the established pattern in the codebase. Multiple test files (e.g., tests/catalog/test_glue.py, tests/catalog/test_dynamodb.py) import this constant from tests.conftest. Consider importing it from tests.conftest to maintain consistency and avoid duplication.

Comment on lines +38 to +43
UNIFIED_AWS_SESSION_PROPERTIES = {
    "client.access-key-id": "client.access-key-id",
    "client.secret-access-key": "client.secret-access-key",
    "client.region": "client.region",
    "client.session-token": "client.session-token",
}
Copilot AI Jan 25, 2026

The constant UNIFIED_AWS_SESSION_PROPERTIES is being redefined locally instead of importing it from tests.conftest, which deviates from the established pattern in the codebase. Other test files in the same directory (e.g., tests/io/test_pyarrow.py) import this constant from tests.conftest. Consider importing it from tests.conftest to maintain consistency and avoid duplication.

@stats-dev force-pushed the aws-profile-support branch from be62e94 to 7377d05 on January 25, 2026 22:15
Development

Successfully merging this pull request may close these issues.

Missing AWS Profile Support in PyIceberg
