-
Notifications
You must be signed in to change notification settings - Fork 860
Use ruff dependency graph for nested autoreloads #7069
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
| modname_to_cell_id[modname] | ||
| for modname in stale_modules | ||
| ), | ||
| relatives=dataflow.import_block_relatives, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change, the root cell was successfully reloaded with all necessary modules (even the nested ones) but the dependent cells were not automatically rerun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this may break tests (which is good, will remind us what the intended behavior is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the part I understand the less.
| modules_copy = modules.copy() | ||
| modules_paths = { | ||
| get_full_path(v.__file__): v | ||
| for k, v in modules_copy.items() | ||
| if hasattr(v, "__file__") and v.__file__ is not None | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ruff graph, it is better to use the full absolute path of the module as the dict key (ruff gives module paths, not module names)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave this as a comment?
| def get_full_path(p: str) -> str: | ||
| p_path = pathlib.Path(p) | ||
| if p_path.is_absolute(): | ||
| return str(p_path.resolve()) | ||
| else: | ||
| return str((pathlib.Path.cwd() / p_path).resolve()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paths returned by ruff are relative paths relative to the current directory where the marimo edit command is run.
| def get_downstream_dependents( | ||
| dependents_map: dict[str, list[str]], changed_files: list[str] | ||
| ) -> set[str]: | ||
| visited = set() | ||
| queue = deque(changed_files) | ||
|
|
||
| while queue: | ||
| current = queue.popleft() | ||
| if current not in visited: | ||
| visited.add(current) | ||
| for neighbor in dependents_map.get(current, []): | ||
| queue.append(neighbor) | ||
|
|
||
| return visited |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dmadisetti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! First glance looks mostly correct. Missing tests are the most outstanding issue- would you be willing to give them a pass?
Also, I set off our test suite, I'd bet something will fail as is 🤔
| LOGGER.debug( | ||
| f"Reloading '{stale_modname}' and its downstream dependents {downstream_dependents}." | ||
| ) | ||
| for module in [m] + downstream_dependents_modules: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chance of loading the same module multiple times (if a->b c->b and a, c, then b is called twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think collecting all outside of the loop, and then just traversing once makes the most sense
|
|
||
| # Not sure how this can happen, but guard against it | ||
| if not ( | ||
| hasattr(src_module, "__file__") and src_module.__file__ is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src_file = getattr(src_module, '__file__', None)
| else: | ||
| return False | ||
|
|
||
| # Legacy way |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No point keeping dead code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know if this new implementation would completely replace the current one, or if users could switch from one to the other via settings
|
|
||
| if GRAPH_MODE: | ||
| graph = reloader.get_module_dependency_graph(src_module) | ||
| if src_module_filename in get_downstream_dependents( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_dependent = any(...)
return is_dependent| modname_to_cell_id[modname] | ||
| for modname in stale_modules | ||
| ), | ||
| relatives=dataflow.import_block_relatives, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this may break tests (which is good, will remind us what the intended behavior is)
|
hey @mthiboust this may have a bit of churn on the PR/review. it might be easier to make the more incremental fixes/changes (like the fallback mentioned in this comment) |
This PR was more a proof-of-concept for now. I clarified the current status of the issue in this recent comment. In short, I don't think that the discussed fallback for relative imports will be sufficient to be useful if we are still lacking support for nested imports. And implementing support for nested imports via |
📝 Summary
Fixes #6940
This PR is a proof-of-concept to make autoreload more robust by ensuring that when a module changes, all modules that depend on it (even indirectly) are also reloaded. This is achieved by leveraging Ruff’s static analysis, which provides a dependency graph. A positive side effect is that
ruff analyze graphsupports both absolute and relative imports.As explained in the issue,
ruff analyze graphis not working as expected for third-party packages insite-packages. Only current project and packages installed in editable mode are supported for automatic reload on file updates.Got inspiration from this blog post: https://vincerose.dev/posts/ruff-test-filtering/
🔍 Description of Changes
The code could have been more straightforward but I chose to keep the existing code structure.
If you consider going further, we should decide whether to completely replace the existing, keep the two and use this new implementation as the default one, with the former one as a fallback?
I don't know how to unit test this feature.
📋 Checklist