@CodingOnStar
Contributor

Summary

feat(embedding-process): implement embedding process components and polling logic

  • Added new components and a polling hook for managing and displaying the embedding process.
  • Introduced utility functions for document lookup and indexing status management.
  • Enhanced the main component to utilize new components and hooks for improved functionality and user experience.
  • Implemented tests for new components to ensure reliability and correctness.

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

@gemini-code-assist
Contributor

Summary of Changes

Hello @CodingOnStar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors and enhances the user interface for monitoring the document embedding process. By introducing modular components and a dedicated polling hook, it provides a clearer, more organized, and robust way for users to track the status of their documents, understand the applied processing rules, and receive relevant upgrade notifications. The changes aim to improve the overall user experience and maintainability of the embedding process feature.

Highlights

  • Modularized UI Components: Introduced dedicated React components (IndexingProgressItem, RuleDetail, UpgradeBanner) to clearly display individual document indexing progress, processing rules, and billing-related upgrade prompts.
  • Centralized Polling Logic: Extracted the document indexing status polling mechanism into a reusable custom React hook (useIndexingStatusPolling), improving code organization and maintainability.
  • Enhanced Document Utilities: Added a utils.ts file containing helper functions for efficient document data lookup, file type extraction, and dynamic progress calculation.
  • Refactored Main Component: The primary EmbeddingProcess component has been streamlined, now leveraging the new sub-components and polling hook for a cleaner and more focused implementation.
  • Comprehensive Test Coverage: New and existing functionalities are thoroughly tested with a dedicated test file (index.spec.tsx), ensuring reliability and correctness of the embedding process display.
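
The lookup and progress helpers described in the highlights might look roughly like the sketch below. This is not the code from utils.ts: the `IndexingStatus` shape, its field names, and the function names `buildLookup` and `calcProgressPercent` are all assumptions made for illustration.

```typescript
// Hypothetical shape of one document's indexing status; field names are assumptions.
interface IndexingStatus {
  id: string
  status: string
  completedSegments?: number
  totalSegments?: number
}

// Build an id -> item lookup once so each progress row can find its data directly.
function buildLookup<T extends { id: string }>(items: T[]): Record<string, T> {
  const lookup: Record<string, T> = {}
  for (const item of items)
    lookup[item.id] = item
  return lookup
}

// Percentage clamped to [0, 100]; a completed document always reports 100.
function calcProgressPercent(s: IndexingStatus): number {
  if (s.status === 'completed')
    return 100
  const total = s.totalSegments ?? 0
  const done = s.completedSegments ?? 0
  // Guard against missing or zero totals to avoid dividing by zero.
  if (total <= 0)
    return 0
  return Math.min(100, Math.max(0, Math.round((done / total) * 100)))
}
```

Keeping these as pure functions, separate from the component, is what makes them easy to cover with unit tests.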

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-executed refactoring of the embedding process components. The logic has been broken down into smaller, more manageable pieces, including new components, utility functions, and a dedicated polling hook. This greatly improves the maintainability and testability of the code, as demonstrated by the comprehensive new test suite.

My review focuses on a few areas for improvement:

  • Correcting a bug in the polling logic due to an inconsistent status list.
  • Simplifying some redundant code for better clarity.
  • Improving the accuracy of a test description to avoid confusion.

Overall, this is a high-quality contribution that enhances the codebase structure.
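
To illustrate the kind of bug an inconsistent status list can cause, here is a minimal sketch. Only the constant name `EMBEDDING_STATUSES` and the `'waiting'` status come from this conversation; the other status values and the helper names `isEmbedding` and `shouldContinuePolling` are assumptions.

```typescript
// Single source of truth for "still in progress" statuses.
// Only 'waiting' is confirmed by the PR conversation; the other values are assumed.
const EMBEDDING_STATUSES = ['waiting', 'parsing', 'cleaning', 'splitting', 'indexing'] as const
type EmbeddingStatus = typeof EMBEDDING_STATUSES[number]

// Type guard: true when a status means the document is still being processed.
function isEmbedding(status: string): status is EmbeddingStatus {
  return (EMBEDDING_STATUSES as readonly string[]).indexOf(status) !== -1
}

// Polling should continue while at least one document is still in progress.
function shouldContinuePolling(statuses: string[]): boolean {
  return statuses.some(isEmbedding)
}
```

If `'waiting'` were missing from the list, a batch whose documents were all still queued would look finished and polling would stop before indexing began. Defining the list once and importing it wherever it is needed avoids two copies drifting apart.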

CodingOnStar added 2 commits January 6, 2026 16:41
- Fixed the description of a test case in index.spec.tsx to accurately reflect its purpose.
- Added 'waiting' status to the EMBEDDING_STATUSES constant in use-indexing-status-polling.ts for improved status tracking.
…ile extensions

- Added comprehensive tests for the DocumentFileIcon component to cover multiple file extensions, including handling of unknown, uppercase, mixed case, and filenames with multiple dots or no extension.
- Removed outdated snapshot tests to streamline the test suite.
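
The edge cases listed in that commit (unknown, uppercase, mixed case, multiple dots, no extension) can be sketched with a small helper. This is an illustration only, not the DocumentFileIcon implementation: `getFileExtension`, `resolveIconType`, and the `KNOWN_EXTENSIONS` list are hypothetical.

```typescript
// Hypothetical list of extensions that have a dedicated icon.
const KNOWN_EXTENSIONS = ['pdf', 'md', 'txt', 'docx', 'csv', 'html']

// Return the lowercase extension after the last dot, or '' when there is none.
function getFileExtension(fileName: string): string {
  const lastDot = fileName.lastIndexOf('.')
  // No dot, a leading dot only (".gitignore"), or a trailing dot: no extension.
  if (lastDot <= 0 || lastDot === fileName.length - 1)
    return ''
  return fileName.slice(lastDot + 1).toLowerCase()
}

// Map a file name to an icon type, falling back to 'unknown'.
function resolveIconType(fileName: string): string {
  const ext = getFileExtension(fileName)
  return KNOWN_EXTENSIONS.indexOf(ext) !== -1 ? ext : 'unknown'
}
```

Normalizing case in one place means every caller handles `Report.PDF` and `report.pdf` the same way.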
@CodingOnStar marked this pull request as ready for review January 6, 2026 08:57
Copilot AI review requested due to automatic review settings January 6, 2026 08:57
@dosubot bot added the size:XXL and 👻 feat:rag labels Jan 6, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR implements a comprehensive embedding process management system with polling, progress tracking, and UI components for document indexing in a dataset creation workflow.

Key Changes:

  • Introduces a polling hook for real-time indexing status updates with automatic cleanup
  • Refactors a monolithic component into smaller, focused, reusable components
  • Adds utility functions for document lookup and status calculation

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| utils.ts | Provides utility functions for document lookup, status checking, and progress calculations |
| use-indexing-status-polling.ts | Custom hook implementing polling logic with 2.5s intervals and automatic stop on completion |
| upgrade-banner.tsx | Simple banner component prompting users to upgrade for faster document processing |
| rule-detail.tsx | Component displaying document processing rules, indexing type, and retrieval settings |
| indexing-progress-item.tsx | Individual progress bar item showing document status, icons, and completion percentage |
| index.tsx | Main refactored component orchestrating all sub-components with improved structure |
| index.spec.tsx | Comprehensive test suite with 1562 lines covering all components and edge cases |
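
The polling behavior summarized for use-indexing-status-polling.ts (2.5s interval, automatic stop on completion, cleanup) can be sketched without React as follows. The names `startPolling`, `allFinished`, and `TERMINAL_STATUSES`, the set of terminal statuses, and the choice to keep polling after a failed request are all assumptions, not the hook's actual code.

```typescript
const POLL_INTERVAL_MS = 2500 // interval taken from the review summary

// Assumed terminal statuses; the real hook may use a different set.
const TERMINAL_STATUSES = ['completed', 'error', 'paused']

// True when every document has reached a terminal status.
function allFinished(statuses: string[]): boolean {
  return statuses.length > 0 && statuses.every(s => TERMINAL_STATUSES.indexOf(s) !== -1)
}

// Start polling and return a cleanup function that stops it.
function startPolling(
  fetchStatuses: () => Promise<string[]>,
  onUpdate: (statuses: string[]) => void,
): () => void {
  let stopped = false
  let timer: ReturnType<typeof setTimeout> | undefined

  const tick = async (): Promise<void> => {
    if (stopped)
      return
    try {
      const statuses = await fetchStatuses()
      // The caller may have stopped polling while the request was in flight.
      if (stopped)
        return
      onUpdate(statuses)
      if (allFinished(statuses)) {
        stopped = true
        return
      }
    }
    catch {
      // Assumption: treat failures as transient and retry on the next tick.
    }
    if (!stopped)
      timer = setTimeout(tick, POLL_INTERVAL_MS)
  }

  void tick()

  return () => {
    stopped = true
    if (timer !== undefined)
      clearTimeout(timer)
  }
}
```

In a React hook, the returned function would typically be called from the effect's cleanup so that unmounting the component cancels any pending timer.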

…e name retrieval logic

- Updated the test case description in index.spec.tsx for clarity.
- Refined the logic in rule-detail.tsx to ensure only valid rule names are returned, improving robustness in handling undefined or non-string rule names.
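
A guard of the kind described in that commit, returning only valid rule names and skipping undefined or non-string ids, might look like this sketch. The `PreProcessingRule` shape, the `RULE_LABELS` map and its entries, and the function name `getRuleNames` are hypothetical and not taken from rule-detail.tsx.

```typescript
// Hypothetical rule shape; `id` is typed as unknown because API data may be malformed.
interface PreProcessingRule {
  id?: unknown
  enabled?: boolean
}

// Hypothetical mapping from rule id to display label.
const RULE_LABELS: Record<string, string> = {
  remove_extra_spaces: 'Remove extra spaces',
  remove_urls_emails: 'Remove URLs and emails',
}

// Return display names for enabled rules, skipping anything that is not a known string id.
function getRuleNames(rules: PreProcessingRule[] | undefined): string[] {
  if (!rules)
    return []
  const names: string[] = []
  for (const rule of rules) {
    if (!rule.enabled)
      continue
    const id = rule.id
    if (typeof id !== 'string')
      continue
    const label = RULE_LABELS[id]
    // Unknown ids (or inherited object keys) do not resolve to a string label.
    if (typeof label === 'string')
      names.push(label)
  }
  return names
}
```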
Member

@WTW0313 left a comment

LGTM

@dosubot bot added the lgtm label Jan 8, 2026
@CodingOnStar merged commit 98df99b into main Jan 9, 2026
14 checks passed
@CodingOnStar deleted the refactor/embedding-process branch January 9, 2026 02:21
Labels

  • 👻 feat:rag: Embedding related issue, like qdrant, weaviate, milvus, vector database.
  • lgtm: This PR has been approved by a maintainer
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.
