Add Kafka Data Source Support for Streaming Data Processing #4603

@huleilei

Description

Is your feature request related to a problem?

Currently, Daft lacks native support for consuming data directly from Apache Kafka, which significantly limits real-time data processing scenarios. Users working with streaming data pipelines are forced into workarounds such as:

  1. Writing Kafka data to intermediate storage (e.g., Parquet files) before loading to Daft
  2. Creating custom Python consumers with kafka-python/confluent-kafka, followed by manual DataFrame conversions (see the sketch below)
  3. Relying on external stream processing engines before feeding data to Daft

These approaches introduce unnecessary latency, complexity, and potential data-consistency issues into streaming workflows.
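
For concreteness, workaround (2) today looks roughly like the following sketch, assuming JSON-encoded messages on an "iot-sensors" topic (the broker address, topic, group id, and batch size are illustrative):

    import json

    import daft
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "kafka:9092",
        "group.id": "daft-workaround",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["iot-sensors"])

    # Poll a bounded batch of messages, skipping transport errors.
    records = []
    for _ in range(1000):
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            break
        if msg.error():
            continue
        records.append(json.loads(msg.value()))
    consumer.close()

    # Manually pivot row-oriented records into columns for daft.from_pydict.
    columns = {k: [r.get(k) for r in records] for k in records[0]} if records else {}
    df = daft.from_pydict(columns)

Every step here (polling, error handling, batching, and the row-to-column pivot) is boilerplate that a native Kafka reader would absorb.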

Describe the solution you'd like

We propose implementing first-class Kafka support in Daft with:

  • Native Kafka DataSource integration supporting both batch and streaming modes
  • Structured Streaming capabilities including:
    • Offset management (automatic checkpointing)
    • Consumer group support
    • Exactly-once processing semantics
  • Schema inference from:
    • Kafka message headers
    • Embedded schemas (Avro/Protobuf via Schema Registry; see the decoding sketch after the API example)
  • Integration with the existing DataFrame API, for example:

        import daft
        from daft import col

        df = daft.read_kafka(
            bootstrap_servers="kafka:9092",
            topics=["iot-sensors"],
            consumer_group="daft-processor",
            starting_offsets="earliest",
        ).where(col("sensor_type") == "temperature")
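
For the Schema Registry case, decoding could build on the deserializers that confluent-kafka already ships. A minimal sketch of resolving a message's Avro writer schema (the registry URL, topic, and group id below are assumptions for illustration, not part of the proposed API):

    from confluent_kafka import Consumer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroDeserializer
    from confluent_kafka.serialization import MessageField, SerializationContext

    registry = SchemaRegistryClient({"url": "http://schema-registry:8081"})
    # Without an explicit schema string, the deserializer looks up the
    # writer schema registered for each message.
    deserialize = AvroDeserializer(registry)

    consumer = Consumer({
        "bootstrap.servers": "kafka:9092",
        "group.id": "daft-avro-probe",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["iot-sensors"])

    msg = consumer.poll(timeout=5.0)
    if msg is not None and msg.error() is None:
        record = deserialize(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE))
        print(record)  # a dict shaped by the registered Avro schema
    consumer.close()

The same registry lookup would let read_kafka infer a Daft schema up front instead of requiring users to declare one.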

Describe alternatives you've considered

No response

Additional Context

No response

Would you like to implement a fix?

Yes

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request), help wanted (Extra attention is needed), p2 (backlog) (Nice to have features)
