Commit 627ab53

refactor: markdown parser and renderer to support HTML elements

- Simplified ParserConfig by removing extension configuration and enabling all features by default.
- Enhanced HTML element parsing to support self-closing tags and attributes for <img>, <br>, <kbd>, <small>, and <mark>.
- Updated the rendering logic to handle HTML elements correctly, including attribute escaping and self-closing tags.
- Improved test coverage for HTML element parsing and rendering.
- Updated documentation to reflect changes in usage and configuration.
1 parent b31096b commit 627ab53

17 files changed (+706 / -586 lines)
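
The commit message lists attribute escaping and self-closing tags among the rendering changes. As an illustration only, here is a minimal Go sketch of how an HTML renderer can emit such elements safely. It assumes a hypothetical `htmlElement` type whose content is a single text string rather than child nodes, and it uses the standard library's `html.EscapeString`; gomark's actual renderer may be structured differently and use other helpers.

```go
package main

import (
	"fmt"
	"html"
	"sort"
	"strings"
)

// htmlElement is a hypothetical, simplified stand-in for gomark's HTMLElement
// node: container content is a single text string instead of child nodes.
type htmlElement struct {
	TagName       string
	Attributes    map[string]string
	Text          string
	IsSelfClosing bool
}

// renderElement emits an element as HTML. Attribute values and text content
// are escaped; attribute names are assumed to be validated by the parser.
func renderElement(e htmlElement) string {
	var b strings.Builder
	b.WriteString("<" + e.TagName)

	// Sort names so the output is deterministic; Go map iteration order is not.
	names := make([]string, 0, len(e.Attributes))
	for name := range e.Attributes {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		// html.EscapeString escapes <, >, &, ' and ", so a value cannot close its quotes.
		fmt.Fprintf(&b, ` %s="%s"`, name, html.EscapeString(e.Attributes[name]))
	}

	if e.IsSelfClosing {
		b.WriteString(" />") // e.g. <br />, <img ... />
		return b.String()
	}
	b.WriteString(">" + html.EscapeString(e.Text) + "</" + e.TagName + ">")
	return b.String()
}

func main() {
	fmt.Println(renderElement(htmlElement{TagName: "kbd", Text: "Ctrl"}))
	fmt.Println(renderElement(htmlElement{
		TagName:       "img",
		Attributes:    map[string]string{"src": "cat.png", "alt": `a "quoted" cat`},
		IsSelfClosing: true,
	}))
}
```

Running it prints `<kbd>Ctrl</kbd>` and `<img alt="a &#34;quoted&#34; cat" src="cat.png" />`. Sorting attribute names costs a little per element but keeps rendered HTML stable, which makes golden-output tests practical.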

ARCHITECTURE.md

Lines changed: 52 additions & 24 deletions
@@ -28,7 +28,7 @@ gomark is built on the principle of **pragmatic simplicity**:
 - **Reusability**: Tokens can be reused by multiple parsers
 - **Memory Efficiency**: Tokens reference original string data
 
-**Alternative Considered**: Text-based parsing (like goldmark)
+**Alternative Considered**: Text-based parsing
 **Why Rejected**: Added complexity without clear benefits for our use cases
 
 ### 2. Simple AST Interface ✅
@@ -48,7 +48,7 @@ type Node interface {
 - **Focused**: Only implements what's actually needed
 - **Memory Efficient**: No overhead for unused tree navigation features
 
-**Alternative Considered**: Complex tree interface (like goldmark)
+**Alternative Considered**: Complex tree interface
 **Why Rejected**: Analysis showed no actual usage of tree navigation in our codebase
 
 ### 3. Stateless Parsers ✅
@@ -106,7 +106,7 @@ const ParagraphNode NodeType = "PARAGRAPH"
 
 ### Public vs Internal
 
-**Public Packages** (goldmark-style):
+**Public Packages**:
 ```
 ├── ast/ # AST definitions - users need access
 ├── config/ # Configuration - users need to configure
@@ -122,7 +122,7 @@ const ParagraphNode NodeType = "PARAGRAPH"
 **Rationale**:
 - Public APIs allow extensibility where it matters
 - Internal packages keep implementation details hidden
-- Follows goldmark patterns for familiarity
+- Clean separation of concerns
 
 ## Performance Optimizations
 
@@ -171,18 +171,7 @@ These are **conscious decisions**, not oversights:
 ### Package Refactoring
 **Problem**: Everything was in `internal/` packages
 **Solution**: Moved key packages to public for extensibility
-**Result**: goldmark-style architecture with better extensibility
-
-## Comparison with goldmark
-
-| Aspect | goldmark | gomark |
-|--------|----------|--------|
-| **Complexity** | High | Low |
-| **Performance** | Good | Excellent |
-| **Extensibility** | Very High | Moderate |
-| **Maintainability** | Moderate | High |
-| **Learning Curve** | Steep | Gentle |
-| **Feature Set** | Comprehensive | Focused |
+**Result**: Modular architecture with better extensibility
 
 ## When to Choose gomark
 
@@ -191,12 +180,51 @@ These are **conscious decisions**, not oversights:
 - You want simple, maintainable code
 - You're building applications, not markdown libraries
 - You need good performance with moderate extensibility
+- You want zero-configuration setup with all features enabled
+
+## Recent Architecture Evolutions
+
+### HTML Elements Support (Phase 1) ✅
+
+**Addition**: Added support for essential HTML elements: `<kbd>`, `<br>`, `<img>`, `<small>`, `<mark>`
+
+**Approach**:
+- **Reused existing `HTMLElementNode`** rather than creating separate node types
+- **Enhanced with `Children` and `IsSelfClosing`** fields for flexibility
+- **Smart parsing**: Different strategies for self-closing vs container elements
+- **Attribute handling**: Proper parsing with quote support and sanitization
+- **Security-first**: HTML-escaped attributes and content validation
+
+**Rationale**:
+- These elements have no markdown equivalents (can't be achieved with existing syntax)
+- Essential for documentation and note-taking (especially `<kbd>` for shortcuts)
+- CommonMark and GFM standards support for these elements
+
+### Configuration Simplification ✅
 
-**Choose goldmark when**:
-- You need maximum extensibility
-- You're building a markdown processing library
-- You need complex AST transformations
-- You need full CommonMark compliance edge cases
+**Change**: Simplified configuration to "zero-config by default"
+
+**Before**:
+```go
+// Required configuration for HTML elements
+cfg := config.DefaultConfig().WithAllowHTML(true)
+engine := gomark.NewEngine(gomark.WithConfig(cfg))
+```
+
+**After**:
+```go
+// HTML elements work by default - no config needed!
+doc, err := gomark.Parse("Press <kbd>Ctrl</kbd> to copy")
+```
+
+**New Configuration Approach**:
+1. **`gomark.Parse()`** → Uses `DefaultConfig()` (all features enabled)
+2. **`config.DefaultConfig()`** → Single configuration with sensible defaults
+
+**Rationale**:
+- gomark is primarily used in memos where users want all features
+- Configuration complexity was barrier to adoption
+- Smart defaults reduce cognitive load
 
 ## Future Evolution
 
@@ -210,16 +238,16 @@ gomark is designed to evolve pragmatically:
 ### Potential Future Additions
 
 **Only if there's demonstrated need**:
+- **Phase 2 HTML Elements**: `<details>/<summary>`, `<a>` with attributes, `<div>`
 - AST walking API (if users request it)
 - More output formats (if users request them)
-- Advanced HTML attributes (if simple approach proves insufficient)
-- Text-based parsing (if token-based proves limiting)
+- Advanced HTML attribute parsing (if current approach proves insufficient)
 
 ## Conclusion
 
 gomark represents a **pragmatic approach** to markdown parsing:
 
-- **Goldmark-inspired architecture** for familiarity and extensibility
+- **Clean modular architecture** for extensibility
 - **Performance-focused implementation** for real-world applications
 - **Simple, maintainable code** that developers can understand and modify
 - **Focused feature set** that solves real problems without over-engineering
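
ARCHITECTURE.md describes **Smart parsing** with different strategies for self-closing and container elements, and **Attribute handling** with quote support. The Go sketch below is one hypothetical way to parse an opening tag under those rules. `parseOpenTag`, `parseAttributes`, and the `voidElements` table are illustrative names, not gomark's parser API, and the sketch deliberately skips entity decoding and whitespace around `=`.

```go
package main

import (
	"fmt"
	"strings"
)

// voidElements never have content or a closing tag in HTML, so this sketch
// treats them as self-closing whether or not the source writes "/>".
var voidElements = map[string]bool{"br": true, "img": true}

// parseOpenTag (hypothetical) parses an opening tag such as
// `<img src="a.png" alt='A b' />`. ok is false for malformed input.
func parseOpenTag(s string) (tag string, attrs map[string]string, selfClosing, ok bool) {
	if len(s) < 3 || s[0] != '<' || s[len(s)-1] != '>' {
		return "", nil, false, false
	}
	body := strings.TrimSpace(s[1 : len(s)-1])
	if strings.HasSuffix(body, "/") {
		selfClosing = true
		body = strings.TrimSpace(strings.TrimSuffix(body, "/"))
	}
	// The tag name is the leading run of ASCII letters and digits.
	i := 0
	for i < len(body) && isAlnum(body[i]) {
		i++
	}
	rest := body[i:]
	if i == 0 || (rest != "" && !isSpace(rest[0])) {
		return "", nil, false, false // no name, or junk glued to the name
	}
	tag = strings.ToLower(body[:i])
	if attrs, ok = parseAttributes(rest); !ok {
		return "", nil, false, false
	}
	return tag, attrs, selfClosing || voidElements[tag], true
}

// parseAttributes reads whitespace-separated name="v", name='v', name=v and
// bare name (value "") pairs. Names are lower-cased.
func parseAttributes(s string) (map[string]string, bool) {
	attrs := map[string]string{}
	i := 0
	for {
		for i < len(s) && isSpace(s[i]) {
			i++
		}
		if i == len(s) {
			return attrs, true
		}
		if !isAlnum(s[i]) {
			return nil, false // a name must start with a letter or digit
		}
		start := i
		for i < len(s) && (isAlnum(s[i]) || s[i] == '-') {
			i++
		}
		name := strings.ToLower(s[start:i])
		if i == len(s) || s[i] != '=' {
			attrs[name] = "" // boolean attribute such as `disabled`
			continue
		}
		i++ // skip '='
		if i < len(s) && (s[i] == '"' || s[i] == '\'') {
			end := strings.IndexByte(s[i+1:], s[i]) // find the matching quote
			if end < 0 {
				return nil, false // unterminated quoted value
			}
			attrs[name] = s[i+1 : i+1+end]
			i += end + 2
		} else {
			start = i
			for i < len(s) && !isSpace(s[i]) {
				i++
			}
			attrs[name] = s[start:i]
		}
	}
}

func isAlnum(c byte) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

func isSpace(c byte) bool { return c == ' ' || c == '\t' || c == '\n' || c == '\r' }

func main() {
	tag, attrs, selfClosing, ok := parseOpenTag(`<img src="cat.png" alt='A cat' />`)
	fmt.Println(tag, attrs["src"], attrs["alt"], selfClosing, ok)
}
```

In this sketch `<br>` is reported as self-closing even without `/>`, because HTML defines `br` and `img` as void elements; for a container such as `<kbd>`, a parser would then scan for the matching `</kbd>` and parse the content in between as children.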

README.md

Lines changed: 43 additions & 27 deletions
@@ -1,6 +1,6 @@
 # gomark
 
-A fast, extensible, and well-structured markdown parser for Go, inspired by [goldmark](https://github.com/yuin/goldmark) but optimized for simplicity and performance.
+A fast, extensible, and well-structured markdown parser for Go, optimized for simplicity and performance.
 
 ## Features
 
@@ -29,27 +29,47 @@ A fast, extensible, and well-structured markdown parser for Go, inspired by [gol
 - **Tags**: `#hashtag` syntax
 - **Referenced Content**: `[[wiki-style]]` links
 - **Embedded Content**: `![[embeds]]`
-- **HTML Elements**: Basic HTML tag support
+- **HTML Elements**: `<kbd>`, `<br>`, `<img>`, `<small>`, `<mark>`, and more
 
 ## Quick Start
 
+gomark is designed for **zero-configuration usage** - all features are enabled by default:
+
 ```go
 package main
 
 import (
 	"fmt"
 	"github.com/usememos/gomark"
+	"github.com/usememos/gomark/renderer/html"
 )
 
 func main() {
-	// Parse markdown
-	markdown := "# Hello\n\nThis is **bold** and *italic* text."
+	// All features enabled by default - no configuration needed!
+	markdown := `# Hello Memos!
+
+**Bold** and *italic* text with ==highlighting==.
+
+Press <kbd>Ctrl</kbd>+<kbd>C</kbd> to copy.
+
+Math: $E = mc^2$ and task lists:
+- [x] HTML elements supported
+- [ ] Even more features coming
+
+#gomark ||works great||!`
+
+	// Parse with all features enabled
 	doc, err := gomark.Parse(markdown)
 	if err != nil {
 		panic(err)
 	}
 
-	// Restore back to markdown
+	// Render to HTML
+	renderer := html.NewHTMLRenderer()
+	html := renderer.RenderDocument(doc)
+	fmt.Println(html)
+
+	// Or restore back to markdown
 	restored := gomark.Restore(doc)
 	fmt.Println(restored)
 }
@@ -59,19 +79,18 @@ func main() {
 
 ### Custom Configuration
 
+While gomark works great with zero configuration, you can customize it if needed:
+
 ```go
 import (
 	"github.com/usememos/gomark"
 	"github.com/usememos/gomark/config"
 )
 
-// Create engine with custom configuration
-engine := gomark.NewEngine(
-	gomark.WithConfig(config.StrictConfig()), // Use strict CommonMark
-	gomark.WithExtension("tables", false), // Disable tables
-	gomark.WithExtension("math", true), // Enable math
-	gomark.WithStrictMode(true), // Enable strict parsing
-)
+// Customize limits if needed
+engine := gomark.NewEngine(gomark.WithConfig(
+	config.DefaultConfig().WithMaxDepth(100).WithMaxFileSize(1024 * 1024), // 1MB limit
+))
 
 doc, err := engine.Parse(markdown)
 ```
@@ -100,28 +119,23 @@ textOutput := stringRenderer.RenderDocument(doc)
 markdownOutput := gomark.Restore(doc)
 ```
 
-### Configuration Options
+### Available Configurations
 
 ```go
 import "github.com/usememos/gomark/config"
 
-// Default configuration (all extensions enabled)
+// Default configuration: all features enabled with generous limits
 cfg := config.DefaultConfig()
 
-// Strict CommonMark configuration
-cfg := config.StrictConfig()
-
-// Custom configuration
+// Custom limits if needed
 cfg := config.DefaultConfig().
-	WithExtension("tables", false).
-	WithExtension("math", true).
-	WithStrictMode(true).
-	WithSafeMode(true)
+	WithMaxDepth(50). // Limit nesting depth
+	WithMaxFileSize(1024 * 1024) // 1MB file size limit
 ```
 
 ## Architecture
 
-gomark follows a clean, modular architecture inspired by goldmark:
+gomark follows a clean, modular architecture:
 
 ```
 gomark/
@@ -192,11 +206,13 @@ cfg = cfg.WithSafeMode(true)
 
 ## Recent Improvements
 
+-**Phase 1 HTML Elements**: Added support for `<kbd>`, `<br>`, `<img>`, `<small>`, `<mark>`
+-**Simplified Configuration**: Zero-config usage with sensible defaults
+-**Enhanced HTML Parsing**: Proper attribute handling and self-closing tag support
 -**Fixed blockquote blank lines** (GitHub issue #19)
--**Refactored to goldmark-style architecture**
+-**Refactored to modular architecture**
 -**Improved package organization** with public APIs
--**Enhanced test coverage**
--**Better documentation**
+-**Enhanced test coverage** with 26+ HTML element test cases
 
 ## Contributing
 
@@ -212,4 +228,4 @@ This project is part of the [Memos](https://github.com/usememos/memos) ecosystem
 
 ## Inspiration
 
-Inspired by [goldmark](https://github.com/yuin/goldmark) but designed for simplicity, performance, and ease of use. While goldmark provides comprehensive CommonMark compliance with complex extensibility, gomark focuses on practical markdown parsing with clean, maintainable code.
+Designed for simplicity, performance, and ease of use. gomark focuses on practical markdown parsing with clean, maintainable code.
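
The README now tunes limits by chaining `WithMaxDepth` and `WithMaxFileSize` onto `config.DefaultConfig()`. The sketch below shows that builder style with value receivers. The `Config` struct, its fields, the default numbers, and `checkInput` are assumptions made for illustration, not gomark's definitions.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for gomark's config type; the fields and
// default values below are illustrative assumptions.
type Config struct {
	MaxDepth    int   // maximum nesting depth the parser will follow
	MaxFileSize int64 // maximum input size in bytes
}

// DefaultConfig returns "generous" placeholder limits.
func DefaultConfig() Config {
	return Config{MaxDepth: 1000, MaxFileSize: 10 << 20} // 10 MiB
}

// WithMaxDepth uses a value receiver, so it returns a modified copy and
// leaves the receiver (and any shared default) untouched.
func (c Config) WithMaxDepth(depth int) Config {
	c.MaxDepth = depth
	return c
}

// WithMaxFileSize follows the same copy-and-return pattern.
func (c Config) WithMaxFileSize(bytes int64) Config {
	c.MaxFileSize = bytes
	return c
}

// checkInput (hypothetical) shows how a parser might enforce the size limit
// before doing any work.
func (c Config) checkInput(input string) error {
	if int64(len(input)) > c.MaxFileSize {
		return fmt.Errorf("input is %d bytes, limit is %d", len(input), c.MaxFileSize)
	}
	return nil
}

func main() {
	base := DefaultConfig()
	strict := base.WithMaxDepth(50).WithMaxFileSize(1024 * 1024) // 1MB
	fmt.Println(base.MaxDepth, strict.MaxDepth, strict.MaxFileSize)
	if err := strict.WithMaxFileSize(4).checkInput("hello"); err != nil {
		fmt.Println(err)
	}
}
```

Because each `With...` method works on a copy, `base` keeps its defaults after `strict` is derived from it, so a shared default configuration cannot be mutated by accident.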

ast/block.go

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@ type ListKind string
 const (
 	UnorderedList   ListKind = "ul"
 	OrderedList     ListKind = "ol"
-	DescrpitionList ListKind = "dl"
+	DescriptionList ListKind = "dl"
 )
 
 type List struct {

ast/inline.go

Lines changed: 26 additions & 3 deletions
@@ -286,8 +286,10 @@ func (n *Spoiler) Restore() string {
 type HTMLElement struct {
 	BaseInline
 
-	TagName    string
-	Attributes map[string]string
+	TagName       string
+	Attributes    map[string]string
+	Children      []Node // For container elements like <kbd>, <small>, <mark>
+	IsSelfClosing bool   // For self-closing elements like <br>, <img>
 }
 
 func (*HTMLElement) Type() NodeType {
@@ -303,5 +305,26 @@ func (n *HTMLElement) Restore() string {
 	if len(attributes) > 0 {
 		attrStr = " " + strings.Join(attributes, " ")
 	}
-	return fmt.Sprintf("<%s%s />", n.TagName, attrStr)
+
+	if n.IsSelfClosing {
+		return fmt.Sprintf("<%s%s />", n.TagName, attrStr)
+	}
+
+	// Container element with children
+	childrenStr := ""
+	if len(n.Children) == 1 {
+		// For simple text content elements like <kbd>text</kbd>
+		if textNode, ok := n.Children[0].(*Text); ok {
+			childrenStr = textNode.Content
+		} else {
+			childrenStr = n.Children[0].Restore()
+		}
+	} else {
+		// For complex nested content
+		for _, child := range n.Children {
+			childrenStr += child.Restore()
+		}
+	}
+
+	return fmt.Sprintf("<%s%s>%s</%s>", n.TagName, attrStr, childrenStr, n.TagName)
 }
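
The new `HTMLElement.Restore` branches on `IsSelfClosing`: self-closing elements restore as `<tag attrs />`, container elements as `<tag attrs>children</tag>`. A self-contained sketch of the same logic follows, using simplified stand-in types rather than the real `ast` package. It makes two assumptions: that `Text.Restore` returns `Content` unchanged (so the single-text-child special case in the diff folds into the general loop), and that attribute names should be sorted for deterministic output.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Node, Text and HTMLElement are simplified, hypothetical stand-ins for the
// gomark ast types touched by this commit.
type Node interface {
	Restore() string
}

type Text struct {
	Content string
}

// Restore returns the text unchanged (assumed to match gomark's Text).
func (t *Text) Restore() string { return t.Content }

type HTMLElement struct {
	TagName       string
	Attributes    map[string]string
	Children      []Node
	IsSelfClosing bool
}

// Restore converts the element back to markdown source.
func (n *HTMLElement) Restore() string {
	// Sort names so the output does not depend on map iteration order.
	names := make([]string, 0, len(n.Attributes))
	for name := range n.Attributes {
		names = append(names, name)
	}
	sort.Strings(names)
	attrStr := ""
	for _, name := range names {
		attrStr += fmt.Sprintf(` %s="%s"`, name, n.Attributes[name])
	}

	if n.IsSelfClosing {
		return fmt.Sprintf("<%s%s />", n.TagName, attrStr)
	}

	// Container element: restore every child in order. Since Text.Restore
	// returns Content, a single text child needs no special case.
	var children strings.Builder
	for _, child := range n.Children {
		children.WriteString(child.Restore())
	}
	return fmt.Sprintf("<%s%s>%s</%s>", n.TagName, attrStr, children.String(), n.TagName)
}

func main() {
	kbd := &HTMLElement{TagName: "kbd", Children: []Node{&Text{Content: "Ctrl"}}}
	br := &HTMLElement{TagName: "br", IsSelfClosing: true}
	fmt.Println(kbd.Restore(), br.Restore())
}
```

Called on `<kbd>Ctrl</kbd>` and a self-closing `<br>`, `main` prints `<kbd>Ctrl</kbd> <br />`. Nested containers work the same way, because each child's `Restore` is called recursively.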

ast/util.go

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ func GetListItemKindAndIndent(node Node) (ListKind, int) {
 	case *UnorderedListItem:
 		return UnorderedList, n.Indent
 	case *TaskListItem:
-		return DescrpitionList, n.Indent
+		return DescriptionList, n.Indent
 	default:
 		return "", 0
 	}
