zed/crates/html_to_markdown/src/html_to_markdown.rs
Piotr Osiewicz e6c1c51b37
Some checks are pending
CI / Check formatting and spelling (push) Waiting to run
CI / (macOS) Run Clippy and tests (push) Waiting to run
CI / (Linux) Run Clippy and tests (push) Waiting to run
CI / (Windows) Run Clippy and tests (push) Waiting to run
CI / Create a macOS bundle (push) Blocked by required conditions
CI / Create a Linux bundle (push) Blocked by required conditions
CI / Create arm64 Linux bundle (push) Blocked by required conditions
Deploy Docs / Deploy Docs (push) Waiting to run
Docs / Check formatting (push) Waiting to run
chore: Fix several style lints (#17488)
It's not comprehensive enough to start linting on `style` group, but
hey, it's a start.

Release Notes:

- N/A
2024-09-06 11:58:39 +02:00

46 lines
1.2 KiB
Rust

//! Convert HTML to Markdown.
mod html_element;
pub mod markdown;
mod markdown_writer;
pub mod structure;
use std::io::Read;
use anyhow::{Context, Result};
use html5ever::driver::ParseOpts;
use html5ever::parse_document;
use html5ever::tendril::TendrilSink;
use html5ever::tree_builder::TreeBuilderOpts;
use markup5ever_rcdom::RcDom;
pub use crate::html_element::*;
pub use crate::markdown_writer::*;
/// Converts the provided HTML to Markdown.
pub fn convert_html_to_markdown(html: impl Read, handlers: &mut [TagHandler]) -> Result<String> {
let dom = parse_html(html).context("failed to parse HTML")?;
let markdown_writer = MarkdownWriter::new();
let markdown = markdown_writer
.run(&dom.document, handlers)
.context("failed to convert HTML to Markdown")?;
Ok(markdown)
}
fn parse_html(mut html: impl Read) -> Result<RcDom> {
let parse_options = ParseOpts {
tree_builder: TreeBuilderOpts {
drop_doctype: true,
..Default::default()
},
..Default::default()
};
let dom = parse_document(RcDom::default(), parse_options)
.from_utf8()
.read_from(&mut html)
.context("failed to parse HTML document")?;
Ok(dom)
}