Initialization (explicit initialization is recommended)

import yueyang.vostok.Vostok;
import yueyang.vostok.file.VKFileConfig;
import yueyang.vostok.office.VKOfficeConfig;

Vostok.File.init(new VKFileConfig().baseDir("./data"));

Vostok.Office.init(new VKOfficeConfig()
    .officeTempDir("tmp/office"));

Current limitations
Only .xlsx/.docx/.pptx/.pdf are supported, and only the local mode of Vostok.File is currently supported.
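Because support is limited to these four file types, a caller may want to reject other files before calling the Office API at all. The sketch below is a hypothetical caller-side helper (OfficeFormats.isSupported is not a Vostok method). It only looks at the extension and matches it case-insensitively, which is an assumption about what Vostok accepts. The library also performs magic-number checks (see OF-567 below), so passing this check does not guarantee a file will be accepted.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-check mirroring the documented format restriction (not part of Vostok). */
final class OfficeFormats {
    private static final Set<String> SUPPORTED = Set.of(".xlsx", ".docx", ".pptx", ".pdf");

    private OfficeFormats() {}

    static boolean isSupported(String path) {
        if (path == null) {
            return false;
        }
        int dot = path.lastIndexOf('.');
        int lastSeparator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        // The extension must exist and belong to the file name, not to a parent directory.
        if (dot < 0 || dot < lastSeparator) {
            return false;
        }
        // Case-insensitive comparison is an assumption about Vostok's behavior.
        return SUPPORTED.contains(path.substring(dot).toLowerCase(Locale.ROOT));
    }
}
```

Legacy formats such as .xls or .doc fail this check, matching the "only .xlsx/.docx/.pptx/.pdf" rule.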

Excel example

import yueyang.vostok.office.excel.*;

VKExcelWorkbook workbook = new VKExcelWorkbook()
    .addSheet(new VKExcelSheet("Orders")
        .addCell(VKExcelCell.stringCell(1, 1, "orderId"))
        .addCell(VKExcelCell.numberCell(2, 2, "99.5")));

Vostok.Office.writeExcel("excel/orders.xlsx", workbook);
Vostok.Office.readExcelRows("excel/orders.xlsx", "Orders",
    VKExcelReadOptions.defaults(), row -> process(row.rowIndex(), row.cells()));

Word example

import yueyang.vostok.office.word.*;

VKWordWriteRequest wordReq = new VKWordWriteRequest()
    .addParagraph("Order A001")
    .addImageBytes("logo.png", logoBytes)
    .addImageFile("images/sign.png");

Vostok.Office.writeWord("word/orders.docx", wordReq);

String text = Vostok.Office.readWordText("word/orders.docx");
int chars = Vostok.Office.countWordChars("word/orders.docx");

VKWordReadOptions metadataOnly = VKWordReadOptions.defaults()
    .imageLoadMode(VKWordImageLoadMode.METADATA_ONLY);
VKWordDocument wordDoc = Vostok.Office.readWord("word/orders.docx", metadataOnly);

PPT example (PPT is written in all caps in method names)

import yueyang.vostok.office.ppt.*;

VKPptWriteRequest req = new VKPptWriteRequest();
req.addSlide().addParagraph("Quarterly summary Q1").addImageBytes("chart.png", chartBytes);

Vostok.Office.writePPT("ppt/summary.pptx", req);

String pptText = Vostok.Office.readPPTText("ppt/summary.pptx");
int slides = Vostok.Office.countPPTSlides("ppt/summary.pptx");

VKPptReadOptions metadataOnly = VKPptReadOptions.defaults()
    .imageLoadMode(VKPptImageLoadMode.METADATA_ONLY);
VKPptDocument pptDoc = Vostok.Office.readPPT("ppt/summary.pptx", metadataOnly);

PDF example (PDF is written in all caps in method names)

import yueyang.vostok.office.pdf.*;

VKPdfWriteRequest req = new VKPdfWriteRequest();
req.addPage().addParagraph("Bill A001").addImageBytes("logo.png", logoBytes);

Vostok.Office.writePDF("pdf/bill.pdf", req);

String pdfText = Vostok.Office.readPDFText("pdf/bill.pdf");
int pages = Vostok.Office.countPDFPages("pdf/bill.pdf");

VKPdfReadOptions metadataOnly = VKPdfReadOptions.defaults()
    .imageLoadMode(VKPdfImageLoadMode.METADATA_ONLY);
VKPdfDocument pdfDoc = Vostok.Office.readPDF("pdf/bill.pdf", metadataOnly);

Template engine examples

Currently supported: renderWordTemplate / renderPPTTemplate / renderPDFTemplate / renderExcelTemplate.

Common syntax:

{{name}}                         // variable substitution
{{#items as item}}...{{/items}}  // loop (alias: item)
{{?vip}}...{{/vip}}              // conditional rendering
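To make the three constructs concrete, here is a minimal plain-text renderer written against these rules. It is a sketch, not Vostok's template engine: the class MiniTemplate is hypothetical, it works on strings rather than docx/pptx/pdf parts, it treats only Boolean true as a truthy condition (an assumption), renders missing variables as empty strings, and does not handle two nested blocks that share a name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical plain-text renderer for {{var}}, {{#list as x}}...{{/list}}, {{?cond}}...{{/cond}}. */
final class MiniTemplate {
    // {{#items as item}} body {{/items}}: group 1 = list key, 2 = alias, 3 = body
    private static final Pattern LOOP =
            Pattern.compile("\\{\\{#(\\w+) as (\\w+)\\}\\}(.*?)\\{\\{/\\1\\}\\}", Pattern.DOTALL);
    // {{?vip}} body {{/vip}}: group 1 = flag key, 2 = body
    private static final Pattern COND =
            Pattern.compile("\\{\\{\\?(\\w+)\\}\\}(.*?)\\{\\{/\\1\\}\\}", Pattern.DOTALL);
    // {{name}} or a dotted path such as {{item.name}}
    private static final Pattern VAR = Pattern.compile("\\{\\{([\\w.]+)\\}\\}");

    private MiniTemplate() {}

    static String render(String template, Map<String, ?> data) {
        // 1. Expand loops: render the body once per element with the alias bound in a child scope.
        String out = LOOP.matcher(template).replaceAll(m -> {
            StringBuilder sb = new StringBuilder();
            Object list = data.get(m.group(1));
            if (list instanceof List<?>) {
                for (Object item : (List<?>) list) {
                    Map<String, Object> scope = new HashMap<>(data);
                    scope.put(m.group(2), item);
                    sb.append(render(m.group(3), scope));
                }
            }
            return Matcher.quoteReplacement(sb.toString());
        });
        // 2. Keep a conditional body only when its flag is Boolean.TRUE.
        out = COND.matcher(out).replaceAll(m -> Boolean.TRUE.equals(data.get(m.group(1)))
                ? Matcher.quoteReplacement(render(m.group(2), data))
                : "");
        // 3. Substitute the remaining variables; dotted paths walk nested maps.
        return VAR.matcher(out).replaceAll(
                m -> Matcher.quoteReplacement(String.valueOf(lookup(data, m.group(1)))));
    }

    private static Object lookup(Map<String, ?> data, String path) {
        Object current = data;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map<?, ?>)) {
                return ""; // missing value or non-map intermediate renders as empty
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current == null ? "" : current;
    }
}
```

Loops are expanded first so that {{item.*}} references resolve inside each iteration's own scope; conditions and plain variables are resolved afterwards against the top-level data.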

Word template (docx)

Sample template text (tpl/order.docx)

Order No.: {{orderNo}}
Customer: {{customer}}
Items:
{{#items as item}}- {{item.name}} x {{item.qty}} = {{item.amount}}
{{/items}}
{{?vip}}VIP discount: {{vipDiscount}}{{/vip}}
Amount due: {{total}}

Sample render call

import java.util.List;
import java.util.Map;

Vostok.Office.renderWordTemplate(
    "tpl/order.docx",
    "out/order-rendered.docx",
    VKOfficeTemplateData.create()
        .put("orderNo", "A20260304001")
        .put("customer", "Zhang San")
        .put("items", List.of(
            Map.of("name", "Cola", "qty", 2, "amount", "8.00"),
            Map.of("name", "Chips", "qty", 1, "amount", "6.00")
        ))
        .put("vip", true)
        .put("vipDiscount", "2.00")
        .put("total", "12.00")   // 8.00 + 6.00 - 2.00 VIP discount
);

Sample rendered output (read back as text)

Order No.: A20260304001
Customer: Zhang San
Items:
- Cola x 2 = 8.00
- Chips x 1 = 6.00
VIP discount: 2.00
Amount due: 12.00

Excel template (xlsx, row-level loops)

An Excel loop block is delimited by a start-row marker and an end-row marker:

A cell in the start row: {{#items as item}}
A cell in the end row: {{/items}}

The start marker accepts a keepPlaceholderRows=true|false parameter:

{{#items as item keepPlaceholderRows=false}}

Precedence: marker-row parameter > VKExcelTemplateOptions.defaultKeepPlaceholderRows > built-in default (true).
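The precedence rule reduces to a null-coalescing chain. In this sketch, PlaceholderPolicy is hypothetical and null stands for "not specified" at a given level; how Vostok represents an unset option internally is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the documented keepPlaceholderRows precedence (not a Vostok type). */
final class PlaceholderPolicy {
    private static final boolean BUILT_IN_DEFAULT = true;
    private static final Pattern MARKER_PARAM = Pattern.compile("keepPlaceholderRows=(true|false)");

    private PlaceholderPolicy() {}

    /** Returns TRUE/FALSE when the start marker sets the parameter, or null when it is absent. */
    static Boolean fromMarker(String startMarker) {
        Matcher m = MARKER_PARAM.matcher(startMarker);
        return m.find() ? Boolean.valueOf(m.group(1)) : null;
    }

    /** Marker value wins, then the options default, then the built-in default (true). */
    static boolean resolve(Boolean markerValue, Boolean optionsDefault) {
        if (markerValue != null) {
            return markerValue;
        }
        if (optionsDefault != null) {
            return optionsDefault;
        }
        return BUILT_IN_DEFAULT;
    }
}
```

For example, a marker without the parameter combined with defaultKeepPlaceholderRows(false) resolves to false, while {{#items as item keepPlaceholderRows=false}} resolves to false regardless of the options.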

Sample template (tpl/order.xlsx, sheet Orders)

R1: Order No. | {{orderNo}}
R2: {{#items as item}}
R3: {{item.name}} | {{item.qty}} | {{item.amount}}
R4: {{/items}}
R5: Total | {{total}}

Sample render call

import java.util.List;
import java.util.Map;
import yueyang.vostok.office.excel.template.VKExcelTemplateOptions;

Vostok.Office.renderExcelTemplate(
    "tpl/order.xlsx",
    "out/order.xlsx",
    Map.of(
        "orderNo", "A20260304001",
        "items", List.of(
            Map.of("name", "Cola", "qty", 2, "amount", "8.00"),
            Map.of("name", "Chips", "qty", 1, "amount", "6.00")
        ),
        "total", "14.00"
    ),
    VKExcelTemplateOptions.defaults()
        .defaultKeepPlaceholderRows(true)
        .targetSheets(List.of("Orders"))
);

Resulting rows:

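// keepPlaceholderRows=true (default)
R1: Order No. | A20260304001
R2: ""        // start placeholder row kept, marker cleared
R3: Cola      | 2 | 8.00
R4: Chips     | 1 | 6.00
R5: ""        // end placeholder row kept, marker cleared
R6: Total     | 14.00

// keepPlaceholderRows=false (set on the start marker or via options)
R1: Order No. | A20260304001
R2: Cola      | 2 | 8.00   // expansion starts at the start marker's row
R3: Chips     | 1 | 6.00
R4: Total     | 14.00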

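Empty-list behavior:

With keepPlaceholderRows=true, the start and end placeholder rows are kept and their markers cleared; with keepPlaceholderRows=false, the start and end placeholder rows and all template rows between them are removed.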
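The row layouts above, including the empty-list case, can be reproduced with a small list-based model. RowLoop.expand is a hypothetical sketch over rows of cell strings, not Vostok's implementation: it handles a single loop block, substitutes only {{alias.field}} cells, and models "clearing a marker" as blanking the marker text inside its cell while keeping the row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of row-level loop expansion (not a Vostok type). */
final class RowLoop {
    private RowLoop() {}

    /** start/end are 0-based indexes of the marker rows; rows strictly between them are per-item templates. */
    static List<List<String>> expand(List<List<String>> rows, int start, int end, String alias,
                                     List<? extends Map<String, ?>> items, boolean keepPlaceholderRows) {
        Pattern field = Pattern.compile("\\{\\{" + Pattern.quote(alias) + "\\.(\\w+)\\}\\}");
        List<List<String>> out = new ArrayList<>(rows.subList(0, start)); // rows above the block
        if (keepPlaceholderRows) {
            out.add(clearMarkers(rows.get(start))); // start row kept, marker blanked
        }
        for (Map<String, ?> item : items) {
            for (List<String> templateRow : rows.subList(start + 1, end)) {
                List<String> filled = new ArrayList<>();
                for (String cell : templateRow) {
                    filled.add(field.matcher(cell).replaceAll(
                            m -> Matcher.quoteReplacement(String.valueOf(item.get(m.group(1))))));
                }
                out.add(filled);
            }
        }
        if (keepPlaceholderRows) {
            out.add(clearMarkers(rows.get(end))); // end row kept, marker blanked
        }
        out.addAll(rows.subList(end + 1, rows.size())); // rows below the block shift accordingly
        return out;
    }

    private static List<String> clearMarkers(List<String> row) {
        List<String> cleared = new ArrayList<>();
        for (String cell : row) {
            cleared.add(cell.replaceAll("\\{\\{[#/][^}]*\\}\\}", ""));
        }
        return cleared;
    }
}
```

With two items the model yields six rows when placeholders are kept and four when they are dropped; with an empty list it yields four and two rows respectively, matching the behavior described above.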

PPT/PDF template notes
renderPPTTemplate(...) and renderPDFTemplate(...) are called the same way as renderWordTemplate(...) and use the same template syntax.

Conversion examples

import yueyang.vostok.office.convert.VKOfficeConvertOptions;

// docx/pptx/xlsx -> pdf
Vostok.Office.convertToPDF("word/orders.docx", "pdf/orders.pdf");

// xlsx -> csv
Vostok.Office.convertExcelToCSV(
    "excel/orders.xlsx",
    "csv/orders.csv",
    VKOfficeConvertOptions.defaults().csvSheetName("Orders")
);

// csv -> xlsx
Vostok.Office.convertCSVToExcel("csv/orders.csv", "excel/orders-back.xlsx");

Streaming reads and structured extraction

// Streaming read: one callback per block (text/image/meta)
Vostok.Office.readWordStream("word/orders.docx", block -> consume(block.type(), block.text()));
Vostok.Office.readPPTStream("ppt/summary.pptx", block -> consume(block.type(), block.text()));
Vostok.Office.readPDFStream("pdf/bill.pdf", block -> consume(block.type(), block.text()));

// Structured extraction: node model (Word/PPT/PDF)
var wordStructured = Vostok.Office.readWordStructured("word/orders.docx");
var pptStructured = Vostok.Office.readPPTStructured("ppt/summary.pptx");
var pdfStructured = Vostok.Office.readPDFStructured("pdf/bill.pdf");

Async job callbacks (Event-style)

import yueyang.vostok.office.job.*;

Vostok.Office.onJobCompleted(n ->
    Vostok.Log.info("job done: {} {}", n.jobId(), n.resultPath()));

Vostok.Office.onJobDeadLetter(n ->
    Vostok.Log.warn("unhandled office job notification: {}", n.status()));

String jobId = Vostok.Office.submitJob(
    VKOfficeJobRequest.create(() -> {
        Vostok.Office.convertToPDF("word/orders.docx", "pdf/orders-async.pdf");
        return VKOfficeJobExecutionResult.ofPath("pdf/orders-async.pdf");
    }).type(VKOfficeJobType.CONVERT).tag("batch-a")
);

VKOfficeJobResult result = Vostok.Office.awaitJob(jobId, 30000);

Character counting rules
Word/PPT/PDF character counts all count non-whitespace Unicode code points; spaces, line breaks, and tabs are not counted.
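A direct Java reading of this rule iterates String.codePoints() and skips whitespace. Whether Vostok uses exactly Character.isWhitespace is an assumption; note that this JDK method does not treat the no-break spaces U+00A0, U+2007 and U+202F as whitespace, so this sketch would count them. CharCount is a hypothetical helper.

```java
/** Hypothetical helper applying the "non-whitespace Unicode code point" rule. */
final class CharCount {
    private CharCount() {}

    static int countNonWhitespaceCodePoints(String text) {
        // codePoints() yields one value per code point, so surrogate pairs count once.
        return (int) text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .count();
    }
}
```

Counting code points rather than String.length() matters outside the Basic Multilingual Plane: an emoji such as U+1F600 is one code point but two Java chars.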

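Configuration parameters (VKOfficeConfig)

Parameter | Type | Default | Description
officeTempDir | String | "tmp/office" | Root of the shared temp directory; Excel/Word/PPT/PDF each use their own subdirectory automatically
pptMaxSlides | int | 10,000 | Maximum number of slides in a PPT
pdfMaxPages | int | 10,000 | Maximum number of PDF pages
pdfMaxObjects | int | 1,000,000 | Maximum number of PDF objects
pdfMaxStreamBytes | long | 134217728 (128 MB) | Decoded-size limit for a single PDF stream
officeJobEnabled | boolean | true | Whether Office async jobs are enabled
officeJobWorkerThreads | int | 4 | Number of job worker threads
officeJobQueueCapacity | int | 4096 | Job queue capacity
officeJobRetentionMs | long | 86400000 (24 h) | How long job metadata is retained (ms)
officeJobResultMaxBytes | long | 67108864 (64 MB) | Maximum size of job result metadata in bytes (business-level constraint)
officeJobCallbackThreads | int | 2 | Number of callback dispatch threads
officeJobCallbackQueueCapacity | int | 4096 | Callback queue capacity
officeJobCallbackTimeoutMs | long | 5000 | Callback handling timeout threshold (for business monitoring)
officeJobNotifyOnRunning | boolean | false | Whether to send RUNNING notifications
xxeSampleBytes | int | 8192 | Number of bytes sampled for XML security (XXE) checks (Excel/Word/PPT)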

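API quick reference

Method | Description
readExcel(path) | Read a .xlsx file (fully)
writeExcel(path, workbook) | Write a .xlsx file
readExcelRows(path, sheetName, options, consumer) | Stream a .xlsx sheet row by row
readWordText(path) | Read the full text of a .docx
readWordImages(path, options) | Read all images in a .docx
countWordChars(path, options) | Count characters in a .docx (count-only path)
countWordImages(path, options) | Count images in a .docx
readWord(path, options) | Read Word content in one aggregated call
writeWord(path, request, options) | Generate a .docx (text + images)
readPPTText(path, options) | Read the text of a .pptx
readPPTImages(path, options) | Read images in a .pptx (supports metadata-only)
countPPTChars(path, options) | Count characters in a .pptx (count-only path)
countPPTImages(path, options) | Count images in a .pptx
countPPTSlides(path, options) | Count slides in a .pptx
readPPT(path, options) | Read PPT content in one aggregated call
writePPT(path, request, options) | Generate a .pptx (text + images)
readPDFText(path, options) | Read the text of a .pdf
readPDFImages(path, options) | Read images in a .pdf (supports metadata-only)
countPDFChars(path, options) | Count characters in a .pdf (count-only path)
countPDFImages(path, options) | Count images in a .pdf
countPDFPages(path, options) | Count pages in a .pdf
readPDF(path, options) | Read PDF content in one aggregated call
writePDF(path, request, options) | Generate a .pdf (text + images)
renderWordTemplate / renderPPTTemplate / renderPDFTemplate | Template rendering ({{var}} / {{#list as item}} / {{?cond}})
renderExcelTemplate(template, output, data, options) | Excel template rendering (row-level loops and placeholder-row policy)
convertToPDF(path, target, options) | Convert docx/pptx/xlsx to pdf
convertExcelToCSV / convertCSVToExcel | Convert between xlsx and csv in either direction
readWordStream / readPPTStream / readPDFStream | Block-level streaming read
readWordStructured / readPPTStructured / readPDFStructured | Structured node extraction
submitJob / getJob / cancelJob / awaitJob / listJobs | Async job management
onJob / onceJob / offJob / offAllJobs / onJobDeadLetter | Job callback registration (Event-style)
started() | Whether the module has been initialized
config() | Get a copy of the current configuration
close() | Shut down the Office module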

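Error codes

Code | Constant | Trigger
OF-400 | INVALID_ARGUMENT | Argument is null, empty, or invalid
OF-402 | CONFIG_ERROR | Invalid Office configuration
OF-403 | STATE_ERROR | Required runtime state not met (e.g. unsupported File mode)
OF-404 | NOT_FOUND | Resource does not exist (e.g. file or document part)
OF-500 | IO_ERROR | Underlying I/O failure
OF-530 | UNSUPPORTED_FORMAT | Unsupported format (only .xlsx/.docx/.pptx/.pdf)
OF-564 | PARSE_ERROR | Failed to parse an Office document
OF-565 | WRITE_ERROR | Failed to write an Office document
OF-566 | LIMIT_EXCEEDED | Character/image/page/size limit exceeded
OF-567 | SECURITY_ERROR | Security check failed (path / magic number / XXE)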
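If calling code needs to branch on these codes, a caller-side enum mirroring the table keeps the code strings in one place. OfficeErrorCode is hypothetical, not a Vostok type, and how Vostok exposes the code on its exceptions is not described in this section, so obtaining the code string is left to the caller.

```java
import java.util.Optional;

/** Hypothetical caller-side mirror of the documented OF-xxx codes (not a Vostok type). */
enum OfficeErrorCode {
    INVALID_ARGUMENT("OF-400"),
    CONFIG_ERROR("OF-402"),
    STATE_ERROR("OF-403"),
    NOT_FOUND("OF-404"),
    IO_ERROR("OF-500"),
    UNSUPPORTED_FORMAT("OF-530"),
    PARSE_ERROR("OF-564"),
    WRITE_ERROR("OF-565"),
    LIMIT_EXCEEDED("OF-566"),
    SECURITY_ERROR("OF-567");

    private final String code;

    OfficeErrorCode(String code) {
        this.code = code;
    }

    String code() {
        return code;
    }

    /** Looks up a constant by its "OF-xxx" string; empty if the code is not in the table. */
    static Optional<OfficeErrorCode> fromCode(String code) {
        for (OfficeErrorCode candidate : values()) {
            if (candidate.code.equals(code)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```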