content-proofreading
$ npx mdskill add aipoch/medical-research-skills/content-proofreading
Verify academic manuscripts for language and formatting errors.
- Ensures bilingual consistency across Chinese and English texts.
- Validates terminology, references, and citation styles automatically.
- Generates shareable reports with precise error locations.
- Delivers HTML or Markdown annotations for revision suggestions.
---
name: content-proofreading
description: An academic proofreading skill for Chinese/English manuscripts, triggered when you need automated checks for spelling, grammar, terminology consistency, and formatting before submission.
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)
## When to Use
- You are preparing an academic paper for journal/conference submission and need a final language + formatting pass.
- You have bilingual (Chinese/English) content and want consistent punctuation, wording, and style across both languages.
- Your manuscript contains domain terminology (e.g., life sciences) and you need consistent Chinese–English term mapping and abbreviation rules.
- You need to validate references, numbers/units, and heading levels against a required style (APA/MLA/GB/T 7714).
- You want a shareable report (HTML or Markdown annotations) with precise error locations and revision suggestions.
## Key Features
- **English checks**
  - Spelling (including US/UK variants)
  - Grammar (agreement, tense, articles, clause structure)
  - Punctuation conventions (US/UK)
  - Style suggestions (redundancy detection, passive voice optimization)
- **Chinese checks**
  - Typo/misused character detection (dictionary-based)
  - Grammar and collocation checks
  - Chinese vs. English punctuation normalization
  - Academic expression optimization suggestions
- **Terminology consistency**
  - Domain terminology database (life sciences by default)
  - Bidirectional Chinese–English correspondence checks
  - Abbreviation rules (require full form on first occurrence)
  - Synonym unification to preferred standard terms
- **Formatting checks**
  - Reference style validation (APA/MLA/GB/T 7714, etc.)
  - Number and unit normalization
  - Heading level consistency
  - Abbreviation consistency across the document
- **Reporting**
  - HTML interactive report or Markdown annotations
  - Precise error localization
  - Actionable revision suggestions
## Dependencies
- **Python**: `>= 3.8`
- **Python packages** (install via `pip install -r requirements.txt`)
  - `languagetool-python` (version: see `requirements.txt`): English grammar checking
  - `opencc` (version: see `requirements.txt`): Traditional/Simplified Chinese conversion
  - `jieba` (version: see `requirements.txt`): Chinese tokenization
  - `pyenchant` (version: see `requirements.txt`): spelling checks
  - `markdown` (version: see `requirements.txt`): Markdown rendering
  - `python-docx` (version: see `requirements.txt`): `.docx` reading
  - `docx2pdf` (version: see `requirements.txt`): Word-to-PDF conversion
## Example Usage
### 1) Install
```bash
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
### 2) Run (basic)
```bash
python scripts/init_run.py --input <paper_file_path> --output <output_path>
```
### 3) Run (advanced)
```bash
python scripts/init_run.py \
  --input paper.md \
  --output report.html \
  --lang en \
  --style apa \
  --terminology biology \
  --format html
```
### 4) CLI parameters
| Parameter | Description | Default |
|---|---|---|
| `--input` | Input file path | Required |
| `--output` | Output report path | Generates an HTML report by default |
| `--lang` | Language to check (`en` / `zh` / `both`) | `both` |
| `--style` | Reference style (`apa` / `mla` / `gb`) | `apa` |
| `--terminology` | Domain terminology set | `biology` |
| `--format` | Output format (`html` / `markdown`) | `html` |
| `--no-pdf` | Skip the optional Word→PDF conversion for `.docx` input | `false` |
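
The table above can be mirrored with a small `argparse` parser. This is only a sketch of how such options are commonly declared; `build_parser` is a hypothetical helper, and the actual parsing code in `scripts/init_run.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the parameter table (not the project's code)."""
    parser = argparse.ArgumentParser(prog="init_run.py")
    parser.add_argument("--input", required=True, help="Input file path")
    parser.add_argument("--output", default=None, help="Output report path")
    parser.add_argument("--lang", choices=["en", "zh", "both"], default="both")
    parser.add_argument("--style", choices=["apa", "mla", "gb"], default="apa")
    parser.add_argument("--terminology", default="biology")
    parser.add_argument("--format", choices=["html", "markdown"], default="html")
    # store_true makes the flag default to False, matching the table.
    parser.add_argument("--no-pdf", action="store_true")
    return parser


args = build_parser().parse_args(["--input", "paper.md", "--lang", "en"])
print(args.lang, args.style, args.format, args.no_pdf)  # en apa html False
```

Using `choices` lets `argparse` reject unsupported values such as `--lang fr` before any checker runs.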
### 5) Use as a Python module (end-to-end)
```python
from scripts.english_checker import EnglishChecker
from scripts.chinese_checker import ChineseChecker
from scripts.terminology_manager import TerminologyManager
from scripts.annotation_generator import AnnotationGenerator

text = """
Messenger RNA (mRNA) is transcribed in the nucleus.
"""

# Instantiate one checker per concern.
en_checker = EnglishChecker()
zh_checker = ChineseChecker()
term_manager = TerminologyManager(domain="biology")

# Collect the issues from every checker into a single list.
results = []
results.extend(en_checker.check(text))
results.extend(zh_checker.check(text))
results.extend(term_manager.check(text))

# Render the merged issues as an HTML report and save it.
generator = AnnotationGenerator(output_format="html")
report = generator.generate(results)
with open("report.html", "w", encoding="utf-8") as f:
    f.write(report)
```
## Implementation Details
### Architecture / Core Modules
- `english_checker.py`
  - Core engine for English spelling/grammar/style checks.
  - Designed to be rule-extensible (add or register new rule sets).
- `chinese_checker.py`
  - Core engine for Chinese typo/grammar/style checks.
  - Includes a library of common academic writing error patterns.
- `terminology_manager.py`
  - Terminology database management (import/export/query/update).
  - Performs term consistency checks, bilingual mapping validation, and abbreviation policy checks.
- `annotation_generator.py`
  - Converts detected issues into a visual report (HTML) or annotated Markdown.
  - Ensures issues include **location**, **type**, and **suggested fix**.
- `word_converter.py`
  - Extracts text from `.docx`.
  - Optionally converts Word to PDF (can be disabled via `--no-pdf`).
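
To make the **location**, **type**, and **suggested fix** requirement concrete, here is a minimal sketch of what an issue record could look like. The `Issue` class and the `locate` helper are hypothetical names used for illustration; the project's own data model is not shown in this document and may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Issue:
    """Hypothetical issue record; field names are assumptions."""
    line: int        # 1-based line number
    column: int      # 1-based column number
    kind: str        # e.g. "spelling", "grammar", "terminology"
    message: str
    suggestion: str


def locate(text: str, offset: int) -> Tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when no newline precedes the offset, so the
    # subtraction still yields a 1-based column on the first line.
    last_newline = text.rfind("\n", 0, offset)
    return line, offset - last_newline


text = "Intro line\nThe the cell divides.\n"
line, column = locate(text, text.index("The the"))
issue = Issue(line, column, "grammar", "Repeated word", "The")
print(issue.line, issue.column)  # 2 1
```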
### Terminology database format (JSON)
Organized by domain; each entry can include bilingual forms and abbreviation metadata:
```json
{
  "biology": {
    "cell": {
      "en": "cell",
      "abbrev": null,
      "full_form": null
    },
    "mrna": {
      "en": "mRNA",
      "abbrev": "mRNA",
      "full_form": "messenger RNA"
    }
  }
}
```
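
A terminology file in this shape can be sanity-checked before use. The sketch below assumes that every entry carries the three keys seen in the example (`en`, `abbrev`, `full_form`). That requirement is an assumption drawn from the sample, not a documented schema, and `validate_terminology` is a hypothetical helper.

```python
import json

# Assumption: the keys shown in the example are expected on every entry.
EXPECTED_KEYS = {"en", "abbrev", "full_form"}


def validate_terminology(data: dict) -> list:
    """Return a list of problems; an empty list means the structure looks fine."""
    problems = []
    for domain, entries in data.items():
        if not isinstance(entries, dict):
            problems.append(f"{domain}: expected an object of term entries")
            continue
        for key, entry in entries.items():
            if not isinstance(entry, dict):
                problems.append(f"{domain}.{key}: expected an object")
                continue
            missing = EXPECTED_KEYS - set(entry)
            if missing:
                problems.append(f"{domain}.{key}: missing {sorted(missing)}")
            # An abbreviation without a full form cannot be checked at first mention.
            if entry.get("abbrev") and not entry.get("full_form"):
                problems.append(f"{domain}.{key}: abbrev set but full_form is empty")
    return problems


sample = json.loads(
    '{"biology": {"mrna": {"en": "mRNA", "abbrev": "mRNA", "full_form": null}}}'
)
print(validate_terminology(sample))
```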
**Checking logic (typical):**
- If an abbreviation (e.g., `mRNA`) appears, verify the **full form** appears at first mention (e.g., `messenger RNA (mRNA)`).
- If both Chinese and English terms appear, verify they match the configured mapping for the selected domain.
- If synonyms are detected, prefer the standardized term defined in the database.
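
The first rule above can be sketched with regular expressions. This is an illustrative approximation under simple assumptions (plain text, whole-word matching), not the logic in `terminology_manager.py`; `check_first_mention` and the returned dictionary keys are hypothetical.

```python
import re


def check_first_mention(text: str, abbrev: str, full_form: str) -> list:
    """Flag the first use of `abbrev` unless it appears as 'full form (abbrev)'."""
    first = re.search(rf"\b{re.escape(abbrev)}\b", text)
    if first is None:
        return []  # abbreviation never used, nothing to check

    intro = re.search(
        rf"{re.escape(full_form)}\s*\(\s*{re.escape(abbrev)}\s*\)",
        text,
        re.IGNORECASE,  # tolerate sentence-initial capitalization
    )
    # The defining phrase must contain the very first occurrence.
    if intro and intro.start() <= first.start() < intro.end():
        return []
    return [{
        "offset": first.start(),
        "type": "abbreviation",
        "suggestion": f"{full_form} ({abbrev})",
    }]


ok = "Messenger RNA (mRNA) is transcribed in the nucleus."
bad = "mRNA is transcribed in the nucleus. Messenger RNA (mRNA) is exported."
print(check_first_mention(ok, "mRNA", "messenger RNA"))   # []
print(check_first_mention(bad, "mRNA", "messenger RNA"))  # one issue at offset 0
```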
### Rule database format (JSON)
Rules are grouped by language and category:
```json
{
  "english": {
    "spelling": [],
    "grammar": [],
    "style": []
  },
  "format": {
    "references": [],
    "numbers": [],
    "units": []
  }
}
```
**How rules are applied (high level):**
- Load rule sets by `--lang` and `--style`.
- Run language-specific checks (English/Chinese) and formatting checks.
- Merge results into a unified issue list.
- Render issues into the selected output format (`html` / `markdown`) with location-aware annotations.
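
The merge and render steps can be illustrated as follows. This is a sketch only: the issue dictionaries and the `merge_issues` and `render_markdown` helpers are hypothetical stand-ins, and the project's `AnnotationGenerator` may structure its output differently.

```python
def merge_issues(*issue_lists):
    """Combine per-checker results, drop exact duplicates, and sort by position."""
    seen = set()
    merged = []
    for issues in issue_lists:
        for issue in issues:
            key = (issue["line"], issue["column"], issue["type"], issue["message"])
            if key not in seen:
                seen.add(key)
                merged.append(issue)
    return sorted(merged, key=lambda i: (i["line"], i["column"]))


def render_markdown(issues):
    """Render issues as a Markdown table with one row per issue."""
    rows = ["| Line | Col | Type | Message | Suggestion |", "|---|---|---|---|---|"]
    for i in issues:
        rows.append(
            f"| {i['line']} | {i['column']} | {i['type']} | "
            f"{i['message']} | {i['suggestion']} |"
        )
    return "\n".join(rows)


english = [{"line": 3, "column": 5, "type": "spelling",
            "message": "Unknown word 'teh'", "suggestion": "the"}]
formatting = [{"line": 1, "column": 1, "type": "heading",
               "message": "Skipped heading level", "suggestion": "Use '##'"}]
merged = merge_issues(english, formatting, english)  # duplicate list on purpose
print(render_markdown(merged))
```

Sorting by `(line, column)` keeps the report in reading order regardless of which checker produced each issue.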
### Extensibility
- **Add new rules**
  1. Create a rule file under `assets/rules/`.
  2. Implement rules following the project’s rule template.
  3. Register the rule set in the rule index.
  4. Run tests to validate precision/recall and avoid false positives.
- **Add new terminology sets**
  1. Create a terminology JSON under `assets/terminology/`.
  2. Follow the domain structure shown above.
  3. Register the new domain in the terminology index so it can be selected via `--terminology`.