Feature/tree sitter #3306
Draft
sdottaka wants to merge 104 commits into
Integrate tree-sitter as an optional syntax highlighting engine that supplements the existing keyword-based CrystalEdit parsers. When a grammar DLL and highlight query (.scm) are present in the TreeSitterGrammars directory, tree-sitter provides full AST-based highlighting; otherwise the existing parser runs unchanged.

Core components:
- TreeSitterParser.h/.cpp: CTreeSitterParser, CTreeSitterColorMap, CTreeSitterLanguage, and TreeSitterRegistry classes
- ParseLine virtual override in CMergeEditView for tree-sitter results
- Incremental parsing via ts_tree_edit() on each edit operation
- Lazy reparse with dirty flag (fires once per paint cycle)
- Status bar indicator showing [TS:language] in encoding pane
- Post-build step to copy grammar DLLs from Release to Debug/Test

Supported languages: bash, c, c-sharp, cpp, css, dtd, flow, fsharp, fsharp_signature, go, html, java, javascript, json, php, php_only, python, ruby, rust, tsx, typescript, xml.

Grammar DLLs are built separately via build-grammars.ps1.
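The "lazy reparse with dirty flag" item can be pictured with a small sketch. This is not the PR's code: `LazyReparser`, `MarkDirty()` and `EnsureParsed()` are hypothetical names, and the reparse itself is a callback so the example stays independent of the tree-sitter API.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for the parser's lazy-reparse state (not the PR's class).
class LazyReparser
{
public:
    explicit LazyReparser(std::function<void()> reparse)
        : m_reparse(std::move(reparse))
    {
        assert(static_cast<bool>(m_reparse)); // a callable reparse step is required
    }

    // Called from every edit notification: cheap, only records that the tree is stale.
    void MarkDirty() { m_dirty = true; }

    // Called before highlight results are read, e.g. at the start of a paint cycle.
    // Runs the expensive reparse at most once per batch of edits.
    void EnsureParsed()
    {
        if (!m_dirty)
            return;
        m_reparse();
        m_dirty = false;
    }

    bool IsDirty() const { return m_dirty; }

private:
    std::function<void()> m_reparse;
    bool m_dirty = true; // nothing has been parsed yet
};
```

With this shape, several edits between two paints cost a single reparse, which is presumably the motivation for the dirty flag.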
- build-grammars.ps1: downloads and compiles grammar DLLs from GitHub releases using MSVC cl.exe/link.exe
- grammars.json: defines 17 grammar repos and release tags
- fsharp-highlights.scm: F# syntax highlight queries for tree-sitter
Wire in scope-aware highlighting (locals.scm) and language injection (injections.scm) alongside the existing highlights.scm support.
- CTreeSitterLanguage: add LoadQuery() helper, load all three .scm files
- CTreeSitterParser: add RunLocalsQuery() for scope/def/ref tracking, RunInjectionQuery() for embedded language highlighting, GetSetProperty() for #set! predicate parsing; RunHighlightQuery() cross-references locals
- TreeSitterRegistry: add GetLanguageForName() for injection language lookup
- build-grammars.ps1: resolve and copy locals.scm and injections.scm files
- Fix type mismatch (RefInfo vs PendingRef) and remove dead code
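To illustrate what scope/def/ref tracking means here, the following is a minimal sketch of resolving a reference against scopes and definitions collected from a locals query. The `Scope` and `Definition` records and both function names are assumptions made for this example; the PR's RunLocalsQuery() may be organized quite differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical records; the PR's actual structures may differ.
struct Scope { uint32_t startByte; uint32_t endByte; int parent; }; // parent == -1 for the root
struct Definition { std::string name; uint32_t startByte; int scope; };

// Index of the smallest scope containing `byte`, or -1 when no scope contains it.
inline int InnermostScope(const std::vector<Scope>& scopes, uint32_t byte)
{
    int best = -1;
    for (int i = 0; i < static_cast<int>(scopes.size()); ++i)
    {
        const Scope& s = scopes[i];
        if (byte < s.startByte || byte >= s.endByte)
            continue;
        if (best < 0 || (s.endByte - s.startByte) <= (scopes[best].endByte - scopes[best].startByte))
            best = i;
    }
    return best;
}

// Walks outward from the reference's scope and returns the index of the nearest
// preceding definition with a matching name, or -1 when the name is not local.
inline int ResolveReference(const std::vector<Scope>& scopes,
    const std::vector<Definition>& defs, const std::string& name, uint32_t refByte)
{
    for (int scope = InnermostScope(scopes, refByte); scope >= 0; scope = scopes[scope].parent)
    {
        // Assumption: parents are stored before their children, so the walk terminates.
        assert(scopes[scope].parent < scope);
        int best = -1;
        for (int i = 0; i < static_cast<int>(defs.size()); ++i)
        {
            const Definition& d = defs[i];
            if (d.scope != scope || d.name != name || d.startByte > refByte)
                continue;
            if (best < 0 || d.startByte >= defs[best].startByte)
                best = i;
        }
        if (best >= 0)
            return best;
    }
    return -1;
}
```

A highlight pass could then color a reference like its definition when the lookup succeeds, which is one plausible reading of "RunHighlightQuery() cross-references locals".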
- Add tree-sitter shared items to solution and projects
- Update SampleStatic project to include tree-sitter
- Fix build-grammars.ps1 to use Git Bash explicitly
- Add missing <algorithm> include
- Minor solution cleanup and add Italian translation
* fix: bundle inherited tree-sitter queries for grammars
  Agent-Logs-Url: https://github.com/Thorium/winmerge/sessions/234ce03d-a145-4b8c-b4c2-37eed3e33cf0
  Co-authored-by: Thorium <229355+Thorium@users.noreply.github.com>
* refine tree-sitter query bundling helpers
  Agent-Logs-Url: https://github.com/Thorium/winmerge/sessions/234ce03d-a145-4b8c-b4c2-37eed3e33cf0
  Co-authored-by: Thorium <229355+Thorium@users.noreply.github.com>
* polish tree-sitter query bundle handling
  Agent-Logs-Url: https://github.com/Thorium/winmerge/sessions/234ce03d-a145-4b8c-b4c2-37eed3e33cf0
  Co-authored-by: Thorium <229355+Thorium@users.noreply.github.com>
* Earlier CoPilot feedback addressed.
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Thorium <229355+Thorium@users.noreply.github.com>
* Doc - Italian language - Updated (#3319)
* Update Italian.po
* Fix issue #3321: [BUG] Incorrect string used with beta releases
* Show error message when entering path in header bar (#3322)
* Prioritize explicitly selected plugins over archive detection (#3324)
* Prioritize explicitly selected plugins over archive detection
* Update Src/7zCommon.cpp
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update Src/7zCommon.cpp
  ---------
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Use 7-Zip IsArc API for archive detection and refactor format guessing logic (#3323)
* Use 7-Zip IsArc API for archive detection and refactor format guessing logic
* Update ArchiveSupport/Merge7z/Merge7zCommon.cpp
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Restore extension-only fallback in GuessFormatEx and handle NEED_MORE result
  Agent-Logs-Url: https://github.com/WinMerge/winmerge/sessions/47af4d0f-fc0a-4e33-ab81-8ec95c0f599e
  Co-authored-by: sdottaka <98126+sdottaka@users.noreply.github.com>
* Use 7-Zip IsArc API for archive detection and refactor format guessing logic (2)
* Use 7-Zip IsArc API for archive detection and refactor format guessing logic (3)
* Prioritize explicitly selected plugins over archive detection
* Update Src/7zCommon.cpp
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update Src/7zCommon.cpp
* Update Merge7zCommon.cpp
  ---------
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: sdottaka <98126+sdottaka@users.noreply.github.com>
* Merge7z: Bump revision to 2600.1
* Merge7z: Bump revision to 2600.1 (2)
* Update French Manual (#3325)
* Refactor: unify open parameters and move recurse to OpenFolderParams (#3326)
* Update Manual/French.po
* Refactor: unify open parameters and move recurse to OpenFolderParams (#3326) (2)
  (cherry picked from commit 83af229)
* Add Folder comparison mode with archive extraction support (#3320)
* Update Manual/French.po
* Update Brazilian.po (#3328)
  Added translation for "Add Folder comparison mode with archive extraction support (#3320)"
* Update German.po (#3329)
* update zh-cn translation (#3331)
* Update Turkish.po (#3333)
  New string entries
* Update Korean (#3334)
* Code review fixes for 5 oldest source files#3327 #1
* Code review fixes for 5 oldest source files#3327 #2
* Update Turkish.po
* Update TranslationsStatus
* Update ChangeLog&ReleaseNotes
* Italian language (#3335)
* Stabilize tree-sitter highlight precedence
  Make overlapping captures resolve deterministically so syntax colors stay consistent across panes and languages. Also accept local.* capture prefixes so newer query conventions keep local symbol highlighting working.
* Unify tree-sitter block ordering
  Use one parser-wide block order counter so injected-language highlights cannot collide with primary highlight ordering when the final precedence tie-breaker runs.
---------
Co-authored-by: bovirus <1262554+bovirus@users.noreply.github.com>
Co-authored-by: Takashi Sawanaka <sdottaka@users.sourceforge.net>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sdottaka <98126+sdottaka@users.noreply.github.com>
Co-authored-by: t3chnob0y <t3chnob0y@users.noreply.github.com>
Co-authored-by: Marcellomco <70959309+Marcellomco@users.noreply.github.com>
Co-authored-by: René T. Nicolaus <12006431+Havoc7891@users.noreply.github.com>
Co-authored-by: YG <1246410+yingang@users.noreply.github.com>
Co-authored-by: bilimiyorum <131397022+bilimiyorum@users.noreply.github.com>
Co-authored-by: VenusGirl❤ <venusgirl@outlook.com>
* Finish tree-sitter runtime integration for compare views
  Wire the runtime grammar bundle, compare-view UI, and same-file navigation together so tree-sitter features are actually available in built binaries. This also updates the F# grammar bundle to include tags and disables Go to Definition when the current caret position cannot resolve.
* Fix tree-sitter follow-up packaging issues
  Guard the WiX grammar component reference when harvested files are absent, and remove the redundant TreeSitterWrapper include to avoid the _T macro redefinition warning.
# Conflicts:
#   ArchiveSupport/Merge7z/BuildArc.cmd
#   Docs/Users/ChangeLog.html
#   Docs/Users/ChangeLog.md
#   Docs/Users/ReleaseNotes.html
#   Docs/Users/ReleaseNotes.md
#   DownloadDeps.cmd
#   Src/FilepathEdit.cpp
#   Src/Merge.vcxproj.filters
#   Src/res/new_folder.bmp
#   Translations/TranslationsStatus.md
#   Translations/WinMerge/Arabic.po
#   Translations/WinMerge/Basque.po
#   Translations/WinMerge/Brazilian.po
#   Translations/WinMerge/Bulgarian.po
#   Translations/WinMerge/Catalan.po
#   Translations/WinMerge/ChineseSimplified.po
#   Translations/WinMerge/ChineseTraditional.po
#   Translations/WinMerge/Corsican.po
#   Translations/WinMerge/Croatian.po
#   Translations/WinMerge/Czech.po
#   Translations/WinMerge/Danish.po
#   Translations/WinMerge/Dutch.po
#   Translations/WinMerge/English.pot
#   Translations/WinMerge/Finnish.po
#   Translations/WinMerge/French.po
#   Translations/WinMerge/Galician.po
#   Translations/WinMerge/German.po
#   Translations/WinMerge/Greek.po
#   Translations/WinMerge/Hebrew.po
#   Translations/WinMerge/Hungarian.po
#   Translations/WinMerge/Italian.po
#   Translations/WinMerge/Japanese.po
#   Translations/WinMerge/Korean.po
#   Translations/WinMerge/Lithuanian.po
#   Translations/WinMerge/Norwegian.po
#   Translations/WinMerge/Persian.po
#   Translations/WinMerge/Polish.po
#   Translations/WinMerge/Portuguese.po
#   Translations/WinMerge/Romanian.po
#   Translations/WinMerge/Russian.po
#   Translations/WinMerge/Serbian.po
#   Translations/WinMerge/Sinhala.po
#   Translations/WinMerge/Slovak.po
#   Translations/WinMerge/Slovenian.po
#   Translations/WinMerge/Spanish.po
#   Translations/WinMerge/Swedish.po
#   Translations/WinMerge/Tamil.po
#   Translations/WinMerge/Turkish.po
#   Translations/WinMerge/Ukrainian.po
#   Translations/WinMerge/Vietnamese.po
…s and FolderCompare projects are not yet buildable. MFC dependencies still need to be removed from TreeSitterParser.
* Fix tree-sitter go to definition from context menus
  Update right-click navigation to resolve the symbol under the mouse and prefer tagged type definitions when the position-based lookup stays on the current line.
* Update tree-sitter context-menu definition handling
# Conflicts:
#   Src/Merge.vcxproj
#   Src/MergeDoc.cpp
#   Src/MergeDoc.h
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Replace ITextBuffer* parameter in NotifyEdit with TextEdit struct. Move notification to buffer layer (AddUndoRecord) for consistency.
- Move TreeSitterParser and TreeSitterWrapper from Externals/crystaledit/editlib to Src/
- Move tree-sitter library from Externals/crystaledit/editlib/ to Externals/ (top-level)
- Remove TreeSitter references from editlibparsers.vcxitems (CrystalEdit shared items)
- Update include paths in WinMerge source files to reference local TreeSitter headers
- Update project files and solution configuration

This decouples tree-sitter from CrystalEdit, making CrystalEdit a pure text editor library while keeping tree-sitter as a WinMerge-specific feature.
…esign

Remove stored buffer reference from CTreeSitterParser and pass ITextBuffer* explicitly to methods that need it. This eliminates hidden state and makes buffer dependencies explicit at call sites.

Changes:
- Remove m_pBuffer, SetBuffer(), and GetBuffer() from CTreeSitterParser
- Add ITextBuffer* parameter to FindDefinition() and TryGetTagDefinitionByNameAt()
- Introduce TreeSitterParseContext struct to hold both parser and buffer references
- Update MergeDoc to create and own TreeSitterParseContext instances
- Update ParseLineTreeSitter() to use context for lazy reparse with explicit buffer
- Update all call sites in MergeEditView to pass buffer parameter
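The shape of this refactoring can be sketched as follows. All types here are simplified stand-ins invented for the example (`TextBufferStub`, `ParserStub`, `ParseContextStub`, `ParseLineWithContext`); the real ITextBuffer, CTreeSitterParser and TreeSitterParseContext are larger and their members are not shown in this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a text buffer.
struct TextBufferStub { std::vector<std::string> lines; };

// Simplified stand-in for the parser: it no longer stores a buffer pointer.
class ParserStub
{
public:
    void MarkDirty() { m_dirty = true; }

    // The buffer arrives as a parameter, so the dependency is visible at the call site.
    void ReparseIfDirty(const TextBufferStub& buffer)
    {
        if (!m_dirty)
            return;
        m_parsedLineCount = buffer.lines.size();
        m_dirty = false;
    }

    std::size_t ParsedLineCount() const { return m_parsedLineCount; }

private:
    bool m_dirty = true;
    std::size_t m_parsedLineCount = 0;
};

// Pairs a parser with the buffer it should read; in the PR the document owns such contexts.
struct ParseContextStub
{
    ParserStub* pParser;
    TextBufferStub* pBuffer;
};

// Line-parsing entry point: reparses lazily using the buffer from the context.
inline std::size_t ParseLineWithContext(ParseContextStub& ctx)
{
    assert(ctx.pParser && ctx.pBuffer);
    ctx.pParser->ReparseIfDirty(*ctx.pBuffer);
    return ctx.pParser->ParsedLineCount();
}
```

Because the parser holds no buffer pointer of its own, nothing inside it can outlive or silently diverge from the buffer the caller intended.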
Keep only the highest priority highlight when multiple captures match the same token range, preventing conflicting color indices.
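One way to picture this rule is the sketch below. The `Capture` record and `KeepHighestPriority()` are hypothetical; in particular, breaking priority ties in favor of the larger order value is an assumption, since the commit message does not say which direction the tie-breaker takes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical capture record; field names are illustrative only.
struct Capture
{
    uint32_t startByte;
    uint32_t endByte;
    int colorIndex;
    int priority;
    int order; // position in a parser-wide sequence, used only to break ties
};

// Keeps one capture per exact [startByte, endByte) range. The highest priority wins;
// on equal priority the capture with the larger order value wins (assumed direction).
inline std::vector<Capture> KeepHighestPriority(const std::vector<Capture>& captures)
{
    std::map<std::pair<uint32_t, uint32_t>, Capture> best;
    for (const Capture& c : captures)
    {
        assert(c.startByte <= c.endByte);
        const std::pair<uint32_t, uint32_t> key(c.startByte, c.endByte);
        auto it = best.find(key);
        if (it == best.end())
            best.insert(std::make_pair(key, c));
        else if (c.priority > it->second.priority ||
                 (c.priority == it->second.priority && c.order > it->second.order))
            it->second = c;
    }

    std::vector<Capture> result;
    for (const auto& entry : best)
        result.push_back(entry.second); // map iteration yields ranges in sorted order
    return result;
}
```

Since every range ends up with exactly one winner, the same input always produces the same color index regardless of the order in which captures were reported.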
Pull request overview
This PR integrates tree-sitter as an optional/alternative syntax parsing backend in WinMerge (alongside the existing CrystalEdit line-based parsers), adds runtime-loaded grammar/query support, introduces a Tree-sitter mode option in Editor settings, and wires up Go to Definition (F12) using tree-sitter tags/locals.
Changes:
- Add a new CTreeSitterParser implementation with registry/grammar DLL loading, highlight caching, locals/tags resolution, and optional injection highlighting.
- Add UI/config plumbing for selecting Tree-sitter preference order and register parser factories accordingly.
- Package TreeSitter grammar/query assets in build scripts/installer and add new language IDs / color scheme strings for additional formats.
Reviewed changes
Copilot reviewed 39 out of 40 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| WinMerge.vs2017.sln | Adds tree-sitter shared-items project and SharedMSBuildProjectFiles entries (currently with a path mismatch). |
| WinMerge.sln | Adds tree-sitter shared-items project and SharedMSBuildProjectFiles entries. |
| Translations/WinMerge/StringBlacklist.txt | Adds newly supported language names to translation blacklist. |
| Testing/GoogleTest/UnitTests/UnitTests.vcxproj.filters | Includes Src\TreeSitterParser.cpp in UnitTests project filters. |
| Testing/GoogleTest/UnitTests/UnitTests.vcxproj | Imports tree-sitter shared items and compiles TreeSitterParser.cpp in UnitTests project. |
| Testing/FolderCompare/FolderCompare.vcxproj.filters | Includes Src\TreeSitterParser.cpp in FolderCompare test project filters. |
| Testing/FolderCompare/FolderCompare.vcxproj | Imports tree-sitter shared items and compiles TreeSitterParser.cpp in FolderCompare test project. |
| Src/TreeSitterParser.h | Introduces public API for tree-sitter language loading, color mapping, parser, and registry/factory. |
| Src/TreeSitterParser.cpp | Implements tree-sitter parsing/highlighting, locals/tags, injections, caching, and registry lazy-loading. |
| Src/resource.h | Adds Tree-sitter UI control/menu IDs and expands color-scheme IDs for new languages. |
| Src/PropEditor.h | Adds m_nTreeSitterMode option field to editor options panel. |
| Src/PropEditor.cpp | Binds OPT_TREE_SITTER_MODE and populates Tree-sitter mode combo (contains a “Built-in” typo). |
| Src/OptionsInit.cpp | Initializes default value for OPT_TREE_SITTER_MODE. |
| Src/OptionsDef.h | Defines OPT_TREE_SITTER_MODE. |
| Src/MergeEditView.h | Adds tree-sitter parser member + Go to Definition hooks. |
| Src/MergeEditView.cpp | Wires Go to Definition command + incremental edit notifications for the on-demand parser. |
| Src/Merge.vcxproj.filters | Adds TreeSitterParser source/header to Merge project filters. |
| Src/Merge.vcxproj | Imports tree-sitter shared items and compiles TreeSitterParser. |
| Src/Merge.rc | Adds Go to Definition menu item + accelerator, editor option UI controls, and new strings (contains a “Built-in” typo). |
| Src/Merge.cpp | Registers parser factories in preferred order based on OPT_TREE_SITTER_MODE (missing default case). |
| Installer/InnoSetup/WinMergeX86.iss | Installs TreeSitterGrammars directory into {app}. |
| Installer/InnoSetup/WinMergeX64NonAdmin.iss | Installs TreeSitterGrammars directory into {app}. |
| Installer/InnoSetup/WinMergeX64.iss | Installs TreeSitterGrammars directory into {app}. |
| Installer/InnoSetup/WinMergeX64.is6.iss | Installs TreeSitterGrammars directory into {app}. |
| Installer/InnoSetup/WinMergeARM64.is6.iss | Installs TreeSitterGrammars directory into {app}. |
| Externals/versions.txt | Documents tree-sitter and included grammar versions. |
| Externals/tree-sitter.vcxitems | Adds shared-items build of tree-sitter lib.c + include paths. |
| Externals/crystaledit/editlib/TextDefinition.h | Adds new LanguageId entries (F# signature, Markdown, TSX, TypeScript, YAML). |
| Externals/crystaledit/editlib/TextDefinition.cpp | Adds new TextDefinitions and adjusts JS/TS extension mapping. |
| Externals/crystaledit/editlib/editlib.vcxitems.filters | Normalizes formatting (line-numbered diff) without functional changes. |
| DownloadDeps.cmd | Downloads tree-sitter grammar packs and stages TreeSitterGrammars into Build output. |
| Docs/Users/Contributors.txt | Adds tree-sitter and grammar projects to external components list. |
| BuildArc.cmd | Packages TreeSitterGrammars into distribution ZIP layout. |
| ALL.vs2017.sln | Adds tree-sitter shared-items project + SharedMSBuildProjectFiles entries (currently with a path mismatch). |
| ALL.sln | Adds tree-sitter shared-items project + SharedMSBuildProjectFiles entries. |
| .gitmodules | Adds tree-sitter and tree-sitter-grammars submodules. |
| .github/workflows/main.yml | Enables core.longpaths before recursive submodule checkout. |
| .github/workflows/codeql-analysis.yml | Enables core.longpaths before recursive submodule checkout. |
Comment on lines +1336 to +1357

```cpp
uint32_t parentStartRow = inj.startPoint.row + capStart.row;
uint32_t parentEndRow = inj.startPoint.row + capEnd.row;
uint32_t parentStartCol = (capStart.row == 0)
    ? inj.startPoint.column + capStart.column
    : capStart.column;

// Add to parent's line blocks
for (uint32_t row = parentStartRow;
    row <= parentEndRow && row < static_cast<uint32_t>(m_nLineCount);
    row++)
{
    uint32_t byteCol = (row == parentStartRow) ? parentStartCol : 0;
    int charPos = byteCol / sizeof(wchar_t);

    TreeSitterLineBlock block;
    block.nCharPos = charPos;
    block.nColorIndex = colorIndex;
    block.nPriority = MakeCapturePriority(sCapName,
        ts_node_start_byte(capNode), ts_node_end_byte(capNode));
    block.nOrder = NextBlockOrder();
    m_lineBlocks[row].push_back(block);
}
```
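The coordinate arithmetic in the snippet above can be isolated into a small worked sketch. `Point`, `ToParentPoint()` and `ByteColumnToCharIndex()` are names made up for this example, and the two-bytes-per-code-unit constant assumes UTF-16 input where `sizeof(wchar_t)` is 2, as on Windows.

```cpp
#include <cassert>
#include <cstdint>

// Row/column position; columns are byte offsets within the row.
struct Point { uint32_t row; uint32_t column; };

// Translates a position inside an injected region into parent-document coordinates.
// Only positions on the injection's first row inherit its starting column;
// later rows start at column 0 of the parent line.
inline Point ToParentPoint(Point injectionStart, Point child)
{
    Point parent;
    parent.row = injectionStart.row + child.row;
    parent.column = (child.row == 0)
        ? injectionStart.column + child.column
        : child.column;
    return parent;
}

// Assumption: text is fed to the parser as UTF-16, so one code unit is two bytes.
constexpr uint32_t kBytesPerCodeUnit = 2;

// Converts a byte column to a character index within the line.
inline int ByteColumnToCharIndex(uint32_t byteColumn)
{
    assert(byteColumn % kBytesPerCodeUnit == 0); // must fall on a code-unit boundary
    return static_cast<int>(byteColumn / kBytesPerCodeUnit);
}
```

For an injection starting at row 10, byte column 8, a capture at child position (0, 4) maps to parent (10, 12), which is character index 6, while a capture at child position (2, 4) maps to parent (12, 4).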
No description provided.