Kavita/API/Services/Tasks/Scanner/Parser/Parser.cs
Joe Milazzo 1035e911bb
v0.7.5 - Remove from On Deck (#2142)
* Report Media Issues (#1964)

* Started working on a report problems implementation.

* Started code

* Added logging to book and archive service.

* Removed an additional ComicInfo read when comicinfo is null when trying to load. But we've already done it once earlier, so there really isn't any point.

* Added basic implementation for media errors.

* MediaErrors will ignore duplicate errors when there are multiple issues on same file in a scan.

* Fixed unit tests

* Basic code in place to view and clear. Just UI Cleanup needed.

* Slight css upgrade

* Fixed up centering and simplified the code to use regular array instead of observables as it wasn't working.

* Fixed unit tests

* Fixed unit tests for real

* Bump versions by dotnet-bump-version.

* Expanded Metadata for EPUBs (#1965)

* Fixed a bug breaking ability to save server settings

* Explicitly capture more people roles from Epubs, else fallback to how we do it now. It seems to be getting called twice and 2nd time is overriding data. Not sure why

* Refactored the code to clean it up

* Added support for generating collections or reading list based on dc:title and collection title-type with an optional display-seq.

* ReadingList/Collection support can't be done until VersOne supports. https://github.com/vers-one/EpubReader/issues/81

* Double include author for epub parsing and let the People code handle removing duplicates.

* Bump versions by dotnet-bump-version.

* Nothing changed, this is just to retrigger a stable build. (#1967) (#1968)

* Adding paper book reader theme (#1976)

* Adding paper book reader theme

# Added
- Added: Paper book reader theme

* Fixing some leftover styles

* adding book emulation to 2column layout for paper style

* Adding migrations

* removing migration and compressing image

* Reverting DataContextModelSnapshot

* checking out datacontextmodelsnapshot file

* Bump versions by dotnet-bump-version.

* Web Links (#1983)

* Updated dependencies

* Updated the default key to be 256 bits to meet security requirements.

* Added basic implementation of web link resolving favicon. Needs lots more work and testing on all OSes.

* Implemented ability to see links and click on them for an individual chapter.

* Hooked up the ability to set Series web links.

* Render out the web link

* Refactored out the favicon so there is a backup in case it fails. Refactored the baseline image placeholders to be dark mode since that is the default.

* Added Robbie's nice error weblink fallbacks.

* Bump versions by dotnet-bump-version.

* Updated Docker entrypoint (#1984)

* Bump versions by dotnet-bump-version.

* ISBN Support (#1985)

* Fixed a bug where weblinks would always show

* Started to try and support ico -> png conversion by manually grabbing image data out, but it's hard as hell.

* Implemented ability to parse out ISBN codes for books and ISBN-13 codes for ComicInfo. I can't figure out ISBN-10.

* Fixed Favicon not working on anything but windows

* Implemented ISBN support into Kavita

* Don't round so much when transforming bytes

* Bump versions by dotnet-bump-version.

* AVIF Support & Much More! (#1992)

* Expand the list of potential favicon icons to grab.

* Added a url mapping functionality to use alternative urls for fetching icons

* Initial commit to streamline media encoding. No DB migration yet, No UI changes, no Task changes.

* Started refactoring code so that webp queries use encoding format instead.

* More refactoring to remove hardcoded webp references.

* Moved manual migrations to their own folder to keep things organized. Manually drop the obsolete webp keys.

* Removed old apis for converting media and now have one. Reworked where the conversion code was located and streamlined events and whatnot.

* Make favicon encode setting aware

* Cleaned up favicon conversion

* Updated format counter to now just use Extension from MangaFile now that it's been out a while.

* Tweaked jumpbar code to reduce a lookup to hashmap.

* Added AVIF (8-bit only) support.

* In UpdatePeopleList, use FirstOrDefault as Single adds extra checks that may not be needed.

* You can now remove weblinks from edit series page and you can leave empty cells, they will just be removed on backend.

* Forgot a file

* Don't prompt to write a review, just show the pencil. It's the same number of clicks if you do, fewer if you don't.

* Fixed Refresh token using wrong Claim to look up the user.

* Refactored how we refresh authentication to perform it every 10 mins to ensure we always stay authenticated.

* Changed Version update code to run more throughout the day. Updated some hangfire to newer method signatures.

* Bump versions by dotnet-bump-version.

* More Fixes (#1993)

* Strip just isbn: from epub isbns and log when it's back (books)

* Tweaked to allow invalid GTINs but only valid ISBN 10/13s will be saved to Kavita.

* Fixed a bug with parsing series from a filename that is just a chapter range and no chapter/volume keywords.

* Show the media issue count before you open accordion

* Added an in-page filter for Media issues

* Cleanup styles

* Fixed up some code in epub isbn parsing when it's null

* Encode filenames when downloading so that non-English characters can be passed properly to UI.

* Added support to parse ComicInfo's with Empty Tags.

* Reset development settings.

* Tweaked the code in generating reading lists to avoid extra work when not needed.

* Fix comicvine's favicon

* Fixed up a unit test

* Tweaked the favicon code to ignore icons that have query parameters

* More favicon work. Expanded ability to grab icons a bit. Added in ability to not keep requesting favicons when we failed to parse already.

* Added a note for later

* Fixed stats server url

* Added more debugging

* Fixed unit tests

* Bump versions by dotnet-bump-version.

* More Fixes from Recent PRs (#1995)

* Added extra debugging for logout issue

* Fixed the null issue with ISBN

* Allow web links to be cleared out

* More logging on refresh token

* More key fallback when building Table of Contents

* Added better fallback implementation for building table of contents based on the many different ways epubs are packed and referenced.

* Updated dependencies

* Fixed up refresh token refresh which was invalidating sessions for no reason. Added it to update last active time as well.

* Bump versions by dotnet-bump-version.

* Fixed a bug with config (#1996)

* Bump versions by dotnet-bump-version.

* Changed IsDocker check (#1998)

* Refactored IsDocker to be completely static and changed to use an environment variable instead.

* Removed file from another branch

* Bump versions by dotnet-bump-version.

* Migrated up to VersOne 3.3 with epub 3.3 support. (#1999)

This enables collection and reading list support from epubs.

* Bump versions by dotnet-bump-version.

* More Bugfixes (EPUB Mainly) (#2004)

* Fixed an issue with downloading where spaces turned into plus signs.

* If the refresh token is invalid, but the auth token still has life in it, don't invalidate.

* Fixed docker users unable to save settings

* Show a default error icon until favicon loads

* Fixed a bug in mappings (keys/files) to pages that caused some links not to map appropriately. Updated epub-reader to v3.3.2.

* Expanded Table of Content generation by also checking for any files that are named Navigation.xhtml to have Kavita generate a simple ToC from (instead of just TOC.xhtml)

* Added another hack to massage key to page lookups when rewriting anchors.

* Cleaned up debugging notes

* Bump versions by dotnet-bump-version.

* More Polish  (#2005)

* Implemented sort title extraction from epub 3 files.

* Added link to wiki for media errors

* Fixed the hack to reduce JWT refresh token expiration

* Fixed up a case where favicon downloading wasn't correcting links that started with // correctly.

Added a fallback for sites that just don't have pngs available.

* Implemented a mechanism to fallback to Kavita's website for favicons which can be dynamically added/updated by the community.

* Reworked the logic for bookwalker which will fail to get the base html, so we have to rely on the fallback handler.

* Bump versions by dotnet-bump-version.

* Angular 16 (#2007)

* Removed adv, which isn't needed.

* Updated zone

* Updated to angular 16

* Updated to angular 16 (partially)

* Updated to angular 16

* Package update for Angular 16 (and other dependencies) is complete.

* Replaced all takeUntil(this.onDestroy) with new takeUntilDestroyed()

* Updated all inputs that have ! to be required and deleted all unit tests.

* Corrected how takeUntilDestroyed() is supposed to be implemented.

* Bump versions by dotnet-bump-version.

* Pipeline adjustment for Angular 16 (#2008)

* Bump versions by dotnet-bump-version.

* Try a different build (#2009)

* Bump versions by dotnet-bump-version.

* Continue Reading Bugfix (#2010)

* Fixed an edge case where continue point wasn't considering any chapters that had progress.

Continue point is now slightly faster and uses less memory.

* Added a unit test for a user's case. Still not reproducible

* Bump versions by dotnet-bump-version.

* Ensure chapters are sorted when getting continue point (#2011)

Fixes new behaviour in #1625

* Bump versions by dotnet-bump-version.

* Strip more forms of comments from CSS before parsing/inlining. (#2014)

Handle if ExCSS throws an exception during inlining and attempt to fallback to scoping css instead of inlining.

I still cannot update past ExCSS v4.1.0 else NPEs for common css will be thrown.

* Bump versions by dotnet-bump-version.

* Misc Changes (#2015)

* Updated ng-bootstrap

* Fixed an issue where jumpbar would be disabled when it shouldn't have been.

* When there are duplicate files that make up a volume, show the count on series detail.

* Added basic ISBN searching which will return a chapter back.

* Bump versions by dotnet-bump-version.

* Fixed count for cards (#2016)

* Bump versions by dotnet-bump-version.

* Last Release before Release Testing (#2017)

* Attempting to invalidate JWT on login (when locked out), but can't figure a way to get a JWT, since we don't store them.

Just committing as I'm going to remove the middleware, this is not worth the performance and complexity.

* Removed some security stuff that didn't line up.

* Dropping Token Expiration down to 2 days to test during release testing.

* Bump versions by dotnet-bump-version.

* Removed old migrations for Kavita startup. Only migrations from v0.7.2 onwards are present. (#2019)

* Bump versions by dotnet-bump-version.

* Fixed up jumpbar not properly disabling/enabling (#2022)

* Bump versions by dotnet-bump-version.

* Fix StoryArc & StoryArcNumber mismatch (#2018)

* Ensure StoryArc and StoryArcNumber are max length

* Trim StoryArc to remove excess spaces.

* Replaced with cleaner approach.

* Update with majora2007 recommendations

* Bump versions by dotnet-bump-version.

* Last fixes before release (#2027)

* Disable login button when a login is in-progress. This will help prevent spamming when internet is slow.

* Fixed a bug where an empty space could cause an error when creating a library.

* Apply Split Options throughout the codebase to add extra safe-guard on empty spaces and ensure trimming.

* Bump versions by dotnet-bump-version.

* Added NoContent responses when APIs don't find entities (#2028)

* Bump versions by dotnet-bump-version.

* Few More Fixes (#2032)

* Fixed spreads stretching on PC

* Fixed a bug where reading list dates couldn't be cleared out.

* Reading list page refreshes after updating info in the modal

* Fixed an issue where create library wouldn't take into account advanced settings.

* Fixed an issue where selection of the first chapter of a series to pull series-level metadata could fail in cases where you had Volume 2 and Chapter 1, Volume 2 would be selected.

* Bump versions by dotnet-bump-version.

* Fixed a bug where scan series wouldn't trigger word count analysis nor cover generation. (#2035)

* Bump versions by dotnet-bump-version.

* Okay this should be the last (#2037)

* Fixed improper date visualization for reading list detail page.

* Correct not-read badge position (#2034)

---------

Co-authored-by: Andre Smith <Hobogrammer@users.noreply.github.com>

* Bump versions by dotnet-bump-version.

* Fixed a bug where reading list month wasn't rendering correctly (#2039)

* Bump versions by dotnet-bump-version.

* Version bump (#2040)

* Bump versions by dotnet-bump-version.

* Bugfixes for a hotfix (#2052)

* Nothing changed, this is just to retrigger a stable build. (#1967)

* v0.7.3 - The Quality of Life Update  (#2036)

* Version bump

* Okay this should be the last (#2037)

* Fixed improper date visualization for reading list detail page.

* Correct not-read badge position (#2034)

---------

Co-authored-by: Andre Smith <Hobogrammer@users.noreply.github.com>

* Bump versions by dotnet-bump-version.

* Merged develop in

---------

Co-authored-by: Andre Smith <Hobogrammer@users.noreply.github.com>

* v0.7.3 - The Quality of Life Update (#2041)

* Fixed bug in CI pipeline for main

---------

Co-authored-by: Robbie Davis <robbie@therobbiedavis.com>
Co-authored-by: Chris Plaatjes <kizaing@gmail.com>
Co-authored-by: pssandhu <pssandhu@users.noreply.github.com>
Co-authored-by: Jolyon Suthers <jolyon.suthers@gmail.com>
Co-authored-by: Andre Smith <Hobogrammer@users.noreply.github.com>

* Reverted a scaling issue for fit to width

* Fixed an issue where creating a new library wouldn't persist advanced options due to a conflict with default value.

When deleting a library, give the library name in the prompt.

* Fixed kbd tags in epubs with paper theme having a style conflict.

* Fixed an edge case where the incorrect first cover could be chosen in some strange grouping situations.

* Manually sort directories as some OSes don't return them in a natural sort order.

* Fixed an issue where autocompleting when adding a directory could throw an error when you're typing.

---------

Co-authored-by: Andre Smith <Hobogrammer@users.noreply.github.com>
Co-authored-by: Robbie Davis <robbie@therobbiedavis.com>
Co-authored-by: Chris Plaatjes <kizaing@gmail.com>
Co-authored-by: pssandhu <pssandhu@users.noreply.github.com>
Co-authored-by: Jolyon Suthers <jolyon.suthers@gmail.com>

* Bump versions by dotnet-bump-version.

* [skipci] No User facing Changes (#2054)

* Setup canary GA

* Fixed bad repo

* Aligned GA (#2059)

* v0.7.4 - Kavita+ Launch (#2117)

* Initial Canary Push (#2055)

* Added AniList Token

* Implemented the ability to set your AniList token. License check is not in place.

* Added a check that validates AniList token is still valid. As I build out more support, I will add more checks.

* Refactored the code to validate the license before allowing UI control to be edited.

* Started license server stuff, but may need to change approach.

Hooked up ability to scrobble rating events to KavitaPlus API.

* Hooked in the ability to sync Mark Series as Read/Unread

* Fixed up unit tests and only scrobble when a full chapter is read naturally.

* Fixed up the Scrobbling service

* Tweak one of the queries

* Started an idea for Scrobble History, might rework into generic TaskHistory.

* AniList Token now has a validation check.

* Implemented a mechanism such that events are persisted to the database, processed every X hours to the API layer, then deleted from the database.

* Hooked in code for want to read so we only send what's important. Will migrate these to bulk calls to lessen strain on API server.

* Added some todos. Need to take a break.

* Hooked up the ability to backfill scrobble events after turning it on.

* Started on integrating license key into the server and ability to turn off scrobbling at the library level. Added sync history table for scrobbling and other API based information.

* Started writing to sync table

* Refactored the migrations to flatten them.

Started working a basic license add flow and added in some of the cache. Lots to do.

* Ensure that when we backfill scrobble events, we respect if a library has scrobbling turned on or not.

* Hooked up the ability to send when the series was started to be read

* Refactored the UI to streamline and group KavitaPlus Account Forms.

* Aligning with API

* Fixed bad merge

* Fixed up inputting a user license.

* Hooked up a cron task that validates licenses every 4 hours and on startup.

* Reworked how the update license code works so that we always update the cache and we handle removing license from user.

* Cleaned up some UI code

* UserDto now has if there is a valid license or not. It's not exposed though as there is no need to expose the license key ever.

* Fixed a strange encoding issue with extra ".

Started working on having the UI aware of the license information.

Refactored all code to properly pass the correct license to the API layer.

* There is a circular dependency in the code.

Fixed some theme code which wasn't checking the right variable.

Reworked the JWT interceptor to be better at handling async code.

Lots of misc code changes, DI circular issue is still present.

* Fixed the DI issue and moved all things that need bootstrapping to app.component.

* Hooked up the ability to not have a donation button show up if the server default user/admin has a valid KavitaPlus license.

* Refactored how we extract out ids from weblinks

* Ensure if API fails, we don't delete the record.

* Refactored how rate checks occur for scrobbling processing.

* Lots of testing and ensuring rate limit doesn't get destroyed.

* Ensure the media item is valid for that user's providers set.

* Refactored the loop code into one method to keep things much cleaner

* Lots of code to get the scrobbling streamlined and foolproof. Unknown series are now reported on the UI.

* Prevent duplicates for scrobble errors.

* Ensure we are sending the correct type to the Scrobble Provider

* Ensure we send the date of the scrobble event for upstream to use.

* Replaced the dedicated run backfilling of scrobble events to just trigger when setting the anilist token for the first time.

Streamlined a lot of the code for adding your license to ensure user understands how it works.

* Fixed a bug where scan series wasn't triggering word count or cover generation.

* Started the plumbing for recommendations

* Merge conflicts

* Recommendation plumbing is nearly complete.

* Setup response caching and general cleanup

* Fixed UI not showing the recommendation tab

* Switched to prod url

* Fixed broken unit tests due to Hangfire not being setup for unit tests

* Fixed branch selection (#2056)

* Damn you GA (#2058)

* Bump versions by dotnet-bump-version.

* Fixed GA not pulling the right branch and removed unneeded building from version job (#2060)

* Bump versions by dotnet-bump-version.

* Canary Second (#2071)

* Just started

* Started building the user review card. Fixed Recommendations not having user progress on them.

* Fixed a bug where scrobbling ratings wasn't working.

* Added a temp ability to trigger scrobbling processing for testing.

* Cleaned up the design of review card. Added a temp way to trigger scrobbling.

* Fixed clear scrobbling errors and refactored so reviews now load from DB and is streamlined.

* Refactored so edit review is now a single module component and editable from the series detail page.

* Removed SyncHistory table as it's no longer needed. Refactored read events to properly update to the latest progress information. Refactored to a new way of clearing events, so that users can see their scrobble history.

* Fixed a bug where Anilist token wouldn't show as set due to some state issue

* Added the ability to see your own scrobble events

* Avoid a potential collision with recommendations.

* Fixed an issue where when checking for a license on UI, it wouldn't force the check (in case server was down on first check).

* External reviews are implemented.

* Fixed unit tests

* Bump versions by dotnet-bump-version.

* Made the api url dynamic based on dev mode or not. (#2072)

* Bump versions by dotnet-bump-version.

* Canary Build 3 (#2079)

* Updated reviews to have tagline support to match how Anilist has them.

Cleaned up the KavitaPlus documentation and added a feature list.

Review cards look much better.

* Fixed up a NPE in scrobble event creation

* Removed the ability to have images leak in the read more review card.

Reviews now show the user if they are a local user, else External.

* Added caching to the reviews and recommendations that come from an external source. Max of 50MB will be used across whole instance. Entries are cached for 1 hour.

* Reviews are looking much better

* Added the ability for users to share their series reviews with other users on the server via a new opt-in mechanism.

Fixed up some cache busting mechanism for reviews.

* More review polish to align with better matching

* Added the extra information for Recommendation matching.

* Preview of the review is much cleaner now and the full body is styled better.

* More anilist specific syntax

* Fixed bad regex

* Added the ability to bust cache.

Spoilers are now implemented for reviews. Introduces:
--review-spoiler-bg-color
--review-spoiler-text-color

* Bump versions by dotnet-bump-version.

* Canary Build 4 (#2086)

* Updated Kavita Plus feature list. Added a hover-over to the progress bars in the app to know exact percentage of reading for a chapter or series.

* Added a button to go to external review. Changed how enums show in the documentation so you can see their string value too.

Limited reviews to top 10 with proper ordering. Drastically cleaned up how we handle preview summary generation

* Cleaned up the margin below review section

* Fixed an issue where a processed scrobble event would get updated instead of a new event created.

* By default, there is now a prompt on series review to add your own, which fills up the space nicely.

Added the backend for Series Holds.

* Scrobble History is now ordered by recent -> latest. Some minor cleanup in other files.

* Added a simple way to see and toggle scrobble service from the series.

* Fixed a bug where updating the user's last active time wasn't writing to database and causing a logout event.

* Tweaked the registration email wording to be more clear for email field.

* Improved OPDS Url generation and included using host name if defined.

* Fixed the issues with choosing the correct series cover image. Added many unit tests to cover the edge cases.

* Small cleanup

* Fixed an issue where urls with , in them would break weblinks.

* Fixed a bug where we weren't trying a png before we hit fallback for favicon parsing.

* Ensure scrobbling tab isn't active without a license.

Changed how updating user last active worked to suppress more concurrency issues.

* Fixed an issue where duplicate series could appear on newly added during a scan.

* Bump versions by dotnet-bump-version.

* Fixed a bad dto (#2087)

* Bump versions by dotnet-bump-version.

* Canary Build 4 (#2089)

* New server-based auth is in place with the ability to register the instance.

* Refactored to single install bound licensing.

* Made the Kavita+ tab gold.

* Change the JWTs to last 10 days. This is a self-hosted software and the usage doesn't need the level of 2 days expiration

* Bump versions by dotnet-bump-version.

* Canary Build 4 (#2090)

* By default, a new library will only have scrobbling on if it's of type book or manga given current scrobble providers.

* Started building out external reviews.

* Added the ability to re-enter your license information.

* Fixed side nav not extending enough

* Fixed a bug with info cards

* Integrated rating support, fixed review cards without a tagline, and misc fixes.

* Streamlined where ratings are located on series detail page.

* Aligned with other series lookups

* Bump versions by dotnet-bump-version.

* Canary Build 6 (#2092)

* Cleaned up some messaging

* Fixed up series detail

* Cleanup

* Bump versions by dotnet-bump-version.

* Canary Build 6 (#2093)

* Fixed scrobble token not being visible by default.

* Added a loader for external reviews

* Added the ability to edit series details (weblinks) from Scrobble Issues page.

* Slightly lessened the focus on buttons

* Fixed review cards so whenever you click your own review, it will open the edit modal.

* Need for speed - Updated Kavita logo to be much smaller and replaced all code ones with a 32x version.

* Optimized a ton of our images to be much smaller and faster to load.

* Added more MIME types for response compression

* Edit Series modal name field should be readonly as it is directly mapped to file metadata or filename parsed. It shouldn't be changeable via the UI.

* Removed the ability to update the Series name via Kavita UI/API as it is no longer editable.

* Moved Image component to be standalone

* Moved ReadMore component to be standalone

* Moved PersonBadge component to be standalone

* Moved IconAndTitle component to be standalone

* Fixed some bugs with standalone.

* Hooked in the ability to scrobble series reviews.

* Refactored everything to use HashUtil token rather than InstallId.

* Swapped over to a generated machine token and fixed an issue where after registering, the license would not say valid.

* Added the missing migration for review scrobble events.

* Clean up some wording around busting cache.

* Fixed a bug where chapters within a volume could be unordered in the UI info screen.

* Refactored to prepare for external series rendering on series detail.

* Implemented external recs

* Bump versions by dotnet-bump-version.

* Canary Build 7 (#2097)

* Aligned ExtractId to extract a long, since MAL id can be just that.

* Fixed external series card not clicking correctly.

Fixed a bug when extracting a Mal link.

Fixed cancel button on license component.

* Renamed user-license to license component given new direction for licensing.

* Implemented card layout for recommendations

* Moved more components over to be standalone and removed pipes module. This is going to take some time for sure.

* Moved Cards and SharedCardsSideNav and SideNav over to standalone. This has been shaken out.

* Cleaned up a bunch of extra space on reading list detail page.

* Fixed rating popover not having a black triangle.

* When checking license, show a loading indicator for validity icon.

* Cache size can now be changed by admins if they want to give more memory for better browsing.

* Added LastReadTime

* Cleanup the scrobbling control text for Library Settings.

* Fixed yet another edge case for getting series cover image where first volume is higher than 1 and the rest is just loose leaf chapters.

* Changed OPDS Content Type to be application/atom+xml to align better with the spec.

* Fixed unit tests

* Bump versions by dotnet-bump-version.

* Canary Build 7 (#2098)

* Fixed the percentage readout on card item progress bar

* Ensure scrobble control is always visible

* Review card could show person icon in tablet viewport.

* Changed how the ServerToken for node locking works as docker was giving different results each time.

* After we update series metadata, bust cache

* License component cleanup on the styles

* Moved license to admin module and removed feature modal as wiki is much easier to maintain.

* Bump versions by dotnet-bump-version.

* Canary Build 8 (#2100)

* Fixed a very slight amount of the active nav tag bleeding outside the border radius

* Switched how we count words in epub to handle languages that don't have spaces.

* Updated dependencies and fixed a series cover image on list item view for recs.

* Fixed a bug where external recs weren't showing summary of the series.

* Rewrote the rec loop to be cleaner

* Added the ability to see series summary on series detail page on list view.

Changed Scrobble Event page to show in server time and not utc.

* Added tons of output to identify why unraid generates a new fingerprint each time.

* Refactored scrobble event table to have filtering and pagination support.

Fixed a few bad template issues and fixed loading scrobbling tab on refresh of page.

* Aligned a few apis to use a default pagination rather than a higher level one.

* Undo OPDS change as Chunky/Panels break.

* Moved the holds code around

* Don't show an empty review for the user, it eats up unneeded space and is ugly.

* Cleaned up the review code

* Fixed a bug with arrow on sortable table header.

* More scrobbling debug information to ensure events are being processed correctly.

* Applied a ton of code cleanup build warnings

* Enhanced rec matching by prioritizing matching on weblinks before falling back to name matching.

* Fixed the calculation of word count for epubs.

* Bump versions by dotnet-bump-version.

* Canary Build 9 (#2104)

* Added another unit test

* Changed how we create cover images to force the aspect ratio, which allows for Kavita to do some extra work later down the line. Prevents skewing from comic sources.

* Code cleanup

* Updated signatures to explicitly indicate they return a physical file.

* Refactored the GA to be a bit more streamlined.

* Fixed up how we refresh volume and series image links after cover conversion.

* Undid the PhysicalFileResult stuff.

* Fixed an issue in the epub reader where html tags within an anchor could break the navigation code for inner-links.

* Fixed a bug in GetContinueChapter where a special could appear ahead of a loose leaf chapter.

* Optimized aspect ratios for custom library images to avoid shift layout.

Moved the series detail page down a bit to be inline with first row of actionables.

* Finally fixed the media conversion issue where volumes and series wouldn't get their file links updated.

* Added some new layout for license to allow a user to buy a sub after their last sub expired.

* Added more metrics for fingerprinting to test on docker.

* Tried to fix a bug with getnextchapter looping incorrectly, but unable to solve.

* Cleanup some UI stuff to reduce bad calls.

* Suppress annoying issues with reaching K+ when it's down (only affects local builds)

* Fixed an edge case bug for picking the correct cover image for a series.

* Fixed a bug where typeahead x wouldn't clear out the input field.

* Renamed Clear -> Reset for metadata filter to be more informative of its function.

* Don't allow duplicates for reading list characters.

* Fixed a bug where when calculating recently updated, series with the same name but different libraries could get grouped.

* Fixed an issue with fit to height where there could still be a small amount of scroll due to a timing issue with the image loading.

* Don't show a loading if the user doesn't have a license for external ratings

* Fixed bad stat url

* Fixed up licensing to make it so you have to email me to get a sub renewed.

* Updated deps

* When scrobbling reading events, recalculate the highest chapter/volume during processing.

* Code cleanup

* Disabled some old test code that is likely not needed as it breaks a lot on netvips updates

* Bump versions by dotnet-bump-version.

* Canary Build 10 (#2105)

* Aligned fingerprint to be unique

* Updated email button to have a template

* Fixed inability to progress to next chapter when last page is a spread and user is using split rendering.

* Attempted fix at the column reader cutting off parts of the words. Can't fully reproduce, but added a bit of padding to help.

* Aligned AniList icon to match that of weblinks.

* Bump versions by dotnet-bump-version.

* Canary Build 11 (#2108)

* Fixed an issue with continuous reader in manga reader.

* Aligned KavitaPlus->Kavita+

* Updated the readme

* Adjusted first time registration messaging.

* Fixed a bug where having just one type of weblink could cause a bad recommendation lookup

* Removed manual invocation of scrobbling as testing is over for that feature.

* Fixed a bad observable for downloading logs from browser.

* Don't get reviews/recs for comic libraries. Override user selection for scrobbling on Comics since there are no places to scrobble to.

* Added a migration so all existing comic libraries will have scrobbling turned off.

* Don't allow the UI to toggle scrobbling on a library with no providers.

* Refactored the code to not throw generic 500 toasts on the UI. Added the ability to clear your license on Kavita side.

* Converted reader settings to new accordion format.

* Converted user preferences to new accordion format.

* I couldn't convert CBL Reading modal to new accordion directives due to some weird bug.

* Migrated the whole application to standalone components. This fixes the download progress bar not showing up.

* Hooked up the ability to have reading list generate random items. Removed the old code as it's no longer needed.

* Added random covers for collections as well.

* Added a speed up to not regenerate merged covers if we've already created them.

* Fixed an issue where tooltips weren't styled correctly after updating a library. Migrated Library access modal to OnPush.

* Fixed broken table styling. Fixed grid breakpoint css variables not using the ones from variables due to a missing import.

* Misc fixes around tables and some api doc cleanup

* Fixed a bug where when switching from webtoon back to a non-webtoon reading mode, if the browser size isn't large enough for double, the reader wouldn't go to single mode.

* When combining external recs, normalize names to filter out differences, like capitalization.

* Finally get to update ExCSS to the latest version! This adds much more css properties for epubs.

* Ensure rejected reviews are saved as errors

* A crap ton of code cleanup

* Cleaned up some equality code in GenreHelper.cs

* Fixed up the table styling after the bootstrap update changed it.

* Bump versions by dotnet-bump-version.

* Canary Build 12 (#2111)

* Aligned GA (#2059)

* Fixed the code around merging images to resize them. This will only look correct if this release's cover generation runs.

* Misc code cleanup

* Fixed an issue with epub column layout cutting off text

* Collection detail page will now default sort by sort name.

* Explicitly lazy load library icon images.

* Make sure the full error message can be passed to the license component/user.

* Use WhereIf in some places

* Changed the hash util code for unraid again

* Fixed up an issue with split render mode where last page wouldn't move into the next chapter.

* Bump versions by dotnet-bump-version.

* Don't ask me how, but I think I fixed the epub cutoff issue (#2112)

* Bump versions by dotnet-bump-version.

* Canary 14 (#2113)

* Switched how we build the unraid fingerprint.

* Fixed a bit of space below the image on fit to height

* Removed some bad code

* Bump versions by dotnet-bump-version.

* Canary Build 15 (#2114)

* When performing a scan series, force a recount of words/pages to ensure read time gets updated.

* Fixed broken download logs button (develop)

* Sped up the query for getting libraries and added caching for that api, which is helpful for users with larger library counts.

* Fixed an issue in directory picker where if you had two folders with the same name, the 2nd to last wouldn't be clickable.

* Added more destroy ref stuff.

* Switched the buy/manage links over to be environment specific.

* Bump versions by dotnet-bump-version.

* Canary Build 16 (#2115)

* Added the promo code for K+ and version bump.

* Don't show see more if there isn't more to see on series detail.

* Bump versions by dotnet-bump-version.

* Last Build (#2116)

* Merge

* Close the view after removing a license key from server.

* Bump versions by dotnet-bump-version.

* Reset version to v0.7.4 for merge.

* Bump versions by dotnet-bump-version.

* Cleanup from the Release (#2127)

* Added an FAQ link on the Kavita+ tab.

* Don't query Kavita+ for ratings on comic libraries as there is no upstream provider yet.

* Jumpbar keys are a little hard to click

* Fixed an issue where libraries that don't allow scrobbling could be scrobbled when generating past history with read events.

* Made the min/max release year on metadata filter number and removed the spin arrows for styling.

* Fixed disable tabs color contrast due to bootstrap undocumented change.

* Refactored whole codebase to unify caching mechanism. Upped the default cache memory amount to 75 to account for the extra data load. Still LRU.

Fixed an issue where Cache key was using Port instead.

Refactored all the Configuration code to use strongly typed deserialization.

* Fixed an issue where get latest progress would throw an exception if there was no progress due to LINQ and MAX query.

* Fixed a bug where Send to Device wasn't present on Series cards.

* Hooked up the ability to change the cache size for Kavita via the UI.

* Bump versions by dotnet-bump-version.

* Overall Ratings (#2129)

* Corrected tooltip for Cache

* Ensure we sync the DB to what's in appsettings.json for Cache key.

* Change the fingerprinting method for Windows installs exclusively to avoid churn due to how security updates are handled.

* Hooked up the ability to see where reviews are from via an icon on the review card, rather than having to click or know that MAL has "external Review" as title.

* Updated FAQ for Kavita+ to link directly to the FAQ

* Added the ability for all ratings on a series to be shown to the user.

Added favorite count on AL and MAL

* Cleaned up so the check for Kavita+ license doesn't seem like it's running when no license is registered.

* Tweaked the test instance buy link to test new product.

* Bump versions by dotnet-bump-version.

* Remove From On Deck (#2131)

* Allow admins to customize the amount of progress time or last item added time for on deck calculation

* Implemented the ability to remove series from on deck. They will be removed until the user reads a new chapter.

Quite a few db lookup reduction calls for reading based stuff, like continue point, bookmarks, etc.

* Bump versions by dotnet-bump-version.

* Preparation for Release (#2135)

* Don't allow Comic libraries to do any scrobbling as there aren't any Comic scrobbling providers yet.

* Fixed a bug where if you have multiple libraries pointing to the same folder (for whatever reason), the Scan Folder api could be rejected.

* Handle if publication from an epub is empty to avoid a bad parse error

* Cleaned up some hardcoded default strings.

* Fixed up some defaulting code for the cache size.

* Changed how moving something back to on deck works after it's been removed. Now any progress will trigger it, as epubs don't have chapters.

* Ignore .caltrash, which is a Calibre managed folder, when scanning.

* Added the ability to see Volume Last Read Date (or individual chapter) in details drawer. Hover over the clock for the full timestamp.

* Bump versions by dotnet-bump-version.

* Forgot 2 files in last PR (#2136)

* Don't allow Comic libraries to do any scrobbling as there aren't any Comic scrobbling providers yet.

* Fixed a bug where if you have multiple libraries pointing to the same folder (for whatever reason), the Scan Folder api could be rejected.

* Handle if publication from an epub is empty to avoid a bad parse error

* Cleaned up some hardcoded default strings.

* Fixed up some defaulting code for the cache size.

* Changed how moving something back to on deck works after it's been removed. Now any progress will trigger it, as epubs don't have chapters.

* Ignore .caltrash, which is a Calibre managed folder, when scanning.

* Added the ability to see Volume Last Read Date (or individual chapter) in details drawer. Hover over the clock for the full timestamp.

* Somehow some files got left off the commit

* Bump versions by dotnet-bump-version.

* Changed the fingerprinting code for Kavita+. Optimized System tab to be way faster. (#2140)

* Bump versions by dotnet-bump-version.

* Version bump (#2141)

---------

Co-authored-by: Robbie Davis <robbie@therobbiedavis.com>
Co-authored-by: Chris Plaatjes <kizaing@gmail.com>
Co-authored-by: pssandhu <pssandhu@users.noreply.github.com>
Co-authored-by: Jolyon Suthers <jolyon.suthers@gmail.com>
Co-authored-by: Andre Smith <Hobogrammer@users.noreply.github.com>
2023-07-18 06:52:50 -07:00

1077 lines
43 KiB
C#

using System;
using System.Collections.Immutable;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using API.Entities.Enums;
namespace API.Services.Tasks.Scanner.Parser;
public static class Parser
{
public const string DefaultChapter = "0";
public const string DefaultVolume = "0";
public static readonly TimeSpan RegexTimeout = TimeSpan.FromMilliseconds(500);
public const string ImageFileExtensions = @"^(\.png|\.jpeg|\.jpg|\.webp|\.gif|\.avif)";
public const string ArchiveFileExtensions = @"\.cbz|\.zip|\.rar|\.cbr|\.tar.gz|\.7zip|\.7z|\.cb7|\.cbt";
private const string BookFileExtensions = @"\.epub|\.pdf";
private const string XmlRegexExtensions = @"\.xml";
public const string MacOsMetadataFileStartsWith = @"._";
public const string SupportedExtensions =
ArchiveFileExtensions + "|" + ImageFileExtensions + "|" + BookFileExtensions;
private const RegexOptions MatchOptions =
RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.CultureInvariant;
private static readonly ImmutableArray<string> FormatTagSpecialKeywords = ImmutableArray.Create(
"Special", "Reference", "Director's Cut", "Box Set", "Box-Set", "Annual", "Anthology", "Epilogue",
"One Shot", "One-Shot", "Prologue", "TPB", "Trade Paper Back", "Omnibus", "Compendium", "Absolute", "Graphic Novel",
"GN", "FCBD");
private static readonly char[] LeadingZeroesTrimChars = new[] { '0' };
private static readonly char[] SpacesAndSeparators = { '\0', '\t', '\r', ' ', '-', ','};
private const string Number = @"\d+(\.\d)?";
private const string NumberRange = Number + @"(-" + Number + @")?";
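// Illustrative sketch, not part of the original file: the name NumberRangeExampleRegex is hypothetical.
// Anchoring NumberRange with ^ and $ shows the shapes it accepts: a number with at most one decimal digit,
// optionally followed by "-" and a second such number, e.g. "4", "20.5", "16-17" or "20.5-21.5".
// It rejects "1.25" (two decimal digits) and "16-" (dangling separator).
private static readonly Regex NumberRangeExampleRegex = new Regex("^" + NumberRange + "$",
MatchOptions, RegexTimeout);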
/// <summary>
/// Non-greedy matching of a string where parentheses are balanced
/// </summary>
public const string BalancedParen = @"(?:[^()]|(?<open>\()|(?<-open>\)))*?(?(open)(?!))";
/// <summary>
/// Non-greedy matching of a string where square brackets are balanced
/// </summary>
public const string BalancedBracket = @"(?:[^\[\]]|(?<open>\[)|(?<-open>\]))*?(?(open)(?!))";
/// <summary>
/// Matches [Complete], release tags like [kmts] but not [ Complete ] or [kmts ]
/// </summary>
private const string TagsInBrackets = $@"\[(?!\s){BalancedBracket}(?<!\s)\]";
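// Illustrative sketch, not part of the original file: the method name ExampleHasBracketTag is hypothetical.
// Used on its own, TagsInBrackets finds tags such as "[Complete]" or "[kmts]" anywhere in a name,
// while "[ Complete ]" and "[kmts ]" are skipped: the (?!\s) and (?<!\s) lookarounds reject whitespace
// directly inside the brackets, and the balancing group in BalancedBracket keeps nested brackets paired.
private static bool ExampleHasBracketTag(string name) =>
Regex.IsMatch(name, TagsInBrackets, MatchOptions, RegexTimeout);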
/// <summary>
/// Common regex patterns present in both Comics and Mangas
/// </summary>
private const string CommonSpecial = @"Specials?|One[- ]?Shot|Extra(?:\sChapter)?(?=\s)|Art Collection|Side Stories|Bonus";
/// <summary>
/// Matches against font-family css syntax. Does not match if the url import starts with data:, as that is binary data
/// </summary>
/// <remarks>See here for some examples https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face</remarks>
public static readonly Regex FontSrcUrlRegex = new Regex(@"(?<Start>(?:src:\s?)?(?:url|local)\((?!data:)" + "(?:[\"']?)" + @"(?!data:))"
+ "(?<Filename>(?!data:)[^\"']+?)" + "(?<End>[\"']?" + @"\);?)",
MatchOptions, RegexTimeout);
/// <summary>
/// https://developer.mozilla.org/en-US/docs/Web/CSS/@import
/// </summary>
public static readonly Regex CssImportUrlRegex = new Regex("(@import\\s([\"|']|url\\([\"|']))(?<Filename>[^'\"]+)([\"|']\\)?);",
MatchOptions | RegexOptions.Multiline, RegexTimeout);
/// <summary>
/// Misc css image references, like background-image: url(), border-image, or list-style-image
/// </summary>
/// Original prepend: (background|border|list-style)-image:\s?)?
public static readonly Regex CssImageUrlRegex = new Regex(@"(url\((?!data:).(?!data:))" + "(?<Filename>(?!data:)[^\"']*)" + @"(.\))",
MatchOptions, RegexTimeout);
private static readonly Regex ImageRegex = new Regex(ImageFileExtensions,
MatchOptions, RegexTimeout);
private static readonly Regex ArchiveFileRegex = new Regex(ArchiveFileExtensions,
MatchOptions, RegexTimeout);
private static readonly Regex ComicInfoArchiveRegex = new Regex(@"\.cbz|\.cbr|\.cb7|\.cbt",
MatchOptions, RegexTimeout);
private static readonly Regex XmlRegex = new Regex(XmlRegexExtensions,
MatchOptions, RegexTimeout);
private static readonly Regex BookFileRegex = new Regex(BookFileExtensions,
MatchOptions, RegexTimeout);
private static readonly Regex CoverImageRegex = new Regex(@"(?<![[a-z]\d])(?:!?)(?<!back)(?<!back_)(?<!back-)(cover|folder)(?![\w\d])",
MatchOptions, RegexTimeout);
private static readonly Regex NormalizeRegex = new Regex(@"[^\p{L}0-9\+!]",
MatchOptions, RegexTimeout);
/// <summary>
/// Recognizes the Special token only
/// </summary>
private static readonly Regex SpecialTokenRegex = new Regex(@"SP\d+",
MatchOptions, RegexTimeout);
private static readonly Regex[] MangaVolumeRegex = new[]
{
// Dance in the Vampire Bund v16-17
new Regex(
@"(?<Series>.*)(\b|_)v(?<Volume>\d+-?\d+)( |_)",
MatchOptions, RegexTimeout),
// NEEDLESS_Vol.4_-Simeon_6_v2[SugoiSugoi].rar
new Regex(
@"(?<Series>.*)(\b|_)(?!\[)(vol\.?)(?<Volume>\d+(-\d+)?)(?!\])",
MatchOptions, RegexTimeout),
// Historys Strongest Disciple Kenichi_v11_c90-98.zip or Dance in the Vampire Bund v16-17
new Regex(
@"(?<Series>.*)(\b|_)(?!\[)v(?<Volume>" + NumberRange + @")(?!\])",
MatchOptions, RegexTimeout),
// Kodomo no Jikan vol. 10, [dmntsf.net] One Piece - Digital Colored Comics Vol. 20.5-21.5 Ch. 177
new Regex(
@"(?<Series>.*)(\b|_)(vol\.? ?)(?<Volume>\d+(\.\d)?(-\d+)?(\.\d)?)",
MatchOptions, RegexTimeout),
// Killing Bites Vol. 0001 Ch. 0001 - Galactica Scanlations (gb)
new Regex(
@"(vol\.? ?)(?<Volume>\d+(\.\d)?)",
MatchOptions, RegexTimeout),
// Tonikaku Cawaii [Volume 11].cbz
new Regex(
@"(volume )(?<Volume>\d+(\.\d)?)",
MatchOptions, RegexTimeout),
// Tower Of God S01 014 (CBT) (digital).cbz
new Regex(
@"(?<Series>.*)(\b|_|)(S(?<Volume>\d+))",
MatchOptions, RegexTimeout),
// vol_001-1.cbz for MangaPy default naming convention
new Regex(
@"(vol_)(?<Volume>\d+(\.\d)?)",
MatchOptions, RegexTimeout),
// Chinese Volume: 第n卷 -> Volume n, 第n册 -> Volume n, 幽游白书完全版 第03卷 天下 or 阿衰online 第1册
new Regex(
@"第(?<Volume>\d+)(卷|册)",
MatchOptions, RegexTimeout),
// Chinese Volume: 卷n -> Volume n, 册n -> Volume n
new Regex(
@"(卷|册)(?<Volume>\d+)",
MatchOptions, RegexTimeout),
// Korean Volume: 제n화|권|회|장 -> Volume n, n화|권|회|장 -> Volume n, 63권#200.zip -> Volume 63 (no chapter, #200 is just files inside)
new Regex(
@"제?(?<Volume>\d+(\.\d)?)(권|회|화|장)",
MatchOptions, RegexTimeout),
// Korean Season: 시즌n -> Season n,
new Regex(
@"시즌(?<Volume>\d+\-?\d+)",
MatchOptions, RegexTimeout),
// Korean Season: 시즌n -> Season n, n시즌 -> season n
new Regex(
@"(?<Volume>\d+(\-|~)?\d+?)시즌",
MatchOptions, RegexTimeout),
// Korean Season: 시즌n -> Season n, n시즌 -> season n
new Regex(
@"시즌(?<Volume>\d+(\-|~)?\d+?)",
MatchOptions, RegexTimeout),
// Japanese Volume: n巻 -> Volume n
new Regex(
@"(?<Volume>\d+(?:(\-)\d+)?)巻",
MatchOptions, RegexTimeout),
// Russian Volume: Том n -> Volume n, Тома n -> Volume n
new Regex(
@"Том(а?)(\.?)(\s|_)?(?<Volume>\d+(?:(\-)\d+)?)",
MatchOptions, RegexTimeout),
// Russian Volume: n Том -> Volume n
new Regex(
@"(\s|_)?(?<Volume>\d+(?:(\-)\d+)?)(\s|_)Том(а?)",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] MangaSeriesRegex = new[]
{
// Russian Volume: Том n -> Volume n, Тома n -> Volume n
new Regex(
@"(?<Series>.+?)Том(а?)(\.?)(\s|_)?(?<Volume>\d+(?:(\-)\d+)?)",
MatchOptions, RegexTimeout),
// Russian Volume: n Том -> Volume n
new Regex(
@"(?<Series>.+?)(\s|_)?(?<Volume>\d+(?:(\-)\d+)?)(\s|_)Том(а?)",
MatchOptions, RegexTimeout),
// Russian Chapter: n Главa -> Chapter n
new Regex(
@"(?<Series>.+?)(?!Том)(?<!Том\.)\s\d+(\s|_)?(?<Chapter>\d+(?:\.\d+|-\d+)?)(\s|_)(Глава|глава|Главы|Глава)",
MatchOptions, RegexTimeout),
// Russian Chapter: Главы n -> Chapter n
new Regex(
@"(?<Series>.+?)(Глава|глава|Главы|Глава)(\.?)(\s|_)?(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Grand Blue Dreaming - SP02
new Regex(
@"(?<Series>.*)(\b|_|-|\s)(?:sp)\d",
MatchOptions, RegexTimeout),
// [SugoiSugoi]_NEEDLESS_Vol.2_-_Disk_The_Informant_5_[ENG].rar, Yuusha Ga Shinda! - Vol.tbd Chapter 27.001 V2 Infection ①.cbz
new Regex(
@"^(?<Series>.*)( |_)Vol\.?(\d+|tbd)",
MatchOptions, RegexTimeout),
// Mad Chimera World - Volume 005 - Chapter 026.cbz (couldn't figure out how to get Volume negative lookaround working on below regex),
// The Duke of Death and His Black Maid - Vol. 04 Ch. 054.5 - V4 Omake
new Regex(
@"(?<Series>.+?)(\s|_|-)+(?:Vol(ume|\.)?(\s|_|-)+\d+)(\s|_|-)+(?:(Ch|Chapter|Ch)\.?)(\s|_|-)+(?<Chapter>\d+)",
MatchOptions,
RegexTimeout),
// Ichiban_Ushiro_no_Daimaou_v04_ch34_[VISCANS].zip, VanDread-v01-c01.zip
new Regex(
@"(?<Series>.*)(\b|_)v(?<Volume>\d+-?\d*)(\s|_|-)",
MatchOptions,
RegexTimeout),
// Gokukoku no Brynhildr - c001-008 (v01) [TrinityBAKumA], Black Bullet - v4 c17 [batoto]
new Regex(
@"(?<Series>.*)( - )(?:v|vo|c|chapters)\d",
MatchOptions, RegexTimeout),
// Kedouin Makoto - Corpse Party Musume, Chapter 19 [Dametrans].zip
new Regex(
@"(?<Series>.*)(?:, Chapter )(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Please Go Home, Akutsu-San! - Chapter 038.5 - Volume Announcement.cbz, My Charms Are Wasted on Kuroiwa Medaka - Ch. 37.5 - Volume Extras
new Regex(
@"(?<Series>.+?)(\s|_|-)(?!Vol)(\s|_|-)((?:Chapter)|(?:Ch\.))(\s|_|-)(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// [dmntsf.net] One Piece - Digital Colored Comics Vol. 20 Ch. 177 - 30 Million vs 81 Million.cbz
new Regex(
@"(?<Series>.+?):? (\b|_|-)(vol)\.?(\s|-|_)?\d+",
MatchOptions, RegexTimeout),
// [xPearse] Kyochuu Rettou Chapter 001 Volume 1 [English] [Manga] [Volume Scans]
new Regex(
@"(?<Series>.+?):?(\s|\b|_|-)Chapter(\s|\b|_|-)\d+(\s|\b|_|-)(vol)(ume)",
MatchOptions,
RegexTimeout),
// [xPearse] Kyochuu Rettou Volume 1 [English] [Manga] [Volume Scans]
new Regex(
@"(?<Series>.+?):? (\b|_|-)(vol)(ume)",
MatchOptions,
RegexTimeout),
// Knights of Sidonia c000 (S2 LE BD Omake - BLAME!) [Habanero Scans]
new Regex(
@"(?<Series>.*)(\bc\d+\b)",
MatchOptions, RegexTimeout),
// Tonikaku Cawaii [Volume 11], Darling in the FranXX - Volume 01.cbz
new Regex(
@"(?<Series>.*)(?: _|-|\[|\()\s?vol(ume)?",
MatchOptions, RegexTimeout),
// Momo The Blood Taker - Chapter 027 Violent Emotion.cbz, Grand Blue Dreaming - SP02 Extra (2019) (Digital) (danke-Empire).cbz
new Regex(
@"^(?<Series>(?!Vol).+?)(?:(ch(apter|\.)(\b|_|-|\s))|sp)\d",
MatchOptions, RegexTimeout),
// Historys Strongest Disciple Kenichi_v11_c90-98.zip, Killing Bites Vol. 0001 Ch. 0001 - Galactica Scanlations (gb)
new Regex(
@"(?<Series>.*) (\b|_|-)(v|ch\.?|c|s)\d+",
MatchOptions, RegexTimeout),
// Hinowa ga CRUSH! 018 (2019) (Digital) (LuCaZ).cbz
new Regex(
@"(?<Series>.*)\s+(?<Chapter>\d+)\s+(?:\(\d{4}\))\s",
MatchOptions, RegexTimeout),
// Goblin Slayer - Brand New Day 006.5 (2019) (Digital) (danke-Empire)
new Regex(
@"(?<Series>.*) (-)?(?<Chapter>\d+(?:.\d+|-\d+)?) \(\d{4}\)",
MatchOptions, RegexTimeout),
// Noblesse - Episode 429 (74 Pages).7z
new Regex(
@"(?<Series>.*)(\s|_)(?:Episode|Ep\.?)(\s|_)(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Akame ga KILL! ZERO (2016-2019) (Digital) (LuCaZ)
new Regex(
@"(?<Series>.*)\(\d",
MatchOptions, RegexTimeout),
// Tonikaku Kawaii (Ch 59-67) (Ongoing)
new Regex(
@"(?<Series>.*)(\s|_)\((c\s|ch\s|chapter\s)",
MatchOptions, RegexTimeout),
// Fullmetal Alchemist chapters 101-108
new Regex(
@"(?<Series>.+?)(\s|_|\-)+?chapters(\s|_|\-)+?\d+(\s|_|\-)+?",
MatchOptions, RegexTimeout),
// It's Witching Time! 001 (Digital) (Anonymous1234)
new Regex(
@"(?<Series>.+?)(\s|_|\-)+?\d+(\s|_|\-)\(",
MatchOptions, RegexTimeout),
//Ichinensei_ni_Nacchattara_v01_ch01_[Taruby]_v1.1.zip must be before [Suihei Kiki]_Kasumi_Otoko_no_Ko_[Taruby]_v1.1.zip
// due to duplicate version identifiers in file.
new Regex(
@"(?<Series>.*)(v|s)\d+(-\d+)?(_|\s)",
MatchOptions, RegexTimeout),
//[Suihei Kiki]_Kasumi_Otoko_no_Ko_[Taruby]_v1.1.zip
new Regex(
@"(?<Series>.*)(v|s)\d+(-\d+)?",
MatchOptions, RegexTimeout),
// Black Bullet (This is very loose, keep towards bottom)
new Regex(
@"(?<Series>.*)(_)(v|vo|c|volume)( |_)\d+",
MatchOptions, RegexTimeout),
// [Hidoi]_Amaenaideyo_MS_vol01_chp02.rar
new Regex(
@"(?<Series>.*)( |_)(vol\d+)?( |_)(?:Chp\.? ?\d+)",
MatchOptions, RegexTimeout),
// Mahoutsukai to Deshi no Futekisetsu na Kankei Chp. 1
new Regex(
@"(?<Series>.*)( |_)(?:Chp.? ?\d+)",
MatchOptions, RegexTimeout),
// Corpse Party -The Anthology- Sachikos game of love Hysteric Birthday 2U Chapter 01
new Regex(
@"^(?!Vol)(?<Series>.*)( |_)Chapter( |_)(\d+)",
MatchOptions, RegexTimeout),
// Fullmetal Alchemist chapters 101-108.cbz
new Regex(
@"^(?!vol)(?<Series>.*)( |_)(chapters( |_)?)\d+-?\d*",
MatchOptions, RegexTimeout),
// Umineko no Naku Koro ni - Episode 1 - Legend of the Golden Witch #1
new Regex(
@"^(?!Vol\.?)(?<Series>.*)( |_|-)(?<!-)(episode|chapter|(ch\.?) ?)\d+-?\d*",
MatchOptions, RegexTimeout),
// Baketeriya ch01-05.zip
new Regex(
@"^(?!Vol)(?<Series>.*)ch\d+-?\d?",
MatchOptions, RegexTimeout),
// Magi - Ch.252-005.cbz
new Regex(
@"(?<Series>.*)( ?- ?)Ch\.\d+-?\d*",
MatchOptions, RegexTimeout),
// [BAA]_Darker_than_Black_Omake-1, Bleach 001-002, Kodoja #001 (March 2016)
new Regex(
@"^(?!Vol)(?!Chapter)(?<Series>.+?)(-|_|\s|#)\d+(-\d+)?",
MatchOptions, RegexTimeout),
// Baketeriya ch01-05.zip, Akiiro Bousou Biyori - 01.jpg, Beelzebub_172_RHS.zip, Cynthia the Mission 29.rar, A Compendium of Ghosts - 031 - The Third Story_ Part 12 (Digital) (Cobalt001)
new Regex(
@"^(?!Vol\.?)(?!Chapter)(?<Series>.+?)(\s|_|-)(?<!-)(ch|chapter)?\.?\d+-?\d*",
MatchOptions, RegexTimeout),
// [BAA]_Darker_than_Black_c1 (This is very greedy, make sure it's close to last)
new Regex(
@"^(?!Vol)(?<Series>.*)( |_|-)(ch?)\d+",
MatchOptions, RegexTimeout),
// Japanese Volume: n巻 -> Volume n
new Regex(
@"(?<Series>.+?)第(?<Volume>\d+(?:(\-)\d+)?)巻",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] ComicSeriesRegex = new[]
{
// Russian Volume: Том n -> Volume n, Тома n -> Volume n
new Regex(
@"(?<Series>.+?)Том(а?)(\.?)(\s|_)?(?<Volume>\d+(?:(\-)\d+)?)",
MatchOptions, RegexTimeout),
// Russian Volume: n Том -> Volume n
new Regex(
@"(?<Series>.+?)(\s|_)?(?<Volume>\d+(?:(\-)\d+)?)(\s|_)Том(а?)",
MatchOptions, RegexTimeout),
// Russian Chapter: n Главa -> Chapter n
new Regex(
@"(?<Series>.+?)(?!Том)(?<!Том\.)\s\d+(\s|_)?(?<Chapter>\d+(?:\.\d+|-\d+)?)(\s|_)(Глава|глава|Главы|Глава)",
MatchOptions, RegexTimeout),
// Russian Chapter: Главы n -> Chapter n
new Regex(
@"(?<Series>.+?)(Глава|глава|Главы|Глава)(\.?)(\s|_)?(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Tintin - T22 Vol 714 pour Sydney
new Regex(
@"(?<Series>.+?)\s?(\b|_|-)\s?((vol|tome|t)\.?)(?<Volume>\d+(-\d+)?)",
MatchOptions, RegexTimeout),
// Invincible Vol 01 Family matters (2005) (Digital)
new Regex(
@"(?<Series>.+?)(\b|_)((vol|tome|t)\.?)(\s|_)(?<Volume>\d+(-\d+)?)",
MatchOptions, RegexTimeout),
// Batman Beyond 2.0 001 (2013)
new Regex(
@"^(?<Series>.+?\S\.\d) (?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// 04 - Asterix the Gladiator (1964) (Digital-Empire) (WebP by Doc MaKS)
new Regex(
@"^(?<Volume>\d+)\s(-\s|_)(?<Series>.*(\d{4})?)( |_)(\(|\d+)",
MatchOptions, RegexTimeout),
// 01 Spider-Man & Wolverine 01.cbr
new Regex(
@"^(?<Volume>\d+)\s(?:-\s)(?<Series>.*) (\d+)?",
MatchOptions, RegexTimeout),
// Batman & Wildcat (1 of 3)
new Regex(
@"(?<Series>.*(\d{4})?)( |_)(?:\((?<Volume>\d+) of \d+)",
MatchOptions, RegexTimeout),
// Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus), Aldebaran-Antares-t6
new Regex(
@"^(?<Series>.+?)(?: |_|-)(v|t)\d+",
MatchOptions, RegexTimeout),
// Amazing Man Comics chapter 25
new Regex(
@"^(?<Series>.+?)(?: |_)c(hapter) \d+",
MatchOptions, RegexTimeout),
// Amazing Man Comics issue #25
new Regex(
@"^(?<Series>.+?)(?: |_)i(ssue) #\d+",
MatchOptions, RegexTimeout),
// Batman Wayne Family Adventures - Ep. 001 - Moving In
new Regex(
@"^(?<Series>.+?)(\s|_|-)(?:Ep\.?)(\s|_|-)+\d+",
MatchOptions, RegexTimeout),
// Batgirl Vol.2000 #57 (December, 2004)
new Regex(
@"^(?<Series>.+?)Vol\.?\s?#?(?:\d+)",
MatchOptions, RegexTimeout),
// Batman & Robin the Teen Wonder #0
new Regex(
@"^(?<Series>.*)(?: |_)#\d+",
MatchOptions, RegexTimeout),
// Batman & Catwoman - Trail of the Gun 01, Batman & Grendel (1996) 01 - Devil's Bones, Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus)
new Regex(
@"^(?<Series>.+?)(?: \d+)",
MatchOptions, RegexTimeout),
// Scott Pilgrim 02 - Scott Pilgrim vs. The World (2005)
new Regex(
@"^(?<Series>.+?)(?: |_)(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// The First Asterix Frieze (WebP by Doc MaKS)
new Regex(
@"^(?<Series>.*)(?: |_)(?!\(\d{4}|\d{4}-\d{2}\))\(",
MatchOptions, RegexTimeout),
// spawn-123, spawn-chapter-123 (from https://github.com/Girbons/comics-downloader)
new Regex(
@"^(?<Series>.+?)-(chapter-)?(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// MUST BE LAST: Batman & Daredevil - King of New York
new Regex(
@"^(?<Series>.*)",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] ComicVolumeRegex = new[]
{
// Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus)
new Regex(
@"^(?<Series>.+?)(?: |_)(t|v)(?<Volume>" + NumberRange + @")",
MatchOptions, RegexTimeout),
// Batgirl Vol.2000 #57 (December, 2004)
new Regex(
@"^(?<Series>.+?)(?:\s|_)(v|vol|tome|t)\.?(\s|_)?(?<Volume>\d+)",
MatchOptions, RegexTimeout),
// Chinese Volume: 第n卷 -> Volume n, 第n册 -> Volume n, 幽游白书完全版 第03卷 天下 or 阿衰online 第1册
new Regex(
@"第(?<Volume>\d+)(卷|册)",
MatchOptions, RegexTimeout),
// Chinese Volume: 卷n -> Volume n, 册n -> Volume n
new Regex(
@"(卷|册)(?<Volume>\d+)",
MatchOptions, RegexTimeout),
// Korean Volume: 제n권 -> Volume n, n권 -> Volume n, 63권#200.zip
new Regex(
@"제?(?<Volume>\d+)권",
MatchOptions, RegexTimeout),
// Japanese Volume: n巻 -> Volume n
new Regex(
@"(?<Volume>\d+(?:(\-)\d+)?)巻",
MatchOptions, RegexTimeout),
// Russian Volume: Том n -> Volume n, Тома n -> Volume n
new Regex(
@"Том(а?)(\.?)(\s|_)?(?<Volume>\d+(?:(\-)\d+)?)",
MatchOptions, RegexTimeout),
// Russian Volume: n Том -> Volume n
new Regex(
@"(\s|_)?(?<Volume>\d+(?:(\-)\d+)?)(\s|_)Том(а?)",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] ComicChapterRegex = new[]
{
// Batman & Wildcat (1 of 3)
new Regex(
@"(?<Series>.*(\d{4})?)( |_)(?:\((?<Chapter>\d+) of \d+)",
MatchOptions, RegexTimeout),
// Batman Beyond 04 (of 6) (1999)
new Regex(
@"(?<Series>.+?)(?<Chapter>\d+)(\s|_|-)?\(of",
MatchOptions, RegexTimeout),
// Batman Beyond 2.0 001 (2013)
new Regex(
@"^(?<Series>.+?\S\.\d) (?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus)
new Regex(
@"^(?<Series>.+?)(?: |_)v(?<Volume>\d+)(?: |_)(c? ?)(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)(c? ?)",
MatchOptions, RegexTimeout),
// Batman & Robin the Teen Wonder #0
new Regex(
@"^(?<Series>.+?)(?:\s|_)#(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Batman 2016 - Chapter 01, Batman 2016 - Issue 01, Batman 2016 - Issue #01
new Regex(
@"^(?<Series>.+?)((c(hapter)?)|issue)(_|\s)#?(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)",
MatchOptions, RegexTimeout),
// Invincible 070.5 - Invincible Returns 1 (2010) (digital) (Minutemen-InnerDemons).cbr
new Regex(
@"^(?<Series>.+?)(?:\s|_)(c? ?(chapter)?)(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)(c? ?)-",
MatchOptions, RegexTimeout),
// Batgirl Vol.2000 #57 (December, 2004)
new Regex(
@"^(?<Series>.+?)(?:vol\.?\d+)\s#(?<Chapter>\d+)",
MatchOptions,
RegexTimeout),
// Russian Chapter: Главы n -> Chapter n
new Regex(
@"(Глава|глава|Главы|Глава)(\.?)(\s|_)?(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Russian Chapter: n Главa -> Chapter n
new Regex(
@"(?!Том)(?<!Том\.)\s\d+(\s|_)?(?<Chapter>\d+(?:\.\d+|-\d+)?)(\s|_)(Глава|глава|Главы|Глава)",
MatchOptions, RegexTimeout),
// Batman & Catwoman - Trail of the Gun 01, Batman & Grendel (1996) 01 - Devil's Bones, Teen Titans v1 001 (1966-02) (digital) (OkC.O.M.P.U.T.O.-Novus)
new Regex(
@"^(?<Series>.+?)(?: (?<Chapter>\d+))",
MatchOptions, RegexTimeout),
// Saga 001 (2012) (Digital) (Empire-Zone)
new Regex(
@"(?<Series>.+?)(?: |_)(c? ?)(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)\s\(\d{4}",
MatchOptions, RegexTimeout),
// Amazing Man Comics chapter 25
new Regex(
@"^(?!Vol)(?<Series>.+?)( |_)c(hapter)( |_)(?<Chapter>\d*)",
MatchOptions, RegexTimeout),
// Amazing Man Comics issue #25
new Regex(
@"^(?!Vol)(?<Series>.+?)( |_)i(ssue)( |_)#(?<Chapter>\d*)",
MatchOptions, RegexTimeout),
// spawn-123, spawn-chapter-123 (from https://github.com/Girbons/comics-downloader)
new Regex(
@"^(?<Series>.+?)-(chapter-)?(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
};
private static readonly Regex[] MangaChapterRegex = new[]
{
// Historys Strongest Disciple Kenichi_v11_c90-98.zip, ...c90.5-100.5
new Regex(
@"(\b|_)(c|ch)(\.?\s?)(?<Chapter>(\d+(\.\d)?)-?(\d+(\.\d)?)?)",
MatchOptions, RegexTimeout),
// [Suihei Kiki]_Kasumi_Otoko_no_Ko_[Taruby]_v1.1.zip
new Regex(
@"v\d+\.(\s|_)(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Umineko no Naku Koro ni - Episode 3 - Banquet of the Golden Witch #02.cbz (Rare case; remove if it causes issues)
new Regex(
@"^(?<Series>.*)(?: |_)#(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Green Worldz - Chapter 027, Kimi no Koto ga Daidaidaidaidaisuki na 100-nin no Kanojo Chapter 11-10
new Regex(
@"^(?!Vol)(?<Series>.*)\s?(?<!vol\. )\sChapter\s(?<Chapter>\d+(?:\.?[\d-]+)?)",
MatchOptions, RegexTimeout),
// Russian Chapter: Главы n -> Chapter n
new Regex(
@"(Глава|глава|Главы|Глава)(\.?)(\s|_)?(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Hinowa ga CRUSH! 018 (2019) (Digital) (LuCaZ).cbz, Hinowa ga CRUSH! 018.5 (2019) (Digital) (LuCaZ).cbz
new Regex(
@"^(?!Vol)(?<Series>.+?)(?<!Vol)(?<!Vol.)\s(\d\s)?(?<Chapter>\d+(?:\.\d+|-\d+)?)(?:\s\(\d{4}\))?(\b|_|-)",
MatchOptions, RegexTimeout),
// Tower Of God S01 014 (CBT) (digital).cbz
new Regex(
@"(?<Series>.*)\sS(?<Volume>\d+)\s(?<Chapter>\d+(?:.\d+|-\d+)?)",
MatchOptions, RegexTimeout),
// Beelzebub_01_[Noodles].zip, Beelzebub_153b_RHS.zip
new Regex(
@"^((?!v|vo|vol|Volume).)*(\s|_)(?<Chapter>\.?\d+(?:.\d+|-\d+)?)(?<Part>b)?(\s|_|\[|\()",
MatchOptions, RegexTimeout),
// Yumekui-Merry_DKThias_Chapter21.zip
new Regex(
@"Chapter(?<Chapter>\d+(-\d+)?)", //(?:.\d+|-\d+)?
MatchOptions, RegexTimeout),
// [Hidoi]_Amaenaideyo_MS_vol01_chp02.rar
new Regex(
@"(?<Series>.*)(\s|_)(vol\d+)?(\s|_)Chp\.? ?(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Vol 1 Chapter 2
new Regex(
@"(?<Volume>((vol|volume|v))?(\s|_)?\.?\d+)(\s|_)(Chp|Chapter)\.?(\s|_)?(?<Chapter>\d+)",
MatchOptions, RegexTimeout),
// Chinese Chapter: 第n话 -> Chapter n, 【TFO汉化&Petit汉化】迷你偶像漫画第25话
new Regex(
@"第(?<Chapter>\d+)话",
MatchOptions, RegexTimeout),
// Korean Chapter: 제n화 -> Chapter n, 가디언즈 오브 갤럭시 죽음의 보석.E0008.7화#44
new Regex(
@"제?(?<Chapter>\d+(?:\.\d+)?)(회|화|장)",
MatchOptions, RegexTimeout),
// Japanese Chapter: 第n話 -> Chapter n, [ハレム]ナナとカオル 高校生のSMごっこ 第1話
new Regex(
@"第?(?<Chapter>\d+(?:\.\d+|-\d+)?)話",
MatchOptions, RegexTimeout),
// Russian Chapter: n Главa -> Chapter n
new Regex(
@"(?!Том)(?<!Том\.)\s\d+(\s|_)?(?<Chapter>\d+(?:\.\d+|-\d+)?)(\s|_)(Глава|глава|Главы|Глава)",
MatchOptions, RegexTimeout),
};
private static readonly Regex MangaEditionRegex = new Regex(
// Tenjo Tenge {Full Contact Edition} v01 (2011) (Digital) (ASTC).cbz
// To Love Ru v01 Uncensored (Ch.001-007)
@"\b(?:Omnibus(?:\s?Edition)?|Uncensored)\b",
MatchOptions, RegexTimeout
);
// Matches anything between balanced parentheses, tags between brackets, {} and {Complete}
private static readonly Regex CleanupRegex = new Regex(
$@"(?:\({BalancedParen}\)|{TagsInBrackets}|\{{\}}|\{{Complete\}})",
MatchOptions, RegexTimeout
);
private static readonly Regex MangaSpecialRegex = new Regex(
// Matches all special keywords. Does not check whether the filename also contains volume/chapter identification; Parser.Parse() handles that.
$@"\b(?:{CommonSpecial}|Omake)\b",
MatchOptions, RegexTimeout
);
private static readonly Regex ComicSpecialRegex = new Regex(
// Matches all special keywords. Does not check whether the filename also contains volume/chapter identification; Parser.Parse() handles that.
$@"\b(?:{CommonSpecial}|\d.+?(\W|-|^)Annual|Annual(\W|-|$)|Book \d.+?|Compendium(\W|-|$|\s.+?)|Omnibus(\W|-|$|\s.+?)|FCBD \d.+?|Absolute(\W|-|$|\s.+?)|Preview(\W|-|$|\s.+?)|Hors[ -]S[ée]rie|TPB|HS|THS)\b",
MatchOptions, RegexTimeout
);
private static readonly Regex EuropeanComicRegex = new Regex(
// All Keywords, does not account for checking if contains volume/chapter identification. Parser.Parse() will handle.
@"\b(?:Bd[-\s]Fr)\b",
MatchOptions, RegexTimeout
);
// If SP\d+ is in the filename, it is always treated as a special, even if a volume or chapter was also found.
private static readonly Regex SpecialMarkerRegex = new Regex(
@"SP\d+",
MatchOptions, RegexTimeout
);
private static readonly Regex EmptySpaceRegex = new Regex(
@"\s{2,}",
MatchOptions, RegexTimeout
);
public static MangaFormat ParseFormat(string filePath)
{
if (IsArchive(filePath)) return MangaFormat.Archive;
if (IsImage(filePath)) return MangaFormat.Image;
if (IsEpub(filePath)) return MangaFormat.Epub;
if (IsPdf(filePath)) return MangaFormat.Pdf;
return MangaFormat.Unknown;
}
public static string ParseEdition(string filePath)
{
filePath = ReplaceUnderscores(filePath);
var match = MangaEditionRegex.Match(filePath);
return match.Success ? match.Value : string.Empty;
}
/// <summary>
/// Checks whether the filename contains a special marker (SP followed by one or more digits).
/// </summary>
/// <param name="filePath"></param>
/// <returns></returns>
public static bool HasSpecialMarker(string filePath)
{
return SpecialMarkerRegex.IsMatch(filePath);
}
public static bool IsMangaSpecial(string filePath)
{
filePath = ReplaceUnderscores(filePath);
return MangaSpecialRegex.IsMatch(filePath);
}
public static bool IsComicSpecial(string filePath)
{
filePath = ReplaceUnderscores(filePath);
return ComicSpecialRegex.IsMatch(filePath);
}
public static string ParseSeries(string filename)
{
foreach (var regex in MangaSeriesRegex)
{
var matches = regex.Matches(filename);
var group = matches
.Select(match => match.Groups["Series"])
.FirstOrDefault(group => group.Success && group != Match.Empty);
if (group != null) return CleanTitle(group.Value);
}
return string.Empty;
}
public static string ParseComicSeries(string filename)
{
foreach (var regex in ComicSeriesRegex)
{
var matches = regex.Matches(filename);
var group = matches
.Select(match => match.Groups["Series"])
.FirstOrDefault(group => group.Success && group != Match.Empty);
if (group != null) return CleanTitle(group.Value, true);
}
return string.Empty;
}
public static string ParseVolume(string filename)
{
foreach (var regex in MangaVolumeRegex)
{
var matches = regex.Matches(filename);
foreach (var group in matches.Select(match => match.Groups))
{
if (!group["Volume"].Success || group["Volume"] == Match.Empty) continue;
var value = group["Volume"].Value;
var hasPart = group["Part"].Success;
return FormatValue(value, hasPart);
}
}
return DefaultVolume;
}
public static string ParseComicVolume(string filename)
{
foreach (var regex in ComicVolumeRegex)
{
var matches = regex.Matches(filename);
foreach (var group in matches.Select(match => match.Groups))
{
if (!group["Volume"].Success || group["Volume"] == Match.Empty) continue;
var value = group["Volume"].Value;
var hasPart = group["Part"].Success;
return FormatValue(value, hasPart);
}
}
return DefaultVolume;
}
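/// <summary>
/// Strips leading zeroes from a parsed volume or chapter value. For a range (ie 01-05), each side is cleaned
/// separately and a part marker, if present, is applied to the end of the range.
/// </summary>
private static string FormatValue(string value, bool hasPart)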
{
if (!value.Contains('-'))
{
return RemoveLeadingZeroes(hasPart ? AddChapterPart(value) : value);
}
var tokens = value.Split("-");
var from = RemoveLeadingZeroes(tokens[0]);
if (tokens.Length != 2) return from;
var to = RemoveLeadingZeroes(hasPart ? AddChapterPart(tokens[1]) : tokens[1]);
return $"{from}-{to}";
}
public static string ParseChapter(string filename)
{
foreach (var regex in MangaChapterRegex)
{
var matches = regex.Matches(filename);
foreach (var groups in matches.Select(match => match.Groups))
{
if (!groups["Chapter"].Success || groups["Chapter"] == Match.Empty) continue;
var value = groups["Chapter"].Value;
var hasPart = groups["Part"].Success;
return FormatValue(value, hasPart);
}
}
return DefaultChapter;
}
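/// <summary>
/// Represents a part marker (ie the b in 153b) as a half chapter by appending .5, unless the value already has a decimal part.
/// </summary>
private static string AddChapterPart(string value)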
{
if (value.Contains('.'))
{
return value;
}
return $"{value}.5";
}
public static string ParseComicChapter(string filename)
{
foreach (var regex in ComicChapterRegex)
{
var matches = regex.Matches(filename);
foreach (var groups in matches.Select(match => match.Groups))
{
if (!groups["Chapter"].Success || groups["Chapter"] == Match.Empty) continue;
var value = groups["Chapter"].Value;
var hasPart = groups["Part"].Success;
return FormatValue(value, hasPart);
}
}
return DefaultChapter;
}
private static string RemoveEditionTagHolders(string title)
{
title = CleanupRegex.Replace(title, string.Empty);
title = MangaEditionRegex.Replace(title, string.Empty);
return title;
}
private static string RemoveMangaSpecialTags(string title)
{
return MangaSpecialRegex.Replace(title, string.Empty);
}
private static string RemoveEuropeanTags(string title)
{
return EuropeanComicRegex.Replace(title, string.Empty);
}
private static string RemoveComicSpecialTags(string title)
{
return ComicSpecialRegex.Replace(title, string.Empty);
}
/// <summary>
/// Translates _ -> spaces, trims front and back of string, removes release groups
/// <example>
/// Hippos_the_Great [Digital], -> Hippos the Great
/// </example>
/// </summary>
/// <param name="title"></param>
/// <param name="isComic"></param>
/// <returns></returns>
public static string CleanTitle(string title, bool isComic = false)
{
title = ReplaceUnderscores(title);
title = RemoveEditionTagHolders(title);
if (isComic)
{
title = RemoveComicSpecialTags(title);
title = RemoveEuropeanTags(title);
}
else
{
title = RemoveMangaSpecialTags(title);
}
title = title.Trim(SpacesAndSeparators);
title = EmptySpaceRegex.Replace(title, " ");
return title;
}
/// <summary>
/// Pads the start of a number string with 0's to at least three digits so that string ordering matches numeric ordering.
/// Handles ranges (ie 4-8) -> (004-008).
/// </summary>
/// <param name="number"></param>
/// <returns>A zero padded number</returns>
public static string PadZeros(string number)
{
if (!number.Contains('-')) return PerformPadding(number);
var tokens = number.Split("-");
return $"{PerformPadding(tokens[0])}-{PerformPadding(tokens[1])}";
}
private static string PerformPadding(string number)
{
var num = int.Parse(number);
return num switch
{
< 10 => "00" + num,
< 100 => "0" + num,
_ => number
};
}
public static string RemoveLeadingZeroes(string title)
{
var ret = title.TrimStart(LeadingZeroesTrimChars);
return string.IsNullOrEmpty(ret) ? "0" : ret;
}
public static bool IsArchive(string filePath)
{
return ArchiveFileRegex.IsMatch(Path.GetExtension(filePath));
}
public static bool IsComicInfoExtension(string filePath)
{
return ComicInfoArchiveRegex.IsMatch(Path.GetExtension(filePath));
}
public static bool IsBook(string filePath)
{
return BookFileRegex.IsMatch(Path.GetExtension(filePath));
}
public static bool IsImage(string filePath)
{
return !filePath.StartsWith('.') && ImageRegex.IsMatch(Path.GetExtension(filePath));
}
public static bool IsXml(string filePath)
{
return XmlRegex.IsMatch(Path.GetExtension(filePath));
}
public static float MinNumberFromRange(string range)
{
try
{
if (!Regex.IsMatch(range, @"^[\d\-.]+$", MatchOptions, RegexTimeout))
{
return (float) 0.0;
}
var tokens = range.Replace("_", string.Empty).Split("-");
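// Parse with the invariant culture so "1.5" is read the same way regardless of the server's locale
return tokens.Min(t => float.Parse(t, System.Globalization.CultureInfo.InvariantCulture));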
}
catch
{
return (float) 0.0;
}
}
public static float MaxNumberFromRange(string range)
{
try
{
if (!Regex.IsMatch(range, @"^[\d\-.]+$", MatchOptions, RegexTimeout))
{
return (float) 0.0;
}
var tokens = range.Replace("_", string.Empty).Split("-");
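// Parse with the invariant culture so "1.5" is read the same way regardless of the server's locale
return tokens.Max(t => float.Parse(t, System.Globalization.CultureInfo.InvariantCulture));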
}
catch
{
return (float) 0.0;
}
}
public static string Normalize(string name)
{
return NormalizeRegex.Replace(name, string.Empty).Trim().ToLower();
}
/// <summary>
/// Responsible for preparing special title for rendering to the UI. Replaces _ with ' ' and strips out SP\d+
/// </summary>
/// <param name="name"></param>
/// <returns></returns>
public static string CleanSpecialTitle(string name)
{
if (string.IsNullOrEmpty(name)) return name;
var cleaned = SpecialTokenRegex.Replace(name.Replace('_', ' '), string.Empty).Trim();
var lastIndex = cleaned.LastIndexOf('.');
if (lastIndex > 0)
{
cleaned = cleaned.Substring(0, lastIndex).Trim();
}
return string.IsNullOrEmpty(cleaned) ? name : cleaned;
}
/// <summary>
/// Tests whether the file is a cover image, ie it is an image and its name contains "cover" or is "folder"
/// </summary>
/// <remarks>If the path has "backcover" in it, it will be ignored</remarks>
/// <param name="filename">Filename with extension</param>
/// <returns></returns>
public static bool IsCoverImage(string filename)
{
return IsImage(filename) && CoverImageRegex.IsMatch(filename);
}
/// <summary>
/// Checks whether a path contains or starts with a blacklisted folder, like __MACOSX, @Recently-Snapshot, or @recycle, or whether
/// the filename starts with ._, which marks a metadata file on macOS.
/// </summary>
/// <param name="path"></param>
/// <returns></returns>
public static bool HasBlacklistedFolderInPath(string path)
{
return path.Contains("__MACOSX") || path.StartsWith("@Recently-Snapshot") || path.StartsWith("@recycle")
|| path.StartsWith("._") || Path.GetFileName(path).StartsWith("._") || path.Contains(".qpkg")
|| path.Contains(".caltrash");
}
public static bool IsEpub(string filePath)
{
return Path.GetExtension(filePath).Equals(".epub", StringComparison.InvariantCultureIgnoreCase);
}
public static bool IsPdf(string filePath)
{
return Path.GetExtension(filePath).Equals(".pdf", StringComparison.InvariantCultureIgnoreCase);
}
/// <summary>
/// Cleans an author's name
/// </summary>
/// <remarks>If the author is formatted as "Last, First", the order is not reversed</remarks>
/// <param name="author"></param>
/// <returns></returns>
public static string CleanAuthor(string author)
{
return string.IsNullOrEmpty(author) ? string.Empty : author.Trim();
}
/// <summary>
/// Cleans user query string input
/// </summary>
/// <param name="query"></param>
/// <returns></returns>
public static string CleanQuery(string query)
{
return Uri.UnescapeDataString(query).Trim().Replace(@"%", string.Empty)
.Replace(":", string.Empty);
}
/// <summary>
/// Normalizes the slashes in a path to be <see cref="Path.AltDirectorySeparatorChar"/>
/// </summary>
/// <example>/manga/1\1 -> /manga/1/1</example>
/// <param name="path"></param>
/// <returns></returns>
public static string NormalizePath(string? path)
{
return string.IsNullOrEmpty(path) ? string.Empty : path.Replace(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
.Replace(@"//", Path.AltDirectorySeparatorChar + string.Empty);
}
/// <summary>
/// Checks against a set of strings to validate if a ComicInfo.Format should receive special treatment
/// </summary>
/// <param name="comicInfoFormat"></param>
/// <returns></returns>
public static bool HasComicInfoSpecial(string comicInfoFormat)
{
return FormatTagSpecialKeywords.Contains(comicInfoFormat);
}
private static string ReplaceUnderscores(string name)
{
return string.IsNullOrEmpty(name) ? string.Empty : name.Replace('_', ' ');
}
public static string? ExtractFilename(string fileUrl)
{
var matches = Parser.CssImageUrlRegex.Matches(fileUrl);
foreach (Match match in matches)
{
if (!match.Success) continue;
// NOTE: This is failing for //localhost:5000/api/book/29919/book-resources?file=OPS/images/tick1.jpg
var importFile = match.Groups["Filename"].Value;
if (!importFile.Contains("?")) return importFile;
}
return null;
}
}