v0.5.6 - Performance Part 2 (Is that a new scan loop?) (#1500)

* New Scan Loop (#1447)

* Staging the code for the new scan loop.

* Implemented a basic idea of changes on drives triggering the scan loop. Issues: 1. Scan by folder does not work, 2. The queuing system is very hacky and needs a separate thread, 3. Performance degradation could be very real.

* Started writing unit test for new loop code

* Implemented a basic method to scan a folder path with ignore support (not implemented, code in place)

* Added some code to the parser to build out the idea of processing series in batches based on some top level folder.

* Scan Series now uses the new code (folder based parsing) and now handles the LocalizedSeries issue.

* Got library scan working with the new folder-based scan loop. Updated code to set FolderPath (for improved scan times and partial scan support).

* Wrote some notes on update library scan loop.

* Removed migration for merge

* Reapplied the SeriesFolder migration after merge

* Refactored a check that used multiple db calls into one.

* Made lots of progress on ignore support, but some confusion on underlying library. Ticket created. On hold till then.

* Updated Scan Library and Scan Series to exit early if no changes are on the underlying folders that need to be scanned.

* Implemented the ability to have .kavitaignore files within your directories; Kavita will parse them and ignore files and directories based on the rules within them.
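
As an illustration, here is a minimal sketch of how such an ignore file could be parsed into a glob matcher. This is not Kavita's actual implementation: the Matcher type is from Microsoft.Extensions.FileSystemGlobbing, and the !-prefix negation rule is an assumption modeled on .gitignore.

using System.IO;
using System.Linq;
using Microsoft.Extensions.FileSystemGlobbing;

public static class KavitaIgnoreSketch
{
    // Builds a matcher whose matches are the paths that should be ignored.
    public static Matcher CreateMatcherFromFile(string ignoreFile)
    {
        var matcher = new Matcher();
        if (!File.Exists(ignoreFile)) return matcher;
        foreach (var line in File.ReadLines(ignoreFile)
                     .Select(l => l.Trim())
                     .Where(l => l.Length > 0 && !l.StartsWith("#")))
        {
            if (line.StartsWith("!"))
                matcher.AddExclude(line.Substring(1)); // negation: do not ignore these paths
            else
                matcher.AddInclude(line);              // ignore anything matching this glob
        }
        return matcher;
    }
}

A scan loop would then skip any entry where matcher.Match(libraryFolder, relativePath).HasMatches is true.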

* Fixed an issue where nested ignore files wouldn't stack with higher-level ignores

* Wrote out some basic code that showcases how we can scan series or library based on file events on the underlying system. Very buggy, needs lots of edge case testing and logging and duplication checking.

* Things are working kinda. I'm getting lost in my own code and complexity. I'm not sure it's worth it.

* Refactored ScanFiles out to Directory Service.

* Refactored more code out to keep the code clean.

* More unit tests

* Refactored the signature of ParsedSeries to use IList. Started writing unit tests and reworked the UpdateLibrary to work how it used to with new scan loop code (note: using async update library/series does not work).

* Fixed the bug where processSeriesInfos was being invoked twice per series and made the code work very similarly to the old code (except loose leaf files don't work) but with folder based scanning.

* Prep for unit tests (updating broken ones with new implementations)

* Just some notes. Not sure I want to finish this work.

* Refactored the LibraryWatcher with some comments and state variables.

* Undid the migrations in case I don't move forward with this branch

* Started to clean the code and prepare for finishing this work.

* Fixed a bad merge

* Updated signatures to cleanup the code and commit to the new strategy for scanning.

* Swapped out the code with async processing of series on a small library

* The new scan loop is working in both Sync and Async methods. The code is slow and not optimized. This represents a good point to start polling and applying optimizations.

* Refactored UpdateSeries out of Scanner and into a dedicated file.

* Refactored how ProcessTasks are awaited to allow more async

* Fixed an issue where side nav item wouldn't show correct highlight and migrated to OnPush

* Moved where we start the stopwatch to encapsulate the full scan

* Cleaned up SignalR events to report correctly (still needs a redesign)

* Remove the "remove" code until I figure it out

* Put in extremely expensive series deletion code for library scan.

* Have Genre and Tag update the DB immediately to avoid dup issues

* Taking a break

* Moving to a lock with People was successful. Need to apply to others.

* Refactored code for series level and tag and genre with new locking strategy.
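
A rough sketch of the locking idea from the notes above, assuming a shared in-memory cache of People guarded by a lock so parallel series processing can't insert duplicates (all names here are illustrative):

using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string NormalizedName { get; set; }
}

public class PersonCacheSketch
{
    private static readonly object PeopleLock = new object();
    private readonly List<Person> _people = new List<Person>();

    public Person GetOrAdd(string normalizedName)
    {
        // Serialize lookup + insert so two threads can't both add the same person.
        lock (PeopleLock)
        {
            var person = _people.FirstOrDefault(p => p.NormalizedName == normalizedName);
            if (person != null) return person;
            person = new Person { NormalizedName = normalizedName };
            _people.Add(person);
            return person;
        }
    }
}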

* New scan loop works. Next up optimization

* Swapped out the Kavita logo with an SVG for faster load

* Refactored metadata updates to occur when the series are being updated.

* Code cleanup

* Added a new type of generic message (Info) to inform the user.

* Code cleanup

* Implemented an optimization which prevents any I/O (other than an attribute lookup) for Library/Series Scan. This can bring a recently updated library on network storage (650 series) to fully process in 2 seconds.

Fixed a bug where File Analysis was running every time for each non-epub file.
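
The core of that optimization, sketched under the assumption that a LastScanned timestamp is persisted per folder (names hypothetical; the real check appears later in this diff as HasSeriesFolderNotChangedSinceLastScan):

using System;
using System.IO;

public static class ScanSkipSketch
{
    // A single attribute lookup on the folder; no enumeration of its contents.
    public static bool CanSkipScan(string folderPath, DateTime lastScanned)
    {
        return Directory.GetLastWriteTime(folderPath) <= lastScanned;
    }
}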

* Fixed ARM x64 builds not being able to view PDF cover images due to a bad update in DocNet.

* Some code cleanup

* Added experimental signalr update code to have a more natural refresh of library-detail page

* Hooked in ability to send new series events to UI

* Moved all scan (file scan only) tasks into the Scan Queue. Scheduled ScanLibrary tasks now check if any existing scan task is running and reschedule for 3 hours (10 minutes for Scan Series).
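
A sketch of that reschedule-if-busy behavior using Hangfire. The scanAlreadyQueued flag stands in for Kavita's TaskScheduler.HasAlreadyEnqueuedTask helper (visible further down in this diff); the delays come from the commit note above.

using System;
using Hangfire;

public static class ScanSchedulingSketch
{
    public static void ScheduleScanLibrary(int libraryId, bool scanAlreadyQueued)
    {
        if (scanAlreadyQueued)
        {
            // A scan task is already queued: push this one out 3 hours instead of stacking it.
            BackgroundJob.Schedule(() => ScanLibrary(libraryId), TimeSpan.FromHours(3));
            return;
        }
        BackgroundJob.Enqueue(() => ScanLibrary(libraryId));
    }

    // Stand-in for the real ScannerService entry point.
    public static void ScanLibrary(int libraryId) { }
}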

* Implemented the info event in the events widget and added a clear all button to dismiss all infos and errors.  Added --event-widget-info-bg-color

* Remove --drawer-background-color since it's not used

* When a new series is added, inject it directly into the view.

* Some debug code cleanup

* Fixed up the unit tests

* Ensure all config directories exist on startup

* Disabled Library Watching (that will go in next build)

* Ensure update for series is admin only

* Lots of code changes, scan series kinda works, specials are splitting, optimizations are failing. Demotivated on this work again.

* Removed SeriesFolder migration

* Added the SeriesFolder migration

* Added a new pipe for dates so we can provide some nicer defaults. Added folder path to the series detail.

* The scan optimizations now work for NTFS systems.

* Removed a TODO

* Migrated all the times to use DateTime.Now and not Utc.

* Refactored some repo calls to use the includes flag pattern

* Implemented a check for the library scan optimization to validate whether the library was updated (type change, library rename, folder change, or series deleted) and, if so, bypass the optimization.

* Added another optimization which uses just the folder's last write time attribute if the drive is not NTFS.

* Fixed a unit test

* Some code cleanup

* Bump versions by dotnet-bump-version.

* Misc UI Fixes (#1450)

* Fixed collection cover images not rendering

* added a try/catch on sending email, so we fail silently if it doesn't send.

* Fixed Go Back not returning to the last scroll position because a layout mode change reset it, despite nothing changing.

* Fixed a bug where, when turning between pages in default mode, the height calculations could get skewed.

* Fixed a missing case for card item where it wouldn't show tooltip title for series.

* Bump versions by dotnet-bump-version.

* New Scan Loop Fixes (#1452)

* Refactored ScanSeries to avoid a lot of extra work and fixed a bug where Scan Series would invoke the processing twice.

Refactored the series selection code during process such that we use Localized Name as well, for cases where the original name was changed.

Undid an optimization around Last Write time, since Linux file systems match how NTFS works.

* Fixed part of the query

* Added a NormalizedLocalizedName for quick searching when a series needs grouping. Reworked scan loop code a bit to ensure we don't do extra work.

Tweaked the widget logic to help display better and not show "Nothing going on here".

* Fixed a bug where archives with ._ files would be counted as valid files, while they are actually just metadata files on Macs.

* Fixed a broken unit test

* Bump versions by dotnet-bump-version.

* Simplify parent lookup with Directory.GetParent (#1455)

* Simplify parent lookup with Directory.GetParent
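
For reference, Directory.GetParent returns a DirectoryInfo (or null at a filesystem root), which replaces manual path-string slicing. A minimal sketch of the lookup:

using System.IO;

public static class ParentLookupSketch
{
    public static string GetParentDirectoryName(string path)
    {
        // GetParent is null at a root such as "C:\" or "/", so fall back to empty.
        return Directory.GetParent(path)?.FullName ?? string.Empty;
    }
}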

* Address comments

* Bump versions by dotnet-bump-version.

* Scan Loop Fixes (#1459)

* Added Last Folder Scanned time to series info modal.

Tweaked the info event detail modal to have a primary button and thus be auto-dismissible

* Added an error event when multiple series are found in processing a series.

* Fixed a bug where a series could get stuck with other series due to a bad select query.

Started adding the force flag hook for the UI and designing the confirm.

Confirm service now also has the ability to hide the close button.

Updated error events and logging in the loop, to be more informative

* Fixed a bug where confirm service wasn't showing the proper body content.

* Hooked up force scan series

* refresh metadata now has force update

* Fixed up the messaging with the prompt on scan and hooked it up properly in the scan library to skip the check of whether the whole library even needs to be scanned. Fixed a bug where NormalizedLocalizedName wasn't being calculated on new entities.

Started adding unit tests for this problematic repo method.

* Fixed a bug where we updated NormalizedLocalizedName before we set it.

* Send an info to the UI when series are spread between multiple library level folders.

* Added some logger output when there are no files found in a folder. Return early if there are no files found, so we can avoid some small loops of code.

* Fixed an issue where multiple series in a folder with localized series would cause unintended grouping. This is not supported, hence we warn the user and allow the bad grouping.

* Added a case where scan series fails due to the folder being removed. We will now log an error

* Normalize paths when finding the highest directory till root.

* Fixed an issue with Scan Series where, if a series' folder was changed to a different path while the original folder still existed with another series in it, the series would not be deleted.

* Fixed some bugs around specials causing a series merge issue on scan series.

* Removed a bug marker

* Cleaned up some of the scan loop and removed a test I don't need.

* Remove any prompts for force flow, it doesn't work well. Leave the API as is though.

* Fixed up a check for duplicate ScanLibrary calls

* Bump versions by dotnet-bump-version.

* Scroll Resume (#1460)

* When we navigate away from a page and then back, resume at the last scroll key (if clicked)

* Resume jump key position when navigating back to a page. Removed some extra blank space on collection detail when a collection doesn't have a summary or cover image.

* Ignore progress events on series cards

* Added a URL to Swagger for /, which could be a reverse proxy URL

* Bump versions by dotnet-bump-version.

* Misc UI fixes (#1461)

* Misc fixes

- Fixed modal being stretched when not needed.
- Fixed Logo vertical align
- Fixed drawer content scroll, and it being squished due to Bootstrap overrides.

* series detail cover image stretch fix

- Fixes: Fixes series detail cover image being stretched on larger resolutions

* fixing empty lists scrollbar

* Fixing want to read error

* fixing unnecessary scrollbar

* Fixing recently updated tooltip

* Bump versions by dotnet-bump-version.

* Folder Watching (#1467)

* Hooked in a server setting to enable/disable folder watching

* Validated the file rename change event

* Validated delete file works

* Tweaked some logic to determine if a change occurs on a folder or a file.

* Added a note for an upcoming branch

* Some minor changes in the loop that just shift where code runs.

* Implemented ScanFolder api

* Ensure we restart watchers when we modify a library folder.

* Fixed a unit test

* Bump versions by dotnet-bump-version.

* More Scan Loop Bugfixes (#1471)

* Updated scan time for watcher to 30 seconds for non-dev. Moved ScanFolder off the Scan queue as it doesn't need to be there. Updated loggers

* Fixed jumpbar missing

* Tweaked the messaging for CoverGen

* When we return early due to nothing being done on library and series scan, make sure we kick off other tasks that need to occur.

* Fixed a foreign constraint issue on Volumes when we were adding to a new series.

* Fixed a case where when picking normalized series, capitalization differences wouldn't stack when they should.

* Reduced the logging output on dev and prod settings.

* Fixed a bug in the code that finds the highest directory from a file, where we were not checking against a normalized path.

* Cleaned up some code

* Fixed broken unit tests

* Bump versions by dotnet-bump-version.

* More Scan Loop Fixes (#1473)

* Added a ToList() to avoid a bug where a person could be removed from a list while iterating over the list.
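
The underlying pitfall: removing items from a collection while enumerating that same collection throws InvalidOperationException, while enumerating a ToList() snapshot is safe. A minimal illustration (names hypothetical):

using System.Collections.Generic;
using System.Linq;

public static class SnapshotIterationSketch
{
    public static void RemoveUnused(ICollection<string> people, ISet<string> stillReferenced)
    {
        // ToList() copies the elements, so mutating 'people' inside the loop is safe.
        foreach (var person in people.ToList())
        {
            if (!stillReferenced.Contains(person)) people.Remove(person);
        }
    }
}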

* When deleting a series, want to read page will now automatically remove that series from the view.

* Fixed a series lookup which was ignoring format

* Ignore XML comment warnings

* Removed a note since it was already working that way

* Fixed unit test

* Bump versions by dotnet-bump-version.

* Misc UI Fixes (#1477)

* Tweaked a Migration to log correctly only if something is going to be done.

* Refactored Reading List Controller code into a dedicated service and cleaned up some methods that aren't needed anymore.

* Fixed a bug where adding a new item to a reading list wasn't adding it at the end.

* Fixed an issue where collection page would re-render the same covers on multiple items.

* Fixed a missing margin-top which made the page extras drawer not render correctly and hence unclosable on small screens.

* Added some timeout on manage users screen to give data time to flush.

Added a dedicated token log for account flows, in case url encoding plays a part (but from testing it doesn't).

* Reverted back to building for ES6 instead of es2020 for old Safari 12.5.5 browsers (10MB difference in build size).

* Cleaned up the logic in removing series not found during scan loop.

* Tweaked the timings for Library Watcher to 1 min and reprocess queue every 30 seconds.

* Bump versions by dotnet-bump-version.

* Added fixes for libvips (#1479)

* Bump versions by dotnet-bump-version.

* Tachiyomi + Fixes (#1481)

* Fixed a bootstrap bug

* Fixed repeating images on collection detail

* Fixed up some logic in library watcher which wasn't processing all of the queue.

* When parsing non-epubs in Book library, use Manga parsing for Volume support to better support Light Novels

* Fixed some bugs with the Tachiyomi plugin APIs for progress tracking

* Bump versions by dotnet-bump-version.

* Adding Health controller (#1480)

* Adding Health controller

- Added: Added API endpoint for a health check to streamline docker healthy status.

* review comment fixes

* Bump versions by dotnet-bump-version.

* Simplify Folder Watcher (#1484)

* Refactored Library Watcher to use Hangfire under the hood.

* Support .kavitaignore at root level.

* Refactored a lot of the library watching code to process faster and handle when FileSystemWatcher runs out of internal buffer space. It's still not perfect, but good enough for basic use.
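
Context on the buffer issue: FileSystemWatcher queues pending events in a fixed internal buffer (8 KB by default, 64 KB maximum), and a burst of file changes can overflow it and raise the Error event. A sketch of creating a watcher with the buffer raised; the restart-on-error handling appears in the LibraryWatcher code later in this page.

using System.IO;

public static class WatcherSetupSketch
{
    public static FileSystemWatcher CreateWatcher(string folder)
    {
        return new FileSystemWatcher(folder)
        {
            IncludeSubdirectories = true,
            InternalBufferSize = 64 * 1024, // maximum allowed; the default is 8 KB
            EnableRaisingEvents = true
        };
    }
}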

* Marked folder watching as experimental and defaulted it to off.

* Revert #1479

* Tweaked the messaging for OPDS to remove a note about download role.

Moved some code closer to where it's used.

* Cleaned up how the events widget reports

* Fixed a null issue when deleting series in the UI

* Cleaned up some debug code

* Added more information for when we skip a scan

* Cleaned up some logging messages in CoverGen tasks

* More log message tweaks

* Added some debug to help identify a rare issue

* Fixed a bug where save bookmarks as webp could get reset to false when saving other server settings

* Updated some documentation on library watcher.

* Make LibraryWatcher fire every 5 mins

* Bump versions by dotnet-bump-version.

* Sort series by chapter number only when some chapters have no volume (#1487)

* Sort series by chapter number only when some chapters have no volume information

* Implement a Default static instance of ChapterSortComparer
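
A sketch of the static Default instance pattern; the comparison rule shown (chapter 0, i.e. volume-only entries, sorting last) is an assumption about ChapterSortComparer's behavior:

using System.Collections.Generic;

public sealed class ChapterSortComparerSketch : IComparer<double>
{
    // One shared, stateless instance instead of allocating a comparer at every sort call site.
    public static readonly ChapterSortComparerSketch Default = new ChapterSortComparerSketch();

    public int Compare(double x, double y)
    {
        if (x == 0.0 && y == 0.0) return 0;
        if (x == 0.0) return 1;  // 0 means "no chapter", so it sorts after real chapters
        if (y == 0.0) return -1;
        return x.CompareTo(y);
    }
}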

* Further use Default static Comparers

* Add missing ToList() as per comments

* SQLite Hangfire (#1488)

* Update to use SQLite for Hangfire to retain information on tasks

* Updated all external links to have noopener noreferrer

* When watching folders, ensure the folders exist before creating watchers.

* Tweaked the messaging for Email Service and added link to the project.

* Bump versions by dotnet-bump-version.

* Bump versions by dotnet-bump-version.

* Fixed typeahead not working correctly (#1490)

* Bump versions by dotnet-bump-version.

* Release Testing Day 1 (#1491)

* Fixed a bug where typeahead wouldn't automatically show results on relationship screen without an additional click.

* Tweaked the code which checks if a modification occurred to compare on seconds rather than minutes
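
That seconds-granularity comparison relies on a Truncate extension like the one sketched here; the matching Truncate(TimeSpan.TicksPerSecond) calls are visible in the diff further down.

using System;

public static class DateTimeTruncateSketch
{
    // Drops sub-resolution ticks, e.g. Truncate(TimeSpan.TicksPerSecond) zeroes the milliseconds.
    public static DateTime Truncate(this DateTime dateTime, long resolutionTicks)
    {
        if (resolutionTicks <= 0) return dateTime;
        return new DateTime(dateTime.Ticks - (dateTime.Ticks % resolutionTicks), dateTime.Kind);
    }
}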

* Clear cache will now clear temp/ directory as well.

* Fixed an issue where Chrome was caching api responses when it shouldn't have.

* Added some temp directory cleanup code

* Ensure genres get removed during series scan when removed from metadata.

* Fixed a bug where all epubs with a volume would show as Volume 0 in reading list

* When a scan is in progress, don't let the user delete the library.

* Bump versions by dotnet-bump-version.

* Scan Loop Last Write Time Change (#1492)

* Refactored invite user flow to separate error handling on create user flow and email flow. This should help users that have unique situations.

* Switch to using files to check LastWriteTime. Debug code in for Robbie to test on rclone

* Updated Parser namespace. Changed the LastWriteTime to check all files and folders.

* Bump versions by dotnet-bump-version.

* Release Testing Day 2 (#1493)

* Added a no data section to collection detail.

* Remove an optimization for skipping the whole library scan as it wasn't reliable

* When resetting password, ensure the input is colored correctly

* Fixed setting a new password after a reset throwing an error despite actually succeeding.

Fixed incorrect messaging for Password Reset page.

* Fixed a bug where reset password would show the side nav button and skew the page.

Updated a lot of references to use Typed version for formcontrols.

* Removed a migration from 0.5.0, 6 releases ago.

* Added a null check so we don't throw an exception when connecting with signalR on unauthenticated users.

* Bump versions by dotnet-bump-version.

* Fixed a bug where a series with a relationship couldn't be deleted. (#1495)

* Bump versions by dotnet-bump-version.

* Release Testing Day 3 (#1496)

* Tweaked log messaging for library scan when no files were scanned.

* When a theme that is set gets removed due to a scan, inform the user to refresh.

* Fixed a typo and made Darkness -> Brightness

* Made theme file downloads invokable by non-authenticated users, to allow new users to get the default theme.

* Hide all series side nav item if there are no libraries exposed to the user

* Fixed an API for Tachiyomi when syncing progress

* Fixed dashboard not responding to Series Removed and Added events.

Ensure we send SeriesRemoved events when they are deleted.

* Reverted Hangfire SQLite due to aborted jobs being resumed when they shouldn't be. Fixed some scan loop issues where cover gen wouldn't always be invoked on new libraries.

* Bump versions by dotnet-bump-version.

* Updating series detail cover style (#1498)

# Fixed
- Fixed: Fixed an issue with series detail cover when scaled down.

* Bump versions by dotnet-bump-version.

* Version bump

* v0.5.6 Release (#1499)

Co-authored-by: tjarls <tjarls@gmail.com>
Co-authored-by: Robbie Davis <robbie@therobbiedavis.com>
Co-authored-by: Chris Plaatjes <kizaing@gmail.com>
Joseph Milazzo, 2022-09-02 07:52:51 -05:00, committed by GitHub
commit 150e67031a (parent b40fcdb6a6)
207 changed files with 8183 additions and 2218 deletions


@@ -0,0 +1,250 @@
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using API.Data;
using Hangfire;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
namespace API.Services.Tasks.Scanner;
/// <summary>
/// Change information
/// </summary>
public class Change
{
/// <summary>
/// Gets or sets the type of the change.
/// </summary>
/// <value>
/// The type of the change.
/// </value>
public WatcherChangeTypes ChangeType { get; set; }
/// <summary>
/// Gets or sets the full path.
/// </summary>
/// <value>
/// The full path.
/// </value>
public string FullPath { get; set; }
/// <summary>
/// Gets or sets the name.
/// </summary>
/// <value>
/// The name.
/// </value>
public string Name { get; set; }
/// <summary>
/// Gets or sets the old full path.
/// </summary>
/// <value>
/// The old full path.
/// </value>
public string OldFullPath { get; set; }
/// <summary>
/// Gets or sets the old name.
/// </summary>
/// <value>
/// The old name.
/// </value>
public string OldName { get; set; }
}
public interface ILibraryWatcher
{
/// <summary>
/// Start watching all library folders
/// </summary>
/// <returns></returns>
Task StartWatching();
/// <summary>
/// Stop watching all folders
/// </summary>
void StopWatching();
/// <summary>
/// Essentially stops then starts watching. Useful if there is a change in folders or libraries
/// </summary>
/// <returns></returns>
Task RestartWatching();
}
/// <summary>
/// Responsible for watching the file system and processing change events. This is mainly responsible for invoking
/// Scanner to quickly pickup on changes.
/// </summary>
public class LibraryWatcher : ILibraryWatcher
{
private readonly IDirectoryService _directoryService;
private readonly IUnitOfWork _unitOfWork;
private readonly ILogger<LibraryWatcher> _logger;
private readonly IScannerService _scannerService;
private readonly Dictionary<string, IList<FileSystemWatcher>> _watcherDictionary = new ();
/// <summary>
/// This is just here to prevent GC from Disposing our watchers
/// </summary>
private readonly IList<FileSystemWatcher> _fileWatchers = new List<FileSystemWatcher>();
private IList<string> _libraryFolders = new List<string>();
private readonly TimeSpan _queueWaitTime;
public LibraryWatcher(IDirectoryService directoryService, IUnitOfWork unitOfWork, ILogger<LibraryWatcher> logger, IScannerService scannerService, IHostEnvironment environment)
{
_directoryService = directoryService;
_unitOfWork = unitOfWork;
_logger = logger;
_scannerService = scannerService;
_queueWaitTime = environment.IsDevelopment() ? TimeSpan.FromSeconds(30) : TimeSpan.FromMinutes(5);
}
public async Task StartWatching()
{
_logger.LogInformation("Starting file watchers");
_libraryFolders = (await _unitOfWork.LibraryRepository.GetLibraryDtosAsync())
.SelectMany(l => l.Folders)
.Distinct()
.Select(Parser.Parser.NormalizePath)
.Where(_directoryService.Exists)
.ToList();
foreach (var libraryFolder in _libraryFolders)
{
_logger.LogDebug("Watching {FolderPath}", libraryFolder);
var watcher = new FileSystemWatcher(libraryFolder);
watcher.Changed += OnChanged;
watcher.Created += OnCreated;
watcher.Deleted += OnDeleted;
watcher.Error += OnError;
watcher.Filter = "*.*";
watcher.IncludeSubdirectories = true;
watcher.EnableRaisingEvents = true;
_fileWatchers.Add(watcher);
if (!_watcherDictionary.ContainsKey(libraryFolder))
{
_watcherDictionary.Add(libraryFolder, new List<FileSystemWatcher>());
}
_watcherDictionary[libraryFolder].Add(watcher);
}
}
public void StopWatching()
{
_logger.LogInformation("Stopping watching folders");
foreach (var fileSystemWatcher in _watcherDictionary.Values.SelectMany(watcher => watcher))
{
fileSystemWatcher.EnableRaisingEvents = false;
fileSystemWatcher.Changed -= OnChanged;
fileSystemWatcher.Created -= OnCreated;
fileSystemWatcher.Deleted -= OnDeleted;
fileSystemWatcher.Dispose();
}
_fileWatchers.Clear();
_watcherDictionary.Clear();
}
public async Task RestartWatching()
{
StopWatching();
await StartWatching();
}
private void OnChanged(object sender, FileSystemEventArgs e)
{
if (e.ChangeType != WatcherChangeTypes.Changed) return;
_logger.LogDebug("[LibraryWatcher] Changed: {FullPath}, {Name}", e.FullPath, e.Name);
ProcessChange(e.FullPath, string.IsNullOrEmpty(_directoryService.FileSystem.Path.GetExtension(e.Name)));
}
private void OnCreated(object sender, FileSystemEventArgs e)
{
_logger.LogDebug("[LibraryWatcher] Created: {FullPath}, {Name}", e.FullPath, e.Name);
ProcessChange(e.FullPath, !_directoryService.FileSystem.File.Exists(e.Name));
}
/// <summary>
/// From testing, on Deleted only needs to pass through the event when a folder is deleted. If a file is deleted, Changed will handle automatically.
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
private void OnDeleted(object sender, FileSystemEventArgs e) {
var isDirectory = string.IsNullOrEmpty(_directoryService.FileSystem.Path.GetExtension(e.Name));
if (!isDirectory) return;
_logger.LogDebug("[LibraryWatcher] Deleted: {FullPath}, {Name}", e.FullPath, e.Name);
ProcessChange(e.FullPath, true);
}
private void OnError(object sender, ErrorEventArgs e)
{
_logger.LogError(e.GetException(), "[LibraryWatcher] An error occurred, likely too many watches occurred at once. Restarting Watchers");
Task.Run(RestartWatching);
}
/// <summary>
/// Processes the file or folder change. If the change is a file change and not from a supported extension, it will be ignored.
/// </summary>
/// <remarks>This will ignore image files that are added to the system. However, they may still trigger scans due to folder changes.</remarks>
/// <param name="filePath">File or folder that changed</param>
/// <param name="isDirectoryChange">If the change is on a directory and not a file</param>
private void ProcessChange(string filePath, bool isDirectoryChange = false)
{
var sw = Stopwatch.StartNew();
try
{
// We need to check if directory or not
if (!isDirectoryChange &&
!(Parser.Parser.IsArchive(filePath) || Parser.Parser.IsBook(filePath))) return;
var parentDirectory = _directoryService.GetParentDirectoryName(filePath);
if (string.IsNullOrEmpty(parentDirectory)) return;
// We need to find the library this creation belongs to
// Multiple libraries can point to the same base folder. In this case, we need to use FirstOrDefault
var libraryFolder = _libraryFolders.FirstOrDefault(f => parentDirectory.Contains(f));
if (string.IsNullOrEmpty(libraryFolder)) return;
var rootFolder = _directoryService.GetFoldersTillRoot(libraryFolder, filePath).ToList();
if (!rootFolder.Any()) return;
// Select the first folder and join with library folder, this should give us the folder to scan.
var fullPath =
Parser.Parser.NormalizePath(_directoryService.FileSystem.Path.Join(libraryFolder, rootFolder.First()));
var alreadyScheduled =
TaskScheduler.HasAlreadyEnqueuedTask(ScannerService.Name, "ScanFolder", new object[] {fullPath});
_logger.LogDebug("{FullPath} already enqueued: {Value}", fullPath, alreadyScheduled);
if (!alreadyScheduled)
{
_logger.LogDebug("[LibraryWatcher] Scheduling ScanFolder for {Folder}", fullPath);
BackgroundJob.Schedule(() => _scannerService.ScanFolder(fullPath), _queueWaitTime);
}
else
{
_logger.LogDebug("[LibraryWatcher] Skipped scheduling ScanFolder for {Folder} as a job already queued",
fullPath);
}
}
catch (Exception ex)
{
_logger.LogError(ex, "[LibraryWatcher] An error occurred when processing a watch event");
}
_logger.LogDebug("ProcessChange occured in {ElapsedMilliseconds}ms", sw.ElapsedMilliseconds);
}
}


@@ -1,37 +1,53 @@
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using API.Data.Metadata;
using API.Entities;
using API.Entities.Enums;
using API.Helpers;
using API.Extensions;
using API.Parser;
using API.SignalR;
using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.Logging;
namespace API.Services.Tasks.Scanner
{
public class ParsedSeries
{
/// <summary>
/// Name of the Series
/// </summary>
public string Name { get; init; }
/// <summary>
/// Normalized Name of the Series
/// </summary>
public string NormalizedName { get; init; }
/// <summary>
/// Format of the Series
/// </summary>
public MangaFormat Format { get; init; }
}
public enum Modified
{
Modified = 1,
NotModified = 2
}
public class SeriesModified
{
public string FolderPath { get; set; }
public string SeriesName { get; set; }
public DateTime LastScanned { get; set; }
public MangaFormat Format { get; set; }
}
public class ParseScannedFiles
{
private readonly ConcurrentDictionary<ParsedSeries, List<ParserInfo>> _scannedSeries;
private readonly ILogger _logger;
private readonly IDirectoryService _directoryService;
private readonly IReadingItemService _readingItemService;
private readonly IEventHub _eventHub;
private readonly DefaultParser _defaultParser;
/// <summary>
/// An instance of a pipeline for processing files and returning a Map of Series -> ParserInfos.
@@ -47,108 +63,51 @@ namespace API.Services.Tasks.Scanner
_logger = logger;
_directoryService = directoryService;
_readingItemService = readingItemService;
_scannedSeries = new ConcurrentDictionary<ParsedSeries, List<ParserInfo>>();
_defaultParser = new DefaultParser(_directoryService);
_eventHub = eventHub;
}
/// <summary>
/// Gets the list of all parserInfos given a Series (Will match on Name, LocalizedName, OriginalName). If the series does not exist within, return empty list.
/// </summary>
/// <param name="parsedSeries"></param>
/// <param name="series"></param>
/// <returns></returns>
public static IList<ParserInfo> GetInfosByName(Dictionary<ParsedSeries, List<ParserInfo>> parsedSeries, Series series)
{
var allKeys = parsedSeries.Keys.Where(ps =>
SeriesHelper.FindSeries(series, ps));
var infos = new List<ParserInfo>();
foreach (var key in allKeys)
{
infos.AddRange(parsedSeries[key]);
}
return infos;
}
/// <summary>
/// Processes files found during a library scan.
/// Populates a collection of <see cref="ParserInfo"/> for DB updates later.
/// This will Scan all files in a folder path. For each folder within the folderPath, FolderAction will be invoked for all files contained
/// </summary>
/// <param name="path">Path of a file</param>
/// <param name="rootPath"></param>
/// <param name="type">Library type to determine parsing to perform</param>
private void ProcessFile(string path, string rootPath, LibraryType type)
/// <param name="scanDirectoryByDirectory">Scan directory by directory and for each, call folderAction</param>
/// <param name="folderPath">A library folder or series folder</param>
/// <param name="folderAction">A callback async Task to be called once all files for each folder path are found</param>
/// <param name="forceCheck">If we should bypass any folder last write time checks on the scan and force I/O</param>
public async Task ProcessFiles(string folderPath, bool scanDirectoryByDirectory,
IDictionary<string, IList<SeriesModified>> seriesPaths, Func<IList<string>, string,Task> folderAction, bool forceCheck = false)
{
var info = _readingItemService.Parse(path, rootPath, type);
if (info == null)
string normalizedPath;
if (scanDirectoryByDirectory)
{
// If the file is an image and literally a cover image, skip processing.
if (!(Parser.Parser.IsImage(path) && Parser.Parser.IsCoverImage(path)))
// This is used in library scan, so we should check first for an ignore file and use that here as well
var potentialIgnoreFile = _directoryService.FileSystem.Path.Join(folderPath, DirectoryService.KavitaIgnoreFile);
var directories = _directoryService.GetDirectories(folderPath, _directoryService.CreateMatcherFromFile(potentialIgnoreFile)).ToList();
foreach (var directory in directories)
{
_logger.LogWarning("[Scanner] Could not parse series from {Path}", path);
normalizedPath = Parser.Parser.NormalizePath(directory);
if (HasSeriesFolderNotChangedSinceLastScan(seriesPaths, normalizedPath, forceCheck))
{
await folderAction(new List<string>(), directory);
}
else
{
// For a scan, this is doing everything in the directory loop before the folder Action is called...which leads to no progress indication
await folderAction(_directoryService.ScanFiles(directory), directory);
}
}
return;
}
// This catches when original library type is Manga/Comic and when parsing with non
if (Parser.Parser.IsEpub(path) && Parser.Parser.ParseVolume(info.Series) != Parser.Parser.DefaultVolume) // Shouldn't this be info.Volume != DefaultVolume?
normalizedPath = Parser.Parser.NormalizePath(folderPath);
if (HasSeriesFolderNotChangedSinceLastScan(seriesPaths, normalizedPath, forceCheck))
{
info = _defaultParser.Parse(path, rootPath, LibraryType.Book);
var info2 = _readingItemService.Parse(path, rootPath, type);
info.Merge(info2);
}
info.ComicInfo = _readingItemService.GetComicInfo(path);
if (info.ComicInfo != null)
{
if (!string.IsNullOrEmpty(info.ComicInfo.Volume))
{
info.Volumes = info.ComicInfo.Volume;
}
if (!string.IsNullOrEmpty(info.ComicInfo.Series))
{
info.Series = info.ComicInfo.Series.Trim();
}
if (!string.IsNullOrEmpty(info.ComicInfo.Number))
{
info.Chapters = info.ComicInfo.Number;
}
// Patch in SeriesSort from ComicInfo
if (!string.IsNullOrEmpty(info.ComicInfo.TitleSort))
{
info.SeriesSort = info.ComicInfo.TitleSort.Trim();
}
if (!string.IsNullOrEmpty(info.ComicInfo.Format) && Parser.Parser.HasComicInfoSpecial(info.ComicInfo.Format))
{
info.IsSpecial = true;
info.Chapters = Parser.Parser.DefaultChapter;
info.Volumes = Parser.Parser.DefaultVolume;
}
if (!string.IsNullOrEmpty(info.ComicInfo.SeriesSort))
{
info.SeriesSort = info.ComicInfo.SeriesSort.Trim();
}
if (!string.IsNullOrEmpty(info.ComicInfo.LocalizedSeries))
{
info.LocalizedSeries = info.ComicInfo.LocalizedSeries.Trim();
}
}
try
{
TrackSeries(info);
}
catch (Exception ex)
{
_logger.LogError(ex, "There was an exception that occurred during tracking {FilePath}. Skipping this file", info.FullFilePath);
await folderAction(new List<string>(), folderPath);
return;
}
await folderAction(_directoryService.ScanFiles(folderPath), folderPath);
}
@@ -156,13 +115,14 @@ namespace API.Services.Tasks.Scanner
/// Attempts to either add a new instance of a show mapping to the _scannedSeries bag or adds to an existing.
/// This will check if the name matches an existing series name (multiple fields) <see cref="MergeName"/>
/// </summary>
/// <param name="scannedSeries">A localized list of a series' parsed infos</param>
/// <param name="info"></param>
private void TrackSeries(ParserInfo info)
private void TrackSeries(ConcurrentDictionary<ParsedSeries, List<ParserInfo>> scannedSeries, ParserInfo info)
{
if (info.Series == string.Empty) return;
// Check if normalized info.Series already exists and if so, update info to use that name instead
info.Series = MergeName(info);
info.Series = MergeName(scannedSeries, info);
var normalizedSeries = Parser.Parser.Normalize(info.Series);
var normalizedSortSeries = Parser.Parser.Normalize(info.SeriesSort);
@@ -170,7 +130,7 @@ namespace API.Services.Tasks.Scanner
try
{
var existingKey = _scannedSeries.Keys.SingleOrDefault(ps =>
var existingKey = scannedSeries.Keys.SingleOrDefault(ps =>
ps.Format == info.Format && (ps.NormalizedName.Equals(normalizedSeries)
|| ps.NormalizedName.Equals(normalizedLocalizedSeries)
|| ps.NormalizedName.Equals(normalizedSortSeries)));
@@ -181,7 +141,7 @@ namespace API.Services.Tasks.Scanner
NormalizedName = normalizedSeries
};
_scannedSeries.AddOrUpdate(existingKey, new List<ParserInfo>() {info}, (_, oldValue) =>
scannedSeries.AddOrUpdate(existingKey, new List<ParserInfo>() {info}, (_, oldValue) =>
{
oldValue ??= new List<ParserInfo>();
if (!oldValue.Contains(info))
@@ -195,7 +155,7 @@ namespace API.Services.Tasks.Scanner
catch (Exception ex)
{
_logger.LogCritical(ex, "{SeriesName} matches against multiple series in the parsed series. This indicates a critical kavita issue. Key will be skipped", info.Series);
foreach (var seriesKey in _scannedSeries.Keys.Where(ps =>
foreach (var seriesKey in scannedSeries.Keys.Where(ps =>
ps.Format == info.Format && (ps.NormalizedName.Equals(normalizedSeries)
|| ps.NormalizedName.Equals(normalizedLocalizedSeries)
|| ps.NormalizedName.Equals(normalizedSortSeries))))
@@ -205,23 +165,24 @@ namespace API.Services.Tasks.Scanner
}
}
/// <summary>
/// Using a normalized name from the passed ParserInfo, this checks against all found series so far and if an existing one exists with
/// same normalized name, it merges into the existing one. This is important as some manga may have a slight difference with punctuation or capitalization.
/// </summary>
/// <param name="info"></param>
/// <returns>Series Name to group this info into</returns>
public string MergeName(ParserInfo info)
private string MergeName(ConcurrentDictionary<ParsedSeries, List<ParserInfo>> scannedSeries, ParserInfo info)
{
var normalizedSeries = Parser.Parser.Normalize(info.Series);
var normalizedLocalSeries = Parser.Parser.Normalize(info.LocalizedSeries);
// We use FirstOrDefault because this was introduced late in development and users might have 2 series with both names
try
{
var existingName =
_scannedSeries.SingleOrDefault(p =>
(Parser.Parser.Normalize(p.Key.NormalizedName) == normalizedSeries ||
Parser.Parser.Normalize(p.Key.NormalizedName) == normalizedLocalSeries) &&
scannedSeries.SingleOrDefault(p =>
(Parser.Parser.Normalize(p.Key.NormalizedName).Equals(normalizedSeries) ||
Parser.Parser.Normalize(p.Key.NormalizedName).Equals(normalizedLocalSeries)) &&
p.Key.Format == info.Format)
.Key;
@@ -233,7 +194,7 @@ namespace API.Services.Tasks.Scanner
catch (Exception ex)
{
_logger.LogCritical(ex, "Multiple series detected for {SeriesName} ({File})! This is critical to fix! There should only be 1", info.Series, info.FullFilePath);
var values = _scannedSeries.Where(p =>
var values = scannedSeries.Where(p =>
(Parser.Parser.Normalize(p.Key.NormalizedName) == normalizedSeries ||
Parser.Parser.Normalize(p.Key.NormalizedName) == normalizedLocalSeries) &&
p.Key.Format == info.Format);
@@ -247,34 +208,77 @@ namespace API.Services.Tasks.Scanner
return info.Series;
}
/// <summary>
///
/// This will process series by folder groups.
/// </summary>
/// <param name="libraryType">Type of library. Used for selecting the correct file extensions to search for and parsing files</param>
/// <param name="folders">The folders to scan. By default, this should be library.Folders, however it can be overwritten to restrict folders</param>
/// <param name="libraryName">Name of the Library</param>
/// <param name="libraryType"></param>
/// <param name="folders"></param>
/// <param name="libraryName"></param>
/// <returns></returns>
public async Task<Dictionary<ParsedSeries, List<ParserInfo>>> ScanLibrariesForSeries(LibraryType libraryType, IEnumerable<string> folders, string libraryName)
public async Task ScanLibrariesForSeries(LibraryType libraryType,
IEnumerable<string> folders, string libraryName, bool isLibraryScan,
IDictionary<string, IList<SeriesModified>> seriesPaths, Action<Tuple<bool, IList<ParserInfo>>> processSeriesInfos, bool forceCheck = false)
{
await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.FileScanProgressEvent("", libraryName, ProgressEventType.Started));
await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.FileScanProgressEvent("File Scan Starting", libraryName, ProgressEventType.Started));
foreach (var folderPath in folders)
{
try
{
async void Action(string f)
await ProcessFiles(folderPath, isLibraryScan, seriesPaths, async (files, folder) =>
{
try
var normalizedFolder = Parser.Parser.NormalizePath(folder);
if (HasSeriesFolderNotChangedSinceLastScan(seriesPaths, normalizedFolder, forceCheck))
{
ProcessFile(f, folderPath, libraryType);
await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.FileScanProgressEvent(f, libraryName, ProgressEventType.Updated));
var parsedInfos = seriesPaths[normalizedFolder].Select(fp => new ParserInfo()
{
Series = fp.SeriesName,
Format = fp.Format,
}).ToList();
processSeriesInfos.Invoke(new Tuple<bool, IList<ParserInfo>>(true, parsedInfos));
_logger.LogDebug("Skipped File Scan for {Folder} as it hasn't changed since last scan", folder);
return;
}
catch (FileNotFoundException exception)
_logger.LogDebug("Found {Count} files for {Folder}", files.Count, folder);
await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.FileScanProgressEvent(folderPath, libraryName, ProgressEventType.Updated));
if (files.Count == 0)
{
_logger.LogError(exception, "The file {Filename} could not be found", f);
_logger.LogInformation("[ScannerService] {Folder} is empty", folder);
return;
}
}
var scannedSeries = new ConcurrentDictionary<ParsedSeries, List<ParserInfo>>();
var infos = files
.Select(file => _readingItemService.ParseFile(file, folderPath, libraryType))
.Where(info => info != null)
.ToList();
_directoryService.TraverseTreeParallelForEach(folderPath, Action, Parser.Parser.SupportedExtensions, _logger);
MergeLocalizedSeriesWithSeries(infos);
foreach (var info in infos)
{
try
{
TrackSeries(scannedSeries, info);
}
catch (Exception ex)
{
_logger.LogError(ex, "There was an exception that occurred during tracking {FilePath}. Skipping this file", info.FullFilePath);
}
}
// It would be really cool if we can emit an event when a folder hasn't been changed so we don't parse everything, but the first item to ensure we don't delete it
// Otherwise, we can do a last step in the DB where we validate all files on disk exist and if not, delete them. (easy but slow)
foreach (var series in scannedSeries.Keys)
{
if (scannedSeries[series].Count > 0 && processSeriesInfos != null)
{
processSeriesInfos.Invoke(new Tuple<bool, IList<ParserInfo>>(false, scannedSeries[series]));
}
}
}, forceCheck);
}
catch (ArgumentException ex)
{
@@ -282,20 +286,76 @@ namespace API.Services.Tasks.Scanner
}
}
await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.FileScanProgressEvent("", libraryName, ProgressEventType.Ended));
return SeriesWithInfos();
await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.FileScanProgressEvent("File Scan Done", libraryName, ProgressEventType.Ended));
}
/// <summary>
/// Returns any series where there were parsed infos
/// Checks against all folder paths on file if the last scanned is >= the directory's last write down to the second
/// </summary>
/// <param name="seriesPaths"></param>
/// <param name="normalizedFolder"></param>
/// <param name="forceCheck"></param>
/// <returns></returns>
private Dictionary<ParsedSeries, List<ParserInfo>> SeriesWithInfos()
private bool HasSeriesFolderNotChangedSinceLastScan(IDictionary<string, IList<SeriesModified>> seriesPaths, string normalizedFolder, bool forceCheck = false)
{
var filtered = _scannedSeries.Where(kvp => kvp.Value.Count > 0);
var series = filtered.ToDictionary(v => v.Key, v => v.Value);
return series;
if (forceCheck) return false;
return seriesPaths.ContainsKey(normalizedFolder) && seriesPaths[normalizedFolder].All(f => f.LastScanned.Truncate(TimeSpan.TicksPerSecond) >=
_directoryService.GetLastWriteTime(normalizedFolder).Truncate(TimeSpan.TicksPerSecond));
}
/// <summary>
/// Checks if there are any ParserInfos that have a Series that matches the LocalizedSeries field in any other info. If so,
/// rewrites the infos with series name instead of the localized name, so they stack.
/// </summary>
/// <example>
/// Accel World v01.cbz has Series "Accel World" and Localized Series "World of Acceleration"
/// World of Acceleration v02.cbz has Series "World of Acceleration"
/// After running this code, we'd have:
/// World of Acceleration v02.cbz having Series "Accel World" and Localized Series of "World of Acceleration"
/// </example>
/// <param name="infos">A collection of ParserInfos</param>
private void MergeLocalizedSeriesWithSeries(IReadOnlyCollection<ParserInfo> infos)
{
var hasLocalizedSeries = infos.Any(i => !string.IsNullOrEmpty(i.LocalizedSeries));
if (!hasLocalizedSeries) return;
var localizedSeries = infos
.Where(i => !i.IsSpecial)
.Select(i => i.LocalizedSeries)
.Distinct()
.FirstOrDefault(i => !string.IsNullOrEmpty(i));
if (string.IsNullOrEmpty(localizedSeries)) return;
// NOTE: If we have multiple series in a folder with a localized title, then this will fail. It will group into one series. User needs to fix this themselves.
string nonLocalizedSeries;
// Normalize this as many of the cases is a capitalization difference
var nonLocalizedSeriesFound = infos
.Where(i => !i.IsSpecial)
.Select(i => i.Series).DistinctBy(Parser.Parser.Normalize).ToList();
if (nonLocalizedSeriesFound.Count == 1)
{
nonLocalizedSeries = nonLocalizedSeriesFound.First();
}
else
{
// There can be a case where there are multiple series in a folder that causes merging.
if (nonLocalizedSeriesFound.Count > 2)
{
_logger.LogError("[ScannerService] There are multiple series within one folder that contain localized series. This will cause them to group incorrectly. Please separate series into their own dedicated folder or ensure there is only 2 potential series (localized and series): {LocalizedSeries}", string.Join(", ", nonLocalizedSeriesFound));
}
nonLocalizedSeries = nonLocalizedSeriesFound.FirstOrDefault(s => !s.Equals(localizedSeries));
}
if (string.IsNullOrEmpty(nonLocalizedSeries)) return;
var normalizedNonLocalizedSeries = Parser.Parser.Normalize(nonLocalizedSeries);
foreach (var infoNeedingMapping in infos.Where(i =>
!Parser.Parser.Normalize(i.Series).Equals(normalizedNonLocalizedSeries)))
{
infoNeedingMapping.Series = nonLocalizedSeries;
infoNeedingMapping.LocalizedSeries = localizedSeries;
}
}
}
}


@@ -0,0 +1,170 @@
using System.IO;
using System.Linq;
using API.Entities.Enums;
using API.Services;
namespace API.Parser;
public interface IDefaultParser
{
ParserInfo Parse(string filePath, string rootPath, LibraryType type = LibraryType.Manga);
void ParseFromFallbackFolders(string filePath, string rootPath, LibraryType type, ref ParserInfo ret);
}
/// <summary>
/// This is an implementation of the Parser that is the basis for everything
/// </summary>
public class DefaultParser : IDefaultParser
{
private readonly IDirectoryService _directoryService;
public DefaultParser(IDirectoryService directoryService)
{
_directoryService = directoryService;
}
/// <summary>
/// Parses information out of a file path. Will fallback to using directory name if Series couldn't be parsed
/// from filename.
/// </summary>
/// <param name="filePath"></param>
/// <param name="rootPath">Root folder</param>
/// <param name="type">Defaults to Manga. Allows different Regex to be used for parsing.</param>
/// <returns><see cref="ParserInfo"/> or null if Series was empty</returns>
public ParserInfo Parse(string filePath, string rootPath, LibraryType type = LibraryType.Manga)
{
var fileName = _directoryService.FileSystem.Path.GetFileNameWithoutExtension(filePath);
ParserInfo ret;
if (Services.Tasks.Scanner.Parser.Parser.IsEpub(filePath))
{
ret = new ParserInfo()
{
Chapters = Services.Tasks.Scanner.Parser.Parser.ParseChapter(fileName) ?? Services.Tasks.Scanner.Parser.Parser.ParseComicChapter(fileName),
Series = Services.Tasks.Scanner.Parser.Parser.ParseSeries(fileName) ?? Services.Tasks.Scanner.Parser.Parser.ParseComicSeries(fileName),
Volumes = Services.Tasks.Scanner.Parser.Parser.ParseVolume(fileName) ?? Services.Tasks.Scanner.Parser.Parser.ParseComicVolume(fileName),
Filename = Path.GetFileName(filePath),
Format = Services.Tasks.Scanner.Parser.Parser.ParseFormat(filePath),
FullFilePath = filePath
};
}
else
{
ret = new ParserInfo()
{
Chapters = type == LibraryType.Comic ? Services.Tasks.Scanner.Parser.Parser.ParseComicChapter(fileName) : Services.Tasks.Scanner.Parser.Parser.ParseChapter(fileName),
Series = type == LibraryType.Comic ? Services.Tasks.Scanner.Parser.Parser.ParseComicSeries(fileName) : Services.Tasks.Scanner.Parser.Parser.ParseSeries(fileName),
Volumes = type == LibraryType.Comic ? Services.Tasks.Scanner.Parser.Parser.ParseComicVolume(fileName) : Services.Tasks.Scanner.Parser.Parser.ParseVolume(fileName),
Filename = Path.GetFileName(filePath),
Format = Services.Tasks.Scanner.Parser.Parser.ParseFormat(filePath),
Title = Path.GetFileNameWithoutExtension(fileName),
FullFilePath = filePath
};
}
if (Services.Tasks.Scanner.Parser.Parser.IsImage(filePath) && Services.Tasks.Scanner.Parser.Parser.IsCoverImage(filePath)) return null;
if (Services.Tasks.Scanner.Parser.Parser.IsImage(filePath))
{
// Reset Chapters, Volumes, and Series as images are not good to parse information out of. Better to use folders.
ret.Volumes = Services.Tasks.Scanner.Parser.Parser.DefaultVolume;
ret.Chapters = Services.Tasks.Scanner.Parser.Parser.DefaultChapter;
ret.Series = string.Empty;
}
if (ret.Series == string.Empty || Services.Tasks.Scanner.Parser.Parser.IsImage(filePath))
{
// Try to parse information out of each folder all the way to rootPath
ParseFromFallbackFolders(filePath, rootPath, type, ref ret);
}
var edition = Services.Tasks.Scanner.Parser.Parser.ParseEdition(fileName);
if (!string.IsNullOrEmpty(edition))
{
ret.Series = Services.Tasks.Scanner.Parser.Parser.CleanTitle(ret.Series.Replace(edition, ""), type is LibraryType.Comic);
ret.Edition = edition;
}
var isSpecial = type == LibraryType.Comic ? Services.Tasks.Scanner.Parser.Parser.ParseComicSpecial(fileName) : Services.Tasks.Scanner.Parser.Parser.ParseMangaSpecial(fileName);
// We must ensure that we can only parse a special out. As some files will have v20 c171-180+Omake and that
// could cause a problem as Omake is a special term, but there is valid volume/chapter information.
if (ret.Chapters == Services.Tasks.Scanner.Parser.Parser.DefaultChapter && ret.Volumes == Services.Tasks.Scanner.Parser.Parser.DefaultVolume && !string.IsNullOrEmpty(isSpecial))
{
ret.IsSpecial = true;
ParseFromFallbackFolders(filePath, rootPath, type, ref ret); // NOTE: This can cause some complications, we should try to be a bit less aggressive to fallback to folder
}
// If we are a special with marker, we need to ensure we use the correct series name. we can do this by falling back to Folder name
if (Services.Tasks.Scanner.Parser.Parser.HasSpecialMarker(fileName))
{
ret.IsSpecial = true;
ret.Chapters = Services.Tasks.Scanner.Parser.Parser.DefaultChapter;
ret.Volumes = Services.Tasks.Scanner.Parser.Parser.DefaultVolume;
ParseFromFallbackFolders(filePath, rootPath, type, ref ret);
}
if (string.IsNullOrEmpty(ret.Series))
{
ret.Series = Services.Tasks.Scanner.Parser.Parser.CleanTitle(fileName, type is LibraryType.Comic);
}
// Pdfs may have .pdf in the series name, remove that
if (Services.Tasks.Scanner.Parser.Parser.IsPdf(filePath) && ret.Series.ToLower().EndsWith(".pdf"))
{
ret.Series = ret.Series.Substring(0, ret.Series.Length - ".pdf".Length);
}
return ret.Series == string.Empty ? null : ret;
}
/// <summary>
/// Fills out <see cref="ParserInfo"/> by trying to parse volume, chapters, and series from folders
/// </summary>
/// <param name="filePath"></param>
/// <param name="rootPath"></param>
/// <param name="type"></param>
/// <param name="ret">Expects a non-null ParserInfo which this method will populate</param>
public void ParseFromFallbackFolders(string filePath, string rootPath, LibraryType type, ref ParserInfo ret)
{
var fallbackFolders = _directoryService.GetFoldersTillRoot(rootPath, filePath).ToList();
for (var i = 0; i < fallbackFolders.Count; i++)
{
var folder = fallbackFolders[i];
if (!string.IsNullOrEmpty(Services.Tasks.Scanner.Parser.Parser.ParseMangaSpecial(folder))) continue;
var parsedVolume = type is LibraryType.Manga ? Services.Tasks.Scanner.Parser.Parser.ParseVolume(folder) : Services.Tasks.Scanner.Parser.Parser.ParseComicVolume(folder);
var parsedChapter = type is LibraryType.Manga ? Services.Tasks.Scanner.Parser.Parser.ParseChapter(folder) : Services.Tasks.Scanner.Parser.Parser.ParseComicChapter(folder);
if (!parsedVolume.Equals(Services.Tasks.Scanner.Parser.Parser.DefaultVolume) || !parsedChapter.Equals(Services.Tasks.Scanner.Parser.Parser.DefaultChapter))
{
if ((string.IsNullOrEmpty(ret.Volumes) || ret.Volumes.Equals(Services.Tasks.Scanner.Parser.Parser.DefaultVolume)) && !parsedVolume.Equals(Services.Tasks.Scanner.Parser.Parser.DefaultVolume))
{
ret.Volumes = parsedVolume;
}
if ((string.IsNullOrEmpty(ret.Chapters) || ret.Chapters.Equals(Services.Tasks.Scanner.Parser.Parser.DefaultChapter)) && !parsedChapter.Equals(Services.Tasks.Scanner.Parser.Parser.DefaultChapter))
{
ret.Chapters = parsedChapter;
}
}
// Generally users group in series folders. Let's try to parse series from the top folder
if (!folder.Equals(ret.Series) && i == fallbackFolders.Count - 1)
{
var series = Services.Tasks.Scanner.Parser.Parser.ParseSeries(folder);
if (string.IsNullOrEmpty(series))
{
ret.Series = Services.Tasks.Scanner.Parser.Parser.CleanTitle(folder, type is LibraryType.Comic);
break;
}
if (!string.IsNullOrEmpty(series) && (string.IsNullOrEmpty(ret.Series) || !folder.Contains(ret.Series)))
{
ret.Series = series;
break;
}
}
}
}
}

File diff suppressed because it is too large


@@ -0,0 +1,101 @@
using API.Data.Metadata;
using API.Entities.Enums;
using API.Services.Tasks.Scanner.Parser;
namespace API.Parser
{
/// <summary>
/// This represents all parsed information from a single file
/// </summary>
public class ParserInfo
{
/// <summary>
/// Represents the parsed chapters from a file. By default, will be 0 which means nothing could be parsed.
/// <remarks>The chapters can only be a single float or a range of float ie) 1-2. Mainly floats should be multiples of 0.5 representing specials</remarks>
/// </summary>
public string Chapters { get; set; } = "";
/// <summary>
/// Represents the parsed series from the file or folder
/// </summary>
public string Series { get; set; } = string.Empty;
/// <summary>
/// This can be filled in from ComicInfo.xml/Epub during scanning. Will update the SortName field on <see cref="Entities.Series"/>
/// </summary>
public string SeriesSort { get; set; } = string.Empty;
/// <summary>
/// This can be filled in from ComicInfo.xml/Epub during scanning. Will update the LocalizedName field on <see cref="Entities.Series"/>
/// </summary>
public string LocalizedSeries { get; set; } = string.Empty;
/// <summary>
/// Represents the parsed volumes from a file. By default, will be 0 which means that nothing could be parsed.
/// If Volumes is 0 and Chapters is 0, the file is a special. If Chapters is non-zero, then no volume could be parsed.
/// <example>Beastars Vol 3-4 will map to "3-4"</example>
/// <remarks>The volumes can only be a single int or a range of ints ie) 1-2. Float based volumes are not supported.</remarks>
/// </summary>
public string Volumes { get; set; } = "";
/// <summary>
/// Filename of the underlying file
/// <example>Beastars v01 (digital).cbz</example>
/// </summary>
public string Filename { get; init; } = "";
/// <summary>
/// Full filepath of the underlying file
/// <example>C:/Manga/Beastars v01 (digital).cbz</example>
/// </summary>
public string FullFilePath { get; set; } = "";
/// <summary>
/// <see cref="MangaFormat"/> that represents the type of the file
/// <remarks>Mainly used to show in the UI and so caching service knows how to cache for reading.</remarks>
/// </summary>
public MangaFormat Format { get; set; } = MangaFormat.Unknown;
/// <summary>
/// This can potentially store things like "Omnibus, Color, Full Contact Edition, Extra, Final, etc"
/// </summary>
/// <remarks>Not Used in Database</remarks>
public string Edition { get; set; } = "";
/// <summary>
/// If the file contains no volume/chapter information or contains Special Keywords <see cref="Parser.MangaSpecialRegex"/>
/// </summary>
public bool IsSpecial { get; set; }
/// <summary>
/// Used for specials or books, stores what the UI should show.
/// <remarks>Manga does not use this field</remarks>
/// </summary>
public string Title { get; set; } = string.Empty;
/// <summary>
/// If the ParserInfo has the IsSpecial tag or both volumes and chapters are default aka 0
/// </summary>
/// <returns></returns>
public bool IsSpecialInfo()
{
return (IsSpecial || (Volumes == "0" && Chapters == "0"));
}
/// <summary>
/// This will contain any EXTRA comicInfo information parsed from the epub or archive. If there is an archive with comicInfo.xml AND it contains
/// series, volume information, that will override what we parsed.
/// </summary>
public ComicInfo ComicInfo { get; set; }
/// <summary>
/// Merges non empty/null properties from info2 into this entity.
/// </summary>
/// <remarks>This does not merge ComicInfo as they should always be the same</remarks>
/// <param name="info2"></param>
public void Merge(ParserInfo info2)
{
if (info2 == null) return;
Chapters = string.IsNullOrEmpty(Chapters) || Chapters == "0" ? info2.Chapters: Chapters;
Volumes = string.IsNullOrEmpty(Volumes) || Volumes == "0" ? info2.Volumes : Volumes;
Edition = string.IsNullOrEmpty(Edition) ? info2.Edition : Edition;
Title = string.IsNullOrEmpty(Title) ? info2.Title : Title;
Series = string.IsNullOrEmpty(Series) ? info2.Series : Series;
IsSpecial = IsSpecial || info2.IsSpecial;
}
}
}


@ -0,0 +1,819 @@
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Diagnostics;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;
using API.Data;
using API.Data.Metadata;
using API.Entities;
using API.Entities.Enums;
using API.Extensions;
using API.Helpers;
using API.Parser;
using API.Services.Tasks.Metadata;
using API.SignalR;
using Hangfire;
using Microsoft.Extensions.Logging;
namespace API.Services.Tasks.Scanner;
public interface IProcessSeries
{
/// <summary>
/// Do not allow this Prime to be invoked by multiple threads. It will break the DB.
/// </summary>
/// <returns></returns>
Task Prime();
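/// <summary>
/// Processes all the parsed infos for a single series: creates or updates the Series, its Volumes/Chapters and metadata, then commits
/// </summary>
/// <param name="parsedInfos">All infos belonging to one series</param>
/// <param name="library">Library the series belongs to</param>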
Task ProcessSeriesAsync(IList<ParserInfo> parsedInfos, Library library);
void EnqueuePostSeriesProcessTasks(int libraryId, int seriesId, bool forceUpdate = false);
}
/// <summary>
/// All code needed to Update a Series from a Scan action
/// </summary>
public class ProcessSeries : IProcessSeries
{
private readonly IUnitOfWork _unitOfWork;
private readonly ILogger<ProcessSeries> _logger;
private readonly IEventHub _eventHub;
private readonly IDirectoryService _directoryService;
private readonly ICacheHelper _cacheHelper;
private readonly IReadingItemService _readingItemService;
private readonly IFileService _fileService;
private readonly IMetadataService _metadataService;
private readonly IWordCountAnalyzerService _wordCountAnalyzerService;
private IList<Genre> _genres;
private IList<Person> _people;
private IList<Tag> _tags;
public ProcessSeries(IUnitOfWork unitOfWork, ILogger<ProcessSeries> logger, IEventHub eventHub,
IDirectoryService directoryService, ICacheHelper cacheHelper, IReadingItemService readingItemService,
IFileService fileService, IMetadataService metadataService, IWordCountAnalyzerService wordCountAnalyzerService)
{
_unitOfWork = unitOfWork;
_logger = logger;
_eventHub = eventHub;
_directoryService = directoryService;
_cacheHelper = cacheHelper;
_readingItemService = readingItemService;
_fileService = fileService;
_metadataService = metadataService;
_wordCountAnalyzerService = wordCountAnalyzerService;
}
/// <summary>
/// Invoke this once, before processing any series, to prime all the data needed during a scan
/// </summary>
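/// <example>
/// <code>
/// // A minimal sketch of a hypothetical caller: prime once, then process each series
/// await processSeries.Prime();
/// foreach (var infos in parsedSeries.Values)
/// {
///     await processSeries.ProcessSeriesAsync(infos, library);
/// }
/// </code>
/// </example>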
public async Task Prime()
{
_genres = await _unitOfWork.GenreRepository.GetAllGenresAsync();
_people = await _unitOfWork.PersonRepository.GetAllPeople();
_tags = await _unitOfWork.TagRepository.GetAllTagsAsync();
}
public async Task ProcessSeriesAsync(IList<ParserInfo> parsedInfos, Library library)
{
if (!parsedInfos.Any()) return;
var seriesAdded = false;
var scanWatch = Stopwatch.StartNew();
var seriesName = parsedInfos.First().Series;
await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress,
MessageFactory.LibraryScanProgressEvent(library.Name, ProgressEventType.Updated, seriesName));
_logger.LogInformation("[ScannerService] Beginning series update on {SeriesName}", seriesName);
// Check if there is a Series
var firstInfo = parsedInfos.First();
Series series;
try
{
series =
await _unitOfWork.SeriesRepository.GetFullSeriesByAnyName(firstInfo.Series, firstInfo.LocalizedSeries,
library.Id, firstInfo.Format);
}
catch (Exception ex)
{
_logger.LogError(ex, "There was an exception finding existing series for {SeriesName} with Localized name of {LocalizedName} for library {LibraryId}. This indicates you have duplicate series with same name or localized name in the library. Correct this and rescan", firstInfo.Series, firstInfo.LocalizedSeries, library.Id);
await _eventHub.SendMessageAsync(MessageFactory.Error,
MessageFactory.ErrorEvent($"There was an exception finding existing series for {firstInfo.Series} with Localized name of {firstInfo.LocalizedSeries} for library {library.Id}",
"This indicates you have duplicate series with same name or localized name in the library. Correct this and rescan."));
return;
}
if (series == null)
{
seriesAdded = true;
series = DbFactory.Series(firstInfo.Series, firstInfo.LocalizedSeries);
}
if (series.LibraryId == 0) series.LibraryId = library.Id;
try
{
_logger.LogInformation("[ScannerService] Processing series {SeriesName}", series.OriginalName);
var firstParsedInfo = parsedInfos[0];
UpdateVolumes(series, parsedInfos);
series.Pages = series.Volumes.Sum(v => v.Pages);
series.NormalizedName = Parser.Parser.Normalize(series.Name);
series.OriginalName ??= firstParsedInfo.Series;
if (series.Format == MangaFormat.Unknown)
{
series.Format = firstParsedInfo.Format;
}
if (string.IsNullOrEmpty(series.SortName))
{
series.SortName = series.Name;
}
if (!series.SortNameLocked)
{
series.SortName = series.Name;
if (!string.IsNullOrEmpty(firstParsedInfo.SeriesSort))
{
series.SortName = firstParsedInfo.SeriesSort;
}
}
// parsedInfos[0] may not be the first volume or chapter; find the first info that has a LocalizedSeries set
var localizedSeries = parsedInfos.Select(p => p.LocalizedSeries).FirstOrDefault(p => !string.IsNullOrEmpty(p));
if (!series.LocalizedNameLocked && !string.IsNullOrEmpty(localizedSeries))
{
series.LocalizedName = localizedSeries;
series.NormalizedLocalizedName = Parser.Parser.Normalize(series.LocalizedName);
}
UpdateSeriesMetadata(series, library.Type);
// Update series FolderPath here
await UpdateSeriesFolderPath(parsedInfos, library, series);
series.LastFolderScanned = DateTime.Now;
_unitOfWork.SeriesRepository.Attach(series);
if (_unitOfWork.HasChanges())
{
try
{
await _unitOfWork.CommitAsync();
}
catch (Exception ex)
{
await _unitOfWork.RollbackAsync();
_logger.LogCritical(ex, "[ScannerService] There was an issue writing to the for series {@SeriesName}", series);
await _eventHub.SendMessageAsync(MessageFactory.Error,
MessageFactory.ErrorEvent($"There was an issue writing to the DB for Series {series}",
ex.Message));
return;
}
if (seriesAdded)
{
await _eventHub.SendMessageAsync(MessageFactory.SeriesAdded,
MessageFactory.SeriesAddedEvent(series.Id, series.Name, series.LibraryId), false);
}
_logger.LogInformation("[ScannerService] Finished series update on {SeriesName} in {Milliseconds} ms", seriesName, scanWatch.ElapsedMilliseconds);
}
}
catch (Exception ex)
{
_logger.LogError(ex, "[ScannerService] There was an exception updating series for {SeriesName}", series.Name);
}
await _metadataService.GenerateCoversForSeries(series, false);
EnqueuePostSeriesProcessTasks(series.LibraryId, series.Id);
}
private async Task UpdateSeriesFolderPath(IEnumerable<ParserInfo> parsedInfos, Library library, Series series)
{
var seriesDirs = _directoryService.FindHighestDirectoriesFromFiles(library.Folders.Select(l => l.Path),
parsedInfos.Select(f => f.FullFilePath).ToList());
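// e.g. (assumed behavior) a library root C:/Manga containing C:/Manga/Beastars/v01.cbz and
// C:/Manga/Beastars/v02.cbz resolves to the single highest directory C:/Manga/Beastars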
if (seriesDirs.Keys.Count == 0)
{
_logger.LogCritical(
"Scan Series has files spread outside a main series folder. This has negative performance effects. Please ensure all of a series' files are under a single folder within the library");
await _eventHub.SendMessageAsync(MessageFactory.Info,
MessageFactory.InfoEvent($"{series.Name} has files spread outside a single series folder",
"This has negative performance effects. Please ensure all of a series' files are under a single folder within the library"));
}
else
{
// Don't save FolderPath if it's a library Folder
if (!library.Folders.Select(f => f.Path).Contains(seriesDirs.Keys.First()))
{
series.FolderPath = Parser.Parser.NormalizePath(seriesDirs.Keys.First());
}
}
}
public void EnqueuePostSeriesProcessTasks(int libraryId, int seriesId, bool forceUpdate = false)
{
// Cover generation now runs inline at the end of ProcessSeriesAsync rather than as a background job
BackgroundJob.Enqueue(() => _wordCountAnalyzerService.ScanSeries(libraryId, seriesId, forceUpdate));
}
private static void UpdateSeriesMetadata(Series series, LibraryType libraryType)
{
series.Metadata ??= DbFactory.SeriesMetadata(new List<CollectionTag>());
var isBook = libraryType == LibraryType.Book;
var firstChapter = SeriesService.GetFirstChapterForMetadata(series, isBook);
var firstFile = firstChapter?.Files.FirstOrDefault();
if (firstFile == null) return;
if (Parser.Parser.IsPdf(firstFile.FilePath)) return;
var chapters = series.Volumes.SelectMany(volume => volume.Chapters).ToList();
// Update Metadata based on Chapter metadata
series.Metadata.ReleaseYear = chapters.Min(c => c.ReleaseDate.Year);
if (series.Metadata.ReleaseYear < 1000)
{
// Not a valid year, default to 0
series.Metadata.ReleaseYear = 0;
}
// Set the AgeRating as highest in all the comicInfos
if (!series.Metadata.AgeRatingLocked) series.Metadata.AgeRating = chapters.Max(chapter => chapter.AgeRating);
series.Metadata.TotalCount = chapters.Max(chapter => chapter.TotalCount);
series.Metadata.MaxCount = chapters.Max(chapter => chapter.Count);
// To not have to rely completely on ComicInfo, try to parse out if the series is complete by checking parsed filenames as well.
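// e.g. if ComicInfo reports TotalCount = 12 but the max read Count is 10, and the highest parsed
// volume number is 12, we can still mark the series as having all 12 entries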
if (series.Metadata.MaxCount != series.Metadata.TotalCount)
{
var maxVolume = series.Volumes.Max(v => (int) Parser.Parser.MaxNumberFromRange(v.Name));
var maxChapter = chapters.Max(c => (int) Parser.Parser.MaxNumberFromRange(c.Range));
if (maxVolume == series.Metadata.TotalCount) series.Metadata.MaxCount = maxVolume;
else if (maxChapter == series.Metadata.TotalCount) series.Metadata.MaxCount = maxChapter;
}
if (!series.Metadata.PublicationStatusLocked)
{
series.Metadata.PublicationStatus = PublicationStatus.OnGoing;
if (series.Metadata.MaxCount >= series.Metadata.TotalCount && series.Metadata.TotalCount > 0)
{
series.Metadata.PublicationStatus = PublicationStatus.Completed;
} else if (series.Metadata.TotalCount > 0 && series.Metadata.MaxCount > 0)
{
series.Metadata.PublicationStatus = PublicationStatus.Ended;
}
}
if (!string.IsNullOrEmpty(firstChapter.Summary) && !series.Metadata.SummaryLocked)
{
series.Metadata.Summary = firstChapter.Summary;
}
if (!string.IsNullOrEmpty(firstChapter.Language) && !series.Metadata.LanguageLocked)
{
series.Metadata.Language = firstChapter.Language;
}
// Handle People
foreach (var chapter in chapters)
{
if (!series.Metadata.WriterLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Writer))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.CoverArtistLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.CoverArtist))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.PublisherLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Publisher))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.CharacterLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Character))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.ColoristLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Colorist))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.EditorLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Editor))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.InkerLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Inker))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.LettererLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Letterer))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.PencillerLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Penciller))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.TranslatorLocked)
{
foreach (var person in chapter.People.Where(p => p.Role == PersonRole.Translator))
{
PersonHelper.AddPersonIfNotExists(series.Metadata.People, person);
}
}
if (!series.Metadata.TagsLocked)
{
foreach (var tag in chapter.Tags)
{
TagHelper.AddTagIfNotExists(series.Metadata.Tags, tag);
}
}
if (!series.Metadata.GenresLocked)
{
foreach (var genre in chapter.Genres)
{
GenreHelper.AddGenreIfNotExists(series.Metadata.Genres, genre);
}
}
}
var genres = chapters.SelectMany(c => c.Genres).ToList();
GenreHelper.KeepOnlySameGenreBetweenLists(series.Metadata.Genres.ToList(), genres, genre =>
{
if (series.Metadata.GenresLocked) return;
series.Metadata.Genres.Remove(genre);
});
// NOTE: The issue here is that people is just from chapter, but series metadata might already have some people on it
// I might be able to filter out people that are in locked fields?
var people = chapters.SelectMany(c => c.People).ToList();
PersonHelper.KeepOnlySamePeopleBetweenLists(series.Metadata.People.ToList(),
people, person =>
{
switch (person.Role)
{
case PersonRole.Writer:
if (!series.Metadata.WriterLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.Penciller:
if (!series.Metadata.PencillerLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.Inker:
if (!series.Metadata.InkerLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.Colorist:
if (!series.Metadata.ColoristLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.Letterer:
if (!series.Metadata.LettererLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.CoverArtist:
if (!series.Metadata.CoverArtistLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.Editor:
if (!series.Metadata.EditorLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.Publisher:
if (!series.Metadata.PublisherLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.Character:
if (!series.Metadata.CharacterLocked) series.Metadata.People.Remove(person);
break;
case PersonRole.Translator:
if (!series.Metadata.TranslatorLocked) series.Metadata.People.Remove(person);
break;
default:
series.Metadata.People.Remove(person);
break;
}
});
}
private void UpdateVolumes(Series series, IList<ParserInfo> parsedInfos)
{
var startingVolumeCount = series.Volumes.Count;
// Add new volumes and update chapters per volume
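// e.g. (assumed behavior of DistinctVolumes) infos parsed as Volumes "1", "1" and "2" yield ["1", "2"]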
var distinctVolumes = parsedInfos.DistinctVolumes();
_logger.LogDebug("[ScannerService] Updating {DistinctVolumes} volumes on {SeriesName}", distinctVolumes.Count, series.Name);
foreach (var volumeNumber in distinctVolumes)
{
var volume = series.Volumes.SingleOrDefault(s => s.Name == volumeNumber);
if (volume == null)
{
volume = DbFactory.Volume(volumeNumber);
volume.SeriesId = series.Id;
series.Volumes.Add(volume);
}
volume.Name = volumeNumber;
_logger.LogDebug("[ScannerService] Parsing {SeriesName} - Volume {VolumeNumber}", series.Name, volume.Name);
var infos = parsedInfos.Where(p => p.Volumes == volumeNumber).ToArray();
UpdateChapters(series, volume, infos);
volume.Pages = volume.Chapters.Sum(c => c.Pages);
// Update all the metadata on the Chapters
foreach (var chapter in volume.Chapters)
{
var firstFile = chapter.Files.MinBy(x => x.Chapter);
if (firstFile == null || _cacheHelper.HasFileNotChangedSinceCreationOrLastScan(chapter, false, firstFile)) continue;
try
{
var firstChapterInfo = infos.SingleOrDefault(i => i.FullFilePath.Equals(firstFile.FilePath));
UpdateChapterFromComicInfo(chapter, firstChapterInfo?.ComicInfo);
}
catch (Exception ex)
{
_logger.LogError(ex, "There was some issue when updating chapter's metadata");
}
}
}
// Remove existing volumes that aren't in parsedInfos
var nonDeletedVolumes = series.Volumes.Where(v => parsedInfos.Select(p => p.Volumes).Contains(v.Name)).ToList();
if (series.Volumes.Count != nonDeletedVolumes.Count)
{
_logger.LogDebug("[ScannerService] Removed {Count} volumes from {SeriesName} where parsed infos were not mapping with volume name",
(series.Volumes.Count - nonDeletedVolumes.Count), series.Name);
var deletedVolumes = series.Volumes.Except(nonDeletedVolumes);
foreach (var volume in deletedVolumes)
{
var file = volume.Chapters.FirstOrDefault()?.Files?.FirstOrDefault()?.FilePath ?? "";
if (!string.IsNullOrEmpty(file) && _directoryService.FileSystem.File.Exists(file))
{
_logger.LogError(
"[ScannerService] Volume cleanup code was trying to remove a volume with a file still existing on disk. File: {File}",
file);
}
_logger.LogDebug("[ScannerService] Removed {SeriesName} - Volume {Volume}: {File}", series.Name, volume.Name, file);
}
series.Volumes = nonDeletedVolumes;
}
_logger.LogDebug("[ScannerService] Updated {SeriesName} volumes from {StartingVolumeCount} to {VolumeCount}",
series.Name, startingVolumeCount, series.Volumes.Count);
}
private void UpdateChapters(Series series, Volume volume, IList<ParserInfo> parsedInfos)
{
// Add new chapters
foreach (var info in parsedInfos)
{
// Specials go into their own chapters with Range being their filename and IsSpecial = True. Non-Specials with Vol and Chap as 0
// are also treated like specials for UI grouping.
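// e.g. a file parsed as a special gets its own chapter whose Range is its filename, so "Beastars SP01.cbz" shows as its own entry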
Chapter chapter;
try
{
chapter = volume.Chapters.GetChapterByRange(info);
}
catch (Exception ex)
{
_logger.LogError(ex, "{FileName} mapped as '{Series} - Vol {Volume} Ch {Chapter}' is a duplicate, skipping", info.FullFilePath, info.Series, info.Volumes, info.Chapters);
continue;
}
if (chapter == null)
{
_logger.LogDebug(
"[ScannerService] Adding new chapter, {Series} - Vol {Volume} Ch {Chapter}", info.Series, info.Volumes, info.Chapters);
chapter = DbFactory.Chapter(info);
volume.Chapters.Add(chapter);
series.LastChapterAdded = DateTime.Now;
}
else
{
chapter.UpdateFrom(info);
}
if (chapter == null) continue;
// Add files
var specialTreatment = info.IsSpecialInfo();
AddOrUpdateFileForChapter(chapter, info);
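// e.g. info.Chapters "1-2" produces Number "1"; a special instead uses its filename as its Range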
chapter.Number = Parser.Parser.MinNumberFromRange(info.Chapters) + string.Empty;
chapter.Range = specialTreatment ? info.Filename : info.Chapters;
}
// Remove chapters that aren't in parsedInfos or have no files linked
var existingChapters = volume.Chapters.ToList();
foreach (var existingChapter in existingChapters)
{
if (existingChapter.Files.Count == 0 || !parsedInfos.HasInfo(existingChapter))
{
_logger.LogDebug("[ScannerService] Removed chapter {Chapter} for Volume {VolumeNumber} on {SeriesName}", existingChapter.Range, volume.Name, parsedInfos[0].Series);
volume.Chapters.Remove(existingChapter);
}
else
{
// Ensure we remove any files that no longer exist AND order
existingChapter.Files = existingChapter.Files
.Where(f => parsedInfos.Any(p => p.FullFilePath == f.FilePath))
.OrderByNatural(f => f.FilePath).ToList();
existingChapter.Pages = existingChapter.Files.Sum(f => f.Pages);
}
}
}
private void AddOrUpdateFileForChapter(Chapter chapter, ParserInfo info)
{
chapter.Files ??= new List<MangaFile>();
var existingFile = chapter.Files.SingleOrDefault(f => f.FilePath == info.FullFilePath);
if (existingFile != null)
{
existingFile.Format = info.Format;
if (!_fileService.HasFileBeenModifiedSince(existingFile.FilePath, existingFile.LastModified) && existingFile.Pages != 0) return;
existingFile.Pages = _readingItemService.GetNumberOfPages(info.FullFilePath, info.Format);
// We skip updating DB here with last modified time so that metadata refresh can do it
}
else
{
var file = DbFactory.MangaFile(info.FullFilePath, info.Format, _readingItemService.GetNumberOfPages(info.FullFilePath, info.Format));
if (file == null) return;
chapter.Files.Add(file);
}
}
#nullable enable
private void UpdateChapterFromComicInfo(Chapter chapter, ComicInfo? info)
{
var firstFile = chapter.Files.MinBy(x => x.Chapter);
if (firstFile == null ||
_cacheHelper.HasFileNotChangedSinceCreationOrLastScan(chapter, false, firstFile)) return;
var comicInfo = info;
if (info == null)
{
comicInfo = _readingItemService.GetComicInfo(firstFile.FilePath);
}
if (comicInfo == null) return;
_logger.LogDebug("[ScannerService] Read ComicInfo for {File}", firstFile.FilePath);
chapter.AgeRating = ComicInfo.ConvertAgeRatingToEnum(comicInfo.AgeRating);
if (!string.IsNullOrEmpty(comicInfo.Title))
{
chapter.TitleName = comicInfo.Title.Trim();
}
if (!string.IsNullOrEmpty(comicInfo.Summary))
{
chapter.Summary = comicInfo.Summary;
}
if (!string.IsNullOrEmpty(comicInfo.LanguageISO))
{
chapter.Language = comicInfo.LanguageISO;
}
if (comicInfo.Count > 0)
{
chapter.TotalCount = comicInfo.Count;
}
// This needs to check against both Number and Volume to calculate Count
// Parse with InvariantCulture so "1.5" is read the same regardless of server locale
if (!string.IsNullOrEmpty(comicInfo.Number) && float.Parse(comicInfo.Number, CultureInfo.InvariantCulture) > 0)
{
chapter.Count = (int) Math.Floor(float.Parse(comicInfo.Number, CultureInfo.InvariantCulture));
}
if (!string.IsNullOrEmpty(comicInfo.Volume) && float.Parse(comicInfo.Volume, CultureInfo.InvariantCulture) > 0)
{
chapter.Count = Math.Max(chapter.Count, (int) Math.Floor(float.Parse(comicInfo.Volume, CultureInfo.InvariantCulture)));
}
void AddPerson(Person person)
{
PersonHelper.AddPersonIfNotExists(chapter.People, person);
}
void AddGenre(Genre genre)
{
GenreHelper.AddGenreIfNotExists(chapter.Genres, genre);
}
void AddTag(Tag tag, bool added)
{
TagHelper.AddTagIfNotExists(chapter.Tags, tag);
}
if (comicInfo.Year > 0)
{
var day = Math.Max(comicInfo.Day, 1);
var month = Math.Max(comicInfo.Month, 1);
// Use the DateTime constructor rather than DateTime.Parse so the result doesn't depend on server culture
chapter.ReleaseDate = new DateTime(comicInfo.Year, month, day);
}
var people = GetTagValues(comicInfo.Colorist);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Colorist);
UpdatePeople(people, PersonRole.Colorist, AddPerson);
people = GetTagValues(comicInfo.Characters);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Character);
UpdatePeople(people, PersonRole.Character, AddPerson);
people = GetTagValues(comicInfo.Translator);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Translator);
UpdatePeople(people, PersonRole.Translator, AddPerson);
people = GetTagValues(comicInfo.Writer);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Writer);
UpdatePeople(people, PersonRole.Writer, AddPerson);
people = GetTagValues(comicInfo.Editor);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Editor);
UpdatePeople(people, PersonRole.Editor, AddPerson);
people = GetTagValues(comicInfo.Inker);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Inker);
UpdatePeople(people, PersonRole.Inker, AddPerson);
people = GetTagValues(comicInfo.Letterer);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Letterer);
UpdatePeople(people, PersonRole.Letterer, AddPerson);
people = GetTagValues(comicInfo.Penciller);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Penciller);
UpdatePeople(people, PersonRole.Penciller, AddPerson);
people = GetTagValues(comicInfo.CoverArtist);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.CoverArtist);
UpdatePeople(people, PersonRole.CoverArtist, AddPerson);
people = GetTagValues(comicInfo.Publisher);
PersonHelper.RemovePeople(chapter.People, people, PersonRole.Publisher);
UpdatePeople(people, PersonRole.Publisher, AddPerson);
var genres = GetTagValues(comicInfo.Genre);
GenreHelper.KeepOnlySameGenreBetweenLists(chapter.Genres, genres.Select(g => DbFactory.Genre(g, false)).ToList());
UpdateGenre(genres, false, AddGenre);
var tags = GetTagValues(comicInfo.Tags);
TagHelper.KeepOnlySameTagBetweenLists(chapter.Tags, tags.Select(t => DbFactory.Tag(t, false)).ToList());
UpdateTag(tags, false, AddTag);
}
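/// <summary>
/// Splits a comma-separated ComicInfo field into trimmed values
/// </summary>
/// <example>"Jane Doe, John Doe" yields ["Jane Doe", "John Doe"]</example>
/// <param name="comicInfoTagSeparatedByComma"></param>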
private static IList<string> GetTagValues(string comicInfoTagSeparatedByComma)
{
if (!string.IsNullOrEmpty(comicInfoTagSeparatedByComma))
{
return comicInfoTagSeparatedByComma.Split(",").Select(s => s.Trim()).ToList();
}
return ImmutableList<string>.Empty;
}
#nullable disable
/// <summary>
/// Given the cached list of all existing people, this will check each name and role and, if a match doesn't exist, will create and
/// add an entry. For each person in names, the callback will be executed.
/// </summary>
/// <remarks>This does not remove people if an empty list is passed into names.
/// This is used to add new people to a list without worrying about duplicating rows in the DB.</remarks>
/// <param name="names"></param>
/// <param name="role"></param>
/// <param name="action"></param>
private void UpdatePeople(IEnumerable<string> names, PersonRole role, Action<Person> action)
{
var allPeopleTypeRole = _people.Where(p => p.Role == role).ToList();
foreach (var name in names)
{
var normalizedName = Parser.Parser.Normalize(name);
var person = allPeopleTypeRole.FirstOrDefault(p =>
p.NormalizedName.Equals(normalizedName));
if (person == null)
{
person = DbFactory.Person(name, role);
lock (_people)
{
_people.Add(person);
}
}
action(person);
}
}
/// <summary>
/// Given a list of genre names, this will check the cached genre list and create an entry for any name that doesn't exist.
/// For each genre in names, the callback will be executed.
/// </summary>
/// <param name="names"></param>
/// <param name="isExternal"></param>
/// <param name="action"></param>
private void UpdateGenre(IEnumerable<string> names, bool isExternal, Action<Genre> action)
{
foreach (var name in names)
{
if (string.IsNullOrEmpty(name.Trim())) continue;
var normalizedName = Parser.Parser.Normalize(name);
var genre = _genres.FirstOrDefault(p =>
p.NormalizedTitle.Equals(normalizedName) && p.ExternalTag == isExternal);
if (genre == null)
{
// Respect the isExternal flag so newly created genres match the lookup above
genre = DbFactory.Genre(name, isExternal);
lock (_genres)
{
_genres.Add(genre);
}
}
action(genre);
}
}
/// <summary>
/// Given a list of tag names, this will check the cached tag list and create an entry for any name that doesn't exist.
/// For each tag in names, the callback will be executed.
/// </summary>
/// <param name="names"></param>
/// <param name="isExternal"></param>
/// <param name="action">Callback for every item. Will give said item back and a bool if item was added</param>
private void UpdateTag(IEnumerable<string> names, bool isExternal, Action<Tag, bool> action)
{
foreach (var name in names)
{
if (string.IsNullOrEmpty(name.Trim())) continue;
var added = false;
var normalizedName = Parser.Parser.Normalize(name);
var tag = _tags.FirstOrDefault(p =>
p.NormalizedTitle.Equals(normalizedName) && p.ExternalTag == isExternal);
if (tag == null)
{
added = true;
// Respect the isExternal flag so newly created tags match the lookup above
tag = DbFactory.Tag(name, isExternal);
lock (_tags)
{
_tags.Add(tag);
}
}
action(tag, added);
}
}
}