* Staging the code for the new scan loop.
* Implemented a basic idea of changes on drives triggering the scan loop. Issues: 1. Scan by folder does not work, 2. The queuing system is very hacky and needs a separate thread, 3. Performance degradation could be very real.
* Started writing unit tests for the new loop code
* Implemented a basic method to scan a folder path with ignore support (not implemented, code in place)
* Added some code to the parser to build out the idea of processing series in batches based on some top-level folder.
* Scan Series now uses the new code (folder-based parsing) and handles the LocalizedSeries issue.
* Got library scan working with the new folder-based scan loop. Updated code to set FolderPath (for improved scan times and partial scan support).
* Wrote some notes on the updated library scan loop.
* Removed migration for merge
* Reapplied the SeriesFolder migration after merge
* Refactored a check that used multiple db calls into one.
* Made lots of progress on ignore support, but some confusion on the underlying library. Ticket created. On hold till then.
* Updated Scan Library and Scan Series to exit early if there are no changes on the underlying folders that need to be scanned.
* Implemented the ability to have .kavitaignore files within your directories; Kavita will parse them and ignore files and directories based on the rules within them.
* Fixed an issue where nested ignore files wouldn't stack with higher-level ignores
* Wrote out some basic code that showcases how we can scan a series or library based on file events on the underlying system. Very buggy, needs lots of edge-case testing, logging, and duplication checking.
* Things are kinda working. I'm getting lost in my own code and complexity. I'm not sure it's worth it.
* Refactored ScanFiles out to Directory Service.
* Refactored more code out to keep the code clean.
* More unit tests
* Refactored the signature of ParsedSeries to use IList. Started writing unit tests and reworked UpdateLibrary to work how it used to with the new scan loop code (note: using async update library/series does not work).
* Fixed the bug where processSeriesInfos was being invoked twice per series and made the code work very similarly to the old code (except loose-leaf files don't work), but with folder-based scanning.
* Prep for unit tests (updating broken ones with new implementations)
* Just some notes. Not sure I want to finish this work.
* Refactored the LibraryWatcher with some comments and state variables.
* Undid the migrations in case I don't move forward with this branch
* Started to clean the code and prepare for finishing this work.
* Fixed a bad merge
* Updated signatures to clean up the code and commit to the new strategy for scanning.
* Swapped out the code with async processing of series on a small library
* The new scan loop is working in both sync and async methods. The code is slow and not optimized. This represents a good point to start profiling and applying optimizations.
* Refactored UpdateSeries out of Scanner and into a dedicated file.
* Refactored how ProcessTasks are awaited to allow more async
* Fixed an issue where the side nav item wouldn't show the correct highlight and migrated to OnPush
* Moved where we start the stopwatch to encapsulate the full scan
* Cleaned up SignalR events to report correctly (still needs a redesign)
* Remove the "remove" code until I figure it out
* Put in extremely expensive series deletion code for library scan.
* Have Genre and Tag update the DB immediately to avoid duplicate issues
* Taking a break
* Moving to a lock with People was successful. Need to apply it to the others.
* Refactored code for series-level and Tag and Genre updates with the new locking strategy.
* New scan loop works. Next up: optimization
* Swapped out the Kavita logo with an SVG for faster load
* Refactored metadata updates to occur when the series are being updated.
* Code cleanup
* Added a new type of generic message (Info) to inform the user.
* Code cleanup
* Implemented an optimization which prevents any I/O (other than an attribute lookup) for Library/Series Scan; a sketch of the idea follows these notes. This can bring a recently updated library on network storage (650 series) to fully process in 2 seconds. Fixed a bug where File Analysis was running every time for each non-epub file.
* Fixed ARM x64 builds not being able to view PDF cover images due to a bad update in DocNet.
* Some code cleanup
* Added experimental SignalR update code to have a more natural refresh of the library-detail page
* Hooked in the ability to send new-series events to the UI
* Moved all scan (file scan only) tasks into the Scan Queue. Scheduled ScanLibraries tasks now check whether an existing scan task is running and reschedule (3 hours for library scans, 10 minutes for series scans).
* Implemented the Info event in the events widget and added a clear-all button to dismiss all infos and errors. Added --event-widget-info-bg-color
* Removed --drawer-background-color since it's not used
* When a new series is added, inject it directly into the view.
* Some debug code cleanup
* Fixed up the unit tests
* Ensure all config directories exist on startup
* Disabled Library Watching (that will go in the next build)
* Ensure series updates are admin only
* Lots of code changes; scan series kinda works, specials are splitting, optimizations are failing. Demotivated on this work again.
* Removed the SeriesFolder migration
* Added the SeriesFolder migration
* Added a new pipe for dates so we can provide some nicer defaults. Added the folder path to the series detail.
* The scan optimizations now work for NTFS systems.
* Removed a TODO
* Migrated all the times to use DateTime.Now and not UTC.
* Refactored some repo calls to use the includes flag pattern
* Implemented a check for the library scan optimization that validates whether the library was updated (type change, library rename, folder change, or series deleted) and, if so, bypasses the optimization.
* Added another optimization which uses just the folder's last-write-time attribute if the drive is not NTFS.
* Fixed a unit test
* Some code cleanup
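The "attribute lookup only" optimization above reduces an up-to-date scan to timestamp comparisons. Below is a minimal sketch of the idea (hypothetical names, not Kavita's actual API): compare each folder's last write time against the time it was last scanned and skip the scan when nothing is newer. Per the NTFS note above, a parent folder's LastWriteTime on Windows does not change when files change inside a subfolder, so subfolders are checked too.

using System;
using System.IO;
using System.Linq;

// Hypothetical sketch of the scan-skip check; reads only filesystem metadata.
public static class ScanSkipCheck
{
    // True when neither the root folder nor any subfolder has been written to
    // since the last recorded scan time, meaning the scan can exit early.
    public static bool CanSkipScan(string rootFolder, DateTime lastScanned)
    {
        var folders = Directory
            .EnumerateDirectories(rootFolder, "*", SearchOption.AllDirectories)
            .Append(rootFolder);

        // Directory.GetLastWriteTime is an attribute read; no file contents are opened.
        return folders.All(f => Directory.GetLastWriteTime(f) <= lastScanned);
    }
}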
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using API.Data;
using API.Data.Repositories;
using API.Entities;
using API.Extensions;
using API.Helpers;
using API.Parser;
using API.Services.Tasks.Metadata;
using API.Services.Tasks.Scanner;
using API.SignalR;
using Hangfire;
using Microsoft.Extensions.Logging;

namespace API.Services.Tasks;

public interface IScannerService
{
    /// <summary>
    /// Given a library id, scans folders for said library. Parses files and generates DB updates.
    /// </summary>
    /// <param name="libraryId">Library to scan against</param>
    [Queue(TaskScheduler.ScanQueue)]
    [DisableConcurrentExecution(60 * 60 * 60)]
    [AutomaticRetry(Attempts = 0, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
    Task ScanLibrary(int libraryId);

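    /// <summary>
    /// Scans every library in the database, one after another, on the scan queue.
    /// </summary>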
    [Queue(TaskScheduler.ScanQueue)]
    [DisableConcurrentExecution(60 * 60 * 60)]
    [AutomaticRetry(Attempts = 0, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
    Task ScanLibraries();

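    /// <summary>
    /// Scans a single series: locates its root folder, re-parses the files under it, and applies DB updates.
    /// </summary>
    /// <param name="seriesId">Series to scan</param>
    /// <param name="bypassFolderOptimizationChecks">When true, skip the last-write-time checks that let a scan exit early</param>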
    [Queue(TaskScheduler.ScanQueue)]
    [DisableConcurrentExecution(60 * 60 * 60)]
    [AutomaticRetry(Attempts = 3, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
    Task ScanSeries(int seriesId, bool bypassFolderOptimizationChecks = true);

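    /// <summary>
    /// Given a folder that changed on disk (e.g. from a filesystem watcher), resolves it to a series or a library and enqueues the matching scan.
    /// </summary>
    /// <param name="folder">Folder that changed on disk</param>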
    [Queue(TaskScheduler.ScanQueue)]
    [DisableConcurrentExecution(60 * 60 * 60)]
    [AutomaticRetry(Attempts = 3, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
    Task ScanFolder(string folder);
}

public enum ScanCancelReason
{
    /// <summary>
    /// Don't cancel, everything is good
    /// </summary>
    NoCancel = 0,
    /// <summary>
    /// A folder is completely empty or missing
    /// </summary>
    FolderMount = 1,
    /// <summary>
    /// There has been no change to the filesystem since last scan
    /// </summary>
    NoChange = 2,
}

/**
 * Responsible for Scanning the disk and importing/updating/deleting files -> DB entities.
 */
public class ScannerService : IScannerService
{
    private readonly IUnitOfWork _unitOfWork;
    private readonly ILogger<ScannerService> _logger;
    private readonly IMetadataService _metadataService;
    private readonly ICacheService _cacheService;
    private readonly IEventHub _eventHub;
    private readonly IDirectoryService _directoryService;
    private readonly IReadingItemService _readingItemService;
    private readonly IProcessSeries _processSeries;

    public ScannerService(IUnitOfWork unitOfWork, ILogger<ScannerService> logger,
        IMetadataService metadataService, ICacheService cacheService, IEventHub eventHub,
        IDirectoryService directoryService, IReadingItemService readingItemService,
        IProcessSeries processSeries)
    {
        _unitOfWork = unitOfWork;
        _logger = logger;
        _metadataService = metadataService;
        _cacheService = cacheService;
        _eventHub = eventHub;
        _directoryService = directoryService;
        _readingItemService = readingItemService;
        _processSeries = processSeries;
    }

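    /// <summary>
    /// Resolves a changed folder to either an existing series (enqueues ScanSeries) or, failing that, the library that contains it (enqueues ScanLibrary).
    /// </summary>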
    [Queue(TaskScheduler.ScanQueue)]
    public async Task ScanFolder(string folder)
    {
        // NOTE: I might want to move a lot of this code to the LibraryWatcher or something and just pack libraryId and seriesId
        // Validate if we are scanning a new series (that belongs to a library) or an existing series
        var seriesId = await _unitOfWork.SeriesRepository.GetSeriesIdByFolder(folder);
        if (seriesId > 0)
        {
            BackgroundJob.Enqueue(() => ScanSeries(seriesId, true));
            return;
        }

        var parentDirectory = _directoryService.GetParentDirectoryName(folder);
        if (string.IsNullOrEmpty(parentDirectory)) return; // This should never happen as it's calculated before enqueuing

        var libraries = (await _unitOfWork.LibraryRepository.GetLibraryDtosAsync()).ToList();
        var libraryFolders = libraries.SelectMany(l => l.Folders);
        var libraryFolder = libraryFolders.Select(Parser.Parser.NormalizePath).SingleOrDefault(f => f.Contains(parentDirectory));

        if (string.IsNullOrEmpty(libraryFolder)) return;

        var library = libraries.FirstOrDefault(l => l.Folders.Select(Parser.Parser.NormalizePath).Contains(libraryFolder));
        if (library != null)
        {
            BackgroundJob.Enqueue(() => ScanLibrary(library.Id));
        }
    }

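    /// <summary>
    /// Scans a single series. Exits early when the series' folders are unmounted or unchanged (see <see cref="ShouldScanSeries"/>), otherwise re-parses the series folder and updates the DB.
    /// </summary>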
    [Queue(TaskScheduler.ScanQueue)]
    public async Task ScanSeries(int seriesId, bool bypassFolderOptimizationChecks = true)
    {
        var sw = Stopwatch.StartNew();
        var files = await _unitOfWork.SeriesRepository.GetFilesForSeries(seriesId);
        var series = await _unitOfWork.SeriesRepository.GetFullSeriesForSeriesIdAsync(seriesId);
        var chapterIds = await _unitOfWork.SeriesRepository.GetChapterIdsForSeriesAsync(new[] {seriesId});
        var library = await _unitOfWork.LibraryRepository.GetLibraryForIdAsync(series.LibraryId, LibraryIncludes.Folders);
        var libraryPaths = library.Folders.Select(f => f.Path).ToList();
        if (await ShouldScanSeries(seriesId, library, libraryPaths, series, bypassFolderOptimizationChecks) != ScanCancelReason.NoCancel) return;

        var parsedSeries = new Dictionary<ParsedSeries, IList<ParserInfo>>();
        var seenSeries = new List<ParsedSeries>();
        var processTasks = new List<Task>();

        var folderPath = series.FolderPath;
        if (string.IsNullOrEmpty(folderPath) || !_directoryService.Exists(folderPath))
        {
            // We don't care if it's multiple due to new scan loop enforcing all in one root directory
            var seriesDirs = _directoryService.FindHighestDirectoriesFromFiles(libraryPaths, files.Select(f => f.FilePath).ToList());
            if (seriesDirs.Keys.Count == 0)
            {
                _logger.LogCritical("Scan Series has files spread outside a main series folder. Defaulting to library folder (this is expensive)");
                await _eventHub.SendMessageAsync(MessageFactory.Info, MessageFactory.InfoEvent($"{series.Name} is not organized well and scan series will be expensive!", "Scan Series has files spread outside a main series folder. Defaulting to library folder (this is expensive)"));
                seriesDirs = _directoryService.FindHighestDirectoriesFromFiles(libraryPaths, files.Select(f => f.FilePath).ToList());
            }

            folderPath = seriesDirs.Keys.FirstOrDefault();
        }

        if (string.IsNullOrEmpty(folderPath))
        {
            _logger.LogCritical("Scan Series could not find a single, valid folder root for files");
            await _eventHub.SendMessageAsync(MessageFactory.Error, MessageFactory.ErrorEvent($"{series.Name} scan aborted", "Scan Series could not find a single, valid folder root for files"));
            return;
        }

        await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.LibraryScanProgressEvent(library.Name, ProgressEventType.Started, series.Name));

        await _processSeries.Prime();
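        // Callback invoked by ScanFiles for each parsed series: records what was seen,
        // and for series whose folders actually changed, kicks off async processing.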
        void TrackFiles(Tuple<bool, IList<ParserInfo>> parsedInfo)
        {
            var skippedScan = parsedInfo.Item1;
            var parsedFiles = parsedInfo.Item2;
            if (parsedFiles.Count == 0) return;

            var foundParsedSeries = new ParsedSeries()
            {
                Name = parsedFiles.First().Series,
                NormalizedName = Parser.Parser.Normalize(parsedFiles.First().Series),
                Format = parsedFiles.First().Format
            };

            if (skippedScan)
            {
                seenSeries.AddRange(parsedFiles.Select(pf => new ParsedSeries()
                {
                    Name = pf.Series,
                    NormalizedName = Parser.Parser.Normalize(pf.Series),
                    Format = pf.Format
                }));
                return;
            }

            seenSeries.Add(foundParsedSeries);
            processTasks.Add(_processSeries.ProcessSeriesAsync(parsedFiles, library));
            parsedSeries.Add(foundParsedSeries, parsedFiles);
        }

        _logger.LogInformation("Beginning file scan on {SeriesName}", series.Name);
        var scanElapsedTime = await ScanFiles(library, new []{folderPath}, false, TrackFiles, bypassFolderOptimizationChecks);
        _logger.LogInformation("ScanFiles for {Series} took {Time}", series.Name, scanElapsedTime);

        await Task.WhenAll(processTasks);

        // We need to handle if parsedSeries is empty but seenSeries has our series
        if (seenSeries.Any(s => s.NormalizedName.Equals(series.NormalizedName)) && parsedSeries.Keys.Count == 0)
        {
            // Nothing has changed
            _logger.LogInformation("[ScannerService] {SeriesName} scan has no work to do. All folders have not been changed since last scan", series.Name);
            await _eventHub.SendMessageAsync(MessageFactory.Info,
                MessageFactory.InfoEvent($"{series.Name} scan has no work to do",
                    "All folders have not been changed since last scan. Scan will be aborted."));

            _processSeries.EnqueuePostSeriesProcessTasks(series.LibraryId, seriesId, false);
            return;
        }

        // Remove any parsedSeries keys that don't belong to our series. This can occur when users store 2 series in the same folder
        RemoveParsedInfosNotForSeries(parsedSeries, series);

        // If nothing was found, first validate any of the files still exist. If they don't then we have a deletion and can skip the rest of the logic flow
        if (parsedSeries.Count == 0)
        {
            var anyFilesExist =
                (await _unitOfWork.SeriesRepository.GetFilesForSeries(series.Id)).Any(m => File.Exists(m.FilePath));

            if (!anyFilesExist)
            {
                try
                {
                    _unitOfWork.SeriesRepository.Remove(series);
                    await CommitAndSend(1, sw, scanElapsedTime, series);
                }
                catch (Exception ex)
                {
                    _logger.LogCritical(ex, "There was an error during ScanSeries to delete the series as no files could be found. Aborting scan");
                    await _unitOfWork.RollbackAsync();
                    return;
                }
            }
            else
            {
                // I think we should just fail and tell user to fix their setup. This is extremely expensive for an edge case
                _logger.LogCritical("We weren't able to find any files in the series scan, but there should be. Please correct your naming convention or put Series in a dedicated folder. Aborting scan");
                await _eventHub.SendMessageAsync(MessageFactory.Error,
                    MessageFactory.ErrorEvent($"Error scanning {series.Name}", "We weren't able to find any files in the series scan, but there should be. Please correct your naming convention or put Series in a dedicated folder. Aborting scan"));
                await _unitOfWork.RollbackAsync();
                return;
            }
            // At this point, parsedSeries will have at least one key and we can perform the update. If it still doesn't, just return and don't do anything
            if (parsedSeries.Count == 0) return;
        }

        try
        {
            await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.LibraryScanProgressEvent(library.Name, ProgressEventType.Started, series.Name));
            var parsedInfos = ParseScannedFiles.GetInfosByName(parsedSeries, series);
            await _processSeries.ProcessSeriesAsync(parsedInfos, library);
            await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.LibraryScanProgressEvent(library.Name, ProgressEventType.Ended, series.Name));

            await CommitAndSend(1, sw, scanElapsedTime, series);
        }
        catch (Exception ex)
        {
            _logger.LogCritical(ex, "There was an error during ScanSeries to update the series");
            await _unitOfWork.RollbackAsync();
        }
        // Tell UI that this series is done
        await _eventHub.SendMessageAsync(MessageFactory.ScanSeries,
            MessageFactory.ScanSeriesEvent(library.Id, seriesId, series.Name));

        await _metadataService.RemoveAbandonedMetadataKeys();
        BackgroundJob.Enqueue(() => _cacheService.CleanupChapters(chapterIds));
        BackgroundJob.Enqueue(() => _directoryService.ClearDirectory(_directoryService.TempDirectory));
    }

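    /// <summary>
    /// Decides whether a series scan can proceed: verifies the series and library folders are mounted, and (unless bypassed) compares folder last-write times against the series' LastFolderScanned.
    /// </summary>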
    private async Task<ScanCancelReason> ShouldScanSeries(int seriesId, Library library, IList<string> libraryPaths, Series series, bool bypassFolderChecks = false)
    {
        var seriesFolderPaths = (await _unitOfWork.SeriesRepository.GetFilesForSeries(seriesId))
            .Select(f => _directoryService.FileSystem.FileInfo.FromFileName(f.FilePath).Directory.FullName)
            .Distinct()
            .ToList();

        if (!await CheckMounts(library.Name, seriesFolderPaths))
        {
            _logger.LogCritical(
                "Some of the root folders for library are not accessible. Please check that drives are connected and rescan. Scan will be aborted");
            return ScanCancelReason.FolderMount;
        }

        if (!await CheckMounts(library.Name, libraryPaths))
        {
            _logger.LogCritical(
                "Some of the root folders for library are not accessible. Please check that drives are connected and rescan. Scan will be aborted");
            return ScanCancelReason.FolderMount;
        }

        // If all series Folder paths haven't been modified since last scan, abort
        // NOTE: On windows, the parent folder will not update LastWriteTime if a subfolder was updated with files. Need to do a bit of light I/O.
        if (!bypassFolderChecks)
        {
            var allFolders = seriesFolderPaths.SelectMany(path => _directoryService.GetDirectories(path)).ToList();
            allFolders.AddRange(seriesFolderPaths);

            if (allFolders.All(folder => _directoryService.GetLastWriteTime(folder) <= series.LastFolderScanned))
            {
                _logger.LogInformation(
                    "[ScannerService] {SeriesName} scan has no work to do. All folders have not been changed since last scan",
                    series.Name);
                await _eventHub.SendMessageAsync(MessageFactory.Info,
                    MessageFactory.InfoEvent($"{series.Name} scan has no work to do", "All folders have not been changed since last scan. Scan will be aborted."));
                return ScanCancelReason.NoChange;
            }
        }

        return ScanCancelReason.NoCancel;
    }

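    /// <summary>
    /// Removes any parsedSeries keys that don't match the series being scanned. Guards against users storing two series in the same folder.
    /// </summary>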
    private static void RemoveParsedInfosNotForSeries(Dictionary<ParsedSeries, IList<ParserInfo>> parsedSeries, Series series)
    {
        var keys = parsedSeries.Keys;
        foreach (var key in keys.Where(key => !SeriesHelper.FindSeries(series, key))) // series.Format != key.Format ||
        {
            parsedSeries.Remove(key);
        }
    }

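    /// <summary>
    /// Commits any pending unit-of-work changes and logs the total processing time for the series.
    /// </summary>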
    private async Task CommitAndSend(int seriesCount, Stopwatch sw, long scanElapsedTime, Series series)
    {
        if (_unitOfWork.HasChanges())
        {
            await _unitOfWork.CommitAsync();
            _logger.LogInformation(
                "Processed files and {SeriesCount} series in {ElapsedScanTime} milliseconds for {SeriesName}",
                seriesCount, sw.ElapsedMilliseconds + scanElapsedTime, series.Name);
        }
    }

    /// <summary>
    /// Ensure that all library folders are mounted. In the case that any are empty or non-existent, emit an event to the UI via EventHub and return false
    /// </summary>
    /// <param name="libraryName">Library name, used for logging and UI events</param>
    /// <param name="folders">Folder paths to validate</param>
    /// <returns>True when all folders are mounted and non-empty</returns>
    private async Task<bool> CheckMounts(string libraryName, IList<string> folders)
    {
        // Check if any of the folder roots are not available (ie disconnected from network, etc) and fail if any of them are
        if (folders.Any(f => !_directoryService.IsDriveMounted(f)))
        {
            _logger.LogCritical("Some of the root folders for library ({LibraryName}) are not accessible. Please check that drives are connected and rescan. Scan will be aborted", libraryName);

            await _eventHub.SendMessageAsync(MessageFactory.Error,
                MessageFactory.ErrorEvent("Some of the root folders for library are not accessible. Please check that drives are connected and rescan. Scan will be aborted",
                    string.Join(", ", folders.Where(f => !_directoryService.IsDriveMounted(f)))));

            return false;
        }

        // For Docker instances check if any of the folder roots are not available (ie disconnected volumes, etc) and fail if any of them are
        if (folders.Any(f => _directoryService.IsDirectoryEmpty(f)))
        {
            // That way logging and UI informing is all in one place with full context
            _logger.LogError("Some of the root folders for the library are empty. " +
                             "Either your mount has been disconnected or you are trying to delete all series in the library. " +
                             "Scan has been aborted. " +
                             "Check that your mount is connected or change the library's root folder and rescan");

            await _eventHub.SendMessageAsync(MessageFactory.Error, MessageFactory.ErrorEvent($"Some of the root folders for the library, {libraryName}, are empty.",
                "Either your mount has been disconnected or you are trying to delete all series in the library. " +
                "Scan has been aborted. " +
                "Check that your mount is connected or change the library's root folder and rescan"));

            return false;
        }

        return true;
    }

    [Queue(TaskScheduler.ScanQueue)]
    [DisableConcurrentExecution(60 * 60 * 60)]
    [AutomaticRetry(Attempts = 0, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
    public async Task ScanLibraries()
    {
        _logger.LogInformation("Starting Scan of All Libraries");
        foreach (var lib in await _unitOfWork.LibraryRepository.GetLibrariesAsync())
        {
            await ScanLibrary(lib.Id);
        }
        _logger.LogInformation("Scan of All Libraries Finished");
    }

    /// <summary>
    /// Scans a library for file changes.
    /// Will kick off a scheduled background task to refresh metadata,
    /// i.e. all entities will be rechecked for new cover images and ComicInfo.xml changes
    /// </summary>
    /// <param name="libraryId">Library to scan</param>
    [Queue(TaskScheduler.ScanQueue)]
    [DisableConcurrentExecution(60 * 60 * 60)]
    [AutomaticRetry(Attempts = 0, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
    public async Task ScanLibrary(int libraryId)
    {
        var sw = Stopwatch.StartNew();
        var library = await _unitOfWork.LibraryRepository.GetLibraryForIdAsync(libraryId, LibraryIncludes.Folders);
        var libraryFolderPaths = library.Folders.Select(fp => fp.Path).ToList();
        if (!await CheckMounts(library.Name, libraryFolderPaths)) return;

        // If all library folder paths haven't been modified since last scan, abort
        // Unless the user did something on the library (delete series) and thus we can bypass this check
        var wasLibraryUpdatedSinceLastScan = (library.LastModified.Truncate(TimeSpan.TicksPerMinute) >
                                              library.LastScanned.Truncate(TimeSpan.TicksPerMinute))
                                             && library.LastScanned != DateTime.MinValue;
        if (!wasLibraryUpdatedSinceLastScan)
        {
            var haveFoldersChangedSinceLastScan = library.Folders
                .All(f => _directoryService.GetLastWriteTime(f.Path).Truncate(TimeSpan.TicksPerMinute) > f.LastScanned.Truncate(TimeSpan.TicksPerMinute));

            // If nothing changed && library folders have all been scanned at least once
            if (!haveFoldersChangedSinceLastScan && library.Folders.All(f => f.LastScanned > DateTime.MinValue))
            {
                _logger.LogInformation("[ScannerService] {LibraryName} scan has no work to do. All folders have not been changed since last scan", library.Name);
                await _eventHub.SendMessageAsync(MessageFactory.Info,
                    MessageFactory.InfoEvent($"{library.Name} scan has no work to do",
                        "All folders have not been changed since last scan. Scan will be aborted."));
                return;
            }
        }

        // Validations are done, now we can start actual scan
        _logger.LogInformation("[ScannerService] Beginning file scan on {LibraryName}", library.Name);

        // This doesn't work for something like M:/Manga/ and a series has library folder as root
        var shouldUseLibraryScan = !(await _unitOfWork.LibraryRepository.DoAnySeriesFoldersMatch(libraryFolderPaths));
        if (!shouldUseLibraryScan)
        {
            _logger.LogInformation("Library {LibraryName} consists of one or more Series folders, using series scan", library.Name);
        }

        var totalFiles = 0;
        var seenSeries = new List<ParsedSeries>();

        await _processSeries.Prime();
        var processTasks = new List<Task>();
        void TrackFiles(Tuple<bool, IList<ParserInfo>> parsedInfo)
        {
            var skippedScan = parsedInfo.Item1;
            var parsedFiles = parsedInfo.Item2;
            if (parsedFiles.Count == 0) return;

            var foundParsedSeries = new ParsedSeries()
            {
                Name = parsedFiles.First().Series,
                NormalizedName = Parser.Parser.Normalize(parsedFiles.First().Series),
                Format = parsedFiles.First().Format
            };

            if (skippedScan)
            {
                seenSeries.AddRange(parsedFiles.Select(pf => new ParsedSeries()
                {
                    Name = pf.Series,
                    NormalizedName = Parser.Parser.Normalize(pf.Series),
                    Format = pf.Format
                }));
                return;
            }

            totalFiles += parsedFiles.Count;

            seenSeries.Add(foundParsedSeries);
            processTasks.Add(_processSeries.ProcessSeriesAsync(parsedFiles, library));
        }

        var scanElapsedTime = await ScanFiles(library, libraryFolderPaths, shouldUseLibraryScan, TrackFiles);

        await Task.WhenAll(processTasks);

        //await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.LibraryScanProgressEvent(library.Name, ProgressEventType.Ended, string.Empty));
        await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.FileScanProgressEvent(string.Empty, library.Name, ProgressEventType.Ended));

        _logger.LogInformation("[ScannerService] Finished file scan in {ScanAndUpdateTime}. Updating database", scanElapsedTime);

        var time = DateTime.Now;
        foreach (var folderPath in library.Folders)
        {
            folderPath.LastScanned = time;
        }

        library.LastScanned = time;

        // Could I delete anything in a Library's Series where the LastScan date is before scanStart?
        // NOTE: This implementation is expensive
        await _unitOfWork.SeriesRepository.RemoveSeriesNotInList(seenSeries, library.Id);

        _unitOfWork.LibraryRepository.Update(library);
        if (await _unitOfWork.CommitAsync())
        {
            _logger.LogInformation(
                "[ScannerService] Finished scan of {TotalFiles} files and {ParsedSeriesCount} series in {ElapsedScanTime} milliseconds for {LibraryName}",
                totalFiles, seenSeries.Count, sw.ElapsedMilliseconds, library.Name);
        }
        else
        {
            _logger.LogCritical(
                "[ScannerService] There was a critical error that resulted in a failed scan. Please check logs and rescan");
        }

        await _eventHub.SendMessageAsync(MessageFactory.NotificationProgress, MessageFactory.LibraryScanProgressEvent(library.Name, ProgressEventType.Ended, string.Empty));
        await _metadataService.RemoveAbandonedMetadataKeys();

        BackgroundJob.Enqueue(() => _directoryService.ClearDirectory(_directoryService.TempDirectory));
    }

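    /// <summary>
    /// Runs ParseScannedFiles over the given directories, invoking processSeriesInfos for each discovered series, and returns the elapsed scan time in milliseconds.
    /// </summary>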
    private async Task<long> ScanFiles(Library library, IEnumerable<string> dirs,
        bool isLibraryScan, Action<Tuple<bool, IList<ParserInfo>>> processSeriesInfos = null, bool forceChecks = false)
    {
        var scanner = new ParseScannedFiles(_logger, _directoryService, _readingItemService, _eventHub);
        var scanWatch = Stopwatch.StartNew();

        await scanner.ScanLibrariesForSeries(library.Type, dirs, library.Name,
            isLibraryScan, await _unitOfWork.SeriesRepository.GetFolderPathMap(library.Id), processSeriesInfos, forceChecks);

        var scanElapsedTime = scanWatch.ElapsedMilliseconds;

        return scanElapsedTime;
    }

    /// <summary>
    /// Remove any user progress rows that no longer exist since scan library ran and deleted series/volumes/chapters
    /// </summary>
    private async Task CleanupAbandonedChapters()
    {
        var cleanedUp = await _unitOfWork.AppUserProgressRepository.CleanupAbandonedChapters();
        _logger.LogInformation("Removed {Count} abandoned progress rows", cleanedUp);
    }

    /// <summary>
    /// Cleans up any abandoned rows due to removals from Scan loop
    /// </summary>
    private async Task CleanupDbEntities()
    {
        await CleanupAbandonedChapters();
        var cleanedUp = await _unitOfWork.CollectionTagRepository.RemoveTagsWithoutSeries();
        _logger.LogInformation("Removed {Count} abandoned collection tags", cleanedUp);
    }

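    /// <summary>
    /// Returns the series from existingSeries that no longer have a matching ParserInfo (name and format) on disk.
    /// </summary>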
    public static IEnumerable<Series> FindSeriesNotOnDisk(IEnumerable<Series> existingSeries, Dictionary<ParsedSeries, IList<ParserInfo>> parsedSeries)
    {
        return existingSeries.Where(es => !ParserInfoHelpers.SeriesHasMatchingParserInfoFormat(es, parsedSeries));
    }
}