Kavita/API/Services/Tasks/Scanner/ParseScannedFiles.cs
Joseph Milazzo 33db123e81
v0.4.8 Release (#720)
* Bump versions by dotnet-bump-version.

* Bump versions by dotnet-bump-version.

* Workflow updates (#658)

# Added
- Added: Added automatic character parsing for discord notifier. Now if the PR is over a certain character limit, it will trim and add an appropriate link to the full changelog. (Release for Stable, PR for Dev)

# Removed
- Removed: Removed Sentry map task from the workflow since Sentry is no longer used.

* Bump versions by dotnet-bump-version.

* Misc Updates (#665)

* Do not allow non-admins to change their passwords when authentication is disabled

* Clean up the login page so that input field text is black

* cleanup some resizing when typing a password and having a lot of users

* Changed the LastActive for a user to not just be login, but also when they open an already authenticated session.

* Bump versions by dotnet-bump-version.

* Logging Cleanup (#668)

* Do not allow non-admins to change their passwords when authentication is disabled

* Clean up the login page so that input field text is black

* cleanup some resizing when typing a password and having a lot of users

* Changed the LastActive for a user to not just be login, but also when they open an already authenticated session.

* Removed some verbose debug statements and promoted some debug logs to information level so they are more prevalent in the logs of default installs.

* In Progress now sends progress information on the Series

* Add ability to add cards to recently added when new series are added in backend

* Implemented the ability to click the glasses icon to turn off incognito mode from within the reader so you can start tracking progress

* Don't warn the user about authentication when they don't touch that control

* Bump versions by dotnet-bump-version.

* Changed the stats that are sent back to stat server from installed server.

* Revert "Changed the stats that are sent back to stat server from installed server."

This reverts commit 644cb6d1f6.

* Bump versions by dotnet-bump-version.

* Bump versions by dotnet-bump-version.

* Bulk Add to Collection (#674)

* Fixed the typeahead not having the same size input box as other inputs

* Implemented the ability to add multiple series to a collection through bulk operations flow. Updated book parser to handle "@import url('...');" syntax as well as @import '...';

* Implemented the ability to create a new Collection tag via bulk operations flow.

* Bump versions by dotnet-bump-version.

* Bulk Operations for In Progress and Recently Added (#677)

* Don't log a message about bad match if the file is a cover image

* Enable bulk operations for In Progress and Recently Added

* Fixed a bad logic case

* Bump versions by dotnet-bump-version.

* Regression Fix (#680)

* Ensure we mount the backups directory for Docker users

* Fixed a huge logic bug that deleted files in users libraries

* Bump versions by dotnet-bump-version.

* Change chunk size to be a fixed 50 to validate if it's causing issue with refresh. Added some try catches to see if exceptions are causing issues. (#681)

* Bump versions by dotnet-bump-version.

* Fixed a bug where searching on localized name would fail to show on the search. Fixed a bug where extra spaces would cause the search results not to show properly. (#682)

* Bump versions by dotnet-bump-version.

* When we have a special marker, ensure we fall back to folder parsing to try and group correctly to the actual series before just accepting what we parsed. (#684)

Fixed a missed parsing case where comic special parsing wasn't being called on comic libraries.

* Bump versions by dotnet-bump-version.

* iOS Admin page dropdown fix (#686)

# Fixed:
- Fixed: Fixed an issue where the dropdown on the admin server page would not work on Safari or other iOS browsers.

* When the DB fails to save, log out all the series the user should look into for constraint issues and push a message to the admins connected to webui. (#687)

* Bump versions by dotnet-bump-version.

* Bump versions by dotnet-bump-version.

* Stat upload will now schedule itself between midnight and 6am in server time for upload. (#688)

* Bump versions by dotnet-bump-version.

* EPUB CSS Parsing Issues (#690)

* WIP. Rewrote some of the Regex to better support css escaping. We now escape background-image, border-image, and list-style-image within css files.

* Added position relative to help with positioning on books that are just absolute positioned elements.

* When there is absolute positioning, like in some epub based comics, suppress the bottom action bar since it won't render in the correct location.

* Fixed tests

* Commented out tests

* Bump versions by dotnet-bump-version.

* More EPUB Scoping Fixes (#691)

* Added better handling around when importing css files that are empty. Moved comment removal on css files to before some css whitespace cleanup to get better matches.

* Some enhancements on the checks to see if we need the bottom action bar on reader. Now we don't query DOM and have something that works more reliably.

* Bump versions by dotnet-bump-version.

* Fixed an issue where docker users were not properly backing up the database. Removed an empty File for when covers/ had nothing in it. (#692)

* Bump versions by dotnet-bump-version.

* Fallback to Folder Parsing Issue (#694)

* Fixed a bug in the scanner where we fall back to parsing from folders for poorly named files. The code was exiting early if a chapter or volume could be parsed out.

* Fixed a unit test by tweaking a regex for fallback

* Bump versions by dotnet-bump-version.

* KavitaStats Cleanup (#695)

* Refactored Stats code to be much cleaner and use better naming.

* Cleaned up the actual http code to use Flurl and to return if the upload was successful or not so we can delete the file where appropriate.

* More refactoring for the stats code to clean it up and keep it consistent with our standards.

* Removed a confusing log statement

* Added support for old api key header from original stat server

* Use the correct endpoint, not the new one.

* Code smell

* Bump versions by dotnet-bump-version.

* Bulk Deletion (#697)

* Implemented bulk deletion of series

* Don't show unauthorized exception on UI, just redirect to the login page.

* Bump versions by dotnet-bump-version.

* Cover Image Picking + Forwarding Headers with EPUBs (#700)

* Ensure Kavita knows about forwarding headers (fixes issue with epub urls not going through https with reverse proxy). Fixed a case where cover image selection preferred nested folders vs files in root directory.

* Fixed broken unit test

* Added bug that I fixed to the unit tests

* Cover Image Picking + Forwarding Headers with EPUBs (#702)

* Updating GA Bump version temporarily for fix (#703)

* Bump versions by dotnet-bump-version.

* Cover Image Picking + Forwarding Headers with EPUBs (GA Fix) (#704)

* Bump versions by dotnet-bump-version.

* Vacation Fixes (#709)

* Ignore system and hidden folders when performing directory scan.

* Fixed the comic parser tests not using Comic mode for parsing.

* Accept all forwarded headers and use them.

* Ignore some changes from another branch

* Bump versions by dotnet-bump-version.

* Breaking Changes: Docker Parity (#698)

* Refactored all the config files for Kavita to be loaded from config/. This will allow docker to just mount one folder and for Update functionality to be trivial.

* Cleaned up documentation around new update method.

* Updated docker files to support config directory

* Removed entrypoint, no longer needed

* Update appsettings to point to config directory for logs

* Updated message for docker users that are upgrading

* Ensure that docker users that have not updated their mount points from upgrade cannot start the server

* Code smells

* More cleanup

* Added entrypoint to fix bind mount issues

* Updated README with new folder structure

* Fixed build system for new setup

* Updated string path if user is docker

* Updated the migration flow for docker to work properly and Fixed LogFile configuration updating.

* Migrating docker images is now working 100%

* Fixed config from bad code

* Code cleanup

Co-authored-by: Chris Plaatjes <kizaing@gmail.com>

* Bump versions by dotnet-bump-version.

* Feature/docker parity (#714)

* Refactored all the config files for Kavita to be loaded from config/. This will allow docker to just mount one folder and for Update functionality to be trivial.

* Cleaned up documentation around new update method.

* Updated docker files to support config directory

* Removed entrypoint, no longer needed

* Update appsettings to point to config directory for logs

* Updated message for docker users that are upgrading

* Ensure that docker users that have not updated their mount points from upgrade cannot start the server

* Code smells

* More cleanup

* Added entrypoint to fix bind mount issues

* Updated README with new folder structure

* Fixed build system for new setup

* Updated string path if user is docker

* Updated the migration flow for docker to work properly and Fixed LogFile configuration updating.

* Migrating docker images is now working 100%

* Fixed config from bad code

* Code cleanup

* Fixed monorepo-build.sh

Co-authored-by: Chris Plaatjes <kizaing@gmail.com>

* Breaking Changes: Docker Parity (#715)

* Fixed a bug in the copy directory to directory in the migration

* Somehow GetFiles lost static modifier.

* Bump versions by dotnet-bump-version.

* Build issue (#716)

* Fixed a bug in the copy directory to directory in the migration

* Somehow GetFiles lost static modifier.

* Please work

* Bump versions by dotnet-bump-version.

* Bump versions by dotnet-bump-version.

* Shakeout Changes (#717)

* Make the appsettings public on Configuration and change how we detect when to migrate for non-docker users.

* Fixed up non-docker copy command and removed duplicate check on source directory for a copy.

* Don't delete files unless we know we are successful

* Bump versions by dotnet-bump-version.

* Fixed a migration issue on docker happening too many times or throwing exception when source wasn't there. (#719)

* Bump versions by dotnet-bump-version.

* Version bump for release (#718)

* Bump versions by dotnet-bump-version.

Co-authored-by: Robbie Davis <robbie@therobbiedavis.com>
Co-authored-by: YEGCSharpDev <89283498+YEGCSharpDev@users.noreply.github.com>
Co-authored-by: Chris Plaatjes <kizaing@gmail.com>
2021-11-04 05:29:02 -07:00

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using API.Entities;
using API.Entities.Enums;
using API.Interfaces.Services;
using API.Parser;
using Microsoft.Extensions.Logging;

namespace API.Services.Tasks.Scanner
{
    public class ParsedSeries
    {
        public string Name { get; init; }
        public string NormalizedName { get; init; }
        public MangaFormat Format { get; init; }
    }

    public class ParseScannedFiles
    {
        private readonly ConcurrentDictionary<ParsedSeries, List<ParserInfo>> _scannedSeries;
        private readonly IBookService _bookService;
        private readonly ILogger _logger;

        /// <summary>
        /// An instance of a pipeline for processing files and returning a map of Series -> ParserInfos.
        /// Each instance keeps its own scan state, so results from separate scans never cross over.
        /// </summary>
        /// <param name="bookService">Service used to parse EPUB files</param>
        /// <param name="logger">Logger for scan warnings, errors and timing</param>
        public ParseScannedFiles(IBookService bookService, ILogger logger)
        {
            _bookService = bookService;
            _logger = logger;
            _scannedSeries = new ConcurrentDictionary<ParsedSeries, List<ParserInfo>>();
        }

        /// <summary>
        /// Gets the list of ParserInfos for a given Series. If the series is not present in the map, returns an empty list.
        /// </summary>
        /// <param name="parsedSeries">Map of parsed series to their ParserInfos, as returned by a scan</param>
        /// <param name="series">Series to look up; matched on format and normalized original name</param>
        /// <returns>The matching ParserInfos, or an empty list when there is no match</returns>
        public static IList<ParserInfo> GetInfosByName(Dictionary<ParsedSeries, List<ParserInfo>> parsedSeries, Series series)
        {
            var existingKey = parsedSeries.Keys.FirstOrDefault(ps =>
                ps.Format == series.Format && ps.NormalizedName.Equals(Parser.Parser.Normalize(series.OriginalName)));

            return existingKey != null ? parsedSeries[existingKey] : new List<ParserInfo>();
        }

        /// <summary>
        /// Processes files found during a library scan.
        /// Populates a collection of <see cref="ParserInfo"/> for DB updates later.
        /// </summary>
        /// <param name="path">Path of a file</param>
        /// <param name="rootPath">Library folder the file was found under; passed through to the parser</param>
        /// <param name="type">Library type to determine parsing to perform</param>
        private void ProcessFile(string path, string rootPath, LibraryType type)
        {
            ParserInfo info;

            if (Parser.Parser.IsEpub(path))
            {
                info = _bookService.ParseInfo(path);
            }
            else
            {
                info = Parser.Parser.Parse(path, rootPath, type);
            }

            // If we couldn't match, log. But don't log if the file parses as a cover image
            if (info == null)
            {
                if (!(Parser.Parser.IsImage(path) && Parser.Parser.IsCoverImage(path)))
                {
                    _logger.LogWarning("[Scanner] Could not parse series from {Path}", path);
                }
                return;
            }

            // When an EPUB's parsed series name itself contains a volume marker, re-parse from the
            // file path and merge the book metadata into that result.
            if (Parser.Parser.IsEpub(path) && Parser.Parser.ParseVolume(info.Series) != Parser.Parser.DefaultVolume)
            {
                info = Parser.Parser.Parse(path, rootPath, type);
                var info2 = _bookService.ParseInfo(path);
                info.Merge(info2);
            }

            TrackSeries(info);
        }

        /// <summary>
        /// Adds the info to the list tracked for its series in _scannedSeries, creating a new entry when the series has not been seen yet.
        /// The series name is first checked against already-tracked series (normalized name and format), see <see cref="MergeName"/>
        /// </summary>
        /// <param name="info">Parsed file info to track; ignored when its series name is empty</param>
        private void TrackSeries(ParserInfo info)
        {
            if (info.Series == string.Empty) return;

            // Check if normalized info.Series already exists and if so, update info to use that name instead
            info.Series = MergeName(info);

            var existingKey = _scannedSeries.Keys.FirstOrDefault(ps =>
                ps.Format == info.Format && ps.NormalizedName == Parser.Parser.Normalize(info.Series));
            existingKey ??= new ParsedSeries()
            {
                Format = info.Format,
                Name = info.Series,
                NormalizedName = Parser.Parser.Normalize(info.Series)
            };

            _scannedSeries.AddOrUpdate(existingKey, new List<ParserInfo>() {info}, (_, oldValue) =>
            {
                oldValue ??= new List<ParserInfo>();
                if (!oldValue.Contains(info))
                {
                    oldValue.Add(info);
                }

                return oldValue;
            });
        }

        /// <summary>
        /// Normalizes the series name of the passed ParserInfo and checks it against all series found so far. If a series with the
        /// same normalized name and format already exists, its name is returned so the info merges into it. This is important as some manga may differ slightly in punctuation or capitalization.
        /// </summary>
        /// <param name="info">ParserInfo whose series name should be resolved</param>
        /// <returns>The name of the existing series when one matches, otherwise the series name from the info</returns>
        public string MergeName(ParserInfo info)
        {
            var normalizedSeries = Parser.Parser.Normalize(info.Series);
            var existingName =
                _scannedSeries.SingleOrDefault(p => Parser.Parser.Normalize(p.Key.NormalizedName) == normalizedSeries && p.Key.Format == info.Format)
                    .Key;
            if (existingName != null && !string.IsNullOrEmpty(existingName.Name))
            {
                return existingName.Name;
            }

            return info.Series;
        }

        /// <summary>
        /// Scans the given folders, parses every supported file found, and groups the results by series.
        /// </summary>
        /// <param name="libraryType">Type of library. Used for selecting the correct file extensions to search for and parsing files</param>
        /// <param name="folders">The folders to scan. By default, this should be library.Folders, however it can be overridden to restrict folders</param>
        /// <param name="totalFiles">Total files scanned</param>
        /// <param name="scanElapsedTime">Time it took to scan and parse files, in milliseconds</param>
        /// <returns>Map of each series that had at least one parsed info to its ParserInfos</returns>
        public Dictionary<ParsedSeries, List<ParserInfo>> ScanLibrariesForSeries(LibraryType libraryType, IEnumerable<string> folders, out int totalFiles,
            out long scanElapsedTime)
        {
            var sw = Stopwatch.StartNew();
            totalFiles = 0;
            var searchPattern = GetLibrarySearchPattern();
            foreach (var folderPath in folders)
            {
                try
                {
                    totalFiles += DirectoryService.TraverseTreeParallelForEach(folderPath, (f) =>
                    {
                        try
                        {
                            ProcessFile(f, folderPath, libraryType);
                        }
                        catch (FileNotFoundException exception)
                        {
                            _logger.LogError(exception, "The file {Filename} could not be found", f);
                        }
                    }, searchPattern, _logger);
                }
                catch (ArgumentException ex)
                {
                    _logger.LogError(ex, "The directory '{FolderPath}' does not exist", folderPath);
                }
            }

            scanElapsedTime = sw.ElapsedMilliseconds;
            _logger.LogInformation("Scanned {TotalFiles} files in {ElapsedScanTime} milliseconds", totalFiles,
                scanElapsedTime);

            return SeriesWithInfos();
        }

        private static string GetLibrarySearchPattern()
        {
            return Parser.Parser.SupportedExtensions;
        }

        /// <summary>
        /// Returns any series where there were parsed infos
        /// </summary>
        /// <returns></returns>
        private Dictionary<ParsedSeries, List<ParserInfo>> SeriesWithInfos()
        {
            var filtered = _scannedSeries.Where(kvp => kvp.Value.Count > 0);
            var series = filtered.ToDictionary(v => v.Key, v => v.Value);
            return series;
        }
    }
}