PDF Support + MORE!!!! (#416)

# Added
- Added support for PDFs within Kavita. PDFs open in the Manga reader and are read as rendered page images. PDFs are heavier than archives, so they may take longer to open for reading. (Fixes #187)

# Changed
- Changed: Major change in how Kavita libraries work. Libraries now allow mixed media types, meaning you can have raw images, archives, epubs, and PDFs all within your Manga library. If the same Series exists across two different media types, the entries are kept separate and an icon is shown to help you identify each type. The correct reader opens regardless of which library you are on. Note: Nightly users need to delete their Raw Images libraries before updating.

# Fixed
- Fixed: Checking whether a file was modified since the last scan always returned true, so we performed more I/O than needed (see the snippet after this list) (Fixes #415)
- Fixed: Not enough spacing on the Manga reader's top menu bar
- Fixed: The dark mode control in user preferences always showed as enabled, even when dark mode was not in use

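The corrected check, excerpted from ScannerService in the diff below, re-reads page counts only when a file has actually changed (or was never counted):

```csharp
// Recalculate pages only when the file changed or we never counted them.
if (existingFile.HasFileBeenModified() || existingFile.Pages == 0)
{
    existingFile.Pages = (existingFile.Format == MangaFormat.Epub || existingFile.Format == MangaFormat.Pdf)
        ? _bookService.GetNumberOfPages(info.FullFilePath)
        : _archiveService.GetNumberOfPagesFromArchive(info.FullFilePath);
}
```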
# Dev stuff
- For image extraction, if a chapter has only one image we extract just that file; otherwise we extract only the image files from the directory
- Refactored all the Parser code out of the ScannerService into a self-contained class. A new instance should be created for each scan, allowing multiple tasks to run without any chance of crossover.
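
A sketch of the intended call pattern (taken from the ScannerService changes in the diff below): each scan constructs its own ParseScannedFiles, so concurrent scans never share parser state.

```csharp
// One ParseScannedFiles instance per scan; no shared mutable state between tasks.
var scanner = new ParseScannedFiles(_bookService, _logger);
var parsedSeries = scanner.ScanLibrariesForSeries(library.Type, library.Folders.Select(fp => fp.Path),
    out var totalFiles, out var scanElapsedTime);
```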



* Fixed indentation for .cs files

* Fixed an issue where the logic for determining whether a file had been modified was broken and always reported modified, meaning we were doing more file I/O than needed.

* Implemented the ability to have PDF books. No reader functionality yet.

* Implemented a basic form of scanning for PDF files. Reworked image-based libraries to remove the need for a separate special library type; raw images now work within the Manga/Comic library.

* Removed the old library types.

* Removed some leftover code around the old raw-image library types

* Fully implemented PDF support in Kavita using docnet. Removed the other PDF libraries we tried that did not work. Rendering a PDF page and saving it to disk takes about 200 ms, so PDFs are much slower than reading archives.
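
For reference, the docnet flow this PR uses (the calls match BookService in the diff below; the 1080x1920 page dimensions are the values chosen in this PR):

```csharp
// Render a PDF page to raw pixels with docnet (Docnet.Core).
using var docReader = DocLib.Instance.GetDocReader(filePath, new PageDimensions(1080, 1920));
using var pageReader = docReader.GetPageReader(0);  // first page
var rawBytes = pageReader.GetImage();               // raw pixel buffer for the page
var width = pageReader.GetPageWidth();
var height = pageReader.GetPageHeight();
using var bmp = new Bitmap(width, height, PixelFormat.Format32bppArgb);
AddBytesToBitmap(bmp, rawBytes);                    // helper defined in BookService below
```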

* Refactored Libraries so that they can contain any supported file extension, and the UI decides which reader to use.

* Reworked the Series Parsing code.

We now use a separate instance for each task call, so there should be no crossover if two tasks are running at the same time.

Second, we now store Format with the Series, so we can have duplicate Series with the same name but different file types underneath.
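
Concretely, parsed results are now keyed on normalized name plus format (from ParseScannedFiles in the diff below), so the same title in two formats yields two Series:

```csharp
// A series key is now (normalized name, format), not just the name.
var existingKey = _scannedSeries.Keys.FirstOrDefault(ps =>
    ps.Format == info.Format && ps.NormalizedName == Parser.Parser.Normalize(info.Series));
```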

* Fixed PDF transparency issues

- Used the code from this docnet issue to fix rendering when a PDF page doesn't have a background: https://github.com/GowenGit/docnet/issues/8#issuecomment-538985672

- This also fixes the same issue for cover images
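
The workaround, as it appears in BookService below, composites the rendered page onto a white-filled bitmap before encoding, so transparent regions don't come out black:

```csharp
// Fill white first, then draw the rendered page over it.
var g = Graphics.FromImage(doc);
g.FillRegion(Brushes.White, new Region(new Rectangle(0, 0, width, height)));
g.DrawImage(bmp, new Point(0, 0));
g.Save();
```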

* Fixed an issue where, if a raw image was in a directory with non-image files, those non-image files would also end up in the cache when trying to open the file.

* For image extraction, if there is only one image, just copy that file to the cache instead of every image in the directory, as sketched below.
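
In CacheService this becomes a narrow search pattern when the chapter has exactly one file, falling back to all image extensions otherwise:

```csharp
// One file: copy only that extension. Many files: copy every image in the directory.
var pattern = (files.Count == 1)
    ? (@"\" + Path.GetExtension(files[0].FilePath))
    : Parser.Parser.ImageFileExtensions;
_directoryService.CopyDirectoryToDirectory(Path.GetDirectoryName(files[0].FilePath), extractPath, pattern);
```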

* Added some spacing to the top menu bar

* Added an icon to the card to showcase the type of file

* Added a tag badge to the series detail page

* Fixed a bug in user preferences where the dark mode control would default to true, even if you weren't using dark mode

* Fixed up some tests

* Cleaned up some code smells

Co-authored-by: Robbie Davis <robbie@therobbiedavis.com>
Joseph Milazzo, 2021-07-22 (commit b0df67cdda)
43 changed files with 1725 additions and 577 deletions

API/Services/BookService.cs

@@ -1,7 +1,10 @@
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
@@ -9,11 +12,14 @@ using System.Web;
using API.Entities.Enums;
using API.Interfaces.Services;
using API.Parser;
using Docnet.Core;
using Docnet.Core.Models;
using ExCSS;
using HtmlAgilityPack;
using Microsoft.Extensions.Logging;
using NetVips;
using VersOne.Epub;
using Image = NetVips.Image;
using Point = System.Drawing.Point;
namespace API.Services
{
@@ -25,6 +31,7 @@ namespace API.Services
public BookService(ILogger<BookService> logger)
{
_logger = logger;
}
private static bool HasClickableHrefPart(HtmlNode anchor)
@@ -157,7 +164,8 @@ namespace API.Services
public string GetSummaryInfo(string filePath)
{
if (!IsValidFile(filePath) || Parser.Parser.IsPdf(filePath)) return string.Empty;
try
{
@@ -182,18 +190,24 @@
if (Parser.Parser.IsBook(filePath)) return true;
_logger.LogWarning("[BookService] Book {EpubFile} is not a valid EPUB/PDF", filePath);
return false;
}
public int GetNumberOfPages(string filePath)
{
if (!IsValidFile(filePath)) return 0;
try
{
if (Parser.Parser.IsPdf(filePath))
{
using var docReader = DocLib.Instance.GetDocReader(filePath, new PageDimensions(1080, 1920));
return docReader.GetPageCount();
}
using var epubBook = EpubReader.OpenBook(filePath);
return epubBook.Content.Html.Count;
}
catch (Exception ex)
{
@@ -231,14 +245,16 @@
/// <summary>
/// Parses out Title from book. Chapters and Volumes will always be "0". If there is any exception reading book (malformed books)
/// then null is returned. This expects only an epub file
/// </summary>
/// <param name="filePath"></param>
/// <returns></returns>
public ParserInfo ParseInfo(string filePath)
{
if (!Parser.Parser.IsEpub(filePath)) return null;
try
{
using var epubBook = EpubReader.OpenBook(filePath);
// If the epub has the following tags, we can group the books as Volumes
@@ -301,9 +317,9 @@
}
return new ParserInfo()
{
Chapters = "0",
Edition = "",
Format = MangaFormat.Book,
Chapters = Parser.Parser.DefaultChapter,
Edition = string.Empty,
Format = MangaFormat.Epub,
Filename = Path.GetFileName(filePath),
Title = specialName,
FullFilePath = filePath,
@@ -320,23 +336,63 @@
return new ParserInfo()
{
Chapters = "0",
Edition = "",
Format = MangaFormat.Book,
Chapters = Parser.Parser.DefaultChapter,
Edition = string.Empty,
Format = MangaFormat.Epub,
Filename = Path.GetFileName(filePath),
Title = epubBook.Title,
FullFilePath = filePath,
IsSpecial = false,
Series = epubBook.Title,
Volumes = "0"
Volumes = Parser.Parser.DefaultVolume
};
}
catch (Exception ex)
{
_logger.LogWarning(ex, "[BookService] There was an exception when opening epub book: {FileName}", filePath);
}
return null;
}
private static void AddBytesToBitmap(Bitmap bmp, byte[] rawBytes)
{
var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
var bmpData = bmp.LockBits(rect, ImageLockMode.WriteOnly, bmp.PixelFormat);
var pNative = bmpData.Scan0;
Marshal.Copy(rawBytes, 0, pNative, rawBytes.Length);
bmp.UnlockBits(bmpData);
}
public void ExtractPdfImages(string fileFilePath, string targetDirectory)
{
DirectoryService.ExistOrCreate(targetDirectory);
using var docReader = DocLib.Instance.GetDocReader(fileFilePath, new PageDimensions(1080, 1920));
var pages = docReader.GetPageCount();
for (var pageNumber = 0; pageNumber < pages; pageNumber++)
{
using var pageReader = docReader.GetPageReader(pageNumber);
var rawBytes = pageReader.GetImage();
var width = pageReader.GetPageWidth();
var height = pageReader.GetPageHeight();
using var doc = new Bitmap(width, height, PixelFormat.Format32bppArgb);
using var bmp = new Bitmap(width, height, PixelFormat.Format32bppArgb);
AddBytesToBitmap(bmp, rawBytes);
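// Presumably working around a docnet edge artifact: overwrite the right-most column with its neighbor.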
for (int y = 0; y < bmp.Height; y++)
{
bmp.SetPixel(bmp.Width - 1, y, bmp.GetPixel(bmp.Width - 2, y));
}
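// Transparency workaround (see GowenGit/docnet#8): composite the page onto a white background before encoding.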
var g = Graphics.FromImage(doc);
g.FillRegion(Brushes.White, new Region(new Rectangle(0, 0, width, height)));
g.DrawImage(bmp, new Point(0, 0));
g.Save();
using var stream = new MemoryStream();
doc.Save(stream, ImageFormat.Jpeg);
File.WriteAllBytes(Path.Combine(targetDirectory, "Page-" + pageNumber + ".png"), stream.ToArray());
}
}
@@ -344,6 +400,11 @@
{
if (!IsValidFile(fileFilePath)) return Array.Empty<byte>();
if (Parser.Parser.IsPdf(fileFilePath))
{
return GetPdfCoverImage(fileFilePath, createThumbnail);
}
using var epubBook = EpubReader.OpenBook(fileFilePath);
@@ -374,6 +435,50 @@
return Array.Empty<byte>();
}
private byte[] GetPdfCoverImage(string fileFilePath, bool createThumbnail)
{
try
{
using var docReader = DocLib.Instance.GetDocReader(fileFilePath, new PageDimensions(1080, 1920));
if (docReader.GetPageCount() == 0) return Array.Empty<byte>();
using var pageReader = docReader.GetPageReader(0);
var rawBytes = pageReader.GetImage();
var width = pageReader.GetPageWidth();
var height = pageReader.GetPageHeight();
using var doc = new Bitmap(width, height, PixelFormat.Format32bppArgb);
using var bmp = new Bitmap(width, height, PixelFormat.Format32bppArgb);
AddBytesToBitmap(bmp, rawBytes);
for (int y = 0; y < bmp.Height; y++)
{
bmp.SetPixel(bmp.Width - 1, y, bmp.GetPixel(bmp.Width - 2, y));
}
var g = Graphics.FromImage(doc);
g.FillRegion(Brushes.White, new Region(new Rectangle(0, 0, width, height)));
g.DrawImage(bmp, new Point(0, 0));
g.Save();
using var stream = new MemoryStream();
doc.Save(stream, ImageFormat.Jpeg);
stream.Seek(0, SeekOrigin.Begin);
if (createThumbnail)
{
using var thumbnail = Image.ThumbnailStream(stream, MetadataService.ThumbnailWidth);
return thumbnail.WriteToBuffer(".png");
}
return stream.ToArray();
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"[BookService] There was a critical error and prevented thumbnail generation on {BookFile}. Defaulting to no cover image",
fileFilePath);
}
return Array.Empty<byte>();
}
private static string RemoveWhiteSpaceFromStylesheets(string body)
{
body = Regex.Replace(body, @"[a-zA-Z]+#", "#");

API/Services/CacheService.cs

@@ -18,16 +18,18 @@ namespace API.Services
private readonly IUnitOfWork _unitOfWork;
private readonly IArchiveService _archiveService;
private readonly IDirectoryService _directoryService;
private readonly IBookService _bookService;
private readonly NumericComparer _numericComparer;
public static readonly string CacheDirectory = Path.GetFullPath(Path.Join(Directory.GetCurrentDirectory(), "cache/"));
public CacheService(ILogger<CacheService> logger, IUnitOfWork unitOfWork, IArchiveService archiveService,
IDirectoryService directoryService, IBookService bookService)
{
_logger = logger;
_unitOfWork = unitOfWork;
_archiveService = archiveService;
_directoryService = directoryService;
_bookService = bookService;
_numericComparer = new NumericComparer();
}
@@ -58,7 +60,8 @@
if (files.Count > 0 && files[0].Format == MangaFormat.Image)
{
DirectoryService.ExistOrCreate(extractPath);
var pattern = (files.Count == 1) ? (@"\" + Path.GetExtension(files[0].FilePath)) : Parser.Parser.ImageFileExtensions;
_directoryService.CopyDirectoryToDirectory(Path.GetDirectoryName(files[0].FilePath), extractPath, pattern);
extractDi.Flatten();
return chapter;
}
@@ -73,6 +76,9 @@
if (file.Format == MangaFormat.Archive)
{
_archiveService.ExtractArchive(file.FilePath, Path.Join(extractPath, extraPath));
} else if (file.Format == MangaFormat.Pdf)
{
_bookService.ExtractPdfImages(file.FilePath, Path.Join(extractPath, extraPath));
}
}

API/Services/DirectoryService.cs

@@ -113,7 +113,15 @@
}
}
/// <summary>
/// Copies a Directory with all files and subdirectories to a target location
/// </summary>
/// <param name="sourceDirName"></param>
/// <param name="destDirName"></param>
/// <param name="searchPattern">Defaults to *, meaning all files</param>
/// <returns></returns>
/// <exception cref="DirectoryNotFoundException"></exception>
public bool CopyDirectoryToDirectory(string sourceDirName, string destDirName, string searchPattern = "*")
{
if (string.IsNullOrEmpty(sourceDirName)) return false;
@@ -136,7 +144,7 @@
Directory.CreateDirectory(destDirName);
// Get the files in the directory and copy them to the new location.
var files = GetFilesWithExtension(dir.FullName, searchPattern).Select(n => new FileInfo(n));
foreach (var file in files)
{
var tempPath = Path.Combine(destDirName, file.Name);

API/Services/MetadataService.cs

@@ -16,190 +16,194 @@
{
public class MetadataService : IMetadataService
{
private readonly IUnitOfWork _unitOfWork;
private readonly ILogger<MetadataService> _logger;
private readonly IArchiveService _archiveService;
private readonly IBookService _bookService;
private readonly IDirectoryService _directoryService;
private readonly IImageService _imageService;
private readonly ChapterSortComparer _chapterSortComparer = new ChapterSortComparer();
public static readonly int ThumbnailWidth = 320; // 153w x 230h

public MetadataService(IUnitOfWork unitOfWork, ILogger<MetadataService> logger,
IArchiveService archiveService, IBookService bookService, IDirectoryService directoryService, IImageService imageService)
{
_unitOfWork = unitOfWork;
_logger = logger;
_archiveService = archiveService;
_bookService = bookService;
_directoryService = directoryService;
_imageService = imageService;
}

private static bool ShouldFindCoverImage(byte[] coverImage, bool forceUpdate = false)
{
return forceUpdate || coverImage == null || !coverImage.Any();
}

private byte[] GetCoverImage(MangaFile file, bool createThumbnail = true)
{
switch (file.Format)
{
case MangaFormat.Pdf:
case MangaFormat.Epub:
return _bookService.GetCoverImage(file.FilePath, createThumbnail);
case MangaFormat.Image:
var coverImage = _imageService.GetCoverFile(file);
return _imageService.GetCoverImage(coverImage, createThumbnail);
case MangaFormat.Archive:
return _archiveService.GetCoverImage(file.FilePath, createThumbnail);
default:
return Array.Empty<byte>();
}
}

public void UpdateMetadata(Chapter chapter, bool forceUpdate)
{
var firstFile = chapter.Files.OrderBy(x => x.Chapter).FirstOrDefault();
if (ShouldFindCoverImage(chapter.CoverImage, forceUpdate) && firstFile != null && !new FileInfo(firstFile.FilePath).IsLastWriteLessThan(firstFile.LastModified))
{
chapter.Files ??= new List<MangaFile>();
chapter.CoverImage = GetCoverImage(firstFile);
}
}

public void UpdateMetadata(Volume volume, bool forceUpdate)
{
if (volume != null && ShouldFindCoverImage(volume.CoverImage, forceUpdate))
{
volume.Chapters ??= new List<Chapter>();
var firstChapter = volume.Chapters.OrderBy(x => double.Parse(x.Number), _chapterSortComparer).FirstOrDefault();

// Skip calculating Cover Image (I/O) if the chapter already has it set
if (firstChapter == null || ShouldFindCoverImage(firstChapter.CoverImage))
{
var firstFile = firstChapter?.Files.OrderBy(x => x.Chapter).FirstOrDefault();
if (firstFile != null && !new FileInfo(firstFile.FilePath).IsLastWriteLessThan(firstFile.LastModified))
{
volume.CoverImage = GetCoverImage(firstFile);
}
}
else
{
volume.CoverImage = firstChapter.CoverImage;
}
}
}

public void UpdateMetadata(Series series, bool forceUpdate)
{
if (series == null) return;
if (ShouldFindCoverImage(series.CoverImage, forceUpdate))
{
series.Volumes ??= new List<Volume>();
var firstCover = series.Volumes.GetCoverImage(series.Library.Type);
byte[] coverImage = null;
if (firstCover == null && series.Volumes.Any())
{
// If firstCover is null and one volume, the whole series is Chapters under Vol 0.
if (series.Volumes.Count == 1)
{
coverImage = series.Volumes[0].Chapters.OrderBy(c => double.Parse(c.Number), _chapterSortComparer)
.FirstOrDefault(c => !c.IsSpecial)?.CoverImage;
}

if (coverImage == null)
{
coverImage = series.Volumes[0].Chapters.OrderBy(c => double.Parse(c.Number), _chapterSortComparer)
.FirstOrDefault()?.CoverImage;
}
}
series.CoverImage = firstCover?.CoverImage ?? coverImage;
}
UpdateSeriesSummary(series, forceUpdate);
}

private void UpdateSeriesSummary(Series series, bool forceUpdate)
{
if (!string.IsNullOrEmpty(series.Summary) && !forceUpdate) return;

var isBook = series.Library.Type == LibraryType.Book;
var firstVolume = series.Volumes.FirstWithChapters(isBook);
var firstChapter = firstVolume?.Chapters.GetFirstChapterWithFiles();

var firstFile = firstChapter?.Files.FirstOrDefault();
if (firstFile == null || (!forceUpdate && !firstFile.HasFileBeenModified())) return;
if (Parser.Parser.IsPdf(firstFile.FilePath)) return;

var summary = Parser.Parser.IsEpub(firstFile.FilePath) ? _bookService.GetSummaryInfo(firstFile.FilePath) : _archiveService.GetSummaryInfo(firstFile.FilePath);
if (string.IsNullOrEmpty(series.Summary))
{
series.Summary = summary;
}

firstFile.LastModified = DateTime.Now;
}

public void RefreshMetadata(int libraryId, bool forceUpdate = false)
{
var sw = Stopwatch.StartNew();
var library = Task.Run(() => _unitOfWork.LibraryRepository.GetFullLibraryForIdAsync(libraryId)).GetAwaiter().GetResult();

// TODO: See if we can break this up into multiple threads that process 20 series at a time then save so we can reduce amount of memory used
_logger.LogInformation("Beginning metadata refresh of {LibraryName}", library.Name);
foreach (var series in library.Series)
{
foreach (var volume in series.Volumes)
{
foreach (var chapter in volume.Chapters)
{
UpdateMetadata(chapter, forceUpdate);
}

UpdateMetadata(volume, forceUpdate);
}

UpdateMetadata(series, forceUpdate);
_unitOfWork.SeriesRepository.Update(series);
}

if (_unitOfWork.HasChanges() && Task.Run(() => _unitOfWork.CommitAsync()).Result)
{
_logger.LogInformation("Updated metadata for {LibraryName} in {ElapsedMilliseconds} milliseconds", library.Name, sw.ElapsedMilliseconds);
}
}

public void RefreshMetadataForSeries(int libraryId, int seriesId)
{
var sw = Stopwatch.StartNew();
var library = Task.Run(() => _unitOfWork.LibraryRepository.GetFullLibraryForIdAsync(libraryId)).GetAwaiter().GetResult();

var series = library.Series.SingleOrDefault(s => s.Id == seriesId);
if (series == null)
{
_logger.LogError("Series {SeriesId} was not found on Library {LibraryName}", seriesId, libraryId);
return;
}

_logger.LogInformation("Beginning metadata refresh of {SeriesName}", series.Name);
foreach (var volume in series.Volumes)
{
foreach (var chapter in volume.Chapters)
{
UpdateMetadata(chapter, true);
}

UpdateMetadata(volume, true);
}

UpdateMetadata(series, true);
_unitOfWork.SeriesRepository.Update(series);

if (_unitOfWork.HasChanges() && Task.Run(() => _unitOfWork.CommitAsync()).Result)
{
_logger.LogInformation("Updated metadata for {SeriesName} in {ElapsedMilliseconds} milliseconds", series.Name, sw.ElapsedMilliseconds);
}
}
}
}

API/Services/Tasks/Scanner/ParseScannedFiles.cs

@@ -0,0 +1,198 @@
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using API.Entities.Enums;
using API.Interfaces.Services;
using API.Parser;
using Microsoft.Extensions.Logging;
namespace API.Services.Tasks.Scanner
{
public class ParsedSeries
{
public string Name { get; init; }
public string NormalizedName { get; init; }
public MangaFormat Format { get; init; }
}
public class ParseScannedFiles
{
private readonly ConcurrentDictionary<ParsedSeries, List<ParserInfo>> _scannedSeries;
private readonly IBookService _bookService;
private readonly ILogger _logger;
/// <summary>
/// An instance of a pipeline for processing files and returning a Map of Series -> ParserInfos.
/// Each instance is separate from other threads, allowing for no cross over.
/// </summary>
/// <param name="bookService"></param>
/// <param name="logger"></param>
public ParseScannedFiles(IBookService bookService, ILogger logger)
{
_bookService = bookService;
_logger = logger;
_scannedSeries = new ConcurrentDictionary<ParsedSeries, List<ParserInfo>>();
}
/// <summary>
/// Processes files found during a library scan.
/// Populates a collection of <see cref="ParserInfo"/> for DB updates later.
/// </summary>
/// <param name="path">Path of a file</param>
/// <param name="rootPath"></param>
/// <param name="type">Library type to determine parsing to perform</param>
private void ProcessFile(string path, string rootPath, LibraryType type)
{
ParserInfo info;
if (Parser.Parser.IsEpub(path))
{
info = _bookService.ParseInfo(path);
}
else
{
info = Parser.Parser.Parse(path, rootPath, type);
}
if (info == null)
{
_logger.LogWarning("[Scanner] Could not parse series from {Path}", path);
return;
}
if (Parser.Parser.IsEpub(path) && Parser.Parser.ParseVolume(info.Series) != Parser.Parser.DefaultVolume)
{
info = Parser.Parser.Parse(path, rootPath, type);
var info2 = _bookService.ParseInfo(path);
info.Merge(info2);
}
TrackSeries(info);
}
/// <summary>
/// Attempts to either add a new instance of a show mapping to the _scannedSeries bag or adds to an existing.
/// This will check if the name matches an existing series name (multiple fields) <see cref="MergeName"/>
/// </summary>
/// <param name="info"></param>
private void TrackSeries(ParserInfo info)
{
if (info.Series == string.Empty) return;
// Check if normalized info.Series already exists and if so, update info to use that name instead
info.Series = MergeName(info);
var existingKey = _scannedSeries.Keys.FirstOrDefault(ps =>
ps.Format == info.Format && ps.NormalizedName == Parser.Parser.Normalize(info.Series));
existingKey ??= new ParsedSeries()
{
Format = info.Format,
Name = info.Series,
NormalizedName = Parser.Parser.Normalize(info.Series)
};
_scannedSeries.AddOrUpdate(existingKey, new List<ParserInfo>() {info}, (_, oldValue) =>
{
oldValue ??= new List<ParserInfo>();
if (!oldValue.Contains(info))
{
oldValue.Add(info);
}
return oldValue;
});
}
/// <summary>
/// Using a normalized name from the passed ParserInfo, this checks against all found series so far and if an existing one exists with
/// same normalized name, it merges into the existing one. This is important as some manga may have a slight difference with punctuation or capitalization.
/// </summary>
/// <param name="info"></param>
/// <returns></returns>
public string MergeName(ParserInfo info)
{
var normalizedSeries = Parser.Parser.Normalize(info.Series);
_logger.LogDebug("Checking if we can merge {NormalizedSeries}", normalizedSeries);
var existingName =
_scannedSeries.SingleOrDefault(p => Parser.Parser.Normalize(p.Key.NormalizedName) == normalizedSeries && p.Key.Format == info.Format)
.Key;
if (existingName != null && !string.IsNullOrEmpty(existingName.Name))
{
_logger.LogDebug("Found duplicate parsed infos, merged {Original} into {Merged}", info.Series, existingName.Name);
return existingName.Name;
}
return info.Series;
}
/// <summary>
///
/// </summary>
/// <param name="libraryType">Type of library. Used for selecting the correct file extensions to search for and parsing files</param>
/// <param name="folders">The folders to scan. By default, this should be library.Folders, however it can be overwritten to restrict folders</param>
/// <param name="totalFiles">Total files scanned</param>
/// <param name="scanElapsedTime">Time it took to scan and parse files</param>
/// <returns></returns>
public Dictionary<ParsedSeries, List<ParserInfo>> ScanLibrariesForSeries(LibraryType libraryType, IEnumerable<string> folders, out int totalFiles,
out long scanElapsedTime)
{
var sw = Stopwatch.StartNew();
totalFiles = 0;
var searchPattern = GetLibrarySearchPattern(libraryType);
foreach (var folderPath in folders)
{
try
{
totalFiles += DirectoryService.TraverseTreeParallelForEach(folderPath, (f) =>
{
try
{
ProcessFile(f, folderPath, libraryType);
}
catch (FileNotFoundException exception)
{
_logger.LogError(exception, "The file {Filename} could not be found", f);
}
}, searchPattern, _logger);
}
catch (ArgumentException ex)
{
_logger.LogError(ex, "The directory '{FolderPath}' does not exist", folderPath);
}
}
scanElapsedTime = sw.ElapsedMilliseconds;
_logger.LogInformation("Scanned {TotalFiles} files in {ElapsedScanTime} milliseconds", totalFiles,
scanElapsedTime);
return SeriesWithInfos();
}
/// <summary>
/// Given the Library Type, returns the regex pattern that restricts which files types will be found during a file scan.
/// </summary>
/// <param name="libraryType"></param>
/// <returns></returns>
private static string GetLibrarySearchPattern(LibraryType libraryType)
{
return Parser.Parser.SupportedExtensions;
}
/// <summary>
/// Returns any series where there were parsed infos
/// </summary>
/// <returns></returns>
private Dictionary<ParsedSeries, List<ParserInfo>> SeriesWithInfos()
{
var filtered = _scannedSeries.Where(kvp => kvp.Value.Count > 0);
var series = filtered.ToDictionary(v => v.Key, v => v.Value);
return series;
}
}
}

API/Services/Tasks/ScannerService.cs

@@ -14,6 +14,7 @@ using API.Extensions;
using API.Interfaces;
using API.Interfaces.Services;
using API.Parser;
using API.Services.Tasks.Scanner;
using Hangfire;
using Microsoft.Extensions.Logging;
@@ -26,7 +27,6 @@
private readonly IArchiveService _archiveService;
private readonly IMetadataService _metadataService;
private readonly IBookService _bookService;
private readonly NaturalSortComparer _naturalSort;
public ScannerService(IUnitOfWork unitOfWork, ILogger<ScannerService> logger, IArchiveService archiveService,
@@ -50,9 +50,8 @@
var dirs = FindHighestDirectoriesFromFiles(library, files);
_logger.LogInformation("Beginning file scan on {SeriesName}", series.Name);
var scanner = new ParseScannedFiles(_bookService, _logger);
var parsedSeries = scanner.ScanLibrariesForSeries(library.Type, dirs.Keys, out var totalFiles, out var scanElapsedTime);
// If a root level folder scan occurs, then multiple series gets passed in and thus we get a unique constraint issue
// Hence we clear out anything but what we selected for
@@ -137,7 +136,6 @@
[AutomaticRetry(Attempts = 0, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
public void ScanLibrary(int libraryId, bool forceUpdate)
{
Library library;
try
{
@@ -152,7 +150,9 @@
}
_logger.LogInformation("Beginning file scan on {LibraryName}", library.Name);
var scanner = new ParseScannedFiles(_bookService, _logger);
var series = scanner.ScanLibrariesForSeries(library.Type, library.Folders.Select(fp => fp.Path), out var totalFiles, out var scanElapsedTime);
foreach (var folderPath in library.Folders)
{
folderPath.LastScanned = DateTime.Now;
@@ -188,75 +188,7 @@
_logger.LogInformation("Removed {Count} abandoned progress rows", cleanedUp);
}
private void UpdateLibrary(Library library, Dictionary<ParsedSeries, List<ParserInfo>> parsedSeries)
{
if (parsedSeries == null) throw new ArgumentNullException(nameof(parsedSeries));
@@ -276,16 +208,18 @@
// Add new series that have parsedInfos
foreach (var (key, infos) in parsedSeries)
{
// Key is normalized already
Series existingSeries;
try
{
existingSeries = library.Series.SingleOrDefault(s =>
(s.NormalizedName == key.NormalizedName || Parser.Parser.Normalize(s.OriginalName) == key.NormalizedName)
&& (s.Format == key.Format || s.Format == MangaFormat.Unknown));
}
catch (Exception e)
{
_logger.LogCritical(e, "There are multiple series that map to normalized key {Key}. You can manually delete the entity via UI and rescan to fix it", key);
var duplicateSeries = library.Series.Where(s => s.NormalizedName == key.NormalizedName || Parser.Parser.Normalize(s.OriginalName) == key.NormalizedName).ToList();
foreach (var series in duplicateSeries)
{
_logger.LogCritical("{Key} maps with {Series}", key, series.OriginalName);
@@ -296,12 +230,14 @@
if (existingSeries == null)
{
existingSeries = DbFactory.Series(infos[0].Series);
existingSeries.Format = key.Format;
library.Series.Add(existingSeries);
}
existingSeries.NormalizedName = Parser.Parser.Normalize(existingSeries.Name);
existingSeries.OriginalName ??= infos[0].Series;
existingSeries.Metadata ??= DbFactory.SeriesMetadata(new List<CollectionTag>());
existingSeries.Format = key.Format;
}
// Now, we only have to deal with series that exist on disk. Let's recalculate the volumes for each series
@@ -311,7 +247,7 @@
try
{
_logger.LogInformation("Processing series {SeriesName}", series.OriginalName);
UpdateVolumes(series, GetInfosByName(parsedSeries, series).ToArray());
series.Pages = series.Volumes.Sum(v => v.Pages);
}
catch (Exception ex)
@@ -321,9 +257,24 @@
});
}
private static IList<ParserInfo> GetInfosByName(Dictionary<ParsedSeries, List<ParserInfo>> parsedSeries, Series series)
{
// TODO: Move this into a common place
var existingKey = parsedSeries.Keys.FirstOrDefault(ps =>
ps.Format == series.Format && ps.NormalizedName == Parser.Parser.Normalize(series.OriginalName));
existingKey ??= new ParsedSeries()
{
Format = series.Format,
Name = series.OriginalName,
NormalizedName = Parser.Parser.Normalize(series.OriginalName)
};
return parsedSeries[existingKey];
}
public IEnumerable<Series> FindSeriesNotOnDisk(ICollection<Series> existingSeries, Dictionary<ParsedSeries, List<ParserInfo>> parsedSeries)
{
var foundSeries = parsedSeries.Select(s => s.Key.Name).ToList();
return existingSeries.Where(es => !es.NameInList(foundSeries));
}
@@ -364,8 +315,6 @@
series.Volumes.Add(volume);
}
_logger.LogDebug("Parsing {SeriesName} - Volume {VolumeNumber}", series.Name, volume.Name);
var infos = parsedInfos.Where(p => p.Volumes == volumeNumber).ToArray();
UpdateChapters(volume, infos);
@@ -473,88 +422,6 @@
}
}
private MangaFile CreateMangaFile(ParserInfo info)
{
switch (info.Format)
@@ -568,7 +435,8 @@
Pages = _archiveService.GetNumberOfPagesFromArchive(info.FullFilePath)
};
}
case MangaFormat.Pdf:
case MangaFormat.Epub:
{
return new MangaFile()
{
@@ -601,9 +469,9 @@
if (existingFile != null)
{
existingFile.Format = info.Format;
if (existingFile.HasFileBeenModified() || existingFile.Pages == 0)
{
existingFile.Pages = (existingFile.Format == MangaFormat.Epub || existingFile.Format == MangaFormat.Pdf)
? _bookService.GetNumberOfPages(info.FullFilePath)
: _archiveService.GetNumberOfPagesFromArchive(info.FullFilePath);
}