I’m excited to announce the first alpha preview of this project that I’ve been working on for the past 4 months. I’m initially posting about this in a few small communities, and hoping to get some input from early adopters and beta testers.
What is a DHT crawler?
The DHT crawler is Bitmagnet’s killer feature that (currently) makes it unique. Well, almost unique, read on…
So what is it? You might be aware that you can enable DHT in your BitTorrent client, and that this allows you to find peers who are announcing a torrent’s hash to a Distributed Hash Table (DHT), rather than to a centralized tracker. DHT’s lesser-known feature is that it allows you to crawl the info hashes it knows about. This is how Bitmagnet’s DHT crawler works: it crawls the DHT network, requesting metadata about each info hash it discovers. It then enriches this metadata by attempting to classify it and associate it with known pieces of content, such as movies and TV shows. Finally, it allows you to search everything it has indexed.
This means that Bitmagnet is not reliant on any external trackers or torrent indexers. It’s a self-contained, self-hosted torrent indexer, connected via the DHT to a global network of peers and constantly discovering new content.
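To make the crawling step a little more concrete: DHT nodes talk to each other with bencoded KRPC messages over UDP, as specified in BEP 5. The sketch below is not Bitmagnet's code. It is a minimal Python illustration of how a `get_peers` query for one info hash is encoded. The function names `bencode` and `get_peers_query` are made up for this example, and sending the packet, parsing responses and fetching the torrent metadata from peers are all left out.

```python
"""Minimal bencode encoder plus a KRPC get_peers query, following BEP 5."""


def bencode(value):
    """Encode ints, byte/text strings, lists and dicts as bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def get_peers_query(node_id, info_hash, transaction_id=b"aa"):
    """Build the bytes of a KRPC get_peers query for a 20-byte info hash."""
    if len(node_id) != 20 or len(info_hash) != 20:
        raise ValueError("node_id and info_hash must both be 20 bytes")
    return bencode({
        "t": transaction_id,          # transaction id, echoed in the reply
        "y": "q",                     # message type: query
        "q": "get_peers",             # query name
        "a": {"id": node_id, "info_hash": info_hash},
    })


if __name__ == "__main__":
    # Same ids as the worked example in BEP 5.
    print(get_peers_query(b"abcdefghij0123456789", b"mnopqrstuvwxyz123456"))
```

A real crawler would send these bytes to a known node over UDP, follow the nodes returned in each reply, and keep a record of the info hashes it encounters along the way.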
The DHT crawler is not quite unique to Bitmagnet; another open-source project, magnetico, was the first (as far as I know) to implement a usable DHT crawler, and was a crucial reference point for implementing this feature. However, that project is no longer maintained, and does not provide the other features, such as content classification and integration with other software in the ecosystem, that greatly improve usability.
Currently implemented features of Bitmagnet:
- A DHT crawler
- A generic BitTorrent indexer: Bitmagnet can index torrents from any source, not only the DHT network - currently this is only possible via the /import endpoint; more user-friendly methods are in the pipeline
- A content classifier that can currently identify movie and television content, along with key related attributes such as language, resolution and source (BluRay, webrip etc.), and that enriches this with data from The Movie Database
- An import facility for ingesting torrents from any source, for example the RARBG backup
- A torrent search engine
- A GraphQL API: currently this provides a single search query; there is also an embedded GraphQL playground at /graphql
- A web user interface implemented in Angular: currently this is a simple single-page application providing a user interface for search queries via the GraphQL API
- A Torznab-compatible endpoint for integration with the Servarr stack
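As a rough illustration of what the content classifier in the list above has to do, here is a minimal Python sketch that pulls a title, year, resolution and source out of a torrent name using regular expressions. It is an assumption-laden toy, not Bitmagnet's classifier: the function name `classify_name`, the patterns and the output keys are all invented for this example, and it makes no attempt at language detection or matching against The Movie Database.

```python
import re

# Hypothetical patterns for illustration only; real-world names are messier.
RESOLUTION = re.compile(r"\b(480p|576p|720p|1080p|2160p)\b", re.IGNORECASE)
YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")
SOURCES = {
    "BluRay": re.compile(r"\b(blu[- ]?ray|bdrip|brrip)\b", re.IGNORECASE),
    "WEBRip": re.compile(r"\bweb[- ]?rip\b", re.IGNORECASE),
    "WEB-DL": re.compile(r"\bweb[- ]?dl\b", re.IGNORECASE),
    "HDTV": re.compile(r"\bhdtv\b", re.IGNORECASE),
}


def classify_name(name):
    """Guess title, year, resolution and source from a torrent name."""
    # Dots and underscores usually stand in for spaces in release names.
    cleaned = re.sub(r"[._]+", " ", name)

    year_match = YEAR.search(cleaned)
    resolution_match = RESOLUTION.search(cleaned)

    # Pick the first source label whose pattern appears in the name.
    source = None
    for label, pattern in SOURCES.items():
        if pattern.search(cleaned):
            source = label
            break

    return {
        # Assume everything before the year is the title, if a year exists.
        "title": cleaned[:year_match.start()].strip() if year_match else None,
        "year": int(year_match.group(1)) if year_match else None,
        "resolution": resolution_match.group(1).lower() if resolution_match else None,
        "source": source,
    }


if __name__ == "__main__":
    print(classify_name("Some.Movie.2019.1080p.BluRay.x264-GROUP"))
```

The guessed title and year are the kind of values that could then be used to look a movie up in an external database, which is where the enrichment step would come in.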
Interested?
If this project interests you then I’d really appreciate your input:
- How did you get along with following the documentation and installation instructions? Were there any pain points?
- There’s a roadmap of high-priority features on the website - what do you see as the highest priority for near-term development?
- If you’re a developer, are you interested in contributing to the project?
Thanks for your attention. If you’re interested in this project and would like to help it gain momentum then please give it a star on GitHub, and expect further updates soon!
I’ve just always used VMs for everything and set up each service to match my existing system. For example, my postfix servers all have to tie in to LDAP, mailman, and the host of services for authenticating email. It seems like the point of docker is to just have a completely preconfigured and self-contained setup. I guess I just don’t see how that would work in my environment, where I already have services like databases or LDAP running elsewhere, and I run multiple instances for redundancy. And if I have to reconfigure all that stuff in docker anyway, how is that any better than simply using my existing VMs?
I used to be like you, then I moved from TrueNAS Core to TrueNAS Scale, which is Linux and docker instead of FreeBSD and iocage jails.
So docker has this concept of persistent volumes. You configure all your settings in the initial setup command (docker compose) and define persistent volumes. This way you don’t lose your data.
Here’s an example, Plex. I run Plex in docker now. So my config directory is defined as a persistent volume. If I need to update Plex, or rebuild it or whatever, the container just updates and has all the data I need via the persistent volume. If the install is messed up or whatever I just get a newer image and run the docker compose and it fires up and mounts the persistent volume and off I go.
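For anyone who hasn't seen one, a compose file for that kind of setup might look roughly like the sketch below. Treat it as an illustration rather than a recommended Plex configuration: the image name, port mapping, media path and volume name are assumptions for the example.

```yaml
# Hypothetical docker-compose.yml; image, port and paths are assumptions.
services:
  plex:
    image: plexinc/pms-docker
    restart: unless-stopped
    ports:
      - "32400:32400"
    volumes:
      - plex-config:/config    # named volume: settings survive container rebuilds
      - /mnt/media:/data:ro    # hypothetical host directory with media, read-only

volumes:
  plex-config:
```

With a file like this, running `docker compose pull` followed by `docker compose up -d` replaces the container with one built from the newer image, while the `plex-config` volume is mounted again with its data intact.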
Basically it takes away the burden of having to figure out the OS configuration. Makes backups easier - and smaller. And the things are spun up, installed, and usable in seconds.
Not sure the OS configuration is really a burden :-) I have several servers I have to keep up to date anyway. And backups aren’t really an issue, I just run rdiff-backup on everything to provide a year’s worth of incremental backups, which doesn’t really take much extra space. Maybe one of these days when I catch up on other projects I’ll look into it though.