Support for scimag (Journals) #1

Open
Type-IIx opened this issue May 31, 2021 · 5 comments
Labels: enhancement (New feature or request)

Comments

@Type-IIx

Is there active development for scimag journals in the LibGen API? With your project, do you have imminent plans to support scimag either with or without the API?

@Yetangitu
Owner

There is no support in the API, but I do plan to add scimag support to books in the near future (now that I'm done with classify: https://ipfs.io/ipfs/QmPjTqZ18NWLjpokUn9NwnaxKnSA8pVaQcwmGBK78SAYJB). This support will be comparable to the way libgen_fiction is supported, i.e. with a regular database refresh. I might start posting delta updates to IPFS; I'm not sure yet about the viability of such a scheme.

@Yetangitu
Owner

Yetangitu commented Jun 2, 2021

I'm doing some experiments with the scimag dataset, which quickly showed that the most common type of query used in books is far too slow to be usable: a simple select * from scimag where title like 'AN ASSESSMENT OF CHEMICAL%'; takes ~10 minutes on a dedicated database VM with 16GB of memory. This implies that only direct, absolute queries are usable; select * from scimag where doi='10.1111/j.1745-4565.2004.tb00382.x'; returns the result in milliseconds.

The question is whether it makes sense to integrate libgen_scimag into books with these limitations. It would take a dedicated database server with at least 64GB of memory (the database + index currently takes ~46GB) to speed up partial (like) queries to something approaching usable performance. As it stands now, the only use case would be a direct download based on an exact DOI or title, i.e. nscimag -d '10.2307/1219787' or nscimag -t 'An Assessment of Chemical Features for the Classification of Plant Fossils' would download the article. Fulltext search performance over author and title is similarly abysmal.

I don't think it makes sense to add support in this way, so for now I'll look into using 'net-based resources instead of a local database for scimag support.
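For concreteness, the two query shapes compared above look roughly like this against a local mirror (the libgen_scimag database name and the plain mysql client invocation are assumptions for illustration, not necessarily the exact setup used in the experiment):

```sh
# Prefix LIKE scan over title: took ~10 minutes in the experiment above
# on a dedicated database VM with 16GB of memory.
mysql -e "select * from scimag where title like 'AN ASSESSMENT OF CHEMICAL%';" libgen_scimag

# Exact DOI lookup: returns the result in milliseconds.
mysql -e "select * from scimag where doi='10.1111/j.1745-4565.2004.tb00382.x';" libgen_scimag
```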

@Yetangitu added the enhancement label Jun 2, 2021
@Type-IIx
Author

Type-IIx commented Jun 5, 2021

It's great to see that you have been working on the problem. It is such a massive dataset that, yes, far more RAM would be needed. I know you're a volunteer, so the fact that you're working on this problem so thoroughly is greatly appreciated.

I admit I would most often be searching by wildcard. However, the ability to download a journal by volume, for example, would be useful and probably a lot faster. I know there have been attempts at solutions to this; on the LibGen mhut forum there was an old project by backwar that seems to have been killed off or is otherwise inaccessible now.

Is development for LibGen moving to IPFS for the most part?

@Yetangitu
Owner

Yetangitu commented Jun 6, 2021

I don't really know where libgen development is moving; the project has fractured into different forks which do not seem to get along all that well. The founder ("bookwarrior") can now be found at libgen.fun/libgen.life; he sees the people at mhut/libgen.rs as having betrayed the original intention of the project. I started working on books to make it possible to access libgen from a *nix terminal, and I do not consider myself to be part of any of the libgen projects; books can be used with, or adapted to, any fork which makes its database available for download.

I do see more and more rumblings about IPFS, so I work from the assumption that "client access" will eventually move there. Whether it will also take over the role of collection distribution, currently done through BitTorrent, I do not know. There is an interesting technical demonstration on libgen.fun/libgen.life of an IPFS-distributed sqlite database + interface to libgen:

http://ipfs.io/ipfs/QmUqd8zbStKHfTHTo3cCLx7FR8t1g11WLKXk8m6Kyv7i5s/

(this link might stop working, the project is discussed at https://libgen.life/viewtopic.php?f=39&t=7940)

This uses a WebAssembly version of sqlite to access an IPFS-distributed database.

The same idea would probably work for the far larger scimag database. I'll have a look at adding support for this type of database access to books; it would make it far easier to keep the thing up to date and would remove the need for a local mysql installation.
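As a rough sketch of what that could eventually look like from the books side: the demo above queries the database in place from the browser, but a terminal client could do something simpler and just fetch a snapshot from an IPFS gateway, then run the exact-DOI lookup locally instead of going through a mysql server. This is purely illustrative; the CID is a placeholder and no such sqlite snapshot of scimag is actually published.

```sh
# Hypothetical: <CID> is a placeholder, no sqlite snapshot of scimag exists yet.
curl -L -o scimag.sqlite "https://ipfs.io/ipfs/<CID>/scimag.sqlite"

# The same exact-DOI lookup as above, now against the local sqlite copy.
sqlite3 scimag.sqlite "select * from scimag where doi='10.2307/1219787';"
```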

@Type-IIx
Author

Type-IIx commented Jun 7, 2021


Ah, okay. First of all: thank you for the background on the fork schism between mhut and libgen.life. Important.

So, as a new registrant, I am unable to view https://libgen(dot)life/viewtopic.php?f=39&t=7940

I am supportive of the design of IPFS; however, I do take issue with its approach to location-hidden (Tor) servers and clients: see ipfs/notes#37 (perfect being the enemy of good). Any real hope of reinvigorating that effort seems to have dwindled or died off (https://github.com/berty/go-libp2p-tor-transport), with the go-onion-transport left to OpenBaz--r, a not-quite-legitimate service in my estimation.
One more cynical point: the scihut BitTo--ent distribution of sci-mag and other collections remains a PoC (https://git[dot]sr[dot]ht/~scihut/scihut) without much progress. Still, it is interesting.

Less cynically:

  • I can access the IPFS demonstration you provided, and it is fast
  • I agree that bash and *nix development is vital, as it is portable for all users, including Cygwin, Darwin, etc.
  • I am very glad you pointed me towards libgen(dot)life. More things!
