-
Notifications
You must be signed in to change notification settings - Fork 20
Patch/faster loading fmc #191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Why is there no CI? |
|
weird, now CI works again |
776afd1 to
95407f5
Compare
|
Why do the Mac-tests pass :o ? |
|
Oh no....I have two fallbacks: 1. If So I guess somehow it is using the non accelerated path... :/ I will go over it again. |
You have very good unittests! |
cbfa544 to
e741f4e
Compare
|
Turns out MacOs is using |
922711f to
01ced33
Compare
|
Thanks for fixing the mac tests. Is this PR still in draft status, or do you want a review? |
|
I compared this PR and the current lambda3 branch with a ~6 GB nucleotide database (using /dev/shm). Currently the index loading time for the current lambda3 branch is ~69 seconds vs ~65 seconds for this PR. |
|
This is unexpected. When testing the bidirecional index directly, the speed up was much greater. @h-2 I think there is no need to integrate this, if this doesn't bring any real improvements. I will make an extra PR for the mac tests (#196). Another idea, for improved loading, is to not store the blocks and superblocks, but compute them when loading. This should slowdown loading from cache, but increase speeds when loading from network or disk storage. Currently, I am under some time limitations, and won't have any time this week to look at it :/ |
|
@sarahet Maybe you can check the runtimes of (de)serialising the individual members of the index-file struct? This could be done by splitting the call to Unless, we are sure that the index itself is the problem, it doesn't make much sense to have @SGSSGene dig into the Lambda code, I think. [I won't be able to test things until Wednesday next week.] |
|
Loading Database Index... done. Runtime: 67.507s 🙃 That was a good suggestion .. |
|
@SGSSGene would you update this PR and remove the draft status? 🙂 |
01ced33 to
f4c78f5
Compare
f4c78f5 to
a0d6638
Compare
|
The search tests still pass, but the index tests fail (expectedly). Can you update the checksums in the tests so that the tests pass? |
|
@sarahet Or shall we wait with updating the index files / checksums until we also pull the default alphabet changes? |
|
I don't think we have to wait but I can also update the checksums in a separate PR if that's easier |
|
I guess depends how fast we are with the modes so that we can finalize the overall alphabet/mode change. I would like to include the nucleotide default in that PR/change and if we can fix that soon, then sure maybe cleaner to wait. Otherwise I would say we should proceed to keep moving forward? |
|
@SGSSGene we decided we will first include the newly defined modes in a separate PR and merge this one afterwards 🙂 |
|
superseded by #198 |
This updates the fmindex_collection submodule. In that version the loading of the fmindex should be faster.
In my local test, I checked how long it takes to load the bidirectional fmindex (ieprv2) of thehuman genome. I took the fastes run out of a couple of runs.
old version: 4.7s
new version: 2.1s
The index size increased:
old version: 6.2GB
new version: 6.6GB
Attention: a rebuild of the fmindex is required.