Error log
Traceback (most recent call last):
File "/app/alphafold/run_alphafold.py", line 445, in <module>
app.run(main)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/app/alphafold/run_alphafold.py", line 429, in main
is_prokaryote=is_prokaryote)
File "/app/alphafold/run_alphafold.py", line 177, in predict_structure
msa_output_dir=msa_output_dir)
File "/app/alphafold/alphafold/data/pipeline.py", line 225, in process
use_precomputed_msas=self.use_precomputed_msas)
File "/app/alphafold/alphafold/data/pipeline.py", line 101, in run_msa_tool
result = msa_runner.query(input_fasta_path)[0]
File "/app/alphafold/alphafold/data/tools/hhblits.py", line 144, in query
stdout.decode('utf-8'), stderr[:500_000].decode('utf-8')))
RuntimeError: HHblits failed
stdout:
stderr:
- 02:15:01.531 ERROR: Could find neither hhm_db nor a3m_db!
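Background research
- RuntimeError: HHblits failed #9: re-download all data files
- HHblits failed #12: insufficient virtual memory
Expanding virtual memory made no difference
- HHblits failed #14: re-download the bfd data files
- Singularity - HHblits: Could find neither hhm_db nor a3m_db #202: run-permission problem
- HHblits failed: no stdout and no stderr message #211: CPU AVX2 instruction-set compatibility
- RuntimeError: HHblits failed #284: insufficient physical memory (32 GB) / reduced_dbs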
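Two of the candidate causes listed above (#211 and #284) can be ruled in or out quickly on a Linux host. The following is a minimal sketch and not part of the original notes: the helper names has_avx2 and mem_total_gib are hypothetical, and both read the standard /proc files unless another path is given.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a quick host check; the names are illustrative only.

# Succeeds if the CPU flags contain the whole word "avx2".
# Optional argument: path to a cpuinfo-style file (default: /proc/cpuinfo).
has_avx2() {
    grep -qw 'avx2' "${1:-/proc/cpuinfo}"
}

# Prints total physical memory in GiB, rounded down.
# Optional argument: path to a meminfo-style file (default: /proc/meminfo).
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}
```

Inside the container, `has_avx2 && echo "avx2 ok"` and `mem_total_gib` give a first indication of whether #211 or #284 could apply before any data is downloaded again.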
Troubleshooting
- Added --db_preset=reduced_dbs to the run command to use the fast prediction mode: this runs normally.
Step through the Dockerfile instructions one at a time and build an image instance by hand
The key base image taken from the Dockerfile is nvidia/cuda:11.1-cudnn8-runtime-ubuntu18.04. Create a Docker container from this image and enter it:
sudo docker run -it --name alphafold-ldmf -v /pathTo/alphafold/:/data/alphafold -v /pathTo/alphafold-data/:/data/alphafold-data --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.1-cudnn8-runtime-ubuntu18.04
Change the DNS servers and the pip index, then run each line of the Dockerfile step by step
/etc/resolv.conf
nameserver 114.114.114.114
nameserver 8.8.8.8
nameserver 119.29.29.29
nameserver 223.5.5.5
- git clone the alphafold_non_docker project
- Download the alphafold v2.1.1 release code with wget (the alphafold_non_docker author does not yet support the latest v2.1.2), then extract it into alphafold_non_docker
Edit run_alphafold.sh in alphafold_non_docker to set the alphafold path:
change current_working_dir=$(pwd) to current_working_dir=$(pwd)/alphafold-2.1.1
Copy the test file T1050.fasta and run:
bash run_alphafold.sh -d /data/alphafold-data -o /data/alphafold-result -f /data/alphafold/T1050.fasta -t 2020-05-14
- The run fails with the same HHblits error as in the log above
From the log, extract the single-step command and run HHblits directly:
/usr/bin/hhblits -i /data/alphafold/T1050.fasta -cpu 4 -oa3m /tmp/tmpsv9qnin5/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /data/alphafold-data/origin_bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /data/alphafold-data/uniclust30/uniclust30_2018_08/uniclust30_2018_08
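Searching for hhblits leads to its git repository. After cloning it and reading the source, it becomes clear that the hhblits command above actually reads the .ffindex and .ffdata files under
/data/alphafold-data/origin_bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
and the .ffindex and .ffdata files under
/data/alphafold-data/uniclust30/uniclust30_2018_08/uniclust30_2018_08
- Both locations do contain the .ffindex and .ffdata files, but the files lack read (r) permission, so grant read permission with chmod
- Re-run, wait, wait, wait: problem solved.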
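The existence and permission check described in the last steps can be sketched as a small shell function. This is a minimal sketch rather than the procedure actually used: the function name check_hhblits_db is hypothetical, and it assumes the usual HH-suite layout in which a prefix passed to -d is expanded to <prefix>_a3m.ffdata/.ffindex and <prefix>_hhm.ffdata/.ffindex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which HH-suite database files behind a -d prefix
# are missing or unreadable for the current user.
# Returns 0 if at least one complete pair (a3m or hhm) is readable, else 1.
check_hhblits_db() {
    local prefix=$1 kind ext f pair_readable status=1
    for kind in a3m hhm; do
        pair_readable=yes
        for ext in ffdata ffindex; do
            f="${prefix}_${kind}.${ext}"
            if [ ! -e "$f" ]; then
                echo "missing:    $f"
                pair_readable=no
            elif [ ! -r "$f" ]; then
                echo "unreadable: $f"
                pair_readable=no
            fi
        done
        if [ "$pair_readable" = yes ]; then
            status=0
        fi
    done
    return "$status"
}
```

For example, `check_hhblits_db /data/alphafold-data/uniclust30/uniclust30_2018_08/uniclust30_2018_08` would print one line per problem file. The check should be run as the same user that runs hhblits, because root passes the -r test regardless of the mode bits.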
Solution
Check the permissions of the alphafold data files and add read permission to all of them:
cd /pathTo/alphafold-data/
chmod -R +r ./*
That is, the fix is the one from Singularity - HHblits: Could find neither hhm_db nor a3m_db #202, listed earlier.
Verified to work: the permissions of the data files need to be changed.
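To confirm that the chmod reached every file, one option is to list the regular files that still lack a read bit. This is a minimal sketch, assuming a find that supports -perm with a leading dash (GNU and BSD find both do); the function name list_files_missing_read is hypothetical. It looks at mode bits only, so it does not account for ACLs or for the permissions of the directories themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print regular files under a directory that are missing
# the read bit for at least one of user, group, or other.
# "-perm -444" matches files with all three read bits set; "!" negates it.
list_files_missing_read() {
    find "$1" -type f ! -perm -444 -print
}
```

Running `list_files_missing_read /pathTo/alphafold-data/` should print nothing once the permissions are in order.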