From: Arthur Zheng on
I'm trying to download Genomes in Progress from NCBI. The accession number can be, for example, "NZ_ACIR00000000". The code is as simple as this:

seq = getgenbank('NZ_ACIR00000000', 'SequenceOnly', 'true');

I tried a number of times. However, I always get this error:
*********************************************************************
??? Error using ==> getncbidata>accession2gi at 370
The key NZ_ACIR00000000 has more than one sequence file associated with
it in the nucleotide database.

Error in ==> getncbidata at 179
[giID,db] = accession2gi(accessnum,db,'quick');

Error in ==> getgenbank at 82
gb = getncbidata(accessnum,varargin{:},'database','nucleotide','fileformat','FASTA');
********************************************************************
What's wrong? Thanks.
From: Paola Favaretto on
Hi Arthur,

Currently GETGENBANK can retrieve only one sequence at a time. The record you are trying to access (NZ_ACIR00000000) is associated with 216 sequences. Therefore, you could do one of the following:

1) You can access the information using the E-Utilities. See the demo (ncbieutilsdemo - Accessing NCBI Entrez Database with E-Utilities) that ships with the toolbox for more information on how to use the E-Utilities from MATLAB.

2) Alternatively, you can retrieve each sequence separately. Because the sequences have consecutive accession numbers, from NZ_ACIR01000001 to NZ_ACIR01000216, you can automate the retrieval by building each accession number string and calling getgenbank with it. However, NCBI may restrict how many requests of this type you can make. Using the E-Utilities is preferable.
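
As a rough sketch of option 2, the loop below builds each accession string with sprintf and calls getgenbank once per contig. The variable names (prefix, nContigs, seqs) and the one-second pause between requests are my own assumptions, not something NCBI or the toolbox prescribes; adjust the delay to whatever NCBI's usage policy requires.

```matlab
% Sketch only: fetch the 216 contigs of NZ_ACIR00000000 one at a time.
% Requires the Bioinformatics Toolbox (getgenbank) and network access.
prefix   = 'NZ_ACIR01';   % common part of the contig accession numbers
nContigs = 216;           % number of sequences in the record (from the post above)
seqs     = cell(nContigs, 1);

for k = 1:nContigs
    % Zero-pad the counter to six digits: NZ_ACIR01000001 ... NZ_ACIR01000216
    acc = sprintf('%s%06d', prefix, k);
    try
        seqs{k} = getgenbank(acc, 'SequenceOnly', true);
    catch err
        % Keep going if a single request fails; failed entries stay empty.
        warning('Could not retrieve %s: %s', acc, err.message);
    end
    pause(1);   % assumed delay, to avoid sending requests too quickly
end

fprintf('Retrieved %d of %d sequences.\n', ...
        sum(~cellfun(@isempty, seqs)), nContigs);
```

Any entries of seqs that are still empty after the loop can be retried by rerunning the body for just those indices.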

I hope this helps.

-Paola
From: Arthur Zheng on
Hi Paola,

Thanks for your response. I'll try your suggestions.

Hao