Download TCGA Data Using R
Tutorial: Protocol to Download TCGA Data from GDC

Now that TCGA has moved to the Genomic Data Commons (GDC), many previous users are struggling to retrieve the same data. This tutorial shows how to download TCGA data from the GDC.

Step 1. Obtain a manifest file for the data download (the manifest specifies which files to download).
Step 2. Install the download software: the GDC Data Transfer Tool, gdc-client (Linux, Windows, Mac).
Step 3.1. Download data using a manifest file (gdc_manifest.lungCancer.txt).
Step 3.2. Download a single file using its UUID (UUIDs can be found in the manifest file).
Step 3.3. Download controlled-access data (a user authentication token is required).

FAQ:
1. Running ./gdc-client on CentOS fails with "version `GLIBC_2.14' not found" (full error message below).
Answer: glibc 2.12 is the latest version available for CentOS 6, so the prebuilt gdc-client cannot be used to download data on CentOS 6 systems (e.g., UCSD's TSCC cluster).
2. How to download controlled data from GDC.
3. Eventually, I asked the TSCC manager to help me install fastq-dump on TSCC.
4. Downloads sometimes fail because of network problems. Don't worry, just try again.

If you are looking for a flexible programmatic approach, you might take a look at the GenomicDataCommons Bioconductor package: https://bioconductor.org/packages/GenomicDataCommons

The R code below headed "find data" builds a manifest that can be used to guide the download of raw data. Here, the filter selects gene expression files from ovarian cancer (TCGA-OV) patients, quantified as raw counts with HTSeq.

The code headed "Download data" then downloads the 379 gene expression files returned by that query. Using multiple processes for the download speeds up the transfer significantly in many cases; on a standard 1 Gb connection, it completes in about 30 seconds.

If the download had included controlled-access data, it would also have needed a token.

For CentOS, you need to download the gdc-client source code and compile it yourself.
The gdc-client GitHub issue tracker reports this problem: glibc 2.12 is the latest version available for CentOS 6. If your system is CentOS release 6.6, you should download the gdc-client source code and compile it yourself. gdc-client is based on Python 2, and you may run into either of these errors:

The 'lxml==3.5.0b1' distribution was not found and is required by gdc-client

ImportError: /usr/lib64/libxml2.so.2: version `LIBXML2_2.9.0' not found (required by lxml/etree.so)

To fix them, install your own libxml2 and libxslt, reinstall lxml against them, and then compile the gdc-client source code. It worked.

Take bladder cancer as an example:
1. Go to the following link (legacy archive at GDC; filtered to TCGA-BLCA, Illumina Human Methylation 450, TXT files, DNA methylation): https://gdc-portal.nci.nih.gov/legacy-archive/search/f?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.program.name%22,%22value%22:%5B%22TCGA%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:%5B%22TCGA-BLCA%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.platform%22,%22value%22:%5B%22Illumina%20Human%20Methylation%20450%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_format%22,%22value%22:%5B%22TXT%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:%5B%22DNA%20methylation%22%5D%7D%7D%5D%7D
2. Add all 440 files to the cart and download the manifest file.
3. You will see that the first and second columns of the manifest file are the UUID and Sample ID.
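Before starting a large download it can help to check what a manifest actually contains. The following is a minimal POSIX shell sketch, assuming the manifest is tab-separated with a header row of id, filename, md5, size, state (UUID in column 1, the second column in column 2, size in bytes in column 4). The helper names manifest_ids and manifest_summary are hypothetical and not part of gdc-client.

```shell
# ASSUMPTION: tab-separated manifest with a header row
# "id filename md5 size state"; column 1 = UUID, column 4 = size in bytes.

# Print one "UUID<TAB>second column" line per file, skipping the header.
manifest_ids() {
  awk -F '\t' 'NR > 1 && $1 != "" { print $1 "\t" $2 }' "$1"
}

# Print the number of files and their total size in bytes.
# %.0f is used instead of %d so totals above 2 GB do not overflow in mawk.
manifest_summary() {
  awk -F '\t' 'NR > 1 && $1 != "" { n++; total += $4 }
               END { printf "files=%d bytes=%.0f\n", n, total }' "$1"
}

# Example (hypothetical file name):
#   manifest_summary gdc_manifest.bladder.txt
#   manifest_ids gdc_manifest.bladder.txt > uuid_to_sample.tsv
```

For the bladder example, the summary line should report the number of files you added to the cart.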
Step 1 (GDC legacy archive search page): https://gdc-portal.nci.nih.gov/legacy-archive/search/f
Step 2 (GDC Data Transfer Tool download page): https://gdc.nci.nih.gov/access-data/gdc-data-transfer-tool
Step 3.1 (download every file listed in the manifest):
gdc-client download -m gdc_manifest.lungCancer.txt
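FAQ item 4 says that a failed download can simply be tried again. A small POSIX shell sketch that automates the retry is shown here; retry is a hypothetical helper, and the attempt count and delay are arbitrary choices, not gdc-client settings.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Re-runs the command until it succeeds or MAX_ATTEMPTS is reached,
# sleeping DELAY_SECONDS between attempts. Returns 1 if every attempt failed.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  while ! "$@"; do
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      echo "giving up after $retry_attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $retry_attempt failed, retrying in ${retry_delay}s" >&2
    retry_attempt=$((retry_attempt + 1))
    sleep "$retry_delay"
  done
  return 0
}

# Example: up to 5 attempts, 30 seconds apart.
#   retry 5 30 gdc-client download -m gdc_manifest.lungCancer.txt
```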
Step 3.2 (download a single file by its UUID):
gdc-client download 22a29915-6712-4f7a-8dba-985ae9a1f005
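When copying UUIDs out of a manifest by hand, it is easy to pick up the wrong column. A minimal sketch that checks the 8-4-4-4-12 hex shape of each argument before calling gdc-client download, one UUID at a time as in Step 3.2; is_uuid and download_uuids are hypothetical helper names.

```shell
# Succeed if $1 has the 8-4-4-4-12 hexadecimal UUID shape.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eqx '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
}

# Download each UUID argument with gdc-client, one at a time.
# Non-UUID arguments are reported and skipped; returns 1 if anything
# was skipped or any download failed.
download_uuids() {
  dl_status=0
  for dl_id in "$@"; do
    if is_uuid "$dl_id"; then
      gdc-client download "$dl_id" || dl_status=1
    else
      echo "skipping '$dl_id': not a UUID" >&2
      dl_status=1
    fi
  done
  return "$dl_status"
}

# Example:
#   download_uuids 22a29915-6712-4f7a-8dba-985ae9a1f005
```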
Step 3.3 (controlled-access download; -t points to your authentication token file):
gdc-client download -m gdc_manifest_controled.txt -t gdc-user-passwdcode.txt
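The token file in Step 3.3 is a credential for your controlled data. As a sketch only, the guard below refuses to start if the token file is missing or empty and warns when group or other users have any permissions on it; check_token is a hypothetical helper, and the permission check is a general precaution rather than a GDC requirement.

```shell
# check_token FILE: fail if FILE is missing or empty; warn (but succeed)
# if group/other permission bits are set.
check_token() {
  if [ ! -s "$1" ]; then
    echo "token file '$1' is missing or empty" >&2
    return 1
  fi
  # ls -l starts with a mode string such as -rw------- ;
  # characters 5-10 are the group and other permission bits.
  perms=$(ls -l "$1" | cut -c5-10)
  if [ "$perms" != "------" ]; then
    echo "warning: '$1' is accessible by group/others; consider chmod 600 '$1'" >&2
  fi
  return 0
}

# Example:
#   check_token gdc-user-passwdcode.txt &&
#     gdc-client download -m gdc_manifest_controled.txt -t gdc-user-passwdcode.txt
```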
FAQ 1 error message:
./gdc-client: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /tmp/_MEI5oSpPi/libz.so.1)
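The error above means the prebuilt binary needs glibc 2.14 while CentOS 6 ships 2.12. A quick sketch for checking this before downloading the binary, assuming a glibc-based Linux (getconf GNU_LIBC_VERSION) and GNU sort (for sort -V); version_ge and glibc_version are hypothetical helper names.

```shell
# version_ge A B: succeed if version A >= version B.
# Sorts both with version ordering (GNU sort -V); if B comes first, A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed glibc version, e.g. "2.12" (glibc systems only).
glibc_version() {
  getconf GNU_LIBC_VERSION | awk '{ print $2 }'
}

# Example:
#   if version_ge "$(glibc_version)" 2.14; then
#     echo "glibc is new enough for the prebuilt gdc-client"
#   else
#     echo "glibc too old: compile gdc-client from source"
#   fi
```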
# find data: HTSeq raw-count gene expression files from TCGA-OV
library(GenomicDataCommons)
library(magrittr)
ge_manifest = files() %>%
    filter( ~ cases.project.project_id == 'TCGA-OV' &
                type == 'gene_expression' &
                analysis.workflow_type == 'HTSeq - Counts') %>%
    manifest()
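# Download data into a temporary directory, using several worker processes
library(BiocParallel)
register(MulticoreParam())   # forked worker processes (not supported on Windows)
destdir = tempdir()
fnames = bplapply(ge_manifest$id, gdcdata,
                  destination_dir = destdir, overwrite = TRUE, progress = FALSE)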
Compiling gdc-client from source (CentOS 6; gdc-client is a Python 2 package):
git clone https://github.com/NCI-GDC/gdc-client
cd gdc-client
python setup.py install
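If you hit the lxml/libxml2 errors, first install libxslt and libxml2 in your home directory, then add the directories containing xml2-config and xslt-config to your PATH (/prog_path stands for wherever you installed them):
export PATH="/prog_path/libxslt-1.1.29/bin:/prog_path/libxml2-2.9.4/bin:$PATH"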
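Since the old system copies of these tools usually live in /usr/bin, it is worth confirming that the PATH change took effect before rebuilding lxml. A minimal sketch; which_under is a hypothetical helper, and /prog_path is the same placeholder prefix as above.

```shell
# which_under PREFIX TOOL: succeed if TOOL, as resolved on PATH,
# lives somewhere under PREFIX; print where it resolves either way.
which_under() {
  found=$(command -v "$2") || { echo "$2 not found on PATH" >&2; return 1; }
  case "$found" in
    "$1"/*) echo "$2 -> $found" ;;
    *) echo "$2 resolves to $found, not under $1" >&2; return 1 ;;
  esac
}

# Example:
#   which_under /prog_path xml2-config &&
#     which_under /prog_path xslt-config &&
#     xml2-config --version   # lxml needs libxml2 2.9.0 or newer here
```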
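Next, reinstall lxml so that it is built against the new libraries:
pip uninstall lxml
pip install lxml==3.5.0b1 --install-option="--auto-rpath"
Finally, compile the gdc-client source code:
python setup.py install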
Source: https://www.biostars.org/p/204092/
Posted by: blekroom.blogspot.com