Sept 22, 2017
Please run the following snippet to enable shared configuration:
RS=/mnt/research/quantgen
echo -e "\nsource $RS/tools/configfiles/bash/bashrc" \
>> ~/.bashrc
echo -e "\nsource $RS/tools/configfiles/bash/bash_profile" \
>> ~/.bash_profile
touch /mnt/research/quantgen/tools/configfiles/bash/subscribers/$USER
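The snippet above appends the `source` lines every time it runs, so running it twice leaves duplicate entries. A minimal sketch of an idempotent variant, assuming bash. `append_once` and `enable_shared_config` are hypothetical names, and `RS` can be overridden from the environment (the default is the shared path above):

```shell
#!/usr/bin/env bash

# Hypothetical helper: append LINE to FILE (preceded by a blank line, like
# the `echo -e "\n..."` calls above) unless FILE already has that exact line.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
  if ! grep -qxF -- "$line" "$file"; then
    printf '\n%s\n' "$line" >> "$file"
  fi
}

# Hypothetical wrapper around the setup snippet; safe to run repeatedly.
enable_shared_config() {
  local rs="${RS:-/mnt/research/quantgen}"
  local user="${USER:-$(id -un)}"
  append_once "source $rs/tools/configfiles/bash/bashrc" "$HOME/.bashrc"
  append_once "source $rs/tools/configfiles/bash/bash_profile" "$HOME/.bash_profile"
  # Register as a subscriber, as the original snippet does.
  touch "$rs/tools/configfiles/bash/subscribers/$user"
}
```

Run `enable_shared_config` once after defining the functions, then open a new shell so the shared settings are loaded.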
Benefits: auto-loads R and PLINK, better defaults for working together, easier updates.
Top-level directories of /mnt/research/quantgen:
datasets
projects
scratch
tools, logs, shares, etc.
See: /mnt/research/quantgen/README
datasets directory
Subdirectories of a dataset:
source (read-only, sometimes encrypted and with access control)
derivative (read-only)
playyard (read-and-write)
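To see which of the three subdirectories you can write to in a given dataset, a small sketch like the following may help. `check_dataset_layout` is a hypothetical name, and the `-w` test only reflects the current user's permissions (root, for example, sees every directory as writable):

```shell
#!/usr/bin/env bash

# Hypothetical helper: report the access mode of the standard subdirectories
# (source, derivative, playyard) of one dataset directory.
check_dataset_layout() {
  local dataset_dir="$1" sub
  for sub in source derivative playyard; do
    if [ ! -d "$dataset_dir/$sub" ]; then
      printf '%s\tmissing\n' "$sub"
    elif [ -w "$dataset_dir/$sub" ]; then
      # -w: writable by the user running this function
      printf '%s\twritable\n' "$sub"
    else
      printf '%s\tread-only\n' "$sub"
    fi
  done
}
```

Example call: `check_dataset_layout /mnt/research/quantgen/datasets/UKB`.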
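projects directory (examples: UKB/landscape, gruenebe)
scratch directory (/mnt/ls15/scratch/groups/quantgen)
For I/O-heavy projects, run jobs in scratch and copy the results to the projects directory with a nightly cron job, for example: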
$ crontab -l
0 0 * * * /mnt/research/quantgen/tools/cronjobs/ukb-500-output-transfer.sh
$ cat /mnt/research/quantgen/tools/cronjobs/ukb-500-output-transfer.sh
rsync -av /mnt/research/quantgen/scratch/projects/UKB/PIPELINE500/GWAS \
/mnt/research/quantgen/projects/UKB/PIPELINE500/output/
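The cron entry `0 0 * * *` runs the script every day at midnight. Because the source path has no trailing slash, rsync creates `output/GWAS` rather than copying the contents of `GWAS` directly into `output/`. Cron does not stop a new run from starting while the previous one is still going; a minimal sketch that guards against overlapping transfers with `flock` (from util-linux, assumed to be installed) is shown here. `transfer_outputs` and the lock-file location are hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the rsync call in ukb-500-output-transfer.sh.
# The body runs in a subshell ( ... ) so the lock descriptor and `set`
# options do not leak into the calling shell.
transfer_outputs() (
  set -euo pipefail
  src="$1"    # e.g. /mnt/research/quantgen/scratch/projects/UKB/PIPELINE500/GWAS
  dest="$2"   # e.g. /mnt/research/quantgen/projects/UKB/PIPELINE500/output/
  lock="${3:-/tmp/ukb-500-output-transfer.lock}"  # assumed location

  # Open the lock file on descriptor 9 and try to lock it without waiting.
  exec 9>"$lock"
  if ! flock -n 9; then
    echo "previous transfer still running; skipping" >&2
    exit 0
  fi

  # Same flags as the original script (archive mode, verbose).
  rsync -av "$src" "$dest"
)
```

In the cron script, the last line would then call `transfer_outputs` with the two paths from the original `rsync` command.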
Let me know if you need help setting this up.
The UK Biobank 500k dataset was released.
Samples: 488,377 (all), 409,703 (White British)
Genotyped markers (calls): 805,426
Genotype Calls: /mnt/research/quantgen/datasets/UKB/source/genotypes/calls500
Phenotypes: /mnt/research/quantgen/datasets/UKB/source/phenotypes (no changes)
Genotype-derived phenotypes: /mnt/research/quantgen/datasets/UKB/source/genotypes/sample_qc
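Counts like the ones above can be checked against a PLINK binary fileset by counting lines: the `.fam` file has one line per sample and the `.bim` file one line per marker. A minimal sketch; `plink_dims` is a hypothetical name and the fileset prefix is a placeholder, since the file names inside `calls500` are not listed here (for per-chromosome filesets the marker count is per chromosome):

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the number of samples (.fam lines) and markers
# (.bim lines) of a PLINK binary fileset, given its prefix.
plink_dims() {
  local prefix="$1" n_samples n_markers
  # $(( ... )) strips the padding some `wc` implementations add.
  n_samples=$(( $(wc -l < "$prefix.fam") ))
  n_markers=$(( $(wc -l < "$prefix.bim") ))
  printf 'samples: %d\nmarkers: %d\n' "$n_samples" "$n_markers"
}
```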
Problem:
> The genetic data was imputed using two different reference panels. The Haplotype Reference Consortium (HRC) panel was used as first choice option, but for SNPs not in that reference panel the UK10K + 1000 Genomes panel was used. The problem arose in the second set of imputed data from the UK10K + 1000 Genomes panel. The genotypes at these SNPs are imputed correctly, but have not been recorded as having the correct genome position in the files.
>
> For now we recommend that researchers focus exclusively on SNPs in the HRC panel, or work with the directly genotyped data until the new release is available.
http://www.ukbiobank.ac.uk/2017/07/important-note-about-imputed-genetics-data/
derivative directory: /mnt/research/quantgen/datasets/UKB/derivative/
BED/calls500_unfiltered (renamed original BED files)
cohorts/calls500_unfiltered/whites (white cohort, FID IID)
relabeled_phenotypes (uses labels instead of cryptic field IDs)
Project directory: /mnt/research/quantgen/projects/UKB/PIPELINE500
BED (whites only, minor QC)
phenotypes and phenotypes_genetic (whites only)
adjusted_phenotypes (height)
cohorts (genotyped_white, genotyped_white_related, genotyped_white_unrelated)
BGData, summaries
GMatrix and related_pairs
sample_sets, GWAS
ld, markers
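The cohort files (for example `cohorts/calls500_unfiltered/whites`, with FID and IID columns) have the two-column layout that PLINK's `--keep` option reads. Before subsetting, it can be worth checking how many cohort IDs actually occur in a given fileset. A minimal sketch with awk; `count_cohort_in_fam` is a hypothetical name and the file arguments are placeholders:

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the FID/IID pairs of a cohort file that also
# appear in the first two columns of a PLINK .fam file.
count_cohort_in_fam() {
  local cohort="$1" fam="$2"
  # An empty cohort file would break the NR == FNR idiom below.
  if [ ! -s "$cohort" ]; then
    echo 0
    return
  fi
  awk 'NR == FNR { keep[$1 " " $2]; next }   # first file: remember IDs
       ($1 " " $2) in keep { n++ }           # second file: count matches
       END { print n + 0 }' "$cohort" "$fam"
}
```

With PLINK (auto-loaded by the shared configuration), the subset itself would be `plink --bfile <prefix> --keep <cohort file> --make-bed --out <new prefix>`, where the angle-bracket names are placeholders.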