For a list of available instances of PheWeb, navigate [here](http://pheweb.sph.umich.edu).

For a walk-through demo see [here](etc/demo.md#demo-navigating-pheweb).

If you have questions or comments, check out our [Google Group](https://groups.google.com/g/pheweb-umich).

# How to Cite PheWeb

If you use the PheWeb code base for your work, please cite our paper:

Gagliano Taliun, S.A., VandeHaar, P. et al. Exploring and visualizing large-scale genetic associations by using PheWeb. *Nat Genet* 52, 550–552 (2020).

# How to Build a PheWeb for your Data

If this is broken, [open an issue on GitHub](https://github.com/statgen/pheweb/issues/new) and hopefully I can help.

### 1. Install PheWeb

```bash
pip3 install pheweb
```

- If that doesn't work, follow [the detailed install instructions](etc/detailed-install-instructions.md#detailed-install-instructions).

### 2. Create a directory and `config.py` for your new dataset

```
mkdir ~/my-new-pheweb && cd ~/my-new-pheweb
```

This directory will store all the files pheweb makes for your dataset. All `pheweb ...` commands should be run in this directory.

Make `config.py` in this directory. In it, set either `hg_build_number = 19` or `hg_build_number = 38`. Other options you can set are listed [here](etc/detailed-loading-instructions.md#configuration-options).

### 3. Check that your GWAS summary statistics files will work

You need one file for each phenotype. Most common GWAS file formats should work. Here are the requirements:

- It needs a header row.
- Columns can be delimited by tabs, spaces, or commas.
- It needs a column for the reference allele (which must always match the bases on the reference genome that you specified with `hg_build_number`) and a column for the alternate allele. If you have a `MARKER_ID` column like `1:234_C/G`, that's okay too. If you have `allele1` and `allele2` columns and either one may be the reference, you'll need to modify your files.
- It can be gzipped if you want.
- Variants must be sorted by chromosome and position, with chromosomes in the order [1-22,X,Y,MT].

The file must have columns for:

| column description | name | other allowed column names | allowed values |
| --- | --- | --- | --- |
| chromosome | `chrom` | `#chrom`, `chr` | 1-22, `X`, `Y`, `M`, `MT`, `chr1`, etc |
| position | `pos` | `beg`, `begin`, `bp` | integer |
| reference allele | `ref` | `reference` | must match reference genome |
| alternate allele | `alt` | `alternate` | anything |
| p-value | `pval` | `pvalue`, `p`, `p.value` | number in [0,1] |

You may also have columns for:

| column description | name | other allowed column names | allowed values |
| --- | --- | --- | --- |
| minor allele frequency | `maf` | | number in (0,0.5] |
| allele frequency (of alternate allele) | `af` | `a1freq`, `frq` | number in (0,1) |
| AF among cases | `case_af` | `af.cases` | number in (0,1) |
| AF among controls | `control_af` | `af.controls` | number in (0,1) |
| allele count | `ac` | | integer |
| effect size (of alternate allele) | `beta` | | number |
| standard error of effect size | `sebeta` | `se` | number |
| odds ratio (of alternate allele) | `or` | | number |
| R2 | `r2` | | number |
| number of samples | `num_samples` | `ns`, `n` | integer, must be the same for every variant in its phenotype |
| number of controls | `num_controls` | `ns.ctrl`, `n_controls` | integer, must be the same for every variant in its phenotype |
| number of cases | `num_cases` | `ns.case`, `n_cases` | integer, must be the same for every variant in its phenotype |

Column names are case-insensitive. If your file has a different column name, set `field_aliases = {"column_name": "field_name"}` in `config.py`. For example, `field_aliases = {'P_BOLT_LMM_INF': 'pval', 'NSAMPLES': 'num_samples'}`.

Any field can be null if it is one of ['', '.', 'NA', 'N/A', 'n/a', 'nan', '-nan', 'NaN', '-NaN', 'null', 'NULL']. If a required field is null, the variant gets dropped.

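
Before loading, the header requirements above can be checked with a few lines of Python. This is a minimal sketch, not PheWeb's own parser: the function names `missing_required_fields` and `read_header` are made up for illustration, the accepted column names are copied from the required-columns table, and splitting the header on any run of tabs, commas, or spaces is an assumption. It does not handle the `MARKER_ID` case.

```python
import gzip
import re

# Accepted names for each required field, copied from the required-columns table.
REQUIRED_FIELDS = {
    "chrom": {"chrom", "#chrom", "chr"},
    "pos": {"pos", "beg", "begin", "bp"},
    "ref": {"ref", "reference"},
    "alt": {"alt", "alternate"},
    "pval": {"pval", "pvalue", "p", "p.value"},
}


def missing_required_fields(header, field_aliases=None):
    """Return the required fields that no column in `header` maps to.

    `field_aliases` has the same shape as the config.py setting:
    {"column_name": "field_name"}. Matching is case-insensitive.
    """
    aliases = {k.lower(): v.lower() for k, v in (field_aliases or {}).items()}
    found = set()
    for column in header:
        name = column.strip().lower()
        name = aliases.get(name, name)
        for field, accepted in REQUIRED_FIELDS.items():
            if name in accepted:
                found.add(field)
    return sorted(set(REQUIRED_FIELDS) - found)


def read_header(path):
    """Read the first line of a (possibly gzipped) file and split it into column names.

    Assumption: columns are separated by runs of tabs, commas, or spaces.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        first_line = f.readline().rstrip("\r\n")
    return re.split(r"[\t, ]+", first_line)
```

For example, `missing_required_fields(read_header("/home/peter/data/ear-length.gz"))` would list any required fields the file lacks; an empty list means the header looks usable.
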
If your p-values are stored as -log10(p) (like in REGENIE output), then set these variables in `config.py`: `pval_is_neglog10 = True` and `field_aliases = {'LOGP':'pval'}`.

### 4. Make a list of your phenotypes

Inside of your data directory, you need a file named `pheno-list.json` that looks like this:

```json
[
 {
  "assoc_files": ["/home/peter/data/ear-length.gz"],
  "phenocode": "ear-length"
 },
 {
  "assoc_files": ["/home/peter/data/a1c.X.gz","/home/peter/data/a1c.autosomal.gz"],
  "phenocode": "A1C"
 }
]
```

Each phenotype needs `assoc_files` (a list of paths to association files) and `phenocode` (a string representing your phenotype that is used in filenames and URLs, composed of characters in `[A-Za-z0-9_~-]`).

If you want, you can also include:

- `phenostring` (string): a name for the phenotype. Shown in tables and tooltips and page headers.
- `category` (string): groups together phenotypes in the PheWAS plot. Shown in tables and tooltips.
- `num_cases`, `num_controls`, and/or `num_samples` (number): if your input data only has `AC` or `MAC`, these will be used to calculate `AF` or `MAF`. Shown in tooltips. If your input data has correctly-named columns for these, the command `pheweb phenolist read-info-from-association-files` will add them into your existing `pheno-list.json`.
- anything else you want, but you'll have to modify templates to use it.

You can make `pheno-list.json` from a CSV by running:

```
pheweb phenolist import-phenolist "/path/to/pheno-list.csv"
```

or you can make one from scratch by running:

```
pheweb phenolist glob --star-is-phenocode "/home/peter/data/*.gz"
```

You can see other methods [here](etc/detailed-loading-instructions.md#making-pheno-listjson).

### 5. Load your association files

Run `pheweb process`.

To distribute jobs across a cluster, follow [these instructions](etc/detailed-loading-instructions.md#distributing-jobs-across-a-cluster).

To include VEP annotations, follow [these instructions](etc/detailed-loading-instructions.md#annotating-with-vep).

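
Before running `pheweb process`, it can help to sanity-check `pheno-list.json` against the rules from step 4. The following is a rough sketch and not part of PheWeb: `check_phenolist` is a hypothetical helper, and rejecting duplicate phenocodes is an assumption made here because phenocodes are used in filenames and URLs.

```python
import json
import re

# Allowed phenocode characters, taken from step 4.
PHENOCODE_RE = re.compile(r"[A-Za-z0-9_~-]+")


def check_phenolist(phenos):
    """Return a list of human-readable problems found in a parsed pheno-list."""
    problems = []
    seen = set()
    for i, pheno in enumerate(phenos):
        code = pheno.get("phenocode")
        if not isinstance(code, str) or not PHENOCODE_RE.fullmatch(code):
            problems.append(f"entry {i}: bad or missing phenocode {code!r}")
        elif code in seen:
            # Assumption: duplicates would collide in filenames and URLs.
            problems.append(f"entry {i}: duplicate phenocode {code!r}")
        else:
            seen.add(code)

        files = pheno.get("assoc_files")
        if (
            not isinstance(files, list)
            or not files
            or not all(isinstance(path, str) for path in files)
        ):
            problems.append(f"entry {i}: assoc_files must be a non-empty list of paths")
    return problems


# Usage, from inside the data directory:
#     with open("pheno-list.json") as f:
#         for problem in check_phenolist(json.load(f)):
#             print(problem)
```

An empty result only means the structure looks right; it does not check that the association files exist or are correctly formatted.
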
If something breaks and you can't understand the error message or it's something that PheWeb should support by default, [open an issue on GitHub](https://github.com/statgen/pheweb/issues/new) or email me.

### 6. Serve the website

Run `pheweb serve --open`.

That command should either open a browser to your new PheWeb, or it should give you a URL that you can open in your browser to access your new PheWeb. If it doesn't, follow [the directions for hosting a PheWeb and accessing it from your browser](etc/detailed-webserver-instructions.md#hosting-a-pheweb-and-accessing-it-from-your-browser).

### More options:

To run pheweb through systemd, see sample file [here](etc/pheweb.service).

To use Apache2 or Nginx, see instructions [here](etc/detailed-webserver-instructions.md#using-apache2-or-nginx).

To require login via OAuth, see instructions [here](etc/detailed-webserver-instructions.md#using-oauth).

To track page views with Google Analytics, see instructions [here](etc/detailed-webserver-instructions.md#using-google-analytics).

To reduce storage use, see instructions [here](etc/detailed-webserver-instructions.md#reducing-storage-use).

To customize page contents, see instructions [here](etc/detailed-webserver-instructions.md#customizing-page-contents).

PheWeb can display genetic correlations generated by [another tool](https://github.com/statgen/pheweb-rg-pipeline). To use this feature, set `show_correlations = True` in `config.py` and place the output of the rg pipeline as `pheno-correlations.txt` in the same folder as `pheno-list.json`.

To hide the button for downloading summary stats, add `download_pheno_sumstats = "secret"` and `SECRET_KEY = "your random string"` in `config.py`. That will make a secret page (printed to the console when you start the server) to share summary stats.

To hide the button for downloading top hits and phenotypes, add `download_top_hits = "hide"` and `download_phenotypes = "hide"` respectively.

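
Put together, a `config.py` using the correlation and download settings above might look like the sketch below. It only collects settings named in this README; the build number and the `SECRET_KEY` value are placeholders, and you should keep only the lines you need.

```python
# config.py (illustrative sketch; keep only the settings you need)

# Reference genome build: 19 or 38.
hg_build_number = 38

# Show genetic correlations. Requires pheno-correlations.txt
# in the same folder as pheno-list.json.
show_correlations = True

# Move the summary-stats download to a secret page.
download_pheno_sumstats = "secret"
SECRET_KEY = "your random string"  # placeholder: replace with your own random string

# Hide the download buttons for top hits and phenotypes.
download_top_hits = "hide"
download_phenotypes = "hide"
```
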
To allow dynamically filtering the manhattan plot, run `pheweb best-of-pheno` and set `show_manhattan_filter_button=True` in `config.py`.

# Modifying PheWeb

See instructions [here](etc/detailed-development-instructions.md).

See documentation about the files in `generated-by-pheweb/` [here](etc/detailed-internal-dataflow.md).