+ - 0:00:00
Notes for current slide
Notes for next slide

Mobile genetic elements (MGEs) in the gut microbiota.

Mirna Vázquez Rosas Landa

2022/04/04

1 / 32

Mobile genetic elements (MGEs) in gut metagenomes

In the beginning the mobile genetic elements were associated with pathogenic strains; however, this has changed with time Dobrindt et al.2004. For example, an immense diversity of phages has been described in the ocean, suggesting that these mobile elements might play a relevant role in microbial communities working as seed banks for adaptive genes which any community member can potentially acquire (Fancello et al.2011; Modi et al.2013; Subirats et al.2016).

I started my search by looking for these three types of mobile elements.

  • Insertion sequences
  • Integrons
  • Phages
2 / 32

Insertion sequence elements in the gut metagenome

  • To search for insertion elements I used ISEScan. This program is a python pipeline that identifies elements requiring a nucleotide sequence as the input and identifies both complete and partial Insertion Sequence (IS).
3 / 32

Install ISEScan

  • Create a conda environment
conda create -n isescan_env
  • Activate the conda environment
conda activate isescan_env
  • Add Bioconda channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
  • Conda install
conda install -n isescan_env -c conda isescan
4 / 32

Run ISEScan

nohup isescan.py --seqfile 0118A1023.contigs.fa --output isescan_gut_microbiota --nthread 20 &
5 / 32

Explore the output of ISEScan

  • The program uses FragGeneScan to make the protein prediction, so one first exciting thing to look at is how many proteins were inferred?
grep -c ">" 0118A1023.contigs.fa.faa

Number of sequences: 15,164

  • How many elements were inferred? One easy way to do it is to count the number of rows in the output file.
wc -l 0118A1023.contigs.fa.tsv

Number of elements: 71

6 / 32

Explore the output of ISEScan

  • We can take a quick look to the output file.
7 / 32

Explore the output of ISEScan

  • Which families are more abundant?
more_abundant<-Insertion %>%
count(family) %>% # Count the number of elements in the family column
arrange(-n) # Arrange the number from the from largest to smallest
head(more_abundant) # Show the top six
## # A tibble: 6 × 2
## family n
## <chr> <int>
## 1 IS21 15
## 2 IS3 9
## 3 IS256 6
## 4 IS66 6
## 5 IS982 6
## 6 IS1182 5

The most abundant element is:

8 / 32

Complete and partial elements

Some of the IS elements can be incomplete, which may reflect ancestral transposition events (Siguier et al., 2014) suggesting that the rest of the key sequence features were lost over time.

Therefore, I focused at first on the complete elements, considering they are more likely to be recent and active in the community.

9 / 32

Explore the output ISEScan

ISEScan has an extra column indicating if the element is complete or partial; therefore, I filtered the output file to find the most abundant complete elements in the metagenome.

  • Which ISs are complete?
Complete<-Insertion %>%
filter(type == "c") %>% # Filter the rows that had a "c" in the type column, that stands for complete
count(family) %>%
arrange(-n)
head(Complete)
## # A tibble: 6 × 2
## family n
## <chr> <int>
## 1 IS1182 4
## 2 IS3 4
## 3 IS30 3
## 4 IS66 3
## 5 IS256 2
## 6 IS4 2

The most abundant elements that are also complete and probably more recent are:

10 / 32

New hypothesis!!!

  • What if there are partial sequences that also contain transposases?

  • Maybe the rest of the sequence is highly divergent, and that is why we can not find any matches to the databases; therefore, the sequences are classified as partial?

  • If they have transposases, that might mean that they must be elements that could be actively moving something in the community, but since they are divergent and new, the program is not able to classify them (properly).

11 / 32

Explore the output of ISEScan to look for potential new elements.

  • I ran InterproScan to identify transposases domains in the sequences that were predicted with FragGeneScan by ISEScan.
interproscan.sh -cpu 20 -goterms -pa -i 0118A1023.contigs.fa.orf_interpr.faa > Log_Interpro_Scan_$i.txt
  • Then I extracted a list of the IDs of the partial sequences.
list_partial<-Insertion %>%
filter(type == "p") %>% # Here I am filtering the output table to extract the "partial" hits.
select(seqID, orfBegin, orfEnd, strand) %>% # Select certain columns on the table
unite("seq_name", c("seqID", "orfBegin", "orfEnd", "strand"), sep="_") %>% # The output of ISEScan uses as sequence ID the name of the scaffold so I create a new colum merging the information from the start and end of the sequence to be able to match the results with InterproScan
distinct() %>% # Remove duplicated
pull() # Create a vector
12 / 32

Explore the output of ISEScan to look for potential new elements.

  • I read the output and filtered the sequences based on the list.
interpro<-read_delim("../data/0118A1023.contigs.fa.orf_interpro.faa.tsv", delim="\t", col_names = F) %>%
filter(X1 %in% list_partial) %>% # Filter the table using the IDs extracted previously
filter(X4 == "Pfam") %>% # Filter the output to get the PFAM domains
select(X1, X5, X6) %>%
filter(str_detect(X6, "Transposase")) # Search for elements annotated as transposases
13 / 32

Conclusion of ISEScan

  • The most abundant element in the dataset is partial IS21.

  • The elements IS1182 and IS3 are the most abundant complete elements.

  • The analysis showed that there are 17 potential new mobile elements in the dataset.

14 / 32

Integron prediction in the gut metagenome.

  • Integrons are assembly platforms — DNA elements that acquire open reading frames embedded in exogenous gene cassettes and convert them to functional genes by ensuring their correct expression (Mazel., 2006).

  • I used Integron Finder to identify the integrons in the metagenome.

15 / 32

Install Integron Finder

  • Create a conda environment
conda create -n integronFinder
  • Activate the conda environment
conda activate integronFinder
  • Conda install
conda install integron_finder -n integronFinder
16 / 32

Run Integron Finder

integron_finder --local-max --func-annot 0118A1023.contigs.fa
17 / 32

Explore Integron Finder output

The output of the program contain 3 files:

  • 0118A1023.contigs.integrons: A file with all integrons and their elements detected in all sequences in the input file.

  • 0118A1023.contigs.summary: A summary file with the number and type of integrons per sequence.

  • integron_finder.out: A copy standard output.

18 / 32

Explore Integron Finder output

Lets see the 0118A1023.contigs.integrons file:

integrons<-read_delim("../data/0118A1023.contigs.integrons", delim = "\t",
skip = 1)

Integron finder can predict three types of elements described in Cury et al., 2016:

  • Complete integrons: Include an integrase and at least one attC site
  • CALIN: The clusters of attC sites lacking integron-integrases (CALIN) are composed of at least two attC sites.
  • In0: The In0 elements are composed of an integron integrase and no attC sites.

In this metagenome, I was only able to identify two CALIN elements.

19 / 32

New hypothesis!!!

  • What genes are present in qb307082014_I_scaffold_18?

  • Exploring the neighborhood genes to the CALIN element would tell us maybe more about what are these sequences doing and why are they potentially important for the environment?

  • I used OperonMapper to predict the genes that are near the CALIN element predicted.

20 / 32

Operon Mapper output

The output files from Operon Mapper are:

  • predicted_protein_sequences_417647
  • predicted_orfs_417647
  • predicted_COGs_417647
  • operonic_gene_pairs_417647
  • list_of_operons_417647
  • functional_descriptions_417647
  • ORFs_coordinates_417647

The list_of_operons_417647 file contains the annotation of the elements and the number of operons predicted.

21 / 32

Operon Mapper output

Exploring the list_of_operons_417647 file.

Operon_mapper<-read_delim("../data/417647/list_of_operons_417647", delim="\t") %>%
fill(Operon) # Fill NA values with the previous value in a column
22 / 32

Operon Mapper output

By manually looking at the list_of_operons_417647 file I found a sequence that is associated to a transposase is COG2826, and was found in the qb307082014_I_scaffold_18 scaffold. Suggesting is the one inferred by Integron Finder.

COG2826<-read_delim("../data/417647/list_of_operons_417647", delim="\t") %>%
fill(Operon) %>%
drop_na(Type) %>%
filter(COGgene == "COG2826")
23 / 32

Operon Mapper output

I look at the operons that where up and down to see which genes would be there and I found something interesting.

operons_to_look<-c("7", "9")
neighbour_genes<-read_delim("../data/417647/list_of_operons_417647", delim="\t") %>%
fill(Operon) %>%
drop_na(Type) %>%
filter(Operon %in% operons_to_look)
24 / 32

Blast search results

  • The gene in operon 9 has a non identidied function, I extracted the sequence and make a web blast search.

  • The best hit at the NCBI database was accessory regulator AgrC. The same protein at UniProt was A0A0K2MC93, which is classified by Gene Ontology database as GO:0000155:

Catalysis of the phosphorylation of a histidine residue in response to detection of an extracellular signal such as a chemical ligand or change in environment, to initiate a change in cell state or activity. The two-component sensor is a histidine kinase that autophosphorylates a histidine residue in its active site. The phosphate is then transferred to an aspartate residue in a downstream response regulator, to trigger a response.

25 / 32

Conclusion

  • In this metagenome I was only able to identify two CALIN elements.

  • These elements were identified in the same scaffold and the Operon Mapper analysis revealed that there is a hypothetical protein cluster nearby involved in signal transduction from the environment.

26 / 32

Conclusion

  • In this metagenome I was only able to identify two CALIN elements.

  • These elements were identified in the same scaffold and the Operon Mapper analysis revealed that there is a hypothetical protein cluster nearby involved in signal transduction from the environment.

I think that the genes within this integron cassette are proteins related to some environmental response to stress, meaning that in this community microbes exchange genes that allows them cope with a changing environment.

26 / 32

Phages in the gut metagenome.

27 / 32

Run PHASTER

You can run a bunch of jobs within a loop using the API. I have done it before for one of my previous publications.

Here I just run it like that

wget --post-file="0118A1023.contigs.fa" "http://phaster.ca/phaster_api" -O 0118A1023.contigs.fa.phaster

And then you have to wait... And I still waiting

This way you can access to the job list and see how long is last for your job to run:

wget "http://phaster.ca/phaster_api?acc=ZZ_cd44018c8e" -O Output_filename
28 / 32

How would I analyze PHASTER results?

After the files were processed, we downloaded the results:

wget -O ID_FROM.phaster "http://phaster.ca/jobs/ID_FROM.phaster/summary.txt"

Then I would extract the number or intact prophages in the metagenome.

for i in $(*.fasta.phaster); do grep "intact prophage" $i >>number_phages; done
29 / 32

Future directions

  • Diversity. To determine how divergent are all those elements elements I would do phylogenetic reconstructions of the transposases and the genes that cluster closely to the mobile elements.
30 / 32

Future directions

Reconstruct genomes. Reconstructing the genomes would allow me to know in which organism this elements are present and build more accurate hypothesis regarding the ecology and evolution of the mobile elements.

Over the MAGs I could run other programs and build a more accurate inferences of the operons:

  • OperonMapper to predict the genes that are potentially being transfer within the community.

  • IslandViewer to identify elements that are actively moving.

31 / 32
32 / 32

Mobile genetic elements (MGEs) in gut metagenomes

In the beginning the mobile genetic elements were associated with pathogenic strains; however, this has changed with time Dobrindt et al.2004. For example, an immense diversity of phages has been described in the ocean, suggesting that these mobile elements might play a relevant role in microbial communities working as seed banks for adaptive genes which any community member can potentially acquire (Fancello et al.2011; Modi et al.2013; Subirats et al.2016).

I started my search by looking for these three types of mobile elements.

  • Insertion sequences
  • Integrons
  • Phages
2 / 32
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
oTile View: Overview of Slides
Esc Back to slideshow