June 21, 2018 | Author: Anonymous | Category: Science, Biology, Biochemistry, Genetics
>> Michael Zyskowski: Hello. Good afternoon. And thank you all for coming. My name is Mike Zyskowski and I'm a program manager in the External Research Group at Microsoft, and I have the pleasure of introducing Xin-Yi Chua, who is an intern in our group this summer. She is studying at the Queensland University of Technology, getting her Ph.D. in bioinformatics, and she's joined us this summer to work on a project called GenoZoom. I'll let her describe the project in detail. And if there are follow-up questions after this event, or if you wish to contact Xin-Yi later about this effort, her contact information is actually included in the About dialog box of the application. So without further ado, I'd like to introduce Xin-Yi.
>> Xin-Yi Chua: Thank you very much for the intro, Mike. I'd like to first thank you for joining me at the presentation. So I'll just go ahead. The name of the project is GenoZoom. I do have quite a few slides, and they cover a lot of content, but that's mainly because the slides will be put up with the application itself, so I won't necessarily go into every single point. Just a quick overview. So what was the motivation behind this project? Currently there are actually a lot of publicly available genome browsers out there online, and we didn't want to reinvent the wheel, but we did notice a few things with the current browsers. They don't really scale well to the data. They don't provide a seamless user experience when you're going from low to high resolutions rapidly. It's difficult to view your own sequence data in the application. So, for example, with the UCSC genome browser, which is probably the most popular browser out there, to view your own genome sequence you would have to actually download the source code and then set it up yourself, and that's quite difficult for a non-Unix expert. And they don't really support unformatted user annotations. You'll get to see a bit of that in my demo later on.
So the proposed solution for the project was to investigate how Deep Zoom and Silverlight would be able to address some of these issues and make the user experience much more seamless and smooth. So why Deep Zoom, also known as Seadragon? Basically because it provides a nice way to navigate large-scale data, and it optimizes bandwidth. It does this by creating an image pyramid of your high-resolution image data, and it only downloads the tiles that you are viewing. A further advantage of actually using Deep Zoom and applying it to the genome space is that it takes care of the data sampling for you, because it already creates that image pyramid. With the preprocessed images, the user can pretty much select the region of interest they're interested in, so it jumps straight into the middle of the genome without downloading the entire file again. And then with the different images created as different collections, you have the potential to mix and match and create your own GenoZoom collection. So in the beginning it all sounded very cool, very nice, but along the way I did notice some limitations. The major limitation of Deep Zoom, applying it to the genome browsing domain, is that Deep Zoom was primarily designed for images with the traditional 4:3 or 16:9 aspect ratios.
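(As an aside, the image pyramid and tile arithmetic can be made concrete with a small sketch. This is illustrative Python, not the project's Silverlight code; it assumes Deep Zoom's usual conventions of halving each level down to a single pixel and 256-pixel square tiles.)

```python
import math

def pyramid_levels(width, height):
    """Number of levels in a Deep Zoom pyramid: the image is halved
    per level until it fits in a 1x1 pixel."""
    return int(math.ceil(math.log2(max(width, height)))) + 1

def tiles_at_level(width, height, level, tile_size=256):
    """Tile grid (columns, rows) at a given level; the highest level
    is the full-resolution image."""
    max_level = pyramid_levels(width, height) - 1
    scale = 2 ** (max_level - level)
    w = math.ceil(width / scale)
    h = math.ceil(height / scale)
    return math.ceil(w / tile_size), math.ceil(h / tile_size)

# E. coli at 4 px per base: ~4.5 million bases -> an 18,000,000 x 8 image.
levels = pyramid_levels(18_000_000, 8)           # 26 levels
cols, rows = tiles_at_level(18_000_000, 8, 25)   # ~70,000 tile columns wide
```

With roughly 70,000 tile columns at full resolution, only the handful of tiles inside the viewport is ever fetched, which is where the bandwidth saving comes from.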

So it didn't lend itself well to conventional genome images, which are really long and thin. So this is an example of a DNA sequence. The demo that I'll be showing is actually the E. coli genome, a bacterium that lives in the gut. For a bacterium, it's about four and a half million base pairs, so four and a half million characters. At four pixels per base pair, we're looking at a really long image, and only eight pixels high. It's really long and thin. So the problem with using Deep Zoom in this case is that when you zoom out to a view of the entire genome, that line is pretty much invisible. You don't see it. I tried compensating by increasing the height, so that when you're zoomed out you could still see the image, but that has a performance hit when you're generating the images, and it's pretty much a waste of space because everything vertically is the same. The reason for this obstacle was primarily the way that Deep Zoom works when you're zooming into an image: it actually stretches the image horizontally and vertically, whereas the desired behavior for a genome browser is just to move horizontally. What I've got is an animation to demonstrate what I mean. So this is your low-resolution image at the top there, level N. And then you get the higher resolution here at N plus 1, and then N plus 2. What happens in Deep Zoom when you zoom in is that you are actually stretching horizontally and vertically, and then it replaces the tiles from the higher resolution, and then again you're replacing the tiles from the next zoom level. So that's the actual behavior. The desired behavior for a genome browser would be that at each zoom level the height of the image stays the same, and when you zoom in, it only stretches the image horizontally and then replaces the tiles.
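(The two behaviors can be written down in a few lines. A hypothetical sketch, assuming the 4-pixels-per-base and 256-pixel-tile figures from the talk: uniform zoom scales both axes, while the genome-friendly variant keeps the track height fixed and just maps a base-pair coordinate to a tile column.)

```python
def uniform_zoom(width, height, levels_in):
    """Deep Zoom's actual behavior: each zoom level doubles both axes."""
    return width * 2 ** levels_in, height * 2 ** levels_in

def horizontal_zoom(width, height, levels_in):
    """Desired genome-browser behavior: only the width grows."""
    return width * 2 ** levels_in, height

def base_to_tile_column(base_pos, px_per_base=4, tile_size=256):
    """Which full-resolution tile column holds a given base position."""
    return (base_pos * px_per_base) // tile_size
```

For example, zooming a 1000 x 8 track in by three levels gives 8000 x 64 under uniform zoom but 8000 x 8 under the horizontal-only variant, and base 100,000 lands in tile column 1562.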
So I tried different ways to work around that, to find out what I could do with the images, how I could play with them, just to get around that zooming obstacle. One of the trials was to have just a single hosting control and then dynamically lay out all the images at runtime, so I specify where each collection of images goes. At the top there, we've got the entire E. coli genome in what's called a gene density track: the blue lines are showing genes in the forward direction and the black lines are showing genes in the reverse direction. What the red boxes mean is that if you zoom into a region there, you get more information coming up the deeper you go. So we go down to actual gene blocks with arrows pointing in the direction, you get associated graph information, and you keep going down and eventually get to the DNA sequence. But the problem with this approach is that you sort of lose context. You can't see all that information in one go at that resolution, so you lose what happened to the peaks of the graphs. That was one problem. The next attempt was to actually go into the individual zoom layers of Deep Zoom image tiles and manually tweak those tiles, changing the height at different zoom levels.

So the zoom levels are denoted by the numbers. But what we notice here is that the zooming action is no longer seamless. You get this sort of transparency happening, a popping type of effect, and it just doesn't look really nice when you're zooming in and things are popping out at you. And then I came across an approach called dynamic Deep Zoom. What this one does is, instead of having all your image tiles preprocessed and hosted on the server (normally the MultiScaleImage control in Silverlight, which hosts the Deep Zoom images, issues HTTP requests to the server whenever you zoom and pan, and gets image tiles back), we intercept the HTTP request, write our own handler that generates the images on the fly, and then send those tiles back. The classic example is the Mandelbrot plot. In that one, when you zoom in, it's recalculating what you see and then drawing the patterns. I think this is a possible solution to the problem, and it's worth further investigation. The only reason I couldn't go further with it for this project was because it requires a database back end to store all the genomic data, and I was pretty much weighing doing that against the time of the internship itself. So the proposed approach for GenoZoom was that we arrived at a three-component view. I've got a navigation view at the top that shows the entire genome, and it's a static image; a region view, which is a zoomed-in location; and then a details view, which goes down to the base pair level. I'll go into more depth in the demo. Just quickly before the demo, some reasons why I selected Silverlight over WPF for this project. Basically, with Deep Zoom in Silverlight we have the MultiScaleImage control, which is not supported in WPF.
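(To make the dynamic Deep Zoom idea from a moment ago concrete, here is a minimal, self-contained sketch of the Mandelbrot example in plain Python, not the actual Silverlight HTTP handler; the tile size and plane mapping are illustrative assumptions.)

```python
def mandelbrot_tile(level, col, row, tile=64, max_iter=50):
    """Compute one tile of the Mandelbrot set on demand, as a grid of
    iteration counts. In dynamic Deep Zoom, a server-side handler
    would run code like this per tile request and return an encoded
    image instead of serving precomputed files."""
    # Map the tile into the complex plane: the view spans roughly
    # [-2, 1] x [-1.5, 1.5], and each level doubles the resolution.
    span = 3.0 / 2 ** level            # side length of this tile's region
    x0, y0 = -2.0 + col * span, -1.5 + row * span
    grid = []
    for j in range(tile):
        row_counts = []
        for i in range(tile):
            c = complex(x0 + (i + 0.5) * span / tile,
                        y0 + (j + 0.5) * span / tile)
            z, n = 0j, 0
            while abs(z) <= 2.0 and n < max_iter:
                z = z * z + c
                n += 1
            row_counts.append(n)
        grid.append(row_counts)
    return grid
```

Deeper levels shrink `span`, so each request recomputes only the small region being viewed, which is why nothing needs to be stored ahead of time.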
If I went with WPF, I would have to reimplement all the animations for zooming and panning and actually handle the data sampling as well. And then Silverlight is accessible through the Web, so that makes it really easy for biologists to share their research. It also has the out-of-browser option, where you can download it onto your desktop and still interact with the application. But with Silverlight there were a few disadvantages. Primarily, I couldn't use the MBF library to parse my data file to generate the raw image that's required by Deep Zoom. And image loading is dependent on the network; hopefully we won't see that in the demo, but sometimes the image tiles freeze, so they stay blurry. And with Silverlight you cannot access the local file system, but there is a workaround, which is to host the images on your local machine, and it still works. And then we'll look at the demo. Okay. So this is GenoZoom. The top layer is the navigation view. Actually, I'll use the mouse; it's easier. What I'm showing here is the E. coli genome, the bacterium in our gut. The top layer is the navigation view, which shows the entire genome: blue showing genes in the forward direction and black showing genes in the reverse direction.

It's not very interesting, because bacteria are small and pretty much the entire genome contains functional information. If it was showing [inaudible] you would have a sparser diagram. The red box in the navigation view here corresponds to what we see in the region view, so that's from one to 25,000. And then the red box in the region view there corresponds to the details view. Most of the interaction and action happens in these two components here. So what I can do is pretty much just drag to a random location of interest, and the region and details will correspond. This is something the current genome browsers lack, that seamless sort of interaction; when you drag to a random region, you have a loading screen, not that smooth animation that's happening here. So let's see what's down in the details. I can get actual DNA bases. I don't care about the first track, so I'll move that. So GenoZoom supports unformatted user annotation. What that means is you can pretty much put in post-it note types of information. I can change the coloring of that, and it pops up, and I can edit or delete it if I wish. What I can also do is search genes in E. coli. Please be kind. There we go. So let's have a look at that one. It's related to toxins. So, again, I can add another push pin. And I can also search my annotations, similar to what I did with searching a gene. Now to simplify sort of -- sorry, not to simplify. What you can also do is configure the tracks, so you can turn them on and off, and you can also add in your own custom data. If you have some that's already processed, it will come up, but it does have to be hosted either on a local machine or somewhere on a Web server; as long as you have the HTTP URL you can add that in. Now, one example of how GenoZoom handles that horizontal and vertical -- sorry, just let me restart.
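(The unformatted "post-it note" annotations from the demo can be modeled very simply. A hypothetical sketch; the real GenoZoom is Silverlight/C#, and the field names here are invented for illustration.)

```python
class AnnotationStore:
    """Free-text push-pin tags keyed only by a genome position: no
    server-imposed format, just text, a position, and a color."""
    def __init__(self):
        self.pins = []

    def add(self, position, text, color="yellow"):
        pin = {"position": position, "text": text, "color": color}
        self.pins.append(pin)
        return pin

    def search(self, term):
        """Case-insensitive substring search over pin text, like the
        annotation search shown in the demo."""
        return [p for p in self.pins if term.lower() in p["text"].lower()]

    def delete(self, pin):
        self.pins.remove(pin)
```

Because nothing but position, text, and color is stored, tags need no schema on the server, which is what "unformatted" means here.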
So while we wait, basically what I was going to mention is to demo to you what happens with the horizontal and vertical zooming action. If you have graphs with peaks and troughs, when you keep zooming in, the peaks and troughs pretty much clip. So I was trying to find different ways of visualizing that sort of data while still being able to know what the values are associated with that location.
>>: What's going on with your server?
>> Xin-Yi Chua: I was actually having trouble with the server this morning. So --
>>: Where is the server?
>>: It's on the DMZ on an external --
>> Xin-Yi Chua: I'll just leave it and move on. So this was one of the issues that we noted: with Deep Zoom, you can have the images freezing up on you. One of the reasons this was happening was because Deep Zoom was set up to avoid possible DOS attacks. So if you do a lot

of zooming around and panning, that sort of thing, it just stops you and stops the images from downloading. So unfortunately that is happening right now. But I'll move on. The thing is that you can pretty much install the application on the desktop, and if you had a net connection you could do all the interactions I was showing in the browser. I also created a tool, the GSIC, the GenoZoom Image Generator. Basically it's a tool to convert a GenBank file: put it through the MBF library to parse it, and generate the raw image data, which is then put through the Deep Zoom tools to create the image pyramids. Okay. Some of the known issues with the application. The image generation can be slow, but if you have reference data, it doesn't change that often, so once you create it for the reference data it's not too big of an issue. Data storage can be a problem, because I'm pretty much turning text into large image directories, but the dynamic Deep Zoom I was mentioning before, with the database back end on the server, could potentially solve that problem. Multiple mouse events before the animation has caught up can, I have seen, have unpredicted behavior with the locations. And then with the details and range slider, the red windows in your other views, sometimes the synchronization behavior there is a bit unexpected because of sort of where the input is coming from. A few disadvantages of using Deep Zoom in a genome browser: I can't dynamically change visuals. What I mean by that is I can't select a group of genes and change their color, because I'm dealing with static images. The custom data must be hosted by a Web server, which I mentioned; I guess if there were a multi-scale image control in WPF, that might solve that problem. And the data exposure from text to images is also another issue, but with images you could potentially display much more information than you get in the text.
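(The image-generation step just described, GenBank file in, raw strip image out, can be sketched as a stand-in. This assumes the 4-pixels-per-base, 8-pixel-high strip from earlier; the base colors are invented for illustration, not the real tool's palette.)

```python
# Hypothetical base-to-color mapping; the actual tool's palette differs.
BASE_COLORS = {
    "A": (0, 200, 0),      # green
    "C": (0, 0, 200),      # blue
    "G": (200, 150, 0),    # orange
    "T": (200, 0, 0),      # red
}
UNKNOWN = (128, 128, 128)  # gray for ambiguity codes such as N

def sequence_strip(seq, px_per_base=4, height=8):
    """Render a DNA sequence as a height x (len(seq) * px_per_base)
    grid of RGB tuples: the raw image a Deep Zoom pyramid is built from."""
    row = []
    for base in seq.upper():
        row.extend([BASE_COLORS.get(base, UNKNOWN)] * px_per_base)
    return [list(row) for _ in range(height)]
```

Every row of the strip is identical, which is exactly the vertical redundancy mentioned earlier: the information content is one-dimensional even though the image is two-dimensional.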
So it's sort of weighing a trade-off there. Performance is limited by the network connection, and again, if the images were on a local host, performance itself could be faster. Okay. But the advantages: it does produce a smoother user experience. What I should do is actually demonstrate one of the other genome browsers that are currently out there; if you're interested, I'll be happy to demo that after. You do get quick navigation from low to high resolution. And with the tagging, it supports unformatted user annotation. What that means is basically you don't have to conform to what the server requires; you can just put in push-pin or post-it note type tags. And I have the intention of converting those tags into sort of tracks, so you can save them and upload them for future use.

It's easier for the user to view their own data. They don't need to actually download the application and set up their own servers; as long as they've got their data converted into the image pyramid, they can get the URL and add it into the application itself. And that sort of lends itself to the potential for the user to create their own GenoZoom collection. So if they're happy with all these different tracks and different information, they can put them together and save it. Because I'm dealing with images, the application itself is, to put it bluntly, dumb about what it's hosting. It's just images. Potentially you can generate any sort of image you want, as long as it lines up to the genome location coordinates. And I have a whole list of future work, but I'll concentrate on a few. The first one is I would really like to look at creating this using the dynamic approach, with the database in the back end, to see how it would solve some of the issues that I mentioned previously. Integration with Pivot: that was actually on the agenda, but we just ran out of time. And linking to external sources; so, a whole list up there. And the last one is I would actually like to compare the performance of this against a pure Silverlight or WPF version that draws all those features every time you zoom and pan, just to see what the effects are there. And to close off, I'd just like to say a special thank you to my mentor, Mike, and to Simon, Bob and Vince, who I worked really closely with throughout this summer and who gave me feedback on the project, which I really appreciate. And a whole lot of other teams that helped make this happen. It's sort of dangerous putting up a thank you slide -- if you don't find your name up there, thank you. And lastly, I'd like to thank you as well for turning out for the presentation. Really appreciate that. Any questions?
>>: I saw the search can search for something like toxin, which implies that somebody already identified its purpose. But what if I wanted to go look for some [inaudible] sequence I found in some other bacteria? Could I type in base pairs and have it find it for me?
>> Xin-Yi Chua: Not in the prototype version. The issue with that is, as I previously said, GenoZoom is sort of [inaudible], because it's only hosting static images. If I wanted to do that, I would also have to send the underlying base pair sequence. If I did that, it could essentially do that, or you could do all the searches using external tools, like MIM or sort of pattern recognition tools.
>>: [inaudible].
>> Xin-Yi Chua: Sorry?
>>: Then it would tell me where to start.
>> Xin-Yi Chua: Yes, then you could upload that information as a track.
>>: Will this tool be published outside of Microsoft? Can other people access it?

>> Xin-Yi Chua: Yes, I should have mentioned that. The intention is that this will be open source and it will be put onto CodePlex, so the source code will be available and you can download it. The website I was working from is actually the MBF server. It might be a temporary link; we haven't worked out if it's going to be hosted somewhere yet. I'm going to talk to my mentor about that.
>>: We'll end up keeping it there at a minimum, so people outside of Microsoft can access it and demo it, as well as the source code. There will be a document as well as this presentation.
>> Xin-Yi Chua: Yes. So there will be a technical document which goes more into sort of the classes and structures and all that stuff.
>>: [inaudible].
>> Xin-Yi Chua: Yes.
>>: Can you comment a little more on the problems you had with image generation, because I have some of the same problems? How long the Deep Zoom image generation takes, because a lot of us have those problems. What kind of possibilities are there that might help that?
>> Xin-Yi Chua: Okay. So with image generation, my bottleneck was actually producing the raw images that would feed as input into Deep Zoom itself. It wasn't actually Deep Zoom making the pyramids; that was all right. It was actually generating the images for four and a half million characters and all that. For each track I was looking at 1500 images, and those were the bottlenecks. One of the ways of trying to deal with that was parallelizing the code. I think it depends on your underlying structure. At times it saved me time -- saved -- sorry, improved performance, but at other times, because it was a contiguous block, I needed to know what images were in my previous block, so I couldn't do that in parallel. But I think that just needs a rewrite of the code itself.
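(The parallelization trade-off in that answer can be sketched like this. A hypothetical illustration, with threads standing in for whatever parallel mechanism was actually used: fixed-size blocks that don't depend on their neighbors can be rendered concurrently, while a computation that needs the previous block's output has to stay serial.)

```python
from concurrent.futures import ThreadPoolExecutor

def render_block(args):
    start, seq = args
    # Stand-in for the expensive raw-image rendering of one block;
    # here we just report the block's pixel width at 4 px per base.
    return start, len(seq) * 4

def render_parallel(seq, block=1000):
    """Render independent fixed-size blocks concurrently. This only
    works when a block needs nothing from its neighbor; the
    contiguous-block case mentioned above must stay serial."""
    blocks = [(i, seq[i:i + block]) for i in range(0, len(seq), block)]
    with ThreadPoolExecutor() as pool:
        return sorted(pool.map(render_block, blocks))
```

If a block's output depended on the previous block's pixels, the `map` call would have to become an ordinary loop carrying state forward, which is the rewrite alluded to in the answer.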
>>: So did you have a chance to have anybody in the domain really look at this, to see whether it's significantly better than some of the other ideas; are the transitions and animations helping in maintaining context when they're browsing? What other feedback have you had a chance to get from anybody back in Australia or wherever?
>> Xin-Yi Chua: So, short answer, no. Longer answer: earlier, around mid-internship, I was doing the user experience studies. At that time I had a very, very early prototype version of GenoZoom, and I handed it out to people to get their reactions. One person liked the whole animation, the smooth navigation; that was positive feedback. But, I think, generally, we didn't see that many people, and generally it depends on what they were working on, because the trend is that biologists tend to stay within a specific zoom level. They don't really go all the way from a genome down to the DNA. They tend to stay at the top with the navigation and region view, or the region view and details view. Yeah, I think we need more sort of user feedback on that. And in the other space, I did send a link to David Heckerman, so I'm hoping to get some feedback from him as well.

>>: Do you think this will scale to do the [inaudible]?
>> Xin-Yi Chua: Yes, because basically you're just doing images, so I'm thinking theoretically it should just scale. The only issue with human is you're going to have to spend probably a day to produce those images. And storage, actually; that's going to be an issue. An example of storage: the E. coli GenBank file is only 30 megs. Those image pyramids came out to be 12 gigs. So -- yeah.
>>: Hard drives are cheap.
>> Xin-Yi Chua: Yes. So that's sort of why I'm really looking forward to doing that dynamic Deep Zoom approach, to see what would happen there. But again, as I mentioned, in images you can potentially put in a lot more information than what is in a normal sort of GenBank file. So I, for example, could color code my genes based on function or categories or different domains, that sort of thing. That's a single image, but I've got a lot more information in that single image than the normal text-based source. Any other questions?
>>: So aside from research purposes, where do you see this being used, for anything else?
>> Xin-Yi Chua: One of the thoughts throughout the internship was that this could actually lend itself to the education domain, something like ChronoZoom. I'm not sure if you're familiar with ChronoZoom, but basically it might lend itself well to the education domain. In that scenario, one could do maybe a prepared tutorial; if you want to aim it at high school level or sort of undergrad level, it starts from, this is your entire organism, let's concentrate on the dystrophin gene, which causes certain diseases, and maybe goes down to the protein level: these are the domains we're interested in, mutations in the DNA cause these sorts of diseases, that sort of thing. So it goes straight down to the DNA level, and that type of tutorial may suit this type of application. Okay. No other questions? Then thank you very much.
[applause]
