Regarding Catalog Optimal Output File Format #16
Comments
Hi Joe,
yes, this is something I run into as well, and I think it is expected. So it is helpful to use a number of smaller FoVs that together cover the desired FoV.
Do you mean you are trying to do just that by storing your instance catalogs for smaller fields of view in a sqlite3 database, for convenience of using them? I don't think there is such a function, but @danielsf would know better. |
Hi all,
-- Scott |
Hey @rbiswas4, This might give you a better idea of what I am up to: Basically, I want to automate the process of generating instance catalogs for observed AGN in a selected set of unique pointings over the 10-year period of LSST. The end result is about 8338 observations for 9 unique pointings, which contain a total of 624 AGNs. It takes about 10 hours to create all of these files. I suspect the bottleneck is opening the files and writing them via python, which is why I was curious to see if there were any quicker methods out there. Considering I will be increasing the FoV radius from 0.05 deg to at least 1 deg very soon, long-term storage of both observation metadata and the instance catalogs is something of a concern. ~ Joe EDIT: Thanks for the comment @danielsf. So the likely bottleneck lies in the photometry mixins? |
I had forgotten that :) . It is so unlikely that I would be writing to a database that I did not think I might have done it. https://github.com/rbiswas4/SNIacatalogs/blob/cadence_notebook/examples/FindLCsFromOpSim.ipynb |
I don't know how much of the bottleneck is in 'python'. Even though the time is taken by the write method of the instance catalog class, this is really where the database is being queried. And increasing the field of view simply means querying through more rows. |
I agree with Rahul. I just did a quick test with a field of view of radius 0.05 degrees (which took 4 seconds) and a field of view of radius 0.1 degrees (which took 60 seconds). The amount of time spent in the mixins quadrupled from 1 second to 4 seconds. The amount of time querying the database went from 1 second to 57 seconds. The galaxies in the database are pretty dense, so you're pulling in a lot of them for any field of view. There might be a way to speed up the database query by cutting out, at the SQL level, objects that do not have AGN filenames specified. It will require some patches to the code, though. Give me a day or two. |
Actually, I was wrong. You might be able to speed it up now. Try changing your write_catalog statement so that it says: yourCatalog.write_catalog('yourCatalogName.txt', constraint='sedname_agn is not NULL') The constraint kwarg adds an extra constraint to the SQL query. sedname_agn is what the agn SED name is called on fatboy. Applying the "it must have an AGN attached to it" cut at the SQL level, rather than at the python level with the cannot_be_null variable in InstanceCatalog, ought to be faster. Let me know if that does not work. |
Thanks @rbiswas4, I will look into your ipynb for a possible way to store the set of instance catalogs in a database for long-term storage. Thanks to you as well @danielsf. I will try that out with the next run later today to see if it does in fact speed things up. Also, is there a way to run a loop with write_catalog inside it in parallel? Might be a good thing for us to look into to speed up writing large groups of instance catalogs for independent observations. ~ Joe |
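On the question of running the loop in parallel, a minimal sketch using Python's standard concurrent.futures module is shown here. The function write_one_catalog is a hypothetical stand-in for "build one InstanceCatalog and call write_catalog"; it is not part of CatSim. Threads are used on the assumption that most of the time is spent waiting on the database. Whether a CatSim database connection can safely be shared between threads is not established in this thread, so each worker may need its own connection.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_one_catalog(obs_id, out_dir):
    """Hypothetical stand-in: write one catalog file for one observation."""
    path = os.path.join(out_dir, "catalog_%d.txt" % obs_id)
    with open(path, "w") as handle:
        handle.write("# observation %d\n" % obs_id)
    return path


out_dir = tempfile.mkdtemp()
obs_ids = range(8)  # made-up observation ids, for illustration only

# Each observation is independent, so the calls can be dispatched to a pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(lambda obs_id: write_one_catalog(obs_id, out_dir), obs_ids))

print(len(paths), "catalogs written")
```

pool.map returns results in the same order as obs_ids, so the list of paths lines up with the list of observations.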
Hey Scott @danielsf, SQLRules = '(sedname_agn IS NOT NULL) AND (agn_sfu < 24.0)' TypeError: write_catalog() got an unexpected keyword argument 'constraint' ~ Joe |
sorry. My mistake. 'constraint' is a kwarg for InstanceCatalog's init. So you need to pass it in when you instantiate the catalog. |
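The idea behind the constraint kwarg, applying the cut in the SQL query rather than in python, can be illustrated with a small sqlite3 sketch. The galaxy table and its contents are made up for illustration; only the column name sedname_agn comes from this thread, and this is not the query CatSim actually sends to fatboy.

```python
import sqlite3

# Hypothetical stand-in for the galaxy table; one row in ten has an AGN SED.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE galaxy (id INTEGER PRIMARY KEY, sedname_agn TEXT)")
rows = [(i, "agn.dat" if i % 10 == 0 else None) for i in range(1000)]
conn.executemany("INSERT INTO galaxy VALUES (?, ?)", rows)

# Python-level cut: every row is fetched, then most are thrown away.
all_rows = conn.execute("SELECT id, sedname_agn FROM galaxy").fetchall()
python_cut = [r for r in all_rows if r[1] is not None]

# SQL-level cut: the database returns only the rows that pass.
sql_cut = conn.execute(
    "SELECT id, sedname_agn FROM galaxy WHERE sedname_agn IS NOT NULL"
).fetchall()

print(len(all_rows), len(python_cut), len(sql_cut))
conn.close()
```

Both cuts select the same objects, but the SQL-level version only transfers the rows that survive, which matters when the database is remote and the field of view is dense.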
Yeah, I just found it. Also, it seems that running the catalog with: makes a very good cut and reduces the time dramatically. On a 0.1 radius aperture, one catalog of 15 objects is created in 0.38 seconds with the full SQLRules. Getting rid of the magnitude cut means that the catalog of 272 objects is created in 1.74 seconds, which means there is a time-saver of 0.0189 seconds per object written. Given that I am writing several thousand files, this may be helpful when we increase the FoV. ~ Joe |
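The 0.0189 figure appears to be the difference between the per-object times of the two runs. That is one reading of the numbers above, not something confirmed in the thread; the short calculation below reproduces it from the quoted timings.

```python
# Timings quoted above for the 0.1 radius aperture.
t_cut, n_cut = 0.38, 15        # with the magnitude cut
t_nocut, n_nocut = 1.74, 272   # without the magnitude cut

per_object_cut = t_cut / n_cut        # seconds per object, with the cut
per_object_nocut = t_nocut / n_nocut  # seconds per object, without the cut
difference = per_object_cut - per_object_nocut

# Marginal cost of each additional object between the two runs.
marginal = (t_nocut - t_cut) / (n_nocut - n_cut)

print(round(difference, 4))  # 0.0189
print(round(per_object_cut, 4), round(per_object_nocut, 4), round(marginal, 4))
```

Note that the catalog with the cut costs more per object but far less in total, which suggests a fixed overhead per query that the smaller catalog spreads over fewer objects.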
Just ran the code with the SQLRules value described above for a FoV of 1.75 degrees in radius (aka: LSST's full FoV). It took 34.082017 seconds to generate one observation of 5589 AGNs with mags under 24.0. So, it should take 3.291 days running on this iMac to generate a dataset of 9 unique pointings covering a 10x10 square degree swath of the night sky for the entire 10-year span of observations. This is a significant improvement, so it seems that SQL querying is likely the best long-term way of dealing with large datasets. Maybe we should look into storing the catalogs in a similar manner. My current idea is to make a database of object classes, within which there are time-series tables for each object observed. Each row is then an observation of an object, which includes observation metadata as well as observed magnitudes, errors, etc. This sounds reasonable, yes? ~ Joe |
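A rough sketch of that storage idea in sqlite3 is given here. All table names, column names, and values are hypothetical. For simplicity it uses a single observation table keyed by object id instead of one time-series table per object, which is a choice of this sketch and not what was proposed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object (
    object_id    INTEGER PRIMARY KEY,
    object_class TEXT NOT NULL            -- e.g. 'AGN'
);
CREATE TABLE observation (
    object_id INTEGER NOT NULL REFERENCES object(object_id),
    obs_id    INTEGER NOT NULL,           -- id of the pointing/visit
    mjd       REAL NOT NULL,              -- observation metadata
    band      TEXT NOT NULL,
    mag       REAL,                       -- observed magnitude
    mag_err   REAL,
    PRIMARY KEY (object_id, obs_id)
);
""")

# Made-up rows, for illustration only.
conn.execute("INSERT INTO object VALUES (1, 'AGN')")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 101, 59583.2, "r", 22.6, 0.06),
     (1, 100, 59580.1, "r", 22.4, 0.05)],
)

# The time series of one object is a single indexed query.
light_curve = conn.execute(
    "SELECT mjd, mag, mag_err FROM observation WHERE object_id = ? ORDER BY mjd",
    (1,),
).fetchall()
print(light_curve)
conn.close()
```

Keeping all observations in one table avoids creating hundreds of per-object tables, while the composite primary key still makes a per-object lookup cheap.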
Have you looked at what the sql is doing on the msqlserver side? If I cheers |
I ran the test again and, while the bigger field of view was still slower, it was not 60 times slower. Sometimes things are slower when you run them for the first time. I'm not sure if this is python initializing itself or some extra overhead when first connecting to fatboy. Regardless, it is not as bad as I made it sound. |
That is probably because the database is caching the outputs. |
Hey Guys,
So I am noticing that as I generate larger datasets from larger FoVs, CatSim's write_catalog command seems to be running rather slowly over large sets of observations. I have begun creating a catalog writer that stores each observation's instance catalog in a table in a SQLite3 database. However, the thought occurred to me that I should probably ask whether such a command already exists within the InstanceCatalog class. Is this the case?
~ Joe