Solicitation Q&A

The questions and answers shown here are based on non-proprietary comments or requests for clarifications that are associated with the current versions of the documents released to date. In many cases the questions have been sanitized or revised to reflect a more general case. The Q&A sections mirror the elements of the Solicitation Package. A lower case letter a-e may be appended to the numbering for each question to reflect the Attachment with which it is associated. Questions and Answers are posted in descending order, such that most recent questions are at the top of each section. The date for New/Amended Answers is summarized here.

This Page Last Updated/Amended: February 02, 2010, 5:30 p.m.
    New Q&A: 11a, 40d.

January 28, 2010

The ORNL Exchange Server limits individual email messages to 14MB. To accommodate submissions that may exceed this limit, ORNL offers two alternative methods for delivering electronic submissions.
  • Secure and private access to an ORNL Sharepoint site. Offerors that choose this method will be provided with an ORNL XCAMS username and password that has specific permissions for accessing a private Document Repository. The Offeror will be able to upload files to a private location, and have full control over the contents of that document repository until the deadline for submissions is reached. Offerors that would like to use the XCAMS method should contact the ORNL subcontracts administrator at their earliest convenience so that we can complete the XCAMS account creation process, and test document transfer with the Offeror prior to the submission deadline.
  • Anonymous FTP. ORNL supports an anonymous FTP server that includes additional features that offer privacy and protection from an external GET. Inbound Anonymous FTP is to a PUSH-only subdirectory for which there is no ability to list the subdirectory contents. Contents are kept for two weeks, and then swept. Offerors that choose this method can PUSH files to the subdirectory, but will have no further assurance of receipt other than a message or log from their FTP client that the transaction completed. Offerors that would like to use the Anonymous FTP method should contact the ORNL subcontracts administrator at their earliest convenience so that we can provide explicit directions for testing document transfer prior to the submission deadline.
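    Purely as a hedged illustration, a push from a standard command-line FTP client might resemble the following csh fragment; the host, subdirectory, and file names are placeholders, and the ORNL subcontracts administrator will supply the actual directions.
        # Hypothetical sketch only: the host, subdirectory, and file names below are
        # placeholders; the ORNL subcontracts administrator supplies the actual directions.
        echo "user anonymous offeror@example.com" >  ftp_push.txt
        echo "binary"                             >> ftp_push.txt
        echo "cd incoming"                        >> ftp_push.txt
        echo "put Technical_Proposal.pdf"         >> ftp_push.txt
        echo "bye"                                >> ftp_push.txt
        ftp -n ftp.example.ornl.gov < ftp_push.txt
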
January 27, 2010

    The general period for submitting questions and comments relative to the solicitation has closed. Exceptions will be made for items that are critical to an Offeror completing their proposal. Proprietary questions should be so marked.


      Solicitation
    • Q3. In Section H - Special Provisions of Solicitation No. 6400009257, it is noted that the effort will be ARRA funded. Moreover, the solicitation states that "Special Provisions Related to Work Funded Under American Recovery and Reinvestment Act of 2009 (Dec 2009)." Is UT-Battelle, LLC -- as it relates to the Act -- the Award Recipient (prime contractor) and the Offeror the Subrecipient (first-tier subcontractor) and, in this prime contractor-subcontractor relationship, do only those requirements under Section 3 Subrecipient Information of the Act apply to the sub-contractor? Moreover, are there any other special provisions related to work funded under the Act other than Section 3 that Offerors should be aware of?
    • A3. All requirements of the clause entitled Recovery Act – Special Provisions Related to Work Funded Under American Recovery and Reinvestment Act of 2009 (Dec 2009) will apply to the Successful Offeror. UT-Battelle is the “contractor” and the Successful Offeror is the “first-tier subcontractor” for reporting purposes under the Recovery Act. Note that UT-Battelle will perform all reporting requirements for its subcontractors.

    • Q2. Is it possible for the Government to supply the 2_Solicitation6400009257.pdf form in a non-PDF format so that Offerors can electronically enter their information?
    • A2. No, the Solicitation document is only available in a PDF form.
    • Q1. Does the referenced $47M also exclude the annual cost of maintenance and support services?
    • A1. No. Section J.4 states that the total price of the base award, excluding options, shall not exceed $47,000,000.00. The base award includes the elements of the Climate Modeling and Research System as well as the Warranty, Maintenance, and Support Services described in Section 11 of Attachment A, the Technical Specification. The separately priced options described in the Solicitation and Offer, Section B.3, Options 1 through 4, are not part of the base award. Each represents the individual annual cost to extend the period of performance of an existing maintenance contract on the associated subsystem, and its FS and LTFS as appropriate, beyond the period described in the base award. Note that the option dates in the Solicitation and Offer are based on the delivery of two systems that enter production on 1 October 2010 and 1 October 2011, each with a 36-month base performance period. The actual performance period for the delivered systems must adhere to the requirements in the Technical Specification, Attachment A, Section 11, and may vary from the dates stated in the Solicitation and Offer. The request for information relative to Options 1-4 corresponds with the information in Attachment C, Proposal Preparation Instructions, Section 5.4.2. The separately priced option described in the Solicitation and Offer, Section B.3, Option 5, is not included in the base award price, and should follow the instructions in Attachment C, Proposal Preparation Instructions, Section 5.4.1.
      Attachment A: Technical Specification
    • Q11a. Section 8.3 states that "The Offeror's equipment shall attach to the ORNL-provided distribution switches for TCP/IP based connections to the NOAA enclave." Please clarify. Are these distribution switches part of the Offeror's responsibility, or are they provided as part of the base ORNL infrastructure? Section 8.3 states that "Top of rack switches will be used for Layer 2 connectivity for infrastructure/services." Please clarify. Are these top of rack switches part of the Offeror's responsibility, or are they provided as part of the base ORNL infrastructure?
    • A11a. Reference Figure 9 in Section 8.3. There is a large, existing infrastructure that provides all of the necessary connectivity and network services. This infrastructure is designed such that the elements of the CMRS may receive services as needed, via connections to the Distribution Switches. These switches already exist and are not the responsibility of the Offeror. Please note that Section 8.3 also states that this infrastructure is not provided to interconnect the individual elements of a compute cluster, if such an architecture is proposed. Top of rack switches are simply used to provide an easy method for aggregating servers and then distributing those services. The top of rack switches that are used to help distribute services are not the responsibility of the Offeror.

    • Q10a. Section 4.1.2 states that “In this design, the Offeror shall make the first delivery such that the system (or subsystems) in the first delivery shall enter production no later than 1 October 2010. The second delivery shall be made such that the system (or subsystems) shall enter production no later than 1 October 2011. [C]” Is it the intent that the production date of 1 October 2010 also be marked as [C]?
    • A10a. Yes, all requirements in the described paragraph are considered Critical. The Offeror shall describe any exceptions to items considered critical, and describe proposed methods for mitigating the impact of an offer that does not provide this item.
    • Q9a. Section 5.3 contains two equations, (10) and (11), which have an unidentified constant multiplier with a value of 3 at the end of each equation. This constant value and its significance are not defined anywhere. Could the Government please explain the significance of the value 3? The paragraph below states that the file systems are to grow to no more than 75% full, but if that is the case, the constant used for the multiplication factor should have been 1.33 instead of 3.
    • A9a. The factor of 3 is to account for data other than model output. The equations produce an estimate of the proposed system's total data output rate, but do not include other important considerations such as input data, job checkpoints, and data staged for transfer (between FS and LTFS and/or between LTFS and the WAN). The 75% requirement must therefore take the factor of 3 into account.
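      As a purely illustrative arithmetic sketch (the 100 TB figure below is an arbitrary placeholder, not a value from the Technical Specification, and this is not the sizing method the Specification prescribes), the two factors combine as follows:
          # Illustration only: arbitrary numbers, not values from the Technical Specification.
          set model_output = 100                            # hypothetical model output estimate, in TB
          @ total_data = 3 * $model_output                  # factor of 3: inputs, checkpoints, staged data
          set min_capacity = `echo "$total_data / 0.75" | bc -l`   # keep the file system no more than 75% full
          echo "estimated resident data: $total_data TB; minimum capacity: $min_capacity TB"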
    • Q8a. Section 4.7, Node Memory Subsystem Requirements has a [C] Critical designation for the header, but each of the three subsequent paragraphs in that section has a [S] Significant designation. Please advise if the [C] designation on the header is an error.
    • A8a. The [C] designation on the Section 4.7 Header is an error and may be disregarded.

    • Q7a. Section 5.2 describes a need for the Offeror to provide performance targets for data transfers from the FS to the LTFS, using a data set with files whose size conforms to the distribution shown in Figure (3), CM2-HR File Size Distribution. Can you please clarify what Offerors should include in that synthetic data stream?
    • A7a. The specific distribution for the current version of the model is as follows (file sizes are reported in bytes):

      Current CM2-HR File Size Distribution
      File Size   Instances   File Size   Instances   File Size   Instances   File Size   Instances   File Size   Instances
      19780       1           10056596    161         18098192    4           26372008    8           63466288    4
      68964       1           10391984    54          18165880    4           29056940    54          63703896    4
      105620      54          12188044    54          21982168    10          35343716    107         69492816    4
      191600      54          15086192    10          22064380    10          43579048    107         93367552    54
      714364      54          15142600    10          24177088    4           57698968    10          173709720   54
      9096924     54          16592192    8           24267520    4           57914976    10
      The Offeror may choose to use this specific distribution, as it describes the current file distribution, or one of their own choosing, as long as that distribution conforms to the distribution in Figure (3) of the Technical Specification. Note that the Offeror must describe the method used for generating the synthetic data stream. The corresponding instruction for this is included in Proposal Preparation Instructions, Section 2.2.2.6.
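      As one possible, non-mandatory illustration, the file sizes in the table above could be materialized into a synthetic data set with a short csh script along the following lines; the staging directory name and the use of /dev/urandom are assumptions of this sketch, not requirements.
          #!/bin/csh -f
          # Sketch only: generate one file per (size:instances) pair from the table above.
          set target = synthetic_cm2hr
          mkdir -p $target
          # First few size:instances pairs from the table; the remaining pairs would be
          # appended to this list in the same form.
          foreach pair ( 19780:1 68964:1 105620:54 191600:54 714364:54 9096924:54 )
              set size  = `echo $pair | cut -d: -f1`
              set count = `echo $pair | cut -d: -f2`
              set i = 1
              while ( $i <= $count )
                  head -c $size /dev/urandom > $target/file_${size}_${i}
                  @ i++
              end
          end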

    • Q6a. What are the y-axis units of measure in Figure (3), CM2-HR File Size Distribution?
    • A6a. The y-axis units of measure denote the number of instances. For example, during a one-simulated-year CM2-HR run, more than 100 files are generated that are between 10 and 50MB in size.

    • Q5a. Is it sufficient to run one workflow instance using the number of proposed cores, and use the wall clock time for tCMRS?
    • A5a. It is the responsibility of the Offeror to determine both the number of cores used to execute each instance, and the number of instances that can be completed on the proposed system in no more than 3.5 hours. Offerors may use a single workflow instance time from their benchmark efforts to project the tCMRS variable, as long as they also account for the impact of the other instances running at the same time on the delivered system.

    • Q4a. Attachment D, Section 3.2 states that the benchmark must complete in no more than 3.5 hours, and that the Offeror shall propose the number of workflow instances that can be completed in the 3.5 hour benchmark time. The formulas for determining storage capacity and bandwidth in the Technical Specification do not appear to take into account the number of workflow instances.
    • A4a. That is correct in that there is no direct correlation to the number of workflow instances. Instead, storage capacity and bandwidth are driven by the anticipated data production rate, RCMRS, for the proposed system, in GB/hour. Reference the relationship from tCMRS to DCMRS and then RCMRS in Attachment A, Section 5.3, Equations (5), (6), and (7).

    • Q3a. Please describe which benchmark is to be used, the procedure to follow, and which timing to use for obtaining the value for the variable tCMRS.
    • A3a. The throughput test is based on the CM-HR benchmark. The procedure is described in Section 3.1 of Attachment D, Benchmark Instructions. The timing from the throughput job (the full, two-segment job), projected to the proposed architecture (what the Offeror commits to for the throughput benchmark as part of the acceptance test), should define tCMRS.

    • Q2a. The Technical Specification states that "tCMRS = benchmark job work stream run time on proposed CMRS system as projected by the Offeror (see benchmark instructions)." Is the value of the variable tCMRS derived from the timing of the throughput benchmark documented in Section 3 of Attachment D?
    • A2a. tCMRS, benchmark job work stream run time, workstream instance, and throughput benchmark time are all analogous. The reported tCMRS is the committed time required to complete a single instance on the proposed system, given that there are multiple instances running simultaneously, and given that the Offeror has taken into account the impact on that instance of the other instances running on that system. The actual benchmark circumstances of an Offeror may produce a different workstream instance time; Offerors shall use the data from the benchmarks to project and commit to a tCMRS.

    • Q1a. Is the variable tE the result of running a single instance of the job work stream benchmark on 8000 cores? How does the Offeror account for a run that is completed on other than 8000 cores?
    • A1a. No. tE is the time on the number of cores needed to meet the requirements of the throughput benchmark job on an existing platform. The number of cores it was run on is not relevant to the calculation of filesystem characteristics using the equations in which it appears. When using these formulas, please use the value of tCMRS that is proposed for the throughput benchmark job. tE and tCMRS do not need to be run on the same number of cores.
      Attachment C: Proposal Preparation Instructions
    • Q5c. Section 1.2 identifies email as an acceptable form of electronic submission. Are other mechanisms available that will preclude typical email limits for the sizes of the Attachments?
    • A5c. (amended January 28, 2010) Yes. ORNL will provide two additional secure mechanisms for uploading electronic submissions: XCAMS accounts for private document uploads to a SharePoint server, and anonymous FTP. Reference the information at the top of this page relative to these two services. Offerors should contact the ORNL subcontracts administrator at their earliest convenience if they need either service so that we can provide the appropriate information. The ORNL Exchange Server is configured to accept individual emails that are up to 14MB in size. You may use email for responses that fit within this size limitation, or either of these alternatives.
    • Q4c. Section 2.2.5.9 contains a request for FIT rates for common field replaceable components of the CMRS, FS, and LTFS. There does not appear to be a corresponding requirement in Attachment A, Technical Specification, nor an attribute priority. Please clarify.
    • A4c. The Offeror is correct that there is not an explicit paragraph in Section 7 of the Technical Specification that corresponds directly to FIT rates for common FRUs. However, the ability to examine FIT rates for common FRUs will assist the Government in its assessment of multiple subsections of Section 7. The Offeror may choose to implicitly or explicitly address FIT rates in their written response to the RRAS sections. The FIT rate information for the CMRS, FS, and LTFS is expected to contribute to the larger assessment of the applicable sections. As there are no specific requirements for individual FIT rates for FRUs, there is no explicit attribute priority.
    • Q3c. Section 2.2.5.4 contains a description for Mean Time to Data Loss (MTTDL) for a RAID set of the FS, and a formula for calculating it. There does not appear to be a corresponding requirement in Attachment A, Technical Specification, nor an attribute priority. Please clarify.
    • A3c. The Offeror is correct that there is not an explicit paragraph in Section 7 of the Technical Specification that corresponds directly to MTTDL. This (the instruction in 2.2.5.4) is a request for information that contributes to the larger, or more general, assessment of the anticipated reliability of the Offeror's proposed file system. Note that, unlike the requests for calculation of SMTBI and SMTBF, there are no explicit design goals, targets, or ranges. Because no design goals, targets, or ranges are provided, there is no explicit attribute priority to associate with the calculation.

    • Q2c. Are system drawings or schematics included in the 50-page count for the Technical Proposal?
    • A2c. Yes. However, please note that Section 1.3 of Attachment C allows foldouts for large or complex diagrams. An 11x17 foldout that is specifically used to demonstrate or display large or complex diagrams may be counted as a single page with respect to the page limit for the Technical Proposal. In addition, a foldout may contain one or more diagrams. Each diagram may include a Figure reference, but Offerors should specifically not address other elements of their response on these foldouts. A foldout that the Offeror chooses to treat as a single page should include a single page number, using the same or equivalent header or footer that the remainder of the response uses for delineating page count.

    • Q1c. Is a compliance matrix allowed, and would this compliance matrix count against the 50 page limit?
    • A1c. Compliance matrices are allowed, shall be clearly marked as such, and will not count against the 50 page limit of the Technical Proposal. Offerors should note that the compliance matrix will be used to assist with the cross reference of requirements to the Offeror's proposal, but that supplemental material within the matrix will not be used to assist in any element of the technical elements evaluation.
      Attachment D: Benchmark Instructions
    • Q40d. Section 1.7 of Attachment D specifies that the materials to be returned as part of the Proposal should be provided on ISO-9660 CDROM format. Would it be acceptable to return the material on a USB device, such as the one on which the original benchmarks were distributed?
    • A40d. Yes. In general, any commonly acceptable physical media in a commonly readable form is fine. It is requested that the Offeror explicitly state the format used for writing the physical media as part of their response.

    • Q39d. The information provided in Answer 28d gives an example of a core count that is not possible with the restriction for APES. Below is an excerpt from the answer:
      "One of the elements in the pair specifying the XX_layout may have a value of 0 (zero) where XX is any of {ap, op, ip}. In this case, the appropriate value will be computed by the code for the element specified by 0. Note that a value of 0 for XX_layout requires that the non-zero element must be a factor of APES (or OPES if XX=op). As an example, APES=202 is incompatible with ip_layout=0,4 because 202/4 is not an integer."

      According to this example, one could choose ip_layout=0,2 since 202/2 is an integer. However, APES could not be 202 since there is also a requirement that APES=6*ap_layout(x)*ap_layout(y) and 202 is not divisible by 6.
      Do all constraints need to be simultaneously satisfied?
    • A39d. Yes, all constraints given in A28d, for all components, need to be simultaneously satisfied. The example only illustrated that APES=202 is incompatible with an ice layout given by ip_layout=0,4. Another example that fails the ice layout restrictions is the combination of { APES=198 and ip_layout=0,4 } because 198/4 is not an integer. While APES=198 satisfies the constraint for the atmosphere and land components (6*ap_layout(x)*ap_layout(y)=198 for ap_layout=3,11), it still fails the ice layout restriction.
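      As a convenience, a minimal csh sketch of such a combined check is shown below; it reproduces the APES=198, ip_layout=0,4 example from this answer, and the variable names are illustrative rather than taken from the benchmark scripts.
          #!/bin/csh -f
          # Sketch: check a candidate decomposition against the A28d constraints.
          # The values below reproduce the APES=198, ip_layout=0,4 example above.
          set ap_x = 3
          set ap_y = 11
          set ip_x = 0
          set ip_y = 4
          @ apes = 6 * $ap_x * $ap_y            # atmosphere/land constraint: APES = 6*ap_layout(x)*ap_layout(y)
          if ( $ip_x == 0 ) then
              @ rem = $apes % $ip_y             # a 0 element: the non-zero element must divide APES
          else if ( $ip_y == 0 ) then
              @ rem = $apes % $ip_x
          else
              @ prod = $ip_x * $ip_y
              @ rem = $apes - $prod             # otherwise the ice layout product must equal APES
          endif
          if ( $rem == 0 ) then
              echo "ip_layout=${ip_x},${ip_y} is compatible with APES=$apes"
          else
              echo "ip_layout=${ip_x},${ip_y} is NOT compatible with APES=$apes"
          endif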

    • Q38d. Section 2.6.3 of Attachment D, FIM Verification Procedure, states that both "(t)he contents of fim_out_* files should be identical across different numbers of processors on any machine" and "(b)it wise reproducibility across processor counts is expected at the lowest optimization levels". In testing, we find that "fim_out_*" output files produced by two different MPI rank size jobs may not be bitwise identical in all cases, at any compiler optimization level, with multiple compilers. They do, however, always meet the validation criteria. Can the Government confirm that this should be the case for any two rank count cases?
    • A38d. The FIMnamelist file as originally provided in the Benchmark distribution can produce this behavior, where the validation criteria are met but the results are not bit-wise identical. The following additional information will typically correct this situation: bitwise exactness is expected across different numbers of MPI tasks when either (or both) of the following settings is applied in the fim/FIMrun/FIMnamelist file:
          FixedGridOrder=.true.
      and/or
          Curve=0

      Offerors may choose whichever setting yields the best performance, and report the results as part of the baseline. Offerors may retain these settings for optimization runs at their discretion.

    • Q37d. Section 2.3.3.1, Paragraph 2, states: "The reproducibility of the atmospheric and ocean components of the model may be verified through a series of checksums and global integrals written to stdout at the end of the run."
      Both CM-CHEM and CM2-HR write many checksums to stdout in the course of their execution. Please specify which checksums should be used to determine reproducibility. Are the “global integrals” the values written to the diag_integral.out file?
    • A37d. Offerors may demonstrate reproducibility by comparing all checksums written to STDOUT across the Offeror's different PE-count reproducibility experiments. Any suitable tool for comparing the checksums across disparate or separate runs is acceptable. As this output is all text from STDOUT, a common UNIX command such as diff is certainly appropriate.
      Any differences for a line containing any of the keywords 'fv_restart_end', 'checksum', or 'chksum' will indicate a reproducibility failure.
      The number of significant digits written in the diag_integral.out file is too small to guarantee absolute reproducibility.
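      For example, the comparison described above could be performed as follows; run_A.stdout and run_B.stdout are placeholder names for STDOUT captured from two different PE-count experiments.
          # Sketch: extract and compare the reproducibility-relevant lines from two runs.
          grep -E 'fv_restart_end|checksum|chksum' run_A.stdout > run_A.chk
          grep -E 'fv_restart_end|checksum|chksum' run_B.stdout > run_B.chk
          diff run_A.chk run_B.chk && echo "checksums match" || echo "reproducibility failure"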

    • Q36d. The Government's answer to benchmark question 35d and subsequent modification of Section 3.2 "Throughput benchmark scoring" eliminates a lower bound on the runtime. However, this creates a dichotomy that needs clarification so that we can best respond to the throughput benchmark requirements.
      By counting instances and by including instance runtime in the evaluation, the CMRS workflow throughput test asks the Offeror to maximize capability while at the same time maximizing capacity (conflicting goals). Since CM-HR does not exhibit perfect scaling and faster runtimes imply lower efficiency, these factors are inversely related, requiring the Offeror to make a choice between total throughput per day and minimal runtime per instance.
      How does the Offeror make this choice without an explicit indication of which is more valuable to the Government? Will the Government please provide additional guidance on the scoring metrics for the benchmark throughput test so that an Offeror can choose between maximizing the number of instances vs. minimizing the elapsed time?
      An example may help describe the confusion. Given a target CMRS configuration with 100,000 cores, and benchmark results that support the following three hypothetical scenarios:
          1. The throughput job runs on 500 cores per instance in 12599 seconds (3.5 hours - 1 second) so 200 instances may be run every 12599 seconds.
          2. The throughput job runs on 1000 cores per instance in 7000 seconds, so 100 instances may be run every 7000 seconds.
          3. The throughput job runs on 2000 cores per instance in 4000 seconds, so 50 instances may be run every 4000 seconds.
      Which is preferred?
    • A36d. The primary purpose of the throughput benchmark is to define the capacity of the proposed configuration. The total (aggregated across instances) throughput shall be maximized with the constraint that no individual job take longer than 3.5 hours. The performance as a function of core count will be assessed separately, using the scaling study benchmark.
      For the example scenarios above, we can convert from runtime to per-instance simulation rates (simulated years per day, SYPD) by using the simulation length over the two segments that comprise this benchmark test, which is 0.4045 years. We can then compute the aggregate number of simulated years per day for each scenario by multiplying by the number of instances:
          1. 0.4045*86400/12599= 2.77 SYPD/instance * 200 = 554.8 SYPD
          2. 4.99 SYPD/instance * 100 = 499.3 SYPD
          3. 8.74 SYPD/instance * 50 = 436.9 SYPD
      For this hypothetical scenario, the Offeror should choose Scenario 1 to maximize the aggregate SYPD. Note that a solution for which the throughput job runs on 500 cores per instance in 12400 (versus 12599) seconds and delivers an aggregate of 563.7 SYPD would score higher even though the total number of instances in 3.5 hours is still 200.
      As described in Section 3.2, Throughput Benchmark Scoring, ORNL will extrapolate from the number of instances that can be completed within the 3.5-hour time limit to the number that can be completed per day. That extrapolation will scale the segment runtimes to segment lengths that represent typical production runs, rather than using the simple extrapolation done in the example above, which uses the total runtime of the instance (two short simulation segments plus two data movement segments).
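      The arithmetic above can be reproduced with a short csh calculation of the following form; the runtime and instance count are the hypothetical Scenario 1 values from the question, and 0.4045 simulated years is the combined length of the two benchmark segments.
          # Sketch: per-instance and aggregate SYPD for Scenario 1 above.
          set runtime_sec = 12599                 # hypothetical per-instance runtime, in seconds
          set instances = 200                     # hypothetical number of simultaneous instances
          set sypd = `echo "0.4045 * 86400 / $runtime_sec * $instances" | bc -l`
          echo "aggregate throughput = $sypd SYPD"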

    • Q35d. In Section 3.2, Throughput Benchmark Scoring, the statement "improvements in job segment time as well as total wallclock time to completion are important so Offerors should run each instance on the number of cores that gives a performance sweetspot and an instance runtime between 3 and 3.5 hours.[R]" seems contradictory, as an Offeror's solution may complete in less than the baseline measurement time, but might be considered non-compliant.
    • A35d. Correct. The Benchmark Instructions have been revised to eliminate the lower runtime bound of 3 hours, as solutions that complete more quickly are preferred. The 3.5 hour upper bound is specified so as to produce a minimum simulation rate (number of simulated years per day) per instance, which is needed for scientific productivity. The statement has been corrected in Revision 2 of the Benchmark Instructions, posted on the Home page of this web site.

    • Q34d. Attachment D, Section 3.1, paragraph 4 says that "The Offeror shall report the timings produced by each segment as well as the time for the complete job script including all data movement in the Benchmark_Results.xls file. If benchmark runs are done using only one filesystem, data shall be copied and then deleted to ensure that blocks are actually moved. [R]” Please explain this instruction in terms of the throughput run script that was provided, i.e., what lines must be added, removed or changed to achieve this.
    • A34d. (amended January 19, 2010) The script was released with explicit copy/rm. The path to the final destination of the history/ascii/restart data is:

      set OutputDir=$workbase/OUTPUT/${name}.${npes}.$BATCH_JOBID

      If a vendor decides to use the two-filesystem approach, the OutputDir variable will need to be modified accordingly: please modify OutputDir to point to the filesystem/directory where the restart/ascii/history data is to be stored for further analysis.
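      For example, the change amounts to redefining OutputDir in the throughput script; the /second_fs path below is purely a placeholder for whatever second file system the Offeror provides.
          # Illustrative csh fragment only; /second_fs is a placeholder path.
          # Single-filesystem form, as released (data is copied and then deleted):
          set OutputDir=$workbase/OUTPUT/${name}.${npes}.$BATCH_JOBID
          # Two-filesystem form: point OutputDir at the second file system instead:
          set OutputDir=/second_fs/OUTPUT/${name}.${npes}.$BATCH_JOBID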

    • Q33d. The benchmark instructions, Section 1.6.3 (Use of Hardware Undersubscription), say "Additional results may be provided as part of the reported Optimized Results if improved performance and capacity are achieved in other configurations. [D]" Please explain this statement, particularly with respect to "capacity". What is the measure of capacity?
    • A33d. The baseline results must use all cores on each socket and all sockets on each node. The additional results may leave some cores or sockets unused. Capacity is measured by the throughput benchmarking, described in section 3.2. Like the scaling studies, baseline results for the throughput benchmark must use all cores on each socket and all sockets per node. Additional results for the throughput benchmark may be provided with some cores or sockets left unused. Regardless of whether all cores are used, the capacity is measured as the number of instances that can be run on the proposed system in the allowed time.

    • Q32d. The CM-HR makefiles, Makefile.atmos_dyn, Makefile.atmos_phys, Makefile.coupler, Makefile.fms, Makefile.ice, and Makefile.mom4p1, specify the macro definition -Duse_shared_pointers. What is the significance of this macro?
    • A32d. The macro definition '-Duse_shared_pointers' is a legacy option whose absence or inclusion during the compilation has no effect.

    • Q31d. The makefile templates, mkmf.template.pscale, for both CM-CHEM and CM-HR include the pathf95 compiler option "-byteswapio", which is documented to perform byte swapping during I/O operations on unformatted data files. Which I/O files require the use of this option?
    • A31d. This option applies to unformatted files. A list can be obtained by using the unix "file" command in the INPUT directory and ignoring all files with the '.nc' extension. For the CM-CHEM benchmark this indicates that the cns_1600_* files will use this option.
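      For example, the following commands list the unformatted (non-netCDF) candidates:
          # Sketch: identify the non-netCDF input files from the benchmark INPUT directory.
          cd INPUT
          file * | grep -v '\.nc:'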

    • Q30d. Section 3.1 of the benchmark instructions refers to the baseline measurement for the Throughput Test. Is the baseline measurement the value of t_e (11280 seconds) as shown in Figure 4 of Attachment A, Section 5.3, or is it 12076 seconds, the time included in the sample throughput test output (file CM-HR-tput.693405.txt)?
    • A30d. The value of t_E provided in Attachment A should be used. Timings in files provided with the benchmark distribution are samples only and those values should be considered irrelevant.

    • Q29d. For the CM-HR verification test, the current test specifies that abs_sw = 244.150 +/- .08. Is there a typographical error in this value?
    • A29d. Yes. The Government has identified a typographical error in the value stated for abs_sw in the CM-HR/VERIFICATION_DATA file. The value should read:

      abs_sw = 247.150 +/- .08

      An updated version of CM-HR/VERIFICATION_DATA reflecting this correction is now posted on the Home page of this website.

    • Q28d. Can the Government provide additional clarification with respect to modification of the decomposition for the Ice model of the CM-CHEM and CM2-HR applications?
    • A28d. (amended January 19, 2010) The following information describes the variables used to control the decomposition of each component of the coupled climate system. In addition, there are descriptions of the recommended changes to the CM-CHEM and CM-HR scripts that allow the user to modify the decomposition for the ice model in the same way that is currently being done with the ocean, atmosphere and land models.
      This information applies to both CM-CHEM and CM-HR.
      The ocean component runs on a separate set of cores, concurrently with the other components of the climate model (atmosphere, ice and land). The total number of processors (cores), NPES, is the sum of the number used for the ocean component (OPES) and those used for the other components (APES).
            NPES=APES+OPES

      OP_LAYOUT describes the decomposition in x and y for the ocean.
            OPES=op_layout(x)*op_layout(y)

      AP_LAYOUT is the decomposition in x and y for the atmosphere and land components. In production runs, the land component is typically given the same decomposition as the atmosphere. This was done in the scripts provided.
            APES=6*ap_layout(x)*ap_layout(y)

      For an atmospheric grid size of 180x180, the maximum values are
            ap_layout(x)=ap_layout(y)=45.

      IP_LAYOUT is the decomposition in x and y for the ice component.
            APES=ip_layout(x)*ip_layout(y).

      One of the elements in the pair specifying the XX_layout may have a value of 0 (zero) where XX is any of {ap, op, ip}. In this case, the appropriate value will be computed by the code for the element specified by 0. Note that a value of 0 for XX_layout requires that the non-zero element must be a factor of APES (or OPES if XX=op). As an example, APES=202 is incompatible with ip_layout=0,4 because 202/4 is not an integer.
      For each component, there are corresponding variables XX_io_layout to describe the IO-subsets. Each element of XX_io_layout must divide equally into the corresponding element in XX_layout. For example, if ap_layout=10,18, then ap_io_layout=1,9 is valid. ap_io_layout=1,10 is not valid, because 18/10 is not an integer.
      While XX_io_layout will accept a value of 0 (zero), the result will be that the model uses values set in fms_io_nml. For the scripts provided, this implies that all processes will read and write file sets.
      To modify the decomposition of the ice model, the following script changes are provided:
      Modify layout and io_layout in ice_model_nml in the following manner:
      CM-CHEM:
            <   layout=0,4
            <   io_layout=1,4

      becomes
            >   layout=${ip_layout}
            >   io_layout=${ip_io_layout}

      Insert two new lines at the top of the script after the definition of "set op_io_layout="
             set ip_layout=0,4
             set ip_io_layout=1,4

      CM-HR:
            <   layout=0,4
            <   io_layout=9,4

      becomes
            >   layout=${ip_layout}
            >   io_layout=${ip_io_layout}

      Insert two new lines at the top of the script after the variable definition "set op_io_layout"
             set ip_layout=0,4
             set ip_io_layout=9,4

      Note: The values of ip_layout and ip_io_layout defined here maintain the layouts as provided in the initial scripts. Offerors may modify the values as described to achieve their desired layouts and to be compatible with the value of APES that they select.

    • Q27d. During CM compilation, the following error is seen
      ../src/atmos_param/shallow_cu/conv_plumes_k.F90", line 1100.0: 1515-010 (S) String is missing a closing delimiter. Closing delimiter assumed at end of line.
    • A27d. If CM compilation fails with this error, Offerors may modify the offending statement as
      print*, qlu_new, qiu_new, clu_new, ciu_new, &
             qrj, qsj, qlj, qij, "??????????????????"

    • Q26d. During CM compilation, the following error is seen
      ../src/land_lad2/land_model.F90", line 557.34: 1515-019 (S) Syntax is incorrect.
    • A26d. If CM compilation fails with this error, Offerors may modify the offending statement as
      write(*,'(99(a,i3.2,x))') 'i=',i,'j=',j,'face=',current_face()

    • Q25d. During CM compilation, the following error is seen
      "../src/atmos_shared/tracer_driver/tropchem/strat_chem_utilities.F90", line 57.42: 1516-083 (S) All elements in an array constructor must have the same type and type parameters.
    • A25d. If CM compilation fails with this error, Offerors may modify the offending statement as (ignoring the difficulties with translating FORTRAN spacing and continuation characters to HTML)
      character(len=32), dimension(nspecies_age), save :: dfdage_name = &
             (/ "dfdage_cfc11 ", "dfdage_cfc12 ", "dfdage_cfc113 ", "dfdage_ccl4 ", &
             "dfdage_ch3cl ", "dfdage_ch3ccl3", "dfdage_hcfc22 ", "dfdage_bry " /)

      Note the addition of spaces within the quote marks. There may be similar instances in other source code files. The same solution may be employed as needed.

    • Q24d. During CM compilation, the following error is seen
      ../src/atmos_param/cloud_obs/cloud_obs.F90", line 2.20: 1513-191 (S) A variable declared in the scope of a module, interp, that is of a derived type with default initialization, must have the SAVE attribute.
    • A24d. If CM compilation fails with this error, Offerors may modify the source code from
      type (horiz_interp_type) :: Interp
      to
      type (horiz_interp_type), save :: Interp
      as a suggested solution to this problem. There are similar instances in other source code files. A similar solution may be employed as needed.

    • Q23d. Execution of the CM codes fails at runtime because /dev/null is not user writable on the benchmark system.
    • A23d. In the CM source code files ../src/shared/mpp/mpp_domains.F90 and ../src/shared/mpp/mpp.F90, an Offeror may update the source code as
        character(len=32) :: etcfile='._mpp.nonrootpe.msgs'
      ! character(len=32) :: etcfile='/dev/null'
      to correct this problem.

    • Q22d. In the CM source code file ../src/shared/mpp/mpp_domains.F90, attempts to compile were unsuccessful.
    • A22d. mpp_domains.F90 may be compiled with reduced optimization including -O0.

    • Q21d. Both the Technical Specification and the Benchmark Instructions reference netCDF. There are several versions of netCDF available. Is there guidance about which version of netCDF should be used?
    • A21d. The applicable codes have been successfully run using the netCDF 3.6.2 (64-bit) libraries and utilities. It is also expected that Offerors will generally have success with netCDF 3.6.3. Offerors may find that netCDF 4.0.1 is more problematic in specific development environments.

    • Q20d. In Attachment D, Section 2.3.3.2, is the last paragraph misplaced?
    • A20d. Yes. Please refer to Revision 1 of Attachment D for a correction to Section 2.3.3. This Revision is posted on the Home page of this website.

    • Q19d. Please clarify the expected run sequence for the five components of the WACCM benchmark.
    • A19d. The Offeror shall provide the results for Tests 1, 2, 3, and 4 at the core count and compiler optimization options of their choosing. These define the ability of the Offeror to demonstrate the correctness criteria. There is no requirement to demonstrate scaling across multiple core counts for CAM/WACCM Tests 1, 2, 3, and 4. The Offeror shall then execute Test 5 using those same compiler optimization options (the lowest optimization needed to achieve the correctness criteria) to establish the Baseline Results at a variety of core counts. The results of Test 5 are the basis for the scaling study for CAM/WACCM. The Offeror may also submit an Optimized set of results for Test 5.

    • Q18d. The CM-CHEM code fails a pointer check in mpp_redistribute_2D_ (mpp_update_domains2D.h, line 207) when it uses LOC() in an attempt to take the address of an unassociated F90 pointer. Subroutine FLUX_OCEAN_TO_ICE calls MPP_REDISTRIBUTE (line 2914 of coupler/flux_exchange.F90) with the unassociated F90 pointer component u_surf in the structure Ocean. In fact, all F90 pointers in the Ocean structure referenced in this section of code (“case(REDIST)”) by all atmosphere pe’s are unassociated. It’s not clear exactly what damage results when, without the pointer check, the code goes on to redistribute the non-existent arrays. Can we be assured that this behavior has no adverse effect on the results that are produced by the program or on the integrity of the code in general?
    • A18d. The Government has experienced this behavior from the Intel compiler when using "debug" options. Based on our experience, the unassociated pointers associated with the atmosphere pe-list do not impact the integrity of the data when executing without the pointer checks as the data is being redistributed from the ocean pe-list, where the pointers are all valid, to valid space in the atmosphere pe-list.

    • Q17d. NCEP developers have corrected a bug in Benchmark GFS, source code file gfidi_hyb.f, involving a SAVEd variable in a threaded region (the entire routine is called in a threaded region).
    • A17d. The repair to this file is posted on the Home page of this website in the gfidi_hyb.f file.

      GFS also compartmentalizes calculations into "Physics" and "Dynamics" portions. To assist with debugging, Physics can be disabled with a namelist option and the code then simply integrates the five simplified equations of motion.

      The namelist option is
      adiab=.true.
      in the file gfs_namelist. False (full physics) is the default.

    • Q16d. A FORTRAN language trap is identified in file

      CM-HR/base/src/atmos_param/moist_processes/moist_processes.F90, subroutine moist_processes

      where parameter nqn is found to be undefined and out of range. How should Offerors address this issue?
    • A16d. Offerors may revise the source code as follows:

      used = send_data (id_qndt_ls, q_tnd(:,:,:,nqn), Time, is, js, 1, rmask=mask)

      becomes

      if (do_liq_num) used = send_data (id_qndt_ls, q_tnd(:,:,:,nqn), Time, is, js, 1, rmask=mask)

    • Q15d. A FORTRAN language trap is identified in file

      CM-HR/base/src/mom4p1/ocean_core/ocean_model.F90, subroutine ocean_model_data2D_get

      in a series of statements, one of which is

      array2D(isc:,jsc:) = Grid%tmask(Domain%isc:,Domain%jsc:,1)

      where the two arrays array2D and tmask do not have the same bounds. How might this be addressed?
    • A15d. Offerors may revise the source code as follows:

      case('mask')
      array2D(isc:,jsc:) = Grid%tmask(Domain%isc:,Domain%jsc:,1)

      may be revised as

      integer:: iec, jec
      iec = isc + size(array2D,1) - 1
      jec = jsc + size(array2D,2) - 1
      ...
      select case(name)
      ...
      case('mask')
      array2D(isc:iec,jsc:jec) = Grid%tmask(Domain%isc:Domain%iec,Domain%jsc:Domain%jec,1)

      These changes apply to CM-CHEM as well.

    • The Benchmark Q&A below are associated with questions asked and answered prior to the release of the Solicitation.

    • Q14d. Compilation of the source code generates warnings relative to the f90 standard, especially with respect to logical compares. One example from src/ice_sis/ice_type.F90 includes

      if (add_diurnal_sw .eq. .true.) call astronomy_init

      which could be updated as

      if (add_diurnal_sw .eqv. .true.) call astronomy_init

      To what degree should an Offeror make changes to eliminate these warnings?

    • A14d. F90 semantics would dictate that .eqv. should be used in this instance. Code modifications of this nature are allowed as needed.

    • Q13d. (Source code error identified in CM-CHEM and CM2-HR)
    • A13d. Please note that the Offeror should change line 168 of src/shared/coupler/ensemble_manager.F90 from

      ensemble_pelist(n, atmos_npes+1:npes) = ensemble_pelist_ocean(n,1:atmos_npes)
      to
      ensemble_pelist(n, atmos_npes+1:npes) = ensemble_pelist_ocean(n,1:ocean_npes)
      This modification should be applied to both CM-CHEM and CM2-HR.

    • Q12d. What is the expected maximum number of MPI ranks that the FIM global 10km benchmark can use?
    • A12d. The maximum number of processors that can be applied to the 10km FIM is limited by the halo size, as the halo must be contained within a neighboring process. This is much more difficult to calculate with the icosahedral grid used in FIM than with a standard Cartesian grid. The theoretical limit would likely be more than 100,000 processes. For practical purposes, 40,000 processes should still produce reasonable efficiency. With 40,992 (including 32 I/O) processors, each compute processor would calculate 144 points.

    • Q11d. Are individual benchmarks, such as FIM, available for download?
    • A11d. Individual benchmarks are not available for download. Please request a copy of the benchmark volume by contacting the ORNL subcontracts administrator for this Solicitation.

    • Q10d. What format are the benchmarks delivered in?
    • A10d. The benchmarks were created on an Ubuntu Linux distribution. The format of the file system containing the benchmarks is ext3.

    • Q9d. What parameters or changes are needed to run FIM with core counts larger than 2000 cores?
    • A9d. The Offeror is asked to refer to the "ComputeTasks = ..." entry in the FIMnamelist file. Please note that the number of cores requested in the job submission should also take into account the need for an additional 32 cores assigned to file I/O. This is also documented in the FIM README.
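      As a small illustration (the ComputeTasks value below is arbitrary and must match the entry in FIMnamelist), the total core request is:
          # Sketch: total core request for a FIM job.
          set ComputeTasks = 4064
          @ total_cores = $ComputeTasks + 32      # 32 additional cores are dedicated to file I/O
          echo "request $total_cores cores for ComputeTasks = $ComputeTasks"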

    • Q8d. Can you please provide little endian versions of the GFS data files?
    • A8d. Offerors may create these using the provided instructions and the fendian_conv.c code. They are also posted on the Home page of this website.

    • Q7d. It appears that there is an error in the CM-CHEM-verification job script with respect to the specified output directories. Can you confirm?
    • A7d. There is an incorrect path in the CM-CHEM-verification job script that refers to CM-CHEM-repro that Offerors should change to CM-CHEM-verification.

    • Q6d. From the Benchmark Instructions, "GFS is a global spectral weather model developed and used at NOAA NCEP. To build the GFS executable you will need to download and build ESMF version 2.2.2 release date 03/16/06 from http://www.esmf.ucar.edu/download/releases.shtml." Are we allowed to use a more recent version of the library? For instance, there is Version 4.0 (dated 10/30/09).
    • A6d. The ESMF API has changed from v2.2.2 to v4.0. It may be difficult to ensure that the GFS code will work correctly using v4.0. The modifications needed are allowed, but this approach is not suggested.

    • Q5d. What version of ESMF should I be using for the GFS benchmark? The top level README says "emsf-2.2.2rp2" but the README file under gfs directory seems to indicate a different version, "ESMF version v2.2.2 release date 03/16/06".
    • A5d. ESMF v2.2.2, release date 03/16/06 is the official release. Please use this version if possible.

    • Q4d. Can ORNL provide additional reference output for CM-CHEM and CM2-HR? Specifically, fms.out log files from CM-CHEM and CM2-HR for the short verification run and a full scaling run and diag_integral.out from both CM-CHEM and CM2-HR for the short verification run and a full scaling run.
    • A4d. Yes, these reference files are now posted on the Home page of this website in the NOAA_benchmark_output.tar.gz file.

    • Q3d. Can Government provide as reference a standard output file for a CM2-HR run that completes in less than 3.5 hours?
    • A3d. Yes. An example is posted on the Home page of this website. A CM2-HR throughput example generating output CM-HR-tput.693405 completed on a Cray XT5 using 2150 cores in 03:21. The download has an artificial .txt extension.

    • Q2d. Are there updates available to the CM-HR-tput throughput benchmark script?
    • A2d. Yes. That update, CM-HR-tput.csh, is posted on the Home page of this website. Minor modifications to suit your benchmarking environment may be required. As the download file has a .csh extension, you may need to right-click and save the file to prevent your OS from attempting to run the script.

    • Q1d. May Offerors see the Benchmark Results spreadsheet template?
    • A1d. That spreadsheet, Benchmark_Results.xls, is posted as the Throughput Benchmark Spreadsheet on the Home page of this website.